Abstract:
BACKGROUND:Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. Increasing studies, including biological sequence analyses in bioinformatics, utilize this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Currently, many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles. RESULTS:We employed the LSTM network and developed a novel profile generator to construct profiles without any assumptions, except for input sequence context. Our method could generate better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, on the basis of profile-sequence similarity search performance with linear calculation costs against input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM had high potential power to detect long-range interactions between amino acids, as in the case of beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information. CONCLUSION:We demonstrated the importance of sequence context and the feasibility of LSTM on biological sequence analyses. Our results demonstrated the effectiveness of memories in LSTM and showed that our de novo profile generator, SPBuild, achieved higher performance than that of existing methods for profile prediction of beta-strands, where long-range interactions of amino acids are important and are known to be difficult for the existing window-based prediction methods. Our findings will be useful for the development of other prediction methods related to biological sequences by machine learning methods.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Yamada KD,Kinoshita Kdoi
10.1186/s12859-018-2284-1subject
Has Abstractpub_date
2018-07-18 00:00:00pages
272issue
1issn
1471-2105pii
10.1186/s12859-018-2284-1journal_volume
19pub_type
杂志文章abstract:BACKGROUND:Failure on Highly Active Anti-Retroviral Treatment is often accompanied with development of antiviral resistance to one or more drugs included in the treatment. In general, the virus is more likely to develop resistance to drugs with a lower genetic barrier. Previously, we developed a method to reverse engin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-409
更新日期:2010-08-03 00:00:00
abstract:BACKGROUND:Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-282
更新日期:2011-07-12 00:00:00
abstract:BACKGROUND:Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-238
更新日期:2010-05-11 00:00:00
abstract::DNA methylation exhibits different patterns in different cancers. DNA methylation rates at different genomic loci appear to be highly correlated in some samples but not in others. We call such phenomena conditional concordant relationships (CCRs). In this study, we explored DNA methylation patterns in 12 common cancer...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S13-S7
更新日期:2012-01-01 00:00:00
abstract:BACKGROUND:Infections are often associated to comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. SARS is a threat which is similar to MERS virus, but the comorbidity is the key aspect to underline their different impacts. One UK doctor says "I'd rather have HIV ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-333
更新日期:2014-10-24 00:00:00
abstract:BACKGROUND:Current microRNA (miRNA) research in progress has engendered rapid accumulation of expression data evolving from microarray experiments. Such experiments are generally performed over different tissues belonging to a specific species of metazoan. For disease diagnosis, microarray probes are also prepared with...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-163
更新日期:2009-05-28 00:00:00
abstract:BACKGROUND:Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS:We present an R/Bioconductor port of a fast novel algorithm for...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-242
更新日期:2009-08-06 00:00:00
abstract:BACKGROUND:It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opp...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-206
更新日期:2012-08-17 00:00:00
abstract:BACKGROUND:It has long been recognized that sensitivity analysis plays a key role in modeling and analyzing cellular and biochemical processes. Systems biology markup language (SBML) has become a well-known platform for coding and sharing mathematical models of such processes. However, current SBML compatible software ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-342
更新日期:2008-08-15 00:00:00
abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2699-3
更新日期:2019-03-04 00:00:00
abstract:BACKGROUND:Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S13-S5
更新日期:2011-01-01 00:00:00
abstract:BACKGROUND:Viral infection by dengue virus is a major public health problem in tropical countries. Early diagnosis and detection are increasingly based on quantitative reverse transcriptase real-time polymerase chain reaction (RT-qPCR) directed against genomic regions conserved between different isolates. Genetic varia...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2313-0
更新日期:2018-09-04 00:00:00
abstract:BACKGROUND:Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms of artificial neural networks (ANN), decision trees (DT) and logistic re...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-100
更新日期:2013-03-19 00:00:00
abstract:BACKGROUND:Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have been extensively studied also for their association with the class of repeat expansion dise...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S4-S3
更新日期:2012-03-28 00:00:00
abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1149-8
更新日期:2016-07-22 00:00:00
abstract:BACKGROUND:Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and hel...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-311
更新日期:2011-07-29 00:00:00
abstract:BACKGROUND:Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Di...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-278
更新日期:2012-10-30 00:00:00
abstract:BACKGROUND:Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modell...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2357-1
更新日期:2018-09-21 00:00:00
abstract:BACKGROUND:The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious pro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2665-0
更新日期:2019-02-14 00:00:00
abstract:BACKGROUND:The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S8-S5
更新日期:2010-10-26 00:00:00
abstract:BACKGROUND:Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures greatly facilitates scientifi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-159
更新日期:2014-05-23 00:00:00
abstract:BACKGROUND:Identifying similarities between patterns of differential gene expression provides an opportunity to identify similarities between the experimental and biological conditions that give rise to these gene expression alterations. The growing volume of gene expression data in open data repositories such as the N...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-381
更新日期:2011-09-27 00:00:00
abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2166-6
更新日期:2018-05-09 00:00:00
abstract:BACKGROUND:It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enou...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-226
更新日期:2013-07-17 00:00:00
abstract:BACKGROUND:The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which inten...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S15
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-523
更新日期:2010-10-20 00:00:00
abstract:BACKGROUND:Non-linearities in observed log-ratios of gene expressions, also known as intensity dependent log-ratios, can often be accounted for by global biases in the two channels being compared. Any step in a microarray process may introduce such offsets and in this article we study the biases introduced by the micro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-5-177
更新日期:2004-11-12 00:00:00
abstract:BACKGROUND:The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biol...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3046-4
更新日期:2019-12-24 00:00:00
abstract:BACKGROUND:Expression quantitative trait loci (eQTL) mapping is often used to identify genetic loci and candidate genes correlated with traits. Although usually a group of genes affect complex traits, genes in most eQTL mapping methods are considered as independent. Recently, some eQTL mapping methods have accounted fo...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1387-9
更新日期:2016-12-13 00:00:00
abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries, can individually be labeled with barcodes for a joint sequenc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0482-7
更新日期:2015-02-18 00:00:00