Abstract:
BACKGROUND: Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modelled by a Markov substitution model. In contrast, the dynamics of indels are not modelled explicitly, because computing the marginal likelihood under such models has exponential time complexity in the number of taxa. However, failing to model indel evolution may lead to artificially short alignments due to biased indel placement that is inconsistent with the phylogenetic relationships.
RESULTS: Recently, the classical indel model TKF91 was modified to describe indel evolution on a phylogeny via a Poisson process, termed PIP. PIP allows the joint marginal probability of an MSA and a tree to be computed in linear time. We present a new dynamic programming algorithm that aligns two MSAs, each represented by its underlying homology paths, by full maximum likelihood under PIP in polynomial time, and we apply it progressively along a guide tree. We have corroborated the correctness of our method by simulation and compared it with competitive methods on an illustrative real dataset.
CONCLUSIONS: Our MSA method is the first polynomial-time progressive aligner with a rigorous mathematical formulation of indel evolution. The new method infers phylogenetically meaningful gap patterns as an alternative to the popular PRANK, while producing alignments of similar length. Moreover, the inferred gap patterns agree with what was predicted qualitatively by previous studies. The algorithm is implemented in a standalone C++ program: https://github.com/acg-team/ProPIP. Supplementary data are available at BMC Bioinformatics online.
journal_name: BMC Bioinformatics
journal_title: BMC bioinformatics
authors: Maiolo M, Zhang X, Gil M, Anisimova M
doi: 10.1186/s12859-018-2357-1
subject: Has Abstract
pub_date: 2018-09-21 00:00:00
pages: 331
issue: 1
issn: 1471-2105
pii: 10.1186/s12859-018-2357-1
journal_volume: 19
pub_type: Journal article
abstract:BACKGROUND:Heritability of a phenotypic or molecular trait measures the proportion of variance that is attributable to genotypic variance. It is an important concept in breeding and genetics. Few methods are available for calculating heritability for traits derived from high-throughput sequencing. RESULTS:We propose s...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-017-1539-6
update_date: 2017-03-02 00:00:00
abstract:BACKGROUND:Internal ribosomal entry sites (IRESs) provide alternative, cap-independent translation initiation sites in eukaryotic cells. IRES elements are important factors in viral genomes and are also useful tools for bi-cistronic expression vectors. Most existing RNA structure prediction programs are unable to deal ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-10-160
update_date: 2009-05-27 00:00:00
abstract:BACKGROUND:The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appea...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-4-41
update_date: 2003-09-11 00:00:00
abstract:Selected reaction monitoring (SRM)-based proteomics approaches enable highly sensitive and reproducible assays for profiling of thousands of peptides in one experiment. The development of such assays involves the determination of retention time, detectability and fragmentation properties of peptides, followed by an op...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-13-S16-S8
update_date: 2012-01-01 00:00:00
abstract:BACKGROUND:Various computer-based methods exist for the detection and quantification of protein spots in two dimensional gel electrophoresis images. Area-based methods are commonly used for spot quantification: an area is assigned to each spot and the sum of the pixel intensities in that area, the so-called volume, is ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-15-181
update_date: 2014-06-11 00:00:00
abstract:BACKGROUND:Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. However, forward feature ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-14-5
update_date: 2013-01-16 00:00:00
abstract:BACKGROUND:Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. It remains a challenge due to the fact that the data is noisy and high dimensional, and there exists a large number of pot...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-020-03639-7
update_date: 2020-07-14 00:00:00
abstract:BACKGROUND:Designing small-molecule kinase inhibitors with desirable selectivity profiles is a major challenge in drug discovery. A high-throughput screen for inhibitors of a given kinase will typically yield many compounds that inhibit more than one kinase. A series of chemical modifications are usually required befor...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-9-491
update_date: 2008-11-25 00:00:00
abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-275
update_date: 2010-05-21 00:00:00
abstract:BACKGROUND:Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-537
update_date: 2010-10-29 00:00:00
abstract:BACKGROUND:Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of ap...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-281
update_date: 2011-07-12 00:00:00
abstract:BACKGROUND:Microbial electrosynthesis and electro fermentation are techniques that aim to optimize microbial production of chemicals and fuels by regulating the cellular redox balance via interaction with electrodes. While the concept has been known for decades, major knowledge gaps remain, which make it hard to evaluate its ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-014-0410-2
update_date: 2014-12-30 00:00:00
abstract:BACKGROUND:Metabolomics is one of the most recent omics technologies. It has been applied in fields such as food science, nutrition, drug discovery and systems biology. For this, gas chromatography-mass spectrometry (GC-MS) has been largely applied and many computational tools have been developed to support the analysis of...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-014-0374-2
update_date: 2014-12-10 00:00:00
abstract:Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-016-1357-2
update_date: 2016-12-01 00:00:00
abstract:BACKGROUND:Guanine protein-coupled receptors (GPCRs) constitute a eukaryotic transmembrane protein family and function as "molecular switches" in the second messenger cascades and are found in all organisms between yeast and humans. They form the single, biggest drug-target family due to their versatility of action and...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-S1-S3
update_date: 2011-02-15 00:00:00
abstract:BACKGROUND:Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-508
update_date: 2010-10-12 00:00:00
abstract:BACKGROUND:Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical e...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-017-1805-7
update_date: 2017-10-03 00:00:00
abstract:BACKGROUND:Circular RNAs (circRNAs) are a newly appreciated class of non-coding RNA molecules. Numerous tools have been developed for the detection of circRNAs; however, computational tools to perform downstream functional analysis of circRNAs are scarce. RESULTS:We present circRNAprofiler, an R-based computational fra...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-020-3500-3
update_date: 2020-04-29 00:00:00
abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-019-3293-4
update_date: 2019-12-27 00:00:00
abstract:BACKGROUND:Recently, the availability of high-resolution microscopy together with the advancements in the development of biomarkers as reporters of biomolecular interactions increased the importance of imaging methods in molecular cell biology. These techniques enable the investigation of cellular characteristics like ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-126
update_date: 2011-04-28 00:00:00
abstract:BACKGROUND:Infections are often associated with comorbidity that increases the risk of medical conditions which can lead to further morbidity and mortality. SARS is a threat which is similar to MERS virus, but the comorbidity is the key aspect to underline their different impacts. One UK doctor says "I'd rather have HIV ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-15-333
update_date: 2014-10-24 00:00:00
abstract:BACKGROUND:Long-range interactions between regulatory DNA elements such as enhancers, insulators and promoters play an important role in regulating transcription. As chromatin contacts have been found throughout the human genome and in different cell types, spatial transcriptional control is now viewed as a general mec...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-414
update_date: 2011-10-25 00:00:00
abstract:BACKGROUND:The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-020-3384-2
update_date: 2020-02-06 00:00:00
abstract:BACKGROUND:Microorganisms display vast diversity, and each one has its own set of genes, cell components and metabolic reactions. To assess their huge unexploited metabolic potential in different ecosystems, we need high throughput tools, such as functional microarrays, that allow the simultaneous analysis of thousands...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-478
update_date: 2010-09-23 00:00:00
abstract:BACKGROUND AND GOAL:The Random Forest (RF) algorithm for regression and classification has considerably gained popularity since its introduction in 2001. Meanwhile, it has grown to a standard classification approach competing with logistic regression in many innovation-friendly scientific fields. RESULTS:In this conte...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-018-2264-5
update_date: 2018-07-17 00:00:00
abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-230
update_date: 2011-06-08 00:00:00
abstract:BACKGROUND:High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generat...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-94
update_date: 2010-02-18 00:00:00
abstract:BACKGROUND:In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image ac...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-12-148
update_date: 2011-05-12 00:00:00
abstract:BACKGROUND:Cryo-electron microscopy (Cryo-EM) is widely used in the determination of the three-dimensional (3D) structures of macromolecules. Particle picking from 2D micrographs remains a challenging early step in the Cryo-EM pipeline due to the diversity of particle shapes and the extremely low signal-to-noise ratio ...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/s12859-020-03809-7
update_date: 2020-11-09 00:00:00
abstract:BACKGROUND:Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not a...
journal_title: BMC bioinformatics
pub_type: Journal article
doi: 10.1186/1471-2105-11-215
update_date: 2010-04-29 00:00:00