Bayesian semiparametric regression models to characterize molecular evolution.

Abstract:

BACKGROUND: Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model, with a generalization of the Dirichlet process prior on the distribution of the regression coefficients, that describes the relationship between changes in amino acid distances and natural selection in protein-coding DNA sequence alignments. RESULTS: The Bayesian semiparametric approach is illustrated with simulated data and the abalone lysin sperm data. For this particular dataset, our method identifies groups of properties that have a similar effect on evolution. The model also provides nonparametric site-specific estimates of the strength of conservation of these properties. CONCLUSIONS: The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously while taking into account that such data can be correlated. The model's multi-level clustering allows appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.
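The grouping of properties described in the abstract comes from a key feature of Dirichlet process (DP) priors: draws from a DP are discrete, so several regression coefficients can take exactly the same value. The following is a minimal sketch of that mechanism only, assuming a plain truncated stick-breaking DP with a standard normal base measure. It is not the authors' model: the paper uses a generalization of the DP inside a full hierarchical regression, neither of which is reproduced here. The function names and parameters (`stick_breaking_weights`, `sample_coefficients`, `alpha`, `truncation`) are hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)    # w_k = V_k * prod_{j<k} (1 - V_j)
        remaining *= 1.0 - v
    weights.append(remaining)  # final atom absorbs the leftover mass, so weights sum to 1
    return weights


def sample_coefficients(num_properties, alpha=1.0, truncation=20, seed=0):
    """Draw one regression coefficient per amino acid property from a truncated DP.

    Hypothetical choices: base measure G0 = Normal(0, 1) and a fixed truncation level.
    Returns the atom label assigned to each property and the coefficient values.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(0.0, 1.0) for _ in range(truncation)]  # atom locations drawn from G0
    labels = rng.choices(range(truncation), weights=weights, k=num_properties)
    betas = [atoms[k] for k in labels]  # properties sharing a label share a coefficient
    return labels, betas


if __name__ == "__main__":
    labels, betas = sample_coefficients(num_properties=10, alpha=1.0)
    groups = {}
    for prop, k in enumerate(labels):
        groups.setdefault(k, []).append(prop)
    for k, props in groups.items():
        print(f"properties {props} share coefficient {betas[props[0]]:.3f}")
```

A smaller `alpha` concentrates the weights on fewer atoms, so more properties end up tied to a common coefficient. In a real analysis these cluster assignments would be inferred from the data through the likelihood rather than sampled from the prior as done here.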

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Datta S,Rodriguez A,Prado R

doi

10.1186/1471-2105-13-278

subject

Has Abstract

pub_date

2012-10-30 00:00:00

pages

278

issn

1471-2105

pii

1471-2105-13-278

journal_volume

13

pub_type

Journal Article
  • A preliminary PET radiomics study of brain metastases using a fully automatic segmentation method.

    abstract: BACKGROUND: Positron Emission Tomography (PET) is increasingly utilized in radiomics studies for treatment evaluation purposes. Nevertheless, lesion volume identification in PET images is a critical and still challenging step in the process of radiomics, due to the low spatial resolution and high noise level of PET imag...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-020-03647-7

    authors: Stefano A,Comelli A,Bravatà V,Barone S,Daskalovski I,Savoca G,Sabini MG,Ippolito M,Russo G

    updated: 2020-09-16 00:00:00

  • OMeta: an ontology-based, data-driven metadata tracking system.

    abstract: BACKGROUND: The development of high-throughput sequencing and analysis has accelerated multi-omics studies of thousands of microbial species, metagenomes, and infectious disease pathogens. Omics studies are enabling genotype-phenotype association studies which identify genetic determinants of pathogen virulence and drug...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-018-2580-9

    authors: Singh I,Kuscuoglu M,Harkins DM,Sutton G,Fouts DE,Nelson KE

    updated: 2019-01-07 00:00:00

  • ICEKAT: an interactive online tool for calculating initial rates from continuous enzyme kinetic traces.

    abstract: BACKGROUND: Continuous enzyme kinetic assays are often used in high-throughput applications, as they allow rapid acquisition of large amounts of kinetic data and increased confidence compared to discontinuous assays. However, data analysis is often rate-limiting in high-throughput enzyme assays, as manual inspection and...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-020-3513-y

    authors: Olp MD,Kalous KS,Smith BC

    updated: 2020-05-14 00:00:00

  • Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model.

    abstract: BACKGROUND: Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and newly developed read-depth approach through ultrahigh ...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-11-539

    authors: Zhang ZD,Gerstein MB

    updated: 2010-10-31 00:00:00

  • Scuba: scalable kernel-based gene prioritization.

    abstract: BACKGROUND: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can h...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-018-2025-5

    authors: Zampieri G,Tran DV,Donini M,Navarin N,Aiolli F,Sperduti A,Valle G

    updated: 2018-01-25 00:00:00

  • libgapmis: extending short-read alignments.

    abstract: BACKGROUND: A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of ...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-14-S11-S4

    authors: Alachiotis N,Berger S,Flouri T,Pissis SP,Stamatakis A

    updated: 2013-01-01 00:00:00

  • Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes.

    abstract: BACKGROUND: In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek to find a significant difference in location parameters from zero or one for ratios thereof for each variable. However, in some studies a significant deviat...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-9-54

    authors: Frömke C,Hothorn LA,Kropf S

    updated: 2008-01-27 00:00:00

  • Meta-eQTL: a tool set for flexible eQTL meta-analysis.

    abstract: BACKGROUND: Increasing number of eQTL (Expression Quantitative Trait Loci) datasets facilitate genetics and systems biology research. Meta-analysis tools are in need to jointly analyze datasets of same or similar tissue types to improve statistical power especially in trans-eQTL mapping. Meta-analysis framework is also n...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-014-0392-0

    authors: Di Narzo AF,Cheng H,Lu J,Hao K

    updated: 2014-11-28 00:00:00

  • A multiresolution approach to automated classification of protein subcellular location images.

    abstract: BACKGROUND: Fluorescence microscopy is widely used to determine the subcellular location of proteins. Efforts to determine location on a proteome-wide basis create a need for automated methods to analyze the resulting images. Over the past ten years, the feasibility of using machine learning methods to recognize all maj...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-8-210

    authors: Chebira A,Barbotin Y,Jackson C,Merryman T,Srinivasa G,Murphy RF,Kovacević J

    updated: 2007-06-19 00:00:00

  • CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies.

    abstract: BACKGROUND: Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS: We d...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-020-03777-y

    authors: Bui VK,Wei C

    updated: 2020-10-20 00:00:00

  • SEQprocess: a modularized and customizable pipeline framework for NGS processing in R package.

    abstract: BACKGROUND: Next-Generation Sequencing (NGS) is now widely used in biomedical research for various applications. Processing of NGS data requires multiple programs and customization of the processing pipelines according to the data platforms. However, rapid progress of the NGS applications and processing methods urgentl...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-019-2676-x

    authors: Joo T,Choi JH,Lee JH,Park SE,Jeon Y,Jung SH,Woo HG

    updated: 2019-02-20 00:00:00

  • Reordering based integrative expression profiling for microarray classification.

    abstract: BACKGROUND: Current network-based microarray analysis uses the information of interactions among concerned genes/gene products, but still considers each gene expression individually. We propose an organized knowledge-supervised approach - Integrative eXpression Profiling (IXP), to improve microarray classification accur...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-13-S2-S1

    authors: Wu X,Huang H,Sonachalam M,Reinhard S,Shen J,Pandey R,Chen JY

    updated: 2012-03-13 00:00:00

  • A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    abstract: BACKGROUND: Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, ...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-016-1142-2

    authors: Thakur S,Guttman DS

    updated: 2016-06-30 00:00:00

  • Trees on networks: resolving statistical patterns of phylogenetic similarities among interacting proteins.

    abstract: BACKGROUND: Phylogenies capture the evolutionary ancestry linking extant species. Correlations and similarities among a set of species are mediated by and need to be understood in terms of the phylogenic tree. In a similar way it has been argued that biological networks also induce correlations among sets of interacting...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-11-470

    authors: Kelly WP,Stumpf MP

    updated: 2010-09-20 00:00:00

  • Rearrangement analysis of multiple bacterial genomes.

    abstract: BACKGROUND: Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-019-3293-4

    authors: Noureen M,Tada I,Kawashima T,Arita M

    updated: 2019-12-27 00:00:00

  • TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    abstract: BACKGROUND: Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now poss...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-014-0377-z

    authors: Gilly A,Etcheverry M,Madoui MA,Guy J,Quadrana L,Alberti A,Martin A,Heitkam T,Engelen S,Labadie K,Le Pen J,Wincker P,Colot V,Aury JM

    updated: 2014-11-19 00:00:00

  • Detecting transitions in protein dynamics using a recurrence quantification analysis based bootstrap method.

    abstract: BACKGROUND: Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques such as principal component analysis are used to detect these transitions in molecular dynamics simulations. In this work, we add a new m...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/s12859-017-1943-y

    authors: Karain WI

    updated: 2017-11-28 00:00:00

  • TPMS: a set of utilities for querying collections of gene trees.

    abstract: BACKGROUND: The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS: In this paper, we present TPMS (Tree Pattern-...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-14-109

    authors: Bigot T,Daubin V,Lassalle F,Perrière G

    updated: 2013-03-27 00:00:00

  • GOmotif: A web server for investigating the biological role of protein sequence motifs.

    abstract: BACKGROUND: Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time consuming. A means for helping to identify in advance the possible function of a novel motif i...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-12-379

    authors: Bristow F,He R,Van Domselaar G

    updated: 2011-09-26 00:00:00

  • Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns.

    abstract: BACKGROUND: With the advent of Next-Generation Sequencing technologies (NGS), a large amount of short read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cannot be mapped onto a r...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-15-S9-S1

    authors: Comin M,Schimd M

    updated: 2014-01-01 00:00:00

  • The EnzymeTracker: an open-source laboratory information management system for sample tracking.

    abstract: BACKGROUND: In many laboratories, researchers store experimental data on their own workstation using spreadsheets. However, this approach poses a number of problems, ranging from sharing issues to inefficient data-mining. Standard spreadsheets are also error-prone, as data do not undergo any validation process. To overc...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-13-15

    authors: Triplet T,Butler G

    updated: 2012-01-26 00:00:00

  • An improved distance measure between the expression profiles linking co-expression and co-regulation in mouse.

    abstract: BACKGROUND: Many statistical algorithms combine microarray expression data and genome sequence data to identify transcription factor binding motifs in the lower eukaryotic genomes. Finding cis-regulatory elements in higher eukaryote genomes, however, remains a challenge, as searching in the promoter regions of genes with ...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-7-44

    authors: Kim RS,Ji H,Wong WH

    updated: 2006-01-26 00:00:00

  • Integration of shot-gun proteomics and bioinformatics analysis to explore plant hormone responses.

    abstract: BACKGROUND: Multidimensional protein identification technology (MudPIT)-based shot-gun proteomics has been proven to be an effective platform for functional proteomics. In particular, the various sample preparation methods and bioinformatics tools can be integrated to improve the proteomics platform for applications lik...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-13-S15-S8

    authors: Zhang Y,Liu S,Dai SY,Yuan JS

    updated: 2012-01-01 00:00:00

  • Frnakenstein: multiple target inverse RNA folding.

    abstract: BACKGROUND: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more rece...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-13-260

    authors: Lyngsø RB,Anderson JW,Sizikova E,Badugu A,Hyland T,Hein J

    updated: 2012-10-09 00:00:00

  • A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network.

    abstract: BACKGROUND: Genetic interaction profiles are highly informative and helpful for understanding the functional linkages between genes, and therefore have been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding is rather limited to the relationship bet...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-11-343

    authors: You ZH,Yin Z,Han K,Huang DS,Zhou X

    updated: 2010-06-24 00:00:00

  • Measuring similarities between transcription factor binding sites.

    abstract: BACKGROUND: Collections of transcription factor binding profiles (Transfac, Jaspar) are essential to identify regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large scale analysis of transcription factor binding sites. RESULTS: We propose to identify and group similar profiles using tw...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-6-237

    authors: Kielbasa SM,Gonze D,Herzel H

    updated: 2005-09-28 00:00:00

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract: BACKGROUND: T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-9-S1-S19

    authors: Zhang GL,Khan AM,Srinivasan KN,Heiny A,Lee K,Kwoh CK,August JT,Brusic V

    updated: 2008-01-01 00:00:00

  • Learning by aggregating experts and filtering novices: a solution to crowdsourcing problems in bioinformatics.

    abstract: BACKGROUND: In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relying on unreliable annotations obtained from multiple sources. RESULTS: We proposed a probabilistic classification algorithm based on labe...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-14-S12-S5

    authors: Zhang P,Cao W,Obradovic Z

    updated: 2013-01-01 00:00:00

  • Constructing a meaningful evolutionary average at the phylogenetic center of mass.

    abstract: BACKGROUND: As a consequence of the evolutionary process, data collected from related species tend to be similar. This similarity by descent can obscure subtler signals in the data such as the evidence of constraint on variation due to shared selective pressures. In comparative sequence analysis, for example, sequence s...

    journal_title: BMC bioinformatics

    pub_type: Journal Article

    doi: 10.1186/1471-2105-8-222

    authors: Stone EA,Sidow A

    updated: 2007-06-26 00:00:00

  • VKCDB: voltage-gated potassium channel database.

    abstract: BACKGROUND: The family of voltage-gated potassium channels comprises a functionally diverse group of membrane proteins. They help maintain and regulate the potassium ion-based component of the membrane potential and are thus central to many critical physiological processes. VKCDB (Voltage-gated potassium [K] Channel Dat...

    journal_title: BMC bioinformatics

    pub_type: Journal Article, Review

    doi: 10.1186/1471-2105-5-3

    authors: Li B,Gallin WJ

    updated: 2004-01-09 00:00:00