Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

Abstract:

BACKGROUND: Over the last decade, next-generation sequencing (NGS) has become widely available and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full-length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA, using only the frequencies of nucleotides at each site and without reconstructing either the full-length alignment or the phylogeny.

RESULTS: We used simulations to evaluate the performance of these algorithms. Our results demonstrate that LS performs poorly because its bootstrap 95% Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95% Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values at a rate approximately equivalent to that obtained using BEAST, a program that implements Bayesian MCMC estimation of evolutionary parameters from full-length sequences. Because information is lost when sitewise nucleotide frequencies are used alone, the ABC-MCMC 95% HPDs are larger than those obtained by BEAST.

CONCLUSION: We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95% HPDs. One major advantage of ABC-MCMC is that its computational time scales linearly with the number of short-read sequences and is independent of the number of full-length sequences in the original data. This allows the analysis to be performed on NGS datasets with large numbers of short-read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.
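
As an illustration of the kind of summary the abstract refers to, the following minimal Python sketch computes sitewise nucleotide proportions from short reads and compares two such profiles. It is not the authors' implementation (that is in the SF-ABC repository). The read representation as (start, sequence) pairs, the function names `sitewise_frequencies` and `profile_distance`, and the sum-of-squares distance are all assumptions made for this example.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def sitewise_frequencies(reads, length):
    """Return one {nucleotide: proportion} dict per site.

    reads: iterable of (start, sequence) pairs, where start is the 0-based
    position of the read's first base (hypothetical input format).
    Sites with no coverage are reported as None.
    """
    counts = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            # Ignore bases outside the region and ambiguous characters.
            if 0 <= pos < length and base in NUCLEOTIDES:
                counts[pos][base] += 1
    profile = []
    for site in counts:
        depth = sum(site.values())
        if depth == 0:
            profile.append(None)
        else:
            profile.append({b: site[b] / depth for b in NUCLEOTIDES})
    return profile


def profile_distance(p, q):
    """Sum of squared differences over sites covered in both profiles.

    The choice of distance is an assumption; the abstract does not state one.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a is None or b is None:
            continue
        total += sum((a[n] - b[n]) ** 2 for n in NUCLEOTIDES)
    return total


# Four short reads covering a region of six sites.
reads = [(0, "ACGT"), (2, "GTTA"), (1, "CGTT"), (0, "AAGT")]
observed = sitewise_frequencies(reads, 6)
```

Because each read is visited once and each base contributes a single counter update, the cost of building the profile grows linearly with the number of reads, which is consistent with the scaling property described in the abstract. In a simulation-based method, a distance such as `profile_distance` could be used to compare the observed profile with one computed from simulated data.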

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Wu SH,Rodrigo AG

doi

10.1186/s12859-015-0810-y

subject

Has Abstract

pub_date

2015-11-04 00:00:00

pages

357

issn

1471-2105

pii

10.1186/s12859-015-0810-y

journal_volume

16

pub_type

Journal Article
  • ConEVA: a toolbox for comprehensive assessment of protein contacts.

    abstract:BACKGROUND:In recent years, successful contact prediction methods and contact-guided ab initio protein structure prediction methods have highlighted the importance of incorporating contact information into protein structure prediction methods. It is also observed that for almost all globular proteins, the quality of co...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1404-z

    authors: Adhikari B,Nowotny J,Bhattacharya D,Hou J,Cheng J

    update_date:2016-12-07 00:00:00

  • metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences.

    abstract:Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate E...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S5-S2

    authors: Ander C,Schulz-Trieglaff OB,Stoye J,Cox AJ

    update_date:2013-01-01 00:00:00

  • Restricted DCJ-indel model: sorting linear genomes with DCJ and indels.

    abstract:BACKGROUND:The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S19-S14

    authors: da Silva PH,Machado R,Dantas S,Braga MD

    update_date:2012-01-01 00:00:00

  • Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida.

    abstract:BACKGROUND:Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to envi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S7-S7

    authors: Pirooznia M,Gong P,Guan X,Inouye LS,Yang K,Perkins EJ,Deng Y

    update_date:2007-11-01 00:00:00

  • Pripper: prediction of caspase cleavage sites from whole proteomes.

    abstract:BACKGROUND:Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to pre...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-320

    authors: Piippo M,Lietzén N,Nevalainen OS,Salmi J,Nyman TA

    update_date:2010-06-15 00:00:00

  • "METAGENOTE: a simplified web platform for metadata annotation of genomic samples and streamlined submission to NCBI's sequence read archive".

    abstract:BACKGROUND:The improvements in genomics methods coupled with readily accessible high-throughput sequencing have contributed to our understanding of microbial species, metagenomes, infectious diseases and more. To maximize the impact of these genomics studies, it is important that data from biological samples will becom...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03694-0

    authors: Quiñones M,Liou DT,Shyu C,Kim W,Vujkovic-Cvijin I,Belkaid Y,Hurt DE

    update_date:2020-09-03 00:00:00

  • Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    abstract:BACKGROUND:Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain datab...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-4-16

    authors: Truong K,Ikura M

    update_date:2003-05-06 00:00:00

  • Components of the antigen processing and presentation pathway revealed by gene expression microarray analysis following B cell antigen receptor (BCR) stimulation.

    abstract:BACKGROUND:Activation of naïve B lymphocytes by extracellular ligands, e.g. antigen, lipopolysaccharide (LPS) and CD40 ligand, induces a combination of common and ligand-specific phenotypic changes through complex signal transduction pathways. For example, although all three of these ligands induce proliferation, only ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-237

    authors: Lee JA,Sinkovits RS,Mock D,Rab EL,Cai J,Yang P,Saunders B,Hsueh RC,Choi S,Subramaniam S,Scheuermann RH,Alliance for Cellular Signaling.

    update_date:2006-05-02 00:00:00

  • Simultaneous phylogeny reconstruction and multiple sequence alignment.

    abstract:BACKGROUND:A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-S1-S11

    authors: Yue F,Shi J,Tang J

    update_date:2009-01-30 00:00:00

  • Directed acyclic graph kernels for structural RNA analysis.

    abstract:BACKGROUND:Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between tw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-318

    authors: Sato K,Mituyama T,Asai K,Sakakibara Y

    update_date:2008-07-22 00:00:00

  • CellSim: a novel software to calculate cell similarity and identify their co-regulation networks.

    abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2699-3

    authors: Li L,Che D,Wang X,Zhang P,Rahman SU,Zhao J,Yu J,Tao S,Lu H,Liao M

    update_date:2019-03-04 00:00:00

  • Learning by aggregating experts and filtering novices: a solution to crowdsourcing problems in bioinformatics.

    abstract:BACKGROUND:In many biomedical applications, there is a need for developing classification models based on noisy annotations. Recently, various methods addressed this scenario by relaying on unreliable annotations obtained from multiple sources. RESULTS:We proposed a probabilistic classification algorithm based on labe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S12-S5

    authors: Zhang P,Cao W,Obradovic Z

    update_date:2013-01-01 00:00:00

  • A new pooling strategy for high-throughput screening: the Shifted Transversal Design.

    abstract:BACKGROUND:In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplicat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-28

    authors: Thierry-Mieg N

    update_date:2006-01-19 00:00:00

  • MOSBIE: a tool for comparison and analysis of rule-based biochemical models.

    abstract:BACKGROUND:Mechanistic models that describe the dynamical behaviors of biochemical systems are common in computational systems biology, especially in the realm of cellular signaling. The development of families of such models, either by a single research group or by different groups working within the same area, presen...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-316

    authors: Wenskovitch JE Jr,Harris LA,Tapia JJ,Faeder JR,Marai GE

    update_date:2014-09-25 00:00:00

  • AT excursion: a new approach to predict replication origins in viral genomes by locating AT-rich regions.

    abstract:BACKGROUND:Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for a particular kind o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-163

    authors: Chew DS,Leung MY,Choi KP

    update_date:2007-05-21 00:00:00

  • HMM Logos for visualization of protein families.

    abstract:BACKGROUND:Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research. Up to now, however, there exists no method to visualize all of their central aspects graphically in an intuitively understandable way. RESULTS:We present a visualization method that incorporates both emission and transi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-7

    authors: Schuster-Böckler B,Schultz J,Rahmann S

    update_date:2004-01-21 00:00:00

  • Predicting blood pressure from physiological index data using the SVR algorithm.

    abstract:BACKGROUND:Blood pressure diseases have increasingly been identified as among the main factors threatening human health. How to accurately and conveniently measure blood pressure is the key to the implementation of effective prevention and control measures for blood pressure diseases. Traditional blood pressure measure...

    journal_title:BMC bioinformatics

    pub_type: Clinical Trial, Journal Article

    doi:10.1186/s12859-019-2667-y

    authors: Zhang B,Ren H,Huang G,Cheng Y,Hu C

    update_date:2019-02-28 00:00:00

  • Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data.

    abstract:BACKGROUND:Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published da...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-203

    authors: Zhang Y,Xuan J,de los Reyes BG,Clarke R,Ressom HW

    update_date:2008-04-21 00:00:00

  • Partition-based optimization model for generative anatomy modeling language (POM-GAML).

    abstract:BACKGROUND:This paper presents a novel approach for Generative Anatomy Modeling Language (GAML). This approach automatically detects the geometric partitions in 3D anatomy that in turn speeds up integrated non-linear optimization model in GAML for 3D anatomy modeling with constraints (e.g. joints). This integrated non-...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2626-7

    authors: Demirel D,Cetinsaya B,Halic T,Kockara S,Ahmadi S

    update_date:2019-03-14 00:00:00

  • Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests.

    abstract:BACKGROUND:To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is compu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-324

    authors: Abby SS,Tannier E,Gouy M,Daubin V

    update_date:2010-06-15 00:00:00

  • Pushing the accuracy limit of shape complementarity for protein-protein docking.

    abstract:BACKGROUND:Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open que...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3270-y

    authors: Yan Y,Huang SY

    update_date:2019-12-24 00:00:00

  • Conservation of regulatory elements between two species of Drosophila.

    abstract:BACKGROUND:One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-4-57

    authors: Emberly E,Rajewsky N,Siggia ED

    update_date:2003-11-20 00:00:00

  • Recursive model for dose-time responses in pharmacological studies.

    abstract:BACKGROUND:Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2831-4

    authors: Dhruba SR,Rahman A,Rahman R,Ghosh S,Pal R

    update_date:2019-06-20 00:00:00

  • TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

    abstract:BACKGROUND:Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1419-5

    authors: Cumbo F,Fiscon G,Ceri S,Masseroli M,Weitschek E

    update_date:2017-01-03 00:00:00

  • Discovering functional interaction patterns in protein-protein interaction networks.

    abstract:BACKGROUND:In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-276

    authors: Turanalp ME,Can T

    update_date:2008-06-11 00:00:00

  • Accelerating a cross-correlation score function to search modifications using a single GPU.

    abstract:BACKGROUND:A cross-correlation (XCorr) score function is one of the most popular score functions utilized to search peptide identifications in databases, and many computer programs, such as SEQUEST, Comet, and Tide, currently use this score function. Recently, the HiXCorr algorithm was developed to speed up this score ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2559-6

    authors: Kim H,Han S,Um JH,Park K

    update_date:2018-12-12 00:00:00

  • Simulating autosomal genotypes with realistic linkage disequilibrium and a spiked-in genetic effect.

    abstract:BACKGROUND:To evaluate statistical methods for genome-wide genetic analyses, one needs to be able to simulate realistic genotypes. We here describe a method, applicable to a broad range of association study designs, that can simulate autosome-wide single-nucleotide polymorphism data with realistic linkage disequilibriu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-2004-2

    authors: Shi M,Umbach DM,Wise AS,Weinberg CR

    update_date:2018-01-02 00:00:00

  • Compromise or optimize? The breakpoint anti-median.

    abstract:BACKGROUND:The median of k≥3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. "Near-medi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1340-y

    authors: Larlee CA,Brandts A,Sankoff D

    update_date:2016-12-15 00:00:00

  • PoGO: Prediction of Gene Ontology terms for fungal proteins.

    abstract:BACKGROUND:Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained only to one species, are not a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-215

    authors: Jung J,Yi G,Sukno SA,Thon MR

    update_date:2010-04-29 00:00:00

  • proTRAC--a software for probabilistic piRNA cluster detection, visualization and analysis.

    abstract:BACKGROUND:Throughout the metazoan lineage, typically gonadal expressed Piwi proteins and their guiding piRNAs (~26-32nt in length) form a protective mechanism of RNA interference directed against the propagation of transposable elements (TEs). Most piRNAs are generated from genomic piRNA clusters. Annotation of experi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-5

    authors: Rosenkranz D,Zischler H

    update_date:2012-01-10 00:00:00