An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees.

Abstract:

BACKGROUND:Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify "sufficiently large". RESULTS:Using the multispecies coalescent model, we report a general analytical upper bound on the number of gene trees n required such that with probability q, each bipartition of a species tree is represented at least once in a set of n random gene trees. This bound employs a formula that is straightforward to compute, depends only on the minimum internal branch length of the species tree and the number of taxa, and applies irrespective of the species tree topology. Using simulations, we investigate numerical properties of the bound as well as its accuracy under the multispecies coalescent. CONCLUSIONS:Our results are helpful for conservatively bounding the number of gene trees required by the ASTRAL inference method, and the approach has potential to be extended to bound other properties of gene tree sets under the model.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Uricchio LH,Warnow T,Rosenberg NA

doi

10.1186/s12859-016-1266-4

subject

Has Abstract

pub_date

2016-11-11 00:00:00

pages

417

issue

Suppl 14

issn

1471-2105

pii

10.1186/s12859-016-1266-4

journal_volume

17

pub_type

杂志文章
  • AnyExpress: integrated toolkit for analysis of cross-platform gene expression data using a fast interval matching algorithm.

    abstract:BACKGROUND:Cross-platform analysis of gene express data requires multiple, intricate processes at different layers with various platforms. However, existing tools handle only a single platform and are not flexible enough to support custom changes, which arise from the new statistical methods, updated versions of refere...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-75

    authors: Kim J,Patel K,Jung H,Kuo WP,Ohno-Machado L

    更新日期:2011-03-17 00:00:00

  • A novel similarity-measure for the analysis of genetic data in complex phenotypes.

    abstract:BACKGROUND:Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally as fast. In particular, Machin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S6-S24

    authors: Lagani V,Montesanto A,Di Cianni F,Moreno V,Landi S,Conforti D,Rose G,Passarino G

    更新日期:2009-06-16 00:00:00

  • Modeling, validation and verification of three-dimensional cell-scaffold contacts from terabyte-sized images.

    abstract:BACKGROUND:Cell-scaffold contact measurements are derived from pairs of co-registered volumetric fluorescent confocal laser scanning microscopy (CLSM) images (z-stacks) of stained cells and three types of scaffolds (i.e., spun coat, large microfiber, and medium microfiber). Our analysis of the acquired terabyte-sized c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1928-x

    authors: Bajcsy P,Yoon S,Florczyk SJ,Hotaling NA,Simon M,Szczypinski PM,Schaub NJ,Simon CG Jr,Brady M,Sriram RD

    更新日期:2017-11-28 00:00:00

  • Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

    abstract:BACKGROUND:The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. RESULTS:...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2103-8

    authors: Müller HM,Van Auken KM,Li Y,Sternberg PW

    更新日期:2018-03-09 00:00:00

  • In silico design of targeted SRM-based experiments.

    abstract::Selected reaction monitoring (SRM)-based proteomics approaches enable highly sensitive and reproducible assays for profiling of thousands of peptides in one experiment. The development of such assays involves the determination of retention time, detectability and fragmentation properties of peptides, followed by an op...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S16-S8

    authors: Nahnsen S,Kohlbacher O

    更新日期:2012-01-01 00:00:00

  • GLOSSI: a method to assess the association of genetic loci-sets with complex diseases.

    abstract:BACKGROUND:The developments of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNP) have the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-102

    authors: Chai HS,Sicotte H,Bailey KR,Turner ST,Asmann YW,Kocher JP

    更新日期:2009-04-03 00:00:00

  • A machine learning strategy for predicting localization of post-translational modification sites in protein-protein interacting regions.

    abstract:BACKGROUND:One very important functional domain of proteins is the protein-protein interacting region (PPIR), which forms the binding interface between interacting polypeptide chains. Post-translational modifications (PTMs) that occur in the PPIR can either interfere with or facilitate the interaction between proteins....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1165-8

    authors: Saethang T,Payne DM,Avihingsanon Y,Pisitkun T

    更新日期:2016-08-17 00:00:00

  • Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

    abstract:BACKGROUND:Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. Th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-99

    authors: Vlasblom J,Wodak SJ

    更新日期:2009-03-30 00:00:00

  • ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

    abstract:BACKGROUND:Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Meth...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-61

    authors: Hajiloo M,Sapkota Y,Mackey JR,Robson P,Greiner R,Damaraju S

    更新日期:2013-02-22 00:00:00

  • Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets.

    abstract:BACKGROUND:Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strateg...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S10-S19

    authors: Roy S,Heinrich K,Phan V,Berry MW,Homayouni R

    更新日期:2011-10-18 00:00:00

  • OmniMapFree: a unified tool to visualise and explore sequenced genomes.

    abstract:UNLABELLED: BACKGROUND:Acquiring and exploring whole genome sequence information for a species under investigation is now a routine experimental approach. On most genome browsers, typically, only the DNA sequence, EST support, motif search results, and GO annotations are displayed. However, for many species, a growing...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-447

    authors: Antoniw J,Beacham AM,Baldwin TK,Urban M,Rudd JJ,Hammond-Kosack KE

    更新日期:2011-11-15 00:00:00

  • TPMS: a set of utilities for querying collections of gene trees.

    abstract:BACKGROUND:The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS:In this paper, we present TPMS (Tree Pattern-...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-109

    authors: Bigot T,Daubin V,Lassalle F,Perrière G

    更新日期:2013-03-27 00:00:00

  • BioIMAX: a Web 2.0 approach for easy exploratory and collaborative access to multivariate bioimage data.

    abstract:BACKGROUND:Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or f...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-297

    authors: Loyek C,Rajpoot NM,Khan M,Nattkemper TW

    更新日期:2011-07-21 00:00:00

  • Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns.

    abstract:BACKGROUND:It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enou...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-226

    authors: Meyer F,Kurtz S,Beckstette M

    更新日期:2013-07-17 00:00:00

  • A genetic approach for building different alphabets for peptide and protein classification.

    abstract:BACKGROUND:In this paper, it is proposed an optimization approach for producing reduced alphabets for peptide classification, using a Genetic Algorithm. The classification task is performed by a multi-classifier system where each classifier (Linear or Radial Basis function Support Vector Machines) is trained using feat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-45

    authors: Nanni L,Lumini A

    更新日期:2008-01-24 00:00:00

  • Identification of common coexpression modules based on quantitative network comparison.

    abstract:BACKGROUND:Finding common molecular interactions from different samples is essential work to understanding diseases and other biological processes. Coexpression networks and their modules directly reflect sample-specific interactions among genes. Therefore, identification of common coexpression network or modules may r...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2193-3

    authors: Jo Y,Kim S,Lee D

    更新日期:2018-06-13 00:00:00

  • Improving interoperability between microbial information and sequence databases.

    abstract:BACKGROUND:Biological resources are essential tools for biomedical research. Their availability is promoted through on-line catalogues. Common Access to Biological Resources and Information (CABRI) is a service for distribution of biological resources and related data collected by 28 European culture collections. Linki...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S23

    authors: Romano P,Dawyndt P,Piersigilli F,Swings J

    更新日期:2005-12-01 00:00:00

  • New mini- zincin structures provide a minimal scaffold for members of this metallopeptidase superfamily.

    abstract:BACKGROUND:The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-1

    authors: Trame CB,Chang Y,Axelrod HL,Eberhardt RY,Coggill P,Punta M,Rawlings ND

    更新日期:2014-01-03 00:00:00

  • Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots.

    abstract:BACKGROUND:Analyses of molecular high-throughput data often lack in robustness, i.e. results are very sensitive to the addition or removal of a single observation. Therefore, the identification of extreme observations is an important step of quality control before doing further data analysis. Standard outlier detection...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1645-5

    authors: Kruppa J,Jung K

    更新日期:2017-05-02 00:00:00

  • Inferring latent task structure for Multitask Learning by Multiple Kernel Learning.

    abstract:BACKGROUND:The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S8-S5

    authors: Widmer C,Toussaint NC,Altun Y,Rätsch G

    更新日期:2010-10-26 00:00:00

  • Parameterizing sequence alignment with an explicit evolutionary model.

    abstract:BACKGROUND:Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not b...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0832-5

    authors: Rivas E,Eddy SR

    更新日期:2015-12-10 00:00:00

  • Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure.

    abstract:BACKGROUND:A professional recognition mechanism is required to encourage expedited publishing of an adequate volume of 'fit-for-use' biodiversity data. As a component of such a recognition mechanism, we propose the development of the Data Usage Index (DUI) to demonstrate to data publishers that their efforts of creatin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S15-S3

    authors: Ingwersen P,Chavan V

    更新日期:2011-01-01 00:00:00

  • Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

    abstract:BACKGROUND:During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-178

    authors: Nilsson RH,Kristiansson E,Ryberg M,Larsson KH

    更新日期:2005-07-18 00:00:00

  • Amino acid sequence associated with bacteriophage recombination site helps to reveal genes potentially acquired through horizontal gene transfer.

    abstract:BACKGROUND:Horizontal gene transfer, i.e. the acquisition of genetic material from nonparent organism, is considered an important force driving species evolution. Many cases of horizontal gene transfer from prokaryotes to eukaryotes have been registered, but no transfer mechanism has been deciphered so far, although vi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03599-y

    authors: Daugavet MA,Shabelnikov SV,Podgornaya OI

    更新日期:2020-07-24 00:00:00

  • Using expression arrays for copy number detection: an example from E. coli.

    abstract:BACKGROUND:The sequencing of many genomes and tiling arrays consisting of millions of DNA segments spanning entire genomes have made high-resolution copy number analysis possible. Microarray-based comparative genomic hybridization (array CGH) has enabled the high-resolution detection of DNA copy number aberrations. Whi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-203

    authors: Skvortsov D,Abdueva D,Stitzer ME,Finkel SE,Tavaré S

    更新日期:2007-06-14 00:00:00

  • Sparse multiple co-Inertia analysis with application to integrative analysis of multi -Omics data.

    abstract:BACKGROUND:Multiple co-inertia analysis (mCIA) is a multivariate analysis method that can assess relationships and trends in multiple datasets. Recently it has been used for integrative analysis of multiple high-dimensional -omics datasets. However, its estimated loading vectors are non-sparse, which presents challenge...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3455-4

    authors: Min EJ,Long Q

    更新日期:2020-04-15 00:00:00

  • Non-coding RNA detection methods combined to improve usability, reproducibility and precision.

    abstract:BACKGROUND:Non-coding RNAs gain more attention as their diverse roles in many cellular processes are discovered. At the same time, the need for efficient computational prediction of ncRNAs increases with the pace of sequencing technology. Existing tools are based on various approaches and techniques, but none of them p...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-491

    authors: Raasch P,Schmitz U,Patenge N,Vera J,Kreikemeyer B,Wolkenhauer O

    更新日期:2010-09-29 00:00:00

  • Structural analysis on mutation residues and interfacial water molecules for human TIM disease understanding.

    abstract:BACKGROUND:Human triosephosphate isomerase (HsTIM) deficiency is a genetic disease caused often by the pathogenic mutation E104D. This mutation, located at the side of an abnormally large cluster of water in the inter-subunit interface, reduces the thermostability of the enzyme. Why and how these water molecules are di...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-S16-S11

    authors: Li Z,He Y,Liu Q,Zhao L,Wong L,Kwoh CK,Nguyen H,Li J

    更新日期:2013-01-01 00:00:00

  • The PowerAtlas: a power and sample size atlas for microarray experimental design and research.

    abstract:BACKGROUND:Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips need...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-84

    authors: Page GP,Edwards JW,Gadbury GL,Yelisetti P,Wang J,Trivedi P,Allison DB

    更新日期:2006-02-22 00:00:00

  • Robust structure measures of metabolic networks that predict prokaryotic optimal growth temperature.

    abstract:BACKGROUND:Metabolic networks reflect the relationships between metabolites (biomolecules) and the enzymes (proteins), and are of particular interest since they describe all chemical reactions of an organism. The metabolic networks are constructed from the genome sequence of an organism, and the graphs can be used to s...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3112-y

    authors: Weber Zendrera A,Sokolovska N,Soula HA

    更新日期:2019-10-15 00:00:00