Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees.

Abstract:

Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm, DLCpar, for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.
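The LCA mapping mentioned in the abstract can be illustrated with a minimal sketch. This is the classic duplication-loss LCA reconciliation only, not the DLCpar algorithm: it does not model ILS and does not build a labeled coalescent tree. The tree encoding (nested tuples with string leaves), the gene naming convention `species_id`, and all function names are assumptions made for illustration.

```python
# Minimal sketch of LCA mapping between a gene tree and a species tree.
# Assumptions (not from the paper): trees are nested 2-tuples, leaves are
# strings, and a gene leaf "A_1" belongs to species "A".

def leaves(tree):
    """Return the set of leaf names under a tree node."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)

def lca(species_tree, targets):
    """Return the smallest species subtree containing every target species."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if targets <= leaves(child):
                node = child  # all targets lie below this child; descend
                break
        else:
            return node  # targets are split across children: this is the LCA
    return node

def lca_reconcile(gene_tree, species_tree, species_of):
    """Map each internal gene node to a species node and label it.

    A node is labeled "dup" when it maps to the same species node as one
    of its children, and "spec" otherwise (standard LCA rule).
    """
    events = []

    def visit(g):
        if isinstance(g, str):
            return lca(species_tree, {species_of(g)})
        child_maps = [visit(c) for c in g]
        covered = set().union(*(leaves(m) for m in child_maps))
        here = lca(species_tree, covered)
        events.append((g, here, "dup" if here in child_maps else "spec"))
        return here

    visit(gene_tree)
    return events

species_tree = (("A", "B"), "C")
gene_tree = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
events = lca_reconcile(gene_tree, species_tree, lambda g: g.split("_")[0])
kinds = [kind for _, _, kind in events]
```

In this toy example the node joining the two (A, B) gene clades maps to the same species node as both of its children, so it is the only node labeled as a duplication.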

journal_name

Genome Res

journal_title

Genome research

authors

Wu YC,Rasmussen MD,Bansal MS,Kellis M

doi

10.1101/gr.161968.113

subject

Has Abstract

pub_date

2014-03-01 00:00:00

pages

475-86

issue

3

eissn

1549-5469

issn

1088-9051

pii

gr.161968.113

journal_volume

24

pub_type

Journal Article
  • Polycomb preferentially targets stalled promoters of coding and noncoding transcripts.

    abstract::The Polycomb group (PcG) and Trithorax group (TrxG) of proteins are required for stable and heritable maintenance of repressed and active gene expression states. Their antagonistic function on gene control, repression for PcG and activity for TrxG, is mediated by binding to chromatin and subsequent epigenetic modifica...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.114348.110

    authors: Enderle D,Beisel C,Stadler MB,Gerstung M,Athri P,Paro R

    updated:2011-02-01 00:00:00

  • A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast.

    abstract::It is widely accepted that newly arisen duplicate gene pairs experience an altered selective regime that is often manifested as an increase in the rate of protein sequence evolution. Many details about the nature of the rate acceleration remain unknown, however, including its typical magnitude and duration, and whethe...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6341207

    authors: Scannell DR,Wolfe KH

    updated:2008-01-01 00:00:00

  • The clustering of functionally related genes contributes to CNV-mediated disease.

    abstract::Clusters of functionally related genes can be disrupted by a single copy number variant (CNV). We demonstrate that the simultaneous disruption of multiple functionally related genes is a frequent and significant characteristic of de novo CNVs in patients with developmental disorders (P = 1 × 10(-3)). Using three diffe...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.184325.114

    authors: Andrews T,Honti F,Pfundt R,de Leeuw N,Hehir-Kwa J,Vulto-van Silfhout A,de Vries B,Webber C

    updated:2015-06-01 00:00:00

  • Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis.

    abstract::Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3176505

    authors: Ayele M,Haas BJ,Kumar N,Wu H,Xiao Y,Van Aken S,Utterback TR,Wortman JR,White OR,Town CD

    updated:2005-04-01 00:00:00

  • A nuclear matrix attachment site in the 4q35 locus has an enhancer-blocking activity in vivo: implications for the facio-scapulo-humeral dystrophy.

    abstract::Facio-scapulo-humeral dystrophy (FSHD), a muscular hereditary disease with a prevalence of 1 in 20,000, is caused by a partial deletion of a subtelomeric repeat array on chromosome 4q. Earlier, we demonstrated the existence in the vicinity of the D4Z4 repeat of a nuclear matrix attachment site, FR-MAR, efficient in no...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6620908

    authors: Petrov A,Allinne J,Pirozhkova I,Laoudj D,Lipinski M,Vassetzky YS

    updated:2008-01-01 00:00:00

  • How does replication-associated mutational pressure influence amino acid composition of proteins?

    abstract::We have performed detrended DNA walks on whole prokaryotic genomes, on noncoding sequences and, separately, on each position in codons of coding sequences. Our method enables us to distinguish between the mutational pressure associated with replication and the mutational pressure associated with transcription and othe...

    journal_title:Genome research

    pub_type: Journal Article

    doi:

    authors: Mackiewicz P,Gierlik A,Kowalczuk M,Dudek MR,Cebrat S

    updated:1999-05-01 00:00:00

  • Estimating coarse gene network structure from large-scale gene perturbation data.

    abstract::Large scale gene perturbation experiments generate information about the number of genes whose activity is directly or indirectly affected by a gene perturbation. From this information, one can numerically estimate coarse structural network features such as the total number of direct regulatory interactions and the nu...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.193902

    authors: Wagner A

    updated:2002-02-01 00:00:00

  • Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

    abstract::By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by m...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4039406

    authors: Kimura K,Wakamatsu A,Suzuki Y,Ota T,Nishikawa T,Yamashita R,Yamamoto J,Sekine M,Tsuritani K,Wakaguri H,Ishii S,Sugiyama T,Saito K,Isono Y,Irie R,Kushida N,Yoneyama T,Otsuka R,Kanda K,Yokoi T,Kondo H,Wagatsuma M

    updated:2006-01-01 00:00:00

  • Rate of elongation by RNA polymerase II is associated with specific gene features and epigenetic modifications.

    abstract::The rate of transcription elongation plays an important role in the timing of expression of full-length transcripts as well as in the regulation of alternative splicing. In this study, we coupled Bru-seq technology with 5,6-dichlorobenzimidazole 1-β-D-ribofuranoside (DRB) to estimate the elongation rates of over 2000 ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.171405.113

    authors: Veloso A,Kirkconnell KS,Magnuson B,Biewen B,Paulsen MT,Wilson TE,Ljungman M

    updated:2014-06-01 00:00:00

  • Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes.

    abstract::Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mut...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.239186.118

    authors: Maher GJ,Ralph HK,Ding Z,Koelling N,Mlcochova H,Giannoulatou E,Dhami P,Paul DS,Stricker SH,Beck S,McVean G,Wilkie AOM,Goriely A

    updated:2018-12-01 00:00:00

  • Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations.

    abstract::Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of in...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.239822.118

    authors: Durand É,Gagnon-Arsenault I,Hallin J,Hatin I,Dubé AK,Nielly-Thibault L,Namy O,Landry CR

    updated:2019-06-01 00:00:00

  • Mutation detection using mass spectrometric separation of tiny oligonucleotide fragments.

    abstract::A DNA mutation detection protocol able to identify and characterize a previously unknown change in a given sequence in a rapid, efficient, sensitive, and inexpensive manner is required to take advantage of the resources now available to researchers through the genome sequencing projects. We have developed a method bas...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.gr-1578r

    authors: Elso C,Toohey B,Reid GE,Poetter K,Simpson RJ,Foote SJ

    updated:2002-09-01 00:00:00

  • Background-suppressed live visualization of genomic loci with an improved CRISPR system based on a split fluorophore.

    abstract::The higher-order structural organization and dynamics of the chromosomes play a central role in gene regulation. To explore this structure-function relationship, it is necessary to directly visualize genomic elements in living cells. Genome imaging based on the CRISPR system is a powerful approach but has limited appl...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.260018.119

    authors: Chaudhary N,Nho SH,Cho H,Gantumur N,Ra JS,Myung K,Kim H

    updated:2020-09-01 00:00:00

  • Copy-number-aware differential analysis of quantitative DNA sequencing data.

    abstract::Developments in microarray and high-throughput sequencing (HTS) technologies have resulted in a rapid expansion of research into epigenomic changes that occur in normal development and in the progression of disease, such as cancer. Not surprisingly, copy number variation (CNV) has a direct effect on HTS read densities...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.139055.112

    authors: Robinson MD,Strbenac D,Stirzaker C,Statham AL,Song J,Speed TP,Clark SJ

    updated:2012-12-01 00:00:00

  • Signatures of domain shuffling in the human genome.

    abstract::To elucidate the role of exon shuffling in shaping the complexity of the human genome/proteome, we have systematically analyzed intron phase distributions in the coding sequence of human protein domains. We found that introns at the boundaries of domains show high excess of symmetrical phase combinations (i.e., 0-0, 1...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.520702

    authors: Kaessmann H,Zöllner S,Nekrutenko A,Li WH

    updated:2002-11-01 00:00:00

  • Exo-proofreading, a versatile SNP scoring technology.

    abstract::We report the validation of a new assay for typing single nucleotide polymorphisms (SNPs) that takes advantage of the 3'-to-5' exonuclease proofreading activity of many DNA polymerases. The assay uses one or more primers labeled on the 3' nucleotide base, and can be implemented in a variety of formats including a one-...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.939903

    authors: Cahill P,Bakis M,Hurley J,Kamath V,Nielsen W,Weymouth D,Dupuis J,Doucette-Stamm L,Smith DR

    updated:2003-05-01 00:00:00

  • Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context.

    abstract::Eukaryotic translation initiation involves preinitiation ribosomal complex 5'-to-3' directional probing of mRNA for codons suitable for starting protein synthesis. The recognition of codons as starts depends on the codon identity and on its immediate nucleotide context known as Kozak context. When the context is weak ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257352.119

    authors: Benitez-Cantos MS,Yordanova MM,O'Connor PBF,Zhdanov AV,Kovalchuk SI,Papkovsky DB,Andreev DE,Baranov PV

    updated:2020-07-01 00:00:00

  • Evidence for widespread subfunctionalization of splice forms in vertebrate genomes.

    abstract::Gene duplication and alternative splicing are important sources of proteomic diversity. Despite research indicating that gene duplication and alternative splicing are negatively correlated, the evolutionary relationship between the two remains unclear. One manner in which alternative splicing and gene duplication may ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.184473.114

    authors: Lambert MJ,Cochran WO,Wilde BM,Olsen KG,Cooper CD

    updated:2015-05-01 00:00:00

  • Widespread somatic L1 retrotransposition occurs early during gastrointestinal cancer evolution.

    abstract::Somatic L1 retrotransposition events have been shown to occur in epithelial cancers. Here, we attempted to determine how early somatic L1 insertions occurred during the development of gastrointestinal (GI) cancers. Using L1-targeted resequencing (L1-seq), we studied different stages of four colorectal cancers arising ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.196238.115

    authors: Ewing AD,Gacita A,Wood LD,Ma F,Xing D,Kim MS,Manda SS,Abril G,Pereira G,Makohon-Moore A,Looijenga LH,Gillis AJ,Hruban RH,Anders RA,Romans KE,Pandey A,Iacobuzio-Donahue CA,Vogelstein B,Kinzler KW,Kazazian HH Jr,Sol

    updated:2015-10-01 00:00:00

  • WebLogo: a sequence logo generator.

    abstract::WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Ea...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.849004

    authors: Crooks GE,Hon G,Chandonia JM,Brenner SE

    updated:2004-06-01 00:00:00

  • Topologically associating domains and their long-range contacts are established during early G1 coincident with the establishment of the replication-timing program.

    abstract::Mammalian genomes are partitioned into domains that replicate in a defined temporal order. These domains can replicate at similar times in all cell types (constitutive) or at cell type-specific times (developmental). Genome-wide chromatin conformation capture (Hi-C) has revealed sub-megabase topologically associating ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.183699.114

    authors: Dileep V,Ay F,Sima J,Vera DL,Noble WS,Gilbert DM

    updated:2015-08-01 00:00:00

  • High-throughput plasmid purification for capillary sequencing.

    abstract::The need for expeditious and inexpensive methods for high-throughput DNA sequencing has been highlighted by the accelerated pace of genome DNA sequencing over the past year. At the Joint Genome Institute, the throughput in terms of high-quality bases per day has increased over 20-fold during the past 18 mo, reaching a...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.167801

    authors: Elkin CJ,Richardson PM,Fourcade HM,Hammon NM,Pollard MJ,Predki PF,Glavina T,Hawkins TL

    updated:2001-07-01 00:00:00

  • Single-cell strand sequencing of a macaque genome reveals multiple nested inversions and breakpoint reuse during primate evolution.

    abstract::Rhesus macaque is an Old World monkey that shared a common ancestor with human ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.265322.120

    authors: Maggiolini FAM,Sanders AD,Shew CJ,Sulovari A,Mao Y,Puig M,Catacchio CR,Dellino M,Palmisano D,Mercuri L,Bitonto M,Porubský D,Cáceres M,Eichler EE,Ventura M,Dennis MY,Korbel JO,Antonacci F

    updated:2020-11-01 00:00:00

  • Nurture trumps nature in a longitudinal survey of salivary bacterial communities in twins from early adolescence to early adulthood.

    abstract::Variation in the composition of the human oral microbiome in health and disease has been observed. We have characterized inter- and intra-individual variation of microbial communities of 107 individuals in one of the largest cohorts to date (264 saliva samples), using culture-independent 16S rRNA pyrosequencing. We ex...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140608.112

    authors: Stahringer SS,Clemente JC,Corley RP,Hewitt J,Knights D,Walters WA,Knight R,Krauter KS

    updated:2012-11-01 00:00:00

  • Long RT-PCR of the entire 8.5-kb NF1 open reading frame and mutation detection on agarose gels.

    abstract::Previous approaches to mutation detection in mRNA from the neurofibromatosis 1 (NF1) locus have required the PCR amplification of five or more overlapping cDNA segments to screen the entire 8.5-kb open reading frame (ORF). Systematically, these assays do not detect deletions that span the region of overlap (usually 1-...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6.1.58

    authors: Martinez JM,Breidenbach HH,Cawthon R

    updated:1996-01-01 00:00:00

  • Rapid evolution of mouse Y centromere repeat DNA belies recent sequence stability.

    abstract::The Y centromere sequence of house mouse, Mus musculus, remains unknown despite our otherwise significant knowledge of the genome sequence of this important mammalian model organism. Here, we report the complete molecular characterization of the C57BL/6J chromosome Y centromere, which comprises a highly diverged minor...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.092080.109

    authors: Pertile MD,Graham AN,Choo KH,Kalitsis P

    updated:2009-12-01 00:00:00

  • The Release 6 reference sequence of the Drosophila melanogaster genome.

    abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.185579.114

    authors: Hoskins RA,Carlson JW,Wan KH,Park S,Mendez I,Galle SE,Booth BW,Pfeiffer BD,George RA,Svirskas R,Krzywinski M,Schein J,Accardo MC,Damia E,Messina G,Méndez-Lago M,de Pablos B,Demakova OV,Andreyeva EN,Boldyreva LV,Ma

    updated:2015-03-01 00:00:00

  • A genome-wide study of dual coding regions in human alternatively spliced genes.

    abstract::Alternative splicing is a major mechanism for gene product regulation in many multicellular organisms. By using different exon combinations, some coding regions can encode amino acids in multiple reading frames in different transcripts. Here we performed a systematic search through a set of high-quality human transcri...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4246506

    authors: Liang H,Landweber LF

    updated:2006-02-01 00:00:00

  • A GC-rich sequence feature in the 3' UTR directs UPF1-dependent mRNA decay in mammalian cells.

    abstract::Up-frameshift protein 1 (UPF1) is an ATP-dependent RNA helicase that has essential roles in RNA surveillance and in post-transcriptional gene regulation by promoting the degradation of mRNAs. Previous studies revealed that UPF1 is associated with the 3' untranslated region (UTR) of target mRNAs via as-yet-unknown sequ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.206060.116

    authors: Imamachi N,Salam KA,Suzuki Y,Akimitsu N

    updated:2017-03-01 00:00:00

  • 2C-Cas9: a versatile tool for clonal analysis of gene function.

    abstract::CRISPR/Cas9-mediated targeted mutagenesis allows efficient generation of loss-of-function alleles in zebrafish. To date, this technology has been primarily used to generate genetic knockout animals. Nevertheless, the study of the function of certain loci might require tight spatiotemporal control of gene inactivation....

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.196170.115

    authors: Di Donato V,De Santis F,Auer TO,Testa N,Sánchez-Iranzo H,Mercader N,Concordet JP,Del Bene F

    updated:2016-05-01 00:00:00