Transcriptional fates of human-specific segmental duplications in brain.

Abstract:

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.

journal_name

Genome Res

journal_title

Genome research

authors

Dougherty ML,Underwood JG,Nelson BJ,Tseng E,Munson KM,Penn O,Nowakowski TJ,Pollen AA,Eichler EE

doi

10.1101/gr.237610.118

subject

Has Abstract

pub_date

2018-10-01 00:00:00

pages

1566-1576

issue

10

eissn

1549-5469

issn

1088-9051

pii

gr.237610.118

journal_volume

28

pub_type

Journal Article
  • Retroposed copies of the HMG genes: a window to genome dynamics.

    abstract::Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. He...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.893803

    authors: Strichman-Almashanu LZ,Bustin M,Landsman D

    update_date:2003-05-01 00:00:00

  • The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes.

    abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.240952.118

    authors: Schield DR,Card DC,Hales NR,Perry BW,Pasquesi GM,Blackmon H,Adams RH,Corbin AB,Smith CF,Ramesh B,Demuth JP,Betrán E,Tollis M,Meik JM,Mackessy SP,Castoe TA

    update_date:2019-04-01 00:00:00

  • Predicting deleterious amino acid substitutions.

    abstract::Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.176601

    authors: Ng PC,Henikoff S

    update_date:2001-05-01 00:00:00

  • taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.

    abstract::High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classi...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.225276.117

    authors: Corvelo A,Clarke WE,Robine N,Zody MC

    update_date:2018-05-01 00:00:00

  • Chromosomal instability mediated by non-B DNA: cruciform conformation and not DNA sequence is responsible for recurrent translocation in humans.

    abstract::Chromosomal aberrations have been thought to be random events. However, recent findings introduce a new paradigm in which certain DNA segments have the potential to adopt unusual conformations that lead to genomic instability and nonrandom chromosomal rearrangement. One of the best-studied examples is the palindromic ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.079244.108

    authors: Inagaki H,Ohye T,Kogo H,Kato T,Bolor H,Taniguchi M,Shaikh TH,Emanuel BS,Kurahashi H

    update_date:2009-02-01 00:00:00

  • Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.

    abstract::The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.246462.118

    authors: Mudge JM,Jungreis I,Hunt T,Gonzalez JM,Wright JC,Kay M,Davidson C,Fitzgerald S,Seal R,Tweedie S,He L,Waterhouse RM,Li Y,Bruford E,Choudhary JS,Frankish A,Kellis M

    update_date:2019-12-01 00:00:00

  • Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

    abstract::Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from evolutionary constraints. We systematically analyzed known human and Saccharomyces cerevisiae regulatory elements and disc...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1327604

    authors: Liu Y,Liu XS,Wei L,Altman RB,Batzoglou S

    update_date:2004-03-01 00:00:00

  • A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles.

    abstract::An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4303406

    authors: Chang LW,Nagarajan R,Magee JA,Milbrandt J,Stormo GD

    update_date:2006-03-01 00:00:00

  • Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability.

    abstract::Whole-genome sequencing using massively parallel sequencing technologies enables accurate detection of somatic rearrangements in cancer. Pinpointing large numbers of rearrangement breakpoints to base-pair resolution allows analysis of rearrangement microhomology and genomic location for every sample. Here we analyze 9...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.141382.112

    authors: Drier Y,Lawrence MS,Carter SL,Stewart C,Gabriel SB,Lander ES,Meyerson M,Beroukhim R,Getz G

    update_date:2013-02-01 00:00:00

  • Short-insert libraries as a method of problem solving in genome sequencing.

    abstract::As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the seque...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.5.562

    authors: McMurray AA,Sulston JE,Quail MA

    update_date:1998-05-01 00:00:00

  • Reconstructing large regions of an ancestral mammalian genome in silico.

    abstract::It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can exp...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2800104

    authors: Blanchette M,Green ED,Miller W,Haussler D

    update_date:2004-12-01 00:00:00

  • Inference of population genetic parameters in metagenomics: a clean look at messy data.

    abstract::Metagenomic projects generate short, overlapping fragments of DNA sequence, each deriving from a different individual. We report a new method for inferring the scaled mutation rate, theta = 2Neu, and the scaled exponential growth rate, R = Ner, from the site-frequency spectrum of these data while accounting for sequen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5431206

    authors: Johnson PL,Slatkin M

    update_date:2006-10-01 00:00:00

  • Domain regulation of imprinting cluster in Kip2/Lit1 subdomain on mouse chromosome 7F4/F5: large-scale DNA methylation analysis reveals that DMR-Lit1 is a putative imprinting control region.

    abstract::Mouse chromosome 7F4/F5, where the imprinting domain is located, is syntenic to human 11p15.5, the locus for Beckwith-Wiedemann syndrome. The domain is thought to consist of the two subdomains Kip2 (p57(kip2))/Lit1 and Igf2/H19. Because DNA methylation is believed to be a key factor in genomic imprinting, we performed...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.110702

    authors: Yatsuki H,Joh K,Higashimoto K,Soejima H,Arai Y,Wang Y,Hatada I,Obata Y,Morisaki H,Zhang Z,Nakagawachi T,Satoh Y,Mukai T

    update_date:2002-12-01 00:00:00

  • Gene expression profiling of single cells from archival tissue with laser-capture microdissection and Smart-3SEQ.

    abstract::RNA sequencing (RNA-seq) is a sensitive and accurate method for quantifying gene expression. Small samples or those whose RNA is degraded, such as formalin-fixed paraffin-embedded (FFPE) tissue, remain challenging to study with nonspecialized RNA-seq protocols. Here, we present a new method, Smart-3SEQ, that accuratel...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.234807.118

    authors: Foley JW,Zhu C,Jolivet P,Zhu SX,Lu P,Meaney MJ,West RB

    update_date:2019-11-01 00:00:00

  • Interaction between the X chromosome and an autosome regulates size sexual dimorphism in Portuguese Water Dogs.

    abstract::Size sexual dimorphism occurs in almost all mammals. In Portuguese Water Dogs, much of the difference in skeletal size between females and males is due to the interaction between a Quantitative Trait Locus (QTL) on the X-chromosome and a QTL linked to Insulin-like Growth Factor 1 (IGF-1) on the CFA 15 autosome. In fem...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3712705

    authors: Chase K,Carrier DR,Adler FR,Ostrander EA,Lark KG

    update_date:2005-12-01 00:00:00

  • The Release 6 reference sequence of the Drosophila melanogaster genome.

    abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.185579.114

    authors: Hoskins RA,Carlson JW,Wan KH,Park S,Mendez I,Galle SE,Booth BW,Pfeiffer BD,George RA,Svirskas R,Krzywinski M,Schein J,Accardo MC,Damia E,Messina G,Méndez-Lago M,de Pablos B,Demakova OV,Andreyeva EN,Boldyreva LV,Ma

    update_date:2015-03-01 00:00:00

  • Multiple major disease-associated clones of Legionella pneumophila have emerged recently and independently.

    abstract::Legionella pneumophila is an environmental bacterium and the leading cause of Legionnaires' disease. Just five sequence types (ST), from more than 2000 currently described, cause nearly half of disease cases in northwest Europe. Here, we report the sequence and analyses of 364 L. pneumophila genomes, including 337 fro...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.209536.116

    authors: David S,Rusniok C,Mentasti M,Gomez-Valero L,Harris SR,Lechat P,Lees J,Ginevra C,Glaser P,Ma L,Bouchier C,Underwood A,Jarraud S,Harrison TG,Parkhill J,Buchrieser C

    update_date:2016-11-01 00:00:00

  • Inferring tumor progression from genomic heterogeneity.

    abstract::Cancer progression in humans is difficult to infer because we do not routinely sample patients at multiple stages of their disease. However, heterogeneous breast tumors provide a unique opportunity to study human tumor progression because they still contain evidence of early and intermediate subpopulations in the form...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.099622.109

    authors: Navin N,Krasnitz A,Rodgers L,Cook K,Meth J,Kendall J,Riggs M,Eberling Y,Troge J,Grubor V,Levy D,Lundin P,Månér S,Zetterberg A,Hicks J,Wigler M

    update_date:2010-01-01 00:00:00

  • Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography.

    abstract::Y chromosome haplotypes are particularly useful in deciphering human evolutionary history because they accentuate the effects of drift, migration, and range expansion. Significant acceleration of Y biallelic marker discovery and subsequent typing involving heteroduplex detection has been achieved by implementing an in...

    journal_title:Genome research

    pub_type: Letter

    doi:10.1101/gr.7.10.996

    authors: Underhill PA,Jin L,Lin AA,Mehdi SQ,Jenkins T,Vollrath D,Davis RW,Cavalli-Sforza LL,Oefner PJ

    update_date:1997-10-01 00:00:00

  • PRDM9 binding organizes hotspot nucleosomes and limits Holliday junction migration.

    abstract::In mammals, genetic recombination during meiosis is limited to a set of 1- to 2-kb regions termed hotspots. Their locations are predominantly determined by the zinc finger protein PRDM9, which binds to DNA in hotspots and subsequently uses its SET domain to locally trimethylate histone H3 at lysine 4 (H3K4me3). This s...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.170167.113

    authors: Baker CL,Walker M,Kajita S,Petkov PM,Paigen K

    update_date:2014-05-01 00:00:00

  • The Ensembl automatic gene annotation system.

    abstract::As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, an...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1858004

    authors: Curwen V,Eyras E,Andrews TD,Clarke L,Mongin E,Searle SM,Clamp M

    update_date:2004-05-01 00:00:00

  • The portability of tagSNPs across populations: a worldwide survey.

    abstract::In the search for common genetic variants that contribute to prevalent human diseases, patterns of linkage disequilibrium (LD) among linked markers should be considered when selecting SNPs. Genotyping efficiency can be increased by choosing tagging SNPs (tagSNPs) in LD with other SNPs. However, it remains to be seen w...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4138406

    authors: González-Neira A,Ke X,Lao O,Calafell F,Navarro A,Comas D,Cann H,Bumpstead S,Ghori J,Hunt S,Deloukas P,Dunham I,Cardon LR,Bertranpetit J

    update_date:2006-03-01 00:00:00

  • Sensitive mapping of recombination hotspots using sequencing-based detection of ssDNA.

    abstract::Meiotic DNA double-stranded breaks (DSBs) initiate genetic recombination in discrete areas of the genome called recombination hotspots. DSBs can be directly mapped using chromatin immunoprecipitation followed by sequencing (ChIP-seq). Nevertheless, the genome-wide mapping of recombination hotspots in mammals is still ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.130583.111

    authors: Khil PP,Smagulova F,Brick KM,Camerini-Otero RD,Petukhova GV

    update_date:2012-05-01 00:00:00

  • Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee.

    abstract::To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains....

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.5302

    authors: Whitfield CW,Band MR,Bonaldo MF,Kumar CG,Liu L,Pardinas JR,Robertson HM,Soares MB,Robinson GE

    update_date:2002-04-01 00:00:00

  • Multiparameter functional diversity of human C2H2 zinc finger proteins.

    abstract::C2H2 zinc finger proteins represent the largest and most enigmatic class of human transcription factors. Their C2H2-ZF arrays are highly variable, indicating that most will have unique DNA binding motifs. However, most of the binding motifs have not been directly determined. In addition, little is known about whether ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.209643.116

    authors: Schmitges FW,Radovani E,Najafabadi HS,Barazandeh M,Campitelli LF,Yin Y,Jolma A,Zhong G,Guo H,Kanagalingam T,Dai WF,Taipale J,Emili A,Greenblatt JF,Hughes TR

    update_date:2016-12-01 00:00:00

  • Background-suppressed live visualization of genomic loci with an improved CRISPR system based on a split fluorophore.

    abstract::The higher-order structural organization and dynamics of the chromosomes play a central role in gene regulation. To explore this structure-function relationship, it is necessary to directly visualize genomic elements in living cells. Genome imaging based on the CRISPR system is a powerful approach but has limited appl...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.260018.119

    authors: Chaudhary N,Nho SH,Cho H,Gantumur N,Ra JS,Myung K,Kim H

    update_date:2020-09-01 00:00:00

  • Widespread genome duplications throughout the history of flowering plants.

    abstract::Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth-death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for speci...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.4825606

    authors: Cui L,Wall PK,Leebens-Mack JH,Lindsay BG,Soltis DE,Doyle JJ,Soltis PS,Carlson JE,Arumuganathan K,Barakat A,Albert VA,Ma H,dePamphilis CW

    update_date:2006-06-01 00:00:00

  • A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics.

    abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115360.110

    authors: Moltke I,Albrechtsen A,Hansen TV,Nielsen FC,Nielsen R

    update_date:2011-07-01 00:00:00

  • TATA is a modular component of synthetic promoters.

    abstract::The expression of most genes is regulated by multiple transcription factors. The interactions between transcription factors produce complex patterns of gene expression that are not always obvious from the arrangement of cis-regulatory elements in a promoter. One critical element of promoters is the TATA box, the docki...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.106732.110

    authors: Mogno I,Vallania F,Mitra RD,Cohen BA

    update_date:2010-10-01 00:00:00

  • Construction of a genome-scale structural map at single-nucleotide resolution.

    abstract::Few methods are available for mapping the local structure of DNA throughout a genome. The hydroxyl radical cleavage pattern is a measure of the local variation in solvent-accessible surface area of duplex DNA, and thus provides information on the local shape and structure of DNA. We report the construction of a relati...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.6073107

    authors: Greenbaum JA,Pang B,Tullius TD

    update_date:2007-06-01 00:00:00