Abstract:
Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.
journal_name: Genome Res
journal_title: Genome research
authors: Dougherty ML, Underwood JG, Nelson BJ, Tseng E, Munson KM, Penn O, Nowakowski TJ, Pollen AA, Eichler EE
doi: 10.1101/gr.237610.118
subject: Has Abstract
pub_date: 2018-10-01 00:00:00
pages: 1566-1576
issue: 10
issn: 1088-9051
eissn: 1549-5469
pii: gr.237610.118
journal_volume: 28
pub_type: journal article

Related literature: Genome Research

abstract::Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. He...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.893803
update_date: 2003-05-01 00:00:00
abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.240952.118
update_date: 2019-04-01 00:00:00
abstract::Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.176601
update_date: 2001-05-01 00:00:00
abstract::High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classi...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.225276.117
update_date: 2018-05-01 00:00:00
abstract::Chromosomal aberrations have been thought to be random events. However, recent findings introduce a new paradigm in which certain DNA segments have the potential to adopt unusual conformations that lead to genomic instability and nonrandom chromosomal rearrangement. One of the best-studied examples is the palindromic ...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.079244.108
update_date: 2009-02-01 00:00:00
abstract::The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.246462.118
update_date: 2019-12-01 00:00:00
abstract::Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species from evolutionary constraints. We systematically analyzed known human and Saccharomyces cerevisiae regulatory elements and disc...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.1327604
update_date: 2004-03-01 00:00:00
abstract::An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.4303406
update_date: 2006-03-01 00:00:00
abstract::Whole-genome sequencing using massively parallel sequencing technologies enables accurate detection of somatic rearrangements in cancer. Pinpointing large numbers of rearrangement breakpoints to base-pair resolution allows analysis of rearrangement microhomology and genomic location for every sample. Here we analyze 9...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.141382.112
update_date: 2013-02-01 00:00:00
abstract::As the Human Genome Project moves into its sequencing phase, a serious problem has arisen. The same problem has been increasingly vexing in the closing phase of the Caenorhabditis elegans project. The difficulty lies in sequencing efficiently through certain regions in which the templates (DNA substrates for the seque...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.8.5.562
update_date: 1998-05-01 00:00:00
abstract::It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can exp...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.2800104
update_date: 2004-12-01 00:00:00
abstract::Metagenomic projects generate short, overlapping fragments of DNA sequence, each deriving from a different individual. We report a new method for inferring the scaled mutation rate, theta = 2Neu, and the scaled exponential growth rate, R = Ner, from the site-frequency spectrum of these data while accounting for sequen...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.5431206
update_date: 2006-10-01 00:00:00
abstract::Mouse chromosome 7F4/F5, where the imprinting domain is located, is syntenic to human 11p15.5, the locus for Beckwith-Wiedemann syndrome. The domain is thought to consist of the two subdomains Kip2 (p57(kip2))/Lit1 and Igf2/H19. Because DNA methylation is believed to be a key factor in genomic imprinting, we performed...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.110702
update_date: 2002-12-01 00:00:00
abstract::RNA sequencing (RNA-seq) is a sensitive and accurate method for quantifying gene expression. Small samples or those whose RNA is degraded, such as formalin-fixed paraffin-embedded (FFPE) tissue, remain challenging to study with nonspecialized RNA-seq protocols. Here, we present a new method, Smart-3SEQ, that accuratel...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.234807.118
update_date: 2019-11-01 00:00:00
abstract::Size sexual dimorphism occurs in almost all mammals. In Portuguese Water Dogs, much of the difference in skeletal size between females and males is due to the interaction between a Quantitative Trait Locus (QTL) on the X-chromosome and a QTL linked to Insulin-like Growth Factor 1 (IGF-1) on the CFA 15 autosome. In fem...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.3712705
update_date: 2005-12-01 00:00:00
abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.185579.114
update_date: 2015-03-01 00:00:00
abstract::Legionella pneumophila is an environmental bacterium and the leading cause of Legionnaires' disease. Just five sequence types (ST), from more than 2000 currently described, cause nearly half of disease cases in northwest Europe. Here, we report the sequence and analyses of 364 L. pneumophila genomes, including 337 fro...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.209536.116
update_date: 2016-11-01 00:00:00
abstract::Cancer progression in humans is difficult to infer because we do not routinely sample patients at multiple stages of their disease. However, heterogeneous breast tumors provide a unique opportunity to study human tumor progression because they still contain evidence of early and intermediate subpopulations in the form...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.099622.109
update_date: 2010-01-01 00:00:00
abstract::Y chromosome haplotypes are particularly useful in deciphering human evolutionary history because they accentuate the effects of drift, migration, and range expansion. Significant acceleration of Y biallelic marker discovery and subsequent typing involving heteroduplex detection has been achieved by implementing an in...
journal_title:Genome research
pub_type: letter
doi:10.1101/gr.7.10.996
update_date: 1997-10-01 00:00:00
abstract::In mammals, genetic recombination during meiosis is limited to a set of 1- to 2-kb regions termed hotspots. Their locations are predominantly determined by the zinc finger protein PRDM9, which binds to DNA in hotspots and subsequently uses its SET domain to locally trimethylate histone H3 at lysine 4 (H3K4me3). This s...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.170167.113
update_date: 2014-05-01 00:00:00
abstract::As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, an...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.1858004
update_date: 2004-05-01 00:00:00
abstract::In the search for common genetic variants that contribute to prevalent human diseases, patterns of linkage disequilibrium (LD) among linked markers should be considered when selecting SNPs. Genotyping efficiency can be increased by choosing tagging SNPs (tagSNPs) in LD with other SNPs. However, it remains to be seen w...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.4138406
update_date: 2006-03-01 00:00:00
abstract::Meiotic DNA double-stranded breaks (DSBs) initiate genetic recombination in discrete areas of the genome called recombination hotspots. DSBs can be directly mapped using chromatin immunoprecipitation followed by sequencing (ChIP-seq). Nevertheless, the genome-wide mapping of recombination hotspots in mammals is still ...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.130583.111
update_date: 2012-05-01 00:00:00
abstract::To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), we created expressed sequence tag (EST) and cDNA microarray resources for the bee brain. Over 20,000 cDNA clones were partially sequenced from a normalized (and subsequently subtracted) library generated from adult A. mellifera brains....
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.5302
update_date: 2002-04-01 00:00:00
abstract::C2H2 zinc finger proteins represent the largest and most enigmatic class of human transcription factors. Their C2H2-ZF arrays are highly variable, indicating that most will have unique DNA binding motifs. However, most of the binding motifs have not been directly determined. In addition, little is known about whether ...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.209643.116
update_date: 2016-12-01 00:00:00
abstract::The higher-order structural organization and dynamics of the chromosomes play a central role in gene regulation. To explore this structure-function relationship, it is necessary to directly visualize genomic elements in living cells. Genome imaging based on the CRISPR system is a powerful approach but has limited appl...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.260018.119
update_date: 2020-09-01 00:00:00
abstract::Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth-death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for speci...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.4825606
update_date: 2006-06-01 00:00:00
abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.115360.110
update_date: 2011-07-01 00:00:00
abstract::The expression of most genes is regulated by multiple transcription factors. The interactions between transcription factors produce complex patterns of gene expression that are not always obvious from the arrangement of cis-regulatory elements in a promoter. One critical element of promoters is the TATA box, the docki...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.106732.110
update_date: 2010-10-01 00:00:00
abstract::Few methods are available for mapping the local structure of DNA throughout a genome. The hydroxyl radical cleavage pattern is a measure of the local variation in solvent-accessible surface area of duplex DNA, and thus provides information on the local shape and structure of DNA. We report the construction of a relati...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.6073107
update_date: 2007-06-01 00:00:00