Abstract:
The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.
journal_name: Genome Res
journal_title: Genome research
authors: Mudge JM, Jungreis I, Hunt T, Gonzalez JM, Wright JC, Kay M, Davidson C, Fitzgerald S, Seal R, Tweedie S, He L, Waterhouse RM, Li Y, Bruford E, Choudhary JS, Frankish A, Kellis M
doi: 10.1101/gr.246462.118
subject: Has Abstract
pub_date: 2019-12-01 00:00:00
pages: 2073-2087
issue: 12
eissn: 1088-9051
issn: 1549-5469
pii: gr.246462.118
journal_volume: 29
pub_type: Journal Article
Related literature (Genome Research collection)
abstract::Yeasts and filamentous fungi do not have adenosine deaminase acting on RNA (ADAR) orthologs and are believed to lack A-to-I RNA editing, which is the most prevalent editing of mRNA in animals. However, during this study with the PUK1(FGRRES_01058) pseudokinase gene important for sexual reproduction in Fusarium gramine...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.199877.115
update_date: 2016-04-01 00:00:00
abstract::We have identified previously a putative tumor suppressor gene (TSG) locus at human chromosome (hchr) 7q31 showing that it is altered in a variety of human epithelial tumors. To determine whether this TSG is conserved in mice, we studied loss of heterozygosity (LOH) in chemically induced mouse liver adenomas. The LOH ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.6.11.1070
update_date: 1996-11-01 00:00:00
abstract::Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat S...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.135780.111
update_date: 2012-06-01 00:00:00
abstract::Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.1549503
update_date: 2003-12-01 00:00:00
abstract::Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. el...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.244830.118
update_date: 2019-06-01 00:00:00
abstract::Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, an...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.2438004
update_date: 2005-01-01 00:00:00
abstract::Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates e...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.146233.112
update_date: 2013-06-01 00:00:00
abstract::When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pr...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.10.2.228
update_date: 2000-02-01 00:00:00
abstract::Using both env and long terminal repeat (LTR) sequences, with maximal representation of genetic diversity within primate strains, we revise and expand the unique evolutionary history of human and simian T-cell leukemia/lymphotropic viruses (HTLV/STLV). Based on the robust application of three different phylogenetic al...
journal_title:Genome research
pub_type: Journal Article, Review
doi:
update_date: 1999-06-01 00:00:00
abstract::Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessin...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.220640.117
update_date: 2017-11-01 00:00:00
abstract::Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is obse...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.gr-1619r
update_date: 2001-03-01 00:00:00
abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.240952.118
update_date: 2019-04-01 00:00:00
abstract::Transcript leaders (TLs) can have profound effects on mRNA translation and stability. To map TL boundaries genome-wide, we developed TL-sequencing (TL-seq), a technique combining enzymatic capture of m(7)G-capped mRNA 5' ends with high-throughput sequencing. TL-seq identified mRNA start sites for the majority of yeast...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.150342.112
update_date: 2013-06-01 00:00:00
abstract::Isochromosome 17q, or i(17q), is one of the most frequent nonrandom changes occurring in human neoplasia. Most of the i(17q) breakpoints cluster within an approximately 240-kb interval located in the Smith-Magenis syndrome common deletion region in 17p11.2. The breakpoint cluster region is characterized by a complex ar...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.080697.108
update_date: 2008-11-01 00:00:00
abstract::Little is known about novel genetic elements that drove the emergence of anthropoid primates. We exploited the sequencing of the marmoset genome to identify 23,849 anthropoid-specific constrained (ASC) regions and confirmed their robust functional signatures. Of the ASC base pairs, 99.7% were noncoding, suggesting tha...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.168963.113
update_date: 2014-09-01 00:00:00
abstract::In-gel competitive reassociation (IGCR) is a method of differential subtraction to enrich polymorphic DNA restriction fragments between two DNA samples without probes or specific sequence information. Here, we show that by combining IGCR and expressed sequence tags (EST) array hybridization, polymorphic DNA fragments ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.434103
update_date: 2003-03-01 00:00:00
abstract::Forty-three yeast artificial chromosomes (YACs) from the X chromosome have been overlapped across the 4-Mb Xq21.3 region, which is homologous to a segment in Yp11.1. The region is formatted to 60-kb resolution with 57 STSs and is merged at its edges with contigs specific for X. This allows a direct comparison of marke...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.7.4.307
update_date: 1997-04-01 00:00:00
abstract::In the search for common genetic variants that contribute to prevalent human diseases, patterns of linkage disequilibrium (LD) among linked markers should be considered when selecting SNPs. Genotyping efficiency can be increased by choosing tagging SNPs (tagSNPs) in LD with other SNPs. However, it remains to be seen w...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.4138406
update_date: 2006-03-01 00:00:00
abstract::With the genomic sequencing of Arabidopsis nearing completion and rice sequencing very much in its infancy, a key question is whether we can exploit the Arabidopsis sequence to identify candidate genes for traits in cereal crops using a map-based approach. This requires the existence of colinearity between the Arabido...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.9.9.825
update_date: 1999-09-01 00:00:00
abstract::Using AP-PCR-based DNA profiling we examined some structural features of B chromosomes from yellow-necked mice Apodemus flavicollis. Mice harboring one, two, or three or lacking B chromosomes were examined. Chromosomal structure was scanned for variant bands by using a series of arbitrary primers and from these, infor...
journal_title:Genome research
pub_type: Journal Article
doi:
update_date: 2000-01-01 00:00:00
abstract::X-linked Mental Retardation (XLMR) occurs in 1 in 600 males and is highly genetically heterogeneous. We used a novel human X chromosome cDNA microarray (XCA) to survey the expression profile of X-linked genes in lymphoblasts of XLMR males. Genes with altered expression verified by Northern blot and/or quantitative PCR...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.5336307
update_date: 2007-05-01 00:00:00
abstract::The need to translate genes to function has positioned the rat as an invaluable animal model for genomic research. The significant increase in genomic resources in recent years has had an immediate functional application in the rat. Many of the resources for translational research are already in place and are ready to...
journal_title:Genome research
pub_type: Journal Article, Review
doi:10.1101/gr.3744005
update_date: 2005-12-01 00:00:00
abstract::By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by m...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.4039406
update_date: 2006-01-01 00:00:00
abstract::Human chromosomal regions enriched in segmental duplications are subject to extensive genomic reorganization. Such regions are particularly informative for illuminating the evolutionary history of a given chromosome. We have analyzed 866 kb of Y-chromosomal non-palindromic segmental duplications delineating four euchr...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.076711.108
update_date: 2008-07-01 00:00:00
abstract::Here, we report that CRISPR guide RNAs (gRNAs) with a 5'-triphosphate group (5'-ppp gRNAs) produced via in vitro transcription trigger RNA-sensing innate immune responses in human and murine cells, leading to cytotoxicity. 5'-ppp gRNAs in the cytosol are recognized by DDX58, which in turn activates type I interferon r...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.231936.117
update_date: 2018-02-22 00:00:00
abstract::We have used the FANTOM2 mouse cDNA set (60,770 clones), public mRNA data, and mouse genome sequence data to identify 2481 pairs of sense-antisense transcripts and 899 further pairs of nonantisense bidirectional transcription based upon genomic mapping. The analysis greatly expands the number of known examples of sens...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.982903
update_date: 2003-06-01 00:00:00
abstract::Eukaryotic DNA replication initiates from multiple discrete sites in the genome, termed origins of replication (origins). Prior to S phase, multiple origins are poised to initiate replication by recruitment of the pre-replicative complex (pre-RC). For proper replication to occur, origin activation must be tightly regu...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.209940.116
update_date: 2017-02-01 00:00:00
abstract::Gene expression levels can be an important link between DNA variation and phenotypic manifestations. Our previous map of global gene expression, based on ~400K single nucleotide polymorphisms (SNPs) and 50K transcripts in 400 sib pairs from the MRCA family panel, has been widely used to interpret the results of genome...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.142521.112
update_date: 2013-04-01 00:00:00
abstract::The apicomplexan Cryptosporidium parvum is one of the most prevalent protozoan parasites of humans. We report the physical mapping of the genome of the Iowa isolate, sequencing and analysis of chromosome 6, and approximately 0.9 Mbp of sequence sampled from the remainder of the genome. To construct a robust physical m...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.1555203
update_date: 2003-08-01 00:00:00
abstract::Aberrant DNA methylation (DNAm) was first linked to cancer over 25 yr ago. Since then, many studies have associated hypermethylation of tumor suppressor genes and hypomethylation of oncogenes to the tumorigenic process. However, most of these studies have been limited to the analysis of promoters and CpG islands (CGIs...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.109678.110
update_date: 2011-04-01 00:00:00