Annotation transfer for genomics: measuring functional divergence in multi-domain proteins.

Abstract:

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using SCOP superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
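The percentages quoted in the abstract are, in essence, fractions of protein pairs that share a functional category within a given class of pairs. A minimal sketch of that kind of tally follows. It is not the authors' pipeline: the function name `conservation_by_category`, the category labels, and the toy pairs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def conservation_by_category(pairs):
    """Return {category: fraction of pairs whose two functions match}.

    `pairs` is an iterable of (category, function_a, function_b) tuples.
    The tuple layout is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for category, func_a, func_b in pairs:
        totals[category] += 1
        if func_a == func_b:
            matches[category] += 1
    # Every category in `totals` has at least one pair, so no division by zero.
    return {cat: matches[cat] / totals[cat] for cat in totals}


# Toy data, invented purely for illustration (not taken from the paper).
toy_pairs = [
    ("single-domain", "hydrolase", "hydrolase"),
    ("single-domain", "hydrolase", "transferase"),
    ("multi-domain, one shared superfamily", "kinase", "transport"),
    ("multi-domain, one shared superfamily", "kinase", "kinase"),
    ("multi-domain, one shared superfamily", "ligase", "transport"),
    ("multi-domain, one shared superfamily", "ligase", "hydrolase"),
]

rates = conservation_by_category(toy_pairs)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

In a real survey each pair would also carry a sequence-similarity value, so the same tally could be repeated within similarity bins to trace how functional divergence changes with similarity.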

journal_name

Genome Res

journal_title

Genome research

authors

Hegyi H,Gerstein M

doi

10.1101/gr.183801

subject

Has Abstract

pub_date

2001-10-01 00:00:00

pages

1632-40

issue

10

eissn

1549-5469

issn

1088-9051

journal_volume

11

pub_type

Journal article
  • The Release 6 reference sequence of the Drosophila melanogaster genome.

    abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.185579.114

    authors: Hoskins RA,Carlson JW,Wan KH,Park S,Mendez I,Galle SE,Booth BW,Pfeiffer BD,George RA,Svirskas R,Krzywinski M,Schein J,Accardo MC,Damia E,Messina G,Méndez-Lago M,de Pablos B,Demakova OV,Andreyeva EN,Boldyreva LV,Ma

    update_date:2015-03-01 00:00:00

  • New class of microRNA targets containing simultaneous 5'-UTR and 3'-UTR interaction sites.

    abstract::MicroRNAs (miRNAs) are known to post-transcriptionally regulate target mRNAs through the 3'-UTR, which interacts mainly with the 5'-end of miRNA in animals. Here we identify many endogenous motifs within human 5'-UTRs specific to the 3'-ends of miRNAs. The 3'-end of conserved miRNAs in particular has significant inter...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.089367.108

    authors: Lee I,Ajay SS,Yook JI,Kim HS,Hong SH,Kim NH,Dhanasekaran SM,Chinnaiyan AM,Athey BD

    update_date:2009-07-01 00:00:00

  • metaSPAdes: a new versatile metagenomic assembler.

    abstract::While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amp...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.213959.116

    authors: Nurk S,Meleshko D,Korobeynikov A,Pevzner PA

    update_date:2017-05-01 00:00:00

  • Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome.

    abstract::In diploid mammalian genomes, parental alleles can exhibit different methylation patterns (allele-specific DNA methylation, ASM), which have been documented in a small number of cases except for the imprinted regions and X chromosomes in females. We carried out a chromosome-wide survey of ASM across 16 human pluripote...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.104695.109

    authors: Shoemaker R,Deng J,Wang W,Zhang K

    update_date:2010-07-01 00:00:00

  • Connecting sequence and biology in the laboratory mouse.

    abstract::The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrat...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.991003

    authors: Baldarelli RM,Hill DP,Blake JA,Adachi J,Furuno M,Bradt D,Corbani LE,Cousins S,Frazer KS,Qi D,Yang L,Ramachandran S,Reed D,Zhu Y,Kasukawa T,Ringwald M,King BL,Maltais LJ,McKenzie LM,Schriml LM,Maglott D,Church DM

    update_date:2003-06-01 00:00:00

  • Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II.

    abstract::The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC re...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.213538.116

    authors: Norman PJ,Norberg SJ,Guethlein LA,Nemat-Gorgani N,Royce T,Wroblewski EE,Dunn T,Mann T,Alicata C,Hollenbach JA,Chang W,Shults Won M,Gunderson KL,Abi-Rached L,Ronaghi M,Parham P

    update_date:2017-05-01 00:00:00

  • Widespread plasticity in CTCF occupancy linked to DNA methylation.

    abstract::CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.136101.111

    authors: Wang H,Maurano MT,Qu H,Varley KE,Gertz J,Pauli F,Lee K,Canfield T,Weaver M,Sandstrom R,Thurman RE,Kaul R,Myers RM,Stamatoyannopoulos JA

    update_date:2012-09-01 00:00:00

  • Genomic analysis of primordial dwarfism reveals novel disease genes.

    abstract::Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in d...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.160572.113

    authors: Shaheen R,Faqeih E,Ansari S,Abdel-Salam G,Al-Hassnan ZN,Al-Shidi T,Alomar R,Sogaty S,Alkuraya FS

    update_date:2014-02-01 00:00:00

  • Distribution of hammerhead and hammerhead-like RNA motifs through the GenBank.

    abstract::Hammerhead ribozymes previously were found in satellite RNAs from plant viroids and in repetitive DNA from certain species of newts and schistosomes. To determine if this catalytic RNA motif has a wider distribution, we decided to scrutinize the GenBank database for RNAs that contain hammerhead or hammerhead-like moti...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.10.7.1011

    authors: Ferbeyre G,Bourdeau V,Pageau M,Miramontes P,Cedergren R

    update_date:2000-07-01 00:00:00

  • Exo-proofreading, a versatile SNP scoring technology.

    abstract::We report the validation of a new assay for typing single nucleotide polymorphisms (SNPs) that takes advantage of the 3'-to-5' exonuclease proofreading activity of many DNA polymerases. The assay uses one or more primers labeled on the 3' nucleotide base, and can be implemented in a variety of formats including a one-...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.939903

    authors: Cahill P,Bakis M,Hurley J,Kamath V,Nielsen W,Weymouth D,Dupuis J,Doucette-Stamm L,Smith DR

    update_date:2003-05-01 00:00:00

  • Genetic analysis of complex traits in the emerging Collaborative Cross.

    abstract::The Collaborative Cross (CC) is a mouse recombinant inbred strain panel that is being developed as a resource for mammalian systems genetics. Here we describe an experiment that uses partially inbred CC lines to evaluate the genetic properties and utility of this emerging resource. Genome-wide analysis of the incipien...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.111310.110

    authors: Aylor DL,Valdar W,Foulds-Mathes W,Buus RJ,Verdugo RA,Baric RS,Ferris MT,Frelinger JA,Heise M,Frieman MB,Gralinski LE,Bell TA,Didion JD,Hua K,Nehrenberg DL,Powell CL,Steigerwalt J,Xie Y,Kelada SN,Collins FS,Yang IV

    update_date:2011-08-01 00:00:00

  • A palindromic structure in the pericentromeric region of various human chromosomes.

    abstract::The primate-specific multisequence family chAB4 is represented with approximately 40 copies within the haploid human genome. Former analysis revealed that unusually long repetition units (>35 kb) are distributed to at least eight different chromosomal loci. Remarkably varying copy-numbers within the genomes of closel...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.6.4.267

    authors: Wöhr G,Fink T,Assum G

    update_date:1996-04-01 00:00:00

  • Wolbachia genome integrated in an insect chromosome: evolution and fate of laterally transferred endosymbiont genes.

    abstract::Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we tho...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.7144908

    authors: Nikoh N,Tanaka K,Shibata F,Kondo N,Hizume M,Shimada M,Fukatsu T

    update_date:2008-02-01 00:00:00

  • Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

    abstract::By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by m...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.4039406

    authors: Kimura K,Wakamatsu A,Suzuki Y,Ota T,Nishikawa T,Yamashita R,Yamamoto J,Sekine M,Tsuritani K,Wakaguri H,Ishii S,Sugiyama T,Saito K,Isono Y,Irie R,Kushida N,Yoneyama T,Otsuka R,Kanda K,Yokoi T,Kondo H,Wagatsuma M

    update_date:2006-01-01 00:00:00

  • The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes.

    abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.240952.118

    authors: Schield DR,Card DC,Hales NR,Perry BW,Pasquesi GM,Blackmon H,Adams RH,Corbin AB,Smith CF,Ramesh B,Demuth JP,Betrán E,Tollis M,Meik JM,Mackessy SP,Castoe TA

    update_date:2019-04-01 00:00:00

  • A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles.

    abstract::An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.4303406

    authors: Chang LW,Nagarajan R,Magee JA,Milbrandt J,Stormo GD

    update_date:2006-03-01 00:00:00

  • The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine.

    abstract::ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicin...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.092841.109

    authors: Biesecker LG,Mullikin JC,Facio FM,Turner C,Cherukuri PF,Blakesley RW,Bouffard GG,Chines PS,Cruz P,Hansen NF,Teer JK,Maskeri B,Young AC,NISC Comparative Sequencing Program.,Manolio TA,Wilson AF,Finkel T,Hwang P,Arai A

    update_date:2009-09-01 00:00:00

  • Reconstructing large regions of an ancestral mammalian genome in silico.

    abstract::It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can exp...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.2800104

    authors: Blanchette M,Green ED,Miller W,Haussler D

    update_date:2004-12-01 00:00:00

  • Automatic analysis of dividing cells in live cell movies to detect mitotic delays and correlate phenotypes in time.

    abstract::Live-cell imaging allows detailed dynamic cellular phenotyping for cell biology and, in combination with small molecule or drug libraries, for high-content screening. Fully automated analysis of live cell movies has been hampered by the lack of computational approaches that allow tracking and recognition of individual...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.092494.109

    authors: Harder N,Mora-Bermúdez F,Godinez WJ,Wünsche A,Eils R,Ellenberg J,Rohr K

    update_date:2009-11-01 00:00:00

  • High-throughput genotyping by whole-genome resequencing.

    abstract::The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic mapping and genome analysis. We have developed a high-throughput method for genotyping recombinant populations utilizing whole-genome resequen...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.089516.108

    authors: Huang X,Feng Q,Qian Q,Zhao Q,Wang L,Wang A,Guan J,Fan D,Weng Q,Huang T,Dong G,Sang T,Han B

    update_date:2009-06-01 00:00:00

  • CG dinucleotides enhance promoter activity independent of DNA methylation.

    abstract::Most mammalian RNA polymerase II initiation events occur at CpG islands, which are rich in CpGs and devoid of DNA methylation. Despite their relevance for gene regulation, it is unknown to what extent the CpG dinucleotide itself actually contributes to promoter activity. To address this question, we determined the tra...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.241653.118

    authors: Hartl D,Krebs AR,Grand RS,Baubec T,Isbel L,Wirbelauer C,Burger L,Schübeler D

    update_date:2019-04-01 00:00:00

  • Estimating coarse gene network structure from large-scale gene perturbation data.

    abstract::Large scale gene perturbation experiments generate information about the number of genes whose activity is directly or indirectly affected by a gene perturbation. From this information, one can numerically estimate coarse structural network features such as the total number of direct regulatory interactions and the nu...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.193902

    authors: Wagner A

    update_date:2002-02-01 00:00:00

  • Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis.

    abstract::The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.216754.116

    authors: Yang F,Wang J,GTEx Consortium.,Pierce BL,Chen LS

    update_date:2017-11-01 00:00:00

  • Identification of protein features encoded by alternative exons using Exon Ontology.

    abstract::Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward thi...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.212696.116

    authors: Tranchevent LC,Aubé F,Dulaurier L,Benoit-Pilven C,Rey A,Poret A,Chautard E,Mortada H,Desmet FO,Chakrama FZ,Moreno-Garcia MA,Goillot E,Janczarski S,Mortreux F,Bourgeois CF,Auboeuf D

    update_date:2017-06-01 00:00:00

  • Preference of DNA methyltransferases for CpG islands in mouse embryonic stem cells.

    abstract::Many CpG islands have tissue-dependent and differentially methylated regions (T-DMRs) in normal cells and tissues. To elucidate how DNA methyltransferases (Dnmts) participate in methylation of the genomic components, we investigated the genome-wide DNA methylation pattern of the T-DMRs with Dnmt1-, Dnmt3a-, and/or Dnm...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.2431504

    authors: Hattori N,Abe T,Hattori N,Suzuki M,Matsuyama T,Yoshida S,Li E,Shiota K

    update_date:2004-09-01 00:00:00

  • Biological data sciences in genome research.

    abstract::The last 20 years have been a remarkable era for biology and medicine. One of the most significant achievements has been the sequencing of the first human genomes, which has laid the foundation for profound insights into human genetics, the intricacies of regulation and development, and the forces of evolution. Incred...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.191684.115

    authors: Schatz MC

    update_date:2015-10-01 00:00:00

  • Fourfold faster rate of genome rearrangement in nematodes than in Drosophila.

    abstract::We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that t...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.172702

    authors: Coghlan A,Wolfe KH

    update_date:2002-06-01 00:00:00

  • Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages.

    abstract::Molecular evolution studies are usually based on the analysis of individual genes and thus reflect only small-range variations in genomic sequences. A complementary approach is to study the evolutionary history of rearrangements in entire genomes based on the analysis of gene orders. The progress in whole genome seque...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.3002305

    authors: Bourque G,Zdobnov EM,Bork P,Pevzner PA,Tesler G

    update_date:2005-01-01 00:00:00

  • High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications.

    abstract::We present a database of copy number variations (CNVs) detected in 2026 disease-free individuals, using high-density, SNP-based oligonucleotide microarrays. This large cohort, comprised mainly of Caucasians (65.2%) and African-Americans (34.2%), was analyzed for CNVs in a single study using a uniform array platform an...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.083501.108

    authors: Shaikh TH,Gai X,Perin JC,Glessner JT,Xie H,Murphy K,O'Hara R,Casalunovo T,Conlin LK,D'Arcy M,Frackelton EC,Geiger EA,Haldeman-Englert C,Imielinski M,Kim CE,Medne L,Annaiah K,Bradfield JP,Dabaghyan E,Eckert A,Onyia

    update_date:2009-09-01 00:00:00

  • A comprehensive transcript map of the mouse Gnas imprinted complex.

    abstract::The recent publication of the FANTOM mouse transcriptome has provided a unique opportunity to study the diversity of transcripts arising from a single gene locus. We have focused on the Gnas complex, as imprinting loci themselves provide unique insights into transcriptional regulation. Thirteen full-length cDNAs from ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.955503

    authors: Holmes R,Williamson C,Peters J,Denny P,Wells C,RIKEN GER Group.,GSL Members.

    update_date:2003-06-01 00:00:00