Abstract:
Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using SCOP superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
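The kind of tally behind figures such as "35%" or "67%" can be illustrated with a minimal sketch. This is not the authors' pipeline: the protein records, superfamily labels, functional categories, and the names `pair_class` and `conservation_by_class` are all hypothetical, and the toy data are invented only to show the bookkeeping. The sketch groups protein pairs by how their superfamily sets relate and reports, per group, the fraction of pairs that share a functional category.

```python
from collections import defaultdict
from itertools import combinations

# Toy records, invented purely for illustration: each protein maps to
# (set of superfamily labels, functional category). None of this is real data.
proteins = {
    "P1": ({"sfA"}, "kinase"),
    "P2": ({"sfA"}, "kinase"),
    "P3": ({"sfA", "sfB"}, "kinase"),
    "P4": ({"sfA", "sfB"}, "kinase"),
    "P5": ({"sfA", "sfC"}, "transport"),
}

def pair_class(sf1, sf2):
    """Classify a protein pair by how their superfamily sets relate."""
    if not sf1 & sf2:
        return None  # no shared superfamily: the pair is not compared
    if len(sf1) == 1 and len(sf2) == 1:
        return "single-domain, same superfamily"
    if sf1 == sf2:
        return "multi-domain, same combination"
    return "partial overlap"  # at least one multi-domain protein involved

def conservation_by_class(records):
    """Return {pair class: fraction of pairs with the same functional category}."""
    same = defaultdict(int)
    total = defaultdict(int)
    for (_, (sf1, f1)), (_, (sf2, f2)) in combinations(records.items(), 2):
        cls = pair_class(sf1, sf2)
        if cls is None:
            continue
        total[cls] += 1
        same[cls] += f1 == f2  # True counts as 1, False as 0
    return {cls: same[cls] / total[cls] for cls in total}

result = conservation_by_class(proteins)
for cls, frac in sorted(result.items()):
    print(f"{cls}: {frac:.2f}")
```

A real analysis would additionally bin pairs by sequence similarity and by alignment coverage; both are left out of this sketch.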
journal_name: Genome Res
journal_title: Genome research
authors: Hegyi H, Gerstein M
doi: 10.1101/gr.183801
subject: Has Abstract
pub_date: 2001-10-01 00:00:00
pages: 1632-40
issue: 10
eissn: 1088-9051
issn: 1549-5469
journal_volume: 11
pub_type: journal article
Related literature
Genome Research literature collection
abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.185579.114
update_date: 2015-03-01 00:00:00
abstract::MicroRNAs (miRNAs) are known to post-transcriptionally regulate target mRNAs through the 3'-UTR, which interacts mainly with the 5'-end of miRNA in animals. Here we identify many endogenous motifs within human 5'-UTRs specific to the 3'-ends of miRNAs. The 3'-end of conserved miRNAs in particular has significant inter...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.089367.108
update_date: 2009-07-01 00:00:00
abstract::While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amp...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.213959.116
update_date: 2017-05-01 00:00:00
abstract::In diploid mammalian genomes, parental alleles can exhibit different methylation patterns (allele-specific DNA methylation, ASM), which have been documented in a small number of cases except for the imprinted regions and X chromosomes in females. We carried out a chromosome-wide survey of ASM across 16 human pluripote...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.104695.109
update_date: 2010-07-01 00:00:00
abstract::The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrat...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.991003
update_date: 2003-06-01 00:00:00
abstract::The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC re...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.213538.116
update_date: 2017-05-01 00:00:00
abstract::CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environ...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.136101.111
update_date: 2012-09-01 00:00:00
abstract::Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in d...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.160572.113
update_date: 2014-02-01 00:00:00
abstract::Hammerhead ribozymes previously were found in satellite RNAs from plant viroids and in repetitive DNA from certain species of newts and schistosomes. To determine if this catalytic RNA motif has a wider distribution, we decided to scrutinize the GenBank database for RNAs that contain hammerhead or hammerhead-like moti...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.10.7.1011
update_date: 2000-07-01 00:00:00
abstract::We report the validation of a new assay for typing single nucleotide polymorphisms (SNPs) that takes advantage of the 3'-to-5' exonuclease proofreading activity of many DNA polymerases. The assay uses one or more primers labeled on the 3' nucleotide base, and can be implemented in a variety of formats including a one-...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.939903
update_date: 2003-05-01 00:00:00
abstract::The Collaborative Cross (CC) is a mouse recombinant inbred strain panel that is being developed as a resource for mammalian systems genetics. Here we describe an experiment that uses partially inbred CC lines to evaluate the genetic properties and utility of this emerging resource. Genome-wide analysis of the incipien...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.111310.110
update_date: 2011-08-01 00:00:00
abstract::The primate-specific multisequence family chAB4 is represented with approximately 40 copies within the haploid human genome. Former analysis revealed that unusually long repetition units (>35 kb) are distributed to at least eight different chromosomal loci. Remarkably varying copy-numbers within the genomes of closel...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.6.4.267
update_date: 1996-04-01 00:00:00
abstract::Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we tho...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.7144908
update_date: 2008-02-01 00:00:00
abstract::By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by m...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.4039406
update_date: 2006-01-01 00:00:00
abstract::Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.240952.118
update_date: 2019-04-01 00:00:00
abstract::An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two important tasks are often encountered when studying transcription regulation, i.e., (1) the identification of common transcriptional regulators of a set of coexpressed genes; (...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.4303406
update_date: 2006-03-01 00:00:00
abstract::ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicin...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.092841.109
update_date: 2009-09-01 00:00:00
abstract::It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. It is shown that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can exp...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.2800104
update_date: 2004-12-01 00:00:00
abstract::Live-cell imaging allows detailed dynamic cellular phenotyping for cell biology and, in combination with small molecule or drug libraries, for high-content screening. Fully automated analysis of live cell movies has been hampered by the lack of computational approaches that allow tracking and recognition of individual...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.092494.109
update_date: 2009-11-01 00:00:00
abstract::The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic mapping and genome analysis. We have developed a high-throughput method for genotyping recombinant populations utilizing whole-genome resequen...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.089516.108
update_date: 2009-06-01 00:00:00
abstract::Most mammalian RNA polymerase II initiation events occur at CpG islands, which are rich in CpGs and devoid of DNA methylation. Despite their relevance for gene regulation, it is unknown to what extent the CpG dinucleotide itself actually contributes to promoter activity. To address this question, we determined the tra...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.241653.118
update_date: 2019-04-01 00:00:00
abstract::Large scale gene perturbation experiments generate information about the number of genes whose activity is directly or indirectly affected by a gene perturbation. From this information, one can numerically estimate coarse structural network features such as the total number of direct regulatory interactions and the nu...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.193902
update_date: 2002-02-01 00:00:00
abstract::The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.216754.116
update_date: 2017-11-01 00:00:00
abstract::Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward thi...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.212696.116
update_date: 2017-06-01 00:00:00
abstract::Many CpG islands have tissue-dependent and differentially methylated regions (T-DMRs) in normal cells and tissues. To elucidate how DNA methyltransferases (Dnmts) participate in methylation of the genomic components, we investigated the genome-wide DNA methylation pattern of the T-DMRs with Dnmt1-, Dnmt3a-, and/or Dnm...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.2431504
update_date: 2004-09-01 00:00:00
abstract::The last 20 years have been a remarkable era for biology and medicine. One of the most significant achievements has been the sequencing of the first human genomes, which has laid the foundation for profound insights into human genetics, the intricacies of regulation and development, and the forces of evolution. Incred...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.191684.115
update_date: 2015-10-01 00:00:00
abstract::We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that t...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.172702
update_date: 2002-06-01 00:00:00
abstract::Molecular evolution studies are usually based on the analysis of individual genes and thus reflect only small-range variations in genomic sequences. A complementary approach is to study the evolutionary history of rearrangements in entire genomes based on the analysis of gene orders. The progress in whole genome seque...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.3002305
update_date: 2005-01-01 00:00:00
abstract::We present a database of copy number variations (CNVs) detected in 2026 disease-free individuals, using high-density, SNP-based oligonucleotide microarrays. This large cohort, comprised mainly of Caucasians (65.2%) and African-Americans (34.2%), was analyzed for CNVs in a single study using a uniform array platform an...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.083501.108
update_date: 2009-09-01 00:00:00
abstract::The recent publication of the FANTOM mouse transcriptome has provided a unique opportunity to study the diversity of transcripts arising from a single gene locus. We have focused on the Gnas complex, as imprinting loci themselves provide unique insights into transcriptional regulation. Thirteen full-length cDNAs from ...
journal_title:Genome research
pub_type: journal article
doi:10.1101/gr.955503
update_date: 2003-06-01 00:00:00