Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set are of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200-kb-long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments.
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
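The abstract reports sensitivity and specificity without defining them. As a minimal sketch, assuming the nucleotide-level definitions conventionally used in gene-prediction evaluation (sensitivity = TP / (TP + FN), specificity = TP / (TP + FP), where a "positive" is a base predicted as coding), the toy example below shows how a spurious prediction in intergenic sequence can lower specificity while sensitivity stays high. The function names and coordinates are hypothetical and are not taken from the paper.

```python
def exon_mask(exons, length):
    """Return a per-base list of booleans: True where a base lies in any exon.

    `exons` holds (start, end) pairs, 0-based with end exclusive.
    """
    mask = [False] * length
    for start, end in exons:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def nucleotide_accuracy(true_exons, predicted_exons, length):
    """Nucleotide-level (sensitivity, specificity) under the assumed definitions."""
    truth = exon_mask(true_exons, length)
    pred = exon_mask(predicted_exons, length)
    tp = sum(t and p for t, p in zip(truth, pred))      # coding, predicted coding
    fn = sum(t and not p for t, p in zip(truth, pred))  # coding, missed
    fp = sum(p and not t for t, p in zip(truth, pred))  # noncoding, predicted coding
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity


# Toy case: the true exon is found exactly, but a second exon is
# predicted inside what is really intergenic sequence.
true_exons = [(100, 200)]
predicted_exons = [(100, 200), (500, 600)]
sn, sp = nucleotide_accuracy(true_exons, predicted_exons, length=1000)
```

In this toy case every true coding base is recovered, yet half of the predicted coding bases are false positives, which mirrors the qualitative pattern the abstract describes for long sequences with intergenic regions.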
journal_name: Genome Res
journal_title: Genome research
authors: Guigó R, Agarwal P, Abril JF, Burset M, Fickett JW
doi: 10.1101/gr.122800
subject: Has Abstract
pub_date: 2000-10-01 00:00:00
pages: 1631-42
issue: 10
eissn: 1088-9051
issn: 1549-5469
journal_volume: 10
pub_type: Journal Article

Related literature
GENOME RESEARCH literature collection
abstract::Analysis procedures are needed to extract useful information from the large amount of gene expression data that is becoming available. This work describes a set of analytical tools and their application to yeast cell cycle data. The components of our approach are (1) a similarity measure that reduces the number of fal...
journal_title:Genome research
pub_type: Journal Article, Review
doi:10.1101/gr.9.11.1106
update_date:1999-11-01 00:00:00
abstract::Fish-mammal genomic comparisons have proved powerful in identifying conserved noncoding elements likely to be cis-regulatory in nature, and the majority of those tested in vivo have been shown to act as tissue-specific enhancers associated with genes involved in transcriptional regulation of development. Although most...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.4143406
update_date:2006-04-01 00:00:00
abstract::The allele fraction (AF) distribution, occurrence rate, and evolutionary contribution of postzygotic single-nucleotide mosaicisms (pSNMs) remain largely unknown. In this study, we developed a mathematical model to describe the accumulation and AF drift of pSNMs during the development of multicellular organisms. By app...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.230003.117
update_date:2018-07-01 00:00:00
abstract::We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.081398.108
update_date:2009-02-01 00:00:00
abstract::The discovery of the genetic code was one of the most important advances of modern biology. But there is more to a DNA code than protein sequence; DNA carries signals for splicing, localization, folding, and regulation that are often embedded within the protein-coding sequence. In this issue, Itzkovitz and Alon show t...
journal_title:Genome research
pub_type: Comment, Journal Article, Review
doi:10.1101/gr.6144007
update_date:2007-04-01 00:00:00
abstract::Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates e...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.146233.112
update_date:2013-06-01 00:00:00
abstract::Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all fou...
journal_title:Genome research
pub_type: Journal Article
doi:
update_date:1999-07-01 00:00:00
abstract::Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and co...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.185579.114
update_date:2015-03-01 00:00:00
abstract::The study of genetic variability within natural populations of pathogens may provide insight into their evolution and pathogenesis. We used a Mycobacterium tuberculosis high-density oligonucleotide microarray to detect small-scale genomic deletions among 19 clinically and epidemiologically well-characterized isolates ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.166401
update_date:2001-04-01 00:00:00
abstract::We developed a high-throughput technique for the generation of cDNA libraries in the yeast Saccharomyces cerevisiae which enables the selection of cloned cDNA inserts containing open reading frames (ORFs). For direct screening of random-primed cDNA libraries, we have constructed a yeast shuttle/expression vector, the ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.181501
update_date:2001-10-01 00:00:00
abstract::Microsatellites--tandem repeats of short DNA motifs--are abundant in the human genome and have high mutation rates. While microsatellite instability is implicated in numerous genetic diseases, the molecular processes involved in their emergence and disappearance are still not well understood. Microsatellites are hypot...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.122937.111
update_date:2011-12-01 00:00:00
abstract::The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.246462.118
update_date:2019-12-01 00:00:00
abstract::Repetitive DNA is a significant component of eukaryotic genomes. We have developed a strategy to efficiently and accurately sequence repetitive DNA in the nematode Caenorhabditis elegans using integrated artificial transposons and automated fluorescent sequencing. Mapping and assembly tools represent important compone...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.7.5.551
update_date:1997-05-01 00:00:00
abstract::Phenotypic differences within populations and between closely related species are often driven by variation and evolution of gene expression. However, most analyses have focused on the effects of genomic variation at cis-regulatory elements such as promoters and enhancers that control transcriptional activity, and lit...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.212563.116
update_date:2017-03-01 00:00:00
abstract::The sequence of the first plant genome was completed and published at the end of 2000. This spawned a series of large-scale projects aimed at discovering the functions of the 25,000+ genes identified in Arabidopsis thaliana (Arabidopsis). This review summarizes progress made in the past five years and speculates about...
journal_title:Genome research
pub_type: Journal Article, Review
doi:10.1101/gr.3723405
update_date:2005-12-01 00:00:00
abstract::Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; due to its high content of repetitive DNA, in most genome projects, the Y chromosome sequence is fragmented into a large number of small, unma...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.156034.113
update_date:2013-11-01 00:00:00
abstract::In contrast to other animal cell lines, the chicken pre-B cell lymphoma line, DT40, exhibits a high level of homologous recombination, which can be exploited to generate site-specific alterations in defined target genes or regions. In addition, the ability to generate human/chicken monochromosomal hybrids in the DT40 ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.8.6.666
update_date:1998-06-01 00:00:00
abstract::Drug development efforts against cancer are often hampered by the complex properties of signaling networks. Here we combined the results of an RNAi screen targeting the cellular signaling machinery, with graph theoretical analysis to extract the core modules that process both mitogenic and oncogenic signals to drive c...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.116145.110
update_date:2011-12-01 00:00:00
abstract::Using both env and long terminal repeat (LTR) sequences, with maximal representation of genetic diversity within primate strains, we revise and expand the unique evolutionary history of human and simian T-cell leukemia/lymphotropic viruses (HTLV/STLV). Based on the robust application of three different phylogenetic al...
journal_title:Genome research
pub_type: Journal Article, Review
doi:
update_date:1999-06-01 00:00:00
abstract::Intra-tumoral genetic heterogeneity has been characterized across cancers by genome sequencing of bulk tumors, including chronic lymphocytic leukemia (CLL). In order to more accurately identify subclones, define phylogenetic relationships, and probe genotype-phenotype relationships, we developed methods for targeted m...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.217331.116
update_date:2017-08-01 00:00:00
abstract::Most mammalian RNA polymerase II initiation events occur at CpG islands, which are rich in CpGs and devoid of DNA methylation. Despite their relevance for gene regulation, it is unknown to what extent the CpG dinucleotide itself actually contributes to promoter activity. To address this question, we determined the tra...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.241653.118
update_date:2019-04-01 00:00:00
abstract::Establishment of spatial coordinates during Drosophila embryogenesis relies on differential regulatory activity of axis patterning enhancers. Concentration gradients of activator and repressor transcription factors (TFs) provide positional information to each enhancer, which in turn promotes transcription of a target ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.242362.118
update_date:2019-05-01 00:00:00
abstract::DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexit...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.313703
update_date:2003-02-01 00:00:00
abstract::Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger s...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.168450.113
update_date:2014-04-01 00:00:00
abstract::Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased ...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.202903.115
update_date:2016-11-01 00:00:00
abstract::Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of in...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.239822.118
update_date:2019-06-01 00:00:00
abstract::Primate pericentromeric regions recently have been shown to exhibit extraordinary evolutionary plasticity. In this paper we report an additional peculiar feature of these regions that we discovered while analyzing, by FISH, the evolutionary conservation of primate phylogenetic chromosome IX. If the position of the cen...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.9.12.1184
update_date:1999-12-01 00:00:00
abstract::Mouse chromosome 7F4/F5, where the imprinting domain is located, is syntenic to human 11p15.5, the locus for Beckwith-Wiedemann syndrome. The domain is thought to consist of the two subdomains Kip2 (p57(kip2))/Lit1 and Igf2/H19. Because DNA methylation is believed to be a key factor in genomic imprinting, we performed...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.110702
update_date:2002-12-01 00:00:00
abstract::A new algorithm, WABA, was developed for doing large-scale alignments between genomic DNA of different species. WABA was used to align 8 million bases of Caenorhabditis briggsae genomic DNA against the entire 97-million-base Caenorhabditis elegans genome. The alignment, including C. briggsae homologs of 154 geneticall...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.10.8.1115
update_date:2000-08-01 00:00:00
abstract::A subset of colorectal cancers was postulated to have the CpG island methylator phenotype (CIMP), a higher propensity for CpG island DNA methylation. The validity of CIMP, its molecular basis, and its prognostic value remain highly controversial. Using MBD-isolated genome sequencing, we mapped and compared genome-wide...
journal_title:Genome research
pub_type: Journal Article
doi:10.1101/gr.122788.111
update_date:2012-02-01 00:00:00