Abstract:
GeneID is a program for predicting genes in anonymous genomic sequences, designed with a hierarchical structure. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from these sites. Each exon is scored as the sum of the scores of its defining sites plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, the gene structure is assembled from the set of predicted exons so as to maximize the sum of the scores of the assembled exons. In this paper we describe how the PWMs for sites and the Markov model of coding DNA were derived for Drosophila melanogaster. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions is comparable to that of other existing tools, but that GeneID is likely to be more efficient in terms of speed and memory usage.
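The three-step hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the GeneID implementation: the function names, the dictionary-based PWM and Markov model layouts, the default Markov order, and the simple quadratic chaining routine are all assumptions made for illustration, and the sketch ignores reading frame, strand, and exon type.

```python
import math


def pwm_score(pwm, site):
    """Step 1: score a candidate site as the sum of per-position PWM entries.

    pwm is a list of dicts (one per position) mapping base -> log-odds score.
    """
    return sum(column[base] for column, base in zip(pwm, site))


def markov_llr(seq, coding, background, order=1):
    """Log-likelihood ratio of seq under a coding vs. a background Markov chain.

    Each model maps (context, base) -> transition probability.
    The order used here is an illustrative assumption.
    """
    total = 0.0
    for i in range(order, len(seq)):
        key = (seq[i - order:i], seq[i])
        total += math.log(coding[key] / background[key])
    return total


def exon_score(start_site_score, end_site_score, coding_llr):
    """Step 2: exon score = scores of the two defining sites + coding LLR."""
    return start_site_score + end_site_score + coding_llr


def assemble(exons):
    """Step 3: choose non-overlapping exons that maximize the summed score.

    exons is a list of (start, end, score) tuples. Simple O(n^2) dynamic
    programming over exons sorted by end coordinate (hypothetical helper).
    """
    ordered = sorted(exons, key=lambda e: e[1])
    best = []  # best[i] = (total score, chain) for the best chain ending at ordered[i]
    for i, (start, _end, score) in enumerate(ordered):
        top_total, top_chain = 0.0, []
        for j in range(i):
            # A predecessor must end before the current exon starts.
            if ordered[j][1] < start and best[j][0] > top_total:
                top_total, top_chain = best[j]
        best.append((top_total + score, top_chain + [ordered[i]]))
    return max(best, key=lambda b: b[0], default=(0.0, []))


if __name__ == "__main__":
    # Toy two-position PWM and toy exons, invented for the demonstration.
    toy_pwm = [
        {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    ]
    print(pwm_score(toy_pwm, "AG"))
    print(assemble([(0, 10, 5.0), (5, 15, 6.0), (20, 30, 4.0)]))
```

Because each exon score is additive, the assembly step reduces to finding the highest-scoring chain of compatible exons, which is why a dynamic programming formulation fits naturally.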
journal_name: Genome Res
journal_title: Genome research
authors: Parra G, Blanco E, Guigó R
doi: 10.1101/gr.10.4.511
subject: Has Abstract
pub_date: 2000-04-01
pages: 511-5
issue: 4
eissn: 1088-9051
issn: 1549-5469
journal_volume: 10
pub_type: Journal article
Related literature
Genome Research article collection
abstract::The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.106344.110
update_date: 2010-11-01
abstract::Meiotic DNA double-stranded breaks (DSBs) initiate genetic recombination in discrete areas of the genome called recombination hotspots. DSBs can be directly mapped using chromatin immunoprecipitation followed by sequencing (ChIP-seq). Nevertheless, the genome-wide mapping of recombination hotspots in mammals is still ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.130583.111
update_date: 2012-05-01
abstract::Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. B...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.229202
update_date: 2002-04-01
abstract::Noncoding RNA (ncRNA) constitutes a significant portion of the mammalian transcriptome. Emerging evidence suggests that it regulates gene expression in cis or trans by modulating the chromatin structure. To uncover the functional role of ncRNA in chromatin organization, we deep sequenced chromatin-associated RNAs (CAR...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.103473.109
update_date: 2010-07-01
abstract::The need to translate genes to function has positioned the rat as an invaluable animal model for genomic research. The significant increase in genomic resources in recent years has had an immediate functional application in the rat. Many of the resources for translational research are already in place and are ready to...
journal_title:Genome research
pub_type: Journal article, Review
doi:10.1101/gr.3744005
update_date: 2005-12-01
abstract::Metagenomic projects generate short, overlapping fragments of DNA sequence, each deriving from a different individual. We report a new method for inferring the scaled mutation rate, theta = 2Neu, and the scaled exponential growth rate, R = Ner, from the site-frequency spectrum of these data while accounting for sequen...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.5431206
update_date: 2006-10-01
abstract::Nutrient availability profoundly influences gene expression. Many animal genes encode multiple transcript isoforms, yet the effect of nutrient availability on transcript isoform expression has not been studied in genome-wide fashion. When Caenorhabditis elegans larvae hatch without food, they arrest development in the...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.133587.111
update_date: 2012-10-01
abstract::Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward thi...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.212696.116
update_date: 2017-06-01
abstract::Centromeres pose an evolutionary paradox: strongly conserved in function but rapidly changing in sequence and structure. However, in the absence of damage, centromere locations are usually conserved within a species. We report here that isolates of the pathogenic yeast species Candida parapsilosis show within-species ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.257816.119
update_date: 2020-05-01
abstract::The regulation of gene expression is mediated at the transcriptional level by enhancer regions that are bound by sequence-specific transcription factors (TFs). Recent studies have shown that the in vivo binding sites of single TFs differ between developmental or cellular contexts. How this context-specific binding is ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.132811.111
update_date: 2012-10-01
abstract::Aberrations of protein-coding genes are a focus of cancer genomics; however, the impact of oncogenes on expression of the ~50% of transcripts without protein-coding potential, including long noncoding RNAs (lncRNAs), has been largely uncharacterized. Activating mutations in the BRAF oncogene are present in >70% of mel...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.140061.112
update_date: 2012-06-01
abstract::We have developed a simplified method for multiplex PCR based on the use of chimeric primers. Each primer contains a 3' region complementary to sequence-specific recognition sites and a 5' region made up of an unrelated 20-nucleotide sequence. Identical reaction conditions, cycling times, and annealing temperatures ha...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.5.5.488
update_date: 1995-12-01
abstract::In cancer cells, aberrant DNA methylation is commonly associated with transcriptional alterations, including silencing of tumor suppressor genes. However, multiple epigenetic mechanisms, including polycomb repressive marks, contribute to gene deregulation in cancer. To dissect the relative contribution of DNA methylat...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.249219.119
update_date: 2019-10-01
abstract::CTCF is an architectural protein with a critical role in connecting higher-order chromatin folding in pluripotent stem cells. Recent reports have suggested that CTCF binding is more dynamic during development than previously appreciated. Here, we set out to understand the extent to which shifts in genome-wide CTCF occ...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.215160.116
update_date: 2017-07-01
abstract::Early embryogenesis is characterized by the maternal to zygotic transition (MZT), in which maternally deposited messenger RNAs are degraded while zygotic transcription begins. Before the MZT, post-transcriptional gene regulation by RNA-binding proteins (RBPs) is the dominant force in embryo patterning. We used two mRN...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.200386.115
update_date: 2016-07-01
abstract::Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and trinucleotide repeats revealed an inverse co...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.078303.108
update_date: 2008-10-01
abstract::In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational met...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.077065.108
update_date: 2008-08-01
abstract::Theory is developed for the process of sequencing randomly selected large-insert clones. Genome size, library depth, clone size, and clone distribution are considered relevant properties and perfect overlap detection for contig assembly is assumed. Genome-specific and nonrandom effects are neglected. Order of magnitud...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.gr-1339r
update_date: 2001-02-01
abstract::Analysis procedures are needed to extract useful information from the large amount of gene expression data that is becoming available. This work describes a set of analytical tools and their application to yeast cell cycle data. The components of our approach are (1) a similarity measure that reduces the number of fal...
journal_title:Genome research
pub_type: Journal article, Review
doi:10.1101/gr.9.11.1106
update_date: 1999-11-01
abstract::Translocations are known to affect the expression of genes at the breakpoints and, in the case of unbalanced translocations, alter the gene copy number. However, a comprehensive understanding of the functional impact of this class of variation is lacking. Here, we have studied the effect of balanced chromosomal rearra...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.103622.109
update_date: 2010-05-01
abstract::The reported human genome sequence includes about 400 gaps of unknown sequence that were not found in the bacterial artificial chromosome (BAC) and cosmid libraries used for sequencing of the genome. These missing sequences correspond to approximately 1% of euchromatic regions of the human genome. Gap filling is a lab...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.1929904
update_date: 2004-02-01
abstract::Loss of heterozygosity (LOH) and copy number alteration (CNA) feature prominently in the somatic genomic landscape of tumors. As such, karyotypic aberrations in cancer genomes have been studied extensively to discover novel oncogenes and tumor-suppressor genes. Advances in sequencing technology have enabled the cost-e...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.137570.112
update_date: 2012-10-01
abstract::Detecting rare sequence variants in genomic DNA is central to the analysis of de novo mutation and recombination events and the detection of rare pathological mutations in mixed cell populations. Current PCR techniques suffer from noise that limits detection to variants present at a frequency of at least 10^-4 to 10^-5...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.1214603
update_date: 2003-10-01
abstract::We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that t...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.172702
update_date: 2002-06-01
abstract::Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is import...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.183801
update_date: 2001-10-01
abstract::Isochromosome 17q, or i(17q), is one of the most frequent nonrandom changes occurring in human neoplasia. Most of the i(17q) breakpoints cluster within an approximately 240-kb interval located in the Smith-Magenis syndrome common deletion region in 17p11.2. The breakpoint cluster region is characterized by a complex ar...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.080697.108
update_date: 2008-11-01
abstract::Recent evidence from proteomics and deep massively parallel sequencing studies has revealed that eukaryotic genomes contain substantial numbers of as-yet-uncharacterized open reading frames (ORFs). We define these uncharacterized ORFs as novel ORFs (nORFs). nORFs in humans are mostly under 100 codons and are found in...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.263202.120
update_date: 2021-01-19
abstract::Since complete redundancy between extant duplicates (paralogs) is evolutionarily unfavorable, some degree of functional congruency is eventually lost. However, in budding yeast, experimental evidence collected for duplicated metabolic enzymes and in global physical interaction surveys had suggested widespread function...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.076174.108
update_date: 2008-07-01
abstract::Duplication of the genome in mammalian cells occurs in a defined temporal order referred to as its replication-timing (RT) program. RT changes dynamically during development, regulated in units of 400-800 kb referred to as replication domains (RDs). Changes in RT are generally coordinated with transcriptional competen...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.187989.114
update_date: 2015-08-01
abstract::Extracellular cues play critical roles in the establishment of the epigenome during development and may also contribute to epigenetic perturbations found in disease states. The direct role of the local tissue environment on the post-development human epigenome, however, remains unclear due to limitations in studies of...
journal_title:Genome research
pub_type: Journal article
doi:10.1101/gr.166439.113
update_date: 2014-04-01