The Ensembl automatic gene annotation system.

Abstract:

As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.

journal_name

Genome Res

journal_title

Genome research

authors

Curwen V,Eyras E,Andrews TD,Clarke L,Mongin E,Searle SM,Clamp M

doi

10.1101/gr.1858004

subject

Has Abstract

pub_date

2004-05-01 00:00:00

pages

942-50

issue

5

eissn

1549-5469

issn

1088-9051

pii

14/5/942

journal_volume

14

pub_type

Journal article
  • Uncovering cis-regulatory sequence requirements for context-specific transcription factor binding.

    abstract::The regulation of gene expression is mediated at the transcriptional level by enhancer regions that are bound by sequence-specific transcription factors (TFs). Recent studies have shown that the in vivo binding sites of single TFs differ between developmental or cellular contexts. How this context-specific binding is ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.132811.111

    authors: Yáñez-Cuna JO,Dinh HQ,Kvon EZ,Shlyueva D,Stark A

    update_date:2012-10-01 00:00:00

  • Characterization and dynamics of pericentromere-associated domains in mice.

    abstract::Despite recent progress in genome topology knowledge, the role of repeats, which make up the majority of mammalian genomes, remains elusive. Satellite repeats are highly abundant sequences that cluster around centromeres, attract pericentromeric heterochromatin, and aggregate into nuclear chromocenters. These nuclear ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.186643.114

    authors: Wijchers PJ,Geeven G,Eyres M,Bergsma AJ,Janssen M,Verstegen M,Zhu Y,Schell Y,Vermeulen C,de Wit E,de Laat W

    update_date:2015-07-01 00:00:00

  • A comprehensive transcript map of the mouse Gnas imprinted complex.

    abstract::The recent publication of the FANTOM mouse transcriptome has provided a unique opportunity to study the diversity of transcripts arising from a single gene locus. We have focused on the Gnas complex, as imprinting loci themselves provide unique insights into transcriptional regulation. Thirteen full-length cDNAs from ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.955503

    authors: Holmes R,Williamson C,Peters J,Denny P,Wells C,RIKEN GER Group.,GSL Members.

    update_date:2003-06-01 00:00:00

  • Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules.

    abstract::Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates e...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.146233.112

    authors: Roy S,Wapinski I,Pfiffner J,French C,Socha A,Konieczka J,Habib N,Kellis M,Thompson D,Regev A

    update_date:2013-06-01 00:00:00

  • Genome-scale cloning and expression of individual open reading frames using topoisomerase I-mediated ligation.

    abstract::The in vitro cloning of DNA molecules traditionally uses PCR amplification or site-specific restriction endonucleases to generate linear DNA inserts with defined termini and requires DNA ligase to covalently join those inserts to vectors with the corresponding ends. We have used the properties of Vaccinia DNA topoisom...

    journal_title:Genome research

    pub_type: Journal article

    doi:

    authors: Heyman JA,Cornthwaite J,Foncerrada L,Gilmore JR,Gontang E,Hartman KJ,Hernandez CL,Hood R,Hull HM,Lee WY,Marcil R,Marsh EJ,Mudd KM,Patino MJ,Purcell TJ,Rowland JJ,Sindici ML,Hoeffler JP

    update_date:1999-04-01 00:00:00

  • The genome sequence of Mycoplasma mycoides subsp. mycoides SC type strain PG1T, the causative agent of contagious bovine pleuropneumonia (CBPP).

    abstract::Mycoplasma mycoides subsp. mycoides SC (MmymySC) is the etiological agent of contagious bovine pleuropneumonia (CBPP), a highly contagious respiratory disease in cattle. The genome of MmymySC type strain PG1(T) has been sequenced to map all the genes and to facilitate further studies regarding the cell function of the ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.1673304

    authors: Westberg J,Persson A,Holmberg A,Goesmann A,Lundeberg J,Johansson KE,Pettersson B,Uhlén M

    update_date:2004-02-01 00:00:00

  • Immune signatures correlate with L1 retrotransposition in gastrointestinal cancers.

    abstract::Long interspersed nuclear element-1 (LINE-1 or L1) retrotransposons are normally suppressed in somatic tissues mainly due to DNA methylation and antiviral defense. However, the mechanism to suppress L1s may be disrupted in cancers, thus allowing L1s to act as insertional mutagens and cause genomic rearrangement and in...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.231837.117

    authors: Jung H,Choi JK,Lee EA

    update_date:2018-08-01 00:00:00

  • Properties of overlapping genes are conserved across microbial genomes.

    abstract::There are numerous examples from the genomes of viruses, mitochondria, and chromosomes that adjacent genes can overlap, sharing at least one nucleotide. Overlaps have been hypothesized to be involved in genome size minimization and as a regulatory mechanism of gene expression. Here we show that overlapping genes are a...

    journal_title:Genome research

    pub_type: Letter

    doi:10.1101/gr.2433104

    authors: Johnson ZI,Chisholm SW

    update_date:2004-11-01 00:00:00

  • Global analysis of protein homomerization in Saccharomyces cerevisiae.

    abstract::In vivo analyses of the occurrence, subcellular localization, and dynamics of protein-protein interactions (PPIs) are important issues in functional proteomic studies. The bimolecular fluorescence complementation (BiFC) assay has many advantages in that it provides a reliable way to detect PPIs in living cells with mi...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.231860.117

    authors: Kim Y,Jung JP,Pack CG,Huh WK

    update_date:2019-01-01 00:00:00

  • Nature and structure of human genes that generate retropseudogenes.

    abstract::The human genome is estimated to contain 23,000 to 33,000 retropseudogenes. To study the properties of genes giving rise to these retroelements, we compared the structure and expression of genes with or without known retropseudogenes. Four main features have emerged from the analysis of 181 genes associated to retrops...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.10.5.672

    authors: Gonçalves I,Duret L,Mouchiroud D

    update_date:2000-05-01 00:00:00

  • metaSPAdes: a new versatile metagenomic assembler.

    abstract::While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amp...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.213959.116

    authors: Nurk S,Meleshko D,Korobeynikov A,Pevzner PA

    update_date:2017-05-01 00:00:00

  • Meiotic recombination generates rich diversity in NK cell receptor genes, alleles, and haplotypes.

    abstract::Natural killer (NK) cells contribute to the essential functions of innate immunity and reproduction. Various genes encode NK cell receptors that recognize the major histocompatibility complex (MHC) Class I molecules expressed by other cells. For primate NK cells, the killer-cell immunoglobulin-like receptors (KIR) are...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.085738.108

    authors: Norman PJ,Abi-Rached L,Gendzekhadze K,Hammond JA,Moesta AK,Sharma D,Graef T,McQueen KL,Guethlein LA,Carrington CV,Chandanayingyong D,Chang YH,Crespí C,Saruhan-Direskeneli G,Hameed K,Kamkamidze G,Koram KA,Layrisse Z,Ma

    update_date:2009-05-01 00:00:00

  • Gene expression profiling of human breast tissue samples using SAGE-Seq.

    abstract::We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.108217.110

    authors: Wu ZJ,Meyer CA,Choudhury S,Shipitsin M,Maruyama R,Bessarabova M,Nikolskaya T,Sukumar S,Schwartzman A,Liu JS,Polyak K,Liu XS

    update_date:2010-12-01 00:00:00

  • The distribution of variation in regulatory gene segments, as present in MHC class II promoters.

    abstract::Diversity in the antigen-binding receptors of the immune system has long been a primary interest of biologists. Recently it has been suggested that polymorphism in regulatory (noncoding) gene segments is of substantial importance as well. Here, we survey the level of variation in MHC class II gene promoters in man and...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.8.2.124

    authors: Cowell LG,Kepler TB,Janitz M,Lauster R,Mitchison NA

    update_date:1998-02-01 00:00:00

  • Comparative gene mapping: a fine-scale survey of chromosome rearrangements between ruminants and humans.

    abstract::A total of 202 genes were cytogenetically mapped to goat chromosomes, multiplying by five the total number of regional gene localizations in domestic ruminants (255). This map encompasses 249 and 173 common anchor loci regularly spaced along human and murine chromosomes, respectively, which makes it possible to perfor...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.8.9.901

    authors: Schibler L,Vaiman D,Oustry A,Giraud-Delville C,Cribiu EP

    update_date:1998-09-01 00:00:00

  • Translation initiation downstream from annotated start codons in human mRNAs coevolves with the Kozak context.

    abstract::Eukaryotic translation initiation involves preinitiation ribosomal complex 5'-to-3' directional probing of mRNA for codons suitable for starting protein synthesis. The recognition of codons as starts depends on the codon identity and on its immediate nucleotide context known as Kozak context. When the context is weak ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.257352.119

    authors: Benitez-Cantos MS,Yordanova MM,O'Connor PBF,Zhdanov AV,Kovalchuk SI,Papkovsky DB,Andreev DE,Baranov PV

    update_date:2020-07-01 00:00:00

  • A bioinformatics-based strategy identifies c-Myc and Cdc25A as candidates for the Apmt mammary tumor latency modifiers.

    abstract::The epistatically interacting modifier loci (Apmt1 and Apmt2) accelerate the polyoma Middle-T (PyVT)-induced mammary tumor. To identify potential candidate genes for these loci, a combined bioinformatics and genomics strategy was used. On the basis of the assumption that the loci were functioning in the same or intersecting pat...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.210502

    authors: Cozma D,Lukes L,Rouse J,Qiu TH,Liu ET,Hunter KW

    update_date:2002-06-01 00:00:00

  • Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error.

    abstract::It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.097543.109

    authors: Liu X,Fu YX,Maxwell TJ,Boerwinkle E

    update_date:2010-01-01 00:00:00

  • Whole-genome sequence assembly for mammalian genomes: Arachne 2.

    abstract::We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal change...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.828403

    authors: Jaffe DB,Butler J,Gnerre S,Mauceli E,Lindblad-Toh K,Mesirov JP,Zody MC,Lander ES

    update_date:2003-01-01 00:00:00

  • Multiplex mapping of chromatin accessibility and DNA methylation within targeted single molecules identifies epigenetic heterogeneity in neural stem cells and glioblastoma.

    abstract::Human tumors are comprised of heterogeneous cell populations that display diverse molecular and phenotypic features. To examine the extent to which epigenetic differences contribute to intratumoral cellular heterogeneity, we have developed a high-throughput method, termed MAPit-patch. The method uses multiplexed ampli...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.161737.113

    authors: Nabilsi NH,Deleyrolle LP,Darst RP,Riva A,Reynolds BA,Kladde MP

    update_date:2014-02-01 00:00:00

  • Theories and applications for sequencing randomly selected clones.

    abstract::Theory is developed for the process of sequencing randomly selected large-insert clones. Genome size, library depth, clone size, and clone distribution are considered relevant properties and perfect overlap detection for contig assembly is assumed. Genome-specific and nonrandom effects are neglected. Order of magnitud...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.gr-1339r

    authors: Wendl MC,Marra MA,Hillier LW,Chinwalla AT,Wilson RK,Waterston RH

    update_date:2001-02-01 00:00:00

  • rVista for comparative sequence-based discovery of functional transcription factor binding sites.

    abstract::Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVista, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) a...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.225502

    authors: Loots GG,Ovcharenko I,Pachter L,Dubchak I,Rubin EM

    update_date:2002-05-01 00:00:00

  • Spidey: a tool for mRNA-to-genomic alignments.

    abstract::We have developed a computer program that aligns spliced sequences to genomic sequences, using local alignment algorithms and heuristics to put together a global spliced alignment. Spidey can produce reliable alignments quickly, even when confronted with noise from alternative splicing, polymorphisms, sequencing error...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.195301

    authors: Wheelan SJ,Church DM,Ostell JM

    update_date:2001-11-01 00:00:00

  • Topologically associating domains and their long-range contacts are established during early G1 coincident with the establishment of the replication-timing program.

    abstract::Mammalian genomes are partitioned into domains that replicate in a defined temporal order. These domains can replicate at similar times in all cell types (constitutive) or at cell type-specific times (developmental). Genome-wide chromatin conformation capture (Hi-C) has revealed sub-megabase topologically associating ...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.183699.114

    authors: Dileep V,Ay F,Sima J,Vera DL,Noble WS,Gilbert DM

    update_date:2015-08-01 00:00:00

  • Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes.

    abstract::Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mut...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.239186.118

    authors: Maher GJ,Ralph HK,Ding Z,Koelling N,Mlcochova H,Giannoulatou E,Dhami P,Paul DS,Stricker SH,Beck S,McVean G,Wilkie AOM,Goriely A

    update_date:2018-12-01 00:00:00

  • Birth and expression evolution of mammalian microRNA genes.

    abstract::MicroRNAs (miRNAs) are major post-transcriptional regulators of gene expression, yet their origins and functional evolution in mammals remain little understood due to the lack of appropriate comparative data. Using RNA sequencing, we have generated extensive and comparable miRNA data for five organs in six species tha...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.140269.112

    authors: Meunier J,Lemoine F,Soumillon M,Liechti A,Weier M,Guschanski K,Hu H,Khaitovich P,Kaessmann H

    update_date:2013-01-01 00:00:00

  • Genome-wide map of regulatory interactions in the human genome.

    abstract::Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencin...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.176586.114

    authors: Heidari N,Phanstiel DH,He C,Grubert F,Jahanbani F,Kasowski M,Zhang MQ,Snyder MP

    update_date:2014-12-01 00:00:00

  • The region surrounding the PKD1 gene: a 700-kb P1 contig from a YAC-deficient interval.

    abstract::As part of an effort to identify the gene responsible for the predominant form of polycystic kidney disease (PKD1), we used a gridded human P1 library for contig assembly. The interval of interest, a 700-kb segment on chromosome 16p13.3, can be physically delineated by the genetic markers D16S125 and D16S84 and chromo...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.6.6.515

    authors: Dackowski WR,Connors TD,Bowe AE,Stanton V Jr,Housman D,Doggett NA,Landes GM,Klinger KW

    update_date:1996-06-01 00:00:00

  • Probing genomic diversity and evolution of Escherichia coli O157 by single nucleotide polymorphisms.

    abstract::Infections by Shiga toxin-producing Escherichia coli O157:H7 (STEC O157) are the predominant cause of bloody diarrhea and hemolytic uremic syndrome in the United States. In silico comparison of the two complete STEC O157 genomes (Sakai and EDL933) revealed a strikingly high level of sequence identity in orthologous pr...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.4759706

    authors: Zhang W,Qi W,Albert TJ,Motiwala AS,Alland D,Hyytia-Trees EK,Ribot EM,Fields PI,Whittam TS,Swaminathan B

    update_date:2006-06-01 00:00:00

  • Random mutagenesis of proximal mouse chromosome 5 uncovers predominantly embryonic lethal mutations.

    abstract::A region-specific ENU mutagenesis screen was conducted to elucidate the functional content of proximal mouse Chr 5. We used the visibly marked, recessive, lethal inversion Rump White (Rw) as a balancer in a three-generation breeding scheme to identify recessive mutations within the approximately 50 megabases spanned b...

    journal_title:Genome research

    pub_type: Journal article

    doi:10.1101/gr.3826505

    authors: Wilson L,Ching YH,Farias M,Hartford SA,Howell G,Shao H,Bucan M,Schimenti JC

    update_date:2005-08-01 00:00:00