Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data.

Abstract:

A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
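The comparison step named in the abstract (Smith-Waterman alignment of an EST against a known transcript, with a check for reversed clones) can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring values (`match=2`, `mismatch=-1`, `gap=-2`), the linear gap penalty, and the function names `smith_waterman_score`, `reverse_complement`, and `best_strand` are all assumptions chosen for illustration.

```python
# Illustrative sketch only; scoring parameters and function names are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_strand(est, transcript):
    """Score the EST on both strands and report the better orientation."""
    fwd = smith_waterman_score(est, transcript)
    rev = smith_waterman_score(reverse_complement(est), transcript)
    return ("forward", fwd) if fwd >= rev else ("reverse", rev)
```

A read annotated as a 3' end that scores best in the unexpected orientation would be a candidate reversed clone under this simplified scheme, although the abstract notes that some such cases turn out to be overlapping genes on the opposite strand.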

journal_name

Genome Res

journal_title

Genome research

authors

Aaronson JS,Eckman B,Blevins RA,Borkowski JA,Myerson J,Imran S,Elliston KO

doi

10.1101/gr.6.9.829

pub_date

1996-09-01 00:00:00

pages

829-45

issue

9

eissn

1549-5469

issn

1088-9051

journal_volume

6

pub_type

Journal Article
  • New insulin-like proteins with atypical disulfide bond pattern characterized in Caenorhabditis elegans by comparative sequence analysis and homology modeling.

    abstract::We have identified three new families of insulin homologs in Caenorhabditis elegans. In two of these families, concerted mutations suggest that an additional disulfide bond links B and A domains, and that the A-domain internal disulfide bond is substituted by a hydrophobic interaction. Homology modeling remarkably con...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.4.348

    authors: Duret L,Guex N,Peitsch MC,Bairoch A

    updated: 1998-04-01 00:00:00

  • Characterization of complex chromosomal rearrangements by targeted capture and next-generation sequencing.

    abstract::Translocations are a common class of chromosomal aberrations and can cause disease by physically disrupting genes or altering their regulatory environment. Some translocations, apparently balanced at the microscopic level, include deletions, duplications, insertions, or inversions at the molecular level. Traditionally...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.122986.111

    authors: Sobreira NL,Gnanakkan V,Walsh M,Marosy B,Wohler E,Thomas G,Hoover-Fong JE,Hamosh A,Wheelan SJ,Valle D

    updated: 2011-10-01 00:00:00

  • Optical mapping of BAC clones from the human Y chromosome DAZ locus.

    abstract::The accurate mapping of clones derived from genomic regions containing complex arrangements of repeated elements presents special problems for DNA sequencers. Recent advances in the automation of optical mapping have enabled us to map a set of 16 BAC clones derived from the DAZ locus of the human Y chromosome long arm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.112100

    authors: Giacalone J,Delobette S,Gibaja V,Ni L,Skiadas Y,Qi R,Edington J,Lai Z,Gebauer D,Zhao H,Anantharaman T,Mishra B,Brown LG,Saxena R,Page DC,Schwartz DC

    updated: 2000-09-01 00:00:00

  • Systematic recovery and analysis of full-ORF human cDNA clones.

    abstract::The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been id...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2473704

    authors: Baross A,Butterfield YS,Coughlin SM,Zeng T,Griffith M,Griffith OL,Petrescu AS,Smailus DE,Khattra J,McDonald HL,McKay SJ,Moksa M,Holt RA,Marra MA

    updated: 2004-10-01 00:00:00

  • The repetitive landscape of the chicken genome.

    abstract::Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, an...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2438004

    authors: Wicker T,Robertson JS,Schulze SR,Feltus FA,Magrini V,Morrison JA,Mardis ER,Wilson RK,Peterson DG,Paterson AH,Ivarie R

    updated: 2005-01-01 00:00:00

  • High resolution mapping of modified DNA nucleobases using excision repair enzymes.

    abstract::The incorporation and creation of modified nucleobases in DNA have profound effects on genome function. We describe methods for mapping positions and local content of modified DNA nucleobases in genomic DNA. We combined in vitro nucleobase excision with massively parallel DNA sequencing (Excision-seq) to determine the...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.174052.114

    authors: Bryan DS,Ransom M,Adane B,York K,Hesselberth JR

    updated: 2014-09-01 00:00:00

  • The multicomparative 2-n-way genome suite.

    abstract::To effectively analyze the increasing amounts of available genomic data, improved comparative analytical tools that are accessible to and applicable by a broad scientific community are essential. We built the "2-n-way" software suite to provide a fundamental and innovative processing framework for revealing and compar...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.262261.120

    authors: Churakov G,Zhang F,Grundmann N,Makalowski W,Noll A,Doronina L,Schmitz J

    updated: 2020-10-01 00:00:00

  • Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules.

    abstract::Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates e...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.146233.112

    authors: Roy S,Wapinski I,Pfiffner J,French C,Socha A,Konieczka J,Habib N,Kellis M,Thompson D,Regev A

    updated: 2013-06-01 00:00:00

  • BLAT--the BLAST-like alignment tool.

    abstract::Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. B...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.229202

    authors: Kent WJ

    updated: 2002-04-01 00:00:00

  • Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data.

    abstract::Transposable elements (TEs) are an integral part of the host transcriptome. TE-containing noncoding RNAs (ncRNAs) show considerable tissue specificity and play important roles during development, including stem cell maintenance and cell differentiation. Recent advances in single-cell RNA-seq (scRNA-seq) revolutionized...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.265173.120

    authors: Shao W,Wang T

    updated: 2021-01-01 00:00:00

  • Exploring the human genome with functional maps.

    abstract::Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular prot...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.082214.108

    authors: Huttenhower C,Haley EM,Hibbs MA,Dumeaux V,Barrett DR,Coller HA,Troyanskaya OG

    updated: 2009-06-01 00:00:00

  • Genome-scale identification of cellular pathways required for cell surface recognition.

    abstract::Interactions mediated by cell surface receptors initiate important instructive signaling cues but can be difficult to detect in biochemical assays because they are often highly transient and membrane-embedded receptors are difficult to solubilize in their native conformation. Here, we address these biochemical challen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.231183.117

    authors: Sharma S,Bartholdson SJ,Couch ACM,Yusa K,Wright GJ

    updated: 2018-09-01 00:00:00

  • Exploring expression data: identification and analysis of coexpressed genes.

    abstract::Analysis procedures are needed to extract useful information from the large amount of gene expression data that is becoming available. This work describes a set of analytical tools and their application to yeast cell cycle data. The components of our approach are (1) a similarity measure that reduces the number of fal...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.9.11.1106

    authors: Heyer LJ,Kruglyak S,Yooseph S

    updated: 1999-11-01 00:00:00

  • A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics.

    abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115360.110

    authors: Moltke I,Albrechtsen A,Hansen TV,Nielsen FC,Nielsen R

    updated: 2011-07-01 00:00:00

  • Why do human diversity levels vary at a megabase scale?

    abstract::Levels of diversity vary across the human genome. This variation is caused by two forces: differences in mutation rates and the differential impact of natural selection. Pertinent to the question of the relative importance of these two forces is the observation that both diversity within species and interspecies diver...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3461105

    authors: Hellmann I,Prüfer K,Ji H,Zody MC,Pääbo S,Ptak SE

    updated: 2005-09-01 00:00:00

  • A matter of life or death: how microsatellites emerge in and vanish from the human genome.

    abstract::Microsatellites--tandem repeats of short DNA motifs--are abundant in the human genome and have high mutation rates. While microsatellite instability is implicated in numerous genetic diseases, the molecular processes involved in their emergence and disappearance are still not well understood. Microsatellites are hypot...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.122937.111

    authors: Kelkar YD,Eckert KA,Chiaromonte F,Makova KD

    updated: 2011-12-01 00:00:00

  • Coevolution within a transcriptional network by compensatory trans and cis mutations.

    abstract::Transcriptional networks have been shown to evolve very rapidly, prompting questions as to how such changes arise and are tolerated. Recent comparisons of transcriptional networks across species have implicated variations in the cis-acting DNA sequences near genes as the main cause of divergence. What is less clear is...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.111765.110

    authors: Kuo D,Licon K,Bandyopadhyay S,Chuang R,Luo C,Catalana J,Ravasi T,Tan K,Ideker T

    updated: 2010-12-01 00:00:00

  • Preference of DNA methyltransferases for CpG islands in mouse embryonic stem cells.

    abstract::Many CpG islands have tissue-dependent and differentially methylated regions (T-DMRs) in normal cells and tissues. To elucidate how DNA methyltransferases (Dnmts) participate in methylation of the genomic components, we investigated the genome-wide DNA methylation pattern of the T-DMRs with Dnmt1-, Dnmt3a-, and/or Dnm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2431504

    authors: Hattori N,Abe T,Hattori N,Suzuki M,Matsuyama T,Yoshida S,Li E,Shiota K

    updated: 2004-09-01 00:00:00

  • Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties.

    abstract::Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and trinucleotide repeats revealed an inverse co...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.078303.108

    authors: Bacolla A,Larson JE,Collins JR,Li J,Milosavljevic A,Stenson PD,Cooper DN,Wells RD

    updated: 2008-10-01 00:00:00

  • Computational comparison of human genomic sequence assemblies for a region of chromosome 4.

    abstract::Much of the available human genomic sequence data exist in a fragmentary draft state following the completion of the initial high-volume sequencing performed by the International Human Genome Sequencing Consortium (IHGSC) and Celera Genomics (CG). We compared six draft genome assemblies over a region of chromosome 4p ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.207902

    authors: Semple CA,Morris SW,Porteous DJ,Evans KL

    updated: 2002-03-01 00:00:00

  • LSH and G9a/GLP complex are required for developmentally programmed DNA methylation.

    abstract::LSH, a member of the SNF2 family of chromatin remodeling ATPases encoded by the Hells gene, is essential for normal levels of DNA methylation in the mammalian genome. While the role of LSH in the methylation of repetitive DNA sequences is well characterized, its contribution to the regulation of DNA methylation and th...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.108498.110

    authors: Myant K,Termanis A,Sundaram AY,Boe T,Li C,Merusi C,Burrage J,de Las Heras JI,Stancheva I

    updated: 2011-01-01 00:00:00

  • Spotted long oligonucleotide arrays for human gene expression analysis.

    abstract::DNA microarrays produced by deposition (or 'spotting')of a single long oligonucleotide probe for each gene may be an attractive alternative to other types of arrays. We produced spotted oligonucleotide arrays using two large collections of approximately 70-mer probes, and used these arrays to analyze gene expression i...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1048803

    authors: Barczak A,Rodriguez MW,Hanspers K,Koth LL,Tai YC,Bolstad BM,Speed TP,Erle DJ

    updated: 2003-07-01 00:00:00

  • Pervasive polymorphic imprinted methylation in the human placenta.

    abstract::The maternal and paternal copies of the genome are both required for mammalian development, and this is primarily due to imprinted genes, those that are monoallelically expressed based on parent-of-origin. Typically, this pattern of expression is regulated by differentially methylated regions (DMRs) that are establish...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.196139.115

    authors: Hanna CW,Peñaherrera MS,Saadeh H,Andrews S,McFadden DE,Kelsey G,Robinson WP

    updated: 2016-06-01 00:00:00

  • High-throughput genotyping by whole-genome resequencing.

    abstract::The next-generation sequencing technology coupled with the growing number of genome sequences opens the opportunity to redesign genotyping strategies for more effective genetic mapping and genome analysis. We have developed a high-throughput method for genotyping recombinant populations utilizing whole-genome resequen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.089516.108

    authors: Huang X,Feng Q,Qian Q,Zhao Q,Wang L,Wang A,Guan J,Fan D,Weng Q,Huang T,Dong G,Sang T,Han B

    updated: 2009-06-01 00:00:00

  • Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors.

    abstract::Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.220707.117

    authors: Kuipers J,Jahn K,Raphael BJ,Beerenwinkel N

    updated: 2017-11-01 00:00:00

  • Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context.

    abstract::Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is obse...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.gr-1619r

    authors: Wolf YI,Rogozin IB,Kondrashov AS,Koonin EV

    updated: 2001-03-01 00:00:00

  • Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability.

    abstract::Whole-genome sequencing using massively parallel sequencing technologies enables accurate detection of somatic rearrangements in cancer. Pinpointing large numbers of rearrangement breakpoints to base-pair resolution allows analysis of rearrangement microhomology and genomic location for every sample. Here we analyze 9...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.141382.112

    authors: Drier Y,Lawrence MS,Carter SL,Stewart C,Gabriel SB,Lander ES,Meyerson M,Beroukhim R,Getz G

    updated: 2013-02-01 00:00:00

  • High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications.

    abstract::We present a database of copy number variations (CNVs) detected in 2026 disease-free individuals, using high-density, SNP-based oligonucleotide microarrays. This large cohort, comprised mainly of Caucasians (65.2%) and African-Americans (34.2%), was analyzed for CNVs in a single study using a uniform array platform an...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.083501.108

    authors: Shaikh TH,Gai X,Perin JC,Glessner JT,Xie H,Murphy K,O'Hara R,Casalunovo T,Conlin LK,D'Arcy M,Frackelton EC,Geiger EA,Haldeman-Englert C,Imielinski M,Kim CE,Medne L,Annaiah K,Bradfield JP,Dabaghyan E,Eckert A,Onyia

    updated: 2009-09-01 00:00:00

  • Long-read single-molecule maps of the functional methylome.

    abstract::We report on the development of a methylation analysis workflow for optical detection of fluorescent methylation profiles along chromosomal DNA molecules. In combination with Bionano Genomics genome mapping technology, these profiles provide a hybrid genetic/epigenetic genome-wide map composed of DNA molecules spannin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.240739.118

    authors: Sharim H,Grunwald A,Gabrieli T,Michaeli Y,Margalit S,Torchinsky D,Arielly R,Nifker G,Juhasz M,Gularek F,Almalvez M,Dufault B,Chandra SS,Liu A,Bhattacharya S,Chen YW,Vilain E,Wagner KR,Pevsner J,Reifenberger J,Lam

    updated: 2019-04-01 00:00:00

  • Nutritional control of mRNA isoform expression during developmental arrest and recovery in C. elegans.

    abstract::Nutrient availability profoundly influences gene expression. Many animal genes encode multiple transcript isoforms, yet the effect of nutrient availability on transcript isoform expression has not been studied in genome-wide fashion. When Caenorhabditis elegans larvae hatch without food, they arrest development in the...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.133587.111

    authors: Maxwell CS,Antoshechkin I,Kurhanewicz N,Belsky JA,Baugh LR

    updated: 2012-10-01 00:00:00