FragGeneScan: predicting genes in short and error-prone reads.

Abstract:

The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and those recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as sequencing error rates increase or as reads get shorter. We have developed a novel gene prediction method, FragGeneScan, which combines sequencing error models and codon usages in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to that of Glimmer and MetaGene for complete genomes, but for short reads FragGeneScan consistently outperformed MetaGene (accuracy improved by ∼62% for reads of 400 bases with 1% sequencing errors, and by ∼18% for error-free reads of 100 bases). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene (>90% of the genes identified by homology search), as well as many novel genes with no homologs in current protein sequence databases.
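The abstract says only that FragGeneScan combines sequencing error models and codon usage in a hidden Markov model. The sketch below is not FragGeneScan's model or parameters. It is a toy four-state HMM, assuming one non-coding state (`N`) and three codon-position states (`C0`, `C1`, `C2`) that emit position-specific nucleotide frequencies (a simplification standing in for codon usage). It also assumes frameshift arcs where a codon-position state loops to itself (an inserted base) or skips one position (a deleted base). All probabilities and the names `build_transitions`, `viterbi`, `EMIT` and `INIT` are hypothetical. Viterbi decoding then labels each base of an uppercase A/C/G/T read as coding or non-coding.

```python
import math

# Hypothetical toy model (NOT FragGeneScan's parameters):
# "N" = non-coding; "C0", "C1", "C2" = codon positions 1, 2, 3.
STATES = ["N", "C0", "C1", "C2"]

# Illustrative per-position nucleotide frequencies standing in for codon usage.
EMIT = {
    "N":  {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C0": {"A": 0.30, "C": 0.20, "G": 0.35, "T": 0.15},
    "C1": {"A": 0.30, "C": 0.25, "G": 0.15, "T": 0.30},
    "C2": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
}

# A short read may start in non-coding sequence or at any codon phase.
INIT = {"N": 0.5, "C0": 0.5 / 3, "C1": 0.5 / 3, "C2": 0.5 / 3}


def log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def build_transitions(p_start=0.01, p_end=0.01, p_indel=0.005):
    """Toy transition table; self-loops and skips mimic 1-nt sequencing indels."""
    t = {s: {u: 0.0 for u in STATES} for s in STATES}
    t["N"]["N"] = 1 - p_start
    t["N"]["C0"] = p_start
    for i in range(3):
        here = f"C{i}"
        t[here][here] += p_indel                       # insertion: phase does not advance
        t[here][f"C{(i + 2) % 3}"] += p_indel          # deletion: one codon position skipped
        t[here][f"C{(i + 1) % 3}"] += 1 - 2 * p_indel  # normal step to the next position
    # Leave the gene only after a complete codon.
    t["C2"]["N"] = p_end
    t["C2"]["C0"] -= p_end
    return t


def viterbi(seq, trans, emit=EMIT, init=INIT):
    """Return the most probable state path for a non-empty read (log-space Viterbi)."""
    score = [{s: log(init[s]) + log(emit[s][seq[0]]) for s in STATES}]
    back = [{}]
    for i in range(1, len(seq)):
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor of state s, scored on the previous column.
            prev, best = max(
                ((p, score[-1][p] + log(trans[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            row[s] = best + log(emit[s][seq[i]])
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for i in range(len(seq) - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    read = "ATGGCAGCGGAAGCGCTGAAACGC"
    path = viterbi(read, build_transitions())
    # Print "n" for non-coding bases and the codon position (0-2) otherwise.
    print("".join("n" if s == "N" else s[1] for s in path))
```

Working in log space avoids numerical underflow on longer reads. The indel arcs let the decoded codon phase shift mid-read, which is the kind of frameshift a sequencing error model must tolerate. A real predictor would also need reverse-strand states, start/stop codon handling and trained parameters.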

journal_name: Nucleic Acids Res

journal_title: Nucleic acids research

authors: Rho M,Tang H,Ye Y

doi: 10.1093/nar/gkq747

subject: Has Abstract

pub_date: 2010-11-01 00:00:00

pages: e191

issue: 20

eissn: 1362-4962

issn: 0305-1048

pii: gkq747

journal_volume: 38

pub_type: Journal Article

Other articles in Nucleic Acids Research:
  • The splicing regulators Tra and Tra2 are unusually potent activators of pre-mRNA splicing.

    abstract: Sexual differentiation in Drosophila is regulated through alternative splicing of doublesex. Female-specific splicing is activated through the activity of splicing enhancer complexes assembled on multiple repeat elements. Each of these repeats serves as a binding platform for the cooperative assembly of a heterotrimer...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkl984

    authors: Sciabica KS,Hertel KJ

    update_date: 2006-01-01 00:00:00

  • Apple II software for M13 shotgun DNA sequencing.

    abstract: A set of programs is presented for the reconstruction of a DNA sequence from data generated by the M13 shotgun sequencing technique. Once the sequence has been established and stored other programs are used for its analysis. The programs have been written for the Apple II microcomputer. A minimum investment is require...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/10.1.39

    authors: Larson R,Messing J

    update_date: 1982-01-11 00:00:00

  • Arabidopsis thaliana XRN2 is required for primary cleavage in the pre-ribosomal RNA.

    abstract: Three Rat1/Xrn2 homologues exist in Arabidopsis thaliana: nuclear AtXRN2 and AtXRN3, and cytoplasmic AtXRN4. The latter has a role in degrading 3' products of miRNA-mediated mRNA cleavage, whereas all three proteins act as endogenous post-transcriptional gene silencing suppressors. Here we show that, similar to yeast ...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkq172

    authors: Zakrzewska-Placzek M,Souret FF,Sobczyk GJ,Green PJ,Kufel J

    update_date: 2010-07-01 00:00:00

  • The signal recognition particle database (SRPDB).

    abstract: The SRPDB (signal recognition particle database) provides aligned SRP RNA and protein sequences, annotated and phylogenetically ordered. This release includes 82 SRP RNAs (including 22 bacterial and 9 archaeal homologs) and a total of 20 protein sequences representing SRP9, SRP14, SRP19, SRP54, SRP68, and SRP72. The o...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/22.17.3483

    authors: Zwieb C,Larsen N

    update_date: 1994-09-01 00:00:00

  • CAG*CTG repeat instability in cultured human astrocytes.

    abstract: Cells of the central nervous system (CNS) are prone to the devastating consequences of trinucleotide repeat (TNR) expansion. Some CNS cells, including astrocytes, show substantial TNR instability in affected individuals. Since astrocyte enrichment occurs in brain regions sensitive to neurodegeneration and somatic TNR ...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkl614

    authors: Farrell BT,Lahue RS

    update_date: 2006-01-01 00:00:00

  • Bovine Genome Database: new annotation tools for a new reference genome.

    abstract: The Bovine Genome Database (BGD) (http://bovinegenome.org) has been the key community bovine genomics database for more than a decade. To accommodate the increasing amount and complexity of bovine genomics data, BGD continues to advance its practices in data acquisition, curation, integration and efficient data retrie...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkz944

    authors: Shamimuzzaman M,Le Tourneau JJ,Unni DR,Diesh CM,Triant DA,Walsh AT,Tayal A,Conant GC,Hagen DE,Elsik CG

    update_date: 2020-01-08 00:00:00

  • DiseaseEnhancer: a resource of human disease-associated enhancer catalog.

    abstract: Large-scale sequencing studies discovered substantial genetic variants occurring in enhancers which regulate genes via long range chromatin interactions. Importantly, such variants could affect enhancer regulation by changing transcription factor bindings or enhancer hijacking, and in turn, make an essential contribut...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkx920

    authors: Zhang G,Shi J,Zhu S,Lan Y,Xu L,Yuan H,Liao G,Liu X,Zhang Y,Xiao Y,Li X

    update_date: 2018-01-04 00:00:00

  • Cloning and sequence analysis of an Ig lambda light chain mRNA expressed in the Burkitt's lymphoma cell line EB4.

    abstract: A cDNA library was constructed from the mRNA of the Ig lambda producing Burkitt's lymphoma cell line, EB4. Overlapping clones encompassing the coding sequence of the Ig lambda mRNA were isolated and sequenced. The predicted amino acid sequence shows a short hydrophobic leader peptide and a mature polypeptide of 217 re...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/13.8.2931

    authors: Anderson ML,Brown L,McKenzie E,Kellow JE,Young BD

    update_date: 1985-04-25 00:00:00

  • ASTRAL compendium enhancements.

    abstract: The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. It is partially derived from the SCOP database of protein domains, and it includes sequences for each domain as well as other resources useful for studying these seq...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/30.1.260

    authors: Chandonia JM,Walker NS,Lo Conte L,Koehl P,Levitt M,Brenner SE

    update_date: 2002-01-01 00:00:00

  • Single-molecule analysis of 1D diffusion and transcription elongation of T7 RNA polymerase along individual stretched DNA molecules.

    abstract: Using total internal reflection fluorescence microscopy, we directly visualize in real-time, the 1D Brownian motion and transcription elongation of T7 RNA polymerase along aligned DNA molecules bound to substrates by molecular combing. We fluorescently label T7 RNA polymerase with antibodies and use flow to convect th...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkm332

    authors: Kim JH,Larson RG

    update_date: 2007-01-01 00:00:00

  • VIDA: a virus database system for the organization of animal virus genome open reading frames.

    abstract: VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/29.1.133

    authors: Albà MM,Lee D,Pearl FM,Shepherd AJ,Martin N,Orengo CA,Kellam P

    update_date: 2001-01-01 00:00:00

  • Probabilistic error correction for RNA sequencing.

    abstract: Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not availab...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkt215

    authors: Le HS,Schulz MH,McCauley BM,Hinman VF,Bar-Joseph Z

    update_date: 2013-05-01 00:00:00

  • Conserved 5' flank homologies in dipteran 5S RNA genes that would function on 'A' form DNA.

    abstract: We have sequenced the 480 base pair (bp) repeating unit of the 5S RNA genes of the Dipteran fly Calliphora erythrocephala and compared this sequence to the three known 5S RNA gene sequences from the Dipteran Genus Drosophila (1,2). A striking series of five perfectly conserved homologies identically positioned within ...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/12.21.8193

    authors: Rubacha A,Sumner W 3rd,Richter L,Beckingham K

    update_date: 1984-11-12 00:00:00

  • A general computational approach to predicting synergistic transcriptional cores that determine cell subpopulation identities.

    abstract: Advances in single-cell RNA-sequencing techniques reveal the existence of distinct cell subpopulations. Identification of transcription factors (TFs) that define the identity of these subpopulations poses a challenge. Here, we postulate that identity depends on background subpopulations, and is determined by a synergi...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkz147

    authors: Okawa S,Del Sol A

    update_date: 2019-04-23 00:00:00

  • Explaining the varied glycosidic conformational, G-tract length and sequence preferences for anti-parallel G-quadruplexes.

    abstract: Guanine-rich DNA sequences tend to form four-stranded G-quadruplex structures. Characteristic glycosidic conformational patterns along the G-strands, such as the 5'-syn-anti-syn-anti pattern observed with the Oxytricha nova telomeric G-quadruplexes, have been well documented. However, an explanation for these featured...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkr031

    authors: Cang X,Šponer J,Cheatham TE 3rd

    update_date: 2011-05-01 00:00:00

  • GenBank: update.

    abstract: GenBank is a comprehensive database that contains publicly available DNA sequences for more than 140 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin program an...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkh045

    authors: Benson DA,Karsch-Mizrachi I,Lipman DJ,Ostell J,Wheeler DL

    update_date: 2004-01-01 00:00:00

  • Femtosecond near-infrared laser microirradiation reveals a crucial role for PARP signaling on factor assemblies at DNA damage sites.

    abstract: Laser microirradiation is a powerful tool for real-time single-cell analysis of the DNA damage response (DDR). It is often found, however, that factor recruitment or modification profiles vary depending on the laser system employed. This is likely due to an incomplete understanding of how laser conditions/dosages affe...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkv976

    authors: Saquilabon Cruz GM,Kong X,Silva BA,Khatibzadeh N,Thai R,Berns MW,Yokomori K

    update_date: 2016-02-18 00:00:00

  • Proteomic and transcriptomic experiments reveal an essential role of RNA degradosome complexes in shaping the transcriptome of Mycobacterium tuberculosis.

    abstract: The phenotypic adjustments of Mycobacterium tuberculosis are commonly inferred from the analysis of transcript abundance. While mechanisms of transcriptional regulation have been extensively analysed in mycobacteria, little is known about mechanisms that shape the transcriptome by regulating RNA decay rates. The aim o...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkz251

    authors: Płociński P,Macios M,Houghton J,Niemiec E,Płocińska R,Brzostek A,Słomka M,Dziadek J,Young D,Dziembowski A

    update_date: 2019-06-20 00:00:00

  • Modulation of glutathione peroxidase expression by selenium: effect on human MCF-7 breast cancer cell transfectants expressing a cellular glutathione peroxidase cDNA and doxorubicin-resistant MCF-7 cells.

    abstract: We have studied the effect of selenium on the expression of a cellular glutathione peroxidase, GSHPx-1, in transfected MCF-7 cells and in doxorubicin-resistant (Adrr) MCF-7 cells. A GSHPx-1 cDNA with a Rous Sarcoma virus promoter was transfected into a human mammary carcinoma cell line, MCF-7, which has very low endog...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/18.6.1531

    authors: Chu FF,Esworthy RS,Akman S,Doroshow JH

    update_date: 1990-03-25 00:00:00

  • Differential utilization of poly (A) signals between DHFR alleles in CHL cells.

    abstract: The Chinese hamster cell line, DC-3F, is heterozygous at the DHFR locus, and each allele can be distinguished on the basis of a unique DNA restriction pattern, protein isoelectric profile and in the abundancy of the DHFR mRNAs it expresses. Although each allele produces four transcripts, 1000, 1650 and 2150 nucleotide...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/20.24.6597

    authors: Scotto KW,Yang H,Davide JP,Melera PW

    update_date: 1992-12-25 00:00:00

  • Deaminase-independent inhibition of HIV-1 reverse transcription by APOBEC3G.

    abstract: APOBEC3G (A3G), a host protein that inhibits HIV-1 reverse transcription and replication in the absence of Vif, displays cytidine deaminase and single-stranded (ss) nucleic acid binding activities. HIV-1 nucleocapsid protein (NC) also binds nucleic acids and has a unique property, nucleic acid chaperone activity, whic...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkm750

    authors: Iwatani Y,Chan DS,Wang F,Stewart-Maynard K,Sugiura W,Gronenborn AM,Rouzina I,Williams MC,Musier-Forsyth K,Levin JG

    update_date: 2007-01-01 00:00:00

  • Destabilization of tetranucleotide repeats in Haemophilus influenzae mutants lacking RnaseHI or the Klenow domain of PolI.

    abstract: A feature of Haemophilus influenzae genomes is the presence of several loci containing tracts of six or more identical tetranucleotide repeat units. These repeat tracts are unstable and mediate high frequency, reversible alterations in the expression of surface antigens. This process, termed phase variation (PV), enab...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gki180

    authors: Bayliss CD,Sweetman WA,Moxon ER

    update_date: 2005-01-14 00:00:00

  • Stabilization of XIAP mRNA through the RNA binding protein HuR regulated by cellular polyamines.

    abstract: The X chromosome-linked inhibitor of apoptosis protein (XIAP) is the most potent intrinsic caspase inhibitor and plays an important role in the maintenance of intestinal epithelial integrity. The RNA binding protein, HuR, regulates the stability and translation of many target transcripts. Here, we report that HuR asso...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkp755

    authors: Zhang X,Zou T,Rao JN,Liu L,Xiao L,Wang PY,Cui YH,Gorospe M,Wang JY

    update_date: 2009-12-01 00:00:00

  • Human protein reference database--2006 update.

    abstract: Human Protein Reference Database (HPRD) (http://www.hprd.org) was developed to serve as a comprehensive collection of protein features, post-translational modifications (PTMs) and protein-protein interactions. Since the original report, this database has increased to >20 000 proteins entries and has become the largest...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkj141

    authors: Mishra GR,Suresh M,Kumaran K,Kannabiran N,Suresh S,Bala P,Shivakumar K,Anuradha N,Reddy R,Raghavan TM,Menon S,Hanumanthu G,Gupta M,Upendran S,Gupta S,Mahesh M,Jacob B,Mathew P,Chatterjee P,Arun KS,Sharma S,Chand

    update_date: 2006-01-01 00:00:00

  • Intracellular RNA cleavage by the hairpin ribozyme.

    abstract: Studies involving ribozyme-directed inactivation of targeted RNA molecules have met with mixed success, making clear the importance of methods to measure and optimize ribozyme activity within cells. The interpretation of biochemical assays for determining ribozyme activity in the cellular environment have been complic...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/26.15.3494

    authors: Seyhan AA,Amaral J,Burke JM

    update_date: 1998-08-01 00:00:00

  • Benzpyrene groups bind preferentially to the DNA of active chromatin in human lung cells.

    abstract: The cells of the bronchial epithelium of man are targets for benzo(a)pyrene carcinogenesis. When cultures of these cells, and of non-target fibroblasts, are exposed to [3H]-benzo(a)pyrene, we find that the epithelial cells metabolise and bind to DNA far greater amounts of benzpyrene than do fibroblasts. By analysis of...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/10.5.1547

    authors: Arrand JE,Murray AM

    update_date: 1982-03-11 00:00:00

  • Polymers of random short oligonucleotides detect polymorphic loci in the human genome.

    abstract: Polymers of random 14 mer oligonucleotides are shown to detect discrete loci in the human genome. Eighteen different synthetic tandem repeats of random 14 base-pair units (STRs) have been generated and all of them turn out to detect polymorphic loci on southern blots of human DNA samples, presumably corresponding to a...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/17.19.7623

    authors: Vergnaud G

    update_date: 1989-10-11 00:00:00

  • MAZ induces MYB expression during the exit from quiescence via the E2F site in the MYB promoter.

    abstract: Most E2F-binding sites repress transcription through the recruitment of Retinoblastoma (RB) family members until the end of the G1 cell-cycle phase. Although the MYB promoter contains an E2F-binding site, its transcription is activated shortly after the exit from quiescence, before RB family members inactivation, by u...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/gkx641

    authors: Álvaro-Blanco J,Urso K,Chiodo Y,Martín-Cortázar C,Kourani O,Arco PG,Rodríguez-Martínez M,Calonge E,Alcamí J,Redondo JM,Iglesias T,Campanero MR

    update_date: 2017-09-29 00:00:00

  • Vectors for P element-mediated gene transfer in Drosophila.

    abstract: We have constructed and tested several new vectors for P element-mediated gene transfer. These vectors contain restriction sites for cloning a wide variety of DNA fragments within a small, non-autonomous P element and can be used to efficiently transduce microinjected DNA sequences into the germ line chromosomes of D....

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/11.18.6341

    authors: Rubin GM,Spradling AC

    update_date: 1983-09-24 00:00:00

  • H-DNA and Z-DNA in the mouse c-Ki-ras promoter.

    abstract: The mouse c-Ki-ras protooncogene promoter contains a homopurine-homopyrimidine domain that exhibits S1 nuclease sensitivity in vitro. We have studied the structure of this DNA region in a supercoiled state using a number of chemical probes for non-B DNA conformations including diethyl pyrocarbonate, osmium tetroxide, ...

    journal_title: Nucleic acids research

    pub_type: Journal Article

    doi: 10.1093/nar/19.23.6527

    authors: Pestov DG,Dayn A,Siyanova EYu,George DL,Mirkin SM

    update_date: 1991-12-11 00:00:00