Ab initio gene identification in metagenomic sequences.

Abstract:

We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. We also show that applying the new method could add several thousand new genes to existing annotations of several human and mouse gut metagenomes.
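
The core idea in the abstract, predicting an oligonucleotide frequency from nucleotide composition, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`gc_content`, `polyfit`, `polyval`), the choice of GC content as the composition measure, the polynomial degree, and the training values are all assumptions made for illustration, and the training data are synthetic.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "GC") / len(seq)


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# SYNTHETIC toy data (made up for illustration): pairs of
# (genome GC content, frequency of one hypothetical codon in coding regions).
training = [(0.30, 0.010), (0.40, 0.014), (0.50, 0.018), (0.60, 0.022), (0.70, 0.026)]
gc_values = [gc for gc, _ in training]
freqs = [f for _, f in training]

# Degree 1 is an assumption; the abstract does not state the degree used.
coeffs = polyfit(gc_values, freqs, degree=1)

# Estimate the codon frequency for a short anonymous fragment from its GC content.
fragment = "ATGGCGCGCATTGCCGGCTAA"
estimate = polyval(coeffs, gc_content(fragment))
print(f"GC = {gc_content(fragment):.3f}, estimated codon frequency = {estimate:.4f}")
```

In a full model, one such fit would presumably be made for each oligonucleotide, and the resulting frequency vector would parameterize the gene finder for a fragment whose source genome is unknown.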

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Zhu W,Lomsadze A,Borodovsky M

doi

10.1093/nar/gkq275

subject

Has Abstract

pub_date

2010-07-01 00:00:00

pages

e132

issue

12

issn

0305-1048

eissn

1362-4962

pii

gkq275

journal_volume

38

pub_type

Journal Article
  • ARTS: a web-based tool for the set-up of high-throughput genome-wide mapping panels for the SNP genotyping of mouse mutants.

    abstract::Genome-wide mapping in the identification of novel candidate genes has always been the standard method in genetics and genomics to correlate a clinically interesting phenotypic trait with a genotype. However, the performance of a mapping experiment using classical microsatellite approaches can be very time consuming. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki430

    authors: Klaften M,Hrabé de Angelis M

    update_date:2005-07-01 00:00:00

  • ECHO-liveFISH: in vivo RNA labeling reveals dynamic regulation of nuclear RNA foci in living tissues.

    abstract::Elucidating the dynamic organization of nuclear RNA foci is important for understanding and manipulating these functional sites of gene expression in both physiological and pathological states. However, such studies have been difficult to establish in vivo as a result of the absence of suitable RNA imaging methods. He...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv614

    authors: Oomoto I,Suzuki-Hirano A,Umeshima H,Han YW,Yanagisawa H,Carlton P,Harada Y,Kengaku M,Okamoto A,Shimogori T,Wang DO

    update_date:2015-10-30 00:00:00

  • Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis.

    abstract::To date, an effective therapeutic treatment that confers strong attenuation toward coronaviruses (CoVs) remains elusive. Of all the potential drug targets, the helicase of CoVs is considered to be one of the most important. Here, we first present the structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13)...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz409

    authors: Jia Z,Yan L,Ren Z,Wu L,Wang J,Guo J,Zheng L,Ming Z,Zhang L,Lou Z,Rao Z

    update_date:2019-07-09 00:00:00

  • The evolutionary scrambling and developmental unscrambling of germline genes in hypotrichous ciliates.

    abstract::Genes in the germline (micronuclear) genome of hypotrichous ciliates are interrupted by multiple, short, non-coding, AT-rich sequences called internal eliminated segments, or IESs. During conversion of a micronucleus to a somatic nucleus (macronucleus) after cell mating, all IESs are excised from the germline genes an...

    journal_title:Nucleic acids research

    pub_type: Journal Article, Review

    doi:10.1093/nar/27.5.1243

    authors: Prescott DM

    update_date:1999-03-01 00:00:00

  • Determination of tumor necrosis factor receptor-associated factor trimerization in living cells by CFP->YFP->mRFP FRET detected by flow cytometry.

    abstract::The availability of protein fluorophores with appropriate spectral properties has made it possible to employ fluorescence resonance energy transfer (FRET) to assess interactions between three proteins microscopically. Flow cytometry offers excellent sensitivity, effective signal separation and the capacity to assess a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gni057

    authors: He L,Wu X,Simone J,Hewgill D,Lipsky PE

    update_date:2005-04-01 00:00:00

  • AH-DB: collecting protein structure pairs before and after binding.

    abstract::This work presents the Apo-Holo DataBase (AH-DB, http://ahdb.ee.ncku.edu.tw/ and http://ahdb.csbb.ntu.edu.tw/), which provides corresponding pairs of protein structures before and after binding. Conformational transitions are commonly observed in various protein interactions that are involved in important biological f...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr940

    authors: Chang DT,Yao TJ,Fan CY,Chiang CY,Bai YH

    update_date:2012-01-01 00:00:00

  • Function of the C-terminal transactivation domain of human heat shock factor 2 is modulated by the adjacent negative regulatory segment.

    abstract::DNA binding of heat shock factor 2 (HSF2) is induced during hemin-induced differentiation of human erythroleukemia cell line K562. To identify the transcriptional activation and the regulatory domains of HSF2, we constructed a series of deletion derivatives fused to the yeast GAL4 DNA binding domain and analyzed their...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.11.2580

    authors: Yoshima T,Yura T,Yanagi H

    update_date:1998-06-01 00:00:00

  • Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones.

    abstract::This report documents the error rate in a commercially distributed subset of the IMAGE Consortium mouse cDNA clone collection. After isolation of plasmid DNA from 1189 bacterial stock cultures, only 62.2% were uncontaminated and contained cDNA inserts that had significant sequence identity to published data for the o...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.2.582

    authors: Halgren RG,Fielden MR,Fong CJ,Zacharewski TR

    update_date:2001-01-15 00:00:00

  • A NAC transcription factor and SNI1 cooperatively suppress basal pathogen resistance in Arabidopsis thaliana.

    abstract::Transcriptional repression of pathogen defense-related genes is essential for plant growth and development. Several proteins are known to be involved in the transcriptional regulation of plant defense responses. However, mechanisms by which expression of defense-related genes are regulated by repressor proteins are po...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks683

    authors: Kim HS,Park HC,Kim KE,Jung MS,Han HJ,Kim SH,Kwon YS,Bahk S,An J,Bae DW,Yun DJ,Kwak SS,Chung WS

    update_date:2012-10-01 00:00:00

  • ModBase, a database of annotated comparative protein structure models, and associated resources.

    abstract::ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modelle...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq1091

    authors: Pieper U,Webb BM,Barkan DT,Schneidman-Duhovny D,Schlessinger A,Braberg H,Yang Z,Meng EC,Pettersen EF,Huang CC,Datta RS,Sampathkumar P,Madhusudhan MS,Sjölander K,Ferrin TE,Burley SK,Sali A

    update_date:2011-01-01 00:00:00

  • IMP1 KH1 and KH2 domains create a structural platform with unique RNA recognition and re-modelling properties.

    abstract::IGF2 mRNA-binding protein 1 (IMP1) is a key regulator of messenger RNA (mRNA) metabolism and transport in organismal development and, in cancer, its mis-regulation is an important component of tumour metastasis. IMP1 function relies on the recognition of a diverse set of mRNA targets that is mediated by the combinator...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz136

    authors: Dagil R,Ball NJ,Ogrodowicz RW,Hobor F,Purkiss AG,Kelly G,Martin SR,Taylor IA,Ramos A

    update_date:2019-05-07 00:00:00

  • The Pfam protein families database.

    abstract::Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that ha...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr1065

    authors: Punta M,Coggill PC,Eberhardt RY,Mistry J,Tate J,Boursnell C,Pang N,Forslund K,Ceric G,Clements J,Heger A,Holm L,Sonnhammer EL,Eddy SR,Bateman A,Finn RD

    update_date:2012-01-01 00:00:00

  • Analysis of the yeast SPT3 gene and identification of its product, a positive regulator of Ty transcription.

    abstract::Previous work has demonstrated that the yeast SPT3 gene is required for transcription from delta sequences, the long terminal repeats that flank yeast Ty elements. In spt3 null mutants, transcription fails to initiate in delta sequences and instead initiates farther downstream. Null mutations in SPT3 cause other mutan...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.17.6885

    authors: Winston F,Minehart PL

    update_date:1986-09-11 00:00:00

  • Mapping the LINE1 ORF1 protein interactome reveals associated inhibitors of human retrotransposition.

    abstract::LINE1s occupy 17% of the human genome and are its only active autonomous mobile DNA. L1s are also responsible for genomic insertion of processed pseudogenes and >1 million non-autonomous retrotransposons (Alus and SVAs). These elements have significant effects on gene organization and expression. Despite the importanc...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt512

    authors: Goodier JL,Cheung LE,Kazazian HH Jr

    update_date:2013-08-01 00:00:00

  • Efficient silencing of gene expression with modular trimeric Pol II expression cassettes comprising microRNA shuttles.

    abstract::Expressed polycistronic microRNA (miR) cassettes have useful properties that can be utilized for RNA interference (RNAi)-based gene silencing. To advance their application we generated modular trimeric anti-hepatitis B virus (HBV) Pol II cassettes encoding primary (pri)-miR-31-derived shuttles that target three differ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp446

    authors: Ely A,Naidoo T,Arbuthnot P

    update_date:2009-07-01 00:00:00

  • Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database.

    abstract::At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usua...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/24.8.1395

    authors: Billoud B,Kontic M,Viari A

    update_date:1996-04-15 00:00:00

  • The synthesis of blocked triplet-phosphoramidites and their use in mutagenesis.

    abstract::A general approach for the synthesis of oligonucleotide-triplet phosphoramidites and the synthesis of four such blocks are described. A strategy was devised to minimize the number of dimer precursors needed for synthesis of a complete set of triplet-amidite blocks encoding all 20 amino acids. Whereas synthesis of 20 t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/23.22.4677

    authors: Ono A,Matsuda A,Zhao J,Santi DV

    update_date:1995-11-25 00:00:00

  • Peptidyl transferase centre of bacterial ribosomes: substrate specificity and binding sites.

    abstract::A detailed scheme of the Peptidyl Transferase Centre of bacterial ribosomes is proposed by summarizing the literature data on the substrate specificity of the acceptor and donor sites. According to the proposed scheme only the elements of the donor and acceptor having a stable structure bind with the ribosome. The pre...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/2.12.2223

    authors: Krayevsky AA,Kukhanova MK,Gottikh BP

    update_date:1975-12-01 00:00:00

  • Evidence that a consensus element found in naturally intronless mRNAs promotes mRNA export.

    abstract::We previously showed that mRNAs synthesized from three genes that naturally lack introns contain a portion of their coding sequence, known as a cytoplasmic accumulation region (CAR), which is essential for stable accumulation of the intronless mRNAs in the cytoplasm. The CAR in each mRNA is unexpectedly large, ranging...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks1314

    authors: Lei H,Zhai B,Yin S,Gygi S,Reed R

    update_date:2013-02-01 00:00:00

  • Characterization of L1 retrotransposition with high-throughput dual-luciferase assays.

    abstract::Recent studies employing genome-wide approaches have provided an unprecedented view of the scope of L1 activities on structural variations in the human genome, and further reinforced the role of L1s as one of the major driving forces behind human genome evolution. The rapid identification of novel L1 elements by these...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq1076

    authors: Xie Y,Rosser JM,Thompson TL,Boeke JD,An W

    update_date:2011-02-01 00:00:00

  • The regulatory region of the human plasminogen activator inhibitor type-1 (PAI-1) gene.

    abstract::The human gene for plasminogen activator inhibitor type-1 (PAI-1) has been isolated and its promoter region characterized. PAI-1 regulation by glucocorticoids, transforming growth factor-beta (TGF-beta) and the phorbol ester PMA is shown to be exerted at the promoter level. A fragment spanning 805 nucleotides of the 5...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.7.2805

    authors: Riccio A,Lund LR,Sartorio R,Lania A,Andreasen PA,Danø K,Blasi F

    update_date:1988-04-11 00:00:00

  • RADAR: a rigorously annotated database of A-to-I RNA editing.

    abstract::We present RADAR--a rigorously annotated database of A-to-I RNA editing (available at http://RNAedit.com). The identification of A-to-I RNA editing sites has been dramatically accelerated in the past few years by high-throughput RNA sequencing studies. RADAR includes a comprehensive collection of A-to-I RNA editing si...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt996

    authors: Ramaswami G,Li JB

    update_date:2014-01-01 00:00:00

  • p73 competes with p53 and attenuates its response in a human ovarian cancer cell line.

    abstract::The transcriptional activity of the p53 tumor suppressor protein is crucial for the regulation of cell growth, apoptosis and tumor progression. The first identified p53 relative, p73, was reported to be monoallelically expressed in normal tissues. In some tumors, loss of heterozygosity was associated with overexpressi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.2.513

    authors: Vikhanskaya F,D'Incalci M,Broggini M

    update_date:2000-01-15 00:00:00

  • Rearrangement of repeated DNA sequences during development of macronucleus in Tetrahymena thermophila.

    abstract::Three clones of non-repetitive sequences and six clones containing repetitive sequences were obtained from micronuclear DNA of Tetrahymena thermophila. All the non-repetitive and three repetitive sequences had the same organization in micro- and macronuclear DNAs as revealed by blot hybridization. On the other hand, t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/10.14.4279

    authors: Iwamura Y,Sakai M,Muramatsu M

    update_date:1982-07-24 00:00:00

  • Conserved elements associated with ribosomal genes and their trans-splice acceptor sites in Caenorhabditis elegans.

    abstract::The recent publication of the Caenorhabditis elegans cisRED database has provided an extensive catalog of upstream elements that are conserved between nematode genomes. We have performed a secondary analysis to determine which subsequences of the cisRED motifs are found in multiple locations throughout the C. elegans ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq003

    authors: Sleumer MC,Mah AK,Baillie DL,Jones SJ

    update_date:2010-05-01 00:00:00

  • ASD: a comprehensive database of allosteric proteins and modulators.

    abstract::Allostery is the most direct, rapid and efficient way of regulating protein function, ranging from the control of metabolic mechanisms to signal-transduction pathways. However, an enormous amount of unsystematic allostery information has deterred scientists who could benefit from this field. Here, we present the AlloS...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq1022

    authors: Huang Z,Zhu L,Cao Y,Wu G,Liu X,Chen Y,Wang Q,Shi T,Zhao Y,Wang Y,Li W,Li Y,Chen H,Chen G,Zhang J

    update_date:2011-01-01 00:00:00

  • MutSβ and histone deacetylase complexes promote expansions of trinucleotide repeats in human cells.

    abstract::Trinucleotide repeat (TNR) expansions cause at least 17 heritable neurological diseases, including Huntington's disease. Expansions are thought to arise from abnormal processing of TNR DNA by specific trans-acting proteins. For example, the DNA repair complex MutSβ (MSH2-MSH3 heterodimer) is required in mice for on-go...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks810

    authors: Gannon AM,Frizzell A,Healy E,Lahue RS

    update_date:2012-11-01 00:00:00

  • Direct-acting mutagenicity of N4-aminocytidine derivatives bearing alkyl groups at the hydrazino nitrogens.

    abstract::To investigate the mechanism of N4-aminocytidine-induced mutagenesis, N'-alkyl-N4-aminocytidines and N4-alkyl-N4-aminocytidines were prepared and their mutagenicity on bacteria were assayed. N'-Methyl-N4-aminocytidine, N'-(2-hydroxyethyl)-N4-aminocytidine and N',N'-dimethyl-N4-aminocytidine showed direct-acting mutage...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/13.24.8893

    authors: Nomura A,Negishi K,Hayatsu H

    update_date:1985-12-20 00:00:00

  • Polynucleotides. XXIV. Synthesis and properties of a dinucleoside monophosphate derived from uridine 6,2-cyclonucleoside.

    abstract::A dinucleoside monophosphate, 6,2'-anhydro-6-oxy-1-beta-D-arabinofuranosyluracil-phosphoryl- (3'-5')-6,2'-anhydro-6-oxy-1-beta-D-arabinofuranosyluracil (I) was synthesized by the condensation reaction using DCC from 5'-monomethoxytrityl derivative(VII) and 3'-acetyl-5'-phosphate(X) of the monomer units. Yield was ca. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/1.3.479

    authors: Ikehara M,Tezuka T

    update_date:1974-03-01 00:00:00

  • Mechanistic insight into ligand binding to G-quadruplex DNA.

    abstract::Specific guanine-rich regions in human genome can form higher-order DNA structures called G-quadruplexes, which regulate many relevant biological processes. For instance, the formation of G-quadruplex at telomeres can alter cellular functions, inducing apoptosis. Thus, developing small molecules that are able to bind ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku247

    authors: Di Leva FS,Novellino E,Cavalli A,Parrinello M,Limongelli V

    update_date:2014-05-01 00:00:00