Abstract:
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between the frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has since been used for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing the parameters of self-training gene-finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by building separate models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, in turn, of gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. We also show that, as a result of applying the new method, several thousand new genes could be added to the existing annotations of several human and mouse gut metagenomes.
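The abstract's central idea can be illustrated with a minimal sketch: predict a coding-region oligonucleotide frequency from a short sequence's nucleotide composition by using a polynomial or logistic curve fitted across known genomes. The sketch rests on several assumptions. GC content is the only predictor, the target is a single codon's frequency, and the training values are synthetic. The function names (`gc_content`, `split_into_fragments`, `fit_polynomial`, `fit_logistic`) are hypothetical. This is not the authors' implementation. The logistic fit shown here uses linear least squares on logit-transformed frequencies, which is one simple choice among several.

```python
import math
from typing import List, Sequence, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous nucleotides (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T characters")
    return (seq.count("G") + seq.count("C")) / acgt


def split_into_fragments(genome: str, length: int) -> List[str]:
    """Cut a genome into non-overlapping fragments of `length` nt; a shorter tail is dropped."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, length)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c0 + c1*x + ... + cd*x^d via the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(ata, aty)


def eval_polynomial(coeffs: Sequence[float], x: float) -> float:
    return sum(c * x ** i for i, c in enumerate(coeffs))


def fit_logistic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit f(x) = 1 / (1 + exp(-(a + b*x))) by a straight-line fit to logit(y); needs 0 < y < 1."""
    logits = [math.log(y / (1.0 - y)) for y in ys]
    a, b = fit_polynomial(xs, logits, 1)
    return a, b


def eval_logistic(a: float, b: float, x: float) -> float:
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


if __name__ == "__main__":
    # Synthetic training data (NOT from the paper): one codon's frequency vs genome GC.
    train_gc = [0.30, 0.40, 0.50, 0.60, 0.70]
    train_freq = [0.01 + 0.02 * g + 0.03 * g * g for g in train_gc]
    coeffs = fit_polynomial(train_gc, train_freq, degree=2)
    a, b = fit_logistic(train_gc, train_freq)
    fragment = "ATGGCGGCCGCATAA"  # hypothetical short metagenomic read
    gc = gc_content(fragment)
    print(gc, eval_polynomial(coeffs, gc), eval_logistic(a, b, gc))
```

Fitting on the logit scale keeps predicted frequencies inside (0, 1). A raw polynomial does not guarantee this when it is extrapolated to extreme GC values, which is one plausible reason to offer both forms of approximation.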
journal_name: Nucleic Acids Res
journal_title: Nucleic acids research
authors: Zhu W, Lomsadze A, Borodovsky M
doi: 10.1093/nar/gkq275
subject: Has Abstract
pub_date: 2010-07-01 00:00:00
pages: e132
issue: 12
eissn: 0305-1048
issn: 1362-4962
pii: gkq275
journal_volume: 38
pub_type: Journal Article

abstract::Genome-wide mapping in the identification of novel candidate genes has always been the standard method in genetics and genomics to correlate a clinically interesting phenotypic trait with a genotype. However, the performance of a mapping experiment using classical microsatellite approaches can be very time consuming. ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gki430
update_date:2005-07-01 00:00:00
abstract::Elucidating the dynamic organization of nuclear RNA foci is important for understanding and manipulating these functional sites of gene expression in both physiological and pathological states. However, such studies have been difficult to establish in vivo as a result of the absence of suitable RNA imaging methods. He...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkv614
update_date:2015-10-30 00:00:00
abstract::To date, an effective therapeutic treatment that confers strong attenuation toward coronaviruses (CoVs) remains elusive. Of all the potential drug targets, the helicase of CoVs is considered to be one of the most important. Here, we first present the structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13)...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkz409
update_date:2019-07-09 00:00:00
abstract::Genes in the germline (micronuclear) genome of hypotrichous ciliates are interrupted by multiple, short, non-coding, AT-rich sequences called internal eliminated segments, or IESs. During conversion of a micronucleus to a somatic nucleus (macronucleus) after cell mating, all IESs are excised from the germline genes an...
journal_title:Nucleic acids research
pub_type: Journal Article, Review
doi:10.1093/nar/27.5.1243
update_date:1999-03-01 00:00:00
abstract::The availability of protein fluorophores with appropriate spectral properties has made it possible to employ fluorescence resonance energy transfer (FRET) to assess interactions between three proteins microscopically. Flow cytometry offers excellent sensitivity, effective signal separation and the capacity to assess a...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gni057
update_date:2005-04-01 00:00:00
abstract::This work presents the Apo-Holo DataBase (AH-DB, http://ahdb.ee.ncku.edu.tw/ and http://ahdb.csbb.ntu.edu.tw/), which provides corresponding pairs of protein structures before and after binding. Conformational transitions are commonly observed in various protein interactions that are involved in important biological f...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkr940
update_date:2012-01-01 00:00:00
abstract::DNA binding of heat shock factor 2 (HSF2) is induced during hemin-induced differentiation of human erythroleukemia cell line K562. To identify the transcriptional activation and the regulatory domains of HSF2, we constructed a series of deletion derivatives fused to the yeast GAL4 DNA binding domain and analyzed their...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/26.11.2580
update_date:1998-06-01 00:00:00
abstract::This report documents the error rate in a commercially distributed subset of the IMAGE Consortium mouse cDNA clone collection. After isolation of plasmid DNA from 1189 bacterial stock cultures, only 62.2% were uncontaminated and contained cDNA inserts that had significant sequence identity to published data for the o...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/29.2.582
update_date:2001-01-15 00:00:00
abstract::Transcriptional repression of pathogen defense-related genes is essential for plant growth and development. Several proteins are known to be involved in the transcriptional regulation of plant defense responses. However, mechanisms by which expression of defense-related genes is regulated by repressor proteins are po...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gks683
update_date:2012-10-01 00:00:00
abstract::ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modelle...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkq1091
update_date:2011-01-01 00:00:00
abstract::IGF2 mRNA-binding protein 1 (IMP1) is a key regulator of messenger RNA (mRNA) metabolism and transport in organismal development and, in cancer, its mis-regulation is an important component of tumour metastasis. IMP1 function relies on the recognition of a diverse set of mRNA targets that is mediated by the combinator...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkz136
update_date:2019-05-07 00:00:00
abstract::Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that ha...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkr1065
update_date:2012-01-01 00:00:00
abstract::Previous work has demonstrated that the yeast SPT3 gene is required for transcription from delta sequences, the long terminal repeats that flank yeast Ty elements. In spt3 null mutants, transcription fails to initiate in delta sequences and instead initiates farther downstream. Null mutations in SPT3 cause other mutan...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/14.17.6885
update_date:1986-09-11 00:00:00
abstract::LINE1s occupy 17% of the human genome and are its only active autonomous mobile DNA. L1s are also responsible for genomic insertion of processed pseudogenes and >1 million non-autonomous retrotransposons (Alus and SVAs). These elements have significant effects on gene organization and expression. Despite the importanc...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkt512
update_date:2013-08-01 00:00:00
abstract::Expressed polycistronic microRNA (miR) cassettes have useful properties that can be utilized for RNA interference (RNAi)-based gene silencing. To advance their application we generated modular trimeric anti-hepatitis B virus (HBV) Pol II cassettes encoding primary (pri)-miR-31-derived shuttles that target three differ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkp446
update_date:2009-07-01 00:00:00
abstract::At the DNA/RNA level, biological signals are defined by a combination of spatial structures and sequence motifs. Until now, few attempts had been made in writing general purpose search programs that take into account both sequence and structure criteria. Indeed, the most successful structure scanning programs are usua...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/24.8.1395
update_date:1996-04-15 00:00:00
abstract::A general approach for the synthesis of oligonucleotide-triplet phosphoramidites and the synthesis of four such blocks are described. A strategy was devised to minimize the number of dimer precursors needed for synthesis of a complete set of triplet-amidite blocks encoding all 20 amino acids. Whereas synthesis of 20 t...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/23.22.4677
update_date:1995-11-25 00:00:00
abstract::A detailed scheme of the Peptidyl Transferase Centre of bacterial ribosomes is proposed by summarizing the literature data on the substrate specificity of the acceptor and donor sites. According to the proposed scheme only the elements of the donor and acceptor having a stable structure bind with the ribosome. The pre...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/2.12.2223
update_date:1975-12-01 00:00:00
abstract::We previously showed that mRNAs synthesized from three genes that naturally lack introns contain a portion of their coding sequence, known as a cytoplasmic accumulation region (CAR), which is essential for stable accumulation of the intronless mRNAs in the cytoplasm. The CAR in each mRNA is unexpectedly large, ranging...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gks1314
update_date:2013-02-01 00:00:00
abstract::Recent studies employing genome-wide approaches have provided an unprecedented view of the scope of L1 activities on structural variations in the human genome, and further reinforced the role of L1s as one of the major driving forces behind human genome evolution. The rapid identification of novel L1 elements by these...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkq1076
update_date:2011-02-01 00:00:00
abstract::The human gene for plasminogen activator inhibitor type-1 (PAI-1) has been isolated and its promoter region characterized. PAI-1 regulation by glucocorticoids, transforming growth factor-beta (TGF-beta) and the phorbol ester PMA is shown to be exerted at the promoter level. A fragment spanning 805 nucleotides of the 5...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/16.7.2805
update_date:1988-04-11 00:00:00
abstract::We present RADAR--a rigorously annotated database of A-to-I RNA editing (available at http://RNAedit.com). The identification of A-to-I RNA editing sites has been dramatically accelerated in the past few years by high-throughput RNA sequencing studies. RADAR includes a comprehensive collection of A-to-I RNA editing si...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkt996
update_date:2014-01-01 00:00:00
abstract::The transcriptional activity of the p53 tumor suppressor protein is crucial for the regulation of cell growth, apoptosis and tumor progression. The first identified p53 relative, p73, was reported to be monoallelically expressed in normal tissues. In some tumors, loss of heterozygosity was associated with overexpressi...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/28.2.513
update_date:2000-01-15 00:00:00
abstract::Three clones of non-repetitive sequences and six clones containing repetitive sequences were obtained from micronuclear DNA of Tetrahymena thermophila. All the non-repetitive and three repetitive sequences had the same organization in micro- and macronuclear DNAs as revealed by blot hybridization. On the other hand, t...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/10.14.4279
update_date:1982-07-24 00:00:00
abstract::The recent publication of the Caenorhabditis elegans cisRED database has provided an extensive catalog of upstream elements that are conserved between nematode genomes. We have performed a secondary analysis to determine which subsequences of the cisRED motifs are found in multiple locations throughout the C. elegans ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkq003
update_date:2010-05-01 00:00:00
abstract::Allostery is the most direct, rapid and efficient way of regulating protein function, ranging from the control of metabolic mechanisms to signal-transduction pathways. However, an enormous amount of unsystematic allostery information has deterred scientists who could benefit from this field. Here, we present the AlloS...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkq1022
update_date:2011-01-01 00:00:00
abstract::Trinucleotide repeat (TNR) expansions cause at least 17 heritable neurological diseases, including Huntington's disease. Expansions are thought to arise from abnormal processing of TNR DNA by specific trans-acting proteins. For example, the DNA repair complex MutSβ (MSH2-MSH3 heterodimer) is required in mice for on-go...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gks810
update_date:2012-11-01 00:00:00
abstract::To investigate the mechanism of N4-aminocytidine-induced mutagenesis, N'-alkyl-N4-aminocytidines and N4-alkyl-N4-aminocytidines were prepared and their mutagenicity in bacteria was assayed. N'-Methyl-N4-aminocytidine, N'-(2-hydroxyethyl)-N4-aminocytidine and N',N'-dimethyl-N4-aminocytidine showed direct-acting mutage...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/13.24.8893
update_date:1985-12-20 00:00:00
abstract::A dinucleoside monophosphate, 6,2'-anhydro-6-oxy-1-beta-D-arabinofuranosyluracil-phosphoryl- (3'-5')-6,2'-anhydro-6-oxy-1-beta-D-arabinofuranosyluracil (I) was synthesized by the condensation reaction using DCC from 5'-monomethoxytrityl derivative(VII) and 3'-acetyl-5'-phosphate(X) of the monomer units. Yield was ca. ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/1.3.479
update_date:1974-03-01 00:00:00
abstract::Specific guanine-rich regions in the human genome can form higher-order DNA structures called G-quadruplexes, which regulate many relevant biological processes. For instance, the formation of G-quadruplex at telomeres can alter cellular functions, inducing apoptosis. Thus, developing small molecules that are able to bind ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gku247
update_date:2014-05-01 00:00:00