GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Abstract:

Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed.

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Besemer J,Lomsadze A,Borodovsky M

doi

10.1093/nar/29.12.2607

pub_date

2001-06-15 00:00:00

pages

2607-18

issue

12

eissn

1362-4962

issn

0305-1048

journal_volume

29

pub_type

Journal Article
  • The DNA sequence at the T7 C promoter.

    abstract::Restriction fragments of T7 DNA which selectively bind E. coli RNA polymerase have been identified. These include fragments located close to the beginning of gene 1 where according to Minkley and Pribnow (1973) there is a promoter called C. The smallest fragment from this region which binds RNA polymerase has been seq...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/6.2.525

    authors: McConnell DJ

    update_date:1979-02-01 00:00:00

  • The effects of a unique D-loop structure of a minor tRNA(UUALeu) from Streptomyces on its structural stability and amino acid accepting activity.

    abstract::Streptomyces bldA gene, which encodes a tRNA corresponding to a very minor leucine codon, UUA, regulates pleiotropic gene expression which is involved in sporulation and secondary metabolism. The unique structural feature of this tRNA is the lack of GG sequence in dihydrouridine loop (D-loop) that generally is conserv...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/20.15.3911

    authors: Ueda Y,Kumagai I,Miura K

    update_date:1992-08-11 00:00:00

  • DNA analysis with multiplex microarray-enhanced PCR.

    abstract::We have developed a highly sensitive method for DNA analysis on 3D gel element microarrays, a technique we call multiplex microarray-enhanced PCR (MME-PCR). Two amplification strategies are carried out simultaneously in the reaction chamber: on or within gel elements, and in bulk solution over the gel element array. M...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gnh184

    authors: Pemov A,Modi H,Chandler DP,Bavykin S

    update_date:2005-01-20 00:00:00

  • Effects of codon usage on gene expression are promoter context dependent.

    abstract::Codon usage bias is a universal feature of all genomes. Although codon usage has been shown to regulate mRNA and protein levels by influencing mRNA decay and transcription in eukaryotes, little or no genome-wide correlations between codon usage and mRNA levels are detected in mammalian cells, raising doubt on the sign...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkaa1253

    authors: Yang Q,Lyu X,Zhao F,Liu Y

    update_date:2021-01-25 00:00:00

  • Massively parallel characterization of restriction endonucleases.

    abstract::Restriction endonucleases are highly specific in recognizing the particular DNA sequence they act on. However, their activity is affected by sequence context, enzyme concentration and buffer composition. Changes in these factors may lead to either ineffective cleavage at the cognate restriction site or relaxed specifi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt257

    authors: Kamps-Hughes N,Quimby A,Zhu Z,Johnson EA

    update_date:2013-06-01 00:00:00

  • A cDNA clone of the hnRNP C proteins and its homology with the single-stranded DNA binding protein UP2.

    abstract::A cDNA clone which expresses a protein that cross-reacts immunologically with the human C1 and C2 hnRNP core proteins has been isolated. The clone was selected by a sensitive immunochemical assay employing an avidin-biotin complex for detection, and identified as a clone for the hnRNP C proteins by a highly sensitive ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.10.4077

    authors: Lahiri DK,Thomas JO

    update_date:1986-05-27 00:00:00

  • Molecular recognition of RhlB and RNase D in the Caulobacter crescentus RNA degradosome.

    abstract::The endoribonuclease RNase E is a key enzyme in RNA metabolism for many bacterial species. In Escherichia coli, RNase E contributes to the majority of RNA turnover and processing events, and the enzyme has been extensively characterized as the central component of the RNA degradosome assembly. A similar RNA degradosom...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku1134

    authors: Voss JE,Luisi BF,Hardwick SW

    update_date:2014-12-01 00:00:00

  • Fluoride-cleavable biotinylation phosphoramidite for 5'-end-labeling and affinity purification of synthetic oligonucleotides.

    abstract::A fluoride-cleavable phosphoramidite for biotinylation was designed, synthesized and coupled efficiently to the 5'-end of DNA on an automatic synthesizer. The diisopropylsilyl acetal functionality was used to link the biotin moiety through a tertiary hydroxide group to the 5'-end of DNA. This linkage proved to be comp...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkg130

    authors: Fang S,Bergstrom DE

    update_date:2003-01-15 00:00:00

  • Molecular characterization of Drosophila NELF.

    abstract::NELF and DSIF act together to inhibit transcription elongation in vitro, and are implicated in causing promoter proximal pausing on the hsp70 gene in Drosophila. Here, further characterization of Drosophila NELF is provided. Drosophila NELF has four subunits similar to subunits of human NELF. The amino acid sequences ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki274

    authors: Wu CH,Lee C,Fan R,Smith MJ,Yamaguchi Y,Handa H,Gilmour DS

    update_date:2005-03-01 00:00:00

  • An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences.

    abstract::This paper describes a computer program designed to look for similarities between pairs of nucleic or amino acid sequences. The program looks both for segments of perfect identity or for regions where, using a scoring matrix, a minimum value is exceeded. The results of comparisons are presented as a matrix which is di...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/10.9.2951

    authors: Staden R

    update_date:1982-05-11 00:00:00

  • MMDB: Entrez's 3D structure database.

    abstract::The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/27.1.240

    authors: Marchler-Bauer A,Addess KJ,Chappey C,Geer L,Madej T,Matsuo Y,Wang Y,Bryant SH

    update_date:1999-01-01 00:00:00

  • The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse.

    abstract::The Mouse Genome Database (MGD, http://www.informatics.jax.org) is the international community resource for integrated genetic, genomic and biological data about the laboratory mouse. Data in MGD are obtained through loads from major data providers and experimental consortia, electronic submissions from laboratories a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr974

    authors: Eppig JT,Blake JA,Bult CJ,Kadin JA,Richardson JE,Mouse Genome Database Group.

    update_date:2012-01-01 00:00:00

  • PoSSuM: a database of similar protein-ligand binding and putative pockets.

    abstract::Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search for such vastly numerous binding site pairs is useful to predict protein functions and to enable rapid screening of target proteins for drug design. Exi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr1130

    authors: Ito J,Tabei Y,Shimizu K,Tsuda K,Tomii K

    update_date:2012-01-01 00:00:00

  • Masking repeats while clustering ESTs.

    abstract::A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause in the process of clustering. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gki511

    authors: Schneeberger K,Malde K,Coward E,Jonassen I

    update_date:2005-04-14 00:00:00

  • Updates of the HbVar database of human hemoglobin variants and thalassemia mutations.

    abstract::HbVar (http://globin.bx.psu.edu/hbvar) is one of the oldest and most appreciated locus-specific databases launched in 2001 by a multi-center academic effort to provide timely information on the genomic alterations leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Database records incl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt911

    authors: Giardine B,Borg J,Viennas E,Pavlidis C,Moradkhani K,Joly P,Bartsakoulia M,Riemer C,Miller W,Tzimas G,Wajcman H,Hardison RC,Patrinos GP

    update_date:2014-01-01 00:00:00

  • Primary structure, developmentally regulated expression and potential duplication of the zebrafish homeobox gene ZF-21.

    abstract::We report the molecular cloning and characterization of a cDNA derived from a zebrafish gene (ZF-21) related to the mouse homeobox containing gene Hox2.1. Interesting information about the differential conservation of various domains was gained from comparisons between the putative protein sequences from ZF-21 (275 am...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.19.9097

    authors: Njølstad PR,Molven A,Hordvik I,Apold J,Fjose A

    update_date:1988-10-11 00:00:00

  • Structural activation of the transcriptional repressor EthR from Mycobacterium tuberculosis by single amino acid change mimicking natural and synthetic ligands.

    abstract::Ethionamide is an antituberculous drug for the treatment of multidrug-resistant Mycobacterium tuberculosis. This antibiotic requires activation by the monooxygenase EthA to exert its activity. Production of EthA is controlled by the transcriptional repressor EthR, a member of the TetR family. The sensitivity of M. tub...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr1113

    authors: Carette X,Blondiaux N,Willery E,Hoos S,Lecat-Guillet N,Lens Z,Wohlkönig A,Wintjens R,Soror SH,Frénois F,Dirié B,Villeret V,England P,Lippens G,Deprez B,Locht C,Willand N,Baulard AR

    update_date:2012-04-01 00:00:00

  • Compilation of tRNA sequences and sequences of tRNA genes.

    abstract::Sequences of 3279 sequences of tRNA genes and tRNAs published up to December 1996 are included in the compilation. Alignment of the sequences, which is most compatible with the tRNA phylogeny and known three-dimensional structures of tRNA, is used. Sequences and references are available under http://www.uni-bayreuth. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.1.148

    authors: Sprinzl M,Horn C,Brown M,Ioudovitch A,Steinberg S

    update_date:1998-01-01 00:00:00

  • Activation of interferon regulatory factor-3 via toll-like receptor 3 and immunomodulatory functions detected in A549 lung epithelial cells exposed to misplaced U1-snRNA.

    abstract::U1-snRNA is an integral part of the U1 ribonucleoprotein pivotal for pre-mRNA splicing. Toll-like receptor (TLR) signaling has recently been associated with immunoregulatory capacities of U1-snRNA. Using lung A549 epithelial/carcinoma cells, we report for the first time on interferon regulatory factor (IRF)-3 activati...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp525

    authors: Sadik CD,Bachmann M,Pfeilschifter J,Mühl H

    update_date:2009-08-01 00:00:00

  • Fine structure analyses of the Drosophila and Saccharomyces heat shock factor--heat shock element interactions.

    abstract::Heat shock genes are activated by the binding of the heat shock transcription factor (HSF) to heat shock elements (HSEs), consisting of arrays of the 5-bp unit NGAAN arranged as inverted repeats. Here, we have investigated the interaction of the 5-bp unit with HSFs of Drosophila and Saccharomyces. Mutations within the...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.2.167

    authors: Fernandes M,Xiao H,Lis JT

    update_date:1994-01-25 00:00:00

  • Intracellular receptor-type transcription factor, LasR, contains a highly conserved amphipathic region which precedes the putative helix-turn-helix DNA binding motif.

    abstract::We have cloned and sequenced the lasR gene, which is involved in the transcriptional activation of several pathogenic factors, from Pseudomonas aeruginosa IFO3455 and PA103. These clones were predicted to be an open reading frame of 239 amino acids as reported for the PAO1 strain. There is only a single base change re...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.18.3706

    authors: Fukushima J,Ishiwata T,Kurata M,You Z,Okuda K

    update_date:1994-09-11 00:00:00

  • Role of nucleotide identity in effective CRISPR target escape mutations.

    abstract::Prokaryotes use primed CRISPR adaptation to update their memory bank of spacers against invading genetic elements that have escaped CRISPR interference through mutations in their protospacer target site. We previously observed a trend that nucleotide-dependent mismatches between crRNA and the protospacer strongly infl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky687

    authors: Künne T,Zhu Y,da Silva F,Konstantinides N,McKenzie RE,Jackson RN,Brouns SJ

    update_date:2018-11-02 00:00:00

  • Transcription factor decoy oligonucleotides modified with locked nucleic acids: an in vitro study to reconcile biostability with binding affinity.

    abstract::Double-stranded oligonucleotides (ODNs) containing the consensus binding sequence of a transcription factor provide a rationally designed tool to manipulate gene expression at the transcriptional level by the decoy approach. However, modifications introduced into oligonucleotides to increase stability quite often do n...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkh503

    authors: Crinelli R,Bianchi M,Gentilini L,Palma L,Sørensen MD,Bryld T,Babu RB,Arar K,Wengel J,Magnani M

    update_date:2004-03-29 00:00:00

  • Ordered distribution of modified bases in the DNA of a dinoflagellate.

    abstract::In DNA of the dinoflagellate Crypthecodinium cohnii, 38% of the thymine is replaced by the modified base 5-hydroxymethyluracil, and approximately 3% of the cytosine is replaced by 5-methylcytosine. Both of the modified bases are non-randomly distributed in the DNA. Determinations of 3' nearest neighbors show that HOMe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.20.4709

    authors: Steele RE,Rae PM

    update_date:1980-10-24 00:00:00

  • The mapping of nucleosomes and regulatory protein binding sites at the Saccharomyces cerevisiae MFA2 gene: a high resolution approach.

    abstract::We have developed an end-labelling approach to map the positions of nucleosomes and protein binding sites at nucleotide resolution by footprinting micrococcal nuclease (MNase)-sensitive sites. Using this approach we determined that the MFA2 gene and its upstream control regions have four positioned nucleosomes when tr...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.13.e64

    authors: Teng Y,Yu S,Waters R

    update_date:2001-07-01 00:00:00

  • Nucleotide sequence at the end of the gene for the RNA polymerase beta' subunit (rpoC).

    abstract::We have determined the DNA sequence surrounding the transcription terminator following rpoC, the gene that codes for the beta' subunit of RNA polymerase in E. coli K12. The 2044 bp sequence obtained contains the distal 335 codons of rpoC followed by a 212 bp non-coding region and a second open reading frame (ORFa) of ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/9.24.6827

    authors: Squires C,Krainer A,Barry G,Shen WF,Squires CL

    update_date:1981-12-21 00:00:00

  • Redundancy of primary RNA-binding functions of the bacterial transcription terminator Rho.

    abstract::The bacterial transcription terminator, Rho, terminates transcription at half of the operons. According to the classical model derived from in vitro assays on a few terminators, Rho is recruited to the transcription elongation complex (EC) by recognizing specific sites (rut) on the nascent RNA. Here, we explored the m...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku690

    authors: Shashni R,Qayyum MZ,Vishalini V,Dey D,Sen R

    update_date:2014-09-01 00:00:00

  • Differences in unwinding of supercoiled DNA induced by the two enantiomers of anti-benzo[a]pyrene diol epoxide.

    abstract::The unwinding of supercoiled phi X174 RFI DNA induced by the tumorigenic (+) and non-tumorigenic (-) enantiomers of trans-7,8-dihydroxy-anti-9,10-epoxy-7,8,9,10-tetrahydrobenzo[a]pyrene (BPDE) has been investigated by agarose slab-gel and ethidium titration tube gel electrophoresis. The differences in adduct conformat...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/20.23.6167

    authors: Xu R,Birke S,Carberry SE,Geacintov NE,Swenberg CE,Harvey RG

    update_date:1992-12-11 00:00:00

  • Conditional DNA repair mutants enable highly precise genome engineering.

    abstract::Oligonucleotide-mediated multiplex genome engineering is an important tool for bacterial genome editing. The efficient application of this technique requires the inactivation of the endogenous methyl-directed mismatch repair system that in turn leads to a drastically elevated genomic mutation rate and the consequent a...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku105

    authors: Nyerges Á,Csorgő B,Nagy I,Latinovics D,Szamecz B,Pósfai G,Pál C

    update_date:2014-04-01 00:00:00

  • NMR structure of a parallel-stranded DNA duplex at atomic resolution.

    abstract::DNA dodecamers have been designed with two cytosines on each end and intervening A and T stretches, such that the oligomers have fully complementary A:T base pairs when aligned in the parallel orientation. Spectroscopic (UV, CD and IR), NMR and molecular dynamics studies have shown that oligomers having the sequences ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/30.7.1500

    authors: Parvathy VR,Bhaumik SR,Chary KV,Govil G,Liu K,Howard FB,Miles HT

    update_date:2002-04-01 00:00:00