Abstract:
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed.
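To make the start-prediction problem concrete, here is a minimal sketch of why a single open reading frame can offer several plausible translation starts. It is not the GeneMarkS algorithm: it only lists the in-frame candidate start codons that share one stop codon, which is the set a start predictor has to choose from. The function name `candidate_starts` and the toy sequence are hypothetical, and the start codon set (ATG, GTG, TTG) is an assumption covering the most common prokaryotic starts.

```python
# Illustrative only: enumerate alternative in-frame start codons for one ORF.
# This is NOT GeneMarkS; the helper name and the toy sequence are made up.

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed: common prokaryotic starts
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, frame=0):
    """Scan one forward reading frame and return (start, end) pairs.

    Every start codon seen since the previous stop codon is paired with
    the next in-frame stop codon; `end` is the index just past that stop.
    """
    seq = seq.upper()
    pending = []   # start positions seen since the last stop codon
    results = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            pending.append(i)
        elif codon in STOP_CODONS:
            results.extend((s, i + 3) for s in pending)
            pending = []
    return results


# Toy ORF: ATG AAA GTG CCC TTG GGG TAA
toy = "ATGAAAGTGCCCTTGGGGTAA"
print(candidate_starts(toy))  # [(0, 21), (6, 21), (12, 21)]
```

All three candidates end at the same stop codon, so coding potential alone cannot separate them. The abstract describes resolving this ambiguity by adding models of regulatory sites near the gene start.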
journal_name: Nucleic Acids Res
journal_title: Nucleic acids research
authors: Besemer J, Lomsadze A, Borodovsky M
doi: 10.1093/nar/29.12.2607
keywords:
subject: Has Abstract
pub_date: 2001-06-15 00:00:00
pages: 2607-18
issue: 12
eissn: 1362-4962
issn: 0305-1048
journal_volume: 29
pub_type: Journal Article

abstract::Restriction fragments of T7 DNA which selectively bind E. coli RNA polymerase have been identified. These include fragments located close to the beginning of gene 1 where according to Minkley and Pribnow (1973) there is a promoter called C. The smallest fragment from this region which binds RNA polymerase has been seq...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/6.2.525
update_date:1979-02-01 00:00:00
abstract::Streptomyces bldA gene, which encodes a tRNA corresponding to a very minor leucine codon, UUA, regulates pleiotropic gene expression which is involved in sporulation and secondary metabolism. The unique structural feature of this tRNA is the lack of GG sequence in dihydrouridine loop (D-loop) that generally is conserv...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/20.15.3911
update_date:1992-08-11 00:00:00
abstract::We have developed a highly sensitive method for DNA analysis on 3D gel element microarrays, a technique we call multiplex microarray-enhanced PCR (MME-PCR). Two amplification strategies are carried out simultaneously in the reaction chamber: on or within gel elements, and in bulk solution over the gel element array. M...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gnh184
update_date:2005-01-20 00:00:00
abstract::Codon usage bias is a universal feature of all genomes. Although codon usage has been shown to regulate mRNA and protein levels by influencing mRNA decay and transcription in eukaryotes, little or no genome-wide correlations between codon usage and mRNA levels are detected in mammalian cells, raising doubt on the sign...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkaa1253
update_date:2021-01-25 00:00:00
abstract::Restriction endonucleases are highly specific in recognizing the particular DNA sequence they act on. However, their activity is affected by sequence context, enzyme concentration and buffer composition. Changes in these factors may lead to either ineffective cleavage at the cognate restriction site or relaxed specifi...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkt257
update_date:2013-06-01 00:00:00
abstract::A cDNA clone which expresses a protein that cross-reacts immunologically with the human C1 and C2 hnRNP core proteins has been isolated. The clone was selected by a sensitive immunochemical assay employing an avidin-biotin complex for detection, and identified as a clone for the hnRNP C proteins by a highly sensitive ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/14.10.4077
update_date:1986-05-27 00:00:00
abstract::The endoribonuclease RNase E is a key enzyme in RNA metabolism for many bacterial species. In Escherichia coli, RNase E contributes to the majority of RNA turnover and processing events, and the enzyme has been extensively characterized as the central component of the RNA degradosome assembly. A similar RNA degradosom...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gku1134
update_date:2014-12-01 00:00:00
abstract::A fluoride-cleavable phosphoramidite for biotinylation was designed, synthesized and coupled efficiently to the 5'-end of DNA on an automatic synthesizer. The diisopropylsilyl acetal functionality was used to link the biotin moiety through a tertiary hydroxide group to the 5'-end of DNA. This linkage proved to be comp...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkg130
update_date:2003-01-15 00:00:00
abstract::NELF and DSIF act together to inhibit transcription elongation in vitro, and are implicated in causing promoter proximal pausing on the hsp70 gene in Drosophila. Here, further characterization of Drosophila NELF is provided. Drosophila NELF has four subunits similar to subunits of human NELF. The amino acid sequences ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gki274
update_date:2005-03-01 00:00:00
abstract::This paper describes a computer program designed to look for similarities between pairs of nucleic or amino acid sequences. The program looks both for segments of perfect identity or for regions where, using a scoring matrix, a minimum value is exceeded. The results of comparisons are presented as a matrix which is di...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/10.9.2951
update_date:1982-05-11 00:00:00
abstract::The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/27.1.240
update_date:1999-01-01 00:00:00
abstract::The Mouse Genome Database (MGD, http://www.informatics.jax.org) is the international community resource for integrated genetic, genomic and biological data about the laboratory mouse. Data in MGD are obtained through loads from major data providers and experimental consortia, electronic submissions from laboratories a...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkr974
update_date:2012-01-01 00:00:00
abstract::Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search for such vastly numerous binding site pairs is useful to predict protein functions and to enable rapid screening of target proteins for drug design. Exi...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkr1130
update_date:2012-01-01 00:00:00
abstract::A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause in the process of clustering. ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gki511
update_date:2005-04-14 00:00:00
abstract::HbVar (http://globin.bx.psu.edu/hbvar) is one of the oldest and most appreciated locus-specific databases launched in 2001 by a multi-center academic effort to provide timely information on the genomic alterations leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Database records incl...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkt911
update_date:2014-01-01 00:00:00
abstract::We report the molecular cloning and characterization of a cDNA derived from a zebrafish gene (ZF-21) related to the mouse homeobox containing gene Hox2.1. Interesting information about the differential conservation of various domains was gained from comparisons between the putative protein sequences from ZF-21 (275 am...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/16.19.9097
update_date:1988-10-11 00:00:00
abstract::Ethionamide is an antituberculous drug for the treatment of multidrug-resistant Mycobacterium tuberculosis. This antibiotic requires activation by the monooxygenase EthA to exert its activity. Production of EthA is controlled by the transcriptional repressor EthR, a member of the TetR family. The sensitivity of M. tub...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkr1113
update_date:2012-04-01 00:00:00
abstract::Sequences of 3279 sequences of tRNA genes and tRNAs published up to December 1996 are included in the compilation. Alignment of the sequences, which is most compatible with the tRNA phylogeny and known three-dimensional structures of tRNA, is used. Sequences and references are available under http://www.uni-bayreuth. ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/26.1.148
update_date:1998-01-01 00:00:00
abstract::U1-snRNA is an integral part of the U1 ribonucleoprotein pivotal for pre-mRNA splicing. Toll-like receptor (TLR) signaling has recently been associated with immunoregulatory capacities of U1-snRNA. Using lung A549 epithelial/carcinoma cells, we report for the first time on interferon regulatory factor (IRF)-3 activati...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkp525
update_date:2009-08-01 00:00:00
abstract::Heat shock genes are activated by the binding of the heat shock transcription factor (HSF) to heat shock elements (HSEs), consisting of arrays of the 5-bp unit NGAAN arranged as inverted repeats. Here, we have investigated the interaction of the 5-bp unit with HSFs of Drosophila and Saccharomyces. Mutations within the...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/22.2.167
update_date:1994-01-25 00:00:00
abstract::We have cloned and sequenced the lasR gene, which is involved in the transcriptional activation of several pathogenic factors, from Pseudomonas aeruginosa IFO3455 and PA103. These clones were predicted to be an open reading frame of 239 amino acids as reported for the PAO1 strain. There is only a single base change re...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/22.18.3706
update_date:1994-09-11 00:00:00
abstract::Prokaryotes use primed CRISPR adaptation to update their memory bank of spacers against invading genetic elements that have escaped CRISPR interference through mutations in their protospacer target site. We previously observed a trend that nucleotide-dependent mismatches between crRNA and the protospacer strongly infl...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gky687
update_date:2018-11-02 00:00:00
abstract::Double-stranded oligonucleotides (ODNs) containing the consensus binding sequence of a transcription factor provide a rationally designed tool to manipulate gene expression at the transcriptional level by the decoy approach. However, modifications introduced into oligonucleotides to increase stability quite often do n...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gkh503
update_date:2004-03-29 00:00:00
abstract::In DNA of the dinoflagellate Crypthecodinium cohnii, 38% of the thymine is replaced by the modified base 5-hydroxymethyluracil, and approximately 3% of the cytosine is replaced by 5-methylcytosine. Both of the modified bases are non-randomly distributed in the DNA. Determinations of 3' nearest neighbors show that HOMe...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/8.20.4709
update_date:1980-10-24 00:00:00
abstract::We have developed an end-labelling approach to map the positions of nucleosomes and protein binding sites at nucleotide resolution by footprinting micrococcal nuclease (MNase)-sensitive sites. Using this approach we determined that the MFA2 gene and its upstream control regions have four positioned nucleosomes when tr...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/29.13.e64
update_date:2001-07-01 00:00:00
abstract::We have determined the DNA sequence surrounding the transcription terminator following rpoC, the gene that codes for the beta' subunit of RNA polymerase in E. coli K12. The 2044 bp sequence obtained contains the distal 335 codons of rpoC followed by a 212 bp non-coding region and a second open reading frame (ORFa) of ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/9.24.6827
update_date:1981-12-21 00:00:00
abstract::The bacterial transcription terminator, Rho, terminates transcription at half of the operons. According to the classical model derived from in vitro assays on a few terminators, Rho is recruited to the transcription elongation complex (EC) by recognizing specific sites (rut) on the nascent RNA. Here, we explored the m...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gku690
update_date:2014-09-01 00:00:00
abstract::The unwinding of supercoiled phi X174 RFI DNA induced by the tumorigenic (+) and non-tumorigenic (-) enantiomers of trans-7,8-dihydroxy-anti-9,10-epoxy-7,8,9,10-tetrahydrobenzo[a]pyrene (BPDE) has been investigated by agarose slab-gel and ethidium titration tube gel electrophoresis. The differences in adduct conformat...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/20.23.6167
update_date:1992-12-11 00:00:00
abstract::Oligonucleotide-mediated multiplex genome engineering is an important tool for bacterial genome editing. The efficient application of this technique requires the inactivation of the endogenous methyl-directed mismatch repair system that in turn leads to a drastically elevated genomic mutation rate and the consequent a...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/gku105
update_date:2014-04-01 00:00:00
abstract::DNA dodecamers have been designed with two cytosines on each end and intervening A and T stretches, such that the oligomers have fully complementary A:T base pairs when aligned in the parallel orientation. Spectroscopic (UV, CD and IR), NMR and molecular dynamics studies have shown that oligomers having the sequences ...
journal_title:Nucleic acids research
pub_type: Journal Article
doi:10.1093/nar/30.7.1500
update_date:2002-04-01 00:00:00