One size does not fit all: on how Markov model order dictates performance of genomic sequence analyses.

Abstract:

The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show the vast variation in the performance of such applications with the order. To identify the 'optimal' order, we investigated two model selection criteria: Akaike information criterion (AIC) and Bayesian information criterion (BIC). The BIC optimal order delivers the best performance for mammalian phylogeny reconstruction and motif discovery. Importantly, this order is different from orders typically used by many tools, suggesting that a simple additional step determining this order can significantly improve results. Further, we describe a novel classification approach based on BIC optimal Markov models to predict functionality of tissue-specific promoters. Our classifier discriminates between promoters active across 12 different tissues with remarkable accuracy, yielding 3 times the precision expected by chance. Application to the metagenomics problem of identifying the taxon from a short DNA fragment yields accuracies at least as high as the more complex mainstream methodologies, while retaining conceptual and computational simplicity.
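
The order-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `log_likelihood` and `select_order` are hypothetical, the alphabet is assumed to be the four DNA bases, and every candidate order is scored on the same positions (those after the first `max_order` symbols) so that the criteria are comparable. An order-k model over 4 symbols is taken to have 3 * 4^k free parameters.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def log_likelihood(seq, k, start):
    """Maximized log-likelihood of an order-k Markov model on seq[start:].

    Transition probabilities are the maximum-likelihood estimates
    count(context, symbol) / count(context). Requires start >= k.
    """
    ctx_counts = Counter()
    trans_counts = Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - k:i]  # the k preceding symbols ("" when k == 0)
        trans_counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in trans_counts.items())


def select_order(seq, max_order):
    """Return (best BIC order, best AIC order, per-order scores)."""
    n = len(seq) - max_order  # number of scored positions, same for all k
    scores = {}
    for k in range(max_order + 1):
        ll = log_likelihood(seq, k, max_order)
        p = (len(ALPHABET) - 1) * len(ALPHABET) ** k  # free parameters
        scores[k] = {"aic": -2 * ll + 2 * p,
                     "bic": -2 * ll + p * math.log(n)}
    best_bic = min(scores, key=lambda k: scores[k]["bic"])
    best_aic = min(scores, key=lambda k: scores[k]["aic"])
    return best_bic, best_aic, scores


if __name__ == "__main__":
    # Toy input: a strictly periodic sequence, fully explained by order 1.
    bic_order, aic_order, _ = select_order("ACGT" * 50, max_order=3)
    print("BIC order:", bic_order, "AIC order:", aic_order)
```

Because the BIC penalty grows with the logarithm of the sample size while the AIC penalty is fixed at 2 per parameter, BIC tends to pick a lower order than AIC on real sequences of moderate length; on the toy input both criteria agree.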

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Narlikar L,Mehta N,Galande S,Arjunwadkar M

doi

10.1093/nar/gks1285

subject

Has Abstract

pub_date

2013-02-01 00:00:00

pages

1416-24

issue

3

eissn

1362-4962

issn

0305-1048

pii

gks1285

journal_volume

41

pub_type

Journal Article
  • The COP9 signalosome is vital for timely repair of DNA double-strand breaks.

    abstract::The DNA damage response is vigorously activated by DNA double-strand breaks (DSBs). The chief mobilizer of the DSB response is the ATM protein kinase. We discovered that the COP9 signalosome (CSN) is a crucial player in the DSB response and an ATM target. CSN is a protein complex that regulates the activity of cullin ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv270

    authors: Meir M,Galanty Y,Kashani L,Blank M,Khosravi R,Fernández-Ávila MJ,Cruz-García A,Star A,Shochot L,Thomas Y,Garrett LJ,Chamovitz DA,Bodine DM,Kurz T,Huertas P,Ziv Y,Shiloh Y

    update_date:2015-05-19 00:00:00

  • Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis.

    abstract::Amino acid sequences of 2 giant non-structural polyproteins (F1 and F2) of infectious bronchitis virus (IBV), a member of Coronaviridae, were compared, by computer-assisted methods, to sequences of a number of other positive strand RNA viral and cellular proteins. By this approach, juxtaposed putative RNA-dependent RN...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/17.12.4847

    authors: Gorbalenya AE,Koonin EV,Donchenko AP,Blinov VM

    update_date:1989-06-26 00:00:00

  • SFmap: a web server for motif analysis and prediction of splicing factor binding sites.

    abstract::Alternative splicing (AS) is a post-transcriptional process considered to be responsible for the huge diversity of proteins in higher eukaryotes. AS events are regulated by different splicing factors (SFs) that bind to sequence elements on the RNA. SFmap is a web server for predicting putative SF binding sites in geno...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq444

    authors: Paz I,Akerman M,Dror I,Kosti I,Mandel-Gutfreund Y

    update_date:2010-07-01 00:00:00

  • PHUSER (Primer Help for USER): a novel tool for USER fusion primer design.

    abstract::Uracil-Specific Excision Reagent (USER) fusion is a recently developed technique that allows for assembly of multiple DNA fragments in a few simple steps. However, designing primers for USER fusion is both tedious and time consuming. Here, we present the Primer Help for USER (PHUSER) software, a novel tool for designin...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr394

    authors: Olsen LR,Hansen NB,Bonde MT,Genee HJ,Holm DK,Carlsen S,Hansen BG,Patil KR,Mortensen UH,Wernersson R

    update_date:2011-07-01 00:00:00

  • Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis.

    abstract::To date, an effective therapeutic treatment that confers strong attenuation toward coronaviruses (CoVs) remains elusive. Of all the potential drug targets, the helicase of CoVs is considered to be one of the most important. Here, we first present the structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13)...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz409

    authors: Jia Z,Yan L,Ren Z,Wu L,Wang J,Guo J,Zheng L,Ming Z,Zhang L,Lou Z,Rao Z

    update_date:2019-07-09 00:00:00

  • A two-dimensional thin layer chromatographic procedure for the sequential analysis of oligonucleotides employing tritium post-labeling.

    abstract::Two dimensional PEI-cellulose thin layer chromatography can resolve sequentially degraded oligonucleotide fragments of tRNA. This technique entails the sequential degradation of the oligonucleotide with snake venom phosphodiesterase in the presence of bacterial alkaline phosphatase, and periodate oxidation followed by...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/4.10.3563

    authors: Chen EY,Roe BA

    update_date:1977-10-01 00:00:00

  • Genetic manipulation of an exogenous non-immunoglobulin protein by gene conversion machinery in a chicken B cell line.

    abstract::During culture, a chicken B cell line DT40 spontaneously mutates immunoglobulin (Ig) genes by gene conversion, which involves activation-induced cytidine deaminase (AID)-dependent homologous recombination of the variable (V) region gene with upstream pseudo-V genes. To explore whether this mutation mechanism can targe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gnj013

    authors: Kanayama N,Todo K,Takahashi S,Magari M,Ohmori H

    update_date:2006-01-18 00:00:00

  • CDK12 regulates alternative last exon mRNA splicing and promotes breast cancer cell invasion.

    abstract::CDK12 (cyclin-dependent kinase 12) is a regulatory kinase with evolutionarily conserved roles in modulating transcription elongation. Recent tumor genome studies of breast and ovarian cancers highlighted recurrent CDK12 mutations, which have been shown to disrupt DNA repair in cell-based assays. In breast cancers, CDK...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx187

    authors: Tien JF,Mazloomian A,Cheng SG,Hughes CS,Chow CCT,Canapi LT,Oloumi A,Trigo-Gonzalez G,Bashashati A,Xu J,Chang VC,Shah SP,Aparicio S,Morin GB

    update_date:2017-06-20 00:00:00

  • OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs.

    abstract::OrthoDB is a comprehensive catalog of orthologs, genes inherited by extant species from a single gene in their last common ancestor. In 2016 OrthoDB reached its 9th release, growing to over 22 million genes from over 5000 species, now adding plants, archaea and viruses. In this update we focused on usability of this f...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1119

    authors: Zdobnov EM,Tegenfeldt F,Kuznetsov D,Waterhouse RM,Simão FA,Ioannidis P,Seppey M,Loetscher A,Kriventseva EV

    update_date:2017-01-04 00:00:00

  • Determination of DNA cooperativity factor.

    abstract::The paper presents measurements of the difference in the melting temperature of a colE1 DNA region when it is located inside the DNA helix and at its end. A direct comparison of calculations based on the rigorous theory of helix-coil transition with experimental data for .2 M Na+ (the conditions for fully reversible m...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/9.20.5469

    authors: Amirikyan BR,Vologodskii AV,Lyubchenko YuL

    update_date:1981-10-24 00:00:00

  • Breaksite batch mapping, a rapid method for assay and identification of DNA breaksites in mammalian cells.

    abstract::DNA breaks occur during many processes in mammalian cells, including recombination, repair, mutagenesis and apoptosis. Here we report a simple and rapid method for assaying DNA breaks and identifying DNA breaksites. Breaksites are first tagged and amplified by ligation-mediated PCR (LM-PCR), using nested PCR primers t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.6.e33

    authors: Kong Q,Maizels N

    update_date:2001-03-15 00:00:00

  • Single-molecule FRET reveals the pre-initiation and initiation conformations of influenza virus promoter RNA.

    abstract::Influenza viruses have a segmented viral RNA (vRNA) genome, which is replicated by the viral RNA-dependent RNA polymerase (RNAP). Replication initiates on the vRNA 3' terminus, producing a complementary RNA (cRNA) intermediate, which serves as a template for the synthesis of new vRNA. RNAP structures show the 3' termi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw884

    authors: Robb NC,Te Velthuis AJ,Wieneke R,Tampé R,Cordes T,Fodor E,Kapanidis AN

    update_date:2016-12-01 00:00:00

  • FANTOM DB: database of Functional Annotation of RIKEN Mouse cDNA Clones.

    abstract::FANTOM DB, the database of Functional Annotation of RIKEN Mouse cDNA Clones, is designed to store sequence information of RIKEN full-length enriched mouse cDNA clones, graphical views of sequence analysis results, curated functional annotation information and additional descriptions, including Gene Ontology terms. RIK...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/30.1.116

    authors: Bono H,Kasukawa T,Furuno M,Hayashizaki Y,Okazaki Y

    update_date:2002-01-01 00:00:00

  • Three-step PCR mutagenesis for 'linker scanning'.

    abstract::'Linker scanning' has been used as an efficient method for systematically surveying a segment of DNA for functional elements by mutagenesis. A three-step PCR method was developed to simplify this process. In this method, a set of 'mutation primers' was made with 6 to 8 base substitutions in the center of the primers. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.16.3745

    authors: Li XM,Shapiro LJ

    update_date:1993-08-11 00:00:00

  • Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks.

    abstract::More and more evidences demonstrate that the long non-coding RNAs (lncRNAs) play many key roles in diverse biological processes. There is a critical need to annotate the functions of increasing available lncRNAs. In this article, we try to apply a global network-based strategy to tackle this issue for the first time. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks967

    authors: Guo X,Gao L,Liao Q,Xiao H,Ma X,Yang X,Luo H,Zhao G,Bu D,Jiao F,Shao Q,Chen R,Zhao Y

    update_date:2013-01-01 00:00:00

  • The PKR-binding domain of adenovirus VA RNAI exists as a mixture of two functionally non-equivalent structures.

    abstract::VA RNA(I) is a non-coding adenoviral transcript that counteracts the host cell anti-viral defenses such as immune responses mediated via PKR. We investigated potential alternate secondary structure conformations within the PKR-binding domain of VA RNA(I) using site-directed mutagenesis, RNA UV-melting analysis and enz...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp595

    authors: Wahid AM,Coventry VK,Conn GL

    update_date:2009-09-01 00:00:00

  • Unrecognized sequence homologies may confound genome-wide association studies.

    abstract::Genome-wide association studies (GWAS) have become a preferred method to identify new genetic susceptibility loci. This technique aims to understanding the molecular etiology of common diseases, but in many cases, it has led to the identification of loci with no obvious biological relevance. Herein, we show that previ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks169

    authors: Galichon P,Mesnard L,Hertig A,Stengel B,Rondeau E

    update_date:2012-06-01 00:00:00

  • A simple polypyrimidine repeat acts as an artificial Rho-dependent terminator in vivo and in vitro.

    abstract::In this paper, we present evidence that an efficient Rho-dependent terminator can be created by introducing a simple (AG/TC) n DNA repeat into a transcription unit. The Rho termination activity in vivo and in vitro is dependent on the length and the orientation of the insert. The transcription of at least 30 bp of the...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.21.4895

    authors: Guérin M,Robichon N,Geiselmann J,Rahmouni AR

    update_date:1998-11-01 00:00:00

  • NAIMA: target amplification strategy allowing quantitative on-chip detection of GMOs.

    abstract::We have developed a novel multiplex quantitative DNA-based target amplification method suitable for sensitive, specific and quantitative detection on microarray. This new method named NASBA Implemented Microarray Analysis (NAIMA) was applied to GMO detection in food and feed, but its application can be extended to all...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn524

    authors: Morisset D,Dobnik D,Hamels S,Zel J,Gruden K

    update_date:2008-10-01 00:00:00

  • Nucleotide sequence of the putative recognition site for coat protein in the RNAs of alfalfa mosaic virus and tobacco streak virus.

    abstract::The sequence of the 3'-terminal 180 and 140 nucleotides of RNAs 2 and 3, respectively, of tobacco streak virus (TSV) was deduced by reverse transcription in the presence of a specific primer and chain terminators. Homology between the two RNAs was found to be restricted to a 3'-terminal region of about 45 nucleotides. ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.15.3307

    authors: Koper-Zwarthoff EC,Bol JF

    update_date:1980-08-11 00:00:00

  • The initiator element of the Drosophila beta2 tubulin gene core promoter contributes to gene expression in vivo but is not required for male germ-cell specific expression.

    abstract::The tissue-specific expression of the Drosophila beta 2 tubulin gene ( B2t ) is accomplished by the action of a 14-bp activator element (beta2UE1) in combination with certain regulatory elements of the TATA-less, Inr-containing B2t core promoter. We performed an in vivo analysis of the Inr element function in the B2t ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.6.1439

    authors: Santel A,Kaufmann J,Hyland R,Renkawitz-Pohl R

    update_date:2000-03-15 00:00:00

  • Signatures of accelerated somatic evolution in gene promoters in multiple cancer types.

    abstract::Cancer-associated somatic mutations outside protein-coding regions remain largely unexplored. Analyses of the TERT locus have indicated that non-coding regulatory mutations can be more frequent than previously suspected and play important roles in oncogenesis. Using a computational method called SASE-hunter, developed...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv419

    authors: Smith KS,Yadav VK,Pedersen BS,Shaknovich R,Geraci MW,Pollard KS,De S

    update_date:2015-06-23 00:00:00

  • Quantitative sampling of conformational heterogeneity of a DNA hairpin using molecular dynamics simulations and ultrafast fluorescence spectroscopy.

    abstract::Molecular dynamics (MD) simulations and time resolved fluorescence (TRF) spectroscopy were combined to quantitatively describe the conformational landscape of the DNA primary binding sequence (PBS) of the HIV-1 genome, a short hairpin targeted by retroviral nucleocapsid proteins implicated in the viral reverse transcr...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw077

    authors: Voltz K,Léonard J,Touceda PT,Conyard J,Chaker Z,Dejaegere A,Godet J,Mély Y,Haacke S,Stote RH

    update_date:2016-04-20 00:00:00

  • Studies on transfer ribonucleic acids and related compounds. 8(1). Further studies on aromatic phosphoramidates as a protecting group for phosphomonoesters.

    abstract::Stability of aromatic phosphoramidates was studied using 2',3'-O-dibenzoyluridine 5'-phosphoramidates and N,2',3'-O-tribenzoylcytidine 5'-phosphate. The effect of dicyclohexylcarbodiimide in this mixture was investigated. Decomposition of the anilidate was slower in the presence of DCC. Substituted anilidates of uridi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/1.2.223

    authors: Otsuka E,Honda A,Shigyo H,Morioka S,Sugiyama T

    update_date:1974-02-01 00:00:00

  • Photosensitization of DNA by gold.

    abstract::Au (III) reacts with DNA at pH 5.6 to form a complex which is sensitive to mid-UV radiation. Cyclobutane pyrimidine dimers are produced at some 15 to 30 times the rate that they are in untreated DNA. The mechanism of photosensitization appears to involve energy absorption by Au-purine and Au-cytosine adducts which can ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/5.10.3731

    authors: Wilkins RJ

    update_date:1978-10-01 00:00:00

  • The PathoYeastract database: an information system for the analysis of gene and genomic transcription regulation in pathogenic yeasts.

    abstract::We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata Up...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw817

    authors: Monteiro PT,Pais P,Costa C,Manna S,Sá-Correia I,Teixeira MC

    update_date:2017-01-04 00:00:00

  • Comparison of the base pairing properties of a series of nitroazole nucleobase analogs in the oligodeoxyribonucleotide sequence 5'-d(CGCXAATTYGCG)-3'.

    abstract::The nucleoside analogs 1-(2'-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole (9), 1-(2'-deoxy-beta-D-ribofuranosyl)-4-nitropyrazole (10), 1-(2'-deoxy-beta-D-ribofuranosyl)-4-nitroimidazole (11) and 1-(2'-deoxy-beta-D-ribofuranosyl)-5-nitroindole (21) were incorporated into the oligonucleotide 5'-d(CGCXAATTYGCG)-3' in the f...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.10.1935

    authors: Bergstrom DE,Zhang P,Johnson WT

    update_date:1997-05-15 00:00:00

  • p59OASL, a 2'-5' oligoadenylate synthetase like protein: a novel human gene related to the 2'-5' oligoadenylate synthetase family.

    abstract::The 2'-5' oligoadenylate synthetases form a well conserved family of interferon induced proteins, presumably present throughout the mammalian class. Using the Expressed Sequence Tag databases, we have identified a novel member of this family. This protein, which we named p59 2'-5' oligoadenylate synthetase-like protei...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.18.4121

    authors: Hartmann R,Olsen HS,Widder S,Jorgensen R,Justesen J

    update_date:1998-09-15 00:00:00

  • Solution structure of duplex DNA containing an extrahelical abasic site analog determined by NMR spectroscopy and molecular dynamics.

    abstract::Translesional DNA synthesis past abasic sites proceeds with the preferential incorporation of dAMP opposite the lesion and, depending on the sequence context, one or two base deletions. High-resolution NMR spectroscopy and molecular dynamics simulations were used to determine the three-dimensional structure of a DNA h...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/26.10.2385

    authors: Lin Z,Hung KN,Grollman AP,de los Santos C

    update_date:1998-05-15 00:00:00

  • Role of nucleotide identity in effective CRISPR target escape mutations.

    abstract::Prokaryotes use primed CRISPR adaptation to update their memory bank of spacers against invading genetic elements that have escaped CRISPR interference through mutations in their protospacer target site. We previously observed a trend that nucleotide-dependent mismatches between crRNA and the protospacer strongly infl...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky687

    authors: Künne T,Zhu Y,da Silva F,Konstantinides N,McKenzie RE,Jackson RN,Brouns SJ

    update_date:2018-11-02 00:00:00