Probabilistic error correction for RNA sequencing.

Abstract:

Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Le HS,Schulz MH,McCauley BM,Hinman VF,Bar-Joseph Z

doi

10.1093/nar/gkt215

subject

Has Abstract

pub_date

2013-05-01 00:00:00

pages

e109

issue

10

eissn

1362-4962

issn

0305-1048

pii

gkt215

journal_volume

41

pub_type

Journal Article
  • Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes.

    abstract::Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindere...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp248

    authors: Zhang S,Xu M,Li S,Su Z

    update_date:2009-06-01 00:00:00

  • Measuring the dynamic surface accessibility of RNA with the small paramagnetic molecule TEMPOL.

    abstract::The surface accessibility of macromolecules plays a key role in modulating molecular recognition events. RNA is a complex and dynamic molecule involved in many aspects of gene expression. However, there are few experimental methods available to measure the accessible surface of RNA. Here, we investigate the accessible...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm1062

    authors: Venditti V,Niccolai N,Butcher SE

    update_date:2008-03-01 00:00:00

  • Mutant isolation of mouse DNA topoisomerase II alpha in yeast.

    abstract::For characterizing in vivo functions of a mammalian protein, it is informative to obtain conditional mutations and apply them to the mouse genetic system. However, the isolation of conditional mutations has been quite difficult in cultured cells. We report here that functional expression of a heterologous mammalian ge...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.20.4229

    authors: Adachi N,Ikeda H,Kikuchi A

    update_date:1994-10-11 00:00:00

  • A crystalline end product produced by the hydrolytic cleavage of an RNA-like fragment by an organometallointercalator: 1,10-phenanthroline-platinum(II)-ethylenediamine-cytidine 3' monophosphate.

    abstract::1,10-Phenanthroline-platinum(II)-ethylenediamine ( PEPt ) forms a crystalline complex with cytidine-3'-phosphate (3'-CMP) and its structure has been determined by X-ray crystallography. 3'-CMP molecules are hemiprotonated and form hydrogen-bonded pairs that stack above and below the phenanthroline-platinum(II) drug mo...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/12.8.3649

    authors: Vijay-Kumar S,Sakore TD,Sobell HM

    update_date:1984-04-25 00:00:00

  • The ribonuclease DIS3 promotes let-7 miRNA maturation by degrading the pluripotency factor LIN28B mRNA.

    abstract::Multiple myeloma, the second most frequent hematologic tumor after lymphomas, is an incurable cancer. Recent sequencing efforts have identified the ribonuclease DIS3 as one of the most frequently mutated genes in this disease. DIS3 represents the catalytic subunit of the exosome, a macromolecular complex central to th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv387

    authors: Segalla S,Pivetti S,Todoerti K,Chudzik MA,Giuliani EC,Lazzaro F,Volta V,Lazarevic D,Musco G,Muzi-Falconi M,Neri A,Biffo S,Tonon G

    update_date:2015-05-26 00:00:00

  • Modulation of HBV replication by microRNA-15b through targeting hepatocyte nuclear factor 1α.

    abstract::Hepatitis B virus (HBV) infection remains a major health problem worldwide. The role played by microRNAs (miRNAs) in HBV replication and pathogenesis is being increasingly recognized. In this study, we found that miR-15b, an important miRNA during HBV infection and hepatocellular carcinoma development, directly binds ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gku260

    authors: Dai X,Zhang W,Zhang H,Sun S,Yu H,Guo Y,Kou Z,Zhao G,Du L,Jiang S,Zhang J,Li J,Zhou Y

    update_date:2014-06-01 00:00:00

  • The mechanism of mutation induction by a hydrogen bond ambivalent, bicyclic N4-oxy-2'-deoxycytidine in Escherichia coli.

    abstract::The triphosphate of the nucleoside deoxyribosyl dihydropyrimido[4,5-c][1,2]oxazin-7-one (dP) is known to be incorporated into DNA efficiently by Taq polymerase and is a useful tool for polymerase-mediated in vitro mutagenesis. It is shown here that dP is a potent mutagen in Escherichia coli and Salmonella typhimurium ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.8.1548

    authors: Negishi K,Williams DM,Inoue Y,Moriyama K,Brown DM,Hayatsu H

    update_date:1997-04-15 00:00:00

  • Different enhancer classes in Drosophila bind distinct architectural proteins and mediate unique chromatin interactions and 3D architecture.

    abstract::Eukaryotic gene expression is regulated by enhancer-promoter interactions but the molecular mechanisms that govern specificity have remained elusive. Genome-wide studies utilizing STARR-seq identified two enhancer classes in Drosophila that interact with different core promoters: housekeeping enhancers (hkCP) and deve...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1114

    authors: Cubeñas-Potts C,Rowley MJ,Lyu X,Li G,Lei EP,Corces VG

    update_date:2017-02-28 00:00:00

  • How to fold and protect mitochondrial ribosomal RNA with fewer guanines.

    abstract::Mammalian mitochondrial ribosomes evolved from bacterial ribosomes by reduction of ribosomal RNAs, increase of ribosomal protein content, and loss of guanine nucleotides. Guanine is the base most sensitive to oxidative damage. By systematically comparing high-quality, small ribosomal subunit RNA sequence alignments an...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky762

    authors: Hosseini M,Roy P,Sissler M,Zirbel CL,Westhof E,Leontis N

    update_date:2018-11-16 00:00:00

  • Analysis of DNA sequences which regulate the transcription of herpes simplex virus immediate early gene 3: DNA sequences required for enhancer-like activity and response to trans-activation by a virion polypeptide.

    abstract::The far upstream region of herpes simplex virus (HSV) immediate early (IE) gene 3 has previously been shown to increase gene expression in an enhancer-like manner, and to contain sequences which respond to stimulation of transcription by a virion polypeptide, Vmw65. To analyse the specific DNA sequences which mediate ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.2.929

    authors: Bzik DJ,Preston CM

    update_date:1986-01-24 00:00:00

  • Characterization of a cAMP responsive transcription factor, Cmr (Rv1675c), in TB complex mycobacteria reveals overlap with the DosR (DevR) dormancy regulon.

    abstract::Mycobacterium tuberculosis (Mtb) Cmr (Rv1675c) is a CRP/FNR family transcription factor known to be responsive to cAMP levels and during macrophage infections. However, Cmr's DNA binding properties, cellular targets and overall role in tuberculosis (TB) complex bacteria have not been characterized. In this study, we u...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv889

    authors: Ranganathan S,Bai G,Lyubetskaya A,Knapp GS,Peterson MW,Gazdik M,C Gomes AL,Galagan JE,McDonough KA

    update_date:2016-01-08 00:00:00

  • Alu retrotransposons promote differentiation of human carcinoma cells through the aryl hydrocarbon receptor.

    abstract::Cell differentiation is a central process in development and in cancer growth and dissemination. OCT4 (POU5F1) and NANOG are essential for cell stemness and pluripotency; yet, the mechanisms that regulate their expression remain largely unknown. Repetitive elements account for almost half of the Human Genome; still, t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw095

    authors: Morales-Hernández A,González-Rico FJ,Román AC,Rico-Leo E,Alvarez-Barrientos A,Sánchez L,Macia Á,Heras SR,García-Pérez JL,Merino JM,Fernández-Salguero PM

    update_date:2016-06-02 00:00:00

  • The IntAct molecular interaction database in 2012.

    abstract::IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkr1088

    authors: Kerrien S,Aranda B,Breuza L,Bridge A,Broackes-Carter F,Chen C,Duesbury M,Dumousseau M,Feuermann M,Hinz U,Jandrasits C,Jimenez RC,Khadake J,Mahadevan U,Masson P,Pedruzzi I,Pfeiffenberger E,Porras P,Raghunath A,Roeche

    update_date:2012-01-01 00:00:00

  • Saccharomyces genome database informs human biology.

    abstract::The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existin...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx1112

    authors: Skrzypek MS,Nash RS,Wong ED,MacPherson KA,Hellerstedt ST,Engel SR,Karra K,Weng S,Sheppard TK,Binkley G,Simison M,Miyasato SR,Cherry JM

    update_date:2018-01-04 00:00:00

  • Coordination logic of the sensing machinery in the transcriptional regulatory network of Escherichia coli.

    abstract::The active and inactive state of transcription factors in growing cells is usually directed by allosteric physicochemical signals or metabolites, which are in turn either produced in the cell or obtained from the environment by the activity of the products of effector genes. To understand the regulatory dynamics and t...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm743

    authors: Janga SC,Salgado H,Martínez-Antonio A,Collado-Vides J

    update_date:2007-01-01 00:00:00

  • Different sequence signatures in the upstream regions of plant and animal tRNA genes shape distinct modes of regulation.

    abstract::In eukaryotes, the transcription of tRNA genes is initiated by the concerted action of transcription factors IIIC (TFIIIC) and IIIB (TFIIIB) which direct the recruitment of polymerase III. While TFIIIC recognizes highly conserved, intragenic promoter elements, TFIIIB binds to the non-coding 5'-upstream regions of the ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq1257

    authors: Zhang G,Lukoszek R,Mueller-Roeber B,Ignatova Z

    update_date:2011-04-01 00:00:00

  • MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.

    abstract::Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of i...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw374

    authors: Li M,Simonetti FL,Goncearenco A,Panchenko AR

    update_date:2016-07-08 00:00:00

  • Molecular characterization of a gene encoding a photolyase from Streptomyces griseus.

    abstract::By using a synthetic DNA probe derived from an amino acid sequence in the most conserved region of three known photolyases (Escherichia coli, Anacystis nidulans and Saccharomyces cerevisiae), we isolated a DNA fragment containing two long open reading frames (ORFs) from a genomic DNA library of Streptomyces griseus. O...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/17.12.4731

    authors: Kobayashi T,Takao M,Oikawa A,Yasui A

    update_date:1989-06-26 00:00:00

  • Inhibition of telomerase by 2'-O-(2-methoxyethyl) RNA oligomers: effect of length, phosphorothioate substitution and time inside cells.

    abstract::2'-O-(2-methoxyethyl) (2'-MOE) RNA possesses favorable pharmacokinetic properties that make it a promising option for the design of oligonucleotide drugs. Telomerase is a ribonucleoprotein that is up-regulated in many types of cancer, but its potential as a target for chemotherapy awaits the development of potent and ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.8.1683

    authors: Elayadi AN,Demieville A,Wancewicz EV,Monia BP,Corey DR

    update_date:2001-04-15 00:00:00

  • BCNTB bioinformatics: the next evolutionary step in the bioinformatics of breast cancer tissue banking.

    abstract::Here, we present an update of Breast Cancer Now Tissue Bank bioinformatics, a rich platform for the sharing, mining, integration and analysis of breast cancer data. Its modalities provide researchers with access to a centralised information gateway from which they can access a network of bioinformatic resources to que...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx913

    authors: Gadaleta E,Pirrò S,Dayem Ullah AZ,Marzec J,Chelala C

    update_date:2018-01-04 00:00:00

  • European Nucleotide Archive in 2016.

    abstract::The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) offers a rich platform for data sharing, publishing and archiving and a globally comprehensive data set for onward use by the scientific community. With a broad scope spanning raw sequencing reads, genome assemblies and functional annotation, the resource...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkw1106

    authors: Toribio AL,Alako B,Amid C,Cerdeño-Tarrága A,Clarke L,Cleland I,Fairley S,Gibson R,Goodgame N,Ten Hoopen P,Jayathilaka S,Kay S,Leinonen R,Liu X,Martínez-Villacorta J,Pakseresht N,Rajan J,Reddy K,Rosello M,Silvester N

    update_date:2017-01-04 00:00:00

  • alpha-DNA. I. Synthesis, characterization by high field 1H-NMR, and base-pairing properties of the unnatural hexadeoxyribonucleotide alpha-[d(CpCpTpTpCpC)] with its complement beta-[d(GpGpApApGpG)].

    abstract::The novel deoxyribonucleotide alpha-[d(CpCpTpTpCpC)] and its complement beta-[d(GpGpApApGpG)] were synthesized by the phosphotriester method. 1H-NMR-NOE examination of the alpha-hexamer revealed that the cytosine and thymine bases appear to adopt anti conformations in this strand. In addition the deoxyribose of the th...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.12.5019

    authors: Morvan F,Rayner B,Imbach JL,Chang DK,Lown JW

    update_date:1986-06-25 00:00:00

  • Transcription of intragenic CpG islands influences spatiotemporal host gene pre-mRNA processing.

    abstract::Alternative splicing (AS) and alternative polyadenylation (APA) generate diverse transcripts in mammalian genomes during development and differentiation. Epigenetic marks such as trimethylation of histone H3 lysine 36 (H3K36me3) and DNA methylation play a role in generating transcriptome diversity. Intragenic CpG isla...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkaa556

    authors: Amante SM,Montibus B,Cowley M,Barkas N,Setiadi J,Saadeh H,Giemza J,Contreras-Castillo S,Fleischanderl K,Schulz R,Oakey RJ

    update_date:2020-09-04 00:00:00

  • LigParGen web server: an automatic OPLS-AA parameter generator for organic ligands.

    abstract::The accurate calculation of protein/nucleic acid-ligand interactions or condensed phase properties by force field-based methods requires a precise description of the energetics of intermolecular interactions. Despite the progress made in force fields, small molecule parameterization remains an open problem due to the m...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx312

    authors: Dodda LS,Cabeza de Vaca I,Tirado-Rives J,Jorgensen WL

    update_date:2017-07-03 00:00:00

  • The RdgC protein employs a novel mechanism involving a finger domain to bind to circular DNA.

    abstract::The DNA-binding protein RdgC has been identified as an inhibitor of RecA-mediated homologous recombination in Escherichia coli. In Neisseria species, RdgC also has a role in virulence-associated antigenic variation. We have previously solved the crystal structure of the E. coli RdgC protein and shown it to form a toro...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq509

    authors: Briggs GS,Yu J,Mahdi AA,Lloyd RG

    update_date:2010-10-01 00:00:00

  • Magnetic tweezers measurements of the nanomechanical stability of DNA against denaturation at various conditions of pH and ionic strength.

    abstract::The opening of DNA double strands is extremely relevant to several biological functions, such as replication and transcription or binding of specific proteins. Such opening phenomenon is particularly sensitive to the aqueous solvent conditions in which the DNA molecule is dispersed, as it can be observed by considerin...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks1206

    authors: Tempestini A,Cassina V,Brogioli D,Ziano R,Erba S,Giovannoni R,Cerrito MG,Salerno D,Mantegazza F

    update_date:2013-02-01 00:00:00

  • One RNA aptamer sequence, two structures: a collaborating pair that inhibits AMPA receptors.

    abstract::RNA is ideally suited for in vitro evolution experiments, because a single RNA molecule possesses both genotypic (replicable sequence) and phenotypic (selectable shape) properties. Using systematic evolution of ligands by exponential enrichment (SELEX), we found a single 58-nt aptamer sequence that assumes two structu...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp284

    authors: Huang Z,Pei W,Han Y,Jayaseelan S,Shekhtman A,Shi H,Niu L

    update_date:2009-07-01 00:00:00

  • Ordered distribution of modified bases in the DNA of a dinoflagellate.

    abstract::In DNA of the dinoflagellate Crypthecodinium cohnii, 38% of the thymine is replaced by the modified base 5-hydroxymethyluracil, and approximately 3% of the cytosine is replaced by 5-methylcytosine. Both of the modified bases are non-randomly distributed in the DNA. Determinations of 3' nearest neighbors show that HOMe...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/8.20.4709

    authors: Steele RE,Rae PM

    update_date:1980-10-24 00:00:00

  • Sequence preference and structural heterogeneity of BZ junctions.

    abstract::BZ junctions, which connect B-DNA to Z-DNA, are necessary for local transformation of B-DNA to Z-DNA in the genome. However, the limited information on the junction-forming sequences and junction structures has led to a lack of understanding of the structural diversity and sequence preferences of BZ junctions. We dete...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky784

    authors: Kim D,Hur J,Han JH,Ha SC,Shin D,Lee S,Park S,Sugiyama H,Kim KK

    update_date:2018-11-02 00:00:00

  • PReMod: a database of genome-wide mammalian cis-regulatory module predictions.

    abstract::We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656-668, exploits the fact that many known CRMs are made of clusters of phylogenetically conser...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl879

    authors: Ferretti V,Poitras C,Bergeron D,Coulombe B,Robert F,Blanchette M

    update_date:2007-01-01 00:00:00