mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications.

Abstract:

High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact, the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. Equally important, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to the reference genome while discounting the mismatches that occur at common SNP locations provided by dbSNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure rather than as simple post-processing steps, and are thus performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net.
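The two features named in the abstract, SNP-aware mismatch counting and reporting only reads with at most n mapping loci, can be illustrated with a small sketch. This is not mrsFAST-Ultra's implementation: the abstract says the real tool does this inside its index structure, whereas the code below is a brute-force scan over a toy reference. The function names (`snp_aware_mismatches`, `map_read`), the parameters, and the SNP table layout (reference position mapped to a set of alternate alleles) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def snp_aware_mismatches(read: str, ref: str, pos: int,
                         snps: Dict[int, Set[str]]) -> int:
    """Count mismatches of `read` against `ref` at `pos`, discounting
    mismatches where the read base is a known alternate allele.

    `snps` is a hypothetical table: reference position -> alternate alleles.
    """
    mismatches = 0
    for offset, base in enumerate(read):
        ref_pos = pos + offset
        if base == ref[ref_pos]:
            continue
        # Discount the mismatch if the read carries a known SNP allele here.
        if base in snps.get(ref_pos, set()):
            continue
        mismatches += 1
    return mismatches


def map_read(read: str, ref: str, snps: Dict[int, Set[str]],
             max_errors: int, max_loci: int) -> List[int]:
    """Return every locus within `max_errors` mismatches, or an empty list
    when the read maps to more than `max_loci` loci (brute-force scan)."""
    loci = [
        pos
        for pos in range(len(ref) - len(read) + 1)
        if snp_aware_mismatches(read, ref, pos, snps) <= max_errors
    ]
    return loci if len(loci) <= max_loci else []
```

With a reference of `"ACGTACGTAC"`, a read of `"ACGA"` and a known alternate allele `A` at position 3, the read maps to locus 0 with zero counted mismatches, whereas without the SNP table the same alignment would cost one mismatch.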

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Hach F,Sarrafi I,Hormozdiari F,Alkan C,Eichler EE,Sahinalp SC

doi

10.1093/nar/gku370

subject

Has Abstract

pub_date

2014-07-01 00:00:00

pages

W494-500

issue

Web Server issue

eissn

1362-4962

issn

0305-1048

pii

gku370

journal_volume

42

pub_type

Journal article
  • NMR solution structure of an asymmetric intermolecular leaped V-shape G-quadruplex: selective recognition of the d(G2NG3NG4) sequence motif by a short linear G-rich DNA probe.

    abstract::Aside from classical loops among G-quadruplexes, the unique leaped V-shape scaffold spans over three G-tetrads, without any intervening residues. This scaffold enables a sharp reversal of two adjacent strand directions and simultaneously participates in forming the G-tetrad core. These features make this scaffold itse...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gky1167

    authors: Wan C,Fu W,Jing H,Zhang N

    update_date:2019-02-20 00:00:00

  • Analysis of the proximal transcriptional element of the myelin basic protein gene.

    abstract::The gene encoding myelin basic protein (MBP) contains multiple activator sequences spanning upstream of its transcriptional initiation site which differentially promote transcription in glial cells. The proximal activator sequence, designated MB1, activates transcription in a glial cell type specific manner. This sequ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/20.3.545

    authors: Devine-Beach K,Haas S,Khalili K

    update_date:1992-02-11 00:00:00

  • Single-cell transcriptome analysis of Physcomitrella leaf cells during reprogramming using microcapillary manipulation.

    abstract::Next-generation sequencing technologies have made it possible to carry out transcriptome analysis at the single-cell level. Single-cell RNA-sequencing (scRNA-seq) data provide insights into cellular dynamics, including intercellular heterogeneity as well as inter- and intra-cellular fluctuations in gene expression tha...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkz181

    authors: Kubo M,Nishiyama T,Tamada Y,Sano R,Ishikawa M,Murata T,Imai A,Lang D,Demura T,Reski R,Hasebe M

    update_date:2019-05-21 00:00:00

  • Conserved and species-specific transcription factor co-binding patterns drive divergent gene regulation in human and mouse.

    abstract::The mouse is widely used as a system to study human genetic mechanisms. However, extensive rewiring of transcriptional regulatory networks often confounds translation of findings between human and mouse. Site-specific gain and loss of individual transcription factor binding sites (TFBS) has caused functional divergence ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gky018

    authors: Diehl AG,Boyle AP

    update_date:2018-02-28 00:00:00

  • Genome-wide quantitative assessment of variation in DNA methylation patterns.

    abstract::Genomic DNA methylation contributes substantively to transcriptional regulations that underlie mammalian development and cellular differentiation. Much effort has been made to decipher the molecular mechanisms governing the establishment and maintenance of DNA methylation patterns. However, little is known about genom...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkr017

    authors: Xie H,Wang M,de Andrade A,Bonaldo Mde F,Galat V,Arndt K,Rajaram V,Goldman S,Tomita T,Soares MB

    update_date:2011-05-01 00:00:00

  • DNA intercalation without flipping in the specific ThaI-DNA complex.

    abstract::The PD-(D/E)XK type II restriction endonuclease ThaI cuts the target sequence CG/CG with blunt ends. Here, we report the 1.3 Å resolution structure of the enzyme in complex with substrate DNA and a sodium or calcium ion taking the place of a catalytic magnesium ion. The structure identifies Glu54, Asp82 and Lys93 as t...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkq834

    authors: Firczuk M,Wojciechowski M,Czapinska H,Bochtler M

    update_date:2011-01-01 00:00:00

  • Reorganization of terminator DNA upon binding replication terminator protein: implications for the functional replication fork arrest complex.

    abstract::Termination of DNA replication in Bacillus subtilis involves the polar arrest of replication forks by a specific complex formed between the replication terminator protein (RTP) and DNA terminator sites. While determination of the crystal structure of RTP has facilitated our understanding of how a single RTP dimer inte...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/25.3.590

    authors: Kralicek AV,Wilson PK,Ralston GB,Wake RG,King GF

    update_date:1997-02-01 00:00:00

  • NF45 dimerizes with NF90, Zfr and SPNR via a conserved domain that has a nucleotidyltransferase fold.

    abstract::Nuclear factors NF90 and NF45 form a complex involved in a variety of cellular processes and are thought to affect gene expression both at the transcriptional and translational level. In addition, this complex affects the replication of several viruses through direct interactions with viral RNA. NF90 and NF45 dimerize...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gks696

    authors: Wolkowicz UM,Cook AG

    update_date:2012-10-01 00:00:00

  • PHI-base: a new interface and further additions for the multi-species pathogen-host interactions database.

    abstract::The pathogen-host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer reviewed research articles. In addition, literature that indicates specific ge...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkw1089

    authors: Urban M,Cuzick A,Rutherford K,Irvine A,Pedro H,Pant R,Sadanadan V,Khamari L,Billal S,Mohanty S,Hammond-Kosack KE

    update_date:2017-01-04 00:00:00

  • Transcription Regulatory Regions Database (TRRD): its status in 1999.

    abstract::The Transcription Regulatory Regions Database (TRRD) is a curated database designed for accumulation of experimental data on extended regulatory regions of eukaryotic genes, the regulatory elements they contain, i.e., transcription factor binding sites, promoters, enhancers, silencers, etc., and expression patterns of...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/27.1.303

    authors: Kolchanov NA,Ananko EA,Podkolodnaya OA,Ignatieva EV,Stepanenko IL,Kel-Margoulis OV,Kel AE,Merkulova TI,Goryachkovskaya TN,Busygina TV,Kolpakov FA,Podkolodny NL,Naumochkin AN,Romashchenko AG

    update_date:1999-01-01 00:00:00

  • In vivo screening of modified siRNAs for non-specific antiviral effect in a small fish model: number and localization in the strands are important.

    abstract::Small interfering RNAs (siRNAs) are promising new active compounds in gene medicine but the induction of non-specific immune responses following their delivery continues to be a serious problem. With the purpose of avoiding such effects chemically modified siRNAs are tested in screening assay but often only examining ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gks033

    authors: Schyth BD,Bramsen JB,Pakula MM,Larashati S,Kjems J,Wengel J,Lorenzen N

    update_date:2012-05-01 00:00:00

  • A gene-specific DNA sequencing chip for exploring molecular evolutionary change.

    abstract::Sequencing by hybridization (SBH) approaches to DNA sequencing face two conflicting constraints. First, in order to ensure that the target DNA binds reliably, the oligonucleotide probes that are attached to the chip array must be >15 bp in length. Secondly, the total number of possible 15 bp oligonucleotides is too la...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkh210

    authors: Fedrigo O,Naylor G

    update_date:2004-02-18 00:00:00

  • Checkpoint kinase 1 negatively regulates somatic hypermutation.

    abstract::Immunoglobulin (Ig) diversification by somatic hypermutation in germinal center B cells is instrumental for maturation of the humoral immune response, but also bears the risk of excessive or aberrant genetic changes. Thus, introduction of DNA damage by activation-induced cytidine deaminase as well as DNA repair by mul...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkt1378

    authors: Frankenberger S,Davari K,Fischer-Burkart S,Böttcher K,Tomi NS,Zimber-Strobl U,Jungnickel B

    update_date:2014-04-01 00:00:00

  • Spermine-DNA complexes build up metastable structures. Small-angle X-ray scattering and circular dichroism studies.

    abstract::Spermine-DNA complexes have been examined by small-angle and wide-angle X-ray scattering as well as by circular dichroism studies. Condensed complexes are building up below a critical ionic strength. We have found that at one and the same ionic strength condensed complexes having two different supramolecular structure...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/7.5.1297

    authors: Becker M,Misselwitz R,Damaschun H,Damaschun G,Zirwer D

    update_date:1979-11-10 00:00:00

  • Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis.

    abstract::To date, an effective therapeutic treatment that confers strong attenuation toward coronaviruses (CoVs) remains elusive. Of all the potential drug targets, the helicase of CoVs is considered to be one of the most important. Here, we first present the structure of the full-length Nsp13 helicase of SARS-CoV (SARS-Nsp13)...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkz409

    authors: Jia Z,Yan L,Ren Z,Wu L,Wang J,Guo J,Zheng L,Ming Z,Zhang L,Lou Z,Rao Z

    update_date:2019-07-09 00:00:00

  • The transfer RNA of certain Enterobacteriacae contain 2-methylthiozeatin riboside (ms2io6A) an isopentenyl adenosine derivative.

    abstract::Isopentenyl adenosine derivatives are always located adjacent to the 3' end of the anticodon in transfer RNA and have been implicated in certain biological functions. In the enteric bacterium, E. coli, the derivative is ms2i6A whereas in some plant associated bacteria the derivative is the hydroxylated form, ms2io6A. ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/10.18.5663

    authors: Janzer JJ,Raney JP,McLennan BD

    update_date:1982-09-25 00:00:00

  • Characterization of multiple alternative RNAs resulting from antisense transcription of the PR264/SC35 splicing factor gene.

    abstract::The PR264/SC35 splicing factor belongs to the family of SR proteins which function as essential and alternative splicing factors. Here, we report that the human PR264/SC35 locus is bidirectionally transcribed. Double in situ hybridization experiments have allowed simultaneous detection of sense and antisense RNA in hu...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/25.22.4513

    authors: Sureau A,Soret J,Guyon C,Gaillard C,Dumon S,Keller M,Crisanti P,Perbal B

    update_date:1997-11-15 00:00:00

  • Structure and transcription of the Drosophila mulleri alcohol dehydrogenase genes.

    abstract::The D. melanogaster Adh gene is transcribed from two different promoters; a proximal (larval) promoter is active during late embryonic and larval stages, and a distal (adult) promoter is active primarily in third instar larvae and in adult flies (1). Genetic analyses suggest that several species of the mulleri subgrou...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/13.19.6899

    authors: Fischer JA,Maniatis T

    update_date:1985-10-11 00:00:00

  • Different sequence signatures in the upstream regions of plant and animal tRNA genes shape distinct modes of regulation.

    abstract::In eukaryotes, the transcription of tRNA genes is initiated by the concerted action of transcription factors IIIC (TFIIIC) and IIIB (TFIIIB) which direct the recruitment of polymerase III. While TFIIIC recognizes highly conserved, intragenic promoter elements, TFIIIB binds to the non-coding 5'-upstream regions of the ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkq1257

    authors: Zhang G,Lukoszek R,Mueller-Roeber B,Ignatova Z

    update_date:2011-04-01 00:00:00

  • Grad-seq shines light on unrecognized RNA and protein complexes in the model bacterium Escherichia coli.

    abstract::Stable protein complexes, including those formed with RNA, are major building blocks of every living cell. Escherichia coli has been the leading bacterial organism with respect to global protein-protein networks. Yet, there has been no global census of RNA/protein complexes in this model species of microbiology. Here,...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkaa676

    authors: Hör J,Di Giorgio S,Gerovac M,Venturini E,Förstner KU,Vogel J

    update_date:2020-09-18 00:00:00

  • Superhelical stress restrained in plasmid DNA during repair synthesis initiated by the UvrA, B and C proteins in vitro.

    abstract::Purified UvrA, UvrB, UvrC, UvrD, PolA and Lig proteins from Escherichia coli have been used to assess the effect of nucleotide excision repair on the conformation of native negatively supercoiled plasmid DNA in an in vitro test system. The analysis of labeled reaction products on specific gel systems suggests that the...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/17.24.10337

    authors: Backendorf C,Olsthoorn R,van de Putte P

    update_date:1989-12-25 00:00:00

  • Identification of HDA15-PIF1 as a key repression module directing the transcriptional network of seed germination in the dark.

    abstract::Light is a major external factor in regulating seed germination. Photoreceptor phytochrome B (PHYB) plays a predominant role in promoting seed germination in the initial phase after imbibition, partially by repressing phytochrome-interacting factor1 (PIF1). However, the mechanism underlying the PHYB-PIF1-mediated tran...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkx283

    authors: Gu D,Chen CY,Zhao M,Zhao L,Duan X,Duan J,Wu K,Liu X

    update_date:2017-07-07 00:00:00

  • Processing of the external transcribed spacer of murine rRNA and site of action of actinomycin D.

    abstract::The primary rRNA transcript contains a large external transcribed spacer (ETS) approximately 4,000 nucleotides in length. We have used subcloned DNA probes derived from the 5' end of the ETS in conjunction with Northern blot analysis of murine nuclear RNA to examine processing of this region. In agreement with the res...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/12.18.7187

    authors: Fetherston J,Werner E,Patterson R

    update_date:1984-09-25 00:00:00

  • Sequences within and flanking hypersensitive sites 3 and 2 of the beta-globin locus control region required for synergistic versus additive interaction with the epsilon-globin gene promoter.

    abstract::The locus control region is required for high-level, position-independent expression of mammalian beta-globin genes. It is marked by five major DNase hypersensitive sites (HSs) in a 16 kb region of chromatin, and the protein-DNA complexes that form these HSs may interact in a holocomplex that carries out the full func...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/24.21.4327

    authors: Jackson JD,Miller W,Hardison RC

    update_date:1996-11-01 00:00:00

  • Regulation of the small regulatory RNA MicA by ribonuclease III: a target-dependent pathway.

    abstract::MicA is a trans-encoded small non-coding RNA, which downregulates porin-expression in stationary-phase. In this work, we focus on the role of endoribonucleases III and E on Salmonella typhimurium sRNA MicA regulation. RNase III is shown to regulate MicA in a target-coupled way, while RNase E is responsible for the con...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkq1239

    authors: Viegas SC,Silva IJ,Saramago M,Domingues S,Arraiano CM

    update_date:2011-04-01 00:00:00

  • Nucleosomes undergo slow spontaneous gaping.

    abstract::In eukaryotes, DNA is packaged into a basic unit, the nucleosome which consists of 147 bp of DNA wrapped around a histone octamer composed of two copies each of the histones H2A, H2B, H3 and H4. Nucleosome structures are diverse not only by histone variants, histone modifications, histone composition but also through ...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkv276

    authors: Ngo TT,Ha T

    update_date:2015-04-30 00:00:00

  • A possible origin of newly-born bacterial genes: significance of GC-rich nonstop frame on antisense strand.

    abstract::Base compositions were examined at every position in codons of more than 50 genes from taxonomically different bacteria and of the corresponding antisense sequences on the bacterial genes. We propose that the nonstop frame on antisense strand [NSF(a)] of GC-rich bacterial genes is the most promising sequence for newly...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/24.21.4249

    authors: Ikehara K,Amada F,Yoshida S,Mikata Y,Tanaka A

    update_date:1996-11-01 00:00:00

  • AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system.

    abstract::We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be o...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkl471

    authors: Bryson K,Loux V,Bossy R,Nicolas P,Chaillou S,van de Guchte M,Penaud S,Maguin E,Hoebeke M,Bessières P,Gibrat JF

    update_date:2006-07-19 00:00:00

  • Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins.

    abstract::The PROSITE pattern Zinc_Finger_C2H2 was extended to permit the detection of all C2H2 zinc fingers and their parent proteins in the recently completed sequence of the yeast genome. Additionally, a new computer program was written that extracts other zinc binding motifs (non C2H2 'fingers'), overlapping with the classi...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/25.12.2464

    authors: Böhm S,Frishman D,Mewes HW

    update_date:1997-06-15 00:00:00

  • SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences.

    abstract::The single nucleotide polymorphisms (SNPs) in conserved protein regions have been thought to be strong candidates that alter protein functions. Thus, we have developed SNP@Domain, a web resource, to identify SNPs within human protein domains. We annotated SNPs from dbSNP with protein structure-based as well as sequenc...

    journal_title:Nucleic acids research

    pub_type: Journal article

    doi:10.1093/nar/gkl323

    authors: Han A,Kang HJ,Cho Y,Lee S,Kim YJ,Gong S

    update_date:2006-07-01 00:00:00