HapZipper: sharing HapMap populations just got easier.

Abstract:

The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.

journal_name

Nucleic Acids Res

journal_title

Nucleic acids research

authors

Chanda P,Elhaik E,Bader JS

doi

10.1093/nar/gks709

subject

Has Abstract

pub_date

2012-11-01 00:00:00

pages

e159

issue

20

eissn

1362-4962

issn

0305-1048

pii

gks709

journal_volume

40

pub_type

Journal Article
  • PCR amplification from single DNA molecules on magnetic beads in emulsion: application for high-throughput screening of transcription factor targets.

    abstract::We have developed a novel method of genetic library construction on magnetic microbeads based on solid-phase single-molecule PCR in a fine and robust water-phase compartment formed in water-in-oil (w/o) emulsions. In this method, critically diluted DNA fragments were distributed over the emulsion as templates, where b...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gni143

    authors: Kojima T,Takei Y,Ohtsuka M,Kawarasaki Y,Yamane T,Nakano H

    update_date:2005-10-06 00:00:00

  • Structures of m-iodo Hoechst-DNA complexes in crystals with reduced solvent content: implications for minor groove binder drug design.

    abstract::The DNA photosensitisers m-iodo Hoechst and m-iodo, p-methoxy Hoechst have been co-crystallised with the oligonucleotide d(CGCGAATTCGCG)(2) and their crystal structures determined. The crystals were then subjected to slow dehydration, which reduced their solvent contents from 40 (normal) to 30 (partially dehydrated) an...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.5.1252

    authors: Squire CJ,Baker LJ,Clark GR,Martin RF,White J

    update_date:2000-03-01 00:00:00

  • Topoisomerase II regulates yeast genes with singular chromatin architectures.

    abstract::Eukaryotic topoisomerase II (topo II) is the essential decatenase of newly replicated chromosomes and the main relaxase of nucleosomal DNA. Apart from these general tasks, topo II participates in more specialized functions. In mammals, topo IIα interacts with specific RNA polymerases and chromatin-remodeling complexes...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkt707

    authors: Nikolaou C,Bermúdez I,Manichanh C,García-Martinez J,Guigó R,Pérez-Ortín JE,Roca J

    update_date:2013-11-01 00:00:00

  • DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model.

    abstract::RNA pseudoknots are functional structure elements with key roles in viral and cellular processes. Prediction of a pseudoknotted minimum free energy structure is an NP-complete problem. Practical algorithms for RNA structure prediction including restricted classes of pseudoknots suffer from high runtime and poor accura...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq021

    authors: Sperschneider J,Datta A

    update_date:2010-04-01 00:00:00

  • Microcomputer programs for back translation of protein to DNA sequences and analysis of ambiguous DNA sequences.

    abstract::Three computer programs are described which may be used to translate a DNA sequence into a protein sequence, back translate the protein sequence into an ambiguous DNA sequence, and then do pattern searching in the ambiguous sequence. The programs are written in the C programming language, have been compiled to run on ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/12.1part2.819

    authors: Mount DW,Conrad B

    update_date:1984-01-11 00:00:00

  • Recombinant hnRNP protein A1 and its N-terminal domain show preferential affinity for oligodeoxynucleotides homologous to intron/exon acceptor sites.

    abstract::The reported binding preference of human hnRNP protein A1 for the 3'-splice site of some introns (Swanson and Dreyfuss (1988) EMBO J. 7, 3519-3529; Mayrand and Pederson (1990) Nucleic Acids Res. 18, 3307-3318) was tested by assaying in vitro the binding of purified recombinant A1 protein (expressed in bacteria) to syn...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/18.22.6595

    authors: Buvoli M,Cobianchi F,Biamonti G,Riva S

    update_date:1990-11-25 00:00:00

  • Isolation and developmental expression of a rat cDNA encoding a cysteine-rich zinc finger protein.

    abstract::A number of cysteine-rich proteins have recently been isolated by homology screening, differential library screens, and association with other proteins. In this report, we describe the isolation of the rat cysteine-rich protein from a rat brain library during a search for clones with homology to the delta-opioid recep...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/22.24.5477

    authors: McLaughlin CR,Tao Q,Abood ME

    update_date:1994-12-11 00:00:00

  • A small nucleolar RNA functions in rRNA processing in Caenorhabditis elegans.

    abstract::CeR-2 RNA is one of the newly identified Caenorhabditis elegans noncoding RNAs (ncRNAs). The characterization of CeR-2 by RNomic studies has failed to classify it into any known ncRNA family. In this study, we examined the spatiotemporal expression patterns of CeR-2 to gain insight into its function. CeR-2 is expresse...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq335

    authors: Hokii Y,Sasano Y,Sato M,Sakamoto H,Sakata K,Shingai R,Taneda A,Oka S,Himeno H,Muto A,Fujiwara T,Ushida C

    update_date:2010-09-01 00:00:00

  • Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology.

    abstract::Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; howev...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkz1173

    authors: Zhou W,Emery SB,Flasch DA,Wang Y,Kwan KY,Kidd JM,Moran JV,Mills RE

    update_date:2020-02-20 00:00:00

  • A clustering property of highly-degenerate transcription factor binding sites in the mammalian genome.

    abstract::Transcription factor binding sites (TFBSs) are short DNA sequences interacting with transcription factors (TFs), which regulate gene expression. Due to the relatively short length of such binding sites, it is largely unclear how the specificity of protein-DNA interaction is achieved. Here, we have performed a genome-w...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkl248

    authors: Zhang C,Xuan Z,Otto S,Hover JR,McCorkle SR,Mandel G,Zhang MQ

    update_date:2006-05-02 00:00:00

  • Impact of probe annotation on the integration of miRNA-mRNA expression profiles for miRNA target detection.

    abstract::MicroRNAs (miRNAs) are small non-coding RNAs that mediate gene expression at the post-transcriptional and translational levels by an imperfect binding to target mRNA 3'UTR regions. While the ab-initio computational prediction of miRNA-mRNA interactions still poses significant challenges, it is possible to overcome som...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkp1239

    authors: Sales G,Coppe A,Bicciato S,Bortoluzzi S,Romualdi C

    update_date:2010-04-01 00:00:00

  • Tripartite mitochondrial genome of spinach: physical structure, mitochondrial gene mapping, and locations of transposed chloroplast DNA sequences.

    abstract::A complete physical map of the spinach mitochondrial genome has been established. The entire sequence content of 327 kilobase pairs (kb) is postulated to occur as a single circular molecule. Two directly repeated elements of approximately 6 kb, located on this "master chromosome", are proposed to participate in an int...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/14.14.5651

    authors: Stern DB,Palmer JD

    update_date:1986-07-25 00:00:00

  • An abasic site in DNA. Solution conformation determined by proton NMR and molecular mechanics calculations.

    abstract::We have determined the three-dimensional structure of a non-selfcomplementary nonanucleotide duplex which contains an abasic (apyrimidinic) site in the centre, i.e. a deoxyribose residue opposite an adenosine. The majority of the base and sugar proton resonances were assigned by NOESY, COSY and 2DQF spectra in D2O and...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/15.19.8003

    authors: Cuniasse P,Sowers LC,Eritja R,Kaplan B,Goodman MF,Cognet JA,LeBret M,Guschlbauer W,Fazakerley GV

    update_date:1987-10-12 00:00:00

  • Sustained expression of miR-26a promotes chromosomal instability and tumorigenesis through regulation of CHFR.

    abstract::MicroRNA 26a (miR-26a) reduces cell viability in several cancers, indicating that miR-26a could be used as a therapeutic option in patients. We demonstrate that miR-26a not only inhibits G1-S cell cycle transition and promotes apoptosis, as previously described, but also regulates multiple cell cycle checkpoints. We s...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx022

    authors: Castellano L,Dabrowska A,Pellegrino L,Ottaviani S,Cathcart P,Frampton AE,Krell J,Stebbing J

    update_date:2017-05-05 00:00:00

  • R-Coffee: a web server for accurately aligning noncoding RNA sequences.

    abstract::The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple seque...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn278

    authors: Moretti S,Wilm A,Higgins DG,Xenarios I,Notredame C

    update_date:2008-07-01 00:00:00

  • The production of PCR products with 5' single-stranded tails using primers that incorporate novel phosphoramidite intermediates.

    abstract::We have prepared several novel phosphoramidites and have synthesised oligonucleotides incorporating them internally. The presence of these residues in an oligonucleotide template presents an impossible barrier to primed synthesis by Taq DNA polymerase. When extended as polymerase chain reaction products, these oligonu...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.5.1155

    authors: Newton CR,Holland D,Heptinstall LE,Hodgson I,Edge MD,Markham AF,McLean MJ

    update_date:1993-03-11 00:00:00

  • Polynucleotide:adenosine glycosidase activity of ribosome-inactivating proteins: effect on DNA, RNA and poly(A).

    abstract::Ribosome-inactivating proteins (RIP) are a family of plant enzymes for which a unique activity was determined: rRNA N-glycosidase at a specific universally conserved position, A4324 in the case of rat ribosomes. Recently we have shown that the RIP from Saponaria officinalis have a much wider substrate specificity: they ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/25.3.518

    authors: Barbieri L,Valbonesi P,Bonora E,Gorini P,Bolognesi A,Stirpe F

    update_date:1997-02-01 00:00:00

  • 3' Alu PCR: a simple and rapid method to isolate human polymorphic markers.

    abstract::Microsatellites, such as (TG)n found at random throughout the genome, or as 3' extensions of Alu sequences are being increasingly used as genetic markers because of their pluriallelic character. The search for polymorphic microsatellites is time consuming, however, as it is necessary to sequence clones containing the ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/20.6.1333

    authors: Charlieu JP,Laurent AM,Carter DA,Bellis M,Roizès G

    update_date:1992-03-25 00:00:00

  • 5' flanking sequence and genomic structure of Egr-1, a murine mitogen inducible zinc finger encoding gene.

    abstract::Egr-1 is a murine zinc finger encoding cDNA whose expression is modulated by a variety of ligand-receptor interactions and is often coregulated with c-fos (1). This study reports the isolation of a mouse Egr-1 genomic clone, its intron-exon structure, and 935 bp of 5' flanking sequence. The gene spans about 3.8 kb and...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/16.18.8835

    authors: Tsai-Morris CH,Cao XM,Sukhatme VP

    update_date:1988-09-26 00:00:00

  • Génolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes.

    abstract::The Génolevures online database (http://cbi.labri.fr/Genolevures/ and http://genolevures.org/) provides exploratory tools and curated data sets relative to nine complete and seven partial genome sequences determined and manually annotated by the Génolevures Consortium, to facilitate comparative genomic studies of Hemi...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkn859

    authors: Sherman DJ,Martin T,Nikolski M,Cayla C,Souciet JL,Durrens P,Génolevures Consortium.

    update_date:2009-01-01 00:00:00

  • MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score.

    abstract::Reliable prediction of orthology is central to comparative genomics. Approaches based on phylogenetic analyses closely resemble the original definition of orthology and paralogy and are known to be highly accurate. However, the large computational cost associated to these analyses is a limiting factor that often preve...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkq953

    authors: Pryszcz LP,Huerta-Cepas J,Gabaldón T

    update_date:2011-03-01 00:00:00

  • Encapsidation of heterologous RNAs by bacteriophage MS2 coat protein.

    abstract::The RNA bacteriophages of E. coli specifically encapsidate a single copy of the viral genome in a protein shell composed mainly of 180 molecules of coat protein. Coat protein is also a translational repressor and shuts off viral replicase synthesis by interaction with a RNA stem-loop containing the replicase initiatio...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/21.19.4621

    authors: Pickett GG,Peabody DS

    update_date:1993-09-25 00:00:00

  • SRPDB (Signal Recognition Particle Database).

    abstract::Signal recognition particle (SRP) is a stable cytoplasmic ribonucleoprotein complex that serves to translocate secretory proteins across membranes during translation. The SRP Database (SRPDB) provides compilations of SRP components, ordered alphabetically and phylogenetically. Alignments emphasize phylogenetically-sup...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/29.1.169

    authors: Gorodkin J,Knudsen B,Zwieb C,Samuelsson T

    update_date:2001-01-01 00:00:00

  • A conserved U-rich RNA region implicated in regulation of translation in Plasmodium female gametocytes.

    abstract::Translational repression (TR) plays an important role in post-transcriptional regulation of gene expression and embryonic development in metazoans. TR also regulates the expression of a subset of the cytoplasmic mRNA population during development of fertilized female gametes of the unicellular malaria parasite, Plasmo...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkm1142

    authors: Braks JA,Mair GR,Franke-Fayard B,Janse CJ,Waters AP

    update_date:2008-03-01 00:00:00

  • Transcript quantitation in total yeast cellular RNA using kinetic PCR.

    abstract::Kinetically monitored, reverse transcriptase-initiated PCR (kinetic RT-PCR, kRT-PCR) is a novel application of kinetic PCR for high throughput transcript quantitation in total cellular RNA. The assay offers the simplicity and flexibility of an enzyme assay with distinct advantages over DNA microarray hybridization and...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.2.e2

    authors: Kang JJ,Watson RM,Fisher ME,Higuchi R,Gelfand DH,Holland MJ

    update_date:2000-01-15 00:00:00

  • High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2.

    abstract::The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence id...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/28.20.e88

    authors: Abécassis V,Pompon D,Truan G

    update_date:2000-10-15 00:00:00

  • ChIPprimersDB: a public repository of verified qPCR primers for chromatin immunoprecipitation (ChIP).

    abstract::Chromatin immunoprecipitation (ChIP) has ushered in a new era of scientific discovery by allowing new insights into DNA-protein interactions. ChIP is used to quantify enriched genomic regions using qPCR, and more recently is combined with next generation sequencing (ChIP-seq) to obtain a genome wide profile of protein...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gky813

    authors: Kurtenbach S,Reddy R,Harbour JW

    update_date:2019-01-08 00:00:00

  • The siRNA suppressor RTL1 is redox-regulated through glutathionylation of a conserved cysteine in the double-stranded-RNA-binding domain.

    abstract::RNase III enzymes cleave double stranded (ds)RNA. This is an essential step for regulating the processing of mRNA, rRNA, snoRNA and other small RNAs, including siRNA and miRNA. Arabidopsis thaliana encodes nine RNase III: four DICER-LIKE (DCL) and five RNASE THREE LIKE (RTL). To better understand the molecular functio...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkx820

    authors: Charbonnel C,Niazi AK,Elvira-Matelot E,Nowak E,Zytnicki M,de Bures A,Jobet E,Opsomer A,Shamandi N,Nowotny M,Carapito C,Reichheld JP,Vaucheret H,Sáez-Vásquez J

    update_date:2017-11-16 00:00:00

  • Replication of a carcinogenic nitropyrene DNA lesion by human Y-family DNA polymerase.

    abstract::Nitrated polycyclic aromatic hydrocarbons are common environmental pollutants, of which many are mutagenic and carcinogenic. 1-Nitropyrene is the most abundant nitrated polycyclic aromatic hydrocarbon, which causes DNA damage and is carcinogenic in experimental animals. Error-prone translesion synthesis of 1-nitropyre...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gks1296

    authors: Kirouac KN,Basu AK,Ling H

    update_date:2013-02-01 00:00:00

  • Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium.

    abstract::Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and ...

    journal_title:Nucleic acids research

    pub_type: Journal Article

    doi:10.1093/nar/gkv608

    authors: Yu NY,Hallström BM,Fagerberg L,Ponten F,Kawaji H,Carninci P,Forrest AR,Fantom Consortium.,Hayashizaki Y,Uhlén M,Daub CO

    update_date:2015-08-18 00:00:00