An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data.

Abstract:

Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage-aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray.
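To make the idea of binomial genotype likelihoods concrete, the following is a minimal sketch of a generic three-component binomial model for one sample at one site. It is not SNPTools' BBMM: the abstract describes BBMM as a BAM-specific clustering algorithm, whereas this sketch simply fixes the expected alternate-read fraction of each genotype. The function name `genotype_likelihoods` and the `error_rate` parameter are hypothetical and chosen only for illustration.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Return normalized likelihoods for (hom-ref, het, hom-alt).

    Assumption: reads are independent draws, and the expected
    alternate-read fraction is error_rate, 0.5, and 1 - error_rate
    for the three genotypes. These fixed fractions are illustrative,
    not values estimated per BAM file as a mixture model would do.
    """
    depth = ref_count + alt_count
    alt_fractions = (error_rate, 0.5, 1.0 - error_rate)
    # Binomial probability of observing alt_count alternate reads
    # out of depth reads under each genotype.
    raw = [
        comb(depth, alt_count) * p**alt_count * (1.0 - p) ** ref_count
        for p in alt_fractions
    ]
    total = sum(raw)
    return [value / total for value in raw]


if __name__ == "__main__":
    # At roughly 5x coverage, two reference and three alternate reads
    # favor the heterozygous genotype under this model.
    print(genotype_likelihoods(ref_count=2, alt_count=3))
```

At low coverage such per-sample likelihoods are often ambiguous, which is consistent with the abstract's point that raw genotype likelihoods are subsequently refined by an imputation engine.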

journal_name

Genome Res

journal_title

Genome research

authors

Wang Y,Lu J,Yu J,Gibbs RA,Yu F

doi

10.1101/gr.146084.112

pub_date

2013-05-01 00:00:00

pages

833-42

issue

5

eissn

1549-5469

issn

1088-9051

pii

gr.146084.112

journal_volume

23

pub_type

Journal Article
  • Spotted long oligonucleotide arrays for human gene expression analysis.

    abstract::DNA microarrays produced by deposition (or 'spotting') of a single long oligonucleotide probe for each gene may be an attractive alternative to other types of arrays. We produced spotted oligonucleotide arrays using two large collections of approximately 70-mer probes, and used these arrays to analyze gene expression i...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1048803

    authors: Barczak A,Rodriguez MW,Hanspers K,Koth LL,Tai YC,Bolstad BM,Speed TP,Erle DJ

    update_date:2003-07-01 00:00:00

  • Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages.

    abstract::Molecular evolution studies are usually based on the analysis of individual genes and thus reflect only small-range variations in genomic sequences. A complementary approach is to study the evolutionary history of rearrangements in entire genomes based on the analysis of gene orders. The progress in whole genome seque...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.3002305

    authors: Bourque G,Zdobnov EM,Bork P,Pevzner PA,Tesler G

    update_date:2005-01-01 00:00:00

  • lobSTR: A short tandem repeat profiler for personal genomes.

    abstract::Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat S...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.135780.111

    authors: Gymrek M,Golan D,Rosset S,Erlich Y

    update_date:2012-06-01 00:00:00

  • Reprogramming of the human intestinal epigenome by surgical tissue transposition.

    abstract::Extracellular cues play critical roles in the establishment of the epigenome during development and may also contribute to epigenetic perturbations found in disease states. The direct role of the local tissue environment on the post-development human epigenome, however, remains unclear due to limitations in studies of...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.166439.113

    authors: Lay FD,Triche TJ Jr,Tsai YC,Su SF,Martin SE,Daneshmand S,Skinner EC,Liang G,Chihara Y,Jones PA

    update_date:2014-04-01 00:00:00

  • Software for automated analysis of DNA fingerprinting gels.

    abstract::Here we describe software tools for the automated detection of DNA restriction fragments resolved on agarose fingerprinting gels. We present a mathematical model for the location and shape of the restriction fragments as a function of fragment size, with model parameters determined empirically from "marker" lanes cont...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.904303

    authors: Fuhrmann DR,Krzywinski MI,Chiu R,Saeedi P,Schein JE,Bosdet IE,Chinwalla A,Hillier LW,Waterston RH,McPherson JD,Jones SJ,Marra MA

    update_date:2003-05-01 00:00:00

  • Enzymatic regional methylation assay: a novel method to quantify regional CpG methylation density.

    abstract::We have developed a novel quantitative method for rapidly assessing the CpG methylation density of a DNA region in mammalian cells. After bisulfite modification of genomic DNA, the region of interest is PCR amplified with primers containing two dam sites (GATC). The purified PCR products are then incubated with 14C-la...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.202501

    authors: Galm O,Rountree MR,Bachman KE,Jair KW,Baylin SB,Herman JG

    update_date:2002-01-01 00:00:00

  • Long noncoding RNAs in C. elegans.

    abstract::Thousands of long noncoding RNAs (lncRNAs) have been found in vertebrate animals, a few of which have known biological roles. To better understand the genomics and features of lncRNAs in invertebrates, we used available RNA-seq, poly(A)-site, and ribosome-mapping data to identify lncRNAs of Caenorhabditis elegans. We ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.140475.112

    authors: Nam JW,Bartel DP

    update_date:2012-12-01 00:00:00

  • Selective enrichment of damaged DNA molecules for ancient genome sequencing.

    abstract::Contamination by present-day human and microbial DNA is one of the major hindrances for large-scale genomic studies using ancient biological material. We describe a new molecular method, U selection, which exploits one of the most distinctive features of ancient DNA--the presence of deoxyuracils--for selective enrichm...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.174201.114

    authors: Gansauge MT,Meyer M

    update_date:2014-09-01 00:00:00

  • Screening of gene-associated polymorphisms by use of in-gel competitive reassociation and EST (cDNA) array hybridization.

    abstract::In-gel competitive reassociation (IGCR) is a method of differential subtraction to enrich polymorphic DNA restriction fragments between two DNA samples without probes or specific sequence information. Here, we show that by combining IGCR and expressed sequence tags (EST) array hybridization, polymorphic DNA fragments ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.434103

    authors: Gotoh K,Oishi M

    update_date:2003-03-01 00:00:00

  • A Plasmodium gene family encoding Maurer's cleft membrane proteins: structural properties and expression profiling.

    abstract::Upon invasion of the erythrocyte cell, the malaria parasite remodels its environment; in particular, it establishes a complex membrane network, which connects the parasitophorous vacuole to the host plasma membrane and is involved in protein transport and trafficking. We have identified a novel subtelomeric gene famil...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.2126104

    authors: Sam-Yellowe TY,Florens L,Johnson JR,Wang T,Drazba JA,Le Roch KG,Zhou Y,Batalov S,Carucci DJ,Winzeler EA,Yates JR 3rd

    update_date:2004-06-01 00:00:00

  • Recompleting the Caenorhabditis elegans genome.

    abstract::Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. el...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.244830.118

    authors: Yoshimura J,Ichikawa K,Shoura MJ,Artiles KL,Gabdank I,Wahba L,Smith CL,Edgley ML,Rougvie AE,Fire AZ,Morishita S,Schwarz EM

    update_date:2019-06-01 00:00:00

  • Centromere repositioning.

    abstract::Primate pericentromeric regions recently have been shown to exhibit extraordinary evolutionary plasticity. In this paper we report an additional peculiar feature of these regions that we discovered while analyzing, by FISH, the evolutionary conservation of primate phylogenetic chromosome IX. If the position of the cen...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.9.12.1184

    authors: Montefalcone G,Tempesta S,Rocchi M,Archidiacono N

    update_date:1999-12-01 00:00:00

  • A method for detecting IBD regions simultaneously in multiple individuals--with applications to disease genetics.

    abstract::All individuals in a finite population are related if traced back long enough and will, therefore, share regions of their genomes identical by descent (IBD). Detection of such regions has several important applications-from answering questions about human evolution to locating regions in the human genome containing di...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.115360.110

    authors: Moltke I,Albrechtsen A,Hansen TV,Nielsen FC,Nielsen R

    update_date:2011-07-01 00:00:00

  • Time course regulatory analysis based on paired expression and chromatin accessibility data.

    abstract::A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility da...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.257063.119

    authors: Duren Z,Chen X,Xin J,Wang Y,Wong WH

    update_date:2020-04-01 00:00:00

  • Impact of genomics on research in the rat.

    abstract::The need to translate genes to function has positioned the rat as an invaluable animal model for genomic research. The significant increase in genomic resources in recent years has had an immediate functional application in the rat. Many of the resources for translational research are already in place and are ready to...

    journal_title:Genome research

    pub_type: Journal Article, Review

    doi:10.1101/gr.3744005

    authors: Lazar J,Moreno C,Jacob HJ,Kwitek AE

    update_date:2005-12-01 00:00:00

  • Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution.

    abstract::Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1549503

    authors: Babcock M,Pavlicek A,Spiteri E,Kashork CD,Ioshikhes I,Shaffer LG,Jurka J,Morrow BE

    update_date:2003-12-01 00:00:00

  • Genome-wide map of regulatory interactions in the human genome.

    abstract::Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencin...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.176586.114

    authors: Heidari N,Phanstiel DH,He C,Grubert F,Jahanbani F,Kasowski M,Zhang MQ,Snyder MP

    update_date:2014-12-01 00:00:00

  • The amphioxus genome illuminates vertebrate origins and cephalochordate biology.

    abstract::Cephalochordates, urochordates, and vertebrates evolved from a common ancestor over 520 million years ago. To improve our understanding of chordate evolution and the origin of vertebrates, we intensively searched for particular genes, gene families, and conserved noncoding elements in the sequenced genome of the cepha...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.073676.107

    authors: Holland LZ,Albalat R,Azumi K,Benito-Gutiérrez E,Blow MJ,Bronner-Fraser M,Brunet F,Butts T,Candiani S,Dishaw LJ,Ferrier DE,Garcia-Fernàndez J,Gibson-Brown JJ,Gissi C,Godzik A,Hallböök F,Hirose D,Hosomichi K,Ikuta T,I

    update_date:2008-07-01 00:00:00

  • A dynamic H3K27ac signature identifies VEGFA-stimulated endothelial enhancers and requires EP300 activity.

    abstract::Histone modifications are now well-established mediators of transcriptional programs that distinguish cell states. However, the kinetics of histone modification and their role in mediating rapid, signal-responsive gene expression changes has been little studied on a genome-wide scale. Vascular endothelial growth facto...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.149674.112

    authors: Zhang B,Day DS,Ho JW,Song L,Cao J,Christodoulou D,Seidman JG,Crawford GE,Park PJ,Pu WT

    update_date:2013-06-01 00:00:00

  • Rescue of targeted regions of mammalian chromosomes by in vivo recombination in yeast.

    abstract::In contrast to other animal cell lines, the chicken pre-B cell lymphoma line, DT40, exhibits a high level of homologous recombination, which can be exploited to generate site-specific alterations in defined target genes or regions. In addition, the ability to generate human/chicken monochromosomal hybrids in the DT40 ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.6.666

    authors: Kouprina N,Kawamoto K,Barrett JC,Larionov V,Koi M

    update_date:1998-06-01 00:00:00

  • Widespread plasticity in CTCF occupancy linked to DNA methylation.

    abstract::CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.136101.111

    authors: Wang H,Maurano MT,Qu H,Varley KE,Gertz J,Pauli F,Lee K,Canfield T,Weaver M,Sandstrom R,Thurman RE,Kaul R,Myers RM,Stamatoyannopoulos JA

    update_date:2012-09-01 00:00:00

  • Relationship between histone modifications and transcription factor binding is protein family specific.

    abstract::The very small fraction of putative binding sites (BSs) that are occupied by transcription factors (TFs) in vivo can be highly variable across different cell types. This observation has been partly attributed to changes in chromatin accessibility and histone modification (HM) patterns surrounding BSs. Previous studies...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.220079.116

    authors: Xin B,Rohs R

    update_date:2018-01-11 00:00:00

  • Closing the gaps on human chromosome 19 revealed genes with a high density of repetitive tandemly arrayed elements.

    abstract::The reported human genome sequence includes about 400 gaps of unknown sequence that were not found in the bacterial artificial chromosome (BAC) and cosmid libraries used for sequencing of the genome. These missing sequences correspond to approximately 1% of euchromatic regions of the human genome. Gap filling is a lab...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.1929904

    authors: Leem SH,Kouprina N,Grimwood J,Kim JH,Mullokandov M,Yoon YH,Chae JY,Morgan J,Lucas S,Richardson P,Detter C,Glavina T,Rubin E,Barrett JC,Larionov V

    update_date:2004-02-01 00:00:00

  • Construction of a linkage map of the medaka (Oryzias latipes) and mapping of the Da mutant locus defective in dorsoventral patterning.

    abstract::Double anal fin (Da) is a medaka with an autosomal semidominant mutation that causes mirror image duplication of the ventral region concentrating on the caudal region. The chromosomal location of the Da gene and its sequence have remained unknown. We constructed a medaka linkage map as a first step to approach positio...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.9.12.1277

    authors: Ohtsuka M,Makino S,Yoda K,Wada H,Naruse K,Mitani H,Shima A,Ozato K,Kimura M,Inoko H

    update_date:1999-12-01 00:00:00

  • Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination.

    abstract::The detailed genomic organization of a gene-dense region at human chromosome 12p13, spanning 223 kb of contiguous sequence, was determined. This region is composed of 20 genes and several other expressed sequences. Experimental tools including RT-PCR and cDNA sequencing, combined with gene prediction programs, were ut...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.7.3.268

    authors: Ansari-Lari MA,Shen Y,Muzny DM,Lee W,Gibbs RA

    update_date:1997-03-01 00:00:00

  • A large database of chicken bursal ESTs as a resource for the analysis of vertebrate gene function.

    abstract::Chicken B cells create their immunoglobulin repertoire within the Bursa of Fabricius by gene conversion. The high homologous recombination activity is shared by the bursal B-cell-derived DT40 cell line, which integrates transfected DNA constructs at high rates into its endogenous loci. Targeted integration in DT40 is ...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.10.12.2062

    authors: Abdrakhmanov I,Lodygin D,Geroth P,Arakawa H,Law A,Plachy J,Korn B,Buerstedde JM

    update_date:2000-12-01 00:00:00

  • Sequencing of cDNA clones from the genetic map of tomato (Lycopersicon esculentum).

    abstract::The dense RFLP linkage map of tomato (Lycopersicon esculentum) contains >300 anonymous cDNA clones. Of those clones, 272 were partially or completely sequenced. The sequences were compared at the DNA and protein level to known genes in databases. For 57% of the clones, a significant match to previously described genes...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.8.8.842

    authors: Ganal MW,Czihal R,Hannappel U,Kloos DU,Polley A,Ling HQ

    update_date:1998-08-01 00:00:00

  • A complexity reduction algorithm for analysis and annotation of large genomic sequences.

    abstract::DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexit...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.313703

    authors: Chuang TJ,Lin WC,Lee HC,Wang CW,Hsiao KL,Wang ZH,Shieh D,Lin SC,Ch'ang LY

    update_date:2003-02-01 00:00:00

  • Dissecting transcription regulatory pathways through a new bacterial one-hybrid reporter system.

    abstract::Sequence-specific DNA-binding transcription factors have widespread biological significance in the regulation of gene expression. However, in lower prokaryotes and eukaryotic metazoans, it is usually difficult to find transcription regulatory factors that recognize specific target promoters. To address this, we have d...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.086595.108

    authors: Guo M,Feng H,Zhang J,Wang W,Wang Y,Li Y,Gao C,Chen H,Feng Y,He ZG

    update_date:2009-07-01 00:00:00

  • Large-scale genome analysis of bovine commensal Escherichia coli reveals that bovine-adapted E. coli lineages are serving as evolutionary sources of the emergence of human intestinal pathogenic strains.

    abstract::How pathogens evolve their virulence to humans in nature is a scientific issue of great medical and biological importance. Shiga toxin (Stx)-producing Escherichia coli (STEC) and enteropathogenic E. coli (EPEC) are the major foodborne pathogens that can cause hemolytic uremic syndrome and infantile diarrhea, respectiv...

    journal_title:Genome research

    pub_type: Journal Article

    doi:10.1101/gr.249268.119

    authors: Arimizu Y,Kirino Y,Sato MP,Uno K,Sato T,Gotoh Y,Auvray F,Brugere H,Oswald E,Mainil JG,Anklam KS,Döpfer D,Yoshino S,Ooka T,Tanizawa Y,Nakamura Y,Iguchi A,Morita-Ishihara T,Ohnishi M,Akashi K,Hayashi T,Ogura Y

    update_date:2019-09-01 00:00:00