A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.

Abstract:

BACKGROUND: Genome-wide association studies (GWAS) have identified many common polymorphisms associated with complex traits. However, these common variants explain only a small fraction of phenotypic variance, leaving a substantial portion of heritability unexplained. As a result, the search for "missing" heritability is drawing increasing attention, particularly through studies of rare variants, which often require large sample sizes and therefore extensive sequencing. Next generation sequencing (NGS) technologies make it possible to sequence large numbers of reads economically and efficiently. Even so, sequencing the thousands of individuals generally required for association studies is often still cost prohibitive. A more efficient and cost-effective design pools the genetic material of multiple individuals and sequences the pools instead of the individuals. Pooled sequencing has improved the feasibility of association studies for rare variants. However, it also poses a great challenge for data analysis, essentially because individual sample identity is lost and sequencing errors can be hard to distinguish from low-frequency alleles.

RESULTS: In this paper, we develop a unified approach for estimating minor allele frequency, calling SNPs, and conducting association studies based on pooled sequencing data using an expectation maximization (EM) algorithm. This approach makes it possible to study how minor allele frequency, sequencing error rate, number of pools, number of individuals per pool, and sequencing depth affect the accuracy of minor allele frequency estimates. We show that the naive method of estimating minor allele frequency as the fraction of observed minor alleles can be significantly biased, especially for rare variants.
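A short calculation shows where this bias comes from. The following setup is an illustrative assumption, and the paper's exact error model may differ:

- a biallelic site;
- read depths treated as fixed;
- reads drawn uniformly from the pooled chromosomes;
- a pool minor-allele fraction whose expectation equals the population frequency $p$;
- a sequencing error that flips the observed allele with probability $e$.

A read then shows the minor allele with probability $p(1-e) + (1-p)e$. The naive estimator $\hat p_{\text{naive}}$ (minor-allele reads divided by total reads) therefore satisfies

$$
\mathbb{E}\left[\hat p_{\text{naive}}\right] = p(1-e) + (1-p)\,e = p + e\,(1-2p).
$$

For $p < 1/2$ the bias $e(1-2p)$ is positive and does not shrink with depth or with the number of pools, so it dominates when $p$ is comparable to $e$. For example, with $p = 0.005$ and $e = 0.01$, the naive estimate centers on $0.005 + 0.01 \times 0.99 = 0.0149$, roughly three times the true frequency.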
In contrast, our EM approach gives an unbiased estimate of the minor allele frequency in all scenarios studied in this paper. We then develop EM-SNP, a SNP calling approach for pooled sequencing data based on the EM algorithm, and compare it with another recent SNP calling method, SNVer. We show that EM-SNP outperforms SNVer both in the fraction of called SNPs that are present in dbSNP and in the transition/transversion (Ti/Tv) ratio. Finally, we use the EM approach to study the association between variants and type I diabetes.

CONCLUSIONS: The EM-based approach for analyzing pooled sequencing data can accurately estimate minor allele frequencies, call SNPs, and identify associations between variants and complex traits. It is especially useful for studies involving rare variants.
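The EM estimation idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every modeling choice here is an assumption made for illustration:

- diploid individuals pooled in equal amounts at a biallelic site;
- reads drawn uniformly from the pool's chromosomes;
- a known per-base error rate `err` that flips the observed allele.

The latent variable is the number k of minor-allele chromosomes in each pool, with k ~ Binomial(n_chrom, p). The E-step computes the posterior of k given the pool's read counts. The M-step sets p to the expected total number of minor chromosomes divided by the total number of chromosomes. The function names (`em_maf`, `naive_maf`, `simulate`, `log_binom_pmf`) are hypothetical.

```python
import math
import random


def log_binom_pmf(k, n, p):
    """log of C(n, k) * p**k * (1 - p)**(n - k), with p in {0, 1} handled explicitly."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def naive_maf(minor_counts, depths):
    """Naive estimate: fraction of all reads that show the minor allele."""
    return sum(minor_counts) / sum(depths)


def em_maf(minor_counts, depths, n_chrom, err, p0=0.1, tol=1e-10, max_iter=1000):
    """EM estimate of the population MAF p from per-pool read counts (assumed model).

    minor_counts[j], depths[j]: minor-allele reads and total reads in pool j.
    n_chrom: chromosomes per pool (2 x individuals for diploids).
    err: assumed known error rate, 0 < err < 0.5, that flips the observed allele.
    """
    ks = range(n_chrom + 1)
    # P(read shows minor allele | k minor chromosomes in the pool)
    q = [(k / n_chrom) * (1 - err) + (1 - k / n_chrom) * err for k in ks]
    # log P(x_j | k) does not depend on p, so compute it once per pool
    read_ll = [[log_binom_pmf(x, d, q[k]) for k in ks]
               for x, d in zip(minor_counts, depths)]
    p = p0
    for _ in range(max_iter):
        # Prior over the latent count: k ~ Binomial(n_chrom, p)
        prior = [log_binom_pmf(k, n_chrom, p) for k in ks]
        expected_k = 0.0
        for ll in read_ll:
            # E-step: posterior P(k | x_j, p), normalized in log space
            logw = [a + b for a, b in zip(prior, ll)]
            top = max(logw)
            w = [math.exp(v - top) for v in logw]
            expected_k += sum(k * wk for k, wk in zip(ks, w)) / sum(w)
        # M-step: expected minor chromosomes / total chromosomes
        p_new = expected_k / (n_chrom * len(read_ll))
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


def simulate(p, n_pools, n_chrom, depth, err, rng):
    """Simulate minor-allele read counts per pool under the same assumed model."""
    counts = []
    for _ in range(n_pools):
        k = sum(rng.random() < p for _ in range(n_chrom))
        frac = k / n_chrom
        q = frac * (1 - err) + (1 - frac) * err
        counts.append(sum(rng.random() < q for _ in range(depth)))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    x = simulate(p=0.01, n_pools=200, n_chrom=20, depth=100, err=0.01, rng=rng)
    d = [100] * len(x)
    print("naive:", naive_maf(x, d))
    print("EM:   ", em_maf(x, d, n_chrom=20, err=0.01))
```

Under these assumptions, the naive estimate should sit near p + err(1 - 2p), about twice the true frequency in this simulated setting. The EM estimate should scatter around p itself. The read likelihoods are precomputed outside the loop as a design choice: they do not depend on p, so each iteration only recomputes the binomial prior over k.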

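The Ti/Tv comparison used to evaluate SNP calls rests on a simple classification of substitutions:

- Transitions exchange a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T).
- All other single-base substitutions are transversions.

Each base has one possible transition and two possible transversions, so randomly distributed errors give a Ti/Tv ratio near 0.5. Genuine human SNPs show a markedly higher ratio; roughly 2 genome-wide is a commonly cited value. A higher Ti/Tv among called SNPs is therefore usually read as a sign of fewer false positives. The sketch below is not taken from the paper. Its helper names (`is_transition`, `ti_tv_ratio`) and its input format, a list of (ref, alt) bases, are hypothetical.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T; False for the other single-base substitutions."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions among (ref, alt) SNP calls."""
    ti = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions: Ti/Tv is undefined")
    return ti / tv


calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(calls))  # 3 transitions / 2 transversions -> 1.5
```
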
journal_title: BMC Genomics
authors: Chen Q, Sun F
doi: 10.1186/1471-2164-14-S1-S1
pub_date: 2013-01-01
journal_volume: 14 Suppl 1
pages: S1
issn: 1471-2164
pii: 1471-2164-14-S1-S1
pub_type: Journal Article

  • Divergence of the SigB regulon and pathogenesis of the Bacillus cereus sensu lato group.

    abstract:BACKGROUND:The Bacillus cereus sensu lato group currently includes seven species (B. cereus, B. anthracis, B. mycoides, B. pseudomycoides, B. thuringiensis, B. weihenstephanensis and B. cytotoxicus) that recent phylogenetic and phylogenomic analyses suggest are likely a single species, despite their varied phenotypes. ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-13-564

    authors: Scott E 2nd,Dyer DW

    update_date: 2012-10-22

  • Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding.

    abstract:BACKGROUND:The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar h...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-57

    authors: Ralph SG,Chun HJ,Cooper D,Kirkpatrick R,Kolosova N,Gunter L,Tuskan GA,Douglas CJ,Holt RA,Jones SJ,Marra MA,Bohlmann J

    update_date: 2008-01-29

  • Comparative metabolite profiling of drought stress in roots and leaves of seven Triticeae species.

    abstract:BACKGROUND:Drought is a lifestyle disease. Plant metabolomics has been exercised for understanding the fine-tuning of the potential pathways to surmount the adverse effects of drought stress. A broad spectrum of morphological and metabolic responses from seven Triticeae species including wild types with different droug...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4321-2

    authors: Ullah N,Yüce M,Neslihan Öztürk Gökçe Z,Budak H

    update_date: 2017-12-15

  • Leaf transcriptome analysis of a subtropical evergreen broadleaf plant, wild oil-tea camellia (Camellia oleifera), revealing candidate genes for cold acclimation.

    abstract:BACKGROUND:Cold tolerance is a key determinant of the geographical distribution range of a plant species and crop production. Cold acclimation can enhance freezing-tolerance of plant species through a period of exposure to low nonfreezing temperatures. As a subtropical evergreen broadleaf plant, oil-tea camellia demons...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-3570-4

    authors: Chen J,Yang X,Huang X,Duan S,Long C,Chen J,Rong J

    update_date: 2017-02-28

  • Developing and applying a gene functional association network for anti-angiogenic kinase inhibitor activity assessment in an angiogenesis co-culture model.

    abstract:BACKGROUND:Tumor angiogenesis is a highly regulated process involving intercellular communication as well as the interactions of multiple downstream signal transduction pathways. Disrupting one or even a few angiogenesis pathways is often insufficient to achieve sustained therapeutic benefits due to the complexity of a...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-264

    authors: Chen Y,Wei T,Yan L,Lawrence F,Qian HR,Burkholder TP,Starling JJ,Yingling JM,Shou J

    update_date: 2008-06-02

  • Host DNA contents in fecal metagenomics as a biomarker for intestinal diseases and effective treatment.

    abstract:BACKGROUND:Compromised intestinal barrier (CIB) has been associated with many enteropathies, including colorectal cancer (CRC) and inflammatory bowel disease (IBD). We hypothesized that CIB could lead to increased host-derived contents including epithelial cells into the gut, change its physio-metabolic properties, and...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6749-z

    authors: Jiang P,Lai S,Wu S,Zhao XM,Chen WH

    update_date: 2020-05-11

  • The cytochrome P450 (CYP) gene superfamily in Daphnia pulex.

    abstract:BACKGROUND:Cytochrome P450s (CYPs) in animals fall into two categories: those that synthesize or metabolize endogenous molecules and those that interact with exogenous chemicals from the diet or the environment. The latter form a critical component of detoxification systems. RESULTS:Data mining and manual curation of ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-169

    authors: Baldwin WS,Marko PB,Nelson DR

    update_date: 2009-04-21

  • SNP identification and marker assay development for high-throughput selection of soybean cyst nematode resistance.

    abstract:BACKGROUND:Soybean cyst nematode (SCN) is the most economically devastating pathogen of soybean. Two resistance loci, Rhg1 and Rhg4 primarily contribute resistance to SCN race 3 in soybean. Peking and PI 88788 are the two major sources of SCN resistance with Peking requiring both Rhg1 and Rhg4 alleles and PI 88788 only...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-015-1531-3

    authors: Shi Z,Liu S,Noe J,Arelli P,Meksem K,Li Z

    update_date: 2015-04-18

  • Bcheck: a wrapper tool for detecting RNase P RNA genes.

    abstract:BACKGROUND:Effective bioinformatics solutions are needed to tackle challenges posed by industrial-scale genome annotation. We present Bcheck, a wrapper tool which predicts RNase P RNA genes by combining the speed of pattern matching and sensitivity of covariance models. The core of Bcheck is a library of subfamily spec...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-432

    authors: Yusuf D,Marz M,Stadler PF,Hofacker IL

    update_date: 2010-07-13

  • Genome-wide investigation of calcium-dependent protein kinase gene family in pineapple: evolution and expression profiles during development and stress.

    abstract:BACKGROUND:Calcium-dependent protein kinase (CPK) is one of the main Ca2+ combined protein kinase that play significant roles in plant growth, development and response to multiple stresses. Despite an important member of the stress responsive gene family, little is known about the evolutionary history and expression pa...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-6501-8

    authors: Zhang M,Liu Y,He Q,Chai M,Huang Y,Chen F,Wang X,Liu Y,Cai H,Qin Y

    update_date: 2020-01-23

  • Protein acetylation in mitochondria plays critical functions in the pathogenesis of fatty liver disease.

    abstract:BACKGROUND:Fatty liver is a high incidence of perinatal disease in dairy cows caused by negative energy balance, which seriously threatens the postpartum health and milk production. It has been reported that lysine acetylation plays an important role in substance and energy metabolism. Predictably, most metabolic proce...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-020-06837-y

    authors: Le-Tian Z,Cheng-Zhang H,Xuan Z,Zhang Q,Zhen-Gui Y,Qing-Qing W,Sheng-Xuan W,Zhong-Jin X,Ran-Ran L,Ting-Jun L,Zhong-Qu S,Zhong-Hua W,Ke-Rong S

    update_date: 2020-06-26

  • A comparison of Illumina and Ion Torrent sequencing platforms in the context of differential gene expression.

    abstract:BACKGROUND:Though Illumina has largely dominated the RNA-Seq field, the simultaneous availability of Ion Torrent has left scientists wondering which platform is most effective for differential gene expression (DGE) analysis. Previous investigations of this question have typically used reference samples derived from cel...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-017-4011-0

    authors: Lahens NF,Ricciotti E,Smirnova O,Toorens E,Kim EJ,Baruzzo G,Hayer KE,Ganguly T,Schug J,Grant GR

    update_date: 2017-08-10

  • Expression profiles of urbilaterian genes uniquely shared between honey bee and vertebrates.

    abstract:BACKGROUND:Large-scale comparison of metazoan genomes has revealed that a significant fraction of genes of the last common ancestor of Bilateria (Urbilateria) is lost in each animal lineage. This event could be one of the underlying mechanisms involved in generating metazoan diversity. However, the present functions of...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-10-17

    authors: Matsui T,Yamamoto T,Wyder S,Zdobnov EM,Kadowaki T

    update_date: 2009-01-12

  • The development and characterization of a 60K SNP chip for chicken.

    abstract:BACKGROUND:In livestock species like the chicken, high throughput single nucleotide polymorphism (SNP) genotyping assays are increasingly being used for whole genome association studies and as a tool in breeding (referred to as genomic selection). To be of value in a wide variety of breeds and populations, the success ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-12-274

    authors: Groenen MA,Megens HJ,Zare Y,Warren WC,Hillier LW,Crooijmans RP,Vereijken A,Okimoto R,Muir WM,Cheng HH

    update_date: 2011-05-31

  • Transcriptome analysis reveals differentially expressed genes associated with germ cell and gonad development in the Southern bluefin tuna (Thunnus maccoyii).

    abstract:BACKGROUND:Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, by transp...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-2397-8

    authors: Bar I,Cummins S,Elizur A

    update_date: 2016-03-10

  • TICdb: a collection of gene-mapped translocation breakpoints in cancer.

    abstract:BACKGROUND:Despite the importance of chromosomal translocations in the initiation and/or progression of cancer, a comprehensive catalog of translocation breakpoints in which these are precisely located on the reference sequence of the human genome is not available at present. DESCRIPTION:We have created a database tha...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-8-33

    authors: Novo FJ,de Mendíbil IO,Vizmanos JL

    update_date: 2007-01-26

  • Anthocyanin biosynthetic genes in Brassica rapa.

    abstract:BACKGROUND:Anthocyanins are a group of flavonoid compounds. As a group of important secondary metabolites, they perform several key biological functions in plants. Anthocyanins also play beneficial health roles as potentially protective factors against cancer and heart disease. To elucidate the anthocyanin biosynthetic...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-426

    authors: Guo N,Cheng F,Wu J,Liu B,Zheng S,Liang J,Wang X

    update_date: 2014-06-04

  • In silico characterization of a novel putative aerotaxis chemosensory system in the myxobacterium, Corallococcus coralloides.

    abstract:BACKGROUND:An efficient signal transduction system allows a bacterium to sense environmental cues and then to respond positively or negatively to those signals; this process is referred to as taxis. In addition to external cues, the internal metabolic state of any bacterium plays a major role in determining its ability...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-5151-6

    authors: Sharma G,Parales R,Singer M

    update_date: 2018-10-19

  • Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data.

    abstract:BACKGROUND:Runs of Homozygosity (ROH) are genomic regions where identical haplotypes are inherited from each parent. Since their first detection due to technological advances in the late 1990s, ROHs have been shedding light on human population history and deciphering the genetic basis of monogenic and complex traits an...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4489-0

    authors: Ceballos FC,Hazelhurst S,Ramsay M

    update_date: 2018-01-30

  • Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction.

    abstract:BACKGROUND:Tandem mass spectrometry (MS/MS) has become a standard method for identification of proteins extracted from biological samples but the huge number and the noise contamination of MS/MS spectra obstruct swift and reliable computer-aided interpretation. Typically, a minor fraction of the spectra per sample (mos...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-S1-S13

    authors: Mujezinovic N,Schneider G,Wildpaner M,Mechtler K,Eisenhaber F

    update_date: 2010-02-10

  • Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers.

    abstract:BACKGROUND:Aberrant DNA methylation is a hallmark of many cancers. Classically there are two types of endometrial cancer, endometrioid adenocarcinoma (EAC), or Type I, and uterine papillary serous carcinoma (UPSC), or Type II. However, the whole genome DNA methylation changes in these two classical types of endometrial...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-868

    authors: Zhang B,Xing X,Li J,Lowdon RF,Zhou Y,Lin N,Zhang B,Sundaram V,Chiappinelli KB,Hagemann IS,Mutch DG,Goodfellow PJ,Wang T

    update_date: 2014-10-06

  • Microarray analysis of Foxa2 mutant mouse embryos reveals novel gene expression and inductive roles for the gastrula organizer and its derivatives.

    abstract:BACKGROUND:The Spemann/Mangold organizer is a transient tissue critical for patterning the gastrula stage vertebrate embryo and formation of the three germ layers. Despite its important role during development, there are still relatively few genes with specific expression in the organizer and its derivatives. Foxa2 is ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-9-511

    authors: Tamplin OJ,Kinzel D,Cox BJ,Bell CE,Rossant J,Lickert H

    update_date: 2008-10-30

  • Digital genotyping of sorghum - a diverse plant species with a large repeat-rich genome.

    abstract:BACKGROUND:Rapid acquisition of accurate genotyping information is essential for all genetic marker-based studies. For species with relatively small genomes, complete genome resequencing is a feasible approach for genotyping; however, for species with large and highly repetitive genomes, the acquisition of whole genome...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-14-448

    authors: Morishige DT,Klein PE,Hilley JL,Sahraeian SM,Sharma A,Mullet JE

    update_date: 2013-07-05

  • Analysis of functional variants in mitochondrial DNA of Finnish athletes.

    abstract:BACKGROUND:We have previously reported on paucity of mitochondrial DNA (mtDNA) haplogroups J and K among Finnish endurance athletes. Here we aimed to further explore differences in mtDNA variants between elite endurance and sprint athletes. For this purpose, we determined the rate of functional variants and the mutatio...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6171-6

    authors: Kiiskilä J,Moilanen JS,Kytövuori L,Niemi AK,Majamaa K

    update_date: 2019-10-29

  • RNA-seq analysis provides insights into cold stress responses of Xanthomonas citri pv. citri.

    abstract:BACKGROUND:Xanthomonas citri pv. citri (Xcc) is a citrus canker causing Gram-negative bacteria. Currently, little is known about the biological and molecular responses of Xcc to low temperatures. RESULTS:Results depicted that low temperature significantly reduced growth and increased biofilm formation and unsaturated ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-019-6193-0

    authors: Liao JX,Li KH,Wang JP,Deng JR,Liu QG,Chang CQ

    update_date: 2019-11-06

  • Gene expression profiling of Spodoptera frugiperda hemocytes and fat body using cDNA microarray reveals polydnavirus-associated variations in lepidopteran host genes transcript levels.

    abstract:BACKGROUND:Genomic approaches provide unique opportunities to study interactions of insects with their pathogens. We developed a cDNA microarray to analyze the gene transcription profile of the lepidopteran pest Spodoptera frugiperda in response to injection of the polydnavirus HdIV associated with the ichneumonid wasp...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-7-160

    authors: Barat-Houari M,Hilliou F,Jousset FX,Sofer L,Deleury E,Rocher J,Ravallec M,Galibert L,Delobel P,Feyereisen R,Fournier P,Volkoff AN

    update_date: 2006-06-21

  • A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity.

    abstract:BACKGROUND:Analysis of single nucleotide polymorphisms (SNPs) derived from whole-genome studies allows for rapid evaluation of genome-wide diversity, and genomic epidemiology studies of Plasmodium falciparum provide insights into parasite population structure, gene flow, drug resistance and vaccine development. In area...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-15-719

    authors: Jacob CG,Tan JC,Miller BA,Tan A,Takala-Harrison S,Ferdig MT,Plowe CV

    update_date: 2014-08-26

  • Single feature polymorphism (SFP)-based selective sweep identification and association mapping of growth-related metabolic traits in Arabidopsis thaliana.

    abstract:BACKGROUND:Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on the primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines were subjected to deep genotyping through Singl...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/1471-2164-11-188

    authors: Childs LH,Witucka-Wall H,Günther T,Sulpice R,Korff MV,Stitt M,Walther D,Schmid KJ,Altmann T

    update_date: 2010-03-20

  • Molecular mechanisms of an antimicrobial peptide piscidin (Lc-pis) in a parasitic protozoan, Cryptocaryon irritans.

    abstract:BACKGROUND:Cryptocaryon irritans is an obligate parasitic ciliate protozoan that can infect various commercially important mariculture fish species and cause high lethality and economic loss. Current methods of controlling this parasite with chemicals or antibiotics are widely considered to be environmentally harmful. ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-018-4565-5

    authors: Chen R,Mao Y,Wang J,Liu M,Qiao Y,Zheng L,Su Y,Ke Q,Zheng W

    update_date: 2018-03-12

  • Networking in microbes: conjugative elements and plasmids in the genus Alteromonas.

    abstract:BACKGROUND:To develop evolutionary models for the free living bacterium Alteromonas the genome sequences of isolates of the genus have been extensively analyzed. However, the main genetic exchange drivers in these microbes, conjugative elements (CEs), have not been considered in detail thus far. In this work, CEs have ...

    journal_title:BMC genomics

    pub_type: Journal Article

    doi:10.1186/s12864-016-3461-0

    authors: López-Pérez M,Ramon-Marco N,Rodriguez-Valera F

    update_date: 2017-01-05