On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows.

Abstract:

For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Existing analysis approaches for WGS data define the tested regions either by sliding windows or by consecutive windows of fixed size along the variants. Compared to sliding-window approaches, a meaningful grouping of nearby variants into consecutive regions is likely to yield fewer tested regions. Compared to consecutive, fixed-window approaches, our approach is more likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to cluster physically in specific regions along the chromosome, identifying meaningful groups of nearby variants could thus lead to a potential power gain for association analysis. Our algorithm groups nearby variants together and defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process. Because parameters are estimated locally, the algorithm takes the differing variant density along the chromosome into account and provides a locally optimal partitioning of variants into consecutive regions. An R implementation of the algorithm is provided. We discuss the theoretical advantages of our algorithm over existing window-based approaches and show its performance and advantages in a simulation study and in an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and formally achieves genome-wide significance. Software Implementation: An implementation of the algorithm in R is available at: https://github.com/heidefier/cluster_wgs_data.
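
The idea of grouping variants by physical position with locally estimated parameters can be illustrated with a small sketch. This is not the authors' published algorithm (their R implementation is at the URL above). It is a simplified stand-in that assumes the gaps between neighboring variants are locally exponential, as they would be under a Poisson process with a locally constant rate, and starts a new region whenever a gap is improbably large given the recent gaps. The function name `partition_by_local_gaps` and the parameters `k` and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def partition_by_local_gaps(positions, k=5, alpha=0.05):
    """Split variant positions into consecutive, nonoverlapping regions.

    Illustrative assumption (not the published method): the rate of a
    locally homogeneous Poisson process is estimated from the last k
    within-region gaps, and a new region starts when the exponential
    tail probability of the observed gap falls below alpha.
    """
    pos = sorted(positions)
    if not pos:
        return []

    regions = [[pos[0]]]
    gaps = []  # gaps observed inside regions, used for the local rate
    for prev, cur in zip(pos, pos[1:]):
        gap = cur - prev
        recent = gaps[-k:]
        total = sum(recent)
        is_break = False
        if recent and total > 0:
            rate = len(recent) / total        # local estimate: events per base pair
            tail = math.exp(-rate * gap)      # P(gap >= observed) under Exp(rate)
            is_break = tail < alpha
        if is_break:
            regions.append([cur])             # boundary gap is not used for the rate
        else:
            regions[-1].append(cur)
            gaps.append(gap)
    return regions


if __name__ == "__main__":
    demo = [100, 110, 120, 130, 5000, 5010, 5020, 5030]
    for region in partition_by_local_gaps(demo):
        print(region[0], region[-1], len(region))
```

Because the rate is re-estimated from the most recent gaps, the same absolute distance can separate two regions in a dense stretch of the chromosome and stay inside one region in a sparse stretch, which is the behavior the abstract attributes to local parameter estimation.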

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Loehlein Fier H,Prokopenko D,Hecker J,Cho MH,Silverman EK,Weiss ST,Tanzi RE,Lange C

doi

10.1002/gepi.22040

subject

Has Abstract

pub_date

2017-05-01 00:00:00

pages

332-340

issue

4

eissn

1098-2272

issn

0741-0395

journal_volume

41

pub_type

Journal Article
  • Multivariate genetic analysis of apo AI concentration and HDL subfractions: evidence for major locus pleiotropy.

    abstract::A major locus influencing apolipoprotein AI (apo AI) serum levels was detected using data from the Donner Laboratory Family Study. This locus accounts for 46% of the phenotypic variability in apo AI levels. Multivariate segregation analysis revealed that this major locus also has significant pleiotropic effects on the...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100648

    authors: Blangero J,Williams-Blangero S,Mahaney MC

    update_date:1993-01-01 00:00:00

  • APO B 3' HVR polymorphism in healthy population: relationships to serum lipid levels.

    abstract::We have analyzed allele frequency distribution at the hypervariable locus 3' to the apolipoprotein B gene in a healthy population sample (241 women and 246 men) from the Belgrade area. The bimodal distribution of sixteen different hypervariable region (HVR) alleles and the heterozygosity index (average 0.76) in both s...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:2<113::AID-GEPI1>3

    authors: Alavantić D,Glisić S,Kandić I

    update_date:1998-01-01 00:00:00

  • Linkage analysis of Alzheimer's disease with methods using relative pairs.

    abstract::Four relative-pair methods for detecting genetic linkage were applied to familial Alzheimer's disease data. Results obtained using an extended Haseman-Elston test and a weighted rank pairwise correlation test, which both use information from all relative pairs, were consistent with previously published likelihood resu...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100608

    authors: Blossey H,Commenges D,Olson JM

    update_date:1993-01-01 00:00:00

  • Phenotype validation in electronic health records based genetic association studies.

    abstract::The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Despite that EHR-derived phenotype data are subjected to misclassification, it has been shown useful for discovering susceptible genes, particularly in th...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22080

    authors: Wang L,Damrauer SM,Zhang H,Zhang AX,Xiao R,Moore JH,Chen J

    update_date:2017-12-01 00:00:00

  • How can maximum likelihood methods reveal candidate gene effects on a quantitative trait?

    abstract::Different maximum likelihood approaches were used to explore the role of candidate genes in the variability of quantitative trait Q1 while accounting for the effects of age, Q2, and Q3. Segregation analysis, under the class D regressive model, provides evidence for a Mendelian gene effect on the adjusted trait Q1. Res...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120643

    authors: Martinez M,Abel L,Demenais F

    update_date:1995-01-01 00:00:00

  • Haplotype sharing analysis in affected individuals from nuclear families with at least one affected offspring.

    abstract::In diseases with a complex mode of inheritance, families with multiple affected individuals are difficult to ascertain. The haplotype sharing statistic (HSS) uses (hidden) co-ancestry between affected individuals from a founder population. These affected individuals will likely not only share the same mutation(s), but...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:6<915::AID-GEPI59>

    authors: Van der Meulen MA,te Meerman GJ

    update_date:1997-01-01 00:00:00

  • Cancer risks to spouses and offspring in the Family-Cancer Database.

    abstract::It is generally accepted that cancer is caused by environmental and inherited factors but these are only partially identified. Family studies can be informative but they do not separate shared lifestyles and genes. We estimate familial risks for concordant cancers between spouses in common cancers of both sexes in ord...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/1098-2272(200102)20:2<247::AID-GEPI7>3.0.C

    authors: Hemminki K,Dong C,Vaittinen P

    update_date:2001-02-01 00:00:00

  • Estimating the power of variance component linkage analysis in large pedigrees.

    abstract::Variance component linkage analysis is commonly used to map quantitative trait loci (QTLs) in general pedigrees. Large pedigrees are especially attractive for these studies because they provide greater power per genotyped individual than small pedigrees. We propose accurate and computationally efficient methods to cal...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20160

    authors: Chen WM,Abecasis GR

    update_date:2006-09-01 00:00:00

  • Use of variable marker density, principal components, and neural networks in the dissection of disease etiology.

    abstract::Several approaches were taken to identify the loci contributing to the quantitative and qualitative phenotypes in the Genetic Analysis Workshop 12 simulated data set. To identify possible quantitative trait loci (QTL), the quantitative traits were analyzed using SOLAR. The four replicates identified as the "best repli...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s732

    authors: Pankratz N,Kirkwood SC,Flury L,Koller DL,Foroud T

    update_date:2001-01-01 00:00:00

  • Defining the power limits of genome-wide association scan meta-analyses.

    abstract::Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits o...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20627

    authors: Chapman K,Ferreira T,Morris A,Asimit J,Zeggini E

    update_date:2011-12-01 00:00:00

  • Increasing the power of identifying gene x gene interactions in genome-wide association studies.

    abstract::In this paper we investigate the power to identify gene x gene interactions in genome-wide association studies. In our analysis we focus on two-stage analyses: analyses in which we only test for interactions between single nucleotide polymorphisms that show some marginal effect. We give two algorithms to compute signi...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20300

    authors: Kooperberg C,Leblanc M

    update_date:2008-04-01 00:00:00

  • Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits.

    abstract::The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the pr...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21901

    authors: Broadaway KA,Duncan R,Conneely KN,Almli LM,Bradley B,Ressler KJ,Epstein MP

    update_date:2015-07-01 00:00:00

  • Major gene with sex-specific effects influences fat mass in Mexican Americans.

    abstract::Increased adiposity has repeatedly been identified as a major risk factor for a variety of chronic diseases. However, the question still remains whether the amount of adipose tissue itself is genetically mediated. To address this question, a segregation analysis, using maximum likelihood techniques as implemented in t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120505

    authors: Comuzzie AG,Blangero J,Mahaney MC,Mitchell BD,Hixson JE,Samollow PB,Stern MP,MacCluer JW

    update_date:1995-01-01 00:00:00

  • Progress toward resolving the possible linkage of multiple endocrine neoplasia type 2A to haptoglobin and group-specific loci: use of restriction fragment length polymorphisms extends exclusion region.

    abstract::In an earlier paper, positive but nonsignificant lod scores were found in pair-wise linkage tests between multiple endocrine neoplasia type 2A (MEN-2A) and both the haptoglobin (HP) locus on chromosome 16 and group-specific component (GC) locus on chromosome 4. Recently discovered restriction fragment length polymorph...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370030306

    authors: Kidd KK,Kidd JR,Castiglione CM,Pakstis AJ,Sparkes RS

    update_date:1986-01-01 00:00:00

  • Mapping alcoholism genes using linkage/linkage disequilibrium analysis.

    abstract::Using a recently developed semiparametric method for combined linkage/linkage-disequilibrium analysis, we analyzed the Collaborative Study on the Genetics of Alcoholism data subset developed for Genetic Analysis Workshop 11 (GAW11). This semiparametric approach estimates recombination fractions for linkage, marker log...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170708

    authors: Aragaki C,Quiaoit F,Hsu L,Zhao LP

    update_date:1999-01-01 00:00:00

  • Effect of polygenes on Xiong's transmission disequilibrium test of a QTL in nuclear families with multiple children.

    abstract::The transmission disequilibrium test (TDT), originally developed for mapping disease genes, has recently been extended to identify quantitative trait loci (QTL). For quantitative traits important for human health, generally multiple QTLs are involved. In the investigation of the statistical properties of the TDT, back...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1032

    authors: Deng HW,Li J,Recker RR

    update_date:2001-11-01 00:00:00

  • Robustness of the unified model to shared environmental effects in the analysis of dichotomous traits.

    abstract::Simulation studies were conducted to assess to what extent the conclusions of segregation analysis, performed under the unified model, can be affected by the presence of unmeasured environmental factors shared by family members. Dichotomous data were generated on six-member nuclear families under two variants of the m...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060140

    authors: Demenais F,Abel L

    update_date:1989-01-01 00:00:00

  • Genetic heterogeneity in Alzheimer's disease: a grade of membership analysis.

    abstract::Grade of membership analysis (GoM) may have particular relevance for genetic epidemiology. The method can flexibly relate genetic markers, clinical features, and environmental exposures to possible subtypes of disease termed pure types even when population allele frequencies and penetrance functions are not known. Hen...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100628

    authors: Corder EH,Woodbury MA

    update_date:1993-01-01 00:00:00

  • Inferential testing for linkage with GENEHUNTER-MODSCORE: the impact of the pedigree structure on the null distribution of multipoint MOD scores.

    abstract::The asymptotic distribution of [MOD] scores under the null hypothesis of no linkage is only known for affected sib pairs and other types of affected relative pairs. We have extended the GENEHUNTER-MODSCORE program to allow for simulations under the null hypothesis of no linkage to determine the empirical significance ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20264

    authors: Mattheisen M,Dietter J,Knapp M,Baur MP,Strauch K

    update_date:2008-01-01 00:00:00

  • SimPEL: Simulation-based power estimation for sequencing studies of low-prevalence conditions.

    abstract::Power estimations are important for optimizing genotype-phenotype association study designs. However, existing frameworks are designed for common disorders, and thus ill-suited for the inherent challenges of studies for low-prevalence conditions such as rare diseases and infrequent adverse drug reactions. These challe...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22129

    authors: Mak L,Li M,Cao C,Gordon P,Tarailo-Graovac M,Bousman C,Wang P,Long Q

    update_date:2018-07-01 00:00:00

  • Effect of linkage disequilibrium between markers in linkage and association analyses.

    abstract::Contributions to Group 17 of the Genetic Analysis Workshop 15 considered dense markers in linkage disequilibrium (LD) in the context of either linkage or association analysis. Three contributions reported on methods for modeling LD or selecting a subset of markers in linkage equilibrium to perform linkage analysis. Wh...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20291

    authors: Dupuis J,Albers K,Allen-Brady K,Cho K,Elston RC,Kappen HJ,Tang H,Thomas A,Thomson G,Tsung E,Yang Q,Zhang W,Zhao K,Zheng G,Ziegler JT

    update_date:2007-01-01 00:00:00

  • Identifying SNPs predictive of phenotype using random forests.

    abstract::There has been a great interest and a few successes in the identification of complex disease susceptibility genes in recent years. Association studies, where a large number of single-nucleotide polymorphisms (SNPs) are typed in a sample of cases and controls to determine which genes are associated with a specific dise...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20041

    authors: Bureau A,Dupuis J,Falls K,Lunetta KL,Hayward B,Keith TP,Van Eerdewegh P

    update_date:2005-02-01 00:00:00

  • Improving power in genome-wide association studies: weights tip the scale.

    abstract::The potential of genome-wide association analysis can only be realized when they have power to detect signals despite the detrimental effect of multiple testing on power. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group a...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20237

    authors: Roeder K,Devlin B,Wasserman L

    update_date:2007-11-01 00:00:00

  • Analysis of bipolar disorder using affected relatives.

    abstract::We have analyzed the GAW10 data from several studies of bipolar affective disorder (BPAD) using the software packages SimIBD and SIMWALK2. SimIBD implements a simulation-based affected-pedigree-member (APM) statistic, called SimAPM, as well as an APM-like statistic, also called SimIBD, that measures identical-by-desce...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:6<605::AID-GEPI9>3

    authors: Davis S,Sobel E,Marinov M,Weeks DE

    update_date:1997-01-01 00:00:00

  • Mantel statistics to correlate gene expression levels from microarrays with clinical covariates.

    abstract::Mantel statistics provide an additional step to standard approaches in the analysis of gene expression and covariate data, allow the calculation of standard statistics such as correlation, partial correlation, and regression coefficients, and, with permutation tests, provide P values for these statistics to relate the...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1115

    authors: Shannon WD,Watson MA,Perry A,Rich K

    update_date:2002-06-01 00:00:00

  • Sib-pair linkage tests for disease susceptibility loci: common tests vs. the asymptotically most powerful test.

    abstract::Several statistical tests for linkage between a disease susceptibility locus and a marker locus for sib-pair data are examined analytically. Two common statistics, a test based on the mean number of marker alleles shared identical by descent by sib-pairs, and a test based on the proportion of sib-pairs sharing exactly...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070506

    authors: Schaid DJ,Nick TG

    update_date:1990-01-01 00:00:00

  • Association mapping, using a mixture model for complex traits.

    abstract::Association mapping for complex diseases using unrelated individuals can be more powerful than family-based analysis in many settings. In addition, this approach has major practical advantages, including greater efficiency in sample recruitment. Association mapping may lead to false-positive findings, however, if popu...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.210

    authors: Zhu X,Zhang S,Zhao H,Cooper RS

    update_date:2002-08-01 00:00:00

  • A Bayesian integrative genomic model for pathway analysis of complex traits.

    abstract::With new technologies, multiple types of genomic data are commonly collected on a single set of samples. However, standard analysis methods concentrate on a single data type at a time and ignore the relationships between genes, proteins, and biochemical reactions that give rise to complex phenotypes. In this paper, we...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21628

    authors: Fridley BL,Lund S,Jenkins GD,Wang L

    update_date:2012-05-01 00:00:00

  • Increased risk for familial ovarian cancer among Jewish women: a population-based case-control study.

    abstract::Jewish women have been reported to have a higher risk for familial breast cancer than non-Jewish women and to be more likely to carry mutations in breast cancer genes such as BRCA1. Because BRCA1 mutations also increase women's risk for ovarian cancer, we asked whether Jewish women are at higher risk for familial ovar...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:10.1002/(SICI)1098-2272(1998)15:1<51::AID-GEPI4>3.

    authors: Steinberg KK,Pernarelli JM,Marcus M,Khoury MJ,Schildkraut JM,Marchbanks PA

    update_date:1998-01-01 00:00:00

  • Genome-wide approaches for identifying interacting susceptibility regions for asthma.

    abstract::A genome-wide correlation analysis and cluster analysis were utilized to determine chromosomal regions that had similar nonparametric linkage scores across families in order to locate interacting susceptibility loci for asthma. Conditional analysis was performed to detect any increase in lod score over baseline. Eight...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s266

    authors: Colilla S,Tsalenko A,Pluznikov A,Cox NJ

    update_date:2001-01-01 00:00:00