A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction.

Abstract:

Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier that is combined with 10-fold cross validation to obtain an estimate of predictive accuracy or generalizability of epistasis models. Traditionally, we have used permutation testing to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1,000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1,000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1,000-fold permutation test.
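The contrast the abstract draws between a full permutation test and an EVD-based test can be illustrated with a minimal sketch. This is not the authors' implementation: the Gumbel (type I extreme value) form, the method-of-moments fit, the choice of 20 permutations for the small run, and the toy scoring function `best_accuracy` are all assumptions made for illustration. The idea is that a smooth null distribution fitted to a handful of permuted scores may stand in for the empirical distribution built from 1,000 permutations.

```python
import math
import random
import statistics


def permuted_null_scores(labels, score_fn, n_perm, rng):
    """Score the data n_perm times, each time with the class labels shuffled."""
    shuffled = list(labels)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        scores.append(score_fn(shuffled))
    return scores


def empirical_p_value(observed, null_scores):
    """Permutation p-value with the usual +1 correction."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (hits + 1) / (len(null_scores) + 1)


def gumbel_p_value(observed, null_scores):
    """Upper-tail p-value from a Gumbel fit (method of moments; an assumption)."""
    euler_gamma = 0.5772156649015329
    spread = statistics.stdev(null_scores)
    if spread == 0.0:
        raise ValueError("null scores are constant; cannot fit a distribution")
    scale = spread * math.sqrt(6.0) / math.pi
    location = statistics.mean(null_scores) - euler_gamma * scale
    z = (observed - location) / scale
    # P(X >= observed) = 1 - exp(-exp(-z)), written with expm1 for accuracy.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    labels = [i % 2 for i in range(n)]
    features = [[rng.randint(0, 1) for _ in range(n)] for _ in range(10)]

    def best_accuracy(y):
        # Hypothetical stand-in for "accuracy of the best MDR model":
        # the best single-feature agreement with the labels.
        best = 0.0
        for col in features:
            acc = sum(1 for a, b in zip(col, y) if a == b) / len(y)
            best = max(best, acc, 1.0 - acc)
        return best

    observed = best_accuracy(labels)
    null_small = permuted_null_scores(labels, best_accuracy, 20, rng)
    null_large = permuted_null_scores(labels, best_accuracy, 1000, rng)
    print("EVD p-value (20 permutations):", gumbel_p_value(observed, null_small))
    print("Permutation p-value (1,000):", empirical_p_value(observed, null_large))
```

Because the score of the best model is a maximum over many candidate models, an extreme value distribution is a natural candidate for its null distribution; whether a Gumbel fit is adequate for a given data set would need to be checked against a full permutation run.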

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Pattin KA,White BC,Barney N,Gui J,Nelson HH,Kelsey KT,Andrew AS,Karagas MR,Moore JH

doi

10.1002/gepi.20360

subject

Has Abstract

pub_date

2009-01-01 00:00:00

pages

87-94

issue

1

eissn

1098-2272

issn

0741-0395

journal_volume

33

pub_type

Journal Article
  • Method for calculating risk associated with family history of a disease.

    abstract::A method is described for estimating excess relative risks of a disease from familial factors. Beginning with population-based series of cases and controls, a cohort of each subject's relatives is formed and checked for disease against a population-based registry. The disease experience of the cohort formed from each ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120306

    authors: Kerber RA

    updated: 1995-01-01 00:00:00

  • Analysis of twin data ascertained through probands: the double-entry approach.

    abstract::Twin pairs are sometimes included in studies because at least one of them is a proband, and conventionally the analysis of the data is based on the conditional distribution of the co-twin given the proband. In the case of more than one proband in each pair, an often used "ad hoc" method of analysis is to allow each tw...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10253

    authors: Hindsberger C,Bryld LE

    updated: 2003-11-01 00:00:00

  • Haplotype variation and genotype imputation in African populations.

    abstract::Sub-Saharan Africa has been identified as the part of the world with the greatest human genetic diversity. This high level of diversity causes difficulties for genome-wide association (GWA) studies in African populations, for example by reducing the accuracy of genotype imputation in African populations compared to no...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20626

    authors: Huang L,Jakobsson M,Pemberton TJ,Ibrahim M,Nyambo T,Omar S,Pritchard JK,Tishkoff SA,Rosenberg NA

    updated: 2011-12-01 00:00:00

  • Adaptive testing for association between two random vectors in moderate to high dimensions.

    abstract::Testing for association between two random vectors is a common and important task in many fields; however, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, which are often ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22059

    authors: Xu Z,Xu G,Pan W,Alzheimer's Disease Neuroimaging Initiative.

    updated: 2017-11-01 00:00:00

  • Quantitative allelic test--a fast test for very large association studies.

    abstract::Advances in high throughput technology have enabled the generation of unprecedented amounts of genomic data (e.g., next-generation sequence data, transcriptomics, metabolomics, and proteomics), which promises to unravel the genetic architecture of complex traits. These discoveries may lead to novel therapeutic targets...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21768

    authors: Lee SM,Karrison TG,Cox NJ,Im HK

    updated: 2013-12-01 00:00:00

  • Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale.

    abstract::Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22276

    authors: German CA,Sinsheimer JS,Klimentidis YC,Zhou H,Zhou JJ

    updated: 2020-04-01 00:00:00

  • Genetic analysis of IDDM: summary of GAW5 IDDM results.

    abstract::This paper summarizes the analyses by participants in the insulin-dependent diabetes mellitus (IDDM) component of Genetic Analysis Workshop 5 (GAW5). The data were obtained from 94 families with two or more IDDM sibs. Topics treated in the Workshop analysis included the following: methods for detecting associations an...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Review

    doi:10.1002/gepi.1370060111

    authors: Spielman RS,Baur MP,Clerget-Darpoux F

    updated: 1989-01-01 00:00:00

  • Linkage disequilibrium between DNA markers at the low-density lipoprotein receptor gene.

    abstract::We determined pairwise linkage disequilibria between 12 restriction fragment length polymorphism (RFLP) markers at or near the low-density lipoprotein receptor (LDLR) locus on chromosome 19p13.2-13.1 in 92 unrelated individuals. Of these 12 RFLPs, two were newly identified under a cosmid-based strategy designed to scr...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070114

    authors: Hegele RA,Plaetke R,Lalouel JM

    updated: 1990-01-01 00:00:00

  • Increased risk for familial ovarian cancer among Jewish women: a population-based case-control study.

    abstract::Jewish women have been reported to have a higher risk for familial breast cancer than non-Jewish women and to be more likely to carry mutations in breast cancer genes such as BRCA1. Because BRCA1 mutations also increase women's risk for ovarian cancer, we asked whether Jewish women are at higher risk for familial ovar...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:10.1002/(SICI)1098-2272(1998)15:1<51::AID-GEPI4>3.

    authors: Steinberg KK,Pernarelli JM,Marcus M,Khoury MJ,Schildkraut JM,Marchbanks PA

    updated: 1998-01-01 00:00:00

  • Parental genotype reconstruction: applications of haplotype relative risk to incomplete parental data.

    abstract::Intended to resolve the problem of constructing a matched population-based control sample, haplotype relative risk techniques frequently suffer from loss of power for late-onset diseases due to unavailability of parental genotypes that are required to form parent-offspring pairs. However, much of this missing informat...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:5<471::AID-GEPI3>3

    authors: Martin RB,Alda M,MacLean CJ

    updated: 1998-01-01 00:00:00

  • Evaluation of methods accounting for population structure with pedigree data and continuous outcomes.

    abstract::Methods to account for population structure (PS) in genome-wide association studies have been well developed in samples of unrelated individuals, but when a sample is composed of families, the task of finding and accounting for PS is not as straightforward. Family-based tests that condition on parental genotypes or t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20590

    authors: Peloso GM,Dupuis J,Lunetta KL

    updated: 2011-09-01 00:00:00

  • Improving power in genome-wide association studies: weights tip the scale.

    abstract::The potential of genome-wide association analyses can only be realized when they have power to detect signals despite the detrimental effect of multiple testing on power. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group a...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20237

    authors: Roeder K,Devlin B,Wasserman L

    updated: 2007-11-01 00:00:00

  • Scope and strategies of genetic epidemiology: analysis of articles published in Genetic Epidemiology, 1984-1991.

    abstract::Genetic epidemiology is a relatively new discipline that seeks to unravel the role of genetic factors and their interactions with environmental factors in the etiology of diseases, using population and family study approaches. To characterize the overall direction and emphasis of research strategies used in this field...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100505

    authors: Khoury MJ,Beaty TH,Cohen BH

    updated: 1993-01-01 00:00:00

  • Constructing meiotic maps with known error probability.

    abstract::We propose methods to construct meiotic gene maps while controlling the probability of a decision-error. First, a single step gene ordering procedure is presented whose decision-error probability is bounded above by a prespecified threshold. The bound for the error probability is valid under quite general circumstance...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:3<274::AID-GEPI4>3

    authors: Rogatko A,Babb J,Jordan H,Zacks S

    updated: 1999-01-01 00:00:00

  • Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits.

    abstract::The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the pr...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21901

    authors: Broadaway KA,Duncan R,Conneely KN,Almli LM,Bradley B,Ressler KJ,Epstein MP

    updated: 2015-07-01 00:00:00

  • A multipoint method for meta-analysis of genetic association studies.

    abstract::Meta-analyses of genetic association studies are usually performed using a single polymorphism at a time, even though in many cases the individual studies report results from partially overlapping sets of polymorphisms. We present here a multipoint (or multilocus) method for multivariate meta-analysis of published pop...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20531

    authors: Bagos PG,Liakopoulos TD

    updated: 2010-11-01 00:00:00

  • Gene-dropping vs. empirical variance estimation for allele-sharing linkage statistics.

    abstract::In this study, we compare the statistical properties of a number of methods for estimating P-values for allele-sharing statistics in non-parametric linkage analysis. Some of the methods are based on the normality assumption, using different variance estimation methods, and others use simulation (gene-dropping) to find...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20177

    authors: Jung J,Weeks DE,Feingold E

    updated: 2006-12-01 00:00:00

  • Genetic and environmental causes of variation in renal tubular handling of sodium and potassium: a twin study.

    abstract::We have conducted a study of renal sodium and potassium reabsorption in 205 pairs of twins on freely chosen diets; 89 of the subjects were studied on more than one occasion. Renal tubular sodium and potassium handling, as measured by the fractional excretions FENa and FEK, show repeatable differences between individua...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370020103

    authors: Whitfield JB,Martin NG

    updated: 1985-01-01 00:00:00

  • Modelling the major histocompatibility complex susceptibility to RA using the MASC method.

    abstract::To explain the association between HLA-DRB1 gene and rheumatoid arthritis (RA), two main hypotheses have been proposed. The first, the shared epitope hypothesis, assumes a direct role of DRB1 in RA susceptibility. The second hypothesis assumes a recessive disease susceptibility gene in linkage disequilibrium with DRB1...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:4<419::AID-GEPI7>3

    authors: Génin E,Babron MC,McDermott MF,Mulcahy B,Waldron-Lynch F,Adams C,Clegg DO,Ward RH,Shanahan F,Molloy MG,O'Gara F,Clerget-Darpoux F

    updated: 1998-01-01 00:00:00

  • Apolipoprotein E phenotype, arterial disease, and mortality among older women: the study of osteoporotic fractures.

    abstract::This study is an investigation of the relationship between apolipoprotein E (apoE) phenotype, arterial disease, and mortality in a group of women (n = 1,751) aged 65 years and older enrolled in the Study of Osteoporotic Fractures. Crude mortality rates were highest among women with the 4-3 and 4-4 phenotypes but age-a...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article, Multicenter Study

    doi:10.1002/(SICI)1098-2272(1997)14:2<147::AID-GEPI4>3

    authors: Vogt MT,Cauley JA,Kuller LH

    updated: 1997-01-01 00:00:00

  • Direct genetic effects and their estimation from matched case-control data.

    abstract::In genetic association studies, a single marker is often associated with multiple, correlated phenotypes (e.g., obesity and cardiovascular disease, or nicotine dependence and lung cancer). A pervasive question is then whether that marker exerts independent effects on all phenotypes. In this paper, we address this ques...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21660

    authors: Berzuini C,Vansteelandt S,Foco L,Pastorino R,Bernardinelli L

    updated: 2012-09-01 00:00:00

  • Two common polymorphisms in the APO A-IV coding gene: their evolution and linkage disequilibrium.

    abstract::Human apolipoprotein A-IV (APO A-IV) exhibits a common protein polymorphism detectable by isoelectric focusing (IEF) due to a single base substitution at codon 360 which replaces the frequently occurring glutamine residue (allele 1) with histidine (allele 2). Recently, sequence analysis of the APO A-IV coding region h...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370090503

    authors: Kamboh MI,Hamman RF,Ferrell RE

    updated: 1992-01-01 00:00:00

  • Genetic epidemiology of Menkes disease.

    abstract::Copper incorporation studies were performed on individuals from 58 pedigrees, comprising 140 sibships. As previously reported, there is considerable overlap between heterozygotes and normal homozygotes. Segregation analysis supports recessive inheritance of disease, with residual heritability for 64Cu uptake in cultur...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370030403

    authors: Horn N,Morton NE

    updated: 1986-01-01 00:00:00

  • Permutation-based adjustments for the significance of partial regression coefficients in microarray data analysis.

    abstract::The aim of this paper is to generalize permutation methods for multiple testing adjustment of significant partial regression coefficients in a linear regression model used for microarray data. Using a permutation method outlined by Anderson and Legendre [1999] and the permutation P-value adjustment from Simon et al. [...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20255

    authors: Wagner BD,Zerbe GO,Mexal S,Leonard SS

    updated: 2008-01-01 00:00:00

  • Genetic association tests based on ranks (GATOR) for quantitative traits with and without censoring.

    abstract::Linkage disequilibrium mapping of quantitative traits is a powerful method for dissecting the genetic etiology of complex phenotypes. Quantitative traits, however, often exhibit characteristics that make their use problematic. For example, the distribution of the trait may be censored, highly skewed, or contaminated w...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20141

    authors: Allen AS,Martin ER,Qin X,Li YJ

    updated: 2006-04-01 00:00:00

  • Power of the linkage test for a heterogeneous disorder due to two independent inherited causes: a simulation study.

    abstract::We have conducted a simulation study in small pedigrees to investigate the power to detect linkage and heterogeneity for a disorder due to either one of two independent disease loci. We have considered a highly polymorphic marker locus (PIC = 70%) linked to one disease locus and unlinked to the second. The power to de...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070306

    authors: Martinez M,Goldin LR

    updated: 1990-01-01 00:00:00

  • Conditional multipoint linkage analysis using affected sib pairs: an alternative approach.

    abstract::Recently, Liang et al. ([2001b] Genet. Epidemiol. 21:105-122) proposed a conditional approach to assess linkage evidence on the target region by incorporating linkage information from an unlinked (reference) region using allele shared IBD (identity-by-descent) from affected sib pairs. This is carried out by conditionin...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10305

    authors: Chiu YF,Liang KY

    updated: 2004-02-01 00:00:00

  • Effect of linkage disequilibrium between markers in linkage and association analyses.

    abstract::Contributions to Group 17 of the Genetic Analysis Workshop 15 considered dense markers in linkage disequilibrium (LD) in the context of either linkage or association analysis. Three contributions reported on methods for modeling LD or selecting a subset of markers in linkage equilibrium to perform linkage analysis. Wh...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20291

    authors: Dupuis J,Albers K,Allen-Brady K,Cho K,Elston RC,Kappen HJ,Tang H,Thomas A,Thomson G,Tsung E,Yang Q,Zhang W,Zhao K,Zheng G,Ziegler JT

    updated: 2007-01-01 00:00:00

  • Investigation of a candidate gene, environment, and G x E interaction using case-control and case-parent study designs.

    abstract::We investigated the independent contributions of a candidate gene and an environmental factor, and the presence of gene x environment (G x E) interaction, in the etiology of a disease in the Genetic Analysis Workshop (GAW) 12 problem 2 simulated data using a two-stage approach utilizing both case-control and case-pare...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s843

    authors: Norris JM,Selinger-Leneman H,Génin E

    updated: 2001-01-01 00:00:00

  • Bayesian linkage and segregation analysis: factoring the problem.

    abstract::Complex segregation analysis and linkage methods are mathematical techniques for the genetic dissection of complex diseases. They are used to delineate complex modes of familial transmission and to localize putative disease susceptibility loci to specific chromosomal locations. The computational problem of Bayesian li...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/1098-2272(2000)19:1+<::AID-GEPI8>3.0.CO;2-

    authors: Matthysse S

    updated: 2000-01-01 00:00:00