An ensemble learning approach jointly modeling main and interaction effects in genetic association studies.

Abstract:

Complex diseases are presumed to result from interactions among several genes and environmental factors, with each gene having only a small effect on the disease. Methods that account for gene-gene interactions, search for a set of marker loci in different genes or across the genome, and analyze these loci jointly are therefore critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for "base learners" and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. Applying the ELA to a data set yields a final model, an overall P-value for the association test between the set of loci in the final model and the trait, and an importance measure for each base learner and each marker in the final model. The final model is a linear combination of some base learners, and we know which base learners represent main effects and which represent interaction effects. The importance measure indicates the relative importance of each base learner or marker in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed three other existing multi-locus methods in almost all cases.
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.
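
The abstract does not spell out how base learners are searched for or how the importance measure and P-value are computed, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes genotypes coded 0/1/2, a quantitative trait, base learners that are either single markers or pairwise products, a greedy stagewise fit to build the linear combination, importance defined as each learner's share of the explained sum of squares, and a permutation test for the overall P-value. All function names and the toy data are hypothetical.

```python
import itertools
import random


def mean(xs):
    return sum(xs) / len(xs)


def corr(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def build_base_learners(genotypes):
    """Main effects are keyed (j,), pairwise interaction products (j, k)."""
    m = len(genotypes[0])
    learners = {(j,): [row[j] for row in genotypes] for j in range(m)}
    for j, k in itertools.combinations(range(m), 2):
        learners[(j, k)] = [row[j] * row[k] for row in genotypes]
    return learners


def fit_ensemble(learners, trait, n_steps=2):
    """Greedy stagewise fit; returns (coefficients, R^2, learner importance)."""
    centre = mean(trait)
    residual = [t - centre for t in trait]
    tss = sum(r * r for r in residual)
    coef, gain = {}, {}
    for _ in range(n_steps):
        # Pick the base learner most correlated with the current residual.
        key = max(learners, key=lambda k: abs(corr(learners[k], residual)))
        x = learners[key]
        mx = mean(x)
        sxx = sum((a - mx) ** 2 for a in x)
        if sxx == 0:
            break
        beta = sum((a - mx) * r for a, r in zip(x, residual)) / sxx
        coef[key] = coef.get(key, 0.0) + beta
        gain[key] = gain.get(key, 0.0) + beta * beta * sxx  # drop in RSS
        residual = [r - beta * (a - mx) for a, r in zip(x, residual)]
    rss = sum(r * r for r in residual)
    r2 = 1.0 - rss / tss if tss > 0 else 0.0
    importance = {k: g / tss for k, g in gain.items()} if tss > 0 else {}
    return coef, r2, importance


def marker_importance(importance):
    """Credit each marker with the importance of every learner it appears in."""
    totals = {}
    for key, share in importance.items():
        for marker in key:
            totals[marker] = totals.get(marker, 0.0) + share
    return totals


def permutation_p_value(learners, trait, n_perm=99, n_steps=2, seed=0):
    """Overall P-value: refit (selection included) on shuffled traits."""
    rng = random.Random(seed)
    observed = fit_ensemble(learners, trait, n_steps)[1]
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit_ensemble(learners, shuffled, n_steps)[1] >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


# Toy data (made up): every genotype combination of three markers,
# with an interaction between markers 0 and 1 and a main effect of marker 2.
genotypes = [list(g) for g in itertools.product(range(3), repeat=3)]
trait = [g[0] * g[1] + 0.5 * g[2] for g in genotypes]
learners = build_base_learners(genotypes)
coef, r2, importance = fit_ensemble(learners, trait)
p_value = permutation_p_value(learners, trait)
print(coef, r2, marker_importance(importance), p_value)
```

Repeating the learner search inside every permutation keeps the overall P-value honest about the selection step. The keys of `coef` show directly which terms are main effects (one marker index) and which are interactions (two indices), which is the kind of interpretability the abstract describes.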

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Zhang Z,Zhang S,Wong MY,Wareham NJ,Sha Q

doi

10.1002/gepi.20304

subject

Has Abstract

pub_date

2008-05-01 00:00:00

pages

285-300

issue

4

eissn

1098-2272

issn

0741-0395

journal_volume

32

pub_type

Journal Article
  • Linear trend tests for case-control genetic association that incorporate random phenotype and genotype misclassification error.

    abstract::The purpose of this work is the development of linear trend tests that allow for error (LTT ae), specifically incorporating double-sampling information on phenotypes and/or genotypes. We use a likelihood framework. Misclassification errors are estimated via double sampling. Unbiased estimates of penetrances and genoty...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20246

    authors: Gordon D,Haynes C,Yang Y,Kramer PL,Finch SJ

    update_date:2007-12-01 00:00:00

  • Genome-wide detection and characterization of mating asymmetry in human populations.

    abstract::The study of the genetic component of early-onset diseases requires investigation into parental genetic effects, particularly those mediated by the mother who can influence the offspring's risk of disease through the effects of her genes acting directly on the intrauterine milieu or indirectly through maternal-gene ch...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20602

    authors: Bourgey M,Healy J,Saint-Onge P,Massé H,Sinnett D,Roy-Gagnon MH

    update_date:2011-09-01 00:00:00

  • Relationship between body mass index, cigarette smoking, and plasma sex steroids in normal male twins.

    abstract::Smoking has been observed to affect plasma sex hormones and body mass index. The relationship between smoking, body mass index, and plasma concentration of sex hormones was studied in normal adult male twins. The analyses were performed for between 150 and 159 twin pairs for whom hormonal data were available on both t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060303

    authors: Meikle AW,Bishop DT,Stringham JD,Ford MH,West DW

    update_date:1989-01-01 00:00:00

  • Estimation of a significance threshold for epigenome-wide association studies.

    abstract::Epigenome-wide association studies (EWAS) are designed to characterise population-level epigenetic differences across the genome and link them to disease. Most commonly, they assess DNA-methylation status at cytosine-guanine dinucleotide (CpG) sites, using platforms such as the Illumina 450k array that profile a subse...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22086

    authors: Saffari A,Silver MJ,Zavattari P,Moi L,Columbano A,Meaburn EL,Dudbridge F

    update_date:2018-02-01 00:00:00

  • Power of non-parametric linkage analysis in mapping genes contributing to human longevity in long-lived sib-pairs.

    abstract::This report investigates the power issue in applying the non-parametric linkage analysis of affected sib-pairs (ASP) [Kruglyak and Lander, 1995: Am J Hum Genet 57:439-454] to localize genes that contribute to human longevity using long-lived sib-pairs. Data were simulated by introducing a recently developed statistica...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10304

    authors: Tan Q,Zhao JH,Iachine I,Hjelmborg J,Vach W,Vaupel JW,Christensen K,Kruse TA

    update_date:2004-04-01 00:00:00

  • PANDA: Prioritization of autism-genes using network-based deep-learning approach.

    abstract::Understanding the genetic background of complex diseases and disorders plays an essential role in the promising precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given a large number of possibilities. Thus, computational methods have seen increasing appli...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22282

    authors: Zhang Y,Chen Y,Hu T

    update_date:2020-06-01 00:00:00

  • Major genetic effects on airway-parenchymal dysanapsis of the lung: the Humboldt family study.

    abstract::We examined familial resemblance and performed segregation analysis for the maximal expiratory flow rate at 50% of vital capacity (Vmax50) and the ratio of Vmax50 to forced vital capacity (FVC), based on data from 309 nuclear families with 1,045 individuals in the town of Humboldt, Saskatchewan, in 1993. Vmax50 is con...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:1<95::AID-GEPI8>3.

    authors: Chen Y,Dosman JA,Rennie DC,Lockinger LA

    update_date:1999-01-01 00:00:00

  • A multipoint method for meta-analysis of genetic association studies.

    abstract::Meta-analyses of genetic association studies are usually performed using a single polymorphism at a time, even though in many cases the individual studies report results from partially overlapping sets of polymorphisms. We present here a multipoint (or multilocus) method for multivariate meta-analysis of published pop...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20531

    authors: Bagos PG,Liakopoulos TD

    update_date:2010-11-01 00:00:00

  • Monte Carlo analysis on a large pedigree.

    abstract::Monte Carlo methods for linkage and segregation analysis are applied to the HGAR1 pedigree. To address these data, the methods are extended in several ways. The results are compared with those provided by PAP. ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100658

    authors: Thompson EA,Lin S,Olshen AB,Wijsman EM

    update_date:1993-01-01 00:00:00

  • Adaptive testing for association between two random vectors in moderate to high dimensions.

    abstract::Testing for association between two random vectors is a common and important task in many fields, however, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, which are often ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22059

    authors: Xu Z,Xu G,Pan W,Alzheimer's Disease Neuroimaging Initiative.

    update_date:2017-11-01 00:00:00

  • Genetic epidemiology of breast cancer: segregation analysis of 389 Icelandic pedigrees.

    abstract::A genetic epidemiologic investigation of breast cancer involving 389 breast cancer pedigrees including information on 14,721 individuals from the Icelandic population-based cancer registry is presented. Probands were women born in or after 1920 and reported to have breast cancer in the cancer registry. The average age...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(200001)18:1<81::AID-GEPI6>

    authors: Baffoe-Bonnie AB,Beaty TH,Bailey-Wilson JE,Kiemeney LA,Sigvaldason H,Olafsdóttir G,Tryggvadóttir L,Tulinius H

    update_date:2000-01-01 00:00:00

  • eQuIPS: eQTL Analysis Using Informed Partitioning of SNPs - A Fully Bayesian Approach.

    abstract::We develop a Bayesian multi-SNP Markov chain Monte Carlo approach that allows published functional significance scores to objectively inform single nucleotide polymorphism (SNP) prior effect sizes in expression quantitative trait locus (eQTL) studies. We developed the Normal Gamma prior to allow the inclusion of funct...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21961

    authors: Boggis EM,Milo M,Walters K

    update_date:2016-05-01 00:00:00

  • Estimating the power of variance component linkage analysis in large pedigrees.

    abstract::Variance component linkage analysis is commonly used to map quantitative trait loci (QTLs) in general pedigrees. Large pedigrees are especially attractive for these studies because they provide greater power per genotyped individual than small pedigrees. We propose accurate and computationally efficient methods to cal...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20160

    authors: Chen WM,Abecasis GR

    update_date:2006-09-01 00:00:00

  • Modeling the HLA component in rheumatoid arthritis: sensitivity to DRB1 allele frequencies.

    abstract::Rheumatoid arthritis is an inflammatory disease for which positive associations have been described with some HLA-DRB1 alleles. The associated alleles share a similar amino acid sequence in the third hypervariable region, the shared epitope, but differ at position 71 and 86. It has been suggested that HLA susceptibili...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/1098-2272(200012)19:4<422::AID-GEPI12>3.0.

    authors: Tézenas du Montcel S,Reviron D,Genin E,Roudier J,Mercier P,Clerget-Darpoux F

    update_date:2000-12-01 00:00:00

  • Integrative sparse principal component analysis of gene expression data.

    abstract::In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps the PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the "small sample size, high ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22089

    authors: Liu M,Fan X,Fang K,Zhang Q,Ma S

    update_date:2017-12-01 00:00:00

  • Linkage analysis of asthma and atopy including models with genomic imprinting.

    abstract::Asthma and atopy are two closely related, common complex traits in which a number of genetic and environmental factors are suspected to play a role. We have performed parametric and nonparametric multi-marker linkage analysis for the Busselton data set, which is part of problem 1 of Genetic Analysis Workshop 12. In pa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s204

    authors: Strauch K,Bogdanow M,Fimmers R,Baur MP,Wienker TF

    update_date:2001-01-01 00:00:00

  • Genetic epidemiology of Menkes disease.

    abstract::Copper incorporation studies were performed on individuals from 58 pedigrees, comprising 140 sibships. As previously reported, there is considerable overlap between heterozygotes and normal homozygotes. Segregation analysis supports recessive inheritance of disease, with residual heritability for 64Cu uptake in cultur...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370030403

    authors: Horn N,Morton NE

    update_date:1986-01-01 00:00:00

  • Trends in prenatal diagnosis of Down syndrome and other autosomal trisomies in Scotland 1990 to 1994, with associated cytogenetic and epidemiological findings.

    abstract::The present report summarizes findings on 670 cases of autosomal trisomy diagnosed in Scotland, with actual or expected dates of delivery in 1990 to 1994 inclusive. Cases were notified by cytogenetic service laboratories. There were 277 prenatal and 369 postnatal diagnoses and 24 spontaneous losses. Excluding the latt...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:2<179::AID-GEPI5>3

    authors: Carothers AD,Boyd E,Lowther G,Ellis PM,Couzin DA,Faed MJ,Robb A

    update_date:1999-01-01 00:00:00

  • Comparison of the QTDT analysis for IgE in the CSGA data set.

    abstract::Over the past few years at least 13 transmission/disequilibrium test (TDT)-based tests have been developed for quantitative (Q) traits for the assessment of association or linkage in the presence of the other. A total of six of these QTDT methods were used to analyze log10IgE in the Collaborative Study on the Genetics...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s312

    authors: Page GP,Wilcox MA,Occhiuto J,Adak S,Neuberg D,Bajorunaite R,George V

    update_date:2001-01-01 00:00:00

  • SimPEL: Simulation-based power estimation for sequencing studies of low-prevalence conditions.

    abstract::Power estimations are important for optimizing genotype-phenotype association study designs. However, existing frameworks are designed for common disorders, and thus ill-suited for the inherent challenges of studies for low-prevalence conditions such as rare diseases and infrequent adverse drug reactions. These challe...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22129

    authors: Mak L,Li M,Cao C,Gordon P,Tarailo-Graovac M,Bousman C,Wang P,Long Q

    update_date:2018-07-01 00:00:00

  • Identification of gene-gene interactions in the presence of missing data using the multifactor dimensionality reduction method.

    abstract::Gene-gene interaction is believed to play an important role in understanding complex traits. Multifactor dimensionality reduction (MDR) was proposed by Ritchie et al. [2001. Am J Hum Genet 69:138-147] to identify multiple loci that simultaneously affect disease susceptibility. Although the MDR method has been widely u...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20416

    authors: Namkung J,Elston RC,Yang JM,Park T

    update_date:2009-11-01 00:00:00

  • Sib-pair linkage tests for disease susceptibility loci: common tests vs. the asymptotically most powerful test.

    abstract::Several statistical tests for linkage between a disease susceptibility locus and a marker locus for sib-pair data are examined analytically. Two common statistics, a test based on the mean number of marker alleles shared identical by descent by sib-pairs, and a test based on the proportion of sib-pairs sharing exactly...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070506

    authors: Schaid DJ,Nick TG

    update_date:1990-01-01 00:00:00

  • Testing for association in SLE families.

    abstract::Systemic lupus erythematosus (SLE) is a complex disease which is partly determined by genetic factors which influence susceptibility to the disease phenotype. In this association study we try to define the high risk haplotypes which are responsible for this disease, together with other environmental factors. In many o...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370080607

    authors: Seuchter SA,Knapp M,Hartung K,Coldewey R,Kalden JR,Lakomek HJ,Peter HH,Deicher H,Baur MP

    update_date:1991-01-01 00:00:00

  • Gene-dropping vs. empirical variance estimation for allele-sharing linkage statistics.

    abstract::In this study, we compare the statistical properties of a number of methods for estimating P-values for allele-sharing statistics in non-parametric linkage analysis. Some of the methods are based on the normality assumption, using different variance estimation methods, and others use simulation (gene-dropping) to find...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20177

    authors: Jung J,Weeks DE,Feingold E

    update_date:2006-12-01 00:00:00

  • Evaluation of path analysis through computer simulation: effect of incorrectly assuming independent distribution of familial correlations.

    abstract::Path analysis of family data has been widely applied to resolve genetic and environmental patterns of familial resemblance. A prevalent statistical approach in path analysis has been, first, to estimate the familial correlations and, second, by assuming these estimates to be independently distributed, define a likelih...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370010305

    authors: McGue M,Wette R,Rao DC

    update_date:1984-01-01 00:00:00

  • Epidemiologic analysis of gene-environment interaction in twins.

    abstract::Our aim was to develop a simple method for testing gene-environment interaction in twin data ascertained through affected twins (probands), with known exposure status of both cotwins. To this end we derived formulae for two epidemiologic measures, as a function of prevalence of an exposure and genotype, and disease ri...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370110108

    authors: Ottman R

    update_date:1994-01-01 00:00:00

  • Use of variable marker density, principal components, and neural networks in the dissection of disease etiology.

    abstract::Several approaches were taken to identify the loci contributing to the quantitative and qualitative phenotypes in the Genetic Analysis Workshop 12 simulated data set. To identify possible quantitative trait loci (QTL), the quantitative traits were analyzed using SOLAR. The four replicates identified as the "best repli...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s732

    authors: Pankratz N,Kirkwood SC,Flury L,Koller DL,Foroud T

    update_date:2001-01-01 00:00:00

  • Mapping alcoholism genes using linkage/linkage disequilibrium analysis.

    abstract::Using a recently developed semiparametric method for combined linkage/linkage-disequilibrium analysis, we analyzed the Collaborative Study on the Genetics of Alcoholism data subset developed for Genetic Analysis Workshop 11 (GAW11). This semiparametric approach estimates recombination fractions for linkage, marker log...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170708

    authors: Aragaki C,Quiaoit F,Hsu L,Zhao LP

    update_date:1999-01-01 00:00:00

  • Genetic epidemiology of autosomal recessive spastic ataxia of Charlevoix-Saguenay in northeastern Quebec.

    abstract::Autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS) is a disorder that has an elevated frequency in Saguenay-Lac-St-Jean (SLSJ) and Charlevoix, two geographically isolated regions in the past of northeastern Quebec. The incidence at birth and the carrier rate in SLSJ were estimated at 1/1,932 liveborn i...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100103

    authors: De Braekeleer M,Giasson F,Mathieu J,Roy M,Bouchard JP,Morgan K

    update_date:1993-01-01 00:00:00

  • Phenotype validation in electronic health records based genetic association studies.

    abstract::The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Despite that EHR-derived phenotype data are subjected to misclassification, it has been shown useful for discovering susceptible genes, particularly in th...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22080

    authors: Wang L,Damrauer SM,Zhang H,Zhang AX,Xiao R,Moore JH,Chen J

    update_date:2017-12-01 00:00:00