SNP selection in genome-wide and candidate gene studies via penalized logistic regression.

Abstract:

Penalized regression methods offer an attractive alternative to single marker testing in genetic association analysis. They shrink to zero the coefficients of markers that have little apparent effect on the trait of interest, leaving a parsimonious subset of what we hope are the truly pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen to select a good predictive model (via methods such as cross validation, which is computationally expensive), to optimize a likelihood-based model selection criterion (such as the BIC), or to select a model that controls the type I error, as done here. We investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, to standard single locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each method. The results show that penalized methods outperform single marker analysis. The main difference is that penalized methods allow the simultaneous inclusion of a number of markers and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for.
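To make the shrinkage idea concrete, the following is a minimal sketch of Lasso-penalized logistic regression on simulated SNP data. It is not the authors' code or the hyperlasso software: the proximal gradient (ISTA) solver, the function names, the simulation settings (independent SNPs, one causal marker, a fixed penalty `lam`) and the step size are all assumptions made for illustration. In particular, the penalty here is picked by hand rather than tuned to control type I error as in the paper.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; any |z| <= t becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=300):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    The intercept b0 is left unpenalised. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed case/control status.
        resid = [
            sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0
        # Gradient step followed by soft-thresholding (the L1 proximal map).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def simulate(n=200, p=8, maf=0.3, log_odds=1.0, seed=0):
    """Hypothetical data: p independent SNPs coded 0/1/2; only SNP 0 is causal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [sum(rng.random() < maf for _ in range(2)) for _ in range(p)]
        prob = sigmoid(-1.0 + log_odds * row[0])
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    for lam in (0.01, 0.05, 0.2):
        _, beta = fit_lasso_logistic(X, y, lam)
        selected = [j for j, b in enumerate(beta) if b != 0.0]
        print(f"lam={lam}: selected SNP indices {selected}")
```

Raising `lam` drives more coefficients to exactly zero, which is the selection behaviour the abstract describes. Correlated markers (linkage disequilibrium) are not simulated here, so the sketch says nothing about how the penalties compared in the paper handle them.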

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Ayers KL,Cordell HJ

doi

10.1002/gepi.20543

subject

Has Abstract

pub_date

2010-12-01 00:00:00

pages

879-91

issue

8

eissn

1098-2272

issn

0741-0395

journal_volume

34

pub_type

Journal Article
  • Multiethnic polygenic risk scores improve risk prediction in diverse populations.

    abstract::Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sam...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22083

    authors: Márquez-Luna C,Loh PR,South Asian Type 2 Diabetes (SAT2D) Consortium.,SIGMA Type 2 Diabetes Consortium.,Price AL

    update_date:2017-12-01 00:00:00

  • The impact of improved microarray coverage and larger sample sizes on future genome-wide association studies.

    abstract::Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP assoc...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21724

    authors: Lindquist KJ,Jorgenson E,Hoffmann TJ,Witte JS

    update_date:2013-05-01 00:00:00

  • Familial aggregation of breast cancer with early onset lung cancer.

    abstract::Site-specific familial aggregation and evidence supporting Mendelian codominant inheritance have been shown in lung cancer. In characterizing lung cancer families, a number of other cancers have been observed. The current study evaluates whether first-degree relatives of early onset lung cancer cases are at increased ...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial,Journal Article

    doi:10.1002/(SICI)1098-2272(199911)17:4<274::AID-GEPI3

    authors: Schwartz AG,Siegfried JM,Weiss L

    update_date:1999-11-01 00:00:00

  • Immunoglobulin allotyping (Gm, Km) of GAW5 families.

    abstract::The following Gm and Km immunoglobulin allotypes were determined on the Genetic Analysis Workshop 5 insulin-dependent diabetes mellitus (GAW5 IDDM) families: G1m (1,2,3,17), G2m (23), G3m (5,10,11,13,14,21,28) and Km (1,3). Since the allotype G2m (23) has been rarely studied, due to paucity of typing reagents, it was ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060108

    authors: Field LL,Dugoujon JM

    update_date:1989-01-01 00:00:00

  • Apolipoprotein E phenotype, arterial disease, and mortality among older women: the study of osteoporotic fractures.

    abstract::This study is an investigation of the relationship between apolipoprotein E (apoE) phenotype, arterial disease, and mortality in a group of women (n = 1,751) aged 65 years and older enrolled in the Study of Osteoporotic Fractures. Crude mortality rates were highest among women with the 4-3 and 4-4 phenotypes but age-a...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial,Journal Article,Multicenter Study

    doi:10.1002/(SICI)1098-2272(1997)14:2<147::AID-GEPI4>3

    authors: Vogt MT,Cauley JA,Kuller LH

    update_date:1997-01-01 00:00:00

  • Analysis of twin data ascertained through probands: the double-entry approach.

    abstract::Twin pairs are sometimes included in studies because at least one of them is a proband, and conventionally the analysis of the data is based on the conditional distribution of the co twin given the proband. In the case of more than one proband in each pair, an often used "ad hoc" method of analysis is to allow each tw...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10253

    authors: Hindsberger C,Bryld LE

    update_date:2003-11-01 00:00:00

  • GAW10: simulated family data for a common oligogenic disease with quantitative risk factors.

    abstract::GAW10 Problem 2 involves a simulated common disease defined by imposing a threshold, T, on a quantitative trait, Q1. Every individual with a value of Q1 ≥ T (where T = 40) is defined as affected. Also thought to be associated with the disease as intervening variables are four other quantitative traits (Q2, Q3, Q4...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:6<737::AID-GEPI29>

    authors: MacCluer JW,Blangero J,Dyer TD,Speer MC

    update_date:1997-01-01 00:00:00

  • Major gene with sex-specific effects influences fat mass in Mexican Americans.

    abstract::Increased adiposity has repeatedly been identified as a major risk factor for a variety of chronic diseases. However, the question still remains whether the amount of adipose tissue itself is genetically mediated. To address this question, a segregation analysis, using maximum likelihood techniques as implemented in t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120505

    authors: Comuzzie AG,Blangero J,Mahaney MC,Mitchell BD,Hixson JE,Samollow PB,Stern MP,MacCluer JW

    update_date:1995-01-01 00:00:00

  • Model-based linkage analysis with imprinting for quantitative traits: ignoring imprinting effects can severely jeopardize detection of linkage.

    abstract::Genes with imprinting (parent-of-origin) effects express differently when inheriting from the mother or from the father. Some genes for development and behavior in mammals are known to be imprinted. We developed parametric linkage analysis that accounts for imprinting effects for continuous traits, implementing it in ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20321

    authors: Sung YJ,Rao DC

    update_date:2008-07-01 00:00:00

  • Phenotype validation in electronic health records based genetic association studies.

    abstract::The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Despite that EHR-derived phenotype data are subjected to misclassification, it has been shown useful for discovering susceptible genes, particularly in th...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22080

    authors: Wang L,Damrauer SM,Zhang H,Zhang AX,Xiao R,Moore JH,Chen J

    update_date:2017-12-01 00:00:00

  • Bias in parameter estimates due to omitting gene-environment interaction terms in case-control studies.

    abstract::Genetic studies are continuing to generate volumes and variety of data that can be used to examine the genetic effects. Often the effect of a genetic variant varies by nongenetic measures, what is traditionally defined as gene-environment interaction (G×E). If the G×E term is neglected, estimates of the main effects c...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22154

    authors: Lobach I

    update_date:2018-12-01 00:00:00

  • TDT with covariates and genomic screens with mod scores: their behavior on simulated data.

    abstract::We describe an extension to the TDT (transmission/disequilibrium test) which allows for more than two marker alleles and for covariates measured on the parent or offspring. We also describe a systematic genomic search where the mod score (maximized lod score) is computed for each marker under constraints on the popula...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120623

    authors: Rice JP,Neuman RJ,Hoshaw SL,Daw EW,Gu C

    update_date:1995-01-01 00:00:00

  • Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees.

    abstract::In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant id...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21844

    authors: Saad M,Wijsman EM

    update_date:2014-11-01 00:00:00

  • Measuring the inflation of the lod score due to its maximization over model parameter values in human linkage analysis.

    abstract::A computer-simulation method is presented for determining and correcting for the effect of maximizing the lod score over disease definitions, penetrance values, and perhaps other model parameters. The method consists of simulating the complete analysis using marker genotypes randomly generated under the assumption of ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070402

    authors: Weeks DE,Lehner T,Squires-Wheeler E,Kaufmann C,Ott J

    update_date:1990-01-01 00:00:00

  • Inferential testing for linkage with GENEHUNTER-MODSCORE: the impact of the pedigree structure on the null distribution of multipoint MOD scores.

    abstract::The asymptotic distribution of [MOD] scores under the null hypothesis of no linkage is only known for affected sib pairs and other types of affected relative pairs. We have extended the GENEHUNTER-MODSCORE program to allow for simulations under the null hypothesis of no linkage to determine the empirical significance ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20264

    authors: Mattheisen M,Dietter J,Knapp M,Baur MP,Strauch K

    update_date:2008-01-01 00:00:00

  • Genome-wide association studies for discrete traits.

    abstract::Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combin...

    journal_title:Genetic epidemiology

    pub_type:

    doi:10.1002/gepi.20465

    authors: Thomas DC

    update_date:2009-01-01 00:00:00

  • A multimarker regression-based test of linkage for affected sib-pairs at two linked loci.

    abstract::We address the analytical problem of evaluating the evidence for linkage at a test locus while taking into account the effect of a known linked disease locus. The method we propose is a multimarker regression approach that models the identity-by-descent states for affected sib-pairs at a series of linked markers in te...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20137

    authors: Barber MJ,Todd JA,Cordell HJ

    update_date:2006-04-01 00:00:00

  • Comparison of variance components, ANOVA and regression of offspring on midparent (ROMP) methods for SNP markers.

    abstract::An extension of the traditional regression of offspring on midparent (ROMP) method was used to estimate the heritability of the trait, test for marker association, and estimate the heritability attributable to a marker locus. The fifty replicates of the Genetic Analysis Workshop (GAW) 12 simulated general population d...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s794

    authors: Pugh EW,Papanicolaou GJ,Justice CM,Roy-Gagnon MH,Sorant AJ,Kingman A,Wilson AF

    update_date:2001-01-01 00:00:00

  • Phenotypic effects of apolipoprotein structural variation on lipid profiles: II. Apolipoprotein A-IV and quantitative lipid measures in the healthy women study.

    abstract::Apolipoprotein A-IV (APO A-IV) is a major protein component of mesenteric lymph chylomicrons and very-low-density lipoproteins. It is found in plasma predominantly unassociated with major lipoprotein fractions and in high density lipoproteins. APO A-IV exhibits structural heterogeneity owing to two codominant alleles,...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060404

    authors: Eichner JE,Kuller LH,Ferrell RE,Kamboh MI

    update_date:1989-01-01 00:00:00

  • Univariate analysis of dichotomous or ordinal data from twin pairs: a simulation study comparing structural equation modeling and logistic regression.

    abstract::The univariate analysis of categorical twin data can be performed using either structural equation modeling (SEM) or logistic regression. This paper presents a comparison between these two methods using a simulation study. Dichotomous and ordinal (three category) twin data are simulated under two different sample size...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1996)13:1<79::AID-GEPI7>3.

    authors: Ramakrishnan V,Meyer JM,Goldberg J,Henderson WG

    update_date:1996-01-01 00:00:00

  • Improving power for rare-variant tests by integrating external controls.

    abstract::Due to the drop in sequencing cost, the number of sequenced genomes is increasing rapidly. To improve power of rare-variant tests, these sequenced samples could be used as external control samples in addition to control samples from the study itself. However, when using external controls, possible batch effects due to...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22057

    authors: Lee S,Kim S,Fuchsberger C

    update_date:2017-11-01 00:00:00

  • Epidemiologic analysis of gene-environment interaction in twins.

    abstract::Our aim was to develop a simple method for testing gene-environment interaction in twin data ascertained through affected twins (probands), with known exposure status of both cotwins. To this end we derived formulae for two epidemiologic measures, as a function of prevalence of an exposure and genotype, and disease ri...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370110108

    authors: Ottman R

    update_date:1994-01-01 00:00:00

  • Using case-control designs for genome-wide screening for associations between genetic markers and disease susceptibility loci.

    abstract::We used a case-control design to scan the genome for any associations between genetic markers and disease susceptibility loci using the first two replicates of the Mycenaean population from the GAW11 (Problem 2) data. Using a case-control approach, we constructed a series of 2-by-3 tables for each allele of every mark...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.13701707128

    authors: Yang Q,Khoury MJ,Atkinson M,Sun F,Cheng R,Flanders WD

    update_date:1999-01-01 00:00:00

  • A multipoint method for meta-analysis of genetic association studies.

    abstract::Meta-analyses of genetic association studies are usually performed using a single polymorphism at a time, even though in many cases the individual studies report results from partially overlapping sets of polymorphisms. We present here a multipoint (or multilocus) method for multivariate meta-analysis of published pop...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20531

    authors: Bagos PG,Liakopoulos TD

    update_date:2010-11-01 00:00:00

  • Apolipoprotein E-epsilon 4 allele and familial risk in Alzheimer's disease.

    abstract::Recent studies have found an association between presence of apolipoprotein E (APOE) epsilon 4 allele and Alzheimer's disease (AD). The present study compared the cumulative risk of primary progressive dementia (PPD) in relatives of AD probands carrying at least one copy of the epsilon 4 allele with the relatives of A...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1996)13:3<285::AID-GEPI5>3

    authors: Li G,Silverman JM,Altstiel LD,Haroutunian V,Perl DP,Purohit D,Birstein S,Lantz M,Mohs RC,Davis KL

    update_date:1996-01-01 00:00:00

  • Efficient strategy for detecting gene × gene joint action and its application in schizophrenia.

    abstract::We propose a new approach to detect gene × gene joint action in genome-wide association studies (GWASs) for case-control designs. This approach offers an exhaustive search for all two-way joint action (including, as a special case, single gene action) that is computationally feasible at the genome-wide level and has r...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21779

    authors: Won S,Kwon MS,Mattheisen M,Park S,Park C,Kihara D,Cichon S,Ophoff R,Nöthen MM,Rietschel M,Baur M,Uitterlinden AG,Hofmann A,GROUP Investigators.,Lange C

    update_date:2014-01-01 00:00:00

  • Robust inference for variance components models in families ascertained through probands: I. Conditioning on proband's phenotype.

    abstract::A robust approach for estimating standard errors of variance components by using quantitative phenotypes from families ascertained through a proband with an extreme phenotypic value is presented. Estimators that use the multivariate normal distribution as a "working likelihood" are obtained by computing conditional ln...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370040305

    authors: Beaty TH,Liang KY

    update_date:1987-01-01 00:00:00

  • POLARIS: Polygenic LD-adjusted risk score approach for set-based analysis of GWAS data.

    abstract::Polygenic risk scores (PRSs) are a method to summarize the additive trait variance captured by a set of SNPs, and can increase the power of set-based analyses by leveraging public genome-wide association study (GWAS) datasets. PRS aims to assess the genetic liability to some phenotype on the basis of polygenic risk fo...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22117

    authors: Baker E,Schmidt KM,Sims R,O'Donovan MC,Williams J,Holmans P,Escott-Price V,Consortium WTG

    update_date:2018-06-01 00:00:00

  • Power and sample size calculations for SNP association studies with censored time-to-event outcomes.

    abstract::For many clinical studies in cancer, germline DNA is prospectively collected for the purpose of discovering or validating single-nucleotide polymorphisms (SNPs) associated with clinical outcomes. The primary clinical endpoint for many of these studies are time-to-event outcomes such as time of death or disease progres...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21645

    authors: Owzar K,Li Z,Cox N,Jung SH

    update_date:2012-09-01 00:00:00

  • Segregation analysis of juvenile myoclonic epilepsy.

    abstract::We examined the inheritance of juvenile myoclonic epilepsy (JME). We looked at both the trait of "epilepsy" and the trait of "epilepsy-plus-EEG abnormalities," since EEG abnormalities are frequently found in the clinically unaffected sibs of JME patients. We tested several modes of inheritance including the fully pene...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370050204

    authors: Greenberg DA,Delgado-Escueta AV,Maldonado HM,Widelitz H

    update_date:1988-01-01 00:00:00