Abstract:
Penalized regression methods offer an attractive alternative to single-marker testing in genetic association analysis. They shrink toward zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we hope are truly pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen to select a good predictive model (via methods such as cross-validation, which is computationally expensive), through a maximum likelihood-based model selection criterion (such as the BIC), or to select a model that controls the type I error, as done here. We investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, to standard single-locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each of the methods. Results show that penalized methods outperform single-marker analysis, with the main difference being that penalized methods allow the simultaneous inclusion of a number of markers, and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for.
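The shrinkage behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or the hyperlasso software: it shows only the Lasso penalty, fitted with a generic proximal-gradient (ISTA) loop, on toy data with independent markers (no linkage disequilibrium). The helper names (`simulate`, `fit_lasso_logistic`), the sample size, the allele frequency, the effect size and the penalty values are all hypothetical choices made for illustration. The penalty is fixed by hand here rather than chosen to control type I error.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Lasso proximal operator: move z toward zero by t, stopping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def simulate(n, p, causal, effect, maf=0.3, seed=1):
    """Toy case-control data: p independent SNPs coded 0/1/2, one causal marker.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [sum(rng.random() < maf for _ in range(2)) for _ in range(p)]
        prob = sigmoid(-1.0 + effect * row[causal])
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=300):
    """Minimise mean negative log-likelihood + lam * sum(|beta_j|) by ISTA.

    The intercept is left unpenalized; only marker coefficients are shrunk.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (fitted probability minus observed status) at the current fit.
        resid = [
            sigmoid(beta0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step, then soft-thresholding of the marker coefficients.
        beta0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta0, beta


genotypes, status = simulate(n=200, p=6, causal=0, effect=1.0)
for lam in (0.01, 0.05, 10.0):
    _, beta = fit_lasso_logistic(genotypes, status, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(f"lambda={lam}: selected markers {selected}")
```

Raising `lam` removes markers from the model, and a sufficiently large value sets every marker coefficient to exactly zero. The other penalties named in the abstract (ridge, elastic net, MCP, normal-exponential-γ) would replace the soft-thresholding step with a different shrinkage rule.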
journal_name: Genet Epidemiol
journal_title: Genetic epidemiology
authors: Ayers KL, Cordell HJ
doi: 10.1002/gepi.20543
subject: Has Abstract
pub_date: 2010-12-01 00:00:00
pages: 879-91
issue: 8
eissn: 0741-0395
issn: 1098-2272
journal_volume: 34
pub_type: Journal Article
abstract::Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sam...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.22083
update_date:2017-12-01 00:00:00
abstract::Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP assoc...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.21724
update_date:2013-05-01 00:00:00
abstract::Site-specific familial aggregation and evidence supporting Mendelian codominant inheritance have been shown in lung cancer. In characterizing lung cancer families, a number of other cancers have been observed. The current study evaluates whether first-degree relatives of early onset lung cancer cases are at increased ...
journal_title:Genetic epidemiology
pub_type: Clinical Trial, Journal Article
doi:10.1002/(SICI)1098-2272(199911)17:4<274::AID-GEPI3
update_date:1999-11-01 00:00:00
abstract::The following Gm and Km immunoglobulin allotypes were determined on the Genetic Analysis Workshop 5 insulin-dependent diabetes mellitus (GAW5 IDDM) families: G1m (1,2,3,17), G2m (23), G3m (5,10,11,13,14,21,28) and Km (1,3). Since the allotype G2m (23) has been rarely studied, due to paucity of typing reagents, it was ...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370060108
update_date:1989-01-01 00:00:00
abstract::This study is an investigation of the relationship between apolipoprotein E (apoE) phenotype, arterial disease, and mortality in a group of women (n = 1,751) aged 65 years and older enrolled in the Study of Osteoporotic Fractures. Crude mortality rates were highest among women with the 4-3 and 4-4 phenotypes but age-a...
journal_title:Genetic epidemiology
pub_type: Clinical Trial, Journal Article, Multicenter Study
doi:10.1002/(SICI)1098-2272(1997)14:2<147::AID-GEPI4>3
update_date:1997-01-01 00:00:00
abstract::Twin pairs are sometimes included in studies because at least one of them is a proband, and conventionally the analysis of the data is based on the conditional distribution of the co-twin given the proband. In the case of more than one proband in each pair, an often used "ad hoc" method of analysis is to allow each tw...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.10253
update_date:2003-11-01 00:00:00
abstract::GAW10 Problem 2 involves a simulated common disease defined by imposing a threshold, T, on a quantitative trait, Q1. Every individual with a value of Q1 ≥ T (where T = 40) is defined as affected. Also thought to be associated with the disease as intervening variables are four other quantitative traits (Q2, Q3, Q4...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/(SICI)1098-2272(1997)14:6<737::AID-GEPI29>
update_date:1997-01-01 00:00:00
abstract::Increased adiposity has repeatedly been identified as a major risk factor for a variety of chronic diseases. However, the question still remains whether the amount of adipose tissue itself is genetically mediated. To address this question, a segregation analysis, using maximum likelihood techniques as implemented in t...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370120505
update_date:1995-01-01 00:00:00
abstract::Genes with imprinting (parent-of-origin) effects are expressed differently when inherited from the mother or from the father. Some genes for development and behavior in mammals are known to be imprinted. We developed parametric linkage analysis that accounts for imprinting effects for continuous traits, implementing it in ...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.20321
update_date:2008-07-01 00:00:00
abstract::The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in th...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.22080
update_date:2017-12-01 00:00:00
abstract::Genetic studies continue to generate a growing volume and variety of data that can be used to examine genetic effects. Often the effect of a genetic variant varies by nongenetic measures, which is traditionally defined as gene-environment interaction (G×E). If the G×E term is neglected, estimates of the main effects c...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.22154
update_date:2018-12-01 00:00:00
abstract::We describe an extension to the TDT (transmission/disequilibrium test) which allows for more than two marker alleles and for covariates measured on the parent or offspring. We also describe a systematic genomic search where the mod score (maximized lod score) is computed for each marker under constraints on the popula...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370120623
update_date:1995-01-01 00:00:00
abstract::In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant id...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.21844
update_date:2014-11-01 00:00:00
abstract::A computer-simulation method is presented for determining and correcting for the effect of maximizing the lod score over disease definitions, penetrance values, and perhaps other model parameters. The method consists of simulating the complete analysis using marker genotypes randomly generated under the assumption of ...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370070402
update_date:1990-01-01 00:00:00
abstract::The asymptotic distribution of [MOD] scores under the null hypothesis of no linkage is only known for affected sib pairs and other types of affected relative pairs. We have extended the GENEHUNTER-MODSCORE program to allow for simulations under the null hypothesis of no linkage to determine the empirical significance ...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.20264
update_date:2008-01-01 00:00:00
abstract::Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combin...
journal_title:Genetic epidemiology
pub_type:
doi:10.1002/gepi.20465
update_date:2009-01-01 00:00:00
abstract::We address the analytical problem of evaluating the evidence for linkage at a test locus while taking into account the effect of a known linked disease locus. The method we propose is a multimarker regression approach that models the identity-by-descent states for affected sib-pairs at a series of linked markers in te...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.20137
update_date:2006-04-01 00:00:00
abstract::An extension of the traditional regression of offspring on midparent (ROMP) method was used to estimate the heritability of the trait, test for marker association, and estimate the heritability attributable to a marker locus. The fifty replicates of the Genetic Analysis Workshop (GAW) 12 simulated general population d...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.2001.21.s1.s794
update_date:2001-01-01 00:00:00
abstract::Apolipoprotein A-IV (APO A-IV) is a major protein component of mesenteric lymph chylomicrons and very-low-density lipoproteins. It is found in plasma predominantly unassociated with major lipoprotein fractions and in high density lipoproteins. APO A-IV exhibits structural heterogeneity owing to two codominant alleles,...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370060404
update_date:1989-01-01 00:00:00
abstract::The univariate analysis of categorical twin data can be performed using either structural equation modeling (SEM) or logistic regression. This paper presents a comparison between these two methods using a simulation study. Dichotomous and ordinal (three category) twin data are simulated under two different sample size...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/(SICI)1098-2272(1996)13:1<79::AID-GEPI7>3.
update_date:1996-01-01 00:00:00
abstract::Due to the drop in sequencing cost, the number of sequenced genomes is increasing rapidly. To improve power of rare-variant tests, these sequenced samples could be used as external control samples in addition to control samples from the study itself. However, when using external controls, possible batch effects due to...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.22057
update_date:2017-11-01 00:00:00
abstract::Our aim was to develop a simple method for testing gene-environment interaction in twin data ascertained through affected twins (probands), with known exposure status of both cotwins. To this end we derived formulae for two epidemiologic measures, as a function of prevalence of an exposure and genotype, and disease ri...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370110108
update_date:1994-01-01 00:00:00
abstract::We used a case-control design to scan the genome for any associations between genetic markers and disease susceptibility loci using the first two replicates of the Mycenaean population from the GAW11 (Problem 2) data. Using a case-control approach, we constructed a series of 2-by-3 tables for each allele of every mark...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.13701707128
update_date:1999-01-01 00:00:00
abstract::Meta-analyses of genetic association studies are usually performed using a single polymorphism at a time, even though in many cases the individual studies report results from partially overlapping sets of polymorphisms. We present here a multipoint (or multilocus) method for multivariate meta-analysis of published pop...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.20531
update_date:2010-11-01 00:00:00
abstract::Recent studies have found an association between presence of apolipoprotein E (APOE) epsilon 4 allele and Alzheimer's disease (AD). The present study compared the cumulative risk of primary progressive dementia (PPD) in relatives of AD probands carrying at least one copy of the epsilon 4 allele with the relatives of A...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/(SICI)1098-2272(1996)13:3<285::AID-GEPI5>3
update_date:1996-01-01 00:00:00
abstract::We propose a new approach to detect gene × gene joint action in genome-wide association studies (GWASs) for case-control designs. This approach offers an exhaustive search for all two-way joint action (including, as a special case, single gene action) that is computationally feasible at the genome-wide level and has r...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.21779
update_date:2014-01-01 00:00:00
abstract::A robust approach for estimating standard errors of variance components by using quantitative phenotypes from families ascertained through a proband with an extreme phenotypic value is presented. Estimators that use the multivariate normal distribution as a "working likelihood" are obtained by computing conditional ln...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370040305
update_date:1987-01-01 00:00:00
abstract::Polygenic risk scores (PRSs) are a method to summarize the additive trait variance captured by a set of SNPs, and can increase the power of set-based analyses by leveraging public genome-wide association study (GWAS) datasets. PRS aims to assess the genetic liability to some phenotype on the basis of polygenic risk fo...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.22117
update_date:2018-06-01 00:00:00
abstract::For many clinical studies in cancer, germline DNA is prospectively collected for the purpose of discovering or validating single-nucleotide polymorphisms (SNPs) associated with clinical outcomes. The primary clinical endpoints for many of these studies are time-to-event outcomes such as time of death or disease progres...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.21645
update_date:2012-09-01 00:00:00
abstract::We examined the inheritance of juvenile myoclonic epilepsy (JME). We looked at both the trait of "epilepsy" and the trait of "epilepsy-plus-EEG abnormalities," since EEG abnormalities are frequently found in the clinically unaffected sibs of JME patients. We tested several modes of inheritance including the fully pene...
journal_title:Genetic epidemiology
pub_type: Journal Article
doi:10.1002/gepi.1370050204
update_date:1988-01-01 00:00:00