Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale.

Abstract:

Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard-to-interpret results. Using multinomial regression ignores trait value hierarchy and potentially loses power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered, multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for Biobank scale data. Our method is available as a Julia package OrdinalGWAS.jl. Application to a COPDGene study confirms previously found signals based on binary case-control status, but with more significance. Additionally, we demonstrate the capability of our package to run on UK Biobank data by analyzing hypertension as an ordinal trait.
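As an illustration of what an ordered multinomial model can look like, the sketch below uses the proportional odds (cumulative logit) form, a standard model for ordinal outcomes. The abstract does not state the exact parameterization the authors use, so the form P(Y <= j | x) = logistic(theta_j - beta * x), the function names `category_probabilities` and `log_likelihood`, and the single-SNP setup are all assumptions made for illustration. This is not the OrdinalGWAS.jl API.

```python
import math


def category_probabilities(thresholds, beta, x):
    """Return [P(Y = 1 | x), ..., P(Y = J | x)] under a proportional odds model.

    Assumed form (not taken from the paper):
        P(Y <= j | x) = 1 / (1 + exp(-(theta_j - beta * x)))
    with J - 1 strictly increasing thresholds theta_1 < ... < theta_{J-1}.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities for categories 1..J-1, then 1.0 for category J.
    cumulative = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds]
    cumulative.append(1.0)
    # Category probabilities are successive differences of the cumulative ones.
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def log_likelihood(thresholds, beta, genotypes, phenotypes):
    """Log-likelihood of ordinal phenotypes (coded 1..J) given genotype dosages."""
    total = 0.0
    for g, y in zip(genotypes, phenotypes):
        total += math.log(category_probabilities(thresholds, beta, g)[y - 1])
    return total
```

With a positive `beta`, a higher genotype dosage shifts probability toward the higher categories, which is how a single effect parameter can respect the ordering of the trait without choosing a dichotomization cutoff.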

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

German CA,Sinsheimer JS,Klimentidis YC,Zhou H,Zhou JJ

doi

10.1002/gepi.22276

subject

Has Abstract

pub_date

2020-04-01 00:00:00

pages

248-260

issue

3

eissn

1098-2272

issn

0741-0395

journal_volume

44

pub_type

Journal Article
  • Effect of physical activity on lipid levels in a population-based sample of men with and without the Arg192 variant of the human paraoxonase gene.

    abstract::The prevalence of cardiovascular risk factors in Gerona, Spain, is high for the low myocardial infarction incidence and mortality rates in the province. Physical activity is a protective factor against coronary heart disease. We investigated whether the genetic variants Q and R of the paraoxonase Gln-Arg 192 polymorph...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(200003)18:3<276::AID-GEPI6

    authors: Sentí M,Aubó C,Elosua R,Sala J,Tomás M,Marrugat J

    updated:2000-03-01 00:00:00

  • Presidential address: Six open questions to genetic epidemiologists.

    abstract::Given the rapid pace with which genomics and other -omics disciplines are evolving, it is sometimes necessary to shift down a gear to consider more general scientific questions. In this line, in my presidential address I formulate six questions for genetic epidemiologists to ponder on. These cover the areas of reprodu...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22191

    authors: König IR

    updated:2019-04-01 00:00:00

  • Case-only gene-environment interaction studies: when does association imply mechanistic interaction?

    abstract::Case-only studies are often used to identify interactions between a genetic factor and an environmental factor under the assumption both factors are independent in the population. However, interpreting a statistical association between the genetic and the environmental factors among the cases, as evidence of a mechani...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20484

    authors: VanderWeele TJ,Hernández-Díaz S,Hernán MA

    updated:2010-05-01 00:00:00

  • Scope and strategies of genetic epidemiology: analysis of articles published in Genetic Epidemiology, 1984-1991.

    abstract::Genetic epidemiology is a relatively new discipline that seeks to unravel the role of genetic factors and their interactions with environmental factors in the etiology of diseases, using population and family study approaches. To characterize the overall direction and emphasis of research strategies used in this field...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100505

    authors: Khoury MJ,Beaty TH,Cohen BH

    updated:1993-01-01 00:00:00

  • A new association test based on Chi-square partition for case-control GWA studies.

    abstract::In case-control genetic association studies, the robust procedure, Pearson's Chi-square test, is commonly used for testing association between disease status and genetic markers. However, this test does not take the possible trend of relative risks, which are due to genotype, into account. On the contrary, although Co...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20615

    authors: Chen Z

    updated:2011-11-01 00:00:00

  • Genetic epidemiology of autosomal recessive spastic ataxia of Charlevoix-Saguenay in northeastern Quebec.

    abstract::Autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS) is a disorder that has an elevated frequency in Saguenay-Lac-St-Jean (SLSJ) and Charlevoix, two geographically isolated regions in the past of northeastern Quebec. The incidence at birth and the carrier rate in SLSJ were estimated at 1/1,932 liveborn i...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100103

    authors: De Braekeleer M,Giasson F,Mathieu J,Roy M,Bouchard JP,Morgan K

    updated:1993-01-01 00:00:00

  • Genetic and environmental causes of variation in renal tubular handling of sodium and potassium: a twin study.

    abstract::We have conducted a study of renal sodium and potassium reabsorption in 205 pairs of twins on freely chosen diets; 89 of the subjects were studied on more than one occasion. Renal tubular sodium and potassium handling, as measured by the fractional excretions FENa and FEK, show repeatable differences between individua...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370020103

    authors: Whitfield JB,Martin NG

    updated:1985-01-01 00:00:00

  • Allelic association patterns for a dense SNP map.

    abstract::A dense set of 5,000 SNPs on a 10-Mb region of human chromosome 20 has been typed on samples of African Americans, East Asians, and United Kingdom Caucasians. There are departures from Hardy-Weinberg equilibrium beyond the level at which markers are often discarded because of possible genotyping errors. The observatio...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20038

    authors: Weir BS,Hill WG,Cardon LR,SNP Consortium.

    updated:2004-12-01 00:00:00

  • Efficient computation of patterned covariance matrix mixed models in quantitative segregation analysis.

    abstract::The use of patterned covariance matrices in forming pedigree-based mixed models for quantitative traits is discussed. It is suggested that patterned covariance matrix models provide intuitive, theoretically appealing, and flexible genetic modeling devices for pedigree data. It is suggested further that the very great ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370080104

    authors: Schork N

    updated:1991-01-01 00:00:00

  • Investigation of a candidate gene, environment, and G x E interaction using case-control and case-parent study designs.

    abstract::We investigated the independent contributions of a candidate gene and an environmental factor, and the presence of gene x environment (G x E) interaction, in the etiology of a disease in the Genetic Analysis Workshop (GAW) 12 problem 2 simulated data using a two-stage approach utilizing both case-control and case-pare...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s843

    authors: Norris JM,Selinger-Leneman H,Génin E

    updated:2001-01-01 00:00:00

  • Lifestyle and blood pressure levels in male twins in Utah.

    abstract::Healthy male monozygotic (MZ) and dizygotic (DZ) twin pairs (MZ pairs = 77; DZ pairs = 88) were studied to assess the effect of dietary intake, physical activity, physical fitness, body mass index (BMI), sum of the triceps and subscapular skinfold measurements, alcohol and caffeine consumption, and smoking patterns on...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370050409

    authors: Slattery ML,Bishop DT,French TK,Hunt SC,Meikle AW,Williams RR

    updated:1988-01-01 00:00:00

  • Maximum-likelihood estimation of haplotype frequencies in nuclear families.

    abstract::The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over the last years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction te...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10323

    authors: Becker T,Knapp M

    updated:2004-07-01 00:00:00

  • Population-based family study designs: an interdisciplinary research framework for genetic epidemiology.

    abstract::Most complex traits such as cancer and coronary heart diseases are attributed either to heritable factors or to environmental factors or to both. Dissecting the genetic and environmental etiology of complex traits thus requires an interdisciplinary research strategy. Genetic studies generally involve families and inve...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Review

    doi:10.1002/(SICI)1098-2272(1997)14:4<365::AID-GEPI3>3

    authors: Zhao LP,Hsu L,Davidov O,Potter J,Elston RC,Prentice RL

    updated:1997-01-01 00:00:00

  • Hierarchical Bayesian model for rare variant association analysis integrating genotype uncertainty in human sequence data.

    abstract::Next-generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21871

    authors: He L,Pitkäniemi J,Sarin AP,Salomaa V,Sillanpää MJ,Ripatti S

    updated:2015-02-01 00:00:00

  • Rank-based robust tests for quantitative-trait genetic association studies.

    abstract::Standard linear regression is commonly used for genetic association studies of quantitative traits. This approach may not be appropriate if the trait, on its original or transformed scales, does not follow a normal distribution. A rank-based nonparametric approach that does not rely on any distributional assumptions c...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21723

    authors: Li Q,Li Z,Zheng G,Gao G,Yu K

    updated:2013-05-01 00:00:00

  • An ensemble learning approach jointly modeling main and interaction effects in genetic association studies.

    abstract::Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, the methods that can account for gene-gene interactions to search for a set of marker loci in different genes or across genome and to analyze these...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20304

    authors: Zhang Z,Zhang S,Wong MY,Wareham NJ,Sha Q

    updated:2008-05-01 00:00:00

  • A flexible and parallelizable approach to genome-wide polygenic risk scores.

    abstract::The heritability of most complex traits is driven by variants throughout the genome. Consequently, polygenic risk scores, which combine information on multiple variants genome-wide, have demonstrated improved accuracy in genetic risk prediction. We present a new two-step approach to constructing genome-wide polygenic ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22245

    authors: Newcombe PJ,Nelson CP,Samani NJ,Dudbridge F

    updated:2019-10-01 00:00:00

  • Autoimmune thyroid disease in type I diabetic families.

    abstract::The prevalence rate for autoimmune thyroid disease (ATD) is about 30 times higher in the type I diabetic (IDDM) families that were ascertained for Genetic Analysis Workshop 5 (GAW5) than in the general population. Two approaches were used to study the clustering of ATD and IDDM in these families: 1) HLA haplotype shar...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060126

    authors: Payami H,Joe S,Thomson G

    updated:1989-01-01 00:00:00

  • Linkage analysis in alcohol dependence.

    abstract::Alcohol dependence often is a familial disorder and has a genetic component. Research in causative factors of alcoholism is coordinated by a multi-center program, COGA [The Collaborative Study on the Genetics of Alcoholism, Begleiter et al., 1995]. We analyzed a subset of the COGA family sample, 84 pedigrees of Caucas...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170768

    authors: Windemuth C,Hahn A,Strauch K,Baur MP,Wienker TF

    updated:1999-01-01 00:00:00

  • Genetic prediction in the Genetic Analysis Workshop 18 sequencing data.

    abstract::High-throughput sequencing data can be used to predict phenotypes from genotypes, and this corresponds to establishing a prognostic model. In extended pedigrees the relatedness of subjects provides additional information so that genetic values, fixed or random genetic components, and heritability can be estimated. At ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21826

    authors: Ziegler A,Bohossian N,Diego VP,Yao C

    updated:2014-09-01 00:00:00

  • Genetic association with multiple traits in the presence of population stratification.

    abstract::Testing association between a genetic marker and multiple-dependent traits is a challenging task when both binary and quantitative traits are involved. The inverted regression model is a convenient method, in which the traits are treated as predictors although the genetic marker is an ordinal response. It is known tha...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21738

    authors: Yan T,Li Q,Li Y,Li Z,Zheng G

    updated:2013-09-01 00:00:00

  • Multivariate genetic analysis of apo AI concentration and HDL subfractions: evidence for major locus pleiotropy.

    abstract::A major locus influencing apolipoprotein AI (apo AI) serum levels was detected using data from the Donner Laboratory Family Study. This locus accounts for 46% of the phenotypic variability in apo AI levels. Multivariate segregation analysis revealed that this major locus also has significant pleiotropic effects on the...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100648

    authors: Blangero J,Williams-Blangero S,Mahaney MC

    updated:1993-01-01 00:00:00

  • Risk factors for atherosclerosis in twins.

    abstract::We performed multivariate genetic analyses of cardiovascular risk factors from two sets of data on US and Australian female twins. Similar models for body mass index (BMI), serum low density (LDL) and high density (HDL) lipoproteins, including age as a covariate, were fitted successfully to both groups. These suggeste...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100638

    authors: Duffy DL,O'Connell DL,Heller RF,Martin NG

    updated:1993-01-01 00:00:00

  • Major genetic effects on airway-parenchymal dysanapsis of the lung: the Humboldt family study.

    abstract::We examined familial resemblance and performed segregation analysis for the maximal expiratory flow rate at 50% of vital capacity (Vmax50) and the ratio of Vmax50 to forced vital capacity (FVC), based on data from 309 nuclear families with 1,045 individuals in the town of Humboldt, Saskatchewan, in 1993. Vmax50 is con...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:1<95::AID-GEPI8>3.

    authors: Chen Y,Dosman JA,Rennie DC,Lockinger LA

    updated:1999-01-01 00:00:00

  • Linkage analysis of candidate obesity genes among the Mexican-American population of Starr County, Texas.

    abstract::Recent advances in the molecular basis of body fat regulation have identified several genes in which genetic variation may influence obesity and related measures in human populations. Genes that have been shown to have a regulatory function in the control of body fat utilization, eating behavior, and/or metabolic rate...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:4<397::AID-GEPI6>3

    authors: Bray MS,Boerwinkle E,Hanis CL

    updated:1999-01-01 00:00:00

  • Comparison of the QTDT analysis for IgE in the CSGA data set.

    abstract::Over the past few years at least 13 transmission/disequilibrium test (TDT)-based tests have been developed for quantitative (Q) traits for the assessment of association or linkage in the presence of the other. A total of six of these QTDT methods were used to analyze log10IgE in the Collaborative Study on the Genetics...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s312

    authors: Page GP,Wilcox MA,Occhiuto J,Adak S,Neuberg D,Bajorunaite R,George V

    updated:2001-01-01 00:00:00

  • Identification of gene-gene interactions in the presence of missing data using the multifactor dimensionality reduction method.

    abstract::Gene-gene interaction is believed to play an important role in understanding complex traits. Multifactor dimensionality reduction (MDR) was proposed by Ritchie et al. [2001. Am J Hum Genet 69:138-147] to identify multiple loci that simultaneously affect disease susceptibility. Although the MDR method has been widely u...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20416

    authors: Namkung J,Elston RC,Yang JM,Park T

    updated:2009-11-01 00:00:00

  • Gene-dropping vs. empirical variance estimation for allele-sharing linkage statistics.

    abstract::In this study, we compare the statistical properties of a number of methods for estimating P-values for allele-sharing statistics in non-parametric linkage analysis. Some of the methods are based on the normality assumption, using different variance estimation methods, and others use simulation (gene-dropping) to find...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20177

    authors: Jung J,Weeks DE,Feingold E

    updated:2006-12-01 00:00:00

  • Genetic epidemiology with a capital "E".

    abstract::Three characteristics of genetic epidemiology that distinguish it from its parent disciplines are a focus on population-based research, a focus on the joint effects of genes and the environment, and the incorporation of the underlying biology of the disease into its conceptual models. These principles are illustrated ...

    journal_title:Genetic epidemiology

    pub_type:

    doi:10.1002/1098-2272(200012)19:4<289::AID-GEPI2>3.0.C

    authors: Thomas DC

    updated:2000-12-01 00:00:00

  • Monte Carlo analysis on a large pedigree.

    abstract::Monte Carlo methods for linkage and segregation analysis are applied to the HGAR1 pedigree. To address these data, the methods are extended in several ways. The results are compared with those provided by PAP. ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100658

    authors: Thompson EA,Lin S,Olshen AB,Wijsman EM

    updated:1993-01-01 00:00:00