Multiethnic polygenic risk scores improve risk prediction in diverse populations.

Abstract:

Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sample size or training data from the target population in small sample size, but not both. Here, we introduce a multiethnic polygenic risk score that combines training data from European samples and training data from the target population. We applied this approach to predict type 2 diabetes (T2D) in a Latino cohort using both publicly available European summary statistics in large sample size (Neff = 40k) and Latino training data in small sample size (Neff = 8k). We attained a >70% relative improvement in prediction accuracy (from R² = 0.027 to 0.047) compared to methods that use only one source of training data, consistent with large relative improvements in simulations. We observed a systematically lower load of T2D risk alleles in Latino individuals with more European ancestry, which could be explained by polygenic selection in ancestral European and/or Native American populations. We also predicted T2D in a South Asian UK Biobank cohort using European (Neff = 40k) and South Asian (Neff = 16k) training data and attained a >70% relative improvement in prediction accuracy, and an application to predict height in an African UK Biobank cohort using European (N = 113k) and African (N = 2k) training data attained a 30% relative improvement. Our work reduces the gap in polygenic risk prediction accuracy between European and non-European target populations.
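The abstract does not say how the two training sources are combined, so the following is only a minimal sketch under a stated assumption: the multiethnic score is taken to be a linear combination of a European-trained score and a target-population-trained score, with mixing weights fit by least squares. All names (`fit_mixing_weights`, `combine`, `relative_improvement`) and the simulated genotypes and effect sizes are hypothetical and are not taken from the paper. The last line reproduces the relative-improvement arithmetic for the reported R² values (0.027 to 0.047).

```python
import numpy as np

def r2(score, y):
    """Squared Pearson correlation between a score and the phenotype."""
    return float(np.corrcoef(score, y)[0, 1] ** 2)

def fit_mixing_weights(prs_eur, prs_tgt, y):
    """Least-squares (intercept, w_eur, w_tgt) for a linear mix of two scores.

    Assumption: a linear combination is one plausible way to merge the scores.
    """
    X = np.column_stack([np.ones_like(prs_eur), prs_eur, prs_tgt])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def combine(prs_eur, prs_tgt, coef):
    """Apply the fitted mixing weights to the two single-source scores."""
    return coef[0] + coef[1] * prs_eur + coef[2] * prs_tgt

def relative_improvement(r2_before, r2_after):
    """Relative gain in prediction accuracy, e.g. 0.74 means +74%."""
    return (r2_after - r2_before) / r2_before

# Toy simulation (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
n, m = 2000, 200
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # genotypes coded 0/1/2
beta = rng.normal(0.0, 0.05, size=m)                  # true effects in the target
y = G @ beta + rng.normal(0.0, 1.0, size=n)           # phenotype

beta_eur = beta + rng.normal(0.0, 0.03, size=m)  # large sample, imperfect transfer
beta_tgt = beta + rng.normal(0.0, 0.06, size=m)  # small sample, noisier estimates

prs_eur = G @ beta_eur
prs_tgt = G @ beta_tgt
coef = fit_mixing_weights(prs_eur, prs_tgt, y)
prs_multi = combine(prs_eur, prs_tgt, coef)

print("R2 European-only :", r2(prs_eur, y))
print("R2 target-only   :", r2(prs_tgt, y))
print("R2 combined      :", r2(prs_multi, y))
print("Reported gain    :", round(relative_improvement(0.027, 0.047), 2))
```

In this sketch the weights are fit and evaluated on the same individuals, so the combined R² can never be lower than either single-source R²; a real analysis would estimate the weights and report accuracy on separate samples (for example by cross-validation) to avoid overfitting.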

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Márquez-Luna C,Loh PR,South Asian Type 2 Diabetes (SAT2D) Consortium,SIGMA Type 2 Diabetes Consortium,Price AL

doi

10.1002/gepi.22083

subject

Has Abstract

pub_date

2017-12-01 00:00:00

pages

811-823

issue

8

eissn

1098-2272

issn

0741-0395

journal_volume

41

pub_type

Journal Article
  • Genetic association with multiple traits in the presence of population stratification.

    abstract::Testing association between a genetic marker and multiple-dependent traits is a challenging task when both binary and quantitative traits are involved. The inverted regression model is a convenient method, in which the traits are treated as predictors although the genetic marker is an ordinal response. It is known tha...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21738

    authors: Yan T,Li Q,Li Y,Li Z,Zheng G

    update_date:2013-09-01 00:00:00

  • Optimizing the power of genome-wide association studies by using publicly available reference samples to expand the control group.

    abstract::Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousan...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20482

    authors: Zhuang JJ,Zondervan K,Nyberg F,Harbron C,Jawaid A,Cardon LR,Barratt BJ,Morris AP

    update_date:2010-05-01 00:00:00

  • eQuIPS: eQTL Analysis Using Informed Partitioning of SNPs - A Fully Bayesian Approach.

    abstract::We develop a Bayesian multi-SNP Markov chain Monte Carlo approach that allows published functional significance scores to objectively inform single nucleotide polymorphism (SNP) prior effect sizes in expression quantitative trait locus (eQTL) studies. We developed the Normal Gamma prior to allow the inclusion of funct...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21961

    authors: Boggis EM,Milo M,Walters K

    update_date:2016-05-01 00:00:00

  • Lifestyle and blood pressure levels in male twins in Utah.

    abstract::Healthy male monozygotic (MZ) and dizygotic (DZ) twin pairs (MZ pairs = 77; DZ pairs = 88) were studied to assess the effect of dietary intake, physical activity, physical fitness, body mass index (BMI), sum of the triceps and subscapular skinfold measurements, alcohol and caffeine consumption, and smoking patterns on...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370050409

    authors: Slattery ML,Bishop DT,French TK,Hunt SC,Meikle AW,Williams RR

    update_date:1988-01-01 00:00:00

  • Analysis of twin data ascertained through probands: the double-entry approach.

    abstract::Twin pairs are sometimes included in studies because at least one of them is a proband, and conventionally the analysis of the data is based on the conditional distribution of the co-twin given the proband. In the case of more than one proband in each pair, an often used "ad hoc" method of analysis is to allow each tw...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10253

    authors: Hindsberger C,Bryld LE

    update_date:2003-11-01 00:00:00

  • Presidential address: Six open questions to genetic epidemiologists.

    abstract::Given the rapid pace with which genomics and other -omics disciplines are evolving, it is sometimes necessary to shift down a gear to consider more general scientific questions. In this line, in my presidential address I formulate six questions for genetic epidemiologists to ponder on. These cover the areas of reprodu...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22191

    authors: König IR

    update_date:2019-04-01 00:00:00

  • Detecting interactions between gene, site, and environmental variables using GAP.

    abstract::Regressive models that incorporate measured variables and assumed genetic parameters were used to detect interactions between gene, research site, and environmental variables in GAW11 Problem 2. Replicates 1 to 5 were used in the analyses. Significant three-way gene x environment x site interactions were seen for all ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.13701707118

    authors: Shin J,Corey M

    update_date:1999-01-01 00:00:00

  • Pooling data and linkage analysis in the chromosome 5q candidate region for asthma.

    abstract::We investigated a variety of methods for pooling data from eight data sets (n = 5,424 subjects) to validate evidence for linkage of markers in the cytokine cluster on chromosome 5q31-33 to asthma and asthma-associated phenotypes. Chromosome 5 markers were integrated into current genetic linkage and physical maps, and ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article,Meta-Analysis

    doi:10.1002/gepi.2001.21.s1.s103

    authors: Jacobs KB,Burton PR,Iyengar SK,Elston RC,Palmer LJ

    update_date:2001-01-01 00:00:00

  • Conditional multipoint linkage analysis using affected sib pairs: an alternative approach.

    abstract::Recently, Liang et al. ([2001b] Genet. Epidemiol. 21:105-122) proposed a conditional approach to assess linkage evidence on the target region by incorporating linkage information from an unlinked (reference) region using alleles shared IBD (identity-by-descent) from affected sib pairs. This is carried out by conditionin...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10305

    authors: Chiu YF,Liang KY

    update_date:2004-02-01 00:00:00

  • Multipoint linkage mapping using sibpairs: non-parametric estimation of trait effects with quantitative covariates.

    abstract::Multipoint linkage analysis using sibpair designs remains a common approach to help investigators to narrow chromosomal regions for traits (either qualitative or quantitative) of interest. Despite its popularity, the success of this approach depends heavily on how issues such as genetic heterogeneity, gene-gene, and g...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20036

    authors: Chiou JM,Liang KY,Chiu YF

    update_date:2005-01-01 00:00:00

  • Regressive logistic modeling of familial aggregation for asthma in 7,394 population-based nuclear families.

    abstract::The aim of this population-based study was to determine whether asthma aggregates in families, and if so, whether aggregation was consistent with environmental and/or genetic etiologies. Data were from 7,394 nuclear families (41,506 individuals) from the 1968 Tasmanian Asthma Survey, in which all Tasmanian schoolchild...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:3<317::AID-GEPI9>3

    authors: Jenkins MA,Hopper JL,Giles GG

    update_date:1997-01-01 00:00:00

  • Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment.

    abstract::Population stratification (PS) can lead to an inflated rate of false-positive findings in genome-wide association studies (GWAS). The commonly used approach of adjustment for a fixed number of principal components (PCs) could have a deleterious impact on power when selected PCs are equally distributed in cases and con...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20396

    authors: Li Q,Wacholder S,Hunter DJ,Hoover RN,Chanock S,Thomas G,Yu K

    update_date:2009-07-01 00:00:00

  • Variance component models for X-linked QTLs.

    abstract::This paper discusses the theory and implementation of a model for mapping X-linked quantitative trait loci (QTL). As a result of X inactivation, a female's body is subdivided into a number of patches. In each patch one of her two X chromosomes is randomly switched off. This smooths the allelic contributions in a heter...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20158

    authors: Lange K,Sobel E

    update_date:2006-07-01 00:00:00

  • Evaluation of path analysis through computer simulation: effect of incorrectly assuming independent distribution of familial correlations.

    abstract::Path analysis of family data has been widely applied to resolve genetic and environmental patterns of familial resemblance. A prevalent statistical approach in path analysis has been, first, to estimate the familial correlations and, second, by assuming these estimates to be independently distributed, define a likelih...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370010305

    authors: McGue M,Wette R,Rao DC

    update_date:1984-01-01 00:00:00

  • Exploring data from genetic association studies using Bayesian variable selection and the Dirichlet process: application to searching for gene × gene patterns.

    abstract::We construct data exploration tools for recognizing important covariate patterns associated with a phenotype, with particular focus on searching for association with gene-gene patterns. To this end, we propose a new variable selection procedure that employs latent selection weights and compare it to an alternative for...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21661

    authors: Papathomas M,Molitor J,Hoggart C,Hastie D,Richardson S

    update_date:2012-09-01 00:00:00

  • Tests for gene-environment interaction from case-control data: a novel study of type I error, power and designs.

    abstract::To evaluate the risk of a disease associated with the joint effects of genetic susceptibility and environmental exposures, epidemiologic researchers often test for non-multiplicative gene-environment effects from case-control studies. In this article, we present a comparative study of four alternative tests for intera...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20337

    authors: Mukherjee B,Ahn J,Gruber SB,Rennert G,Moreno V,Chatterjee N

    update_date:2008-11-01 00:00:00

  • Model-based linkage analysis with imprinting for quantitative traits: ignoring imprinting effects can severely jeopardize detection of linkage.

    abstract::Genes with imprinting (parent-of-origin) effects express differently when inheriting from the mother or from the father. Some genes for development and behavior in mammals are known to be imprinted. We developed parametric linkage analysis that accounts for imprinting effects for continuous traits, implementing it in ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20321

    authors: Sung YJ,Rao DC

    update_date:2008-07-01 00:00:00

  • Design of artificial neural network and its applications to the analysis of alcoholism data.

    abstract::Artificial neural networks were applied to the alcoholism data to reveal nonlinear relationships between intermediate phenotypes, marker identity-by-descent sharing, and the affection status. A variable number of hidden units were considered to achieve a balance between the minimal mean-squared error and over-fitting ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170738

    authors: Li W,Haghighi F,Falk CT

    update_date:1999-01-01 00:00:00

  • Two common polymorphisms in the APO A-IV coding gene: their evolution and linkage disequilibrium.

    abstract::Human apolipoprotein A-IV (APO A-IV) exhibits a common protein polymorphism detectable by isoelectric focusing (IEF) due to a single base substitution at codon 360 which replaces the frequently occurring glutamine residue (allele 1) with histidine (allele 2). Recently, sequence analysis of the APO A-IV coding region h...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370090503

    authors: Kamboh MI,Hamman RF,Ferrell RE

    update_date:1992-01-01 00:00:00

  • Genetic analysis of a complex disease in the presence of an environmental risk factor.

    abstract::The role of a gene in a disease may be hidden by the presence of another risk factor such as an environmental factor. In that case, stratifying the data according to this factor strengthens power to detect linkage or association. We followed this strategy on the simulated data provided by GAW11. The transmission/diseq...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170788

    authors: Eichenbaum-Voline S,Baur MP,Knapp M

    update_date:1999-01-01 00:00:00

  • APO B 3' HVR polymorphism in healthy population: relationships to serum lipid levels.

    abstract::We have analyzed allele frequency distribution at the hypervariable locus 3' to the apolipoprotein B gene in a healthy population sample (241 women and 246 men) from the Belgrade area. The bimodal distribution of sixteen different hypervariable region (HVR) alleles and the heterozygosity index (average 0.76) in both s...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:2<113::AID-GEPI1>3

    authors: Alavantić D,Glisić S,Kandić I

    update_date:1998-01-01 00:00:00

  • New simple tests for age-at-onset anticipation: application to panic disorder.

    abstract::Recently, testing for anticipation has received renewed interest. It is well known that standard statistical methods are inappropriate for this purpose due to problems of sampling bias. Few statistical tests have been proposed for comparing mean age of onset in affected parents with mean age of onset in affected child...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20057

    authors: Tsai WY,Heiman GA,Hodge SE

    update_date:2005-04-01 00:00:00

  • Bayesian variable and model selection methods for genetic association studies.

    abstract::Variable selection is growing in importance with the advent of high throughput genotyping methods requiring analysis of hundreds to thousands of single nucleotide polymorphisms (SNPs) and the increased interest in using these genetic studies to better understand common, complex diseases. Up to now, the standard approa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20353

    authors: Fridley BL

    update_date:2009-01-01 00:00:00

  • Testing Hardy-Weinberg equilibrium using mother-child case-control samples.

    abstract::Genetic association studies of obstetric complications may genotype case and control mothers, or their respective newborns, or both case-control mothers and their children. The relatively high prevalence of many obstetric complications and the availability of both maternal and offspring's genotype data have provided m...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20406

    authors: Chen J,Zheng H,Wilson ML,Kraft P

    update_date:2009-09-01 00:00:00

  • Relevance of the genes for bone mass variation to susceptibility to osteoporotic fractures and its implications to gene search for complex human diseases.

    abstract::We investigate the relevance of the genetic determination of bone mineral density (BMD) variation to that of differential risk to osteoporotic fractures (OF). The high heritability (h(2)) of BMD and the significant phenotypic correlations between high BMD and low risk to OF are well known. Little is reported on h(2) f...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1040

    authors: Deng HW,Mahaney MC,Williams JT,Li J,Conway T,Davies KM,Li JL,Deng H,Recker RR

    update_date:2002-01-01 00:00:00

  • Estimating the power of variance component linkage analysis in large pedigrees.

    abstract::Variance component linkage analysis is commonly used to map quantitative trait loci (QTLs) in general pedigrees. Large pedigrees are especially attractive for these studies because they provide greater power per genotyped individual than small pedigrees. We propose accurate and computationally efficient methods to cal...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20160

    authors: Chen WM,Abecasis GR

    update_date:2006-09-01 00:00:00

  • National database of familial cancer in Sweden.

    abstract::A family cancer database was constructed from the nationwide Swedish registries and includes approximately 6 million persons and >30,000 cancers in offspring diagnosed at ages 15-51 years and their parents. A particular advantage of the database is that the contribution of both parental lineages on cancer risk can be ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:3<225::AID-GEPI2>3

    authors: Hemminki K,Vaittinen P

    update_date:1998-01-01 00:00:00

  • Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits.

    abstract::The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the pr...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21901

    authors: Broadaway KA,Duncan R,Conneely KN,Almli LM,Bradley B,Ressler KJ,Epstein MP

    update_date:2015-07-01 00:00:00

  • SimPEL: Simulation-based power estimation for sequencing studies of low-prevalence conditions.

    abstract::Power estimations are important for optimizing genotype-phenotype association study designs. However, existing frameworks are designed for common disorders, and thus ill-suited for the inherent challenges of studies for low-prevalence conditions such as rare diseases and infrequent adverse drug reactions. These challe...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22129

    authors: Mak L,Li M,Cao C,Gordon P,Tarailo-Graovac M,Bousman C,Wang P,Long Q

    update_date:2018-07-01 00:00:00

  • Case-only gene-environment interaction studies: when does association imply mechanistic interaction?

    abstract::Case-only studies are often used to identify interactions between a genetic factor and an environmental factor under the assumption both factors are independent in the population. However, interpreting a statistical association between the genetic and the environmental factors among the cases, as evidence of a mechani...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20484

    authors: VanderWeele TJ,Hernández-Díaz S,Hernán MA

    update_date:2010-05-01 00:00:00