Phenotype validation in electronic health records based genetic association studies.

Abstract:

The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold-standard phenotype data obtained by chart review. In this work, we propose a genotype-stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype-stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
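The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator or score test. It only shows (a) how an error-prone phenotype can bias an odds ratio, assuming a binary carrier/non-carrier genotype and nondifferential misclassification, and (b) one possible way to pick subjects for chart review within genotype-by-EHR-status cells. The function names, the equal-allocation rule, and all numeric values are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def observed_odds_ratio(p_noncarrier, p_carrier, sensitivity, specificity):
    """Odds ratio obtained when an error-prone phenotype replaces the true one.

    Assumption: misclassification is nondifferential, i.e. sensitivity and
    specificity do not depend on genotype.
    """
    def p_ehr_positive(p_true):
        # True cases flagged correctly plus true non-cases flagged in error.
        return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)

    return odds(p_ehr_positive(p_carrier)) / odds(p_ehr_positive(p_noncarrier))


def genotype_stratified_sample(records, n_per_cell, seed=0):
    """Select up to n_per_cell subjects from each (genotype, EHR status) cell.

    records: iterable of (subject_id, genotype, ehr_status) tuples.
    Equal allocation per cell is an illustrative assumption, not the
    allocation used in the paper.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for subject_id, genotype, ehr_status in records:
        cells[(genotype, ehr_status)].append(subject_id)
    selected = {}
    for cell, ids in sorted(cells.items()):
        k = min(n_per_cell, len(ids))
        selected[cell] = sorted(rng.sample(ids, k))
    return selected


# Hypothetical disease risks: 10% in non-carriers, 20% in carriers.
true_or = odds(0.20) / odds(0.10)
ehr_or = observed_odds_ratio(0.10, 0.20, sensitivity=0.8, specificity=0.95)

# Hypothetical cohort: 1 in 20 subjects is a carrier, 1 in 7 is an EHR case.
records = [(i, int(i % 20 == 0), int(i % 7 == 0)) for i in range(1000)]
chart_review = genotype_stratified_sample(records, n_per_cell=5)
```

With these made-up inputs the EHR-based odds ratio is closer to 1 than the true odds ratio, which is the kind of bias that validated phenotypes are meant to correct. Sampling within genotype strata ensures that carriers of a rare allele are represented among the validated subjects instead of being left to chance.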

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Wang L,Damrauer SM,Zhang H,Zhang AX,Xiao R,Moore JH,Chen J

doi

10.1002/gepi.22080

subject

Has Abstract

pub_date

2017-12-01 00:00:00

pages

790-800

issue

8

eissn

1098-2272

issn

0741-0395

journal_volume

41

pub_type

Journal Article
  • Direct genetic effects and their estimation from matched case-control data.

    abstract::In genetic association studies, a single marker is often associated with multiple, correlated phenotypes (e.g., obesity and cardiovascular disease, or nicotine dependence and lung cancer). A pervasive question is then whether that marker exerts independent effects on all phenotypes. In this paper, we address this ques...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21660

    authors: Berzuini C,Vansteelandt S,Foco L,Pastorino R,Bernardinelli L

    update_date: 2012-09-01 00:00:00

  • Exploring data from genetic association studies using Bayesian variable selection and the Dirichlet process: application to searching for gene × gene patterns.

    abstract::We construct data exploration tools for recognizing important covariate patterns associated with a phenotype, with particular focus on searching for association with gene-gene patterns. To this end, we propose a new variable selection procedure that employs latent selection weights and compare it to an alternative for...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21661

    authors: Papathomas M,Molitor J,Hoggart C,Hastie D,Richardson S

    update_date: 2012-09-01 00:00:00

  • Projection regression models for multivariate imaging phenotype.

    abstract::This paper presents a projection regression model (PRM) to assess the relationship between a multivariate phenotype and a set of covariates, such as a genetic marker, age, and gender. In the existing literature, a standard statistical approach to this problem is to fit a multivariate linear model to the multivariate p...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21658

    authors: Lin JA,Zhu H,Knickmeyer R,Styner M,Gilmore J,Ibrahim JG

    update_date: 2012-09-01 00:00:00

  • A general autosomal/X-linked model.

    abstract::This paper describes a general genetic model which encompasses both autosomal and X-linked inheritance as submodels. It allows one to test for X-linked inheritance of a trait by comparing the likelihood of X-linked inheritance to the likelihood of the general genetic model. The general model is formulated as two loci,...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370010105

    authors: Hasstedt SJ,Skolnick M

    update_date: 1984-01-01 00:00:00

  • Major gene with sex-specific effects influences fat mass in Mexican Americans.

    abstract::Increased adiposity has repeatedly been identified as a major risk factor for a variety of chronic diseases. However, the question still remains whether the amount of adipose tissue itself is genetically mediated. To address this question, a segregation analysis, using maximum likelihood techniques as implemented in t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120505

    authors: Comuzzie AG,Blangero J,Mahaney MC,Mitchell BD,Hixson JE,Samollow PB,Stern MP,MacCluer JW

    update_date: 1995-01-01 00:00:00

  • A small-sample multivariate kernel machine test for microbiome association studies.

    abstract::High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Microbial community level association test, as a critical step to establish the connection between overall microbiome composition and an outcome of interest, has now been rout...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22030

    authors: Zhan X,Tong X,Zhao N,Maity A,Wu MC,Chen J

    update_date: 2017-04-01 00:00:00

  • Commingling analysis of memory performance in elderly men.

    abstract::Smalley et al. [(1992) Genet Epidemiol 9:333-345] found evidence of a mixture of two distributions in memory performance among offspring of patients with dementia of the Alzheimer type (DAT), suggesting that these groups reflect genotypic subgroups of carriers and non-carriers of a putative DAT gene. One prediction of...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370110506

    authors: Palmer CG,Wolkenstein BH,La Rue A,Swan GE,Smalley SL

    update_date: 1994-01-01 00:00:00

  • Apolipoprotein E phenotype, arterial disease, and mortality among older women: the study of osteoporotic fractures.

    abstract::This study is an investigation of the relationship between apolipoprotein E (apoE) phenotype, arterial disease, and mortality in a group of women (n = 1,751) aged 65 years and older enrolled in the Study of Osteoporotic Fractures. Crude mortality rates were highest among women with the 4-3 and 4-4 phenotypes but age-a...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article, Multicenter Study

    doi:10.1002/(SICI)1098-2272(1997)14:2<147::AID-GEPI4>3

    authors: Vogt MT,Cauley JA,Kuller LH

    update_date: 1997-01-01 00:00:00

  • Meta-analysis of linkage studies.

    abstract::Lander and Kruglyak [1995] gave guidelines for interpreting linkage results based on estimating how often a particular threshold for significance would be exceeded by chance in a single genome scan. What is unknown is how often two or more genome scans would exceed a particular threshold within the same region. We dev...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170778

    authors: Badner JA,Goldin LR

    update_date: 1999-01-01 00:00:00

  • A likelihood ratio-based Mann-Whitney approach finds novel replicable joint gene action for type 2 diabetes.

    abstract::The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test t...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21651

    authors: Lu Q,Wei C,Ye C,Li M,Elston RC

    update_date: 2012-09-01 00:00:00

  • Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17.

    abstract::As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20650

    authors: Tintle N,Aschard H,Hu I,Nock N,Wang H,Pugh E

    update_date: 2011-01-01 00:00:00

  • A meta-analysis approach with filtering for identifying gene-level gene-environment interactions.

    abstract::There is a growing recognition that gene-environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases/traits generated from association and sequencing studies, detecting G × E via genome-wide analysis remains c...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Meta-Analysis

    doi:10.1002/gepi.22115

    authors: Wang J,Liu Q,Pierce BL,Huo D,Olopade OI,Ahsan H,Chen LS

    update_date: 2018-07-01 00:00:00

  • Using case-control designs for genome-wide screening for associations between genetic markers and disease susceptibility loci.

    abstract::We used a case-control design to scan the genome for any associations between genetic markers and disease susceptibility loci using the first two replicates of the Mycenaean population from the GAW11 (Problem 2) data. Using a case-control approach, we constructed a series of 2-by-3 tables for each allele of every mark...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.13701707128

    authors: Yang Q,Khoury MJ,Atkinson M,Sun F,Cheng R,Flanders WD

    update_date: 1999-01-01 00:00:00

  • Family-based association tests for qualitative and quantitative traits using single-nucleotide polymorphism and microsatellite data.

    abstract::Using the Genetic Analysis Workshop 12 simulated data, we contrasted results for association tests in nuclear families and extended pedigrees using single-nucleotide polymorphism (SNP) data, and we compared results for different trait definitions, for outbred and isolate populations, and for SNP and microsatellite dat...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s364

    authors: Wilk JB,Volcjak JS,Myers RH,Maher NE,Knowlton BA,Heard-Costa NL,Demissie S,Cupples LA,DeStefano AL

    update_date: 2001-01-01 00:00:00

  • Analysis of multiple phenotypes.

    abstract::The complex etiology of common diseases like cardiovascular disease, diabetes, hypertension, and rheumatoid arthritis has led investigators to focus on the genetics of correlated phenotypes and risk factors. Joint analysis of multiple disease-related phenotypes may reveal genes of pleiotropic effect and increase analy...

    journal_title:Genetic epidemiology

    pub_type:

    doi:10.1002/gepi.20470

    authors: Kent JW Jr

    update_date: 2009-01-01 00:00:00

  • PANDA: Prioritization of autism-genes using network-based deep-learning approach.

    abstract::Understanding the genetic background of complex diseases and disorders plays an essential role in the promising field of precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given a large number of possibilities. Thus, computational methods have seen increasing appli...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22282

    authors: Zhang Y,Chen Y,Hu T

    update_date: 2020-06-01 00:00:00

  • Genetic epidemiology of autosomal recessive spastic ataxia of Charlevoix-Saguenay in northeastern Quebec.

    abstract::Autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS) is a disorder that has an elevated frequency in Saguenay-Lac-St-Jean (SLSJ) and Charlevoix, two regions of northeastern Quebec that were geographically isolated in the past. The incidence at birth and the carrier rate in SLSJ were estimated at 1/1,932 liveborn i...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100103

    authors: De Braekeleer M,Giasson F,Mathieu J,Roy M,Bouchard JP,Morgan K

    update_date: 1993-01-01 00:00:00

  • Statistical considerations for the analysis of massively parallel reporter assays data.

    abstract::Noncoding DNA contains gene regulatory elements that alter gene expression, and the function of these elements can be modified by genetic variation. Massively parallel reporter assays (MPRA) enable high-throughput identification and characterization of functional genetic variants, but the statistical methods to identi...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22337

    authors: Qiao D,Zigler CM,Cho MH,Silverman EK,Zhou X,Castaldi PJ,Laird NH

    update_date: 2020-10-01 00:00:00

  • Mortality differences by APOE genotype estimated from demographic synthesis.

    abstract::The ε4 allele of apolipoprotein E (APOE) is associated with increased risk of two major causes of death in low-mortality populations: ischemic heart disease and Alzheimer's disease. It is less common among centenarians than at younger ages. Therefore, it is likely that it is associated with excess risk of death. This a...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.0164

    authors: Ewbank DC

    update_date: 2002-02-01 00:00:00

  • Genome-wide linkage analysis using genetic variance components of alcohol dependency-associated censored and continuous traits.

    abstract::We used variance-components analysis to investigate the additive genetic effects regulating some of the phenotypes included in the GAW11 data set. Variance-components models were fitted using Gibbs sampling methods in BUGS v 0.6. Linkage analyses for both multivariate normal (MvN) traits and right censored survival ti...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170748

    authors: Palmer LJ,Tiller KJ,Burton PR

    update_date: 1999-01-01 00:00:00

  • Monte Carlo analysis on a large pedigree.

    abstract::Monte Carlo methods for linkage and segregation analysis are applied to the HGAR1 pedigree. To address these data, the methods are extended in several ways. The results are compared with those provided by PAP. ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100658

    authors: Thompson EA,Lin S,Olshen AB,Wijsman EM

    update_date: 1993-01-01 00:00:00

  • A novel association test for multiple secondary phenotypes from a case-control GWAS.

    abstract::In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1002/gepi.22045

    authors: Ray D,Basu S

    update_date: 2017-07-01 00:00:00

  • Multifactorial disease risk calculator: Risk prediction for multifactorial disease pedigrees.

    abstract::Construction of multifactorial disease models from epidemiological findings and their application to disease pedigrees for risk prediction is nontrivial for all but the simplest of cases. Multifactorial Disease Risk Calculator is a web tool facilitating this. It provides a user-friendly interface, extending a reported...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22101

    authors: Campbell DD,Li Y,Sham PC

    update_date: 2018-03-01 00:00:00

  • Comparison of variance components, ANOVA and regression of offspring on midparent (ROMP) methods for SNP markers.

    abstract::An extension of the traditional regression of offspring on midparent (ROMP) method was used to estimate the heritability of the trait, test for marker association, and estimate the heritability attributable to a marker locus. The fifty replicates of the Genetic Analysis Workshop (GAW) 12 simulated general population d...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s794

    authors: Pugh EW,Papanicolaou GJ,Justice CM,Roy-Gagnon MH,Sorant AJ,Kingman A,Wilson AF

    update_date: 2001-01-01 00:00:00

  • Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment.

    abstract::Population stratification (PS) can lead to an inflated rate of false-positive findings in genome-wide association studies (GWAS). The commonly used approach of adjustment for a fixed number of principal components (PCs) could have a deleterious impact on power when selected PCs are equally distributed in cases and con...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20396

    authors: Li Q,Wacholder S,Hunter DJ,Hoover RN,Chanock S,Thomas G,Yu K

    update_date: 2009-07-01 00:00:00

  • Genotyping errors, pedigree errors, and missing data.

    abstract::Our group studied the effects of genotyping errors, pedigree errors, and missing data on a wide range of techniques, with a focus on the role of single-nucleotide polymorphisms (SNPs). Half of our group used simulated data, and half of our group used data from the Collaborative Study on the Genetics of Alcoholism (COG...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20120

    authors: Hinrichs AL,Suarez BK

    update_date: 2005-01-01 00:00:00

  • Familial analysis of eosinophilia caused by helminthic parasites.

    abstract::A highly significant familial aggregation of eosinophil levels (χ²(3) = 38.00) was detected in a sample from three Brazilian populations with a high incidence of helminthic parasitism. The data were unable to resolve genetic or common environment causation due to the lack of environmental concomitant variables. Result...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370090305

    authors: Moro-Furlani AM,Krieger H

    update_date: 1992-01-01 00:00:00

  • Model selection and Bayesian methods in statistical genetics: summary of group 11 contributions to Genetic Analysis Workshop 15.

    abstract::The research presented in group 11 of the Genetic Analysis Workshop 15 (GAW15) falls into two major themes: Model selection approaches for gene mapping (both Bayesian and Frequentist); and other Bayesian methods. These methods either allow relaxation of some of the common assumptions, such as mode of inheritance, for ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Review

    doi:10.1002/gepi.20285

    authors: Swartz MD,Thomas DC,Daw EW,Albers K,Charlesworth JC,Dyer TC,Fridley BL,Govil M,Kraft P,Kwon S,Logue MW,Oh C,Pique-Regi R,Saba L,Schumacher FR,Uh HW

    update_date: 2007-01-01 00:00:00

  • Power and sample size calculations for SNP association studies with censored time-to-event outcomes.

    abstract::For many clinical studies in cancer, germline DNA is prospectively collected for the purpose of discovering or validating single-nucleotide polymorphisms (SNPs) associated with clinical outcomes. The primary clinical endpoints for many of these studies are time-to-event outcomes such as time of death or disease progres...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21645

    authors: Owzar K,Li Z,Cox N,Jung SH

    update_date: 2012-09-01 00:00:00

  • Genome-wide association studies for discrete traits.

    abstract::Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using various methods that analyze multiple markers in combin...

    journal_title:Genetic epidemiology

    pub_type:

    doi:10.1002/gepi.20465

    authors: Thomas DC

    update_date: 2009-01-01 00:00:00