Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17.

Abstract:

As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes.
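The "random causes" of correlation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' analysis. It assumes, purely for illustration, that each private variant is carried by exactly one individual chosen uniformly at random; the function names are hypothetical. Two private variants that land in the same carrier have identical genotype vectors, so their sample correlation is 1 even though the variants may sit far apart in the genome.

```python
import random
from math import comb


def expected_shared_carrier_pairs(n_private: int, n_individuals: int) -> float:
    """Expected number of private-variant pairs that fall in the same carrier,
    assuming each variant picks its single carrier uniformly at random."""
    return comb(n_private, 2) / n_individuals


def simulate_shared_carrier_pairs(n_private: int, n_individuals: int, seed: int = 0) -> int:
    """Randomly assign each private variant to one carrier and count the
    variant pairs whose carrier is the same individual."""
    rng = random.Random(seed)
    variants_per_person = [0] * n_individuals
    for _ in range(n_private):
        variants_per_person[rng.randrange(n_individuals)] += 1
    # A person carrying c private variants contributes c-choose-2 pairs
    # with identical genotype vectors (sample correlation of 1).
    return sum(comb(c, 2) for c in variants_per_person)


if __name__ == "__main__":
    # Approximate figures taken from the abstract; the uniform assignment is an assumption.
    n_private, n_individuals = 10_000, 697
    print("expected pairs:", round(expected_shared_carrier_pairs(n_private, n_individuals)))
    print("simulated pairs:", simulate_shared_carrier_pairs(n_private, n_individuals))
```

Under these assumptions, a sample this small relative to the number of private variants yields a large number of perfectly correlated variant pairs by chance alone. Real carriers are not uniformly distributed across a stratified sample, so the count in actual data may differ.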

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Tintle N,Aschard H,Hu I,Nock N,Wang H,Pugh E

doi

10.1002/gepi.20650

pub_date

2011-01-01 00:00:00

pages

S56-60

eissn

1098-2272

issn

0741-0395

journal_volume

35 Suppl 1

pub_type

Journal Article
  • Mortality differences by APOE genotype estimated from demographic synthesis.

    abstract::The ε4 allele of apolipoprotein E (APOE) is associated with increased risk of two major causes of death in low-mortality populations: ischemic heart disease and Alzheimer's disease. It is less common among centenarians than at younger ages. Therefore, it is likely that it is associated with excess risk of death. This a...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.0164

    authors: Ewbank DC

    update_date:2002-02-01 00:00:00

  • Linear trend tests for case-control genetic association that incorporate random phenotype and genotype misclassification error.

    abstract::The purpose of this work is the development of linear trend tests that allow for error (LTT ae), specifically incorporating double-sampling information on phenotypes and/or genotypes. We use a likelihood framework. Misclassification errors are estimated via double sampling. Unbiased estimates of penetrances and genoty...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20246

    authors: Gordon D,Haynes C,Yang Y,Kramer PL,Finch SJ

    update_date:2007-12-01 00:00:00

  • Bayesian variable and model selection methods for genetic association studies.

    abstract::Variable selection is growing in importance with the advent of high throughput genotyping methods requiring analysis of hundreds to thousands of single nucleotide polymorphisms (SNPs) and the increased interest in using these genetic studies to better understand common, complex diseases. Up to now, the standard approa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20353

    authors: Fridley BL

    update_date:2009-01-01 00:00:00

  • Evaluation of genetic and environmental effects using GEE and APM methods.

    abstract::Two analytic methods were used in the Problem 2 data set. First, generalized estimating equations (GEE) modelling was developed to adjust for familial correlation in regressions evaluating candidate genes and an environmental factor. Second, the affected-pedigree-member (APM) method was used to identify chromosomal re...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120633

    authors: Bull SB,Chapman NH,Greenwood CM,Darlington GA

    update_date:1995-01-01 00:00:00

  • Genetic epidemiology of autosomal recessive spastic ataxia of Charlevoix-Saguenay in northeastern Quebec.

    abstract::Autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS) is a disorder that has an elevated frequency in Saguenay-Lac-St-Jean (SLSJ) and Charlevoix, two geographically isolated regions in the past of northeastern Quebec. The incidence at birth and the carrier rate in SLSJ were estimated at 1/1,932 liveborn i...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100103

    authors: De Braekeleer M,Giasson F,Mathieu J,Roy M,Bouchard JP,Morgan K

    update_date:1993-01-01 00:00:00

  • Modeling the HLA component in rheumatoid arthritis: sensitivity to DRB1 allele frequencies.

    abstract::Rheumatoid arthritis is an inflammatory disease for which positive associations have been described with some HLA-DRB1 alleles. The associated alleles share a similar amino acid sequence in the third hypervariable region, the shared epitope, but differ at position 71 and 86. It has been suggested that HLA susceptibili...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/1098-2272(200012)19:4<422::AID-GEPI12>3.0.

    authors: Tézenas du Montcel S,Reviron D,Genin E,Roudier J,Mercier P,Clerget-Darpoux F

    update_date:2000-12-01 00:00:00

  • A novel association test for multiple secondary phenotypes from a case-control GWAS.

    abstract::In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but...

    journal_title:Genetic epidemiology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1002/gepi.22045

    authors: Ray D,Basu S

    update_date:2017-07-01 00:00:00

  • Parental transmission and D18S37 allele sharing in bipolar affective disorder.

    abstract::We combined the five chromosome 18 bipolar affective disorder data sets provided by GAW10, totaling 185 families with 3,394 individuals, and performed analysis of differential parental transmission and chromosome 18 marker allele sharing in families with transmission through fathers vs those through mothers. Results i...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:6<665::AID-GEPI19>

    authors: Lin JP,Bale SJ

    update_date:1997-01-01 00:00:00

  • PANDA: Prioritization of autism-genes using network-based deep-learning approach.

    abstract::Understanding the genetic background of complex diseases and disorders plays an essential role in the promising precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given a large number of possibilities. Thus, computational methods have seen increasing appli...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22282

    authors: Zhang Y,Chen Y,Hu T

    update_date:2020-06-01 00:00:00

  • Lifestyle and blood pressure levels in male twins in Utah.

    abstract::Healthy male monozygotic (MZ) and dizygotic (DZ) twin pairs (MZ pairs = 77; DZ pairs = 88) were studied to assess the effect of dietary intake, physical activity, physical fitness, body mass index (BMI), sum of the triceps and subscapular skinfold measurements, alcohol and caffeine consumption, and smoking patterns on...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370050409

    authors: Slattery ML,Bishop DT,French TK,Hunt SC,Meikle AW,Williams RR

    update_date:1988-01-01 00:00:00

  • Projection regression models for multivariate imaging phenotype.

    abstract::This paper presents a projection regression model (PRM) to assess the relationship between a multivariate phenotype and a set of covariates, such as a genetic marker, age, and gender. In the existing literature, a standard statistical approach to this problem is to fit a multivariate linear model to the multivariate p...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21658

    authors: Lin JA,Zhu H,Knickmeyer R,Styner M,Gilmore J,Ibrahim JG

    update_date:2012-09-01 00:00:00

  • Influence of marker heterozygosity and genetic heterogeneity on fine mapping.

    abstract::The purpose of the current study was to utilize the Genetic Analysis Workshop 12 simulated data to evaluate fine-mapping strategies for quantitative traits. We approached the analysis as if it was a follow-up to a genome scan that had identified two regions of interest and used the provided 1-cM density microsatellite...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s467

    authors: Heard-Costa NL,Demissie S,DeStefano AL,Knowlton BA,Maher NE,Myers RH,Volcjak JS,Wilk JB,Cupples LA

    update_date:2001-01-01 00:00:00

  • Entropy-supported marker selection and Mantel statistics for haplotype sharing analysis.

    abstract::Haplotype sharing analysis is a well-established option for the investigation of the etiology of complex diseases. The statistical power of haplotype association methods depends strongly on how the information of unobserved haplotypes can be captured by multilocus genotypes. In this study we combine an entropy-based m...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20491

    authors: Schulz A,Fischer C,Chang-Claude J,Beckmann L

    update_date:2010-05-01 00:00:00

  • Regressive logistic modeling of familial aggregation for asthma in 7,394 population-based nuclear families.

    abstract::The aim of this population-based study was to determine whether asthma aggregates in families, and if so, whether aggregation was consistent with environmental and/or genetic etiologies. Data were from 7,394 nuclear families (41,506 individuals) from the 1968 Tasmanian Asthma Survey, in which all Tasmanian schoolchild...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:3<317::AID-GEPI9>3

    authors: Jenkins MA,Hopper JL,Giles GG

    update_date:1997-01-01 00:00:00

  • Statistical considerations for the analysis of massively parallel reporter assays data.

    abstract::Noncoding DNA contains gene regulatory elements that alter gene expression, and the function of these elements can be modified by genetic variation. Massively parallel reporter assays (MPRA) enable high-throughput identification and characterization of functional genetic variants, but the statistical methods to identi...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22337

    authors: Qiao D,Zigler CM,Cho MH,Silverman EK,Zhou X,Castaldi PJ,Laird NH

    update_date:2020-10-01 00:00:00

  • Linkage analysis of Alzheimer's disease with methods using relative pairs.

    abstract::Four relative-pair methods for detecting genetic linkage were applied to familial Alzheimer's disease data. Results obtained using an extended Haseman-Elston test and a weighted rank pairwise correlation test, which both use information from all relative pairs, were consistent with previously published likelihood resu...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100608

    authors: Blossey H,Commenges D,Olson JM

    update_date:1993-01-01 00:00:00

  • Genetic heterogeneity in Alzheimer's disease: a grade of membership analysis.

    abstract::Grade of membership analysis (GoM) may have particular relevance for genetic epidemiology. The method can flexibly relate genetic markers, clinical features, and environmental exposures to possible subtypes of disease termed pure types even when population allele frequencies and penetrance functions are not known. Hen...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100628

    authors: Corder EH,Woodbury MA

    update_date:1993-01-01 00:00:00

  • Robustness of the unified model to shared environmental effects in the analysis of dichotomous traits.

    abstract::Simulation studies were conducted to assess to what extent the conclusions of segregation analysis, performed under the unified model, can be affected by the presence of unmeasured environmental factors shared by family members. Dichotomous data were generated on six-member nuclear families under two variants of the m...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370060140

    authors: Demenais F,Abel L

    update_date:1989-01-01 00:00:00

  • Commingling analysis of memory performance in elderly men.

    abstract::Smalley et al. [(1992) Genet Epidemiol 9:333-345] found evidence of a mixture of two distributions in memory performance among offspring of patients with dementia of the Alzheimer type (DAT), suggesting that these groups reflect genotypic subgroups of carriers and non-carriers of a putative DAT gene. One prediction of...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370110506

    authors: Palmer CG,Wolkenstein BH,La Rue A,Swan GE,Smalley SL

    update_date:1994-01-01 00:00:00

  • Robust inference for variance components models in families ascertained through probands: I. Conditioning on proband's phenotype.

    abstract::A robust approach for estimating standard errors of variance components by using quantitative phenotypes from families ascertained through a proband with an extreme phenotypic value is presented. Estimators that use the multivariate normal distribution as a "working likelihood" are obtained by computing conditional ln...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370040305

    authors: Beaty TH,Liang KY

    update_date:1987-01-01 00:00:00

  • Constructing meiotic maps with known error probability.

    abstract::We propose methods to construct meiotic gene maps while controlling the probability of a decision-error. First, a single step gene ordering procedure is presented whose decision-error probability is bounded above by a prespecified threshold. The bound for the error probability is valid under quite general circumstance...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1999)16:3<274::AID-GEPI4>3

    authors: Rogatko A,Babb J,Jordan H,Zacks S

    update_date:1999-01-01 00:00:00

  • Inferential testing for linkage with GENEHUNTER-MODSCORE: the impact of the pedigree structure on the null distribution of multipoint MOD scores.

    abstract::The asymptotic distribution of [MOD] scores under the null hypothesis of no linkage is only known for affected sib pairs and other types of affected relative pairs. We have extended the GENEHUNTER-MODSCORE program to allow for simulations under the null hypothesis of no linkage to determine the empirical significance ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20264

    authors: Mattheisen M,Dietter J,Knapp M,Baur MP,Strauch K

    update_date:2008-01-01 00:00:00

  • Multipoint analysis using affected sib pairs: incorporating linkage evidence from unlinked regions.

    abstract::In this paper, we proposed a multipoint method to assess evidence of linkage to one region by incorporating linkage evidence from another region. This approach uses affected sib pairs in which the number of alleles shared identical by descent (IBD) is the primary statistic. This generalized estimating equation (GEE) a...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1021

    authors: Liang KY,Chiu YF,Beaty TH,Wjst M

    update_date:2001-09-01 00:00:00

  • Univariate analysis of dichotomous or ordinal data from twin pairs: a simulation study comparing structural equation modeling and logistic regression.

    abstract::The univariate analysis of categorical twin data can be performed using either structural equation modeling (SEM) or logistic regression. This paper presents a comparison between these two methods using a simulation study. Dichotomous and ordinal (three category) twin data are simulated under two different sample size...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1996)13:1<79::AID-GEPI7>3.

    authors: Ramakrishnan V,Meyer JM,Goldberg J,Henderson WG

    update_date:1996-01-01 00:00:00

  • An ensemble learning approach jointly modeling main and interaction effects in genetic association studies.

    abstract::Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, the methods that can account for gene-gene interactions to search for a set of marker loci in different genes or across genome and to analyze these...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20304

    authors: Zhang Z,Zhang S,Wong MY,Wareham NJ,Sha Q

    update_date:2008-05-01 00:00:00

  • Sample size calculations for linkage analysis using extreme sib pairs based on segregation analysis with the quantitative phenotype body weight as an example.

    abstract::One approach to establish linkage is based on allele-sharing methods for sib pairs. Recently, the use of extreme sib pairs (ESP) has been proposed to increase power for mapping quantitative traits in humans. Several approaches have been discussed. In this study, we calculate sample sizes for the various ESP approaches...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:6<577::AID-GEPI3>3

    authors: Ziegler A,Hebebrand J

    update_date:1998-01-01 00:00:00

  • Linkage analysis of asthma and atopy including models with genomic imprinting.

    abstract::Asthma and atopy are two closely related, common complex traits in which a number of genetic and environmental factors are suspected to play a role. We have performed parametric and nonparametric multi-marker linkage analysis for the Busselton data set, which is part of problem 1 of Genetic Analysis Workshop 12. In pa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s204

    authors: Strauch K,Bogdanow M,Fimmers R,Baur MP,Wienker TF

    update_date:2001-01-01 00:00:00

  • On the detection of linkage in multiple data sets: a comparison of various statistical approaches.

    abstract::We contrast the pooling of multiple data sets with the compound HLOD (HLOD-C) and the posterior probability of linkage (PPL), two approaches that have been shown to have more power in the presence of genetic heterogeneity. We also propose and evaluate several multipoint extensions. ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s67

    authors: Van Eerdewegh P,Dowd M,Dupuis J,Falls K,Hayward B,Santangelo SL

    update_date:2001-01-01 00:00:00

  • Testing untyped alleles (TUNA)-applications to genome-wide association studies.

    abstract::The large number of tests performed in analyzing data from genome-wide association studies has a large impact on the power of detecting risk variants, and analytic strategies specifying the optimal set of hypotheses to be tested are necessary. We propose a genome-wide strategy that is based on one degree of freedom te...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20182

    authors: Nicolae DL

    update_date:2006-12-01 00:00:00

  • Haplotype sharing analysis in affected individuals from nuclear families with at least one affected offspring.

    abstract::In diseases with a complex mode of inheritance, families with multiple affected individuals are difficult to ascertain. The haplotype sharing statistic (HSS) uses (hidden) co-ancestry between affected individuals from a founder population. These affected individuals will likely not only share the same mutation(s), but...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:6<915::AID-GEPI59>

    authors: Van der Meulen MA,te Meerman GJ

    update_date:1997-01-01 00:00:00