Statistical considerations for the analysis of massively parallel reporter assays data.

Abstract:

Noncoding DNA contains gene regulatory elements that alter gene expression, and the function of these elements can be modified by genetic variation. Massively parallel reporter assays (MPRA) enable high-throughput identification and characterization of functional genetic variants, but the statistical methods to identify allelic effects in MPRA data have not been fully developed. In this study, we demonstrate how the baseline allelic imbalance in MPRA libraries can produce biased results, and we propose a novel, nonparametric, adaptive testing method that is robust to this bias. We compare the performance of this method with other commonly used methods, and we demonstrate that our novel adaptive method controls Type I error in a wide range of scenarios while maintaining excellent power. We have implemented these tests along with routines for simulating MPRA data in the Analysis Toolset for MPRA (@MPRA), an R package for the design and analyses of MPRA experiments. It is publicly available at http://github.com/redaq/atMPRA.
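The bias from baseline allelic imbalance can be illustrated with a small simulation. This is a minimal sketch, not the adaptive test proposed in the paper, and it does not use the atMPRA package. It assumes a simplified, hypothetical model in which each barcode's DNA (plasmid) count is Poisson and its RNA count is Poisson with mean proportional to the DNA count. The hypothetical helpers `simulate_allele` and `allelic_log2fc` compare a naive RNA-only allelic fold change with a DNA-normalized one.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used here; exp(-lam) underflows
    for lam above roughly 700.
    """
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_allele(rng, n_barcodes, dna_mean, transcription_rate):
    """Hypothetical model: per-barcode DNA ~ Poisson(dna_mean) and
    RNA ~ Poisson(transcription_rate * DNA)."""
    dna = [poisson(rng, dna_mean) for _ in range(n_barcodes)]
    rna = [poisson(rng, transcription_rate * d) for d in dna]
    return dna, rna


def allelic_log2fc(ref, alt):
    """Return (naive, dna_normalized) log2 fold changes of ref over alt.

    naive: compares mean RNA counts only, ignoring the DNA library.
    dna_normalized: compares total RNA / total DNA for each allele.
    """
    ref_dna, ref_rna = ref
    alt_dna, alt_rna = alt
    naive = math.log2((sum(ref_rna) / len(ref_rna)) / (sum(alt_rna) / len(alt_rna)))
    normalized = math.log2((sum(ref_rna) / sum(ref_dna)) / (sum(alt_rna) / sum(alt_dna)))
    return naive, normalized


rng = random.Random(2020)

# Null scenario: both alleles are transcribed at the same rate, but the alt
# allele is under-represented in the plasmid library (DNA mean 70 vs 100).
ref = simulate_allele(rng, 500, dna_mean=100, transcription_rate=2.0)
alt_null = simulate_allele(rng, 500, dna_mean=70, transcription_rate=2.0)
null_naive, null_norm = allelic_log2fc(ref, alt_null)

# Alternative scenario: same library imbalance plus a true allelic effect.
alt_effect = simulate_allele(rng, 500, dna_mean=70, transcription_rate=1.5)
effect_naive, effect_norm = allelic_log2fc(ref, alt_effect)

print(f"null:   naive={null_naive:.3f}  DNA-normalized={null_norm:.3f}")
print(f"effect: naive={effect_naive:.3f}  DNA-normalized={effect_norm:.3f}")
```

Under this model the naive null estimate should sit near log2(100/70) ≈ 0.51 even though no allelic effect was simulated, while the DNA-normalized estimate should stay close to 0, and in the second scenario close to the simulated log2(2.0/1.5) ≈ 0.42. The toy model has no barcode-level overdispersion and applies no hypothesis test, so it only shows the direction of the bias, not the behavior of the adaptive method described in the abstract.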

journal_name

Genet Epidemiol

journal_title

Genetic epidemiology

authors

Qiao D,Zigler CM,Cho MH,Silverman EK,Zhou X,Castaldi PJ,Laird NH

doi

10.1002/gepi.22337

subject

Has Abstract

pub_date

2020-10-01 00:00:00

pages

785-794

issue

7

eissn

1098-2272

issn

0741-0395

journal_volume

44

pub_type

Journal Article

  • Information on ancestry from genetic markers.

    abstract::It is possible to estimate the proportionate contributions of ancestral populations to admixed individuals or populations using genetic markers, but different loci and alleles vary considerably in the amount of information that they provide. Conventionally, the allele frequency difference between parental populations ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10319

    authors: Pfaff CL,Barnholtz-Sloan J,Wagner JK,Long JC

    updated: 2004-05-01 00:00:00

  • Estimation of genetic and environmental components in colorectal and lung cancer and melanoma.

    abstract::Cancer has predominant environmental and somatic causes but the assessment of hereditary (genetic) causes is difficult, except for highly penetrant single-gene causes. Family studies are only partially informative in this regard because family members share diet and life-styles. Twin studies have been classically used...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/1098-2272(200101)20:1<107::AID-GEPI9>3.0.C

    authors: Hemminki K,Lönnstedt I,Vaittinen P,Lichtenstein P

    updated: 2001-01-01 00:00:00

  • Regressive logistic modeling of familial aggregation for asthma in 7,394 population-based nuclear families.

    abstract::The aim of this population-based study was to determine whether asthma aggregates in families, and if so, whether aggregation was consistent with environmental and/or genetic etiologies. Data were from 7,394 nuclear families (41,506 individuals) from the 1968 Tasmanian Asthma Survey, in which all Tasmanian schoolchild...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1997)14:3<317::AID-GEPI9>3

    authors: Jenkins MA,Hopper JL,Giles GG

    updated: 1997-01-01 00:00:00

  • Adaptive testing for association between two random vectors in moderate to high dimensions.

    abstract::Testing for association between two random vectors is a common and important task in many fields, however, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, which are often ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22059

    authors: Xu Z,Xu G,Pan W,Alzheimer's Disease Neuroimaging Initiative.

    updated: 2017-11-01 00:00:00

  • Bayesian variable and model selection methods for genetic association studies.

    abstract::Variable selection is growing in importance with the advent of high throughput genotyping methods requiring analysis of hundreds to thousands of single nucleotide polymorphisms (SNPs) and the increased interest in using these genetic studies to better understand common, complex diseases. Up to now, the standard approa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20353

    authors: Fridley BL

    updated: 2009-01-01 00:00:00

  • TDT with covariates and genomic screens with mod scores: their behavior on simulated data.

    abstract::We describe an extension to the TDT (transmission/disequilibrium test) which allows for more than two marker alleles and for covariates measured on the parent or offspring. We also describe a systematic genomic search where the mod score (maximized lod score) is computed for each marker under constraints on the popula...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370120623

    authors: Rice JP,Neuman RJ,Hoshaw SL,Daw EW,Gu C

    updated: 1995-01-01 00:00:00

  • Random effects model for meta-analysis of multiple quantitative sibpair linkage studies.

    abstract::The growing interest in detection of genetic effects for complex traits along with molecular revolution has stimulated many linkage studies. Multiple replication studies tend to produce different results. In such situations, rigorous meta-analysis methods can be useful for assessing the overall evidence for linkage. W...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1996)13:4<377::AID-GEPI6>3

    authors: Li Z,Rao DC

    updated: 1996-01-01 00:00:00

  • Data mining and computationally intensive methods: summary of Group 7 contributions to Genetic Analysis Workshop 13.

    abstract::The Framingham Heart Study data, as well as a related simulated data set, were generously provided to the participants of the Genetic Analysis Workshop 13 in order that newly developed and emerging statistical methodologies could be tested on that well-characterized data set. The impetus driving the development of nov...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10285

    authors: Costello TJ,Falk CT,Ye KQ

    updated: 2003-01-01 00:00:00

  • Demonstration of a common major gene with pleiotropic effects on immunoglobulin E levels and allergy.

    abstract::Atopic disease is generally recognized to be familial, although specific genetic components have yet to be identified. High levels of a unique class of immunoglobulins, immunoglobulin E (IgE), have been shown to be associated with allergies. Several investigators have reported evidence indicating a recessive regulator...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370020402

    authors: Borecki IB,Rao DC,Lalouel JM,McGue M,Gerrard JW

    updated: 1985-01-01 00:00:00

  • Linkage analysis of asthma and atopy including models with genomic imprinting.

    abstract::Asthma and atopy are two closely related, common complex traits in which a number of genetic and environmental factors are suspected to play a role. We have performed parametric and nonparametric multi-marker linkage analysis for the Busselton data set, which is part of problem 1 of Genetic Analysis Workshop 12. In pa...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.2001.21.s1.s204

    authors: Strauch K,Bogdanow M,Fimmers R,Baur MP,Wienker TF

    updated: 2001-01-01 00:00:00

  • National database of familial cancer in Sweden.

    abstract::A family cancer database was constructed from the nationwide Swedish registries and includes approximately 6 million persons and >30,000 cancers in offspring diagnosed at ages 15-51 years and their parents. A particular advantage of the database is that the contribution of both parental lineages on cancer risk can be ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:3<225::AID-GEPI2>3

    authors: Hemminki K,Vaittinen P

    updated: 1998-01-01 00:00:00

  • Design of artificial neural network and its applications to the analysis of alcoholism data.

    abstract::Artificial neural networks were applied to the alcoholism data to reveal nonlinear relationships between intermediate phenotypes, marker identity-by-descent sharing, and the affection status. A variable number of hidden units were considered to achieve a balance between the minimal mean-squared error and over-fitting ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170738

    authors: Li W,Haghighi F,Falk CT

    updated: 1999-01-01 00:00:00

  • Genetic analysis of a complex disease in the presence of an environmental risk factor.

    abstract::The role of a gene in a disease may be hidden by the presence of another risk factor such as an environmental factor. In that case, stratifying the data according to this factor strengthens power to detect linkage or association. We followed this strategy on the simulated data provided by GAW11. The transmission/diseq...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370170788

    authors: Eichenbaum-Voline S,Baur MP,Knapp M

    updated: 1999-01-01 00:00:00

  • Increasing the power of identifying gene x gene interactions in genome-wide association studies.

    abstract::In this paper we investigate the power to identify gene x gene interactions in genome-wide association studies. In our analysis we focus on two-stage analyses: analyses in which we only test for interactions between single nucleotide polymorphisms that show some marginal effect. We give two algorithms to compute signi...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20300

    authors: Kooperberg C,Leblanc M

    updated: 2008-04-01 00:00:00

  • APO B 3' HVR polymorphism in healthy population: relationships to serum lipid levels.

    abstract::We have analyzed allele frequency distribution at the hypervariable locus 3' to the apolipoprotein B gene in a healthy population sample (241 women and 246 men) from the Belgrade area. The bimodal distribution of sixteen different hypervariable region (HVR) alleles and the heterozygosity index (average 0.76) in both s...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/(SICI)1098-2272(1998)15:2<113::AID-GEPI1>3

    authors: Alavantić D,Glisić S,Kandić I

    updated: 1998-01-01 00:00:00

  • Apolipoprotein E phenotype, arterial disease, and mortality among older women: the study of osteoporotic fractures.

    abstract::This study is an investigation of the relationship between apolipoprotein E (apoE) phenotype, arterial disease, and mortality in a group of women (n = 1,751) aged 65 years and older enrolled in the Study of Osteoporotic Fractures. Crude mortality rates were highest among women with the 4-3 and 4-4 phenotypes but age-a...

    journal_title:Genetic epidemiology

    pub_type: Clinical Trial, Journal Article, Multicenter Study

    doi:10.1002/(SICI)1098-2272(1997)14:2<147::AID-GEPI4>3

    authors: Vogt MT,Cauley JA,Kuller LH

    updated: 1997-01-01 00:00:00

  • Maximum-likelihood estimation of haplotype frequencies in nuclear families.

    abstract::The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over the last years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction te...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.10323

    authors: Becker T,Knapp M

    updated: 2004-07-01 00:00:00

  • A two-locus model for familial Alzheimer's disease?

    abstract::The present findings for familial Alzheimer's disease suggest a possible linkage to gene(s) on chromosome 21 for the early onset form and to chromosome 19 for the late onset. Since these results are not unequivocal, possible alternative hypotheses include the effect of genetic heterogeneity or of an oligogenic model o...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100618

    authors: Macciardi F,Cavallini MC

    updated: 1993-01-01 00:00:00

  • Multiethnic polygenic risk scores improve risk prediction in diverse populations.

    abstract::Methods for genetic risk prediction have been widely investigated in recent years. However, most available training data involves European samples, and it is currently unclear how to accurately predict disease risk in other populations. Previous studies have used either training data from European samples in large sam...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22083

    authors: Márquez-Luna C,Loh PR,South Asian Type 2 Diabetes (SAT2D) Consortium.,SIGMA Type 2 Diabetes Consortium.,Price AL

    updated: 2017-12-01 00:00:00

  • Model-based linkage analysis with imprinting for quantitative traits: ignoring imprinting effects can severely jeopardize detection of linkage.

    abstract::Genes with imprinting (parent-of-origin) effects express differently when inheriting from the mother or from the father. Some genes for development and behavior in mammals are known to be imprinted. We developed parametric linkage analysis that accounts for imprinting effects for continuous traits, implementing it in ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20321

    authors: Sung YJ,Rao DC

    updated: 2008-07-01 00:00:00

  • Genetic and environmental causes of variation in renal tubular handling of sodium and potassium: a twin study.

    abstract::We have conducted a study of renal sodium and potassium reabsorption in 205 pairs of twins on freely chosen diets; 89 of the subjects were studied on more than one occasion. Renal tubular sodium and potassium handling, as measured by the fractional excretions FENa and FEK, show repeatable differences between individua...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370020103

    authors: Whitfield JB,Martin NG

    updated: 1985-01-01 00:00:00

  • Tag SNPs chosen from HapMap perform well in several population isolates.

    abstract::Population isolates may be particularly useful for association studies of complex traits. This utility, however, largely depends on the transferability of tag SNPs chosen from reference samples, such as HapMap, to samples from such populations. Factors that characterize population isolates, such as widespread genetic ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20201

    authors: Service S,International Collaborative Group on Isolated Populations.,Sabatti C,Freimer N

    updated: 2007-04-01 00:00:00

  • Segregation analysis of autosomal dominant polycystic kidney disease.

    abstract::The results of classical segregation analysis on 159 families with polycystic kidney disease (PKD) are presented. It had been previously estimated that about 95% of autosomal dominant PKD (ADPKD) families have PKD1, the gene localized to chromosome 16p. The main purpose of the study was to determine if PKD shows any s...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370100305

    authors: Dobin A,Kimberling WJ,Pettinger W,Bailey-Wilson JE,Shugart YY,Gabow P

    updated: 1993-01-01 00:00:00

  • Meta-Analysis of Rare Variant Association Tests in Multiethnic Populations.

    abstract::Several methods have been proposed to increase power in rare variant association testing by aggregating information from individual rare variants (MAF < 0.005). However, how to best combine rare variants across multiple ethnicities and the relative performance of designs using different ethnic sampling fractions remai...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21939

    authors: Mensah-Ablorh A,Lindstrom S,Haiman CA,Henderson BE,Marchand LL,Lee S,Stram DO,Eliassen AH,Price A,Kraft P

    updated: 2016-01-01 00:00:00

  • Measuring the inflation of the lod score due to its maximization over model parameter values in human linkage analysis.

    abstract::A computer-simulation method is presented for determining and correcting for the effect of maximizing the lod score over disease definitions, penetrance values, and perhaps other model parameters. The method consists of simulating the complete analysis using marker genotypes randomly generated under the assumption of ...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370070402

    authors: Weeks DE,Lehner T,Squires-Wheeler E,Kaufmann C,Ott J

    updated: 1990-01-01 00:00:00

  • Familial analysis of eosinophilia caused by helminthic parasites.

    abstract::A highly significant familial aggregation of eosinophil levels (χ²(3) = 38.00) was detected in a sample from three Brazilian populations with a high incidence of helminthic parasitism. The data were unable to resolve genetic or common environment causation due to the lack of environmental concomitant variables. Result...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.1370090305

    authors: Moro-Furlani AM,Krieger H

    updated: 1992-01-01 00:00:00

  • Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits.

    abstract::The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the pr...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.21901

    authors: Broadaway KA,Duncan R,Conneely KN,Almli LM,Bradley B,Ressler KJ,Epstein MP

    updated: 2015-07-01 00:00:00

  • A small-sample multivariate kernel machine test for microbiome association studies.

    abstract::High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Microbial community level association test, as a critical step to establish the connection between overall microbiome composition and an outcome of interest, has now been rout...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22030

    authors: Zhan X,Tong X,Zhao N,Maity A,Wu MC,Chen J

    updated: 2017-04-01 00:00:00

  • Linear trend tests for case-control genetic association that incorporate random phenotype and genotype misclassification error.

    abstract::The purpose of this work is the development of linear trend tests that allow for error (LTT ae), specifically incorporating double-sampling information on phenotypes and/or genotypes. We use a likelihood framework. Misclassification errors are estimated via double sampling. Unbiased estimates of penetrances and genoty...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.20246

    authors: Gordon D,Haynes C,Yang Y,Kramer PL,Finch SJ

    updated: 2007-12-01 00:00:00

  • SimPEL: Simulation-based power estimation for sequencing studies of low-prevalence conditions.

    abstract::Power estimations are important for optimizing genotype-phenotype association study designs. However, existing frameworks are designed for common disorders, and thus ill-suited for the inherent challenges of studies for low-prevalence conditions such as rare diseases and infrequent adverse drug reactions. These challe...

    journal_title:Genetic epidemiology

    pub_type: Journal Article

    doi:10.1002/gepi.22129

    authors: Mak L,Li M,Cao C,Gordon P,Tarailo-Graovac M,Bousman C,Wang P,Long Q

    updated: 2018-07-01 00:00:00