Combining dependent p-values by gamma distributions.

Abstract:

Combining correlated p-values from multiple hypothesis tests is one of the most frequently used ways of integrating information in genetic and genomic data analysis. However, most existing methods for combining p-values from individual component problems into a single unified p-value assume independence and are therefore unsuitable for the correlation structure among p-values from multiple hypothesis testing. Although some existing p-value combination methods have been modified to overcome these limitations, there is no uniformly most powerful method for combining correlated p-values in genetic data analysis. A p-value combination method that robustly controls the type I error rate while maintaining good power is therefore needed. In this paper, we propose an empirical method based on the gamma distribution (EMGD) for combining dependent p-values from multiple hypothesis testing. The EMGD flexibly accommodates highly correlated p-values, combining them into a single unified p-value for testing the combined hypothesis of interest. The EMGD retains the robustness of the empirical Brown's method (EBM) for pooling dependent p-values from multiple hypothesis testing. Moreover, it retains the property of the gamma-distribution-based method of simultaneously keeping the advantages of the z-transform test and the gamma-transform test for combining dependent p-values from multiple statistical tests. Together, these two properties allow the EMGD to maintain robust power when combining dependent p-values. The performance of the EMGD is illustrated with simulations and real data applications, in comparison with existing methods such as Kost and McDermott's method, the EBM, and the harmonic mean p-value method.
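To make the abstract's point about independence-based methods concrete, the following is a minimal sketch of Fisher's classical combination method, which assumes independent p-values. It is not the EMGD, the EBM, or any code from the paper; the function name `fisher_combine` and the example values are assumptions chosen for illustration.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method, which assumes independence.

    Hypothetical helper for illustration only; not the paper's EMGD.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    # half_stat is X / 2, where X = -2 * sum(log p_i) is Fisher's statistic.
    half_stat = -sum(math.log(p) for p in p_values)
    # Under independence X follows a chi-square distribution with 2k degrees
    # of freedom, whose survival function has the closed form
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Five identical copies of one p-value mimic perfectly correlated tests.
single = fisher_combine([0.05])
duplicated = fisher_combine([0.05] * 5)
print(single, duplicated)
```

In this sketch, five copies of the same p-value of 0.05 carry no more evidence than one, yet the combined value falls well below 0.05. This is the kind of type I error inflation under dependence that motivates dependence-aware methods such as those the abstract compares.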

authors

Chien LC

doi

10.1515/sagmb-2019-0057

subject

Has Abstract

pub_date

2020-11-06 00:00:00

eissn

2194-6302

issn

1544-6115

pii

/j/sagmb.ahead-of-print/sagmb-2019-0057/sagmb-2019

pub_type

Journal Article
  • Approximate maximum likelihood estimation for population genetic inference.

    abstract::In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods became popular. However, these methods such as Approximate Bayesian Computation (ABC...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0016

    authors: Bertl J,Ewing G,Kosiol C,Futschik A

    updated:2017-11-27 00:00:00

  • Addressing the shortcomings of three recent Bayesian methods for detecting interspecific recombination in DNA sequence alignments.

    abstract::We address a potential shortcoming of three probabilistic models for detecting interspecific recombination in DNA sequence alignments: the multiple change-point model (MCP) of Suchard et al. (2003), the dual multiple change-point model (DMCP) of Minin et al. (2005), and the phylogenetic factorial hidden Markov model (...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1399

    authors: Husmeier D,Mantzaris AV

    updated:2008-01-01 00:00:00

  • Sparse inverse of covariance matrix of QTL effects with incomplete marker data.

    abstract::Gametic models for fitting breeding values at QTL as random effects in outbred populations have become popular because they require few assumptions about the number and distribution of QTL alleles segregating. The covariance matrix of the gametic effects has an inverse that is sparse and can be constructed rapidly by ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1048

    authors: Thallman RM,Hanford KJ,Kachman SD,Van Vleck LD

    updated:2004-01-01 00:00:00

  • LCox: a tool for selecting genes related to survival outcomes using longitudinal gene expression data.

    abstract::Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, fo...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0060

    authors: Sun J,Herazo-Maya JD,Wang JL,Kaminski N,Zhao H

    updated:2019-02-13 00:00:00

  • Comparison and visualisation of agreement for paired lists of rankings.

    abstract::Output from analysis of a high-throughput 'omics' experiment very often is a ranked list. One commonly encountered example is a ranked list of differentially expressed genes from a gene expression experiment, with a length of many hundreds of genes. There are numerous situations where interest is in the comparison of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0036

    authors: Donald MR,Wilson SR

    updated:2017-03-01 00:00:00

  • Buckley-James boosting for survival analysis with high-dimensional biomarker data.

    abstract::There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In the regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection s...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1550

    authors: Wang Z,Wang CY

    updated:2010-01-01 00:00:00

  • Detecting outlier samples in microarray data.

    abstract::In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1426

    authors: Shieh AD,Hung YS

    updated:2009-01-01 00:00:00

  • Dimension reduction for classification with gene expression microarray data.

    abstract::An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction tech...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1147

    authors: Dai JJ,Lieu L,Rocke D

    updated:2006-01-01 00:00:00

  • Combining nearest neighbor classifiers versus cross-validation selection.

    abstract::Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN metho...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1054

    authors: Paik M,Yang Y

    updated:2004-01-01 00:00:00

  • Empirical bayes microarray ANOVA and grouping cell lines by equal expression levels.

    abstract::In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which e...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1125

    authors: Lönnstedt I,Rimini R,Nilsson P

    updated:2005-01-01 00:00:00

  • Discrete Wavelet Packet Transform Based Discriminant Analysis for Whole Genome Sequences.

    abstract::In recent years, alignment-free methods have been widely applied in comparing genome sequences, as these methods compute efficiently and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods for finding phylogenetic trees. However, it may no...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2018-0045

    authors: Huang HH,Girimurugan SB

    updated:2019-02-15 00:00:00

  • A multiple testing approach to high-dimensional association studies with an application to the detection of associations between risk factors of heart disease and genetic polymorphisms.

    abstract::We present an approach to association studies involving a dozen or so 'response' variables and a few hundred 'explanatory' variables which emphasizes transparency, simplicity, and protection against spurious results. The methods proposed are largely non-parametric, and they are systematically rounded-off by the Benjam...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1420

    authors: Ferreira JA,Berkhof J,Souverein O,Zwinderman K

    updated:2009-01-01 00:00:00

  • Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments.

    abstract::The ENCODE project has funded the generation of a diverse collection of methylation profiles using reduced representation bisulfite sequencing (RRBS) technology, enabling the analysis of epigenetic variation on a genomic scale at single-site resolution. A standard application of RRBS experiments is in the location of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0027

    authors: Lacey MR,Baribault C,Ehrlich M

    updated:2013-12-01 00:00:00

  • TopKLists: a comprehensive R package for statistical inference, stochastic aggregation, and visualization of multiple omics ranked lists.

    abstract::High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there are an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite di...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2014-0093

    authors: Schimek MG,Budinská E,Kugler KG,Švendová V,Ding J,Lin S

    updated:2015-06-01 00:00:00

  • Model selection based on FDR-thresholding optimizing the area under the ROC-curve.

    abstract::We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1462

    authors: Graf AC,Bauer P

    updated:2009-01-01 00:00:00

  • Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies.

    abstract::Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noises, compounded wit...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2018-0039

    authors: Liang Y,Kelemen A,Kelemen A

    updated:2019-05-11 00:00:00

  • Semi-parametric differential expression analysis via partial mixture estimation.

    abstract::We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situa...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1333

    authors: Rossell D,Guerra R,Scott C

    updated:2008-01-01 00:00:00

  • Genetic association test based on principal component analysis.

    abstract::Many gene- and pathway-based association tests have been proposed in the literature. Among them, the SKAT is widely used, especially for rare variants association studies. In this paper, we investigate the connection between SKAT and a principal component analysis. This investigation leads to a procedure that encompas...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0061

    authors: Chen Z,Han S,Wang K

    updated:2017-07-26 00:00:00

  • Empirical bayes estimation of a sparse vector of gene expression changes.

    abstract::Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available numb...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1132

    authors: Erickson S,Sabatti C

    updated:2005-01-01 00:00:00

  • Likelihood-based inference for multi-color optical mapping.

    abstract::Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognitio...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1266

    authors: Tong L,Mets L,McPeek MS

    updated:2007-01-01 00:00:00

  • Multiple testing in candidate gene situations: a comparison of classical, discrete, and resampling-based procedures.

    abstract::In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1729

    authors: Elsäßer A,Victor A,Hommel G

    updated:2011-01-01 00:00:00

  • BayesMendel: an R environment for Mendelian risk prediction.

    abstract::Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using informa...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1063

    authors: Chen S,Wang W,Broman KW,Katki HA,Parmigiani G

    updated:2004-01-01 00:00:00

  • Principal component discriminant analysis.

    abstract::The approach adopted involved two-stages. First the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores a...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1350

    authors: Fearn T

    updated:2008-01-01 00:00:00

  • Genetic linkage analysis in the presence of germline mosaicism.

    abstract::Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1709

    authors: Weissbrod O,Geiger D

    updated:2011-10-04 00:00:00

  • Mapping quantitative trait loci in a non-equilibrium population.

    abstract::The genetic control of a complex trait can be studied by testing and mapping the genotypes of the underlying quantitative trait loci (QTLs) through their associations with observable marker genotypes. All existing statistical methods for QTL mapping assume an equilibrium population, allowing marker-QTL associations to...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1578

    authors: Wu S,Yang J,Wu R

    updated:2010-01-01 00:00:00

  • The cyclohedron test for finding periodic genes in time course expression studies.

    abstract::The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebr...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1286

    authors: Morton J,Pachter L,Shiu A,Sturmfels B

    updated:2007-01-01 00:00:00

  • Asymptotic optimality of likelihood-based cross-validation.

    abstract::Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth index...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1036

    authors: van der Laan MJ,Dudoit S,Keles S

    updated:2004-01-01 00:00:00

  • Polyunphased: an extension to polytomous outcomes of the Unphased package for family-based genetic association analysis.

    abstract::Polytomous phenotypes arise when a disease has multiple subtypes or when two dichotomous phenotypes are analyzed simultaneously. Few software programs offer the option to analyze such phenotypes in family studies, and none implements conditional polytomous logistic regression for within-family analysis robust to popul...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0035

    authors: Bureau A,Croteau J

    updated:2017-03-01 00:00:00

  • A test for detecting differential indirect trans effects between two groups of samples.

    abstract::Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effect of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy number mediated gene-gene interactions differ between groups of samples we propose a new method, n...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0058

    authors: Chaturvedi N,Menezes RX,Goeman JJ,Wieringen WV

    updated:2018-07-31 00:00:00

  • Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

    abstract::Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these m...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1666

    authors: Neuwald AF

    updated:2011-01-01 00:00:00