Fully Bayesian mixture model for differential gene expression: simulations and model checks.

Abstract:

We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene variances as exchangeable to allow for variability between genes. We show how the proportion of differentially expressed genes, and the mixture parameters, can be estimated in a fully Bayesian way, extending previous approaches where this proportion was fixed and empirically estimated. Good estimates of the false discovery rates are also obtained. Different parametric families for the mixture components can lead to quite different classifications of genes for a given data set. Using Affymetrix data from a knock out and wildtype mice experiment, we show how predictive model checks can be used to guide the choice between possible mixture priors. These checks show that extending the mixture model to allow extra variability around zero instead of the usual point mass null fits the data better. A software package for R is available.

authors

Lewin A,Bochkina N,Richardson S

doi

10.2202/1544-6115.1314

pub_date

2007-01-01 00:00:00

pages

Article 36

eissn

2194-6302

issn

1544-6115

journal_volume

6

pub_type

Journal Article
  • Weighted-LASSO for structured network inference from time course data.

    abstract::We present a weighted-LASSO method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to own prior internal structures of connectivity which drive the inference method. This ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1519

    authors: Charbonnier C,Chiquet J,Ambroise C

    update_date:2010-01-01 00:00:00

  • Second order optimization for the inference of gene regulatory pathways.

    abstract::With the increasing availability of experimental data on gene interactions, modeling of gene regulatory pathways has gained special attention. Gradient descent algorithms have been widely used for regression and classification applications. Unfortunately, results obtained after training a model by gradient descent are...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2012-0021

    authors: Das M,Murthy CA,De RK

    update_date:2014-02-01 00:00:00

  • Approximating the variance of the conditional probability of the state of a hidden Markov model.

    abstract::In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe dire...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article, Review

    doi:10.2202/1544-6115.1296

    authors: Siegmund DO,Yakir B

    update_date:2007-01-01 00:00:00

  • A probabilistic approach to large-scale association scans: a semi-Bayesian method to detect disease-predisposing alleles.

    abstract::Recent analytic and technological breakthroughs have set the stage for genome-wide linkage disequilibrium studies to map disease-susceptibility variants. This paper discusses a probabilistic methodology for making disease-mapping inferences in large-scale case-control genetic studies. The semi-Bayesian approach promot...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1168

    authors: Schrodi SJ

    update_date:2005-01-01 00:00:00

  • Node sampling for protein complex estimation in bait-prey graphs.

    abstract::In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data freque...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2015-0007

    authors: Scholtens DM,Spencer BD

    update_date:2015-08-01 00:00:00

  • Asymptotic optimality of likelihood-based cross-validation.

    abstract::Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth index...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1036

    authors: van der Laan MJ,Dudoit S,Keles S

    update_date:2004-01-01 00:00:00

  • LCox: a tool for selecting genes related to survival outcomes using longitudinal gene expression data.

    abstract::Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, fo...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0060

    authors: Sun J,Herazo-Maya JD,Wang JL,Kaminski N,Zhao H

    update_date:2019-02-13 00:00:00

  • Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    abstract::Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, this data generated by mass spectrometry has many missing values resulting when a com...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0021

    authors: Taylor SL,Leiserowitz GS,Kim K

    update_date:2013-12-01 00:00:00

  • Detecting outlier samples in microarray data.

    abstract::In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1426

    authors: Shieh AD,Hung YS

    update_date:2009-01-01 00:00:00

  • Empirical Bayes microarray ANOVA and grouping cell lines by equal expression levels.

    abstract::In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which e...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1125

    authors: Lönnstedt I,Rimini R,Nilsson P

    update_date:2005-01-01 00:00:00

  • Principal component discriminant analysis.

    abstract::The approach adopted involved two-stages. First the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores a...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1350

    authors: Fearn T

    update_date:2008-01-01 00:00:00

  • Approximate maximum likelihood estimation for population genetic inference.

    abstract::In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods became popular. However, these methods such as Approximate Bayesian Computation (ABC...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0016

    authors: Bertl J,Ewing G,Kosiol C,Futschik A

    update_date:2017-11-27 00:00:00

  • TopKLists: a comprehensive R package for statistical inference, stochastic aggregation, and visualization of multiple omics ranked lists.

    abstract::High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there are an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite di...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2014-0093

    authors: Schimek MG,Budinská E,Kugler KG,Švendová V,Ding J,Lin S

    update_date:2015-06-01 00:00:00

  • On the operational characteristics of the Benjamini and Hochberg False Discovery Rate procedure.

    abstract::Multiple testing procedures are commonly used in gene expression studies for the detection of differential expression, where typically thousands of genes are measured over at least two experimental conditions. Given the need for powerful testing procedures, and the attendant danger of false positives in multiple testi...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1302

    authors: Green GH,Diggle PJ

    update_date:2007-01-01 00:00:00

  • Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

    abstract::Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these m...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1666

    authors: Neuwald AF

    update_date:2011-01-01 00:00:00

  • Comparison and visualisation of agreement for paired lists of rankings.

    abstract::Output from analysis of a high-throughput 'omics' experiment very often is a ranked list. One commonly encountered example is a ranked list of differentially expressed genes from a gene expression experiment, with a length of many hundreds of genes. There are numerous situations where interest is in the comparison of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0036

    authors: Donald MR,Wilson SR

    update_date:2017-03-01 00:00:00

  • Genetic linkage analysis in the presence of germline mosaicism.

    abstract::Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1709

    authors: Weissbrod O,Geiger D

    update_date:2011-10-04 00:00:00

  • Buckley-James boosting for survival analysis with high-dimensional biomarker data.

    abstract::There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In the regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection s...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1550

    authors: Wang Z,Wang CY

    update_date:2010-01-01 00:00:00

  • A method to increase the power of multiple testing procedures through sample splitting.

    abstract::Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the e...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1148

    authors: Rubin D,Dudoit S,van der Laan M

    update_date:2006-01-01 00:00:00

  • Polyunphased: an extension to polytomous outcomes of the Unphased package for family-based genetic association analysis.

    abstract::Polytomous phenotypes arise when a disease has multiple subtypes or when two dichotomous phenotypes are analyzed simultaneously. Few software programs offer the option to analyze such phenotypes in family studies, and none implements conditional polytomous logistic regression for within-family analysis robust to popul...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0035

    authors: Bureau A,Croteau J

    update_date:2017-03-01 00:00:00

  • BayesMendel: an R environment for Mendelian risk prediction.

    abstract::Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using informa...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1063

    authors: Chen S,Wang W,Broman KW,Katki HA,Parmigiani G

    update_date:2004-01-01 00:00:00

  • Transmission disequilibrium test power and sample size in the presence of locus heterogeneity.

    abstract::Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this re...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1501

    authors: Chen C,Yang G,Buyske S,Matise T,Finch SJ,Gordon D

    update_date:2009-01-01 00:00:00

  • Combining nearest neighbor classifiers versus cross-validation selection.

    abstract::Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN metho...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1054

    authors: Paik M,Yang Y

    update_date:2004-01-01 00:00:00

  • A test for detecting differential indirect trans effects between two groups of samples.

    abstract::Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effect of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy number mediated gene-gene interactions differ between groups of samples we propose a new method, n...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0058

    authors: Chaturvedi N,Menezes RX,Goeman JJ,Wieringen WV

    update_date:2018-07-31 00:00:00

  • The cyclohedron test for finding periodic genes in time course expression studies.

    abstract::The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebr...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1286

    authors: Morton J,Pachter L,Shiu A,Sturmfels B

    update_date:2007-01-01 00:00:00

  • Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments.

    abstract::The ENCODE project has funded the generation of a diverse collection of methylation profiles using reduced representation bisulfite sequencing (RRBS) technology, enabling the analysis of epigenetic variation on a genomic scale at single-site resolution. A standard application of RRBS experiments is in the location of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0027

    authors: Lacey MR,Baribault C,Ehrlich M

    update_date:2013-12-01 00:00:00

  • Multiple testing in candidate gene situations: a comparison of classical, discrete, and resampling-based procedures.

    abstract::In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1729

    authors: Elsäßer A,Victor A,Hommel G

    update_date:2011-01-01 00:00:00

  • Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting.

    abstract::Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established conce...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0038

    authors: Dazard JE,Ishwaran H,Mehlotra R,Weinberg A,Zimmerman P

    update_date:2018-02-17 00:00:00

  • Variance and covariance heterogeneity analysis for detection of metabolites associated with cadmium exposure.

    abstract::In this study, we propose a novel statistical framework for detecting progressive changes in molecular traits as response to a pathogenic stimulus. In particular, we propose to employ Bayesian hierarchical models to analyse changes in mean level, variance and correlation of metabolic traits in relation to covariates. ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0041

    authors: Salamanca BV,Ebbels TM,Iorio MD

    update_date:2014-04-01 00:00:00

  • Semi-parametric differential expression analysis via partial mixture estimation.

    abstract::We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situa...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1333

    authors: Rossell D,Guerra R,Scott C

    update_date:2008-01-01 00:00:00