Modelling of zero-inflation improves inference of metagenomic gene count data.

Abstract:

Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data is inherently noisy and contains high levels of biological and technical variability as well as an excess of zeros due to non-detected genes. This makes the statistical analysis challenging. In this study, we present a new hierarchical Bayesian model for inference of metagenomic gene abundance data. The model uses a zero-inflated overdispersed Poisson distribution which is able to simultaneously capture the high gene-specific variability as well as zero observations in the data. By analysis of three comprehensive datasets, we show that zero-inflation is common in metagenomic data from the human gut and, if not correctly modelled, it can lead to substantial reductions in statistical power. We also show, by using resampled metagenomic data, that our model has, compared to other methods, a higher and more stable performance for detecting differentially abundant genes. We conclude that proper modelling of the gene-specific variability, including the excess of zeros, is necessary to accurately describe gene abundances in metagenomic data. The proposed model will thus pave the way for new biological insights into the structure of microbial communities.
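The abstract does not spell out the exact form of the zero-inflated overdispersed Poisson distribution. As a minimal sketch, assuming a Poisson-gamma (negative binomial) mixture as the overdispersed component, the Python below shows how a zero-inflation probability `pi` adds structural zeros (non-detected genes) on top of the count distribution, and how one might compare the observed share of zeros with what a plain negative binomial predicts. This is not the authors' hierarchical Bayesian model; the functions `nb_pmf`, `zinb_pmf` and `zero_excess` and the parameters `mu`, `phi` and `pi` are hypothetical names chosen for illustration.

```python
import math
import statistics

# Illustrative sketch only: assumes negative binomial (Poisson-gamma)
# overdispersion; this is NOT the model fitted in the paper.


def nb_pmf(y: int, mu: float, phi: float) -> float:
    """Negative binomial pmf with mean mu > 0 and dispersion phi > 0,
    so that Var(Y) = mu + phi * mu**2."""
    r = 1.0 / phi  # gamma shape, i.e. the NB "size" parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             - r * math.log1p(mu / r)          # r * log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, phi: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the gene is not detected
    (a structural zero); otherwise the count follows NB(mu, phi)."""
    structural_zero = pi if y == 0 else 0.0
    return structural_zero + (1.0 - pi) * nb_pmf(y, mu, phi)


def zero_excess(counts: list[int]) -> tuple[float, float]:
    """Return (observed, expected) fractions of zeros, where 'expected'
    comes from a plain NB fitted by the method of moments."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)          # sample variance (n - 1)
    phi = max((v - m) / m**2, 1e-8)          # clamp: no underdispersion
    expected = nb_pmf(0, m, phi)
    observed = sum(c == 0 for c in counts) / len(counts)
    return observed, expected
```

As `phi` approaches 0 the negative binomial reduces to a Poisson distribution. The method-of-moments fit in `zero_excess` is only a rough screening device, since a large `phi` can partly absorb excess zeros.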

journal_name

Stat Methods Med Res

authors

Jonsson V,Österlund T,Nerman O,Kristiansson E

doi

10.1177/0962280218811354

subject

Has Abstract

pub_date

2019-12-01

pages

3712-3728

issue

12

eissn

1477-0334

issn

0962-2802

journal_volume

28

pub_type

Journal Article

  • Reference-based pattern-mixture models for analysis of longitudinal binary data.

    abstract: Pattern-mixture model (PMM)-based controlled imputations have become a popular tool to assess the sensitivity of primary analysis inference to different post-dropout assumptions or to estimate treatment effectiveness. The methodology is well established for continuous responses but less well established for binary res...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280220941880

    authors: Lu K

    update_date: 2020-12-01

  • Probability intervals of toxicity and efficacy design for dose-finding clinical trials in oncology.

    abstract: Immunotherapy, gene therapy or adoptive cell therapies, such as the chimeric antigen receptor+ T-cell therapies, have demonstrated promising therapeutic effects in oncology patients. We consider statistical designs for dose-finding adoptive cell therapy trials, in which the monotonic dose-response relationship assumed...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280220977009

    authors: Lin X,Ji Y

    update_date: 2020-12-16

  • Estimating the dependence of mixed sensitive response types in randomized response technique.

    abstract: Sensitive questions are often involved in healthcare or medical survey research. Much empirical evidence has shown that the randomized response technique is useful for the collection of truthful responses. However, few studies have discussed methods to estimate the dependence of sensitive responses of multiple types. ...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280219847492

    authors: Chu AM,So MK,Chan TW,Tiwari A

    update_date: 2020-03-01

  • Relative efficiency of unequal cluster sizes for variance component estimation in cluster randomized and multicentre trials.

    abstract: Cluster randomized and multicentre trials evaluate the effect of a treatment on persons nested within clusters, for instance patients within clinics or pupils within schools. Although equal sample sizes per cluster are generally optimal for parameter estimation, they are rarely feasible. This paper addresses the relat...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280206079018

    authors: van Breukelen GJ,Candel MJ,Berger MP

    update_date: 2008-08-01

  • Expected p-values in light of an ROC curve analysis applied to optimal multiple testing procedures.

    abstract: Many statistical studies report p-values for inferential purposes. In several scenarios, the stochastic aspect of p-values is neglected, which may contribute to drawing wrong conclusions in real data experiments. The stochastic nature of p-values makes their use to examine the performance of given testing procedures o...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280217704451

    authors: Vexler A,Yu J,Zhao Y,Hutson AD,Gurevich G

    update_date: 2018-12-01

  • Efficient estimation of a linear transformation model for current status data via penalized splines.

    abstract: We propose a flexible and computationally efficient penalized estimation method for a semi-parametric linear transformation model with current status data. To facilitate model fitting, the unknown monotone function is approximated by monotone B-splines, and a computationally efficient hybrid algorithm involving the Fi...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280218820406

    authors: Lu M,Liu Y,Li CS

    update_date: 2020-01-01

  • Combining estimates of the odds ratio: the state of the art.

    abstract: Medical research commonly relies on the combination of 2 x 2 tables of counted data for making inferences about treatment effects or about the causes of disease. This article reviews point estimation and interval estimation for a common odds ratio. Traditional methods for providing these estimates face special challen...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article, Review

    doi: 10.1177/096228029400300204

    authors: Emerson JD

    update_date: 1994-01-01

  • Sample size for binary logistic prediction models: Beyond events per variable criteria.

    abstract: Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictor...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280218784726

    authors: van Smeden M,Moons KG,de Groot JA,Collins GS,Altman DG,Eijkemans MJ,Reitsma JB

    update_date: 2019-08-01

  • Modelling breast cancer tumour growth for a stable disease population.

    abstract: Statistical models of breast cancer tumour progression have been used to further our knowledge of the natural history of breast cancer, to evaluate mammography screening in terms of mortality, to estimate overdiagnosis, and to estimate the impact of lead-time bias when comparing survival times between screen detected ...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280217734583

    authors: Isheden G,Humphreys K

    update_date: 2019-03-01

  • Random-effects models for multivariate repeated measures.

    abstract: Mixed models are widely used for the analysis of one repeatedly measured outcome. If more than one outcome is present, a mixed model can be used for each one. These separate models can be tied together into a multivariate mixed model by specifying a joint distribution for their random effects. This strategy has been u...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280206075305

    authors: Fieuws S,Verbeke G,Molenberghs G

    update_date: 2007-10-01

  • Copas-like selection model to correct publication bias in systematic review of diagnostic test studies.

    abstract: The accuracy of a diagnostic test, which is often quantified by a pair of measures such as sensitivity and specificity, is critical for medical decision making. Separate studies of an investigational diagnostic test can be combined through meta-analysis; however, such an analysis can be threatened by publication bias....

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280218791602

    authors: Piao J,Liu Y,Chen Y,Ning J

    update_date: 2019-10-01

  • Projections of cancer mortality risks using spatio-temporal P-spline models.

    abstract: Cancer mortality risk estimates are essential for planning resource allocation and designing and evaluating cancer prevention and management strategies. However, mortality figures generally become available after a few years, making it necessary to develop reliable procedures to provide current and near future mortality ...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280212446366

    authors: Ugarte MD,Goicoa T,Etxeberria J,Militino AF

    update_date: 2012-10-01

  • Modeling fecundity in the presence of a sterile fraction using a semi-parametric transformation model for grouped survival data.

    abstract: The analysis of fecundity data is challenging and requires consideration of both highly timed and interrelated biologic processes in the context of essential behaviors such as sexual intercourse during the fertile window. Understanding human fecundity is further complicated by presence of a sterile population, i.e. co...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280212438646

    authors: McLain AC,Sundaram R,Buck Louis GM

    update_date: 2016-02-01

  • Inferences about population means of health care costs.

    abstract: The analysis of health care costs is complicated by the skewed and heteroscedastic nature of their distribution with possibly additional zero values. Statistical methods that do not adjust for these features can lead to incorrect conclusions. This paper reviews recent developments in statistical methods for the analys...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article, Review

    doi: 10.1191/0962280202sm290ra

    authors: Zhou XH

    update_date: 2002-08-01

  • Measurement error correction using validation data: a review of methods and their applicability in case-control studies.

    abstract: Measurement error is a serious problem in the analysis of epidemiological data. In the past 20 years, a large number of methods for the correction of measurement error have been developed. While at the beginning mostly methods for cohort studies were considered, recently more attention has been paid to case-control st...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article, Review

    doi: 10.1177/096228020000900504

    authors: Thürigen D,Spiegelman D,Blettner M,Heuer C,Brenner H

    update_date: 2000-10-01

  • Bayesian latent structure modeling of walking behavior in a physical activity intervention.

    abstract: The analysis of walking behavior in a physical activity intervention is considered. A Bayesian latent structure modeling approach is proposed whereby the ability and willingness of participants are modeled via latent effects. The dropout process is jointly modeled via a linked survival model. Computational issues are a...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280214529932

    authors: Lawson AB,Ellerbe C,Carroll R,Alia K,Coulon S,Wilson DK,VanHorn ML,George SM

    update_date: 2016-12-01

  • Prospective analysis of infectious disease surveillance data using syndromic information.

    abstract: In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease coun...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280214527385

    authors: Corberán-Vallet A,Lawson AB

    update_date: 2014-12-01

  • The application of multidimensional scaling methods to epidemiological data.

    abstract: This paper illustrates the use of multidimensional scaling methods (MDS) to examine space-time patterns in epidemic data. The paper begins by outlining the principles of MDS. The model is then formally specified and illustrated by application to two data sets. The first is partly a tutorial example. It uses monthly re...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/096228029500400202

    authors: Cliff AD,Haggett P,Smallman-Raynor MR,Stroup DF,Williamson GD

    update_date: 1995-06-01

  • Estimation of regression quantiles in complex surveys with data missing at random: An application to birthweight determinants.

    abstract: The estimation of population parameters using complex survey data requires careful statistical modelling to account for the design features. This is further complicated by unit and item nonresponse for which a number of methods have been developed in order to reduce estimation bias. In this paper, we address some issu...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280213484401

    authors: Geraci M

    update_date: 2016-08-01

  • Inferences about a linear combination of proportions.

    abstract: Statistical methods for carrying out asymptotic inferences (tests or confidence intervals) relative to one or two independent binomial proportions are very frequent. However, inferences about a linear combination of K independent proportions L = Σβ(i)p(i) (in which the first two are special cases) have had very little...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280209347953

    authors: Martín Andrés A,Alvarez Hernández M,Herranz Tejedor I

    update_date: 2011-08-01

  • Multiplicity adjustments in trials with two correlated comparisons of interest.

    abstract: Clinical trials investigating the efficacy of two or more doses of an experimental treatment compared to a single reference arm are not uncommon. In such situations, if each dose is compared to the reference arm using an un-adjusted significance level, consideration of the Type I familywise error is likely to be requi...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280210378943

    authors: Fernandes N,Stone A

    update_date: 2011-12-01

  • Statistical challenges in assessing potential efficacy of complex interventions in pilot or feasibility studies.

    abstract: Early phase trials of complex interventions currently focus on assessing the feasibility of a large randomised control trial and on conducting pilot work. Assessing the efficacy of the proposed intervention is generally discouraged, due to concerns of underpowered hypothesis testing. In contrast, early assessment of e...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280215589507

    authors: Wilson DT,Walwyn RE,Brown J,Farrin AJ,Brown SR

    update_date: 2016-06-01

  • Interval estimation of a population mean using existing knowledge or data on effect sizes.

    abstract: Bayes or empirical Bayes methods to improve inferential accuracy for a population mean have been widely adopted in medical research. As the joint prior distribution of both the mean and variance parameters can be difficult to specify or estimate, most of these methods have relied on a certain level of simplification of ...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280218773537

    authors: Shen C

    update_date: 2019-06-01

  • A goodness-of-fit test for the random-effects distribution in mixed models.

    abstract: In this paper, we develop a simple diagnostic test for the random-effects distribution in mixed models. The test is based on the gradient function, a graphical tool proposed by Verbeke and Molenberghs to check the impact of assumptions about the random-effects distribution in mixed models on inferences. Inference is c...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280214564721

    authors: Efendi A,Drikvandi R,Verbeke G,Molenberghs G

    update_date: 2017-04-01

  • A generalization of functional clustering for discrete multivariate longitudinal data.

    abstract: This paper presents a new model-based generalized functional clustering method for discrete longitudinal data, such as multivariate binomial and Poisson distributed data. For this purpose, we propose a multivariate functional principal component analysis (MFPCA)-based clustering procedure for a latent multivariate Gau...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280220921912

    authors: Lim Y,Cheung YK,Oh HS

    update_date: 2020-11-01

  • Multi-state Markov models in cancer screening evaluation: a brief review and case study.

    abstract: This work presents a brief overview of Markov models in cancer screening evaluation and focuses on two specific models. A three-state model was first proposed to estimate jointly the sensitivity of the screening procedure and the average duration in the preclinical phase, i.e. the period when the cancer is asymptomati...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article, Review

    doi: 10.1177/0962280209359848

    authors: Uhry Z,Hédelin G,Colonna M,Asselain B,Arveux P,Rogel A,Exbrayat C,Guldenfels C,Courtial I,Soler-Michel P,Molinié F,Eilstein D,Duffy SW

    update_date: 2010-10-01

  • Inferential tools in penalized logistic regression for small and sparse data: A comparative study.

    abstract: This paper focuses on inferential tools in the logistic regression model fitted by the Firth penalized likelihood. In this context, the Likelihood Ratio statistic is often reported to be the preferred choice as compared to the 'traditional' Wald statistic. In this work, we consider and discuss a wider range of test st...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280216661213

    authors: Siino M,Fasola S,Muggeo VM

    update_date: 2018-05-01

  • A composite likelihood approach to predict the sex of the baby.

    abstract: Couples with diseases associated with the sexual chromosomes, as well as families in countries where the desire for a male is extreme, are interested in influencing the sex of the baby. We propose an original composite likelihood approach to analyse the relation between sex of the newborn and timing of the intercourse...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280217702415

    authors: Tiberi S,Scarpa B,Sartori N

    update_date: 2018-11-01

  • Optimal quantile level selection for disease classification and biomarker discovery with application to electrocardiogram data.

    abstract: Classification with a large number of predictors and biomarker discovery have become increasingly important in biological and medical research. This paper focuses on performing classification of cardiovascular diseases based on electrocardiogram analysis which deals with many variables and a lot of measurements within vari...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280217699996

    authors: Zhou Y,Huang R,Yu S,Ma Y

    update_date: 2018-11-01

  • Bayesian nonparametric inference for the three-class Youden index and its associated optimal cutoff points.

    abstract: The three-class Youden index serves both as a measure of medical test accuracy and a criterion to choose the optimal pair of cutoff values for classifying subjects into three ordinal disease categories (e.g. no disease, mild disease, advanced disease). We present a Bayesian nonparametric approach for estimating the th...

    journal_title: Statistical methods in medical research

    pub_type: Journal Article

    doi: 10.1177/0962280217742538

    authors: Carvalho VI,Branscum AJ

    update_date: 2018-03-01