An adaptive independence test for microbiome community data.

Abstract:

Advances in sequencing technologies and bioinformatics tools have vastly improved our ability to collect and analyze data from complex microbial communities. A major goal of microbiome studies is to correlate overall microbiome composition with clinical or environmental variables. La Rosa et al. recently proposed a parametric test for comparing microbiome populations between two or more groups of subjects. However, this method cannot be used to test the association between community composition and a continuous variable. Although multivariate nonparametric methods based on permutations are widely used in ecology, they lack interpretability and can be inefficient for analyzing microbiome data. We consider the problem of testing for independence between microbial community composition and a continuous or many-valued variable. By partitioning the range of the variable into a few slices, we recast the problem as one of comparing multiple groups of microbiome samples, with each group indexed by a slice. To model multivariate and over-dispersed count data, we use the Dirichlet-multinomial distribution. We propose an adaptive likelihood-ratio test that learns a good partition, or slicing scheme, from the data, and we develop a dynamic programming algorithm for the numerical optimization. We demonstrate the superiority of the proposed test through numerical comparisons with the test of La Rosa et al. and other popular approaches to the same problem, including PERMANOVA, the distance covariance test, and the microbiome regression-based kernel association test. We further apply it to test the association of the gut microbiome with age in three geographically distinct populations and show how the learned partition facilitates differential abundance analysis.
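A minimal Python sketch of the slicing idea described above, under assumptions that are ours rather than the paper's: samples are sorted by the variable and cut into exactly `n_slices` contiguous slices of at least `min_size` samples each; each slice's Dirichlet-multinomial parameters are approximated by Minka's fixed-point iteration; and a dynamic program over cut points picks the slicing with the largest total log-likelihood. The statistic is twice the log-likelihood gain of that slicing over a single pooled fit. All function names (`digamma`, `dm_loglik`, `fit_dm`, `adaptive_lrt`) and the toy data are hypothetical. Because the slicing is chosen from the data, the statistic should not be referred to a standard chi-square distribution; the sketch does not calibrate it and is not the authors' implementation.

```python
import math

# Hypothetical sketch of a sliced Dirichlet-multinomial (DM) likelihood-ratio statistic;
# not the authors' code.


def digamma(x):
    """psi(x) for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def dm_loglik(counts, alpha):
    """DM log-likelihood without the multinomial coefficient, which is the same under
    the pooled and sliced models and therefore cancels in the likelihood ratio."""
    a_sum = sum(alpha)
    total = 0.0
    for c in counts:
        total += math.lgamma(a_sum) - math.lgamma(sum(c) + a_sum)
        total += sum(math.lgamma(n + a) - math.lgamma(a) for n, a in zip(c, alpha))
    return total


def fit_dm(counts, n_iter=500, tol=1e-8, floor=1e-6):
    """Approximate DM maximum likelihood by Minka's fixed-point iteration.
    Assumes every sample has a positive total count."""
    n_taxa = len(counts[0])
    totals = [sum(c) for c in counts]
    pooled = [sum(c[k] for c in counts) for k in range(n_taxa)]
    grand = sum(pooled)
    # Heuristic start (an assumption): pooled proportions scaled to total precision 10.
    alpha = [max(10.0 * p / grand, floor) for p in pooled]
    for _ in range(n_iter):
        a_sum = sum(alpha)
        denom = sum(digamma(t + a_sum) for t in totals) - len(totals) * digamma(a_sum)
        new = []
        for k in range(n_taxa):
            num = sum(digamma(c[k] + alpha[k]) for c in counts) - len(counts) * digamma(alpha[k])
            new.append(max(alpha[k] * num / denom, floor))
        converged = max(abs(a - b) / b for a, b in zip(new, alpha)) < tol
        alpha = new
        if converged:
            break
    return alpha


def adaptive_lrt(counts, x, n_slices=2, min_size=5):
    """Return (statistic, cuts). cuts are sorted-order indices where slices 2..n_slices
    start. Assumes distinct x values (ties could otherwise be split across slices)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    data = [counts[i] for i in order]
    n = len(data)
    cache = {}

    def seg(i, j):  # maximized DM log-likelihood of sorted samples i..j-1, fitted once
        if (i, j) not in cache:
            cache[(i, j)] = dm_loglik(data[i:j], fit_dm(data[i:j]))
        return cache[(i, j)]

    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(n_slices + 1)]  # best[k][j]: first j samples, k slices
    back = [[0] * (n + 1) for _ in range(n_slices + 1)]
    best[0][0] = 0.0
    for k in range(1, n_slices + 1):
        ends = [n] if k == n_slices else range(k * min_size, n - (n_slices - k) * min_size + 1)
        for j in ends:
            for i in range((k - 1) * min_size, j - min_size + 1):
                if best[k - 1][i] == neg:
                    continue
                val = best[k - 1][i] + seg(i, j)
                if val > best[k][j]:
                    best[k][j], back[k][j] = val, i
    if best[n_slices][n] == neg:
        raise ValueError("too few samples for n_slices * min_size")
    cuts, j = [], n
    for k in range(n_slices, 1, -1):  # walk back to recover each slice's start index
        j = back[k][j]
        cuts.append(j)
    return 2.0 * (best[n_slices][n] - seg(0, n)), sorted(cuts)


if __name__ == "__main__":
    # Made-up toy data: composition shifts from taxon 0 to taxon 2 at x = 10.
    x = list(range(20))
    counts = [[30 + (i % 3) * 5, 10 + (i % 2) * 3, 5 + (i % 4)] if i < 10
              else [5 + (i % 4), 10 + (i % 2) * 3, 30 + (i % 3) * 5] for i in x]
    stat, cuts = adaptive_lrt(counts, x, n_slices=2, min_size=4)
    print(round(stat, 1), cuts)
```

Caching the per-segment fits means each contiguous segment is fitted at most once; the dynamic program then only combines cached segment scores, which keeps the search over all admissible slicings tractable for small samples.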

journal_name

Biometrics

journal_title

Biometrics

authors

Song Y,Zhao H,Wang T

doi

10.1111/biom.13154

pub_date

2020-06-01 00:00:00

pages

414-426

issue

2

eissn

1541-0420

issn

0006-341X

journal_volume

76

pub_type

Journal Article
  • Bayesian estimation of the probability of asbestos exposure from lung fiber counts.

    abstract::Asbestos exposure is a well-known risk factor for various lung diseases, and when they occur, workmen's compensation boards need to make decisions concerning the probability the cause is work related. In the absence of a definitive work history, measures of short and long asbestos fibers as well as counts of asbestos ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01279.x

    authors: Weichenthal S,Joseph L,Bélisle P,Dufresne A

    update_date:2010-06-01 00:00:00

  • Identification of differential aberrations in multiple-sample array CGH studies.

    abstract::Most existing methods for identifying aberrant regions with array CGH data are confined to a single target sample. Focusing on the comparison of multiple samples from two different groups, we develop a new penalized regression approach with a fused adaptive lasso penalty to accommodate the spatial dependence of the cl...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01457.x

    authors: Wang HJ,Hu J

    update_date:2011-06-01 00:00:00

  • The proportional odds cumulative incidence model for competing risks.

    abstract::We suggest an estimator for the proportional odds cumulative incidence model for competing risks data. The key advantage of this model is that the regression parameters have the simple and useful odds ratio interpretation. The model has been considered by many authors, but it is rarely used in practice due to the lack...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12330

    authors: Eriksson F,Li J,Scheike T,Zhang MJ

    update_date:2015-09-01 00:00:00

  • Nonparametric Bayesian covariate-adjusted estimation of the Youden index.

    abstract::A novel nonparametric regression model is developed for evaluating the covariate-specific accuracy of a continuous biological marker. Accurately screening diseased from nondiseased individuals and correctly diagnosing disease stage are critically important to health care on several fronts, including guiding recommenda...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12686

    authors: Inácio de Carvalho V,de Carvalho M,Branscum AJ

    update_date:2017-12-01 00:00:00

  • A stochastic model for censored-survival data in the presence of an auxiliary variable.

    abstract::In clinical trials and other investigations of survival time, information is often available on a time-dependent event other than survival. An example of such an auxiliary event in cancer studies is objective progression of disease. While some patients expire without experiencing objective disease progression, others ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lagakos SW

    update_date:1976-09-01 00:00:00

  • Semiparametric modeling of longitudinal measurements and time-to-event data--a two-stage regression calibration approach.

    abstract:SUMMARY:In this article we investigate regression calibration methods to jointly model longitudinal and survival data using a semiparametric longitudinal model and a proportional hazards model. In the longitudinal model, a biomarker is assumed to follow a semiparametric mixed model where covariate effects are modeled p...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2007.00983.x

    authors: Ye W,Lin X,Taylor JM

    update_date:2008-12-01 00:00:00

  • On the Colton model for clinical trials with delayed observations -- normally-distributed responses.

    abstract::The Colton model for the choice between two medical treatments is studied, with the additional assumption that there is a time lag between the administration of the treatments and the availability of the responses. Two simple procedures are suggested for dealing with patients who arrive during the waiting period, caus...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Langenberg P,Srinivasan R

    update_date:1981-03-01 00:00:00

  • Biased and unbiased estimation in longitudinal studies with informative visit processes.

    abstract::The availability of data in longitudinal studies is often driven by features of the characteristics being studied. For example, clinical databases are increasingly being used for research to address longitudinal questions. Because visit times in such data are often driven by patient characteristics that may be related...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12501

    authors: McCulloch CE,Neuhaus JM,Olin RL

    update_date:2016-12-01 00:00:00

  • Combining band recovery data and Pollock's robust design to model temporary and permanent emigration.

    abstract::Capture-recapture models are widely used to estimate demographic parameters of marked populations. Recently, this statistical theory has been extended to modeling dispersal of open populations. Multistate models can be used to estimate movement probabilities among subdivided populations if multiple sites are sampled. ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00273.x

    authors: Lindberg MS,Kendall WL,Hines JE,Anderson MG

    update_date:2001-03-01 00:00:00

  • A score regression approach to assess calibration of continuous probabilistic predictions.

    abstract::Calibration, the statistical consistency of forecast distributions and the observations, is a central requirement for probabilistic predictions. Calibration of continuous forecasts is typically assessed using the probability integral transform histogram. In this article, we propose significance tests based on scoring ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01406.x

    authors: Held L,Rufibach K,Balabdaoui F

    update_date:2010-12-01 00:00:00

  • Further aspects of a Markovian sampling policy for water quality monitoring.

    abstract::In this paper, a Markov process is developed as a mathematical model to study the general problem of quality control monitoring. This approach was previously used by Arnold (1970) in development of sampling plans to study the water quality monitoring of streams. Arnold considered the expected sample size required for ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Smeach SC,Jernigan RW

    update_date:1977-03-01 00:00:00

  • The analysis of pair-matched case-control studies, a multivariate approach.

    abstract::In matched case-control studies one frequently must consider more than one variable in the analysis and in this paper a log-linear model is presented to meet this objective. A conditional argument yields a method for making inferences on the parameters measuring the association between the variables and disease. The r...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Holford TR

    update_date:1978-12-01 00:00:00

  • Markov models for covariate dependence of binary sequences.

    abstract::Suppose that a heterogeneous group of individuals is followed over time and that each individual can be in state 0 or state 1 at each time point. The sequence of states is assumed to follow a binary Markov chain. In this paper we model the transition probabilities for the 0 to 0 and 1 to 0 transitions by two logistic ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Muenz LR,Rubinstein LV

    update_date:1985-03-01 00:00:00

  • Multilevel functional clustering analysis.

    abstract::In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between variability induced by the hierarchical structure in the data...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2011.01714.x

    authors: Serban N,Jiang H

    update_date:2012-09-01 00:00:00

  • Selecting the smoothing parameter for estimation of slowly changing evoked potential signals.

    abstract::Brain evoked potential (EP) data consist of a true response ("signal") and random background activity ("noise"), which are observed over repeated stimulus presentations ("trials"). A signal that changes slowly from trial to trial can be estimated by smoothing across trials and over time within trials. We present a met...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Raz J,Turetsky B,Fein G

    update_date:1989-09-01 00:00:00

  • Analysis of longitudinal data in the presence of informative observational times and a dependent terminal event, with application to medical cost data.

    abstract::In longitudinal observational studies, repeated measures are often taken at informative observation times. Also, there may exist a dependent terminal event such as death that stops the follow-up. For example, patients in poorer health are more likely to seek medical treatment and their medical cost for each visit tend...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2007.00954.x

    authors: Liu L,Huang X,O'Quigley J

    update_date:2008-09-01 00:00:00

  • Effects of exposure misclassification on regression analyses of epidemiologic follow-up study data.

    abstract::In epidemiologic studies, subjects are often misclassified as to their level of exposure. Ignoring this misclassification error in the analysis introduces bias in the estimates of certain parameters and invalidates many hypothesis tests. For situations in which there is misclassification of exposure in a follow-up stu...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Reade-Christopher SJ,Kupper LL

    update_date:1991-06-01 00:00:00

  • Spatial regression and spillover effects in cluster randomized trials with count outcomes.

    abstract::This paper describes methodology for analyzing data from cluster randomized trials with count outcomes, taking indirect effects as well spatial effects into account. Indirect effects are modeled using a novel application of a measure of depth within the intervention arm. Both direct and indirect effects can be estimat...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13316

    authors: Anaya-Izquierdo K,Alexander N

    update_date:2020-06-18 00:00:00

  • Estimating treatment effect in a proportional hazards model in randomized clinical trials with all-or-nothing compliance.

    abstract::We consider methods for estimating the treatment effect and/or the covariate by treatment interaction effect in a randomized clinical trial under noncompliance with time-to-event outcome. As in Cuzick et al. (2007), assuming that the patient population consists of three (possibly latent) subgroups based on treatment p...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12472

    authors: Li S,Gray RJ

    update_date:2016-09-01 00:00:00

  • Logarithmic transformations in ANOVA.

    abstract::A method is presented for choosing an additive constant c when transforming data x to y = log(x + c). The method preserves Type I error probability and power in ANOVA under the assumption that the x + c for some c are log-normally distributed. The method has advantages similar to those of rank transformations--namely,...

    journal_title:Biometrics

    pub_type: Clinical Trial, Journal Article

    doi:

    authors: Berry DA

    update_date:1987-06-01 00:00:00

  • Post-stratification in the randomized clinical trial.

    abstract::A topic of current biometric discussion is whether stratification should be used in randomized clinical trials and, if so, which kind. An approach based upon randomization theory is used to evaluate pre- versus post-stratification. The results obtained relate specifically to the effect of the size of the clinical tria...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: McHugh R,Matts J

    update_date:1983-03-01 00:00:00

  • A study of deleterious gene structure in plants using Markov chain Monte Carlo.

    abstract::The characteristics of deleterious genes have been of great interest in both theory and practice in genetics. Because of the complex genetic mechanism of these deleterious genes, most current studies try to estimate the overall magnitude of mortality effects on a population, which is characterized classically by the n...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00376.x

    authors: Lee JK,Lascoux M,Newton MA,Nordheim EV

    update_date:1999-06-01 00:00:00

  • Cohort case-control design and analysis for clustered failure-time data.

    abstract::Cohort case-control design is an efficient and economical design to study risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified. These designs have been exclusively applied to the analysis of ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2002.00764.x

    authors: Lu SE,Wang MC

    update_date:2002-12-01 00:00:00

  • On the accommodation of disease rate correlations in aggregate data studies of disease risk factors.

    abstract::Prentice and Sheppard (1995, Biometrika 82, 113-125) proposed a method for estimating relative risks associated with poorly measured exposures using disease rates from multiple populations and exposure and confounding factor data from sample surveys of persons in each population. The method involved an assumption of i...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Anderson AB,Prentice RL

    update_date:1998-12-01 00:00:00

  • Spatial cluster detection for weighted outcomes using cumulative geographic residuals.

    abstract::Spatial cluster detection is an important methodology for identifying regions with excessive numbers of adverse health events without making strong model assumptions on the underlying spatial dependence structure. Previous work has focused on point or individual-level outcome data and few advances have been made when ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01323.x

    authors: Cook AJ,Li Y,Arterburn D,Tiwari RC

    update_date:2010-09-01 00:00:00

  • Multivariate bioassay, combination of bioassays, and Fieller's theorem.

    abstract::In this paper alternative methods for estimating the relative potency, its confidence intervals, and testing for proportionality are developed for multivariate bioassays. The test and estimate are based on the smaller characteristic root and the corresponding characteristic vector of a 2 X 2 matrix. The same idea is a...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Srivastava MS

    update_date:1986-03-01 00:00:00

  • Estimating acute air pollution health effects from cohort study data.

    abstract::Traditional studies of short-term air pollution health effects use time series data, while cohort studies generally focus on long-term effects. There is increasing interest in exploiting individual level cohort data to assess short-term health effects in order to understand the mechanisms and time scales of action. We...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12125

    authors: Szpiro AA,Sheppard L,Adar SD,Kaufman JD

    update_date:2014-03-01 00:00:00

  • Exact inference on the random-effects model for meta-analyses with few studies.

    abstract::We describe an exact, unconditional, non-randomized procedure for producing confidence intervals for the grand mean in a normal-normal random effects meta-analysis. The procedure targets meta-analyses based on too few primary studies, ≤ 7, say, to ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12998

    authors: Michael H,Thornton S,Xie M,Tian L

    update_date:2019-06-01 00:00:00

  • A Monte Carlo investigation of homogeneity tests of the odds ratio under various sample size configurations.

    abstract::Epidemiologic data for case-control studies are often summarized into K 2 x 2 tables. Given a fixed number of cases and controls, the degree of sparseness in the data depends on the number of strata, K. The effect of increasing stratification on size and power of seven tests of homogeneity of the odds ratio is studied...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Jones MP,O'Gorman TW,Lemke JH,Woolson RF

    update_date:1989-03-01 00:00:00

  • Robust semiparametric microarray normalization and significance analysis.

    abstract::Microarray technology allows the monitoring of expression levels of thousands of genes simultaneously. A semiparametric location and scale model is proposed to model gene expression levels for normalization and significance analysis purposes. Robust estimation based on weighted least absolute deviation regression and ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2005.00452.x

    authors: Ma S,Kosorok MR,Huang J,Xie H,Manzella L,Soares MB

    update_date:2006-06-01 00:00:00