Improving propensity score weighting using machine learning.

Abstract:

Machine learning techniques such as classification and regression trees (CART) have been suggested as promising alternatives to logistic regression for the estimation of propensity scores. The authors examined the performance of various CART-based propensity score models using simulated data. Hypothetical studies of varying sample sizes (n=500, 1000, 2000) with a binary exposure, continuous outcome, and 10 covariates were simulated under seven scenarios differing by degree of non-linear and non-additive associations between covariates and the exposure. Propensity score weights were estimated using logistic regression (all main effects), CART, pruned CART, and the ensemble methods of bagged CART, random forests, and boosted CART. Performance metrics included covariate balance, standard error, per cent absolute bias, and 95 per cent confidence interval (CI) coverage. All methods displayed generally acceptable performance under conditions of either non-linearity or non-additivity alone. However, under conditions of both moderate non-additivity and moderate non-linearity, logistic regression had subpar performance, whereas ensemble methods provided substantially better bias reduction and more consistent 95 per cent CI coverage. The results suggest that ensemble methods, especially boosted CART, may be useful for propensity score weighting.
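As a rough illustration of the weighting and covariate-balance steps the abstract mentions, the following minimal Python sketch turns already-estimated propensity scores into weights and checks balance on one covariate. It is not the authors' code. The abstract does not state which weighting estimand was used, so the inverse-probability (average treatment effect) form here is an assumption, as are all function names and the toy data. The propensity scores themselves could come from any of the models listed above (logistic regression, CART, boosted CART, and so on).

```python
import math


def iptw_weights(exposed, propensity):
    """Inverse-probability weights (ATE form; this estimand is an assumption)."""
    weights = []
    for a, e in zip(exposed, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Exposed units get 1/e; unexposed units get 1/(1 - e).
        weights.append(1.0 / e if a == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def split_by_exposure(values, exposed):
    """Return (values of exposed units, values of unexposed units)."""
    on = [v for v, a in zip(values, exposed) if a == 1]
    off = [v for v, a in zip(values, exposed) if a == 0]
    return on, off


def standardized_mean_difference(x, exposed, weights):
    """Weighted difference in group means, scaled by the pooled unweighted SD."""
    x1, x0 = split_by_exposure(x, exposed)
    w1, w0 = split_by_exposure(weights, exposed)
    pooled_sd = math.sqrt((sample_variance(x1) + sample_variance(x0)) / 2.0)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


def weighted_effect(y, exposed, weights):
    """Difference in weighted mean outcomes between exposed and unexposed."""
    y1, y0 = split_by_exposure(y, exposed)
    w1, w0 = split_by_exposure(weights, exposed)
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Toy data (hypothetical): four units, one covariate, one continuous outcome.
exposed = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]  # assumed output of some propensity model
covariate = [2.0, 1.0, 1.0, 0.0]
outcome = [5.0, 3.0, 2.0, 1.0]

weights = iptw_weights(exposed, propensity)
print("balance before:", standardized_mean_difference(covariate, exposed, [1.0] * 4))
print("balance after: ", standardized_mean_difference(covariate, exposed, weights))
print("weighted effect:", weighted_effect(outcome, exposed, weights))
```

A standardized mean difference closer to zero after weighting indicates better covariate balance, which is one of the performance metrics named in the abstract.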

journal_name

Stat Med

journal_title

Statistics in medicine

authors

Lee BK,Lessler J,Stuart EA

doi

10.1002/sim.3782

pub_date

2010-02-10 00:00:00

pages

337-46

issue

3

eissn

1097-0258

issn

0277-6715

journal_volume

29

pub_type

Journal Article

  • Measurement in clinical trials: a neglected issue for statisticians?

    abstract::Biostatisticians have frequently uncritically accepted the measurements provided by their medical colleagues engaged in clinical research. Such measures often involve considerable loss of information. Particularly unfortunate is the widespread use of the so-called 'responder analysis', which may involve not only a lo...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3603

    authors: Senn S,Julious S

    update_date:2009-11-20 00:00:00

  • Robust estimation for linear panel data models.

    abstract::In different fields of applications including, but not limited to, behavioral, environmental, medical sciences, and econometrics, the use of panel data regression models has become increasingly popular as a general framework for making meaningful statistical inferences. However, when the ordinary least squares (OLS) m...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8732

    authors: Hamiye Beyaztas B,Bandyopadhyay S

    update_date:2020-12-20 00:00:00

  • Proportional hazards models with frailties and random effects.

    abstract::We discuss some of the fundamental concepts underlying the development of frailty and random effects models in survival. One of these fundamental concepts was the idea of a frailty model where each subject has his or her own disposition to failure, their so-called frailty, additional to any effects we wish to quantify...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1259

    authors: O'Quigley J,Stare J

    update_date:2002-11-15 00:00:00

  • Conditional power and predictive power based on right censored data with supplementary auxiliary information.

    abstract::Conditional power and predictive power provide estimates of the probability of success at the end of the trial based on the information from the interim analysis. The observed value of the time to event endpoint at the interim analysis could be biased for the true treatment effect due to early censoring, leading to a ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7673

    authors: Sun L,Wan Y

    update_date:2018-08-15 00:00:00

  • Fast linear mixed model computations for genome-wide association studies with longitudinal data.

    abstract::Genome-wide association studies are characterized by a huge number of statistical tests performed to discover new disease-related genetic variants [in the form of single-nucleotide polymorphisms (SNPs)] in human DNA. Many SNPs have been identified for cross-sectionally measured phenotypes. However, there is a growing ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5517

    authors: Sikorska K,Rivadeneira F,Groenen PJ,Hofman A,Uitterlinden AG,Eilers PH,Lesaffre E

    update_date:2013-01-15 00:00:00

  • Analysis of panel data under hidden mover-stayer models.

    abstract::Analysis of panel data is often challenged by the presence of heterogeneity and state misclassification. In this paper, we propose a hidden mover-stayer model to facilitate heterogeneity for a population that consists of two subpopulations each of movers or of stayers and to simultaneously account for state misclassif...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7346

    authors: Yi GY,He W,He F

    update_date:2017-09-10 00:00:00

  • A method to test for a recent increase in HIV-1 seroconversion incidence: results from the Multicenter AIDS Cohort Study (MACS).

    abstract::We have formulated the problem of determining whether there has been an upturn in HIV-1 seroconversion incidence over the first five years of follow-up in the Multicenter AIDS Cohort Study (MACS) as that of locating the minimum of a quadratic regression or examination of two-knot piecewise spline models. Under a quadr...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Multicenter Study

    doi:10.1002/sim.4780120207

    authors: Zhou SY,Kingsley LA,Taylor JM,Chmiel JS,He DY,Hoover DR

    update_date:1993-01-30 00:00:00

  • Regression analysis of clustered interval-censored data with informative cluster size.

    abstract::Interval-censored data are commonly found in studies of diseases that progress without symptoms, which require clinical evaluation for detection. Several techniques have been suggested with independent assumption. However, the assumption will not be valid if observations come from clusters. Furthermore, when the clust...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4042

    authors: Kim YJ

    update_date:2010-12-10 00:00:00

  • Estimation of the wild-type minimum inhibitory concentration value distribution.

    abstract::Antimicrobial resistance has become one of the main public health burdens of the last decades, and monitoring the development and spread of non-wild-type isolates has therefore gained increased interest. Monitoring is performed based on the minimum inhibitory concentration (MIC) values, which are collected through the...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5939

    authors: Jaspers S,Aerts M,Verbeke G,Beloeil PA

    update_date:2014-01-30 00:00:00

  • Logistic regression with incompletely observed categorical covariates--investigating the sensitivity against violation of the missing at random assumption.

    abstract::Missing values in the covariates are a widespread complication in the statistical inference of regression models. The maximum likelihood principle requires specification of the distribution of the covariates, at least in part. For categorical covariates, log-linear models can be used. Additionally, the missing at rand...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780141205

    authors: Vach W,Blettner M

    update_date:1995-06-30 00:00:00

  • Location-scale cumulative odds models for ordinal data: a generalized non-linear model approach.

    abstract::Proportional odds regression models for multinomial probabilities based on ordered categories have been generalized in two somewhat different directions. Models having scale as well as location parameters for adjustment of boundaries (on an unobservable, underlying continuum) between categories have been employed in t...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780141105

    authors: Cox C

    update_date:1995-06-15 00:00:00

  • Exploring the benefits of adaptive sequential designs in time-to-event endpoint settings.

    abstract::Sequential analysis is frequently employed to address ethical and financial issues in clinical trials. Sequential analysis may be performed using standard group sequential designs, or, more recently, with adaptive designs that use estimates of treatment effect to modify the maximal statistical information to be collec...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4156

    authors: Emerson SC,Rudser KD,Emerson SS

    update_date:2011-05-20 00:00:00

  • Predictive diagnostics for logistic models.

    abstract::Novel methodology is implemented to assess the predictive power of covariate information associated with sequential binary events. Logistic models are first fitted on the basis of a subset of the observations and then evaluated sequentially on the rest. The probabilistic forecasts are compared to the outcomes via a sc...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(SICI)1097-0258(19961030)15:20<2149::AID-S

    authors: Seillier-Moiseiwitsch F

    update_date:1996-10-30 00:00:00

  • Viral load detectability profiles for HIV infection.

    abstract::The introduction of potent antiretroviral therapies for treatment of HIV infection typically results in a dramatic reduction in plasma HIV RNA concentration, often to levels undetectable by current measurement practices. However, although a high proportion of patients achieve 'undetectability', many then experience a ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1325

    authors: McKinnon EJ,James IR,John M,Mallal SA

    update_date:2003-02-15 00:00:00

  • A review of methods for futility stopping based on conditional power.

    abstract::Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the c...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/sim.2151

    authors: Lachin JM

    update_date:2005-09-30 00:00:00

  • Latent transition analysis: inference and estimation.

    abstract::Parameters for latent transition analysis (LTA) are easily estimated by maximum likelihood (ML) or Bayesian method via Markov chain Monte Carlo (MCMC). However, unusual features in the likelihood can cause difficulties in ML and Bayesian inference and estimation, especially with small samples. In this study we explore...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3130

    authors: Chung H,Lanza ST,Loken E

    update_date:2008-05-20 00:00:00

  • Graphical model checking with correlated response data.

    abstract::Correlated response data arise often in biomedical studies. The generalized estimation equation (GEE) approach is widely used in regression analysis for such data. However, there are few methods available to check the adequacy of regression models in GEE. In this paper, a graphical method is proposed based on Cook and...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.889

    authors: Pan W,Connett JE,Porzio GC,Weisberg S

    update_date:2001-10-15 00:00:00

  • Hierarchical nested trial design (HNTD) for demonstrating treatment efficacy of new antibacterial drugs in patient populations with emerging bacterial resistance.

    abstract::In the last decade or so, pharmaceutical drug development activities in the area of new antibacterial drugs for treating serious bacterial diseases have declined, and at the same time, there are worries that the increased prevalence of antibiotic-resistant bacterial infections, especially the increase in drug-resistan...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6233

    authors: Huque MF,Valappil T,Soon GG

    update_date:2014-11-10 00:00:00

  • The analysis of contingency tables with ordinal data: an application to monitoring antibiotic resistance.

    abstract::Rationalization of antibiotic therapy in the management of infectious diseases is helped by a knowledge of the patterns of sensitivity and resistance of bacteria to antibiotics and their possible changes both in time and from one hospital unit to another. In this paper we present the results regarding the sensitivitie...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2447

    authors: Bonetto C,Giannerini S,Giovagnoli A

    update_date:2006-10-30 00:00:00

  • Predicting analysis times in randomized clinical trials.

    abstract::Randomized clinical trial designs commonly include one or more planned interim analyses. At these times an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With failure-time endpoints, it is common to schedule an...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.843

    authors: Bagiella E,Heitjan DF

    update_date:2001-07-30 00:00:00

  • The use of an extended baseline period in the evaluation of treatment in a longitudinal Duchenne muscular dystrophy trial.

    abstract::A trial of Duchenne muscular dystrophy involved tracking boys of all ages through a one-year baseline period, followed by a one-year trial of leucine versus placebo treatment. In this paper we develop a model for a total-muscle-strength score that uses the data of the extended baseline period in the evaluation of the ...

    journal_title:Statistics in medicine

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:10.1002/sim.4780050304

    authors: Madsen KS,Miller JP,Province MA

    update_date:1986-05-01 00:00:00

  • Identifying the types of missingness in quality of life data from clinical trials.

    abstract::This paper discusses methods of identifying the types of missingness in quality of life (QOL) data in cancer clinical trials. The first approach involves collecting information on why the QOL questionnaires were not completed. Based on the reasons provided one may be able to distinguish the mechanisms causing missing ...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/(sici)1097-0258(19980315/15)17:5/7<739::ai

    authors: Curran D,Bacchi M,Schmitz SF,Molenberghs G,Sylvester RJ

    update_date:1998-03-15 00:00:00

  • An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes.

    abstract::Often multiple outcomes are of interest in each study identified by a systematic review, and in this situation a separate univariate meta-analysis is usually applied to synthesize the evidence for each outcome independently; an alternative approach is a single multivariate meta-analysis model that utilizes any correla...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2524

    authors: Riley RD,Abrams KR,Lambert PC,Sutton AJ,Thompson JR

    update_date:2007-01-15 00:00:00

  • Last observation carry-forward and last observation analysis.

    abstract::Drop-out often occurs in clinical trials with multiple visits and drop-out is often informative in the sense that the population of patients who dropped out is different from the population of patients who completed the study. To handle data with informative drop-out, an intention-to-treat analysis, which evaluates tr...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1519

    authors: Shao J,Zhong B

    update_date:2003-08-15 00:00:00

  • Causal inference in survival analysis using pseudo-observations.

    abstract::Causal inference for non-censored response variables, such as binary or quantitative outcomes, is often based on either (1) direct standardization ('G-formula') or (2) inverse probability of treatment assignment weights ('propensity score'). To do causal inference in survival analysis, one needs to address right-censo...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7297

    authors: Andersen PK,Syriopoulou E,Parner ET

    update_date:2017-07-30 00:00:00

  • An evaluation of phase I clinical trial designs in the continuous dose-response setting.

    abstract::Both traditional phase I designs and the increasingly popular continual reassessment method (CRM) designs select an estimate of maximum tolerable dose (MTD) from among a set of prespecified dose levels. Although CRM designs use an implied dose-response model to select the next dose level, in general it is neither assu...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.903

    authors: Storer BE

    update_date:2001-08-30 00:00:00

  • Current development in clinical trials: issues old and new.

    abstract::Clinical trials, especially the randomized clinical trial, have been and will remain the gold standard for the evaluation of new interventions, including pharmaceuticals, biologics, medical devices, procedures, or behavioral modifications. Despite more than five decades of experience, there are still challenges in the...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5405

    authors: DeMets DL

    update_date:2012-11-10 00:00:00

  • Performance of weighted estimating equations for longitudinal binary data with drop-outs missing at random.

    abstract::The generalized estimating equations (GEE) approach is commonly used to model incomplete longitudinal binary data. When drop-outs are missing at random through dependence on observed responses (MAR), GEE may give biased parameter estimates in the model for the marginal means. A weighted estimating equations approach g...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1241

    authors: Preisser JS,Lohman KK,Rathouz PJ

    update_date:2002-10-30 00:00:00

  • Beta-binomial/Poisson regression models for repeated bivariate counts.

    abstract::We analyze data obtained from a study designed to evaluate training effects on the performance of certain motor activities of Parkinson's disease patients. Maximum likelihood methods were used to fit beta-binomial/Poisson regression models tailored to evaluate the effects of training on the numbers of attempted and su...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3303

    authors: Lora MI,Singer JM

    update_date:2008-07-30 00:00:00

  • The social contagion hypothesis: comment on 'Social contagion theory: examining dynamic social networks and human behavior'.

    abstract::I reflect on the statistical methods of the Christakis-Fowler studies on network-based contagion of traits by checking the sensitivity of these kinds of results to various alternate specifications and generative mechanisms. Despite the honest efforts of all involved, I remain pessimistic about establishing whether bin...

    journal_title:Statistics in medicine

    pub_type: Comment, Journal Article

    doi:10.1002/sim.5551

    authors: Thomas AC

    update_date:2013-02-20 00:00:00