Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes.


In multivariate matching, fine balance constrains the marginal distributions of a nominal variable in treated and matched control groups to be identical without constraining who is matched to whom. In this way, a fine balance constraint can balance a nominal variable with many levels while focusing efforts on other more important variables when pairing individuals to minimize the total covariate distance within pairs. Fine balance is not always possible; that is, it is a constraint on an optimization problem, but the constraint is not always feasible. We propose a new algorithm that returns a minimum distance finely balanced match when one is feasible, and otherwise minimizes the total distance among all matched samples that minimize the deviation from fine balance. Perhaps we can come very close to fine balance when fine balance is not attainable; moreover, in any event, because our algorithm is guaranteed to come as close as possible to fine balance, the investigator may perform one match, and on that basis judge whether the best attainable balance is adequate or not. We also show how to incorporate an additional constraint. The algorithm is implemented in two similar ways, first as an optimal assignment problem with an augmented distance matrix, second as a minimum cost flow problem in a network. The case of knee surgery in the Obesity and Surgical Outcomes Study motivated the development of this algorithm and is used as an illustration. In that example, 2 of 47 hospitals had too few nonobese patients to permit fine balance for the nominal variable with 47 levels representing the hospital, but our new algorithm came very close to fine balance. Moreover, in that example, there was a shortage of nonobese diabetic patients, and incorporation of an additional constraint forced the match to include all of these nonobese diabetic patients, thereby coming as close as possible to balance for this important but recalcitrant covariate.
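The feasibility question raised in the abstract can be illustrated with a small sketch. This is not the authors' algorithm (which solves an assignment or minimum cost flow problem); it only checks, for 1:1 pair matching, which levels of a nominal variable have fewer controls than treated subjects, since fine balance requires at least as many available controls as treated subjects at every level. The function name `fine_balance_shortfall` and the toy hospital labels are hypothetical, chosen for illustration only.

```python
from collections import Counter


def fine_balance_shortfall(treated_levels, control_levels):
    """Per-level shortage of controls under 1:1 matching (illustrative only).

    A level appears in the result when it has more treated subjects than
    available controls, so fine balance cannot hold exactly at that level.
    Returns (dict of shortfalls by level, total shortfall).
    """
    treated = Counter(treated_levels)
    controls = Counter(control_levels)
    shortfall = {
        level: n_treated - controls[level]
        for level, n_treated in treated.items()
        if n_treated > controls[level]
    }
    return shortfall, sum(shortfall.values())


# Hypothetical toy data: hospital of each treated and each potential control.
treated_hospital = ["A", "A", "A", "B", "B", "C"]
control_hospital = ["A"] * 5 + ["B"] + ["C"] * 4

shortfall, total = fine_balance_shortfall(treated_hospital, control_hospital)
print(shortfall, total)
```

A total shortfall of zero means exact fine balance is feasible for that variable; a positive total indicates how many matched controls must be drawn from other levels, which is the situation the abstract describes for 2 of the 47 hospitals.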






Yang D,Small DS,Silber JH,Rosenbaum PR






2012-06-01 00:00:00












  • A general model for the analysis of mark-resight, mark-recapture, and band-recovery data under tag loss.

    abstract::Estimates of waterfowl demographic parameters often come from resighting studies where birds fit with individually identifiable neck collars are resighted at a distance. Concerns have been raised about the effects of collar loss on parameter estimates, and the reliability of extrapolating from collared individuals to ...


    pub_type: journal article


    authors: Conn PB,Kendall WL,Samuel MD

    updated: 2004-12-01 00:00:00

  • Accurate critical constants for the one-sided approximate likelihood ratio test of a normal mean vector when the covariance matrix is estimated.

    abstract::Tang, Gnecco, and Geller (1989, Biometrika 76, 577-583) proposed an approximate likelihood ratio (ALR) test of the null hypothesis that a normal mean vector equals a null vector against the alternative that all of its components are nonnegative with at least one strictly positive. This test is useful for comparing a t...


    pub_type: journal article


    authors: Tamhane AC,Logan BR

    updated: 2002-09-01 00:00:00

  • Regression calibration in semiparametric accelerated failure time models.

    abstract::In large cohort studies, it often happens that some covariates are expensive to measure and hence only measured on a validation set. On the other hand, relatively cheap but error-prone measurements of the covariates are available for all subjects. Regression calibration (RC) estimation method (Prentice, 1982, Biometri...


    pub_type: journal article


    authors: Yu M,Nan B

    updated: 2010-06-01 00:00:00

  • Power and sample size for testing homogeneity of relative risks in prospective studies.

    abstract::Power and sample-size formulas for testing the homogeneity of relative risks using the score method are presented. The homogeneity score test (Gart, 1985, Biometrika 72, 673-677) is formally equivalent to the Pearson chi-square test, although they look different. Results of this paper may be useful in assessing the va...


    pub_type: journal article


    authors: Nam JM

    updated: 1999-03-01 00:00:00

  • To use or not to use? Backward equations in stochastic carcinogenesis models.

    abstract::The method based on the Kolmogorov backward equations of Little (1995, Biometrics 51, 1278-1291) for computing hazard functions for the multistage carcinogenesis models fails when model parameters are time-dependent. In addition to suggesting an alternative method based on the Kolmogorov forward equation, this note hi...


    pub_type: journal article


    authors: Zheng Q

    updated: 1998-03-01 00:00:00

  • Estimation in a Cox proportional hazards cure model.

    abstract::Some failure time data come from a population that consists of some subjects who are susceptible to and others who are nonsusceptible to the event of interest. The data typically have heavy censoring at the end of the follow-up period, and a standard survival analysis would not always be appropriate. In such situation...


    pub_type: journal article


    authors: Sy JP,Taylor JM

    updated: 2000-03-01 00:00:00

  • Sequential model selection-based segmentation to detect DNA copy number variation.

    abstract::Array-based CGH experiments are designed to detect genomic aberrations or regions of DNA copy-number variation that are associated with an outcome, typically a state of disease. Most of the existing statistical methods focus on detecting DNA copy number variations in a single sample or array. We focus on the detectio...


    pub_type: journal article


    authors: Hu J,Zhang L,Wang HJ

    updated: 2016-09-01 00:00:00

  • Sequential construction of multiple-objective optimal designs.

    abstract::We propose a sequential approach for constructing multiple-objective locally optimal designs for nonlinear models. The technique used here is a general one and we demonstrate the added benefits of using a multiple-objective design over a single-objective design with examples from biomedical studies. ...


    pub_type: journal article


    authors: Huang YC,Wong WK

    updated: 1998-12-01 00:00:00

  • Bayesian modeling of multiple episode occurrence and severity with a terminating event.

    abstract::An individual's health condition can affect the frequency and intensity of episodes that can occur repeatedly and that may be related to an event time of interest. For example, bleeding episodes during pregnancy may indicate problems predictive of preterm delivery. Motivated by this application, we propose a joint mod...


    pub_type: journal article


    authors: Herring AH,Yang J

    updated: 2007-06-01 00:00:00

  • Efficient analysis of Weibull survival data from experiments on heterogeneous patient populations.

    abstract::An efficient method is presented for analyses of death rates in one-way or cross-classified experiments where expected survival time for a patient at time of entry on trial is a function of observable covariates. The survival-time distribution used is a Weibull form of Cox's (1972) model. The analysis proceeds in two ...


    pub_type: journal article


    authors: Williams JS

    updated: 1978-06-01 00:00:00

  • Time series models based on generalized linear models: some further results.

    abstract::This paper considers the problem of extending the classical moving average models to time series with conditional distributions given by generalized linear models. These models have the advantage of easy construction and estimation. Statistical modelling techniques are also proposed. Some simulation results and an ill...


    pub_type: journal article


    authors: Li WK

    updated: 1994-06-01 00:00:00

  • Spatial-temporal modeling of the association between air pollution exposure and preterm birth: identifying critical windows of exposure.

    abstract::Exposure to high levels of air pollution during the pregnancy is associated with increased probability of preterm birth (PTB), a major cause of infant morbidity and mortality. New statistical methodology is required to specifically determine when a particular pollutant impacts the PTB outcome, to determine the role of...


    pub_type: journal article


    authors: Warren J,Fuentes M,Herring A,Langlois P

    updated: 2012-12-01 00:00:00

  • Estimating acute air pollution health effects from cohort study data.

    abstract::Traditional studies of short-term air pollution health effects use time series data, while cohort studies generally focus on long-term effects. There is increasing interest in exploiting individual level cohort data to assess short-term health effects in order to understand the mechanisms and time scales of action. We...


    pub_type: journal article


    authors: Szpiro AA,Sheppard L,Adar SD,Kaufman JD

    updated: 2014-03-01 00:00:00

  • Applications of multiple imputation to the analysis of censored regression data.

    abstract::The first part of the article reviews the Data Augmentation algorithm and presents two approximations to the Data Augmentation algorithm for the analysis of missing-data problems: the Poor Man's Data Augmentation algorithm and the Asymptotic Data Augmentation algorithm. These two algorithms are then implemented in the...


    pub_type: journal article


    authors: Wei GC,Tanner MA

    updated: 1991-12-01 00:00:00

  • Interval estimates for the ratio of the means of two normal populations with variances related to the means.

    abstract::A procedure is given for estimating the ratio of the means of two populations using the data from two independent random samples when the observations are normally distributed with population variances that are related to the population means. ...


    pub_type: journal article


    authors: Cox CP

    updated: 1985-03-01 00:00:00

  • A semiparametric estimate of treatment effects with censored data.

    abstract::A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect beta(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a m...


    pub_type: journal article


    authors: Xu R,Harrington DP

    updated: 2001-09-01 00:00:00

  • Tests for monotone mean residual life, using randomly censored data.

    abstract::At any age the mean residual life function gives the expected remaining life at that age. Reliabilists and biometricians have found it useful to categorize failure distributions by the monotonicity properties of the mean residual life function. Hollander and Proschan (1975, Biometrika 62, 585-593) have derived tests o...


    pub_type: journal article


    authors: Chen YY,Hollander M,Langberg NA

    updated: 1983-03-01 00:00:00

  • Receiver operating characteristic curves and confidence bands for support vector machines.

    abstract::Many problems that appear in biomedical decision-making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The support vector machine (SVM) is a popular classification technique that is robust to model misspecification and effectively handles high-dime...


    pub_type: journal article


    authors: Luckett DJ,Laber EB,El-Kamary SS,Fan C,Jhaveri R,Perou CM,Shebl FM,Kosorok MR

    updated: 2020-08-31 00:00:00

  • Nonparametric analysis of covariance by matching.

    abstract::The basic problem under consideration is the comparison of treatments with respect to a response Y when a covariable X is taken into account. Various methods involving matching may be regarded as compromises between the standard analysis of covariance and the standard analysis of independent matched pairs. First, ther...


    pub_type: journal article


    authors: Quade D

    updated: 1982-09-01 00:00:00

  • A signed-rank test for clustered data.

    abstract::We consider the problem of comparing two outcome measures when the pairs are clustered. Using the general principle of within-cluster resampling, we obtain a novel signed-rank test for clustered paired data. We show by a simple informative cluster size simulation model that only our test maintains the correct size und...


    pub_type: journal article


    authors: Datta S,Satten GA

    updated: 2008-06-01 00:00:00

  • A Monte Carlo investigation of homogeneity tests of the odds ratio under various sample size configurations.

    abstract::Epidemiologic data for case-control studies are often summarized into K 2 x 2 tables. Given a fixed number of cases and controls, the degree of sparseness in the data depends on the number of strata, K. The effect of increasing stratification on size and power of seven tests of homogeneity of the odds ratio is studied...


    pub_type: journal article


    authors: Jones MP,O'Gorman TW,Lemke JH,Woolson RF

    updated: 1989-03-01 00:00:00

  • Fitting nonlinear and constrained generalized estimating equations with optimization software.

    abstract::In this article, we present an estimation approach for solving nonlinear constrained generalized estimating equations that can be implemented using object-oriented software for nonlinear programming, such as nlminb in Splus or fmincon and lsqnonlin in Matlab. We show how standard estimating equation theory includes th...


    pub_type: journal article


    authors: Contreras M,Ryan LM

    updated: 2000-12-01 00:00:00

  • Bayesian prediction of spatial count data using generalized linear mixed models.

    abstract::Spatial weed count data are modeled and predicted using a generalized linear mixed model combined with a Bayesian approach and Markov chain Monte Carlo. Informative priors for a data set with sparse sampling are elicited using a previously collected data set with extensive sampling. Furthermore, we demonstrate that so...


    pub_type: journal article


    authors: Christensen OF,Waagepetersen R

    updated: 2002-06-01 00:00:00

  • Biometry and medical statistics.

    abstract::The "biometric school" founded by K. Pearson, F. Galton, and W. F. R. Weldon was concerned especially with heredity and variation, and between the wars "biometry" was not widely used as a general term for quantitative biology. The foundation of the Biometric Society encouraged this wider usage, and medical and biologi...


    pub_type: journal article


    authors: Armitage P

    updated: 1985-12-01 00:00:00

  • Performance of generalized estimating equations in practical situations.

    abstract::Moment methods for analyzing repeated binary responses have been proposed by Liang and Zeger (1986, Biometrika 73, 13-22), and extended by Prentice (1988, Biometrics 44, 1033-1048). In their generalized estimating equations (GEE), both Liang and Zeger (1986) and Prentice (1988) estimate the parameters associated with ...


    pub_type: journal article


    authors: Lipsitz SR,Fitzmaurice GM,Orav EJ,Laird NM

    updated: 1994-03-01 00:00:00

  • Variable selection and prediction using a nested, matched case-control study: Application to hospital acquired pneumonia in stroke patients.

    abstract::Matched case-control designs are commonly used in epidemiologic studies for increased efficiency. These designs have recently been introduced to the setting of modern imaging and genomic studies, which are characterized by high-dimensional covariates. However, appropriate statistical analyses that adjust for the match...


    pub_type: journal article


    authors: Qian J,Payabvash S,Kemmling A,Lev MH,Schwamm LH,Betensky RA

    updated: 2014-03-01 00:00:00

  • FPCA-based method to select optimal sampling schedules that capture between-subject variability in longitudinal studies.

    abstract::A critical component of longitudinal study design involves determining the sampling schedule. Criteria for optimal design often focus on accurate estimation of the mean profile, although capturing the between-subject variance of the longitudinal process is also important since variance patterns may be associated with ...


    pub_type: journal article


    authors: Wu M,Diez-Roux A,Raghunathan TE,Sánchez BN

    updated: 2018-03-01 00:00:00

  • Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials.

    abstract::Complex issues arise when investigating the association between longitudinal immunologic measures and time to an event, such as time to relapse, in cancer vaccine trials. Unlike many clinical trials, we may encounter patients who are cured and no longer susceptible to the time-to-event endpoint. If there are cured pat...


    pub_type: journal article


    authors: Brown ER,Ibrahim JG

    updated: 2003-09-01 00:00:00

  • Aberrant crypt foci and semiparametric modeling of correlated binary data.

    abstract::Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines....


    pub_type: journal article


    authors: Apanasovich TV,Ruppert D,Lupton JR,Popovic N,Turner ND,Chapkin RS,Carroll RJ

    updated: 2008-06-01 00:00:00

  • The Jolly-Seber model with tag loss.

    abstract::Tag loss in mark-recapture experiments is a violation of one of the Jolly-Seber model assumptions. It causes bias in parameter estimates and has only been dealt with in an ad hoc manner. We develop methodology to estimate tag retention and abundance in double-tagging mark-recapture experiments. We apply this methodolo...


    pub_type: journal article


    authors: Cowen L,Schwarz CJ

    updated: 2006-09-01 00:00:00