Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non-linear effects in a large cohort study.

Abstract:

BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data, with Markov chain Monte Carlo imputation assuming multivariate normality (MVN) being a commonly used approach. Imputing categorical variables (which are clearly non-normal) using MVN imputation is challenging, and several approaches have been suggested. However, it remains unclear which approach should be preferred.

METHODS: We explore methods for imputing ordinal variables using MVN imputation, including imputing as a continuous variable and as a set of indicators, and various methods for assigning imputed values to the possible categories (rounding), for estimating a non-linear association between an ordinal exposure and a binary outcome. We introduce a new approach in which we impute as continuous and assign imputed values to categories based on the mean indicators imputed in a separate round of imputation. We compare these approaches in a simple setting in which 50% of the data in an ordinal exposure are made missing completely at random within an otherwise complete real dataset.

RESULTS: Methods that imputed the ordinal exposure as continuous distorted the non-linear exposure-outcome association by biasing the relationship towards linearity, irrespective of the rounding method. In contrast, imputing using indicators preserved the non-linear association but not the marginal distribution of the ordinal variable.

CONCLUSIONS: Imputing ordinal variables as continuous can bias the estimation of the exposure-outcome association in the presence of non-linear relationships. Further work is needed to develop optimal methods for handling ordinal (and nominal) variables when using MVN imputation.
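To make the rounding step concrete, here is a minimal NumPy sketch of two ways of turning continuous MVN imputations of an ordinal variable into category codes: naive rounding to the nearest category, and a proportion-matching rule loosely modelled on the new approach in the abstract, where cut points are chosen so that the share of imputed values in each category matches proportions taken from separately imputed indicator variables. The MCMC imputation itself is not shown; the continuous imputations are taken as given. The function names, the 1..K category coding, the clipping of negative indicator means, and the use of empirical quantiles as cut points are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def round_to_nearest(values, n_categories):
    """Naive rounding: map each continuous imputed value to the nearest
    category code 1..n_categories (halves round up) and clip values that
    fall outside the valid range to the end categories."""
    values = np.asarray(values, dtype=float)
    return np.clip(np.floor(values + 0.5), 1, n_categories).astype(int)


def props_from_indicators(indicator_draws):
    """Hypothetical helper: take an (n_missing x K) array of imputed indicator
    values, one column per category, and use the column means as target
    category proportions. MVN-imputed indicators are not restricted to
    [0, 1], so negative means are clipped to zero before renormalising."""
    means = np.asarray(indicator_draws, dtype=float).mean(axis=0)
    means = np.clip(means, 0.0, None)
    return means / means.sum()


def round_to_proportions(values, target_props):
    """Assign category codes 1..K so that the share of imputed values in each
    category matches target_props, by cutting the continuous imputations at
    the empirical quantiles implied by the cumulative target proportions."""
    values = np.asarray(values, dtype=float)
    props = np.asarray(target_props, dtype=float)
    props = props / props.sum()
    cut_points = np.quantile(values, np.cumsum(props)[:-1])
    # searchsorted returns how many cut points lie at or below each value (0..K-1)
    return 1 + np.searchsorted(cut_points, values, side="right")


if __name__ == "__main__":
    imputed = np.arange(10, dtype=float)  # stand-in continuous imputations
    print(round_to_proportions(imputed, [0.2, 0.3, 0.5]))  # [1 1 2 2 2 3 3 3 3 3]
    print(round_to_nearest([0.4, 1.6, 2.4, 3.7], 3))       # [1 2 2 3]
```

With ten evenly spaced imputations and target proportions 0.2, 0.3 and 0.5, `round_to_proportions` places 2, 3 and 5 values in categories 1, 2 and 3, whereas naive rounding ignores any target shares. Note that the abstract reports that imputing the exposure as continuous biased the exposure-outcome association towards linearity irrespective of the rounding method, so matching the marginal category proportions alone does not remove that bias.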

journal_name: Stat Med

journal_title: Statistics in medicine

authors: Lee KJ,Galati JC,Simpson JA,Carlin JB

doi: 10.1002/sim.5445

pub_date: 2012-12-30 00:00:00

pages: 4164-74

issue: 30

eissn: 1097-0258

issn: 0277-6715

journal_volume: 31

pub_type: Journal Article

Related articles:

  • The social contagion hypothesis: comment on 'Social contagion theory: examining dynamic social networks and human behavior'.

    abstract::I reflect on the statistical methods of the Christakis-Fowler studies on network-based contagion of traits by checking the sensitivity of these kinds of results to various alternate specifications and generative mechanisms. Despite the honest efforts of all involved, I remain pessimistic about establishing whether bin...

    journal_title:Statistics in medicine

    pub_type: Comment, Journal Article

    doi:10.1002/sim.5551

    authors: Thomas AC

    updated: 2013-02-20 00:00:00

  • A joint test for progression and survival with interval-censored data from a cancer clinical trial.

    abstract::Clinical trials often assess efficacy by comparing treatments on the basis of two or more event-time outcomes. In the case of cancer clinical trials, progression-free survival (PFS), which is the minimum of the time from randomization to progression or to death, summarizes the comparison of treatments on the hazards f...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6096

    authors: Finkelstein DM,Schoenfeld DA

    updated: 2014-05-30 00:00:00

  • Estimation of ROC curve with complex survey data.

    abstract::The receiver operating characteristic (ROC) curve can be utilized to evaluate the performance of diagnostic tests. The area under the ROC curve (AUC) is a widely used summary index for comparing multiple ROC curves. Both parametric and nonparametric methods have been developed to estimate and compare the AUCs. However...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6405

    authors: Yao W,Li Z,Graubard BI

    updated: 2015-04-15 00:00:00

  • Estimation of the time-dependent vaccine efficacy from a measles epidemic.

    abstract::We present a method to estimate the time-dependent vaccine efficacy from the cohort-specific vaccination coverage and from data on the vaccination status of cases and apply it to a measles epidemic in Germany which involved 529 cases, 88 of whom were vaccinated and 370 unvaccinated (for the remaining 71 cases the vacc...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1043

    authors: Eichner M,Diebner HH,Schubert C,Kreth HW,Dietz K

    updated: 2002-08-30 00:00:00

  • Estimating the mean hazard ratio parameters for clustered survival data with random clusters.

    abstract::We consider a latent variable hazard model for clustered survival data where clusters are a random sample from an underlying population. We allow interactions between the random cluster effect and covariates. We use a maximum pseudo-likelihood estimator to estimate the mean hazard ratio parameters. We propose a bootst...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19970915)16:17<2009::aid-s

    authors: Cai J,Zhou H,Davis CE

    updated: 1997-09-15 00:00:00

  • Ratio of geometric means to analyze continuous outcomes in meta-analysis: comparison to mean differences and ratio of arithmetic means using empiric data and simulation.

    abstract::Meta-analyses pooling continuous outcomes can use mean differences (MD), standardized MD (MD in pooled standard deviation units, SMD), or ratio of arithmetic means (RoM). Recently, ratio of geometric means using ad hoc (RoGM (ad hoc) ) or Taylor series (RoGM (Taylor) ) methods for estimating variances have been propos...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4501

    authors: Friedrich JO,Adhikari NK,Beyene J

    updated: 2012-07-30 00:00:00

  • Signal detection in FDA AERS database using Dirichlet process.

    abstract::In the recent two decades, data mining methods for signal detection have been developed for drug safety surveillance, using large post-market safety data. Several of these methods assume that the number of reports for each drug-adverse event combination is a Poisson random variable with mean proportional to the unknow...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6510

    authors: Hu N,Huang L,Tiwari RC

    updated: 2015-08-30 00:00:00

  • Spatial disease clusters: detection and inference.

    abstract::We present a new method of detection and inference for spatial clusters of a disease. To avoid ad hoc procedures to test for clustering, we have a clearly defined alternative hypothesis and our test statistic is based on the likelihood ratio. The proposed test can detect clusters of any size, located anywhere in the s...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780140809

    authors: Kulldorff M,Nagarwalla N

    updated: 1995-04-30 00:00:00

  • Corrections for exposure measurement error in logistic regression models with an application to nutritional data.

    abstract::Two correction methods are considered for multiple logistic regression models with some covariates measured with error. Both methods are based on approximating the complicated regression model between the response and the observed covariates with simpler models. The first model is the logistic approximation proposed b...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780131105

    authors: Kuha J

    updated: 1994-06-15 00:00:00

  • Nonparametric estimation of broad sense agreement between ordinal and censored continuous outcomes.

    abstract::The concept of broad sense agreement (BSA) has recently been proposed for studying the relationship between a continuous measurement and an ordinal measurement. They developed a nonparametric procedure for estimating the BSA index, which is only applicable to completely observed data. In this work, we consider the pro...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8523

    authors: Dai T,Guo Y,Peng L,Manatunga A

    updated: 2020-06-30 00:00:00

  • Incorporating data from various trial designs into a mixed treatment comparison model.

    abstract::Estimates of relative efficacy between alternative treatments are crucial for decision making in health care. Bayesian mixed treatment comparison models provide a powerful methodology to obtain such estimates when head-to-head evidence is not available or insufficient. In recent years, this methodology has become wide...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5764

    authors: Schmitz S,Adams R,Walsh C

    updated: 2013-07-30 00:00:00

  • Extending the c-statistic to nominal polytomous outcomes: the Polytomous Discrimination Index.

    abstract::Diagnostic problems in medicine are sometimes polytomous, meaning that the outcome has more than two distinct categories. For example, ovarian tumors can be benign, borderline, primary invasive, or metastatic. Extending the main measure of binary discrimination, the c-statistic or area under the ROC curve, to nominal ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5321

    authors: Van Calster B,Van Belle V,Vergouwe Y,Timmerman D,Van Huffel S,Steyerberg EW

    updated: 2012-10-15 00:00:00

  • On the relationship between association and surrogacy when both the surrogate and true endpoint are binary outcomes.

    abstract::The relationship between association and surrogacy has been the focus of much debate in the surrogate marker literature. Recently, the individual causal association (ICA) has been introduced as a metric of surrogacy in the causal inference framework, when both the surrogate and the true endpoint are normally distribut...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8698

    authors: Meyvisch P,Alonso A,Van der Elst W,Molenberghs G

    updated: 2020-11-20 00:00:00

  • STRengthening analytical thinking for observational studies: the STRATOS initiative.

    abstract::The validity and practical utility of observational medical research depends critically on good study design, excellent data quality, appropriate statistical methods and accurate interpretation of results. Statistical methodology has seen substantial development in recent times. Unfortunately, many of these methodolog...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6265

    authors: Sauerbrei W,Abrahamowicz M,Altman DG,le Cessie S,Carpenter J,STRATOS initiative.

    updated: 2014-12-30 00:00:00

  • Classification using ensemble learning under weighted misclassification loss.

    abstract::Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy requires periodic assessment of treatment fail...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8082

    authors: Xu Y,Liu T,Daniels MJ,Kantor R,Mwangi A,Hogan JW

    updated: 2019-05-20 00:00:00

  • Statistical inferences for a twin correlation with multinomial outcomes.

    abstract::Current methods for statistical analysis of twin studies focus on continuous and dichotomous data, while only limited methodology exists for analysing multinomial data. As a consequence, investigators are often tempted to collapse multinomial data into two categories simply to facilitate the analysis. We address this ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/1097-0258(20010130)20:2<249::aid-sim641>3.

    authors: Bartfay E,Donner A

    updated: 2001-01-30 00:00:00

  • A simulation-free approach to assessing the performance of the continual reassessment method.

    abstract::The continual reassessment method (CRM) is an adaptive design for Phase I trials whose operating characteristics, including appropriate sample size, probability of correctly identifying the maximum tolerated dose, and the expected proportion of participants assigned to each dose, can only be determined via simulation....

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8746

    authors: Braun TM

    updated: 2020-09-16 00:00:00

  • Nonparametric collective spectral density estimation with an application to clustering the brain signals.

    abstract::In this paper, we develop a method for the simultaneous estimation of spectral density functions (SDFs) for a collection of stationary time series that share some common features. Due to the similarities among the SDFs, the log-SDF can be represented using a common set of basis functions. The basis shared by the colle...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7972

    authors: Maadooliat M,Sun Y,Chen T

    updated: 2018-12-30 00:00:00

  • Using follow-up data to avoid omitted variable bias: an application to cardiovascular epidemiology.

    abstract::Omitted variable bias is discussed in the context of linear models. It is shown that the effect of omitted variables can be controlled in linear models for metric dependent variables by using data from follow-up studies. Two different models for analysing such data are proposed. In the first model the omitted variable...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780110906

    authors: Rehm J,Arminger G,Kohlmeier L

    updated: 1992-06-30 00:00:00

  • Identifying optimal risk windows for self-controlled case series studies of vaccine safety.

    abstract::In vaccine safety studies, subjects are considered at increased risk for adverse events for a period of time after vaccination known as risk window. To our knowledge, risk windows for vaccine safety studies have tended to be pre-defined and not to use information from the current study. Inaccurate specification of the...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4125

    authors: Xu S,Zhang L,Nelson JC,Zeng C,Mullooly J,McClure D,Glanz J

    updated: 2011-03-30 00:00:00

  • A frailty model for recurrent events during alternating restraint and non-restraint time periods.

    abstract::We consider recurrent events of the same type that occur during alternating restraint and non-restraint time periods. This research is motivated by a study on juvenile recidivism, where the probationers were followed for re-offenses during alternating placement periods and free-time periods. During the placement perio...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7150

    authors: Li X,Chen Y,Li R

    updated: 2017-02-20 00:00:00

  • Methodological considerations on the design and analysis of an equivalence stratified cluster randomization trial.

    abstract::The World Health Organization and collaborating institutions in four developing countries have conducted a multi-centre randomized controlled trial, in which clinics were allocated at random to two antenatal care (ANC) models. These were the standard 'Western' ANC model and a 'new' ANC model consisting of tests, clini...

    journal_title:Statistics in medicine

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:10.1002/1097-0258(20010215)20:3<401::aid-sim801>3.

    authors: Piaggio G,Carroli G,Villar J,Pinol A,Bakketeig L,Lumbiganon P,Bergsjø P,Al-Mazrou Y,Ba'aqeel H,Belizán JM,Farnot U,Berendes H,WHO Antenatal Care Trial Research Group.

    updated: 2001-02-15 00:00:00

  • Multilevel latent variable models for global health-related quality of life assessment.

    abstract::Quality of life (QOL) assessment is a key component of many clinical studies and frequently requires the use of single global summary measures that capture the overall balance of findings from a potentially wide-ranging assessment of QOL issues. We propose and evaluate an irregular multilevel latent variable model sui...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4455

    authors: Kifley A,Heller GZ,Beath KJ,Bulger D,Ma J,Gebski V

    updated: 2012-05-20 00:00:00

  • Four-fold table cell frequencies imputation in meta analysis.

    abstract::Meta analysis is a collection of quantitative methods devoted to combine summary information from related but independent studies. Because research reports usually present only data reductions and summary statistics rather than detailed data, the reviewer must often resort to rather crude methods for constructing summ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2287

    authors: Di Pietrantonj C

    updated: 2006-07-15 00:00:00

  • A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.

    abstract::When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-p...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6973

    authors: Dorie V,Harada M,Carnegie NB,Hill J

    updated: 2016-09-10 00:00:00

  • Estimation methods for marginal and association parameters for longitudinal binary data with nonignorable missing observations.

    abstract::In longitudinal studies, missing observations occur commonly. It has been well known that biased results could be produced if missingness is not properly handled in the analysis. Authors have developed many methods with the focus on either incomplete response or missing covariate observations, but rarely on both. The ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5536

    authors: Li H,Yi GY

    updated: 2013-02-28 00:00:00

  • Spatial clustering of the failure to geocode and its implications for the detection of disease clustering.

    abstract::Geocoding a study population as completely as possible is an important data assimilation component of many spatial epidemiologic studies. Unfortunately, complete geocoding is rare in practice. The failure of a substantial proportion of study subjects' addresses to geocode has consequences for spatial analyses, some of...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3288

    authors: Zimmerman DL,Fang X,Mazumdar S

    updated: 2008-09-20 00:00:00

  • Estimating kappa from binocular data and comparing marginal probabilities.

    abstract::Suppose that two graders classify all eyes in a sample of patients for the presence or absence of a specified abnormality. In the statistical analysis of the data, possible correlation between the observations in the right and left eyes should be taken into account. Recently, general methods have been developed to ana...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780122306

    authors: Schouten HJ

    updated: 1993-12-15 00:00:00

  • Correction of sampling bias in a cross-sectional study of post-surgical complications.

    abstract::Cross-sectional designs are often used to monitor the proportion of infections and other post-surgical complications acquired in hospitals. However, conventional methods for estimating incidence proportions when applied to cross-sectional data may provide estimators that are highly biased, as cross-sectional designs t...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5608

    authors: Fluss R,Mandel M,Freedman LS,Weiss IS,Zohar AE,Haklai Z,Gordon ES,Simchen E

    updated: 2013-06-30 00:00:00

  • Common predictor effects for multivariate longitudinal data.

    abstract::Multivariate outcomes measured longitudinally over time are common in medicine, public health, psychology and sociology. The typical (saturated) longitudinal multivariate regression model has a separate set of regression coefficients for each outcome. However, multivariate outcomes are often quite similar and many out...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3589

    authors: Jia J,Weiss RE

    updated: 2009-06-15 00:00:00