Multiple imputation for model checking: completed-data plots with missing and latent data.

Abstract:

In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of using completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.
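As a rough illustration of the impute-then-check idea described in the abstract, the sketch below imputes missing responses several times and returns one residual vector per completed dataset, which could then be plotted against the predictor. It is not the authors' procedure: the normal linear regression model, the flat prior, and the function name `completed_data_residuals` are all assumptions made for this example.

```python
import numpy as np

def completed_data_residuals(x, y, n_imputations=5, seed=0):
    """Return one residual vector per imputed completed dataset.

    Assumes (for illustration only) a normal linear regression of y on x
    with a flat prior; y may contain np.nan for missing responses.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    X_obs, y_obs = X[observed], y[observed]
    n, k = X_obs.shape

    # Least-squares fit on the observed cases.
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    s2 = np.sum((y_obs - X_obs @ beta_hat) ** 2) / (n - k)

    n_missing = int((~observed).sum())
    residuals = []
    for _ in range(n_imputations):
        # Draw (sigma^2, beta) from the posterior under the flat prior.
        sigma2 = s2 * (n - k) / rng.chisquare(n - k)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # Impute missing responses from the predictive distribution.
        y_completed = y.copy()
        noise = rng.normal(0.0, np.sqrt(sigma2), n_missing)
        y_completed[~observed] = X[~observed] @ beta + noise
        # Completed-data residuals relative to the drawn parameters.
        residuals.append(y_completed - X @ beta)
    return residuals
```

Each returned vector has the same length as the full dataset, so the same complete-data diagnostic plot can be drawn once per imputation and the plots compared side by side.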

journal_name

Biometrics

journal_title

Biometrics

authors

Gelman A,Van Mechelen I,Verbeke G,Heitjan DF,Meulders M

doi

10.1111/j.0006-341X.2005.031010.x

subject

Has Abstract

pub_date

2005-03-01 00:00:00

pages

74-85

issue

1

eissn

1541-0420

issn

0006-341X

pii

BIOM031010

journal_volume

61

pub_type

Journal Article
  • On the use of the variogram in checking for independence in spatial data.

    abstract::The variogram is a standard tool in the analysis of spatial data, and its shape provides useful information on the form of spatial correlation that may be present. However, it is also useful to be able to assess the evidence for the presence of any spatial correlation. A method of doing this, based on an assessment of...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00211.x

    authors: Diblasi A,Bowman AW

    update_date:2001-03-01 00:00:00

  • A semiparametric estimate of treatment effects with censored data.

    abstract::A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect beta(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a m...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00875.x

    authors: Xu R,Harrington DP

    update_date:2001-09-01 00:00:00

  • Two-stage designs for gene-disease association studies with sample size constraints.

    abstract::Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage metho...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2004.00207.x

    authors: Satagopan JM,Venkatraman ES,Begg CB

    update_date:2004-09-01 00:00:00

  • Random-effects models for longitudinal data using Gibbs sampling.

    abstract::Analysis of longitudinal studies is often complicated through differences amongst individuals in the number and spacing of observations. Laird and Ware (1982, Biometrics 38, 963-974) proposed a linear random-effects model to deal with this problem. We propose a generalisation of this model to accommodate multiple rand...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Gilks WR,Wang CC,Yvonnet B,Coursaget P

    update_date:1993-06-01 00:00:00

  • Maximum likelihood estimation for incomplete repeated-measures experiments under an ARMA covariance structure.

    abstract::A stochastic model is presented for the analysis of incomplete repeated-measures experiments. The general linear model is used to relate the response measures to other variables which are thought to account for inherent variation; an autoregressive moving average (ARMA) time series representation is used to model dist...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Rochon J,Helms RW

    update_date:1989-03-01 00:00:00

  • A test of homogeneity of distributions when observations are subject to measurement errors.

    abstract::When the observed data are contaminated with errors, the standard two-sample testing approaches that ignore measurement errors may produce misleading results, including a higher type-I error rate than the nominal level. To tackle this inconsistency, a nonparametric test is proposed for testing equality of two distribu...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13207

    authors: Lee D,Lahiri SN,Sinha S

    update_date:2020-09-01 00:00:00

  • Multipoint linkage analysis via Metropolis jumping kernels.

    abstract::Multipoint linkage analysis is being performed routinely in medical genetic studies to localize disease genes. This likelihood-based method is very computationally intensive. Exact computations are thus formidable for problems with large number of genetic markers and complex pedigrees. This paper proposes a Monte Carl...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lin S

    update_date:1996-12-01 00:00:00

  • A general theory for modeling capture-recapture data from a closed population.

    abstract::A general theory for estimating the size of a closed population from multiple-recapture data is presented. This theory is easily extended to open population models for multiple-recapture data. Estimation is based on a log-linear model developed for modeling dependent capture-recapture data when capture probabilities v...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Evans MA,Bonett DG,McDonald LL

    update_date:1994-06-01 00:00:00

  • A general noninteractive multiple toxicity model including probit, logit, and Weibull transformations.

    abstract::A multiple toxicity model for the quantal response of organisms is constructed based on an existing bivariate theory. The main assumption is that the tolerances follow a multivariate normal distribution function. However, any monotone tolerance distribution can be applied by mapping the integration region in the n-dim...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Christensen ER,Chen CY

    update_date:1985-09-01 00:00:00

  • A one-step-ahead pseudo-DIC for comparison of Bayesian state-space models.

    abstract::In the context of state-space modeling, conventional usage of the deviance information criterion (DIC) evaluates the ability of the model to predict an observation at time t given the underlying state at time t. Motivated by the failure of conventional DIC to clearly choose between competing multivariate nonlinear Bay...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12237

    authors: Millar RB,McKechnie S

    update_date:2014-12-01 00:00:00

  • Regression dilution in the proportional hazards model.

    abstract::The problem of regression dilution arising from covariate measurement error is investigated for survival data using the proportional hazards model. The naive approach to parameter estimation is considered whereby observed covariate values are used, inappropriately, in the usual analysis instead of the underlying covar...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Hughes MD

    update_date:1993-12-01 00:00:00

  • Selecting factors predictive of heterogeneity in multivariate event time data.

    abstract::In multivariate survival analysis, investigators are often interested in testing for heterogeneity among clusters, both overall and within specific classes. We represent different hypotheses about the heterogeneity structure using a sequence of gamma frailty models, ranging from a null model with no random effects to ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2004.00179.x

    authors: Dunson DB,Chen Z

    update_date:2004-06-01 00:00:00

  • Semiparametric maximum likelihood for nonlinear regression with measurement errors.

    abstract::This article demonstrates semiparametric maximum likelihood estimation of a nonlinear growth model for fish lengths using imprecisely measured ages. Data on the species corvina reina, found in the Gulf of Nicoya, Costa Rica, consist of lengths and imprecise ages for 168 fish and precise ages for a subset of 16 fish. T...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2002.00448.x

    authors: Suh EY,Schafer DW

    update_date:2002-06-01 00:00:00

  • Capitalizing on opportunistic data for monitoring relative abundances of species.

    abstract::With the internet, a massive amount of information on species abundance can be collected by citizen science programs. However, these data are often difficult to use directly in statistical inference, as their collection is generally opportunistic, and the distribution of the sampling effort is often not known. In this...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12431

    authors: Giraud C,Calenge C,Coron C,Julliard R

    update_date:2016-06-01 00:00:00

  • Attributable effects in case²-studies.

    abstract::In an effort to determine whether a particular treatment causes a particular outcome event, data are obtained from a database system that records events when they occur, and for such events, the system records exposure to the treatment. That is, the system records information about cases. The system provides no inform...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2005.030920.x

    authors: Rosenbaum PR

    update_date:2005-03-01 00:00:00

  • A note on case-control sampling to estimate kappa coefficients.

    abstract::The feasibility and cost-effectiveness of estimation of kappa using a case-control method of sampling, proposed by Jannarone, Macera, and Garrison (1987, Biometrics 43, 433-437), is provided support. However, in this article unrealistic assumptions in their presentation are identified and more general results for more...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Kraemer HC,Bloch DA

    update_date:1990-03-01 00:00:00

  • Assessing effects of cholera vaccination in the presence of interference.

    abstract::Interference occurs when the treatment of one person affects the outcome of another. For example, in infectious diseases, whether one individual is vaccinated may affect whether another individual becomes infected or develops disease. Quantifying such indirect (or spillover) effects of vaccination could have important...

    journal_title:Biometrics

    pub_type: Journal Article,Randomized Controlled Trial

    doi:10.1111/biom.12184

    authors: Perez-Heydrich C,Hudgens MG,Halloran ME,Clemens JD,Ali M,Emch ME

    update_date:2014-09-01 00:00:00

  • Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums.

    abstract::Johnson and Wehrly (1978, Journal of the American Statistical Association 73, 602-606) and Wehrly and Johnson (1980, Biometrika 67, 255-256) show one way to construct the joint distribution of a circular and a linear random variable, or the joint distribution of a pair of circular random variables from their marginal ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00716.x

    authors: Fernández-Durán JJ

    update_date:2007-06-01 00:00:00

  • Bayesian dose-finding in phase I/II clinical trials using toxicity and efficacy odds ratios.

    abstract::A Bayesian adaptive design is proposed for dose-finding in phase I/II clinical trials to incorporate the bivariate outcomes, toxicity and efficacy, of a new treatment. Without specifying any parametric functional form for the drug dose-response curve, we jointly model the bivariate binary data to account for the corre...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00534.x

    authors: Yin G,Li Y,Ji Y

    update_date:2006-09-01 00:00:00

  • Sharpening bounds on principal effects with covariates.

    abstract::Estimation of treatment effects in randomized studies is often hampered by possible selection bias induced by conditioning on or adjusting for a variable measured post-randomization. One approach to obviate such selection bias is to consider inference about treatment effects within principal strata, that is, principal...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12103

    authors: Long DM,Hudgens MG

    update_date:2013-12-01 00:00:00

  • Post-stratification in the randomized clinical trial.

    abstract::A topic of current biometric discussion is whether stratification should be used in randomized clinical trials and, if so, which kind. An approach based upon randomization theory is used to evaluate pre- versus post-stratification. The results obtained relate specifically to the effect of the size of the clinical tria...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: McHugh R,Matts J

    update_date:1983-03-01 00:00:00

  • Type I error robustness of ANOVA and ANOVA on ranks when the number of treatments is large.

    abstract::Agricultural screening trials often involve a large number (t) of treatments in a complete block design with limited replication (b = 3 or 4 blocks). The null hypothesis of interest is that of no differences between treatments. For the commonly used analysis of variance (ANOVA) procedure, most texts do not discuss agr...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Brownie C,Boos DD

    update_date:1994-06-01 00:00:00

  • Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure.

    abstract::We propose a method to estimate the regression coefficients in a competing risks model where the cause-specific hazard for the cause of interest is related to covariates through a proportional hazards relationship and when cause of failure is missing for some individuals. We use multiple imputation procedures to imput...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.01191.x

    authors: Lu K,Tsiatis AA

    update_date:2001-12-01 00:00:00

  • Objective Bayesian search of Gaussian directed acyclic graphical models for ordered variables with non-local priors.

    abstract::Directed acyclic graphical (DAG) models are increasingly employed in the study of physical and biological systems to model direct influences between variables. Identifying the graph from data is a challenging endeavor, which can be more reasonably tackled if the variables are assumed to satisfy a given ordering; in th...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12018

    authors: Altomare D,Consonni G,La Rocca L

    update_date:2013-06-01 00:00:00

  • Feature-specific penalized latent class analysis for genomic data.

    abstract::Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00566.x

    authors: Houseman EA,Coull BA,Betensky RA

    update_date:2006-12-01 00:00:00

  • Two-stage method of estimation for general linear growth curve models.

    abstract::We extend the linear random-effects growth curve model (REGCM) (Laird and Ware, 1982, Biometrics 38, 963-974) to study the effects of population covariates on one or more characteristics of the growth curve when the characteristics are expressed as linear combinations of the growth curve parameters. This definition in...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Stukel TA,Demidenko E

    update_date:1997-06-01 00:00:00

  • Estimating the size of closed populations using inverse multiple-recapture sampling.

    abstract::A log-linear model for estimating the size of a closed population is defined for inverse multiple-recapture sampling with dependent samples. Efficient estimators of the log-linear model parameters and the population size are obtained by the method of minimum chi-square. A chi-square test of the general linear hypothes...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Bonett DG,Woodward JA,Bentler PM

    update_date:1987-12-01 00:00:00

  • Confidence intervals for the generalized ROC criterion.

    abstract::Receiver operating characteristic (ROC) curves are frequently used to assess the usefulness of diagnostic markers. When several diagnostic markers are available, they can be combined by a best linear combination: that is, when the area under the ROC curve of this combination is maximized among all possible linear comb...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Reiser B,Faraggi D

    update_date:1997-06-01 00:00:00

  • Combining multivariate bioassays.

    abstract::Linear multivariate theory is applied to the problem of combining several multivariate bioassays. Results are an asymptotic test of the hypothesis of a common log relative potency; the maximum likelihood estimator of the common log relative potency; and an exact and asymptotic confidence interval estimator for log rel...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Meisner M,Kushner HB,Laska EM

    update_date:1986-06-01 00:00:00

  • An implicitly defined parametric model for censored survival data and covariates.

    abstract::Parametric survival functions are usually defined as explicit functions of time and covariates. However, consideration of some simple differential equations describing certain survival curves leads to a descriptive equation which cannot be explicitly solved for the survival function. Nevertheless, the resulting surviv...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Piantadosi S,Crowley J

    update_date:1995-03-01 00:00:00