Abstract:
Kappa coefficients are measures of correlation between categorical variables often used as reliability or validity coefficients. We recapitulate development and definitions of the K (categories) by M (ratings) kappas (K x M), discuss what they are well- or ill-designed to do, and summarize where kappas now stand with regard to their application in medical research. The 2 x M (M ≥ 2) intraclass kappa seems the ideal measure of binary reliability; a 2 x 2 weighted kappa is an excellent choice, though not a unique one, as a validity measure. For both the intraclass and weighted kappas, we address continuing problems with kappas. There are serious problems with using the K x M intraclass (K>2) or the various K x M weighted kappas for K>2 or M>2 in any context, either because they convey incomplete and possibly misleading information, or because other approaches are preferable to their use. We illustrate the use of the recommended kappas with applications in medical research.
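As a rough illustration of the 2 x 2 case discussed in the abstract, the sketch below computes two chance-corrected agreement coefficients from a table of paired binary ratings: Cohen's kappa, which uses each rating's own marginal prevalence, and an intraclass kappa, which pools the two marginals into a single prevalence. The function name `kappas_2x2` and the cell labels `a`, `b`, `c`, `d` are assumptions made for this example, and the formulas are the standard textbook estimators, not necessarily the exact forms used in the paper.

```python
from typing import Tuple


def kappas_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (cohen_kappa, intraclass_kappa) for paired binary ratings.

    Hypothetical cell layout (an assumption of this sketch):
      a: both ratings positive      b: first positive, second negative
      c: first negative, second positive      d: both ratings negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")

    p_obs = (a + d) / n   # observed proportion of agreement
    p1 = (a + b) / n      # prevalence according to the first rating
    p2 = (a + c) / n      # prevalence according to the second rating

    # Cohen's kappa: chance agreement from the two separate marginals.
    pe_cohen = p1 * p2 + (1 - p1) * (1 - p2)

    # Intraclass kappa: chance agreement from one pooled prevalence,
    # treating the two ratings as interchangeable.
    p = (p1 + p2) / 2
    pe_intra = p * p + (1 - p) * (1 - p)

    if pe_cohen == 1 or pe_intra == 1:
        raise ValueError("kappa is undefined when every rating is identical")

    cohen = (p_obs - pe_cohen) / (1 - pe_cohen)
    intraclass = (p_obs - pe_intra) / (1 - pe_intra)
    return cohen, intraclass


if __name__ == "__main__":
    cohen, intraclass = kappas_2x2(40, 10, 5, 45)
    print(f"Cohen kappa: {cohen:.4f}, intraclass kappa: {intraclass:.4f}")
```

The two coefficients coincide when both ratings have the same marginal prevalence and diverge as the marginals differ.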
journal_name:Stat Med
journal_title:Statistics in medicine
authors:Chmura Kraemer H, Periyakoil VS, Noda A
doi:10.1002/sim.1180
subject:Has Abstract
pub_date:2002-07-30 00:00:00
pages:2109-29
issue:14
eissn:0277-6715
issn:1097-0258
journal_volume:21
pub_type: Journal Article, Review
abstract::An observed confidence distribution is proposed as a measure of strength of evidence for practically equivalent efficacies of two treatments. The concept is independent of prior opinions about relevant sizes of a difference in efficacy. It also avoids retrospective power calculations for trials with missed recruitment...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780071207
update_date:1988-12-01 00:00:00
abstract::The "some invalid, some valid instrumental variable estimator" (sisVIVE) is a lasso-based method for instrumental variables (IVs) regression of outcome on an exposure. In principle, sisVIVE is robust to some of the IVs in the analysis being invalid, in the sense of being related to the outcome variable through pathway...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8066
update_date:2019-04-30 00:00:00
abstract::Selection of dose for cancer patients treated with radiation therapy (RT) must balance the increased efficacy with the increased toxicity associated with higher dose. Historically, a single dose has been selected for a population of patients (e.g., all stage III non-small cell lung cancer). However, the availability o...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6285
update_date:2014-12-30 00:00:00
abstract::The development of drugs and biologicals whose mechanisms of action may extend beyond their target indications has led to a need to identify unexpected potential toxicities promptly even while blinded clinical trials are under way. One component of recently issued FDA rules regarding safety reporting requirements rais...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7129
update_date:2017-01-15 00:00:00
abstract::An important problem in oncology is comparing chemotherapy (chemo) agents in terms of their effects on survival or progression-free survival time. When the goal is to evaluate individual agents, a difficulty commonly encountered with observational data is that many patients receive a chemo combination including two or...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.4249
update_date:2011-07-10 00:00:00
abstract::Conditional power and predictive power provide estimates of the probability of success at the end of the trial based on the information from the interim analysis. The observed value of the time to event endpoint at the interim analysis could be biased for the true treatment effect due to early censoring, leading to a ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7673
update_date:2018-08-15 00:00:00
abstract::Although the frequentist paradigm has been the predominant approach to clinical trial design since the 1940s, it has several notable limitations. Advancements in computational algorithms and computer hardware have greatly enhanced the alternative Bayesian paradigm. Compared with its frequentist counterpart, the Bayesi...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.5404
update_date:2012-11-10 00:00:00
abstract::We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mech...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6456
update_date:2015-05-20 00:00:00
abstract::Surrogate endpoint validation has been well established by the meta-analytical correlation-based approach as outlined in the seminal work of Buyse et al. (Biostatistics, 2000). Surrogacy can be assumed if strong associations on individual and study levels can be demonstrated. Alternatively, if an effect on a true endp...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6778
update_date:2016-03-30 00:00:00
abstract::We present an estimate of the kappa-coefficient of agreement between two methods of rating based on matched pairs of binary responses and show that the estimate depends on the common intraclass correlation coefficient between the pairs. Via Monte Carlo simulation, we investigate power of the test of significance on ka...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780140109
update_date:1995-01-15 00:00:00
abstract::Although statistical methodology is well-developed for comparing diagnostic tests in terms of their sensitivity and specificity, comparative inference about predictive values is not. In this paper, we consider the analysis of studies comparing operating characteristics of two diagnostic tests that are measured on all ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6485
update_date:2015-07-10 00:00:00
abstract::Assessment of quality of life is becoming standard in clinical trials. A popular method for measuring quality of life is with instruments which utilize multiple-item subscales, in which each item is scored on a Likert scale. Most statistical methods for the analysis of quality of life data in clinical trials do not ex...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19991115)18:21<2917::aid-s
update_date:1999-11-15 00:00:00
abstract::To compare the survival functions based on right-truncated data, Lagakos et al. proposed a weighted logrank test based on a reverse time scale. This is in contrast to Bilker and Wang, who suggested a semi-parametric version of the Mann-Whitney test by assuming that the distribution of truncation times is known or can ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2556
update_date:2007-02-20 00:00:00
abstract::A new, intuitive method has recently been proposed to explore treatment-covariate interactions in survival data arising from two treatment arms of a clinical trial. The method is based on constructing overlapping subpopulations of patients with respect to one (or more) covariates of interest and in observing the patte...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3524
update_date:2009-04-15 00:00:00
abstract::Cross-sectional designs are often used to monitor the proportion of infections and other post-surgical complications acquired in hospitals. However, conventional methods for estimating incidence proportions when applied to cross-sectional data may provide estimators that are highly biased, as cross-sectional designs t...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.5608
update_date:2013-06-30 00:00:00
abstract::Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations ar...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4067
update_date:2011-02-20 00:00:00
abstract::Hierarchical regression analysis holds much promise for epidemiologic analysis, but has as yet seen limited application because of lack of easily used software and the relatively lengthy run times of preferred fitting methods (such as true maximum likelihood and Bayesian approaches). This paper compares three relative...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19970315)16:5<515::aid-sim
update_date:1997-03-15 00:00:00
abstract::Recurrent event data are commonly encountered in health-related longitudinal studies. In this paper time-to-events models for recurrent event data are studied with non-informative and informative censorings. In statistical literature, the risk set methods have been confirmed to serve as an appropriate and efficient ap...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1029
update_date:2002-02-15 00:00:00
abstract::We propose a joint modeling approach to investigating the observed and latent risk factors of mixed types of outcomes. The proposed model comprises three parts. The first part is an exploratory factor analysis model that summarizes latent factors through multiple observed variables. The second part is a proportional h...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8840
update_date:2020-12-09 00:00:00
abstract::Many models for clinical prediction (prognosis or diagnosis) are published in the medical literature every year but few such models find their way into clinical practice. The reason may be that since in most cases models have not been validated in independent data, they lack generality and/or credibility. In this pape...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1691
update_date:2004-03-30 00:00:00
abstract::Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the c...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.2151
update_date:2005-09-30 00:00:00
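The definition of conditional power in the abstract above can be made concrete with a minimal sketch. It assumes a one-sided test on a normally distributed test statistic with independent increments (the usual Brownian-motion approximation) and no adjustment for interim stopping boundaries. The function name `conditional_power` and its parameters are hypothetical, chosen for this example. The `drift` argument is the assumed expected value of the final z-statistic: leaving it as `None` uses the effect estimated from the current data, while `drift=0` assumes the null effect for the remainder of the study.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def conditional_power(
    z_interim: float,
    info_frac: float,
    drift: Optional[float] = None,
    alpha: float = 0.025,
) -> float:
    """Probability that the final z-statistic exceeds the one-sided critical value.

    z_interim: z-statistic observed at the interim analysis.
    info_frac: fraction of total statistical information observed, in (0, 1).
    drift:     assumed expected final z-statistic; None means "current trend".
    alpha:     one-sided significance level of the final test.
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must lie strictly between 0 and 1")

    # Rescaled interim statistic; its increments are independent over time.
    b_value = z_interim * sqrt(info_frac)
    if drift is None:
        drift = b_value / info_frac  # effect estimated from the data so far

    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    remaining = 1 - info_frac
    # The remaining increment is normal with mean drift * remaining
    # and variance remaining.
    shortfall = (z_crit - b_value - drift * remaining) / sqrt(remaining)
    return 1 - _STD_NORMAL.cdf(shortfall)


if __name__ == "__main__":
    print(conditional_power(2.0, 0.5))             # current-trend assumption
    print(conditional_power(2.0, 0.5, drift=0.0))  # null effect from here on
```

The same interim result gives very different values under the two assumptions, so the assumption about the remaining data should be stated alongside any reported conditional power.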
abstract::Two features commonly exhibited by randomized trials of health promotion interventions are cluster randomization and stratification. Ignoring correlations between individuals within clusters can lead to an inflated type I error rate and hence a P-value which overstates the significance of the result. This paper compar...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1256
update_date:2002-12-30 00:00:00
abstract::In biomedical studies and clinical trials, repeated measures are often subject to some upper and/or lower limits of detection. Hence, the responses are either left or right censored. A complication arises when more than one series of responses is repeatedly collected on each subject at irregular intervals over a perio...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8017
update_date:2019-03-15 00:00:00
abstract::A new goodness-of-fit test for the logistic regression model is proposed. It exploits the property of this model that when it is correct, i.e. not misspecified, the parameter estimates are (asymptotically) invariant under reweighting the observations by weights w_i that are a function of the binary (0/1) outcomes y_i. M...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1997
update_date:2005-01-15 00:00:00
abstract::Risk prediction procedures can be quite useful for the patient's treatment selection, prevention strategy, or disease management in evidence-based medicine. Often, potentially important new predictors are available in addition to the conventional markers. The question is how to quantify the improvement from the new ma...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.5647
update_date:2013-06-30 00:00:00
abstract::The construction, validation and updating of a prognostic model for kidney graft survival is reported using data from the Eurotransplant database. First, a model is constructed for data from transplantations in the period 1984 to 1987. The model is later updated for the 1988-1990 data. The first data set was randomly ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780141806
update_date:1995-09-30 00:00:00
abstract::It is valuable in many studies to assess both intrarater and interrater agreement. Most measures of intrarater agreement do not adjust for unequal estimates of prevalence between the separate rating occasions for a given rater and measures of interrater agreement typically ignore data from the second set of assessment...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1138
update_date:2002-06-30 00:00:00
abstract::The use of outcome-dependent sampling with longitudinal data analysis has previously been shown to improve efficiency in the estimation of regression parameters. The motivating scenario is when outcome data exist for all cohort members but key exposure variables will be gathered only on a subset. Inference with outcom...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7633
update_date:2018-06-15 00:00:00
abstract::Many complex diseases are known to be affected by the interactions between genetic variants and environmental exposures beyond the main genetic and environmental effects. Study of gene-environment (G×E) interactions is important for elucidating the disease etiology. Existing Bayesian methods for G×E interaction studie...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8434
update_date:2020-02-28 00:00:00
abstract::In Part I we presented a covariance structure model for analysing measurement error in the assessment of nitrogen intake. In this paper we include data on urine nitrogen excretion which allows a critical assessment of the model proposed. Inclusion of urine nitrogen data produces more pessimistic estimates of the quali...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780121005
update_date:1993-05-30 00:00:00