Abstract:
In the medical literature, hundreds of prediction models are being developed to predict health outcomes in individuals. For continuous outcomes, typically a linear regression model is developed to predict an individual's outcome value conditional on values of multiple predictors (covariates). To improve model development and reduce the potential for overfitting, a suitable sample size is required in terms of the number of subjects (n) relative to the number of predictor parameters (p) for potential inclusion. We propose that the minimum value of n should meet the following four key criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9; (ii) small absolute difference of ≤ 0.05 in the apparent and adjusted R²; (iii) precise estimation (a margin of error ≤ 10% of the true value) of the model's residual standard deviation; and similarly, (iv) precise estimation of the mean predicted outcome value (model intercept). The criteria require prespecification of the user's chosen p and the model's anticipated R² as informed by previous studies. The value of n that meets all four criteria provides the minimum sample size required for model development. In an applied example, a new model to predict lung function in African-American women using 25 predictor parameters requires at least 918 subjects to meet all criteria, corresponding to at least 36.7 subjects per predictor parameter. Even larger sample sizes may be needed to additionally ensure precise estimates of key predictor effects, especially when important categorical predictors have low prevalence in certain categories.
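The arithmetic of the applied example and the thresholds of the four criteria can be illustrated with a minimal sketch. The function and argument names below are hypothetical and do not come from the paper. The abstract does not give the formulas for the shrinkage factor or the margins of error, so those quantities are taken as inputs rather than derived. The 10% limit used for criterion (iv) is an assumption read from the word "similarly" in the abstract.

```python
def subjects_per_parameter(n: int, p: int) -> float:
    """Return the number of subjects per predictor parameter (n / p)."""
    if p <= 0:
        raise ValueError("p must be a positive number of predictor parameters")
    return n / p


def meets_criteria(shrinkage: float,
                   r2_apparent: float,
                   r2_adjusted: float,
                   moe_residual_sd: float,
                   moe_intercept: float) -> dict:
    """Check each of the four criteria against its threshold.

    Margins of error are given as fractions of the true value
    (0.10 means 10%). All inputs are assumed to be computed elsewhere.
    """
    return {
        "i_shrinkage": shrinkage >= 0.9,
        "ii_r2_difference": abs(r2_apparent - r2_adjusted) <= 0.05,
        "iii_residual_sd": moe_residual_sd <= 0.10,
        # Assumption: criterion (iv) uses the same 10% limit as (iii).
        "iv_intercept": moe_intercept <= 0.10,
    }


# Applied example from the abstract: 918 subjects, 25 predictor parameters.
spp = subjects_per_parameter(918, 25)
print(f"Subjects per predictor parameter: {spp:.2f}")

# Illustrative (made-up) inputs, only to show how the check is used.
checks = meets_criteria(0.92, 0.22, 0.20, 0.08, 0.05)
print(all(checks.values()))
```

A candidate sample size would be acceptable under this sketch only when every entry in the returned dictionary is true, which mirrors the abstract's requirement that n meet all four criteria.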
journal_name: Stat Med
journal_title: Statistics in medicine
authors: Riley RD, Snell KIE, Ensor J, Burke DL, Harrell FE Jr, Moons KGM, Collins GS
doi: 10.1002/sim.7993
subject: Has Abstract
pub_date: 2019-03-30
pages: 1262-1275
issue: 7
eissn: 0277-6715
issn: 1097-0258
journal_volume: 38
pub_type: Journal Article
abstract::Cancer immunotherapy trials have two special features: a delayed treatment effect and a cure rate. Both features violate the proportional hazard model assumption and ignoring either one of the two features in an immunotherapy trial design will result in substantial loss of statistical power. To properly design immunot...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8440
update_date: 2020-03-15
abstract::Cox proportional hazard regression model is a popular tool to analyze the relationship between a censored lifetime variable with other relevant factors. The semiparametric Cox model is widely used to study different types of data arising from applied disciplines such as medical science, biology, and reliability studie...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8377
update_date: 2019-11-30
abstract::Spatial scan statistics are widely used for count data to detect geographical disease clusters of high or low incidence, mortality or prevalence and to evaluate their statistical significance. Some data are ordinal or continuous in nature, however, so that it is necessary to dichotomize the data to use a traditional s...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2607
update_date: 2007-03-30
abstract::We consider several sources of heterogeneity in a clinical trial with patients' survival time as the main response criterion: differences in prognosis which can be attributed to a latent or ignored prognostic factor; differences in treatment efficacy in subgroups of patients, and differences in treatment combinations ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780060708
update_date: 1987-10-01
abstract::The purpose of this paper is to show that the sensitivity and specificity estimates obtained by 'discrepant analysis' are biased. Discrepant analysis is a widely used technique that attempts to provide estimates of sensitivity and specificity in the presence of an imperfect gold standard. Many researchers have applied...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19970630)16:12<1391::aid-s
update_date: 1997-06-30
abstract::Quantifying socioeconomic disparities and understanding the roots of inequalities are growing topics in cancer research. However, socioeconomic differences are challenging to investigate mainly due to the lack of accurate data at individual-level, while aggregate indicators are only partially informative. We implement...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8392
update_date: 2020-01-15
abstract::The construction, validation and updating of a prognostic model for kidney graft survival is reported using data from the Eurotransplant database. First, a model is constructed for data from transplantations in the period 1984 to 1987. The model is later updated for the 1988-1990 data. The first data set was randomly ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780141806
update_date: 1995-09-30
abstract::We consider the use of the assurance method in clinical trial planning. In the assurance method, which is an alternative to a power calculation, we calculate the probability of a clinical trial resulting in a successful outcome, via eliciting a prior probability distribution about the relevant treatment effect. This i...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.5916
update_date: 2014-01-15
abstract::This paper discusses and compares several estimators of mean rate of change in unbalanced longitudinal data based on a model with randomly distributed regression coefficients across individuals. The estimators are unweighted and weighted means of these coefficients. The paper also evaluates commonly used variance esti...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780060509
update_date: 1987-07-01
abstract::In cancer clinical trials, patients often experience a recurrence of disease prior to the outcome of interest, overall survival. Additionally, for many cancers, there is a cured fraction of the population who will never experience a recurrence. There is often interest in how different covariates affect the probability...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6056
update_date: 2014-05-10
abstract::We propose a two-step procedure to personalize drug dosage over time under the framework of a log-linear mixed-effect model. We model patients' heterogeneity using subject-specific random effects, which are treated as the realizations of an unspecified stochastic process. We extend the conditional quadratic inference ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7016
update_date: 2016-10-30
abstract::Random forest is a supervised learning method that combines many classification or regression trees for prediction. Here we describe an extension of the random forest method for building event risk prediction models in survival analysis with competing risks. In case of right-censored data, the event status at the pred...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.5775
update_date: 2013-08-15
abstract::In this paper we describe Bonferroni-based multiple testing procedures (MTPs) as strategies to split and recycle test mass. Here, 'test mass' refers to (parts of) the nominal level alpha at which the family-wise error rate is controlled. Briefly, test mass is split between different null hypotheses, and whenever a nul...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3513
update_date: 2009-02-28
abstract::Conditional power (CP) is the probability that the final study result will be statistically significant, given the data observed thus far and a specific assumption about the pattern of the data to be observed in the remainder of the study, such as assuming the original design effect, or the effect estimated from the c...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.2151
update_date: 2005-09-30
abstract::Multistate models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors impact over the entire disease pathway. In this arti...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7448
update_date: 2017-12-20
abstract::Maps of estimated disease rates over multiple time periods are useful tools for gaining etiologic insights regarding potential exposures associated with specific locations and times. In this paper, we describe an extension of the Gangnon-Clayton model for spatial clustering to spatio-temporal data. As in the purely sp...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3984
update_date: 2010-09-30
abstract::In clinical trials, treatment comparisons are often performed by models that incorporate important prognostic factors. Since these models require complete covariate information on all patients, statisticians frequently resort to complete case analysis or to omission of an important covariate. A probability imputation ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780090707
update_date: 1990-07-01
abstract::There is a rich literature that considers whether an observed relation between treatment and response is due to an unobserved covariate. In order to quantify this unmeasured bias, an assumption is made about the distribution of this unobserved covariate; typically that it is either binary or at least confined to the u...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2344
update_date: 2006-07-15
abstract::In randomised trials, continuous endpoints are often measured with some degree of error. This study explores the impact of ignoring measurement error and proposes methods to improve statistical inference in the presence of measurement error. Three main types of measurement error in continuous endpoints are considered:...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8359
update_date: 2019-11-30
abstract::Previous work on the consequences of regression to the mean for the interpretation of responses to treatment is extended to the situation where the response measured is the proportional change in some variable. Methods for correcting for the bias are discussed. ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780060203
update_date: 1987-03-01
abstract::Assay sensitivity has been proposed as a criterion for including psychiatric clinical outcome studies in meta-analyses. The authors assess the performance of assay sensitivity as a method for determining study appropriateness for meta-analysis by calculating expected standard drug vs placebo effect sizes for various c...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2240
update_date: 2006-03-30
abstract::The Spearman (rho(s)) and Kendall (tau) rank correlation coefficient are routinely used as measures of association between non-normally distributed random variables. However, confidence limits for rho(s) are only available under the assumption of bivariate normality and for tau under the assumption of asymptotic norma...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2547
update_date: 2007-02-10
abstract::Vaccination in populations can have several kinds of effects. Establishing that vaccination produces population-level effects beyond the direct effects in the vaccinated individuals can have important consequences for public health policy. Formal methods have been developed for study designs and analysis that can esti...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7392
update_date: 2018-01-30
abstract::In repeated measures settings, modeling the correlation pattern of the data can be immensely important for proper analyses. Accurate inference requires proper choice of the correlation model. Optimal efficiency of the estimation procedure demands a parsimonious parameterization of the correlation structure, with suffi...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3928
update_date: 2010-07-30
abstract::It is unclear to what extent the incremental predictive performance of a novel biomarker is impacted by the method used to control for standard predictors. We investigated whether adding a biomarker to a model with a published risk score overestimates its incremental performance as compared to adding it to a multivari...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6165
update_date: 2014-07-10
abstract::Relative survival is used to estimate patient survival excluding causes of death not related to the disease of interest. Rather than using cause of death information from death certificates, which is often poorly recorded, relative survival compares the observed survival to that expected in a matched group from the ge...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2399
update_date: 2005-12-30
abstract::We propose a goodness-of-fit test statistic for linear regression with heterogeneous variance, which is asymptotically chi-square if the given model is correct. The test statistic is computed as a quadratic form of observed minus predicted responses. We apply the method to a linear regression for an ordinal categorica...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780130205
update_date: 1994-01-30
abstract::We consider bivariate survival times for heterogeneous populations, where heterogeneity induces deviations in an individual's risk of an event as well as associations between survival times. The heterogeneity is characterized by a bivariate frailty model. We measure the heterogeneity effects through deviations associa...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19990430)18:8<907::aid-sim
update_date: 1999-04-30
abstract::Outcomes research often requires estimating the impact of a binary treatment on a binary outcome in a non-randomized setting, such as the effect of taking a drug on mortality. The data often come from self-selected samples, leading to a spurious correlation between the treatment and outcome when standard binary depend...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2226
update_date: 2006-02-15
abstract::Patients who switch treatment groups in randomized clinical trials can cause problems in the interpretation of the results. Although the intention-to-treat method is recognized as being the most reliable analysis, it may result in an underestimate of the treatment effect if there have been patients who switch treatmen...
journal_title:Statistics in medicine
pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial
doi:10.1002/(SICI)1097-0258(19961015)15:19<2069::AID-S
update_date: 1996-10-15