Abstract:
When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥0.9, (ii) small absolute difference of ≤0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
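The three criteria in the abstract can be illustrated with a minimal sketch for a binary outcome. The abstract itself gives no formulas, so the closed-form expressions below are assumptions drawn from how these criteria are usually written, not a transcription of the paper. The function names and the inputs (p = 20, Cox-Snell R² = 0.1, outcome proportion 0.1) are hypothetical and do not correspond to the Chagas or venous thromboembolism examples.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i), assumed form: n = p / ((S - 1) * ln(1 - R2_CS / S))."""
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage)))


def max_r2_cs_binary(phi):
    """Assumed maximum Cox-Snell R2 for a binary outcome with proportion phi."""
    ln_l_null_per_person = phi * math.log(phi) + (1.0 - phi) * math.log(1.0 - phi)
    return 1.0 - math.exp(2.0 * ln_l_null_per_person)


def n_for_r2_difference(p, r2_cs, phi, delta=0.05):
    """Criterion (ii), assumed form: convert the allowed difference `delta` in
    Nagelkerke's R2 into a required shrinkage factor, then reuse criterion (i)."""
    s = r2_cs / (r2_cs + delta * max_r2_cs_binary(phi))
    return n_for_shrinkage(p, r2_cs, shrinkage=s)


def n_for_overall_risk(phi, margin=0.05, z=1.96):
    """Criterion (iii), assumed form: estimate the overall risk within +/- margin."""
    return math.ceil((z / margin) ** 2 * phi * (1.0 - phi))


def minimum_sample_size(p, r2_cs, phi):
    """Return (n, expected events E, EPP) satisfying all three criteria."""
    n = max(
        n_for_shrinkage(p, r2_cs),
        n_for_r2_difference(p, r2_cs, phi),
        n_for_overall_risk(phi),
    )
    events = n * phi
    return n, events, events / p


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    n, events, epp = minimum_sample_size(p=20, r2_cs=0.1, phi=0.1)
    print(f"n = {n}, E = {events:.1f}, EPP = {epp:.2f}")
```

Taking the maximum over the three criteria mirrors the abstract's statement that the values of n and E meeting all three criteria give the minimum sample size. The resulting EPP then depends on the anticipated R² and outcome proportion rather than on a fixed rule of thumb.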
journal_name: Stat Med
journal_title: Statistics in medicine
authors: Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, Collins GS
doi: 10.1002/sim.7992
subject: Has Abstract
pub_date: 2019-03-30 00:00:00
pages: 1276-1296
issue: 7
eissn: 0277-6715
issn: 1097-0258
journal_volume: 38
pub_type: Journal article
abstract::In randomised trials, continuous endpoints are often measured with some degree of error. This study explores the impact of ignoring measurement error and proposes methods to improve statistical inference in the presence of measurement error. Three main types of measurement error in continuous endpoints are considered:...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.8359
update_date:2019-11-30 00:00:00
abstract::Shared random effects models have been increasingly common in the joint analyses of repeated measures (e.g. CD4 counts, hemoglobin levels) and a correlated failure time such as death. In this paper we study several shared random effects models in the multi-level repeated measures data setting with dependent failure ti...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.3392
update_date:2008-11-29 00:00:00
abstract::In conventional survival analysis there is an underlying assumption that all study subjects are susceptible to the event. In general, this assumption does not adequately hold when investigating the time to an event other than death. Owing to genetic and/or environmental etiology, study subjects may not be susceptible ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5845
update_date:2013-10-30 00:00:00
abstract::The publication of Fisher's correspondence on statistics has shed new light on his views on randomization. Quotations from this correspondence and from other works of Fisher are used to illustrate the role of randomization in clinical trials. It is concluded that Fisher's views not only are coherent but, despite havin...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780130305
update_date:1994-02-15 00:00:00
abstract::A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the prop...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7401
update_date:2017-11-10 00:00:00
abstract::When the event time of interest depends on the censoring time, conventional two-sample test methods, such as the log-rank and Wilcoxon tests, can produce an invalid test result. We extend our previous work on estimation using auxiliary variables to adjust for dependent censoring via multiple imputation, to the compari...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.3480
update_date:2009-02-01 00:00:00
abstract::A Non-Parametric Maximum Likelihood approach to the estimation of relative risks in the context of disease mapping is discussed and a NPML approximation to conditional autoregressive models is proposed. NPML estimates have been compared to other proposed solutions (Maximum Likelihood via Monte Carlo Scoring, Hierarchi...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/1097-0258(20000915/30)19:17/18<2539::aid-s
update_date:2000-09-15 00:00:00
abstract::Most phase I dose-finding methods in oncology aim to find the maximum-tolerated dose from a set of prespecified doses. However, in practice, because of a lack of understanding of the true dose-toxicity relationship, it is likely that none of these prespecified doses are equal or reasonably close to the true maximum-to...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.6933
update_date:2016-09-10 00:00:00
abstract::Using both simulated and real datasets, we compared two approaches for estimating absolute risk from nested case-control (NCC) data and demonstrated the feasibility of using the NCC design for estimating absolute risk. In contrast to previously published results, we successfully demonstrated not only that data from a ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7143
update_date:2017-02-10 00:00:00
abstract::A mixture model incorporating long-term survivors has been adopted in the field of biostatistics where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survi...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.932
update_date:2001-06-15 00:00:00
abstract::Two correction methods are considered for multiple logistic regression models with some covariates measured with error. Both methods are based on approximating the complicated regression model between the response and the observed covariates with simpler models. The first model is the logistic approximation proposed b...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780131105
update_date:1994-06-15 00:00:00
abstract::Risk prediction procedures can be quite useful for the patient's treatment selection, prevention strategy, or disease management in evidence-based medicine. Often, potentially important new predictors are available in addition to the conventional markers. The question is how to quantify the improvement from the new ma...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5647
update_date:2013-06-30 00:00:00
abstract::We consider the problem of identifying a subgroup of patients who may have an enhanced treatment effect in a randomized clinical trial, and it is desirable that the subgroup be defined by a limited number of covariates. For this problem, the development of a standard, pre-determined strategy may help to avoid the well...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4322
update_date:2011-10-30 00:00:00
abstract::Although the frequentist paradigm has been the predominant approach to clinical trial design since the 1940s, it has several notable limitations. Advancements in computational algorithms and computer hardware have greatly enhanced the alternative Bayesian paradigm. Compared with its frequentist counterpart, the Bayesi...
journal_title:Statistics in medicine
pub_type: Journal article, Review
doi:10.1002/sim.5404
update_date:2012-11-10 00:00:00
abstract::There are many settings in which the distribution of error in a mismeasured covariate varies with the value of another covariate. Take, for example, the case of HIV phylogenetic cluster size, large values of which are an indication of rapid HIV transmission. Researchers wish to find behavioral correlates of HIV phylog...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7289
update_date:2017-07-30 00:00:00
abstract::Drop-out often occurs in clinical trials with multiple visits and drop-out is often informative in the sense that the population of patients who dropped out is different from the population of patients who completed the study. To handle data with informative drop-out, an intention-to-treat analysis, which evaluates tr...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.1519
update_date:2003-08-15 00:00:00
abstract::Lung function tests are used both clinically, in assessing disease, and epidemiologically, in identifying those factors which influence the growth and aging process of the lungs. The user must beware of several common pitfalls in the use of these tests, however. First, the commonly used tests of lung function can only...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780070106
update_date:1988-01-01 00:00:00
abstract::In this paper we describe Bonferroni-based multiple testing procedures (MTPs) as strategies to split and recycle test mass. Here, 'test mass' refers to (parts of) the nominal level alpha at which the family-wise error rate is controlled. Briefly, test mass is split between different null hypotheses, and whenever a nul...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.3513
update_date:2009-02-28 00:00:00
abstract::This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis to decide if pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, an...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4285
update_date:2011-09-10 00:00:00
abstract::To update the British growth reference, anthropometric data for weight, height, body mass index (weight/height²) and head circumference from 17 distinct surveys representative of England, Scotland and Wales (37,700 children, age range 23 weeks gestation to 23 years) were analysed by maximum penalized likelihood using ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:
update_date:1998-02-28 00:00:00
abstract::Determination of the equation that relates an ordered dependent variable to ordered independent variables is sought. One solution, non-parametric discriminant analysis (NPD), involves obtaining the best monotonic step function by means of a computer search procedure. Although one can use alternative selection criteria...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780110804
update_date:1992-06-15 00:00:00
abstract::Health authorities are often alerted to suspected cancer clusters near the vicinity of potential point sources by members of the public. A surveillance system, where administrative regions around the potential point sources are regularly monitored for high disease rates, would allow for responses which are easier to o...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/(sici)1097-0258(19960415)15:7/9<727::aid-s
update_date:1996-04-15 00:00:00
abstract::The goal of screening programmes for cancer is early detection and treatment with a consequent reduction in mortality from the disease. Screening programmes need to assess the true benefit of screening, that is, the length of time of extension of survival beyond the time of advancement of diagnosis (lead-time). This p...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780142410
update_date:1995-12-30 00:00:00
abstract::The Response Evaluation Criteria in Solid Tumors are used as standard guidelines for the clinical evaluation of cancer treatments. The assessment is based on the anatomical tumor burden: change in size of target lesions and evolution of nontarget lesions (NTL). Despite unquestionable advantages of this standard tool, ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7640
update_date:2018-06-15 00:00:00
abstract::To study the effect of a mega hydropower dam in southwest Ethiopia on malaria incidence, we have set up a longitudinal study. To gain insight in temporal and spatial aspects, that is, in time (period = year-season combination) and location (village), we need models that account for these effects. The frailty model w...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5752
update_date:2013-08-15 00:00:00
abstract::We propose to use a very simple model to test whether a cancer cluster is due to chance alone. We focus on the acute childhood leukaemia cluster in Columbus, Ohio. In 1975, 12 leukaemia cases were observed in Columbus while the expected number is 6 cases per year. According to our simple model, the probability of such...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/1097-0258(20000830)19:16<2195::aid-sim522>
update_date:2000-08-30 00:00:00
abstract::To compare the survival functions based on right-truncated data, Lagakos et al. proposed a weighted logrank test based on a reverse time scale. This is in contrast to Bilker and Wang, who suggested a semi-parametric version of the Mann-Whitney test by assuming that the distribution of truncation times is known or can ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.2556
update_date:2007-02-20 00:00:00
abstract::Assessment of equivalence or non-inferiority in accuracy between two diagnostic procedures often involves comparisons of paired areas under the receiver operating characteristic (ROC) curves. With some pre-specified clinically meaningful limits, the current approach to evaluating equivalence is to perform the two one-...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.2358
update_date:2006-04-15 00:00:00
abstract::The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7028
update_date:2016-11-20 00:00:00
abstract::Composite endpoints are frequently used in clinical trials, but simple approaches, such as the time to first event, do not reflect any ordering among the endpoints. However, some endpoints, such as mortality, are worse than others. A variety of procedures have been proposed to reflect the severity of the individual en...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.8431
update_date:2020-02-28 00:00:00