Sample size for binary logistic prediction models: Beyond events per variable criteria.

Abstract:

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an events per variable (EPV) criterion, notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on the out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of the developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can be better approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
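As a rough illustration of why a single EPV threshold can hide very different study designs, the sketch below computes EPV alongside the three quantities the abstract recommends considering (number of predictors, total sample size, events fraction). It assumes the usual definition of EPV as the number of observations in the less frequent outcome class divided by the number of candidate predictors; the class name `DevelopmentData` and the two scenarios are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevelopmentData:
    """Hypothetical summary of a model development dataset (illustrative only)."""

    n_events: int       # observations with the outcome
    n_nonevents: int    # observations without the outcome
    n_predictors: int   # candidate predictors considered

    def __post_init__(self):
        if self.n_predictors <= 0:
            raise ValueError("n_predictors must be positive")
        if self.n_events < 0 or self.n_nonevents < 0:
            raise ValueError("counts must be non-negative")
        if self.n_events + self.n_nonevents == 0:
            raise ValueError("dataset must contain at least one observation")

    @property
    def total_sample_size(self) -> int:
        return self.n_events + self.n_nonevents

    @property
    def events_fraction(self) -> float:
        return self.n_events / self.total_sample_size

    @property
    def epv(self) -> float:
        # Assumed definition: less frequent outcome class per candidate predictor.
        return min(self.n_events, self.n_nonevents) / self.n_predictors


# Two made-up designs that both satisfy EPV >= 10 with 10 candidate predictors.
balanced = DevelopmentData(n_events=100, n_nonevents=100, n_predictors=10)
rare_outcome = DevelopmentData(n_events=100, n_nonevents=900, n_predictors=10)

for name, data in [("balanced", balanced), ("rare outcome", rare_outcome)]:
    print(
        f"{name}: EPV={data.epv}, N={data.total_sample_size}, "
        f"events fraction={data.events_fraction}, predictors={data.n_predictors}"
    )
```

Both designs have the same EPV, yet they differ fivefold in total sample size and in events fraction, which is the kind of difference an EPV-only criterion cannot distinguish.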

journal_name

Stat Methods Med Res

authors

van Smeden M,Moons KG,de Groot JA,Collins GS,Altman DG,Eijkemans MJ,Reitsma JB

doi

10.1177/0962280218784726

subject

Has Abstract

pub_date

2019-08-01 00:00:00

pages

2455-2474

issue

8

eissn

1477-0334

issn

0962-2802

journal_volume

28

pub_type

Journal Article
  • Estimating the average treatment effects of nutritional label use using subclassification with regression adjustment.

    abstract::Propensity score methods are common for estimating a binary treatment effect when treatment assignment is not randomized. When exposure is measured on an ordinal scale (i.e. low-medium-high), however, propensity score inference requires extensions which have received limited attention. Estimands of possible interest w...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280214560046

    authors: Lopez MJ,Gutman R

    update_date:2017-04-01 00:00:00

  • Multilevel models for censored and latent responses.

    abstract::Multilevel models were originally developed to allow linear regression or ANOVA models to be applied to observations that are not mutually independent. This lack of independence commonly arises due to clustering of the units of observations into 'higher level units' such as patients in hospitals. In linear mixed model...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/096228020101000604

    authors: Rabe-Hesketh S,Yang S,Pickles A

    update_date:2001-12-01 00:00:00

  • Quantile residual lifetime regression with functional principal component analysis of longitudinal data for dynamic prediction.

    abstract::Optimal therapeutic decisions can be made according to disease prognosis, where the residual lifetime is extensively used because of its straightforward interpretation and formula. To predict the residual lifetime in a dynamic manner, a longitudinal biomarker that is repeatedly measured during the post-baseline follow...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217753466

    authors: Lin X,Li R,Yan F,Lu T,Huang X

    update_date:2019-04-01 00:00:00

  • Measurement error, time lag, unmeasured confounding: Considerations for longitudinal estimation of the effect of a mediator in randomised clinical trials.

    abstract::Clinical trials are expensive and time-consuming and so should also be used to study how treatments work, allowing for the evaluation of theoretical treatment models and refinement and improvement of treatments. These treatment processes can be studied using mediation analysis. Randomised treatment makes some of the a...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216666111

    authors: Goldsmith KA,Chalder T,White PD,Sharpe M,Pickles A

    update_date:2018-06-01 00:00:00

  • Practical issues arising in an exploratory analysis evaluating progression-free survival as a surrogate endpoint for overall survival in advanced colorectal cancer.

    abstract::This paper is based on a conference presentation in which several authors presented results from analyses of the same dataset concerning the evaluation of progression-free survival (PFS) as a surrogate endpoint for overall survival in advanced colorectal cancer clinical trials. In evaluating a potential surrogate endp...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280207081860

    authors: Hughes MD

    update_date:2008-10-01 00:00:00

  • Fitting competing risks with an assumed copula.

    abstract::We propose a fully parametric model for the analysis of competing risks data where the types of failure may not be independent. We show how the dependence between the cause-specific survival times can be modelled with a copula function. Features include: identifiability of the problem; accessible understanding of the ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1191/0962280203sm335ra

    authors: Escarela G,Carrière JF

    update_date:2003-08-01 00:00:00

  • Fitting mechanistic epidemic models to data: A comparison of simple Markov chain Monte Carlo approaches.

    abstract::Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical fra...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217747054

    authors: Li M,Dushoff J,Bolker BM

    update_date:2018-07-01 00:00:00

  • Inferences about a linear combination of proportions.

    abstract::Statistical methods for carrying out asymptotic inferences (tests or confidence intervals) relative to one or two independent binomial proportions are very frequent. However, inferences about a linear combination of K independent proportions L = Σβ(i)p(i) (in which the first two are special cases) have had very little...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280209347953

    authors: Martín Andrés A,Alvarez Hernández M,Herranz Tejedor I

    update_date:2011-08-01 00:00:00

  • Causal mediation analysis with multiple causally non-ordered mediators.

    abstract::In many health studies, researchers are interested in estimating the treatment effects on the outcome around and through an intermediate variable. Such causal mediation analyses aim to understand the mechanisms that explain the treatment effect. Although multiple mediators are often involved in real studies, most of t...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280215615899

    authors: Taguri M,Featherstone J,Cheng J

    update_date:2018-01-01 00:00:00

  • Gene selection for survival data under dependent censoring: A copula-based approach.

    abstract::Dependent censoring arises in biomedical studies when the survival outcome of interest is censored by competing risks. In survival data with microarray gene expressions, gene selection based on the univariate Cox regression analyses has been used extensively in medical research, which however, is only valid under the ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280214533378

    authors: Emura T,Chen YH

    update_date:2016-12-01 00:00:00

  • Inferring the direction of a causal link and estimating its effect via a Bayesian Mendelian randomization approach.

    abstract::The use of genetic variants as instrumental variables - an approach known as Mendelian randomization - is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must sati...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280219851817

    authors: Bucur IG,Claassen T,Heskes T

    update_date:2020-04-01 00:00:00

  • Pattern discovery of health curves using an ordered probit model with Bayesian smoothing and functional principal component analysis.

    abstract::This article is motivated by the need for discovering patterns of patients' health based on their daily settings of care to aid the health policy-makers to improve the effectiveness of distributing funding for health services. The hidden process of one's health status is assumed to be a continuous smooth function, cal...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220951834

    authors: Wang S,Nie Y,Sutherland JM,Wang L

    update_date:2020-09-25 00:00:00

  • On adaptive propensity score truncation in causal inference.

    abstract::The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of p...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218774817

    authors: Ju C,Schwab J,van der Laan MJ

    update_date:2019-06-01 00:00:00

  • Inferences about population means of health care costs.

    abstract::The analysis of health care costs is complicated by the skewed and heteroscedastic nature of their distribution with possibly additional zero values. Statistical methods that do not adjust for these features can lead to incorrect conclusions. This paper reviews recent developments in statistical methods for the analys...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1191/0962280202sm290ra

    authors: Zhou XH

    update_date:2002-08-01 00:00:00

  • Functional data analysis in longitudinal settings using smoothing splines.

    abstract::Data in many experiments arise as curves and therefore it is natural to use a curve as a basic unit in the analysis, which is termed functional data analysis (FDA). In longitudinal studies, recent developments in FDA have extended classical linear models and linear mixed effects models to functional linear models (als...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1191/0962280204sm352ra

    authors: Guo W

    update_date:2004-02-01 00:00:00

  • armDNA: A functional beta model for detecting age-related genomewide DNA methylation marks.

    abstract::DNA methylation has been shown to play an important role in many complex diseases. The rapid development of high-throughput DNA methylation scan technologies provides great opportunities for genomewide DNA methylation-disease association studies. As methylation is a dynamic process involving time, it is quite plausibl...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216683571

    authors: Wang C,Shen Q,Du L,Xu J,Zhang H

    update_date:2018-09-01 00:00:00

  • Prospective analysis of infectious disease surveillance data using syndromic information.

    abstract::In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease coun...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280214527385

    authors: Corberán-Vallet A,Lawson AB

    update_date:2014-12-01 00:00:00

  • Statistical methods for longitudinal and clustered designs with binary responses.

    abstract::Dependent binary response data arise frequently in practice due to repeated measurements in longitudinal studies or to subsampling primary sampling units as in fields such as teratology and ophthalmology. Several classes of approaches have recently been proposed to analyse such repeated binary outcome data. The differ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029200100303

    authors: Neuhaus JM

    update_date:1992-01-01 00:00:00

  • Model selection in multivariate semiparametric regression.

    abstract::Variable selection in semiparametric mixed models for longitudinal data remains a challenge, especially in the presence of multiple correlated outcomes. In this paper, we propose a model selection procedure that simultaneously selects fixed and random effects using a maximum penalized likelihood method with the adapti...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217690769

    authors: Li Z,Liu H,Tu W

    update_date:2018-10-01 00:00:00

  • Underestimation of treatment effects in sequentially monitored clinical trials that did not stop early for benefit.

    abstract::In recent years, there has been a prominent discussion in the literature about the potential for overestimation of the treatment effect when a clinical trial stops at an interim analysis due to the experimental treatment showing a benefit over the control. However, there has been much less attention paid to the conver...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218795320

    authors: Marschner IC,Schou IM

    update_date:2019-10-01 00:00:00

  • Mixture modelling for cluster analysis.

    abstract::Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1191/0962280204sm372ra

    authors: McLachlan GJ,Chang SU

    update_date:2004-10-01 00:00:00

  • Copas-like selection model to correct publication bias in systematic review of diagnostic test studies.

    abstract::The accuracy of a diagnostic test, which is often quantified by a pair of measures such as sensitivity and specificity, is critical for medical decision making. Separate studies of an investigational diagnostic test can be combined through meta-analysis; however, such an analysis can be threatened by publication bias....

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218791602

    authors: Piao J,Liu Y,Chen Y,Ning J

    update_date:2019-10-01 00:00:00

  • Multi-state Markov models in cancer screening evaluation: a brief review and case study.

    abstract::This work presents a brief overview of Markov models in cancer screening evaluation and focuses on two specific models. A three-state model was first proposed to estimate jointly the sensitivity of the screening procedure and the average duration in the preclinical phase, i.e. the period when the cancer is asymptomati...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/0962280209359848

    authors: Uhry Z,Hédelin G,Colonna M,Asselain B,Arveux P,Rogel A,Exbrayat C,Guldenfels C,Courtial I,Soler-Michel P,Molinié F,Eilstein D,Duffy SW

    update_date:2010-10-01 00:00:00

  • Correcting for dependent censoring in routine outcome monitoring data by applying the inverse probability censoring weighted estimator.

    abstract::Censored data make survival analysis more complicated because exact event times are not observed. Statistical methodology developed to account for censored observations assumes that patients' withdrawal from a study is independent of the event of interest. However, in practice, some covariates might be associated to b...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216628900

    authors: Willems S,Schat A,van Noorden MS,Fiocco M

    update_date:2018-02-01 00:00:00

  • A corrected formulation for marginal inference derived from two-part mixed models for longitudinal semi-continuous data.

    abstract::For semi-continuous data which are a mixture of true zeros and continuously distributed positive values, the use of two-part mixed models provides a convenient modelling framework. However, deriving population-averaged (marginal) effects from such models is not always straightforward. Su et al. presented a model that ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280213509798

    authors: Tom BD,Su L,Farewell VT

    update_date:2016-10-01 00:00:00

  • Allele-sharing among affected relatives: non-parametric methods for identifying genes.

    abstract::Non-parametric linkage analysis examines similarities among affected relatives in alleles of one or more genetic markers (pieces of DNA at known locations on a chromosome). The objective is to evaluate departures from the null hypothesis that the markers are not near a disease gene. Under the null hypothesis, Mendel's...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228020101000103

    authors: Shih MC,Whittemore AS

    update_date:2001-02-01 00:00:00

  • Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information.

    abstract::Identification of cancer patient subgroups using high throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem still remains challenging because o...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217752980

    authors: Wei W,Sun Z,da Silveira WA,Yu Z,Lawson A,Hardiman G,Kelemen LE,Chung D

    update_date:2019-07-01 00:00:00

  • Combining estimates of the odds ratio: the state of the art.

    abstract::Medical research commonly relies on the combination of 2 x 2 tables of counted data for making inferences about treatment effects or about the causes of disease. This article reviews point estimation and interval estimation for a common odds ratio. Traditional methods for providing these estimates face special challen...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029400300204

    authors: Emerson JD

    update_date:1994-01-01 00:00:00

  • Bayesian variable selection in the accelerated failure time model with an application to the surveillance, epidemiology, and end results breast cancer data.

    abstract::Accelerated failure time model is a popular model to analyze censored time-to-event data. Analysis of this model without assuming any parametric distribution for the model error is challenging, and the model complexity is enhanced in the presence of large number of covariates. We developed a nonparametric Bayesian met...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280215626947

    authors: Zhang Z,Sinha S,Maiti T,Shipp E

    update_date:2018-04-01 00:00:00

  • Change-point detection for infinite horizon dynamic treatment regimes.

    abstract::A dynamic treatment regime is a set of decision rules for how to treat a patient at multiple time points. At each time point, a treatment decision is made depending on the patient's medical history up to that point. We consider the infinite-horizon setting in which the number of decision points is very large. Specific...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217708655

    authors: Goldberg Y,Pollak M,Mitelpunkt A,Orlovsky M,Weiss-Meilik A,Gorfine M

    update_date:2017-08-01 00:00:00