Abstract:
Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
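As a rough illustration of the quantities named in the abstract, the sketch below computes EPV from the total sample size, the events fraction and the number of candidate predictors. It assumes the common convention that "events" means the observations in the smaller of the two outcome groups. The function name and the example numbers are hypothetical, and the sketch does not reproduce the authors' simulation study.

```python
def events_per_variable(n_total, events_fraction, n_predictors):
    """Return (EPV, number of events) for a binary-outcome dataset.

    Assumption: events are counted in the smaller outcome group, so an
    events fraction of 0.9 is treated the same as 0.1.
    """
    if n_total <= 0 or n_predictors <= 0:
        raise ValueError("n_total and n_predictors must be positive")
    if not 0.0 < events_fraction < 1.0:
        raise ValueError("events_fraction must lie strictly between 0 and 1")
    minority_fraction = min(events_fraction, 1.0 - events_fraction)
    n_events = n_total * minority_fraction
    return n_events / n_predictors, n_events


# Two hypothetical designs that differ in sample size and events fraction
# but share the same number of events and candidate predictors.
epv_a, events_a = events_per_variable(n_total=200, events_fraction=0.5, n_predictors=10)
epv_b, events_b = events_per_variable(n_total=1000, events_fraction=0.1, n_predictors=10)
print(f"design A: events={events_a:.0f}, EPV={epv_a:.1f}")
print(f"design B: events={events_b:.0f}, EPV={epv_b:.1f}")
```

Both designs yield the same EPV although their total sample sizes and events fractions differ. This is why a single EPV value cannot stand in for the three parameters the abstract recommends considering together.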
journal_name: Stat Methods Med Res
journal_title: Statistical methods in medical research
authors: van Smeden M, Moons KG, de Groot JA, Collins GS, Altman DG, Eijkemans MJ, Reitsma JB
doi: 10.1177/0962280218784726
subject: Has Abstract
pub_date: 2019-08-01 00:00:00
pages: 2455-2474
issue: 8
eissn: 0962-2802
issn: 1477-0334
journal_volume: 28
pub_type: Journal Article
abstract::Propensity score methods are common for estimating a binary treatment effect when treatment assignment is not randomized. When exposure is measured on an ordinal scale (i.e. low-medium-high), however, propensity score inference requires extensions which have received limited attention. Estimands of possible interest w...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280214560046
update_date: 2017-04-01 00:00:00
abstract::Multilevel models were originally developed to allow linear regression or ANOVA models to be applied to observations that are not mutually independent. This lack of independence commonly arises due to clustering of the units of observations into 'higher level units' such as patients in hospitals. In linear mixed model...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/096228020101000604
update_date: 2001-12-01 00:00:00
abstract::Optimal therapeutic decisions can be made according to disease prognosis, where the residual lifetime is extensively used because of its straightforward interpretation and formula. To predict the residual lifetime in a dynamic manner, a longitudinal biomarker that is repeatedly measured during the post-baseline follow...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280217753466
update_date: 2019-04-01 00:00:00
abstract::Clinical trials are expensive and time-consuming and so should also be used to study how treatments work, allowing for the evaluation of theoretical treatment models and refinement and improvement of treatments. These treatment processes can be studied using mediation analysis. Randomised treatment makes some of the a...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280216666111
update_date: 2018-06-01 00:00:00
abstract::This paper is based on a conference presentation in which several authors presented results from analyses of the same dataset concerning the evaluation of progression-free survival (PFS) as a surrogate endpoint for overall survival in advanced colorectal cancer clinical trials. In evaluating a potential surrogate endp...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280207081860
update_date: 2008-10-01 00:00:00
abstract::We propose a fully parametric model for the analysis of competing risks data where the types of failure may not be independent. We show how the dependence between the cause-specific survival times can be modelled with a copula function. Features include: identifiability of the problem; accessible understanding of the ...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1191/0962280203sm335ra
update_date: 2003-08-01 00:00:00
abstract::Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical fra...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280217747054
update_date: 2018-07-01 00:00:00
abstract::Statistical methods for carrying out asymptotic inferences (tests or confidence intervals) relative to one or two independent binomial proportions are very frequent. However, inferences about a linear combination of K independent proportions L = Σβ(i)p(i) (in which the first two are special cases) have had very little...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280209347953
update_date: 2011-08-01 00:00:00
abstract::In many health studies, researchers are interested in estimating the treatment effects on the outcome around and through an intermediate variable. Such causal mediation analyses aim to understand the mechanisms that explain the treatment effect. Although multiple mediators are often involved in real studies, most of t...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280215615899
update_date: 2018-01-01 00:00:00
abstract::Dependent censoring arises in biomedical studies when the survival outcome of interest is censored by competing risks. In survival data with microarray gene expressions, gene selection based on the univariate Cox regression analyses has been used extensively in medical research, which however, is only valid under the ...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280214533378
update_date: 2016-12-01 00:00:00
abstract::The use of genetic variants as instrumental variables - an approach known as Mendelian randomization - is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must sati...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280219851817
update_date: 2020-04-01 00:00:00
abstract::This article is motivated by the need for discovering patterns of patients' health based on their daily settings of care to aid the health policy-makers to improve the effectiveness of distributing funding for health services. The hidden process of one's health status is assumed to be a continuous smooth function, cal...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280220951834
update_date: 2020-09-25 00:00:00
abstract::The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of p...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280218774817
update_date: 2019-06-01 00:00:00
abstract::The analysis of health care costs is complicated by the skewed and heteroscedastic nature of their distribution with possibly additional zero values. Statistical methods that do not adjust for these features can lead to incorrect conclusions. This paper reviews recent developments in statistical methods for the analys...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1191/0962280202sm290ra
update_date: 2002-08-01 00:00:00
abstract::Data in many experiments arise as curves and therefore it is natural to use a curve as a basic unit in the analysis, which is termed functional data analysis (FDA). In longitudinal studies, recent developments in FDA have extended classical linear models and linear mixed effects models to functional linear models (als...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1191/0962280204sm352ra
update_date: 2004-02-01 00:00:00
abstract::DNA methylation has been shown to play an important role in many complex diseases. The rapid development of high-throughput DNA methylation scan technologies provides great opportunities for genomewide DNA methylation-disease association studies. As methylation is a dynamic process involving time, it is quite plausibl...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280216683571
update_date: 2018-09-01 00:00:00
abstract::In this paper, we describe a Bayesian hierarchical Poisson model for the prospective analysis of data for infectious diseases. The proposed model consists of two components. The first component describes the behavior of disease during nonepidemic periods and the second component represents the increase in disease coun...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280214527385
update_date: 2014-12-01 00:00:00
abstract::Dependent binary response data arise frequently in practice due to repeated measurements in longitudinal studies or to subsampling primary sampling units as in fields such as teratology and ophthalmology. Several classes of approaches have recently been proposed to analyse such repeated binary outcome data. The differ...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1177/096228029200100303
update_date: 1992-01-01 00:00:00
abstract::Variable selection in semiparametric mixed models for longitudinal data remains a challenge, especially in the presence of multiple correlated outcomes. In this paper, we propose a model selection procedure that simultaneously selects fixed and random effects using a maximum penalized likelihood method with the adapti...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280217690769
update_date: 2018-10-01 00:00:00
abstract::In recent years, there has been a prominent discussion in the literature about the potential for overestimation of the treatment effect when a clinical trial stops at an interim analysis due to the experimental treatment showing a benefit over the control. However, there has been much less attention paid to the conver...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280218795320
update_date: 2019-10-01 00:00:00
abstract::Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1191/0962280204sm372ra
update_date: 2004-10-01 00:00:00
abstract::The accuracy of a diagnostic test, which is often quantified by a pair of measures such as sensitivity and specificity, is critical for medical decision making. Separate studies of an investigational diagnostic test can be combined through meta-analysis; however, such an analysis can be threatened by publication bias....
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280218791602
update_date: 2019-10-01 00:00:00
abstract::This work presents a brief overview of Markov models in cancer screening evaluation and focuses on two specific models. A three-state model was first proposed to estimate jointly the sensitivity of the screening procedure and the average duration in the preclinical phase, i.e. the period when the cancer is asymptomati...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1177/0962280209359848
update_date: 2010-10-01 00:00:00
abstract::Censored data make survival analysis more complicated because exact event times are not observed. Statistical methodology developed to account for censored observations assumes that patients' withdrawal from a study is independent of the event of interest. However, in practice, some covariates might be associated to b...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280216628900
update_date: 2018-02-01 00:00:00
abstract::For semi-continuous data which are a mixture of true zeros and continuously distributed positive values, the use of two-part mixed models provides a convenient modelling framework. However, deriving population-averaged (marginal) effects from such models is not always straightforward. Su et al. presented a model that ...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280213509798
update_date: 2016-10-01 00:00:00
abstract::Non-parametric linkage analysis examines similarities among affected relatives in alleles of one or more genetic markers (pieces of DNA at known locations on a chromosome). The objective is to evaluate departures from the null hypothesis that the markers are not near a disease gene. Under the null hypothesis, Mendel's...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1177/096228020101000103
update_date: 2001-02-01 00:00:00
abstract::Identification of cancer patient subgroups using high throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem still remains challenging because o...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280217752980
update_date: 2019-07-01 00:00:00
abstract::Medical research commonly relies on the combination of 2 x 2 tables of counted data for making inferences about treatment effects or about the causes of disease. This article reviews point estimation and interval estimation for a common odds ratio. Traditional methods for providing these estimates face special challen...
journal_title:Statistical methods in medical research
pub_type: Journal Article, Review
doi:10.1177/096228029400300204
update_date: 1994-01-01 00:00:00
abstract::Accelerated failure time model is a popular model to analyze censored time-to-event data. Analysis of this model without assuming any parametric distribution for the model error is challenging, and the model complexity is enhanced in the presence of large number of covariates. We developed a nonparametric Bayesian met...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280215626947
update_date: 2018-04-01 00:00:00
abstract::A dynamic treatment regime is a set of decision rules for how to treat a patient at multiple time points. At each time point, a treatment decision is made depending on the patient's medical history up to that point. We consider the infinite-horizon setting in which the number of decision points is very large. Specific...
journal_title:Statistical methods in medical research
pub_type: Journal Article
doi:10.1177/0962280217708655
update_date: 2017-08-01 00:00:00