Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints.

Abstract:

BACKGROUND: Modern modelling techniques may provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness").

METHODS: We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (with 46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques, support vector machines (SVM), neural nets (NN) and random forests (RF), with two classical techniques, logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, where we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly large development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01).

RESULTS: We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample size, with the same ranking of techniques. The RF, SVM and NN models showed instability and high optimism even with >200 events per variable.

CONCLUSIONS: Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and a small optimism. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
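The quantities named in the abstract (AUC, events per variable, and optimism with its 0.01 threshold) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the AUC is computed with the generic pairwise (Mann-Whitney) formula, and counting outcome value 1 as an "event" is an assumption made for illustration.

```python
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Fraction of (event, non-event) pairs in which the event has the
    higher predicted score; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def events_per_variable(labels, n_predictors):
    """Events (assumed here: labels equal to 1) per candidate predictor."""
    return sum(labels) / n_predictors


def optimism(apparent_aucs, validated_aucs):
    """Mean apparent AUC minus mean validated AUC across repetitions."""
    return mean(apparent_aucs) - mean(validated_aucs)


def has_small_optimism(apparent_aucs, validated_aucs, threshold=0.01):
    """Apply the abstract's criterion: optimism below 0.01."""
    return optimism(apparent_aucs, validated_aucs) < threshold
```

In a simulation of the kind described, `apparent_aucs` would hold the AUC of each fitted model on its own development part and `validated_aucs` the AUC of the same model on the independent validation part, one pair per repetition.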

journal_name

BMC Med Res Methodol

authors

van der Ploeg T,Austin PC,Steyerberg EW

doi

10.1186/1471-2288-14-137

subject

Has Abstract

pub_date

2014-12-22 00:00:00

pages

137

issn

1471-2288

pii

1471-2288-14-137

journal_volume

14

pub_type

Journal Article
  • Writing a discussion section: how to integrate substantive and statistical expertise.

    abstract:BACKGROUND:When discussing results medical research articles often tear substantive and statistical (methodical) contributions apart, just as if both were independent. Consequently, reasoning on bias tends to be vague, unclear and superficial. This can lead to over-generalized, too narrow and misleading conclusions, es...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Review

    doi:10.1186/s12874-018-0490-1

    authors: Höfler M,Venz J,Trautmann S,Miller R

    update_date:2018-04-17 00:00:00

  • Does it matter whether the recipient of patient questionnaires in general practice is the general practitioner or an independent researcher? The REPLY randomised trial.

    abstract:BACKGROUND:Self-administered questionnaires are becoming increasingly common in general practice. Much research has explored methods to increase response rates but comparatively few studies have explored the effect of questionnaire administration on reported answers. METHODS:The aim of this study was to determine the ...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1186/1471-2288-8-42

    authors: Desborough JA,Butters P,Bhattacharya D,Holland RC,Wright DJ

    update_date:2008-06-27 00:00:00

  • Linkage of the CHHiP randomised controlled trial with primary care data: a study investigating ways of supplementing cancer trials and improving evidence-based practice.

    abstract:BACKGROUND:Randomised controlled trials (RCTs) are the gold standard for evidence-based practice. However, RCTs can have limitations. For example, translation of findings into practice can be limited by design features, such as inclusion criteria, not accurately reflecting clinical populations. In addition, it is expen...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-020-01078-9

    authors: Lemanska A,Byford RC,Cruickshank C,Dearnaley DP,Ferreira F,Griffin C,Hall E,Hinton W,de Lusignan S,Sherlock J,Faithfull S

    update_date:2020-07-25 00:00:00

  • Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data.

    abstract:BACKGROUND:Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that the underlying missing data mechanism, given the observed data, is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the meth...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-12-73

    authors: Héraud-Bousquet V,Larsen C,Carpenter J,Desenclos JC,Le Strat Y

    update_date:2012-06-08 00:00:00

  • Telephone and face to face methods of assessment of veteran's community reintegration yield equivalent results.

    abstract:BACKGROUND:The Community Reintegration of Service Members (CRIS) is a new measure of community reintegration developed to measure veteran's participation in life roles. It consists of three sub-scales: Extent of Participation (Extent), Perceived Limitations with Participation (Perceived), and Satisfaction with Particip...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1186/1471-2288-11-98

    authors: Resnik LJ,Clark MA,Borgia M

    update_date:2011-06-25 00:00:00

  • Mechanisms and pathways to impact in public health research: a preliminary analysis of research funded by the National Institute for Health Research (NIHR).

    abstract:BACKGROUND:The mechanisms and pathways to impacts from public health research in the UK have not been widely studied. Through the lens of one funder (NIHR), our aims are to map the diversity of public health research, in terms of funding mechanisms, disciplinary contributions, and public health impacts, identify exampl...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-020-0905-7

    authors: Boulding H,Kamenetzky A,Ghiga I,Ioppolo B,Herrera F,Parks S,Manville C,Guthrie S,Hinrichs-Krapels S

    update_date:2020-02-19 00:00:00

  • Crowding in the emergency department in the absence of boarding - a transition regression model to predict departures and waiting time.

    abstract:BACKGROUND:Crowding in the emergency department (ED) is associated with increased mortality, increased treatment cost, and reduced quality of care. Crowding arises when demand exceeds resources in the ED and a first sign may be increasing waiting time. We aimed to quantify predictors for departure from the ED, and relat...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-019-0710-3

    authors: Eiset AH,Kirkegaard H,Erlandsen M

    update_date:2019-03-29 00:00:00

  • Protocol for a systematic review and individual patient data meta-analysis of prognostic factors of foot ulceration in people with diabetes: the international research collaboration for the prediction of diabetic foot ulcerations (PODUS).

    abstract:BACKGROUND:Diabetes-related lower limb amputations are associated with considerable morbidity and mortality and are usually preceded by foot ulceration. The available systematic reviews of aggregate data are compromised because the primary studies report both adjusted and unadjusted estimates. As adjusted meta-analyses...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Meta-Analysis

    doi:10.1186/1471-2288-13-22

    authors: Crawford F,Anandan C,Chappell FM,Murray GD,Price JF,Sheikh A,Simpson CR,Maxwell M,Stansby GP,Young MJ,Abbott CA,Boulton AJ,Boyko EJ,Kastenbauer T,Leese GP,Monami M,Monteiro-Soares M,Rith-Najarian SJ,Veves A,Coates N

    update_date:2013-02-15 00:00:00

  • Validity of the International Physical Activity Questionnaire (IPAQ) for assessing moderate-to-vigorous physical activity and sedentary behaviour of older adults in the United Kingdom.

    abstract:BACKGROUND:In order to accurately measure and monitor levels of moderate-to-vigorous physical activity (MVPA) and sedentary behaviour (SB) in older adults, cost efficient and valid instruments are required. To date, the International Physical Activity Questionnaire (IPAQ) has not been validated with older adults (aged ...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Multicenter Study

    doi:10.1186/s12874-018-0642-3

    authors: Cleland C,Ferguson S,Ellis G,Hunter RF

    update_date:2018-12-22 00:00:00

  • Incorporating nonlinearity into mediation analyses.

    abstract:BACKGROUND:Mediation is an important issue considered in the behavioral, medical, and social sciences. It addresses situations where the effect of a predictor variable X on an outcome variable Y is explained to some extent by an intervening, mediator variable M. Methods for addressing mediation have been available for ...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-017-0296-6

    authors: Knafl GJ,Knafl KA,Grey M,Dixon J,Deatrick JA,Gallo AM

    update_date:2017-03-21 00:00:00

  • Sample size calculations for cluster randomised controlled trials with a fixed number of clusters.

    abstract:BACKGROUND:Cluster randomised controlled trials (CRCTs) are frequently used in health service evaluation. Assuming an average cluster size, required sample sizes are readily computed for both binary and continuous outcomes, by estimating a design effect or inflation factor. However, where the number of clusters are fix...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-11-102

    authors: Hemming K,Girling AJ,Sitch AJ,Marsh J,Lilford RJ

    update_date:2011-06-30 00:00:00

  • A systematic survey shows that reporting and handling of missing outcome data in networks of interventions is poor.

    abstract:BACKGROUND:To provide empirical evidence about prevalence, reporting and handling of missing outcome data in systematic reviews with network meta-analysis and acknowledgement of their impact on the conclusions. METHODS:We conducted a systematic survey including all published systematic reviews of randomized controlled...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-018-0576-9

    authors: Spineli LM,Yepes-Nuñez JJ,Schünemann HJ

    update_date:2018-10-24 00:00:00

  • Simulation-based estimation of mean and standard deviation for meta-analysis via Approximate Bayesian Computation (ABC).

    abstract:BACKGROUND:When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estima...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-015-0055-5

    authors: Kwon D,Reis IM

    update_date:2015-08-12 00:00:00

  • The efficiency and effectiveness of utilizing diagrams in interviews: an assessment of participatory diagramming and graphic elicitation.

    abstract:BACKGROUND:This paper focuses on measuring the efficiency and effectiveness of two diagramming methods employed in key informant interviews with clinicians and health care administrators. The two methods are 'participatory diagramming', where the respondent creates a diagram that assists in their communication of answe...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-8-53

    authors: Umoquit MJ,Dobrow MJ,Lemieux-Charles L,Ritvo PG,Urbach DR,Wodchis WP

    update_date:2008-08-08 00:00:00

  • Validating the generic quality of life tool "QOL10" in a substance use disorder treatment cohort exposes a unique social construct.

    abstract:BACKGROUND:Generic quality of life (QoL) instruments provide important measures of self-reported wellbeing that can be compared across healthy and clinical populations. The aim of this analysis is to validate the ten-item QoL instrument "QOL10", as well as to confirm the validity of the embedded "QOL5" questionnaire an...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-016-0163-x

    authors: Muller AE,Skurtveit S,Clausen T

    update_date:2016-05-23 00:00:00

  • Psychometric analysis of the brief symptom inventory 18 (BSI-18) in a representative German sample.

    abstract:BACKGROUND:The BSI-18 contains the three six-item scales somatization, depression, and anxiety as well as the Global Severity Index (GSI), including all 18 items. The BSI-18 is the latest and shortest of the multidimensional versions of the Symptom-Checklist 90-R, but its psychometric properties have not been sufficien...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-016-0283-3

    authors: Franke GH,Jaeger S,Glaesmer H,Barkmann C,Petrowski K,Braehler E

    update_date:2017-01-26 00:00:00

  • Measurement and control of bias in patient reported outcomes using multidimensional item response theory.

    abstract:BACKGROUND:Patient-reported outcome (PRO) measures play a key role in the advancement of patient-centered care research. The accuracy of inferences, relevance of predictions, and the true nature of the associations made with PRO data depend on the validity of these measures. Errors inherent to self-report measures can ...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-016-0161-z

    authors: Dowling NM,Bolt DM,Deng S,Li C

    update_date:2016-05-26 00:00:00

  • Participant recruitment in sensitive surveys: a comparative trial of 'opt in' versus 'opt out' approaches.

    abstract:BACKGROUND:Although in health services survey research we strive for a high response rate, this must be balanced against the need to recruit participants ethically and considerately, particularly in surveys with a sensitive nature. In survey research there are no established recommendations to guide recruitment approac...

    journal_title:BMC medical research methodology

    pub_type: Clinical Trial, Journal Article

    doi:10.1186/1471-2288-13-3

    authors: Hunt KJ,Shlomo N,Addington-Hall J

    update_date:2013-01-11 00:00:00

  • Effects of the search technique on the measurement of the change in quality of randomized controlled trials over time in the field of brain injury.

    abstract:BACKGROUND:To determine if the search technique that is used to sample randomized controlled trial (RCT) manuscripts from a field of medical science can influence the measurement of the change in quality over time in that field. METHODS:RCT manuscripts in the field of brain injury were identified using two readily-ava...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-5-7

    authors: Borsody MK,Yamada C

    update_date:2005-02-07 00:00:00

  • Dynamic risk prediction for diabetes using biomarker change measurements.

    abstract:BACKGROUND:Dynamic risk models, which incorporate disease-free survival and repeated measurements over time, might yield more accurate predictions of future health status compared to static models. The objective of this study was to develop and apply a dynamic prediction model to estimate the risk of developing type 2 ...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-019-0812-y

    authors: Parast L,Mathews M,Friedberg MW

    update_date:2019-08-14 00:00:00

  • A probit- log- skew-normal mixture model for repeated measures data with excess zeros, with application to a cohort study of paediatric respiratory symptoms.

    abstract:BACKGROUND:A zero-inflated continuous outcome is characterized by occurrence of "excess" zeros that more than a single distribution can explain, with the positive observations forming a skewed distribution. Mixture models are employed for regression analysis of zero-inflated data. Moreover, for repeated measures zero-i...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-10-55

    authors: Mahmud S,Lou WW,Johnston NW

    update_date:2010-06-14 00:00:00

  • Utilizing distributional analytics and electronic records to assess timeliness of inpatient blood glucose monitoring in non-critical care wards.

    abstract:BACKGROUND:Regular and timely monitoring of blood glucose (BG) levels in hospitalized patients with diabetes mellitus is crucial to optimizing inpatient glycaemic control. However, methods to quantify timeliness as a measurement of quality of care are lacking. We propose an analytical approach that utilizes BG measurem...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-016-0142-2

    authors: Chen Y,Kao SL,Tai ES,Wee HL,Khoo EY,Ning Y,Salloway MK,Deng X,Tan CS

    update_date:2016-04-08 00:00:00

  • Abstracts in high profile journals often fail to report harm.

    abstract:BACKGROUND:To describe how frequently harm is reported in the abstract of high impact factor medical journals. METHODS: DESIGN AND POPULATION:We carried out a blinded structured review of a random sample of 363 Randomised Controlled Trials (RCTs) carried out on human beings, and published in high impact factor medica...

    journal_title:BMC medical research methodology

    pub_type: Letter, Review

    doi:10.1186/1471-2288-8-14

    authors: Bernal-Delgado E,Fisher ES

    update_date:2008-03-27 00:00:00

  • Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials.

    abstract:BACKGROUND:Small number of clusters and large variation of cluster sizes commonly exist in cluster-randomized trials (CRTs) and are often the critical factors affecting the validity and efficiency of statistical analyses. F tests are commonly used in the generalized linear mixed model (GLMM) to test intervention effect...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-015-0026-x

    authors: Li P,Redden DT

    update_date:2015-04-23 00:00:00

  • Recruitment of adolescents with suicidal ideation in the emergency department: lessons from a randomized controlled pilot trial of a youth suicide prevention intervention.

    abstract:BACKGROUND:Emergency Departments (EDs) are a first point-of-contact for many youth with mental health and suicidality concerns and can serve as an effective recruitment source for randomized controlled trials (RCTs) of mental health interventions. However, recruitment in acute care settings is impeded by several challe...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-020-01117-5

    authors: Tracey M,Finkelstein Y,Schachter R,Cleverley K,Monga S,Barwick M,Szatmari P,Moretti ME,Willan A,Henderson J,Korczak DJ

    update_date:2020-09-14 00:00:00

  • Recruitment and retention in a multicentre randomised controlled trial in Bell's palsy: a case study.

    abstract:BACKGROUND:It is notoriously difficult to recruit patients to randomised controlled trials in primary care. This is particularly true when the disease process under investigation occurs relatively infrequently and must be investigated during a brief time window. Bell's palsy, an acute unilateral paralysis of the facial...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1186/1471-2288-7-15

    authors: McKinstry B,Hammersley V,Daly F,Sullivan F

    update_date:2007-03-28 00:00:00

  • Awareness of wearing an accelerometer does not affect physical activity in youth.

    abstract:BACKGROUND:This study aimed to investigate whether awareness of being monitored by an accelerometer has an effect on physical activity in young people. METHODS:Eighty healthy participants aged 10-18 years were randomized between blinded and nonblinded groups. The blinded participants were informed that we were testing...

    journal_title:BMC medical research methodology

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.1186/s12874-017-0378-5

    authors: Vanhelst J,Béghin L,Drumez E,Coopman S,Gottrand F

    update_date:2017-07-11 00:00:00

  • Comparing survival curves based on medians.

    abstract:BACKGROUND:Although some nonparametric methods have been proposed in the literature to test for the equality of median survival times for censored data in medical research, in general they have inflated type I error rates, which make their use limited in practice, especially when the sample sizes are small. METHODS:In...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/s12874-016-0133-3

    authors: Chen Z,Zhang G

    update_date:2016-03-16 00:00:00

  • Reported frequency of physical activity in a large epidemiological study: relationship to specific activities and repeatability over time.

    abstract:BACKGROUND:How overall physical activity relates to specific activities and how reported activity changes over time may influence interpretation of observed associations between physical activity and health. We examine the relationships between various physical activities self-reported at different times in a large coh...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-11-97

    authors: Armstrong ME,Cairns BJ,Green J,Reeves GK,Beral V,Million Women Study Collaborators.

    update_date:2011-06-22 00:00:00

  • Comparison of confidence interval methods for an intra-class correlation coefficient (ICC).

    abstract:BACKGROUND:The intraclass correlation coefficient (ICC) is widely used in biomedical research to assess the reproducibility of measurements between raters, labs, technicians, or devices. For example, in an inter-rater reliability study, a high ICC value means that noise variability (between-raters and within-raters) is...

    journal_title:BMC medical research methodology

    pub_type: Journal Article

    doi:10.1186/1471-2288-14-121

    authors: Ionan AC,Polley MY,McShane LM,Dobbin KK

    update_date:2014-11-22 00:00:00