A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data.

Abstract:

Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model, which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.
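The abstract does not say how censoring enters the Naive Bayes calculation. The following is a minimal sketch of one plausible construction, assumed for illustration and not the paper's own estimator. The class label is whether the event occurs by a fixed horizon t. The prior P(T ≤ t) comes from a Kaplan-Meier estimate. Per-covariate Gaussian class-conditional densities are fitted with inverse-probability-of-censoring weights (IPCW): subjects censored before t get weight 0, and the rest are up-weighted by 1/G, where G is the Kaplan-Meier survivor function of the censoring time. The function names (`km_survival`, `ipcw_labels_and_weights`, `fit_censored_gnb`, `predict_risk`) and the Gaussian form of the densities are hypothetical choices.

```python
import numpy as np


def km_survival(times, events, t, strict=False):
    """Kaplan-Meier estimate of P(T > t); with strict=True, the left limit P(T >= t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    mask = (events == 1) & ((times < t) if strict else (times <= t))
    surv = 1.0
    for u in np.unique(times[mask]):
        at_risk = np.sum(times >= u)                     # still under observation at u
        deaths = np.sum((times == u) & (events == 1))    # events occurring exactly at u
        surv *= 1.0 - deaths / at_risk
    return surv


def ipcw_labels_and_weights(times, events, t):
    """Label subjects by event status at horizon t and weight them by 1/G.

    Assumption (illustrative): G is the Kaplan-Meier survivor function of the
    censoring time, evaluated just before an observed event or at t for subjects
    still followed past t. Subjects censored before t have unknown status: weight 0.
    Assumes G > 0 wherever it is evaluated.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    cens = 1 - events                                    # censoring treated as the "event"
    labels = np.zeros(len(times), dtype=int)
    weights = np.zeros(len(times))
    for i, (ti, di) in enumerate(zip(times, events)):
        if ti <= t and di == 1:                          # event observed by t
            labels[i] = 1
            weights[i] = 1.0 / km_survival(times, cens, ti, strict=True)
        elif ti > t:                                     # known to be event-free at t
            weights[i] = 1.0 / km_survival(times, cens, t)
    return labels, weights


def fit_censored_gnb(X, times, events, t):
    """Fit the hypothetical censoring-aware Gaussian Naive Bayes model at horizon t."""
    X = np.asarray(X, dtype=float)
    labels, w = ipcw_labels_and_weights(times, events, t)
    prior = 1.0 - km_survival(times, events, t)          # censoring-aware P(T <= t)
    params = {}
    for c in (0, 1):
        wc = w * (labels == c)                           # IPC weights restricted to class c
        mean = (wc[:, None] * X).sum(axis=0) / wc.sum()
        var = (wc[:, None] * (X - mean) ** 2).sum(axis=0) / wc.sum() + 1e-9
        params[c] = (mean, var)
    return prior, params


def predict_risk(model, x):
    """Posterior P(T <= t | x); assumes 0 < prior < 1."""
    prior, params = model
    x = np.asarray(x, dtype=float)
    logp = {}
    for c, p in ((1, prior), (0, 1.0 - prior)):
        mean, var = params[c]
        # Naive Bayes: covariates are conditionally independent given the class,
        # so per-covariate Gaussian log-densities add up.
        loglik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        logp[c] = np.log(p) + loglik
    m = max(logp.values())                               # stabilise the exponentials
    e1, e0 = np.exp(logp[1] - m), np.exp(logp[0] - m)
    return float(e1 / (e0 + e1))


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1, 1, 1, 1, 1, 1, 1, 1]
    X = [[5.0], [4.5], [4.0], [3.8], [1.0], [0.5], [0.8], [1.2]]
    model = fit_censored_gnb(X, times, events, t=4)
    print(predict_risk(model, [4.5]), predict_risk(model, [0.7]))
```

Under these assumptions, when nobody is censored every weight equals 1 and the sketch reduces to an ordinary Gaussian Naive Bayes classifier for the binary outcome "event by t". Censoring only changes how much each fully observed subject counts.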

journal_name

Stat Med

journal_title

Statistics in medicine

authors

Wolfson J,Bandyopadhyay S,Elidrisi M,Vazquez-Benitez G,Vock DM,Musgrove D,Adomavicius G,Johnson PE,O'Connor PJ

doi

10.1002/sim.6526

pub_date

2015-09-20

pages

2941-57

issue

21

eissn

1097-0258

issn

0277-6715

journal_volume

34

pub_type

Journal Article

  • Correcting for regression in assessing the response to treatment in a selected population.

    abstract::Previous work on the consequences of regression to the mean for the interpretation of responses to treatment is extended to the situation where the response measured is the proportional change in some variable. Methods for correcting for the bias are discussed. ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780060203

    authors: Curnow RN

    update_date: 1987-03-01

  • Describing time and age variations in the risk of radiation-induced solid tumour incidence in the Japanese atomic bomb survivors using generalized relative and absolute risk models.

    abstract::Generalized relative and absolute risk models, in which various functions of time and age modify the excess relative or absolute risk of radiation-induced cancer, are fitted to the Japanese atomic bomb survivor cancer incidence data set. Among generalized relative risk models, those in which a product of powers of tim...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19990115)18:1<17::aid-sim9

    authors: Little MP,Muirhead CR,Charles MW

    update_date: 1999-01-15

  • Methodological pitfalls in the analysis of contraceptive failure.

    abstract::Although the literature on contraceptive failure is vast and is expanding rapidly, our understanding of the relative efficacy of methods is quite limited because of defects in the research design and in the analytical tools used by investigators. Errors in the literature range from simple arithmetical mistakes to outr...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/sim.4780100206

    authors: Trussell J

    update_date: 1991-02-01

  • Promoting interactions with basic scientists and clinicians: the NIA Alzheimer's Disease Data Coordinating Center.

    abstract::To benefit Alzheimer's disease research, a central data co-ordinating centre (CDCC) is planned that will systematically collect data from 27 Alzheimer's disease centres (ADCs) located nationwide. This CDCC will combine, analyse and disseminate epidemiologic, demographic, clinical and neuropathological data to research...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000615/30)19:11/12<1453:

    authors: Cronin-Stubbs D,DeKosky ST,Morris JC,Evans DA

    update_date: 2000-06-15

  • Using pilot data to size a two-arm randomized trial to find a nearly optimal personalized treatment strategy.

    abstract::A personalized treatment strategy formalizes evidence-based treatment selection by mapping patient information to a recommended treatment. Personalized treatment strategies can produce better patient outcomes while reducing cost and treatment burden. Thus, among clinical and intervention scientists, there is a growing...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6783

    authors: Laber EB,Zhao YQ,Regh T,Davidian M,Tsiatis A,Stanford JB,Zeng D,Song R,Kosorok MR

    update_date: 2016-04-15

  • A framework establishing clear decision criteria for the assessment of drug efficacy.

    abstract::Much has been published on various aspects of data analysis and reporting from clinical trials within the biopharmaceutical environment. This ranges from regulatory guidelines on the format and content of registration dossiers to recommendations on data presentation and the statistical methodologies that are appropria...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/(sici)1097-0258(19980815/30)17:15/16<1829:

    authors: Huster WJ,Enas GG

    update_date: 1998-08-15

  • Bioequivalence revisited.

    abstract::The FDA permits marketing of a generic formulation of a drug G for the same indications as a standard preparation S if one can show that G is bioequivalent to S. Present implementation requires convincing evidence that the population mean difference in bioavailability (drug exposure) between the two preparations lies ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780111311

    authors: Sheiner LB

    update_date: 1992-09-30

  • Nonparametric covariate hypothesis tests for the cure rate in mixture cure models.

    abstract::In lifetime data, like cancer studies, there may be long-term survivors, which lead to heavy censoring at the end of the follow-up period. Since a standard survival model is not appropriate to handle these data, a cure model is needed. In the literature, covariate hypothesis tests for cure models are limited to parame...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8530

    authors: López-Cheda A,Jácome MA,Van Keilegom I,Cao R

    update_date: 2020-07-30

  • On the use of discrete choice models for causal inference.

    abstract::Methodology for causal inference based on propensity scores has been developed and popularized in the last two decades. However, the majority of the methodology has concentrated on binary treatments. Only recently have these methods been extended to settings with multi-valued treatments. We propose a number of discret...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2095

    authors: Tchernis R,Horvitz-Lennon M,Normand SL

    update_date: 2005-07-30

  • Seasonal and other short-term influences on United States AIDS incidence.

    abstract::This paper models monthly AIDS diagnosis counts in terms of smooth secular trend, calendar month effects, and the number of workdays per month. A parameterization of month effects allows separation of true seasonal effects from a linear trend over the calendar year and an arbitrary June effect. There is strong evidenc...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780131905

    authors: Bacchetti P

    update_date: 1994-10-15

  • Sample size calculation for clinical trials with correlated count measurements based on the negative binomial distribution.

    abstract::Statistical inference based on correlated count measurements is frequently performed in biomedical studies. Most existing sample size calculation methods for count outcomes have been developed under the Poisson model. Deviation from the Poisson assumption (equality of mean and variance) has been widely documented in pra...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8378

    authors: Li D,Zhang S,Cao J

    update_date: 2019-12-10

  • Permutation tests for joinpoint regression with applications to cancer rates.

    abstract::The identification of changes in the recent trend is an important issue in the analysis of cancer mortality and incidence data. We apply a joinpoint regression model to describe such continuous changes and use the grid-search method to fit the regression function with unknown joinpoints assuming constant variance and ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000215)19:3<335::aid-sim

    authors: Kim HJ,Fay MP,Feuer EJ,Midthune DN

    update_date: 2000-02-15

  • Generalizability of causal inference in observational studies under retrospective convenience sampling.

    abstract::Many observational studies adopt what we call retrospective convenience sampling (RCS). With the sample size in each arm prespecified, RCS randomly selects subjects from the treatment-inclined subpopulation into the treatment arm and those from the control-inclined into the control arm. Samples in each arm are represe...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7808

    authors: Hu Z,Qin J

    update_date: 2018-05-20

  • Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution.

    abstract::The Box-Cox power exponential (BCPE) distribution, developed in this paper, provides a model for a dependent variable Y exhibiting both skewness and kurtosis (leptokurtosis or platykurtosis). The distribution is defined by a power transformation Y(nu) having a shifted and scaled (truncated) standard power exponential ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1861

    authors: Rigby RA,Stasinopoulos DM

    update_date: 2004-10-15

  • Models for the propensity score that contemplate the positivity assumption and their application to missing data and causality.

    abstract::Generalized linear models are often assumed to fit propensity scores, which are used to compute inverse probability weighted (IPW) estimators. To derive the asymptotic properties of IPW estimators, the propensity score is supposed to be bounded away from zero. This condition is known in the literature as strict positi...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7827

    authors: Molina J,Sued M,Valdora M

    update_date: 2018-10-30

  • Modelling heterogeneity in clustered count data with extra zeros using compound Poisson random effect.

    abstract::In medical and health studies, heterogeneities in clustered count data have been traditionally modeled by positive random effects in Poisson mixed models; however, excessive zeros often occur in clustered medical and health count data. In this paper, we consider a three-level random effects zero-inflated Poisson model...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3619

    authors: Ma R,Hasan MT,Sneddon G

    update_date: 2009-08-15

  • Model selection techniques for the covariance matrix for incomplete longitudinal data.

    abstract::In longitudinal studies with incomplete data, where the number of time points can become numerous, it is often advantageous to model the covariance matrix. We describe several covariance models (for example, mixed models, compound symmetry, AR(1)-type models, and combination models) that offer parsimonious alternative...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/sim.4780141302

    authors: Grady JJ,Helms RW

    update_date: 1995-07-15

  • A joint modeling and estimation method for multivariate longitudinal data with mixed types of responses to analyze physical activity data generated by accelerometers.

    abstract::A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the prop...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7401

    authors: Li H,Zhang Y,Carroll RJ,Keadle SK,Sampson JN,Matthews CE

    update_date: 2017-11-10

  • Positing, fitting, and selecting regression models for pooled biomarker data.

    abstract::Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. I...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6496

    authors: Mitchell EM,Lyles RH,Schisterman EF

    update_date: 2015-07-30

  • Ratio of geometric means to analyze continuous outcomes in meta-analysis: comparison to mean differences and ratio of arithmetic means using empiric data and simulation.

    abstract::Meta-analyses pooling continuous outcomes can use mean differences (MD), standardized MD (MD in pooled standard deviation units, SMD), or ratio of arithmetic means (RoM). Recently, ratio of geometric means using ad hoc (RoGM (ad hoc) ) or Taylor series (RoGM (Taylor) ) methods for estimating variances have been propos...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4501

    authors: Friedrich JO,Adhikari NK,Beyene J

    update_date: 2012-07-30

  • Group sequential designs for cure rate models with early stopping in favour of the null hypothesis.

    abstract::Ewell and Ibrahim derived the large sample distribution of the logrank statistic under general local alternatives. Their asymptotic results enable us to extend several group sequential designs which allow for early stopping in favour of the null hypothesis to the setting in which the cure rate model is appropriate. In...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/1097-0258(20001130)19:22<3023::aid-sim638>

    authors: Patricia Bernardo MV,Ibrahim JG

    update_date: 2000-11-30

  • Doubly robust generalized estimating equations for longitudinal data.

    abstract::A popular method for analysing repeated-measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse-probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of bein...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3520

    authors: Seaman S,Copas A

    update_date: 2009-03-15

  • An illness-death stochastic model in the analysis of longitudinal dementia data.

    abstract::A significant source of missing data in longitudinal epidemiological studies on elderly individuals is death. Subjects in large scale community-based longitudinal dementia studies are usually evaluated for disease status in study waves, not under continuous surveillance as in traditional cohort studies. Therefore, for...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1506

    authors: Harezlak J,Gao S,Hui SL

    update_date: 2003-05-15

  • What do we mean by validating a prognostic model?

    abstract::Prognostic models are used in medicine for investigating patient outcome in relation to patient and disease characteristics. Such models do not always work well in practice, so it is widely recommended that they need to be validated. The idea of validating a prognostic model is generally taken to mean establishing tha...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000229)19:4<453::aid-sim

    authors: Altman DG,Royston P

    update_date: 2000-02-29

  • Modelling the relationship between continuous covariates and clinical events using isotonic regression.

    abstract::In a medical study we are often interested in graphically displaying the relationship between continuous variables and clinical events indicating disease progression. Often, it is reasonable to make the minimal assumption that the risk of progression is an arbitrary monotone function of the continuous variable. Someti...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1561

    authors: Ancukiewicz M,Finkelstein DM,Schoenfeld DA

    update_date: 2003-10-30

  • Hierarchical nested trial design (HNTD) for demonstrating treatment efficacy of new antibacterial drugs in patient populations with emerging bacterial resistance.

    abstract::In the last decade or so, pharmaceutical drug development activities in the area of new antibacterial drugs for treating serious bacterial diseases have declined, and at the same time, there are worries that the increased prevalence of antibiotic-resistant bacterial infections, especially the increase in drug-resistan...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6233

    authors: Huque MF,Valappil T,Soon GG

    update_date: 2014-11-10

  • The partial testing design: a less costly way to test equivalence for sensitivity and specificity.

    abstract::We propose a new, less costly, design to test the equivalence of digital versus analogue mammography in terms of sensitivity and specificity. Because breast cancer is a rare event among asymptomatic women, the sample size for testing equivalence of sensitivity is larger than that for testing equivalence of specificity...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19981015)17:19<2219::aid-s

    authors: Baker SG,Connor RJ,Kessler LG

    update_date: 1998-10-15

  • Joint estimation of multiple disease-specific sensitivities and specificities via crossed random effects models for correlated reader-based diagnostic data: application of data cloning.

    abstract::We present a model for describing correlated binocular data from reader-based diagnostic studies, where the same group of readers evaluates the presence or absence of certain diseases on binocular organs (e.g., fellow eyes) of patients. Multiple random effects are incorporated to meaningfully delineate various associa...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6584

    authors: Withanage N,de Leon AR,Rudnisky CJ

    update_date: 2015-12-20

  • Precision of incidence predictions based on Poisson distributed observations.

    abstract::Disease incidence predictions are useful for a number of administrative and scientific purposes. The simplest ones are made using trend extrapolation, on either an arithmetic or a logarithmic scale. This paper shows how approximate confidence prediction intervals can be calculated for such predictions, both for the to...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780131503

    authors: Hakulinen T,Dyba T

    update_date: 1994-08-15

  • Stochastically curtailed phase II clinical trials.

    abstract::Phase II trials often test the null hypothesis H(0): p ≤ p(0) versus H(1): p ≥ p(1), where p is the true unknown proportion responding to the new treatment, p(0) is the greatest response proportion which is deemed clinically ineffective, and p(1) is the smallest response proportion which is deemed clinically effe...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2653

    authors: Ayanlowo AO,Redden DT

    update_date: 2007-03-30