Abstract:
Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model, which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.
journal_name: Stat Med
journal_title: Statistics in medicine
authors: Wolfson J, Bandyopadhyay S, Elidrisi M, Vazquez-Benitez G, Vock DM, Musgrove D, Adomavicius G, Johnson PE, O'Connor PJ
doi: 10.1002/sim.6526
subject: Has Abstract
pub_date: 2015-09-20 00:00:00
pages: 2941-57
issue: 21
eissn: 0277-6715
issn: 1097-0258
journal_volume: 34
pub_type: Journal Article
abstract::Previous work on the consequences of regression to the mean for the interpretation of responses to treatment is extended to the situation where the response measured is the proportional change in some variable. Methods for correcting for the bias are discussed. ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780060203
update_date:1987-03-01 00:00:00
abstract::Generalized relative and absolute risk models, in which various functions of time and age modify the excess relative or absolute risk of radiation-induced cancer, are fitted to the Japanese atomic bomb survivor cancer incidence data set. Among generalized relative risk models, those in which a product of powers of tim...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19990115)18:1<17::aid-sim9
update_date:1999-01-15 00:00:00
abstract::Although the literature on contraceptive failure is vast and is expanding rapidly, our understanding of the relative efficacy of methods is quite limited because of defects in the research design and in the analytical tools used by investigators. Errors in the literature range from simple arithmetical mistakes to outr...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.4780100206
update_date:1991-02-01 00:00:00
abstract::To benefit Alzheimer's disease research, a central data co-ordinating centre (CDCC) is planned that will systematically collect data from 27 Alzheimer's disease centres (ADCs) located nationwide. This CDCC will combine, analyse and disseminate epidemiologic, demographic, clinical and neuropathological data to research...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(20000615/30)19:11/12<1453:
update_date:2000-06-15 00:00:00
abstract::A personalized treatment strategy formalizes evidence-based treatment selection by mapping patient information to a recommended treatment. Personalized treatment strategies can produce better patient outcomes while reducing cost and treatment burden. Thus, among clinical and intervention scientists, there is a growing...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6783
update_date:2016-04-15 00:00:00
abstract::Much has been published on various aspects of data analysis and reporting from clinical trials within the biopharmaceutical environment. This ranges from regulatory guidelines on the format and content of registration dossiers to recommendations on data presentation and the statistical methodologies that are appropria...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/(sici)1097-0258(19980815/30)17:15/16<1829:
update_date:1998-08-15 00:00:00
abstract::The FDA permits marketing of a generic formulation of a drug G for the same indications as a standard preparation S if one can show that G is bioequivalent to S. Present implementation requires convincing evidence that the population mean difference in bioavailability (drug exposure) between the two preparations lies ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780111311
update_date:1992-09-30 00:00:00
abstract::In lifetime data, like cancer studies, there may be long term survivors, which lead to heavy censoring at the end of the follow-up period. Since a standard survival model is not appropriate to handle these data, a cure model is needed. In the literature, covariate hypothesis tests for cure models are limited to parame...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8530
update_date:2020-07-30 00:00:00
abstract::Methodology for causal inference based on propensity scores has been developed and popularized in the last two decades. However, the majority of the methodology has concentrated on binary treatments. Only recently have these methods been extended to settings with multi-valued treatments. We propose a number of discret...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2095
update_date:2005-07-30 00:00:00
abstract::This paper models monthly AIDS diagnosis counts in terms of smooth secular trend, calendar month effects, and the number of workdays per month. A parameterization of month effects allows separation of true seasonal effects from a linear trend over the calendar year and an arbitrary June effect. There is strong evidenc...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780131905
update_date:1994-10-15 00:00:00
abstract::Statistical inference based on correlated count measurements is frequently performed in biomedical studies. Most existing sample size calculation methods for count outcomes are developed under the Poisson model. Deviation from the Poisson assumption (equality of mean and variance) has been widely documented in pra...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.8378
update_date:2019-12-10 00:00:00
abstract::The identification of changes in the recent trend is an important issue in the analysis of cancer mortality and incidence data. We apply a joinpoint regression model to describe such continuous changes and use the grid-search method to fit the regression function with unknown joinpoints assuming constant variance and ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(20000215)19:3<335::aid-sim
update_date:2000-02-15 00:00:00
abstract::Many observational studies adopt what we call retrospective convenience sampling (RCS). With the sample size in each arm prespecified, RCS randomly selects subjects from the treatment-inclined subpopulation into the treatment arm and those from the control-inclined into the control arm. Samples in each arm are represe...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7808
update_date:2018-05-20 00:00:00
abstract::The Box-Cox power exponential (BCPE) distribution, developed in this paper, provides a model for a dependent variable Y exhibiting both skewness and kurtosis (leptokurtosis or platykurtosis). The distribution is defined by a power transformation Y(nu) having a shifted and scaled (truncated) standard power exponential ...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1861
update_date:2004-10-15 00:00:00
abstract::Generalized linear models are often assumed to fit propensity scores, which are used to compute inverse probability weighted (IPW) estimators. To derive the asymptotic properties of IPW estimators, the propensity score is supposed to be bounded away from zero. This condition is known in the literature as strict positi...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7827
update_date:2018-10-30 00:00:00
abstract::In medical and health studies, heterogeneities in clustered count data have been traditionally modeled by positive random effects in Poisson mixed models; however, excessive zeros often occur in clustered medical and health count data. In this paper, we consider a three-level random effects zero-inflated Poisson model...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3619
update_date:2009-08-15 00:00:00
abstract::In longitudinal studies with incomplete data, where the number of time points can become numerous, it is often advantageous to model the covariance matrix. We describe several covariance models (for example, mixed models, compound symmetry, AR(1)-type models, and combination models) that offer parsimonious alternative...
journal_title:Statistics in medicine
pub_type: Journal Article, Review
doi:10.1002/sim.4780141302
update_date:1995-07-15 00:00:00
abstract::A mixed effect model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association of the variables is modeled through the correlation of random effects. We use a quasi-likelihood type approximation for nonlinear variables and transform the prop...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.7401
update_date:2017-11-10 00:00:00
abstract::Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. I...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6496
update_date:2015-07-30 00:00:00
abstract::Meta-analyses pooling continuous outcomes can use mean differences (MD), standardized MD (MD in pooled standard deviation units, SMD), or ratio of arithmetic means (RoM). Recently, ratio of geometric means using ad hoc (RoGM (ad hoc) ) or Taylor series (RoGM (Taylor) ) methods for estimating variances have been propos...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4501
update_date:2012-07-30 00:00:00
abstract::Ewell and Ibrahim derived the large sample distribution of the logrank statistic under general local alternatives. Their asymptotic results enable us to extend several group sequential designs which allow for early stopping in favour of the null hypothesis to the setting in which the cure rate model is appropriate. In...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/1097-0258(20001130)19:22<3023::aid-sim638>
update_date:2000-11-30 00:00:00
abstract::A popular method for analysing repeated-measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse-probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of bein...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.3520
update_date:2009-03-15 00:00:00
abstract::A significant source of missing data in longitudinal epidemiological studies on elderly individuals is death. Subjects in large scale community-based longitudinal dementia studies are usually evaluated for disease status in study waves, not under continuous surveillance as in traditional cohort studies. Therefore, for...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1506
update_date:2003-05-15 00:00:00
abstract::Prognostic models are used in medicine for investigating patient outcome in relation to patient and disease characteristics. Such models do not always work well in practice, so it is widely recommended that they need to be validated. The idea of validating a prognostic model is generally taken to mean establishing tha...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(20000229)19:4<453::aid-sim
update_date:2000-02-29 00:00:00
abstract::In a medical study we are often interested in graphically displaying the relationship between continuous variables and clinical events indicating disease progression. Often, it is reasonable to make the minimal assumption that the risk of progression is an arbitrary monotone function of the continuous variable. Someti...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.1561
update_date:2003-10-30 00:00:00
abstract::In the last decade or so, pharmaceutical drug development activities in the area of new antibacterial drugs for treating serious bacterial diseases have declined, and at the same time, there are worries that the increased prevalence of antibiotic-resistant bacterial infections, especially the increase in drug-resistan...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6233
update_date:2014-11-10 00:00:00
abstract::We propose a new, less costly, design to test the equivalence of digital versus analogue mammography in terms of sensitivity and specificity. Because breast cancer is a rare event among asymptomatic women, the sample size for testing equivalence of sensitivity is larger than that for testing equivalence of specificity...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/(sici)1097-0258(19981015)17:19<2219::aid-s
update_date:1998-10-15 00:00:00
abstract::We present a model for describing correlated binocular data from reader-based diagnostic studies, where the same group of readers evaluates the presence or absence of certain diseases on binocular organs (e.g., fellow eyes) of patients. Multiple random effects are incorporated to meaningfully delineate various associa...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.6584
update_date:2015-12-20 00:00:00
abstract::Disease incidence predictions are useful for a number of administrative and scientific purposes. The simplest ones are made using trend extrapolation, on either an arithmetic or a logarithmic scale. This paper shows how approximate confidence prediction intervals can be calculated for such predictions, both for the to...
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.4780131503
update_date:1994-08-15 00:00:00
abstract::Phase II trials often test the null hypothesis H(0): p
journal_title:Statistics in medicine
pub_type: Journal Article
doi:10.1002/sim.2653
update_date:2007-03-30 00:00:00