Abstract:
Logistic regression analysis may well be used to develop a prognostic model for a dichotomous outcome. Especially when limited data are available, it is difficult to determine an appropriate selection of covariables for inclusion in such models. Also, predictions may be improved by applying some sort of shrinkage in the estimation of regression coefficients. In this study we compare the performance of several selection and shrinkage methods in small data sets of patients with acute myocardial infarction, where we aim to predict 30-day mortality. Selection methods included backward stepwise selection with significance levels alpha of 0.01, 0.05, 0.157 (the AIC criterion) or 0.50, and the use of qualitative external information on the sign of regression coefficients in the model. Estimation methods included standard maximum likelihood, the use of a linear shrinkage factor, penalized maximum likelihood, the Lasso, or quantitative external information on univariable regression coefficients. We found that stepwise selection with a low alpha (for example, 0.05) led to a relatively poor model performance, when evaluated on independent data. Substantially better performance was obtained with full models with a limited number of important predictors, where regression coefficients were reduced with any of the shrinkage methods. Incorporation of external information for selection and estimation improved the stability and quality of the prognostic models. We therefore recommend shrinkage methods in full models including prespecified predictors and incorporation of external information, when prognostic models are constructed in small data sets.
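As an illustration of the "linear shrinkage factor" mentioned in the abstract, the following minimal sketch fits a full logistic model by maximum likelihood and then multiplies the slopes by a single factor s. The factor used here is the common heuristic s = (model chi-square - df) / model chi-square; this is an assumption chosen for illustration, and the abstract does not state which estimator of the shrinkage factor the authors used. The simulated data, the coefficient values, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit by Newton-Raphson.
    X must already contain a leading column of ones (intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_likelihood(X, y, beta):
    eta = X @ beta
    # Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta)))
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def heuristic_shrinkage(X, y):
    """Return ML coefficients and the heuristic factor
    s = (LR chi-square - df) / LR chi-square (an illustrative choice)."""
    beta = fit_logistic(X, y)
    null_beta = fit_logistic(X[:, :1], y)  # intercept-only model
    lr_chi2 = 2.0 * (log_likelihood(X, y, beta)
                     - log_likelihood(X[:, :1], y, null_beta))
    df = X.shape[1] - 1
    # Note: s is zero or negative when lr_chi2 <= df (very weak model).
    return beta, (lr_chi2 - df) / lr_chi2


def shrink_coefficients(X, y, beta, s):
    """Multiply slopes by s, then re-estimate the intercept with the
    shrunken linear predictor held fixed as an offset."""
    shrunk = beta.copy()
    shrunk[1:] *= s
    offset = X[:, 1:] @ shrunk[1:]
    a = shrunk[0]
    for _ in range(50):
        p = sigmoid(a + offset)
        step = np.sum(y - p) / np.sum(p * (1.0 - p))
        a += step
        if abs(step) < 1e-10:
            break
    shrunk[0] = a
    return shrunk


# Hypothetical small data set: 200 patients, 3 prespecified predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([-1.0, 0.8, -0.5, 0.3])  # assumed, for simulation only
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta, s = heuristic_shrinkage(X, y)
shrunk = shrink_coefficients(X, y, beta, s)
print("shrinkage factor:", s)
print("ML slopes:       ", beta[1:])
print("shrunken slopes: ", shrunk[1:])
```

Re-estimating the intercept after shrinking keeps the average predicted probability equal to the observed event rate, so the shrinkage affects only the spread of the predictions.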
journal_name:Stat Med
journal_title:Statistics in medicine
authors:Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD
doi:10.1002/(sici)1097-0258(20000430)19:8<1059::aid-si
subject:Has Abstract
pub_date:2000-04-30 00:00:00
pages:1059-79
issue:8
eissn:0277-6715
issn:1097-0258
pii:10.1002/(SICI)1097-0258(20000430)19:8<1059::AID-SI
journal_volume:19
pub_type: Journal article
abstract::Clustered grouped survival data arise naturally in clinical medicine and biological research. For example, in a randomized clinical trial, the variable of interest is the time to occurrence of a certain event with or without a new treatment and the data are collected from possibly correlated subjects from independent ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.1323
update_date:2003-06-30 00:00:00
abstract::Patient compliance (adherence) with prescribed medication is often erratic, while clinical outcomes are causally linked to actual, rather than nominal medication dosage. We propose here a hierarchical Markov model for patient compliance. At the first stage, conditional upon individual random effects and a set of indiv...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/(sici)1097-0258(19981030)17:20<2313::aid-s
update_date:1998-10-30 00:00:00
abstract::Modelling disease clustering over space and time can be helpful in providing indications of possible exposures and planning corresponding public health practices. Though a considerable number of studies focus on modelling spatio-temporal patterns of disease, most of them do not directly model a spatio-temporal cluster...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.2424
update_date:2006-03-15 00:00:00
abstract::The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7028
update_date:2016-11-20 00:00:00
abstract::Incomplete and unbalanced multivariate data often arise in longitudinal studies due to missing or unequally-timed repeated measurements and/or the presence of time-varying covariates. A general approach to analysing such data is through maximum likelihood analysis using a linear model for the expected responses, and s...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780070132
update_date:1988-01-01 00:00:00
abstract::We present graphical and numerical methods for assessing the adequacy of the logistic regression model for stratified case-control data. The proposed methods are derived from the cumulative sum of residuals over the covariate or linear predictor. Under the assumed model, the cumulative residual process converges weakl...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.1932
update_date:2005-01-30 00:00:00
abstract::We present a model for describing correlated binocular data from reader-based diagnostic studies, where the same group of readers evaluates the presence or absence of certain diseases on binocular organs (e.g., fellow eyes) of patients. Multiple random effects are incorporated to meaningfully delineate various associa...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.6584
update_date:2015-12-20 00:00:00
abstract::Multivariate finite mixture models have been applied to the identification of dietary patterns. These models are known to have many parameters, and consequently large samples are usually required. We present a special case of a multivariate mixture model that reduces the number of parameters to be estimated and seems ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5336
update_date:2012-08-30 00:00:00
abstract::Cross-sectional designs are often used to monitor the proportion of infections and other post-surgical complications acquired in hospitals. However, conventional methods for estimating incidence proportions when applied to cross-sectional data may provide estimators that are highly biased, as cross-sectional designs t...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5608
update_date:2013-06-30 00:00:00
abstract::Multi-type recurrent event data arise when two or more different kinds of events may occur repeatedly over a period of observation. The scientific objectives in such settings are often to describe features of the marginal processes and to study the association between the different types of events. Interval-censored m...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.1936
update_date:2005-03-15 00:00:00
abstract::The continual reassessment method (CRM) is an adaptive design for Phase I trials whose operating characteristics, including appropriate sample size, probability of correctly identifying the maximum tolerated dose, and the expected proportion of participants assigned to each dose, can only be determined via simulation....
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.8746
update_date:2020-09-16 00:00:00
abstract::The log-rank test is the most powerful non-parametric test for detecting a proportional hazards alternative and thus is the most commonly used testing procedure for comparing time-to-event distributions between different treatments in clinical trials. When the log-rank test is used for the primary data analysis, the s...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.3501
update_date:2009-02-28 00:00:00
abstract::We investigate population-averaged (PA) and cluster-specific (CS) associations for clustered binary logistic regression in the context of a longitudinal clinical trial that investigated the association between tooth-specific visual elastase kit results and periodontal disease progression within 26 weeks of follow-up. ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780140407
update_date:1995-02-28 00:00:00
abstract::We have used Monte Carlo methods to compare the type I error properties of the conditional and unconditional versions of the generalized t and the generalized rank-sum tests to those of the independent samples t and Wilcoxon rank-sum tests. Results showed inflated type I errors for the conditional generalized tests bu...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780110410
update_date:1992-02-28 00:00:00
abstract::In a variety of biomedical applications, particularly those involving screening for infectious diseases, testing individuals (e.g. blood/urine samples, etc.) in pools has become a standard method of data collection. This experimental design, known as group testing (or pooled testing), can provide a large reduction in ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.3678
update_date:2009-10-15 00:00:00
abstract::The power prior has been widely used in many applications covering a large number of disciplines. The power prior is intended to be an informative prior constructed from historical data. It has been used in clinical trials, genetics, health care, psychology, environmental health, engineering, economics, and business. ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.6728
update_date:2015-12-10 00:00:00
abstract::We present some practical extensions and applications of a strategy proposed by Thall, Simon and Estey for designing and monitoring single-arm clinical trials with multiple outcomes. We show by application how the strategy may be applied to construct designs for phase IIA activity trials and phase II equivalence trial...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/(sici)1097-0258(19980730)17:14<1563::aid-s
update_date:1998-07-30 00:00:00
abstract::Day and Walter derived methods of joint maximum likelihood estimation for the sojourn time distribution and the false negative rate for a screening programme. Their methods are not directly applicable to a programme which uses alternate screening by two modalities whose sojourn times and false negative rates will diff...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780080611
update_date:1989-06-01 00:00:00
abstract::The matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies especially epigenetic studies with DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection metho...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5694
update_date:2013-05-30 00:00:00
abstract::Inference for randomized clinical trials is generally based on the assumption that outcomes are independently and identically distributed under the null hypothesis. In some trials, particularly in infectious disease, outcomes may be correlated. This may be known in advance (e.g. allowing randomization of family member...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.2977
update_date:2008-03-15 00:00:00
abstract::Several relative risk models for survival time data in drug combination therapy are derived and their properties are discussed. The main intention of this paper is to clarify the differences among the models in order to help to choose the appropriate one in a given situation. The models are motivated by discussing the...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780091216
update_date:1990-12-01 00:00:00
abstract::When estimating the probability of natural conception from observational data on couples with an unfulfilled child wish, the start of assisted reproductive therapy (ART) is a competing event that cannot be assumed to be independent of natural conception. In clinical practice, interest lies in the probability of natura...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.6280
update_date:2014-11-20 00:00:00
abstract::Methods for addressing multiplicity in clinical trials have attracted much attention during the past 20 years. They include the investigation of new classes of multiple test procedures, such as fixed sequence, fallback and gatekeeping procedures. More recently, sequentially rejective graphical test procedures have bee...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.5711
update_date:2013-05-10 00:00:00
abstract::The power to detect a treatment effect in cluster randomized trials can be increased by increasing the number of clusters. An alternative is to include covariates into the regression model that relates treatment condition to outcome. In this paper, formulae are derived in order to evaluate both strategies on basis of ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.2297
update_date:2006-08-15 00:00:00
abstract::The "some invalid, some valid instrumental variable estimator" (sisVIVE) is a lasso-based method for instrumental variables (IVs) regression of outcome on an exposure. In principle, sisVIVE is robust to some of the IVs in the analysis being invalid, in the sense of being related to the outcome variable through pathway...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.8066
update_date:2019-04-30 00:00:00
abstract::Missing data arise in crossover trials, as they do in any form of clinical trial. Several papers have addressed the problems that missing data create, although almost all of these assume that the probability that a planned observation is missing does not depend on the value that would have been observed; that is, the ...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4497
update_date:2012-07-20 00:00:00
abstract::This paper gives a standard error for Cohen's Kappa, conditional on the margins of the observed r x r table. An explicit formula is given for the 2 x 2 table, and a procedure for the more general situation. A parsimonious log-linear model is suggested for the general case and an approximate confidence interval for kap...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780100512
update_date:1991-05-01 00:00:00
abstract::This paper models monthly AIDS diagnosis counts in terms of smooth secular trend, calendar month effects, and the number of workdays per month. A parameterization of month effects allows separation of true seasonal effects from a linear trend over the calendar year and an arbitrary June effect. There is strong evidenc...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.4780131905
update_date:1994-10-15 00:00:00
abstract::In the medical literature, hundreds of prediction models are being developed to predict health outcomes in individuals. For continuous outcomes, typically a linear regression model is developed to predict an individual's outcome value conditional on values of multiple predictors (covariates). To improve model developm...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.7993
update_date:2019-03-30 00:00:00
abstract::The 'landmark' and 'Simon and Makuch' non-parametric estimators of the survival function are commonly used to contrast the survival experience of time-dependent treatment groups in applications such as stem cell transplant versus chemotherapy in leukemia. However, the theoretical survival functions corresponding to th...
journal_title:Statistics in medicine
pub_type: Journal article
doi:10.1002/sim.6765
update_date:2016-03-30 00:00:00