Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response.

Abstract:

In observational studies, many continuous or categorical covariates may be related to an outcome. Various spline-based procedures or the multivariable fractional polynomial (MFP) procedure can be used to identify important variables and functional forms for continuous covariates. This is the main aim of an explanatory model, as opposed to a model only for prediction. The type of analysis often guides the complexity of the final model. Spline-based procedures and MFP have tuning parameters for choosing the required complexity. To compare model selection approaches, we perform a simulation study in the linear regression context based on a data structure intended to reflect realistic biomedical data. We vary the sample size, variance explained and complexity parameters for model selection. We consider 15 variables. A sample size of 200 (1000) and R² = 0.2 (0.8) is the scenario with the smallest (largest) amount of information. For assessing performance, we consider prediction error, correct and incorrect inclusion of covariates, qualitative measures for judging selected functional forms and further novel criteria. From limited information, a suitable explanatory model cannot be obtained. Prediction performance from all types of models is similar. With a medium amount of information, MFP performs better than splines on several criteria. MFP better recovers simpler functions, whereas splines better recover more complex functions. For a large amount of information and no local structure, MFP and the spline procedures often select similar explanatory models.

journal_name

Stat Med

journal_title

Statistics in medicine

authors

Binder H,Sauerbrei W,Royston P

doi

10.1002/sim.5639

subject

Has Abstract

pub_date

2013-06-15 00:00:00

pages

2262-77

issue

13

eissn

1097-0258

issn

0277-6715

journal_volume

32

pub_type

Journal Article
  • Comparison of tests for categorical data from a stratified cluster randomized trial.

    abstract::Two features commonly exhibited by randomized trials of health promotion interventions are cluster randomization and stratification. Ignoring correlations between individuals within clusters can lead to an inflated type I error rate and hence a P-value which overstates the significance of the result. This paper compar...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1256

    authors: Dobbins TA,Simpson JM

    update_date:2002-12-30 00:00:00

  • Statistical inferences for a twin correlation with multinomial outcomes.

    abstract::Current methods for statistical analysis of twin studies focus on continuous and dichotomous data, while only limited methodology exists for analysing multinomial data. As a consequence, investigators are often tempted to collapse multinomial data into two categories simply to facilitate the analysis. We address this ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/1097-0258(20010130)20:2<249::aid-sim641>3.

    authors: Bartfay E,Donner A

    update_date:2001-01-30 00:00:00

  • Modelling age-specific risk: application to dementia.

    abstract::We give up-to-date methods for estimating the age-specific incidence of a disease and for estimating the effect of risk factors. We recommend taking age as the basic time scale of the analysis; then, the hazard function can be interpreted as the age-specific incidence of the disease. This choice raises a delayed entry...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19980915)17:17<1973::aid-s

    authors: Commenges D,Letenneur L,Joly P,Alioum A,Dartigues JF

    update_date:1998-09-15 00:00:00

  • Joint analysis of multi-level repeated measures data and survival: an application to the end stage renal disease (ESRD) data.

    abstract::Shared random effects models have been increasingly common in the joint analyses of repeated measures (e.g. CD4 counts, hemoglobin levels) and a correlated failure time such as death. In this paper we study several shared random effects models in the multi-level repeated measures data setting with dependent failure ti...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3392

    authors: Liu L,Ma JZ,O'Quigley J

    update_date:2008-11-29 00:00:00

  • Analytical, practical and regulatory issues in prevention studies.

    abstract::Prevention studies, as distinguished from studies investigating treatments for established disease, present some distinct challenges. Perhaps the most extensive experience with preventive agents is in the area of infectious diseases; vaccines have been extremely effective in preventing many such diseases. Vaccines hav...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1717

    authors: Ellenberg SS

    update_date:2004-01-30 00:00:00

  • Analysis of in vitro fertilization data with multiple outcomes using discrete time-to-event analysis.

    abstract::In vitro fertilization (IVF) is an increasingly common method of assisted reproductive technology. Because of the careful observation and follow-up required as part of the procedure, IVF studies provide an ideal opportunity to identify and assess clinical and demographic factors along with environmental exposures that...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6050

    authors: Maity A,Williams PL,Ryan L,Missmer SA,Coull BA,Hauser R

    update_date:2014-05-10 00:00:00

  • Identifiability and estimation of causal mediation effects with missing data.

    abstract::Mediation analysis is a standard approach to understanding how and why an intervention works in social and medical sciences. However, the presence of missing data, especially missing not at random data, poses a great challenge for the applicability of this approach in practice. Current methods for handling such missin...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7413

    authors: Li W,Zhou XH

    update_date:2017-11-10 00:00:00

  • Testing for central mixtures of compartment model parameters.

    abstract::I discuss alternatives to the one compartment model, delta Yt = alpha + beta exp(- gamma t). Instead of comparing the one and two compartment models, I derive statistics for testing mixtures of the parameters (beta, gamma) in the one compartment model. I apply the proposed methods to the problem of hydrogen clearance ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780080811

    authors: Zelterman D

    update_date:1989-08-01 00:00:00

  • Estimating adjusted risk difference (RD) and number needed to treat (NNT) measures in the Cox regression model.

    abstract::In medical research, risk difference (RD) and number needed to treat (NNT) measures for survival times have been mainly proposed without consideration of covariates. In this paper, we develop adjusted RD and NNT measures for use in observational studies with survival time outcomes within the framework of the Cox propo...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3793

    authors: Laubender RP,Bender R

    update_date:2010-03-30 00:00:00

  • Random-effects meta-analysis of the clinical utility of tests and prediction models.

    abstract::The use of data from multiple studies or centers for the validation of a clinical test or a multivariable prediction model allows researchers to investigate the test's/model's performance in multiple settings and populations. Recently, meta-analytic techniques have been proposed to summarize discrimination and calibra...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7653

    authors: Wynants L,Riley RD,Timmerman D,Van Calster B

    update_date:2018-05-30 00:00:00

  • A new approach to designing phase I-II cancer trials for cytotoxic chemotherapies.

    abstract::Recently, there has been much work on early phase cancer designs that incorporate both toxicity and efficacy data, called phase I-II designs because they combine elements of both phases. However, they do not explicitly address the phase II hypothesis test of H0: p ≤ p0, where p is the probability of efficacy at the ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6124

    authors: Bartroff J,Lai TL,Narasimhan B

    update_date:2014-07-20 00:00:00

  • Joint analysis of mixed types of outcomes with latent variables.

    abstract::We propose a joint modeling approach to investigating the observed and latent risk factors of mixed types of outcomes. The proposed model comprises three parts. The first part is an exploratory factor analysis model that summarizes latent factors through multiple observed variables. The second part is a proportional h...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8840

    authors: Pan D,Wei Y,Song X

    update_date:2020-12-09 00:00:00

  • A model for cross-over trials evaluating therapeutic preferences.

    abstract::A preference trial is a special form of cross-over trial where clinical conditions determine when patients change treatment, in a prescribed order. This can be modelled using a geometric distribution. The model can be simply fitted using standard logistic regression methodology. The procedure is applied to a trial stu...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(SICI)1097-0258(19960229)15:4<443::AID-SIM

    authors: Lindsey JK,Jones B

    update_date:1996-02-28 00:00:00

  • A semi-Markov model for multistate and interval-censored data with multiple terminal events. Application in renal transplantation.

    abstract::The semi-Markov assumption emphasizes the importance of time spent in a state. In order to compute this type of multistate model, most transition times are always considered to be exactly identified or right censored. However, in the longitudinal analysis of chronic diseases, investigators are often confronted with in...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3100

    authors: Foucher Y,Giral M,Soulillou JP,Daures JP

    update_date:2007-12-30 00:00:00

  • Assessing goodness-of-fit of parametric regression models for lifetime data-graphical methods.

    abstract::Graphical methods are often used to check goodness-of-fit of models to data. It is common to plot residuals against a reference distribution so that when the model fits the data, the configuration should be close to a straight line. Since the resemblance to a straight line is often unclear, it has been suggested to ad...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780141607

    authors: Cohen A,Barnett O

    update_date:1995-08-30 00:00:00

  • How serious is bias in effect estimation in randomised trials with survival data given risk heterogeneity and informative censoring?

    abstract::It is often assumed that randomisation will prevent bias in estimation of treatment effects from clinical trials, but this is not true of the semiparametric Proportional Hazards model for survival data when there is underlying risk heterogeneity. Here, a new formula is proposed for estimation of this bias, improving o...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7343

    authors: McNamee R

    update_date:2017-09-20 00:00:00

  • Simple methods for checking for possible errors in reported odds ratios, relative risks and confidence intervals.

    abstract::Meta-analyses of data from epidemiological studies are often based on odds ratios (ORs) or relative risks (RRs) and their 95 per cent confidence intervals (CIs) as reported by the authors. Where possible ORs, RRs and CIs should be checked against the source data. Some simple methods are presented for checking the vali...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19990815)18:15<1973::aid-s

    authors: Lee PN

    update_date:1999-08-15 00:00:00

  • A simulation-free approach to assessing the performance of the continual reassessment method.

    abstract::The continual reassessment method (CRM) is an adaptive design for Phase I trials whose operating characteristics, including appropriate sample size, probability of correctly identifying the maximum tolerated dose, and the expected proportion of participants assigned to each dose, can only be determined via simulation....

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8746

    authors: Braun TM

    update_date:2020-09-16 00:00:00

  • Bounds on natural direct effects in the presence of confounded intermediate variables.

    abstract::In epidemiological studies we often want to learn about the direct effect of an exposure on an outcome, i.e. the effect that is not relayed by a specific intermediate variable. In the literature, there are two common definitions of direct effects; controlled and natural. When the intermediate variable and the outcome ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3493

    authors: Sjölander A

    update_date:2009-02-15 00:00:00

  • A comparison of methods for determining HIV viral set point.

    abstract::During a course of human immunodeficiency virus (HIV-1) infection, the viral load usually increases sharply to a peak following infection and then drops rapidly to a steady state, where it remains until progression to AIDS. This steady state is often referred to as the viral set point. It is believed that the HIV vira...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3038

    authors: Mei Y,Wang L,Holte SE

    update_date:2008-01-15 00:00:00

  • A robust goodness-of-fit test statistic with application to ordinal regression models.

    abstract::We propose a goodness-of-fit test statistic for linear regression with heterogeneous variance, which is asymptotically chi-square if the given model is correct. The test statistic is computed as a quadratic form of observed minus predicted responses. We apply the method to a linear regression for an ordinal categorica...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780130205

    authors: Lipsitz SR,Buoncristiani JF

    update_date:1994-01-30 00:00:00

  • Survival probabilities with time-dependent treatment indicator: quantities and non-parametric estimators.

    abstract::The 'landmark' and 'Simon and Makuch' non-parametric estimators of the survival function are commonly used to contrast the survival experience of time-dependent treatment groups in applications such as stem cell transplant versus chemotherapy in leukemia. However, the theoretical survival functions corresponding to th...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6765

    authors: Bernasconi DP,Rebora P,Iacobelli S,Valsecchi MG,Antolini L

    update_date:2016-03-30 00:00:00

  • Traffic accident mapping in Bangkok metropolis: a case study.

    abstract::Results from an analysis of traffic accidents from a study of the police records of four police stations in the Bangkok metropolis are presented. The main emphasis in this study was put on the development of a measure for traffic accident density. The traffic flow was estimated at the various study locations by traine...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780142113

    authors: Ayuthya RS,Böhning D

    update_date:1995-11-15 00:00:00

  • On prediction of future observation in growth curve model.

    abstract::Rao proposed and compared several approaches for predicting future observations in a growth curve model. The assessment of associated prediction efficiency for different prediction methods were evaluated by Cross-Validation Assessment Error (CVAE). He used three data sets, each with a limited number of subjects (13-27...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780132103

    authors: Tian JJ,Shukla R,Buncher CR

    update_date:1994-11-15 00:00:00

  • Sam Greenhouse's years at the Census Bureau and the UNRRA.

    abstract::Sam Greenhouse joined the Census Bureau as a clerk at an interesting time period for the agency. The first use of sampling in the decennial census occurred in 1940. There was a major expansion of the amount of data collected. The organization of the Census Bureau underwent radical changes, including the growth of the ...

    journal_title:Statistics in medicine

    pub_type: Biography, Historical Article, Journal Article

    doi:10.1002/sim.1627

    authors: Keller J,Clark CZ

    update_date:2003-11-15 00:00:00

  • Assessing the robustness of sisVIVE in a Mendelian randomization study to estimate the causal effect of body mass index on income using multiple SNPs from Understanding Society.

    abstract::The "some invalid, some valid instrumental variable estimator" (sisVIVE) is a lasso-based method for instrumental variables (IVs) regression of outcome on an exposure. In principle, sisVIVE is robust to some of the IVs in the analysis being invalid, in the sense of being related to the outcome variable through pathway...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8066

    authors: Bao Y,Clarke PS,Smart M,Kumari M

    update_date:2019-04-30 00:00:00

  • Conditional power and predictive power based on right censored data with supplementary auxiliary information.

    abstract::Conditional power and predictive power provide estimates of the probability of success at the end of the trial based on the information from the interim analysis. The observed value of the time to event endpoint at the interim analysis could be biased for the true treatment effect due to early censoring, leading to a ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7673

    authors: Sun L,Wan Y

    update_date:2018-08-15 00:00:00

  • Nonparametric modeling and analysis of association between Huntington's disease onset and CAG repeats.

    abstract::Huntington's disease (HD) is a neurodegenerative disorder with a dominant genetic mode of inheritance caused by an expansion of CAG repeats on chromosome 4. Typically, a longer sequence of CAG repeat length is associated with increased risk of experiencing earlier onset of HD. Previous studies of the association betwe...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5971

    authors: Ma Y,Wang Y

    update_date:2014-04-15 00:00:00

  • Multilevel mixed effects parametric survival models using adaptive Gauss-Hermite quadrature with application to recurrent events and individual participant data meta-analysis.

    abstract::Multilevel mixed effects survival models are used in the analysis of clustered survival data, such as repeated events, multicenter clinical trials, and individual participant data (IPD) meta-analyses, to investigate heterogeneity in baseline risk and covariate effects. In this paper, we extend parametric frailty model...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6191

    authors: Crowther MJ,Look MP,Riley RD

    update_date:2014-09-28 00:00:00

  • A framework establishing clear decision criteria for the assessment of drug efficacy.

    abstract::Much has been published on various aspects of data analysis and reporting from clinical trials within the biopharmaceutical environment. This ranges from regulatory guidelines on the format and content of registration dossiers to recommendations on data presentation and the statistical methodologies that are appropria...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/(sici)1097-0258(19980815/30)17:15/16<1829:

    authors: Huster WJ,Enas GG

    update_date:1998-08-15 00:00:00