Prediction intervals with random forests.

Abstract:

The classical and most commonly used approach to building prediction intervals is the parametric approach. However, its main drawback is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four methods to build the forest are investigated: three from the classification and regression tree (CART) paradigm, plus the transformation forest method. For CART forests, in addition to the default least-squares splitting rule, two alternative splitting criteria are investigated. We also present and evaluate the performance of five flexible methods for constructing prediction intervals. This yields 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared to five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive. They outperform commonly used methods both in simulation settings and with real data.
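
The abstract does not spell out the calibration procedure, so the following is only a minimal sketch of the general idea of using out-of-bag (OOB) information to size an interval. It assumes a symmetric interval around the forest's point prediction, with the half-width taken as an empirical quantile of absolute OOB residuals. The function names (`oob_half_width`, `prediction_interval`, `oob_coverage`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def oob_half_width(y, oob_pred, level=0.95):
    """Smallest half-width d such that at least `level` of the
    out-of-bag residuals satisfy |y_i - oob_pred_i| <= d.

    Assumption: a symmetric, constant-width interval; the paper's
    five interval methods are not reproduced here.
    """
    residuals = sorted(abs(a - b) for a, b in zip(y, oob_pred))
    # Number of observations the interval must cover.
    k = max(math.ceil(level * len(residuals)), 1)
    return residuals[k - 1]


def prediction_interval(point_pred, half_width):
    """Symmetric interval around a point prediction."""
    return point_pred - half_width, point_pred + half_width


def oob_coverage(y, oob_pred, half_width):
    """Fraction of observations whose OOB interval contains the response."""
    hits = sum(abs(a - b) <= half_width for a, b in zip(y, oob_pred))
    return hits / len(y)


# Toy stand-ins for responses and OOB predictions (hypothetical values).
y_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
oob = [1.5, 2, 3, 4, 5, 6, 7, 8, 9, 13]

d = oob_half_width(y_obs, oob, level=0.9)
lower, upper = prediction_interval(4.0, d)
```

Because the half-width is chosen directly from the OOB residuals, the OOB coverage reaches the requested level by construction. In practice the OOB predictions would come from a fitted forest rather than from hand-written lists.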

journal_name

Stat Methods Med Res

authors

Roy MH,Larocque D

doi

10.1177/0962280219829885

subject

Has Abstract

pub_date

2020-01-01 00:00:00

pages

205-229

issue

1

eissn

1477-0334

issn

0962-2802

journal_volume

29

pub_type

Journal Article
  • Statistical modelling of measles and influenza outbreaks.

    abstract::This paper reviews the application of statistical models to outbreaks of two common respiratory viral diseases, measles and influenza. For each disease, we look first at its epidemiological characteristics and assess the extent to which these either aid or hinder modelling. We then turn to the models that have been de...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029300200104

    authors: Cliff AD,Haggett P

    update_date:1993-01-01 00:00:00

  • Efficient Monte Carlo evaluation of resampling-based hypothesis tests with applications to genetic epidemiology.

    abstract::Monte Carlo evaluation of resampling-based tests is often conducted in statistical analysis. However, this procedure is generally computationally intensive. The pooling resampling-based method has been developed to reduce the computational burden but the validity of the method has not been studied before. In this arti...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216661876

    authors: Fung WK,Yu K,Yang Y,Zhou JY

    update_date:2018-05-01 00:00:00

  • Everything all right in method comparison studies?

    abstract::Researchers and clinicians often need to know whether a new method of measurement is equivalent to an established one that is already in use. For this problem, the estimation of limits of agreement advocated by Bland and Altman is a widely used solution. However, this approach ignores two vital issues in method compar...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280210379365

    authors: Alanen E

    update_date:2012-08-01 00:00:00

  • Designs in partially controlled studies: messages from a review.

    abstract::The ability to evaluate effects of factors on outcomes is increasingly important for studies that control some but not all of the factors. Although important advances have been made in methods of analysis for such partially controlled studies, work on designs has been limited. To help understand why, we review the mai...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1191/0962280205sm405oa

    authors: Li F,Frangakis CE

    update_date:2005-08-01 00:00:00

  • Modelling of zero-inflation improves inference of metagenomic gene count data.

    abstract::Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data is inherently noisy and contains high levels of biological and technical variability as well as an excess of zeros due to ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218811354

    authors: Jonsson V,Österlund T,Nerman O,Kristiansson E

    update_date:2019-12-01 00:00:00

  • A composite likelihood approach to predict the sex of the baby.

    abstract::Couples with diseases associated with the sexual chromosomes, as well as families in countries where the desire for a male is extreme, are interested in influencing the sex of the baby. We propose an original composite likelihood approach to analyse the relation between sex of the newborn and timing of the intercourse...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217702415

    authors: Tiberi S,Scarpa B,Sartori N

    update_date:2018-11-01 00:00:00

  • Measurement error correction using validation data: a review of methods and their applicability in case-control studies.

    abstract::Measurement error is a serious problem in the analysis of epidemiological data. In the past 20 years, a large number of methods for the correction of measurement error have been developed. While at the beginning mostly methods for cohort studies were considered, recently more attention has been paid to case-control st...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228020000900504

    authors: Thürigen D,Spiegelman D,Blettner M,Heuer C,Brenner H

    update_date:2000-10-01 00:00:00

  • Pseudo-observations in survival analysis.

    abstract::We review recent work on the application of pseudo-observations in survival and event history analysis. This includes regression models for parameters like the survival function in a single point, the restricted mean survival time and transition or state occupation probabilities in multi-state models, e.g. the competi...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/0962280209105020

    authors: Andersen PK,Perme MP

    update_date:2010-02-01 00:00:00

  • Statistical methods in genetic research on smoking.

    abstract::A growing body of evidence suggests that genetic factors have an important influence on the onset and course of smoking. Here we review some of the statistical methods that have been used to test for genetic influences on smoking behaviour, with a particular focus on studies of large national twin samples. We show how...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029800700205

    authors: Heath AC,Madden PA,Martin NG

    update_date:1998-06-01 00:00:00

  • On the power of the Cochran-Armitage test for trend in the presence of misclassification.

    abstract::The Cochran-Armitage (CA) test is commonly used in both epidemiology and genetics to test for linear trend in two-way tables with a binary outcome. There has been increasing interest in the power and size of the test and in determination of sample size, especially when there is potential misclassification in the 'expo...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280211406424

    authors: Buonaccorsi JP,Laake P,Veierød MB

    update_date:2014-06-01 00:00:00

  • Testing hypotheses under adaptive randomization with continuous covariates in clinical trials.

    abstract::Covariate-adaptive designs are widely used to balance covariates and maintain randomization in clinical trials. Adaptive designs for discrete covariates and their asymptotic properties have been well studied in the literature. However, important continuous covariates are often involved in clinical studies. Simply disc...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218770231

    authors: Li X,Zhou J,Hu F

    update_date:2019-06-01 00:00:00

  • A generalization of functional clustering for discrete multivariate longitudinal data.

    abstract::This paper presents a new model-based generalized functional clustering method for discrete longitudinal data, such as multivariate binomial and Poisson distributed data. For this purpose, we propose a multivariate functional principal component analysis (MFPCA)-based clustering procedure for a latent multivariate Gau...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220921912

    authors: Lim Y,Cheung YK,Oh HS

    update_date:2020-11-01 00:00:00

  • Re-weighted inference about hepatitis C virus-infected communities when analysing diagnosed patients referred to liver clinics.

    abstract::To project national hepatitis C virus (HCV) burden, unbiased estimation of HCV progression to liver cirrhosis is required for the whole community of HCV-infected individuals. However, widely varying estimates of progression rates to cirrhosis have been produced. This disparity is partly associated with the statistical...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280208094688

    authors: Fu B,Tom BD,Bird SM

    update_date:2009-06-01 00:00:00

  • A robust score test of homogeneity for zero-inflated count data.

    abstract::In many applications of zero-inflated models, score tests are often used to evaluate whether the population heterogeneity as implied by these models is consistent with the data. The most frequently cited justification for using score tests is that they only require estimation under the null hypothesis. Because this es...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220937324

    authors: Hsu WW,Todem D,Mawella NR,Kim K,Rosenkranz RR

    update_date:2020-12-01 00:00:00

  • Meta-analysis without study-specific variance information: Heterogeneity case.

    abstract::The random effects model in meta-analysis is a standard statistical tool often used to analyze the effect sizes of the quantity of interest if there is heterogeneity between studies. In the special case considered here, meta-analytic data contain only the sample means in two treatment arms and the sample sizes, but no...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217718867

    authors: Sangnawakij P,Böhning D,Niwitpong SA,Adams S,Stanton M,Holling H

    update_date:2019-01-01 00:00:00

  • A proof-of-concept-to-confirmatory multiple adaptation design in the development of an anti-viral treatment.

    abstract::In the clinical development of some new infectious disease drugs, early clinical pharmacology trials may predict with high confidence that the efficacious doses are well below the range of the safety margin. In this case, a dose-ranging study may be unnecessary after a proof-of-concept (PoC) study testing the highest ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218807950

    authors: Fan XF,Gallo P,Su G,Menton R,Segal F

    update_date:2019-12-01 00:00:00

  • Hybrid test for publication bias in meta-analysis.

    abstract::Publication bias frequently appears in meta-analyses when the included studies' results (e.g., p-values) influence the studies' publication processes. Some unfavorable studies may be suppressed from publication, so the meta-analytic results may be biased toward an artificially favorable direction. Many statistical tes...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220910172

    authors: Lin L

    update_date:2020-10-01 00:00:00

  • Performance of informative priors skeptical of large treatment effects in clinical trials: A simulation study.

    abstract::One of the main advantages of Bayesian analyses of clinical trials is their ability to formally incorporate skepticism about large treatment effects through the use of informative priors. We conducted a simulation study to assess the performance of informative normal, Student-t, and beta distributions in estimating r...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280215620828

    authors: Pedroza C,Han W,Truong VTT,Green C,Tyson JE

    update_date:2018-01-01 00:00:00

  • Correcting for dependent censoring in routine outcome monitoring data by applying the inverse probability censoring weighted estimator.

    abstract::Censored data make survival analysis more complicated because exact event times are not observed. Statistical methodology developed to account for censored observations assumes that patients' withdrawal from a study is independent of the event of interest. However, in practice, some covariates might be associated to b...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216628900

    authors: Willems S,Schat A,van Noorden MS,Fiocco M

    update_date:2018-02-01 00:00:00

  • Causal mediation analysis with multiple causally non-ordered mediators.

    abstract::In many health studies, researchers are interested in estimating the treatment effects on the outcome around and through an intermediate variable. Such causal mediation analyses aim to understand the mechanisms that explain the treatment effect. Although multiple mediators are often involved in real studies, most of t...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280215615899

    authors: Taguri M,Featherstone J,Cheng J

    update_date:2018-01-01 00:00:00

  • Semiparametric integrative interaction analysis for non-small-cell lung cancer.

    abstract::In genomic analysis, it is significant though challenging to identify markers associated with cancer outcomes or phenotypes. Based on the biological mechanisms of cancers and the characteristics of datasets, we propose a novel integrative interaction approach under a semiparametric model, in which genetic and environm...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220909969

    authors: Li Y,Wang F,Li R,Sun Y

    update_date:2020-10-01 00:00:00

  • Modelling breast cancer tumour growth for a stable disease population.

    abstract::Statistical models of breast cancer tumour progression have been used to further our knowledge of the natural history of breast cancer, to evaluate mammography screening in terms of mortality, to estimate overdiagnosis, and to estimate the impact of lead-time bias when comparing survival times between screen detected ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217734583

    authors: Isheden G,Humphreys K

    update_date:2019-03-01 00:00:00

  • Analysis of clustered data in receiver operating characteristic studies.

    abstract::Clustered data is not simply correlated data, but has its own unique aspects. In this paper, various methods for correlated receiver operating characteristic (ROC) curve data that have been extended specifically to clustered data are reviewed. For those methods that have not yet been extended, suggestions for their ap...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029800700402

    authors: Beam CA

    update_date:1998-12-01 00:00:00

  • Letter to the editor: Fitting truncated normal distributions.

    abstract::I comment here on a recent paper in this journal, on the fitting of truncated normal distributions by the EM algorithm. I show that the fitting of such distributions by direct numerical maximization of likelihood (rather than EM) is straightforward, contrary to an assertion made by the authors of that paper. ...

    journal_title:Statistical methods in medical research

    pub_type: Comment, Letter

    doi:10.1177/0962280217712089

    authors: MacDonald IL

    update_date:2018-12-01 00:00:00

  • Statistical methods for multivariate meta-analysis of diagnostic tests: An overview and tutorial.

    abstract::In this article, we present an overview and tutorial of statistical methods for meta-analysis of diagnostic tests under two scenarios: (1) when the reference test can be considered a gold standard and (2) when the reference test cannot be considered a gold standard. In the first scenario, we first review the conventio...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/0962280213492588

    authors: Ma X,Nie L,Cole SR,Chu H

    update_date:2016-08-01 00:00:00

  • Reference-based pattern-mixture models for analysis of longitudinal binary data.

    abstract::Pattern-mixture model (PMM)-based controlled imputations have become a popular tool to assess the sensitivity of primary analysis inference to different post-dropout assumptions or to estimate treatment effectiveness. The methodology is well established for continuous responses but less well established for binary res...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220941880

    authors: Lu K

    update_date:2020-12-01 00:00:00

  • Fitting competing risks with an assumed copula.

    abstract::We propose a fully parametric model for the analysis of competing risks data where the types of failure may not be independent. We show how the dependence between the cause-specific survival times can be modelled with a copula function. Features include: identifiability of the problem; accessible understanding of the ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1191/0962280203sm335ra

    authors: Escarela G,Carrière JF

    update_date:2003-08-01 00:00:00

  • Inferences about population means of health care costs.

    abstract::The analysis of health care costs is complicated by the skewed and heteroscedastic nature of their distribution with possibly additional zero values. Statistical methods that do not adjust for these features can lead to incorrect conclusions. This paper reviews recent developments in statistical methods for the analys...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1191/0962280202sm290ra

    authors: Zhou XH

    update_date:2002-08-01 00:00:00

  • Mixture modelling for cluster analysis.

    abstract::Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1191/0962280204sm372ra

    authors: McLachlan GJ,Chang SU

    update_date:2004-10-01 00:00:00

  • Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    abstract::Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratif...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280213497432

    authors: Lin Y,Yu M,Wang S,Chappell R,Imperiale TF

    update_date:2016-08-01 00:00:00