Subsampling versus bootstrapping in resampling-based model selection for multivariable regression.

Abstract:

In recent years, increasing attention has been devoted to the problem of the stability of multivariable regression models, understood as the resistance of the model to small changes in the data on which it has been fitted. Resampling techniques, mainly based on the bootstrap, have been developed to address this issue. In particular, the approaches based on the idea of "inclusion frequency" consider the repeated implementation of a variable selection procedure, for example backward elimination, on several bootstrap samples. The analysis of the variables selected in each iteration provides useful information on the model stability and on the variables' importance. Recent findings, nevertheless, show possible pitfalls in the use of the bootstrap, and alternatives such as subsampling have begun to be taken into consideration in the literature. Using model selection frequencies and variable inclusion frequencies, we empirically compare these two different resampling techniques, investigating the effect of their use in selected classical model selection procedures for multivariable regression. We conduct our investigations by analyzing two real data examples and by performing a simulation study. Our results reveal some advantages in using a subsampling technique rather than the bootstrap in this context.
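
The idea of variable inclusion frequencies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes ordinary least squares, AIC-based backward elimination as the selection procedure, simulated data, and a subsample fraction of 0.632; the function names (`aic_linear`, `backward_elimination`, `inclusion_frequencies`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def aic_linear(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit with intercept on `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_elimination(X, y):
    """Drop one variable at a time for as long as doing so lowers the AIC."""
    selected = list(range(X.shape[1]))
    best = aic_linear(X, y, selected)
    while selected:
        scores = [(aic_linear(X, y, [c for c in selected if c != j]), j)
                  for j in selected]
        candidate, drop = min(scores)
        if candidate >= best:
            break
        best = candidate
        selected.remove(drop)
    return selected


def inclusion_frequencies(X, y, n_rep=50, scheme="bootstrap", frac=0.632, seed=0):
    """Share of resamples in which each variable survives the selection."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_rep):
        if scheme == "bootstrap":
            # n draws with replacement: some observations appear several times
            idx = rng.integers(0, n, size=n)
        elif scheme == "subsampling":
            # round(frac * n) draws without replacement: no duplicated rows
            idx = rng.choice(n, size=int(round(frac * n)), replace=False)
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        for j in backward_elimination(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_rep


# Simulated example (assumed setup): only the first two of six variables matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 6))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=150)

freq_boot = inclusion_frequencies(X, y, scheme="bootstrap")
freq_sub = inclusion_frequencies(X, y, scheme="subsampling")
print("bootstrap  :", np.round(freq_boot, 2))
print("subsampling:", np.round(freq_sub, 2))
```

Comparing the two printed vectors shows how often each variable is retained under each resampling scheme. In this sketch the only difference between the schemes is how the row indices are drawn, so any gap in the frequencies of the four noise variables comes from the resampling step alone.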

journal_name

Biometrics

journal_title

Biometrics

authors

De Bin R,Janitza S,Sauerbrei W,Boulesteix AL

doi

10.1111/biom.12381

subject

Has Abstract

pub_date

2016-03-01 00:00:00

pages

272-80

issue

1

eissn

1541-0420

issn

0006-341X

journal_volume

72

pub_type

Journal Article
  • Modeling familial association of ages at onset of disease in the presence of competing risk.

    abstract::In genetic family studies, ages at onset of diseases are routinely collected. Often one is interested in assessing the familial association of ages at the onset of a certain disease type. However, when a competing risk is present and is related to the disease of interest, the usual measure of association by treating t...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01372.x

    authors: Shih JH,Albert PS

    update_date:2010-12-01 00:00:00

  • Combining multivariate bioassays.

    abstract::Linear multivariate theory is applied to the problem of combining several multivariate bioassays. Results are an asymptotic test of the hypothesis of a common log relative potency; the maximum likelihood estimator of the common log relative potency; and an exact and asymptotic confidence interval estimator for log rel...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Meisner M,Kushner HB,Laska EM

    update_date:1986-06-01 00:00:00

  • A hypothesis test for the end of a common source outbreak.

    abstract::The objective of this article is to develop a hypothesis-testing procedure to determine whether a common source outbreak has ended. We consider the case when neither the calendar date of exposure to the pathogen nor the exact incubation period distribution is known. The hypothesis-testing procedure is based on the spa...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2005.00421.x

    authors: Brookmeyer R,You X

    update_date:2006-03-01 00:00:00

  • Discriminant diagnostics.

    abstract::I discuss diagnostic methods for discriminant analysis. The equivalence with linear regression is noted and regression diagnostics are considered. The leverage is a function of the linear discriminant function and the Mahalanobis distance of the observation from the group mean. The distribution of this distance is app...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lachenbruch PA

    update_date:1997-12-01 00:00:00

  • Analysis of ordered categorical data: two score-independent approaches.

    abstract::A trend test is often employed to analyze ordered categorical data, in which a set of increasing scores is assigned a priori. There is a drawback in this approach, because how to choose a set of scores is not clear. There have been debates on which scores should be used (e.g., Graubard and Korn, 1987, Biometric...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2008.00992.x

    authors: Zheng G

    update_date:2008-12-01 00:00:00

  • Estimating the average treatment effect on survival based on observational data and using partly conditional modeling.

    abstract::Treatments are frequently evaluated in terms of their effect on patient survival. In settings where randomization of treatment is not feasible, observational data are employed, necessitating correction for covariate imbalances. Treatments are usually compared using a hazard ratio. Most existing methods which quantify ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12542

    authors: Gong Q,Schaubel DE

    update_date:2017-03-01 00:00:00

  • Fitting mixture models to grouped and truncated data via the EM algorithm.

    abstract::The fitting of finite mixture models via the EM algorithm is considered for data which are available only in grouped form and which may also be truncated. A practical example is presented where a mixture of two doubly truncated log-normal distributions is adopted to model the distribution of the volume of red blood ce...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: McLachlan GJ,Jones PN

    update_date:1988-06-01 00:00:00

  • On logit confidence intervals for the odds ratio with small samples.

    abstract::Unless the true association is very strong, simple large-sample confidence intervals for the odds ratio based on the delta method perform well even for small samples. Such intervals include the Woolf logit interval and the related Gart interval based on adding .5 before computing the log odds ratio estimate and its st...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00597.x

    authors: Agresti A

    update_date:1999-06-01 00:00:00

  • Estimating the ventilation-perfusion distribution: an ill-posed integral equation problem.

    abstract::The distribution of ventilation-perfusion ratio over the lung is a useful indicator of the efficiency of lung function. Information about this distribution can be obtained by observing the retention in blood of inert gases passed through the lung. These retentions are related to the ventilation-perfusion distribution ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lim LL,Whitehead J

    update_date:1992-03-01 00:00:00

  • A novel statistical method for modeling covariate effects in bisulfite sequencing derived measures of DNA methylation.

    abstract::Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput methylation profiles at single-base resolution of DNA. However, optimally modeling and analyzing these sparse and discrete sequencing data...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13307

    authors: Zhao K,Oualkacha K,Lakhal-Chaieb L,Labbe A,Klein K,Ciampi A,Hudson M,Colmegna I,Pastinen T,Zhang T,Daley D,Greenwood CMT

    update_date:2020-05-21 00:00:00

  • A comment on optimal allocations for bioequivalence studies.

    abstract::A method purporting to provide optimal allocations in bioequivalence studies fails to do so on both statistical and practical grounds. Reasons as to why this is so are given. ...

    journal_title:Biometrics

    pub_type: Comment,Journal Article

    doi:10.1111/j.0006-341x.1999.01314.x

    authors: Senn S,Grieve AP

    update_date:1999-12-01 00:00:00

  • Selecting the smoothing parameter for estimation of slowly changing evoked potential signals.

    abstract::Brain evoked potential (EP) data consist of a true response ("signal") and random background activity ("noise"), which are observed over repeated stimulus presentations ("trials"). A signal that changes slowly from trial to trial can be estimated by smoothing across trials and over time within trials. We present a met...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Raz J,Turetsky B,Fein G

    update_date:1989-09-01 00:00:00

  • The Jolly-Seber model with tag loss.

    abstract::Tag loss in mark-recapture experiments is a violation of one of the Jolly-Seber model assumptions. It causes bias in parameter estimates and has only been dealt with in an ad hoc manner. We develop methodology to estimate tag retention and abundance in double-tagging mark-recapture experiments. We apply this methodolo...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00523.x

    authors: Cowen L,Schwarz CJ

    update_date:2006-09-01 00:00:00

  • Testing for Hardy-Weinberg equilibrium.

    abstract::The class of admissible tests for Hardy-Weinberg equilibrium in a multi-allelic system is characterized. The standard goodness-of-fit chi-square test is shown to be admissible for systems of two or more alleles. The conditional probability distribution required to determine the exact significance level of this test i...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Ledwina T,Gnot S

    update_date:1980-03-01 00:00:00

  • Applications of likelihood asymptotics for nonlinear regression in herbicide bioassays.

    abstract::Dose-response models are intensively used in herbicide bioassays. Despite recent advancements in the development of new herbicides, statistical analyses are commonly based on asymptotic approximations that are sometimes poor. This paper presents the use of recent results in higher order asymptotics for likelihood-base...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2000.01204.x

    authors: Bellio R,Jensen JE,Seiden P

    update_date:2000-12-01 00:00:00

  • The effect of conditional dependence on the evaluation of diagnostic tests.

    abstract::The accuracy of a new diagnostic test is often determined by comparison with a reference test which also has unknown error rates. Maximum likelihood estimation of the error rates of both tests is possible if they are simultaneously applied to two populations with different disease prevalences. The estimation procedure...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Vacek PM

    update_date:1985-12-01 00:00:00

  • A note on the operating characteristics of the modified F test.

    abstract::Brownie, Boos, and Hughes-Oliver (1990, Biometrics 46, 259-266) suggested a modification to the fixed-effects analysis of variance (ANOVA) F test for use in situations where treatments are likely to affect mean response while simultaneously increasing between-subject variability. These authors suggest that the modifie...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Blair RC,Sawilowsky S

    update_date:1993-09-01 00:00:00

  • Regional spatial modeling of topsoil geochemistry.

    abstract::Geographic information about the levels of toxics in environmental media is commonly used in regional environmental health studies when direct measurements of personal exposure are limited or unavailable. In this article, we propose a statistical framework for analyzing the spatial distribution of topsoil geochemical p...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2008.01041.x

    authors: Calder CA,Craigmile PF,Zhang J

    update_date:2009-03-01 00:00:00

  • Assessing the goodness-of-fit of hidden Markov models.

    abstract::In this article, we propose a graphical technique for assessing the goodness-of-fit of a stationary hidden Markov model (HMM). We show that plots of the estimated distribution against the empirical distribution detect lack of fit with high probability for large sample sizes. By considering plots of the univariate and ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2004.00189.x

    authors: MacKay Altman R

    update_date:2004-06-01 00:00:00

  • UMPU and alternative tests for association in 2 x 2 tables.

    abstract::The use of the uniformly most powerful unbiased (UMPU) test was recently suggested for the study of gametic association between two polymorphic loci as an alternative to Fisher's exact test (Zapata and Alvarez, 1997, Annals of Human Genetics 61, 71-77). However, the proposed test is not UMPU for two-side...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00535.x

    authors: Fuchs C

    update_date:2001-06-01 00:00:00

  • A group sequential procedure for all-pairwise comparisons of k treatments based on the range statistic.

    abstract::In this paper, a group sequential procedure for all-pairwise comparisons of the means of k independent normal populations with a common known variance is proposed. A repeated range test is defined and its critical points are tabulated. The power function is studied and the minimum group size needed to achieve a desirable p...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Liu W

    update_date:1995-09-01 00:00:00

  • A multilevel mixed effects varying coefficient model with multilevel predictors and random effects for modeling hospitalization risk in patients on dialysis.

    abstract::For patients on dialysis, hospitalizations remain a major risk factor for mortality and morbidity. We use data from a large national database, United States Renal Data System, to model time-varying effects of hospitalization risk factors as functions of time since initiation of dialysis. To account for the three-level...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13205

    authors: Li Y,Nguyen DV,Kürüm E,Rhee CM,Chen Y,Kalantar-Zadeh K,Şentürk D

    update_date:2020-09-01 00:00:00

  • Binary regression analysis with pooled exposure measurements: a regression calibration approach.

    abstract::It has become increasingly common in epidemiological studies to pool specimens across subjects to achieve accurate quantitation of biomarkers and certain environmental chemicals. In this article, we consider the problem of fitting a binary regression model when an important exposure is subject to pooling. We take a re...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01464.x

    authors: Zhang Z,Albert PS

    update_date:2011-06-01 00:00:00

  • Locally efficient estimation of the quality-adjusted lifetime distribution with right-censored data and covariates.

    abstract::Zhao and Tsiatis (1997) consider the problem of estimation of the distribution of the quality-adjusted lifetime when the chronological survival time is subject to right censoring. The quality-adjusted lifetime is typically defined as a weighted sum of the times spent in certain states up until death or some other fail...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00530.x

    authors: van der Laan MJ,Hubbard A

    update_date:1999-06-01 00:00:00

  • Statistical monitoring of the hand, foot and mouth disease in China.

    abstract::In a period starting around 2007, Hand, Foot, and Mouth Disease (HFMD) became widespread in China, and public health in China was seriously threatened. To prevent the outbreak of infectious diseases like HFMD, effective disease surveillance systems would be especially helpful to give signals of disease outb...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12301

    authors: Zhang J,Kang Y,Yang Y,Qiu P

    update_date:2015-09-01 00:00:00

  • A note on the conditional approach to interval estimation in the calibration problem.

    abstract::In the calibration problem, the need to construct a confidence interval to estimate the unknown chi 0 arises when the null hypothesis of zero slope is rejected. Otherwise, the resulting confidence interval will be infinite to reflect the fact that the slope of the regression line may be zero. Under the condition of re...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lee JJ

    update_date:1991-12-01 00:00:00

  • Semiparametric methods for mapping quantitative trait loci with censored data.

    abstract::Statistical methods for the detection of genes influencing quantitative traits with the aid of genetic markers are well developed for normally distributed, fully observed phenotypes. Many experiments are concerned with failure-time phenotypes, which have skewed distributions and which are usually subject to censoring ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2005.00346.x

    authors: Diao G,Lin DY

    update_date:2005-09-01 00:00:00

  • A note on case-control sampling to estimate kappa coefficients.

    abstract::Support is provided for the feasibility and cost-effectiveness of estimating kappa using a case-control method of sampling, as proposed by Jannarone, Macera, and Garrison (1987, Biometrics 43, 433-437). However, in this article unrealistic assumptions in their presentation are identified and more general results for more...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Kraemer HC,Bloch DA

    update_date:1990-03-01 00:00:00

  • Tests for monotone mean residual life, using randomly censored data.

    abstract::At any age the mean residual life function gives the expected remaining life at that age. Reliabilists and biometricians have found it useful to categorize failure distributions by the monotonicity properties of the mean residual life function. Hollander and Proschan (1975, Biometrika 62, 585-593) have derived tests o...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Chen YY,Hollander M,Langberg NA

    update_date:1983-03-01 00:00:00

  • Receiver operating characteristic curves and confidence bands for support vector machines.

    abstract::Many problems that appear in biomedical decision-making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The support vector machine (SVM) is a popular classification technique that is robust to model misspecification and effectively handles high-dime...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13365

    authors: Luckett DJ,Laber EB,El-Kamary SS,Fan C,Jhaveri R,Perou CM,Shebl FM,Kosorok MR

    update_date:2020-08-31 00:00:00