The cross-validated AUC for MCP-logistic regression with high-dimensional data.

Abstract:

We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite-sample performance of the proposed method and to compare it with existing criteria, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the extended BIC (EBIC). The model selected by the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than models whose tuning parameters are selected using the AIC, BIC or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
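
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the CV-AUC for a given tuning parameter is the average of the per-fold AUCs, that the MCP-logistic fit uses a majorized coordinate descent (the logistic curvature is bounded by 1/4, so the concavity parameter gamma must exceed 4), and that predictors are standardized within each training fold. The function names (sigmoid, auc, fit_mcp_logistic, cv_auc), the lambda grid and the simulated data are all hypothetical.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def auc(y, score):
    """Empirical AUC: share of (case, control) pairs ranked correctly; ties count 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_mcp_logistic(X, y, lam, gamma=8.0, n_iter=100, tol=1e-6):
    """Majorized coordinate descent for MCP-penalized logistic regression.

    Assumes each column of X has mean 0 and mean square 1 (an assumption of
    this sketch), so the per-coordinate curvature is bounded by v = 1/4.
    """
    n, p = X.shape
    v = 0.25
    assert gamma > 1.0 / v, "gamma must exceed 4 for the update to be well defined"
    b0, beta, eta = 0.0, np.zeros(p), np.zeros(n)
    for _ in range(n_iter):
        beta_old = beta.copy()
        # Unpenalized intercept update under the quadratic majorizer.
        step = np.mean(y - sigmoid(eta)) / v
        b0 += step
        eta += step
        for j in range(p):
            z = beta[j] + X[:, j] @ (y - sigmoid(eta)) / (n * v)
            if abs(z) <= gamma * lam:
                # MCP region: soft-threshold, then rescale.
                new = np.sign(z) * max(v * abs(z) - lam, 0.0) / (v - 1.0 / gamma)
            else:
                # Beyond gamma * lam the penalty is flat, so no shrinkage.
                new = z
            if new != beta[j]:
                eta += X[:, j] * (new - beta[j])
                beta[j] = new
        if max(abs(step), np.max(np.abs(beta - beta_old))) < tol:
            break
    return b0, beta

def cv_auc(X, y, lambdas, k=5, seed=0):
    """Average held-out AUC over k stratified folds for each lambda."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    scores = np.zeros(len(lambdas))
    for f in range(k):
        tr, te = folds != f, folds == f
        mu, sd = X[tr].mean(0), X[tr].std(0)
        sd[sd == 0] = 1.0
        Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd
        for i, lam in enumerate(lambdas):
            b0, beta = fit_mcp_logistic(Xtr, y[tr], lam)
            scores[i] += auc(y[te], b0 + Xte @ beta) / k
    return scores

# Small simulated example: 3 informative predictors out of 20.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
true_beta = np.zeros(20)
true_beta[:3] = [2.0, -2.0, 1.5]
y = (rng.random(80) < sigmoid(X @ true_beta)).astype(int)
lambdas = np.array([0.2, 0.1, 0.05, 0.02])
scores = cv_auc(X, y, lambdas)
best = lambdas[np.argmax(scores)]
print(dict(zip(lambdas.tolist(), np.round(scores, 3).tolist())), "selected:", best)
```

The tuning parameter with the largest cross-validated AUC is selected. A likelihood-based criterion such as AIC or BIC would instead trade model fit against model size, which need not coincide with the best classification performance.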

journal_name

Stat Methods Med Res

authors

Jiang D,Huang J,Zhang Y

doi

10.1177/0962280211428385

subject

Has Abstract

pub_date

2013-10-01 00:00:00

pages

505-18

issue

5

eissn

1477-0334

issn

0962-2802

pii

0962280211428385

journal_volume

22

pub_type

Journal Article
  • A Bayesian semiparametric approach with change points for spatial ordinal data.

    abstract::The change-point model has drawn much attention over the past few decades. It can accommodate the jump process, which allows for changes of the effects before and after the change point. Intellectual disability is a long-term disability that impacts performance in cognitive aspects of life and usually has its onset pr...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280212463415

    authors: Cai B,Lawson AB,McDermott S,Aelion CM

    update_date: 2016-04-01 00:00:00

  • Measuring continuous baseline covariate imbalances in clinical trial data.

    abstract::This paper presents and compares several methods of measuring continuous baseline covariate imbalance in clinical trial data. Simulations illustrate that though the t-test is an inappropriate method of assessing continuous baseline covariate imbalance, the test statistic itself is a robust measure in capturing imbalan...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280211416038

    authors: Ciolino JD,Martin RH,Zhao W,Hill MD,Jauch EC,Palesch YY

    update_date: 2015-04-01 00:00:00

  • Parametric models for incomplete continuous and categorical longitudinal data.

    abstract::This paper reviews models for incomplete continuous and categorical longitudinal data. In terms of Rubin's classification of missing value processes we are specifically concerned with the problem of nonrandom missingness. A distinction is drawn between the classes of selection and pattern-mixture models and, using sev...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029900800105

    authors: Kenward MG,Molenberghs G

    update_date: 1999-03-01 00:00:00

  • Extending backcalculation to analyse BSE data.

    abstract::We review the origins of backcalculation (or back projection) methods developed for the analysis of AIDS (acquired immunodeficiency syndrome) incidence data. These techniques have been used extensively for >15 years to deconvolute clinical case incidence, given knowledge of the incubation period distribution, to obtai...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1191/0962280203sm337ra

    authors: Donnelly CA,Ferguson NM,Ghani AC,Anderson RM

    update_date: 2003-06-01 00:00:00

  • Maximum likelihood estimation of time to first event in the presence of data gaps and multiple events.

    abstract::We propose a novel likelihood method for analyzing time-to-event data when multiple events and multiple missing data intervals are possible prior to the first observed event for a given subject. This research is motivated by data obtained from a heart monitor used to track the recovery process of subjects experiencing...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280212466089

    authors: Green CL,Brownie C,Boos DD,Lu JC,Krucoff MW

    update_date: 2016-04-01 00:00:00

  • Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information.

    abstract::Identification of cancer patient subgroups using high throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. In spite of tremendous efforts, this problem still remains challenging because o...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217752980

    authors: Wei W,Sun Z,da Silveira WA,Yu Z,Lawson A,Hardiman G,Kelemen LE,Chung D

    update_date: 2019-07-01 00:00:00

  • Analysis of phase II methodologies for single-arm clinical trials with multiple endpoints in rare cancers: An example in Ewing's sarcoma.

    abstract::Trials run in either rare diseases, such as rare cancers, or rare sub-populations of common diseases are challenging in terms of identifying, recruiting and treating sufficient patients in a sensible period. Treatments for rare diseases are often designed for other disease areas and then later proposed as possible tre...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216662070

    authors: Dutton P,Love SB,Billingham L,Hassan AB

    update_date: 2018-05-01 00:00:00

  • Measurement error, time lag, unmeasured confounding: Considerations for longitudinal estimation of the effect of a mediator in randomised clinical trials.

    abstract::Clinical trials are expensive and time-consuming and so should also be used to study how treatments work, allowing for the evaluation of theoretical treatment models and refinement and improvement of treatments. These treatment processes can be studied using mediation analysis. Randomised treatment makes some of the a...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216666111

    authors: Goldsmith KA,Chalder T,White PD,Sharpe M,Pickles A

    update_date: 2018-06-01 00:00:00

  • Estimating marginal and incremental effects in the analysis of medical expenditure panel data using marginalized two-part random-effects generalized Gamma models: Evidence from China healthcare cost data.

    abstract::Conditional two-part random-effects models have been proposed for the analysis of healthcare cost panel data that contain both zero costs from the non-users of healthcare facilities and positive costs from the users. These models have been extended to accommodate more flexible data structures when using the generalize...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217690770

    authors: Zhang B,Liu W,Hu Y

    update_date: 2018-10-01 00:00:00

  • Power and sample size for multivariate logistic modeling of unmatched case-control studies.

    abstract::Sample size calculations are needed to design and assess the feasibility of case-control studies. Although such calculations are readily available for simple case-control designs and univariate analyses, there is limited theory and software for multivariate unconditional logistic analysis of case-control data. Here we...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217737157

    authors: Gail MH,Haneuse S

    update_date: 2019-03-01 00:00:00

  • Statistical modelling of measles and influenza outbreaks.

    abstract::This paper reviews the application of statistical models to outbreaks of two common respiratory viral diseases, measles and influenza. For each disease, we look first at its epidemiological characteristics and assess the extent to which these either aid or hinder modelling. We then turn to the models that have been de...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article, Review

    doi:10.1177/096228029300200104

    authors: Cliff AD,Haggett P

    update_date: 1993-01-01 00:00:00

  • The application of multidimensional scaling methods to epidemiological data.

    abstract::This paper illustrates the use of multidimensional scaling methods (MDS) to examine space-time patterns in epidemic data. The paper begins by outlining the principles of MDS. The model is then formally specified and illustrated by application to two data sets. The first is partly a tutorial example. It uses monthly re...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/096228029500400202

    authors: Cliff AD,Haggett P,Smallman-Raynor MR,Stroup DF,Williamson GD

    update_date: 1995-06-01 00:00:00

  • Propensity scores: from naive enthusiasm to intuitive understanding.

    abstract::Estimation of the effect of a binary exposure on an outcome in the presence of confounding is often carried out via outcome regression modelling. An alternative approach is to use propensity score methodology. The propensity score is the conditional probability of receiving the exposure given the observed covariates a...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280210394483

    authors: Williamson E,Morley R,Lucas A,Carpenter J

    update_date: 2012-06-01 00:00:00

  • Bayesian nonparametric inference for the three-class Youden index and its associated optimal cutoff points.

    abstract::The three-class Youden index serves both as a measure of medical test accuracy and a criterion to choose the optimal pair of cutoff values for classifying subjects into three ordinal disease categories (e.g. no disease, mild disease, advanced disease). We present a Bayesian nonparametric approach for estimating the th...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217742538

    authors: Carvalho VI,Branscum AJ

    update_date: 2018-03-01 00:00:00

  • Copas-like selection model to correct publication bias in systematic review of diagnostic test studies.

    abstract::The accuracy of a diagnostic test, which is often quantified by a pair of measures such as sensitivity and specificity, is critical for medical decision making. Separate studies of an investigational diagnostic test can be combined through meta-analysis; however, such an analysis can be threatened by publication bias....

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218791602

    authors: Piao J,Liu Y,Chen Y,Ning J

    update_date: 2019-10-01 00:00:00

  • The asymptotic maximal procedure for subject randomization in clinical trials.

    abstract::The maximal procedure is a restricted randomization method that maximizes the number of feasible allocation sequences under the constraints of the maximum tolerated imbalance and the allocation sequence length. It assigns an equal probability to all feasible sequences. However, its implementation is not easy due to th...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216677107

    authors: Zhao W,Berger VW,Yu Z

    update_date: 2018-07-01 00:00:00

  • Multilevel growth curve models that incorporate a random coefficient model for the level 1 variance function.

    abstract::Aim To present a flexible model for repeated measures longitudinal growth data within individuals that allows trends over time to incorporate individual-specific random effects. These may reflect the timing of growth events and characterise within-individual variability which can be modelled as a function of age. Subj...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217706728

    authors: Goldstein H,Leckie G,Charlton C,Tilling K,Browne WJ

    update_date: 2018-11-01 00:00:00

  • A composite likelihood approach to predict the sex of the baby.

    abstract::Couples with diseases associated with the sexual chromosomes, as well as families in countries where the desire for a male is extreme, are interested in influencing the sex of the baby. We propose an original composite likelihood approach to analyse the relation between sex of the newborn and timing of the intercourse...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280217702415

    authors: Tiberi S,Scarpa B,Sartori N

    update_date: 2018-11-01 00:00:00

  • Sample size for binary logistic prediction models: Beyond events per variable criteria.

    abstract::Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictor...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218784726

    authors: van Smeden M,Moons KG,de Groot JA,Collins GS,Altman DG,Eijkemans MJ,Reitsma JB

    update_date: 2019-08-01 00:00:00

  • Small sample sizes: A big data problem in high-dimensional data analysis.

    abstract::In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (mu...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280220970228

    authors: Konietschke F,Schwab K,Pauly M

    update_date: 2020-11-24 00:00:00

  • A monotone data augmentation algorithm for longitudinal data analysis via multivariate skew-t, skew-normal or t distributions.

    abstract::The mixed effects model for repeated measures has been widely used for the analysis of longitudinal clinical data collected at a number of fixed time points. We propose a robust extension of the mixed effects model for repeated measures for skewed and heavy-tailed data on basis of the multivariate skew-t distribution,...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280219865579

    authors: Tang Y

    update_date: 2020-06-01 00:00:00

  • Bayesian latent structure modeling of walking behavior in a physical activity intervention.

    abstract::The analysis of walking behavior in a physical activity intervention is considered. A Bayesian latent structure modeling approach is proposed whereby the ability and willingness of participants is modeled via latent effects. The dropout process is jointly modeled via a linked survival model. Computational issues are a...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280214529932

    authors: Lawson AB,Ellerbe C,Carroll R,Alia K,Coulon S,Wilson DK,VanHorn ML,George SM

    update_date: 2016-12-01 00:00:00

  • A frequentist approach to estimating the force of infection for a respiratory disease using repeated measurement data from a birth cohort.

    abstract::This article aims to develop a probability-based model involving the use of direct likelihood formulation and generalised linear modelling (GLM) approaches useful in estimating important disease parameters from longitudinal or repeated measurement data. The current application is based on infection with respiratory sy...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280210385749

    authors: Mwambi H,Ramroop S,White LJ,Okiro E,Nokes DJ,Shkedy Z,Molenberghs G

    update_date: 2011-10-01 00:00:00

  • Everything all right in method comparison studies?

    abstract::Researchers and clinicians often need to know whether a new method of measurement is equivalent to an established one that is already in use. For this problem, the estimation of limits of agreement advocated by Bland and Altman is a widely used solution. However, this approach ignores two vital issues in method compar...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280210379365

    authors: Alanen E

    update_date: 2012-08-01 00:00:00

  • Stratified and randomized play-the-winner rule.

    abstract::In this paper, a new allocation rule for treatment assignments in sequential clinical trials is proposed. The stratified and randomized play-the-winner rule (SRPWR) is an extension of the randomized play-the-winner rule to more than two treatments. It is applicable to cases where the probabilities of success of a trea...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280207081606

    authors: Liang Y,Carriere KC

    update_date: 2008-12-01 00:00:00

  • Performance of informative priors skeptical of large treatment effects in clinical trials: A simulation study.

    abstract::One of the main advantages of Bayesian analyses of clinical trials is their ability to formally incorporate skepticism about large treatment effects through the use of informative priors. We conducted a simulation study to assess the performance of informative normal, Student- t, and beta distributions in estimating r...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280215620828

    authors: Pedroza C,Han W,Truong VTT,Green C,Tyson JE

    update_date: 2018-01-01 00:00:00

  • Efficient Monte Carlo evaluation of resampling-based hypothesis tests with applications to genetic epidemiology.

    abstract::Monte Carlo evaluation of resampling-based tests is often conducted in statistical analysis. However, this procedure is generally computationally intensive. The pooling resampling-based method has been developed to reduce the computational burden but the validity of the method has not been studied before. In this arti...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280216661876

    authors: Fung WK,Yu K,Yang Y,Zhou JY

    update_date: 2018-05-01 00:00:00

  • Stochastic models of sequence evolution including insertion-deletion events.

    abstract::Comparison of sequences that have descended from a common ancestor based on an explicit stochastic model of substitutions, insertions and deletions has risen to prominence in the last decade. Making statements about the positions of insertions-deletions (abbr. indels) is central in sequence and genome analysis and is ...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280208099500

    authors: Miklós I,Novák A,Satija R,Lyngsø R,Hein J

    update_date: 2009-10-01 00:00:00

  • Accurate quantification of uncertainty in epidemic parameter estimates and predictions using stochastic compartmental models.

    abstract::Stochastic transmission dynamic models are needed to quantify the uncertainty in estimates and predictions during outbreaks of infectious diseases. We previously developed a calibration method for stochastic epidemic compartmental models, called Multiple Shooting for Stochastic Systems (MSS), and demonstrated its comp...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218805780

    authors: Zimmer C,Leuba SI,Cohen T,Yaesoubi R

    update_date: 2019-12-01 00:00:00

  • A quick and accurate method for the estimation of covariate effects based on empirical Bayes estimates in mixed-effects modeling: Correction of bias due to shrinkage.

    abstract::Nonlinear mixed-effects modeling is a popular approach to describe the temporal trajectory of repeated measurements of clinical endpoints collected over time in clinical trials, to distinguish the within-subject and the between-subject variabilities, and to investigate clinically important risk factors (covariates) th...

    journal_title:Statistical methods in medical research

    pub_type: Journal Article

    doi:10.1177/0962280218812595

    authors: Yuan M,Xu XS,Yang Y,Xu J,Huang X,Tao F,Zhao L,Zhang L,Pinheiro J

    update_date: 2019-12-01 00:00:00