Partially supervised learning using an EM-boosting algorithm.

Abstract:

Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific in the sense that a tissue that shows itself to have malignant cells is certainly cancer, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnosis certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass-spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
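The abstract describes EM-Boost only at a high level. The Python sketch below is one possible reading of it under stated assumptions, not the authors' algorithm: it uses discrete AdaBoost with decision stumps, assumes the biopsy sensitivity is known (the hypothetical `sensitivity` argument), and codes labels as +1 (biopsy-confirmed cancer, trusted) and -1 (noncancer label, possibly a missed cancer). Each outer iteration alternates an E-step, which turns the current boosting score into the probability that a -1-labeled sample is truly +1, with an M-step that reruns boosting on these soft labels. All function names are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, y, w):
    """Exhaustive search for the stump (feature, threshold, polarity, weighted error)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] >= thr else -pol) != yi)
                if best is None or err < best[3]:
                    best = (j, thr, pol, err)
    return best


def stump_predict(stump, x):
    j, thr, pol, _ = stump
    return pol if x[j] >= thr else -pol


def score(model, x):
    """Boosted score F(x) = sum of alpha_t * h_t(x)."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)


def adaboost(X, y, w0, rounds):
    """Discrete AdaBoost with decision stumps, started from sample weights w0."""
    total = sum(w0)
    w = [wi / total for wi in w0]
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def em_boost(X, labels, sensitivity, rounds=20, em_iters=10):
    """labels: +1 = confirmed positive (trusted), -1 = possibly a missed positive."""
    # p[i] = current probability that sample i is truly +1; the first pass trusts
    # every label, so it is ordinary boosting on the observed labels.
    p = [1.0 if lab == 1 else 0.0 for lab in labels]
    model = []
    for _ in range(em_iters):
        # M-step: a soft label p becomes a +1 copy (weight p) and a -1 copy (weight 1 - p).
        data = []
        for xi, pi in zip(X, p):
            if pi > 0:
                data.append((xi, 1, pi))
            if pi < 1:
                data.append((xi, -1, 1 - pi))
        Xe, ye, we = (list(col) for col in zip(*data))
        model = adaboost(Xe, ye, we, rounds)
        # E-step: the exponential-loss minimizer is F = 0.5*log(q/(1-q)), so
        # q = sigmoid(2F) estimates P(true +1 | x). Only -1 labels are uncertain
        # (100% specificity), and a true +1 is recorded as -1 w.p. 1 - sensitivity.
        p = []
        for xi, lab in zip(X, labels):
            if lab == 1:
                p.append(1.0)
                continue
            q = sigmoid(2 * score(model, xi))
            missed = q * (1 - sensitivity)
            denom = missed + (1 - q)
            p.append(missed / denom if denom > 0 else 0.0)
    return model, p


# Toy data: one feature; the last sample is a positive recorded as -1.
X = [[0.0], [0.0], [1.0], [1.0]]
labels = [-1, -1, 1, -1]
model, p = em_boost(X, labels, sensitivity=0.5)
print([round(v, 3) for v in p])
```

Splitting each sample into two weighted copies works because the expected exponential loss of a soft-labeled sample, p·exp(-F) + (1 - p)·exp(F), equals the weighted AdaBoost loss of its +1 and -1 copies. Because the first outer pass trusts every label, it reproduces ordinary boosting, so later passes isolate what the missing-data step adds.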

journal_name

Biometrics

journal_title

Biometrics

authors

Yasui Y,Pepe M,Hsu L,Adam BL,Feng Z

doi

10.1111/j.0006-341X.2004.00156.x

subject

Has Abstract

pub_date

2004-03-01 00:00:00

pages

199-206

issue

1

eissn

1541-0420

issn

0006-341X

pii

BIOM156

journal_volume

60

pub_type

Journal Article

  • Selecting the smoothing parameter for estimation of slowly changing evoked potential signals.

    abstract::Brain evoked potential (EP) data consist of a true response ("signal") and random background activity ("noise"), which are observed over repeated stimulus presentations ("trials"). A signal that changes slowly from trial to trial can be estimated by smoothing across trials and over time within trials. We present a met...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Raz J,Turetsky B,Fein G

    update_date:1989-09-01 00:00:00

  • Sequential monitoring for comparison of changes in a response variable in clinical studies.

    abstract::The spending function approach proposed by Lan and DeMets (1983, Biometrika 70, 659-663) for sequential monitoring of clinical trials is applied to situations where comparison of changes in a continuous response variable between two groups is the primary concern. Death, loss to follow-up, and missed visits could cause...

    journal_title:Biometrics

    pub_type: Clinical Trial,Journal Article

    doi:

    authors: Wu MC,Lan KK

    update_date:1992-09-01 00:00:00

  • A method for estimating incidence rates of onchocerciasis from skin-snip biopsies with consideration of false negatives.

    abstract::The aim of this study is to estimate incidence rates of onchocerciasis from skin-snip biopsies, based on incomplete data obtained in field surveys, with consideration of false negatives. The method of maximum likelihood is employed and the effect of false negatives on the incidence rates is discussed. ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Yanagawa T,Kasagi F,Yoshimura T

    update_date:1984-06-01 00:00:00

  • Capture-recapture estimation using finite mixtures of arbitrary dimension.

    abstract::Reversible jump Markov chain Monte Carlo (RJMCMC) methods are used to fit Bayesian capture-recapture models incorporating heterogeneity in individuals and samples. Heterogeneity in capture probabilities comes from finite mixtures and/or fixed sample effects allowing for interactions. Estimation by RJMCMC allows automa...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01289.x

    authors: Arnold R,Hayakawa Y,Yip P

    update_date:2010-06-01 00:00:00

  • Bayesian inference for prevalence in longitudinal two-phase studies.

    abstract::We consider Bayesian inference and model selection for prevalence estimation using a longitudinal two-phase design in which subjects initially receive a low-cost screening test followed by an expensive diagnostic test conducted on several occasions. The change in the subject's diagnostic probability over time is descr...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.01145.x

    authors: Erkanli A,Soyer R,Costello EJ

    update_date:1999-12-01 00:00:00

  • A mixture model for quantum dot images of kinesin motor assays.

    abstract::We introduce a nearly automatic procedure to locate and count the quantum dots in images of kinesin motor assays. Our procedure employs an approximate likelihood estimator based on a two-component mixture model for the image data; the first component has a normal distribution, and the other component is distributed as...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01467.x

    authors: Hughes J,Fricks J

    update_date:2011-06-01 00:00:00

  • A study of deleterious gene structure in plants using Markov chain Monte Carlo.

    abstract::The characteristics of deleterious genes have been of great interest in both theory and practice in genetics. Because of the complex genetic mechanism of these deleterious genes, most current studies try to estimate the overall magnitude of mortality effects on a population, which is characterized classically by the n...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00376.x

    authors: Lee JK,Lascoux M,Newton MA,Nordheim EV

    update_date:1999-06-01 00:00:00

  • An adaptive weighted log-rank test with application to cancer prevention and screening trials.

    abstract::A class of adaptive weighted log-rank statistics is described where the vector of weights is chosen in a data-dependent way from a family of "smooth" weight vectors. A parametric family of weight vectors is identified which includes most shapes of weighting vectors that will be near optimal in many cancer prevention a...

    journal_title:Biometrics

    pub_type: Clinical Trial,Journal Article,Randomized Controlled Trial

    doi:

    authors: Self SG

    update_date:1991-09-01 00:00:00

  • Flexible variable selection for recovering sparsity in nonadditive nonparametric models.

    abstract::Variable selection for recovering sparsity in nonadditive and nonparametric models with high-dimensional variables has been challenging. This problem becomes even more difficult due to complications in modeling unknown interaction terms among high-dimensional variables. There is currently no variable selection method ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12518

    authors: Fang Z,Kim I,Schaumont P

    update_date:2016-12-01 00:00:00

  • Matched case-control data analysis with selection bias.

    abstract::Case-control studies offer a rapid and efficient way to evaluate hypotheses. On the other hand, proper selection of the controls is challenging, and the potential for selection bias is a major weakness. Valid inferences about parameters of interest cannot be drawn if selection bias exists. Furthermore, the selection b...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.01106.x

    authors: Lin IF,Paik MC

    update_date:2001-12-01 00:00:00

  • Improved dynamic predictions from joint models of longitudinal and survival data with time-varying effects using P-splines.

    abstract::In the field of cardio-thoracic surgery, valve function is monitored over time after surgery. The motivation for our research comes from a study which includes patients who received a human tissue valve in the aortic position. These patients are followed prospectively over time by standardized echocardiographic assess...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12814

    authors: Andrinopoulou ER,Eilers PHC,Takkenberg JJM,Rizopoulos D

    update_date:2018-06-01 00:00:00

  • Comparison of different methods for decision-making in bioequivalence assessment.

    abstract::If the regulatory requirements are symmetrical, the use of symmetrical confidence intervals as a decision rule for bioequivalence assessment leads, as shown by simulations, to better level properties and an inferior power compared to a rule based on shortest confidence intervals. A choice between these two approaches ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Mandallaz D,Mau J

    update_date:1981-06-01 00:00:00

  • The effect of screening on some pretest-posttest test variances.

    abstract::The clinical trial design in which the endpoint is measured both at baseline and at the end of the study is used in a variety of situations. For two-group designs, tests such as the t test or analysis of covariance are commonly used to evaluate treatment efficacy. Often such pretest-posttest trials restrict participati...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Follmann DA

    update_date:1991-06-01 00:00:00

  • A semiparametric estimate of treatment effects with censored data.

    abstract::A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect beta(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a m...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2001.00875.x

    authors: Xu R,Harrington DP

    update_date:2001-09-01 00:00:00

  • Random-effects models for longitudinal data using Gibbs sampling.

    abstract::Analysis of longitudinal studies is often complicated through differences amongst individuals in the number and spacing of observations. Laird and Ware (1982, Biometrics 38, 963-974) proposed a linear random-effects model to deal with this problem. We propose a generalisation of this model to accommodate multiple rand...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Gilks WR,Wang CC,Yvonnet B,Coursaget P

    update_date:1993-06-01 00:00:00

  • Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification.

    abstract::In the context of group testing screening, McMahan, Tebbs, and Bilder (2012, Biometrics 68, 287-296) proposed a two-stage procedure in a heterogeneous population in the presence of misclassification. In earlier work published in Biometrics, Kim, Hudgens, Dreyfuss, Westreich, and Pilcher (2007, Biometrics 63, 1152-1162)...

    journal_title:Biometrics

    pub_type: Comment,Journal Article

    doi:10.1111/biom.12385

    authors: Malinovsky Y,Albert PS,Roy A

    update_date:2016-03-01 00:00:00

  • FPCA-based method to select optimal sampling schedules that capture between-subject variability in longitudinal studies.

    abstract::A critical component of longitudinal study design involves determining the sampling schedule. Criteria for optimal design often focus on accurate estimation of the mean profile, although capturing the between-subject variance of the longitudinal process is also important since variance patterns may be associated with ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12714

    authors: Wu M,Diez-Roux A,Raghunathan TE,Sánchez BN

    update_date:2018-03-01 00:00:00

  • Applications of multiple imputation to the analysis of censored regression data.

    abstract::The first part of the article reviews the Data Augmentation algorithm and presents two approximations to the Data Augmentation algorithm for the analysis of missing-data problems: the Poor Man's Data Augmentation algorithm and the Asymptotic Data Augmentation algorithm. These two algorithms are then implemented in the...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Wei GC,Tanner MA

    update_date:1991-12-01 00:00:00

  • Multi-subgroup gene screening using semi-parametric hierarchical mixture models and the optimal discovery procedure: Application to a randomized clinical trial in multiple myeloma.

    abstract::This article proposes an efficient approach to screening genes associated with a phenotypic variable of interest in genomic studies with subgroups. In order to capture and detect various association profiles across subgroups, we flexibly estimate the underlying effect size distribution across subgroups using a semi-pa...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12716

    authors: Matsui S,Noma H,Qu P,Sakai Y,Matsui K,Heuck C,Crowley J

    update_date:2018-03-01 00:00:00

  • On identifiability in capture-recapture models.

    abstract::We study the issue of identifiability of mixture models in the context of capture-recapture abundance estimation for closed populations. Such models are used to take account of individual heterogeneity in capture probabilities, but their validity was recently questioned by Link (2003, Biometrics 59, 1123-1130) on the ...

    journal_title:Biometrics

    pub_type: Comment,Journal Article

    doi:10.1111/j.1541-0420.2006.00637_1.x

    authors: Holzmann H,Munk A,Zucchini W

    update_date:2006-09-01 00:00:00

  • Nonparametric estimation of relative mortality from nested case-control studies.

    abstract::Andersen et al. (1985, Biometrics 41, 921-932) gave an estimator of the cumulative relative mortality comparing rates of death in an epidemiologic cohort to an external population as a function of time when covariate information is available on all cohort members. We present an analogous estimator when covariate infor...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Borgan O,Langholz B

    update_date:1993-06-01 00:00:00

  • Capitalizing on opportunistic data for monitoring relative abundances of species.

    abstract::With the internet, a massive amount of information on species abundance can be collected by citizen science programs. However, these data are often difficult to use directly in statistical inference, as their collection is generally opportunistic, and the distribution of the sampling effort is often not known. In this...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12431

    authors: Giraud C,Calenge C,Coron C,Julliard R

    update_date:2016-06-01 00:00:00

  • Statistical modelling of the AIDS epidemic for forecasting health care needs.

    abstract::The objective of this paper is to develop statistical methods for estimating current and future numbers of individuals in different stages of the natural history of the human immunodeficiency (AIDS) virus infection and to evaluate the impact of therapeutic advances on these numbers. The approach is to extend the metho...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Brookmeyer R,Liao JG

    update_date:1990-12-01 00:00:00

  • Alternative hypotheses for the effects of drugs in small-scale clinical studies.

    abstract::New drugs that will be investigated in the future are expected to deal with chronic diseases, where the number of patients available for controlled clinical trials will be small and where the long-term sequelae that it is hoped will be ameliorated take a long time to occur. Thus, it would be useful to construct powerf...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Salsburg D

    update_date:1986-09-01 00:00:00

  • Aberrant crypt foci and semiparametric modeling of correlated binary data.

    abstract::Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines....

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2007.00892.x

    authors: Apanasovich TV,Ruppert D,Lupton JR,Popovic N,Turner ND,Chapkin RS,Carroll RJ

    update_date:2008-06-01 00:00:00

  • On Bayesian methods for bioequivalence.

    abstract::Bayesian methods are presented for assessing bioequivalence for studies in which a new formulation and a standard are administered simultaneously, and for Latin square designs which compare two or more new formulations to a standard. Two examples illustrate the application of the methods. ...

    journal_title:Biometrics

    pub_type: Clinical Trial,Journal Article

    doi:

    authors: Selwyn MR,Hall NR

    update_date:1984-12-01 00:00:00

  • Procedures for comparing samples with multiple endpoints.

    abstract::Five procedures are considered for the comparison of two or more multivariate samples. These procedures include a newly proposed nonparametric rank-sum test and a generalized least squares test. Also considered are the following tests: ordinary least squares, Hotelling's T2, and a Bonferroni per-experiment error-rate ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: O'Brien PC

    update_date:1984-12-01 00:00:00

  • Multivariate survival analysis using piecewise gamma frailty.

    abstract::In this note we propose a frailty model called piecewise gamma frailty for correlated survival data with random effects having a nested structure. In frailty models, a dependence function defined as a hazard ratio of one member given the failure time of another member in a unit is determined by the distributional assu...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Paik MC,Tsai WY,Ottman R

    update_date:1994-12-01 00:00:00

  • Ultra high-dimensional semiparametric longitudinal data analysis.

    abstract::As ultra high-dimensional longitudinal data are becoming ever more apparent in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13348

    authors: Green B,Lian H,Yu Y,Zu T

    update_date:2020-08-04 00:00:00

  • Some scale estimators and lack-of-fit tests for the censored two-sample accelerated life model.

    abstract::Some new scale estimators for the censored two-sample accelerated life model are introduced. They are zeros of some integrated weighted difference between the two cumulative hazard estimators. These estimators are asymptotically normal. The weight is chosen to result in estimators whose asymptotic variances do not inv...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Yang S

    update_date:1998-09-01 00:00:00