Abstract:
Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific in the sense that a tissue that shows itself to have malignant cells is certainly cancer, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnosis certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass-spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
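The idea in the abstract, treating the true class of samples carrying the error-prone label as missing data and alternating an expectation step with a loss-minimizing boosting step, can be illustrated with a minimal sketch. This is not the authors' EM-Boost algorithm, whose details the abstract does not give. It assumes a logistic loss, regression stumps as weak learners, and a known biopsy sensitivity. All function names and parameters (`em_boost_sketch`, `sensitivity`, `n_rounds`, `lr`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, r):
    """Least-squares regression stump fitted to residuals r."""
    best = None
    for j in range(X.shape[1]):
        # Dropping the largest value keeps the right-hand side non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]

def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def em_boost_sketch(X, y_obs, sensitivity=0.8, n_rounds=50, lr=0.5):
    """EM-style boosting for one-sided label noise (illustrative only).

    y_obs == 1 is trusted (the reference test is assumed 100% specific);
    y_obs == 0 may hide true positives missed by the reference test.
    `sensitivity` is an assumed, known P(y_obs = 1 | y = 1).
    """
    F = np.zeros(len(y_obs))  # additive score on the training samples
    model = []
    for _ in range(n_rounds):
        p = sigmoid(F)  # current estimate of P(y = 1 | x)
        # E-step: posterior probability that each sample is truly positive.
        miss = p * (1.0 - sensitivity)
        q = np.where(y_obs == 1, 1.0, miss / (miss + (1.0 - p)))
        # M-like step: one gradient-boosting step on the expected logistic
        # loss, whose negative gradient with respect to F is q - p.
        j, t, lv, rv = fit_stump(X, q - p)
        stump = (j, t, lr * lv, lr * rv)  # store the shrunken stump
        model.append(stump)
        F += stump_predict(stump, X)
    return model

def predict_score(model, X):
    """Additive score; sigmoid(score) estimates P(y = 1 | x)."""
    return sum(stump_predict(s, X) for s in model)
```

Calling `em_boost_sketch(X, y_obs)` with a feature matrix and observed labels returns a list of stumps, and `predict_score` turns that list into scores for new samples. In practice the sensitivity would have to be estimated or varied in a sensitivity analysis; fixing it here is a simplification of this sketch.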
journal_name: Biometrics
journal_title: Biometrics
authors: Yasui Y, Pepe M, Hsu L, Adam BL, Feng Z
doi: 10.1111/j.0006-341X.2004.00156.x
subject: Has Abstract
pub_date: 2004-03-01 00:00:00
pages: 199-206
issue: 1
eissn: 0006-341X
issn: 1541-0420
pii: BIOM156
journal_volume: 60
pub_type: journal article
Related literature
BIOMETRICS literature collection
abstract::Brain evoked potential (EP) data consist of a true response ("signal") and random background activity ("noise"), which are observed over repeated stimulus presentations ("trials"). A signal that changes slowly from trial to trial can be estimated by smoothing across trials and over time within trials. We present a met...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1989-09-01 00:00:00
abstract::The spending function approach proposed by Lan and DeMets (1983, Biometrika 70, 659-663) for sequential monitoring of clinical trials is applied to situations where comparison of changes in a continuous response variable between two groups is the primary concern. Death, loss to follow-up, and missed visits could cause...
journal_title:Biometrics
pub_type: clinical trial, journal article
doi:
updated: 1992-09-01 00:00:00
abstract::The aim of this study is to estimate incidence rates of onchocerciasis from skin-snip biopsies, based on incomplete data obtained in field surveys, with consideration of false negatives. The method of maximum likelihood is employed and the effect of false negatives on the incidence rates is discussed. ...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1984-06-01 00:00:00
abstract::Reversible jump Markov chain Monte Carlo (RJMCMC) methods are used to fit Bayesian capture-recapture models incorporating heterogeneity in individuals and samples. Heterogeneity in capture probabilities comes from finite mixtures and/or fixed sample effects allowing for interactions. Estimation by RJMCMC allows automa...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.1541-0420.2009.01289.x
updated: 2010-06-01 00:00:00
abstract::We consider Bayesian inference and model selection for prevalence estimation using a longitudinal two-phase design in which subjects initially receive a low-cost screening test followed by an expensive diagnostic test conducted on several occasions. The change in the subject's diagnostic probability over time is descr...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.0006-341x.1999.01145.x
updated: 1999-12-01 00:00:00
abstract::We introduce a nearly automatic procedure to locate and count the quantum dots in images of kinesin motor assays. Our procedure employs an approximate likelihood estimator based on a two-component mixture model for the image data; the first component has a normal distribution, and the other component is distributed as...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.1541-0420.2010.01467.x
updated: 2011-06-01 00:00:00
abstract::The characteristics of deleterious genes have been of great interest in both theory and practice in genetics. Because of the complex genetic mechanism of these deleterious genes, most current studies try to estimate the overall magnitude of mortality effects on a population, which is characterized classically by the n...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.0006-341x.1999.00376.x
updated: 1999-06-01 00:00:00
abstract::A class of adaptive weighted log-rank statistics is described where the vector of weights is chosen in a data-dependent way from a family of "smooth" weight vectors. A parametric family of weight vectors is identified which includes most shapes of weighting vectors that will be near optimal in many cancer prevention a...
journal_title:Biometrics
pub_type: clinical trial, journal article, randomized controlled trial
doi:
updated: 1991-09-01 00:00:00
abstract::Variable selection for recovering sparsity in nonadditive and nonparametric models with high-dimensional variables has been challenging. This problem becomes even more difficult due to complications in modeling unknown interaction terms among high-dimensional variables. There is currently no variable selection method ...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.12518
updated: 2016-12-01 00:00:00
abstract::Case-control studies offer a rapid and efficient way to evaluate hypotheses. On the other hand, proper selection of the controls is challenging, and the potential for selection bias is a major weakness. Valid inferences about parameters of interest cannot be drawn if selection bias exists. Furthermore, the selection b...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.0006-341x.2001.01106.x
updated: 2001-12-01 00:00:00
abstract::In the field of cardio-thoracic surgery, valve function is monitored over time after surgery. The motivation for our research comes from a study which includes patients who received a human tissue valve in the aortic position. These patients are followed prospectively over time by standardized echocardiographic assess...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.12814
updated: 2018-06-01 00:00:00
abstract::If the regulatory requirements are symmetrical, the use of symmetrical confidence intervals as a decision rule for bioequivalence assessment leads, as shown by simulations, to better level properties and an inferior power compared to a rule based on shortest confidence intervals. A choice between these two approaches ...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1981-06-01 00:00:00
abstract::The clinical trial design in which the endpoint is measured both at baseline and at the end of the study is used in a variety of situations. For two-group designs, tests such as the t test or analysis of covariance are commonly used to evaluate treatment efficacy. Often such pretest-posttest trials restrict participati...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1991-06-01 00:00:00
abstract::A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect beta(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a m...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.0006-341x.2001.00875.x
updated: 2001-09-01 00:00:00
abstract::Analysis of longitudinal studies is often complicated through differences amongst individuals in the number and spacing of observations. Laird and Ware (1982, Biometrics 38, 963-974) proposed a linear random-effects model to deal with this problem. We propose a generalisation of this model to accommodate multiple rand...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1993-06-01 00:00:00
abstract::In the context of group testing screening, McMahan, Tebbs, and Bilder (2012, Biometrics 68, 287-296) proposed a two-stage procedure in a heterogenous population in the presence of misclassification. In earlier work published in Biometrics, Kim, Hudgens, Dreyfuss, Westreich, and Pilcher (2007, Biometrics 63, 1152-1162)...
journal_title:Biometrics
pub_type: comment, journal article
doi:10.1111/biom.12385
updated: 2016-03-01 00:00:00
abstract::A critical component of longitudinal study design involves determining the sampling schedule. Criteria for optimal design often focus on accurate estimation of the mean profile, although capturing the between-subject variance of the longitudinal process is also important since variance patterns may be associated with ...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.12714
updated: 2018-03-01 00:00:00
abstract::The first part of the article reviews the Data Augmentation algorithm and presents two approximations to the Data Augmentation algorithm for the analysis of missing-data problems: the Poor Man's Data Augmentation algorithm and the Asymptotic Data Augmentation algorithm. These two algorithms are then implemented in the...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1991-12-01 00:00:00
abstract::This article proposes an efficient approach to screening genes associated with a phenotypic variable of interest in genomic studies with subgroups. In order to capture and detect various association profiles across subgroups, we flexibly estimate the underlying effect size distribution across subgroups using a semi-pa...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.12716
updated: 2018-03-01 00:00:00
abstract::We study the issue of identifiability of mixture models in the context of capture-recapture abundance estimation for closed populations. Such models are used to take account of individual heterogeneity in capture probabilities, but their validity was recently questioned by Link (2003, Biometrics 59, 1123-1130) on the ...
journal_title:Biometrics
pub_type: comment, journal article
doi:10.1111/j.1541-0420.2006.00637_1.x
updated: 2006-09-01 00:00:00
abstract::Andersen et al. (1985, Biometrics 41, 921-932) gave an estimator of the cumulative relative mortality comparing rates of death in an epidemiologic cohort to an external population as a function of time when covariate information is available on all cohort members. We present an analogous estimator when covariate infor...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1993-06-01 00:00:00
abstract::With the internet, a massive amount of information on species abundance can be collected by citizen science programs. However, these data are often difficult to use directly in statistical inference, as their collection is generally opportunistic, and the distribution of the sampling effort is often not known. In this...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.12431
updated: 2016-06-01 00:00:00
abstract::The objective of this paper is to develop statistical methods for estimating current and future numbers of individuals in different stages of the natural history of the human immunodeficiency (AIDS) virus infection and to evaluate the impact of therapeutic advances on these numbers. The approach is to extend the metho...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1990-12-01 00:00:00
abstract::New drugs that will be investigated in the future are expected to deal with chronic diseases, where the number of patients available for controlled clinical trials will be small and where the long-term sequelae that it is hoped will be ameliorated take a long time to occur. Thus, it would be useful to construct powerf...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1986-09-01 00:00:00
abstract::Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines....
journal_title:Biometrics
pub_type: journal article
doi:10.1111/j.1541-0420.2007.00892.x
updated: 2008-06-01 00:00:00
abstract::Bayesian methods are presented for assessing bioequivalence for studies in which a new formulation and a standard are administered simultaneously, and for Latin square designs which compare two or more new formulations to a standard. Two examples illustrate the application of the methods. ...
journal_title:Biometrics
pub_type: clinical trial, journal article
doi:
updated: 1984-12-01 00:00:00
abstract::Five procedures are considered for the comparison of two or more multivariate samples. These procedures include a newly proposed nonparametric rank-sum test and a generalized least squares test. Also considered are the following tests: ordinary least squares, Hotelling's T2, and a Bonferroni per-experiment error-rate ...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1984-12-01 00:00:00
abstract::In this note we propose a frailty model called piecewise gamma frailty for correlated survival data with random effects having a nested structure. In frailty models, a dependence function defined as a hazard ratio of one member given the failure time of another member in a unit is determined by the distributional assu...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1994-12-01 00:00:00
abstract::As ultra high-dimensional longitudinal data are becoming ever more apparent in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as ...
journal_title:Biometrics
pub_type: journal article
doi:10.1111/biom.13348
updated: 2020-08-04 00:00:00
abstract::Some new scale estimators for the censored two-sample accelerated life model are introduced. They are zeros of some integrated weighted difference between the two cumulative hazard estimators. These estimators are asymptotically normal. The weight is chosen to result in estimators whose asymptotic variances do not inv...
journal_title:Biometrics
pub_type: journal article
doi:
updated: 1998-09-01 00:00:00