Abstract:
Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society B 63, 411-423), we propose the weighted gap and the difference of difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data using the weighted within-clusters sum of errors: a measure of the within-clusters homogeneity. In addition, we propose a "multilayer" clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting the nested cluster structure of the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. Simulation studies and real data are investigated and compared among these proposed methods as well as with the original gap method.
journal_name: Biometrics
journal_title: Biometrics
authors: Yan M, Ye K
doi: 10.1111/j.1541-0420.2007.00784.x
subject: Has Abstract
pub_date: 2007-12-01 00:00:00
pages: 1031-7
issue: 4
eissn: 0006-341X
issn: 1541-0420
pii: BIOM784
journal_volume: 63
pub_type: Journal Article
Related literature
BIOMETRICS literature collection
abstract::The distance between homologous DNA sequences of two species is proposed to be -1/4 ln[det(P)], where P is the conditional probability matrix specifying the proportions of the various nucleotides in the second sequence, corresponding to each of the four nucleotides in the first sequence. A probability model is describ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1987-06-01 00:00:00
abstract::Increasingly, genetic studies of tumors of the same histologic diagnosis are elucidating subtypes that are distinct with respect to clinical endpoints such as response to treatment and survival. This raises concerns about the efficiency of using the simple log-rank test for analysis of treatment effect on survival in ...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.0006-341x.2002.00232.x
Update date: 2002-03-01 00:00:00
abstract::Many problems that appear in biomedical decision-making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The support vector machine (SVM) is a popular classification technique that is robust to model misspecification and effectively handles high-dime...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/biom.13365
Update date: 2020-08-31 00:00:00
abstract::Epidemiological studies of related individuals are often complicated by the fact that follow-up on the event type of interest is incomplete due to the occurrence of other events. We suggest a class of frailty models with cause-specific hazards for correlated competing events in related individuals. The frailties are b...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/biom.12326
Update date: 2015-09-01 00:00:00
abstract::Statistical inference based on right-censored data for the proportional hazards (PH) model with missing covariates has received considerable attention, but interval-censored or current status data with missing covariates has not yet been investigated. Our study is partly motivated by the analysis of fracture data from...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2010.01505.x
Update date: 2011-09-01 00:00:00
abstract::With the prevalence of gene expression studies and the relatively low reproducibility caused by insufficient sample sizes, it is natural to consider joint analysis that could combine data from different experiments effectively to achieve improved accuracy. We present in this article a model-based approach for better i...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2011.01602.x
Update date: 2011-12-01 00:00:00
abstract::In a presentation of various methods for assessing the sensitivity of regression results to unmeasured confounding, Lin, Psaty, and Kronmal (1998, Biometrics 54, 948-963) use a conditional independence assumption to derive algebraic relationships between the true exposure effect and the apparent exposure effect in a re...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2008.01024.x
Update date: 2008-06-01 00:00:00
abstract::For the case of repeated measures on Y with mean values linear in a concomitant variable Z in [a, b], a straight-line confidence band over [a, b] is given with width linear in Z. Graphical presentation of such line-segment confidence bands can help emphasize that appropriate inferences are limited to the range of the ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1987-09-01 00:00:00
abstract::The objective of this paper is to develop statistical methods for estimating current and future numbers of individuals in different stages of the natural history of the human immunodeficiency (AIDS) virus infection and to evaluate the impact of therapeutic advances on these numbers. The approach is to extend the metho...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1990-12-01 00:00:00
abstract::In the area of ecological research the study of species diversity of a community or population seems to have been fully developed. However, the problem of how the distributions and expectations of the sample diversity indices are affected by the population diversity has received little attention. In this paper we show...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1983-12-01 00:00:00
abstract::In biostatistical practice, it is common to use information criteria as a guide for model selection. We propose new versions of the focused information criterion (FIC) for variable selection in logistic regression. The FIC gives, depending on the quantity to be estimated, possibly different sets of selected variables....
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2006.00567.x
Update date: 2006-12-01 00:00:00
abstract::Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.0006-341X.2004.00156.x
Update date: 2004-03-01 00:00:00
abstract::Experimental designs that include repeated measures of binary response variables over time and under different conditions are common in biology. In such settings, it is often desirable to characterize the response pattern over time. When response variables are continuous, this characterization can be made in terms of ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1988-12-01 00:00:00
abstract::Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypas...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/biom.13064
Update date: 2019-12-01 00:00:00
abstract::The characteristics of deleterious genes have been of great interest in both theory and practice in genetics. Because of the complex genetic mechanism of these deleterious genes, most current studies try to estimate the overall magnitude of mortality effects on a population, which is characterized classically by the n...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.0006-341x.1999.00376.x
Update date: 1999-06-01 00:00:00
abstract::We describe group sequential tests for a bivariate response. The tests are defined in terms of the two response components jointly, rather than through a single summary statistic. Such methods are appropriate when the two responses concern different aspects of a treatment; for example, one might wish to show that a ne...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1993-09-01 00:00:00
abstract::A method is proposed for calculating the probabilities of assignment of a patient to treatments; it involves minimizing a quadratic criterion subject to a balance constraint. The optimal probabilities are very easy to compute. Numerical illustration is given and comparisons are drawn with the entropy-based methods of ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1983-12-01 00:00:00
abstract::A method is developed for obtaining tree-structured relative risk estimates for censored survival data. The first step of a full likelihood estimation procedure is used in a recursive partitioning algorithm that adopts most aspects of the widely used Classification and Regression Tree (CART) algorithm of Breiman et al...
journal_title:Biometrics
pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial
doi:
Update date: 1992-06-01 00:00:00
abstract::In this paper, a Markov process is developed as a mathematical model to study the general problem of quality control monitoring. This approach was previously used by Arnold (1970) in development of sampling plans to study the water quality monitoring of streams. Arnold considered the expected sample size required for ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1977-03-01 00:00:00
abstract::Restricted maximum likelihood (REML) is now well established as a method for estimating the parameters of the general Gaussian linear model with a structured covariance matrix, in particular for mixed linear models. Conventionally, estimates of precision and inference for fixed effects are based on their asymptotic di...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1997-09-01 00:00:00
abstract::We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent factors that analysts seek to include as predictors in the quantile ...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2011.01639.x
Update date: 2012-03-01 00:00:00
abstract::We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships bet...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.1541-0420.2009.01265.x
Update date: 2010-03-01 00:00:00
abstract::In an epidemiological study of risk factors in breast cancer, data are available on confirmed cases from a diagnostic clinic and on controls from a screening clinic that sampled the general population. Relative risk estimation is complicated by differences in the interviewing environment and in the wording and order o...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1983-09-01 00:00:00
abstract::Given data from bilateral visual assessments on N subjects at k occasions, we consider inference for contralateral correlations (C) between fellow eyes and lateral correlations (L) among p different assessments of the same eye. Under permutation symmetric dependence structure between observations from fellow eyes and ...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/j.0006-341x.2000.01188.x
Update date: 2000-12-01 00:00:00
abstract::We propose a sequential approach for constructing multiple-objective locally optimal designs for nonlinear models. The technique used here is a general one and we demonstrate the added benefits of using a multiple-objective design over a single-objective design with examples from biomedical studies. ...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1998-12-01 00:00:00
abstract::In this paper alternative methods for estimating the relative potency, its confidence intervals, and testing for proportionality are developed for multivariate bioassays. The test and estimate are based on the smaller characteristic root and the corresponding characteristic vector of a 2 X 2 matrix. The same idea is a...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1986-03-01 00:00:00
abstract::Combined maximum likelihood estimates for equicorrelation covariance matrices are considered. The case of a common equicorrelation rho and possibly different standard deviations sigma 1, ..., sigma k among k experimental groups is examined first, and the estimation of (rho, sigma 1, ..., sigma k) is discussed. Second,...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1994-09-01 00:00:00
abstract::Receiver operating characteristic (ROC) curves are frequently used to assess the usefulness of diagnostic markers. When several diagnostic markers are available, they can be combined by a best linear combination: that is, when the area under the ROC curve of this combination is maximized among all possible linear comb...
journal_title:Biometrics
pub_type: Journal Article
doi:
Update date: 1997-06-01 00:00:00
abstract::Thanks to the growing interest in personalized medicine, joint modeling of longitudinal marker and time-to-event data has recently started to be used to derive dynamic individual risk predictions. Individual predictions are called dynamic because they are updated when information on the subject's health profile grows ...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/biom.12232
Update date: 2015-03-01 00:00:00
abstract::In this article, we consider two-phase sampling in the situation in which all covariates are categorical. Two-phase designs are appealing from an efficiency perspective since they allow sampling to be concentrated in informative cells. A number of likelihood-based methods have been developed for the analysis of two-ph...
journal_title:Biometrics
pub_type: Journal Article
doi:10.1111/biom.12019
Update date: 2013-06-01 00:00:00