Determining the number of clusters using the weighted gap statistic.

Abstract:

Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society, Series B 63, 411-423), we propose the weighted gap and the difference of difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data using the weighted within-clusters sum of errors, a measure of within-cluster homogeneity. In addition, we propose a "multilayer" clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting nested cluster structure in the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. The proposed methods are compared with one another and with the original gap method on simulated and real data.
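The abstract builds on the gap statistic of Tibshirani, Walther, and Hastie (2001) but does not state the formulas. The Python sketch below illustrates the general idea under stated assumptions, not the authors' implementation: plain k-means stands in for "any clustering method", reference data are drawn uniformly over the bounding box of the data, and `weighted=True` divides each cluster's pairwise-distance sum D_r by 2 n_r (n_r - 1) instead of the standard 2 n_r. That weighting and the second difference in `dd_gap` are our reading of "weighted" and "difference of difference", not formulas quoted from the article. All function names are hypothetical.

```python
import math
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_init=5, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-SSE run."""
    best_sse, best_labels = math.inf, None
    for _ in range(n_init):
        centers = [rng.choice(points)]
        while len(centers) < k:
            # k-means++: draw the next center with probability proportional to D^2
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points))
        for _ in range(iters):
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # keep the previous center if a cluster empties out
                new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels


def dispersion(points, labels, k, weighted=False):
    """Within-cluster dispersion built from D_r, the sum of squared distances
    over all ordered pairs of points in cluster r.

    weighted=False: W_k = sum_r D_r / (2 n_r), as in Tibshirani et al. (2001).
    weighted=True: ASSUMPTION, sum_r D_r / (2 n_r (n_r - 1)); check the article.
    """
    total = 0.0
    for r in range(k):
        members = [p for p, lab in zip(points, labels) if lab == r]
        n_r = len(members)
        if n_r < 2:
            continue  # empty or singleton clusters contribute nothing
        d_r = sum(sq_dist(a, b) for a in members for b in members)
        total += d_r / (2 * n_r * (n_r - 1)) if weighted else d_r / (2 * n_r)
    return total


def gap_statistic(points, k_max, n_ref=10, weighted=False, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k for k = 1..k_max, with reference sets
    drawn uniformly over the data's bounding box; s_k = sd_k * sqrt(1 + 1/B)."""
    rng = random.Random(seed)
    box = [(min(col), max(col)) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(dispersion(points, kmeans(points, k, rng), k, weighted))
        ref_logs = []
        for _ in range(n_ref):
            ref = [[rng.uniform(lo, hi) for lo, hi in box] for _ in points]
            ref_logs.append(math.log(dispersion(ref, kmeans(ref, k, rng), k, weighted)))
        gaps.append(statistics.fmean(ref_logs) - log_w)
        s.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_ref))
    return gaps, s


def choose_k(gaps, s):
    """Tibshirani et al. rule: smallest k with Gap(k) >= Gap(k+1) - s_(k+1).
    Falls back to k_max if no k qualifies (our choice)."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)


def dd_gap(gaps):
    """ASSUMPTION: DDGap(k) = DGap(k) - DGap(k+1) with DGap(k) = Gap(k) - Gap(k-1),
    i.e. 2*Gap(k) - Gap(k-1) - Gap(k+1), for k = 2..k_max-1."""
    return {k: 2 * gaps[k - 1] - gaps[k - 2] - gaps[k] for k in range(2, len(gaps))}
```

On data with a few well-separated groups, `choose_k` applied to the output of `gap_statistic` is expected to return the number of groups. The stopping rule in `choose_k` is the one from Tibshirani et al. (2001); the article may pair the weighted and DD-weighted gaps with different selection rules, such as taking the k that maximizes `dd_gap`.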

journal_name

Biometrics

journal_title

Biometrics

authors

Yan M,Ye K

doi

10.1111/j.1541-0420.2007.00784.x

subject

Has Abstract

pub_date

2007-12-01 00:00:00

pages

1031-1037

issue

4

eissn

1541-0420

issn

0006-341X

pii

BIOM784

journal_volume

63

pub_type

Journal Article
  • Asynchronous distance between homologous DNA sequences.

    abstract::The distance between homologous DNA sequences of two species is proposed to be -1/4 ln[det(P)], where P is the conditional probability matrix specifying the proportions of the various nucleotides in the second sequence, corresponding to each of the four nucleotides in the first sequence. A probability model is describ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Barry D,Hartigan JA

    update_date: 1987-06-01 00:00:00

  • The use of frailty hazard models for unrecognized heterogeneity that interacts with treatment: considerations of efficiency and power.

    abstract::Increasingly, genetic studies of tumors of the same histologic diagnosis are elucidating subtypes that are distinct with respect to clinical endpoints such as response to treatment and survival. This raises concerns about the efficiency of using the simple log-rank test for analysis of treatment effect on survival in ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2002.00232.x

    authors: Li Y,Betensky RA,Louis DN,Cairncross JG

    update_date: 2002-03-01 00:00:00

  • Receiver operating characteristic curves and confidence bands for support vector machines.

    abstract::Many problems that appear in biomedical decision-making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The support vector machine (SVM) is a popular classification technique that is robust to model misspecification and effectively handles high-dime...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13365

    authors: Luckett DJ,Laber EB,El-Kamary SS,Fan C,Jhaveri R,Perou CM,Shebl FM,Kosorok MR

    update_date: 2020-08-31 00:00:00

  • Additive gamma frailty models with applications to competing risks in related individuals.

    abstract::Epidemiological studies of related individuals are often complicated by the fact that follow-up on the event type of interest is incomplete due to the occurrence of other events. We suggest a class of frailty models with cause-specific hazards for correlated competing events in related individuals. The frailties are b...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12326

    authors: Eriksson F,Scheike T

    update_date: 2015-09-01 00:00:00

  • Analysis of current status data with missing covariates.

    abstract::Statistical inference based on right-censored data for the proportional hazards (PH) model with missing covariates has received considerable attention, but interval-censored or current status data with missing covariates has not yet been investigated. Our study is partly motivated by the analysis of fracture data from...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01505.x

    authors: Wen CC,Lin CT

    update_date: 2011-09-01 00:00:00

  • An empirical Bayes' approach to joint analysis of multiple microarray gene expression studies.

    abstract::With the prevalence of gene expression studies and the relatively low reproducibility caused by insufficient sample sizes, it is natural to consider joint analysis that could combine data from different experiments effectively to achieve improved accuracy. We present in this article a model-based approach for better i...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2011.01602.x

    authors: Ruan L,Yuan M

    update_date: 2011-12-01 00:00:00

  • Sensitivity analysis: distributional assumptions and confounding assumptions.

    abstract::In a presentation of various methods for assessing the sensitivity of regression results to unmeasured confounding, Lin, Psaty, and Kronmal (1998, Biometrics 54, 948-963) use a conditional independence assumption to derive algebraic relationships between the true exposure effect and the apparent exposure effect in a re...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2008.01024.x

    authors: Vanderweele TJ

    update_date: 2008-06-01 00:00:00

  • Line-segment confidence bands for repeated measures.

    abstract::For the case of repeated measures on Y with mean values linear in a concomitant variable Z in [a, b], a straight-line confidence band over [a, b] is given with width linear in Z. Graphical presentation of such line-segment confidence bands can help emphasize that appropriate inferences are limited to the range of the ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Stewart PW

    update_date: 1987-09-01 00:00:00

  • Statistical modelling of the AIDS epidemic for forecasting health care needs.

    abstract::The objective of this paper is to develop statistical methods for estimating current and future numbers of individuals in different stages of the natural history of the human immunodeficiency (AIDS) virus infection and to evaluate the impact of therapeutic advances on these numbers. The approach is to extend the metho...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Brookmeyer R,Liao JG

    update_date: 1990-12-01 00:00:00

  • Some distribution properties of the sample species-diversity indices and their applications.

    abstract::In the area of ecological research the study of species diversity of a community or population seems to have been fully developed. However, the problem of how the distributions and expectations of the sample diversity indices are affected by the population diversity has received little attention. In this paper we show...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Tong YL

    update_date: 1983-12-01 00:00:00

  • Variable selection for logistic regression using a prediction-focused information criterion.

    abstract::In biostatistical practice, it is common to use information criteria as a guide for model selection. We propose new versions of the focused information criterion (FIC) for variable selection in logistic regression. The FIC gives, depending on the quantity to be estimated, possibly different sets of selected variables....

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00567.x

    authors: Claeskens G,Croux C,Van Kerckhoven J

    update_date: 2006-12-01 00:00:00

  • Partially supervised learning using an EM-boosting algorithm.

    abstract::Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2004.00156.x

    authors: Yasui Y,Pepe M,Hsu L,Adam BL,Feng Z

    update_date: 2004-03-01 00:00:00

  • Growth curve models of repeated binary response.

    abstract::Experimental designs that include repeated measures of binary response variables over time and under different conditions are common in biology. In such settings, it is often desirable to characterize the response pattern over time. When response variables are continuous, this characterization can be made in terms of ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Stanek EJ 3rd,Diehl SR

    update_date: 1988-12-01 00:00:00

  • Fast Bayesian inference in large Gaussian graphical models.

    abstract::Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypas...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13064

    authors: Leday GGR,Richardson S

    update_date: 2019-12-01 00:00:00

  • A study of deleterious gene structure in plants using Markov chain Monte Carlo.

    abstract::The characteristics of deleterious genes have been of great interest in both theory and practice in genetics. Because of the complex genetic mechanism of these deleterious genes, most current studies try to estimate the overall magnitude of mortality effects on a population, which is characterized classically by the n...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.1999.00376.x

    authors: Lee JK,Lascoux M,Newton MA,Nordheim EV

    update_date: 1999-06-01 00:00:00

  • Group sequential tests for bivariate response: interim analyses of clinical trials with both efficacy and safety endpoints.

    abstract::We describe group sequential tests for a bivariate response. The tests are defined in terms of the two response components jointly, rather than through a single summary statistic. Such methods are appropriate when the two responses concern different aspects of a treatment; for example, one might wish to show that a ne...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Jennison C,Turnbull BW

    update_date: 1993-09-01 00:00:00

  • On constrained balance randomization for clinical trials.

    abstract::A method is proposed for calculating the probabilities of assignment of a patient to treatments; it involves minimizing a quadratic criterion subject to a balance constraint. The optimal probabilities are very easy to compute. Numerical illustration is given and comparisons are drawn with the entropy-based methods of ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Titterington DM

    update_date: 1983-12-01 00:00:00

  • Relative risk trees for censored survival data.

    abstract::A method is developed for obtaining tree-structured relative risk estimates for censored survival data. The first step of a full likelihood estimation procedure is used in a recursive partitioning algorithm that adopts most aspects of the widely used Classification and Regression Tree (CART) algorithm of Breiman et al...

    journal_title:Biometrics

    pub_type: Clinical Trial, Journal Article, Randomized Controlled Trial

    doi:

    authors: LeBlanc M,Crowley J

    update_date: 1992-06-01 00:00:00

  • Further aspects of a Markovian sampling policy for water quality monitoring.

    abstract::In this paper, a Markov process is developed as a mathematical model to study the general problem of quality control monitoring. This approach was previously used by Arnold (1970) in development of sampling plans to study the water quality monitoring of streams. Arnold considered the expected sample size required for ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Smeach SC,Jernigan RW

    update_date: 1977-03-01 00:00:00

  • Small sample inference for fixed effects from restricted maximum likelihood.

    abstract::Restricted maximum likelihood (REML) is now well established as a method for estimating the parameters of the general Gaussian linear model with a structured covariance matrix, in particular for mixed linear models. Conventionally, estimates of precision and inference for fixed effects are based on their asymptotic di...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Kenward MG,Roger JH

    update_date: 1997-09-01 00:00:00

  • Modeling adverse birth outcomes via confirmatory factor quantile regression.

    abstract::We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent factors that analysts seek to include as predictors in the quantile ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2011.01639.x

    authors: Burgette LF,Reiter JP

    update_date: 2012-03-01 00:00:00

  • A model-based approach for making ecological inference from distance sampling data.

    abstract::We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships bet...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01265.x

    authors: Johnson DS,Laake JL,Ver Hoef JM

    update_date: 2010-03-01 00:00:00

  • Correcting for the effect of misclassification bias in a case-control study using data from two different questionnaires.

    abstract::In an epidemiological study of risk factors in breast cancer, data are available on confirmed cases from a diagnostic clinic and on controls from a screening clinic that sampled the general population. Relative risk estimation is complicated by differences in the interviewing environment and in the wording and order o...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Elton RA,Duffy SW

    update_date: 1983-09-01 00:00:00

  • Symmetrically dependent models arising in visual assessment data.

    abstract::Given data from bilateral visual assessments on N subjects at k occasions, we consider inference for contralateral correlations (C) between fellow eyes and lateral correlations (L) among p different assessments of the same eye. Under permutation symmetric dependence structure between observations from fellow eyes and ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2000.01188.x

    authors: Viana M,Olkin I

    update_date: 2000-12-01 00:00:00

  • Sequential construction of multiple-objective optimal designs.

    abstract::We propose a sequential approach for constructing multiple-objective locally optimal designs for nonlinear models. The technique used here is a general one and we demonstrate the added benefits of using a multiple-objective design over a single-objective design with examples from biomedical studies. ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Huang YC,Wong WK

    update_date: 1998-12-01 00:00:00

  • Multivariate bioassay, combination of bioassays, and Fieller's theorem.

    abstract::In this paper alternative methods for estimating the relative potency, its confidence intervals, and testing for proportionality are developed for multivariate bioassays. The test and estimate are based on the smaller characteristic root and the corresponding characteristic vector of a 2 X 2 matrix. The same idea is a...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Srivastava MS

    update_date: 1986-03-01 00:00:00

  • Combined maximum likelihood estimates for the equicorrelation coefficient.

    abstract::Combined maximum likelihood estimates for equicorrelation covariance matrices are considered. The case of a common equicorrelation rho and possibly different standard deviations sigma 1, ..., sigma k among k experimental groups is examined first, and the estimation of (rho, sigma 1, ..., sigma k) is discussed. Second,...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Viana MA

    update_date: 1994-09-01 00:00:00

  • Confidence intervals for the generalized ROC criterion.

    abstract::Receiver operating characteristic (ROC) curves are frequently used to assess the usefulness of diagnostic markers. When several diagnostic markers are available, they can be combined by a best linear combination: that is, when the area under the ROC curve of this combination is maximized among all possible linear comb...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Reiser B,Faraggi D

    update_date: 1997-06-01 00:00:00

  • Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks.

    abstract::Thanks to the growing interest in personalized medicine, joint modeling of longitudinal marker and time-to-event data has recently started to be used to derive dynamic individual risk predictions. Individual predictions are called dynamic because they are updated when information on the subject's health profile grows ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12232

    authors: Blanche P,Proust-Lima C,Loubère L,Berr C,Dartigues JF,Jacqmin-Gadda H

    update_date: 2015-03-01 00:00:00

  • Bayesian inference for two-phase studies with categorical covariates.

    abstract::In this article, we consider two-phase sampling in the situation in which all covariates are categorical. Two-phase designs are appealing from an efficiency perspective since they allow sampling to be concentrated in informative cells. A number of likelihood-based methods have been developed for the analysis of two-ph...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12019

    authors: Ross M,Wakefield J

    update_date: 2013-06-01 00:00:00