Statistical significance for hierarchical clustering.

Abstract:

Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high-dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this article, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets.

journal_name

Biometrics

authors

Kimes PK,Liu Y,Neil Hayes D,Marron JS

doi

10.1111/biom.12647

pub_date

2017-09-01 00:00:00

pages

811-821

issue

3

eissn

1541-0420

issn

0006-341X

journal_volume

73

pub_type

Journal Article
  • Robust inference for the stepped wedge design.

    abstract::Stepped wedge designed trials are a type of cluster-randomized study in which the intervention is introduced to each cluster in a random order over time. This design is often used to assess the effect of a new intervention as it is rolled out across a series of clinics or communities. Based on a permutation argument, ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13106

    authors: Hughes JP,Heagerty PJ,Xia F,Ren Y

    update_date:2020-03-01 00:00:00

  • Multiple imputation for model checking: completed-data plots with missing and latent data.

    abstract::In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset--corresponding to the observed data and imputed unobserved data--using standard procedures for complete-data inference. Here, we extend this approach to mo...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2005.031010.x

    authors: Gelman A,Van Mechelen I,Verbeke G,Heitjan DF,Meulders M

    update_date:2005-03-01 00:00:00

  • Design considerations for efficient and effective microarray studies.

    abstract::This article describes the theoretical and practical issues in experimental design for gene expression microarrays. Specifically, this article 1) discusses the basic principles of design (randomization, replication, and blocking) as they pertain to microarrays, and 2) provides some general guidelines for statisticians...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341x.2003.00096.x

    authors: Kerr MK

    update_date:2003-12-01 00:00:00

  • Coregionalized single- and multiresolution spatially varying growth curve modeling with application to weed growth.

    abstract::Modeling of longitudinal data from agricultural experiments using growth curves helps understand conditions conducive or unconducive to crop growth. Recent advances in Geographical Information Systems (GIS) now allow geocoding of agricultural data that help understand spatial patterns. A particularly common problem is...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00535.x

    authors: Banerjee S,Johnson GA

    update_date:2006-09-01 00:00:00

  • Estimating the ventilation-perfusion distribution: an ill-posed integral equation problem.

    abstract::The distribution of ventilation-perfusion ratio over the lung is a useful indicator of the efficiency of lung function. Information about this distribution can be obtained by observing the retention in blood of inert gases passed through the lung. These retentions are related to the ventilation-perfusion distribution ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Lim LL,Whitehead J

    update_date:1992-03-01 00:00:00

  • A comment on optimal allocations for bioequivalence studies.

    abstract::A method purporting to provide optimal allocations in bioequivalence studies fails to do so on both statistical and practical grounds. Reasons as to why this is so are given. ...

    journal_title:Biometrics

    pub_type: Comment, Journal Article

    doi:10.1111/j.0006-341x.1999.01314.x

    authors: Senn S,Grieve AP

    update_date:1999-12-01 00:00:00

  • The proportional odds cumulative incidence model for competing risks.

    abstract::We suggest an estimator for the proportional odds cumulative incidence model for competing risks data. The key advantage of this model is that the regression parameters have the simple and useful odds ratio interpretation. The model has been considered by many authors, but it is rarely used in practice due to the lack...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12330

    authors: Eriksson F,Li J,Scheike T,Zhang MJ

    update_date:2015-09-01 00:00:00

  • Statistical testing of genetic linkage under heterogeneity.

    abstract::Recent advances in human genetics have led to a renewed interest in statistical methods for the detection of linkage from family data--for example, between marker loci and disease traits. Statistical analysis of linkage between two loci is carried out almost exclusively by means of the lod (log-odds) score test, equiv...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Shoukri MM,Lathrop GM

    update_date:1993-03-01 00:00:00

  • Tests for monotone mean residual life, using randomly censored data.

    abstract::At any age the mean residual life function gives the expected remaining life at that age. Reliabilists and biometricians have found it useful to categorize failure distributions by the monotonicity properties of the mean residual life function. Hollander and Proschan (1975, Biometrika 62, 585-593) have derived tests o...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Chen YY,Hollander M,Langberg NA

    update_date:1983-03-01 00:00:00

  • Estimating diagnostic accuracy of raters without a gold standard by exploiting a group of experts.

    abstract::In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models where the unknown gold standard test is treated as a latent variable are often used. However, these models have b...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2012.01789.x

    authors: Zhang B,Chen Z,Albert PS

    update_date:2012-12-01 00:00:00

  • Sharpening bounds on principal effects with covariates.

    abstract::Estimation of treatment effects in randomized studies is often hampered by possible selection bias induced by conditioning on or adjusting for a variable measured post-randomization. One approach to obviate such selection bias is to consider inference about treatment effects within principal strata, that is, principal...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12103

    authors: Long DM,Hudgens MG

    update_date:2013-12-01 00:00:00

  • Exact two-sample inference with missing data.

    abstract::When comparing follow-up measurements from two independent populations, missing records may arise due to censoring by events whose occurrence is associated with baseline covariates. In these situations, inferences based only on the completely followed observations may be biased if the follow-up measurements and the co...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2005.00332.x

    authors: Cheung YK

    update_date:2005-06-01 00:00:00

  • On the treatment of grouped observations in life studies.

    abstract::Assuming a model of proportional failure rates, Cox (1972) presents a systematic study of the use of covariates in the analysis of life time. The treatment of tied observations is a particularly troublesome point in both theory and application. It appears that grouping rather than discrete time is the right way to han...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Thompson WA Jr

    update_date:1977-09-01 00:00:00

  • Bayesian latent multi-state modeling for nonequidistant longitudinal electronic health records.

    abstract::Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, the health status progression across time is not typically observed directly: records are observed only when a subject interacts with the system, yielding irregular and...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13261

    authors: Luo Y,Stephens DA,Verma A,Buckeridge DL

    update_date:2020-03-11 00:00:00

  • Alternative estimation procedures for Pr(X less than Y) in categorized data.

    abstract::Consider two independent random variables X and Y. The functional R = Pr(X less than Y) [or gamma = Pr(X less than Y) - Pr(Y less than X)] is of practical importance in many situations, including clinical trials, genetics, and reliability. In this paper several approaches to estimation of gamma when X and Y are presen...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Simonoff JS,Hochberg Y,Reiser B

    update_date:1986-12-01 00:00:00

  • Response-adaptive regression for longitudinal data.

    abstract::We propose a response-adaptive model for functional linear regression, which is adapted to sparsely sampled longitudinal responses. Our method aims at predicting response trajectories and models the regression relationship by directly conditioning the sparse and irregular observations of the response on the predictor,...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01518.x

    authors: Wu S,Müller HG

    update_date:2011-09-01 00:00:00

  • Unbiased and locally efficient estimation of genetic effect on quantitative trait in the presence of population admixture.

    abstract::Population admixture can be a confounding factor in genetic association studies. Family-based methods (Rabinowitz and Laird, 2000, Human Heredity 50, 211-223) have been proposed in both testing and estimation settings to adjust for this confounding, especially in case-only association studies. The family-based methods...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01454.x

    authors: Wang Y,Yang Q,Rabinowitz D

    update_date:2011-06-01 00:00:00

  • Heterogeneity models of disease susceptibility, with application to diabetic nephropathy.

    abstract::It is not, in general, possible to include all relevant risk factors in a model of survival or disease incidence. This heterogeneity must be accounted for in the interpretation, as it can imply otherwise unexpected results. This is illustrated by diabetic nephropathy, a serious complication experienced by some diabeti...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Hougaard P,Myglegaard P,Borch-Johnsen K

    update_date:1994-12-01 00:00:00

  • Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums.

    abstract::Johnson and Wehrly (1978, Journal of the American Statistical Association 73, 602-606) and Wehrly and Johnson (1980, Biometrika 67, 255-256) show one way to construct the joint distribution of a circular and a linear random variable, or the joint distribution of a pair of circular random variables from their marginal ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2006.00716.x

    authors: Fernández-Durán JJ

    update_date:2007-06-01 00:00:00

  • Mixture models for estimating the size of a closed population when capture rates vary among individuals.

    abstract::We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe h...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/1541-0420.00042

    authors: Dorazio RM,Royle JA

    update_date:2003-06-01 00:00:00

  • Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data.

    abstract::In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way to mutually enhance information. Typically, a parametric longitudinal model is assumed to facilitate t...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2007.00896.x

    authors: Ding J,Wang JL

    update_date:2008-06-01 00:00:00

  • Estimating differences in restricted mean lifetime using observational data subject to dependent censoring.

    abstract::In epidemiologic studies of time to an event, mean lifetime is often of direct interest. We propose methods to estimate group- (e.g., treatment-) specific differences in restricted mean lifetime for studies where treatment is not randomized and lifetimes are subject to both dependent and independent censoring. The pro...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01503.x

    authors: Zhang M,Schaubel DE

    update_date:2011-09-01 00:00:00

  • Confidence interval estimation for the ratio of simple and standardized rates in cohort studies.

    abstract::Computer simulation has been used to compare four methods for calculating confidence intervals for simple rate ratios estimated from cohort studies. The method proposed by Cornfield (1956. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Vol. IV, 135-148) for interval estimati...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Howe GR

    update_date:1983-06-01 00:00:00

  • A simple method for the analysis of clustered binary data.

    abstract::A simple method for comparing independent groups of clustered binary data with group-specific covariates is proposed. It is based on the concepts of design effect and effective sample size widely used in sample surveys, and assumes no specific models for the intracluster correlations. It can be implemented using any s...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:

    authors: Rao JN,Scott AJ

    update_date:1992-06-01 00:00:00

  • Median regression models for longitudinal data with dropouts.

    abstract::Recently, median regression models have received increasing attention. When continuous responses follow a distribution that is quite different from a normal distribution, usual mean regression models may fail to produce efficient estimators whereas median regression models may perform satisfactorily. In this ar...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2008.01105.x

    authors: Yi GY,He W

    update_date:2009-06-01 00:00:00

  • Spatial cluster detection for weighted outcomes using cumulative geographic residuals.

    abstract::Spatial cluster detection is an important methodology for identifying regions with excessive numbers of adverse health events without making strong model assumptions on the underlying spatial dependence structure. Previous work has focused on point or individual-level outcome data and few advances have been made when ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2009.01323.x

    authors: Cook AJ,Li Y,Arterburn D,Tiwari RC

    update_date:2010-09-01 00:00:00

  • On symmetric semiparametric two-sample problem.

    abstract::We consider a two-sample problem where data come from symmetric distributions. Usual two-sample data with only magnitudes recorded, arising from case-control studies or logistic discriminant analyses, may constitute a symmetric two-sample problem. We propose a semiparametric model such that, in addition to symmetry, t...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.13233

    authors: Li M,Diao G,Qin J

    update_date:2020-12-01 00:00:00

  • Efficient regression analysis with ranked-set sampling.

    abstract::This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily with relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and ...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.0006-341X.2004.00255.x

    authors: Chen Z,Wang YG

    update_date:2004-12-01 00:00:00

  • Doubly robust estimator for net survival rate in analyses of cancer registry data.

    abstract::Cancer population studies based on cancer registry databases are widely conducted to address various research questions. In general, cancer registry databases do not collect information on cause of death. The net survival rate is defined as the survival rate if a subject would not die for any causes other than cancer....

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/biom.12568

    authors: Komukai S,Hattori S

    update_date:2017-03-01 00:00:00

  • Identification of differential aberrations in multiple-sample array CGH studies.

    abstract::Most existing methods for identifying aberrant regions with array CGH data are confined to a single target sample. Focusing on the comparison of multiple samples from two different groups, we develop a new penalized regression approach with a fused adaptive lasso penalty to accommodate the spatial dependence of the cl...

    journal_title:Biometrics

    pub_type: Journal Article

    doi:10.1111/j.1541-0420.2010.01457.x

    authors: Wang HJ,Hu J

    update_date:2011-06-01 00:00:00