Assessing local influence in principal component analysis with application to haematology study data.

Abstract:

High-dimensional data are often encountered in medical and health studies. Principal component analysis (PCA) is a commonly used technique for reducing such data to a few components that include most of the information provided by the original data. However, PCA is known to be very sensitive to abnormal observations, so it is essential to assess that sensitivity. In this paper, assessments of local influence based on the generalized influence function are developed under the case-weights and additive perturbation schemes, along with a discussion of the perturbation schemes and the generalized influence function approach. When different variables of the data are perturbed, the directions of the largest joint local influence for the eigenvalues are all the same. Moreover, these directions are completely determined by the score values of the observations, for which an approximate cut-off point is given. The proposed methods are applied to a set of haematology study data for illustration. The results add new insights into finding influential observations in the studied data set.
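
The link between eigenvalue influence and component scores can be illustrated with a minimal sketch. It does not implement the paper's generalized influence function or its cut-off point. Instead it assumes the classical empirical influence function for a simple eigenvalue of the covariance matrix, z_ik^2 - lambda_k, where z_ik is the score of observation i on component k. The function name `eigenvalue_influence`, the simulated data and the planted outlier are hypothetical and serve only as illustration.

```python
import numpy as np


def eigenvalue_influence(X):
    """Classical empirical influence of each observation on each eigenvalue.

    Assumption: uses EIF(x_i; lambda_k) = z_ik**2 - lambda_k with the
    divisor-n covariance matrix, not the paper's generalized approach.
    Returns (influence, eigvals); influence has shape (n, p).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # centre each variable
    S = Xc.T @ Xc / n                    # covariance matrix with divisor n
    eigvals, eigvecs = np.linalg.eigh(S) # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]    # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                # principal component scores z_ik
    return scores**2 - eigvals, eigvals


# Simulated data (hypothetical): 50 observations, 4 variables, one outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[0] += 8.0                              # shift observation 0 on every variable

influence, eigvals = eigenvalue_influence(X)
most_influential = int(np.argmax(np.abs(influence[:, 0])))
print("largest |influence| on the first eigenvalue: observation", most_influential)
```

Because the covariance matrix uses divisor n, the influence values for each eigenvalue sum to zero over the observations, so a large positive value for one observation means that observation inflates that eigenvalue relative to the rest of the sample.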

journal_name

Stat Med

journal_title

Statistics in medicine

authors

Fung WK,Gu H,Xiang L,Yau KK

doi

10.1002/sim.2747

subject

Has Abstract

pub_date

2007-06-15 00:00:00

pages

2730-44

issue

13

eissn

1097-0258

issn

0277-6715

journal_volume

26

pub_type

Journal Article
  • Tests for individual and population bioequivalence based on generalized p-values.

    abstract::The U.S. Food and Drug Administration (FDA) has proposed new regulations that address the 'prescribability' and 'switchability' of new formulations of already-approved drugs. These new criteria are known, respectively, as population and individual bioequivalence. Two methods have been proposed in the bioequivalence li...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1346

    authors: McNally RJ,Iyer H,Mathew T

    update_date:2003-01-15 00:00:00

  • Testing for publication bias in diagnostic meta-analysis: a simulation study.

    abstract::The present study investigates the performance of several statistical tests to detect publication bias in diagnostic meta-analysis by means of simulation. While bivariate models should be used to pool data from primary studies in diagnostic meta-analysis, univariate measures of diagnostic accuracy are preferable for t...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6177

    authors: Bürkner PC,Doebler P

    update_date:2014-08-15 00:00:00

  • Determining the value of additional surrogate exposure data for improving the estimate of an odds ratio.

    abstract::We consider the design of both cohort and case-control studies in which an initial ('stage 1') sample of complete data on an error-free disease indicator (D), a correct ('gold standard') dichotomous exposure measurement (X) and an error-prone exposure measurement (Z) are available. We calculate the amount of additiona...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780142307

    authors: Dahm PF,Gail MH,Rosenberg PS,Pee D

    update_date:1995-12-15 00:00:00

  • Estimating time-dependent ROC curves using data under prevalent sampling.

    abstract::Prevalent sampling is frequently a convenient and economical sampling technique for the collection of time-to-event data and thus is commonly used in studies of the natural history of a disease. However, it is biased by design because it tends to recruit individuals with longer survival times. This paper considers est...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7184

    authors: Li S

    update_date:2017-04-15 00:00:00

  • A penalized robust semiparametric approach for gene-environment interactions.

    abstract::In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6609

    authors: Wu C,Shi X,Cui Y,Ma S

    update_date:2015-12-30 00:00:00

  • Estimating age-related trends in cross-sectional studies using S-distributions.

    abstract::Growth trends in children are often based on cross-sectional studies, in which a sample of the population is investigated at one given point in time. Estimating age-related percentiles in such studies involves fitting data distributions, each of which is specific for one age group, and a subsequent smoothing of the pe...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000315)19:5<697::aid-sim

    authors: Sorribas A,March J,Voit EO

    update_date:2000-03-15 00:00:00

  • Prospective epidemiological studies involving paired organs.

    abstract::Standard methods for analysing survival data or case-control data normally concern factors affecting a subject as a whole. However, in a study of a condition that might develop in one or both of a pair of bodily organs information on response and on covariates may be available for each separately. This information can...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780070509

    authors: Whitehead J,Dorse C

    update_date:1988-05-01 00:00:00

  • Modelling of viral dynamics in hepatitis B and hepatitis C clinical trials.

    abstract::In the recent years, studies of hepatitis B (HBV) and hepatitis C virus (HCV) dynamics have drawn great attention as they provide insight into the process of virus elimination/production and of infected cells decay during antiviral treatment. Estimates of viral dynamic parameters may be used to determine the lifetime ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3457

    authors: Sypsa V,Hatzakis A

    update_date:2008-12-30 00:00:00

  • A statistical assessment of clinical equivalence.

    abstract::An observed confidence distribution is proposed as a measure of strength of evidence for practically equivalent efficacies of two treatments. The concept is independent of prior opinions about relevant sizes of a difference in efficacy. It also avoids retrospective power calculations for trials with missed recruitment...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780071207

    authors: Mau J

    update_date:1988-12-01 00:00:00

  • Latent transition analysis: inference and estimation.

    abstract::Parameters for latent transition analysis (LTA) are easily estimated by maximum likelihood (ML) or Bayesian method via Markov chain Monte Carlo (MCMC). However, unusual features in the likelihood can cause difficulties in ML and Bayesian inference and estimation, especially with small samples. In this study we explore...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3130

    authors: Chung H,Lanza ST,Loken E

    update_date:2008-05-20 00:00:00

  • Methods for assessing reliability and validity for a measurement tool: a case study and critique using the WHO haemoglobin colour scale.

    abstract::Before introducing a new measurement tool it is necessary to evaluate its performance. Several statistical methods have been developed, or used, to evaluate the reliability and validity of a new assessment method in such circumstances. In this paper we review some commonly used methods. Data from a study that was cond...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1804

    authors: White SA,van den Broek NR

    update_date:2004-05-30 00:00:00

  • Nonparametric comparison of two survival functions with dependent censoring via nonparametric multiple imputation.

    abstract::When the event time of interest depends on the censoring time, conventional two-sample test methods, such as the log-rank and Wilcoxon tests, can produce an invalid test result. We extend our previous work on estimation using auxiliary variables to adjust for dependent censoring via multiple imputation, to the compari...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3480

    authors: Hsu CH,Taylor JM

    update_date:2009-02-01 00:00:00

  • A sensitivity analysis for subverting randomization in controlled trials.

    abstract::In some randomized controlled trials, subjects with a better prognosis may be diverted into the treatment group. This subverting of randomization results in an unobserved non-compliance with the originally intended treatment assignment. Consequently, the estimate of treatment effect from these trials may be biased. Th...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.715

    authors: Marcus SM

    update_date:2001-02-28 00:00:00

  • A framework establishing clear decision criteria for the assessment of drug efficacy.

    abstract::Much has been published on various aspects of data analysis and reporting from clinical trials within the biopharmaceutical environment. This ranges from regulatory guidelines on the format and content of registration dossiers to recommendations on data presentation and the statistical methodologies that are appropria...

    journal_title:Statistics in medicine

    pub_type: Journal Article,Review

    doi:10.1002/(sici)1097-0258(19980815/30)17:15/16<1829:

    authors: Huster WJ,Enas GG

    update_date:1998-08-15 00:00:00

  • Design and estimation in clinical trials with subpopulation selection.

    abstract::Population heterogeneity is frequently observed among patients' treatment responses in clinical trials because of various factors such as clinical background, environmental, and genetic factors. Different subpopulations defined by those baseline factors can lead to differences in the benefit or safety profile of a the...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7925

    authors: Chiu YD,Koenig F,Posch M,Jaki T

    update_date:2018-12-20 00:00:00

  • Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models in epidemiologic analyses.

    abstract::Hierarchical regression analysis holds much promise for epidemiologic analysis, but has as yet seen limited application because of lack of easily used software and the relatively lengthy run times of preferred fitting methods (such as true maximum likelihood and Bayesian approaches). This paper compares three relative...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19970315)16:5<515::aid-sim

    authors: Greenland S

    update_date:1997-03-15 00:00:00

  • Binary partitioning for continuous longitudinal data: categorizing a prognostic variable.

    abstract::We investigate a binary partitioning algorithm in the case of a continuous repeated measures outcome. The procedure is based on the use of the likelihood ratio statistic to evaluate the performance of individual splits. The procedure partitions a set of longitudinal data into two mutually exclusive groups based on an ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1266

    authors: Abdolell M,LeBlanc M,Stephens D,Harrison RV

    update_date:2002-11-30 00:00:00

  • Randomization tests for multiarmed randomized clinical trials.

    abstract::We examine the use of randomization-based inference for analyzing multiarmed randomized clinical trials, including the application of conditional randomization tests to multiple comparisons. The view is taken that the linkage of the statistical test to the experimental design (randomization procedure) should be recogn...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8418

    authors: Wang Y,Rosenberger WF,Uschner D

    update_date:2020-02-20 00:00:00

  • Some considerations in the analysis of rates of change in longitudinal studies.

    abstract::This paper discusses and compares several estimators of mean rate of change in unbalanced longitudinal data based on a model with randomly distributed regression coefficients across individuals. The estimators are unweighted and weighted means of these coefficients. The paper also evaluates commonly used variance esti...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780060509

    authors: Palta M,Cook T

    update_date:1987-07-01 00:00:00

  • Estimation of the mediation effect with a binary mediator.

    abstract::A mediator acts as a third variable in the causal pathway between a risk factor and an outcome. In this paper, we consider the estimation of the mediation effect when the mediator is a binary variable. We give a precise definition of the mediation effect and examine asymptotic properties of five different estimators o...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2730

    authors: Li Y,Schneider JA,Bennett DA

    update_date:2007-08-15 00:00:00

  • Bayesian sensitivity analysis of incomplete data: bridging pattern-mixture and selection models.

    abstract::Pattern-mixture models (PMM) and selection models (SM) are alternative approaches for statistical analysis when faced with incomplete data and a nonignorable missing-data mechanism. Both models make empirically unverifiable assumptions and need additional constraints to identify the parameters. Here, we first introduc...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6302

    authors: Kaciroti NA,Raghunathan T

    update_date:2014-11-30 00:00:00

  • Survival analyses of randomized clinical trials adjusted for patients who switch treatments.

    abstract::Patients who switch treatment groups in randomized clinical trials can cause problems in the interpretation of the results. Although the intention-to-treat method is recognized as being the most reliable analysis, it may result in an underestimate of the treatment effect if there have been patients who switch treatmen...

    journal_title:Statistics in medicine

    pub_type: Clinical Trial,Journal Article,Randomized Controlled Trial

    doi:10.1002/(SICI)1097-0258(19961015)15:19<2069::AID-S

    authors: Law MG,Kaldor JM

    update_date:1996-10-15 00:00:00

  • Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data.

    abstract::The general linear mixed model provides a useful approach for analysing a wide variety of data structures which practising statisticians often encounter. Two such data structures which can be problematic to analyse are unbalanced repeated measures data and longitudinal data. Owing to recent advances in methods and sof...

    journal_title:Statistics in medicine

    pub_type: Journal Article,Review

    doi:10.1002/(sici)1097-0258(19971030)16:20<2349::aid-s

    authors: Cnaan A,Laird NM,Slasor P

    update_date:1997-10-30 00:00:00

  • Signal detection in FDA AERS database using Dirichlet process.

    abstract::In the recent two decades, data mining methods for signal detection have been developed for drug safety surveillance, using large post-market safety data. Several of these methods assume that the number of reports for each drug-adverse event combination is a Poisson random variable with mean proportional to the unknow...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6510

    authors: Hu N,Huang L,Tiwari RC

    update_date:2015-08-30 00:00:00

  • Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets.

    abstract::Logistic regression analysis may well be used to develop a prognostic model for a dichotomous outcome. Especially when limited data are available, it is difficult to determine an appropriate selection of covariables for inclusion in such models. Also, predictions may be improved by applying some sort of shrinkage in t...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000430)19:8<1059::aid-si

    authors: Steyerberg EW,Eijkemans MJ,Harrell FE Jr,Habbema JD

    update_date:2000-04-30 00:00:00

  • Statistical methods for multivariate interval-censored recurrent events.

    abstract::Multi-type recurrent event data arise when two or more different kinds of events may occur repeatedly over a period of observation. The scientific objectives in such settings are often to describe features of the marginal processes and to study the association between the different types of events. Interval-censored m...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1936

    authors: Chen BE,Cook RJ,Lawless JF,Zhan M

    update_date:2005-03-15 00:00:00

  • Parameterization of treatment effects for meta-analysis in multi-state Markov models.

    abstract::Standard approaches to analysis of randomized controlled trials (RCTs) using Markov models make it difficult to generalize treatment effects to new patient groups and synthesize evidence across trials. This paper demonstrates how pair-wise and mixed treatment comparison meta-analysis can be applied to event history da...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4059

    authors: Price MJ,Welton NJ,Ades AE

    update_date:2011-01-30 00:00:00

  • Chronic disease prevention: public health potential and research needs.

    abstract::This paper, arising out of an event to honour the statistical and scientific contributions of Professor Peter Armitage, is concerned with research strategies and needs for chronic disease prevention. A few highlights from recent intervention trials for the prevention of cancer, cardiovascular disease, fractures and di...

    journal_title:Statistics in medicine

    pub_type:

    doi:10.1002/sim.2045

    authors: Prentice RL

    update_date:2004-11-30 00:00:00

  • Data-adaptive additive modeling.

    abstract::In this paper, we consider fitting a flexible and interpretable additive regression model in a data-rich setting. We wish to avoid pre-specifying the functional form of the conditional association between each covariate and the response, while still retaining interpretability of the fitted functions. A number of recen...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7859

    authors: Petersen A,Witten D

    update_date:2019-02-20 00:00:00

  • Stochastic approximation EM for large-scale exploratory IRT factor analysis.

    abstract::A stochastic approximation EM algorithm (SAEM) is described for exploratory factor analysis of dichotomous or ordinal variables. The factor structure is obtained from sufficient statistics that are updated during iterations with the Robbins-Monro procedure. Two large-scale simulations are reported that compare accurac...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.8217

    authors: Camilli G,Geis E

    update_date:2019-09-20 00:00:00