Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data.

Abstract:

Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, few variable selection methods are available for matched sets. In an earlier paper, we proposed a penalized logistic regression model with a network-based penalty for the analysis of unmatched DNA methylation data. However, matched designs are widely used in epigenetic studies that compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, and applying ordinary logistic regression that ignores the matching is known to introduce serious bias in estimation. In this paper, we develop a penalized conditional logistic model for the analysis of matched DNA methylation data, using a network-based penalty that encourages a grouping effect among (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway. In our simulation studies, the conditional logistic model outperformed the unconditional logistic model in high-dimensional variable selection for matched case-control data. We further investigated the benefits of utilizing biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC), in which DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients were measured with the Illumina Infinium HumanMethylation27 BeadChip. The method identified several CpG sites and genes that are known to be related to HCC but were missed by the standard method in the original paper.
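
The abstract does not give the objective function, so the following is only a minimal sketch of what a network-penalized conditional logistic objective could look like for 1:1 matched pairs (for example, tumor versus adjacent non-tumor tissue). It assumes a penalty of the form lam1 * ||beta||_1 + lam2 * beta' L beta, where L is a normalized graph Laplacian built from a CpG or gene adjacency matrix. That penalty form, and the names `normalized_laplacian`, `penalized_conditional_nll`, `lam1`, and `lam2`, are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of an
    undirected graph with no self-loops (assumed convention: isolated
    nodes get a zero row and column)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    lap = -adj * np.outer(inv_sqrt, inv_sqrt)
    lap[np.diag_indices_from(lap)] = connected.astype(float)
    return lap


def penalized_conditional_nll(beta, x_case, x_control, lap, lam1, lam2):
    """Negative conditional log-likelihood for 1:1 matched pairs plus an
    assumed lasso + graph-Laplacian penalty.

    x_case and x_control have shape (n_pairs, p); row i of each belongs to
    the same matched pair, so pair-specific intercepts cancel out.
    """
    diff = x_case - x_control          # within-pair covariate differences
    eta = diff @ beta
    # Each pair contributes -log(exp(eta) / (1 + exp(eta))) = log(1 + exp(-eta)).
    nll = np.sum(np.logaddexp(0.0, -eta))
    sparsity = lam1 * np.abs(beta).sum()
    smoothness = lam2 * (beta @ lap @ beta)
    return nll + sparsity + smoothness
```

Because only within-pair differences enter the likelihood, the matching is respected without estimating a separate intercept for each pair. The Laplacian term is small when linked features carry similar coefficients, which is one way to encourage the grouping effect the abstract describes.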

journal_name

Stat Med

journal_title

Statistics in medicine

authors

Sun H,Wang S

doi

10.1002/sim.5694

subject

Has Abstract

pub_date

2013-05-30 00:00:00

pages

2127-39

issue

12

eissn

1097-0258

issn

0277-6715

journal_volume

32

pub_type

Journal Article
  • A multivariate Bayesian model for embryonic growth.

    abstract::Most longitudinal growth curve models evaluate the evolution of each of the anthropometric measurements separately. When applied to a 'reference population', this exercise leads to univariate reference curves against which new individuals can be evaluated. However, growth should be evaluated in totality, that is, by e...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6411

    authors: Willemsen SP,Eilers PH,Steegers-Theunissen RP,Lesaffre E

    update_date:2015-04-15 00:00:00

  • Parametric randomization-based methods for correcting for treatment changes in the assessment of the causal effect of treatment.

    abstract::We develop parametric maximum likelihood methods to adjust for treatment changes during follow-up in order to assess the causal effect of treatment in clinical trials with time-to-event outcomes. The accelerated failure time model of Robins and Tsiatis relates each observed event time to the underlying event time that...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1618

    authors: Walker AS,White IR,Babiker AG

    update_date:2004-02-28 00:00:00

  • CoPlot: a tool for visualizing multivariate data in medicine.

    abstract::Many critical questions in medicine require the analysis of complex multivariate data, often from large data sets describing numerous variables for numerous subjects. In this paper, we describe CoPlot, a tool for visualizing multivariate data in medicine. CoPlot is an adaptation of multidimensional scaling (MDS) that ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3078

    authors: Bravata DM,Shojania KG,Olkin I,Raveh A

    update_date:2008-05-30 00:00:00

  • The power prior: theory and applications.

    abstract::The power prior has been widely used in many applications covering a large number of disciplines. The power prior is intended to be an informative prior constructed from historical data. It has been used in clinical trials, genetics, health care, psychology, environmental health, engineering, economics, and business. ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6728

    authors: Ibrahim JG,Chen MH,Gwon Y,Chen F

    update_date:2015-12-10 00:00:00

  • Stochastically curtailed phase II clinical trials.

    abstract::Phase II trials often test the null hypothesis H(0): p ≤ p(0) versus H(1): p ≥ p(1), where p is the true unknown proportion responding to the new treatment, p(0) is the greatest response proportion which is deemed clinically ineffective, and p(1) is the smallest response proportion which is deemed clinically effe...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2653

    authors: Ayanlowo AO,Redden DT

    update_date:2007-03-30 00:00:00

  • Reducing cost in sequential testing: a limit of indifference approach.

    abstract::In noninferiority studies, a limit of indifference is used to express a tolerance in results such that the clinician would regard such results as being acceptable or 'not worse'. We applied this concept to a measure of accuracy, the Receiver Operating Characteristic (ROC) curve, for a sequence of tests. We expressed a...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5741

    authors: Ahmed AE,Schubert CM,McClish DK

    update_date:2013-07-20 00:00:00

  • Graphical model checking with correlated response data.

    abstract::Correlated response data arise often in biomedical studies. The generalized estimation equation (GEE) approach is widely used in regression analysis for such data. However, there are few methods available to check the adequacy of regression models in GEE. In this paper, a graphical method is proposed based on Cook and...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.889

    authors: Pan W,Connett JE,Porzio GC,Weisberg S

    update_date:2001-10-15 00:00:00

  • A semi-Bayes approach to the analysis of correlated multiple associations, with an application to an occupational cancer-mortality study.

    abstract::Thomas et al. presented the application of empirical-Bayes methods to the problem of multiple inference in epidemiologic studies. One limitation of their approach, which they noted, was the need to assume exchangeable log relative-risk parameters, and independent relative-risk estimates. Numerical integration was also...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780110208

    authors: Greenland S

    update_date:1992-01-30 00:00:00

  • Comparison of operational characteristics for binary tests with clustered data.

    abstract::Although statistical methodology is well-developed for comparing diagnostic tests in terms of their sensitivity and specificity, comparative inference about predictive values is not. In this paper, we consider the analysis of studies comparing operating characteristics of two diagnostic tests that are measured on all ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6485

    authors: Kwak M,Um SW,Jung SH

    update_date:2015-07-10 00:00:00

  • Estimation of the population effectiveness of vaccination.

    abstract::This paper presents a simple method for estimation of population vaccination effectiveness, which is the fraction of disease cases prevented by a vaccination programme. The method is based on the susceptible-infectious-recovered (SIR) model for the spread of an epidemic in a heterogeneous population under non-homogene...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19970330)16:6<601::aid-sim

    authors: Haber M

    update_date:1997-03-30 00:00:00

  • Multi-state models for colon cancer recurrence and death with a cured fraction.

    abstract::In cancer clinical trials, patients often experience a recurrence of disease prior to the outcome of interest, overall survival. Additionally, for many cancers, there is a cured fraction of the population who will never experience a recurrence. There is often interest in how different covariates affect the probability...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6056

    authors: Conlon AS,Taylor JM,Sargent DJ

    update_date:2014-05-10 00:00:00

  • Additive and multiplicative covariate regression models for relative survival incorporating fractional polynomials for time-dependent effects.

    abstract::Relative survival is used to estimate patient survival excluding causes of death not related to the disease of interest. Rather than using cause of death information from death certificates, which is often poorly recorded, relative survival compares the observed survival to that expected in a matched group from the ge...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2399

    authors: Lambert PC,Smith LK,Jones DR,Botha JL

    update_date:2005-12-30 00:00:00

  • Economic evaluation of factorial randomised controlled trials: challenges, methods and recommendations.

    abstract::Increasing numbers of economic evaluations are conducted alongside randomised controlled trials. Such studies include factorial trials, which randomise patients to different levels of two or more factors and can therefore evaluate the effect of multiple treatments alone and in combination. Factorial trials can provide...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7322

    authors: Dakin H,Gray A

    update_date:2017-08-15 00:00:00

  • Methodological pitfalls in the analysis of contraceptive failure.

    abstract::Although the literature on contraceptive failure is vast and is expanding rapidly, our understanding of the relative efficacy of methods is quite limited because of defects in the research design and in the analytical tools used by investigators. Errors in the literature range from simple arithmetical mistakes to outr...

    journal_title:Statistics in medicine

    pub_type: Journal Article, Review

    doi:10.1002/sim.4780100206

    authors: Trussell J

    update_date:1991-02-01 00:00:00

  • Testing the equality of two survival functions with right truncated data.

    abstract::To compare the survival functions based on right-truncated data, Lagakos et al. proposed a weighted logrank test based on a reverse time scale. This is in contrast to Bilker and Wang, who suggested a semi-parametric version of the Mann-Whitney test by assuming that the distribution of truncation times is known or can ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2556

    authors: Chi Y,Tsai WY,Chiang CL

    update_date:2007-02-20 00:00:00

  • Multiple statistics for multiple events, with application to repeated infections in the growth factor studies.

    abstract::Clinical studies that involve the recording of two or more distinct and well-defined events on each subject give rise to multiple event data. Treatment comparisons are usually reported in univariate analyses of time to first event or number of events observed. However, this approach may not uncover the 'full story' of...

    journal_title:Statistics in medicine

    pub_type: Clinical Trial, Journal Article, Multicenter Study, Randomized Controlled Trial

    doi:10.1002/(sici)1097-0258(19970430)16:8<941::aid-sim

    authors: Barai U,Teoh N

    update_date:1997-04-30 00:00:00

  • Covariate adjusted weighted normal spatial scan statistics with applications to study geographic clustering of obesity and lung cancer mortality in the United States.

    abstract::In the field of cluster detection, a weighted normal model-based scan statistic was recently developed to analyze regional continuous data and to evaluate the clustering pattern of pre-defined cells (such as state, county, tract, school, hospital) that include many individuals. The continuous measures of interest are,...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.3990

    authors: Huang L,Tiwari RC,Pickle LW,Zou Z

    update_date:2010-10-15 00:00:00

  • How should meta-regression analyses be undertaken and interpreted?

    abstract::Appropriate methods for meta-regression applied to a set of clinical trials, and the limitations and pitfalls in interpretation, are insufficiently recognized. Here we summarize recent research focusing on these issues, and consider three published examples of meta-regression in the light of this work. One principal m...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1187

    authors: Thompson SG,Higgins JP

    update_date:2002-06-15 00:00:00

  • Adjusted Kaplan-Meier estimator and log-rank test with inverse probability of treatment weighting for survival data.

    abstract::Estimation and group comparison of survival curves are two very common issues in survival analysis. In practice, the Kaplan-Meier estimates of survival functions may be biased due to unbalanced distribution of confounders. Here we develop an adjusted Kaplan-Meier estimator (AKME) to reduce confounding effects using in...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.2174

    authors: Xie J,Liu C

    update_date:2005-10-30 00:00:00

  • Estimating incidence of dementia subtypes: assessing the impact of missed cases.

    abstract::In many community-based studies on the incidence of dementia, a target population is screened and a subsample is clinically evaluated at baseline and follow-up. Incidence rates are affected by missed cases at both exams and this complicates the estimation of these rates. Recent work proposes a regression-based techniq...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(20000615/30)19:11/12<1577:

    authors: Izmirlian G,Brock D,White L

    update_date:2000-06-15 00:00:00

  • A special case of reduced rank models for identification and modelling of time varying effects in survival analysis.

    abstract::Flexible survival models are in need when modelling data from long term follow-up studies. In many cases, the assumption of proportionality imposed by a Cox model will not be valid. Instead, a model that can identify time varying effects of fixed covariates can be used. Although there are several approaches that deal ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.7088

    authors: Perperoglou A

    update_date:2016-12-10 00:00:00

  • A sexually transmitted infection screening algorithm based on semiparametric regression models.

    abstract::Sexually transmitted infections (STIs) with Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis are among the most common infectious diseases in the United States, disproportionately affecting young women. Because a significant portion of the infections present no symptoms, infection control relies...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6515

    authors: Li Z,Liu H,Tu W

    update_date:2015-09-10 00:00:00

  • Variable selection in covariate dependent random partition models: an application to urinary tract infection.

    abstract::Lower urinary tract symptoms can indicate the presence of urinary tract infection (UTI), a condition that if it becomes chronic requires expensive and time consuming care as well as leading to reduced quality of life. Detecting the presence and gravity of an infection from the earliest symptoms is then highly valuable...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6786

    authors: Barcella W,Iorio MD,Baio G,Malone-Lee J

    update_date:2016-04-15 00:00:00

  • Fast linear mixed model computations for genome-wide association studies with longitudinal data.

    abstract::Genome-wide association studies are characterized by a huge number of statistical tests performed to discover new disease-related genetic variants [in the form of single-nucleotide polymorphisms (SNPs)] in human DNA. Many SNPs have been identified for cross-sectionally measured phenotypes. However, there is a growing ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5517

    authors: Sikorska K,Rivadeneira F,Groenen PJ,Hofman A,Uitterlinden AG,Eilers PH,Lesaffre E

    update_date:2013-01-15 00:00:00

  • Assessing goodness-of-fit of parametric regression models for lifetime data-graphical methods.

    abstract::Graphical methods are often used to check goodness-of-fit of models to data. It is common to plot residuals against a reference distribution so that when the model fits the data, the configuration should be close to a straight line. Since the resemblance to a straight line is often unclear, it has been suggested to ad...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.4780141607

    authors: Cohen A,Barnett O

    update_date:1995-08-30 00:00:00

  • Nonparametric regression of state occupation, entry, exit, and waiting times with multistate right-censored data.

    abstract::We construct nonparametric regression estimators of a number of temporal functions in a multistate system based on a continuous univariate baseline covariate. These estimators include state occupation probabilities, state entry, exit, and waiting (sojourn) time distribution functions of a general progressive (e.g., ac...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5703

    authors: Mostajabi F,Datta S

    update_date:2013-07-30 00:00:00

  • A Markov mixed effect regression model for drug compliance.

    abstract::Patient compliance (adherence) with prescribed medication is often erratic, while clinical outcomes are causally linked to actual, rather than nominal medication dosage. We propose here a hierarchical Markov model for patient compliance. At the first stage, conditional upon individual random effects and a set of indiv...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/(sici)1097-0258(19981030)17:20<2313::aid-s

    authors: Girard P,Blaschke TF,Kastrissios H,Sheiner LB

    update_date:1998-10-30 00:00:00

  • An improved algorithm for outbreak detection in multiple surveillance systems.

    abstract::In England and Wales, a large-scale multiple statistical surveillance system for infectious disease outbreaks has been in operation for nearly two decades. This system uses a robust quasi-Poisson regression algorithm to identify aberrances in weekly counts of isolates reported to the Health Protection Agency. In this...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.5595

    authors: Noufaily A,Enki DG,Farrington P,Garthwaite P,Andrews N,Charlett A

    update_date:2013-03-30 00:00:00

  • Correcting for the dependent competing risk of treatment using inverse probability of censoring weighting and copulas in the estimation of natural conception chances.

    abstract::When estimating the probability of natural conception from observational data on couples with an unfulfilled child wish, the start of assisted reproductive therapy (ART) is a competing event that cannot be assumed to be independent of natural conception. In clinical practice, interest lies in the probability of natura...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.6280

    authors: van Geloven N,Geskus RB,Mol BW,Zwinderman AH

    update_date:2014-11-20 00:00:00

  • Design and analysis of non-inferiority mortality trials in oncology.

    abstract::The recent revision of the Declaration of Helsinki and the existence of many new therapies that affect survival or serious morbidity, and that therefore cannot be denied patients, have generated increased interest in active-control trials, particularly those intended to show equivalence or non-inferiority to the activ...

    journal_title:Statistics in medicine

    pub_type: Journal Article

    doi:10.1002/sim.1400

    authors: Rothmann M,Li N,Chen G,Chi GY,Temple R,Tsou HH

    update_date:2003-01-30 00:00:00