Abstract:
There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection simultaneously. We propose Buckley-James boosting for semiparametric accelerated failure time models with right-censored survival data, which can be used to predict the survival of future patients from high-dimensional genomic data. In the spirit of the adaptive LASSO, twin boosting is also incorporated to fit sparser models. The proposed methods offer a unified approach to fitting linear models and non-linear effects models with possible interactions, and they perform variable selection and parameter estimation simultaneously. The proposed methods are evaluated by simulations and applied to a recent microarray gene expression data set for patients with diffuse large B-cell lymphoma under the current gold standard therapy.
journal_name: Stat Appl Genet Mol Biol
authors: Wang Z, Wang CY
doi: 10.2202/1544-6115.1550
subject: Has Abstract
pub_date: 2010-01-01 00:00:00
pages: Article24
eissn: 2194-6302
issn: 1544-6115
journal_volume: 9
pub_type: Journal Article
abstract::Many gene- and pathway-based association tests have been proposed in the literature. Among them, the SKAT is widely used, especially for rare variants association studies. In this paper, we investigate the connection between SKAT and a principal component analysis. This investigation leads to a procedure that encompas...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2016-0061
update_date:2017-07-26 00:00:00
abstract::Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these m...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1666
update_date:2011-01-01 00:00:00
abstract::We are concerned with statistical inference for 2 × C × K contingency tables in the context of genetic case-control association studies. Multivariate methods based on asymptotic Gaussianity of vectors of test statistics require information about the asymptotic correlation structure among these test statistics under th...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2015-0024
update_date:2015-11-01 00:00:00
abstract::The ENCODE project has funded the generation of a diverse collection of methylation profiles using reduced representation bisulfite sequencing (RRBS) technology, enabling the analysis of epigenetic variation on a genomic scale at single-site resolution. A standard application of RRBS experiments is in the location of ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2013-0027
update_date:2013-12-01 00:00:00
abstract::The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebr...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1286
update_date:2007-01-01 00:00:00
abstract::We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1462
update_date:2009-01-01 00:00:00
abstract::Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognitio...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1266
update_date:2007-01-01 00:00:00
abstract::Combining correlated p-values from multiple hypothesis testing is a most frequently used method for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the cor...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2019-0057
update_date:2020-11-06 00:00:00
abstract::In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1426
update_date:2009-01-01 00:00:00
abstract::In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe dire...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article, Review
doi:10.2202/1544-6115.1296
update_date:2007-01-01 00:00:00
abstract::We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1314
update_date:2007-01-01 00:00:00
abstract::In this study, we propose a novel statistical framework for detecting progressive changes in molecular traits as response to a pathogenic stimulus. In particular, we propose to employ Bayesian hierarchical models to analyse changes in mean level, variance and correlation of metabolic traits in relation to covariates. ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2013-0041
update_date:2014-04-01 00:00:00
abstract::In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which e...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1125
update_date:2005-01-01 00:00:00
abstract::In recent years, alignment-free methods have been widely applied in comparing genome sequences, as these methods compute efficiently and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods for finding phylogenetic trees. However, it may no...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2018-0045
update_date:2019-02-15 00:00:00
abstract::Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, fo...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2017-0060
update_date:2019-02-13 00:00:00
abstract::Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established conce...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2017-0038
update_date:2018-02-17 00:00:00
abstract::Recent analytic and technological breakthroughs have set the stage for genome-wide linkage disequilibrium studies to map disease-susceptibility variants. This paper discusses a probabilistic methodology for making disease-mapping inferences in large-scale case-control genetic studies. The semi-Bayesian approach promot...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1168
update_date:2005-01-01 00:00:00
abstract::In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods became popular. However, these methods such as Approximate Bayesian Computation (ABC...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2017-0016
update_date:2017-11-27 00:00:00
abstract::High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there is an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite di...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2014-0093
update_date:2015-06-01 00:00:00
abstract::We present a weighted-LASSO method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to own prior internal structures of connectivity which drive the inference method. This ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1519
update_date:2010-01-01 00:00:00
abstract::The approach adopted involved two stages. First the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores a...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1350
update_date:2008-01-01 00:00:00
abstract::In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data freque...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.1515/sagmb-2015-0007
update_date:2015-08-01 00:00:00
abstract::Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorp...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1709
update_date:2011-10-04 00:00:00
abstract::In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1729
update_date:2011-01-01 00:00:00
abstract::We address a potential shortcoming of three probabilistic models for detecting interspecific recombination in DNA sequence alignments: the multiple change-point model (MCP) of Suchard et al. (2003), the dual multiple change-point model (DMCP) of Minin et al. (2005), and the phylogenetic factorial hidden Markov model (...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1399
update_date:2008-01-01 00:00:00
abstract::We present an approach to association studies involving a dozen or so 'response' variables and a few hundred 'explanatory' variables which emphasizes transparency, simplicity, and protection against spurious results. The methods proposed are largely non-parametric, and they are systematically rounded-off by the Benjam...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1420
update_date:2009-01-01 00:00:00
abstract::Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this re...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1501
update_date:2009-01-01 00:00:00
abstract::Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using informa...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1063
update_date:2004-01-01 00:00:00
abstract::The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time series microarray data. The method allows one to identify differentially expressed genes in a time-course microarray experiment, to rank them and to estimate their expression profiles. Each gene expressio...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1299
update_date:2007-01-01 00:00:00
abstract::An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction tech...
journal_title:Statistical applications in genetics and molecular biology
pub_type: Journal Article
doi:10.2202/1544-6115.1147
update_date:2006-01-01 00:00:00