Abstract:
In recent years, alignment-free methods have been widely applied to compare genome sequences, because they are computationally efficient and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods to find phylogenetic trees. However, it may not be suitable to apply these alignment-free methods directly to existing statistical classification methods, because an appropriate statistical classification theory for integrating with the alignment-free representation methods is still lacking. In this article, we propose a discriminant analysis method that uses the discrete wavelet packet transform to classify whole genome sequences. The proposed alignment-free feature statistics asymptotically follow a joint normal distribution. The data analysis results indicate that the proposed method provides satisfactory classification results in real time.
journal_name: Stat Appl Genet Mol Biol
authors: Huang HH, Girimurugan SB
doi: 10.1515/sagmb-2018-0045
subject: Has Abstract
pub_date: 2019-02-15 00:00:00
issue: 2
eissn: 2194-6302
issn: 1544-6115
pii: /j/sagmb.ahead-of-print/sagmb-2018-0045/sagmb-2018
journal_volume: 18
pub_type: journal article
abstract::The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2018-0065
update_date:2019-12-12 00:00:00
abstract::Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-Assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2018-0031
update_date:2019-01-17 00:00:00
abstract::In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1426
update_date:2009-01-01 00:00:00
abstract::Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, this data generated by mass spectrometry has many missing values resulting when a com...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2013-0021
update_date:2013-12-01 00:00:00
abstract::An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction tech...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1147
update_date:2006-01-01 00:00:00
abstract::In this paper, we explore the use of M-quantile regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application of temporal gene expression data. Here, the aim is to detect genes whose temporal...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1452
update_date:2009-01-01 00:00:00
abstract::We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1314
update_date:2007-01-01 00:00:00
abstract::Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noises, compounded wit...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2018-0039
update_date:2019-05-11 00:00:00
abstract::Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth index...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1036
update_date:2004-01-01 00:00:00
abstract::Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this re...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1501
update_date:2009-01-01 00:00:00
abstract::Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established conce...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2017-0038
update_date:2018-02-17 00:00:00
abstract::Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorp...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1709
update_date:2011-10-04 00:00:00
abstract::With the increasing availability of experimental data on gene interactions, modeling of gene regulatory pathways has gained special attention. Gradient descent algorithms have been widely used for regression and classification applications. Unfortunately, results obtained after training a model by gradient descent are...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2012-0021
update_date:2014-02-01 00:00:00
abstract::Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the e...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1148
update_date:2006-01-01 00:00:00
abstract::The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebr...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1286
update_date:2007-01-01 00:00:00
abstract::In this study, we propose a novel statistical framework for detecting progressive changes in molecular traits as a response to a pathogenic stimulus. In particular, we propose to employ Bayesian hierarchical models to analyse changes in mean level, variance and correlation of metabolic traits in relation to covariates. ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2013-0041
update_date:2014-04-01 00:00:00
abstract::In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data freque...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2015-0007
update_date:2015-08-01 00:00:00
abstract::Multiple branching trees have been used to model the acquisition of HIV drug resistance mutations, and several different algorithms have been developed to construct the tree set that best describes the data. These algorithms have mainly focused on the structure of the tree set. The focal point of this paper is estimat...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1324
update_date:2008-01-01 00:00:00
abstract::Multiple testing procedures are commonly used in gene expression studies for the detection of differential expression, where typically thousands of genes are measured over at least two experimental conditions. Given the need for powerful testing procedures, and the attendant danger of false positives in multiple testi...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1302
update_date:2007-01-01 00:00:00
abstract::We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1462
update_date:2009-01-01 00:00:00
abstract::Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognitio...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1266
update_date:2007-01-01 00:00:00
abstract::Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN metho...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1054
update_date:2004-01-01 00:00:00
abstract::The approach adopted involved two stages. First, the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores a...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1350
update_date:2008-01-01 00:00:00
abstract::Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using informa...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1063
update_date:2004-01-01 00:00:00
abstract::We address a potential shortcoming of three probabilistic models for detecting interspecific recombination in DNA sequence alignments: the multiple change-point model (MCP) of Suchard et al. (2003), the dual multiple change-point model (DMCP) of Minin et al. (2005), and the phylogenetic factorial hidden Markov model (...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1399
update_date:2008-01-01 00:00:00
abstract::In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account ...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1729
update_date:2011-01-01 00:00:00
abstract::Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, fo...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2017-0060
update_date:2019-02-13 00:00:00
abstract::Usually, a pedigree is sampled and included in the sample that is analyzed after following a predefined non-random sampling design comprising several specific procedures. To obtain a pedigree analysis result free from the bias caused by the sampling procedures, a correction is applied to the pedigree likelihood. The s...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1003
update_date:2003-01-01 00:00:00
abstract::The genetic control of a complex trait can be studied by testing and mapping the genotypes of the underlying quantitative trait loci (QTLs) through their associations with observable marker genotypes. All existing statistical methods for QTL mapping assume an equilibrium population, allowing marker-QTL associations to...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.2202/1544-6115.1578
update_date:2010-01-01 00:00:00
abstract::Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effect of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy number mediated gene-gene interactions differ between groups of samples we propose a new method, n...
journal_title:Statistical applications in genetics and molecular biology
pub_type: journal article
doi:10.1515/sagmb-2017-0058
update_date:2018-07-31 00:00:00