
Combining nearest neighbor classifiers versus cross-validation selection.

Abstract:

Various discriminant methods have been applied to classify tumors based on gene expression profiles; among them, the nearest neighbor (NN) method has been reported to perform relatively well. Cross-validation (CV) is usually used to select both the neighbor size and the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single "winner," we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with that of CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.
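The abstract contrasts two ways of using a family of NN rules indexed by neighbor size k and number of genes p: keep only the CV "winner," or combine all rules with weights. The sketch below illustrates that contrast in plain Python under stated assumptions. It is not the authors' weighting method, which the abstract does not specify: the between/within sum-of-squares gene ranking (`rank_features`), leave-one-out error estimation (`loo_errors`), and the exponential weights `exp(-lam * error)` with the hypothetical tuning constant `lam` are all illustrative choices.

```python
import math
import random
from collections import Counter


def rank_features(X, y):
    """Order feature indices by between/within-class sum-of-squares ratio
    (an assumed gene-screening rule, not necessarily the paper's)."""
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, lab in zip(col, y) if lab == c]
            mean = sum(vals) / len(vals)
            between += len(vals) * (mean - overall) ** 2
            within += sum((v - mean) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))  # guard against zero variance
    return sorted(range(len(X[0])), key=lambda f: -scores[f])


def knn_predict(X, y, x, k, features):
    """Majority vote among the k nearest training rows (squared Euclidean
    distance on the selected features); vote ties go to the smallest label."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), lab)
        for row, lab in zip(X, y)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return max(sorted(votes), key=lambda c: votes[c])


def loo_errors(X, y, candidates):
    """Leave-one-out error rate of every (k, p) rule; genes are re-ranked
    inside each fold so the screening step is part of what is validated."""
    errors = [0] * len(candidates)
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        order = rank_features(tr_X, tr_y)
        for j, (k, p) in enumerate(candidates):
            if knn_predict(tr_X, tr_y, X[i], k, order[:p]) != y[i]:
                errors[j] += 1
    return [e / len(X) for e in errors]


def cv_select_predict(X, y, x, candidates, errors):
    """CV selection: use only the rule with the lowest estimated error."""
    k, p = candidates[min(range(len(errors)), key=errors.__getitem__)]
    return knn_predict(X, y, x, k, rank_features(X, y)[:p])


def cv_weights(errors, lam=20.0):
    """Hypothetical exponential weights: lower CV error -> larger weight."""
    raw = [math.exp(-lam * e) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


def combined_predict(X, y, x, candidates, weights):
    """Combination: every rule casts a vote worth its weight."""
    order = rank_features(X, y)
    score = Counter()
    for (k, p), w in zip(candidates, weights):
        score[knn_predict(X, y, x, k, order[:p])] += w
    return max(sorted(score), key=lambda c: score[c])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 30 samples, 20 "genes"; only gene 0 depends on the class.
    X, y = [], []
    for i in range(30):
        row = [rng.gauss(0, 1) for _ in range(20)]
        row[0] += 3.0 * (i % 2)
        X.append(row)
        y.append(i % 2)
    candidates = [(k, p) for k in (1, 3, 5) for p in (1, 5, 20)]
    errs = loo_errors(X, y, candidates)
    w = cv_weights(errs)
    x_new = [rng.gauss(0, 1) for _ in range(20)]
    x_new[0] += 3.0
    print("CV-selected:", cv_select_predict(X, y, x_new, candidates, errs))
    print("combined:", combined_predict(X, y, x_new, candidates, w))
```

Because each rule's weight decays smoothly with its CV error, candidates with nearly equal error share the vote instead of one being picked on a small error difference, which is roughly the unstable-selection situation the abstract describes.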

journal_name

Stat Appl Genet Mol Biol

journal_title

Statistical applications in genetics and molecular biology

authors

Paik M,Yang Y

doi

10.2202/1544-6115.1054

subject

Has Abstract

pub_date

2004-01-01 00:00:00

pages

Article12

eissn

2194-6302

issn

1544-6115

journal_volume

3

pub_type

Journal Article

Related articles

All articles in Statistical Applications in Genetics and Molecular Biology

On an extended interpretation of linkage disequilibrium in genetic case-control association studies.
abstract: We are concerned with statistical inference for 2 × C × K contingency tables in the context of genetic case-control association studies. Multivariate methods based on asymptotic Gaussianity of vectors of test statistics require information about the asymptotic correlation structure among these test statistics under th...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2015-0024

authors: Dickhaus T,Stange J,Demirhan H

Updated: 2015-11-01 00:00:00
Sparse inverse of covariance matrix of QTL effects with incomplete marker data.
abstract: Gametic models for fitting breeding values at QTL as random effects in outbred populations have become popular because they require few assumptions about the number and distribution of QTL alleles segregating. The covariance matrix of the gametic effects has an inverse that is sparse and can be constructed rapidly by ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1048

authors: Thallman RM,Hanford KJ,Kachman SD,Van Vleck LD

Updated: 2004-01-01 00:00:00
Approximating the variance of the conditional probability of the state of a hidden Markov model.
abstract: In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe dire...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article, Review

doi: 10.2202/1544-6115.1296

authors: Siegmund DO,Yakir B

Updated: 2007-01-01 00:00:00
Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics.
abstract: The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2018-0065

authors: Crook OM,Gatto L,Kirk PDW

Updated: 2019-12-12 00:00:00
Node sampling for protein complex estimation in bait-prey graphs.
abstract: In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data freque...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2015-0007

authors: Scholtens DM,Spencer BD

Updated: 2015-08-01 00:00:00
LCox: a tool for selecting genes related to survival outcomes using longitudinal gene expression data.
abstract: Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, fo...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2017-0060

authors: Sun J,Herazo-Maya JD,Wang JL,Kaminski N,Zhao H

Updated: 2019-02-13 00:00:00
TopKLists: a comprehensive R package for statistical inference, stochastic aggregation, and visualization of multiple omics ranked lists.
abstract: High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there is an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite di...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2014-0093

authors: Schimek MG,Budinská E,Kugler KG,Švendová V,Ding J,Lin S

Updated: 2015-06-01 00:00:00
Buckley-James boosting for survival analysis with high-dimensional biomarker data.
abstract: There has been increasing interest in predicting patients' survival after therapy by investigating gene expression microarray data. In the regression and classification models with high-dimensional genomic data, boosting has been successfully applied to build accurate predictive models and conduct variable selection s...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1550

authors: Wang Z,Wang CY

Updated: 2010-01-01 00:00:00
Combining dependent p-values by gamma distributions.
abstract: Combining correlated p-values from multiple hypothesis testing is a most frequently used method for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the cor...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2019-0057

authors: Chien LC

Updated: 2020-11-06 00:00:00
A Bayesian approach to estimation and testing in time-course microarray experiments.
abstract: The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time series microarray data. The method allows one to identify differentially expressed genes in a time-course microarray experiment, to rank them and to estimate their expression profiles. Each gene expressio...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1299

authors: Angelini C,De Canditiis D,Mutarelli M,Pensky M

Updated: 2007-01-01 00:00:00
Discrete Wavelet Packet Transform Based Discriminant Analysis for Whole Genome Sequences.
abstract: In recent years, alignment-free methods have been widely applied in comparing genome sequences, as these methods compute efficiently and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods for finding phylogenetic trees. However, it may no...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2018-0045

authors: Huang HH,Girimurugan SB

Updated: 2019-02-15 00:00:00
Dimension reduction for classification with gene expression microarray data.
abstract: An important application of gene expression microarray data is classification of biological samples or prediction of clinical and other outcomes. One necessary part of multivariate statistical analysis in such applications is dimension reduction. This paper provides a comparison study of three dimension reduction tech...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1147

authors: Dai JJ,Lieu L,Rocke D

Updated: 2006-01-01 00:00:00
Likelihood-based inference for multi-color optical mapping.
abstract: Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognitio...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1266

authors: Tong L,Mets L,McPeek MS

Updated: 2007-01-01 00:00:00
A probabilistic approach to large-scale association scans: a semi-Bayesian method to detect disease-predisposing alleles.
abstract: Recent analytic and technological breakthroughs have set the stage for genome-wide linkage disequilibrium studies to map disease-susceptibility variants. This paper discusses a probabilistic methodology for making disease-mapping inferences in large-scale case-control genetic studies. The semi-Bayesian approach promot...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1168

authors: Schrodi SJ

Updated: 2005-01-01 00:00:00
Multiple testing in candidate gene situations: a comparison of classical, discrete, and resampling-based procedures.
abstract: In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1729

authors: Elsäßer A,Victor A,Hommel G

Updated: 2011-01-01 00:00:00
Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.
abstract: Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these m...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1666

authors: Neuwald AF

Updated: 2011-01-01 00:00:00
A multiple testing approach to high-dimensional association studies with an application to the detection of associations between risk factors of heart disease and genetic polymorphisms.
abstract: We present an approach to association studies involving a dozen or so 'response' variables and a few hundred 'explanatory' variables which emphasizes transparency, simplicity, and protection against spurious results. The methods proposed are largely non-parametric, and they are systematically rounded-off by the Benjam...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1420

authors: Ferreira JA,Berkhof J,Souverein O,Zwinderman K

Updated: 2009-01-01 00:00:00
Weighted-LASSO for structured network inference from time course data.
abstract: We present a weighted-LASSO method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to own prior internal structures of connectivity which drive the inference method. This ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1519

authors: Charbonnier C,Chiquet J,Ambroise C

Updated: 2010-01-01 00:00:00
Comparison and visualisation of agreement for paired lists of rankings.
abstract: Output from analysis of a high-throughput 'omics' experiment very often is a ranked list. One commonly encountered example is a ranked list of differentially expressed genes from a gene expression experiment, with a length of many hundreds of genes. There are numerous situations where interest is in the comparison of ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2016-0036

authors: Donald MR,Wilson SR

Updated: 2017-03-01 00:00:00
Detecting outlier samples in microarray data.
abstract: In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1426

authors: Shieh AD,Hung YS

Updated: 2009-01-01 00:00:00
M-quantile regression analysis of temporal gene expression data.
abstract: In this paper, we explore the use of M-quantile regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application of temporal gene expression data. Here, the aim is to detect genes whose temporal...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1452

authors: Vinciotti V,Yu K

Updated: 2009-01-01 00:00:00
Accommodating uncertainty in a tree set for function estimation.
abstract: Multiple branching trees have been used to model the acquisition of HIV drug resistance mutations, and several different algorithms have been developed to construct the tree set that best describes the data. These algorithms have mainly focused on the structure of the tree set. The focal point of this paper is estimat...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1324

authors: Healy BC,DeGruttola VG,Hu C

Updated: 2008-01-01 00:00:00
A method to increase the power of multiple testing procedures through sample splitting.
abstract: Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the e...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1148

authors: Rubin D,Dudoit S,van der Laan M

Updated: 2006-01-01 00:00:00
Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation.
abstract: Making sound proteomic inferences using ELISA microarray assay requires both an accurate prediction of protein concentration and a credible estimate of its error. We present a method using monotonic spline statistical models (MS), penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to pr...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1364

authors: Daly DS,Anderson KK,White AM,Gonzalez RM,Varnum SM,Zangar RC

Updated: 2008-01-01 00:00:00
Sampling correction in pedigree analysis.
abstract: Usually, a pedigree is sampled and included in the sample that is analyzed after following a predefined non-random sampling design comprising several specific procedures. To obtain a pedigree analysis result free from the bias caused by the sampling procedures, a correction is applied to the pedigree likelihood. The s...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1003

authors: Ginsburg E,Malkin I,Elston RC

Updated: 2003-01-01 00:00:00
Fully Bayesian mixture model for differential gene expression: simulations and model checks.
abstract: We present a Bayesian hierarchical model for detecting differentially expressed genes using a mixture prior on the parameters representing differential effects. We formulate an easily interpretable 3-component mixture to classify genes as over-expressed, under-expressed and non-differentially expressed, and model gene...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1314

authors: Lewin A,Bochkina N,Richardson S

Updated: 2007-01-01 00:00:00
MLML2R: an R package for maximum likelihood estimation of DNA methylation and hydroxymethylation proportions.
abstract: Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level, requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-Assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.1515/sagmb-2018-0031

authors: Kiihl SF,Martinez-Garrido MJ,Domingo-Relloso A,Bermudez J,Tellez-Plaza M

Updated: 2019-01-17 00:00:00
The cyclohedron test for finding periodic genes in time course expression studies.
abstract: The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebr...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1286

authors: Morton J,Pachter L,Shiu A,Sturmfels B

Updated: 2007-01-01 00:00:00
Model selection based on FDR-thresholding optimizing the area under the ROC-curve.
abstract: We evaluate variable selection by multiple tests controlling the false discovery rate (FDR) to build a linear score for prediction of clinical outcome in high-dimensional data. Quality of prediction is assessed by the receiver operating characteristic curve (ROC) for prediction in independent patients. Thus we try to ...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1462

authors: Graf AC,Bauer P

Updated: 2009-01-01 00:00:00
Transmission disequilibrium test power and sample size in the presence of locus heterogeneity.
abstract: Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this re...

journal_title: Statistical applications in genetics and molecular biology

pub_type: Journal Article

doi: 10.2202/1544-6115.1501

authors: Chen C,Yang G,Buyske S,Matise T,Finch SJ,Gordon D

Updated: 2009-01-01 00:00:00