TopKLists: a comprehensive R package for statistical inference, stochastic aggregation, and visualization of multiple omics ranked lists.

Abstract:

High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, they have generated an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Although the technologies differ, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format - rankings - for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented to provide access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.

authors

Schimek MG,Budinská E,Kugler KG,Švendová V,Ding J,Lin S

doi

10.1515/sagmb-2014-0093

subject

Has Abstract

pub_date

2015-06-01 00:00:00

pages

311-6

issue

3

eissn

2194-6302

issn

1544-6115

pii

/j/sagmb.ahead-of-print/sagmb-2014-0093/sagmb-2014

journal_volume

14

pub_type

Journal Article
  • Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

    abstract::Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these m...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1666

    authors: Neuwald AF

    update_date:2011-01-01 00:00:00

  • Second order optimization for the inference of gene regulatory pathways.

    abstract::With the increasing availability of experimental data on gene interactions, modeling of gene regulatory pathways has gained special attention. Gradient descent algorithms have been widely used for regression and classification applications. Unfortunately, results obtained after training a model by gradient descent are...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2012-0021

    authors: Das M,Murthy CA,De RK

    update_date:2014-02-01 00:00:00

  • MLML2R: an R package for maximum likelihood estimation of DNA methylation and hydroxymethylation proportions.

    abstract::Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-Assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2018-0031

    authors: Kiihl SF,Martinez-Garrido MJ,Domingo-Relloso A,Bermudez J,Tellez-Plaza M

    update_date:2019-01-17 00:00:00

  • Accommodating uncertainty in a tree set for function estimation.

    abstract::Multiple branching trees have been used to model the acquisition of HIV drug resistance mutations, and several different algorithms have been developed to construct the tree set that best describes the data. These algorithms have mainly focused on the structure of the tree set. The focal point of this paper is estimat...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1324

    authors: Healy BC,DeGruttola VG,Hu C

    update_date:2008-01-01 00:00:00

  • A test for detecting differential indirect trans effects between two groups of samples.

    abstract::Integrative analysis of copy number and gene expression data can help in understanding the cis and trans effect of copy number aberrations on transcription levels of genes involved in a pathway. To analyse how these copy number mediated gene-gene interactions differ between groups of samples we propose a new method, n...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0058

    authors: Chaturvedi N,Menezes RX,Goeman JJ,Wieringen WV

    update_date:2018-07-31 00:00:00

  • On an extended interpretation of linkage disequilibrium in genetic case-control association studies.

    abstract::We are concerned with statistical inference for 2 × C × K contingency tables in the context of genetic case-control association studies. Multivariate methods based on asymptotic Gaussianity of vectors of test statistics require information about the asymptotic correlation structure among these test statistics under th...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2015-0024

    authors: Dickhaus T,Stange J,Demirhan H

    update_date:2015-11-01 00:00:00

  • BayesMendel: an R environment for Mendelian risk prediction.

    abstract::Several important syndromes are caused by deleterious germline mutations of individual genes. In both clinical and research applications it is useful to evaluate the probability that an individual carries an inherited genetic variant of these genes, and to predict the risk of disease for that individual, using informa...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1063

    authors: Chen S,Wang W,Broman KW,Katki HA,Parmigiani G

    update_date:2004-01-01 00:00:00

  • Genetic linkage analysis in the presence of germline mosaicism.

    abstract::Germline mosaicism is a genetic condition in which some germ cells of an individual contain a mutation. This condition violates the assumptions underlying classic genetic analysis and may lead to failure of such analysis. In this work we extend the statistical model used for genetic linkage analysis in order to incorp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1709

    authors: Weissbrod O,Geiger D

    update_date:2011-10-04 00:00:00

  • Modeling, simulation and analysis of methylation profiles from reduced representation bisulfite sequencing experiments.

    abstract::The ENCODE project has funded the generation of a diverse collection of methylation profiles using reduced representation bisulfite sequencing (RRBS) technology, enabling the analysis of epigenetic variation on a genomic scale at single-site resolution. A standard application of RRBS experiments is in the location of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0027

    authors: Lacey MR,Baribault C,Ehrlich M

    update_date:2013-12-01 00:00:00

  • Node sampling for protein complex estimation in bait-prey graphs.

    abstract::In cellular biology, node-and-edge graph or "network" data collection often uses bait-prey technologies such as co-immunoprecipitation (CoIP). Bait-prey technologies assay relationships or "interactions" between protein pairs, with CoIP specifically measuring protein complex co-membership. Analyses of CoIP data freque...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2015-0007

    authors: Scholtens DM,Spencer BD

    update_date:2015-08-01 00:00:00

  • Asymptotic optimality of likelihood-based cross-validation.

    abstract::Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth index...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1036

    authors: van der Laan MJ,Dudoit S,Keles S

    update_date:2004-01-01 00:00:00

  • Combining nearest neighbor classifiers versus cross-validation selection.

    abstract::Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN metho...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1054

    authors: Paik M,Yang Y

    update_date:2004-01-01 00:00:00

  • Empirical bayes microarray ANOVA and grouping cell lines by equal expression levels.

    abstract::In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which e...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1125

    authors: Lönnstedt I,Rimini R,Nilsson P

    update_date:2005-01-01 00:00:00

  • The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix.

    abstract::Approaches based upon sequence weights, to construct a position weight matrix of nucleotides from aligned inputs, are popular but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequen...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1135

    authors: Newberg LA,McCue LA,Lawrence CE

    update_date:2005-01-01 00:00:00

  • Principal component discriminant analysis.

    abstract::The approach adopted involved two stages. First the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores a...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1350

    authors: Fearn T

    update_date:2008-01-01 00:00:00

  • Transmission disequilibrium test power and sample size in the presence of locus heterogeneity.

    abstract::Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this re...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1501

    authors: Chen C,Yang G,Buyske S,Matise T,Finch SJ,Gordon D

    update_date:2009-01-01 00:00:00

  • Variance and covariance heterogeneity analysis for detection of metabolites associated with cadmium exposure.

    abstract::In this study, we propose a novel statistical framework for detecting progressive changes in molecular traits as response to a pathogenic stimulus. In particular, we propose to employ Bayesian hierarchical models to analyse changes in mean level, variance and correlation of metabolic traits in relation to covariates. ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0041

    authors: Salamanca BV,Ebbels TM,Iorio MD

    update_date:2014-04-01 00:00:00

  • Approximating the variance of the conditional probability of the state of a hidden Markov model.

    abstract::In a hidden Markov model, one "estimates" the state of the hidden Markov chain at t by computing via the forwards-backwards algorithm the conditional distribution of the state vector given the observed data. The covariance matrix of this conditional distribution measures the information lost by failure to observe dire...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article, Review

    doi:10.2202/1544-6115.1296

    authors: Siegmund DO,Yakir B

    update_date:2007-01-01 00:00:00

  • Accounting for undetected compounds in statistical analyses of mass spectrometry 'omic studies.

    abstract::Mass spectrometry is an important high-throughput technique for profiling small molecular compounds in biological samples and is widely used to identify potential diagnostic and prognostic compounds associated with disease. Commonly, this data generated by mass spectrometry has many missing values resulting when a com...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2013-0021

    authors: Taylor SL,Leiserowitz GS,Kim K

    update_date:2013-12-01 00:00:00

  • Comparison and visualisation of agreement for paired lists of rankings.

    abstract::Output from analysis of a high-throughput 'omics' experiment very often is a ranked list. One commonly encountered example is a ranked list of differentially expressed genes from a gene expression experiment, with a length of many hundreds of genes. There are numerous situations where interest is in the comparison of ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2016-0036

    authors: Donald MR,Wilson SR

    update_date:2017-03-01 00:00:00

  • A method to increase the power of multiple testing procedures through sample splitting.

    abstract::Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the e...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1148

    authors: Rubin D,Dudoit S,van der Laan M

    update_date:2006-01-01 00:00:00

  • Mapping quantitative trait loci in a non-equilibrium population.

    abstract::The genetic control of a complex trait can be studied by testing and mapping the genotypes of the underlying quantitative trait loci (QTLs) through their associations with observable marker genotypes. All existing statistical methods for QTL mapping assume an equilibrium population, allowing marker-QTL associations to...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1578

    authors: Wu S,Yang J,Wu R

    update_date:2010-01-01 00:00:00

  • Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting.

    abstract::Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established conce...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2017-0038

    authors: Dazard JE,Ishwaran H,Mehlotra R,Weinberg A,Zimmerman P

    update_date:2018-02-17 00:00:00

  • Empirical bayes estimation of a sparse vector of gene expression changes.

    abstract::Gene microarray technology is often used to compare the expression of thousands of genes in two different cell lines. Typically, one does not expect measurable changes in transcription amounts for a large number of genes; furthermore, the noise level of array experiments is rather high in relation to the available numb...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1132

    authors: Erickson S,Sabatti C

    update_date:2005-01-01 00:00:00

  • Likelihood-based inference for multi-color optical mapping.

    abstract::Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognitio...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1266

    authors: Tong L,Mets L,McPeek MS

    update_date:2007-01-01 00:00:00

  • A Bayesian approach to estimation and testing in time-course microarray experiments.

    abstract::The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time series microarray data. The method allows one to identify differentially expressed genes in a time-course microarray experiment, to rank them and to estimate their expression profiles. Each gene expressio...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1299

    authors: Angelini C,De Canditiis D,Mutarelli M,Pensky M

    update_date:2007-01-01 00:00:00

  • Detecting outlier samples in microarray data.

    abstract::In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to exp...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1426

    authors: Shieh AD,Hung YS

    update_date:2009-01-01 00:00:00

  • Discrete Wavelet Packet Transform Based Discriminant Analysis for Whole Genome Sequences.

    abstract::In recent years, alignment-free methods have been widely applied in comparing genome sequences, as these methods compute efficiently and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods for finding phylogenetic trees. However, it may no...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.1515/sagmb-2018-0045

    authors: Huang HH,Girimurugan SB

    update_date:2019-02-15 00:00:00

  • Weighted-LASSO for structured network inference from time course data.

    abstract::We present a weighted-LASSO method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to own prior internal structures of connectivity which drive the inference method. This ...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1519

    authors: Charbonnier C,Chiquet J,Ambroise C

    update_date:2010-01-01 00:00:00

  • A multiple testing approach to high-dimensional association studies with an application to the detection of associations between risk factors of heart disease and genetic polymorphisms.

    abstract::We present an approach to association studies involving a dozen or so 'response' variables and a few hundred 'explanatory' variables which emphasizes transparency, simplicity, and protection against spurious results. The methods proposed are largely non-parametric, and they are systematically rounded-off by the Benjam...

    journal_title:Statistical applications in genetics and molecular biology

    pub_type: Journal Article

    doi:10.2202/1544-6115.1420

    authors: Ferreira JA,Berkhof J,Souverein O,Zwinderman K

    update_date:2009-01-01 00:00:00