Evaluation of methods for differential expression analysis on multi-group RNA-seq count data.


BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have been reported. However, these evaluations have so far been restricted to two-group comparisons, and comparative studies of multi-group data are also needed.

METHODS: We compare 12 pipelines available in nine R packages for detecting differential expression (DE) from multi-group RNA-seq count data, focusing on three-group data with or without replicates. We evaluate these pipelines using both simulated data and real count data.

RESULTS: The pipelines in the TCC package performed comparably to or better than the other pipelines under various simulation scenarios. TCC implements a multi-step normalization strategy (called DEGES) that internally uses functions provided by other representative packages (edgeR, DESeq2, and so on). For the same real dataset, the number of identified DEGs differed considerably among the pipelines (18.5% to 45.7% of all genes), but the distributions of the classified expression patterns were similar. We also found that DE results can be roughly estimated from the hierarchical dendrogram of sample clustering on the raw count data.

CONCLUSION: We confirmed that the DEGES-based pipelines implemented in TCC performed well in three-group as well as two-group comparisons. We recommend the DEGES-based pipeline that internally uses edgeR (here called the EEE-E pipeline) for count data with replicates, especially when sample sizes are small. For data without replicates, the DEGES-based pipeline with DESeq2 (called SSS-S) can be recommended.
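The DEGES idea (normalize, detect potential DEGs, remove them, then normalize again on the remaining genes) can be illustrated with a minimal Python sketch. This is not TCC's code or API: TCC is an R package that calls edgeR, DESeq2, and others. The components below are stand-ins chosen for brevity and are assumptions: DESeq-style median-of-ratios size factors play the role of the normalization step, a per-gene one-way ANOVA F statistic on log2-transformed normalized counts plays the role of the DE test, and a fixed fraction of top-ranked genes (the hypothetical parameter `remove_frac`) is treated as potential DEGs instead of applying an FDR cutoff. The function names `median_of_ratios`, `anova_f`, and `deges_factors` and the simulated data are likewise hypothetical.

```python
import numpy as np


def median_of_ratios(counts):
    """DESeq-style size factors (stand-in for the normalization step).

    For each sample, take the median over genes of log(count) minus the
    gene's mean log count; only genes with non-zero counts in every sample
    are used. Factors are rescaled to a geometric mean of 1.
    """
    counts = np.asarray(counts, dtype=float)
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    log_ratios = log_counts - log_counts.mean(axis=1, keepdims=True)
    log_factors = np.median(log_ratios, axis=0)
    return np.exp(log_factors - log_factors.mean())


def anova_f(log_expr, groups):
    """Per-gene one-way ANOVA F statistic (stand-in for the DE test)."""
    log_expr = np.asarray(log_expr, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_samples, n_groups = log_expr.shape[1], len(labels)
    grand_mean = log_expr.mean(axis=1)
    ss_between = np.zeros(log_expr.shape[0])
    ss_within = np.zeros(log_expr.shape[0])
    for label in labels:
        block = log_expr[:, groups == label]
        group_mean = block.mean(axis=1)
        ss_between += block.shape[1] * (group_mean - grand_mean) ** 2
        ss_within += ((block - group_mean[:, None]) ** 2).sum(axis=1)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_samples - n_groups)
    # Small constant guards against division by zero for genes with no
    # within-group variation.
    return ms_between / (ms_within + 1e-12)


def deges_factors(counts, groups, remove_frac=0.2, iterations=1):
    """Normalize, flag top-ranked potential DEGs, renormalize on the rest."""
    counts = np.asarray(counts, dtype=float)
    factors = median_of_ratios(counts)
    potential_degs = np.zeros(counts.shape[0], dtype=bool)
    for _ in range(iterations):
        log_expr = np.log2(counts / factors + 1.0)  # normalized log counts
        f_stat = anova_f(log_expr, groups)
        potential_degs = f_stat >= np.quantile(f_stat, 1.0 - remove_frac)
        factors = median_of_ratios(counts[~potential_degs])
    return factors, potential_degs


# Hypothetical simulation: 1000 genes, three groups of three replicates with
# equal sequencing depth, and the first 200 genes up-regulated 4-fold in group C.
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 3)
base_means = rng.uniform(50, 500, size=1000)
means = np.tile(base_means[:, None], (1, groups.size))
means[:200, groups == "C"] *= 4
counts = rng.poisson(means)

naive = median_of_ratios(counts)
factors, potential_degs = deges_factors(counts, groups)
print("single-pass factors:", np.round(naive, 3))
print("DEGES-style factors:", np.round(factors, 3))
print("true DEGs among flagged genes:", potential_degs[:200].sum(), "of", potential_degs.sum())
```

Because the final factors are computed only from genes that do not look differentially expressed, a block of genes up-regulated in one group has less influence on them than in a single-pass normalization. Setting `iterations` above 1 repeats the normalize, test, and remove cycle.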


BMC Bioinformatics




Tang M, Sun J, Shimizu K, Kadota K






2015-11-04










  • Protein network prediction and topological analysis in Leishmania major as a tool for drug target selection.

    abstract:BACKGROUND:Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computationa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Flórez AF, Park D, Bhak J, Kim BC, Kuchinsky A, Morris JH, Espinosa J, Muskus C

    Updated: 2010-09-27

  • FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics.

    abstract:BACKGROUND:High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Chanumolu SK, Albahrani M, Otu HH

    Updated: 2019-08-15

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Zhang GL, Khan AM, Srinivasan KN, Heiny A, Lee K, Kwoh CK, August JT, Brusic V

    Updated: 2008-01-01

  • Prioritizing disease genes with an improved dual label propagation framework.

    abstract:BACKGROUND:Prioritizing disease genes is trying to identify potential disease causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and facilitate drug development. Our motivation is inspired by label propagation algorithm and the false positive protein-protein intera...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Zhang Y, Liu J, Liu X, Fan X, Hong Y, Wang Y, Huang Y, Xie M

    Updated: 2018-02-08

  • 3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

    abstract:BACKGROUND:The reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Affeldt S, Verny L, Isambert H

    Updated: 2016-01-20

  • Epiviz: a view inside the design of an integrated visual analysis software for genomics.

    abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Chelaru F, Corrada Bravo H

    Updated: 2015-01-01

  • Efficient use of unlabeled data for protein sequence classification: a comparative study.

    abstract:BACKGROUND:Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved acc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Kuksa P, Huang PH, Pavlovic V

    Updated: 2009-04-29

  • A new pooling strategy for high-throughput screening: the Shifted Transversal Design.

    abstract:BACKGROUND:In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplicat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Thierry-Mieg N

    Updated: 2006-01-19

  • Challenging popular tools for the annotation of genetic variations with a real case, pathogenic mutations of lysosomal alpha-galactosidase.

    abstract:BACKGROUND:Severity gradation of missense mutations is a big challenge for exome annotation. Predictors of deleteriousness that are most frequently used to filter variants found by next generation sequencing, produce qualitative predictions, but also numerical scores. It has never been tested if these scores correlate ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Cimmaruta C, Citro V, Andreotti G, Liguori L, Cubellis MV, Hay Mele B

    Updated: 2018-11-30

  • Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data.

    abstract:BACKGROUND:Time-course microarray experiments are being increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes differentially expressed in time-course data, measured between different biological conditions. These differentially expressed genes can reveal the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Jonnalagadda S, Srinivasan R

    Updated: 2008-06-06

  • New directions in biomedical text annotation: definitions, guidelines and corpus construction.

    abstract:BACKGROUND:While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We rep...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Wilbur WJ, Rzhetsky A, Shatkay H

    Updated: 2006-07-25

  • Progressive multiple sequence alignment with indel evolution.

    abstract:BACKGROUND:Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modell...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Maiolo M, Zhang X, Gil M, Anisimova M

    Updated: 2018-09-21

  • Bayesian inference of biochemical kinetic parameters using the linear noise approximation.

    abstract:BACKGROUND:Fluorescent and luminescent gene reporters allow us to dynamically quantify changes in molecular species concentration over time on the single cell level. The mathematical modeling of their interaction through multivariate dynamical models requires the development of effective statistical methods to calibrat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Komorowski M, Finkenstädt B, Harper CV, Rand DA

    Updated: 2009-10-19

  • Bayesian models for pooling microarray studies with multiple sources of replications.

    abstract:BACKGROUND:Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Conlon EM, Song JJ, Liu JS

    Updated: 2006-05-05

  • AnyExpress: integrated toolkit for analysis of cross-platform gene expression data using a fast interval matching algorithm.

    abstract:BACKGROUND:Cross-platform analysis of gene expression data requires multiple, intricate processes at different layers with various platforms. However, existing tools handle only a single platform and are not flexible enough to support custom changes, which arise from the new statistical methods, updated versions of refere...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Kim J, Patel K, Jung H, Kuo WP, Ohno-Machado L

    Updated: 2011-03-17

  • Disease candidate gene identification and prioritization using protein interaction networks.

    abstract:BACKGROUND:Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-prot...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Chen J, Aronow BJ, Jegga AG

    Updated: 2009-02-27

  • CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks.

    abstract:BACKGROUND:Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Baumbach J

    Updated: 2007-11-06

  • A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships.

    abstract:BACKGROUND:Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary importantly between species and groups of species, and classical matrices such as those in the BLOSUM series fail to ac...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Lemaitre C, Barré A, Citti C, Tardy F, Thiaucourt F, Sirand-Pugnet P, Thébault P

    Updated: 2011-11-24

  • DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

    abstract:BACKGROUND:XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parame...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Linderman MD, Chia D, Wallace F, Nothaft FA

    Updated: 2019-10-11

  • Identification of functional hubs and modules by converting interactome networks into hierarchical ordering of proteins.

    abstract:BACKGROUND:Protein-protein interactions play a key role in biological processes of proteins within a cell. Recent high-throughput techniques have generated protein-protein interaction data in a genome-scale. A wide range of computational approaches have been applied to interactome network analysis for uncovering functi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Cho YR, Zhang A

    Updated: 2010-04-29

  • Methodology capture: discriminating between the "best" and the rest of community practice.

    abstract:BACKGROUND:The methodologies we use both enable and help define our research. However, as experimental complexity has increased the choice of appropriate methodologies has become an increasingly difficult task. This makes it difficult to keep track of available bioinformatics software, let alone the most suitable proto...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Eales JM, Pinney JW, Stevens RD, Robertson DL

    Updated: 2008-09-01

  • Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

    abstract:BACKGROUND:Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for the evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Wu SH, Rodrigo AG

    Updated: 2015-11-04

  • Rearrangement analysis of multiple bacterial genomes.

    abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Noureen M, Tada I, Kawashima T, Arita M

    Updated: 2019-12-27

  • RECOVIR: an application package to automatically identify some single stranded RNA viruses using capsid protein residues that uniquely distinguish among these viruses.

    abstract:BACKGROUND:Most single stranded RNA (ssRNA) viruses mutate rapidly to generate large number of strains having highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similar...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Zhu D, Fox GE, Chakravarty S

    Updated: 2007-10-10

  • FocAn: automated 3D analysis of DNA repair foci in image stacks acquired by confocal fluorescence microscopy.

    abstract:BACKGROUND:Phosphorylated histone H2AX, also known as γH2AX, forms μm-sized nuclear foci at the sites of DNA double-strand breaks (DSBs) induced by ionizing radiation and other agents. Due to their specificity and sensitivity, γH2AX immunoassays have become the gold standard for studying DSB induction and repair. One o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Memmel S, Sisario D, Zimmermann H, Sauer M, Sukhorukov VL, Djuzenova CS, Flentje M

    Updated: 2020-01-28

  • The InDeVal insertion/deletion evaluation tool: a program for finding target regions in DNA sequences and for aiding in sequence comparison.

    abstract:BACKGROUND:The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files design...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Stoneberg Holt SD, Holt JA

    Updated: 2004-10-29

  • Detecting intergene correlation changes in microarray analysis: a new approach to gene selection.

    abstract:BACKGROUND:Microarray technology is commonly used as a simple screening tool with a focus on selecting genes that exhibit extremely large differential expressions between different phenotypes. It lacks the ability to select genes that change their relationships with other genes in different biological conditions (diffe...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Hu R, Qiu X, Glazko G, Klebanov L, Yakovlev A

    Updated: 2009-01-15

  • Primary orthologs from local sequence context.

    abstract:BACKGROUND:The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Gao K, Miller J

    Updated: 2020-02-06

  • Modeling of shotgun sequencing of DNA plasmids using experimental and theoretical approaches.

    abstract:BACKGROUND:Processing and analysis of DNA sequences obtained from next-generation sequencing (NGS) face some difficulties in terms of the correct prediction of DNA sequencing outcomes without the implementation of bioinformatics approaches. However, algorithms based on NGS perform inefficiently due to the generation of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Shityakov S, Bencurova E, Förster C, Dandekar T

    Updated: 2020-04-03

  • MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data.

    abstract:BACKGROUND:Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. RESULTS:We present Multire...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    authors: Feigelman J, Theis FJ, Marr C

    Updated: 2014-07-11