A comparison of automatic cell identification methods for single-cell RNA sequencing data.

Abstract:

BACKGROUND: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification.

RESULTS: Here, we benchmarked 22 classification methods that automatically assign cell identities including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within dataset predictions (intra-dataset) and across datasets (inter-dataset) based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods' sensitivity to the input features, number of cells per population, and their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has overall the best performance across the different experiments.

CONCLUSIONS: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension of new methods and new datasets.

journal_name

Genome Biol

journal_title

Genome biology

authors

Abdelaal T,Michielsen L,Cats D,Hoogduin D,Mei H,Reinders MJT,Mahfouz A

doi

10.1186/s13059-019-1795-z

subject

Has Abstract

pub_date

2019-09-09 00:00:00

pages

194

issue

1

eissn

1474-7596

issn

1474-760X

pii

10.1186/s13059-019-1795-z

journal_volume

20

pub_type

Journal Article
  • Molecular orchestration of the hepatic circadian symphony.

    abstract::The circadian clock determines the rhythmic expression of many different genes throughout a 24-hour period. A recent study investigating the circadian regulation of liver proteins reveals multiple levels of regulation, including transcriptional, post-transcriptional and post-translational mechanisms. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2006-7-9-234

    authors: Albrecht U

    updated:2006-01-01 00:00:00

  • Can sequence determine function?

    abstract::The functional annotation of proteins identified in genome sequencing projects is based on similarities to homologs in the databases. As a result of the possible strategies for divergent evolution, homologous enzymes frequently do not catalyze the same reaction, and we conclude that assignment of function from sequenc...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2000-1-5-reviews0005

    authors: Gerlt JA,Babbitt PC

    updated:2000-01-01 00:00:00

  • ReMixT: clone-specific genomic structure estimation in cancer.

    abstract::Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complic...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1267-2

    authors: McPherson AW,Roth A,Ha G,Chauve C,Steif A,de Souza CPE,Eirew P,Bouchard-Côté A,Aparicio S,Sahinalp SC,Shah SP

    updated:2017-07-27 00:00:00

  • Dramatic changes in transcription factor binding over evolutionary time.

    abstract::A recent study reveals a surprisingly high degree of change in the occupancy patterns of two transcription factors in the livers of five vertebrates. ...

    journal_title:Genome biology

    pub_type: Comment, Journal Article

    doi:10.1186/gb-2010-11-6-122

    authors: Weirauch MT,Hughes TR

    updated:2010-01-01 00:00:00

  • Direct measurement of transcription rates reveals multiple mechanisms for configuration of the Arabidopsis ambient temperature response.

    abstract:BACKGROUND:Sensing and responding to ambient temperature is important for controlling growth and development of many organisms, in part by regulating mRNA levels. mRNA abundance can change with temperature, but it is unclear whether this results from changes in transcription or decay rates, and whether passive or activ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2014-15-3-r45

    authors: Sidaway-Lee K,Costa MJ,Rand DA,Finkenstadt B,Penfield S

    updated:2014-03-03 00:00:00

  • Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella.

    abstract:BACKGROUND:Foodborne outbreaks of Salmonella remain a pressing public health concern. We recently detected a large outbreak of Salmonella enterica serovar Enteritidis phage type 14b affecting more than 30 patients in our hospital. This outbreak was linked to community, national and European-wide cases. Hospital patient...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-015-0677-2

    authors: Quick J,Ashton P,Calus S,Chatt C,Gossain S,Hawker J,Nair S,Neal K,Nye K,Peters T,De Pinna E,Robinson E,Struthers K,Webber M,Catto A,Dallman TJ,Hawkey P,Loman NJ

    updated:2015-05-30 00:00:00

  • The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data.

    abstract::We have developed an ontology to provide standardized nomenclature for anatomical terms in the postnatal mouse. The Adult Mouse Anatomical Dictionary is structured as a directed acyclic graph, and is organized hierarchically both spatially and functionally. The ontology will be used to annotate and integrate different...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-3-r29

    authors: Hayamizu TF,Mangan M,Corradi JP,Kadin JA,Ringwald M

    updated:2005-01-01 00:00:00

  • An integrated multi-omics approach to identify regulatory mechanisms in cancer metastatic processes.

    abstract:BACKGROUND:Metastatic progress is the primary cause of death in most cancers, yet the regulatory dynamics driving the cellular changes necessary for metastasis remain poorly understood. Multi-omics approaches hold great promise for addressing this challenge; however, current analysis tools have limited capabilities to ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-02213-x

    authors: Ghaffari S,Hanson C,Schmidt RE,Bouchonville KJ,Offer SM,Sinha S

    updated:2021-01-07 00:00:00

  • The bread wheat epigenomic map reveals distinct chromatin architectural and evolutionary features of functional genetic elements.

    abstract:BACKGROUND:Bread wheat is an allohexaploid species with a 16-Gb genome that has large intergenic regions, which presents a big challenge for pinpointing regulatory elements and further revealing the transcriptional regulatory mechanisms. Chromatin profiling to characterize the combinatorial patterns of chromatin signat...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1746-8

    authors: Li Z,Wang M,Lin K,Xie Y,Guo J,Ye L,Zhuang Y,Teng W,Ran X,Tong Y,Xue Y,Zhang W,Zhang Y

    updated:2019-07-15 00:00:00

  • Permutation-validated principal components analysis of microarray data.

    abstract:BACKGROUND:In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the a...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2002-3-4-research0019

    authors: Landgrebe J,Wurst W,Welzl G

    updated:2002-01-01 00:00:00

  • Co-binding by YY1 identifies the transcriptionally active, highly conserved set of CTCF-bound regions in primate genomes.

    abstract:BACKGROUND:The genomic binding of CTCF is highly conserved across mammals, but the mechanisms that underlie its stability are poorly understood. One transcription factor known to functionally interact with CTCF in the context of X-chromosome inactivation is the ubiquitously expressed YY1. Because combinatorial transcri...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-12-r148

    authors: Schwalie PC,Ward MC,Cain CE,Faure AJ,Gilad Y,Odom DT,Flicek P

    updated:2013-12-31 00:00:00

  • Surveying genome replication.

    abstract::Two recent studies have added microarrays to the toolkit used to analyze the origins of replication in yeast chromosomes, providing a fuller picture of how genomic DNA replication is organized. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2002-3-6-reviews1016

    authors: Kearsey S

    updated:2002-01-01 00:00:00

  • Serum-dependent transcriptional networks identify distinct functional roles for H-Ras and N-Ras during initial stages of the cell cycle.

    abstract:BACKGROUND:Using oligonucleotide microarrays, we compared transcriptional profiles corresponding to the initial cell cycle stages of mouse fibroblasts lacking the small GTPases H-Ras and/or N-Ras with those of matching, wild-type controls. RESULTS:Serum-starved wild-type and knockout ras fibroblasts had very similar t...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2009-10-11-r123

    authors: Castellano E,Guerrero C,Núñez A,De Las Rivas J,Santos E

    updated:2009-01-01 00:00:00

  • 'Horizontal' plant biology on the rise.

    abstract::A report on the Plant Genomics European Meeting (Plant-GEMS2004), Lyon, France, 22-25 September 2004. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2004-6-1-302

    authors: Van de Peer Y

    updated:2005-01-01 00:00:00

  • Mediation of Drosophila autosomal dosage effects and compensation by network interactions.

    abstract:BACKGROUND:Gene dosage change is a mild perturbation that is a valuable tool for pathway reconstruction in Drosophila. While it is often assumed that reducing gene dose by half leads to two-fold less expression, there is partial autosomal dosage compensation in Drosophila, which may be mediated by feedback or buffering...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-4-r28

    authors: Malone JH,Cho DY,Mattiuzzo NR,Artieri CG,Jiang L,Dale RK,Smith HE,McDaniel J,Munro S,Salit M,Andrews J,Przytycka TM,Oliver B

    updated:2012-04-24 00:00:00

  • Alternative splicing links histone modifications to stem cell fate decision.

    abstract:BACKGROUND:Understanding the embryonic stem cell (ESC) fate decision between self-renewal and proper differentiation is important for developmental biology and regenerative medicine. Attention has focused on mechanisms involving histone modifications, alternative pre-messenger RNA splicing, and cell-cycle progression. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-018-1512-3

    authors: Xu Y,Zhao W,Olson SD,Prabhakara KS,Zhou X

    updated:2018-09-14 00:00:00

  • Genome-wide analysis of the maternal-to-zygotic transition in Drosophila primordial germ cells.

    abstract:BACKGROUND:During the maternal-to-zygotic transition (MZT) vast changes in the embryonic transcriptome are produced by a combination of two processes: elimination of maternally provided mRNAs and synthesis of new transcripts from the zygotic genome. Previous genome-wide analyses of the MZT have been restricted to whole...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-2-r11

    authors: Siddiqui NU,Li X,Luo H,Karaiskakis A,Hou H,Kislinger T,Westwood JT,Morris Q,Lipshitz HD

    updated:2012-02-20 00:00:00

  • Genomic analysis of the eukaryotic protein kinase superfamily: a perspective.

    abstract::Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of essentially complet...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2003-4-5-111

    authors: Hanks SK

    updated:2003-01-01 00:00:00

  • A showcase of future plant biology: moving towards next-generation plant genetics assisted by genome sequencing and systems biology.

    abstract::A report on the Cold Spring Harbor Asia conference on Genome Assisted Biology of Crops and Model Plant Systems Meeting, held in Suzhou, China, April 21-25, 2014. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb4176

    authors: Lee I

    updated:2014-05-23 00:00:00

  • Expanded identification and characterization of mammalian circular RNAs.

    abstract:BACKGROUND:The recent reports of two circular RNAs (circRNAs) with strong potential to act as microRNA (miRNA) sponges suggest that circRNAs might play important roles in regulating gene expression. However, the global properties of circRNAs are not well understood. RESULTS:We developed a computational pipeline to ide...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-014-0409-z

    authors: Guo JU,Agarwal V,Guo H,Bartel DP

    updated:2014-07-29 00:00:00

  • Nucleosome deposition and DNA methylation at coding region boundaries.

    abstract:BACKGROUND:Nucleosome deposition downstream of transcription initiation and DNA methylation in the gene body suggest that control of transcription elongation is a key aspect of epigenetic regulation. RESULTS:Here we report a genome-wide observation of distinct peaks of nucleosomes and methylation at both ends of a pro...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2009-10-9-r89

    authors: Choi JK,Bae JB,Lyu J,Kim TY,Kim YJ

    updated:2009-01-01 00:00:00

  • Why genomics is more than genomes.

    abstract::A report on the 2004 meeting on Molecular Genetics of Bacteria and Bacteriophages, Cold Spring Harbor, USA, 25-29 August 2004. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2004-5-12-357

    authors: Lawrence JG

    updated:2004-01-01 00:00:00

  • EpiTEome: Simultaneous detection of transposable element insertion sites and their DNA methylation levels.

    abstract::The genome-wide investigation of DNA methylation levels has been limited to reference transposable element positions. The methylation analysis of non-reference and mobile transposable elements has only recently been performed, but required both genome resequencing and MethylC-seq datasets. We have created epiTEome, a ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1232-0

    authors: Daron J,Slotkin RK

    updated:2017-05-12 00:00:00

  • Wheat chromatin architecture is organized in genome territories and transcription factories.

    abstract:BACKGROUND:Polyploidy is ubiquitous in eukaryotic plant and fungal lineages, and it leads to the co-existence of several copies of similar or related genomes in one nucleus. In plants, polyploidy is considered a major factor in successful domestication. However, polyploidy challenges chromosome folding architecture in ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-01998-1

    authors: Concia L,Veluchamy A,Ramirez-Prado JS,Martin-Ramirez A,Huang Y,Perez M,Domenichini S,Rodriguez Granados NY,Kim S,Blein T,Duncan S,Pichot C,Manza-Mianza D,Juery C,Paux E,Moore G,Hirt H,Bergounioux C,Crespi M,Mahfouz

    updated:2020-04-29 00:00:00

  • The circadian clock goes genomic.

    abstract::Large-scale biology among plant species, as well as comparative genomics of circadian clock architecture and clock-regulated output processes, have greatly advanced our understanding of the endogenous timing system in plants. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2013-14-6-208

    authors: Staiger D,Shin J,Johansson M,Davis SJ

    updated:2013-06-24 00:00:00

  • Pharmacogenomic analysis of patient-derived tumor cells in gynecologic cancers.

    abstract:BACKGROUND:Gynecologic malignancy is one of the leading causes of mortality in female adults worldwide. Comprehensive genomic analysis has revealed a list of molecular aberrations that are essential to tumorigenesis, progression, and metastasis of gynecologic tumors. However, targeting such alterations has frequently l...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1848-3

    authors: Sa JK,Hwang JR,Cho YJ,Ryu JY,Choi JJ,Jeong SY,Kim J,Kim MS,Paik ES,Lee YY,Choi CH,Kim TJ,Kim BG,Bae DS,Lee Y,Her NG,Shin YJ,Cho HJ,Kim JY,Seo YJ,Koo H,Oh JW,Lee T,Kim HS,Song SY,Bae JS,Park WY,Han HD

    updated:2019-11-26 00:00:00

  • The diversity of endothelial cells: a challenge for therapeutic angiogenesis.

    abstract::Vascular endothelia comprise a diverse population of cells that specialize in response to genetic programs and environmental cues to take on distinct roles in different vessels, tissues, and organs, and in response to pathophysiological stresses. Characterization of endothelial-cell diversity will facilitate the devel...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2004-5-2-207

    authors: Conway EM,Carmeliet P

    updated:2004-01-01 00:00:00

  • Opening sequence: computational genomics in the era of high-throughput sequencing.

    abstract::A report on the 11th Cold Spring Harbor Laboratory/Wellcome Trust conference on Genome Informatics, Cold Spring Harbor Laboratories, New York, USA, November 2-5, 2011. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2011-12-12-310

    authors: Chambers EV,Kindt AS,Semple CA

    updated:2011-12-28 00:00:00

  • To be or not to be a piRNA: genomic origin and processing of piRNAs.

    abstract::Piwi-interacting RNAs (piRNAs) originate from genomic regions dubbed piRNA clusters. How cluster transcripts are selected for processing into piRNAs is not understood. We discuss evidence for the involvement of chromatin structure and maternally inherited piRNAs in determining their fate. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb4154

    authors: Le Thomas A,Tóth KF,Aravin AA

    updated:2014-01-27 00:00:00

  • DNA methylation and epigenomics: new technologies and emerging concepts.

    abstract::A report of the Keystone Symposia joint meetings on DNA Methylation and Epigenomics held in Keystone, Colorado, USA, 29 March to 3 April, 2015. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-015-0674-5

    authors: Chatterjee A,Eccles MR

    updated:2015-05-21 00:00:00