Abstract:
BACKGROUND: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification.
RESULTS: Here, we benchmarked 22 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within-dataset predictions (intra-dataset) and across-dataset predictions (inter-dataset), based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods' sensitivity to the input features and to the number of cells per population, as well as their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets, with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has the best overall performance across the different experiments.
CONCLUSIONS: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension to new methods and new datasets.
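The three evaluation criteria named in the abstract (accuracy, percentage of unclassified cells, and computation time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the benchmark's code: the nearest-centroid classifier is a stand-in rather than one of the 22 evaluated methods, the `max_dist` rejection threshold and the `Unlabeled` marker are hypothetical, the toy data are invented, and accuracy is computed only over cells that received a label.

```python
import time
from statistics import mean

UNLABELED = "Unlabeled"  # hypothetical marker for cells the classifier rejects


def train_centroids(X, y):
    """Compute the mean expression vector of each cell population."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, X, max_dist):
    """Assign each cell to its nearest centroid, or reject it if too distant."""
    preds = []
    for row in X:
        dists = {label: sum((a - b) ** 2 for a, b in zip(row, c)) ** 0.5
                 for label, c in centroids.items()}
        best = min(dists, key=dists.get)
        preds.append(best if dists[best] <= max_dist else UNLABELED)
    return preds


def evaluate(X_train, y_train, X_test, y_test, max_dist=2.0):
    """Return predictions plus the three metrics for one train/test pair."""
    start = time.perf_counter()
    centroids = train_centroids(X_train, y_train)
    preds = predict(centroids, X_test, max_dist)
    seconds = time.perf_counter() - start

    # Assumption: accuracy is measured over labeled (non-rejected) cells only.
    assigned = [(p, t) for p, t in zip(preds, y_test) if p != UNLABELED]
    accuracy = (sum(p == t for p, t in assigned) / len(assigned)
                if assigned else 0.0)
    pct_unlabeled = 100.0 * (len(preds) - len(assigned)) / len(preds)
    return {"preds": preds, "accuracy": accuracy,
            "pct_unlabeled": pct_unlabeled, "seconds": seconds}


# Invented toy data: two genes, two populations.
X_train = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
y_train = ["T cell", "T cell", "B cell", "B cell"]
X_test = [[0.1, 0.1], [5.0, 5.1], [0.3, 0.0], [20.0, 20.0]]
y_test = ["T cell", "B cell", "B cell", "T cell"]

result = evaluate(X_train, y_train, X_test, y_test)
print(result)
```

In this sketch, drawing the training and test cells from the same dataset would correspond to the intra-dataset setup, and drawing them from two different datasets would correspond to the inter-dataset setup.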
journal_name: Genome Biol
journal_title: Genome biology
authors: Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, Mahfouz A
doi: 10.1186/s13059-019-1795-z
subject: Has Abstract
pub_date: 2019-09-09 00:00:00
pages: 194
issue: 1
eissn: 1474-7596
issn: 1474-760X
pii: 10.1186/s13059-019-1795-z
journal_volume: 20
pub_type: Journal article