Demystifying "drop-outs" in single-cell UMI data.

Abstract:

Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or "drop-outs." Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.
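
The abstract's central observation, that excess zeros largely vanish once cell types are separated, can be illustrated with a small simulation. This is only a sketch under stated assumptions and not HIPPO's actual implementation or API: it assumes that UMI counts for one gene within a homogeneous cell type are roughly Poisson (the abstract does not name a count model), so the expected zero proportion is exp(-mean). The helper names `poisson_sample` and `zero_excess` are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def zero_excess(counts):
    """Observed zero proportion minus the Poisson expectation exp(-mean).

    Assumption (not from the abstract): a homogeneous population is
    approximately Poisson, so a value near 0 suggests no excess zeros.
    """
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed - math.exp(-mean)


rng = random.Random(0)
# Two simulated cell types expressing the same gene at different rates.
type_a = [poisson_sample(rng, 0.2) for _ in range(2000)]
type_b = [poisson_sample(rng, 5.0) for _ in range(2000)]
mixed = type_a + type_b  # heterogeneous, unclustered data

for name, counts in [("type A", type_a), ("type B", type_b), ("mixed", mixed)]:
    print(f"{name}: zero excess = {zero_excess(counts):.3f}")
```

In this toy setting each simulated cell type shows almost no excess zeros on its own, while the pooled data show a large excess. That is the pattern the abstract attributes to unresolved heterogeneity rather than to technical drop-out.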

journal_name

Genome Biol

journal_title

Genome biology

authors

Kim TH,Zhou X,Chen M

doi

10.1186/s13059-020-02096-y

pub_date

2020-08-06 00:00:00

pages

196

issue

1

eissn

1474-760X

issn

1474-7596

pii

10.1186/s13059-020-02096-y

journal_volume

21

pub_type

Journal Article
  • Evolution enters the genomic era.

    abstract::A report on the 18th Congress of the European Society for Evolutionary Biology (ESEB), Aarhus, Denmark, 20-25 August, 2001. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2001-2-11-reports4026

    authors: Liberles DA

    update_date:2001-01-01 00:00:00

  • Signaling netwErks get the global treatment.

    abstract::Two landmark studies of cell signaling, by RNA interference and phosphoproteomics, provide complementary global views of the pathways downstream of receptor kinases, including those regulated by Erks. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-1-202

    authors: Yaffe MB,White FM

    update_date:2007-01-01 00:00:00

  • A computational investigation of kinetoplastid trans-splicing.

    abstract::Trans-splicing is an unusual process in which two separate RNA strands are spliced together to yield a mature mRNA. We present a novel computational approach which has an overall accuracy of 82% and can predict 92% of known trans-splicing sites. We have applied our method to chromosomes 1 and 3 of Leishmania major, wi...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-11-r95

    authors: Gopal S,Awadalla S,Gaasterland T,Cross GA

    update_date:2005-01-01 00:00:00

  • ELXR: a resource for rapid exon-directed sequence analysis.

    abstract::ELXR (Exon Locator and Extractor for Resequencing) streamlines the process of determining exon/intron boundaries and designing PCR and sequencing primers for high-throughput resequencing of exons. We have pre-computed ELXR primer sets for all exons identified from the human, mouse, and rat mRNA reference sequence (Ref...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2004-5-5-r36

    authors: Schageman JJ,Horton CJ,Niu S,Garner HR,Pertsemlidis A

    update_date:2004-01-01 00:00:00

  • GWASs and the age of human as the model organism for autoimmune genetic research.

    abstract::Genetic studies have identified more than 150 autoimmune loci, and next-generation sequencing will identify more. Is it time to make human the model organism for autoimmune research? ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2010-11-5-212

    authors: Plenge R

    update_date:2010-01-01 00:00:00

  • Fate by RNA methylation: m6A steers stem cell pluripotency.

    abstract::The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naïve pluripotency towards differentiation. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/s13059-015-0609-1

    authors: Zhao BS,He C

    update_date:2015-02-22 00:00:00

  • Species-specific shifts in centromere sequence composition are coincident with breakpoint reuse in karyotypically divergent lineages.

    abstract:BACKGROUND:It has been hypothesized that rapid divergence in centromere sequences accompanies rapid karyotypic change during speciation. However, the reuse of breakpoints coincident with centromeres in the evolution of divergent karyotypes poses a potential paradox. In distantly related species where the same centromer...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-8-r170

    authors: Bulazel KV,Ferreri GC,Eldridge MD,O'Neill RJ

    update_date:2007-01-01 00:00:00

  • RuNAway Disease: A two cycle model for transmissible spongiform encephalopathies (TSEs) wherein

    abstract:BACKGROUND:Despite decades of research, the agent responsible for transmitting spongiform encephalopathies (TSEs) has not been identified. The Prion hypothesis, which dominates the field, supposes that modified host PrP protein, termed PrPSc, acts as the transmissible agent. This model fits the observation that TSE dis...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2001-2-7-preprint0006

    authors: Gibson TJ

    update_date:2001-01-01 00:00:00

  • Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity.

    abstract:BACKGROUND:Epigenetic mechanisms such as chromatin accessibility impact transcription factor binding to DNA and transcriptional specificity. The androgen receptor (AR), a master regulator of the male phenotype and prostate cancer pathogenesis, acts primarily through ligand-activated transcription of target genes. Altho...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-10-r88

    authors: Tewari AK,Yardimci GG,Shibata Y,Sheffield NC,Song L,Taylor BS,Georgiev SG,Coetzee GA,Ohler U,Furey TS,Crawford GE,Febbo PG

    update_date:2012-10-03 00:00:00

  • Roles of piRNAs in transposon and pseudogene regulation of germline mRNAs and lncRNAs.

    abstract::PIWI proteins, a subfamily of PAZ/PIWI Domain family RNA-binding proteins, are best known for their function in silencing transposons and germline development by partnering with small noncoding RNAs called PIWI-interacting RNAs (piRNAs). However, recent studies have revealed multifaceted roles of the PIWI-piRNA pathwa...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/s13059-020-02221-x

    authors: Wang C,Lin H

    update_date:2021-01-08 00:00:00

  • Accurate and equitable medical genomic analysis requires an understanding of demography and its influence on sample size and ratio.

    abstract::In a recent study, Petrovski and Goldstein reported that (non-Finnish) Europeans have significantly fewer nonsynonymous singletons in Online Mendelian Inheritance in Man (OMIM) disease genes compared with Africans, Latinos, South Asians, East Asians, and other unassigned non-Europeans. We use simulations of Exome Aggr...

    journal_title:Genome biology

    pub_type: Comment, Letter

    doi:10.1186/s13059-017-1172-8

    authors: Kessler MD,O'Connor TD

    update_date:2017-02-27 00:00:00

  • Protein recoding by ADAR1-mediated RNA editing is not essential for normal development and homeostasis.

    abstract:BACKGROUND:Adenosine-to-inosine (A-to-I) editing of dsRNA by ADAR proteins is a pervasive epitranscriptome feature. Tens of thousands of A-to-I editing events are defined in the mouse, yet the functional impact of most is unknown. Editing causing protein recoding is the essential function of ADAR2, but an essential rol...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1301-4

    authors: Heraud-Farlow JE,Chalk AM,Linder SE,Li Q,Taylor S,White JM,Pang L,Liddicoat BJ,Gupte A,Li JB,Walkley CR

    update_date:2017-09-05 00:00:00

  • Comparative biology and genomics join forces to decipher the diversity of life.

    abstract::A report on the Cold Spring Harbor Laboratory meeting on the Evolution of Developmental Diversity, Cold Spring Harbor, NY, USA, 17-21 April 2002. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2002-3-8-reports4023

    authors: King N

    update_date:2002-07-15 00:00:00

  • Getting a buzz out of the bee genome.

    abstract::The honey bee Apis mellifera displays the most complex behavior of any insect. This, and its utility to humans, makes it a fascinating object of study for biologists. Such studies are now further enabled by the release of the honey-bee genome sequence. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2006-7-10-239

    authors: Ashburner M,Kyriacou CP

    update_date:2006-01-01 00:00:00

  • The Amborella genome: an evolutionary reference for plant biology.

    abstract::The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics. ...

    journal_title:Genome biology

    pub_type: Letter

    doi:10.1186/gb-2008-9-3-402

    authors: Soltis DE,Albert VA,Leebens-Mack J,Palmer JD,Wing RA,dePamphilis CW,Ma H,Carlson JE,Altman N,Kim S,Wall PK,Zuccolo A,Soltis PS

    update_date:2008-01-01 00:00:00

  • Consensus clustering and functional interpretation of gene-expression data.

    abstract::Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus cluste...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2004-5-11-r94

    authors: Swift S,Tucker A,Vinciotti V,Martin N,Orengo C,Liu X,Kellam P

    update_date:2004-01-01 00:00:00

  • Genome-wide analysis of plant nat-siRNAs reveals insights into their distribution, biogenesis and function.

    abstract:BACKGROUND:Many eukaryotic genomes encode cis-natural antisense transcripts (cis-NATs). Sense and antisense transcripts may form double-stranded RNAs that are processed by the RNA interference machinery into small interfering RNAs (siRNAs). A few so-called nat-siRNAs have been reported in plants, mammals, Drosophila, a...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-3-r20

    authors: Zhang X,Xia J,Lii YE,Barrera-Figueroa BE,Zhou X,Gao S,Lu L,Niu D,Chen Z,Leung C,Wong T,Zhang H,Guo J,Li Y,Liu R,Liang W,Zhu JK,Zhang W,Jin H

    update_date:2012-01-01 00:00:00

  • The developmental regulator PKL is required to maintain correct DNA methylation patterns at RNA-directed DNA methylation loci.

    abstract:BACKGROUND:The chromodomain helicase DNA-binding family of ATP-dependent chromatin remodeling factors play essential roles during eukaryote growth and development. They are recruited by specific transcription factors and regulate the expression of developmentally important genes. Here, we describe an unexpected role in...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1226-y

    authors: Yang R,Zheng Z,Chen Q,Yang L,Huang H,Miki D,Wu W,Zeng L,Liu J,Zhou JX,Ogas J,Zhu JK,He XJ,Zhang H

    update_date:2017-05-31 00:00:00

  • Genomic analysis of the eukaryotic protein kinase superfamily: a perspective.

    abstract::Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of essentially complet...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2003-4-5-111

    authors: Hanks SK

    update_date:2003-01-01 00:00:00

  • Quantitative reconstruction of leukocyte subsets using DNA methylation.

    abstract:BACKGROUND:Cell lineage-specific DNA methylation patterns distinguish normal human leukocyte subsets and can be used to detect and quantify these subsets in peripheral blood. We have developed an approach that uses DNA methylation to simultaneously quantify multiple leukocyte subsets, enabling investigation of immune m...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2014-15-3-r50

    authors: Accomando WP,Wiencke JK,Houseman EA,Nelson HH,Kelsey KT

    update_date:2014-03-05 00:00:00

  • The sequence of human chromosome 21 and implications for research into Down syndrome.

    abstract::The recent completion of the DNA sequence of human chromosome 21 has provided the first look at the 225 genes that are candidates for involvement in Down syndrome (trisomy 21). A broad functional classification of these genes, their expression data and evolutionary conservation, and comparison with the gene content of...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2000-1-2-reviews0002

    authors: Gardiner K,Davisson M

    update_date:2000-01-01 00:00:00

  • FineMAV: prioritizing candidate genetic variants driving local adaptations in human populations.

    abstract::We present a new method, Fine-Mapping of Adaptive Variation (FineMAV), which combines population differentiation, derived allele frequency, and molecular functionality to prioritize positively selected candidate variants for functional follow-up. We calibrate and test FineMAV using eight experimentally validated "gold...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1380-2

    authors: Szpak M,Mezzavilla M,Ayub Q,Chen Y,Xue Y,Tyler-Smith C

    update_date:2018-01-17 00:00:00

  • Multiclass classification of microarray data with repeated measurements: application to cancer.

    abstract::Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2003-4-12-r83

    authors: Yeung KY,Bumgarner RE

    update_date:2003-01-01 00:00:00

  • Impact of transposable elements on genome structure and evolution in bread wheat.

    abstract:BACKGROUND:Transposable elements (TEs) are major components of large plant genomes and main drivers of genome evolution. The most recent assembly of hexaploid bread wheat recovered the highly repetitive TE space in an almost complete chromosomal context and enabled a detailed view into the dynamics of TEs in the A, B, ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-018-1479-0

    authors: Wicker T,Gundlach H,Spannagl M,Uauy C,Borrill P,Ramírez-González RH,De Oliveira R,International Wheat Genome Sequencing Consortium.,Mayer KFX,Paux E,Choulet F

    update_date:2018-08-17 00:00:00

  • Changes in the organization of the genome during the mammalian cell cycle.

    abstract::By using chromosome conformation capture technology, a recent study has revealed two alternative three-dimensional folding states of the human genome during the cell cycle. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb4147

    authors: Giorgetti L,Servant N,Heard E

    update_date:2013-12-24 00:00:00

  • SCALE: modeling allele-specific gene expression by single-cell RNA sequencing.

    abstract::Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SC...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1200-8

    authors: Jiang Y,Zhang NR,Li M

    update_date:2017-04-26 00:00:00

  • Histone variants: are they functionally heterogeneous?

    abstract::In most eukaryotes, histones, which are the major structural components of chromatin, are expressed as a family of sequence variants encoded by multiple genes. Because different histone variants can contribute to a distinct or unique nucleosomal architecture, this heterogeneity can be exploited to regulate a wide rang...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2001-2-7-reviews0006

    authors: Brown DT

    update_date:2001-01-01 00:00:00

  • AlphaBeta: computational inference of epimutation rates and spectra from high-throughput DNA methylation data in plants.

    abstract::Stochastic changes in DNA methylation (i.e., spontaneous epimutations) contribute to methylome diversity in plants. Here, we describe AlphaBeta, a computational method for estimating the precise rate of such stochastic events using pedigree-based DNA methylation data as input. We demonstrate how AlphaBeta can be emplo...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-02161-6

    authors: Shahryary Y,Symeonidi A,Hazarika RR,Denkena J,Mubeen T,Hofmeister B,van Gurp T,Colomé-Tatché M,Verhoeven KJF,Tuskan G,Schmitz RJ,Johannes F

    update_date:2020-10-06 00:00:00

  • Cloning and characterization of microRNAs from wheat (Triticum aestivum L.).

    abstract:BACKGROUND:MicroRNAs (miRNAs) are a class of small, non-coding regulatory RNAs that regulate gene expression by guiding target mRNA cleavage or translational inhibition. So far, identification of miRNAs has been limited to a few model plant species, such as Arabidopsis, rice and Populus, whose genomes have been sequenc...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-6-r96

    authors: Yao Y,Guo G,Ni Z,Sunkar R,Du J,Zhu JK,Sun Q

    update_date:2007-01-01 00:00:00

  • Anticipating the 1,000 dollar genome.

    abstract::A new generation of DNA-sequencing platforms will become commercially available over the next few years. These instruments will enable re-sequencing of human genomes at a previously unimagined throughput and low cost. Here, I examine why the 1,000 dollar human genome is an important goal for research and clinical diag...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2006-7-7-112

    authors: Mardis ER

    update_date:2006-01-01 00:00:00