Abstract:
Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or "drop-outs." Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.
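The idea of using zero proportions as a signal of heterogeneity can be illustrated with a small sketch. This is not the HIPPO implementation: the function name `zero_excess_scores`, the toy count vectors, and the choice of a Poisson model as the reference for "expected" zeros are all assumptions made here for illustration. The sketch compares each gene's observed fraction of zero counts with the fraction a Poisson distribution with the same mean would predict; a large excess suggests the cells are a mixture of types rather than one homogeneous group.

```python
import math


def zero_excess_scores(counts):
    """Return, for each gene, observed zero proportion minus the
    zero proportion expected under a Poisson model with the same mean.

    counts: list of genes, each a list of UMI counts (one per cell).
    The Poisson reference is an assumption of this sketch.
    """
    scores = []
    for gene_counts in counts:
        n_cells = len(gene_counts)
        mean = sum(gene_counts) / n_cells
        observed_zero = sum(1 for c in gene_counts if c == 0) / n_cells
        expected_zero = math.exp(-mean)  # P(X = 0) for Poisson(mean)
        scores.append(observed_zero - expected_zero)
    return scores


# Hypothetical toy data, 100 cells per gene.
# Gene A: roughly Poisson-like counts in a single homogeneous population.
gene_a = [0] * 37 + [1] * 37 + [2] * 18 + [3] * 6 + [4] * 2
# Gene B: 50 cells that never express the gene plus 50 cells that do,
# mimicking two cell types mixed together.
gene_b = [0] * 50 + [0] * 7 + [1] * 14 + [2] * 14 + [3] * 9 + [4] * 4 + [5] * 2

scores = zero_excess_scores([gene_a, gene_b])
for name, score in zip(["gene_a", "gene_b"], scores):
    print(f"{name}: zero excess = {score:+.3f}")
```

In this toy setting the mixed gene shows a clear excess of zeros while the homogeneous gene does not, which is the kind of contrast a zero-proportion-based feature selection step could use before clustering.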
journal_name:Genome Biol
journal_title:Genome biology
authors:Kim TH, Zhou X, Chen M
doi:10.1186/s13059-020-02096-y
subject:Has Abstract
pub_date:2020-08-06 00:00:00
pages:196
issue:1
eissn:1474-7596
issn:1474-760X
pii:10.1186/s13059-020-02096-y
journal_volume:21
pub_type: Journal article
Related literature
GENOME BIOLOGY literature collection
abstract::A report on the 18th Congress of the European Society for Evolutionary Biology (ESEB), Aarhus, Denmark, 20-25 August, 2001. ...
journal_title:Genome biology
pub_type:
doi:10.1186/gb-2001-2-11-reports4026
update_date:2001-01-01 00:00:00
abstract::Two landmark studies of cell signaling, by RNA interference and phosphoproteomics, provide complementary global views of the pathways downstream of receptor kinases, including those regulated by Erks. ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-1-202
update_date:2007-01-01 00:00:00
abstract::Trans-splicing is an unusual process in which two separate RNA strands are spliced together to yield a mature mRNA. We present a novel computational approach which has an overall accuracy of 82% and can predict 92% of known trans-splicing sites. We have applied our method to chromosomes 1 and 3 of Leishmania major, wi...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2005-6-11-r95
update_date:2005-01-01 00:00:00
abstract::ELXR (Exon Locator and Extractor for Resequencing) streamlines the process of determining exon/intron boundaries and designing PCR and sequencing primers for high-throughput resequencing of exons. We have pre-computed ELXR primer sets for all exons identified from the human, mouse, and rat mRNA reference sequence (Ref...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2004-5-5-r36
update_date:2004-01-01 00:00:00
abstract::Genetic studies have identified more than 150 autoimmune loci, and next-generation sequencing will identify more. Is it time to make human the model organism for autoimmune research? ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2010-11-5-212
update_date:2010-01-01 00:00:00
abstract::The N6-methyladenosine (m6A) modification of mRNA has a crucial function in regulating pluripotency in murine stem cells: it facilitates resolution of naïve pluripotency towards differentiation. ...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/s13059-015-0609-1
update_date:2015-02-22 00:00:00
abstract:BACKGROUND:It has been hypothesized that rapid divergence in centromere sequences accompanies rapid karyotypic change during speciation. However, the reuse of breakpoints coincident with centromeres in the evolution of divergent karyotypes poses a potential paradox. In distantly related species where the same centromer...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-8-r170
update_date:2007-01-01 00:00:00
abstract:BACKGROUND:Despite decades of research, the agent responsible for transmitting spongiform encephalopathies (TSEs) has not been identified. The Prion hypothesis, which dominates the field, supposes that modified host PrP protein, termed PrPSc, acts as the transmissible agent. This model fits the observation that TSE dis...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2001-2-7-preprint0006
update_date:2001-01-01 00:00:00
abstract:BACKGROUND:Epigenetic mechanisms such as chromatin accessibility impact transcription factor binding to DNA and transcriptional specificity. The androgen receptor (AR), a master regulator of the male phenotype and prostate cancer pathogenesis, acts primarily through ligand-activated transcription of target genes. Altho...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2012-13-10-r88
update_date:2012-10-03 00:00:00
abstract::PIWI proteins, a subfamily of PAZ/PIWI Domain family RNA-binding proteins, are best known for their function in silencing transposons and germline development by partnering with small noncoding RNAs called PIWI-interacting RNAs (piRNAs). However, recent studies have revealed multifaceted roles of the PIWI-piRNA pathwa...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/s13059-020-02221-x
update_date:2021-01-08 00:00:00
abstract::In a recent study, Petrovski and Goldstein reported that (non-Finnish) Europeans have significantly fewer nonsynonymous singletons in Online Mendelian Inheritance in Man (OMIM) disease genes compared with Africans, Latinos, South Asians, East Asians, and other unassigned non-Europeans. We use simulations of Exome Aggr...
journal_title:Genome biology
pub_type: Comment, Letter
doi:10.1186/s13059-017-1172-8
update_date:2017-02-27 00:00:00
abstract:BACKGROUND:Adenosine-to-inosine (A-to-I) editing of dsRNA by ADAR proteins is a pervasive epitranscriptome feature. Tens of thousands of A-to-I editing events are defined in the mouse, yet the functional impact of most is unknown. Editing causing protein recoding is the essential function of ADAR2, but an essential rol...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-017-1301-4
update_date:2017-09-05 00:00:00
abstract::A report on the Cold Spring Harbor Laboratory meeting on the Evolution of Developmental Diversity, Cold Spring Harbor, NY, USA, 17-21 April 2002. ...
journal_title:Genome biology
pub_type:
doi:10.1186/gb-2002-3-8-reports4023
update_date:2002-07-15 00:00:00
abstract::The honey bee Apis mellifera displays the most complex behavior of any insect. This, and its utility to humans, makes it a fascinating object of study for biologists. Such studies are now further enabled by the release of the honey-bee genome sequence. ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2006-7-10-239
update_date:2006-01-01 00:00:00
abstract::The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics. ...
journal_title:Genome biology
pub_type: Letter
doi:10.1186/gb-2008-9-3-402
update_date:2008-01-01 00:00:00
abstract::Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus cluste...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2004-5-11-r94
update_date:2004-01-01 00:00:00
abstract:BACKGROUND:Many eukaryotic genomes encode cis-natural antisense transcripts (cis-NATs). Sense and antisense transcripts may form double-stranded RNAs that are processed by the RNA interference machinery into small interfering RNAs (siRNAs). A few so-called nat-siRNAs have been reported in plants, mammals, Drosophila, a...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2012-13-3-r20
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:The chromodomain helicase DNA-binding family of ATP-dependent chromatin remodeling factors play essential roles during eukaryote growth and development. They are recruited by specific transcription factors and regulate the expression of developmentally important genes. Here, we describe an unexpected role in...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-017-1226-y
update_date:2017-05-31 00:00:00
abstract::Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of essentially complet...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2003-4-5-111
update_date:2003-01-01 00:00:00
abstract:BACKGROUND:Cell lineage-specific DNA methylation patterns distinguish normal human leukocyte subsets and can be used to detect and quantify these subsets in peripheral blood. We have developed an approach that uses DNA methylation to simultaneously quantify multiple leukocyte subsets, enabling investigation of immune m...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2014-15-3-r50
update_date:2014-03-05 00:00:00
abstract::The recent completion of the DNA sequence of human chromosome 21 has provided the first look at the 225 genes that are candidates for involvement in Down syndrome (trisomy 21). A broad functional classification of these genes, their expression data and evolutionary conservation, and comparison with the gene content of...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2000-1-2-reviews0002
update_date:2000-01-01 00:00:00
abstract::We present a new method, Fine-Mapping of Adaptive Variation (FineMAV), which combines population differentiation, derived allele frequency, and molecular functionality to prioritize positively selected candidate variants for functional follow-up. We calibrate and test FineMAV using eight experimentally validated "gold...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-017-1380-2
update_date:2018-01-17 00:00:00
abstract::Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2003-4-12-r83
update_date:2003-01-01 00:00:00
abstract:BACKGROUND:Transposable elements (TEs) are major components of large plant genomes and main drivers of genome evolution. The most recent assembly of hexaploid bread wheat recovered the highly repetitive TE space in an almost complete chromosomal context and enabled a detailed view into the dynamics of TEs in the A, B, ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-018-1479-0
update_date:2018-08-17 00:00:00
abstract::By using chromosome conformation capture technology, a recent study has revealed two alternative three-dimensional folding states of the human genome during the cell cycle. ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb4147
update_date:2013-12-24 00:00:00
abstract::Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SC...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-017-1200-8
update_date:2017-04-26 00:00:00
abstract::In most eukaryotes, histones, which are the major structural components of chromatin, are expressed as a family of sequence variants encoded by multiple genes. Because different histone variants can contribute to a distinct or unique nucleosomal architecture, this heterogeneity can be exploited to regulate a wide rang...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2001-2-7-reviews0006
update_date:2001-01-01 00:00:00
abstract::Stochastic changes in DNA methylation (i.e., spontaneous epimutations) contribute to methylome diversity in plants. Here, we describe AlphaBeta, a computational method for estimating the precise rate of such stochastic events using pedigree-based DNA methylation data as input. We demonstrate how AlphaBeta can be emplo...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-020-02161-6
update_date:2020-10-06 00:00:00
abstract:BACKGROUND:MicroRNAs (miRNAs) are a class of small, non-coding regulatory RNAs that regulate gene expression by guiding target mRNA cleavage or translational inhibition. So far, identification of miRNAs has been limited to a few model plant species, such as Arabidopsis, rice and Populus, whose genomes have been sequenc...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-6-r96
update_date:2007-01-01 00:00:00
abstract::A new generation of DNA-sequencing platforms will become commercially available over the next few years. These instruments will enable re-sequencing of human genomes at a previously unimagined throughput and low cost. Here, I examine why the 1,000 dollar human genome is an important goal for research and clinical diag...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2006-7-7-112
update_date:2006-01-01 00:00:00