Are dropout imputation methods for scRNA-seq effective for scHi-C data?

Abstract:

The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structural zeros due to underlying properties, leading to observed zeros being a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single-cell RNA-sequencing (RNA-seq) data, little has been done in identifying structural zeros and imputing dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we have shown that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values showed considerable improvement for clustering cells of the same types together over clustering results before imputation.

journal_name

Brief Bioinform

authors

Han C,Xie Q,Lin S

doi

10.1093/bib/bbaa289

pub_date

2020-11-17 00:00:00

eissn

1477-4054

issn

1467-5463

pii

5985294

pub_type

Journal Article
  • Federating data with Information Integrator.

    abstract::Information Integrator is an extension to IBM's relational database DB2, which uses data federation to provide benefits to molecular biology researchers through two unique capabilities: increased flexibility in combining data from disparate sources, and SQL access to non-SQL data, easing the task of automating data an...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/4.4.375

    authors: Arenson AD

    update_date:2003-12-01 00:00:00

  • Comprehensive characterization of tissue-specific circular RNAs in the human and mouse genomes.

    abstract::Circular RNA (circRNA) is a group of RNA family generated by RNA circularization, which was discovered ubiquitously across different species and tissues. However, there is no global view of tissue specificity for circRNAs to date. Here we performed the comprehensive analysis to characterize the features of human and m...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbw081

    authors: Xia S,Feng J,Lei L,Hu J,Xia L,Wang J,Xiang Y,Liu L,Zhong S,Han L,He C

    update_date:2017-11-01 00:00:00

  • Multiple Testing of Gene Sets from Gene Ontology: Possibilities and Pitfalls.

    abstract::The use of multiple testing procedures in the context of gene-set testing is an important but relatively underexposed topic. If a multiple testing method is used, this is usually a standard familywise error rate (FWER) or false discovery rate (FDR) controlling procedure in which the logical relationships that exist be...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbv091

    authors: Meijer RJ,Goeman JJ

    update_date:2016-09-01 00:00:00

  • Bioinformatics education--perspectives and challenges out of Africa.

    abstract::The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of ...

    journal_title:Briefings in bioinformatics

    pub_type: Historical Article, Journal Article

    doi:10.1093/bib/bbu022

    authors: Tastan Bishop Ö,Adebiyi EF,Alzohairy AM,Everett D,Ghedira K,Ghouila A,Kumuthini J,Mulder NJ,Panji S,Patterton HG,H3ABioNet Consortium.,H3Africa Consortium.

    update_date:2015-03-01 00:00:00

  • Single-cell transcriptome-based multilayer network biomarker for predicting prognosis and therapeutic response of gliomas.

    abstract::Occurrence and development of cancers are governed by complex networks of interacting intercellular and intracellular signals. The technology of single-cell RNA sequencing (scRNA-seq) provides an unprecedented opportunity for dissecting the interplay between the cancer cells and the associated microenvironment. Here w...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz040

    authors: Zhang J,Guan M,Wang Q,Zhang J,Zhou T,Sun X

    update_date:2020-05-21 00:00:00

  • Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees.

    abstract::Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference tre...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbr034

    authors: Boeckmann B,Robinson-Rechavi M,Xenarios I,Dessimoz C

    update_date:2011-09-01 00:00:00

  • Allotetraploid and autotetraploid models of linkage analysis.

    abstract::As a group of important plant species in agriculture and biology, polyploids have been increasingly studied in terms of their genome structure and organization. There are two types of polyploids, allopolyploids and autopolyploids, each resulting from a different genetic origin, which undergo meiotic divisions of a dis...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbt075

    authors: Xu F,Tong C,Lyu Y,Bo W,Pang X,Wu R

    update_date:2015-01-01 00:00:00

  • Vertical integration methods for gene expression data analysis.

    abstract::Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where verti...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa169

    authors: Wu M,Yi H,Ma S

    update_date:2020-08-14 00:00:00

  • Gene-based mediation analysis in epigenetic studies.

    abstract::Mediation analysis has been a useful tool for investigating the effect of mediators that lie in the path from the independent variable to the outcome. With the increasing dimensionality of mediators such as in (epi)genomics studies, high-dimensional mediation model is needed. In this work, we focus on epigenetic studi...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa113

    authors: Fang R,Yang H,Gao Y,Cao H,Goode EL,Cui Y

    update_date:2020-07-01 00:00:00

  • TRCirc: a resource for transcriptional regulation information of circRNAs.

    abstract::In recent years, high-throughput genomic technologies like chromatin immunoprecipitation sequencing (ChIp-seq) and transcriptome sequencing (RNA-seq) have been becoming both more refined and less expensive, making them more accessible. Many circular RNAs (circRNAs) that originate from back-spliced exons have been iden...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bby083

    authors: Tang Z,Li X,Zhao J,Qian F,Feng C,Li Y,Zhang J,Jiang Y,Yang Y,Wang Q,Li C

    update_date:2019-11-27 00:00:00

  • Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants.

    abstract:MOTIVATION:Long noncoding RNAs (lncRNAs) correspond to a eukaryotic noncoding RNA class that gained great attention in the past years as a higher layer of regulation for gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNA in plants, which contrast the vari...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bby034

    authors: Negri TDC,Alves WAL,Bugatti PH,Saito PTM,Domingues DS,Paschoal AR

    update_date:2019-03-25 00:00:00

  • A practical guide for the functional annotation of genetic variations using SNPnexus.

    abstract::Broader functional annotation of known as well as putative genetic variations is a valuable mean for prioritizing targets in disease studies and large-scale genotyping projects. In this article, we present a practical guide to SNPnexus, a web-based tool that provides an aggregate set of functional annotations for geno...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbt004

    authors: Dayem Ullah AZ,Lemoine NR,Chelala C

    update_date:2013-07-01 00:00:00

  • Critical limitations of prognostic signatures based on risk scores summarized from gene expression levels: a case study for resected stage I non-small-cell lung cancer.

    abstract::Most of current gene expression signatures for cancer prognosis are based on risk scores, usually calculated as some summaries of expression levels of the signature genes, whose applications require presetting risk score thresholds and data normalization. In this study, we demonstrate the critical limitations of such ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbv064

    authors: Qi L,Chen L,Li Y,Qin Y,Pan R,Zhao W,Gu Y,Wang H,Wang R,Chen X,Guo Z

    update_date:2016-03-01 00:00:00

  • Bioinformatic analysis of SMN1-ACE/ACE2 interactions hinted at a potential protective effect of spinal muscular atrophy against COVID-19-induced lung injury.

    abstract::Patients with spinal muscular atrophy (SMA) are susceptible to the respiratory infections and might be at a heightened risk of poor clinical outcomes upon contracting coronavirus disease 2019 (COVID-19). In the face of the COVID-19 pandemic, the potential associations of SMA with the susceptibility to and prognosticat...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa285

    authors: Li Z,Li X,Shen J,Tan H,Rong T,Lin Y,Feng E,Chen Z,Jiao Y,Liu G,Zhang L,Vai Chan MT,Kei Wu WK

    update_date:2020-11-14 00:00:00

  • Opportunities for community awareness platforms in personal genomics and bioinformatics education.

    abstract::Precision and personalized medicine will be increasingly based on the integration of various type of information, particularly electronic health records and genome sequences. The availability of cheap genome sequencing services and the information interoperability will increase the role of online bioinformatics analys...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbw078

    authors: Bianchi L,Liò P

    update_date:2017-11-01 00:00:00

  • Proteome-scale analysis of phase-separated proteins in immunofluorescence images.

    abstract::Phase separation is an important mechanism that mediates the spatial distribution of proteins in different cellular compartments. While phase-separated proteins share certain sequence characteristics, including intrinsically disordered regions (IDRs) and prion-like domains, such characteristics are insufficient for ma...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa187

    authors: Yu C,Shen B,You K,Huang Q,Shi M,Wu C,Chen Y,Zhang C,Li T

    update_date:2020-09-02 00:00:00

  • Dynamics of transcriptional and post-transcriptional regulation.

    abstract::Despite gene expression programs being notoriously complex, RNA abundance is usually assumed as a proxy for transcriptional activity. Recently developed approaches, able to disentangle transcriptional and post-transcriptional regulatory processes, have revealed a more complex scenario. It is now possible to work out h...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa389

    authors: Furlan M,de Pretis S,Pelizzola M

    update_date:2020-12-22 00:00:00

  • GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets.

    abstract::Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient's individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine's main objective-ensuring the optimum diagnosis, tre...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa033

    authors: Gutiérrez-Sacristán A,De Niz C,Kothari C,Kong SW,Mandl KD,Avillach P

    update_date:2021-01-18 00:00:00

  • Common introns within orthologous genes: software and application to plants.

    abstract::The residence of spliceosomal introns within protein-coding genes can fluctuate over time, with genes gaining, losing or conserving introns in a complex process that is not entirely understood. One approach for studying intron evolution is to compare introns with respect to position and type within closely related gen...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbp051

    authors: Wilkerson MD,Ru Y,Brendel VP

    update_date:2009-11-01 00:00:00

  • Structural database resources for biological macromolecules.

    abstract::This Briefing reviews the widely used, currently active, up-to-date databases derived from the worldwide Protein Data Bank (PDB) to facilitate browsing, finding and exploring its entries. These databases contain visualization and analysis tools tailored to specific kinds of molecules and interactions, often including ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbw049

    authors: Abriata LA

    update_date:2017-07-01 00:00:00

  • Computational methods for Gene Orthology inference.

    abstract::Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis,...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbr030

    authors: Kristensen DM,Wolf YI,Mushegian AR,Koonin EV

    update_date:2011-09-01 00:00:00

  • Extended application of genomic selection to screen multiomics data for prognostic signatures of prostate cancer.

    abstract::Prognostic tests using expression profiles of several dozen genes help provide treatment choices for prostate cancer (PCa). However, these tests require improvement to meet the clinical need for resolving overtreatment, which continues to be a pervasive problem in PCa management. Genomic selection (GS) methodology, wh...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa197

    authors: Li R,Wang S,Cui Y,Qu H,Chater JM,Zhang L,Wei J,Wang M,Xu Y,Yu L,Lu J,Feng Y,Zhou R,Huang Y,Ma R,Zhu J,Zhong W,Jia Z

    update_date:2020-09-08 00:00:00

  • SARS-CoV-2 hot-spot mutations are significantly enriched within inverted repeats and CpG island loci.

    abstract::SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 disease in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, thereby creating a strong background for deep bioinformatics studies of the SARS-CoV-2 ge...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa385

    authors: Goswami P,Bartas M,Lexa M,Bohálová N,Volná A,Červeň J,Červeňová V,Pečinka P,Špunda V,Fojta M,Brázda V

    update_date:2020-12-21 00:00:00

  • Automated glycopeptide analysis--review of current state and future directions.

    abstract::Glycosylation of proteins is involved in immune defense, cell-cell adhesion, cellular recognition and pathogen binding and is one of the most common and complex post-translational modifications. Science is still struggling to assign detailed mechanisms and functions to this form of conjugation. Even the structural ana...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bbs045

    authors: Dallas DC,Martin WF,Hua S,German JB

    update_date:2013-05-01 00:00:00

  • Hybrid modelling of biological systems using fuzzy continuous Petri nets.

    abstract::Integrated modelling of biological systems is challenged by composing components with sufficient kinetic data and components with insufficient kinetic data or components built only using experts' experience and knowledge. Fuzzy continuous Petri nets (FCPNs) combine continuous Petri nets with fuzzy inference systems, a...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbz114

    authors: Liu F,Sun W,Heiner M,Gilbert D

    update_date:2021-01-18 00:00:00

  • FINDSITE: a combined evolution/structure-based approach to protein function prediction.

    abstract::A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the appr...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/bbp017

    authors: Skolnick J,Brylinski M

    update_date:2009-07-01 00:00:00

  • Discovery of G-quadruplex-forming sequences in SARS-CoV-2.

    abstract::The outbreak caused by the novel coronavirus SARS-CoV-2 has been declared a global health emergency. G-quadruplex structures in genomes have long been considered essential for regulating a number of biological processes in a plethora of organisms. We have analyzed and identified 25 four contiguous GG runs (G2NxG2NyG2N...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa114

    authors: Ji D,Juhas M,Tsang CM,Kwok CK,Li Y,Zhang Y

    update_date:2020-06-01 00:00:00

  • Protein structure prediction in genomics.

    abstract::As the number of completely sequenced genomes rapidly increases, including now the complete Human Genome sequence, the post-genomic problems of genome-scale protein structure determination and the issue of gene function identification become ever more pressing. In fact, these problems can be seen as interrelated in th...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article, Review

    doi:10.1093/bib/2.2.111

    authors: Jones DT

    update_date:2001-05-01 00:00:00

  • The microRNA target site landscape is a novel molecular feature associating alternative polyadenylation with immune evasion activity in breast cancer.

    abstract::Alternative polyadenylation (APA) in breast tumor samples results in the removal/addition of cis-regulatory elements such as microRNA (miRNA) target sites in the 3'-untranslated region (3'-UTRs) of genes. Although previous computational APA studies focused on a subset of genes strongly affected by APA (APA genes), we ...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa191

    authors: Kim S,Bai Y,Fan Z,Diergaarde B,Tseng GC,Park HJ

    update_date:2020-08-26 00:00:00

  • Capacity building for whole genome sequencing of Mycobacterium tuberculosis and bioinformatics in high TB burden countries.

    abstract:BACKGROUND:Whole genome sequencing (WGS) is increasingly used for Mycobacterium tuberculosis (Mtb) research. Countries with the highest tuberculosis (TB) burden face important challenges to integrate WGS into surveillance and research. METHODS:We assessed the global status of Mtb WGS and developed a 3-week training co...

    journal_title:Briefings in bioinformatics

    pub_type: Journal Article

    doi:10.1093/bib/bbaa246

    authors: Rivière E,Heupink TH,Ismail N,Dippenaar A,Clarke C,Abebe G,Heusden P,Warren R,Meehan CJ,Van Rie A

    update_date:2020-10-03 00:00:00