A theorem proving approach for automatically synthesizing visualizations of flow cytometry data.

Abstract:

BACKGROUND: Polychromatic flow cytometry is widely used in the medical sciences, especially for studying the phenotypic properties of cells. The high dimensionality of the data it generates usually makes the data difficult to visualize. The naive solution of plotting a two-dimensional graph for every pair of observables becomes impractical as the number of dimensions increases. A natural alternative is to project the data from the original high-dimensional space to a lower-dimensional space while approximately preserving the overall relationships between the data points. An expert can then visualize and analyze this low-dimensional embedding of the original dataset.

RESULTS: This paper describes a new method, SANJAY, for visualizing high-dimensional flow cytometry datasets. The technique uses a decision procedure to automatically synthesize two-dimensional and three-dimensional projections of the original high-dimensional data while trying to minimize distortion. We compare SANJAY to the popular multidimensional scaling (MDS) approach on small data sets drawn from a representative set of benchmarks; in our experiments, SANJAY produces distortions that are 1.44 to 4.15 times smaller than those produced by MDS. SANJAY also outperforms the Random Projections technique in terms of the distortion of its projections.

CONCLUSIONS: We describe a new algorithmic technique that uses a symbolic decision procedure to automatically synthesize low-dimensional projections of flow cytometry data, which typically have a high number of dimensions. To our knowledge, this algorithm is the first application of automated theorem proving to automatically generating highly accurate, low-dimensional visualizations of high-dimensional data.
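
The abstract compares projections by their distortion but does not define the measure. The following minimal sketch is not SANJAY; it only illustrates the Random Projections baseline together with one common way to quantify distortion. It assumes distortion is the worst pairwise expansion divided by the worst pairwise contraction of Euclidean distances, and it uses synthetic points as a hypothetical stand-in for cytometry events. The function names are illustrative only.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def random_projection(points, target_dim, rng):
    """Project points with a random Gaussian matrix (a baseline, not SANJAY)."""
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(w * x for w, x in zip(row, p)) for row in matrix]
        for p in points
    ]


def distortion(original, projected):
    """Assumed definition: largest distance ratio over smallest (always >= 1)."""
    ratios = [
        d_low / d_high
        for d_high, d_low in zip(pairwise_distances(original),
                                 pairwise_distances(projected))
        if d_high > 0
    ]
    return max(ratios) / min(ratios)


rng = random.Random(0)
# Hypothetical stand-in for cytometry events: 30 cells, 8 markers each.
cells = [[rng.random() for _ in range(8)] for _ in range(30)]
embedding = random_projection(cells, 2, rng)
print(f"distortion of random 2-D projection: {distortion(cells, embedding):.2f}")
```

Under this assumed definition a value of 1 means every pairwise distance is preserved up to a common scale factor, so a lower value indicates a more faithful projection.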

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Raj S,Hussain F,Husein Z,Torosdagli N,Turgut D,Deo N,Pattanaik S,Chang CJ,Jha SK

doi

10.1186/s12859-017-1662-4

subject

Has Abstract

pub_date

2017-06-07 00:00:00

pages

245

issue

Suppl 8

issn

1471-2105

pii

10.1186/s12859-017-1662-4

journal_volume

18

pub_type

Journal Article
  • An efficient visualization tool for the analysis of protein mutation matrices.

    abstract:BACKGROUND:It is useful to develop a tool that would effectively describe protein mutation matrices specifically geared towards the identification of mutations that produce either wanted or unwanted effects, such as an increase or decrease in affinity, or a predisposition towards misfolding. Here, we describe a tool wh...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-218

    authors: David MP,Lapid CM,Daria VR

    update_date:2008-04-28 00:00:00

  • Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses.

    abstract:BACKGROUND:Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) is selected as the defa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1981-5

    authors: Ong E,Xie J,Ni Z,Liu Q,Sarntivijai S,Lin Y,Cooper D,Terryn R,Stathias V,Chung C,Schürer S,He Y

    update_date:2017-12-21 00:00:00

  • An iterative block-shifting approach to retention time alignment that preserves the shape and area of gas chromatography-mass spectrometry peaks.

    abstract:BACKGROUND:Metabolomics, petroleum and biodiesel chemistry, biomarker discovery, and other fields which rely on high-resolution profiling of complex chemical mixtures generate datasets which contain millions of detector intensity readings, each uniquely addressed along dimensions of time (e.g., retention time of chemic...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S9-S15

    authors: Chae M,Shmookler Reis RJ,Thaden JJ

    update_date:2008-08-12 00:00:00

  • MeDEStrand: an improved method to infer genome-wide absolute methylation levels from DNA enrichment data.

    abstract:BACKGROUND:DNA methylation of CpG dinucleotides is an essential epigenetic modification that plays a key role in transcription. Widely used DNA enrichment-based methods offer high coverage for measuring methylated CpG dinucleotides, with the lowest cost per CpG covered genome-wide. However, these methods measure the DN...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2574-7

    authors: Xu J,Liu S,Yin P,Bulun S,Dai Y

    update_date:2018-12-22 00:00:00

  • JContextExplorer: a tree-based approach to facilitate cross-species genomic context comparison.

    abstract:BACKGROUND:Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial speci...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-18

    authors: Seitzer P,Huynh TA,Facciotti MT

    update_date:2013-01-16 00:00:00

  • SBML-SAT: a systems biology markup language (SBML) based sensitivity analysis tool.

    abstract:BACKGROUND:It has long been recognized that sensitivity analysis plays a key role in modeling and analyzing cellular and biochemical processes. Systems biology markup language (SBML) has become a well-known platform for coding and sharing mathematical models of such processes. However, current SBML compatible software ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-342

    authors: Zi Z,Zheng Y,Rundell AE,Klipp E

    update_date:2008-08-15 00:00:00

  • MPD: multiplex primer design for next-generation targeted sequencing.

    abstract:BACKGROUND:Targeted resequencing offers a cost-effective alternative to whole-genome and whole-exome sequencing when investigating regions known to be associated with a trait or disease. There are a number of approaches to targeted resequencing, including microfluidic PCR amplification, which may be enhanced by multipl...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1453-3

    authors: Wingo TS,Kotlar A,Cutler DJ

    update_date:2017-01-05 00:00:00

  • Epiviz: a view inside the design of an integrated visual analysis software for genomics.

    abstract:BACKGROUND:Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S11-S4

    authors: Chelaru F,Corrada Bravo H

    update_date:2015-01-01 00:00:00

  • Novel domain expansion methods to improve the computational efficiency of the Chemical Master Equation solution for large biological networks.

    abstract:BACKGROUND:Numerical solutions of the chemical master equation (CME) are important for understanding the stochasticity of biochemical systems. However, solving CMEs is a formidable task. This task is complicated due to the nonlinear nature of the reactions and the size of the networks which result in different realizat...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03668-2

    authors: Kosarwal R,Kulasiri D,Samarasinghe S

    update_date:2020-11-11 00:00:00

  • A mixture of feature experts approach for protein-protein interaction prediction.

    abstract:BACKGROUND:High-throughput methods can directly detect the set of interacting proteins in model species but the results are often incomplete and exhibit high false positive and false negative rates. A number of researchers have recently presented methods for integrating direct and indirect data for predicting interacti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-S10-S6

    authors: Qi Y,Klein-Seetharaman J,Bar-Joseph Z

    update_date:2007-01-01 00:00:00

  • An automatic method to calculate heart rate from zebrafish larval cardiac videos.

    abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2166-6

    authors: Kang CP,Tu HC,Fu TF,Wu JM,Chu PH,Chang DT

    update_date:2018-05-09 00:00:00

  • Evolutionary Pareto-optimization of stably folding peptides.

    abstract:BACKGROUND:As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpf...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-109

    authors: Gronwald W,Hohm T,Hoffmann D

    update_date:2008-02-19 00:00:00

  • BioNanoAnalyst: a visualisation tool to assess genome assembly quality using BioNano data.

    abstract:BACKGROUND:Reference genome assemblies are valuable, as they provide insights into gene content, genetic evolution and domestication. The higher the quality of a reference genome assembly the more accurate the downstream analysis will be. During the last few years, major efforts have been made towards improving the qua...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1735-4

    authors: Yuan Y,Bayer PE,Scheben A,Chan CK,Edwards D

    update_date:2017-06-30 00:00:00

  • A novel method to identify cooperative functional modules: study of module coordination in the Saccharomyces cerevisiae cell cycle.

    abstract:BACKGROUND:Identifying key components in biological processes and their associations is critical for deciphering cellular functions. Recently, numerous gene expression and molecular interaction experiments have been reported in Saccharomyces cerevisiae, and these have enabled systematic studies. Although a number of ap...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-281

    authors: Hsu JT,Peng CH,Hsieh WP,Lan CY,Tang CY

    update_date:2011-07-12 00:00:00

  • Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.

    abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-16-S10-S2

    authors: Pyysalo S,Ohta T,Rak R,Rowley A,Chun HW,Jung SJ,Choi SP,Tsujii J,Ananiadou S

    update_date:2015-01-01 00:00:00

  • Application of the common base method to regression and analysis of covariance (ANCOVA) in qPCR experiments and subsequent relative expression calculation.

    abstract:BACKGROUND:Quantitative polymerase chain reaction (qPCR) is the technique of choice for quantifying gene expression. While the technique itself is well established, approaches for the analysis of qPCR data continue to improve. RESULTS:Here we expand on the common base method to develop procedures for testing linear re...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03696-y

    authors: Ganger MT,Dietz GD,Headley P,Ewing SJ

    update_date:2020-09-29 00:00:00

  • GraphCrunch: a tool for large network analyses.

    abstract:BACKGROUND:The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been develop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-70

    authors: Milenković T,Lai J,Przulj N

    update_date:2008-01-30 00:00:00

  • Automating dChip: toward reproducible sharing of microarray data analysis.

    abstract:BACKGROUND:During the past decade, many software packages have been developed for analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologist and data analysts. However, challenges arise when dC...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-231

    authors: Li C

    update_date:2008-05-08 00:00:00

  • A semi-parametric statistical model for integrating gene expression profiles across different platforms.

    abstract:BACKGROUND:Determining differentially expressed genes (DEGs) between biological samples is the key to understand how genotype gives rise to phenotype. RNA-seq and microarray are two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0847-y

    authors: Lyu Y,Li Q

    update_date:2016-01-11 00:00:00

  • Bison: bisulfite alignment on nodes of a cluster.

    abstract:BACKGROUND:DNA methylation changes are associated with a wide array of biological processes. Bisulfite conversion of DNA followed by high-throughput sequencing is increasingly being used to assess genome-wide methylation at single-base resolution. The relative slowness of most commonly used aligners for processing such...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-337

    authors: Ryan DP,Ehninger D

    update_date:2014-10-18 00:00:00

  • Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    abstract:BACKGROUND:Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as p...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-S11-S2

    authors: Nagar A,Hahsler M

    update_date:2013-01-01 00:00:00

  • Integrating gene expression and GO classification for PCA by preclustering.

    abstract:BACKGROUND:Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-158

    authors: De Haan JR,Piek E,van Schaik RC,de Vlieg J,Bauerschmidt S,Buydens LM,Wehrens R

    update_date:2010-03-26 00:00:00

  • Mutation status coupled with RNA-sequencing data can efficiently identify important non-significantly mutated genes serving as diagnostic biomarkers of endometrial cancer.

    abstract:BACKGROUND:Endometrial cancers (ECs) are one of the most common types of malignant tumor in females. Substantial efforts had been made to identify significantly mutated genes (SMGs) in ECs and use them as biomarkers for the classification of histological subtypes and the prediction of clinical outcomes. However, the im...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1891-6

    authors: Liu K,He L,Liu Z,Xu J,Liu Y,Kuang Q,Wen Z,Li M

    update_date:2017-12-28 00:00:00

  • MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are recognized as one of the most important families of non-coding RNAs that serve as important sequence-specific post-transcriptional regulators of gene expression. Identification of miRNAs is an important requirement for understanding the mechanisms of post-transcriptional regulation. Hu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-341

    authors: Huang TH,Fan B,Rothschild MF,Hu ZL,Li K,Zhao SH

    update_date:2007-09-17 00:00:00

  • Functionally specified protein signatures distinctive for each of the different blue copper proteins.

    abstract:BACKGROUND:Proteins having similar functions from different sources can be identified by the occurrence in their sequences, a conserved cluster of amino acids referred to as pattern, motif, signature or fingerprint. The wide usage of protein sequence analysis in par with the growth of databases signifies the importance...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-127

    authors: Giri AV,Anishetty S,Gautam P

    update_date:2004-09-09 00:00:00

  • πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios.

    abstract:BACKGROUND:Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-133

    authors: Bielejec F,Lemey P,Carvalho LM,Baele G,Rambaut A,Suchard MA

    update_date:2014-05-07 00:00:00

  • Cluster analysis of protein array results via similarity of Gene Ontology annotation.

    abstract:BACKGROUND:With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although there are several publicly available tools that facilitate the analysis of protein set...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-338

    authors: Wolting C,McGlade CJ,Tritchler D

    update_date:2006-07-12 00:00:00

  • GraphDNA: a Java program for graphical display of DNA composition analyses.

    abstract:BACKGROUND:Under conditions of no strand bias the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have bee...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-21

    authors: Thomas JM,Horspool D,Brown G,Tcherepanov V,Upton C

    update_date:2007-01-23 00:00:00

  • Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

    abstract:BACKGROUND:One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-410

    authors: Scheeff ED,Bourne PE

    update_date:2006-09-14 00:00:00

  • Simple binary segmentation frameworks for identifying variation in DNA copy number.

    abstract:BACKGROUND:Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-277

    authors: Yang TY

    update_date:2012-10-30 00:00:00