BIOZON: a system for unification, management and analysis of heterogeneous biological data.

Abstract:

BACKGROUND: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types is increasing rapidly. Among the problems that must be addressed are integrity, consistency, redundancy, connectivity, expressiveness and updatability.

DESCRIPTION: Here we present a system (Biozon) that addresses these problems and offers biologists a new knowledge resource to navigate and explore. Biozon unifies multiple biological databases covering a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It differs fundamentally from previous efforts in that it uses a single, extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. Integrating similarity data allows knowledge to be propagated through inference and supports fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated.

CONCLUSION: The Biozon system is an extensive knowledge resource of heterogeneous biological data. It currently holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible online at http://biozon.org through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more.
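The abstract describes Biozon only at a high level. As a rough illustration of how typed documents, typed relations, a type ontology and precomputed similarity edges could combine to support "fuzzy" searches, here is a minimal Python sketch. The type names, the relation names (`similar_to`, `interacts_with`), `TYPE_PARENT`, `is_a` and the `DocumentGraph` class are all hypothetical; this is not Biozon's actual schema, storage engine or query interface.

```python
from collections import defaultdict

# Hypothetical document-type ontology (child type -> parent type).
# These names are illustrative, not Biozon's real type hierarchy.
TYPE_PARENT = {
    "protein": "sequence",
    "dna": "sequence",
    "sequence": "document",
    "interaction": "document",
    "pathway": "document",
}


def is_a(doc_type, ancestor):
    """Return True if doc_type equals ancestor or descends from it."""
    current = doc_type
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENT.get(current)
    return False


class DocumentGraph:
    """Toy graph: nodes are typed documents, edges are typed relations."""

    def __init__(self):
        self.doc_type = {}
        self.edges = defaultdict(set)  # (doc_id, relation) -> neighbour ids

    def add_document(self, doc_id, doc_type):
        self.doc_type[doc_id] = doc_type

    def relate(self, a, relation, b, symmetric=False):
        self.edges[(a, relation)].add(b)
        if symmetric:
            self.edges[(b, relation)].add(a)

    def neighbours(self, doc_id, relation):
        # .get() avoids inserting empty entries into the defaultdict
        return self.edges.get((doc_id, relation), set())

    def _filter(self, ids, target_type):
        # Keep only documents whose type falls under target_type in the ontology
        if target_type is None:
            return set(ids)
        return {d for d in ids if is_a(self.doc_type[d], target_type)}

    def exact_query(self, doc_id, relation, target_type=None):
        """Documents directly linked to doc_id by `relation`."""
        return self._filter(self.neighbours(doc_id, relation), target_type)

    def fuzzy_query(self, doc_id, relation, target_type=None):
        """Also follow `relation` from documents that a precomputed
        'similar_to' edge links to doc_id (one-step propagation)."""
        sources = {doc_id} | self.neighbours(doc_id, "similar_to")
        hits = set()
        for src in sources:
            hits |= self.neighbours(src, relation)
        hits.discard(doc_id)
        return self._filter(hits, target_type)


if __name__ == "__main__":
    g = DocumentGraph()
    for pid in ("P1", "P2", "P3"):
        g.add_document(pid, "protein")
    g.relate("P1", "similar_to", "P2", symmetric=True)
    g.relate("P2", "interacts_with", "P3", symmetric=True)
    print(g.exact_query("P1", "interacts_with"))  # set()
    print(g.fuzzy_query("P1", "interacts_with"))  # {'P3'}
    print(g.fuzzy_query("P1", "interacts_with", target_type="pathway"))  # set()
```

Storing similarity as ordinary precomputed edges turns a fuzzy query into a slightly wider graph traversal. The single unweighted expansion step here is a simplification; a production system would plausibly weight or threshold similarity scores before propagating.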

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Birkland A,Yona G

doi

10.1186/1471-2105-7-70

keywords:

subject

Has Abstract

pub_date

2006-02-15 00:00:00

pages

70

issn

1471-2105

pii

1471-2105-7-70

journal_volume

7

pub_type

Journal Article
  • Protein network prediction and topological analysis in Leishmania major as a tool for drug target selection.

    abstract:BACKGROUND:Leishmaniasis is a virulent parasitic infection that causes a worldwide disease burden. Most treatments have toxic side-effects and efficacy has decreased due to the emergence of resistant strains. The outlook is worsened by the absence of promising drug targets for this disease. We have taken a computationa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-484

    authors: Flórez AF,Park D,Bhak J,Kim BC,Kuchinsky A,Morris JH,Espinosa J,Muskus C

    updated: 2010-09-27 00:00:00

  • Identifying and quantifying metabolites by scoring peaks of GC-MS data.

    abstract:BACKGROUND:Metabolomics is one of most recent omics technologies. It has been applied on fields such as food science, nutrition, drug discovery and systems biology. For this, gas chromatography-mass spectrometry (GC-MS) has been largely applied and many computational tools have been developed to support the analysis of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0374-2

    authors: Aggio RB,Mayor A,Reade S,Probert CS,Ruggiero K

    updated: 2014-12-10 00:00:00

  • OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments.

    abstract:BACKGROUND:Differentially expressed genes are typically identified by analyzing the variation between replicate measurements. These procedures implicitly assume that there are no systematic errors in the data even though several sources of systematic error are known. RESULTS:OpWise estimates the amount of systematic e...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-19

    authors: Price MN,Arkin AP,Alm EJ

    updated: 2006-01-13 00:00:00

  • NEAT: an efficient network enrichment analysis test.

    abstract:BACKGROUND:Network enrichment analysis is a powerful method, which allows to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1203-6

    authors: Signorelli M,Vinciotti V,Wit EC

    updated: 2016-09-05 00:00:00

  • Finding sRNA generative locales from high-throughput sequencing data with NiBLS.

    abstract:BACKGROUND:Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a pauci...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-93

    authors: MacLean D,Moulton V,Studholme DJ

    updated: 2010-02-18 00:00:00

  • SemaTyP: a knowledge graph based literature mining method for drug discovery.

    abstract:BACKGROUND:Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are the two main drug discovery methods for now, which have successfully discovered a series of drugs. However, development of new drugs is still an extremely...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2167-5

    authors: Sang S,Yang Z,Wang L,Liu X,Lin H,Wang J

    updated: 2018-05-30 00:00:00

  • Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach.

    abstract:BACKGROUND:Cellular functions are coordinately carried out by groups of genes forming functional modules. Identifying such modules in the transcriptional regulatory network (TRN) of organisms is important for understanding the structure and function of these fundamental cellular networks and essential for the emerging ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-199

    authors: Ma HW,Buer J,Zeng AP

    updated: 2004-12-16 00:00:00

  • CellSim: a novel software to calculate cell similarity and identify their co-regulation networks.

    abstract:BACKGROUND:Cell direct reprogramming technology has been rapidly developed with its low risk of tumor risk and avoidance of ethical issues caused by stem cells, but it is still limited to specific cell types. Direct reprogramming from an original cell to target cell type needs the cell similarity and cell specific regu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2699-3

    authors: Li L,Che D,Wang X,Zhang P,Rahman SU,Zhao J,Yu J,Tao S,Lu H,Liao M

    updated: 2019-03-04 00:00:00

  • Ab-origin: an enhanced tool to identify the sourcing gene segments in germline for rearranged antibodies.

    abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S12-S20

    authors: Wang X,Wu D,Zheng S,Sun J,Tao L,Li Y,Cao Z

    updated: 2008-12-12 00:00:00

  • High-order dynamic Bayesian Network learning with hidden common causes for causal gene regulatory network.

    abstract:BACKGROUND:Inferring gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, High-Order Dynamic Bayesian Network (HO-DBN) is a good model of GRN. Howeve...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0823-6

    authors: Lo LY,Wong ML,Lee KH,Leung KS

    updated: 2015-11-25 00:00:00

  • Systematic integration of experimental data and models in systems biology.

    abstract:BACKGROUND:The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-582

    authors: Li P,Dada JO,Jameson D,Spasic I,Swainston N,Carroll K,Dunn W,Khan F,Malys N,Messiha HL,Simeonidis E,Weichart D,Winder C,Wishart J,Broomhead DS,Goble CA,Gaskell SJ,Kell DB,Westerhoff HV,Mendes P,Paton NW

    updated: 2010-11-29 00:00:00

  • Notos - a galaxy tool to analyze CpN observed expected ratios for inferring DNA methylation types.

    abstract:BACKGROUND:DNA methylation patterns store epigenetic information in the vast majority of eukaryotic species. The relatively high costs and technical challenges associated with the detection of DNA methylation however have created a bias in the number of methylation studies towards model organisms. Consequently, it rema...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2115-4

    authors: Bulla I,Aliaga B,Lacal V,Bulla J,Grunau C,Chaparro C

    updated: 2018-03-27 00:00:00

  • Computational evaluation of TIS annotation for prokaryotic genomes.

    abstract:BACKGROUND:Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks. RESULTS:Based on a homogeneity assumptio...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-160

    authors: Hu GQ,Zheng X,Ju LN,Zhu H,She ZS

    updated: 2008-03-25 00:00:00

  • PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    abstract:BACKGROUND:High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-167

    authors: Maji RK,Sarkar A,Khatua S,Dasgupta S,Ghosh Z

    updated: 2014-06-04 00:00:00

  • Comparative evaluation of gene set analysis approaches for RNA-Seq data.

    abstract:BACKGROUND:Over the last few years transcriptome sequencing (RNA-Seq) has almost completely taken over microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes which are differentially expressed between two or more conditions. Despite the importance of ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0397-8

    authors: Rahmatallah Y,Emmert-Streib F,Glazko G

    updated: 2014-12-05 00:00:00

  • Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers.

    abstract:BACKGROUND:Supercomputers have become indispensable infrastructures in science and industries. In particular, most state-of-the-art scientific results utilize massively parallel supercomputers ranked in TOP500. However, their use is still limited in the bioinformatics field due to the fundamental fact that the asynchro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3085-x

    authors: Ito S,Yadome M,Nishiki T,Ishiduki S,Inoue H,Yamaguchi R,Miyano S

    updated: 2019-12-02 00:00:00

  • Determination of strongly overlapping signaling activity from microarray data.

    abstract:BACKGROUND:As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target ide...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-99

    authors: Bidaut G,Suhre K,Claverie JM,Ochs MF

    updated: 2006-02-28 00:00:00

  • Benchmarking the HLA typing performance of Polysolver and Optitype in 50 Danish parental trios.

    abstract:BACKGROUND:The adaptive immune response intrinsically depends on hypervariable human leukocyte antigen (HLA) genes. Concomitantly, correct HLA phenotyping is crucial for successful donor-patient matching in organ transplantation. The cost and technical limitations of current laboratory techniques, together with advance...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2239-6

    authors: Matey-Hernandez ML,Danish Pan Genome Consortium.,Brunak S,Izarzugaza JMG

    updated: 2018-06-25 00:00:00

  • BRCA-Pathway: a structural integration and visualization system of TCGA breast cancer data on KEGG pathways.

    abstract:BACKGROUND:Bioinformatics research for finding biological mechanisms can be done by analysis of transcriptome data with pathway based interpretation. Therefore, researchers have tried to develop tools to analyze transcriptome data with pathway based interpretation. Over the years, the amount of omics data has become hu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2016-6

    authors: Kim I,Choi S,Kim S

    updated: 2018-02-19 00:00:00

  • A unifying model of genome evolution under parsimony.

    abstract:BACKGROUND:Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution yet to date they have largely been pursued in isolation. RESULTS:We present a data structure called a history graph that offers a practical ba...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-206

    authors: Paten B,Zerbino DR,Hickey G,Haussler D

    updated: 2014-06-19 00:00:00

  • An automatic method to calculate heart rate from zebrafish larval cardiac videos.

    abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2166-6

    authors: Kang CP,Tu HC,Fu TF,Wu JM,Chu PH,Chang DT

    updated: 2018-05-09 00:00:00

  • Attenuating dependence on structural data in computing protein energy landscapes.

    abstract:BACKGROUND:Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein's energy landscape ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2822-5

    authors: Morris D,Maximova T,Plaku E,Shehu A

    updated: 2019-06-06 00:00:00

  • Genomic prediction of tuberculosis drug-resistance: benchmarking existing databases and prediction algorithms.

    abstract:BACKGROUND:It is possible to predict whether a tuberculosis (TB) patient will fail to respond to specific antibiotics by sequencing the genome of the infecting Mycobacterium tuberculosis (Mtb) and observing whether the pathogen carries specific mutations at drug-resistance sites. This advancement has led to the collati...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2658-z

    authors: Ngo TM,Teo YY

    updated: 2019-02-08 00:00:00

  • DNLC: differential network local consistency analysis.

    abstract:BACKGROUND:The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3046-4

    authors: Lu J,Lu Y,Ding Y,Xiao Q,Liu L,Cai Q,Kong Y,Bai Y,Yu T

    updated: 2019-12-24 00:00:00

  • LSX: automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference.

    abstract:BACKGROUND:Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS3, a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of sequences that evolve at a r...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3020-1

    authors: Rivera-Rivera CJ,Montoya-Burgos JI

    updated: 2019-08-13 00:00:00

  • Insertion and deletion correcting DNA barcodes based on watermarks.

    abstract:BACKGROUND:Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries, can individually be labeled with barcodes for a joint sequenc...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-015-0482-7

    authors: Kracht D,Schober S

    updated: 2015-02-18 00:00:00

  • Spectral estimation in unevenly sampled space of periodically expressed microarray time series data.

    abstract:BACKGROUND:Periodogram analysis of time-series is widespread in biology. A new challenge for analyzing the microarray time series data is to identify genes that are periodically expressed. Such challenge occurs due to the fact that the observed time series usually exhibit non-idealities, such as noise, short length, an...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-137

    authors: Liew AW,Xian J,Wu S,Smith D,Yan H

    updated: 2007-04-24 00:00:00

  • Moiety modeling framework for deriving moiety abundances from mass spectrometry measured isotopologues.

    abstract:BACKGROUND:Stable isotope tracing can follow individual atoms through metabolic transformations through the detection of the incorporation of stable isotope within metabolites. This resulting data can be interpreted in terms related to metabolic flux. However, detection of a stable isotope in metabolites by mass spectr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3096-7

    authors: Jin H,Moseley HNB

    updated: 2019-10-28 00:00:00

  • A multiresolution approach to automated classification of protein subcellular location images.

    abstract:BACKGROUND:Fluorescence microscopy is widely used to determine the subcellular location of proteins. Efforts to determine location on a proteome-wide basis create a need for automated methods to analyze the resulting images. Over the past ten years, the feasibility of using machine learning methods to recognize all maj...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-210

    authors: Chebira A,Barbotin Y,Jackson C,Merryman T,Srinivasa G,Murphy RF,Kovacević J

    updated: 2007-06-19 00:00:00

  • Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling.

    abstract:BACKGROUND:Long-range interactions between regulatory DNA elements such as enhancers, insulators and promoters play an important role in regulating transcription. As chromatin contacts have been found throughout the human genome and in different cell types, spatial transcriptional control is now viewed as a general mec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-414

    authors: Rousseau M,Fraser J,Ferraiuolo MA,Dostie J,Blanchette M

    updated: 2011-10-25 00:00:00