Improving ontologies by automatic reasoning and evaluation of logical definitions.

Abstract:

BACKGROUND: Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly being adopted as a method for quality control as well as for facilitating interoperability and data integration.

RESULTS: We show how automated reasoning over logical definitions of ontology terms can be used to improve ontology structure. We provide the Java software package GULO (Getting an Understanding of LOgical definitions), which allows fast and easy evaluation for any kind of logically decomposed ontology by generating a composite OWL ontology from appropriate subsets of the referenced ontologies and comparing the inferred relationships with the relationships asserted in the target ontology. As a case study we show how to use GULO to evaluate the logical definitions that have been developed for the Mammalian Phenotype Ontology (MPO).

CONCLUSIONS: Logical definitions of terms from biomedical ontologies represent an important resource for error and disagreement detection. GULO gives ontology curators a fast and simple tool for validation of their work.

journal_name

BMC Bioinformatics

authors

Köhler S,Bauer S,Mungall CJ,Carletti G,Smith CL,Schofield P,Gkoutos GV,Robinson PN

doi

10.1186/1471-2105-12-418

pub_date

2011-10-27 00:00:00

pages

418

issn

1471-2105

pii

1471-2105-12-418

journal_volume

12

pub_type

Journal Article

  • Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach.

    abstract:BACKGROUND:Cellular functions are coordinately carried out by groups of genes forming functional modules. Identifying such modules in the transcriptional regulatory network (TRN) of organisms is important for understanding the structure and function of these fundamental cellular networks and essential for the emerging ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-199

    authors: Ma HW,Buer J,Zeng AP

    updated:2004-12-16 00:00:00

  • Integrated olfactory receptor and microarray gene expression databases.

    abstract:BACKGROUND:Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-231

    authors: Liu N,Crasto CJ,Ma M

    updated:2007-06-30 00:00:00

  • Cluster analysis of protein array results via similarity of Gene Ontology annotation.

    abstract:BACKGROUND:With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although there are several publicly available tools that facilitate the analysis of protein set...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-338

    authors: Wolting C,McGlade CJ,Tritchler D

    updated:2006-07-12 00:00:00

  • LncRNA HOTAIR-mediated Wnt/β-catenin network modeling to predict and validate therapeutic targets for cartilage damage.

    abstract:BACKGROUND:Cartilage damage is a crucial feature involved in several pathological conditions characterized by joint disorders, such as osteoarthritis and rheumatoid arthritis. Accumulated evidences showed that Wnt/β-catenin pathway plays a role in the pathogenesis of cartilage damage. In addition, it is experimentally ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2981-4

    authors: Zhou W,He X,Chen Z,Fan D,Wang Y,Feng H,Zhang G,Lu A,Xiao L

    updated:2019-07-31 00:00:00

  • SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.

    abstract:BACKGROUND:Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero value...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-3067-z

    authors: Li Y,Fan TWM,Lane AN,Kang WY,Arnold SM,Stromberg AJ,Wang C,Chen L

    updated:2019-10-17 00:00:00

  • Evidence for intron length conservation in a set of mammalian genes associated with embryonic development.

    abstract:BACKGROUND:We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S9-S16

    authors: Seoighe C,Korir PK

    updated:2011-10-05 00:00:00

  • A context-blocks model for identifying clinical relationships in patient records.

    abstract:BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preli...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S3-S3

    authors: Islamaj Doğan R,Névéol A,Lu Z

    updated:2011-06-09 00:00:00

  • Accuracy of RNA-Seq and its dependence on sequencing depth.

    abstract:BACKGROUND:The cost of DNA sequencing has undergone a dramatical reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-S13-S5

    authors: Cai G,Li H,Lu Y,Huang X,Lee J,Müller P,Ji Y,Liang S

    updated:2012-01-01 00:00:00

  • Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    abstract:BACKGROUND:In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multip...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1273-5

    authors: Voillet V,Besse P,Liaubet L,San Cristobal M,González I

    updated:2016-10-03 00:00:00

  • Machine learning for discovering missing or wrong protein function annotations : A comparison using updated benchmark datasets.

    abstract:BACKGROUND:A massive amount of proteomic data is generated on a daily basis, nonetheless annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical m...

    journal_title:BMC bioinformatics

    pub_type: Journal Article, Review

    doi:10.1186/s12859-019-3060-6

    authors: Nakano FK,Lietaert M,Vens C

    updated:2019-09-23 00:00:00

  • WellInverter: a web application for the analysis of fluorescent reporter gene data.

    abstract:BACKGROUND:Fluorescent reporter genes have become widely used for monitoring gene expression in living cells. When a microbial strain carrying a reporter gene is grown in a microplate reader, the fluorescence and the absorbance (optical density) of the culture can be automatically measured every few minutes in a highly...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2920-4

    authors: Martin Y,Page M,Blanchet C,de Jong H

    updated:2019-06-11 00:00:00

  • Sequencing error correction without a reference genome.

    abstract:BACKGROUND:Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology, however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying sequencing errors fr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-367

    authors: Sleep JA,Schreiber AW,Baumann U

    updated:2013-12-18 00:00:00

  • NIFTI: an evolutionary approach for finding number of clusters in microarray data.

    abstract:BACKGROUND:Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learnin...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-40

    authors: Jonnalagadda S,Srinivasan R

    updated:2009-01-30 00:00:00

  • Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields.

    abstract:BACKGROUND:De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence, also the underlying genome sequence. However, sequencing errors and repeated subsequences render the identification of the true underlying sequence dif...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03740-x

    authors: Steyaert A,Audenaert P,Fostier J

    updated:2020-09-14 00:00:00

  • In silico modelling of hormone response elements.

    abstract:BACKGROUND:An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of fun...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-S4-S27

    authors: Stepanova M,Lin F,Lin VC

    updated:2006-12-12 00:00:00

  • Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    abstract:BACKGROUND:To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides inf...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-284

    authors: Oh SJ,Joung JG,Chang JH,Zhang BT

    updated:2006-06-06 00:00:00

  • Bounded search for de novo identification of degenerate cis-regulatory elements.

    abstract:BACKGROUND:The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-cou...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-254

    authors: Carlson JM,Chakravarty A,Khetani RS,Gross RH

    updated:2006-05-15 00:00:00

  • Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

    abstract:BACKGROUND:FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is im...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-364

    authors: Chadeau-Hyam M,Hoggart CJ,O'Reilly PF,Whittaker JC,De Iorio M,Balding DJ

    updated:2008-09-08 00:00:00

  • A benchmark study of sequence alignment methods for protein clustering.

    abstract:BACKGROUND:Protein sequence alignment analyses have become a crucial step for many bioinformatics studies during the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are two major approaches in sequence alignment. Former benchmark studies revealed drawbacks of MSA methods on nucleo...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2524-4

    authors: Wang Y,Wu H,Cai Y

    updated:2018-12-31 00:00:00

  • Fpocket: an open source platform for ligand pocket detection.

    abstract:BACKGROUND:Virtual screening methods start to be well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among those, structure based virtual screening (SBVS) approaches aim at docking collections of small compounds in the target structure to identify potent compound...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-168

    authors: Le Guilloux V,Schmidtke P,Tuffery P

    updated:2009-06-02 00:00:00

  • Multiple sequence alignment accuracy and evolutionary distance estimation.

    abstract:BACKGROUND:Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pair wise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-278

    authors: Rosenberg MS

    updated:2005-11-23 00:00:00

  • Using mechanistic Bayesian networks to identify downstream targets of the sonic hedgehog pathway.

    abstract:BACKGROUND:The topology of a biological pathway provides clues as to how a pathway operates, but rationally using this topology information with observed gene expression data remains a challenge. RESULTS:We introduce a new general-purpose analytic method called Mechanistic Bayesian Networks (MBNs) that allows for the ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-433

    authors: Shah A,Tenzen T,McMahon AP,Woolf PJ

    updated:2009-12-18 00:00:00

  • CorrelaGenes: a new tool for the interpretation of the human transcriptome.

    abstract:BACKGROUND:The amount of gene expression data available in public repositories has grown exponentially in the last years, now requiring new data mining tools to transform them in information easily accessible to biologists. RESULTS:By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) d...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S1-S6

    authors: Cremaschi P,Rovida S,Sacchi L,Lisa A,Calvi F,Montecucco A,Biamonti G,Bione S,Sacchi G

    updated:2014-01-01 00:00:00

  • R/BHC: fast Bayesian hierarchical clustering for microarray data.

    abstract:BACKGROUND:Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS:We present an R/Bioconductor port of a fast novel algorithm for...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-242

    authors: Savage RS,Heller K,Xu Y,Ghahramani Z,Truman WM,Grant M,Denby KJ,Wild DL

    updated:2009-08-06 00:00:00

  • Visualising very large phylogenetic trees in three dimensional hyperbolic space.

    abstract:BACKGROUND:Common existing phylogenetic tree visualisation tools are not able to display readable trees with more than a few thousand nodes. These existing methodologies are based in two dimensional space. RESULTS:We introduce the idea of visualising phylogenetic trees in three dimensional hyperbolic space with the Wa...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-48

    authors: Hughes T,Hyun Y,Liberles DA

    updated:2004-04-29 00:00:00

  • Quality determination and the repair of poor quality spots in array experiments.

    abstract:BACKGROUND:A common feature of microarray experiments is the occurrence of missing gene expression data. These missing values occur for a variety of reasons, in particular, because of the filtering of poor quality spots and the removal of undefined values when a logarithmic transformation is applied to negative backgro...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-234

    authors: Tom BD,Gilks WR,Brooke-Powell ET,Ajioka JW

    updated:2005-09-26 00:00:00

  • Modeling, validation and verification of three-dimensional cell-scaffold contacts from terabyte-sized images.

    abstract:BACKGROUND:Cell-scaffold contact measurements are derived from pairs of co-registered volumetric fluorescent confocal laser scanning microscopy (CLSM) images (z-stacks) of stained cells and three types of scaffolds (i.e., spun coat, large microfiber, and medium microfiber). Our analysis of the acquired terabyte-sized c...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1928-x

    authors: Bajcsy P,Yoon S,Florczyk SJ,Hotaling NA,Simon M,Szczypinski PM,Schaub NJ,Simon CG Jr,Brady M,Sriram RD

    updated:2017-11-28 00:00:00

  • BioIMAX: a Web 2.0 approach for easy exploratory and collaborative access to multivariate bioimage data.

    abstract:BACKGROUND:Innovations in biological and biomedical imaging produce complex high-content and multivariate image data. For decision-making and generation of hypotheses, scientists need novel information technology tools that enable them to visually explore and analyze the data and to discuss and communicate results or f...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-297

    authors: Loyek C,Rajpoot NM,Khan M,Nattkemper TW

    updated:2011-07-21 00:00:00

  • CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences.

    abstract:BACKGROUND:One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1952-x

    authors: Gdanetz K,Benucci GMN,Vande Pol N,Bonito G

    updated:2017-12-06 00:00:00

  • A new method for 2D gel spot alignment: application to the analysis of large sample sets in clinical proteomics.

    abstract:BACKGROUND:In current comparative proteomics studies, the large number of images generated by 2D gels is currently compared using spot matching algorithms. Unfortunately, differences in gel migration and sample variability make efficient spot alignment very difficult to obtain, and, as consequence most of the software ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-460

    authors: Pérès S,Molina L,Salvetat N,Granier C,Molina F

    updated:2008-10-28 00:00:00