Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods.

Abstract:

BACKGROUND:Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of the phenotype descriptions into their atomic elements. RESULTS:We present a two-phase hybrid segmentation method that combines a series individual classifiers using different aggregation schemes (set operations and simple majority voting). The approach is tested on a corpus comprised of skeletal phenotype descriptions emerged from the Human Phenotype Ontology. Experimental results show that the best hybrid method achieves an F-Score of 97.05% in the first phase and F-Scores of 97.16% / 94.50% in the second phase. CONCLUSIONS:The performance of the initial segmentation of anatomical entities and qualities (phase I) is not affected by the presence / absence of external resources, such as domain dictionaries. From a generic perspective, hybrid methods may not always improve the segmentation accuracy as they are heavily dependent on the goal and data characteristics.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Groza T,Hunter J,Zankl A

doi

10.1186/1471-2105-13-265

subject

Has Abstract

pub_date

2012-10-15 00:00:00

pages

265

issn

1471-2105

pii

1471-2105-13-265

journal_volume

13

pub_type

杂志文章
  • Predicting MoRFs in protein sequences using HMM profiles.

    abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1375-0

    authors: Sharma R,Kumar S,Tsunoda T,Patil A,Sharma A

    更新日期:2016-12-22 00:00:00

  • Detection of biological switches using the method of Gröebner bases.

    abstract:BACKGROUND:Bistability and ability to switch between two stable states is the hallmark of cellular responses. Cellular signaling pathways often contain bistable switches that regulate the transmission of the extracellular information to the nucleus where important biological functions are executed. RESULTS:In this wor...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3155-0

    authors: Arkun Y

    更新日期:2019-11-28 00:00:00

  • Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny.

    abstract:BACKGROUND:The rabbit is an important model organism used in a wide range of biomedical research. However, the rabbit genome is still sparsely annotated, thus prohibiting extensive functional analysis of gene sets derived from whole-genome experiments. We developed a web-based application that provides augmented annota...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-84

    authors: Craig DB,Kannan S,Dombkowski AA

    更新日期:2012-05-08 00:00:00

  • Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments.

    abstract:BACKGROUND:The main research topic in this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is measured and designed to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of DEGs in an arbi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3302-7

    authors: Hur B,Kang D,Lee S,Moon JH,Lee G,Kim S

    更新日期:2019-12-27 00:00:00

  • SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes.

    abstract:BACKGROUND:Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1437-3

    authors: Mägi R,Suleimanov YV,Clarke GM,Kaakinen M,Fischer K,Prokopenko I,Morris AP

    更新日期:2017-01-11 00:00:00

  • An automated framework for understanding structural variations in the binding grooves of MHC class II molecules.

    abstract:BACKGROUND:MHC/HLA class II molecules are important components of the immune system and play a critical role in processes such as phagocytosis. Understanding peptide recognition properties of the hundreds of MHC class II alleles is essential to appreciate determinants of antigenicity and ultimately to predict epitopes....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-S1-S55

    authors: Yeturu K,Utriainen T,Kemp GJ,Chandra N

    更新日期:2010-01-18 00:00:00

  • Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

    abstract:BACKGROUND:Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. Th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-99

    authors: Vlasblom J,Wodak SJ

    更新日期:2009-03-30 00:00:00

  • TMB-Hunt: an amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins.

    abstract:BACKGROUND:Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both gram negative and acid fast gram positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discrimin...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-56

    authors: Garrow AG,Agnew A,Westhead DR

    更新日期:2005-03-15 00:00:00

  • Protein function prediction by collective classification with explicit and implicit edges in protein-protein interaction networks.

    abstract:BACKGROUND:Protein function prediction is an important problem in the post-genomic era. Recent advances in experimental biology have enabled the production of vast amounts of protein-protein interaction (PPI) data. Thus, using PPI data to functionally annotate proteins has been extensively studied. However, most existi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-S12-S4

    authors: Xiong W,Liu H,Guan J,Zhou S

    更新日期:2013-01-01 00:00:00

  • Biotite: a unifying open source computational biology framework in Python.

    abstract:BACKGROUND:As molecular biology is creating an increasing amount of sequence and structure data, the multitude of software to analyze this data is also rising. Most of the programs are made for a specific task, hence the user often needs to combine multiple programs in order to reach a goal. This can make the data proc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2367-z

    authors: Kunzmann P,Hamacher K

    更新日期:2018-10-01 00:00:00

  • RECOVIR: an application package to automatically identify some single stranded RNA viruses using capsid protein residues that uniquely distinguish among these viruses.

    abstract:BACKGROUND:Most single stranded RNA (ssRNA) viruses mutate rapidly to generate large number of strains having highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similar...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-379

    authors: Zhu D,Fox GE,Chakravarty S

    更新日期:2007-10-10 00:00:00

  • Critique of the pairwise method for estimating qPCR amplification efficiency: beware of correlated data!

    abstract:BACKGROUND:A recently proposed method for estimating qPCR amplification efficiency E analyzes fluorescence intensity ratios from pairs of points deemed to lie in the exponential growth region on the amplification curves for all reactions in a dilution series. This method suffers from a serious problem: The resulting ra...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03604-4

    authors: Tellinghuisen J

    更新日期:2020-07-08 00:00:00

  • Meta-aligner: long-read alignment based on genome statistics.

    abstract:BACKGROUND:Current development of sequencing technologies is towards generating longer and noisier reads. Evidently, accurate alignment of these reads play an important role in any downstream analysis. Similarly, reducing the overall cost of sequencing is related to the time consumption of the aligner. The tradeoff bet...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1518-y

    authors: Nashta-Ali D,Aliyari A,Ahmadian Moghadam A,Edrisi MA,Motahari SA,Hossein Khalaj B

    更新日期:2017-02-23 00:00:00

  • Metabolic network alignment in large scale by network compression.

    abstract::Metabolic network alignment is a system scale comparative analysis that discovers important similarities and differences across different metabolisms and organisms. Although the problem of aligning metabolic networks has been considered in the past, the computational complexity of the existing solutions has so far lim...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S3-S2

    authors: Ay F,Dang M,Kahveci T

    更新日期:2012-03-21 00:00:00

  • Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.

    abstract:BACKGROUND:T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunolo...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S1-S19

    authors: Zhang GL,Khan AM,Srinivasan KN,Heiny A,Lee K,Kwoh CK,August JT,Brusic V

    更新日期:2008-01-01 00:00:00

  • GenomeBlast: a web tool for small genome comparison.

    abstract:BACKGROUND:Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inferen...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-S4-S18

    authors: Lu G,Jiang L,Helikar RM,Rowley TW,Zhang L,Chen X,Moriyama EN

    更新日期:2006-12-12 00:00:00

  • Algebraic Dynamic Programming over general data structures.

    abstract:BACKGROUND:Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform sto...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S19-S2

    authors: zu Siederdissen CH,Prohaska SJ,Stadler PF

    更新日期:2015-01-01 00:00:00

  • Detection of gene pathways with predictive power for breast cancer prognosis.

    abstract:BACKGROUND:Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have the inherent pathway structure, where pat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-1

    authors: Ma S,Kosorok MR

    更新日期:2010-01-01 00:00:00

  • Robust structure measures of metabolic networks that predict prokaryotic optimal growth temperature.

    abstract:BACKGROUND:Metabolic networks reflect the relationships between metabolites (biomolecules) and the enzymes (proteins), and are of particular interest since they describe all chemical reactions of an organism. The metabolic networks are constructed from the genome sequence of an organism, and the graphs can be used to s...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3112-y

    authors: Weber Zendrera A,Sokolovska N,Soula HA

    更新日期:2019-10-15 00:00:00

  • ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark.

    abstract:BACKGROUND:The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious pro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2665-0

    authors: Xiao A,Wu Z,Dong S

    更新日期:2019-02-14 00:00:00

  • Pripper: prediction of caspase cleavage sites from whole proteomes.

    abstract:BACKGROUND:Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to pre...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-320

    authors: Piippo M,Lietzén N,Nevalainen OS,Salmi J,Nyman TA

    更新日期:2010-06-15 00:00:00

  • TAMEE: data management and analysis for tissue microarrays.

    abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-81

    authors: Thallinger GG,Baumgartner K,Pirklbauer M,Uray M,Pauritsch E,Mehes G,Buck CR,Zatloukal K,Trajanoski Z

    更新日期:2007-03-07 00:00:00

  • Multi-omic analysis of signalling factors in inflammatory comorbidities.

    abstract:BACKGROUND:Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and relate...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2413-x

    authors: Xiao H,Bartoszek K,Lio' P

    更新日期:2018-11-30 00:00:00

  • A platform for processing expression of short time series (PESTS).

    abstract:BACKGROUND:Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time se...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-13

    authors: Sinha A,Markatou M

    更新日期:2011-01-11 00:00:00

  • The InDeVal insertion/deletion evaluation tool: a program for finding target regions in DNA sequences and for aiding in sequence comparison.

    abstract:BACKGROUND:The program InDeVal was originally developed to help researchers find known regions of insertion/deletion activity (with the exception of isolated single-base indels) in newly determined Poaceae trnL-F sequences and compare them with 533 previously determined sequences. It is supplied with input files design...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-173

    authors: Stoneberg Holt SD,Holt JA

    更新日期:2004-10-29 00:00:00

  • Membrane protein orientation and refinement using a knowledge-based statistical potential.

    abstract:BACKGROUND:Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-276

    authors: Nugent T,Jones DT

    更新日期:2013-09-18 00:00:00

  • Glycosylator: a Python framework for the rapid modeling of glycans.

    abstract:BACKGROUND:Carbohydrates are a class of large and diverse biomolecules, ranging from a simple monosaccharide to large multi-branching glycan structures. The covalent linkage of a carbohydrate to the nitrogen atom of an asparagine, a process referred to as N-linked glycosylation, plays an important role in the physiolog...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3097-6

    authors: Lemmin T,Soto C

    更新日期:2019-10-22 00:00:00

  • ImmunoGlobe: enabling systems immunology with a manually curated intercellular immune interaction network.

    abstract:BACKGROUND:While technological advances have made it possible to profile the immune system at high resolution, translating high-throughput data into knowledge of immune mechanisms has been challenged by the complexity of the interactions underlying immune processes. Tools to explore the immune network are critical for ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03702-3

    authors: Atallah MB,Tandon V,Hiam KJ,Boyce H,Hori M,Atallah W,Spitzer MH,Engleman E,Mallick P

    更新日期:2020-08-10 00:00:00

  • How to decide which are the most pertinent overly-represented features during gene set enrichment analysis.

    abstract:BACKGROUND:The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-332

    authors: Barriot R,Sherman DJ,Dutour I

    更新日期:2007-09-11 00:00:00

  • COPASAAR--a database for proteomic analysis of single amino acid repeats.

    abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-196

    authors: Depledge DP,Dalby AR

    更新日期:2005-08-03 00:00:00