Identification of discriminative characteristics for clusters from biologic data with InforBIO software.

Abstract:

BACKGROUND:There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integration of genotypic and phenotypic information for polyphasic studies is necessary for the classification and identification of microbes. Thus, we devised an algorithm that objectively identifies discriminative characteristics for focused clusters on generated trees from a dataset composed of coded data, such as phenotypic information. Moreover, this algorithm has been integrated into the polyphasic analysis software, InforBIO. RESULTS:We developed a differential-character-finding algorithm based on information measures and used this algorithm to identify the characteristic that best discriminates operational taxonomic unit clusters. For all characteristics in a dataset, the algorithm estimates commonality in focused clusters and diversity among clusters by scoring based on Shannon's and relative entropies. All the characteristics selected for scoring are equally weighted. Thresholds for the scores are defined to identify discriminative characteristics for clusters efficiently from a database. The unique feature of the algorithm, which is implemented in the InforBIO software, is that it can identify the phenotypic characteristics that discriminate and are associated with the clusters of a phylogenetic tree. We successfully applied this algorithm to the study of phylogenetic clusters of Pseudomonas species. CONCLUSION:The algorithm in the InforBIO software is a novel and useful approach for microbial polyphasic studies. The algorithm can also be applied to diverse cluster analyses. The InforBIO software is available from the download site http://wdcm.nig.ac.jp/inforbio/. This software is free for personal but not commercial use.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Tanaka N,Uchino M,Miyazaki S,Sugawara H

doi

10.1186/1471-2105-8-281

subject

Has Abstract

pub_date

2007-08-02 00:00:00

pages

281

issn

1471-2105

pii

1471-2105-8-281

journal_volume

8

pub_type

杂志文章
  • MPD: multiplex primer design for next-generation targeted sequencing.

    abstract:BACKGROUND:Targeted resequencing offers a cost-effective alternative to whole-genome and whole-exome sequencing when investigating regions known to be associated with a trait or disease. There are a number of approaches to targeted resequencing, including microfluidic PCR amplification, which may be enhanced by multipl...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1453-3

    authors: Wingo TS,Kotlar A,Cutler DJ

    更新日期:2017-01-05 00:00:00

  • Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

    abstract:BACKGROUND:One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to th...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-410

    authors: Scheeff ED,Bourne PE

    更新日期:2006-09-14 00:00:00

  • Evaluation of high-throughput functional categorization of human disease genes.

    abstract:BACKGROUND:Biological data that are well-organized by an ontology, such as Gene Ontology, enables high-throughput availability of the semantic web. It can also be used to facilitate high throughput classification of biomedical information. However, to our knowledge, no evaluation has been published on automating classi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S3-S7

    authors: Chen JL,Liu Y,Sam LT,Li J,Lussier YA

    更新日期:2007-05-09 00:00:00

  • Bison: bisulfite alignment on nodes of a cluster.

    abstract:BACKGROUND:DNA methylation changes are associated with a wide array of biological processes. Bisulfite conversion of DNA followed by high-throughput sequencing is increasingly being used to assess genome-wide methylation at single-base resolution. The relative slowness of most commonly used aligners for processing such...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-337

    authors: Ryan DP,Ehninger D

    更新日期:2014-10-18 00:00:00

  • Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures.

    abstract:BACKGROUND:Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignme...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S1-S48

    authors: Saito Y,Sato K,Sakakibara Y

    更新日期:2011-02-15 00:00:00

  • Considering scores between unrelated proteins in the search database improves profile comparison.

    abstract:BACKGROUND:Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of statistical significance of detected profile similar...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-399

    authors: Sadreyev RI,Wang Y,Grishin NV

    更新日期:2009-12-04 00:00:00

  • Prediction of MHC class I binding peptides, using SVMHC.

    abstract:BACKGROUND:T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for diagnosis and treatment of pathogens and cancer, as well...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-3-25

    authors: Dönnes P,Elofsson A

    更新日期:2002-09-11 00:00:00

  • Recovering rearranged cancer chromosomes from karyotype graphs.

    abstract:BACKGROUND:Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale va...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3208-4

    authors: Aganezov S,Zban I,Aksenov V,Alexeev N,Schatz MC

    更新日期:2019-12-17 00:00:00

  • Enrichment of homologs in insignificant BLAST hits by co-complex network alignment.

    abstract:BACKGROUND:Homology is a crucial concept in comparative genomics. The algorithm probably most widely used for homology detection in comparative genomics, is BLAST. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false positive hits. As a consequence, some BLAST hits are discar...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-86

    authors: Fokkens L,Botelho SM,Boekhorst J,Snel B

    更新日期:2010-02-12 00:00:00

  • proTRAC--a software for probabilistic piRNA cluster detection, visualization and analysis.

    abstract:BACKGROUND:Throughout the metazoan lineage, typically gonadal expressed Piwi proteins and their guiding piRNAs (~26-32nt in length) form a protective mechanism of RNA interference directed against the propagation of transposable elements (TEs). Most piRNAs are generated from genomic piRNA clusters. Annotation of experi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-5

    authors: Rosenkranz D,Zischler H

    更新日期:2012-01-10 00:00:00

  • Robust pathway sampling in phenotype prediction. Application to triple negative breast cancer.

    abstract:BACKGROUND:Phenotype prediction problems are usually considered ill-posed, as the amount of samples is very limited with respect to the scrutinized genetic probes. This fact complicates the sampling of the defective genetic pathways due to the high number of possible discriminatory genetic networks involved. In this re...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3356-6

    authors: Cernea A,Fernández-Martínez JL,deAndrés-Galiana EJ,Fernández-Ovies FJ,Alvarez-Machancoses O,Fernández-Muñiz Z,Saligan LN,Sonis ST

    更新日期:2020-03-11 00:00:00

  • Qxpak.5: old mixed model solutions for new genomics problems.

    abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-202

    authors: Pérez-Enciso M,Misztal I

    更新日期:2011-05-25 00:00:00

  • MQAPRank: improved global protein model quality assessment by learning-to-rank.

    abstract:BACKGROUND:Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure predict...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1691-z

    authors: Jing X,Dong Q

    更新日期:2017-05-25 00:00:00

  • ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark.

    abstract:BACKGROUND:The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious pro...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2665-0

    authors: Xiao A,Wu Z,Dong S

    更新日期:2019-02-14 00:00:00

  • A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases.

    abstract:BACKGROUND:The PathoLogic program constructs Pathway/Genome databases by using a genome's annotation to predict the set of metabolic pathways present in an organism. PathoLogic determines the set of reactions composing those pathways from the enzymes annotated in the organism's genome. Most annotation efforts fail to a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-76

    authors: Green ML,Karp PD

    更新日期:2004-06-09 00:00:00

  • An iterative block-shifting approach to retention time alignment that preserves the shape and area of gas chromatography-mass spectrometry peaks.

    abstract:BACKGROUND:Metabolomics, petroleum and biodiesel chemistry, biomarker discovery, and other fields which rely on high-resolution profiling of complex chemical mixtures generate datasets which contain millions of detector intensity readings, each uniquely addressed along dimensions of time (e.g., retention time of chemic...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S9-S15

    authors: Chae M,Shmookler Reis RJ,Thaden JJ

    更新日期:2008-08-12 00:00:00

  • Towards a supervised classification of neocortical interneuron morphologies.

    abstract:BACKGROUND:The challenge of classifying cortical interneurons is yet to be solved. Data-driven classification into established morphological types may provide insight and practical value. RESULTS:We trained models using 217 high-quality morphologies of rat somatosensory neocortex interneurons reconstructed by a single...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2470-1

    authors: Mihaljević B,Larrañaga P,Benavides-Piccione R,Hill S,DeFelipe J,Bielza C

    更新日期:2018-12-17 00:00:00

  • Low degree metabolites explain essential reactions and enhance modularity in biological networks.

    abstract:BACKGROUND:Recently there has been a lot of interest in identifying modules at the level of genetic and metabolic networks of organisms, as well as in identifying single genes and reactions that are essential for the organism. A goal of computational and systems biology is to go beyond identification towards an explana...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-118

    authors: Samal A,Singh S,Giri V,Krishna S,Raghuram N,Jain S

    更新日期:2006-03-08 00:00:00

  • Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group.

    abstract:BACKGROUND:Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists both present and future. There is broad consensus in the scientific and conservation communities tha...

    journal_title:BMC bioinformatics

    pub_type: 指南,杂志文章

    doi:10.1186/1471-2105-12-S15-S1

    authors: Moritz T,Krishnan S,Roberts D,Ingwersen P,Agosti D,Penev L,Cockerill M,Chavan V,Data Publishing Framework Task Group.

    更新日期:2011-01-01 00:00:00

  • A preliminary PET radiomics study of brain metastases using a fully automatic segmentation method.

    abstract:BACKGROUND:Positron Emission Tomography (PET) is increasingly utilized in radiomics studies for treatment evaluation purposes. Nevertheless, lesion volume identification in PET images is a critical and still challenging step in the process of radiomics, due to the low spatial resolution and high noise level of PET imag...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03647-7

    authors: Stefano A,Comelli A,Bravatà V,Barone S,Daskalovski I,Savoca G,Sabini MG,Ippolito M,Russo G

    更新日期:2020-09-16 00:00:00

  • Markov chain Monte Carlo for active module identification problem.

    abstract:BACKGROUND:Integrative network methods are commonly used for interpretation of high-throughput experimental biological data: transcriptomics, proteomics, metabolomics and others. One of the common approaches is finding a connected subnetwork of a global interaction network that best encompasses significant individual c...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03572-9

    authors: Alexeev N,Isomurodov J,Sukhov V,Korotkevich G,Sergushichev A

    更新日期:2020-11-18 00:00:00

  • Prioritizing disease genes with an improved dual label propagation framework.

    abstract:BACKGROUND:Prioritizing disease genes is trying to identify potential disease causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and facilitate drug development. Our motivation is inspired by label propagation algorithm and the false positive protein-protein intera...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2040-6

    authors: Zhang Y,Liu J,Liu X,Fan X,Hong Y,Wang Y,Huang Y,Xie M

    更新日期:2018-02-08 00:00:00

  • Partial mixture model for tight clustering of gene expression time-course.

    abstract:BACKGROUND:Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-287

    authors: Yuan Y,Li CT,Wilson R

    更新日期:2008-06-18 00:00:00

  • A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.

    abstract:BACKGROUND:A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to be kept up-to-date with all published studies on DDI. METHODS:This paper describes a hybrid linguistic approa...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S2-S1

    authors: Segura-Bedmar I,Martínez P,de Pablo-Sánchez C

    更新日期:2011-03-29 00:00:00

  • Detection of transposable elements by their compositional bias.

    abstract:BACKGROUND:Transposable elements (TE) are mobile genetic entities present in nearly all genomes. Previous work has shown that TEs tend to have a different nucleotide composition than the host genes, either considering codon usage bias or dinucleotide frequencies. We show here how these compositional differences can be ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-94

    authors: Andrieu O,Fiston AS,Anxolabéhère D,Quesneville H

    更新日期:2004-07-13 00:00:00

  • Asymmetric bagging and feature selection for activities prediction of drug molecules.

    abstract:BACKGROUND:Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S6-S7

    authors: Li GZ,Meng HH,Lu WC,Yang JY,Yang MQ

    更新日期:2008-05-28 00:00:00

  • An improved method for identifying functionally linked proteins using phylogenetic profiles.

    abstract:BACKGROUND:Phylogenetic profiles record the occurrence of homologs of genes across fully sequenced organisms. Proteins with similar profiles are typically components of protein complexes or metabolic pathways. Various existing methods measure similarity between two profiles and, hence, the likelihood that the two prote...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S4-S7

    authors: Cokus S,Mizutani S,Pellegrini M

    更新日期:2007-05-22 00:00:00

  • Preliminary nanopore cheminformatics analysis of aptamer-target binding strength.

    abstract:BACKGROUND:Aptamers are nucleic acids selected for their ability to bind to molecules of interest and may provide the basis for a whole new class of medicines. If the aptamer is simply a dsDNA molecule with a ssDNA overhang (a "sticky" end) then the segment of ssDNA that complements that overhang provides a known bindi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S7-S11

    authors: Thomson K,Amin I,Morales E,Winters-Hilt S

    更新日期:2007-11-01 00:00:00

  • Process attributes in bio-ontologies.

    abstract:BACKGROUND:Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attribute...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-217

    authors: Andrade AQ,Blondé W,Hastings J,Schulz S

    更新日期:2012-08-28 00:00:00

  • Accurate prediction of protein-lncRNA interactions by diffusion and HeteSim features across heterogeneous network.

    abstract:BACKGROUND:Identifying the interactions between proteins and long non-coding RNAs (lncRNAs) is of great importance to decipher the functional mechanisms of lncRNAs. However, current experimental techniques for detection of lncRNA-protein interactions are limited and inefficient. Many methods have been proposed to predi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2390-0

    authors: Deng L,Wang J,Xiao Y,Wang Z,Liu H

    更新日期:2018-10-11 00:00:00