Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

Abstract:

BACKGROUND: Genome-scale data on protein interactions are generally represented as large networks, or graphs, in which hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly carried out using clustering procedures, which aim to detect densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interaction graphs. A promising new clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods, for a variety of problems, but has not yet been applied to partitioning protein interaction graphs.

RESULTS: In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from S. cerevisiae, hand-curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so, the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added to or removed from the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of S. cerevisiae, derived from genome-scale purification experiments, and to versions of this network in which varying proportions of the links have been randomly shuffled.

CONCLUSION: Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs: AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.
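The noise simulation mentioned in the RESULTS, in which interactions are randomly added to or removed from an unweighted network, might be sketched as follows. This is only an illustrative sketch, not the authors' protocol: the function name `perturb_edges`, the parameters `add_frac`, `remove_frac` and `seed`, and the choice to express both noise levels as fractions of the original edge count are all assumptions made here for the example.

```python
import itertools
import random


def perturb_edges(nodes, edges, add_frac=0.0, remove_frac=0.0, seed=0):
    """Return a noisy copy of an undirected, unweighted edge set.

    Hypothetical helper: removes a random fraction of the existing edges and
    adds random edges between node pairs that were not linked originally.
    Both fractions are relative to the original number of edges.
    """
    rng = random.Random(seed)
    # Store each undirected edge as a frozenset so (a, b) equals (b, a).
    original = {frozenset(edge) for edge in edges}
    # Sort before sampling so results are reproducible for a given seed.
    ordered = sorted(original, key=sorted)
    n_remove = min(round(remove_frac * len(ordered)), len(ordered))
    n_add = round(add_frac * len(ordered))
    removed = set(rng.sample(ordered, n_remove))
    # Candidate false-positive links: node pairs with no original edge.
    absent = [
        frozenset(pair)
        for pair in itertools.combinations(sorted(nodes), 2)
        if frozenset(pair) not in original
    ]
    added = set(rng.sample(absent, min(n_add, len(absent))))
    return (original - removed) | added


# Toy network: two three-protein "complexes", fully connected internally.
toy_nodes = range(6)
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
noisy_edges = perturb_edges(toy_nodes, toy_edges, add_frac=0.5, remove_frac=0.5, seed=1)
```

A clustering procedure could then be run on each perturbed edge set, and the recovered clusters compared against the annotated complexes at increasing noise levels.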

journal_name

BMC Bioinformatics

authors

Vlasblom J,Wodak SJ

doi

10.1186/1471-2105-10-99

pub_date

2009-03-30 00:00:00

pages

99

issn

1471-2105

pii

1471-2105-10-99

journal_volume

10

pub_type

Journal article
  • BiPOm: a rule-based ontology to represent and infer molecule knowledge from a biological process-centered viewpoint.

    abstract:BACKGROUND:Managing and organizing biological knowledge remains a major challenge, due to the complexity of living systems. Recently, systemic representations have been promising in tackling such a challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked sub...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-020-03637-9

    authors: Henry V,Saïs F,Inizan O,Marchadier E,Dibie J,Goelzer A,Fromion V

    updated:2020-07-23 00:00:00

  • Finding sRNA generative locales from high-throughput sequencing data with NiBLS.

    abstract:BACKGROUND:Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a pauci...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-11-93

    authors: MacLean D,Moulton V,Studholme DJ

    updated:2010-02-18 00:00:00

  • Compromise or optimize? The breakpoint anti-median.

    abstract:BACKGROUND:The median of k≥3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. "Near-medi...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-016-1340-y

    authors: Larlee CA,Brandts A,Sankoff D

    updated:2016-12-15 00:00:00

  • A theorem proving approach for automatically synthesizing visualizations of flow cytometry data.

    abstract:BACKGROUND:Polychromatic flow cytometry is a popular technique that has wide usage in the medical sciences, especially for studying phenotypic properties of cells. The high-dimensionality of data generated by flow cytometry usually makes it difficult to visualize. The naive solution of simply plotting two-dimensional g...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-017-1662-4

    authors: Raj S,Hussain F,Husein Z,Torosdagli N,Turgut D,Deo N,Pattanaik S,Chang CJ,Jha SK

    updated:2017-06-07 00:00:00

  • Bayesian neural networks for detecting epistasis in genetic association studies.

    abstract:BACKGROUND:Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. RESULTS:A non-parametric Bayesian ap...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-014-0368-0

    authors: Beam AL,Motsinger-Reif A,Doyle J

    updated:2014-11-21 00:00:00

  • Rearrangement analysis of multiple bacterial genomes.

    abstract:BACKGROUND:Genomes are subjected to rearrangements that change the orientation and ordering of genes during evolution. The most common rearrangements that occur in uni-chromosomal genomes are inversions (or reversals) to adapt to the changing environment. Since genome rearrangements are rarer than point mutations, gene...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-019-3293-4

    authors: Noureen M,Tada I,Kawashima T,Arita M

    updated:2019-12-27 00:00:00

  • Natural computation meta-heuristics for the in silico optimization of microbial strains.

    abstract:BACKGROUND:One of the greatest challenges in Metabolic Engineering is to develop quantitative models and algorithms to identify a set of genetic manipulations that will result in a microbial strain with a desirable metabolic phenotype which typically means having a high yield/productivity. This challenge is not only du...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-9-499

    authors: Rocha M,Maia P,Mendes R,Pinto JP,Ferreira EC,Nielsen J,Patil KR,Rocha I

    updated:2008-11-27 00:00:00

  • ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

    abstract:BACKGROUND:Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Meth...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-14-61

    authors: Hajiloo M,Sapkota Y,Mackey JR,Robson P,Greiner R,Damaraju S

    updated:2013-02-22 00:00:00

  • Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins.

    abstract:BACKGROUND:Molecular docking is a widely-employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Des...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-16-S6-S3

    authors: Ashtawy HM,Mahapatra NR

    updated:2015-01-01 00:00:00

  • Process attributes in bio-ontologies.

    abstract:BACKGROUND:Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attribute...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-13-217

    authors: Andrade AQ,Blondé W,Hastings J,Schulz S

    updated:2012-08-28 00:00:00

  • Dynamic changes in the secondary structure of ECE-1 and XCE account for their different substrate specificities.

    abstract:BACKGROUND:X-converting enzyme (XCE), involved in nervous control of respiration, is a member of the M13 family of zinc peptidases, for which no natural substrate has been identified yet. In contrast, its well-characterized homologue endothelin-converting enzyme-1 (ECE-1) showed broad substrate specificity and acts as ...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-13-285

    authors: Ul-Haq Z,Iqbal S,Moin ST

    updated:2012-11-01 00:00:00

  • Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection.

    abstract:BACKGROUND:Lung adenocarcinoma is the most common type of lung cancer, with high mortality worldwide. Its occurrence and development were thoroughly studied by high-throughput expression microarray, which produced abundant data on gene expression, DNA methylation, and miRNA quantification. However, the hub genes, which...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-019-2739-z

    authors: Fan X,Wang Y,Tang XQ

    updated:2019-05-01 00:00:00

  • Use of a structural alphabet for analysis of short loops connecting repetitive structures.

    abstract:BACKGROUND:Because loops connect regular secondary structures, analysis of the former depends directly on the definition of the latter. The numerous assignment methods, however, can offer different definitions. In a previous study, we defined a structural alphabet composed of 16 average protein fragments, which we call...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-5-58

    authors: Fourrier L,Benros C,de Brevern AG

    updated:2004-05-12 00:00:00

  • Prediction of dinucleotide-specific RNA-binding sites in proteins.

    abstract:BACKGROUND:Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-12-S13-S5

    authors: Fernandez M,Kumagai Y,Standley DM,Sarai A,Mizuguchi K,Ahmad S

    updated:2011-01-01 00:00:00

  • Leveraging TCGA gene expression data to build predictive models for cancer drug response.

    abstract:BACKGROUND:Machine learning has been utilized to predict cancer drug response from multi-omics data generated from sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients' primary tumor tissues to predict whether a patient wi...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-020-03690-4

    authors: Clayton EA,Pujol TA,McDonald JF,Qiu P

    updated:2020-09-30 00:00:00

  • NeurphologyJ: an automatic neuronal morphology quantification method and its application in pharmacological discovery.

    abstract:BACKGROUND:Automatic quantification of neuronal morphology from images of fluorescence microscopy plays an increasingly important role in high-content screenings. However, there exist very few freeware tools and methods which provide automatic neuronal morphology quantification for pharmacological discovery. RESULTS:T...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-12-230

    authors: Ho SY,Chao CY,Huang HL,Chiu TW,Charoenkwan P,Hwang E

    updated:2011-06-08 00:00:00

  • JContextExplorer: a tree-based approach to facilitate cross-species genomic context comparison.

    abstract:BACKGROUND:Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial speci...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-14-18

    authors: Seitzer P,Huynh TA,Facciotti MT

    updated:2013-01-16 00:00:00

  • TPMS: a set of utilities for querying collections of gene trees.

    abstract:BACKGROUND:The information in large collections of phylogenetic trees is useful for many comparative genomic studies. Therefore, there is a need for flexible tools that allow exploration of such collections in order to retrieve relevant data as quickly as possible. RESULTS:In this paper, we present TPMS (Tree Pattern-...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-14-109

    authors: Bigot T,Daubin V,Lassalle F,Perrière G

    updated:2013-03-27 00:00:00

  • Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

    abstract:BACKGROUND:When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done fit...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-016-1149-8

    authors: Mayr A,Hofner B,Schmid M

    updated:2016-07-22 00:00:00

  • Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

    abstract:BACKGROUND:During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-6-178

    authors: Nilsson RH,Kristiansson E,Ryberg M,Larsson KH

    updated:2005-07-18 00:00:00

  • Compartmentalization of the Edinburgh Human Metabolic Network.

    abstract:BACKGROUND:Direct in vivo investigation of human metabolism is complicated by the distinct metabolic functions of various sub-cellular organelles. Diverse micro-environments in different organelles may lead to distinct functions of the same protein and the use of different enzymes for the same metabolic reaction. To be...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-11-393

    authors: Hao T,Ma HW,Zhao XM,Goryanin I

    updated:2010-07-22 00:00:00

  • Thresher: determining the number of clusters while removing outliers.

    abstract:BACKGROUND:Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier pro...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/s12859-017-1998-9

    authors: Wang M,Abrams ZB,Kornblau SM,Coombes KR

    updated:2018-01-08 00:00:00

  • Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis.

    abstract:BACKGROUND:In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorith...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-10-4

    authors: Yang C,He Z,Yu W

    updated:2009-01-06 00:00:00

  • A knowledge discovery object model API for Java.

    abstract:BACKGROUND:Biological data resources have become heterogeneous and derive from multiple sources. This introduces challenges in the management and utilization of this data in software development. Although efforts are underway to create a standard format for the transmission and storage of biological data, this objectiv...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-4-51

    authors: Zuyderduyn SD,Jones SJ

    updated:2003-10-28 00:00:00

  • MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments.

    abstract:BACKGROUND:The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs fro...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-13-117

    authors: Collingridge PW,Kelly S

    updated:2012-05-30 00:00:00

  • OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments.

    abstract:BACKGROUND:Differentially expressed genes are typically identified by analyzing the variation between replicate measurements. These procedures implicitly assume that there are no systematic errors in the data even though several sources of systematic error are known. RESULTS:OpWise estimates the amount of systematic e...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-7-19

    authors: Price MN,Arkin AP,Alm EJ

    updated:2006-01-13 00:00:00

  • Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods.

    abstract:BACKGROUND:Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly co...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-13-265

    authors: Groza T,Hunter J,Zankl A

    updated:2012-10-15 00:00:00

  • Genoviz Software Development Kit: Java tool kit for building genomics visualization applications.

    abstract:BACKGROUND:Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. RESULTS:The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK frame...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-10-266

    authors: Helt GA,Nicol JW,Erwin E,Blossom E,Blanchard SG Jr,Chervitz SA,Harmon C,Loraine AE

    updated:2009-08-25 00:00:00

  • HTPheno: an image analysis pipeline for high-throughput plant phenotyping.

    abstract:BACKGROUND:In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image ac...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-12-148

    authors: Hartmann A,Czauderna T,Hoffmann R,Stein N,Schreiber F

    updated:2011-05-12 00:00:00

  • IPRStats: visualization of the functional potential of an InterProScan run.

    abstract:BACKGROUND:InterPro is a collection of protein signatures for the classification and automated annotation of proteins. Interproscan is a software tool that scans protein sequences against Interpro member databases using a variety of profile-based, hidden markov model and positional specific score matrix methods. It not...

    journal_title:BMC bioinformatics

    pub_type: journal article

    doi:10.1186/1471-2105-11-S12-S13

    authors: Kelly RJ,Vincent DE,Friedberg I

    updated:2010-12-21 00:00:00