'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools.

Abstract:

BACKGROUND: Knowing the subcellular location of proteins provides clues to their function as well as to the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but is tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes.

RESULTS: In order to develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools by several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) far surpasses that of the individual tools. The method of integration proved crucial to the performance. For the prediction of mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis of biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains.

CONCLUSION: We developed an approach that enhances the prediction accuracy of mitochondrial proteins by uniting the strengths of specialized tools. The combination of machine-learning-based integration with biological expert knowledge leads to improved performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement, and is applicable to predicting subcellular locations other than mitochondria, as well as other biological features.
To allow readers to try our approach, we provide a web service for mitochondrial protein prediction (named YimLOC), which can be accessed through the AnaBench suite at http://anabench.bcm.umontreal.ca/anabench/. The source code is provided in Additional File 2.
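
The contrast the abstract draws between simple integration and a two-layer scheme can be illustrated with a minimal Python sketch. It is not the YimLOC implementation: the tool names, the grouping of tools, and the hand-written second-layer rule are all hypothetical placeholders standing in for decision trees that would be learned from known mitochondrial proteins.

```python
from collections import Counter

def majority_vote(calls):
    """Simple baseline: 'mito' only if it strictly outnumbers 'non-mito'."""
    counts = Counter(calls)
    return "mito" if counts["mito"] > counts["non-mito"] else "non-mito"

def layer_one(tool_calls, groups):
    """First layer: collapse raw tool calls into one call per tool group."""
    return {
        name: majority_vote([tool_calls[tool] for tool in members])
        for name, members in groups.items()
    }

def layer_two(intermediate, has_tm_domain):
    """Second layer: hand-written stand-in for a learned tree (assumption).

    It lets a targeting-peptide call and a transmembrane-domain flag
    outweigh a plain vote count; the actual rules are not given in the source.
    """
    if intermediate["targeting_peptide"] == "mito":
        return "mito"
    if intermediate["general"] == "mito" and has_tm_domain:
        return "mito"
    return "non-mito"

def integrate(tool_calls, groups, has_tm_domain):
    """Run both layers and return the final call."""
    return layer_two(layer_one(tool_calls, groups), has_tm_domain)

# Hypothetical tool groups; real predictors would be substituted here.
GROUPS = {
    "targeting_peptide": ["tp_tool_a", "tp_tool_b"],
    "general": ["loc_tool_a", "loc_tool_b", "loc_tool_c"],
}

# Made-up calls for one protein, for illustration only.
example = {
    "tp_tool_a": "non-mito", "tp_tool_b": "non-mito",
    "loc_tool_a": "mito", "loc_tool_b": "mito", "loc_tool_c": "non-mito",
}

flat_call = majority_vote(list(example.values()))
layered_call = integrate(example, GROUPS, has_tm_domain=True)
```

In this made-up case the flat vote and the layered scheme disagree, which shows how grouping tools by the feature they detect can change the outcome when predictions conflict.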

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Shen YQ,Burger G

doi

10.1186/1471-2105-8-420

pub_date

2007-10-29 00:00:00

pages

420

issn

1471-2105

pii

1471-2105-8-420

journal_volume

8

pub_type

Journal Article
  • Correction to: Similarities and differences between variants called with human reference genome HG19 or HG38.

    abstract:After publication of this supplement article. ...

    journal_title:BMC bioinformatics

    pub_type: Published Erratum

    doi:10.1186/s12859-019-2776-7

    authors: Pan B,Kusko R,Xiao W,Zheng Y,Liu Z,Xiao C,Sakkiah S,Guo W,Gong P,Zhang C,Ge W,Shi L,Tong W,Hong H

    update_date:2019-05-15 00:00:00

  • MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles.

    abstract:BACKGROUND:The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1359-0

    authors: Shaw GT,Pao YY,Wang D

    update_date:2016-11-25 00:00:00

  • Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.

    abstract:BACKGROUND:Clustering of protein sequences is of key importance in predicting the structure and function of newly sequenced proteins and is also of use for their annotation. With the advent of multiple high-throughput sequencing technologies, new protein sequences are becoming available at an extraordinary rate. The ra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2080-y

    authors: Abnousi A,Broschat SL,Kalyanaraman A

    update_date:2018-03-05 00:00:00

  • A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment.

    abstract:BACKGROUND:Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-69

    authors: Guo Y,Korhonen A,Liakata M,Silins I,Hogberg J,Stenius U

    update_date:2011-03-08 00:00:00

  • Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie.

    abstract:BACKGROUND:Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the softw...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S16-S15

    authors: Giannoulatou E,Park SH,Humphreys DT,Ho JW

    update_date:2014-01-01 00:00:00

  • COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project.

    abstract:BACKGROUND:With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0369-z

    authors: Bergmann FT,Adams R,Moodie S,Cooper J,Glont M,Golebiewski M,Hucka M,Laibe C,Miller AK,Nickerson DP,Olivier BG,Rodriguez N,Sauro HM,Scharm M,Soiland-Reyes S,Waltemath D,Yvon F,Le Novère N

    update_date:2014-12-14 00:00:00

  • A decision analysis model for KEGG pathway analysis.

    abstract:BACKGROUND:The knowledge base-driven pathway analysis is becoming the first choice for many investigators, in that it not only can reduce the complexity of functional analysis by grouping thousands of genes into just several hundred pathways, but also can increase the explanatory power for the experiment by identifying...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1285-1

    authors: Du J,Li M,Yuan Z,Guo M,Song J,Xie X,Chen Y

    update_date:2016-10-06 00:00:00

  • Predicting anatomic therapeutic chemical classification codes using tiered learning.

    abstract:BACKGROUND:The low success rate and high cost of drug discovery requires the development of new paradigms to identify molecules of therapeutic value. The Anatomical Therapeutic Chemical (ATC) Code System is a World Health Organization (WHO) proposed classification that assigns multi-level codes to compounds based on th...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1660-6

    authors: Olson T,Singh R

    update_date:2017-06-07 00:00:00

  • Learning statistical models for annotating proteins with function information using biomedical text.

    abstract:BACKGROUND:The BioCreative text mining evaluation investigated the application of text mining methods to the task of automatically extracting information from text in biomedical research articles. We participated in Task 2 of the evaluation. For this task, we built a system to automatically annotate a given protein wit...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-S1-S18

    authors: Ray S,Craven M

    update_date:2005-01-01 00:00:00

  • Providing visualisation support for the analysis of anatomy ontology data.

    abstract:BACKGROUND:Improvements in technology have been accompanied by the generation of large amounts of complex data. This same technology must be harnessed effectively if the knowledge stored within the data is to be retrieved. Storing data in ontologies aids its management; ontologies serve as controlled vocabularies that ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-74

    authors: Dadzie AS,Burger A

    update_date:2005-03-24 00:00:00

  • ProLego: tool for extracting and visualizing topological modules in protein structures.

    abstract:BACKGROUND:In protein design, correct use of topology is among the initial and most critical feature. Meticulous selection of backbone topology aids in drastically reducing the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intu...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2171-9

    authors: Khan T,Panday SK,Ghosh I

    update_date:2018-05-04 00:00:00

  • Scoredist: a simple and robust protein sequence distance estimator.

    abstract:BACKGROUND:Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-108

    authors: Sonnhammer EL,Hollich V

    update_date:2005-04-27 00:00:00

  • MapMi: automated mapping of microRNA loci.

    abstract:BACKGROUND:A large effort to discover microRNAs (miRNAs) has been under way. Currently miRBase is their primary repository, providing annotations of primary sequences, precursors and probable genomic loci. In many cases miRNAs are identical or very similar between related (or in some cases more distant) species. Howeve...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-133

    authors: Guerra-Assunção JA,Enright AJ

    update_date:2010-03-16 00:00:00

  • SpliceMiner: a high-throughput database implementation of the NCBI Evidence Viewer for microarray splice variant analysis.

    abstract:BACKGROUND:There are many fewer genes in the human genome than there are expressed transcripts. Alternative splicing is the reason. Alternatively spliced transcripts are often specific to tissue type, developmental stage, environmental condition, or disease state. Accurate analysis of microarray expression data and des...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-8-75

    authors: Kahn AB,Ryan MC,Liu H,Zeeberg BR,Jamison DC,Weinstein JN

    update_date:2007-03-05 00:00:00

  • Island method for estimating the statistical significance of profile-profile alignment scores.

    abstract:BACKGROUND:In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many exp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-112

    authors: Poleksic A

    update_date:2009-04-20 00:00:00

  • circRNAprofiler: an R-based computational framework for the downstream analysis of circular RNAs.

    abstract:BACKGROUND:Circular RNAs (circRNAs) are a newly appreciated class of non-coding RNA molecules. Numerous tools have been developed for the detection of circRNAs, however computational tools to perform downstream functional analysis of circRNAs are scarce. RESULTS:We present circRNAprofiler, an R-based computational fra...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3500-3

    authors: Aufiero S,Reckman YJ,Tijsen AJ,Pinto YM,Creemers EE

    update_date:2020-04-29 00:00:00

  • Analysis of genomic and transcriptomic variations as prognostic signature for lung adenocarcinoma.

    abstract:BACKGROUND:Lung cancer is the leading cause of the largest number of deaths worldwide and lung adenocarcinoma is the most common form of lung cancer. In order to understand the molecular basis of lung adenocarcinoma, integrative analysis have been performed by using genomics, transcriptomics, epigenomics, proteomics an...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03691-3

    authors: Zengin T,Önal-Süzek T

    update_date:2020-09-30 00:00:00

  • Visualization methods for statistical analysis of microarray clusters.

    abstract:BACKGROUND:The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gol...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-115

    authors: Hibbs MA,Dirksen NC,Li K,Troyanskaya OG

    update_date:2005-05-12 00:00:00

  • Multi-label literature classification based on the Gene Ontology graph.

    abstract:BACKGROUND:The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-525

    authors: Jin B,Muller B,Zhai C,Lu X

    update_date:2008-12-08 00:00:00

  • Colonyzer: automated quantification of micro-organism growth characteristics on solid agar.

    abstract:BACKGROUND:High-throughput screens comparing growth rates of arrays of distinct micro-organism cultures on solid agar are useful, rapid methods of quantifying genetic interactions. Growth rate is an informative phenotype which can be estimated by measuring cell densities at one or more times after inoculation. Precise ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-287

    authors: Lawless C,Wilkinson DJ,Young A,Addinall SG,Lydall DA

    update_date:2010-05-28 00:00:00

  • Prediction of HIV-1 protease cleavage site using a combination of sequence, structural, and physicochemical features.

    abstract:BACKGROUND:The human immunodeficiency virus type 1 (HIV-1) aspartic protease is an important enzyme owing to its imperative part in viral development and a causative agent of deadliest disease known as acquired immune deficiency syndrome (AIDS). Development of HIV-1 protease inhibitors can help understand the specifici...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1337-6

    authors: Singh O,Su EC

    update_date:2016-12-23 00:00:00

  • Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient.

    abstract:BACKGROUND:Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-158

    authors: Stoltzfus A,Lapp H,Matasci N,Deus H,Sidlauskas B,Zmasek CM,Vaidya G,Pontelli E,Cranston K,Vos R,Webb CO,Harmon LJ,Pirrung M,O'Meara B,Pennell MW,Mirarab S,Rosenberg MS,Balhoff JP,Bik HM,Heath TA,Midford PE,Brown

    update_date:2013-05-13 00:00:00

  • Advances in translational bioinformatics facilitate revealing the landscape of complex disease mechanisms.

    abstract:Advances of high-throughput technologies have rapidly produced more and more data from DNAs and RNAs to proteins, especially large volumes of genome-scale data. However, connection of the genomic information to cellular functions and biological behaviours relies on the development of effective approaches at higher sys...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-15-S17-I1

    authors: Yang JY,Dunker A,Liu JS,Qin X,Arabnia HR,Yang W,Niemierko A,Chen Z,Luo Z,Wang L,Liu Y,Xu D,Deng Y,Tong W,Yang M

    update_date:2014-01-01 00:00:00

  • Pripper: prediction of caspase cleavage sites from whole proteomes.

    abstract:BACKGROUND:Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. There are several methods developed to pre...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-11-320

    authors: Piippo M,Lietzén N,Nevalainen OS,Salmi J,Nyman TA

    update_date:2010-06-15 00:00:00

  • Measuring phenotype-phenotype similarity through the interactome.

    abstract:BACKGROUND:Recently, measuring phenotype similarity began to play an important role in disease diagnosis. Researchers have begun to pay attention to develop phenotype similarity measurement. However, existing methods ignore the interactions between phenotype-associated proteins, which may lead to inaccurate phenotype s...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2102-9

    authors: Peng J,Hui W,Shang X

    update_date:2018-04-11 00:00:00

  • Automated modelling of signal transduction networks.

    abstract:BACKGROUND:Intracellular signal transduction is achieved by networks of proteins and small molecules that transmit information from the cell surface to the nucleus, where they ultimately effect transcriptional changes. Understanding the mechanisms cells use to accomplish this important process requires a detailed molec...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-3-34

    authors: Steffen M,Petti A,Aach J,D'haeseleer P,Church G

    update_date:2002-11-01 00:00:00

  • Application of the common base method to regression and analysis of covariance (ANCOVA) in qPCR experiments and subsequent relative expression calculation.

    abstract:BACKGROUND:Quantitative polymerase chain reaction (qPCR) is the technique of choice for quantifying gene expression. While the technique itself is well established, approaches for the analysis of qPCR data continue to improve. RESULTS:Here we expand on the common base method to develop procedures for testing linear re...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03696-y

    authors: Ganger MT,Dietz GD,Headley P,Ewing SJ

    update_date:2020-09-29 00:00:00

  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-017-1493-3

    authors: Henriques R,Ferreira FL,Madeira SC

    update_date:2017-02-02 00:00:00

  • Construction and analysis of the protein-protein interaction networks for schizophrenia, bipolar disorder, and major depression.

    abstract:BACKGROUND:Schizophrenia, bipolar disorder, and major depression are devastating mental diseases, each with distinctive yet overlapping epidemiologic characteristics. Microarray and proteomics data have revealed genes which expressed abnormally in patients. Several single nucleotide polymorphisms (SNPs) and mutations a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S13-S20

    authors: Lee SA,Tsao TT,Yang KC,Lin H,Kuo YL,Hsu CH,Lee WK,Huang KC,Kao CY

    update_date:2011-01-01 00:00:00

  • CNV-WebStore: online CNV analysis, storage and interpretation.

    abstract:BACKGROUND:Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV da...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-4

    authors: Vandeweyer G,Reyniers E,Wuyts W,Rooms L,Kooy RF

    update_date:2011-01-05 00:00:00