PoGO: Prediction of Gene Ontology terms for fungal proteins.

Abstract:

BACKGROUND: Automated protein function prediction methods are the only practical approach for assigning functions to genes obtained from model organisms. Many of the previously reported function annotation methods are of limited utility for fungal protein annotation. They are often trained on only one species, are not available for high-volume data processing, or require data derived from experiments such as microarray analysis. To meet the increasing need for high-throughput, automated annotation of fungal genomes, we have developed a tool for annotating fungal protein sequences with terms from the Gene Ontology.

RESULTS: We describe a classifier called PoGO (Prediction of Gene Ontology terms) that uses statistical pattern recognition methods to assign Gene Ontology (GO) terms to proteins from filamentous fungi. PoGO is organized as a meta-classifier in which each evidence source (sequence similarity, protein domains, protein structure and biochemical properties) is used to train independent base-level classifiers. The outputs of the base classifiers are used to train a meta-classifier, which provides the final assignment of GO terms. An independent classifier is trained for each GO term, so the system can be updated without retraining it as a whole. The resulting system is robust. It provides better accuracy and can assign GO terms to a higher percentage of unannotated protein sequences than other methods that we tested.

CONCLUSIONS: Our annotation system overcomes many of the shortcomings that we found in other methods. We also provide a web server where users can submit protein sequences to be annotated.
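
The meta-classifier organization described in the abstract (one base-level classifier per evidence source, a meta-classifier trained on their outputs, and one independent stack per GO term) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names, the nearest-centroid base model, the logistic meta-model, the toy feature values and the GO identifier are hypothetical and are not taken from PoGO, whose actual learning algorithms are not given in the abstract.

```python
import math

def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class CentroidBase:
    """Toy base-level classifier for one evidence source (hypothetical model)."""
    def fit(self, X, y):
        self.pos = centroid([x for x, label in zip(X, y) if label == 1])
        self.neg = centroid([x for x, label in zip(X, y) if label == 0])
        return self

    def score(self, x):
        # Score in [0, 1]: closer to the positive centroid gives a higher score.
        d_pos, d_neg = distance(x, self.pos), distance(x, self.neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

class LogisticMeta:
    """Toy meta-classifier: logistic regression fitted by stochastic gradient descent."""
    def fit(self, Z, y, epochs=500, lr=0.5):
        self.w = [0.0] * len(Z[0])
        self.b = 0.0
        for _ in range(epochs):
            for z, label in zip(Z, y):
                err = self.predict_proba(z) - label
                self.w = [w - lr * err * v for w, v in zip(self.w, z)]
                self.b -= lr * err
        return self

    def predict_proba(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

class GoTermStack:
    """One independent stacked classifier for a single GO term."""
    def fit(self, sources, y):
        # sources maps an evidence-source name to feature vectors aligned with y.
        self.names = sorted(sources)
        self.bases = {n: CentroidBase().fit(sources[n], y) for n in self.names}
        # For brevity the meta-classifier is trained on in-sample base outputs;
        # a real stacking setup would normally use held-out predictions.
        Z = [self._base_scores({n: sources[n][i] for n in self.names})
             for i in range(len(y))]
        self.meta = LogisticMeta().fit(Z, y)
        return self

    def _base_scores(self, protein):
        return [self.bases[n].score(protein[n]) for n in self.names]

    def predict(self, protein, threshold=0.5):
        return self.meta.predict_proba(self._base_scores(protein)) >= threshold

# Toy training data for one hypothetical GO term: four proteins, two evidence sources.
sources = {
    "similarity": [[0.9], [0.8], [0.1], [0.2]],
    "domains": [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]],
}
labels = [1, 1, 0, 0]

# One stack per GO term; adding or retraining a term leaves the others untouched.
models = {"GO:0003824": GoTermStack().fit(sources, labels)}

query = {"similarity": [0.85], "domains": [1.0, 0.5]}
assigned = [term for term, model in models.items() if model.predict(query)]
print(assigned)
```

Keeping the per-term models in a dictionary mirrors the property the abstract highlights: a new or revised GO term only requires fitting its own stack, not the whole system.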

journal_name

BMC Bioinformatics

authors

Jung J,Yi G,Sukno SA,Thon MR

doi

10.1186/1471-2105-11-215

pub_date

2010-04-29 00:00:00

pages

215

issn

1471-2105

pii

1471-2105-11-215

journal_volume

11

pub_type

Journal Article
  • Extracting predictors for lung adenocarcinoma based on Granger causality test and stepwise character selection.

    abstract:BACKGROUND:Lung adenocarcinoma is the most common type of lung cancer, with high mortality worldwide. Its occurrence and development were thoroughly studied by high-throughput expression microarray, which produced abundant data on gene expression, DNA methylation, and miRNA quantification. However, the hub genes, which...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-019-2739-z

    authors: Fan X,Wang Y,Tang XQ

    update_date:2019-05-01 00:00:00

  • MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction.

    abstract:BACKGROUND:Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for predi...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-274

    authors: Blum T,Briesemeister S,Kohlbacher O

    update_date:2009-09-01 00:00:00

  • Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis.

    abstract:BACKGROUND:Reliability and Reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The microarray quality control (MAQC) project launched by US Food and Drug Administration (FDA) elucidated that the lists of DEGs generated by intra- and inter-platform...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-143

    authors: Zhang L,Zhang J,Yang G,Wu D,Jiang L,Wen Z,Li M

    update_date:2013-04-29 00:00:00

  • The Lair: a resource for exploratory analysis of published RNA-Seq data.

    abstract:Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is difficult to use in practice. We introduce a series of tools for processing and analyzing RNA-Seq data...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1357-2

    authors: Pimentel H,Sturmfels P,Bray N,Melsted P,Pachter L

    update_date:2016-12-01 00:00:00

  • SAlign-a structure aware method for global PPI network alignment.

    abstract:BACKGROUND:High throughput experiments have generated a significantly large amount of protein interaction data, which is being used to study protein networks. Studying complete protein networks can reveal more insight about healthy/disease states than studying proteins in isolation. Similarly, a comparative study of pr...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03827-5

    authors: Ayub U,Haider I,Naveed H

    update_date:2020-11-04 00:00:00

  • LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    abstract:BACKGROUND:A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sop...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-016-1146-y

    authors: Vanhoutreve R,Kress A,Legrand B,Gass H,Poch O,Thompson JD

    update_date:2016-07-07 00:00:00

  • Identifying target processes for microbial electrosynthesis by elementary mode analysis.

    abstract:BACKGROUND:Microbial electrosynthesis and electro fermentation are techniques that aim to optimize microbial production of chemicals and fuels by regulating the cellular redox balance via interaction with electrodes. While the concept is known for decades major knowledge gaps remain, which make it hard to evaluate its ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0410-2

    authors: Kracke F,Krömer JO

    update_date:2014-12-30 00:00:00

  • GC/MS based metabolomics: development of a data mining system for metabolite identification by using soft independent modeling of class analogy (SIMCA).

    abstract:BACKGROUND:The goal of metabolomics analyses is a comprehensive and systematic understanding of all metabolites in biological samples. Many useful platforms have been developed to achieve this goal. Gas chromatography coupled to mass spectrometry (GC/MS) is a well-established analytical method in metabolomics study, an...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-131

    authors: Tsugawa H,Tsujimoto Y,Arita M,Bamba T,Fukusaki E

    update_date:2011-05-04 00:00:00

  • Logical development of the cell ontology.

    abstract:BACKGROUND:The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and ass...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-6

    authors: Meehan TF,Masci AM,Abdulla A,Cowell LG,Blake JA,Mungall CJ,Diehl AD

    update_date:2011-01-05 00:00:00

  • Comparing the performance of selected variant callers using synthetic data and genome segmentation.

    abstract:BACKGROUND:High-throughput sequencing has rapidly become an essential part of precision cancer medicine. But validating results obtained from analyzing and interpreting genomic data remains a rate-limiting factor. The gold standard, of course, remains manual validation by expert panels, which is not without its weaknes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2440-7

    authors: Bian X,Zhu B,Wang M,Hu Y,Chen Q,Nguyen C,Hicks B,Meerzaman D

    update_date:2018-11-19 00:00:00

  • Bayesian neural networks for detecting epistasis in genetic association studies.

    abstract:BACKGROUND:Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. RESULTS:A non-parametric Bayesian ap...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-014-0368-0

    authors: Beam AL,Motsinger-Reif A,Doyle J

    update_date:2014-11-21 00:00:00

  • PESM: predicting the essentiality of miRNAs based on gradient boosting machines and sequences.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are a kind of small noncoding RNA molecules that are direct posttranscriptional regulations of mRNA targets. Studies have indicated that miRNAs play key roles in complex diseases by taking part in many biological processes, such as cell growth, cell death and so on. Therefore, in order to ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3426-9

    authors: Yan C,Wu FX,Wang J,Duan G

    update_date:2020-03-18 00:00:00

  • Time-course analysis of genome-wide gene expression data from hormone-responsive human breast cancer cells.

    abstract:BACKGROUND:Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a 'molecular picture' of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where t...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-S2-S12

    authors: Mutarelli M,Cicatiello L,Ferraro L,Grober OM,Ravo M,Facchiano AM,Angelini C,Weisz A

    update_date:2008-03-26 00:00:00

  • Augmented annotation and orthologue analysis for Oryctolagus cuniculus: Better Bunny.

    abstract:BACKGROUND:The rabbit is an important model organism used in a wide range of biomedical research. However, the rabbit genome is still sparsely annotated, thus prohibiting extensive functional analysis of gene sets derived from whole-genome experiments. We developed a web-based application that provides augmented annota...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-13-84

    authors: Craig DB,Kannan S,Dombkowski AA

    update_date:2012-05-08 00:00:00

  • RWRMTN: a tool for predicting disease-associated microRNAs based on a microRNA-target gene network.

    abstract:BACKGROUND:The misregulation of microRNA (miRNA) has been shown to cause diseases. Recently, we have proposed a computational method based on a random walk framework on a miRNA-target gene network to predict disease-associated miRNAs. The prediction performance of our method is better than that of some existing state-o...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-03578-3

    authors: Le DH,Tran TTH

    update_date:2020-06-15 00:00:00

  • Bayesian detection of periodic mRNA time profiles without use of training examples.

    abstract:BACKGROUND:Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell-cycle in poorly characterised organisms. Commonly the investigator is only interested in genes expressed at...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-7-63

    authors: Andersson CR,Isaksson A,Gustafsson MG

    update_date:2006-02-09 00:00:00

  • Multi-omic analysis of signalling factors in inflammatory comorbidities.

    abstract:BACKGROUND:Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and relate...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2413-x

    authors: Xiao H,Bartoszek K,Lio' P

    update_date:2018-11-30 00:00:00

  • Simulating variance heterogeneity in quantitative genome wide association studies.

    abstract:BACKGROUND:Analyzing Variance heterogeneity in genome wide association studies (vGWAS) is an emerging approach for detecting genetic loci involved in gene-gene and gene-environment interactions. vGWAS analysis detects variability in phenotype values across genotypes, as opposed to typical GWAS analysis, which detects v...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2061-1

    authors: Al Kawam A,Alshawaqfeh M,Cai JJ,Serpedin E,Datta A

    update_date:2018-03-21 00:00:00

  • Image-based classification of plant genus and family for trained and untrained plant species.

    abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-018-2474-x

    authors: Seeland M,Rzanny M,Boho D,Wäldchen J,Mäder P

    update_date:2019-01-03 00:00:00

  • Sample entropy analysis of cervical neoplasia gene-expression signatures.

    abstract:BACKGROUND:We introduce Approximate Entropy as a mathematical method of analysis for microarray data. Approximate entropy is applied here as a method to classify the complex gene expression patterns resultant of a clinical sample set. Since Entropy is a measure of disorder in a system, we believe that by choosing genes...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-66

    authors: Botting SK,Trzeciakowski JP,Benoit MF,Salama SA,Diaz-Arrastia CR

    update_date:2009-02-20 00:00:00

  • ProbPS: a new model for peak selection based on quantifying the dependence of the existence of derivative peaks on primary ion intensity.

    abstract:BACKGROUND:The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from tandem mass spectrum is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity an...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-346

    authors: Zhang S,Wang Y,Bu D,Zhang H,Sun S

    update_date:2011-08-17 00:00:00

  • Membrane protein orientation and refinement using a knowledge-based statistical potential.

    abstract:BACKGROUND:Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-276

    authors: Nugent T,Jones DT

    update_date:2013-09-18 00:00:00

  • Construction and analysis of the protein-protein interaction networks for schizophrenia, bipolar disorder, and major depression.

    abstract:BACKGROUND:Schizophrenia, bipolar disorder, and major depression are devastating mental diseases, each with distinctive yet overlapping epidemiologic characteristics. Microarray and proteomics data have revealed genes which expressed abnormally in patients. Several single nucleotide polymorphisms (SNPs) and mutations a...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-12-S13-S20

    authors: Lee SA,Tsao TT,Yang KC,Lin H,Kuo YL,Hsu CH,Lee WK,Huang KC,Kao CY

    update_date:2011-01-01 00:00:00

  • Reuse of imputed data in microarray analysis increases imputation efficiency.

    abstract:BACKGROUND:The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analysis require a complete data set. A few imputation methods for DNA microarray data have been introduced, but the efficiency of the methods was low and the va...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-5-160

    authors: Kim KY,Kim BJ,Yi GS

    update_date:2004-10-26 00:00:00

  • High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID).

    abstract:BACKGROUND:We previously developed GoMiner, an application that organizes lists of 'interesting' genes (for example, under-and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. The original version of GoMiner was oriented toward visualization and interp...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-6-168

    authors: Zeeberg BR,Qin H,Narasimhan S,Sunshine M,Cao H,Kane DW,Reimers M,Stephens RM,Bryant D,Burt SK,Elnekave E,Hari DM,Wynn TA,Cunningham-Rundles C,Stewart DM,Nelson D,Weinstein JN

    update_date:2005-07-05 00:00:00

  • Primary orthologs from local sequence context.

    abstract:BACKGROUND:The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to ...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3384-2

    authors: Gao K,Miller J

    update_date:2020-02-06 00:00:00

  • SitesIdentify: a protein functional site prediction tool.

    abstract:BACKGROUND:The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function i...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-10-379

    authors: Bray T,Chan P,Bougouffa S,Greaves R,Doig AJ,Warwicker J

    update_date:2009-11-18 00:00:00

  • Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

    abstract:BACKGROUND:The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related term...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-14-10

    authors: Cohen R,Elhadad M,Elhadad N

    update_date:2013-01-16 00:00:00

  • Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data.

    abstract:BACKGROUND:Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a me...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/1471-2105-9-520

    authors: Pelz CR,Kulesz-Martin M,Bagby G,Sears RC

    update_date:2008-12-04 00:00:00

  • ElTetrado: a tool for identification and classification of tetrads and quadruplexes.

    abstract:BACKGROUND:Quadruplexes are specific structure motifs occurring, e.g., in telomeres and transcriptional regulatory regions. Recent discoveries confirmed their importance in biomedicine and led to an intensified examination of their properties. So far, the study of these motifs has focused mainly on the sequence and the...

    journal_title:BMC bioinformatics

    pub_type: Journal Article

    doi:10.1186/s12859-020-3385-1

    authors: Zok T,Popenda M,Szachniuk M

    update_date:2020-01-31 00:00:00