A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases.

Abstract:

BACKGROUND:The PathoLogic program constructs Pathway/Genome databases by using a genome's annotation to predict the set of metabolic pathways present in an organism. PathoLogic determines the set of reactions composing those pathways from the enzymes annotated in the organism's genome. Most annotation efforts fail to assign function to 40-60% of sequences. In addition, large numbers of sequences may have non-specific annotations (e.g., thiolase family protein). Pathway holes occur when a genome appears to lack the enzymes needed to catalyze reactions in a pathway. If a protein has not been assigned a specific function during the annotation process, any reaction catalyzed by that protein will appear as a missing enzyme or pathway hole in a Pathway/Genome database. RESULTS:We have developed a method that efficiently combines homology and pathway-based evidence to identify candidates for filling pathway holes in Pathway/Genome databases. Our program not only identifies potential candidate sequences for pathway holes, but combines data from multiple, heterogeneous sources to assess the likelihood that a candidate has the required function. Our algorithm emulates the manual sequence annotation process, considering not only evidence from homology searches, but also considering evidence from genomic context (i.e., is the gene part of an operon?) and functional context (e.g., are there functionally-related genes nearby in the genome?) to determine the posterior belief that a candidate has the required function. The method can be applied across an entire metabolic pathway network and is generally applicable to any pathway database. The program uses a set of sequences encoding the required activity in other genomes to identify candidate proteins in the genome of interest, and then evaluates each candidate by using a simple Bayes classifier to determine the probability that the candidate has the desired function. We achieved 71% precision at a probability threshold of 0.9 during cross-validation using known reactions in computationally-predicted pathway databases. After applying our method to 513 pathway holes in 333 pathways from three Pathway/Genome databases, we increased the number of complete pathways by 42%. We made putative assignments to 46% of the holes, including annotation of 17 sequences of previously unknown function. CONCLUSIONS:Our pathway hole filler can be used not only to increase the utility of Pathway/Genome databases to both experimental and computational researchers, but also to improve predictions of protein function.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Green ML,Karp PD

doi

10.1186/1471-2105-5-76

keywords:

subject

Has Abstract

pub_date

2004-06-09 00:00:00

pages

76

issn

1471-2105

pii

1471-2105-5-76

journal_volume

5

pub_type

杂志文章
  • SHIVA - a web application for drug resistance and tropism testing in HIV.

    abstract:BACKGROUND:Drug resistance testing is mandatory in antiretroviral therapy in human immunodeficiency virus (HIV) infected patients for successful treatment. The emergence of resistances against antiretroviral agents remains the major obstacle in inhibition of viral replication and thus to control infection. Due to the h...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1179-2

    authors: Riemenschneider M,Hummel T,Heider D

    更新日期:2016-08-22 00:00:00

  • μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix.

    abstract:BACKGROUND:The miRNAs, a class of short approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell as they play an important role in various cellular proc...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-14-266

    authors: Paul S,Maji P

    更新日期:2013-09-04 00:00:00

  • Simple binary segmentation frameworks for identifying variation in DNA copy number.

    abstract:BACKGROUND:Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nes...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-277

    authors: Yang TY

    更新日期:2012-10-30 00:00:00

  • Metabolite coupling in genome-scale metabolic networks.

    abstract:BACKGROUND:Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and for the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and elementally and charge balanced. Comparative analysis of thes...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-111

    authors: Becker SA,Price ND,Palsson BØ

    更新日期:2006-03-06 00:00:00

  • GlycomeDB - integration of open-access carbohydrate structure databases.

    abstract:BACKGROUND:Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carboh...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-384

    authors: Ranzinger R,Herget S,Wetter T,von der Lieth CW

    更新日期:2008-09-19 00:00:00

  • A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection.

    abstract:BACKGROUND:The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire geno...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-018-2077-6

    authors: Wang W,Sun W,Wang W,Szatkiewicz J

    更新日期:2018-03-01 00:00:00

  • Cell subset prediction for blood genomic studies.

    abstract:BACKGROUND:Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed population. In this case, accuracy is inhere...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-258

    authors: Bolen CR,Uduman M,Kleinstein SH

    更新日期:2011-06-24 00:00:00

  • PubFocus: semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm.

    abstract:BACKGROUND:Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automation of citation analytics can be very useful and timesaving for both novices and experts. RESULTS:PubFocus w...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-424

    authors: Plikus MV,Zhang Z,Chuong CM

    更新日期:2006-10-02 00:00:00

  • Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations.

    abstract:BACKGROUND:Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1790-x

    authors: Nguyen LH,Holmes S

    更新日期:2017-09-13 00:00:00

  • Multi-view feature selection for identifying gene markers: a diversified biological data driven approach.

    abstract:BACKGROUND:In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene express...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03810-0

    authors: Acharya S,Cui L,Pan Y

    更新日期:2020-12-30 00:00:00

  • Improving clustering with metabolic pathway data.

    abstract:BACKGROUND:It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. T...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-15-101

    authors: Milone DH,Stegmayer G,López M,Kamenetzky L,Carrari F

    更新日期:2014-04-10 00:00:00

  • Improving performance of mammalian microRNA target prediction.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing the gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNA. However, the prediction speci...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-476

    authors: Liu H,Yue D,Chen Y,Gao SJ,Huang Y

    更新日期:2010-09-22 00:00:00

  • Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data.

    abstract:BACKGROUND:Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-282

    authors: Lopez D,Casero D,Cokus SJ,Merchant SS,Pellegrini M

    更新日期:2011-07-12 00:00:00

  • Generating confidence intervals on biological networks.

    abstract:BACKGROUND:In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these de...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-467

    authors: Thorne T,Stumpf MP

    更新日期:2007-11-30 00:00:00

  • GeneBins: a database for classifying gene expression data, with application to plant genome arrays.

    abstract:BACKGROUND:To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. RESULTS:We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-87

    authors: Goffard N,Weiller G

    更新日期:2007-03-12 00:00:00

  • Modeling lymphocyte homing and encounters in lymph nodes.

    abstract:BACKGROUND:The efficiency of lymph nodes depends on tissue structure and organization, which allow the coordination of lymphocyte traffic. Despite their essential role, our understanding of lymph node specific mechanisms is still incomplete and currently a topic of intense research. RESULTS:In this paper, we present a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-387

    authors: Baldazzi V,Paci P,Bernaschi M,Castiglione F

    更新日期:2009-11-25 00:00:00

  • Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013.

    abstract:BACKGROUND:Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two even...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-16-S10-S2

    authors: Pyysalo S,Ohta T,Rak R,Rowley A,Chun HW,Jung SJ,Choi SP,Tsujii J,Ananiadou S

    更新日期:2015-01-01 00:00:00

  • Restricted DCJ-indel model: sorting linear genomes with DCJ and indels.

    abstract:BACKGROUND:The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S19-S14

    authors: da Silva PH,Machado R,Dantas S,Braga MD

    更新日期:2012-01-01 00:00:00

  • BicPAMS: software for biological data analysis with pattern-based biclustering.

    abstract:BACKGROUND:Biclustering has been largely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entiti...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1493-3

    authors: Henriques R,Ferreira FL,Madeira SC

    更新日期:2017-02-02 00:00:00

  • GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing.

    abstract:BACKGROUND:Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-508

    authors: Yu G

    更新日期:2010-10-12 00:00:00

  • Detecting intergene correlation changes in microarray analysis: a new approach to gene selection.

    abstract:BACKGROUND:Microarray technology is commonly used as a simple screening tool with a focus on selecting genes that exhibit extremely large differential expressions between different phenotypes. It lacks the ability to select genes that change their relationships with other genes in different biological conditions (diffe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-20

    authors: Hu R,Qiu X,Glazko G,Klebanov L,Yakovlev A

    更新日期:2009-01-15 00:00:00

  • Identifications of conserved 7-mers in 3'-UTRs and microRNAs in Drosophila.

    abstract:BACKGROUND:MicroRNAs (miRNAs) are a class of endogenous regulatory small RNAs which play an important role in posttranscriptional regulations by targeting mRNAs for cleavage or translational repression. The base-pairing between the 5'-end of miRNA and the target mRNA 3'-UTRs is essential for the miRNA:mRNA recognition....

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-432

    authors: Gu J,Fu H,Zhang X,Li Y

    更新日期:2007-11-08 00:00:00

  • Selection of optimal reference genes for normalization in quantitative RT-PCR.

    abstract:BACKGROUND:Normalization in real-time qRT-PCR is necessary to compensate for experimental variation. A popular normalization strategy employs reference gene(s), which may introduce additional variability into normalized expression levels due to innate variation (between tissues, individuals, etc). To minimize this inna...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-253

    authors: Chervoneva I,Li Y,Schulz S,Croker S,Wilson C,Waldman SA,Hyslop T

    更新日期:2010-05-14 00:00:00

  • Predicting MoRFs in protein sequences using HMM profiles.

    abstract:BACKGROUND:Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs i...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-016-1375-0

    authors: Sharma R,Kumar S,Tsunoda T,Patil A,Sharma A

    更新日期:2016-12-22 00:00:00

  • Evaluation of absolute quantitation by nonlinear regression in probe-based real-time PCR.

    abstract:BACKGROUND:In real-time PCR data analysis, the cycle threshold (CT) method is currently the gold standard. This method is based on an assumption of equal PCR efficiency in all reactions, and precision may suffer if this condition is not met. Nonlinear regression analysis (NLR) or curve fitting has therefore been sugges...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-107

    authors: Goll R,Olsen T,Cui G,Florholmen J

    更新日期:2006-03-03 00:00:00

  • A computational diffusion model to study antibody transport within reconstructed tumor microenvironments.

    abstract:BACKGROUND:Antibodies revolutionized cancer treatment over the past decades. Despite their successfully application, there are still challenges to overcome to improve efficacy, such as the heterogeneous distribution of antibodies within tumors. Tumor microenvironment features, such as the distribution of tumor and othe...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-03854-2

    authors: Cartaxo AL,Almeida J,Gualda EJ,Marsal M,Loza-Alvarez P,Brito C,Isidro IA

    更新日期:2020-11-17 00:00:00

  • Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA).

    abstract:BACKGROUND:MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes ba...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-261

    authors: González JR,Carrasco JL,Armengol L,Villatoro S,Jover L,Yasui Y,Estivill X

    更新日期:2008-06-04 00:00:00

  • The rise and fall of breakpoint reuse depending on genome resolution.

    abstract:BACKGROUND:During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ("synteny blocks") across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrangements recur, causing breakpoints to be reused. METHODS:We investi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S9-S1

    authors: Attie O,Darling AE,Yancopoulos S

    更新日期:2011-10-05 00:00:00

  • Assessment of the relationship between pre-chip and post-chip quality measures for Affymetrix GeneChip expression data.

    abstract:BACKGROUND:Gene expression microarray experiments are expensive to conduct and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-211

    authors: Jones L,Goldstein DR,Hughes G,Strand AD,Collin F,Dunnett SB,Kooperberg C,Aragaki A,Olson JM,Augood SJ,Faull RL,Luthi-Carter R,Moskvina V,Hodges AK

    更新日期:2006-04-19 00:00:00

  • The IronChip evaluation package: a package of perl modules for robust analysis of custom microarrays.

    abstract:BACKGROUND:Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and manipulations are limiting factors, affecting data quality. The use of customized DNA microarrays improves overall data quality in many...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-112

    authors: Vainshtein Y,Sanchez M,Brazma A,Hentze MW,Dandekar T,Muckenthaler MU

    更新日期:2010-03-01 00:00:00