Consistent probabilistic outputs for protein function prediction.

Abstract:

In predicting hierarchical protein function annotations, such as terms in the Gene Ontology (GO), the simplest approach makes predictions for each term independently. However, this approach has the unfortunate consequence that the predictor may assign to a single protein a set of terms that are inconsistent with one another; for example, the predictor may assign a specific GO term to a given protein ('purine nucleotide binding') but not assign the parent term ('nucleotide binding'). Such predictions are difficult to interpret. In this work, we focus on methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. We call this procedure 'reconciliation'. We begin with a baseline method for predicting GO terms from a collection of data types using an ensemble of discriminative classifiers. We apply the method to a previously described benchmark data set, and we demonstrate that the resulting predictions are frequently inconsistent with the topology of the GO. We then consider 11 distinct reconciliation methods: three heuristic methods; four variants of a Bayesian network; an extension of logistic regression to the structured case; and three novel projection methods - isotonic regression and two variants of a Kullback-Leibler projection method. We evaluate each method in three different modes - per term, per protein and joint - corresponding to three types of prediction tasks. Although the principal goal of reconciliation is interpretability, it is important to assess whether interpretability comes at a cost in terms of precision and recall. Indeed, we find that many apparently reasonable reconciliation methods yield reconciled probabilities with significantly lower precision than the original, unreconciled estimates.
On the other hand, we find that isotonic regression usually performs better than the underlying, unreconciled method, and almost never performs worse; isotonic regression appears to be able to use the constraints from the GO network to its advantage. An exception to this rule is the high precision regime for joint evaluation, where Kullback-Leibler projection yields the best performance.
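
The consistency requirement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three-term ontology, the probabilities, and all function names (`find_violations`, `reconcile_max`, `isotonic_chain`) are hypothetical, the "lift ancestors to the maximum" rule is only one plausible heuristic, and the pool-adjacent-violators routine handles a single parent-to-child chain rather than the full GO graph, which would need a general isotonic regression solver.

```python
# Toy ontology (hypothetical): each term maps to its direct parents.
PARENTS = {
    "binding": [],
    "nucleotide binding": ["binding"],
    "purine nucleotide binding": ["nucleotide binding"],
}


def find_violations(probs, parents):
    """Return (child, parent) pairs where the child outscores its parent."""
    return [
        (child, parent)
        for child, direct_parents in parents.items()
        for parent in direct_parents
        if probs[child] > probs[parent]
    ]


def ancestors(term, parents):
    """Collect every ancestor of `term` by walking parent links."""
    seen, stack = set(), list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def reconcile_max(probs, parents):
    """Illustrative heuristic: lift each ancestor to at least the child's value."""
    reconciled = dict(probs)
    for term, p in probs.items():
        for anc in ancestors(term, parents):
            reconciled[anc] = max(reconciled[anc], p)
    return reconciled


def isotonic_chain(values):
    """Least-squares fit of a non-increasing sequence (pool adjacent violators).

    `values` runs from the most general term to the most specific one.
    """
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Pool while a more specific block has a higher mean than its predecessor.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


# Made-up independent predictions for one protein.
raw = {"binding": 0.9, "nucleotide binding": 0.25, "purine nucleotide binding": 0.75}
violations = find_violations(raw, PARENTS)
lifted = reconcile_max(raw, PARENTS)
projected = isotonic_chain([raw[t] for t in PARENTS])
```

With these made-up values, the only violation is 'purine nucleotide binding' (0.75) exceeding 'nucleotide binding' (0.25). The lifting heuristic resolves it by raising the parent to 0.75, whereas the chain projection moves both terms to their average, 0.5, which is the smallest squared-error change that restores the ordering.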

journal_name

Genome Biol

journal_title

Genome biology

authors

Obozinski G,Lanckriet G,Grant C,Jordan MI,Noble WS

doi

10.1186/gb-2008-9-s1-s6

subject

Has Abstract

pub_date

2008-01-01 00:00:00

pages

S6

eissn

1474-760X

issn

1474-7596

pii

gb-2008-9-s1-s6

journal_volume

9 Suppl 1

pub_type

Journal Article
  • Computational identification of the normal and perturbed genetic networks involved in myeloid differentiation and acute promyelocytic leukemia.

    abstract:BACKGROUND:Acute myeloid leukemia (AML) comprises a group of diseases characterized by the abnormal development of malignant myeloid cells. Recent studies have demonstrated an important role for aberrant transcriptional regulation in AML pathophysiology. Although several transcription factors (TFs) involved in myeloid ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-2-r38

    authors: Chang LW,Payton JE,Yuan W,Ley TJ,Nagarajan R,Stormo GD

    update_date:2008-01-01 00:00:00

  • OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries.

    abstract:The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1893-y

    authors: An L,Yang T,Yang J,Nuebler J,Xiang G,Hardison RC,Li Q,Zhang Y

    update_date:2019-12-18 00:00:00

  • Allele-specific copy number analysis of tumor samples with aneuploidy and tumor heterogeneity.

    abstract:We describe a bioinformatic tool, Tumor Aberration Prediction Suite (TAPS), for the identification of allele-specific copy numbers in tumor samples using data from Affymetrix SNP arrays. It includes detailed visualization of genomic segment characteristics and iterative pattern recognition for copy number identificati...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-10-r108

    authors: Rasmussen M,Sundström M,Göransson Kultima H,Botling J,Micke P,Birgisson H,Glimelius B,Isaksson A

    update_date:2011-10-24 00:00:00

  • Comparison of the oxidative phosphorylation (OXPHOS) nuclear genes in the genomes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae.

    abstract:BACKGROUND:In eukaryotic cells, oxidative phosphorylation (OXPHOS) uses the products of both nuclear and mitochondrial genes to generate cellular ATP. Interspecies comparative analysis of these genes, which appear to be under strong functional constraints, may shed light on the evolutionary mechanisms that act on a set...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-2-r11

    authors: Tripoli G,D'Elia D,Barsanti P,Caggese C

    update_date:2005-01-01 00:00:00

  • New tricks for old NODs.

    abstract:Recent work has identified the human NOD-like receptor NLRX1 as a negative regulator of intracellular signaling leading to type I interferon production. Here we discuss these findings and the questions and implications they raise regarding the function of NOD-like receptors in the antiviral response. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2008-9-4-217

    authors: Pietras EM,Cheng G

    update_date:2008-04-25 00:00:00

  • The functional spectrum of low-frequency coding variation.

    abstract:BACKGROUND:Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, b...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-9-r84

    authors: Marth GT,Yu F,Indap AR,Garimella K,Gravel S,Leong WF,Tyler-Smith C,Bainbridge M,Blackwell T,Zheng-Bradley X,Chen Y,Challis D,Clarke L,Ball EV,Cibulskis K,Cooper DN,Fulton B,Hartl C,Koboldt D,Muzny D,Smith R,Soug

    update_date:2011-09-14 00:00:00

  • Sirtuins: Sir2-related NAD-dependent protein deacetylases.

    abstract:Silent information regulator 2 (Sir2) proteins, or sirtuins, are protein deacetylases dependent on nicotine adenine dinucleotide (NAD) and are found in organisms ranging from bacteria to humans. In eukaryotes, sirtuins regulate transcriptional repression, recombination, the cell-division cycle, microtubule organizatio...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2004-5-5-224

    authors: North BJ,Verdin E

    update_date:2004-01-01 00:00:00

  • Wheat chromatin architecture is organized in genome territories and transcription factories.

    abstract:BACKGROUND:Polyploidy is ubiquitous in eukaryotic plant and fungal lineages, and it leads to the co-existence of several copies of similar or related genomes in one nucleus. In plants, polyploidy is considered a major factor in successful domestication. However, polyploidy challenges chromosome folding architecture in ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-01998-1

    authors: Concia L,Veluchamy A,Ramirez-Prado JS,Martin-Ramirez A,Huang Y,Perez M,Domenichini S,Rodriguez Granados NY,Kim S,Blein T,Duncan S,Pichot C,Manza-Mianza D,Juery C,Paux E,Moore G,Hirt H,Bergounioux C,Crespi M,Mahfouz

    update_date:2020-04-29 00:00:00

  • Nanoarchaea: representatives of a novel archaeal phylum or a fast-evolving euryarchaeal lineage related to Thermococcales?

    abstract:BACKGROUND:Cultivable archaeal species are assigned to two phyla -- the Crenarchaeota and the Euryarchaeota -- by a number of important genetic differences, and this ancient split is strongly supported by phylogenetic analysis. The recently described hyperthermophile Nanoarchaeum equitans, harboring the smallest cellul...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-5-r42

    authors: Brochier C,Gribaldo S,Zivanovic Y,Confalonieri F,Forterre P

    update_date:2005-01-01 00:00:00

  • Chipping away at major depressive disorder.

    abstract:An intriguing recent study examines the role of miR-1202, a glutamate receptor regulating microRNA, in regulating major depressive disorder. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-014-0421-3

    authors: Rucker JJ,McGuffin P

    update_date:2014-07-26 00:00:00

  • Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia.

    abstract:BACKGROUND:Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resista...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-9-r105

    authors: Nordlund J,Bäcklin CL,Wahlberg P,Busche S,Berglund EC,Eloranta ML,Flaegstad T,Forestier E,Frost BM,Harila-Saari A,Heyman M,Jónsson OG,Larsson R,Palle J,Rönnblom L,Schmiegelow K,Sinnett D,Söderhäll S,Pastinen T,Gusta

    update_date:2013-09-24 00:00:00

  • SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes.

    abstract:Numerous methods have been developed to analyse RNA sequencing (RNA-seq) data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. Here we present superTranscripts, a substitute for a reference genome, where each gene with multiple transcripts is represented by a si...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1284-1

    authors: Davidson NM,Hawkins ADK,Oshlack A

    update_date:2017-08-04 00:00:00

  • Cloning and characterization of microRNAs from wheat (Triticum aestivum L.).

    abstract:BACKGROUND:MicroRNAs (miRNAs) are a class of small, non-coding regulatory RNAs that regulate gene expression by guiding target mRNA cleavage or translational inhibition. So far, identification of miRNAs has been limited to a few model plant species, such as Arabidopsis, rice and Populus, whose genomes have been sequenc...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-6-r96

    authors: Yao Y,Guo G,Ni Z,Sunkar R,Du J,Zhu JK,Sun Q

    update_date:2007-01-01 00:00:00

  • Comparison of Francisella tularensis genomes reveals evolutionary events associated with the emergence of human pathogenic strains.

    abstract:BACKGROUND:Francisella tularensis subspecies tularensis and holarctica are pathogenic to humans, whereas the two other subspecies, novicida and mediasiatica, rarely cause disease. To uncover the factors that allow subspecies tularensis and holarctica to be pathogenic to humans, we compared their genome sequences with t...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-6-r102

    authors: Rohmer L,Fong C,Abmayr S,Wasnick M,Larson Freeman TJ,Radey M,Guina T,Svensson K,Hayden HS,Jacobs M,Gallagher LA,Manoil C,Ernst RK,Drees B,Buckley D,Haugen E,Bovee D,Zhou Y,Chang J,Levy R,Lim R,Gillett W,Guenth

    update_date:2007-01-01 00:00:00

  • A comparison of automatic cell identification methods for single-cell RNA sequencing data.

    abstract:BACKGROUND:Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growt...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-019-1795-z

    authors: Abdelaal T,Michielsen L,Cats D,Hoogduin D,Mei H,Reinders MJT,Mahfouz A

    update_date:2019-09-09 00:00:00

  • Changes in the organization of the genome during the mammalian cell cycle.

    abstract:By using chromosome conformation capture technology, a recent study has revealed two alternative three-dimensional folding states of the human genome during the cell cycle. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb4147

    authors: Giorgetti L,Servant N,Heard E

    update_date:2013-12-24 00:00:00

  • Mechanisms of aging in senescence-accelerated mice.

    abstract:BACKGROUND:Progressive neurological dysfunction is a key aspect of human aging. Because of underlying differences in the aging of mice and humans, useful mouse models have been difficult to obtain and study. We have used gene-expression analysis and polymorphism screening to study molecular senescence of the retina and...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-6-r48

    authors: Carter TA,Greenhall JA,Yoshida S,Fuchs S,Helton R,Swaroop A,Lockhart DJ,Barlow C

    update_date:2005-01-01 00:00:00

  • Methylome evolution in plants.

    abstract:Despite major progress in dissecting the molecular pathways that control DNA methylation patterns in plants, little is known about the mechanisms that shape plant methylomes over evolutionary time. Drawing on recent intra- and interspecific epigenomic studies, we show that methylome evolution over long timescales is l...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/s13059-016-1127-5

    authors: Vidalis A,Živković D,Wardenaar R,Roquis D,Tellier A,Johannes F

    update_date:2016-12-20 00:00:00

  • Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene.

    abstract:BACKGROUND:RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering th...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-7-r70

    authors: Gonzàlez-Porta M,Frankish A,Rung J,Harrow J,Brazma A

    update_date:2013-07-01 00:00:00

  • Interrogation of global mutagenesis data with a genome scale model of Neisseria meningitidis to assess gene fitness in vitro and in sera.

    abstract:BACKGROUND:Neisseria meningitidis is an important human commensal and pathogen that causes several thousand deaths each year, mostly in young children. How the pathogen replicates and causes disease in the host is largely unknown, particularly the role of metabolism in colonization and disease. Completed genome sequenc...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2011-12-12-r127

    authors: Mendum TA,Newcombe J,Mannan AA,Kierzek AM,McFadden J

    update_date:2011-12-30 00:00:00

  • Personal genomes and precision medicine.

    abstract:A report of the fifth annual Personal Genomes and Medical Genomics meeting, held at Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA, November 14-17, 2012. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2012-13-12-324

    authors: Highnam G,Mittelman D

    update_date:2012-12-19 00:00:00

  • Genome-wide analysis of the maternal-to-zygotic transition in Drosophila primordial germ cells.

    abstract:BACKGROUND:During the maternal-to-zygotic transition (MZT) vast changes in the embryonic transcriptome are produced by a combination of two processes: elimination of maternally provided mRNAs and synthesis of new transcripts from the zygotic genome. Previous genome-wide analyses of the MZT have been restricted to whole...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-2-r11

    authors: Siddiqui NU,Li X,Luo H,Karaiskakis A,Hou H,Kislinger T,Westwood JT,Morris Q,Lipshitz HD

    update_date:2012-02-20 00:00:00

  • The oncogenic role of circPVT1 in head and neck squamous cell carcinoma is mediated through the mutant p53/YAP/TEAD transcription-competent complex.

    abstract:BACKGROUND:Circular RNAs are a class of endogenous RNAs with various functions in eukaryotic cells. Worthy of note, circular RNAs play a critical role in cancer. Currently, nothing is known about their role in head and neck squamous cell carcinoma (HNSCC). The identification of circular RNAs in HNSCC might become usefu...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-017-1368-y

    authors: Verduci L,Ferraiuolo M,Sacconi A,Ganci F,Vitale J,Colombo T,Paci P,Strano S,Macino G,Rajewsky N,Blandino G

    update_date:2017-12-20 00:00:00

  • Wheat functional genomics and engineering crop improvement.

    abstract:Genetic mapping and determination of the organization of the wheat genome are changing the wheat-breeding process. New initiatives to analyze the expressed portion of the wheat genome and structural analysis of the genomes of Arabidopsis and rice are increasing our knowledge of the genes that are linked to key agronom...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2002-3-5-reviews1013

    authors: Francki M,Appels R

    update_date:2002-01-01 00:00:00

  • Identification of cyanobacterial non-coding RNAs by comparative genome analysis.

    abstract:BACKGROUND:Whole genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase-pairs, Prochlorococcus sp. MED4 has the most compact of these genomes and it is enigmatic how the few identified regulatory proteins efficiently sustain the...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-9-r73

    authors: Axmann IM,Kensche P,Vogel J,Kohl S,Herzel H,Hess WR

    update_date:2005-01-01 00:00:00

  • Functional analysis of Arabidopsis immune-related MAPKs uncovers a role for MPK3 as negative regulator of inducible defences.

    abstract:BACKGROUND:Mitogen-activated protein kinases (MAPKs) are key regulators of immune responses in animals and plants. In Arabidopsis, perception of microbe-associated molecular patterns (MAMPs) activates the MAPKs MPK3, MPK4 and MPK6. Increasing information depicts the molecular events activated by MAMPs in plants, but th...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2014-15-6-r87

    authors: Frei dit Frey N,Garcia AV,Bigeard J,Zaag R,Bueso E,Garmier M,Pateyron S,de Tauzia-Moreau ML,Brunaud V,Balzergue S,Colcombet J,Aubourg S,Martin-Magniette ML,Hirt H

    update_date:2014-06-30 00:00:00

  • iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations.

    abstract:BACKGROUND:Bacillus subtilis is an organism of interest because of its extensive industrial applications, its similarity to pathogenic organisms, and its role as the model organism for Gram-positive, sporulating bacteria. In this work, we introduce a new genome-scale metabolic model of B. subtilis 168 called iBsu1103. ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2009-10-6-r69

    authors: Henry CS,Zinner JF,Cohoon MP,Stevens RL

    update_date:2009-01-01 00:00:00

  • Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains.

    abstract:BACKGROUND:The distributed genome hypothesis (DGH) posits that chronic bacterial pathogens utilize polyclonal infection and reassortment of genic characters to ensure persistence in the face of adaptive host defenses. Studies based on random sequencing of multiple strain libraries suggested that free-living bacterial s...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-6-r103

    authors: Hogg JS,Hu FZ,Janto B,Boissy R,Hayes J,Keefe R,Post JC,Ehrlich GD

    update_date:2007-01-01 00:00:00

  • Statistical tools for synthesizing lists of differentially expressed features in related experiments.

    abstract:We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-4-r54

    authors: Blangiardo M,Richardson S

    update_date:2007-01-01 00:00:00

  • Identifying repeat domains in large genomes.

    abstract:We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2006-7-1-r7

    authors: Zhi D,Raphael BJ,Price AL,Tang H,Pevzner PA

    update_date:2006-01-01 00:00:00