Abstract:
In predicting hierarchical protein function annotations, such as terms in the Gene Ontology (GO), the simplest approach makes predictions for each term independently. However, this approach has the unfortunate consequence that the predictor may assign to a single protein a set of terms that are inconsistent with one another; for example, the predictor may assign a specific GO term to a given protein ('purine nucleotide binding') but not assign the parent term ('nucleotide binding'). Such predictions are difficult to interpret. In this work, we focus on methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. We call this procedure 'reconciliation'. We begin with a baseline method for predicting GO terms from a collection of data types using an ensemble of discriminative classifiers. We apply the method to a previously described benchmark data set, and we demonstrate that the resulting predictions are frequently inconsistent with the topology of the GO. We then consider 11 distinct reconciliation methods: three heuristic methods; four variants of a Bayesian network; an extension of logistic regression to the structured case; and three novel projection methods (isotonic regression and two variants of a Kullback-Leibler projection method). We evaluate each method in three different modes (per term, per protein and joint), corresponding to three types of prediction tasks. Although the principal goal of reconciliation is interpretability, it is important to assess whether interpretability comes at a cost in terms of precision and recall. Indeed, we find that many apparently reasonable reconciliation methods yield reconciled probabilities with significantly lower precision than the original, unreconciled estimates. On the other hand, we find that isotonic regression usually performs better than the underlying, unreconciled method, and almost never performs worse; isotonic regression appears to be able to use the constraints from the GO network to its advantage. An exception to this rule is the high precision regime for joint evaluation, where Kullback-Leibler projection yields the best performance.
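To make the idea of reconciliation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy three-term chain built from the terms named in the abstract, with made-up probabilities; the names `PARENTS`, `find_violations` and `project_chain` are hypothetical. It flags child terms predicted as more probable than their parents, then applies a least-squares isotonic projection (pool adjacent violators) along the chain. The real GO is a directed acyclic graph, so a general isotonic regression would need a constrained solver rather than this chain-only routine.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology (child term -> parent terms); not the paper's data.
PARENTS: Dict[str, List[str]] = {
    "binding": [],
    "nucleotide binding": ["binding"],
    "purine nucleotide binding": ["nucleotide binding"],
}


def find_violations(probs: Dict[str, float],
                    parents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where the child is more probable than the parent."""
    return [(child, parent)
            for child, ps in parents.items()
            for parent in ps
            if probs[child] > probs[parent]]


def project_chain(values: List[float]) -> List[float]:
    """Least-squares projection onto non-increasing sequences (pool adjacent violators).

    `values` must be ordered from the most general term to the most specific one.
    """
    blocks: List[List[float]] = []  # each block holds [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block has a smaller mean than the block after it.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    out: List[float] = []
    for s, n in blocks:
        out.extend([s / n] * int(n))
    return out


# Made-up independent predictions for one protein (illustrative values only).
probs = {
    "binding": 0.9,
    "nucleotide binding": 0.5,
    "purine nucleotide binding": 0.75,
}
chain = ["binding", "nucleotide binding", "purine nucleotide binding"]
reconciled = dict(zip(chain, project_chain([probs[t] for t in chain])))

print(find_violations(probs, PARENTS))
print(reconciled)
```

In this sketch the two conflicting terms are pooled to their average, which is the smallest squared-error change that restores consistency along the chain; the other reconciliation families mentioned in the abstract (heuristics, Bayesian networks, Kullback-Leibler projection) are not shown.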
journal_name: Genome Biol
journal_title: Genome biology
authors: Obozinski G, Lanckriet G, Grant C, Jordan MI, Noble WS
doi: 10.1186/gb-2008-9-s1-s6
subject: Has Abstract
pub_date: 2008-01-01 00:00:00
pages: S6
eissn: 1474-7596
issn: 1474-760X
pii: gb-2008-9-s1-s6
journal_volume: 9 Suppl 1
pub_type: Journal article
Related literature
GENOME BIOLOGY literature collection
abstract: BACKGROUND: Acute myeloid leukemia (AML) comprises a group of diseases characterized by the abnormal development of malignant myeloid cells. Recent studies have demonstrated an important role for aberrant transcriptional regulation in AML pathophysiology. Although several transcription factors (TFs) involved in myeloid ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2008-9-2-r38
update_date: 2008-01-01 00:00:00
abstract: The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-019-1893-y
update_date: 2019-12-18 00:00:00
abstract: We describe a bioinformatic tool, Tumor Aberration Prediction Suite (TAPS), for the identification of allele-specific copy numbers in tumor samples using data from Affymetrix SNP arrays. It includes detailed visualization of genomic segment characteristics and iterative pattern recognition for copy number identificati...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2011-12-10-r108
update_date: 2011-10-24 00:00:00
abstract: BACKGROUND: In eukaryotic cells, oxidative phosphorylation (OXPHOS) uses the products of both nuclear and mitochondrial genes to generate cellular ATP. Interspecies comparative analysis of these genes, which appear to be under strong functional constraints, may shed light on the evolutionary mechanisms that act on a set...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2005-6-2-r11
update_date: 2005-01-01 00:00:00
abstract: Recent work has identified the human NOD-like receptor NLRX1 as a negative regulator of intracellular signaling leading to type I interferon production. Here we discuss these findings and the questions and implications they raise regarding the function of NOD-like receptors in the antiviral response. ...
journal_title: Genome biology
pub_type: Journal article, Review
doi: 10.1186/gb-2008-9-4-217
update_date: 2008-04-25 00:00:00
abstract: BACKGROUND: Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, b...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2011-12-9-r84
update_date: 2011-09-14 00:00:00
abstract: Silent information regulator 2 (Sir2) proteins, or sirtuins, are protein deacetylases dependent on nicotine adenine dinucleotide (NAD) and are found in organisms ranging from bacteria to humans. In eukaryotes, sirtuins regulate transcriptional repression, recombination, the cell-division cycle, microtubule organizatio...
journal_title: Genome biology
pub_type: Journal article, Review
doi: 10.1186/gb-2004-5-5-224
update_date: 2004-01-01 00:00:00
abstract: BACKGROUND: Polyploidy is ubiquitous in eukaryotic plant and fungal lineages, and it leads to the co-existence of several copies of similar or related genomes in one nucleus. In plants, polyploidy is considered a major factor in successful domestication. However, polyploidy challenges chromosome folding architecture in ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-020-01998-1
update_date: 2020-04-29 00:00:00
abstract: BACKGROUND: Cultivable archaeal species are assigned to two phyla (the Crenarchaeota and the Euryarchaeota) by a number of important genetic differences, and this ancient split is strongly supported by phylogenetic analysis. The recently described hyperthermophile Nanoarchaeum equitans, harboring the smallest cellul...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2005-6-5-r42
update_date: 2005-01-01 00:00:00
abstract: An intriguing recent study examines the role of miR-1202, a glutamate receptor regulating microRNA, in regulating major depressive disorder. ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-014-0421-3
update_date: 2014-07-26 00:00:00
abstract: BACKGROUND: Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resista...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2013-14-9-r105
update_date: 2013-09-24 00:00:00
abstract: Numerous methods have been developed to analyse RNA sequencing (RNA-seq) data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. Here we present superTranscripts, a substitute for a reference genome, where each gene with multiple transcripts is represented by a si...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-017-1284-1
update_date: 2017-08-04 00:00:00
abstract: BACKGROUND: MicroRNAs (miRNAs) are a class of small, non-coding regulatory RNAs that regulate gene expression by guiding target mRNA cleavage or translational inhibition. So far, identification of miRNAs has been limited to a few model plant species, such as Arabidopsis, rice and Populus, whose genomes have been sequenc...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2007-8-6-r96
update_date: 2007-01-01 00:00:00
abstract: BACKGROUND: Francisella tularensis subspecies tularensis and holarctica are pathogenic to humans, whereas the two other subspecies, novicida and mediasiatica, rarely cause disease. To uncover the factors that allow subspecies tularensis and holarctica to be pathogenic to humans, we compared their genome sequences with t...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2007-8-6-r102
update_date: 2007-01-01 00:00:00
abstract: BACKGROUND: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growt...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-019-1795-z
update_date: 2019-09-09 00:00:00
abstract: By using chromosome conformation capture technology, a recent study has revealed two alternative three-dimensional folding states of the human genome during the cell cycle. ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb4147
update_date: 2013-12-24 00:00:00
abstract: BACKGROUND: Progressive neurological dysfunction is a key aspect of human aging. Because of underlying differences in the aging of mice and humans, useful mouse models have been difficult to obtain and study. We have used gene-expression analysis and polymorphism screening to study molecular senescence of the retina and...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2005-6-6-r48
update_date: 2005-01-01 00:00:00
abstract: Despite major progress in dissecting the molecular pathways that control DNA methylation patterns in plants, little is known about the mechanisms that shape plant methylomes over evolutionary time. Drawing on recent intra- and interspecific epigenomic studies, we show that methylome evolution over long timescales is l...
journal_title: Genome biology
pub_type: Journal article, Review
doi: 10.1186/s13059-016-1127-5
update_date: 2016-12-20 00:00:00
abstract: BACKGROUND: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering th...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2013-14-7-r70
update_date: 2013-07-01 00:00:00
abstract: BACKGROUND: Neisseria meningitidis is an important human commensal and pathogen that causes several thousand deaths each year, mostly in young children. How the pathogen replicates and causes disease in the host is largely unknown, particularly the role of metabolism in colonization and disease. Completed genome sequenc...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2011-12-12-r127
update_date: 2011-12-30 00:00:00
abstract: A report of the fifth annual Personal Genomes and Medical Genomics meeting, held at Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA, November 14-17, 2012. ...
journal_title: Genome biology
pub_type:
doi: 10.1186/gb-2012-13-12-324
update_date: 2012-12-19 00:00:00
abstract: BACKGROUND: During the maternal-to-zygotic transition (MZT) vast changes in the embryonic transcriptome are produced by a combination of two processes: elimination of maternally provided mRNAs and synthesis of new transcripts from the zygotic genome. Previous genome-wide analyses of the MZT have been restricted to whole...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2012-13-2-r11
update_date: 2012-02-20 00:00:00
abstract: BACKGROUND: Circular RNAs are a class of endogenous RNAs with various functions in eukaryotic cells. Worthy of note, circular RNAs play a critical role in cancer. Currently, nothing is known about their role in head and neck squamous cell carcinoma (HNSCC). The identification of circular RNAs in HNSCC might become usefu...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/s13059-017-1368-y
update_date: 2017-12-20 00:00:00
abstract: Genetic mapping and determination of the organization of the wheat genome are changing the wheat-breeding process. New initiatives to analyze the expressed portion of the wheat genome and structural analysis of the genomes of Arabidopsis and rice are increasing our knowledge of the genes that are linked to key agronom...
journal_title: Genome biology
pub_type: Journal article, Review
doi: 10.1186/gb-2002-3-5-reviews1013
update_date: 2002-01-01 00:00:00
abstract: BACKGROUND: Whole genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase-pairs, Prochlorococcus sp. MED4 has the most compact of these genomes and it is enigmatic how the few identified regulatory proteins efficiently sustain the...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2005-6-9-r73
update_date: 2005-01-01 00:00:00
abstract: BACKGROUND: Mitogen-activated protein kinases (MAPKs) are key regulators of immune responses in animals and plants. In Arabidopsis, perception of microbe-associated molecular patterns (MAMPs) activates the MAPKs MPK3, MPK4 and MPK6. Increasing information depicts the molecular events activated by MAMPs in plants, but th...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2014-15-6-r87
update_date: 2014-06-30 00:00:00
abstract: BACKGROUND: Bacillus subtilis is an organism of interest because of its extensive industrial applications, its similarity to pathogenic organisms, and its role as the model organism for Gram-positive, sporulating bacteria. In this work, we introduce a new genome-scale metabolic model of B. subtilis 168 called iBsu1103. ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2009-10-6-r69
update_date: 2009-01-01 00:00:00
abstract: BACKGROUND: The distributed genome hypothesis (DGH) posits that chronic bacterial pathogens utilize polyclonal infection and reassortment of genic characters to ensure persistence in the face of adaptive host defenses. Studies based on random sequencing of multiple strain libraries suggested that free-living bacterial s...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2007-8-6-r103
update_date: 2007-01-01 00:00:00
abstract: We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p ...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2007-8-4-r54
update_date: 2007-01-01 00:00:00
abstract: We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic...
journal_title: Genome biology
pub_type: Journal article
doi: 10.1186/gb-2006-7-1-r7
update_date: 2006-01-01 00:00:00