Mining physical protein-protein interactions from the literature.

Abstract:

BACKGROUND: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion in interaction data. Since manual curation is intensive in terms of time and cost, there is an urgent need for text-mining tools to facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.

RESULTS: During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing them to molecule identifiers, our method was competitive among the submitted runs, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these findings are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to problems with correctly extracting interactions for this system.

CONCLUSION: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
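
The scores quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code; the function name `f1_score` is a hypothetical helper. It assumes F1 is the harmonic mean of precision and recall. Under that assumption the normalization figures reproduce the reported 28.5%, whereas the interaction-pair F1 values in the abstract are lower than the harmonic mean of the quoted precision and recall. That would be consistent with scores averaged per article, but the abstract does not say so, and this reading is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, in percent.
normalization_f1 = f1_score(34.83, 24.10)  # protein normalization task
swissprot_f1 = f1_score(36.95, 32.68)      # pair extraction, SwissProt-only subset

# Share of false negatives attributed to normalization problems (532 of 698).
normalization_share = 532 / 698

print(round(normalization_f1, 1))           # matches the reported 28.5
print(round(swissprot_f1, 2))               # higher than the reported 30.40
print(round(100 * normalization_share, 1))  # roughly three-quarters
```

The error-analysis ratio works out to about 76%, in line with the "three-quarters" stated in the abstract.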

journal_name

Genome Biol

journal_title

Genome biology

authors

Huang M,Ding S,Wang H,Zhu X

doi

10.1186/gb-2008-9-s2-s12

subject

Has Abstract

pub_date

2008-01-01 00:00:00

pages

S12

eissn

1474-7596

issn

1474-760X

pii

gb-2008-9-s2-s12

journal_volume

9 Suppl 2

pub_type

Journal Article
  • 5-hydroxymethylcytosine marks promoters in colon that resist DNA hypermethylation in cancer.

    abstract:BACKGROUND:The discovery of cytosine hydroxymethylation (5hmC) as a mechanism that potentially controls DNA methylation changes typical of neoplasia prompted us to investigate its behaviour in colon cancer. 5hmC is globally reduced in proliferating cells such as colon tumours and the gut crypt progenitors, from which t...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-015-0605-5

    authors: Uribe-Lewis S,Stark R,Carroll T,Dunning MJ,Bachman M,Ito Y,Stojic L,Halim S,Vowler SL,Lynch AG,Delatte B,de Bony EJ,Colin L,Defrance M,Krueger F,Silva AL,Ten Hoopen R,Ibrahim AE,Fuks F,Murrell A

    update_date:2015-04-01 00:00:00

  • Transposable elements in the mammalian embryo: pioneers surviving through stealth and service.

    abstract::Transposable elements (TEs) are notable drivers of genetic innovation. Over evolutionary time, TE insertions can supply new promoter, enhancer, and insulator elements to protein-coding genes and establish novel, species-specific gene regulatory networks. Conversely, ongoing TE-driven insertional mutagenesis, nonhomolo...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/s13059-016-0965-5

    authors: Gerdes P,Richardson SR,Mager DL,Faulkner GJ

    update_date:2016-05-09 00:00:00

  • High throughput single-cell detection of multiplex CRISPR-edited gene modifications.

    abstract::CRISPR-Cas9 gene editing has transformed our ability to rapidly interrogate the functional impact of somatic mutations in human cancers. Droplet-based technology enables the analysis of Cas9-introduced gene edits in thousands of single cells. Using this technology, we analyze Ba/F3 cells engineered to express single o...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-020-02174-1

    authors: Ten Hacken E,Clement K,Li S,Hernández-Sánchez M,Redd R,Wang S,Ruff D,Gruber M,Baranowski K,Jacob J,Flynn J,Jones KW,Neuberg D,Livak KJ,Pinello L,Wu CJ

    update_date:2020-10-20 00:00:00

  • The rate of the molecular clock and the cost of gratuitous protein synthesis.

    abstract:BACKGROUND:The nature of the protein molecular clock, the protein-specific rate of amino acid substitutions, is among the central questions of molecular evolution. Protein expression level is the dominant determinant of the clock rate in a number of organisms. It has been suggested that highly expressed proteins evolve...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2010-11-9-r98

    authors: Plata G,Gottesman ME,Vitkup D

    update_date:2010-01-01 00:00:00

  • Comparison of the oxidative phosphorylation (OXPHOS) nuclear genes in the genomes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae.

    abstract:BACKGROUND:In eukaryotic cells, oxidative phosphorylation (OXPHOS) uses the products of both nuclear and mitochondrial genes to generate cellular ATP. Interspecies comparative analysis of these genes, which appear to be under strong functional constraints, may shed light on the evolutionary mechanisms that act on a set...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-2-r11

    authors: Tripoli G,D'Elia D,Barsanti P,Caggese C

    update_date:2005-01-01 00:00:00

  • The first aurochs genome reveals the breeding history of British and European cattle.

    abstract::The first genome sequence of the extinct European wild aurochs reveals the genetic foundation of native British and Irish landraces of cattle. See related Research article: www.dx.doi.org/10.1186/s13059-015-0790-2. ...

    journal_title:Genome biology

    pub_type: Comment, Journal Article

    doi:10.1186/s13059-015-0793-z

    authors: Orlando L

    update_date:2015-10-26 00:00:00

  • Membrane transporters and protein traffic networks differentially affecting metal tolerance: a genomic phenotyping study in yeast.

    abstract:BACKGROUND:The cellular mechanisms that underlie metal toxicity and detoxification are rather variegated and incompletely understood. Genomic phenotyping was used to assess the roles played by all nonessential Saccharomyces cerevisiae proteins in modulating cell viability after exposure to cadmium, nickel, and other me...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2008-9-4-r67

    authors: Ruotolo R,Marchini G,Ottonello S

    update_date:2008-04-07 00:00:00

  • Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency.

    abstract::Understanding the functional impact of genomic variants is a major goal of modern genetics and personalized medicine. Although many synonymous and non-coding variants act through altering the efficiency of pre-mRNA splicing, it is difficult to predict how these variants impact pre-mRNA splicing. Here, we describe a ma...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-018-1437-x

    authors: Adamson SI,Zhan L,Graveley BR

    update_date:2018-06-01 00:00:00

  • Transcriptional profiling of long non-coding RNAs and novel transcribed regions across a diverse panel of archived human cancers.

    abstract:BACKGROUND:Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers and cancer-t...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-8-r75

    authors: Brunner AL,Beck AH,Edris B,Sweeney RT,Zhu SX,Li R,Montgomery K,Varma S,Gilks T,Guo X,Foley JW,Witten DM,Giacomini CP,Flynn RA,Pollack JR,Tibshirani R,Chang HY,van de Rijn M,West RB

    update_date:2012-08-28 00:00:00

  • Copy number variation goes clinical.

    abstract::A report of the First Golden Helix Symposium 'Copy Number Variation (CNV) and Genomic Alterations in Health and Disease', Athens, Greece, 28-29 November 2008. ...

    journal_title:Genome biology

    pub_type:

    doi:10.1186/gb-2009-10-1-301

    authors: Le Caignec C,Redon R

    update_date:2009-01-01 00:00:00

  • Attacking pathogens through their hosts.

    abstract::Through understanding the intricacies of host-pathogen interactions, it is now possible to inhibit the growth of microbes, especially viruses, by targeting host-cell proteins and functions. This new antimicrobial strategy has proved effective in the laboratory and in the clinic, and it has great potential for the futu...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2006-7-1-201

    authors: Kellam P

    update_date:2006-01-01 00:00:00

  • Evidence from comparative genomics for a complete sexual cycle in the 'asexual' pathogenic yeast Candida glabrata.

    abstract:BACKGROUND:Candida glabrata is a pathogenic yeast of increasing medical concern. It has been regarded as asexual since it was first described in 1917, yet phylogenetic analyses have revealed that it is more closely related to sexual yeasts than other Candida species. We show here that the C. glabrata genome contains ma...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2003-4-2-r10

    authors: Wong S,Fares MA,Zimmermann W,Butler G,Wolfe KH

    update_date:2003-01-01 00:00:00

  • Mediation of Drosophila autosomal dosage effects and compensation by network interactions.

    abstract:BACKGROUND:Gene dosage change is a mild perturbation that is a valuable tool for pathway reconstruction in Drosophila. While it is often assumed that reducing gene dose by half leads to two-fold less expression, there is partial autosomal dosage compensation in Drosophila, which may be mediated by feedback or buffering...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-4-r28

    authors: Malone JH,Cho DY,Mattiuzzo NR,Artieri CG,Jiang L,Dale RK,Smith HE,McDaniel J,Munro S,Salit M,Andrews J,Przytycka TM,Oliver B

    update_date:2012-04-24 00:00:00

  • Intraepithelial gamma delta T cells exposed by functional genomics.

    abstract::Epithelial tissues house gammadelta T cells, which are important for the mucosal immune system and may be involved in controlling malignancies, infections and inflammation. Whole-genome gene-expression analysis provides a new way to study the signals required for the activation of gammadelta T cells, their mode of act...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2001-2-11-reviews1031

    authors: Boismenu R,Havran WL

    update_date:2001-01-01 00:00:00

  • Reduced selection leads to accelerated gene loss in Shigella.

    abstract:BACKGROUND:Obligate pathogenic bacteria lose more genes relative to facultative pathogens, which, in turn, lose more genes than free-living bacteria. It was suggested that the increased gene loss in obligate pathogens may be due to a reduction in the effectiveness of purifying selection. Less attention has been given t...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-8-r164

    authors: Hershberg R,Tang H,Petrov DA

    update_date:2007-01-01 00:00:00

  • Protein profiling comes of age.

    abstract::Ever since DNA microarrays were first applied to the quantitation of RNA levels, there has been considerable interest in generating a protein homolog that can be used to assay cellular protein expression. A recent paper describes the first microarray that can be used for such protein profiling. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2001-2-2-reviews1004

    authors: Tomlinson IM,Holt LJ

    update_date:2001-01-01 00:00:00

  • The cryptochromes.

    abstract::Cryptochromes are photoreceptors that regulate entrainment by light of the circadian clock in plants and animals. They also act as integral parts of the central circadian oscillator in animal brains and as receptors controlling photomorphogenesis in response to blue or ultraviolet (UV-A) light in plants. Cryptochromes...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2005-6-5-220

    authors: Lin C,Todo T

    update_date:2005-01-01 00:00:00

  • Using orthologous and paralogous proteins to identify specificity determining residues.

    abstract:BACKGROUND:Concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows their identification in complete genomes. Functional specificity of proteins is assumed to be conserved among orthologs and is different among paralogs. We used this assumption to identify residues which de...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2002-3-3-preprint0002

    authors: Mirny LA,Gelfand MS

    update_date:2002-01-01 00:00:00

  • Rapid gene isolation in barley and wheat by mutant chromosome sequencing.

    abstract::Identification of causal mutations in barley and wheat is hampered by their large genomes and suppressed recombination. To overcome these obstacles, we have developed MutChromSeq, a complexity reduction approach based on flow sorting and sequencing of mutant chromosomes, to identify induced mutations by comparison to ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-016-1082-1

    authors: Sánchez-Martín J,Steuernagel B,Ghosh S,Herren G,Hurni S,Adamski N,Vrána J,Kubaláková M,Krattinger SG,Wicker T,Doležel J,Keller B,Wulff BB

    update_date:2016-10-31 00:00:00

  • Polysome profiling reveals translational control of gene expression in the human malaria parasite Plasmodium falciparum.

    abstract:BACKGROUND:In eukaryotic organisms, gene expression is regulated at multiple levels during the processes of transcription and translation. The absence of a tight regulatory network for transcription in the human malaria parasite suggests that gene expression may largely be controlled at post-transcriptional and transla...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2013-14-11-r128

    authors: Bunnik EM,Chung DW,Hamilton M,Ponts N,Saraf A,Prudhomme J,Florens L,Le Roch KG

    update_date:2013-11-22 00:00:00

  • The Dictyostelium genome encodes numerous RasGEFs with multiple biological roles.

    abstract:BACKGROUND:Dictyostelium discoideum is a eukaryote with a simple lifestyle and a relatively small genome whose sequence has been fully determined. It is widely used for studies on cell signaling, movement and multicellular development. Ras guanine-nucleotide exchange factors (RasGEFs) are the proteins that activate Ras...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2005-6-8-r68

    authors: Wilkins A,Szafranski K,Fraser DJ,Bakthavatsalam D,Müller R,Fisher PR,Glöckner G,Eichinger L,Noegel AA,Insall RH

    update_date:2005-01-01 00:00:00

  • Variations in abundance: genome-wide responses to genetic variation and vice versa.

    abstract::How do naturally occurring polymorphisms in DNA sequence relate to variation in gene expression? Recent work to map genetic sources of expression variation has shown a surprising balance between cis and trans effects. Other work suggests some chromosomal clustering of genes by expression pattern. A synthesis of approa...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2002-3-10-reviews1029

    authors: Hamilton BA

    update_date:2002-09-19 00:00:00

  • DarkHorse: a method for genome-wide prediction of horizontal gene transfer.

    abstract::A new approach to rapid, genome-wide identification and ranking of horizontal transfer candidate proteins is presented. The method is quantitative, reproducible, and computationally undemanding. It can be combined with genomic signature and/or phylogenetic tree-building procedures to improve accuracy and efficiency. T...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-2-r16

    authors: Podell S,Gaasterland T

    update_date:2007-01-01 00:00:00

  • Discovery of estrogen receptor alpha target genes and response elements in breast tumor cells.

    abstract:BACKGROUND:Estrogens and their receptors are important in human development, physiology and disease. In this study, we utilized an integrated genome-wide molecular and computational approach to characterize the interaction between the activated estrogen receptor (ER) and the regulatory elements of candidate target gene...

    journal_title:Genome biology

    pub_type: Journal Article, Meta-Analysis

    doi:10.1186/gb-2004-5-9-r66

    authors: Lin CY,Ström A,Vega VB,Kong SL,Yeo AL,Thomsen JS,Chan WC,Doray B,Bangarusamy DK,Ramasamy A,Vergara LA,Tang S,Chong A,Bajic VB,Miller LD,Gustafsson JA,Liu ET

    update_date:2004-01-01 00:00:00

  • Characterizing human lung tissue microbiota and its relationship to epidemiological and clinical features.

    abstract:BACKGROUND:The human lung tissue microbiota remains largely uncharacterized, although a number of studies based on airway samples suggest the existence of a viable human lung microbiota. Here we characterized the taxonomic and derived functional profiles of lung microbiota in 165 non-malignant lung tissue samples from ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/s13059-016-1021-1

    authors: Yu G,Gail MH,Consonni D,Carugno M,Humphrys M,Pesatori AC,Caporaso NE,Goedert JJ,Ravel J,Landi MT

    update_date:2016-07-28 00:00:00

  • Toxicity in mice expressing short hairpin RNAs gives new insight into RNAi.

    abstract::Short hairpin RNAs can provide stable gene silencing via RNA interference. Recent studies have shown toxicity in vivo that appears to be related to saturation of the endogenous microRNA pathway. Will these findings limit the therapeutic use of such hairpins? ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2006-7-8-231

    authors: Snøve O Jr,Rossi JJ

    update_date:2006-01-01 00:00:00

  • Divergence in cis-regulatory networks: taking the 'species' out of cross-species analysis.

    abstract::Many essential transcription factors have conserved roles in regulating biological programs, yet their genomic occupancy can diverge significantly. A new study demonstrates that such variations are primarily due to cis-regulatory sequences, rather than differences between the regulators or nuclear environments. ...

    journal_title:Genome biology

    pub_type: Journal Article, Review

    doi:10.1186/gb-2008-9-11-240

    authors: Zinzen RP,Furlong EE

    update_date:2008-01-01 00:00:00

  • All Your Base: a fast and accurate probabilistic approach to base calling.

    abstract::The accuracy of base calls produced by Illumina sequencers is adversely affected by several processes, with laser cross-talk and cluster phasing being prominent. We introduce an explicit statistical model of the sequencing process that generalizes current models of phasing and cross-talk and forms the basis of a base ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2012-13-2-r13

    authors: Massingham T,Goldman N

    update_date:2012-02-29 00:00:00

  • Quantitative reconstruction of leukocyte subsets using DNA methylation.

    abstract:BACKGROUND:Cell lineage-specific DNA methylation patterns distinguish normal human leukocyte subsets and can be used to detect and quantify these subsets in peripheral blood. We have developed an approach that uses DNA methylation to simultaneously quantify multiple leukocyte subsets, enabling investigation of immune m...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2014-15-3-r50

    authors: Accomando WP,Wiencke JK,Houseman EA,Nelson HH,Kelsey KT

    update_date:2014-03-05 00:00:00

  • A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database.

    abstract::mtcPTM is an online repository of human and mouse phosphosites in which data are hierarchically organized to preserve biologically relevant experimental information, thus allowing straightforward comparisons of phosphorylation patterns found under different conditions. The database also contains the largest available ...

    journal_title:Genome biology

    pub_type: Journal Article

    doi:10.1186/gb-2007-8-5-r90

    authors: Jiménez JL,Hegemann B,Hutchins JR,Peters JM,Durbin R

    update_date:2007-01-01 00:00:00