Abstract:
BACKGROUND: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion of interaction data. Because manual curation is costly in both time and money, there is an urgent need for text-mining tools that facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.
RESULTS: During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering out articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing them to molecule identifiers, our method was competitive among the submitted runs, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these results are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to the system failing to extract interactions correctly.
CONCLUSION: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
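As a quick arithmetic check on the figures reported in the abstract, the minimal Python sketch below computes F1 as the harmonic mean of precision and recall, and the share of false negatives attributed to protein normalization. The function names are illustrative and do not come from the paper. The harmonic mean reproduces the reported 28.5% for the normalization task. The F1 values reported for interaction pairs are lower than the harmonic mean of the reported precision and recall; one possible explanation (an assumption, not stated in the abstract) is that those scores were averaged per article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def share(part: int, total: int) -> float:
    """Fraction of `total` accounted for by `part`."""
    return part / total


# Protein normalization task: precision 34.83%, recall 24.10% (from the abstract).
norm_f1 = f1_score(34.83, 24.10)

# Error analysis: 532 of 698 false negatives were due to normalization problems.
fn_share = share(532, 698)

print(f"normalization F1: {norm_f1:.1f}%")
print(f"false negatives from normalization: {fn_share:.1%}")
```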
journal_name:Genome Biol
journal_title:Genome biology
authors:Huang M, Ding S, Wang H, Zhu X
doi:10.1186/gb-2008-9-s2-s12
subject:Has Abstract
pub_date:2008-01-01 00:00:00
pages:S12
eissn:1474-7596
issn:1474-760X
pii:gb-2008-9-s2-s12
journal_volume:9 Suppl 2
pub_type: Journal article

Related literature
GENOME BIOLOGY literature collection
abstract:BACKGROUND:The discovery of cytosine hydroxymethylation (5hmC) as a mechanism that potentially controls DNA methylation changes typical of neoplasia prompted us to investigate its behaviour in colon cancer. 5hmC is globally reduced in proliferating cells such as colon tumours and the gut crypt progenitors, from which t...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-015-0605-5
update_date:2015-04-01 00:00:00
abstract:Transposable elements (TEs) are notable drivers of genetic innovation. Over evolutionary time, TE insertions can supply new promoter, enhancer, and insulator elements to protein-coding genes and establish novel, species-specific gene regulatory networks. Conversely, ongoing TE-driven insertional mutagenesis, nonhomolo...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/s13059-016-0965-5
update_date:2016-05-09 00:00:00
abstract:CRISPR-Cas9 gene editing has transformed our ability to rapidly interrogate the functional impact of somatic mutations in human cancers. Droplet-based technology enables the analysis of Cas9-introduced gene edits in thousands of single cells. Using this technology, we analyze Ba/F3 cells engineered to express single o...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-020-02174-1
update_date:2020-10-20 00:00:00
abstract:BACKGROUND:The nature of the protein molecular clock, the protein-specific rate of amino acid substitutions, is among the central questions of molecular evolution. Protein expression level is the dominant determinant of the clock rate in a number of organisms. It has been suggested that highly expressed proteins evolve...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2010-11-9-r98
update_date:2010-01-01 00:00:00
abstract:BACKGROUND:In eukaryotic cells, oxidative phosphorylation (OXPHOS) uses the products of both nuclear and mitochondrial genes to generate cellular ATP. Interspecies comparative analysis of these genes, which appear to be under strong functional constraints, may shed light on the evolutionary mechanisms that act on a set...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2005-6-2-r11
update_date:2005-01-01 00:00:00
abstract:The first genome sequence of the extinct European wild aurochs reveals the genetic foundation of native British and Irish landraces of cattle. See related Research article: www.dx.doi.org/10.1186/s13059-015-0790-2. ...
journal_title:Genome biology
pub_type: Comment, Journal article
doi:10.1186/s13059-015-0793-z
update_date:2015-10-26 00:00:00
abstract:BACKGROUND:The cellular mechanisms that underlie metal toxicity and detoxification are rather variegated and incompletely understood. Genomic phenotyping was used to assess the roles played by all nonessential Saccharomyces cerevisiae proteins in modulating cell viability after exposure to cadmium, nickel, and other me...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2008-9-4-r67
update_date:2008-04-07 00:00:00
abstract:Understanding the functional impact of genomic variants is a major goal of modern genetics and personalized medicine. Although many synonymous and non-coding variants act through altering the efficiency of pre-mRNA splicing, it is difficult to predict how these variants impact pre-mRNA splicing. Here, we describe a ma...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-018-1437-x
update_date:2018-06-01 00:00:00
abstract:BACKGROUND:Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers and cancer-t...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2012-13-8-r75
update_date:2012-08-28 00:00:00
abstract:A report of the First Golden Helix Symposium 'Copy Number Variation (CNV) and Genomic Alterations in Health and Disease', Athens, Greece, 28-29 November 2008. ...
journal_title:Genome biology
pub_type:
doi:10.1186/gb-2009-10-1-301
update_date:2009-01-01 00:00:00
abstract:Through understanding the intricacies of host-pathogen interactions, it is now possible to inhibit the growth of microbes, especially viruses, by targeting host-cell proteins and functions. This new antimicrobial strategy has proved effective in the laboratory and in the clinic, and it has great potential for the futu...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2006-7-1-201
update_date:2006-01-01 00:00:00
abstract:BACKGROUND:Candida glabrata is a pathogenic yeast of increasing medical concern. It has been regarded as asexual since it was first described in 1917, yet phylogenetic analyses have revealed that it is more closely related to sexual yeasts than other Candida species. We show here that the C. glabrata genome contains ma...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2003-4-2-r10
update_date:2003-01-01 00:00:00
abstract:BACKGROUND:Gene dosage change is a mild perturbation that is a valuable tool for pathway reconstruction in Drosophila. While it is often assumed that reducing gene dose by half leads to two-fold less expression, there is partial autosomal dosage compensation in Drosophila, which may be mediated by feedback or buffering...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2012-13-4-r28
update_date:2012-04-24 00:00:00
abstract:Epithelial tissues house gammadelta T cells, which are important for the mucosal immune system and may be involved in controlling malignancies, infections and inflammation. Whole-genome gene-expression analysis provides a new way to study the signals required for the activation of gammadelta T cells, their mode of act...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2001-2-11-reviews1031
update_date:2001-01-01 00:00:00
abstract:BACKGROUND:Obligate pathogenic bacteria lose more genes relative to facultative pathogens, which, in turn, lose more genes than free-living bacteria. It was suggested that the increased gene loss in obligate pathogens may be due to a reduction in the effectiveness of purifying selection. Less attention has been given t...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-8-r164
update_date:2007-01-01 00:00:00
abstract:Ever since DNA microarrays were first applied to the quantitation of RNA levels, there has been considerable interest in generating a protein homolog that can be used to assay cellular protein expression. A recent paper describes the first microarray that can be used for such protein profiling. ...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2001-2-2-reviews1004
update_date:2001-01-01 00:00:00
abstract:Cryptochromes are photoreceptors that regulate entrainment by light of the circadian clock in plants and animals. They also act as integral parts of the central circadian oscillator in animal brains and as receptors controlling photomorphogenesis in response to blue or ultraviolet (UV-A) light in plants. Cryptochromes...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2005-6-5-220
update_date:2005-01-01 00:00:00
abstract:BACKGROUND:Concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows their identification in complete genomes. Functional specificity of proteins is assumed to be conserved among orthologs and is different among paralogs. We used this assumption to identify residues which de...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2002-3-3-preprint0002
update_date:2002-01-01 00:00:00
abstract:Identification of causal mutations in barley and wheat is hampered by their large genomes and suppressed recombination. To overcome these obstacles, we have developed MutChromSeq, a complexity reduction approach based on flow sorting and sequencing of mutant chromosomes, to identify induced mutations by comparison to ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-016-1082-1
update_date:2016-10-31 00:00:00
abstract:BACKGROUND:In eukaryotic organisms, gene expression is regulated at multiple levels during the processes of transcription and translation. The absence of a tight regulatory network for transcription in the human malaria parasite suggests that gene expression may largely be controlled at post-transcriptional and transla...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2013-14-11-r128
update_date:2013-11-22 00:00:00
abstract:BACKGROUND:Dictyostelium discoideum is a eukaryote with a simple lifestyle and a relatively small genome whose sequence has been fully determined. It is widely used for studies on cell signaling, movement and multicellular development. Ras guanine-nucleotide exchange factors (RasGEFs) are the proteins that activate Ras...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2005-6-8-r68
update_date:2005-01-01 00:00:00
abstract:How do naturally occurring polymorphisms in DNA sequence relate to variation in gene expression? Recent work to map genetic sources of expression variation has shown a surprising balance between cis and trans effects. Other work suggests some chromosomal clustering of genes by expression pattern. A synthesis of approa...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2002-3-10-reviews1029
update_date:2002-09-19 00:00:00
abstract:A new approach to rapid, genome-wide identification and ranking of horizontal transfer candidate proteins is presented. The method is quantitative, reproducible, and computationally undemanding. It can be combined with genomic signature and/or phylogenetic tree-building procedures to improve accuracy and efficiency. T...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-2-r16
update_date:2007-01-01 00:00:00
abstract:BACKGROUND:Estrogens and their receptors are important in human development, physiology and disease. In this study, we utilized an integrated genome-wide molecular and computational approach to characterize the interaction between the activated estrogen receptor (ER) and the regulatory elements of candidate target gene...
journal_title:Genome biology
pub_type: Journal article, Meta-analysis
doi:10.1186/gb-2004-5-9-r66
update_date:2004-01-01 00:00:00
abstract:BACKGROUND:The human lung tissue microbiota remains largely uncharacterized, although a number of studies based on airway samples suggest the existence of a viable human lung microbiota. Here we characterized the taxonomic and derived functional profiles of lung microbiota in 165 non-malignant lung tissue samples from ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/s13059-016-1021-1
update_date:2016-07-28 00:00:00
abstract:Short hairpin RNAs can provide stable gene silencing via RNA interference. Recent studies have shown toxicity in vivo that appears to be related to saturation of the endogenous microRNA pathway. Will these findings limit the therapeutic use of such hairpins? ...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2006-7-8-231
update_date:2006-01-01 00:00:00
abstract:Many essential transcription factors have conserved roles in regulating biological programs, yet their genomic occupancy can diverge significantly. A new study demonstrates that such variations are primarily due to cis-regulatory sequences, rather than differences between the regulators or nuclear environments. ...
journal_title:Genome biology
pub_type: Journal article, Review
doi:10.1186/gb-2008-9-11-240
update_date:2008-01-01 00:00:00
abstract:The accuracy of base calls produced by Illumina sequencers is adversely affected by several processes, with laser cross-talk and cluster phasing being prominent. We introduce an explicit statistical model of the sequencing process that generalizes current models of phasing and cross-talk and forms the basis of a base ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2012-13-2-r13
update_date:2012-02-29 00:00:00
abstract:BACKGROUND:Cell lineage-specific DNA methylation patterns distinguish normal human leukocyte subsets and can be used to detect and quantify these subsets in peripheral blood. We have developed an approach that uses DNA methylation to simultaneously quantify multiple leukocyte subsets, enabling investigation of immune m...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2014-15-3-r50
update_date:2014-03-05 00:00:00
abstract:mtcPTM is an online repository of human and mouse phosphosites in which data are hierarchically organized to preserve biologically relevant experimental information, thus allowing straightforward comparisons of phosphorylation patterns found under different conditions. The database also contains the largest available ...
journal_title:Genome biology
pub_type: Journal article
doi:10.1186/gb-2007-8-5-r90
update_date:2007-01-01 00:00:00