Abstract:
BACKGROUND:Text mining tools have gained popularity to process the vast amount of available research articles in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real life scenarios. Studies of mining non-causal molecular relations attribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to enhance integration of text mining results with database facts. RESULTS:We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving respectively high-precision and high-recall results. Finally, we highlight extremely high-performance results (F-score > 90%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts. CONCLUSIONS:The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Van Landeghem S,Björne J,Abeel T,De Baets B,Salakoski T,Van de Peer Ydoi
10.1186/1471-2105-13-S11-S6subject
Has Abstractpub_date
2012-06-26 00:00:00pages
S6issn
1471-2105pii
1471-2105-13-S11-S6journal_volume
13 Suppl 11pub_type
杂志文章abstract:BACKGROUND:The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-cou...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-254
更新日期:2006-05-15 00:00:00
abstract:BACKGROUND:Systems biology modeling from microarray data requires the most contemporary structural and functional array annotation. However, microarray annotations, especially for non-commercial, non-traditional biomedical model organisms, are often dated. In addition, most microarray analysis tools do not readily acce...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-30
更新日期:2009-01-23 00:00:00
abstract:BACKGROUND:Homology is a crucial concept in comparative genomics. The algorithm probably most widely used for homology detection in comparative genomics, is BLAST. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false positive hits. As a consequence, some BLAST hits are discar...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-86
更新日期:2010-02-12 00:00:00
abstract:BACKGROUND:Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the pas...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3114-9
更新日期:2019-10-30 00:00:00
abstract:BACKGROUND:RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more rece...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-260
更新日期:2012-10-09 00:00:00
abstract:BACKGROUND:Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-537
更新日期:2010-10-29 00:00:00
abstract:BACKGROUND:Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structu...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1616-x
更新日期:2017-04-11 00:00:00
abstract:BACKGROUND:In current comparative proteomics studies, the large number of images generated by 2D gels is currently compared using spot matching algorithms. Unfortunately, differences in gel migration and sample variability make efficient spot alignment very difficult to obtain, and, as consequence most of the software ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-460
更新日期:2008-10-28 00:00:00
abstract:BACKGROUND:InterPro is a collection of protein signatures for the classification and automated annotation of proteins. Interproscan is a software tool that scans protein sequences against Interpro member databases using a variety of profile-based, hidden markov model and positional specific score matrix methods. It not...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-S12-S13
更新日期:2010-12-21 00:00:00
abstract:BACKGROUND:Nearly all cellular processes involve proteins structurally rearranging to accommodate molecular partners. The energy landscape underscores the inherent nature of proteins as dynamic molecules interconverting between structures with varying energies. In principle, reconstructing a protein's energy landscape ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-2822-5
更新日期:2019-06-06 00:00:00
abstract:BACKGROUND:T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for diagnosis and treatment of pathogens and cancer, as well...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-3-25
更新日期:2002-09-11 00:00:00
abstract:BACKGROUND:Network co-regulated modules are believed to have the functionality of packaging multiple biological entities, and can thus be assumed to coordinate many biological functions in their network neighbouring regions. RESULTS:Here, we weighted edges of a human protein interaction network and a transcriptional r...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-392
更新日期:2010-07-22 00:00:00
abstract:BACKGROUND:Fluorescent and luminescent gene reporters allow us to dynamically quantify changes in molecular species concentration over time on the single cell level. The mathematical modeling of their interaction through multivariate dynamical models requires the development of effective statistical methods to calibrat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-343
更新日期:2009-10-19 00:00:00
abstract:BACKGROUND:Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may provide insight into determining functionally related or co-regulated sets of genes, suggest annotations of previously un-annotated genes, and help to identify horizontal gene transfer events across microbial speci...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-18
更新日期:2013-01-16 00:00:00
abstract:BACKGROUND:Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging vari...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-248
更新日期:2014-07-21 00:00:00
abstract:BACKGROUND:With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the followin...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-46
更新日期:2009-02-03 00:00:00
abstract:BACKGROUND:Inflammation is a core element of many different, systemic and chronic diseases that usually involve an important autoimmune component. The clinical phase of inflammatory diseases is often the culmination of a long series of pathologic events that started years before. The systemic characteristics and relate...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2413-x
更新日期:2018-11-30 00:00:00
abstract:BACKGROUND:Molecular biology (MB) is a dynamic research domain that benefits greatly from the use of modern software technology in preparing experiments, analyzing acquired data, and even performing "in-silico" analyses. As ever new findings change the face of this domain, software for MB has to be sufficiently flexibl...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-97
更新日期:2009-03-27 00:00:00
abstract:BACKGROUND:The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1359-0
更新日期:2016-11-25 00:00:00
abstract:BACKGROUND:MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes ba...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-261
更新日期:2008-06-04 00:00:00
abstract:BACKGROUND:In metabolomics experiments, spectral fingerprints of metabolites with no known structural identity are detected routinely. Computer-assisted structure elucidation (CASE) has been used to determine the structural identities of unknown compounds. It is generally accepted that a single 1D NMR spectrum or mass ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-234
更新日期:2014-07-05 00:00:00
abstract:BACKGROUND:Leishmania and other members of the Trypanosomatidae family diverged early on in eukaryotic evolution and consequently display unique cellular properties. Their apparent lack of transcriptional regulation is compensated by complex post-transcriptional control mechanisms, including the processing of polycistr...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-158
更新日期:2008-03-20 00:00:00
abstract:BACKGROUND:Phosphorylated histone H2AX, also known as γH2AX, forms μm-sized nuclear foci at the sites of DNA double-strand breaks (DSBs) induced by ionizing radiation and other agents. Due to their specificity and sensitivity, γH2AX immunoassays have become the gold standard for studying DSB induction and repair. One o...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-3370-8
更新日期:2020-01-28 00:00:00
abstract:BACKGROUND:In many research areas it is necessary to find differences between treatment groups with several variables. For example, studies of microarray data seek to find a significant difference in location parameters from zero or one for ratios thereof for each variable. However, in some studies a significant deviat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-54
更新日期:2008-01-27 00:00:00
abstract:BACKGROUND:Protein function in eukaryotic cells is often controlled in a cell cycle-dependent manner. Therefore, the correct assignment of cellular phenotypes to cell cycle phases is a crucial task in cell biology research. Nuclear proteins whose localization varies during the cell cycle are valuable and frequently use...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0618-9
更新日期:2015-05-29 00:00:00
abstract:BACKGROUND:The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotat...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-016-1223-2
更新日期:2016-10-06 00:00:00
abstract:BACKGROUND:Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. RESULTS:Our primary contribution is a local sliding-...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S9-S18
更新日期:2011-10-05 00:00:00
abstract:BACKGROUND:Many of solved tertiary structures of unknown functions do not have global sequence and structural similarities to proteins of known function. Often functional clues of unknown proteins can be obtained by predicting small ligand molecules that bind to the proteins. METHODS:In our previous work, we have deve...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S2-S7
更新日期:2012-03-13 00:00:00
abstract:BACKGROUND:Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. RESULTS:A non-parametric Bayesian ap...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0368-0
更新日期:2014-11-21 00:00:00
abstract:BACKGROUND:Mammalian germ cells undergo meiosis to produce sperm or eggs, haploid cells that are primed to meet and propagate life. Meiosis is initiated by retinoic acid and meiotic prophase is the first and most complex stage of meiosis when homologous chromosomes pair to exchange genetic information. Errors in meiosi...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-14-72
更新日期:2013-02-27 00:00:00