Abstract:
BACKGROUND:The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS:Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days. CONCLUSIONS:Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
journal_name
BMC Bioinformaticsjournal_title
BMC bioinformaticsauthors
Donaldson I,Martin J,de Bruijn B,Wolting C,Lay V,Tuekam B,Zhang S,Baskin B,Bader GD,Michalickova K,Pawson T,Hogue CWdoi
10.1186/1471-2105-4-11keywords:
subject
Has Abstractpub_date
2003-03-27 00:00:00pages
11issn
1471-2105journal_volume
4pub_type
杂志文章abstract:BACKGROUND:An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-017-1865-8
更新日期:2017-10-23 00:00:00
abstract:BACKGROUND:Many proteins contain conserved sequence patterns (motifs) that contribute to their functionality. The process of experimentally identifying and validating novel protein motifs can be difficult, expensive, and time consuming. A means for helping to identify in advance the possible function of a novel motif i...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-379
更新日期:2011-09-26 00:00:00
abstract:BACKGROUND:Gene expression microarray experiments are expensive to conduct and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-211
更新日期:2006-04-19 00:00:00
abstract:BACKGROUND:Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the pas...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3114-9
更新日期:2019-10-30 00:00:00
abstract:BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preli...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-S3-S3
更新日期:2011-06-09 00:00:00
abstract:BACKGROUND:Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-7-247
更新日期:2006-05-05 00:00:00
abstract:BACKGROUND:Recent studies have shown that N6-methyladenosine (m6A) plays a critical role in numbers of biological processes and complex human diseases. However, the regulatory mechanisms of most methylation sites remain uncharted. Thus, in-depth study of the epi-transcriptomic patterns of m6A may provide insights into ...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-020-03787-w
更新日期:2020-10-09 00:00:00
abstract:BACKGROUND:Molecular docking is a widely-employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Des...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-16-S6-S3
更新日期:2015-01-01 00:00:00
abstract:BACKGROUND:In quantitative proteomics, peptide mapping is a valuable approach to combine positional quantitative information with topographical and domain information of proteins. Quantitative proteomic analysis of cell surface shedding is an exemplary application area of this approach. RESULTS:We developed ImproViser...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-207
更新日期:2014-06-19 00:00:00
abstract:BACKGROUND:Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework. RESULTS:Qxpak is a package for versatile statistical genomics, specificall...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-202
更新日期:2011-05-25 00:00:00
abstract:BACKGROUND:As molecular biology is creating an increasing amount of sequence and structure data, the multitude of software to analyze this data is also rising. Most of the programs are made for a specific task, hence the user often needs to combine multiple programs in order to reach a goal. This can make the data proc...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2367-z
更新日期:2018-10-01 00:00:00
abstract:BACKGROUND:The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D pos...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-015-0807-6
更新日期:2015-11-04 00:00:00
abstract:BACKGROUND:Variation in DNA copy number, due to gains and losses of chromosome segments, is common. A first step for analyzing DNA copy number data is to identify amplified or deleted regions in individuals. To locate such regions, we propose a circular binary segmentation procedure, which is based on a sequence of nes...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-277
更新日期:2012-10-30 00:00:00
abstract:BACKGROUND:In the adaptive immune system, variable regions of immunoglobulin (IG) are encoded by random recombination of variable (V), diversity (D), and joining (J) gene segments in the germline. Partitioning the functional antibody sequences to their sourcing germline gene segments is vital not only for understanding...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-S12-S20
更新日期:2008-12-12 00:00:00
abstract:BACKGROUND:As whole-genome sequencing is becoming a routine technique, it is important to identify a cost-effective depth of sequencing for such studies. However, the relationship between sequencing depth and biological results from the aspects of whole-genome coverage, variant discovery power and the quality of varian...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-019-3164-z
更新日期:2019-11-08 00:00:00
abstract:BACKGROUND:Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted wit...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S10-S15
更新日期:2012-06-25 00:00:00
abstract:BACKGROUND:Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-132
更新日期:2005-05-31 00:00:00
abstract:BACKGROUND:In mass spectrometry (MS) based proteomic data analysis, peak detection is an essential step for subsequent analysis. Recently, there has been significant progress in the development of various peak detection algorithms. However, neither a comprehensive survey nor an experimental comparison of these algorith...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-4
更新日期:2009-01-06 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) play crucial roles in virtually every aspect of cellular function within an organism. Over the last decade, the development of novel high-throughput techniques has resulted in enormous amounts of data and provided valuable resources for studying protein interactions. Howev...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-S7-S3
更新日期:2012-05-08 00:00:00
abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-11-275
更新日期:2010-05-21 00:00:00
abstract:BACKGROUND:Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the mole...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2474-x
更新日期:2019-01-03 00:00:00
abstract:BACKGROUND:Mouse xenografts from (patient-derived) tumors (PDX) or tumor cell lines are widely used as models to study various biological and preclinical aspects of cancer. However, analyses of their RNA and DNA profiles are challenging, because they comprise reads not only from the grafted human cancer but also from t...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-018-2353-5
更新日期:2018-10-04 00:00:00
abstract:BACKGROUND:New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bio...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S14-S4
更新日期:2009-11-10 00:00:00
abstract:BACKGROUND:The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which inten...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-15-S1-S15
更新日期:2014-01-01 00:00:00
abstract:BACKGROUND:Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions. RESULTS:We present a computational pipeline for detecting novel lncRNAs fro...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-13-331
更新日期:2012-12-13 00:00:00
abstract:BACKGROUND:The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The information on localiza...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-6-291
更新日期:2005-12-07 00:00:00
abstract:BACKGROUND:The importance of biodiversity conservation has been increasing steadily due to its benefits to human beings. Recently, producing and managing biodiversity databases have become much easier because of the information technology (IT) advancement. This made the general public's participation in biodiversity co...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-10-S15-S11
更新日期:2009-12-03 00:00:00
abstract:BACKGROUND:Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting m...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-12-44
更新日期:2011-02-02 00:00:00
abstract:BACKGROUND:Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional informa...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/1471-2105-9-534
更新日期:2008-12-16 00:00:00
abstract:BACKGROUND:Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing inf...
journal_title:BMC bioinformatics
pub_type: 杂志文章
doi:10.1186/s12859-014-0418-7
更新日期:2015-01-16 00:00:00