Abstract:
BACKGROUND: We present a model for tagging gene and protein mentions from text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields model the probability P(t|o) of a tag sequence given an observation sequence directly, and have previously been employed successfully for other tagging tasks. The mechanics of CRFs and their relationship to maximum entropy are discussed in detail. RESULTS: We employ a diverse feature set containing standard orthographic features combined with expert features in the form of gene and biological term lexicons to achieve a precision of 86.4% and recall of 78.7%. An analysis of the contribution of the various features of the model is provided.
journal_name:BMC Bioinformatics
journal_title:BMC bioinformatics
authors:McDonald R, Pereira F
doi:10.1186/1471-2105-6-S1-S6
keywords:
subject:Has Abstract
pub_date:2005-01-01 00:00:00
pages:S6
issn:1471-2105
pii:1471-2105-6-S1-S6
journal_volume:6 Suppl 1
pub_type: Journal Article
abstract:BACKGROUND:Zebrafish is a widely used model organism for studying heart development and cardiac-related pathogenesis. With the ability of surviving without a functional circulation at larval stages, strong genetic similarity between zebrafish and mammals, prolific reproduction and optically transparent embryos, zebrafi...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2166-6
update_date:2018-05-09 00:00:00
abstract:BACKGROUND:Understanding the relationship between the protein sequence and the 3D structure is a major research area in bioinformatics. The prediction of complete protein tertiary structure based only on sequence information is still an impractical work. This paper aims at revealing the hidden knowledge of the sequence...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S11-S15
update_date:2009-10-08 00:00:00
abstract:BACKGROUND:Single amino acid repeats make up a significant proportion in all of the proteomes that have currently been determined. They have been shown to be functionally and medically significant, and are associated with cancers and neuro-degenerative diseases such as Huntington's Chorea, where a poly-glutamine repeat...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-6-196
update_date:2005-08-03 00:00:00
abstract:BACKGROUND AND GOAL:The Random Forest (RF) algorithm for regression and classification has considerably gained popularity since its introduction in 2001. Meanwhile, it has grown to a standard classification approach competing with logistic regression in many innovation-friendly scientific fields. RESULTS:In this conte...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2264-5
update_date:2018-07-17 00:00:00
abstract:BACKGROUND:One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer diseases are described to reflect clinical characteristics like staging and pro...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-488
update_date:2011-12-21 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function and many essential cellular processes are related to that. Most proteins perform their functions by interacting with other proteins, so predicting PPIs acc...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03896-6
update_date:2020-12-16 00:00:00
abstract:BACKGROUND:Current taxonomic classification tools use exact string matching algorithms that are effective to tackle the data from the next generation sequencing technology. However, the unique error patterns in the third generation sequencing (TGS) technologies could reduce the accuracy of these programs. RESULTS:We d...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-03777-y
update_date:2020-10-20 00:00:00
abstract:BACKGROUND:The importance of biodiversity conservation has been increasing steadily due to its benefits to human beings. Recently, producing and managing biodiversity databases have become much easier because of the information technology (IT) advancement. This made the general public's participation in biodiversity co...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-10-S15-S11
update_date:2009-12-03 00:00:00
abstract:BACKGROUND:With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-014-0369-z
update_date:2014-12-14 00:00:00
abstract:BACKGROUND:Light microscopy is of central importance in cell biology. The recent introduction of automated high content screening has expanded this technology towards automation of experiments and performing large scale perturbation assays. Nevertheless, evaluation of microscopy data continues to be a bottleneck in man...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-30
update_date:2010-01-14 00:00:00
abstract:BACKGROUND:The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologist...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-264
update_date:2008-06-05 00:00:00
abstract:BACKGROUND:In the biomedical domain, the desired information of a question (query) asked by biologists usually is a list of a certain type of entities covering different aspects that are related to the question, such as genes, proteins, diseases, mutations, etc. Hence it is important for a biomedical information retrie...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-S5-S8
update_date:2011-01-01 00:00:00
abstract:BACKGROUND:Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles for enlightening complex biological processes responsible for serious diseases, with a great scientific impact and a wide application area. Several standalone applications had been developed in order t...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-9-480
update_date:2008-11-13 00:00:00
abstract:BACKGROUND:The identification of protein coding regions (exons) in DNA sequences using signal processing techniques is an important component of bioinformatics and biological signal processing. In this paper, a new method is presented for the identification of exonic regions in DNA sequences. This method is based on th...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-430
update_date:2011-11-03 00:00:00
abstract:BACKGROUND:Annotations of the phylogenetic tree of the human kinome is an intuitive way to visualize compound profiling data, structural features of kinases or functional relationships within this important class of proteins. The increasing volume and complexity of kinase-related data underlines the need for a tool tha...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1433-7
update_date:2017-01-05 00:00:00
abstract:BACKGROUND:The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-303
update_date:2011-07-26 00:00:00
abstract:BACKGROUND:Schizophrenia, bipolar disorder, and major depression are devastating mental diseases, each with distinctive yet overlapping epidemiologic characteristics. Microarray and proteomics data have revealed genes which expressed abnormally in patients. Several single nucleotide polymorphisms (SNPs) and mutations a...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-S13-S20
update_date:2011-01-01 00:00:00
abstract:BACKGROUND:Microarray technology provides the expression level of many genes. Nowadays, an important issue is to select a small number of informative differentially expressed genes that provide biological knowledge and may be key elements for a disease. With the increasing volume of data generated by modern biomedical ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-020-3463-4
update_date:2020-04-07 00:00:00
abstract:BACKGROUND:Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations fo...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-018-2073-x
update_date:2018-05-08 00:00:00
abstract:BACKGROUND:With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceabilit...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-8-81
update_date:2007-03-07 00:00:00
abstract:BACKGROUND:The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-1359-0
update_date:2016-11-25 00:00:00
abstract:BACKGROUND:Tiling-arrays are applicable to multiple types of biological research questions. Due to its advantages (high sensitivity, resolution, unbiased), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., conti...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-11-275
update_date:2010-05-21 00:00:00
abstract:DNA methylation exhibits different patterns in different cancers. DNA methylation rates at different genomic loci appear to be highly correlated in some samples but not in others. We call such phenomena conditional concordant relationships (CCRs). In this study, we explored DNA methylation patterns in 12 common cancer...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-S13-S7
update_date:2012-01-01 00:00:00
abstract:BACKGROUND:Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. RESULTS:On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-014-0383-1
update_date:2014-12-10 00:00:00
abstract:BACKGROUND:RNA-Sequencing (RNA-seq) experiments have been popularly applied to transcriptome studies in recent years. Such experiments are still relatively costly. As a result, RNA-seq experiments often employ a small number of replicates. Power analysis and sample size calculation are challenging in the context of dif...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/s12859-016-0994-9
update_date:2016-03-31 00:00:00
abstract:BACKGROUND:Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations acco...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-466
update_date:2011-12-07 00:00:00
abstract:BACKGROUND:One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to th...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-7-410
update_date:2006-09-14 00:00:00
abstract:BACKGROUND:The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially ...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-13-123
update_date:2012-06-07 00:00:00
abstract:BACKGROUND:An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-5-140
update_date:2004-09-30 00:00:00
abstract:BACKGROUND:Gene expression experiments are common in molecular biology, for example in order to identify genes which play a certain role in a specified biological framework. For that purpose expression levels of several thousand genes are measured simultaneously using DNA microarrays. Comparing two distinct groups of t...
journal_title:BMC bioinformatics
pub_type: Journal Article
doi:10.1186/1471-2105-12-288
update_date:2011-07-15 00:00:00