A context-blocks model for identifying clinical relationships in patient records.

Abstract:

BACKGROUND:Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preliminary steps for many important applications in medical informatics, ranging from quality of care to hypothesis generation. METHODS:In this work we describe an approach that facilitates the automatic recognition of eight relationships defined between medical problems, treatments and tests. Unlike the traditional bag-of-words representation, in this work, we represent a relationship with a scheme of five distinct context-blocks determined by the position of concepts in the text. As a preliminary step to relationship recognition, and in order to provide an end-to-end system, we also addressed the automatic extraction of medical problems, treatments and tests. Our approach combined the outcome of a statistical model for concept recognition and simple natural language processing features in a conditional random fields model. A set of 826 patient records from the 4th i2b2 challenge was used for training and evaluating the system. RESULTS:Results show that our concept recognition system achieved an F-measure of 0.870 for exact span concept detection. Moreover the context-block representation of relationships was more successful (F-Measure = 0.775) at identifying relationships than bag-of-words (F-Measure = 0.402). Most importantly, the performance of the end-to-end system of relationship extraction using automatically extracted concepts (F-Measure = 0.704) was comparable to that obtained using manually annotated concepts (F-Measure = 0.711), and their difference was not statistically significant. CONCLUSIONS:We extracted important clinical relationships from text in an automated manner, starting with concept recognition, and ending with relationship identification. The advantage of the context-blocks representation scheme was the correct management of word position information, which may be critical in identifying certain relationships. Our results may serve as benchmark for comparison to other systems developed on i2b2 challenge data. Finally, our system may serve as a preliminary step for other discovery tasks in medical informatics.

journal_name

BMC Bioinformatics

journal_title

BMC bioinformatics

authors

Islamaj Doğan R,Névéol A,Lu Z

doi

10.1186/1471-2105-12-S3-S3

subject

Has Abstract

pub_date

2011-06-09 00:00:00

pages

S3

issn

1471-2105

pii

1471-2105-12-S3-S3

journal_volume

12 Suppl 3

pub_type

杂志文章
  • ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    abstract:BACKGROUND:Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. RESULTS:A Java program was developed for retrieval of protein and nucleic acid s...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-3-40

    authors: Büssow K,Hoffmann S,Sievert V

    更新日期:2002-12-19 00:00:00

  • Non-coding RNA detection methods combined to improve usability, reproducibility and precision.

    abstract:BACKGROUND:Non-coding RNAs gain more attention as their diverse roles in many cellular processes are discovered. At the same time, the need for efficient computational prediction of ncRNAs increases with the pace of sequencing technology. Existing tools are based on various approaches and techniques, but none of them p...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-491

    authors: Raasch P,Schmitz U,Patenge N,Vera J,Kreikemeyer B,Wolkenhauer O

    更新日期:2010-09-29 00:00:00

  • HMM Logos for visualization of protein families.

    abstract:BACKGROUND:Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research. Up to now, however, there exists no method to visualize all of their central aspects graphically in an intuitively understandable way. RESULTS:We present a visualization method that incorporates both emission and transi...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-7

    authors: Schuster-Böckler B,Schultz J,Rahmann S

    更新日期:2004-01-21 00:00:00

  • eL-DASionator: an LDAS upload file generator.

    abstract:BACKGROUND:The Distributed Annotation System (DAS) allows merging of DNA sequence annotations from multiple sources and provides a single annotation view. A straightforward way to establish a DAS annotation server is to use the "Lightweight DAS" server (LDAS). Onto this type of server, annotations can be uploaded as fl...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-55

    authors: Negre V,Grunau C

    更新日期:2004-05-07 00:00:00

  • Temporal dynamics of protein complexes in PPI networks: a case study using yeast cell cycle dynamics.

    abstract::Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. With the advent of high-throughput techniques, significant amount of protein interaction (PPI) data has been catalogued for organisms such as yeast, which has in t...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S17-S16

    authors: Srihari S,Leong HW

    更新日期:2012-01-01 00:00:00

  • Pushing the accuracy limit of shape complementarity for protein-protein docking.

    abstract:BACKGROUND:Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open que...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3270-y

    authors: Yan Y,Huang SY

    更新日期:2019-12-24 00:00:00

  • Mining published lists of cancer related microarray experiments: identification of a gene expression signature having a critical role in cell-cycle control.

    abstract:BACKGROUND:Routine application of gene expression microarray technology is rapidly producing large amounts of data that necessitate new approaches of analysis. The analysis of a specific microarray experiment profits enormously from cross-comparing to other experiments. This process is generally performed by numerical ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-S4-S14

    authors: Finocchiaro G,Mancuso F,Muller H

    更新日期:2005-12-01 00:00:00

  • The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays.

    abstract:BACKGROUND:The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases still increases, the development of new generations of microarrays covering new content is mandatory. To better understand the potential challeng...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-186

    authors: Eggle D,Debey-Pascher S,Beyer M,Schultze JL

    更新日期:2009-06-18 00:00:00

  • NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly.

    abstract:BACKGROUND:Deep shotgun sequencing on next generation sequencing (NGS) platforms has contributed significant amounts of data to enrich our understanding of genomes, transcriptomes, amplified single-cell genomes, and metagenomes. However, deep coverage variations in short-read data sets and high sequencing error rates o...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-014-0357-3

    authors: McCorrison JM,Venepally P,Singh I,Fouts DE,Lasken RS,Methé BA

    更新日期:2014-11-19 00:00:00

  • scDC: single cell differential composition analysis.

    abstract:BACKGROUND:Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell-types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite de...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3211-9

    authors: Cao Y,Lin Y,Ormerod JT,Yang P,Yang JYH,Lo KK

    更新日期:2019-12-24 00:00:00

  • SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.

    abstract:BACKGROUND:Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. R...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1841-3

    authors: Beccati A,Gerken J,Quast C,Yilmaz P,Glöckner FO

    更新日期:2017-09-30 00:00:00

  • Reporting and connecting cell type names and gating definitions through ontologies.

    abstract:BACKGROUND:Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-2725-5

    authors: Overton JA,Vita R,Dunn P,Burel JG,Bukhari SAC,Cheung KH,Kleinstein SH,Diehl AD,Peters B

    更新日期:2019-04-25 00:00:00

  • Cyclic nucleotide binding proteins in the Arabidopsis thaliana and Oryza sativa genomes.

    abstract:BACKGROUND:Cyclic nucleotides are ubiquitous intracellular messengers. Until recently, the roles of cyclic nucleotides in plant cells have proven difficult to uncover. With an understanding of the protein domains which can bind cyclic nucleotides (CNB and GAF domains) we scanned the completed genomes of the higher plan...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-6

    authors: Bridges D,Fraser ME,Moorhead GB

    更新日期:2005-01-11 00:00:00

  • Cascaded classifiers for confidence-based chemical named entity recognition.

    abstract:BACKGROUND:Chemical named entities represent an important facet of biomedical text. RESULTS:We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjusta...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-9-S11-S4

    authors: Corbett P,Copestake A

    更新日期:2008-11-19 00:00:00

  • Correlation analysis reveals the emergence of coherence in the gene expression dynamics following system perturbation.

    abstract::Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time se...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S1-S16

    authors: Neretti N,Remondini D,Tatar M,Sedivy JM,Pierini M,Mazzatti D,Powell J,Franceschi C,Castellani GC

    更新日期:2007-03-08 00:00:00

  • Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets.

    abstract:BACKGROUND:Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology all...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-7-241

    authors: Aubry M,Monnier A,Chicault C,de Tayrac M,Galibert MD,Burgun A,Mosser J

    更新日期:2006-05-04 00:00:00

  • An improved classification of G-protein-coupled receptors using sequence-derived features.

    abstract:BACKGROUND:G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3 D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and charact...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-11-420

    authors: Peng ZL,Yang JY,Chen X

    更新日期:2010-08-09 00:00:00

  • ChemEx: information extraction system for chemical data curation.

    abstract:BACKGROUND:Manual chemical data curation from publications is error-prone, time consuming, and hard to maintain up-to-date data sets. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures usually described in images, information extraction needs to combine structure ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-13-S17-S9

    authors: Tharatipyakul A,Numnark S,Wichadakul D,Ingsriswang S

    更新日期:2012-01-01 00:00:00

  • Simultaneous phylogeny reconstruction and multiple sequence alignment.

    abstract:BACKGROUND:A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S1-S11

    authors: Yue F,Shi J,Tang J

    更新日期:2009-01-30 00:00:00

  • Current approaches to gene regulatory network modelling.

    abstract::Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we will describe some examples for each of these ca...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-S6-S9

    authors: Schlitt T,Brazma A

    更新日期:2007-09-27 00:00:00

  • A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases.

    abstract:BACKGROUND:The PathoLogic program constructs Pathway/Genome databases by using a genome's annotation to predict the set of metabolic pathways present in an organism. PathoLogic determines the set of reactions composing those pathways from the enzymes annotated in the organism's genome. Most annotation efforts fail to a...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-5-76

    authors: Green ML,Karp PD

    更新日期:2004-06-09 00:00:00

  • mRNA:guanine-N7 cap methyltransferases: identification of novel members of the family, evolutionary analysis, homology modeling, and analysis of sequence-structure-function relationships.

    abstract:BACKGROUND:The 5'-terminal cap structure plays an important role in many aspects of mRNA metabolism. Capping enzymes encoded by viruses and pathogenic fungi are attractive targets for specific inhibitors. There is a large body of experimental data on viral and cellular methyltransferases (MTases) that carry out guanine...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-2-2

    authors: Bujnicki JM,Feder M,Radlinska M,Rychlewski L

    更新日期:2001-01-01 00:00:00

  • Evidence for intron length conservation in a set of mammalian genes associated with embryonic development.

    abstract:BACKGROUND:We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples ...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-12-S9-S16

    authors: Seoighe C,Korir PK

    更新日期:2011-10-05 00:00:00

  • Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data.

    abstract:BACKGROUND:The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, w...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-020-3427-8

    authors: Smith AM,Walsh JR,Long J,Davis CB,Henstock P,Hodge MR,Maciejewski M,Mu XJ,Ra S,Zhao S,Ziemek D,Fisher CK

    更新日期:2020-03-20 00:00:00

  • Ontology driven integration platform for clinical and translational research.

    abstract::Semantic Web technologies offer a promising framework for integration of disparate biomedical data. In this paper we present the semantic information integration platform under development at the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston (UTHSC-H)...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-10-S2-S2

    authors: Mirhaji P,Zhu M,Vagnoni M,Bernstam EV,Zhang J,Smith JW

    更新日期:2009-02-05 00:00:00

  • Scoredist: a simple and robust protein sequence distance estimator.

    abstract:BACKGROUND:Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-6-108

    authors: Sonnhammer EL,Hollich V

    更新日期:2005-04-27 00:00:00

  • Inclusion of the fitness sharing technique in an evolutionary algorithm to analyze the fitness landscape of the genetic code adaptability.

    abstract:BACKGROUND:The canonical code, although prevailing in complex genomes, is not universal. It was shown the canonical genetic code superior robustness compared to random codes, but it is not clearly determined how it evolved towards its current form. The error minimization theory considers the minimization of point mutat...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-017-1608-x

    authors: Santos J,Monteagudo Á

    更新日期:2017-03-27 00:00:00

  • CellProfiler Tracer: exploring and validating high-throughput, time-lapse microscopy image data.

    abstract:BACKGROUND:Time-lapse analysis of cellular images is an important and growing need in biology. Algorithms for cell tracking are widely available; what researchers have been missing is a single open-source software package to visualize standard tracking output (from software like CellProfiler) in a way that allows conve...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-015-0759-x

    authors: Bray MA,Carpenter AE

    更新日期:2015-11-04 00:00:00

  • SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data.

    abstract:BACKGROUND:Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero value...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/s12859-019-3067-z

    authors: Li Y,Fan TWM,Lane AN,Kang WY,Arnold SM,Stromberg AJ,Wang C,Chen L

    更新日期:2019-10-17 00:00:00

  • CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks.

    abstract:BACKGROUND:Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the...

    journal_title:BMC bioinformatics

    pub_type: 杂志文章

    doi:10.1186/1471-2105-8-429

    authors: Baumbach J

    更新日期:2007-11-06 00:00:00