Abstract:
BACKGROUND: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated lower performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, we aim to identify the cause of this performance difference and introduce general solutions.
METHODS: We use closure properties to compare the richness of the vocabulary in clinical narrative text with that of biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method, never previously applied to clinical data, uses pairwise learning to rank to automatically learn term variation directly from the training data.
RESULTS: We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, f-score=0.753. For the normalization task (strict span+concept) it achieves precision=0.712, recall=0.637, f-score=0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision.
DISCUSSION: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1. Abbreviations and acronyms are found to be frequent causes of error, in addition to mentions that the annotators were not able to identify within the scope of the controlled vocabulary.
CONCLUSION: Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements, generalizable to other clinical NER tasks, that improve the ability of the NER system to handle this variation. DNorm-C is a high-performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that can be trained for a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.)
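The f-scores reported in the abstract are consistent with the harmonic mean of the stated precision and recall. The sketch below shows that check. It assumes the standard balanced F-measure (F1), which the abstract does not name explicitly, and the function name `f_score` is hypothetical rather than taken from DNorm-C.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is the balanced F1; the abstract
    itself does not say which F-measure variant was used.
    """
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract.
ner = f_score(0.797, 0.713)            # NER, strict span-only
normalization = f_score(0.712, 0.637)  # normalization, strict span+concept

print(f"NER f-score: {ner:.3f}")
print(f"Normalization f-score: {normalization:.3f}")
```

Rounded to three decimals, both values agree with the f-scores quoted in the abstract (0.753 and 0.672).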
journal_name:J Biomed Inform
journal_title:Journal of biomedical informatics
authors:Leaman R, Khare R, Lu Z
doi:10.1016/j.jbi.2015.07.010
subject:Has Abstract
pub_date:2015-10-01 00:00:00
pages:28-37
eissn:1532-0464
issn:1532-0480
pii:S1532-0464(15)00150-1
journal_volume:57
pub_type: journal article
abstract::Drug therapeutic indications and side-effects are both measurable patient phenotype changes in response to the treatment. Inferring potential drug therapeutic indications and identifying clinically interesting drug side-effects are both important and challenging tasks. Previous studies have utilized either chemical st...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2014.03.014
update_date:2014-10-01 00:00:00
abstract::Computer-based counseling systems in health care play an important role in the toolset available for medical doctors to inform, motivate and challenge their patients according to a well-defined therapeutic goal. The design, development and implementation of such systems require close collaboration between users, i.e. ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2008.10.005
update_date:2009-04-01 00:00:00
abstract::Identifying the symptom clusters (two or more related symptoms) with shared underlying molecular mechanisms has been a vital analysis task to promote the symptom science and precision health. Related studies have applied the clustering algorithms (e.g. k-means, latent class model) to detect the symptom clusters mostly...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2020.103482
update_date:2020-07-01 00:00:00
abstract::In oncology, the reuse of data is confronted with the heterogeneity of terminologies. It is necessary to semantically integrate these distinct terminologies. The semantic integration by using a third terminology as a support is a conventional approach for the integration of two terminologies that are not very structur...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2017.08.013
update_date:2017-10-01 00:00:00
abstract::With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Since manual de-identification of free clinical te...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2017.07.017
update_date:2017-09-01 00:00:00
abstract::The Radiology Gamuts Ontology (RGO), an ontology of diseases, interventions, and imaging findings, was developed to aid in decision support, education, and translational research in diagnostic radiology. The ontology defines a subsumption (is_a) relation between more general and more specific terms, and a causal relatio...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2016.03.015
update_date:2016-06-01 00:00:00
abstract::A software framework can reduce costs related to the development of an application because it allows developers to reuse both design and code. Recently, companies and research groups have announced that they have been employing health software frameworks. This paper presents the design, proof-of-concept implementation...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2016.06.002
update_date:2016-08-01 00:00:00
abstract::The class of continuous time Bayesian network classifiers is defined; it solves the problem of supervised classification on multivariate trajectories evolving in continuous time. The trajectory consists of the values of discrete attributes that are measured in continuous time, while the predicted class is expected to ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2012.07.002
update_date:2012-12-01 00:00:00
abstract::Due to the increasing volume and unstructured nature of the scientific literature in the biomedical domain, most of the information embedded within it remains untapped. This paper presents a biomedical text analytics system, DiseaSE (Disease Symptom Extraction), to identify and extract disease symptoms and their association...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2019.103324
update_date:2019-12-01 00:00:00
abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2015.07.020
update_date:2015-12-01 00:00:00
abstract::This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. A hybrid model was proposed which com...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2015.06.015
update_date:2015-12-01 00:00:00
abstract::As a result of recent advances in cancer research and "precision medicine" approaches, i.e. the idea of treating each patient with the right drug at the right time, more and more cancer patients are being cured, or might have to cope with a life with cancer. For many people, cancer survival today means living with a c...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2019.103342
update_date:2020-01-01 00:00:00
abstract::Social media has been identified as a promising potential source of information for pharmacovigilance. The adoption of social media data has been hindered by the massive and noisy nature of the data. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2019.103307
update_date:2019-11-01 00:00:00
abstract:OBJECTIVE:To investigate whether SNOMED CT covers the terms used in pre-operative assessment guidelines, and if necessary, how the measured content coverage can be improved. METHODS:Pre-operative assessment guidelines were retrieved from the websites of (inter)national anesthesia-related societies. The recommendations...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2010.07.009
update_date:2010-12-01 00:00:00
abstract::Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome, known as strong ignorability. Approaches to enforcing strong ignorability in causal analyses of observational data include weighting and matching methods. Effect estimates, such as the...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2020.103515
update_date:2020-09-01 00:00:00
abstract:MOTIVATION:PubMed is the most widely used database of biomedical literature. To the detriment of the user though, the ranking of the documents retrieved for a query is not content-based, and important semantic information in the form of assigned Medical Subject Headings (MeSH) terms is not readily presented or producti...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2011.05.009
update_date:2011-12-01 00:00:00
abstract:OBJECTIVE:To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. MATERIALS AND METHODS:We conducted an observational qualitative study of d...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2017.09.013
update_date:2017-12-01 00:00:00
abstract::The management of chronic and out-patients is a complex process which requires the cooperation of different agents belonging to several organizational units. Patients have to move to different locations to access the necessary services and to communicate their health status data. From their point of view there should ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2006.12.002
update_date:2007-10-01 00:00:00
abstract::To extract biomedical information about bio-entities from the huge amount of biomedical literature, the first key step is recognizing their names in this literature, which remains a challenging task due to the irregularities and ambiguities in bio-entity nomenclature. The recognition performances of the current po...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2008.01.002
update_date:2008-08-01 00:00:00
abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2015.06.014
update_date:2015-12-01 00:00:00
abstract::The ultimate goal of genomics research is to describe the network of molecules and interactions that govern all biological functions and disease processes in cells. Nonlinear interactions among genes in terms of their logic relationships play a key role for deciphering the networks of molecules that underlie cellular ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2007.11.006
update_date:2008-08-01 00:00:00
abstract::In this paper, a Hidden Semi-Markov Model (HSMM) based approach is proposed to evaluate and monitor body motion during a rehabilitation training program. The approach extracts clinically relevant motion features from skeleton joint trajectories, acquired by the RGB-D camera, and provides a score for the subject's perf...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2017.12.012
update_date:2018-02-01 00:00:00
abstract::Resources on the human musculoskeletal system are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot respond to the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2012.11.001
update_date:2013-02-01 00:00:00
abstract::Molecular Property Diagnostic Suite - Diabetes Mellitus (MPDSDM) is a Galaxy-based, open source, disease-specific web portal for diabetes. It consists of three modules, namely (i) data library, (ii) data processing and (iii) data analysis tools. The data library (target library and literature) module provides extensive an...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2018.08.003
update_date:2018-09-01 00:00:00
abstract:OBJECTIVE:Our aim is to extract clinically-meaningful phenotypes from longitudinal electronic health records (EHRs) of medically-complex children. This is a fragile set of patients who consume a disproportionate amount of pediatric care resources but often end up with sub-optimal clinical outcomes. The rise in availab...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2019.103125
update_date:2019-05-01 00:00:00
abstract::Mereological relations such as part-of and its inverse has-part are fundamental to the description of the structure of living organisms. Whereas classical mereology focuses on individual entities, mereological relations in biomedical ontologies are generally asserted between classes of individuals. In general, this pr...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2005.11.003
update_date:2006-06-01 00:00:00
abstract::Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the w...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2015.04.013
update_date:2015-08-01 00:00:00
abstract::The Foundational Model of Anatomy (FMA), initially developed as an enhancement of the anatomical content of UMLS, is a domain ontology of the concepts and relationships that pertain to the structural organization of the human body. It encompasses the material objects from the molecular to the macroscopic levels that c...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2003.11.007
update_date:2003-12-01 00:00:00
abstract:STUDY OBJECTIVE:The goals of this investigation were to study the temporal relationships between the demands for key resources in the emergency department (ED) and the inpatient hospital, and to develop multivariate forecasting models. METHODS:Hourly data were collected from three diverse hospitals for the year 2006. ...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2008.05.003
update_date:2009-02-01 00:00:00
abstract::This study developed a medicine query system based on Semantic Web and open data especially for self-medication users to search over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches. If users are uncertain about the exact search words, these query systems do not offer eff...
journal_title:Journal of biomedical informatics
pub_type: journal article
doi:10.1016/j.jbi.2020.103504
update_date:2020-08-01 00:00:00