Abstract:
INTRODUCTION:Machine learning (ML) and natural language processing have great potential to improve information extraction (IE) within electronic medical records (EMRs) for a wide variety of clinical search and summarization tools. Despite ML advancements, clinical adoption of real time IE tools for patient care remains low. Clinically motivated IE task definitions, publicly available annotated clinical datasets, and inclusion of subtasks such as coreference resolution and named entity normalization are critical for the development of useful clinical tools. MATERIALS AND METHODS:We provide a task definition and comprehensive annotation requirements for a clinically motivated symptom extraction task. Four annotators labeled symptom mentions within 1108 discharge summaries from two public clinical note datasets for the tasks of named entity recognition, coreference resolution, and named entity normalization; these annotations will be released to the public. Baseline human performance was assessed and two ML models were evaluated on the symptom extraction task. RESULTS:16,922 symptom mentions were identified within the discharge summaries, with 11,944 symptom instances after coreference resolution and 1255 unique normalized answer forms. Human annotator performance averaged 92.2% F1. Recurrent network model performance was 85.6% F1 (recall 85.8%, precision 85.4%), and Transformer-based model performance was 86.3% F1 (recall 86.6%, precision 86.1%). Our models extracted vague symptoms, acronyms, typographical errors, and grouping statements. The models generalized effectively to a separate clinical note corpus and can run in real time. CONCLUSION:To our knowledge, this dataset will be the largest and most comprehensive publicly released, annotated dataset for clinically motivated symptom extraction, as it includes annotations for named entity recognition, coreference, and normalization for more than 1000 clinical documents. Our neural network models extracted symptoms from unstructured clinical free text at near human performance in real time. In this paper, we present a clinically motivated task definition, dataset, and simple supervised natural language processing models to demonstrate the feasibility of building clinically applicable information extraction tools.
journal_name
J Biomed Informjournal_title
Journal of biomedical informaticsauthors
Steinkamp JM,Bala W,Sharma A,Kantrowitz JJdoi
10.1016/j.jbi.2019.103354subject
Has Abstractpub_date
2020-02-01 00:00:00pages
103354eissn
1532-0464issn
1532-0480pii
S1532-0464(19)30276-Xjournal_volume
102pub_type
杂志文章abstract:OBJECTIVE:To develop an effective and scalable individual-level patient cost prediction method by automatically learning hidden temporal patterns from multivariate time series data in patient insurance claims using a convolutional neural network (CNN) architecture. METHODS:We used three years of medical and pharmacy c...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103565
更新日期:2020-11-01 00:00:00
abstract::Literature Based Discovery (LBD) refers to the problem of inferring new and interesting knowledge by logically connecting independent fragments of information units through explicit or implicit means. This area of research, which incorporates techniques from Natural Language Processing (NLP), Information Retrieval and...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2019.103141
更新日期:2019-05-01 00:00:00
abstract::Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) wit...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.01.009
更新日期:2016-04-01 00:00:00
abstract::Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2012.04.004
更新日期:2012-10-01 00:00:00
abstract:BACKGROUND:One of the major concerns of the biomedical community is the increasing prevalence of antimicrobial resistant microorganisms. Recent findings show that the diversification of colony morphology may be indicative of the expression of virulence factors and increased resistance to antibiotic therapeutics. To tra...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.03.007
更新日期:2015-06-01 00:00:00
abstract::We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance pl...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2004.07.005
更新日期:2004-10-01 00:00:00
abstract::This study developed a medicine query system based on Semantic Web and open data especially for self-medication users to search over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches. If users are uncertain about the exact search words, these query systems do not offer eff...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103504
更新日期:2020-08-01 00:00:00
abstract::Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.08.003
更新日期:2017-09-01 00:00:00
abstract::Hypotheses of homology are the basis of phylogenetic analysis. All character data are considered to be equivalent regardless of the source of those characters. Putative homology statements are designated based on observations of similarity. Pairwise sequence alignment using the Needleman-Wunsch algorithm is the basis ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2005.11.005
更新日期:2006-02-01 00:00:00
abstract::The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities in cancer research community. Although numerous biomarker candidates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metab...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.08.010
更新日期:2011-12-01 00:00:00
abstract::Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot response to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2012.11.001
更新日期:2013-02-01 00:00:00
abstract::Electronic health records (EHR) are a major source of information in biomedical informatics. Yet, missing values are prominent characteristics of EHR. Prediction on dataset with missing values results in inaccurate inferences. Nearest neighbour imputation based on lazy learning approach is a proven technique for missi...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103190
更新日期:2019-06-01 00:00:00
abstract::Community pharmacists are critically placed in the patient care chain being an extended frontline within primary healthcare networks across Europe. They are trained to ensure safe and effective medication use, a crucial and responsible role, extending beyond the common misconception limited to just providing timely ac...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103336
更新日期:2019-12-01 00:00:00
abstract::Molecular Property Diagnostic Suite - Diabetes Mellitus (MPDSDM) is a Galaxy-based, open source disease-specific web portal for diabetes. It consists of three modules namely (i) data library (ii) data processing and (iii) data analysis tools. The data library (target library and literature) module provide extensive an...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.08.003
更新日期:2018-09-01 00:00:00
abstract::Gene selection is important for cancer classification based on gene expression data, because of high dimensionality and small sample size. In this paper, we present a new gene selection method based on clustering, in which dissimilarity measures are obtained through kernel functions. It searches for best weights of ge...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.05.007
更新日期:2016-08-01 00:00:00
abstract::Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons lead to the need of privacy preserving integration concepts. In this article, we focus on anonymization, which plays an important role for ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.12.002
更新日期:2014-08-01 00:00:00
abstract:RATIONALE:Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. METHODS:The tw...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.07.019
更新日期:2017-07-01 00:00:00
abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.07.020
更新日期:2015-12-01 00:00:00
abstract::Three-dimensional (3D) visualization of preoperative and intraoperative medical information becomes more and more important in minimally invasive surgery. We develop a 3D interactive surgical visualization system using mobile spatial information acquisition and autostereoscopic display for surgeons to observe surgical...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.05.014
更新日期:2017-07-01 00:00:00
abstract::In many respects, the critical care workplace resembles a paradigmatic complex system: on account of the dynamic and interactive nature of collaborative clinical work, these settings are characterized by non-linear, inter-dependent and emergent activities. Developing a comprehensive understanding of the work activitie...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.02.007
更新日期:2011-06-01 00:00:00
abstract::Medical error is a leading cause of patient death in the United States. Among the different types of medical errors, harm to patients caused by doctors missing early signs of deterioration is especially challenging to address due to the heterogeneity of patients' physiological patterns. In this study, we implemented r...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103425
更新日期:2020-07-01 00:00:00
abstract::A graphical representation of DNA sequences in which the distribution of a particular base B=A,C,G,T is represented by a set of discrete lines has been formulated. The methodology of this approach has been borrowed from two areas of physics: spectroscopy and dynamics. Consequently, the set of discrete lines is referre...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.06.001
更新日期:2017-08-01 00:00:00
abstract::The DDIExtraction Shared Task 2013 is the second edition of the DDIExtraction Shared Task series, a community-wide effort to promote the implementation and comparative assessment of natural language processing (NLP) techniques in the field of the pharmacovigilance domain, in particular, to address the extraction of dr...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.05.007
更新日期:2014-10-01 00:00:00
abstract::Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as location of a species or viral host is as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.06.005
更新日期:2011-12-01 00:00:00
abstract::Drug therapeutic indications and side-effects are both measurable patient phenotype changes in response to the treatment. Inferring potential drug therapeutic indications and identifying clinically interesting drug side-effects are both important and challenging tasks. Previous studies have utilized either chemical st...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.03.014
更新日期:2014-10-01 00:00:00
abstract:OBJECTIVE:The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations. METHODS:The significance of...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.05.009
更新日期:2017-07-01 00:00:00
abstract::With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing fulfill...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.04.007
更新日期:2009-02-01 00:00:00
abstract:OBJECTIVE:Targeted drugs dramatically improve the treatment outcomes in cancer patients; however, these innovative drugs are often associated with unexpectedly high cardiovascular toxicity. Currently, cardiovascular safety represents both a challenging issue for drug developers, regulators, researchers, and clinicians ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.10.008
更新日期:2014-02-01 00:00:00
abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103525
更新日期:2020-09-01 00:00:00
abstract::Gene selection is an important task in bioinformatics studies, because the accuracy of cancer classification generally depends upon the genes that have biological relevance to the classifying problems. In this work, randomization test (RT) is used as a gene selection method for dealing with gene expression data. In th...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.03.009
更新日期:2013-08-01 00:00:00