Automatic detection of protected health information from clinic narratives.

Abstract:

This paper presents a natural language processing (NLP) system designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. We propose a hybrid model that combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. On the challenge test data, our system achieved an overall micro-averaged F-measure of 93.6%, the best result in this de-identification challenge.
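The abstract combines machine learning with keyword-based and rule-based components, including regular expression template patterns for PHI categories, and reports a micro-averaged F-measure. The Python sketch below illustrates only the keyword/rule side and the scoring. It is not the authors' implementation. The category names, the regex templates, the keyword dictionary and the helpers `tag_phi` and `micro_prf` are all hypothetical. The machine-learning component is omitted. Exact matching of (start, end, category) spans is an assumed scoring convention that may differ from the challenge's official evaluation.

```python
import re

# Hypothetical regex templates for a few PHI sub-categories. These are
# illustrative assumptions, not the patterns used in the paper.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "MEDICALRECORD": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b"),
}

# Hypothetical keyword dictionary: exact phrases mapped to a PHI category.
KEYWORDS = {
    "Mercy Hospital": "HOSPITAL",
}


def tag_phi(text):
    """Return a set of (start, end, category) spans found by regexes and keywords."""
    spans = set()
    # Rule-based pass: every regex template match becomes a candidate span.
    for category, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), category))
    # Keyword-based pass: exact, case-sensitive dictionary lookup.
    for phrase, category in KEYWORDS.items():
        for match in re.finditer(re.escape(phrase), text):
            spans.add((match.start(), match.end(), category))
    return spans


def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact span matches.

    Counts are pooled across all categories before dividing, which is what
    distinguishes micro- from macro-averaging.
    """
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    note = "Seen at Mercy Hospital on 03/14/2014. MRN: 1234567. Call 617-555-0100."
    for start, end, category in sorted(tag_phi(note)):
        print(category, note[start:end])
    # HOSPITAL Mercy Hospital
    # DATE 03/14/2014
    # MEDICALRECORD MRN: 1234567
    # PHONE 617-555-0100
```

Because micro-averaging pools true positives, false positives and false negatives over all categories before computing precision and recall, frequent categories weigh more heavily in the final score than rare ones. A macro average would instead average the per-category scores.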

journal_name

J Biomed Inform

authors

Yang H,Garibaldi JM

doi

10.1016/j.jbi.2015.06.015

subject

Has Abstract

pub_date

2015-12-01

pages

S30-8

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(15)00125-2

journal_volume

58 Suppl

pub_type

Journal Article

  • A hybrid of whale optimization and late acceptance hill climbing based imputation to enhance classification performance in electronic health records.

    abstract:Electronic health records (EHR) are a major source of information in biomedical informatics. Yet, missing values are prominent characteristics of EHR. Prediction on dataset with missing values results in inaccurate inferences. Nearest neighbour imputation based on lazy learning approach is a proven technique for missi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103190

    authors: Nagarajan G,Dhinesh Babu LD

    update_date:2019-06-01

  • Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences.

    abstract:Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently r...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.08.009

    authors: Xu H,AbdelRahman S,Lu Y,Denny JC,Doan S

    update_date:2011-12-01

  • Virtualizing living and working spaces: Proof of concept for a biomedical space-replication methodology.

    abstract:The physical spaces within which the work of health occurs - the home, the intensive care unit, the emergency room, even the bedroom - influence the manner in which behaviors unfold, and may contribute to efficacy and effectiveness of health interventions. Yet the study of such complex workspaces is difficult. Health ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.07.007

    authors: Brennan PF,Ponto K,Casper G,Tredinnick R,Broecker M

    update_date:2015-10-01

  • Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

    abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.06.014

    authors: Grouin C,Moriceau V,Zweigenbaum P

    update_date:2015-12-01

  • Challenges in clinical natural language processing for automated disorder normalization.

    abstract:BACKGROUND:Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated reduced performance of disorder named entity recognition (NER) and normalization (or groun...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.07.010

    authors: Leaman R,Khare R,Lu Z

    update_date:2015-10-01

  • Automatic signal extraction, prioritizing and filtering approaches in detecting post-marketing cardiovascular events associated with targeted cancer drugs from the FDA Adverse Event Reporting System (FAERS).

    abstract:OBJECTIVE:Targeted drugs dramatically improve the treatment outcomes in cancer patients; however, these innovative drugs are often associated with unexpectedly high cardiovascular toxicity. Currently, cardiovascular safety represents both a challenging issue for drug developers, regulators, researchers, and clinicians ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.10.008

    authors: Xu R,Wang Q

    update_date:2014-02-01

  • Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics.

    abstract:Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot respond to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.11.001

    authors: Dao TT,Hoang TN,Ta XH,Tho MC

    update_date:2013-02-01

  • Information extraction from biomedical text.

    abstract:Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. It requires deeper analysis than key word searches, but its aims fall short of the very hard and long-term problem of full text understanding. Information extraction r...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/s1532-0464(03)00015-7

    authors: Hobbs JR

    update_date:2002-08-01

  • Health information technology adoption: Understanding research protocols and outcome measurements for IT interventions in health care.

    abstract:OBJECTIVE:To classify and characterize the variables commonly used to measure the impact of Information Technology (IT) adoption in health care, as well as settings and IT interventions tested, and to guide future research. MATERIALS AND METHODS:We conducted a descriptive study screening a sample of 236 studies from a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.07.018

    authors: Colicchio TK,Facelli JC,Del Fiol G,Scammon DL,Bowes WA 3rd,Narus SP

    update_date:2016-10-01

  • Predicting changes in systolic blood pressure using longitudinal patient records.

    abstract:OBJECTIVE:This paper introduces a model that predicts future changes in systolic blood pressure (SBP) based on structured and unstructured (text-based) information from longitudinal clinical records. METHOD:For each patient, the clinical records are sorted in chronological order and SBP measurements are extracted from...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.06.024

    authors: Solomon JW,Nielsen RD

    update_date:2015-12-01

  • Automated annotation and classification of BI-RADS assessment from radiology reports.

    abstract:The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but is necessary for clinical and healthcare quality assurance activities. The objective of this study is to develop a natural languag...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.04.011

    authors: Castro SM,Tseytlin E,Medvedeva O,Mitchell K,Visweswaran S,Bekhuis T,Jacobson RS

    update_date:2017-05-01

  • Chester: towards a personal medication advisor.

    abstract:Dialogue systems for health communication hold out the promise of providing intelligent assistance to patients through natural interfaces that require no training to use. But in order to make the development of such systems cost effective, we must be able to use generic techniques and components which are then special...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2006.02.004

    authors: Allen J,Ferguson G,Blaylock N,Byron D,Chambers N,Dzikovska M,Galescu L,Swift M

    update_date:2006-10-01

  • HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    abstract:The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function....

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.01.008

    authors: O'Driscoll A,Belogrudov V,Carroll J,Kropp K,Walsh P,Ghazal P,Sleator RD

    update_date:2015-04-01

  • Development of a clinician reputation metric to identify appropriate problem-medication pairs in a crowdsourced knowledge base.

    abstract:BACKGROUND:Correlation of data within electronic health records is necessary for implementation of various clinical decision support functions, including patient summarization. A key type of correlation is linking medications to clinical problems; while some databases of problem-medication links are available, they are...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.11.010

    authors: McCoy AB,Wright A,Rogith D,Fathiamini S,Ottenbacher AJ,Sittig DF

    update_date:2014-04-01

  • Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients.

    abstract:OBJECTIVE:Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low risk patients not receive it. The objective was to develop an automated pipeline to interrogate heterogeneous data to evaluate the use of bone...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103184

    authors: Coquet J,Bozkurt S,Kan KM,Ferrari MK,Blayney DW,Brooks JD,Hernandez-Boussard T

    update_date:2019-06-01

  • Using UMLS to construct a generalized hierarchical concept-based dictionary of brain functions for information extraction from the fMRI literature.

    abstract:With a rapid progress in the field, a great many fMRI studies are published every year, to the extent that it is now becoming difficult for researchers to keep up with the literature, since reading papers is extremely time-consuming and labor-intensive. Thus, automatic information extraction has become an important is...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.04.003

    authors: Hsiao MY,Chen CC,Chen JH

    update_date:2009-10-01

  • Unstructured medical image query using big data - An epilepsy case study.

    abstract:Big data technologies are critical to the medical field which requires new frameworks to leverage them. Such frameworks would benefit medical experts to test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.12.005

    authors: Istephan S,Siadat MR

    update_date:2016-02-01

  • Deep learning with wearable based heart rate variability for prediction of mental and general health.

    abstract:The ubiquity and commoditisation of wearable biosensors (fitness bands) has led to a deluge of personal healthcare data, but with limited analytics typically fed back to the user. The feasibility of feeding back more complex, seemingly unrelated measures to users was investigated, by assessing whether increased levels...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103610

    authors: Coutts LV,Plans D,Brown AW,Collomosse J

    update_date:2020-12-01

  • Toward analyzing and synthesizing previous research in early prediction of cardiac arrest using machine learning based on a multi-layered integrative framework.

    abstract:BACKGROUND:One of the significant problems in the field of healthcare is the low survival rate of people who have experienced sudden cardiac arrest. Early prediction of cardiac arrest can provide the time required for intervening and preventing its onset in order to reduce mortality. Traditional statistical methods hav...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.10.008

    authors: Layeghian Javan S,Sepehri MM,Aghajani H

    update_date:2018-12-01

  • Cadec: A corpus of adverse drug event annotations.

    abstract:CSIRO Adverse Drug Event Corpus (Cadec) is a new rich annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.03.010

    authors: Karimi S,Metke-Jimenez A,Kemp M,Wang C

    update_date:2015-06-01

  • MeSHy: Mining unanticipated PubMed information using frequencies of occurrences and concurrences of MeSH terms.

    abstract:MOTIVATION:PubMed is the most widely used database of biomedical literature. To the detriment of the user though, the ranking of the documents retrieved for a query is not content-based, and important semantic information in the form of assigned Medical Subject Headings (MeSH) terms is not readily presented or producti...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.05.009

    authors: Theodosiou T,Vizirianakis IS,Angelis L,Tsaftaris A,Darzentas N

    update_date:2011-12-01

  • Inductive creation of an annotation schema for manually indexing clinical conditions from emergency department reports.

    abstract:Evaluating automated indexing applications requires comparing automatically indexed terms against manual reference standard annotations. However, there are no standard guidelines for determining which words from a textual document to include in manual annotations, and the vague task can result in substantial variation...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2005.06.004

    authors: Chapman WW,Dowling JN

    update_date:2006-04-01

  • Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text.

    abstract:We address the assignment of ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We leverage a deep neural network that combines word embeddings, recurrent units, and neu...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.02.011

    authors: Duarte F,Martins B,Pinto CS,Silva MJ

    update_date:2018-04-01

  • A machine-learned knowledge discovery method for associating complex phenotypes with complex genotypes. Application to pain.

    abstract:BACKGROUND:The association of genotyping information with common traits is not satisfactorily solved. One of the most complex traits is pain and association studies have failed so far to provide reproducible predictions of pain phenotypes from genotypes in the general population despite a well-established genetic basis...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.07.010

    authors: Lötsch J,Ultsch A

    update_date:2013-10-01

  • Collaborative text-annotation resource for disease-centered relation extraction from biomedical text.

    abstract:Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.02.001

    authors: Cano C,Monaghan T,Blanco A,Wall DP,Peshkin L

    update_date:2009-10-01

  • Security and privacy in electronic health records: a systematic literature review.

    abstract:OBJECTIVE:To report the results of a systematic literature review concerning the security and privacy of electronic health record (EHR) systems. DATA SOURCES:Original articles written in English found in MEDLINE, ACM Digital Library, Wiley InterScience, IEEE Digital Library, Science@Direct, MetaPress, ERIC, CINAHL and...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2012.12.003

    authors: Fernández-Alemán JL,Señor IC,Lozoya PÁ,Toval A

    update_date:2013-06-01

  • Modeling association detection in order to discover compounds to inhibit oral cancer.

    abstract:In the past, algorithms exploiting varying semantics in interactions between biological objects such as genes and diseases have been used in bioinformatics to uncover latent relationships within biological datasets. In this paper, we consider the algorithm Medusa in parallel with binary classification in order to find...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.005

    authors: Vittal S,Karthikeyan G

    update_date:2018-08-01

  • A cascaded approach for Chinese clinical text de-identification with less annotation effort.

    abstract:With rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has been available to support clinical research. Clinical data secondary use usually requires de-identification of personal information to protect patient privacy. Since manual de-identification of free clinical te...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.07.017

    authors: Jian Z,Guo X,Liu S,Ma H,Zhang S,Zhang R,Lei J

    update_date:2017-09-01

  • A Bayesian system to detect and characterize overlapping outbreaks.

    abstract:Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.08.003

    authors: Aronis JM,Millett NE,Wagner MM,Tsui F,Ye Y,Ferraro JP,Haug PJ,Gesteland PH,Cooper GF

    update_date:2017-09-01

  • Systematic comparison of the protein-protein interaction databases from a user's perspective.

    abstract:In absence of periodic systematic comparisons, biologists/bioinformaticians may be forced to make a subjective selection among the many protein-protein interaction (PPI) databases and tools. We conducted a comprehensive compilation and comparison of such resources. We compiled 375 PPI resources, short-listed 125 impor...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103380

    authors: Bajpai AK,Davuluri S,Tiwary K,Narayanan S,Oguru S,Basavaraju K,Dayalan D,Thirumurugan K,Acharya KK

    update_date:2020-03-01