A cascaded approach for Chinese clinical text de-identification with less annotation effort.

Abstract:

With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Since manual de-identification of free clinical text requires a significant amount of human work, developing an automated de-identification system is necessary. While many de-identification systems are available for English clinical text, designing one for Chinese clinical text faces many challenges, such as the unavailability of necessary lexical resources and the sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we designed a de-identification pipeline that takes advantage of both rule-based and machine learning techniques. In particular, our method can effectively construct a data set with dense PHI, which significantly reduces annotation time for subsequent supervised learning. We experimented on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach can increase the efficiency of the annotation effort by over 60% while reaching an F score above 90%. We demonstrate that combining rule-based and machine learning techniques is an effective way to reduce annotation cost and achieve high performance in the Chinese clinical text de-identification task.

journal_name

J Biomed Inform

authors

Jian Z,Guo X,Liu S,Ma H,Zhang S,Zhang R,Lei J

doi

10.1016/j.jbi.2017.07.017

subject

Has Abstract

pub_date

2017-09-01 00:00:00

pages

76-83

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(17)30177-6

journal_volume

73

pub_type

Journal Article
  • Word sense disambiguation across two domains: biomedical literature and clinical notes.

    abstract::The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains-biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense anno...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2008.02.003

    authors: Savova GK,Coden AR,Sominsky IL,Johnson R,Ogren PV,de Groen PC,Chute CG

    update_date:2008-12-01 00:00:00

  • High-performance implementation and analysis of the Linkmap program.

    abstract::Linkage analysis uses information from family pedigrees to map genes and locate disease genes on particular chromosomes. A recombination fraction denoted as theta is estimated as a measure of crossing over between two loci. Genetic linkage calculations are very time-consuming particularly for large family pedigrees, a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1006/jbin.2002.1031

    authors: Kothari K,Lopez-Benitez N,Poduslo SE

    update_date:2001-12-01 00:00:00

  • Wisdom of artificial crowds feature selection in untargeted metabolomics: An application to the development of a blood-based diagnostic test for thrombotic myocardial infarction.

    abstract:INTRODUCTION:Heart disease remains a leading cause of global mortality. While acute myocardial infarction (colloquially: heart attack), has multiple proximate causes, proximate etiology cannot be determined by a blood-based diagnostic test. We enrolled a suitable patient cohort and conducted a non-targeted quantificati...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Multicenter Study

    doi:10.1016/j.jbi.2018.03.007

    authors: Trainor PJ,Yampolskiy RV,DeFilippis AP

    update_date:2018-05-01 00:00:00

  • Health information technology adoption: Understanding research protocols and outcome measurements for IT interventions in health care.

    abstract:OBJECTIVE:To classify and characterize the variables commonly used to measure the impact of Information Technology (IT) adoption in health care, as well as settings and IT interventions tested, and to guide future research. MATERIALS AND METHODS:We conducted a descriptive study screening a sample of 236 studies from a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.07.018

    authors: Colicchio TK,Facelli JC,Del Fiol G,Scammon DL,Bowes WA 3rd,Narus SP

    update_date:2016-10-01 00:00:00

  • Serum cancer biomarker discovery through analysis of gene expression data sets across multiple tumor and normal tissues.

    abstract::The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities in cancer research community. Although numerous biomarker candidates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metab...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.08.010

    authors: Jin H,Lee HC,Park SS,Jeong YS,Kim SY

    update_date:2011-12-01 00:00:00

  • Scenario-based design: a method for connecting information system design with public health operations and emergency management.

    abstract:UNLABELLED:Responding to public health emergencies requires rapid and accurate assessment of workforce availability under adverse and changing circumstances. However, public health information systems to support resource management during both routine and emergency operations are currently lacking. We applied scenario-...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.07.004

    authors: Reeder B,Turner AM

    update_date:2011-12-01 00:00:00

  • Mapping high-dimensional data onto a relative distance plane--an exact method for visualizing and characterizing high-dimensional patterns.

    abstract::We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance pl...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2004.07.005

    authors: Somorjai RL,Dolenko B,Demko A,Mandelzweig M,Nikulin AE,Baumgartner R,Pizzi NJ

    update_date:2004-10-01 00:00:00

  • Development of the nursing problem list subset of SNOMED CT®.

    abstract:OBJECTIVE:To create an interoperable set of nursing diagnoses for use in the patient problem list in the EHR to support interoperability. DESIGN:Queries for nursing diagnostic concepts were executed against the UMLS Metathesaurus to retrieve all nursing diagnoses across four nursing terminologies where the concept was...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.12.003

    authors: Matney SA,Warren JJ,Evans JL,Kim TY,Coenen A,Auld VA

    update_date:2012-08-01 00:00:00

  • Patient similarity for precision medicine: A systematic review.

    abstract::Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is cur...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.06.001

    authors: Parimbelli E,Marini S,Sacchi L,Bellazzi R

    update_date:2018-07-01 00:00:00

  • Classification of ADHD with bi-objective optimization.

    abstract::Attention Deficit Hyperactive Disorder (ADHD) is one of the most common diseases in school aged children. In this paper, we consider using fMRI data with classification techniques to aid the diagnosis of ADHD and propose a bi-objective ADHD classification scheme based on L1-norm support vector machine (SVM). In our cl...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.011

    authors: Shao L,Xu Y,Fu D

    update_date:2018-08-01 00:00:00

  • A knowledge-based system to find over-the-counter medicines for self-medication.

    abstract::This study developed a medicine query system based on Semantic Web and open data especially for self-medication users to search over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches. If users are uncertain about the exact search words, these query systems do not offer eff...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103504

    authors: Sung HY,Chi YL

    update_date:2020-08-01 00:00:00

  • Toward national comparable nurse practitioner data: proposed data elements, rationale, and methods.

    abstract::Federal funds have supported Nurse Practitioner (NP) education and the establishment of nurse-managed centers. Yet, important questions are raised about the quality and appropriate scope of NP care. Few NP-patient encounters are documented in the largest national surveys of ambulatory care, sponsored by the National C...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2003.09.016

    authors: Jenkins ML

    update_date:2003-08-01 00:00:00

  • Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

    abstract::Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, theref...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.01.005

    authors: Liu B,Madduri RK,Sotomayor B,Chard K,Lacinski L,Dave UJ,Li J,Liu C,Foster IT

    update_date:2014-06-01 00:00:00

  • Grounding a new information technology implementation framework in behavioral science: a systematic analysis of the literature on IT use.

    abstract::Many interventions to improve the success of information technology (IT) implementations are grounded in behavioral science, using theories, and models to identify conditions and determinants of successful use. However, each model in the IT literature has evolved to address specific theoretical problems of particular ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2003.09.002

    authors: Kukafka R,Johnson SB,Linfante A,Allegrante JP

    update_date:2003-06-01 00:00:00

  • Virtualizing living and working spaces: Proof of concept for a biomedical space-replication methodology.

    abstract::The physical spaces within which the work of health occurs - the home, the intensive care unit, the emergency room, even the bedroom - influence the manner in which behaviors unfold, and may contribute to efficacy and effectiveness of health interventions. Yet the study of such complex workspaces is difficult. Health ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.07.007

    authors: Brennan PF,Ponto K,Casper G,Tredinnick R,Broecker M

    update_date:2015-10-01 00:00:00

  • Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

    abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.06.014

    authors: Grouin C,Moriceau V,Zweigenbaum P

    update_date:2015-12-01 00:00:00

  • A novel web informatics approach for automated surveillance of cancer mortality trends.

    abstract::Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation's progress against cancer. The...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.03.027

    authors: Tourassi G,Yoon HJ,Xu S

    update_date:2016-06-01 00:00:00

  • Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles.

    abstract::Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity predication and drug repositioning. In this s...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.10.002

    authors: Xu R,Wang Q

    update_date:2015-02-01 00:00:00

  • Predicting the function of transplanted kidney in long-term care processes: Application of a hybrid model.

    abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103116

    authors: Rashidi Khazaee P,Bagherzadeh M J,Niazkhani Z,Pirnejad H

    update_date:2019-03-01 00:00:00

  • An image score inference system for RNAi genome-wide screening based on fuzzy mixture regression modeling.

    abstract::With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing fulfill...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2008.04.007

    authors: Wang J,Zhou X,Li F,Bradley PL,Chang SF,Perrimon N,Wong ST

    update_date:2009-02-01 00:00:00

  • Multi-step ahead meningitis case forecasting based on decomposition and multi-objective optimization methods.

    abstract::Epidemiological time series forecasting plays an important role in health public systems, due to its ability to allow managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two, and three-month-ahead) meningitis cas...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103575

    authors: Ribeiro MHDM,Mariani VC,Coelho LDS

    update_date:2020-11-01 00:00:00

  • Prediction of influenza vaccination outcome by neural networks and logistic regression.

    abstract::The major challenge in influenza vaccination is to predict vaccine efficacy. The purpose of this study was to design a model to enable successful prediction of the outcome of influenza vaccination based on real historical medical data. A non-linear neural network approach was used, and its performance compared to logi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2010.04.011

    authors: Trtica-Majnaric L,Zekic-Susac M,Sarlija N,Vitale B

    update_date:2010-10-01 00:00:00

  • A comparison of two methods for retrieving ICD-9-CM data: the effect of using an ontology-based method for handling terminology changes.

    abstract:OBJECTIVE:Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information sys...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.01.005

    authors: Yu AC,Cimino JJ

    update_date:2011-04-01 00:00:00

  • Challenges in clinical natural language processing for automated disorder normalization.

    abstract:BACKGROUND:Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated reduced performance of disorder named entity recognition (NER) and normalization (or groun...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.07.010

    authors: Leaman R,Khare R,Lu Z

    update_date:2015-10-01 00:00:00

  • Evaluation of an Enhanced Role-Based Access Control model to manage information access in collaborative processes for a statewide clinical education program.

    abstract:BACKGROUND:Managing information access in collaborative processes is a critical requirement to team-based biomedical research, clinical education, and patient care. We have previously developed a computation model, Enhanced Role-Based Access Control (EnhancedRBAC), and applied it to coordinate information access in the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.11.007

    authors: Le XH,Doll T,Barbosu M,Luque A,Wang D

    update_date:2014-08-01 00:00:00

  • A medical treatment based scoring model to detect abusive institutions.

    abstract::Medical abuse refers to a type of abnormal medical practice which is not in compliance with qualitative or ethical standards, such as excessive prescription or overbilling of medical services. Detection of such medical abuses is crucial, especially for the patients and insurance providers, because they become subject ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103423

    authors: Lee J,Shin H,Cho S

    update_date:2020-07-01 00:00:00

  • glUCModel: a monitoring and modeling system for chronic diseases applied to diabetes.

    abstract::Chronic patients must carry out a rigorous control of diverse factors in their lives. Diet, sport activity, medical analysis or blood glucose levels are some of them. This is a hard task, because some of these controls are performed very often, for instance some diabetics measure their glucose levels several times eve...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.12.015

    authors: Hidalgo JI,Maqueda E,Risco-Martín JL,Cuesta-Infante A,Colmenar JM,Nobel J

    update_date:2014-04-01 00:00:00

  • Evaluating warfarin dosing models on multiple datasets with a novel software framework and evolutionary optimisation.

    abstract::Warfarin is an effective preventative treatment for arterial and venous thromboembolism, but requires individualised dosing due to its narrow therapeutic range and high individual variation. Many machine learning techniques have been demonstrated in this domain. This study evaluated the accuracy of the most promising ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103634

    authors: Truda G,Marais P

    update_date:2021-01-01 00:00:00

  • Automated identification of adverse events related to central venous catheters.

    abstract::Methods for surveillance of adverse events (AEs) in clinical settings are limited by cost, technology, and appropriate data availability. In this study, two methods for semi-automated review of text records within the Veterans Administration database are utilized to identify AEs related to the placement of central ven...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2006.06.003

    authors: Penz JF,Wilcox AB,Hurdle JF

    update_date:2007-04-01 00:00:00

  • A framework for modeling health behavior protocols and their linkage to behavioral theory.

    abstract::With the rise in chronic, behavior-related disease, computerized behavioral protocols (CBPs) that help individuals improve behaviors have the potential to play an increasing role in the future health of society. To be effective and widely used CBPs should be based on accepted behavioral theory. However, designing CBPs...

    journal_title:Journal of biomedical informatics

    pub_type: Clinical Trial, Journal Article

    doi:10.1016/j.jbi.2004.12.001

    authors: Lenert L,Norman GJ,Mailhot M,Patrick K

    update_date:2005-08-01 00:00:00