Abstract:
:With rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has been available to support clinical research. Clinical data secondary use usually requires de-identification of personal information to protect patient privacy. Since manually de-identification of free clinical text requires significant amount of human work, developing an automated de-identification system is necessary. While there are many de-identification systems available for English clinical text, designing a de-identification system for Chinese clinical text faces many challenges such as unavailability of necessary lexical resources and sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we designed a de-identification pipeline taking advantage of both rule-based and machine learning techniques. Our method, in particular, can effectively construct a data set with dense PHI information, which saves annotation time significantly for subsequent supervised learning. We experiment on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach can increase the efficiency of the annotation effort by over 60% while reaching performance as high as over 90% measured by F score. We demonstrate that combing rule-based and machine learning is an effective way to reduce the annotation cost and achieve high performance in Chinese clinical text de-identification task.
journal_name
J Biomed Informjournal_title
Journal of biomedical informaticsauthors
Jian Z,Guo X,Liu S,Ma H,Zhang S,Zhang R,Lei Jdoi
10.1016/j.jbi.2017.07.017subject
Has Abstractpub_date
2017-09-01 00:00:00pages
76-83eissn
1532-0464issn
1532-0480pii
S1532-0464(17)30177-6journal_volume
73pub_type
杂志文章abstract::The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains-biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense anno...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.02.003
更新日期:2008-12-01 00:00:00
abstract::Linkage analysis uses information from family pedigrees to map genes and locate disease genes on particular chromosomes. A recombination fraction denoted as theta is estimated as a measure of crossing over between two loci. Genetic linkage calculations are very time-consuming particularly for large family pedigrees, a...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1006/jbin.2002.1031
更新日期:2001-12-01 00:00:00
abstract:INTRODUCTION:Heart disease remains a leading cause of global mortality. While acute myocardial infarction (colloquially: heart attack), has multiple proximate causes, proximate etiology cannot be determined by a blood-based diagnostic test. We enrolled a suitable patient cohort and conducted a non-targeted quantificati...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,多中心研究
doi:10.1016/j.jbi.2018.03.007
更新日期:2018-05-01 00:00:00
abstract:OBJECTIVE:To classify and characterize the variables commonly used to measure the impact of Information Technology (IT) adoption in health care, as well as settings and IT interventions tested, and to guide future research. MATERIALS AND METHODS:We conducted a descriptive study screening a sample of 236 studies from a...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.07.018
更新日期:2016-10-01 00:00:00
abstract::The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities in cancer research community. Although numerous biomarker candidates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metab...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.08.010
更新日期:2011-12-01 00:00:00
abstract:UNLABELLED:Responding to public health emergencies requires rapid and accurate assessment of workforce availability under adverse and changing circumstances. However, public health information systems to support resource management during both routine and emergency operations are currently lacking. We applied scenario-...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.07.004
更新日期:2011-12-01 00:00:00
abstract::We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance pl...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2004.07.005
更新日期:2004-10-01 00:00:00
abstract:OBJECTIVE:To create an interoperable set of nursing diagnoses for use in the patient problem list in the EHR to support interoperability. DESIGN:Queries for nursing diagnostic concepts were executed against the UMLS Metathesaurus to retrieve all nursing diagnoses across four nursing terminologies where the concept was...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.12.003
更新日期:2012-08-01 00:00:00
abstract::Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is cur...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.06.001
更新日期:2018-07-01 00:00:00
abstract::Attention Deficit Hyperactive Disorder (ADHD) is one of the most common diseases in school aged children. In this paper, we consider using fMRI data with classification techniques to aid the diagnosis of ADHD and propose a bi-objective ADHD classification scheme based on L1-norm support vector machine (SVM). In our cl...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.07.011
更新日期:2018-08-01 00:00:00
abstract::This study developed a medicine query system based on Semantic Web and open data especially for self-medication users to search over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches. If users are uncertain about the exact search words, these query systems do not offer eff...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103504
更新日期:2020-08-01 00:00:00
abstract::Federal funds have supported Nurse Practitioner (NP) education and the establishment of nurse-managed centers. Yet, important questions are raised about the quality and appropriate scope of NP care. Few NP-patient encounters are documented in the largest national surveys of ambulatory care, sponsored by the National C...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2003.09.016
更新日期:2003-08-01 00:00:00
abstract::Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, theref...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.01.005
更新日期:2014-06-01 00:00:00
abstract::Many interventions to improve the success of information technology (IT) implementations are grounded in behavioral science, using theories, and models to identify conditions and determinants of successful use. However, each model in the IT literature has evolved to address specific theoretical problems of particular ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2003.09.002
更新日期:2003-06-01 00:00:00
abstract::The physical spaces within which the work of health occurs - the home, the intensive care unit, the emergency room, even the bedroom - influence the manner in which behaviors unfold, and may contribute to efficacy and effectiveness of health interventions. Yet the study of such complex workspaces is difficult. Health ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.07.007
更新日期:2015-10-01 00:00:00
abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.06.014
更新日期:2015-12-01 00:00:00
abstract::Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation's progress against cancer. The...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.03.027
更新日期:2016-06-01 00:00:00
abstract::Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity predication and drug repositioning. In this s...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2014.10.002
更新日期:2015-02-01 00:00:00
abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103116
更新日期:2019-03-01 00:00:00
abstract::With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing fulfill...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.04.007
更新日期:2009-02-01 00:00:00
abstract::Epidemiological time series forecasting plays an important role in health public systems, due to its ability to allow managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two, and three-month-ahead) meningitis cas...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103575
更新日期:2020-11-01 00:00:00
abstract::The major challenge in influenza vaccination is to predict vaccine efficacy. The purpose of this study was to design a model to enable successful prediction of the outcome of influenza vaccination based on real historical medical data. A non-linear neural network approach was used, and its performance compared to logi...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2010.04.011
更新日期:2010-10-01 00:00:00
abstract:OBJECTIVE:Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information sys...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.01.005
更新日期:2011-04-01 00:00:00
abstract:BACKGROUND:Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated reduced performance of disorder named entity recognition (NER) and normalization (or groun...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.07.010
更新日期:2015-10-01 00:00:00
abstract:BACKGROUND:Managing information access in collaborative processes is a critical requirement to team-based biomedical research, clinical education, and patient care. We have previously developed a computation model, Enhanced Role-Based Access Control (EnhancedRBAC), and applied it to coordinate information access in the...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.11.007
更新日期:2014-08-01 00:00:00
abstract::Medical abuse refers to a type of abnormal medical practice which is not in compliance with qualitative or ethical standards, such as excessive prescription or overbilling of medical services. Detection of such medical abuses is crucial, especially for the patients and insurance providers, because they become subject ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103423
更新日期:2020-07-01 00:00:00
abstract::Chronic patients must carry out a rigorous control of diverse factors in their lives. Diet, sport activity, medical analysis or blood glucose levels are some of them. This is a hard task, because some of these controls are performed very often, for instance some diabetics measure their glucose levels several times eve...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.12.015
更新日期:2014-04-01 00:00:00
abstract::Warfarin is an effective preventative treatment for arterial and venous thromboembolism, but requires individualised dosing due to its narrow therapeutic range and high individual variation. Many machine learning techniques have been demonstrated in this domain. This study evaluated the accuracy of the most promising ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103634
更新日期:2021-01-01 00:00:00
abstract::Methods for surveillance of adverse events (AEs) in clinical settings are limited by cost, technology, and appropriate data availability. In this study, two methods for semi-automated review of text records within the Veterans Administration database are utilized to identify AEs related to the placement of central ven...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2006.06.003
更新日期:2007-04-01 00:00:00
abstract::With the rise in chronic, behavior-related disease, computerized behavioral protocols (CBPs) that help individuals improve behaviors have the potential to play an increasing role in the future health of society. To be effective and widely used CBPs should be based on accepted behavioral theory. However, designing CBPs...
journal_title:Journal of biomedical informatics
pub_type: 临床试验,杂志文章
doi:10.1016/j.jbi.2004.12.001
更新日期:2005-08-01 00:00:00