Exploring the Privacy-Preserving Properties of Word Embeddings: Algorithmic Validation Study.

Abstract:

BACKGROUND: Word embeddings are dense numeric vectors used to represent language in neural networks. Until recently, there had been no publicly released embeddings trained on clinical data. Our work is the first to study the privacy implications of releasing these models.

OBJECTIVE: This paper aims to demonstrate that traditional word embeddings created on clinical corpora that have been deidentified by removing personal health information (PHI) can nonetheless be exploited to reveal sensitive patient information.

METHODS: We used embeddings created from 400,000 doctor-written consultation notes and experimented with 3 common word embedding methods to explore the privacy-preserving properties of each.

RESULTS: We found that if publicly released embeddings are trained from a corpus anonymized by PHI removal, it is possible to reconstruct up to 68.5% (n=411/600) of the full names that remain in the deidentified corpus and to associate sensitive information with specific patients in the corpus from which the embeddings were created. We also found that the distance between the word vector representation of a patient's name and a diagnostic billing code is informative and differs significantly from the distance between the name and a code not billed for that patient.

CONCLUSIONS: Special care must be taken when sharing word embeddings created from clinical texts, as current approaches may compromise patient privacy. If PHI removal is used for anonymization before traditional word embeddings are trained, it is possible to attribute sensitive information to patients who have not been fully deidentified by the (necessarily imperfect) removal algorithms. A promising alternative (ie, anonymization by PHI replacement) may avoid these flaws. Our results are timely and critical, as an increasing number of researchers are pushing for publicly available health data.
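The distance comparison described in the results can be illustrated with a minimal sketch. The abstract does not state which distance measure the authors used, so cosine distance is an assumption here, and the three vectors and their keys (`name_token`, `code_billed`, `code_not_billed`) are invented toy values rather than data from the study.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings (real embeddings have far more dimensions).
embeddings = {
    "name_token": [0.9, 0.1, 0.3],        # vector for a name left in the corpus
    "code_billed": [0.8, 0.2, 0.4],       # billing code that co-occurs with the name
    "code_not_billed": [-0.5, 0.9, 0.1],  # billing code unrelated to the name
}

d_billed = cosine_distance(embeddings["name_token"], embeddings["code_billed"])
d_other = cosine_distance(embeddings["name_token"], embeddings["code_not_billed"])

# In this toy setup the billed code lies closer to the name than the other code.
print(f"distance to billed code:     {d_billed:.4f}")
print(f"distance to non-billed code: {d_other:.4f}")
```

In this toy setup the name vector sits much closer to the billed code than to the unrelated one, which is the kind of gap the abstract reports as informative.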

journal_name

J Med Internet Res

authors

Abdalla M,Abdalla M,Hirst G,Rudzicz F

doi

10.2196/18055

subject

Has Abstract

pub_date

2020-07-15 00:00:00

pages

e18055

issue

7

eissn

1438-8871

issn

1439-4456

pii

v22i7e18055

journal_volume

22

pub_type

Journal Article
  • Online social network use by health care providers in a high traffic patient care environment.

    abstract:BACKGROUND:The majority of workers, regardless of age or occupational status, report engaging in personal Internet use in the workplace. There is little understanding of the impact that personal Internet use may have on patient care in acute clinical settings. OBJECTIVE:The objective of this study was to investigate t...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.2421

    authors: Black E,Light J,Paradise Black N,Thompson L

    update_date:2013-05-17 00:00:00

  • Internet skills performance tests: are people ready for eHealth?

    abstract:BACKGROUND:Despite the amount of online health information, there are several barriers that limit the Internet's adoption as a source of health information. One of these barriers is highlighted in conceptualizations of the digital divide which include the differential possession of Internet skills, or "eHealth literacy...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.1581

    authors: van Deursen AJ,van Dijk JA

    update_date:2011-04-29 00:00:00

  • Adolescent Perspectives on the Use of Social Media to Support Type 1 Diabetes Management: Focus Group Study.

    abstract:BACKGROUND:A majority of adolescents report the use of some form of social media, and many prefer to communicate via social networking sites. Social media may offer new opportunities in diabetes management, particularly in terms of how health care teams provide tailored support and treatment to adolescents with diabete...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/12149

    authors: Malik FS,Panlasigui N,Gritton J,Gill H,Yi-Frazier JP,Moreno MA

    update_date:2019-05-30 00:00:00

  • Using Artificial Intelligence (Watson for Oncology) for Treatment Recommendations Amongst Chinese Patients with Lung Cancer: Feasibility Study.

    abstract:BACKGROUND:Artificial intelligence (AI) is developing quickly in the medical field and can benefit both medical staff and patients. The clinical decision support system Watson for Oncology (WFO) is an outstanding representative AI in the medical field, and it can provide to cancer patients prompt treatment recommendati...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/11087

    authors: Liu C,Liu X,Wu F,Xie M,Feng Y,Hu C

    update_date:2018-09-25 00:00:00

  • The Computer-Assisted Brief Intervention for Tobacco (CABIT) program: a pilot study.

    abstract:BACKGROUND:Health care providers do not routinely carry out brief counseling for tobacco cessation despite the evidence for its effectiveness. For this intervention to be routinely used, it must be brief, be convenient, require little investment of resources, require little specialized training, and be perceived as eff...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.2074

    authors: Boudreaux ED,Bedek KL,Byrne NJ,Baumann BM,Lord SA,Grissom G

    update_date:2012-12-03 00:00:00

  • The reliability of tweets as a supplementary method of seasonal influenza surveillance.

    abstract:BACKGROUND:Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surv...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.3532

    authors: Aslam AA,Tsou MH,Spitzberg BH,An L,Gawron JM,Gupta DK,Peddecord KM,Nagel AC,Allen C,Yang JA,Lindsay S

    update_date:2014-11-14 00:00:00

  • Effect of Recruitment Methods on Response Rate in a Web-Based Study for Primary Care Physicians: Factorial Randomized Controlled Trial.

    abstract:BACKGROUND:Low participation rates are one of the most serious disadvantages of Web-based studies. It is necessary to develop effective strategies to improve participation rates to obtain sufficient data. OBJECTIVE:The objective of this trial was to investigate the effect of emphasizing the incentive in the subject li...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.2196/jmir.8561

    authors: So R,Shinohara K,Aoki T,Tsujimoto Y,Suganuma AM,Furukawa TA

    update_date:2018-02-08 00:00:00

  • Childcare service centers' preferences and intentions to use a web-based program to implement healthy eating and physical activity policies and practices: a cross-sectional study.

    abstract:BACKGROUND:Overweight and obesity is a significant public health problem that impacts a large number of children globally. Supporting childcare centers to deliver healthy eating and physical activity-promoting policies and practices is a recommended strategy for obesity prevention, given that such services provide acce...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.3639

    authors: Yoong SL,Williams CM,Finch M,Wyse R,Jones J,Freund M,Wiggers JH,Nathan N,Dodds P,Wolfenden L

    update_date:2015-04-30 00:00:00

  • The Feasibility and Effectiveness of Web-Based Advance Care Planning Programs: Scoping Review.

    abstract:BACKGROUND:Advance care planning (ACP) is a process with the overall aim to enhance care in concordance with patients' preferences. Key elements of ACP are to enable persons to define goals and preferences for future medical treatment and care, to discuss these with family and health care professionals, and to document...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Review

    doi:10.2196/15578

    authors: van der Smissen D,Overbeek A,van Dulmen S,van Gemert-Pijnen L,van der Heide A,Rietjens JA,Korfage IJ

    update_date:2020-03-17 00:00:00

  • The Generalizability of Randomized Controlled Trials of Self-Guided Internet-Based Cognitive Behavioral Therapy for Depressive Symptoms: Systematic Review and Meta-Regression Analysis.

    abstract:BACKGROUND:Self-guided internet-based cognitive behavioral therapies (iCBTs) for depressive symptoms may substantially increase accessibility to mental health treatment. Despite this, questions remain as to the generalizability of the research on self-guided iCBT. OBJECTIVE:We sought to describe the clinical entry cri...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/10113

    authors: Lorenzo-Luaces L,Johns E,Keefe JR

    update_date:2018-11-09 00:00:00

  • Online Outreach Services Among Men Who Use the Internet to Seek Sex With Other Men (MISM) in Ontario, Canada: An Online Survey.

    abstract:BACKGROUND:Men who use the Internet to seek sex with other men (MISM) are increasingly using the Internet to find sexual health information and to seek sexual partners, with some research suggesting HIV transmission is associated with sexual partnering online. Aiming to "meet men where they are at," some AIDS service o...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.4503

    authors: Brennan DJ,Lachowsky NJ,Georgievski G,Rosser BR,MacLachlan D,Murray J,Cruising Counts Research Team.

    update_date:2015-12-09 00:00:00

  • Remote Monitoring of Patients With Heart Failure: An Overview of Systematic Reviews.

    abstract:BACKGROUND:Many systematic reviews exist on the use of remote patient monitoring (RPM) interventions to improve clinical outcomes and psychological well-being of patients with heart failure. However, research is broadly distributed from simple telephone-based to complex technology-based interventions. The scope and foc...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Review

    doi:10.2196/jmir.6571

    authors: Bashi N,Karunanithi M,Fatehi F,Ding H,Walters D

    update_date:2017-01-20 00:00:00

  • Can an Internet-based health risk assessment highlight problems of heart disease risk factor awareness? A cross-sectional analysis.

    abstract:BACKGROUND:Health risk assessments are becoming more popular as a tool to conveniently and effectively reach community-dwelling adults who may be at risk for serious chronic conditions such as coronary heart disease (CHD). The use of such instruments to improve adults' risk factor awareness and concordance with clinica...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.2369

    authors: Dickerson JB,McNeal CJ,Tsai G,Rivera CM,Smith ML,Ohsfeldt RL,Ory MG

    update_date:2014-04-18 00:00:00

  • Importance of Internet surveillance in public health emergency control and prevention: evidence from a digital epidemiologic study during avian influenza A H7N9 outbreaks.

    abstract:BACKGROUND:Outbreaks of human infection with a new avian influenza A H7N9 virus occurred in China in the spring of 2013. Control and prevention of a new human infectious disease outbreak can be strongly affected by public reaction and social impact through the Internet and social media. OBJECTIVE:This study aimed to i...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.2911

    authors: Gu H,Chen B,Zhu H,Jiang T,Wang X,Chen L,Jiang Z,Zheng D,Jiang J

    update_date:2014-01-17 00:00:00

  • Evaluating a Web-Based Coaching Program Using Electronic Health Records for Patients With Chronic Obstructive Pulmonary Disease in China: Randomized Controlled Trial.

    abstract:BACKGROUND:Chronic obstructive pulmonary disease (COPD) is now the fourth leading cause of death in the world, and it continues to increase in developing countries. The World Health Organization expects COPD to be the third most common cause of death in the world by 2020. Effective and continuous postdischarge care can...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.2196/jmir.6743

    authors: Wang L,He L,Tao Y,Sun L,Zheng H,Zheng Y,Shen Y,Liu S,Zhao Y,Wang Y

    update_date:2017-07-21 00:00:00

  • Institutionalizing telemedicine applications: the challenge of legitimizing decision-making.

    abstract:During the last decades a variety of telemedicine applications have been trialed worldwide. However, telemedicine is still an example of major potential benefits that have not been fully attained. Health care regulators are still debating why institutionalizing telemedicine applications on a large scale has been so di...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.1669

    authors: Zanaboni P,Lettieri E

    update_date:2011-09-28 00:00:00

  • Rhetorical Appeals and Tactics in New York Times Comments About Vaccines: Qualitative Analysis.

    abstract:BACKGROUND:Improving persuasion in response to vaccine skepticism is a long-standing problem. Elective nonvaccination emerging from skepticism about vaccine safety and efficacy jeopardizes herd immunity, exposing those who are most vulnerable to the risk of serious diseases. OBJECTIVE:This article analyzes vaccine sen...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/19504

    authors: Gallagher J,Lawrence HY

    update_date:2020-12-04 00:00:00

  • Health-Specific Information and Communication Technology Use and Its Relationship to Obesity in High-Poverty, Urban Communities: Analysis of a Population-Based Biosocial Survey.

    abstract:BACKGROUND:More than 35% of American adults are obese. For African American and Hispanic adults, as well as individuals residing in poorer or more racially segregated urban neighborhoods, the likelihood of obesity is even higher. Information and communication technologies (ICTs) may substitute for or complement communi...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.5741

    authors: Gopalan A,Makelarski JA,Garibay LB,Escamilla V,Merchant RM,Wolfe MB Sr,Holbrook R,Lindau ST

    update_date:2016-06-28 00:00:00

  • Influence of Scanner Precision and Analysis Software in Quantifying Three-Dimensional Intraoral Changes: Two-Factor Factorial Experimental Design.

    abstract:BACKGROUND:Three-dimensional scans are increasingly used to quantify biological topographical changes and clinical health outcomes. Traditionally, the use of 3D scans has been limited to specialized centers owing to the high cost of the scanning equipment and the necessity for complex analysis software. Technological a...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/17150

    authors: O'Toole S,Bartlett D,Keeling A,McBride J,Bernabe E,Crins L,Loomans B

    update_date:2020-11-27 00:00:00

  • Methodological Shortcomings of Wrist-Worn Heart Rate Monitors Validations.

    abstract:Wearable sensor technology could have an important role for clinical research and in delivering health care. Accordingly, such technology should undergo rigorous evaluation prior to market launch, and its performance should be supported by evidence-based marketing claims. Many studies have been published attempting to...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/10108

    authors: Sartor F,Papini G,Cox LGE,Cleland J

    update_date:2018-07-02 00:00:00

  • Medical Students' Experiences and Outcomes Using a Virtual Human Simulation to Improve Communication Skills: Mixed Methods Study.

    abstract:BACKGROUND:Attending to the wide range of communication behaviors that convey empathy is an important but often underemphasized concept to reduce errors in care, improve patient satisfaction, and improve cancer patient outcomes. A virtual human (VH)-based simulation, MPathic-VR, was developed to train health care provi...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/15459

    authors: Guetterman TC,Sakakibara R,Baireddy S,Kron FW,Scerbo MW,Cleary JF,Fetters MD

    update_date:2019-11-27 00:00:00

  • Reinforcement Learning for Clinical Decision Support in Critical Care: Comprehensive Review.

    abstract:BACKGROUND:Decision support systems based on reinforcement learning (RL) have been implemented to facilitate the delivery of personalized care. This paper aimed to provide a comprehensive review of RL applications in the critical care setting. OBJECTIVE:This review aimed to survey the literature on RL applications for...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Review

    doi:10.2196/18477

    authors: Liu S,See KC,Ngiam KY,Celi LA,Sun X,Feng M

    update_date:2020-07-20 00:00:00

  • Effectiveness of a Web-Based Computer-Tailored Multiple-Lifestyle Intervention for People Interested in Reducing their Cardiovascular Risk: A Randomized Controlled Trial.

    abstract:BACKGROUND:Web-based computer-tailored interventions for multiple health behaviors can improve the strength of behavior habits in people who want to reduce their cardiovascular risk. Nonetheless, few randomized controlled trials have tested this assumption to date. OBJECTIVE:The study aim was to test an 8-week Web-bas...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article, Randomized Controlled Trial

    doi:10.2196/jmir.5147

    authors: Storm V,Dörenkämper J,Reinwand DA,Wienert J,De Vries H,Lippke S

    update_date:2016-04-11 00:00:00

  • Comparison of Nutrigenomics Technology Interface Tools for Consumers and Health Professionals: A Sequential Explanatory Mixed Methods Investigation.

    abstract:BACKGROUND:Nutrigenomics forms the basis of personalized nutrition by customizing an individual's dietary plan based on the integration of life stage, current health status, and genome information. Some common genes that are included in nutrition-based multigene test panels include CYP1A2 (rate of caffeine break down),...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/12580

    authors: Araujo Almeida V,Littlejohn P,Cop I,Brown E,Afroze R,Davison KM

    update_date:2019-06-28 00:00:00

  • Translating the Burden of Pollen Allergy Into Numbers Using Electronically Generated Symptom Data From the Patient's Hayfever Diary in Austria and Germany: 10-Year Observational Study.

    abstract:BACKGROUND:Pollen allergies affect a significant proportion of the population globally. At present, Web-based tools such as pollen diaries and mobile apps allow for easy and fast documentation of allergic symptoms via the internet. OBJECTIVE:This study aimed to characterize the users of the Patient's Hayfever Diary (P...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/16767

    authors: Bastl K,Bastl M,Bergmann KC,Berger M,Berger U

    update_date:2020-02-21 00:00:00

  • How is an electronic screening and brief intervention tool on alcohol use received in a student population? A qualitative and quantitative evaluation.

    abstract:BACKGROUND:A previous study among Antwerp college and university students showed that more male (10.2%-11.1%) than female (1.8%-6.2%) students are at risk for problematic alcohol use. The current literature shows promising results in terms of feasibility and effectiveness for the use of brief electronic interventions t...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.1869

    authors: Fraeyman J,Van Royen P,Vriesacker B,De Mey L,Van Hal G

    update_date:2012-04-23 00:00:00

  • Using Web-Based Questionnaires and Obstetric Records to Assess General Health Characteristics Among Pregnant Women: A Validation Study.

    abstract:BACKGROUND:Self-reported medical history information is included in many studies. However, data on the validity of Web-based questionnaires assessing medical history are scarce. If proven to be valid, Web-based questionnaires may provide researchers with an efficient means to collect data on this parameter in large pop...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.3847

    authors: van Gelder MM,Schouten NP,Merkus PJ,Verhaak CM,Roeleveld N,Roukema J

    update_date:2015-06-16 00:00:00

  • Artificial Intelligence and Health Technology Assessment: Anticipating a New Level of Complexity.

    abstract:Artificial intelligence (AI) is seen as a strategic lever to improve access, quality, and efficiency of care and services and to build learning and value-based health systems. Many studies have examined the technical performance of AI within an experimental context. These studies provide limited insights into the issu...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/17707

    authors: Alami H,Lehoux P,Auclair Y,de Guise M,Gagnon MP,Shaw J,Roy D,Fleet R,Ag Ahmed MA,Fortin JP

    update_date:2020-07-07 00:00:00

  • Using New and Emerging Technologies to Identify and Respond to Suicidality Among Help-Seeking Young People: A Cross-Sectional Study.

    abstract:BACKGROUND:Suicidal thoughts are common among young people presenting to face-to-face and online mental health services. The early detection and rapid response to these suicidal thoughts and other suicidal behaviors is a priority for suicide prevention and early intervention efforts internationally. Establishing how be...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.7897

    authors: Iorfino F,Davenport TA,Ospina-Pinillos L,Hermens DF,Cross S,Burns J,Hickie IB

    update_date:2017-07-12 00:00:00

  • Evaluation of Pollen Apps Forecasts: The Need for Quality Control in an eHealth Service.

    abstract:BACKGROUND:Pollen forecasts are highly valuable for allergen avoidance and thus raising the quality of life of persons concerned by pollen allergies. They are considered as valuable free services for the public. Careful scientific evaluation of pollen forecasts in terms of accurateness and reliability has not been avai...

    journal_title:Journal of medical Internet research

    pub_type: Journal Article

    doi:10.2196/jmir.7426

    authors: Bastl K,Berger U,Kmenta M

    update_date:2017-05-08 00:00:00