Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

Abstract:

Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters and hyperparameters. We present here a simple unsupervised method for representing words, phrases or text as a low dimensional vector, in which the meaning and relative importance of dimensions is transparent to inspection. We have created a near-comprehensive vector representation of words, and selected bigrams, trigrams and abbreviations, using the set of titles and abstracts in PubMed as a corpus. This vector is used to create several novel implicit word-word and text-text similarity metrics. The implicit word-word similarity metrics correlate well with human judgement of word pair similarity and relatedness, and outperform or equal all other reported methods on a variety of biomedical benchmarks, including several implementations of neural embeddings trained on PubMed corpora. Our implicit word-word metrics capture different aspects of word-word relatedness than word2vec-based metrics and are only partially correlated (rho = 0.5-0.8 depending on task and corpus). The vector representations of words, bigrams, trigrams, abbreviations, and PubMed title + abstracts are all publicly available from http://arrowsmith.psych.uic.edu/arrowsmith_uic/word_similarity_metrics.html for release under CC-BY-NC license. Several public web query interfaces are also available at the same site, including one which allows the user to specify a given word and view its most closely related terms according to direct co-occurrence as well as different implicit similarity metrics.

journal_name

J Biomed Inform

authors

Smalheiser NR,Cohen AM,Bonifield G

doi

10.1016/j.jbi.2019.103096

subject

Has Abstract

pub_date

2019-02-01 00:00:00

pages

103096

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(19)30006-1

journal_volume

90

pub_type

Journal Article
  • TRAK ontology: defining standard care for the rehabilitation of knee conditions.

    abstract: In this paper we discuss the design and development of TRAK (Taxonomy for RehAbilitation of Knee conditions), an ontology that formally models information relevant for the rehabilitation of knee conditions. TRAK provides the framework that can be used to collect coded data in sufficient detail to support epidemiologic...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2013.04.009

    authors: Button K,van Deursen RW,Soldatova L,Spasić I

    update_date: 2013-08-01 00:00:00

  • Multi-step ahead meningitis case forecasting based on decomposition and multi-objective optimization methods.

    abstract: Epidemiological time series forecasting plays an important role in health public systems, due to its ability to allow managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two, and three-month-ahead) meningitis cas...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2020.103575

    authors: Ribeiro MHDM,Mariani VC,Coelho LDS

    update_date: 2020-11-01 00:00:00

  • Automatic signal extraction, prioritizing and filtering approaches in detecting post-marketing cardiovascular events associated with targeted cancer drugs from the FDA Adverse Event Reporting System (FAERS).

    abstract: OBJECTIVE: Targeted drugs dramatically improve the treatment outcomes in cancer patients; however, these innovative drugs are often associated with unexpectedly high cardiovascular toxicity. Currently, cardiovascular safety represents both a challenging issue for drug developers, regulators, researchers, and clinicians ...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2013.10.008

    authors: Xu R,Wang Q

    update_date: 2014-02-01 00:00:00

  • Introducing RFID technology in dynamic and time-critical medical settings: requirements and challenges.

    abstract: We describe the process of introducing RFID technology in the trauma bay of a trauma center to support fast-paced and complex teamwork during resuscitation. We analyzed trauma resuscitation tasks, photographs of medical tools, and videos of simulated resuscitations to gain insight into resuscitation tasks, work practi...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2012.04.003

    authors: Parlak S,Sarcevic A,Marsic I,Burd RS

    update_date: 2012-10-01 00:00:00

  • Predicting severe clinical events by learning about life-saving actions and outcomes using distant supervision.

    abstract: Medical error is a leading cause of patient death in the United States. Among the different types of medical errors, harm to patients caused by doctors missing early signs of deterioration is especially challenging to address due to the heterogeneity of patients' physiological patterns. In this study, we implemented r...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2020.103425

    authors: Lee DH,Yetisgen M,Vanderwende L,Horvitz E

    update_date: 2020-07-01 00:00:00

  • Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization.

    abstract: OBJECTIVE: Our aim is to extract clinically-meaningful phenotypes from longitudinal electronic health records (EHRs) of medically-complex children. This is a fragile set of patients consuming a disproportionate amount of pediatric care resources but who often end up with sub-optimal clinical outcome. The rise in availab...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2019.103125

    authors: Perros I,Papalexakis EE,Vuduc R,Searles E,Sun J

    update_date: 2019-05-01 00:00:00

  • Chronic disease modeling and simulation software.

    abstract: Computers allow describing the progress of a disease using computerized models. These models allow aggregating expert and clinical information to allow researchers and decision makers to forecast disease progression. To make this forecast reliable, good models and therefore good modeling tools are required. This paper...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2010.06.003

    authors: Barhak J,Isaman DJ,Ye W,Lee D

    update_date: 2010-10-01 00:00:00

  • Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology.

    abstract: The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of acti...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2017.07.013

    authors: Zheng L,Yumak H,Chen L,Ochs C,Geller J,Kapusnik-Uner J,Perl Y

    update_date: 2017-09-01 00:00:00

  • Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function.

    abstract: Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) wit...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2016.01.009

    authors: Taslimitehrani V,Dong G,Pereira NL,Panahiazar M,Pathak J

    update_date: 2016-04-01 00:00:00

  • A framework for modeling health behavior protocols and their linkage to behavioral theory.

    abstract: With the rise in chronic, behavior-related disease, computerized behavioral protocols (CBPs) that help individuals improve behaviors have the potential to play an increasing role in the future health of society. To be effective and widely used CBPs should be based on accepted behavioral theory. However, designing CBPs...

    journal_title: Journal of biomedical informatics

    pub_type: Clinical Trial, Journal Article

    doi: 10.1016/j.jbi.2004.12.001

    authors: Lenert L,Norman GJ,Mailhot M,Patrick K

    update_date: 2005-08-01 00:00:00

  • LGscore: A method to identify disease-related genes using biological literature and Google data.

    abstract: Since the genome project in 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which iden...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2015.01.003

    authors: Kim J,Kim H,Yoon Y,Park S

    update_date: 2015-04-01 00:00:00

  • Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project.

    abstract: The Strategic Health IT Advanced Research Projects (SHARP) Program, established by the Office of the National Coordinator for Health Information Technology in 2010 supports research findings that remove barriers for increased adoption of health IT. The improvements envisioned by the SHARP Area 4 Consortium (SHARPn) wi...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2012.01.009

    authors: Rea S,Pathak J,Savova G,Oniki TA,Westberg L,Beebe CE,Tao C,Parker CG,Haug PJ,Huff SM,Chute CG

    update_date: 2012-08-01 00:00:00

  • Tracking a moving user in indoor environments using Bluetooth low energy beacons.

    abstract: BACKGROUND: Bluetooth low energy (BLE) beacons have been used to track the locations of individuals in indoor environments for clinical applications such as workflow analysis and infectious disease modelling. Most current approaches use the received signal strength indicator (RSSI) to track locations. When using the RSS...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2019.103288

    authors: Surian D,Kim V,Menon R,Dunn AG,Sintchenko V,Coiera E

    update_date: 2019-10-01 00:00:00

  • All-IP wireless sensor networks for real-time patient monitoring.

    abstract: This paper proposes the all-IP WSNs (wireless sensor networks) for real-time patient monitoring. In this paper, the all-IP WSN architecture based on gateway trees is proposed and the hierarchical address structure is presented. Based on this architecture, the all-IP WSN can perform routing without route discovery. Mor...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2014.08.002

    authors: Wang X,Le D,Cheng H,Xie C

    update_date: 2014-12-01 00:00:00

  • Wisdom of artificial crowds feature selection in untargeted metabolomics: An application to the development of a blood-based diagnostic test for thrombotic myocardial infarction.

    abstract: INTRODUCTION: Heart disease remains a leading cause of global mortality. While acute myocardial infarction (colloquially: heart attack), has multiple proximate causes, proximate etiology cannot be determined by a blood-based diagnostic test. We enrolled a suitable patient cohort and conducted a non-targeted quantificati...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article, Multicenter Study

    doi: 10.1016/j.jbi.2018.03.007

    authors: Trainor PJ,Yampolskiy RV,DeFilippis AP

    update_date: 2018-05-01 00:00:00

  • RedMed: Extending drug lexicons for social media applications.

    abstract: Social media has been identified as a promising potential source of information for pharmacovigilance. The adoption of social media data has been hindered by the massive and noisy nature of the data. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer ...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2019.103307

    authors: Lavertu A,Altman RB

    update_date: 2019-11-01 00:00:00

  • Evaluation of an Enhanced Role-Based Access Control model to manage information access in collaborative processes for a statewide clinical education program.

    abstract: BACKGROUND: Managing information access in collaborative processes is a critical requirement to team-based biomedical research, clinical education, and patient care. We have previously developed a computation model, Enhanced Role-Based Access Control (EnhancedRBAC), and applied it to coordinate information access in the...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2013.11.007

    authors: Le XH,Doll T,Barbosu M,Luque A,Wang D

    update_date: 2014-08-01 00:00:00

  • Modeling individual differences: A case study of the application of system identification for personalizing a physical activity intervention.

    abstract: BACKGROUND: Control systems engineering methods, particularly, system identification (system ID), offer an idiographic (i.e., person-specific) approach to develop dynamic models of physical activity (PA) that can be used to personalize interventions in a systematic, scalable way. The purpose of this work is to: (1) appl...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2018.01.010

    authors: Phatak SS,Freigoun MT,Martín CA,Rivera DE,Korinek EV,Adams MA,Buman MP,Klasnja P,Hekler EB

    update_date: 2018-03-01 00:00:00

  • A knowledge-based system to find over-the-counter medicines for self-medication.

    abstract: This study developed a medicine query system based on Semantic Web and open data especially for self-medication users to search over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches. If users are uncertain about the exact search words, these query systems do not offer eff...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2020.103504

    authors: Sung HY,Chi YL

    update_date: 2020-08-01 00:00:00

  • Description of a method to support public health information management: organizational network analysis.

    abstract: In this case study, we describe a method that has potential to provide systematic support for public health information management. Public health agencies depend on specialized information that travels throughout an organization via communication networks among employees. Interactions that occur within these networks ...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2006.09.004

    authors: Merrill J,Bakken S,Rockoff M,Gebbie K,Carley KM

    update_date: 2007-08-01 00:00:00

  • Prediction of influenza vaccination outcome by neural networks and logistic regression.

    abstract: The major challenge in influenza vaccination is to predict vaccine efficacy. The purpose of this study was to design a model to enable successful prediction of the outcome of influenza vaccination based on real historical medical data. A non-linear neural network approach was used, and its performance compared to logi...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2010.04.011

    authors: Trtica-Majnaric L,Zekic-Susac M,Sarlija N,Vitale B

    update_date: 2010-10-01 00:00:00

  • An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

    abstract: OBJECTIVE: Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes. MATERIALS AND METHODS: Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into lo...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2018.09.005

    authors: Wang JK,Hom J,Balasubramanian S,Schuler A,Shah NH,Goldstein MK,Baiocchi MTM,Chen JH

    update_date: 2018-10-01 00:00:00

  • Health information technology adoption: Understanding research protocols and outcome measurements for IT interventions in health care.

    abstract: OBJECTIVE: To classify and characterize the variables commonly used to measure the impact of Information Technology (IT) adoption in health care, as well as settings and IT interventions tested, and to guide future research. MATERIALS AND METHODS: We conducted a descriptive study screening a sample of 236 studies from a...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2016.07.018

    authors: Colicchio TK,Facelli JC,Del Fiol G,Scammon DL,Bowes WA 3rd,Narus SP

    update_date: 2016-10-01 00:00:00

  • A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts.

    abstract: An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumo...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2012.01.002

    authors: Pivovarov R,Elhadad N

    update_date: 2012-06-01 00:00:00

  • Unleashing genotypes in epidemiology - A novel method for managing high throughput information.

    abstract: The large amounts of data generated when high-throughput genotyping methods are used in large-scale epidemiological studies (>10,000 participants) present an enormous challenge to researchers in terms of structured data management. In order to face these challenges, a system has been designed and implemented where gen...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2009.07.005

    authors: Olund G,Brinne A,Lindqvist P,Litton JE

    update_date: 2009-12-01 00:00:00

  • The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships.

    abstract: Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators ...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2012.04.004

    authors: van Mulligen EM,Fourrier-Reglat A,Gurwitz D,Molokhia M,Nieto A,Trifiro G,Kors JA,Furlong LI

    update_date: 2012-10-01 00:00:00

  • The Analytic Information Warehouse (AIW): a platform for analytics using electronic health record data.

    abstract: OBJECTIVE: To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations. MATERIALS AND METHODS: We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transfor...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2013.01.005

    authors: Post AR,Kurc T,Cholleti S,Gao J,Lin X,Bornstein W,Cantrell D,Levine D,Hohmann S,Saltz JH

    update_date: 2013-06-01 00:00:00

  • A Bayesian system to detect and characterize overlapping outbreaks.

    abstract: Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2017.08.003

    authors: Aronis JM,Millett NE,Wagner MM,Tsui F,Ye Y,Ferraro JP,Haug PJ,Gesteland PH,Cooper GF

    update_date: 2017-09-01 00:00:00

  • Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics.

    abstract: Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot response to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical ...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2012.11.001

    authors: Dao TT,Hoang TN,Ta XH,Tho MC

    update_date: 2013-02-01 00:00:00

  • Colorado Care Tablet: the design of an interoperable Personal Health Application to help older adults with multimorbidity manage their medications.

    abstract: Medication errors are common and cause serious health issues during care transitions, particularly for older adults with multiple chronic conditions. In this paper, we discuss the design and evaluation of the Colorado Care Tablet, a Personal Health Application (PHA) that helps older adults and their lay caregivers man...

    journal_title: Journal of biomedical informatics

    pub_type: Journal Article

    doi: 10.1016/j.jbi.2010.05.007

    authors: Siek KA,Ross SE,Khan DU,Haverhals LM,Cali SR,Meyers J

    update_date: 2010-10-01 00:00:00