Quantifying semantic similarity of clinical evidence in the biomedical literature to facilitate related evidence synthesis.

Abstract:

OBJECTIVE:Published clinical trials and high quality peer reviewed medical publications are considered as the main sources of evidence used for synthesizing systematic reviews or practicing Evidence Based Medicine (EBM). Finding all relevant published evidence for a particular medical case is a time and labour intensive task, given the breadth of the biomedical literature. Automatic quantification of conceptual relationships between key clinical evidence within and across publications, despite variations in the expression of clinically-relevant concepts, can help to facilitate synthesis of evidence. In this study, we aim to provide an approach towards expediting evidence synthesis by quantifying semantic similarity of key evidence as expressed in the form of individual sentences. Such semantic textual similarity can be applied as a key approach for supporting selection of related studies. MATERIAL AND METHODS:We propose a generalisable approach for quantifying semantic similarity of clinical evidence in the biomedical literature, specifically considering the similarity of sentences corresponding to a given type of evidence, such as clinical interventions, population information, clinical findings, etc. We develop three sets of generic, ontology-based, and vector-space models of similarity measures that make use of a variety of lexical, conceptual, and contextual information to quantify the similarity of full sentences containing clinical evidence. To understand the impact of different similarity measures on the overall evidence semantic similarity quantification, we provide a comparative analysis of these measures when used as input to an unsupervised linear interpolation and a supervised regression ensemble. In order to provide a reliable test-bed for this experiment, we generate a dataset of 1000 pairs of sentences from biomedical publications that are annotated by ten human experts. We also extend the experiments on an external dataset for further generalisability testing. RESULTS:The combination of all diverse similarity measures showed stronger correlations with the gold standard similarity scores in the dataset than any individual kind of measure. Our approach reached near 0.80 average Pearson correlation across different clinical evidence types using the devised similarity measures. Although they were more effective when combined together, individual generic and vector-space measures also resulted in strong similarity quantification when used in both unsupervised and supervised models. On the external dataset, our similarity measures were highly competitive with the state-of-the-art approaches developed and trained specifically on that dataset for predicting semantic similarity. CONCLUSION:Experimental results showed that the proposed semantic similarity quantification approach can effectively identify related clinical evidence that is reported in the literature. The comparison with a state-of-the-art method demonstrated the effectiveness of the approach, and experiments with an external dataset support its generalisability.

journal_name

J Biomed Inform

authors

Hassanzadeh H,Nguyen A,Verspoor K

doi

10.1016/j.jbi.2019.103321

subject

Has Abstract

pub_date

2019-12-01 00:00:00

pages

103321

eissn

1532-0464

issn

1532-0480

pii

S1532-0464(19)30240-0

journal_volume

100

pub_type

杂志文章
  • Specifying computer-based counseling systems in health care: a new approach to user-interface and interaction design.

    abstract::Computer-based counseling systems in health care play an important role in the toolset available for medical doctors to inform, motivate and challenge their patients according to a well-defined therapeutic goal. The design, development and implementation of such systems require close collaboration between users, i.e. ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2008.10.005

    authors: Herzberg D,Marsden N,Kübler P,Leonhardt C,Thomanek S,Jung H,Becker A

    更新日期:2009-04-01 00:00:00

  • Does the use of structured reporting improve usability? A comparative evaluation of the usability of two approaches for findings reporting in a large-scale telecardiology context.

    abstract::One of the main reasons that leads to a low adoption rate of telemedicine systems is poor usability. An aspect that influences usability during the reporting of findings is the input mode, e.g., if a free-text (FT) or a structured report (SR) interface is employed. The objective of our study is to compare the usabilit...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2014.07.002

    authors: Lacerda TC,von Wangenheim CG,von Wangenheim A,Giuliano I

    更新日期:2014-12-01 00:00:00

  • Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

    abstract::We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章,评审

    doi:10.1016/j.jbi.2017.07.012

    authors: Kreimeyer K,Foster M,Pandey A,Arya N,Halford G,Jones SF,Forshee R,Walderhaug M,Botsis T

    更新日期:2017-09-01 00:00:00

  • Phenotypic similarity for rare disease: Ciliopathy diagnoses and subtyping.

    abstract::Rare diseases are often hard and long to be diagnosed precisely, and most of them lack approved treatment. For some complex rare diseases, precision medicine approach is further required to stratify patients into homogeneous subgroups based on the clinical, biological or molecular features. In such situation, deep phe...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103308

    authors: Chen X,Garcelon N,Neuraz A,Billot K,Lelarge M,Bonald T,Garcia H,Martin Y,Benoit V,Vincent M,Faour H,Douillet M,Lyonnet S,Saunier S,Burgun A

    更新日期:2019-12-01 00:00:00

  • Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches.

    abstract::The accurate diagnosis of heart failure in emergency room patients is quite important, but can also be quite difficult due to our insufficient understanding of the characteristics of heart failure. The purpose of this study is to design a decision-making model that provides critical factors and knowledge associated wi...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2012.04.013

    authors: Son CS,Kim YN,Kim HS,Park HS,Kim MS

    更新日期:2012-10-01 00:00:00

  • BioLattice: a framework for the biological interpretation of microarray gene expression data using concept lattice analysis.

    abstract:MOTIVATION:A challenge in microarray data analysis is to interpret observed changes in terms of biological properties and relationships. One powerful approach is to make associations of gene expression clusters with biomedical ontologies and/or biological pathways. However, this approach evaluates only one cluster at a...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2007.10.003

    authors: Kim J,Chung HJ,Jung Y,Kim KK,Kim JH

    更新日期:2008-04-01 00:00:00

  • Lexical patterns, features and knowledge resources for coreference resolution in clinical notes.

    abstract::Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approa...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2012.02.012

    authors: Gooch P,Roudsari A

    更新日期:2012-10-01 00:00:00

  • Enhancing phylogeography by improving geographical information from GenBank.

    abstract::Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as location of a species or viral host is as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2011.06.005

    authors: Scotch M,Sarkar IN,Mei C,Leaman R,Cheung KH,Ortiz P,Singraur A,Gonzalez G

    更新日期:2011-12-01 00:00:00

  • Using UMLS to construct a generalized hierarchical concept-based dictionary of brain functions for information extraction from the fMRI literature.

    abstract::With a rapid progress in the field, a great many fMRI studies are published every year, to the extent that it is now becoming difficult for researchers to keep up with the literature, since reading papers is extremely time-consuming and labor-intensive. Thus, automatic information extraction has become an important is...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2009.04.003

    authors: Hsiao MY,Chen CC,Chen JH

    更新日期:2009-10-01 00:00:00

  • Automated population of an i2b2 clinical data warehouse from an openEHR-based data repository.

    abstract:BACKGROUND:Detailed Clinical Model (DCM) approaches have recently seen wider adoption. More specifically, openEHR-based application systems are now used in production in several countries, serving diverse fields of application such as health information exchange, clinical registries and electronic medical record system...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.08.007

    authors: Haarbrandt B,Tute E,Marschollek M

    更新日期:2016-10-01 00:00:00

  • Clinical coverage of an archetype repository over SNOMED-CT.

    abstract::Clinical archetypes provide a means for health professionals to design what should be communicated as part of an Electronic Health Record (EHR). An ever-growing number of archetype definitions follow this health information modelling approach, and this international archetype resource will eventually cover a large num...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2011.12.001

    authors: Yu S,Berry D,Bisbal J

    更新日期:2012-06-01 00:00:00

  • A flexible approach to distributed data anonymization.

    abstract::Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons lead to the need of privacy preserving integration concepts. In this article, we focus on anonymization, which plays an important role for ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2013.12.002

    authors: Kohlmayer F,Prasser F,Eckert C,Kuhn KA

    更新日期:2014-08-01 00:00:00

  • A genetic algorithm-support vector machine method with parameter optimization for selecting the tag SNPs.

    abstract::SNPs (Single Nucleotide Polymorphisms) include millions of changes in human genome, and therefore, are promising tools for disease-gene association studies. However, this kind of studies is constrained by the high expense of genotyping millions of SNPs. For this reason, it is required to obtain a suitable subset of SN...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2012.12.002

    authors: Ilhan I,Tezel G

    更新日期:2013-04-01 00:00:00

  • Description of a method to support public health information management: organizational network analysis.

    abstract::In this case study, we describe a method that has potential to provide systematic support for public health information management. Public health agencies depend on specialized information that travels throughout an organization via communication networks among employees. Interactions that occur within these networks ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2006.09.004

    authors: Merrill J,Bakken S,Rockoff M,Gebbie K,Carley KM

    更新日期:2007-08-01 00:00:00

  • Benchmarking relief-based feature selection methods for bioinformatics data mining.

    abstract::Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. g...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2018.07.015

    authors: Urbanowicz RJ,Olson RS,Schmitt P,Meeker M,Moore JH

    更新日期:2018-09-01 00:00:00

  • Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients.

    abstract:OBJECTIVE:Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low risk patients not receive it. The objective was to develop an automated pipeline to interrogate heterogeneous data to evaluate the use of bone...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103184

    authors: Coquet J,Bozkurt S,Kan KM,Ferrari MK,Blayney DW,Brooks JD,Hernandez-Boussard T

    更新日期:2019-06-01 00:00:00

  • Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    abstract::While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architec...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.10.015

    authors: Schulz WL,Nelson BG,Felker DK,Durant TJS,Torres R

    更新日期:2016-12-01 00:00:00

  • Unstructured medical image query using big data - An epilepsy case study.

    abstract::Big data technologies are critical to the medical field which requires new frameworks to leverage them. Such frameworks would benefit medical experts to test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasi...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.12.005

    authors: Istephan S,Siadat MR

    更新日期:2016-02-01 00:00:00

  • ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

    abstract::Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biome...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2013.10.001

    authors: Kamdar MR,Zeginis D,Hasnain A,Decker S,Deus HF

    更新日期:2014-02-01 00:00:00

  • HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    abstract::The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function....

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.01.008

    authors: O'Driscoll A,Belogrudov V,Carroll J,Kropp K,Walsh P,Ghazal P,Sleator RD

    更新日期:2015-04-01 00:00:00

  • Comparison of reversible-jump Markov-chain-Monte-Carlo learning approach with other methods for missing enzyme identification.

    abstract::Computational identification of missing enzymes plays a significant role in accurate and complete reconstruction of metabolic network for both newly sequenced and well-studied organisms. For a metabolic reaction, given a set of candidate enzymes identified according to certain biological evidences, a powerful mathemat...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2007.09.002

    authors: Geng B,Zhou X,Zhu J,Hung YS,Wong ST

    更新日期:2008-04-01 00:00:00

  • Classification models for the prediction of clinicians' information needs.

    abstract:OBJECTIVE:Clinicians face numerous information needs during patient care activities and most of these needs are not met. Infobuttons are information retrieval tools that help clinicians to fulfill their information needs by providing links to on-line health information resources from within an electronic medical record...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2008.07.001

    authors: Del Fiol G,Haug PJ

    更新日期:2009-02-01 00:00:00

  • Genome-wide analysis of multi-view data of miRNA-seq to identify miRNA biomarkers for stomach cancer.

    abstract::Stomach cancer is one of the leading causes of cancer-related deaths worldwide. More than 80% diagnosis of this cancer occur at later stages leading to low 5-year survival rate. This emphasizes the need to have better prognostic techniques for stomach cancer. In this regard, the Next-Generation Sequencing of whole gen...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103254

    authors: Pant N,Rakshit S,Paul S,Saha I

    更新日期:2019-09-01 00:00:00

  • All-IP wireless sensor networks for real-time patient monitoring.

    abstract::This paper proposes the all-IP WSNs (wireless sensor networks) for real-time patient monitoring. In this paper, the all-IP WSN architecture based on gateway trees is proposed and the hierarchical address structure is presented. Based on this architecture, the all-IP WSN can perform routing without route discovery. Mor...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2014.08.002

    authors: Wang X,Le D,Cheng H,Xie C

    更新日期:2014-12-01 00:00:00

  • Characterizing and optimizing human anticancer drug targets based on topological properties in the context of biological pathways.

    abstract::One of the challenging problems in drug discovery is to identify the novel targets for drugs. Most of the traditional methods for drug targets optimization focused on identifying the particular families of "druggable targets", but ignored their topological properties based on the biological pathways. In this study, we...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.02.007

    authors: Zhang J,Wang Y,Shang D,Yu F,Liu W,Zhang Y,Feng C,Wang Q,Xu Y,Liu Y,Bai X,Li X,Li C

    更新日期:2015-04-01 00:00:00

  • Unleashing genotypes in epidemiology - A novel method for managing high throughput information.

    abstract::The large amounts of data generated when high-throughput genotyping methods are used in large-scale epidemiological studies (>10,000 participants) present an enormous challenge to researchers in terms of structured data management. In order to face these challenges, a system has been designed and implemented where gen...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2009.07.005

    authors: Olund G,Brinne A,Lindqvist P,Litton JE

    更新日期:2009-12-01 00:00:00

  • Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences.

    abstract::Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently r...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2011.08.009

    authors: Xu H,AbdelRahman S,Lu Y,Denny JC,Doan S

    更新日期:2011-12-01 00:00:00

  • A pilot study of a heuristic algorithm for novel template identification from VA electronic medical record text.

    abstract:RATIONALE:Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. METHODS:The tw...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.07.019

    authors: Redd AM,Gundlapalli AV,Divita G,Carter ME,Tran LT,Samore MH

    更新日期:2017-07-01 00:00:00

  • The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships.

    abstract::Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2012.04.004

    authors: van Mulligen EM,Fourrier-Reglat A,Gurwitz D,Molokhia M,Nieto A,Trifiro G,Kors JA,Furlong LI

    更新日期:2012-10-01 00:00:00

  • Medical speciality classification system based on binary particle swarms and ensemble of one vs. rest support vector machines.

    abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2020.103525

    authors: Faris H,Habib M,Faris M,Alomari M,Alomari A

    更新日期:2020-09-01 00:00:00