A pilot study of a heuristic algorithm for novel template identification from VA electronic medical record text.

Abstract:

RATIONALE: Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction.

METHODS: The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates, and the templates within those documents, can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and applies a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes.

RESULTS: The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group, for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group were 5.9 and 5, respectively.

DISCUSSION: The grouping method was successful in finding like documents containing templates. Of the groups of documents containing templates, the LCS module was successful in distinguishing text belonging to the template from text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. In this pilot study, we demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records, and we propose methods to improve performance and to scale up.
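The two modules described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the binning scheme, the distance measure, or the linkage rule, so `binned_counts`, `distance`, the single-linkage merge, and the `threshold` value are all assumptions made for illustration. Folding a pairwise LCS across a group is likewise a heuristic stand-in for a true multi-document LCS, and the sample notes are invented toy text.

```python
from collections import Counter
from functools import reduce


def binned_counts(text, bin_size=2):
    """Assumed feature: each word's count reduced to a coarse bin index."""
    counts = Counter(text.lower().split())
    return {word: n // bin_size + 1 for word, n in counts.items()}


def distance(a, b):
    """Assumed distance: 1 minus Jaccard similarity of (word, bin) pairs."""
    fa, fb = set(a.items()), set(b.items())
    union = fa | fb
    return 1.0 - len(fa & fb) / len(union) if union else 0.0


def group_documents(docs, threshold=0.7):
    """Module 1 sketch: single-linkage agglomerative clustering."""
    feats = [binned_counts(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(feats[a], feats[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:  # closest pair is too far apart: stop merging
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    # dp[i][j] holds the LCS length of a[i:] and b[j:]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def extract_template(docs):
    """Module 2 sketch: fold pairwise LCS over the documents of one group."""
    return reduce(lcs, (d.split() for d in docs))


# Invented toy notes, not taken from the VA corpus.
notes = [
    "Chief Complaint: cough Allergies: none Plan: rest",
    "Chief Complaint: fever Allergies: penicillin Plan: fluids",
    "Chief Complaint: rash Allergies: latex Plan: cream",
    "Patient called to reschedule appointment next week",
]
groups = group_documents(notes)
print(groups)
for group in groups:
    if len(group) >= 3:  # the study reports groups of 3 documents or more
        print(extract_template([notes[i] for i in group]))
```

On the toy notes, the first three documents share the tokens `Chief`, `Complaint:`, `Allergies:`, and `Plan:`, so they fall into one group and those tokens are recovered as the template skeleton, while the free-text note stays on its own. Single linkage was chosen only to keep the sketch short; other linkage rules would fit the description in the abstract equally well.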

journal_name

J Biomed Inform

journal_title

Journal of biomedical informatics

authors

Redd AM,Gundlapalli AV,Divita G,Carter ME,Tran LT,Samore MH

doi

10.1016/j.jbi.2016.07.019

subject

Has Abstract

pub_date

2017-07-01

pages

S68-S76

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(16)30073-9

journal_volume

71S

pub_type

Journal Article
