Using natural language processing to extract mammographic findings.

Abstract:

OBJECTIVE: Structured data on mammographic findings are difficult to obtain without manual review. We developed and evaluated a rule-based natural language processing (NLP) system to extract mammographic findings from free-text mammography reports.

MATERIALS AND METHODS: Using a dictionary look-up method, the NLP system extracted four mammographic findings (mass, calcification, asymmetry, and architectural distortion) from 93,705 mammography reports from Group Health. Association rules attached status and anatomical-location annotations to each NLP-detected finding. After negated, uncertain, and historical findings were excluded, affirmative mentions of the detected findings were summarized. Confidence flags were developed to mark reports with highly confident NLP results and reports with possible NLP errors. A random sample of 100 reports was manually abstracted to evaluate the system's accuracy.

RESULTS: Depending on the finding, the NLP system correctly coded 96 to 99 of the 100 sampled reports. Sensitivity, specificity, and negative predictive value exceeded 0.92 for all findings. Positive predictive values were relatively low for some findings because of their low prevalence.

DISCUSSION: Our NLP system was implemented entirely in SAS Base, which makes it portable and easy to deploy. It performed reasonably well in multiple applications, such as using the confidence flags as a filter to make manual review more efficient. Refining the library and the association rules, and testing on more diverse samples, may further improve its performance.

CONCLUSION: Our NLP system successfully extracts clinically useful information from mammography reports. Moreover, SAS is a feasible platform for implementing NLP algorithms.
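The abstract describes a dictionary look-up, rule-based assignment of a status (negated, uncertain, historical) to each detected mention, a summary of the remaining affirmative mentions, and an evaluation by sensitivity, specificity, PPV, and NPV. The authors worked in SAS Base and their term library and association rules are not given here, so the Python below is only a minimal sketch under loudly stated assumptions: the term lists, the cue words, the crude sentence splitter, and the rule "a cue anywhere earlier in the same sentence governs the mention" are all hypothetical simplifications, and anatomical location and confidence flags are omitted. The `metrics` function uses invented counts to show how a rare finding can have a modest PPV even when specificity is high.

```python
import re

# Hypothetical term dictionary for the four findings named in the abstract.
# The authors' actual SAS term library is not reproduced here.
FINDING_TERMS = {
    "mass": ["mass", "masses"],
    "calcification": ["calcification", "calcifications", "microcalcifications"],
    "asymmetry": ["asymmetry", "asymmetries"],
    "architectural distortion": ["architectural distortion"],
}

# Hypothetical cue words for the three excluded statuses, checked in this order.
STATUS_CUES = [
    ("negated", ["no", "without", "negative for"]),
    ("uncertain", ["possible", "probable", "cannot exclude"]),
    ("historical", ["prior", "previous", "history of"]),
]


def word_pattern(phrase):
    """Regex that matches a phrase only on whole-word boundaries."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b")


def split_sentences(text):
    """Very rough sentence splitter on periods, semicolons, and newlines."""
    return [s.strip() for s in re.split(r"[.;\n]", text) if s.strip()]


def mention_status(sentence, start):
    """Assign a status from cue words that appear before the mention in the same sentence."""
    preceding = sentence[:start]
    for status, cues in STATUS_CUES:
        if any(word_pattern(cue).search(preceding) for cue in cues):
            return status
    return "affirmative"


def affirmative_findings(report):
    """Return the set of findings with at least one affirmative mention."""
    found = set()
    for sentence in split_sentences(report.lower()):
        for finding, terms in FINDING_TERMS.items():
            for term in terms:
                for match in word_pattern(term).finditer(sentence):
                    if mention_status(sentence, match.start()) == "affirmative":
                        found.add(finding)
    return found


def metrics(tp, fp, fn, tn):
    """Standard 2x2 accuracy measures against a manual-abstraction reference."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    report = "There is a 1 cm mass in the left breast. No suspicious calcifications."
    print(sorted(affirmative_findings(report)))  # ['mass']

    # Invented counts for a rare finding in a 100-report sample: specificity is high,
    # but with only 4 NLP positives, two false positives bring PPV down to 0.5.
    m = metrics(tp=2, fp=2, fn=0, tn=96)
    print({k: round(v, 3) for k, v in m.items()})
    # {'sensitivity': 1.0, 'specificity': 0.98, 'ppv': 0.5, 'npv': 1.0}
```

Because this sketch scopes each cue to the whole preceding part of the sentence, a sentence such as "no mass but calcifications are seen" would wrongly mark the calcifications as negated; a production system needs more careful scoping rules than this.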

journal_name

J Biomed Inform

authors

Gao H, Aiello Bowles EJ, Carrell D, Buist DS

doi

10.1016/j.jbi.2015.01.010

pub_date

2015-04-01

pages

77-84

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(15)00012-X

journal_volume

54

pub_type

Journal Article

  • Lessons learnt from the DDIExtraction-2013 Shared Task.

    abstract::The DDIExtraction Shared Task 2013 is the second edition of the DDIExtraction Shared Task series, a community-wide effort to promote the implementation and comparative assessment of natural language processing (NLP) techniques in the field of the pharmacovigilance domain, in particular, to address the extraction of dr...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.05.007

    authors: Segura-Bedmar I, Martínez P, Herrero-Zazo M

    update_date: 2014-10-01

  • A survey on literature based discovery approaches in biomedical domain.

    abstract::Literature Based Discovery (LBD) refers to the problem of inferring new and interesting knowledge by logically connecting independent fragments of information units through explicit or implicit means. This area of research, which incorporates techniques from Natural Language Processing (NLP), Information Retrieval and...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2019.103141

    authors: Gopalakrishnan V, Jha K, Jin W, Zhang A

    update_date: 2019-05-01

  • Computer mediated reality technologies: A conceptual framework and survey of the state of the art in healthcare intervention systems.

    abstract:INTRODUCTION:The trend of an ageing and growing world population, particularly in developed countries, is expected to continue for decades to come causing an increase in demand for healthcare resources and services. Consequently, demand is growing faster than rises in funding. The UK government, in partnership with the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2019.103102

    authors: Ibrahim Z, Money AG

    update_date: 2019-02-01

  • A survey on single and multi omics data mining methods in cancer data classification.

    abstract::Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies have been performed...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2020.103466

    authors: Momeni Z, Hassanzadeh E, Saniee Abadeh M, Bellazzi R

    update_date: 2020-07-01

  • A term extraction tool for expanding content in the domain of functioning, disability, and health: proof of concept.

    abstract::Among the challenges in developing terminology systems is providing complete content coverage of specialized subject fields. This paper reports on a term extraction tool designed for the development and expansion of terminology systems concerned with functioning, disability, and health. Content relevant to this domain...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2003.09.005

    authors: Harris MR, Savova GK, Johnson TM, Chute CG

    update_date: 2003-08-01

  • Molecular property diagnostic suite for diabetes mellitus (MPDSDM): An integrated web portal for drug discovery and drug repurposing.

    abstract::Molecular Property Diagnostic Suite - Diabetes Mellitus (MPDSDM) is a Galaxy-based, open source disease-specific web portal for diabetes. It consists of three modules namely (i) data library (ii) data processing and (iii) data analysis tools. The data library (target library and literature) module provide extensive an...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.08.003

    authors: Gaur AS, Nagamani S, Tanneeru K, Druzhilovskiy D, Rudik A, Poroikov V, Narahari Sastry G

    update_date: 2018-09-01

  • A hybrid of whale optimization and late acceptance hill climbing based imputation to enhance classification performance in electronic health records.

    abstract::Electronic health records (EHR) are a major source of information in biomedical informatics. Yet, missing values are prominent characteristics of EHR. Prediction on dataset with missing values results in inaccurate inferences. Nearest neighbour imputation based on lazy learning approach is a proven technique for missi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103190

    authors: Nagarajan G, Dhinesh Babu LD

    update_date: 2019-06-01

  • An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

    abstract:OBJECTIVE:Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes. MATERIALS AND METHODS:Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into lo...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.09.005

    authors: Wang JK, Hom J, Balasubramanian S, Schuler A, Shah NH, Goldstein MK, Baiocchi MTM, Chen JH

    update_date: 2018-10-01

  • Biomedical ontologies: what part-of is and isn't.

    abstract::Mereological relations such as part-of and its inverse has-part are fundamental to the description of the structure of living organisms. Whereas classical mereology focuses on individual entities, mereological relations in biomedical ontologies are generally asserted between classes of individuals. In general, this pr...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2005.11.003

    authors: Schulz S, Kumar A, Bittner T

    update_date: 2006-06-01

  • Monitoring Obstructive Sleep Apnea by means of a real-time mobile system based on the automatic extraction of sets of rules through Differential Evolution.

    abstract::Real-time Obstructive Sleep Apnea (OSA) episode detection and monitoring are important for society in terms of an improvement in the health of the general population and of a reduction in mortality and healthcare costs. Currently, to diagnose OSA patients undergo PolySomnoGraphy (PSG), a complicated and invasive test ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.02.015

    authors: Sannino G, De Falco I, De Pietro G

    update_date: 2014-06-01

  • Computing with evidence Part II: An evidential approach to predicting metabolic drug-drug interactions.

    abstract::We describe a novel experiment that we conducted with the Drug Interaction Knowledge-base (DIKB) to determine which combinations of evidence enable a rule-based theory of metabolic drug-drug interactions to make the most optimal set of predictions. The focus of the experiment was a group of 16 drugs including six memb...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.05.010

    authors: Boyce R, Collins C, Horn J, Kalet I

    update_date: 2009-12-01

  • Wisdom of artificial crowds feature selection in untargeted metabolomics: An application to the development of a blood-based diagnostic test for thrombotic myocardial infarction.

    abstract:INTRODUCTION:Heart disease remains a leading cause of global mortality. While acute myocardial infarction (colloquially: heart attack), has multiple proximate causes, proximate etiology cannot be determined by a blood-based diagnostic test. We enrolled a suitable patient cohort and conducted a non-targeted quantificati...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Multicenter Study

    doi:10.1016/j.jbi.2018.03.007

    authors: Trainor PJ, Yampolskiy RV, DeFilippis AP

    update_date: 2018-05-01

  • GLIF3: a representation format for sharable computer-interpretable clinical practice guidelines.

    abstract::The Guideline Interchange Format (GLIF) is a model for representation of sharable computer-interpretable guidelines. The current version of GLIF (GLIF3) is a substantial update and enhancement of the model since the previous version (GLIF2). GLIF3 enables encoding of a guideline at three levels: a conceptual flowchart...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2004.04.002

    authors: Boxwala AA, Peleg M, Tu S, Ogunyemi O, Zeng QT, Wang D, Patel VL, Greenes RA, Shortliffe EH

    update_date: 2004-06-01

  • Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function.

    abstract::Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) wit...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.01.009

    authors: Taslimitehrani V, Dong G, Pereira NL, Panahiazar M, Pathak J

    update_date: 2016-04-01

  • Multi-step ahead meningitis case forecasting based on decomposition and multi-objective optimization methods.

    abstract::Epidemiological time series forecasting plays an important role in health public systems, due to its ability to allow managers to develop strategic planning to avoid possible epidemics. In this paper, a hybrid learning framework is developed to forecast multi-step-ahead (one, two, and three-month-ahead) meningitis cas...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103575

    authors: Ribeiro MHDM, Mariani VC, Coelho LDS

    update_date: 2020-11-01

  • Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    abstract::Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.03.002

    authors: Kumar M, Rath NK, Rath SK

    update_date: 2016-04-01

  • Evaluating performance of early warning indices to predict physiological instabilities.

    abstract::Patient monitoring algorithms that analyze multiple features from physiological signals can produce an index that serves as a predictive or prognostic measure for a specific critical health event or physiological instability. Classical detection metrics such as sensitivity and positive predictive value are often used ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.09.008

    authors: Scully CG, Daluwatte C

    update_date: 2017-11-01

  • Development of the nursing problem list subset of SNOMED CT®.

    abstract:OBJECTIVE:To create an interoperable set of nursing diagnoses for use in the patient problem list in the EHR to support interoperability. DESIGN:Queries for nursing diagnostic concepts were executed against the UMLS Metathesaurus to retrieve all nursing diagnoses across four nursing terminologies where the concept was...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.12.003

    authors: Matney SA, Warren JJ, Evans JL, Kim TY, Coenen A, Auld VA

    update_date: 2012-08-01

  • Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology.

    abstract::The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of acti...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.07.013

    authors: Zheng L, Yumak H, Chen L, Ochs C, Geller J, Kapusnik-Uner J, Perl Y

    update_date: 2017-09-01

  • Vaidurya: a multiple-ontology, concept-based, context-sensitive clinical-guideline search engine.

    abstract::We designed and implemented a generic search engine (Vaidurya), as part of our Digital clinical-Guideline Library (DeGeL) framework. Two search methods were implemented in addition to full-text search: (1) concept-based search, which relies on pre-indexing the guidelines in a clinically meaningful fashion, and (2) con...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2008.07.003

    authors: Moskovitch R, Shahar Y

    update_date: 2009-02-01

  • A comprehensive review of feature based methods for drug target interaction prediction.

    abstract::Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods o...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2019.103159

    authors: Sachdev K, Gupta MK

    update_date: 2019-05-01

  • Serum cancer biomarker discovery through analysis of gene expression data sets across multiple tumor and normal tissues.

    abstract::The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities in cancer research community. Although numerous biomarker candidates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metab...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.08.010

    authors: Jin H, Lee HC, Park SS, Jeong YS, Kim SY

    update_date: 2011-12-01

  • Continuous time Bayesian network classifiers.

    abstract::The class of continuous time Bayesian network classifiers is defined; it solves the problem of supervised classification on multivariate trajectories evolving in continuous time. The trajectory consists of the values of discrete attributes that are measured in continuous time, while the predicted class is expected to ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.07.002

    authors: Stella F, Amer Y

    update_date: 2012-12-01

  • DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx.

    abstract::In Electronic Health Records (EHRs), much of valuable information regarding patients' conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.02.010

    authors: Mehrabi S, Krishnan A, Sohn S, Roch AM, Schmidt H, Kesterson J, Beesley C, Dexter P, Max Schmidt C, Liu H, Palakal M

    update_date: 2015-04-01

  • Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project.

    abstract::The Strategic Health IT Advanced Research Projects (SHARP) Program, established by the Office of the National Coordinator for Health Information Technology in 2010 supports research findings that remove barriers for increased adoption of health IT. The improvements envisioned by the SHARP Area 4 Consortium (SHARPn) wi...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.01.009

    authors: Rea S, Pathak J, Savova G, Oniki TA, Westberg L, Beebe CE, Tao C, Parker CG, Haug PJ, Huff SM, Chute CG

    update_date: 2012-08-01

  • Personal discovery in diabetes self-management: Discovering cause and effect using self-monitoring data.

    abstract:OBJECTIVE:To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. MATERIALS AND METHODS:We conducted an observational qualitative study of d...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.09.013

    authors: Mamykina L, Heitkemper EM, Smaldone AM, Kukafka R, Cole-Lewis HJ, Davidson PG, Mynatt ED, Cassells A, Tobin JN, Hripcsak G

    update_date: 2017-12-01

  • ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

    abstract::Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biome...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2013.10.001

    authors: Kamdar MR, Zeginis D, Hasnain A, Decker S, Deus HF

    update_date: 2014-02-01

  • A novel web informatics approach for automated surveillance of cancer mortality trends.

    abstract::Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation's progress against cancer. The...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2016.03.027

    authors: Tourassi G, Yoon HJ, Xu S

    update_date: 2016-06-01

  • Does the use of structured reporting improve usability? A comparative evaluation of the usability of two approaches for findings reporting in a large-scale telecardiology context.

    abstract::One of the main reasons that leads to a low adoption rate of telemedicine systems is poor usability. An aspect that influences usability during the reporting of findings is the input mode, e.g., if a free-text (FT) or a structured report (SR) interface is employed. The objective of our study is to compare the usabilit...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.07.002

    authors: Lacerda TC, von Wangenheim CG, von Wangenheim A, Giuliano I

    update_date: 2014-12-01

  • Matching patients to clinical trials using semantically enriched document representation.

    abstract::Recruiting eligible patients for clinical trials is crucial for reliably answering specific questions about medical interventions and evaluation. However, clinical trial recruitment is a bottleneck in clinical research and drug development. Our goal is to provide an approach towards automating this manual and time-con...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103406

    authors: Hassanzadeh H, Karimi S, Nguyen A

    update_date: 2020-05-01