A controlled greedy supervised approach for co-reference resolution on clinical text.

Abstract:

Identification of co-referent entity mentions in text is important for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of confusion over the different evaluation metrics and partly because well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls the generation of less informative or sub-optimal training and test instances. The approach also introduces some aggressive greedy strategies for chain clustering. The proposed approach has been tested on the official test corpus of the recently held i2b2/VA 2011 challenge. It achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B³ and CEAF scores). These results are comparable to those of the best systems in the challenge. What makes our proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores on each of the individual metrics rather than being biased towards a particular metric.
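The headline figure of 0.895 is an unweighted mean of three F1 scores. As a hedged illustration of how such an average can be computed, the sketch below implements MUC, B³ and a mention-based CEAF (φ3 similarity, with a brute-force optimal alignment) for small toy chains. It assumes the key and response chains cover the same gold mention set, as when gold concept mentions are supplied. The function names are hypothetical, this is not the authors' code, and the official i2b2/VA scorer may differ in details such as the CEAF variant and singleton handling.

```python
from itertools import permutations
from typing import FrozenSet, List

Chain = FrozenSet[str]
Chains = List[Chain]


def _f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def _chain_of(chains: Chains, m: str) -> Chain:
    # Chain containing mention m, or a singleton if m is absent from `chains`.
    return next((c for c in chains if m in c), frozenset([m]))


def muc_f1(key: Chains, response: Chains) -> float:
    def link_score(gold: Chains, pred: Chains) -> float:
        # sum(|K| - |partition of K induced by pred|) / sum(|K| - 1)
        num = den = 0
        for k in gold:
            parts = {_chain_of(pred, m) for m in k}
            num += len(k) - len(parts)
            den += len(k) - 1
        return num / den if den else 0.0

    # precision scores response links against key; recall the reverse
    return _f1(link_score(response, key), link_score(key, response))


def b_cubed_f1(key: Chains, response: Chains) -> float:
    def avg_overlap(gold: Chains, pred: Chains) -> float:
        # Per-mention overlap |G_m & P_m| / |G_m|, averaged over gold mentions.
        mentions = [m for c in gold for m in c]
        total = sum(
            len(_chain_of(gold, m) & _chain_of(pred, m)) / len(_chain_of(gold, m))
            for m in mentions
        )
        return total / len(mentions) if mentions else 0.0

    return _f1(avg_overlap(response, key), avg_overlap(key, response))


def ceaf_m_f1(key: Chains, response: Chains) -> float:
    # Mention-based CEAF: best one-to-one chain alignment maximising shared mentions.
    # Brute force over injective mappings, so only suitable for toy inputs.
    small, large = (key, response) if len(key) <= len(response) else (response, key)
    best = max(
        sum(len(small[i] & large[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(large)), len(small))
    )
    n_key = sum(len(c) for c in key)
    n_resp = sum(len(c) for c in response)
    return _f1(best / n_resp if n_resp else 0.0, best / n_key if n_key else 0.0)


def unweighted_average_f1(key: Chains, response: Chains) -> float:
    scores = (muc_f1(key, response), b_cubed_f1(key, response), ceaf_m_f1(key, response))
    return sum(scores) / len(scores)


key = [frozenset("abc"), frozenset("de")]
response = [frozenset("ab"), frozenset("cde")]
print(round(unweighted_average_f1(key, response), 4))  # 0.7333
```

In this toy case the three metrics disagree (MUC 0.667, B³ 0.733, CEAF 0.8), which illustrates why the abstract reports per-metric scores alongside the average.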

journal_name

J Biomed Inform

authors

Chowdhury MF,Zweigenbaum P

doi

10.1016/j.jbi.2013.03.007

subject

Has Abstract

pub_date

2013-06-01 00:00:00

pages

506-15

issue

3

eissn

1532-0480

issn

1532-0464

pii

S1532-0464(13)00041-5

journal_volume

46

pub_type

Journal Article
  • The use of logic relationships to model colon cancer gene expression networks with mRNA microarray data.

    abstract::The ultimate goal of genomics research is to describe the network of molecules and interactions that govern all biological functions and disease processes in cells. Nonlinear interactions among genes in terms of their logic relationships play a key role for deciphering the networks of molecules that underlie cellular ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2007.11.006

    authors: Ruan X,Wang J,Li H,Perozzi RE,Perozzi EF

    update_date: 2008-08-01 00:00:00

  • Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text.

    abstract::We address the assignment of ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We leverage a deep neural network that combines word embeddings, recurrent units, and neu...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.02.011

    authors: Duarte F,Martins B,Pinto CS,Silva MJ

    update_date: 2018-04-01 00:00:00

  • Clinical coverage of an archetype repository over SNOMED-CT.

    abstract::Clinical archetypes provide a means for health professionals to design what should be communicated as part of an Electronic Health Record (EHR). An ever-growing number of archetype definitions follow this health information modelling approach, and this international archetype resource will eventually cover a large num...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.12.001

    authors: Yu S,Berry D,Bisbal J

    update_date: 2012-06-01 00:00:00

  • Predicting the function of transplanted kidney in long-term care processes: Application of a hybrid model.

    abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103116

    authors: Rashidi Khazaee P,Bagherzadeh M J,Niazkhani Z,Pirnejad H

    update_date: 2019-03-01 00:00:00

  • The Counterfactual χ-GAN: Finding comparable cohorts in observational health data.

    abstract::Causal inference often relies on the counterfactual framework, which requires that treatment assignment is independent of the outcome, known as strong ignorability. Approaches to enforcing strong ignorability in causal analyses of observational data include weighting and matching methods. Effect estimates, such as the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103515

    authors: Averitt AJ,Vanitchanant N,Ranganath R,Perotte AJ

    update_date: 2020-09-01 00:00:00

  • ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU.

    abstract::To improve the performance of Intensive Care Units (ICUs), the field of bio-statistics has developed scores which try to predict the likelihood of negative outcomes. These help evaluate the effectiveness of treatments and clinical practice, and also help to identify patients with unexpected outcomes. However, they hav...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103269

    authors: Caicedo-Torres W,Gutierrez J

    update_date: 2019-10-01 00:00:00

  • Enhancing phylogeography by improving geographical information from GenBank.

    abstract::Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as location of a species or viral host is as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2011.06.005

    authors: Scotch M,Sarkar IN,Mei C,Leaman R,Cheung KH,Ortiz P,Singraur A,Gonzalez G

    update_date: 2011-12-01 00:00:00

  • Induction of comprehensible models for gene expression datasets by subgroup discovery methodology.

    abstract::Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and shortage of available examples (observations). To avoid this pitfall and achieve pr...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2004.07.007

    authors: Gamberger D,Lavrac N,Zelezný F,Tolar J

    update_date: 2004-08-01 00:00:00

  • A novel method to estimate the indirect community benefit of HIV interventions using a microsimulation model of HIV disease.

    abstract:BACKGROUND:Microsimulation models of human immunodeficiency virus (HIV) disease that simulate individual patients one at a time and assess clinical and economic outcomes of HIV interventions often provide key details regarding direct individual clinical benefits ("individual benefit"), but they may lack detail on trans...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103475

    authors: Kazemian P,Costantini S,Neilan AM,Resch SC,Walensky RP,Weinstein MC,Freedberg KA

    update_date: 2020-07-01 00:00:00

  • Classification of ADHD with bi-objective optimization.

    abstract::Attention Deficit Hyperactive Disorder (ADHD) is one of the most common diseases in school aged children. In this paper, we consider using fMRI data with classification techniques to aid the diagnosis of ADHD and propose a bi-objective ADHD classification scheme based on L1-norm support vector machine (SVM). In our cl...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.011

    authors: Shao L,Xu Y,Fu D

    update_date: 2018-08-01 00:00:00

  • Extending the Fellegi-Sunter probabilistic record linkage method for approximate field comparators.

    abstract::Probabilistic record linkage is a method commonly used to determine whether demographic records refer to the same person. The Fellegi-Sunter method is a probabilistic approach that uses field weights based on log likelihood ratios to determine record similarity. This paper introduces an extension of the Fellegi-Sunter...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.08.004

    authors: DuVall SL,Kerber RA,Thomas A

    update_date: 2010-02-01 00:00:00

  • Risk factor detection for heart disease by applying text analytics in electronic medical records.

    abstract::In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics hav...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.08.011

    authors: Torii M,Fan JW,Yang WL,Lee T,Wiley MT,Zisook DS,Huang Y

    update_date: 2015-12-01 00:00:00

  • Toward national comparable nurse practitioner data: proposed data elements, rationale, and methods.

    abstract::Federal funds have supported Nurse Practitioner (NP) education and the establishment of nurse-managed centers. Yet, important questions are raised about the quality and appropriate scope of NP care. Few NP-patient encounters are documented in the largest national surveys of ambulatory care, sponsored by the National C...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2003.09.016

    authors: Jenkins ML

    update_date: 2003-08-01 00:00:00

  • A new framework for the selection of tag SNPs by multimarker haplotypes.

    abstract::This paper proposes a new framework for the selection of tag SNPs based on haplotypes instead of on a single SNP. The tag SNPs found by this framework form a set of haplotypes completely predictive of the alleles of all untyped SNPs. We refer to this problem as MTMH, which is defined as follows: given a set of SNPs, f...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2008.04.003

    authors: Huang YT,Chao KM

    update_date: 2008-12-01 00:00:00

  • Grounding a new information technology implementation framework in behavioral science: a systematic analysis of the literature on IT use.

    abstract::Many interventions to improve the success of information technology (IT) implementations are grounded in behavioral science, using theories and models to identify conditions and determinants of successful use. However, each model in the IT literature has evolved to address specific theoretical problems of particular ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2003.09.002

    authors: Kukafka R,Johnson SB,Linfante A,Allegrante JP

    update_date: 2003-06-01 00:00:00

  • Towards an on-demand peer feedback system for a clinical knowledge base: a case study with order sets.

    abstract:OBJECTIVE:We have developed an automated knowledge base peer feedback system as part of an effort to facilitate the creation and refinement of sound clinical knowledge content within an enterprise-wide knowledge base. The program collects clinical data stored in our Clinical Data Repository during usage of a physician ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2007.05.006

    authors: Hulse NC,Del Fiol G,Bradshaw RL,Roemer LK,Rocha RA

    update_date: 2008-02-01 00:00:00

  • Annotating risk factors for heart disease in clinical narratives for diabetic patients.

    abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 29...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.05.009

    authors: Stubbs A,Uzuner Ö

    update_date: 2015-12-01 00:00:00

  • Feature selection techniques for maximum entropy based biomedical named entity recognition.

    abstract::Named entity recognition is an extremely important and fundamental task of biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc., which often have complex structures, but it is challenging to identify and classify such entities. Machine learning methods like CRF, MEMM and ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2008.12.012

    authors: Saha SK,Sarkar S,Mitra P

    update_date: 2009-10-01 00:00:00

  • Relevance feedback for enhancing content based image retrieval and automatic prediction of semantic image features: Application to bone tumor radiographs.

    abstract:BACKGROUND:The majority of current medical CBIR systems perform retrieval based only on "imaging signatures" generated by extracting pixel-level quantitative features, and only rarely has a feedback mechanism been incorporated to improve retrieval performance. In addition, current medical CBIR approaches do not routine...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.07.002

    authors: Banerjee I,Kurtz C,Devorah AE,Do B,Rubin DL,Beaulieu CF

    update_date: 2018-08-01 00:00:00

  • Game-based interventions for neuropsychological assessment, training and rehabilitation: Which game-elements to use? A systematic review.

    abstract::Game-based interventions (GBI) have been used to promote health-related outcomes, including cognitive functions. Criteria for game-elements (GE) selection are insufficiently characterized in terms of their adequacy to patients' clinical conditions or targeted cognitive outcomes. This study aimed to identify GE applied...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2019.103287

    authors: Ferreira-Brito F,Fialho M,Virgolino A,Neves I,Miranda AC,Sousa-Santos N,Caneiras C,Carriço L,Verdelho A,Santos O

    update_date: 2019-10-01 00:00:00

  • Unleashing genotypes in epidemiology - A novel method for managing high throughput information.

    abstract::The large amounts of data generated when high-throughput genotyping methods are used in large-scale epidemiological studies (>10,000 participants) present an enormous challenge to researchers in terms of structured data management. In order to face these challenges, a system has been designed and implemented where gen...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2009.07.005

    authors: Olund G,Brinne A,Lindqvist P,Litton JE

    update_date: 2009-12-01 00:00:00

  • DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx.

    abstract::In Electronic Health Records (EHRs), much valuable information regarding patients' conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2015.02.010

    authors: Mehrabi S,Krishnan A,Sohn S,Roch AM,Schmidt H,Kesterson J,Beesley C,Dexter P,Max Schmidt C,Liu H,Palakal M

    update_date: 2015-04-01 00:00:00

  • The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships.

    abstract::Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2012.04.004

    authors: van Mulligen EM,Fourrier-Reglat A,Gurwitz D,Molokhia M,Nieto A,Trifiro G,Kors JA,Furlong LI

    update_date: 2012-10-01 00:00:00

  • Medical speciality classification system based on binary particle swarms and ensemble of one vs. rest support vector machines.

    abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2020.103525

    authors: Faris H,Habib M,Faris M,Alomari M,Alomari A

    update_date: 2020-09-01 00:00:00

  • Impact of an electronic handoff documentation tool on team shared mental models in pediatric critical care.

    abstract:OBJECTIVE:To examine the impact of the implementation of an electronic handoff tool (the Handoff Tool) on shared mental models (SMM) within patient care teams as measured by content overlap and discrepancies in verbal handoff presentations given by different clinicians caring for the same patient. MATERIALS AND METHOD...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2017.03.004

    authors: Jiang SY,Murphy A,Heitkemper EM,Hum RS,Kaufman DR,Mamykina L

    update_date: 2017-05-01 00:00:00

  • Methodological variations in lagged regression for detecting physiologic drug effects in EHR data.

    abstract::We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined the effect of methodological variations ((i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) diffe...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2018.08.014

    authors: Levine ME,Albers DJ,Hripcsak G

    update_date: 2018-10-01 00:00:00

  • Medical diagnosis of atherosclerosis from Carotid Artery Doppler Signals using principal component analysis (PCA), k-NN based weighting pre-processing and Artificial Immune Recognition System (AIRS).

    abstract::In this study, we proposed a new medical diagnosis system based on principal component analysis (PCA), k-NN based weighting pre-processing, and Artificial Immune Recognition System (AIRS) for diagnosis of atherosclerosis from Carotid Artery Doppler Signals. The suggested system consists of four stages. First, in the f...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2007.04.001

    authors: Latifoğlu F,Polat K,Kara S,Güneş S

    update_date: 2008-02-01 00:00:00

  • Creating hospital-specific customized clinical pathways by applying semantic reasoning to clinical data.

    abstract:OBJECTIVE:Clinical pathways (CPs) are widely studied methods to standardize clinical intervention and improve medical quality. However, standard care plans defined in current CPs are too general to execute in a practical healthcare environment. The purpose of this study was to create hospital-specific personalized CPs ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2014.07.017

    authors: Wang HQ,Zhou TS,Tian LL,Qian YM,Li JS

    update_date: 2014-12-01 00:00:00

  • Algorithms for rapid outbreak detection: a research synthesis.

    abstract::The threat of bioterrorism has stimulated interest in enhancing public health surveillance to detect disease outbreaks more rapidly than is currently possible. To advance research on improving the timeliness of outbreak detection, the Defense Advanced Research Project Agency sponsored the Bio-event Advanced Leading In...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article

    doi:10.1016/j.jbi.2004.11.007

    authors: Buckeridge DL,Burkom H,Campbell M,Hogan WR,Moore AW

    update_date: 2005-04-01 00:00:00

  • Homology assessment and molecular sequence alignment.

    abstract::Hypotheses of homology are the basis of phylogenetic analysis. All character data are considered to be equivalent regardless of the source of those characters. Putative homology statements are designated based on observations of similarity. Pairwise sequence alignment using the Needleman-Wunsch algorithm is the basis ...

    journal_title:Journal of biomedical informatics

    pub_type: Journal Article, Review

    doi:10.1016/j.jbi.2005.11.005

    authors: Phillips AJ

    update_date: 2006-02-01 00:00:00