Abstract:
BACKGROUND: Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching or listening. However, health-related terms are often misspelled in such sources because of their complex morphology, so relevant data are excluded from studies. In this paper, we present a customizable, data-centric system that automatically generates common misspellings of complex health-related terms and can thereby improve data collection from noisy text sources.
MATERIALS AND METHODS: The spelling variant generator relies on a dense vector model learned from large, unlabeled text. The model is used to find terms semantically close to the original (seed) keyword, and terms that are lexically dissimilar beyond a given threshold are then filtered out. The process is executed recursively and converges when no new terms similar (lexically and semantically) to the seed keyword are found. Weighting of intra-word character sequence similarities allows further problem-specific customization of the system.
RESULTS: On a dataset prepared for this study, our system outperforms the current state-of-the-art medication name variant generator, with a best F1-score of 0.69 and F1/4-score of 0.78. Extrinsic evaluation of the system on a set of cancer-related terms demonstrated an increase of over 67% in retrieval rate from Twitter posts when the generated variants are included.
DISCUSSION: Our proposed spelling variant generator has several advantages over past spelling variant generators: (i) it is capable of filtering out lexically similar but semantically dissimilar terms, (ii) the number of variants generated is low, as many low-frequency and ambiguous misspellings are filtered out, and (iii) the system is fully automatic, customizable and easily executable. While the base system is fully unsupervised, we show how supervision may be employed to adjust weights for task-specific customizations.
CONCLUSION: The performance and relative simplicity of our proposed approach make it a much-needed spelling variant generation resource for health-related text mining from noisy sources. The source code for the system has been made publicly available for research.
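The recursive procedure outlined under MATERIALS AND METHODS can be illustrated with a minimal sketch. This is not the authors' released code: the toy two-dimensional vectors, the function names (`generate_variants`, `nearest_terms`), the `top_n` cut-off and the use of Python's `difflib.SequenceMatcher` ratio as the lexical similarity measure are all assumptions made for illustration, and the weighting of intra-word character sequences mentioned in the abstract is not modelled here.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy embeddings; a real system would load vectors learned
# from large, unlabeled text.
TOY_VECTORS = {
    "chemotherapy": (1.0, 0.0),
    "chemotheraphy": (0.996, 0.087),
    "chemotherpy": (0.978, 0.208),
    "radiation": (0.995, -0.105),   # semantically close, lexically distant
    "chemistry": (0.0, 1.0),        # lexically closer, semantically distant
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(term, vectors, top_n):
    """Return up to top_n vocabulary terms most similar to `term`."""
    if term not in vectors:
        return []
    scored = [(cosine(vectors[term], vec), other)
              for other, vec in vectors.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


def generate_variants(seed, vectors, top_n=2, lexical_threshold=0.8):
    """Expand `seed` until no new semantically and lexically close term appears."""
    variants = set()
    frontier = [seed]
    while frontier:  # an empty frontier means the process has converged
        term = frontier.pop()
        for candidate in nearest_terms(term, vectors, top_n):
            if candidate == seed or candidate in variants:
                continue
            # Assumed lexical filter: difflib ratio measured against the seed.
            lexical = SequenceMatcher(None, seed, candidate).ratio()
            if lexical >= lexical_threshold:
                variants.add(candidate)
                frontier.append(candidate)
    return variants


if __name__ == "__main__":
    print(sorted(generate_variants("chemotherapy", TOY_VECTORS)))
```

With these toy vectors, "radiation" is a close semantic neighbour of the seed but is dropped by the lexical filter, while "chemotherpy" is only reached in a second pass through the neighbours of "chemotheraphy", which is the role the recursion plays.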
journal_name:J Biomed Inform
journal_title:Journal of biomedical informatics
authors:Sarker A, Gonzalez-Hernandez G
doi:10.1016/j.jbi.2018.11.007
subject:Has Abstract
pub_date:2018-12-01 00:00:00
pages:98-107
eissn:1532-0464
issn:1532-0480
pii:S1532-0464(18)30216-8
journal_volume:88
pub_type: Journal Article
abstract::A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metada...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2017.06.017
update_date:2017-08-01 00:00:00
abstract::To extract biomedical information about bio-entities from the huge amount of biomedical literature, the first key step is recognizing their names in this literature, which remains a challenging task due to the irregularities and ambiguities in bio-entity nomenclature. The recognition performances of the current po...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2008.01.002
update_date:2008-08-01 00:00:00
abstract::Big data technologies are critical to the medical field, which requires new frameworks to leverage them. Such frameworks would help medical experts test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasi...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2015.12.005
update_date:2016-02-01 00:00:00
abstract::The explosive growth of biomedical literature has created a rich source of knowledge, such as that on protein-protein interactions (PPIs) and drug-drug interactions (DDIs), locked in unstructured free text. Biomedical relation classification aims to automatically detect and classify biomedical relations, which has gre...
journal_title:Journal of biomedical informatics
pub_type: Journal Article, Review
doi:10.1016/j.jbi.2019.103294
update_date:2019-11-01 00:00:00
abstract::Hypotheses of homology are the basis of phylogenetic analysis. All character data are considered to be equivalent regardless of the source of those characters. Putative homology statements are designated based on observations of similarity. Pairwise sequence alignment using the Needleman-Wunsch algorithm is the basis ...
journal_title:Journal of biomedical informatics
pub_type: Journal Article, Review
doi:10.1016/j.jbi.2005.11.005
update_date:2006-02-01 00:00:00
abstract::The threat of bioterrorism has stimulated interest in enhancing public health surveillance to detect disease outbreaks more rapidly than is currently possible. To advance research on improving the timeliness of outbreak detection, the Defense Advanced Research Project Agency sponsored the Bio-event Advanced Leading In...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2004.11.007
update_date:2005-04-01 00:00:00
abstract::Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly beca...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2013.03.007
update_date:2013-06-01 00:00:00
abstract:OBJECTIVE:To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. MATERIALS AND METHODS:We conducted an observational qualitative study of d...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2017.09.013
update_date:2017-12-01 00:00:00
abstract::We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the...
journal_title:Journal of biomedical informatics
pub_type: Journal Article, Review
doi:10.1016/j.jbi.2017.07.012
update_date:2017-09-01 00:00:00
abstract::This work investigates whether openEHR, with its reference model, archetypes and templates, is suitable for the digital representation of demographic as well as clinical data. Moreover, it elaborates openEHR as a tool for modelling Hospital Information Systems on a regional level based on a national logical infrastruct...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2015.04.004
update_date:2015-06-01 00:00:00
abstract::One of the main reasons for the low adoption rate of telemedicine systems is poor usability. An aspect that influences usability during the reporting of findings is the input mode, e.g., whether a free-text (FT) or a structured report (SR) interface is employed. The objective of our study is to compare the usabilit...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2014.07.002
update_date:2014-12-01 00:00:00
abstract:OBJECTIVE:The timely acknowledgement of critical patient clinical reports is vital for the delivery of safe patient care. With current EHR systems, critical reports reside on different screens. This leads to treatment delays and inefficient work flows. As a remedy, the R.A.P.I.D. (Root Aggregated Prioritized Informatio...
journal_title:Journal of biomedical informatics
pub_type: Journal Article, Randomized Controlled Trial
doi:10.1016/j.jbi.2016.04.001
update_date:2016-06-01 00:00:00
abstract::Clinical pathways are used to guide clinicians to provide a standardised delivery of care. Because of their standardisation, the aim of clinical pathways is to reduce variation in both care process and patient outcomes. When learning clinical pathways from data through data mining, it is common practice to represent e...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2020.103668
update_date:2021-01-27 00:00:00
abstract::In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds 108.9 billion dollars. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advancements in text analytics hav...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2015.08.011
update_date:2015-12-01 00:00:00
abstract:BACKGROUND:Bluetooth low energy (BLE) beacons have been used to track the locations of individuals in indoor environments for clinical applications such as workflow analysis and infectious disease modelling. Most current approaches use the received signal strength indicator (RSSI) to track locations. When using the RSS...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2019.103288
update_date:2019-10-01 00:00:00
abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2019.103116
update_date:2019-03-01 00:00:00
abstract::One of the major problems in genomics and medicine is the identification of gene networks and pathways deregulated in complex and polygenic diseases, like cancer. In this paper, we address the problem of assessing the variability of results of pathways analysis identified in different and independent genome wide expre...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2009.09.005
update_date:2010-06-01 00:00:00
abstract::Resources on the human musculoskeletal system are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot meet the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical ...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2012.11.001
update_date:2013-02-01 00:00:00
abstract::SNPs (Single Nucleotide Polymorphisms) account for millions of variations in the human genome and are therefore promising tools for disease-gene association studies. However, such studies are constrained by the high expense of genotyping millions of SNPs. For this reason, it is necessary to obtain a suitable subset of SN...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2012.12.002
update_date:2013-04-01 00:00:00
abstract:BACKGROUND:Personal health information is a valuable resource to the advancement of research. In order to achieve a comprehensive reform of data infrastructure in Australia, both public engagement and building social trust is vital. In light of this, we conducted a study to explore the opinions, perceived risks and tru...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2019.103222
update_date:2019-07-01 00:00:00
abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 29...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2015.05.009
update_date:2015-12-01 00:00:00
abstract::One of the challenging problems in drug discovery is to identify the novel targets for drugs. Most of the traditional methods for drug targets optimization focused on identifying the particular families of "druggable targets", but ignored their topological properties based on the biological pathways. In this study, we...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2015.02.007
update_date:2015-04-01 00:00:00
abstract:OBJECTIVE:We have developed an automated knowledge base peer feedback system as part of an effort to facilitate the creation and refinement of sound clinical knowledge content within an enterprise-wide knowledge base. The program collects clinical data stored in our Clinical Data Repository during usage of a physician ...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2007.05.006
update_date:2008-02-01 00:00:00
abstract::Stomach cancer is one of the leading causes of cancer-related deaths worldwide. More than 80% of diagnoses of this cancer occur at later stages, leading to a low 5-year survival rate. This emphasizes the need for better prognostic techniques for stomach cancer. In this regard, the Next-Generation Sequencing of whole gen...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2019.103254
update_date:2019-09-01 00:00:00
abstract::In this study, we proposed a new medical diagnosis system based on principal component analysis (PCA), k-NN based weighting pre-processing, and Artificial Immune Recognition System (AIRS) for diagnosis of atherosclerosis from Carotid Artery Doppler Signals. The suggested system consists of four stages. First, in the f...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2007.04.001
update_date:2008-02-01 00:00:00
abstract::The Guideline Interchange Format (GLIF) is a model for representation of sharable computer-interpretable guidelines. The current version of GLIF (GLIF3) is a substantial update and enhancement of the model since the previous version (GLIF2). GLIF3 enables encoding of a guideline at three levels: a conceptual flowchart...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2004.04.002
update_date:2004-06-01 00:00:00
abstract::A 1998 paper that delineated desirable characteristics, or desiderata for controlled medical terminologies attempted to summarize emerging consensus regarding structural issues of such terminologies. Among the Desiderata was a call for terminologies to be "concept oriented." Since then, research has trended toward the...
journal_title:Journal of biomedical informatics
pub_type: Comment, Journal Article
doi:10.1016/j.jbi.2005.11.008
update_date:2006-06-01 00:00:00
abstract::In many respects, the critical care workplace resembles a paradigmatic complex system: on account of the dynamic and interactive nature of collaborative clinical work, these settings are characterized by non-linear, inter-dependent and emergent activities. Developing a comprehensive understanding of the work activitie...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2011.02.007
update_date:2011-06-01 00:00:00
abstract::Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons lead to the need of privacy preserving integration concepts. In this article, we focus on anonymization, which plays an important role for ...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2013.12.002
update_date:2014-08-01 00:00:00
abstract::Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently r...
journal_title:Journal of biomedical informatics
pub_type: Journal Article
doi:10.1016/j.jbi.2011.08.009
update_date:2011-12-01 00:00:00