Abstract:
:A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithm increases due of the decreasing of dimensionality of the unique values of these elements (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.
journal_name
J Biomed Informjournal_title
Journal of biomedical informaticsauthors
Panahiazar M,Dumontier M,Gevaert Odoi
10.1016/j.jbi.2017.06.017subject
Has Abstractpub_date
2017-08-01 00:00:00pages
132-139eissn
1532-0464issn
1532-0480pii
S1532-0464(17)30140-5journal_volume
72pub_type
杂志文章abstract::Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biome...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.10.001
更新日期:2014-02-01 00:00:00
abstract:OBJECTIVE:We have developed an automated knowledge base peer feedback system as part of an effort to facilitate the creation and refinement of sound clinical knowledge content within an enterprise-wide knowledge base. The program collects clinical data stored in our Clinical Data Repository during usage of a physician ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2007.05.006
更新日期:2008-02-01 00:00:00
abstract::According to Ministry of Health and Welfare of Taiwan, cancer has been one of the major causes of death in Taiwan since 1982. The Intensive-Modulated Radiation Therapy (IMRT) is one of the most important radiotherapies of cancers, especially for Nasopharyngeal cancers, Digestive system cancers and Cervical cancers. Fo...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103499
更新日期:2020-08-01 00:00:00
abstract::Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin, and identify elements-e.g. genes, compounds, pathways-that lead to interesting correlations. Normally, they must refer to additional public da...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.02.004
更新日期:2016-04-01 00:00:00
abstract:OBJECTIVE:To report the results of a systematic literature review concerning the security and privacy of electronic health record (EHR) systems. DATA SOURCES:Original articles written in English found in MEDLINE, ACM Digital Library, Wiley InterScience, IEEE Digital Library, Science@Direct, MetaPress, ERIC, CINAHL and...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2012.12.003
更新日期:2013-06-01 00:00:00
abstract::Clinical archetypes provide a means for health professionals to design what should be communicated as part of an Electronic Health Record (EHR). An ever-growing number of archetype definitions follow this health information modelling approach, and this international archetype resource will eventually cover a large num...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.12.001
更新日期:2012-06-01 00:00:00
abstract::The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of acti...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2017.07.013
更新日期:2017-09-01 00:00:00
abstract::The large amounts of data generated when high-throughput genotyping methods are used in large-scale epidemiological studies (>10,000 participants) present an enormous challenge to researchers in terms of structured data management. In order to face these challenges, a system has been designed and implemented where gen...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.07.005
更新日期:2009-12-01 00:00:00
abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 29...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.05.009
更新日期:2015-12-01 00:00:00
abstract::Stomach cancer is one of the leading causes of cancer-related deaths worldwide. More than 80% diagnosis of this cancer occur at later stages leading to low 5-year survival rate. This emphasizes the need to have better prognostic techniques for stomach cancer. In this regard, the Next-Generation Sequencing of whole gen...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103254
更新日期:2019-09-01 00:00:00
abstract::Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) wit...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.01.009
更新日期:2016-04-01 00:00:00
abstract::With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing fulfill...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.04.007
更新日期:2009-02-01 00:00:00
abstract::Over the last 8 years, the National Cancer Institute (NCI) has launched a major effort to integrate molecular and clinical cancer-related information within a unified biomedical informatics framework, with controlled terminology as its foundational layer. The NCI Thesaurus is the reference terminology underpinning the...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2006.02.013
更新日期:2007-02-01 00:00:00
abstract::The importance of including patient preferences in decisions regarding their care has received increased emphasis over recent years. Medical informatics can play an important role in improving patient-centered care by developing decision support systems to support the inclusion of patient preferences in clinical decis...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/s1532-0464(03)00037-6
更新日期:2002-10-01 00:00:00
abstract::Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods o...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2019.103159
更新日期:2019-05-01 00:00:00
abstract::Molecular Property Diagnostic Suite - Diabetes Mellitus (MPDSDM) is a Galaxy-based, open source disease-specific web portal for diabetes. It consists of three modules namely (i) data library (ii) data processing and (iii) data analysis tools. The data library (target library and literature) module provide extensive an...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.08.003
更新日期:2018-09-01 00:00:00
abstract:BACKGROUND:Correlation of data within electronic health records is necessary for implementation of various clinical decision support functions, including patient summarization. A key type of correlation is linking medications to clinical problems; while some databases of problem-medication links are available, they are...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2013.11.010
更新日期:2014-04-01 00:00:00
abstract:BACKGROUND:Personal health information is a valuable resource to the advancement of research. In order to achieve a comprehensive reform of data infrastructure in Australia, both public engagement and building social trust is vital. In light of this, we conducted a study to explore the opinions, perceived risks and tru...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103222
更新日期:2019-07-01 00:00:00
abstract::The explosive growth of biomedical literature has created a rich source of knowledge, such as that on protein-protein interactions (PPIs) and drug-drug interactions (DDIs), locked in unstructured free text. Biomedical relation classification aims to automatically detect and classify biomedical relations, which has gre...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章,评审
doi:10.1016/j.jbi.2019.103294
更新日期:2019-11-01 00:00:00
abstract::The Radiology Gamuts Ontology (RGO)-an ontology of diseases, interventions, and imaging findings-was developed to aid in decision support, education, and translational research in diagnostic radiology. The ontology defines a subsumption (is_a) relation between more general and more specific terms, and a causal relatio...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2016.03.015
更新日期:2016-06-01 00:00:00
abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.07.020
更新日期:2015-12-01 00:00:00
abstract::One of the major problems in genomics and medicine is the identification of gene networks and pathways deregulated in complex and polygenic diseases, like cancer. In this paper, we address the problem of assessing the variability of results of pathways analysis identified in different and independent genome wide expre...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.09.005
更新日期:2010-06-01 00:00:00
abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2015.06.014
更新日期:2015-12-01 00:00:00
abstract::Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death worldwide and represents one of the major causes of chronic morbidity. Cigarette smoking is the most important risk factor for COPD. In these patients, the airflow limitation is caused by a mixture of small airways disease and parenchyma...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2009.05.008
更新日期:2009-12-01 00:00:00
abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2019.103116
更新日期:2019-03-01 00:00:00
abstract:BACKGROUND:One of the significant problems in the field of healthcare is the low survival rate of people who have experienced sudden cardiac arrest. Early prediction of cardiac arrest can provide the time required for intervening and preventing its onset in order to reduce mortality. Traditional statistical methods hav...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2018.10.008
更新日期:2018-12-01 00:00:00
abstract::Named entity recognition is an extremely important and fundamental task of biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc which often have complex structures, but it is challenging to identify and classify such entities. Machine learning methods like CRF, MEMM and ...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2008.12.012
更新日期:2009-10-01 00:00:00
abstract:OBJECTIVE:Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information sys...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2011.01.005
更新日期:2011-04-01 00:00:00
abstract::Computer simulations have been used to model infectious diseases to examine the outcomes of alternative strategies for managing their spread. Methicillin resistant Staphylococcus aureus (MRSA) skin and soft tissue infections have become prominent in many communities and efforts are underway to reduce the spread of thi...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2010.03.013
更新日期:2010-08-01 00:00:00
abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...
journal_title:Journal of biomedical informatics
pub_type: 杂志文章
doi:10.1016/j.jbi.2020.103525
更新日期:2020-09-01 00:00:00