Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

Abstract:

:A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithm increases due of the decreasing of dimensionality of the unique values of these elements (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.

journal_name

J Biomed Inform

authors

Panahiazar M,Dumontier M,Gevaert O

doi

10.1016/j.jbi.2017.06.017

subject

Has Abstract

pub_date

2017-08-01 00:00:00

pages

132-139

eissn

1532-0464

issn

1532-0480

pii

S1532-0464(17)30140-5

journal_volume

72

pub_type

杂志文章
  • ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

    abstract::Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biome...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2013.10.001

    authors: Kamdar MR,Zeginis D,Hasnain A,Decker S,Deus HF

    更新日期:2014-02-01 00:00:00

  • Towards an on-demand peer feedback system for a clinical knowledge base: a case study with order sets.

    abstract:OBJECTIVE:We have developed an automated knowledge base peer feedback system as part of an effort to facilitate the creation and refinement of sound clinical knowledge content within an enterprise-wide knowledge base. The program collects clinical data stored in our Clinical Data Repository during usage of a physician ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2007.05.006

    authors: Hulse NC,Del Fiol G,Bradshaw RL,Roemer LK,Rocha RA

    更新日期:2008-02-01 00:00:00

  • Utilizing online stochastic optimization on scheduling of intensity-modulate radiotherapy therapy (IMRT).

    abstract::According to Ministry of Health and Welfare of Taiwan, cancer has been one of the major causes of death in Taiwan since 1982. The Intensive-Modulated Radiation Therapy (IMRT) is one of the most important radiotherapies of cancers, especially for Nasopharyngeal cancers, Digestive system cancers and Cervical cancers. Fo...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2020.103499

    authors: Chang WH,Lo SM,Chen TL,Chen JC,Wu HN

    更新日期:2020-08-01 00:00:00

  • A method and software framework for enriching private biomedical sources with data from public online repositories.

    abstract::Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin, and identify elements-e.g. genes, compounds, pathways-that lead to interesting correlations. Normally, they must refer to additional public da...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.02.004

    authors: Anguita A,García-Remesal M,Graf N,Maojo V

    更新日期:2016-04-01 00:00:00

  • Security and privacy in electronic health records: a systematic literature review.

    abstract:OBJECTIVE:To report the results of a systematic literature review concerning the security and privacy of electronic health record (EHR) systems. DATA SOURCES:Original articles written in English found in MEDLINE, ACM Digital Library, Wiley InterScience, IEEE Digital Library, Science@Direct, MetaPress, ERIC, CINAHL and...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章,评审

    doi:10.1016/j.jbi.2012.12.003

    authors: Fernández-Alemán JL,Señor IC,Lozoya PÁ,Toval A

    更新日期:2013-06-01 00:00:00

  • Clinical coverage of an archetype repository over SNOMED-CT.

    abstract::Clinical archetypes provide a means for health professionals to design what should be communicated as part of an Electronic Health Record (EHR). An ever-growing number of archetype definitions follow this health information modelling approach, and this international archetype resource will eventually cover a large num...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2011.12.001

    authors: Yu S,Berry D,Bisbal J

    更新日期:2012-06-01 00:00:00

  • Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology.

    abstract::The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of acti...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2017.07.013

    authors: Zheng L,Yumak H,Chen L,Ochs C,Geller J,Kapusnik-Uner J,Perl Y

    更新日期:2017-09-01 00:00:00

  • Unleashing genotypes in epidemiology - A novel method for managing high throughput information.

    abstract::The large amounts of data generated when high-throughput genotyping methods are used in large-scale epidemiological studies (>10,000 participants) present an enormous challenge to researchers in terms of structured data management. In order to face these challenges, a system has been designed and implemented where gen...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2009.07.005

    authors: Olund G,Brinne A,Lindqvist P,Litton JE

    更新日期:2009-12-01 00:00:00

  • Annotating risk factors for heart disease in clinical narratives for diabetic patients.

    abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 29...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.05.009

    authors: Stubbs A,Uzuner Ö

    更新日期:2015-12-01 00:00:00

  • Genome-wide analysis of multi-view data of miRNA-seq to identify miRNA biomarkers for stomach cancer.

    abstract::Stomach cancer is one of the leading causes of cancer-related deaths worldwide. More than 80% diagnosis of this cancer occur at later stages leading to low 5-year survival rate. This emphasizes the need to have better prognostic techniques for stomach cancer. In this regard, the Next-Generation Sequencing of whole gen...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103254

    authors: Pant N,Rakshit S,Paul S,Saha I

    更新日期:2019-09-01 00:00:00

  • Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function.

    abstract::Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) wit...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.01.009

    authors: Taslimitehrani V,Dong G,Pereira NL,Panahiazar M,Pathak J

    更新日期:2016-04-01 00:00:00

  • An image score inference system for RNAi genome-wide screening based on fuzzy mixture regression modeling.

    abstract::With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing fulfill...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2008.04.007

    authors: Wang J,Zhou X,Li F,Bradley PL,Chang SF,Perrimon N,Wong ST

    更新日期:2009-02-01 00:00:00

  • NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information.

    abstract::Over the last 8 years, the National Cancer Institute (NCI) has launched a major effort to integrate molecular and clinical cancer-related information within a unified biomedical informatics framework, with controlled terminology as its foundational layer. The NCI Thesaurus is the reference terminology underpinning the...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2006.02.013

    authors: Sioutos N,de Coronado S,Haber MW,Hartel FW,Shaiu WL,Wright LW

    更新日期:2007-02-01 00:00:00

  • Developing, implementing, and evaluating decision support systems for shared decision making in patient care: a conceptual model and case illustration.

    abstract::The importance of including patient preferences in decisions regarding their care has received increased emphasis over recent years. Medical informatics can play an important role in improving patient-centered care by developing decision support systems to support the inclusion of patient preferences in clinical decis...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/s1532-0464(03)00037-6

    authors: Ruland CM,Bakken S

    更新日期:2002-10-01 00:00:00

  • A comprehensive review of feature based methods for drug target interaction prediction.

    abstract::Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods o...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章,评审

    doi:10.1016/j.jbi.2019.103159

    authors: Sachdev K,Gupta MK

    更新日期:2019-05-01 00:00:00

  • Molecular property diagnostic suite for diabetes mellitus (MPDSDM): An integrated web portal for drug discovery and drug repurposing.

    abstract::Molecular Property Diagnostic Suite - Diabetes Mellitus (MPDSDM) is a Galaxy-based, open source disease-specific web portal for diabetes. It consists of three modules namely (i) data library (ii) data processing and (iii) data analysis tools. The data library (target library and literature) module provide extensive an...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2018.08.003

    authors: Gaur AS,Nagamani S,Tanneeru K,Druzhilovskiy D,Rudik A,Poroikov V,Narahari Sastry G

    更新日期:2018-09-01 00:00:00

  • Development of a clinician reputation metric to identify appropriate problem-medication pairs in a crowdsourced knowledge base.

    abstract:BACKGROUND:Correlation of data within electronic health records is necessary for implementation of various clinical decision support functions, including patient summarization. A key type of correlation is linking medications to clinical problems; while some databases of problem-medication links are available, they are...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2013.11.010

    authors: McCoy AB,Wright A,Rogith D,Fathiamini S,Ottenbacher AJ,Sittig DF

    更新日期:2014-04-01 00:00:00

  • Personal health information in research: Perceived risk, trustworthiness and opinions from patients attending a tertiary healthcare facility.

    abstract:BACKGROUND:Personal health information is a valuable resource to the advancement of research. In order to achieve a comprehensive reform of data infrastructure in Australia, both public engagement and building social trust is vital. In light of this, we conducted a study to explore the opinions, perceived risks and tru...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103222

    authors: Krahe M,Milligan E,Reilly S

    更新日期:2019-07-01 00:00:00

  • Neural network-based approaches for biomedical relation classification: A review.

    abstract::The explosive growth of biomedical literature has created a rich source of knowledge, such as that on protein-protein interactions (PPIs) and drug-drug interactions (DDIs), locked in unstructured free text. Biomedical relation classification aims to automatically detect and classify biomedical relations, which has gre...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章,评审

    doi:10.1016/j.jbi.2019.103294

    authors: Zhang Y,Lin H,Yang Z,Wang J,Sun Y,Xu B,Zhao Z

    更新日期:2019-11-01 00:00:00

  • Transitive closure of subsumption and causal relations in a large ontology of radiological diagnosis.

    abstract::The Radiology Gamuts Ontology (RGO)-an ontology of diseases, interventions, and imaging findings-was developed to aid in decision support, education, and translational research in diagnostic radiology. The ontology defines a subsumption (is_a) relation between more general and more specific terms, and a causal relatio...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2016.03.015

    authors: Kahn CE Jr

    更新日期:2016-06-01 00:00:00

  • Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

    abstract::The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.07.020

    authors: Stubbs A,Uzuner Ö

    更新日期:2015-12-01 00:00:00

  • On the reproducibility of results of pathway analysis in genome-wide expression studies of colorectal cancers.

    abstract::One of the major problems in genomics and medicine is the identification of gene networks and pathways deregulated in complex and polygenic diseases, like cancer. In this paper, we address the problem of assessing the variability of results of pathways analysis identified in different and independent genome wide expre...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2009.09.005

    authors: Maglietta R,Distaso A,Piepoli A,Palumbo O,Carella M,D'Addabbo A,Mukherjee S,Ancona N

    更新日期:2010-06-01 00:00:00

  • Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

    abstract:BACKGROUND:The determination of risk factors and their temporal relations in natural language patient records is a complex task which has been addressed in the i2b2/UTHealth 2014 shared task. In this context, in most systems it was broadly decomposed into two sub-tasks implemented by two components: entity detection, a...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2015.06.014

    authors: Grouin C,Moriceau V,Zweigenbaum P

    更新日期:2015-12-01 00:00:00

  • Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes.

    abstract::Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death worldwide and represents one of the major causes of chronic morbidity. Cigarette smoking is the most important risk factor for COPD. In these patients, the airflow limitation is caused by a mixture of small airways disease and parenchyma...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2009.05.008

    authors: Paoletti M,Camiciottoli G,Meoni E,Bigazzi F,Cestelli L,Pistolesi M,Marchesi C

    更新日期:2009-12-01 00:00:00

  • Predicting the function of transplanted kidney in long-term care processes: Application of a hybrid model.

    abstract:BACKGROUND:A tool that can predict the estimated glomerular filtration rate (eGFR) in routine daily care can help clinicians to make better decisions for kidney transplant patients and to improve transplantation outcome. In this paper, we proposed a hybrid prediction model for predicting a future value for eGFR during ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2019.103116

    authors: Rashidi Khazaee P,Bagherzadeh M J,Niazkhani Z,Pirnejad H

    更新日期:2019-03-01 00:00:00

  • Toward analyzing and synthesizing previous research in early prediction of cardiac arrest using machine learning based on a multi-layered integrative framework.

    abstract:BACKGROUND:One of the significant problems in the field of healthcare is the low survival rate of people who have experienced sudden cardiac arrest. Early prediction of cardiac arrest can provide the time required for intervening and preventing its onset in order to reduce mortality. Traditional statistical methods hav...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2018.10.008

    authors: Layeghian Javan S,Sepehri MM,Aghajani H

    更新日期:2018-12-01 00:00:00

  • Feature selection techniques for maximum entropy based biomedical named entity recognition.

    abstract::Named entity recognition is an extremely important and fundamental task of biomedical text mining. Biomedical named entities include mentions of proteins, genes, DNA, RNA, etc which often have complex structures, but it is challenging to identify and classify such entities. Machine learning methods like CRF, MEMM and ...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2008.12.012

    authors: Saha SK,Sarkar S,Mitra P

    更新日期:2009-10-01 00:00:00

  • A comparison of two methods for retrieving ICD-9-CM data: the effect of using an ontology-based method for handling terminology changes.

    abstract:OBJECTIVE:Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information sys...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2011.01.005

    authors: Yu AC,Cimino JJ

    更新日期:2011-04-01 00:00:00

  • The development and validation of a simulation tool for health policy decision making.

    abstract::Computer simulations have been used to model infectious diseases to examine the outcomes of alternative strategies for managing their spread. Methicillin resistant Staphylococcus aureus (MRSA) skin and soft tissue infections have become prominent in many communities and efforts are underway to reduce the spread of thi...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2010.03.013

    authors: Panchanathan SS,Petitti DB,Fridsma DB

    更新日期:2010-08-01 00:00:00

  • Medical speciality classification system based on binary particle swarms and ensemble of one vs. rest support vector machines.

    abstract::Nowadays, artificial intelligence plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. However, efficient online medical services are required to promote offline m...

    journal_title:Journal of biomedical informatics

    pub_type: 杂志文章

    doi:10.1016/j.jbi.2020.103525

    authors: Faris H,Habib M,Faris M,Alomari M,Alomari A

    更新日期:2020-09-01 00:00:00