Developing a healthcare dataset information resource (DIR) based on Semantic Web.

Abstract:

BACKGROUND: The right dataset is essential to obtain the right insights in data science; therefore, it is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, the lack of an information resource that focuses on the specific needs of target users of datasets has been a problem for years. To address this gap, we have developed a Dataset Information Resource (DIR), using a user-oriented approach, which gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges of entry-level data scientists in learning to identify, understand, and analyze major datasets in healthcare. We emphasize that the DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses.
METHODS: The DIR leverages Semantic Web technologies and the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract tailored knowledge for target users, we have developed methods for manual extraction from dataset documentation as well as semi-automatic extraction from related publications, using natural language processing (NLP)-based approaches. A semantic query component is available for knowledge retrieval, and a parameterized question-answering functionality is provided to make searching easier.
RESULTS: The DIR prototype is composed of four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. The initial usage evaluation based on health informatics novices indicates that the DIR is helpful and beginner-friendly.
CONCLUSIONS: We have developed a novel user-oriented DIR that provides dataset knowledge specialized for target user groups. Knowledge about datasets is effectively represented in the Semantic Web. At this initial stage, the DIR has already been able to provide sophisticated and relevant knowledge of 12 datasets to help entry-level health informaticians learn healthcare data analysis using suitable datasets. Further development of both content and function levels is underway.
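
The parameterized question answering mentioned in the abstract can be pictured as filling a fixed query template with a user-supplied value. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the template name `datasets_about`, the helper functions, and the choice of the DCAT and Dublin Core terms (`dcat:Dataset`, `dcat:keyword`, `dct:title`) are hypothetical and are not taken from the paper.

```python
from string import Template

# Namespace prefixes for the vocabularies assumed in this sketch.
PREFIXES = (
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
    "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
)

# Hypothetical catalogue of frequently asked questions, each stored as a
# SPARQL template with named placeholders ($keyword here).
QUESTION_TEMPLATES = {
    "datasets_about": Template(
        PREFIXES
        + "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title ;\n"
        "           dcat:keyword ?kw .\n"
        '  FILTER (LCASE(STR(?kw)) = "$keyword")\n'
        "}"
    ),
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes so the value stays inside the literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(question_id: str, **params: str) -> str:
    """Fill the chosen question template with normalized, escaped parameters."""
    template = QUESTION_TEMPLATES[question_id]
    safe = {k: escape_literal(v.strip().lower()) for k, v in params.items()}
    return template.substitute(safe)


if __name__ == "__main__":
    print(build_query("datasets_about", keyword="Diabetes"))
```

The resulting string could then be sent to any SPARQL endpoint that holds the dataset descriptions; keeping the templates in one dictionary means a new frequently asked question only needs a new entry.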

journal_name

BMC Med Genomics

journal_title

BMC medical genomics

authors

Shi J,Zheng M,Yao L,Ge Y

doi

10.1186/s12920-018-0411-5

subject

Has Abstract

pub_date

2018-11-20 00:00:00

pages

102

issue

Suppl 5

issn

1755-8794

pii

10.1186/s12920-018-0411-5

journal_volume

11

pub_type

Journal article
  • Exon array analysis reveals neuroblastoma tumors have distinct alternative splicing patterns according to stage and MYCN amplification status.

    abstract:BACKGROUND:Neuroblastoma (NB) tumors are well known for their pronounced clinical and molecular heterogeneity. The global gene expression and DNA copy number alterations have been shown to have profound differences in tumors of low or high stage and those with or without MYCN amplification. RNA splicing is an important...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-4-35

    authors: Guo X,Chen QR,Song YK,Wei JS,Khan J

    update_date:2011-04-18 00:00:00

  • Functional microarray analysis suggests repressed cell-cell signaling and cell survival-related modules inhibit progression of head and neck squamous cell carcinoma.

    abstract:BACKGROUND:Cancer shows a great diversity in its clinical behavior which cannot be easily predicted using the currently available clinical or pathological markers. The identification of pathways associated with lymph node metastasis (N+) and recurrent head and neck squamous cell carcinoma (HNSCC) may increase our under...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-4-33

    authors: Coló AE,Simoes AC,Carvalho AL,Melo CM,Fahham L,Kowalski LP,Soares FA,Neves EJ,Reis LF,Carvalho AF

    update_date:2011-04-13 00:00:00

  • Categorizing biomedicine images using novel image features and sparse coding representation.

    abstract:BACKGROUND:Images embedded in biomedical publications carry rich information that often concisely summarize key hypotheses adopted, methods employed, or results obtained in a published study. Therefore, they offer valuable clues for understanding main content in a biomedical publication. Prior studies have pointed out ...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-6-S3-S8

    authors: Sheng J,Xu S,Luo X

    update_date:2013-01-01 00:00:00

  • Pan-cancer analysis of differential DNA methylation patterns.

    abstract:BACKGROUND:DNA methylation is a key epigenetic regulator contributing to cancer development. To understand the role of DNA methylation in tumorigenesis, it is important to investigate and compare differential methylation (DM) patterns between normal and case samples across different cancer types. However, current pan-c...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-020-00780-3

    authors: Shi M,Tsui SK,Wu H,Wei Y

    update_date:2020-10-22 00:00:00

  • Computational analysis of the mesenchymal signature landscape in gliomas.

    abstract:BACKGROUND:Epithelial to mesenchymal transition, and mimicking processes, contribute to cancer invasion and metastasis, and are known to be responsible for resistance to various therapeutic agents in many cancers. While a number of studies have proposed molecular signatures that characterize the spectrum of such transi...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-017-0252-7

    authors: Celiku O,Tandle A,Chung JY,Hewitt SM,Camphausen K,Shankavaram U

    update_date:2017-03-09 00:00:00

  • Host sequence motifs shared by HIV predict response to antiretroviral therapy.

    abstract:BACKGROUND:The HIV viral genome mutates at a high rate and poses a significant long term health risk even in the presence of combination antiretroviral therapy. Current methods for predicting a patient's response to therapy rely on site-directed mutagenesis experiments and in vitro resistance assays. In this bioinforma...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-2-47

    authors: Dampier W,Evans P,Ungar L,Tozeren A

    update_date:2009-07-23 00:00:00

  • Genetic association and stress mediated down-regulation in trabecular meshwork implicates MPP7 as a novel candidate gene in primary open angle glaucoma.

    abstract:BACKGROUND:Glaucoma is the largest cause of irreversible blindness affecting more than 60 million people globally. The disease is defined as a gradual loss of peripheral vision due to death of Retinal Ganglion Cells (RGC). The RGC death is largely influenced by the rate of aqueous humor production by ciliary processes ...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-016-0177-6

    authors: Vishal M,Sharma A,Kaurani L,Alfano G,Mookherjee S,Narta K,Agrawal J,Bhattacharya I,Roychoudhury S,Ray J,Waseem NH,Bhattacharya SS,Basu A,Sen A,Ray K,Mukhopadhyay A

    update_date:2016-03-22 00:00:00

  • Reverse-engineering of gene networks for regulating early blood development from single-cell measurements.

    abstract:BACKGROUND:Recent advances in omics technologies have raised great opportunities to study large-scale regulatory networks inside the cell. In addition, single-cell experiments have measured the gene and protein activities in a large number of cells under the same experimental conditions. However, a significant challeng...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-017-0312-z

    authors: Wei J,Hu X,Zou X,Tian T

    update_date:2017-12-28 00:00:00

  • Identification of lung cancer gene markers through kernel maximum mean discrepancy and information entropy.

    abstract:BACKGROUND:The early diagnosis of lung cancer has been a critical problem in clinical practice for a long time and identifying differentially expressed gene as disease marker is a promising solution. However, the most existing gene differential expression analysis (DEA) methods have two main drawbacks: First, these met...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-019-0630-4

    authors: Zhao Z,Peng H,Zhang X,Zheng Y,Chen F,Fang L,Li J

    update_date:2019-12-20 00:00:00

  • Using gene expression signatures to identify novel treatment strategies in gulf war illness.

    abstract:BACKGROUND:Gulf War Illness (GWI) is a complex multi-symptom disorder that affects up to one in three veterans of this 1991 conflict and for which no effective treatment has been found. Discovering novel treatment strategies for such a complex chronic illness is extremely expensive, carries a high probability of failur...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-015-0111-3

    authors: Craddock TJ,Harvey JM,Nathanson L,Barnes ZM,Klimas NG,Fletcher MA,Broderick G

    update_date:2015-07-09 00:00:00

  • Whole exome sequencing in adult-onset hearing loss reveals a high load of predicted pathogenic variants in known deafness-associated genes and identifies new candidate genes.

    abstract:BACKGROUND:Deafness is a highly heterogenous disorder with over 100 genes known to underlie human non-syndromic hearing impairment. However, many more remain undiscovered, particularly those involved in the most common form of deafness: adult-onset progressive hearing loss. Despite several genome-wide association studi...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-018-0395-1

    authors: Lewis MA,Nolan LS,Cadge BA,Matthews LJ,Schulte BA,Dubno JR,Steel KP,Dawson SJ

    update_date:2018-09-04 00:00:00

  • Integrative analysis reveals disease-associated genes and biomarkers for prostate cancer progression.

    abstract:BACKGROUND:Prostate cancer is one of the most common complex diseases with high leading cause of death in men. Identifications of prostate cancer associated genes and biomarkers are thus essential as they can gain insights into the mechanisms underlying disease progression and advancing for early diagnosis and developi...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-7-S1-S3

    authors: Li Y,Vongsangnak W,Chen L,Shen B

    update_date:2014-01-01 00:00:00

  • The similarity of inherited diseases (II): clinical and biological similarity between the phenotypic series.

    abstract:BACKGROUND:Despite being caused by mutations in different genes, diseases in the same phenotypic series are clinically similar, as reported in Part I of this study. Here, in Part II, we hypothesized that the phenotypic series too might be clinically similar. Furthermore, on the assumption that gene mutations indirectly...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-020-00793-y

    authors: Gamba A,Salmona M,Cantù L,Bazzoni G

    update_date:2020-09-24 00:00:00

  • Development of a blood-based gene expression algorithm for assessment of obstructive coronary artery disease in non-diabetic patients.

    abstract:BACKGROUND:Alterations in gene expression in peripheral blood cells have been shown to be sensitive to the presence and extent of coronary artery disease (CAD). A non-invasive blood test that could reliably assess obstructive CAD likelihood would have diagnostic utility. RESULTS:Microarray analysis of RNA samples from...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-4-26

    authors: Elashoff MR,Wingrove JA,Beineke P,Daniels SE,Tingley WG,Rosenberg S,Voros S,Kraus WE,Ginsburg GS,Schwartz RS,Ellis SG,Tahirkheli N,Waksman R,McPherson J,Lansky AJ,Topol EJ

    update_date:2011-03-28 00:00:00

  • Network-based prediction and knowledge mining of disease genes.

    abstract:BACKGROUND:In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and da...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-8-S2-S9

    authors: Carson MB,Lu H

    update_date:2015-01-01 00:00:00

  • Splice-site mutation causing partial retention of intron in the FLCN gene in Birt-Hogg-Dubé syndrome: a case report.

    abstract:BACKGROUND:Birt-Hogg-Dubé syndrome (BHD) is an autosomal dominant disorder caused by germline mutations in the folliculin gene (FLCN). Nearly 150 pathogenic mutations have been identified in FLCN. The most frequent pattern is a frameshift mutation within a coding exon. In addition, splice-site mutations have been repor...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-018-0359-5

    authors: Furuya M,Kobayashi H,Baba M,Ito T,Tanaka R,Nakatani Y

    update_date:2018-05-02 00:00:00

  • Pharmacogenetic testing through the direct-to-consumer genetic testing company 23andMe.

    abstract:BACKGROUND:Rapid advances in scientific research have led to an increase in public awareness of genetic testing and pharmacogenetics. Direct-to-consumer (DTC) genetic testing companies, such as 23andMe, allow consumers to access their genetic information directly through an online service without the involvement of hea...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-017-0283-0

    authors: Lu M,Lewis CM,Traylor M

    update_date:2017-06-19 00:00:00

  • WISARD: workbench for integrated superfast association studies for related datasets.

    abstract:BACKGROUND:A Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies-from heritability estimations to genetic association analyses. With the advance in genotyping technologies, whole-genome seq...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-018-0345-y

    authors: Lee S,Choi S,Qiao D,Cho M,Silverman EK,Park T,Won S

    update_date:2018-04-20 00:00:00

  • 12q14 microduplication: a new clinical entity reciprocal to the microdeletion syndrome?

    abstract:BACKGROUND:12q14 microdeletion syndrome is characterized by low birth weight and failure to thrive, proportionate short stature and developmental delay. The opposite syndrome (microduplication) has not yet been characterized. Our main objective is the recognition of a new clinical entity - 12q14 microduplication syndro...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-019-0653-x

    authors: Dória S,Alves D,Pinho MJ,Pinto J,Leão M

    update_date:2020-01-03 00:00:00

  • Tiling resolution array CGH and high density expression profiling of urothelial carcinomas delineate genomic amplicons and candidate target genes specific for advanced tumors.

    abstract:BACKGROUND:Urothelial carcinoma (UC) is characterized by nonrandom chromosomal aberrations, varying from one or a few changes in early-stage and low-grade tumors, to highly rearranged karyotypes in muscle-invasive lesions. Recent array-CGH analyses have shed further light on the genomic changes underlying the neoplasti...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-1-3

    authors: Heidenblad M,Lindgren D,Jonson T,Liedberg F,Veerla S,Chebil G,Gudjonsson S,Borg A,Månsson W,Höglund M

    update_date:2008-01-31 00:00:00

  • HIP2: an online database of human plasma proteins from healthy individuals.

    abstract:BACKGROUND:With the introduction of increasingly powerful mass spectrometry (MS) techniques for clinical research, several recent large-scale MS proteomics studies have sought to characterize the entire human plasma proteome with a general objective for identifying thousands of proteins leaked from tissues in the circu...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-1-12

    authors: Saha S,Harrison SH,Shen C,Tang H,Radivojac P,Arnold RJ,Zhang X,Chen JY

    update_date:2008-04-25 00:00:00

  • A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining.

    abstract:BACKGROUND:Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When primary pathogenetic events remains unclear due to its immense complexity, construction and analysis of the gene regulatory netw...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-1-46

    authors: Chen BS,Yang SK,Lan CY,Chuang YJ

    update_date:2008-09-30 00:00:00

  • Integration analysis of long non-coding RNA (lncRNA) role in tumorigenesis of colon adenocarcinoma.

    abstract:BACKGROUND:Colon adenocarcinoma (COAD) is one of the most common gastrointestinal cancers globally. Molecular aberrations of tumor suppressors and/or oncogenes are the main contributors to tumorigenesis. However, the exact underlying mechanisms of COAD pathogenesis are clearly not known yet. In this regard, there is an...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-020-00757-2

    authors: Poursheikhani A,Abbaszadegan MR,Nokhandani N,Kerachian MA

    update_date:2020-07-29 00:00:00

  • Routine use of microarray-based gene expression profiling to identify patients with low cytogenetic risk acute myeloid leukemia: accurate results can be obtained even with suboptimal samples.

    abstract:BACKGROUND:Gene expression profiling has shown its ability to identify with high accuracy low cytogenetic risk acute myeloid leukemia such as acute promyelocytic leukemia and leukemias with t(8;21) or inv(16). The aim of this gene expression profiling study was to evaluate to what extent suboptimal samples with low leu...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-5-6

    authors: de la Blétière DR,Blanchet O,Cornillet-Lefèbvre P,Coutolleau A,Baranger L,Geneviève F,Luquet I,Hunault-Berger M,Beucher A,Schmidt-Tanguy A,Zandecki M,Delneste Y,Ifrah N,Guardiola P

    update_date:2012-01-30 00:00:00

  • Transcriptomic signatures in whole blood of patients who acquire a chronic inflammatory response syndrome (CIRS) following an exposure to the marine toxin ciguatoxin.

    abstract:BACKGROUND:Ciguatoxins (CTXs) are polyether marine neurotoxins found in multiple reef-fish species and are potent activators of voltage-gated sodium channels. It is estimated that up to 500,000 people annually experience acute ciguatera poisoning from consuming toxic fish and a small percentage of these victims will de...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-015-0089-x

    authors: Ryan JC,Wu Q,Shoemaker RC

    update_date:2015-04-02 00:00:00

  • Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks.

    abstract:BACKGROUND:One of the major goals of genomic medicine is the identification of causal genomic variants in a patient and their relation to the observed clinical phenotypes. Prioritizing the genomic variants by considering only the genotype information usually identifies a few hundred potential variants. Narrowing it dow...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-018-0372-8

    authors: Rao A,Vg S,Joseph T,Kotte S,Sivadasan N,Srinivasan R

    update_date:2018-07-06 00:00:00

  • Glucocorticoids with different chemical structures but similar glucocorticoid receptor potency regulate subsets of common and unique genes in human trabecular meshwork cells.

    abstract:BACKGROUND:In addition to their well-documented ocular therapeutic effects, glucocorticoids (GCs) can cause sight-threatening side-effects including ocular hypertension presumably via morphological and biochemical changes in trabecular meshwork (TM) cells. In the present study, we directly compared the glucocorticoid r...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-2-58

    authors: Nehmé A,Lobenhofer EK,Stamer WD,Edelman JL

    update_date:2009-09-10 00:00:00

  • Saliva samples are a viable alternative to blood samples as a source of DNA for high throughput genotyping.

    abstract:BACKGROUND:The increasing trend for incorporation of biological sample collection within clinical trials requires sample collection procedures which are convenient and acceptable for both patients and clinicians. This study investigated the feasibility of using saliva-extracted DNA in comparison to blood-derived DNA, a...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-5-19

    authors: Abraham JE,Maranian MJ,Spiteri I,Russell R,Ingle S,Luccarini C,Earl HM,Pharoah PP,Dunning AM,Caldas C

    update_date:2012-05-30 00:00:00

  • Gene profiling of the erythro- and megakaryoblastic leukaemias induced by the Graffi murine retrovirus.

    abstract:BACKGROUND:Acute erythro- and megakaryoblastic leukaemias are associated with very poor prognoses and the mechanism of blastic transformation is insufficiently elucidated. The murine Graffi leukaemia retrovirus induces erythro- and megakaryoblastic leukaemias when inoculated into NFS mice and represents a good model to...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/1755-8794-3-2

    authors: Voisin V,Legault P,Ospina DP,Ben-David Y,Rassart E

    update_date:2010-01-26 00:00:00

  • A comparison of machine learning classifiers for dementia with Lewy bodies using miRNA expression data.

    abstract:BACKGROUND:Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans following Alzheimer's disease (AD). Present clinical diagnosis of DLB has high specificity and low sensitivity and finding potential biomarkers of prodromal DLB is still challenging. MicroRNAs (miRNAs) h...

    journal_title:BMC medical genomics

    pub_type: Journal article

    doi:10.1186/s12920-019-0607-3

    authors: Shigemizu D,Akiyama S,Asanomi Y,Boroevich KA,Sharma A,Tsunoda T,Sakurai T,Ozaki K,Ochiya T,Niida S

    update_date:2019-10-30 00:00:00