TAQIH, a tool for tabular data quality assessment and improvement in the context of health data.

Abstract:

BACKGROUND AND OBJECTIVES: Data curation is tedious but of paramount importance for data analytics, especially in the health context, where data-driven decisions must be extremely accurate. TAQIH aims to support non-technical users in 1) the exploratory data analysis (EDA) of tabular health data, and 2) the assessment and improvement of its quality.

METHODS: A web-based tool has been implemented with a simple yet powerful visual interface. First, it provides interfaces for understanding the content, structure and distribution of the dataset. Then, it provides data visualization and improvement utilities for the data quality dimensions of completeness, accuracy, redundancy and readability.

RESULTS: The tool has been applied in two different scenarios: (1) the Northern Ireland General Practitioners (GPs) Prescription Data, an open dataset containing drug prescriptions; and (2) a dataset from a glucose monitoring telehealth system. Findings on (1) include: features with a significant amount of missing values (e.g. the AMP_NM variable, 53.39%); instances with a high percentage of missing variable values (e.g. 0.21% of the instances have > 75% of their values missing); and highly correlated variables (e.g. Gross and Actual cost are almost completely correlated (∼ +1.0)). Findings on (2) include: features with a significant amount of missing values (e.g. patient height, weight and body mass index (BMI), > 70%; date of diagnosis, 13%); and highly correlated variables (e.g. height, weight and BMI). Full details of the testing and insights related to the findings are reported.

CONCLUSIONS: TAQIH enables and supports users in carrying out EDA on tabular health data and in assessing and improving its quality. Arranging the application menu sequentially, in the order of the conventional EDA pipeline, helps users follow a consistent analysis process. The general description of the dataset and features section is very useful for a first overview of the dataset. The missing value heatmap is also very helpful for visually identifying correlations among missing values. The correlations section, like the outliers section, has proved helpful as a preliminary step before further data analysis pipelines. Finally, the data quality section provides a quantitative measure of the dataset improvements.
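The three kinds of findings reported in the abstract (missing values per feature, instances with mostly missing values, and highly correlated variables) can be illustrated with a minimal Python sketch. This is not TAQIH's implementation, which the abstract does not describe. The function names, the toy rows and every column name except AMP_NM (written here as `amp_nm`) are assumptions made for illustration, and Pearson correlation is assumed as the correlation measure.

```python
import math

# Toy rows; None marks a missing value. All values are made up for illustration.
ROWS = [
    {"gross_cost": 10.0, "actual_cost": 9.3, "amp_nm": None, "bnf": "A"},
    {"gross_cost": 20.0, "actual_cost": 18.6, "amp_nm": "x", "bnf": "B"},
    {"gross_cost": 30.0, "actual_cost": 27.9, "amp_nm": None, "bnf": None},
    {"gross_cost": None, "actual_cost": None, "amp_nm": None, "bnf": None},
]
COLUMNS = ["gross_cost", "actual_cost", "amp_nm", "bnf"]


def missing_pct_by_column(rows, columns):
    """Percentage of missing (None) values for each column."""
    n = len(rows)
    return {c: 100.0 * sum(r.get(c) is None for r in rows) / n for c in columns}


def pct_rows_mostly_missing(rows, columns, threshold=0.75):
    """Percentage of rows whose fraction of missing values exceeds `threshold`."""
    def missing_fraction(row):
        return sum(row.get(c) is None for c in columns) / len(columns)

    return 100.0 * sum(missing_fraction(r) > threshold for r in rows) / len(rows)


def pearson(rows, a, b):
    """Pearson correlation of columns a and b over rows where both are present."""
    pairs = [(r[a], r[b]) for r in rows
             if r.get(a) is not None and r.get(b) is not None]
    if len(pairs) < 2:
        return math.nan
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    print(missing_pct_by_column(ROWS, COLUMNS))
    print(pct_rows_mostly_missing(ROWS, COLUMNS))
    print(pearson(ROWS, "gross_cost", "actual_cost"))
```

In the toy data the two cost columns are exactly proportional, so their correlation is close to +1.0, which mirrors the kind of redundancy the abstract reports for Gross and Actual cost.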

authors

Álvarez Sánchez R,Beristain Iraola A,Epelde Unanue G,Carlin P

doi

10.1016/j.cmpb.2018.12.029

subject

Has Abstract

pub_date

2019-11-01 00:00:00

pages

104824

eissn

1872-7565

issn

0169-2607

pii

S0169-2607(18)30418-8

journal_volume

181

pub_type

Journal Article
  • OLYMPUS: an automated hybrid clustering method in time series gene expression. Case study: host response after Influenza A (H1N1) infection.

    abstract::The increasing flow of short time series microarray experiments for the study of dynamic cellular processes poses the need for efficient clustering tools. These tools must deal with three primary issues: first, to consider the multi-functionality of genes; second, to evaluate the similarity of the relative change of a...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2013.05.025

    authors: Dimitrakopoulou K,Vrahatis AG,Wilk E,Tsakalidis AK,Bezerianos A

    update_date:2013-09-01 00:00:00

  • Automatic recognition of cell layers in corneal confocal microscopy images.

    abstract::A confocal microscope can produce gray-scale images of the different layers of the cornea. We have addressed the problem of classifying these images, i.e. recognizing the layer displayed, using the shape of the cells contained, which is uniquely related to each specific layer. A first method was designed, based first ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/s0169-2607(01)00153-5

    authors: Ruggeri A,Pajaro S

    update_date:2002-04-01 00:00:00

  • Automatic detection of breast border and nipple in digital mammograms.

    abstract::Advances in the area of computerized image analysis applied to mammography may have very important practical applications in automatically detecting asymmetries (masses, architectural distortions, etc.) between the two breasts. We have developed a fully automatic technique to detect the breast border and the nipple, t...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/0169-2607(96)01724-5

    authors: Méndez AJ,Tahoces PG,Lado MJ,Souto M,Correa JL,Vidal JJ

    update_date:1996-05-01 00:00:00

  • The future of Medicare policy reform.

    abstract::The Medicare program, the largest health insurance program in the United States, is clearly at a crossroads as it enters its third decade. Historical increases in health care expenditures, plus a changing political and economic landscape, have set the groundwork for policy reform. Two basic reform strategies--reimburs...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/0169-2607(87)90055-1

    authors: Dobson A,Langenbrunner JC

    update_date:1987-09-01 00:00:00

  • Reliability and validity of DS-ADHD: A decision support system on attention deficit hyperactivity disorders.

    abstract:BACKGROUND AND OBJECTIVES:The purpose of this study is to examine the reliability of the clinical use of the self-built decision support system, diagnosis-supported attention deficit hyperactivity disorder (DS-ADHD), in an effort to develop the DS-ADHD system, by probing into the development of indicating patterns of p...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.12.003

    authors: Chu KC,Huang YS,Tseng CF,Huang HJ,Wang CH,Tai HY

    update_date:2017-03-01 00:00:00

  • Capturing the semantic relationship between clinical terms with current MeSH bibliographic coding.

    abstract::This paper compares bibliographic retrieval using current MeSH (Medical Subject Headings) to bibliographic retrieval using explicitly coded semantic relationships between index terms. In a previous study, ten lists of abstracts, each list containing 20-40 papers discussing a specific pair of terms, were analyzed to id...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/0169-2607(88)90084-3

    authors: Miller PL,Smith P,Morrow JS,Riely CA,Powsner SM

    update_date:1988-11-01 00:00:00

  • A survey on histological image analysis-based assessment of three major biological factors influencing radiotherapy: proliferation, hypoxia and vasculature.

    abstract::Image analysis is a rapidly evolving field with growing applications in science and engineering. In cancer research, it has played a key role in advancing techniques of major diagnostic importance, minimising human intervention and providing vital clinical information. Especially in the field of tissue microscopy, the...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2003.07.001

    authors: Loukas CG,Linney A

    update_date:2004-06-01 00:00:00

  • Entropy analysis of muscular near-infrared spectroscopy (NIRS) signals during exercise programme of type 2 diabetic patients: quantitative assessment of muscle metabolic pattern.

    abstract::Diabetes mellitus (DM) is a metabolic disorder that is widely rampant throughout the world population these days. The uncontrolled DM may lead to complications of eye, heart, kidney and nerves. The most common type of diabetes is the type 2 diabetes or insulin-resistant DM. Near-infrared spectroscopy (NIRS) technology...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2013.08.018

    authors: Molinari F,Acharya UR,Martis RJ,De Luca R,Petraroli G,Liboni W

    update_date:2013-12-01 00:00:00

  • Evaluation of machine learning methods to stroke outcome prediction using a nationwide disease registry.

    abstract:INTRODUCTION:Being able to predict functional outcomes after a stroke is highly desirable for clinicians. This allows clinicians to set reasonable goals with patients and relatives, and to reach shared after-care decisions for recovery or rehabilitation. The aim of this study was to apply various machine learning (ML) ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2020.105381

    authors: Lin CH,Hsu KC,Johnson KR,Fann YC,Tsai CH,Sun Y,Lien LM,Chang WL,Chen PL,Lin CL,Hsu CY,Taiwan Stroke Registry Investigators.

    update_date:2020-07-01 00:00:00

  • Real-time simulation of ultrasound refraction phenomena using ray-trace based wavefront construction method.

    abstract::Ultrasound (US) imaging is one of the most popular techniques used in clinical diagnosis, mainly due to lack of adverse effects on patients and the simplicity of US equipment. However, the characteristics of the medium cause US imaging to imprecisely reconstruct examined tissues. The artifacts are the results of wave ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.07.034

    authors: Szostek K,Piórkowski A

    update_date:2016-10-01 00:00:00

  • Facilitating pharmacometric workflow with the metrumrg package for R.

    abstract::metrumrg is an R package that facilitates workflow for the discipline of pharmacometrics. Support is provided for data preparation, modeling, simulation, diagnostics, and reporting. Existing tools and techniques are emphasized where available; original solutions are provided for otherwise unmet needs. In particular, m...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2012.08.009

    authors: Bergsma TT,Knebel W,Fisher J,Gillespie WR,Riggs MM,Gibiansky L,Gastonguay MR

    update_date:2013-01-01 00:00:00

  • An advanced method in fetal phonocardiography.

    abstract::The long-term variability of the fetal heart rate (FHR) provides valuable information on the fetal health status. The routine clinical FHR measurements are usually carried out by the means of ultrasound cardiography. Although the frequent FHR monitoring is recommendable, the high quality ultrasound devices are so expe...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/s0169-2607(02)00111-6

    authors: Várady P,Wildt L,Benyó Z,Hein A

    update_date:2003-07-01 00:00:00

  • Utilization of Discretization method on the diagnosis of optic nerve disease.

    abstract::The optic nerve disease is an important disease that appears commonly in public. In this paper, we propose a hybrid diagnostic system based on discretization (quantization) method and classification algorithms including C4.5 decision tree classifier, artificial neural network (ANN), and least square support vector mac...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2008.04.009

    authors: Polat K,Kara S,Güven A,Güneş S

    update_date:2008-09-01 00:00:00

  • Automated expert multiexponential biomodeling interactively over the Internet.

    abstract::DIMSUM, an acronym for DIMension of a SUM of exponentials, is a highly automated expert system for fitting multiexponential models of increasing dimension to time series data. Up to now, a researcher has needed an individual copy of DIMSUM on his or her own computer as well as support to learn how to use it. W3DIMSUM,...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2005.03.008

    authors: Harless C,Distefano JJ 3rd

    update_date:2005-08-01 00:00:00

  • Amplitude-aware permutation entropy: Illustration in spike detection and signal segmentation.

    abstract:BACKGROUND AND OBJECTIVE:Signal segmentation and spike detection are two important biomedical signal processing applications. Often, non-stationary signals must be segmented into piece-wise stationary epochs or spikes need to be found among a background of noise before being further analyzed. Permutation entropy (PE) h...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.02.008

    authors: Azami H,Escudero J

    update_date:2016-05-01 00:00:00

  • The use of simulated annealing for finding optimal population designs.

    abstract::The development of functions for MATLAB and S-PLUS that can be used for the evaluation of specific population pharmacokinetic designs has been described recently. These functions are based on the evaluation of an approximation of the population Fisher information matrix. Optimisation of the design of the population ex...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/s0169-2607(01)00178-x

    authors: Duffull SB,Retout S,Mentré F

    update_date:2002-07-01 00:00:00

  • A microcomputer based lung sounds analysis.

    abstract::The use of a microcomputer in lung sound-analysis is described. The system was used experimentally in order to evaluate automated auscultation as a mean for improving the sensitivity of pulmonary health mass screening. The sound signals from four custom-made piezoelectric transducers, affixed at specific locations on ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/0169-2607(93)90045-m

    authors: Nissan M,Gavriely N

    update_date:1993-05-01 00:00:00

  • NONMEMory: a run management tool for NONMEM.

    abstract::NONMEM is an extremely powerful tool for nonlinear mixed-effect modelling and simulation of pharmacokinetic and pharmacodynamic data. However, it is a console-based application whose output does not lend itself to rapid interpretation or efficient management. NONMEMory has been created to be a comprehensive project ma...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2005.02.003

    authors: Wilkins JJ

    update_date:2005-06-01 00:00:00

  • Control system design of a 3-DOF upper limbs rehabilitation robot.

    abstract::This paper presents the control system design of a rehabilitation and training robot for the upper limbs. Based on a hierarchical structure, this control system allows the execution of sequence of switching control laws (position, force, impedance and force/impedance) corresponding to the required training configurati...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2007.07.006

    authors: Denève A,Moughamir S,Afilal L,Zaytoon J

    update_date:2008-02-01 00:00:00

  • Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

    abstract::This work provides a performance comparison of four different machine learning classifiers: multinomial logistic regression with ridge estimators (MLR) classifier, k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB) as applied to terahertz (THz) transient time domain sequences associated with...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.01.017

    authors: Siuly,Yin X,Hadjiloucas S,Zhang Y

    update_date:2016-04-01 00:00:00

  • The use of a modified Fedorov exchange algorithm to optimise sampling times for population pharmacokinetic experiments.

    abstract::We propose a new algorithm for optimising sampling times for population pharmacokinetic experiments using D-optimality. The algorithm was used in conjunction with the population Fisher information matrix as implemented in MATLAB (PFIM 1.1 and 1.2) to evaluate population pharmacokinetic designs. The new algorithm based...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2005.07.001

    authors: Ogungbenro K,Graham G,Gueorguieva I,Aarons L

    update_date:2005-11-01 00:00:00

  • Profiling intra-patient type I diabetes behaviors.

    abstract:BACKGROUND:The large intra-patient variability in type 1 diabetic patients dramatically reduces the ability to achieve adequate blood glucose control. A novel methodology to identify different blood glucose dynamics profiles will allow therapies to be more accurate and tailored according to patient's conditions and to ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.08.022

    authors: Contreras I,Quirós C,Giménez M,Conget I,Vehi J

    update_date:2016-11-01 00:00:00

  • Predicting body fat percentage based on gender, age and BMI by using artificial neural networks.

    abstract::In the human body, the relation between fat and fat-free mass (muscles, bones etc.) is necessary for the diagnosis of obesity and prediction of its comorbidities. Numerous formulas, such as Deurenberg et al., Gallagher et al., Jackson and Pollock, Jackson et al. etc., are available to predict body fat percentage (BF%)...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2013.10.013

    authors: Kupusinac A,Stokić E,Doroslovački R

    update_date:2014-02-01 00:00:00

  • Risk of fracture in elderly patients: a new predictive index based on bone mineral density and finite element analysis.

    abstract::Hip fracture is more and more frequent in elderly population. For this reason, an increasing attention has been focused on the development of a non-invasive method to predict femoral neck fracture. A conventional approach to fracture diagnosis is the measurement of bone mass by dual-energy X-ray absorptiometry in some...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/s0169-2607(99)00007-3

    authors: Testi D,Viceconti M,Baruffaldi F,Cappello A

    update_date:1999-07-01 00:00:00

  • Development of an auxiliary system for the execution of vascular catheter interventions with a reduced radiological risk; system description and first experimental results.

    abstract::Vascular catheterization is a common procedure in clinical medicine. It is normally performed by a specialist using an X-ray fluoroscopic guide and contrast-media. In the present paper, an image-guided navigation system which indicates a path providing guidance to the desired target inside the vascular tree is describ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2007.07.009

    authors: Placidi G,Franchi D,Marsili L,Gallo P

    update_date:2007-11-01 00:00:00

  • Assessment of hepatic insulin degradation, in normoglycemic hypertensive patients, by minimal modelling of standard intravenous glucose tolerance test data.

    abstract::Role of hepatic insulin degradation in modulating insulin delivery to peripheral circulation, in insulin-resistant hypertensive patients, is not yet fully understood. This issue was investigated here by a novel application to hypertension of a previously proposed minimal modelling of insulin and C-peptide data, using ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2009.08.007

    authors: Di Nardo F,Boemi M,Burattini R

    update_date:2010-02-01 00:00:00

  • Classification of auditory selective attention using spatial coherence and modular attention index.

    abstract:BACKGROUND AND OBJECTIVE:Brain-Computer Interfaces (BCIs) based on auditory selective attention have been receiving much attention because i) they are useful for completely paralyzed users since they do not require muscular effort or gaze and ii) focusing attention is a natural human ability. Several techniques - such ...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2018.10.002

    authors: de Souza AP,Soares QB,Felix LB,Mendes EMAM

    update_date:2018-11-01 00:00:00

  • An interactive medical image segmentation system based on the optimal management of regions of interest using topological medical knowledge.

    abstract::This paper presents an original interactive system for efficient medical image segmentation in computer aided diagnosis. The main originality concerns the method used to manage, according to an a priori topological-based structural model, regions of interest (ROIs) within which computations can be constrained. The goa...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2006.04.004

    authors: Fasquel JB,Agnus V,Moreau J,Soler L,Marescaux J

    update_date:2006-06-01 00:00:00

  • Mask_explorer: A tool for exploring brain masks in fMRI group analysis.

    abstract:BACKGROUND AND OBJECTIVE:Functional magnetic resonance imaging (fMRI) studies of the human brain are appearing in increasing numbers, providing interesting information about this complex system. Unique information about healthy and diseased brains is inferred using many types of experiments and analyses. In order to ob...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2016.07.015

    authors: Gajdoš M,Mikl M,Mareček R

    update_date:2016-10-01 00:00:00

  • Evaluation of amplitude-based sorting algorithm to reduce lung tumor blurring in PET images using 4D NCAT phantom.

    abstract:PURPOSE:develop and validate a PET sorting algorithm based on the respiratory amplitude to correct for abnormal respiratory cycles. METHOD AND MATERIALS:using the 4D NCAT phantom model, 3D PET images were simulated in lung and other structures at different times within a respiratory cycle and noise was added. To valid...

    journal_title:Computer methods and programs in biomedicine

    pub_type: Journal Article

    doi:10.1016/j.cmpb.2007.05.004

    authors: Wang J,Byrne J,Franquiz J,McGoron A

    update_date:2007-08-01 00:00:00