Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images.

Abstract:

A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is identifying virus carriers as early and quickly as possible, in a cheap and efficient manner. Applying deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of a virus outbreak, where only small labelled datasets are available. Moreover, in such a context the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to observations from the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset composed of chest X-ray images of Costa Rican adult patients is included among the tested datasets.
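
The re-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inverse-frequency weighting scheme, the function names (`class_weights`, `weighted_cross_entropy`, `weighted_unlabelled_loss`) and the use of the most probable pseudo-label class to pick the weight are all assumptions made for illustration. The sketch follows the usual MixMatch layout of a cross-entropy term for labelled data and a squared-error term for unlabelled data.

```python
import math
from collections import Counter


def class_weights(labels, num_classes):
    """Inverse-frequency class weights, normalised to sum to 1.

    Assumed scheme: rarer classes in the labelled set get larger weights.
    """
    counts = Counter(labels)
    inverse = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_cross_entropy(probs, labels, weights):
    """Labelled term: per-observation cross-entropy scaled by its class weight."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Unlabelled term: squared error against soft pseudo-labels.

    Assumption: the weight is chosen from the class with the highest
    pseudo-label probability.
    """
    losses = []
    for p, q in zip(probs, pseudo_labels):
        likely_class = max(range(len(q)), key=q.__getitem__)
        squared_error = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
        losses.append(weights[likely_class] * squared_error)
    return sum(losses) / len(losses)
```

For example, with 8 normal and 2 COVID-19 positive labelled images, `class_weights([0] * 8 + [1] * 2, 2)` gives weights of roughly 0.2 and 0.8, so errors on the under-represented positive class count about four times as much in both loss terms.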

journal_name

Appl Soft Comput

journal_title

Applied soft computing

authors

Calderon-Ramirez S,Yang S,Moemeni A,Elizondo D,Colreavy-Donnelly S,Chavarría-Estrada LF,Molina-Cabello MA

doi

10.1016/j.asoc.2021.107692

keywords:

["COVID-19","Computer aided diagnosis","Coronavirus","Data imbalance","Semi-supervised learning"]

subject

Has Abstract

pub_date

2021-11-01

pages

107692

eissn

1872-9681

issn

1568-4946

pii

S1568-4946(21)00613-X

journal_volume

111

pub_type

Journal article
  • Mitigating the risk of infection spread in manual order picking operations: A multi-objective approach.

    abstract:In the aftermath of the COVID-19 pandemic, supply chains experienced an unprecedented challenge to fulfill consumers' demand. As a vital operational component, manual order picking operations are highly prone to infection spread among the workers, and thus, susceptible to interruption. This study revisits the well-kno...

    journal_title:Applied soft computing

    pub_type: Journal article

    doi:10.1016/j.asoc.2020.106953

    authors: Ardjmand E,Singh M,Shakeri H,Tavasoli A,Young II WA

    updated:2021-03-01

  • A Novel Medical Diagnosis model for COVID-19 infection detection based on Deep Features and Bayesian Optimization.

    abstract:A pneumonia of unknown causes, which was detected in Wuhan, China, and spread rapidly throughout the world, was declared as Coronavirus disease 2019 (COVID-19). Thousands of people have lost their lives to this disease. Its negative effects on public health are ongoing. In this study, an intelligence computer-aided mo...

    journal_title:Applied soft computing

    pub_type: Journal article

    doi:10.1016/j.asoc.2020.106580

    authors: Nour M,Cömert Z,Polat K

    updated:2020-12-01

  • Temporal event searches based on event maps and relationships.

    abstract:To satisfy a user's need to find and understand the whole picture of an event effectively and efficiently, in this paper we formalize the problem of temporal event searches and propose a framework of event relationship analysis for search events based on user queries. We define three kinds of event relationships: temp...

    journal_title:Applied soft computing

    pub_type: Journal article

    doi:10.1016/j.asoc.2019.105750

    authors: Cai Y,Xie H,Lau RYK,Li Q,Wong TL,Wang FL

    updated:2019-12-01

  • Routine Discovery of Complex Genetic Models using Genetic Algorithms.

    abstract:Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of dise...

    journal_title:Applied soft computing

    pub_type: Journal article

    doi:10.1016/j.asoc.2003.08.003

    authors: Moore JH,Hahn LW,Ritchie MD,Thornton TA,White BC

    updated:2004-02-01

  • A novel capacity sharing mechanism to collaborative activities in the blood collection process during the COVID-19 outbreak.

    abstract:Because of government intervention, such as quarantine and cancellation of public events at the peak of the COVID-19 outbreak and donors' health scare of exposure to the virus in medical centers, the number of blood donors has considerably decreased. In some countries, the rate of blood donation has reached lower than...

    journal_title:Applied soft computing

    pub_type: Journal article

    doi:10.1016/j.asoc.2021.107821

    authors: Samani MRG,Hosseini-Motlagh SM

    updated:2021-08-13