A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems

There has been growing attention to multi-class classification problems, particularly the challenges posed by imbalanced class distributions. To address these challenges, various strategies, including data-level re-sampling and ensemble methods, have been introduced to bolster the performance...

Full description

Bibliographic details
Container / Database: Frontiers in Digital Health
Main authors: Yuxuan Yang, Hadi Akbarzadeh Khorshidi, Uwe Aickelin
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-07-01
Subjects:
Online access: https://www.frontiersin.org/articles/10.3389/fdgth.2024.1430245/full
author Yuxuan Yang
Hadi Akbarzadeh Khorshidi
Uwe Aickelin
collection DOAJ
container_title Frontiers in Digital Health
description There has been growing attention to multi-class classification problems, particularly the challenges posed by imbalanced class distributions. To address these challenges, various strategies, including data-level re-sampling and ensemble methods, have been introduced to bolster the performance of predictive models and Artificial Intelligence (AI) algorithms in scenarios where an excessive level of imbalance is present. While most research and algorithm development has focused on binary classification, health informatics shows increasing interest in multi-class classification on imbalanced datasets. Multi-class imbalance brings more complex challenges, because synthetic data must be generated carefully so that the relationships between the multiple classes are maintained. The aim of this review is to examine over-sampling methods tailored for medical and other datasets with multi-class imbalance. Out of 2,076 peer-reviewed papers identified through searches, 197 eligible papers were reviewed for inclusion, and 37 studies were selected for in-depth analysis. These studies are grouped into four categories: metric, adaptive, structure-based, and hybrid approaches. The most significant finding is the emerging trend toward hybrid resampling methods that combine the strengths of various techniques to address imbalanced data effectively. This paper provides an extensive analysis of each selected study, discusses their findings, and outlines directions for future research.
format Article
id doaj-art-e5a2e31b16c04ad7bec05f7e9dcfa281
institution Directory of Open Access Journals
issn 2673-253X
language English
publishDate 2024-07-01
publisher Frontiers Media S.A.
record_format Article
spelling doaj-art-e5a2e31b16c04ad7bec05f7e9dcfa281
2025-08-20T00:38:21Z
eng
Frontiers Media S.A.
Frontiers in Digital Health
ISSN 2673-253X
2024-07-01
Volume 6
DOI 10.3389/fdgth.2024.1430245
Article 1430245
A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems
Yuxuan Yang (0): School of Computing and Information Systems, The University of Melbourne, Parkville, VIC, Australia
Hadi Akbarzadeh Khorshidi (1): School of Computing and Information Systems, The University of Melbourne, Parkville, VIC, Australia
Hadi Akbarzadeh Khorshidi (2): Cancer Health Services Research, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC, Australia
Uwe Aickelin (3): School of Computing and Information Systems, The University of Melbourne, Parkville, VIC, Australia
title A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems
topic over-sampling
re-sampling
multi-class
imbalanced
review
medical
url https://www.frontiersin.org/articles/10.3389/fdgth.2024.1430245/full