CSRBoost: Clustered Sampling With Resampling Boosting for Imbalanced Dataset Pattern Classification

Bibliographic Details
Published in: IEEE Access
Main Authors: Seema Yadav, Dhruvanshu Joshi, Soham Mulye, Labib Asari, Sandeep S. Udmale, Girish P. Bhole
Format: Article
Language: English
Published: IEEE 2025-01-01
Online Access: https://ieeexplore.ieee.org/document/11184792/
Description
Summary: Data mining and machine learning (DM & ML) approaches frequently face class imbalance (CI) issues, especially in binary classification tasks where one class significantly outnumbers the other. Because they tend to favor the majority class, traditional DM & ML methods may exhibit biases due to insufficient oversampling and perform poorly at identifying minority-class instances, even though minority classes often carry critical importance. This in turn raises concerns about algorithmic fairness. To address CI difficulties, it is essential to increase the ability to identify different discriminatory patterns in the data by creating a large number of test cases. The proposed approach aims to achieve fairer, unbiased model performance. We present CSRBoost, an ensemble learning method for tackling the CI problem. CSRBoost combines three methods: AdaBoost, undersampling, and oversampling. To improve the model’s capacity for generalization and to provide a representative, balanced dataset, the approach dynamically adjusts clusters so that the majority class is captured at sufficient granularity in a controlled manner. It retains the structural information of the dataset, so the various horizons within the dataset remain unbroken. In addition, SMOTE and AdaBoost allow the model to adapt to complex boundaries by enhancing data diversity and minority-class representation. CSRBoost is a dependable solution for CI data found in real-world classification tasks, as evidenced by its enhanced performance in handling imbalanced datasets.
ISSN: 2169-3536
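
The summary describes a resampling stage that clusters the majority class before undersampling it and uses SMOTE to oversample the minority class. The sketch below illustrates that general idea only; it is not the authors' implementation, and this record does not give the paper's exact procedure. The function names (kmeans, cluster_undersample, smote_like, resample) and the parameters k, keep_fraction, and n_neighbors are hypothetical, chosen for illustration. The boosting step (AdaBoost) is omitted; the balanced output would be handed to a boosted classifier.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if a cluster went empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def cluster_undersample(majority, k, keep_fraction, rng):
    """Keep a fraction of every majority cluster so each region stays represented."""
    kept = []
    for cluster in kmeans(majority, k, rng):
        n_keep = max(1, round(keep_fraction * len(cluster)))
        kept.extend(rng.sample(cluster, n_keep))
    return kept


def smote_like(minority, n_new, rng, n_neighbors=3):
    """SMOTE-style oversampling: interpolate between a point and a near neighbor."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:n_neighbors]
        mate = rng.choice(neighbors)
        t = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic


def resample(majority, minority, k=3, keep_fraction=0.5, seed=0):
    """Return (undersampled majority, oversampled minority) of equal size when possible."""
    rng = random.Random(seed)
    kept = cluster_undersample(majority, k, keep_fraction, rng)
    n_new = max(0, len(kept) - len(minority))
    return kept, minority + smote_like(minority, n_new, rng)
```

Points are assumed to be tuples of floats, and the minority class needs at least two points for interpolation to work. Sampling within each cluster, rather than across the whole majority class, is what keeps every region of the majority class present after undersampling.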