| Summary: | Data mining and machine learning (DM & ML) approaches frequently face class imbalance (CI), especially in binary classification tasks in which one class significantly outnumbers the other. Because they tend to favor the majority class, traditional DM & ML methods may be biased when the minority class is not sufficiently oversampled, and they may perform poorly at identifying minority-class instances, even though those instances often carry critical importance. This bias also raises concerns about algorithmic fairness. Addressing CI requires improving the ability to identify diverse discriminatory patterns in the data, for example by generating a large number of test cases. The proposed approach aims to achieve fairer, less biased model performance. We propose CSRBoost, an ensemble learning method for the CI problem. CSRBoost combines three techniques: AdaBoost, undersampling, and oversampling. To improve the model's ability to generalize and to produce a representative, balanced dataset, CSRBoost clusters the majority class and adjusts the clusters dynamically, in a controlled manner, so that they capture the majority class at sufficient granularity. This preserves the structural information of the dataset, so the distinct regions within the data remain intact. In addition, SMOTE and AdaBoost let the model adapt to complex decision boundaries by increasing data diversity and minority-class representation. Its improved performance on imbalanced datasets indicates that CSRBoost is a dependable solution for class-imbalanced data in real-world classification tasks.
|
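The combination described in the summary (cluster-guided undersampling of the majority class, SMOTE oversampling of the minority class, then AdaBoost) can be illustrated with a small, dependency-free Python sketch. This is not the authors' CSRBoost implementation. The fixed number of k-means clusters, the proportional per-cluster sampling, the "meet halfway" balancing target, and the decision-stump AdaBoost are assumptions made here for illustration (the summary says the clusters are adjusted dynamically; this sketch uses a fixed `k`). All function names (`kmeans`, `cluster_undersample`, `smote`, `adaboost`, `csrboost_sketch`) are hypothetical. Labels are assumed to be `+1` for the minority class and `-1` for the majority class.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def cluster_undersample(majority, n_keep, k, rng):
    """Keep about n_keep majority points, sampled proportionally from each k-means cluster.

    Every cluster keeps at least one point, so sparse regions of the majority class survive.
    """
    kept = []
    for cl in kmeans(majority, k, rng):
        share = max(1, round(n_keep * len(cl) / len(majority)))
        kept.extend(rng.sample(cl, min(share, len(cl))))
    return kept


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate toward one of a point's k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbour = rng.choice(sorted(others, key=lambda p: math.dist(p, base))[:k])
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def fit_stump(X, y, w):
    """Best single-feature threshold rule under weights w; labels are -1/+1.

    Returns (weighted_error, feature, threshold, sign): predict sign if x[feature] <= threshold.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[f] <= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] <= thr else -sign


def adaboost(X, y, rounds=20):
    """Discrete AdaBoost with decision stumps; returns a list of (alpha, stump) pairs."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero on a perfect split
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified points, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    return 1 if sum(a * stump_predict(s, x) for a, s in model) >= 0 else -1


def csrboost_sketch(X, y, k_clusters=4, k_neighbours=3, rounds=20, seed=0):
    """Hypothetical pipeline: cluster-based undersampling + SMOTE + AdaBoost (minority = +1)."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == -1]
    # Assumed balancing target: meet halfway between the two class sizes.
    target = (len(majority) + len(minority)) // 2
    kept = cluster_undersample(majority, target, k_clusters, rng)
    synthetic = smote(minority, max(0, len(kept) - len(minority)), k_neighbours, rng)
    Xb = kept + minority + synthetic
    yb = [-1] * len(kept) + [1] * (len(minority) + len(synthetic))
    return adaboost(Xb, yb, rounds)
```

Sampling each cluster in proportion to its size, with at least one point per cluster, is what keeps small or sparse regions of the majority class from vanishing during undersampling; a plain random undersampler can drop them entirely. Because every synthetic point lies on a segment between two real minority points, SMOTE adds diversity without leaving the minority class's own region.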