An Approach for Mining Imbalanced Datasets Combining Specialized Oversampling and Undersampling Methods


Bibliographic Details
Published in: IEEE Access
Main Authors: Joanna Jedrzejowicz, Piotr Jedrzejowicz
Format: Article
Language: English
Published: IEEE 2023-01-01
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10339319/
Other Bibliographic Details
Summary: The paper proposes an approach for mining imbalanced datasets that combines specialized oversampling and undersampling methods. The oversampling part produces a set of non-dominated synthetic examples using two possibly conflicting criteria: classification potential and distance from the borderline between the minority and majority classes. The undersampling part removes majority-class examples that are likely to cause mistakes and disturbances in the mining process. To validate the approach, an extensive computational experiment was carried out. The performance of the proposed approach was compared with that of several leading algorithms proposed for balancing minority and majority datasets. To ensure fair comparisons, a single learner based on Gene Expression Programming (GEP) was used in all cases. The experimental results confirmed that the proposed approach outperforms the other methods investigated in the experiment.
ISSN:2169-3536
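
Illustrative note: the summary says the oversampling part keeps a set of non-dominated synthetic examples under two possibly conflicting criteria. The record does not say how those criteria are computed, so the following is only a minimal sketch of generic non-dominated (Pareto) filtering, not the authors' method. The function name `non_dominated`, the sample scores, and the assumption that larger values are better for both criteria are all hypothetical.

```python
def non_dominated(points):
    """Return the points that no other point dominates.

    Assumption (not from the record): each point is a pair of scores
    (criterion_1, criterion_2) and larger is better for both.
    A point q dominates p if q is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical scores for five candidate synthetic examples.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.1, 0.1)]
pareto_front = non_dominated(candidates)
print(pareto_front)
```

In this toy input, (0.5, 0.5) and (0.1, 0.1) are dropped because (0.6, 0.6) is better on both criteria; the remaining three candidates each trade one criterion against the other, which is what "possibly conflicting" implies.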