An Improving Majority Weighted Minority Oversampling Technique for Imbalanced Classification Problem

Minority oversampling techniques have played a pivotal role in imbalanced learning. However, traditional oversampling algorithms can cause problems such as intra-class imbalance among samples, neglect of important boundary-sample information, and high similarity between new and old samples....


Bibliographic Details
Main Authors: Chao-Ran Wang, Xin-Hui Shao
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Oversampling; boundary; minority sample; SMOTE; BIRCH; imbalanced learning
Online Access: https://ieeexplore.ieee.org/document/9311147/
id doaj-3b33d7e789e14729a5a475e0d7fbcf12
record_format Article
spelling doaj-3b33d7e789e14729a5a475e0d7fbcf12
2021-03-30T15:16:46Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2021-01-01
Volume 9, pages 5069-5082
DOI 10.1109/ACCESS.2020.3047923
IEEE document 9311147
Chao-Ran Wang (https://orcid.org/0000-0002-2071-2633), College of Sciences, Northeastern University, Shenyang, China
Xin-Hui Shao (https://orcid.org/0000-0002-4120-8428), College of Sciences, Northeastern University, Shenyang, China
collection DOAJ
language English
format Article
sources DOAJ
author Chao-Ran Wang
Xin-Hui Shao
title An Improving Majority Weighted Minority Oversampling Technique for Imbalanced Classification Problem
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Minority oversampling techniques have played a pivotal role in imbalanced learning. However, traditional oversampling algorithms can cause problems such as intra-class imbalance among samples, neglect of important boundary-sample information, and high similarity between new and old samples. To address these problems, we propose a new oversampling method, BIRCH and Boundary Midpoint Centroid Synthetic Minority Over-Sampling Technique (BI-BMCSMOTE). First, the algorithm uses the BIRCH clustering method to cluster the minority samples quickly. After identifying and removing noise, it labels the boundary minority samples by probability. Second, it generates a density function for each sample cluster, calculates the cluster's density and sampling weight, performs midpoint composite sampling between the probability-labeled boundary minority samples and other minority samples in each cluster, and then calculates and analyzes the specific value of composite sampling to improve the accuracy of the model. The experimental results show that the algorithm is effective.
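The cluster-weighted midpoint step in the description can be illustrated with a minimal Python sketch. It is not the authors' BI-BMCSMOTE: the abstract does not give the density function, the weighting rule, or the boundary-marking rule, so everything below is an assumption. The function names `cluster_sampling_weights` and `midpoint_oversample` are hypothetical, density is taken as cluster size divided by mean distance to the cluster centroid, sparser clusters are given larger sampling weights, and the cluster labels and boundary mask are supplied by the caller (for example from a clustering step such as scikit-learn's `Birch`), with noise already removed.

```python
import numpy as np


def cluster_sampling_weights(X_min, labels):
    """Return a sampling weight per minority cluster (weights sum to 1).

    Assumption: density = cluster size / mean distance to the centroid,
    and the weight is inversely proportional to density, so sparse
    clusters receive more synthetic samples.
    """
    weights = {}
    for c in np.unique(labels):
        pts = X_min[labels == c]
        centroid = pts.mean(axis=0)
        # Small constant avoids division by zero for single-point clusters.
        spread = np.linalg.norm(pts - centroid, axis=1).mean() + 1e-12
        density = len(pts) / spread
        weights[c] = 1.0 / density
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def midpoint_oversample(X_min, labels, boundary_mask, n_new, seed=0):
    """Generate n_new synthetic minority points as within-cluster midpoints.

    Each new point is the midpoint of a boundary sample and another
    sample from the same cluster. boundary_mask is a boolean array
    supplied by the caller; how it is derived is outside this sketch.
    """
    rng = np.random.default_rng(seed)
    weights = cluster_sampling_weights(X_min, labels)
    clusters = list(weights)
    probs = [weights[c] for c in clusters]
    new_points = []
    for _ in range(n_new):
        # Pick a cluster according to its sampling weight.
        c = clusters[rng.choice(len(clusters), p=probs)]
        members = np.flatnonzero(labels == c)
        boundary = members[boundary_mask[members]]
        # Fall back to any member if the cluster has no boundary sample.
        a = rng.choice(boundary if len(boundary) else members)
        others = members[members != a]
        b = rng.choice(others) if len(others) else a
        new_points.append((X_min[a] + X_min[b]) / 2.0)
    return np.array(new_points).reshape(-1, X_min.shape[1])
```

Because every synthetic point is a midpoint of two samples from the same cluster, it stays inside that cluster's convex hull, which is one simple way to avoid generating points in majority-class regions between clusters.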
topic Oversampling
boundary
minority sample
SMOTE
BIRCH
imbalanced learning
url https://ieeexplore.ieee.org/document/9311147/