A Novel Boundary Oversampling Algorithm Based on Neighborhood Rough Set Model: NRSBoundary-SMOTE
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2013-01-01 |
Series: | Mathematical Problems in Engineering |
Online Access: | http://dx.doi.org/10.1155/2013/694809 |
Summary: | Rough set theory is a powerful mathematical tool introduced by Pawlak for dealing with imprecise, uncertain, and vague information. The Neighborhood-Based Rough Set Model extends rough set theory and can divide a dataset into three parts. One of these, the boundary region, is where majority class samples and minority class samples overlap. Using this knowledge of the original dataset's distribution, NRSBoundary-SMOTE oversamples only the minority class samples in the boundary region, that is, those that overlap with majority class samples. In this way, NRSBoundary-SMOTE expands the decision space for the minority class while shrinking the decision space for the majority class. In experiments with four classifiers, NRSBoundary-SMOTE achieves higher accuracy than the other methods compared when C4.5, CART, and KNN are used, but it performs worse than SMOTE with the SVM classifier. |
---|---|
ISSN: | 1024-123X 1563-5147 |
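
The summary describes oversampling only those minority class samples that fall in the boundary region. The following is a minimal Python sketch of that idea, not the authors' implementation: this record does not give the algorithm's details, so the neighborhood radius `delta`, the neighbor count `k`, the Euclidean distance, and all function names are assumptions made for illustration. Here a minority sample is treated as a boundary sample when its `delta`-neighborhood contains at least one sample of another class, and new points are generated by SMOTE-style linear interpolation toward nearby minority samples.

```python
import math
import random


def neighborhood(i, points, delta):
    """Indices of all points within distance delta of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= delta]


def boundary_minority(points, labels, minority, delta):
    """Minority samples whose delta-neighborhood contains a sample of another class.

    Assumption: this is a simplified stand-in for the boundary region of a
    neighborhood rough set; the article may define it differently.
    """
    out = []
    for i, lab in enumerate(labels):
        if lab != minority:
            continue
        if any(labels[j] != minority for j in neighborhood(i, points, delta)):
            out.append(i)
    return out


def oversample_boundary(points, labels, minority, delta, k=3, n_new=1, seed=0):
    """SMOTE-style interpolation applied only to boundary minority samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    synthetic = []
    for i in boundary_minority(points, labels, minority, delta):
        # k nearest minority neighbours of sample i, excluding i itself
        others = sorted(
            (j for j in minority_idx if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        if not others:
            continue
        for _ in range(n_new):
            j = rng.choice(others)
            t = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(
                tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
            )
    return synthetic


# Toy data (hypothetical): three minority points, three majority points.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (1.0, 0.0), (1.2, 0.0), (6.0, 6.0)]
labels = ["min", "min", "min", "maj", "maj", "maj"]
new_points = oversample_boundary(points, labels, "min", delta=0.6, k=1, n_new=2)
```

In this toy data only the minority point `(0.5, 0.0)` has a majority sample within `delta`, so it is the only one used as a seed for synthetic samples. Any further checks the article applies to generated samples are not covered by this sketch.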