Optimal Re-sampling Strategy for Multi-Class Imbalanced Data

Bibliographic Details
Main Authors: Wu, Ping-Yi, 吳秉怡
Other Authors: Tong, Lee-Ing
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/72083335909084160963
Description
Summary: Master's thesis === National Chiao Tung University === Department of Industrial Engineering and Management === 101 === In many fields, developing an effective classification model to predict the category of incoming data is an important problem. For example, a classification model can be used to predict which types of goods customers will purchase or to determine whether a loan customer will default. However, real-world categorical data are often imbalanced; that is, the sample size of one class is significantly greater than that of the others. In this case, most classification methods fail to construct an accurate model for the imbalanced data. Several studies have focused on developing binary classification models, but these models are not appropriate for data involving three or more categories. Therefore, this study introduces an optimal re-sampling strategy that uses design of experiments (DOE) and dual response surface methodology (DRS) to improve the accuracy of classification models for multi-class imbalanced data. Real cases from the KEEL dataset repository are used to demonstrate the effectiveness of the proposed procedure.
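
To illustrate what re-sampling a multi-class imbalanced data set can look like, the following is a minimal Python sketch of plain random re-sampling to per-class target sizes. It is not the procedure proposed in the thesis: the function name `resample_to_targets`, the `targets` mapping, and the toy data are all assumptions made for illustration. The per-class target sizes are the kind of setting an experimental design could vary, but how the thesis selects them with DOE and DRS is not described in this record.

```python
import random
from collections import Counter, defaultdict


def resample_to_targets(samples, labels, targets, seed=0):
    """Randomly re-sample each class to the size given in `targets`.

    Hypothetical helper for illustration only. Classes larger than their
    target are under-sampled without replacement; smaller classes keep all
    original items and are padded with random duplicates.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for cls, items in by_class.items():
        n = targets.get(cls, len(items))  # classes without a target are left unchanged
        if n <= len(items):
            chosen = rng.sample(items, n)  # under-sample
        else:
            chosen = items + rng.choices(items, k=n - len(items))  # over-sample
        out_x.extend(chosen)
        out_y.extend([cls] * n)
    return out_x, out_y


# Toy three-class data set with a 90 / 8 / 2 split (assumed values).
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
samples = list(range(100))

new_x, new_y = resample_to_targets(samples, labels, {"a": 30, "b": 30, "c": 30})
counts = Counter(new_y)
print(counts)
```

Random duplication is only the simplest over-sampling choice; it balances class sizes but adds no new information about the minority classes.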