A Comparative Study of Feature Selection and Classification Techniques in Different Domain

Bibliographic Details
Main Author: Noviyanti Tri Maretta Sagala
Other Authors: Jenq-Haur Wang
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/g447fs
Description
Summary: Master's === National Taipei University of Technology === International Graduate Program, College of Electrical Engineering and Computer Science === 105 === No individual classification or feature selection technique has been shown to handle all kinds of classification problems. A comparison of classifiers and feature selectors drawn from many areas of knowledge, over data sets with different properties and domains, is useful for obtaining a clear idea of the performance and capabilities of each classifier and feature selector. The objective is to select the technique (classifier and feature selector) most likely to achieve the best performance for any given domain of data set. In this study, we focus on classifying data sets with different domains and properties: numerical, categorical, and textual. We use a one-versus-all strategy to handle multi-class problems. In the experiments, we compared the performance of four classification techniques, namely Boosted C5.0, KNN, Naïve Bayes, and SVM. We further evaluated the effects of three feature selection methods: Correlation-based Feature Selection, Information Gain, and ReliefF. Different numbers of folds in k-fold cross-validation were used to evaluate classification performance. For numerical data sets (both low- and high-dimensional), KNN performed better than the other classification methods. For categorical and textual data sets, Naïve Bayes and SVM performed best, respectively. We also found that 10-fold cross-validation improved classifier performance more effectively than 5-fold cross-validation.
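The experimental setup described in the summary (feature selection, several classifiers, one-versus-all multi-class handling, and 5-fold vs. 10-fold cross-validation) can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the data set is synthetic, a gradient-boosted tree ensemble stands in for Boosted C5.0 (which has no direct scikit-learn equivalent), and mutual information is used as an Information-Gain-style selection criterion.

```python
# Illustrative sketch of the abstract's evaluation protocol using scikit-learn.
# Assumptions: synthetic data, GradientBoostingClassifier in place of Boosted
# C5.0, and mutual_info_classif as an Information-Gain-style feature scorer.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# A small synthetic multi-class data set (3 classes, 20 numerical features).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

classifiers = {
    "Boosted trees": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "SVM (one-vs-all)": OneVsRestClassifier(LinearSVC(dual=False)),
}

for name, clf in classifiers.items():
    # Keep the 10 highest-scoring features by mutual information,
    # then fit the classifier on the reduced feature set.
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(mutual_info_classif, k=10), clf)
    for folds in (5, 10):  # compare 5-fold vs. 10-fold cross-validation
        acc = cross_val_score(pipe, X, y, cv=folds).mean()
        print(f"{name:18s} {folds:2d}-fold accuracy = {acc:.3f}")
```

Wrapping only the SVM in `OneVsRestClassifier` mirrors the one-versus-all strategy mentioned in the summary; the other estimators handle multiple classes natively.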