Distance Features in Automatic Data Classification

Master's === 國立中央大學 === 資訊管理研究所 === 98 === In data mining and pattern classification, feature extraction and representation is a very important step, since the extracted features have a direct and significant impact on classification accuracy. In the literature, a number of novel feature extraction and repr...

Bibliographic Details
Main Authors: Zhen-fu Hong, 洪振富
Other Authors: Chih-fong Tsai
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/56066844700694151443
id ndltd-TW-098NCU05396010
record_format oai_dc
spelling ndltd-TW-098NCU05396010 2016-04-20T04:17:46Z http://ndltd.ncl.edu.tw/handle/56066844700694151443 Distance Features in Automatic Data Classification 距離式特徵於資料自動分類之研究 Zhen-fu Hong 洪振富 Master's 國立中央大學 資訊管理研究所 98 Chih-fong Tsai 蔡志豐 2010 thesis 68 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 國立中央大學 === 資訊管理研究所 === 98 === In data mining and pattern classification, feature extraction and representation is a very important step, since the extracted features have a direct and significant impact on classification accuracy. In the literature, a number of novel feature extraction and representation methods have been proposed. However, many of them focus only on specific domain problems. In this thesis, we introduce a novel distance-based feature extraction method for various pattern classification problems. Specifically, three distances are extracted, based on the distance between a data sample and its intra-cluster center and the distances between the data sample and its extra-cluster centers. Experiments are conducted on ten datasets containing different numbers of classes, samples, and dimensions. The experimental results using naïve Bayes, k-NN, and SVM classifiers show that concatenating the distance-based features to the original features provided by the datasets can improve classification accuracy, except on image-related datasets. In particular, the distance-based features are suitable for datasets with fewer classes, fewer samples, and lower feature dimensionality. Two further datasets with similar characteristics are used to validate this finding. The result is consistent with that of the first experiment: adding the distance-based features can improve classification performance.
author2 Chih-fong Tsai
author_facet Chih-fong Tsai
Zhen-fu Hong
洪振富
author Zhen-fu Hong
洪振富
title Distance Features in Automatic Data Classification
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/56066844700694151443
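
The abstract in the description field mentions distances to an intra-cluster center and to extra-cluster centers, concatenated with the original features. The record does not define the three distances exactly, so the following is only a minimal sketch under stated assumptions: cluster assignments are taken as given, distances are Euclidean, the extra-cluster distance is the sum over all other centers, and the third feature is the sum of the first two. The function name `distance_features` and the toy data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def distance_features(X, assignments):
    """Return an (n_samples, 3) array of illustrative distance features.

    Columns (all assumptions, not the thesis's confirmed definitions):
      0: Euclidean distance to the sample's own (intra-) cluster center
      1: sum of Euclidean distances to all other (extra-) cluster centers
      2: column 0 + column 1
    """
    X = np.asarray(X, dtype=float)
    assignments = np.asarray(assignments)
    cluster_ids = np.unique(assignments)  # sorted unique cluster labels
    # One center per cluster: the mean of the samples assigned to it.
    centers = np.stack([X[assignments == c].mean(axis=0) for c in cluster_ids])
    # Distances from every sample to every center: (n_samples, n_clusters).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Column index of each sample's own cluster in `centers`.
    own = np.searchsorted(cluster_ids, assignments)
    d_intra = dists[np.arange(X.shape[0]), own]
    d_extra = dists.sum(axis=1) - d_intra
    return np.column_stack([d_intra, d_extra, d_intra + d_extra])

# Toy data: two clusters of two points each (hypothetical values).
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
assignments = np.array([0, 0, 1, 1])

feats = distance_features(X, assignments)
# Concatenate the distance features with the original features.
X_augmented = np.hstack([X, feats])
```

In this sketch `X_augmented` would then be passed to a classifier such as naïve Bayes, k-NN, or SVM in place of `X`. How the clusters are formed, and how centers are applied to unseen test samples, is left open here because the record does not specify it.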