Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms

Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Feature selection is an important pre-processing step in mining and learning. A good set of features can not only improve the accuracy of classification, but also reduce the time to derive rules. It is executed especially when the amount of attributes in a give...

Bibliographic Details
Main Authors: Po-Cheng Wang, 王博正
Other Authors: Tzung-Pei Hong
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/mcvf5e
id ndltd-TW-097NSYS5392055
record_format oai_dc
spelling ndltd-TW-097NSYS5392055 2019-05-29T03:42:54Z http://ndltd.ncl.edu.tw/handle/mcvf5e Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms 基於基因演算法之自動屬性分群與特徵選取 Po-Cheng Wang 王博正 Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, 97 Tzung-Pei Hong 洪宗貝 2009 學位論文 ; thesis 83 en_US
collection NDLTD
sources NDLTD
description Master's thesis === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Feature selection is an important pre-processing step in data mining and machine learning. A good set of features not only improves classification accuracy but also reduces the time needed to derive rules. It is especially useful when the number of attributes in the training data is very large. This thesis proposes three GA-based methods for attribute clustering and feature selection. In the first method, each feasible clustering result is encoded as a chromosome of positive integers, with one gene per attribute; the value of a gene identifies the cluster to which that attribute belongs. The fitness of each individual is evaluated using both the average accuracy of attribute substitutions within clusters and the cluster balance. The second method extends the first to improve running time: it proposes a new fitness function, based on both accuracy and attribute dependency, that reduces the time spent scanning the database. The third method uses a different chromosome encoding and achieves faster convergence and better results than the second. Finally, experimental comparisons with k-means clustering and with exhaustive evaluation of all attribute combinations show that the proposed approaches achieve a good trade-off between accuracy and time complexity. In addition, rules derived from only the selected features can be hard to use when the values of some selected features cannot be obtained in the current environment. The proposed approaches handle this case easily: an attribute with missing values can be replaced by another attribute in the same cluster. The proposed approaches thus provide flexible alternatives for feature selection.
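The integer encoding used by the first method, and the idea of replacing an unavailable attribute with another from the same cluster, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `decode` and `substitute`, the attribute names, and the example chromosome are all hypothetical, and the fitness function is left out because the abstract does not define it.

```python
from collections import defaultdict


def decode(chromosome, attributes):
    """Group attribute names by the cluster label stored in each gene.

    chromosome[i] is a positive integer naming the cluster of attributes[i].
    """
    if len(chromosome) != len(attributes):
        raise ValueError("need exactly one gene per attribute")
    clusters = defaultdict(list)
    for attr, label in zip(attributes, chromosome):
        clusters[label].append(attr)
    return dict(clusters)


def substitute(attr, clusters, available):
    """Return attr if it is available, otherwise the first available
    attribute from the same cluster, or None if the cluster has none."""
    for members in clusters.values():
        if attr in members:
            if attr in available:
                return attr
            for other in members:
                if other in available:
                    return other
            return None
    raise KeyError(attr)


# Hypothetical attributes and chromosome, for illustration only.
attributes = ["age", "income", "height", "weight", "zip"]
chromosome = [1, 1, 2, 2, 3]

clusters = decode(chromosome, attributes)
print(clusters)
# "height" cannot be obtained, so fall back to a same-cluster attribute.
print(substitute("height", clusters, available={"age", "weight", "zip"}))
```

Here `height` and `weight` share cluster 2, so a rule that needs `height` could fall back to `weight` when `height` cannot be measured. Which member of a cluster best stands in for another is what the thesis's substitution-accuracy fitness term would decide; the sketch simply takes the first available one.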