Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms

Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Feature selection is an important pre-processing step in mining and learning. A good set of features can not only improve classification accuracy but also reduce the time needed to derive rules. It is especially useful when the number of attributes in a give...

Bibliographic Details
Main Authors:	Po-Cheng Wang, 王博正
Other Authors:	Tzung-Pei Hong
Format:	Others
Language:	en_US
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/mcvf5e

id	ndltd-TW-097NSYS5392055
record_format	oai_dc
spelling	ndltd-TW-097NSYS5392055 2019-05-29T03:42:54Z http://ndltd.ncl.edu.tw/handle/mcvf5e Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms 基於基因演算法之自動屬性分群與特徵選取 Po-Cheng Wang 王博正 Master's, National Sun Yat-sen University, Department of Computer Science and Engineering, 97 Tzung-Pei Hong 洪宗貝 2009 thesis 83 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 97 === Feature selection is an important pre-processing step in mining and learning. A good set of features can not only improve classification accuracy but also reduce the time needed to derive rules. It is especially useful when the number of attributes in the given training data is very large. This thesis therefore proposes three GA-based clustering methods for attribute clustering and feature selection. In the first method, each feasible clustering result is encoded as a chromosome of positive integers, with one gene per attribute; the value of a gene represents the cluster to which the attribute belongs. The fitness of each individual is evaluated using both the average accuracy of attribute substitutions within clusters and the cluster balance. The second method extends the first to improve time performance: it proposes a new fitness function based on both accuracy and attribute dependency, which reduces the time spent scanning the database. The third method uses a different encoding scheme for chromosomes and achieves faster convergence and better results than the second. Finally, experimental comparisons with the k-means clustering approach and with all combinations of attributes show that the proposed approaches achieve a good trade-off between accuracy and time complexity. In addition, after feature selection, rules derived from only the selected features may be hard to use if some values of the selected features cannot be obtained in the current environment. The proposed approaches solve this problem easily: attributes with missing values can be replaced by other attributes in the same cluster. The proposed approaches thus provide flexible alternatives for feature selection.
author2	Tzung-Pei Hong
author_facet	Tzung-Pei Hong Po-Cheng Wang 王博正
author	Po-Cheng Wang 王博正
author_sort	Po-Cheng Wang
title	Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/mcvf5e
_version_	1719193010827165696
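
The integer encoding that the description gives for the first method (one gene per attribute, with the gene value naming the attribute's cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the thesis's implementation: the attribute names are hypothetical, the helper functions `decode`, `cluster_balance`, and `substitutes` are invented here, and the balance measure (normalized entropy of cluster sizes) is a stand-in, since the record does not define the measure the thesis actually uses.

```python
import math
from collections import defaultdict


def decode(chromosome, attributes):
    """Group attribute names by the cluster label stored in each gene."""
    clusters = defaultdict(list)
    for attr, label in zip(attributes, chromosome):
        clusters[label].append(attr)
    return dict(clusters)


def cluster_balance(chromosome, k):
    """Assumed balance measure: normalized entropy of cluster sizes, in [0, 1]."""
    if k < 2:
        return 1.0
    n = len(chromosome)
    sizes = [chromosome.count(label) for label in range(1, k + 1)]
    entropy = -sum((s / n) * math.log(s / n) for s in sizes if s > 0)
    return entropy / math.log(k)


def substitutes(attr, chromosome, attributes):
    """Attributes in the same cluster that could stand in for `attr`."""
    label = chromosome[attributes.index(attr)]
    return [a for a, g in zip(attributes, chromosome) if g == label and a != attr]


attributes = ["age", "income", "height", "weight"]  # hypothetical attribute names
chromosome = [1, 1, 2, 2]  # gene i holds the cluster label of attributes[i]

clusters = decode(chromosome, attributes)
balance = cluster_balance(chromosome, k=2)
alternatives = substitutes("age", chromosome, attributes)
```

Under this encoding, the attribute-replacement idea in the description reduces to a lookup: if a value for "age" cannot be obtained, any attribute returned by `substitutes` is a candidate replacement from the same cluster.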

Similar Items