HABOS clustering algorithm for categorical data



Bibliographic details
Journal: 工程科学学报 (Chinese Journal of Engineering)
Main authors: WU Sen, JIANG Dan-dan, WANG Qiang
Format: Journal article
Language: Chinese
Publisher: Science Press, 2016-07-01
Online access: http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2016.07.018
Other bibliographic details
Abstract: The clustering algorithm based on sparse feature vector for categorical attributes (CABOSFVC) is an efficient high-dimensional clustering method for categorical data. It uses sparse feature dissimilarity (SFD) to measure distance and sparse feature vectors to compress the data. However, CABOSFVC depends on an upper-limit parameter for SFD, and there is no guidance on how to set it. To address CABOSFVC's sensitivity to this parameter, this paper proposes HABOS, a new heuristic hierarchical clustering algorithm for categorical data based on SFD. Subject to an upper limit on the number of clusters, HABOS applies agglomerative hierarchical clustering and uses a new internal clustering validation index based on SFD (CVISFD) to evaluate the results heuristically and select the best clustering level. The improved algorithm was compared with traditional algorithms on three UCI benchmark data sets. Empirical tests show that HABOS effectively improves clustering accuracy and stability.
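The abstract does not give the formulas for SFD or CVISFD, so the following Python sketch only illustrates the overall shape of the procedure it describes, under explicit assumptions. First, each cluster is summarized by a hypothetical sparse feature vector holding its size and, for each attribute, the value shared by all members (None where members disagree), so two clusters can be merged from their summaries alone. Second, SFD(X) is assumed to be e / (|X| * a), where a counts attributes on which all members of X agree and e counts the rest; this follows the general form used in the CABOSFV family, but the paper's exact definition may differ. Third, `level_score` is a hypothetical stand-in for CVISFD (separation minus compactness), not the paper's index. All names (`SFV`, `merge`, `sfd`, `level_score`, `habos_sketch`, `k_max`) are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Record = Tuple[str, ...]

@dataclass(frozen=True)
class SFV:
    """Hypothetical sparse feature vector: size, per-attribute shared value
    (None where members disagree), and member indices (kept only to report
    the final partition)."""
    n: int
    shared: Tuple[Optional[str], ...]
    members: Tuple[int, ...]

def singleton(i: int, rec: Record) -> SFV:
    return SFV(1, tuple(rec), (i,))

def merge(x: SFV, y: SFV) -> SFV:
    # The merged summary is computed from the two summaries, not the raw records.
    shared = tuple(a if a is not None and a == b else None
                   for a, b in zip(x.shared, y.shared))
    return SFV(x.n + y.n, shared, x.members + y.members)

def sfd(x: SFV) -> float:
    # ASSUMED form: SFD(X) = e / (|X| * a); infinite when no attribute is shared.
    a = sum(v is not None for v in x.shared)
    e = len(x.shared) - a
    return float("inf") if a == 0 else e / (x.n * a)

def level_score(clusters: List[SFV], total: int) -> float:
    # HYPOTHETICAL stand-in for CVISFD: cheapest further merge (separation)
    # minus size-weighted mean SFD of current clusters (compactness).
    compact = sum(c.n * sfd(c) for c in clusters) / total
    separation = min(sfd(merge(x, y)) for x, y in combinations(clusters, 2))
    return separation - compact

def habos_sketch(records: Sequence[Record], k_max: int) -> List[List[int]]:
    clusters = [singleton(i, r) for i, r in enumerate(records)]
    best, best_score = None, float("-inf")
    while len(clusters) > 1:
        # Only levels within the cluster-count upper limit are candidates.
        if len(clusters) <= k_max:
            score = level_score(clusters, len(records))
            if best is None or score > best_score:
                best, best_score = list(clusters), score
        # Greedy agglomeration: merge the pair whose union has the lowest SFD.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: sfd(merge(clusters[p[0]], clusters[p[1]])))
        merged = merge(clusters[i], clusters[j])
        if sfd(merged) == float("inf"):
            break  # no remaining pair shares any attribute value
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    if best is None:
        best = clusters  # no level within k_max was reached; return last level
    return [sorted(c.members) for c in best]
```

For example, `habos_sketch([("red", "small", "round"), ("red", "small", "square"), ("blue", "large", "square"), ("blue", "large", "round")], k_max=3)` returns `[[0, 1], [2, 3]]`: the two-cluster level scores highest because no further merge shares any attribute value. Keeping only the summaries makes each merge cost proportional to the number of attributes, which is the kind of compression the abstract attributes to sparse feature vectors.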
ISSN:2095-9389