A Novel Algorithm Using N-link Average for Hierarchical Automatic Clustering

Bibliographic Details
Main Authors: Jia-Hsien Chang, 張加憲
Other Authors: Wei-Ping Lee
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/19050667026272691022
Description
Summary: Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 98 === This study proposes a novel method that uses N-link average for hierarchical automatic clustering. The method can discover clusters of arbitrary shape and improves clustering accuracy by effectively avoiding the chaining effect. Compared with methods in the related literature, it clusters data more correctly in automatic clustering analysis. The algorithm first applies the concept of Shared Nearest Neighbors as a preliminary noise filter and then uses the k-means algorithm to divide the data set into multiple sub-clusters. It then runs agglomerative hierarchical clustering, in which the N-link average algorithm judges the gap distance between clusters. After the hierarchical agglomeration, the method selects the best clustering by analyzing the gap at each step and merges the noise points into their nearest neighbors. N-link average strengthens Single-link: its distance criterion is extended from the single minimum-distance point to a surface between two clusters. The study also draws on gravity theory to introduce quality factors, so that outliers remaining after the initial noise filtering are merged preferentially into the nearest large cluster. This prevents outliers from being treated as independent clusters and degrading the clustering result. The experiments use two-dimensional synthetic data to compare the method separately with partitional clustering algorithms (k-means and PAM), hierarchical clustering algorithms (Single-link, Complete-link, Group average, and Centroid), and a two-phase clustering algorithm based on k-means and hierarchical clustering with the single-linkage agglomerative method. The results show that the proposed method clusters data sets of arbitrary shape more correctly. In addition, in a comparison of automatic clustering accuracy with other related literature on the CHAMELEON data sets, the method judges the number of clusters more precisely.
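
The summary does not give a precise definition of N-link average. As a minimal sketch, assuming it means the mean of the N smallest pairwise distances between two clusters (so that N = 1 reduces to Single-link), the inter-cluster distance might be computed as follows. The function name `n_link_average`, the Euclidean metric, and the sample points are illustrative assumptions, not taken from the thesis.

```python
import heapq
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def n_link_average(cluster_a: Sequence[Point],
                   cluster_b: Sequence[Point],
                   n: int) -> float:
    """Mean of the n smallest pairwise Euclidean distances between two clusters.

    ASSUMPTION: this is one plausible reading of "N-link average";
    the thesis itself may define the measure differently.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must be non-empty")
    # All pairwise distances between a point in A and a point in B.
    distances = (math.dist(p, q) for p in cluster_a for q in cluster_b)
    # If fewer than n pairs exist, nsmallest simply returns all of them.
    closest = heapq.nsmallest(n, distances)
    return sum(closest) / len(closest)


# Hypothetical sample clusters for illustration only.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]

single_link = n_link_average(a, b, 1)   # n = 1: the single closest pair
two_link = n_link_average(a, b, 2)      # n = 2: average of the two closest pairs
all_pairs = n_link_average(a, b, 10)    # n >= number of pairs: mean over all pairs
```

Under this reading, a larger N makes the criterion depend on several close point pairs rather than one, which is how a single stray bridge point would carry less weight than it does under Single-link.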