Fast Clustering by Affinity Propagation Based on Density Peaks

Clustering is an important technique in data mining and knowledge discovery. Affinity propagation clustering (AP) and density peaks and distance-based clustering (DDC) are two significant clustering algorithms proposed in 2007 and 2014 respectively. The two clustering algorithms have simple and clea...

Bibliographic Details
Main Authors:	Yang Li, Chonghui Guo, Leilei Sun
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Exemplar-based clustering; affinity propagation; density peaks
Online Access:	https://ieeexplore.ieee.org/document/9151946/

id	doaj-b56068b24e4b4c9089c6999f74199094
record_format	Article
spelling	doaj-b56068b24e4b4c9089c6999f74199094; 2021-03-30T04:37:48Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 138884-138897; DOI 10.1109/ACCESS.2020.3012740; document 9151946; Fast Clustering by Affinity Propagation Based on Density Peaks; Yang Li (https://orcid.org/0000-0002-8988-0491), Institute of Systems Engineering, Dalian University of Technology, Dalian, China; Chonghui Guo (https://orcid.org/0000-0002-5155-1297), Institute of Systems Engineering, Dalian University of Technology, Dalian, China; Leilei Sun (https://orcid.org/0000-0002-0157-1716), School of Computer Science and Engineering, Beihang University, Beijing, China; https://ieeexplore.ieee.org/document/9151946/; Exemplar-based clustering; affinity propagation; density peaks
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yang Li, Chonghui Guo, Leilei Sun
spellingShingle	Yang Li Chonghui Guo Leilei Sun Fast Clustering by Affinity Propagation Based on Density Peaks IEEE Access Exemplar-based clustering affinity propagation density peaks
author_facet	Yang Li, Chonghui Guo, Leilei Sun
author_sort	Yang Li
title	Fast Clustering by Affinity Propagation Based on Density Peaks
title_short	Fast Clustering by Affinity Propagation Based on Density Peaks
title_full	Fast Clustering by Affinity Propagation Based on Density Peaks
title_fullStr	Fast Clustering by Affinity Propagation Based on Density Peaks
title_full_unstemmed	Fast Clustering by Affinity Propagation Based on Density Peaks
title_sort	fast clustering by affinity propagation based on density peaks
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Clustering is an important technique in data mining and knowledge discovery. Affinity propagation clustering (AP) and density peaks and distance-based clustering (DDC) are two significant clustering algorithms proposed in 2007 and 2014 respectively. The two clustering algorithms have simple and clear design ideas, and are effective in finding meaningful clustering solutions. They have been widely used in various applications successfully. However, a key disadvantage of AP is its high time complexity, which has become a bottleneck when applying AP for large-scale problems. The core idea of DDC is to construct the decision graph based on the local density and the distance of each data point, and then select the cluster centers, but the selection of the cluster centers is relatively subjective, and sometimes it is difficult to determine a suitable number of cluster centers. Here, we propose a two-stage clustering algorithm, called DDAP, to overcome these shortcomings. First, we select a small number of potential exemplars based on the two quantities of each data point in DDC to greatly compress the scale of the similarity matrix. Then we implement message-passing on the incomplete similarity matrix. In experiments, two synthetic datasets, nine publicly available datasets, and a real-world electronic medical records (EMRs) dataset are used to evaluate the proposed method. The results demonstrate that DDAP can achieve comparable clustering performance with the original AP algorithm, while the computational efficiency improves observably.
topic	Exemplar-based clustering; affinity propagation; density peaks
url	https://ieeexplore.ieee.org/document/9151946/
work_keys_str_mv	AT yangli fastclusteringbyaffinitypropagationbasedondensitypeaks AT chonghuiguo fastclusteringbyaffinitypropagationbasedondensitypeaks AT leileisun fastclusteringbyaffinitypropagationbasedondensitypeaks
_version_	1724181466260176896
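
Illustrative note: the description above says the first stage selects a small number of potential exemplars from the two DDC quantities of each data point, the local density and the distance. The sketch below is one possible reading of that step, not the authors' implementation. The cutoff `d_c`, the count `k`, the product score `gamma = rho * delta`, and both function names are assumptions introduced for illustration; the record does not state the exact selection rule used in DDAP.

```python
import math


def density_peak_quantities(points, d_c):
    """Return (rho, delta) for each point, in the style of density-peaks clustering.

    rho[i]   = number of other points closer than the cutoff d_c (assumed kernel)
    delta[i] = distance to the nearest point of higher density; for the
               densest point, its largest distance to any other point
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest. Ties are broken by index so that
    # every point after the first has at least one predecessor to compare with.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


def candidate_exemplars(points, d_c, k):
    """Pick k candidate exemplars by the assumed score gamma = rho * delta."""
    rho, delta = density_peak_quantities(points, d_c)
    gamma = [r * d for r, d in zip(rho, delta)]
    top = sorted(range(len(points)), key=lambda i: -gamma[i])[:k]
    return sorted(top)
```

Under this reading, the second stage would run message passing only on similarities between each data point and the returned candidates, which is how the similarity matrix would shrink from n x n to roughly n x k entries.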

Similar Items