Efficient Clustering Method Based on Density Peaks With Symmetric Neighborhood Relationship

Bibliographic Details
Main Authors: Chunrong Wu, Jia Lee, Teijiro Isokawa, Jun Yao, Yunni Xia
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8695694/
Description
Summary: Density peaks clustering (DPC) is a clustering method proposed by Rodriguez and Laio (Science, 2014) that uses a decision graph to identify the cluster centers of a set of data points. An improper choice of its cut-off distance parameter leads to a wrong selection of initial cluster centers, and the subsequent assignment process has no way to correct it, so DPC may fail to identify cluster centers of different densities accurately. In particular, all cluster centers are fixed as soon as they are detected, after which DPC simply assigns each point to the same cluster as its nearest neighbor of higher density. This tends to cause erroneous assignments of data points and thus degrades the efficiency of clustering. In this paper, we propose a robust clustering method that establishes a symmetric neighborhood graph over all data points, based on the k-nearest neighbors and reverse k-nearest neighbors of each point. To distinguish the density peaks from the other data points, the local density of each point is calculated using its reverse k-nearest neighbors. Initial cluster centers are then estimated from the peaks, and similar clusters are aggregated on the symmetric neighborhood graph, so that every point ends up assigned to a cluster. To test the efficiency of the new clustering method, numerical experiments and comparisons have been carried out on a variety of artificial and real data sets for clustering.
ISSN:2169-3536
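
The summary above mentions k-nearest neighbors, reverse k-nearest neighbors, and a symmetric neighborhood graph without giving formulas. The following is a minimal sketch of those notions only, not the authors' method. It rests on two assumptions that the record does not confirm: that two points are linked in the symmetric graph when each is among the other's k-nearest neighbors, and that the local density of a point can be approximated by the number of its reverse k-nearest neighbors. The function names `knn_indices` and `symmetric_neighborhood` are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array whose row i holds the indices of the k nearest neighbors of point i."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def symmetric_neighborhood(X, k):
    """Return (symmetric adjacency matrix, reverse-kNN count per point).

    Assumption: an edge i-j exists when j is in kNN(i) and i is in kNN(j).
    Assumption: the density proxy is the number of reverse k-nearest neighbors.
    """
    n = X.shape[0]
    knn = knn_indices(X, k)
    is_knn = np.zeros((n, n), dtype=bool)
    is_knn[np.arange(n)[:, None], knn] = True  # is_knn[i, j]: j is among the k-NN of i
    is_rknn = is_knn.T                         # is_rknn[i, j]: j is a reverse k-NN of i
    symmetric = is_knn & is_rknn               # mutual neighbors only
    density = is_rknn.sum(axis=1)              # how many points list i among their k-NN
    return symmetric, density


# Four points on a line, k = 1 (illustrative data, not from the paper)
X = np.array([[0.0], [1.0], [2.5], [10.0]])
sym, rho = symmetric_neighborhood(X, k=1)
print(rho.tolist())  # [1, 2, 1, 0]
```

In this toy example the isolated point at 10.0 has no reverse neighbors, so its density proxy is zero, while the point at 1.0 is chosen by two others and stands out as the local peak. The dense pairwise distance matrix keeps the sketch short but needs O(n²) memory; a spatial index would be the usual choice for larger data sets.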