Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection

Kernel clustering of categorical data is a useful tool to process the separable datasets and has been employed in many disciplines. Despite recent efforts, existing methods for kernel clustering remain a significant challenge due to the assumption of feature independence and equal weights. In this s...

Bibliographic Details
Main Authors:	Hui Chen, Kunpeng Xu, Lifei Chen, Qingshan Jiang
Format:	Article
Language:	English
Published:	MDPI AG 2021-07-01
Series:	Mathematics
Subjects:	machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization
Online Access:	https://www.mdpi.com/2227-7390/9/14/1680

id	doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8
record_format	Article
spelling	doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8 | 2021-07-23T13:52:35Z | eng | MDPI AG | Mathematics | 2227-7390 | 2021-07-01 | 9 | 1680 | 1680 | 10.3390/math9141680 | Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection | Hui Chen (0), Kunpeng Xu (1), Lifei Chen (2), Qingshan Jiang (3) | (0) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China | (1) Department of Computer Science, University of Sherbrooke, Sherbrooke, QC J1K 2R1, Canada | (2) College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350007, China | (3) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China | Kernel clustering of categorical data is a useful tool to process the separable datasets and has been employed in many disciplines. Despite recent efforts, existing methods for kernel clustering remain a significant challenge due to the assumption of feature independence and equal weights. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) using the self-expressive kernel density estimation (SKDE) scheme, as well as a new feature-weighted non-linear similarity measurement. In the SKSCC algorithm, we propose an effective non-linear optimization method to solve the clustering algorithm’s objective function, which not only considers the relationship between attributes in a non-linear space but also assigns a weight to each attribute in the algorithm to measure the degree of correlation. A series of experiments on some widely used synthetic and real-world datasets demonstrated the better effectiveness and efficiency of the proposed algorithm compared with other state-of-the-art methods, in terms of non-linear relationship exploration among attributes. | https://www.mdpi.com/2227-7390/9/14/1680 | machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hui Chen, Kunpeng Xu, Lifei Chen, Qingshan Jiang
author_facet	Hui Chen, Kunpeng Xu, Lifei Chen, Qingshan Jiang
author_sort	Hui Chen
title	Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection
title_sort	self-expressive kernel subspace clustering algorithm for categorical data with embedded feature selection
publisher	MDPI AG
series	Mathematics
issn	2227-7390
publishDate	2021-07-01
description	Kernel clustering of categorical data is a useful tool to process the separable datasets and has been employed in many disciplines. Despite recent efforts, existing methods for kernel clustering remain a significant challenge due to the assumption of feature independence and equal weights. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) using the self-expressive kernel density estimation (SKDE) scheme, as well as a new feature-weighted non-linear similarity measurement. In the SKSCC algorithm, we propose an effective non-linear optimization method to solve the clustering algorithm’s objective function, which not only considers the relationship between attributes in a non-linear space but also assigns a weight to each attribute in the algorithm to measure the degree of correlation. A series of experiments on some widely used synthetic and real-world datasets demonstrated the better effectiveness and efficiency of the proposed algorithm compared with other state-of-the-art methods, in terms of non-linear relationship exploration among attributes.
topic	machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization
url	https://www.mdpi.com/2227-7390/9/14/1680

Similar Items