Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection

Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume that features are independent and equally weighted. In this s...

Bibliographic Details
Main Authors: Hui Chen, Kunpeng Xu, Lifei Chen, Qingshan Jiang
Format: Article
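Language: English
Published: MDPI AG 2021-07-01
Series: Mathematics
Subjects: machine learning, categorical data, similarity, feature selection, kernel density estimation, non-linear optimization
Online Access: https://www.mdpi.com/2227-7390/9/14/1680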
id doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8
record_format Article
spelling doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8
2021-07-23T13:52:35Z
eng
MDPI AG
Mathematics
2227-7390
2021-07-01
9
1680
1680
10.3390/math9141680
Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection
Hui Chen (0)
Kunpeng Xu (1)
Lifei Chen (2)
Qingshan Jiang (3)
(0) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
(1) Department of Computer Science, University of Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
(2) College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350007, China
(3) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
collection DOAJ
language English
format Article
sources DOAJ
author Hui Chen
Kunpeng Xu
Lifei Chen
Qingshan Jiang
title Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection
publisher MDPI AG
series Mathematics
issn 2227-7390
publishDate 2021-07-01
description Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume that features are independent and equally weighted. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses a self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measure. For the SKSCC algorithm, we propose an effective non-linear optimization method to solve the clustering objective function; the method not only considers the relationships between attributes in a non-linear space but also assigns a weight to each attribute to measure the degree of correlation. A series of experiments on widely used synthetic and real-world datasets shows that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes.
topic machine learning
categorical data
similarity
feature selection
kernel density estimation
non-linear optimization
url https://www.mdpi.com/2227-7390/9/14/1680
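
The description mentions kernel density estimation over categorical attributes and a feature-weighted similarity measure, but this record does not give the SKDE or SKSCC formulas. As a rough illustration of the general idea only, the sketch below uses the standard Aitchison-Aitken kernel for unordered categories and a weighted sum of per-attribute kernels. The function names, the single shared `bandwidth`, and the fixed `weights` are assumptions made for this example and are not taken from the article, which learns its weights during optimization.

```python
from collections import Counter


def aitchison_aitken(a, b, n_categories, bandwidth):
    """Aitchison-Aitken kernel for one categorical attribute.

    Returns 1 - bandwidth when the two values match, and spreads the
    remaining mass evenly over the other categories otherwise.
    bandwidth is expected to lie in [0, (n_categories - 1) / n_categories].
    """
    if n_categories < 2:
        return 1.0  # a single-category attribute cannot mismatch
    if a == b:
        return 1.0 - bandwidth
    return bandwidth / (n_categories - 1)


def kernel_density(column, bandwidth):
    """Kernel-smoothed probability of each category observed in `column`."""
    counts = Counter(column)
    n = len(column)
    c = len(counts)
    density = {}
    for value, count in counts.items():
        freq = count / n
        if c < 2:
            density[value] = 1.0
        else:
            # Average of the kernel between `value` and every sample.
            density[value] = (1.0 - bandwidth) * freq + bandwidth / (c - 1) * (1.0 - freq)
    return density


def weighted_similarity(x, y, n_categories, weights, bandwidth):
    """Attribute-weighted similarity: weighted sum of per-attribute kernels.

    `weights` is a hypothetical, user-supplied list (assumed to sum to 1).
    """
    return sum(
        w * aitchison_aitken(a, b, c, bandwidth)
        for a, b, c, w in zip(x, y, n_categories, weights)
    )


if __name__ == "__main__":
    colours = ["red", "red", "blue", "green"]
    print(kernel_density(colours, bandwidth=0.2))
    print(weighted_similarity(("red", "small"), ("red", "large"),
                              n_categories=[3, 2], weights=[0.7, 0.3],
                              bandwidth=0.2))
```

With `bandwidth=0` the density reduces to plain relative frequencies and the similarity to a weighted simple-matching score; larger bandwidths smooth both toward uniform.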