Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection
Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume feature independence and equal feature weights. In this s...
Main Authors: | Hui Chen, Kunpeng Xu, Lifei Chen, Qingshan Jiang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Mathematics |
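Subjects: | machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization |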
Online Access: | https://www.mdpi.com/2227-7390/9/14/1680 |
id |
doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8 |
---|---|
record_format |
Article |
spelling |
doaj-f4226b36b05f4a9cb4ac7a0f79b2dfa8; 2021-07-23T13:52:35Z; eng; MDPI AG; Mathematics; 2227-7390; 2021-07-01; 9; 1680; 1680; 10.3390/math9141680; Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection; Hui Chen (0); Kunpeng Xu (1); Lifei Chen (2); Qingshan Jiang (3); (0) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; (1) Department of Computer Science, University of Sherbrooke, Sherbrooke, QC J1K 2R1, Canada; (2) College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350007, China; (3) Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume feature independence and equal feature weights. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses the self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measure. For the SKSCC algorithm, we propose an effective non-linear optimization method to solve the clustering objective function, which not only considers the relationships between attributes in a non-linear space but also assigns a weight to each attribute to measure the degree of correlation. A series of experiments on widely used synthetic and real-world datasets demonstrated that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes.; https://www.mdpi.com/2227-7390/9/14/1680; machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hui Chen; Kunpeng Xu; Lifei Chen; Qingshan Jiang |
title |
Self-Expressive Kernel Subspace Clustering Algorithm for Categorical Data with Embedded Feature Selection |
publisher |
MDPI AG |
series |
Mathematics |
issn |
2227-7390 |
publishDate |
2021-07-01 |
description |
Kernel clustering of categorical data is a useful tool for processing separable datasets and has been employed in many disciplines. Despite recent efforts, kernel clustering remains a significant challenge because existing methods assume feature independence and equal feature weights. In this study, we propose a self-expressive kernel subspace clustering algorithm for categorical data (SKSCC) that uses the self-expressive kernel density estimation (SKDE) scheme together with a new feature-weighted non-linear similarity measure. For the SKSCC algorithm, we propose an effective non-linear optimization method to solve the clustering objective function, which not only considers the relationships between attributes in a non-linear space but also assigns a weight to each attribute to measure the degree of correlation. A series of experiments on widely used synthetic and real-world datasets demonstrated that the proposed algorithm is more effective and efficient than other state-of-the-art methods at exploring non-linear relationships among attributes. |
topic |
machine learning; categorical data; similarity; feature selection; kernel density estimation; non-linear optimization |
url |
https://www.mdpi.com/2227-7390/9/14/1680 |