Graph Clustering for Categorical Data

Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 107 === Clustering is a popular task in many fields, especially machine learning and data mining. Many existing clustering algorithms are designed for data with numerical attributes. Due to the popularity of big data, many collected d...

Bibliographic Details
Main Authors: Chen, Wei-Shiang, 陳威翔
Other Authors: Lin, Ja-Chen
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/69g376
id ndltd-TW-107NCTU5394036
record_format oai_dc
spelling ndltd-TW-107NCTU5394036 2019-05-16T01:40:47Z http://ndltd.ncl.edu.tw/handle/69g376 Graph Clustering for Categorical Data 文字型資料的圖形分群法 Chen, Wei-Shiang 陳威翔 Master's National Chiao Tung University Institute of Computer Science and Engineering 107 Lin, Ja-Chen 林志青 2018 thesis 44 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 107 === Clustering is a popular task in many fields, especially machine learning and data mining. Many existing clustering algorithms are designed for data with numerical attributes. Due to the popularity of big data, however, many collected datasets have categorical or nominal attributes. Transforming categorical data into numerical data with specific techniques may be a solution, but it can lose some of the essence of the original data. In this study, we use graph clustering for categorical data to solve this problem. Our first method uses a context-based similarity measure to estimate the similarity between data objects and thereby transforms a categorical dataset into a similarity matrix for a graph. We then feed the graph transition matrix into a neural network model to obtain a graph embedding matrix. Finally, a simple clustering algorithm is applied to the embedding matrix. Our second method extends the idea of the graph transition matrix used in the first method. With an additional input to the neural network model, we change the structure of the model and obtain better node representations and better clustering results. Four categorical datasets (Congress vote, Heart, Mushroom, and HIV) are tested in our experiments. The results show that both of our methods cluster the categorical data better than other categorical clustering methods.
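The first two steps named in the description (categorical records to a similarity matrix, then to a graph transition matrix) can be sketched as follows. This is only an illustration under stated assumptions: the record does not define the context-based similarity measure or how the transition matrix is built, so the sketch substitutes simple attribute matching for the similarity and plain row normalization for the transition matrix. The function names and the toy records are hypothetical, and the neural network embedding and final clustering steps are omitted.

```python
from typing import List, Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two records agree.

    ASSUMPTION: simple matching stands in for the context-based
    similarity measure, which the catalog record does not specify.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(records: Sequence[Sequence[str]]) -> List[List[float]]:
    """Pairwise similarities; entry [i][j] is the edge weight between i and j."""
    n = len(records)
    return [
        [overlap_similarity(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]


def transition_matrix(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row-normalize the similarities (self-loops dropped) so each row sums to 1.

    ASSUMPTION: row normalization is one common way to turn a weighted
    graph into random-walk transition probabilities.
    """
    n = len(sim)
    trans = []
    for i, row in enumerate(sim):
        weights = [0.0 if i == j else w for j, w in enumerate(row)]
        total = sum(weights)
        if total == 0:
            # A node with no similar neighbours gets a uniform row.
            trans.append([1.0 / n] * n)
        else:
            trans.append([w / total for w in weights])
    return trans


# Hypothetical toy records with three yes/no attributes each.
records = [
    ("y", "n", "y"),
    ("y", "n", "n"),
    ("n", "y", "n"),
]
S = similarity_matrix(records)
P = transition_matrix(S)
```

Each row of `P` can be read as the probability of stepping from one record to another in a random walk on the similarity graph. A matrix of this kind is what the description says is fed to the neural network model.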
author2 Lin, Ja-Chen
author_facet Lin, Ja-Chen
Chen, Wei-Shiang
陳威翔
author Chen, Wei-Shiang
陳威翔
author_sort Chen, Wei-Shiang
title Graph Clustering for Categorical Data
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/69g376