SURVEY ON CLUSTERING ALGORITHM AND SIMILARITY MEASURE FOR CATEGORICAL DATA

Bibliographic Details
Main Authors:	S. Anitha Elavarasi, J. Akilandeswari
Format:	Article
Language:	English
Published:	ICT Academy of Tamil Nadu, 2014-01-01
Series:	ICTACT Journal on Soft Computing
Subjects:	Clustering; Categorical Data; Time Complexity; Similarity Measure; Data Mining Tools
Online Access:	http://ictactjournals.in/paper/7_Paper_715_722.pdf

id	doaj-b605bf758868465cb63b9d604b475920
record_format	Article
spelling	doaj-b605bf758868465cb63b9d604b475920; 2020-11-25T02:01:06Z; eng; ICT Academy of Tamil Nadu; ICTACT Journal on Soft Computing; ISSN 0976-6561, 2229-6956; 2014-01-01; vol. 4, no. 2, pp. 715-722; SURVEY ON CLUSTERING ALGORITHM AND SIMILARITY MEASURE FOR CATEGORICAL DATA; S. Anitha Elavarasi (Department of Computer Science and Engineering, Sona College of Technology, India); J. Akilandeswari (Department of Information Technology, Sona College of Technology, India)
collection	DOAJ
issn	0976-6561, 2229-6956
publishDate	2014-01-01
description	Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised (e.g. classification) or unsupervised (e.g. clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Real-world objects are described by both numerical and categorical data. Categorical data are not analyzed in the same way as numerical data because they lack an inherent ordering. This paper describes about ten different clustering algorithms, their methodology and the factors influencing their performance. Each algorithm is evaluated on real-world datasets, and its pros and cons are specified. The various similarity/dissimilarity measures applied to categorical data and their performance are also discussed. Time complexity describes the amount of time an algorithm takes, measured by the number of elementary operations it performs. The time complexity of the various algorithms is discussed, and their performance on real-world data such as mushroom, zoo, soya bean, cancer, vote, car and iris is measured. In this survey, Cluster Accuracy and Error Rate are evaluated for four clustering algorithms (K-modes, fuzzy K-modes, ROCK and Squeezer), two similarity measures (DISC and Overlap) and DILCA applied to hierarchical and partitional algorithms.
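The abstract names the Overlap similarity measure and the K-modes algorithm without defining them. As a rough illustration only, the Python sketch below assumes their common textbook forms: Overlap similarity as the fraction of attributes on which two records take the same category, and K-modes using a simple-matching (mismatch-count) distance with per-attribute mode updates. The function names, the random initialization, the tie-breaking rule and the toy data are hypothetical choices, not taken from the survey.

```python
import random
from collections import Counter


def overlap_similarity(x, y):
    """Assumed Overlap measure: fraction of attributes whose categories match exactly."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def mismatch_distance(x, y):
    """Simple-matching dissimilarity: number of attributes whose categories differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def column_modes(records):
    """Most frequent category per attribute; ties go to the value seen first."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, k, max_iter=20, seed=0):
    """Minimal K-modes sketch: assign each record to its nearest mode, then recompute modes."""
    rng = random.Random(seed)
    modes = list(rng.sample(records, k))  # hypothetical initialization: k records drawn at random
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: mismatch_distance(record, modes[c]))
            for record in records
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mode
                modes[c] = column_modes(members)
    return labels, modes


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("blue", "large", "square"),
        ("blue", "large", "square"),
    ]
    print(overlap_similarity(data[0], data[1]))  # 2 of 3 attributes match
    labels, modes = k_modes(data, k=2)
    print(labels, modes)  # the two "red" rows share one cluster, the two "blue" rows the other
```

Because categories have no ordering, the sketch never averages values: a cluster's representative is simply the most frequent category in each attribute, which is the property the abstract alludes to when it contrasts categorical with numerical data.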

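The record says the survey reports Cluster Accuracy and Error Rate but does not define them. One definition often used for categorical clustering, assumed here rather than confirmed from the paper, credits each cluster with the number of its objects that belong to its most frequent true class, divides the total by the number of objects, and takes the error rate as one minus that accuracy. The helper names below are hypothetical.

```python
from collections import Counter


def cluster_accuracy(labels, classes):
    """Assumed definition: sum over clusters of the majority-class count, divided by n."""
    if len(labels) != len(classes) or not labels:
        raise ValueError("labels and classes must be non-empty and of equal length")
    per_cluster = {}
    for cluster, true_class in zip(labels, classes):
        # Tally the true classes that landed in each cluster.
        per_cluster.setdefault(cluster, Counter())[true_class] += 1
    # Each cluster is credited with its most frequent true class.
    correct = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return correct / len(labels)


def error_rate(labels, classes):
    """Assumed definition: 1 - cluster accuracy."""
    return 1.0 - cluster_accuracy(labels, classes)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1]
    classes = ["a", "a", "b", "b", "b"]
    print(cluster_accuracy(labels, classes))  # 4 of 5 objects match their cluster's majority class: 0.8
    print(error_rate(labels, classes))        # about 0.2 (floating-point rounding may show extra digits)
```

Under this definition, accuracy only needs the cluster labels and the known class labels of a benchmark dataset (such as the mushroom or vote data mentioned in the abstract), so it can compare algorithms that produce different numbers of clusters.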

Similar Items