Spherical k-Means Clustering

Clustering text documents is a fundamental task in modern data analysis, requiring approaches that perform well in terms of both solution quality and computational efficiency. Spherical k-means clustering is one approach that addresses both issues, employing cosine dissimilarities to perform prototype-based partitioning of term-weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans, which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point algorithm, a genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). The performance of these solvers is investigated by means of a large-scale benchmark experiment.

Bibliographic Details
Main Authors:	Kurt Hornik, Ingo Feinerer, Martin Kober, Christian Buchta
Format:	Article
Language:	English
Published:	Foundation for Open Access Statistics, 2012-09-01
Series:	Journal of Statistical Software
Volume/Issue:	50 (10)
ISSN:	1548-7660
Subjects:	spherical; clustering; text mining; cosine dissimilarity; R
Online Access:	http://www.jstatsoft.org/v50/i10/paper

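The abstract describes spherical k-means as prototype-based partitioning of term-weight vectors under cosine dissimilarity. It names a fixed-point algorithm as one of the solvers. The sketch below shows a Lloyd-style fixed-point iteration in pure Python. It assumes the following:

* Rows are non-zero term-weight vectors.
* The dissimilarity is d(x, p) = 1 - cos(x, p).
* Initial prototypes are k randomly chosen rows.
* An empty cluster keeps its previous prototype.

The names `spherical_kmeans`, `normalize`, `dot` and `cosine_dissimilarity` are hypothetical. This is an illustration of the technique, not the skmeans package's implementation or API.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(x: Sequence[float]) -> Vector:
    """Scale x to unit Euclidean length, i.e. project it onto the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction and cannot be normalized")
    return [v / norm for v in x]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """d(x, y) = 1 - cos(x, y); only the directions of x and y matter."""
    return 1.0 - dot(normalize(x), normalize(y))


def spherical_kmeans(
    data: Sequence[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[Vector], float]:
    """Hypothetical fixed-point spherical k-means helper (not the skmeans API).

    Returns (cluster labels, unit-length prototypes, sum of cosine dissimilarities).
    """
    xs = [normalize(x) for x in data]  # work with unit-length rows
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of rows")
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen rows as prototypes.
    prototypes = [list(p) for p in rng.sample(xs, k)]
    labels = [-1] * len(xs)
    for _ in range(max_iter):
        # Assignment step: for unit vectors, minimizing 1 - cos(x, p)
        # is the same as maximizing the inner product <x, p>.
        new_labels = [max(range(k), key=lambda j: dot(x, prototypes[j])) for x in xs]
        if new_labels == labels:
            break  # fixed point reached: assignments no longer change
        labels = new_labels
        # Prototype step: the normalized sum of a cluster's members maximizes
        # the total cosine similarity to that cluster's members.
        for j in range(k):
            members = [x for x, c in zip(xs, labels) if c == j]
            total = [sum(col) for col in zip(*members)]
            norm = math.sqrt(sum(v * v for v in total))
            if norm > 0.0:  # empty cluster: keep the previous prototype
                prototypes[j] = [v / norm for v in total]
    objective = sum(1.0 - dot(x, prototypes[c]) for x, c in zip(xs, labels))
    return labels, prototypes, objective
```

For example, consider the toy rows `[[3.0, 0.0, 0.0], [2.0, 0.1, 0.0], [0.0, 0.0, 4.0], [0.1, 0.0, 5.0]]` with `k=2`. The first two rows end up in one cluster and the last two in the other, whatever the seed; only the label numbering can differ.

Like ordinary k-means, this iteration only reaches a local optimum. Running it from several seeds and keeping the lowest objective is a common safeguard.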

Similar Items