Spherical k-Means Clustering


Bibliographic Details
Main Authors: Kurt Hornik, Ingo Feinerer, Martin Kober, Christian Buchta
Format: Article
Language: English
Published: Foundation for Open Access Statistics, 2012-09-01
Series: Journal of Statistical Software
Subjects: R
Online Access: http://www.jstatsoft.org/v50/i10/paper
ISSN: 1548-7660
Volume: 50, Issue: 10
Keywords: spherical, clustering, text mining, cosine dissimilarity, R

Abstract: Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point and genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large scale benchmark experiment.
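The abstract describes spherical k-means as prototype-based partitioning of term weight vectors under cosine dissimilarity, and lists a fixed-point algorithm among the solvers in skmeans. The following is a minimal Python sketch of a fixed-point iteration of that kind, written only to illustrate the idea; it is not the skmeans implementation or its API. The function names (`normalize`, `cosine_dissimilarity`, `farthest_first_init`, `spherical_kmeans`), the farthest-first initialization, and the toy term-weight matrix are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def cosine_dissimilarity(x, p):
    """1 - cos(x, p), assuming both x and p already have unit length."""
    return 1.0 - sum(a * b for a, b in zip(x, p))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, not the skmeans default):
    start from the first point, then repeatedly add the point that is
    most dissimilar to all prototypes chosen so far."""
    prototypes = [points[0]]
    while len(prototypes) < k:
        best = max(points, key=lambda x: min(cosine_dissimilarity(x, p) for p in prototypes))
        prototypes.append(best)
    return [list(p) for p in prototypes]


def spherical_kmeans(data, k, max_iter=100):
    """Hypothetical fixed-point spherical k-means sketch; returns (labels, prototypes, objective)."""
    points = [normalize(row) for row in data]
    prototypes = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to the prototype with the
        # smallest cosine dissimilarity (ties go to the lower index).
        new_labels = [
            min(range(k), key=lambda j: cosine_dissimilarity(x, prototypes[j]))
            for x in points
        ]
        if new_labels == labels:
            break  # fixed point: the partition no longer changes
        labels = new_labels
        # Prototype step: the normalized sum of a cluster's members is the
        # unit vector minimizing that cluster's total cosine dissimilarity.
        for j in range(k):
            total = [0.0] * len(points[0])
            for x, c in zip(points, labels):
                if c == j:
                    total = [t + xi for t, xi in zip(total, x)]
            if any(total):
                prototypes[j] = normalize(total)  # empty clusters keep their old prototype
    objective = sum(cosine_dissimilarity(x, prototypes[c]) for x, c in zip(points, labels))
    return labels, prototypes, objective


# Toy term-weight matrix: rows are documents, columns are terms.
docs = [
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 4],
    [0, 1, 2, 5],
]
labels, prototypes, objective = spherical_kmeans(docs, k=2)
print(labels)  # [0, 0, 1, 1]
```

Because a fixed-point iteration of this type only converges to a local optimum, the resulting partition depends on the starting prototypes; the deterministic farthest-first seeding is used here only so that the toy run is reproducible.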