Spherical k-Means Clustering
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-...
Main Authors: | Kurt Hornik, Ingo Feinerer, Martin Kober, Christian Buchta |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2012-09-01 |
Series: | Journal of Statistical Software |
Subjects: | spherical, clustering, text mining, cosine dissimilarity, R |
Online Access: | http://www.jstatsoft.org/v50/i10/paper |
id |
doaj-3ddfb3df2b8a4ae2bacb510866f52b2f |
---|---|
collection |
DOAJ |
author |
Kurt Hornik, Ingo Feinerer, Martin Kober, Christian Buchta |
issn |
1548-7660 |
description |
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans, which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point algorithm, a genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). Performance of these solvers is investigated by means of a large-scale benchmark experiment. |
topic |
spherical, clustering, text mining, cosine dissimilarity, R |
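
The description above mentions prototype-based partitioning under cosine dissimilarity, solved among other ways by a fixed-point algorithm. As a rough illustration only, the following Python sketch shows the textbook form of such an iteration: each document vector is assigned to the prototype with the smallest cosine dissimilarity, and each prototype is then replaced by the normalized sum of its members. This is not the code of the skmeans package; the function names, the toy term-weight matrix, and the choice of initial prototypes are all assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_dissimilarity(x, p):
    """Return 1 - cos(x, p); x and p are assumed to have unit length already."""
    return 1.0 - sum(a * b for a, b in zip(x, p))


def spherical_kmeans(rows, init_prototypes, max_iter=100):
    """Hypothetical helper: alternate assignments and prototype updates
    until the assignments stop changing or max_iter is reached."""
    data = [normalize(r) for r in rows]
    prototypes = [normalize(p) for p in init_prototypes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest prototype in terms of cosine dissimilarity.
        new_labels = [
            min(range(len(prototypes)),
                key=lambda j: cosine_dissimilarity(x, prototypes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: prototype = normalized sum of the cluster's members.
        for j in range(len(prototypes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                summed = [sum(col) for col in zip(*members)]
                prototypes[j] = normalize(summed)
    return labels, prototypes


# Toy term-weight matrix (rows are documents, columns are terms); made-up data.
docs = [
    [3.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 2.0, 2.0],
]
labels, prototypes = spherical_kmeans(docs, init_prototypes=[docs[0], docs[2]])
```

On this toy input the first two documents share no terms with the last two, so the sketch separates them into two clusters. A single run from fixed starting prototypes, as here, only finds a local solution; the article itself should be consulted for the actual solvers and their extensions.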