Evaluating Clusterings by Estimating Clarity

Bibliographic Details
Main Author:	Whissell, John
Language:	en
Published:	2012
Subjects:	clustering; evaluating clustering; cluster validation; cluster analysis; Computer Science
Online Access:	http://hdl.handle.net/10012/7103

id	ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-7103
record_format	oai_dc
spelling	ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-7103 | 2013-10-04T04:11:53Z | Whissell, John | 2012-10-12T20:08:05Z | 2012 | http://hdl.handle.net/10012/7103 | en | clustering; evaluating clustering; cluster validation; cluster analysis | Evaluating Clusterings by Estimating Clarity | Thesis or Dissertation | School of Computer Science | Doctor of Philosophy | Computer Science
collection	NDLTD
language	en
sources	NDLTD
topic	clustering; evaluating clustering; cluster validation; cluster analysis; Computer Science
spellingShingle	clustering evaluating clustering cluster validation cluster analysis Computer Science Whissell, John Evaluating Clusterings by Estimating Clarity
description	In this thesis I examine clustering evaluation, with a subfocus on text clusterings specifically. The principal work of this thesis is the development, analysis, and testing of a new internal clustering quality measure called informativeness. I begin by reviewing clustering in general. I then review current clustering quality measures, accompanying this with an in-depth discussion of many of the important properties one needs to understand about such measures. This is followed by extensive document clustering experiments that show problems with standard clustering evaluation practices. I then develop informativeness, my new internal clustering quality measure for estimating the clarity of clusterings. I show that informativeness, which uses classification accuracy as a proxy for human assessment of clusterings, is both theoretically sensible and works empirically. I present a generalization of informativeness that leverages external clustering quality measures. I also show its use in a realistic application: email spam filtering. I show that informativeness can be used to select clusterings which lead to superior spam filters when few true labels are available. I conclude this thesis with a discussion of clustering evaluation in general, informativeness, and the directions I believe clustering evaluation research should take in the future.
author	Whissell, John
title	Evaluating Clusterings by Estimating Clarity
title_sort	evaluating clusterings by estimating clarity
publishDate	2012
url	http://hdl.handle.net/10012/7103
work_keys_str_mv	AT whisselljohn evaluatingclusteringsbyestimatingclarity
_version_	1716600967291469824
