Clustering Consistently

Bibliographic Details
Main Author:	Eldridge, Justin
Language:	English
Published:	The Ohio State University / OhioLINK 2017
Subjects:	Computer Science; Statistics; Artificial Intelligence; machine learning; unsupervised learning; statistical learning; clustering; graphon; mergeon; density cluster tree; hierarchical clustering
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249

Summary:	Clustering is the task of organizing data into natural groups, or clusters. A central goal in developing a theory of clustering is the derivation of correctness guarantees which ensure that clustering methods produce the right results. In this dissertation, we analyze the setting in which the data are sampled from some underlying probability distribution. In this case, an algorithm is "correct" (or consistent) if, given larger and larger data sets, its output converges in some sense to the ideal cluster structure of the distribution.

In the first part, we study the setting in which data are drawn from a probability density supported on a subset of a Euclidean space. The natural cluster structure of the density is captured by the so-called high density cluster tree, which is due to Hartigan (1981). Hartigan introduced a notion of convergence to the density cluster tree, and recent work by Chaudhuri and Dasgupta (2010) and Kpotufe and von Luxburg (2011) has constructed algorithms which are consistent in this sense.

We will show that Hartigan's notion of consistency is in fact not strong enough to ensure that an algorithm recovers the density cluster tree as we would intuitively expect. We identify the precise deficiency which allows this, and introduce a new, stronger notion of convergence which we call consistency in merge distortion. Consistency in merge distortion implies Hartigan's consistency, and we prove that the algorithm of Chaudhuri and Dasgupta (2010) satisfies our new notion.

In the sequel, we consider the clustering of graphs sampled from a very general, non-parametric random graph model called a graphon. Unlike in the density setting, clustering in the graphon model is not well-studied. We therefore rigorously analyze the cluster structure of a graphon and formally define the graphon cluster tree. We adapt our notion of consistency in merge distortion to the graphon setting and identify efficient, consistent algorithms.

Rights:	Unrestricted. This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center.

Similar Items