id ndltd-OhioLink-oai-etd.ohiolink.edu-osu1512070374903249
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-osu1512070374903249 2021-08-03T07:04:56Z Clustering Consistently Eldridge, Justin, Eldridge Computer Science Statistics Artificial Intelligence machine learning unsupervised learning statistical learning clustering graphon mergeon density cluster tree hierarchical clustering Clustering is the task of organizing data into natural groups, or clusters. A central goal in developing a theory of clustering is the derivation of correctness guarantees which ensure that clustering methods produce the right results. In this dissertation, we analyze the setting in which the data are sampled from some underlying probability distribution. In this case, an algorithm is "correct" (or consistent) if, given larger and larger data sets, its output converges in some sense to the ideal cluster structure of the distribution. In the first part, we study the setting in which data are drawn from a probability density supported on a subset of a Euclidean space. The natural cluster structure of the density is captured by the so-called high density cluster tree, which is due to Hartigan (1981). Hartigan introduced a notion of convergence to the density cluster tree, and recent work by Chaudhuri and Dasgupta (2010) and Kpotufe and von Luxburg (2011) has constructed algorithms which are consistent in this sense. We will show that Hartigan's notion of consistency is in fact not strong enough to ensure that an algorithm recovers the density cluster tree as we would intuitively expect. We identify the precise deficiency which allows this, and introduce a new, stronger notion of convergence which we call consistency in merge distortion. Consistency in merge distortion implies Hartigan's consistency, and we prove that the algorithm of Chaudhuri and Dasgupta (2010) satisfies our new notion. In the sequel, we consider the clustering of graphs sampled from a very general, non-parametric random graph model called a graphon. 
Unlike in the density setting, clustering in the graphon model is not well studied. We therefore rigorously analyze the cluster structure of a graphon and formally define the graphon cluster tree. We adapt our notion of consistency in merge distortion to the graphon setting and identify efficient, consistent algorithms. 2017 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249 unrestricted This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center.
collection NDLTD
language English
sources NDLTD
topic Computer Science
Statistics
Artificial Intelligence
machine learning
unsupervised learning
statistical learning
clustering
graphon
mergeon
density cluster tree
hierarchical clustering
spellingShingle Computer Science
Statistics
Artificial Intelligence
machine learning
unsupervised learning
statistical learning
clustering
graphon
mergeon
density cluster tree
hierarchical clustering
Eldridge, Justin, Eldridge
Clustering Consistently
author Eldridge, Justin, Eldridge
author_facet Eldridge, Justin, Eldridge
author_sort Eldridge, Justin, Eldridge
title Clustering Consistently
title_short Clustering Consistently
title_full Clustering Consistently
title_fullStr Clustering Consistently
title_full_unstemmed Clustering Consistently
title_sort clustering consistently
publisher The Ohio State University / OhioLINK
publishDate 2017
url http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249
work_keys_str_mv AT eldridgejustineldridge clusteringconsistently
_version_ 1719453162186735616