Optimal Clustering in Stable Instances Using Combinations of Exact and Noisy Ordinal Queries

Bibliographic Details
Main Authors: Enrico Bianchi, Paolo Penna
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Algorithms
Online Access: https://www.mdpi.com/1999-4893/14/2/55
Description
Summary: This work studies clustering algorithms that operate with ordinal or comparison-based queries (operations), a situation that arises in many active-learning applications where “dissimilarities” between data points are evaluated by humans. Typically, exact answers are costly (or difficult to obtain in large amounts), while possibly erroneous answers have low cost. Motivated by these considerations, we study algorithms with non-trivial trade-offs between the number of exact (high-cost) operations and noisy (low-cost) operations, with provable performance guarantees. Specifically, we study a class of polynomial-time graph-based clustering algorithms (termed Single-Linkage) that are widely used in practice and guarantee exact solutions for stable instances in several clustering problems (these problems are NP-hard in the worst case). We provide several variants of these algorithms using ordinal operations and, in particular, non-trivial trade-offs between the number of high-cost and low-cost operations that are used. Our algorithms still guarantee exact solutions for stable instances of k-medoids clustering, and they use a rather small number of high-cost operations without increasing the number of low-cost operations too much.
ISSN: 1999-4893
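
Illustrative sketch (not taken from the article): the summary refers to graph-based Single-Linkage clustering driven purely by ordinal comparisons. The Python below is one minimal way such a procedure could look, assuming a hypothetical oracle `closer(e, f)` that answers whether pair `e` is at most as dissimilar as pair `f`. It uses only exact answers and simply stops merging at `k` clusters; it does not reproduce the authors' k-medoids variants or their trade-off between exact and noisy queries. The function name, the oracle, and the sample points are all assumptions made for illustration.

```python
from functools import cmp_to_key
from itertools import combinations


def single_linkage_ordinal(n, k, closer):
    """Group points 0..n-1 into k clusters using only ordinal comparisons.

    closer(e, f) is a hypothetical oracle: True when pair e is at most as
    dissimilar as pair f. No numeric distance is ever read by this function.
    """
    queries = 0

    def compare(e, f):
        nonlocal queries
        queries += 1  # every comparison is one ordinal query
        return -1 if closer(e, f) else 1

    # Order all pairs from most similar to least similar via the oracle.
    edges = sorted(combinations(range(n), 2), key=cmp_to_key(compare))

    # Union-find over points; merging along sorted edges is Kruskal-style
    # single linkage.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = n
    for a, b in edges:
        if clusters == k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            clusters -= 1

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values()), queries


# Toy data: positions on a line, hidden behind the oracle (an assumption
# standing in for human judgments).
points = [0.0, 1.0, 1.5, 10.0, 11.2, 30.0]


def exact_oracle(e, f):
    return abs(points[e[0]] - points[e[1]]) <= abs(points[f[0]] - points[f[1]])


result, used = single_linkage_ordinal(len(points), 3, exact_oracle)
print(result, used)
```

In this sketch the query counter is the quantity of interest: the article's contribution, as the summary describes it, concerns how many such queries must be exact (high-cost) and how many can be noisy (low-cost).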