Approximating and testing k-histogram distributions in sub-linear time
Main Authors: | , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | Association for Computing Machinery (ACM), 2014-05-15T18:13:10Z. |
Subjects: | |
Summary: | A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise-constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that are k-histograms from distributions that are ε-far from any k-histogram in the ℓ₁ distance and ℓ₂ distance, respectively. Funding: David & Lucile Packard Foundation (Fellowship); National Science Foundation (U.S.) (Grant CCF-0728645); National Science Foundation (U.S.) (Grant 0732334); National Science Foundation (U.S.) (Grant 0728645). |
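To make the definitions in the summary concrete, the following minimal Python sketch stores a k-histogram as a list of k intervals with k per-point values, expands it over [n], and measures its ℓ₂ distance to the empirical distribution of a set of samples. It only illustrates the objects involved and is not the sub-linear algorithm from the article. The function names (`histogram_pmf`, `empirical_pmf`, `l2_distance`) and the convention that intervals are inclusive and 1-indexed are assumptions made for this example.

```python
from collections import Counter


def histogram_pmf(n, intervals, values):
    """Expand a k-histogram over [n] = {1, ..., n} into a list of n probabilities.

    intervals: k inclusive, 1-indexed (start, end) pairs that partition [n]
               (assumed convention, not taken from the article).
    values:    k numbers; values[j] is the probability of each single point
               in intervals[j].
    """
    pmf = [0.0] * n
    covered = 0
    for (start, end), value in zip(intervals, values):
        for i in range(start, end + 1):
            pmf[i - 1] = value
        covered += end - start + 1
    if covered != n:
        raise ValueError("intervals must partition [n]")
    return pmf


def empirical_pmf(n, samples):
    """Empirical distribution over [n] from a list of samples in {1, ..., n}."""
    counts = Counter(samples)
    return [counts.get(i, 0) / len(samples) for i in range(1, n + 1)]


def l2_distance(p, q):
    """Euclidean (l2) distance between two distributions given as lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


if __name__ == "__main__":
    # A 2-histogram over [4]: points 1-2 have mass 0.4 each, points 3-4 have 0.1 each.
    h = histogram_pmf(4, [(1, 2), (3, 4)], [0.4, 0.1])
    e = empirical_pmf(4, [1, 1, 2, 2, 3, 4, 1, 2])
    print(l2_distance(h, e))
```

This sketch reads every sample and touches all n points, so it runs in time linear in n; the article's contribution is achieving the approximation and testing guarantees with sub-linear time and sample complexity.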