Approximating and testing k-histogram distributions in sub-linear time

A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection...


Bibliographic Details
Main Authors: Indyk, Piotr (Contributor), Levi, Reut (Author), Rubinfeld, Ronitt (Contributor)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computing Machinery (ACM), 2014-05-15T18:13:10Z.
Online Access: Get fulltext
LEADER 01914 am a22002653u 4500
001 87005
042 |a dc 
100 1 0 |a Indyk, Piotr  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Indyk, Piotr  |e contributor 
100 1 0 |a Rubinfeld, Ronitt  |e contributor 
700 1 0 |a Levi, Reut  |e author 
700 1 0 |a Rubinfeld, Ronitt  |e author 
245 0 0 |a Approximating and testing k-histogram distributions in sub-linear time 
260 |b Association for Computing Machinery (ACM),   |c 2014-05-15T18:13:10Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/87005 
520 |a A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram in the ℓ₁ distance and ℓ₂ distance, respectively. 
520 |a David & Lucile Packard Foundation (Fellowship) 
520 |a National Science Foundation (U.S.) (Grant CCF-0728645) 
520 |a National Science Foundation (U.S.) (Grant 0732334) 
520 |a National Science Foundation (U.S.) (Grant 0728645) 
546 |a en_US 
655 7 |a Article 
773 |t Proceedings of the 31st symposium on Principles of Database Systems (PODS '12)
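
The representation described in the summary (a list of k intervals and k corresponding values) can be illustrated with a minimal Python sketch. It assumes full knowledge of the distribution and so does not reproduce the sub-linear, sample-based algorithms of the article; the names Piece, expand, l2_distance and count_pieces are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation (not taken from the article): each piece is
# (start, end, value), covering the integers start..end inclusive, where
# `value` is the constant probability assigned to every point in the interval.
Piece = Tuple[int, int, float]


def expand(pieces: List[Piece], n: int) -> List[float]:
    """Expand a k-histogram over [n] = {1, ..., n} into a length-n probability vector."""
    p = [0.0] * n
    for start, end, value in pieces:
        for i in range(start, end + 1):
            p[i - 1] = value  # shift from 1-based domain to 0-based list index
    return p


def l2_distance(p: List[float], q: List[float]) -> float:
    """Euclidean (l2) distance between two probability vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def count_pieces(p: List[float], tol: float = 1e-12) -> int:
    """Smallest k for which p is piecewise constant with k pieces."""
    if not p:
        return 0
    # Every change in value between neighbouring points starts a new piece.
    return 1 + sum(1 for a, b in zip(p, p[1:]) if abs(a - b) > tol)


# A 3-histogram over [10]: three intervals, three values, total mass 1.
hist = [(1, 4, 0.125), (5, 8, 0.0625), (9, 10, 0.125)]
p = expand(hist, 10)
uniform = [0.1] * 10  # the uniform distribution is a 1-histogram
```

Here count_pieces(p) reports three pieces and l2_distance(p, uniform) measures how far the 3-histogram is from the uniform 1-histogram. The algorithms in the article work from samples of p rather than from the explicit vector used in this sketch.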