Online Computing Quantile Summaries Over Uncertain Data Streams

Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by one...
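
As a rough illustration of the definitions in the abstract (not the authors' algorithm), the sketch below computes a quantile as the point with a given rank, then applies the same rule to sampled possible worlds of uncertain points. It assumes each uncertain point is a discrete pdf given as a list of (value, probability) pairs. The function names, the 1-based rank convention, and the choice of taking the median of per-world answers are all assumptions made for this example. It is only a naive sampling illustration, and the record does not say how the Monte Carlo technique used for comparison works.

```python
import random


def quantile_by_rank(points, rank):
    """Return the point with the given 1-based rank in sorted order."""
    if not 1 <= rank <= len(points):
        raise ValueError("rank must be between 1 and len(points)")
    return sorted(points)[rank - 1]


def sample_world(uncertain_points, rng):
    """Draw one possible world: one concrete value per uncertain point.

    Each uncertain point is a hypothetical discrete pdf: [(value, prob), ...].
    """
    world = []
    for pdf in uncertain_points:
        values = [v for v, _ in pdf]
        weights = [p for _, p in pdf]
        world.append(rng.choices(values, weights=weights, k=1)[0])
    return world


def monte_carlo_quantile(uncertain_points, rank, n_worlds=1000, seed=0):
    """Estimate a quantile by sampling possible worlds (illustrative only).

    The quantile is computed in each sampled world, and the median of those
    per-world answers is returned as a single representative point.
    """
    rng = random.Random(seed)
    per_world = [
        quantile_by_rank(sample_world(uncertain_points, rng), rank)
        for _ in range(n_worlds)
    ]
    return quantile_by_rank(per_world, (n_worlds + 1) // 2)


if __name__ == "__main__":
    stream = [
        [(1.0, 0.5), (4.0, 0.5)],  # uncertain point: 1.0 or 4.0
        [(2.0, 1.0)],              # certain point
        [(3.0, 0.2), (5.0, 0.8)],  # uncertain point: 3.0 or 5.0
    ]
    print(monte_carlo_quantile(stream, rank=2))
```

Sampling stores and sorts every sampled world, so it is neither online nor space-efficient. The abstract reports that the proposed summaries outperform Monte Carlo and averaging methods in query error and storage space.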

Bibliographic Details
Main Authors: Chunquan Liang, Mei Li, Bin Liu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Data preprocessing; possible worlds; quantile summaries; uncertain data streams
Online Access: https://ieeexplore.ieee.org/document/8606044/
id doaj-8f58680c7819410293dc6b43aead252b
record_format Article
spelling doaj-8f58680c7819410293dc6b43aead252b
Last updated: 2021-03-29T22:47:34Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Date: 2019-01-01
Volume: 7
Pages: 10916-10926
DOI: 10.1109/ACCESS.2019.2891550
Article number: 8606044
Title: Online Computing Quantile Summaries Over Uncertain Data Streams
Authors: Chunquan Liang (https://orcid.org/0000-0002-3569-6998), Mei Li, Bin Liu
Affiliation (all authors): College of Information Engineering, Northwest A&F University, Xianyang, China
Abstract: Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by a one-dimensional probability density function (pdf), we study the problem of incrementally maintaining quantile summaries on S such that, for any query with a given rank, the summaries can provide a point as the quantile within a given error. We define quantiles on uncertain data with discrete or continuous pdfs in terms of two error metrics under possible worlds semantics. For an answer to a quantile query on uncertain data, we give methods for calculating the value of the error and then discuss the high-level features of summaries that can answer approximate quantile queries under the two error metrics. We propose an online, space-efficient algorithm to compute such summaries on uncertain data streams. The experimental results show that our algorithm substantially outperforms other techniques, such as Monte Carlo and averaging methods, in terms of query error and the space needed to store the summary data.
URL: https://ieeexplore.ieee.org/document/8606044/
Keywords: Data preprocessing; possible worlds; quantile summaries; uncertain data streams
collection DOAJ
language English
format Article
sources DOAJ
author Chunquan Liang
Mei Li
Bin Liu
title Online Computing Quantile Summaries Over Uncertain Data Streams
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by a one-dimensional probability density function (pdf), we study the problem of incrementally maintaining quantile summaries on S such that, for any query with a given rank, the summaries can provide a point as the quantile within a given error. We define quantiles on uncertain data with discrete or continuous pdfs in terms of two error metrics under possible worlds semantics. For an answer to a quantile query on uncertain data, we give methods for calculating the value of the error and then discuss the high-level features of summaries that can answer approximate quantile queries under the two error metrics. We propose an online, space-efficient algorithm to compute such summaries on uncertain data streams. The experimental results show that our algorithm substantially outperforms other techniques, such as Monte Carlo and averaging methods, in terms of query error and the space needed to store the summary data.
topic Data preprocessing
possible worlds
quantile summaries
uncertain data streams
url https://ieeexplore.ieee.org/document/8606044/