Online Computing Quantile Summaries Over Uncertain Data Streams
Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by one...
Main Authors: | Chunquan Liang, Mei Li, Bin Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Data preprocessing; possible worlds; quantile summaries; uncertain data streams |
Online Access: | https://ieeexplore.ieee.org/document/8606044/ |
id | doaj-8f58680c7819410293dc6b43aead252b |
---|---|
record_format | Article |
spelling | doaj-8f58680c7819410293dc6b43aead252b; 2021-03-29T22:47:34Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 10916; 10926; 10.1109/ACCESS.2019.2891550; 8606044; Online Computing Quantile Summaries Over Uncertain Data Streams; Chunquan Liang (0), https://orcid.org/0000-0002-3569-6998; Mei Li (1); Bin Liu (2); College of Information Engineering, Northwest A&F University, Xianyang, China; College of Information Engineering, Northwest A&F University, Xianyang, China; College of Information Engineering, Northwest A&F University, Xianyang, China; Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by a one-dimensional probability density function (pdf), we study the problem of incrementally maintaining quantile summaries on S such that, for any query with a given rank, the summaries can provide a point as the quantile within a given error. We define quantiles on uncertain data with discrete or continuous pdfs in terms of two error metrics under possible worlds semantics. For an answer to a quantile query on uncertain data, we give methods for calculating the value of the error and thereby discuss the high-level features of summaries that can answer approximate quantile queries under the two error metrics. We propose an online, space-efficient algorithm to compute such summaries on uncertain data streams. The experimental results show that our algorithm substantially outperforms other techniques, such as Monte Carlo and averaging methods, in terms of query error and the space needed to store the summary data.; https://ieeexplore.ieee.org/document/8606044/; Data preprocessing; possible worlds; quantile summaries; uncertain data streams |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Chunquan Liang, Mei Li, Bin Liu |
spellingShingle | Chunquan Liang Mei Li Bin Liu Online Computing Quantile Summaries Over Uncertain Data Streams IEEE Access Data preprocessing possible worlds quantile summaries uncertain data streams |
author_facet | Chunquan Liang, Mei Li, Bin Liu |
author_sort | Chunquan Liang |
title | Online Computing Quantile Summaries Over Uncertain Data Streams |
title_short | Online Computing Quantile Summaries Over Uncertain Data Streams |
title_full | Online Computing Quantile Summaries Over Uncertain Data Streams |
title_fullStr | Online Computing Quantile Summaries Over Uncertain Data Streams |
title_full_unstemmed | Online Computing Quantile Summaries Over Uncertain Data Streams |
title_sort | online computing quantile summaries over uncertain data streams |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2019-01-01 |
description | Quantile summarization is a useful tool in data stream management and mining that can efficiently capture the distribution of the data. A quantile of a sequence of points is the point with a given rank in the sequence. Given a sequence of uncertain points S on the real line, each represented by a one-dimensional probability density function (pdf), we study the problem of incrementally maintaining quantile summaries on S such that, for any query with a given rank, the summaries can provide a point as the quantile within a given error. We define quantiles on uncertain data with discrete or continuous pdfs in terms of two error metrics under possible worlds semantics. For an answer to a quantile query on uncertain data, we give methods for calculating the value of the error and thereby discuss the high-level features of summaries that can answer approximate quantile queries under the two error metrics. We propose an online, space-efficient algorithm to compute such summaries on uncertain data streams. The experimental results show that our algorithm substantially outperforms other techniques, such as Monte Carlo and averaging methods, in terms of query error and the space needed to store the summary data. |
topic | Data preprocessing; possible worlds; quantile summaries; uncertain data streams |
url | https://ieeexplore.ieee.org/document/8606044/ |
work_keys_str_mv | AT chunquanliang onlinecomputingquantilesummariesoveruncertaindatastreams AT meili onlinecomputingquantilesummariesoveruncertaindatastreams AT binliu onlinecomputingquantilesummariesoveruncertaindatastreams |
_version_ | 1724190819473162240 |
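
The abstract defines a quantile as the point with a given rank in a sequence and names Monte Carlo sampling over possible worlds as one of the baselines the authors compare against. The sketch below illustrates only those two ideas; it is not the paper's summary algorithm, and it does not reproduce the paper's two error metrics. The function names, the `(values, probabilities)` representation of a discrete uncertain point, the rank convention `ceil(phi * n)`, and the choice to report the median of the per-world answers are all assumptions made for illustration.

```python
import math
import random


def quantile_by_rank(points, phi):
    """Return the point with 1-based rank ceil(phi * n) in sorted order.

    The rank convention is an assumption; the record does not state one.
    """
    ordered = sorted(points)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def monte_carlo_quantile(uncertain_points, phi, worlds=200, seed=0):
    """Estimate a quantile of uncertain data by sampling possible worlds.

    Each uncertain point is a hypothetical (values, probabilities) pair
    describing a discrete pdf. One possible world draws a single value
    from every point; the estimate is the median of the per-world answers
    (an illustrative aggregation, not taken from the paper).
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(worlds):
        world = [rng.choices(values, weights=probs)[0]
                 for values, probs in uncertain_points]
        answers.append(quantile_by_rank(world, phi))
    return quantile_by_rank(answers, 0.5)


if __name__ == "__main__":
    # Three uncertain points, each with a two-valued discrete pdf.
    stream = [
        ([1.0, 2.0], [0.5, 0.5]),
        ([3.0, 4.0], [0.7, 0.3]),
        ([5.0, 9.0], [0.9, 0.1]),
    ]
    print(monte_carlo_quantile(stream, 0.5))
```

A baseline like this has to keep or resample the uncertain points for every query, which is the kind of space and error cost the abstract says the proposed online summary improves on.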