Scalable kernel density estimation-based local outlier detection over large data streams


Bibliographic Details
Main Authors: Qin, X (Author), Cao, L (Author), Rundensteiner, EA (Author), Madden, S (Author)
Format: Article
Language: English
Published: OpenProceedings.org, 2021-11-05T15:24:13Z.
Online Access: https://hdl.handle.net/1721.1/137525
LEADER 02353 am a22001933u 4500
001 137525
042 |a dc 
100 1 0 |a Qin, X  |e author 
700 1 0 |a Cao, L  |e author 
700 1 0 |a Rundensteiner, EA  |e author 
700 1 0 |a Madden, S  |e author 
245 0 0 |a Scalable kernel density estimation-based local outlier detection over large data streams 
260 |b OpenProceedings.org,   |c 2021-11-05T15:24:13Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/137525 
520 |a © 2019 Copyright held by the owner/author(s). Local outlier techniques are known to be effective for detecting outliers in skewed data, where subsets of the data exhibit diverse distribution properties. However, existing methods are not well equipped to support modern high-velocity data streams due to the high complexity of the detection algorithms and their sensitivity to data updates. To tackle these shortcomings, we propose local outlier semantics that operate at an abstraction level by leveraging kernel density estimation (KDE) to effectively detect local outliers from streaming data. We design KELOS, a strategy to continuously detect top-N KDE-based local outliers over streams and the first linear-time streaming local outlier detection approach. The first innovation of KELOS is the abstract kernel center-based KDE (aKDE) strategy. aKDE accurately yet efficiently estimates the data density at each point, which is essential for local outlier detection. It is based on the observation that a cluster of points close to each other tends to have a similar influence on a target point's density estimation when used as kernel centers. These points can thus be represented by one abstract kernel center. Next, KELOS's inlier pruning strategy prunes, early on, points that have no chance of becoming top-N outliers. This enables KELOS to avoid computing the data density and outlier status of every data point. Together, aKDE and the inlier pruning strategy eliminate the performance bottleneck of streaming local outlier detection. The experimental evaluation demonstrates that KELOS is up to 6 orders of magnitude faster than existing solutions, while being highly effective in detecting local outliers from streaming data. 
546 |a en 
655 7 |a Article 
773 |t 10.5441/002/edbt.2019.37 
773 |t Advances in Database Technology - EDBT
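The abstract (field 520) describes aKDE only at a high level. What follows is a minimal Python sketch, under stated assumptions, of its core idea: a group of nearby points can stand in for its members as one kernel center, weighted by the number of points it represents. The sketch then ranks points with a simple top-N local score. It is not the authors' KELOS implementation. The Gaussian kernel, the bandwidth h, grouping points by fixed-width grid cells (a stand-in for the paper's clustering), the density-ratio outlier score, and all function names (exact_kde, abstract_centers, approx_kde, top_n_local_outliers) are hypothetical choices made for illustration. The inlier pruning step is not modeled.

```python
import math
from collections import defaultdict

# Hypothetical sketch: Gaussian-kernel KDE with weighted "abstract" centers.
# Grid-cell grouping stands in for the clustering used by the paper.

def gaussian_kernel(x, c, h):
    """Unnormalized isotropic Gaussian kernel with bandwidth h."""
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-sq / (2 * h * h))

def exact_kde(x, points, h):
    """Every point is its own kernel center: O(n) kernel evaluations per query."""
    return sum(gaussian_kernel(x, p, h) for p in points) / len(points)

def abstract_centers(points, cell):
    """Group points by grid cell; each group becomes (centroid, member count)."""
    buckets = defaultdict(list)
    for p in points:
        buckets[tuple(math.floor(v / cell) for v in p)].append(p)
    centers = []
    for members in buckets.values():
        dim = len(members[0])
        centroid = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        centers.append((centroid, len(members)))
    return centers

def approx_kde(x, centers, n, h):
    """One kernel evaluation per abstract center, weighted by the points it represents."""
    return sum(w * gaussian_kernel(x, c, h) for c, w in centers) / n

def top_n_local_outliers(points, n_out, k, h, cell):
    """Rank points by (mean density of k nearest neighbors) / (own density)."""
    centers = abstract_centers(points, cell)
    dens = [approx_kde(p, centers, len(points), h) for p in points]
    scores = []
    for i, p in enumerate(points):
        # Brute-force k-nearest-neighbor search: O(n^2) overall, fine for a sketch.
        nbrs = sorted((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))[:k]
        avg_nbr = sum(dens[j] for j in nbrs) / len(nbrs)
        # Guard against a density that underflows to zero.
        scores.append((avg_nbr / max(dens[i], 1e-300), i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:n_out]]

# Usage: a dense cluster, a sparser cluster, and one isolated point (index 50).
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0 + i, 10.0 + j) for i in range(5) for j in range(5)]
pts = dense + sparse + [(5.0, 5.0)]
print(top_n_local_outliers(pts, n_out=1, k=5, h=0.5, cell=1.0))  # [50]
```

In this sketch, the points of the sparse cluster score close to 1 because their neighbors are about as sparse as they are. Only the isolated point stands out, which is the "local" behavior the abstract refers to. Each abstract center replaces many kernel evaluations with one, so a density query costs time proportional to the number of occupied cells rather than the number of points. The price is an approximation error that grows as the cell width becomes large relative to h.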