Anomalies Detection Using Isolation in Concept-Drifting Data Streams

Detecting anomalies in streaming data is an important issue for many application domains, such as cybersecurity, natural disasters, or bank frauds. Different approaches have been designed in order to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we presen...

Bibliographic Details
Main Authors: Maurras Ulbricht Togbe, Yousra Chabchoub, Aliou Boly, Mariam Barry, Raja Chiky, Maroua Bahri
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: Computers
Subjects: anomaly detection; isolation-based; data streams; drift detection; survey
Online Access: https://www.mdpi.com/2073-431X/10/1/13
id doaj-8fb183317b3743b2997c34e918b6bfc1
record_format Article
spelling doaj-8fb183317b3743b2997c34e918b6bfc1
2021-01-20T00:02:55Z
eng
MDPI AG
Computers
2073-431X
2021-01-01
Volume 10, Issue 1, Article 13
DOI 10.3390/computers10010013
Anomalies Detection Using Isolation in Concept-Drifting Data Streams
Maurras Ulbricht Togbe (ISEP, LISITE, 75006 Paris, France)
Yousra Chabchoub (ISEP, LISITE, 75006 Paris, France)
Aliou Boly (Faculté des Sciences et Techniques (FST)/Département Mathématiques et Informatique, Université Cheikh Anta Diop de Dakar, Dakar-Fann BP 5005, Senegal)
Mariam Barry (Télécom Paris, LTCI, Institut Polytechnique de Paris, 91120 Palaiseau, France)
Raja Chiky (ISEP, LISITE, 75006 Paris, France)
Maroua Bahri (Télécom Paris, LTCI, Institut Polytechnique de Paris, 91120 Palaiseau, France)
https://www.mdpi.com/2073-431X/10/1/13
collection DOAJ
language English
format Article
sources DOAJ
author Maurras Ulbricht Togbe
Yousra Chabchoub
Aliou Boly
Mariam Barry
Raja Chiky
Maroua Bahri
title Anomalies Detection Using Isolation in Concept-Drifting Data Streams
publisher MDPI AG
series Computers
issn 2073-431X
publishDate 2021-01-01
description Detecting anomalies in streaming data is an important problem in many application domains, such as cybersecurity, natural disasters, and bank fraud. Several families of approaches have been designed to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of the existing anomaly detection methods for data streams, with an in-depth look at Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), an open-source machine learning framework for data streams that contains a single anomaly detection algorithm for data streams, called Streaming half-space trees. We performed experiments on several real, well-known data sets to compare the performance of our implementation of IForestASD with that of half-space trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms that rely on two well-known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN, a more recent method, is the Kolmogorov–Smirnov Windowing method for concept drift detection. More precisely, we extended KSWIN to handle n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets. In particular, we evaluated the F1-score, the execution time, and the memory consumption. The experiments show that our extensions consume fewer resources than the original version of IForestASD while achieving similar or better detection performance.
topic anomaly detection
isolation-based
data streams
drift detection
survey
url https://www.mdpi.com/2073-431X/10/1/13