The Efficient Way of Detecting Anomalies in Large Scale Streaming Data

These days many companies have marketed big data streams in numerous applications, including industry, the Internet of Things, and telecommunication. The data streams produced by these applications may contain values that are not normal. These values are called anomalies. A lot of work has be...


Bibliographic Details
Main Authors:	Sheeraz Lighari, Dil Muhammad Akbar Hussain
Format:	Article
Language:	English
Published:	University of Sindh 2018-07-01
Series:	University of Sindh Journal of Information and Communication Technology
Subjects:	Batch data, Streaming data, Clustering, KMeans, Anomaly detection
Online Access:	http://sujo.usindh.edu.pk/index.php/USJICT/article/view/4453/pdf

id	doaj-20c417e66b204c62ae97683db873477d
record_format	Article
spelling	doaj-20c417e66b204c62ae97683db873477d 2020-11-24T23:27:17Z eng University of Sindh University of Sindh Journal of Information and Communication Technology 2521-5582 2523-1235 2018-07-01 23156161 The Efficient Way of Detecting Anomalies in Large Scale Streaming Data Sheeraz Lighari 0 Dil Muhammad Akbar Hussain 1 Department of Energy Technology, Aalborg University Department of Energy Technology, Aalborg University These days many companies have marketed big data streams in numerous applications, including industry, the Internet of Things, and telecommunication. The data streams produced by these applications may contain values that are not normal. These values are called anomalies. A lot of work has been done on anomaly detection for batch data, but detecting anomalies in streaming data remains a largely open issue. In streaming data, finding anomalies has become more challenging over time because of dynamic changes in the data, which are produced by the different methods applied in data streaming infrastructures. In anomaly detection, it is first necessary to know how to find the normal behavior of the data; it is then easier to recognize dynamic behavior or change in the data. In this context, clustering is a very prominent technique. Clustering is commonly applied to analyze static data, but in the field of data mining it is a key problem, especially on streaming data. In this paper, we apply a streaming version of the KMeans clustering algorithm for anomaly detection. The algorithm is analyzed in both single and distributed environments. Furthermore, we investigate the data stream to measure various factors such as accuracy, anomaly detection time, true positive rate, and false positive rate. The data stream used in our analysis is generated from the Kddcup99 dataset, which is widely used in the field of intrusion detection. http://sujo.usindh.edu.pk/index.php/USJICT/article/view/4453/pdf Batch data Streaming data Clustering KMeans and Anomaly detection
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Sheeraz Lighari Dil Muhammad Akbar Hussain
spellingShingle	Sheeraz Lighari Dil Muhammad Akbar Hussain The Efficient Way of Detecting Anomalies in Large Scale Streaming Data University of Sindh Journal of Information and Communication Technology Batch data Streaming data Clustering KMeans and Anomaly detection
author_facet	Sheeraz Lighari Dil Muhammad Akbar Hussain
author_sort	Sheeraz Lighari
title	The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
title_short	The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
title_full	The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
title_fullStr	The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
title_full_unstemmed	The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
title_sort	efficient way of detecting anomalies in large scale streaming data
publisher	University of Sindh
series	University of Sindh Journal of Information and Communication Technology
issn	2521-5582 2523-1235
publishDate	2018-07-01
description	These days many companies have marketed big data streams in numerous applications, including industry, the Internet of Things, and telecommunication. The data streams produced by these applications may contain values that are not normal. These values are called anomalies. A lot of work has been done on anomaly detection for batch data, but detecting anomalies in streaming data remains a largely open issue. In streaming data, finding anomalies has become more challenging over time because of dynamic changes in the data, which are produced by the different methods applied in data streaming infrastructures. In anomaly detection, it is first necessary to know how to find the normal behavior of the data; it is then easier to recognize dynamic behavior or change in the data. In this context, clustering is a very prominent technique. Clustering is commonly applied to analyze static data, but in the field of data mining it is a key problem, especially on streaming data. In this paper, we apply a streaming version of the KMeans clustering algorithm for anomaly detection. The algorithm is analyzed in both single and distributed environments. Furthermore, we investigate the data stream to measure various factors such as accuracy, anomaly detection time, true positive rate, and false positive rate. The data stream used in our analysis is generated from the Kddcup99 dataset, which is widely used in the field of intrusion detection.
topic	Batch data Streaming data Clustering KMeans and Anomaly detection
url	http://sujo.usindh.edu.pk/index.php/USJICT/article/view/4453/pdf
work_keys_str_mv	AT sheerazlighari theefficientwayofdetectinganomaliesinlargescalestreamingdata AT dilmuhammadakbarhussain theefficientwayofdetectinganomaliesinlargescalestreamingdata AT sheerazlighari efficientwayofdetectinganomaliesinlargescalestreamingdata AT dilmuhammadakbarhussain efficientwayofdetectinganomaliesinlargescalestreamingdata
_version_	1725552631086055424


Similar Items