Unsupervised outlier detection in multidimensional data

Bibliographic Details
Main Authors:	Atiq ur Rehman, Samir Brahim Belhaouari
Format:	Article
Language:	English
Published:	SpringerOpen 2021-06-01
Series:	Journal of Big Data
Subjects:	Anomaly/outliers detection; Advanced statistical methods; Computationally inexpensive methods; High dimensional data
Online Access:	https://doi.org/10.1186/s40537-021-00469-z

Affiliation:	ICT Division, College of Science and Engineering, Hamad Bin Khalifa University (both authors)
ISSN:	2196-1115
Abstract:	Detection and removal of outliers in a dataset is a fundamental preprocessing task; without it, analysis of the data can be misleading. Anomalies in the data can also heavily degrade the performance of machine learning algorithms. To detect anomalies in a dataset in an unsupervised manner, this paper proposes several novel statistical techniques based on data compactness and other properties of the data. The proposed techniques are found to be efficient in terms of performance, ease of implementation, and computational complexity. Two of them transform the data into a one-dimensional distance space before detecting outliers, so they remain computationally inexpensive and feasible regardless of the high dimensionality of the data. The paper presents a comprehensive performance analysis of the proposed anomaly detection schemes, which are found to perform better than state-of-the-art methods when tested on several benchmark datasets.
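The abstract says two of the proposed techniques transform the data into a one-dimensional distance space before detecting outliers. The Python sketch below illustrates only that general idea; it is not the authors' algorithm. The reference point (the coordinate-wise median), the Euclidean metric, Tukey's IQR fence with k = 1.5, and the helper names `distances_to_center` and `flag_outliers` are all assumptions chosen for illustration.

```python
# Illustrative sketch only (assumption): a generic "distance-space" outlier
# detector, NOT the method described in the paper.
import math
import statistics
from typing import List, Sequence


def distances_to_center(points: Sequence[Sequence[float]]) -> List[float]:
    """Map each d-dimensional point to one number: its Euclidean distance
    from the coordinate-wise median of the dataset (assumed reference point)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    dims = len(points[0])
    # Coordinate-wise median: a simple reference point that the outliers
    # themselves cannot drag far, unlike the mean.
    center = [statistics.median([p[j] for p in points]) for j in range(dims)]
    return [math.dist(p, center) for p in points]


def flag_outliers(points: Sequence[Sequence[float]], k: float = 1.5) -> List[int]:
    """Return indices of points whose distance exceeds Tukey's upper fence
    Q3 + k * IQR, computed on the one-dimensional distances."""
    d = distances_to_center(points)
    q1, _, q3 = statistics.quantiles(d, n=4)
    upper = q3 + k * (q3 - q1)
    # Only unusually large distances are suspicious; small ones lie near the center.
    return [i for i, di in enumerate(d) if di > upper]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
            (0.2, 0.8), (0.8, 0.2), (10, 10)]
    print(flag_outliers(data))  # → [7]
```

After the transform, the fence is computed on just n numbers whatever the dimensionality d, which is the property the abstract highlights; only the distance computation itself touches all d coordinates.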
