Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets

Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly for...

Bibliographic Details
Main Authors:	GuiPing Wang, JianXi Yang, Ren Li
Format:	Article
Language:	English
Published:	Electronics and Telecommunications Research Institute (ETRI) 2017-10-01
Series:	ETRI Journal
Subjects:	Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM)
Online Access:	https://doi.org/10.4218/etrij.17.0116.0879

id	doaj-fe906d3a01d941ada769c998e77c86a8
record_format	Article
spelling	doaj-fe906d3a01d941ada769c998e77c86a8 2020-11-25T02:36:40Z eng Electronics and Telecommunications Research Institute (ETRI) ETRI Journal 1225-6463 2233-7326 2017-10-01 39 5 621 631 10.4218/etrij.17.0116.0879 https://doi.org/10.4218/etrij.17.0116.0879
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	GuiPing Wang JianXi Yang Ren Li
spellingShingle	GuiPing Wang JianXi Yang Ren Li Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets ETRI Journal Anomaly detection Decision function GMean Imbalanced training sample set Support vector machine (SVM)
author_facet	GuiPing Wang JianXi Yang Ren Li
author_sort	GuiPing Wang
title	Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets
title_sort	imbalanced svm‐based anomaly detection algorithm for imbalanced training datasets
publisher	Electronics and Telecommunications Research Institute (ETRI)
series	ETRI Journal
issn	1225-6463 2233-7326
publishDate	2017-10-01
description	Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly for highly imbalanced datasets: the learned classification hyperplane skews toward the positive samples, resulting in a high false‐negative rate. This article proposes a new imbalanced SVM (termed ImSVM)‐based anomaly detection algorithm, which assigns a different weight for each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane to make the decision function achieve a maximum GMean measure value on the dataset. The above problem is converted into an unconstrained optimization problem to search the optimal weight vector. Experiments are carried out on both Cloud datasets and Knowledge Discovery and Data Mining datasets to evaluate ImSVM. Highly imbalanced training sample sets are constructed. The experimental results show that ImSVM outperforms over‐sampling techniques and several existing imbalanced SVM‐based techniques.
topic	Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM)
url	https://doi.org/10.4218/etrij.17.0116.0879

Similar Items