Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets

Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. That is, positive samples are far fewer than negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly for...

Bibliographic Details
Main Authors: GuiPing Wang, JianXi Yang, Ren Li
Format: Article
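Language: English
Published: Electronics and Telecommunications Research Institute (ETRI) 2017-10-01
Series: ETRI Journal
Subjects: Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM)
Online Access: https://doi.org/10.4218/etrij.17.0116.0879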
id doaj-fe906d3a01d941ada769c998e77c86a8
record_format Article
spelling doaj-fe906d3a01d941ada769c998e77c86a8 | 2020-11-25T02:36:40Z | eng | Electronics and Telecommunications Research Institute (ETRI) | ETRI Journal | ISSN 1225-6463, 2233-7326 | 2017-10-01 | vol. 39, no. 5, pp. 621-631 | 10.4218/etrij.17.0116.0879 | Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets | GuiPing Wang; JianXi Yang; Ren Li | https://doi.org/10.4218/etrij.17.0116.0879 | Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM)
collection DOAJ
language English
format Article
sources DOAJ
author GuiPing Wang
JianXi Yang
Ren Li
title Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets
publisher Electronics and Telecommunications Research Institute (ETRI)
series ETRI Journal
issn 1225-6463
2233-7326
publishDate 2017-10-01
description Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. That is, positive samples are far fewer than negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly on highly imbalanced datasets: the learned classification hyperplane skews toward the positive samples, resulting in a high false‐negative rate. This article proposes a new imbalanced SVM (termed ImSVM)‐based anomaly detection algorithm, which assigns a different weight to each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane so that the decision function achieves the maximum GMean value on the dataset. This problem is converted into an unconstrained optimization problem to search for the optimal weight vector. Experiments are carried out on both Cloud datasets and Knowledge Discovery and Data Mining datasets to evaluate ImSVM, using highly imbalanced training sample sets constructed for this purpose. The experimental results show that ImSVM outperforms over‐sampling techniques and several existing imbalanced SVM‐based techniques.
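The two ideas named in the description, the GMean measure and a decision function with per-support-vector weights on the positive class, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the formulas, so the sketch assumes the common definition of GMean as the geometric mean of the true-positive and true-negative rates, and assumes the weights simply scale the terms of a standard kernel SVM decision function. The function names `gmean` and `weighted_decision` and all parameters are hypothetical.

```python
import math


def gmean(y_true, y_pred):
    """Geometric mean of true-positive rate and true-negative rate.

    Labels are assumed to be +1 (anomaly) and -1 (normal).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p != -1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def weighted_decision(x, support_vectors, alphas, labels, bias, weights, kernel):
    """Kernel SVM decision value with an extra weight on positive support vectors.

    Assumed form: f(x) = b + sum_i w_i * alpha_i * y_i * K(x_i, x),
    where w_i is taken from `weights` for positive support vectors
    and fixed to 1 for negative ones.
    """
    total = bias
    for sv, alpha, y, w in zip(support_vectors, alphas, labels, weights):
        scale = w if y == 1 else 1.0  # only positive support vectors are reweighted
        total += scale * alpha * y * kernel(sv, x)
    return total


def linear_kernel(a, b):
    """Plain dot product, used here only to keep the example small."""
    return sum(ai * bi for ai, bi in zip(a, b))


if __name__ == "__main__":
    # One of two anomalies is missed, both normal samples are correct.
    print(gmean([1, 1, -1, -1], [1, -1, -1, -1]))

    svs = [(1.0,), (-1.0,)]
    # Doubling the weight of the positive support vector raises the decision
    # value for x, pushing borderline points toward the anomaly class.
    print(weighted_decision((1.0,), svs, [0.5, 0.5], [1, -1], 0.0, [1.0, 1.0], linear_kernel))
    print(weighted_decision((1.0,), svs, [0.5, 0.5], [1, -1], 0.0, [2.0, 1.0], linear_kernel))
```

In this reading, searching for the optimal weight vector would mean choosing `weights` so that the predictions `sign(f(x))` maximize `gmean` on the dataset; the optimization procedure itself is not described in this record and is not sketched here.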
topic Anomaly detection
Decision function
GMean
Imbalanced training sample set
Support vector machine (SVM)
url https://doi.org/10.4218/etrij.17.0116.0879