Imbalanced SVM‐Based Anomaly Detection Algorithm for Imbalanced Training Datasets
Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly for...
Main Authors: | GuiPing Wang, JianXi Yang, Ren Li |
---|---|
Format: | Article |
Language: | English |
Published: | Electronics and Telecommunications Research Institute (ETRI), 2017-10-01 |
Series: | ETRI Journal |
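Subjects: | Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM) |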
Online Access: | https://doi.org/10.4218/etrij.17.0116.0879 |
id |
doaj-fe906d3a01d941ada769c998e77c86a8 |
---|---|
record_format |
Article |
collection |
DOAJ |
sources |
DOAJ |
author |
GuiPing Wang, JianXi Yang, Ren Li |
issn |
1225-6463, 2233-7326 |
description |
Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)‐based anomaly detection algorithms perform poorly for highly imbalanced datasets: the learned classification hyperplane skews toward the positive samples, resulting in a high false‐negative rate. This article proposes a new imbalanced SVM (termed ImSVM)‐based anomaly detection algorithm, which assigns a different weight for each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane to make the decision function achieve a maximum GMean measure value on the dataset. The above problem is converted into an unconstrained optimization problem to search the optimal weight vector. Experiments are carried out on both Cloud datasets and Knowledge Discovery and Data Mining datasets to evaluate ImSVM. Highly imbalanced training sample sets are constructed. The experimental results show that ImSVM outperforms over‐sampling techniques and several existing imbalanced SVM‐based techniques. |
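The abstract names GMean as the measure ImSVM maximizes. As a minimal sketch, assuming GMean here means the standard geometric mean of the true-positive rate and the true-negative rate (the record itself does not give the formula), the measure can be computed as follows. The function name `g_mean` and the toy labels are hypothetical and are not taken from the article.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of true-positive rate and true-negative rate.

    Assumption: this is the standard G-mean definition; the catalog
    record does not state the exact formula used in the article.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical imbalanced sample: 2 positives (anomalies), 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that never predicts the positive class.
all_negative = [0] * len(y_true)

# A classifier that finds one of the two positives and raises one false alarm.
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(g_mean(y_true, all_negative))
print(g_mean(y_true, mixed))
```

Under this definition, a classifier that labels every sample negative scores zero regardless of its accuracy, which is why the measure suits highly imbalanced training sets.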
topic |
Anomaly detection; Decision function; GMean; Imbalanced training sample set; Support vector machine (SVM) |