Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API

With the rapid advance of anti-virus and anti-tracking technologies, three aspects of malware clustering need to be improved for clustering to be effective: the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose...

Bibliographic Details
Main Authors: Yong Fang, Wenjie Zhang, Beibei Li, Fan Jing, Lei Zhang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: EMD, hybrid features, semi-supervised clustering, weight
Online Access: https://ieeexplore.ieee.org/document/8943285/
id doaj-ddff59b13f644f3cbd2d31d49b02171d
record_format Article
spelling doaj-ddff59b13f644f3cbd2d31d49b02171d; 2021-03-30T01:11:39Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 2313; 2326; 10.1109/ACCESS.2019.2962198; 8943285; Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API; Yong Fang [0] https://orcid.org/0000-0003-0708-1686; Wenjie Zhang [1] https://orcid.org/0000-0002-4033-0253; Beibei Li [2] https://orcid.org/0000-0002-0485-1975; Fan Jing [3] https://orcid.org/0000-0001-9133-1742; Lei Zhang [4] https://orcid.org/0000-0001-8074-906X; [0] College of Cybersecurity, Sichuan University, Chengdu, China; [1] College of Cybersecurity, Sichuan University, Chengdu, China; [2] College of Cybersecurity, Sichuan University, Chengdu, China; [3] College of Cybersecurity, Sichuan University, Chengdu, China; [4] College of Cybersecurity, Sichuan University, Chengdu, China; With the rapid advance of anti-virus and anti-tracking technologies, three aspects of malware clustering need to be improved for clustering to be effective: the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on dynamic and static features together with their weights. In this approach, we employ a new similarity measurement method based on EMD to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, by incorporating supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach can accurately distinguish samples from different families and achieves a superior purity of 98.7%.; https://ieeexplore.ieee.org/document/8943285/; EMD; hybrid features; semi-supervised clustering; weight
collection DOAJ
language English
format Article
sources DOAJ
author Yong Fang
Wenjie Zhang
Beibei Li
Fan Jing
Lei Zhang
author_facet Yong Fang
Wenjie Zhang
Beibei Li
Fan Jing
Lei Zhang
author_sort Yong Fang
title Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API
title_sort semi-supervised malware clustering based on the weight of bytecode and api
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description With the rapid advance of anti-virus and anti-tracking technologies, three aspects of malware clustering need to be improved for clustering to be effective: the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on dynamic and static features together with their weights. In this approach, we employ a new similarity measurement method based on EMD to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, by incorporating supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach can accurately distinguish samples from different families and achieves a superior purity of 98.7%.
topic EMD
hybrid features
semi-supervised clustering
weight
url https://ieeexplore.ieee.org/document/8943285/
work_keys_str_mv AT yongfang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi
AT wenjiezhang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi
AT beibeili semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi
AT fanjing semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi
AT leizhang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi
_version_ 1724187533710983168