Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API
With the rapid advances of anti-virus and anti-tracking technologies, three aspects in malware clustering need to be improved for effective clustering, i.e., the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose...
Main Authors: | Yong Fang, Wenjie Zhang, Beibei Li, Fan Jing, Lei Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | EMD, hybrid features, semi-supervised clustering, weight |
Online Access: | https://ieeexplore.ieee.org/document/8943285/ |
id |
doaj-ddff59b13f644f3cbd2d31d49b02171d |
---|---|
record_format |
Article |
spelling |
doaj-ddff59b13f644f3cbd2d31d49b02171d; 2021-03-30T01:11:39Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 2313; 2326; 10.1109/ACCESS.2019.2962198; 8943285; Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API; Yong Fang (0) https://orcid.org/0000-0003-0708-1686; Wenjie Zhang (1) https://orcid.org/0000-0002-4033-0253; Beibei Li (2) https://orcid.org/0000-0002-0485-1975; Fan Jing (3) https://orcid.org/0000-0001-9133-1742; Lei Zhang (4) https://orcid.org/0000-0001-8074-906X; College of Cybersecurity, Sichuan University, Chengdu, China; College of Cybersecurity, Sichuan University, Chengdu, China; College of Cybersecurity, Sichuan University, Chengdu, China; College of Cybersecurity, Sichuan University, Chengdu, China; College of Cybersecurity, Sichuan University, Chengdu, China; With the rapid advances of anti-virus and anti-tracking technologies, three aspects in malware clustering need to be improved for effective clustering, i.e., the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on dynamic and static features with their weights. In this approach, we employ a new similarity measurement method based on EMD to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, by incorporating supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach can accurately distinguish samples from various families and achieves a superior purity of 98.7%.; https://ieeexplore.ieee.org/document/8943285/; EMD; hybrid features; semi-supervised clustering; weight |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yong Fang, Wenjie Zhang, Beibei Li, Fan Jing, Lei Zhang |
spellingShingle |
Yong Fang Wenjie Zhang Beibei Li Fan Jing Lei Zhang Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API IEEE Access EMD hybrid features semi-supervised clustering weight |
author_facet |
Yong Fang, Wenjie Zhang, Beibei Li, Fan Jing, Lei Zhang |
author_sort |
Yong Fang |
title |
Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API |
title_short |
Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API |
title_full |
Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API |
title_fullStr |
Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API |
title_full_unstemmed |
Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API |
title_sort |
semi-supervised malware clustering based on the weight of bytecode and api |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
With the rapid advances of anti-virus and anti-tracking technologies, three aspects in malware clustering need to be improved for effective clustering, i.e., the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on dynamic and static features with their weights. In this approach, we employ a new similarity measurement method based on EMD to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, by incorporating supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach can accurately distinguish samples from various families and achieves a superior purity of 98.7%. |
topic |
EMD, hybrid features, semi-supervised clustering, weight |
url |
https://ieeexplore.ieee.org/document/8943285/ |
work_keys_str_mv |
AT yongfang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi AT wenjiezhang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi AT beibeili semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi AT fanjing semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi AT leizhang semisupervisedmalwareclusteringbasedontheweightofbytecodeandapi |
_version_ |
1724187533710983168 |