Scalable and Multifaceted Search and Its Application for Binary Malware Files

Malicious binary files are a serious threat to industrial information systems. Because of their large number, an automatic assistant tool becomes essential for analysis, and finding similar files would be a great help. In this paper, we present a fast, scalable, and multifaceted search scheme to fin...

Bibliographic Details
Main Authors: Donghoon Kim, Junnyung Hur, Myungkeun Yoon
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Elasticsearch; inverted index; jaccard index; malware; MinHash
Online Access: https://ieeexplore.ieee.org/document/9504570/
id doaj-ac256e8bf94444ff9c60d2599ee684f7
record_format Article
spelling doaj-ac256e8bf94444ff9c60d2599ee684f7
2021-08-23T23:00:44Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
112770
112779
10.1109/ACCESS.2021.3102157
9504570
Scalable and Multifaceted Search and Its Application for Binary Malware Files
Donghoon Kim 0 https://orcid.org/0000-0002-2693-9492
Junnyung Hur 1 https://orcid.org/0000-0001-8527-872X
Myungkeun Yoon 2 https://orcid.org/0000-0003-1987-1394
Kia Corporation, Seoul, Seocho-gu, Republic of Korea
Department of Computer Science, Kookmin University, Seoul, Seongbuk-gu, Republic of Korea
Department of Computer Science, Kookmin University, Seoul, Seongbuk-gu, Republic of Korea
Malicious binary files are a serious threat to industrial information systems. Because of their large number, an automatic assistant tool becomes essential for analysis, and finding similar files would be a great help. In this paper, we present a fast, scalable, and multifaceted search scheme to find similar binary malware files. We use a content-defined chunking algorithm to convert a file into a feature set for the first time. The proposed scheme uses MinHash to reduce any feature set of any file to a fixed size, which significantly improves search accuracy, processing speed, and space utilization. We theoretically prove that the new scheme returns similar files in jaccard index order. Through implementation and experiments with 12 million malicious files, we confirm that the search speed is increased by 600%, space is reduced by 90%, and the accuracy is increased by 400% at least, compared with the state-of-the-art of Elasticsearch.
https://ieeexplore.ieee.org/document/9504570/
Elasticsearch
inverted index
jaccard index
malware
MinHash
collection DOAJ
language English
format Article
sources DOAJ
author Donghoon Kim
Junnyung Hur
Myungkeun Yoon
spellingShingle Donghoon Kim
Junnyung Hur
Myungkeun Yoon
Scalable and Multifaceted Search and Its Application for Binary Malware Files
IEEE Access
Elasticsearch
inverted index
jaccard index
malware
MinHash
author_facet Donghoon Kim
Junnyung Hur
Myungkeun Yoon
author_sort Donghoon Kim
title Scalable and Multifaceted Search and Its Application for Binary Malware Files
title_short Scalable and Multifaceted Search and Its Application for Binary Malware Files
title_full Scalable and Multifaceted Search and Its Application for Binary Malware Files
title_fullStr Scalable and Multifaceted Search and Its Application for Binary Malware Files
title_full_unstemmed Scalable and Multifaceted Search and Its Application for Binary Malware Files
title_sort scalable and multifaceted search and its application for binary malware files
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Malicious binary files are a serious threat to industrial information systems. Because of their large number, an automatic assistant tool becomes essential for analysis, and finding similar files would be a great help. In this paper, we present a fast, scalable, and multifaceted search scheme to find similar binary malware files. We use a content-defined chunking algorithm to convert a file into a feature set for the first time. The proposed scheme uses MinHash to reduce any feature set of any file to a fixed size, which significantly improves search accuracy, processing speed, and space utilization. We theoretically prove that the new scheme returns similar files in jaccard index order. Through implementation and experiments with 12 million malicious files, we confirm that the search speed is increased by 600%, space is reduced by 90%, and the accuracy is increased by 400% at least, compared with the state-of-the-art of Elasticsearch.
topic Elasticsearch
inverted index
jaccard index
malware
MinHash
url https://ieeexplore.ieee.org/document/9504570/
work_keys_str_mv AT donghoonkim scalableandmultifacetedsearchanditsapplicationforbinarymalwarefiles
AT junnyunghur scalableandmultifacetedsearchanditsapplicationforbinarymalwarefiles
AT myungkeunyoon scalableandmultifacetedsearchanditsapplicationforbinarymalwarefiles
_version_ 1721198083007578112
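
The abstract in this record names three building blocks: content-defined chunking to turn a file into a feature set, MinHash to reduce that set to a fixed size, and the Jaccard index as the similarity measure. The sketch below illustrates how these pieces can fit together. It is not the authors' implementation: the function names (`chunk`, `features`, `minhash`, `estimate`, `jaccard`), the window size, the cut mask, the minimum chunk size, and the linear hash family are all assumptions chosen for brevity, and the window hash is recomputed at every byte, which is simple but slow.

```python
from __future__ import annotations

import hashlib
import random

_P = (1 << 61) - 1  # Mersenne prime used as modulus for the assumed hash family


def chunk(data: bytes, window: int = 16, mask: int = 0x3F, min_size: int = 32) -> list[bytes]:
    """Toy content-defined chunking: cut after byte i when a hash of the
    last `window` bytes has its low bits (selected by `mask`) all zero."""
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 - start < min_size:  # enforce a minimum chunk length
            continue
        digest = hashlib.blake2b(data[i + 1 - window:i + 1], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):  # trailing bytes form the last chunk
        chunks.append(data[start:])
    return chunks


def features(data: bytes) -> set[int]:
    """Feature set of a file: one 64-bit hash per chunk."""
    return {int.from_bytes(hashlib.sha1(c).digest()[:8], "big") for c in chunk(data)}


def minhash(feats: set[int], k: int = 128, seed: int = 1) -> list[int]:
    """Fixed-size signature: the minimum of each of k hash functions over the set."""
    if not feats:
        raise ValueError("feature set must not be empty")
    rng = random.Random(seed)  # same seed gives the same k hash functions
    params = [(rng.randrange(1, _P), rng.randrange(0, _P)) for _ in range(k)]
    return [min((a * x + b) % _P for x in feats) for a, b in params]


def estimate(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimated Jaccard index: fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a: set[int], b: set[int]) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)
```

Under these assumptions, every file maps to a signature of `k` integers regardless of its size, and two signatures built with the same `seed` can be compared position by position without access to the original feature sets. Because cut points depend on local content, a small edit to a file tends to change only the chunks near the edit, so most features are shared with the unedited file.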