Scalable and Multifaceted Search and Its Application for Binary Malware Files

Bibliographic Details
Main Authors: Donghoon Kim, Junnyung Hur, Myungkeun Yoon
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/9504570/
Description
Summary: Malicious binary files are a serious threat to industrial information systems. Because there are so many of them, an automatic assistant tool is essential for analysis, and the ability to find similar files is a great help. In this paper, we present a fast, scalable, and multifaceted search scheme for finding similar binary malware files. We are the first to use a content-defined chunking algorithm to convert a file into a feature set. The proposed scheme uses MinHash to reduce the feature set of any file to a fixed size, which significantly improves search accuracy, processing speed, and space utilization. We theoretically prove that the new scheme returns similar files in Jaccard index order. Through implementation and experiments with 12 million malicious files, we confirm that, compared with the state-of-the-art Elasticsearch, search speed increases by 600%, space usage decreases by 90%, and accuracy increases by at least 400%.
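The pipeline summarized above (content-defined chunking, then MinHash, then ranking by Jaccard index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not specify their chunking algorithm, hash functions, signature length, or index, so the Gear rolling hash, the BLAKE2b chunk digests, the (a*x + b) mod (2**61 - 1) hash family, k = 128, the chunk-size parameters, and the brute-force `search` function are all assumptions. The sketch relies on the standard MinHash property that, for a random hash function, two sets share the same minimum hash value with probability equal to their Jaccard index, so the fraction of matching signature positions estimates that index.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1
P61 = (1 << 61) - 1  # Mersenne prime used as the modulus of the (assumed) hash family

# Hypothetical Gear table: 256 pseudo-random 64-bit values (fixed seed for reproducibility).
_gear_rng = random.Random(2021)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes, avg_bits: int = 6, min_size: int = 16,
               max_size: int = 256) -> list[bytes]:
    """Split data at content-defined boundaries using a Gear rolling hash (an assumption).

    A cut is made where the top `avg_bits` bits of the hash are zero. The hash depends
    only on the last 64 bytes, so cut points follow the content, not absolute offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h >> (64 - avg_bits)) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def feature_set(data: bytes) -> set[int]:
    """Map each chunk to a 64-bit feature (truncated BLAKE2b digest of the chunk)."""
    return {int.from_bytes(hashlib.blake2b(c, digest_size=8).digest(), "big")
            for c in cdc_chunks(data)}


class MinHasher:
    """Reduce a feature set of any size to a fixed-length signature of k minima."""

    def __init__(self, k: int = 128, seed: int = 7):
        rng = random.Random(seed)
        # One hash h(x) = (a*x + b) mod P61 per signature position.
        self.params = [(rng.randrange(1, P61), rng.randrange(P61)) for _ in range(k)]

    def signature(self, features: set[int]) -> list[int]:
        if not features:  # sentinel signature for an empty file
            return [P61] * len(self.params)
        return [min((a * x + b) % P61 for x in features) for a, b in self.params]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the minima agree: an estimate of the Jaccard index."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def search(query_sig: list[int], index: dict[str, list[int]],
           top: int = 5) -> list[tuple[str, float]]:
    """Hypothetical brute-force search: rank indexed files by estimated Jaccard, highest first."""
    scored = [(name, estimated_jaccard(query_sig, sig)) for name, sig in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(20_000))
    variant = base[:10_000] + b"injected payload" + base[10_000:]  # small local edit
    unrelated = bytes(rng.getrandbits(8) for _ in range(20_000))
    mh = MinHasher()
    index = {"variant": mh.signature(feature_set(variant)),
             "unrelated": mh.signature(feature_set(unrelated))}
    print(search(mh.signature(feature_set(base)), index))
```

Because cut points depend on content rather than offsets, inserting bytes into a file changes only the few chunks near the edit, so a lightly modified variant keeps most of its features and scores close to 1.0, while an unrelated file scores near 0. The linear scan over every signature is only for illustration; searching 12 million files, as in the paper's experiments, would call for an index rather than a full scan.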
ISSN: 2169-3536