Version-Wide Software Birthmark via Machine Learning


Bibliographic Details
Main Authors: Chih-Ko Chung, Pi-Chung Wang
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9509024/
Description
Summary: Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware can pass the software validation performed by operating systems and security software by using counterfeit code-signing certificates. Although CAs can revoke counterfeit certificates, previous research has shown that the revocation delay can be as long as 5.6 months. In this paper, we attempt to identify the credibility of software with multiple-version executable files without relying on public key infrastructure (PKI). Because a new version of an executable file is usually developed incrementally from previous versions, the features shared among versions can be extracted to identify the software. Accordingly, we present a software-birthmark scheme that generates a cross-version software birthmark for executable files of the same software. The proposed software birthmark is a binary-classification model, trained by a machine learning algorithm on the imported and exported function names extracted from executable files of different versions. To evaluate the performance of version-wide software birthmarks, our experiments include 138 versions of Windows kernel32.dll and 545 versions of firefox.exe. We also compare the performance of multiple machine learning algorithms. The results show that the proposed software birthmark can effectively identify the derivations of these executable files. The proposed software birthmark can be used by operating systems or security software to evaluate the credibility of executable files with suspicious certificates.
ISSN: 2169-3536
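
The summary describes the birthmark as a binary classifier trained on imported and exported function names from many versions of the same software. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the function names have already been extracted from each executable (for Windows files they would come from the import and export tables), it uses a plain perceptron as a stand-in because the summary does not name the algorithms the paper compares, and every sample name set below is hypothetical toy data.

```python
# Illustrative sketch only: toy name sets and a perceptron stand in for the
# paper's real feature extraction and learning algorithms.

def build_vocabulary(samples):
    """Map every function name seen in training to a vector index."""
    names = sorted({name for names, _ in samples for name in names})
    return {name: i for i, name in enumerate(names)}


def to_vector(names, vocab):
    """Binary bag-of-names vector: 1 if the file imports/exports the name."""
    vec = [0] * len(vocab)
    for name in names:
        if name in vocab:  # names unseen during training are ignored
            vec[vocab[name]] = 1
    return vec


def score(names, vocab, weights, bias):
    """Linear score of a file's name set under the learned model."""
    x = to_vector(names, vocab)
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_perceptron(samples, vocab, epochs=20):
    """Learn weights separating target-software versions (1) from others (0)."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for names, label in samples:
            predicted = 1 if score(names, vocab, weights, bias) > 0 else 0
            error = label - predicted
            if error:
                x = to_vector(names, vocab)
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


def is_same_software(names, vocab, weights, bias):
    """Apply the birthmark: does this name set look like the target software?"""
    return score(names, vocab, weights, bias) > 0


# Hypothetical data: label 1 = versions of the target software, 0 = other files.
TRAINING = [
    ({"CreateFileW", "ReadFile", "AppInit", "AppRender"}, 1),
    ({"CreateFileW", "ReadFile", "AppInit", "AppRender", "AppSync"}, 1),
    ({"CreateFileW", "WriteFile", "AppInit", "AppRender", "AppSync"}, 1),
    ({"CreateFileW", "ReadFile", "socket", "connect"}, 0),
    ({"WriteFile", "RegOpenKeyExW", "socket"}, 0),
    ({"CreateFileW", "send", "recv", "connect"}, 0),
]

vocab = build_vocabulary(TRAINING)
weights, bias = train_perceptron(TRAINING, vocab)

# An unseen, newer version that adds one function, and an unrelated file.
new_version = {"CreateFileW", "ReadFile", "AppInit", "AppRender", "AppSync", "AppUpdate"}
unrelated = {"CreateFileW", "socket", "connect", "send"}

print(is_same_software(new_version, vocab, weights, bias))
print(is_same_software(unrelated, vocab, weights, bias))
```

In this toy setting the model ends up weighting the names that recur across versions of the target software and penalizing names seen only in other files, which is the intuition behind a version-wide birthmark: a new release still shares most of its function names with its predecessors.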