Version-Wide Software Birthmark via Machine Learning

Identifying the credibility of executable files is critical to the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass software validation of operating systems...


Bibliographic Details
Main Authors: Chih-Ko Chung, Pi-Chung Wang
Format: Article
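Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest
Online Access: https://ieeexplore.ieee.org/document/9509024/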
id doaj-09d5318f22b3417eacea6fad26baa4b7
record_format Article
spelling doaj-09d5318f22b3417eacea6fad26baa4b7
2021-08-12T23:00:25Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
110811
110825
10.1109/ACCESS.2021.3103186
9509024
Version-Wide Software Birthmark via Machine Learning
Chih-Ko Chung 0 https://orcid.org/0000-0001-6392-9560
Pi-Chung Wang 1 https://orcid.org/0000-0002-4220-2853
Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan
Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan
https://ieeexplore.ieee.org/document/9509024/
Software birthmark
executable file format
Authenticode
digital signature
machine learning
content digest
collection DOAJ
language English
format Article
sources DOAJ
author Chih-Ko Chung
Pi-Chung Wang
title Version-Wide Software Birthmark via Machine Learning
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Identifying the credibility of executable files is critical to the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass the software validation of operating systems and security software by using counterfeit code-signing certificates. Although counterfeit certificates can be revoked by certificate authorities (CAs), previous research showed that the revocation delay can be as long as 5.6 months. In this paper, we attempt to identify the credibility of software with multiple-version executable files without relying on public key infrastructure (PKI), given that a new version of an executable file is usually developed incrementally from previous versions. The features shared among different versions can be extracted to identify the software. Accordingly, we present a software-birthmark scheme that generates a cross-version software birthmark for executable files of the same software. The proposed software birthmark is a binary-classification model, built by a machine learning algorithm from the imported and exported function names extracted from different-version executable files. To evaluate the performance of version-wide software birthmarks, our experiments include 138 versions of Windows kernel32.dll and 545 versions of firefox.exe. We also use multiple machine learning algorithms for performance comparisons. The results show that the proposed software birthmark can effectively identify the derivations of these executable files. The proposed software birthmark can be used by operating systems or security software to evaluate the credibility of executable files with suspicious certificates.
topic Software birthmark
executable file format
Authenticode
digital signature
machine learning
content digest
url https://ieeexplore.ieee.org/document/9509024/
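
Illustrative note: the description above characterizes the birthmark as a binary-classification model over imported and exported function names. The sketch below shows that general idea only. It is not the authors' implementation: this record does not name the algorithms compared in the paper, so a plain perceptron is used as a stand-in, and every function-name set is a hand-written toy sample rather than data extracted from a real executable. Parsing the import and export tables of an actual file is assumed to happen elsewhere and is not shown.

```python
from collections import defaultdict


def train_perceptron(samples, epochs=20):
    """Train a perceptron on (set_of_function_names, label) pairs.

    label is +1 for a version of the target software, -1 otherwise.
    Each function name acts as one binary feature.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for names, label in samples:
            score = bias + sum(weights[n] for n in names)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Standard perceptron update on every active feature.
                for n in names:
                    weights[n] += label
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # the training set is separated; stop early
    return dict(weights), bias


def predict(model, names):
    """Return True if the name set is classified as the target software."""
    weights, bias = model
    return bias + sum(weights.get(n, 0.0) for n in names) > 0


# Toy samples (hypothetical): successive versions share most names.
target_versions = [
    {"CreateFileW", "ReadFile", "CloseHandle"},
    {"CreateFileW", "ReadFile", "CloseHandle", "GetTickCount64"},
    {"CreateFileW", "ReadFile", "CloseHandle", "GetTickCount64", "CreateFile2"},
]
other_software = [
    {"png_read_image", "png_create_read_struct"},
    {"sqlite3_open", "sqlite3_close", "sqlite3_exec"},
    {"CloseHandle", "connect", "send", "recv"},
]
samples = [(s, 1) for s in target_versions] + [(s, -1) for s in other_software]

model = train_perceptron(samples)

# An unseen, newer version that adds one made-up name, and an unrelated file.
newer_version = {"CreateFileW", "ReadFile", "CloseHandle",
                 "GetTickCount64", "CreateFile2", "HypotheticalNewApi"}
unrelated = {"sqlite3_open", "sqlite3_prepare_v2"}

print(predict(model, newer_version))
print(predict(model, unrelated))
```

Because a new version keeps most of the names of its predecessors, the learned weights carry over to the unseen version even though it adds a name the model has never encountered; unknown names simply contribute nothing to the score.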