Version-Wide Software Birthmark via Machine Learning
Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass software validation of operating systems...
Main Authors: | Chih-Ko Chung, Pi-Chung Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
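Subjects: | Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest |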
Online Access: | https://ieeexplore.ieee.org/document/9509024/ |
id |
doaj-09d5318f22b3417eacea6fad26baa4b7 |
---|---|
record_format |
Article |
spelling |
doaj-09d5318f22b3417eacea6fad26baa4b7; 2021-08-12T23:00:25Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 110811-110825; DOI 10.1109/ACCESS.2021.3103186; document 9509024; Version-Wide Software Birthmark via Machine Learning; Chih-Ko Chung (https://orcid.org/0000-0001-6392-9560); Pi-Chung Wang (https://orcid.org/0000-0002-4220-2853); Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan; https://ieeexplore.ieee.org/document/9509024/; Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chih-Ko Chung; Pi-Chung Wang |
title |
Version-Wide Software Birthmark via Machine Learning |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass the software validation of operating systems and security software by using counterfeit code-signing certificates. Although counterfeit certificates can be revoked by certificate authorities (CAs), previous research showed that the revocation delay can be as long as 5.6 months. In this paper, we attempt to identify the credibility of software that has multiple versions of its executable files without relying on public key infrastructure (PKI). A new version of an executable file is usually developed incrementally from previous versions, so the features shared among versions can be extracted to identify the software. Accordingly, we present a software-birthmark scheme that generates a cross-version software birthmark for executable files of the same software. The proposed software birthmark is a binary-classification model, trained with a machine learning algorithm, based on the imported and exported function names extracted from executable files of different versions. To evaluate the performance of version-wide software birthmarks, our experiments include 138 versions of Windows kernel32.dll and 545 versions of firefox.exe. We also compare the performance of multiple machine learning algorithms. The results show that the proposed software birthmark can effectively identify the derivations of these executable files. It can be used by operating systems or security software to evaluate the credibility of executable files with suspicious certificates. |
topic |
Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest |
url |
https://ieeexplore.ieee.org/document/9509024/ |
_version_ |
1721209150670635008 |
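
The description above characterizes the birthmark as a binary-classification model built from the imported and exported function names of several versions of the same software. The sketch below is only an illustration of that idea, not the authors' implementation: the record does not name the algorithms used, so a plain perceptron stands in as an assumed classifier, and the function-name sets are hypothetical toy data rather than names extracted from real files.

```python
from typing import Iterable, List, Set, Tuple


def build_vocabulary(samples: Iterable[Set[str]]) -> List[str]:
    """Collect every function name seen in the training files, sorted."""
    names: Set[str] = set()
    for sample in samples:
        names |= sample
    return sorted(names)


def to_vector(sample: Set[str], vocabulary: List[str]) -> List[int]:
    """Binary presence vector: 1 if the file imports or exports the name."""
    return [1 if name in sample else 0 for name in vocabulary]


def train_perceptron(
    vectors: List[List[int]], labels: List[int], epochs: int = 20
) -> Tuple[List[float], float]:
    """Assumed stand-in classifier: a perceptron with unit learning rate."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = y - (1 if score > 0 else 0)
            if error:
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias


def predict(
    sample: Set[str], vocabulary: List[str], weights: List[float], bias: float
) -> bool:
    """True if the file is classified as a version of the target software."""
    x = to_vector(sample, vocabulary)
    return bias + sum(w * xi for w, xi in zip(weights, x)) > 0


# Hypothetical toy data: three "versions" of the target software that grow
# incrementally, plus two unrelated files used as negative examples.
v1 = {"CreateFileW", "ReadFile", "WriteFile", "CloseHandle"}
v2 = v1 | {"GetTickCount64"}
v3 = v2 | {"CreateFile2"}
other1 = {"socket", "connect", "send", "recv"}
other2 = {"RegOpenKeyExW", "RegCloseKey", "CloseHandle"}

training = [v1, v2, v3, other1, other2]
labels = [1, 1, 1, 0, 0]

vocabulary = build_vocabulary(training)
weights, bias = train_perceptron(
    [to_vector(s, vocabulary) for s in training], labels
)

# A later version adds a name the model has never seen; an unrelated file
# shares nothing with the target software.
new_version = v3 | {"GetSystemTimePreciseAsFileTime"}
unrelated = {"socket", "recv", "closesocket"}

print(predict(new_version, vocabulary, weights, bias))
print(predict(unrelated, vocabulary, weights, bias))
```

Names outside the training vocabulary are simply ignored by `to_vector`, which is one simple way to let a new version with a few added functions still match the model learned from earlier versions.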