Version-Wide Software Birthmark via Machine Learning

Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass software validation of operating systems...

Bibliographic Details
Main Authors:	Chih-Ko Chung, Pi-Chung Wang
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest
Online Access:	https://ieeexplore.ieee.org/document/9509024/

id	doaj-09d5318f22b3417eacea6fad26baa4b7
record_format	Article
spelling	doaj-09d5318f22b3417eacea6fad26baa4b7; 2021-08-12T23:00:25Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 110811; 110825; 10.1109/ACCESS.2021.3103186; 9509024; Version-Wide Software Birthmark via Machine Learning; Chih-Ko Chung (0) https://orcid.org/0000-0001-6392-9560; Pi-Chung Wang (1) https://orcid.org/0000-0002-4220-2853; (0) Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan; (1) Department of Computer Science and Engineering, National Chung Hsing University, Taichung, Taiwan; https://ieeexplore.ieee.org/document/9509024/; Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chih-Ko Chung Pi-Chung Wang
author_facet	Chih-Ko Chung Pi-Chung Wang
author_sort	Chih-Ko Chung
title	Version-Wide Software Birthmark via Machine Learning
title_sort	version-wide software birthmark via machine learning
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2021-01-01
description	Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware could pass the software validation of operating systems and security software by using counterfeit code-signing certificates. Although counterfeit certificates can be revoked by CAs, previous research showed that the revocation delay can be as long as 5.6 months. In this paper, we attempt to identify the credibility of software with multiple-version executable files without relying on public key infrastructure (PKI), where a new-version executable file is usually developed incrementally based on the previous versions. The features shared among different versions can be extracted for identifying the software. Accordingly, we present a software-birthmark scheme to serve our purpose. Our scheme generates a cross-version software birthmark for executable files of the same software. The proposed software birthmark is a binary-classification model of a machine learning algorithm based on imported and exported function names extracted from different-version executable files. To evaluate the performance of version-wide software birthmarks, our experiments include 138 versions of Windows kernel32.dll and 545 versions of firefox.exe. We also use multiple machine learning algorithms for performance comparisons. The results show that the proposed software birthmark can effectively identify the derivations of these executable files. The proposed software birthmark can be used by operating systems or security software to evaluate the credibility of executable files with suspicious certificates.
topic	Software birthmark; executable file format; Authenticode; digital signature; machine learning; content digest
url	https://ieeexplore.ieee.org/document/9509024/
work_keys_str_mv	AT chihkochung versionwidesoftwarebirthmarkviamachinelearning AT pichungwang versionwidesoftwarebirthmarkviamachinelearning
_version_	1721209150670635008

Similar Items