Dynamic analyses of malware

This thesis examines machine learning techniques for detecting malware using dynamic runtime opcodes. Previous work in the field has faltered on inadequately sized and poorly sampled datasets. A novel run-trace dataset is presented, the largest in the literature to date. Using this dataset, malware...

Bibliographic Details
Main Author:	Carlin, Domhnall
Other Authors:	Sezer, Sakir ; O'Kane, Philip
Published:	Queen's University Belfast 2018
Online Access:	https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.766287

id	ndltd-bl.uk-oai-ethos.bl.uk-766287
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-766287 2019-02-27T03:27:32Z Dynamic analyses of malware Carlin, Domhnall Sezer, Sakir ; O'Kane, Philip 2018 Queen's University Belfast https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.766287 Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
description	This thesis examines machine learning techniques for detecting malware using dynamic runtime opcodes. Previous work in the field has faltered on inadequately sized and poorly sampled datasets. First, a novel run-trace dataset is presented, the largest in the literature to date. Using this dataset, malware detection using opcode analysis is shown to be not only feasible, but highly accurate at short run-lengths and without computationally expensive sequencing analysis. Second, unsupervised learning is used to investigate the effects of anti-virus (AV) labels on detection rates. AV labels offer an English-language description of the malware type, but an assembly-language description is found to be more beneficial in malware triaging. Third, the machine learning techniques are applied to ransomware run-traces, an application that has not been explored in the literature to date. This offers four further novel contributions: an examination of dynamic API calls versus opcode traces in ransomware; the run-lengths necessary to detect ransomware accurately; a logical feature reduction algorithm to minimise computational expense in machine learning; and the first model in the literature that can differentiate between benign encryption (zipping) and malicious encryption. Fourth, the computational costs of 23 machine learning algorithms are investigated with respect to the run-trace dataset. In the literature, researchers discuss the explosion of malware, yet opcode analyses have used fixed-size datasets, with no regard to how such models will cope with retraining on escalating datasets. The cost of retraining and testing updatable and non-updatable classifiers, both parallelised and non-parallelised, is examined with simulated escalating datasets. Lastly, a model is proposed and examined to mitigate the disadvantages of the most successful classifiers for future work.
author2	Sezer, Sakir ; O'Kane, Philip
author_facet	Sezer, Sakir ; O'Kane, Philip ; Carlin, Domhnall
author	Carlin, Domhnall
spellingShingle	Carlin, Domhnall Dynamic analyses of malware
author_sort	Carlin, Domhnall
title	Dynamic analyses of malware
title_short	Dynamic analyses of malware
title_full	Dynamic analyses of malware
title_fullStr	Dynamic analyses of malware
title_full_unstemmed	Dynamic analyses of malware
title_sort	dynamic analyses of malware
publisher	Queen's University Belfast
publishDate	2018
url	https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.766287
work_keys_str_mv	AT carlindomhnall dynamicanalysesofmalware
_version_	1718984348291563520

Similar Items