An improved bytecode similarity measurement for malicious JavaScript code detection

Bibliographic Details
Main Authors:	Jun-Xu Yang, 楊鈞旭
Other Authors:	Wei-Chung Teng
Format:	Others
Language:	zh-TW
Published:	2019
Online Access:	http://ndltd.ncl.edu.tw/handle/dyhtkn

id	ndltd-TW-107NTUS5392059
record_format	oai_dc
spelling	ndltd-TW-107NTUS5392059 2019-10-24T05:20:28Z http://ndltd.ncl.edu.tw/handle/dyhtkn An improved bytecode similarity measurement for malicious JavaScript code detection 一個改良位元組碼相似度計算之JavaScript惡意程式偵測方法 (A JavaScript malicious code detection method with improved bytecode similarity computation) Jun-Xu Yang 楊鈞旭 Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 107 Wei-Chung Teng 鄧惟中 2019 degree thesis ; thesis 31 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 107 === Detection of malicious JavaScript code falls into two lines of work: dynamic approaches and static approaches. Dynamic approaches are mostly based on low-interaction and high-interaction honey clients. Static approaches mainly use machine learning to capture the characteristics of malicious scripts and then detect malicious code from those characteristics. Malware classification is subject to concept drift: the nature of malware changes over time. Because malware often intentionally breaks format specifications or attempts undefined behavior, features extracted from domain knowledge of malware properties must be updated as malware changes. This adds feature-extraction overhead, and that overhead is compounded by the changing nature of malware. Minimizing reliance on domain knowledge is therefore the most important consideration in feature extraction. The main premise of this research is to detect malicious code by measuring the similarity of bytecode between different objects, because this approach requires less domain knowledge. In earlier work, Ming Li et al. [1] proposed the Normalized Compression Distance (NCD), a measure of the similarity between any two objects. Many studies [2] [3] [4] [5] have used NCD to compare raw byte contents or API call sequences in order to detect malware. More recently, Edward Raff et al. [6] [7] proposed the Lempel-Ziv Jaccard Distance (LZJD), a measure intended to perform better than NCD on large sequences. This research mainly uses LZJD, proposed by Edward Raff et al. [6] [7], to improve the detection rate of malicious code. Experiments show that the architecture and algorithm proposed in this research achieve a lower false positive rate (0.43%) and a lower false negative rate (6.89%) than previous studies.
This also indicates that the bytecode preprocessing used in this research is more effective than that of previous studies.
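The two distance measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes zlib as the compressor for NCD, treats bytecode as an opaque `bytes` value, and computes the exact Jaccard distance between Lempel-Ziv phrase sets rather than the min-hash approximation used in the LZJD paper. The `nearest_label` helper is a hypothetical 1-nearest-neighbour rule, added only to show how a similarity measure could drive detection; the abstract does not say which classifier the thesis uses.

```python
from __future__ import annotations

import zlib


def ncd(x: bytes, y: bytes, level: int = 9) -> float:
    """Normalized Compression Distance, assuming zlib as the compressor C:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x, level))
    cy = len(zlib.compress(y, level))
    cxy = len(zlib.compress(x + y, level))
    # zlib output is never empty, so the denominator is always positive.
    return (cxy - min(cx, cy)) / max(cx, cy)


def lz_set(data: bytes) -> set[bytes]:
    """Split data into Lempel-Ziv phrases: starting at the current position,
    grow a substring until it is not yet in the set, add it, then restart
    right after it."""
    phrases: set[bytes] = set()
    start = 0
    for end in range(1, len(data) + 1):
        phrase = data[start:end]
        if phrase not in phrases:
            phrases.add(phrase)
            start = end
    return phrases


def lzjd(x: bytes, y: bytes) -> float:
    """Exact Jaccard distance between the LZ phrase sets of x and y
    (the published LZJD approximates this with min-hashing for speed)."""
    a, b = lz_set(x), lz_set(y)
    union = a | b
    if not union:
        return 0.0  # two empty inputs are treated as identical
    return 1.0 - len(a & b) / len(union)


def nearest_label(sample: bytes, references: list[tuple[bytes, str]]) -> str:
    """Hypothetical 1-nearest-neighbour rule: return the label of the
    reference sample with the smallest LZJD to `sample`."""
    return min(references, key=lambda ref: lzjd(sample, ref[0]))[1]


if __name__ == "__main__":
    # Toy byte strings standing in for bytecode; these are not real samples.
    a = b"var x = 1; var y = 2; " * 20
    b = b"var x = 1; var y = 3; " * 20
    c = bytes(range(256)) * 2
    print(f"NCD(a, b) = {ncd(a, b):.3f}, NCD(a, c) = {ncd(a, c):.3f}")
    print(f"LZJD(a, b) = {lzjd(a, b):.3f}, LZJD(a, c) = {lzjd(a, c):.3f}")
    print(nearest_label(b, [(a, "benign"), (c, "malicious")]))
```

In this toy run, the two near-identical strings `a` and `b` should come out closer to each other than either is to `c` under both measures. LZJD only needs one pass to build each phrase set, and set intersection is cheap, which is the property the abstract alludes to when it says LZJD suits larger sequences than NCD, where NCD must recompress the concatenation for every pair.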
author2	Wei-Chung Teng
author	Jun-Xu Yang 楊鈞旭
title	An improved bytecode similarity measurement for malicious JavaScript code detection
publishDate	2019
url	http://ndltd.ncl.edu.tw/handle/dyhtkn

Similar Items