An improved bytecode similarity measurement for malicious JavaScript code detection

Bibliographic Details
Main Authors: Jun-Xu Yang, 楊鈞旭
Other Authors: Wei-Chung Teng
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/dyhtkn
id ndltd-TW-107NTUS5392059
record_format oai_dc
spelling ndltd-TW-107NTUS5392059 2019-10-24T05:20:28Z http://ndltd.ncl.edu.tw/handle/dyhtkn An improved bytecode similarity measurement for malicious JavaScript code detection 一個改良位元組碼相似度計算之JavaScript惡意程式偵測方法 Jun-Xu Yang 楊鈞旭 Master's thesis, National Taiwan University of Science and Technology (國立臺灣科技大學), Department of Computer Science and Information Engineering (資訊工程系), academic year 107 Wei-Chung Teng 鄧惟中 2019 thesis 31 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis, National Taiwan University of Science and Technology (國立臺灣科技大學), Department of Computer Science and Information Engineering (資訊工程系), academic year 107. Approaches to detecting malicious JavaScript code fall into two categories: dynamic and static. Dynamic approaches mostly rely on low-interaction or high-interaction honey clients. Static approaches mainly apply machine learning techniques to capture the characteristics of malicious scripts and detect malicious code from those characteristics. Malware classification is subject to concept drift, meaning that the nature of malware changes over time. Because malware often intentionally violates format specifications or attempts undefined behavior, feature extraction that relies on domain knowledge of malware properties must be updated as malware changes. This adds feature-extraction overhead, which is compounded by the changing nature of malware. Minimizing domain knowledge is therefore the most important consideration in feature extraction. The main premise of this research is to detect malicious code by measuring the bytecode similarity between objects, because this approach requires less domain knowledge. In previous research, Ming Li et al. [1] proposed the Normalized Compression Distance (NCD), a valid measure of the similarity of any two objects. Many studies [2] [3] [4] [5] have used NCD to detect malware by comparing raw byte contents or API call sequences. More recently, Edward Raff et al. [6] [7] proposed the Lempel-Ziv Jaccard Distance (LZJD), a measure that performs better than NCD on larger sequences. This research mainly uses the LZJD proposed by Edward Raff et al. [6] [7] to improve the detection rate of malicious code. The experiments show that the architecture and algorithm proposed in this research achieve a low false positive rate (0.43%) and a low false negative rate (6.89%) compared with previous research. This also indicates that the bytecode preprocessing used here is better than that of previous research.
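The two distance measures named in the description can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes zlib as the compressor for NCD, computes an exact Jaccard distance over a Lempel-Ziv phrase set for LZJD (any hashing or sampling step used for speed in practice is omitted), and the function names `ncd`, `lz_set`, and `lzjd` are hypothetical. The bytecode preprocessing described in the thesis is not reproduced here.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib as a stand-in compressor (assumption)."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    # (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def lz_set(data: bytes) -> set:
    """Collect the distinct phrases produced by a Lempel-Ziv style scan of data."""
    phrases = set()
    start, end = 0, 1
    while end <= len(data):
        chunk = data[start:end]
        if chunk not in phrases:
            phrases.add(chunk)   # new phrase: record it ...
            start = end          # ... and begin the next phrase right after it
        end += 1                 # otherwise extend the current phrase by one byte
    return phrases


def lzjd(x: bytes, y: bytes) -> float:
    """Simplified LZJD: exact Jaccard distance between the two phrase sets."""
    sx, sy = lz_set(x), lz_set(y)
    union = sx | sy
    if not union:
        return 0.0               # two empty inputs are treated as identical
    return 1.0 - len(sx & sy) / len(union)


if __name__ == "__main__":
    # Illustrative byte strings only; real inputs would be compiled bytecode.
    a = b"var a = 1;" * 50
    b = bytes(range(256))
    print(ncd(a, a), ncd(a, b))
    print(lzjd(a, a), lzjd(a, b))
```

Both functions return values near 0 for similar inputs and near 1 for dissimilar ones, so a sample could be labeled by comparing it against known malicious and benign samples, for example with a nearest-neighbor rule. That classification step is an assumption of this sketch and is not taken from the record.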
author2 Wei-Chung Teng
author_facet Wei-Chung Teng
Jun-Xu Yang
楊鈞旭
author Jun-Xu Yang
楊鈞旭
title An improved bytecode similarity measurement for malicious JavaScript code detection
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/dyhtkn