An improved bytecode similarity measurement for malicious JavaScript code detection
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 107 === Approaches to detecting malicious JavaScript code fall into two categories: dynamic and static. Dynamic approaches are mostly based on low-interaction and high-interaction honey clients. Static approaches mainly adopt machine lea...
Main Authors: | Jun-Xu Yang, 楊鈞旭 |
---|---|
Other Authors: | Wei-Chung Teng |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/dyhtkn |
id |
ndltd-TW-107NTUS5392059 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NTUS5392059 2019-10-24T05:20:28Z http://ndltd.ncl.edu.tw/handle/dyhtkn An improved bytecode similarity measurement for malicious JavaScript code detection 一個改良位元組碼相似度計算之JavaScript惡意程式偵測方法 Jun-Xu Yang 楊鈞旭 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 107 Wei-Chung Teng 鄧惟中 2019 thesis 31 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 107 === Approaches to detecting malicious JavaScript code fall into two categories: dynamic and static. Dynamic approaches are mostly based on low-interaction and high-interaction honey clients. Static approaches mainly use machine learning to capture the characteristics of malicious scripts and detect malicious code from those characteristics. Malware classification is subject to concept drift: the nature of malware changes over time. Because malware often intentionally breaks format specifications or attempts undefined behavior, feature extraction that relies on domain knowledge of malware properties must be updated as malware changes. This adds overhead to feature extraction, and the overhead is compounded by the changing nature of malware. Minimizing the domain knowledge required is therefore a key goal in feature extraction. The main premise of this research is to detect malicious code by measuring the bytecode similarity between different objects, because this requires less domain knowledge.
In previous research, Ming Li et al. [1] proposed the Normalized Compression Distance (NCD), a measure of the similarity between any two objects. Many studies [2] [3] [4] [5] have used NCD to compare raw byte contents or API call sequences in order to detect malware. More recently, Edward Raff et al. [6] [7] proposed the Lempel-Ziv Jaccard Distance (LZJD), a measure that performs better than NCD on longer sequences. This research mainly uses LZJD [6] [7] to improve the detection rate of malicious code.
The experiments show that the architecture and algorithm proposed in this research achieve a low false positive rate (0.43%) and a low false negative rate (6.89%) compared with previous research. This also indicates that the proposed bytecode preprocessing is more effective than that of previous research.
|
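The two distances named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`ncd`, `lz_set`, `lzjd`) are hypothetical, `zlib` is an assumed choice of compressor for NCD, and the LZJD shown here computes the exact Jaccard distance over Lempel-Ziv substring sets without any hashing or sampling speed-ups. The bytecode preprocessing that the thesis describes is not reproduced.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length. zlib is an assumed compressor.
    """
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def lz_set(data: bytes) -> set:
    """Collect the substrings produced by a Lempel-Ziv style scan.

    The window grows until it forms a substring not seen before;
    that substring is stored and the window restarts after it.
    """
    seen = set()
    start = 0
    end = 1
    while end <= len(data):
        chunk = data[start:end]
        if chunk not in seen:
            seen.add(chunk)
            start = end
        end += 1
    return seen


def lzjd(x: bytes, y: bytes) -> float:
    """Lempel-Ziv Jaccard Distance: 1 - |A & B| / |A | B| over LZ sets."""
    a, b = lz_set(x), lz_set(y)
    union = a | b
    if not union:
        return 0.0  # two empty inputs are treated as identical
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical byte strings standing in for extracted bytecode.
    sample_a = b"function f(){return 1;}" * 50
    sample_b = bytes(range(256)) * 5
    print("NCD  same:", ncd(sample_a, sample_a))
    print("NCD  diff:", ncd(sample_a, sample_b))
    print("LZJD same:", lzjd(sample_a, sample_a))
    print("LZJD diff:", lzjd(sample_a, sample_b))
```

Both functions return a value near 0 for similar inputs and near 1 for dissimilar ones, so either can feed a nearest-neighbour style classifier. NCD has to compress the concatenation of every pair it compares, whereas the LZ set of each object can be built once and reused across comparisons.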
author2 |
Wei-Chung Teng |
author_facet |
Wei-Chung Teng Jun-Xu Yang 楊鈞旭 |
author |
Jun-Xu Yang 楊鈞旭 |
spellingShingle |
Jun-Xu Yang 楊鈞旭 An improved bytecode similarity measurement for malicious JavaScript code detection |
author_sort |
Jun-Xu Yang |
title |
An improved bytecode similarity measurement for malicious JavaScript code detection |
title_short |
An improved bytecode similarity measurement for malicious JavaScript code detection |
title_full |
An improved bytecode similarity measurement for malicious JavaScript code detection |
title_fullStr |
An improved bytecode similarity measurement for malicious JavaScript code detection |
title_full_unstemmed |
An improved bytecode similarity measurement for malicious JavaScript code detection |
title_sort |
improved bytecode similarity measurement for malicious javascript code detection |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/dyhtkn |
work_keys_str_mv |
AT junxuyang animprovedbytecodesimilaritymeasurementformaliciousjavascriptcodedetection AT yángjūnxù animprovedbytecodesimilaritymeasurementformaliciousjavascriptcodedetection AT junxuyang yīgègǎiliángwèiyuánzǔmǎxiāngshìdùjìsuànzhījavascriptèyìchéngshìzhēncèfāngfǎ AT yángjūnxù yīgègǎiliángwèiyuánzǔmǎxiāngshìdùjìsuànzhījavascriptèyìchéngshìzhēncèfāngfǎ AT junxuyang improvedbytecodesimilaritymeasurementformaliciousjavascriptcodedetection AT yángjūnxù improvedbytecodesimilaritymeasurementformaliciousjavascriptcodedetection |
_version_ |
1719277067061690368 |