Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
Master's (碩士) === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Electrical Engineering (電機工程系) === 106 === People have become increasingly dependent on information technology since the emergence of the Internet. Consequently, hackers engage in financial crimes and computer attacks through the Internet. Nowadays, cyber-attacks may involve Trojans, botnets, social engi...
Main Authors: | Yu-Hung Chen, 陳昱宏 |
---|---|
Other Authors: | Jiann-Liang Chen, 陳俊良 |
Format: | Others |
Language: | en_US |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/6w29rj |
id |
ndltd-TW-106NTUS5442063 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106NTUS5442063 2019-11-28T05:22:05Z http://ndltd.ncl.edu.tw/handle/6w29rj Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack 智慧化之混合學習架構於網路釣魚研究 Yu-Hung Chen 陳昱宏 碩士 國立臺灣科技大學 電機工程系 106 Jiann-Liang Chen 陳俊良 2018 學位論文 ; thesis 157 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's (碩士) === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Electrical Engineering (電機工程系) === 106 === People have become increasingly dependent on information technology since the emergence of the Internet. Consequently, hackers engage in financial crimes and computer attacks through the Internet. Nowadays, cyber-attacks may involve Trojans, botnets, social engineering, spam, and other means. Phishing websites normally have short lifetimes and involve a more complex form of attack. They have therefore attracted increasing attention in the area of information security. The prevention of phishing has various elements. In recent years, related research has focused on the prevention of fraudulent phishing. However, current mechanisms and processes for identifying phishing websites rely too heavily on manual identification, which is inefficient for real-time collection of phishing information.
This study proposes a hybrid learning architecture that combines several established machine learning and deep learning mechanisms, including a suspicious-website learning model, feature analysis, and hazard assessment. Phishing websites typically imitate legitimate websites, so the number of phishing websites is much larger than the number of legitimate websites, which gives rise to data imbalance between the two classes during data collection. Data imbalance also biases the model toward the category with more data. To solve this problem of model migration, this study uses Generative Adversarial Networks (GANs) to generate new instances that supplement the category with less data. Furthermore, to help the model converge and to make it more efficient, an Autoencoder architecture that reduces feature dimensionality and maps the data is added. Before the above-mentioned model training is carried out, features are evaluated using mechanisms that include ANOVA, χ² (chi-squared), and Information Gain, so as to filter out unrelated or noisy features as far as possible and improve the stability and reliability of the model.
To achieve real-time detection of phishing websites with high stability, the hybrid learning architecture ultimately generates two phishing detection models that cooperate with each other. Since the convergence time of the machine learning model is much shorter than that of the deep learning model, the XGBoost detection model (machine learning model) is used initially. After training of the deep learning model is complete, the XGBoost model is replaced by the CNN model (deep learning model), which then continues to update itself. Experimental results indicate that the accuracy of the XGBoost model can reach 99.67% and that the CNN model in this investigation has an accuracy of 99.83%.
|
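The hand-off between the two detection models described in the abstract can be pictured with a minimal sketch. This is not the thesis implementation: the class name `HybridDetector`, its methods, and the two threshold-rule stand-ins are hypothetical, and real XGBoost and CNN models are replaced here by plain Python callables so the example stays self-contained.

```python
from typing import Callable, Optional, Sequence

# A detector maps a feature vector to True (phishing) or False (legitimate).
Detector = Callable[[Sequence[float]], bool]


class HybridDetector:
    """Serve predictions from a fast model until a slower model is ready.

    Hypothetical illustration: the fast model plays the role of the
    quickly converging machine learning model, the slow model the role
    of the deep learning model that finishes training later.
    """

    def __init__(self, fast_model: Detector) -> None:
        self._fast = fast_model
        self._slow: Optional[Detector] = None  # not trained yet

    def promote(self, slow_model: Detector) -> None:
        """Install the slow model once its training has finished."""
        self._slow = slow_model

    @property
    def active(self) -> str:
        """Name of the model that currently answers requests."""
        return "slow" if self._slow is not None else "fast"

    def predict(self, features: Sequence[float]) -> bool:
        model = self._slow if self._slow is not None else self._fast
        return model(features)


# Hypothetical stand-ins for trained models (simple threshold rules).
def fast_stub(x: Sequence[float]) -> bool:
    return x[0] > 0.5


def slow_stub(x: Sequence[float]) -> bool:
    return x[0] + x[1] > 1.0


detector = HybridDetector(fast_stub)
before = detector.predict([0.6, 0.1])  # answered by the fast stand-in
detector.promote(slow_stub)
after = detector.predict([0.6, 0.1])   # answered by the slow stand-in
```

Keeping the switch behind a single `predict` method means callers never need to know which model is active, which is one plausible way to keep detection available while the slower model is still training.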
author2 |
Jiann-Liang Chen |
author_facet |
Jiann-Liang Chen Yu-Hung Chen 陳昱宏 |
author |
Yu-Hung Chen 陳昱宏 |
spellingShingle |
Yu-Hung Chen 陳昱宏 Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
author_sort |
Yu-Hung Chen |
title |
Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
title_short |
Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
title_full |
Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
title_fullStr |
Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
title_full_unstemmed |
Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack |
title_sort |
intelligent hybrid learning architecture for cyber-phishing attack |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/6w29rj |
work_keys_str_mv |
AT yuhungchen intelligenthybridlearningarchitectureforcyberphishingattack AT chényùhóng intelligenthybridlearningarchitectureforcyberphishingattack AT yuhungchen zhìhuìhuàzhīhùnhéxuéxíjiàgòuyúwǎnglùdiàoyúyánjiū AT chényùhóng zhìhuìhuàzhīhùnhéxuéxíjiàgòuyúwǎnglùdiàoyúyánjiū |
_version_ |
1719297038959509504 |
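The feature evaluation step named in the abstract (ANOVA, χ², Information Gain) can be illustrated with a short sketch of two of those scores. It assumes binary features and binary labels, uses toy data, and omits ANOVA for brevity; the feature names `has_ip_in_url` and `random_flag` are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """H(labels) minus the weighted entropy of labels per feature value."""
    n = len(labels)
    groups: Dict[int, List[int]] = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_squared(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


# Toy data: 1 = phishing, 0 = legitimate (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_ip_in_url": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label exactly
    "random_flag": [1, 0, 1, 0, 1, 0, 1, 0],    # unrelated to the label
}

ig_scores = {name: information_gain(col, labels) for name, col in features.items()}
chi_scores = {name: chi_squared(col, labels) for name, col in features.items()}

# Rank features from most to least informative.
ranked = sorted(ig_scores, key=ig_scores.get, reverse=True)
```

On this toy data the feature that tracks the label scores highest on both measures, while the unrelated feature scores zero, which is the kind of signal a filter step could use to drop uninformative features before training.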