Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack

Master's thesis === National Taiwan University of Science and Technology === Department of Electrical Engineering === 106 === People have become increasingly dependent on information technology since the emergence of the Internet. Consequently, hackers engage in financial crimes and computer attacks through the Internet. Nowadays, cyber-attacks may involve Trojans, botnets, social engi...

Bibliographic Details
Main Authors: Yu-Hung Chen, 陳昱宏
Other Authors: Jiann-Liang Chen
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/6w29rj
id ndltd-TW-106NTUS5442063
record_format oai_dc
spelling ndltd-TW-106NTUS5442063 2019-11-28T05:22:05Z http://ndltd.ncl.edu.tw/handle/6w29rj Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack 智慧化之混合學習架構於網路釣魚研究 Yu-Hung Chen 陳昱宏 Master's thesis, National Taiwan University of Science and Technology, Department of Electrical Engineering, 106 Jiann-Liang Chen 陳俊良 2018 學位論文 ; thesis 157 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Taiwan University of Science and Technology === Department of Electrical Engineering === 106 === People have become increasingly dependent on information technology since the emergence of the Internet, and hackers have correspondingly turned to the Internet to commit financial crimes and attack computers. Today, cyber-attacks may involve Trojans, botnets, social engineering, spam, and other means. Phishing websites normally have short lifetimes and represent a more complex form of attack, so they have attracted increasing attention in information security. Phishing prevention has several elements, and recent research has focused on preventing phishing fraud. However, current identification mechanisms and processes rely too heavily on manual identification, which is inefficient for collecting phishing information in real time. This study proposes a hybrid learning architecture that combines several Machine Learning and Deep Learning mechanisms, including a suspicious-website learning model, feature analysis, and hazard assessment. Phishing websites typically imitate legitimate websites, so the number of phishing websites is much larger than the number of legitimate websites, which gives rise to data imbalance between the two classes during data collection. Data imbalance in turn biases the model toward the category with more data. To solve this problem of model migration, this study uses Generative Adversarial Networks (GANs) to generate new instances that supplement the category with less data. Furthermore, to make the model converge better and run more efficiently, an Autoencoder architecture is added that reduces feature dimensionality and maps the data.
Before the above-mentioned model training is carried out, features are evaluated using ANOVA, the chi-square (χ²) test, and Information Gain, so as to filter out as many irrelevant or noisy features as possible and improve the stability and reliability of the model. To achieve real-time detection of phishing websites with high stability, the hybrid learning architecture ultimately produces two phishing detection models that cooperate with each other. Because a machine learning model converges much faster than a deep learning model, the XGBoost detection model (machine learning model) is used initially. Once training of the deep learning model is complete, the XGBoost model is replaced by the CNN model (deep learning model), which is updated thereafter. Experimental results indicate that the XGBoost model can reach an accuracy of 99.67% and that the CNN model in this investigation has an accuracy of 99.83%.
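The feature-evaluation step named in the abstract can be illustrated with one of its three criteria, Information Gain. The following is only a minimal sketch, not the thesis's implementation: it assumes discrete feature values, and the feature names (`has_ip_in_url`, `uses_https`) and the toy data are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data (not from the thesis): 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_ip_in_url": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "uses_https": [1, 0, 1, 0, 1, 0, 1, 0],     # statistically independent of the label
}

# Score every feature and rank from most to least informative.
scores = {name: information_gain(values, labels) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A feature whose score is close to zero carries little information about the class and would be a candidate for filtering. Continuous features would need to be binned first, which this sketch does not cover.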
author2 Jiann-Liang Chen
author_facet Jiann-Liang Chen
Yu-Hung Chen
陳昱宏
author Yu-Hung Chen
陳昱宏
title Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/6w29rj