Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack

Master's === National Taiwan University of Science and Technology === Department of Electrical Engineering === 106 === People have become increasingly dependent on information technology since the emergence of the Internet. Consequently, hackers engage in financial crimes and computer attacks through the Internet. Nowadays, cyber-attacks may involve Trojans, botnets, social engi...

Bibliographic Details
Main Authors:	Yu-Hung Chen, 陳昱宏
Other Authors:	Jiann-Liang Chen
Format:	Others
Language:	en_US
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/6w29rj

id	ndltd-TW-106NTUS5442063
record_format	oai_dc
spelling	ndltd-TW-106NTUS5442063 2019-11-28T05:22:05Z http://ndltd.ncl.edu.tw/handle/6w29rj Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack 智慧化之混合學習架構於網路釣魚研究 Yu-Hung Chen 陳昱宏 Master's National Taiwan University of Science and Technology Department of Electrical Engineering 106 Jiann-Liang Chen 陳俊良 2018 學位論文 ; thesis 157 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Electrical Engineering === 106 === People have become increasingly dependent on information technology since the emergence of the Internet. Consequently, hackers commit financial crimes and computer attacks over the Internet. Today, cyber-attacks may involve Trojans, botnets, social engineering, spam, and other means. Phishing websites normally have short lifetimes and involve a more complex form of attack, so they have attracted increasing attention in information security. Phishing prevention has various elements, and recent research has focused on preventing fraudulent phishing. However, current identification mechanisms and processes rely too heavily on manual identification, which is inefficient for collecting phishing information in real time. This study proposes a hybrid learning architecture that combines several established Machine Learning and Deep Learning mechanisms, including a suspicious-website learning model, feature analysis, and hazard assessment. Phishing websites typically imitate legitimate websites, so the number of phishing websites is much larger than the number of legitimate websites, which gives rise to data imbalance between the two classes during data collection. Data imbalance in turn biases the model toward the category with more data. To solve this problem of model migration, this study uses Generative Adversarial Networks (GANs) to generate new instances that supplement the category with less data. Furthermore, to make the model converge better and run more efficiently, an Autoencoder architecture that reduces feature dimensionality and maps the data is added.
Before the model training described above, features are evaluated with ANOVA, chi-squared (χ²), and Information Gain to filter out irrelevant or noisy features as far as possible, which improves the stability and reliability of the model. To achieve real-time detection of phishing websites with high stability, the hybrid learning architecture ultimately generates two phishing detection models that cooperate with each other. Because the machine learning model converges much faster than the deep learning model, the XGBoost detection model (machine learning model) is used initially. Once training of the deep learning model is complete, the CNN model (deep learning model) replaces the XGBoost model and updates itself thereafter. Experimental results indicate that the XGBoost model reaches an accuracy of 99.67% and the CNN model in this investigation reaches an accuracy of 99.83%.
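The feature-evaluation step named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes binary features and binary labels, covers only the chi-squared and Information Gain scores (ANOVA, which suits continuous features, is left out), and uses invented feature names (`has_ip_in_url`, `uses_https`), toy data, and a hypothetical `min_gain` threshold.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for x, nx in feature_counts.items():
        for y, ny in label_counts.items():
            expected = nx * ny / total  # count expected if feature and label were independent
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def select_features(columns, labels, min_gain=0.05):
    """Return names of features whose information gain is at least min_gain.

    min_gain is a hypothetical threshold chosen for this illustration.
    """
    return [name for name, col in columns.items()
            if information_gain(col, labels) >= min_gain]


# Toy data, invented for illustration: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_ip_in_url": [1, 1, 1, 0, 0, 0, 0, 0],  # correlates with the label
    "uses_https":    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

for name, col in columns.items():
    print(name, round(information_gain(col, labels), 4), round(chi_squared(col, labels), 4))
print(select_features(columns, labels))
```

In this toy set the feature that is statistically independent of the label scores zero on both measures and is filtered out, which is the kind of "unrelated feature" the abstract describes removing before training.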
author2	Jiann-Liang Chen
author_facet	Jiann-Liang Chen Yu-Hung Chen 陳昱宏
author	Yu-Hung Chen 陳昱宏
spellingShingle	Yu-Hung Chen 陳昱宏 Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
author_sort	Yu-Hung Chen
title	Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
title_short	Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
title_full	Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
title_fullStr	Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
title_full_unstemmed	Intelligent Hybrid Learning Architecture for Cyber-Phishing Attack
title_sort	intelligent hybrid learning architecture for cyber-phishing attack
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/6w29rj
work_keys_str_mv	AT yuhungchen intelligenthybridlearningarchitectureforcyberphishingattack AT chényùhóng intelligenthybridlearningarchitectureforcyberphishingattack AT yuhungchen zhìhuìhuàzhīhùnhéxuéxíjiàgòuyúwǎnglùdiàoyúyánjiū AT chényùhóng zhìhuìhuàzhīhùnhéxuéxíjiàgòuyúwǎnglùdiàoyúyánjiū
_version_	1719297038959509504
