Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks

Bibliographic Details
Main Authors:	Bo Sun, Tao Ban, Chansu Han, Takeshi Takahashi, Katsunari Yoshioka, Jun'ichi Takeuchi, Abdolhossein Sarrafzadeh, Meikang Qiu, Daisuke Inoue
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Targeted email attack; decoy document; machine learning; natural language processing
Online Access:	https://ieeexplore.ieee.org/document/9435284/

id	doaj-dd70700bf3a14f6ca626cd8f2b617564
volume	9
pages	87962–87971
doi	10.1109/ACCESS.2021.3082000
ieee_xplore_id	9435284
record_updated	2021-06-22T23:00:21Z
affiliation	Bo Sun (ORCID 0000-0002-7822-3672): Department of Information Systems, Saitama Institute of Technology, Fukaya, Japan
affiliation	Tao Ban (ORCID 0000-0002-9616-3212): National Institute of Information and Communications Technology, Koganei, Japan
affiliation	Chansu Han (ORCID 0000-0002-1728-5300): National Institute of Information and Communications Technology, Koganei, Japan
affiliation	Takeshi Takahashi (ORCID 0000-0002-6477-7770): National Institute of Information and Communications Technology, Koganei, Japan
affiliation	Katsunari Yoshioka: Graduate School of Environment and Information Sciences, Yokohama National University, Yokohama, Japan
affiliation	Jun'ichi Takeuchi (ORCID 0000-0002-5819-3082): Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
affiliation	Abdolhossein Sarrafzadeh: Center of Excellence in Cybersecurity, North Carolina A&T State University, Greensboro, NC, USA
affiliation	Meikang Qiu: Department of Computer Science and Information Systems, Texas A&M University–Commerce, Commerce, TX, USA
affiliation	Daisuke Inoue: National Institute of Information and Communications Technology, Koganei, Japan
collection	DOAJ
issn	2169-3536
description	Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack relies on a carefully crafted email message to persuade a victim to perform a specific, seemingly innocuous action, such as opening a link or an attachment and downloading and installing a software program. To succeed without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. Analyzing such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method achieves good prediction accuracy: the best result, obtained on a collection of 200 Chinese decoy documents, yielded an accuracy of 97.5%, an F-measure of 97.9%, and a low false positive rate (FPR) of 3.1%. The proposed scheme can be deployed at various access points to strengthen defenses against targeted email attacks on a wide range of targets.
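The abstract reports accuracy, F-measure, and FPR without defining them. The following is a minimal sketch of how these three metrics are conventionally computed from confusion-matrix counts. It assumes a binary setup in which "positive" means a deceptive decoy document, and it assumes that "F-measure" refers to F1 (the β = 1 case). The `Confusion` class and function names are hypothetical, and the counts in the usage section are invented for illustration. They are not the paper's data and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (assumed labeling: positive = deceptive decoy)."""
    tp: int  # deceptive documents correctly flagged
    fp: int  # non-deceptive documents wrongly flagged
    tn: int  # non-deceptive documents correctly passed
    fn: int  # deceptive documents missed


def _ratio(num: int, den: int) -> float:
    """Divide, refusing to report a metric whose denominator is zero."""
    if den == 0:
        raise ValueError("metric undefined: denominator is zero")
    return num / den


def accuracy(c: Confusion) -> float:
    """Fraction of all documents classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def f_measure(c: Confusion) -> float:
    """F1, the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    """Fraction of non-deceptive documents that were wrongly flagged: FP / (FP + TN)."""
    return _ratio(c.fp, c.fp + c.tn)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the paper's results.
    c = Confusion(tp=45, fp=2, tn=48, fn=5)
    print(f"accuracy={accuracy(c):.3f}")          # 93/100
    print(f"F1={f_measure(c):.3f}")               # 90/97
    print(f"FPR={false_positive_rate(c):.3f}")    # 2/50
```

With these illustrative counts the script prints an accuracy of 0.930, an F1 of 0.928, and an FPR of 0.040. FPR depends only on the non-deceptive documents (FP and TN), so it measures how often benign documents would be wrongly flagged, independently of how many decoys are missed.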

