Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks

Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachmen...

Bibliographic Details
Main Authors: Bo Sun, Tao Ban, Chansu Han, Takeshi Takahashi, Katsunari Yoshioka, Jun'ichi Takeuchi, Abdolhossein Sarrafzadeh, Meikang Qiu, Daisuke Inoue
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Targeted email attack; decoy document; machine learning; natural language processing
Online Access: https://ieeexplore.ieee.org/document/9435284/
id doaj-dd70700bf3a14f6ca626cd8f2b617564
record_format Article
spelling doaj-dd70700bf3a14f6ca626cd8f2b617564; 2021-06-22T23:00:21Z; eng
IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01
Vol. 9, pp. 87962-87971; DOI 10.1109/ACCESS.2021.3082000; article 9435284
Bo Sun, https://orcid.org/0000-0002-7822-3672, Department of Information Systems, Saitama Institute of Technology, Fukaya, Japan
Tao Ban, https://orcid.org/0000-0002-9616-3212, National Institute of Information and Communications Technology, Koganei, Japan
Chansu Han, https://orcid.org/0000-0002-1728-5300, National Institute of Information and Communications Technology, Koganei, Japan
Takeshi Takahashi, https://orcid.org/0000-0002-6477-7770, National Institute of Information and Communications Technology, Koganei, Japan
Katsunari Yoshioka, Graduate School of Environment and Information Sciences, Yokohama National University, Yokohama, Japan
Jun'ichi Takeuchi, https://orcid.org/0000-0002-5819-3082, Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Abdolhossein Sarrafzadeh, Center of Excellence in Cybersecurity, North Carolina A&T State University, Greensboro, NC, USA
Meikang Qiu, Department of Computer Science and Information Systems, Texas A&M University–Commerce, Commerce, TX, USA
Daisuke Inoue, National Institute of Information and Communications Technology, Koganei, Japan
collection DOAJ
language English
format Article
sources DOAJ
author Bo Sun
Tao Ban
Chansu Han
Takeshi Takahashi
Katsunari Yoshioka
Jun'ichi Takeuchi
Abdolhossein Sarrafzadeh
Meikang Qiu
Daisuke Inoue
author_sort Bo Sun
title Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents), must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.
topic Targeted email attack
decoy document
machine learning
natural language processing
url https://ieeexplore.ieee.org/document/9435284/