Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks
Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachmen...
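Main Authors: | Bo Sun, Tao Ban, Chansu Han, Takeshi Takahashi, Katsunari Yoshioka, Jun'ichi Takeuchi, Abdolhossein Sarrafzadeh, Meikang Qiu, Daisuke Inoue |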
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Subjects: | Targeted email attack; decoy document; machine learning; natural language processing |
Online Access: | https://ieeexplore.ieee.org/document/9435284/ |
id |
doaj-dd70700bf3a14f6ca626cd8f2b617564 |
---|---|
record_format |
Article |
spelling |
doaj-dd70700bf3a14f6ca626cd8f2b617564; 2021-06-22T23:00:21Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 87962; 87971; 10.1109/ACCESS.2021.3082000; 9435284
Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks
Bo Sun [0] https://orcid.org/0000-0002-7822-3672; Tao Ban [1] https://orcid.org/0000-0002-9616-3212; Chansu Han [2] https://orcid.org/0000-0002-1728-5300; Takeshi Takahashi [3] https://orcid.org/0000-0002-6477-7770; Katsunari Yoshioka [4]; Jun'ichi Takeuchi [5] https://orcid.org/0000-0002-5819-3082; Abdolhossein Sarrafzadeh [6]; Meikang Qiu [7]; Daisuke Inoue [8]
[0] Department of Information Systems, Saitama Institute of Technology, Fukaya, Japan
[1] National Institute of Information and Communications Technology, Koganei, Japan
[2] National Institute of Information and Communications Technology, Koganei, Japan
[3] National Institute of Information and Communications Technology, Koganei, Japan
[4] Graduate School of Environment and Information Sciences, Yokohama National University, Yokohama, Japan
[5] Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
[6] Center of Excellence in Cybersecurity, North Carolina A&T State University, Greensboro, NC, USA
[7] Department of Computer Science and Information Systems, Texas A&M University–Commerce, Commerce, TX, USA
[8] National Institute of Information and Communications Technology, Koganei, Japan
Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.
https://ieeexplore.ieee.org/document/9435284/
Targeted email attack; decoy document; machine learning; natural language processing |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Bo Sun; Tao Ban; Chansu Han; Takeshi Takahashi; Katsunari Yoshioka; Jun'ichi Takeuchi; Abdolhossein Sarrafzadeh; Meikang Qiu; Daisuke Inoue |
author_sort |
Bo Sun |
title |
Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets. |
topic |
Targeted email attack; decoy document; machine learning; natural language processing |
url |
https://ieeexplore.ieee.org/document/9435284/ |