Efficient email classification approach based on semantic methods

Email has become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam. Managing and classifying this huge volume of email is an important challenge. Most of the appr...


Bibliographic Details
Main Authors: Eman M. Bahgat, Sherine Rady, Walaa Gad, Ibrahim F. Moawad
Format: Article
Language: English
Published: Elsevier 2018-12-01
Series: Ain Shams Engineering Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2090447918300455
id doaj-0955590ffc4841c9827cbf08f84ef05f
record_format Article
spelling doaj-0955590ffc4841c9827cbf08f84ef05f; 2021-06-02T08:09:24Z; eng; Elsevier; Ain Shams Engineering Journal; ISSN 2090-4479; 2018-12-01; vol. 9, no. 4, pp. 3259-3269; Efficient email classification approach based on semantic methods; Eman M. Bahgat (corresponding author), Sherine Rady, Walaa Gad, Ibrahim F. Moawad; all authors: Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt; Keywords: Email classification, Spam, WordNet ontology, Semantic similarity, Features reduction; http://www.sciencedirect.com/science/article/pii/S2090447918300455
collection DOAJ
language English
format Article
sources DOAJ
author Eman M. Bahgat
Sherine Rady
Walaa Gad
Ibrahim F. Moawad
title Efficient email classification approach based on semantic methods
publisher Elsevier
series Ain Shams Engineering Journal
issn 2090-4479
publishDate 2018-12-01
description Email has become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam. Managing and classifying this huge volume of email is an important challenge. Most approaches introduced to solve this problem handle the high dimensionality of emails by using syntactic feature selection. This paper presents an efficient email filtering approach based on semantic methods. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the huge number of extracted textual features, thereby reducing space and time complexity. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction is integrated using feature selection techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset showed that the proposed semantic filtering approach, combined with feature selection, achieves high computational performance at high space and time reduction rates. A comparative study of several classification algorithms indicated that Logistic Regression achieves the highest accuracy compared to Naïve Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. With the CFS feature selection technique integrated, the average recorded accuracy across all the algorithms used is above 90%, with more than 90% feature reduction. In addition, the experiments showed that the proposed work performs significantly better than related works, with higher accuracy and less time. Keywords: Email classification, Spam, WordNet ontology, Semantic similarity, Features reduction
url http://www.sciencedirect.com/science/article/pii/S2090447918300455