A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in...


Bibliographic Details
Main Authors: Amgad Muneer, Suliman Mohamed Fati
Format: Article
Language: English
Published: MDPI AG 2020-10-01
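Series: Future Internet
Subjects: cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM
Online Access: https://www.mdpi.com/1999-5903/12/11/187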
id doaj-f25a87b4847d46b7b6bf397e48c51a5f
record_format Article
spelling doaj-f25a87b4847d46b7b6bf397e48c51a5f
2020-11-25T03:40:53Z
eng
MDPI AG
Future Internet
1999-5903
2020-10-01
12
187
187
10.3390/fi12110187
A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
Amgad Muneer (0)
Suliman Mohamed Fati (1)
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia
Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
https://www.mdpi.com/1999-5903/12/11/187
cyberbullying detection
tweets classification
Twitter
logistic regression
random forest
light GBM
collection DOAJ
language English
format Article
sources DOAJ
author Amgad Muneer
Suliman Mohamed Fati
spellingShingle Amgad Muneer
Suliman Mohamed Fati
A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
Future Internet
cyberbullying detection
tweets classification
Twitter
logistic regression
random forest
light GBM
author_facet Amgad Muneer
Suliman Mohamed Fati
author_sort Amgad Muneer
title A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_short A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_full A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_fullStr A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_full_unstemmed A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_sort comparative analysis of machine learning techniques for cyberbullying detection on twitter
publisher MDPI AG
series Future Internet
issn 1999-5903
publishDate 2020-10-01
description The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
topic cyberbullying detection
tweets classification
Twitter
logistic regression
random forest
light GBM
url https://www.mdpi.com/1999-5903/12/11/187
work_keys_str_mv AT amgadmuneer acomparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter
AT sulimanmohamedfati acomparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter
AT amgadmuneer comparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter
AT sulimanmohamedfati comparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter
_version_ 1724532412297248768
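
The description field above names four evaluation metrics (accuracy, precision, recall, and F1 score) and reports that different classifiers led on different metrics. As a rough illustration of how these metrics relate, the following minimal sketch computes them from scratch in plain Python. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels (made up, NOT from the paper): 1 = bullying, 0 = not bullying.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# A degenerate classifier that flags every tweet as bullying.
y_pred_all_positive = [1] * len(y_true)

acc = accuracy(y_true, y_pred_all_positive)
prec, rec, f1 = precision_recall_f1(y_true, y_pred_all_positive)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

In this toy case the classifier reaches a recall of 1.0 while its precision and accuracy are only 0.5, which is one reason a perfect recall on its own does not imply the best F1 score.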