A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in...

Bibliographic Details
Main Authors:	Amgad Muneer, Suliman Mohamed Fati
Format:	Article
Language:	English
Published:	MDPI AG 2020-10-01
Series:	Future Internet
Subjects:	cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM
Online Access:	https://www.mdpi.com/1999-5903/12/11/187

id	doaj-f25a87b4847d46b7b6bf397e48c51a5f
record_format	Article
spelling	doaj-f25a87b4847d46b7b6bf397e48c51a5f | 2020-11-25T03:40:53Z | eng | MDPI AG | Future Internet | 1999-5903 | 2020-10-01 | 12 | 187 | 187 | 10.3390/fi12110187 | A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter | Amgad Muneer 0 | Suliman Mohamed Fati 1 | Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia | Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia | The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00). | https://www.mdpi.com/1999-5903/12/11/187 | cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Amgad Muneer, Suliman Mohamed Fati
spellingShingle	Amgad Muneer Suliman Mohamed Fati A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter Future Internet cyberbullying detection tweets classification Twitter logistic regression random forest light GBM
author_facet	Amgad Muneer, Suliman Mohamed Fati
author_sort	Amgad Muneer
title	A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_short	A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_full	A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_fullStr	A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_full_unstemmed	A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
title_sort	comparative analysis of machine learning techniques for cyberbullying detection on twitter
publisher	MDPI AG
series	Future Internet
issn	1999-5903
publishDate	2020-10-01
description	The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
topic	cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM
url	https://www.mdpi.com/1999-5903/12/11/187
work_keys_str_mv	AT amgadmuneer acomparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter AT sulimanmohamedfati acomparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter AT amgadmuneer comparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter AT sulimanmohamedfati comparativeanalysisofmachinelearningtechniquesforcyberbullyingdetectionontwitter
_version_	1724532412297248768

Similar Items