A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter
The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in...
Main Authors: | Amgad Muneer, Suliman Mohamed Fati |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-10-01 |
Series: | Future Internet |
Online Access: | https://www.mdpi.com/1999-5903/12/11/187 |
id | doaj-f25a87b4847d46b7b6bf397e48c51a5f |
---|---|
record_format | Article |
doi | 10.3390/fi12110187 |
volume | 12 |
issue | 11 |
article | 187 |
affiliations | Amgad Muneer: Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia; Suliman Mohamed Fati: Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia |
keywords | cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Amgad Muneer, Suliman Mohamed Fati |
title | A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter |
publisher | MDPI AG |
series | Future Internet |
issn | 1999-5903 |
publishDate | 2020-10-01 |
description | The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we explored this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each classifier was evaluated on the global dataset using accuracy, precision, recall, and F1 score as performance metrics. The experimental results show the superiority of LR, which achieved a median accuracy of approximately 90.57%. Among the classifiers, LR achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00). |
topic | cyberbullying detection; tweets classification; logistic regression; random forest; light GBM |
url | https://www.mdpi.com/1999-5903/12/11/187 |
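The record's description reports accuracy, precision, recall, and F1 score for each classifier but does not define them. As a minimal sketch, assuming binary labels where 1 marks a cyberbullying tweet and 0 a non-bullying tweet (an assumption; the record does not state the label encoding), the four metrics can be computed from confusion-matrix counts as below. The function name `binary_metrics` is hypothetical and is not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 score for binary labels.

    Assumption (not stated in the record): 1 = cyberbullying, 0 = not cyberbullying.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts, treating label 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, a degenerate classifier that labels every tweet as bullying, as in `binary_metrics([1, 1, 0, 0, 1], [1, 1, 1, 1, 1])`, reaches a recall of 1.0 while its precision and accuracy are only 0.6. This is why recall is normally read together with precision and F1 rather than on its own.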