Time of Your Hate: The Challenge of Time in Hate Speech Detection on Social Media

Bibliographic Details
Main Authors:	Komal Florio, Valerio Basile, Marco Polignano, Pierpaolo Basile, Viviana Patti
Format:	Article
Language:	English
Published:	MDPI AG 2020-06-01
Series:	Applied Sciences
Subjects:	hate speech monitoring; diachronic analysis; microblogging data; supervised machine learning
Online Access:	https://www.mdpi.com/2076-3417/10/12/4180
Volume/Issue:	10 (12), article 4180
DOI:	10.3390/app10124180
ISSN:	2076-3417
Affiliations:	Komal Florio, Valerio Basile, Viviana Patti: Department of Computer Science, University of Turin, 10149 Turin, Italy; Marco Polignano, Pierpaolo Basile: Department of Computer Science, University of Bari “Aldo Moro”, 70126 Bari, Italy
Collection:	DOAJ

Description

The availability of large annotated corpora from social media and the development of powerful classification approaches have contributed in an unprecedented way to tackle the challenge of monitoring users’ opinions and sentiments in online social platforms across time. Such linguistic data are strongly affected by events and topic discourse, and this aspect is crucial when detecting phenomena such as hate speech, especially from a diachronic perspective. We address this challenge by focusing on a real case study: the “Contro l’odio” platform for monitoring hate speech against immigrants in the Italian Twittersphere. We explored the temporal robustness of a BERT model for Italian (AlBERTo), the current benchmark on non-diachronic detection settings. We tested different training strategies to evaluate how the classification performance is affected by adding more data temporally distant from the test set and hence potentially different in terms of topic and language use. Our analysis points out the limits that a supervised classification model encounters on data that are heavily influenced by events. Our results show how AlBERTo is highly sensitive to the temporal distance of the fine-tuning set. However, with an adequate time window, the performance increases, while requiring less annotated data than a traditional classifier.

Similar Items