Homophobic and Hate Speech Detection Using Multilingual-BERT Model on Turkish Social Media

Homophobic expressions are a form of insulting the sexual orientation or personality of people. Severe psychological traumas may occur in people who are exposed to this type of communication. It is important to develop automatic classification systems based on language models to examine social media...

Bibliographic Details
Main Authors:	Aci, Ç.İ. (Author), Akdagli, A. (Author), Karayiğit, H. (Author)
Format:	Article
Language:	English
Published:	Kauno Technologijos Universitetas 2022
Subjects:	deep learning; Homophobic speech detection; multilingual BERT; sentiment analysis; text classification; transfer learning; Turkish social media
Online Access:	View Fulltext in Publisher


LEADER	02990nam a2200241Ia 4500
001	10.5755-j01.itc.51.2.29988
008	220718s2022 CNT 000 0 und d
020			\|a 1392124X (ISSN)
245	1	0	\|a Homophobic and Hate Speech Detection Using Multilingual-BERT Model on Turkish Social Media
260		0	\|b Kauno Technologijos Universitetas \|c 2022
856			\|z View Fulltext in Publisher \|u https://doi.org/10.5755/j01.itc.51.2.29988
520	3		\|a Homophobic expressions are a form of insulting the sexual orientation or personality of people. Severe psychological traumas may occur in people who are exposed to this type of communication. It is important to develop automatic classification systems based on language models to examine social media content and distinguish homophobic discourse. This study aims to present a pre-trained Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) model that can successfully detect whether Turkish comments on social media contain homophobic or related hate comments (i.e., sexist, severe humiliation, and defecation expressions). Comments in the Homophobic-Abusive Turkish Comments (HATC) dataset were collected from Instagram to train the detection models. The HATC dataset was manually labeled at the sentence level and combined with the Abusive Turkish Comments (ATC) dataset that was developed in our previous study. The HATC dataset was balanced using the resampling method, and two forms of the dataset (i.e., resHATC and original HATC) were used in the experiments. Afterward, the M-BERT model was compared with DL-based models (i.e., Long Short-Term Memory, Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit), Traditional Machine Learning (TML) classifiers (i.e., Support Vector Machine, Naive Bayes, Random Forest) and Ensemble Classifiers (i.e., Adaptive Boosting, eXtreme Gradient Boosting, Gradient Boosting) for the best model selection. The performance of the detection models was evaluated using the F1-score, precision, and recall metrics. The results showed that the best performance (homophobic F1-score: 82.64%, hateful F1-score: 91.75%, neutral F1-score: 96.08%, average F1-score: 90.15%) was achieved with the M-BERT model on the HATC dataset. The M-BERT detection model can increase the effectiveness of filters in detecting Turkish homophobic and related hate speech in social networks. It can also be used to detect homophobic and related hate speech in different languages, since the M-BERT model is pre-trained on multilingual data. © 2022, Kauno Technologijos Universitetas. All rights reserved.
650	0	4	\|a deep learning
650	0	4	\|a Homophobic speech detection
650	0	4	\|a multilingual BERT
650	0	4	\|a sentiment analysis
650	0	4	\|a text classification
650	0	4	\|a transfer learning
650	0	4	\|a Turkish social media
700	1		\|a Aci, Ç.İ. \|e author
700	1		\|a Akdagli, A. \|e author
700	1		\|a Karayiğit, H. \|e author
773			\|t Information Technology and Control