A Hybrid Model for Monolingual and Multilingual Toxic Comment Detection

Bibliographic Details
Main Authors: Guizhe Song, Degen Huang, Yanping Zhang*
Format: Article
Language: English
Published: Faculty of Mechanical Engineering in Slavonski Brod, Faculty of Electrical Engineering in Osijek, Faculty of Civil Engineering in Osijek 2021-01-01
Series: Tehnički Vjesnik
Subjects:
Online Access: https://hrcak.srce.hr/file/380242
Description
Summary: Social media provides a public and convenient platform for people to communicate. However, it is also open to hateful behavior and toxic comments. Social networks such as Facebook, Twitter, and many others have been working on effective toxic comment detection methods to provide better service. A monolingual language model focuses on a single language and achieves high detection accuracy, whereas a multilingual language model generalizes better. To improve the detection of toxic comments in multiple languages, we propose a hybrid model that fuses a monolingual model with a multilingual model. We fine-tune the monolingual pre-trained model on labeled data. For the multilingual pre-trained model, we first apply masked language modeling on unlabeled data as a semi-supervised fine-tuning step, and then fine-tune the model on labeled data. In this way, we make full use of the large amount of unlabeled data, reduce the dependence on labeled comment data, and improve the effectiveness of detection. We also design several comparative experiments. The results demonstrate the effectiveness and advantage of the proposed model, especially compared with the fine-tuned multilingual XLM-RoBERTa model.
ISSN: 1330-3651, 1848-6339
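
Illustration: the summary says the hybrid model fuses a monolingual and a multilingual model but does not state the fusion rule. The sketch below assumes one simple possibility, a weighted average of the two models' per-comment toxicity probabilities. The function name `fuse_toxicity_scores`, the `mono_weight` parameter, the 0.5 decision threshold, and the example scores are all hypothetical and are not taken from the article.

```python
from typing import List, Sequence


def fuse_toxicity_scores(
    mono_probs: Sequence[float],
    multi_probs: Sequence[float],
    mono_weight: float = 0.5,
) -> List[float]:
    """Blend per-comment toxicity probabilities from two models.

    Assumption: fusion is a weighted average. The article may use a
    different rule; this is only one plausible reading of "fuses".
    """
    if len(mono_probs) != len(multi_probs):
        raise ValueError("both models must score the same comments")
    if not 0.0 <= mono_weight <= 1.0:
        raise ValueError("mono_weight must lie in [0, 1]")
    return [
        mono_weight * p_mono + (1.0 - mono_weight) * p_multi
        for p_mono, p_multi in zip(mono_probs, multi_probs)
    ]


# Made-up scores for three comments (not results from the article).
mono_scores = [0.9, 0.2, 0.6]   # hypothetical monolingual model output
multi_scores = [0.7, 0.4, 0.2]  # hypothetical multilingual model output

fused = fuse_toxicity_scores(mono_scores, multi_scores, mono_weight=0.5)

# Flag a comment as toxic when the fused score reaches an assumed threshold.
labels = [score >= 0.5 for score in fused]
```

Under this assumption, `mono_weight` controls how much the single-language model's higher accuracy is trusted relative to the multilingual model's broader generalization; in practice such a weight would be tuned on held-out labeled data.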