Pre-trained transformer-based language models for Sundanese

Bibliographic Details
Main Authors:	Lucky, H. (Author), Suhartono, D. (Author), Wongso, W. (Author)
Format:	Article
Language:	English
Published:	Springer Science and Business Media Deutschland GmbH 2022
Subjects:	Low-resource Language; Natural Language Understanding; Sundanese Language; Transformers
Online Access:	Full text at publisher: https://doi.org/10.1186/s40537-022-00590-7


LEADER	01503nam a2200205Ia 4500
001	10.1186-s40537-022-00590-7
008	220425s2022 CNT 000 0 und d
020			|a 21961115 (ISSN)
245	1	0	|a Pre-trained transformer-based language models for Sundanese
260		0	|b Springer Science and Business Media Deutschland GmbH |c 2022
856			|z View Fulltext in Publisher |u https://doi.org/10.1186/s40537-022-00590-7
520	3		|a The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefits from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the Sundanese pre-training corpus size and do not exhibit socially biased behavior. We released our models for other researchers and practitioners to use. © 2022, The Author(s).
650	0	4	|a Low-resource Language
650	0	4	|a Natural Language Understanding
650	0	4	|a Sundanese Language
650	0	4	|a Transformers
700	1		|a Lucky, H. |e author
700	1		|a Suhartono, D. |e author
700	1		|a Wongso, W. |e author
773			|t Journal of Big Data
