The Development of Small-Scale Language Models for Low-Resource Languages, with a Focus on Kazakh and Direct Preference Optimization

Low-resource languages remain underserved by contemporary large language models (LLMs) because they lack sizable corpora, bespoke preprocessing tools, and the computing budgets assumed by mainstream alignment pipelines. Focusing on Kazakh, we present a 1.94B-parameter LLaMA-based model that demonstr...
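
The title and abstract name Direct Preference Optimization (DPO) as the alignment method. As background only, the sketch below shows the standard DPO objective from Rafailov et al. (2023) in PyTorch; it illustrates the general technique, not this paper's implementation, and the function name, beta value, and tensor conventions are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Standard DPO objective (Rafailov et al., 2023) -- illustrative sketch.

        Each argument is a 1-D tensor of summed per-token log-probabilities
        for a batch of (prompt, chosen, rejected) triples; beta controls how
        strongly the policy is penalized for drifting from the reference model.
        """
        # Implicit rewards: log-probability ratios against the frozen reference.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between chosen and rejected completions.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()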


Bibliographic Details
Published in: Big Data and Cognitive Computing
Main Authors: Nurgali Kadyrbek, Zhanseit Tuimebayev, Madina Mansurova, Vítor Viegas
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Subjects:
Online Access: https://www.mdpi.com/2504-2289/9/5/137