Developing the Corpus of Minangkabau Language: Insights, Challenges, and Future Directions

This paper discusses the design for developing the Minangkabau language corpus, especially regarding the opportunities and challenges. The corpus development of Minangkabau is a crucial project to document, preserve, and revive the treasure trove of culture within the language. The availability of...

Bibliographic Details
Published in: Arbitrer
Main Author: Handoko Handoko
Format: Article
Language: English
Published: Universitas Andalas 2024-09-01
Online Access: https://arbitrer.fib.unand.ac.id/index.php/arbitrer/article/view/508
Collection: DOAJ
Description: This paper discusses the design for developing the Minangkabau language corpus, especially regarding the opportunities and challenges. The corpus development of Minangkabau is a crucial project to document, preserve, and revive the treasure trove of culture within the language. The availability of a Minangkabau language corpus can open opportunities for more intensive research on the Minangkabau language with a more modern and data-based approach. It can also encourage the development of Minangkabau corpus-based teaching materials. The corpus is assembled manually from various sources through comprehensive data collection, annotation, and curation pipelines. These sources may be manuscripts, books, newspapers, or other written texts, as well as spontaneous conversations, such as interviews or public speeches. Multimedia resources, such as television and radio broadcasts, audio-video recordings, and social media content, also add to the diversity of data gathered. The availability of accessible digital sources, such as online videos, online radio programs, and ebooks, can make data collection easier. However, several challenges may arise in developing the Minangkabau language corpus, such as limited access to technology, dialect variation, and the need for highly skilled human resources. This paper explains some opportunities for developing the Minangkabau language corpus and increasing the role of the corpus in revitalizing and documenting the Minangkabau language. Furthermore, the availability of the Minangkabau language corpus can also be a starting point for developing linguistic technology, such as voice recognition, text-to-speech, and natural language processing.
Record ID: doaj-art-9ade4e53976a4fefa52ef7d8665cc29d
Institution: Directory of Open Access Journals
ISSN: 2339-1162, 2550-1011
Volume/Issue: 11(3)
DOI: 10.25077/ar.11.3.413-429.2024
Author ORCID: https://orcid.org/0000-0003-2474-3821
Author affiliation: Universitas Andalas
Subjects: Minangkabau corpus; language documentation; language preservation; corpus methodology; digital resources