Developing the Corpus of Minangkabau Language: Insights, Challenges, and Future Directions
This paper discusses the design for developing the Minangkabau language corpus, especially regarding the opportunities and challenges. The corpus development of Minangkabau is a crucial project to document, preserve, and revive the treasure trove of culture within the language. The availability of...
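| Published in: | Arbitrer |
|---|---|
| Main Author: | Handoko Handoko |
| Format: | Article |
| Language: | English |
| Published: | Universitas Andalas 2024-09-01 |
| Subjects: | Minangkabau corpus; language documentation; language preservation; corpus methodology; digital resources |
| Online Access: | https://arbitrer.fib.unand.ac.id/index.php/arbitrer/article/view/508 |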
| _version_ | 1849318751573377024 |
|---|---|
| author | Handoko Handoko |
| collection | DOAJ |
| container_title | Arbitrer |
| description | This paper discusses the design for developing the Minangkabau language corpus, especially regarding the opportunities and challenges. The corpus development of Minangkabau is a crucial project to document, preserve, and revive the treasure trove of culture within the language. The availability of a Minangkabau language corpus can open opportunities for more intensive research on the Minangkabau language with a more modern and data-based approach. It can also encourage the development of Minangkabau corpus-based teaching materials. The corpus is manually assembled using various sources’ comprehensive data collection, annotation, and curation pipelines. These may be manuscripts, books, newspapers, or other written texts and spontaneous conversations, such as interviews or public speeches. Multimedia resources, such as television and radio broadcasts, audio-video recordings, and social media content, also add to the diversity of data gathered. The availability of accessible digital sources, such as online videos, online radio programs, and ebooks, can make data collection easier. However, several challenges may appear in developing the Minangkabau language corpus, such as limited technology accessibility, dialect variations, and the involvement of highly skilled human resources. This paper explains some opportunities for developing the Minangkabau language corpus and increasing the role of the corpus in revitalizing and documenting the Minangkabau language. Furthermore, the availability of the Minangkabau language corpus can also be a starting point for developing linguistic technology, such as voice recognition, text-to-speech, and natural language processing. |
| format | Article |
| id | doaj-art-9ade4e53976a4fefa52ef7d8665cc29d |
| institution | Directory of Open Access Journals |
| issn | 2339-1162 2550-1011 |
| language | English |
| publishDate | 2024-09-01 |
| publisher | Universitas Andalas |
| record_format | Article |
| spelling | doaj-art-9ade4e53976a4fefa52ef7d8665cc29d; 2025-09-02T17:39:45Z; eng; Universitas Andalas; Arbitrer; 2339-1162; 2550-1011; 2024-09-01; 11; 3; 10.25077/ar.11.3.413-429.2024; 417; Developing the Corpus of Minangkabau Language: Insights, Challenges, and Future Directions; Handoko Handoko; 0; https://orcid.org/0000-0003-2474-3821; Universitas Andalas; This paper discusses the design for developing the Minangkabau language corpus, especially regarding the opportunities and challenges. The corpus development of Minangkabau is a crucial project to document, preserve, and revive the treasure trove of culture within the language. The availability of a Minangkabau language corpus can open opportunities for more intensive research on the Minangkabau language with a more modern and data-based approach. It can also encourage the development of Minangkabau corpus-based teaching materials. The corpus is manually assembled using various sources’ comprehensive data collection, annotation, and curation pipelines. These may be manuscripts, books, newspapers, or other written texts and spontaneous conversations, such as interviews or public speeches. Multimedia resources, such as television and radio broadcasts, audio-video recordings, and social media content, also add to the diversity of data gathered. The availability of accessible digital sources, such as online videos, online radio programs, and ebooks, can make data collection easier. However, several challenges may appear in developing the Minangkabau language corpus, such as limited technology accessibility, dialect variations, and the involvement of highly skilled human resources. This paper explains some opportunities for developing the Minangkabau language corpus and increasing the role of the corpus in revitalizing and documenting the Minangkabau language. Furthermore, the availability of the Minangkabau language corpus can also be a starting point for developing linguistic technology, such as voice recognition, text-to-speech, and natural language processing.; https://arbitrer.fib.unand.ac.id/index.php/arbitrer/article/view/508; Minangkabau corpus; language documentation; language preservation; corpus methodology; digital resources |
| spellingShingle | Handoko Handoko; Developing the Corpus of Minangkabau Language: Insights, Challenges, and Future Directions; Minangkabau corpus; language documentation; language preservation; corpus methodology; digital resources |
| title | Developing the Corpus of Minangkabau Language: Insights, Challenges, and Future Directions |
| title_sort | developing the corpus of minangkabau language insights challenges and future directions |
| topic | Minangkabau corpus; language documentation; language preservation; corpus methodology; digital resources |
| url | https://arbitrer.fib.unand.ac.id/index.php/arbitrer/article/view/508 |
| work_keys_str_mv | AT handokohandoko developingthecorpusofminangkabaulanguageinsightschallengesandfuturedirections |
