Using Unsupervised Morphological Segmentation to Improve Dependency Parsing for Morphologically Rich Languages

In this thesis, we investigate the effect of using unsupervised morphological segmentation as features for dependency parsing of morphologically rich languages such as Finnish, Estonian, Hungarian, Turkish, Uyghur, and Kazakh. Studying the morphology of these languages is of great impor...

Bibliographic Details
Main Author: Yusupujiang, Zulipiye
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för lingvistik och filologi, 2018
Subjects: Language Technology (Computational Linguistics)
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-354459
id ndltd-UPSALLA1-oai-DiVA.org-uu-354459
record_format oai_dc
timestamp 2018-06-21T05:59:36Z
type Student thesis; info:eu-repo/semantics/bachelorThesis; text
file application/pdf
rights info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Language Technology (Computational Linguistics)
Språkteknologi (språkvetenskaplig databehandling)
description In this thesis, we investigate the effect of using unsupervised morphological segmentation as features for dependency parsing of morphologically rich languages such as Finnish, Estonian, Hungarian, Turkish, Uyghur, and Kazakh. Studying the morphology of these languages is of great importance for parsing them, since the dependency relations in their sentences depend mostly on morphemes rather than on word order. To investigate our research questions, we conducted a large number of parsing experiments with both MaltParser and UDPipe. We generated the supervised morphology and the predicted POS tags with UDPipe, obtained the unsupervised morphological segmentation from Morfessor, converted this segmentation into features, and added them to the UD treebank of each language. We also investigated different ways of converting the unsupervised segmentation into features and compared the results of each. All experimental results are reported as Labeled Attachment Score (LAS). The main finding is that, for some languages, dependency parsing can be improved simply by providing unsupervised morphology during parsing when no manually annotated or supervised morphology is available.
After adding unsupervised morphological information together with predicted POS tags, LAS on the test sets improved by 4.9%, 6.0%, 8.7%, 3.3%, 3.7%, and 12.0% with MaltParser, and by 2.7%, 4.1%, 8.2%, 2.4%, 1.6%, and 2.6% with UDPipe, for Turkish, Uyghur, Kazakh, Finnish, Estonian, and Hungarian respectively, compared with models that use no morphological information during parsing.
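The abstract does not spell out how the segmentation is turned into features or how LAS is computed, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes Morfessor output is already available as a list of morphs per word, treats the first morph as the stem, and writes the remaining morphs into the FEATS column (the 6th of 10 tab-separated columns) of a CoNLL-U token line. The function names `segmentation_to_feats`, `add_feats`, and `las`, and the feature names `Suf1`, `Suf2`, ..., are hypothetical, and the example segmentation is illustrative rather than actual Morfessor output.

```python
from typing import List, Tuple


def segmentation_to_feats(morphs: List[str], max_suffixes: int = 3) -> str:
    """Encode one word's segmentation as a FEATS string (hypothetical scheme).

    Assumes the first morph is the stem and the rest are suffixes; only the
    first `max_suffixes` suffixes are kept, as Suf1=..., Suf2=..., and so on.
    """
    suffixes = morphs[1:max_suffixes + 1]
    if not suffixes:
        return "_"  # CoNLL-U uses "_" for an empty field
    return "|".join(f"Suf{i}={m}" for i, m in enumerate(suffixes, start=1))


def add_feats(conllu_line: str, feats: str) -> str:
    """Write `feats` into the FEATS column (6th of 10) of a CoNLL-U token line."""
    cols = conllu_line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected a 10-column CoNLL-U token line")
    cols[5] = feats  # overwrite: models the setting without supervised morphology
    return "\t".join(cols)


def las(gold: List[Tuple[int, str]], pred: List[Tuple[int, str]]) -> float:
    """Labeled attachment score: fraction of tokens with correct head AND label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical segmentation of Turkish "evlerimde" ('in my houses').
    feats = segmentation_to_feats(["ev", "ler", "im", "de"])
    print(feats)  # Suf1=ler|Suf2=im|Suf3=de
    print(add_feats("1\tevlerimde\tev\tNOUN\t_\t_\t2\tobl\t_\t_", feats))
    print(las([(2, "obl"), (0, "root")], [(2, "obl"), (0, "nsubj")]))  # 0.5
```

Capping the number of suffix features is one design choice among the "different ways of converting" mentioned above; alternatives such as encoding only the last morph, or the whole suffix string as one value, would be equally plausible sketches. The `las` helper compares full dependency labels; standard evaluation scripts may normalize labels differently.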
author Yusupujiang, Zulipiye
title Using Unsupervised Morphological Segmentation to Improve Dependency Parsing for Morphologically Rich Languages
publisher Uppsala universitet, Institutionen för lingvistik och filologi
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-354459