The Influence of M-BERT and Sizes on the Choice of Transfer Languages in Parsing

In this thesis, we explore the impact of M-BERT and of different transfer sizes on the choice of transfer languages in dependency parsing. To investigate our research questions, we conduct a series of experiments on Universal Dependencies treebanks with UUParser. The main conclusions and contributions of this study are as follows.

First, we train a variety of languages written in several different scripts with M-BERT, a state-of-the-art deep learning model based on the Transformer architecture, added to the parsing framework. In general, M-BERT yields better results than the randomly initialized embeddings in UUParser.

Second, since it is common practice in cross-lingual parsing to choose a source language that is 'close' to the target language, we explore what a 'close' language actually is, as the term has no precise definition. We examine how strongly parsing results correlate with different linguistic distances between source and target languages, queried from the URIEL database. In zero-shot experiments, parsing performance depends more on inventory, syntactic, and featural distance than on geographic, genetic, and phonological distance. In few-shot prediction, parsing accuracy correlates more strongly with inventory and syntactic distance than with the others.

Third, we vary the training sizes in few-shot experiments with M-BERT to see how parsing results are affected. Few-shot experiments clearly outperform zero-shot experiments, and as the source-language training data is cut, all parsing scores decrease, though the drop is not linear.
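The first finding concerns feeding M-BERT representations into the parser. The abstract does not detail the integration, so the following is only a minimal sketch: it assumes the public `bert-base-multilingual-cased` checkpoint and mean-pooling of wordpieces into one vector per word, neither of which is confirmed as the author's exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "bert-base-multilingual-cased" is the public M-BERT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def word_embeddings(words):
    """Return one contextual vector per input word by mean-pooling
    the M-BERT wordpieces that belong to that word."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_wordpieces, 768)
    vectors = []
    for i in range(len(words)):
        # word_ids() maps each wordpiece to its source word (None = special token)
        piece_idx = [j for j, w in enumerate(enc.word_ids()) if w == i]
        vectors.append(hidden[piece_idx].mean(dim=0))
    return torch.stack(vectors)

print(word_embeddings(["Den", "här", "meningen", "är", "svensk", "."]).shape)
# torch.Size([6, 768])
```

Mean-pooling is one common way to align wordpiece output with the word-level tokens a dependency parser operates on; UUParser's actual M-BERT integration may differ.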

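The second finding rests on querying URIEL for pairwise linguistic distances and correlating them with parsing accuracy. A sketch of that workflow, using lang2vec (the query library distributed with URIEL) and a Spearman rank correlation, might look like the following; the LAS values below are invented placeholders for illustration, not results from the thesis.

```python
import lang2vec.lang2vec as l2v
from scipy.stats import spearmanr

# Hypothetical LAS scores for one target language parsed with
# different source languages (illustrative numbers only).
las = {"swe": 78.4, "deu": 71.2, "rus": 55.9, "fin": 49.3}
target = "eng"

# The six URIEL distance types examined in the thesis.
for dist_type in ["inventory", "syntactic", "featural",
                  "geographic", "genetic", "phonological"]:
    distances = [l2v.distance(dist_type, src, target) for src in las]
    rho, p = spearmanr(distances, list(las.values()))
    print(f"{dist_type:12s} rho={rho:+.2f} p={p:.3f}")
```

A strongly negative rho for a given distance type would mean that closeness on that dimension goes with higher parsing accuracy, which is the kind of relationship the thesis reports for inventory and syntactic distance.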

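The third finding involves shrinking the source-language training data. One simple way to produce such cuts from a CoNLL-U treebank is to keep the first N sentences; the file names and sizes here are illustrative, not the thesis's actual settings.

```python
def head_conllu(in_path, out_path, n_sentences):
    """Write the first n_sentences of a CoNLL-U treebank to out_path.
    Sentences in CoNLL-U are separated by blank lines."""
    kept = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
            if line.strip() == "":  # a blank line ends a sentence
                kept += 1
                if kept == n_sentences:
                    break

for size in (1000, 500, 100, 50):
    head_conllu("sv_talbanken-ud-train.conllu",
                f"sv_talbanken-train-{size}.conllu", size)
```

A random sample rather than a prefix would avoid any ordering bias in the treebank; the abstract does not say which approach the author used.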
Bibliographic Details
Main Author: Zhang, Yifei
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för lingvistik och filologi, 2021
Subjects: Language Technology (Computational Linguistics)
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446094