Pre-Training on Mixed Data for Low-Resource Neural Machine Translation
The pre-training and fine-tuning paradigm has been shown to be effective for low-resource neural machine translation. In this paradigm, models pre-trained on monolingual data are used to initialize translation models, transferring knowledge from the monolingual data into them. Recent pre-training models typically take sentences with randomly masked words as input and are trained to predict the masked words from the unmasked ones. In this paper, we propose a new pre-training method that still predicts masked words but randomly replaces some of the unmasked words in the input with their translations in another language. The translation words come from bilingual data, so the pre-training data contains both monolingual and bilingual data. We evaluate our method through experiments on a Uyghur-Chinese corpus. The results show that our method gives the pre-training model better generalization ability and helps the translation model achieve better performance. Through a word translation task, we also demonstrate that our method enables the embeddings of the translation model to acquire more alignment knowledge.
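The abstract describes the data-mixing procedure only in prose. The following is a minimal Python sketch of how such mixed pre-training examples might be constructed from a monolingual sentence and a word-level bilingual dictionary; the function name `build_mixed_example`, the `[MASK]` placeholder, the `mask_prob`/`swap_prob` ratios, and the toy dictionary are illustrative assumptions, not details taken from the paper.

```python
import random

MASK = "[MASK]"  # illustrative mask token; the real token depends on the model's vocabulary


def build_mixed_example(tokens, bilingual_dict, mask_prob=0.15, swap_prob=0.20, seed=None):
    """Build one pre-training example in the spirit of the described method:
    mask some words (to be predicted) and replace some of the remaining,
    unmasked words with their translations taken from bilingual data.

    tokens         : list of source-language words (a monolingual sentence)
    bilingual_dict : dict mapping source words to translation words in the other language
    mask_prob      : fraction of words to mask (illustrative value)
    swap_prob      : fraction of unmasked words to swap for translations (illustrative value)
    Returns (input_tokens, targets), where targets[i] is the original word at
    masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    input_tokens, targets = [], []
    for word in tokens:
        if rng.random() < mask_prob:
            # Masked word: the model must predict the original word from context.
            input_tokens.append(MASK)
            targets.append(word)
        elif word in bilingual_dict and rng.random() < swap_prob:
            # Unmasked word replaced by its translation in the other language.
            input_tokens.append(bilingual_dict[word])
            targets.append(None)
        else:
            input_tokens.append(word)
            targets.append(None)
    return input_tokens, targets


# Illustrative usage with toy English/Chinese data (not the paper's Uyghur-Chinese corpus):
sentence = "the cat sat on the mat".split()
toy_dict = {"cat": "猫", "mat": "垫子"}
print(build_mixed_example(sentence, toy_dict, seed=0))
```

In this sketch only the masked positions carry prediction targets, matching the abstract's description of predicting masked words from the partly code-switched unmasked context; the paper's actual tokenization, replacement ratios, and model architecture are not specified here.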
Main Authors: Wenbo Zhang, Xiao Li, Yating Yang, Rui Dong
Affiliation: The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
Format: Article
Language: English
Published: MDPI AG, 2021-03-01
Series: Information, Vol. 12, Iss. 3, Art. 133
ISSN: 2078-2489
DOI: 10.3390/info12030133
Subjects: neural machine translation; pre-training; low resource; word translation
Online Access: https://www.mdpi.com/2078-2489/12/3/133
Collection: DOAJ
Record ID: doaj-bd0b76b4e99647c5b8e1967ef61e4236