Research of BERT Cross-Lingual Word Embedding Learning

With the growth of multilingual information on the Internet, effectively representing the information contained in texts written in different languages has become an important subtask of natural language processing, and cross-lingual word embedding has accordingly become an active research topic. With the help of transfer learning, cross-lingual word embeddings map words from different languages into a shared low-dimensional space, where grammatical, semantic, and structural features can be transferred between languages and used to model cross-lingual semantic information. The BERT (bidirectional encoder representations from transformers) model obtains general-purpose word embeddings by pre-training on large corpora and then dynamically fine-tunes them for specific downstream tasks, producing context-sensitive word embeddings; this yields dynamic word embeddings and addresses the polysemy problem of earlier static models. Building on a review of the existing literature on BERT-based cross-lingual word embedding, this paper surveys the development of BERT-based cross-lingual word embedding learning methods, models, and techniques, as well as the training data they require. According to the training method, the approaches are divided into two categories, supervised and unsupervised, and representative studies of each category are compared and summarized in detail. Finally, evaluation methods for cross-lingual word embeddings are summarized, and an outlook is given on BERT-based Mongolian-Chinese cross-lingual word embedding research.
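The abstract describes BERT producing general-purpose embeddings that become context-sensitive once conditioned on a concrete input. As a minimal illustration (not drawn from the paper itself), the sketch below extracts such contextual vectors from a public multilingual BERT checkpoint using the HuggingFace `transformers` library; the model name and example sentence are assumptions for demonstration.

```python
# Minimal sketch (illustrative, not from the paper): context-sensitive word
# embeddings from a pretrained multilingual BERT via HuggingFace transformers.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"  # one common public checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

sentence = "Cross-lingual embeddings place words from different languages in one space."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Unlike a static embedding table, each subword token receives a vector that
# depends on its sentence context, so the same word form can get different
# vectors in different sentences.
token_vectors = outputs.last_hidden_state[0]  # shape: (num_subword_tokens, 768)
print(token_vectors.shape)
```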

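The survey divides mapping approaches into supervised and unsupervised categories. A representative supervised baseline in this literature (not the paper's own method) is orthogonal Procrustes alignment over a seed bilingual dictionary; the sketch below shows its closed-form SVD solution, with random matrices standing in for real source- and target-language embeddings.

```python
# Sketch of supervised cross-lingual alignment via orthogonal Procrustes,
# a representative supervised mapping baseline (illustrative placeholder data).
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 300                   # dictionary size, embedding dim (assumed)
X = rng.standard_normal((n, d))    # source-language embeddings of seed pairs
Y = rng.standard_normal((n, d))    # target-language embeddings of seed pairs

# W* = argmin_{W orthogonal} ||XW - Y||_F has the closed form W = U V^T,
# where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

mapped = X @ W  # source embeddings projected into the shared space
```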

Bibliographic Details
Main Authors: WANG Yurong, LIN Min, LI Yanling (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China)
Format: Article
Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2021-08-01
Series: Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology), 15(8): 1405-1417
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2101042
Subjects: cross-lingual word embedding; Mongolian-Chinese; bidirectional encoder representations from transformers (BERT)
Online Access: http://fcst.ceaj.org/CN/abstract/abstract2825.shtml