Research of BERT Cross-Lingual Word Embedding Learning

With the growth of multilingual information on the Internet, effectively representing the information contained in texts written in different languages has become an important subtask of natural language processing, and cross-lingual word embedding has accordingly become an active research topic. With the help of transfer learning, cross-lingual word embeddings map words from different languages into a shared low-dimensional space, where grammatical, semantic, and structural features can be transferred between languages and used to model cross-lingual semantic information. The BERT (bidirectional encoder representations from transformers) model obtains general-purpose word embeddings by pre-training on large corpora and then dynamically fine-tunes them for specific downstream tasks, producing context-sensitive word embeddings; this yields dynamic word embeddings and addresses the polysemy problem of earlier static models. Building on a review of the existing literature on BERT-based cross-lingual word embedding, this paper surveys the development of BERT-based cross-lingual word embedding learning methods, models, and techniques, as well as the training data they require. According to the training method, the approaches are divided into two categories, supervised and unsupervised, and representative studies of each category are compared and summarized in detail. Finally, evaluation methods for cross-lingual word embeddings are summarized, and an outlook is given on BERT-based Mongolian-Chinese cross-lingual word embedding research.
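The abstract describes BERT producing general-purpose embeddings that become context-sensitive once conditioned on a concrete input. As a minimal illustration (not drawn from the paper itself), the sketch below extracts such contextual vectors from a public multilingual BERT checkpoint using the HuggingFace `transformers` library; the model name and example sentence are assumptions for demonstration.

```python
# Minimal sketch (illustrative, not from the paper): context-sensitive word
# embeddings from a pretrained multilingual BERT via HuggingFace transformers.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"  # one common public checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

sentence = "Cross-lingual embeddings place words from different languages in one space."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Unlike a static embedding table, each subword token receives a vector that
# depends on its sentence context, so the same word form can get different
# vectors in different sentences.
token_vectors = outputs.last_hidden_state[0]  # shape: (num_subword_tokens, 768)
print(token_vectors.shape)
```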

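The survey divides mapping approaches into supervised and unsupervised categories. A representative supervised baseline in this literature (not the paper's own method) is orthogonal Procrustes alignment over a seed bilingual dictionary; the sketch below shows its closed-form SVD solution, with random matrices standing in for real source- and target-language embeddings.

```python
# Sketch of supervised cross-lingual alignment via orthogonal Procrustes,
# a representative supervised mapping baseline (illustrative placeholder data).
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 300                   # dictionary size, embedding dim (assumed)
X = rng.standard_normal((n, d))    # source-language embeddings of seed pairs
Y = rng.standard_normal((n, d))    # target-language embeddings of seed pairs

# W* = argmin_{W orthogonal} ||XW - Y||_F has the closed form W = U V^T,
# where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

mapped = X @ W  # source embeddings projected into the shared space
```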

Bibliographic Details
Main Authors: WANG Yurong, LIN Min, LI Yanling (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China)
Format: Article
Language: Chinese (zho)
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2021-08-01
Series: Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology), 15(8): 1405-1417
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2101042
Subjects: cross-lingual word embedding; Mongolian-Chinese; bidirectional encoder representations from transformers (BERT)
Online Access: http://fcst.ceaj.org/CN/abstract/abstract2825.shtml