Research of BERT Cross-Lingual Word Embedding Learning
Main Author: | WANG Yurong, LIN Min, LI Yanling |
---|---|
Format: | Article |
Language: | Chinese (zho) |
Published: | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2021-08-01 |
Series: | Jisuanji kexue yu tansuo |
Subjects: | cross-lingual word embedding; Mongolian-Chinese; bidirectional encoder representations from transformers (BERT) |
Online Access: | http://fcst.ceaj.org/CN/abstract/abstract2825.shtml |
id |
doaj-7b92687a4ef94885ad8f9b4bb7ee0800 |
---|---|
record_format |
Article |
spelling |
Research of BERT Cross-Lingual Word Embedding Learning. WANG Yurong, LIN Min, LI Yanling (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China). Jisuanji kexue yu tansuo (Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press), ISSN 1673-9418, 2021-08-01, 15(8): 1405-1417. DOI: 10.3778/j.issn.1673-9418.2101042. Record doaj-7b92687a4ef94885ad8f9b4bb7ee0800, updated 2021-08-09T08:42:10Z, language zho. |
collection |
DOAJ |
language |
zho |
format |
Article |
sources |
DOAJ |
author |
WANG Yurong, LIN Min, LI Yanling |
title |
Research of BERT Cross-Lingual Word Embedding Learning |
publisher |
Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press |
series |
Jisuanji kexue yu tansuo |
issn |
1673-9418 |
publishDate |
2021-08-01 |
description |
With the growth of multilingual information on the Internet, effectively representing the information contained in texts written in different languages has become an important sub-task of natural language processing, and cross-lingual word embedding has therefore become an active research topic. With the help of transfer learning, cross-lingual word embeddings can be mapped into a shared low-dimensional space in which grammatical, semantic and structural features transfer between languages, allowing cross-lingual semantic information to be modeled. By pre-training on large corpora, the BERT (bidirectional encoder representations from transformers) model learns general word embeddings, which are then dynamically optimized for specific downstream tasks to generate context-sensitive word embeddings; this resolves the problem of earlier models collapsing multiple senses of a word into one static vector, yielding dynamic word embeddings. Building on a literature review of existing studies of BERT-based cross-lingual word embedding, this paper comprehensively describes the development of BERT-based cross-lingual word embedding learning methods, models and techniques, as well as the training data they require. According to the training method, the approaches are divided into two categories, supervised and unsupervised learning, and representative research in each category is compared and summarized in detail. Finally, evaluation methods for cross-lingual word embeddings are summarized, and prospects are outlined for studying Mongolian-Chinese cross-lingual word embedding based on BERT. |
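The supervised mapping idea summarized in the description above — learning a linear map that places two monolingual embedding spaces in one shared space, using a seed bilingual dictionary — is often solved in closed form as an orthogonal Procrustes problem. The sketch below illustrates that technique on synthetic vectors; the function name and the toy "Mongolian"/"Chinese" data are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def learn_mapping(src_vecs, tgt_vecs):
    """Orthogonal Procrustes: find an orthogonal matrix W minimizing
    ||src_vecs @ W - tgt_vecs||_F, in closed form via SVD."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Toy demonstration: build a "target" space, rotate it by a known
# orthogonal matrix to fake a "source" space, then recover the rotation
# from a seed dictionary of n paired vectors.
rng = np.random.default_rng(0)
d, n = 8, 100                        # embedding dim, seed-dictionary size
tgt = rng.normal(size=(n, d))        # hypothetical "Chinese" embeddings
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
src = tgt @ q.T                      # hypothetical "Mongolian" embeddings

w = learn_mapping(src, tgt)
err = np.linalg.norm(src @ w - tgt)  # residual after mapping; close to 0
```

Because W is constrained to be orthogonal, the map preserves distances and cosine similarities within the source space, which is why this family of methods is a common supervised baseline for cross-lingual embedding alignment.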
topic |
cross-lingual word embedding; Mongolian-Chinese; bidirectional encoder representations from transformers (BERT) |
url |
http://fcst.ceaj.org/CN/abstract/abstract2825.shtml |
work_keys_str_mv |
AT wangyuronglinminliyanling researchofbertcrosslingualwordembeddinglearning |
_version_ |
1721214985069133824 |