MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts
Named Entity Recognition (NER) systems have been greatly advanced by deep neural networks over the past decade. However, state-of-the-art NER methods have seldom been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of accessing a...
Main Authors: | Chengxi Yan, Qi Su, Jun Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Named entity recognition; gated neural network; Chinese historical texts |
Online Access: | https://ieeexplore.ieee.org/document/9206017/ |
id |
doaj-205745367bba4d39bf1c9be735248357 |
---|---|
record_format |
Article |
spelling |
doaj-205745367bba4d39bf1c9be735248357; 2021-03-30T03:38:38Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 181629; 181639; 10.1109/ACCESS.2020.3026535; 9206017; MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts; Chengxi Yan [0] https://orcid.org/0000-0003-1128-550X; Qi Su [1]; Jun Wang [2]; Department of Information Management, Peking University, Beijing, China; School of Foreign Languages, Peking University, Beijing, China; Department of Information Management, Peking University, Beijing, China; https://ieeexplore.ieee.org/document/9206017/; Named entity recognition; gated neural network; Chinese historical texts |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chengxi Yan; Qi Su; Jun Wang |
author_sort |
Chengxi Yan |
title |
MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts |
title_sort |
mogcn: mixture of gated convolutional neural network for named entity recognition of chinese historical texts |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Named Entity Recognition (NER) systems have been greatly advanced by deep neural networks over the past decade. However, state-of-the-art NER methods have seldom been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of obtaining a high-quality ancient corpus. This paper addresses both issues and proposes an efficient automatic processing solution for NER on ancient Chinese data, comprising a data-driven tagging procedure and a novel end-to-end network named “MoGCN” (Mixture of Gated Convolutional Neural Network). The tagging approach is used to generate a corpus covering three genres of Chinese historical classics, on which experiments assess the generalization ability of the proposed model. The empirical analysis shows that the proposed model achieves the best results, with an F1-score improvement of more than 1.5% over other sophisticated models on this dataset, and that performance depends positively on corpus quality. Furthermore, the model performs much better on shorter entities, especially 2-character ones, while the auxiliary attribute analysis shows that many long-range entities are identified only by this model. This work serves as a preliminary exploration of NER for historical data, providing insights and reference value for similar tasks. Future work should further explore NER optimization on large-scale traditional Chinese texts using linguistic features and learning strategies. |
topic |
Named entity recognition; gated neural network; Chinese historical texts |
url |
https://ieeexplore.ieee.org/document/9206017/ |
_version_ |
1724183043179020288 |