MoGCN: Mixture of Gated Convolutional Neural Network for Named Entity Recognition of Chinese Historical Texts

Bibliographic Details
Main Authors: Chengxi Yan, Qi Su, Jun Wang
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Named entity recognition; gated neural network; Chinese historical texts
Online Access: https://ieeexplore.ieee.org/document/9206017/
ISSN: 2169-3536
Volume: 8
Pages: 181629-181639
DOI: 10.1109/ACCESS.2020.3026535
Collection: DOAJ
Affiliations:
Chengxi Yan (ORCID: https://orcid.org/0000-0003-1128-550X), Department of Information Management, Peking University, Beijing, China
Qi Su, School of Foreign Languages, Peking University, Beijing, China
Jun Wang, Department of Information Management, Peking University, Beijing, China

Abstract

Named Entity Recognition (NER) systems have advanced considerably with deep neural networks over the past decade. However, state-of-the-art NER methods have rarely been applied to Chinese historical texts. The reasons are the lack of standard corpora in Chinese historical domains and the difficulty of accessing high-quality ancient corpora. This paper addresses these issues and proposes an efficient automatic processing solution for NER on ancient Chinese data. The solution comprises a data-driven tagging approach and an innovative end-to-end network named MoGCN (Mixture of Gated Convolutional Neural Network). Using our tagging approach, we generate a corpus covering three genres of Chinese historical classics. We use this corpus in experiments to examine the generalization ability of the proposed model. The empirical analysis shows that our model achieves the best results on this dataset, with an F1-score improvement of more than 1.5% over other sophisticated models. Experimental performance also depends positively on corpus quality. Furthermore, our model performs much better on shorter entities, especially 2-character ones. According to our auxiliary attribute analysis, many long-range entities can be identified only by our model. This work serves as a preliminary exploration of NER for historical data and provides unique insights and reference value for similar tasks. Future work should further explore NER optimization on large volumes of traditional Chinese texts using linguistic features and learning strategies.