Summary: | Named Entity Recognition (NER) systems have advanced considerably over the past decade thanks to deep neural networks. However, state-of-the-art NER methods have rarely been applied to Chinese historical texts, owing to the lack of standard corpora in Chinese historical domains and the difficulty of obtaining a high-quality ancient corpus. This paper addresses both issues and proposes an efficient automatic processing solution for NER on ancient Chinese data, comprising a data-driven tagging procedure and a novel end-to-end network named “MoGCN” (Mixture of Gated Convolutional Neural Network). Our tagging approach is used to generate a corpus covering three genres of Chinese historical classics, on which we run experiments to assess the generalization ability of the proposed model. The empirical analysis shows that the proposed model achieves the best results on this dataset, improving the F1-score by more than 1.5% over other sophisticated models, and that performance depends positively on corpus quality. Furthermore, according to our auxiliary attribute analysis, our model performs much better on shorter entities, especially 2-character ones, while many long-range entities are identified only by our model. This work serves as a preliminary exploration of NER for historical data, offering insights and a reference for similar tasks. Future work should further explore NER optimization on massive collections of Chinese traditional texts using linguistic features and learning strategies.
|