CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement
This paper proposes a method for improving the XLNet model that addresses the shortcomings of the segmentation algorithm when processing Chinese, such as long sub-word lengths, long word lists and incomplete word list coverage. To address these issues, we propose the CWSXLNet (Chinese Word Segmenta...
| Published in: | Applied Sciences |
|---|---|
| Main Authors: | Shiqian Guo, Yansun Huang, Baohua Huang, Linda Yang, Cong Zhou |
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2023-03-01 |
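| Subjects: | sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learning; natural language processing |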
| Online Access: | https://www.mdpi.com/2076-3417/13/6/4056 |
| _version_ | 1850093280337854464 |
|---|---|
| author | Shiqian Guo; Yansun Huang; Baohua Huang; Linda Yang; Cong Zhou |
| author_facet | Shiqian Guo; Yansun Huang; Baohua Huang; Linda Yang; Cong Zhou |
| author_sort | Shiqian Guo |
| collection | DOAJ |
| container_title | Applied Sciences |
| description | This paper proposes a method for improving the XLNet model that addresses the shortcomings of the segmentation algorithm when processing Chinese, such as long sub-word lengths, long word lists and incomplete word list coverage. To address these issues, we propose the CWSXLNet (Chinese Word Segmentation XLNet) model, which is based on Chinese word segmentation information enhancement. The model first pre-processes the Chinese pre-training text with a Chinese word segmentation tool, and introduces a Chinese word segmentation attention mask mechanism that combines the PLM (Permuted Language Model) and two-stream self-attention mechanisms of XLNet. While performing natural language processing at word granularity, it can reduce the degree of masking between masked and non-masked characters when the two characters belong to the same word. For the Chinese sentiment analysis task, we propose the CWSXLNet-BiGRU-Attention model, which adds a bi-directional GRU and a self-attention mechanism in the downstream task. Experiments on the ChnSentiCorp dataset show that CWSXLNet achieves 89.91% precision, 91.53% recall and a 90.71% F1-score, and that CWSXLNet-BiGRU-Attention achieves 92.61% precision, 93.19% recall and a 92.90% F1-score, which indicates that CWSXLNet performs better than other models in Chinese sentiment analysis. |
| format | Article |
| id | doaj-art-1a1d42e2eee94fddba2ba3955e45e17e |
| institution | Directory of Open Access Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2023-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| spelling | doaj-art-1a1d42e2eee94fddba2ba3955e45e17e; 2025-08-20T00:08:02Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2023-03-01; vol. 13, no. 6, art. 4056; DOI 10.3390/app13064056; CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement; Shiqian Guo (0), Yansun Huang (1), Baohua Huang (2), Linda Yang (3), Cong Zhou (4); (0) School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; (1) Auditing Bureau of Xixiangtang, Nanning 530001, China; (2) School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; (3) School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; (4) School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; https://www.mdpi.com/2076-3417/13/6/4056; sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learning; natural language processing |
| spellingShingle | Shiqian Guo; Yansun Huang; Baohua Huang; Linda Yang; Cong Zhou; CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement; sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learning; natural language processing |
| title | CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement |
| title_sort | cwsxlnet a sentiment analysis model based on chinese word segmentation information enhancement |
| topic | sentiment analysis; Chinese word segmentation; XLNet; attention mask; machine learning; natural language processing |
| url | https://www.mdpi.com/2076-3417/13/6/4056 |
| work_keys_str_mv | AT shiqianguo cwsxlnetasentimentanalysismodelbasedonchinesewordsegmentationinformationenhancement AT yansunhuang cwsxlnetasentimentanalysismodelbasedonchinesewordsegmentationinformationenhancement AT baohuahuang cwsxlnetasentimentanalysismodelbasedonchinesewordsegmentationinformationenhancement AT lindayang cwsxlnetasentimentanalysismodelbasedonchinesewordsegmentationinformationenhancement AT congzhou cwsxlnetasentimentanalysismodelbasedonchinesewordsegmentationinformationenhancement |
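
The precision, recall and F1 figures quoted in the description field can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The short Python sketch below does this; the function name `f1_score` and the dictionary `reported` are illustrative assumptions, not part of the paper or of this record, and the numbers are copied from the description as given.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values in percent, as quoted in the record's description (ChnSentiCorp dataset).
# Each entry is (precision, recall, reported F1).
reported = {
    "CWSXLNet": (89.91, 91.53, 90.71),
    "CWSXLNet-BiGRU-Attention": (92.61, 93.19, 92.90),
}

for name, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

For both models the recomputed value agrees with the reported F1-score to two decimal places, so the three figures in the description are internally consistent.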
