Polyseme-Aware Vector Representation for Text Classification
Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results i...
Main Authors: | Shun Guo, Nianmin Yao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Polysemous words; context clustering algorithm; PAVRM-Context; PAVRM-Center |
Online Access: | https://ieeexplore.ieee.org/document/9145584/ |
id |
doaj-fb4e2be757a8461fb58f6eccc1eb44fb |
---|---|
record_format |
Article |
spelling |
doaj-fb4e2be757a8461fb58f6eccc1eb44fb; 2021-03-30T04:06:06Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 135686-135699; doi:10.1109/ACCESS.2020.3010981; article no. 9145584; Polyseme-Aware Vector Representation for Text Classification; Shun Guo (https://orcid.org/0000-0003-3723-7688), Nianmin Yao (https://orcid.org/0000-0001-9705-6649); both authors: Department of Computer Science and Technology, Dalian University of Technology, Dalian, China; https://ieeexplore.ieee.org/document/9145584/; keywords: Polysemous words, context clustering algorithm, PAVRM-Context, PAVRM-Center |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shun Guo Nianmin Yao |
title |
Polyseme-Aware Vector Representation for Text Classification |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results in changing the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations to achieve better classification performance. |
topic |
Polysemous words context clustering algorithm PAVRM-Context PAVRM-Center |
url |
https://ieeexplore.ieee.org/document/9145584/ |