Polyseme-Aware Vector Representation for Text Classification

Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results i...


Bibliographic Details
Main Authors: Shun Guo, Nianmin Yao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Polysemous words; context clustering algorithm; PAVRM-Context; PAVRM-Center
Online Access: https://ieeexplore.ieee.org/document/9145584/
id doaj-fb4e2be757a8461fb58f6eccc1eb44fb
record_format Article
spelling doaj-fb4e2be757a8461fb58f6eccc1eb44fb
2021-03-30T04:06:06Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
8
135686
135699
10.1109/ACCESS.2020.3010981
9145584
Polyseme-Aware Vector Representation for Text Classification
Shun Guo; 0; https://orcid.org/0000-0003-3723-7688
Nianmin Yao; 1; https://orcid.org/0000-0001-9705-6649
Department of Computer Science and Technology, Dalian University of Technology, Dalian, China
Department of Computer Science and Technology, Dalian University of Technology, Dalian, China
Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results in changing the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations to achieve better classification performance.
https://ieeexplore.ieee.org/document/9145584/
Polysemous words
context clustering algorithm
PAVRM-Context
PAVRM-Center
collection DOAJ
language English
format Article
sources DOAJ
author Shun Guo
Nianmin Yao
spellingShingle Shun Guo
Nianmin Yao
Polyseme-Aware Vector Representation for Text Classification
IEEE Access
Polysemous words
context clustering algorithm
PAVRM-Context
PAVRM-Center
author_facet Shun Guo
Nianmin Yao
author_sort Shun Guo
title Polyseme-Aware Vector Representation for Text Classification
title_short Polyseme-Aware Vector Representation for Text Classification
title_full Polyseme-Aware Vector Representation for Text Classification
title_fullStr Polyseme-Aware Vector Representation for Text Classification
title_full_unstemmed Polyseme-Aware Vector Representation for Text Classification
title_sort polyseme-aware vector representation for text classification
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results in changing the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations to achieve better classification performance.
topic Polysemous words
context clustering algorithm
PAVRM-Context
PAVRM-Center
url https://ieeexplore.ieee.org/document/9145584/
work_keys_str_mv AT shunguo polysemeawarevectorrepresentationfortextclassification
AT nianminyao polysemeawarevectorrepresentationfortextclassification
_version_ 1724182366679728128