Improving Named Entity Recognition for Social Media with Data Augmentation

Bibliographic Details
Main Authors: Cui, X. (Author), Liu, W. (Author)
Format: Article
Language: English
Published: MDPI 2023
Description
Summary: Social media is an important source of text information; however, because social media text is informal and unstructured, traditional named entity recognition (NER) methods struggle to achieve high accuracy on it. This paper proposes a new method for social media named entity recognition with data augmentation. First, we pre-train a language model using Bidirectional Encoder Representations from Transformers (BERT) to obtain a semantic vector for each word based on its context. Then, we obtain similar entities via data augmentation methods and perform substitution or semantic transformation on these entities. After that, these inputs are fed into a Bi-LSTM model for training, and the results are fused and fine-tuned to obtain the best label. In addition, a self-attention layer captures the essential information in the features and reduces the reliance on external information. Experimental results on the WNUT16, WNUT17, and OntoNotes 5.0 datasets confirm the effectiveness of our proposed model. © 2023 by the authors.
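The summary does not say how similar entities are found or how substitution is carried out. As a minimal sketch, assuming BIO-tagged tokens and a hypothetical per-type lexicon of replacement entities, same-type entity substitution could look like the following. The names `extract_entities`, `substitute_entities`, and `lexicon` are illustrative, not the authors' code, and the sketch covers neither the semantic-transformation variant nor the BERT, Bi-LSTM, or self-attention components.

```python
import random
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end-exclusive, entity type)


def extract_entities(tags: Sequence[str]) -> List[Span]:
    """Return (start, end, type) spans from BIO tags; `end` is exclusive."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        # A B- tag, or an I- tag whose type differs from the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and label != etype):
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, label
        elif label is None:  # "O" (or any non-BIO tag) closes the open span.
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
        # Otherwise an I- tag continues the current span.
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def substitute_entities(
    tokens: Sequence[str],
    tags: Sequence[str],
    lexicon: Dict[str, List[List[str]]],  # hypothetical: entity type -> candidate token lists
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Replace every entity mention with a random same-type entity from `lexicon`.

    Mentions whose type has no lexicon entry are kept unchanged. Tags for each
    replacement are regenerated in BIO form so they stay aligned with the tokens.
    """
    new_tokens: List[str] = []
    new_tags: List[str] = []
    prev = 0
    for start, end, etype in extract_entities(tags):
        # Copy the non-entity context before this mention unchanged.
        new_tokens.extend(tokens[prev:start])
        new_tags.extend(tags[prev:start])
        candidates = [c for c in lexicon.get(etype, []) if c]  # skip empty entries
        replacement = list(rng.choice(candidates)) if candidates else list(tokens[start:end])
        new_tokens.extend(replacement)
        new_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(replacement) - 1))
        prev = end
    # Copy any trailing context after the last mention.
    new_tokens.extend(tokens[prev:])
    new_tags.extend(tags[prev:])
    return new_tokens, new_tags


if __name__ == "__main__":
    tokens = ["just", "landed", "in", "new", "york", "with", "taylor"]
    tags = ["O", "O", "O", "B-location", "I-location", "O", "B-person"]
    lexicon = {"location": [["paris"], ["los", "angeles"]], "person": [["drake"]]}
    print(substitute_entities(tokens, tags, lexicon, random.Random(0)))
```

Regenerating the BIO tags from the replacement, rather than copying the original tags, keeps labels aligned when a mention is swapped for one with a different number of tokens (for example "new york" for "paris").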
ISSN: 2076-3417
DOI: 10.3390/app13095360