Data and knowledge-driven named entity recognition for cyber security

Abstract: Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms from a large number of heterogeneous multisource cyber security texts. In the field of machine learning, deep neural networks automatically learn text features from a large number of dataset...

Bibliographic Details
Main Authors: Chen Gao, Xuan Zhang, Hui Liu
Format: Article
Language: English
Published: SpringerOpen 2021-05-01
Series: Cybersecurity
Subjects: Cyber security; Named entity recognition; Attention mechanism; Dictionary; Deep learning
Online Access: https://doi.org/10.1186/s42400-021-00072-y
id doaj-a712c172188a44d395728cbf0c5bd579
record_format Article
spelling doaj-a712c172188a44d395728cbf0c5bd579
2021-05-09T11:03:35Z
eng
SpringerOpen
Cybersecurity
2523-3246
2021-05-01
41113
10.1186/s42400-021-00072-y
Data and knowledge-driven named entity recognition for cyber security
Chen Gao (School of Software, Yunnan University)
Xuan Zhang (School of Software, Yunnan University)
Hui Liu (School of Software, Yunnan University)
Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms from a large number of heterogeneous multisource cyber security texts. In the field of machine learning, deep neural networks automatically learn text features from a large number of datasets, but this data-driven method usually lacks the ability to deal with rare entities. Gasmi et al. proposed a deep learning method for named entity recognition in the field of cyber security and achieved good results, reaching an F1 value of 82.8%. However, it remains difficult to accurately identify rare entities and complex words in the text. To cope with this challenge, this paper proposes a new model that combines data-driven deep learning methods with knowledge-driven dictionary methods to build dictionary features that assist in rare entity recognition. In addition, based on the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, better model the context, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective in identifying cyber security entities. Precision, recall, and F1 reached 90.19%, 86.60%, and 88.36%, respectively.
https://doi.org/10.1186/s42400-021-00072-y
Cyber security
Named entity recognition
Attention mechanism
Dictionary
Deep learning
collection DOAJ
language English
format Article
sources DOAJ
author Chen Gao
Xuan Zhang
Hui Liu
title Data and knowledge-driven named entity recognition for cyber security
publisher SpringerOpen
series Cybersecurity
issn 2523-3246
publishDate 2021-05-01
description Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms from a large number of heterogeneous multisource cyber security texts. In the field of machine learning, deep neural networks automatically learn text features from a large number of datasets, but this data-driven method usually lacks the ability to deal with rare entities. Gasmi et al. proposed a deep learning method for named entity recognition in the field of cyber security and achieved good results, reaching an F1 value of 82.8%. However, it remains difficult to accurately identify rare entities and complex words in the text. To cope with this challenge, this paper proposes a new model that combines data-driven deep learning methods with knowledge-driven dictionary methods to build dictionary features that assist in rare entity recognition. In addition, based on the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, better model the context, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective in identifying cyber security entities. Precision, recall, and F1 reached 90.19%, 86.60%, and 88.36%, respectively.
topic Cyber security
Named entity recognition
Attention mechanism
Dictionary
Deep learning
url https://doi.org/10.1186/s42400-021-00072-y