Text Mining of Hazard and Operability Analysis Reports Based on Active Learning


Bibliographic Details
Main Authors: Zhenhua Wang, Beike Zhang, Dong Gao
Format: Article
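Language: English
Published: MDPI AG, 2021-07-01
Series: Processes
Subjects: active learning; sampling algorithm; hazard and operability analysis; deep learning; named entity recognition
Online Access: https://www.mdpi.com/2227-9717/9/7/1178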
id doaj-a0880bf5cd9843d1882e99238feb35c5
record_format Article
doi 10.3390/pr9071178
volume 9
issue 7
article_number 1178
affiliation College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (all authors)
collection DOAJ
language English
format Article
sources DOAJ
author Zhenhua Wang
Beike Zhang
Dong Gao
title Text Mining of Hazard and Operability Analysis Reports Based on Active Learning
publisher MDPI AG
series Processes
issn 2227-9717
publishDate 2021-07-01
description In the field of chemical safety, a deep-learning-based named entity recognition (NER) model can mine valuable information from hazard and operability analysis (HAZOP) text. This information can guide experts in carrying out a new round of HAZOP analysis, help practitioners address hidden hazards in the system, and contribute significantly to the safety of the whole chemical system. However, because chemical safety analysis text is highly standardized and specialized, the performance of traditional models is difficult to improve. To address this problem, this study proposes an improved method based on active learning and designs three novel sampling algorithms that improve the model's ability to understand HAZOP text: Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC). In this method, part of the data is used to build an initial model. A sampling algorithm then selects high-quality samples from the data set, and these samples are used to retrain the whole model, yielding the final model. The experimental results show that VTE, HCE, and ALC all outperform random sampling. In addition, compared with other methods, the proposed method effectively improves the performance of the traditional model, indicating that the method is reliable and advanced.
topic active learning
sampling algorithm
hazard and operability analysis
deep learning
named entity recognition
url https://www.mdpi.com/2227-9717/9/7/1178
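The abstract describes a pool-based active-learning procedure: train an initial model on part of the data, use a sampling algorithm to pick samples from the remaining pool, and retrain on the enlarged labeled set. The record does not give the formulas for VTE, HCE, or ALC, so the Python sketch below is not the authors' method. It assumes a tagger that returns one tag-probability distribution per token and shows two standard uncertainty criteria, mean token entropy and least confidence (the names VTE and ALC suggest, but do not confirm, that they build on criteria of this kind), plus the random-sampling baseline the abstract compares against. The names train_fn, predict_fn, and oracle_fn are hypothetical placeholders for a real NER model and a human annotator.

```python
import math
import random
from typing import Any, Callable, List, Sequence, Tuple

# Assumption: standard uncertainty criteria only, NOT the paper's VTE/HCE/ALC.
# Model output for one sentence: one probability distribution over tags per token.
TokenProbs = Sequence[Sequence[float]]


def token_entropy(probs: TokenProbs) -> float:
    """Mean Shannon entropy (nats) of the per-token tag distributions."""
    if not probs:
        return 0.0
    total = 0.0
    for dist in probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(probs)


def least_confidence(probs: TokenProbs) -> float:
    """1 - probability of the most likely tag sequence, treating tokens as
    independent (a CRF tagger would use the Viterbi path probability instead)."""
    best_path = 1.0
    for dist in probs:
        best_path *= max(dist)
    return 1.0 - best_path


def select_batch(pool: Sequence[TokenProbs],
                 score: Callable[[TokenProbs], float], k: int) -> List[int]:
    """Indices of the k pool sentences with the highest uncertainty score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


def random_batch(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Random-sampling baseline: k distinct indices drawn uniformly."""
    return random.Random(seed).sample(range(pool_size), k)


def active_learning_loop(
    labeled: List[Tuple[Any, Any]],
    unlabeled: List[Any],
    train_fn: Callable[[List[Tuple[Any, Any]]], Any],  # hypothetical trainer
    predict_fn: Callable[[Any, Any], TokenProbs],      # hypothetical tagger
    oracle_fn: Callable[[Any], Any],                   # stands in for an annotator
    score: Callable[[TokenProbs], float],
    k: int,
    rounds: int,
) -> Tuple[Any, List[Tuple[Any, Any]]]:
    """Initial model, score the pool, label the top-k, retrain; repeat."""
    model = train_fn(labeled)  # initial model from part of the data
    for _ in range(rounds):
        if not unlabeled:
            break
        probs = [predict_fn(model, x) for x in unlabeled]
        picked = set(select_batch(probs, score, min(k, len(unlabeled))))
        labeled = labeled + [(unlabeled[i], oracle_fn(unlabeled[i]))
                             for i in sorted(picked)]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picked]
        model = train_fn(labeled)  # retrain on all labeled data so far
    return model, labeled


if __name__ == "__main__":
    pool = [
        [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident sentence
        [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]],  # near-uniform, most uncertain
        [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]],  # moderately uncertain
    ]
    print(select_batch(pool, token_entropy, 2))     # → [1, 2]
    print(select_batch(pool, least_confidence, 2))  # → [1, 2]
```

Swapping token_entropy for least_confidence, or select_batch for random_batch, changes only the sampling criterion while the surrounding loop stays fixed, which is what allows a like-for-like comparison against random sampling.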