An Improved Method for Named Entity Recognition and Its Application to CEMR

Bibliographic Details
Main Authors: Ming Gao, Qifeng Xiao, Shaochun Wu, Kun Deng
Format: Article
Language: English
Published: MDPI AG 2019-08-01
Series: Future Internet
Subjects: clinical electronic records; named entity recognition; convolutional neural network
Online Access: https://www.mdpi.com/1999-5903/11/9/185
id doaj-8728ed63a0e743418a87891d72b59427
record_format Article
spelling doaj-8728ed63a0e743418a87891d72b59427 2020-11-25T01:35:11Z eng
MDPI AG, Future Internet, ISSN 1999-5903, 2019-08-01, vol. 11, no. 9, art. 185
doi:10.3390/fi11090185 (fi11090185)
Ming Gao, Qifeng Xiao, Shaochun Wu, Kun Deng (all: Department of Intelligent Information Processing, Shanghai University, Shanghai 200444, China)
collection DOAJ
language English
format Article
sources DOAJ
author Ming Gao
Qifeng Xiao
Shaochun Wu
Kun Deng
title An Improved Method for Named Entity Recognition and Its Application to CEMR
publisher MDPI AG
series Future Internet
issn 1999-5903
publishDate 2019-08-01
description Named Entity Recognition (NER) on Clinical Electronic Medical Records (CEMR) is a fundamental step in extracting disease knowledge: it identifies specific entity terms such as diseases and symptoms. However, state-of-the-art NER methods based on Long Short-Term Memory (LSTM) fail to fully exploit GPU parallelism when processing massive volumes of medical records. Although a more recent NER method based on Iterated Dilated CNNs (ID-CNNs) can accelerate network computation, it tends to ignore the word-order feature and the semantic information of the current word. To enhance the performance of ID-CNNs-based models on NER tasks, an attention-based ID-CNNs-CRF model that combines the word-order feature with local context is proposed. First, position embedding is used to incorporate word-order information. Second, the ID-CNNs architecture is used to extract global semantic information rapidly. At the same time, an attention mechanism is employed to focus on the local context. Finally, a CRF is applied to obtain the optimal tag sequence. Experiments conducted on two CEMR datasets show that the model outperforms traditional ones: it achieves F1-scores of 94.55% and 91.17% on the two datasets, respectively, both better than those of LSTM-based models.
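The description credits dilated convolutions with extracting wide context quickly. As a rough illustration only, the sketch below stacks a few 1D convolutions with growing dilation and shows how many input positions end up influencing one output position. The kernel width, the dilation schedule (1, 2, 4), the ReLU, and all function names are assumptions chosen for the example; they are not taken from the article, and the sketch omits embeddings, attention, and training.

```python
from typing import List, Sequence


def dilated_conv1d(xs: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D convolution with zero padding and a given dilation."""
    half = len(kernel) // 2
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Kernel taps sit `dilation` positions apart, centred on t.
            j = t + (k - half) * dilation
            if 0 <= j < len(xs):
                acc += w * xs[j]
        out.append(acc)
    return out


def dilated_block(xs: Sequence[float], kernel: Sequence[float],
                  dilations: Sequence[int]) -> List[float]:
    """Apply one convolution per dilation rate, each followed by a ReLU."""
    h = list(xs)
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


# Hypothetical demo: a single impulse in the middle of a 21-token sequence.
impulse = [0.0] * 21
impulse[10] = 1.0
response = dilated_block(impulse, [1.0, 1.0, 1.0], [1, 2, 4])
touched = [i for i, v in enumerate(response) if v > 0]
```

With three layers of width 3, `receptive_field(3, [1, 2, 4])` is 15, and `touched` covers the 15 positions around the impulse. Each layer processes all positions independently, which is the property that makes this kind of network easier to parallelize than a recurrent one.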
topic clinical electronic records
named entity recognition
convolutional neural network
url https://www.mdpi.com/1999-5903/11/9/185
_version_ 1725067988769439744
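
The description field states that a CRF is applied to obtain the optimal tag sequence. A common way to decode a linear-chain CRF is the Viterbi algorithm; the sketch below is a minimal version of that idea, not the authors' implementation. The tag names (`B-DIS`, `I-DIS`, `O`), the emission scores, and the transition scores are all made-up values for illustration.

```python
from typing import Dict, List, Sequence


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag path for per-token emission scores
    and tag-to-tag transition scores (transitions[prev][cur])."""
    tags = list(emissions[0])
    score = dict(emissions[0])          # best score of a path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in tags:
            # Best previous tag for a path that ends in `cur` at this token.
            prev = max(tags, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = score[prev] + transitions[prev][cur] + emit[cur]
            pointers[cur] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores: moving from O straight to I-DIS is heavily penalized.
TAGS = ["O", "B-DIS", "I-DIS"]
transitions = {p: {c: 0.0 for c in TAGS} for p in TAGS}
transitions["O"]["I-DIS"] = -10.0
emissions = [
    {"O": 1.0, "B-DIS": 0.8, "I-DIS": 0.0},
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
    {"O": 1.0, "B-DIS": 0.0, "I-DIS": 0.0},
]
best_path = viterbi(emissions, transitions)
greedy_path = [max(e, key=e.get) for e in emissions]
```

Here picking the top tag per token (`greedy_path`) yields `O, I-DIS, O`, which starts an entity with an inside tag, whereas `best_path` is `B-DIS, I-DIS, O`. Scoring the whole sequence jointly is what lets the decoder avoid such inconsistent labelings.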