Korean clinical entity recognition from diagnosis text using BERT

Abstract Background: While clinical entity recognition mostly targets electronic health records (EHRs), there is also demand for handling other types of text data. Automatic medical diagnosis is an example of a new application that uses a different data source. In this work, we are interested...


Bibliographic Details
Main Authors:	Young-Min Kim, Tae-Hoon Lee
Format:	Article
Language:	English
Published:	BMC 2020-09-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Clinical entity recognition, BERT, Korean, Diagnosis text
Online Access:	http://link.springer.com/article/10.1186/s12911-020-01241-8

id	doaj-10e56fe0f1de4c6295c4ee810788924b
record_format	Article
spelling	doaj-10e56fe0f1de4c6295c4ee810788924b; 2020-11-25T03:57:21Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2020-09-01; 20S719; 10.1186/s12911-020-01241-8; Korean clinical entity recognition from diagnosis text using BERT; Young-Min Kim (0); Tae-Hoon Lee (1); Graduate School of Technology & Innovation Management, Hanyang University; Division of Interdisciplinary Industrial Studies, Hanyang University; Abstract Background: While clinical entity recognition mostly targets electronic health records (EHRs), there is also demand for handling other types of text data. Automatic medical diagnosis is an example of a new application that uses a different data source. In this work, we are interested in extracting Korean clinical entities from a new medical dataset that is completely different from EHRs. The dataset is collected from an online QA site for medical diagnosis. Bidirectional Encoder Representations from Transformers (BERT), one of the best language representation models, is used to extract the entities. Results: A slightly modified version of the BERT labeling strategy replaces the original labeling to enhance the separation of postpositions in Korean. A new clinical entity recognition dataset that we construct, as well as a standard NER dataset, has been used for the experiments. A pre-trained multilingual BERT model is used to initialize the entity recognition model. BERT significantly outperforms a character-level bidirectional LSTM-CRF, a benchmark model, on all metrics. The micro-averaged precision, recall, and f1 of BERT are 0.83, 0.85, and 0.84, whereas those of bi-LSTM-CRF are 0.82, 0.79, and 0.81, respectively. The recall values of BERT are notably better than those of the other model. This suggests that the trained BERT model detects out-of-vocabulary (OOV) words better than bi-LSTM-CRF. Conclusions: The recently developed BERT and its WordPiece tokenization are effective for Korean clinical entity recognition. The experiments, using a new dataset constructed for the purpose and a standard NER dataset, show the superiority of BERT over a state-of-the-art method. To the best of our knowledge, this work is one of the first studies dealing with clinical entity extraction from non-EHR data.; http://link.springer.com/article/10.1186/s12911-020-01241-8; Clinical entity recognition; BERT; Korean; Diagnosis text
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Young-Min Kim, Tae-Hoon Lee
spellingShingle	Young-Min Kim, Tae-Hoon Lee, Korean clinical entity recognition from diagnosis text using BERT, BMC Medical Informatics and Decision Making, Clinical entity recognition, BERT, Korean, Diagnosis text
author_facet	Young-Min Kim, Tae-Hoon Lee
author_sort	Young-Min Kim
title	Korean clinical entity recognition from diagnosis text using BERT
title_short	Korean clinical entity recognition from diagnosis text using BERT
title_full	Korean clinical entity recognition from diagnosis text using BERT
title_fullStr	Korean clinical entity recognition from diagnosis text using BERT
title_full_unstemmed	Korean clinical entity recognition from diagnosis text using BERT
title_sort	korean clinical entity recognition from diagnosis text using bert
publisher	BMC
series	BMC Medical Informatics and Decision Making
issn	1472-6947
publishDate	2020-09-01
description	Abstract Background: While clinical entity recognition mostly targets electronic health records (EHRs), there is also demand for handling other types of text data. Automatic medical diagnosis is an example of a new application that uses a different data source. In this work, we are interested in extracting Korean clinical entities from a new medical dataset that is completely different from EHRs. The dataset is collected from an online QA site for medical diagnosis. Bidirectional Encoder Representations from Transformers (BERT), one of the best language representation models, is used to extract the entities. Results: A slightly modified version of the BERT labeling strategy replaces the original labeling to enhance the separation of postpositions in Korean. A new clinical entity recognition dataset that we construct, as well as a standard NER dataset, has been used for the experiments. A pre-trained multilingual BERT model is used to initialize the entity recognition model. BERT significantly outperforms a character-level bidirectional LSTM-CRF, a benchmark model, on all metrics. The micro-averaged precision, recall, and f1 of BERT are 0.83, 0.85, and 0.84, whereas those of bi-LSTM-CRF are 0.82, 0.79, and 0.81, respectively. The recall values of BERT are notably better than those of the other model. This suggests that the trained BERT model detects out-of-vocabulary (OOV) words better than bi-LSTM-CRF. Conclusions: The recently developed BERT and its WordPiece tokenization are effective for Korean clinical entity recognition. The experiments, using a new dataset constructed for the purpose and a standard NER dataset, show the superiority of BERT over a state-of-the-art method. To the best of our knowledge, this work is one of the first studies dealing with clinical entity extraction from non-EHR data.
topic	Clinical entity recognition, BERT, Korean, Diagnosis text
url	http://link.springer.com/article/10.1186/s12911-020-01241-8
work_keys_str_mv	AT youngminkim koreanclinicalentityrecognitionfromdiagnosistextusingbert AT taehoonlee koreanclinicalentityrecognitionfromdiagnosistextusingbert
_version_	1724461307615248384
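
The micro-averaged scores quoted in the description field can be sanity-checked against one another, because f1 is the harmonic mean of precision and recall. The sketch below is illustrative only and is not taken from the article: the function name `f1_score` and the `reported` dictionary are assumptions introduced here, and the inputs are the two-decimal values from the abstract, so the recomputed figures can differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) as quoted in the abstract, rounded to two decimals.
reported = {
    "BERT": (0.83, 0.85, 0.84),
    "bi-LSTM-CRF": (0.82, 0.79, 0.81),
}

for model, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: reported f1={reported_f1:.2f}, "
          f"recomputed from rounded P/R={recomputed:.4f}")
```

For BERT the recomputed value rounds to the reported 0.84. For bi-LSTM-CRF it comes out near 0.80 rather than 0.81; a gap of that size is compatible with the precision and recall having been rounded before publication, so it should not be read as an error in the article.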
