Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study

Background: Drug prescriptions are often recorded in free-text clinical narratives; making this information available in a structured form is important to support many health-related tasks. Although several natural language processing (NLP) methods have been proposed to extract...


Bibliographic Details
Main Authors:	Alfattni, Ghada; Belousov, Maksim; Peek, Niels; Nenadic, Goran
Format:	Article
Language:	English
Published:	JMIR Publications 2021-05-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2021/5/e24678

id	doaj-6d4fa008bb49497b9de20e57b2c81726
record_format	Article
spelling	doaj-6d4fa008bb49497b9de20e57b2c81726 | 2021-05-05T13:16:24Z | eng | JMIR Publications | JMIR Medical Informatics | 2291-9694 | 2021-05-01 | 9 | 5 | e24678 | 10.2196/24678 | Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study | Alfattni, Ghada; Belousov, Maksim; Peek, Niels; Nenadic, Goran | https://medinform.jmir.org/2021/5/e24678
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Alfattni, Ghada; Belousov, Maksim; Peek, Niels; Nenadic, Goran
author_facet	Alfattni, Ghada; Belousov, Maksim; Peek, Niels; Nenadic, Goran
author_sort	Alfattni, Ghada
title	Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study
publisher	JMIR Publications
series	JMIR Medical Informatics
issn	2291-9694
publishDate	2021-05-01
description	Background: Drug prescriptions are often recorded in free-text clinical narratives; making this information available in a structured form is important to support many health-related tasks. Although several natural language processing (NLP) methods have been proposed to extract such information, many challenges remain. Objective: This study evaluates the feasibility of using NLP and deep learning approaches for extracting and linking drug names and associated attributes identified in clinical free-text notes and presents an extensive error analysis of different methods. The study began with participation in the 2018 National NLP Clinical Challenges (n2c2) shared task on adverse drug events and medication extraction. Methods: The proposed system (DrugEx) consists of a named entity recognizer (NER) to identify drugs and associated attributes and a relation extraction (RE) method to identify the relations between them. For NER, we explored deep learning-based approaches (ie, bidirectional long short-term memory with conditional random fields [BiLSTM-CRFs]) with various embeddings (ie, word embedding, character embedding [CE], and semantic-feature embedding) to investigate how different embeddings influence performance. A rule-based method was implemented for RE and compared with a context-aware long short-term memory (LSTM) model. The methods were trained and evaluated using the 2018 n2c2 shared task data. Results: The experiments showed that the best model (BiLSTM-CRFs with pretrained word embeddings [PWE] and CE) achieved lenient micro F-scores of 0.921 for NER, 0.927 for RE, and 0.855 for the end-to-end system. The NER model that relies on pretrained word and semantic embeddings performed better on most individual entity types, but NER with PWE and CE had the highest classification efficiency among the proposed approaches. Extracting relations using the rule-based method achieved higher accuracy than the context-aware LSTM for most relations. Interestingly, the LSTM model performed notably better on the reason-drug relations, the most challenging relation type. Conclusions: The proposed end-to-end system achieved encouraging results and demonstrated the feasibility of using deep learning methods to extract medication information from free-text data.
url	https://medinform.jmir.org/2021/5/e24678


Similar Items