A Named Entity Recognition system applied to Arabic text in the medical domain
Currently, 30-35% of the global population uses the Internet. Furthermore, there is a rapidly increasing number of non-English language internet users, accompanied by an also increasing amount of unstructured text online. One area replete with underexploited online text is the Arabic medical domain,...
Main Author: | |
---|---|
Published: |
Staffordshire University
2017
|
Subjects: | |
Online Access: | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714652 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-714652 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-7146522018-09-05T03:19:51ZA Named Entity Recognition system applied to Arabic text in the medical domainAlanazi, Saad2017Currently, 30-35% of the global population uses the Internet. Furthermore, there is a rapidly increasing number of non-English language internet users, accompanied by an also increasing amount of unstructured text online. One area replete with underexploited online text is the Arabic medical domain, and one method that can be used to extract valuable data from Arabic medical texts is Named Entity Recognition (NER). NER is the process by which a system can automatically detect and categorise Named Entities (NE). NER has numerous applications in many domains, and medical texts are no exception. NER applied to the medical domain could assist in detection of patterns in medical records, allowing doctors to make better diagnoses and treatment decisions, enabling medical staff to quickly assess a patient's records and ensuring that patients are informed about their data, as just a few examples. However, all these applications would require a very high level of accuracy. To improve the accuracy of NER in this domain, new approaches need to be developed that are tailored to the types of named entities to be extracted and categorised. In an effort to solve this problem, this research applied Bayesian Belief Networks (BBN) to the process. BBN, a probabilistic model for prediction of random variables and their dependencies, can be used to detect and predict entities. The aim of this research is to apply BBN to the NER task to extract relevant medical entities such as disease names, symptoms, treatment methods, and diagnosis methods from modern Arabic texts in the medical domain. To achieve this aim, a new corpus related to the medical domain has been built and annotated. Our BBN approach achieved a 96.60% precision, 90.79% recall, and 93.60% F-measure for the disease entity, while for the treatment method entity, it achieved 69.33%, 70.99%, and 70.15% for precision, recall, and F-measure, respectively. For the diagnosis method and symptom categories, our system achieved 84.91% and 71.34%, respectively, for precision, 53.36% and 49.34%, respectively, for recall, and 65.53% and 58.33%, for F-measure, respectively. Our BBN strategy achieved good accuracy for NEs in the categories of disease and treatment method. However, the average word length of the other two NE categories observed, diagnosis method and symptom, may have had a negative effect on their accuracy. Overall, the application of BBN to Arabic medical NER is successful, but more development is needed to improve accuracy to a standard at which the results can be applied to real medical systems.610.28Staffordshire Universityhttps://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714652http://eprints.staffs.ac.uk/3129/Electronic Thesis or Dissertation |
collection |
NDLTD |
sources |
NDLTD |
topic |
610.28 |
spellingShingle |
610.28 Alanazi, Saad A Named Entity Recognition system applied to Arabic text in the medical domain |
description |
Currently, 30-35% of the global population uses the Internet. Furthermore, there is a rapidly increasing number of non-English language internet users, accompanied by an also increasing amount of unstructured text online. One area replete with underexploited online text is the Arabic medical domain, and one method that can be used to extract valuable data from Arabic medical texts is Named Entity Recognition (NER). NER is the process by which a system can automatically detect and categorise Named Entities (NE). NER has numerous applications in many domains, and medical texts are no exception. NER applied to the medical domain could assist in detection of patterns in medical records, allowing doctors to make better diagnoses and treatment decisions, enabling medical staff to quickly assess a patient's records and ensuring that patients are informed about their data, as just a few examples. However, all these applications would require a very high level of accuracy. To improve the accuracy of NER in this domain, new approaches need to be developed that are tailored to the types of named entities to be extracted and categorised. In an effort to solve this problem, this research applied Bayesian Belief Networks (BBN) to the process. BBN, a probabilistic model for prediction of random variables and their dependencies, can be used to detect and predict entities. The aim of this research is to apply BBN to the NER task to extract relevant medical entities such as disease names, symptoms, treatment methods, and diagnosis methods from modern Arabic texts in the medical domain. To achieve this aim, a new corpus related to the medical domain has been built and annotated. Our BBN approach achieved a 96.60% precision, 90.79% recall, and 93.60% F-measure for the disease entity, while for the treatment method entity, it achieved 69.33%, 70.99%, and 70.15% for precision, recall, and F-measure, respectively. For the diagnosis method and symptom categories, our system achieved 84.91% and 71.34%, respectively, for precision, 53.36% and 49.34%, respectively, for recall, and 65.53% and 58.33%, for F-measure, respectively. Our BBN strategy achieved good accuracy for NEs in the categories of disease and treatment method. However, the average word length of the other two NE categories observed, diagnosis method and symptom, may have had a negative effect on their accuracy. Overall, the application of BBN to Arabic medical NER is successful, but more development is needed to improve accuracy to a standard at which the results can be applied to real medical systems. |
author |
Alanazi, Saad |
author_facet |
Alanazi, Saad |
author_sort |
Alanazi, Saad |
title |
A Named Entity Recognition system applied to Arabic text in the medical domain |
title_short |
A Named Entity Recognition system applied to Arabic text in the medical domain |
title_full |
A Named Entity Recognition system applied to Arabic text in the medical domain |
title_fullStr |
A Named Entity Recognition system applied to Arabic text in the medical domain |
title_full_unstemmed |
A Named Entity Recognition system applied to Arabic text in the medical domain |
title_sort |
named entity recognition system applied to arabic text in the medical domain |
publisher |
Staffordshire University |
publishDate |
2017 |
url |
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714652 |
work_keys_str_mv |
AT alanazisaad anamedentityrecognitionsystemappliedtoarabictextinthemedicaldomain AT alanazisaad namedentityrecognitionsystemappliedtoarabictextinthemedicaldomain |
_version_ |
1718728223546671104 |