EXTENDED DISTRIBUTED PROTOTYPICAL FOR BIOMEDICAL NAMED ENTITY RECOGNITION

Biomedical Named Entity Recognition (Bio-NER) is an essential step in biomedical information extraction and biomedical text mining. Although much research has gone into the design of rule-based and supervised tools for general NER, Bio-NER remains a challenge and an area of active r...

Bibliographic Details
Main Authors: Maan Tareq Abd, Masnizah Mohd
Format: Article
Language: English
Published: UKM Press 2017-12-01
Series: Asia-Pacific Journal of Information Technology and Multimedia
Subjects: word representation, word embedding, biomedical named entity
Online Access: https://www.ukm.my/apjitm/view.php?id=41
id doaj-13883ab6ac784966bc32c416045689e9
record_format Article
spelling doaj-13883ab6ac784966bc32c416045689e9
2021-06-18T11:49:42Z
eng
UKM Press
Asia-Pacific Journal of Information Technology and Multimedia
2289-2192
2017-12-01
602111
https://doi.org/10.17576/apjitm-2017-0602-01
collection DOAJ
language English
format Article
sources DOAJ
author Maan Tareq Abd
Masnizah Mohd
title EXTENDED DISTRIBUTED PROTOTYPICAL FOR BIOMEDICAL NAMED ENTITY RECOGNITION
publisher UKM Press
series Asia-Pacific Journal of Information Technology and Multimedia
issn 2289-2192
publishDate 2017-12-01
description Biomedical Named Entity Recognition (Bio-NER) is an essential step in biomedical information extraction and biomedical text mining. Although much research has gone into the design of rule-based and supervised tools for general NER, Bio-NER remains a challenge and an area of active research, as there is still a gap of 10 F-score points between general newswire NER and Bio-NER. The complex structures of biomedical entities make them hard to recognize. To address this, this paper explores different word representations with a Support Vector Machine (SVM). First, it identifies and evaluates a set of morphological and contextual features with the SVM learning method for Bio-NER. It also presents an extended distributed representation word embedding technique (EDRWE) for Bio-NER. These models are evaluated on the GENIA corpus, a widely used standard Bio-NER dataset. Experimental results show that the EDRWE technique improves the overall performance of Bio-NER and outperforms all other representation methods. Analysis of the results shows that the new EDRWE is satisfactory and effective for Bio-NER, especially when only a small data set is available.
topic word representation
word embedding
biomedical named entity
url https://www.ukm.my/apjitm/view.php?id=41