Citation Intent Classification Using Word Embedding
Main Authors: | Muhammad Roman, Abdul Shahid, Shafiullah Khan, Anis Koubaa, Lisu Yu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2021-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/9319154/ |
id |
doaj-9631fdb0507d48eca8f44bed34f9c65a |
---|---|
record_format |
Article |
doi |
10.1109/ACCESS.2021.3050547 |
volume |
9 |
pages |
9982-9995 |
orcid |
Muhammad Roman 0000-0002-9035-2426; Abdul Shahid 0000-0002-6291-2641; Shafiullah Khan 0000-0001-8363-2051; Lisu Yu 0000-0001-8637-852X |
affiliations |
Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan; Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, Saudi Arabia; School of Information Engineering, Nanchang University, Nanchang, China |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Muhammad Roman, Abdul Shahid, Shafiullah Khan, Anis Koubaa, Lisu Yu |
title |
Citation Intent Classification Using Word Embedding |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Citation analysis is an active area of research for various reasons. So far, mainly statistical approaches have been used for citation analysis, and these do not look into the internal context of citations. Deep analysis of citations using deep neural network algorithms may reveal interesting findings. The existing scholarly datasets are best suited to statistical approaches but lack citation context, intent, and section information. Furthermore, they are too small to be used with deep learning approaches. For citation intent analysis, a dataset must contain citation contexts labeled with different citation intent classes. Most of the available datasets either lack labeled context sentences or are too small for results to generalize. In this study, we critically investigated the available datasets for citation intent and proposed an automated technique to label citation contexts with citation intent. Using the proposed method, we annotated ten million citation contexts from the Citation Context Dataset (C2D) with citation intent. We applied the Global Vectors (GloVe), InferSent, and Bidirectional Encoder Representations from Transformers (BERT) word embedding methods and compared their Precision, Recall, and F1 measures. BERT embeddings performed significantly better, achieving an 89% Precision score. The labeled dataset, which is freely available for research purposes, will enhance the study of citation context analysis. Finally, it can be used as a benchmark dataset for identifying citation motivation and function from in-text citations. |
topic |
Citation intent; citation analysis; citation context; citation motivation; citation function classification; word embedding |
url |
https://ieeexplore.ieee.org/document/9319154/ |
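The description compares the GloVe, InferSent, and BERT embeddings by Precision, Recall, and F1, but the record does not say how per-class scores were averaged across intent classes. The following is a minimal Python sketch of the standard per-class definitions, assuming macro averaging (an unweighted mean over classes). The label names `background`, `method`, and `result` and the helper names `per_class_prf` and `macro_prf` are hypothetical illustrations, not the paper's C2D intent classes or code.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # class p was predicted but is wrong
            fn[g] += 1          # true class g was missed
    scores = {}
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        # F1 is the harmonic mean of precision and recall (0 if both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_prf(gold, pred):
    """Assumed macro averaging: unweighted mean of per-class P, R, F1."""
    scores = per_class_prf(gold, pred)
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical gold and predicted intent labels for six citation contexts.
    gold = ["background", "method", "result", "method", "background", "result"]
    pred = ["background", "method", "method", "method", "result", "result"]
    print(tuple(round(x, 3) for x in macro_prf(gold, pred)))  # (0.722, 0.667, 0.656)
```

Macro averaging gives every intent class equal weight, so a rare class counts as much as a frequent one. A weighted or micro average could yield a different figure from the same predictions, which is why the averaging scheme matters when reading a single reported score such as 89% Precision.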