Citation Intent Classification Using Word Embedding

Bibliographic Details
Main Authors: Muhammad Roman, Abdul Shahid, Shafiullah Khan, Anis Koubaa, Lisu Yu
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, vol. 9, pp. 9982-9995 (IEEE Xplore document 9319154)
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3050547
Subjects: Citation intent; citation analysis; citation context; citation motivation; citation function classification; word embedding
Online Access: https://ieeexplore.ieee.org/document/9319154/
Author Affiliations:
Muhammad Roman (ORCID 0000-0002-9035-2426): Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
Abdul Shahid (ORCID 0000-0002-6291-2641): Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, Saudi Arabia
Shafiullah Khan (ORCID 0000-0001-8363-2051): Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
Anis Koubaa: Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, Saudi Arabia
Lisu Yu (ORCID 0000-0001-8637-852X): School of Information Engineering, Nanchang University, Nanchang, China
Record ID: doaj-9631fdb0507d48eca8f44bed34f9c65a (collection: DOAJ; record timestamp 2021-03-30T15:04:11Z)

Description
Citation analysis is an active area of research for various reasons. So far, citation analysis has mainly relied on statistical approaches, which do not look into the internal context of citations. Analyzing citations more deeply with deep neural network algorithms may reveal interesting findings. The existing scholarly datasets are best suited for statistical approaches but lack citation context, intent, and section information. Furthermore, they are too small to be used with deep learning approaches. For citation intent analysis, a dataset must provide citation contexts labeled with different citation intent classes. Most existing datasets either lack labeled context sentences or contain samples too small to generalize from. In this study, we critically investigated the available datasets for citation intent and proposed an automated technique that labels each citation context with its citation intent. Using the proposed method, we annotated ten million citation contexts from the Citation Context Dataset (C2D) with citation intent. We applied Global Vectors (GloVe), InferSent, and Bidirectional Encoder Representations from Transformers (BERT) word embedding methods and compared their Precision, Recall, and F1 measures. BERT embeddings performed significantly better, achieving a Precision of 89%. The labeled dataset, which is freely available for research purposes, will enhance the study of citation context analysis. It can also serve as a benchmark dataset for identifying citation motivation and function from in-text citations.
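The description reports that GloVe, InferSent, and BERT were compared as embedding methods, but it does not say how word vectors are turned into a single vector per citation context. A common baseline for static embeddings such as GloVe is to average the vectors of a sentence's tokens. The sketch below assumes that approach; it is not necessarily the authors' pipeline. The 3-dimensional vectors in `TOY_VECTORS`, the whitespace tokenizer, and the function name `embed_context` are hypothetical stand-ins for a real pretrained GloVe file and preprocessing.

```python
"""Mean-pooled citation-context embedding from word vectors (hypothetical toy data)."""
from typing import Dict, List, Optional

# Hypothetical 3-dimensional "GloVe-like" vectors. Real GloVe files map each
# token to a much longer vector (for example 50 to 300 dimensions).
TOY_VECTORS: Dict[str, List[float]] = {
    "we": [0.1, 0.0, 0.2],
    "use": [0.3, 0.1, 0.0],
    "their": [0.0, 0.2, 0.1],
    "method": [0.4, 0.3, 0.2],
}


def embed_context(sentence: str, vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens.

    Assumption: naive lowercase whitespace tokenization; tokens missing from
    the vocabulary are skipped. Returns None if no token is known.
    """
    tokens = sentence.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    # Element-wise mean over all known token vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    print(embed_context("We use their method", TOY_VECTORS))
```

Averaging ignores word order and surrounding context, whereas contextual encoders such as BERT produce token vectors that depend on the whole sentence; the resulting fixed-length vector can then be fed to any downstream intent classifier.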
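The description compares the embedding methods by Precision, Recall, and F1 and reports 89% Precision for BERT, without stating whether scores are per class or averaged across classes. As a minimal sketch under that uncertainty, the code below computes one-vs-rest Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2PR / (P + R) for each intent label, plus their unweighted (macro) average. The label names `background`, `method`, and `result` and the `GOLD`/`PRED` lists are hypothetical toy data, not results from the paper.

```python
"""One-vs-rest Precision, Recall, and F1 per label, plus macro average (toy data)."""
from typing import Dict, List, Tuple

# Hypothetical gold and predicted citation intent labels.
GOLD = ["background", "method", "method", "result", "background", "method"]
PRED = ["background", "method", "background", "result", "method", "method"]


def per_class_scores(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, Tuple[float, float, float]] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        # Define each ratio as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of precision, recall, and F1 across labels."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    scores = per_class_scores(GOLD, PRED)
    for label, (p, r, f) in scores.items():
        print(f"{label}: P={p:.3f} R={r:.3f} F1={f:.3f}")
    print("macro (P, R, F1):", macro_average(scores))
```

Macro averaging weights every intent class equally; a micro or support-weighted average can differ noticeably when classes are imbalanced, so the averaging scheme matters when comparing reported scores.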