An Unsupervised Approach for Keyphrase Extraction Using Within-Collection Resources

Bibliographic Details
Main Authors:	Teng-Fei Li, Liang Hu, Jian-Feng Chu, Hong-Tu Li, Ling Chi
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Phrase extraction; graph-based ranking; topic-based clustering; within-collection resource; NLP
Online Access:	https://ieeexplore.ieee.org/document/8819880/

DOI:	10.1109/ACCESS.2019.2938213
ISSN:	2169-3536
Volume:	7
Pages:	126088-126097
Affiliation:	College of Computer Science and Technology, Jilin University, Changchun, China (all authors)
ORCID:	Teng-Fei Li, https://orcid.org/0000-0002-7696-5779; Ling Chi, https://orcid.org/0000-0002-2716-9127

Abstract

It is hard to select and read suitable documents because the number of scholarly documents is growing rapidly. Keyphrases can be considered the gist of a document, so researchers can use keyphrase queries to find the documents they want. However, many scholarly documents have no keyphrases tagged by their authors or by other researchers. Automatic keyphrase extraction can help researchers obtain keyphrases quickly. This paper proposes an unsupervised approach to keyphrase extraction that uses graph-based ranking and topic-based clustering, under the assumption that only within-collection resources are available. Graph-based ranking describes the relevance between two words, and topic-based clustering embeds semantic information into words. We assume that each word has its own meaning and that each meaning can be considered a topic, even though nothing is known about these meanings in advance; topic-based clustering is used to assign the “correct meaning” to the “correct word”. In addition, by taking the relevance among phrases into account while using only within-collection resources, we can apply graph-based ranking to phrases: the edges of a graph built over phrases describe the hidden relevance between two phrases, and the edge weights measure the strength of the connection between them. With the addition of a position feature, our approach combines an enhanced graph-based ranking with topic-based clustering. Experiments on four datasets (KDD, WWW, GSN, and ACM) indicate that our approach performs better than the state-of-the-art methods.
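The abstract names three generic ingredients (graph-based ranking, topic-based clustering, and a position feature) without giving formulas. The following is a minimal Python sketch of how such ingredients are commonly combined, in the spirit of TextRank/PositionRank (PageRank over a word co-occurrence graph, biased toward words that appear early) and TopicRank (grouping candidate phrases into topics by word overlap). It is not the authors' algorithm: all function names, the window size, the damping factor 0.85, the 1/position prior, and the 0.25 Jaccard threshold are illustrative assumptions, and candidate phrases are assumed to be given already.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative sketch only; NOT the method of the paper above.
# All names and parameter values here are assumptions.


def cooccurrence_graph(tokens, window=2):
    """Undirected weighted word graph: edge weight counts how often two
    distinct words appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    return graph


def position_prior(tokens):
    """Position feature: each occurrence at 1-based position p adds 1/p,
    so words that appear early get more weight. Normalized to sum to 1."""
    prior = defaultdict(float)
    for pos, w in enumerate(tokens, start=1):
        prior[w] += 1.0 / pos
    total = sum(prior.values())
    return {w: s / total for w, s in prior.items()}


def biased_pagerank(graph, prior, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank whose teleport distribution is the position prior.
    Isolated words (no edges) only receive teleport mass."""
    nodes = list(prior)
    scores = dict(prior)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            incoming = sum(scores[m] * w / out_weight[m] for m, w in graph[n].items())
            new[n] = (1 - damping) * prior[n] + damping * incoming
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


def jaccard(a, b):
    """Word-overlap similarity between two phrases."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def cluster_topics(candidates, threshold=0.25):
    """Average-linkage agglomerative clustering: repeatedly merge the two
    most similar clusters while their average Jaccard similarity >= threshold."""
    clusters = [[c] for c in candidates]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(a, b) for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (i, j)
        if best < threshold:
            break
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def extract_keyphrases(tokens, candidates, top_k=3):
    """Score phrases by summed word scores, keep the best phrase per topic."""
    scores = biased_pagerank(cooccurrence_graph(tokens), position_prior(tokens))
    phrase_score = {c: sum(scores.get(w, 0.0) for w in c.split()) for c in candidates}
    reps = [max(cluster, key=phrase_score.get) for cluster in cluster_topics(candidates)]
    return sorted(reps, key=phrase_score.get, reverse=True)[:top_k]


if __name__ == "__main__":
    text = ("keyphrase extraction uses graph based ranking graph ranking scores "
            "words topic clustering groups keyphrase candidates")
    cands = ["keyphrase extraction", "graph based ranking", "graph ranking",
             "topic clustering", "keyphrase candidates"]
    print(extract_keyphrases(text.split(), cands))
```

In this toy run, "graph based ranking" and "graph ranking" fall into one topic and only the higher-scoring of the two is kept, which is the practical point of clustering before ranking: it avoids returning several near-duplicate phrases. Summing word scores favors longer phrases; averaging them is a common alternative, and which one suits a given collection is a design choice rather than something the abstract specifies.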

Similar Items