TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents

In Natural Language Processing (NLP), topic modeling is a technique for extracting abstract information from documents that contain a large amount of text. This abstract information leads to the identification of the topics in the document. One way to retrieve topics from documents is keyphrase extraction....

Bibliographic Details
Main Authors: Ahmad Amin, Toqir A. Rana, Natash Ali Mian, Muhammad Waseem Iqbal, Abbas Khalid, Tahir Alyas, Mohammad Tubishat
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Topic extraction; top-rank; keyphrase extraction; topic prediction; Urdu positional ranking
Online Access: https://ieeexplore.ieee.org/document/9265205/
id doaj-6724720762db4d408608da11933ff3a0
record_format Article
spelling doaj-6724720762db4d408608da11933ff3a0
2021-03-30T03:52:04Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2020-01-01
Volume 8, pages 212675-212686
DOI 10.1109/ACCESS.2020.3039548
IEEE Xplore document 9265205
TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents
Ahmad Amin (https://orcid.org/0000-0003-0302-5177), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
Toqir A. Rana (https://orcid.org/0000-0003-4353-7150), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
Natash Ali Mian, School of Computer and Information Technology, Beaconhouse National University, Lahore, Pakistan
Muhammad Waseem Iqbal, Department of Computer Science and IT, The Superior College (University Campus), Lahore, Pakistan
Abbas Khalid (https://orcid.org/0000-0002-0007-0354), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
Tahir Alyas (https://orcid.org/0000-0003-0938-3127), Department of Computer Science, Lahore Garrison University, Lahore, Pakistan
Mohammad Tubishat (https://orcid.org/0000-0003-1464-8345), School of Technology and Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
collection DOAJ
language English
format Article
sources DOAJ
author Ahmad Amin
Toqir A. Rana
Natash Ali Mian
Muhammad Waseem Iqbal
Abbas Khalid
Tahir Alyas
Mohammad Tubishat
author_sort Ahmad Amin
title TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description In Natural Language Processing (NLP), topic modeling is a technique for extracting abstract information from documents that contain a large amount of text. This abstract information leads to the identification of the topics in the document. One way to retrieve topics from documents is keyphrase extraction. Keyphrases are a set of terms that give a high-level description of a document. Different keyphrase extraction techniques for topic prediction have been proposed for multiple languages, e.g., English and Arabic. However, this area remains to be explored for other languages, such as Urdu. Therefore, this paper introduces a novel unsupervised approach to topic prediction for the Urdu language that is able to extract more significant information from documents. For this purpose, the proposed TOP-Rank system extracts keywords from the document and ranks them according to their position in a sentence. These keywords, along with their ranking scores, are used to generate keyphrases by applying syntactic rules, in order to extract more meaningful topics. The keyphrases are ranked according to the keyword scores and re-ranked with respect to their positions in the document. Finally, the proposed model identifies the top-ranked keyphrases as topically significant, and the keyphrase with the highest score is selected as the topic of the document. Experiments are performed on two different datasets, and the performance of the proposed system is compared with existing state-of-the-art techniques. The results show that the proposed system outperforms existing techniques and is able to produce more meaningful topics.
topic Topic extraction
top-rank
keyphrase extraction
topic prediction
Urdu positional ranking
url https://ieeexplore.ieee.org/document/9265205/
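
Illustrative sketch (not part of the catalogued record). The description above outlines a pipeline: score keywords by their position in a sentence, build keyphrases from them, rank the keyphrases by keyword scores, re-rank by position in the document, and take the highest-scoring keyphrase as the topic. The Python below is a minimal sketch of that flow under loudly stated assumptions, not the authors' implementation. The abstract gives no formulas, so the 1/(position+1) weights, the tiny English stopword list, and every function name here are hypothetical. The paper's syntactic rules are not specified in the record either, so this sketch swaps in a much simpler stand-in: maximal runs of non-stopword tokens.

```python
from collections import defaultdict

# Hypothetical stopword list for the toy example; a real system for Urdu
# would need an Urdu stopword list and tokenizer.
STOPWORDS = {"the", "a", "of", "in", "is", "and", "for", "to"}


def keyword_scores(sentences):
    """Score each non-stopword token by its position within a sentence.

    Assumption: a token at 0-based position p contributes 1 / (p + 1),
    so earlier tokens weigh more. The record does not give the real weights.
    """
    scores = defaultdict(float)
    for tokens in sentences:
        for pos, tok in enumerate(tokens):
            if tok not in STOPWORDS:
                scores[tok] += 1.0 / (pos + 1)
    return dict(scores)


def candidate_phrases(sentences):
    """Yield (phrase_tokens, sentence_index) for maximal non-stopword runs.

    This is a stand-in for the paper's syntactic rules, which are not
    described in the record.
    """
    for idx, tokens in enumerate(sentences):
        run = []
        for tok in list(tokens) + [None]:  # None is a sentinel that flushes the last run
            if tok is not None and tok not in STOPWORDS:
                run.append(tok)
            elif run:
                yield tuple(run), idx
                run = []


def rank_phrases(sentences):
    """Rank keyphrases by summed keyword scores, re-weighted by document position.

    Assumption: a phrase found in sentence i (0-based) is scaled by 1 / (i + 1);
    a phrase that occurs several times keeps its best score.
    """
    kw = keyword_scores(sentences)
    best = {}
    for phrase, idx in candidate_phrases(sentences):
        score = sum(kw[tok] for tok in phrase) / (idx + 1)
        key = " ".join(phrase)
        best[key] = max(best.get(key, 0.0), score)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


def predict_topic(sentences):
    """Return the top-ranked keyphrase, or None for an empty document."""
    ranked = rank_phrases(sentences)
    return ranked[0][0] if ranked else None
```

Input is a list of tokenized sentences. For the two toy sentences "topic prediction for urdu documents" and "keyphrase extraction is a topic prediction method", predict_topic returns "topic prediction", because both of its keywords appear early in the first sentence. Any real use would need to replace the stopword run heuristic with actual part-of-speech based rules and tune the position weights on data.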