TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents
In Natural Language Processing (NLP), topic modeling is a technique for extracting abstract information from documents containing large amounts of text. This abstract information helps identify the topics in a document. One way to retrieve topics from documents is keyphrase extraction....
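Main Authors: | Ahmad Amin, Toqir A. Rana, Natash Ali Mian, Muhammad Waseem Iqbal, Abbas Khalid, Tahir Alyas, Mohammad Tubishat |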
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Topic extraction; top-rank; keyphrase extraction; topic prediction; Urdu positional ranking |
Online Access: | https://ieeexplore.ieee.org/document/9265205/ |
id |
doaj-6724720762db4d408608da11933ff3a0 |
---|---|
record_format |
Article |
spelling |
doaj-6724720762db4d408608da11933ff3a0; 2021-03-30T03:52:04Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 212675-212686; DOI 10.1109/ACCESS.2020.3039548; document 9265205.
TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents.
Authors:
0. Ahmad Amin (https://orcid.org/0000-0003-0302-5177), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
1. Toqir A. Rana (https://orcid.org/0000-0003-4353-7150), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
2. Natash Ali Mian, School of Computer and Information Technology, Beaconhouse National University, Lahore, Pakistan
3. Muhammad Waseem Iqbal, Department of Computer Science and IT, The Superior College (University Campus), Lahore, Pakistan
4. Abbas Khalid (https://orcid.org/0000-0002-0007-0354), Department of Computer Science and IT, The University of Lahore, Lahore, Pakistan
5. Tahir Alyas (https://orcid.org/0000-0003-0938-3127), Department of Computer Science, Lahore Garrison University, Lahore, Pakistan
6. Mohammad Tubishat (https://orcid.org/0000-0003-1464-8345), School of Technology and Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
Abstract: In Natural Language Processing (NLP), topic modeling is a technique for extracting abstract information from documents containing large amounts of text. This abstract information helps identify the topics in a document. One way to retrieve topics from documents is keyphrase extraction. Keyphrases are a set of terms that provide a high-level description of a document. Different keyphrase extraction techniques for topic prediction have been proposed for multiple languages, e.g., English and Arabic. However, this area remains to be explored for other languages such as Urdu. Therefore, this paper introduces a novel unsupervised approach to topic prediction for the Urdu language that is able to extract more significant information from documents. For this purpose, the proposed TOP-Rank system extracts keywords from the document and ranks them according to their position in a sentence. These keywords, along with their ranking scores, are used to generate keyphrases by applying syntactic rules to extract more meaningful topics. These keyphrases are ranked according to the keyword scores and re-ranked with respect to their positions in the document. Finally, the proposed model identifies the top-ranked keyphrases as topically significant, and the keyphrase with the highest score is selected as the topic of the document. Experiments are performed on two different datasets, and the performance of the proposed system is compared with existing state-of-the-art techniques. The results show that the proposed system outperforms existing techniques and is able to produce more meaningful topics.
https://ieeexplore.ieee.org/document/9265205/
Keywords: Topic extraction; top-rank; keyphrase extraction; topic prediction; Urdu positional ranking |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ahmad Amin, Toqir A. Rana, Natash Ali Mian, Muhammad Waseem Iqbal, Abbas Khalid, Tahir Alyas, Mohammad Tubishat |
author_sort |
Ahmad Amin |
title |
TOP-Rank: A Novel Unsupervised Approach for Topic Prediction Using Keyphrase Extraction for Urdu Documents |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
In Natural Language Processing (NLP), topic modeling is a technique for extracting abstract information from documents containing large amounts of text. This abstract information helps identify the topics in a document. One way to retrieve topics from documents is keyphrase extraction. Keyphrases are a set of terms that provide a high-level description of a document. Different keyphrase extraction techniques for topic prediction have been proposed for multiple languages, e.g., English and Arabic. However, this area remains to be explored for other languages such as Urdu. Therefore, this paper introduces a novel unsupervised approach to topic prediction for the Urdu language that is able to extract more significant information from documents. For this purpose, the proposed TOP-Rank system extracts keywords from the document and ranks them according to their position in a sentence. These keywords, along with their ranking scores, are used to generate keyphrases by applying syntactic rules to extract more meaningful topics. These keyphrases are ranked according to the keyword scores and re-ranked with respect to their positions in the document. Finally, the proposed model identifies the top-ranked keyphrases as topically significant, and the keyphrase with the highest score is selected as the topic of the document. Experiments are performed on two different datasets, and the performance of the proposed system is compared with existing state-of-the-art techniques. The results show that the proposed system outperforms existing techniques and is able to produce more meaningful topics. |
topic |
Topic extraction; top-rank; keyphrase extraction; topic prediction; Urdu positional ranking |
url |
https://ieeexplore.ieee.org/document/9265205/ |
_version_ |
1724182746658504704 |
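
The ranking pipeline summarized in the description field (score keywords by sentence position, build keyphrases, rank them by keyword scores, re-rank by document position, pick the top phrase) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the reciprocal-position keyword score, the division by first-sentence index, the tiny stopword list, and all function names here are assumptions made for illustration. In particular, runs of consecutive non-stopwords stand in for the paper's syntactic rules, which would normally need a part-of-speech tagger. English tokens are used only to keep the example readable.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would use an Urdu stopword lexicon.
STOPWORDS = {"the", "is", "a", "of", "in", "and", "to", "uses"}


def split_sentences(text):
    # Split on the Urdu full stop (U+06D4) as well as '.', '!' and '?'.
    return [s.strip() for s in re.split(r"[۔.!?]", text) if s.strip()]


def keyword_scores(sentences):
    # Assumed scoring: each occurrence adds 1 / (1-based position in its sentence),
    # so words near the start of a sentence score higher.
    scores = defaultdict(float)
    for sent in sentences:
        for pos, tok in enumerate(sent.lower().split(), start=1):
            if tok not in STOPWORDS:
                scores[tok] += 1.0 / pos
    return dict(scores)


def candidate_phrases(sentences):
    # Stand-in for syntactic rules: maximal runs of consecutive non-stopwords.
    # Maps each phrase to the index of the first sentence containing it.
    first_seen = {}
    for idx, sent in enumerate(sentences):
        run = []
        for tok in sent.lower().split() + [None]:  # None flushes the final run
            if tok is not None and tok not in STOPWORDS:
                run.append(tok)
            elif run:
                first_seen.setdefault(" ".join(run), idx)
                run = []
    return first_seen


def rank_phrases(text):
    sentences = split_sentences(text)
    kw = keyword_scores(sentences)
    ranked = []
    for phrase, first_idx in candidate_phrases(sentences).items():
        base = sum(kw[w] for w in phrase.split())   # rank by keyword scores
        ranked.append((phrase, base / (first_idx + 1)))  # re-rank by document position
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def predict_topic(text):
    # The highest-scoring keyphrase is taken as the document topic.
    ranked = rank_phrases(text)
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    sample = ("Keyphrase extraction is useful. "
              "Topic prediction uses keyphrase extraction. "
              "The ranking is positional.")
    print(predict_topic(sample))
```

With the sample text, phrases that appear early in the document and whose words tend to open sentences rise to the top; swapping in a real Urdu tokenizer, stopword list, and tagger-based phrase rules would be the first steps toward anything resembling the published system.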