Hybrid Feature Selection Framework for Sentiment Analysis on Large Corpora

Sentiment analysis has drawn considerable research attention in recent years owing to its applicability in determining users' opinions, sentiments and emotions from large collections of textual data. The goal of sentiment analysis centers on improving user experience by deploying robust...

Bibliographic Details
Main Authors: Kayode Sakariyau Adewole, Abdullateef Oluwagbemiga Balogun, Muiz Raheem, Muhammed K. Jimoh, Rasheed Gbenga Jimoh, Modinat Abolore Mabayoje, Fatima E. Usman-Hamza, Abimbola Ganiyat Akintola, Ayisat Wuraola Asaju-Gbolagade
Format: Article
Language: English
Published: Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT) 2021-06-01
Series: Jordanian Journal of Computers and Information Technology
Subjects: sentiment analysis; opinion mining; hybrid feature selection; boruta; recursive feature elimination
Online Access: http://www.ejmanager.com/fulltextpdf.php?mno=40755
id doaj-2f99587837af4aa096e13cc364d2244d
record_format Article
spelling doaj-2f99587837af4aa096e13cc364d2244d
2021-06-01T12:04:19Z
eng
Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
Jordanian Journal of Computers and Information Technology
ISSN 2413-9351
2021-06-01
vol. 7, no. 2, pp. 130-151
DOI 10.5455/jjcit.71-1609858713
article no. 40755
Hybrid Feature Selection Framework for Sentiment Analysis on Large Corpora
Kayode Sakariyau Adewole (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Abdullateef Oluwagbemiga Balogun (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Muiz Raheem (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Muhammed K. Jimoh (Department of Education Technology, University of Ilorin, Ilorin, Nigeria)
Rasheed Gbenga Jimoh (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Modinat Abolore Mabayoje (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Fatima E. Usman-Hamza (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Abimbola Ganiyat Akintola (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
Ayisat Wuraola Asaju-Gbolagade (Department of Computer Science, University of Ilorin, Ilorin, Nigeria)
collection DOAJ
language English
format Article
sources DOAJ
author Kayode Sakariyau Adewole
Abdullateef Oluwagbemiga Balogun
Muiz Raheem
Muhammed K. Jimoh
Rasheed Gbenga Jimoh
Modinat Abolore Mabayoje
Fatima E. Usman-Hamza
Abimbola Ganiyat Akintola
Ayisat Wuraola Asaju-Gbolagade
title Hybrid Feature Selection Framework for Sentiment Analysis on Large Corpora
publisher Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
series Jordanian Journal of Computers and Information Technology
issn 2413-9351
publishDate 2021-06-01
description Sentiment analysis has drawn considerable research attention in recent years owing to its applicability in determining users' opinions, sentiments and emotions from large collections of textual data. The goal of sentiment analysis centers on improving user experience by deploying robust techniques that mine opinions and emotions from large corpora. Although there are a number of studies on sentiment analysis and opinion mining from textual information, the existence of domain-specific words such as slang and abbreviations, together with grammatical mistakes, poses serious challenges to existing sentiment analysis methods. Therefore, research efforts have focused on finding the most discriminative attributes that can help capture users' opinions from textual datasets. This paper focuses on identifying an effective, discriminative subset of features that can aid the classification of users' opinions from large corpora. The study proposes a hybrid feature selection framework based on the hybridization of filter- and wrapper-based feature selection methods. Correlation feature selection (CFS), a filter-based approach, is hybridized with Boruta and Recursive Feature Elimination (RFE), which are wrapper-based feature selection methods, to identify the most discriminative feature subsets for sentiment analysis. Four publicly available datasets for sentiment analysis (Amazon, Yelp, IMDB and Kaggle) were used to evaluate the performance of the proposed framework. Three classification algorithms, Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF), were evaluated to ascertain the superiority of the proposed approach. Experimental results across the different contexts represented by these datasets showed that CFS combined with Boruta produced promising results, especially when the selected features are passed to the RF classifier. The proposed hybrid framework provides an effective way of predicting users' opinions and emotions while giving substantial consideration to predictive accuracy. [JJCIT 2021; 7(2): 130-151]
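The filter-then-wrapper idea in the description can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. The filter stage uses the usual CFS merit, k·r_cf / sqrt(k + k(k-1)·r_ff), with a fixed-size greedy forward search as a simplification; the wrapper stage is a plain backward elimination scored by a leave-one-out nearest-centroid classifier, which stands in for Boruta or RFE and is not either algorithm. The function names, the parameter k and the toy term-count matrix are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var) if var else 0.0

def cfs_merit(X, y, subset):
    """CFS merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    cols = [[row[j] for row in X] for j in subset]
    r_cf = sum(abs(pearson(c, y)) for c in cols) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    r_ff = sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_filter(X, y, k):
    """Filter stage (simplified): greedily add the feature that maximizes merit, up to k."""
    remaining = set(range(len(X[0])))
    selected = []
    while remaining and len(selected) < k:
        _, best = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroids[label] = [sum(row[j] for row in rows) / len(rows) for j in subset]
        point = [X[i][j] for j in subset]
        pred = min(centroids,
                   key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        hits += pred == y[i]
    return hits / len(X)

def wrapper_eliminate(X, y, subset):
    """Wrapper stage (stand-in for Boruta/RFE): drop features while accuracy does not fall."""
    subset = list(subset)
    score = loo_accuracy(X, y, subset)
    while len(subset) > 1:
        trial, drop = max((loo_accuracy(X, y, [f for f in subset if f != d]), d)
                          for d in subset)
        if trial < score:
            break
        subset.remove(drop)
        score = trial
    return subset, score

# Invented toy term counts; columns: "great", "bad", "movie", "the". Label 1 = positive.
X = [[2, 0, 1, 1], [1, 0, 0, 2], [2, 0, 1, 1], [1, 1, 0, 2],
     [0, 2, 1, 2], [0, 1, 0, 1], [0, 2, 1, 2], [0, 1, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

filtered = cfs_filter(X, y, k=3)                  # cheap filter narrows the candidates
final, score = wrapper_eliminate(X, y, filtered)  # costlier wrapper refines them
print("after filter:", filtered, "after wrapper:", final, "LOO accuracy:", score)
```

Running the filter first keeps the more expensive wrapper search confined to a short candidate list, which is the practical motivation for hybridizing the two families on large corpora.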
topic sentiment analysis
opinion mining
hybrid feature selection
boruta
recursive feature elimination
url http://www.ejmanager.com/fulltextpdf.php?mno=40755