Distributed Online Pre-Processing Framework for Big Data Sentiment Analytics


Bibliographic Details
Published in: Journal of Artificial Intelligence and Data Mining
Main Authors: M. Molaei, D. Mohamadpur
Format: Article
Language: English
Publisher: Shahrood University of Technology, 2022-04-01
Online Access: https://jad.shahroodut.ac.ir/article_2394_3644e2e9116dad9870c3da1fa96f3f24.pdf
Abstract: Performing sentiment analysis on big data from social networks can help research and business projects extract useful insights from text-oriented content. In this paper, we propose a general pre-processing framework for sentiment analysis that combines FastText with Recurrent Neural Network (RNN) variants to prepare textual data efficiently. The framework consists of three stages: data cleansing, tweet padding, and extraction of word embeddings from FastText followed by conversion of tweets into these vectors. It is implemented using the DataFrame data structure in Apache Spark. Its main objective is to improve the performance of online sentiment analysis in terms of pre-processing time and to handle large-scale data volumes. In addition, we propose a distributed intelligent system for online social big data analytics, designed to store, process, and classify large amounts of information online. The proposed system can combine any word-embedding library, such as FastText, with different distributed deep learning models, such as LSTM or GRU. The evaluation results show that the proposed framework significantly improves on previous RDD-based methods in terms of processing time and data volume.
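The three pre-processing stages named in the abstract can be illustrated with a minimal sketch. The paper implements them on Apache Spark DataFrames; the sketch below shows only the per-tweet logic in plain Python, which a Spark job would typically apply row by row (for example through a user-defined function). The cleaning rules, the `max_len` padding length, the `<pad>` token, and the three-dimensional `EMBEDDINGS` table are hypothetical stand-ins chosen for illustration, not details taken from the paper. In particular, unknown words map to zero vectors here, whereas real FastText can build vectors for unseen words from subword n-grams.

```python
import re
from typing import Dict, List

# Hypothetical toy embedding table standing in for pre-trained FastText vectors.
# Real FastText vectors are usually much wider (e.g. 300 dimensions); 3 keeps the example small.
EMBEDDINGS: Dict[str, List[float]] = {
    "i": [0.1, 0.0, 0.2],
    "love": [0.9, 0.8, 0.1],
    "spark": [0.3, 0.5, 0.7],
}
PAD_TOKEN = "<pad>"  # hypothetical padding token
DIM = 3


def clean_tweet(text: str) -> List[str]:
    """Stage 1 (assumed rules): lowercase, drop URLs, @mentions and non-letters, then tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return text.split()


def pad_tokens(tokens: List[str], max_len: int) -> List[str]:
    """Stage 2: truncate or right-pad so every tweet has exactly max_len tokens."""
    return tokens[:max_len] + [PAD_TOKEN] * max(0, max_len - len(tokens))


def to_vectors(tokens: List[str]) -> List[List[float]]:
    """Stage 3: look up each token's vector; padding and unknown words become zero vectors."""
    zero = [0.0] * DIM
    return [list(EMBEDDINGS.get(tok, zero)) for tok in tokens]


def preprocess(text: str, max_len: int = 5) -> List[List[float]]:
    """Run the three stages in order, producing a max_len x DIM matrix per tweet."""
    return to_vectors(pad_tokens(clean_tweet(text), max_len))


if __name__ == "__main__":
    matrix = preprocess("I love #Spark! https://spark.apache.org @apache")
    print(len(matrix), len(matrix[0]))
```

Because every tweet is padded or truncated to the same length, each one becomes a fixed-size matrix, which is the per-example input shape an LSTM or GRU layer expects.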
Record ID: doaj-art-e5d78d4ad3aa42d8bbf86d6d87f547fa
Institution: Directory of Open Access Journals
ISSN: 2322-5211, 2322-4444
Volume 10, Issue 2, pp. 197-205
DOI: 10.22044/jadm.2022.11330.2293
Affiliation: Department of Computer Engineering, University of Zanjan, Iran (both authors)
Keywords: bigdata, pre-processing, apache-spark, dataframe, rnn