Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning

Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitou...

Bibliographic Details
Main Authors: Shoayee Alotaibi, Rashid Mehmood, Iyad Katib, Omer Rana, Aiiad Albeshri
Format: Article
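Language: English
Published: MDPI AG 2020-02-01
Series: Applied Sciences
Subjects: smart cities; healthcare; apache spark; disease detection; symptoms detection; arabic language; saudi dialect; twitter; machine learning; big data; high performance computing (hpc)
Online Access: https://www.mdpi.com/2076-3417/10/4/1398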
id doaj-31be9e5fe4de496e9ec45bccde2970fb
record_format Article
spelling doaj-31be9e5fe4de496e9ec45bccde2970fb
2020-11-25T02:03:24Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-02-01
10
4
1398
10.3390/app10041398
app10041398
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning
Shoayee Alotaibi (0)
Rashid Mehmood (1)
Iyad Katib (2)
Omer Rana (3)
Aiiad Albeshri (4)
(0) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(1) High-Performance Computing Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(2) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(3) School of Computer Science, Cardiff University, Cardiff CF10 3AT, UK
(4) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.
https://www.mdpi.com/2076-3417/10/4/1398
smart cities
healthcare
apache spark
disease detection
symptoms detection
arabic language
saudi dialect
twitter
machine learning
big data
high performance computing (hpc)
collection DOAJ
language English
format Article
sources DOAJ
author Shoayee Alotaibi
Rashid Mehmood
Iyad Katib
Omer Rana
Aiiad Albeshri
title Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-02-01
description Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.
topic smart cities
healthcare
apache spark
disease detection
symptoms detection
arabic language
saudi dialect
twitter
machine learning
big data
high performance computing (hpc)
url https://www.mdpi.com/2076-3417/10/4/1398