Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitou...
Main Authors: | Shoayee Alotaibi, Rashid Mehmood, Iyad Katib, Omer Rana, Aiiad Albeshri |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-02-01 |
Series: | Applied Sciences |
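Subjects: | smart cities; healthcare; apache spark; disease detection; symptoms detection; arabic language; saudi dialect; twitter; machine learning; big data; high performance computing (hpc) |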
Online Access: | https://www.mdpi.com/2076-3417/10/4/1398 |
id |
doaj-31be9e5fe4de496e9ec45bccde2970fb |
---|---|
record_format |
Article |
spelling |
doaj-31be9e5fe4de496e9ec45bccde2970fb; 2020-11-25T02:03:24Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-02-01; vol. 10, no. 4, art. 1398; DOI 10.3390/app10041398; app10041398
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning
Shoayee Alotaibi (0), Rashid Mehmood (1), Iyad Katib (2), Omer Rana (3), Aiiad Albeshri (4)
(0) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(1) High-Performance Computing Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(2) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(3) School of Computer Science, Cardiff University, Cardiff CF10 3AT, UK
(4) Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.
https://www.mdpi.com/2076-3417/10/4/1398
smart cities; healthcare; apache spark; disease detection; symptoms detection; arabic language; saudi dialect; twitter; machine learning; big data; high performance computing (hpc) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shoayee Alotaibi Rashid Mehmood Iyad Katib Omer Rana Aiiad Albeshri |
spellingShingle |
Shoayee Alotaibi Rashid Mehmood Iyad Katib Omer Rana Aiiad Albeshri Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning Applied Sciences smart cities healthcare apache spark disease detection symptoms detection arabic language saudi dialect machine learning big data high performance computing (hpc) |
author_facet |
Shoayee Alotaibi Rashid Mehmood Iyad Katib Omer Rana Aiiad Albeshri |
author_sort |
Shoayee Alotaibi |
title |
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning |
title_short |
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning |
title_full |
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning |
title_fullStr |
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning |
title_full_unstemmed |
Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning |
title_sort |
sehaa: a big data analytics tool for healthcare symptoms and diseases detection using twitter, apache spark, and machine learning |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-02-01 |
description |
Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics. |
topic |
smart cities healthcare apache spark disease detection symptoms detection arabic language saudi dialect machine learning big data high performance computing (hpc) |
url |
https://www.mdpi.com/2076-3417/10/4/1398 |
work_keys_str_mv |
AT shoayeealotaibi sehaaabigdataanalyticstoolforhealthcaresymptomsanddiseasesdetectionusingtwitterapachesparkandmachinelearning AT rashidmehmood sehaaabigdataanalyticstoolforhealthcaresymptomsanddiseasesdetectionusingtwitterapachesparkandmachinelearning AT iyadkatib sehaaabigdataanalyticstoolforhealthcaresymptomsanddiseasesdetectionusingtwitterapachesparkandmachinelearning AT omerrana sehaaabigdataanalyticstoolforhealthcaresymptomsanddiseasesdetectionusingtwitterapachesparkandmachinelearning AT aiiadalbeshri sehaaabigdataanalyticstoolforhealthcaresymptomsanddiseasesdetectionusingtwitterapachesparkandmachinelearning |
_version_ |
1724948586419978240 |