Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014

As a pedagogical demonstration of Twitter data analysis, a case study of HIV/AIDS-related tweets around World AIDS Day, 2014, was presented. This study examined if Twitter users from countries with various income levels responded differently to World AIDS Day. The performance of support vector machi...

Bibliographic Details
Main Authors: Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, Su-I Hou
Format: Article
Language: English
Published: MDPI AG 2019-06-01
Series: Data
Subjects: global health; health promotion; HIV/AIDS; social media; supervised machine learning; Twitter
Online Access: https://www.mdpi.com/2306-5729/4/2/84
id doaj-ed6bd761ea7f44d3a9bfeb2961dda54d
record_format Article
spelling doaj-ed6bd761ea7f44d3a9bfeb2961dda54d; 2020-11-25T02:40:48Z; eng; MDPI AG; Data; ISSN 2306-5729; 2019-06-01; vol. 4, no. 2, art. 84; DOI 10.3390/data4020084; data4020084
Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014
Isaac Chun-Hai Fung: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
Jingjing Yin: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
Keisha D. Pressley: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
Carmen H. Duke: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
Chen Mo: Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA
Hai Liang: School of Journalism and Communication, Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
King-Wa Fu: Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong, China
Zion Tsz Ho Tse: School of Electrical and Computer Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
Su-I Hou: College of Community Innovation and Education, The University of Central Florida, Orlando, FL 32816, USA
collection DOAJ
language English
format Article
sources DOAJ
author Isaac Chun-Hai Fung
Jingjing Yin
Keisha D. Pressley
Carmen H. Duke
Chen Mo
Hai Liang
King-Wa Fu
Zion Tsz Ho Tse
Su-I Hou
title Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014
publisher MDPI AG
series Data
issn 2306-5729
publishDate 2019-06-01
description As a pedagogical demonstration of Twitter data analysis, a case study of HIV/AIDS-related tweets around World AIDS Day, 2014, was presented. This study examined whether Twitter users from countries with different income levels responded differently to World AIDS Day. The performance of support vector machine (SVM) models as classifiers of relevant tweets was evaluated. A total of 1,826 randomly sampled HIV/AIDS-related original tweets posted from November 30 through December 2, 2014, were manually coded. Logistic regression was applied to analyze the association between the World Bank-designated income level of users' self-reported countries and tweet content. To identify the optimal SVM model, 1,278 (70%) of the 1,826 sampled tweets were randomly selected as the training set, and the remaining 548 (30%) served as the test set. Another 180 tweets were separately sampled and coded as the held-out dataset. Compared with tweets from low-income countries, tweets from Organization for Economic Cooperation and Development (OECD) countries had 60% lower odds of mentioning epidemiology (adjusted odds ratio, aOR = 0.404; 95% CI: 0.166, 0.981) and three times the odds of mentioning compassion/support (aOR = 3.080; 95% CI: 1.179, 8.047). Tweets from lower-middle-income countries had 79% lower odds than tweets from low-income countries of mentioning HIV-affected sub-populations (aOR = 0.213; 95% CI: 0.068, 0.664). The optimal SVM model identified relevant tweets in the held-out dataset of 180 tweets with an F1 score of 0.72. This study demonstrated how students can be taught to analyze Twitter data using manual coding, regression models, and SVM models.
topic global health
health promotion
HIV/AIDS
social media
supervised machine learning
Twitter
url https://www.mdpi.com/2306-5729/4/2/84
_version_ 1724779692346572800
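
The arithmetic behind the figures quoted in the description field can be checked with a short Python sketch. It is only an illustration, not the authors' code: the function names are hypothetical, and the confusion-matrix counts passed to f1_score are invented for the example because the record does not report the study's counts. The sketch shows how a 70/30 split of 1,826 tweets gives 1,278 and 548, how an adjusted odds ratio translates into a percent change in odds (0.404 is roughly 60% lower, 0.213 roughly 79% lower, 3.080 roughly three times the odds), and how an F1 score is formed from precision and recall.

```python
from typing import Tuple


def split_sizes(n_total: int, train_fraction: float) -> Tuple[int, int]:
    """Return (train, test) sizes for a simple random split."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio to a percent change in odds.

    Negative values mean lower odds than the reference group.
    """
    return (odds_ratio - 1.0) * 100.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Sample size and split fraction as stated in the description field.
    print(split_sizes(1826, 0.70))

    # Adjusted odds ratios as stated in the description field.
    for aor in (0.404, 3.080, 0.213):
        print(f"aOR {aor:.3f}: {percent_change_in_odds(aor):+.1f}% change in odds")

    # Hypothetical counts, NOT taken from the study.
    print(f"F1 = {f1_score(tp=8, fp=2, fn=4):.2f}")
```

Note that F1 depends only on true positives, false positives, and false negatives, so it can differ from plain accuracy when the relevant and irrelevant classes are unbalanced.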