A Case Study of the New York City 2012-2013 Influenza Season With Daily Geocoded Twitter Data From Temporal and Spatiotemporal Perspectives

Bibliographic Details
Main Authors:	Nagar, Ruchit, Yuan, Qingyu, Freifeld, Clark C, Santillana, Mauricio, Nojima, Aaron, Chunara, Rumi, Brownstein, John S
Format:	Article
Language:	English
Published:	JMIR Publications 2014-10-01
Series:	Journal of Medical Internet Research
Online Access:	http://www.jmir.org/2014/10/e236/
ISSN:	1438-8871
DOI:	10.2196/jmir.3416
Volume/Issue:	16(10), e236

Abstract

Background: Twitter has shown some usefulness in predicting influenza cases on a weekly basis in multiple countries and at different geographic scales. Recently, Broniatowski and colleagues suggested that Twitter is relevant at the city level for New York City. Here, we examine the New York City case in more depth by analyzing daily Twitter data from temporal and spatiotemporal perspectives. We also manually coded all tweets to gain qualitative insights that can help direct future automated searches.

Objective: The study had two aims. The first was to validate the temporal predictive strength of daily Twitter data for influenza-like illness emergency department (ILI-ED) visits during the New York City 2012-2013 influenza season. This was tested against other available and established datasets (Google search queries, or GSQ). The second was to examine the spatial distribution and spread of geocoded tweets as proxies for potential cases.

Methods: From the Twitter Streaming API, 2972 tweets matching the keywords “flu”, “influenza”, “gripe”, and “high fever” were collected in the New York City region. The tweets were categorized according to the scheme developed by Lamb et al. A fourth category was added: the evaluator's estimate of the probability that the subject(s) of the tweet were sick. This category reflects confidence in the validity of the statement. Daily tweet counts and daily GSQ volume were each correlated against daily ILI-ED visits. The best-performing models were then used in linear regressions to forecast ILI-ED visits. A weighted retrospective Poisson model in SaTScan software (n=1484) and a vector map were used for spatiotemporal analysis.

Results: Infection-related tweets (R=.763) correlated better with ILI-ED visits than the GSQ time series (R=.683) for the same keywords. They also had a lower mean absolute percentage error (8.4 vs 11.8) for ILI-ED visit prediction in January, the most volatile month of the flu season. SaTScan identified a primary outbreak cluster of high-probability infection tweets in Northern Brooklyn. This cluster had a relative risk of 2.74 compared with medium-probability infection tweets (P=.001), within a radius that includes Barclays Center and the Atlantic Avenue Terminal.

Conclusions: While others have looked at weekly regional tweets, this study is the first to stress-test Twitter with daily city-level data for New York City. Extracting personal testimonies from infection-related tweets suggests that Twitter is stronger, both qualitatively and quantitatively, for ILI-ED prediction than alternative daily datasets that mix in awareness-based data, such as GSQ. Additionally, granular Twitter data provide important spatiotemporal insights. A tweet vector map may be useful for visualizing city-level spread when local gold-standard data are otherwise unavailable.
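The temporal analysis summarized in the Methods and Results above can be sketched in Python. It has four steps: keyword filtering, correlating daily series, fitting a linear regression forecast and scoring forecast error. This is a minimal illustration under stated assumptions and is not the authors' code. The following are assumptions, not details taken from the study:

- whole-word, case-insensitive keyword matching;
- Pearson's r as the reported R;
- ordinary least squares as the regression;
- mean absolute percentage error as the error metric.

All function names are hypothetical, and the demo numbers are invented.

```python
import math
import re

# Keywords listed in the abstract. Whole-word, case-insensitive matching is an
# assumption; the abstract does not say how the match was performed.
KEYWORDS = ["flu", "influenza", "gripe", "high fever"]
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)


def matches_keywords(text):
    """Return True if the tweet text contains any tracked keyword (hypothetical helper)."""
    return KEYWORD_PATTERN.search(text) is not None


def pearson_r(x, y):
    """Pearson correlation between two equal-length daily series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical daily counts for illustration only; these are NOT study data.
    tweets = [12, 18, 25, 30, 22, 15, 10]
    ed_visits = [300, 360, 450, 520, 430, 350, 290]

    # Fit on the first five days, then forecast the remaining two.
    intercept, slope = fit_line(tweets[:5], ed_visits[:5])
    forecast = [intercept + slope * t for t in tweets[5:]]

    print("R =", round(pearson_r(tweets, ed_visits), 3))
    print("MAPE =", round(mape(ed_visits[5:], forecast), 1), "%")
```

To mirror the comparison the abstract reports, run the same functions on a GSQ volume series in place of the tweet series and compare the two R and error values. A real analysis may also need to account for reporting lags and day-of-week effects, which this sketch ignores.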