Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns

Background: Timely allocation of medical resources for coronavirus disease (COVID-19) requires early detection of regional outbreaks. Internet browsing data may predict case outbreaks in local populations that are yet to be confirmed. Objective: We investigated whether search-engine query patterns c...

Bibliographic Details
Main Authors: Cousins, Henry C (Author), Cousins, Clara C (Author), Harris, Alon (Author), Pasquale, Louis R (Author)
Other Authors: Massachusetts Institute of Technology. Department of Biological Engineering (Contributor)
Format: Article
Language: English
Published: JMIR Publications Inc., 2021-01-04T21:39:15Z.
Subjects:
Online Access: Get fulltext
LEADER 02249 am a22001813u 4500
001 128949
042 |a dc 
100 1 0 |a Cousins, Henry C  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Biological Engineering  |e contributor 
700 1 0 |a Cousins, Clara C  |e author 
700 1 0 |a Harris, Alon  |e author 
700 1 0 |a Pasquale, Louis R  |e author 
245 0 0 |a Regional Infoveillance of COVID-19 Case Rates: Analysis of Search-Engine Query Patterns 
260 |b JMIR Publications Inc.,   |c 2021-01-04T21:39:15Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/128949 
520 |a Background: Timely allocation of medical resources for coronavirus disease (COVID-19) requires early detection of regional outbreaks. Internet browsing data may predict case outbreaks in local populations that are yet to be confirmed. Objective: We investigated whether search-engine query patterns can help to predict COVID-19 case rates at the state and metropolitan area levels in the United States. Methods: We used regional confirmed case data from the New York Times and Google Trends results from 50 states and 166 county-based designated market areas (DMA). We identified search terms whose activity precedes and correlates with confirmed case rates at the national level. We used univariate regression to construct a composite explanatory variable based on best-fitting search queries offset by temporal lags. We measured the raw and z-transformed Pearson correlation and root-mean-square error (RMSE) of the explanatory variable with out-of-sample case rate data at the state and DMA levels. Results: Predictions were highly correlated with confirmed case rates at the state (mean r=0.69, 95% CI 0.51-0.81; median RMSE 1.27, IQR 1.48) and DMA levels (mean r=0.51, 95% CI 0.39-0.61; median RMSE 4.38, IQR 1.80), using search data available up to 10 days prior to confirmed case rates. They fit case-rate activity in 49 of 50 states and in 103 of 166 DMA at a significance level of .05. Conclusions: Identifiable patterns in search query activity may help to predict emerging regional outbreaks of COVID-19, although they remain vulnerable to stochastic changes in search intensity. 
655 7 |a Article 
773 |t Journal of Medical Internet Research
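
Illustrative note on the Methods in the 520 field: the abstract describes offsetting search-query activity by a temporal lag, fitting a univariate regression, and scoring predictions by Pearson correlation (raw and z-transformed) and RMSE. The sketch below shows one way those steps could fit together. It is not the authors' code. The function names, the toy series, and the reading of "z-transformed" as the Fisher z-transformation are all assumptions made for illustration, and the composite variable built from several queries is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def align(search, cases, lag):
    """Pair search activity on day t with the case rate on day t + lag."""
    if lag == 0:
        return list(search), list(cases)
    return list(search[:-lag]), list(cases[lag:])


def best_lag(search, cases, max_lag=10):
    """Return (lag, r) for the lag with the highest lagged correlation."""
    scored = []
    for lag in range(max_lag + 1):
        s, c = align(search, cases, lag)
        scored.append((pearson_r(s, c), lag))
    r, lag = max(scored)
    return lag, r


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mean_r_fisher(rs):
    """Average correlations on the Fisher z scale (assumed meaning of
    'z-transformed'), then map the mean back to the r scale."""
    return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))


# Synthetic toy data (not from the study): cases trail search by 3 days.
search = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10, 12, 11]
cases = [0, 0, 0] + [2 * s + 1 for s in search[:-3]]

lag, r = best_lag(search, cases, max_lag=5)
s_aligned, c_aligned = align(search, cases, lag)
slope, intercept = fit_line(s_aligned, c_aligned)
predicted = [slope * s + intercept for s in s_aligned]
error = rmse(predicted, c_aligned)
```

In the study itself, per the abstract, search terms were selected at the national level and evaluated on out-of-sample state and DMA case rates. The toy example fits and scores on the same series only to keep the sketch short.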