Automated vocabulary discovery for geo-parsing online epidemic intelligence

Abstract. Background: Automated surveillance of the Internet provides a timely and sensitive method for alerting on global emerging infectious disease threats. HealthMap is part of a new generation of online systems designed to monitor and visualize, on a...


Bibliographic Details
Main Authors: Freifeld Clark C, Keller Mikaela, Brownstein John S
Format: Article
Language: English
Published: BMC 2009-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/385
id doaj-5161bd328718441ebdfa5a5962acc4e2
record_format Article
spelling doaj-5161bd328718441ebdfa5a5962acc4e2 2020-11-24T21:55:35Z eng BMC BMC Bioinformatics 1471-2105 2009-11-01 10 1 385 10.1186/1471-2105-10-385 Automated vocabulary discovery for geo-parsing online epidemic intelligence Freifeld Clark C Keller Mikaela Brownstein John S http://www.biomedcentral.com/1471-2105/10/385
collection DOAJ
language English
format Article
sources DOAJ
author Freifeld Clark C
Keller Mikaela
Brownstein John S
title Automated vocabulary discovery for geo-parsing online epidemic intelligence
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2009-11-01
description Abstract. Background: Automated surveillance of the Internet provides a timely and sensitive method for alerting on global emerging infectious disease threats. HealthMap is part of a new generation of online systems designed to monitor and visualize, on a real-time basis, disease outbreak alerts as reported by online news media and public health sources. HealthMap is of specific interest for national and international public health organizations and international travelers. A particular task that makes such surveillance useful is the automated discovery of the geographic references contained in the retrieved outbreak alerts. This task is sometimes referred to as "geo-parsing". A typical approach to geo-parsing would demand an expensive training corpus of alerts manually tagged by a human. Results: Given that human readers perform this kind of task by using both their lexical and contextual knowledge, we developed an approach which relies on a relatively small expert-built gazetteer, thus limiting the need for human input, but focuses on learning the context in which geographic references appear. We show in a set of experiments that this approach exhibits a substantial capacity to discover geographic locations outside of its initial lexicon. Conclusion: The results of this analysis provide a framework for future automated global surveillance efforts that reduce manual input and improve timeliness of reporting.
url http://www.biomedcentral.com/1471-2105/10/385