Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text

The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym re...


Bibliographic Details
Main Authors: Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Ivan Lopez-Arevalo, Shanel Reyes-Palacios, Victor Muñiz-Sanchez, Jean Arreola-Trapala
Format: Article
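Language: English
Published: MDPI AG 2020-09-01
Series: Remote Sensing
Subjects: geoparsing; toponym resolution; geographic named entity recognition; named entity recognition in Spanish
Online Access: https://www.mdpi.com/2072-4292/12/18/3041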
id doaj-2a94e8c05d16492f856aa3ed81fb4916
record_format Article
spelling doaj-2a94e8c05d16492f856aa3ed81fb4916; 2020-11-25T03:07:35Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2020-09-01; volume 12, article 3041; DOI 10.3390/rs12183041
Edwin Aldana-Bobadilla: Conacyt-Centro de Investigación y de Estudios Avanzados del I.P.N. (Cinvestav), Victoria 87130, Mexico
Alejandro Molina-Villegas: Conacyt-Centro de Investigación en Ciencias de Información Geoespacial (Centrogeo), Mérida 97302, Mexico
Ivan Lopez-Arevalo: Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas (Cinvestav Tamaulipas), Victoria 87130, Mexico
Shanel Reyes-Palacios: Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas (Cinvestav Tamaulipas), Victoria 87130, Mexico
Victor Muñiz-Sanchez: Centro de Investigación en Matemáticas (Cimat), Monterrey 66628, Mexico
Jean Arreola-Trapala: Centro de Investigación en Matemáticas (Cimat), Monterrey 66628, Mexico
collection DOAJ
language English
format Article
sources DOAJ
author Edwin Aldana-Bobadilla
Alejandro Molina-Villegas
Ivan Lopez-Arevalo
Shanel Reyes-Palacios
Victor Muñiz-Sanchez
Jean Arreola-Trapala
title Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text
publisher MDPI AG
series Remote Sensing
issn 2072-4292
publishDate 2020-09-01
description The automatic extraction of geospatial information is an important aspect of data mining. Computer systems that discover geographic information in natural language rely on a complex process called geoparsing, which comprises two main tasks: geographic entity recognition and toponym resolution. The first task can be addressed with machine learning, training a model to recognize the sequences of characters (words) that correspond to geographic entities. The second task consists of assigning such entities to their most likely coordinates, which frequently involves resolving referential ambiguities. In this paper, we propose an extensible geoparsing approach that combines geographic entity recognition based on a neural network model with disambiguation based on what we call dynamic context disambiguation. Once place names are recognized in an input text, they are resolved using a grammar whose rules specify how ambiguities can be resolved, much as a person would resolve them by considering the context. The result is an assignment of the most likely geographic properties to the recognized places. We propose an assessment measure based on a ranking of closeness between the predicted and actual locations of a place name. On this measure, our method outperforms OpenStreetMap Nominatim. We also include measures that assess the recognition of place names and the prediction of what we call geographic levels (the administrative jurisdiction of places).
topic geoparsing
toponym resolution
geographic named entity recognition
named entity recognition in Spanish
url https://www.mdpi.com/2072-4292/12/18/3041
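
The description in this record mentions an assessment measure based on a ranking of closeness between predicted and actual locations, but the record does not define it. The following is only a minimal sketch of one way such a ranking could be computed, assuming great-circle (haversine) distance as the notion of closeness. The function names, the candidate list, and the coordinates (approximate) are hypothetical illustrations and are not taken from the article.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def closeness_rank(candidates, actual):
    """Sort (name, (lat, lon)) candidates from nearest to farthest from `actual`."""
    return sorted(candidates, key=lambda c: haversine_km(c[1], actual))


# Hypothetical candidates for the ambiguous place name "Victoria".
candidates = [
    ("Victoria, British Columbia", (48.43, -123.37)),
    ("Ciudad Victoria, Tamaulipas", (23.74, -99.15)),
    ("Victoria, Seychelles", (-4.62, 55.45)),
]
actual = (23.73, -99.14)  # assumed reference location for the mention

ranked = closeness_rank(candidates, actual)
for position, (name, coords) in enumerate(ranked, start=1):
    print(position, name, round(haversine_km(coords, actual), 1), "km")
```

In a ranking of this kind, a resolver would score well when the candidate it selects sits at or near position 1 for the reference location. How the article turns such a ranking into a single score is not stated in this record.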