GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources

With the advance of World-Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. As a result, “big geospatial data management” issues have started to attract attention. Among these issues, this research focuses on discovering di...


Bibliographic Details
Published in: ISPRS International Journal of Geo-Information
Main Authors: Chih-Yuan Huang, Hao Chang
Format: Article
Language: English
Published: MDPI AG 2016-08-01
Subjects: Geospatial Web; resource discovery; Web crawler; Open Geospatial Consortium
Online Access: http://www.mdpi.com/2220-9964/5/8/136
author Chih-Yuan Huang
Hao Chang
collection DOAJ
container_title ISPRS International Journal of Geo-Information
description With the advance of World-Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. As a result, “big geospatial data management” issues have started to attract attention. Among these issues, this research focuses on discovering distributed geospatial resources. Because resources are scattered across the WWW, users cannot efficiently find the resources they are interested in. While the WWW has Web search engines that address web resource discovery, we envision that the geospatial Web (i.e., GeoWeb) also requires GeoWeb search engines. To realize a GeoWeb search engine, one of the first steps is to proactively discover GeoWeb resources on the WWW. Hence, in this study, we propose the GeoWeb Crawler, an extensible Web crawling framework that can find various types of GeoWeb resources, such as Open Geospatial Consortium (OGC) web services, Keyhole Markup Language (KML) files, and Environmental Systems Research Institute, Inc. (ESRI) Shapefiles. In addition, we apply distributed computing to improve the performance of the GeoWeb Crawler. The results show that, for 10 targeted resource types, the GeoWeb Crawler discovered 7351 geospatial services and 194,003 datasets. The proposed GeoWeb Crawler framework is thus shown to be extensible and scalable, and able to provide a comprehensive index of the GeoWeb.
format Article
id doaj-art-e481483edfe04749b7e335b41b5f3c4e
institution Directory of Open Access Journals
issn 2220-9964
language English
publishDate 2016-08-01
publisher MDPI AG
spelling doaj-art-e481483edfe04749b7e335b41b5f3c4e
2025-08-19T21:03:49Z
eng
MDPI AG
ISPRS International Journal of Geo-Information
ISSN 2220-9964
2016-08-01
Volume 5, Issue 8, Article 136
DOI 10.3390/ijgi5080136 (ijgi5080136)
Chih-Yuan Huang (Center for Space and Remote Sensing Research, National Central University, Taoyuan 320, Taiwan)
Hao Chang (Department of Civil Engineering, National Central University, Taoyuan 320, Taiwan)
title GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources
topic Geospatial Web
resource discovery
Web crawler
Open Geospatial Consortium
url http://www.mdpi.com/2220-9964/5/8/136