Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data
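Main Authors: | Fei Hu, Mengchao Xu, Jingchao Yang, Yanshou Liang, Kejin Cui, Michael M. Little, Christopher S. Lynnes, Daniel Q. Duffy, Chaowei Yang |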
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-04-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | big data; data container; geospatial raster data management; GIS |
Online Access: | http://www.mdpi.com/2220-9964/7/4/144 |
id |
doaj-9aae74b79cf54feca0d3cb0663e325e5 |
---|---|
record_format |
Article |
doi |
10.3390/ijgi7040144 |
citation |
ISPRS International Journal of Geo-Information, vol. 7, no. 4, article 144 (2018) |
affiliations |
Fei Hu, Mengchao Xu, Jingchao Yang, Yanshou Liang, Kejin Cui, and Chaowei Yang: NSF Spatiotemporal Innovation Center and Department of Geography and GeoInformation Science, George Mason University, Fairfax, VA 22030, USA; Michael M. Little, Christopher S. Lynnes, and Daniel Q. Duffy: NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Fei Hu Mengchao Xu Jingchao Yang Yanshou Liang Kejin Cui Michael M. Little Christopher S. Lynnes Daniel Q. Duffy Chaowei Yang |
title |
Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data |
publisher |
MDPI AG |
series |
ISPRS International Journal of Geo-Information |
issn |
2220-9964 |
publishDate |
2018-04-01 |
description |
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were extended from Spark/Hadoop to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) none of the data containers except ClimateSpark has good support for the HDF data format used in this paper, so time- and resource-consuming data preprocessing is required to load the data; (2) SciDB, Rasdaman, and MongoDB handle queries over small and medium data volumes well, whereas Spark and ClimateSpark can handle large data volumes with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others do not offer users such functions; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend system capability. |
topic |
big data; data container; geospatial raster data management; GIS |
url |
http://www.mdpi.com/2220-9964/7/4/144 |
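Conclusion (1) in the description notes that, apart from ClimateSpark, the evaluated containers needed the HDF data to be preprocessed before it could be loaded. The record does not say what that preprocessing involved. As a hypothetical illustration only, the sketch below shows one common form of it: flattening a gridded (time, lat, lon) array into one row per cell and writing the rows as delimited text. The function names `flatten_grid` and `to_delimited`, the `values[t][y][x]` axis order, and the four-field record layout are all assumptions, not the paper's pipeline; reading the HDF file itself is left out, and the grid is passed in as nested Python lists.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical flat record layout: (time, lat, lon, value).
Record = Tuple[float, float, float, float]


def flatten_grid(
    values: Sequence[Sequence[Sequence[float]]],
    times: Sequence[float],
    lats: Sequence[float],
    lons: Sequence[float],
) -> Iterator[Record]:
    """Yield one (time, lat, lon, value) record per grid cell.

    `values` is assumed to be indexed as values[t][y][x]; each coordinate
    sequence labels the indices along the matching axis. Because this is a
    generator, length mismatches are reported when iteration reaches them.
    """
    if len(values) != len(times):
        raise ValueError("time axis length does not match values")
    for t, plane in zip(times, values):
        if len(plane) != len(lats):
            raise ValueError("latitude axis length does not match values")
        for lat, row in zip(lats, plane):
            if len(row) != len(lons):
                raise ValueError("longitude axis length does not match values")
            for lon, v in zip(lons, row):
                # Once flattened, every cell carries its own coordinates.
                yield (t, lat, lon, v)


def to_delimited(records: Iterable[Record], sep: str = ",") -> Iterator[str]:
    """Render each record as one line of delimited text."""
    for rec in records:
        yield sep.join(str(x) for x in rec)


if __name__ == "__main__":
    # Toy grid: 2 time steps x 2 latitudes x 2 longitudes.
    grid = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    rows = flatten_grid(grid, [0.0, 1.0], [10.0, 20.0], [100.0, 110.0])
    for line in to_delimited(rows):
        print(line)
```

Run as a script, this prints eight lines, the first being `0.0,10.0,100.0,1.0`. In this layout each single value becomes a four-field row, whereas an array store can keep cell coordinates implicit in the array's dimensions; that difference is one plausible reason such preprocessing can be costly, though the record itself does not attribute the cost to it.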