A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Sustainability research faces many challenges as respective environmental, urban and regional contexts are experiencing rapid changes at an unprecedented spatial granularity level, which involves growing massive data and the need for spatial relationship detection at a faster pace. Spatial join is a...


Bibliographic Details
Main Authors: Feng Zhang, Jingwei Zhou, Renyi Liu, Zhenhong Du, Xinyue Ye
Format: Article
Language: English
Published: MDPI AG 2016-09-01
Series: Sustainability
Subjects: spatial join; parallel computing; Spark; performance
Online Access: http://www.mdpi.com/2071-1050/8/9/926
id doaj-a91e053dedf54ff29abd1a834652a283
record_format Article
spelling doaj-a91e053dedf54ff29abd1a834652a283
2020-11-24T22:27:26Z
eng
MDPI AG
Sustainability
2071-1050
2016-09-01
8
9
926
10.3390/su8090926
su8090926
A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability
Feng Zhang [0]
Jingwei Zhou [1]
Renyi Liu [2]
Zhenhong Du [3]
Xinyue Ye [4]
[0] Zhejiang Provincial Key Laboratory of Geographic Information Science, Department of Earth Sciences, Zhejiang University, 148 Tianmushan Road, Hangzhou 310028, China
[1] School of the Earth Sciences, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
[2] Zhejiang Provincial Key Laboratory of Geographic Information Science, Department of Earth Sciences, Zhejiang University, 148 Tianmushan Road, Hangzhou 310028, China
[3] Zhejiang Provincial Key Laboratory of Geographic Information Science, Department of Earth Sciences, Zhejiang University, 148 Tianmushan Road, Hangzhou 310028, China
[4] Department of Geography, Kent State University, Kent, OH 44240, USA
Sustainability research faces many challenges as respective environmental, urban and regional contexts are experiencing rapid changes at an unprecedented spatial granularity level, which involves growing massive data and the need for spatial relationship detection at a faster pace. Spatial join is a fundamental method for making data more informative with respect to spatial relations. The dramatic growth of data volumes has led to increased focus on high-performance large-scale spatial join. In this paper, we present Spatial Join with Spark (SJS), a proposed high-performance algorithm, that uses a simple, but efficient, uniform spatial grid to partition datasets and joins the partitions with the built-in join transformation of Spark. SJS utilizes the distributed in-memory iterative computation of Spark, then introduces a calculation-evaluating model and in-memory spatial repartition technology, which optimize the initial partition by evaluating the calculation amount of local join algorithms without any disk access. We compare four in-memory spatial join algorithms in SJS for further performance improvement. Based on extensive experiments with real-world data, we conclude that SJS outperforms the Spark and MapReduce implementations of earlier spatial join approaches. This study demonstrates that it is promising to leverage high-performance computing for large-scale spatial join analysis. The availability of large-sized geo-referenced datasets along with the high-performance computing technology can raise great opportunities for sustainability research on whether and how these new trends in data and technology can be utilized to help detect the associated trends and patterns in the human-environment dynamics.
http://www.mdpi.com/2071-1050/8/9/926
spatial join
parallel computing
Spark
performance
collection DOAJ
language English
format Article
sources DOAJ
author Feng Zhang
Jingwei Zhou
Renyi Liu
Zhenhong Du
Xinyue Ye
author_sort Feng Zhang
title A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability
title_sort new design of high-performance large-scale gis computing at a finer spatial granularity: a case study of spatial join with spark for sustainability
publisher MDPI AG
series Sustainability
issn 2071-1050
publishDate 2016-09-01
description Sustainability research faces many challenges as respective environmental, urban and regional contexts are experiencing rapid changes at an unprecedented spatial granularity level, which involves growing massive data and the need for spatial relationship detection at a faster pace. Spatial join is a fundamental method for making data more informative with respect to spatial relations. The dramatic growth of data volumes has led to increased focus on high-performance large-scale spatial join. In this paper, we present Spatial Join with Spark (SJS), a proposed high-performance algorithm, that uses a simple, but efficient, uniform spatial grid to partition datasets and joins the partitions with the built-in join transformation of Spark. SJS utilizes the distributed in-memory iterative computation of Spark, then introduces a calculation-evaluating model and in-memory spatial repartition technology, which optimize the initial partition by evaluating the calculation amount of local join algorithms without any disk access. We compare four in-memory spatial join algorithms in SJS for further performance improvement. Based on extensive experiments with real-world data, we conclude that SJS outperforms the Spark and MapReduce implementations of earlier spatial join approaches. This study demonstrates that it is promising to leverage high-performance computing for large-scale spatial join analysis. The availability of large-sized geo-referenced datasets along with the high-performance computing technology can raise great opportunities for sustainability research on whether and how these new trends in data and technology can be utilized to help detect the associated trends and patterns in the human-environment dynamics.
topic spatial join
parallel computing
Spark
performance
url http://www.mdpi.com/2071-1050/8/9/926
_version_ 1725749980460744704
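
The description field above summarizes the core idea of the paper: partition both datasets with a uniform spatial grid, then join the partitions by key. The following is a minimal single-machine sketch of that idea in plain Python, not the authors' SJS implementation. A dictionary keyed by grid cell stands in for Spark's join transformation, rectangles stand in for geometries, and every name here (`cells`, `partition`, `grid_spatial_join`, `cell_size`) is a hypothetical choice made for illustration. The calculation-evaluating model and the repartition step mentioned in the description are not modeled.

```python
from collections import defaultdict
from itertools import product

# Rectangles are (xmin, ymin, xmax, ymax) tuples. This is an illustrative
# stand-in for real geometries, not the data model used in the paper.

def cells(rect, cell_size):
    """Return every uniform-grid cell (col, row) that the rectangle overlaps."""
    xmin, ymin, xmax, ymax = rect
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(cols, rows)

def intersects(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def partition(rects, cell_size):
    """Map each grid cell to the indices of the rectangles overlapping it."""
    grid = defaultdict(list)
    for idx, rect in enumerate(rects):
        for cell in cells(rect, cell_size):
            grid[cell].append(idx)
    return grid

def grid_spatial_join(left, right, cell_size):
    """Join two rectangle lists on intersection using a uniform grid."""
    left_grid = partition(left, cell_size)
    right_grid = partition(right, cell_size)
    results = set()  # a pair can meet in several cells, so deduplicate
    # Key-based join: only cells present in both partitions are compared.
    for cell in left_grid.keys() & right_grid.keys():
        for i in left_grid[cell]:
            for j in right_grid[cell]:
                if intersects(left[i], right[j]):
                    results.add((i, j))
    return sorted(results)

left = [(0, 0, 2, 2), (5, 5, 6, 6)]
right = [(1, 1, 3, 3), (7, 7, 8, 8), (5.5, 5.5, 9, 9)]
print(grid_spatial_join(left, right, cell_size=2.0))  # [(0, 0), (1, 2)]
```

Because a rectangle that spans several cells is copied into each of them, the same candidate pair can appear more than once. The sketch removes these repeats with a set, which is one simple option among several.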