GEOSPATIAL DATA STREAM PROCESSING IN PYTHON USING FOSS4G COMPONENTS

Bibliographic Details
Main Authors: G. McFerren, T. van Zyl
Format: Article
Language: English
Published: Copernicus Publications, 2016-06-01
Series: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access:https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLI-B7/931/2016/isprs-archives-XLI-B7-931-2016.pdf
DOI: 10.5194/isprs-archives-XLI-B7-931-2016
Volume: XLI-B7, pages 931-937
ISSN: 1682-1750, 2194-9034
Affiliations: G. McFerren, CSIR Meraka Institute, Meiring Naudé Road, Brummeria, Pretoria, South Africa; T. van Zyl, School of Computer Science and Applied Mathematics, University of the Witwatersrand, 1 Jan Smuts Avenue, Braamfontein 2000, Johannesburg, South Africa

Abstract

One viewpoint of current and future IT systems holds that there is an increase in the scale and velocity at which data are acquired and analysed from heterogeneous, dynamic sources. In the earth observation and geoinformatics domains, this process is driven by the increase in the number and types of devices that report location and the proliferation of assorted sensors, from satellite constellations to oceanic buoy arrays. Much of these data will be encountered as self-contained messages on data streams: continuous, infinite flows of data. Spatial analytics over data streams concerns the search for spatial and spatio-temporal relationships within and amongst data “on the move”. In spatial databases, queries can assess a store of data to unpack spatial relationships; this is not the case on streams, where spatial relationships need to be established with the incomplete data available. Methods for spatially-based indexing, filtering, joining and transforming of streaming data need to be established and implemented in software components.

This article describes the usage patterns and performance metrics of a number of well-known FOSS4G Python software libraries within the data stream processing paradigm. In particular, we consider the RTree library for spatial indexing, the Shapely library for geometric processing and transformation, and the PyProj library for projection and geodesic calculations over streams of geospatial data. We introduce a message-oriented, Python-based geospatial data streaming framework called Swordfish, which provides data stream processing primitives, functions, transports and a common data model for describing messages, based on the Open Geospatial Consortium Observations and Measurements (O&M) and Unidata Common Data Model (CDM) standards. We illustrate how the geospatial software components are integrated with the Swordfish framework.

Furthermore, we describe the tight temporal constraints under which geospatial functionality can be invoked when processing high-velocity, potentially infinite geospatial data streams. The article discusses the performance of these libraries under simulated streaming loads (size, complexity and volume of messages) and how they can be deployed and utilised with Swordfish under real load scenarios, illustrated by a set of Vessel Automatic Identification System (AIS) use cases. We conclude that the described software libraries are able to perform adequately under geospatial data stream processing scenarios; many real application use cases will be handled sufficiently by the software.
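
The abstract's central point is that spatial relationships on a stream must be computed over whatever data are currently available, under tight time constraints. The following is a minimal, dependency-free sketch of one such operation: a sliding-window proximity join over AIS-style position reports. It is an illustration under stated assumptions, not the Swordfish API, and `PositionReport`, `WindowedProximityJoin` and `haversine_m` are hypothetical names. It also substitutes simpler techniques for the libraries named above. A uniform grid hash stands in for an RTree index, and a spherical haversine distance stands in for PyProj's ellipsoidal geodesic calculations. The sketch assumes that messages arrive in timestamp order, that the search radius is smaller than one grid cell, and that no data cross the antimeridian.

```python
import math
from collections import deque
from dataclasses import dataclass

# Mean Earth radius in metres; a sphere only approximates the WGS84 ellipsoid.
EARTH_RADIUS_M = 6_371_008.8


@dataclass(frozen=True)
class PositionReport:
    """Hypothetical, simplified AIS-style message (not the Swordfish data model)."""
    mmsi: int   # vessel identifier
    t: float    # timestamp in seconds
    lon: float  # degrees
    lat: float  # degrees


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Spherical great-circle distance in metres (stand-in for an ellipsoidal geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


class WindowedProximityJoin:
    """Sliding time window plus a uniform grid index over a position-report stream.

    push() returns one (other_mmsi, distance_m) tuple per report from another
    vessel that arrived within the last window_s seconds and lies within
    radius_m metres. Assumes timestamp-ordered input, radius_m smaller than one
    grid cell, and no antimeridian crossing.
    """

    def __init__(self, window_s: float, radius_m: float, cell_deg: float = 0.1):
        self.window_s = window_s
        self.radius_m = radius_m
        self.cell_deg = cell_deg
        self.buffer: deque = deque()  # reports in arrival (time) order
        self.grid: dict = {}          # (ix, iy) -> reports in that cell

    def _cell(self, lon: float, lat: float) -> tuple:
        return (math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg))

    def _expire(self, now: float) -> None:
        # Evict reports older than the window from both the buffer and the index.
        while self.buffer and now - self.buffer[0].t > self.window_s:
            old = self.buffer.popleft()
            key = self._cell(old.lon, old.lat)
            cell = self.grid[key]
            cell.remove(old)
            if not cell:
                del self.grid[key]

    def push(self, msg: PositionReport) -> list:
        self._expire(msg.t)
        ix, iy = self._cell(msg.lon, msg.lat)
        matches = []
        # Candidate lookup: only the 3x3 block of neighbouring cells is scanned.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in self.grid.get((ix + dx, iy + dy), ()):
                    if other.mmsi == msg.mmsi:
                        continue  # skip earlier reports from the same vessel
                    d = haversine_m(msg.lon, msg.lat, other.lon, other.lat)
                    if d <= self.radius_m:  # exact distance test on candidates
                        matches.append((other.mmsi, d))
        # Index the new report so later messages can join against it.
        self.buffer.append(msg)
        self.grid.setdefault((ix, iy), []).append(msg)
        return matches


join = WindowedProximityJoin(window_s=300, radius_m=2_000)
join.push(PositionReport(111, 0.0, 18.40, -33.90))
print(join.push(PositionReport(222, 60.0, 18.41, -33.90)))   # one match: vessel 111, roughly 0.9 km away
print(join.push(PositionReport(333, 400.0, 18.41, -33.90)))  # [] (earlier reports have expired)
```

Expiry has to update the spatial index as well as the time buffer, because a stream index only ever covers the current window. An R-tree such as the one provided by the RTree library would allow arbitrary bounding-box queries and radii larger than a cell, at the cost of a heavier dependency.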