SIDELOADING – INGESTION OF LARGE POINT CLOUDS INTO THE APACHE SPARK BIG DATA ENGINE

In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore attractive to explore established big data frameworks for...

Bibliographic Details
Main Authors:	J. Boehm, K. Liu, C. Alis
Format:	Article
Language:	English
Published:	Copernicus Publications 2016-06-01
Series:	The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access:	https://www.int-arch-photogramm-remote-sens-spatial-inf-sci.net/XLI-B2/343/2016/isprs-archives-XLI-B2-343-2016.pdf

id	doaj-1087729b4c3543bb8862fd7f1bbf7272
record_format	Article
spelling	doaj-1087729b4c3543bb8862fd7f1bbf7272; 2020-11-24T21:01:38Z; eng; Copernicus Publications; The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; ISSN 1682-1750, 2194-9034; 2016-06-01; vol. XLI-B2, pp. 343-348; doi:10.5194/isprs-archives-XLI-B2-343-2016; SIDELOADING – INGESTION OF LARGE POINT CLOUDS INTO THE APACHE SPARK BIG DATA ENGINE; J. Boehm, K. Liu, C. Alis (all: Dept. of Civil, Environmental and Geomatic Engineering, University College London, UK)
collection	DOAJ
issn	1682-1750 2194-9034
description	In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore attractive to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by the existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.

Similar Items