Geospatial Queries on Data Collection Using a Common Provenance Model

Bibliographic Details
Main Authors: Guillem Closa, Joan Masó, Núria Julià, Xavier Pons
Format: Article
Language: English
Published: MDPI AG, 2021-03-01
Series: ISPRS International Journal of Geo-Information
Subjects: provenance, lineage, graph, data queries, metadata
Online Access: https://www.mdpi.com/2220-9964/10/3/139
ISSN: 2220-9964
Volume: 10, Issue: 3, Article: 139
DOI: 10.3390/ijgi10030139
Affiliations:
Guillem Closa, Joan Masó, Núria Julià: Grumets Research Group, CREAF, Edifici C, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
Xavier Pons: Grumets Research Group, Dep de Geografia, Edifici B, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain

Abstract

Lineage information is the part of the metadata that describes the “what”, “when”, “who”, “how”, and “where” of the generation of geospatial data. When it is well presented and queryable, lineage is valuable for inferring data quality, tracing error sources, and increasing trust in geospatial information. Moreover, if the lineages of a collection of datasets can be related and presented together, the datasets, process chains, and methodologies can be compared. This paper proposes extending process step lineage descriptions into four explicit levels of abstraction: process run, tool, algorithm, and functionality. Including descriptions of functionalities and algorithms in the lineage provides high-level information that is independent of the software used. As a result, lineage metadata that initially documents specific processing steps can be transformed into a reusable workflow that describes a set of operations as a processing chain. The paper presents a system that provides lineage information as a service in a distributed environment.

The system is complemented by an integrated provenance web application that can visualize and query a provenance graph composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 family of standards and the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) were combined to integrate the provenance of a collection of datasets. The ISO 19115-2 lineage class names were chosen to represent lineage elements because they name the geospatial objects involved more precisely, while the W3C PROV relationship naming conventions are used to represent the relationships among these elements. The elements and relationships are presented in a queryable graph.
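
The abstract describes lineage elements arranged on four abstraction levels and linked with W3C PROV relation names, so that the lineage of a whole collection can be queried as one graph. The sketch below is a minimal illustration of that idea, not the authors' system. Only `used`, `wasGeneratedBy` and `wasAssociatedWith` are real W3C PROV relation names. Treating a tool as a PROV agent associated with a process run is an assumption. The `implements` and `realizes` links between tool, algorithm and functionality, the class and method names, and all node identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Minimal sketch of a collection-level lineage graph. Node kinds and the
# "implements"/"realizes" relations are hypothetical; "used",
# "wasGeneratedBy" and "wasAssociatedWith" are W3C PROV relation names.


@dataclass
class ProvenanceGraph:
    kind: dict = field(default_factory=dict)  # node id -> node kind
    edges: dict = field(default_factory=lambda: defaultdict(list))  # (src, relation) -> [dst]

    def add(self, node_id, kind):
        self.kind[node_id] = kind

    def link(self, src, relation, dst):
        self.edges[(src, relation)].append(dst)

    def targets(self, src, relation):
        return self.edges.get((src, relation), [])

    def functionality_of(self, step):
        # Climb the abstraction levels: process run -> tool -> algorithm -> functionality.
        for tool in self.targets(step, "wasAssociatedWith"):
            for algorithm in self.targets(tool, "implements"):
                for func in self.targets(algorithm, "realizes"):
                    return func
        return None

    def workflow(self, dataset):
        # Walk back through the dataset's lineage and keep only the functionality
        # of each process run. The order is oldest first for a linear chain.
        chain, frontier, seen = [], [dataset], set()
        while frontier:
            current = frontier.pop()
            for step in self.targets(current, "wasGeneratedBy"):
                if step in seen:
                    continue
                seen.add(step)
                chain.append(self.functionality_of(step))
                frontier.extend(self.targets(step, "used"))
        return chain[::-1]

    def datasets_using(self, functionality):
        # Collection-level query: which datasets have this functionality in their lineage?
        return sorted(n for n, k in self.kind.items()
                      if k == "Source" and functionality in self.workflow(n))


# Hypothetical example collection: two slope maps made with different tools
# that implement the same algorithm, plus a reclassification of one of them.
g = ProvenanceGraph()
for node, kind in [("dem", "Source"), ("slope_a", "Source"), ("slope_b", "Source"),
                   ("classes_a", "Source"), ("run1", "ProcessStep"),
                   ("run2", "ProcessStep"), ("run3", "ProcessStep"),
                   ("toolX", "Tool"), ("toolY", "Tool"), ("toolZ", "Tool"),
                   ("alg:horn", "Algorithm"), ("alg:thresholds", "Algorithm"),
                   ("func:slope", "Functionality"), ("func:reclassify", "Functionality")]:
    g.add(node, kind)

for src, rel, dst in [
    ("slope_a", "wasGeneratedBy", "run1"), ("run1", "used", "dem"),
    ("run1", "wasAssociatedWith", "toolX"), ("toolX", "implements", "alg:horn"),
    ("slope_b", "wasGeneratedBy", "run2"), ("run2", "used", "dem"),
    ("run2", "wasAssociatedWith", "toolY"), ("toolY", "implements", "alg:horn"),
    ("alg:horn", "realizes", "func:slope"),
    ("classes_a", "wasGeneratedBy", "run3"), ("run3", "used", "slope_a"),
    ("run3", "wasAssociatedWith", "toolZ"), ("toolZ", "implements", "alg:thresholds"),
    ("alg:thresholds", "realizes", "func:reclassify"),
]:
    g.link(src, rel, dst)

print(g.workflow("classes_a"))                         # ['func:slope', 'func:reclassify']
print(g.workflow("slope_a") == g.workflow("slope_b"))  # True
print(g.datasets_using("func:slope"))                  # ['classes_a', 'slope_a', 'slope_b']
```

`workflow` keeps only the functionality level. As a result, the two slope maps compare as equal even though different tools produced them. This is the kind of software-independent comparison and reuse of a processing chain that the abstract describes. A production system would more likely store such a graph in a graph database or triple store than in an in-memory dictionary.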