Geospatial Queries on Data Collection Using a Common Provenance Model
Lineage information is the part of the metadata that describes “what”, “when”, “who”, “how”, and “where” geospatial data were generated. If it is well-presented and queryable, lineage becomes very useful information for inferring data quality, tracing error sources and increasing trust in geospatial...
Main Authors: | Guillem Closa, Joan Masó, Núria Julià, Xavier Pons |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-03-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | provenance, lineage, graph, data queries, metadata |
Online Access: | https://www.mdpi.com/2220-9964/10/3/139 |
id |
doaj-f06d0b9b3ba74fc39ee9a2dcee72f096 |
---|---|
record_format |
Article |
spelling |
doaj-f06d0b9b3ba74fc39ee9a2dcee72f096; 2021-03-06T00:02:59Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; 2220-9964; 2021-03-01; 10; 139; 139; 10.3390/ijgi10030139; Geospatial Queries on Data Collection Using a Common Provenance Model
Guillem Closa (0); Joan Masó (1); Núria Julià (2); Xavier Pons (3)
(0) Grumets Research Group, CREAF, Edifici C, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
(1) Grumets Research Group, CREAF, Edifici C, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
(2) Grumets Research Group, CREAF, Edifici C, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
(3) Grumets Research Group, Dep de Geografia, Edifici B, Universitat Autònoma de Barcelona, 08193 Bellaterra, Catalonia, Spain
Lineage information is the part of the metadata that describes “what”, “when”, “who”, “how”, and “where” geospatial data were generated. If it is well-presented and queryable, lineage becomes very useful information for inferring data quality, tracing error sources and increasing trust in geospatial information. In addition, if the lineage of a collection of datasets can be related and presented together, datasets, process chains, and methodologies can be compared. This paper proposes extending process step lineage descriptions into four explicit levels of abstraction (process run, tool, algorithm and functionality). Including functionalities and algorithm descriptions as a part of lineage provides high-level information that is independent of the details of the software used. Therefore, it is possible to transform lineage metadata that initially documents specific processing steps into a reusable workflow that describes a set of operations as a processing chain. This paper presents a system that provides lineage information as a service in a distributed environment. The system is complemented by an integrated provenance web application that is capable of visualizing and querying a provenance graph that is composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 standards family and the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) were combined to integrate the provenance of a collection of datasets. To represent lineage elements, the ISO 19115-2 lineage class names were chosen, because they name the geospatial objects involved more precisely. The relationship naming conventions of W3C PROV are used to represent relationships among these elements. The elements and relationships are presented in a queryable graph.
https://www.mdpi.com/2220-9964/10/3/139
provenance; lineage; graph; data queries; metadata |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Guillem Closa, Joan Masó, Núria Julià, Xavier Pons |
author_sort |
Guillem Closa |
title |
Geospatial Queries on Data Collection Using a Common Provenance Model |
publisher |
MDPI AG |
series |
ISPRS International Journal of Geo-Information |
issn |
2220-9964 |
publishDate |
2021-03-01 |
description |
Lineage information is the part of the metadata that describes “what”, “when”, “who”, “how”, and “where” geospatial data were generated. If it is well-presented and queryable, lineage becomes very useful information for inferring data quality, tracing error sources and increasing trust in geospatial information. In addition, if the lineage of a collection of datasets can be related and presented together, datasets, process chains, and methodologies can be compared. This paper proposes extending process step lineage descriptions into four explicit levels of abstraction (process run, tool, algorithm and functionality). Including functionalities and algorithm descriptions as a part of lineage provides high-level information that is independent of the details of the software used. Therefore, it is possible to transform lineage metadata that initially documents specific processing steps into a reusable workflow that describes a set of operations as a processing chain. This paper presents a system that provides lineage information as a service in a distributed environment. The system is complemented by an integrated provenance web application that is capable of visualizing and querying a provenance graph that is composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 standards family and the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) were combined to integrate the provenance of a collection of datasets. To represent lineage elements, the ISO 19115-2 lineage class names were chosen, because they name the geospatial objects involved more precisely. The relationship naming conventions of W3C PROV are used to represent relationships among these elements. The elements and relationships are presented in a queryable graph. |
topic |
provenance, lineage, graph, data queries, metadata |
url |
https://www.mdpi.com/2220-9964/10/3/139 |
_version_ |
1724230142897684480 |
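
The description above says that lineage elements are named with ISO 19115-2 classes, that relationships follow W3C PROV naming, and that the result is a queryable graph. The Python sketch below is only an illustration of that idea, not the authors' system. The `ProvenanceGraph` class, the dataset and step identifiers, and the choice of `LE_Source`, `LE_ProcessStep` and `LE_Processing` as node kinds are all assumptions made for this example; the relation names `used`, `wasGeneratedBy` and `wasAssociatedWith` come from W3C PROV, but how the paper maps them onto its four abstraction levels is not stated in this record.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory graph with typed nodes and labeled, directed edges."""

    def __init__(self):
        self.kinds = {}                 # node id -> element class name
        self.edges = defaultdict(list)  # subject id -> [(relation, object id)]

    def add_node(self, node_id, kind):
        self.kinds[node_id] = kind

    def add_edge(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def sources_of(self, dataset):
        """Return all source datasets that `dataset` transitively depends on."""
        seen, stack = set(), [dataset]
        while stack:
            node = stack.pop()
            for relation, target in self.edges.get(node, []):
                if relation in ("wasGeneratedBy", "used") and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return {n for n in seen if self.kinds.get(n) == "LE_Source"}

    def steps_using(self, tool):
        """Return the process steps associated with the given tool."""
        return {
            subject
            for subject, links in self.edges.items()
            for relation, target in links
            if relation == "wasAssociatedWith" and target == tool
        }


# Illustrative data only: every identifier below is invented for this sketch.
g = ProvenanceGraph()
for node_id, kind in [
    ("dem_raw", "LE_Source"),
    ("landcover", "LE_Source"),
    ("slope_map", "LE_Source"),
    ("risk_map", "LE_Source"),
    ("slope_run", "LE_ProcessStep"),
    ("overlay_run", "LE_ProcessStep"),
    ("slope_tool", "LE_Processing"),
]:
    g.add_node(node_id, kind)

for subject, relation, obj in [
    ("slope_map", "wasGeneratedBy", "slope_run"),
    ("slope_run", "used", "dem_raw"),
    ("slope_run", "wasAssociatedWith", "slope_tool"),
    ("risk_map", "wasGeneratedBy", "overlay_run"),
    ("overlay_run", "used", "slope_map"),
    ("overlay_run", "used", "landcover"),
]:
    g.add_edge(subject, relation, obj)

print(sorted(g.sources_of("risk_map")))  # ['dem_raw', 'landcover', 'slope_map']
print(sorted(g.steps_using("slope_tool")))  # ['slope_run']
```

Keeping node kinds and relation names as plain strings makes it easy to load the lineage of several datasets into one graph and compare their process chains, which is the kind of collection-level query the description mentions.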