Using Introspection to Collect Provenance in R

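Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.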
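
The mechanism described above (parse each statement before it runs, record what it reads and writes, keep intermediate values, and link everything in a derivation graph) can be illustrated with a toy analogue. This is not RDataTracker's code, which is written in R and relies on R's own parser and introspection functions. In this sketch Python's `ast` module stands in for parsing R statements, and the class `ToyDDG`, its methods `run`, `lineage` and `uses`, and the `name@step` data-node naming are all hypothetical. The assumed graph shape, with one procedure node per executed statement and one data node per value written, is a simplification chosen for illustration.

```python
import ast
from collections import defaultdict


class ToyDDG:
    """Hypothetical, minimal derivation graph (illustration only, not RDataTracker).

    Procedure nodes are executed statements (indexed by step number);
    data nodes are variable versions named "name@step".
    """

    def __init__(self):
        self.env = {}                       # namespace the toy script runs in
        self.current = {}                   # variable name -> latest data node
        self.data = {}                      # data node -> (name, repr snapshot)
        self.inputs = defaultdict(list)     # procedure -> data nodes it read
        self.outputs = defaultdict(list)    # procedure -> data nodes it wrote
        self.producer = {}                  # data node -> procedure that wrote it
        self.consumers = defaultdict(list)  # data node -> procedures that read it
        self.steps = []                     # procedure index -> statement text

    def run(self, stmt):
        tree = ast.parse(stmt)
        # Parse BEFORE executing to learn which names are read (Load) and
        # written (Store). Simplification: augmented assignment (x += 1) is
        # not treated as a read of x.
        names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        p = len(self.steps)
        self.steps.append(stmt)
        for name in sorted(reads):
            if name in self.current:        # skip untracked names, e.g. builtins
                d = self.current[name]
                self.inputs[p].append(d)
                self.consumers[d].append(p)
        exec(compile(tree, "<ddg>", "exec"), self.env)
        for name in sorted(writes):
            d = f"{name}@{p}"
            self.data[d] = (name, repr(self.env[name]))  # intermediate value
            self.current[name] = d
            self.outputs[p].append(d)
            self.producer[d] = p

    def lineage(self, data_node):
        """Backward query: statements that contributed to data_node."""
        seen, stack = set(), [data_node]
        while stack:
            p = self.producer.get(stack.pop())
            if p is not None and p not in seen:
                seen.add(p)
                stack.extend(self.inputs[p])
        return [self.steps[p] for p in sorted(seen)]

    def uses(self, data_node):
        """Forward query: statements that depend on data_node."""
        seen, stack = set(), [data_node]
        while stack:
            for p in self.consumers[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.extend(self.outputs[p])
        return [self.steps[p] for p in sorted(seen)]


ddg = ToyDDG()
for line in ["a = [3, 1, 2]", "b = 10", "total = sum(a)",
             "scaled = total * 2", "unrelated = b + 1"]:
    ddg.run(line)

print(ddg.lineage(ddg.current["scaled"]))
# ['a = [3, 1, 2]', 'total = sum(a)', 'scaled = total * 2']
print(ddg.uses("b@1"))
# ['unrelated = b + 1']
print(ddg.data["total@2"])
# ('total', '6')
```

Creating a new data node for every assignment, rather than one node per variable name, is what lets the backward walk separate successive intermediate values. The cost is storage that grows with the number of statements executed. The sketch also ignores function-internal steps, file input/output and the computing environment, all of which the abstract says a full collector records.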

Bibliographic Details
Main Authors: Barbara Lerner, Emery Boose, Luis Perez
Format: Article
Language: English
Published: MDPI AG, 2018-03-01
Series: Informatics
Subjects: R
Online Access: http://www.mdpi.com/2227-9709/5/1/12
ISSN: 2227-9709
DOI: 10.3390/informatics5010012
Volume/Issue: 5(1), article 12
Affiliations: Barbara Lerner, Computer Science Department, Mount Holyoke College, South Hadley, MA 01075, USA; Emery Boose, Harvard Forest, Harvard University, Petersham, MA 01366, USA; Luis Perez, Harvard College, Cambridge, MA 02138, USA
Keywords: scientific data provenance; provenance capture; provenance granularity; R; introspection