Recording provenance of workflow runs with RO-Crate

Iacopo Colonnelli
2024-01-01

Abstract

Recording the provenance of scientific computation results is key to supporting traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org that captures the provenance of computational workflow executions at different levels of granularity and bundles together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that holds regular meetings to discuss its development, maintenance and adoption. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two machine learning use cases from the digital image analysis domain.
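To make the data model more concrete, here is a minimal sketch (not taken from the paper) of the kind of `ro-crate-metadata.json` a single workflow run might produce. It assumes the RO-Crate 1.1 layout of a metadata descriptor plus a root `Dataset`, with the run modelled as a Schema.org `CreateAction` whose `instrument` is the workflow, `object` lists the inputs and `result` lists the outputs. The function `make_run_crate`, the `#run-1` identifier and all file names are hypothetical. The additional requirements of the Workflow Run RO-Crate profiles, such as profile `conformsTo` declarations and parameter descriptions, are deliberately left out.

```python
import json


def make_run_crate(workflow, inputs, outputs, start_time, end_time):
    """Return a minimal, hypothetical RO-Crate metadata graph for one workflow run.

    Sketch only: the profile-specific requirements of Workflow Run RO-Crate
    (profile conformsTo, FormalParameter entities, etc.) are omitted.
    """
    data_files = [*inputs, *outputs]
    graph = [
        # Metadata descriptor, as required by RO-Crate 1.1
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: points to the workflow and mentions the run
        {
            "@id": "./",
            "@type": "Dataset",
            "mainEntity": {"@id": workflow},
            "hasPart": [{"@id": p} for p in [workflow, *data_files]],
            "mentions": {"@id": "#run-1"},  # hypothetical run identifier
        },
        # The workflow definition, i.e. the plan that was executed
        {
            "@id": workflow,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
        # The execution itself, modelled as a Schema.org CreateAction
        {
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": workflow},
            "object": [{"@id": p} for p in inputs],
            "result": [{"@id": p} for p in outputs],
            "startTime": start_time,
            "endTime": end_time,
        },
    ]
    # One File entity per input/output so every @id reference resolves
    graph.extend({"@id": p, "@type": "File"} for p in data_files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


if __name__ == "__main__":
    crate = make_run_crate(
        "workflow.cwl",  # hypothetical workflow file
        ["input.txt"],
        ["output.txt"],
        "2024-01-01T10:00:00Z",
        "2024-01-01T10:05:00Z",
    )
    print(json.dumps(crate, indent=2))
```

Running the script prints a JSON-LD document. In a real crate it would be saved as `ro-crate-metadata.json` at the root of the directory that holds the listed files. Linking a plan (the workflow) to the data it used and generated through a single action entity is what allows a loose correspondence with W3C PROV concepts; the precise alignment is the one described in the paper, not this sketch.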
2024
English
Scientific committee
19
9
1
35
35
GERMANY
UNITED KINGDOM OF GREAT BRITAIN
SPAIN
AUSTRIA
BELGIUM
JAPAN
NETHERLANDS
CZECH REPUBLIC
SWITZERLAND
   EUROPEAN PILOT FOR EXASCALE
   EUPEX
   European Commission
   Horizon 2020 Framework Programme
   101033975

   HPC BIG DATA ARTIFICIAL INTELLIGENCE CROSS STACK PLATFORM TOWARDS EXASCALE
   ACROSS
   European Commission
   Horizon 2020 Framework Programme
   955648
262
18
Simone Leo; Michael R. Crusoe; Laura Rodríguez-Navas; Raül Sirvent; Alexander Kanitz; Paul De Geest; Rudolf Wittner; Luca Pireddu; Daniel Garijo; José...
info:eu-repo/semantics/article
open
03-CONTRIBUTION IN JOURNAL::03A-Journal article
Files in this item:
File Size Format  
journal.pone.0309210.pdf

Open access

Description: Publisher's PDF
File type: PUBLISHER'S PDF
Size: 1.88 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2011890
Citations
  • PubMed Central: ND
  • Scopus: 0
  • Web of Science (ISI): 0