Dynamic transparent streaming in file-based workflows with CAPIO
Santimaria M. E.; Colonnelli I.; Cantalupo B.; Medic D.; Sciacca E.; Aldinucci M.
2025-01-01
Abstract
Advances in big data and the growing complexity of modern applications highlight the need to optimize workflow execution at several levels, such as hybrid workflow execution, automatic optimization of data movements, and efficient use of I/O. Along these lines, streaming is a desirable capability for file-based workflows because it can reduce overall execution time. Adding streaming capabilities to a workflow usually requires rewriting the application, which is time-consuming and demands deep knowledge of the application. In this work, we introduce the Cross-Application Programmable IO (CAPIO) methodology, whose stack consists of two parts: the CAPIO-CL coordination language and the CAPIO middleware, which implements the semantics expressed in CAPIO-CL. CAPIO-CL annotates the synchronization semantics between files produced and consumed by workflow steps. The CAPIO middleware, in turn, uses the information provided by CAPIO-CL to improve the performance of file-based workflows without changing (or recompiling) the code of the original workflow steps. By design, the CAPIO middleware supports multiple backends and can be extended to support more. It is dynamic and supports dynamic job scheduling. Experiments on both microbenchmarks and real-life workflows show that CAPIO can reduce workflow execution time by up to (Formula presented).
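The following minimal Python sketch illustrates what streaming between two file-based workflow steps means. A consumer step processes each line as soon as the producer step flushes it, rather than waiting for the whole file to be complete. This is not CAPIO's implementation and not CAPIO-CL syntax. Here the synchronization is coded by hand inside the steps, which is exactly the application rewriting that CAPIO aims to avoid by deriving this behavior transparently from CAPIO-CL annotations. The function names, the `record-i` line format, and the use of an in-process `threading.Event` as the hypothetical "file complete" signal are all illustrative assumptions.

```python
import os
import tempfile
import threading
import time


def producer(path: str, n: int, done: threading.Event) -> None:
    """Hypothetical workflow step A: appends one record per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"record-{i}\n")
            f.flush()  # make the line visible to other readers of the file
            time.sleep(0.01)  # simulate computation between outputs
    # Hypothetical "file committed" signal, set only after the file is closed.
    done.set()


def consumer(path: str, done: threading.Event, out: list) -> None:
    """Hypothetical workflow step B: consumes complete lines as they appear."""
    while not os.path.exists(path):  # wait until the producer creates the file
        time.sleep(0.001)
    with open(path, "r") as f:
        pending = ""  # holds a partially written line, if any
        while True:
            finished = done.is_set()  # sample the flag BEFORE reading
            line = f.readline()
            if line:
                pending += line
                if pending.endswith("\n"):
                    out.append(pending.rstrip("\n"))  # process one full record
                    pending = ""
            elif finished:
                break  # producer had finished before this EOF: nothing is left
            else:
                time.sleep(0.001)  # no new data yet: poll again


def run_pipeline(n: int) -> list:
    """Runs both steps concurrently and returns the records the consumer saw."""
    path = os.path.join(tempfile.mkdtemp(), "stream.txt")
    done = threading.Event()
    out: list = []
    c = threading.Thread(target=consumer, args=(path, done, out))
    p = threading.Thread(target=producer, args=(path, n, done))
    c.start()
    p.start()
    p.join()
    c.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(5))
```

The consumer checks the completion flag before each read. As a result, an empty read only ends the loop if the producer had already closed the file, so no trailing records are lost.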