Overcoming Dynamic I/O Boundaries: a Double-Sided Streaming Methodology with dispel4py and CAPIO

Santimaria M. E.; Filgueira R.; Medic D.; Colonnelli I.; Aldinucci M.
2025-01-01

Abstract

This work introduces a novel double-sided streaming methodology that combines control-plane and data-plane streaming. Our goal is to implement the long-advocated separation of concerns in workflow orchestration without introducing artificial boundaries in workflow execution. Our approach is exemplified by the integration of the control-plane streaming provided by dispel4py with the transparent data-plane streaming provided by CAPIO. This integration eliminates file synchronization barriers without requiring modifications to existing workflow logic. To support this, we extend CAPIO with a new commit rule that allows streaming over dynamically generated file sets, enabling hybrid workflows that blend in-memory dataflows with file-based communication. We validate our approach using a real-world seismic cross-correlation workflow, achieving performance improvements between 23% and 40%. Unlike previous solutions, our method supports streaming across the entire workflow, including phase boundaries where file I/O would typically enforce strict execution ordering. Therefore, our approach can be straightforwardly extended to other multi-stage streaming applications.
Year: 2025
Conference: International Conference for High Performance Computing, Networking, Storage and Analysis (was Supercomputing Conference)
Location: Saint Louis, MO, USA
Proceedings: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery
Pages: 2269-2280
ISBN: 979-8-4007-1871-7
Keywords: Workflow Management Systems, dispel4py, CAPIO, cross correlation, control-plane streaming, data-plane streaming
Authors: Santimaria M.E.; Filgueira R.; Medic D.; Colonnelli I.; Aldinucci M.
Files in this item:
File: 3731599.3767577.pdf
Access: Open access
Description: Publisher's PDF
File type: Publisher's PDF
Size: 1.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2108570