Streaming I/O for scientific workflow engine acceleration

Santimaria, Marco Edoardo; Montella, Raffaele
2025-01-01

Abstract

Scientific workflows are increasingly characterized by complex task dependencies and large-scale data exchanges, which place significant pressure on the input/output (I/O) systems of traditional Workflow Engines (WFEs). These challenges are particularly evident in data-intensive and real-time processing contexts, where conventional disk-based I/O mechanisms often become performance bottlenecks. This paper presents an approach to enhancing the DAGonStar scientific workflow engine by integrating CAPIO, a middleware designed to support memory-based streaming I/O. The integration combines DAGonStar's orchestration capabilities with CAPIO's efficient data handling to better support workflows operating on continuous or large-scale datasets. We describe the architectural modifications introduced to enable this collaboration and provide an analysis of the resulting system. The proposed solution aims to improve the responsiveness and flexibility of scientific workflows by streamlining data transfers and simplifying task coordination. This work contributes to the evolution of workflow systems toward more efficient and scalable models for scientific computing.
2025
175
1
13
File-based pipeline acceleration; High-performance computing; In-memory file systems; Scientific workflows; Streaming I/O; Workflow optimization
Perrotta, Simone; De Vita, Ciro Giuseppe; Mellone, Gennaro; Santimaria, Marco Edoardo; Torquati, Massimo; Blas, Javier Garcia; Montella, Raffaele...
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2091750