CAPIO: a Middleware for Transparent I/O Streaming in Data-Intensive Workflows

Alberto Riccardo Martinelli (first author); Marco Aldinucci; Iacopo Colonnelli; Barbara Cantalupo
2023-01-01

Abstract

With the increasing amount of digital data available for analysis and simulation, the class of I/O-intensive HPC workflows is bound to expand quickly, further widening the performance gap between computing, memory, and storage technologies. This paper introduces CAPIO (Cross-Application Programmable I/O), a middleware that injects I/O streaming capabilities into file-based workflows, improving the overlap of computation and I/O without changes to the application code. The contribution is twofold: 1) at design time, a new I/O coordination language lets users annotate workflow data dependencies with synchronization semantics; 2) at run time, a user-space middleware automatically and transparently turns a batch workflow execution into a streaming execution, according to the semantics expressed in the configuration file. CAPIO has been tested on synthetic benchmarks that simulate typical workflow I/O patterns and on two real-world workflows. Experiments show that CAPIO reduces execution time by 10% to 66% for data-intensive workflows that use the file system as a communication medium.
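
The difference between batch and streaming execution described in the abstract can be illustrated with a minimal Python sketch. This is not CAPIO's implementation or API: CAPIO operates transparently on unmodified applications, whereas here the consumer is written explicitly to read a file while it is still being produced. The names `producer`, `streaming_consumer`, and `run_demo`, and the `done` event standing in for a "file is complete" synchronization rule, are all hypothetical.

```python
import os
import tempfile
import threading
import time


def producer(path, n_records, done):
    """Write records one by one, flushing so a reader can see partial output."""
    try:
        with open(path, "wb") as f:
            for i in range(n_records):
                f.write(f"record {i}\n".encode())
                f.flush()
                time.sleep(0.01)  # stand-in for computation between writes
    finally:
        done.set()  # hypothetical "file is complete" signal


def streaming_consumer(path, done):
    """Yield complete lines as they appear, without waiting for the writer to finish."""
    while not os.path.exists(path):
        time.sleep(0.001)
    with open(path, "rb") as f:
        pending = b""
        while True:
            finished = done.is_set()  # sample BEFORE reading to avoid losing the tail
            chunk = f.read()
            if chunk:
                pending += chunk
                *lines, pending = pending.split(b"\n")
                for line in lines:
                    yield line.decode()
            elif finished:
                break  # writer is done and nothing is left to read
            else:
                time.sleep(0.001)  # no new data yet; poll again


def run_demo(n_records=5):
    """Run producer and consumer concurrently on a temporary file."""
    done = threading.Event()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.out")
        writer = threading.Thread(target=producer, args=(path, n_records, done))
        writer.start()
        received = list(streaming_consumer(path, done))
        writer.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

In a batch execution the consumer would start only after the producer has closed the file; in this sketch it processes each record shortly after it is written, which is the kind of computation-I/O overlap the abstract refers to.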
Year: 2023
Conference: 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Location: Goa, India
Dates: 18-21 December 2023
Proceedings: 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
Publisher: IEEE
Pages: 153-163
ISBN: 979-8-3503-8322-5
Keywords: Workflow, In situ model, I/O coordination
Authors: Alberto Riccardo Martinelli, Massimo Torquati, Marco Aldinucci, Iacopo Colonnelli, Barbara Cantalupo
Files in this item:
CAPIO.pdf: publisher's PDF, Adobe PDF, 731.41 kB, restricted access
CAPIO-HiPC23-preprint.pdf: preprint (first draft), Adobe PDF, 1.04 MB, open access


Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1948632
Citations (Scopus): 2