

CAPIO: Cross-Application Programmable I/O

MARTINELLI, Alberto Riccardo
2024-10-16

Abstract

As the volume of digital data available for analysis and simulation continues to surge, I/O-intensive HPC workflows are poised for rapid expansion. This trend, however, threatens to widen the performance gap between computing, memory, and storage technologies, underscoring the importance of this research. A workflow describes a sequence of application steps and their control/data dependencies. In the HPC context, data dependencies are usually realized by having producer steps store data files in the distributed storage system and consumer steps read them back afterward. However, with the growing gap between computation speed and the speed of the central storage system, and the ever-increasing amount of data produced by scientific applications, sharing files between workflow steps through the file system is costly. Burst buffers and user-space ad hoc file systems have been proposed to increase the available I/O bandwidth and reduce contention on the shared file system by leveraging fast local storage. However, workflow steps must be executed in order according to the data dependency graph, and it can be difficult, or even impossible, to exploit pipeline parallelism among them. In-situ workflows were proposed to mitigate or avoid the cost of relying heavily on the file system as a communication medium and to enable temporal parallelism between workflow steps. In in-situ workflows, multiple steps execute concurrently; data dependencies are satisfied by sidestepping the file system through explicit coordination mechanisms among the steps. However, it is not always desirable, or even possible (e.g., with legacy code), to rewrite or patch existing workflows to enable in-situ orchestration through specific frameworks.
For this reason, we propose CAPIO (Cross-Application Programmable I/O), a middleware capable of transparently injecting I/O streaming capabilities into file-based workflows, improving the computation-I/O overlap without modifying the business code. The contribution is twofold: at design time, a new I/O coordination language allows users to annotate workflow data dependencies with synchronization semantics; at run time, a user-space software layer automatically turns a batch execution into a streaming execution according to the semantics expressed in the configuration file. CAPIO has been tested on synthetic benchmarks simulating typical I/O workflow patterns and on three real-world workflows. The results show that CAPIO provides performance improvements in data-intensive workflows that extensively use the file system as a communication medium. Looking ahead, tools like CAPIO could reshape HPC workflow orchestration strategies. For instance, they might enable the distribution of pre- and post-processing I/O-intensive phases across computation phases, fostering better overlap between steps by reducing applications' peak I/O demands. This prospect underscores the transformative potential of our research.
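To make the design-time contribution more concrete, the following is a purely illustrative sketch of how a coordination file might annotate a producer/consumer dependency with synchronization semantics. The field names, values, and structure here are hypothetical, chosen only to convey the idea of annotating file-based dependencies; they are not CAPIO's actual configuration syntax.

```
{
  "workflow": "simulation-analysis",
  "IO_Graph": [
    {
      "name": "simulator",
      "output_stream": [
        {
          "dirname": "snapshots",
          "committed": "on_close"
        }
      ]
    },
    {
      "name": "analyzer",
      "input_stream": ["snapshots"]
    }
  ]
}
```

Under an annotation like this, the run-time layer could release each file in `snapshots` to the consumer as soon as the producer closes it, rather than waiting for the entire producer step to terminate, which is what turns the batch execution into a streaming one.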
INFORMATICA
ALDINUCCI, Marco
Files in this record:
phdThesisMartinelli.pdf (Description: Thesis; Open access; 3.29 MB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2030844