A Simulation-Based Framework to Reduce I/O Contention in HPC

Pernice, Simone (first author); Cantalupo, Barbara; Aldinucci, Marco
2025-01-01

Abstract

Emerging hardware constraints are pushing workloads to become more composite. As a result, HPC I/O systems are increasingly shared among multiple concurrent jobs. This sharing can generate load imbalances and contention along the end-to-end I/O paths, degrading the performance of both the I/O system and the workloads. In this context, we define a simulation-based framework that alleviates resource contention among applications and ultimately allows us to design contention-avoidance strategies. Specifically, by capturing system-wide behavior and extracting the phases and characteristics of various performance metrics, we can mitigate contention by delaying the launch of applications. The framework combines frequency-domain analysis of performance metrics with clustering methods, and is coupled with a comprehensive model of an HPC system implemented using Extended Stochastic Symmetric Nets.
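
As a rough illustration of the idea in the abstract (find the periodicity of an I/O metric in the frequency domain, then delay a launch so that I/O phases do not coincide), the following is a minimal sketch and not the authors' implementation. The function names, the synthetic trace, and the overlap-based delay rule are all assumptions made for this example; the clustering step and the Extended Stochastic Symmetric Net model are not shown.

```python
import cmath
import math


def dominant_period(samples, dt):
    """Period (in seconds) of the strongest non-DC component, via a plain DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dt / best_k


def overlap(running, incoming, shift):
    """Overlap of two traces when `incoming` is circularly delayed by `shift` samples."""
    n = len(running)
    return sum(running[i] * incoming[(i - shift) % n] for i in range(n))


def best_launch_delay(running, incoming, dt):
    """Hypothetical rule: smallest delay within one dominant period that minimizes overlap."""
    period = dominant_period(running, dt)
    max_shift = max(1, round(period / dt))
    shift = min(range(max_shift), key=lambda s: overlap(running, incoming, s))
    return shift * dt


# Synthetic bandwidth trace (assumed data): a 2 s I/O burst every 10 s, sampled at 1 s.
trace = [1.0 if t % 10 < 2 else 0.0 for t in range(60)]
period = dominant_period(trace, 1.0)
delay = best_launch_delay(trace, trace, 1.0)
print(period, delay)
```

Here a second job with the same I/O pattern is delayed just long enough for its bursts to fall into the idle part of the running job's period. A real scheduler would work from monitored metrics rather than a synthetic trace.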
2025
2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025
ita
2025
Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025
Institute of Electrical and Electronics Engineers Inc.
pp. 1258-1260
HPC; I/O; Markov Process; Monitoring; Performance Modeling
Pernice, Simone; Tarraf, Ahmad; Besnard, Jean-Baptiste; Cantalupo, Barbara; Cascajo, Alberto; Singh, David E.; Wolf, Felix; Carretero, Jesús; Shende, ...
Files in this record:
File: A_Simulation-Based_Framework_to_Reduce_I_O_Contention_in_HPC.pdf
Access: restricted
File type: publisher's PDF
Size: 3.53 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2101392