The growing demands for data processing by new data-intensive applications are putting pressure on the performance and capacity of HPC storage systems. The advancement in storage technologies, such as NVMe and persistent memory, are aimed at meeting these demands. However, relying solely on ultra-fast storage devices is not cost-effective, leading to the need for multi-tier storage hierarchies to move data based on its usage. To address this issue, ad-hoc file systems have been proposed as a solution. They utilise the available storage of compute nodes, such as memory and persistent storage, to create a temporary file system that adapts to the application behaviour in the HPC environment. This work presents the design, implementation, and evaluation of a distributed ad-hoc in-memory storage system (Hercules), highlighting the new communication model included in Hercules. This communication model takes advantage of the Unified Communication X framework (UCX). This solution leverages the capabilities of RDMA protocols, including Infiniband, Onmipath, shared memory, and zero-copy transfers. The preliminary evaluation results show excellent network utilisation compared with other existing technologies.

Hercules: Scalable and Network Portable In-Memory Ad-Hoc File System for Data-Centric and High-Performance Applications

Martinelli A. R.;Aldinucci M.;
2023-01-01

Abstract

The growing demands for data processing by new data-intensive applications are putting pressure on the performance and capacity of HPC storage systems. The advancement in storage technologies, such as NVMe and persistent memory, are aimed at meeting these demands. However, relying solely on ultra-fast storage devices is not cost-effective, leading to the need for multi-tier storage hierarchies to move data based on its usage. To address this issue, ad-hoc file systems have been proposed as a solution. They utilise the available storage of compute nodes, such as memory and persistent storage, to create a temporary file system that adapts to the application behaviour in the HPC environment. This work presents the design, implementation, and evaluation of a distributed ad-hoc in-memory storage system (Hercules), highlighting the new communication model included in Hercules. This communication model takes advantage of the Unified Communication X framework (UCX). This solution leverages the capabilities of RDMA protocols, including Infiniband, Onmipath, shared memory, and zero-copy transfers. The preliminary evaluation results show excellent network utilisation compared with other existing technologies.
2023
Inglese
contributo
1 - Conferenza
29th International European Conference on Parallel and Distributed Computing, Euro-Par 2023
cyp
2023
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Comitato scientifico
Springer Science and Business Media Deutschland GmbH
Berlino
GERMANIA
14100
679
693
15
978-3-031-39697-7
978-3-031-39698-4
Data intensive; HPC; In-memory storage
SPAGNA
   Third Party CINI - "ADMIRE - Adaptive multi-tier intelligent data manager for Exascale" - Call H2020-JTI-EuroHPC-2019-1 - Grant Agreement n. 956748 - CUP F69J21003450007
   ADMIRE
   EUROPEAN COMMISSION
   H2020
   ALDINUCCI M.-H2020 RIA - G.A. 956748
1 – prodotto con file in versione Open Access (allegherò il file al passo 6 - Carica)
6
info:eu-repo/semantics/conferenceObject
04-CONTRIBUTO IN ATTI DI CONVEGNO::04A-Conference paper in volume
Garcia-Blas J.; Sanchez-Gallegos G.; Petre C.; Martinelli A.R.; Aldinucci M.; Carretero J.
273
none
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1946992
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact