The Gaia AVU–GSR solver: CPU+GPU parallel solutions for linear systems solving and covariances calculation toward Exascale systems
Cesare V.; Lattanzi M. G.; Aldinucci M.; Bucciarelli B.
2024-01-01
Abstract
We ported the solver module of the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) pipeline for the ESA Gaia mission to GPUs using CUDA. The code determines the astrometric parameters of ∼10⁸ sources by solving a linear system with the LSQR algorithm. The coefficient matrix is large (10–50 TB) and sparse. On the CINECA cluster Marconi100, the CUDA code achieves a speedup of ∼14x over the original MPI + OpenMP solver. We migrated code production to Leonardo, which has 4x the GPU memory per node. This speedup was obtained without computing the system covariances. Their total number is N_unk × (N_unk − 1)/2, and with N_unk ∼ 5 × 10⁸ unknowns they occupy ∼1 EB. This “Big Data” problem cannot be solved with standard methods. We therefore defined a two-job, I/O-based pipeline: one job writes the files, and a second, concurrent job reads them, iteratively computes the covariances, and deletes them. Computing the covariances does not significantly slow down the code up to ∼8 × 10⁶ covariance elements.
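LSQR (Paige and Saunders) minimizes ‖Ax − b‖ using only products with A and Aᵀ, so it never forms AᵀA. Below is a minimal pure-Python sketch of the unpreconditioned, undamped iteration. It runs on a toy sparse matrix stored as lists of (column, value) pairs. The storage format, the function names `matvec`/`rmatvec`/`lsqr`, and the relative stopping test on ‖Aᵀr‖ are all illustrative assumptions. This is not the AVU–GSR CUDA implementation, which handles a 10–50 TB matrix.

```python
import math


def matvec(rows, x):
    """Return A @ x, with A stored as a list of sparse rows [(col, val), ...]."""
    return [sum(val * x[j] for j, val in row) for row in rows]


def rmatvec(rows, y, n):
    """Return A^T @ y by scattering row contributions (no explicit transpose)."""
    z = [0.0] * n
    for i, row in enumerate(rows):
        for j, val in row:
            z[j] += val * y[i]
    return z


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def lsqr(rows, b, n, tol=1e-10, max_iter=1000):
    """Minimal undamped LSQR: approximately minimizes ||A x - b||_2."""
    x = [0.0] * n
    beta = norm(b)
    if beta == 0.0:
        return x
    u = [t / beta for t in b]
    v = rmatvec(rows, u, n)
    alpha = norm(v)
    if alpha == 0.0:
        return x  # A^T b = 0, so x = 0 already solves the least-squares problem
    v = [t / alpha for t in v]
    w = v[:]
    phibar, rhobar = beta, alpha
    arnorm0 = alpha * beta  # ||A^T b||, used to scale the stopping test
    for _ in range(max_iter):
        # Golub-Kahan bidiagonalization: next u (length m) and v (length n)
        u = [a - alpha * t for a, t in zip(matvec(rows, v), u)]
        beta = norm(u)
        if beta > 0.0:
            u = [t / beta for t in u]
        v = [a - beta * t for a, t in zip(rmatvec(rows, u, n), v)]
        alpha = norm(v)
        if alpha > 0.0:
            v = [t / alpha for t in v]
        # Givens rotation that eliminates beta from the bidiagonal matrix
        rho = math.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta = s * alpha
        rhobar = -c * alpha
        phi = c * phibar
        phibar = s * phibar  # phibar estimates the residual norm ||b - A x||
        # Update the solution and the search direction
        x = [xi + (phi / rho) * wi for xi, wi in zip(x, w)]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
        # Stop when the estimate of ||A^T r|| is small relative to ||A^T b||
        if phibar * alpha * abs(c) <= tol * arnorm0:
            break
    return x


if __name__ == "__main__":
    # Toy overdetermined system: 3 observations, 2 unknowns
    rows = [[(0, 1.0)], [(1, 1.0)], [(0, 1.0), (1, 1.0)]]
    print(lsqr(rows, [1.0, 2.0, 3.5], n=2))  # close to [7/6, 13/6]
```

Each iteration costs one A·v and one Aᵀ·u product. In a sketch like this, those two loops dominate the run time. Sparse matrix–vector products are therefore the natural candidates when an LSQR solver is offloaded to GPUs.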
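The ∼1 EB figure follows directly from the covariance count N_unk × (N_unk − 1)/2. The short check below assumes each covariance is stored as an 8-byte double-precision value. The abstract does not state the element size, so the `bytes_per_elem` default is an assumption.

```python
def n_covariances(n_unk):
    """Unique off-diagonal covariance pairs (i < j): N_unk * (N_unk - 1) / 2."""
    return n_unk * (n_unk - 1) // 2


def storage_bytes(n_unk, bytes_per_elem=8):
    """Storage for all pairs, assuming one 8-byte double per covariance."""
    return n_covariances(n_unk) * bytes_per_elem


if __name__ == "__main__":
    n_unk = 5 * 10**8
    print(f"{n_covariances(n_unk):.3e} covariances")  # 1.250e+17 covariances
    print(f"{storage_bytes(n_unk) / 1e18:.2f} EB")    # 1.00 EB
```

The diagonal variances add only N_unk further elements, which is negligible at this scale.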
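The two-job, write/read/delete pattern can be sketched with two threads that stand in for the two concurrent batch jobs sharing a directory. Everything specific here is hypothetical:

- the file names, the JSON format, and the `DONE` sentinel;
- the payload, assumed to be one vector d_k per LSQR iteration;
- the update rule, which accumulates each requested covariance as Σ_k d_k[i]·d_k[j]. This is the form of LSQR's own variance estimate, extended to off-diagonal pairs.

This is not a description of the files the AVU–GSR pipeline actually writes. The writer publishes each file with `os.replace`, an atomic rename on the same filesystem, so the reader never opens a partially written file. The reader deletes each file once it is consumed, which keeps disk usage bounded.

```python
import json
import os
import tempfile
import threading
import time


def writer_job(workdir, vectors):
    """Hypothetical producer: writes one file per iteration, then a sentinel."""
    for k, d in enumerate(vectors):
        tmp = os.path.join(workdir, f"iter_{k:06d}.tmp")
        with open(tmp, "w") as f:
            json.dump(d, f)
        # Atomic rename: the reader only ever sees complete .json files
        os.replace(tmp, os.path.join(workdir, f"iter_{k:06d}.json"))
    # Sentinel, published last (also atomically): no more files will follow
    tmp = os.path.join(workdir, "DONE.tmp")
    open(tmp, "w").close()
    os.replace(tmp, os.path.join(workdir, "DONE"))


def reader_job(workdir, pairs, poll=0.01):
    """Hypothetical consumer: reads, accumulates covariances, deletes files."""
    cov = {p: 0.0 for p in pairs}
    while True:
        # Check the sentinel BEFORE listing, so no data file can be missed
        done = os.path.exists(os.path.join(workdir, "DONE"))
        files = sorted(n for n in os.listdir(workdir) if n.endswith(".json"))
        for name in files:
            path = os.path.join(workdir, name)
            with open(path) as f:
                d = json.load(f)
            for i, j in pairs:
                cov[(i, j)] += d[i] * d[j]
            os.remove(path)  # free disk space as soon as the file is consumed
        if done and not files:
            return cov
        if not files:
            time.sleep(poll)


if __name__ == "__main__":
    vectors = [[1.0, 2.0, 0.5], [0.5, -1.0, 1.0], [2.0, 0.0, -1.0]]
    pairs = [(0, 1), (0, 2), (1, 2)]
    with tempfile.TemporaryDirectory() as workdir:
        result = {}
        reader = threading.Thread(
            target=lambda: result.update(reader_job(workdir, pairs)))
        reader.start()
        writer_job(workdir, vectors)
        reader.join()
        print(result)              # {(0, 1): 1.5, (0, 2): -1.0, (1, 2): 0.0}
        print(os.listdir(workdir))  # ['DONE']
```

The order of the two checks in `reader_job` matters. The writer creates `DONE` only after its last data file, so any file that existed when the sentinel was seen is still picked up in the same pass.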



