The Gaia AVU-GSR parallel solver: Preliminary studies of a LSQR-based application in perspective of exascale systems

Cesare, Valentina;Lattanzi, Mario Gilberto;Aldinucci, Marco;Bucciarelli, Beatrice
2022-01-01

Abstract

The Gaia Astrometric Verification Unit-Global Sphere Reconstruction (AVU-GSR) Parallel Solver aims to find the astrometric parameters for 10^8 stars in the Milky Way, the attitude and the instrumental specifications of the Gaia satellite, and the global parameter gamma of the post-Newtonian formalism. The code iteratively solves a system of linear equations, A x = b, where the coefficient matrix A is large (10^11 x 10^8 elements) and sparse. To solve this system, the code uses a hybrid implementation of the iterative PC-LSQR algorithm, in which the computation for different horizontal portions of the coefficient matrix is assigned to separate MPI processes. In the original code, the work on each matrix portion is further parallelized over OpenMP threads. To improve performance further, we ported the application to the GPU, replacing the OpenMP parallelization language with OpenACC. In this port, 95% of the data is copied from the host to the device once, before the iteration cycle begins, making the code compute bound rather than data-transfer bound. The OpenACC code achieves a speedup of 1.5 over the OpenMP version, and further optimizations are in progress to obtain higher gains. The code runs on multiple GPUs and was tested on the CINECA supercomputer Marconi100, in anticipation of a port to the pre-exascale system Leonardo, which will be installed at CINECA in 2022.
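The row-wise decomposition described in the abstract can be illustrated with a minimal serial sketch in pure Python. It is not the AVU-GSR code: the "processes" are emulated by a loop over row blocks, the matrix is a small dense list of lists rather than a large sparse one, no preconditioning is applied (plain LSQR rather than PC-LSQR), and the names `split_rows`, `aprod1`, `aprod2`, and `lsqr` are hypothetical choices made for this example. The point is only to show which step stays local to a block (y = A x) and which step needs a sum over all blocks (z = A^T y).

```python
import math

def split_rows(A, nproc):
    """Split the rows of A into nproc contiguous blocks (one per emulated process)."""
    base, extra = divmod(len(A), nproc)
    blocks, start = [], 0
    for rank in range(nproc):
        size = base + (1 if rank < extra else 0)
        blocks.append(A[start:start + size])
        start += size
    return blocks

def aprod1(blocks, x):
    """y = A x: every block computes only its own rows; results are concatenated."""
    return [sum(a * xi for a, xi in zip(row, x)) for block in blocks for row in block]

def aprod2(blocks, y, n):
    """z = A^T y: every block forms a partial vector; the partials are summed.
    With real MPI processes this sum would be a reduction across ranks."""
    z, offset = [0.0] * n, 0
    for block in blocks:
        partial = [0.0] * n
        for i, row in enumerate(block):
            for j, a in enumerate(row):
                partial[j] += a * y[offset + i]
        offset += len(block)
        z = [zj + pj for zj, pj in zip(z, partial)]
    return z

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def lsqr(blocks, b, n, max_iter=50, tol=1e-12):
    """Unpreconditioned LSQR (Paige and Saunders) built on aprod1 and aprod2."""
    x = [0.0] * n
    beta = norm(b)
    if beta == 0.0:
        return x
    u = [bi / beta for bi in b]
    v = aprod2(blocks, u, n)
    alpha = norm(v)
    if alpha == 0.0:
        return x
    v = [vi / alpha for vi in v]
    w = list(v)
    phibar, rhobar = beta, alpha
    for _ in range(max_iter):
        # Bidiagonalization step: next u, then next v.
        u = [avi - alpha * ui for avi, ui in zip(aprod1(blocks, v), u)]
        beta = norm(u)
        if beta > tol:
            u = [ui / beta for ui in u]
            v = [ai - beta * vi for ai, vi in zip(aprod2(blocks, u, n), v)]
            alpha = norm(v)
            if alpha > tol:
                v = [vi / alpha for vi in v]
        else:
            beta = 0.0  # bidiagonalization has terminated
        # Plane rotation and solution update.
        rho = math.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta, rhobar = s * alpha, -c * alpha
        phi, phibar = c * phibar, s * phibar
        x = [xi + (phi / rho) * wi for xi, wi in zip(x, w)]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
        if beta == 0.0 or alpha <= tol or abs(phibar) <= tol:
            break
    return x

# Toy overdetermined system whose exact solution is x = (2, 3).
A_demo = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b_demo = [4.0, 3.0, 5.0, -1.0]
x_demo = lsqr(split_rows(A_demo, 2), b_demo, n=2)
print(x_demo)
```

In this sketch the number of blocks changes only how the work is divided, not the solution, which is the property that lets the matrix portions be handled by separate processes.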
Year: 2022
Volume: 41
Pages: 100660-100674
Massively parallel algorithms; Astrometry
Cesare, Valentina; Becciani, Ugo; Vecchiato, Alberto; Lattanzi, Mario Gilberto; Pitari, Fabio; Raciti, Mario; Tudisco, Giuseppe; Aldinucci, Marco; Bu...
Files in this item:

File: Technical_report_Gaia_MPI_OpenACC_Valentina_Cesare_et_al.pdf
Access: Open access
File type: Preprint (first draft)
Size: 1.87 MB
Format: Adobe PDF

File: 1-s2.0-S2213133722000749-main.pdf
Access: Restricted access
File type: Publisher's PDF
Size: 1.45 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1890597