C502: The Gaia AVU–GSR Parallel Solver: CUDA solutions for linear systems solving and covariances calculation toward Exascale infrastructures

Wednesday plenary 1: contributed talk

When

8:45 to 9 a.m., Nov. 8, 2023

Where

Theme: GPU implementations for core astronomical libraries


We ported the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) Parallel Solver, developed for the ESA Gaia mission, to the GPU with CUDA, optimizing a previous OpenACC port of the code. The code finds, with a precision of 10–100 μas, the astrometric parameters of ∼10^8 sources, the attitude and instrument settings of the Gaia satellite, and the parameter γ of the PPN formalism, by solving a system of linear equations, A×x=b, with the iterative LSQR algorithm. The coefficient matrix A of the final Gaia dataset is large, with ∼10^11 × (5×10^8) elements, where 10^11 is the number of equations, i.e., of stellar observations, and 5×10^8 is the number of unknowns, Nunk. Being sparse, A reaches a size of ∼10–100 TB, typical of Big Data analyses, which requires an efficient parallelization to obtain scientific results on reasonable timescales. The speedup of the CUDA code over the original AVU–GSR solver, parallelized on the CPU with MPI+OpenMP, increases with the system size and the number of resources, reaching a maximum of ∼14x, and exceeds 9x over the OpenACC code. These results were obtained by comparing the codes on the CINECA cluster Marconi100, which has four 16 GB V100 GPUs per node. We verified the agreement between the CUDA and the MPI+OpenMP solutions on a set of production systems. The CUDA code was then put in production on Marconi100, an essential step for an optimal AVU–GSR pipeline and for the successive Gaia Data Releases. We aim to port the production of this code to the CINECA Leonardo infrastructure, where we expect even higher performance, since this platform has 4x the GPU memory per node of Marconi100.
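To make the core operation concrete, the sketch below shows a minimal CUDA sparse matrix–vector product in CSR format, the kind of kernel that dominates each LSQR iteration (the full algorithm applies both A and its transpose per step; only the forward product is shown here). This is a toy example on a 2×3 system, not the production AVU–GSR kernel; the kernel name, launch configuration, and data layout are illustrative assumptions.

```cuda
// Minimal sketch: CSR sparse matrix-vector product y = A*x, the core
// operation of an LSQR iteration. One thread per matrix row.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spmv_csr(int n_rows, const int *row_ptr,
                         const int *col_idx, const double *val,
                         const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;  // one observation equation per row
    }
}

int main()
{
    // Toy 2x3 system standing in for the ~1e11 x 5e8 Gaia matrix:
    // A = [1 0 2; 0 3 0], x = [1, 1, 1]  ->  y = [3, 3]
    int    h_row_ptr[] = {0, 2, 3};
    int    h_col_idx[] = {0, 2, 1};
    double h_val[]     = {1.0, 2.0, 3.0};
    double h_x[]       = {1.0, 1.0, 1.0};
    double h_y[2];

    int *d_row_ptr, *d_col_idx;
    double *d_val, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_val, sizeof(h_val));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, sizeof(h_y));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    spmv_csr<<<1, 32>>>(2, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    printf("y = [%g, %g]\n", h_y[0], h_y[1]);

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```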

Besides the solution of a linear system, one can calculate the errors on the unknowns (the variances) and their covariances. Whereas the solution and variance arrays have size Nunk ~ 5×10^8, the variance–covariance matrix has ~Nunk^2/2 elements, which can occupy ~1 EB: (5×10^8)^2/2 ≈ 1.25×10^17 double-precision values, i.e., ~10^18 bytes. This represents a Big Data problem that cannot be solved with standard methods. To cope with this difficulty, we define a novel I/O-based strategy organized as a two-job pipeline: one job is dedicated to writing the files, while a second, concurrent job reads the files as they are created, iteratively computes the covariances, and deletes the files to avoid storage issues. In this way, the covariance calculation does not significantly slow down the AVU–GSR code for up to ~10^6 covariances.
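As a rough illustration of the consumer side of such a two-job pipeline, the host-code sketch below polls a shared directory, reads each file as it appears, accumulates its contribution, and deletes it to keep storage bounded. It is not the production AVU–GSR implementation: the directory name gsr_snapshots, the WRITER_DONE end-of-run marker, the assumption that the writer creates files atomically (write-then-rename), and the accumulate() stub standing in for the GPU covariance update are all illustrative.

```cuda
// Hedged sketch of the reader job in a two-job I/O pipeline: consume
// files as the writer produces them, accumulate, then delete them.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Placeholder for the iterative covariance update performed on the GPU
// in the real code; here it only counts the values consumed.
static void accumulate(const std::vector<double> &chunk, std::size_t &n_seen)
{
    n_seen += chunk.size();
}

int main()
{
    const fs::path dir  = "gsr_snapshots";      // assumed exchange directory
    const fs::path stop = dir / "WRITER_DONE";  // assumed end-of-run marker
    fs::create_directories(dir);
    std::size_t n_seen = 0;

    while (true) {
        bool found = false;
        for (const auto &entry : fs::directory_iterator(dir)) {
            // Assumes the writer renames completed files to *.bin,
            // so anything with that extension is safe to read.
            if (entry.path() == stop || entry.path().extension() != ".bin")
                continue;
            std::vector<double> chunk(entry.file_size() / sizeof(double));
            std::ifstream in(entry.path(), std::ios::binary);
            in.read(reinterpret_cast<char *>(chunk.data()),
                    chunk.size() * sizeof(double));
            in.close();
            accumulate(chunk, n_seen);
            fs::remove(entry.path());  // free storage immediately
            found = true;
        }
        if (!found) {
            if (fs::exists(stop)) break;  // writer finished, queue drained
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
    std::printf("consumed %zu values\n", n_seen);
    return 0;
}
```

Deleting each file immediately after it is consumed is what keeps the on-disk footprint to a small rolling window instead of the ~1 EB that the full variance–covariance matrix would require.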

These analyses represent a first step toward understanding the (pre-)Exascale behavior of a class of codes that share the structure of this one.

Acknowledgments: This work is supported by Spoke 1 "FutureHPC & BigData" of the ICSC (Centro Nazionale di Ricerca in HPC, Big Data and Quantum Computing) and its hosting entity, funded by the European Union – NextGenerationEU. This work was also supported by ASI grant No. 2018-24-HH.0, in support of the Italian participation in the Gaia mission, and by CINI, under the EUPEX project, EC H2020 RIA, EuroHPC-02-2020 grant No. 101033975.

Contacts

Valentina Cesare, Osservatorio Astrofisico di Catania