P28 - GPU-Optimized Tridiagonal and Pentadiagonal System Solvers for Spectral Transforms in QuiCC
Description: QuiCC is a code under development designed to solve the equations of magnetohydrodynamics in a full sphere and other geometries. It uses a fully spectral approach to the problem, with the Jones-Worland polynomials as a radial basis and spherical harmonics as a spherical basis. We present an alternative to the quadrature approach to their evaluation: the polynomial connection approach, which is more accurate and requires less memory. In this work, we demonstrate an efficient GPU implementation of this algorithm. This poster focuses on the efficient tridiagonal and pentadiagonal GPU solvers used to evaluate the polynomial connections. Based on the Parallel Cyclic Reduction algorithm, they are optimized to perform on-chip data transfers exclusively, using warp shuffle instructions to exchange data directly between threads' registers. This results in the best occupancy (more registers per thread, more thread blocks per streaming multiprocessor) and full dispatch latency mitigation (no kernel synchronization during execution). The warp-shuffle approach to thread data exchange can be adapted for many other GPU algorithms, as it is developed in a runtime code generation platform designed for future algorithm reuse, originally based on the VkFFT library.
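As a rough illustration of the Parallel Cyclic Reduction idea named in the description, the sketch below solves a tridiagonal system in plain Python. It is not the QuiCC or VkFFT-based implementation: the names `shuffle` and `pcr_solve` are hypothetical, Python lists stand in for the per-thread registers of a warp, and `shuffle` only emulates the effect of a warp shuffle (lane i reading a value held by lane i + delta). The sketch assumes a system that needs no pivoting, for example a diagonally dominant one.

```python
def shuffle(regs, delta, fill):
    """Emulate a warp shuffle: lane i reads the value held by lane i + delta.

    Lanes whose source is out of range receive `fill` (an assumption of this
    sketch; real shuffle intrinsics handle out-of-range lanes differently).
    """
    n = len(regs)
    return [regs[i + delta] if 0 <= i + delta < n else fill for i in range(n)]


def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by parallel cyclic reduction.

    Each list index plays the role of one thread holding one row in registers.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0   # no unknown to the left of the first row
    c[-1] = 0.0  # no unknown to the right of the last row

    stride = 1
    while stride < n:
        # Fetch the rows `stride` lanes below and above; out-of-range lanes
        # supply the identity row (a, b, c, d) = (0, 1, 0, 0).
        a_lo, b_lo = shuffle(a, -stride, 0.0), shuffle(b, -stride, 1.0)
        c_lo, d_lo = shuffle(c, -stride, 0.0), shuffle(d, -stride, 0.0)
        a_hi, b_hi = shuffle(a, stride, 0.0), shuffle(b, stride, 1.0)
        c_hi, d_hi = shuffle(c, stride, 0.0), shuffle(d, stride, 0.0)

        new_a, new_b, new_c, new_d = [], [], [], []
        for i in range(n):  # on a GPU, every lane does this step at once
            alpha = -a[i] / b_lo[i]  # eliminates the coupling to x[i - stride]
            gamma = -c[i] / b_hi[i]  # eliminates the coupling to x[i + stride]
            new_a.append(alpha * a_lo[i])
            new_b.append(b[i] + alpha * c_lo[i] + gamma * a_hi[i])
            new_c.append(gamma * c_hi[i])
            new_d.append(d[i] + alpha * d_lo[i] + gamma * d_hi[i])
        a, b, c, d = new_a, new_b, new_c, new_d
        stride *= 2  # rows now couple to neighbours twice as far away

    # After about log2(n) steps every row is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

Calling `pcr_solve(a, b, c, d)` with four equal-length lists returns the solution as a list. Every row is updated in every step, which costs more arithmetic than a sequential elimination but needs only a logarithmic number of steps, and each step reads nothing but neighbouring lanes' values; this is the access pattern that register-to-register exchange suits.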
Time: Tuesday, June 27, 19:30 - 21:30 CEST