Porting ICON to AMD GPUs: Lessons Learned
Description: The AMD Instinct MI250X GPU has rapidly become a key element of some of the most performant supercomputers in Europe and the USA. This steep ascent confronts HPC vendors and customers with significant challenges. While the system providers are committed to delivering a software stack that can leverage the significant potential of the new hardware early on, the application developers must port and optimize their codes for the new platform. This talk summarises the experiences gained with the ICON weather forecast model during the LUMI supercomputer procurement and acceptance. The core challenge consisted in bridging the performance and functionality gap between the NVIDIA compiler environment and the Cray Compiler Environment (CCE) for OpenACC, which required a significant effort, while the application itself was gradually adapted to run as efficiently as possible on the new hardware. We also report experiences with debugging and profiling tools from HPE, NVIDIA, and AMD, which played a crucial role in this work. The resulting implementation was tested successfully on up to 2150 LUMI nodes, demonstrating that Europe’s fastest supercomputer is ready for challenging workloads.
Time: Tuesday, June 27, 17:00 - 17:30 CEST
Computer Science, Machine Learning, and Applied Mathematics