Scalable Multi-FPGA Design of a Discontinuous Galerkin Shallow-Water Model on Unstructured Meshes
DescriptionFPGAs are fostering interest as energy-efficient accelerators for scientific simulations, including for methods operating on unstructured meshes. Considering the potential impact on high-performance computing, specific attention needs to be given to the scalability of such approaches. In this context, the networking capabilities of FPGA hardware and software stacks can play a crucial role to enable solutions that go beyond a traditional host-MPI and accelerator-offload model. In this work, we present the multi-FPGA scaling of a discontinuous Galerkin shallow water model using direct low-latency streaming communication between the FPGAs. To this end, the unstructured mesh defining the spatial domain of the simulation is partitioned, the inter-FPGA network is configured to match the topology of neighboring partitions, and halo communication is overlapped with the dataflow computation pipeline. With this approach, we demonstrate strong scaling on up to eight FPGAs with a parallel efficiency of >80% and execution times per time step of as low as 7.6 µs. At the same time, with weak scaling, the approach allows to simulate larger meshes that would exceed the local memory limits of a single FPGA, now supporting meshes up to more than 100,000 elements and reaching an aggregated performance of up to 6.5 TFLOPs. Finally, a hierarchical partitioning approach allows for better utilization of the FPGA compute resources in some designs and, by mitigating limitations posed by the communication topology, enables simulations with up to 32 partitions on 8 FPGAs.
TimeTuesday, June 2714:30 - 15:00 CEST
Chemistry and Materials
Climate, Weather and Earth Sciences
Computer Science, Machine Learning, and Applied Mathematics