# Posters

## Ab Initio Modeling of Magnetite Surfaces for Plutonium Retention

In many countries, thick steel casks are used for the containment of high-level radioactive waste in deep geological repositories. In contact with pore-water, steel corrodes, forming mixed iron oxides, mainly magnetite at the surface. After tens of thousands of years, casks may breach, allowing leaching of the radionuclides by pore-water. The magnetite can retard dissolved radionuclides either by adsorption or structural incorporation [1,2]. Our goal is to better understand these interaction mechanisms by using computer simulations alongside experiments [3]. The energetically favourable termination and stoichiometry of possible (111) Fe3O4 surfaces at repository-relevant conditions are determined using Kohn-Sham density functional theory with Hubbard correction (DFT+U) for Fe 3d electrons [4]. Further, classical molecular dynamics (MD) simulations are applied to investigate the interaction at the water-magnetite interface. Moreover, after determining the U value to describe Pu 5f electrons, ab initio MD simulations of sorption structures on expected magnetite (111) surfaces are performed.

[1] T. Dumas et al., ACS Earth Space Chem. 2019, 3, 2197-2206.

[2] R. Kirsch et al., Environ. Sci. Technol. 2011, 45, 7267-7274.

[3] E. Yalçintaş et al., Dalton Trans. 2016, 45, 17874-17885.

[4] A. Kéri et al., Environ. Sci. Technol. 2017, 51, 10585-10594.

Author(s): Anita S. Katheras (University of Bern), Konstantinos Karalis (University of Bern), Matthias Krack (Paul Scherrer Institute), Andreas C. Scheinost (Helmholtz-Zentrum Dresden-Rossendorf), and Sergey V. Churakov (Paul Scherrer Institute, University of Bern)

Domain: Chemistry and Materials

## Accurate Electronic Properties and Intercalation Voltages of Li-Ion Cathode Materials from Extended Hubbard Functionals

The design of novel cathode materials for Li-ion batteries requires accurate first-principles predictions of their properties. Density-functional theory (DFT) with standard (semi-)local functionals fails due to the strong self-interaction errors of partially filled d shells of transition-metal (TM) elements. Here, we show for phospho-olivine and spinel cathodes that DFT with extended Hubbard functionals correctly predicts the “digital” change in oxidation states of the TM ions for the mixed-valence phases occurring at intermediate Li concentrations, leading to voltages in remarkable agreement with experiments [1,2]. This is achieved thanks to the use of onsite and intersite Hubbard parameters computed from density-functional perturbation theory with Löwdin-orthogonalized atomic orbitals [3]. We thus show that the inclusion of intersite Hubbard interactions is essential for the accurate prediction of thermodynamic quantities when electronic localization occurs in the presence of inter-atomic orbital hybridization. This work paves the way for reliable first-principles studies of other families of cathode materials without relying on empirical fitting or calibration procedures.

[1] I. Timrov et al., PRX Energy 1, 033003 (2022).

[2] I. Timrov et al., arXiv:2301.11143 (2023).

[3] I. Timrov et al., PRB 103, 045141 (2021).

Author(s): Iurii Timrov (EPFL), Francesco Aquilante (EPFL), Michele Kotiuga (EPFL), Matteo Cococcioni (University of Pavia), and Nicola Marzari (EPFL)

Domain: Physics

## Addressing Exascale Challenges for Numerical Algorithms of Strongly Correlated Lattice Models

Strongly Correlated Lattice Models play an important role in our understanding of Quantum Magnetism, High-Tc superconductors, and also Quantum Simulators built from cold atoms, trapped ions, Rydberg atoms, or superconducting qubits. Wave function based numerical algorithms, such as Exact Diagonalization or Tensor Network Algorithms, are powerful state-of-the-art techniques in this field. Past efforts brought Shared Memory parallelisation for both methods, and also large scale MPI parallelisation for the former, to impressive levels. However, the current trend towards more GPU based architectures and additional architectural inhomogeneity in the HPC landscape call for renewed parallelisation efforts. In this poster we present the current state of the art in parallelisation, and discuss strategies and first results in porting some of these algorithms to the GPU and multi-GPU domain.

Author(s): Samuel Gozel (Paul Scherrer Institute), and Andreas M. Läuchli (Paul Scherrer Institute, EPFL)

Domain: Physics

## Analysis and Application of CNN to Improve Deterministic Optical Flow Nowcasting at DWD

Optical flow based nowcasting is essential for several operational products at DWD, including time-critical warnings. Precipitation and radar reflectivity nowcasts are produced every 5 minutes with a 5-minute stepping up to a 2 h lead time. The method assumes stationarity of the input data. It is a deterministic advection scheme without dynamic properties and does not take advantage of additional data sources. Recently, machine learning techniques were tested in nowcasting. Deterministic methods struggle to predict high-intensity values and become blurry for larger lead times. In this presentation we explore the potential of deterministic convolutional neural networks (CNN) to improve the operational nowcasting at DWD. A two-year dataset consisting of radar, NWP and orography data is used for training modified UNet based neural networks. The goals are to understand the technical limitations of the approach as well as the impact of the additional input data. Clever data manipulation and adaptation of the network architecture to its properties are key. An impact study for the input data is performed. We explore combination methods for several data encoders, additional computation blocks for more nonlinearity and loss functions with spatial context. Baselines for comparison are the operational nowcasting at DWD and CNN approaches from literature.

Author(s): Ulrich Friedrich (DWD)

Domain: Climate, Weather and Earth Sciences
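
As a rough illustration of the deterministic advection baseline described in this abstract, the sketch below extrapolates a field along a constant motion vector, which is what the stationarity assumption amounts to. It is not DWD's operational scheme: the function name `advect_nowcast`, the whole-cell displacement, and the periodic boundaries implied by `np.roll` are all simplifying assumptions made for the example.

```python
import numpy as np

def advect_nowcast(field, motion, n_steps):
    """Extrapolate a 2D field by repeatedly shifting it along a constant motion vector.

    field   : 2D array (e.g. radar reflectivity on a regular grid)
    motion  : (dy, dx) displacement in whole grid cells per time step (assumed constant)
    n_steps : number of lead-time steps to produce
    """
    dy, dx = motion
    frames = []
    current = field
    for _ in range(n_steps):
        # np.roll wraps around the domain edges; a real scheme would treat
        # inflow boundaries and sub-grid displacements by interpolation.
        current = np.roll(current, shift=(dy, dx), axis=(0, 1))
        frames.append(current)
    return np.stack(frames)

# 24 steps of 5 minutes correspond to the 2 h lead time mentioned in the abstract.
radar = np.zeros((64, 64))
radar[10:14, 20:24] = 35.0          # synthetic 4 x 4 precipitation cell
nowcast = advect_nowcast(radar, motion=(1, 2), n_steps=24)
```

The intensity of the cell never changes in such a scheme, only its position, which is why growth and decay of precipitation cannot be captured without additional information.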

## Analyzing Physics-Informed Neural Networks for Solving Classical Flow Problems

The application of Neural Networks (NNs) has been extensively investigated for fluid dynamic problems. A specific form of NNs are Physics-Informed Neural Networks (PINNs), which incorporate physics-based embeddings to account for physical laws. In this work, the accuracy of PINNs is compared to that of deep neural networks (DNNs). To this end, results obtained from PINNs and DNNs are compared to analytical solutions of four classical flow problems: Poiseuille flow, potential flow around a cylinder and a Rankine oval, and Blasius boundary layer flow. The findings show that the PINNs provide more accurate representations of the flow fields than their DNN counterparts for potential flow around a cylinder and Blasius boundary layer flow. The investigations show that in some flow problems, including information on problem physics, governing equations, and boundary conditions in the loss function of an NN can improve its prediction accuracy. Since PINNs are computationally expensive compared to DNNs, it is also investigated whether the accuracy gain of PINNs over DNNs is sufficiently high to justify the additional computational cost associated with their training.

Author(s): Rishabh Puri (Forschungszentrum Jülich), Mario Rüttgers (Forschungszentrum Jülich, RWTH Aachen University), Rakesh Sarma (Forschungszentrum Jülich), and Andreas Lintermann (Forschungszentrum Jülich)

Domain: Engineering
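
To make the idea of a physics-informed loss concrete, the following sketch evaluates a composite loss for plane Poiseuille flow, one of the four test cases named in the abstract. It is only an illustration under stated assumptions: the governing equation is taken as mu * u''(y) = -G with no-slip walls, the derivative is computed by finite differences as a stand-in for the automatic differentiation a real PINN would use, and the function name, weights and test profiles are hypothetical.

```python
import numpy as np

def pinn_style_loss(u, y, G=1.0, mu=1.0, u_data=None, w_pde=1.0, w_bc=1.0):
    """Composite loss for plane Poiseuille flow: mu * u''(y) = -G, u = 0 at both walls."""
    dy = y[1] - y[0]
    # Second derivative at interior points by central differences
    # (a PINN would obtain this by differentiating the network output).
    u_yy = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dy**2
    pde_residual = mu * u_yy + G
    loss_pde = np.mean(pde_residual**2)
    loss_bc = u[0]**2 + u[-1]**2                      # no-slip boundary conditions
    loss_data = 0.0 if u_data is None else np.mean((u - u_data)**2)
    return loss_data + w_pde * loss_pde + w_bc * loss_bc

y = np.linspace(0.0, 1.0, 101)
u_exact = 0.5 * y * (1.0 - y)             # analytical profile for G = mu = 1, channel height 1
u_wrong = 0.125 * np.sin(np.pi * y)       # same peak value and wall values, wrong shape

loss_exact = pinn_style_loss(u_exact, y)
loss_wrong = pinn_style_loss(u_wrong, y)
```

Both profiles satisfy the boundary conditions and share the same maximum, so a purely data-driven loss on a few sparse samples could struggle to tell them apart, whereas the residual term penalizes the non-physical one.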

## Application of Deep Learning and Reinforcement Learning to Boundary Control Problems

Many scientific problems, such as fluid dynamics problems involving drag reduction, temperature control with some desired flow pattern, etc., rely on optimal boundary control algorithms. These forward solves are performed for multiple simulation timesteps, and hence, a method to solve the boundary control problem with fewer computations would expedite these simulations. The goal of the boundary control problem is, in essence, to find the optimal values for the boundaries such that the values for the enclosed domain are as close as possible to desired values. Traditionally, the solution is obtained using nonlinear optimization methods, such as interior point, wherein the computational bottleneck is introduced by the large linear systems. Our objective is to use deep learning methods to solve boundary control problems faster than traditional solvers. We approach the problem using both supervised and unsupervised learning techniques. In supervised learning, we use traditional solvers to generate training, testing and validation data, and use Convolutional Neural Networks and/or Spatial Graph Convolutional Networks. In unsupervised learning, we use reinforcement learning wherein the reward function is a function of the network prediction, desired profile, governing differential equation and constraints. The computational experiments are performed on GPU-enabled clusters, demonstrating the viability of this approach.

Author(s): Zenin Easa Panthakkalakath (Università della Svizzera italiana), Juraj Kardoš (Università della Svizzera italiana), and Olaf Schenk (Università della Svizzera italiana, ETH Zurich)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Bridging the Language Gap: Classes for C++/Fortran Interoperability

Fortran and C++ remain popular languages for high-performance scientific computing. Interoperation of these two languages is of great interest; be it to take advantage of a mature ecosystem of libraries, or for coupling individual simulation codes into larger multi-scale or multi-physics applications. Fortran 2018 introduced enhanced facilities for interoperability with C including an API for passing and manipulating Fortran arrays (including those with allocatable and pointer attributes). This is achieved by means of a C descriptor – an opaque C structure type – and an accompanying library of functions. In this contribution I present a handful of new templated C++ classes that wrap the C descriptor and expose native semantic features of C++ including iterators, range-based for loops and elemental access operators. Implicit casts to std::span (C++20) and std::mdspan (C++23) containers enable efficient reuse of C++ routines. Reuse of Fortran routines from C++ is also simplified through the use of class template argument deduction (CTAD) to construct compatible C descriptors for popular C++ container and array types.

Author(s): Ivan Pribec (Leibniz Supercomputing Centre)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Building a Physics-Constrained, Fast and Stable Machine Learning-Based Radiation Emulator

Modeling the transfer of radiation through the atmosphere is a key component of weather and climate models. The operational radiation scheme in the Icosahedral Nonhydrostatic Weather and Climate Model (ICON) is ecRad. The radiation scheme ecRad is accurate but computationally expensive. It is operationally run in ICON on a grid coarser than the dynamical grid, and the time step interval between two calls is large. This is known to reduce the quality of the climate prediction. A possible approach to accelerate the computation of the radiation fluxes is to use machine learning methods. In this work, we study random forest and neural network emulations of ecRad. Concerning the neural network, we compare loss functions with an additional energy penalty term, and we observe that modifying the loss function is essential to accurately predict the heating rates. The random forest emulator, which is significantly faster to train than the neural network, is used as a reference model that the neural network must outperform. The random forest emulator can become extremely accurate, but its memory requirements quickly become prohibitive. Various numerical experiments are performed to illustrate the properties of the machine learning emulators.

Author(s): Guillaume Bertoli (ETH Zurich), Sebastian Schemm (ETH Zurich), Firat Ozdemir (Swiss Data Science Center), Fernando Perez Cruz (Swiss Data Science Center), and Eniko Szekely (Swiss Data Science Center)

Domain: Computer Science, Machine Learning, and Applied Mathematics
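
The abstract does not specify the form of the energy penalty term, so the sketch below shows one plausible construction: a mean squared error on the predicted fluxes plus a weighted mean squared error on the heating rates implied by those fluxes through the vertical flux divergence. The function names, the weight, and the synthetic profiles are assumptions made for illustration only.

```python
import numpy as np

G = 9.81                  # gravitational acceleration, m s^-2
CP = 1004.0               # specific heat of dry air at constant pressure, J kg^-1 K^-1
SECONDS_PER_DAY = 86400.0

def heating_rate(net_flux, pressure):
    """Heating rate (K/day) of each layer from net fluxes (W/m^2) at layer interfaces.

    The overall sign depends on the flux convention; it cancels in the squared penalty.
    """
    divergence = np.diff(net_flux, axis=-1) / np.diff(pressure, axis=-1)
    return SECONDS_PER_DAY * (G / CP) * divergence

def flux_loss_with_energy_penalty(flux_pred, flux_true, pressure, weight=1.0):
    """MSE on the fluxes plus a weighted MSE on the heating rates they imply."""
    mse_flux = np.mean((flux_pred - flux_true) ** 2)
    hr_error = heating_rate(flux_pred, pressure) - heating_rate(flux_true, pressure)
    return mse_flux + weight * np.mean(hr_error ** 2)

pressure = np.linspace(1.0e3, 1.0e5, 11)              # interface pressures in Pa
flux_true = 240.0 + 100.0 * pressure / 1.0e5          # synthetic net-flux profile
offset_pred = flux_true + 1.0                         # uniform bias: heating rates unchanged
zigzag_pred = flux_true + 0.5 * (-1.0) ** np.arange(11)   # small but oscillating error

loss_offset = flux_loss_with_energy_penalty(offset_pred, flux_true, pressure)
loss_zigzag_plain = flux_loss_with_energy_penalty(zigzag_pred, flux_true, pressure, weight=0.0)
loss_zigzag = flux_loss_with_energy_penalty(zigzag_pred, flux_true, pressure, weight=1.0)
```

A uniform flux bias leaves the heating rates untouched, while a small oscillating flux error is amplified by the vertical derivative. This suggests why a loss on fluxes alone may not be enough to obtain accurate heating rates.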

## Calculation of the Maximally Localized Wannier Functions in the SIRIUS Library

The electronic properties of materials are one of the major lines of research for studying existing materials and discovering novel ones. DFT+U and Koopmans spectral functionals constitute a good approach for correcting the DFT band structure, which is usually not accurate enough to predict some material properties, such as the band gap. Both functionals are implemented in Quantum ESPRESSO, and both can be calculated starting from the Maximally Localized Wannier Functions (MLWF). The calculation of the MLWF can be optimized using the SIRIUS library. On the one hand, the cost of the calculation can be reduced if one runs the minimization of the spread functional right after the DFT calculation, with no need to write and read all the information. On the other hand, all the bottlenecks in the calculation of the MLWF are matrix multiplications; hence, the calculation can be easily optimized on parallel architectures, either at the MPI level or by accelerating the program on GPUs. In this work, we present the implementation and validation of the computation of the MLWFs in the SIRIUS library, and we show preliminary performance results.

Author(s): Giovanni Consalvo Cistaro (EPFL), Nicola Colonna (Paul Scherrer Institute), Iurii Timrov (EPFL), Anton Kozhevnikov (ETH Zurich / CSCS), and Nicola Marzari (EPFL)

Domain: Chemistry and Materials

## Closing the Gap: Aligning Developers’ Expectations and Users’ Practices in Cloud Computing Infrastructure

There are often discrepancies between the uses that infrastructure developers envision for their technology and the way it is used in practice. We report on this gap between expectation and practice based on our ongoing study of the user experience on a national cyberinfrastructure system for scientific computing, the Cloud Infrastructure for Computing Platform (CICP). Through our observation and interviews with 15 CICP users, we found that infrastructure developers’ expectations of users’ learning processes differ from the users’ actual learning practices. Although CICP documentation and resources are intended to create common knowledge and behaviors, we have observed some local practices emerge through the influence of peers, mentors, and knowledge of other systems. We provide suggestions for minimizing the gap between expectations and practices that can be applied to other similar technology infrastructures in order to meet the goals of developers and improve user experience.

Author(s): Tamanna Motahar (University of Utah), Johanna Cohoon (University of Utah), Kazi Sinthia Kabir (University of Utah), and Jason Wiese (University of Utah)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Compressing Multidimensional Weather and Climate Data into Neural Networks

Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data. With compression ratios ranging from 300x to more than 3,000x, our method outperforms the state-of-the-art compressor SZ3 in terms of weighted RMSE and MAE. It can faithfully preserve important large-scale atmospheric structures and does not introduce artifacts. When using the resulting neural network as a 790x compressed dataloader to train the WeatherBench forecasting model, its RMSE increases by less than 2%. The three orders of magnitude compression democratizes access to high-resolution climate data and enables numerous new research directions.

Author(s): Langwen Huang (ETH Zurich), and Torsten Hoefler (ETH Zurich)

Domain: Climate, Weather and Earth Sciences
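
The principle of storing network parameters instead of grid values can be sketched with a toy example. The code below overfits a very small coordinate-based network to a synthetic 32 x 32 field using plain gradient descent in NumPy. The architecture, learning rate and data are assumptions chosen to keep the example short. They are not the authors' model, and the resulting compression ratio is far below the 300x to 3,000x reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "grid data": a smooth 2D field on a 32 x 32 grid.
n = 32
xs, ys = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)         # (1024, 2) network inputs
values = np.exp(-3.0 * (xs**2 + ys**2)).ravel()[:, None]    # (1024, 1) targets

# Tiny coordinate network: 2 -> 16 -> 1 with tanh activation.
hidden = 16
params = {
    "W1": rng.normal(0.0, 1.0, (2, hidden)), "b1": np.zeros(hidden),
    "W2": rng.normal(0.0, 0.1, (hidden, 1)), "b2": np.zeros(1),
}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def mse(p):
    return float(np.mean((forward(p, coords)[1] - values) ** 2))

initial_mse = mse(params)
lr = 0.02
for _ in range(3000):
    h, pred = forward(params, coords)
    d_pred = 2.0 * (pred - values) / len(values)       # gradient of the mean squared error
    d_h = (d_pred @ params["W2"].T) * (1.0 - h**2)     # backpropagation through tanh
    grads = {
        "W2": h.T @ d_pred, "b2": d_pred.sum(axis=0),
        "W1": coords.T @ d_h, "b1": d_h.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
final_mse = mse(params)

# The "compressed file" is the parameter set; decompression is a forward pass.
n_params = sum(v.size for v in params.values())
compression_ratio = values.size / n_params
```

Decompressing any grid point only requires evaluating `forward` at its coordinates, which is what makes it possible to use such a network directly as a data loader.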

## Denoising Electronic Signals from Particle Detectors Using a Flexible Deep Convolutional Autoencoder

In this work, we present the use of a deep convolutional autoencoder to denoise signals from particle detectors. The study of rare particle interactions is crucial in advancing our understanding of the Universe. However, the presence of electronic noise makes signal events difficult to distinguish from backgrounds, especially due to the infrequent nature of the interactions we are searching for. We begin with our recently published results on germanium detectors demonstrating that deep learning is more effective at removing noise than traditional approaches, while still preserving the underlying pulse shape well. We show that our approach also has practical implications on data storage and processing efficiency. To extend on our published work, we explored additional deep learning-based methods for signal denoising and modeling. We then present our simulations on implementing these algorithms at the hardware level directly for real-time denoising and data selection. Finally, we demonstrate that our approach is broadly applicable to other detector technologies and one-dimensional electronic signals.

Author(s): Mark Anderson (Queen’s University), Noah Rowe (Queen’s University), and Tianai Ye (Queen’s University)

Domain: Physics

## Detecting Financial Fraud with Graph Neural Networks

Detecting financial fraud is a challenging classification problem that entails the discovery of suspicious patterns in large-scale and time-evolving data. Traditionally, financial institutions have relied on rule-based methods to identify suspicious accounts, but such approaches become ineffective as the volume of transactions grows and criminal conduct gets more sophisticated. In this work, we capture the interdependent nature of monetary transactions in the form of directed graphs, with their nodes representing the financial entities involved, and their edges describing details regarding the transactions. We consider this format in both static and dynamic graph structures which mimic a time-evolving financial environment. Subsequently, deep learning anomaly detection approaches are employed with the aim of separating fraudulent and benign nodes using their historical data. We consider classifiers including Graph Convolutional Networks, Graph Attention Networks, and Long Short-Term Memory Autoencoders and apply them to a wide range of artificial data that simulate the real-world behavior and complexity of monetary transactions. The accuracy and time-to-solution of our results highlight the applicability of deep learning methods in problems encountered by the financial industry.

Author(s): Julien Schmidt (Università della Svizzera italiana), Dimosthenis Pasadakis (Università della Svizzera italiana), and Olaf Schenk (Università della Svizzera italiana)

Domain: Computer Science, Machine Learning, and Applied Mathematics
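
The graph representation described in this abstract, and a single propagation step of a graph convolutional network, could look roughly as follows. The transaction list, the two node features, and the choice to ignore edge direction and amounts during propagation are all assumptions made for this sketch. The weights are random and untrained, so the output only illustrates shapes and data flow.

```python
import numpy as np

# Hypothetical transactions: (sender, receiver, amount)
transactions = [(0, 1, 250.0), (1, 2, 40.0), (2, 0, 900.0), (3, 2, 15.0)]
n_accounts = 4

# Directed, weighted adjacency matrix: A[i, j] = total amount sent from i to j.
A = np.zeros((n_accounts, n_accounts))
for sender, receiver, amount in transactions:
    A[sender, receiver] += amount

# Simple node features: total outgoing and incoming volume per account.
X = np.stack([A.sum(axis=1), A.sum(axis=0)], axis=1)

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step with the commonly used symmetric normalization."""
    # Ignore direction and amounts for the propagation, and add self-loops.
    a_hat = ((adjacency + adjacency.T) > 0).astype(float) + np.eye(len(adjacency))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)    # ReLU activation

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))           # untrained weights, for illustration only
embeddings = gcn_layer(A, X, W)       # one 8-dimensional embedding per account
```

In a classifier, several such layers would be stacked and followed by an output layer that scores each node as fraudulent or benign.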

## Directive-Based, Fortran/C++ Interoperable Approach to GPU Offloading of the High Performance Gyrokinetic Turbulence Code GENE-X

Achieving high plasma confinement is the key to realizing commercially attractive energy production by magnetic confinement fusion (MCF) devices. Turbulence plays a significant role in maintaining the plasma confinement within MCF devices. The GENE-X code is based on an Eulerian (continuum) approach to the discretization of the five-dimensional gyrokinetic equation that describes plasma turbulence. Our discretization is specialized to simulate plasma turbulence anywhere within MCF devices, from the hot plasma core to the cold wall. GENE-X is written in object-oriented modern Fortran 2008, leveraging MPI+OpenMP parallelization to facilitate large-scale turbulence simulations. Here, we present our development efforts to further extend the parallelization scheme to GPUs, which is essential for scaling towards simulations of larger, reactor-relevant fusion devices. The current implementation in GENE-X provides a proof of concept of our native Fortran/C++ interoperability approach by successfully supporting several GPU backends such as OpenACC, OpenMP offload and CUDA. We present first benchmarks of our directive-based OpenACC implementation of the most computationally expensive part of GENE-X. A significant performance increase was achieved on the GPU, compared to equivalent CPU benchmarks.

Author(s): Jordy Trilaksono (Max Planck Institute for Plasma Physics), Philipp Ulbl (Max Planck Institute for Plasma Physics), Andreas Stegmeir (Max Planck Institute for Plasma Physics), and Frank Jenko (Max Planck Institute for Plasma Physics, University of Texas at Austin)

Domain: Physics

## Disruption Forecasting with Uncertainty Quantification for Fusion Plasmas with Deep Ensembles

We describe the extension of our existing deep learning model for the prediction of plasma instabilities in tokamak fusion reactors (FRNN) to ensembles of models, automatically constructed and trained using the DeepHyper framework for AutoML. Black-box optimization is performed on the long short-term memory (LSTM) network’s hyperparameters using a distributed Bayesian optimization method on a DGX A100 machine. The automated search of thousands of candidate models yields more than a hundred networks with significantly improved predictive accuracy relative to the original baseline FRNN. Several ensembles of these top-performing networks are constructed using different techniques, including greedy, top-k, and gradient-based selection criteria. The diversity within an ensemble provides robust uncertainty quantification of the overall prediction of plasma stability. The prediction’s total uncertainty is decomposed into epistemic and aleatoric uncertainty; the practical value of such a decomposition for tokamak machine operators and plasma control systems is discussed.

Author(s): Kyle Felker (Argonne National Laboratory)

Domain: Physics
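
One widely used way to split an ensemble's predictive uncertainty into aleatoric and epistemic parts is the entropy-based decomposition sketched below for a binary disruption/no-disruption output. Whether the poster uses this particular decomposition is not stated in the abstract, so the function names and the example probabilities should be read as assumptions.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty into aleatoric and epistemic parts.

    member_probs : array of shape (n_members, n_samples) holding each member's
                   predicted disruption probability.
    """
    total = binary_entropy(member_probs.mean(axis=0))        # entropy of the mean prediction
    aleatoric = binary_entropy(member_probs).mean(axis=0)    # mean of the member entropies
    epistemic = total - aleatoric                            # disagreement between members
    return total, aleatoric, epistemic

probs = np.array([
    [0.50, 0.05],   # member 1
    [0.50, 0.95],   # member 2
    [0.50, 0.50],   # member 3
])
total, aleatoric, epistemic = decompose_uncertainty(probs)
```

In the first column all members agree that the outcome is a coin flip, so the uncertainty is entirely aleatoric. In the second column the ensemble mean is the same, but the members disagree strongly, which shows up as a large epistemic term.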

## DNS of Strongly Turbulent Thermal Convection in a Non-Rotating Full Sphere

Body forces such as gravity can drive convective motion in fluids. Convection due to thermal gradients and the resulting buoyancy force is called thermal convection and occurs ubiquitously in nature. We present results on direct numerical simulations (DNS) of thermal convection in a non-rotating full sphere with different boundary conditions, the Rayleigh and Prandtl numbers being the non-dimensional control parameters. We aim to characterise fluid flow and heat transfer over a wide range of parameters, using as primary diagnostics a global Reynolds number as a measure of the vigour of the flow and the Nusselt number to quantify the strength of heat advection. DNS are run up to a Rayleigh number (Ra) of approximately 10^8 times the critical Rayleigh number at a Prandtl number of unity, yielding scaling relations between the diagnostics and Ra up to this regime of strong turbulence. The simulations were performed using the fully spectral, efficiently parallelised MHD/fluid dynamics code QuICC on CSCS’s Daint cluster.

Author(s): Tobias Sternberg (ETH Zurich), Philippe Marti (ETH Zurich), Giacomo Castiglioni (ETH Zurich), and Andrew Jackson (ETH Zurich)

Domain: Climate, Weather and Earth Sciences

## Docker Container in DWD’s Seamless INtegrated FOrecastiNg sYstem (SINFONY)

At Deutscher Wetterdienst (DWD), the SINFONY project has been set up to develop a seamless ensemble prediction system for convective-scale forecasting with forecast ranges of up to 12 hours. It combines nowcasting (NWC) techniques with numerical weather prediction (NWP) in a seamless way. So far, NWC and NWP run on two different IT infrastructure levels. Their combination requires a data transfer between both infrastructures, which slows down SINFONY, increases complexity and is prone to disturbances. These disadvantages are resolved by moving the interconnected parts of SINFONY onto a single architecture using Docker containers. To this end, Docker containers of the respective NWC components are created, and the container image build process is integrated into the existing CI/CD systems at DWD. For the application in the assimilation cycle, one container is already implemented in DWD’s development tool BACY. A major innovation of SINFONY is the rapid update cycle (RUC), an hourly refreshing NWP procedure with a forecast range of 8 hours. The container will be integrated into the RUC and used for the subsequent combination of NWP and NWC.

Author(s): Matthias Zacharuk (DWD)

Domain: Climate, Weather and Earth Sciences

## Doppler-Boosted Lasers: A New Path to Extreme QED Pair Plasmas in Light-Matter and Light-Quantum Vacuum Interactions

How does light interact with matter or the quantum vacuum at intensities where the physics is governed by Quantum Electrodynamics (QED)? What are the properties of the QED electron-positron pair plasma produced in those interactions? Can the probing of this plasma help address open problems in quantum field theory and astrophysics? Answering these questions requires light intensities far beyond the ones achieved by the most intense PetaWatt (PW) laser on Earth. To break this barrier, we recently proposed new schemes to considerably ‘boost’ the intensity of present lasers by the Doppler effect, employing physical systems called ‘relativistic plasma mirrors’. In this poster, we will first introduce the novel schemes that have been developed at Commissariat à l’Energie Atomique (CEA) to probe novel QED plasma states in light-matter and light-quantum vacuum interactions. We will then present the exascale simulation tools that we have co-developed with the Lawrence Berkeley National Lab (LBNL) to understand the basic physics of these QED plasmas and help identify clear strong-field QED (SF-QED) signatures that should be observable in experiments. Our simulations will be key to stimulating, designing and guiding experiments intended to detect these signatures at PW laser facilities.

Author(s): Henri Vincenti (CEA), Luca Fedeli (CEA), Neil Zaim (CEA), Antonin Sainte-Marie (CEA), Pierre Bartoli (CEA), Thomas Clark (CEA), Jean-Luc Vay (Lawrence Berkeley National Laboratory), and Axel Huebl (Lawrence Berkeley National Laboratory)

Domain: Physics

## Efficient Data Management in Fully Spectral Dynamo Simulations on Heterogeneous Nodes

Our CFD framework QuICC, based on a fully spectral method, has been successfully used for various dynamo simulations in spherical and Cartesian geometries. It runs efficiently on a few thousand cores using a 2D data distribution based on a distributed memory paradigm (MPI). In order to better harness the computing power of current and upcoming HPC systems, we present our work on refactoring the framework to introduce a hybrid distributed and shared memory parallelization (MPI + X). Our fully spectral method in a spherical geometry leads to 3D sparse tensors with a well-defined block structure. Our strategy is based on the principle of separation of concerns, which is applied on multiple levels. The operator API maps to mathematical operations on tensors, without knowledge of the data layout or back-end. The tensors are represented by a type that we call “View”, which encodes sparsity and memory layout. Refactoring toward the new API and data layout results in a code base that has a lower memory footprint and is more composable, and thus easier to maintain and to extend to different back-ends. The API and a performance comparison for different operators and back-ends (CPU and GPU) will be presented.

Author(s): Giacomo Castiglioni (ETH Zurich), Philippe Marti (ETH Zurich), Dmitrii Tolmachev (ETH Zurich), Daniel Ganellari (ETH Zurich / CSCS), and Andy Jackson (ETH Zurich)

Domain: Climate, Weather and Earth Sciences

## Enabling GENE for Exascale Computing via Modern Data Science

The computational power needed for theoretical plasma turbulence studies is massive because they typically employ numerical solutions of integro-differential equations in 5 or 6 dimensions on a very large parameter space. Lower precision arithmetic has advantages like faster throughput, reduced communication, and lower energy costs, but the results must be precisely measured. Special attention is given to the leading gyrokinetic plasma turbulence code GENE, which allows single and double precision computations. One of the aims of the National DaREXA-F (Data Reduction for Exascale Applications of Fusion Research) project is to create techniques for using reduced-precision arithmetic on contemporary architectures while reducing the amount of data required for transfer operations. Analysing GENE, several sub-structures emerge that perform specific tasks, hence the goal is to determine which of them benefit most from low precision. A further strategy is to develop adaptive precision algorithms so the appropriate precision is adopted dynamically. Alongside this, an error model is devised so that the statistical properties of new simulations are controlled. Scalability on heterogeneous systems must be improved so that the power of exascale systems is fully exploited.

Author(s): Luciana Tanzarella (Max Planck Institute for Plasma Physics, Max Planck Computing and Data Facility)

Domain: Physics
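
The trade-off behind reduced-precision arithmetic can be illustrated by measuring how rounding errors grow when the same kernel is run in different floating-point formats. The sketch below uses a central-difference derivative as a stand-in kernel. It is unrelated to the GENE code itself, and the grid size and choice of formats are assumptions made for the example.

```python
import numpy as np

def central_difference(f_values, dx):
    """First derivative at interior points, computed in the dtype of f_values."""
    return 0.5 * (f_values[2:] - f_values[:-2]) / dx

x64 = np.linspace(0.0, 2.0 * np.pi, 257)
dx64 = x64[1] - x64[0]
reference = central_difference(np.sin(x64), dx64)     # float64 baseline

errors = {}
for dtype in (np.float32, np.float16):
    f = np.sin(x64).astype(dtype)                     # round the input data to lower precision
    approx = central_difference(f, dtype(dx64))       # arithmetic stays in that precision
    errors[dtype.__name__] = float(np.max(np.abs(approx.astype(np.float64) - reference)))
```

Because the reference uses the same stencil in double precision, the discretization error cancels and `errors` isolates the effect of the number format. A measurement of this kind is one ingredient for deciding which parts of a code tolerate low precision.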

## Evaluation of GPU Accelerated Machine Learning Algorithms for Energy Price Prediction

The Locational Marginal Pricing (LMP) mechanism is a way to calculate the cost of providing electricity to a specific point in the grid. Accurate forecasting of LMP is important for market participants such as power producers or financial institutions to optimize operations and bidding strategies. The LMP is calculated using the optimal power flow (OPF) problem, which is a constrained nonlinear optimization problem to determine the least-cost power generation in the grid. However, this can be a time-consuming and computationally demanding task. Recent efforts have focused on using machine learning techniques, such as Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, and Deep Neural Networks, to predict LMP more efficiently. Modern machine learning libraries like Scikit-Learn and PyTorch are optimised to use multi-core CPU and GPU architectures that are common in modern High-Performance Computing (HPC) clusters. These models have been tested on multiple electricity grids and found to be 4-5 orders of magnitude faster than traditional methods. However, they do have slightly higher error rates on edge-case scenarios. Overall, there is a strong case for using machine learning models for LMP prediction on large scale electricity grids with the aid of HPC resources.

Author(s): Naga Venkata Sai Jitin Jami (Università della Svizzera italiana, Friedrich-Alexander-Universität Erlangen-Nürnberg), Juraj Kardoš (Università della Svizzera italiana), Olaf Schenk (Università della Svizzera italiana), and Harald Köstler (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Geodynamo Simulations in a Full Sphere

Although the geomagnetic field has existed for about 4 Gyr, recent estimates for the formation of the Earth’s inner core go back no further than 500 Myr to 1 Gyr. Here we run rapidly rotating dynamos in a full sphere geometry, representative of the Earth’s dynamo before the nucleation of the inner core. Numerically, the full sphere poses the difficulty of adequately treating the singularity at the center. We perform a set of numerical simulations using a fully spectral simulation framework, where a careful choice of the radial basis functions resolves this singularity and allows us to systematically study the influence of three non-dimensional parameters, namely the Ekman number E (measuring the ratio of viscous to Coriolis forces), the Rayleigh number Ra (measuring the convective forcing) and the magnetic Prandtl number Pm (ratio of viscous to magnetic diffusivity). The output of our simulations allows us to characterize the dynamo regime as a function of Ra and Rm, which differs from similar diagrams for spherical shell geometry. In particular, we find that the regime of dipolar magnetic fields is narrower in the full sphere and a larger Pm is needed for dynamo action. Finally, we derive scaling laws for input and output parameters.

Author(s): Fabian Burmann (ETH Zurich), Jiawen Luo (ETH Zurich), Philippe David Marti (ETH Zurich), and Andrew Jackson (ETH Zurich)

Domain: Climate, Weather and Earth Sciences

## Ginkgo — A High-Performance Portable Numerical Linear Algebra Software

Numerical linear algebra building blocks are used in many modern scientific application codes. Ginkgo is an open-source numerical linear algebra software designed around the principles of portability, flexibility, usability, and performance. The Ginkgo library is integrated into the deal.II, MFEM, OpenFOAM, HyTeG, Sundials, XGC, HiOp, and OpenCARP scientific applications, ranging from finite element libraries to CFD, power grid optimization, and heart simulations. The Ginkgo library grew from a math library supporting CPUs and NVIDIA GPUs to an ecosystem that has native support for GPU architectures from NVIDIA, AMD, and Intel, and which can scale up to hundreds of GPUs. One of the keys to this success is the rapid development and availability of new algorithmic functionalities in the Ginkgo library such as, but not limited to, multigrid preconditioners, advanced mixed-precision iterative solvers and preconditioners, sparse iterative batched functionality, sparse direct solvers, and the distributed MPI-based backend. This poster will present Ginkgo’s library design, performance results on a wide range of hardware, and integration within key applications.

Author(s): Terry Cojean (Karlsruhe Institute of Technology), Isha Aggarwal (Karlsruhe Institute of Technology), Natalie Beams (University of Tennessee), Hartwig Anzt (University of Tennessee), Yen-Chen Chen (Karlsruhe Institute of Technology), Thomas Grützmacher (Karlsruhe Institute of Technology), Fritz Göbel (Karlsruhe Institute of Technology), Marcel Koch (Karlsruhe Institute of Technology), Gregor Olenik (Karlsruhe Institute of Technology), Pratik Nayak (Karlsruhe Institute of Technology), Tobias Ribizel (Karlsruhe Institute of Technology), and Yu-Hsiang Tsai (Karlsruhe Institute of Technology)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Global Sensitivity Analysis of High-Dimensional Models with Correlated Inputs

Global sensitivity analysis is an important tool used in many domains of computational science to either gain insight into the mathematical model and the interaction of its parameters or study the uncertainty propagation through the input-output interactions. This work introduces a comprehensive framework for conducting global sensitivity analysis on models with correlated inputs. Traditional sensitivity analysis methods assume independence between inputs and can provide misleading results when this assumption is violated. The proposed approach addresses parameter correlations using transformations such as Rosenblatt and Cholesky, which are incorporated into a polynomial surrogate model used to evaluate sensitivity indices. The effectiveness of the method is demonstrated through numerical experiments, which are conducted using the EasyVVUQ framework. The sensitivity analysis requires numerous executions of the target application and therefore significant computational resources. The numerical experiments are thus executed using HPC platforms equipped with a metascheduler and workflow automation tools. The results of these experiments are discussed and provide insights into the impact of correlated inputs on the sensitivity analysis.

Author(s): Juraj Kardos (Università della Svizzera italiana), Olaf Schenk (Università della Svizzera italiana), Derek Groen (Brunel University London), and Diana Suleimenova (Brunel University London)

Domain: Computer Science, Machine Learning, and Applied Mathematics
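The Cholesky transformation mentioned in the abstract above can be illustrated with a small, self-contained sketch. It is not the authors' framework and does not use EasyVVUQ. It only shows the basic idea of mapping independent standard normal inputs `z` to correlated inputs `x = L z`, where `L` is the Cholesky factor of an assumed correlation matrix. The function names, the 2x2 target correlation of 0.8, and the Gaussian input distribution are all assumptions chosen for illustration.

```python
import math
import random


def cholesky(matrix):
    """Return the lower-triangular factor L with matrix = L * L^T."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlated_normal_samples(correlation, n_samples, seed=0):
    """Map independent standard normal draws z to correlated draws x = L z."""
    rng = random.Random(seed)
    lower = cholesky(correlation)
    dim = len(correlation)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        samples.append(x)
    return samples


def sample_correlation(samples, i, j):
    """Pearson correlation between components i and j of the samples."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    cov = sum((s[i] - mean_i) * (s[j] - mean_j) for s in samples)
    var_i = sum((s[i] - mean_i) ** 2 for s in samples)
    var_j = sum((s[j] - mean_j) ** 2 for s in samples)
    return cov / math.sqrt(var_i * var_j)


# Hypothetical target correlation between two model inputs.
target = [[1.0, 0.8], [0.8, 1.0]]
samples = correlated_normal_samples(target, 20000)
print(sample_correlation(samples, 0, 1))
```

A surrogate model would typically be built in the independent variables `z`, with the transformation applied before each model evaluation. The inverse map is what lets independence-based sensitivity indices be reused.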

## GPU-Accelerated Modelling of Greenhouse Gases and Air Pollutants in ICON with OpenACC

Releasing excess greenhouse gases into the atmosphere is the major cause of the alteration of its natural composition and of climate change. Computational modelling of the atmospheric chemistry and transport processes has played a vital role in enhancing our understanding of such complex phenomena and has helped develop major policy guidelines. Advanced high performance computing systems with heterogeneous architectures that utilise “accelerators” such as Graphics Processing Units (GPUs) have provided the opportunity to develop scale-resolving high-resolution numerical weather prediction frameworks with complex atmospheric chemistry, physics and transport models. OpenACC compiler directives provide a relatively fast and high-level coding approach for such heterogeneous computing. In the present work, OpenACC is used to accelerate the “Online Emission Module (OEM)” of the Fortran-based atmospheric chemistry and transport model ICON-ART. The module reads in a small number of emission fields and handles essential processing steps during the simulation. The GPU-accelerated OEM code is tested on cases of methane distribution over Europe. The newly developed code exhibits a noticeable speed-up, comparable with that of the baseline ICON GPU code. The GPU-accelerated OEM will be used to perform ICON-ART simulations coupled with the Carbon Tracker Data Assimilation Shell (CTDAS) at a higher spatial resolution than is currently possible.

Author(s): Arash Hamzehloo (Empa), and Dominik Brunner (Empa)

Domain: Climate, Weather and Earth Sciences

## GPU-Optimized Tridiagonal and Pentadiagonal System Solvers for Spectral Transforms in QuiCC

QuiCC is a code under development designed to solve the equations of magnetohydrodynamics in a full sphere and other geometries. It uses a fully spectral approach to the problem, with the Jones-Worland polynomials as a radial basis and spherical harmonics as a spherical basis. We present an alternative to the quadrature approach to their evaluation: the polynomial connection approach, which is more accurate and requires less memory. In this work, we demonstrate an efficient GPU implementation of this algorithm. This poster focuses on the efficient tridiagonal and pentadiagonal GPU solvers used to evaluate the polynomial connections. Based on the Parallel Cyclic Reduction algorithm, they are optimized to perform exclusively on-chip data transfers through warp shuffle instructions, exchanging data directly between threads’ registers. This results in the best occupancy (more registers per thread, more thread blocks per streaming multiprocessor) and full dispatch latency mitigation (no kernel synchronization during execution). The warp-shuffle approach to thread data exchange can be adapted for many other GPU algorithms, as it is developed in a runtime code generation platform designed for future algorithm reuse, originally based on the VkFFT library.

Author(s): Dmitrii Tolmachev (ETH Zurich), Philippe Marti (ETH Zurich), Giacomo Castiglioni (ETH Zurich), Daniel Ganellari (ETH Zurich / CSCS), and Andrew Jackson (ETH Zurich)

Domain: Computer Science, Machine Learning, and Applied Mathematics
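As a rough illustration of the Parallel Cyclic Reduction algorithm named in the abstract above, the following sketch solves a tridiagonal system in plain Python. It is a serial emulation under stated assumptions, not the QuiCC or VkFFT-based implementation. The function name `pcr_solve` and the test system are hypothetical, no pivoting is performed (so a diagonally dominant matrix is assumed), and the warp-shuffle data exchange is replaced by ordinary list reads.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with Parallel Cyclic Reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. No pivoting is performed.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        new_a, new_b, new_c, new_d = [0.0] * n, list(b), [0.0] * n, list(d)
        # On a GPU each row i would be owned by one thread. The loop body
        # only reads the old arrays, so all rows can be updated independently.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                # Eliminate the coupling to x[i - stride] using row lo.
                k1 = a[i] / b[lo]
                new_a[i] = -a[lo] * k1
                new_b[i] -= c[lo] * k1
                new_d[i] -= d[lo] * k1
            if hi < n:
                # Eliminate the coupling to x[i + stride] using row hi.
                k2 = c[i] / b[hi]
                new_c[i] = -c[hi] * k2
                new_b[i] -= a[hi] * k2
                new_d[i] -= d[hi] * k2
        a, b, c, d = new_a, new_b, new_c, new_d
        stride *= 2
    # After about log2(n) steps every row is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Hypothetical diagonally dominant test system with a known solution.
n = 8
lower = [0.0] + [1.0] * (n - 1)
diag = [4.0] * n
upper = [1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
rhs = [
    (lower[i] * x_true[i - 1] if i > 0 else 0.0)
    + diag[i] * x_true[i]
    + (upper[i] * x_true[i + 1] if i < n - 1 else 0.0)
    for i in range(n)
]
print(pcr_solve(lower, diag, upper, rhs))
```

In each step a row reads the coefficients of the rows `stride` positions away. With one row per thread this access pattern can be served by register-to-register exchanges within a warp, which is the property the abstract exploits.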

## GT4Py: A Python Framework for the Development of High-Performance Weather and Climate Applications

GT4Py is a Python framework for weather and climate applications simplifying the development and maintenance of high-performance codes in prototyping and production environments. GT4Py separates model development from hardware architecture dependent optimizations, instead of intermixing the two in source code, as is regularly done in lower-level languages like Fortran, C, or C++. Domain scientists focus solely on numerical modeling using a declarative embedded domain specific language supporting common computational patterns of dynamical cores and physical parametrizations. An optimizing toolchain then transforms this high-level representation into a finely-tuned implementation for the target hardware architecture. This separation of concerns allows performance engineers to implement new optimizations or support new hardware architectures without requiring changes to the application, increasing productivity for domain scientists and performance engineers alike. We will present recent developments in the project: support for non-cartesian meshes, a new programming interface enabling operator composition, and a redesigned intermediate representation that separates stateful from stateless computations, simplifying the creation of optimization passes. We further showcase a performance comparison of the ICON model between the original Fortran implementation using OpenACC and a GT4Py-enabled version developed as part of the EXCLAIM project.

Author(s): Mauro Bianco (ETH Zurich / CSCS), Till Ehrengruber (ETH Zurich / CSCS), Nicoletta Farabullini (ETH Zurich), Abishek Gopal (ETH Zurich), Linus Groner (ETH Zurich / CSCS), Rico Häuselmann (ETH Zurich / CSCS), Peter Kardos (ETH Zurich), Samuel Kellerhals (ETH Zurich), Magdalena Luz (ETH Zurich), Christoph Müller (MeteoSwiss), Enrique G. Paredes (ETH Zurich / CSCS), Matthias Roethlin (MeteoSwiss), Felix Thaler (ETH Zurich / CSCS), Hannes Vogt (ETH Zurich / CSCS), Benjamin Weber (MeteoSwiss), and Thomas C. Schulthess (ETH Zurich / CSCS)

Domain: Climate, Weather and Earth Sciences

## High Performance Computing Meets Approximate Bayesian Inference

Despite the ongoing advancements in Bayesian computing, large-scale inference tasks continue to pose a computational challenge that often requires a trade-off between accuracy and computation time. Combining solution strategies from the field of high-performance computing with state-of-the-art statistical learning techniques, we present a highly scalable approach for performing spatial-temporal Bayesian modelling based on the methodology of integrated nested Laplace approximations (INLA). The spatial-temporal model component is reformulated as the solution to a discretized stochastic partial differential equation which induces sparse matrix representations for increased computational efficiency. We leverage the power of today’s distributed compute architectures by introducing a multi-level parallelism scheme throughout the algorithm. Moreover, we rethink the computational kernel operations and derive GPU-accelerated linear algebra solvers for fast and reliable model predictions.

Author(s): Lisa Gaedke-Merzhäuser (Università della Svizzera italiana), Haavard Rue (King Abdullah University of Science and Technology), and Olaf Schenk (Università della Svizzera italiana)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## High-Performance Computing by and for Patient Specific Mechanical Properties

Modeling the mechanical behavior of human trabecular bones improves technical applications and the treatment of fractures and bone or joint related diseases. However, this type of bone consists of a large number of struts and plates, resulting in a highly anisotropic and patient specific behavior. Furthermore, the tissue adapts over time and cannot be resolved in situ by low-resolution clinical Computed Tomography (cCT) scanners. With high-resolution microfocus CT (µCT) scanners that image the specimen ex vivo, the detailed shape can be derived with binary segmentation and direct discretization. We have developed a massively parallel software stack to compute the mechanical properties based on µCT and cCT images. We can not only solve such biomechanical challenges, but also analyze the power draw and the energy usage of our high-performance computer HLRS Hawk in detail. Analyzing the hardware logging and combining these data with morphometric numbers of the bone leads to new approaches for saving energy and accelerating the computations. Overall, we show that targeting applications on different scales (cCT and µCT) with high-performance computers and improving the use of these computers can be accomplished simultaneously.

Author(s): Johannes Gebert (High-Performance Computing Center Stuttgart, University of Stuttgart), Ralf Schneider (High-Performance Computing Center Stuttgart, University of Stuttgart), and Michael Resch (High-Performance Computing Center Stuttgart, University of Stuttgart)

Domain: Life Sciences

## High-Throughput Computational Screening of Fast Li-Ion Conductors

We present a high-throughput computational screening for fast Li-ion conductors, aimed at identifying promising candidate materials for application in solid-state electrolytes. Starting with ~30,000 experimental structures sourced from the COD, ICSD and MPDS repositories, we performed highly automated calculations using AiiDA at the level of Density Functional Theory (DFT) to identify electronic insulators and to estimate lithium-ion diffusivity using the pinball model, which describes the potential energy landscape of diffusing lithium at near-DFT accuracy while being orders of magnitude faster. We present the workflow, in which the accuracy of the pinball model is improved self-consistently and which is necessary for automatically running the thousands of required calculations and analysing their results. About a hundred promising superionic conductors are further studied with first-principles molecular dynamics simulations.

Author(s): Tushar Thakur (EPFL), Loris Ercole (EPFL), and Nicola Marzari (EPFL)

Domain: Chemistry and Materials

## ICON-GPU for Numerical Weather Prediction – A Status Report

Weather prediction centers are always seeking ways to improve the computational performance of their numerical weather prediction (NWP) models while staying within budget. The era of ever-improving scalar CPUs has come to an end, but massively multiprocessing GPUs are advertised as the next step forward. The ICON framework, a large and continuously developed community code, has been adapted to work with GPU systems through a multi-institute effort using OpenACC directives. MeteoSwiss plans to use ICON-GPU operationally for limited area forecasts in 2023, and current development activities are also making ICON-GPU ready to support the enhanced feature set used operationally by the German Weather Service (DWD). On our poster, we present the general porting strategy and the current state of the port. We discuss specific optimizations and the lessons learned while iteratively porting an actively developed code. Finally, we present the performance on current GPU and CPU machines and compare it to the currently operational setup on the DWD vector supercomputer.

Author(s): Marek Jacob (DWD), Dmitry Alexeev (NVIDIA inc.), Daniel Hupp (MeteoSwiss), Xavier Lapillonne (MeteoSwiss), Florian Prill (DWD), Daniel Reinert (DWD), and Günther Zängl (DWD)

Domain: Climate, Weather and Earth Sciences

## An In Silico Analysis of Schiff Base Derivatives to Identify Potential Inhibitors for Breast Cancer Multitargeted Proteins: Virtual Screening and Molecular Dynamics Simulation

Breast cancer is one of the most common malignancies in women worldwide and is a leading cause of mortality in every country. In order to create a potent treatment for breast cancer, this work uses bioinformatic approaches to identify appropriate and efficient compounds among a large number of molecules containing Schiff bases. The proteins employed in this investigation are important contributors to breast cancer. Using the SwissADME online server, the Schiff base compounds with proven anticancer effects were examined for pathophysiological significance, pharmacokinetic traits, and drug-like qualities [1]. Additionally, the bioavailability and toxicity profiles of these compounds were evaluated using the SwissADME and ADMETlab 2.0 online servers, respectively. Using a bibliographic analysis, the five receptors EGFR, PR, mTOR, p53R2 and CTLA4 were identified. Furthermore, from the screening of 61 compounds, 58 molecules satisfied the Lipinski criterion. The compounds for each receptor were selected based on the binding affinity scores of these 58 molecules, yielding the top 12 therapeutic candidates, all of which were risk-free and non-toxic. These results, in our opinion, will support the creation of conventional medical treatment modalities and the discovery of promising hits for lead optimization in the future development of breast cancer drugs.

Author(s): Presenjit Varma (Babasaheb Bhimrao Ambedkar University), and Divya Gautam (Indian Institute of Technology Roorkee)

Domain: Life Sciences

## Interpretable Compression of Fluid Flows Using Graph Neural Networks

Neural network (NN) based reduced-order models (ROMs) via autoencoding have been shown to drastically accelerate traditional computational fluid dynamics (CFD) simulations for rapid design optimization and prediction of fluid flows. However, many real-world applications (e.g. hypersonic propulsion, pollutant dispersion in cities, wildfire spread) rely on complex geometry treatment and unstructured mesh representations in the simulation workflow — in this setting, conventional NN-based modeling approaches break down. Instead, it is necessary to use frameworks that (a) easily interface with unstructured grid data, and (b) are not restricted to single geometric configurations after training. The goal here is to address this through the development of an interpretable autoencoding strategy based on the graph neural network (GNN) paradigm. More specifically, a novel graph autoencoder architecture is developed for ROM-amenable autoencoding. An adaptive graph pooling strategy, combined with multiscale message passing operations, is shown to produce interpretable latent spaces through the identification of coherent structures. With this notion of interpretability established, analysis is then conducted on effects of compression factors, physical significance of identified coherent structures, and impact of multi-scale message passing on reconstruction errors.

Author(s): Shivam Barwey (Argonne National Laboratory), and Romit Maulik (Argonne National Laboratory, University of Pennsylvania)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Investigating the Mechanism of a Local Windstorm in the Swiss Alps Using Large-Eddy Simulations

The Laseyer windstorm is a local and strong wind phenomenon in the narrow Schwende valley in northeastern Switzerland. The phenomenon has raised the interest of meteorologists as it has – in the past – led to derailments of the local train. It is characterised by easterly to southeasterly winds during strong northwesterly ambient wind conditions, thus the wind in the valley reverses compared to the flow further aloft. We use a newly developed large-eddy simulation (LES) configuration of the FVM atmospheric model based on Python and GT4Py to gain a better understanding of the Laseyer windstorm. The LES are performed in the very steep and highly complex terrain using idealised initial and boundary conditions. With this numerical tool we can study the sensitivity to ambient flow parameters in a controlled environment. As a result, we are able to shed light on the mechanism of the Laseyer by revealing the flow structure and variability in the narrow Schwende valley. A set of idealised LES with carefully selected ambient flow parameters allows us to identify the wind direction and wind speed conditions needed for the occurrence of the Laseyer and its response to changing ambient flow conditions.

Author(s): Nicolai Krieger (ETH Zurich), Christian Kühnlein (ECMWF), Michael Sprenger (ETH Zurich), and Heini Wernli (ETH Zurich)

Domain: Climate, Weather and Earth Sciences

## Iterative Refinement With Hierarchical Low-Rank Preconditioners Using Mixed Precision

It has been shown that the solution of a dense linear system can be accelerated by using mixed precision iterative refinement relying on an approximate LU-factorization. While most recent work has focused on obtaining such a factorization at a reduced precision, we investigate an alternative via low-rank approximations. Using the hierarchical matrix format, we are able to benefit from the reduced complexity of the LU-factorization, while being able to compensate for the accuracy lost in the approximation via iterative refinement. The resulting method is able to produce results as accurate as those of a double precision solver at a lower complexity of O(n²) for certain matrices. We evaluate our approach for matrices arising from BEM for 2-dimensional problems. First, an experimental analysis of the convergence behaviour is conducted, assuring that we are able to adhere to the same error bounds as mixed precision iterative refinement. Afterwards, we evaluate the performance in terms of the execution time, comparing it to a general dense solver from LAPACK and to preconditioned GMRES. On large matrices, we are able to achieve a speedup of more than 16 times when compared to a dense solver.

Author(s): Thomas Spendlhofer (Tokyo Institute of Technology), and Rio Yokota (Tokyo Institute of Technology)

Domain: Computer Science, Machine Learning, and Applied Mathematics
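The iterative refinement loop that the abstract above builds on can be sketched in a few lines of plain Python. This is a simplified illustration and not the authors' method. The inexact factorization is obtained here by rounding dense LU factors to single precision, which stands in for the hierarchical low-rank factorization. All function names and the small test matrix are assumptions, and no pivoting is used, so a diagonally dominant matrix is assumed.

```python
import struct


def to_single(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def lu_factor_approx(matrix):
    """Doolittle LU without pivoting, with factors rounded to single precision.

    The rounding is a stand-in for any cheap, inexact factorization.
    """
    n = len(matrix)
    lu = [row[:] for row in matrix]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return [[to_single(v) for v in row] for row in lu]


def lu_solve(lu, rhs):
    """Forward and backward substitution with the packed LU factors."""
    n = len(lu)
    y = rhs[:]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def residual(matrix, x, rhs):
    """Residual b - A x, evaluated in double precision."""
    return [r - sum(a * xi for a, xi in zip(row, x)) for row, r in zip(matrix, rhs)]


def iterative_refinement(matrix, rhs, tol=1e-14, max_iter=20):
    """Refine an inexact solve until the relative residual drops below tol."""
    lu = lu_factor_approx(matrix)
    x = lu_solve(lu, rhs)
    history = []
    for _ in range(max_iter):
        r = residual(matrix, x, rhs)
        norm = max(abs(v) for v in r)
        history.append(norm)
        if norm <= tol * max(abs(v) for v in rhs):
            break
        correction = lu_solve(lu, r)  # cheap solve with the inexact factors
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x, history


# Hypothetical diagonally dominant test matrix with a known solution.
n = 4
A = [[1.0 / (1 + abs(i - j)) + (4.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
x_true = [1.0, -2.0, 3.0, -4.0]
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x, history = iterative_refinement(A, b)
print(x, history)
```

The design point is that the expensive factorization is done once and inexactly, while each refinement step only needs a residual in full precision and a cheap solve with the stored factors.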

## A Language-Interoperable C++-Based Memory-Manager for the ICON Climate and Weather Prediction Model

HPC machines now use accelerators such as GPUs. In addition, CPUs themselves now feature many cores as well as special fast memory, like the Fujitsu A64FX and Intel Sapphire Rapids. These rapid changes create important challenges for simulation codes, which must accommodate different parallel programming models and also orchestrate processing units and memory locations. In this poster, we present a new memory-manager concept for the ICON climate and weather prediction model to store and manipulate model variables. This new memory-manager concept allows going beyond OpenACC pragmas for GPU portability of the current Fortran code, thanks to more fine-grained memory management. The memory manager is written in C++, which allows supporting vendor-native parallel programming frameworks that are also C++-based, like CUDA, HIP, and SYCL, or portability layers like Kokkos. At the same time, the language interoperability enables keeping backward compatibility with Fortran while introducing concepts that facilitate better ICON component interfaces, and gradually and iteratively migrating parts of the code to newer languages for better efficiency as needed. This memory-manager concept has been introduced within the µphys subcomponent of ICON. In this poster, we show that beyond the ICON-specific aspects, this concept can be translated into many existing simulation codes.

Author(s): Claudius Holeksa (Karlsruhe Institute of Technology), Ralf Müller (German Climate Computing Centre), Jörg Behrens (German Climate Computing Centre), Florian Prill (DWD), Christopher Bignamini (ETH Zurich / CSCS), Will Sawyer (ETH Zurich / CSCS), Xavier Lapillonne (MeteoSwiss), Sergey Kosukhin (Max Planck Institute for Meteorology), Daniel Klocke (Max Planck Institute for Meteorology), Terry Cojean (Karlsruhe Institute of Technology), Yen-Chen Chen (Karlsruhe Institute of Technology), Hartwig Anzt (University of Tennessee), and Claudia Frauen (German Climate Computing Centre)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## LIBRSB: Multicore Sparse Matrix Performance Across Languages and Architectures

LIBRSB (http://librsb.sf.net/) is a highly interoperable multicore CPU-oriented library for sparse matrix computations.

It serves as a component in sparse linear solvers. LIBRSB builds upon its “RSB” hierarchical and coarse-grained sparse matrices storage. The RSB data structure and algorithms are geared for efficient “Sparse BLAS”-like operations, namely variants of sparse multiply and triangular solution. In addition to Sparse BLAS, LIBRSB also provides the operations commonly required by interpreted languages. Thanks to that, it can serve the needs of higher level numerical languages like GNU Octave or Python, without sacrificing much of its performance characteristics. This poster presents an overview of usage modes, their advantages, and what’s in the works. Emphasis is on multi-language support, as well as the different portability aspects.

Author(s): Michele Martone (Leibniz Supercomputing Centre)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Loki v0.1.1: A Source-To-Source Translation Tool for Numerical Weather Prediction Codes and More

All known or presumed candidates for exascale supercomputers will feature novel computing hardware or heterogeneous architectures, with GPUs currently being a cornerstone of this development. Using these machines efficiently with today’s numerical weather prediction (NWP) codes requires adapting large code bases to new programming paradigms and applying architecture specific optimizations. Encoding all these optimizations within a single code base is infeasible. Source-to-source translation offers the possibility to use existing code as-is and apply the necessary transformations and hardware-specific optimizations. To that end, we present Loki, a Python tool purpose built for ECMWF’s Integrated Forecasting System (IFS) that offers automatic source-to-source translation capabilities based on compiler technology to target a broad range of programming paradigms. Following the recent open-source release of version 0.1.1, Loki is available on GitHub and ready for testing by the weather and climate community and beyond. It offers an API to encode custom transformations, allowing for expert-guided code translation. It supports multiple Fortran front ends, and can output Fortran, C, Python and now also CUDA Fortran. In this poster, we highlight Loki’s key features, and present a performance comparison between auto-translated code and manually optimized variants.

Author(s): Michael Staneker (ECMWF), Ahmad Nawab (ECMWF), Balthasar Reuter (ECMWF), and Michael Lange (ECMWF)

Domain: Climate, Weather and Earth Sciences

## Mapping a Coupled Earth-System Simulator onto the Modular Supercomputer Architecture

The Modular Supercomputer Architecture concept, developed for the DEEP project series, describes a novel kind of heterogeneous computing platform comprising several different “modules”, each of which is a separate compute cluster in its own right. The modules are connected with a federated network to allow heterogeneous jobs to execute across them. One module may be GPU-based to benefit compute kernels with dense linear algebra or machine learning tasks, for example, whereas another module may have a particularly well optimised file system. The truly heterogeneous Modular Supercomputer Architecture therefore works particularly well for complex applications comprising a range of different compute patterns. One such application is the Earth system simulation, in which the Earth system is broken down into individual components for representing the atmosphere, the ocean, the land surface, and others. Here we present results from adapting the European Centre for Medium-Range Weather Forecasts’ Earth system model, the Integrated Forecasting System, to take advantage of the Modular Supercomputer Architecture. We focus on the relationship between two particularly compute-intensive model components: the atmosphere and the ocean. We will present results from performing concurrent heterogeneous atmosphere-ocean integrations on a prototypical Modular Supercomputer Architecture system, the DEEP machine at the Jülich Supercomputing Centre.

Author(s): Samuel Hatfield (ECMWF), Olivier Marsden (ECMWF), Kristian Mogensen (ECMWF), and Ioan Hadade (ECMWF)

Domain: Climate, Weather and Earth Sciences

## A Massively Parallel Approach to Forecasting Electricity Prices

With the ongoing energy crisis in Europe, accurate forecasting of electricity price levels and volatility is essential to planning grid operations and protecting consumers from extreme prices. We present how massively parallel stochastic optimal power flow models can be deployed on modern many-core architectures to efficiently forecast power grid configurations in real time. Processing of stochastic weather and economic scenarios is optimized on many-core CPUs to achieve maximal throughput and minimize latency from the receipt of weather data to the output and interpretation of model results.

Author(s): Timothy Holt (Università della Svizzera italiana, Oak Ridge National Laboratory)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Modeling a Novel Laser-Driven Electron Acceleration Scheme: Particle-In-Cell Simulations at the Exascale

Intense femtosecond lasers focused on low-density gas jets can accelerate ultra-short electron bunches up to very high energies (from hundreds of MeV to several GeV) over a few millimeters or a few centimeters. However, conventional laser-driven electron acceleration schemes do not provide enough charge for most of the foreseen applications. To address this issue, we have devised a novel scheme consisting of a gas jet coupled to a solid target to accelerate substantially more charge. In 2022 we validated this concept with proof-of-principle experiments at the LOA laser facility (France), and with a large-scale Particle-In-Cell simulation campaign, carried out with the open-source WarpX code. Performing such simulations requires the use of the most powerful supercomputers in the world, as well as advanced numerical techniques such as mesh refinement, which is very challenging to implement in an electromagnetic Particle-In-Cell code, and indeed unique to the WarpX code. A work describing the technical challenges that we addressed to make these simulations possible was awarded the Gordon Bell prize in 2022. In this contribution, we will also discuss the performance portability of the WarpX code by presenting scaling tests on Frontier, Fugaku, Summit, and Perlmutter supercomputers.

Author(s): Luca Fedeli (CEA), Axel Huebl (Lawrence Berkeley National Laboratory), France Boillod-Cerneux (CEA), Thomas Clark (CEA), Kevin Gott (Lawrence Berkeley National Laboratory), Conrad Hillairet (Arm), Stephan Jaure (Atos), Adrien Leblanc (ENSTA), Rémi Lehe (Lawrence Berkeley National Laboratory), Andrew Myers (Lawrence Berkeley National Laboratory), Christelle Piechurski (GENCI), Mitsuhisa Sato (RIKEN), Neil Zaïm (CEA), Weiqun Zhang (Lawrence Berkeley National Laboratory), Jean-Luc Vay (Lawrence Berkeley National Laboratory), and Henri Vincenti (CEA)

Domain: Physics

## MPI for Multi-Core, Multi Socket, and GPU Architectures: Optimised Shared Memory Allreduce

The benefits of shared memory collectives, especially allreduce, have been shown in the literature. This intra-node communication is not only necessary for single node communications, but is also a key component of more complex inter-node communication algorithms [1]. In contrast to [2], our implementation of shared memory usage is invisible to the user of the library: the data of the send and receive buffers is not required to reside in shared memory already. Instead, the data from the send buffer is copied into the shared memory segment in parallel chunks, where commutative reduction operations are necessary. Subsequently, the data is further reduced within the shared memory segment using a tree-based algorithm. The final result is then copied to the receive buffer. The reduction operations and synchronization barriers are combined during this process, and the algorithm is adapted depending on performance measurements.

[1] Jocksch, A., Ohana, N., Lanti, E., Koutsaniti, E., Karakasis, V., Villard, L.: An optimisation of allreduce communication in message-passing systems. Parallel Computing 107, 102812 (2021)

[2] Li, S., Hoefler, T., Hu, C., Snir, M.: Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing 17(4), 1139–1155 (2014)

Author(s): Andreas Jocksch (ETH Zurich / CSCS), and Jean-Guillaume Piccinali (ETH Zurich / CSCS)

Domain: Computer Science, Machine Learning, and Applied Mathematics
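The three phases described in the abstract above (copy into the shared segment, tree-based reduction, copy to the receive buffers) can be mimicked with a small single-process simulation. This is only an illustrative sketch and not the authors' MPI implementation. Ranks are modelled as list indices, the copy phase is a plain copy without the chunked reduction the abstract mentions, barriers are implicit in the sequential loop, and the function name `shared_memory_allreduce` is hypothetical.

```python
def shared_memory_allreduce(send_buffers, op):
    """Simulate an intra-node allreduce over a shared memory segment.

    send_buffers[r] is the private send buffer of rank r. The operation
    op must be associative and commutative, for example addition.
    Returns one receive buffer per rank, all holding the reduced result.
    """
    n_ranks = len(send_buffers)

    # Phase 1: every rank copies its send buffer into its own slot of the
    # shared segment (all ranks would do this concurrently in practice).
    shared = [list(buf) for buf in send_buffers]

    # Phase 2: binary-tree reduction inside the shared segment. In each
    # round, rank r combines its slot with the slot of rank r + stride.
    # A real implementation needs a barrier between consecutive rounds.
    stride = 1
    while stride < n_ranks:
        for r in range(0, n_ranks - stride, 2 * stride):
            partner = r + stride
            shared[r] = [op(x, y) for x, y in zip(shared[r], shared[partner])]
        stride *= 2

    # Phase 3: every rank copies the final result to its receive buffer.
    return [list(shared[0]) for _ in range(n_ranks)]


# Hypothetical example: five ranks, each contributing three values.
buffers = [[r, 2 * r, 1] for r in range(5)]
print(shared_memory_allreduce(buffers, lambda x, y: x + y))
```

The tree needs about log2(P) rounds for P ranks instead of P - 1 sequential combinations, which is why the number of synchronization points matters for the performance tuning the abstract refers to.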

## Multigrid in H(curl) on Hybrid Tetrahedral Grids

In many applications large scale solvers for Maxwell’s equations are an indispensable tool. This work presents theory and algorithms that are relevant to the solution of Maxwell’s equations as well as their implementation in HyTeG. We focus on multigrid methods for the curl-curl-problem which arises from the time-harmonic formulation of Maxwell’s equations. This problem is challenging because it is not elliptic and therefore standard multigrid smoothers are not effective. We rely on finite element exterior calculus (FEEC) to explain our choice of discretization: linear Nédélec edge elements of the first kind. FEEC is a relatively recent theory used to design stable finite element discretizations for a wide class of problems. It is centered around preserving certain structures of chain complexes exactly when going to the discrete level. The techniques introduced by FEEC also explain how effective multigrid smoothers in H(curl) can be designed. These were first introduced by Hiptmair in 1998. HyTeG is a finite element framework designed for massively parallel compute architectures. It supersedes the HHG framework which was already capable of solving systems with 1.1e13 unknowns. The key building block to achieve these impressive results is a matrix-free implementation of geometric multigrid on hybrid tetrahedral grids.

Author(s): Daniel Bauer (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Multilevel and Domain-Decomposition Solution Strategies for Solving Large-Scale Phase-Field Fracture Problems

The phase-field approach for fracture propagation is a state-of-the-art technique for simulating crack initiation, propagation, and coalescence. In this approach, a damage field, called the phase field, is introduced that characterizes the material state from intact to fully broken. Even though the phase field is a robust tool for modeling crack propagation, it gives rise to a strongly nonlinear system of equations. For this reason, it is essential to develop efficient and robust solution methods for the phase-field problem. To this end, we propose to solve the nonlinear problems arising from the discretization of the phase-field fracture formulation using domain decomposition and multilevel methods. We employ the recursive multilevel trust-region (RMTR) method in the multilevel context and the Schwarz preconditioned inexact Newton (SPIN) method in the domain decomposition context. In this work, we present the modifications required in both solution strategies for solving fracture problems. We show the convergence properties and the performance of the RMTR and SPIN methods on several benchmark problems from fracture mechanics, and we show that our methods outperform the widely used alternate minimization method.

Author(s): Hardik Kothari (Università della Svizzera italiana), Alena Kopanicakova (Brown University, Università della Svizzera italiana), Patrick Zulian (Università della Svizzera italiana, UniDistance Suisse), Maria Nestola (Università della Svizzera italiana), Edoardo Pezzulli (ETH Zurich), Thomas Driesner (ETH Zurich), and Rolf Krause (Università della Svizzera italiana, UniDistance Suisse)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Novel Geometric Deep Learning Surrogate Framework for Non-Linear Finite Element Simulations

Conventional numerical methods are computationally expensive when simulating non-linear phenomena arising in mechanics. In this respect, deep learning (DL) techniques are increasingly used to accelerate simulations in mechanics. However, existing DL methods perform inefficiently as the size and complexity of the problem increase. In this work, we propose a novel geometric deep learning (GDL) surrogate framework that can efficiently find non-linear mappings between mesh-based datasets. In particular, we propose two novel neural network layers, the Multichannel Aggregation (MAg) layer and the graph pooling layer, which are combined to constitute a robust graph U-Net architecture. Our framework can tackle problems involving complex fine meshes and scales efficiently to high-dimensional inputs. We validate the performance of our framework by learning on numerically generated non-linear finite element datasets and by comparing its performance to state-of-the-art convolutional neural network frameworks. In particular, we show that the proposed GDL framework is able to accurately predict the non-linear deformations of irregular soft bodies in real time.

Author(s): Saurabh Deshpande (University of Luxembourg), Jakub Lengiewicz (University of Luxembourg, IPPT PAN), and Stéphane Bordas (University of Luxembourg)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## A Novel Stochastic Parameterization for Lagrangian Modeling of Atmospheric Aerosol Transport

In recent years, it has become clear that the behavior of atmospheric aerosols has a non-negligible effect on radiative forcing within Earth’s climate and the computational models that simulate it [Carslaw et al., Nature, 2013]. Thus, we must obtain descriptive aerosol models that are also predictive, particularly at a time when aerosol-emitting ships may soon traverse the Arctic Ocean and there is credible talk about climate intervention strategies like stratospheric aerosol injection. This raises the question of how we may accurately describe our changing climate or dynamic weather patterns in the face of such uncertainty. We propose a novel stochastic model that employs transport parameters that operate on differing scales and vary according to their respective machine-learned probability distributions. This parameterization allows our transport variables to be functions of space, time, and relevant exogenic properties, and forcing effects may be added, subtracted, or altered as we gain more confidence in the machine learning model. To verify and validate this model, particle simulation results are compared to corresponding large-eddy simulations (LES), data from fog chamber experiments, and satellite imagery of ship tracks in the Pacific Ocean off the coast of California.

Author(s): Michael Schmidt (Sandia National Laboratories)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Numerical Simulation of Gradual Compaction of Granular Materials and the Uncertainty Quantification of the Proposed Mathematical Model

The poster deals with mathematical modelling of granular materials and focuses on the process of their gradual compaction called ratchetting. The model of hypoplasticity introduced by E. Bauer et al. is investigated and the problem of stress-controlled hypoplasticity is considered. The behaviour of strain paths produced by periodic stress paths in a granular material during cyclic loading and unloading is calculated and a limit state of the material is numerically approximated. Then, the impact of uncertain input parameters on ratchetting trends and the material limit states is quantified by means of fuzzy set techniques.

Author(s): Judita Runcziková (Czech Technical University in Prague), and Jan Chleboun (Czech Technical University in Prague)

Domain: Chemistry and Materials

## Optimization of Non-Conventional Airfoils for Martian Rotorcraft with Direct Numerical Simulations Using High-Performance Computing

The design of rotorcraft for Mars is challenging due to the very low atmospheric density and low speed of sound compared to Earth. These conditions require Martian rotor blades to operate in a low-Reynolds-number (1,000 to 10,000 based on chord) compressible flow regime, atypical of conventional terrestrial helicopters. Non-conventional airfoils with sharp leading edges and flat surfaces show improved performance in such conditions by inducing an unsteady lift mechanism that operates in a compressible transitional/turbulent regime. Evolutionary algorithms have previously been used to optimize these unconventional Martian airfoils. However, they typically require many cost-function evaluations. For this reason, second-order Reynolds-Averaged Navier-Stokes (RANS) and unsteady RANS (URANS) solvers have typically been used because of their relatively low computational cost. However, these solvers can have limited predictive capability when the flow is unsteady and/or transitional. The current work overcomes this limitation by optimizing with high-order accurate direct numerical simulations (DNS) using the compressible flow solver in PyFR (www.pyfr.org). This is made possible by the capabilities of PyFR and the resources allocated to this project on the Piz Daint supercomputer.

Author(s): Lidia Caros (Imperial College London), Oliver Buxton (Imperial College London), and Peter Vincent (Imperial College London)

Domain: Engineering

## The P4est Software for Parallel AMR: A Shared Memory Workflow

Parallel adaptive mesh refinement (AMR) is a key technique when simulations are required to capture time-dependent and/or multiscale features. A forest of octrees is a data structure that represents the recursive adaptive refinement of an initial, conforming coarse mesh of hexahedra. This poster presents several recent enhancements to the p4est software for forest-of-octrees AMR. The first introduces new ways of encoding quadrants as atomic objects, which vary both the in-memory binary format and the associated algorithms. We present a 128-bit AVX version and an optimized long-integer format. The second enhancement exploits MPI-3 shared memory windows to eliminate redundant storage of quadrants and metadata within each shared memory node. Finally, we demonstrate how different approaches to shared memory use affect performance, and we compare runtimes for various quadrant implementations and representative simulation pipelines.

Author(s): Mikhail Kirilin (University of Bonn), and Carsten Burstedde (University of Bonn)

Domain: Computer Science, Machine Learning, and Applied Mathematics
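
To make the idea of encoding a quadrant as a single integer concrete, here is a minimal 2D sketch. It is an assumption-laden illustration, not the p4est format: the bit layout (Morton index in the high bits, refinement level in the low 8 bits), the default `max_level=8`, and the function names `interleave_bits`, `pack_quadrant`, and `unpack_quadrant` are all hypothetical.

```python
def interleave_bits(x, y, bits):
    """Interleave the low `bits` bits of x and y (x in even positions)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code


def pack_quadrant(x, y, level, max_level=8):
    """Pack a 2D quadrant into one integer: Morton index, then 8 level bits."""
    assert 0 <= level <= max_level < 256
    assert 0 <= x < (1 << max_level) and 0 <= y < (1 << max_level)
    return (interleave_bits(x, y, max_level) << 8) | level


def unpack_quadrant(packed, max_level=8):
    """Invert pack_quadrant and return (x, y, level)."""
    level = packed & 0xFF
    morton = packed >> 8
    x = y = 0
    for b in range(max_level):
        x |= ((morton >> (2 * b)) & 1) << b
        y |= ((morton >> (2 * b + 1)) & 1) << b
    return x, y, level


packed = pack_quadrant(3, 5, 2)
print(packed, unpack_quadrant(packed))
```

In a layout like this, comparing two quadrants along the space-filling curve reduces to a single integer comparison, which is one plausible motivation for integer-based formats.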

## Parallel Second Order Conservative Remapping on the Sphere

We present an MPI-parallel implementation and analysis of a conservative second-order interpolation method between arbitrary, very high-resolution spherical meshes supported by the Atlas library of ECMWF. Hence, the meshes are those used by ECMWF’s IFS model: structured grids such as the reduced Gaussian grids of IFS, quasi-structured grids such as ORCA of NEMO, or fully unstructured grids of FESOM2. This work is largely based on Kritsikis et al. (2017) and is extended here to allow staggered data. Additionally, we use a different approach for two typical ingredients of the remapping process: computing spherical polygon intersections and quickly searching for potential intersectors. For both, we rely on tools available in Atlas, the numerical library for weather simulations. Conventional conservative remapping maps values at cell centres of a source mesh to cell centres of a target mesh. By constructing sub-polygons, we have extended our conservative remapping to allow data to be remapped from either cell centres or cell vertices of a source mesh to either cell centres or cell vertices of a target mesh.

Author(s): Slavko Brdar (ECMWF), Willem Deconinck (ECMWF), Pedro Maciel (ECMWF), and Michail Diamantakis (ECMWF)

Domain: Computer Science, Machine Learning, and Applied Mathematics
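
The principle behind conservative remapping, weighting source values by the size of each source/target cell intersection, can be illustrated with a deliberately simplified sketch. The assumptions are significant: this is a first-order scheme on 1D intervals rather than the second-order method on spherical polygons described above, the brute-force double loop stands in for a fast search of potential intersectors, and the function name `conservative_remap_1d` is hypothetical (it is not an Atlas API).

```python
def conservative_remap_1d(src_edges, src_values, tgt_edges):
    """First-order conservative remap of cell averages between 1D meshes."""
    tgt_values = []
    for k in range(len(tgt_edges) - 1):
        t0, t1 = tgt_edges[k], tgt_edges[k + 1]
        integral = 0.0
        for j in range(len(src_edges) - 1):
            s0, s1 = src_edges[j], src_edges[j + 1]
            # Length of the intersection of source cell j and target cell k.
            overlap = min(t1, s1) - max(t0, s0)
            if overlap > 0.0:
                integral += src_values[j] * overlap
        # Target cell average = accumulated integral / target cell length.
        tgt_values.append(integral / (t1 - t0))
    return tgt_values


src_edges = [0.0, 1.0, 2.0, 3.0]
src_values = [1.0, 2.0, 4.0]
tgt_edges = [0.0, 1.5, 3.0]
tgt_values = conservative_remap_1d(src_edges, src_values, tgt_edges)
print(tgt_values)
```

Conservation here means that the sum of value times cell length is the same on both meshes, provided the two meshes cover the same interval.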

## Parallel Training of Deep Neural Networks

Deep neural networks (DNNs) are used in a wide range of application areas and scientific fields. The accuracy and expressivity of DNNs are tightly coupled to the number of parameters of the network as well as the amount of data used for training. As a consequence, the networks and the amount of training data have grown considerably over the last few years. Since this trend is expected to continue, the development of novel distributed and highly scalable training methods becomes an essential task. In this work, we propose two distributed-training strategies that leverage nonlinear domain-decomposition methods, which are well established in the field of numerical mathematics. The proposed training methods utilize the decomposition of the parameter space and the data space. We show the necessary algorithmic ingredients for both training strategies. The convergence properties and scaling behavior of the training methods are demonstrated using several benchmark problems. Moreover, a comparison of both proposed approaches with the widely used stochastic gradient optimizer is presented, showing a significant reduction in the number of iterations and the execution time. Finally, we demonstrate the scalability of our PyTorch-based training framework, which leverages CUDA and NCCL technologies in the backend.

Author(s): Samuel Cruz (Università della Svizzera italiana, UniDistance Suisse), Alena Kopanicakova (Brown University, Università della Svizzera italiana), Hardik Kothari (Università della Svizzera italiana), and Rolf Krause (Università della Svizzera italiana, UniDistance Suisse)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Partial Charge Prediction and Pattern Extraction from an AttentiveFP Graph Neural Network

Molecular dynamics (MD) simulations enable the time-resolved study of bio-molecular processes. The quality of MD simulations is, however, highly dependent on the set of interaction parameters used, the so-called force field. The accurate partial-charge assignment of all simulated atoms is hence a crucial part of every MD simulation. Due to the slowly decaying nature of the Coulomb interactions, the effects of different partial-charge assignments can be observed over long distances and can have drastic effects on the stability of an MD simulation. Therefore, many schemes have been developed over the last decades to improve partial-charge assignment: classical tabulated values, ab initio calculations, or prediction with a machine learning model. However, all these approaches have shortcomings in accuracy, speed, or interpretability. Here, we present an option to combine the accuracy of ab initio calculations, the speed of machine learning models, and the interpretability of tabulated assignments. An attention-based graph neural network is trained on a diverse dataset to predict high-quality atom-in-molecule (AIM) partial charges. We then use a model-agnostic approach to extract the most important sub-graph on an atomistic level to provide the user with the same level of interpretability as for tabulated values.

Author(s): Marc Thierry Lehner (ETH Zurich)

Domain: Chemistry and Materials

## ProtoX: A First Look

Stencil operations are a key component in the numerical solution of partial differential equations. Developers tend to use libraries that provide these operations for them. One such library is Proto, a C++-based domain-specific library designed to provide an intuitive interface that optimizes the design and scheduling of algorithms for solving various partial differential equations numerically. The high-level abstractions used in Proto can be fused together to improve its current performance. However, abstraction fusion cannot be performed easily by a compiler. To overcome this shortcoming, we present ProtoX, a code generation framework for stencil operations that is based on Proto and uses SPIRAL as its backend. SPIRAL is a GAP-based code generation system that focuses on generating highly optimized target code in C/C++. We demonstrate the construction of ProtoX by considering two examples: the 2D Poisson problem and the Euler equations that appear in the study of gas dynamics. Some of the code generated for these two problem specifications is shown along with initial speedup results.

Author(s): Het Mankad (Carnegie Mellon University), Sanil Rao (Carnegie Mellon University), Phil Colella (Lawrence Berkeley National Laboratory), Brian Van Straalen (Lawrence Berkeley National Laboratory), and Franz Franchetti (Carnegie Mellon University)

Domain: Applied Social Sciences and Humanities
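
For readers unfamiliar with the term, a stencil operation updates each grid point from a fixed pattern of neighbours. The sketch below applies the standard 5-point Laplacian stencil that appears in finite-difference discretizations of the 2D Poisson problem. It is a plain-Python illustration only: it does not use Proto, ProtoX, or SPIRAL, and the function name `apply_laplacian` and the uniform grid spacing `h` are assumptions made for the example.

```python
def apply_laplacian(u, h):
    """Apply the 5-point Laplacian stencil to the interior points of grid u."""
    ny, nx = len(u), len(u[0])
    out = [[0.0] * nx for _ in range(ny)]  # boundary entries stay zero
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = (u[j][i - 1] + u[j][i + 1]
                         + u[j - 1][i] + u[j + 1][i]
                         - 4.0 * u[j][i]) / (h * h)
    return out


# Sample u(x, y) = x^2 + y^2, whose exact Laplacian is 4 everywhere.
h = 0.5
grid = [[(i * h) ** 2 + (j * h) ** 2 for i in range(5)] for j in range(5)]
lap = apply_laplacian(grid, h)
print(lap[2][2])
```

Because every interior point runs the same small arithmetic pattern, stencil loops like this are natural targets for fusion and code generation.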

## A Research Software Engineering Workflow for Computational Science and Engineering

We present a Research Software Engineering (RSE) workflow for developing research software in Computational Science and Engineering (CSE) in university research groups. Their members have backgrounds in different scientific disciplines and often lack education in RSE. Research software development lasts many years, whereas team members leave regularly. Combining and re-using ideas and results from others is a fundamental principle of science. In CSE, research software embodies research ideas. As CSE research advances, research software should grow sustainably over the years. To increase the sustainability of research software, our workflow simplifies the investigation and integration of research ideas, ensures reproducibility, and ensures that new functionality does not impair existing functionality. These practices speed up research and increase the quality of scientific output. Our CSE-RSE workflow is simple, effective, and largely ensures adherence to the FAIR principles (Wilkinson et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data 3, 160018 (2016)). The workflow uses established practices and tools, pragmatically adapted for CSE research software: version control, secondary-data standards, continuous integration, and containerization. A detailed description of the CSE-RSE workflow is available as a preprint (Marić et al. A Research Software Engineering Workflow for Computational Science and Engineering. Preprint, https://doi.org/10.48550/arXiv.2208.07460 (2022)).

Author(s): Moritz Schwarzmeier (TU Darmstadt), Tomislav Marić (TU Darmstadt), Tobias Tolle (TU Darmstadt), Jan-Patrick Lehr (TU Darmstadt), Ioannis Pappagianidis (TU Darmstadt), Benjamin Lambie (TU Darmstadt), Dieter Bothe (TU Darmstadt), and Christian Bischof (TU Darmstadt)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## A Scalable Interior-Point Method for PDE-Constrained Inverse Problems Subject to Inequality Constraints

We present a scalable computational method for large-scale inverse problems with PDE and inequality constraints. Such problems are used to learn spatially distributed variables that respect bound constraints and parametrize PDE-based models from unknown or uncertain data. We first give a brief overview of PDE-constrained optimization and highlight computational challenges of Newton-based solution strategies, such as Krylov-subspace preconditioning of Newton linear systems for problems with inequality constraints. These problems are particularly challenging because their first-order optimality systems couple PDEs with nonsmooth complementarity conditions. We propose a Newton interior-point method with a robust filter line-search strategy whose performance is independent of the problem discretization. To solve the interior-point Newton linear systems, we use a Krylov-subspace method with a block Gauss-Seidel preconditioner. We prove that the number of Krylov-subspace iterations is independent of both the problem discretization and any ill-conditioning due to the inequality constraints. We also present computational results, obtained with the MFEM and hypre linear solver packages, on an inverse problem in which applying the block Gauss-Seidel preconditioner requires only a few scalable algebraic multigrid solves and thus permits the scalable solution of the PDE- and bound-constrained example problem. We conclude with future directions and an outlook.

Author(s): Tucker Hatland (Lawrence Livermore National Laboratory), Cosmin Petra (Lawrence Livermore National Laboratory), Noemi Petra (University of California Merced), and Jingyi Wang (Lawrence Livermore National Laboratory)

Domain: Computer Science, Machine Learning, and Applied Mathematics

## Simulating Aquaplanet Using ICON with a GT4Py DSL Dynamical Core

We present the results of our efforts to port the dynamical core of the ICON climate and numerical weather prediction (NWP) model to GT4Py. GT4Py is a Domain Specific Language (DSL) designed for weather and climate applications, which allows domain scientists to write performance-portable climate and weather code within a high-level Python-based frontend. Porting code to GT4Py greatly improves readability compared to equivalent GPU-accelerated codes written in Fortran + OpenACC. Additionally, the DSL allows for a separation of concerns between domain scientists, software engineers, and optimization experts. The fine-grained and automatic integration of the DSL-generated code back into the Fortran ICON code enables us to compare the original Fortran code to the DSL-generated code by running both versions. We call this the verification mode of the integration. In the substitution mode, by contrast, only the DSL-generated version of the code is executed. After a thorough verification of the ported model, we present results from the Aquaplanet idealized experiment on a global icosahedral grid with a resolution of ~80 km. We also compare the performance of ICON with the GT4Py dycore to that of the standard ICON-NWP model running on GPUs.

Author(s): Christoph Müller (MeteoSwiss), Abishek Gopal (ETH Zurich / CSCS), Nicoletta Farabullini (ETH Zurich), Till Ehrengruber (ETH Zurich / CSCS), Samuel Kellerhals (ETH Zurich), Peter Kardos (ETH Zurich), Magdalena Luz (ETH Zurich), Matthias Röthlin (MeteoSwiss), Enrique G. Paredes (ETH Zurich / CSCS), David Leutwyler (MeteoSwiss), Benjamin Weber (MeteoSwiss), Rico Häuselmann (ETH Zurich / CSCS), Felix Thaler (ETH Zurich / CSCS), Jonas Jucker (ETH Zurich), Linus Groner (ETH Zurich / CSCS), Hannes Vogt (ETH Zurich / CSCS), Mauro Bianco (ETH Zurich / CSCS), Anurag Dipankar (ETH Zurich), Carlos Osuna (MeteoSwiss), and Xavier Lapillonne (MeteoSwiss)

Domain: Climate, Weather and Earth Sciences

## Towards a GPU-Enabled Linear-Response Algorithm in the SIRIUS Library

Electronic-structure approaches have become integral in materials science, physics, and chemistry for studying existing materials and for designing and discovering novel ones. Among the properties that can be studied, spectral properties of materials provide a wealth of information and can be obtained from Koopmans spectral functionals as implemented in Quantum ESPRESSO. The linear-response algorithm on which the spectral properties rely is a computationally expensive step, and it is also needed in the calculation of the Hubbard parameters, U and V, when using extended Hubbard functionals. To reduce the computational cost and benefit from accelerated architectures such as GPUs, we use the SIRIUS library and further develop the linear-response algorithm within it for execution on GPUs. In this work, we present the implementation, validation, and preliminary performance of the linear-response algorithm.

Author(s): Giannis D. Savva (EPFL), Iurii Timrov (EPFL), Nicola Colonna (Paul Scherrer Institute), Anton Kozhevnikov (ETH Zurich / CSCS), and Nicola Marzari (EPFL)

Domain: Chemistry and Materials

## Towards a Python-Based Performance-Portable Finite-Volume Dynamical Core for Numerical Weather Prediction

We present recent progress in the development of a high-performance Python implementation of the FVM non-hydrostatic dynamical core at ECMWF and its member state partners. The FVM numerical formulation, centred on 3D semi-implicit time integration of the fully compressible equations with finite-volume non-oscillatory advection, is well suited to convective-scale resolutions and provides competitive time-to-solution. At the same time, it maps efficiently to modern supercomputer architectures offering multi-level parallelism. Here, we particularly highlight the sustainable software design of FVM with respect to emerging and future heterogeneous computing platforms, achieved by leveraging the GT4Py framework. Furthermore, we discuss aspects of coupling and porting selected ECMWF physical parameterizations to GT4Py.

Author(s): Stefano Ubbiali (ETH Zurich), Till Ehrengruber (ETH Zurich / CSCS), Nicolai Krieger (ETH Zurich), Christian Kühnlein (ECMWF), Lukas Papritz (ETH Zurich), and Heini Wernli (ETH Zurich)

Domain: Climate, Weather and Earth Sciences

## Tunable and Portable Extreme-Scale Drug Discovery Platform at Exascale: The LIGATE Approach

Today, the digital revolution is having a dramatic impact on the pharmaceutical industry and the entire healthcare system. The implementation of machine learning, extreme-scale computer simulations, and big data analytics in the drug design and development process offers an excellent opportunity to lower the risk of investment and reduce the time to the patient. Within the LIGATE project, we aim to integrate, extend, and co-design best-in-class European components to design Computer-Aided Drug Design (CADD) solutions that exploit today’s high-end supercomputers and tomorrow’s Exascale resources, fostering European competitiveness in the field. The proposed LIGATE solution is a fully integrated workflow that delivers the results of a virtual screening campaign for drug discovery with the highest speed and the highest accuracy. The full automation of the solution and the possibility of running it on multiple supercomputing centers at once make it possible to run an extreme-scale in silico drug discovery campaign in a few days, for example to respond promptly to a worldwide pandemic crisis.

Author(s): Andrea Beccari (Dompé farmaceutici), Silvano Coletti (CHELONIA), Biagio Cosenza (Università di Salerno), Andrew Emerson (CINECA), Thomas Fahringer (University of Innsbruck), Daniele Gregori (E4 Engineering), Philipp Gschwandtner (UIBK), Erik Lindahl (KTH Royal Institute of Technology), Jan Martinovic (IT4Innovations National Supercomputing Center), Gianluca Palermo (Politecnico di Milano), and Torsten Schwede (University of Basel)

Domain: Life Sciences

## Ultra-High Resolution Simulations of Planetary Collisions

Giant impacts (GIs) form the last stage of planet formation and play a key role in determining many aspects of planetary systems, such as their final structure and the masses and compositions of their constituents. A common choice for numerically solving the equations of motion is the Smoothed Particle Hydrodynamics (SPH) method. We present a new SPH code built on top of the modern gravity code pkdgrav3. The code uses the Fast Multipole Method (FMM) on a distributed binary tree to achieve O(N) scaling and is designed to use modern hardware (SIMD vectorization and GPUs). Neighbor finding in SPH is done for a whole group of particles at once and is tightly coupled to the FMM tree code. It therefore preserves the O(N) scaling of the gravity code. A generalized Equation of State (EOS) interface allows the use of various material prescriptions. Currently available are the ideal gas and EOSs for the typical constituents of planets: rock, iron, water, and hydrogen/helium mixtures. Using the examples of an equal-mass merger between two Earth-like bodies and a mantle-stripping GI on Mercury (resolved with up to 200 million particles), we demonstrate the advantages of high-resolution SPH simulations for planet-scale impacts.

Author(s): Thomas Meier (University of Zurich), Christian Reinhardt (University of Zurich), Douglas Potter (University of Zurich), and Joachim Stadel (University of Zurich)

Domain: Physics