BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230831T095746Z
LOCATION:Sertig
DTSTART;TZID=Europe/Stockholm:20230627T140000
DTEND;TZID=Europe/Stockholm:20230627T143000
UID:submissions.pasc-conference.org_PASC23_sess182_pap135@linklings.com
SUMMARY:FPGA Acceleration for HPC Supercapacitor Simulations
DESCRIPTION:Paper\n\nCharles Prouveur and Matthieu Haefele (CNRS), Tobias 
 Kenter (Paderborn University), and Nils Voss (Imperial College London)\n\n
 In the search for more energy-efficient computing devices that could be
  assembled to build future exascale systems, this study proposes a
  chip-to-chip comparison between a CPU, a GPU, and an FPGA, as well as a
  scalability study on multiple FPGAs from two of the available vendors.
  The application co
 nsidered here has been extracted from a production code in materials
  science. This allows the different implementations to be benchmarked on
  a production test case rather than only on theoretical ones. The core
  algorithm is a matrix-free conjugate gradient method that computes the
  total electrostatic energy using an Ewald summation at each iteration.
  This paper
  describes the original MPI implementation of the application, details a
  numerical accuracy study, and explains the methodology followed as well
  as the
  resulting FPGA implementation based on MaxCompiler. The FPGA
  implementation, using a 40-bit floating-point number representation,
  outperforms the CPU implementation in terms of both computing power and
  energy usage, resulting in an energy efficiency more than 25 times
  better. Compared to the GPU of the same generation, the FPGA reaches 60%
  of the GPU performance while its performance per watt is still better by
  a factor of 2. Tha
 nks to its low average power usage, the FPGA bests both the fully loaded
  CPU and GPU in terms of the number of conjugate gradient iterations per
  second and per watt. Finally, an implementation using oneAPI is
  described as well, showcasing a new development environment for FPGAs
  in HPC.\n\nDomain: Chemist
 ry and Materials, Climate, Weather and Earth Sciences, Computer Science, M
 achine Learning, and Applied Mathematics\n\nSession Chair: Mauro Bianco
  (ETH Zurich / CSCS)
END:VEVENT
END:VCALENDAR