Data Centric Computing & the Computing Continuum, the IO-SEA Project Proposal
Description: A growing number of High-Performance Computing workflows rely on data collected in the field, and some of them require careful design in terms of data management. For instance, tsunami risk prediction processes data collected from seismic sensors spread around the world. When seismic waves are detected, the workflow must run as fast as possible to evaluate the tsunami risk and, if necessary, raise an alert. In this talk, we present how the concepts and tools developed in the European-funded IO-SEA project can be used to implement such “distributed data centric” workflows. We introduce the concepts of datasets and namespaces, which group data into sets that can be manipulated as a whole (moved, copied, archived…). Datasets are made accessible to computing resources through ephemeral I/O services running on dedicated "data nodes" optimized for handling large quantities of data. Users specify which datasets are required to execute their workflow steps, and the runtime environment sets up the ephemeral I/O services accordingly. Users can also control data movement within the storage hierarchy to optimize time-to-solution, keeping frequently accessed data in the fastest storage tiers.
Time: Monday, June 26, 14:30 - 15:00 CEST
Computer Science, Machine Learning, and Applied Mathematics