FOCUS DEMO F802: Empowering SKA Data Challenges: A homogeneous platform for enhanced collaboration and scalability fully aligned with Open Science.

Focus demo 7

When

10:15 to 10:45 a.m., Nov. 9, 2023


Theme: Cloud infrastructures for astronomical data analysis

Links: pretalx, recording (this focus demo starts at 1h58m)

The Square Kilometre Array Observatory (SKAO) is an international collaborative effort focused on constructing and operating the world's most advanced radio telescope. The SKAO Science Data Challenges (SDCs) are a series of competitions designed to help scientists and engineers develop new techniques for analysing the vast amounts of data that the SKAO will generate. The SDCs have traditionally relied on computing resources kindly provided by scientific institutions and facilities. The method of allocating these resources to participants has varied among providers, resulting in a heterogeneous user experience: some providers give users access to Virtual Machines (VMs) with differing configurations, while others provide HPC-type resources. Providing a uniform computing platform for the SDCs brings fairness, scalability, enhanced collaboration and consistency. Participants work with the same tools and collaborate more easily. A standardised setup simplifies resource management, support, and evaluation, leading to greater efficiency and more reliable results.

JupyterHub provides a platform for provisioning compute resources through a container orchestration service such as Kubernetes, while also scaling with user demand and enabling centrally managed authentication. The advantages of this approach include ease of deployment through Helm, a homogeneous way of customising the software and compute environment needed for the SDC, and horizontal scalability, since the Kubernetes cluster allocates resources to users based on demand and availability.
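As a rough illustration of how a single Helm values file can homogenise the user environment, the sketch below builds a small set of overrides in Python and renders them as JSON (JSON is a subset of YAML, so such a file could plausibly be passed to Helm as a values file). The key names follow the Zero to JupyterHub chart; the image name, tag and resource figures are hypothetical and are not the settings used in the demo.

```python
import json

# Hypothetical overrides for illustration only; not the demo's actual configuration.
values = {
    "singleuser": {
        # One shared image gives every participant the same software stack.
        "image": {"name": "example.org/sdc/analysis-notebook", "tag": "0.1.0"},
        # Identical per-user guarantees and limits keep resources comparable.
        "cpu": {"guarantee": 1, "limit": 2},
        "memory": {"guarantee": "2G", "limit": "4G"},
    },
    # Let the chart's user scheduler place user pods on the cluster.
    "scheduling": {"userScheduler": {"enabled": True}},
}


def render_values(config):
    """Serialise the overrides as JSON text with stable key order."""
    return json.dumps(config, indent=2, sort_keys=True)


rendered = render_values(values)
print(rendered)
```

Keeping the image and resource limits in one file means the same environment can be redeployed on another cluster by reusing that file, which is the portability argument made above.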

With this contribution we present a highly portable, interactive and fully Open Science-aligned analysis service that lets future participants in different Science Data Challenges develop solutions on a horizontally scalable platform within the infrastructures of the SKA Regional Centres Network (SRCNet) and other IT facilities. In this context, we will show the process of configuring the Kubernetes cluster, the installation and preparation of BinderHub/JupyterHub, and a use case for a radio astronomy data analysis workflow that uses Dask (a Python library for parallel and distributed computing) to take advantage of large distributed clusters in the cloud on Kubernetes. To ensure portability, two SRCNet cloud platforms, ESPSRC (Spain) and CHSRC (Switzerland), have been used in addition to the infrastructure of a supercomputing centre (CESGA).
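The kind of chunked, map-then-reduce computation that a library such as Dask distributes across a cluster can be sketched with the Python standard library alone. The example below is not the demo's workflow and does not use Dask itself: it swaps in `concurrent.futures` so that it runs without a cluster (Dask's distributed `Client` offers a similar `submit`/`map` futures interface). The synthetic noise data and all function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n_chunks, chunk_size, seed=0):
    """Hypothetical stand-in for chunks of a data cube: synthetic Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(chunk_size)]
            for _ in range(n_chunks)]


def partial_stats(chunk):
    """Map step: per-chunk sum of squares and sample count."""
    return sum(x * x for x in chunk), len(chunk)


def global_rms(chunks, max_workers=4):
    """Reduce step: combine the per-chunk results into one RMS value."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total_sq = sum(sq for sq, _ in partials)
    total_n = sum(n for _, n in partials)
    return math.sqrt(total_sq / total_n)


chunks = make_chunks(n_chunks=8, chunk_size=1000)
rms = global_rms(chunks)
print(f"RMS over {len(chunks)} chunks: {rms:.4f}")
```

Because each chunk is processed independently and only small partial results are combined, the same structure scales horizontally: adding workers (or, with Dask, adding pods to the Kubernetes cluster) increases throughput without changing the analysis code.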

Contacts

Manuel Parra-Royón, IAA-CSIC