C802: Stimela 2, kubernauts, and dask-ms: radio interferometry data reduction in the cloud

Thursday Plenary 2: Contributed talk

When

11:45 a.m. to noon, Nov. 9, 2023

Theme: Cloud infrastructures for astronomical data analysis

Radio interferometry has been slow to adopt cloud-based technologies, despite some of their apparent advantages. I argue that it has been difficult to make radio interferometry on the cloud cost-effective for a number of reasons, chief among them: (a) awkward legacy data formats ill-suited to object storage, (b) complex and heterogeneous software stacks with a heavy reliance on legacy code, and (c) complicated "thick/thin" workflows with very different resource requirements at different stages of the pipeline.

Recent software developments, however, offer a way forward. I will showcase some of these, including the Stimela 2 workflow management and containerization framework, which streamlines the orchestration of complex workflows on a Kubernetes cluster, and the dask-ms library, which maps legacy data formats onto diverse storage backends, including object storage. A new generation of software packages leverages these technologies to provide cloud-efficient implementations of the basic processing steps that can exploit the auto-scaling capabilities inherent to cloud architectures. I will demonstrate a full data reduction workflow running on AWS. I will also argue that cloud-compatible pipelines go a long way toward providing fully reproducible workflows.

Contacts

Oleg Smirnov, Rhodes University and SARAO