Thursday Plenary 2: Contributed talk
Theme: Cloud infrastructures for astronomical data analysis
Radio interferometry has been slow to adopt cloud-based technologies, despite some apparent advantages. I argue that it has been difficult to make radio interferometry on the cloud cost-effective for a number of reasons, chief among them: (a) legacy data formats ill-suited to object storage, (b) complex and heterogeneous software stacks that rely heavily on legacy code, and (c) complicated "thick/thin" workflows whose resource requirements differ sharply from one pipeline stage to the next.
Recent software developments, however, offer a way forward. I will showcase some of these, including the Stimela 2 workflow management and containerization framework, which streamlines the orchestration of complex workflows on a Kubernetes cluster, and the dask-ms library, which maps legacy data formats onto diverse storage backends, including object storage. A new generation of software packages leverages these technologies to provide cloud-efficient implementations of the basic processing steps that can exploit the auto-scaling capabilities inherent to cloud architectures. I will demonstrate a full data reduction workflow running on AWS. I will also argue that cloud-compatible pipelines go a long way towards providing fully reproducible workflows.