POSTER P804: Processing large Radio Astronomy data cubes within an Objectstore

ADASS posters are displayed all week

Theme: Cloud infrastructures for astronomical data analysis

The future Square Kilometre Array (SKA) telescope and its current precursors, such as the Australian SKA Pathfinder (ASKAP) and the Murchison Widefield Array, are changing the way we handle large data. Typical ASKAP data cubes are on the scale of a terabyte; SKA data cubes may be larger by two orders of magnitude or more.

Reduction of these data can only occur efficiently in High Performance Computing (HPC) facilities. Modern HPC centres are moving from traditional POSIX-based file systems to object storage for long-term storage of data. Object stores offer virtually limitless scalability, greater searchability (via metadata attributes), resilience and cost efficiency. However, virtually all algorithms used by radio astronomers assume an underlying POSIX file system, with its familiar file methods such as open(), write() and seek(). To work with object stores, data must therefore first be staged out to short-term POSIX file-system storage before processing. This is not a trivial exercise; staging multi-terabyte data sets can take hours to days.
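To make the cost of this double-handling concrete, the conventional workflow looks roughly like the sketch below. It is not taken from the poster: it assumes an S3-compatible object store accessed through boto3 and astropy for FITS handling, and the bucket, key and scratch path are hypothetical.

```python
import boto3
from astropy.io import fits

s3 = boto3.client("s3")

# Stage the entire (possibly multi-terabyte) cube out of the object store
# onto POSIX scratch storage; this is the step that can take hours to days.
s3.download_file("askap-cubes", "image.cube.fits", "/scratch/image.cube.fits")

# Only now can POSIX-assuming tools open(), seek() and read() the data.
with fits.open("/scratch/image.cube.fits", memmap=True) as hdul:
    data = hdul[0].data  # processing (e.g. source finding) starts here
```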

I present an alternative methodology that avoids this double-handling of data. A Python wrapper requests cutouts from the data cube in the object store and converts the received stream into arrays that are fed directly into the processing application (in this case the source finder SoFiA-2). This is shown to be considerably faster than staging data out to a scratch file system and then processing it.
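The streaming idea can be sketched as follows. This is not the poster's actual wrapper: it assumes an S3-compatible store reachable via boto3, a FITS cube stored as a single object with a known header size, axis lengths and data type (all hypothetical values below), and it uses an HTTP ranged GET to pull one spectral plane straight into a NumPy array; a real wrapper would hand such arrays on to the source finder.

```python
import boto3
import numpy as np

BUCKET, KEY = "askap-cubes", "image.cube.fits"  # hypothetical names
HEADER_BYTES = 2880 * 2        # FITS headers come in 2880-byte blocks
NX, NY = 4096, 4096            # assumed spatial axis lengths
BYTES_PER_PIXEL = 4            # BITPIX = -32 -> big-endian float32

s3 = boto3.client("s3")

def fetch_channel(chan: int) -> np.ndarray:
    """Fetch one spectral plane of the cube with a single ranged GET."""
    plane = NX * NY * BYTES_PER_PIXEL
    start = HEADER_BYTES + chan * plane
    resp = s3.get_object(Bucket=BUCKET, Key=KEY,
                         Range=f"bytes={start}-{start + plane - 1}")
    buf = resp["Body"].read()
    # FITS data are big-endian; convert the byte stream directly into an
    # array, with no intermediate file on a POSIX scratch system.
    return np.frombuffer(buf, dtype=">f4").reshape(NY, NX)
```

Because each cutout maps onto a ranged request, only the bytes that are actually needed ever leave the object store, which is where the saving over whole-cube staging comes from.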

Contacts

Gordon WH German, CSIRO