CANCELLED: C808: Processing All-Sky Images At Scale On The Amazon Cloud: A HiPS Example

Thursday Plenary 4: Contributed talk

When

4:15 – 4:30 p.m., Nov. 9, 2023

Where

Theme: Cloud infrastructures for astronomical data analysis

We report here on a project that has developed a practical approach to processing all-sky image collections on cloud platforms, using as an exemplar application the creation of 3-color Hierarchical Progressive Survey (HiPS) maps of the 2MASS data set with the Montage Image Mosaic Engine on Amazon Web Services. We will emphasize issues that must be considered by scientists wishing to use cloud platforms to perform such parallel processing, thereby providing a guide for scientists wishing to exploit cloud platforms for similar large-scale processing.

A HiPS map is based on the HEALPix sky tiling scheme. Progressive zooming of a HiPS map reveals an image sampled at ever smaller or larger spatial scales, as defined by the HEALPix standard. Briefly, the approach used by Montage involves creating a base mosaic at the lowest required HEALPix level, usually chosen to match as closely as possible the spatial sampling of the input images, then cutting out the HiPS cells in PNG format from this mosaic. The process is repeated at successive HEALPix levels to create a nested collection of FITS files, from which are created the PNG files that are shown in HiPS viewers.

Stretching FITS files to produce PNGs is based on an image histogram. For composite regions (up to and including the whole sky), the histograms for each tile can be combined to create a composite histogram for the region. Using this single histogram for each of the individual FITS files means that all the PNGs are on the same brightness scale, so displaying them side by side in a HiPS viewer produces a continuous, uniform map across the entire sky.
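The composite-histogram idea can be illustrated with a minimal sketch. This is not the Montage implementation: the function names, the fixed bin range, the percentile limits, and the linear stretch are all assumptions chosen for illustration. The point it shows is that per-tile histograms with shared bin edges can simply be summed, and that stretch limits derived from the summed histogram apply identically to every tile.

```python
from bisect import bisect_left
from itertools import accumulate


def tile_histogram(pixels, lo, hi, nbins):
    """Count pixel values into nbins equal-width bins spanning [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for value in pixels:
        if lo <= value <= hi:
            # Values equal to hi fall into the last bin.
            index = min(int((value - lo) / width), nbins - 1)
            counts[index] += 1
    return counts


def combine_histograms(histograms):
    """Sum per-tile histograms that share the same bin edges."""
    return [sum(column) for column in zip(*histograms)]


def percentile_value(counts, lo, hi, percent):
    """Upper edge of the bin where the cumulative count reaches `percent`."""
    cumulative = list(accumulate(counts))
    target = cumulative[-1] * percent / 100.0
    index = bisect_left(cumulative, target)
    width = (hi - lo) / len(counts)
    return lo + (index + 1) * width


def stretch(value, vmin, vmax):
    """Map a pixel value linearly to 0..255 using region-wide limits.

    A linear stretch is an assumption made for brevity; any stretch
    function could be driven by the same shared limits.
    """
    fraction = (value - vmin) / (vmax - vmin)
    return round(255 * min(max(fraction, 0.0), 1.0))


# Two hypothetical tiles with very different brightness ranges.
tile_a = [0, 1, 2, 3]
tile_b = [6, 7, 8, 9]
combined = combine_histograms(
    [tile_histogram(tile_a, 0, 10, 10), tile_histogram(tile_b, 0, 10, 10)]
)
# Limits come from the combined histogram, so both tiles share one scale.
vmin = percentile_value(combined, 0, 10, 50)
vmax = percentile_value(combined, 0, 10, 100)
```

Because each tile's histogram is computed independently and only the small count arrays need to be merged, this step fits naturally into a parallel workflow.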

All the processing just described can be readily performed in parallel on Amazon Web Services (AWS) instances. To create the HiPS maps on AWS, jobs were set up with a Docker container that contains the requisite data and software components, including modules added to streamline processing on cloud platforms, such as those that adjust for inter-image background variations and develop a global model for visualization stretches. Jobs are set up and run with the AWS Batch processing mode, which spins up server instances as needed, pulling from a pool of pre-defined job scripts. When a job is done, Batch either assigns the compute instance another job from the pool or shuts the instance down. This approach minimizes idle instances, which would still incur charges even when not processing. A set of script generators developed for this project creates, by design, simple scripts that are handed to the instances to run jobs inside the containers. Processing the whole sky at three wavelengths requires about ten thousand such jobs. We will discuss processing times and costs.
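As a rough illustration of what a script generator of this kind might look like, the sketch below emits one small shell script per band and tile. It is not the project's generator: `process_tile` is a hypothetical placeholder for the real processing command inside the container, and the bucket name, tile numbering, and output layout are invented for the example. Only the `aws s3 cp` copy step is a real AWS CLI command.

```python
from itertools import product


def make_job_script(band, tile_id, bucket):
    """Return the text of one simple shell script for a single band and tile.

    `process_tile` is a hypothetical stand-in for the real processing
    command; the S3 key layout is likewise an assumption.
    """
    output = f"/tmp/{band}_{tile_id}.fits"
    lines = [
        "#!/bin/sh",
        "set -e",  # stop at the first failing command
        f"process_tile --band {band} --tile {tile_id} --out {output}",
        f"aws s3 cp {output} s3://{bucket}/{band}/{tile_id}.fits",
    ]
    return "\n".join(lines) + "\n"


def generate_jobs(bands, tile_ids, bucket):
    """Build a mapping of job name to script text, one job per (band, tile)."""
    return {
        f"{band}-{tile_id}": make_job_script(band, tile_id, bucket)
        for band, tile_id in product(bands, tile_ids)
    }


# Three 2MASS bands and a hypothetical set of four tiles give 12 jobs.
jobs = generate_jobs(["j", "h", "k"], range(4), "example-hips-bucket")
```

Keeping each script this small means a job carries no state of its own, so any instance can pick up any script from the pool and a failed job can simply be resubmitted.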

Contacts

Bruce Berriman, IPAC