C503: From LOFAR to SKA: towards a GPU-based source extractor

Wednesday plenary 1: Contributed talk

When

9:30 to 9:45 a.m., Nov. 8, 2023

Where

Theme: GPU implementations for core astronomical libraries

The Amsterdam-ASTRON Radio Transients Facility And Analysis Center (AARTFAAC) is an all-sky radio telescope and transient-detection facility. It piggybacks on raw data from a limited number of antennas of the LOFAR telescope. The AARTFAAC 2.0 program, which started in 2018, couples a planned telescope upgrade with better transient-detection capabilities and new science. The PetaFLOP AARTFAAC Data-Reduction Engine (PADRE) aims to improve the AARTFAAC processing pipeline so that it detects transients in real time with low latency. This allows the raw samples of all LOFAR antennas, which are available for only seven seconds, to be saved for further analysis, while other instruments, observing at other wavelengths, are alerted to initiate follow-up observations immediately.

The last part of the AARTFAAC pipeline is image based. Every second, for every subband, an all-sky image is produced, which may contain anywhere from several tens to several thousand detectable sources. The pixels constituting those sources are extracted in order to measure the properties of each source, such as peak flux density, integrated flux, position and shape parameters. These properties are inserted into a database and associated with previous measurements of the same source, a process called source association. Peak flux densities of the same source, ordered in time, form light curves, which are analysed, e.g. using machine learning techniques, to find transient sources. Source extraction, measurement and association together form a subpipeline called TraP, the LOFAR Transients Pipeline.
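As a rough illustration of what measuring the pixels of one extracted source can involve, the sketch below computes a peak value, a summed flux and a flux-weighted centroid for a single island of pixels. It is a minimal NumPy sketch under stated assumptions and not the PySE implementation: the function name `measure_island`, the boolean `mask` input and the use of a plain pixel sum as a stand-in for integrated flux are all hypothetical choices made for this example.

```python
import numpy as np


def measure_island(image, mask):
    """Measure one island of pixels (hypothetical helper, not PySE's API).

    image: 2D array of pixel values.
    mask:  boolean 2D array, True for pixels belonging to the island.
    Returns (peak, summed_flux, (y_centroid, x_centroid)).
    """
    ys, xs = np.nonzero(mask)          # pixel coordinates of the island
    fluxes = image[ys, xs]             # pixel values of the island
    peak = fluxes.max()                # brightest pixel
    summed = fluxes.sum()              # crude stand-in for integrated flux
    # Flux-weighted centroid as a simple position estimate.
    y_c = (ys * fluxes).sum() / summed
    x_c = (xs * fluxes).sum() / summed
    return peak, summed, (y_c, x_c)


# Toy image with one symmetric three-pixel source.
image = np.zeros((5, 5))
image[2, 1] = 1.0
image[2, 2] = 4.0
image[2, 3] = 1.0
mask = image > 0.5

peak, summed, (y_c, x_c) = measure_island(image, mask)
```

A real source measurer would typically fit a model (for example a 2D Gaussian) to obtain shape parameters and uncertainties; the moments above only convey the kind of per-source quantities involved.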
 
This talk will focus on refactoring PySE, the Python Source Extractor and source measurer in TraP, in order to speed it up: from an original running time of ~20 s per typical 2300² pixel image with ~2000 sources to less than a second. We will discuss the software engineering effort to turn slow, serial Python code into fast, parallel code. There are abundant options for parallelisation on the CPU, such as Ray and Dask. These tools were used to speed up the compute-intensive task of deriving background characteristics through kappa-sigma clipping. Source measurements could be parallelised using Python's multiprocessing module. These changes, together with algorithmic improvements, were not sufficient to reduce the total time for source extraction and source measurement to below 1 s. To achieve further performance improvements, the sep library (based on SExtractor) was used for kappa-sigma clipping, segmentation and connected-component labelling. Source measurements were sped up impressively using Numba's guvectorize decorator. This decorator opened the way to performing the source measurements on the GPU, by adding the "target='cuda'" argument. In combination with replacing NumPy arrays with CuPy arrays, all the naturally parallel workload can now be shifted to the GPU, which will make PySE a suitable source extractor for SKA, processing 4K × 4K images with tens of thousands of sources in less than 1 s.
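For readers unfamiliar with the background step, the sketch below shows one common form of iterative kappa-sigma clipping: pixels further than kappa standard deviations from the mean are discarded repeatedly until none are rejected, leaving an estimate of the background level and noise. It is a minimal NumPy sketch and makes several assumptions: the function name `kappa_sigma_clip`, the default `kappa=3.0`, the `max_iter` limit and the use of the mean as the centre are illustrative choices, not the settings used by PySE or sep.

```python
import numpy as np


def kappa_sigma_clip(data, kappa=3.0, max_iter=20):
    """Iteratively reject outliers; return (mean, std) of the surviving values.

    Hypothetical helper for illustration; kappa and max_iter are assumed values.
    """
    values = np.asarray(data, dtype=float).ravel()
    for _ in range(max_iter):
        mean = values.mean()
        std = values.std()
        keep = np.abs(values - mean) <= kappa * std
        if keep.all():
            break                      # converged: nothing left to reject
        values = values[keep]
    return values.mean(), values.std()


# Synthetic background (unit Gaussian noise) with one bright 5 x 5 "source".
rng = np.random.default_rng(0)
background = rng.normal(0.0, 1.0, size=(200, 200))
background[50:55, 50:55] += 100.0

mean, rms = kappa_sigma_clip(background)
```

The bright pixels are rejected in the first pass, so the returned values describe the noise rather than the source. In a real image the clipping is usually applied per grid cell so that the background can vary across the field.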

Contacts

Hanno Spreeuw, The Netherlands eScience Center