BOF Talk: Beyond FITS tiled compression: EleFits' on-the-fly adaptive compression

Birds-of-a-Feather session 2-C

When

6 to 6:10 p.m., Nov. 6, 2023

Theme: User Experience for astronomical software

Talk during the BoF: The Future of FITS and Other Standardized Astronomical Data Formats

The FITS file format is ubiquitous in astronomy. In addition to the classical external compression methods, it features an internal, tile-based image compression. This makes it possible to (de)compress only the relevant images on demand instead of the entire file, which greatly reduces memory usage at any given time. Metadata is even accessible without decompressing. Several compression algorithms have been adapted or introduced to handle various kinds of images, notably GZIP, Rice, H-compress and PLIO. Naturally, the compression ratio varies greatly depending on the algorithm and the image it is applied to. There is no one-size-fits-all algorithm, and the same algorithm should not be applied unconditionally. However, up until now, FITS libraries like CFITSIO and Astropy have all relied on static compression settings, which makes it tedious to use several algorithms depending on context.

EleFits (https://github.com/CNES/EleFits) introduces compression strategies, which allow the user to define an adaptive heuristic for compressing FITS files dynamically. An arbitrary number of pre- or user-defined compression settings form a chain of responsibility: each time an image is written, the first algorithm in the chain that can handle it is selected. The images that each algorithm can handle are defined by the properties of the HDU, e.g. image size, maximum pixel value, BITPIX or other keywords. The compression strategy is set once and then applied automatically whenever an image is written to the FITS file, which makes compressing with EleFits' API straightforward (see snippet below). EleFits also provides its own turnkey strategies, including CompressAuto, which selects for each HDU the algorithm expected to maximize the compression ratio. CompressAuto features tuned parameters for each algorithm and provides the option to choose between, or combine, lossy and lossless compression.
```
MefFile f(filename, FileMode::Edit, CompressAuto());
f.append_image("SCI", {}, image); // Auto-compresses with GZIP
f.append_image("MASK", {}, mask); // Auto-compresses with PLIO
```
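The chain-of-responsibility selection described above can be illustrated with a minimal, library-independent sketch. It does not use or reproduce the EleFits API: `ImageInfo`, `Rule`, `Strategy` and `make_example_strategy` are hypothetical names, and the thresholds are illustrative assumptions rather than the tuned parameters of CompressAuto.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary of an image HDU (not an EleFits type).
struct ImageInfo {
  int bitpix;              // FITS BITPIX: positive for integers, negative for floats
  std::size_t pixel_count; // Total number of pixels
  long min_value;          // Smallest pixel value (only used for integer images)
  long max_value;          // Largest pixel value (only used for integer images)
};

// One link of the chain: an algorithm name and the condition under which it applies.
struct Rule {
  std::string algorithm;
  std::function<bool(const ImageInfo&)> accepts;
};

// Rules are tried in insertion order; the first one that accepts the image wins.
class Strategy {
public:
  Strategy& add(Rule rule) {
    assert(rule.accepts && "a rule needs a predicate");
    m_rules.push_back(std::move(rule));
    return *this;
  }

  std::string select(const ImageInfo& info) const {
    for (const auto& rule : m_rules) {
      if (rule.accepts(info)) {
        return rule.algorithm;
      }
    }
    return "NONE"; // No rule matched: leave the image uncompressed
  }

private:
  std::vector<Rule> m_rules;
};

// Example chain; every threshold below is an assumption made for illustration.
inline Strategy make_example_strategy() {
  Strategy s;
  // Tiny images: assume compression is not worth the overhead.
  s.add({"NONE", [](const ImageInfo& i) { return i.pixel_count < 1024; }});
  // Floating-point images.
  s.add({"GZIP", [](const ImageInfo& i) { return i.bitpix < 0; }});
  // Mask-like integer images: non-negative with a small dynamic range.
  s.add({"PLIO", [](const ImageInfo& i) {
    return i.bitpix > 0 && i.min_value >= 0 && i.max_value <= 255;
  }});
  // Any remaining integer image.
  s.add({"RICE", [](const ImageInfo& i) { return i.bitpix > 0; }});
  return s;
}
```

With this sketch, `make_example_strategy().select(info)` returns the name of the first matching algorithm, so the order in which rules are added encodes their priority: specific rules go first, general fallbacks last.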
To assess the performance of our implementation, we have run a quantitative benchmark over a range of FITS files that cover a variety of use cases: they include images and masks of various sizes, types and statistics. The compression ratio and compression time are measured at file and HDU levels. CompressAuto is compared to common strategies that consist of applying one algorithm to integer images and another to floating-point images. Lossless and lossy compression are also compared. The results show that CompressAuto almost systematically produces the best compression ratios, sometimes by a large margin. In the remaining cases, the difference from the optimal solution is negligible. Thanks to its adaptive parameters, the algorithm selected by CompressAuto also generally outperforms the same algorithm run with default settings. Finally, choosing a more complex strategy, such as CompressAuto, does not significantly increase the wall-clock time needed to compress a FITS file.

Contacts

Edgar Remy, CNES
Antoine Basset, CNES