C401: "You might also like these images": unsupervised affine-transformation-independent representation learning for the ALMA Science Archive

Tuesday plenary 2: contributed talk

When

11 to 11:15 a.m., Nov. 7, 2023

Where

Theme: AI in Astronomy

With the exponential growth of astronomical data volumes, finding the needles in the haystack is becoming increasingly difficult. Traditionally, archives have described their observations with metadata and made those searchable through web interfaces as well as programmatically. The next frontier for science archives is to also allow searches on the content of the observations themselves. As a step in this direction, we have implemented a prototype of a recommender system for the ALMA Science Archive. For the similarity estimation, we use self-supervised, affine-transformation-independent representation learning of source morphologies through contrastive learning with a deep neural network. Once the neural network is trained, the feature vectors for all images (both continuum images and peak-flux images of datacubes) are evaluated. In a next step, we compute the similarity matrix holding, for each image, the corresponding 1000 most similar images, ordered by their pairwise similarity. A kd-tree is used to speed up that computation: each nearest-neighbor lookup drops from O(n) to O(log n) on average, so the full matrix costs roughly O(n log n) instead of O(n^2). Our prototype interface then shows the most similar images, from which the archival researcher can select the most interesting ones. When they select an image on the interface, we use a scoring algorithm to instantaneously compute the combined similarity to all already selected images and reorder the remaining displayed images accordingly. Each selection thus further refines the similarity display. Finally, we use k-means clustering on the feature vectors of the displayed images to provide selectable 'source morphology categories' as a quick-select option. We conclude from the prototype that an image similarity interface can be a valuable asset to science archives, and we look forward to discussing this work and related ideas with the ADASS community.
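The neighbor search and the re-ranking step described above can be illustrated with a minimal sketch. It is not the archive's implementation: the random feature vectors stand in for the network's embeddings, the function names `top_k_neighbors` and `rerank` are hypothetical, and the mean-distance score is only one plausible choice, since the abstract does not specify the scoring algorithm. The kd-tree comes from SciPy's `scipy.spatial.cKDTree`.

```python
import numpy as np
from scipy.spatial import cKDTree


def top_k_neighbors(features, k):
    """Return (indices, distances) of the k nearest neighbors of every row.

    features: (n, d) array of feature vectors. Each row's closest match is
    the row itself (assuming no duplicate vectors), so we query k + 1
    neighbors and drop the first column.
    """
    tree = cKDTree(features)
    distances, indices = tree.query(features, k=k + 1)
    return indices[:, 1:], distances[:, 1:]


def rerank(features, selected, candidates):
    """Order candidates by their mean Euclidean distance to the selection.

    ASSUMPTION: averaging the distances to all selected images is an
    illustrative scoring rule, not the one used in the prototype.
    """
    sel = features[selected]      # (s, d)
    cand = features[candidates]   # (c, d)
    # Pairwise distances between every candidate and every selected image.
    dists = np.linalg.norm(cand[:, None, :] - sel[None, :, :], axis=2)
    score = dists.mean(axis=1)    # lower score means more similar
    order = np.argsort(score)
    return [int(candidates[i]) for i in order]


# Hypothetical stand-in data: 500 images with 16-dimensional feature vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 16))

# Truncated similarity matrix: the 10 most similar images for every image.
neigh_idx, neigh_dist = top_k_neighbors(features, k=10)

# The user has selected image 0 and its closest match; the remaining
# neighbors of image 0 are reordered by their combined similarity.
selected = [0, int(neigh_idx[0, 0])]
candidates = [int(i) for i in neigh_idx[0, 1:]]
ranked = rerank(features, selected, candidates)
```

The truncated neighbor table is computed once, while `rerank` only touches the handful of displayed candidates, which is why a re-ordering of this kind can feel instantaneous in an interface. Note that kd-trees lose much of their advantage as the feature dimension grows, so the achievable speed-up depends on the dimensionality of the embeddings.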

Contacts

Felix Stoehr