<div class="center-resize"> <h1 class="center-content">Gaia Utility for the Analysis of Self-Organizing Maps in DR3</h1> For Gaia DR3 the Coordination Unit 8 Outlier Analysis working group analysed 56 million objects, with a probability of membership to typical astronomical object classes below a certain threshold (see DR3 [documentation](https://gea.esac.esa.int/archive/documentation/GDR3/Data_analysis/chap_cu8par/sec_cu8par_apsis/ssec_cu8par_apsis_oa.html)), i.e., classification outliers. Self-Organizing Maps (SOM) is the unsupervised clustering method selected to perform this task. A free access web environment was designed to facilitate the analysis work of the Gaia DR3 Self-Organizing Map (Gaia Utility for the Analysis of Self-Organizing Maps). It was implemented with Web technologies, to benefit from their flexibility and easy access to the information. The version of GUASOM that has been developed precisely for the spectra processed by the Outlier Analysis work package for the Gaia DR3 is called _GUASOM flavour DR3_ and contains several visualization utilities that allow a user-friendly analysis of the information present on the map. The tool provides both classical and specific domain representations: - **Umatrix**: This representation shows the distance among the different neuron prototypes, where less distance means more similarity. This is useful to identify groups of neurons populated by objects with similar SEDs. In this application, the user can control the boundaries of the distance among neurons through the slider "Distance boundaries (Percentiles)", with the objective of exploring the inner structure of the map. Furthermore, the user can also select how to present the distance between neurons according to a normal, natural logarithmic or root square function, being the normal scale the default one. The legend in this map shows a colour gradient, from black to white, representing from the higher to the lower distances respectively, and using the exponential notation due to the small distances among most similar neurons. - **Hits**: It displays the number of objects for each neuron, allowing to identify dense regions in the map. The slider "Hits boundaries (Percentiles)" allows the user to control the limits of the hits that are shown in the map, highlighting the neurons in the desired range. The legend in this map shows a colour gradient, from black to white, representing from less to high density regions respectively. - **Parameter distribution**: This visualization shows the distribution of a particular parameter of the domain in the map, displaying the average values calculated in each neuron. The user can select the parameter in a drop-down menu, and also the scale of the values according to a normal, natural logarithmic or root square function, being the normal scale the default one. The legend in this map shows a colour gradient, from black to white, representing from the higher to the lower values in the map, respectively. - **Catalogue labels**: This graphic shows the representative label of each neuron according to a specific catalogue that can be chosen by the user. By default, there is always a "quality_category" catalogue that has been defined internally to evaluate the quality of each neuron, as it is explained in the [online documentation](https://gea.esac.esa.int/archive/documentation/GDR3/Data_analysis/chap_cu8par/sec_cu8par_apsis/ssec_cu8par_apsis_oa.html). 
Furthermore, the available cross-matches with other catalogues can also be shown by selecting the corresponding catalogue in the drop-down menu. For the catalogues obtained through a cross-match procedure, the user can control, through the "Qualified majority limit (%)" slider, the qualified majority that a label has to reach to be considered representative. By default, the percentage that each label represents in each neuron is calculated only over "the matches", i.e. those objects found in the other catalogue, but the user can also include all the objects without a match by ticking the "Include not found objects" checkbox, which recalculates the percentages accordingly. The legend of this map shows all the labels belonging to the selected catalogue with their associated colours.
- **Template labels**: It is similar to the Catalogue labels visualization, but in this case the representative label of each cluster is based on a template (crafted using the preprocessing procedure explained in the [online documentation](https://gea.esac.esa.int/archive/documentation/GDR3/Data_analysis/chap_cu8par/sec_cu8par_apsis/ssec_cu8par_apsis_oa.html#SSS3.P1)). The template that best fits the prototype is selected using the Euclidean distance. The "Max distance to template (Percentiles)" slider allows the user to control the maximum distance allowed between the prototype and its corresponding template, i.e. the adjustment threshold below which the label is assigned with sufficient confidence. For those templates that do not represent stellar objects of a single type, the "Qualified majority limit (%)" slider is available and allows the user to control the majority that a label has to reach to be considered representative. This is especially useful when combined templates are used, in order to determine which neurons represent each object type most accurately. The legend of this map shows all the labels belonging to the selected template set with their associated colours. For DR3, two sets of templates are available:
  - _class_label_basic_: This template set refers to broad spectral types for stars (early, intermediate and late) and to extragalactic object types in wide redshift ranges. The labels of all neurons with specific types were converted to their related general type according to this [table](help/generic?markdown=templates_table.md). The available templates are shown [here](help/generic?markdown=basic_templates.md).
  - _class_label_specific_: This template set considers finer levels of classification for both stars and extragalactic objects. The available templates are shown [here](help/generic?markdown=specific_templates.md).
- **Category distribution**: In this representation, the distribution of a single type of object is shown. The user can select the category to be displayed from a set of labels, according to the templates and catalogues available for the map. With this graphic, the user can easily observe the regions of the map containing objects of the chosen type. The legend of this map shows a colour gradient from black to white, representing the fraction of the population of every neuron that belongs to the selected category.
- **Colour distribution**: It shows the colour distribution of the objects in the map. The colour is derived as the difference in magnitudes between the two photometric bands that correspond to the blue photometer (BP) and the red photometer (RP), i.e. G<sub>BP</sub> − G<sub>RP</sub>.
The legend of this map shows a colour gradient from blue to red, representing the hottest to the coldest objects in the map.
- **Novelty**: This visualization displays the distance between the prototype of each neuron and the selected template. A smaller distance means less novelty, because the template associated with the neuron is very similar to the prototype and therefore refers to a well-known object type. The user can select the set of templates to render. Furthermore, the distance can be represented on different scales (normal, natural logarithmic or square root), selectable through a drop-down menu, the normal scale being the default.

It is also possible to combine pairs of the above visualizations in 3D plots, in order to analyse the information provided by both at the same time. The combinations **Umatrix + Templates** and **Umatrix + Catalogues** are remarkably useful, since they show the relation between the labels and the distances between neurons, making it possible to identify tendencies. Another remarkable combination is **Hits + Templates** or **Hits + Catalogues**, which shows the representative labels and the density of the map at the same time, making it possible to identify the types of objects that dominate the map.

The strength of the tool lies in its ability to explore the neurons and the objects assigned to them by means of the following specific representations:

- **Internal Spectra**: It shows the spectra that were used to train the SOM, after the preprocessing and normalisation stages (see the [online documentation](https://gea.esac.esa.int/archive/documentation/GDR3/Data_analysis/chap_cu8par/sec_cu8par_apsis/ssec_cu8par_apsis_oa.html#SSS3.P1)). It shows at least the prototype and the object centroid of the neuron but, if available, it also shows the selected template and the spectra of the objects in the neuron that best and worst fit the prototype.
- **Gaia spectra**: It shows the spectra for the best and worst 20 objects of the neuron. These spectra have been produced from the coefficients available through [DataLink](https://www.cosmos.esa.int/web/gaia-users/archive/datalink-products) using the [GaiaXPy](https://pypi.org/project/GaiaXPy/) Python tool (a minimal GaiaXPy sketch is given below).
- **Population**: Available only for catalogues obtained by means of a cross-match procedure or for the combined templates. It shows the frequency of the different types of objects in the neuron.
- **Statistical summary**: The statistical summary shows a table with the statistical information available for a neuron. The description of the parameters is available in the [online documentation](https://gea.esac.esa.int/archive/documentation/GDR3/Gaia_archive/chap_datamodel/sec_dm_astrophysical_parameter_tables/ssec_dm_oa_neuron_information.html).

Furthermore, in order to perform additional analysis in other environments, GUASOM allows the user to select multiple neurons and download the sources belonging to them in FITS format. It also provides an ADQL query, which can be copied and pasted into the [Gaia Archive](https://gea.esac.esa.int/archive/) to retrieve these source ids as well as the data related to the selected neurons (an illustrative query is sketched below).
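To make the per-neuron quantities behind the classical views above more concrete, the following Python sketch shows one possible way to compute a U-matrix, a hits map, a qualified-majority label and a nearest-template (novelty) distance for a small SOM. It is an illustration under simple assumptions (a rectangular map stored as a NumPy array, Euclidean distances, 4-connected neighbourhoods); the helper names and array shapes are invented for the example and this is not the GUASOM implementation.

```python
import numpy as np

def u_matrix(prototypes):
    """Mean Euclidean distance from each neuron prototype to its 4-connected
    neighbours (the quantity shown by the Umatrix view)."""
    rows, cols, _ = prototypes.shape
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(prototypes[r, c] - prototypes[rr, cc]))
            um[r, c] = np.mean(dists)
    return um

def best_matching_unit(prototypes, spectrum):
    """(row, col) of the neuron whose prototype is closest to the input spectrum."""
    d = np.linalg.norm(prototypes - spectrum, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def hits_map(prototypes, spectra):
    """Number of objects assigned to each neuron (the Hits view)."""
    hits = np.zeros(prototypes.shape[:2], dtype=int)
    for s in spectra:
        hits[best_matching_unit(prototypes, s)] += 1
    return hits

def qualified_majority_label(labels, limit=50.0):
    """Most frequent label among the objects of a neuron, or None if it does not
    reach the qualified-majority limit (in %)."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    best = int(np.argmax(counts))
    share = 100.0 * counts[best] / counts.sum()
    return values[best] if share >= limit else None

def nearest_template(prototype, templates):
    """Closest template (Euclidean distance) to a neuron prototype; the returned
    distance is the kind of quantity rendered by the Novelty view."""
    names = list(templates)
    dists = [np.linalg.norm(prototype - templates[name]) for name in names]
    i = int(np.argmin(dists))
    return names[i], dists[i]

# Toy usage with random data, just to show the expected shapes.
rng = np.random.default_rng(0)
protos = rng.random((10, 10, 60))       # 10x10 map of 60-band prototype "spectra"
objects = rng.random((500, 60))         # 500 preprocessed object spectra
print(u_matrix(protos).shape)           # (10, 10)
print(hits_map(protos, objects).sum())  # 500
```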
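The **Gaia spectra** view relies on GaiaXPy to turn the XP continuous mean-spectrum coefficients served by DataLink into sampled spectra. The minimal sketch below shows how a user could reproduce similar spectra outside the tool; it assumes the coefficients have already been downloaded from the Gaia Archive DataLink service into a local file (the file name is illustrative), and it is not necessarily the exact procedure used to build the plots shown in GUASOM.

```python
from gaiaxpy import calibrate

# 'XP_CONTINUOUS_RAW.csv' is an illustrative name for a file of XP continuous
# mean-spectrum coefficients downloaded via the Gaia Archive DataLink service.
# calibrate() converts the coefficient representation into sampled, calibrated
# BP/RP spectra, returning a DataFrame of fluxes and the wavelength sampling used.
calibrated_spectra, sampling = calibrate('XP_CONTINUOUS_RAW.csv')
print(calibrated_spectra.head())
print(sampling[:5])
```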
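As for the ADQL query that GUASOM generates for a selection of neurons, the exact query is not reproduced here; the sketch below merely illustrates the kind of query that could retrieve the source ids assigned to a set of neurons, launched through astroquery. The table `gaiadr3.oa_neuron_information` is documented in the DR3 data model (see the link above), while the per-source assignment column (`neuron_oa_id`, assumed here to live in `gaiadr3.astrophysical_parameters_supp`) and the neuron ids in the `IN` clause are assumptions made for the sake of the example and should be checked against the Archive data model.

```python
from astroquery.gaia import Gaia

# Illustrative neuron ids (in GUASOM these come from the neurons selected on the map).
selected_neurons = "123, 456"

# NOTE: table and column names are assumptions based on the DR3 data model and may
# need adjusting; the neuron-level statistics themselves can be retrieved from
# gaiadr3.oa_neuron_information with an analogous query.
query = f"""
SELECT source_id, neuron_oa_id
FROM gaiadr3.astrophysical_parameters_supp
WHERE neuron_oa_id IN ({selected_neurons})
"""

job = Gaia.launch_job_async(query)
print(job.get_results())
```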
<div class="image-side-group"> <div class="image-side"> <img src="https://gitlab.citic.udc.es/publico/guasom-markdown/-/raw/main/images/GUASOM_hits_map.png" alt="GUASOM_hits_map"/> <img src="https://gitlab.citic.udc.es/publico/guasom-markdown/-/raw/main/images/GUASOM_templates_combined_map.png" alt="GUASOM_templates_combined_map"/> </div> <p> Figure 1: SOM map lattice that represents the population or hits of each neuron or cell through a greyscale (upper panel), and their quality category values (lower panel) </p> </div> <div class="image-side-group"> <div class="image-side"> <img src="https://gitlab.citic.udc.es/publico/guasom-markdown/-/raw/main/images/GUASOM_templates_specific_map.png" alt="GUASOM_templates_specific_map"/> <img src="https://gitlab.citic.udc.es/publico/guasom-markdown/-/raw/main/images/GUASOM_quality_map.png" alt="GUASOM_quality_map"/> </div> <p> Figure 2: SOM map lattice that represents the basic class labels assigned to each neuron or cell (upper panel), and the specific class labels (lower panel) <p> </div> References: <div> <ul> <li>Gaia Collaboration, L. Delchambre, et al.; Gaia Data Release 3: Apsis III - Non-stellar content and source classification; Astronomy and Astrophysics; 2022</li> <li>M. A. Álvarez, C. Dafonte, M. Manteiga, D. Garabato and R. Santoveña; GUASOM: an adaptive visualization tool for unsupervised clustering in spectrophotometric astronomical surveys; Neural Computing and Applications; Volume 34(5); 2021; DOI: <a href="https://doi.org/10.1007/s00521-021-06510-9">10.1007/s00521-021-06510-9</a></li> <li>C. Dafonte, D. Garabato, M. A. Álvarez and M. Manteiga; Distributed Fast Self-Organized Maps for Massive Spectrophotometric Data Analysis; Sensors; Volume 18(5); 1419; 2018; DOI: <a href="https://doi.org/10.3390/s18051419">10.3390/s18051419</a></li> <li>D. Fustes, M. Manteiga, C. Dafonte, B. Arcay, K. Smith, A. Vallenari, X. Luri; SOM ensemble for unsupervised outlier analysis. Application to outlier identification in the Gaia astronomical survey; Expert System with Applications; Volume 40; 1530-1541; 2013; DOI: <a href="https://doi.org/10.1016/j.eswa.2012.08.069">10.1016/j.eswa.2012.08.069</a></li> <li>D. Fustes, M. Manteiga, C. Dafonte, B. Arcay, K. Smith, R. Borrachero, R. Sordo; An approach to the analysis of SDSS spectroscopic outliers based on self-organizing maps. Designing the outlier analysis software package for the next Gaia survey; Astronomy and Astrophysics; Volume 559; A7; 2013; DOI: <a href="https://doi.org/10.1051/0004-6361/201321445">10.1051/0004-6361/201321445</a></li> <li>D. Ordóñez, C. Dafonte. M. Manteiga, B. Arcay; HSC: A Multi-Resolution Clustering Strategy in Self-Organizing Maps applied to astronomical observations; Applied Soft Computing; Volume 12; 204-215; 2012; DOI: <a href="https://doi.org/10.1016/j.asoc.2011.08.052">10.1016/j.asoc.2011.08.052</a></li> <ul> </div> When using the GUASOM tool, we ask you to recognise the ESA/Gaia/DPAC/OA team by adding an acknowledgement to your work as follows: "This research makes use of public analysis products and the visualization tool GUASOM provided by ESA/Gaia/DPAC/CU8/OA” </div>