Harnessing the Spectroscopic Fingerprints of Stars through Domain Adaptation

Understanding the physical processes that engender galaxies is a fundamental yet unsolved problem in astrophysics. Our own galaxy, The Milky Way, offers a unique opportunity to accomplish these tasks because it is the only galaxy for which we can inspect individual stars in sufficient detail.  The study of the Milky Way (what is known as Galactic Archaeology) is evolving at a breathtaking pace. Recent spectroscopic surveys, such as the US-led APOGEE, the European-led Gaia-ESO, and Australian-led GALAH, collect high-resolution spectra for about a million stars in the Milky Way. Upcoming surveys, such as SDSS-V, DESI, and 4MOST, which ANU will be a part of, will further increase the sample size by another order of magnitude (10 million stellar spectra).

However, how well we can understand the evolution of the Milky Ways depends critically on how well we can extract the properties (ages, chemical compositions) of stars from their spectra. Even though we now routinely collect excellent spectroscopic data from stars, the critical bottleneck for stellar spectroscopy is that the theoretical spectral models often do not fully agree with observations. Such a “synthetic gap” has rendered a “systematics floor” on how well we can understand stars. This project will explore machine learning techniques in domain adaptation to auto-calibrate synthetic models by leveraging the existing observed large datasets. The goal is to better “morph” synthetic models to the observation space, and through the process, understand missing stellar physics in the current theoretical spectral models.



Demonstrate that we could auto-calibrate imperfect theoretical spectral models via existing observations through unsupervised learning. We will develop better theoretical spectral models that will have broad implications for the upcoming million-dollar spectroscopic surveys.


Background Literature





Python programming (Pytorch, Tensorflow) and experience in machine learning are essential.  Familiarity with domain adaptation (GAN, normalizing flows, autoregressive flows) is desirable.