Laboratory offer
Organization name
IJCLab

Internship Offer - Development of software for an astronomical application and preparation of a corpus for AI large language models

Starting date
01-07-2026
Contract type
Internship
Contract length
5-6 months
Education level
M1/M2
Trade
Technician
Topic
Supervision, control, optimization

IJCLab

The Irène Joliot-Curie Laboratory of Physics of the Two Infinities, or IJCLab, is a joint research unit of the CNRS, Paris-Saclay University, and Paris Cité University, located on the campus of the Faculty of Sciences in Orsay. The laboratory is the result of the merger of five laboratories (CSNSM, IMNC, IPNO, LAL, and LPT) that were geographically and thematically close to each other on the Orsay campus. These laboratories share a common history, linked to the creation and development of the Orsay campus.

IJCLab, which brings together around 730 people, covers the activities previously carried out in these five laboratories. IJCLab's identity is centered on the field of “the physics of the two infinities” and their applications, with all the richness of the themes that make up this physics. This is reflected in the presence of strong historical clusters, clusters linked to emerging themes, and activities at the interfaces. This laboratory has the capacity, vocation, and ambition to have a strong global impact on a wide range of scientific and technical fields, driving major flagship projects at the national and international levels. It also encourages and helps to support projects on a more local scale and with faster cycles, which may arise in response to scientific developments and/or technical innovations.

Offer details (position, mission, profile)
Context

Nowadays, astronomers capture a plethora of transient phenomena such as stars devouring their partners, objects captured by supermassive black holes, dying stars, or even collisions of dead stars. Among those, high-energy phenomena can illuminate the sky for several hours to days. These events are rare but of immeasurable value to several scientific questions, such as the origin of the heavy elements found on Earth or the origin of dark energy. This is why data sets must be collected through coordinated observations from space and from the ground. To achieve this, we use SkyPortal (https://skyportal.io/), a web application used in both the USA and Europe.

This internship is part of the interdisciplinary project MAFORAI (Monitoring Astronomical Follow-up Of Rare events with AI) and relies on SkyPortal, a marshal and data science platform for time-domain astronomy. We aim to address the challenge of follow-up coordination in astronomy using an AI-based pipeline powered by a large language model (LLM). The long-term objective is to provide automated assistance to astronomers by using information collected in SkyPortal to suggest observational strategies and to better analyse the images collected during these campaigns.

The internship will address the first stage of the project: creating a first corpus and the data infrastructure needed to build it. This includes the architecture needed to extract, organize, and label information from past observational campaigns found in SkyPortal. These data products are currently dispersed across multiple sources, formats, and modalities, and include heterogeneous data types such as text, images, command logs, and communications (such as instantaneous mission updates or informal reports). A key challenge is dealing with this heterogeneity and ensuring that the extraction is reproducible, so that the resulting corpus can be used to train LLMs.

Depending on the intern’s interests, the corpus may focus more heavily on:

  • text data, such as conversations and decision logs, with a focus on how the data evolve over the timeline of the campaign

  • vision, such as images combined with expert comments and metadata

We require prior experience with Python, GitHub, and an LLM API (e.g. OpenAI).

Overall, the internship provides an opportunity to define the baseline of an AI-assisted follow-up coordination system, with emphasis on corpus creation, data organization, and early-stage system architecture.