DataIA Modular Chair

A research chair brings together researchers, socio-economic partners, and dedicated resources to address a common challenge. Over a period of approximately three years, it aims to advance high-level scientific research while promoting innovation, training, and knowledge transfer. Synergie chairs are also designed to serve as a genuine platform for collaboration between the academic community and public or private stakeholders.

The first wave of Modular Chairs selected by the DataIA Paris-Saclay Institute now includes four projects, led by nine researchers across several laboratories and research teams.

« [CentOrIA] Mathematical Principles of Learning and Communication in Attention‑Based Models »

Claire Boyer, Professor at the Orsay Mathematics Laboratory

Pablo Piantanida, Professor at CentraleSupélec and Director of the ILLS

Etienne Boursier, Researcher in the CÉLESTE team (Inria Saclay)

With the contribution of the Orsay Mathematics Laboratory (Université Paris-Saclay – CNRS), the CÉLESTE team (mathematical statistics and learning) from Inria Saclay Île-de-France, and ILLS (International Laboratory on Learning Systems).

This project aims to develop a rigorous mathematical understanding of artificial intelligence systems based on attention mechanisms, such as large language models (LLMs).

Although these architectures are at the heart of recent advances in AI, their theoretical foundations remain only partially understood. We are applying tools from probability, optimization, and information theory to identify the principles that govern:

How attention mechanisms process and structure information.
How models learn sequential structures through autoregressive prediction and in-context learning.
And how multiple AI agents can communicate effectively and cooperatively.

By bringing these perspectives together, the project aims to lay the groundwork for a general theory of attention-based learning, with the goal of contributing to the design of more interpretable, reliable, and robust AI systems.
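As a point of reference for the attention mechanisms discussed above, here is a minimal sketch of standard scaled dot-product attention in plain Python. It is not taken from the project itself: the function names and the toy token embeddings are assumptions chosen for illustration, and the sketch omits the learned projection matrices that a real model would use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for small lists of vectors.

    Each query produces a weighted average of the value vectors, with
    weights given by a softmax over scaled query-key dot products.
    """
    d = len(keys[0])  # key dimension, used for the 1/sqrt(d) scaling
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input (made-up numbers): three tokens with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` sums to one, so every output vector is a convex combination of the value vectors. This is the sense in which attention "structures" information: each token's output is a mixture of the other tokens, weighted by similarity.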

« DEEP-CH : Deep Learning Exploration of the Effects of Clonal Hematopoiesis on Solid Tumor Progression »

Elsa Bernard, Team Leader at the Computational Clinical Oncology Laboratory at Gustave Roussy

Stergios Christodoulidis, Lecturer in the Mathematics Department at CentraleSupélec and member of MICS

With the contribution of Gustave Roussy and MICS (laboratory of Mathematics and Computer Science for Complexity and Systems) from CentraleSupélec.

DEEP-CH aims to advance diagnostics for clonal hematopoiesis (CH) and to uncover its mechanistic role in tumor progression through artificial intelligence and multimodal data integration.

This project will improve CH detection and the prediction of mutation origin by introducing novel deep-learning methods for the automated analysis of blood smears and cell-free DNA sequencing data.

Moreover, we will incorporate explainable AI on histology and spatial transcriptomics to reveal how CH-mutant immune cells modulate the tumor microenvironment. Organized into four synergistic work packages, DEEP-CH will deliver new diagnostic tools and mechanistic insights into CH-driven tumor–immune interactions, with the ultimate goal of improving precision oncology.

« MULTI-OBJECTIVE OPTIMIZATION : a fresh perspective on the old problem »

Evgenii Chzhen, Researcher at CNRS

Antonio Silveti-Falls, Lecturer at CentraleSupélec

With the contribution of the Orsay Mathematics Laboratory (Université Paris-Saclay – CNRS), CVN (Centre de Vision Numérique), and the OPIS team (OPtimisation Imagerie et Santé) from Inria Saclay Île-de-France.

This project focuses on multi-objective optimization, a framework for understanding the trade-offs between multiple competing objectives that arise in modern learning and decision-making systems.

Rather than reducing these objectives to a single value, the project explores methods for studying the Pareto front, which captures the set of optimal trade-offs among them.

The research develops algorithmic approaches to identify and approximate these trade-offs while leveraging the geometric structure of optimization problems to design efficient methods. Applications include contexts such as algorithmic fairness, where predictive performance must be balanced with fairness across groups.
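To make the notion of a Pareto front concrete, here is a minimal sketch in plain Python that filters a finite set of candidates down to its non-dominated points. It is a brute-force illustration rather than one of the project's methods, and the (error rate, fairness gap) values are hypothetical numbers invented for the example; both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (error rate, fairness gap) pairs for five candidate models.
candidates = [(0.10, 0.30), (0.12, 0.20), (0.15, 0.25), (0.20, 0.05), (0.11, 0.35)]
front = pareto_front(candidates)
```

Here `front` keeps the three candidates for which no other candidate is better on both criteria at once. Choosing among them means accepting a trade-off between accuracy and fairness rather than collapsing the two into a single score.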

A distinctive aspect of the project is its use of o-minimal geometry as a central perspective: by exploiting the simple geometric structure of many learning problems, we aim to better understand the structure of Pareto sets and to guide the design of optimization algorithms.

« THERAPI : THEoRy and Applications of Physics Informed learning models »

Cyril Furtlehner, Researcher at Inria

Pierfrancesco Urbani, Permanent researcher at the CNRS

With the contribution of LISN (Interdisciplinary Laboratory for Digital Sciences - Université Paris-Saclay / Inria Saclay Île-de-France / CNRS / CentraleSupélec) and CEA (French Alternative Energies and Atomic Energy Commission).

The project aims to develop and implement solutions based on artificial intelligence to support research and innovation. In particular, it involves designing algorithms, adapting them to concrete scientific problems, and facilitating their deployment on high-performance computing infrastructure. The goal is to accelerate scientific progress and foster new collaborations.

The application of machine learning (ML) to physics presents a set of challenges due to the inherent properties of physics data. Unlike generic datasets, physics data are constrained by symmetries, conservation laws, and causal relationships, and often involve rare, highly nonlinear events that are critical for understanding complex systems.

Physics-informed machine learning (PIML) addresses these challenges by integrating prior physics knowledge into ML models. Existing PIML frameworks, such as Physics-Informed Neural Networks (PINNs) or neural operators, offer promising avenues for embedding domain expertise into ML architectures.
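The core idea of physics-informed training can be sketched on a toy problem: instead of fitting labeled data, the model is trained to make the residual of a governing equation vanish at a set of collocation points. The sketch below is an illustration under strong simplifying assumptions and is not the partners' method. It swaps the neural network of a PINN for a hypothetical two-parameter quadratic model, and the equation u'(t) = -u(t) with u(0) = 1 is chosen only because its exact solution exp(-t) is known.

```python
import math

# Toy problem (an assumption for illustration): u'(t) = -u(t), u(0) = 1 on [0, 1].
# Hypothetical model: u(t) = 1 + a*t + b*t**2, which satisfies u(0) = 1 by design.

def u(t, a, b):
    return 1.0 + a * t + b * t * t

def residual(t, a, b):
    """Physics residual u'(t) + u(t); it is zero wherever the ODE holds."""
    du_dt = a + 2.0 * b * t
    return du_dt + u(t, a, b)

def physics_loss(points, a, b):
    """Mean squared ODE residual over the collocation points."""
    return sum(residual(t, a, b) ** 2 for t in points) / len(points)

def train(points, steps=2000, lr=0.1):
    """Plain gradient descent on the physics loss; no labeled data is used."""
    a, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Analytic gradients: d(residual)/da = 1 + t, d(residual)/db = 2t + t^2.
        grad_a = sum(2.0 * residual(t, a, b) * (1.0 + t) for t in points) / n
        grad_b = sum(2.0 * residual(t, a, b) * (2.0 * t + t * t) for t in points) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

collocation = [i / 20 for i in range(21)]  # 21 evenly spaced points in [0, 1]
a, b = train(collocation)
final_loss = physics_loss(collocation, a, b)
```

After training, `u(1.0, a, b)` can be compared with `math.exp(-1.0)` to check how closely the fitted model follows the exact solution. In an actual PINN, the quadratic would be replaced by a neural network and the derivative would be obtained by automatic differentiation.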

This project aims to build on two recent breakthroughs achieved by the project partners:

High-precision training of PINNs.
The Canyon landscape model, a theoretical tool particularly effective for analyzing learning dynamics.

Guided by the interplay between theory and practice, we will focus on developing theoretical foundations for PIML models, along with methodologies that are both data-efficient and physically meaningful. On the theoretical side, we will develop simplified models from which new algorithms can be derived and which incorporate the specific features of physics data; on the practical side, we will provide real-world solutions. Pushing ML-based numerical tools to the precision and scale required for scientific discovery should, in particular, help us address important physics questions of interest to the consortium, notably in turbulence and geophysical applications.