[WORKSHOP] "Mathematical Foundations of AI" - 6th edition
Registration is closed!
The "Mathematical Foundations of AI" day is organized jointly by the DataIA Institute and SCAI, in association with the scientific societies: the Jacques Hadamard Mathematical Foundation (FMJH), the Paris Mathematical Sciences Foundation (FSMP), the MALIA group of the French Statistical Society, and the Francophone Machine Learning Society (SSFAM). It aims to provide an overview of some promising research directions at the interface between statistical learning and AI.
It is part of the Maths & AI network in the Ile-de-France region, of which the FMJH and DataIA are members.
This new edition will focus on issues of identifiability, whether for tensor analysis, neural networks, or generative AI. The day will feature three plenary presentations and a spotlight presentation by renowned researchers and specialists in the field:
- François Malgouyres (University of Toulouse), specialist in tensors and tensor identifiability issues;
- Elisabeth Gassiat (Orsay Mathematics Laboratory), professor and leading statistician, who has conducted research on VAE identifiability issues;
- Pavlo Mozharovskyi (Télécom ParisTech), professor and recognized expert on explainability, with research conducted on concept-based learning;
- Konstantin Usevich (CRAN, CNRS), CNRS researcher specializing in tensor decompositions and low-rank approximations.
This day is also an opportunity for young researchers to present their work through short presentations.
Organizing Committee
- Marianne Clausel (Université de Lorraine)
- Emilie Chouzenoux (INRIA Saclay, Institut DataIA)
Scientific Committee
- Ricardo Borsoi (CNRS, CRAN)
- Stéphane Chrétien (Univ. Lyon 2)
- Sylvain Le Corff (Sorbonne Université)
- Myriam Tami (CentraleSupélec)
Geometry-induced regularization and identifiability of deep ReLU networks
Abstract: The first part of the presentation will use a simple, educational example to introduce the mathematical results developed in the second part, in order to make the concept accessible to as many people as possible. Due to implicit regularization that favors "good" networks, neural networks with a large number of parameters do not generally overfit. Related phenomena that are still poorly understood include the properties of flat minima, saddle-to-saddle dynamics, and neuron alignment. To analyze these phenomena, we study the local geometry of deep ReLU neural networks. We show that, for a fixed architecture, when the weights vary, the image of a sample X forms a set whose local dimension changes. The parameter space is thus partitioned into regions where this local dimension remains constant. The local dimension is invariant with respect to the natural symmetries of ReLU networks (i.e., positive scale changes and neuron permutations). We then establish that the geometry of the network induces regularization, with the local dimension constituting a key measure of regularity. Furthermore, we relate the local dimension to a new notion of flatness of minima as well as to saddle-to-saddle dynamics. For networks with one hidden layer, we also show that the local dimension is related to the number of linear regions perceived by X, which sheds light on the effect of regularization. This result is supported by experiments and linked to neuron alignment. I will then present experiments on MNIST that highlight the geometry-induced regularization in this context. Finally, I will connect properties of the local dimension to the local identifiability of the network parameters.
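To make the notion of local dimension concrete, here is a minimal illustrative sketch (not the speaker's code): it estimates the local dimension around a given parameter setting as the numerical rank of the Jacobian of the network outputs on a sample X with respect to the weights, for a small ReLU network. The architecture, the sample, and the PyTorch tooling are illustrative assumptions.

```python
# Illustrative sketch: local dimension of the image of a sample X, estimated as the
# numerical rank of the Jacobian of the network outputs with respect to the weights.
import torch

torch.manual_seed(0)

# A small one-hidden-layer ReLU network (illustrative architecture).
net = torch.nn.Sequential(
    torch.nn.Linear(2, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)

X = torch.randn(20, 2)  # a sample of 20 input points

def outputs_flat(w1, b1, w2, b2):
    # Re-evaluate the network on X with the given parameter tensors.
    h = torch.nn.functional.linear(X, w1, b1).relu()
    return torch.nn.functional.linear(h, w2, b2).flatten()

params = tuple(net.parameters())
jac = torch.autograd.functional.jacobian(outputs_flat, params)

# Stack the per-parameter Jacobians into one matrix of shape (num outputs, num weights).
J = torch.cat([j.reshape(X.shape[0], -1) for j in jac], dim=1)

# The numerical rank of J is a proxy for the local dimension of the image of X
# as the weights vary around the current point in parameter space.
print("local dimension estimate:", torch.linalg.matrix_rank(J).item())
```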
Biography: François Malgouyres is a professor at the University of Toulouse (France). His research focuses on the theoretical and methodological foundations of deep learning, with a particular interest in understanding the mathematical structure of neural networks. He has worked on network geometry, parameter identifiability, function approximation using neural networks, weight quantization in recurrent networks, and the design of orthogonal convolutional layers. He has also taken an interest in the straight-through estimator (the reference algorithm for optimizing quantized weights) and its applications to sparse signal reconstruction. Before joining the University of Toulouse, François Malgouyres was a lecturer at Paris Nord University, a postdoctoral fellow at the University of California, Los Angeles (UCLA), and, before that, a doctoral student at ENS Paris-Saclay (then located in Cachan).
10:00 - 10:30am | Coffee Break
Identifiability of Deep Polynomial Neural Networks
Abstract: Polynomial Neural Networks (PNNs) possess a rich algebraic and geometric structure. However, their identifiability, a key property for ensuring interpretability, remains poorly understood. In this work, we present a comprehensive analysis of the identifiability of deep PNNs, including architectures with and without bias terms. Our results reveal an intricate interplay between activation degrees and layer widths in achieving identifiability. As special cases, we show that architectures with non-increasing layer widths are generically identifiable under mild conditions, while encoder-decoder networks are identifiable when the decoder widths do not grow too rapidly compared to the activation degrees. Our proofs are constructive and center on a connection between deep PNNs and low-rank tensor decompositions, together with Kruskal-type uniqueness theorems. We also settle an open conjecture on the dimension of the neurovarieties of PNNs, and provide new bounds on the activation degrees required for the neurovariety to reach its expected dimension.
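For readers unfamiliar with the model class, the following is a minimal sketch of a bias-free deep polynomial neural network, where each layer applies a linear map followed by an entrywise power (the activation degree). The widths, degrees, and the NumPy setup are illustrative assumptions; identifiability asks when the weights can be recovered, up to natural symmetries, from the polynomial map the network computes.

```python
# Illustrative sketch: a bias-free deep polynomial neural network (PNN).
import numpy as np

rng = np.random.default_rng(0)

widths = [3, 4, 4, 2]   # layer widths (input first); illustrative choice
degrees = [2, 2, 3]     # activation degree of each layer

weights = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(len(degrees))]

def pnn(x):
    h = x
    for W, d in zip(weights, degrees):
        h = (W @ h) ** d   # linear map (no bias) followed by an entrywise power
    return h

x = rng.standard_normal(widths[0])
print(pnn(x))
```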
Biography: Konstantin Usevich is a CNRS researcher (chargé de recherche) at CRAN (Centre de Recherche en Automatique de Nancy) and a member of the SiMul group. His research interests are in linear and multilinear algebra and optimization, with a focus on tensor decompositions, low-rank approximations, and their applications in statistics, signal processing, and machine learning. He received his PhD from St. Petersburg University (Russia) in 2011. Prior to joining the CNRS in 2017, he held postdoctoral positions at the University of Southampton (UK), Vrije Universiteit Brussel (Belgium), and GIPSA-lab (Grenoble, France).
Title (TBA)
Abstract:
Biography:
Rémi VAUCHER (Halias Technology)
Signatures and Quiver Representations: don't be afraid to use Algebra in Causality
Understanding and testing causal relationships is a central challenge in modern artificial intelligence. In this talk, we introduce a mathematical perspective on causality based on two theoretical tools: path signatures and quiver representations. Signatures provide a hierarchical and universal description of temporal data, enabling the detection of differential causality. Quiver representations then offer an algebraic framework in which these relations can be encoded and tested in a structured and interpretable way. This approach bridges algebra, geometry and machine learning, suggesting new avenues for causal inference in dynamic settings. We will present the core mathematical ideas and illustrate their potential through examples. The Quiver Representations part is a joint work with Antoine Caradot.
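As a concrete illustration of the signature side of the talk, the sketch below computes the first two signature levels of a piecewise-linear path directly from its increments (the formula used is exact for piecewise-linear paths). The path is synthetic; optimized implementations exist in libraries such as iisignature.

```python
# Illustrative sketch: levels 1 and 2 of the path signature of a piecewise-linear path.
import numpy as np

path = np.array([[0.0, 0.0], [1.0, 0.5], [1.5, 2.0], [3.0, 2.5]])  # (time steps, channels)
dX = np.diff(path, axis=0)   # increments of the path

# Level 1: total increment of each channel.
sig1 = dX.sum(axis=0)

# Level 2: iterated integrals S[i, j] = integral of dX_i dX_j over ordered times.
d = path.shape[1]
sig2 = np.zeros((d, d))
running = np.zeros(d)        # total increment accumulated before the current segment
for dk in dX:
    sig2 += np.outer(running, dk) + 0.5 * np.outer(dk, dk)
    running += dk

print("level 1:", sig1)
print("level 2:\n", sig2)
```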
Manal BENHAMZA (CentraleSupélec)
Counterfactual Robustness: a framework to analyze the robustness of Causal Generative Models across interventions
Data generation using generative models is one of the fastest-growing fields of artificial intelligence. However, such models are black boxes trained on huge datasets and lack interpretability. Causality is a natural framework for incorporating expert knowledge into deep generative models. Other expected benefits of causal generative models are fairness, transparency, and robustness of the generation process. To the best of our knowledge, while many works have analyzed the robustness of general generative models, surprisingly none has focused on their causal counterparts, even though their robustness is a common claim. In the present paper, we introduce the fundamental concept of counterfactual robustness, which evaluates how sensitive causal generative models are to interventions with respect to distribution shifts. Through a series of experiments on synthetic and real-life datasets, we demonstrate that the studied causal generative models are not all equal with respect to counterfactual robustness. More surprisingly, we show that not all causal interventions are equally robust either. We provide a simple explanation based on the causal mechanisms between the variables, which is theoretically grounded in the case of an extended CausalVAE. Our in-depth analysis also yields an efficient way to identify the most robust intervention based on prior knowledge of the causal graph.
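As background for what an intervention changes, here is a toy structural causal model (not one of the paper's models): a linear chain X1 -> X2 -> X3 in which applying the do-operator to X2 severs its dependence on X1 and shifts the distribution of the downstream variable X3.

```python
# Illustrative sketch: observational vs. interventional sampling in a toy linear SCM.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def sample(do_x2=None):
    x1 = rng.standard_normal(n)
    # Under do(X2 = c), the arrow X1 -> X2 is cut and X2 is set to the constant c.
    x2 = 0.8 * x1 + 0.3 * rng.standard_normal(n) if do_x2 is None else np.full(n, do_x2)
    x3 = 1.5 * x2 + 0.3 * rng.standard_normal(n)
    return np.stack([x1, x2, x3], axis=1)

obs = sample()                  # observational samples
intervened = sample(do_x2=2.0)  # samples under the intervention do(X2 = 2)

# Downstream variables shift under the intervention; upstream ones do not.
print("mean of X3, observational:", obs[:, 2].mean().round(2))
print("mean of X3, do(X2=2):     ", intervened[:, 2].mean().round(2))
```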
Ali AGHABABAEI (Université Grenoble Alpes)
Unified Framework for Pre-trained Neural Network Compression via Decomposition and Optimized Rank Selection
Modern deep neural networks often contain millions of parameters, making them impractical for deployment on resource-constrained devices. In this talk, I present RENE (Rank adapt tENsor dEcomposition), a unified framework that combines tensor decomposition with automatic rank selection to efficiently compress pre-trained neural networks. Unlike traditional approaches that rely on manually chosen or grid-searched ranks, RENE performs continuous rank optimization through a multi-step search strategy, exploring large rank spaces while keeping memory and computation manageable. The method identifies layer-wise optimal ranks without requiring training data and subsequently fine-tunes the decomposed model through a lightweight distillation process. Experiments on benchmark datasets, covering both convolutional and transformer architectures, demonstrate superior compression rates with strong accuracy preservation.
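The basic building block behind decomposition-based compression can be illustrated with a plain low-rank factorization of a single weight matrix; the sketch below uses a truncated SVD with an ad-hoc energy threshold in place of RENE's automatic rank search and distillation, which are not reproduced here.

```python
# Illustrative sketch: compressing one dense layer by a truncated SVD factorization.
import numpy as np

rng = np.random.default_rng(0)

# An approximately low-rank stand-in for a pre-trained layer's weight matrix.
W = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 256))
W += 0.01 * rng.standard_normal(W.shape)

U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Keep the smallest rank preserving 99% of the spectral energy
# (an illustrative rule, not RENE's optimized rank selection).
energy = np.cumsum(s**2) / np.sum(s**2)
rank = int(np.searchsorted(energy, 0.99)) + 1

A = U[:, :rank] * s[:rank]   # (512, rank)
B = Vt[:rank]                # (rank, 256)

print(f"chosen rank: {rank}")
print(f"parameters: {W.size} -> {A.size + B.size}")
print("relative error:", round(np.linalg.norm(W - A @ B) / np.linalg.norm(W), 4))
```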
12:45 - 1:45pm | Lunch Break
Title (TBA)
Abstract:
Biography:
2:45 - 3:30pm | Sweet Break
Sonia MAZELET (Polytechnique)
Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs
Optimal transport between graphs, based on Gromov-Wasserstein and other extensions, is a powerful tool for comparing and aligning graph structures. However, solving the associated non-convex optimization problems is computationally expensive, which limits the scalability of these methods to large graphs. In this work, we present Unbalanced Learning of Optimal Transport (ULOT), a deep learning method that predicts optimal transport plans between two graphs. Our method is trained by minimizing the fused unbalanced Gromov-Wasserstein (FUGW) loss. We propose a novel neural architecture with cross-attention that is conditioned on the FUGW tradeoff hyperparameters. We evaluate ULOT on synthetic stochastic block model (SBM) graphs and on real cortical surface data obtained from fMRI. ULOT predicts transport plans with competitive loss up to two orders of magnitude faster than classical solvers. Furthermore, the predicted plan can be used as a warm start for classical solvers to accelerate their convergence. Finally, the predicted transport plan is fully differentiable with respect to the graph inputs and FUGW hyperparameters, enabling the optimization of functionals of the ULOT plan.
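For reference, the classical-solver baseline that ULOT aims to accelerate can be computed with the POT library; the sketch below solves a plain (balanced, unfused) Gromov-Wasserstein problem between two small synthetic SBM graphs, using adjacency matrices as structure matrices. The fused unbalanced variant (FUGW) used in the talk additionally handles node features and relaxed marginals.

```python
# Illustrative sketch: a classical Gromov-Wasserstein transport plan between two graphs.
import numpy as np
import networkx as nx
import ot  # POT: Python Optimal Transport

g1 = nx.stochastic_block_model([10, 10], [[0.8, 0.05], [0.05, 0.8]], seed=0)
g2 = nx.stochastic_block_model([12, 8], [[0.7, 0.10], [0.10, 0.7]], seed=1)

# Adjacency matrices serve as the intra-graph structure matrices.
C1 = nx.to_numpy_array(g1)
C2 = nx.to_numpy_array(g2)

# Uniform node weights on both graphs.
p = np.ones(C1.shape[0]) / C1.shape[0]
q = np.ones(C2.shape[0]) / C2.shape[0]

plan = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
print("transport plan shape:", plan.shape)
print("mass received by each node of g2:", plan.sum(axis=0).round(3))
```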
Alexandre Chaussard (LPSM, Sorbonne Université)
Identifiability of VAEs
When studying ecosystems, hierarchical trees are often used to organize entities based on proximity criteria, such as the taxonomy in microbiology, social classes in geography, or product types in retail businesses, offering valuable insights into entity relationships. Despite their significance, current count-data models do not leverage this structured information. In particular, the widely used Poisson log-normal (PLN) model, known for its ability to model interactions between entities from count data, cannot incorporate such hierarchical tree structures, limiting its applicability in domains characterized by such complexities. To address this, we introduce the PLN-Tree model as an extension of the PLN model, specifically designed for modeling hierarchical count data. By integrating structured deep variational inference techniques, we propose an adapted training procedure and establish identifiability results in the Poisson log-normal framework, enhancing both theoretical foundations and practical interpretability. Additionally, we present a proof-of-concept application of identifiability, illustrating the practical benefits of using identifiable features for classification tasks.
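The base model being extended can be summarized in a few lines: in the Poisson log-normal model, a latent Gaussian vector drives Poisson counts through an exponential link, so latent correlations induce dependence between the observed counts. The sketch below samples from this base model; the hierarchical (tree) structure of PLN-Tree is not reproduced, and the parameters are illustrative.

```python
# Illustrative sketch: sampling counts from the basic Poisson log-normal (PLN) model.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1_000

mu = np.zeros(d)
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d + 0.1 * np.eye(d)   # a positive-definite latent covariance

Z = rng.multivariate_normal(mu, Sigma, size=n)  # latent Gaussian log-abundances
Y = rng.poisson(np.exp(Z))                      # observed counts

print("empirical mean counts:", Y.mean(axis=0).round(2))
print("correlations between count columns induced by the latent layer:")
print(np.corrcoef(Y, rowvar=False).round(2))
```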
Chuong LUONG (Université de Lorraine)
New Conditions for the Identifiability of Block-Term Tensor Decompositions
Tensor decompositions have become an important tool in machine learning and data analysis, as they can exploit the multidimensional structure of data. In particular, identifiability guarantees provide essential theoretical support to various latent variable modelling and source separation (e.g., unmixing) methods. However, fewer results are available for block-term decompositions, which enjoy increased flexibility compared to the classical canonical polyadic decomposition since each component is a block of multilinear ranks (L_r, M_r, N_r). In this ongoing work, we study the identifiability of general block-term decompositions of three-dimensional tensors from an algebraic-geometric viewpoint. Our current results provide new sufficient conditions for the identifiability of generic tensors based on the tensor dimensions, the shape of each block, and the number of components in the model (i.e., the tensor rank). Compared to previous results available in the literature, our conditions show that identifiability can hold for a larger number of components in certain regimes.
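To fix notation, the sketch below assembles a third-order tensor from a block-term decomposition with blocks of multilinear ranks (L_r, M_r, N_r); the dimensions, ranks, and random factors are illustrative. Identifiability asks when such factors are essentially unique given the tensor.

```python
# Illustrative sketch: building a tensor from a block-term decomposition (BTD).
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 8, 9, 10                 # tensor dimensions
ranks = [(2, 2, 2), (3, 2, 2)]     # multilinear ranks (L_r, M_r, N_r) of each block

T = np.zeros((I, J, K))
for (L, M, N) in ranks:
    A = rng.standard_normal((I, L))
    B = rng.standard_normal((J, M))
    C = rng.standard_normal((K, N))
    G = rng.standard_normal((L, M, N))               # core tensor of the block
    T += np.einsum('il,jm,kn,lmn->ijk', A, B, C, G)  # multilinear product of the block

print("tensor shape:", T.shape)
```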
Mélissa ABIDER (Université Paris-Saclay)
Between identifiability and explainability: a mathematical and empirical exploration of variational models
Deep generative models, such as variational autoencoders (VAEs), learn to represent complex data in a hidden latent space. However, this representation is not always unique: several internal structures can produce the same observed results. This identifiability problem raises fundamental questions about the understanding and interpretation of AI models. In this presentation, I will offer both a theoretical and a visual exploration of this phenomenon. I will briefly review the probabilistic framework of VAEs, before showing, through a small experiment, how the regularization weight (β) and data noise influence the shape and stability of the latent space. These observations illustrate the trade-off between model fidelity and clarity of internal representation. This work aims to link the mathematical aspects of identifiability to the challenges of explainability in AI, and to open the discussion on how these properties could guide the design of more interpretable models.
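The regularization knob mentioned above is the β weight in the VAE objective; the sketch below writes that objective explicitly (a reconstruction term plus β times the closed-form Gaussian KL), with illustrative tensor shapes and an assumed squared-error reconstruction.

```python
# Illustrative sketch: the beta-weighted VAE training objective.
import torch

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    # Reconstruction term: per-sample squared error (a Gaussian likelihood up to constants).
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) of the encoder against the prior.
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(dim=1).mean()
    return recon + beta * kl

# Dummy tensors standing in for a batch, its reconstruction, and the encoder outputs.
x = torch.randn(16, 784)
x_hat, mu, log_var = torch.randn(16, 784), torch.randn(16, 8), torch.randn(16, 8)
print(beta_vae_loss(x, x_hat, mu, log_var).item())
```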