The DATAIA Paris-Saclay Institute

The DATAIA Paris-Saclay Institute is the Artificial Intelligence Institute of Université Paris-Saclay.

Winner of the "Instituts Convergences" call for projects launched by the French National Research Agency (ANR) in 2017, the DATAIA Paris-Saclay Institute brought together the artificial intelligence (AI) expertise of the Paris-Saclay ecosystem to strengthen interdisciplinary collaboration between institutions in data science and AI. In January 2021, the Institute refocused its activities on the Université Paris-Saclay cluster of excellence, becoming the standard-bearer of the University's strategy in artificial intelligence research and training.

As France's leading artificial intelligence ecosystem, the DATAIA Paris-Saclay Institute aims to federate and structure multidisciplinary expertise to develop cutting-edge research in data science in conjunction with other disciplines, such as the humanities and social sciences. It now mobilizes over 1,200 researchers and faculty members (15% of whom are international) from 46 laboratories within the Université Paris-Saclay perimeter.


Université Paris-Saclay

Université Paris-Saclay comprises ten constituent faculties and institutes, four "grandes écoles", the Institut des Hautes Etudes Scientifiques, two associate-member universities, and laboratories shared with major research organizations. It offers students prestigious programs that prepare them for employment and for putting their knowledge to use in a wide range of scientific and economic sectors. Université Paris-Saclay and its members have distinguished themselves in many disciplines, making it the No. 1 university in France in 12 fields, No. 1 in Europe in physics, No. 1 in the world in mathematics, and No. 14 in the world across all disciplines combined.

Main goals

The aim of the DATAIA Paris-Saclay Institute is to bring together multidisciplinary skills and harness the strengths of academic and industrial partners within the Paris-Saclay perimeter to develop disruptive research in AI, data science and their societal impacts.

  1. Develop cutting-edge research in data science: advance the state of the art in data science in a concerted manner, preparing for the emergence of innovative artificial intelligence services (from algorithms to proofs of concept) and for the meeting of the human sciences with the digital revolution. The DATAIA Institute is designed to enable the disciplines involved to address the issue in all its dimensions and impacts;
  2. Promote excellence in training: develop and promote excellence in training by supporting innovative programs at master's and doctoral level, and by taking an active role in scientific leadership, to train the next generation of data scientists;
  3. Boost relations between academia and industry: strengthen dialogue between the academic and industrial communities and consolidate the international visibility and expertise of Saclay's data science community, in particular by hosting major scientific figures;
  4. Bring together multidisciplinary expertise: combine diversified research skills to produce new knowledge through the joint mobilization of different disciplines. Université Paris-Saclay brings together experts of the highest international level in a wide range of fields: mathematics, computer science, physics, life sciences, economics and management, and the humanities and social sciences. This wealth of disciplines is an unrivalled opportunity in France, covering the entire spectrum of data science and artificial intelligence, as well as the challenges facing society.

Scientific priorities

The DATAIA Institute draws on disciplinary foundations such as statistics, strategy, data science and law, and takes up interdisciplinary challenges to spread its expertise among all its scientific partners.

Learning and artificial intelligence

In recent years, deep learning research has made spectacular advances in computer vision and natural language processing. The causes of these advances remain poorly understood: beyond the arrival of massive data, increased computing power and design effort, they raise at least three questions. What learning theory will enable the analysis of deep architectures? How can we manage the compositionality of these architectures and their ability to apprehend more complex objects? How can we open the black box to make learned representations explicit?


  • Innovative machine learning and AI: common sense, adaptability, generalization;
  • Deep learning and adversarial learning;
  • Machine learning and hyper-optimization;
  • Optimization for learning, e.g. improvements in stochastic gradient methods, Bayesian optimization, combinatorial optimization;
  • Learning-modeling link, integration of a priori in learning;
  • Reproducibility and robust learning;
  • Statistical inference and validation;
  • Compositionality of deep architectures.
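One of the priorities above mentions improvements in stochastic gradient methods. As a point of reference, here is a minimal sketch of plain stochastic gradient descent (SGD) on a toy least-squares problem; the data, learning rate and epoch count are purely illustrative, not a DATAIA method:

```python
import random

def sgd_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit a single weight w minimising (w*x - y)^2 by SGD.

    Illustrative sketch only: one randomly chosen sample per step,
    constant learning rate, no stopping criterion.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        x, y = rng.choice(data)            # draw one sample at random
        grad = 2.0 * (w * x - y) * x       # d/dw of (w*x - y)^2
        w -= lr * grad                     # one stochastic update
    return w

# Samples drawn exactly from y = 3x; SGD should recover w close to 3.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
w = sgd_fit(data)
```

The per-sample update is what distinguishes SGD from full-batch gradient descent; the "improvements" the bullet refers to (variance reduction, adaptive step sizes, momentum) all modify this inner loop.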

From data to knowledge, from data to decisions

The growing availability of massive data is pushing back technical frontiers in many fields. On the one hand, the heterogeneous, semi-structured, incomplete or uncertain nature of data calls into question the usual statistical models and algorithms dedicated to decision-making. On the other hand, data management raises new operability constraints, such as security, integrity and traceability. Moreover, producing knowledge requires building models that deliver explainable, statistically valid and computable decisions. Acceptance of results also requires that confidentiality and loyalty be reinforced. At the same time, new developments in optimization should make it possible to improve estimation procedures.


  • Heterogeneous, complex, incomplete, semi-structured and/or uncertain data;
  • Fast big data: structuring data to be able to exploit it;
  • Online learning, methodology for massive data, efficient methods;
  • Improved storage, computation and estimation for data science;
  • Game-theoretic modeling of interactions between agents (human or artificial);
  • Multi-scale and multimodal representation and algorithms;
  • Theoretical analysis of heuristic methods (complexity theory, information geometry, Markov chain theory);
  • Human-machine coevolution in autonomous systems: conversational agents, cars, social robots.

Transparency, responsible AI and ethics

Digital trust is built on the implementation of ethically responsible methodologies: transparency and accountability of algorithmic systems, regulation of the collection, use and processing of personal data, and reinforcement of regulation through appropriate digital procedures. Privacy by design is a form of regulation that builds the protection of personal data into all stages of collection and processing. The tracing of tools applied to data must also be developed so as to make models explainable to experts and users alike, rendering algorithmic systems auditable. Confidentiality principles, although easy to formulate, require modifications to storage and processing infrastructures, with major legislative, sociological and economic impacts. Transparency techniques for algorithmic systems will be developed, focusing on fairness, loyalty and non-discrimination, and on accountability-by-construction.


  • Responsibility-by-design, explainability-by-design;
  • Transparency-by-design, fairness-by-design;
  • Auditing algorithmic systems: non-discrimination, fairness, technical bias, neutrality, equity;
  • Measuring trust and digital appropriation;
  • Progressive user-centric analytics (interactive monitoring of decision-making systems: dataviz dashboards, GUIs);
  • Responsibility for information processing and decision-making: data usage control and fact-checking;
  • Causal discovery, traceability of inferences from source data, interpretability of deep architectures.
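Auditing an algorithmic system for non-discrimination, as listed above, typically begins with simple group metrics. Below is a minimal sketch of one such metric, the demographic-parity gap; the metric choice and the sample decisions are assumptions made for illustration, not a DATAIA auditing standard:

```python
def demographic_parity_gap(decisions, groups):
    """Absolute difference in positive-decision rates between two groups.

    decisions: list of 0/1 outcomes; groups: parallel list of group labels.
    A gap near 0 indicates parity on this one metric only; real audits
    combine several criteria (equalized odds, calibration, ...).
    """
    rates = {}
    for g in set(groups):
        outcomes = [d for d, gg in zip(decisions, groups) if gg == g]
        rates[g] = sum(outcomes) / len(outcomes)
    a, b = sorted(rates)
    return abs(rates[a] - rates[b])

# Illustrative data: group "A" receives a positive decision 3 times in 4,
# group "B" only 1 time in 4, so the parity gap is 0.75 - 0.25 = 0.5.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_gap(decisions, groups)
```

A single scalar like this is a starting point for the audit, not a verdict: the bullet list above deliberately pairs it with technical bias, neutrality and equity.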

Protection, regulation and the data economy

Companies involved in the data economy continually need to rethink how they are structured: they must adopt a project-oriented organization with rapid changes in resource allocation. The data economy also raises issues of concentration and monopoly. A small number of companies (the GAFAM) hold most of the data. This market concentration can lead to unfair competition, and innovation in small and medium-sized businesses is likely to suffer. Citizens expect governments to intervene in the digital economy to prevent excessive concentration and monopolies. Governments must also prevent information leakage in order to preserve state sovereignty and ensure compliance with regulations.


  • Privacy-by-design, GDPR;
  • Privacy-friendly learning (differential privacy);
  • Development of ethically responsible methodologies and technologies to regulate the collection, use and processing of personal data, and the exploitation of knowledge derived from such data;
  • IT security for data processing chains;
  • Security/crypto: blockchain and trusted third parties.
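Privacy-friendly learning via differential privacy, listed above, is commonly introduced through the classic Laplace mechanism for numeric queries. A minimal sketch with illustrative parameters follows; note that Python's `random` module is not a cryptographically secure source, which a real deployment would require:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, seed=None):
    """Release true_value plus Laplace(sensitivity / epsilon) noise.

    This is the classic mechanism achieving epsilon-differential privacy
    for a numeric query. Parameters and usage here are illustrative.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    r = 0.0
    while r == 0.0:                 # avoid log(0) in the edge case r == 0
        r = rng.random()
    u = r - 0.5
    # Inverse-CDF sampling of the zero-mean Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A counting query has sensitivity 1: adding or removing one individual
# changes the count by at most 1. Smaller epsilon = stronger privacy,
# more noise.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
```

The released value is unbiased (the noise has mean zero), so averaging many independent releases recovers the true count, which is exactly why the privacy budget epsilon must be accounted for across repeated queries.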