« Le Séminaire Palaisien » | Arnak Dalalyan and Frédéric Pascal on machine learning and statistics

Event location: Inria Saclay, Amphi Sophie Germain


On the first Tuesday of each month, Le Séminaire Palaisien brings together Saclay's large research community in statistics and machine learning.

Each session consists of two 40-minute scientific presentations: a 30-minute talk followed by 10 minutes of questions.

Arnak Dalalyan (Inria) and Frédéric Pascal (CentraleSupélec) will lead the session of February 2022.

Arnak Dalalyan - Robust Estimation of the Gaussian Mean by iterative reweighting

The goal of this talk is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable, in the sense that it can be computed in time at most polynomial in the dimension, the sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant under scaling, translations and orthogonal transformations. Third, it has a nearly-minimax-rate breakdown point approximately equal to 0.28. Fourth, it is minimax-rate-optimal when the data consist of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically optimal when the contamination rate tends to zero. The estimator is obtained by an iterative reweighting approach: each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix.
(Joint work with Arshak Minasyan)
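
As a rough illustration of what iterative reweighting means in practice, the following Python sketch alternates between computing a weighted mean and updating the weights. It is not the estimator presented in the talk: the convex-optimization weight update is replaced here by simple Huber-type weights, and the function name `reweighted_mean` and the `threshold` parameter are hypothetical choices made for this example.

```python
import numpy as np

def reweighted_mean(X, threshold, n_iter=50, tol=1e-8):
    """Huber-type iteratively reweighted mean (illustration only,
    not the convex-program weight update described in the talk)."""
    mu = np.median(X, axis=0)          # coordinate-wise median as a robust start
    w = np.ones(len(X))
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        # Points within `threshold` of the current mean keep weight 1;
        # farther points are downweighted in proportion to their distance.
        w = np.minimum(1.0, threshold / np.maximum(dist, 1e-12))
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        converged = np.linalg.norm(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, w

# Hypothetical data: 450 standard Gaussian points plus 50 shifted outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((450, 5)),
               rng.standard_normal((50, 5)) + 20.0])
mu_hat, weights = reweighted_mean(X, threshold=3.0)
print("sample mean error:    ", np.linalg.norm(X.mean(axis=0)))
print("reweighted mean error:", np.linalg.norm(mu_hat))
```

In this toy setting the outliers end up with small weights, so the reweighted mean stays much closer to the true mean (the origin) than the plain sample mean does. The threshold value is an arbitrary choice for this example.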

Frédéric Pascal - A new robust clustering algorithm - Application to image segmentation

Though very popular, the EM algorithm is well known to suffer from non-Gaussian distribution shapes, outliers and high dimensionality. In this talk, we introduce a new robust clustering algorithm that can efficiently deal with noise and outliers in diverse data sets. As an EM-like algorithm, it is based on estimates of both the cluster centers and the covariances. In addition, using a semi-parametric paradigm, the method estimates an unknown scale parameter per data point. This allows the algorithm to accommodate heavier-tailed distributions and outliers without significantly losing efficiency in various classical scenarios. We then show that the proposed algorithm outperforms other classical unsupervised methods from the literature, such as k-means, the EM algorithm and its recent modifications, or spectral clustering, when applied to real data sets such as MNIST, NORB and 20newsgroups. An application to radar (SAR) image segmentation is also presented, highlighting the interest of the proposed approach.
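
To give a feel for the per-data-point scale idea, here is a minimal Python sketch of one update step for a single cluster. It assumes, purely for illustration, that the scale of a point is its squared Mahalanobis distance to the cluster divided by the dimension, and that each point is weighted by its responsibility divided by its scale. The abstract does not specify these formulas, and the names `per_point_scales` and `scaled_update` are hypothetical.

```python
import numpy as np

def per_point_scales(X, center, cov, eps=1e-12):
    """Assumed scale tau_i: squared Mahalanobis distance of x_i to the
    cluster, divided by the dimension (floored at eps for stability)."""
    diff = X - center
    solved = np.linalg.solve(cov, diff.T).T      # rows: cov^{-1} (x_i - center)
    d2 = np.einsum("ij,ij->i", diff, solved)     # squared Mahalanobis distances
    return np.maximum(d2 / X.shape[1], eps)

def scaled_update(X, resp, center, cov):
    """One M-step-like update for one cluster, weighting each point by
    resp_i / tau_i so that points with a large scale count less."""
    tau = per_point_scales(X, center, cov)
    w = resp / tau
    new_center = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - new_center
    new_cov = (w[:, None] * diff).T @ diff / w.sum()   # shape estimate, up to scale
    return new_center, new_cov, tau

# Hypothetical data: 200 Gaussian points and one far-away outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 2)), [[50.0, 50.0]]])
center, cov, tau = scaled_update(X, np.ones(len(X)), np.zeros(2), np.eye(2))
print("largest inlier scale:", tau[:-1].max())
print("outlier scale:       ", tau[-1])
```

The outlier receives a very large scale and therefore a very small weight, so it barely moves the updated center. A full clustering method would repeat such a step for every cluster, together with an update of the responsibilities.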