Maximum Mean Discrepancy, Hilbert-Schmidt Independence Criterion
August 21, 2016 — March 1, 2024
An integral probability metric at the intersection of reproducing-kernel methods, dependence tests, and probability metrics: we use a kernel embedding to cleverly measure differences between probability distributions, typically an RKHS embedding, although any old Hilbert space will do.
Can be estimated from samples only, which is neat.
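Concretely (a standard formulation, per the Gretton et al. references below): writing $\mu_P := \mathbb{E}_{x\sim P}\, k(x,\cdot)$ for the kernel mean embedding of $P$, the squared MMD is

$$\operatorname{MMD}^2(P, Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}^2 = \mathbb{E}\, k(x, x') - 2\, \mathbb{E}\, k(x, y) + \mathbb{E}\, k(y, y'),$$

with $x, x' \sim P$ and $y, y' \sim Q$ all independent. Everything on the right is an expectation of kernel evaluations over pairs of samples, which is why a plug-in estimator from samples is immediate.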
A mere placeholder. For a thorough treatment, see the canonical references (Gretton et al. 2008; Gretton, Borgwardt, et al. 2012).
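Still, for concreteness, here is a minimal NumPy sketch of the unbiased U-statistic estimator of $\operatorname{MMD}^2$ from Gretton, Borgwardt, et al. (2012); the Gaussian kernel and all the names are my choices, not any particular library's API:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased U-statistic estimate of MMD^2 between X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Drop the diagonals of Kxx and Kyy so the estimator is unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))  # samples from P
Y = rng.normal(0.5, 1.0, size=(500, 2))  # samples from a shifted Q
print(mmd2_unbiased(X, Y))                                    # clearly positive
print(mmd2_unbiased(X, rng.normal(0.0, 1.0, size=(500, 2))))  # near zero
```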
1 Tutorial
A presentation by Arthur Gretton, Dougal Sutherland, and Wittawat Jitkrittum: Interpretable Comparison of Distributions and Models.
Danica Sutherland’s explanation is clear.
Pierre Alquier’s post Universal estimation with Maximum Mean Discrepancy (MMD) shows how to use MMD in a robust nonparametric estimator.
Gaël Varoquaux’s introduction, Comparing distributions: Kernels estimate good representations, l1 distances give good tests, based on (Scetbon and Varoquaux 2019), is friendly and illustrated.
2 Hilbert-Schmidt Independence Criterion
The HSIC is, AFAICT, the MMD applied to dependence testing: take the MMD between the joint distribution of two variables and the product of their marginals; with a characteristic kernel it vanishes exactly when the variables are independent.
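For intuition, a minimal sketch of the simple biased estimator from Gretton et al. (2008), $\widehat{\mathrm{HSIC}} = \operatorname{tr}(KHLH)/(n-1)^2$, where $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centring matrix; the names are mine:

```python
import numpy as np

def gaussian_gram(Z, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on the rows of Z."""
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def hsic_biased(X, Y, bandwidth=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(X)
    K = gaussian_gram(X, bandwidth)
    L = gaussian_gram(Y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
print(hsic_biased(X, X ** 2))                     # dependent: well above 0
print(hsic_biased(X, rng.normal(size=(300, 1))))  # independent: near 0
```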
3 Connection to optimal transport losses
Husain (2020) connects IPMs to transport metrics, regularisation theory, and classification.
Feydy et al. (2019) connect MMD to optimal transport losses.
Arbel et al. (2019) also looks pertinent, with connections to Wasserstein gradient flow, which is a thing.
4 Connection to kernelized Stein discrepancy
TBD. See Stein VGD.
5 Choice of kernel
Hmm. See Gretton, Sriperumbudur, et al. (2012), who choose the kernel to maximise the power of the resulting two-sample test.
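A common, if crude, default is a Gaussian kernel with bandwidth set by the median heuristic, i.e. the median pairwise distance among the pooled samples. A sketch, with my own naming:

```python
import numpy as np

def median_heuristic_bandwidth(X, Y):
    """Median pairwise distance among the pooled samples."""
    Z = np.vstack([X, Y])
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    # Upper triangle only, excluding the zero diagonal.
    return np.sqrt(np.median(sq_dists[np.triu_indices_from(sq_dists, k=1)]))
```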
6 Tooling
MMD is included in the ITE toolbox (estimators).
6.1 GeomLoss
The GeomLoss library provides efficient GPU implementations for:
- Kernel norms (also known as Maximum Mean Discrepancies).
- Hausdorff divergences, which are positive definite generalizations of the Chamfer-ICP loss and are analogous to log-likelihoods of Gaussian Mixture Models.
- Debiased Sinkhorn divergences, which are affordable yet positive and definite approximations of Optimal Transport (Wasserstein) distances.
It is hosted on GitHub and distributed under the permissive MIT license.
GeomLoss functions are available through the custom PyTorch layers SamplesLoss, ImagesLoss and VolumesLoss, which allow you to work with weighted point clouds (of any dimension), density maps and volumetric segmentation masks.
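A minimal usage sketch (assuming `pip install geomloss` and a working PyTorch): with `loss="gaussian"`, `SamplesLoss` computes a Gaussian-kernel-norm (squared-MMD-style) loss between point clouds, with `blur` playing the role of the kernel bandwidth.

```python
import torch
from geomloss import SamplesLoss

# Kernel-norm (MMD-style) loss between two point clouds.
loss_fn = SamplesLoss(loss="gaussian", blur=0.5)

x = torch.randn(500, 2, requires_grad=True)  # model samples
y = torch.randn(500, 2) + 0.5                # target samples

loss = loss_fn(x, y)  # scalar tensor, differentiable w.r.t. x
loss.backward()       # gradients flow back to the sample positions
```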