Score diffusion
August 17, 2020 — November 14, 2024
Langevin dynamics plus score functions to sample from tricky distributions, especially high-dimensional ones, via clever use of stochastic differential equations. Made famous by neural diffusions.
Short version: Classic Langevin MCMC samples from a target distribution by creating an indefinitely long pseudo-time axis, \(\tau\in[0,\infty)\), and a stationary Markov transition kernel that is reversible with respect to the target distribution. As this virtual time grows large, the process samples from the desired target distribution, with the invocation of some ergodic theory.
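To make that concrete, here is a minimal sketch of the unadjusted Langevin algorithm, assuming a toy target whose score we can write down analytically (a 1-d two-component Gaussian mixture); the step size, horizon, and ensemble size are arbitrary choices of mine.

```python
import numpy as np

def score(x):
    # grad log p for the target p = 0.5 N(-2, 1) + 0.5 N(+2, 1)
    w = np.exp(-0.5 * (x + 2.0) ** 2)  # unnormalised component densities;
    v = np.exp(-0.5 * (x - 2.0) ** 2)  # shared constants cancel in the ratio
    return (-(x + 2.0) * w - (x - 2.0) * v) / (w + v)

rng = np.random.default_rng(0)
eps = 0.01                       # Langevin step size in pseudo-time
x = rng.standard_normal(10_000)  # arbitrary initial ensemble
for _ in range(5_000):           # long pseudo-time horizon
    x = x + eps * score(x) + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
# x is now (approximately) an ensemble of samples from the bimodal target
print(x.mean(), x.std())         # roughly 0 and sqrt(5) ~ 2.24
```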
Score diffusions, by contrast, set up a slightly different equation, on a finite time interval \(\tau\in[0,T]\). The transition kernel is not stationary, but configured so that, at pseudo-time \(\tau=T\), the marginal is some “easy” distribution and, at pseudo-time \(\tau=0\), it is the challenging target. Sampling is a more involved process, involving walking backwards and forwards through pseudo-time.
In both cases, we inject noise in a clever way, perturb the process with the score function, and use the same underlying Langevin dynamics.
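Here is the finite-time counterpart, sketched under assumptions of my choosing: an Ornstein–Uhlenbeck forward noising \(\mathrm{d}X = -\tfrac{1}{2}\beta X\,\mathrm{d}\tau + \sqrt{\beta}\,\mathrm{d}W\) applied to the same bimodal target as above. Because the mixture components have unit variance, the time-\(\tau\) marginal stays a unit-variance mixture whose means shrink as \(e^{-\beta\tau/2}\), so the score is available in closed form and we can run the reverse-time SDE (on which more in the next section) by Euler–Maruyama from \(\tau=T\) down to \(\tau=0\).

```python
import numpy as np

beta, T, n_steps = 1.0, 8.0, 800
dt = T / n_steps
rng = np.random.default_rng(1)

def score_t(x, tau):
    # grad log p_tau: a mixture 0.5 N(-m, 1) + 0.5 N(+m, 1), m = 2 e^{-beta tau / 2}
    m = 2.0 * np.exp(-0.5 * beta * tau)
    w = np.exp(-0.5 * (x + m) ** 2)
    v = np.exp(-0.5 * (x - m) ** 2)
    return (-(x + m) * w - (x - m) * v) / (w + v)

# At tau = T the forward process has all but forgotten the data,
# so we initialise from the "easy" reference distribution N(0, 1).
x = rng.standard_normal(10_000)
for k in range(n_steps):              # walk pseudo-time backwards, T -> 0
    tau = T - k * dt
    # reverse-time drift: forward drift minus g^2 times the score
    drift = -0.5 * beta * x - beta * score_t(x, tau)
    x = x - dt * drift + np.sqrt(beta * dt) * rng.standard_normal(x.shape)
# x should now resemble the original bimodal data distribution
print(x.mean(), x.std())              # roughly 0 and sqrt(5) ~ 2.24
```

Note that the only ingredients are the forward drift, the diffusion coefficient, and the (here, analytic) score; in a neural diffusion the score would be replaced by a learned approximation.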
1 Introductions
The idea is explained in small words in Sohl-Dickstein et al. (2015). See also popularised articles for ML people,
- Yang Song, Generative Modeling by Estimating Gradients of the Data Distribution.
- Lilian Weng, What are Diffusion Models?
All of those introductions cover more ground than we need in this notebook. Here we are given the score function, unlike in the above references, where it must be learned. Neural diffusions traditionally do that part by score matching, which is out of scope for this notebook.
2 Reversing a diffusion
Reversing the diffusion is part of the sampling process (Anderson 1982; Bao et al. 2016). Is this meant in the same sense as backward SDEs?
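For reference, I understand the result as follows. If the forward noising process solves

\[
\mathrm{d}X_\tau = f(X_\tau, \tau)\,\mathrm{d}\tau + g(\tau)\,\mathrm{d}W_\tau,
\]

then (Anderson 1982) the time-reversed process solves

\[
\mathrm{d}X_\tau = \bigl[f(X_\tau, \tau) - g(\tau)^2\,\nabla_x \log p_\tau(X_\tau)\bigr]\,\mathrm{d}\tau + g(\tau)\,\mathrm{d}\bar{W}_\tau,
\]

where \(p_\tau\) is the marginal density at pseudo-time \(\tau\) and \(\bar{W}\) is a Wiener process with pseudo-time flowing backwards. The score \(\nabla_x \log p_\tau\) is the only extra ingredient needed to run the process in reverse.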
3 Conditioning
TBD