Langevin dynamics MCMC

August 17, 2020 — September 5, 2024

Bayes
estimator distribution
functional analysis
Markov processes
Monte Carlo
neural nets
optimization
probabilistic algorithms
probability
SDEs
stochastic processes
Figure 1: Randomly exploring the posterior space.

Sampling by simulating a particular SDE, the Langevin equation. This can end up rather close to optimisation, via SDE representations of SGD.

1 Langevin dynamics
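To fix notation (standard background, not specific to any one reference below): to draw samples from a target density \(\pi(x)\propto e^{-f(x)}\), we simulate the overdamped Langevin SDE

\[ \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t, \]

which has \(\pi\) as its stationary distribution under mild conditions on \(f\). The discrete-time update further down is the Euler–Maruyama discretisation of this SDE.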

2 Metropolis-adjusted Langevin algorithm (MALA)
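A minimal sketch of one MALA step, assuming we can evaluate a log-density `log_p` and its gradient `grad_log_p` (placeholder names of mine, not from any particular library): the Langevin move is used as a proposal, and the discretisation error is corrected by a Metropolis–Hastings test.

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, eta, rng):
    """One Metropolis-adjusted Langevin step."""
    # Langevin proposal: gradient step on log_p plus Gaussian noise.
    noise = rng.standard_normal(x.shape)
    x_prop = x + eta * grad_log_p(x) + np.sqrt(2 * eta) * noise

    # log q(x_to | x_from) for the Gaussian proposal kernel with covariance 2*eta*I
    # (additive constants cancel in the acceptance ratio).
    def log_q(x_to, x_from):
        mean = x_from + eta * grad_log_p(x_from)
        return -np.sum((x_to - mean) ** 2) / (4 * eta)

    # Metropolis-Hastings acceptance ratio, in log space.
    log_alpha = (log_p(x_prop) - log_p(x)
                 + log_q(x, x_prop) - log_q(x_prop, x))
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True
    return x, False
```

With a standard Gaussian target, for example, `log_p = lambda x: -0.5 * x @ x` and `grad_log_p = lambda x: -x`.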

3 Continuous time

See log-concave distributions for a family of distributions where this works especially well, because implicit (more nearly continuous-time-exact) solutions are available (Hodgkinson, Salomone, and Roosta 2019).

Left-field: Max Raginsky, Sampling Using Diffusion Processes, from Langevin to Schrödinger:

the Langevin process gives only approximate samples from \(\mu\). I would like to discuss an alternative approach that uses diffusion processes to obtain exact samples in finite time. This approach is based on ideas that appeared in two papers from the 1930s by Erwin Schrödinger in the context of physics, and is now referred to as the Schrödinger bridge problem.

4 Annealed

TBC Jolicoeur-Martineau et al. (2022); Song and Ermon (2020a); Song and Ermon (2020b).

Yang Song, Generative Modeling by Estimating Gradients of the Data Distribution
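A rough sketch of the annealed Langevin sampling loop, paraphrasing the procedure in Song and Ermon (2020a); the score estimate `score(x, sigma)` (e.g. a trained score network) is a placeholder of mine, not their code.

```python
import numpy as np

def annealed_langevin(score, x0, sigmas, eps=1e-5, n_steps=100, rng=None):
    """Annealed Langevin dynamics: run Langevin updates at a sequence of
    decreasing noise levels, warm-starting each level from the previous one."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    for sigma in sigmas:                          # sigmas sorted large -> small
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step-size rule from Song & Ermon
        for _ in range(n_steps):
            noise = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * noise
    # at the smallest sigma, x approximates a sample from the data distribution
    return x
```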

5 Incoming

Holden Lee and Andrej Risteski introduce the connection between log-concavity and convex optimisation:

\[ x_{t+\eta} = x_t - \eta \nabla f(x_t) + \sqrt{2\eta}\xi_t,\quad \xi_t\sim N(0,I). \]

I explored some existing implementations. I do not necessarily recommend them; for a simple unadjusted Langevin sampler, the algorithm is easy enough to implement from scratch. Indeed, in my experience a ChatGPT-generated version comes out right the first time.
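For concreteness, a minimal from-scratch sketch of that unadjusted Langevin update, with the target specified through the gradient of its negative log-density \(f\):

```python
import numpy as np

def ula_samples(grad_f, x0, eta, n_samples, rng=None):
    """Unadjusted Langevin algorithm: iterate
    x <- x - eta * grad f(x) + sqrt(2 * eta) * N(0, I)."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples,) + x.shape)
    for t in range(n_samples):
        x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
        samples[t] = x
    return samples

# Example: sampling a standard Gaussian, where f(x) = ||x||^2 / 2 so grad f(x) = x.
draws = ula_samples(grad_f=lambda x: x, x0=np.zeros(2), eta=1e-2, n_samples=5000)
```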

6 References

Ahn, Korattikara, and Welling. 2012. “Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring.” In Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12.
Alexos, Boyd, and Mandt. 2022. “Structured Stochastic Gradient MCMC.” In Proceedings of the 39th International Conference on Machine Learning.
Bakker, Post, Langevin, et al. 2016. “Scripting MODFLOW Model Development Using Python and FloPy.” Groundwater.
Beskos, Roberts, and Stuart. 2009. “Optimal Scalings for Local Metropolis–Hastings Chains on Nonproduct Targets in High Dimensions.” The Annals of Applied Probability.
Brosse, Moulines, and Durmus. 2018. “The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Chen, Fox, and Guestrin. 2014. “Stochastic Gradient Hamiltonian Monte Carlo.” In Proceedings of the 31st International Conference on Machine Learning.
Dalalyan. 2017. “Further and Stronger Analogy Between Sampling and Optimization: Langevin Monte Carlo and Gradient Descent.” arXiv:1704.04752 [Math, Stat].
Ding, Fang, Babbush, et al. 2014. “Bayesian Sampling Using Stochastic Gradient Thermostats.” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS’14.
Dockhorn, Vahdat, and Kreis. 2022. “Score-Based Generative Modeling with Critically-Damped Langevin Diffusion.”
Domke. 2017. “A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI.” In PMLR.
Durmus, and Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat].
Filippone, and Engler. 2015. “Enabling Scalable Stochastic Gradient-Based Inference for Gaussian Processes by Employing the Unbiased LInear System SolvEr (ULISSE).” In Proceedings of the 32nd International Conference on Machine Learning.
Garbuno-Inigo, Hoffmann, Li, et al. 2020. “Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.” SIAM Journal on Applied Dynamical Systems.
Ge, Lee, and Risteski. 2020. “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” arXiv:1812.00793 [Cs, Math, Stat].
Girolami, and Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Grenander, and Miller. 1994. “Representations of Knowledge in Complex Systems.” Journal of the Royal Statistical Society: Series B (Methodological).
Gürbüzbalaban, Gao, Hu, et al. 2021. “Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo.” Journal of Machine Learning Research.
Hairer, Stuart, Voss, et al. 2005. “Analysis of SPDEs Arising in Path Sampling. Part I: The Gaussian Case.” Communications in Mathematical Sciences.
Hairer, Stuart, and Voss. 2007. “Analysis of SPDEs Arising in Path Sampling Part II: The Nonlinear Case.” The Annals of Applied Probability.
Hodgkinson, Salomone, and Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” arXiv:1903.12322 [Cs, Stat].
Ho, Jain, and Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” arXiv:2006.11239 [Cs, Stat].
Holzschuh, Vegetti, and Thuerey. 2022. “Score Matching via Differentiable Physics.”
Jolicoeur-Martineau, Piché-Taillefer, Mitliagkas, et al. 2022. “Adversarial Score Matching and Improved Sampling for Image Generation.” In.
Lim, Kovachki, Baptista, et al. 2023. “Score-Based Diffusion Models in Function Space.”
Liu, Zhuo, Cheng, et al. 2019. “Understanding and Accelerating Particle-Based Variational Inference.” In Proceedings of the 36th International Conference on Machine Learning.
Mandt, Hoffman, and Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.
Ma, Wang, Kim, et al. 2021. “Transfer Learning of Memory Kernels for Transferable Coarse-Graining of Polymer Dynamics.” Soft Matter.
Murray, and Ghahramani. 2004. “Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms.” In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. UAI ’04.
Norton, and Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat].
Parisi. 1981. “Correlation Functions and Computer Simulations.” Nuclear Physics B.
Pavliotis. 2014. Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations. Texts in Applied Mathematics.
Rásonyi, and Tikosi. 2022. “On the Stability of the Stochastic Gradient Langevin Algorithm with Dependent Data Stream.” Statistics & Probability Letters.
Roberts, and Rosenthal. 1998. “Optimal Scaling of Discrete Approximations to Langevin Diffusions.” Journal of the Royal Statistical Society. Series B (Statistical Methodology).
Roberts, and Tweedie. 1996. “Exponential Convergence of Langevin Distributions and Their Discrete Approximations.” Bernoulli.
Seifert. 2012. “Stochastic Thermodynamics, Fluctuation Theorems and Molecular Machines.” Reports on Progress in Physics.
Shang, Zhu, Leimkuhler, et al. 2015. “Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling.” In Advances in Neural Information Processing Systems. NIPS’15.
Song, and Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In Advances In Neural Information Processing Systems.
———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In Advances In Neural Information Processing Systems.
Song, Sohl-Dickstein, Kingma, et al. 2022. “Score-Based Generative Modeling Through Stochastic Differential Equations.” In.
Welling, and Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on International Conference on Machine Learning. ICML’11.
Xifara, Sherlock, Livingstone, et al. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters.
Zhang, Ruqi, Liu, and Liu. 2022. “A Langevin-Like Sampler for Discrete Distributions.” In Proceedings of the 39th International Conference on Machine Learning.
Zhang, Jianyi, Zhang, Carin, et al. 2020. “Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory.” In International Conference on Artificial Intelligence and Statistics.