Generalized Bayesian inference

Approximating the Gibbs posterior

September 26, 2024 — February 12, 2025

Bayes
estimator distribution
functional analysis
Markov processes
Monte Carlo
neural nets
optimization
probabilistic algorithms
probability
SDEs
stochastic processes

1 ‘Generalized’

I dislike naming things “Generalized”, for all the obvious reasons. Imagine if biologists had renamed Eukaryotes “Generalized Prokaryotes”. You could not get away with that anywhere else, but in machine learning it is somehow normal.

My ongoing battle to have a moratorium on naming anything “generalized” continues with no success.

1.1 Gibbs Posterior and Fancy Methods

The main idea seems to be to take the Gibbs posterior as the target of inference and then do something clever to approximate it, e.g. by variational inference or by the simulation-based methods below.

2 Generalized Bayesian Computation

I just saw a presentation on Dellaporta et al. (2022), which stakes a claim to the term “Generalized Bayesian Computation”. Dellaporta mixes the bootstrap, Bayesian nonparametrics, MMD, and simulation-based inference in an M-open setting. I’m not sure which of the results are specific to that (impressive) paper, but it name-checks Fong, Lyddon, and Holmes (2019), Lyddon, Walker, and Holmes (2018), Matsubara et al. (2022), Pacchiardi and Dutta (2022), and Schmon, Cannon, and Knoblauch (2021).

There’s some interesting stuff happening in that group. Maybe this introductory post will be a good start: Generalising Bayesian Inference.

3 Generalized Variational Inference

If we add a variational approximation, we can approximate the Gibbs posterior.

Knoblauch, Jewson, and Damoulas (2022) call this Generalized Variational Inference.1

The argument is that we can interpret the solution to the robust Bayesian inference problem variationally. Recall the empirical risk, written as a mean over the data:

\[ R_n(\theta) = \frac{1}{n}\sum_{i=1}^n \ell(\theta, x_i) \]

which defines the Gibbs posterior measure as

\[ \pi_n(\theta) \propto \exp\{-\omega\, n\, R_n(\theta)\}\,\pi(\theta). \]
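As a quick illustration, here is a minimal sketch in JAX that evaluates this density on a one-dimensional parameter grid; the squared-error loss, standard-normal prior, and \(\omega = 1\) are illustrative choices of mine, not taken from any of the papers cited here.

```python
import jax
import jax.numpy as jnp

# Toy setup: scalar location parameter, squared-error loss, N(0, 1) prior.
# All of these choices (loss, prior, omega) are illustrative only.
key = jax.random.PRNGKey(0)
x = 1.5 + jax.random.normal(key, (50,))          # observed data
theta = jnp.linspace(-2.0, 5.0, 1001)            # parameter grid
dtheta = theta[1] - theta[0]

def empirical_risk(theta):
    # R_n(theta) = (1/n) * sum_i loss(theta, x_i), here squared error.
    return jnp.mean((x[None, :] - theta[:, None]) ** 2, axis=1)

omega = 1.0                                      # learning rate / temperature
log_prior = -0.5 * theta ** 2                    # N(0, 1) prior, up to a constant

# Gibbs posterior: pi_n(theta) ∝ exp{-omega * n * R_n(theta)} * pi(theta)
log_post = -omega * x.shape[0] * empirical_risk(theta) + log_prior
post = jnp.exp(log_post - log_post.max())        # stabilise before exponentiating
post = post / (post.sum() * dtheta)              # normalise on the grid

print("Gibbs posterior mean:", (theta * post).sum() * dtheta)
```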

They argue that computing this posterior is equivalent to solving an optimisation problem over probability measures \(q(\theta)\) of the form

\[ q^* = \arg\min_{q \in \mathcal{P}(\Theta)} \left\{\omega\, n\, \mathbb{E}_q\bigl[R_n(\theta)\bigr] + \mathrm{KL}(q\| \pi)\right\}. \]
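To fill in the step (this is the standard Gibbs variational principle, not anything specific to these papers): writing \(Z_n = \int \exp\{-\omega\, n\, R_n(\theta)\}\,\pi(\mathrm{d}\theta)\) for the normalising constant of \(\pi_n\), the objective can be rewritten as

\[ \omega\, n\, \mathbb{E}_q\bigl[R_n(\theta)\bigr] + \mathrm{KL}(q\| \pi) = \mathbb{E}_q\!\left[\log \frac{q(\theta)}{\pi(\theta)\exp\{-\omega\, n\, R_n(\theta)\}}\right] = \mathrm{KL}(q\,\|\,\pi_n) - \log Z_n, \]

and since \(\log Z_n\) does not depend on \(q\), the minimum over all of \(\mathcal{P}(\Theta)\) is attained exactly at \(q^* = \pi_n\), the Gibbs posterior itself.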

The GVI framework generalizes this by treating three ingredients of the inference procedure as free choices, where classical Bayes (or variational Bayes) fixes them:

  • the loss function \(\ell\), as in Gibbs posteriors
  • a divergence function \(D\), which need not be the KL divergence
  • the variational family \(\mathcal{Q}\).

The optimisation objective is

\[ q^* = \arg\min_{q\in \mathcal{Q}} \left\{\mathbb{E}_q\biggl[\sum_{i=1}^n \ell(\theta,x_i)\biggr] + D(q\| \pi)\right\}. \]

In this setup, when \(D\) is the KL divergence, the loss is the (properly scaled) negative log-likelihood, and \(\mathcal{Q}\) is all of \(\mathcal{P}(\Theta)\), the classical Bayesian posterior is recovered; restricting \(\mathcal{Q}\) to a tractable family recovers standard variational Bayes.
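To make the three ingredients concrete, here is a minimal sketch in JAX, assuming a Gaussian variational family \(\mathcal{Q}\), a squared-error loss \(\ell\), and the closed-form Gaussian KL as \(D\); the data, prior, step size, and sample count are all illustrative choices of mine. The expected loss is estimated by Monte Carlo with the reparameterisation trick, and the objective is minimised by plain gradient descent.

```python
import jax
import jax.numpy as jnp

# Minimal GVI-style sketch: Gaussian variational family Q, pointwise loss ell,
# and the closed-form Gaussian KL as the divergence D. All concrete choices
# (loss, prior, data, step size) are illustrative, not from the cited papers.

key = jax.random.PRNGKey(0)
key, data_key = jax.random.split(key)
x = 1.5 + jax.random.normal(data_key, (50,))       # toy data

def loss(theta, x_i):
    # Any pointwise loss works here; 0.5 * squared error is the negative
    # log-likelihood of a unit-variance Gaussian, up to a constant.
    return 0.5 * (x_i - theta) ** 2

def kl_to_standard_normal(mu, log_sigma):
    # D(q || pi) = KL( N(mu, sigma^2) || N(0, 1) ), available in closed form.
    sigma2 = jnp.exp(2.0 * log_sigma)
    return 0.5 * (sigma2 + mu ** 2 - 1.0) - log_sigma

def gvi_objective(params, key, n_mc=64):
    # E_q[ sum_i loss(theta, x_i) ] + D(q || pi), with the expectation
    # estimated by Monte Carlo via the reparameterisation trick.
    mu, log_sigma = params
    eps = jax.random.normal(key, (n_mc,))
    theta = mu + jnp.exp(log_sigma) * eps          # samples from q
    expected_loss = jnp.mean(jax.vmap(lambda t: jnp.sum(loss(t, x)))(theta))
    return expected_loss + kl_to_standard_normal(mu, log_sigma)

grad_fn = jax.jit(jax.grad(gvi_objective))
params = (jnp.array(0.0), jnp.array(0.0))          # (mu, log_sigma)
step_size = 5e-3
for _ in range(2000):
    key, subkey = jax.random.split(key)
    grads = grad_fn(params, subkey)
    params = jax.tree_util.tree_map(lambda p, g: p - step_size * g, params, grads)

mu, log_sigma = params
print("variational mean:", mu, " sd:", jnp.exp(log_sigma))
```

Swapping in a different loss, or a different (differentiable) divergence \(D\), leaves the rest of the loop untouched, which is much of the practical appeal of the framework.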

4 Connection to other non-KL inference

TBD. See inference without KL divergence.

5 References

Dellaporta, Knoblauch, Damoulas, et al. 2022. “Robust Bayesian Inference for Simulator-Based Models via the MMD Posterior Bootstrap.” arXiv:2202.04744 [cs, stat].
Fong, Lyddon, and Holmes. 2019. “Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap.” arXiv:1902.03175 [cs, stat].
Galvani, Bardelli, Figini, et al. 2021. “A Bayesian Nonparametric Learning Approach to Ensemble Models Using the Proper Bayesian Bootstrap.” Algorithms.
Knoblauch, Jewson, and Damoulas. 2019. “Generalized Variational Inference: Three Arguments for Deriving New Posteriors.”
———. 2022. “An Optimization-Centric View on Bayes’ Rule: Reviewing and Generalizing Variational Inference.” Journal of Machine Learning Research.
Lyddon, Walker, and Holmes. 2018. “Nonparametric Learning from Bayesian Models with Randomized Objective Functions.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Matsubara, Knoblauch, Briol, et al. 2022. “Robust Generalised Bayesian Inference for Intractable Likelihoods.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
Pacchiardi and Dutta. 2022. “Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [stat].
Schmon, Cannon, and Knoblauch. 2021. “Generalized Posteriors in Approximate Bayesian Computation.” arXiv:2011.08644 [stat].

Footnotes

  1. A name lab-grown to irritate me. I reject calling things “Generalized” and I also think that “variational inference” as statisticians use it is a misnomer. I acknowledge I will not win this naming fight, but that does not mean I need to like it.↩︎