Bayes for beginners
May 30, 2016 — July 23, 2022
Even the most curmudgeonly frequentist will sometimes find it refreshing to stop deriving frequentist estimators for intractable models and just use the damn Bayesian ones, which fail in different and more interesting ways than the ones you are used to. If it works and you are feeling fancy, you might then justify your Bayesian method on frequentist grounds, which washes away the sin.
Here are some scattered tidbits about getting into it. No attempt is made to be comprehensive, novel, or even expert.
1 Prior choice
Is weird and important. Here are some argumentative and disputed rules of thumb.
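Not a rule of thumb, just a minimal sketch of why the choice matters, assuming a conjugate Beta-Binomial model. The data (7 successes in 10 trials) and both priors are hypothetical numbers chosen for illustration.

```python
def beta_binomial_posterior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 successes, 3 failures.
successes, failures = 7, 3

# Two hypothetical priors: one flat, one concentrated around 0.5.
priors = {
    "flat Beta(1, 1)": (1.0, 1.0),
    "sceptical Beta(20, 20)": (20.0, 20.0),
}

posterior_means = {}
for name, (a, b) in priors.items():
    a_post, b_post = beta_binomial_posterior(a, b, successes, failures)
    posterior_means[name] = beta_mean(a_post, b_post)
    print(f"{name}: posterior mean = {posterior_means[name]:.3f}")
```

With this little data the two priors give visibly different answers; with a few hundred trials they would mostly agree.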
2 Teaching
2.1 Course material
So many! Too many. Actually, I kinda like McElreath’s stuff to teach from; you get practical quite quickly.
How to measure anything (Hubbard 2014)
The milieu around Andrew Gelman (Gelman, Hill, and Vehtari 2021; Gelman and Nolan 2017; Gelman et al. 2013). These are very good courses for the kind of statistics most people need, including people who think they need different statistics. Bayesian Data Analysis is available online.
McElreath (2020) is a cult textbook which various people have reimplemented in various languages. It is remarkable how far it gets with some very simple computational tools.
Cameron Davidson-Pilon, Probabilistic Programming & Bayesian Methods for Hackers (source) is an interesting one; does what it says on the tin. IMO McElreath is just a bit better, even for hackers, but this is cheaper and still a good start.
2.2 Worked examples
3 Linear regression
This workhorse pops up everywhere.
Deisenroth and Zafeiriou, Mathematics for Inference and Machine Learning give an ML perspective.
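As a concrete anchor, here is a minimal sketch of the simplest conjugate case: a straight line with known noise standard deviation and independent zero-mean Gaussian priors on intercept and slope. The function name, the default `noise_sd` and `prior_sd` values, and the data are all assumptions made for illustration; real problems usually need an unknown noise scale and a sampler or a library.

```python
def posterior_line(xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Posterior over (intercept, slope) for y = a + b*x + noise.

    Assumes noise ~ N(0, noise_sd^2) with noise_sd known, and independent
    N(0, prior_sd^2) priors on a and b. Returns (mean, cov), where mean is a
    pair and cov is a 2x2 nested list.
    """
    n = len(xs)
    s2 = noise_sd ** 2
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Posterior precision = prior precision + X^T X / sigma^2,
    # where each row of X is (1, x).
    p00 = 1.0 / prior_sd ** 2 + n / s2
    p01 = sx / s2
    p11 = 1.0 / prior_sd ** 2 + sxx / s2

    # Invert the 2x2 precision matrix by hand to get the covariance.
    det = p00 * p11 - p01 * p01
    cov = [[p11 / det, -p01 / det], [-p01 / det, p00 / det]]

    # Posterior mean = cov @ (X^T y / sigma^2).
    b0 = sy / s2
    b1 = sxy / s2
    mean = (
        cov[0][0] * b0 + cov[0][1] * b1,
        cov[1][0] * b0 + cov[1][1] * b1,
    )
    return mean, cov


# Hypothetical data, roughly y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

mean, cov = posterior_line(xs, ys)
print("posterior mean (intercept, slope):", mean)
print("posterior sd of slope:", cov[1][1] ** 0.5)
```

With a wide prior the posterior mean sits close to the least-squares fit; shrinking `prior_sd` pulls both coefficients toward zero, which is ridge regression by another name.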
4 Workflow
If we want to use Bayesian tools to do science, there is a principled workflow that we need to be thinking about. For a fun rant, read Shalizi on Praxis and Ideology in Bayesian Data Analysis, about Gelman and Shalizi (2013).
The visualization how-to from, basically, the Stan team, is deeper than it sounds and highly recommended (Gabry et al. 2019).
Michael Betancourt’s examples, such as his workflow tips, are a good start for practical work, incorporating the inevitable collision of statistical and computational difficulties.
See also BAT, the Bayesian Analysis Toolkit, which does sophisticated Bayesian modelling, although AFAICT it uses a fairly basic sampler.
Notes on Rao-Blackwellization for doing faster MCMC inference, and even handling discrete parameters in Stan.
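The discrete-parameter trick is marginalization: Stan's samplers need gradients, so a discrete latent variable has to be summed out of the likelihood, and its posterior is then recovered analytically rather than sampled. Here is a minimal sketch of that idea in Python for a Gaussian mixture with a discrete component label. The function names and the numbers are assumptions for illustration, not anything taken from the linked notes.

```python
import math


def normal_logpdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def joint_log_terms(x, weights, mus, sds):
    """log p(x, z = k) for each component k."""
    return [
        math.log(w) + normal_logpdf(x, mu, sd)
        for w, mu, sd in zip(weights, mus, sds)
    ]


def marginal_loglik(x, weights, mus, sds):
    """log p(x) with the discrete label z summed out."""
    return log_sum_exp(joint_log_terms(x, weights, mus, sds))


def label_posterior(x, weights, mus, sds):
    """p(z = k | x) computed exactly instead of sampled."""
    terms = joint_log_terms(x, weights, mus, sds)
    total = log_sum_exp(terms)
    return [math.exp(t - total) for t in terms]


# Hypothetical two-component mixture.
weights, mus, sds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

print("log p(x = 1.5):", marginal_loglik(1.5, weights, mus, sds))
print("p(z | x = 1.5):", label_posterior(1.5, weights, mus, sds))
```

A sampler only ever sees `marginal_loglik`, which is smooth in the continuous parameters; the label probabilities come out afterwards with no Monte Carlo noise in the discrete part, which is where the Rao-Blackwell variance reduction comes from.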
5 Nonparametrics
Dirichlet processes, Gaussian process regression, etc. 🏗