Statistical mechanics of statistics
December 1, 2016 — January 7, 2025
Boaz Barak has a miniature dictionary for statisticians:
I’ve always been curious about the statistical physics approach to problems from computer science. The physics-inspired algorithm survey propagation is the current champion for random 3SAT instances, statistical-physics phase transitions have been suggested as explaining computational difficulty, and statistical physics has even been invoked to explain why deep learning algorithms seem to often converge to useful local minima.
Unfortunately, I have always found the terminology of statistical physics, “spin glasses”, “quenched averages”, “annealing”, “replica symmetry breaking”, “metastable states” etc… to be rather daunting.
Jaan Altosaar’s guided translation is great.
Connection to singular learning theory and neural networks?
1 Phase transitions in statistical inference
There is a deep analogy between statistical inference and statistical physics; I will give a friendly introduction to both of these fields. I will then discuss phase transitions in two problems of interest to a broad range of data sciences: community detection in social and biological networks, and clustering of sparse high-dimensional data. In both cases, if our data becomes too sparse or too noisy, it suddenly becomes impossible to find the underlying pattern, or even tell if there is one. Physics both helps us locate these phase transitions, and design optimal algorithms that succeed all the way up to this point. Along the way, I will visit ideas from computational complexity, random graphs, random matrices, and spin glass theory.
There is an overview lecture by Thomas Orton, which cites lots of the good stuff:
Last week, we saw how certain computational problems like 3SAT exhibit a thresholding behaviour, similar to a phase transition in a physical system. In this post, we’ll continue to look at this phenomenon by exploring a heuristic method, belief propagation (and the cavity method), which has been used to make hardness conjectures, and also has thresholding properties. In particular, we’ll start by looking at belief propagation for approximate inference on sparse graphs as a purely computational problem. After doing this, we’ll switch perspectives and see belief propagation motivated in terms of Gibbs free energy minimisation for physical systems. With these two perspectives in mind, we’ll then try to use belief propagation to do inference on the stochastic block model. We’ll see some heuristic techniques for determining when BP succeeds and fails in inference, as well as some numerical simulation results of belief propagation for this problem. Lastly, we’ll talk about where this all fits into what is currently known about efficient algorithms and information theoretic barriers for the stochastic block model.
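To make the "purely computational" view of belief propagation concrete, here is a minimal sketch of sum-product message passing on a small pairwise model with binary variables. The function names (`bp_marginals`, `exact_marginals`), the tree and the potentials are illustrative assumptions, not taken from the lecture. On a tree BP recovers the exact marginals, so the sketch checks itself against brute-force enumeration; on loopy sparse graphs such as the stochastic block model the same updates are only a heuristic.

```python
import itertools
import numpy as np


def bp_marginals(n, edges, unary, pairwise, n_iters=50):
    """Sum-product BP for a pairwise MRF over binary variables.

    unary:    (n, 2) array of node potentials.
    pairwise: dict mapping each edge (i, j) to a (2, 2) array psi[x_i, x_j].
    """
    neighbours = {i: [] for i in range(n)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def psi(i, j):
        # Potential indexed as [x_i, x_j], whichever way the edge was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    # messages[(i, j)] is the message from node i to node j, a vector over x_j.
    messages = {}
    for i, j in edges:
        messages[(i, j)] = np.ones(2) / 2
        messages[(j, i)] = np.ones(2) / 2

    for _ in range(n_iters):
        new_messages = {}
        for (i, j) in messages:
            # Combine the local evidence at i with messages from all
            # neighbours of i except the recipient j.
            incoming = np.array(unary[i], dtype=float)
            for k in neighbours[i]:
                if k != j:
                    incoming = incoming * messages[(k, i)]
            m = psi(i, j).T @ incoming  # sum over x_i
            new_messages[(i, j)] = m / m.sum()
        messages = new_messages  # synchronous (parallel) update

    beliefs = np.array(unary, dtype=float)
    for i in range(n):
        for k in neighbours[i]:
            beliefs[i] *= messages[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


def exact_marginals(n, edges, unary, pairwise):
    """Brute-force marginals by enumerating all 2**n configurations."""
    marginals = np.zeros((n, 2))
    for x in itertools.product([0, 1], repeat=n):
        weight = 1.0
        for i in range(n):
            weight *= unary[i][x[i]]
        for (i, j) in edges:
            weight *= pairwise[(i, j)][x[i], x[j]]
        for i in range(n):
            marginals[i, x[i]] += weight
    return marginals / marginals.sum(axis=1, keepdims=True)


# A hypothetical five-node tree with attractive couplings.
tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
rng = np.random.default_rng(0)
node_potentials = rng.uniform(0.5, 1.5, size=(5, 2))
coupling = np.array([[2.0, 1.0], [1.0, 2.0]])
edge_potentials = {e: coupling for e in tree_edges}

bp = bp_marginals(5, tree_edges, node_potentials, edge_potentials)
exact = exact_marginals(5, tree_edges, node_potentials, edge_potentials)
print(np.max(np.abs(bp - exact)))
```

The messages are normalised at every step purely for numerical stability; the beliefs are unchanged by that choice.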
See Igor Carron’s “phase diagram” list, and stuff like (Oymak and Tropp 2015). There are likely connections to Erdős–Rényi giant components and other complex-network phenomena in probabilistic graph learning. Read (Barbier 2015; Poole et al. 2016).
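The giant-component transition is perhaps the simplest phase transition of this kind to see numerically. The following is a rough sketch, assuming the standard G(n, p) random graph with p = c/n; the function name `largest_component_fraction` and the parameter values are illustrative choices. For mean degree c below 1 the largest component is a vanishing fraction of the nodes, while above 1 it occupies a constant fraction.

```python
import numpy as np


def largest_component_fraction(n, mean_degree, rng):
    """Fraction of nodes in the largest component of G(n, p), p = mean_degree / n."""
    parent = np.arange(n)

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sample each of the n(n-1)/2 possible edges independently.
    p = mean_degree / n
    upper = np.triu(rng.random((n, n)) < p, k=1)
    rows, cols = np.nonzero(upper)
    for i, j in zip(rows, cols):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    roots = np.array([find(i) for i in range(n)])
    return np.bincount(roots).max() / n


rng = np.random.default_rng(0)
for c in [0.5, 0.9, 1.1, 1.5, 2.0, 3.0]:
    print(c, largest_component_fraction(1000, c, rng))
```

At finite n the transition is smeared out around c = 1, so the jump only looks sharp as n grows.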
2 Replicator equations and evolutionary processes
See also evolution, game theory.
Gentle intro lecture by John Baez, Biology as Information Dynamics.
See (Baez 2011; Harper 2009; Shalizi 2009; Sinervo and Lively 1996).
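For orientation, the replicator equation says that the share x_i of type i grows at a rate equal to its fitness minus the population mean fitness: dx_i/dt = x_i (f_i(x) - f̄(x)). Below is a minimal sketch using forward Euler steps and a linear fitness f(x) = A x. The payoff matrix is a generic rock-paper-scissors game chosen for illustration, not data from any of the cited papers, and the function names are hypothetical.

```python
import numpy as np


def replicator_step(x, payoff, dt):
    """One forward-Euler step of dx_i/dt = x_i * (f_i(x) - mean fitness)."""
    fitness = payoff @ x           # f_i(x) = sum_j A_ij x_j
    mean_fitness = x @ fitness     # population-average fitness
    x_new = x + dt * x * (fitness - mean_fitness)
    # Euler steps can drift slightly off the simplex, so project back.
    x_new = np.clip(x_new, 0.0, None)
    return x_new / x_new.sum()


def simulate(x0, payoff, dt=0.01, n_steps=5000):
    """Return the trajectory of population shares, shape (n_steps + 1, len(x0))."""
    xs = np.empty((n_steps + 1, len(x0)))
    xs[0] = x0
    for t in range(n_steps):
        xs[t + 1] = replicator_step(xs[t], payoff, dt)
    return xs


# Illustrative zero-sum rock-paper-scissors payoffs: each strategy beats
# one rival and loses to the other.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])

trajectory = simulate(np.array([0.5, 0.3, 0.2]), rps)
print(trajectory[-1])
```

With these payoffs the shares cycle around the uniform mixture rather than settling down. Forward Euler is the crudest possible integrator and slowly distorts those cycles; a proper ODE solver would be the better choice for anything quantitative.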
3 Grokking
See Grokking.
4 Singular learning theory
See singular learning theory, which also produces an analysis of Grokking-like behaviour in terms of degeneracies in the loss landscape.
5 Annealing
See annealing.
6 Entropy vs information
7 Neural tangent kernel
The neural tangent kernel has been argued to fit in this category, e.g. by Cagnetta et al. (2023).