Bayesian and causal inference by foundation models
August 29, 2024 — February 24, 2025
Placeholder for exploring the idea that transformers or their ilk might be good at actual general causal inference, without being explicitly designed for causal inference.
As set functions, transformers look a lot like ‘generalized inference machines’. Are they? Can we make them do ‘proper’ causal inference in some formal sense?
This is a scrapbook of interesting approaches: Bayesian inference over LLM outputs, understanding in-context learning as Bayesian conditioning, and so on.
Last time I checked, this phenomenon was understood empirically; there are lots of reasons we might imagine it can happen in practice.
Probably connected: Mechanistic interpretability, causal inference in foundation models, and so on.
1 Causal Abstraction
Geiger et al. (2024) build upon Correa and Bareinboim (2020):
In some ways, studying modern deep learning models is like studying the weather or an economy: they involve large numbers of densely connected ‘microvariables’ with complex, non-linear dynamics. One way of reining in this complexity is to find ways of understanding these systems in terms of higher-level, more abstract variables (‘macrovariables’). For instance, the many microvariables might be clustered together into more abstract macrovariables. A number of researchers have been exploring theories of causal abstraction, providing a mathematical framework for causally analyzing a system at multiple levels of detail
Indeed they have. See Causal Abstraction.
2 Probabilistic sampling from transformers
Alireza Makhzani introduces Zhao et al. (2024):
Many capability and safety techniques of LLMs—such as RLHF, automated red-teaming, prompt engineering, and infilling—can be viewed from a probabilistic inference perspective, specifically as sampling from an unnormalised target distribution defined by a given reward or potential function. Building on this perspective, we propose using twisted Sequential Monte Carlo (SMC) as a principled probabilistic inference framework to approach these problems. Twisted SMC is a variant of SMC with additional twist functions that predict the future value of the potential at each timestep, enabling the inference to focus on promising partial sequences. We show the effectiveness of twisted SMC for sampling rare, undesirable outputs from a pretrained model (useful for harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.
Our paper offers much more! We propose a novel twist learning method inspired by energy-based models; we connect the twisted SMC literature with soft RL; we propose novel bidirectional SMC bounds on log partition functions as a method for evaluating inference in LLMs; and finally we provide probabilistic perspectives for many more controlled generation methods in LLMs.
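To make the twisted SMC idea concrete for myself, here is a minimal toy sketch. It is not the method of Zhao et al. (2024) as implemented, and everything in it is an assumption for illustration: the "language model" is a small first-order Markov chain, the potential is a soft reward exp(beta) for ending on a chosen token, and the twists are computed exactly by backward recursion rather than learned. The names `twisted_smc` and `exact_log_twists` are hypothetical. The target is sigma(s) proportional to p0(s) phi(s), and the twist psi_t estimates the expected future value of phi given the prefix.

```python
import numpy as np


def exact_log_twists(trans, log_phi, T):
    """Optimal twists for a Markov chain whose potential depends on the last token.

    psi_t(v) = E[phi(s_last) | s_t = v], computed by backward recursion.
    (Assumption: in a real LLM these would have to be learned or approximated.)
    """
    V = trans.shape[0]
    twists = np.empty((T, V))
    twists[T - 1] = np.exp(log_phi)  # the final twist is the potential itself
    for t in range(T - 2, -1, -1):
        twists[t] = trans @ twists[t + 1]
    return np.log(twists)


def twisted_smc(init, trans, log_twists, n_particles, rng):
    """Twisted SMC with the twist-induced proposal and resampling at every step.

    init: (V,) initial token probabilities of the base model.
    trans: (V, V) matrix, trans[i, j] = p0(next = j | current = i).
    log_twists: (T, V) array; row t holds log psi_t as a function of the latest token.
    Returns sampled sequences of shape (n_particles, T) and an estimate of log Z.
    """
    T, V = log_twists.shape
    seqs = np.zeros((n_particles, T), dtype=int)
    log_z = 0.0
    prev_log_twist = np.zeros(n_particles)  # the empty prefix has twist 1
    for t in range(T):
        # Base-model next-token probabilities for every particle.
        base = np.tile(init, (n_particles, 1)) if t == 0 else trans[seqs[:, t - 1]]
        # Twisted proposal: q_t(v) proportional to p0(v | prefix) * psi_t(v).
        unnorm = base * np.exp(log_twists[t])
        norm = unnorm.sum(axis=1)
        probs = unnorm / norm[:, None]
        seqs[:, t] = [rng.choice(V, p=p) for p in probs]
        # Incremental weight: sum_v p0(v | prefix) psi_t(v) / psi_{t-1}(prefix).
        log_w = np.log(norm) - prev_log_twist
        m = log_w.max()
        w = np.exp(log_w - m)
        log_z += m + np.log(w.mean())  # accumulate log of the mean weight
        # Multinomial resampling of whole trajectories.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        seqs = seqs[idx]
        prev_log_twist = log_twists[t, seqs[:, t]]
    return seqs, log_z


rng = np.random.default_rng(0)
V, T, target, beta = 4, 6, 0, 3.0
trans = rng.dirichlet(np.ones(V), size=V)  # random toy base model
init = np.full(V, 1.0 / V)
log_phi = np.where(np.arange(V) == target, beta, 0.0)  # soft reward on the last token

log_twists = exact_log_twists(trans, log_phi, T)
seqs, log_z_hat = twisted_smc(init, trans, log_twists, n_particles=2000, rng=rng)

# Ground truth for this toy model: Z = E_p0[phi] = 1 + (e^beta - 1) * P(s_last = target).
p_target = (init @ np.linalg.matrix_power(trans, T - 1))[target]
log_z_true = np.log(1.0 + (np.exp(beta) - 1.0) * p_target)
print(log_z_hat, log_z_true)
```

With exact twists every incremental weight after the first step equals 1, so the log Z estimate matches the true value up to floating-point error. That is the sense in which good twists let inference "focus on promising partial sequences". Replacing the intermediate rows of `log_twists` with zeros leaves the estimate of Z unbiased but noisy, which is roughly the regime where learning the twists becomes the interesting problem.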
More methods in the references.