Causality, agency, decisions
Updateless decision theory, Newcomb’s boxes, evidential decision theory, commitment races …
October 23, 2018 — February 20, 2025
Notes on decision theory and causality in systems where agents make decisions, especially in the context of AI safety. This is something I’m actively trying to understand better at the moment. There is some mysterious causality juju in foundation models and other neural nets, which suggests to me that we should think hard about this as we move into the age of AI.
AFAICT, using causality to reason about intelligent systems requires some extensions to vanilla causality, because such systems can themselves reason about the outcomes they wish to achieve, which makes things complicated and occasionally weird.
TBC.
1 Causality with feedback
A basic thermostat exemplifies the simplest extension of vanilla causality; see causality under feedback.
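As a quick illustration of why feedback scrambles naive causal inference, here is a toy simulation of my own (not drawn from any of the references here): a competent thermostat makes the heater’s output nearly uncorrelated with indoor temperature, even though the heater is the dominant cause of that temperature, and only an intervention reveals the link. All the dynamics and constants are made up for illustration.

```python
import random

SETPOINT = 21.0

def step(temp, out, u, rng):
    """One tick of room thermodynamics: heater output u strongly causes temp."""
    return 0.5 * temp + 0.25 * out + 0.5 * u + rng.gauss(0, 0.1)

def controller(temp, out):
    """The thermostat: pick u so the next temperature lands near SETPOINT."""
    return max(0.0, (SETPOINT - 0.5 * temp - 0.25 * out) / 0.5)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
temp, temps, heats = SETPOINT, [], []
for _ in range(10_000):
    out = rng.gauss(5.0, 5.0)          # cold, variable weather outside
    u = controller(temp, out)
    temp = step(temp, out, u, rng)
    temps.append(temp)
    heats.append(u)
print("observational corr(heater, indoor):", round(corr(heats, temps), 3))  # ~ 0

# ... yet intervening on the heater, do(u = 0), reveals the strong causal link:
temp, cold = SETPOINT, []
for _ in range(1_000):
    temp = step(temp, rng.gauss(5.0, 5.0), 0.0, rng)
    cold.append(temp)
print("mean indoor temp under do(heater=0):", round(sum(cold) / len(cold), 1))  # ~ 2.5
```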
2 Mechanised multi-agent DAGs
Extending causal DAGs to include agents and decisions. Connection to game theory and multi-agent systems (Hammond et al. 2023; Liu et al. 2024).
2.1 Basic mechanisation
A fresh introduction is Everitt et al. (2021), so let us follow along with that.
A Bayesian network for a set of random variables \(\boldsymbol{V}\) with joint distribution \(\operatorname{Pr}(\boldsymbol{V})\) is a directed acyclic graph (DAG) \(\mathcal{G}=(\boldsymbol{V}, \mathcal{E})\) with vertices \(\boldsymbol{V}\) and edges \(\mathcal{E}\) such that the joint distribution factorises as \(\operatorname{Pr}(\boldsymbol{V})=\prod_{V \in \boldsymbol{V}} \operatorname{Pr}\left(V \mid \boldsymbol{Pa}_V\right)\), where \(\boldsymbol{Pa}_V\) are the parents of \(V\) in \(\mathcal{G}\).
This should be familiar from causal DAGs.
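To make the factorisation concrete, here is a minimal sketch (a toy of my own, not taken from Everitt et al.) of a three-node chain of binary variables:

```python
import itertools

# A tiny DAG  A -> B -> C  over binary variables; the joint factorises as
#   Pr(A, B, C) = Pr(A) * Pr(B | A) * Pr(C | B)
pr_a = {0: 0.7, 1: 0.3}
pr_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # pr_b_given_a[a][b]
pr_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # pr_c_given_b[b][c]

def joint(a, b, c):
    return pr_a[a] * pr_b_given_a[a][b] * pr_c_given_b[b][c]

# Sanity check: the factorised joint is a proper distribution.
total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print(total)  # 1.0
```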
Influence diagrams generalise Bayesian networks to represent decision problems, using “square nodes for decision variables, diamond nodes for utility variables, and round nodes for everything else”. In contrast to classic influence diagrams, here there are also probability distributions over the decision variables.
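Here is a minimal sketch of what solving such a diagram means, by brute-force policy enumeration. The weather/umbrella setup and all the names are hypothetical, invented for illustration:

```python
import itertools

# Minimal influence diagram: round S (chance) -> square D (decision) -> diamond U.
# D observes S; a policy is a map from observations to actions.
pr_s = {"rain": 0.3, "sun": 0.7}
actions = ("umbrella", "none")

def utility(s, d):
    return {("rain", "umbrella"): 5, ("rain", "none"): -10,
            ("sun", "umbrella"): 1, ("sun", "none"): 8}[(s, d)]

def expected_utility(policy):  # policy: dict mapping observation -> action
    return sum(p * utility(s, policy[s]) for s, p in pr_s.items())

policies = [dict(zip(pr_s, choice))
            for choice in itertools.product(actions, repeat=len(pr_s))]
best = max(policies, key=expected_utility)
print(best, expected_utility(best))  # {'rain': 'umbrella', 'sun': 'none'} 7.1

# In a *mechanised* diagram this argmax itself becomes a node: the policy
# Pr(D | S) is produced by a mechanism that responds to the utility function.
```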
TBC
2.2 Multi-agent graphs
There is a long series of works attempting this (Heckerman and Shachter 1994; Dawid 2002; Koller and Milch 2003). I am working from Hammond et al. (2023) and MacDermott, Everitt, and Belardinelli (2023), which introduce the One Ring that unifies them all, in the form of something called a Mechanised Multi-Agent Influence Diagram, a.k.a. a MMAID.
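To gesture at what the multi-agent versions add, here is a toy of my own (not the MMAID formalism itself): two decision nodes, one per agent, each with its own utility node, which reduces here to a bare coordination game solved by equilibrium enumeration.

```python
import itertools

# Two decision nodes D1, D2 (one per agent), each with its own utility node.
# In a full multi-agent influence diagram they could also observe chance nodes.
acts = ("left", "right")
u1 = {("left", "left"): 2, ("right", "right"): 1,
      ("left", "right"): 0, ("right", "left"): 0}
u2 = dict(u1)  # a common-interest game, for simplicity

def is_nash(d1, d2):
    best1 = all(u1[(d1, d2)] >= u1[(a, d2)] for a in acts)
    best2 = all(u2[(d1, d2)] >= u2[(d1, a)] for a in acts)
    return best1 and best2

for d1, d2 in itertools.product(acts, acts):
    if is_nash(d1, d2):
        print("equilibrium:", d1, d2)
# Both ("left", "left") and ("right", "right") qualify: these diagrams inherit
# game theory's equilibrium-selection headaches, made explicit by mechanisation.
```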
3 Identifying agency
What even is agency? How do we recognise it in natural and artificial systems? What are the implications for control, economics, and technology?
Discovering Agents (Kenton et al. 2023; MacDermott et al. 2024) takes an empirical look at agency by asking, AFAICT, what makes a node a deciding node in a mechanised causal graph.
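My loose paraphrase of the criterion, as a toy: a node counts as a decision if its mechanism adapts under interventions on the mechanisms downstream of it, e.g. flipping the utility function. Everything here is invented for illustration, not the paper’s actual algorithm.

```python
# An agentic mechanism best-responds to whatever utility it faces;
# a non-agentic mechanism ignores the utility function entirely.

def optimal_policy(utility):
    return max(("umbrella", "none"), key=utility)

def fixed_policy(utility):
    return "none"

def adapts(mechanism):
    u_normal = lambda d: {"umbrella": 1, "none": 0}[d]
    u_flipped = lambda d: -u_normal(d)  # a mechanism-level intervention
    return mechanism(u_normal) != mechanism(u_flipped)

print("optimiser adapts:", adapts(optimal_policy))  # True  -> decision node
print("fixed rule adapts:", adapts(fixed_policy))   # False -> mere mechanism
```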
4 Causal attribution and blameworthiness
I should write more about this: a connection to computational morality. Everitt et al. (2022) and Joseph Y. Halpern and Kleiman-Weiner (2018) seem to be works in this domain.
5 Causal vs evidential decision theory
I no longer think this binary is a good way of understanding the Newcomb problem, because
- I think the mechanised causal graphs give a crisper definition of the concepts here, and
- the analyses that start from flavours of decision theory, rather than from the causal axiomatisation, seem unusually spammy and vague.
The rest of this section is kept for historical reasons.
Fancy decision theories for problems arising in strategic conflict and in superintelligence scenarios. Keyword: Newcomb’s paradox. A reflective twist on game theory that worries about decision problems involving smart predictive agents. Strong-AI-risk people are excitable in the vicinity of these problems.
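To fix the numbers, here is the standard Newcomb calculation; the payoffs and predictor accuracy are the textbook ones, nothing original:

```python
# Newcomb's problem, numerically.
p = 0.99                 # Pr(the prediction matches your actual choice)
M, K = 1_000_000, 1_000  # opaque box contents, transparent box contents

# Evidential decision theory conditions on your own choice as evidence:
edt_one_box = p * M               # one-boxing was probably predicted
edt_two_box = (1 - p) * M + K     # two-boxing was probably predicted
print("EDT:", edt_one_box, "vs", edt_two_box)  # ~990000 vs ~11000 -> one-box

# Causal decision theory treats q = Pr(box is full) as causally fixed:
cdt_one_box = lambda q: q * M
cdt_two_box = lambda q: q * M + K  # dominates for every q
q = 0.99
print("CDT:", cdt_one_box(q), "vs", cdt_two_box(q), "-> two-box for every q")
```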
I have had the following resources recommended to me:
Although their reading list is occasionally IMO undiscerning, you might want to start with MIRI’s intro, which at least exists:
Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws).
I haven’t read any of those though. I would probably start from Wolpert and Benford (2013); David Wolpert always seems to have a good Gordian knot cutter on his analytical multitool.
6 Updateless decision theory
7 Commitment races
Commitment races are important in international relations. They also seem popular in AI safety theory, although I am not sure why, since I don’t understand how AIs can credibly commit to things; setting up credible signals of commitment seems difficult, and probably exceptional, for very opaque systems.