Incentive alignment problems
What is your loss function?
September 22, 2014 — February 2, 2025
Placeholder to discuss alignment problems in AI, economic mechanisms, and institutions.
Many things to unpack. What do we imagine alignment to be when our own goals are themselves a diverse evolutionary epiphenomenon? Does everything ultimately Goodhart? Is that the origin of Moloch?
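To make the Goodhart worry concrete, here is a minimal toy sketch of my own (not drawn from any of the sources below): a naive hill-climber that can only observe a noisy proxy metric keeps pushing long after the proxy has decoupled from the thing we actually care about. The particular functions, the step size and the noise scale are all illustrative assumptions.

```python
# Toy Goodhart sketch (my own illustration): we care about a "true" objective
# but can only optimise a correlated proxy. Pushing hard on the proxy
# eventually drags the true objective down.
import numpy as np

rng = np.random.default_rng(0)

def true_value(x):
    # What we actually want: peaks at x = 1, falls off beyond that.
    return x - 0.5 * x ** 2

def proxy_metric(x):
    # What we can measure: tracks true_value for small x,
    # but keeps rewarding ever-larger x.
    return x + 0.05 * rng.normal()

x = 0.0
for step in range(200):
    # Naive hill-climbing on the proxy alone.
    candidate = x + 0.05
    if proxy_metric(candidate) > proxy_metric(x):
        x = candidate

print(f"proxy-optimised x = {x:.2f}")
print(f"true value there  = {true_value(x):.2f} (true optimum is {true_value(1.0):.2f} at x = 1)")
```

The proxy score climbs forever while the true value goes sharply negative, which is the whole joke of Goodhart's law in two functions.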
1 AI alignment
The especially knotty case of superintelligent AI, if and when it acquires goals of its own.
2 Incoming
Joe Edelman, Is Anything Worth Maximizing? How metrics shape markets, how we’re doing them wrong
Metrics are how an algorithm or an organisation listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics
and
What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well lived.
Goal Misgeneralization: How a Tiny Change Could End Everything - YouTube
This video explores how YOU, YES YOU, are a case of misalignment with respect to evolution’s implicit optimization objective. We also show an example of goal misgeneralization in a simple AI system, and explore how deceptive alignment shares similar features and may arise in future, far more powerful AI systems.
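The video's example uses an RL agent; here is an even smaller sketch of the same failure mode, my own construction rather than theirs. The setup, a minimum-norm least-squares learner, a "louder" spurious feature, and the train/deployment split, is entirely an illustrative assumption: during training the intended cue and a spurious cue always agree, so the learner can score perfectly while latching onto the wrong one, and the mismatch only shows up when the cues come apart at deployment.

```python
# Toy goal-misgeneralisation sketch (my own construction, not the video's example).
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Training distribution: the intended cue (feature 0) and a louder spurious
# cue (feature 1) always agree, so either one predicts the label perfectly.
intended = rng.integers(0, 2, size=n).astype(float)
X_train = np.column_stack([intended, 2.0 * intended])
y_train = intended

# The minimum-norm least-squares fit puts most of its weight on the louder,
# spurious cue, even though training behaviour looks perfectly "aligned".
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Deployment distribution: the cues now disagree.
intended_test = rng.integers(0, 2, size=n).astype(float)
X_test = np.column_stack([intended_test, 2.0 * (1.0 - intended_test)])

train_acc = np.mean(((X_train @ w) > 0.5) == y_train)
test_acc = np.mean(((X_test @ w) > 0.5) == intended_test)
print("weights on [intended, spurious]:", np.round(w, 2))
print(f"train accuracy: {train_acc:.2f}, deployment accuracy: {test_acc:.2f}")
```

Training accuracy is perfect and deployment accuracy collapses, because the learned "goal" was the spurious cue all along; the training distribution simply never forced the distinction.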