Incentive alignment problems

What is your loss function?

September 22, 2014 — September 8, 2023

Tags: adversarial, economics, extended self, faster pussycat, game theory, incentive mechanisms, institutions, networks, tail risk, security, swarm

A placeholder to discuss alignment problems in AI, economic mechanisms, and institutions.

Many things to unpack. What do we imagine alignment to be when our own goals are themselves a diverse evolutionary epiphenomenon? Does everything ultimately Goodhart? Is that the origin of Moloch?
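
To make the Goodhart worry concrete, here is a minimal sketch in Python of the "regressional" flavour: true quality is latent, we only observe a noisy proxy metric, and selecting hard on the proxy systematically overstates the quality of whatever wins. Everything here (names, numbers, the noise level) is illustrative, not taken from any particular formalism.

```python
# A toy illustration of regressional Goodhart: true quality is latent,
# the proxy metric is true quality plus noise, and selecting the
# proxy-maximising candidate overstates how good that candidate really is.
import random

random.seed(0)

N = 100_000      # number of candidates under selection pressure
NOISE = 1.0      # how loosely the proxy tracks true quality

true_quality = [random.gauss(0, 1) for _ in range(N)]
proxy_score = [q + random.gauss(0, NOISE) for q in true_quality]

# Pick the candidate that maximises the proxy metric.
best_by_proxy = max(range(N), key=lambda i: proxy_score[i])

print("proxy score of selected candidate:", round(proxy_score[best_by_proxy], 2))
print("true quality of selected candidate:", round(true_quality[best_by_proxy], 2))
# The winner's proxy score sits well above its true quality: optimisation
# pressure on the measure decouples the measure from the target.
```

With these settings the winner's expected true quality is roughly half its proxy score, and increasing either the selection pressure (more candidates) or the metric noise widens the gap; the proxy was a perfectly good measure until we optimized against it.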

1 AI alignment

The knotty case of superintelligent AI in particular.

1.1 Intuitionistic AI safety

For me, arguing that chatbots should not be able to simulate hateful speech is like arguing that we should not simulate car crashes. In my line of work, simulating things is precisely how we learn to prevent them. Generally, if something is terrible, it is important to understand it in order to avoid it, and simulation is how we can come to understand how hateful content arises, just as it is how we come to understand car crashes. I would like to avoid both hate and car crashes.

I am not impressed by efforts to restrict what thoughts the machines can express. I think the machines themselves are terribly, fiercely, catastrophically dangerous, and the furore about whether they sound mean does not seem to me terribly relevant to that danger.

Kareem Carr describes more rigorously what he thinks people imagine the machines should do, and calls it a solution. He does, IMO, articulate beautifully what is going on.

I resist calling it a solution because I think the problem is ill-defined. Equity is purpose-specific and has no universal solution; in a domain where the purpose of the models themselves is unclear (placating pundits? fueling Twitter discourse?), there is not much specific to say about how to make a content generator equitable.


2 Decision theory

TBD

3 Incoming

  • Joe Edelman, Is Anything Worth Maximizing? How metrics shape markets, how we’re doing them wrong

    Metrics are how an algorithm or an organization listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics

    and

    What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well lived.
