Morality and computational constraints
It is as if we knew what we were doing
October 2, 2023 — January 16, 2025
Notes on connections between computation and morality. Is pleasure a reward signal? Is a loss penalty pain? Is an efficient learning update a moral imperative? A workplace health and safety matter?
Links on those themes.
1 Background
1.1 Reinforcement learning and morality
I would like to know everything that people have said about this. The review in Vishwanath, Dennis, and Slavkovik (2024) is perfunctory, but I guess it is a start?
Abel, MacGlashan, and Littman (2016):

> Emerging AI systems will be making more and more decisions that impact the lives of humans in a significant way. It is essential, then, that these AI systems make decisions that take into account the desires, goals, and preferences of other people, while simultaneously learning about what those preferences are. In this work, we argue that the reinforcement-learning framework achieves the appropriate generality required to theorize about an idealized ethical artificial agent, and offers the proper foundations for grounding specific questions about ethical learning and decision making that can promote further scientific investigation. We define an idealized formalism for an ethical learner, and conduct experiments on two toy ethical dilemmas, demonstrating the soundness and flexibility of our approach.
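A gloss in my own notation (not necessarily Abel et al.'s actual formalism): the RL framing treats the ethical agent as maximising expected return in an MDP whose reward aggregates the preferences of the people affected, which the agent must estimate as it acts,

$$
\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, R(s_t, a_t)\right],
\qquad
R(s, a) \approx \sum_{i} w_i\, \hat{u}_i(s, a),
$$

where $\hat{u}_i$ is the agent's running estimate of person $i$'s preferences and the weights $w_i$ decide whose preferences count and by how much. The force of the quoted argument is that the policy $\pi$ and the estimates $\hat{u}_i$ have to be learned simultaneously.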
2 Ethical systems as computational optimization
- APXHARD, Ethical Systems as Computational Optimizations
- Stenseke (2023) also emphasises the computational tractability problems of morality
3 When does negative reinforcement hurt?
Ethical Issues in Artificial Reinforcement Learning:

> There is a remarkable connection between artificial reinforcement-learning (RL) algorithms and the process of reward learning in animal brains. Do RL algorithms on computers pose moral problems? I think current RL computations do matter, though they’re probably less morally significant than animals, including insects, because the degree of consciousness and emotional experience seems limited in present-day RL agents. As RL becomes more sophisticated and is hooked up to other more “conscious” brain-like operations, this topic will become increasingly urgent. Given the vast numbers of RL computations that will be run in the future in industry, video games, robotics, and research, the moral stakes may be high. I encourage scientists and altruists to work toward more humane approaches to reinforcement learning.
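To make the question concrete, here is a toy sketch (my illustration, not from the essay quoted above) of what “negative reinforcement” is computationally: a negative scalar reward entering a temporal-difference update.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # tabular value estimates
alpha, gamma = 0.1, 0.95             # learning rate, discount factor

def td_update(s, a, reward, s_next):
    """One Q-learning step. A negative `reward` is the entire
    computational content of a 'punishment' here: it lowers the
    estimated value of taking action `a` in state `s`."""
    td_error = reward + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# An aversive event: the agent took action 1 in state 2 and was 'hurt'.
err = td_update(s=2, a=1, reward=-10.0, s_next=3)
print(f"TD error (the 'disappointment' signal): {err:.2f}")
```

The TD error computed here is the quantity usually analogized to reward-prediction-error signalling in animal brains, which is why the comparison to pain arises at all. The open question is whether anything about this update, or a more elaborate descendant of it, can constitute an experience that matters.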
TBC
4 Computational definitions of blame
Everitt et al. (2022) and Halpern and Kleiman-Weiner (2018) seem to be in this domain, as noted in the causality and agency snippet.
5 Incoming
Joscha Bach, From elementary computation to ethics? (Hear also his interview)
> The disturbances and the performance of the mind are measured and controlled with a system of rewards and constraints. Because of the mind’s generality, it may find that the easiest way of regulating the disturbances that gave rise to its engagement would be to change the representation of the disturbance. In most cases, this amounts to anesthesia and will not serve the telos of the organism, so evolution has created considerable barriers to prevent write access of the mind to its governing mechanisms. These barriers are responsible for creating the mind’s identification with the rewards and self-model.
>
> When we are concerned about suffering, we are usually referring to disturbances that generate a strong negative reward, but cannot be resolved within the given constraints. On an individual level, disease, mishap and crisis lead to suffering. But we also have a global suffering that is caused by the universal maladaption of humans to present human society, which developed within few generations and deviates considerably from the small-tribe collective environment that we are evolutionary adapted for. […]
Humans experience morality viscerally; a famous example is Haidt’s moral foundations model, but there are others.
What collective moralities are possible? I think about them as *moral orbits*.
Karl Friston and the other predictive-coding theorists are, IMO, implicitly involved via the theory of motivation (Miller Tate 2021).
Grosse et al. (2023) is a magisterial study of LLMs which looks at how they reason from examples. The reason I think this is significant for computational morality is that it suggests a case-based morality might be feasible for LLMs.
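For reference, the classical influence-function quantity that Grosse et al. scale up (the standard textbook form, not their EK-FAC machinery): the estimated effect of up-weighting a training example $z_m$ on a measurement $f$ of the trained model is

$$
\mathcal{I}_f(z_m) \;=\; -\,\nabla_{\theta} f(\theta^{\star})^{\top}\, \mathbf{H}^{-1}\, \nabla_{\theta} \mathcal{L}(z_m, \theta^{\star}),
\qquad
\mathbf{H} = \nabla_{\theta}^{2} J(\theta^{\star}),
$$

where $J$ is the training objective. Read morally, it estimates which precedent cases most moved a given judgement, which is roughly the bookkeeping a case-based (casuistic) morality would need.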
Need to find a compact statement of what Professor Javen Qinfeng Shi said in a presentation I saw:

> Mind is a choice maker. Choices shape the mind.
- Q learning: do what a good/kind person would do (moment to moment), learn wisdom (the V function), and have faith in the future and in self-growth. It naturally leads to optimal long-term cumulative reward (the Bellman equation).
- Policy gradient: learn from past successes (to repeat or mimic) and mistakes (to avoid). Requires complete episodes to reveal the final cumulative reward per episode (see the sketch after this list).
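A minimal sketch of the contrast as I understood it (toy code, my reconstruction, not Shi’s slides): the Q-learning update bootstraps from a value estimate at every step, while the REINFORCE policy gradient waits for a complete episode and then reinforces whatever led to the realized return. `grad_log_pi` is a placeholder the caller would supply for the policy’s score function.

```python
import numpy as np

gamma = 0.99  # discount factor

def q_learning_step(Q, s, a, r, s_next, alpha=0.1):
    """Moment-to-moment update: bootstrap from the current value
    estimate (the 'wisdom' V(s') ~ max_a Q(s', a)) instead of
    waiting to see how the episode ends."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def reinforce_update(theta, episode, grad_log_pi, lr=0.01):
    """Episode-level update: needs the complete trajectory so the
    realized return can label past actions as successes to repeat
    or mistakes to avoid.
    `episode` is a list of (state, action, reward) triples;
    `grad_log_pi(theta, s, a)` returns grad_theta log pi_theta(a|s)."""
    G, returns = 0.0, []
    for (_, _, r) in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (s, a, _), G_t in zip(episode, returns):
        theta = theta + lr * G_t * grad_log_pi(theta, s, a)
    return theta
```

Whether “do what a good person would do, moment to moment” really maps onto the bootstrapped update rather than the episodic one is exactly the claim flagged below as needing a citation.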
This is the first time I have heard of policy gradient as utilitarianism versus Q learning as virtue ethics. Citation needed.
The analogy in Krening (2023) is different. Maybe Shiravand and André (2024)? Govindarajulu, Bringsjord, and Ghosh (2018) lays out the ethical systems but seems oblivious to RL.