Reinforcement learning

November 27, 2014 — July 17, 2023

bandit problems
control
signal processing
stochastic processes
stringology

Here’s an intro to all of machine learning through a historical tale about one particular attempt to teach a machine (not a computer!) to play tic-tac-toe: Donald Michie’s MENACE, which learned the game using matchboxes full of coloured beads.

1 Theory

1.1 Practice


2 Without reward

Ringstrom (2022)

3 Via diffusion

Is Conditional Generative Modeling all you need for Decision-Making? (Ajay et al. 2023)

4 Deep reinforcement learning

Of course, artificial neural networks are a thing in this domain too.

See Andrej Karpathy’s explanation.

Casual concrete example and intro by Mat Kelcey.

The trick is to approximate the state-action value table of Q-learning (the Q-function) with a neural network, trained to match temporal-difference targets rather than stored explicitly.
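
To make that concrete, here is a minimal sketch of the idea; the chain MDP, network sizes, and learning rates are all invented for illustration, and real DQN adds experience replay and a frozen target network, which this omits. It runs the same temporal-difference update twice: first against an explicit Q table, then against a tiny two-layer network standing in for the table.

```python
# Minimal sketch: tabular Q-learning vs. a tiny neural Q-function.
# Toy problem and hyperparameters invented for illustration; not real DQN
# (no replay buffer, no frozen target network).
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP: states 0..4; action 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GAMMA, H, LR = 5, 2, 0.9, 16, 0.05

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, float(done), done

def run(update, episodes=500):
    """Roll out episodes under a uniform-random behaviour policy;
    Q-learning is off-policy, so it still learns the greedy values."""
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            a = int(rng.integers(N_ACTIONS))
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)
            s, t = s2, t + 1

# --- 1. Tabular Q-learning: an explicit state x action value table ---
Q = np.zeros((N_STATES, N_ACTIONS))

def tabular_update(s, a, r, s2, done):
    target = r + (0.0 if done else GAMMA * Q[s2].max())
    Q[s, a] += 0.5 * (target - Q[s, a])   # nudge toward the TD target

# --- 2. Same TD update, with the table replaced by a small network ---
theta = {"W1": rng.normal(0.0, 0.5, (H, N_STATES)), "b1": np.zeros(H),
         "W2": rng.normal(0.0, 0.5, (N_ACTIONS, H)), "b2": np.zeros(N_ACTIONS)}

def qnet(s):
    x = np.eye(N_STATES)[s]                   # one-hot state encoding
    h = np.tanh(theta["W1"] @ x + theta["b1"])
    return theta["W2"] @ h + theta["b2"], (x, h)

def net_update(s, a, r, s2, done):
    target = r + (0.0 if done else GAMMA * qnet(s2)[0].max())
    q, (x, h) = qnet(s)
    err = q[a] - target                       # TD error
    gh = err * theta["W2"][a] * (1 - h**2)    # backprop through tanh
    theta["W2"][a] -= LR * err * h
    theta["b2"][a] -= LR * err
    theta["W1"] -= LR * np.outer(gh, x)
    theta["b1"] -= LR * gh

run(tabular_update)
run(net_update)
print("tabular greedy actions:", Q.argmax(axis=1))
print("network greedy actions:", [int(qnet(s)[0].argmax()) for s in range(N_STATES)])
# Both should come to prefer action 1 (move right) in non-terminal states.
```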


5 Multi agent

With theory of mind. From the Facebook AI announcement of ReBeL:

today we are unveiling Recursive Belief-based Learning (ReBeL), a general RL+Search algorithm that can work in all two-player zero-sum games, including imperfect-information games. ReBeL builds on the RL+Search algorithms like AlphaZero that have proved successful in perfect-information games. Unlike those previous AIs, however, ReBeL makes decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS). In other words, ReBeL can assess the chances that its poker opponent thinks it has, for example, a pair of aces.

By accounting for the beliefs of each player, ReBeL is able to treat imperfect-information games akin to perfect-information games. ReBeL can then leverage a modified RL+Search algorithm that we developed to work with the more complex (higher-dimensional) state and action space of imperfect-information games.
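
To illustrate just the belief-state bookkeeping (not ReBeL’s actual training machinery), here is a toy sketch: the hands, actions, and opponent policy are all invented, and the belief over the opponent’s private hand is updated by Bayes’ rule each time an action is observed.

```python
# Toy public-belief-state update, not ReBeL's actual machinery.
# Hands, actions, and the opponent's policy are invented for illustration.

# Prior belief over the opponent's private hand.
pbs = {"pair_of_aces": 0.05, "medium_pair": 0.35, "junk": 0.60}

# Assumed opponent policy: P(action | private hand).
policy = {
    "pair_of_aces": {"raise": 0.9, "call": 0.1, "fold": 0.0},
    "medium_pair":  {"raise": 0.3, "call": 0.6, "fold": 0.1},
    "junk":         {"raise": 0.1, "call": 0.3, "fold": 0.6},
}

def observe(pbs, action):
    """Bayes update: P(hand | action) is proportional to P(action | hand) P(hand)."""
    posterior = {h: p * policy[h][action] for h, p in pbs.items()}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

pbs = observe(pbs, "raise")
print(pbs)  # probability mass shifts toward strong hands after a raise
```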

6 Incoming

Algorithms for Decision Making (Kochenderfer, Wheeler, and Wray 2022) treats decision making in the sense of reinforcement learning:

This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them.

There is much of interest in it, including multi-agent learning.

7 References

Ajay, Du, Gupta, et al. 2023. “Is Conditional Generative Modeling All You Need for Decision-Making?” In.
Bensoussan, Li, Nguyen, et al. 2020. “Machine Learning and Control Theory.” arXiv:2006.05604 [Cs, Math, Stat].
Brockman, Cheung, Pettersson, et al. 2016. “OpenAI Gym.” arXiv:1606.01540 [Cs].
Clifton, and Laber. 2020. “Q-Learning: Theory and Applications.” Annual Review of Statistics and Its Application.
Dayan, and Watkins. n.d. “Reinforcement Learning.” In Encyclopedia of Cognitive Science.
Drori. 2022a. “Deep Reinforcement Learning.” In The Science of Deep Learning.
———. 2022b. “Reinforcement Learning.” In The Science of Deep Learning.
———. 2022c. The Science of Deep Learning.
Fellows, Mahajan, Rudner, et al. 2019. “VIREL: A Variational Inference Framework for Reinforcement Learning.” In Advances in Neural Information Processing Systems.
Jaakkola, Singh, and Jordan. 1995. “Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.” In Advances in Neural Information Processing Systems.
Kaelbling, Littman, and Moore. 1996. “Reinforcement Learning: A Survey.” Journal of Artificial Intelligence Research.
Kochenderfer, Wheeler, and Wray. 2022. Algorithms for Decision Making.
Korbak, Perez, and Buckley. 2022. “RL with KL Penalties Is Better Viewed as Bayesian Inference.”
Krakovsky. 2016. “Reinforcement Renaissance.” Commun. ACM.
Krishnamurthy, Agarwal, and Langford. 2016. “Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations.” arXiv:1602.02722 [Cs, Stat].
Lehman, Gordon, Jain, et al. 2022. “Evolution Through Large Models.”
Levine. 2018. “Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review.” arXiv:1805.00909 [Cs, Stat].
Mania, Guy, and Recht. 2018. “Simple Random Search Provides a Competitive Approach to Reinforcement Learning.” arXiv:1803.07055 [Cs, Math, Stat].
Mukherjee, and Liu. 2023. “Bridging Physics-Informed Neural Networks with Reinforcement Learning: Hamilton-Jacobi-Bellman Proximal Policy Optimization (HJBPPO).”
Parisotto, and Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs].
Pfau, and Vinyals. 2016. “Connecting Generative Adversarial Networks and Actor-Critic Methods.” arXiv:1610.01945 [Cs, Stat].
Ren, Zhang, Lee, et al. 2023. “Spectral Decomposition Representation for Reinforcement Learning.”
Ringstrom. 2022. “Reward Is Not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning.”
Salimans, Ho, Chen, et al. 2017. “Evolution Strategies as a Scalable Alternative to Reinforcement Learning.” arXiv:1703.03864 [Cs, Stat].
Schulman, Wolski, Dhariwal, et al. 2017. “Proximal Policy Optimization Algorithms.”
Shibata, Yoshinaka, and Chikayama. 2006. “Probabilistic Generalization of Simple Grammars and Its Application to Reinforcement Learning.” In Algorithmic Learning Theory. Lecture Notes in Computer Science 4264.
Silver, Singh, Precup, et al. 2021. “Reward Is Enough.” Artificial Intelligence.
Sutton, Richard S., and Barto. 1998. Reinforcement Learning.
Sutton, Richard S., and Barto. 2018. Reinforcement Learning, second edition: An Introduction.
Sutton, Richard S., McAllester, Singh, et al. 2000. “Policy Gradient Methods for Reinforcement Learning with Function Approximation.” In Advances in Neural Information Processing Systems.
Thrun. 1992. “Efficient Exploration In Reinforcement Learning.”