Differentiable learning of automata

October 14, 2016 — March 31, 2025

compsci
machine learning
making things
neural nets

Learning stack machines, random access machines, nested hierarchical parsing machines, Turing machines, and whatever other automata-with-memory you wish, from data. In other words, teaching computers to program themselves via a deep learning formalism.

Figure 1

This is an obvious idea, indeed one of the original ideas from the GOFAI days. How to make it work in NNs is not obvious, but there are some charming toy examples. Presumably a hypothetical superhuman Artificial General Intelligence would be good at handling computer-science problems.

Directly learning automata is not the absolute hippest research area right now, though, on account of being hard in general, although some progress has been made. I feel that most of the hyped research that looks like differentiable computer learning is either in the slightly better-contained area of reinforcement learning, where more progress can be made, or in the hot area of large language models, which are harder to explain but solve the same kinds of problems while looking different inside.

Related: grammatical inference, memory machines, computational complexity results in NNs, reasoning with LLMs.

1 Differentiable Cellular Automata

Mordvintsev et al. (2020) is a fun paper. They improve upon boring old-school cellular automata in several ways (not all of which are individually novel, but the combination seems to be):

  1. Continuous states whose rules can be differentiably learned
  2. Use of Sobel filters so that the CA perceives local gradients (see the sketch after the figure)
  3. Framing the problem as “designing attractors of a dynamical system”
  4. Clever use of noise in the training

Figure 2: image from Growing Neural Cellular Automata
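A minimal sketch of one such update step in PyTorch, under my own assumptions about names and sizes (the paper’s actual implementation also has alive-masking and a sample pool of damaged states, omitted here): each cell carries a continuous state vector, perceives its neighbourhood through fixed identity and Sobel kernels, and feeds the result through a small learned network, with a random per-cell update mask supplying the training noise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralCA(nn.Module):
    """Growing-NCA-style update rule (channel and hidden sizes are illustrative)."""

    def __init__(self, channels=16, hidden=128, fire_rate=0.5):
        super().__init__()
        self.channels = channels
        self.fire_rate = fire_rate
        # Fixed perception kernels: identity + Sobel x + Sobel y, per channel.
        ident = torch.tensor([[0.0, 0, 0], [0, 1, 0], [0, 0, 0]])
        sobel_x = torch.tensor([[-1.0, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])      # (3, 3, 3)
        kernels = kernels.repeat(channels, 1, 1)[:, None]         # (3C, 1, 3, 3)
        self.register_buffer("kernels", kernels)
        # Learned per-cell update: 1x1 convolutions act as a tiny per-cell MLP.
        last = nn.Conv2d(hidden, channels, 1, bias=False)
        nn.init.zeros_(last.weight)   # start as the do-nothing CA
        self.update = nn.Sequential(
            nn.Conv2d(3 * channels, hidden, 1), nn.ReLU(), last,
        )

    def forward(self, x):
        # x: (batch, channels, H, W) continuous cell states.
        perception = F.conv2d(x, self.kernels, padding=1, groups=self.channels)
        dx = self.update(perception)
        # Stochastic update: each cell fires independently with prob fire_rate.
        fire = torch.rand(x.shape[0], 1, *x.shape[2:], device=x.device)
        return x + dx * (fire < self.fire_rate).float()
```

Training then amounts to unrolling this step for a random number of iterations and backpropagating the chosen loss through the whole rollout.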

I am instinctively annoyed by the unfashionable loss function, which is not any kind of cool optimal transport metric, even though that is what this problem looks to me like it wants. But hey, it works, so don’t listen to me about that. Logical extensions also leap out at me, such as a model which can produce different patterns parametrically and interpolate between them. I am also curious about an information bottleneck analysis, which would give us restrictions on what patterns can be learned.
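For concreteness, this is the kind of alternative I mean, purely my own illustration rather than anything from the paper: treat the target and the CA output as mass distributions over pixel coordinates and score them with an entropically regularised optimal transport (Sinkhorn) cost, which remains differentiable end to end.

```python
import torch

def sinkhorn_image_loss(pred, target, eps=0.05, n_iter=50):
    """Entropic OT cost between two non-negative single-channel images.

    pred, target: (H, W) tensors, treated as mass distributions over pixel
    locations. Illustrative only: O((H*W)^2) memory, so keep images small.
    """
    H, W = pred.shape
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W),
                            indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (HW, 2)
    cost = torch.cdist(coords, coords) ** 2                     # (HW, HW)

    a = pred.flatten().clamp_min(1e-8)
    a = a / a.sum()
    b = target.flatten().clamp_min(1e-8)
    b = b / b.sum()

    # Sinkhorn iterations on the dual potentials, in the log domain.
    log_K = -cost / eps
    f = torch.zeros_like(a)
    g = torch.zeros_like(b)
    for _ in range(n_iter):
        f = eps * (torch.log(a) - torch.logsumexp(log_K + g / eps, dim=1))
        g = eps * (torch.log(b) - torch.logsumexp(log_K.t() + f / eps, dim=1))
    plan = torch.exp((f[:, None] + g[None, :] - cost) / eps)    # transport plan
    return (plan * cost).sum()
```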

More general models of morphogenesis are out there, obviously. I will not even touch upon how this happens in real creatures as opposed to fake emoji monsters for now.

Jonathan Whitaker, in Fun With Neural Cellular Automata, shows how to make steerable automata (Mordvintsev and Niklasson 2021; Randazzo, Mordvintsev, and Fouts 2023), which end up being like steerable diffusions.

TBD: connection to bio computing, and to the particular special case of models of pattern formation.

In Differentiable Logic CA: from Game of Life to Pattern Generation they push this all the way to a differentiable version of Conway’s Game of Life, and thus credible Turing completeness, by learning the update rule as a circuit of binary logic gates through a continuous relaxation, so that the trained rule is genuinely discrete.
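As I understand the trick (sketched below with my own names, no claim to match their code), each gate in the learned circuit is, during training, a softmax-weighted mixture of all 16 two-input Boolean functions evaluated on relaxed inputs in [0, 1]; at inference you keep only the argmax gate, leaving an honest discrete circuit.

```python
import torch
import torch.nn as nn

def soft_binary_ops(a, b):
    """All 16 two-input Boolean functions, relaxed to probabilities in [0, 1].

    a, b are tensors of 'probability of being 1'; each entry is the
    probabilistic relaxation of the corresponding truth table.
    """
    return torch.stack([
        torch.zeros_like(a),        # FALSE
        a * b,                      # AND
        a - a * b,                  # a AND NOT b
        a,                          # a
        b - a * b,                  # NOT a AND b
        b,                          # b
        a + b - 2 * a * b,          # XOR
        a + b - a * b,              # OR
        1 - (a + b - a * b),        # NOR
        1 - (a + b - 2 * a * b),    # XNOR
        1 - b,                      # NOT b
        1 - b + a * b,              # a OR NOT b
        1 - a,                      # NOT a
        1 - a + a * b,              # NOT a OR b
        1 - a * b,                  # NAND
        torch.ones_like(a),         # TRUE
    ], dim=-1)                      # (..., 16)

class SoftLogicGate(nn.Module):
    """One learnable gate: a softmax mixture over the 16 ops while training,
    the single argmax op once discretized."""

    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(16))

    def forward(self, a, b, hard=False):
        ops = soft_binary_ops(a, b)                        # (..., 16)
        if hard:
            return ops[..., self.logits.argmax()]          # discrete circuit
        return (ops * torch.softmax(self.logits, dim=0)).sum(-1)
```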

Figure 3: Differentiable pointers

2 Are LLMs even Turing complete, bro?

3 Incoming

Blazek and Lin claim their neural networks implement predicate logic directly and yet remain tractable, which would be interesting to look into (Blazek and Lin 2021, 2020; Blazek, Venkatesh, and Lin 2021).

Google branded: Differentiable neural computers.

Christopher Olah’s characteristically pedagogic intro.

Adrian Colyer’s introduction to neural Turing machines.

Andrej Karpathy’s memory machine list.
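The common thread in those write-ups is soft, content-based addressing: instead of a hard pointer, the controller emits a key and reads a similarity-weighted average over every memory slot, which keeps the whole machine differentiable. A minimal sketch with my own variable names (NTM-style cosine addressing, leaving out the shift and sharpening machinery):

```python
import torch
import torch.nn.functional as F

def content_address(memory, key, beta):
    """Soft read from memory by content similarity (NTM-style).

    memory: (N, D) matrix of N slots; key: (D,) query emitted by the
    controller; beta: scalar >= 0 sharpening the focus.
    Returns the attention weights over slots and the blended read vector.
    """
    sims = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)   # (N,)
    weights = torch.softmax(beta * sims, dim=0)                    # (N,)
    read = weights @ memory                                        # (D,)
    return weights, read

# Toy usage: a fuzzy "pointer" into 8 slots of 4-dimensional memory.
memory = torch.randn(8, 4)
key = memory[3] + 0.1 * torch.randn(4)      # noisy query for slot 3
weights, read = content_address(memory, key, beta=10.0)
print(weights.argmax().item())              # most mass usually lands on slot 3
```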

Facebook’s GTN:

GTN is an open source framework for automatic differentiation with a powerful, expressive type of graph called weighted finite-state transducers (WFSTs). Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs. AI researchers and engineers can use GTN to more effectively train graph-based machine learning models.
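I have not verified GTN’s actual API here, so the following is a library-free sketch of the underlying idea only: the total (forward) score of a weighted automaton in the log semiring is built from additions and logsumexps over arc weights, hence differentiable in those weights and usable directly as a training loss.

```python
import torch

def forward_score(num_nodes, arcs, start, accept):
    """Total log-weight of all accepting paths in a weighted automaton.

    arcs: list of (src, dst, weight) with weight a scalar torch tensor in the
    log domain. Assumes an acyclic graph with topologically ordered nodes
    (src < dst for every arc). Differentiable in the arc weights.
    """
    NEG_INF = torch.tensor(float("-inf"))
    alpha = [NEG_INF] * num_nodes
    alpha[start] = torch.tensor(0.0)
    for node in range(num_nodes):
        incoming = [alpha[s] + w for (s, d, w) in arcs if d == node]
        if incoming:
            alpha[node] = torch.logsumexp(
                torch.stack(incoming + [alpha[node]]), dim=0)
    return alpha[accept]

# Toy usage: two paths 0->1->2 and 0->2, learnable arc weights.
w = torch.nn.Parameter(torch.tensor([0.1, 0.2, 0.3]))
arcs = [(0, 1, w[0]), (1, 2, w[1]), (0, 2, w[2])]
score = forward_score(3, arcs, start=0, accept=2)
score.backward()        # gradients flow back into the arc weights
print(w.grad)
```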

4 References

Blazek, and Lin. 2020. “A Neural Network Model of Perception and Reasoning.” arXiv:2002.11319 [Cs, q-Bio].
———. 2021. “Explainable Neural Networks That Simulate Reasoning.” Nature Computational Science.
Blazek, Venkatesh, and Lin. 2021. “Deep Distilling: Automated Code Generation Using Explainable Deep Learning.” arXiv:2111.08275 [Cs].
Bottou. 2011. “From Machine Learning to Machine Reasoning.” arXiv:1102.1808 [Cs].
Bubeck, Chandrasekaran, Eldan, et al. 2023. “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.”
Clark, Tafjord, and Richardson. 2020. “Transformers as Soft Reasoners over Language.” In IJCAI 2020.
Dehghani, Gouws, Vinyals, et al. 2019. “Universal Transformers.”
Dziri, Lu, Sclar, et al. 2023. “Faith and Fate: Limits of Transformers on Compositionality.”
Ellis, Solar-Lezama, and Tenenbaum. 2016. “Sampling for Bayesian Program Learning.” In Advances in Neural Information Processing Systems 29.
Garcez, and Lamb. 2020. “Neurosymbolic AI: The 3rd Wave.”
Graves, Wayne, and Danihelka. 2014. “Neural Turing Machines.” arXiv:1410.5401 [Cs].
Graves, Wayne, Reynolds, et al. 2016. “Hybrid Computing Using a Neural Network with Dynamic External Memory.” Nature.
Grefenstette, Hermann, Suleyman, et al. 2015. “Learning to Transduce with Unbounded Memory.” arXiv:1506.02516 [Cs].
Gulcehre, Chandar, Cho, et al. 2016. “Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes.” arXiv:1607.00036 [Cs].
Hannun, Pratap, Kahn, et al. 2020. “Differentiable Weighted Finite-State Transducers.” arXiv:2010.01003 [Cs, Stat].
Ibarz, Kurin, Papamakarios, et al. 2022. “A Generalist Neural Algorithmic Learner.”
Ikeda. 1989. “Decentralized Control of Large Scale Systems.” In Three Decades of Mathematical System Theory: A Collection of Surveys at the Occasion of the 50th Birthday of Jan C. Willems. Lecture Notes in Control and Information Sciences.
Jaitly, Sussillo, Le, et al. 2015. “A Neural Transducer.” arXiv:1511.04868 [Cs].
Kaiser, and Sutskever. 2015. “Neural GPUs Learn Algorithms.” arXiv:1511.08228 [Cs].
Kim, and Bassett. 2022. “A Neural Programming Language for the Reservoir Computer.” arXiv:2203.05032 [Cond-Mat, Physics:nlin].
Kleinberg, and Mullainathan. 2024. “Language Generation in the Limit.”
Lai, Domke, and Sheldon. 2022. “Variational Marginal Particle Filters.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics.
Lamb, Garcez, Gori, et al. 2020. “Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective.” In IJCAI 2020.
Lample, and Charton. 2019. “Deep Learning for Symbolic Mathematics.” arXiv:1912.01412 [Cs].
Looks, Herreshoff, Hutchins, et al. 2017. “Deep Learning with Dynamic Computation Graphs.” In Proceedings of ICLR.
Mordvintsev, and Niklasson. 2021. “μNCA: Texture Generation with Ultra-Compact Neural Cellular Automata.”
Mordvintsev, Randazzo, Niklasson, et al. 2020. “Growing Neural Cellular Automata.” Distill.
Niklasson, Mordvintsev, Randazzo, et al. 2021. “Self-Organising Textures.” Distill.
Pajouheshgar, Xu, Mordvintsev, et al. 2023. “Mesh Neural Cellular Automata.”
Peng, Narayanan, and Papadimitriou. 2024. “On Limitations of the Transformer Architecture.”
Pereira. 2002. “Formal Grammar and Information Theory: Together Again?” In Current Issues in Linguistic Theory.
Perez, and Liu. 2016. “Gated End-to-End Memory Networks.” arXiv:1610.04211 [Cs, Stat].
Putzky, and Welling. 2017. “Recurrent Inference Machines for Solving Inverse Problems.” arXiv:1706.04008 [Cs].
Randazzo, Mordvintsev, and Fouts. 2023. “Growing Steerable Neural Cellular Automata.” In.
Randazzo, Mordvintsev, Niklasson, et al. 2020. “Self-Classifying MNIST Digits.” Distill.
Schuurmans, Dai, and Zanini. 2024. “Autoregressive Large Language Models Are Computationally Universal.”
Veličković, and Blundell. 2021. “Neural Algorithmic Reasoning.” Patterns.
Wang, Xin, Chen, and Zhu. 2021. “A Survey on Curriculum Learning.”
Wang, Junxiong, Gangavarapu, Yan, et al. 2024. “MambaByte: Token-Free Selective State Space Model.”
Wang, Cheng, and Niepert. 2019. “State-Regularized Recurrent Neural Networks.”
Wei, Fan, Carin, et al. 2017. “An Inner-Loop Free Solution to Inverse Problems Using Deep Neural Networks.” arXiv:1709.01841 [Cs].
Weston, Chopra, and Bordes. 2014. “Memory Networks.” arXiv:1410.3916 [Cs, Stat].
Wu, Tan, Wang, et al. 2024. “Beyond Language Models: Byte Models Are Digital World Simulators.”
Zhang, Backurs, Bubeck, et al. 2022. “Unveiling Transformers with LEGO: A Synthetic Reasoning Task.”