At training time, a neural net stores something like memories in its weights. How should learning algorithms best store and retrieve memories at inference time?
As my colleague Tom Blau points out, this is perhaps best considered a topic in its own right. Is it distinct from continual learning?
Memory is implicit in recurrent networks. One of the chief advantages of neural Turing machines is that they make this need explicit. A great trick of transformers is that a notion of what to “remember” is baked into their context window, in the sense of what they attend to, although it is not so clear how to generalise this. Behrouz, Zhong, and Mirrokni (2024) introduce learnable longer-term memory into transformers.
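As a rough illustration of what “explicit” memory retrieval means here, a minimal sketch of content-based addressing follows: a softmax-weighted read from a memory matrix, which is the operation that NTM read heads and transformer attention share in spirit. The function and variable names are my own, not from any of the cited papers.

```python
# Minimal sketch of a content-based memory read (assumed names, not a
# reference implementation of NTMs or transformers).
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def content_read(memory, query, beta=1.0):
    """Read from `memory` (n_slots x d) by similarity to `query` (d,).

    Attention weights are a softmax over scaled dot-product similarities;
    the returned vector is a convex combination of memory slots.
    """
    scores = beta * memory @ query / np.sqrt(memory.shape[1])
    weights = softmax(scores)
    return weights @ memory, weights

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 4))            # 8 memory slots of dimension 4
q = M[3] + 0.1 * rng.normal(size=4)    # a noisy cue resembling slot 3
value, w = content_read(M, q, beta=2.0)
print(np.round(w, 2))                  # weights should concentrate near slot 3
```

The sharpness parameter `beta` plays the role of the NTM key strength (or, inversely, the attention temperature): larger values make the read more nearly one-hot, smaller values blur it across slots.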
References
Grefenstette, Hermann, Suleyman, et al. 2015. “Learning to Transduce with Unbounded Memory.” arXiv:1506.02516 [cs].
Hochreiter, and Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation.
Nagathan, Mungara, and Manimozhi. 2014. “Content-Based Image Retrieval System Using Feed-Forward Backpropagation Neural Network.” International Journal of Computer Science and Network Security (IJCSNS).
Patraucean, Handa, and Cipolla. 2015. “Spatio-Temporal Video Autoencoder with Differentiable Memory.” arXiv:1511.06309 [cs].
Perez, and Liu. 2016. “Gated End-to-End Memory Networks.” arXiv:1610.04211 [cs, stat].
Voelker, Kajic, and Eliasmith. n.d. “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks.”
Weston, Chopra, and Bordes. 2014. “Memory Networks.” arXiv:1410.3916 [cs, stat].
Zhan, Xie, Mao, et al. 2022. “Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models.” In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. CIKM ’22.