Neural nets with basis decomposition layers
March 9, 2021 — February 1, 2022
Neural networks incorporating basis decompositions.
Why might you want to do this? For one, it is a different lens through which to analyse neural nets’ mysterious success. For another, it gives you interpolation for free. Also, this idea is part of the connection between neural nets and low rank GPs. There are possibly other reasons — perhaps the right basis gives you better priors for understanding a partial differential equation? Or something else?
1 Unrolling: Implementing sparse coding using neural nets
Often credited to Gregor and LeCun (2010), this trick imagines each step of an iterative sparse coding optimisation as a layer in a neural net and then learns the parameters of each such step, giving you, in effect, a way of learning sparse bases whose codes are optimally fast to compute. This has been taken a long way by, e.g. Monga, Li, and Eldar (2021).
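A minimal sketch of what such an unrolled sparse coder might look like in PyTorch (the architecture and hyperparameters are illustrative, loosely in the spirit of LISTA, not a faithful reproduction of any particular paper):

```python
import torch
import torch.nn as nn

def soft_threshold(u, theta):
    # proximal operator of the L1 penalty (shrinkage)
    return torch.sign(u) * torch.relu(u.abs() - theta)

class UnrolledSparseCoder(nn.Module):
    """Each ISTA-style iteration becomes a layer with its own learnable weights."""
    def __init__(self, n_dim, n_atoms, n_layers=5):
        super().__init__()
        self.W = nn.Linear(n_dim, n_atoms, bias=False)    # plays the role of step * D^T
        self.S = nn.Linear(n_atoms, n_atoms, bias=False)  # plays the role of I - step * D^T D
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))  # per-layer thresholds
        self.n_layers = n_layers

    def forward(self, x):
        b = self.W(x)
        z = soft_threshold(b, self.theta[0])
        for k in range(1, self.n_layers):
            z = soft_threshold(b + self.S(z), self.theta[k])
        return z

# Trained end-to-end, a handful of such layers stands in for many plain ISTA iterations.
model = UnrolledSparseCoder(n_dim=64, n_atoms=256, n_layers=5)
codes = model(torch.randn(8, 64))  # (batch, n_atoms) sparse codes
```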
2 Convolutional neural networks as sparse coding
Elad and Papyan and others have a miniature school of Deep Learning analysis based on Multi Layer Convolutional Sparse Coding (Papyan, Romano, and Elad 2017; Papyan et al. 2018; Papyan, Sulam, and Elad 2017; Sulam et al. 2018). The argument here is that essentially Convnets are already solving sparse coding problems; they just don’t know it. They argue:
The recently proposed multilayer convolutional sparse coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of convolutional neural networks (CNNs). Under this framework, the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors from a given input signal. …Our work represents a bridge between matrix factorization, sparse dictionary learning, and sparse autoencoders, and we analyse these connections in detail.
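To make that claim concrete, here is a toy sketch (notation and shapes are mine, not theirs) of how a layered non-negative thresholding pursuit for the nested sparse codes looks exactly like a conv + bias + ReLU forward pass:

```python
import torch
import torch.nn.functional as F

def layered_thresholding_pursuit(x, dictionaries, thresholds):
    # ML-CSC-style reading: x ~ D1 @ z1, z1 ~ D2 @ z2, with z1, z2 sparse.
    # Estimating each code by non-negative soft thresholding of the correlations
    # is just conv -> subtract threshold (bias) -> ReLU.
    z = x
    for D, theta in zip(dictionaries, thresholds):
        z = F.relu(F.conv2d(z, D) - theta)
    return z

x = torch.randn(1, 3, 32, 32)
D1 = torch.randn(16, 3, 3, 3)   # first "dictionary" = first conv layer's filters
D2 = torch.randn(32, 16, 3, 3)  # second dictionary, acting on the first layer's codes
codes = layered_thresholding_pursuit(x, [D1, D2], thresholds=[0.1, 0.1])
```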
However, as interesting as this sounds, I am not deeply engaged with it, since this does not solve any immediate problems for me.
3 Continuous basis functions
Convnets require a complete rasterised grid, but signals are often not observed on a regular grid. This is precisely the problem of signal sampling. With basis functions of continuous support and a few assumptions, it is tempting to imagine we can get neural networks which operate in a continuous space. Can I use continuous bases in the computation of a neural net? If so, this could be useful in things like learning PDEs. The virtue of these things is that they do not depend (much?) upon the scale of some grid. Possibly this naturally lets us sample the problem very sparsely. It also might allow us to interpolate sparse solutions. In addition, analytic basis functions are easy to differentiate; we can use autodiff to find their local spatial gradients, even through deep compositions.
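As a toy illustration of that last point (everything here is illustrative): represent a signal as an expansion in analytic basis functions, and the spatial derivative falls out of autodiff with respect to the evaluation coordinate, at whatever locations we like.

```python
import math
import torch

def basis_expansion(t, coefs):
    # u(t) = sum_k c_k sin(2*pi*k*t): a toy analytic basis expansion
    k = torch.arange(1, coefs.shape[0] + 1, dtype=t.dtype)
    return (coefs * torch.sin(2 * math.pi * k * t[..., None])).sum(-1)

coefs = torch.randn(8)
t = torch.rand(50, requires_grad=True)       # arbitrary, irregularly spaced locations
u = basis_expansion(t, coefs)
(du_dt,) = torch.autograd.grad(u.sum(), t)   # du/dt at each sample location
```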
There are various other ways to do native interpolation. One hack uses the implicit representation method, a clever trick in which we reuse the autodiff machinery to calculate gradients with respect to the output index; but that is not plausible for every problem, and sometimes something better behaved, such as a basis function representation, is more helpful.
Specifically, I would like to do Bayesian inference, which looks extremely hard through an implicit net but only very hard through a basis decomposition.
In practice, how would I do this?
With a well-known basis, such as an orthogonal polynomial or Fourier basis, creating a layer which encodes your input is easy; after all, that is just an inner product. That is what methods like that of Li et al. (2020) exploit.
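For instance, a layer that encodes sampled signal values into Fourier coefficients might look like the following sketch (on a regular grid the projection collapses to an inner product, or an FFT; here I fall back to least squares because the samples are irregular; all names are illustrative):

```python
import math
import torch

def fourier_design_matrix(t, n_freq):
    # evaluate a real Fourier basis at (possibly irregular) sample locations t in [0, 1)
    k = torch.arange(1, n_freq + 1, dtype=t.dtype)
    angles = 2 * math.pi * t[:, None] * k[None, :]
    return torch.cat([torch.ones_like(t)[:, None], torch.cos(angles), torch.sin(angles)], dim=-1)

class FourierEncode(torch.nn.Module):
    """Encode sampled signal values (t_i, y_i) into basis coefficients."""
    def __init__(self, n_freq=16):
        super().__init__()
        self.n_freq = n_freq

    def forward(self, t, y):
        Phi = fourier_design_matrix(t, self.n_freq)       # (n_samples, n_coefs)
        c = torch.linalg.lstsq(Phi, y[:, None]).solution  # fit coefficients
        return c.squeeze(-1)                              # interpolate via Phi(t_new) @ c

# Usage: irregular sample locations in, a continuous (interpolatable) representation out.
t = torch.sort(torch.rand(40)).values
y = torch.sin(2 * math.pi * 3 * t)
coefs = FourierEncode(n_freq=8)(t, y)
```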
More general, non-orthogonal bases, such as sparse/overcomplete frames, might require solving a complicated sparse optimisation problem inside the network.
One approach is presumably to solve the basis problem in implicit layers.¹ Differentiable Convex Optimization Layers introduces cvxpylayers; perhaps that does some of the work we want?
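Here is a minimal sketch of what a differentiable sparse coding layer might look like with cvxpylayers; the lasso formulation, dimensions, and names are my own illustration, not taken from that paper:

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n_dim, n_atoms = 32, 64

# z*(x, D, lam) = argmin_z ||D z - x||_2 + lam * ||z||_1
z = cp.Variable(n_atoms)
D = cp.Parameter((n_dim, n_atoms))   # dictionary, learnable upstream
x = cp.Parameter(n_dim)              # input signal
lam = cp.Parameter(nonneg=True)      # sparsity penalty
problem = cp.Problem(cp.Minimize(cp.norm(D @ z - x, 2) + lam * cp.norm(z, 1)))
assert problem.is_dpp()

sparse_code_layer = CvxpyLayer(problem, parameters=[D, x, lam], variables=[z])

# Forward and backward: gradients flow through the argmin back to the dictionary.
D_t = torch.randn(n_dim, n_atoms, requires_grad=True)
x_t = torch.randn(n_dim)
lam_t = torch.tensor(0.1)
z_star, = sparse_code_layer(D_t, x_t, lam_t)
z_star.sum().backward()   # populates D_t.grad
```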
I would probably not attempt to learn an arbitrary sparse basis dictionary in this context, because that does not interpolate naturally, but I can imagine learning a parametric sparse dictionary, for example one defined by a simple family of functions such as decaying sinusoids.
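A hypothetical sketch of such a parametric dictionary (the parameterisation is mine, purely for illustration); because each atom is an analytic function of the coordinate, the dictionary interpolates off-grid for free and could be fed to something like the sparse coding layer above:

```python
import math
import torch
import torch.nn as nn

class DecayingSinusoidDictionary(nn.Module):
    """Atoms phi_j(t) = exp(-a_j * t) * sin(2*pi*f_j*t + p_j), with a_j, f_j, p_j learnable."""
    def __init__(self, n_atoms=32):
        super().__init__()
        self.log_decay = nn.Parameter(torch.zeros(n_atoms))
        self.freq = nn.Parameter(10 * torch.rand(n_atoms))
        self.phase = nn.Parameter(torch.zeros(n_atoms))

    def forward(self, t):
        # evaluate every atom at arbitrary sample locations t: (n_samples, n_atoms)
        decay = torch.exp(-torch.exp(self.log_decay) * t[:, None])
        return decay * torch.sin(2 * math.pi * self.freq * t[:, None] + self.phase)

D = DecayingSinusoidDictionary(n_atoms=16)(torch.linspace(0, 1, 100))  # dictionary matrix
```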
How would wavelet decompositions fit in here?
4 References
Footnotes
¹ Not to be confused with implicit representation layers, which are completely different.