Rough path theory and signature methods

April 2, 2021 — April 30, 2024

control
dynamical systems
SDEs
signal processing
sparser than thou
statistics
stochastic processes
time series

I am not sure yet what this is. Do they mean rough in the sense of approximate or the sense of not smooth? Or maybe both?

Seems to originate in a fairly impenetrable body of work by Lyons, e.g. T. Lyons (1994), but the modern recommendation is to start with something more approachable: Friz and Hairer (2020), available free online, is an introduction covering the simplest (?) case of Gaussian noise.

1 Rough differential equations

Try Morrill et al. (2021)?

2 Discrete approximation

Wong-Zakai approximations: Twardowska (1996). (A Martin Hairer recommendation.)

Possibly compact refs: (Kelly 2016; Kelly and Melbourne 2014).
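
As a concrete (and entirely unofficial) illustration of the theorem, not taken from any of the references above: for the scalar Stratonovich SDE \(\mathrm{d}X = X \circ \mathrm{d}W\), the ODE driven by a piecewise-linear interpolation of a Brownian path should land on the Stratonovich solution \(X_0 e^{W_T}\) rather than the Itô solution \(X_0 e^{W_T - T/2}\). A minimal NumPy sketch:

```python
# A minimal sketch (mine, not from the cited papers) of the Wong-Zakai idea:
# the ODE dx/dt = x * dw/dt, driven by piecewise-linear interpolations of a
# Brownian path, approximates the Stratonovich solution x0*exp(W_T),
# not the Itô solution x0*exp(W_T - T/2).
import numpy as np

rng = np.random.default_rng(0)
T, n_fine = 1.0, 2**14
t = np.linspace(0.0, T, n_fine + 1)
dW = rng.normal(0.0, np.sqrt(T / n_fine), size=n_fine)
W = np.concatenate([[0.0], np.cumsum(dW)])          # one fine Brownian path

x0 = 1.0
print(f"Stratonovich solution: {x0 * np.exp(W[-1]):.4f}")
print(f"Itô solution:          {x0 * np.exp(W[-1] - T / 2):.4f}")

for n_coarse in [2**4, 2**6, 2**8]:
    # piecewise-linear interpolation of W through the coarse grid points
    t_coarse = np.linspace(0.0, T, n_coarse + 1)
    W_pl = np.interp(t, t_coarse, W[:: n_fine // n_coarse])
    # solve the ODE with explicit Euler on the fine grid
    x = x0
    for k in range(n_fine):
        x += x * (W_pl[k + 1] - W_pl[k])
    print(f"ODE driven by piecewise-linear W, {n_coarse:4d} segments: {x:.4f}")
```

Every coarse interpolation already lands near the Stratonovich value; the Itô value is off by the factor \(e^{-T/2}\), which is the whole point of the Wong-Zakai correction.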

3 In learning

Hodgkinson, Roosta, and Mahoney (2021) makes use of rough path integrals to justify learning by the adjoint method in stochastic differential equations. Cass and Salvi (2024) is a friendly introduction to this area.

4 Signatures

Chevyrev and Kormilitzin (2016) discusses path signatures in particular, a construct from the theory about which I know little. Bonnier et al. (2019) summarises:

When data is ordered sequentially then it comes with a natural path-like structure: the data may be thought of as a discretisation of a path \(X:[0,1] \rightarrow V\), where \(V\) is some Banach space. In practice we shall always take \(V=\mathbb{R}^d\) for some \(d \in \mathbb{N}\). For example the changing air pressure at a particular location may be thought of as a path in \(\mathbb{R}\); the motion of a pen on paper may be thought of as a path in \(\mathbb{R}^2\); the changes within financial markets may be thought of as a path in \(\mathbb{R}^d\), with \(d\) potentially very large.

Given a path, we may define its signature, which is a collection of statistics of the path. The map from a path to its signature is called the signature transform.

Definition 1.1. Let \(\mathbf{x}=\left(x_1, \ldots, x_n\right)\), where \(x_i \in \mathbb{R}^d\). Let \(f=\left(f_1, \ldots, f_d\right):[0,1] \rightarrow \mathbb{R}^d\) be continuous, such that \(f\left(\frac{i-1}{n-1}\right)=x_i\), and linear on the intervals in between. Then the signature of \(\mathbf{x}\) is defined as the collection of iterated integrals
\[
\operatorname{Sig}(\mathbf{x})=\left(\left(\int\cdots\int_{0<t_1<\cdots<t_k<1} \prod_{j=1}^k \frac{\mathrm{d} f_{i_j}}{\mathrm{d} t}\left(t_j\right) \,\mathrm{d} t_1 \cdots \mathrm{d} t_k\right)_{1 \leq i_1, \ldots, i_k \leq d}\right)_{k \geq 0}
\]

…In short, the signature of a path determines the path essentially uniquely, and does so in an efficient, computable way. Furthermore, the signature is rich enough that every continuous function of the path may be approximated arbitrarily well by a linear function of its signature; it may be thought of as a ‘universal nonlinearity’. Taken together these properties make the signature an attractive tool for machine learning. The most simple way to use the signature is as feature transformation, as it may often be simpler to learn a function of the signature than of the original path.

This makes it sound like we have a connection to Koopman operators?
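
To make Definition 1.1 concrete, here is a hand-rolled NumPy sketch (mine, not from Bonnier et al.) of the signature truncated at depth 2. The piecewise-linear path is consumed one segment at a time using Chen's identity: concatenating a path with truncated signature \((1, a, A)\) and a straight segment with increment \(\delta\), whose own signature is \((1, \delta, \delta\otimes\delta/2)\), gives \((1,\; a+\delta,\; A + a\otimes\delta + \delta\otimes\delta/2)\).

```python
# Truncated (depth-2) signature of a piecewise-linear path in R^d,
# accumulated segment by segment via Chen's identity. A hand-rolled sketch;
# libraries such as esig / iisignature / signatory do this to arbitrary depth.
import numpy as np

def signature_depth2(x: np.ndarray):
    """x: array of shape (n, d), the points x_1, ..., x_n of the path.

    Returns (S1, S2) where S1[i] = ∫ df_i and
    S2[i, j] = ∫∫_{t1 < t2} df_i(t1) df_j(t2).
    """
    n, d = x.shape
    S1 = np.zeros(d)          # depth-1 terms
    S2 = np.zeros((d, d))     # depth-2 terms
    for k in range(n - 1):
        delta = x[k + 1] - x[k]    # increment of one linear segment
        # Chen's identity: (1, S1, S2) ⊗ (1, delta, delta⊗delta/2)
        S2 += np.outer(S1, delta) + 0.5 * np.outer(delta, delta)
        S1 += delta
    return S1, S2

# Example: a three-point path in R^2.
x = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
S1, S2 = signature_depth2(x)
print("depth 1:", S1)      # total increment, here [1, 1]
print("depth 2:\n", S2)    # here [[0.5, 1.0], [0.0, 0.5]]
```

The symmetric part of the depth-2 matrix is forced to equal \(\tfrac12 S_1 \otimes S_1\) (a shuffle identity), so the genuinely new information at depth 2 is the antisymmetric part, the Lévy area. The signature libraries implement essentially this accumulation, to arbitrary truncation depth and far more efficiently.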

5 Code

The signature of a stream of data is essentially a collection of statistics about that stream of data. This collection of statistics does such a good job of capturing the information about the stream of data that it actually determines the stream of data uniquely. (Up to something called ‘tree-like equivalence’ anyway, which is really just a technicality. It’s an equivalence relation that matters about as much as two functions being equal almost everywhere. That is to say, not much at all.) The signature transform is a particularly attractive tool in machine learning because it is what we call a ‘universal nonlinearity’: it is sufficiently rich that every continuous function of the original stream of data can be approximated arbitrarily well by a linear function of its signature. Now for various reasons this is a mathematical idealisation not borne out in practice (which is why we put signatures into a neural network and don’t just use a simple linear model), but they still work very well!
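
A toy sketch of that ‘linear model on signature features’ recipe, with the caveat that I am assuming the iisignature package and its iisignature.sig(stream, depth) call, which (as I understand it) returns the flattened signature levels 1 through depth; esig or signatory would serve equally well. The target functional, a product of total increments, is nonlinear in the stream but exactly linear in the depth-2 signature thanks to the shuffle identity \(S^{(1,2)} + S^{(2,1)} = S^{(1)}S^{(2)}\), so plain least squares should recover it to floating-point accuracy.

```python
# Toy "signature as feature map" example. Assumes `pip install iisignature`;
# the call assumed is iisignature.sig(stream, depth), returning the flattened
# signature levels 1..depth of an (n, d) array.
import numpy as np
import iisignature

rng = np.random.default_rng(0)
depth = 2

def random_stream(length=50, d=2):
    # a random walk in R^d, playing the role of an observed stream
    return np.cumsum(rng.normal(size=(length, d)), axis=0)

streams = [random_stream() for _ in range(500)]
X = np.stack([iisignature.sig(s, depth) for s in streams])  # (500, 6) when d=2, depth=2

# Target: product of the two total increments -- nonlinear in the stream,
# but a linear function of the depth-2 signature (shuffle identity).
y = np.array([(s[-1, 0] - s[0, 0]) * (s[-1, 1] - s[0, 1]) for s in streams])

# Ordinary least squares on signature features, plus a bias column.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("max abs residual:", np.max(np.abs(A @ coef - y)))    # ≈ 0 up to float error
```

For functionals that a truncated signature cannot represent exactly, the usual move is the one described above: feed the signature features to a nonlinear learner rather than a purely linear one.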

6 References

Bonnier, Kidger, Arribas, et al. 2019. “Deep Signature Transforms.” In Advances in Neural Information Processing Systems.
Cass, and Salvi. 2024. “Lecture Notes on Rough Paths and Applications to Machine Learning.”
Chevyrev, and Kormilitzin. 2016. “A Primer on the Signature Method in Machine Learning.” arXiv:1603.03788 [Cs, Stat].
Friz, and Hairer. 2020. A Course on Rough Paths. Universitext.
Hodgkinson, Roosta, and Mahoney. 2021. “Stochastic Continuous Normalizing Flows: Training SDEs as ODEs.” Uncertainty in Artificial Intelligence.
Kalsi, Lyons, and Arribas. 2020. “Optimal Execution with Rough Path Signatures.” SIAM Journal on Financial Mathematics.
Kelly. 2016. “Rough Path Recursions and Diffusion Approximations.” The Annals of Applied Probability.
Kelly, and Melbourne. 2014. “Smooth Approximation of Stochastic Differential Equations.”
Kidger. 2022. “On Neural Differential Equations.”
Levin, Lyons, and Ni. 2016. “Learning from the Past, Predicting the Statistics for the Future, Learning an Evolving System.”
Lyons, Terry. 1994. “Differential Equations Driven by Rough Signals (I): An Extension of an Inequality of L. C. Young.” Mathematical Research Letters.
———. 2014. “Rough Paths, Signatures and the Modelling of Functions on Streams.” arXiv:1405.4537 [Math, q-Fin, Stat].
Lyons, Terry J., and Sidorova. 2005. “Sound Compression: A Rough Path Approach.” In Proceedings of the 4th International Symposium on Information and Communication Technologies. WISICT ’05.
Morrill, Salvi, Kidger, et al. 2021. “Neural Rough Differential Equations for Long Time Series.” In Proceedings of the 38th International Conference on Machine Learning.
Salvi, Cass, Foster, et al. 2021. “The Signature Kernel Is the Solution of a Goursat PDE.” SIAM Journal on Mathematics of Data Science.
Salvi, Lemercier, Liu, et al. 2024. “Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes.” In Advances in Neural Information Processing Systems. NIPS ’21.
Twardowska. 1996. “Wong-Zakai Approximations for Stochastic Differential Equations.” Acta Applicandae Mathematica.