Warping of stationary stochastic processes

September 16, 2019 — January 21, 2021

Hilbert space
kernel tricks
metrics
signal processing
statistics
stochastic processes

Transforming stationary processes into non-stationary ones by transforming their inputs (Sampson and Guttorp 1992; Genton 2001; Genton and Perrin 2004; Perrin and Senoussi 1999, 2000).

This is of interest both for composing kernels with known desirable properties via known transforms, and for learning (somewhat) arbitrary transforms to attain stationarity.

One might instead consider processes that are stationary on a manifold.

1 Stationary reducible kernels


The main idea is to find a new feature space where stationarity (Sampson and Guttorp 1992) or local stationarity (Perrin and Senoussi 1999, 2000; Genton and Perrin 2004) can be achieved.

Genton (2001) summarises:

We say that a nonstationary kernel \(K(\mathbf{x}, \mathbf{z})\) is stationary reducible if there exists a bijective deformation \(\mathbf{\Phi}\) such that: \[ K(\mathbf{x}, \mathbf{z})=K_{S}^{*}(\mathbf{\Phi}(\mathbf{x})-\mathbf{\Phi}(\mathbf{z})) \] where \(K_{S}^{*}\) is a stationary kernel.
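In code this construction is a one-liner: warp the inputs through \(\mathbf{\Phi}\) and hand the difference to the stationary kernel. A minimal sketch in Python, where both the squared-exponential choice of \(K_{S}^{*}\) and the particular bijection \(\mathbf{\Phi}\) are hypothetical illustrations, not anything from the cited papers:

```python
import numpy as np

def warped_kernel(k_s, phi):
    """Non-stationary kernel K(x, z) = k_S(phi(x) - phi(z)) built from a
    stationary kernel k_s of the lag and a deformation phi."""
    def K(x, z):
        return k_s(phi(x) - phi(z))
    return K

# A stationary squared-exponential kernel of the lag tau (illustrative choice).
def k_se(tau, ell=1.0):
    return np.exp(-0.5 * np.sum(np.square(tau)) / ell**2)

# A hypothetical bijective deformation: compresses space away from the origin.
def phi(x):
    return np.sign(x) * np.log1p(np.abs(x))

K = warped_kernel(k_se, phi)
print(K(np.array([0.1]), np.array([0.2])))   # nearby points: close to 1
print(K(np.array([1.0]), np.array([10.0])))  # distant points: close to 0
```

Note that positive definiteness of \(K\) is inherited from \(K_{S}^{*}\) for any map \(\mathbf{\Phi}\); bijectivity matters for identifying the deformation, not for validity of the kernel.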

2 Classic deformations

2.1 MacKay warping
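The canonical example, attributed to MacKay in Rasmussen and Williams (2006), embeds a periodic input on a circle, \(u(x) = (\cos x, \sin x)\), so that a stationary kernel in the embedded space induces a periodic kernel in \(x\). A minimal sketch (the squared-exponential kernel and the period parameterisation here are illustrative assumptions):

```python
import numpy as np

# MacKay's periodic warp: embed x on the unit circle, then apply a
# stationary squared-exponential kernel in the embedded space.
def mackay_periodic(x, z, ell=1.0, period=2 * np.pi):
    u = lambda t: np.array([np.cos(2 * np.pi * t / period),
                            np.sin(2 * np.pi * t / period)])
    tau = u(x) - u(z)
    return np.exp(-0.5 * np.dot(tau, tau) / ell**2)

print(mackay_periodic(0.0, 2 * np.pi))  # one full period apart: exactly 1.0
```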

2.2 As a function of input

Apparently invented by Gibbs (1998) and generalised by Paciorek and Schervish (2003).

Let \(k_{\mathrm{S}}\) be some stationary kernel on \(\mathbb{R}^D.\) Let \(\Sigma(\mathbf{x})\) be a \(D \times D\) matrix-valued function which is positive definite for all \(\mathbf{x},\) and write \(\Sigma_{i} \triangleq \Sigma\left(\mathbf{x}_{i}\right).\) Define \[ Q_{i j}=\left(\mathbf{x}_{i}-\mathbf{x}_{j}\right)^{\top}\left(\left(\Sigma_{i}+\Sigma_{j}\right) / 2\right)^{-1}\left(\mathbf{x}_{i}-\mathbf{x}_{j}\right). \] Then \[ k_{\mathrm{NS}}\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)=2^{D / 2}\left|\Sigma_{i}\right|^{1 / 4}\left|\Sigma_{j}\right|^{1 / 4}\left|\Sigma_{i}+\Sigma_{j}\right|^{-1 / 2} k_{\mathrm{S}}\left(\sqrt{Q_{i j}}\right) \] is a valid non-stationary covariance function.
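A direct transcription of these two formulas into Python; the particular \(\Sigma(\mathbf{x})\) below, an isotropic length scale growing with \(\|\mathbf{x}\|\), is a hypothetical illustration:

```python
import numpy as np

def k_ns(x_i, x_j, Sigma, k_s):
    """Paciorek-Schervish nonstationary kernel. Sigma maps a point to a
    D x D positive-definite matrix; k_s is a stationary kernel of distance."""
    D = x_i.shape[0]
    S_i, S_j = Sigma(x_i), Sigma(x_j)
    d = x_i - x_j
    # Q_ij = d^T ((S_i + S_j)/2)^{-1} d
    Q_ij = d @ np.linalg.solve((S_i + S_j) / 2.0, d)
    prefactor = (2 ** (D / 2)
                 * np.linalg.det(S_i) ** 0.25
                 * np.linalg.det(S_j) ** 0.25
                 * np.linalg.det(S_i + S_j) ** -0.5)
    return prefactor * k_s(np.sqrt(Q_ij))

# Hypothetical choices for illustration.
Sigma = lambda x: (0.1 + np.dot(x, x)) * np.eye(x.shape[0])
k_se = lambda r: np.exp(-0.5 * r**2)  # stationary squared exponential

x = np.array([0.5, 0.5])
print(k_ns(x, x, Sigma, k_se))                        # equals 1 at x_i = x_j
print(k_ns(x, np.array([1.0, 0.0]), Sigma, k_se))     # decays with distance
```

A quick sanity check on the prefactor: at \(\mathbf{x}_i = \mathbf{x}_j\) it reduces to \(2^{D/2}|\Sigma_i|^{1/2}|2\Sigma_i|^{-1/2} = 1\), as a correlation function should.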

Homework question: is this a product of convolutional Gaussian processes?

3 Learning transforms


4 References

Anderes, Ethan, and Chatterjee. 2009. “Consistent Estimates of Deformed Isotropic Gaussian Random Fields on the Plane.” The Annals of Statistics.
Anderes, Ethan B., and Stein. 2008. “Estimating Deformations of Isotropic Gaussian Random Fields on the Plane.” The Annals of Statistics.
Belkin, Ma, and Mandal. 2018. “To Understand Deep Learning We Need to Understand Kernel Learning.” In International Conference on Machine Learning.
Bohn, Griebel, and Rieger. 2018. “A Representer Theorem for Deep Kernel Learning.” arXiv:1709.10441 [Cs, Math].
Damian, Sampson, and Guttorp. 2001. “Bayesian Estimation of Semi-Parametric Non-Stationary Spatial Covariance Structures.” Environmetrics.
Feragen, and Hauberg. 2016. “Open Problem: Kernel Methods on Manifolds and Metric Spaces. What Is the Probability of a Positive Definite Geodesic Exponential Kernel?” In Conference on Learning Theory.
Genton. 2001. “Classes of Kernels for Machine Learning: A Statistics Perspective.” Journal of Machine Learning Research.
Genton, and Perrin. 2004. “On a Time Deformation Reducing Nonstationary Stochastic Processes to Local Stationarity.” Journal of Applied Probability.
Gibbs. 1998. “Bayesian Gaussian Processes for Regression and Classification.”
Hinton, and Salakhutdinov. 2008. “Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes.” In Advances in Neural Information Processing Systems 20.
Ikeda, Ishikawa, and Sawano. 2021. “Composition Operators on Reproducing Kernel Hilbert Spaces with Analytic Positive Definite Functions.” arXiv:1911.11992 [Math, Stat].
Paciorek, and Schervish. 2003. “Nonstationary Covariance Functions for Gaussian Process Regression.” In Proceedings of the 16th International Conference on Neural Information Processing Systems. NIPS’03.
Perrin, and Senoussi. 1999. “Reducing Non-Stationary Stochastic Processes to Stationarity by a Time Deformation.” Statistics & Probability Letters.
———. 2000. “Reducing Non-Stationary Random Fields to Stationarity and Isotropy Using a Space Deformation.” Statistics & Probability Letters.
Rasmussen, and Williams. 2006. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning.
Sampson, and Guttorp. 1992. “Nonparametric Estimation of Nonstationary Spatial Covariance Structure.” Journal of the American Statistical Association.
Schmidt, and O’Hagan. 2003. “Bayesian Inference for Non-Stationary Spatial Covariance Structure via Spatial Deformations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Shimotsu, and Phillips. 2004. “Local Whittle Estimation in Nonstationary and Unit Root Cases.” The Annals of Statistics.
Snoek, Swersky, Zemel, et al. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14).
Tompkins, and Ramos. 2018. “Fourier Feature Approximations for Periodic Kernels in Time-Series Modelling.” Proceedings of the AAAI Conference on Artificial Intelligence.
Vu, Zammit-Mangion, and Cressie. 2020. “Modeling Nonstationary and Asymmetric Multivariate Spatial Covariances via Deformations.”
Wilson, Hu, Salakhutdinov, et al. 2016. “Deep Kernel Learning.” In Artificial Intelligence and Statistics.
Zammit-Mangion, Ng, Vu, et al. 2021. “Deep Compositional Spatial Models.” Journal of the American Statistical Association.