Reproducing kernels satisfying physical equations

August 12, 2023 — October 24, 2024

functional analysis
Gaussian
generative
Hilbert space
kernel tricks
regression
spatial
stochastic processes
time series

A placeholder to collect articles on this idea (or perhaps this collection of ideas), because the terminology is not always consistent across them. No new insights yet.

If we want to use a kernel repeatedly for function-valued Gaussian process regression, and the solutions should satisfy some partial differential equation, how might we encode that constraint in the kernel itself? When is it worthwhile to do so? Closely related: learning physical operators.


When modelling physical systems, especially those governed by partial differential equations (PDEs), we often want to incorporate the underlying physical constraints directly into the kernel functions of reproducing kernel Hilbert spaces (RKHS). This approach ensures that the solutions not only fit the observed data but also adhere to known physical laws.

There are many methods that fit this description, and what works depends very much on which physical equations we are solving, on what domain, and so on.

The categories in this taxonomy are not mutually exclusive, and I have not read the literature closely enough to draw sharp boundaries between them; some of them look very similar.

1 Gaussian Processes and Their Derivatives

An important property of Gaussian processes is that linear transformations of GPs remain GPs under broad circumstances. Specifically, the derivative of a GP is also a GP, provided the covariance function is sufficiently smooth. If \(f\) is a GP with mean function \(m(x)\) and covariance function \(k(x, x')\), then its derivative \(f'\) is a GP with mean \(m'(x)\) and covariance \(k'(x, x')\), where:

\[ k'(x, x') = \frac{\partial^2 k(x, x')}{\partial x \partial x'} \]

This property allows us to incorporate differential operators into the GP framework, enabling us to encode PDE constraints directly into the kernel.
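For concreteness, here is a minimal numerical sketch (mine, not from any cited paper) of these identities for the squared-exponential kernel: the closed-form cross- and derivative covariances are checked against finite differences. The length-scale and evaluation points are arbitrary.

```python
# Minimal sketch: covariances of a GP and its derivative under a
# squared-exponential kernel, checked against finite differences.
# Parameter values are illustrative only.
import numpy as np

ell = 0.7  # length-scale (assumed)

def k(x, xp):
    """Squared-exponential kernel k(x, x')."""
    return np.exp(-((x - xp) ** 2) / (2 * ell**2))

def dk_dxp(x, xp):
    """Cross-covariance Cov(f(x), f'(x')) = dk/dx'."""
    return ((x - xp) / ell**2) * k(x, xp)

def d2k_dxdxp(x, xp):
    """Derivative-process covariance Cov(f'(x), f'(x')) = d^2 k / dx dx'."""
    return (1 / ell**2 - ((x - xp) ** 2) / ell**4) * k(x, xp)

# Finite-difference checks of the closed forms.
x, xp, h = 0.3, -0.2, 1e-4
fd = (k(x, xp + h) - k(x, xp - h)) / (2 * h)
assert np.isclose(fd, dk_dxp(x, xp), atol=1e-6)
fd2 = (k(x + h, xp + h) - k(x + h, xp - h)
       - k(x - h, xp + h) + k(x - h, xp - h)) / (4 * h**2)
assert np.isclose(fd2, d2k_dxdxp(x, xp), atol=1e-6)
```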

2 Latent Force Models

Latent force models are one of the earlier methods for integrating differential equations into GP regression (M. Álvarez, Luengo, and Lawrence 2009; M. A. Álvarez, Luengo, and Lawrence 2013; Moss et al. 2022). In LFMs, the idea is to model the unknown latent forces driving a system using GPs. These latent forces are then connected to the observed data through differential equations.

For example, consider a system governed by a linear ordinary differential equation (ODE):

\[ \frac{d f(t)}{d t} + a f(t) = u(t) \]

Here, \(f(t)\) is the observed function, \(a\) is a known constant, and \(u(t)\) is an unknown latent function modelled as a GP. By placing a GP prior on \(u(t)\), we induce a GP prior on \(f(t)\) that inherently satisfies the ODE.

The function \(f(t)\) resides in the Sobolev space \(W^{1,2}([0, T])\), which consists of functions that, together with their first (weak) derivative, are square-integrable on \([0, T]\).
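To make the induced prior concrete, here is a brute-force sketch (my construction; it assumes a zero initial condition \(f(0) = 0\) and a squared-exponential prior on \(u\)). The solution operator is convolution with the Green's function, \(f(t) = \int_0^t e^{-a(t-s)} u(s)\,\mathrm{d}s\), so the covariance induced on \(f\) is a double integral of the latent-force kernel, evaluated here by quadrature; the LFM papers above give closed forms for common kernels.

```python
# Sketch of the covariance the ODE induces on f, by brute-force quadrature.
# Assumptions: f(0) = 0, squared-exponential prior on the latent force u.
import numpy as np
from scipy.integrate import dblquad

a, ell = 1.5, 0.4  # ODE constant and latent-force length-scale (illustrative)

def k_u(s, s2):
    """Prior covariance of the latent force u."""
    return np.exp(-((s - s2) ** 2) / (2 * ell**2))

def k_f(t, tp):
    """Induced covariance Cov(f(t), f(t')) as a double integral."""
    integrand = lambda s2, s: np.exp(-a * (t - s)) * np.exp(-a * (tp - s2)) * k_u(s, s2)
    val, _ = dblquad(integrand, 0.0, t, 0.0, tp)
    return val

print(k_f(1.0, 1.5))  # covariance of the ODE solution at two times
```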

3 Divergence-Free and Curl-Free Kernels

Some fun tricks are of special relevance to fluids, e.g. kernels which imply divergence-free or curl-free fields, especially on the surface of a sphere (Narcowich, Ward, and Wright 2007; E. J. Fuselier, Shankar, and Wright 2016; E. J. Fuselier and Wright 2009).

E. Fuselier (2008) says:

Constructing divergence-free and curl-free matrix-valued RBFs is fairly simple. If \(\phi\) is a scalar-valued function consider

\[ \begin{aligned} \Phi_{\text{div}} &:= \left(-\Delta I + \nabla \nabla^{T}\right) \phi, \\ \Phi_{\text{curl}} &:= -\nabla \nabla^{T} \phi. \end{aligned} \]

If \(\phi\) is an RBF, then these functions can be used to produce divergence-free and curl-free interpolants, respectively. We note that these are not radial functions, but because they are usually generated by an RBF \(\phi\), they are still commonly called “matrix-valued RBFs”.

AFAICT there is nothing RBF-specific; I think it works for any stationary kernel. Do we even need stationarity?

The functions produced by these kernels reside in specific Sobolev spaces that respect the divergence-free or curl-free conditions. For instance, divergence-free vector fields in \(\mathbb{R}^3\) belong to the space:

\[ \mathbf{H}_{\text{div}} = \{\mathbf{f} \in [L^2(\Omega)]^3 : \nabla \cdot \mathbf{f} = 0\} \]
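Here is a rough numerical sketch (mine, not from the cited papers) of the \(\Phi_{\text{div}}\) construction for a squared-exponential \(\phi\) in two dimensions, using the closed form obtained by differentiating \(\phi\), together with a finite-difference check that each column of the matrix-valued kernel is divergence-free in its first argument.

```python
# Sketch: divergence-free matrix-valued kernel from a scalar
# squared-exponential phi, via Phi_div = (-Laplacian I + grad grad^T) phi.
# For this phi the closed form is
#   Phi_div(r) = phi(r) * [((d-1)/l^2 - |r|^2/l^4) I + r r^T / l^4].
import numpy as np

ell = 1.0  # length-scale (illustrative)

def phi(r):
    return np.exp(-np.dot(r, r) / (2 * ell**2))

def Phi_div(x, xp):
    r = x - xp
    d = len(r)
    return phi(r) * (((d - 1) / ell**2 - np.dot(r, r) / ell**4) * np.eye(d)
                     + np.outer(r, r) / ell**4)

# Finite-difference check that each column of Phi_div is divergence-free in x.
x, xp, h = np.array([0.3, -0.1]), np.array([-0.4, 0.2]), 1e-5
for j in range(2):
    div = sum(
        (Phi_div(x + h * e, xp)[i, j] - Phi_div(x - h * e, xp)[i, j]) / (2 * h)
        for i, e in enumerate(np.eye(2))
    )
    assert abs(div) < 1e-6
```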

3.1 On the Sphere

When dealing with fields on the surface of a sphere, such as global wind patterns, special considerations are required (E. J. Fuselier and Wright 2009). The construction of divergence-free and curl-free kernels on the sphere involves accounting for the manifold’s curvature and ensuring that the vector fields are tangent to the sphere’s surface.

For a scalar function \(\phi\) defined on the sphere \(\mathbb{S}^2\), divergence-free kernels can be constructed using surface differential operators. These kernels model tangential vector fields, which are essential in geophysical applications such as flows over the surface of a planet.

4 Linearly-constrained Operator-Valued Kernels

Operator-valued kernels extend the concept of scalar kernels to vector- or function-valued outputs. They are particularly handy when the physical constraints can be expressed as linear operators acting on functions (Lange-Hegermann 2018, 2021).

Consider a linear operator \(\mathcal{L}\) acting on functions \(f\), and suppose we want every function drawn from the associated RKHS (equivalently, every sample from the corresponding GP prior) to satisfy \(\mathcal{L} f = 0\). It suffices that the operator annihilates the kernel in each argument:

\[ \mathcal{L}_x K(x, x') = 0 \quad \text{for all } x'. \]

One systematic way to construct such kernels is to parametrise the solution set: if we can find an operator \(\mathcal{B}\) with \(\mathcal{L} \mathcal{B} = 0\), then pushing a scalar GP \(g \sim \mathcal{GP}(0, k)\) through it, \(f = \mathcal{B} g\), yields the operator-valued kernel \(K(x, x') = \mathcal{B}_x\, k(x, x')\, \mathcal{B}_{x'}^{*}\), and every draw of \(f\) satisfies the constraint by construction.
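As a small illustration of the parametrisation trick, here is a symbolic sketch (mine; the 2D stream-function parametrisation is a textbook example rather than anything specific to the cited papers): push a scalar GP through \(\mathcal{B} = (\partial_{x_2}, -\partial_{x_1})^\top\), which satisfies \(\mathcal{L}\mathcal{B} = 0\) for the divergence operator \(\mathcal{L}\), and check that \(\mathcal{L}_x K(x, x') = 0\).

```python
# Symbolic check of the parametrization trick for the 2D divergence-free
# constraint L f = df1/dx1 + df2/dx2 = 0, with f = B g for a stream
# function g, where B = (d/dx2, -d/dx1)^T so that L B = 0.
import sympy as sp

x1, x2, y1, y2 = sp.symbols("x1 x2 y1 y2", real=True)
ell = sp.symbols("ell", positive=True)

# Scalar squared-exponential kernel for the latent stream function g.
k = sp.exp(-((x1 - y1) ** 2 + (x2 - y2) ** 2) / (2 * ell**2))

# Operator-valued kernel K = B_x k B_y^T, a 2x2 matrix of derivatives of k.
B_x = [lambda f: sp.diff(f, x2), lambda f: -sp.diff(f, x1)]
B_y = [lambda f: sp.diff(f, y2), lambda f: -sp.diff(f, y1)]
K = sp.Matrix(2, 2, lambda i, j: B_x[i](B_y[j](k)))

# Applying the divergence operator to the first argument of K gives zero in
# each column, so draws from GP(0, K) are divergence-free by construction.
div_col = [sp.simplify(sp.diff(K[0, j], x1) + sp.diff(K[1, j], x2)) for j in range(2)]
print(div_col)  # [0, 0]
```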

5 Physics-Informed Gaussian Processes

Physics-informed Gaussian processes (PIGPs) incorporate physical laws by modifying the GP’s prior or likelihood to enforce PDE constraints (Raissi and Karniadakis 2018). This can be done by penalizing deviations from the PDE in the loss function or by directly incorporating the differential operators into the kernel.

For a GP prior \(f \sim \mathcal{GP}(0, k)\) and a linear operator \(\mathcal{L}\), the transformed process \(\mathcal{L} f\) is again a GP with

\[ \text{Cov}(\mathcal{L} f(x), \mathcal{L} f(x')) = \mathcal{L}_x \mathcal{L}_{x'} k(x, x') =: k_{\mathcal{L}}(x, x'), \]

so observations of \(\mathcal{L} f\) (for example, a known forcing term, or a zero PDE residual at collocation points) can be conditioned on jointly with observations of \(f\) itself.
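Here is a hedged sketch in the spirit of these papers (the toy problem, kernel, and parameter values are my own choices): for the 1D Poisson problem \(f'' = u\), place a squared-exponential prior on \(f\), use closed-form derivative covariances to build the joint covariance of boundary observations of \(f\) and observations of \(u = \mathcal{L} f\), and read off the posterior mean of \(f\).

```python
# Sketch of a physics-informed GP for f'' = u in 1D.
# Cov(f, Lf) = d^2 k / dx'^2 and Cov(Lf, Lf) = d^4 k / dx^2 dx'^2 for a
# squared-exponential k. All names and parameter values are illustrative.
import numpy as np

ell, jitter = 0.3, 1e-6

def k(x, xp):        # Cov(f(x), f(x'))
    r = x[:, None] - xp[None, :]
    return np.exp(-r**2 / (2 * ell**2))

def k_fL(x, xp):     # Cov(f(x), Lf(x')) = d^2 k / dx'^2
    r = x[:, None] - xp[None, :]
    return (r**2 - ell**2) / ell**4 * np.exp(-r**2 / (2 * ell**2))

def k_LL(x, xp):     # Cov(Lf(x), Lf(x')) = d^4 k / dx^2 dx'^2
    r = x[:, None] - xp[None, :]
    return (3 * ell**4 - 6 * ell**2 * r**2 + r**4) / ell**8 * np.exp(-r**2 / (2 * ell**2))

# Toy problem: f'' = u with u(x) = -pi^2 sin(pi x), so f(x) = sin(pi x).
x_u = np.linspace(0, 1, 15)    # where the right-hand side u is observed
x_b = np.array([0.0, 1.0])     # boundary observations of f
y = np.concatenate([np.sin(np.pi * x_b), -np.pi**2 * np.sin(np.pi * x_u)])

# Joint covariance over (f at the boundary, Lf at the interior points).
K = np.block([[k(x_b, x_b),        k_fL(x_b, x_u)],
              [k_fL(x_b, x_u).T,   k_LL(x_u, x_u)]]) + jitter * np.eye(len(y))

x_star = np.linspace(0, 1, 101)
K_star = np.hstack([k(x_star, x_b), k_fL(x_star, x_u)])
f_mean = K_star @ np.linalg.solve(K, y)
print(np.max(np.abs(f_mean - np.sin(np.pi * x_star))))  # error vs true solution
```

With exact linear observations like this, the only approximation is the GP prior itself; noisy observations would simply add a diagonal noise term to the corresponding blocks.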

More at (Perdikaris et al. 2017; Raissi and Karniadakis 2018; Raissi, Perdikaris, and Karniadakis 2017a, 2017b, 2018).

6 Implicit

Not quite sure what to call it, but Kian Ming A. Chai introduced us to Mora et al. (2024), which seems to be an interesting variant. Keyword match to Brouard, Szafranski, and D’Alché-Buc (2016).

7 References

Albert. 2019. “Gaussian Processes for Data Fulfilling Linear Differential Equations.” Proceedings.
Álvarez, Mauricio, Luengo, and Lawrence. 2009. “Latent Force Models.” In Artificial Intelligence and Statistics.
Álvarez, Mauricio A., Luengo, and Lawrence. 2013. “Linear Latent Force Models Using Gaussian Processes.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Bao, Qian, Liu, et al. 2022. “An Operator Learning Approach via Function-Valued Reproducing Kernel Hilbert Space for Differential Equations.”
Besginow, and Lange-Hegermann. 2024. “Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations.” In Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22.
Bolin, and Wallin. 2021. “Efficient Methods for Gaussian Markov Random Fields Under Sparse Linear Constraints.” In Advances in Neural Information Processing Systems.
Brouard, Szafranski, and D’Alché-Buc. 2016. “Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels.” The Journal of Machine Learning Research.
Cockayne, Oates, Sullivan, et al. 2016. “Probabilistic Numerical Methods for Partial Differential Equations and Bayesian Inverse Problems.”
———, et al. 2017. “Probabilistic Numerical Methods for PDE-Constrained Bayesian Inverse Problems.” In AIP Conference Proceedings.
Cotter, Dashti, and Stuart. 2010. “Approximation of Bayesian Inverse Problems for PDEs.” SIAM Journal on Numerical Analysis.
Fuselier, Edward. 2008. “Sobolev-type approximation rates for divergence-free and curl-free RBF interpolants.” Mathematics of Computation.
Fuselier, Edward J. 2008. “Improved Stability Estimates and a Characterization of the Native Space for Matrix-Valued RBFs.” Advances in Computational Mathematics.
Fuselier, Edward J., Shankar, and Wright. 2016. “A High-Order Radial Basis Function (RBF) Leray Projection Method for the Solution of the Incompressible Unsteady Stokes Equations.” Computers & Fluids.
Fuselier, Edward J., and Wright. 2009. “Stability and Error Estimates for Vector Field Interpolation and Decomposition on the Sphere with RBFs.” SIAM Journal on Numerical Analysis.
Gulian, Frankel, and Swiler. 2022. “Gaussian Process Regression Constrained by Boundary Value Problems.” Computer Methods in Applied Mechanics and Engineering.
Harkonen, Lange-Hegermann, and Raita. 2023. “Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients.” In Proceedings of the 40th International Conference on Machine Learning.
Heinonen, and d’Alché-Buc. 2014. “Learning Nonparametric Differential Equations with Operator-Valued Kernels and Gradient Matching.” arXiv:1411.5172 [Cs, Stat].
Henderson. 2023. “PDE Constrained Kernel Regression Methods.”
Henderson, Noble, and Roustant. 2023. “Characterization of the Second Order Random Fields Subject to Linear Distributional PDE Constraints.” Bernoulli.
Hutchinson, Terenin, Borovitskiy, et al. 2021. “Vector-Valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels.” In Advances in Neural Information Processing Systems.
Kadri, Duflos, Preux, et al. 2016. “Operator-Valued Kernels for Learning from Functional Response Data.” The Journal of Machine Learning Research.
Kim, Luettgen, Paynabar, et al. 2023. “Physics-Based Penalization for Hyperparameter Estimation in Gaussian Process Regression.” Computers & Chemical Engineering.
Krämer, Schmidt, and Hennig. 2022. “Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations.” In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics.
Kübler, Muandet, and Schölkopf. 2019. “Quantum Mean Embedding of Probability Distributions.” Physical Review Research.
Lange-Hegermann. 2018. “Algorithmic Linearly Constrained Gaussian Processes.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
———. 2021. “Linearly Constrained Gaussian Processes with Boundary Conditions.” In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research.
Magnani, Krämer, Eschenhagen, et al. 2022. “Approximate Bayesian Neural Operators: Uncertainty Quantification for Parametric PDEs.”
Micchelli, and Pontil. 2005. “On Learning Vector-Valued Functions.” Neural Computation.
Mora, Yousefpour, Hosseinmardi, et al. 2024. “Operator Learning with Gaussian Processes.”
Moss, Opolka, Dumitrascu, et al. 2022. “Approximate Latent Force Model Inference.”
Narcowich, Ward, and Wright. 2007. “Divergence-Free RBFs on Surfaces.” Journal of Fourier Analysis and Applications.
Negiar. 2023. “Constrained Machine Learning: Algorithms and Models.”
Oates, Cockayne, Aykroyd, et al. 2019. “Bayesian Probabilistic Numerical Methods in Time-Dependent State Estimation for Industrial Hydrocyclone Equipment.” Journal of the American Statistical Association.
Perdikaris, Raissi, Damianou, et al. 2017. “Nonlinear Information Fusion Algorithms for Data-Efficient Multi-Fidelity Modelling.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Raissi, and Karniadakis. 2018. “Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations.” Journal of Computational Physics.
Raissi, Perdikaris, and Karniadakis. 2017a. “Inferring Solutions of Differential Equations Using Noisy Multi-Fidelity Data.” Journal of Computational Physics.
———. 2017b. “Machine Learning of Linear Differential Equations Using Gaussian Processes.” Journal of Computational Physics.
———. 2018. “Numerical Gaussian Processes for Time-Dependent and Nonlinear Partial Differential Equations.” SIAM Journal on Scientific Computing.
Ranftl. n.d. “Physics-Consistency of Infinite Neural Networks.”
Saha, and Balamurugan. 2020. “Learning with Operator-Valued Kernels in Reproducing Kernel Krein Spaces.” In Advances in Neural Information Processing Systems.
Sigrist, Künsch, and Stahel. 2015. “Stochastic Partial Differential Equation Based Modelling of Large Space-Time Data Sets.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Singh, and Principe. 2022. “A Physics Inspired Functional Operator for Model Uncertainty Quantification in the RKHS.”
Stepaniants. 2023. “Learning Partial Differential Equations in Reproducing Kernel Hilbert Spaces.” Journal of Machine Learning Research.
Wang, Cockayne, and Oates. 2018. “On the Bayesian Solution of Differential Equations.” In 38th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering.
Zhang, Zhen, Wang, and Nehorai. 2020. “Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zhang, Haizhang, Xu, and Zhang. 2012. “Refinement of Operator-Valued Reproducing Kernels.” The Journal of Machine Learning Research.