Gaussian process regression software

And classification.

December 2, 2019 — July 28, 2022

functional analysis
Gaussian
generative
Hilbert space
kernel tricks
nonparametric
regression
spatial
stochastic processes
time series

Implementations of Gaussian process regression.

Figure 1

GPy was a common default choice in Python, and GPFlow, for example, has attempted to follow its API. For another value of default, scikit-learn has a GP implementation. Moreover, many generic Bayesian inference toolkits support GP models generically. All things being equal, I want better-than-generic support for GP models. Making them go fast can be a subtle business, and there are all kinds of fancy bells and whistles I would generally like to support, such as inducing point methods and sparse variational inference.

1 GPy

GPy originates in GP-urdaddy Neil Lawrence’s lab. It is well-tested and featureful. However, it is also crufty, confusing, and slow. It predates some modern autodiff technology and rolls its own, in opaque ways. It is best regarded as a kind of reference implementation but maybe not used in practice.

2 Stheno

Stheno seems to be popular for Julia and also comes in an alternative flavour, python stheno, targeting PyTorch, TensorFlow, or JAX. It seems to be sponsored by Invenia as a by-product of their main business. They write excellent GP tutorials, e.g. Scaling multi-output Gaussian process models with exact inference.

3 GPyTorch

Gardner et al. (2018)

Russell Tsuchida:

have you used gpytorch? Its the bomb it is truly amazing. This tutorial is well worth the 15 minutes it takes. Crank up the training and testing set datasize to 10000 and it is a breeze I have tried GPy, GPflow and tensorflow in the past and they would struggle

Bonus feature: integration (in the API sense) with pyro, for more elaborate likelihood models.

4 Plain pyro

There are native GP models in the PyTorch-based pyro, which you can use without GPyTorch.

5 JuliaGaussianProcesses Ecosystem

JuliaGaussianProcesses includes various interoperable Julia GP regression packages.

AugmentedGaussianProcess.jl by Théo Galy-Fajou looks nice and has sparse approximation plus some nice variational approx tricks. Deprecated in favour of JuliaGaussianProcesses/AugmentedGPLikelihoods.jl

6 GPJax

Via Cheng Soon Ong:

GPJax (Pinder and Dodd 2022) is a didactic Gaussian process library that supports GPU acceleration and just-in-time compilation. We seek to provide a flexible API as close as possible to how the underlying mathematics is written on paper to enable researchers to rapidly prototype and develop new ideas.

7 TinyGP

tinygp is an extremely lightweight library for building Gaussian Process (GP) models in Python, built on top of jax. It has a nice interface, and it’s pretty fast (see Benchmarks). Thanks to jax, tinygp supports things like GPU acceleration and automatic differentiation.

8 BayesNewton

Bayes-Newton (Wilkinson, Särkkä, and Solin 2021) is a library for approximate inference in Gaussian processes (GPs) in JAX (with objax), built and actively maintained by Will Wilkinson. Bayes-Newton provides a unifying view of approximate Bayesian inference, and allows for the combination of many models (e.g. GPs, sparse GPs, Markov GPs, sparse Markov GPs) with the inference method of your choice (VI, EP, Laplace, Linearization).

9 George

George is a Python library implementing Ambikasaran et al. (2015)’s fast method for time-series GPs, as explained in Scaling Gaussian Processes to big datasets.

10 Geostat Framework

GeoStat Framework:

This Framework was created within the PhD project of Sebastian Müller at the Computational Hydrosystems Department at the UFZ Leipzig.

11 Stan

Bayes workhorse Stan can do Gaussian Process regression just like almost everything else; see Michael Betancourt’s blog posts, 1. 2. 3.

12 scikit-learn

The current scikit-learn has basic Gaussian processes. The introduction disposes me against using this implementation:

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.

The advantages of Gaussian processes are:

  • The prediction interpolates the observations (at least for regular kernels).
  • The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.
  • Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of Gaussian processes include:

  • They are not sparse, i.e., they use the whole samples/features information to perform the prediction.
  • They lose efficiency in high dimensional spaces — namely when the number of features exceeds a few dozens.

This list of disadvantages has, at best, imprecision and, at worst, mistakes. Let us suppose that this description is supposed to draw a distinction between GP regression and some other regression model (note that Bayes linear regression and GP regression are intimately linked).

So what is this telling us about the unusual qualities of GP regression?

The first point is strictly correct, but not useful, in that sparse approximate GPs is a whole industry, and this is a statement that this implementation is missing a fairly standard approximate inference method rather than a problem with the family of methods per se. (“The problem with cars is that they all run on diesel”). The second point might be correct under a maximally generous interpretation. What do they mean by efficiency here?

In the least generous interpretation, it is just plain wrong; I use GPs with dozens of inputs often; it works fine. In particular, if they mean ‘efficiency’ in the sense that computational cost grows with input dimension, this is suspect. Naïve inference is \(\mathcal{O}(DN^3)\) for \(N\) observations and \(D\) features. That is, dimensionality cost is no worse than linear regression for prediction and superior for training, although other models that have a linear complexity in sample dimension escape without such a warning in the scikit-learn docs.

Perhaps they mean statistical efficiency, i.e. in terms of variance of the estimate with respect to a fixed number of data points? That is a subtler question. Yes, for a fixed isotropic kernel we can get bad behaviour in high dimensional spaces because, informally, low dimensional projections are all kinda similar 🚧TODO🚧 clarify. OTOH, if we use a good non-isotropic kernel or have adaptive kernel selection, which is kinda standard, then we can have a kernel that behaves well in many dimensions by learning to adapt to the important ones. This is pretty standard. As the number of input dimensions grows larger still, the hyperparameter selection problem grows less-well-determined. However, that is also true of linear regression in general. If they wish to make an assertion that this problem in GP regression is even worse than linear regression then they could make that case I suppose, but something better than this one sentence aside would be needed.

13 GPFlow

GPflow (Matthews et al. 2016) (using TensorFlow) is an option. The GPflow docs include the following clarification of its genealogy.

GPflow has origins in GPy…, and much of the interface is intentionally similar for continuity (though some parts of the interface may diverge in future). GPflow has a rather different remit from GPy though:

  • GPflow leverages TensorFlow for faster/bigger computation
  • GPflow has much less code than GPy, mostly because all gradient computation is handled by TensorFlow.
  • GPflow focuses on variational inference and MCMC — there is no expectation propagation or Laplace approximation.
  • GPflow does not have any plotting functionality.

In practice, it is faster than GPy but does not seem any easier or less confusing to implement tricky stuff in.

14 Misc Python

PyMC3. ladax is a jax-based one.

15 AutoGP

autogp (also using TensorFlow) incorporates much fancy GP technology at once. I believe it is no longer actively maintained, which is a pity as it is the only one I have contributed substantially to.

16 MATLAB

Should mention the various MATLAB/Scilab options.

GPStuff is one for MATLAB/Octave that I have seen around the place.

17 References

Ambikasaran, Foreman-Mackey, Greengard, et al. 2015. Fast Direct Methods for Gaussian Processes.” arXiv:1403.6015 [Astro-Ph, Stat].
Galy-Fajou, Perrone, and Opper. 2021. Flexible and Efficient Inference with Particles for the Variational Gaussian Approximation.” Entropy.
Gardner, Pleiss, Bindel, et al. 2018. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Gramacy. 2016. laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R.” Journal of Statistical Software.
Ko, and Fox. 2009. GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models.” In Autonomous Robots.
Krauth, Bonilla, Cutajar, et al. 2016. AutoGP: Exploring the Capabilities and Limitations of Gaussian Process Models.” In UAI17.
Matthews, van der Wilk, Nickson, et al. 2016. GPflow: A Gaussian Process Library Using TensorFlow.” arXiv:1610.08733 [Stat].
Pinder, and Dodd. 2022. GPJax: A Gaussian Process Framework in JAX.” Journal of Open Source Software.
Vanhatalo, Riihimäki, Hartikainen, et al. 2013. GPstuff: Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research.
Wilkinson, Särkkä, and Solin. 2021. Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees.”