Ensemble Kalman updates are empirical Matheron updates
June 22, 2022 — February 6, 2025
Assumed audience:
Machine learning researchers and geospatial statisticians, together in peace at last
A note about a fact that I use a lot, which is that the Ensemble Kalman Filter can be formulated as, and IMO more conveniently explained as, an empirical version of the Matheron update, which is a handy tool from Gaussian process regression.
If you prefer your maths in PDF form, check out the arXiv report that this post is translated from (MacKinlay 2025). The fact that this was translated from a semi-formal paper might also warn you that the language here will be stilted and jargony. I may fix that if I have time some day.
This is one of those facts that I am amazed does not seem to be widely known. I would be surprised if I were the first person to notice it, but I do not have a citation that lays out the connection directly, and it is very useful to know. It may be in one of the references below; I have not read all of them, having given up after too many candidates came up blank. Please let me know in the comments if you complete that quest.
\[ \renewcommand{\Normal}{\mathcal{N}} \renewcommand{\disteq}{\stackrel{\mathrm{d}}{=}} \renewcommand{\mmmean}[1]{\bar{\mathrm{#1}}} \renewcommand{\mmdev}[1]{\breve{\mathrm{#1}}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\vrv}[1]{\vv{\rv{#1}}} \renewcommand{\rv}[1]{\mathsf{#1}} \renewcommand{\op}[1]{\mathcal{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \newcommand{\Law}[1][]{\mu_{#1}} \newcommand{\ELaw}[1][]{\widehat{\mu}_{#1}} \renewcommand{\Ex}{\mathbb{E}} \renewcommand{\Pr}{\mathbb{P}} \renewcommand{\var}{\operatorname{Var}} \]
1 Introduction
The Ensemble Kalman Filter (EnKF) (Evensen 2003, 2009a) is a cornerstone in data assimilation for large-scale dynamical systems due to its computational efficiency and scalability. The EnKF approximates the state estimation problem by evolving an ensemble of state vectors through the model dynamics and updating them using observational data.
Separately, the Matheron update provides a sample-based method for conditioning Gaussian random variables on observations (Doucet 2010; Wilson et al. 2020, 2021). We transform prior samples into posterior samples, which is sometimes much easier than calculating posterior densities. This approach is well-established in geostatistics and spatial statistics, but the connection to ensemble methods in data assimilation seems not to be well known.
In this work, we establish that the ensemble update step in the EnKF is equivalent to an empirical Matheron update by putting them on a common probabilistic footing. By explicitly representing the ensemble mean and covariance using empirical approximations, we demonstrate this equivalence. Recognising this connection provides an alternative probabilistic foundation for the EnKF and suggests potential improvements in ensemble data assimilation techniques by leveraging the properties of the Matheron update. Conversely, the analytic Matheron updates in the literature could benefit from the computational optimisations of the data assimilation community.
1.1 Notation
We write random variates sans serif, \(\mathsf{x}\). Equality in distribution is \(\disteq\). The law, or measure, of a random variate \(\mathsf{x}\) is denoted \(\Law[\mathsf{x}]\), so that \[ (\Law[\mathsf{x}]=\Law[\mathsf{y}]) \;\Rightarrow\; (\mathsf{x}\disteq\mathsf{y}). \] Mnemonically, samples drawn from \(\Law[\mathsf{x}]\) are written with a serif \(\boldsymbol{x}\sim\Law[\mathsf{x}]\). We use a hat to denote empirical estimates, e.g. \(\ELaw[\mathrm{X}]\) is the empirical law induced by the sample matrix \(\mathrm{X}\). When there is no ambiguity, we suppress the sample matrix, writing simply \(\widehat{\Law}\).1
2 Matheron Update
The Matheron update is a technique for sampling from the conditional distribution of a Gaussian random variable given observations, without explicitly computing the posterior covariance (Doucet 2010; Wilson et al. 2020, 2021) but instead updating prior samples. Spoiler: the EnKF update is the same thing for a particular kind of Gaussian model where the posterior is generated by a simulator.
Suppose we have a jointly Gaussian vector \[ \begin{bmatrix} \mathsf{x} \\ \mathsf{y} \end{bmatrix} \sim \Normal\!\Bigl( \begin{bmatrix} \boldsymbol{m}_{\mathsf{x}} \\ \boldsymbol{m}_{\mathsf{y}} \end{bmatrix}, \begin{bmatrix} \mathrm{C}_{\mathsf{x}\mathsf{x}} & \mathrm{C}_{\mathsf{x}\mathsf{y}} \\ \mathrm{C}_{\mathsf{y}\mathsf{x}} & \mathrm{C}_{\mathsf{y}\mathsf{y}} \end{bmatrix} \Bigr). \tag{1}\]
Then the conditional distribution \((\mathsf{x} | \mathsf{y} = \boldsymbol{y}^{*})\) is again Gaussian with mean and covariance: \[ \mathbb{E}[\mathsf{x}\mid \mathsf{y}=\boldsymbol{y}^*] = \boldsymbol{m}_{\mathsf{x}} + \mathrm{C}_{\mathsf{x}\mathsf{y}}\, \mathrm{C}_{\mathsf{y}\mathsf{y}}^{-1}\, (\boldsymbol{y}^* - \boldsymbol{m}_{\mathsf{y}}) \tag{2}\]
and \[ \mathrm{C}_{\mathsf{x}\mid \mathsf{y}} = \mathrm{C}_{\mathsf{x}\mathsf{x}} - \mathrm{C}_{\mathsf{x}\mathsf{y}}\, \mathrm{C}_{\mathsf{y}\mathsf{y}}^{-1}\,\mathrm{C}_{\mathsf{y}\mathsf{x}}. \tag{3}\]
The Matheron update (Doucet 2010) says that drawing a sample from \((\mathsf{x}\mid\mathsf{y}=\boldsymbol{y}^*)\) is equivalent to \[ (\mathsf{x}\mid \mathsf{y}=\boldsymbol{y}^*) \;\;\disteq\;\; \mathsf{x} + \mathrm{C}_{\mathsf{x}\mathsf{y}}\, \mathrm{C}_{\mathsf{y}\mathsf{y}}^{-1}\, \bigl(\boldsymbol{y}^* - \mathsf{y}\bigr). \tag{4}\]
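To make Equation 4 concrete, here is a minimal NumPy sketch (the function name and the toy example are mine, not from any library): we draw joint prior samples, shift each one towards the observation, and check the sample mean against the analytic conditional mean from Equation 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def matheron_sample(m_x, m_y, C_xx, C_xy, C_yy, y_star, n_samples=1000):
    """Draw samples from (x | y = y*) via the Matheron update (Equation 4).

    Rather than forming the posterior covariance, draw joint prior samples
    (x, y) and shift each x by C_xy C_yy^{-1} (y* - y).
    """
    d_x = len(m_x)
    # Assemble the joint prior (Equation 1) and sample from it.
    m = np.concatenate([m_x, m_y])
    C = np.block([[C_xx, C_xy], [C_xy.T, C_yy]])
    joint = rng.multivariate_normal(m, C, size=n_samples)
    x, y = joint[:, :d_x], joint[:, d_x:]
    # Pathwise update, applied sample-by-sample.
    gain = C_xy @ np.linalg.inv(C_yy)
    return x + (y_star - y) @ gain.T

# Tiny example: the sample mean should match Equation 2 up to Monte Carlo error.
m_x, m_y = np.array([0.0, 0.0]), np.array([1.0])
C_xx = np.array([[1.0, 0.5], [0.5, 1.0]])
C_xy = np.array([[0.3], [0.2]])
C_yy = np.array([[1.0]])
y_star = np.array([2.0])

samples = matheron_sample(m_x, m_y, C_xx, C_xy, C_yy, y_star, n_samples=100_000)
print(samples.mean(axis=0))
print(m_x + C_xy @ np.linalg.inv(C_yy) @ (y_star - m_y))
```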
3 Kalman Filter
We begin by recalling that in state filtering the goal is to update our estimate of the system state when a new observation becomes available. In a general filtering problem, the objective is to form the posterior distribution \[ \Law[\vrv{x}_{t+1} \mid \vrv{x}_t, \vv{y}^*] \] from the prior \[ \Law[\vrv{x}_{t+1} \mid \vrv{x}_t] \] by incorporating the new information in the observation \(\vv{y}^*\).
We assume a known observation operator \(\op{H}\) such that observations \(\vrv{y}\) are a priori related to the state \(\vrv{x}\) by \[ \vrv{y} = \op{H}(\vrv{x}). \] Thus, there exists a joint law for the prior random vector, \[ \begin{bmatrix} \vrv{x} \\ \vrv{y} \end{bmatrix} = \begin{bmatrix} \vrv{x} \\ \op{H}(\vrv{x}) \end{bmatrix}, \tag{5}\] which is determined by the prior state distribution \(\Law[\vrv{x}]\) and the observation operator \(\op{H}\).
The analysis step—in state filtering parlance—is the update at time \(t\) of \(\Law[\vrv{x}_t]\) into the posterior \[ \Law[\vrv{x}_t \mid (\vrv{y}_t = \vv{y}_t^*)], \] i.e. incorporating the likelihood of the observation \(\vrv{y}_t = \vv{y}_t^*\). Although recursive updating in \(t\) is the focus of the classic Kalman filter, here we are concerned only with the observational update step. Henceforth we suppress the time index \(t\) and consider an individual analysis update.
Suppose that the state and observation noise are independent, and all variates are defined over a finite-dimensional real vector space, with \[ \vv{x}\in\mathbb{R}^{D_{\vrv{x}}},\quad \vv{y}\in \mathbb{R}^{D_{\vrv{y}}}. \] Furthermore, suppose that at the update step our prior belief about \(\vrv{x}\) is Gaussian with mean \(\vv{m}_{\vrv{x}}\) and covariance \(\mm{C}_{\vrv{xx}}\), that the observation noise is centred Gaussian with covariance \(\mm{R}\), and that the observation operator is linear with matrix \(\mm{H}\) so that the observation is related to the state via \[ \vv{y} = \mm{H}\,\vv{x} + \vv{\varepsilon}, \quad \vv{\varepsilon} \sim \mathcal{N}(0,\mm{R}). \]
Then the joint distribution of the prior state and observation is Gaussian, and Equation 5 implies that it is \[ \begin{bmatrix} \vrv{x} \\ \vrv{y} \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \vv{m}_{\vrv{x}} \\ \vv{m}_{\vrv{y}} \end{bmatrix}, \begin{bmatrix} \mm{C}_{\vrv{xx}} & \mm{C}_{\vrv{xy}} \\ \mm{C}_{\vrv{yx}} & \mm{C}_{\vrv{yy}} \end{bmatrix} \right) \] which can be rewritten as \[ \mathcal{N}\!\left( \begin{bmatrix} \vv{m}_{\vrv{x}} \\ \mm{H}\,\vv{m}_{\vrv{x}} \end{bmatrix}, \begin{bmatrix} \mm{C}_{\vrv{xx}} & \mm{C}_{\vrv{xx}}\,\mm{H}^\top \\ \mm{H}\,\mm{C}_{\vrv{xx}} & \mm{H}\,\mm{C}_{\vrv{xx}}\,\mm{H}^\top + \mm{R} \end{bmatrix} \right). \]
When an observation \(\vv{y}^*\) is obtained, we apply the standard formulae for Gaussian conditioning (Equation 2 and Equation 3) to obtain \[ \left(\vrv{x} \mid \vrv{y} = \vv{y}^*\right) \sim \Normal\Bigl(\vv{m}_{\vrv{x}\mid\vrv{y}}, \mm{C}_{\vrv{x}\mid\vrv{y}}\Bigr) \] with the conditional mean \[ \vv{m}_{\vrv{x}\mid \vrv{y}} = \vv{m}_{\vrv{x}} + \mm{K} \Bigl(\vv{y}^* - \vv{m}_{\vrv{y}}\Bigr) = \vv{m}_{\vrv{x}} + \mm{K} \Bigl(\vv{y}^* - \mm{H}\,\vv{m}_{\vrv{x}}\Bigr), \] and the conditional covariance \[ \mm{C}_{\vrv{x}\mid\vrv{y}} = \mm{C}_{\vrv{xx}} - \mm{K}\,\mm{C}_{\vrv{yx}} = \mm{C}_{\vrv{xx}} - \mm{K}\,\mm{H}\,\mm{C}_{\vrv{xx}}. \]
The Kalman gain is defined by \[ \mm{K} \coloneq \mm{C}_{\vrv{xy}} \left(\mm{C}_{\vrv{yy}}\right)^{-1}, \tag{6}\] or equivalently, \[ \mm{K} = \mm{C}_{\vrv{xx}}\,\mm{H}^\top \left(\mm{H}\,\mm{C}_{\vrv{xx}}\,\mm{H}^\top + \mm{R}\right)^{-1}. \] For simplicity we assume a constant diagonal \(\mm{R}\), so that \[ \mm{R} = \rho^2\,\mm{I}_{D_{\vrv{y}}}. \]
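For reference, the exact linear-Gaussian analysis step is a few lines of NumPy. This is a sketch with invented names, not anyone's production filter:

```python
import numpy as np

def kalman_analysis(m_x, C_xx, H, R, y_star):
    """Exact Gaussian analysis step: condition N(m_x, C_xx) on y* = H x + eps."""
    C_xy = C_xx @ H.T                  # cross-covariance C_xy = C_xx H^T
    C_yy = H @ C_xx @ H.T + R          # observation covariance C_yy
    K = C_xy @ np.linalg.inv(C_yy)     # Kalman gain (Equation 6)
    m_post = m_x + K @ (y_star - H @ m_x)
    C_post = C_xx - K @ H @ C_xx
    return m_post, C_post
```

Note that this forms and inverts the full \(D_{\vrv{y}}\times D_{\vrv{y}}\) observation covariance, which is exactly what the ensemble version below avoids.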
4 Ensemble Kalman Filter
In high-dimensional or nonlinear settings, directly computing these posterior updates is often intractable. The Ensemble Kalman Filter (EnKF) addresses this by representing the belief about the state empirically via an ensemble of \(N\) state vectors sampled from the prior distribution, \[ \mm{X} = \begin{bmatrix} \vv{x}^{(1)} & \vv{x}^{(2)} & \cdots & \vv{x}^{(N)} \end{bmatrix}, \] and working with the empirical measure \(\ELaw[\mm{X}] \approx \Law\).
Since Gaussian measures are completely specified by their first two moments, we construct empirical measures that match the target in these moments. For convenience, we introduce the following notation for the matrix mean and deviations: \[ \mmmean{X} \coloneq \frac{1}{N}\sum_{i=1}^N \vv{x}^{(i)}, \quad\quad \mmdev{X} \coloneq \frac{1}{\sqrt{N-1}} \Bigl( \mm{X} - \mmmean{X}\,\vv{1}^\top \Bigr), \tag{7}\] where \(\vv{1}^\top\) is a row vector of \(N\) ones.
The ensemble mean (see Equation 8) and covariance (see Equation 9) are computed from the empirical measure: \[ \widehat{\Ex}[\vrv{x}] \coloneq \Ex_{\vrv{x}\sim \ELaw[\mm{X}]}[ \vrv{x}] = \widehat{\vv{m}}_{\vrv{x}} = \mmmean{X}, \tag{8}\] and \[ \begin{aligned} \widehat{\var}(\vrv{x}) \coloneq \var_{\vrv{x}\sim \ELaw[\mm{X}]} (\vrv{x}) &= \widehat{\mm{C}}_{\vrv{xx}} \coloneq \frac{1}{N-1} \sum_{i=1}^{N} \Bigl(\vv{x}^{(i)} - \widehat{\vv{m}}_{\vrv{x}}\Bigr)\Bigl(\vv{x}^{(i)} - \widehat{\vv{m}}_{\vrv{x}}\Bigr)^\top\\[1mm] &= \mmdev{X} \mmdev{X}^\top + \bigl(\xi^2\,\mm{I}_{D_{\vrv{x}}}\bigr), \end{aligned} \tag{9}\] where \(\xi^2\) is a scalar constant that introduces regularization to ensure invertibility.
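In code, the ensemble statistics of Equation 7, Equation 8 and Equation 9 are short. A sketch, with a column-per-member layout and names of my own choosing:

```python
import numpy as np

def ensemble_moments(X, xi2=0.0):
    """Ensemble mean, scaled deviations, and regularised covariance (Equations 7-9).

    X has shape (D_x, N): one ensemble member per column.
    """
    N = X.shape[1]
    x_bar = X.mean(axis=1, keepdims=True)              # ensemble mean
    X_dev = (X - x_bar) / np.sqrt(N - 1)               # scaled deviations
    C_xx = X_dev @ X_dev.T + xi2 * np.eye(X.shape[0])  # empirical covariance + jitter
    return x_bar, X_dev, C_xx
```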
Abusing notation, we associate a Gaussian distribution with the empirical measure: \[ \ELaw[\mm{X}] \approx \Normal\Bigl(\widehat{\Ex}[\vrv{x}],\,\widehat{\var}(\vrv{x})\Bigr) = \Normal\Bigl(\mmmean{X},\,\mmdev{X}\mmdev{X}^\top + \xi^2\,\mm{I}_{D_{\vrv{x}}}\Bigr). \]
We overload the observation operator to apply to ensemble matrices, writing \[ \mm{Y} \coloneq \op{H}\mm{X} \coloneq \begin{bmatrix}\op{H}(\vv{x}^{(1)}) & \op{H}(\vv{x}^{(2)}) & \dots & \op{H}(\vv{x}^{(N)})\end{bmatrix}. \] This also induces an empirical joint distribution \[ \ELaw[{\begin{bmatrix} \mm{X}\\ \mm{Y} \end{bmatrix}}] \coloneq \Normal\!\left( \begin{bmatrix} \mmmean{X}\\ \mmmean{Y} \end{bmatrix}, \begin{bmatrix} \mmdev{X}\mmdev{X}^\top + \xi^2\mm{I}_{{D_{\vrv{x}}}} & \mmdev{X}\mmdev{Y}^\top \\ \mmdev{Y}\mmdev{X}^\top & \mmdev{Y}\mmdev{Y}^\top + \upsilon^2\mm{I}_{{D_{\vrv{y}}}} \end{bmatrix} \right). \tag{10}\]
Here the regularization term \(\upsilon^2\,\mm{I}_{D_{\vrv{y}}}\) is introduced to ensure the empirical covariance is invertible when the ensemble of observations is rank deficient (typically when \(D_{\vrv{y}} > N\)).
The Kalman gain in the ensemble setting is constructed by plugging the empirical estimates from Equation 10 into Equation 6. This yields \[ \widehat{\mm{K}} = \widehat{\mm{C}}_{\vrv{xx}}\, \op{H}^\top \left(\op{H}\,\widehat{\mm{C}}_{\vrv{xx}}\,\op{H}^\top + \upsilon^2\,\mm{I}_{D_{\vrv{y}}} + \rho^2\,\mm{I}_{D_{\vrv{y}}}\right)^{-1}. \tag{11}\] Here, \(\rho^2\,\mm{I}_{D_{\vrv{y}}}\) is the observation error covariance matrix and \(\upsilon^2\,\mm{I}_{D_{\vrv{y}}}\) regularizes the empirical observation covariance. Combining these two, we define \[ \gamma^2 \coloneq \upsilon^2 + \rho^2, \] so that the effective covariance in the observation space is \[ \widehat{\mm{C}}_{\vrv{yy}} = \mmdev{Y}\mmdev{Y}^\top + \gamma^2\,\mm{I}_{D_{\vrv{y}}}. \] We also define \[ \mm{Y}^* = \vv{y}^*\,\mathbf{1}^\top, \] so that each column of \(\mm{Y}^*\) equals the observation \(\vv{y}^*\). Then the analysis update for the ensemble is given by \[ \mm{X}' = \mm{X} + \widehat{\mm{K}} \left(\mm{Y}^* - \op{H}\mm{X}\right). \tag{12}\] In other words, each ensemble member is updated as \[ \vv{x}^{(i)}{}' \gets \vv{x}^{(i)} + \widehat{\mm{K}} \left(\vv{y}^* - \op{H}(\vv{x}^{(i)})\right). \] Equating moments, we see that \[ \ELaw[\mm{X}'] \approx \Law[\vrv{x} \mid \vrv{y} = \vv{y}^*], \] which justifies the EnKF update as an empirical approximation to the Gaussian posterior update.
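Putting Equation 11 and Equation 12 together, a minimal ensemble analysis step might look like the sketch below. It assumes a linear observation matrix `H` and reuses the hypothetical `ensemble_moments` from the earlier sketch:

```python
import numpy as np

def enkf_analysis(X, H, y_star, rho2, upsilon2=0.0, xi2=0.0):
    """Ensemble analysis update (Equations 11-12) for a linear observation matrix H.

    X: (D_x, N) prior ensemble; y_star: (D_y,) observation.
    """
    D_y = H.shape[0]
    _, X_dev, C_xx = ensemble_moments(X, xi2)
    gamma2 = upsilon2 + rho2
    # Empirical gain: C_xx H^T (H C_xx H^T + gamma^2 I)^{-1}  (Equation 11)
    K_hat = C_xx @ H.T @ np.linalg.inv(H @ C_xx @ H.T + gamma2 * np.eye(D_y))
    # Shift every ensemble member towards the observation (Equation 12).
    Y_star = np.tile(y_star.reshape(-1, 1), (1, X.shape[1]))
    return X + K_hat @ (Y_star - H @ X)
```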
5 Empirical Matheron Update is Equivalent to Ensemble Kalman Update
Under the substitution \[ \begin{aligned} \vv{m}_{\vrv{x}} &\to \mmmean{X}, & \quad \mm{C}_{\vrv{xx}} &\to \mmdev{X}\mmdev{X}^\top + \xi^2\,\mm{I}_{D_{\vrv{x}}},\\ \vv{m}_{\vrv{y}} &\to \mmmean{Y}, & \quad \mm{C}_{\vrv{yy}} &\to \mmdev{Y}\mmdev{Y}^\top + \gamma^2\,\mm{I}_{D_{\vrv{y}}},\\ \mm{C}_{\vrv{xy}} &\to \mmdev{X}\mmdev{Y}^\top, & \quad \mm{C}_{\vrv{yx}} &\to \mmdev{Y}\mmdev{X}^\top, \end{aligned} \] the Matheron update (Equation 4 from the earlier section) becomes identical to the ensemble update given in Equation 12.
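We can check this identity numerically with the sketches above. With the jitter terms \(\xi^2\) and \(\upsilon^2\) set to zero, the empirical Matheron update and the EnKF analysis update produce the same ensemble, up to floating point (toy dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D_x, D_y, N = 5, 3, 4
X = rng.normal(size=(D_x, N))
H = rng.normal(size=(D_y, D_x))
y_star = rng.normal(size=D_y)
rho2 = 0.1

# EnKF analysis update (Equation 12), from the earlier sketch.
X_enkf = enkf_analysis(X, H, y_star, rho2)

# Empirical Matheron update (Equation 4) under the substitutions above.
_, X_dev, _ = ensemble_moments(X)
Y = H @ X
_, Y_dev, _ = ensemble_moments(Y)
C_xy = X_dev @ Y_dev.T
C_yy = Y_dev @ Y_dev.T + rho2 * np.eye(D_y)
X_matheron = X + C_xy @ np.linalg.inv(C_yy) @ (y_star.reshape(-1, 1) - Y)

print(np.allclose(X_enkf, X_matheron))  # True
```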
6 Computational Complexity
As is well understood in the EnKF literature, one of the main advantages of the EnKF is that the analysis update can be performed using only empirical ensemble statistics, thereby avoiding the direct computation and inversion of full, high-dimensional covariance matrices. The naive Kalman update must operate on these full covariance matrices, which can be computationally prohibitive.
In practice, we calculate the gain from Equation 11 as follows: \[ \begin{aligned} \widehat{\mm{K}} &= \widehat{\mm{C}}_{\vrv{xx}}\, \op{H}^\top \left(\op{H}\,\widehat{\mm{C}}_{\vrv{xx}}\,\op{H}^\top + \gamma^2\,\mm{I}_{D_{\vrv{y}}}\right)^{-1}\\[1mm] &= \mmdev{X}\mmdev{Y}^\top \left(\mmdev{Y}\mmdev{Y}^\top + \gamma^2\,\mm{I}_{D_{\vrv{y}}}\right)^{-1}\\[1mm] &= \mmdev{X}\mmdev{Y}^\top \left[\gamma^{-2}\,\mm{I}_{D_{\vrv{y}}} - \gamma^{-4}\,\mmdev{Y}\left(\mm{I}_N + \gamma^{-2}\,\mmdev{Y}^\top\mmdev{Y}\right)^{-1}\mmdev{Y}^\top\right], \end{aligned} \] where the last line uses the Woodbury identity to replace the \(D_{\vrv{y}}\times D_{\vrv{y}}\) inverse with an \(N\times N\) one. Here, \(\gamma^2 = \upsilon^2 + \rho^2\) aggregates the regularization term for the empirical observation covariance and the intrinsic observation noise.
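A sketch of that ordering in NumPy (a hypothetical function, again reusing `ensemble_moments` from earlier): we never invert a \(D_{\vrv{y}}\times D_{\vrv{y}}\) matrix, only solve an \(N\times N\) system applied to the innovations.

```python
import numpy as np

def enkf_analysis_fast(X, H, y_star, gamma2):
    """Analysis update via an N x N solve, avoiding any D_y x D_y inverse.

    Uses the Woodbury identity
    (Y_dev Y_dev^T + g^2 I)^{-1} = g^{-2} I - g^{-4} Y_dev (I_N + g^{-2} Y_dev^T Y_dev)^{-1} Y_dev^T.
    """
    N = X.shape[1]
    _, X_dev, _ = ensemble_moments(X)
    Y = H @ X
    _, Y_dev, _ = ensemble_moments(Y)
    innovations = y_star.reshape(-1, 1) - Y                 # (D_y, N)
    # Apply (Y_dev Y_dev^T + gamma^2 I)^{-1} to the innovations via Woodbury.
    small = np.eye(N) + Y_dev.T @ Y_dev / gamma2            # (N, N)
    solved = (innovations
              - Y_dev @ np.linalg.solve(small, Y_dev.T @ innovations) / gamma2
              ) / gamma2
    # Gain times innovations: X_dev Y_dev^T (...)^{-1} (Y* - Y)
    return X + X_dev @ (Y_dev.T @ solved)
```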
Thus, the overall computational cost of the ensemble update is \[ \mathcal{O}\Bigl(D_{\vrv{y}}N^2 + N^3 + D_{\vrv{x}}D_{\vrv{y}}N\Bigr). \] Since in practical applications the ensemble size \(N\) is typically much smaller than the state and observation dimensions \(D_{\vrv{x}}\) and \(D_{\vrv{y}}\), these costs are significantly lower than those incurred by the naive Kalman update, which is dominated by \(\mathcal{O}(D_{\vrv{y}}^3)\) operations. By carefully ordering operations to work with the \(N\times N\) system, the EnKF update is both computationally efficient and scalable to high-dimensional state spaces.
7 Implications
Understanding the EnKF as an empirical Matheron update opens up several possibilities. Both the Matheron update and Ensemble Kalman methods have been extensively studied in their respective fields, and in recent years numerous developments in Bayesian machine learning have advanced both approaches; see the Ensemble Kalman notebook and especially the Neural networks meet Kalman filters notebook.
If we interpret the pathwise Matheron update in the context of the empirical measures used in the EnKF, it suggests the potential to import developments from one field to the other. The vast literature on the EnKF indicates that many ideas developed there might be applicable to Gaussian Process regression via the Matheron update. For instance, the clear probabilistic interpretation of the ensemble update could inspire improved regularization and localization strategies. One example is the Local Ensemble Transform Kalman Filter (Bocquet, Farchi, and Malartic n.d.), which provides state-of-the-art performance in high-dimensional filtering and system identification settings and could be adapted to the Matheron update (homework exercise).
Where to go from here? I don’t know; but cite me if this makes your life easier.
8 References
Footnotes
This measure notation from probability theory is unpopular in both machine learning and data assimilation contexts, but is succinct here.↩︎