Effective sample size
November 21, 2016 — March 3, 2020
We have an estimator \(\hat{\theta}\) of some statistic, which for the sake of argument we take to be the mean, calculated from observations of some stochastic process \(\mathsf{v}\). Under certain assumptions, central limit theorems tell us that the variance of the estimator calculated from \(N\) i.i.d. samples is \(\operatorname{Var}(\hat{\theta})\propto 1/N.\) Effective Sample Size (ESS) gives us a different \(N\), \(N_{\text{eff}}\), such that \(\operatorname{Var}(\hat{\theta})\propto 1/N_{\text{eff}}.\)
1 Statistics
When your data are highly correlated (e.g. because they form a time series, or because of non-random sampling in the experiment design), they may give you less information than you would hope, or expect from the uncorrelated case, with regard to a particular statistic you wish to calculate. In practice, all the introductions use the mean.
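A quick numerical check of that claim, as a minimal sketch (Python and numpy are my choice here, not anything from the sources below): for an AR(1) process with coefficient \(\phi\), the variance of the sample mean is inflated, relative to i.i.d. sampling, by a factor of \((1+\phi)/(1-\phi)\).

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi):
    """Simulate a stationary AR(1) process x_t = phi * x_{t-1} + e_t."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1 - phi**2)  # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

phi, n, reps = 0.8, 1000, 2000
means = np.array([ar1(n, phi).mean() for _ in range(reps)])

marginal_var = 1.0 / (1 - phi**2)  # Var(x_t) with unit innovation variance
inflation = (1 + phi) / (1 - phi)  # known variance inflation factor for AR(1)
print("empirical Var of the mean:", means.var())
print("naive sigma^2 / N:        ", marginal_var / n)
print("inflated sigma^2 / N:     ", inflation * marginal_var / n)
```

With \(\phi=0.8\) the inflation factor is 9, so those 1000 correlated observations are worth about 111 independent ones as far as the mean is concerned.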
This is a kind of dual to effective degrees of freedom, which tells you how far your sample size can get you.
Turns out to be important in, e.g. LASSO and, circularly, covariance estimation.
- Tom Leinster, effective sample size.
Huber’s (1981) “equivalent number of observations” is probably the same?
2 Monte Carlo estimation
Related, but a slightly different setup: this is not about experimental samples, but about the number of simulations in simulation-based inference, where you are using e.g. importance sampling or a sequential Markov chain sampler. See Sebastian Nowozin, Effective Sample Size in Importance Sampling, and Kenneth Tay, Effective sample size for Markov Chain Monte Carlo.
[The effective sample size] can be used after or during importance sampling to provide a quantitative measure of the quality of the estimated mean. Even better, the estimate is provided on a natural scale of worth in samples from p, that is, if we use \(n=1000\) samples \(X_i\sim q\) and obtain an ESS of say 350 then this indicates that the quality of our estimate is about the same as if we would have used 350 direct samples.
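For concreteness, the estimator these references discuss is the weight-based one: given (possibly unnormalized) importance weights \(w_i = p(X_i)/q(X_i)\), we estimate \(N_{\text{eff}} = \left(\sum_i w_i\right)^2 \big/ \sum_i w_i^2.\) A minimal sketch in code (the example target and proposal are my own choice):

```python
import numpy as np

def importance_ess(weights):
    """Weight-based ESS: (sum w)^2 / sum(w^2). Scale-invariant, so
    unnormalized weights are fine."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w**2).sum()

# Example: target p = N(0, 1), proposal q = N(0, 2^2).
rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=1000)
log_w = -0.5 * x**2 + 0.5 * (x / 2.0) ** 2  # log p(x) - log q(x), up to a constant
w = np.exp(log_w - log_w.max())             # subtract the max before exponentiating
print(importance_ess(w))                    # noticeably fewer than the nominal 1000
```

The max-subtraction before exponentiating matters in real problems, where raw weights can easily overflow or underflow.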
Since Markov chain Monte Carlo is so common in statistical inference these days, this simulation sense of effective sample size might now be the more common usage, rather than the directly statistical notion of the term.
In practice, we usually cargo-cult in the formulae for ESS from a paper on how best to do it. The Stan method for ESS in MCMC, which is a best-practice method AFAICT, is based on the autocorrelogram:
\[ \tau = 1 + 2 \sum_{t=1}^{m} \rho_{t} \] and
\[N_{\text{eff}}=\frac{N}{\tau}.\]
Here \(\rho_{t}\) is the autocorrelation at lag \(t\), and \(m\) is a truncation lag beyond which the estimated autocorrelations are too noisy to be worth summing. We are implicitly assuming the statistic of interest here is the mean. A short calculation should persuade us that this gives us the convergence rate we expect.
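Sketching that calculation, for a stationary process with marginal variance \(\sigma^2\):

\[
\operatorname{Var}(\bar{x})
= \frac{1}{N^{2}} \sum_{s=1}^{N} \sum_{t=1}^{N} \operatorname{Cov}(x_{s}, x_{t})
= \frac{\sigma^{2}}{N}\left(1 + 2\sum_{t=1}^{N-1}\left(1 - \frac{t}{N}\right)\rho_{t}\right)
\approx \frac{\sigma^{2}\tau}{N}
= \frac{\sigma^{2}}{N_{\text{eff}}},
\]

where the approximation holds for large \(N\) when the autocorrelations decay quickly enough that the \(t/N\) terms are negligible. So \(N_{\text{eff}}=N/\tau\) is precisely the number of i.i.d. draws that would give the same variance for the mean.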
In practice, you are doing Markov chain Monte Carlo because the problem is difficult enough that you cannot calculate an effective sample size analytically. So you estimate it, which means, in turn, estimating those autocorrelations. We can use the FFT to calculate the autocorrelation efficiently (Geyer 2011). There are various fiddly details, especially if running multiple Markov chains.
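Here is a minimal single-chain sketch of that pipeline (the function names are mine, and the truncation rule is Geyer's initial-positive-sequence heuristic rather than Stan's full multi-chain recipe):

```python
import numpy as np

def autocorrelation_fft(x):
    """Sample autocorrelation of a 1-d chain via the FFT, in O(N log N)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad to avoid circular wrap-around
    f = np.fft.rfft(x, nfft)
    acov = np.fft.irfft(f * np.conj(f), nfft)[:n] / n
    return acov / acov[0]

def ess_geyer(x):
    """ESS of one chain: sum autocorrelations in adjacent pairs, stopping at
    the first non-positive pair sum (Geyer's initial positive sequence)."""
    rho = autocorrelation_fft(x)
    n = len(x)
    tau = -1.0  # the pair sums below double-count rho_0 = 1, hence the -1
    for t in range(0, n - 1, 2):
        pair = rho[t] + rho[t + 1]
        if pair <= 0.0:  # estimation noise has taken over; truncate here
            break
        tau += 2.0 * pair
    return n / tau

# Usage: a sticky AR(1) "chain" with phi = 0.9, so tau = 19 in theory.
rng = np.random.default_rng(1)
chain = np.empty(10_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(ess_geyer(chain))  # roughly 10_000 / 19, i.e. about 500
```

Stan's production version additionally pools within- and between-chain variance estimates (so that chains stuck in different modes deflate the ESS) and enforces monotonicity of the paired sums, which is where most of the fiddly details live.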