(Outlier) robust statistics
November 25, 2014 — January 21, 2022
Terminology note: I mean robust statistics in the sense of Huber, which is, informally, outlier robustness.
There are also robust estimators in econometrics, where the term means something about good behaviour under heteroskedastic and/or correlated errors. Robust Bayes means something about inference that is robust to the choice of prior (which could overlap, but has a rather different emphasis).
Outlier robustness is, as far as I can tell, more or less a frequentist project. Bayesian approaches seem to achieve robustness largely by choosing heavy-tailed priors or heavy-tailed noise distributions where they might have chosen light-tailed ones, e.g. Laplace distributions instead of Gaussian ones. Such heavy-tailed distributions may have arbitrary parameters, but no more arbitrary than is usual in Bayesian statistics, and so they do not attract the same need to wash away the guilt that frequentists seem to feel.
One can of course use heavy-tailed noise distributions in frequentist inference as well, and that buys a kind of robustness. It seems to be unpopular, presumably because it makes frequentist inference as difficult as Bayesian inference.
1 Corruption models
- Random (mixture) corruption, i.e. Huber-style \(\epsilon\)-contamination (sketched after this list).
- (Adversarial) total variation \(\epsilon\)-corruption.
- Wasserstein corruption models (does one usually assume adversarial or random corruption here?), as seen in “distributionally robust” models.
- other?
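For concreteness, the first of these is the classic Huber-style \(\epsilon\)-contamination model: instead of the nominal distribution \(F\), the data are drawn from the mixture

\[
(1 - \epsilon) F + \epsilon G,
\]

where \(G\) is an arbitrary contaminating distribution and \(\epsilon \in [0, 1)\) is the corruption fraction. In the adversarial total-variation version, the corrupting mechanism is allowed to depend on the clean sample, which is strictly harder to cope with than i.i.d. mixture draws.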
2 M-estimation with robust loss
The one that I, at least, would think of when considering robust estimation.
In M-estimation, instead of hunting a maximum of the likelihood function, as you do in maximum likelihood, or a minimum of the sum of squared residuals, as you do in least-squares estimation, you minimise a specifically chosen loss function of the residuals. You can select an objective that is more robust to deviations between your model and reality. Credited to Huber (1964).
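In symbols, sketching the standard setup with residuals \(r_i(\theta)\) and a chosen loss \(\rho\):

\[
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \rho\bigl(r_i(\theta)\bigr),
\]

so \(\rho(r) = r^2\) recovers least squares and \(\rho(r) = -\log p(r)\) recovers maximum likelihood for an additive noise density \(p\).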
See M-estimation for some details.
As far as I can tell, the definition of M-estimation includes the possibility that you could in principle select a less robust loss function than the sum of squares, but I have not seen this in the literature. Generally, some robustified loss is presumed, one which penalises outliers less severely than least squares does.
For M-estimation as robust estimation, various complications ensue, such as the difference between noise in your predictors and noise in your responses, whether the “true” model is included in your class, and which of these difficulties you have actually resolved.
Loosely speaking, no, you haven’t solved problems of noise in your predictors, only the problem of noise in your responses.
And the cost is that you now have a loss function with some extra arbitrary parameters which you have to justify, which is anathema to frequentists, who like to claim to be less arbitrary than Bayesians.
2.1 Huber loss
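The standard definition, with a user-chosen threshold \(\delta > 0\) (conventions for the scaling differ between references):

\[
\rho_{\delta}(r) =
\begin{cases}
\tfrac{1}{2} r^2 & |r| \le \delta \\
\delta \bigl( |r| - \tfrac{1}{2} \delta \bigr) & |r| > \delta,
\end{cases}
\]

i.e. quadratic near zero and linear in the tails, so a gross outlier has bounded influence on the gradient instead of dragging the fit around quadratically.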
2.2 Tukey loss
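Tukey’s biweight (bisquare) loss goes further and flattens out entirely beyond a cutoff \(c > 0\), so gross outliers contribute nothing at all to the gradient (again, scaling conventions vary):

\[
\rho_{c}(r) =
\begin{cases}
\dfrac{c^2}{6} \left[ 1 - \left( 1 - (r/c)^2 \right)^3 \right] & |r| \le c \\
\dfrac{c^2}{6} & |r| > c.
\end{cases}
\]

The price is a non-convex objective, so the optimiser can get stuck in local minima; as I understand it, that is part of the motivation for the MM-estimation recipe below.

A minimal numpy sketch of both losses, just to pin down the shapes. The thresholds 1.345 and 4.685 are the textbook defaults tuned for roughly 95% efficiency at the Gaussian; everything else here is illustrative:

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = np.asarray(r, dtype=float)
    quadratic = 0.5 * r**2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quadratic, linear)

def tukey_loss(r, c=4.685):
    """Tukey biweight loss: quadratic-ish near zero, saturating at c**2 / 6."""
    r = np.asarray(r, dtype=float)
    inside = (c**2 / 6) * (1 - (1 - (r / c)**2)**3)
    return np.where(np.abs(r) <= c, inside, c**2 / 6)

if __name__ == "__main__":
    r = np.array([0.1, 1.0, 3.0, 10.0, 100.0])
    print(huber_loss(r))  # keeps growing, but only linearly, for the outliers
    print(tukey_loss(r))  # saturates for the outliers
```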
3 MM-estimation
🚧TODO🚧 clarify
4 Median-based estimators
Rousseeuw and Yohai’s idea (P. Rousseeuw and Yohai 1984)
Many permutations on the theme here, but it rapidly gets complex. The only ones of these families I have looked into are the near-trivial cases of Least Median of Squares and Least Trimmed Squares estimation (P. J. Rousseeuw 1984); their objectives are sketched below. More broadly we should also consider S-estimators, which do something with… robust estimation of scale, then using that to do robust estimation of location? 🚧TODO🚧 clarify
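For the record, ordering the squared residuals as \(r_{(1)}^2 \le \dots \le r_{(n)}^2\): Least Median of Squares minimises the median squared residual, and Least Trimmed Squares minimises the sum of the \(h\) smallest squared residuals for some \(h\) between \(n/2\) and \(n\),

\[
\hat{\theta}_{\mathrm{LMS}} = \arg\min_{\theta}\ \operatorname{med}_{i} r_i(\theta)^2,
\qquad
\hat{\theta}_{\mathrm{LTS}} = \arg\min_{\theta}\ \sum_{i=1}^{h} r_{(i)}(\theta)^2.
\]

Both have high breakdown points, and both have awkward non-smooth objectives, which is why they are usually attacked by randomised subset search rather than gradient descent.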
Theil–Sen (Oja) estimators: something about medians of pairwise regression slopes; see the sketch below. 🚧TODO🚧 clarify
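As far as the simple univariate version goes, the recipe is: take the median of all pairwise slopes, then the median of the implied intercepts. A minimal numpy sketch, brute-forcing all pairs (so \(O(n^2)\); the function name and defaults are mine):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen line fit: median of pairwise slopes, then median intercept."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 50)
    y[::10] += 50.0                 # 10% gross outliers
    print(theil_sen(x, y))          # still close to (2.0, 1.0)
```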
The Tukey median, and why no one uses it, what with it being NP-hard to compute.
5 Others
RANSAC — some kind of randomised outlier detection estimator. 🚧TODO🚧 clarify
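As I understand the loop: repeatedly fit the model to a random minimal subset of the data, count how many points fall within an inlier threshold of that fit, keep the candidate with the largest consensus set, and refit on its inliers. A minimal sketch for line fitting; the threshold and iteration count are arbitrary choices of mine:

```python
import numpy as np

def ransac_line(x, y, n_iter=200, threshold=1.0, seed=None):
    """Crude RANSAC for y = a*x + b: fit random pairs, keep the fit with
    the largest inlier consensus, then refit by least squares on those inliers."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    a, b = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return a, b, best_inliers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 100)
    y = 3.0 * x - 2.0 + rng.normal(0.0, 0.3, 100)
    y[:20] += rng.uniform(20.0, 40.0, 20)   # 20% gross outliers
    a, b, inliers = ransac_line(x, y, seed=2)
    print(a, b, inliers.sum())              # roughly 3.0, -2.0, ~80 inliers
```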
6 Bayes
See robust Bayes.
7 Incoming
- relation to penalized regression.
- connection with Lasso.
- Beran’s Hellinger-ball contamination model, which I also don’t yet understand.
- Breakdown point explanation
- Yet Another Math Programming Consultant: Huber regression: different formulations