Bayesian model selection

August 20, 2017 — November 19, 2024


Frequentist model selection is not the only kind, although it is easy to miss this in an introductory Bayesian statistics course. What is model selection in a Bayesian context? Surely we never get models with exactly zero posterior probability? In my intro Bayesian classes, I learned that one simply keeps all the models, weighted by posterior probability, when making predictions. But sometimes we wish to get rid of some models. When does this work, and when does it not?
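To make "keep all the models, weighted by posterior probability" concrete, here is a minimal sketch of Bayesian model averaging over polynomial regressions. Everything specific in it is an assumption for illustration: the toy data, the Gaussian prior with precision `alpha`, the known noise scale `sigma`, and the uniform prior over models.

```python
import numpy as np

def design(x, degree):
    # Polynomial features 1, x, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def log_evidence(x, y, degree, alpha=1.0, sigma=0.5):
    # Weights w ~ N(0, I/alpha), noise ~ N(0, sigma^2 I), so marginally
    # y ~ N(0, sigma^2 I + Phi Phi^T / alpha)
    Phi = design(x, degree)
    n = len(y)
    C = sigma**2 * np.eye(n) + Phi @ Phi.T / alpha
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def posterior_mean(x, y, degree, alpha=1.0, sigma=0.5):
    # Posterior mean of the weights under the same conjugate model
    Phi = design(x, degree)
    A = Phi.T @ Phi / sigma**2 + alpha * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ y / sigma**2)

# Toy data (assumed): a noisy straight line
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 * x + rng.normal(scale=0.5, size=x.size)

degrees = [0, 1, 2, 3]
log_ev = np.array([log_evidence(x, y, d) for d in degrees])

# Uniform prior over models: posterior probability is proportional to evidence
weights = np.exp(log_ev - log_ev.max())
model_probs = weights / weights.sum()

# Model-averaged prediction of the mean at a new input
x_new = np.array([0.5])
per_model = np.array(
    [design(x_new, d) @ posterior_mean(x, y, d) for d in degrees]
)
bma_prediction = model_probs @ per_model
```

In this sketch no model gets exactly zero posterior probability, although some weights may be tiny. Discarding a model is then a separate decision layered on top of the posterior, not something the posterior does for us.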

🏗

1 Sparsity

Setting: in the phenomenon we observe, we suspect that most of the many regressors “don’t matter” in some sense. This is an interesting special case; see Bayesian sparsity.
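One common way to formalise "most regressors don't matter" is a spike-and-slab prior, where each coefficient is exactly zero with some prior probability. The following is a minimal sketch that enumerates every subset of a handful of regressors. The toy data, the slab variance `slab_var`, the noise scale `sigma`, and the prior inclusion probability `prior_incl` are all assumptions for illustration, and exhaustive enumeration is only feasible for a small number of regressors.

```python
import itertools
import numpy as np

def log_evidence(X, y, subset, slab_var=1.0, sigma=0.5):
    # Included coefficients ~ N(0, slab_var); excluded ones are exactly 0.
    # Marginally y ~ N(0, sigma^2 I + slab_var * Xs Xs^T)
    n = len(y)
    Xs = X[:, list(subset)]
    C = sigma**2 * np.eye(n) + slab_var * Xs @ Xs.T
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Toy data (assumed): only the first of four regressors matters
rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + rng.normal(scale=0.5, size=n)

prior_incl = 0.2  # prior probability that any given regressor is included
subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]

# Unnormalised log posterior of each subset: log evidence + log prior
log_post = np.array([
    log_evidence(X, y, s)
    + len(s) * np.log(prior_incl)
    + (p - len(s)) * np.log(1 - prior_incl)
    for s in subsets
])
probs = np.exp(log_post - log_post.max())
probs /= probs.sum()

# Posterior inclusion probability of each regressor
pip = np.array([
    sum(pr for s, pr in zip(subsets, probs) if j in s) for j in range(p)
])
```

Here `pip[j]` is the posterior probability that regressor `j` has a nonzero coefficient. Thresholding it (for example at 0.5) is one possible selection rule, not the only one.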

2 Cross-validation and Bayes

There is a relation between cross-validation and Bayes evidence, a.k.a. marginal likelihood — see (Claeskens and Hjort 2008; Fong and Holmes 2020).
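One way to see why such a relation is plausible, offered as a sketch in our own notation and not as a statement of the cited results: by the chain rule of probability, the log evidence of a model $M$ for data $y_1, \dots, y_n$ decomposes into a sum of one-step-ahead log predictive scores. Assuming the observations are conditionally independent given the parameter $\theta$,

$$
\log p(y_{1:n} \mid M) = \sum_{i=1}^{n} \log p(y_i \mid y_{1:i-1}, M),
\qquad
p(y_i \mid y_{1:i-1}, M) = \int p(y_i \mid \theta, M)\, p(\theta \mid y_{1:i-1}, M)\, \mathrm{d}\theta .
$$

Each term scores a held-out point under the posterior fitted to the earlier points, which is what cross-validation does with a log predictive scoring rule. For exchangeable data the identity holds for every ordering of the observations. As we understand it, Fong and Holmes (2020) make this precise by relating the log evidence to leave-$p$-out cross-validation scores accumulated over all $p$.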

3 Evidence/marginal likelihood/type II maximum likelihood/Bayes factor

A classic, and worth its own notebook. See model selection by model evidence maximisation.
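As a small worked instance of comparing models by evidence, here is a sketch of a Bayes factor for coin flips: a fair-coin model against a model with unknown bias. The data (`k` heads in `n` flips) and the Beta(`a`, `b`) prior are assumptions chosen for illustration.

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    # Log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_fair(k, n):
    # M0: heads probability fixed at 0.5, no free parameters
    return log(comb(n, k)) + n * log(0.5)

def log_evidence_beta(k, n, a=1.0, b=1.0):
    # M1: heads probability ~ Beta(a, b); the marginal is Beta-Binomial
    return log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)

k, n = 15, 20  # assumed data: 15 heads in 20 flips
log_bf_10 = log_evidence_beta(k, n) - log_evidence_fair(k, n)
bayes_factor_10 = exp(log_bf_10)  # > 1 favours the unknown-bias model
```

With a uniform prior (`a = b = 1`) the evidence for the unknown-bias model is `1 / (n + 1)` for any `k`, which gives a quick sanity check. Type II maximum likelihood would instead maximise `log_evidence_beta` over `a` and `b` and keep a single fitted prior.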

4 Incoming

John Mount on applied variable selection (Mount 2020):

We have also always felt a bit exposed in this, as feature selection seems unjustified in standard explanations of regression. One feels that if a coefficient were meant to be zero, the fitting procedure would have set it to zero. Under this misapprehension, stepping in and removing some variables feels unjustified.

Regardless of intuition or feelings, it is a fair question: is variable selection a natural justifiable part of modelling? Or is it something that is already done (therefore redundant)? Or is it something that is not done for important reasons (such as avoiding damaging bias)?

In this note we will show that feature selection is in fact an obvious justified step when using a sufficiently sophisticated model of regression. This note is long, as it defines so many tiny elementary steps. However this note ends with a big point: variable selection is justified. It naturally appears in the right variation of Bayesian Regression. You should select variables, using your preferred methodology. And you shouldn’t feel bad about selecting variables.

5 References

Bartolucci, Scaccia, and Mira. 2006. “Efficient Bayes Factor Estimation from the Reversible Jump Output.” Biometrika.
Bhadra, Datta, Polson, et al. 2016. “Default Bayesian Analysis with Global-Local Shrinkage Priors.” Biometrika.
Bhattacharya, Page, and Dunson. 2011. “Density Estimation and Classification via Bayesian Nonparametric Learning of Affine Subspaces.”
Bondell, and Reich. 2012. “Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions.” Journal of the American Statistical Association.
Bürkner, Gabry, and Vehtari. 2020. “Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation.
Carvalho, Polson, and Scott. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika.
Castillo, Schmidt-Hieber, and van der Vaart. 2015. “Bayesian Linear Regression with Sparse Priors.” The Annals of Statistics.
Chipman, George, McCulloch, et al. 2001. “The Practical Implementation of Bayesian Model Selection.” In Model Selection. IMS Lecture Notes - Monograph Series.
Claeskens, and Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics.
Efron. 2012. “Bayesian Inference and the Parametric Bootstrap.” The Annals of Applied Statistics.
Filippone, and Engler. 2015. “Enabling Scalable Stochastic Gradient-Based Inference for Gaussian Processes by Employing the Unbiased LInear System SolvEr (ULISSE).” In Proceedings of the 32nd International Conference on Machine Learning.
Fong, and Holmes. 2020. “On the Marginal Likelihood and Cross-Validation.” Biometrika.
Gelman, and Rubin. 1995. “Avoiding Model Selection in Bayesian Social Research.” Sociological Methodology.
George, and McCulloch. 1997. “Approaches for Bayesian Variable Selection.” Statistica Sinica.
Hirsh, Barajas-Solano, and Kutz. 2022. “Sparsifying Priors for Bayesian Uncertainty Quantification in Model Discovery.” Royal Society Open Science.
Ishwaran, and Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics.
Jankowiak. 2022. “Bayesian Variable Selection in a Million Dimensions.”
Kadane, and Lazar. 2004. “Methods and Criteria for Model Selection.” Journal of the American Statistical Association.
Laud, and Ibrahim. 1995. “Predictive Model Selection.” Journal of the Royal Statistical Society. Series B (Methodological).
Li, and Dunson. 2016. “A Framework for Probabilistic Inferences from Imperfect Models.” arXiv:1611.01241 [Stat].
Liu, and Ročková. 2023. “Variable Selection Via Thompson Sampling.” Journal of the American Statistical Association.
Lorch, Rothfuss, Schölkopf, et al. 2021. “DiBS: Differentiable Bayesian Structure Learning.” In.
Lv, and Liu. 2014. “Model Selection Principles in Misspecified Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology.
MacKay. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems.
MacKay. 1999. “Comparison of Approximate Methods for Handling Hyperparameters.” Neural Computation.
Madigan, and Raftery. 1994. “Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window.” Journal of the American Statistical Association.
Meyers, Bryan, McFarland, et al. 2017. “Computational Correction of Copy Number Effect Improves Specificity of CRISPR–Cas9 Essentiality Screens in Cancer Cells.” Nature Genetics.
Mount. 2020. “Don’t Feel Guilty About Selecting Variables.”
Navarro. 2019. “Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection.” Computational Brain & Behavior.
Ohn, and Kim. 2021. “Posterior Consistency of Factor Dimensionality in High-Dimensional Sparse Factor Models.” Bayesian Analysis.
Ohn, and Lin. 2021. “Adaptive Variational Bayes: Optimality, Computation and Applications.” arXiv:2109.03204 [Math, Stat].
Ormerod, Stewart, Yu, et al. 2017. “Bayesian Hypothesis Tests with Diffuse Priors: Can We Have Our Cake and Eat It Too?” arXiv:1710.09146 [Math, Stat].
Page, Bhattacharya, and Dunson. 2013. “Classification via Bayesian Nonparametric Learning of Affine Subspaces.” Journal of the American Statistical Association.
Piironen, and Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing.
Polson, and Scott. 2012. “Local Shrinkage Rules, Lévy Processes and Regularized Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Raftery. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology.
Robert, Wraith, Goggans, et al. 2009. “Computational Methods for Bayesian Model Choice.” In AIP Conference Proceedings.
Ročková, and George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association.
Schmidt, and Makalic. 2020. “Log-Scale Shrinkage Priors and Adaptive Bayesian Global-Local Shrinkage Estimation.”
Simchoni, and Rosset. 2023. “Integrating Random Effects in Deep Neural Networks.”
Stein. 2008. “A Modeling Approach for Large Spatial Datasets.” Journal of the Korean Statistical Society.
Tang, Xu, Ghosh, et al. 2016. “Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors.”
Thomas, You, Lin, et al. 2022. “Learning Subspaces of Different Dimensions.” Journal of Computational and Graphical Statistics.
van der Linden, and Chryst. 2017. “No Need for Bayes Factors: A Fully Bayesian Evidence Synthesis.” Frontiers in Applied Mathematics and Statistics.
van Wieringen. 2021. “Lecture Notes on Ridge Regression.” arXiv:1509.09169 [Stat].
Vehtari, and Ojanen. 2012. “A Survey of Bayesian Predictive Methods for Model Assessment, Selection and Comparison.” Statistics Surveys.
Xu, Schmidt, Makalic, et al. 2017. “Bayesian Sparse Global-Local Shrinkage Regression for Selection of Grouped Variables.”
Zanella, and Roberts. 2019. “Scalable Importance Tempering and Bayesian Variable Selection.” Journal of the Royal Statistical Society Series B: Statistical Methodology.