Directed graphical models

September 19, 2017 — May 13, 2020

Figure 1

Graphs of directed conditional independence are a convenient formalism for many statistical models. If you have some kind of generating process for a model, the most natural graphical model to express it with is often a directed acyclic graph (DAG). These are also called Bayes nets (not to be confused with Bayesian inference).
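
To make the "generating process" reading concrete, here is a minimal sketch in Python. The three-node rain/sprinkler/wet-grass graph and all of its probabilities are hypothetical, chosen purely for illustration. It shows the two things a DAG gives you: the joint density factorizes into one conditional per node given its parents, and you can simulate by sampling nodes in topological order.

```python
import random

# Hypothetical three-node DAG: rain -> sprinkler, (rain, sprinkler) -> wet.
# Each node maps to (parents, CPT); the CPT gives P(node=True | parent values).
# All probabilities are made up for illustration.
MODEL = {
    "rain": ((), {(): 0.2}),
    "sprinkler": (("rain",), {(True,): 0.01, (False,): 0.4}),
    "wet": (("rain", "sprinkler"), {
        (True, True): 0.99, (True, False): 0.8,
        (False, True): 0.9, (False, False): 0.0,
    }),
}
ORDER = ["rain", "sprinkler", "wet"]  # a topological order of the DAG


def joint_prob(assignment):
    """P(x) as the product over nodes of P(x_v | x_parents(v))."""
    p = 1.0
    for node in ORDER:
        parents, cpt = MODEL[node]
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def ancestral_sample(rng):
    """Sample each node after its parents, following the arrows."""
    sample = {}
    for node in ORDER:
        parents, cpt = MODEL[node]
        p_true = cpt[tuple(sample[q] for q in parents)]
        sample[node] = rng.random() < p_true
    return sample
```

Calling `ancestral_sample(random.Random(0))` returns one draw as a dict of booleans, and summing `joint_prob` over all eight assignments gives 1, as a normalized factorization should.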

These can even be causal graphical models; when we can infer those, we are extracting Science (ONO) from observational data. See causal graphical models.

The laws of message-passing inference assume, in my opinion, their most complicated form for directed models; in practice it is frequently easier to convert a directed model to a factor graph for implementation. YMMV.
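
The conversion itself is mechanical, which is part of its appeal. As a rough sketch (the graph and the names `PARENTS` and `dag_to_factor_graph` are hypothetical, not taken from any particular library): each conditional P(child | parents) becomes one factor whose scope is the child together with its parents, and the arrow directions are absorbed into the factor tables.

```python
# Hypothetical DAG given as child -> tuple of parents.
PARENTS = {
    "rain": (),
    "sprinkler": ("rain",),
    "wet": ("rain", "sprinkler"),
}


def dag_to_factor_graph(parents):
    """Return (factors, neighbours) for the bipartite factor graph.

    factors maps a factor name to its scope: the child plus its parents,
    i.e. the variables appearing in the conditional P(child | parents).
    neighbours maps each variable to the factors that touch it.
    Assumes every parent also appears as a key of `parents`.
    """
    factors = {f"f_{child}": (child, *pa) for child, pa in parents.items()}
    neighbours = {v: [] for v in parents}
    for name, scope in factors.items():
        for v in scope:
            neighbours[v].append(name)
    return factors, neighbours


factors, neighbours = dag_to_factor_graph(PARENTS)
```

Sum-product message passing can then run on this undirected bipartite structure without special-casing parents versus children.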

Figure 2: I found James F. Fixx’s puzzle book on the shelf when writing this post

1 Simpson’s paradox

Simpson’s paradox is an evergreen example of the importance of the causal graph. For a beautiful and clear example see Allen Downey’s Simpson’s Paradox and Age Effects. It is also the key example in Michael Nielsen’s Reinventing Explanation.
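
For a quick numerical feel, here is a small sketch. The counts are illustrative, patterned on the much-reproduced kidney-stone textbook example rather than on anything in this post, and the adjustment step assumes a causal graph in which stone size influences both treatment choice and outcome. Under that assumed graph, the stratum-weighted (backdoor-adjusted) rates are the ones to compare, not the pooled ones.

```python
# Illustrative (successes, trials) counts per treatment and stratum.
# Patterned on the widely quoted kidney-stone example; not data from this post.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def stratum_rates(counts):
    """Success rate for each treatment within each stratum."""
    return {t: {s: rate(*c) for s, c in strata.items()}
            for t, strata in counts.items()}


def pooled_rates(counts):
    """Success rate for each treatment, ignoring the stratum."""
    out = {}
    for t, strata in counts.items():
        succ = sum(k for k, _ in strata.values())
        n = sum(m for _, m in strata.values())
        out[t] = succ / n
    return out


def adjusted_rates(counts):
    """Backdoor adjustment: sum_s P(success | t, s) P(s).

    Valid only under the ASSUMED graph size -> treatment, size -> outcome.
    """
    total = sum(m for strata in counts.values() for _, m in strata.values())
    names = list(next(iter(counts.values())))
    weight = {s: sum(counts[t][s][1] for t in counts) / total for s in names}
    return {t: sum(weight[s] * rate(*counts[t][s]) for s in names)
            for t in counts}
```

With these counts, treatment A has the higher rate within each stratum and after adjustment, while B has the higher pooled rate. Which comparison is the right one depends entirely on the causal graph you are willing to assume.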

2 Tools

BayesNets is a Julia package for reasoning over directed graphical models.

Figure 3: Inferring the optimal intervention requires accounting for which arrows are independent of which.

3 References

Aragam, Amini, and Zhou. 2015. “Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression.” arXiv:1511.08963 [Cs, Math, Stat].
Aragam, Gu, and Zhou. 2017. “Learning Large-Scale Bayesian Networks with the Sparsebn Package.” arXiv:1703.04025 [Cs, Stat].
Aragam, and Zhou. 2015. “Concave Penalized Estimation of Sparse Gaussian Bayesian Networks.” Journal of Machine Learning Research.
Aral, Muchnik, and Sundararajan. 2009. “Distinguishing Influence-Based Contagion from Homophily-Driven Diffusion in Dynamic Networks.” Proceedings of the National Academy of Sciences.
Arnold, Castillo, and Sarabia. 1999. Conditional Specification of Statistical Models.
Bareinboim, Tian, and Pearl. 2014. Recovering from Selection Bias in Causal and Statistical Inference. In AAAI.
Bloniarz, Liu, Zhang, et al. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” arXiv:1507.03652 [Math, Stat].
Brodersen, Gallusser, Koehler, et al. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics.
Bühlmann, Kalisch, and Meier. 2014. “High-Dimensional Statistics with a View Toward Applications in Biology.” Annual Review of Statistics and Its Application.
Bühlmann, Rütimann, and Kalisch. 2013. “Controlling False Positive Selections in High-Dimensional Regression and Causal Inference.” Statistical Methods in Medical Research.
Buntine. 1996. “A Guide to the Literature on Learning Probabilistic Networks from Data.” IEEE Transactions on Knowledge and Data Engineering.
Chen, and Pearl. 2012. “Regression and Causation: A Critical Examination of Econometric Textbooks.”
Christakis, and Fowler. 2007. “The Spread of Obesity in a Large Social Network over 32 Years.” New England Journal of Medicine.
Colombo, Maathuis, Kalisch, et al. 2012. “Learning High-Dimensional Directed Acyclic Graphs with Latent and Selection Variables.” The Annals of Statistics.
Dawid, A. Philip. 1979. “Conditional Independence in Statistical Theory.” Journal of the Royal Statistical Society. Series B (Methodological).
———. 1980. “Conditional Independence for Statistical Operations.” The Annals of Statistics.
Dawid, A. P. 2001. “Separoids: A Mathematical Framework for Conditional Independence and Irrelevance.” Annals of Mathematics and Artificial Intelligence.
De Luna, Waernbaum, and Richardson. 2011. “Covariate Selection for the Nonparametric Estimation of an Average Treatment Effect.” Biometrika.
Edwards, and Ankinakatte. 2015. “Context-Specific Graphical Models for Discrete Longitudinal Data.” Statistical Modelling.
Fixx. 1977. Games for the superintelligent.
Frey, and Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence.
Gu, Fu, and Zhou. 2014. “Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data.” arXiv:1403.2310 [Stat].
Guo, Tóth, Schölkopf, et al. 2022. “Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data.”
Jordan, Michael Irwin. 1999. Learning in Graphical Models.
Jordan, Michael I., Ghahramani, Jaakkola, et al. 1999. “An Introduction to Variational Methods for Graphical Models.” Machine Learning.
Jordan, Michael I., and Weiss. 2002a. “Graphical Models: Probabilistic Inference.” The Handbook of Brain Theory and Neural Networks.
———. 2002b. “Probabilistic Inference in Graphical Models.” Handbook of Neural Networks and Brain Theory.
Kalisch, and Bühlmann. 2007. “Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm.” Journal of Machine Learning Research.
Koller, and Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques.
Krause, and Guestrin. 2009. “Optimal Value of Information in Graphical Models.” J. Artif. Int. Res.
Lauritzen, Steffen L. 1996. Graphical Models. Oxford Statistical Science Series.
Lauritzen, S. L., and Spiegelhalter. 1988. “Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems.” Journal of the Royal Statistical Society. Series B (Methodological).
Maathuis, and Colombo. 2013. “A Generalized Backdoor Criterion.” arXiv Preprint arXiv:1307.5636.
Malioutov, Johnson, and Willsky. 2006. “Walk-Sums and Belief Propagation in Gaussian Graphical Models.” Journal of Machine Learning Research.
Marbach, Prill, Schaffter, et al. 2010. “Revealing Strengths and Weaknesses of Methods for Gene Network Inference.” Proceedings of the National Academy of Sciences.
Mihalkova, and Mooney. 2007. “Bottom-up Learning of Markov Logic Network Structure.” In Proceedings of the 24th International Conference on Machine Learning.
Montanari. 2011. “Lecture Notes for Stat 375 Inference in Graphical Models.”
Murphy. 2012. Machine learning: a probabilistic perspective. Adaptive computation and machine learning series.
Neapolitan. 2003. Learning Bayesian Networks.
Pearl. 1982. “Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach.” In Proceedings of the Second AAAI Conference on Artificial Intelligence. AAAI’82.
———. 1986. “Fusion, Propagation, and Structuring in Belief Networks.” Artificial Intelligence.
———. 2008. Probabilistic reasoning in intelligent systems: networks of plausible inference. The Morgan Kaufmann series in representation and reasoning.
Pearl, Geiger, and Verma. 1989. “Conditional Independence and Its Representations.” Kybernetika.
Pereda, Quiroga, and Bhattacharya. 2005. “Nonlinear Multivariate Analysis of Neurophysiological Signals.” Progress in Neurobiology.
Pollard. 2004. “Hammersley-Clifford Theorem for Markov Random Fields.”
Rabbat, Figueiredo, and Nowak. 2008. “Network Inference from Co-Occurrences.” IEEE Transactions on Information Theory.
Ran, and Hu. 2017. “Parameter Identifiability in Statistical Machine Learning: A Review.” Neural Computation.
Schölkopf, Janzing, Peters, et al. 2012. “On Causal and Anticausal Learning.” In ICML 2012.
Shachter. 1998. “Bayes-Ball: Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams).” In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. UAI’98.
Shalizi, and McFowland III. 2016. “Controlling for Latent Homophily in Social Networks Through Inferring Latent Locations.” arXiv:1607.06565 [Physics, Stat].
Smith, and Eisner. 2008. “Dependency Parsing by Belief Propagation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Spirtes, Glymour, and Scheines. 2001. Causation, Prediction, and Search. Adaptive Computation and Machine Learning.
Studený, and Vejnarová. 1998. “On Multiinformation Function as a Tool for Measuring Stochastic Dependence.” In Learning in Graphical Models.
Su, Wang, and Lai. 2012. “Detecting Hidden Nodes in Complex Networks from Time Series.” Phys. Rev. E.
Textor, Idelberger, and Liśkiewicz. 2015. “Learning from Pairwise Marginal Independencies.” arXiv:1508.00280 [Cs].
Visweswaran, and Cooper. 2014. “Counting Markov Blanket Structures.” arXiv:1407.2483 [Cs, Stat].
Wainwright, and Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends® in Machine Learning.
Weiss. 2000. “Correctness of Local Probability Propagation in Graphical Models with Loops.” Neural Computation.
Weiss, and Freeman. 2001. “Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology.” Neural Computation.
Winn, and Bishop. 2005. “Variational Message Passing.” In Journal of Machine Learning Research.
Wright. 1934. “The Method of Path Coefficients.” The Annals of Mathematical Statistics.
Yedidia, Freeman, and Weiss. 2003. “Understanding Belief Propagation and Its Generalizations.” In Exploring Artificial Intelligence in the New Millennium.
Zhang, Peters, Janzing, et al. 2012. “Kernel-Based Conditional Independence Test and Application in Causal Discovery.” arXiv:1202.3775 [Cs, Stat].
Zhou, Cong, and Chen. 2017. “Augmentable Gamma Belief Networks.”