Statistics and machine learning

April 15, 2011 — September 13, 2023

Those who ignore statistics are condemned to reinvent it

— Bradley Efron, according to Kareem Carr

A methodological distinction that some people make: What’s the difference between analytics and statistics?

Analytics helps you form hypotheses. It improves the quality of your questions.

Statistics helps you test hypotheses. It improves the quality of your answers.

I would divide these into exploratory and confirmatory statistics, but that terminology is not universal.
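
One way to keep the two roles apart is to form a hypothesis on one part of the data and test it on a part we have not yet looked at. The following is a minimal sketch of that idea, not a recommended workflow: the synthetic data, the even split, the function name `permutation_p_value`, and the choice of a two-sided permutation test on the difference in means are all assumptions made for illustration.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values simulates "group labels do not matter".
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[: len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic data (an assumption for illustration): group B is shifted upwards by 1.
rng = random.Random(42)
group_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
group_b = [rng.gauss(1.0, 1.0) for _ in range(200)]

# "Analytics": look at one half of the data and form a hypothesis.
explore_a, confirm_a = group_a[:100], group_a[100:]
explore_b, confirm_b = group_b[:100], group_b[100:]
explore_diff = statistics.fmean(explore_b) - statistics.fmean(explore_a)
hypothesis = "B > A" if explore_diff > 0 else "A > B"

# "Statistics": test that hypothesis on the half we have not looked at.
p_value = permutation_p_value(confirm_a, confirm_b)
print(hypothesis, round(explore_diff, 2), p_value)
```

Holding out the confirmation half is what stops the test from merely re-describing a pattern that was picked because it looked interesting.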

It is best not to need statistics at all. If a pattern is so clear it is undeniable, then we can go home early. Although — is our bar for undeniable high enough? Have we really eliminated our own biases and wishful thinking? How would we know?

OK, statistics also plays some other roles, like giving us greater accuracy in our predictions, but that doesn’t fit into the aphorism so nicely.

What else to say here? I am not sure. I created this page before I became a professional statistician, and statistics grew to be half this website. For more information on statistics, see… pretty much any page.

1 Role in science

Statistics is an Excellent Servant and a Bad Master:

This means that Galileo, Newton, Kepler, Hooke, Pasteur, Mendel, Lavoisier, Maxwell, von Helmholtz, Mendeleev, etc. did their work without anything that resembled modern statistics, and that Einstein, Curie, Fermi, Bohr, Heisenberg, etc. etc. did their work in an age when statistics was still extremely rudimentary. We don’t need statistics to do good research.

Indeed we do not. What we need statistics for is to ensure that marginally viable research is not 💩 research.

2 Exploratory data analysis

See exploratory data analysis.

3 Unifying statistics and ML

I’m especially interested in modern fusion methods that harmonise what we would call statistics and machine learning methods, and the unnecessary terminological confusion between those systems. But I have nothing to say about that right now.

Why not read Kevin Murphy’s probml/pml-book: Probabilistic Machine Learning textbooks? They are free online.

4 Decisions

TODO: Introduce decision theory.

5 Tests

TODO: Introduce tests.

6 Taxonomies

Boaz Barak, in ML Theory with bad drawings, attempts one division of labour:

However, what we actually do is at least thrice-removed from this ideal:

The model gap: We do not optimise over all possible systems, but rather a small subset of such systems (e.g., ones that belong to a certain family of models).

The metric gap: In almost all cases, we do not optimise the actual measure of success we care about, but rather another metric that is at best correlated with it.

The algorithm gap: We don’t even optimise the latter metric since it will almost always be non-convex, and hence the system we end up with depends on our starting point and the particular algorithms we use.

The magic of machine learning is that sometimes (though not always!) we can still get good results despite these gaps. Much of the theory of machine learning is about understanding under what conditions can we bridge some of these gaps.

The above discussion explains the “machine learning is just X” takes. The expressivity of our models falls under approximation theory. The gap between the success we want to achieve and the metric we can measure often corresponds to the difference between population and sample performance, which becomes a question of statistics. The study of our algorithms’ performance falls under optimisation.
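
The three gaps can be made concrete with a toy curve-fitting problem. The following is a minimal sketch and is not taken from Barak's notes: the data, the one-parameter model family `sin(w * x)`, the function names, and the choice of sign agreement as the "measure we care about" are all assumptions made for illustration.

```python
import math

# Toy data (an assumption for illustration): a noiseless sine wave with
# frequency 2 and amplitude 0.8, sampled on an even grid over [-3, 3].
xs = [-3.0 + 6.0 * i / 99 for i in range(100)]
ys = [0.8 * math.sin(2.0 * x) for x in xs]


def predict(w, x):
    # Model gap: we only search the family sin(w * x), which has amplitude 1
    # and so cannot represent the data exactly for any w.
    return math.sin(w * x)


def squared_error(w):
    # Metric gap: this smooth surrogate is what we actually optimise.
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sign_agreement(w):
    # Stand-in for the measure we care about: how often the signs match.
    return sum((predict(w, x) > 0) == (y > 0) for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w, learning_rate=0.02, steps=1000):
    # Algorithm gap: the surrogate is non-convex in w, so plain gradient
    # descent ends up somewhere that depends on the starting point.
    for _ in range(steps):
        grad = sum(
            2.0 * (math.sin(w * x) - y) * math.cos(w * x) * x
            for x, y in zip(xs, ys)
        ) / len(xs)
        w -= learning_rate * grad
    return w


w_near = gradient_descent(1.8)  # starts close to the true frequency
w_far = gradient_descent(6.0)   # starts far away and stays in a poor local minimum
for w in (w_near, w_far):
    print(round(w, 3), round(squared_error(w), 3), sign_agreement(w))
```

Both runs use the same data, model family, and loss; only the initial value of `w` differs, which is enough to separate a good fit from a poor one.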

7 Textbooks, resources

Encyclopedia of Machine Learning and Data Science would really like to be a definitive reference
roboticcam/machine-learning-notes: My continuously updated Machine Learning, Probabilistic Models and Deep Learning notes and demos (2000+ slides) 我不间断更新的机器学习，概率模型和深度学习的讲义(2000+页)和视频链接. Richard’s notes are a masterclass in 80/20ing your note-taking: breathtakingly ambitious, well explained, a little bit scruffy.
Jean Gallier and Jocelyn Quaintance, Algebra, Topology, Differential Calculus, and Optimization Theory for Computer Science and Machine Learning, 2188 pages as of 2022/10/30, and growing.
Arne Hallam’s Home Page includes some excellent lectures and tutorials on statistics

8 References

Aggarwal. 2015. Data Mining.

Cox, and Hinkley. 2000. Theoretical Statistics.

Dadkhah. 2011. Foundations of Mathematical and Computational Economics.

Devroye, Györfi, and Lugosi. 1996. A Probabilistic Theory of Pattern Recognition.

Efron, and Hastie. 2016. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Institute of Mathematical Statistics Monographs.

Freedman, and Stark. 2009. “What Is the Chance of an Earthquake?” In Statistical Models and Causal Inference: A Dialogue with the Social Sciences.

Gelman, Carlin, Stern, et al. 2013. Bayesian Data Analysis. Chapman & Hall/CRC Texts in Statistical Science.

Greenland. 1995a. “Dose-Response and Trend Analysis in Epidemiology: Alternatives to Categorical Analysis.” Epidemiology.

———. 1995b. “Problems in the Average-Risk Interpretation of Categorical Dose-Response Analyses.” Epidemiology.

Guttman. 1977. “What Is Not What in Statistics.” Journal of the Royal Statistical Society. Series D (The Statistician).

Guttorp. 1995. Stochastic Modeling of Scientific Data. Stochastic Modeling Series.

Hardt, and Recht. 2021. “Patterns, Predictions, and Actions: A Story about Machine Learning.” arXiv:2102.05242 [Cs, Stat].

Hastie, Tibshirani, and Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference and Prediction.

Kobayashi, Mark, and Turin. 2011. Probability, Random Processes, and Statistical Analysis: Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance.

Kroese, Botev, Taimre, et al. 2019. Mathematical and Statistical Methods for Data Science and Machine Learning. Chapman & Hall/CRC Machine Learning & Pattern Recognition.

Lehmann, E. L., and Casella. 1998. Theory of Point Estimation. Springer Texts in Statistics.

Lehmann, Erich L., and Romano. 2010. Testing Statistical Hypotheses. Springer Texts in Statistics.

Lumley, Diehr, Emerson, et al. 2002. “The Importance of the Normality Assumption in Large Public Health Data Sets.” Annual Review of Public Health.

Mohri, Rostamizadeh, and Talwalkar. 2018. Foundations of Machine Learning. Adaptive Computation and Machine Learning.

Murphy. 2012. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series.

———. 2022. Probabilistic Machine Learning: An Introduction. Adaptive Computation and Machine Learning Series.

———. 2023. Probabilistic Machine Learning: Advanced Topics.

Robert, and Casella. 2004. Monte Carlo Statistical Methods. Springer Texts in Statistics.

Schervish. 2012. Theory of Statistics. Springer Series in Statistics.

Soch, Proofs, Faulkenberry, et al. 2020. “StatProofBook/StatProofBook.github.io: StatProofBook 2020.”

van der Vaart. 2007. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics.

Wasserman. 2013. All of Statistics: A Concise Course in Statistical Inference.