Matrix calculus

July 9, 2018 — July 23, 2023

Figure 1

We can generalize high school calculus, which is about scalar functions of a scalar argument, in various ways to handle matrix-valued functions or matrix-valued arguments while keeping the notation tidy. One could generalize further by going to full tensor calculus, but matrix/vector operations sit at a useful level of complexity for lots of algorithms. (I usually want this for higher-order gradient descent.)

I mention two convenient and popular formalisms for lazy matrix calculus. In practice, a mix of the two is often useful.

1 Matrix differentials

Figure 2

🏗 I need to return to this and tidy it up with some examples.

A special case of tensor calculus, where the ranks of the function's argument and value are not too big. In this setting, we often get to cheat and use some handy shortcuts. Fun pain point: agreeing upon the layout of derivatives, numerator vs denominator.

If our problem is nice, this often gets us a low-fuss, compact, tidy solution, even in some surprising cases where more general tensors would seem more natural; for those, see indexed tensor calculus below.
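As a small worked illustration of the differential approach, here are two standard identities and the rule for reading a gradient off a differential. This is a sketch, not a full treatment; the symbols $X$ (an invertible square matrix with positive determinant), $f$ (a scalar function of $X$) and $G$ (a matrix the same shape as $X$) are chosen for illustration and do not come from the text above.

```latex
\begin{aligned}
% Differentiate the identity X X^{-1} = I with the product rule.
\mathrm{d}(X X^{-1}) &= (\mathrm{d}X)\,X^{-1} + X\,\mathrm{d}(X^{-1}) = 0
\quad\Longrightarrow\quad
\mathrm{d}(X^{-1}) = -X^{-1}\,(\mathrm{d}X)\,X^{-1}, \\[4pt]
% Differential of the log-determinant.
\mathrm{d}\log\det X &= \operatorname{tr}\!\left(X^{-1}\,\mathrm{d}X\right), \\[4pt]
% Identification rule: read the derivative off the differential.
\mathrm{d}f = \operatorname{tr}\!\left(G^{\top}\,\mathrm{d}X\right)
&\quad\Longrightarrow\quad
\frac{\partial f}{\partial X} =
\begin{cases}
G & \text{(denominator layout)} \\
G^{\top} & \text{(numerator layout)}
\end{cases}
\end{aligned}
```

Applying the identification rule to the log-determinant gives $G^{\top} = X^{-1}$, so the derivative is $X^{-\top}$ in denominator layout and $X^{-1}$ in numerator layout. The two conventions differ only by a transpose, which is why it pays to fix one before comparing formulae from different sources.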

2 Automating matrix calculus

  1. Use Mathematica+NCAlgebra to find matrix differentials.
  2. Soeren Laue, Matthias Mitterreiter, Joachim Giesen and Jens K. Mueller (Soeren Laue, Mitterreiter, and Giesen 2018) have a website, MatrixCalculus.org, which uses Ricci calculus to generate matrix calculus formulae. Bonus feature: it generates both Python and LaTeX code.
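Whichever tool produces a formula, it is cheap to sanity-check the result numerically. The following is a minimal sketch of such a check. It is not output from MatrixCalculus.org or NCAlgebra, and the helper names (`quad_form`, `grad_closed_form`, `grad_finite_difference`) are hypothetical. It compares the textbook gradient of the quadratic form $x^{\top} A x$, namely $(A + A^{\top})x$, against central finite differences, using only the Python standard library.

```python
def quad_form(A, x):
    """Return the scalar x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def grad_closed_form(A, x):
    """Candidate closed form to be checked: the gradient of x^T A x is (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def grad_finite_difference(f, x, eps=1e-6):
    """Central-difference approximation to the gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


# A deliberately non-symmetric matrix, so that (A + A^T) x differs from 2 A x.
A = [[2.0, -1.0, 0.5],
     [0.0, 3.0, 1.0],
     [4.0, 1.0, -2.0]]
x = [1.0, -2.0, 0.5]

exact = grad_closed_form(A, x)
approx = grad_finite_difference(lambda v: quad_form(A, v), x)

# Largest componentwise disagreement between the formula and the numerical estimate.
max_error = max(abs(e - a) for e, a in zip(exact, approx))
print("closed form:", exact)
print("max abs error vs finite differences:", max_error)
```

The same pattern extends to matrix arguments by perturbing one entry at a time, at the cost of one pair of function evaluations per entry. For anything large, an automatic differentiation library would be the more practical reference.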

3 Indexed tensor calculus

Filed under multilinear algebra.

4 References

Bhatia. 1997. Matrix Analysis. Graduate Texts in Mathematics.
Del Moral, and Niclas. 2018. “A Taylor Expansion of the Square Root Matrix Functional.”
Dolcetti, and Pertici. 2020. “Real Square Roots of Matrices: Differential Properties in Semi-Simple, Symmetric and Orthogonal Cases.”
Giles. 2008. “Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation.” In Advances in Automatic Differentiation.
Golub, and van Loan. 1983. Matrix Computations.
Graham. 1981. Kronecker Products and Matrix Calculus: With Applications.
Gupta, and Nagar. 1999. Matrix Variate Distributions. Chapman & Hall/CRC Monographs and Surveys in Pure and Applied Mathematics 104.
Ionescu, Vantzos, and Sminchisescu. 2016. “Training Deep Networks with Structured Layers by Matrix Backpropagation.”
Laue, Soeren, Mitterreiter, and Giesen. 2018. “Computing Higher Order Derivatives of Matrix and Tensor Expressions.” In Advances in Neural Information Processing Systems 31.
Laue, Sören, Mitterreiter, and Giesen. 2020. “A Simple and Efficient Tensor Calculus.” In AAAI Conference on Artificial Intelligence, (AAAI).
Magnus, and Neudecker. 2019. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley Series in Probability and Statistics.
Minka. 2000. Old and New Matrix Algebra Useful for Statistics.
Parr, and Howard. 2018. “The Matrix Calculus You Need For Deep Learning.”
Petersen, and Pedersen. 2012. “The Matrix Cookbook.”
Schotthöfer, Zangrando, Kusch, et al. 2022. “Low-Rank Lottery Tickets: Finding Efficient Low-Rank Neural Networks via Matrix Differential Equations.”
Searle. 2014. “Matrix Algebra.” In Wiley StatsRef: Statistics Reference Online.
Searle, and Khuri. 2017. Matrix Algebra Useful for Statistics.
Seber. 2007. A Matrix Handbook for Statisticians.
Simoncini. 2016. “Computational Methods for Linear Matrix Equations.” SIAM Review.
Song, Sebe, and Wang. 2022. “Fast Differentiable Matrix Square Root.” In.
Steeb. 2006. Problems and Solutions in Introductory and Advanced Matrix Calculus.
Turkington. 2002. Matrix Calculus and Zero-One Matrices: Statistical and Econometric Applications.
Yurtsever, Tropp, Fercoq, et al. 2021. “Scalable Semidefinite Programming.” SIAM Journal on Mathematics of Data Science.