Krylov subspace iteration

Power method, pagerank, Lanczos decomposition, …

August 5, 2014 — October 16, 2024

feature construction
functional analysis
high d
linear algebra
networks
probability
signal processing
sparser than thou
statistics

Assumed audience:

People with undergrad linear algebra


1 Power iteration

The simplest possible variant: repeatedly multiply a vector by the matrix and renormalise. For a generic starting vector, and provided one eigenvalue strictly dominates in magnitude, the iterate converges to the corresponding eigenvector.
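
For reference, a minimal NumPy sketch of the idea (the function name, tolerance, and stopping rule are my own choices, not a canonical implementation):

```python
import numpy as np


def power_iteration(A, num_iters=500, tol=1e-12, seed=0):
    """Estimate the dominant eigenvalue/eigenvector of A by repeated multiplication."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(A.shape[0])
    b /= np.linalg.norm(b)
    lam = 0.0
    for _ in range(num_iters):
        Ab = A @ b
        b = Ab / np.linalg.norm(Ab)   # renormalise to avoid overflow/underflow
        lam_new = b @ A @ b           # Rayleigh quotient estimate of the eigenvalue
        if abs(lam_new - lam) < tol * max(1.0, abs(lam_new)):
            return lam_new, b
        lam = lam_new
    return lam, b


# Quick check against a dense eigensolver on a small symmetric matrix.
M = np.random.default_rng(1).standard_normal((6, 6))
A = M @ M.T
print(power_iteration(A)[0], np.linalg.eigvalsh(A)[-1])
```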

2 Lanczos decomposition

The Lanczos decomposition is a handy approximation for matrices that are cheap to multiply, because of some structure, but expensive to store. It can also be used to apply an approximate inverse cheaply.

I learnt this trick from the Gaussian process literature, in the context of Lanczos Variance Estimates (LOVE) (Pleiss et al. 2018), although I believe it exists elsewhere.

Given some rank \(k\) and an arbitrary starting vector \(\boldsymbol{b}\), the Lanczos algorithm iteratively approximates \(\mathrm{K} \in\mathbb{R}^{n \times n}\) by a low-rank factorisation \(\mathrm{K}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\), where \(\mathrm{T} \in \mathbb{R}^{k \times k}\) is tridiagonal and \(\mathrm{Q} \in \mathbb{R}^{n \times k}\) has orthonormal columns. Crucially, we never need to form \(\mathrm{K}\) explicitly; we only need to evaluate matrix-vector products \(\mathrm{K}\boldsymbol{b}\) for arbitrary vectors \(\boldsymbol{b}\). Moreover, given a Lanczos factorisation \(\mathrm{Q},\mathrm{T}\) we may estimate \[\begin{align*} \mathrm{K}^{-1}\boldsymbol{c}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\boldsymbol{c}, \end{align*}\] even for \(\boldsymbol{c}\neq\boldsymbol{b}\). Say we wish to calculate \(\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)^{-1}\mathrm{B}\), with \(\mathrm{Z}\in\mathbb{R}^{D\times N }\) and \(N\ll D\).
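
To make the iteration concrete, here is a minimal NumPy sketch of symmetric Lanczos that touches \(\mathrm{K}\) only through a user-supplied `matvec`; the full reorthogonalisation and the absence of breakdown handling are my own simplifications:

```python
import numpy as np


def lanczos(matvec, b, k):
    """k steps of symmetric Lanczos using only matrix-vector products.

    Returns Q (n x k) with (nearly) orthonormal columns and a tridiagonal
    T (k x k) such that K ~= Q T Q^T on the Krylov subspace
    span{b, Kb, ..., K^(k-1) b}.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)   # diagonal of T
    beta = np.zeros(k)    # sub-/super-diagonal of T; beta[0] is unused
    q = b / np.linalg.norm(b)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Orthogonalise K q_j against all previous Lanczos vectors. In exact
        # arithmetic only q_j and q_{j-1} contribute (the three-term
        # recurrence); the full projection is just numerically safer.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j + 1 < k:
            beta[j + 1] = np.linalg.norm(w)   # near-zero here signals breakdown
            q = w / beta[j + 1]
    T = np.diag(alpha) + np.diag(beta[1:], 1) + np.diag(beta[1:], -1)
    return Q, T
```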

We approximate the solution to this linear system using the partial Lanczos decomposition starting with probe vector \(\boldsymbol{b}=\overline{\mathrm{B}}\) and \(\mathrm{K}=\left(\mathrm{Z} \mathrm{Z}^{\top}+\sigma^2 \mathrm{I}\right)\).

This requires \(k\) matrix-vector products of the form \[\begin{align*} \underbrace{\left(\underbrace{\mathrm{Z} \mathrm{Z}^{\top}}_{\mathcal{O}(ND^2)}+\sigma^2 \mathrm{I}\right)\boldsymbol{b}}_{\mathcal{O}(D^2)} =\underbrace{\mathrm{Z} \underbrace{(\mathrm{Z}^{\top}\boldsymbol{b})}_{\mathcal{O}(ND)}}_{\mathcal{O}(ND)} +\sigma^2 \boldsymbol{b}. \end{align*}\] Using the right-hand representation, each matrix-vector product costs \(\mathcal{O}(ND)\) time; space complexity is also \(\mathcal{O}(ND)\). The output of the Lanczos decomposition is \(\mathrm{Q},\mathrm{T}\) such that \(\left(\mathrm{Z}\mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)\boldsymbol{b}\approx \mathrm{Q} \mathrm{T} \mathrm{Q}^{\top}\boldsymbol{b}\). The inverse-matrix-vector product may then be approximated by \(\left(\mathrm{Z} \mathrm{Z}^{\top} +\sigma^2 \mathrm{I}\right)^{-1} \mathrm{B}\approx \mathrm{Q}\mathrm{T}^{-1}\mathrm{Q}^{\top}\mathrm{B}\), requiring only the solution in \(\mathrm{X}\) of the much smaller linear system \(\mathrm{X}\mathrm{T}=\mathrm{Q}\). Exploiting the positive-definiteness of \(\mathrm{T}\), we may use its Cholesky decomposition \(\mathrm{T}=\mathrm{L}^{\top}\mathrm{L}\) for a constant-factor speedup over a general solver. The time cost of this small solve is \(\mathcal{O}(Dk^2+k^3)\), for an overall cost of the approximate inversion of \(\mathcal{O}(NDk+Dk^2+k^3)\).
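
A sketch of the whole pipeline, reusing the `lanczos` helper from the sketch above; the dimensions, noise level, and the accuracy check on the probe vector are illustrative choices of mine, not part of the derivation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
D, N, k = 500, 30, 25                      # N << D; k is the Lanczos rank
sigma2 = 0.1
Z = rng.standard_normal((D, N))
B = rng.standard_normal((D, 3))


# O(ND) matvec for K = Z Z^T + sigma^2 I; the D x D matrix is never formed.
def matvec(v):
    return Z @ (Z.T @ v) + sigma2 * v


# Partial Lanczos factorisation K ~= Q T Q^T, probing with the column mean of B.
b = B.mean(axis=1)
Q, T = lanczos(matvec, b, k)               # `lanczos` as sketched above

# Approximate K^{-1} B ~= Q T^{-1} Q^T B via a Cholesky solve in the small
# k x k system, relying on T being (approximately) positive definite.
chol = cho_factor(T)
X = Q @ cho_solve(chol, Q.T @ B)

# Sanity check on the probe vector, where the Lanczos approximation is
# sharpest; accuracy for other right-hand sides depends on how much of them
# lies in the span of Q.
x_exact = np.linalg.solve(Z @ Z.T + sigma2 * np.eye(D), b)
x_approx = Q @ cho_solve(chol, Q.T @ b)
print(np.linalg.norm(x_approx - x_exact) / np.linalg.norm(x_exact))
```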

TODO: make this cost accounting more precise; we may have missed some multiplications and Lanczos overhead, plus the deviation calculations.

Lifehack: find derivatives of Lanczos iterations via pnkraemer/matfree, matrix-free linear algebra in JAX (Krämer et al. 2024).

3 Incoming

4 References

Alexanderian, Petra, Stadler, et al. 2016. “A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-Dimensional Bayesian Nonlinear Inverse Problems.” SIAM Journal on Scientific Computing.
Ameli, and Shadden. 2023. “A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression.” Applied Mathematics and Computation.
Ba, Grosse, and Martens. 2016. “Distributed Second-Order Optimization Using Kronecker-Factored Approximations.”
Borcea, Druskin, and Knizhnerman. 2005. “On the Continuum Limit of a Discrete Inverse Spectral Problem on Optimal Finite Difference Grids.” Communications on Pure and Applied Mathematics.
Chow, and Saad. 1997. “Approximate Inverse Techniques for Block-Partitioned Matrices.” SIAM Journal on Scientific Computing.
Chung, and Chung. 2014. “An Efficient Approach for Computing Optimal Low-Rank Regularized Inverse Matrices.” Inverse Problems.
Cordero, Soto-Quiros, and Torregrosa. 2021. “A General Class of Arbitrary Order Iterative Methods for Computing Generalized Inverses.” Applied Mathematics and Computation.
Drineas, Kannan, and Mahoney. 2006a. “Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix.” SIAM Journal on Computing.
———. 2006b. “Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition.” SIAM Journal on Computing.
Gardner, Pleiss, Bindel, et al. 2018. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Golub, and Meurant. 2010. Matrices, Moments and Quadrature with Applications.
Gower. 2016. “Sketch and Project: Randomized Iterative Methods for Linear Systems and Inverting Matrices.” arXiv:1612.06013 [Math].
Hager. 1989. “Updating the Inverse of a Matrix.” SIAM Review.
Halko, Martinsson, Shkolnisky, et al. 2011. “An Algorithm for the Principal Component Analysis of Large Data Sets.” SIAM Journal on Scientific Computing.
Heinig, and Rost. 2011. “Fast Algorithms for Toeplitz and Hankel Matrices.” Linear Algebra and Its Applications.
Huang, and Wu. 2023. “Truncated and Sparse Power Methods with Partially Updating for Large and Sparse Higher-Order PageRank Problems.” Journal of Scientific Computing.
Khan. 2008. “Updating Inverse of a Matrix When a Column Is Added/Removed.”
Krämer, Moreno-Muñoz, Roy, et al. 2024. “Gradients of Functions of Large Matrices.”
Kumar, and Shneider. 2016. “Literature Survey on Low Rank Approximation of Matrices.” arXiv:1606.06511 [Cs, Math].
Mahoney. 2010. Randomized Algorithms for Matrices and Data.
Martens. 2010. “Deep Learning via Hessian-Free Optimization.” In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.
Martens, and Grosse. 2015. “Optimizing Neural Networks with Kronecker-Factored Approximate Curvature.” In Proceedings of the 32nd International Conference on Machine Learning.
Martens, and Sutskever. 2012. “Training Deep and Recurrent Networks with Hessian-Free Optimization.” In Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science.
Martinsson. 2016. “Randomized Methods for Matrix Computations and Analysis of High Dimensional Data.” arXiv:1607.01649 [Math].
Martinsson, and Voronin. 2015. “A Randomized Blocked Algorithm for Efficiently Computing Rank-Revealing Factorizations of Matrices.” arXiv:1503.07157 [Math].
Petersen, and Pedersen. 2012. “The Matrix Cookbook.”
Pleiss, Gardner, Weinberger, et al. 2018. “Constant-Time Predictive Distributions for Gaussian Processes.” In.
Rokhlin, Szlam, and Tygert. 2009. “A Randomized Algorithm for Principal Component Analysis.” SIAM J. Matrix Anal. Appl.
Saad. 2003. Iterative Methods for Sparse Linear Systems: Second Edition.
Sachdeva. 2013. “Faster Algorithms via Approximation Theory.” Foundations and Trends® in Theoretical Computer Science.
Searle. 2014. “Matrix Algebra.” In Wiley StatsRef: Statistics Reference Online.
Searle, and Khuri. 2017. Matrix Algebra Useful for Statistics.
Shin, Zhao, and Shomorony. 2023. “Adaptive Power Method: Eigenvector Estimation from Sampled Data.” In Proceedings of The 34th International Conference on Algorithmic Learning Theory.
Simoncini. 2016. “Computational Methods for Linear Matrix Equations.” SIAM Review.
Simoncini, and Szyld. 2003. “Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing.” SIAM Journal on Scientific Computing.
Simpson, Daniel Peter. 2008. “Krylov Subspace Methods for Approximating Functions of Symmetric Positive Definite Matrices with Applications to Applied Statistics and Anomalous Diffusion.”
Simpson, Daniel P., Turner, Strickland, et al. 2013. “Scalable Iterative Methods for Sampling from Massive Gaussian Random Vectors.”
Song, Babu, and Palomar. 2015. “Sparse Generalized Eigenvalue Problem Via Smooth Optimization.” IEEE Transactions on Signal Processing.
Spantini, Cui, Willcox, et al. 2017. “Goal-Oriented Optimal Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing.
Spantini, Solonen, Cui, et al. 2015. “Optimal Low-Rank Approximations of Bayesian Linear Inverse Problems.” SIAM Journal on Scientific Computing.
Tenorio. 2017. An Introduction to Data Analysis and Uncertainty Quantification for Inverse Problems. Mathematics in Industry.
Ubaru, Chen, and Saad. 2017. “Fast Estimation of \(tr(f(A))\) via Stochastic Lanczos Quadrature.” SIAM Journal on Matrix Analysis and Applications.
van der Vorst. n.d. “Krylov Subspace Iteration.” Computing in Science & Engineering.
Wang. 2015. “A Practical Guide to Randomized Matrix Computations with MATLAB Implementations.” arXiv:1505.07570 [Cs].
Woodruff. 2014. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science 1.0.
Xu. 2023. “On the Accelerated Noise-Tolerant Power Method.” In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics.
Yuan, and Zhang. 2013. “Truncated Power Method for Sparse Eigenvalue Problems.” J. Mach. Learn. Res.
Yurtsever, Tropp, Fercoq, et al. 2021. “Scalable Semidefinite Programming.” SIAM Journal on Mathematics of Data Science.