Randomized iterative methods for linear systems
Robert Mansel Gower
IMA Leslie Fox Prize Meeting, Strathclyde, June 2017
Motivation
Large scale Kernel Ridge Regression
Problem: a9a
Origin: LIBSVM
Methods compared: Conjugate Gradient, Block Coordinate Descent, Rademacher Sketch? ("good enough")
Cheikh S. Toure
Linear Systems
The Problem
Solve a linear system Ax = b.
Assumption: the system is consistent (i.e., has a solution).
B: symmetric and positive definite.
As there are possibly multiple solutions, we compute the solution with the least B-norm.
Randomized Methods
Often fits parallel/distributed architectures
Easy to analyse, good complexity
Easy to implement
Suitable for large scale problems: short recurrence, low iteration cost and low memory
The return of old methods
Old methods (Kaczmarz 1937, Gauss–Seidel 1823) make a randomized return. Why?
Stochasticity inherent in problem
Old Methods
Randomized Kaczmarz T. Strohmer and R. Vershynin. A Randomized Kaczmarz Algorithm with Exponential Convergence. Journal of Fourier Analysis and Applications 15(2), pp. 262–278, 2009
Karczmarz, M. S. (1937). Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l'Académie Polonaise des Sciences et des Lettres, 35, 355–357.
G.N. Hounsfield. Computerized transverse axial scanning (tomography): Part I. Description of the system. British Journal of Radiology, 1973.
Randomized Coordinate Descent
Leventhal, D., & Lewis, A. S. (2010). Randomized Methods for Linear Constraints: Convergence Rates and Conditioning. Mathematics of Operations Research, 35(3), 641–654.
Observation:
Block Coord. Descent
Modern Sketching
Randomized Sketching
The Sketching Matrix
David P. Woodruff (2014). Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends® in Theoretical Computer Science.
W. B. Johnson and J. Lindenstrauss (1984). Contemporary Mathematics, 26, Extensions of Lipschitz mappings into a Hilbert space.
Sketching and Projecting
1. Relaxation Viewpoint: "Sketch and Project"
2. Optimization Viewpoint: "Constrain and Approximate"
3. Geometric Viewpoint: "Random Intersect"
4. Algebraic Viewpoint: "Random Update"
Random update vector; Moore–Penrose pseudoinverse of a small matrix.
Fact:
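The algebraic viewpoint can be illustrated in a few lines of NumPy. This is a minimal sketch (our own function and variable names, not from the slides) of the closed-form step x⁺ = x − B⁻¹AᵀS (SᵀAB⁻¹AᵀS)† Sᵀ(Ax − b); note that only the small τ×τ matrix is pseudoinverted.

```python
import numpy as np

def sketch_and_project_step(A, b, x, S, B_inv):
    """One sketch-and-project step:
    x+ = x - B^{-1} A^T S (S^T A B^{-1} A^T S)^+ S^T (A x - b).
    Only the small tau x tau matrix M is pseudoinverted."""
    AtS = A.T @ S                          # n x tau
    M = S.T @ (A @ (B_inv @ AtS))         # small tau x tau matrix
    lam = np.linalg.pinv(M) @ (S.T @ (A @ x - b))
    return x - B_inv @ (AtS @ lam)
```

With S = I and B = I this projects directly onto the solution set, so for a square invertible system a single step already returns the exact solution.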
5. Analytic Viewpoint: "Random Fixed Point"
Random iteration matrix
Theory
Complexity / Convergence
Theorem [GR‘15]
Smallest nonzero eigenvalue
Case study of
Special Choice of Parameters
No zero rows in A ⟹ positive definite
The rate: lower and upper bounds
Theorem [GR‘15]
Insight: The method is a contraction (without any assumptions on S whatsoever). That is, things cannot get worse.
Insight: Lower rank of A and greater rank of SᵀA give a better lower bound; in other words, the bound improves when the dimension of the search space in the "constrain and approximate" viewpoint grows.
Special Case: Randomized
Kaczmarz Method
T. Strohmer and R. Vershynin (2009). A Randomized Kaczmarz Algorithm with Exponential Convergence. Journal of Fourier Analysis and Applications, 15(2), 262–278.
Randomized Kaczmarz: derivation and rate
General Method
Special Choice of Parameters
Complexity Rate
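As a concrete illustration, here is a minimal NumPy version of randomized Kaczmarz with the Strohmer–Vershynin row-norm sampling (function and variable names are ours): sample row i with probability ‖aᵢ‖²/‖A‖_F², then project the iterate onto the hyperplane aᵢᵀx = bᵢ.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=1000, seed=0):
    """Randomized Kaczmarz: sample row i with probability
    ||a_i||^2 / ||A||_F^2, then project the current iterate onto
    the hyperplane a_i^T x = b_i."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)  # squared row norms
    probs = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x -= (A[i] @ x - b[i]) / row_norms2[i] * A[i]
    return x
```

Each iteration touches a single row of A, so the per-iteration cost is O(n) and no matrix factorization is ever needed.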
Special Case: Randomized
Coordinate Descent
Leventhal, D., & Lewis, A. S. (2010). Randomized Methods for Linear Constraints: Convergence Rates and Conditioning. Mathematics of Operations Research, 35(3), 641–654.
Randomized Coordinate Descent: derivation and rate
General Method
Special Choice of Parameters (A positive definite)
Complexity Rate
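For A symmetric positive definite, the coordinate descent special case reduces to updating one coordinate per iteration. A minimal sketch (our own naming), with coordinates sampled proportionally to the diagonal of A as in Leventhal & Lewis:

```python
import numpy as np

def randomized_cd(A, b, iters=2000, seed=0):
    """Randomized coordinate descent for symmetric positive definite A:
    sample coordinate i with probability A_ii / trace(A), then zero
    the i-th residual: x_i <- x_i - (a_i^T x - b_i) / A_ii."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    diag = np.diag(A)
    probs = diag / diag.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(n, p=probs)
        x[i] -= (A[i] @ x - b[i]) / diag[i]
    return x
```

Each step only reads one row of A, exactly the low per-iteration cost that makes these methods attractive at large scale.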
Theory recovers known and new convergence results
Table columns: Method | Convergence Rate | B | S
Rows: Randomized CD (least squares)*, Gaussian (A psd), Gaussian Kaczmarz.
T. Strohmer and R. Vershynin (2009). A Randomized Kaczmarz Algorithm with Exponential Convergence. Journal of Fourier Analysis and Applications, 15(2), 262–278.
*Leventhal, D., & Lewis, A. S. (2010). Randomized Methods for Linear Constraints: Convergence Rates and Conditioning. Mathematics of Operations Research, 35(3), 641–654.
Designing New Methods
Optimal methods
Optimal choice of B
Optimal S
S with fixed range: optimal pi's ⟹ a difficult SDP
Practical New Methods: One Shot Sketches
S: Gaussian matrix, subsampled Walsh–Hadamard, count-min sketch
Computing SᵀA
N. Ailon and B. Chazelle (2006). Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. In Proceedings of STOC 2006.
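The cost of computing SᵀA depends heavily on the choice of sketch. A rough comparison, under our own variable names, of a dense Gaussian sketch against a count-sketch-style hashing (one signed nonzero per row of S):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, tau = 200, 50, 10
A = rng.standard_normal((m, n))

# Gaussian sketch: S has i.i.d. N(0, 1) entries; S^T A costs O(tau*m*n).
S = rng.standard_normal((m, tau))
StA_gauss = S.T @ A

# Count-sketch style: hash each row of A to one of tau buckets with a
# random sign, so S^T A costs only O(nnz(A)) and S is never formed.
bucket = rng.integers(0, tau, size=m)
sign = rng.choice([-1.0, 1.0], size=m)
StA_count = np.zeros((tau, n))
np.add.at(StA_count, bucket, sign[:, None] * A)
```

The hashed variant is attractive precisely because its cost scales with the nonzeros of A rather than with a dense matrix product.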
Rademacher Sketch
Sub-Rademacher Sketching: flip the sign with 50% probability
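A sub-Rademacher sketch as described above never needs S explicitly: subsample τ rows, then flip each one's sign with probability 1/2. A minimal sketch (our own function name):

```python
import numpy as np

def sub_rademacher_sketch(A, b, tau, rng):
    """Form S^T A and S^T b for a sub-Rademacher sketch: pick tau rows
    uniformly without replacement, then flip each one's sign with
    probability 1/2. The matrix S itself is never materialized."""
    m = A.shape[0]
    rows = rng.choice(m, size=tau, replace=False)
    signs = rng.choice([-1.0, 1.0], size=tau)
    return signs[:, None] * A[rows], signs * b[rows]
```

Forming the sketched system is just a row copy with sign flips, i.e., O(τn) work.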
Experiments
Large scale Ridge Regression (Problem: w8a, Origin: LIBSVM). Methods: Conjugate Gradient, Block Coordinate Descent, Rademacher Sketch?
Large scale Ridge Regression (Problem: rcv1, Origin: LIBSVM). Methods: Conjugate Gradient, Block Coordinate Descent, Rademacher Sketch?
Large scale Ridge Regression (Problem: mnist, Origin: LIBSVM). Method: Conjugate Gradient.
Conclusions
Unites many randomized methods under a single framework.
Improved convergence: new lower bounds, fewer assumptions, tightest results.
Design of new methods: S = Gaussian, count sketch, Walsh–Hadamard, etc.
Optimal sampling: we can choose a sampling that optimizes the convergence rate.
Large scale Ridge Regression (Problem: a9a, Origin: LIBSVM). Methods: Conjugate Gradient, Rademacher Sketch, Block Coordinate Descent.
Fast initial sublinear convergence
RMG and Peter Richtárik. Stochastic Dual Ascent for Solving Linear Systems. Preprint arXiv:1512.06890, 2015.
RMG and Peter Richtárik. Randomized quasi-Newton updates are linearly convergent matrix inversion algorithms. Preprint arXiv:1602.01768, 2016.
RMG and Peter Richtárik. Randomized Iterative Methods for Linear Systems. SIAM J. Matrix Anal. Appl., 36(4), 1660–1690, 2015. Most Downloaded SIMAX Paper!
Thank you. Questions?