The Lanczos Method in Data Science (slides dated 2021-03-18)
Page 1:

The Lanczos Method in Data Science

Christopher Musco

Massachusetts Institute of Technology

Page 2:

a remarkable algorithm

The Lanczos Method

Used for solving linear systems, eigendecomposition, matrix exponentials, and approximating any matrix function.

• Introduced in 1950, developed through the 70s, ubiquitous in well-developed scientific computing libraries.

• Resurgence of interest due to new applications in data science and machine learning.


Page 6:

lanczos in data science

New applications combine Lanczos with super-scalable stochastic iterative and randomized sketching methods. These require an understanding of its performance with noisy inputs.

Today's results:

1. Lanczos is very noise stable, performing essentially optimally amongst polynomial methods.

2. Except when solving linear systems! We provide strong lower bounds showing that noise can significantly impair Lanczos and the closely related conjugate gradient method.


Page 10:

relevant papers

Stability of the Lanczos Method for Matrix Function Approximation [SODA 2018]

Principal Component Projection Without Principal Component Analysis [ICML 2016]

Joint work with Aaron Sidford (Stanford), Cameron Musco (MIT), and Roy Frostig (Google).


Page 12:

what is a matrix function?

Page 13:

what is a matrix function?

Every matrix A ∈ Rn×d has a singular value decomposition A = UΣVT, where U, V are orthogonal and Σ is diagonal with σ1 ≥ . . . ≥ σd ∈ R+.


Page 15:

what is a matrix function?

Every symmetric matrix A ∈ Rd×d has an orthogonal eigendecomposition A = VΛVT, where V is orthogonal and Λ = diag(λ1, . . . , λd).

Page 16:

what is a matrix function?

For any scalar function f : R → R, define f(A) = V f(Λ)VT, where f(Λ) = diag(f(λ1), . . . , f(λd)).
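As a concrete sketch of this definition (function and variable names are my own, not from the slides), f(A) can be formed directly from an eigendecomposition:

```python
import numpy as np

def matrix_function(A, f):
    """Apply a scalar function f to a symmetric matrix A
    via its eigendecomposition A = V diag(lambda) V^T."""
    lam, V = np.linalg.eigh(A)          # eigendecomposition of symmetric A
    return V @ np.diag(f(lam)) @ V.T    # V f(Lambda) V^T

# Example: matrix square root of a positive definite matrix.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
sqrtA = matrix_function(A, np.sqrt)
assert np.allclose(sqrtA @ sqrtA, A)
```

With f(x) = x the identity is recovered exactly, which is a quick sanity check on any such implementation.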

Page 17:

computing matrix functions

Cost to compute f(A):

O(n3) [eigendecompose A = VΛVT] + O(n) [compute f(Λ)] + O(n3) [form V f(Λ)VT] = O(n3) in practice.

In theory this can be improved to O(nω) ≈ O(n2.3728639), but this is still slow.


Page 23:

faster matrix functions

Typically we are only interested in computing f(A)x for some vector x ∈ Rn.

This is often much cheaper than computing f(A) explicitly! (This is what Lanczos and other algorithms target.)


Page 25:

applications in data problems

Page 26:

matrix functions in data analysis

Least squares regression

Find w that minimizes ∑i |bi − aiTw|2 = ∥Aw − b∥22.

Solution: w = (ATA)−1ATb.


Page 29:

matrix inverse

The regression solution is f(ATA) · x, where f(λ) = 1/λ and x = ATb.

Writing ATA = V diag(λ1, . . . , λn)VT, since VTV = VVT = I we have

[V diag(λ1, . . . , λn)VT] · [V diag(1/λ1, . . . , 1/λn)VT] = I,

so (ATA)−1 = V diag(1/λ1, . . . , 1/λn)VT.


Page 31:

matrix inverse

Example: linear system solving, A−1x.
Function: f(x) = 1/x.

[plot: f(x) = 1/x on (0, 1]]

Countless applications...

Page 32:

matrix exponential

Example: matrix exponential, eAx.
Function: f(x) = ex.

[plot: f(x) = ex on [0, 1]]

Applications in semidefinite programming, graph algorithms (balanced separator), and differential equations.

[Arora, Hazan, Kale '05], [Iyengar, Phillips, Stein '11], [Orecchia, Sachdeva, Vishnoi '12], [Higham '08] (very complete survey)

Page 33:

matrix log

Example: matrix log, log(A)x.
Function: f(x) = log(x).

[plot: f(x) = log(x) on (0, 1]]

Used to estimate log(det(A)) = tr(log(A)), which appears in the log-likelihood for a multivariate Gaussian. Applications in Gaussian process regression, learning distance kernels, and Markov random fields.

[Dhillon et al. '06, '07, '08], [Han, Malioutov, Shin '15], [Saibaba, Alexanderian, Ipsen '17]

Page 34:

matrix step function

Example: step function, stepλ(A)x.
Function: f(x) = 1 if x ≥ λ, 0 if x < λ.

[plot: the step function with threshold λ]

Projection onto top eigenvectors, eigenvalue counting, computing matrix norms, spectral filtering, and many more...

[Frostig, Musco, Musco, Sidford '16], [Saad, Ubaru '16], [Allen-Zhu, Li '17], [Tremblay, Puy, Gribonval, Vandergheynst '16], [Musco, Netrapalli, Sidford, Ubaru, Woodruff '18]

Page 35:

principal component regression

Standard regression:
Given: A, b. Solve: x∗ = argminx ∥Ax − b∥2.

Principal Component Regression:
Given: A, b, λ. Solve: x∗ = argminx ∥Aλx − b∥2.

(Here Aλ denotes A with the singular directions whose squared singular values fall below the threshold λ removed.)


Page 39-41:

principal component regression

[plot: squared singular values σi2 of A for i = 1, . . . , 1000, sorted in decreasing order; values above the threshold λ are labeled "Signal", those below "Noise". Aλ keeps only the "Signal" part of the spectrum.]

Page 42:

principal component regression

Principal Component Regression (PCR):
Goal: x∗ = argminx ∥Aλx − b∥2
Solution: x = (AλTAλ)−1ATb

The fastest way to apply AλTAλ and (AλTAλ)−1 to a vector is with a matrix step function.


Page 44:

principal component regression

Page 45:

eigenvalue counting

How many eigenvalues does A have that are greater than λ?

∑i I[λi > λ] = ∑i stepλ(λi(A)) = trace(stepλ(A))

Hutchinson's estimator: Ex∼N(0,Id)[xT f(A)x] = trace(f(A)).

The same method is used for estimating log-determinants and matrix norms.
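A minimal sketch of Hutchinson's estimator applied to eigenvalue counting (names are my own; for illustration stepλ(A) is applied exactly via an eigendecomposition, which is the step Lanczos would approximate at scale):

```python
import numpy as np

def hutchinson_trace(apply_fA, d, num_samples=100, seed=None):
    """Estimate trace(f(A)) given only a routine that applies f(A)
    to a vector, using Gaussian probe vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.standard_normal(d)   # E[x x^T] = I, so E[x^T f(A) x] = tr(f(A))
        total += x @ apply_fA(x)
    return total / num_samples

# Count eigenvalues of A above a threshold by taking f = step_lambda.
A = np.diag([0.1, 0.5, 2.0, 3.0])   # two eigenvalues exceed 1.0
w, V = np.linalg.eigh(A)
apply_step = lambda x: V @ ((w > 1.0) * (V.T @ x))
est = hutchinson_trace(apply_step, 4, num_samples=5000, seed=0)
```

With 5000 probes the estimate concentrates tightly around the true count of 2; the estimator never needs f(A) itself, only matrix-vector products with it.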


Page 50:

fast algorithms for matrix functions

Page 51:

matrix polynomials

Matrix polynomials can be computed iteratively, using one matrix-vector product per term: Akx = A(Ak−1x). Note also that Akx = VΛVT VΛVT · · · VΛVTx = VΛkVTx.

Total time to compute p(A)x = c0x + c1Ax + c2A2x + . . . + ckAkx:

O(k · nnz(A)) ≤ O(k · n2) ≪ O(n3)
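This O(k · nnz(A)) evaluation can be sketched with Horner's rule, one matrix-vector product per degree (the helper name is my own):

```python
import numpy as np

def poly_times_vector(A, coeffs, x):
    """Evaluate p(A) x = c0 x + c1 A x + ... + ck A^k x using only
    k matrix-vector products (Horner's rule), never forming p(A)."""
    result = coeffs[-1] * x
    for c in reversed(coeffs[:-1]):
        result = A @ result + c * x   # build from the highest degree down
    return result

A = np.array([[1.0, 2.0], [2.0, 1.0]])
x = np.array([1.0, 0.0])
# p(t) = 3 + 2t + t^2, checked against the term-by-term computation.
direct = 3 * x + 2 * (A @ x) + A @ (A @ x)
assert np.allclose(poly_times_vector(A, [3.0, 2.0, 1.0], x), direct)
```

When A is sparse, each `A @ result` costs nnz(A) operations, which is where the O(k · nnz(A)) total comes from.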


Page 56:

polynomial approximation

For general matrix functions: approximate f(x) with a low-degree polynomial p(x), so that f(A)x ≈ p(A)x.

[plot: f and a low-degree polynomial approximation p on [0, 1]]

How does error in approximating the scalar function f(·) translate to error on the matrix function?


Page 59:

polynomial approximation

∥f(A)x − p(A)x∥ ≤ ∥f(A) − p(A)∥ · ∥x∥ ≤ ϵ · ∥x∥, where ϵ = maxi=1,...,n |f(λi) − p(λi)|.
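The bound is easy to check numerically; a small sketch using a Taylor polynomial for ex as the approximating p (the example setup is mine, not from the slides):

```python
import numpy as np

# Check ||f(A)x - p(A)x|| <= eps ||x||, where eps is the worst-case
# error of p on A's eigenvalues.
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 30))
A = (B + B.T) / 2                       # random symmetric matrix
w, V = np.linalg.eigh(A)

f = np.exp
p = np.polynomial.Polynomial([1.0, 1.0, 0.5, 1/6, 1/24])  # Taylor for e^x

eps = np.max(np.abs(f(w) - p(w)))       # max_i |f(lambda_i) - p(lambda_i)|
x = rng.standard_normal(30)
fAx = V @ np.diag(f(w)) @ V.T @ x
pAx = V @ np.diag(p(w)) @ V.T @ x
assert np.linalg.norm(fAx - pAx) <= eps * np.linalg.norm(x) + 1e-12
```

The inequality holds for any polynomial p, so the whole game is finding a low-degree p that makes ϵ small on the spectrum of A.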


Page 64:

finding good approximating polynomials

If we know λmin(A) and λmax(A), we can explicitly compute a near-optimal polynomial p via Chebyshev interpolation. Let

δk = min over degree-k polynomials p of maxx∈[λmin(A),λmax(A)] |f(x) − p(x)|.

Final bound: output y such that ∥f(A)x − y∥ ≤ O(log k) · δk · ∥x∥.
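A brief illustration of Chebyshev interpolation on a known spectral interval, using NumPy's Chebyshev class (the choice of f(x) = ex and the degree are mine):

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Interpolate f(x) = e^x at Chebyshev points on [lmin, lmax]; for smooth
# f this is near optimal among degree-k polynomials.
lmin, lmax = 0.0, 1.0
k = 5
p = Chebyshev.interpolate(np.exp, k, domain=[lmin, lmax])

# The uniform error on the interval is tiny for such a smooth function.
xs = np.linspace(lmin, lmax, 1000)
max_err = np.max(np.abs(np.exp(xs) - p(xs)))
assert max_err < 1e-4

# Applying p to a symmetric matrix whose spectrum lies in [lmin, lmax]
# then approximates exp(A) to the same accuracy.
A = np.diag([0.2, 0.5, 0.9])
w, V = np.linalg.eigh(A)
approx = V @ np.diag(p(w)) @ V.T
assert np.allclose(approx, V @ np.diag(np.exp(w)) @ V.T, atol=1e-4)
```

This is the explicit construction the slide refers to: it needs λmin(A) and λmax(A) up front, which is part of what Lanczos lets you avoid.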


Page 67:

applications of lanczos

Example bounds:

• Linear systems in O(√(λmax/λmin)) iterations.
• Matrix exponential in O(∥A∥) iterations.
• Matrix sign function in O(1/ϵ) iterations.
• Top eigenvector in O(log(n)/√ϵ) iterations.

No one actually uses Chebyshev interpolation!


Page 69:

the lanczos method for matrix functions

Page 70:

the lanczos method

Cornelius Lanczos, 1950

• Simple to implement.
• No need to know λmin(A) and λmax(A).
• Much better convergence in practice (for many reasons).
• Matches the optimal uniform approximation up to a factor of 2.

Final bound: output y such that ∥f(A)x − y∥ ≤ 2δk · ∥x∥.


Page 74: The Lanczos Method in Data Science · 2021. 3. 18. · aremarkablealgorithm TheLanczosMethod Usedforsolvinglinearsystems,eigendecomposition,matrix exponentials,andapproximatinganymatrixfunction.

lanczos method for matrix functions

Step 1: Form an orthogonal matrix Q = [q0, q1, . . . , qk] that spans the Krylov subspace

K = {x, Ax, A²x, . . . , Aᵏx}.

Step 2: Compute T = QᵀAQ.

Step 3: Approximate f(A)x by

Q f(T) Qᵀx.

24
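The three steps translate directly into code. The sketch below is illustrative (the function name, the full reorthogonalization step, and the use of numpy.linalg.eigh to evaluate f(T) are my own choices, assuming a symmetric A), not tuned library code:

```python
import numpy as np

def lanczos_fA_x(A, x, k, f):
    """Approximate f(A) @ x for symmetric A: build an orthonormal basis Q of
    the Krylov subspace {x, Ax, ..., A^k x}, form the tridiagonal T = Q^T A Q,
    and return Q f(T) Q^T x (f is applied to the eigenvalues of T)."""
    n = x.shape[0]
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k + 1)            # diagonal of T
    beta = np.zeros(k)                 # off-diagonal of T
    Q[:, 0] = x / np.linalg.norm(x)
    for j in range(k + 1):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        if j > 0:
            w = w - beta[j - 1] * Q[:, j - 1]
        # full reorthogonalization for numerical robustness
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, V = np.linalg.eigh(T)       # T is only (k+1) x (k+1): cheap
    return Q @ (V @ (f(evals) * (V.T @ (Q.T @ x))))
```

With k = n − 1 the Krylov subspace is (generically) all of Rⁿ and the output matches f(A)x to machine precision; the interesting regime is k ≪ n.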

lanczos method for matrix functions

Runtime: O(k · nnz(A)) + O(nk²) + O(k³)

Runtime: O(k · nnz(A) + nk)

Reduce the problem to the cost of computing a matrix function for a k × k matrix.

Final bound: Output y such that ∥f(A)x − y∥ ≤ 2δk · ∥x∥.

25

quick analysis of lanczos

Claim: Lanczos applies degree k polynomials exactly.

Proof:

x, Ax, A²x, . . . , Aᵏx all lie in the span of Q.

26
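A quick numerical check of the claim (a sketch; here Q is built by a QR factorization of the Krylov matrix, which spans the same subspace as the Lanczos basis):

```python
import numpy as np

np.random.seed(1)
n, k = 10, 3
S = np.random.randn(n, n)
A = (S + S.T) / 2                      # symmetric test matrix
x = np.random.randn(n)

# Orthonormal basis Q for the Krylov subspace {x, Ax, A^2 x, A^3 x}
K = np.column_stack([np.linalg.matrix_power(A, i) @ x for i in range(k + 1)])
Q, _ = np.linalg.qr(K)
T = Q.T @ A @ Q

# Any polynomial of degree <= k is applied exactly; try p(A) = A^3 - 2A + 3I
p = lambda M: M @ M @ M - 2 * M + 3 * np.eye(M.shape[0])
assert np.allclose(p(A) @ x, Q @ (p(T) @ (Q.T @ x)))
```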

quick analysis of lanczos

How about a general function f(x)?

Lanczos automatically applies the polynomial “part” of f. (A simple application of the triangle inequality.)

For any degree k polynomial p,

∥f(A)x − Q f(T) Qᵀx∥ ≤ ∥f(A)x − p(A)x∥ + ∥p(A)x − Q p(T) Qᵀx∥ + ∥Q p(T) Qᵀx − Q f(T) Qᵀx∥
≤ δk∥x∥ + 0 + ∥p(T) − f(T)∥ · ∥Qᵀx∥
≤ δk∥x∥ + 0 + δk∥x∥.

Since T = QᵀAQ, [λmin(T), λmax(T)] ⊆ [λmin(A), λmax(A)], so ∥p(T) − f(T)∥ ≤ δk.

27

polynomial methods with noise

27

matrix functions with noise

In many data applications, we do not multiply by A exactly!

Natural model when Lanczos is combined with super-scalable randomized methods.

28

matrix functions with noise

Powerful paradigm:

• A = B⁻¹ for some matrix B.
• Apply B⁻¹ to vectors very quickly and approximately.

29

matrix step function

Fastest algorithms for computing S = stepλ(AᵀA) actually compute step1/2(R) where R = (AᵀA + λI)⁻¹AᵀA.

[Figure: the spectra of S and R plotted against the index i, from large σᵢ to small σᵢ.]

Most of the work is computing Rx.

30
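The relationship between the two spectra is an explicit formula: R has eigenvalues σᵢ²/(σᵢ² + λ), which lie in [0, 1] and cross 1/2 exactly where σᵢ² crosses λ. A quick check (the matrix sizes and λ = 10 are arbitrary choices of mine):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(20, 8)
lam = 10.0

G = A.T @ A                                    # eigenvalues are sigma_i^2
R = np.linalg.solve(G + lam * np.eye(8), G)    # R = (A^T A + lam I)^{-1} A^T A

sig2 = np.linalg.eigvalsh(G)                   # ascending order
r_eigs = np.sort(np.linalg.eigvals(R).real)

assert np.allclose(r_eigs, sig2 / (sig2 + lam))      # sigma^2 / (sigma^2 + lam)
assert np.array_equal(r_eigs >= 0.5, sig2 >= lam)    # same 0/1 split as step_lam
```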

lanczos and randomized methods

Computing Rx = (AᵀA + λI)⁻¹AᵀAx is a convex optimization problem.

31

lanczos and randomized methods

Lots of recent interest and new algorithms for convex problems on massive datasets (i.e., when A does not fit in memory).

Stochastic Iterative Methods

Randomized Sketching

Runtimes scale roughly as O(nnz(A) · log(1/ϵ)) (for an ϵ-approximate solution).

32

lanczos and randomized methods

• Faster eigenvector algorithms (in many regimes).
• Faster eigenvalue counting algorithms.
• Faster log-determinant and matrix norm algorithms.
• Faster balanced separator algorithms for graphs (via the Laplacian matrix exponential).

33

lanczos and randomized methods

We need to understand how the performance of our algorithms changes when we replace every matrix-vector multiplication Ax with an approximate solution.

Are matrix function algorithms stable?

The same stability questions were asked decades ago to understand roundoff error when computing Ax!

fl(x ◦ y) = (1 ± ϵ)(x ◦ y) for ◦ = +, −, ×, ÷

34

lanczos and randomized methods

It is very easy to design iterative methods that converge very slowly when Ax is computed approximately. But the Lanczos method (with no modifications) continues to perform well.

Can we explain this phenomenon?

35

stable polynomial computation

How can we apply polynomials in a stable way?

1. Want to compute p(x) = c0 + c1x + . . . + ckxᵏ.
2. We do not know x, but we have access to a function approxMult that for any input z outputs:

approxMult(z) = z · x + ϵ.

36

stable polynomial computation

Goal: Compute p(x) = 64x⁷ − 112x⁵ + 56x³ − 7x using approxMult with ϵ = .05.

Directly compute and sum monomials.

xⁱ = approxMult(approxMult(. . . approxMult(1) . . .))

37

Factor p(x) = (x − .98)(x − .78) . . . (x − .43).

t1 = (approxMult(1) − .98), t2 = approxMult(t1) − .78 · t1, . . .

Use a special recurrence relation for this polynomial.

ti = 2 · approxMult(ti−1) − ti−2
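These strategies behave very differently under noise. Below is a small simulation (approx_mult, the trial loop, and the evaluation point x = 0.8 are my own illustrative choices) comparing the monomial strategy against the recurrence strategy:

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.05

def approx_mult(z, x):
    """Noisy multiplication oracle: z * x plus noise of size at most EPS."""
    return z * x + rng.uniform(-EPS, EPS)

def p_monomial(x):
    """Sum noisily computed monomials of p(x) = 64x^7 - 112x^5 + 56x^3 - 7x."""
    total = 0.0
    for deg, c in [(7, 64), (5, -112), (3, 56), (1, -7)]:
        m = 1.0
        for _ in range(deg):
            m = approx_mult(m, x)      # builds a noisy x^deg
        total += c * m
    return total

def p_recurrence(x):
    """Evaluate the same polynomial via t_i = 2 * approxMult(t_{i-1}) - t_{i-2}."""
    t_prev, t = 1.0, approx_mult(1.0, x)
    for _ in range(6):                 # climbs from degree 1 up to degree 7
        t_prev, t = t, 2 * approx_mult(t, x) - t_prev
    return t

x = 0.8
exact = 64 * x**7 - 112 * x**5 + 56 * x**3 - 7 * x
mono_err = np.mean([abs(p_monomial(x) - exact) for _ in range(200)])
cheb_err = np.mean([abs(p_recurrence(x) - exact) for _ in range(200)])
```

On a typical run the monomial strategy is off by several units while the recurrence stays within a few tenths, matching the ϵk · ∑|ci| versus ϵk² analysis below.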

stable polynomial computation

Assume we want to approximate p(x) for x ∈ [−1, 1], and assume |p(x)| ≤ C.

Claim: We can compute any such p(x) to accuracy ϵ · Ck³ if approxMult has accuracy ϵ.

38

first attempt

Compute monomials iteratively:

(x + ϵ1), then (x(x + ϵ1) + ϵ2), . . . , giving xⁱ + xⁱ⁻¹ϵ1 + xⁱ⁻²ϵ2 + . . . + ϵi.

Since |x| ≤ 1, the error on xⁱ is bounded by ϵ1 + ϵ2 + . . . + ϵi ≤ iϵ.

We can then compute p(x) = c0 + c1x + . . . + ckxᵏ up to error:

c1ϵ + 2 · c2ϵ + . . . + k · ckϵ ≤ ϵk · ∑ᵏᵢ₌₁ |ci|

39

first attempt

∑ᵏᵢ₌₁ |ci| can be far larger than our goal of ϵ · Ck³.

There are polynomials with C = 1 but ∑ᵏᵢ₌₁ |ci| = O(2ᵏ).

Exponential instead of polynomial loss in k.

Runtimes of randomized system solvers depended on log(1/ϵ).

40

“bad” polynomials

What are those polynomials?

Chebyshev polynomials of the first kind.

T0(x) = 1
T1(x) = x
T2(x) = 2x² − 1
...
Tk(x) = 2xTk−1(x) − Tk−2(x)

41
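NumPy's Chebyshev utilities make this easy to verify: Tk is bounded by 1 on [−1, 1], yet its monomial coefficients sum to more than 2ᵏ (a quick sketch):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

xs = np.linspace(-1, 1, 1001)
for k in [5, 10, 15]:
    t_k = [0] * k + [1]                  # T_k expressed in the Chebyshev basis
    mono = C.cheb2poly(t_k)              # its monomial coefficients
    assert np.max(np.abs(C.chebval(xs, t_k))) <= 1 + 1e-9   # bounded: C = 1
    assert np.abs(mono).sum() > 2 ** k                      # coefficient blow-up
```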

“bad” polynomials

What are those polynomials?

Chebyshev polynomials of the first kind.

We can apply these in a stable way, using their recurrence!

42

“good” polynomials?

ti = 2 · approxMult(ti−1) − ti−2

Not hard to show that when computing Tk(x) the error is ≤ ϵk².

43

key observation

Chebyshev polynomials are the only hard case.

Property: If a degree k polynomial p(x) is bounded by C on [−1, 1], it can be written as

p(x) = c0T0(x) + c1T1(x) + . . . + ckTk(x)

where every |ci| ≤ C.

The total error of the sum p(x) is then bounded by C · 1²ϵ + C · 2²ϵ + . . . + C · k²ϵ ≤ Ck³ϵ.

44
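Concretely, the degree-7 polynomial from the earlier example, 64x⁷ − 112x⁵ + 56x³ − 7x, is bounded by 1 on [−1, 1], and in the Chebyshev basis it is exactly T7, so its coefficients are as small as possible (a sketch using NumPy's basis conversion):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

mono = [0, -7, 0, 56, 0, -112, 0, 64]    # coefficients of 1, x, ..., x^7
cheb = C.poly2cheb(mono)
assert np.allclose(cheb, [0, 0, 0, 0, 0, 0, 0, 1])   # the polynomial is T_7
```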

stability of lanczos

The same argument extends from scalar polynomials to matrix polynomials.

The framework allows us to analyze Lanczos as well.

Step 1: Lanczos stably applies Chebyshev polynomials (building on results of Paige [‘71, ‘76, ‘80]).

Step 2: By linearity, Lanczos stably applies polynomials bounded by C.

Step 3: If |f(x)| ≤ C, a good approximating polynomial has |p(x)| ≤ O(C), so Lanczos is stable for bounded functions.

Use Lanczos without fear (on bounded functions)!


stability of lanczos

[Figure: applications of the stability result, spanning Stochastic Iterative Methods and Randomized Sketching]

See paper for applications to the step function, matrix exponential, top eigenvector, etc.


full result

Answer to an old question on Lanczos in finite precision:

Theorem (Lanczos is stable for any bounded function)
If |f(x)| ≤ C for x ∈ [λmin(A), λmax(A)], and Lanczos is run for k iterations on a computer with O(log(nCκ)) bits of precision, then it outputs a vector y such that

∥f(A)x − y∥ ≤ 7k · δk · ∥x∥,

where δk is the error of the best degree-k uniform approximation to f.

• Compare to ∥f(A)x − y∥ ≤ 2 · δk · ∥x∥ in exact arithmetic.
• Matches the known bound for A⁻¹x (Greenbaum, ‘89).
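The quantity δk decays very quickly for smooth functions, which is why the extra factor of 7k over the exact-arithmetic bound is usually harmless. A rough way to see the decay (Chebyshev interpolation error is within a small constant factor of the best uniform error; f = exp and the degrees chosen are illustrative):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Chebyshev interpolation error as a proxy for delta_k, the best
# degree-k uniform approximation error to f on [-1, 1]
f = np.exp
xs = np.linspace(-1, 1, 5001)
errs = []
for k in (2, 4, 8, 16):
    p = Chebyshev.interpolate(f, k)   # degree-k interpolant at Chebyshev nodes
    errs.append(np.max(np.abs(f(xs) - p(xs))))
print(errs)  # rapid decay in k for the smooth function exp
```

Since δk falls superexponentially here, multiplying it by 7k instead of 2 costs almost nothing in the final accuracy.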


negative result for linear systems


lanczos for linear systems

We proved earlier that Lanczos always matches the best uniform approximating polynomial for f(x).

For linear systems it actually does better than that.


lanczos for linear systems

• The best uniform approximation to 1/x has degree O(√(λmax/λmin) · log(1/ϵ)).

• 1/x can be represented exactly by a degree n − 1 polynomial if A only has n eigenvalues.

Claim: On exact arithmetic computers, linear systems can be solved in O(nnz(A) · n) time (i.e., n iterations of Lanczos).

Research question: To what extent does this bound hold true in finite precision? Are n log n iterations sufficient? n²?
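The second bullet is easy to verify numerically: interpolate 1/λ at A's n eigenvalues and the resulting degree n − 1 polynomial, applied to A, reproduces A⁻¹ up to roundoff. A small sketch (the matrix size and eigenvalue range are illustrative, chosen to keep the monomial-basis fit well conditioned):

```python
import numpy as np

# If A has n distinct eigenvalues, the degree n-1 polynomial with
# p(lambda_i) = 1/lambda_i satisfies p(A) = A^{-1} exactly.
rng = np.random.default_rng(2)
n = 6
evals = np.linspace(1.0, 2.0, n)        # distinct, well-conditioned spectrum
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(evals) @ U.T            # symmetric with known eigenvalues

coefs = np.polyfit(evals, 1.0 / evals, n - 1)   # interpolate 1/x on the spectrum
pA = np.zeros_like(A)
for c in coefs:                          # Horner's rule (highest degree first)
    pA = pA @ A + c * np.eye(n)
print(np.linalg.norm(pA - np.linalg.inv(A)))    # ~ 0 up to roundoff
```

This is exactly why n exact-arithmetic Lanczos iterations suffice: the Krylov space contains every polynomial of A up to degree n − 1 applied to the starting vector.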


linear systems in finite precision

Greenbaum (1989): Finite precision Lanczos and conjugate gradient match the best polynomial approximating 1/x on tiny intervals of width η around A’s eigenvalues.

η is on the order of machine precision!


lower bound

Theorem (Stable polynomial lower bound)
For any n, there is a matrix A ∈ ℝ^(n×n) with condition number κ = λmax/λmin such that no polynomial of degree

k ≤ (λmax/λmin)^(1/5)

satisfies Greenbaum’s condition with error ≤ 1/3, even when η ≤ 1/2^(n/log κ).

In other words, we cannot avoid a polynomial dependence on the condition number unless we have nearly n bits of precision.


lower bound

Construction: Eigenvalues roughly uniform on a geometric scale.

Proof: A simple potential function argument.


open questions

• Can (λmax/λmin)^(1/5) be tightened to (λmax/λmin)^(1/2)?

• Does Greenbaum’s estimate fully characterize Lanczos? Can the lower bound be extended to an actual runtime lower bound?

• How about for a more general class of algorithms? Any method accessing A only through noisy matrix-vector products?


thank you!
