Source: math.xmu.edu.cn/group/nona/damc/Lecture02.pdf

Lecture 2: Randomized Iterative Methods for Linear Systems

February 21 - 28, 2020

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 1 43

1 Pseudoinverse solutions of linear systems

Consider a linear system of equations

$$Ax = b, \qquad A \in \mathbb{R}^{m\times n}, \quad b \in \mathbb{R}^m.$$

The system is called consistent if $b \in \operatorname{range}(A)$; otherwise, it is called inconsistent.

We are interested in the pseudoinverse solution $A^\dagger b$, where $A^\dagger$ denotes the Moore-Penrose pseudoinverse of $A$.

$Ax = b$        $\operatorname{rank}(A)$    $A^\dagger b$
consistent      $= n$                       unique solution
consistent      $< n$                       unique minimum $\ell_2$-norm solution
inconsistent    $= n$                       unique least-squares (LS) solution
inconsistent    $< n$                       unique minimum $\ell_2$-norm LS solution
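As a small illustration of the last row of the table, the following sketch computes $A^\dagger b$ for a rank-deficient, inconsistent system with NumPy. The matrix, right-hand side, and variable names are made up for this example; the explicit `rcond=1e-10` cutoff is an assumption chosen so that rounding-level singular values are treated as zero.

```python
import numpy as np

# Rank-deficient, inconsistent toy system (values chosen only for illustration).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])          # rank 1 < n = 2
b = np.array([1.0, 1.0, 1.0])       # not in range(A)

# Pseudoinverse solution A^dagger b.
x_pinv = np.linalg.pinv(A, rcond=1e-10) @ b

# lstsq returns the minimum-norm least-squares solution, which should agree.
x_lstsq = np.linalg.lstsq(A, b, rcond=1e-10)[0]

# Every LS solution satisfies the normal equations A^T A x = A^T b.
normal_residual = A.T @ (A @ x_pinv - b)

# Adding a null-space vector gives another LS solution with a larger norm.
null_vec = np.array([2.0, -1.0])    # A @ null_vec == 0
x_other = x_pinv + null_vec
```

Here `x_other` has the same residual as `x_pinv` but a strictly larger Euclidean norm, which is what "minimum $\ell_2$-norm LS solution" means.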

2 Notation and preliminaries

For any random variable $\xi$, let $\mathbb{E}[\xi]$ denote its expectation.

For an integer $m \ge 1$, let

$$[m] = \{1, 2, 3, \ldots, m\}.$$

For any vector $u \in \mathbb{R}^m$, we use $u^\top$ and $\|u\|_2$ to denote the transpose and the Euclidean norm ($\ell_2$-norm) of $u$, respectively.

$I$ denotes the identity matrix whose order is clear from the context.

For any matrix $A \in \mathbb{R}^{m\times n}$, we use $A^\top$, $A^\dagger$, $\|A\|_F$, $\operatorname{range}(A)$, and

$$\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge \sigma_r(A) > 0$$

to denote the transpose, the Moore-Penrose pseudoinverse, the Frobenius norm, the column space, and all the nonzero singular values of $A$, respectively. Obviously, $r$ is the rank of $A$.

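For index sets $\mathcal I \subseteq [m]$ and $\mathcal J \subseteq [n]$, let $A_{\mathcal I,:}$, $A_{:,\mathcal J}$, and $A_{\mathcal I,\mathcal J}$ denote the row submatrix indexed by $\mathcal I$, the column submatrix indexed by $\mathcal J$, and the submatrix that lies in the rows indexed by $\mathcal I$ and the columns indexed by $\mathcal J$, respectively.

Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ denote a partition of $[m]$, that is,

$$\mathcal I_i \cap \mathcal I_j = \emptyset \ (i \ne j), \qquad \bigcup_{i=1}^{s} \mathcal I_i = [m].$$

Let $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ denote a partition of $[n]$. Let

$$\mathcal P = \{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\} \times \{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}.$$

Lemma 1

For any vector $u \in \mathbb{R}^m$ and any matrix $A \in \mathbb{R}^{m\times n}$, it holds

$$u^\top A A^\top u \le \|A\|_F^2\, u^\top u.$$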

Lemma 2

For any matrix $A \in \mathbb{R}^{m\times n}$ with rank $r$ and any vector $u \in \operatorname{range}(A)$, it holds

$$u^\top A A^\top u \ge \sigma_r^2(A)\, \|u\|_2^2.$$

Lemma 3

Let $\alpha > 0$ and $A$ be any nonzero real matrix. For every $u \in \operatorname{range}(A)$, it holds

$$\left\| \left(I - \frac{\alpha A A^\top}{\|A\|_F^2}\right)^k u \right\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|u\|_2.$$
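The three lemmas can be sanity-checked numerically. The sketch below is only an illustration on one random low-rank matrix: the sizes, the seed, the values of `alpha` and `k`, and the small relative tolerances (added to absorb rounding error) are all assumptions, and a passing check is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r by construction
fro2 = np.linalg.norm(A, "fro") ** 2
sigma = np.linalg.svd(A, compute_uv=False)[:r]                   # nonzero singular values

# Lemma 1: u^T A A^T u <= ||A||_F^2 u^T u for an arbitrary u.
u = rng.standard_normal(m)
lemma1_ok = u @ A @ A.T @ u <= fro2 * (u @ u)

# Lemma 2: for v in range(A), v^T A A^T v >= sigma_r^2 ||v||_2^2.
v = A @ rng.standard_normal(n)
lemma2_ok = v @ A @ A.T @ v >= sigma[-1] ** 2 * (v @ v) * (1 - 1e-12)

# Lemma 3: contraction of (I - alpha A A^T / ||A||_F^2)^k on range(A).
alpha, k = 1.0, 5
M = np.eye(m) - alpha * (A @ A.T) / fro2
delta = np.max(np.abs(1 - alpha * sigma ** 2 / fro2))
lhs = np.linalg.norm(np.linalg.matrix_power(M, k) @ v)
lemma3_ok = lhs <= delta ** k * np.linalg.norm(v) * (1 + 1e-12)
```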

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1: Doubly stochastic block Gauss-Seidel (DSBGS)

Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
for $k = 1, 2, \ldots$ do
    Pick $(\mathcal I, \mathcal J) \in \mathcal P$ with probability $\|A_{\mathcal I,\mathcal J}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \alpha \dfrac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b)$

Landweber [3] ($s = 1$ and $t = 1$):

$$x^k = x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b).$$
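Algorithm 1 could be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `dsbgs`, its signature, the choice of contiguous blocks, and the test problem are all assumptions made here. The step size is taken below $2/t$, the range required by Theorem 6 for consistent full-column-rank systems.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, rng, x0=None):
    """Sketch of DSBGS; row_blocks / col_blocks are lists of index lists."""
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # Squared Frobenius norm of each block A_{I,J}; these sum to ||A||_F^2.
    weights = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    probs = weights / weights.sum()
    for _ in range(iters):
        idx = rng.choice(len(pairs), p=probs)
        I, J = pairs[idx]
        # (I_{:,I})^T (A x - b): the residual restricted to the rows in I.
        residual_I = A[I, :] @ x - b[I]
        # I_{:,J} (A_{I,J})^T (...): only the coordinates in J are updated.
        x[J] -= alpha * (A[np.ix_(I, J)].T @ residual_I) / weights[idx]
    return x

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))            # full column rank with probability 1
x_true = rng.standard_normal(n)
b = A @ x_true                             # consistent system
row_blocks = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]   # s = 3
col_blocks = [list(range(0, 5)), list(range(5, 10))]                           # t = 2
x = dsbgs(A, b, row_blocks, col_blocks, alpha=0.5, iters=20000, rng=rng)      # alpha < 2/t
error = np.linalg.norm(x - x_true)
```

Each iteration touches only the block $A_{\mathcal I,\mathcal J}$ and the rows $A_{\mathcal I,:}$, which is the point of the doubly stochastic sampling.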

Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$):

At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{x \mid A_{i,:} x = b_i\}$:

$$x^k = x^{k-1} - \frac{A_{i,:} x^{k-1} - b_i}{\|A_{i,:}\|_2^2} (A_{i,:})^\top,$$

where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
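A minimal NumPy sketch of the RK iteration follows; the function name `randomized_kaczmarz` and the random test problem are assumptions for illustration. Rows are sampled with probability proportional to their squared norms, as in the $s = m$, $t = 1$ case of Algorithm 1, and a single hand-written step is included to show the projection property.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, rng, x0=None):
    """Sketch of RK with rows sampled proportionally to squared row norms."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms2 = np.sum(A * A, axis=1)
    probs = row_norms2 / row_norms2.sum()
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Project x onto the hyperplane {x : A[i] @ x = b[i]}.
        x -= (A[i] @ x - b[i]) / row_norms2[i] * A[i]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true                               # consistent system

# A single step satisfies the sampled equation exactly (row 3 picked by hand).
x0 = np.zeros(5)
i = 3
x1 = x0 - (A[i] @ x0 - b[i]) / (A[i] @ A[i]) * A[i]
on_hyperplane = np.isclose(A[i] @ x1, b[i])

x = randomized_kaczmarz(A, b, iters=5000, rng=rng)
error = np.linalg.norm(x - x_true)
```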

Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$):

$$x^k = x^{k-1} - \frac{(A_{:,j})^\top (A x^{k-1} - b)}{\|A_{:,j}\|_2^2} I_{:,j},$$

where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n \times n$ identity matrix $I$.

Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$):

$$x^k = x^{k-1} - \alpha \frac{A_{i,j} (A_{i,:} x^{k-1} - b_i)}{|A_{i,j}|^2} I_{:,j},$$

where $A_{i,j}$ is the $(i, j)$ entry of $A$.
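The randomized Gauss-Seidel update changes a single coordinate of $x$ per step. The sketch below is one possible NumPy rendering; the function name, the incrementally maintained residual `r`, and the inconsistent full-column-rank test problem are assumptions of this example. Convergence to the least-squares solution in that setting is the behaviour reported in [4][5], not something derived on this slide.

```python
import numpy as np

def randomized_gauss_seidel(A, b, iters, rng):
    """Sketch of RGS with columns sampled proportionally to squared column norms."""
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                               # residual A x - b, updated in place
    col_norms2 = np.sum(A * A, axis=0)
    probs = col_norms2 / col_norms2.sum()
    for _ in range(iters):
        j = rng.choice(n, p=probs)
        step = (A[:, j] @ r) / col_norms2[j]
        x[j] -= step                            # only coordinate j changes
        r -= step * A[:, j]                     # keep r equal to A x - b
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))                # full column rank with probability 1
b = rng.standard_normal(20)                     # generically inconsistent
x = randomized_gauss_seidel(A, b, iters=5000, rng=rng)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
error = np.linalg.norm(x - x_ls)
```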

3.1 Convergence of the norms of the expectations

Theorem 4

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2,$$

where

$$x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the solution set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Proof of Theorem 4

Note that the conditional expectation given $x^{k-1}$ is

$$
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\, \mathbb{E}\!\left[ \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal I,\mathcal J)\in\mathcal P} \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} \cdot \frac{\|A_{\mathcal I,\mathcal J}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}
$$

Then the conditional expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by

$$
\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star^0) - x_\star^0 \\
&= \left( I - \frac{\alpha A^\top A}{\|A\|_F^2} \right) (x^{k-1} - x_\star^0).
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}\bigl[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]\bigr]
= \left( I - \frac{\alpha A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left( I - \frac{\alpha A^\top A}{\|A\|_F^2} \right)^k (x^0 - x_\star^0).
\end{aligned}
$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2.$$

Here the inequality follows from the fact that

$$x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \operatorname{range}(A^\top)$$

and Lemma 3 (applied to $A^\top$).

Remark 1

If $x^0 \in \operatorname{range}(A^\top)$, then $x_\star^0 = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$\max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.$$
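The key step in the proof of Theorem 4 is that the expected DSBGS update equals one Landweber step. The sketch below checks this by enumerating every block pair exactly (no sampling) and also evaluates the step-size bound of Remark 1. The matrix, the partitions, and `alpha = 0.7` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)                      # plays the role of x^{k-1}
alpha = 0.7
row_blocks = [[0, 1], [2, 3, 4], [5]]           # a partition of the rows
col_blocks = [[0, 2], [1, 3]]                   # a partition of the columns
fro2 = np.linalg.norm(A, "fro") ** 2

# Exact expectation: sum over all (I, J) of probability times updated iterate.
expected = np.zeros(n)
for I in row_blocks:
    for J in col_blocks:
        A_IJ = A[np.ix_(I, J)]
        w = np.linalg.norm(A_IJ, "fro") ** 2
        x_new = x.copy()
        x_new[J] -= alpha * (A_IJ.T @ (A[I] @ x - b[I])) / w
        expected += (w / fro2) * x_new

# One Landweber step from the same point.
landweber = x - alpha * A.T @ (A @ x - b) / fro2

# Step-size condition of Remark 1 and the resulting contraction factor.
sigma = np.linalg.svd(A, compute_uv=False)
alpha_max = 2 * fro2 / sigma[0] ** 2
delta = np.max(np.abs(1 - alpha * sigma ** 2 / fro2))
```

Since $\|A\|_F^2 \ge \sigma_1^2(A)$, the bound `alpha_max` is always at least 2, so any $0 < \alpha < 2$ satisfies the condition of Remark 1.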

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2,$$

where $x_\star$ is any solution of

$$A^\top A x = A^\top b.$$

Proof of Theorem 5

Note that the conditional expectation given $x^{k-1}$ is

$$
\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]
&= A \bigl( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \bigr) \\
&= A \left( x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right) \quad (\text{by } A^\top b = A^\top A x_\star) \\
&= A x^{k-1} - A x_\star - \frac{\alpha A A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \frac{\alpha A A^\top}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[A x^k - A x_\star]
&= \mathbb{E}\bigl[\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]\bigr] \\
&= \left( I - \frac{\alpha A A^\top}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \frac{\alpha A A^\top}{\|A\|_F^2} \right)^k (A x^0 - A x_\star).
\end{aligned}
$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2.$$

Here the inequality follows from the fact that

$$A x^0 - A x_\star \in \operatorname{range}(A)$$

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

$$0 < \alpha < 2/t.$$

In exact arithmetic, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2)\, \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.$$

Proof of Theorem 6

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top A}{\|A_{\mathcal I,\mathcal J}\|_F^2} (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top A}{\|A_{\mathcal I,\mathcal J}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \frac{A^\top I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (I_{:,\mathcal J})^\top I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top A}{\|A_{\mathcal I,\mathcal J}\|_F^4} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \frac{I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top A}{\|A_{\mathcal I,\mathcal J}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \frac{A^\top I_{:,\mathcal I} (I_{:,\mathcal I})^\top A}{\|A_{\mathcal I,\mathcal J}\|_F^2} (x^{k-1} - A^\dagger b).
\end{aligned}
$$

The last inequality follows from $(I_{:,\mathcal J})^\top I_{:,\mathcal J} = I$ and Lemma 1. Taking the conditional expectation gives

$$
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^\top \frac{A^\top A}{\|A\|_F^2} (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\, \sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\, \sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\, \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
$$

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^\top)$, we can show

$$x^k - x_\star^0 \in \operatorname{range}(A^\top)$$

by induction, where

$$x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2)\, \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal J_j}).$$

Proof of Theorem 7

Note that

$$
\begin{aligned}
\|A x^k - b\|_2^2
&= \left\| A x^{k-1} - \alpha \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (I_{:,\mathcal J})^\top A^\top A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^4} (A x^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$(I_{:,\mathcal J})^\top A^\top A I_{:,\mathcal J} = \|A_{:,\mathcal J}\|_F^2$$

(since $A I_{:,\mathcal J} = A_{:,\mathcal J}$ is a column vector) that

$$
\begin{aligned}
\|A x^k - b\|_2^2
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^4} (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I} (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
$$

Taking the conditional expectation gives

$$
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A A^\top}{\|A\|_F^2} (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}
$$

If $t < n$, then it follows from

$$(I_{:,\mathcal J})^\top A^\top A I_{:,\mathcal J} = (A_{:,\mathcal J})^\top A_{:,\mathcal J} \preceq \rho I$$

(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal J_j})$) that

$$
\begin{aligned}
\|A x^k - b\|_2^2
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\rho\, I_{:,\mathcal I} A_{\mathcal I,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^4} (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal J} (A_{\mathcal I,\mathcal J})^\top (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\rho\, I_{:,\mathcal I} (I_{:,\mathcal I})^\top}{\|A_{\mathcal I,\mathcal J}\|_F^2} (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
$$

Taking the conditional expectation gives

$$
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A A^\top}{\|A\|_F^2} (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\bigr] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}
$$

Remark 3

Note that

$$(A_{:,\mathcal J})^\top b = (A_{:,\mathcal J})^\top A x_\star,$$

where $x_\star$ is any solution of $A^\top A x = A^\top b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal J_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ and $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal J_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2} A_{:,\mathcal J_j} (A_{:,\mathcal J_j})^\top z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal I_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2} (A_{\mathcal I_i,:})^\top (A_{\mathcal I_i,:} x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i})$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
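Algorithm 2 might be sketched in NumPy as below. The function name `rebk`, the initial choices $z^0 = b$ and $x^0 = 0$ (which satisfy the stated initialization conditions), the contiguous blocks, and the rank-deficient inconsistent test problem are assumptions of this illustration, not part of the lecture. The explicit `rcond=1e-10` in the reference pseudoinverse is also an assumption, used so that rounding-level singular values are discarded.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha, iters, rng):
    """Sketch of REBK with z0 = b and x0 = 0."""
    n = A.shape[1]
    x = np.zeros(n)                       # x0 = 0 lies in range(A^T)
    z = np.array(b, dtype=float)          # z0 = b lies in b + range(A)
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_p = col_w / col_w.sum()
    row_p = row_w / row_w.sum()
    for _ in range(iters):
        # Column step: drives z towards (I - A A^dagger) b.
        j = rng.choice(len(col_blocks), p=col_p)
        A_J = A[:, col_blocks[j]]
        z -= alpha / col_w[j] * (A_J @ (A_J.T @ z))
        # Row step: block Kaczmarz-type update on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        A_I = A[I, :]
        x -= alpha / row_w[i] * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 8))   # rank 3 < n = 8
b = rng.standard_normal(20)                                       # generically inconsistent
row_blocks = [list(range(i, i + 5)) for i in range(0, 20, 5)]     # s = 4
col_blocks = [list(range(j, j + 4)) for j in range(0, 8, 4)]      # t = 2
x, z = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=20000, rng=rng)

A_pinv = np.linalg.pinv(A, rcond=1e-10)
x_error = np.linalg.norm(x - A_pinv @ b)                # distance to A^dagger b
z_error = np.linalg.norm(z - (b - A @ (A_pinv @ b)))    # distance to (I - A A^dagger) b
```

Note that no pseudoinverse of a block is formed inside the loop; `np.linalg.pinv` is used only afterwards to measure the error.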

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\cdot]\bigr].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^\top).$$

It holds

$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right),$$

where

$$\delta = \max_{1\le i\le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|.$$

Proof. By $A^\top b = A^\top A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2} (A_{\mathcal I_i,:})^\top (A_{\mathcal I_i,:} x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}),$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\!\left[ \frac{\alpha A^\top (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^\top A x^{k-1} - A^\top b}{\|A\|_F^2} - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \left( I - \alpha \frac{A A^\top}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \frac{A^\top}{\|A\|_F^2} z^{k-1}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[x^k - A^\dagger b]\bigr] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \frac{A^\top}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2}.
\end{aligned}
$$

Applying the norms to both sides, we obtain

$$
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^\top), \qquad A^\top z^0 \in \operatorname{range}(A^\top),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\, \sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

$$b_\perp = (I - A A^\dagger) b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^\top z = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \operatorname{range}(A).$$

It holds

$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal J_j})^\top b_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2} A_{:,\mathcal J_j} (A_{:,\mathcal J_j})^\top (z^{k-1} - b_\perp). \qquad (1)$$

By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \operatorname{range}(A).$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal J_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal J_j}\|_F^4} (z^{k-1} - b_\perp)^\top A_{:,\mathcal J_j} (A_{:,\mathcal J_j})^\top A_{:,\mathcal J_j} (A_{:,\mathcal J_j})^\top (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal J_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}
$$

Taking the conditional expectation given the first $k - 1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^\top (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\bigr] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^\top).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\, \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

Proof. Let

$$\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2} (A_{\mathcal I_i,:})^\top A_{\mathcal I_i,:} (x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2} (A_{\mathcal I_i,:})^\top (b_{\mathcal I_i} - A_{\mathcal I_i,:} A^\dagger b - z^k_{\mathcal I_i})$$

that

$$
\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4} (b_{\mathcal I_i} - A_{\mathcal I_i,:} A^\dagger b - z^k_{\mathcal I_i})^\top A_{\mathcal I_i,:} (A_{\mathcal I_i,:})^\top (b_{\mathcal I_i} - A_{\mathcal I_i,:} A^\dagger b - z^k_{\mathcal I_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal I_i} - A_{\mathcal I_i,:} A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad (\text{by Lemma 1}). \qquad (2)
\end{aligned}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \tilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\|x^k - \tilde{x}^k\|_2^2]\bigr] \\
&\le \mathbb{E}_{k-1}\!\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad (\text{by (2)})
\end{aligned}
$$

that

$$
\begin{aligned}
\mathbb{E}[\|x^k - \tilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}). \qquad (3)
\end{aligned}
$$

By $x^0 \in \operatorname{range}(A^\top)$ and $A^\dagger b \in \operatorname{range}(A^\top)$, we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^\top).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^\top)$ by induction. By

$$
\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal I_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^\top (A_{\mathcal I_i,:})^\top A_{\mathcal I_i,:} (A_{\mathcal I_i,:})^\top A_{\mathcal I_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal I_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad (\text{by Lemma 1}),
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|\tilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2 applied to } A^\top),
\end{aligned}
$$

which yields

$$\mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \qquad (4)$$

Note that for any $\varepsilon > 0$, we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le \bigl( \|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2 \bigr)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \tilde{x}^k\|_2 \|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \tilde{x}^k\|_2^2 + (1 + \varepsilon) \|\tilde{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}
$$
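The last step of (5) rests on an elementary inequality for the cross term. A short derivation is sketched here, where $p$ and $q$ are shorthand introduced only for this note and stand for the two norms appearing in (5):

```latex
% With p = \|x^k - \tilde{x}^k\|_2 and q = \|\tilde{x}^k - A^\dagger b\|_2, and any \varepsilon > 0:
0 \le \Bigl( \frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, q \Bigr)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2 ,
% and therefore
p^2 + q^2 + 2pq \le \Bigl( 1 + \frac{1}{\varepsilon} \Bigr) p^2 + (1 + \varepsilon)\, q^2 .
```

The free parameter $\varepsilon$ trades the weight on the perturbation term against the weight on the contraction term, which is why it appears in the rate $(1 + \varepsilon)^k \rho^k$ of Theorem 10.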

Combining (3), (4), and (5) yields

$$
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \tilde{x}^k\|_2^2] + (1 + \varepsilon)\, \mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) (1 + 1 + \varepsilon) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \bigl( 1 + 1 + \varepsilon + \cdots + (1 + \varepsilon)^{k-1} \bigr) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\, \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
$$
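The final inequality in the chain above bounds a geometric sum. Spelled out (the summation index $l$ is introduced here only for illustration), the step might read:

```latex
\Bigl( 1 + \frac{1}{\varepsilon} \Bigr) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
  = \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}
  = (1 + \varepsilon)^k \cdot \frac{1 + \varepsilon}{\varepsilon^2} .
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the parentheses of the bound in Theorem 10.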

Remark 4

For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\tilde{x}^k - A^\dagger b)^\top (x^k - \tilde{x}^k) = 0,$$

the inequality (5) becomes the equality

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence bound for REK:

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615-624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641-654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590-1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465-496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773-793, 2013.

Page 2: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

1 Pseudoinverse solutions of linear systems

Consider a linear system of equations iℓ

Ax = b A isin Rmtimesn b isin Rm

The system is called consistent if b isin range(A) otherwiseinconsistent

We are interested in the pseudoinverse solution Adaggerb where Adagger

denotes the Moore-Penrose pseudoinverse of A

Ax = b rank(A) Adaggerbconsistent = n unique solutionconsistent lt n unique minimum ℓ2-norm solution

inconsistent = n unique least-squares (LS) solution

inconsistent lt n unique minimum ℓ2-norm LS solution

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 2 43

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 3 43

2 Notation and preliminaries

For any random variable ξ let E983045ξ983046denote its expectation

For an integer m ge 1 let

[m] = 1 2 3 m

For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively

I the identity matrix whose order is clear from the context

For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)

σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0

to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43

For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively

Let I1 I2 Is denote a partition of [m] that is

Ii cap Ij = empty cupsi=1Ii = [m]

Let J1J2 Jt denote a partition of [n] Let

P = I1 I2 Istimes J1J2 Jt

Lemma 1

For any vector u isin Rm and any matrix A isin Rmtimesn it holds

uTAATu le 983042A9830422FuTu

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43

Lemma 2

For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds

uTAATu ge σ2r (A)983042u98304222

Lemma 3

Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds

983056983056983056983056983056

983061Iminus αAAT

983042A9830422F

983062k

u

9830569830569830569830569830562

le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042u9830422

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)

Let α gt 0 Initialize x0 isin Rn

for k = 1 2 do

Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F

Set xk = xkminus1 minus αIJ (AIJ )T(II)T

983042AIJ 9830422F(Axkminus1 minus b)

Landweber [3] (s = 1 and t = 1)

xk = xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43

Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)

At step k RK projects xkminus1 onto the hyperplane x | Aix = bi

xk = xkminus1 minus Aixkminus1 minus bi

983042Ai98304222(Ai)

T

where Ai is the ith row of A and bi is the ith component of b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43

Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)

xk = xkminus1 minus (Aj)T(Axkminus1 minus b)

983042Aj98304222Ij

where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I

Doubly stochastic Gauss-Seidel [6] (s = m t = n)

xk = xkminus1 minus αAij(Aix

kminus1 minus bi)

|Aij |2Ij

where Aij is the (i j) entry of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43

3.1 Convergence of the norms of the expectations

Theorem 4

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x^0_\star\|_2,$$

where

$$x^0_\star = (I - A^\dagger A) x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the solution set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Proof of Theorem 4

Note that the conditional expectation given $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\, \mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \cdot \frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}$$

Then the conditional expectation $\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]$ is given by

$$\begin{aligned}
\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x^0_\star \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x^0_\star \\
&= x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x^0_\star) - x^0_\star \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) (x^{k-1} - x^0_\star).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - x^0_\star]
&= \mathbb{E}\big[\mathbb{E}[x^k - x^0_\star \mid x^{k-1}]\big]
= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x^0_\star] \\
&= \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right)^k (x^0 - x^0_\star).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x^0_\star\|_2.$$

Here the inequality follows from the fact that

$$x^0 - x^0_\star = A^\dagger A x^0 - A^\dagger b \in \mathrm{range}(A^T)$$

and Lemma 3.

Remark 1

If $x^0 \in \mathrm{range}(A^T)$, then $x^0_\star = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$\max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.$$
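
The first step in the proof of Theorem 4, namely that one DSBGS step equals one Landweber step in conditional expectation, can be checked numerically by enumerating all block pairs. The sketch below does this in plain Python; the helper names, the matrix, the partitions, and the vector `x` are assumptions chosen only for illustration (one block is deliberately zero and is therefore never sampled).

```python
def residual(A, b, x):
    """Return A x - b for a list-of-rows matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def landweber_step(A, b, x, alpha):
    """x - alpha * A^T (A x - b) / ||A||_F^2."""
    m, n = len(A), len(A[0])
    fro_sq = sum(a * a for row in A for a in row)
    r = residual(A, b, x)
    return [x[j] - alpha * sum(A[i][j] * r[i] for i in range(m)) / fro_sq
            for j in range(n)]

def expected_dsbgs_step(A, b, x, alpha, row_blocks, col_blocks):
    """Exact conditional mean of one DSBGS step, summing over all block pairs."""
    n = len(A[0])
    fro_sq = sum(a * a for row in A for a in row)
    r = residual(A, b, x)
    mean = [0.0] * n
    for I in row_blocks:
        for J in col_blocks:
            w = sum(A[i][j] ** 2 for i in I for j in J)
            if w == 0.0:
                continue  # a zero block is picked with probability 0
            y = list(x)
            for j in J:
                y[j] -= alpha * sum(A[i][j] * r[i] for i in I) / w
            prob = w / fro_sq
            mean = [mc + prob * yc for mc, yc in zip(mean, y)]
    return mean

# Hypothetical data; the block (rows [0], columns [2]) is zero.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
b = [1.0, 0.0, 2.0]
x = [0.5, -1.0, 2.0]
mean_step = expected_dsbgs_step(A, b, x, 0.7, [[0], [1, 2]], [[0, 1], [2]])
lw_step = landweber_step(A, b, x, 0.7)
print(mean_step, lw_step)
```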

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2,$$

where $x_\star$ is any solution of

$$A^T A x = A^T b.$$

Proof of Theorem 5

Note that the conditional expectation given $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]
&= A \big( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \big) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right) \quad (\text{by } A^T b = A^T A x_\star) \\
&= A x^{k-1} - A x_\star - \frac{\alpha A A^T}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star]
&= \mathbb{E}\big[\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]\big] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right)^k (A x^0 - A x_\star).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2.$$

Here the inequality follows from the fact that

$$A x^0 - A x_\star \in \mathrm{range}(A)$$

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

$$0 < \alpha < 2/t.$$

In exact arithmetic, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.$$
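
The bound in Theorem 6 depends on $\alpha$ only through $2\alpha - t\alpha^2$, so the step size that minimizes the bound follows from a one-line calculation. This optimizes the upper bound, which need not coincide with the best step size in practice.

```latex
% Maximize g(alpha) = 2*alpha - t*alpha^2 over 0 < alpha < 2/t.
\[
g'(\alpha) = 2 - 2t\alpha = 0
\quad\Longrightarrow\quad
\alpha = \frac{1}{t},
\qquad
g\!\left(\tfrac{1}{t}\right) = \frac{2}{t} - \frac{1}{t} = \frac{1}{t} .
\]
% Substituting into the bound of Theorem 6:
\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
\le \left( 1 - \frac{\sigma_n^2(A)}{t\,\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2
\qquad \text{for } \alpha = \frac{1}{t} .
\]
```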

Proof of Theorem 6

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left( \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left( \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (x^{k-1} - A^\dagger b).
\end{aligned}$$

The second equality uses $b = A A^\dagger b$, which holds because the system is consistent. The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^T \left( \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$

Remark 2

If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show

$$x^k - x^0_\star \in \mathrm{range}(A^T)$$

by induction, where

$$x^0_\star = (I - A^\dagger A) x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}[\|x^k - x^0_\star\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x^0_\star\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
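
Both contraction factors in Theorem 7 are quadratic in $\alpha$, so the step sizes that minimize the two bounds can be worked out directly. As before, this minimizes the upper bounds only.

```latex
% Case t = n: minimize f(alpha) = 1 + alpha^2 - 2*alpha*sigma_r^2(A)/||A||_F^2.
\[
f'(\alpha) = 2\alpha - \frac{2\sigma_r^2(A)}{\|A\|_F^2} = 0
\quad\Longrightarrow\quad
\alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2},
\qquad
f(\alpha) = 1 - \frac{\sigma_r^4(A)}{\|A\|_F^4} .
\]
% Case t < n: maximize g(alpha) = 2*alpha*sigma_r^2(A) - t*rho*alpha^2.
\[
g'(\alpha) = 2\sigma_r^2(A) - 2t\rho\alpha = 0
\quad\Longrightarrow\quad
\alpha = \frac{\sigma_r^2(A)}{t\rho},
\qquad
1 - \frac{g(\alpha)}{\|A\|_F^2} = 1 - \frac{\sigma_r^4(A)}{t\rho\,\|A\|_F^2} .
\]
% Both minimizers lie in the middle of the admissible intervals of Theorem 7.
```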

Proof of Theorem 7

Note that

$$\begin{aligned}
\|A x^k - b\|_2^2
&= \left\| A x^{k-1} - \alpha \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b).
\end{aligned}$$

If $t = n$, then it follows from

$$(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$

(since $A_{\mathcal{I},\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

$$\begin{aligned}
\|A x^k - b\|_2^2
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $A x^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$

If $t < n$, then it follows from

$$(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I$$

(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
\|A x^k - b\|_2^2
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $A x^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,$$

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.

Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.

for $k = 1, 2, \ldots$ do

  Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$

  Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$

  Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$

  Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
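
The two coupled updates of Algorithm 2 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `rebk`, the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization requirements), and the small inconsistent test system are all made up for this example.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of REBK with z^0 = b and x^0 = 0; A is a list of rows."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the column blocks and of the row blocks.
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = list(b)
    x = [0.0] * n
    for _ in range(iters):
        # z-update: damp the part of z lying in the span of the columns J.
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        g = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # (A_J)^T z
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * gj for j, gj in zip(J, g)) / col_w[jb]
        # x-update: block Kaczmarz step on rows I with right-hand side b - z^k.
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][c] * x[c] for c in range(n)) - b[i] + z[i] for i in I]
        for c in range(n):
            x[c] -= alpha * sum(A[i][c] * ri for i, ri in zip(I, r)) / row_w[ib]
    return x

# Hypothetical inconsistent system; since A^T A = 3 I here, the least-squares
# solution is A^T b / 3 = (4/3, 2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, 2.0, 3.0, -1.0]
x = rebk(A, b, row_blocks=[[0, 1], [2, 3]], col_blocks=[[0], [1]],
         alpha=1.0, iters=200)
print(x)
```

Passing single-index blocks for both partitions together with `alpha=1.0` gives the REK special case mentioned above.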

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

$$\mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then by the law of total expectation we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).$$

It holds

$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right),$$

where

$$\delta = \max_{1\le i\le r} \left| 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} \right|.$$

Proof. By $A^T b = A^T A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$$

depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$b_\perp = (I - A A^\dagger) b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \mathrm{range}(A).$$

It holds

$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$ we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}$$

By $z^0 - b_\perp = A A^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \mathrm{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking the conditional expectation conditioned on the first $k - 1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.
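
The behaviour described by Lemma 9 can be observed on a small example by running only the $z$-update of REBK. The sketch below is plain Python under stated assumptions: the function name, the test matrix, the step size $\alpha = 0.5$, and the hand-computed vector `b_perp` are illustrative choices, not part of the lecture.

```python
import random

def rebk_z_iteration(A, z0, col_blocks, alpha, iters, seed=0):
    """Run only the z-update of REBK, starting from z0; A is a list of rows."""
    rng = random.Random(seed)
    m = len(A)
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z = list(z0)
    for _ in range(iters):
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        g = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # (A_J)^T z
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * gj for j, gj in zip(J, g)) / col_w[jb]
    return z

# Hypothetical data. Here A^T A = 3 I, so A A^+ = A A^T / 3 and
# b_perp = b - A (A^T b) / 3 can be written down by hand.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, 2.0, 3.0, -1.0]
b_perp = [2.0 / 3.0, 0.0, -1.0 / 3.0, -1.0 / 3.0]
z = rebk_z_iteration(A, b, col_blocks=[[0], [1]], alpha=0.5, iters=200)
# A^T z should be close to zero once z is close to b_perp.
At_z = [sum(A[i][j] * z[i] for i in range(4)) for j in range(2)]
print(z, At_z)
```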

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

Proof. Let

$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}). \tag{2}
\end{aligned}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widehat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\|x^k - \widehat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad (\text{by (2)})
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}). \tag{3}
\end{aligned}$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have

$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}),
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2 applied to } A^T),
\end{aligned}$$

which yields

$$\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}$$

Note that for any $\varepsilon > 0$, we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big( \|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2 \big)^2 \\
&= \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}$$
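
The last step of (5) uses a weighted form of the inequality $2ab \le a^2 + b^2$. For completeness, a short derivation is given below, where $a$ and $b$ are shorthand (introduced here only for this derivation) for the two norms appearing in (5).

```latex
% Shorthand (introduced here): a = ||x^k - \hat{x}^k||_2,  b = ||\hat{x}^k - A^+ b||_2.
% For any epsilon > 0, a square is nonnegative:
\[
0 \le \left( \frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, b \right)^2
= \frac{a^2}{\varepsilon} - 2ab + \varepsilon b^2
\quad\Longrightarrow\quad
2ab \le \frac{a^2}{\varepsilon} + \varepsilon b^2 .
\]
% Adding a^2 + b^2 to both sides gives the last line of (5):
\[
a^2 + b^2 + 2ab \le \left( 1 + \frac{1}{\varepsilon} \right) a^2 + (1 + \varepsilon)\, b^2 .
\]
```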

Combining (3), (4) and (5) yields

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] + (1 + \varepsilon)\, \mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$

Remark 4

For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
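
The REK bound in Remark 4 follows by unrolling a one-step recursion. A short derivation is sketched below, where $e_k$ and $c$ are shorthand introduced only for this derivation.

```latex
% Shorthand (introduced here): e_k = E[ ||x^k - A^+ b||_2^2 ],
%                              c   = ||z^0 - b_perp||_2^2 / ||A||_F^2.
% The Pythagorean identity of Remark 4 combined with (3) and (4) for alpha = 1 gives
\[
e_k \le c\,\rho^k + \rho\, e_{k-1} .
\]
% Unrolling: every level of the recursion contributes another c * rho^k.
\[
e_k \le c\,\rho^k + \rho \left( c\,\rho^{k-1} + \rho\, e_{k-2} \right)
= 2c\,\rho^k + \rho^2 e_{k-2}
\le \cdots \le k\,c\,\rho^k + \rho^k e_0 ,
\]
% which is the stated bound, since e_0 = ||x^0 - A^+ b||_2^2.
```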

References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615-624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641-654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590-1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465-496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773-793, 2013.

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Page 3: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 3 43

2 Notation and preliminaries

For any random variable ξ let E983045ξ983046denote its expectation

For an integer m ge 1 let

[m] = 1 2 3 m

For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively

I the identity matrix whose order is clear from the context

For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)

σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0

to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43

For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively

Let I1 I2 Is denote a partition of [m] that is

Ii cap Ij = empty cupsi=1Ii = [m]

Let J1J2 Jt denote a partition of [n] Let

P = I1 I2 Istimes J1J2 Jt

Lemma 1

For any vector u isin Rm and any matrix A isin Rmtimesn it holds

uTAATu le 983042A9830422FuTu

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43

Lemma 2

For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds

uTAATu ge σ2r (A)983042u98304222

Lemma 3

Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds

983056983056983056983056983056

983061Iminus αAAT

983042A9830422F

983062k

u

9830569830569830569830569830562

le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042u9830422

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)

Let α gt 0 Initialize x0 isin Rn

for k = 1 2 do

Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F

Set xk = xkminus1 minus αIJ (AIJ )T(II)T

983042AIJ 9830422F(Axkminus1 minus b)

Landweber [3] (s = 1 and t = 1)

xk = xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43

Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)

At step k RK projects xkminus1 onto the hyperplane x | Aix = bi

xk = xkminus1 minus Aixkminus1 minus bi

983042Ai98304222(Ai)

T

where Ai is the ith row of A and bi is the ith component of b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43

Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)

xk = xkminus1 minus (Aj)T(Axkminus1 minus b)

983042Aj98304222Ij

where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I

Doubly stochastic Gauss-Seidel [6] (s = m t = n)

xk = xkminus1 minus αAij(Aix

kminus1 minus bi)

|Aij |2Ij

where Aij is the (i j) entry of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43

31 Convergence of the norms of the expectations

Theorem 4

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

wherex0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the solution set

x isin Rn | Ax = b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43

Proof of Theorem 4

Note that the conditioned expectation on xkminus1

E[xk |xkminus1]

= xkminus1 minus αE983063IJ (AIJ )T(II)T

983042AIJ 9830422F

983064(Axkminus1 minus b)

= xkminus1 minus α

983091

983107983131

(IJ )isinP

IJ (AIJ )T(II)T

983042AIJ 9830422F983042AIJ 9830422F983042A9830422F

983092

983108 (Axkminus1 minus b)

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43

E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx0

983183)minus x0983183

=

983061Iminus αATA

983042A9830422F

983062(xkminus1 minus x0

983183)

Taking expectation gives

E[xk minus x0983183] = E[E[xk minus x0

983183 |xkminus1]] =

983061Iminus αATA

983042A9830422F

983062E[xkminus1 minus x0

983183]

=

983061Iminus αATA

983042A9830422F

983062k

(x0 minus x0983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43

Applying the norms to both sides we obtain

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

Here the inequality follows from the fact that

x0 minus x0983183 = AdaggerAx

0 minusAdaggerb isin range(AT)

and Lemma 3

Remark 1

If x0 isin range(AT) then x0983183 = Adaggerb

To ensure convergence of the expected iterate it suffices to have

max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43

Theorem 5

Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

where x983183 is any solution of

ATAx = ATb

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43

Proof of Theorem 6

Since the system is consistent, $b = AA^\dagger b$. Hence

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,A(x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(x^{k-1} - A^\dagger b).
\end{aligned}
$$

The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)\,(x^{k-1} - A^\dagger b)^T \frac{A^TA}{\|A\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
$$


Remark 2

If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show

$$
x^k - x^0_\star \in \mathrm{range}(A^T)
$$

by induction, where

$$
x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,
$$

i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$
\mathbb{E}\big[\|x^k - x^0_\star\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.
$$


Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
$$

where

$$
\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
$$
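The two per-iteration factors in Theorem 7 can be evaluated numerically. The following is a minimal sketch; the function name `dsbgs_residual_rate` and its argument names are hypothetical, and the caller is assumed to supply $\sigma_r^2(A)$, $\|A\|_F^2$ and $\rho$ computed elsewhere.

```python
def dsbgs_residual_rate(alpha, sigma_r_sq, fro_sq, t, n, rho=None):
    """Contraction factor of E[||A x^k - b||^2] per Theorem 7.

    sigma_r_sq : smallest nonzero squared singular value of A (assumed given)
    fro_sq     : squared Frobenius norm of A (assumed given)
    rho        : max_j sigma_1^2(A[:, J_j]); only needed when t < n
    """
    if t == n:
        # Case t = n: requires 0 < alpha < 2 sigma_r^2 / ||A||_F^2.
        if not 0 < alpha < 2 * sigma_r_sq / fro_sq:
            raise ValueError("alpha outside the range required for t = n")
        return 1 + alpha ** 2 - 2 * alpha * sigma_r_sq / fro_sq
    if rho is None:
        raise ValueError("rho is required when t < n")
    # Case t < n: requires 0 < alpha < 2 sigma_r^2 / (t rho).
    if not 0 < alpha < 2 * sigma_r_sq / (t * rho):
        raise ValueError("alpha outside the range required for t < n")
    return 1 - (2 * alpha * sigma_r_sq - t * rho * alpha ** 2) / fro_sq


# Example values: sigma_r^2 = 1, ||A||_F^2 = 4 (e.g. A = [[1,0],[0,1],[1,1]]).
rate_single_columns = dsbgs_residual_rate(0.25, 1.0, 4.0, t=2, n=2)
rate_one_block = dsbgs_residual_rate(0.25, 1.0, 4.0, t=1, n=2, rho=3.0)
```

Both factors are below 1 exactly when $\alpha$ lies in the stated range, which is why the function rejects other step sizes.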


Proof of Theorem 7

Note that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
$$

(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$


If $t < n$, then it follows from

$$
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I
$$

(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$


Remark 3

Note that

$$
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
$$

where

$$
\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
$$


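4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].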
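The REBK iteration can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function name `rebk`, the list-of-lists matrix layout, the 0-based index blocks, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions) are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK on a dense matrix A given as a list of rows.

    row_blocks / col_blocks are partitions of the row / column indices
    (0-based lists of lists); these names are hypothetical.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of each column block and each row block.
    fro2_cols = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    fro2_rows = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = list(b)        # z^0 = b lies in b + range(A)
    x = [0.0] * n      # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: z <- z - alpha * A_J (A_J)^T z / ||A_J||_F^2.
        jb = rng.choices(range(len(col_blocks)), weights=fro2_cols)[0]
        J = col_blocks[jb]
        w = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha / fro2_cols[jb] * sum(A[i][j] * wj for j, wj in zip(J, w))
        # Row-block step: x <- x - alpha * (A_I)^T (A_I x - b_I + z_I) / ||A_I||_F^2.
        ib = rng.choices(range(len(row_blocks)), weights=fro2_rows)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / fro2_rows[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Small inconsistent system; its least-squares solution is (0, 1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]])
```

With single-index blocks and $\alpha = 1$ this reduces to REK; on the example above the iterate `x` should approach $A^\dagger b = (0, 1)$ and `z` should approach $b - AA^\dagger b = (1, 1, -1)$.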


Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
$$

Then, by the law of total expectation, we have

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
$$


4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. It holds

$$
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2}\right),
$$

where

$$
\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
$$
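To get a feel for the bound in Theorem 8, the following sketch evaluates $\delta$ and the right-hand side for given squared singular values. The function name and arguments are hypothetical; the squared singular values are assumed to be computed elsewhere.

```python
import math


def expected_iterate_bound(k, alpha, sigma_sq, fro_sq, x_err, atz0_norm):
    """Return (delta, bound) for Theorem 8.

    sigma_sq  : nonzero squared singular values of A (assumed given)
    fro_sq    : ||A||_F^2
    x_err     : ||x^0 - A^+ b||_2
    atz0_norm : ||A^T z^0||_2
    """
    delta = max(abs(1 - alpha * s / fro_sq) for s in sigma_sq)
    bound = delta ** k * (x_err + alpha * k * atz0_norm / fro_sq)
    return delta, bound


# Example: A = [[1,0],[0,1],[1,1]] has squared singular values 3 and 1,
# ||A||_F^2 = 4; with b = (1, 2, 0), x^0 = 0, z^0 = b we get
# ||x^0 - A^+ b|| = 1 and ||A^T z^0|| = sqrt(5).
delta, bound_10 = expected_iterate_bound(10, 1.0, [3.0, 1.0], 4.0, 1.0, math.sqrt(5))
_, bound_50 = expected_iterate_bound(50, 1.0, [3.0, 1.0], 4.0, 1.0, math.sqrt(5))
```

Although the factor $\alpha k$ grows linearly in $k$, it is dominated by $\delta^k$ whenever $\delta < 1$, so the bound still tends to zero.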


Proof. By $A^Tb = A^TAA^\dagger b$ and

$$
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\, z^{k-1}.
\end{aligned}
$$


Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
$$


Applying the norms to both sides, we obtain

$$
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k \frac{A^Tz^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T),
$$

and Lemma 3.


4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$
\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$
b_\perp = (I - AA^\dagger)b,
$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.


Lemma 9

For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \mathrm{range}(A)$. It holds

$$
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.
$$

Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have

$$
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \qquad (1)
$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$
z^k - b_\perp \in \mathrm{range}(A).
$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}\,(z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking the conditioned expectation on the first $k-1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.
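The bound of Lemma 9 can be checked exactly on a tiny example by enumerating every possible sequence of column choices. The sketch below assumes single-column blocks ($t = n$) and a hand-picked $3 \times 2$ matrix; the function name and the example data are assumptions made for illustration.

```python
from itertools import product


def expected_z_error(A, z0, b_perp, alpha, k):
    """Exact E[||z^k - b_perp||_2^2] for single-column blocks.

    Enumerates all n**k column sequences, so it is only usable for tiny k.
    """
    m, n = len(A), len(A[0])
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    fro_sq = sum(col_sq)
    total = 0.0
    for seq in product(range(n), repeat=k):
        z = list(z0)
        prob = 1.0
        for j in seq:
            prob *= col_sq[j] / fro_sq               # probability of picking column j
            inner = sum(A[i][j] * z[i] for i in range(m))
            z = [z[i] - alpha * inner / col_sq[j] * A[i][j] for i in range(m)]
        total += prob * sum((zi - bi) ** 2 for zi, bi in zip(z, b_perp))
    return total


# A^T A = [[2, 1], [1, 2]] has eigenvalues 1 and 3, so sigma_r^2 = 1 and
# ||A||_F^2 = 4; with alpha = 1 this gives rho = 1 - 1/4 = 0.75.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
b_perp = [1.0, 1.0, -1.0]          # b - A A^+ b, computed by hand
rho = 0.75
init_sq = sum((zi - bi) ** 2 for zi, bi in zip(b, b_perp))
errors = [expected_z_error(A, b, b_perp, 1.0, k) for k in range(1, 7)]
```

Each entry of `errors` can then be compared against `rho ** k * init_sq`; on this example the exact expectation sits below the bound for every $k$ tried.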


Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$. For any $\varepsilon > 0$, it holds

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k \rho^k \left(\frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$
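The bound of Theorem 10 only yields linear convergence when $(1 + \varepsilon)\rho < 1$. As an illustration (this particular choice of $\varepsilon$ is an assumption made here, not taken from the slides), for $0 < \rho < 1$ one admissible value is:

```latex
\varepsilon = \frac{1 - \rho}{2\rho}
\;\Longrightarrow\;
(1 + \varepsilon)\rho = \rho + \frac{1 - \rho}{2} = \frac{1 + \rho}{2} < 1 .
```

Smaller $\varepsilon$ improves the rate $(1 + \varepsilon)\rho$ but inflates the constant through the factor $1/\varepsilon^2$.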

Proof. Let

$$
\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),
$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

It follows from

$$
x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})
$$

that

$$
\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \tilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \tilde{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2 \|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
$$

that

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - \tilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}
$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have $x^0 - A^\dagger b \in \mathrm{range}(A^T)$. Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\tilde{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2 applied to } A^T\text{, since } x^{k-1} - A^\dagger b \in \mathrm{range}(A^T)\text{)},
\end{aligned}
$$

which yields

$$
\mathbb{E}\big[\|\tilde{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \qquad (4)
$$

Note that for any $\varepsilon > 0$, we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \tilde{x}^k\|_2\,\|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \tilde{x}^k\|_2^2 + (1 + \varepsilon)\|\tilde{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}
$$
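The last step in (5) is an instance of Young's inequality. Writing $a = \|x^k - \tilde{x}^k\|_2$ and $b = \|\tilde{x}^k - A^\dagger b\|_2$ as shorthand (notation introduced here only for this step), it reads:

```latex
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, b\right)^2
  = \frac{a^2}{\varepsilon} - 2ab + \varepsilon b^2
\;\Longrightarrow\;
2ab \le \frac{a^2}{\varepsilon} + \varepsilon b^2
\;\Longrightarrow\;
a^2 + b^2 + 2ab \le \left(1 + \frac{1}{\varepsilon}\right) a^2 + (1 + \varepsilon)\, b^2 .
```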


Combining (3), (4) and (5) yields

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \tilde{x}^k\|_2^2\big] + (1 + \varepsilon)\,\mathbb{E}\big[\|\tilde{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left(\frac{(1 + \varepsilon)\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
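The final inequality of the proof uses a geometric sum. Spelled out as a worked step (added here for completeness):

```latex
\left(1 + \frac{1}{\varepsilon}\right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}
= (1 + \varepsilon)^k \cdot \frac{1 + \varepsilon}{\varepsilon^2} .
```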


Remark 4

For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$
(\tilde{x}^k - A^\dagger b)^T (x^k - \tilde{x}^k) = 0,
$$

the equation (5) becomes

$$
\|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2,
$$

which yields the following convergence bound for REK:

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$


[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Page 4: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

2 Notation and preliminaries

For any random variable ξ let E983045ξ983046denote its expectation

For an integer m ge 1 let

[m] = 1 2 3 m

For any vector u isin Rm we use uT and 983042u9830422 to denote thetranspose and the Euclidean norm (ℓ2-norm) of u respectively

I the identity matrix whose order is clear from the context

For any matrix A isin Rmtimesn we use AT Adagger 983042A983042F range(A)

σ1(A) ge σ2(A) ge middot middot middot ge σr(A) gt 0

to denote the transpose the Moore-Penrose pseudoinverse theFrobenius norm the column space and all the nonzero singularvalues of A respectively Obviously r is the rank of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 4 43

For index sets I sube [m] and J sube [n] let AI AJ and AIJdenote the row submatrix indexed by I the column submatrixindexed by J and the submatrix that lies in the rows indexed byI and the columns indexed by J respectively

Let I1 I2 Is denote a partition of [m] that is

Ii cap Ij = empty cupsi=1Ii = [m]

Let J1J2 Jt denote a partition of [n] Let

P = I1 I2 Istimes J1J2 Jt

Lemma 1

For any vector u isin Rm and any matrix A isin Rmtimesn it holds

uTAATu le 983042A9830422FuTu

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 5 43

Lemma 2

For any matrix A isin Rmtimesn with rank r and any vector u isin range(A) itholds

uTAATu ge σ2r (A)983042u98304222

Lemma 3

Let α gt 0 and A be any nonzero real matrix For every u isin range(A)it holds

983056983056983056983056983056

983061Iminus αAAT

983042A9830422F

983062k

u

9830569830569830569830569830562

le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042u9830422

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 6 43

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1 Doubly stochastic block Gauss-Seidel (DSBGS)

Let α gt 0 Initialize x0 isin Rn

for k = 1 2 do

Pick (IJ ) isin P with probability983042AIJ 9830422F983042A9830422F

Set xk = xkminus1 minus αIJ (AIJ )T(II)T

983042AIJ 9830422F(Axkminus1 minus b)

Landweber [3] (s = 1 and t = 1)

xk = xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 7 43

Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)

At step k RK projects xkminus1 onto the hyperplane x | Aix = bi

xk = xkminus1 minus Aixkminus1 minus bi

983042Ai98304222(Ai)

T

where Ai is the ith row of A and bi is the ith component of b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43

Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)

xk = xkminus1 minus (Aj)T(Axkminus1 minus b)

983042Aj98304222Ij

where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I

Doubly stochastic Gauss-Seidel [6] (s = m t = n)

xk = xkminus1 minus αAij(Aix

kminus1 minus bi)

|Aij |2Ij

where Aij is the (i j) entry of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43

31 Convergence of the norms of the expectations

Theorem 4

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

wherex0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the solution set

x isin Rn | Ax = b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43

Proof of Theorem 4

Note that the conditioned expectation on xkminus1

E[xk |xkminus1]

= xkminus1 minus αE983063IJ (AIJ )T(II)T

983042AIJ 9830422F

983064(Axkminus1 minus b)

= xkminus1 minus α

983091

983107983131

(IJ )isinP

IJ (AIJ )T(II)T

983042AIJ 9830422F983042AIJ 9830422F983042A9830422F

983092

983108 (Axkminus1 minus b)

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43

E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx0

983183)minus x0983183

=

983061Iminus αATA

983042A9830422F

983062(xkminus1 minus x0

983183)

Taking expectation gives

E[xk minus x0983183] = E[E[xk minus x0

983183 |xkminus1]] =

983061Iminus αATA

983042A9830422F

983062E[xkminus1 minus x0

983183]

=

983061Iminus αATA

983042A9830422F

983062k

(x0 minus x0983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43

Applying the norms to both sides we obtain

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

Here the inequality follows from the fact that

x0 minus x0983183 = AdaggerAx

0 minusAdaggerb isin range(AT)

and Lemma 3

Remark 1

If x0 isin range(AT) then x0983183 = Adaggerb

To ensure convergence of the expected iterate it suffices to have

max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43

Theorem 5

Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

where x983183 is any solution of

ATAx = ATb

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43

Proof of Theorem 6

983042xk minusAdaggerb98304222

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062A(xkminus1 minusAdaggerb)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minusAdaggerbminus α

983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

9830569830569830569830562

2

= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)

TA

983042AIJ 9830424F

983062(xkminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATII(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43

The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives

E[983042xk minusAdaggerb98304222 |xkminus1]

le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)

Taking expectation again gives

E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062E[983042xkminus1 minusAdaggerb98304222]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43

Remark 2

If t = 1 and x0 isin range(AT) we can show

xk minus x0983183 isin range(AT)

by induction where

x0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the set

x isin Rn | Ax = b

Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound

E[983042xk minus x098318398304222] le

9830611minus (2αminus α2)σ2

r (A)

983042A9830422F

983062k

983042x0 minus x098318398304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43

Theorem 7

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)

Ax = b

with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then

E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minus b98304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43

Proof of Theorem 7

Note that

983042Axk minus b98304222

=

983056983056983056983056Axkminus1 minus α

983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minus b

9830569830569830569830562

2

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

If t = n then it follows from

(IJ )TATAIJ = 983042AJ 9830422F

(since AIJ = AJ is a column vector) that

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43

983042Axk minus b98304222

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611 + α2 minus 2ασ2

r(A)

983042A9830422F

983062983042Axkminus1 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2 \mid x^{k-1}]
&\le \left(1+\frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2 - 2\alpha (Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b) \\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1}-b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2] &= \mathbb{E}\big[\mathbb{E}[\|Ax^k-b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1}-b\|_2^2] \\
&\le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0-b\|_2^2.
\end{aligned}
$$
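The two per-step factors in the residual bounds for DSBGS are quadratics in the step size, so it can help to evaluate them numerically. The following is a minimal sketch, not part of the lecture: the function names and the sample values of $\sigma_r^2(A)$, $\|A\|_F^2$, $t$ and $\rho$ are assumptions chosen only for illustration. Setting the derivative of each quadratic to zero gives a minimizer at the midpoint of the admissible step-size interval.

```python
def factor_single_columns(alpha, sigma_r_sq, fro_sq):
    """Per-step factor for t = n: 1 + alpha^2 - 2*alpha*sigma_r^2/||A||_F^2."""
    return 1.0 + alpha**2 - 2.0 * alpha * sigma_r_sq / fro_sq


def factor_column_blocks(alpha, sigma_r_sq, fro_sq, t, rho):
    """Per-step factor for t < n: 1 - (2*alpha*sigma_r^2 - t*rho*alpha^2)/||A||_F^2."""
    return 1.0 - (2.0 * alpha * sigma_r_sq - t * rho * alpha**2) / fro_sq


# Hypothetical numbers, chosen only for illustration.
sigma_r_sq, fro_sq, t, rho = 1.0, 4.0, 2, 3.0

# Minimizers of the two quadratics: the midpoints of the admissible intervals
# (0, 2*sigma_r^2/||A||_F^2) and (0, 2*sigma_r^2/(t*rho)).
alpha_n = sigma_r_sq / fro_sq
alpha_t = sigma_r_sq / (t * rho)

best_n = factor_single_columns(alpha_n, sigma_r_sq, fro_sq)
best_t = factor_column_blocks(alpha_t, sigma_r_sq, fro_sq, t, rho)
print(best_n, best_t)
```

At the right endpoint of each interval the factor returns to 1, which is why the step-size conditions in Theorem 7 are strict inequalities.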

Remark 3

Note that

$$(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2] \le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0-b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2] \le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0-b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
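The two updates of Algorithm 2 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `rebk`, the dense list-of-lists storage, the default iteration count, the seed, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions) are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK on dense lists; returns (x, z)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the row and column blocks (sampling weights).
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z = list(b)        # z^0 = b lies in b + range(A)
    x = [0.0] * n      # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha/||A_{:,J}||_F^2 * A_{:,J} (A_{:,J})^T z
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        v = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * vj for j, vj in zip(J, v))
        # Row step: x <- x - alpha/||A_{I,:}||_F^2 * (A_{I,:})^T (A_{I,:} x - b_I + z_I)
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Small inconsistent system; its least-squares solution is (4/3, 4/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 3.0]
x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]])
print(x, z)
```

With single-row and single-column blocks and $\alpha = 1$, the call above is the REK special case; passing larger index lists in `row_blocks` and `col_blocks` gives the block variant without any pseudoinverse computation.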

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then by the law of total expectation we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

It holds

$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2}\right),$$

where

$$\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$
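To get a feel for the bound in Theorem 8, it can be evaluated directly from the nonzero singular values. The sketch below is illustrative only: the function name `expected_error_bound` and the sample inputs are assumptions, and it relies on the standard identity that $\|A\|_F^2$ equals the sum of the squared singular values.

```python
def expected_error_bound(k, alpha, sigmas, x0_err, atz0_norm):
    """delta^k * (||x^0 - A^+ b||_2 + alpha*k*||A^T z^0||_2 / ||A||_F^2)."""
    fro_sq = sum(s * s for s in sigmas)  # ||A||_F^2 = sum of sigma_i^2
    delta = max(abs(1.0 - alpha * s * s / fro_sq) for s in sigmas)
    return delta**k * (x0_err + alpha * k * atz0_norm / fro_sq)


# Hypothetical inputs: singular values sqrt(3) and 1, ||x^0 - A^+ b||_2 = 1,
# ||A^T z^0||_2 = 2, step size alpha = 1.
sigmas = [3**0.5, 1.0]
bounds = [expected_error_bound(k, 1.0, sigmas, 1.0, 2.0) for k in (0, 1, 10, 50)]
print(bounds)
```

Because of the factor $k$ multiplying $\|A^T z^0\|_2$, the bound can rise for the first few iterations before the geometric factor $\delta^k$ takes over, as it does for these sample values.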

Proof. By $A^Tb = A^TAA^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(\mathbf{I} - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
$$

Applying the norms to both sides, we obtain

$$
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(\mathbf{I} - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^Tz^0 \in \operatorname{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

$$b_\perp = (\mathbf{I} - AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \operatorname{range}(A).$$

It holds

$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \qquad (1)$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \operatorname{range}(A).$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking the expectation conditioned on the first $k-1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2] &= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.
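The one-step contraction in Lemma 9 can be checked exactly on a tiny example by enumerating the column blocks instead of sampling them. The sketch below is illustrative only: the $3 \times 2$ matrix, the right-hand side, and the single-column partition are assumptions chosen here, and $\sigma_r^2(A) = 1$ and $b_\perp$ were worked out by hand for this particular matrix.

```python
from fractions import Fraction as F

A = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]
z0 = [F(1), F(1), F(3)]                  # z^0 = b
b_perp = [F(-1, 3), F(-1, 3), F(1, 3)]   # (I - A A^+) b for this A and b
alpha = F(1)
fro_sq = sum(a * a for row in A for a in row)     # ||A||_F^2 = 4
rho = 1 - (2 * alpha - alpha**2) * F(1) / fro_sq  # sigma_r^2(A) = 1 here


def sq_dist(z):
    """Squared Euclidean distance from z to b_perp."""
    return sum((zi - bi) ** 2 for zi, bi in zip(z, b_perp))


expected = F(0)
for j in range(2):                       # single-column blocks J_j = {j}
    col = [row[j] for row in A]
    w = sum(c * c for c in col)          # ||A_{:,J_j}||_F^2
    dot = sum(c * zi for c, zi in zip(col, z0))
    z1 = [zi - alpha / w * c * dot for zi, c in zip(z0, col)]
    expected += (w / fro_sq) * sq_dist(z1)  # weight by the sampling probability

print(expected, rho * sq_dist(z0))
```

Exact rational arithmetic is used so that the comparison between $\mathbb{E}[\|z^1 - b_\perp\|_2^2]$ and $\rho\|z^0 - b_\perp\|_2^2$ is free of rounding error.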

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$

Proof. Let

$$\hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

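It follows from

$$x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \hat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \hat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
$$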

that

$$
\begin{aligned}
\mathbb{E}[\|x^k - \hat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}
$$

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[\|\hat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$

which yields

$$\mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \qquad (4)$$

Note that for any $\varepsilon > 0$, we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \\
&\le (\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \hat{x}^k\|_2\|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \hat{x}^k\|_2^2 + (1+\varepsilon)\|\hat{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}
$$
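The last step in (5) is the elementary inequality $2ab \le a^2/\varepsilon + \varepsilon b^2$, which follows from $(a/\sqrt{\varepsilon} - \sqrt{\varepsilon}\,b)^2 \ge 0$. A quick numerical check is sketched below; the helper name `young_upper` and the sample values are assumptions made for illustration.

```python
def young_upper(a, b, eps):
    """Right-hand side of (5): (1 + 1/eps) * a^2 + (1 + eps) * b^2."""
    return (1.0 + 1.0 / eps) * a * a + (1.0 + eps) * b * b


# Scan a small grid of nonnegative a, b and positive eps (hypothetical values)
# and record the smallest gap between the upper bound and (a + b)^2.
grid = [0.0, 0.5, 1.0, 2.0, 5.0]
gaps = [
    young_upper(a, b, eps) - (a + b) ** 2
    for a in grid
    for b in grid
    for eps in (0.1, 0.5, 1.0, 4.0)
]
min_gap = min(gaps)

# The bound is tight when a = eps * b, for example a = 1, b = 2, eps = 0.5.
tight = young_upper(1.0, 2.0, 0.5)
print(min_gap, tight)
```

Small $\varepsilon$ keeps the factor on $\|\hat{x}^k - A^\dagger b\|_2^2$ close to 1 at the cost of a large factor $1 + 1/\varepsilon$ on the perturbation term, which is the trade-off carried into Theorem 10.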

Combining (3), (4) and (5) yields

$$
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \hat{x}^k\|_2^2] + (1+\varepsilon)\mathbb{E}[\|\hat{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)(1 + 1 + \varepsilon)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + 1 + \varepsilon + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
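The final bound only decays when $(1+\varepsilon)\rho < 1$, so $\varepsilon$ has to be chosen below $1/\rho - 1$. The sketch below evaluates the bound for two values of $\varepsilon$; the function name `theorem10_bound` and all numerical inputs are assumptions chosen for illustration.

```python
def theorem10_bound(k, eps, alpha, rho, fro_sq, z0_err_sq, x0_err_sq):
    """(1+eps)^k rho^k ((1+eps) alpha^2 ||z^0-b_perp||^2 / (eps^2 ||A||_F^2) + ||x^0-A^+b||^2)."""
    growth = ((1.0 + eps) * rho) ** k
    inner = (1.0 + eps) * alpha**2 * z0_err_sq / (eps**2 * fro_sq) + x0_err_sq
    return growth * inner


# Hypothetical problem constants.
alpha, rho, fro_sq = 1.0, 0.75, 4.0
z0_err_sq, x0_err_sq = 32.0 / 3.0, 32.0 / 9.0

# eps = 0.1: (1 + eps) * rho = 0.825 < 1, so the bound decays with k.
decaying = [theorem10_bound(k, 0.1, alpha, rho, fro_sq, z0_err_sq, x0_err_sq)
            for k in (0, 50, 100)]
# eps = 0.5: (1 + eps) * rho = 1.125 > 1, so the bound grows and is uninformative.
growing = [theorem10_bound(k, 0.5, alpha, rho, fro_sq, z0_err_sq, x0_err_sq)
           for k in (0, 50, 100)]
print(decaying, growing)
```

For these sample constants the admissible range is $\varepsilon < 1/3$; within that range a smaller $\varepsilon$ improves the rate but inflates the constant through the $1/\varepsilon^2$ term.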

Remark 4

For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\hat{x}^k - A^\dagger b)^T(x^k - \hat{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
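The orthogonality used in Remark 4 can be seen on a single row: with $\alpha = 1$, $\hat{x}^k - A^\dagger b$ is the projection of $x^{k-1} - A^\dagger b$ onto the orthogonal complement of the chosen row, while $x^k - \hat{x}^k$ is a multiple of that row. The sketch below checks this numerically; the row `a`, the error vector `e`, and the scalar `c` are hypothetical values, not data from the lecture.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1.0, 2.0]    # one row of A (hypothetical)
e = [3.0, -1.0]   # stands for x^{k-1} - A^+ b (hypothetical)
c = 0.7           # stands for the scalar b_i - a^T A^+ b - z_i^k (hypothetical)

na = dot(a, a)
# alpha = 1: x_hat^k - A^+ b = e - a (a^T e) / ||a||^2
hat_err = [ei - ai * dot(a, e) / na for ei, ai in zip(e, a)]
# x^k - x_hat^k = a * c / ||a||^2, a multiple of the row a
diff = [ai * c / na for ai in a]

inner = dot(hat_err, diff)
total = [h + d for h, d in zip(hat_err, diff)]   # x^k - A^+ b
pythagoras_gap = dot(total, total) - (dot(hat_err, hat_err) + dot(diff, diff))
print(inner, pythagoras_gap)
```

For $\alpha \ne 1$ or for blocks with more than one row the two vectors are in general not orthogonal, which is why the general proof falls back on inequality (5).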

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


983045Ekminus1

983045983042zk minus bperp98304222

983046983046

le ρE983045983042zkminus1 minus bperp98304222

983046

le ρk983042z0 minus bperp98304222

This completes the proof

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43

Theorem 10

For any given consistent or inconsistent linear system Ax = b let xk

be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

For any ε gt 0 it holds

E983045983042xk minusAdaggerb98304222

983046le (1 + ε)kρk

983061(1 + ε)α2983042z0 minus bperp98304222

ε2983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Proof Let

983141xk = xkminus1 minus α

983042AIi9830422F(AIi)

TAIi(xkminus1 minusAdaggerb)

which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43

It follows from

xk minus 983141xk =α

983042AIi9830422F(AIi)

T(bIi minusAIiAdaggerbminus zkIi)

that

983042xk minus 983141xk98304222

=α2

983042AIi9830424F(bIi minusAIiA

daggerbminus zkIi)TAIi(AIi)

T(bIi minusAIiAdaggerbminus zkIi

)

leα2983042bIi minusAIiA

daggerbminus zkIi98304222

983042AIi9830422F (by Lemma 1) (2)

It follows from

Ekminus1

983045983042xk minus 983141xk98304222

983046= Ekminus1

983045Eikminus1

983045983042xk minus 983141xk98304222

983046983046

le Ekminus1

983063α2983042bminusAAdaggerbminus zk98304222

983042A9830422F

983064(by (2))

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43

that

E983045983042xk minus 983141xk98304222

983046le α2

983042A9830422FE983045983042bminusAAdaggerbminus zk98304222

983046

le α2ρk

983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)

By x0 isin range(AT) and Adaggerb isin range(AT) we have

x0 minusAdaggerb isin range(AT)

Then we can show that xk minusAdaggerb isin range(AT) by induction By

983042983141xk minusAdaggerb98304222

= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x

kminus1 minusAdaggerb)98304222983042AIi9830422F

+α2

983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)

TAIi(AIi)TAIi(x

kminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x

kminus1 minusAdaggerb)98304222983042AIi9830422F

(by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43

we have

Ekminus1

983045983042983141xk minusAdaggerb98304222

983046

le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222

983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)

which yields

E983045983042983141xk minusAdaggerb98304222

983046le ρE

983045983042xkminus1 minusAdaggerb98304222

983046 (4)

Note that for any ε gt 0 we have

983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2

le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422

le9830611 +

1

ε

983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43

Combining (3) (4) and (5) yields

E983045983042xk minusAdaggerb98304222

983046

le9830611 +

1

ε

983062E983045983042xk minus 983141xk98304222

983046+ (1 + ε)E

983045983042983141xk minusAdaggerb98304222

983046

le9830611 +

1

ε

983062α2ρk

983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE

983045983042xkminus1 minusAdaggerb98304222

983046

le9830611 +

1

ε

983062(1 + 1 + ε)α2ρk

983042A9830422F983042z0 minus bperp98304222

+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222

983046

le middot middot middot

le9830611 +

1

ε

983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk

983042A9830422F983042z0 minus bperp98304222

+(1 + ε)kρk983042x0 minusAdaggerb98304222

le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222

ε2983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43

Remark 4

For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality

(983141xk minusAdaggerb)T(xk minus 983141xk) = 0

the equation (5) becomes

983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222

which yields the following convergence for REK

E983045983042xk minusAdaggerb98304222

983046le ρk

983061k983042z0 minus bperp98304222

983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Kui Du Wutao Si and Xiaohui Sun

Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020

Kui Du and Xiaohui Sun

A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019

Lawrence H Landweber

An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951

D Leventhal and A S Lewis

Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010

Anna Ma Deanna Needell and Aaditya Ramdas

Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015

Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo

A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019

Thomas Strohmer and Roman Vershynin

A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009

Anastasios Zouzias and Nikolaos M Freris

Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43


Lemma 2

For any matrix $A \in \mathbb{R}^{m\times n}$ with rank $r$ and any vector $u \in \mathrm{range}(A)$, it holds

$$u^T A A^T u \ge \sigma_r^2(A)\,\|u\|_2^2.$$

Lemma 3

Let $\alpha > 0$ and let $A$ be any nonzero real matrix. For every $u \in \mathrm{range}(A)$, it holds

$$\left\|\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)^k u\right\|_2 \le \left(\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|u\|_2.$$

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1: Doubly stochastic block Gauss-Seidel (DSBGS)

Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
for $k = 1, 2, \ldots$ do
    Pick $(\mathcal{I}, \mathcal{J}) \in \mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \alpha\,\dfrac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b)$

Landweber [3] ($s = 1$ and $t = 1$):

$$x^k = x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b)$$
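The Landweber special case can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `landweber`, the zero starting vector, the iteration count, and the random test problem are assumptions made here and are not part of the lecture.

```python
import numpy as np

def landweber(A, b, alpha=1.0, iters=2000, x0=None):
    """Landweber iteration x^k = x^{k-1} - alpha * A^T (A x^{k-1} - b) / ||A||_F^2."""
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    fro2 = np.linalg.norm(A, "fro") ** 2
    for _ in range(iters):
        x = x - alpha * (A.T @ (A @ x - b)) / fro2
    return x

# Hypothetical consistent test problem (not from the lecture).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)
x = landweber(A, b)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)   # distance to the pseudoinverse solution
```

Starting from $x^0 = 0 \in \mathrm{range}(A^T)$, the iterates stay in $\mathrm{range}(A^T)$, so the limit is the pseudoinverse solution $A^\dagger b$.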

Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$)

At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{x \mid A_{i,:}x = b_i\}$:

$$x^k = x^{k-1} - \frac{A_{i,:}x^{k-1} - b_i}{\|A_{i,:}\|_2^2}\,(A_{i,:})^T,$$

where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
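A minimal NumPy sketch of the RK step is given below. Rows are sampled with probability $\|A_{i,:}\|_2^2/\|A\|_F^2$, which is the DSBGS sampling rule specialized to $s = m$, $t = 1$. The function name, the seed handling, and the test problem are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=3000, seed=0):
    """Randomized Kaczmarz: project onto one randomly chosen hyperplane per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.sum(A * A, axis=1)        # ||A_{i,:}||_2^2 for every row
    probs = row_norms2 / row_norms2.sum()     # row i is picked with prob ||A_{i,:}||^2 / ||A||_F^2
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x = x - (A[i] @ x - b[i]) / row_norms2[i] * A[i]
    return x

# Hypothetical consistent test problem (not from the lecture).
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
b = A @ rng.standard_normal(5)
x = randomized_kaczmarz(A, b)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
```

Each step touches a single row of $A$, which is why RK is attractive when a full matrix-vector product is expensive.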

Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$)

$$x^k = x^{k-1} - \frac{(A_{:,j})^T(Ax^{k-1} - b)}{\|A_{:,j}\|_2^2}\,I_{:,j},$$

where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n\times n$ identity matrix $I$.

Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$)

$$x^k = x^{k-1} - \alpha\,\frac{A_{i,j}(A_{i,:}x^{k-1} - b_i)}{|A_{i,j}|^2}\,I_{:,j},$$

where $A_{i,j}$ is the $(i, j)$ entry of $A$.
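The general DSBGS update of Algorithm 1, of which the four methods above are special cases, can be sketched as follows. This is a sketch under assumptions: the partitions are passed as Python lists of index lists, $x^0 = 0$, and the names `dsbgs`, `row_blocks`, and `col_blocks` are hypothetical. The step size $\alpha = 0.5$ with $t = 2$ column blocks is chosen so that $0 < \alpha < 2/t$.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Doubly stochastic block Gauss-Seidel sketch, started from x^0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # All block pairs (I, J) in P with their submatrices and squared Frobenius norms.
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    subs = [A[np.ix_(I, J)] for I, J in pairs]
    w = np.array([np.sum(S * S) for S in subs])
    probs = w / w.sum()                        # ||A_{I,J}||_F^2 / ||A||_F^2
    x = np.zeros(n)
    for _ in range(iters):
        p = rng.choice(len(pairs), p=probs)
        I, J = pairs[p]
        resid_I = A[I, :] @ x - b[I]           # (I_{:,I})^T (A x - b)
        x[J] -= alpha * (subs[p].T @ resid_I) / w[p]   # only the entries in J change
    return x

# Hypothetical consistent, full-column-rank test problem (not from the lecture).
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 4))
b = A @ rng.standard_normal(4)
row_blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]   # s = 4
col_blocks = [[0, 1], [2, 3]]                                 # t = 2
x = dsbgs(A, b, row_blocks, col_blocks, alpha=0.5, iters=20000)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
```

Passing a single row block and a single column block gives the Landweber iteration, while singleton row blocks with one column block and $\alpha = 1$ give RK.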

3.1 Convergence of the norms of the expectations

Theorem 4

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x_\star^0\|_2,$$

where

$$x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the solution set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Proof of Theorem 4

Note that the conditioned expectation on $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right](Ax^{k-1} - b)\\
&= x^{k-1} - \alpha\left(\sum_{(\mathcal{I},\mathcal{J})\in\mathcal{P}} \frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\cdot\frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1} - b)\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b).
\end{aligned}$$

Then the conditioned expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by

$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star^0\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star^0) - x_\star^0\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - x_\star^0).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]]
= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - x_\star^0]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k (x^0 - x_\star^0).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x_\star^0\|_2.$$

Here the inequality follows from the fact that

$$x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \mathrm{range}(A^T)$$

and Lemma 3.

Remark 1

If $x^0 \in \mathrm{range}(A^T)$, then $x_\star^0 = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad\text{i.e.,}\quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.$$
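The step-size condition of Remark 1 can be checked numerically. The sketch below evaluates the contraction factor from Theorem 4 on both sides of the threshold $2\|A\|_F^2/\sigma_1^2(A)$. The helper name `contraction_factor`, the rank tolerance, and the random matrix are assumptions made for illustration.

```python
import numpy as np

def contraction_factor(A, alpha):
    """max_i |1 - alpha * sigma_i(A)^2 / ||A||_F^2| over the nonzero singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 1e-12 * s[0]]                    # keep the r nonzero singular values
    return float(np.max(np.abs(1.0 - alpha * s**2 / np.sum(s**2))))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
sigma1 = np.linalg.svd(A, compute_uv=False)[0]
alpha_max = 2.0 * np.linalg.norm(A, "fro") ** 2 / sigma1**2

inside = contraction_factor(A, 0.5 * alpha_max)    # step size within the admissible range
outside = contraction_factor(A, 1.01 * alpha_max)  # step size just past the threshold
```

Since $\|A\|_F^2 \ge \sigma_1^2(A)$, the threshold is at least 2, so any $\alpha \in (0, 2)$ is admissible for the expected iterate.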

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2,$$

where $x_\star$ is any solution of

$$A^TAx = A^Tb.$$

Proof of Theorem 5

Note that the conditioned expectation on $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\left(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\right)\\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right)\\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad (\text{by } A^Tb = A^TAx_\star)\\
&= Ax^{k-1} - Ax_\star - \alpha\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star)\\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]]\\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star]\\
&= \left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)^k (Ax^0 - Ax_\star).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2.$$

Here the inequality follows from the fact that

$$Ax^0 - Ax_\star \in \mathrm{range}(A)$$

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

$$0 < \alpha < 2/t.$$

In exact arithmetic, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.$$

Proof of Theorem 6

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1} - A^\dagger b - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b)\right\|_2^2\\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1} - A^\dagger b)\\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b).
\end{aligned}$$

The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)\,(x^{k-1} - A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b)\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]]\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]\\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$

Remark 2

If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show

$$x^k - x_\star^0 \in \mathrm{range}(A^T)$$

by induction, where

$$x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

Proof of Theorem 7

Note that

$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - b\right\|_2^2\\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}$$

If $t = n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$

(since $A_{\mathcal{I},\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

$$\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b)\\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b)\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]]\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2]\\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}$$

If $t < n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I$$

(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b)\\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b)\\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b)\\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]]\\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2]\\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^Tb = (A_{:,\mathcal{J}})^TAx_\star,$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, $t$) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,$$

where

$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\,A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
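Algorithm 2 can be sketched in NumPy as follows. This is a sketch under assumptions, not the authors' implementation: the partitions are passed as lists of index lists, the initial vectors are fixed to $z^0 = b$ and $x^0 = 0$ (which satisfy $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$), and the function name `rebk` and the test problem are hypothetical.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Randomized extended block Kaczmarz sketch with z^0 = b and x^0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_p = col_w / col_w.sum()                # ||A_{:,J_j}||_F^2 / ||A||_F^2
    row_p = row_w / row_w.sum()                # ||A_{I_i,:}||_F^2 / ||A||_F^2
    z = np.asarray(b, dtype=float).copy()
    x = np.zeros(n)
    for _ in range(iters):
        # z-update: damps the component of z that lies in range(A).
        j = rng.choice(len(col_blocks), p=col_p)
        AJ = A[:, col_blocks[j]]
        z = z - alpha * (AJ @ (AJ.T @ z)) / col_w[j]
        # x-update: block Kaczmarz step on the corrected right-hand side b - z^k.
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - alpha * (AI.T @ (AI @ x - b[I] + z[I])) / row_w[i]
    return x, z

# Hypothetical inconsistent test problem (not from the lecture).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
row_blocks = [list(range(i, i + 5)) for i in range(0, 30, 5)]   # s = 6
col_blocks = [[0, 1], [2, 3], [4, 5]]                           # t = 3
x, z = rebk(A, b, row_blocks, col_blocks)
x_ls = np.linalg.pinv(A) @ b
err_x = np.linalg.norm(x - x_ls)
err_z = np.linalg.norm(z - (b - A @ x_ls))     # distance of z to (I - A A^dagger) b
```

No pseudoinverse of a block is needed inside the loop. `np.linalg.pinv` is used only afterwards to measure the error.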

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[\cdot]\right].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$

It holds

$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$

where

$$\delta = \max_{1\le i\le r}\left|1 - \alpha\frac{\sigma_i^2(A)}{\|A\|_F^2}\right|.$$

Proof. By $A^Tb = A^TAA^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\right]\\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right]\\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\left[\mathbb{E}_{k-1}[x^k - A^\dagger b]\right]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}]\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&= \cdots\\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$b_\perp = (I - AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \mathrm{range}(A).$$

It holds

$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\,\|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \tag{1}$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \mathrm{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking the conditioned expectation on the first $k - 1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\frac{\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\left[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\right]\\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2]\\
&\le \rho^k\,\|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.
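The one-step contraction used in the proof of Lemma 9 can be verified exactly on a small example, since the expectation over the column block $j$ is a finite weighted sum. The sketch below does this for one step from $z^0 = b$. The matrix sizes, the seed, the column partition, and $\alpha = 1$ are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))                # full column rank with probability 1, so r = 4
b = rng.standard_normal(8)
alpha = 1.0
col_blocks = [[0, 1], [2, 3]]                  # hypothetical partition of the columns

fro2 = np.linalg.norm(A, "fro") ** 2
sig = np.linalg.svd(A, compute_uv=False)
rho = 1.0 - (2 * alpha - alpha**2) * sig[-1] ** 2 / fro2

b_perp = b - A @ (np.linalg.pinv(A) @ b)       # (I - A A^dagger) b
e = b - b_perp                                 # z^0 - b_perp with z^0 = b, lies in range(A)

# Exact conditional expectation of ||z^1 - b_perp||^2: sum over blocks weighted by their probabilities.
expected = 0.0
for J in col_blocks:
    AJ = A[:, J]
    w = np.linalg.norm(AJ, "fro") ** 2
    e_new = e - alpha * (AJ @ (AJ.T @ e)) / w  # update (1) applied to the error
    expected += (w / fro2) * (e_new @ e_new)

lhs, rhs = expected, rho * (e @ e)             # Lemma 9 with k = 1 states lhs <= rhs
```

The same loop also confirms numerically that $A^Tb_\perp = 0$, which is the property that makes update (1) hold.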

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$

Proof. Let

$$\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})\\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}). 
\end{aligned} \tag{2}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \tilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[\|x^k - \tilde{x}^k\|_2^2]\right]\\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad (\text{by (2)})
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}[\|x^k - \tilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2]\\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}).
\end{aligned} \tag{3}$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have

$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\frac{\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\frac{\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1})
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[\|\tilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\frac{\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 3}),
\end{aligned}$$

which yields

$$\mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4}$$

Note that for any $\varepsilon > 0$, we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2
\le \left(\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2\right)^2\\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \tilde{x}^k\|_2\,\|\tilde{x}^k - A^\dagger b\|_2\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \tilde{x}^k\|_2^2 + (1 + \varepsilon)\,\|\tilde{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}$$
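The last step of (5) is an instance of Young's inequality. As a short worked step, written with the shorthand $a = \|x^k - \tilde{x}^k\|_2$ and $c = \|\tilde{x}^k - A^\dagger b\|_2$ (names introduced here only for this note), it reads:

```latex
% For a, c >= 0 and any eps > 0, expanding a nonnegative square gives the cross-term bound.
\[
0 \le \left(\frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,c\right)^2
  = \frac{a^2}{\varepsilon} - 2ac + \varepsilon c^2
\quad\Longrightarrow\quad
2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2 .
\]
% Substituting into a^2 + c^2 + 2ac yields the right-hand side of (5).
\[
a^2 + c^2 + 2ac \le \left(1 + \frac{1}{\varepsilon}\right)a^2 + (1 + \varepsilon)\,c^2 .
\]
```

A small $\varepsilon$ keeps the factor on the contracting term $\|\tilde{x}^k - A^\dagger b\|_2^2$ close to 1, at the price of a large factor $1 + 1/\varepsilon$ on the perturbation term.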

Combining (3), (4), and (5) yields

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \tilde{x}^k\|_2^2] + (1 + \varepsilon)\,\mathbb{E}[\|\tilde{x}^k - A^\dagger b\|_2^2]\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1 + \varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2]\\
&\le \cdots\\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2\\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}$$
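The final inequality in this chain rests on a geometric sum. Spelled out as a worked step (the summation index $l$ is introduced here for illustration):

```latex
% Geometric sum of the accumulated factors, bounded by dropping the -1 in the numerator.
\[
\sum_{l=0}^{k-1} (1 + \varepsilon)^l
  = \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1 + \varepsilon)^k}{\varepsilon}.
\]
% Multiplying by 1 + 1/eps = (1 + eps)/eps gives the coefficient that appears in Theorem 10.
\[
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1} (1 + \varepsilon)^l
  \le \frac{1 + \varepsilon}{\varepsilon}\cdot\frac{(1 + \varepsilon)^k}{\varepsilon}
  = \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}.
\]
```

Factoring $(1 + \varepsilon)^k\rho^k$ out of both terms then gives the stated bound.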

Remark 4

For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\tilde{x}^k - A^\dagger b)^T(x^k - \tilde{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$

Kui Du Wutao Si and Xiaohui Sun

Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020

Kui Du and Xiaohui Sun

A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019

Lawrence H Landweber

An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951

D Leventhal and A S Lewis

Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010

Anna Ma Deanna Needell and Aaditya Ramdas

Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015

Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo

A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019

Thomas Strohmer and Roman Vershynin

A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009

Anastasios Zouzias and Nikolaos M Freris

Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Page 7: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

3 A doubly stochastic block Gauss-Seidel algorithm [2]

Algorithm 1: Doubly stochastic block Gauss-Seidel (DSBGS)

Let $\alpha > 0$. Initialize $x^0 \in \mathbb{R}^n$.
for $k = 1, 2, \ldots$ do
    Pick $(\mathcal{I}, \mathcal{J}) \in \mathcal{P}$ with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \alpha \dfrac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b)$

Landweber [3] ($s = 1$ and $t = 1$):

$$x^k = x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b)$$
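A minimal sketch of Algorithm 1 in dependency-free Python may help fix the indexing. It is an illustration under stated assumptions, not the authors' implementation: the function name `dsbgs`, the list-of-rows matrix format, the starting point $x^0 = 0$, and the choice of $\mathcal{P}$ as every (row block, column block) pair are all assumptions made here.

```python
import random

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Hypothetical DSBGS sketch; A is a list of rows, blocks are lists of indices."""
    rng = random.Random(seed)
    n = len(A[0])
    # Assumption: P consists of every (row block, column block) pair.
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # ||A_{I,J}||_F^2 for each pair; proportional to the sampling probability.
    weights = [sum(A[i][j] ** 2 for i in I for j in J) for I, J in pairs]
    x = [0.0] * n  # x^0 = 0 (an arbitrary starting point is allowed)
    for _ in range(iters):
        p = rng.choices(range(len(pairs)), weights=weights)[0]
        I, J = pairs[p]
        # Rows I of the residual A x - b, computed before x is modified.
        r = [sum(A[i][l] * x[l] for l in range(n)) - b[i] for i in I]
        # Only coordinates in J change: x_J -= alpha * A_{I,J}^T r / ||A_{I,J}||_F^2.
        for j in J:
            x[j] -= alpha * sum(A[i][j] * ri for i, ri in zip(I, r)) / weights[p]
    return x

A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [0.0, -5.0, -1.0]  # consistent system: b = A [1, -2]^T
# s = m, t = 1, alpha = 1 reproduces randomized Kaczmarz.
x_rk = dsbgs(A, b, [[0], [1], [2]], [[0, 1]], alpha=1.0, iters=2000)
# s = m, t = n gives the doubly stochastic Gauss-Seidel variant.
x_ds = dsbgs(A, b, [[0], [1], [2]], [[0], [1]], alpha=0.5, iters=5000)
```

Both runs should end close to the solution $(1, -2)$. The step size 0.5 in the second call is chosen to respect a bound of the form $\alpha < 2/t$ with $t = 2$.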

Randomized Kaczmarz (RK) [7] ($s = m$, $t = 1$, $\alpha = 1$)

At step $k$, RK projects $x^{k-1}$ onto the hyperplane $\{x \mid A_{i,:} x = b_i\}$:

$$x^k = x^{k-1} - \frac{A_{i,:} x^{k-1} - b_i}{\|A_{i,:}\|_2^2} (A_{i,:})^\top,$$

where $A_{i,:}$ is the $i$th row of $A$ and $b_i$ is the $i$th component of $b$.
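The projection property of a single RK step can be checked directly. The sketch below uses a hypothetical helper `kaczmarz_step` and toy data chosen here for illustration; after the step, the new point satisfies the selected equation.

```python
def kaczmarz_step(x, a_i, b_i):
    """Project x onto the hyperplane {y : a_i . y = b_i} (one RK step, alpha = 1)."""
    residual = sum(a * xi for a, xi in zip(a_i, x)) - b_i
    norm_sq = sum(a * a for a in a_i)
    return [xi - residual / norm_sq * a for xi, a in zip(x, a_i)]

# Toy data (an assumption for illustration): row (3, 4), right-hand side 5.
x_new = kaczmarz_step([0.0, 0.0], [3.0, 4.0], 5.0)
row_residual = 3.0 * x_new[0] + 4.0 * x_new[1] - 5.0  # zero up to rounding
```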

Randomized Gauss-Seidel [4][5] ($s = 1$, $t = n$, $\alpha = 1$)

$$x^k = x^{k-1} - \frac{(A_{:,j})^\top (A x^{k-1} - b)}{\|A_{:,j}\|_2^2} I_{:,j},$$

where $A_{:,j}$ is the $j$th column of $A$ and $I_{:,j}$ is the $j$th column of the $n \times n$ identity matrix $I$.

Doubly stochastic Gauss-Seidel [6] ($s = m$, $t = n$)

$$x^k = x^{k-1} - \alpha \frac{A_{i,j} (A_{i,:} x^{k-1} - b_i)}{|A_{i,j}|^2} I_{:,j},$$

where $A_{i,j}$ is the $(i, j)$ entry of $A$.

3.1 Convergence of the norms of the expectations

Theorem 4

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2,$$

where $x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b$, i.e., the projection of $x^0$ onto the solution set $\{x \in \mathbb{R}^n \mid Ax = b\}$.

Proof of Theorem 4

Note that the conditional expectation given $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\, \mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right] (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \left( \sum_{(\mathcal{I},\mathcal{J}) \in \mathcal{P}} \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \cdot \frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2} \right) (A x^{k-1} - b) \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b).
\end{aligned}$$

Then the conditional expectation $\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]$ is given by

$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]
&= \mathbb{E}[x^k \mid x^{k-1}] - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b) - x_\star^0 \\
&= x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star^0) - x_\star^0 \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - x_\star^0).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - x_\star^0]
&= \mathbb{E}\big[\mathbb{E}[x^k - x_\star^0 \mid x^{k-1}]\big]
= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - x_\star^0] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - x_\star^0).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[x^k - x_\star^0]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|x^0 - x_\star^0\|_2.$$

Here the inequality follows from the fact that

$$x^0 - x_\star^0 = A^\dagger A x^0 - A^\dagger b \in \operatorname{range}(A^\top)$$

and Lemma 3.

Remark 1

If $x^0 \in \operatorname{range}(A^\top)$, then $x_\star^0 = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$\max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| < 1, \quad \text{i.e.,} \quad 0 < \alpha < \frac{2 \|A\|_F^2}{\sigma_1^2(A)}.$$
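The first step in the proof of Theorem 4 says that averaging the DSBGS direction over the sampling distribution gives the Landweber direction $A^\top r / \|A\|_F^2$, where $r$ plays the role of $A x^{k-1} - b$. The sketch below checks this identity numerically on a small example. The helper names, the test matrix, and the assumption that $\mathcal{P}$ contains every (row block, column block) pair are choices made here for illustration.

```python
def expected_direction(A, r, row_blocks, col_blocks):
    """Average of I_J (A_{I,J})^T (I_I)^T r / ||A_{I,J}||_F^2 over the sampling law."""
    n = len(A[0])
    fro2 = sum(v * v for row in A for v in row)  # ||A||_F^2
    avg = [0.0] * n
    for I in row_blocks:
        for J in col_blocks:
            w = sum(A[i][j] ** 2 for i in I for j in J)  # ||A_{I,J}||_F^2
            if w == 0.0:
                continue  # a zero block is sampled with probability zero
            for j in J:
                step_j = sum(A[i][j] * r[i] for i in I) / w
                avg[j] += (w / fro2) * step_j  # probability times direction
    return avg

def landweber_direction(A, r):
    """A^T r / ||A||_F^2."""
    fro2 = sum(v * v for row in A for v in row)
    return [sum(A[i][j] * r[i] for i in range(len(A))) / fro2
            for j in range(len(A[0]))]

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
r = [1.0, -2.0, 0.5, 3.0]  # stands in for A x^{k-1} - b
lhs = expected_direction(A, r, [[0, 1], [2, 3]], [[0], [1, 2]])
rhs = landweber_direction(A, r)
```

The two vectors `lhs` and `rhs` agree up to rounding for any choice of partitions, which is why the expected iterate follows the Landweber recursion.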

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2,$$

where $x_\star$ is any solution of $A^\top A x = A^\top b$.

Proof of Theorem 5

Note that the conditional expectation given $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]
&= A \big( \mathbb{E}[x^k \mid x^{k-1}] - x_\star \big) \\
&= A \left( x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - b) - x_\star \right) \\
&= A \left( x^{k-1} - \alpha \frac{A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star) - x_\star \right) \quad (\text{by } A^\top b = A^\top A x_\star) \\
&= A x^{k-1} - A x_\star - \alpha \frac{A A^\top}{\|A\|_F^2} (A x^{k-1} - A x_\star) \\
&= \left( I - \alpha \frac{A A^\top}{\|A\|_F^2} \right) (A x^{k-1} - A x_\star).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[A x^k - A x_\star]
&= \mathbb{E}\big[\mathbb{E}[A x^k - A x_\star \mid x^{k-1}]\big] \\
&= \left( I - \alpha \frac{A A^\top}{\|A\|_F^2} \right) \mathbb{E}[A x^{k-1} - A x_\star] \\
&= \left( I - \alpha \frac{A A^\top}{\|A\|_F^2} \right)^k (A x^0 - A x_\star).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[A x^k - A x_\star]\|_2 \le \left( \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right| \right)^k \|A x^0 - A x_\star\|_2.$$

Here the inequality follows from the fact that

$$A x^0 - A x_\star \in \operatorname{range}(A)$$

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. Assume $0 < \alpha < 2/t$. In exact arithmetic, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.$$

Proof of Theorem 6

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} A (x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \frac{A^\top I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^\top I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^\top \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^\top \frac{A^\top I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (x^{k-1} - A^\dagger b).
\end{aligned}$$

The last inequality follows from $(I_{:,\mathcal{J}})^\top I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2) (x^{k-1} - A^\dagger b)^\top \frac{A^\top A}{\|A\|_F^2} (x^{k-1} - A^\dagger b) \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 - \frac{(2\alpha - t\alpha^2) \sigma_n^2(A)}{\|A\|_F^2} \right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}$$
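The factor $t$ in $2\alpha - t\alpha^2$ comes from averaging the two block terms over the sampling distribution. A short worked computation, under the assumption (made here for illustration) that $\mathcal{P} = \{(\mathcal{I}_i, \mathcal{J}_j) : i \in [s],\ j \in [t]\}$ and that every block $A_{\mathcal{I}_i,\mathcal{J}_j}$ is nonzero, goes as follows. For the first-order term,

$$\mathbb{E}\left[ \frac{I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right]
= \frac{1}{\|A\|_F^2} \sum_{i=1}^{s} \sum_{j=1}^{t} I_{:,\mathcal{J}_j} (A_{\mathcal{I}_i,\mathcal{J}_j})^\top (I_{:,\mathcal{I}_i})^\top A
= \frac{A^\top A}{\|A\|_F^2},$$

because the blocks $I_{:,\mathcal{J}_j} (A_{\mathcal{I}_i,\mathcal{J}_j})^\top (I_{:,\mathcal{I}_i})^\top$ sum to $A^\top$. For the second-order term, the summand does not depend on $j$, so the sum over the $t$ column blocks contributes a factor $t$:

$$\mathbb{E}\left[ \frac{A^\top I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right]
= \frac{1}{\|A\|_F^2} \sum_{j=1}^{t} \sum_{i=1}^{s} A^\top I_{:,\mathcal{I}_i} (I_{:,\mathcal{I}_i})^\top A
= \frac{t\, A^\top A}{\|A\|_F^2},$$

using $\sum_{i=1}^{s} I_{:,\mathcal{I}_i} (I_{:,\mathcal{I}_i})^\top = I$. Combining the two gives the coefficient $-2\alpha + t\alpha^2$, which is negative exactly when $0 < \alpha < 2/t$.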

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^\top)$, we can show

$$x^k - x_\star^0 \in \operatorname{range}(A^\top)$$

by induction, where $x_\star^0 = (I - A^\dagger A) x^0 + A^\dagger b$, i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.

Then for rank-deficient consistent linear systems, by the same approach, we can prove the convergence bound

$$\mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left( 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|x^0 - x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax = b$ with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then

$$\mathbb{E}[\|A x^k - b\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$

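Proof of Theorem 7

Note that

$$\begin{aligned}
\|A x^k - b\|_2^2
&= \left\| A x^{k-1} - \alpha \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) - b \right\|_2^2 \\
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (A x^{k-1} - b).
\end{aligned}$$

If $t = n$, then it follows from

$$(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$

(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that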

$$\begin{aligned}
\|A x^k - b\|_2^2
&= \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A A^\top}{\|A\|_F^2} (A x^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$

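If $t < n$, then it follows from

$$(I_{:,\mathcal{J}})^\top A^\top A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^\top A_{:,\mathcal{J}} \preceq \rho I$$

(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
\|A x^k - b\|_2^2
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} (A x^{k-1} - b) \\
&\le \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^\top (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \\
&\quad + \alpha^2 (A x^{k-1} - b)^\top \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^\top}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} (A x^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}$$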

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2 - 2\alpha (A x^{k-1} - b)^\top \frac{A A^\top}{\|A\|_F^2} (A x^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|A x^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $A x^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|A x^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|A x^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}[\|A x^{k-1} - b\|_2^2] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^\top b = (A_{:,\mathcal{J}})^\top A x_\star,$$

where $x_\star$ is any solution of $A^\top A x = A^\top b$. For inconsistent linear systems, we can prove that DSBGS with $s = 1$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A) / \|A\|_F^2$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A) / (t\rho)$, then

$$\mathbb{E}[\|A x^k - A x_\star\|_2^2] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|A x^0 - b\|_2^2,$$

where

$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
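The two coupled updates of Algorithm 2 can be sketched in dependency-free Python as follows. This is an illustration under stated assumptions rather than the reference implementation: the function name `rebk`, the list-of-rows matrix format, and the specific starting points $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions) are choices made here.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Hypothetical REBK sketch; A is a list of rows, blocks are lists of indices."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # ||A_{:,J}||_F^2 and ||A_{I,:}||_F^2, used as sampling weights and scalings.
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = list(b)    # z^0 = b lies in b + range(A)
    x = [0.0] * n  # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J)^T z / ||A_J||_F^2.
        jj = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jj]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # (A_J)^T z
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * uj for j, uj in zip(J, u)) / col_w[jj]
        # Row step: x <- x - alpha * (A_I)^T (A_I x - b_I + z_I) / ||A_I||_F^2.
        ii = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ii]
        r = [sum(A[i][l] * x[l] for l in range(n)) - b[i] + z[i] for i in I]
        for l in range(n):
            x[l] -= alpha * sum(A[i][l] * ri for i, ri in zip(I, r)) / row_w[ii]
    return x, z

# Inconsistent toy system; its least-squares solution is (1/3, 1/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 0.0]
x_rek, z_rek = rebk(A, b, [[0], [1], [2]], [[0], [1]], alpha=1.0, iters=2000)  # REK
x_blk, z_blk = rebk(A, b, [[0, 1], [2]], [[0, 1]], alpha=1.0, iters=2000)      # block variant
```

In both runs `x` should approach $A^\dagger b = (1/3, 1/3)$, while `z` should approach the component of $b$ orthogonal to $\operatorname{range}(A)$, here $(2/3, 2/3, -2/3)$.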

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then by the law of total expectation we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$. It holds

$$\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right),$$

where

$$\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|.$$

Proof. By $A^\top b = A^\top A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \alpha \frac{A^\top (A x^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^\top A x^{k-1} - A^\top b}{\|A\|_F^2} - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \left( I - \alpha \frac{A A^\top}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \frac{A^\top}{\|A\|_F^2} z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \frac{A^\top}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\| \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^\top A}{\|A\|_F^2} \right)^k \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^\top), \quad A^\top z^0 \in \operatorname{range}(A^\top),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

$$b_\perp = (I - A A^\dagger) b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^\top z = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \operatorname{range}(A)$. It holds

$$\mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^\top b_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp). \quad (1)$$

By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \operatorname{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha \frac{\|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}$$

Taking the conditional expectation given the first $k - 1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|A^\top (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\, \mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^\top)$. For any $\varepsilon > 0$, it holds

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

Proof. Let

$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

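It follows from

$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}). \quad (2)
\end{aligned}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widehat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \widehat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad (\text{by (2)})
\end{aligned}$$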

that

$$\begin{aligned}
\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}[\|b - A A^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}). \quad (3)
\end{aligned}$$

By $x^0 \in \operatorname{range}(A^\top)$ and $A^\dagger b \in \operatorname{range}(A^\top)$, we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^\top).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^\top)$ by induction. By

$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^\top (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1})
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 3}),
\end{aligned}$$

which yields

$$\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \quad (4)$$

Note that for any $\varepsilon > 0$, we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big( \|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2 \big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2. \quad (5)
\end{aligned}$$
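The last step of (5) uses a weighted form of the arithmetic-geometric mean inequality, which the slide leaves implicit. Writing $a = \|x^k - \widehat{x}^k\|_2$ and $c = \|\widehat{x}^k - A^\dagger b\|_2$ (shorthand introduced here only for this computation), the argument can be spelled out as:

$$0 \le \left( \frac{a}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, c \right)^2 = \frac{a^2}{\varepsilon} - 2ac + \varepsilon c^2
\quad \Longrightarrow \quad
2ac \le \frac{a^2}{\varepsilon} + \varepsilon c^2,$$

and therefore

$$a^2 + c^2 + 2ac \le \left( 1 + \frac{1}{\varepsilon} \right) a^2 + (1 + \varepsilon) c^2.$$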

Combining (3), (4), and (5) yields

$$\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] + (1 + \varepsilon) \mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) (1 + 1 + \varepsilon) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
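The final inequality in this chain rests on bounding a geometric sum. As a short worked step (not spelled out on the slide):

$$\left( 1 + \frac{1}{\varepsilon} \right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}
= (1 + \varepsilon)^k \cdot \frac{1 + \varepsilon}{\varepsilon^2}.$$

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the parentheses of the bound in Theorem 10.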

Remark 4

For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k - A^\dagger b)^\top (x^k - \widehat{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.

Page 8: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Randomized Kaczmarz (RK) [7] (s = m t = 1 α = 1)

At step k RK projects xkminus1 onto the hyperplane x | Aix = bi

xk = xkminus1 minus Aixkminus1 minus bi

983042Ai98304222(Ai)

T

where Ai is the ith row of A and bi is the ith component of b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 8 43

Randomized Gauss-Seidel [4][5] (s = 1 t = n α = 1)

xk = xkminus1 minus (Aj)T(Axkminus1 minus b)

983042Aj98304222Ij

where Aj is the jth column of A and Ij is the jth column of thentimes n identity matrix I

Doubly stochastic Gauss-Seidel [6] (s = m t = n)

xk = xkminus1 minus αAij(Aix

kminus1 minus bi)

|Aij |2Ij

where Aij is the (i j) entry of A

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 9 43

31 Convergence of the norms of the expectations

Theorem 4

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

wherex0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the solution set

x isin Rn | Ax = b

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 10 43

Proof of Theorem 4

Note that the conditioned expectation on xkminus1

E[xk |xkminus1]

= xkminus1 minus αE983063IJ (AIJ )T(II)T

983042AIJ 9830422F

983064(Axkminus1 minus b)

= xkminus1 minus α

983091

983107983131

(IJ )isinP

IJ (AIJ )T(II)T

983042AIJ 9830422F983042AIJ 9830422F983042A9830422F

983092

983108 (Axkminus1 minus b)

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)

Then the conditioned expectation E[xk minus x0983183 |xkminus1] is given by

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 11 43

E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx0

983183)minus x0983183

=

983061Iminus αATA

983042A9830422F

983062(xkminus1 minus x0

983183)

Taking expectation gives

E[xk minus x0983183] = E[E[xk minus x0

983183 |xkminus1]] =

983061Iminus αATA

983042A9830422F

983062E[xkminus1 minus x0

983183]

=

983061Iminus αATA

983042A9830422F

983062k

(x0 minus x0983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43

Applying the norms to both sides we obtain

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

Here the inequality follows from the fact that

x0 minus x0983183 = AdaggerAx

0 minusAdaggerb isin range(AT)

and Lemma 3

Remark 1

If x0 isin range(AT) then x0983183 = Adaggerb

To ensure convergence of the expected iterate it suffices to have

max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43

Theorem 5

Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

where x983183 is any solution of

ATAx = ATb

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax=b$$

with arbitrary $x^0\in\mathbb{R}^n$. Assume

$$0<\alpha<2/t.$$

In exact arithmetic, it holds that

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0-A^\dagger b\|_2^2.$$

Proof of Theorem 6

Since the system is consistent, $b=AA^\dagger b$, and therefore

$$\begin{aligned}
&\|x^k-A^\dagger b\|_2^2\\
&=\left\|x^{k-1}-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)-A^\dagger b\right\|_2^2\\
&=\left\|x^{k-1}-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1}-A^\dagger b)-A^\dagger b\right\|_2^2\\
&=\left\|x^{k-1}-A^\dagger b-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\right\|_2^2\\
&=\|x^{k-1}-A^\dagger b\|_2^2-2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad+\alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1}-A^\dagger b)\\
&\le\|x^{k-1}-A^\dagger b\|_2^2-2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad+\alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b).
\end{aligned}$$

The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}=I$ and Lemma 1. Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2\mid x^{k-1}]
&\le\|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-t\alpha^2)(x^{k-1}-A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1}-A^\dagger b\|_2^2\quad(\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2]&=\mathbb{E}\left[\mathbb{E}[\|x^k-A^\dagger b\|_2^2\mid x^{k-1}]\right]\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2]\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0-A^\dagger b\|_2^2.
\end{aligned}$$
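The update in the first line of the proof of Theorem 6 can be tried numerically. The following is a minimal sketch, not the authors' implementation. It assumes blocks of size one ($s=m$, $t=n$), assumes the entry $(i,j)$ is sampled with probability $|A_{ij}|^2/\|A\|_F^2$, and uses a hypothetical helper name `dsbgs_entrywise` and a made-up $3\times 2$ test matrix. It uses only the Python standard library.

```python
import random

def dsbgs_entrywise(A, b, alpha, iters, seed=0):
    """Sketch of DSBGS with 1x1 blocks (s = m, t = n), started from x^0 = 0.

    Assumption: entry (i, j) is sampled with probability A[i][j]**2 / ||A||_F^2.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    # All index pairs and their sampling weights |A_ij|^2.
    pairs = [(i, j) for i in range(m) for j in range(n)]
    weights = [A[i][j] ** 2 for i, j in pairs]
    x = [0.0] * n
    for _ in range(iters):
        i, j = rng.choices(pairs, weights=weights)[0]
        # Residual of the sampled row: A_i x - b_i.
        r_i = sum(A[i][c] * x[c] for c in range(n)) - b[i]
        # x_j <- x_j - alpha * A_ij * r_i / |A_ij|^2 = x_j - alpha * r_i / A_ij.
        x[j] -= alpha * r_i / A[i][j]
    return x

# Hypothetical full column rank, consistent test problem (no zero entries).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
x_true = [1.0, -2.0]
b = [sum(a * v for a, v in zip(row, x_true)) for row in A]

# Theorem 6 asks for 0 < alpha < 2/t; here t = n = 2, so alpha = 0.5 qualifies.
x = dsbgs_entrywise(A, b, alpha=0.5, iters=5000)
err = max(abs(u - v) for u, v in zip(x, x_true))
```

Here `err` is the largest componentwise deviation from the known solution. For this small example the contraction factor in Theorem 6 is $1-(2\alpha-t\alpha^2)\sigma_n^2(A)/\|A\|_F^2=1-0.5\cdot 2/17$, so a few thousand iterations are more than enough.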

Remark 2

If $t=1$ and $x^0\in\mathrm{range}(A^T)$, we can show

$$x^k-x_\star^0\in\mathrm{range}(A^T)$$

by induction, where

$$x_\star^0=(I-A^\dagger A)x^0+A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x\in\mathbb{R}^n\mid Ax=b\}.$$

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}[\|x^k-x_\star^0\|_2^2]\le\left(1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|x^0-x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax=b$$

with arbitrary $x^0\in\mathbb{R}^n$. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k-b\|_2^2]\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.$$

If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k-b\|_2^2]\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2,$$

where

$$\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

Proof of Theorem 7

Note that

$$\begin{aligned}
&\|Ax^k-b\|_2^2\\
&=\left\|Ax^{k-1}-\alpha\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)-b\right\|_2^2\\
&=\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b).
\end{aligned}$$

If $t=n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}=\|A_{:,\mathcal{J}}\|_F^2$$

(since $AI_{:,\mathcal{J}}=A_{:,\mathcal{J}}$ is a column vector) that

$$\begin{aligned}
&\|Ax^k-b\|_2^2\\
&=\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\quad(\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]
&\le(1+\alpha^2)\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1}-b\in\mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2]&=\mathbb{E}\left[\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]\right]\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1}-b\|_2^2]\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}$$

If $t<n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}=(A_{:,\mathcal{J}})^TA_{:,\mathcal{J}}\preceq\rho I$$

(since $\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
&\|Ax^k-b\|_2^2\\
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{A_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\quad(\text{by Lemma 1}).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]
&\le\left(1+\frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1}-b\in\mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2]&=\mathbb{E}\left[\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]\right]\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1}-b\|_2^2]\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^Tb=(A_{:,\mathcal{J}})^TAx_\star,$$

where $x_\star$ is any solution of $A^TAx=A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2]\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.$$

If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2]\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2,$$

where

$$\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1,\mathcal{I}_2,\dots,\mathcal{I}_s\}$ and $\{\mathcal{J}_1,\mathcal{J}_2,\dots,\mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha>0$. Initialize $z^0\in b+\mathrm{range}(A)$ and $x^0\in\mathrm{range}(A^T)$.
for $k=1,2,\dots$ do
    Pick $j\in[t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$.
    Set $z^k=z^{k-1}-\dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$.
    Pick $i\in[s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$.
    Set $x^k=x^{k-1}-\dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i})$.

By choosing $s=m$, $t=n$, and $\alpha=1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
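The two steps of Algorithm 2 can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name `rebk`, the choices $z^0=b$ and $x^0=0$ (which satisfy the initialization requirements), and the small inconsistent test system are all hypothetical. Blocks are given as lists of 0-based indices.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of REBK with z^0 = b and x^0 = 0 (illustrative choices)."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    # Squared Frobenius norms of the row blocks A_{I,:} and column blocks A_{:,J}.
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z, x = list(b), [0.0] * n
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J^T z) / ||A_J||_F^2.
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        g = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # A_J^T z
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * gj for j, gj in zip(J, g)) / col_w[jb]
        # Row step (uses the updated z): x <- x - alpha * A_I^T (A_I x - b_I + z_I) / ||A_I||_F^2.
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha * sum(A[i][j] * ri for i, ri in zip(I, r)) / row_w[ib]
    return x, z

# Hypothetical inconsistent system; by hand, A^+ b = (0, 1) and (I - A A^+) b = (1, 1, -1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]

# s = m, t = n, alpha = 1: the REK special case.
x_rek, z_rek = rebk(A, b, [[0], [1], [2]], [[0], [1]], alpha=1.0, iters=3000)
# A genuinely blocked variant: two row blocks, one column block.
x_blk, z_blk = rebk(A, b, [[0, 1], [2]], [[0, 1]], alpha=1.0, iters=3000)
```

In both runs `x` should approach the least-squares solution $A^\dagger b=(0,1)$ and `z` should approach $(1,1,-1)$, the component of $b$ orthogonal to $\mathrm{range}(A)$. Only products with the selected blocks are formed, so no pseudoinverse of a block is needed.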

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot]=\mathbb{E}[\,\cdot\mid j_1,i_1,j_2,i_2,\dots,j_{k-1},i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot]=\mathbb{E}[\,\cdot\mid j_1,i_1,j_2,i_2,\dots,j_{k-1},i_{k-1},j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot]=\mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[\cdot]\right].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax=b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0\in b+\mathrm{range}(A)\quad\text{and}\quad x^0\in\mathrm{range}(A^T).$$

It holds that

$$\|\mathbb{E}[x^k-A^\dagger b]\|_2\le\delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$

where

$$\delta=\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$

Proof. By $A^Tb=A^TAA^\dagger b$ and

$$x^k-A^\dagger b=x^{k-1}-A^\dagger b-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k-A^\dagger b]
&=\mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[x^k-A^\dagger b]\right]\\
&=x^{k-1}-A^\dagger b-\mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1}-b+z^k)}{\|A\|_F^2}\right]\\
&=x^{k-1}-A^\dagger b-\alpha\frac{A^TAx^{k-1}-A^Tb}{\|A\|_F^2}-\alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\frac{A^T}{\|A\|_F^2}\left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k-A^\dagger b]
&=\mathbb{E}\left[\mathbb{E}_{k-1}[x^k-A^\dagger b]\right]\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b]-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}]\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b]-\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2}-A^\dagger b]-2\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&=\cdots\\
&=\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\|\mathbb{E}[x^k-A^\dagger b]\|_2
&=\left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le\left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)\right\|_2+\left\|\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le\delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0-A^\dagger b\in\mathrm{range}(A^T),\qquad A^Tz^0\in\mathrm{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k-A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k-A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho=1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0\in b+\mathrm{range}(A)$ converges to

$$b_\perp=(I-AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z\mid A^Tz=0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax=b,$$

let $z^k$ be the vector generated in REBK with

$$z^0\in b+\mathrm{range}(A).$$

It holds that

$$\mathbb{E}[\|z^k-b_\perp\|_2^2]\le\rho^k\|z^0-b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp=0$, we have

$$z^k-b_\perp=z^{k-1}-b_\perp-\frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp).\tag{1}$$

By $z^0-b_\perp=AA^\dagger z^0\in\mathrm{range}(A)$, we can show by induction that

$$z^k-b_\perp\in\mathrm{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k-b_\perp\|_2^2
&=\|z^{k-1}-b_\perp\|_2^2-2\alpha\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1}-b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\\
&\le\|z^{k-1}-b_\perp\|_2^2-(2\alpha-\alpha^2)\frac{\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\quad(\text{by Lemma 1}).
\end{aligned}$$

Taking the conditioned expectation on the first $k-1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k-b_\perp\|_2^2]
&\le\|z^{k-1}-b_\perp\|_2^2-(2\alpha-\alpha^2)\frac{\|A^T(z^{k-1}-b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le\rho\|z^{k-1}-b_\perp\|_2^2\quad(\text{by Lemma 2}).
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|z^k-b_\perp\|_2^2]&=\mathbb{E}\left[\mathbb{E}_{k-1}[\|z^k-b_\perp\|_2^2]\right]\\
&\le\rho\,\mathbb{E}[\|z^{k-1}-b_\perp\|_2^2]\\
&\le\rho^k\|z^0-b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.
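The per-step inequality in the proof of Lemma 9 says that, for $0<\alpha<2$, a column step never moves $z$ away from $b_\perp$. The sketch below checks this on a small example using only the standard library. The helper names `z_step` and `sq_dist`, the test system, and the hand-computed `b_perp` are assumptions made for illustration.

```python
import random

def z_step(A, z, J, alpha):
    """One REBK column step: z - alpha * A_J (A_J^T z) / ||A_J||_F^2."""
    m = len(A)
    w = sum(A[i][j] ** 2 for i in range(m) for j in J)      # ||A_J||_F^2
    g = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # A_J^T z
    return [z[i] - alpha * sum(A[i][j] * gj for j, gj in zip(J, g)) / w
            for i in range(m)]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - c) ** 2 for a, c in zip(u, v))

# Hypothetical inconsistent system; b_perp = (I - A A^+) b was worked out by hand.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
b_perp = [1.0, 1.0, -1.0]

col_blocks = [[0], [1]]
col_w = [2.0, 2.0]        # squared norms of the two columns of A
alpha = 1.0
rng = random.Random(1)

z = list(b)               # z^0 = b lies in b + range(A)
history = [sq_dist(z, b_perp)]
for _ in range(200):
    J = rng.choices(col_blocks, weights=col_w)[0]
    z = z_step(A, z, J, alpha)
    history.append(sq_dist(z, b_perp))
```

The list `history` records $\|z^k-b_\perp\|_2^2$ along one sample path. It should be non-increasing up to rounding and should end close to zero, which is the pathwise counterpart of the expected bound $\rho^k\|z^0-b_\perp\|_2^2$.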

Theorem 10

For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0\in b+\mathrm{range}(A)\quad\text{and}\quad x^0\in\mathrm{range}(A^T).$$

For any $\varepsilon>0$, it holds that

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le(1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).$$

Proof. Let

$$\widehat{x}^k=x^{k-1}-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax=AA^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k-\widehat{x}^k=\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k-\widehat{x}^k\|_2^2
&=\frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})\\
&\le\frac{\alpha^2\|b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\quad(\text{by Lemma 1}).
\end{aligned}\tag{2}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k-\widehat{x}^k\|_2^2]
&=\mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}[\|x^k-\widehat{x}^k\|_2^2]\right]\\
&\le\mathbb{E}_{k-1}\left[\frac{\alpha^2\|b-AA^\dagger b-z^k\|_2^2}{\|A\|_F^2}\right]\quad(\text{by (2)})
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}[\|x^k-\widehat{x}^k\|_2^2]
&\le\frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b-AA^\dagger b-z^k\|_2^2]\\
&\le\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2\quad(\text{by Lemma 9}).
\end{aligned}\tag{3}$$

By $x^0\in\mathrm{range}(A^T)$ and $A^\dagger b\in\mathrm{range}(A^T)$, we have

$$x^0-A^\dagger b\in\mathrm{range}(A^T).$$

Then we can show that $x^k-A^\dagger b\in\mathrm{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\widehat{x}^k-A^\dagger b\|_2^2
&=\|x^{k-1}-A^\dagger b\|_2^2-2\alpha\frac{\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1}-A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\\
&\le\|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-\alpha^2)\frac{\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\quad(\text{by Lemma 1}),
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k-A^\dagger b\|_2^2]
&\le\|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-\alpha^2)\frac{\|A(x^{k-1}-A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le\rho\|x^{k-1}-A^\dagger b\|_2^2\quad(\text{by Lemma 3}),
\end{aligned}$$

which yields

$$\mathbb{E}[\|\widehat{x}^k-A^\dagger b\|_2^2]\le\rho\,\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2].\tag{4}$$

Note that for any $\varepsilon>0$, we have

$$\begin{aligned}
\|x^k-A^\dagger b\|_2^2
&=\|x^k-\widehat{x}^k+\widehat{x}^k-A^\dagger b\|_2^2\\
&\le\left(\|x^k-\widehat{x}^k\|_2+\|\widehat{x}^k-A^\dagger b\|_2\right)^2\\
&\le\|x^k-\widehat{x}^k\|_2^2+\|\widehat{x}^k-A^\dagger b\|_2^2+2\|x^k-\widehat{x}^k\|_2\|\widehat{x}^k-A^\dagger b\|_2\\
&\le\left(1+\frac{1}{\varepsilon}\right)\|x^k-\widehat{x}^k\|_2^2+(1+\varepsilon)\|\widehat{x}^k-A^\dagger b\|_2^2.
\end{aligned}\tag{5}$$

Combining (3), (4), and (5) yields

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2]
&\le\left(1+\frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k-\widehat{x}^k\|_2^2]+(1+\varepsilon)\mathbb{E}[\|\widehat{x}^k-A^\dagger b\|_2^2]\\
&\le\left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)\rho\,\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2]\\
&\le\left(1+\frac{1}{\varepsilon}\right)\bigl(1+(1+\varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2}-A^\dagger b\|_2^2]\\
&\le\cdots\\
&\le\left(1+\frac{1}{\varepsilon}\right)\bigl(1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^k\rho^k\|x^0-A^\dagger b\|_2^2\\
&\le(1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
\end{aligned}$$
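Two elementary facts are used without comment in the proof of Theorem 10: the weighted inequality behind the last line of (5), and the geometric sum behind the final bound. The following worked steps are a sketch of how they can be filled in; the scalars $u$ and $v$ are shorthand introduced only here.

```latex
% Last line of (5): with u = \|x^k-\widehat{x}^k\|_2 and v = \|\widehat{x}^k-A^\dagger b\|_2,
% expanding (u/\sqrt{\varepsilon}-\sqrt{\varepsilon}\,v)^2 \ge 0 gives
2uv\le\frac{u^2}{\varepsilon}+\varepsilon v^2
\quad\Longrightarrow\quad
u^2+v^2+2uv\le\left(1+\frac{1}{\varepsilon}\right)u^2+(1+\varepsilon)v^2.

% Final bound: the geometric sum satisfies
\sum_{l=0}^{k-1}(1+\varepsilon)^l=\frac{(1+\varepsilon)^k-1}{\varepsilon}\le\frac{(1+\varepsilon)^k}{\varepsilon},
\qquad\text{so}\qquad
\left(1+\frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
\le\frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k}{\varepsilon}
=\frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
```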

Remark 4

For REBK with $s=m$, $t=n$, and $\alpha=1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k-A^\dagger b)^T(x^k-\widehat{x}^k)=0,$$

the inequality (5) becomes the equality

$$\|x^k-A^\dagger b\|_2^2=\|x^k-\widehat{x}^k\|_2^2+\|\widehat{x}^k-A^\dagger b\|_2^2,$$

which yields the following convergence bound for REK:

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le\rho^k\left(\frac{k\|z^0-b_\perp\|_2^2}{\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).$$
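The factor $k$ in the REK bound comes from unrolling a one-step recursion. As a sketch, write $a_k=\mathbb{E}[\|x^k-A^\dagger b\|_2^2]$ and $c=\|z^0-b_\perp\|_2^2/\|A\|_F^2$ (both are shorthand introduced only here), and use (3) and (4) with $\alpha=1$.

```latex
% Orthogonality turns (5) into an equality, so (3) and (4) with \alpha = 1 give
a_k\le c\rho^k+\rho\,a_{k-1}.
% Unrolling the recursion:
a_k\le c\rho^k+\rho\left(c\rho^{k-1}+\rho\,a_{k-2}\right)
   =2c\rho^k+\rho^2a_{k-2}
   \le\cdots
   \le kc\rho^k+\rho^ka_0
   =\rho^k\left(\frac{k\|z^0-b_\perp\|_2^2}{\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
```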

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


To ensure convergence of the expected iterate it suffices to have

max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43

Theorem 5

Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

where x983183 is any solution of

ATAx = ATb

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43

Proof of Theorem 6

983042xk minusAdaggerb98304222

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062A(xkminus1 minusAdaggerb)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minusAdaggerbminus α

983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

9830569830569830569830562

2

= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)

TA

983042AIJ 9830424F

983062(xkminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATII(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43

The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives

E[983042xk minusAdaggerb98304222 |xkminus1]

le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)

Taking expectation again gives

E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062E[983042xkminus1 minusAdaggerb98304222]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43

Remark 2

If t = 1 and x0 isin range(AT) we can show

xk minus x0983183 isin range(AT)

by induction where

x0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the set

x isin Rn | Ax = b

Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound

E[983042xk minus x098318398304222] le

9830611minus (2αminus α2)σ2

r (A)

983042A9830422F

983062k

983042x0 minus x098318398304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43

Theorem 7

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)

Ax = b

with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then

E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minus b98304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43

Proof of Theorem 7

Note that

983042Axk minus b98304222

=

983056983056983056983056Axkminus1 minus α

983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minus b

9830569830569830569830562

2

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

If t = n then it follows from

(IJ )TATAIJ = 983042AJ 9830422F

(since AIJ = AJ is a column vector) that

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43

983042Axk minus b98304222

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611 + α2 minus 2ασ2

r(A)

983042A9830422F

983062983042Axkminus1 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1,\mathcal{I}_2,\dots,\mathcal{I}_s\}$ and $\{\mathcal{J}_1,\mathcal{J}_2,\dots,\mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha>0$. Initialize $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$.
for $k = 1, 2, \dots$ do
    Pick $j\in[t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\,A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i\in[s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$

By choosing $s=m$, $t=n$ and $\alpha=1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
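The two updates of Algorithm 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `rebk`, its arguments, the fixed iteration count, and the choices $z^0=b$ and $x^0=0$ (which satisfy the initialization conditions) are assumptions made for the example, and every block is assumed to have nonzero Frobenius norm.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of REBK (Algorithm 2). Names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Squared Frobenius norms of the column blocks A[:, J_j] and row blocks A[I_i, :].
    col_norms = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_norms = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    z = b.astype(float).copy()   # z^0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])     # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: drives z towards b_perp = (I - A A^+) b.
        j = rng.choice(len(col_blocks), p=col_norms / fro2)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_norms[j]) * (AJ @ (AJ.T @ z))
        # Row-block step on the corrected right-hand side b - z^k.
        i = rng.choice(len(row_blocks), p=row_norms / fro2)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_norms[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z

# Small inconsistent example (rows 4 and 5 of A coincide, but b differs there).
A = np.vstack([2.0 * np.eye(4) + 0.5, np.ones((2, 4))])
b = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 4.0])
x, z = rebk(A, b, row_blocks=[[0, 1, 2], [3, 4, 5]], col_blocks=[[0, 1], [2, 3]])
x_ls = np.linalg.pinv(A) @ b   # reference pseudoinverse solution
```

Setting each block to a single index and `alpha=1.0` corresponds to the REK special case mentioned above.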

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}],
\]
where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as
\[
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}, j_k].
\]
Then by the law of total expectation we have
\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
\]

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$. It holds
\[
\big\|\mathbb{E}[x^k-A^\dagger b]\big\|_2 \le \delta^k\left(\|x^0-A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),
\]
where
\[
\delta = \max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| .
\]

Proof. By $A^Tb = A^TAA^\dagger b$ and
\[
x^k-A^\dagger b = x^{k-1}-A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i}\big),
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k-A^\dagger b] &= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k-A^\dagger b]\big]\\
&= x^{k-1}-A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1}-b+z^k)}{\|A\|_F^2}\right]\\
&= x^{k-1}-A^\dagger b - \alpha\frac{A^TAx^{k-1}-A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b) - \alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
\]

Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[x^k-A^\dagger b] &= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k-A^\dagger b]\big]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b] - \alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}]\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b] - \alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{2}\mathbb{E}[x^{k-2}-A^\dagger b] - 2\alpha\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\\
&= \cdots\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0-A^\dagger b) - \alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]

Applying the norms to both sides, we obtain
\[
\begin{aligned}
\big\|\mathbb{E}[x^k-A^\dagger b]\big\|_2
&= \left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0-A^\dagger b) - \alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \left\|\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0-A^\dagger b)\right\|_2 + \left\|\alpha k\left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le \delta^k\left(\|x^0-A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]
Here the last inequality follows from the facts that
\[
x^0-A^\dagger b\in\operatorname{range}(A^T),\qquad A^Tz^0\in\operatorname{range}(A^T),
\]
and Lemma 3.
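Two steps of the proof of Theorem 8 are used without derivation: the conditional mean of the $z$-update and the passage from $z^{k-1}$ to $z^0$. A short sketch of both, assuming only the sampling probabilities of Algorithm 2 and that the column blocks partition $[n]$:

```latex
% Conditional mean of the z-update: the block norms cancel against the
% sampling probabilities, and the column blocks sum to A A^T.
\[
\mathbb{E}_{k-1}[z^k]
 = z^{k-1} - \alpha\sum_{j=1}^{t}\frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}\,
   \frac{A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T}{\|A_{:,\mathcal{J}_j}\|_F^2}\,z^{k-1}
 = \Big(I-\alpha\frac{AA^T}{\|A\|_F^2}\Big)z^{k-1},
 \qquad \sum_{j=1}^{t}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T = AA^T .
\]
% Unrolling the recursion and moving A^T through the powers.
\[
\mathbb{E}[z^{k-1}] = \Big(I-\alpha\frac{AA^T}{\|A\|_F^2}\Big)^{k-1}z^0,
\qquad
A^T\Big(I-\alpha\frac{AA^T}{\|A\|_F^2}\Big)^{k-1}
 = \Big(I-\alpha\frac{A^TA}{\|A\|_F^2}\Big)^{k-1}A^T .
\]
```

Combining the two identities turns the $z$-term of the recursion into $\alpha\big(I-\alpha A^TA/\|A\|_F^2\big)^{k}A^Tz^0/\|A\|_F^2$, which is the same at every level of the unrolling and therefore accumulates the factor $k$.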

4.2 Convergence of $\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as
\[
\rho = 1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
\]
In the following lemma we show that the vector $z^k$ generated in REBK with $z^0\in b+\operatorname{range}(A)$ converges to
\[
b_\perp = (I-AA^\dagger)b,
\]
which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz=0\}$.

Lemma 9

For any given consistent or inconsistent linear system $Ax=b$, let $z^k$ be the vector generated in REBK with $z^0\in b+\operatorname{range}(A)$. It holds
\[
\mathbb{E}\big[\|z^k-b_\perp\|_2^2\big] \le \rho^k\|z^0-b_\perp\|_2^2 .
\]

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$ we have
\[
z^k-b_\perp = z^{k-1}-b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp). \tag{1}
\]

By $z^0-b_\perp = AA^\dagger z^0\in\operatorname{range}(A)$ we can show by induction that
\[
z^k-b_\perp\in\operatorname{range}(A).
\]
It follows from (1) that
\[
\begin{aligned}
\|z^k-b_\perp\|_2^2
&= \|z^{k-1}-b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1}-b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\\
&\le \|z^{k-1}-b_\perp\|_2^2 - \frac{(2\alpha-\alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \qquad\text{(by Lemma 1)}.
\end{aligned}
\]

Taking the conditional expectation given the first $k-1$ iterations yields
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k-b_\perp\|_2^2\big]
&\le \|z^{k-1}-b_\perp\|_2^2 - \frac{(2\alpha-\alpha^2)\|A^T(z^{k-1}-b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|z^{k-1}-b_\perp\|_2^2 \qquad\text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|z^k-b_\perp\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k-b_\perp\|_2^2\big]\big]\\
&\le \rho\,\mathbb{E}\big[\|z^{k-1}-b_\perp\|_2^2\big]\\
&\le \rho^k\|z^0-b_\perp\|_2^2 .
\end{aligned}
\]
This completes the proof.
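The limit point $b_\perp$ of Lemma 9 can be checked numerically on a toy problem. The sketch below is only an illustration: the matrix, the right-hand side, and the variable names are made up for the example, and `np.linalg.pinv` stands in for $A^\dagger$.

```python
import numpy as np

# Toy data (assumed for illustration): range(A) is the span of the first two axes.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

# b_perp = (I - A A^+) b, the component of b orthogonal to range(A).
b_perp = b - A @ (np.linalg.pinv(A) @ b)

# Any z0 in b + range(A) has the same projection onto {z : A^T z = 0}.
z0 = b + A @ np.array([5.0, -7.0])
proj_z0 = z0 - A @ (np.linalg.pinv(A) @ z0)
```

Here `b_perp` and `proj_z0` both equal `[0, 0, 3]`, and `A.T @ b_perp` vanishes, which is the property used in the first step of the proof.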

Theorem 10

For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$. For any $\varepsilon>0$, it holds
\[
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0-A^\dagger b\|_2^2\right).
\]

Proof. Let
\[
\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b),
\]
which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

It follows from
\[
x^k-\tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)
\]
that
\[
\begin{aligned}
\|x^k-\tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T\big(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\big)\\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad\text{(by Lemma 1)}.
\end{aligned}
\tag{2}
\]
It follows from
\[
\mathbb{E}_{k-1}\big[\|x^k-\tilde{x}^k\|_2^2\big] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k-\tilde{x}^k\|_2^2\big]\big]
\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b-AA^\dagger b-z^k\|_2^2}{\|A\|_F^2}\right] \qquad\text{(by (2))}
\]
that
\[
\mathbb{E}\big[\|x^k-\tilde{x}^k\|_2^2\big] \le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\big[\|b-AA^\dagger b-z^k\|_2^2\big]
\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2 \qquad\text{(by Lemma 9)}. \tag{3}
\]

By $x^0\in\operatorname{range}(A^T)$ and $A^\dagger b\in\operatorname{range}(A^T)$ we have
\[
x^0-A^\dagger b\in\operatorname{range}(A^T).
\]
Then we can show that $x^k-A^\dagger b\in\operatorname{range}(A^T)$ by induction. By
\[
\begin{aligned}
\|\tilde{x}^k-A^\dagger b\|_2^2
&= \|x^{k-1}-A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1}-A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\\
&\le \|x^{k-1}-A^\dagger b\|_2^2 - \frac{(2\alpha-\alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad\text{(by Lemma 1)}
\end{aligned}
\]
we have
\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\tilde{x}^k-A^\dagger b\|_2^2\big]
&\le \|x^{k-1}-A^\dagger b\|_2^2 - \frac{(2\alpha-\alpha^2)\|A(x^{k-1}-A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le \rho\|x^{k-1}-A^\dagger b\|_2^2 \qquad\text{(by Lemma 3)},
\end{aligned}
\]
which yields
\[
\mathbb{E}\big[\|\tilde{x}^k-A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1}-A^\dagger b\|_2^2\big]. \tag{4}
\]

Note that for any $\varepsilon>0$ we have
\[
\begin{aligned}
\|x^k-A^\dagger b\|_2^2 &= \|x^k-\tilde{x}^k+\tilde{x}^k-A^\dagger b\|_2^2 \le \big(\|x^k-\tilde{x}^k\|_2+\|\tilde{x}^k-A^\dagger b\|_2\big)^2\\
&\le \|x^k-\tilde{x}^k\|_2^2 + \|\tilde{x}^k-A^\dagger b\|_2^2 + 2\|x^k-\tilde{x}^k\|_2\|\tilde{x}^k-A^\dagger b\|_2\\
&\le \left(1+\frac{1}{\varepsilon}\right)\|x^k-\tilde{x}^k\|_2^2 + (1+\varepsilon)\|\tilde{x}^k-A^\dagger b\|_2^2 .
\end{aligned}
\tag{5}
\]

Combining (3), (4) and (5) yields
\[
\begin{aligned}
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big]
&\le \left(1+\frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k-\tilde{x}^k\|_2^2\big] + (1+\varepsilon)\mathbb{E}\big[\|\tilde{x}^k-A^\dagger b\|_2^2\big]\\
&\le \left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1}-A^\dagger b\|_2^2\big]\\
&\le \left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2}-A^\dagger b\|_2^2\big]\\
&\le \cdots\\
&\le \left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0-A^\dagger b\|_2^2\\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0-A^\dagger b\|_2^2\right).
\end{aligned}
\]
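Two elementary facts carry the proof of Theorem 10: the weighted bound on the cross term in (5) and the geometric sum in the final line. A short worked version, where the scalars $u$ and $v$ are shorthand introduced only for this sketch:

```latex
% Cross term in (5): with u = ||x^k - \tilde{x}^k||_2 and v = ||\tilde{x}^k - A^\dagger b||_2,
\[
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2
\quad\text{because}\quad
\Big(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,v\Big)^2 \ge 0,
\qquad\text{so}\qquad
u^2 + v^2 + 2uv \le \Big(1+\frac{1}{\varepsilon}\Big)u^2 + (1+\varepsilon)v^2 .
\]
% Geometric sum in the last step of the unrolled recursion:
\[
\Big(1+\frac{1}{\varepsilon}\Big)\sum_{l=0}^{k-1}(1+\varepsilon)^l
 = \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k-1}{\varepsilon}
 \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
\]
```

The second display explains the factor $(1+\varepsilon)/\varepsilon^2$ multiplying $\alpha^2\|z^0-b_\perp\|_2^2/\|A\|_F^2$ in the theorem.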

Remark 4

For REBK with $s=m$, $t=n$ and $\alpha=1$ (i.e., REK), by the orthogonality
\[
(\tilde{x}^k-A^\dagger b)^T(x^k-\tilde{x}^k) = 0,
\]
the inequality (5) becomes
\[
\|x^k-A^\dagger b\|_2^2 = \|x^k-\tilde{x}^k\|_2^2 + \|\tilde{x}^k-A^\dagger b\|_2^2,
\]
which yields the following convergence bound for REK:
\[
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\|z^0-b_\perp\|_2^2}{\|A\|_F^2} + \|x^0-A^\dagger b\|_2^2\right).
\]

References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


3.1 Convergence of the norms of the expectations

Theorem 4

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system $Ax=b$ with arbitrary $x^0\in\mathbb{R}^n$. In exact arithmetic, it holds
\[
\big\|\mathbb{E}[x^k-x^0_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k}\|x^0-x^0_\star\|_2 ,
\]
where
\[
x^0_\star = (I-A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the solution set $\{x\in\mathbb{R}^n \mid Ax=b\}$.

Proof of Theorem 4

Note that the conditional expectation of $x^k$ given $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[x^k \mid x^{k-1}]
&= x^{k-1} - \alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right](Ax^{k-1}-b)\\
&= x^{k-1} - \alpha\left(\sum_{(\mathcal{I},\mathcal{J})\in\mathcal{P}}\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,\frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1}-b).
\end{aligned}
\]
Then the conditional expectation $\mathbb{E}[x^k-x^0_\star \mid x^{k-1}]$ is given by
\[
\begin{aligned}
\mathbb{E}[x^k-x^0_\star \mid x^{k-1}] &= \mathbb{E}[x^k \mid x^{k-1}] - x^0_\star\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1}-b) - x^0_\star\\
&= x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1}-Ax^0_\star) - x^0_\star\\
&= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-x^0_\star).
\end{aligned}
\]
Taking expectation gives
\[
\mathbb{E}[x^k-x^0_\star] = \mathbb{E}\big[\mathbb{E}[x^k-x^0_\star \mid x^{k-1}]\big]
= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-x^0_\star]
= \left(I-\alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0-x^0_\star).
\]
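The step from the sum over block pairs to $A^T/\|A\|_F^2$ in the proof above can be spelled out. The sketch below assumes that the sampled pairs are $\mathcal{P}=\{(\mathcal{I}_i,\mathcal{J}_j)\}$ for partitions $\{\mathcal{I}_i\}$ of $[m]$ and $\{\mathcal{J}_j\}$ of $[n]$, and that $I_{:,\mathcal{I}}$ and $I_{:,\mathcal{J}}$ denote the corresponding column submatrices of the identity, so that $A_{\mathcal{I},\mathcal{J}} = (I_{:,\mathcal{I}})^TA\,I_{:,\mathcal{J}}$:

```latex
\[
\sum_{(\mathcal{I},\mathcal{J})\in\mathcal{P}} I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T
 = \Big(\sum_{j=1}^{t} I_{:,\mathcal{J}_j}(I_{:,\mathcal{J}_j})^T\Big)A^T
   \Big(\sum_{i=1}^{s} I_{:,\mathcal{I}_i}(I_{:,\mathcal{I}_i})^T\Big)
 = I_n\,A^T\,I_m = A^T .
\]
```

Each inner sum is the identity because the blocks of a partition select every coordinate exactly once, and the factor $\|A_{\mathcal{I},\mathcal{J}}\|_F^2$ cancels between the step size and the sampling probability.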

Applying the norms to both sides, we obtain
\[
\big\|\mathbb{E}[x^k-x^0_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k}\|x^0-x^0_\star\|_2 .
\]
Here the inequality follows from the fact that
\[
x^0-x^0_\star = A^\dagger Ax^0 - A^\dagger b\in\operatorname{range}(A^T)
\]
and Lemma 3.

Remark 1

If $x^0\in\operatorname{range}(A^T)$, then $x^0_\star = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have
\[
\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \qquad\text{i.e.,}\qquad 0<\alpha<\frac{2\|A\|_F^2}{\sigma_1^2(A)} .
\]
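The contraction factor of Theorem 4 and the step-size limit of Remark 1 are easy to evaluate for a given matrix. The helper below is a sketch: the name `contraction_factor` and the relative tolerance used to discard zero singular values are assumptions for illustration.

```python
import numpy as np

def contraction_factor(A, alpha):
    """Return (delta, alpha_max) with
    delta = max_i |1 - alpha * sigma_i(A)^2 / ||A||_F^2| over nonzero sigma_i,
    alpha_max = 2 * ||A||_F^2 / sigma_1(A)^2."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 1e-12 * s[0]]          # keep nonzero singular values (tolerance is a choice)
    fro2 = np.sum(s ** 2)            # ||A||_F^2 is the sum of squared singular values
    delta = np.max(np.abs(1.0 - alpha * s ** 2 / fro2))
    return delta, 2.0 * fro2 / s[0] ** 2

# Singular values 2 and 1, so ||A||_F^2 = 5.
A = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
delta, alpha_max = contraction_factor(A, alpha=1.0)
delta_at_limit, _ = contraction_factor(A, alpha=alpha_max)
```

For this matrix `delta` is 0.8 and `alpha_max` is 2.5; at `alpha = alpha_max` the factor reaches 1, which is why the admissible interval is open.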

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system $Ax=b$ with arbitrary $x^0\in\mathbb{R}^n$. In exact arithmetic, it holds
\[
\big\|\mathbb{E}[Ax^k-Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k}\|Ax^0-Ax_\star\|_2 ,
\]
where $x_\star$ is any solution of $A^TAx = A^Tb$.

Proof of Theorem 5

Note that the conditional expectation given $x^{k-1}$ is
\[
\begin{aligned}
\mathbb{E}[Ax^k-Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big)\\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1}-b) - x_\star\right)\\
&= A\left(x^{k-1} - \alpha\frac{A^T}{\|A\|_F^2}(Ax^{k-1}-Ax_\star) - x_\star\right) \qquad\text{(by } A^Tb = A^TAx_\star\text{)}\\
&= Ax^{k-1}-Ax_\star - \alpha\frac{AA^T}{\|A\|_F^2}(Ax^{k-1}-Ax_\star)\\
&= \left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-Ax_\star).
\end{aligned}
\]

Taking expectation gives
\[
\begin{aligned}
\mathbb{E}[Ax^k-Ax_\star] &= \mathbb{E}\big[\mathbb{E}[Ax^k-Ax_\star \mid x^{k-1}]\big]\\
&= \left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1}-Ax_\star]\\
&= \left(I-\alpha\frac{AA^T}{\|A\|_F^2}\right)^{k}(Ax^0-Ax_\star).
\end{aligned}
\]
Applying the norms to both sides, we obtain
\[
\big\|\mathbb{E}[Ax^k-Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k}\|Ax^0-Ax_\star\|_2 .
\]
Here the inequality follows from the fact that
\[
Ax^0-Ax_\star\in\operatorname{range}(A)
\]
and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system $Ax=b$ with arbitrary $x^0\in\mathbb{R}^n$. Assume
\[
0<\alpha<2/t .
\]
In exact arithmetic, it holds
\[
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big] \le \left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^{k}\|x^0-A^\dagger b\|_2^2 .
\]

Proof of Theorem 6

\[
\begin{aligned}
\|x^k-A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1} - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1}-A^\dagger b) - A^\dagger b\right\|_2^2\\
&= \left\|x^{k-1}-A^\dagger b - \alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\right\|_2^2\\
&= \|x^{k-1}-A^\dagger b\|_2^2 - 2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad + \alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1}-A^\dagger b)\\
&\le \|x^{k-1}-A^\dagger b\|_2^2 - 2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad + \alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b).
\end{aligned}
\]

The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \|x^{k-1}-A^\dagger b\|_2^2 - (2\alpha-t\alpha^2)(x^{k-1}-A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\le \left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1}-A^\dagger b\|_2^2 \qquad\text{(by Lemma 2)}.
\end{aligned}
\]
Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}\big[\|x^k-A^\dagger b\|_2^2 \,\big|\, x^{k-1}\big]\big]\\
&\le \left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|x^{k-1}-A^\dagger b\|_2^2\big]\\
&\le \left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^{k}\|x^0-A^\dagger b\|_2^2 .
\end{aligned}
\]
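The DSBGS iteration analysed in Theorem 6 can be sketched as follows. The update rule is read off from the first line of the proof of Theorem 6 and the sampling probabilities from the proof of Theorem 4; the function name `dsbgs`, its arguments, the starting point $x^0 = 0$, and the test problem are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters=5000, seed=0):
    """Sketch of DSBGS: sample a block pair (I, J) with probability
    ||A[I, J]||_F^2 / ||A||_F^2 and update only the coordinates x[J]."""
    rng = np.random.default_rng(seed)
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # Squared Frobenius norms of the submatrices A[I, J]; they sum to ||A||_F^2.
    w = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        p = rng.choice(len(pairs), p=w / w.sum())
        I, J = pairs[p]
        AIJ = A[np.ix_(I, J)]
        residual_I = A[I, :] @ x - b[I]          # rows I of A x - b
        x[J] -= (alpha / w[p]) * (AIJ.T @ residual_I)
    return x

# Consistent, full-column-rank example with t = 2 column blocks, so alpha < 2/t = 1.
A = np.vstack([2.0 * np.eye(4) + 0.5, np.ones((2, 4))])
x_true = np.array([1.0, -1.0, 2.0, 0.5])
b = A @ x_true
x = dsbgs(A, b, row_blocks=[[0, 1, 2], [3, 4, 5]], col_blocks=[[0, 1], [2, 3]], alpha=0.5)
```

Because $A$ has full column rank and the system is consistent, $A^\dagger b$ equals `x_true`, which is the point the iterates approach.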

Remark 2

If $t=1$ and $x^0\in\operatorname{range}(A^T)$, we can show
\[
x^k-x^0_\star\in\operatorname{range}(A^T)
\]
by induction, where
\[
x^0_\star = (I-A^\dagger A)x^0 + A^\dagger b,
\]
i.e., the projection of $x^0$ onto the set $\{x\in\mathbb{R}^n \mid Ax=b\}$. Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound
\[
\mathbb{E}\big[\|x^k-x^0_\star\|_2^2\big] \le \left(1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|x^0-x^0_\star\|_2^2 .
\]

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient) $Ax=b$ with arbitrary $x^0\in\mathbb{R}^n$. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then
\[
\mathbb{E}\big[\|Ax^k-b\|_2^2\big] \le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0-b\|_2^2 .
\]
If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then
\[
\mathbb{E}\big[\|Ax^k-b\|_2^2\big] \le \left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0-b\|_2^2 ,
\]
where
\[
\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).
\]
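For the case $t<n$ of Theorem 7, the constants $\rho$, the largest admissible step size, and the resulting per-iteration factor can be computed directly. The helper below is a sketch under the assumption that the column blocks are given as index lists; the name `residual_rate` and the rank tolerance are illustrative.

```python
import numpy as np

def residual_rate(A, col_blocks, alpha):
    """Constants of Theorem 7 for t < n:
    rho = max_j sigma_1(A[:, J_j])^2, alpha_max = 2 sigma_r(A)^2 / (t rho),
    rate = 1 - (2 alpha sigma_r(A)^2 - t rho alpha^2) / ||A||_F^2."""
    s = np.linalg.svd(A, compute_uv=False)
    sigma_r2 = np.min(s[s > 1e-12 * s[0]]) ** 2     # smallest nonzero singular value, squared
    fro2 = np.linalg.norm(A, "fro") ** 2
    t = len(col_blocks)
    rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)   # spectral norm squared
    alpha_max = 2.0 * sigma_r2 / (t * rho)
    rate = 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha ** 2) / fro2
    return rho, alpha_max, rate

# A^T A = 4 I + 5 * ones((4, 4)): sigma_r^2 = 4, ||A||_F^2 = 36, and each
# two-column block has largest squared singular value 14.
A = np.vstack([2.0 * np.eye(4) + 0.5, np.ones((2, 4))])
rho, alpha_max, rate = residual_rate(A, [[0, 1], [2, 3]], alpha=1.0 / 7.0)
```

For this example `rho` is 14, `alpha_max` is 2/7, and the midpoint choice `alpha = 1/7` gives the factor 1 - 1/63, the smallest value of the bound over the admissible interval.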

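Proof of Theorem 7

Note that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&= \left\|Ax^{k-1} - \alpha\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b) - b\right\|_2^2\\
&= \|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2(Ax^{k-1}-b)^T\left(\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TA I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b).
\end{aligned}
\]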

If $t=n$, then it follows from
\[
(I_{:,\mathcal{J}})^TA^TA I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
\]
(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&= \|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b) \qquad\text{(by Lemma 1)}.
\end{aligned}
\]
Taking expectation gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]
&\le (1+\alpha^2)\|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2 .
\end{aligned}
\]

The last inequality follows from $Ax^{k-1}-b\in\operatorname{range}(A)$ and Lemma 2. Taking expectation again gives
\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k-b\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k-b\|_2^2 \,\big|\, x^{k-1}\big]\big]\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1}-b\|_2^2\big]\\
&\le \left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0-b\|_2^2 .
\end{aligned}
\]

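If $t<n$, then it follows from
\[
(I_{:,\mathcal{J}})^TA^TA I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I
\]
(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that
\[
\begin{aligned}
\|Ax^k-b\|_2^2
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le \|Ax^{k-1}-b\|_2^2 - 2\alpha(Ax^{k-1}-b)^T\left(\frac{A I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad + \alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\, I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b) \qquad\text{(by Lemma 1)}.
\end{aligned}
\]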

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let Ekminus1

983045middot983046denote the conditional expectation conditioned on the first

k minus 1 iterations of REBK That is

Ekminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1

983046

where jl is the lth column block chosen and il is the lth row blockchosen

We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as

Eikminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk

983046

Then by the law of total expectation we have

Ekminus1

983045middot983046= Ekminus1

983045Eikminus1

983045middot983046983046

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43

41 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

Ax = b

let xk be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

It holds

983042E983045xk minusAdaggerb

9830469830422 le δk

983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

where

δ = max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43

Proof By ATb = ATAAdaggerb and

xk minusAdaggerb = xkminus1 minusAdaggerbminus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

we have

Ekminus1

983045xk minusAdaggerb

983046= Ekminus1

983045Eikminus1

983045xk minusAdaggerb

983046983046

= xkminus1 minusAdaggerbminus Ekminus1

983063αAT(Axkminus1 minus b+ zk)

983042A9830422F

983064

= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb

983042A9830422Fminus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422F

983061Iminus α

AAT

983042A9830422F

983062zkminus1

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422Fzkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43

Taking expectation gives

E983045xk minusAdaggerb

983046

= E983045Ekminus1

983045xk minusAdaggerb

983046983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422FE983045zkminus1

983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

=

983061Iminus α

ATA

983042A9830422F

9830622

E983045xkminus2 minusAdaggerb

983046minus 2α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F= middot middot middot

=

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43

Applying the norms to both sides we obtain

983042E983045xk minusAdaggerb

9830469830422

=

983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)

9830569830569830569830569830562

+

983056983056983056983056983056αk983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le δk983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

Here the last inequality follows from the facts that

x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43

42 Convergence of E983045983042xk minusAdaggerb98304222

983046

The convergence ofE983045983042xk minusAdaggerb98304222

983046

depends on the positive number ρ defined as

ρ = 1minus (2αminus α2)σ2r (A)

983042A9830422F

In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to

bperp = (IminusAAdagger)b

which is the orthogonal projection of z0 onto the set z | ATz = 0

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43

Lemma 9

For any given consistent or inconsistent linear system

Ax = b

let zk be the vector generated in REBK with

z0 isin b+ range(A)

It holdsE983045983042zk minus bperp98304222

983046le ρk983042z0 minus bperp98304222

Proof By (AJj )Tbperp = 0 we have

zk minus bperp = zkminus1 minus bperp minus α

983042AJj9830422FAJj (AJj )

T(zkminus1 minus bperp) (1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43

By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that

zk minus bperp isin range(A)

It follows from (1) that

983042zk minus bperp98304222

= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

+α2

983042AJj9830424F(zkminus1 minus bperp)

TAJj (AJj )TAJj (AJj )

T(zkminus1 minus bperp)

le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

(by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43

Taking the conditioned expectation on the first k minus 1 iterations yields

Ekminus1

983045983042zk minus bperp98304222

983046

le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222

983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)

Taking expectation again gives

E983045983042zk minus bperp98304222

983046= E

983045Ekminus1

983045983042zk minus bperp98304222

983046983046

le ρE983045983042zkminus1 minus bperp98304222

983046

le ρk983042z0 minus bperp98304222

This completes the proof



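Proof of Theorem 4

Note that the expectation of $x^k$ conditioned on $x^{k-1}$ is

$$\begin{aligned}
\mathbb{E}[x^k\mid x^{k-1}]
&=x^{k-1}-\alpha\,\mathbb{E}\left[\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right](Ax^{k-1}-b)\\
&=x^{k-1}-\alpha\left(\sum_{(\mathcal{I},\mathcal{J})\in\mathcal{P}}\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,\frac{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&=x^{k-1}-\frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1}-b).
\end{aligned}$$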
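The last equality in this derivation relies on the block terms summing to $A^T/\|A\|_F^2$ when the pair $(\mathcal{I},\mathcal{J})$ is drawn with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2/\|A\|_F^2$. The following is a minimal NumPy sketch of that identity, assuming the row and column blocks form partitions; the function name `expected_update_matrix` and the block lists are illustrative and do not come from the lecture.

```python
import numpy as np

def expected_update_matrix(A, row_blocks, col_blocks):
    """Sum over (I, J) of p_{I,J} * I[:, J] A[I, J]^T I[:, I]^T / ||A[I, J]||_F^2,
    with p_{I,J} = ||A[I, J]||_F^2 / ||A||_F^2 (the sampling rule used in the proof)."""
    m, n = A.shape
    fro2 = np.linalg.norm(A, "fro") ** 2
    M = np.zeros((n, m))
    for I in row_blocks:
        for J in col_blocks:
            block = A[np.ix_(I, J)]
            nb2 = np.linalg.norm(block, "fro") ** 2
            if nb2 == 0.0:
                continue  # a zero block is sampled with probability zero
            p = nb2 / fro2
            # I[:, J] B^T I[:, I]^T places B^T into rows J and columns I
            M[np.ix_(J, I)] += p * block.T / nb2
    return M
```

For any partition, the returned matrix should agree with `A.T / np.linalg.norm(A, "fro") ** 2` up to rounding, which is exactly the step from the second to the third line of the derivation.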

Then the conditional expectation $\mathbb{E}[x^k-x_\star^0\mid x^{k-1}]$ is given by

$$\begin{aligned}
\mathbb{E}[x^k-x_\star^0\mid x^{k-1}]
&=\mathbb{E}[x^k\mid x^{k-1}]-x_\star^0\\
&=x^{k-1}-\frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1}-b)-x_\star^0\\
&=x^{k-1}-\frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1}-Ax_\star^0)-x_\star^0\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1}-x_\star^0).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k-x_\star^0]
&=\mathbb{E}\big[\mathbb{E}[x^k-x_\star^0\mid x^{k-1}]\big]
=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-x_\star^0]\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0-x_\star^0).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[x^k-x_\star^0]\|_2\le\left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|x^0-x_\star^0\|_2.$$

Here the inequality follows from the fact that

$$x^0-x_\star^0=A^\dagger Ax^0-A^\dagger b\in\operatorname{range}(A^T)$$

and Lemma 3.

Remark 1

If $x^0\in\operatorname{range}(A^T)$, then $x_\star^0=A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|<1,\quad\text{i.e.,}\quad 0<\alpha<\frac{2\|A\|_F^2}{\sigma_1^2(A)}.$$
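The step-size condition in Remark 1 can be checked numerically. The sketch below computes the factor $\max_i|1-\alpha\sigma_i^2(A)/\|A\|_F^2|$ and the upper limit $2\|A\|_F^2/\sigma_1^2(A)$ from the singular values; the function names and the relative tolerance used to decide which singular values count as nonzero are assumptions made for illustration.

```python
import numpy as np

def contraction_factor(A, alpha, tol=1e-12):
    """max over nonzero singular values of |1 - alpha * sigma_i^2 / ||A||_F^2|."""
    s = np.linalg.svd(A, compute_uv=False)
    fro2 = np.sum(s ** 2)        # ||A||_F^2 equals the sum of squared singular values
    s = s[s > tol * s[0]]        # keep the nonzero singular values only (assumed cutoff)
    return np.max(np.abs(1.0 - alpha * s ** 2 / fro2))

def max_step_size(A):
    """Upper limit 2 * ||A||_F^2 / sigma_1^2(A) from Remark 1."""
    s = np.linalg.svd(A, compute_uv=False)
    return 2.0 * np.sum(s ** 2) / s[0] ** 2
```

For a step size strictly inside $(0,\,2\|A\|_F^2/\sigma_1^2(A))$ the factor is below one, and slightly above the limit it exceeds one.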

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

$$Ax=b$$

with arbitrary $x^0\in\mathbb{R}^n$. In exact arithmetic, it holds

$$\|\mathbb{E}[Ax^k-Ax_\star]\|_2\le\left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|Ax^0-Ax_\star\|_2,$$

where $x_\star$ is any solution of

$$A^TAx=A^Tb.$$

Proof of Theorem 5

Note that the expectation conditioned on $x^{k-1}$ satisfies

$$\begin{aligned}
\mathbb{E}[Ax^k-Ax_\star\mid x^{k-1}]
&=A\big(\mathbb{E}[x^k\mid x^{k-1}]-x_\star\big)\\
&=A\left(x^{k-1}-\frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1}-b)-x_\star\right)\\
&=A\left(x^{k-1}-\frac{\alpha A^T}{\|A\|_F^2}(Ax^{k-1}-Ax_\star)-x_\star\right)\quad\text{(by } A^Tb=A^TAx_\star\text{)}\\
&=Ax^{k-1}-Ax_\star-\frac{\alpha AA^T}{\|A\|_F^2}(Ax^{k-1}-Ax_\star)\\
&=\left(I-\frac{\alpha AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-Ax_\star).
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[Ax^k-Ax_\star]
&=\mathbb{E}\big[\mathbb{E}[Ax^k-Ax_\star\mid x^{k-1}]\big]\\
&=\left(I-\frac{\alpha AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1}-Ax_\star]\\
&=\left(I-\frac{\alpha AA^T}{\|A\|_F^2}\right)^k(Ax^0-Ax_\star).
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\|\mathbb{E}[Ax^k-Ax_\star]\|_2\le\left(\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k\|Ax^0-Ax_\star\|_2.$$

Here the inequality follows from the fact that

$$Ax^0-Ax_\star\in\operatorname{range}(A)$$

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax=b$$

with arbitrary $x^0\in\mathbb{R}^n$. Assume

$$0<\alpha<\frac{2}{t}.$$

In exact arithmetic, it holds

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0-A^\dagger b\|_2^2.$$
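As a short worked observation that is not stated on the slides, the bound of Theorem 6 can be optimized over the step size. The term $2\alpha-t\alpha^2$ is a concave quadratic in $\alpha$, so under the theorem's assumption $0<\alpha<2/t$ it is positive and attains its maximum at $\alpha=1/t$:

$$\frac{d}{d\alpha}\left(2\alpha-t\alpha^2\right)=2-2t\alpha=0\ \Longrightarrow\ \alpha=\frac{1}{t},\qquad 1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\Bigg|_{\alpha=1/t}=1-\frac{\sigma_n^2(A)}{t\,\|A\|_F^2}.$$

This only identifies the step size that minimizes the upper bound; the best step size in practice may differ.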

Proof of Theorem 6

$$\begin{aligned}
\|x^k-A^\dagger b\|_2^2
&=\left\|x^{k-1}-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)-A^\dagger b\right\|_2^2\\
&=\left\|x^{k-1}-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1}-A^\dagger b)-A^\dagger b\right\|_2^2\\
&=\left\|x^{k-1}-A^\dagger b-\alpha\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\right\|_2^2\\
&=\|x^{k-1}-A^\dagger b\|_2^2-2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad+\alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1}-A^\dagger b)\\
&\le\|x^{k-1}-A^\dagger b\|_2^2-2\alpha(x^{k-1}-A^\dagger b)^T\left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\quad+\alpha^2(x^{k-1}-A^\dagger b)^T\left(\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1}-A^\dagger b).
\end{aligned}$$

The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}=I$ and Lemma 1. Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2\mid x^{k-1}]
&\le\|x^{k-1}-A^\dagger b\|_2^2-(2\alpha-t\alpha^2)(x^{k-1}-A^\dagger b)^T\left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1}-A^\dagger b\|_2^2\quad\text{(by Lemma 2)}.
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2]
&=\mathbb{E}\big[\mathbb{E}[\|x^k-A^\dagger b\|_2^2\mid x^{k-1}]\big]\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2]\\
&\le\left(1-\frac{(2\alpha-t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k\|x^0-A^\dagger b\|_2^2.
\end{aligned}$$

Remark 2

If $t=1$ and $x^0\in\operatorname{range}(A^T)$, we can show

$$x^k-x_\star^0\in\operatorname{range}(A^T)$$

by induction, where

$$x_\star^0=(I-A^\dagger A)x^0+A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x\in\mathbb{R}^n\mid Ax=b\}.$$

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}[\|x^k-x_\star^0\|_2^2]\le\left(1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|x^0-x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax=b$$

with arbitrary $x^0\in\mathbb{R}^n$. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k-b\|_2^2]\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.$$

If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k-b\|_2^2]\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2,$$

where

$$\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$
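To make the two cases of Theorem 7 concrete, the sketch below computes $\rho$, the admissible upper limit on $\alpha$, and the per-iteration factor for a given column partition. The function name `theorem7_step_bound`, its return convention, and the tolerance used to pick the smallest nonzero singular value are assumptions for illustration only.

```python
import numpy as np

def theorem7_step_bound(A, col_blocks, tol=1e-12):
    """Return (alpha_max, rate) where rate(alpha) is the per-iteration factor
    in Theorem 7 for the given column partition."""
    s = np.linalg.svd(A, compute_uv=False)
    sigma_r2 = np.min(s[s > tol * s[0]]) ** 2   # smallest nonzero singular value, squared
    fro2 = np.linalg.norm(A, "fro") ** 2
    t, n = len(col_blocks), A.shape[1]
    if t == n:
        # single-column blocks: factor 1 + alpha^2 - 2 alpha sigma_r^2 / ||A||_F^2
        alpha_max = 2.0 * sigma_r2 / fro2
        rate = lambda a: 1.0 + a ** 2 - 2.0 * a * sigma_r2 / fro2
    else:
        # rho = max_j sigma_1^2(A[:, J_j]); norm(., 2) is the spectral norm
        rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
        alpha_max = 2.0 * sigma_r2 / (t * rho)
        rate = lambda a: 1.0 - (2.0 * a * sigma_r2 - t * rho * a ** 2) / fro2
    return alpha_max, rate
```

For example, `alpha_max, rate = theorem7_step_bound(A, np.array_split(np.arange(A.shape[1]), 2))` gives the $t<n$ case with two column blocks; `rate(alpha)` lies strictly below one for any `alpha` inside `(0, alpha_max)`.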

Proof of Theorem 7

Note that

$$\begin{aligned}
\|Ax^k-b\|_2^2
&=\left\|Ax^{k-1}-\alpha\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)-b\right\|_2^2\\
&=\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b).
\end{aligned}$$

If $t=n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}=\|A_{:,\mathcal{J}}\|_F^2$$

(since $AI_{:,\mathcal{J}}=A_{:,\mathcal{J}}$ is a column vector) that

$$\begin{aligned}
\|Ax^k-b\|_2^2
&=\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\quad\text{(by Lemma 1)}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]
&\le(1+\alpha^2)\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1}-b\in\operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2]
&=\mathbb{E}\big[\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]\big]\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1}-b\|_2^2]\\
&\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}$$

If $t<n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}=(A_{:,\mathcal{J}})^TA_{:,\mathcal{J}}\preceq\rho I$$

(since $\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
\|Ax^k-b\|_2^2
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1}-b)\\
&\le\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\\
&\quad+\alpha^2(Ax^{k-1}-b)^T\left(\frac{\rho\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1}-b)\quad\text{(by Lemma 1)}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]
&\le\left(1+\frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2-2\alpha(Ax^{k-1}-b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1}-b)\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1}-b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1}-b\in\operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}[\|Ax^k-b\|_2^2]
&=\mathbb{E}\big[\mathbb{E}[\|Ax^k-b\|_2^2\mid x^{k-1}]\big]\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1}-b\|_2^2]\\
&\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^Tb=(A_{:,\mathcal{J}})^TAx_\star,$$

where $x_\star$ is any solution of $A^TAx=A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t=n$ and $0<\alpha<2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2]\le\left(1+\alpha^2-\frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2.$$

If $t<n$ and $0<\alpha<2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}[\|Ax^k-Ax_\star\|_2^2]\le\left(1-\frac{2\alpha\sigma_r^2(A)-t\rho\alpha^2}{\|A\|_F^2}\right)^k\|Ax^0-b\|_2^2,$$

where

$$\rho=\max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1,\mathcal{I}_2,\ldots,\mathcal{I}_s\}$ and $\{\mathcal{J}_1,\mathcal{J}_2,\ldots,\mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha>0$. Initialize $z^0\in b+\operatorname{range}(A)$ and $x^0\in\operatorname{range}(A^T)$.
for $k=1,2,\ldots$ do
    Pick $j\in[t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$
    Set $z^k=z^{k-1}-\dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$
    Pick $i\in[s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$
    Set $x^k=x^{k-1}-\dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i})$

By choosing $s=m$, $t=n$ and $\alpha=1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
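The steps of Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `rebk`, the fixed iteration count, the seeded generator, and the particular choices $z^0=b$ and $x^0=0$ (which satisfy the stated initialization conditions) are assumptions, and every block is assumed to have a nonzero Frobenius norm.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK; row_blocks / col_blocks are lists of index arrays
    partitioning the rows and columns of A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    col_norms = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_norms = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    p_col = col_norms / col_norms.sum()   # ||A[:, J_j]||_F^2 / ||A||_F^2
    p_row = row_norms / row_norms.sum()   # ||A[I_i, :]||_F^2 / ||A||_F^2
    z = np.array(b, dtype=float)          # z^0 = b lies in b + range(A)
    x = np.zeros(n)                       # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # column step: drives z toward b_perp = (I - A A^+) b
        j = rng.choice(len(col_blocks), p=p_col)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_norms[j]) * (AJ @ (AJ.T @ z))
        # row step: block Kaczmarz-type update on A x = b - z^k
        i = rng.choice(len(row_blocks), p=p_row)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_norms[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z
```

A possible call is `x, z = rebk(A, b, np.array_split(np.arange(m), s), np.array_split(np.arange(n), t))`, after which `x` can be compared with `np.linalg.pinv(A) @ b`. Note that only products with the blocks and their transposes are needed, so no pseudoinverse of a block is formed inside the loop.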

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot]=\mathbb{E}[\,\cdot\mid j_1,i_1,j_2,i_2,\ldots,j_{k-1},i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot]=\mathbb{E}[\,\cdot\mid j_1,i_1,j_2,i_2,\ldots,j_{k-1},i_{k-1},j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot]=\mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax=b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0\in b+\operatorname{range}(A)\quad\text{and}\quad x^0\in\operatorname{range}(A^T).$$

It holds

$$\|\mathbb{E}[x^k-A^\dagger b]\|_2\le\delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$

where

$$\delta=\max_{1\le i\le r}\left|1-\frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$

Proof. By $A^Tb=A^TAA^\dagger b$ and

$$x^k-A^\dagger b=x^{k-1}-A^\dagger b-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1}-b_{\mathcal{I}_i}+z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k-A^\dagger b]
&=\mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k-A^\dagger b]\big]\\
&=x^{k-1}-A^\dagger b-\mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1}-b+z^k)}{\|A\|_F^2}\right]\\
&=x^{k-1}-A^\dagger b-\frac{\alpha(A^TAx^{k-1}-A^Tb)}{\|A\|_F^2}-\frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\frac{\alpha A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k]\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\frac{\alpha A^T}{\|A\|_F^2}\left(I-\frac{\alpha AA^T}{\|A\|_F^2}\right)z^{k-1}\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)(x^{k-1}-A^\dagger b)-\alpha\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k-A^\dagger b]
&=\mathbb{E}\big[\mathbb{E}_{k-1}[x^k-A^\dagger b]\big]\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b]-\alpha\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}]\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1}-A^\dagger b]-\alpha\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2}-A^\dagger b]-2\alpha\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\\
&=\cdots\\
&=\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\|\mathbb{E}[x^k-A^\dagger b]\|_2
&=\left\|\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)-\alpha k\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le\left\|\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k(x^0-A^\dagger b)\right\|_2+\left\|\alpha k\left(I-\frac{\alpha A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2\\
&\le\delta^k\left(\|x^0-A^\dagger b\|_2+\frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0-A^\dagger b\in\operatorname{range}(A^T),\qquad A^Tz^0\in\operatorname{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k-A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k-A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

$$\rho=1-\frac{(2\alpha-\alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0\in b+\operatorname{range}(A)$ converges to

$$b_\perp=(I-AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z\mid A^Tz=0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax=b,$$

let $z^k$ be the vector generated in REBK with

$$z^0\in b+\operatorname{range}(A).$$

It holds

$$\mathbb{E}[\|z^k-b_\perp\|_2^2]\le\rho^k\|z^0-b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp=0$ we have

$$z^k-b_\perp=z^{k-1}-b_\perp-\frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp).\tag{1}$$

By $z^0-b_\perp=AA^\dagger z^0\in\operatorname{range}(A)$ we can show by induction that

$$z^k-b_\perp\in\operatorname{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k-b_\perp\|_2^2
&=\|z^{k-1}-b_\perp\|_2^2-\frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1}-b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\\
&\le\|z^{k-1}-b_\perp\|_2^2-\frac{(2\alpha-\alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1}-b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2}\quad\text{(by Lemma 1)}.
\end{aligned}$$

Taking the expectation conditioned on the first $k-1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}[\|z^k-b_\perp\|_2^2]
&\le\|z^{k-1}-b_\perp\|_2^2-\frac{(2\alpha-\alpha^2)\|A^T(z^{k-1}-b_\perp)\|_2^2}{\|A\|_F^2}\\
&\le\rho\|z^{k-1}-b_\perp\|_2^2\quad\text{(by Lemma 2)}.
\end{aligned}$$

Taking expectation again gives

$$\mathbb{E}[\|z^k-b_\perp\|_2^2]=\mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k-b_\perp\|_2^2]\big]\le\rho\,\mathbb{E}[\|z^{k-1}-b_\perp\|_2^2]\le\rho^k\|z^0-b_\perp\|_2^2.$$

This completes the proof.
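The one-step contraction at the heart of this proof can be verified exactly, without sampling, by summing over all column blocks with their selection probabilities. The sketch below does this; the function name `lemma9_one_step`, the tolerance for the smallest nonzero singular value, and the assumption that every column block has nonzero Frobenius norm are illustrative choices, and `z_prev` is assumed to satisfy $z^{k-1}-b_\perp\in\operatorname{range}(A)$ (for instance `z_prev = b`).

```python
import numpy as np

def lemma9_one_step(A, b, z_prev, col_blocks, alpha):
    """Return (E_{k-1}||z^k - b_perp||^2, rho * ||z^{k-1} - b_perp||^2),
    where the expectation is computed exactly over the column blocks."""
    b_perp = b - A @ (np.linalg.pinv(A) @ b)       # b_perp = (I - A A^+) b
    fro2 = np.linalg.norm(A, "fro") ** 2
    s = np.linalg.svd(A, compute_uv=False)
    sigma_r2 = np.min(s[s > 1e-12 * s[0]]) ** 2    # smallest nonzero singular value, squared
    rho = 1.0 - (2.0 * alpha - alpha ** 2) * sigma_r2 / fro2
    expected = 0.0
    for J in col_blocks:
        AJ = A[:, J]
        nJ2 = np.linalg.norm(AJ, "fro") ** 2
        z_new = z_prev - (alpha / nJ2) * (AJ @ (AJ.T @ z_prev))   # REBK column step
        expected += (nJ2 / fro2) * np.sum((z_new - b_perp) ** 2)  # weight by block probability
    return expected, rho * np.sum((z_prev - b_perp) ** 2)
```

For $0<\alpha<2$ the first returned value should not exceed the second, up to rounding error, which is the conditional inequality used in the proof.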

Theorem 10

For any given consistent or inconsistent linear system $Ax=b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0\in b+\operatorname{range}(A)\quad\text{and}\quad x^0\in\operatorname{range}(A^T).$$

For any $\varepsilon>0$, it holds

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le(1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).$$

Proof. Let

$$\widetilde{x}^k=x^{k-1}-\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax=AA^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k-\widetilde{x}^k=\frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k-\widetilde{x}^k\|_2^2
&=\frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i})\\
&\le\frac{\alpha^2\|b_{\mathcal{I}_i}-A_{\mathcal{I}_i,:}A^\dagger b-z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\quad\text{(by Lemma 1)}.
\end{aligned}\tag{2}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}[\|x^k-\widetilde{x}^k\|_2^2]
&=\mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k-\widetilde{x}^k\|_2^2]\big]\\
&\le\mathbb{E}_{k-1}\left[\frac{\alpha^2\|b-AA^\dagger b-z^k\|_2^2}{\|A\|_F^2}\right]\quad\text{(by (2))}
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}[\|x^k-\widetilde{x}^k\|_2^2]
&\le\frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b-AA^\dagger b-z^k\|_2^2]\\
&\le\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2\quad\text{(by Lemma 9)}.
\end{aligned}\tag{3}$$

By $x^0\in\operatorname{range}(A^T)$ and $A^\dagger b\in\operatorname{range}(A^T)$ we have

$$x^0-A^\dagger b\in\operatorname{range}(A^T).$$

Then we can show that $x^k-A^\dagger b\in\operatorname{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\widetilde{x}^k-A^\dagger b\|_2^2
&=\|x^{k-1}-A^\dagger b\|_2^2-\frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\\
&\quad+\frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1}-A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\\
&\le\|x^{k-1}-A^\dagger b\|_2^2-\frac{(2\alpha-\alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1}-A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2}\quad\text{(by Lemma 1)}
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[\|\widetilde{x}^k-A^\dagger b\|_2^2]
&\le\|x^{k-1}-A^\dagger b\|_2^2-\frac{(2\alpha-\alpha^2)\|A(x^{k-1}-A^\dagger b)\|_2^2}{\|A\|_F^2}\\
&\le\rho\|x^{k-1}-A^\dagger b\|_2^2\quad\text{(by Lemma 3)},
\end{aligned}$$

which yields

$$\mathbb{E}[\|\widetilde{x}^k-A^\dagger b\|_2^2]\le\rho\,\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2].\tag{4}$$

Note that for any $\varepsilon>0$ we have

$$\begin{aligned}
\|x^k-A^\dagger b\|_2^2
&=\|x^k-\widetilde{x}^k+\widetilde{x}^k-A^\dagger b\|_2^2\\
&\le\big(\|x^k-\widetilde{x}^k\|_2+\|\widetilde{x}^k-A^\dagger b\|_2\big)^2\\
&\le\|x^k-\widetilde{x}^k\|_2^2+\|\widetilde{x}^k-A^\dagger b\|_2^2+2\|x^k-\widetilde{x}^k\|_2\|\widetilde{x}^k-A^\dagger b\|_2\\
&\le\left(1+\frac{1}{\varepsilon}\right)\|x^k-\widetilde{x}^k\|_2^2+(1+\varepsilon)\|\widetilde{x}^k-A^\dagger b\|_2^2.
\end{aligned}\tag{5}$$

Combining (3), (4) and (5) yields

$$\begin{aligned}
\mathbb{E}[\|x^k-A^\dagger b\|_2^2]
&\le\left(1+\frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k-\widetilde{x}^k\|_2^2]+(1+\varepsilon)\mathbb{E}[\|\widetilde{x}^k-A^\dagger b\|_2^2]\\
&\le\left(1+\frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)\rho\,\mathbb{E}[\|x^{k-1}-A^\dagger b\|_2^2]\\
&\le\left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2}-A^\dagger b\|_2^2]\\
&\le\cdots\\
&\le\left(1+\frac{1}{\varepsilon}\right)\big(1+(1+\varepsilon)+\cdots+(1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0-b_\perp\|_2^2+(1+\varepsilon)^k\rho^k\|x^0-A^\dagger b\|_2^2\\
&\le(1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0-b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).
\end{aligned}$$
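The final inequality of this chain is not spelled out on the slide. One way to see it, offered here as a worked step rather than as the authors' argument, is to sum the geometric series and drop the negative term:

$$\left(1+\frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l=\frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k-1}{\varepsilon}\le\frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}=(1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.$$

Multiplying by $\alpha^2\rho^k\|z^0-b_\perp\|_2^2/\|A\|_F^2$ gives the first term inside the parentheses of the bound in Theorem 10.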

Remark 4

For the case of REBK with $s=m$, $t=n$ and $\alpha=1$ (i.e., REK), by the orthogonality

$$(\widetilde{x}^k-A^\dagger b)^T(x^k-\widetilde{x}^k)=0,$$

the equation (5) becomes

$$\|x^k-A^\dagger b\|_2^2=\|x^k-\widetilde{x}^k\|_2^2+\|\widetilde{x}^k-A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}[\|x^k-A^\dagger b\|_2^2]\le\rho^k\left(\frac{k\|z^0-b_\perp\|_2^2}{\|A\|_F^2}+\|x^0-A^\dagger b\|_2^2\right).$$

References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


E[xk minus x0983183 |xkminus1] = E[xk |xkminus1]minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minus b)minus x0

983183

= xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx0

983183)minus x0983183

=

983061Iminus αATA

983042A9830422F

983062(xkminus1 minus x0

983183)

Taking expectation gives

E[xk minus x0983183] = E[E[xk minus x0

983183 |xkminus1]] =

983061Iminus αATA

983042A9830422F

983062E[xkminus1 minus x0

983183]

=

983061Iminus αATA

983042A9830422F

983062k

(x0 minus x0983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 12 43

Applying the norms to both sides we obtain

983042E[xk minus x0983183]9830422 le

983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042x0 minus x09831839830422

Here the inequality follows from the fact that

x0 minus x0983183 = AdaggerAx

0 minusAdaggerb isin range(AT)

and Lemma 3

Remark 1

If x0 isin range(AT) then x0983183 = Adaggerb

To ensure convergence of the expected iterate it suffices to have

max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055 lt 1 ie 0 lt α lt2983042A9830422Fσ21(A)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 13 43

Theorem 5

Let xk denote the kth iterate of DSBGS applied to the consistent orinconsistent linear system

Ax = b

with arbitrary x0 isin Rn In exact arithmetic it holds

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

where x983183 is any solution of

ATAx = ATb

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 14 43

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Proof of Theorem 6

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\,\frac{A^T I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(I_{:,\mathcal J})^T I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^4}\,(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\,\frac{I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\,\frac{A^T I_{:,\mathcal I}(I_{:,\mathcal I})^T A}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(x^{k-1} - A^\dagger b).
\end{aligned}
$$

The last inequality follows from $(I_{:,\mathcal J})^T I_{:,\mathcal J} = I$ and Lemma 1. Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T\,\frac{A^TA}{\|A\|_F^2}\,(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
$$
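The update analysed in the proof of Theorem 6 can be tried numerically. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes the block pair $(\mathcal I, \mathcal J)$ is sampled with probability $\|A_{\mathcal I,\mathcal J}\|_F^2/\|A\|_F^2$ (the sampling rule is not restated in this part of the slides). The function name `dsbgs`, its parameters, and the test matrix are hypothetical choices made for illustration.

```python
import random

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters=3000, seed=0):
    """Sketch of x <- x - alpha * I_J (A_IJ)^T (I_I)^T (A x - b) / ||A_IJ||_F^2."""
    rng = random.Random(seed)
    n = len(A[0])
    # All (row block, column block) pairs and their squared Frobenius norms.
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    weights = [sum(A[i][j] ** 2 for i in I for j in J) for I, J in pairs]
    x = [0.0] * n
    for _ in range(iters):
        # Assumed sampling rule: probability proportional to ||A_IJ||_F^2.
        idx = rng.choices(range(len(pairs)), weights=weights)[0]
        I, J = pairs[idx]
        # Residual restricted to the rows in I.
        r = {i: sum(A[i][c] * x[c] for c in range(n)) - b[i] for i in I}
        # Only the coordinates in J are updated.
        for j in J:
            x[j] -= alpha * sum(A[i][j] * r[i] for i in I) / weights[idx]
    return x

# Hypothetical consistent, full-column-rank test problem.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
x_true = [1.0, -1.0]
b = [sum(a * v for a, v in zip(row, x_true)) for row in A]
# t = 2 column blocks, so Theorem 6 asks for 0 < alpha < 1; alpha = 1/t is used here.
x = dsbgs(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]], alpha=0.5)
print(x)
```

With single-entry blocks each step touches one coordinate using one row, so the iterate should approach `x_true` at a rate consistent with the bound above.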

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show

$$x^k - x^0_\star \in \operatorname{range}(A^T)$$

by induction, where

$$x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$
\mathbb{E}\big[\|x^k - x^0_\star\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2.
$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
$$

where

$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).$$
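The step-size ranges in Theorem 7 are exactly the ranges on which the two factors are below one. The short calculation below is an added illustration, with $c = \sigma_r^2(A)/\|A\|_F^2$ a hypothetical shorthand used only here.

```latex
% Case t = n:
1 + \alpha^2 - 2\alpha c < 1 \iff 0 < \alpha < 2c,
\qquad
\min_{\alpha}\,(1 + \alpha^2 - 2\alpha c) = 1 - c^2 \ \text{ at } \alpha = c.

% Case t < n:
1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} < 1
  \iff 0 < \alpha < \frac{2\sigma_r^2(A)}{t\rho},
\qquad
\text{minimum } 1 - \frac{\sigma_r^4(A)}{t\rho\,\|A\|_F^2}
  \ \text{ at } \alpha = \frac{\sigma_r^2(A)}{t\rho}.
```

In both cases the minimizing step size is the midpoint of the admissible interval.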

Proof of Theorem 7

Note that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\,\frac{I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(I_{:,\mathcal J})^T A^TA I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}\,(Ax^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$(I_{:,\mathcal J})^T A^TA I_{:,\mathcal J} = \|A_{:,\mathcal J}\|_F^2$$

(since $A I_{:,\mathcal J} = A_{:,\mathcal J}$ is a column vector) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\,\frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\,\frac{\|A_{:,\mathcal J}\|_F^2\, I_{:,\mathcal I}(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$

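If $t < n$, then it follows from

$$(I_{:,\mathcal J})^T A^TA I_{:,\mathcal J} = (A_{:,\mathcal J})^T A_{:,\mathcal J} \preceq \rho I$$

(since $\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j})$) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\,\frac{\rho\, I_{:,\mathcal I}A_{\mathcal I,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{A I_{:,\mathcal J}(A_{\mathcal I,\mathcal J})^T(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\,\frac{\rho\, I_{:,\mathcal I}(I_{:,\mathcal I})^T}{\|A_{\mathcal I,\mathcal J}\|_F^2}\,(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$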

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\,\frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$

Remark 3

Note that

$$(A_{:,\mathcal J})^T b = (A_{:,\mathcal J})^T A x_\star,$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
$$

where

$$\rho = \max_{1\le j\le t}\sigma_1^2(A_{:,\mathcal J_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

    Let $\{\mathcal I_1, \mathcal I_2, \ldots, \mathcal I_s\}$ and $\{\mathcal J_1, \mathcal J_2, \ldots, \mathcal J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
    Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
    for $k = 1, 2, \ldots$ do
        Pick $j \in [t]$ with probability $\|A_{:,\mathcal J_j}\|_F^2/\|A\|_F^2$.
        Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}\, A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T z^{k-1}$.
        Pick $i \in [s]$ with probability $\|A_{\mathcal I_i,:}\|_F^2/\|A\|_F^2$.
        Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\, (A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big)$.

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
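The two coupled updates of Algorithm 2 can be sketched in a few lines of pure Python. This is an illustrative sketch rather than the authors' code. The function name `rebk`, the default arguments, the starting points $z^0 = b$ and $x^0 = 0$ (which satisfy the stated initial conditions), and the small test problem are all assumptions made for this example.

```python
import random

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK; returns the final x and z."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    # Squared Frobenius norms of the column blocks and of the row blocks.
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J^T z) / ||A_J||_F^2.
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = {j: sum(A[i][j] * z[i] for i in range(m)) for j in J}
        for i in range(m):
            z[i] -= alpha * sum(A[i][j] * u[j] for j in J) / col_w[jb]
        # Row step: x <- x - alpha * A_I^T (A_I x - b_I + z_I) / ||A_I||_F^2.
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = {i: sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I}
        for j in range(n):
            x[j] -= alpha * sum(A[i][j] * r[i] for i in I) / row_w[ib]
    return x, z

# Hypothetical inconsistent system; its least-squares solution is (0, 1)
# and the component of b orthogonal to range(A) is (1, 1, -1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]])
print(x, z)
```

With single-row and single-column blocks and `alpha=1.0` this is the REK special case mentioned above; passing larger blocks exercises the block variant with the same code path.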

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then by the law of total expectation we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

It holds

$$
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right),
$$

where

$$\delta = \max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$

Proof. By $A^Tb = A^TAA^\dagger b$ and

$$
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\,(A_{\mathcal I_i,:})^T\big(A_{\mathcal I_i,:}x^{k-1} - b_{\mathcal I_i} + z^k_{\mathcal I_i}\big),
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,z^{k-1}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
$$

Applying the norms to both sides, we obtain

$$
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^Tz^0 \in \operatorname{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

$$b_\perp = (I - AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
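The claim that $b_\perp$ is the projection of $z^0$, and not only of $b$, can be checked in one line. The following worked step is an added illustration; the vector $y$ is a hypothetical name for the coefficient vector in $z^0 = b + Ay$.

```latex
z^0 = b + Ay
\;\Longrightarrow\;
(I - AA^\dagger)z^0 = (I - AA^\dagger)b + (A - AA^\dagger A)y = (I - AA^\dagger)b = b_\perp,
\qquad\text{since } AA^\dagger A = A.
```

Here $I - AA^\dagger$ is the orthogonal projector onto $\operatorname{range}(A)^\perp = \{z \mid A^Tz = 0\}$, and consequently $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$.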

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \operatorname{range}(A).$$

It holds

$$\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k\|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal J_j})^T b_\perp = 0$ we have

$$
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal J_j}\|_F^2}\,A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp). \tag{1}
$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$ we can show by induction that

$$z^k - b_\perp \in \operatorname{range}(A).$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal J_j}\|_F^4}\,(z^{k-1} - b_\perp)^T A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T A_{:,\mathcal J_j}(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal J_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking the conditional expectation given the first $k-1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] &= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$

Proof. Let

$$
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\,(A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b),
$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

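It follows from

$$
x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{\mathcal I_i,:}\|_F^2}\,(A_{\mathcal I_i,:})^T\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)
$$

that

$$
\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}\,\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big)^T A_{\mathcal I_i,:}(A_{\mathcal I_i,:})^T\big(b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\big) \\
&\le \frac{\alpha^2\,\|b_{\mathcal I_i} - A_{\mathcal I_i,:}A^\dagger b - z^k_{\mathcal I_i}\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad \text{(by Lemma 1)}. \tag{2}
\end{aligned}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \widetilde{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
$$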

that

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}. \tag{3}
\end{aligned}
$$

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal I_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^T(A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(A_{\mathcal I_i,:})^T A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal I_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$

which yields

$$
\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
$$

Note that for any $\varepsilon > 0$, we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widetilde{x}^k\|_2^2 + (1+\varepsilon)\|\widetilde{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}
$$
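The last step of (5) bounds the cross term with the elementary inequality below. It is spelled out here as an added illustration; $u$ and $v$ are hypothetical shorthands for the two norms.

```latex
u = \|x^k - \widetilde{x}^k\|_2,\quad v = \|\widetilde{x}^k - A^\dagger b\|_2:
\qquad
0 \le \left(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,v\right)^2
  = \frac{u^2}{\varepsilon} - 2uv + \varepsilon v^2
\;\Longrightarrow\;
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2 .
```

Adding $u^2 + v^2$ to both sides gives $u^2 + v^2 + 2uv \le (1 + 1/\varepsilon)u^2 + (1 + \varepsilon)v^2$, which is (5).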

Combining (3), (4), and (5) yields

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big] + (1+\varepsilon)\,\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
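The final inequality of the chain uses a geometric sum. The following short calculation is added as an illustration of that step only.

```latex
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
  = \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
  = (1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.
```

Multiplying by $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2/\|A\|_F^2$ gives the first term inside the parentheses of the bound in Theorem 10.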

Remark 4

For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence bound for REK:

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.

Applying the norms to both sides, we obtain

$$
\|\mathbb{E}[x^k - x^0_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|x^0 - x^0_\star\|_2.
$$

Here the inequality follows from the fact that

$$x^0 - x^0_\star = A^\dagger Ax^0 - A^\dagger b \in \operatorname{range}(A^T)$$

and Lemma 3.

Remark 1

If $x^0 \in \operatorname{range}(A^T)$, then $x^0_\star = A^\dagger b$.

To ensure convergence of the expected iterate, it suffices to have

$$
\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1, \quad\text{i.e.,}\quad 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)}.
$$
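The equivalence stated in Remark 1 can be unpacked as follows. This worked step is an added illustration and uses only the ordering $\sigma_1(A) \ge \cdots \ge \sigma_r(A) > 0$ and the identity $\|A\|_F^2 = \sum_{i=1}^r \sigma_i^2(A)$.

```latex
\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right| < 1
\iff 0 < \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} < 2
\iff 0 < \alpha < \frac{2\|A\|_F^2}{\sigma_i^2(A)},
\qquad
\min_{1\le i\le r}\frac{2\|A\|_F^2}{\sigma_i^2(A)} = \frac{2\|A\|_F^2}{\sigma_1^2(A)} \ge 2 .
```

The binding constraint is therefore the one for the largest singular value, and any $\alpha \in (0, 2)$ lies inside the admissible range.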

Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic, it holds

$$
\|\mathbb{E}[Ax^k - Ax_\star]\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^k \|Ax^0 - Ax_\star\|_2,
$$

where $x_\star$ is any solution of

$$A^TAx = A^Tb.$$

Proof of Theorem 5

Note that the conditional expectation given $x^{k-1}$ satisfies

$$
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad \text{(by } A^Tb = A^TAx_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \frac{\alpha AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \frac{\alpha AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
$$

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43

Proof of Theorem 6

983042xk minusAdaggerb98304222

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062A(xkminus1 minusAdaggerb)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minusAdaggerbminus α

983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

9830569830569830569830562

2

= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)

TA

983042AIJ 9830424F

983062(xkminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATII(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43

The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives

E[983042xk minusAdaggerb98304222 |xkminus1]

le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)

Taking expectation again gives

E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062E[983042xkminus1 minusAdaggerb98304222]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43

Remark 2

If t = 1 and x0 isin range(AT) we can show

xk minus x0983183 isin range(AT)

by induction where

x0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the set

x isin Rn | Ax = b

Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound

E[983042xk minus x098318398304222] le

9830611minus (2αminus α2)σ2

r (A)

983042A9830422F

983062k

983042x0 minus x098318398304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43

Theorem 7

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)

Ax = b

with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then

E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minus b98304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43

Proof of Theorem 7

Note that

983042Axk minus b98304222

=

983056983056983056983056Axkminus1 minus α

983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minus b

9830569830569830569830562

2

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

If t = n then it follows from

(IJ )TATAIJ = 983042AJ 9830422F

(since AIJ = AJ is a column vector) that

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43

983042Axk minus b98304222

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611 + α2 minus 2ασ2

r(A)

983042A9830422F

983062983042Axkminus1 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let Ekminus1

983045middot983046denote the conditional expectation conditioned on the first

k minus 1 iterations of REBK That is

Ekminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1

983046

where jl is the lth column block chosen and il is the lth row blockchosen

We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as

Eikminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk

983046

Then by the law of total expectation we have

Ekminus1

983045middot983046= Ekminus1

983045Eikminus1

983045middot983046983046

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43

41 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

Ax = b

let xk be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

It holds

983042E983045xk minusAdaggerb

9830469830422 le δk

983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

where

δ = max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43

Proof By ATb = ATAAdaggerb and

xk minusAdaggerb = xkminus1 minusAdaggerbminus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

we have

Ekminus1

983045xk minusAdaggerb

983046= Ekminus1

983045Eikminus1

983045xk minusAdaggerb

983046983046

= xkminus1 minusAdaggerbminus Ekminus1

983063αAT(Axkminus1 minus b+ zk)

983042A9830422F

983064

= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb

983042A9830422Fminus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422F

983061Iminus α

AAT

983042A9830422F

983062zkminus1

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422Fzkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{2}\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]
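The step that replaces $\mathbb{E}[z^{k-1}]$ by a matrix power applied to $z^0$ is not spelled out on the slide. A sketch of the missing reasoning, assuming (as the expression used for $\mathbb{E}_{k-1}[z^k]$ earlier in the proof indicates) that each $z$-update contracts in expectation by $I - \alpha AA^T/\|A\|_F^2$:

\[
\mathbb{E}[z^{k-1}] = \left(I - \alpha\frac{A A^T}{\|A\|_F^2}\right)^{k-1} z^0,
\qquad
A^T\left(I - \alpha\frac{A A^T}{\|A\|_F^2}\right)^{k-1} = \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k-1} A^T,
\]

so that

\[
\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\,\mathbb{E}[z^{k-1}]
= \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2}.
\]

The same term therefore appears at every level of the recursion, which is why its coefficient grows from $\alpha$ to $2\alpha$ and finally to $\alpha k$.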

Applying the norms to both sides we obtain

\[
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k\left(I - \alpha\frac{A^T A}{\|A\|_F^2}\right)^{k}\frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k\left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]

Here the last inequality follows from the facts that

\[ x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T), \]

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

\[ \rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}. \]

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

\[ b_\perp = (I - AA^\dagger)b, \]

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

\[ Ax = b, \]

let $z^k$ be the vector generated in REBK with

\[ z^0 \in b + \operatorname{range}(A). \]

It holds

\[ \mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k \|z^0 - b_\perp\|_2^2. \]

Proof. By $(A_{\mathcal{J}_j})^T b_\perp = 0$ we have

\[ z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{\mathcal{J}_j}\|_F^2}\, A_{\mathcal{J}_j}(A_{\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1} \]

By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$ we can show by induction that

\[ z^k - b_\perp \in \operatorname{range}(A). \]

It follows from (1) that

\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha\,\frac{\|(A_{\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{J}_j}\|_F^4}\,(z^{k-1} - b_\perp)^T A_{\mathcal{J}_j}(A_{\mathcal{J}_j})^T A_{\mathcal{J}_j}(A_{\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|(A_{\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking the expectation conditioned on the first $k-1$ iterations yields

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]

This completes the proof.

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

\[ z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T). \]

For any $\varepsilon > 0$ it holds

\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]

Proof. Let

\[ \widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i}\|_F^2}\,(A_{\mathcal{I}_i})^T A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b), \]

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

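It follows from

\[ x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i}\|_F^2}\,(A_{\mathcal{I}_i})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i}A^\dagger b - z^k_{\mathcal{I}_i}\big) \]

that

\[
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i}\|_F^4}\,\big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i}A^\dagger b - z^k_{\mathcal{I}_i}\big)^T A_{\mathcal{I}_i}(A_{\mathcal{I}_i})^T \big(b_{\mathcal{I}_i} - A_{\mathcal{I}_i}A^\dagger b - z^k_{\mathcal{I}_i}\big) \\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} \quad \text{(by Lemma 1)}. 
\end{aligned}
\tag{2}
\]

It follows from

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widehat{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \widehat{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
\]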

that

\[
\begin{aligned}
\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned}
\tag{3}
\]

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have

\[ x^0 - A^\dagger b \in \operatorname{range}(A^T). \]

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

\[
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,\frac{\|A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i})^T A_{\mathcal{I}_i}(A_{\mathcal{I}_i})^T A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A_{\mathcal{I}_i}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\widehat{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]

which yields

\[ \mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4} \]

Note that for any $\varepsilon > 0$ we have

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \widehat{x}^k\|_2\,\|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\,\|\widehat{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
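The last step in (5) bounds the cross term. A short justification, using only the elementary inequality $2ab \le a^2/\varepsilon + \varepsilon b^2$ for $a, b \ge 0$ and $\varepsilon > 0$ (which follows from $(a/\sqrt{\varepsilon} - \sqrt{\varepsilon}\,b)^2 \ge 0$):

\[
2\,\|x^k - \widehat{x}^k\|_2\,\|\widehat{x}^k - A^\dagger b\|_2
\le \frac{1}{\varepsilon}\,\|x^k - \widehat{x}^k\|_2^2 + \varepsilon\,\|\widehat{x}^k - A^\dagger b\|_2^2.
\]

Adding the two squared norms already present gives the coefficients $1 + 1/\varepsilon$ and $1 + \varepsilon$.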

Combining (3), (4) and (5) yields

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \widehat{x}^k\|_2^2] + (1+\varepsilon)\,\mathbb{E}[\|\widehat{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\,\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
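The final inequality in this chain collapses the geometric sum. One way to see it, written out as a side calculation:

\[
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l
= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
\]

Factoring $(1+\varepsilon)^k\rho^k$ out of both terms then gives the stated bound.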

Remark 4

For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

\[ (\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0, \]

the inequality (5) becomes the equality

\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2, \]

which yields the following convergence for REK:

\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right). \]
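To get a feel for how the general bound of Theorem 10 compares with the sharper REK bound of Remark 4, the following Python sketch evaluates both right-hand sides. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def theorem10_bound(k, rho, eps, alpha, z_err2, x_err2, fro2):
    """Right-hand side of the Theorem 10 bound for REBK."""
    scale = ((1 + eps) * rho) ** k
    return scale * ((1 + eps) * alpha ** 2 * z_err2 / (eps ** 2 * fro2) + x_err2)


def rek_bound(k, rho, z_err2, x_err2, fro2):
    """Right-hand side of the Remark 4 bound for REK (alpha = 1)."""
    return rho ** k * (k * z_err2 / fro2 + x_err2)


# Hypothetical numbers: rho = 1 - 2/17, ||A||_F^2 = 17, unit initial errors.
rho, fro2 = 1 - 2 / 17, 17.0
for k in (10, 50, 100):
    print(k, rek_bound(k, rho, 1.0, 1.0, fro2),
          theorem10_bound(k, rho, 0.05, 1.0, 1.0, 1.0, fro2))
```

Note that the Theorem 10 bound only decays when $(1+\varepsilon)\rho < 1$, so $\varepsilon$ has to be chosen small enough relative to $\rho$.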

References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


Theorem 5

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent or inconsistent linear system

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. In exact arithmetic it holds

\[ \big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k} \|Ax^0 - Ax_\star\|_2, \]

where $x_\star$ is any solution of

\[ A^T A x = A^T b. \]

Proof of Theorem 5

Note that the expectation conditioned on $x^{k-1}$ satisfies

\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]
&= A\big(\mathbb{E}[x^k \mid x^{k-1}] - x_\star\big) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - b) - x_\star\right) \\
&= A\left(x^{k-1} - \alpha\,\frac{A^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) - x_\star\right) \quad \text{(by } A^T b = A^T A x_\star\text{)} \\
&= Ax^{k-1} - Ax_\star - \alpha\,\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - Ax_\star) \\
&= \left(I - \alpha\,\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - Ax_\star).
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[Ax^k - Ax_\star]
&= \mathbb{E}\big[\mathbb{E}[Ax^k - Ax_\star \mid x^{k-1}]\big] \\
&= \left(I - \alpha\,\frac{AA^T}{\|A\|_F^2}\right)\mathbb{E}[Ax^{k-1} - Ax_\star] \\
&= \left(I - \alpha\,\frac{AA^T}{\|A\|_F^2}\right)^{k}(Ax^0 - Ax_\star).
\end{aligned}
\]

Applying the norms to both sides we obtain

\[ \big\|\mathbb{E}[Ax^k - Ax_\star]\big\|_2 \le \left(\max_{1\le i\le r}\left|1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2}\right|\right)^{k} \|Ax^0 - Ax_\star\|_2. \]

Here the inequality follows from the fact that

\[ Ax^0 - Ax_\star \in \operatorname{range}(A) \]

and Lemma 3.

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

\[ 0 < \alpha < 2/t. \]

In exact arithmetic it holds

\[ \mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^{k} \|x^0 - A^\dagger b\|_2^2. \]

Proof of Theorem 6

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha\left(\frac{I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha\left(\frac{I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)A(x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha\left(\frac{I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\left(\frac{I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\left(\frac{A^T I_{\mathcal{I}} A_{\mathcal{I},\mathcal{J}}(I_{\mathcal{J}})^T I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,(x^{k-1} - A^\dagger b)^T\left(\frac{I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T\left(\frac{A^T I_{\mathcal{I}}(I_{\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b).
\end{aligned}
\]

The last inequality follows from $(I_{\mathcal{J}})^T I_{\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)\,(x^{k-1} - A^\dagger b)^T\left(\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\,\sigma_n^2(A)}{\|A\|_F^2}\right)^{k}\|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]
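The update analyzed in the proof of Theorem 6 can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's reference implementation: the function name `dsbgs`, the test matrix, and the sampling rule (block pair $(\mathcal{I}, \mathcal{J})$ drawn with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2/\|A\|_F^2$, inferred here from the expectation $\mathbb{E}[x^k \mid x^{k-1}] = x^{k-1} - \alpha A^T(Ax^{k-1} - b)/\|A\|_F^2$ used in the proof of Theorem 5) are assumptions.

```python
import random


def dsbgs(A, b, row_blocks, col_blocks, alpha, iters=3000, seed=0):
    """Sketch of a doubly stochastic block Gauss-Seidel iteration (x^0 = 0)."""
    rng = random.Random(seed)
    n = len(A[0])
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    # ||A_{I,J}||_F^2 for every block pair; also used as sampling weights (assumed rule).
    w = [sum(A[i][j] ** 2 for i in I for j in J) for I, J in pairs]
    x = [0.0] * n
    for _ in range(iters):
        p = rng.choices(range(len(pairs)), weights=w)[0]
        I, J = pairs[p]
        # Residual restricted to the chosen rows: (I_I)^T (A x - b).
        r = [sum(A[i][l] * x[l] for l in range(n)) - b[i] for i in I]
        # Only the coordinates in J are updated.
        for j in J:
            x[j] -= alpha / w[p] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x


# Hypothetical consistent, full-column-rank system with solution (1, -1).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [1.0, -2.0, 0.0]
x = dsbgs(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]], alpha=0.5)
print(x)
```

Here $t = 2$ column blocks are used, so the step size $\alpha = 0.5$ satisfies the condition $0 < \alpha < 2/t$ of Theorem 6.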

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$ we can show

\[ x^k - x^0_\star \in \operatorname{range}(A^T) \]

by induction, where

\[ x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b, \]

i.e., the projection of $x^0$ onto the set

\[ \{x \in \mathbb{R}^n \mid Ax = b\}. \]

Then for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

\[ \mathbb{E}[\|x^k - x^0_\star\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|x^0 - x^0_\star\|_2^2. \]

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[ \mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2. \]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[ \mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2, \]

where

\[ \rho = \max_{1\le j\le t} \sigma_1^2(A_{\mathcal{J}_j}). \]
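For the case $t = n$, the step size that minimizes the contraction factor in Theorem 7 can be worked out directly. This is a side calculation, not a statement from the lecture:

\[
\frac{d}{d\alpha}\left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) = 0
\iff \alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2},
\qquad\text{giving the factor}\qquad
1 - \frac{\sigma_r^4(A)}{\|A\|_F^4}.
\]

This choice lies in the middle of the admissible interval $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$.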

Proof of Theorem 7

Note that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{I_{\mathcal{I}} A_{\mathcal{I},\mathcal{J}}(I_{\mathcal{J}})^T A^T A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
\]

If $t = n$, then it follows from

\[ (I_{\mathcal{J}})^T A^T A I_{\mathcal{J}} = \|A_{\mathcal{J}}\|_F^2 \]

(since $A I_{\mathcal{J}} = A_{\mathcal{J}}$ is a column vector) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{\mathcal{J}}\|_F^2\, I_{\mathcal{I}} A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\|A_{\mathcal{J}}\|_F^2\, I_{\mathcal{I}}(I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\,\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
\]

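If $t < n$, then it follows from

\[ (I_{\mathcal{J}})^T A^T A I_{\mathcal{J}} = A_{\mathcal{J}}^T A_{\mathcal{J}} \preceq \rho I \]

(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{\mathcal{J}_j})$) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{\mathcal{I}} A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{A I_{\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T (I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T\left(\frac{\rho\, I_{\mathcal{I}}(I_{\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
\]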

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}\big[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]\big] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
\]

Remark 3

Note that

\[ (A_{\mathcal{J}})^T b = (A_{\mathcal{J}})^T A x_\star, \]

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2. \]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[ \mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2, \]

where

\[ \rho = \max_{1\le j\le t} \sigma_1^2(A_{\mathcal{J}_j}). \]

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{J}_j}\|_F^2}\, A_{\mathcal{J}_j}(A_{\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i}\|_F^2}\,(A_{\mathcal{I}_i})^T \big(A_{\mathcal{I}_i} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$

By choosing $s = m$, $t = n$ and $\alpha = 1$ we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
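The steps of Algorithm 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rebk`, the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the initialization conditions), the fixed iteration count, and the small test system are all hypothetical.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=3000, seed=0):
    """Sketch of randomized extended block Kaczmarz; returns (x, z)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of each row block and column block (sampling weights).
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha / ||A_J||_F^2 * A_J (A_J)^T z
        jj = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jj]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]
        for i in range(m):
            z[i] -= alpha / col_w[jj] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # Row step: x <- x - alpha / ||A_I||_F^2 * (A_I)^T (A_I x - b_I + z_I)
        ii = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ii]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ii] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Hypothetical inconsistent system: b = A(1, -1) + (2, 1, -5), where (2, 1, -5)
# is orthogonal to both columns of A, so the least-squares solution is (1, -1).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
b = [3.0, -1.0, -5.0]
x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]], alpha=1.0)
print(x, z)
```

With single-row and single-column blocks and $\alpha = 1$, as in the usage above, the sketch reduces to REK; passing larger blocks such as `row_blocks=[[0, 1], [2]]` exercises the block variant without any pseudoinverse computation.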

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

\[ \mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]

Then by the law of total expectation we have

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big]. \]



Page 15: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Proof of Theorem 5

Note that the conditioned expectation on xkminus1

E[Axk minusAx983183 |xkminus1]

= A(E[xk |xkminus1]minus x983183)

= A

983061xkminus1 minus α

AT

983042A9830422F(Axkminus1 minus b)minus x983183

983062

= A

983061xkminus1 minus αAT

983042A9830422F(Axkminus1 minusAx983183)minus x983183

983062

(by ATb = ATAx983183)

= Axkminus1 minusAx983183 minusαAAT

983042A9830422F(Axkminus1 minusAx983183)

=

983061Iminus αAAT

983042A9830422F

983062(Axkminus1 minusAx983183)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 15 43

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

\[ 0 < \alpha < 2/t. \]

In exact arithmetic, it holds that

\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\]
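To make the setting of Theorem 6 concrete, here is a hedged Python sketch of a doubly stochastic block Gauss-Seidel iteration, assuming NumPy. The update is read off from the first line of the proof of Theorem 6. The sampling rule (pick the block pair with probability proportional to the squared Frobenius norm of the corresponding submatrix) is an assumption inferred from the expectation step of that proof, and the function name `dsbgs`, the block sizes, and the step size are illustrative only.

```python
import numpy as np

def dsbgs(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of a doubly stochastic block Gauss-Seidel iteration.

    Assumption: the pair (I, J) is sampled with probability
    ||A[I, J]||_F^2 / ||A||_F^2, as the expectation step of the proof suggests.
    """
    rng = np.random.default_rng(seed)
    pairs = [(I, J) for I in row_blocks for J in col_blocks]
    norms2 = np.array([np.linalg.norm(A[np.ix_(I, J)], "fro") ** 2 for I, J in pairs])
    probs = norms2 / norms2.sum()
    x = np.zeros(A.shape[1])  # x^0 = 0 (the theorem allows any starting point)
    for _ in range(iters):
        idx = rng.choice(len(pairs), p=probs)
        I, J = pairs[idx]
        A_IJ = A[np.ix_(I, J)]
        residual_I = A[I, :] @ x - b[I]  # rows I of the residual A x - b
        x[J] -= alpha * (A_IJ.T @ residual_I) / norms2[idx]  # only block J of x changes
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 6))
x_true = rng.standard_normal(6)
b = A @ x_true  # consistent system with full column rank
row_blocks = np.array_split(np.arange(30), 5)
col_blocks = np.array_split(np.arange(6), 3)  # t = 3, so alpha must lie in (0, 2/3)
x = dsbgs(A, b, row_blocks, col_blocks, alpha=1 / 3, iters=5000)
err = np.linalg.norm(x - x_true)
print(err)
```

With $t = 3$ the choice $\alpha = 1/t$ maximizes $2\alpha - t\alpha^2$, which is the quantity that drives the rate in the bound.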

Proof of Theorem 6

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\| x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,A(x^{k-1} - A^\dagger b) - A^\dagger b \right\|_2^2 \\
&= \left\| x^{k-1} - A^\dagger b - \alpha\,\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(x^{k-1} - A^\dagger b) \right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left(\frac{A^T I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha (x^{k-1} - A^\dagger b)^T \left(\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2 (x^{k-1} - A^\dagger b)^T \left(\frac{A^T I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T A}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(x^{k-1} - A^\dagger b).
\end{aligned}
\]

The last inequality follows from $(I_{:,\mathcal{J}})^T I_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T \left(\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^k \|x^0 - A^\dagger b\|_2^2.
\end{aligned}
\]

Remark 2

If $t = 1$ and $x^0 \in \mathrm{range}(A^T)$, we can show

\[ x^k - x_\star^0 \in \mathrm{range}(A^T) \]

by induction, where

\[ x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b, \]

i.e., the projection of $x^0$ onto the set

\[ \{x \in \mathbb{R}^n \mid Ax = b\}. \]

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

\[
\mathbb{E}[\|x^k - x_\star^0\|_2^2] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x_\star^0\|_2^2.
\]

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[
\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[
\mathbb{E}[\|Ax^k - b\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]

where

\[ \rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]

Proof of Theorem 7

Note that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
\]

If $t = n$, then it follows from

\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2 \]

(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]

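If $t < n$, then it follows from

\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I \]

(since $\rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left(\frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right)(Ax^{k-1} - b) \quad (\text{by Lemma 1}).
\end{aligned}
\]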

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|Ax^k - b\|_2^2]
&= \mathbb{E}[\mathbb{E}[\|Ax^k - b\|_2^2 \mid x^{k-1}]] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}[\|Ax^{k-1} - b\|_2^2] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]

Remark 3

Note that

\[ (A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star, \]

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[
\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[
\mathbb{E}[\|Ax^k - Ax_\star\|_2^2] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]

where

\[ \rho = \max_{1\le j\le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2/\|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
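The two updates of Algorithm 2 can be sketched in Python as follows, assuming NumPy. This is an illustrative reading of the algorithm rather than a reference implementation: the function name `rebk`, the concrete starting points $z^0 = b$ and $x^0 = 0$ (which satisfy the stated initialization conditions), the block sizes, and the iteration count are all choices made here for the example.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha, iters, seed=0):
    """Sketch of Algorithm 2 (REBK), started from z^0 = b and x^0 = 0."""
    rng = np.random.default_rng(seed)
    row_n2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_n2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_p = row_n2 / row_n2.sum()  # ||A_{I_i,:}||_F^2 / ||A||_F^2
    col_p = col_n2 / col_n2.sum()  # ||A_{:,J_j}||_F^2 / ||A||_F^2
    z = b.astype(float).copy()     # z^0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])       # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: pushes z toward the component of b orthogonal to range(A).
        j = rng.choice(len(col_blocks), p=col_p)
        A_J = A[:, col_blocks[j]]
        z -= alpha / col_n2[j] * (A_J @ (A_J.T @ z))
        # Row-block step: Kaczmarz-type update on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        A_I = A[I, :]
        x -= alpha / row_n2[i] * (A_I.T @ (A_I @ x - b[I] + z[I]))
    return x, z

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)  # a generic right-hand side, so the system is inconsistent
rows = np.array_split(np.arange(20), 4)
cols = np.array_split(np.arange(5), 2)
x, z = rebk(A, b, rows, cols, alpha=1.0, iters=3000)
x_ls = np.linalg.pinv(A) @ b  # pseudoinverse solution, used only for comparison
print(np.linalg.norm(x - x_ls))
```

Note that the iteration itself uses only products with blocks of $A$ and their transposes; `np.linalg.pinv` appears solely to check the result.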

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

\[ \mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]

Then, by the law of total expectation, we have

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big]. \]

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

\[ Ax = b, \]

let $x^k$ be the $k$th iterate of REBK with

\[ z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T). \]

It holds that

\[
\|\mathbb{E}[x^k - A^\dagger b]\|_2 \le \delta^k \left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2}\right),
\]

where

\[ \delta = \max_{1\le i\le r} \left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|. \]

Proof. By $A^Tb = A^TAA^\dagger b$ and

\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^2\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
\]

Applying the norms to both sides, we obtain

\[
\begin{aligned}
\|\mathbb{E}[x^k - A^\dagger b]\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^k\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
\]

Here the last inequality follows from the facts that

\[ x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T), \]

and Lemma 3.

4.2 Convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$

The convergence of $\mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ depends on the positive number $\rho$ defined as

\[ \rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}. \]

In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

\[ b_\perp = (I - AA^\dagger)b, \]

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

\[ Ax = b, \]

let $z^k$ be the vector generated in REBK with

\[ z^0 \in b + \mathrm{range}(A). \]

It holds that

\[ \mathbb{E}[\|z^k - b_\perp\|_2^2] \le \rho^k\|z^0 - b_\perp\|_2^2. \]

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$, we have

\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \tag{1}
\]

By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

\[ z^k - b_\perp \in \mathrm{range}(A). \]

It follows from (1) that

\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad (\text{by Lemma 1}).
\end{aligned}
\]

Taking the expectation conditioned on the first $k-1$ iterations yields

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad (\text{by Lemma 2}).
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}[\|z^k - b_\perp\|_2^2]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[\|z^k - b_\perp\|_2^2]\big] \\
&\le \rho\,\mathbb{E}[\|z^{k-1} - b_\perp\|_2^2] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
\]

This completes the proof.
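The one-step contraction behind Lemma 9 can be checked exactly on a small example, because the expectation over the column block is a finite weighted sum. The sketch below assumes NumPy; the matrix sizes, the step size $\alpha = 0.8$, the starting point $z^0 = b$, and the two-block column partition are illustrative choices, and `np.linalg.pinv` is used only to form $b_\perp$ for the comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
alpha = 0.8
col_blocks = np.array_split(np.arange(4), 2)

fro2 = np.linalg.norm(A, "fro") ** 2
sigma_r = np.linalg.svd(A, compute_uv=False)[-1]  # smallest singular value (A has full column rank here)
rho = 1.0 - (2 * alpha - alpha**2) * sigma_r**2 / fro2

b_perp = b - A @ (np.linalg.pinv(A) @ b)  # (I - A A^+) b
z0 = b.copy()                             # z^0 = b lies in b + range(A)

# Exact one-step expectation: sum over column blocks, weighted by their sampling probabilities.
expected = 0.0
for J in col_blocks:
    A_J = A[:, J]
    n2 = np.linalg.norm(A_J, "fro") ** 2
    z1 = z0 - alpha / n2 * (A_J @ (A_J.T @ z0))
    expected += (n2 / fro2) * np.linalg.norm(z1 - b_perp) ** 2

bound = rho * np.linalg.norm(z0 - b_perp) ** 2
print(expected <= bound)
```

In typical runs the exact expectation sits well below the bound, since $\rho$ only uses the smallest nonzero singular value.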

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

\[ z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T). \]

For any $\varepsilon > 0$, it holds that

\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]

Proof. Let

\[
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),
\]

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

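It follows from

\[
x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})
\]

that

\[
\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1}). \tag{2}
\end{aligned}
\]

It follows from

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|x^k - \widetilde{x}^k\|_2^2]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\|x^k - \widetilde{x}^k\|_2^2]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad (\text{by (2)})
\end{aligned}
\]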

that

\[
\begin{aligned}
\mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}[\|b - AA^\dagger b - z^k\|_2^2] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad (\text{by Lemma 9}). \tag{3}
\end{aligned}
\]

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have

\[ x^0 - A^\dagger b \in \mathrm{range}(A^T). \]

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

\[
\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad (\text{by Lemma 1})
\end{aligned}
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}[\|\widetilde{x}^k - A^\dagger b\|_2^2]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad (\text{by Lemma 3}),
\end{aligned}
\]

which yields

\[ \mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \le \rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2]. \tag{4} \]

Note that for any $\varepsilon > 0$, we have

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \\
&\le (\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widetilde{x}^k\|_2^2 + (1+\varepsilon)\|\widetilde{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}
\]
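The final step of (5) is a weighted form of Young's inequality. As a short worked version, write $u$ and $v$ for the two nonnegative norms that are added inside the square in (5); this shorthand is introduced here only for illustration. For any $\varepsilon > 0$,

\[
0 \le \left(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,v\right)^2 = \frac{u^2}{\varepsilon} - 2uv + \varepsilon v^2
\quad\Longrightarrow\quad
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2,
\]

and therefore

\[
u^2 + v^2 + 2uv \le \left(1 + \frac{1}{\varepsilon}\right)u^2 + (1+\varepsilon)v^2.
\]

A small $\varepsilon$ keeps the factor on $v^2$ close to 1 at the price of a large factor on $u^2$, which is the trade-off that appears in the bound of Theorem 10.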

Combining (3), (4), and (5) yields

\[
\begin{aligned}
\mathbb{E}[\|x^k - A^\dagger b\|_2^2]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}[\|x^k - \widetilde{x}^k\|_2^2] + (1+\varepsilon)\mathbb{E}[\|\widetilde{x}^k - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}[\|x^{k-1} - A^\dagger b\|_2^2] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)(1 + 1 + \varepsilon)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}[\|x^{k-2} - A^\dagger b\|_2^2] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
\]
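The last inequality in this chain uses a geometric sum. Spelled out as a short worked step (added here for illustration and following directly from the preceding line):

\[
\sum_{l=0}^{k-1}(1+\varepsilon)^l = \frac{(1+\varepsilon)^k - 1}{\varepsilon} \le \frac{(1+\varepsilon)^k}{\varepsilon},
\qquad
\left(1 + \frac{1}{\varepsilon}\right)\frac{(1+\varepsilon)^k}{\varepsilon} = \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}.
\]

Hence the coefficient of $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2/\|A\|_F^2$ is at most $(1+\varepsilon)^{k+1}/\varepsilon^2$, and factoring out $(1+\varepsilon)^k\rho^k$ leaves the term $(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2/(\varepsilon^2\|A\|_F^2)$ that appears in the final bound.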

Remark 4

For REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

\[ (\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0, \]

the inequality (5) becomes the equality

\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2, \]

which yields the following convergence bound for REK:

\[
\mathbb{E}[\|x^k - A^\dagger b\|_2^2] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\]

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.

Page 16: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Taking expectation gives

E[Axk minusAx983183] = E[E[Axk minusAx983183 |xkminus1]]

=

983061Iminus αAAT

983042A9830422F

983062E[Axkminus1 minusAx983183]

=

983061Iminus αAAT

983042A9830422F

983062k

(Ax0 minusAx983183)

Applying the norms to both sides we obtain

983042E[Axk minusAx983183]9830422 le983061max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

983062k

983042Ax0 minusAx9831839830422

Here the inequality follows from the fact that

Ax0 minusAx983183 isin range(A)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 16 43

32 Convergence of the expected norms

Theorem 6

Let xk denote the kth iterate of DSBGS applied to the full column rankconsistent linear system

Ax = b

with arbitrary x0 isin Rn Assume

0 lt α lt 2t

In exact arithmetic it holds

E[983042xk minusAdaggerb98304222] le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 17 43

Proof of Theorem 6

983042xk minusAdaggerb98304222

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062A(xkminus1 minusAdaggerb)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minusAdaggerbminus α

983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

9830569830569830569830562

2

= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)

TA

983042AIJ 9830424F

983062(xkminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATII(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43

The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives

E[983042xk minusAdaggerb98304222 |xkminus1]

le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)

Taking expectation again gives

E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062E[983042xkminus1 minusAdaggerb98304222]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43

Remark 2

If t = 1 and x0 isin range(AT) we can show

xk minus x0983183 isin range(AT)

by induction where

x0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the set

x isin Rn | Ax = b

Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound

E[983042xk minus x098318398304222] le

9830611minus (2αminus α2)σ2

r (A)

983042A9830422F

983062k

983042x0 minus x098318398304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43

Theorem 7

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)

Ax = b

with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then

E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minus b98304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43

Proof of Theorem 7

Note that

983042Axk minus b98304222

=

983056983056983056983056Axkminus1 minus α

983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minus b

9830569830569830569830562

2

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061IIAIJ (IJ )TATAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

If t = n then it follows from

(IJ )TATAIJ = 983042AJ 9830422F

(since AIJ = AJ is a column vector) that

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 22 43

983042Axk minus b98304222

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611 + α2 minus 2ασ2

r(A)

983042A9830422F

983062983042Axkminus1 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let Ekminus1

983045middot983046denote the conditional expectation conditioned on the first

k minus 1 iterations of REBK That is

Ekminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1

983046

where jl is the lth column block chosen and il is the lth row blockchosen

We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as

Eikminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk

983046

Then by the law of total expectation we have

Ekminus1

983045middot983046= Ekminus1

983045Eikminus1

983045middot983046983046

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43

41 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

Ax = b

let xk be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

It holds

983042E983045xk minusAdaggerb

9830469830422 le δk

983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

where

δ = max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43

Proof By ATb = ATAAdaggerb and

xk minusAdaggerb = xkminus1 minusAdaggerbminus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

we have

Ekminus1

983045xk minusAdaggerb

983046= Ekminus1

983045Eikminus1

983045xk minusAdaggerb

983046983046

= xkminus1 minusAdaggerbminus Ekminus1

983063αAT(Axkminus1 minus b+ zk)

983042A9830422F

983064

= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb

983042A9830422Fminus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422F

983061Iminus α

AAT

983042A9830422F

983062zkminus1

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422Fzkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43

Taking expectation gives

E983045xk minusAdaggerb

983046

= E983045Ekminus1

983045xk minusAdaggerb

983046983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422FE983045zkminus1

983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

=

983061Iminus α

ATA

983042A9830422F

9830622

E983045xkminus2 minusAdaggerb

983046minus 2α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F= middot middot middot

=

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

Applying the norms to both sides we obtain

$$
\begin{aligned}
\bigl\|\mathbb{E}\bigl[x^k - A^\dagger b\bigr]\bigr\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^Tz^0 \in \operatorname{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$

The convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

$$b_\perp = (I - AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \operatorname{range}(A).$$

It holds

$$\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr] \le \rho^k\|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^Tb_\perp = 0$ we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp). \tag{1}$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$ we can show by induction that

$$z^k - b_\perp \in \operatorname{range}(A).$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^TA_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,\mathcal{J}_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking the expectation conditioned on the first $k-1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]\bigr] \\
&\le \rho\,\mathbb{E}\bigl[\|z^{k-1} - b_\perp\|_2^2\bigr] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.
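The behaviour described by Lemma 9 can be observed on a toy problem. The following is a minimal sketch under assumptions that are not part of the lecture: the 3-by-2 matrix `A`, the right-hand side `b`, the hand-computed `b_perp`, the choice of single-column blocks (t = n), and the helper names `z_step`, `dist2` and `run` are all hypothetical and chosen only for illustration.

```python
import random

# Toy inconsistent system (an assumption for illustration only).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
# Here A^+ b = (0, 1), so b_perp = b - A A^+ b = (1, 1, -1) (worked out by hand).
b_perp = [1.0, 1.0, -1.0]


def z_step(z, j, alpha):
    """One column step z - alpha / ||A_j||^2 * A_j (A_j^T z) for a single column j."""
    col = [row[j] for row in A]
    weight = sum(v * v for v in col)              # ||A_{:,j}||_F^2
    inner = sum(c * zi for c, zi in zip(col, z))  # A_{:,j}^T z
    return [zi - alpha * inner / weight * c for zi, c in zip(z, col)]


def dist2(z):
    """Squared Euclidean distance from z to b_perp."""
    return sum((zi - pi) ** 2 for zi, pi in zip(z, b_perp))


def run(alpha=1.0, iters=200, seed=1):
    rng = random.Random(seed)
    # Column j is sampled with probability ||A_{:,j}||_F^2 / ||A||_F^2.
    weights = [sum(row[j] ** 2 for row in A) for j in range(2)]
    z = list(b)  # z^0 = b lies in b + range(A)
    history = [dist2(z)]
    for _ in range(iters):
        j = rng.choices(range(2), weights=weights)[0]
        z = z_step(z, j, alpha)
        history.append(dist2(z))
    return z, history


if __name__ == "__main__":
    z, history = run()
    print(z, history[-1])
```

For a step size between 0 and 2, each step cannot increase the distance to `b_perp`, so the recorded history is non-increasing; the lemma itself only bounds the expected decrease.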

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$

Proof. Let

$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

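It follows from

$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^T(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]\bigr] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \qquad \text{(by (2))}
\end{aligned}
$$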

that

$$
\begin{aligned}
\mathbb{E}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\bigl[\|b - AA^\dagger b - z^k\|_2^2\bigr] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}
$$

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(A_{\mathcal{I}_i,:})^TA_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}
$$

which yields

$$\mathbb{E}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr] \le \rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr]. \tag{4}$$

Note that for any $\varepsilon > 0$ we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \bigl(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2\|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\|\widehat{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}
$$
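The last step of (5) bounds the cross term with a weighted arithmetic-geometric mean (Young-type) inequality. A short derivation, written with the shorthand $u = \|x^k - \widehat{x}^k\|_2$ and $v = \|\widehat{x}^k - A^\dagger b\|_2$ (these two symbols are introduced here for convenience and do not appear in the slides):

```latex
\[
0 \le \left(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,v\right)^2
  = \frac{u^2}{\varepsilon} - 2uv + \varepsilon v^2
\quad\Longrightarrow\quad
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2,
\]
\[
u^2 + v^2 + 2uv \le \left(1 + \frac{1}{\varepsilon}\right)u^2 + (1 + \varepsilon)v^2 .
\]
```

A small $\varepsilon$ keeps the factor on $v^2$ close to 1 at the price of a large factor on $u^2$, which is the trade-off visible in the final bound of Theorem 10.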

Combining (3), (4) and (5) yields

$$
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\bigl[\|x^k - \widehat{x}^k\|_2^2\bigr] + (1+\varepsilon)\mathbb{E}\bigl[\|\widehat{x}^k - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\bigl[\|x^{k-2} - A^\dagger b\|_2^2\bigr] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
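The final inequality in the chain above rests on a geometric sum. As a worked step (a sketch of the arithmetic only, using nothing beyond the quantities already defined):

```latex
\[
\sum_{l=0}^{k-1}(1+\varepsilon)^l = \frac{(1+\varepsilon)^k - 1}{\varepsilon} \le \frac{(1+\varepsilon)^k}{\varepsilon},
\qquad
1 + \frac{1}{\varepsilon} = \frac{1+\varepsilon}{\varepsilon},
\]
\[
\left(1 + \frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^l \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
= (1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.
\]
```

Factoring $(1+\varepsilon)^k\rho^k$ out of both terms then gives the stated bound.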

Remark 4

For REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k - A^\dagger b)^T(x^k - \widehat{x}^k) = 0,$$

the inequality (5) becomes the equality

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence bound for REK:

$$\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
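The REK bound can be obtained by unrolling a one-step recursion. The sketch below uses the shorthand $e_k = \mathbb{E}[\|x^k - A^\dagger b\|_2^2]$ and $c = \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$, which are introduced here for brevity and are not notation from the slides; it combines the orthogonal splitting with (3) and (4) for $\alpha = 1$:

```latex
\[
e_k \le \rho^k c + \rho\, e_{k-1}
    \le \rho^k c + \rho\left(\rho^{k-1} c + \rho\, e_{k-2}\right)
    = 2\rho^k c + \rho^2 e_{k-2}
    \le \cdots
    \le k\rho^k c + \rho^k e_0 .
\]
```

Compared with Theorem 10, the factor $(1+\varepsilon)^k$ disappears because no cross term has to be bounded.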

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


3.2 Convergence of the expected norms

Theorem 6

Let $x^k$ denote the $k$th iterate of DSBGS applied to the full column rank consistent linear system

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. Assume

$$0 < \alpha < 2/t.$$

In exact arithmetic, it holds

$$\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^{k}\|x^0 - A^\dagger b\|_2^2.$$
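As a worked consequence of Theorem 6, one can ask which step size makes the contraction factor smallest. The function name $f$ below is introduced only for this illustration:

```latex
\[
f(\alpha) = 2\alpha - t\alpha^2, \qquad f'(\alpha) = 2 - 2t\alpha = 0
\;\Longrightarrow\; \alpha = \frac{1}{t}, \qquad f\!\left(\frac{1}{t}\right) = \frac{1}{t},
\]
\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
\le \left(1 - \frac{\sigma_n^2(A)}{t\,\|A\|_F^2}\right)^{k}\|x^0 - A^\dagger b\|_2^2
\quad\text{for } \alpha = \frac{1}{t}.
\]
```

The value $\alpha = 1/t$ is the midpoint of the admissible interval $(0, 2/t)$, and $f(\alpha) > 0$ exactly on that interval, which is why the bound contracts there.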

Proof of Theorem 6

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \left\|x^{k-1} - \alpha\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - \alpha\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}A(x^{k-1} - A^\dagger b) - A^\dagger b\right\|_2^2 \\
&= \left\|x^{k-1} - A^\dagger b - \alpha\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b)\right\|_2^2 \\
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha(x^{k-1} - A^\dagger b)^T\frac{I_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b) \\
&\quad + \alpha^2(x^{k-1} - A^\dagger b)^T\frac{A^TI_{:,\mathcal{I}}(I_{:,\mathcal{I}})^TA}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(x^{k-1} - A^\dagger b).
\end{aligned}
$$

The last inequality follows from $(I_{:,\mathcal{J}})^TI_{:,\mathcal{J}} = I$ and Lemma 1. Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - t\alpha^2)(x^{k-1} - A^\dagger b)^T\frac{A^TA}{\|A\|_F^2}(x^{k-1} - A^\dagger b) \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 - \frac{(2\alpha - t\alpha^2)\sigma_n^2(A)}{\|A\|_F^2}\right)^{k}\|x^0 - A^\dagger b\|_2^2.
\end{aligned}
$$

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show

$$x^k - x_\star^0 \in \operatorname{range}(A^T)$$

by induction, where

$$x_\star^0 = (I - A^\dagger A)x^0 + A^\dagger b,$$

i.e., the projection of $x^0$ onto the set

$$\{x \in \mathbb{R}^n \mid Ax = b\}.$$

Then, for rank-deficient consistent linear systems, by the same approach we can prove the convergence bound

$$\mathbb{E}\bigl[\|x^k - x_\star^0\|_2^2\bigr] \le \left(1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|x^0 - x_\star^0\|_2^2.$$

Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

$$Ax = b$$

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2,$$

where

$$\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$
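Both bounds in Theorem 7 are quadratic in $\alpha$, so the step size giving the smallest factor can be read off directly. The names $g_1$ and $g_2$ are introduced here only for this worked example:

```latex
\[
g_1(\alpha) = 1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}
\;\text{ is minimized at }\;
\alpha = \frac{\sigma_r^2(A)}{\|A\|_F^2},
\qquad
g_1 = 1 - \frac{\sigma_r^4(A)}{\|A\|_F^4} \quad (t = n),
\]
\[
g_2(\alpha) = 1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}
\;\text{ is minimized at }\;
\alpha = \frac{\sigma_r^2(A)}{t\rho},
\qquad
g_2 = 1 - \frac{\sigma_r^4(A)}{t\rho\,\|A\|_F^2} \quad (t < n).
\]
```

In each case the minimizer is the midpoint of the admissible interval for $\alpha$ stated in the theorem.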

Proof of Theorem 7

Note that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2$$

(since $AI_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\|A_{:,\mathcal{J}}\|_F^2\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
$$

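If $t < n$, then it follows from

$$(I_{:,\mathcal{J}})^TA^TAI_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^TA_{:,\mathcal{J}} \preceq \rho I$$

(since $\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal{I}}A_{\mathcal{I},\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AI_{:,\mathcal{J}}(A_{\mathcal{I},\mathcal{J}})^T(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\frac{\rho\,I_{:,\mathcal{I}}(I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
$$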

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\frac{AA^T}{\|A\|_F^2}(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^Tb = (A_{:,\mathcal{J}})^TAx_\star,$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2,$$

where

$$\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,\mathcal{J}_j}).$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}A_{:,\mathcal{J}_j}(A_{:,\mathcal{J}_j})^Tz^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$
end for

By choosing $s = m$, $t = n$ and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
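The steps of Algorithm 2 can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function name `rebk`, its parameters, the dense list-of-rows storage, the fixed iteration count, and the starting points $z^0 = b$ and $x^0 = 0$ (one admissible choice among many) are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK; A is a list of m rows, each a list of n floats."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the column and row blocks (sampling weights).
    col_w = [sum(A[r][c] ** 2 for r in range(m) for c in J) for J in col_blocks]
    row_w = [sum(A[r][c] ** 2 for r in I for c in range(n)) for I in row_blocks]
    z = [float(v) for v in b]  # z^0 = b lies in b + range(A)
    x = [0.0] * n              # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha / ||A_J||_F^2 * A_J (A_J^T z).
        j = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[j]
        u = [sum(A[r][c] * z[r] for r in range(m)) for c in J]  # A_J^T z
        for r in range(m):
            z[r] -= alpha / col_w[j] * sum(A[r][c] * uc for c, uc in zip(J, u))
        # Row step: x <- x - alpha / ||A_I||_F^2 * A_I^T (A_I x - b_I + z_I).
        i = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[i]
        res = [sum(A[r][c] * x[c] for c in range(n)) - b[r] + z[r] for r in I]
        for c in range(n):
            x[c] -= alpha / row_w[i] * sum(A[r][c] * rr for r, rr in zip(I, res))
    return x, z


if __name__ == "__main__":
    # Toy inconsistent system (an assumption); its least-squares solution is (0, 1).
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 0.0]
    # Single-row and single-column blocks with alpha = 1 correspond to REK.
    x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]])
    print(x, z)
```

No pseudoinverse of a block is formed: each step needs only products with a block and its transpose, plus the block's Frobenius norm.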

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\cdot]\bigr].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \operatorname{range}(A) \quad\text{and}\quad x^0 \in \operatorname{range}(A^T).$$

It holds

$$\bigl\|\mathbb{E}\bigl[x^k - A^\dagger b\bigr]\bigr\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),$$

where

$$\delta = \max_{1 \le i \le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.$$
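To see when this bound actually tends to zero, it helps to work out $\delta$ explicitly. The following short calculation uses only $\sigma_i^2(A) \le \sigma_1^2(A) \le \|A\|_F^2$; it is a side note and not part of the slides:

```latex
\[
\delta < 1
\;\Longleftrightarrow\;
0 < \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} < 2 \ \text{ for all } 1 \le i \le r
\;\Longleftrightarrow\;
0 < \alpha < \frac{2\|A\|_F^2}{\sigma_1^2(A)},
\]
\[
0 < \alpha \le 1
\;\Longrightarrow\;
0 \le 1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2} < 1
\;\Longrightarrow\;
\delta = 1 - \frac{\alpha\sigma_r^2(A)}{\|A\|_F^2}.
\]
```

Whenever $\delta < 1$, both $\delta^k$ and $k\delta^k$ tend to zero, so the right-hand side of the bound vanishes as $k \to \infty$ despite the factor $k$.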

Proof. By $A^Tb = A^TAA^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^T(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\bigl[x^k - A^\dagger b\bigr]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}\bigl[x^k - A^\dagger b\bigr]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\bigl[z^k\bigr] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\bigl[z^k\bigr] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
$$


Page 18: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Proof of Theorem 6

983042xk minusAdaggerb98304222

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minus α

983061IJ (AIJ )T(II)

T

983042AIJ 9830422F

983062A(xkminus1 minusAdaggerb)minusAdaggerb

9830569830569830569830562

2

=

983056983056983056983056xkminus1 minusAdaggerbminus α

983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

9830569830569830569830562

2

= 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATIIAIJ (IJ )TIJ (AIJ )T(II)

TA

983042AIJ 9830424F

983062(xkminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus 2α(xkminus1 minusAdaggerb)T983061IJ (AIJ )T(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

+α2(xkminus1 minusAdaggerb)T983061ATII(II)

TA

983042AIJ 9830422F

983062(xkminus1 minusAdaggerb)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 18 43

The last equality follows from (IJ )TIJ = I and Lemma 1 Takingexpectation gives

E[983042xk minusAdaggerb98304222 |xkminus1]

le 983042xkminus1 minusAdaggerb98304222 minus (2αminus tα2)(xkminus1 minusAdaggerb)T983061ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062983042xkminus1 minusAdaggerb98304222 (by Lemma 2)

Taking expectation again gives

E[983042xk minusAdaggerb98304222] = E[E[983042xk minusAdaggerb98304222 |xkminus1]]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062E[983042xkminus1 minusAdaggerb98304222]

le9830611minus (2αminus tα2)σ2

n(A)

983042A9830422F

983062k

983042x0 minusAdaggerb98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 19 43

Remark 2

If t = 1 and x0 isin range(AT) we can show

xk minus x0983183 isin range(AT)

by induction where

x0983183 = (IminusAdaggerA)x0 +Adaggerb

ie the projection of x0 onto the set

x isin Rn | Ax = b

Then for rank deficient consistent linear systems by the sameapproach we can prove the convergence bound

E[983042xk minus x098318398304222] le

9830611minus (2αminus α2)σ2

r (A)

983042A9830422F

983062k

983042x0 minus x098318398304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 20 43

Theorem 7

Let xk denote the kth iterate of DSBGS applied to the consistent linearsystem (full column rank or rank-deficient)

Ax = b

with arbitrary x0 isin Rn If t = n and 0 lt α lt 2σ2r (A)983042A9830422F then

E[983042Axk minus b98304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minus b98304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 21 43

Proof of Theorem 7

Note that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\|Ax^{k-1} - \alpha\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) - b\right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{I_{:,I}A_{I,J}(I_{:,J})^TA^TAI_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$
(I_{:,J})^TA^TAI_{:,J} = \|A_{:,J}\|_F^2
$$

(since $AI_{:,J} = A_{:,J}$ is a column vector) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{:,J}\|_F^2\,I_{:,I}A_{I,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\|A_{:,J}\|_F^2\,I_{:,I}(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
$$

If $t < n$, then it follows from

$$
(I_{:,J})^TA^TAI_{:,J} = (A_{:,J})^TA_{:,J} \preceq \rho I
$$

(since $\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,J_j})$) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\rho\,I_{:,I}A_{I,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^4}\right)(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{A_{:,J}(A_{I,J})^T(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \\
&\quad + \alpha^2(Ax^{k-1} - b)^T\left(\frac{\rho\,I_{:,I}(I_{:,I})^T}{\|A_{I,J}\|_F^2}\right)(Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2 - 2\alpha(Ax^{k-1} - b)^T\left(\frac{AA^T}{\|A\|_F^2}\right)(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)\mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
\end{aligned}
$$

Remark 3

Note that

$$
(A_{:,J})^Tb = (A_{:,J})^TAx_\star,
$$

where $x_\star$ is any solution of $A^TAx = A^Tb$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\sigma_r^2(A)}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k}\|Ax^0 - b\|_2^2,
$$

where

$$
\rho = \max_{1 \le j \le t}\sigma_1^2(A_{:,J_j}).
$$


4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{I_1, I_2, \ldots, I_s\}$ and $\{J_1, J_2, \ldots, J_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,J_j}\|_F^2/\|A\|_F^2$.
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,J_j}\|_F^2}A_{:,J_j}(A_{:,J_j})^Tz^{k-1}$.
    Pick $i \in [s]$ with probability $\|A_{I_i,:}\|_F^2/\|A\|_F^2$.
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T(A_{I_i,:}x^{k-1} - b_{I_i} + z^k_{I_i})$.

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
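The steps of Algorithm 2 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function name `rebk`, its arguments (`row_blocks`, `col_blocks`, `iters`, `seed`), the dense list-of-lists storage, and the particular admissible initialization $z^0 = b$, $x^0 = 0$ are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK on a dense matrix A given as a list of rows.

    row_blocks / col_blocks are partitions of the row / column indices.
    Returns the final iterates (x, z).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the column blocks A[:, J] and row blocks A[I, :].
    col_w = [sum(A[i][j] ** 2 for i in range(m) for j in J) for J in col_blocks]
    row_w = [sum(A[i][j] ** 2 for i in I for j in range(n)) for I in row_blocks]
    z = list(b)      # z^0 = b lies in b + range(A)
    x = [0.0] * n    # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha / ||A_J||_F^2 * A_J (A_J^T z).
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        u = [sum(A[i][j] * z[i] for i in range(m)) for j in J]  # A_J^T z
        for i in range(m):
            z[i] -= alpha / col_w[jb] * sum(A[i][j] * uj for j, uj in zip(J, u))
        # Row step: x <- x - alpha / ||A_I||_F^2 * A_I^T (A_I x - b_I + z_I).
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] + z[i] for i in I]
        for j in range(n):
            x[j] -= alpha / row_w[ib] * sum(A[i][j] * ri for i, ri in zip(I, r))
    return x, z


# Small inconsistent system; its least-squares solution is A^+ b = (0, 1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
x, z = rebk(A, b, row_blocks=[[0], [1], [2]], col_blocks=[[0], [1]], iters=500)
print(x)  # close to [0.0, 1.0]
```

With singleton blocks and `alpha=1.0` the sketch reduces to REK; passing coarser partitions such as `row_blocks=[[0, 1], [2]]` exercises the block variant. Blocks with zero Frobenius norm receive zero sampling weight, so they are never selected.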


Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
$$

Then, by the law of total expectation, we have

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
$$


4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$
Ax = b,
$$

let $x^k$ be the $k$th iterate of REBK with

$$
z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).
$$

It holds that

$$
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right),
$$

where

$$
\delta = \max_{1 \le i \le r}\left|1 - \frac{\alpha\sigma_i^2(A)}{\|A\|_F^2}\right|.
$$
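To get a feel for the bound in Theorem 8, the factor $\delta$ and the right-hand side can be evaluated from the squared nonzero singular values. The helper names below (`rebk_delta`, `expectation_bound`) and their arguments are hypothetical, introduced only for this sketch; the singular values are assumed to be computed elsewhere.

```python
def rebk_delta(sigma_sq, alpha):
    """delta = max_i |1 - alpha * sigma_i^2 / ||A||_F^2| over the nonzero sigma_i^2."""
    nonzero = [s for s in sigma_sq if s > 0.0]
    fro_sq = sum(nonzero)  # ||A||_F^2 is the sum of the squared singular values
    return max(abs(1.0 - alpha * s / fro_sq) for s in nonzero)


def expectation_bound(delta, k, alpha, err0, atz0_norm, fro_sq):
    """Right-hand side of Theorem 8: delta^k * (err0 + alpha * k * ||A^T z^0|| / ||A||_F^2)."""
    return delta ** k * (err0 + alpha * k * atz0_norm / fro_sq)


# Illustrative values: squared singular values 3 and 1, so ||A||_F^2 = 4.
print(rebk_delta([3.0, 1.0], alpha=1.0))  # 0.75
```

Because of the factor $k$ inside the parentheses, the bound can grow for small $k$ before the geometric factor $\delta^k$ takes over.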


Proof. By $A^Tb = A^TAA^\dagger b$ and

$$
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T(A_{I_i,:}x^{k-1} - b_{I_i} + z^k_{I_i}),
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}
$$
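The last two equalities in this chain rest on two facts that can be written out explicitly: the expected column step, and the identity that moves $A^T$ past the iteration matrix. A short derivation follows, using the shorthand $c = \|A\|_F^2$ (introduced here only for brevity) and the facts that the sampling probabilities sum to one and that $\{J_1, \ldots, J_t\}$ is a partition of $[n]$:

```latex
\[
\begin{aligned}
\mathbb{E}_{k-1}[z^k]
&= \sum_{j=1}^{t}\frac{\|A_{:,J_j}\|_F^2}{c}
   \left(I - \frac{\alpha}{\|A_{:,J_j}\|_F^2}A_{:,J_j}(A_{:,J_j})^T\right)z^{k-1} \\
&= \left(I - \frac{\alpha}{c}\sum_{j=1}^{t}A_{:,J_j}(A_{:,J_j})^T\right)z^{k-1}
 = \left(I - \frac{\alpha AA^T}{c}\right)z^{k-1}, \\
A^T\left(I - \frac{\alpha AA^T}{c}\right)
&= A^T - \frac{\alpha A^TAA^T}{c}
 = \left(I - \frac{\alpha A^TA}{c}\right)A^T .
\end{aligned}
\]
```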


Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}[x^{k-1} - A^\dagger b] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{2}\mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}
$$


Applying the norms to both sides we obtain

$$
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T),
$$

and Lemma 3.


4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

$$
\rho = 1 - \frac{(2\alpha - \alpha^2)\sigma_r^2(A)}{\|A\|_F^2}.
$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$
b_\perp = (I - AA^\dagger)b,
$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
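As a concrete illustration of these quantities, consider a small inconsistent system chosen here purely as an example (it does not appear in the lecture), with $z^0 = b$ and $\alpha = 1$:

```latex
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
b = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix},
\qquad
A^TA = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
A^Tb = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
\[
A^\dagger b = (A^TA)^{-1}A^Tb = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\qquad
b_\perp = b - AA^\dagger b
        = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
        = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix},
\qquad
A^Tb_\perp = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
\[
\sigma_1^2(A) = 3,\quad \sigma_2^2(A) = \sigma_r^2(A) = 1,\quad \|A\|_F^2 = 4
\quad\Longrightarrow\quad
\rho = 1 - \frac{(2 \cdot 1 - 1^2)\cdot 1}{4} = \frac{3}{4}.
\]
```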


Lemma 9

For any given consistent or inconsistent linear system

$$
Ax = b,
$$

let $z^k$ be the vector generated in REBK with

$$
z^0 \in b + \mathrm{range}(A).
$$

It holds that

$$
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k\|z^0 - b_\perp\|_2^2.
$$

Proof. By $(A_{:,J_j})^Tb_\perp = 0$ we have

$$
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,J_j}\|_F^2}A_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp). \tag{1}
$$


By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$ we can show by induction that

$$
z^k - b_\perp \in \mathrm{range}(A).
$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,J_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,J_j}(A_{:,J_j})^TA_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$


Taking the expectation conditioned on the first $k-1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\,\mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.


Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$
z^0 \in b + \mathrm{range}(A) \quad\text{and}\quad x^0 \in \mathrm{range}(A^T).
$$

For any $\varepsilon > 0$, it holds that

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$

Proof. Let

$$
\widetilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b),
$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.


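It follows from

$$
x^k - \widetilde{x}^k = \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i})
$$

that

$$
\begin{aligned}
\|x^k - \widetilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i})^TA_{I_i,:}(A_{I_i,:})^T(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}) \\
&\le \frac{\alpha^2\|b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\tag{2}
$$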

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k - \widetilde{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
$$

that

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\big[\|b - AA^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned}
\tag{3}
$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$ we have

$$
x^0 - A^\dagger b \in \mathrm{range}(A^T).
$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\widetilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{I_i,:})^TA_{I_i,:}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$

which yields

$$
\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \le \rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
$$

Note that for any $\varepsilon > 0$ we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widetilde{x}^k + \widetilde{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widetilde{x}^k\|_2 + \|\widetilde{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widetilde{x}^k\|_2\|\widetilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widetilde{x}^k\|_2^2 + (1 + \varepsilon)\|\widetilde{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
$$
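The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality applied to the cross term. Writing $p$ and $q$ (names introduced here only for this remark) for the two norms, the step can be checked as follows:

```latex
\[
p = \|x^k - \widetilde{x}^k\|_2, \qquad q = \|\widetilde{x}^k - A^\dagger b\|_2,
\]
\[
0 \le \left(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\right)^2
  = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad\Longrightarrow\quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2,
\]
\[
p^2 + q^2 + 2pq \le \left(1 + \frac{1}{\varepsilon}\right)p^2 + (1 + \varepsilon)q^2 .
\]
```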


Combining (3), (4) and (5) yields

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widetilde{x}^k\|_2^2\big] + (1 + \varepsilon)\mathbb{E}\big[\|\widetilde{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
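The final inequality in this chain bounds the geometric sum in front of the $z^0$ term. The step can be spelled out as follows:

```latex
\[
1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}
  = \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1 + \varepsilon)^k}{\varepsilon},
\]
\[
\left(1 + \frac{1}{\varepsilon}\right)\frac{(1 + \varepsilon)^k}{\varepsilon}
  = \frac{1 + \varepsilon}{\varepsilon}\cdot\frac{(1 + \varepsilon)^k}{\varepsilon}
  = (1 + \varepsilon)^k\,\frac{1 + \varepsilon}{\varepsilon^2}.
\]
```

Factoring $(1 + \varepsilon)^k\rho^k$ out of both terms then gives the stated bound.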


Remark 4

For the case of REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$
(\widetilde{x}^k - A^\dagger b)^T(x^k - \widetilde{x}^k) = 0,
$$

the equation (5) becomes

$$
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widetilde{x}^k\|_2^2 + \|\widetilde{x}^k - A^\dagger b\|_2^2,
$$

which yields the following convergence bound for REK:

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$
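The factor $k$ in the REK bound comes from unrolling the resulting recursion. A brief sketch, using the shorthands $e_k$ and $C$ (introduced here only for this remark) together with (3) and (4) at $\alpha = 1$:

```latex
\[
e_k = \mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big], \qquad
C = \frac{\|z^0 - b_\perp\|_2^2}{\|A\|_F^2},
\]
\[
e_k \le C\rho^k + \rho\,e_{k-1}
    \le C\rho^k + \rho\big(C\rho^{k-1} + \rho\,e_{k-2}\big)
    = 2C\rho^k + \rho^2 e_{k-2}
    \le \cdots
    \le kC\rho^k + \rho^k e_0 .
\]
```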


[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


Thomas Strohmer and Roman Vershynin

A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009

Anastasios Zouzias and Nikolaos M Freris

Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Page 20: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Remark 2

If $t = 1$ and $x^0 \in \operatorname{range}(A^T)$, we can show by induction that

\[ x^k - x^0_\star \in \operatorname{range}(A^T), \]

where

\[ x^0_\star = (I - A^\dagger A)x^0 + A^\dagger b, \]

i.e., the projection of $x^0$ onto the set $\{x \in \mathbb{R}^n \mid Ax = b\}$.

Then, for rank-deficient consistent linear systems, the same approach gives the convergence bound

\[ \mathbb{E}\big[\|x^k - x^0_\star\|_2^2\big] \le \left(1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|x^0 - x^0_\star\|_2^2. \]


Theorem 7

Let $x^k$ denote the $k$th iterate of DSBGS applied to the consistent linear system (full column rank or rank-deficient)

\[ Ax = b \]

with arbitrary $x^0 \in \mathbb{R}^n$. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[ \mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2. \]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[ \mathbb{E}\big[\|Ax^k - b\|_2^2\big] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2, \]

where

\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]
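As a quick numerical reading of Theorem 7, the two per-iteration factors can be evaluated directly. The sketch below is only an illustration: the function names and the sample numbers are assumptions made here, and the caller is expected to supply $\sigma_r^2(A)$, $\|A\|_F^2$, $t$ and $\rho$.

```python
def dsbgs_factor_single_columns(alpha, sigma_r_sq, fro_sq):
    """Factor for t = n: 1 + alpha^2 - 2*alpha*sigma_r^2(A)/||A||_F^2."""
    return 1.0 + alpha ** 2 - 2.0 * alpha * sigma_r_sq / fro_sq


def dsbgs_factor_blocks(alpha, sigma_r_sq, fro_sq, t, rho):
    """Factor for t < n: 1 - (2*alpha*sigma_r^2(A) - t*rho*alpha^2)/||A||_F^2."""
    return 1.0 - (2.0 * alpha * sigma_r_sq - t * rho * alpha ** 2) / fro_sq


# Hypothetical data: sigma_r^2(A) = 1 and ||A||_F^2 = 5.
# For t = n the admissible range is 0 < alpha < 2*1/5 = 0.4.
f_single = dsbgs_factor_single_columns(0.2, 1.0, 5.0)
# For t = 1 with rho = 4 the admissible range is 0 < alpha < 2*1/(1*4) = 0.5.
f_block = dsbgs_factor_blocks(0.25, 1.0, 5.0, 1, 4.0)
print(f_single, f_block)
```

Both factors equal 1 at the upper end of the respective step-size range, which is why the ranges in the theorem are open intervals.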


Proof of Theorem 7

Note that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha\,\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b).
\end{aligned}
\]

If $t = n$, then it follows from

\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2 \]

(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le (1 + \alpha^2)\,\|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]
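The expectation step above can be unpacked as follows. This is a sketch under one assumption that is not restated in this part of the notes: the block pair $(\mathcal{I}, \mathcal{J})$ is drawn with probability $\|A_{\mathcal{I},\mathcal{J}}\|_F^2/\|A\|_F^2$, which is the sampling rule under which the displayed bound comes out. With that rule, the denominators cancel against the probabilities:

\[
\mathbb{E}\left[\frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right]
= \frac{A}{\|A\|_F^2} \sum_{\mathcal{I},\mathcal{J}} I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T
= \frac{AA^T}{\|A\|_F^2},
\]

\[
\mathbb{E}\left[\frac{\|A_{:,\mathcal{J}}\|_F^2\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\right]
= \sum_{\mathcal{J}} \frac{\|A_{:,\mathcal{J}}\|_F^2}{\|A\|_F^2} \sum_{\mathcal{I}} I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T
= I.
\]

The first identity uses that the blocks $I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T$ tile $A^T$; the second uses that the row blocks partition $[m]$ and that the squared column-block norms sum to $\|A\|_F^2$. Together they give the coefficient $1 + \alpha^2$ and the matrix $AA^T/\|A\|_F^2$ in the first inequality.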


The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]

If $t < n$, then it follows from

\[ (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I \]

(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

\[
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4}\,(Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2}\,(Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \frac{AA^T}{\|A\|_F^2}\,(Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \mid x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]


Remark 3

Note that

\[ (A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star, \]

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems we can prove that DSBGS$(1, t)$ has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[ \mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2. \]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[ \mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2, \]

where

\[ \rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}). \]


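4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].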
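The steps of Algorithm 2 can be sketched in plain Python as below. This is a minimal illustration rather than the authors' implementation: the function name `rebk`, the list-of-rows matrix layout, the fixed iteration count, the seed, and the choices $z^0 = b$ and $x^0 = 0$ (which satisfy the two initialization conditions) are all assumptions made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Sketch of REBK. A is a list of rows; blocks are lists of 0-based indices."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of the row and column blocks (sampling weights).
    row_w = [sum(A[i][c] ** 2 for i in blk for c in range(n)) for blk in row_blocks]
    col_w = [sum(A[r][j] ** 2 for r in range(m) for j in blk) for blk in col_blocks]
    z = list(b)        # z^0 = b lies in b + range(A)
    x = [0.0] * n      # x^0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha / ||A_J||_F^2 * A_J (A_J^T z)
        jb = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[jb]
        g = [sum(A[r][j] * z[r] for r in range(m)) for j in J]   # A_J^T z
        for r in range(m):
            z[r] -= alpha / col_w[jb] * sum(A[r][j] * gj for j, gj in zip(J, g))
        # Row step: x <- x - alpha / ||A_I||_F^2 * A_I^T (A_I x - b_I + z_I)
        ib = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[ib]
        res = [sum(A[i][c] * x[c] for c in range(n)) - b[i] + z[i] for i in I]
        for c in range(n):
            x[c] -= alpha / row_w[ib] * sum(A[i][c] * ri for i, ri in zip(I, res))
    return x, z


# Small inconsistent 3x2 example; its least-squares solution is (4/3, 7/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x, z = rebk(A, b, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
print(x, z)
```

Only products with the selected blocks and their transposes appear in the loop; no pseudoinverse of a block is formed, in line with the title of [1].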


Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}], \]

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

\[ \mathbb{E}^i_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k]. \]

Then, by the law of total expectation, we have

\[ \mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[\cdot]\big]. \]


4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

\[ Ax = b, \]

let $x^k$ be the $k$th iterate of REBK with

\[ z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^T). \]

It holds that

\[ \big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\, \|A^T z^0\|_2}{\|A\|_F^2} \right), \]

where

\[ \delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha\, \sigma_i^2(A)}{\|A\|_F^2} \right|. \]
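To make the quantities in Theorem 8 concrete, the sketch below evaluates $\delta$ and the right-hand side of the bound from the nonzero singular values of $A$, using the identity $\|A\|_F^2 = \sum_i \sigma_i^2(A)$. The function names and the sample inputs are assumptions made for illustration only.

```python
def rebk_delta(alpha, singular_values):
    """delta = max_i |1 - alpha * sigma_i^2 / ||A||_F^2| over the nonzero singular values."""
    fro_sq = sum(s * s for s in singular_values)   # ||A||_F^2
    return max(abs(1.0 - alpha * s * s / fro_sq) for s in singular_values)


def mean_error_bound(k, alpha, singular_values, x_gap, atz0_norm):
    """delta^k * (||x^0 - A^+ b||_2 + alpha * k * ||A^T z^0||_2 / ||A||_F^2)."""
    fro_sq = sum(s * s for s in singular_values)
    d = rebk_delta(alpha, singular_values)
    return d ** k * (x_gap + alpha * k * atz0_norm / fro_sq)


# Hypothetical matrix with nonzero singular values 2 and 1, so ||A||_F^2 = 5.
print(rebk_delta(1.0, [2.0, 1.0]))
print(mean_error_bound(10, 1.0, [2.0, 1.0], x_gap=3.0, atz0_norm=5.0))
```

Since every ratio $\sigma_i^2(A)/\|A\|_F^2$ lies in $(0, 1]$, we get $\delta < 1$ whenever $0 < \alpha < 2\|A\|_F^2/\sigma_1^2(A)$, and then the factor $\delta^k$ eventually dominates the linear growth in $k$ of the second term.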


Proof. By $A^T b = A^T A A^\dagger b$ and

\[ x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}), \]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[x^k - A^\dagger b\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \frac{\alpha (A^T A x^{k-1} - A^T b)}{\|A\|_F^2} - \frac{\alpha A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2}\, \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2} \left(I - \frac{\alpha A A^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \frac{A^T z^{k-1}}{\|A\|_F^2}.
\end{aligned}
\]
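Two steps in the chain above are worth spelling out. First, averaging the $z$-update of Algorithm 2 over the column block $j$, which is picked with probability $\|A_{:,\mathcal{J}_j}\|_F^2/\|A\|_F^2$, gives

\[
\mathbb{E}_{k-1}[z^k]
= \sum_{j=1}^{t} \frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2} \left( I - \frac{\alpha\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T}{\|A_{:,\mathcal{J}_j}\|_F^2} \right) z^{k-1}
= \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) z^{k-1},
\]

because the column blocks partition $[n]$, so that $\sum_{j=1}^{t} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T = A A^T$. The same cancellation over the row blocks turns the $x$-update into the term with $A^T(Ax^{k-1} - b + z^k)/\|A\|_F^2$. Second, the last equality moves $A^T$ through the bracket:

\[
A^T \left( I - \frac{\alpha A A^T}{\|A\|_F^2} \right) = \left( I - \frac{\alpha A^T A}{\|A\|_F^2} \right) A^T .
\]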


Taking expectation gives

\[
\begin{aligned}
\mathbb{E}\big[x^k - A^\dagger b\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \mathbb{E}\big[x^{k-1} - A^\dagger b\big] - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2}\, \mathbb{E}\big[z^{k-1}\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \mathbb{E}\big[x^{k-1} - A^\dagger b\big] - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^2 \mathbb{E}\big[x^{k-2} - A^\dagger b\big] - 2\alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]

Applying the norms to both sides, we obtain

\[
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\, \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]

Here the last inequality follows from the facts that

\[ x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T), \]

and Lemma 3.


4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

\[ \rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}. \]

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

\[ b_\perp = (I - A A^\dagger) b, \]

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.


Lemma 9

For any given consistent or inconsistent linear system

\[ Ax = b, \]

let $z^k$ be the vector generated in REBK with

\[ z^0 \in b + \operatorname{range}(A). \]

It holds that

\[ \mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2. \]

Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have

\[ z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1} \]

By $z^0 - b_\perp = A A^\dagger z^0 \in \operatorname{range}(A)$, we can show by induction that

\[ z^k - b_\perp \in \operatorname{range}(A). \]

It follows from (1) that

\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\, \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}\, (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking the expectation conditioned on the first $k - 1$ iterations yields

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\, \|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]

This completes the proof.
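Lemma 9 can be checked on a tiny example. The sketch below runs only the $z$-update with single-column blocks ($t = n$) and $\alpha = 1$ on a hypothetical $3 \times 2$ matrix whose left null space is spanned by $(1, -1, 1)$, so that $b_\perp$ has a closed form. The matrix, the vector $b$, the seed, and the helper name `column_step` are all assumptions made for this illustration.

```python
import random


def column_step(A_cols, z, j, alpha=1.0):
    """One z-update of REBK using the single column A_cols[j]."""
    a = A_cols[j]
    coef = alpha * sum(ai * zi for ai, zi in zip(a, z)) / sum(ai * ai for ai in a)
    return [zi - coef * ai for ai, zi in zip(a, z)]


A_cols = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # the two columns of A
b = [1.0, 2.0, 6.0]

# For this A, {z : A^T z = 0} is spanned by nvec, so b_perp = (nvec.b / nvec.nvec) * nvec.
nvec = [1.0, -1.0, 1.0]
scale = sum(p * q for p, q in zip(nvec, b)) / sum(p * p for p in nvec)
b_perp = [scale * p for p in nvec]

rng = random.Random(1)
z = list(b)                                    # z^0 = b
for _ in range(200):
    # Both columns have squared norm 2, so they are picked with equal probability.
    j = rng.choices([0, 1], weights=[2.0, 2.0])[0]
    z = column_step(A_cols, z, j)

err = max(abs(zi - bi) for zi, bi in zip(z, b_perp))
print(z, b_perp, err)
```

Each update only subtracts a multiple of a column of $A$, so $z^k - b_\perp$ stays in $\operatorname{range}(A)$, which is the invariant the proof relies on.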


Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

\[ z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^T). \]

For any $\varepsilon > 0$, it holds that

\[ \mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2\, \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]

Proof. Let

\[ \hat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b), \]

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

It follows from

\[ x^k - \hat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \]

that

\[
\begin{aligned}
\|x^k - \hat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\, (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2\, \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}. \tag{2}
\end{aligned}
\]

It follows from

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \hat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}^i_{k-1}\big[\|x^k - \hat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2\, \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \qquad \text{(by (2))}
\end{aligned}
\]

that

\[
\begin{aligned}
\mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2}\, \mathbb{E}\big[\|b - A A^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}. \tag{3}
\end{aligned}
\]

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$, we have

\[ x^0 - A^\dagger b \in \operatorname{range}(A^T). \]

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\, \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\, (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)},
\end{aligned}
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\, \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\, \|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}
\]

which yields

\[ \mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4} \]

Note that for any $\varepsilon > 0$, we have

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2\, \|x^k - \hat{x}^k\|_2\, \|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon)\, \|\hat{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}
\]
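The last step of (5) is a weighted arithmetic-geometric mean inequality. As a short worked derivation, write $p = \|x^k - \hat{x}^k\|_2$ and $q = \|\hat{x}^k - A^\dagger b\|_2$ (shorthand introduced only for this note). For any $\varepsilon > 0$,

\[
0 \le \left( \frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, q \right)^2 = \frac{p^2}{\varepsilon} - 2pq + \varepsilon q^2
\quad \Longrightarrow \quad
2pq \le \frac{p^2}{\varepsilon} + \varepsilon q^2,
\]

and therefore

\[
p^2 + q^2 + 2pq \le \left(1 + \frac{1}{\varepsilon}\right) p^2 + (1 + \varepsilon)\, q^2 .
\]

A small $\varepsilon$ keeps the factor on $q^2$ close to 1 at the price of a large factor on $p^2$; this trade-off is where the free parameter $\varepsilon$ in Theorem 10 comes from.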


Combining (3), (4), and (5) yields

\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big] + (1 + \varepsilon)\, \mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\, \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1 + \varepsilon)\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2}\, \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k\, \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\,\alpha^2\, \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
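The final inequality in the chain above rests on a geometric sum. As a worked step, using only $1 + 1/\varepsilon = (1 + \varepsilon)/\varepsilon$:

\[
\left(1 + \frac{1}{\varepsilon}\right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2} .
\]

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ and factoring out $(1 + \varepsilon)^k \rho^k$ leaves the coefficient $(1 + \varepsilon)\,\alpha^2 / (\varepsilon^2 \|A\|_F^2)$ on $\|z^0 - b_\perp\|_2^2$, which is the first term in the bound of Theorem 10.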


Remark 4

For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

\[ (\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0, \]

the inequality (5) becomes the equality

\[ \|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2, \]

which yields the following convergence bound for REK:

\[ \mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k\, \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right). \]
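The orthogonality used in Remark 4 can be verified in a few lines. With $s = m$, each row block is a single row; write $a = (A_{i,:})^T \in \mathbb{R}^n$ and $P = I - a a^T / \|a\|_2^2$ (symbols introduced only for this note). With $\alpha = 1$, the definitions of $\hat{x}^k$ and $x^k$ give

\[
\hat{x}^k - A^\dagger b = P\, (x^{k-1} - A^\dagger b),
\qquad
x^k - \hat{x}^k = \frac{b_i - A_{i,:} A^\dagger b - z^k_i}{\|a\|_2^2}\; a .
\]

Since $P$ is symmetric and $P a = 0$,

\[
(\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k)
= \frac{b_i - A_{i,:} A^\dagger b - z^k_i}{\|a\|_2^2}\; (x^{k-1} - A^\dagger b)^T P a = 0 .
\]

Taking expectations in the resulting equality and applying (3) and (4) with $\alpha = 1$ gives $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2 + \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]$; unrolling this recursion $k$ times produces the factor $k$ in the REK bound.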


References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


983046le ρE

983045983042xkminus1 minusAdaggerb98304222

983046 (4)

Note that for any ε gt 0 we have

983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2

le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422

le9830611 +

1

ε

983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43

Combining (3) (4) and (5) yields

E983045983042xk minusAdaggerb98304222

983046

le9830611 +

1

ε

983062E983045983042xk minus 983141xk98304222

983046+ (1 + ε)E

983045983042983141xk minusAdaggerb98304222

983046

le9830611 +

1

ε

983062α2ρk

983042A9830422F983042z0 minus bperp98304222 + (1 + ε)ρE

983045983042xkminus1 minusAdaggerb98304222

983046

le9830611 +

1

ε

983062(1 + 1 + ε)α2ρk

983042A9830422F983042z0 minus bperp98304222

+(1 + ε)2ρ2E983045983042xkminus2 minusAdaggerb98304222

983046

le middot middot middot

le9830611 +

1

ε

983062(1 + 1 + ε+ middot middot middot+ (1 + ε)kminus1)α2ρk

983042A9830422F983042z0 minus bperp98304222

+(1 + ε)kρk983042x0 minusAdaggerb98304222

le (1 + ε)kρk983061(1 + ε)α2983042z0 minus bperp98304222

ε2983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 42 43

Remark 4

For the case REBK with s = m t = n and α = 1 (ie REK) by theorthogonality

(983141xk minusAdaggerb)T(xk minus 983141xk) = 0

the equation (5) becomes

983042xk minusAdaggerb98304222 = 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222

which yields the following convergence for REK

E983045983042xk minusAdaggerb98304222

983046le ρk

983061k983042z0 minus bperp98304222

983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Kui Du Wutao Si and Xiaohui Sun

Pseudoinverse-free randomized extended block kaczmarz for solving least squaresarXiv preprint arXiv200104179 2020

Kui Du and Xiaohui Sun

A doubly stochastic block Gauss-Seidel algorithm for solving linear equationsarXiv preprint arXiv191213291 2019

Lawrence H Landweber

An iteration formula for Fredholm integral equations of the first kindAmer J Math 73615ndash624 1951

D Leventhal and A S Lewis

Randomized methods for linear constraints convergence rates and conditioningMath Oper Res 35(3)641ndash654 2010

Anna Ma Deanna Needell and Aaditya Ramdas

Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methodsSIAM J Matrix Anal Appl 36(4)1590ndash1604 2015

Meisam Razaviyayn Mingyi Hong Navid Reyhanian and Zhi-Quan Luo

A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and acertain class of over-parameterized optimization problemsMath Program 176(1-2 Ser B)465ndash496 2019

Thomas Strohmer and Roman Vershynin

A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009

Anastasios Zouzias and Nikolaos M Freris

Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43


Proof of Theorem 7

Note that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \left\| Ax^{k-1} - \alpha \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) - b \right\|_2^2 \\
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b).
\end{aligned}
$$

If $t = n$, then it follows from

$$
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = \|A_{:,\mathcal{J}}\|_F^2
$$

(since $A I_{:,\mathcal{J}} = A_{:,\mathcal{J}}$ is a column vector) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&= \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\|A_{:,\mathcal{J}}\|_F^2 \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le (1 + \alpha^2) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$
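As a short worked step that is not on the original slides, the contraction factor of the case $t = n$ can be read as a quadratic in $\alpha$. The shorthand $c$ and the function $q$ below are introduced here only for illustration.

```latex
\[
q(\alpha) = 1 + \alpha^2 - 2\alpha c,
\qquad c = \frac{\sigma_r^2(A)}{\|A\|_F^2} \in (0, 1],
\]
\[
q(\alpha) < 1 \iff 0 < \alpha < 2c,
\qquad
\min_{\alpha} q(\alpha) = q(c) = 1 - c^2 .
\]
```

Under this reading, the bound contracts exactly on the step-size interval $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, and the choice $\alpha = c$ minimizes the bound (not necessarily the actual error).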

If $t < n$, then it follows from

$$
(I_{:,\mathcal{J}})^T A^T A I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I
$$

(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$
\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho \, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2 (Ax^{k-1} - b)^T \left( \frac{\rho \, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \left( 1 + \frac{t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left( \frac{A A^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
$$

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
$$

Remark 3

Note that

$$
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
$$

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left( 1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2.
$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left( 1 - \frac{2\alpha \sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2} \right)^k \|Ax^0 - b\|_2^2,
$$

where

$$
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
$$

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \dots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \dots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \dots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$.
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$.
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$.
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$.

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
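The two updates of Algorithm 2 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `rebk`, the dense list-of-rows matrix format, the fixed iteration count, and the specific starting points $z^0 = b$ and $x^0 = 0$ (which satisfy $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$) are all choices made here.

```python
import random


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=2000, seed=0):
    """Illustrative REBK sketch. A is a list of rows; blocks are lists of indices.

    Hypothetical helper: name, signature and stopping rule are assumptions.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Squared Frobenius norms of every column block and every row block.
    col_w = [sum(A[p][q] ** 2 for p in range(m) for q in J) for J in col_blocks]
    row_w = [sum(A[p][q] ** 2 for p in I for q in range(n)) for I in row_blocks]
    z = list(b)      # z^0 = b, an element of b + range(A)
    x = [0.0] * n    # x^0 = 0, an element of range(A^T)
    for _ in range(iters):
        # Column step: z <- z - alpha * A_J (A_J^T z) / ||A_J||_F^2.
        j = rng.choices(range(len(col_blocks)), weights=col_w)[0]
        J = col_blocks[j]
        g = [sum(A[p][q] * z[p] for p in range(m)) for q in J]  # A_J^T z
        for p in range(m):
            z[p] -= alpha / col_w[j] * sum(A[p][q] * gq for q, gq in zip(J, g))
        # Row step: x <- x - alpha * A_I^T (A_I x - b_I + z_I) / ||A_I||_F^2,
        # using the freshly updated z, as in Algorithm 2.
        i = rng.choices(range(len(row_blocks)), weights=row_w)[0]
        I = row_blocks[i]
        r = [sum(A[p][q] * x[q] for q in range(n)) - b[p] + z[p] for p in I]
        for q in range(n):
            x[q] -= alpha / row_w[i] * sum(A[p][q] * rp for p, rp in zip(I, r))
    return x, z


# Small inconsistent system; its least-squares solution is (4/3, 4/3).
A_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_demo = [1.0, 1.0, 3.0]
x_demo, z_demo = rebk(A_demo, b_demo, row_blocks=[[0, 1], [2]], col_blocks=[[0], [1]])
print(x_demo, z_demo)
```

Passing single-index blocks for both partitions together with `alpha=1.0` gives the REK special case mentioned above. Blocks whose Frobenius norm is zero receive zero sampling weight and are never selected.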

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k - 1$ iterations of REBK. That is,

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}],
$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k - 1$ iterations and the $k$th column block chosen as

$$
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \dots, j_{k-1}, i_{k-1}, j_k].
$$

Then, by the law of total expectation, we have

$$
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

$$
Ax = b,
$$

let $x^k$ be the $k$th iterate of REBK with

$$
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
$$

It holds

$$
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right),
$$

where

$$
\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|.
$$

Proof. By $A^T b = A^T A A^\dagger b$ and

$$
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[ \frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2} \right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \frac{A^T}{\|A\|_F^2} \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) (x^{k-1} - A^\dagger b) - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
$$

Taking expectation gives

$$
\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \frac{A^T}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^2 \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
$$
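The replacement of $\mathbb{E}[z^{k-1}]$ by a power of a fixed matrix applied to $z^0$ in the derivation above rests on two identities that the slides do not spell out. The following worked step is a sketch of them, using only the sampling probabilities of Algorithm 2 and the fact that the column blocks partition $[n]$.

```latex
\[
\mathbb{E}_{k-1}[z^k]
= z^{k-1} - \alpha \sum_{j=1}^{t}
  \frac{\|A_{:,\mathcal{J}_j}\|_F^2}{\|A\|_F^2}\,
  \frac{A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T}{\|A_{:,\mathcal{J}_j}\|_F^2}\, z^{k-1}
= \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right) z^{k-1},
\]
\[
\mathbb{E}[z^{k-1}] = \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right)^{k-1} z^0,
\qquad
A^T \left( I - \alpha \frac{A A^T}{\|A\|_F^2} \right)^{k-1}
= \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^{k-1} A^T .
\]
```

The first line uses $\sum_{j} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T = A A^T$; the second follows by induction on $k$ and by moving $A^T$ through each factor.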

Applying the norms to both sides, we obtain

$$
\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) - \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left( I - \alpha \frac{A^T A}{\|A\|_F^2} \right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
$$

Here the last inequality follows from the facts that

$$
x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^T z^0 \in \mathrm{range}(A^T),
$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

$$
\rho = 1 - \frac{(2\alpha - \alpha^2) \sigma_r^2(A)}{\|A\|_F^2}.
$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$
b_\perp = (I - A A^\dagger) b,
$$

which is the orthogonal projection of $z^0$ onto the set $\{ z \mid A^T z = 0 \}$.
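To get a feel for the rate $\rho$, the definition above can be evaluated directly. The sketch below is illustrative only: the helper names `rho` and `steps_until` are made up here, the inputs $\sigma_r^2(A)$ and $\|A\|_F^2$ are assumed to be known, and `steps_until` simply solves $\rho^k \le \text{tol}$ for $k$, which describes the geometric factor alone and not the full error bounds.

```python
import math


def rho(alpha, sigma_r_sq, fro_sq):
    """rho = 1 - (2*alpha - alpha^2) * sigma_r(A)^2 / ||A||_F^2."""
    return 1.0 - (2.0 * alpha - alpha ** 2) * sigma_r_sq / fro_sq


def steps_until(rho_value, tol):
    """Smallest k with rho_value**k <= tol, assuming 0 < rho_value < 1 and 0 < tol < 1."""
    k = math.ceil(math.log(tol) / math.log(rho_value))
    # Correct possible off-by-one effects of floating point rounding.
    while rho_value ** k > tol:
        k += 1
    while k > 0 and rho_value ** (k - 1) <= tol:
        k -= 1
    return k


# Example with sigma_r(A)^2 = 1 and ||A||_F^2 = 4 (hypothetical values).
for a in (0.5, 1.0, 1.5):
    r = rho(a, 1.0, 4.0)
    print(a, r, steps_until(r, 1e-6))
```

Since $2\alpha - \alpha^2 > 0$ exactly when $0 < \alpha < 2$, the value returned by `rho` is below 1 exactly on that interval, and it is smallest at $\alpha = 1$.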

Lemma 9

For any given consistent or inconsistent linear system

$$
Ax = b,
$$

let $z^k$ be the vector generated in REBK with

$$
z^0 \in b + \mathrm{range}(A).
$$

It holds

$$
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.
$$

Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have

$$
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp). \tag{1}
$$

By $z^0 - b_\perp = A A^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$
z^k - b_\perp \in \mathrm{range}(A).
$$

It follows from (1) that

$$
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha \frac{\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
$$

Taking the conditional expectation on the first $k - 1$ iterations yields

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2) \frac{\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
$$

Taking expectation again gives

$$
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho \, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
$$

This completes the proof.
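The bound of Lemma 9 can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration, not part of the lecture: it uses single-column blocks ($t = n$), a hand-picked $3 \times 2$ matrix for which $\sigma_r^2(A) = 1$, $\|A\|_F^2 = 4$ and $b_\perp = (-1/3, -1/3, 1/3)$ were worked out by hand, and a Monte Carlo average in place of the exact expectation. The helper names `z_step` and `mean_sq_error` are hypothetical.

```python
import random


def z_step(A, z, j, alpha):
    """One column step with a single column a_j: z - alpha * a_j (a_j^T z) / ||a_j||^2."""
    col = [row[j] for row in A]
    nrm2 = sum(c * c for c in col)
    dot = sum(c * zp for c, zp in zip(col, z))
    return [zp - alpha * dot / nrm2 * c for zp, c in zip(z, col)]


def mean_sq_error(A, b, b_perp, alpha, k, trials, seed=0):
    """Monte Carlo estimate of E||z^k - b_perp||^2 starting from z^0 = b."""
    rng = random.Random(seed)
    n = len(A[0])
    weights = [sum(row[j] ** 2 for row in A) for j in range(n)]  # ||a_j||^2
    total = 0.0
    for _ in range(trials):
        z = list(b)
        for _ in range(k):
            j = rng.choices(range(n), weights=weights)[0]
            z = z_step(A, z, j, alpha)
        total += sum((zp - bp) ** 2 for zp, bp in zip(z, b_perp))
    return total / trials


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 3.0]
b_perp = [-1.0 / 3.0, -1.0 / 3.0, 1.0 / 3.0]   # (I - A A^+) b, computed by hand
alpha, k = 1.0, 10
rho = 1.0 - (2.0 * alpha - alpha ** 2) * 1.0 / 4.0
bound = rho ** k * sum((zp - bp) ** 2 for zp, bp in zip(b, b_perp))
estimate = mean_sq_error(A, b, b_perp, alpha, k, trials=2000)
print(estimate, bound)
```

On this example the estimate should sit well below the bound, which is consistent with Lemma 9 giving an upper bound on the expected error rather than its exact value.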

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$
z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).
$$

For any $\varepsilon > 0$, it holds

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
$$

Proof. Let

$$
\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
$$

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

It follows from

$$
x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})
$$

that

$$
\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}. 
\end{aligned} \tag{2}
$$

It follows from

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k - \widehat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[ \frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2} \right] \quad \text{(by (2))}
\end{aligned}
$$

that

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\big[\|b - A A^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have

$$
x^0 - A^\dagger b \in \mathrm{range}(A^T).
$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$
\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
$$

we have

$$
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2) \frac{\|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
$$

which yields

$$
\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho \, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \tag{4}
$$

Note that for any $\varepsilon > 0$, we have

$$
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big( \|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2 \big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \|x^k - \widehat{x}^k\|_2^2 + (1 + \varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
$$
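The last step of (5) is an instance of the weighted arithmetic-geometric mean (Young) inequality. As a brief worked step, with the scalars $u$ and $v$ introduced here as shorthand for $\|x^k - \widehat{x}^k\|_2$ and $\|\widehat{x}^k - A^\dagger b\|_2$:

```latex
\[
0 \le \left( \frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, v \right)^2
= \frac{u^2}{\varepsilon} - 2uv + \varepsilon v^2
\quad \Longrightarrow \quad
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2 ,
\]
\[
u^2 + v^2 + 2uv \le \left( 1 + \frac{1}{\varepsilon} \right) u^2 + (1 + \varepsilon) v^2 .
\]
```

A small $\varepsilon$ keeps the factor on $v^2$ close to 1 at the price of a large factor $1 + 1/\varepsilon$ on $u^2$, which is the trade-off visible in the bound of Theorem 10.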

Combining (3), (4), and (5) yields

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left( 1 + \frac{1}{\varepsilon} \right) \mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1 + \varepsilon) \mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho \, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2 \, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left( 1 + \frac{1}{\varepsilon} \right) \big( 1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1} \big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon) \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
$$
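The final inequality in the chain above comes from summing a geometric series. The following worked step, which the slides leave implicit, shows where the factor $(1 + \varepsilon)/\varepsilon^2$ originates:

```latex
\[
\left( 1 + \frac{1}{\varepsilon} \right) \sum_{l=0}^{k-1} (1 + \varepsilon)^l
= \frac{1 + \varepsilon}{\varepsilon} \cdot \frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}
= (1 + \varepsilon)^k \, \frac{1 + \varepsilon}{\varepsilon^2} .
\]
```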

Remark 4

For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$
(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,
$$

the inequality (5) becomes the equality

$$
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,
$$

which yields the following convergence bound for REK:

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
$$

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.

Page 23: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

983042Axk minus b98304222

= 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061983042AJ 9830422FII(II)T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le (1 + α2)983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611 + α2 minus 2ασ2

r(A)

983042A9830422F

983062983042Axkminus1 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 23 43

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 24 43

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let Ekminus1

983045middot983046denote the conditional expectation conditioned on the first

k minus 1 iterations of REBK That is

Ekminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1

983046

where jl is the lth column block chosen and il is the lth row blockchosen

We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as

Eikminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk

983046

Then by the law of total expectation we have

Ekminus1

983045middot983046= Ekminus1

983045Eikminus1

983045middot983046983046

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43

41 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

Ax = b

let xk be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

It holds

983042E983045xk minusAdaggerb

9830469830422 le δk

983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

where

δ = max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43

Proof By ATb = ATAAdaggerb and

xk minusAdaggerb = xkminus1 minusAdaggerbminus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

we have

Ekminus1

983045xk minusAdaggerb

983046= Ekminus1

983045Eikminus1

983045xk minusAdaggerb

983046983046

= xkminus1 minusAdaggerbminus Ekminus1

983063αAT(Axkminus1 minus b+ zk)

983042A9830422F

983064

= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb

983042A9830422Fminus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422F

983061Iminus α

AAT

983042A9830422F

983062zkminus1

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422Fzkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43

Taking expectation gives

E983045xk minusAdaggerb

983046

= E983045Ekminus1

983045xk minusAdaggerb

983046983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422FE983045zkminus1

983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

=

983061Iminus α

ATA

983042A9830422F

9830622

E983045xkminus2 minusAdaggerb

983046minus 2α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F= middot middot middot

=

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43

Applying the norms to both sides we obtain

983042E983045xk minusAdaggerb

9830469830422

=

983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)

9830569830569830569830569830562

+

983056983056983056983056983056αk983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le δk983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

Here the last inequality follows from the facts that

x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43

42 Convergence of E983045983042xk minusAdaggerb98304222

983046

The convergence ofE983045983042xk minusAdaggerb98304222

983046

depends on the positive number ρ defined as

ρ = 1minus (2αminus α2)σ2r (A)

983042A9830422F

In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to

bperp = (IminusAAdagger)b

which is the orthogonal projection of z0 onto the set z | ATz = 0

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43

Lemma 9

For any given consistent or inconsistent linear system

Ax = b

let zk be the vector generated in REBK with

z0 isin b+ range(A)

It holdsE983045983042zk minus bperp98304222

983046le ρk983042z0 minus bperp98304222

Proof By (AJj )Tbperp = 0 we have

zk minus bperp = zkminus1 minus bperp minus α

983042AJj9830422FAJj (AJj )

T(zkminus1 minus bperp) (1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43

By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that

zk minus bperp isin range(A)

It follows from (1) that

983042zk minus bperp98304222

= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

+α2

983042AJj9830424F(zkminus1 minus bperp)

TAJj (AJj )TAJj (AJj )

T(zkminus1 minus bperp)

le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

(by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43

Taking the conditional expectation on the first $k-1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]\bigr] \\
&\le \rho\,\mathbb{E}\bigl[\|z^{k-1} - b_\perp\|_2^2\bigr] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. For any $\varepsilon > 0$, it holds

$$\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

Proof. Let

$$\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

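It follows from

$$x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr)$$

that

$$\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,\bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr)^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\bigr) \\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}. \tag{2}
\end{aligned}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]\bigr] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \qquad \text{(by (2))}
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\bigl[\|b - AA^\dagger b - z^k\|_2^2\bigr] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}. \tag{3}
\end{aligned}$$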

By $x^0 \in \operatorname{range}(A^T)$ and $A^\dagger b \in \operatorname{range}(A^T)$ we have

$$x^0 - A^\dagger b \in \operatorname{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\,\frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\,\frac{\|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}$$

which yields

$$\mathbb{E}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr] \le \rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr]. \tag{4}$$

Note that for any $\varepsilon > 0$ we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le \bigl(\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \tilde{x}^k\|_2\,\|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \tilde{x}^k\|_2^2 + (1+\varepsilon)\,\|\tilde{x}^k - A^\dagger b\|_2^2. \tag{5}
\end{aligned}$$
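The last step of (5) is stated without justification. One standard way to see it, sketched here with the shorthand $u = \|x^k - \tilde{x}^k\|_2$ and $v = \|\tilde{x}^k - A^\dagger b\|_2$ (symbols introduced only for this note), is to bound the cross term by a weighted sum of squares:

```latex
\[
0 \le \Bigl(\tfrac{1}{\sqrt{\varepsilon}}\,u - \sqrt{\varepsilon}\,v\Bigr)^{2}
  = \tfrac{1}{\varepsilon}\,u^{2} - 2uv + \varepsilon\,v^{2}
\quad\Longrightarrow\quad
2uv \le \tfrac{1}{\varepsilon}\,u^{2} + \varepsilon\,v^{2},
\]
\[
u^{2} + v^{2} + 2uv
\le \Bigl(1 + \tfrac{1}{\varepsilon}\Bigr)u^{2} + (1+\varepsilon)\,v^{2}.
\]
```

The free parameter $\varepsilon$ trades the weight on the perturbation term $u^2$ against the weight on the contraction term $v^2$. This is why it appears in both factors of the final bound.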


Combining (3), (4), and (5) yields

$$\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr] + (1+\varepsilon)\,\mathbb{E}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\,\rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \bigl(1 + (1+\varepsilon)\bigr)\,\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\,\mathbb{E}\bigl[\|x^{k-2} - A^\dagger b\|_2^2\bigr] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr)\,\frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
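The final inequality in this chain rests on a geometric-sum estimate that the slide leaves implicit. A short sketch of that step, using only quantities already defined above:

```latex
\[
\sum_{l=0}^{k-1} (1+\varepsilon)^{l}
  = \frac{(1+\varepsilon)^{k} - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^{k}}{\varepsilon},
\qquad
\Bigl(1 + \frac{1}{\varepsilon}\Bigr)\frac{(1+\varepsilon)^{k}}{\varepsilon}
  = \frac{(1+\varepsilon)^{k+1}}{\varepsilon^{2}} .
\]
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ and factoring out $(1+\varepsilon)^k \rho^k$ gives the first term inside the parentheses of the final bound.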


Remark 4

For the case of REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\tilde{x}^k - A^\dagger b)^T (x^k - \tilde{x}^k) = 0,$$

inequality (5) becomes the equality

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \rho^k \left( \frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

References

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615-624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641-654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590-1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465-496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773-793, 2013.

Page 24: Lecture 2: Randomized Iterative Methods for Linear Systems (math.xmu.edu.cn/group/nona/damc/Lecture02.pdf)

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right) \mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k} \|Ax^0 - b\|_2^2.
\end{aligned}$$

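If $t < n$, then it follows from

$$(I_{:,\mathcal{J}})^T A^T A\, I_{:,\mathcal{J}} = (A_{:,\mathcal{J}})^T A_{:,\mathcal{J}} \preceq \rho I$$

(since $\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j})$) that

$$\begin{aligned}
\|Ax^k - b\|_2^2
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A\, I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} A_{\mathcal{I},\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^4} \right) (Ax^{k-1} - b) \\
&\le \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{A\, I_{:,\mathcal{J}} (A_{\mathcal{I},\mathcal{J}})^T (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \\
&\quad + \alpha^2\,(Ax^{k-1} - b)^T \left( \frac{\rho\, I_{:,\mathcal{I}} (I_{:,\mathcal{I}})^T}{\|A_{\mathcal{I},\mathcal{J}}\|_F^2} \right) (Ax^{k-1} - b) \qquad \text{(by Lemma 1)}.
\end{aligned}$$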

Taking expectation gives

$$\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]
&\le \left(1 + \frac{t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha\,(Ax^{k-1} - b)^T \left( \frac{AA^T}{\|A\|_F^2} \right) (Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}$$

The last inequality follows from $Ax^{k-1} - b \in \operatorname{range}(A)$ and Lemma 2. Taking expectation again gives

$$\begin{aligned}
\mathbb{E}\bigl[\|Ax^k - b\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}\bigl[\|Ax^k - b\|_2^2 \mid x^{k-1}\bigr]\bigr] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right) \mathbb{E}\bigl[\|Ax^{k-1} - b\|_2^2\bigr] \\
&\le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k} \|Ax^0 - b\|_2^2.
\end{aligned}$$

Remark 3

Note that

$$(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,$$

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

$$\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 + \alpha^2 - \frac{2\alpha\,\sigma_r^2(A)}{\|A\|_F^2}\right)^{k} \|Ax^0 - b\|_2^2.$$

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

$$\mathbb{E}\bigl[\|Ax^k - Ax_\star\|_2^2\bigr] \le \left(1 - \frac{2\alpha\,\sigma_r^2(A) - t\rho\alpha^2}{\|A\|_F^2}\right)^{k} \|Ax^0 - b\|_2^2,$$

where

$$\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).$$
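To make the step-size condition for the case $t < n$ concrete, the sketch below evaluates $\rho$, the upper limit $2\sigma_r^2(A)/(t\rho)$, and the resulting contraction factor for a given column partition. The helper names `dsbgs_constants` and `dsbgs_factor`, the random test matrix, and the partition are assumptions of this example, not part of the lecture.

```python
import numpy as np

def dsbgs_constants(A, col_blocks):
    """Return (sigma_r^2, ||A||_F^2, rho, t) for a column partition (hypothetical helper)."""
    sigma = np.linalg.svd(A, compute_uv=False)
    r = np.linalg.matrix_rank(A)
    sigma_r2 = sigma[r - 1] ** 2                 # smallest nonzero singular value, squared
    fro2 = np.linalg.norm(A, "fro") ** 2
    # rho = max_j sigma_1(A[:, J_j])^2: largest squared spectral norm over column blocks
    rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
    return sigma_r2, fro2, rho, len(col_blocks)

def dsbgs_factor(A, col_blocks, alpha):
    """Per-iteration factor 1 - (2*alpha*sigma_r^2 - t*rho*alpha^2) / ||A||_F^2."""
    sigma_r2, fro2, rho, t = dsbgs_constants(A, col_blocks)
    return 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha**2) / fro2

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))                 # assumed test matrix
col_blocks = np.array_split(np.arange(6), 3)     # t = 3 blocks of 2 columns, so t < n

sigma_r2, fro2, rho, t = dsbgs_constants(A, col_blocks)
alpha_max = 2.0 * sigma_r2 / (t * rho)           # upper end of the admissible range
factor = dsbgs_factor(A, col_blocks, 0.5 * alpha_max)
```

At the midpoint `0.5 * alpha_max` the factor is strictly below one, while at `alpha_max` it equals one and the bound no longer guarantees a decrease. The factor is a quadratic in $\alpha$ minimized at that midpoint, $\alpha = \sigma_r^2(A)/(t\rho)$.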


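4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\, (A_{\mathcal{I}_i,:})^T \bigl(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\bigr)$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].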
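The two updates of Algorithm 2 translate almost line by line into NumPy. The following is a minimal sketch under stated assumptions: the function name `rebk`, its signature, the fixed iteration count, the initial values $z^0 = b$ and $x^0 = 0$, and the test problem are all choices made for this example rather than anything prescribed by the lecture or by [1].

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of REBK; returns an approximation of pinv(A) @ b."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Squared Frobenius norms of the column and row blocks
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_p = col_w / col_w.sum()      # equals ||A_{:,J_j}||_F^2 / ||A||_F^2
    row_p = row_w / row_w.sum()      # equals ||A_{I_i,:}||_F^2 / ||A||_F^2
    z = b.astype(float).copy()       # assumed start: z^0 = b, which is in b + range(A)
    x = np.zeros(n)                  # assumed start: x^0 = 0, which is in range(A^T)
    for _ in range(iters):
        # Column step: drives z toward b_perp = (I - A A^+) b
        j = rng.choice(len(col_blocks), p=col_p)
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_w[j]) * (AJ @ (AJ.T @ z))
        # Row step: block Kaczmarz-type update on the corrected right-hand side b - z
        i = rng.choice(len(row_blocks), p=row_p)
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_w[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x

# Usage on a small, generically inconsistent system (assumed test data)
rng = np.random.default_rng(1)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
row_blocks = np.array_split(np.arange(12), 4)   # s = 4 row blocks
col_blocks = np.array_split(np.arange(4), 2)    # t = 2 column blocks
x = rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000)
err = np.linalg.norm(x - np.linalg.pinv(A) @ b)
```

Note that neither update forms a pseudoinverse of a block; each step needs only products with $A_{:,\mathcal{J}_j}$ or $A_{\mathcal{I}_i,:}$ and the block Frobenius norms, which are precomputed once here. Passing single-index blocks for both partitions with `alpha=1.0` corresponds to the REK special case mentioned above.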


Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}\bigl[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}\bigr],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}\bigl[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k\bigr].$$

Then by the law of total expectation we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}[\cdot]\bigr].$$

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \operatorname{range}(A)$ and $x^0 \in \operatorname{range}(A^T)$. It holds

$$\bigl\|\mathbb{E}\bigl[x^k - A^\dagger b\bigr]\bigr\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2} \right),$$

where

$$\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha\,\sigma_i^2(A)}{\|A\|_F^2} \right|.$$

Proof. By $A^T b = A^T A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^T \bigl(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\bigr),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}\bigl[x^k - A^\dagger b\bigr]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}_{k-1}^{i}\bigl[x^k - A^\dagger b\bigr]\bigr] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\,\frac{A^T A x^{k-1} - A^T b}{\|A\|_F^2} - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}\bigl[z^k\bigr] \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\,\mathbb{E}_{k-1}\bigl[z^k\bigr] \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\,\frac{A^T}{\|A\|_F^2}\left(I - \alpha\,\frac{AA^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2}\, z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}\bigl[x^k - A^\dagger b\bigr]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}\bigl[x^k - A^\dagger b\bigr]\bigr] \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right) \mathbb{E}\bigl[x^{k-1} - A^\dagger b\bigr] - \alpha \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2}\,\mathbb{E}\bigl[z^{k-1}\bigr] \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right) \mathbb{E}\bigl[x^{k-1} - A^\dagger b\bigr] - \alpha \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{2} \mathbb{E}\bigl[x^{k-2} - A^\dagger b\bigr] - 2\alpha \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\bigl\|\mathbb{E}\bigl[x^k - A^\dagger b\bigr]\bigr\|_2
&= \left\| \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) - \alpha k \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \alpha\,\frac{A^T A}{\|A\|_F^2}\right)^{k} \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k\,\|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \operatorname{range}(A^T), \qquad A^T z^0 \in \operatorname{range}(A^T),$$

and Lemma 3.
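Because the proof gives a closed form for $\mathbb{E}[x^k - A^\dagger b]$, the bound of Theorem 8 can be checked deterministically, without simulating REBK. The sketch below does this for one instance; the test matrix, the choices $x^0 = 0$ and $z^0 = b$, and the values of $\alpha$ and $k$ are assumptions of this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))      # assumed test matrix
b = rng.standard_normal(8)           # assumed right-hand side
alpha, k = 1.0, 20                   # assumed step size and iteration index

fro2 = np.linalg.norm(A, "fro") ** 2
sigma = np.linalg.svd(A, compute_uv=False)
r = np.linalg.matrix_rank(A)
# delta = max over nonzero singular values of |1 - alpha * sigma_i^2 / ||A||_F^2|
delta = np.max(np.abs(1.0 - alpha * sigma[:r] ** 2 / fro2))

x_star = np.linalg.pinv(A) @ b
x0 = np.zeros(3)                     # lies in range(A^T)
z0 = b.copy()                        # lies in b + range(A)

# Closed form from the proof: M^k (x0 - x_star) - alpha * k * M^k A^T z0 / ||A||_F^2
M = np.eye(3) - alpha * (A.T @ A) / fro2
Mk = np.linalg.matrix_power(M, k)
expected_err = Mk @ (x0 - x_star) - alpha * k * (Mk @ (A.T @ z0)) / fro2

lhs = np.linalg.norm(expected_err)
rhs = delta**k * (np.linalg.norm(x0 - x_star)
                  + alpha * k * np.linalg.norm(A.T @ z0) / fro2)
```

Comparing `lhs` with `rhs` for several values of `k` shows the linear factor $\alpha k$ in the second term. It comes from the accumulated contribution of $A^T z^0$ and is eventually dominated by $\delta^k$ whenever $\delta < 1$.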


Page 25: Lecture 2: Randomized Iterative Methods for Linear Systems (math.xmu.edu.cn/group/nona/damc/Lecture02.pdf)

If t lt n then it follows from

(IJ )TATAIJ = AT

JAJ ≼ ρI

(since ρ = max1lejlet σ21(AJj )) that

983042Axk minus b98304222

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρIIAIJ (AIJ )T(II)

T

983042AIJ 9830424F

983062(Axkminus1 minus b)

le 983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T983061AIJ (AIJ )T(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b)

+α2(Axkminus1 minus b)T983061ρII(II)

T

983042AIJ 9830422F

983062(Axkminus1 minus b) (by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 25 43

Taking expectation gives

E[983042Axk minus b98304222 |xkminus1]

le9830611 +

tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222 minus 2α(Axkminus1 minus b)T

983061AAT

983042A9830422F

983062(Axkminus1 minus b)

le9830611minus 2ασ2

r(A)minus tρα2

983042A9830422F

983062983042Axkminus1 minus b98304222

The last inequality follows from Axkminus1 minus b isin range(A) and Lemma 2Taking expectation again gives

E[983042Axk minus b98304222] = E[E[983042Axk minus b98304222 |xkminus1]]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062E[983042Axkminus1 minus b98304222]

le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 26 43

Remark 3

Note that(AJ )

Tb = (AJ )TAx983183

where x983183 is any solution of ATAx = ATb For inconsistent linearsystems we can prove DSBGS(1 t) has the convergence bounds Ift = n and 0 lt α lt 2σ2

r (A)983042A9830422F then

E[983042Axk minusAx98318398304222] le9830611 + α2 minus 2ασ2

r (A)

983042A9830422F

983062k

983042Ax0 minus b98304222

If t lt n and 0 lt α lt 2σ2r (A)(tρ) then

E[983042Axk minusAx98318398304222] le9830611minus 2ασ2

r (A)minus tρα2

983042A9830422F

983062k

983042Ax0 minus b98304222

whereρ = max

1lejletσ21(AJj )

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 27 43

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let Ekminus1

983045middot983046denote the conditional expectation conditioned on the first

k minus 1 iterations of REBK That is

Ekminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1

983046

where jl is the lth column block chosen and il is the lth row blockchosen

We denote the conditional expectation conditioned on the first k minus 1iterations and the kth column block chosen as

Eikminus1

983045middot983046= E

983045middot|j1 i1 j2 i2 jkminus1 ikminus1 jk

983046

Then by the law of total expectation we have

Ekminus1

983045middot983046= Ekminus1

983045Eikminus1

983045middot983046983046

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 29 43

41 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system

Ax = b

let xk be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

It holds

983042E983045xk minusAdaggerb

9830469830422 le δk

983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

where

δ = max1leiler

9830559830559830559830551minusασ2

i (A)

983042A9830422F

983055983055983055983055

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 30 43

Proof By ATb = ATAAdaggerb and

xk minusAdaggerb = xkminus1 minusAdaggerbminus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

we have

Ekminus1

983045xk minusAdaggerb

983046= Ekminus1

983045Eikminus1

983045xk minusAdaggerb

983046983046

= xkminus1 minusAdaggerbminus Ekminus1

983063αAT(Axkminus1 minus b+ zk)

983042A9830422F

983064

= xkminus1 minusAdaggerbminus αATAxkminus1 minusATb

983042A9830422Fminus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422FEkminus1

983045zk

983046

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

AT

983042A9830422F

983061Iminus α

AAT

983042A9830422F

983062zkminus1

=

983061Iminus α

ATA

983042A9830422F

983062(xkminus1 minusAdaggerb)minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422Fzkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 31 43

Taking expectation gives

E983045xk minusAdaggerb

983046

= E983045Ekminus1

983045xk minusAdaggerb

983046983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062AT

983042A9830422FE983045zkminus1

983046

=

983061Iminus α

ATA

983042A9830422F

983062E983045xkminus1 minusAdaggerb

983046minus α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

=

983061Iminus α

ATA

983042A9830422F

9830622

E983045xkminus2 minusAdaggerb

983046minus 2α

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F= middot middot middot

=

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 32 43

Applying the norms to both sides we obtain

983042E983045xk minusAdaggerb

9830469830422

=

983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)minus αk

983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le983056983056983056983056983056

983061Iminus α

ATA

983042A9830422F

983062k

(x0 minusAdaggerb)

9830569830569830569830569830562

+

983056983056983056983056983056αk983061Iminus α

ATA

983042A9830422F

983062kATz0

983042A9830422F

9830569830569830569830569830562

le δk983061983042x0 minusAdaggerb9830422 +

αk983042ATz09830422983042A9830422F

983062

Here the last inequality follows from the facts that

x0 minusAdaggerb isin range(AT) ATz0 isin range(AT)

and Lemma 3

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 33 43

42 Convergence of E983045983042xk minusAdaggerb98304222

983046

The convergence ofE983045983042xk minusAdaggerb98304222

983046

depends on the positive number ρ defined as

ρ = 1minus (2αminus α2)σ2r (A)

983042A9830422F

In the following lemma we show that the vector zk generated in REBKwith z0 isin b+ range(A) converges to

bperp = (IminusAAdagger)b

which is the orthogonal projection of z0 onto the set z | ATz = 0

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 34 43

Lemma 9

For any given consistent or inconsistent linear system

Ax = b

let zk be the vector generated in REBK with

z0 isin b+ range(A)

It holdsE983045983042zk minus bperp98304222

983046le ρk983042z0 minus bperp98304222

Proof By (AJj )Tbperp = 0 we have

zk minus bperp = zkminus1 minus bperp minus α

983042AJj9830422FAJj (AJj )

T(zkminus1 minus bperp) (1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 35 43

By z0 minus bperp = AAdaggerz0 isin range(A) we can show by induction that

zk minus bperp isin range(A)

It follows from (1) that

983042zk minus bperp98304222

= 983042zkminus1 minus bperp98304222 minus2α983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

+α2

983042AJj9830424F(zkminus1 minus bperp)

TAJj (AJj )TAJj (AJj )

T(zkminus1 minus bperp)

le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042(AJj )

T(zkminus1 minus bperp)98304222983042AJj9830422F

(by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 36 43

Taking the conditioned expectation on the first k minus 1 iterations yields

Ekminus1

983045983042zk minus bperp98304222

983046

le 983042zkminus1 minus bperp98304222 minus(2αminus α2)983042AT(zkminus1 minus bperp)98304222

983042A9830422Fle ρ983042zkminus1 minus bperp98304222 (by Lemma 2)

Taking expectation again gives

E983045983042zk minus bperp98304222

983046= E

983045Ekminus1

983045983042zk minus bperp98304222

983046983046

le ρE983045983042zkminus1 minus bperp98304222

983046

le ρk983042z0 minus bperp98304222

This completes the proof

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 37 43

Theorem 10

For any given consistent or inconsistent linear system Ax = b let xk

be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

For any ε gt 0 it holds

E983045983042xk minusAdaggerb98304222

983046le (1 + ε)kρk

983061(1 + ε)α2983042z0 minus bperp98304222

ε2983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Proof Let

983141xk = xkminus1 minus α

983042AIi9830422F(AIi)

TAIi(xkminus1 minusAdaggerb)

which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43

It follows from

xk minus 983141xk =α

983042AIi9830422F(AIi)

T(bIi minusAIiAdaggerbminus zkIi)

that

983042xk minus 983141xk98304222

=α2

983042AIi9830424F(bIi minusAIiA

daggerbminus zkIi)TAIi(AIi)

T(bIi minusAIiAdaggerbminus zkIi

)

leα2983042bIi minusAIiA

daggerbminus zkIi98304222

983042AIi9830422F (by Lemma 1) (2)

It follows from

Ekminus1

983045983042xk minus 983141xk98304222

983046= Ekminus1

983045Eikminus1

983045983042xk minus 983141xk98304222

983046983046

le Ekminus1

983063α2983042bminusAAdaggerbminus zk98304222

983042A9830422F

983064(by (2))

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43

that

E983045983042xk minus 983141xk98304222

983046le α2

983042A9830422FE983045983042bminusAAdaggerbminus zk98304222

983046

le α2ρk

983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have

$x^0 - A^\dagger b \in \mathrm{range}(A^T)$.

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

\[
\begin{aligned}
\|\hat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^T (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^T A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)},
\end{aligned}
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]

which yields

\[
\mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big].
\tag{4}
\]

Note that for any $\varepsilon > 0$, we have

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \hat{x}^k + \hat{x}^k - A^\dagger b\|_2^2 \le \big(\|x^k - \hat{x}^k\|_2 + \|\hat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \hat{x}^k\|_2 \|\hat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \hat{x}^k\|_2^2 + (1 + \varepsilon) \|\hat{x}^k - A^\dagger b\|_2^2.
\end{aligned}
\tag{5}
\]
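The last step in (5) is an instance of Young's inequality. As a worked step, write $u$ and $v$ (names introduced here only for illustration) for the two nonnegative norms $\|x^k - \hat{x}^k\|_2$ and $\|\hat{x}^k - A^\dagger b\|_2$. Then for any $\varepsilon > 0$:

\[
0 \le \left(\frac{u}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\, v\right)^2 = \frac{u^2}{\varepsilon} - 2uv + \varepsilon v^2
\quad\Longrightarrow\quad
2uv \le \frac{u^2}{\varepsilon} + \varepsilon v^2,
\]

so that

\[
u^2 + v^2 + 2uv \le \left(1 + \frac{1}{\varepsilon}\right) u^2 + (1 + \varepsilon) v^2 .
\]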

Combining (3), (4), and (5) yields

\[
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\big[\|x^k - \hat{x}^k\|_2^2\big] + (1 + \varepsilon)\, \mathbb{E}\big[\|\hat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon) \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1 + \varepsilon)\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k \rho^k \left( \frac{(1 + \varepsilon)\, \alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
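The final inequality of the chain bounds the geometric sum: (1 + 1/ε)(1 + (1 + ε) + ... + (1 + ε)^(k-1)) = (1 + ε)((1 + ε)^k - 1)/ε^2, which is at most (1 + ε)^(k+1)/ε^2. The following small Python check, with function names that are illustrative and not from the lecture, evaluates both sides for a few values of ε and k.

```python
def geometric_factor(eps, k):
    """(1 + 1/eps) * sum_{l=0}^{k-1} (1 + eps)^l: the factor before the last step."""
    return (1.0 + 1.0 / eps) * sum((1.0 + eps) ** l for l in range(k))


def closed_form_factor(eps, k):
    """(1 + eps)^(k+1) / eps^2: the factor that appears in the final bound."""
    return (1.0 + eps) ** (k + 1) / eps**2


# Each tuple is (eps, k, factor before the last step, factor in the final bound).
checks = [
    (eps, k, geometric_factor(eps, k), closed_form_factor(eps, k))
    for eps in (0.01, 0.1, 1.0)
    for k in (1, 5, 50)
]
for eps, k, g, c in checks:
    print(f"eps={eps:<5} k={k:<3} sum factor={g:.6g}  closed form={c:.6g}")
```

The bound is only informative when (1 + ε)ρ < 1, so ε has to be chosen small enough relative to 1 - ρ.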

Remark 4

For the case REBK with $s = m$, $t = n$, and $\alpha = 1$ (i.e., REK), by the orthogonality

\[
(\hat{x}^k - A^\dagger b)^T (x^k - \hat{x}^k) = 0,
\]

the equation (5) becomes

\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \hat{x}^k\|_2^2 + \|\hat{x}^k - A^\dagger b\|_2^2,
\]

which yields the following convergence for REK:

\[
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]

[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.


Taking expectation gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]
&\le \left(1 + \frac{t \rho \alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2 - 2\alpha (Ax^{k-1} - b)^T \left(\frac{AA^T}{\|A\|_F^2}\right) (Ax^{k-1} - b) \\
&\le \left(1 - \frac{2\alpha \sigma_r^2(A) - t \rho \alpha^2}{\|A\|_F^2}\right) \|Ax^{k-1} - b\|_2^2.
\end{aligned}
\]

The last inequality follows from $Ax^{k-1} - b \in \mathrm{range}(A)$ and Lemma 2. Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\big[\|Ax^k - b\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}\big[\|Ax^k - b\|_2^2 \,\big|\, x^{k-1}\big]\big] \\
&\le \left(1 - \frac{2\alpha \sigma_r^2(A) - t \rho \alpha^2}{\|A\|_F^2}\right) \mathbb{E}\big[\|Ax^{k-1} - b\|_2^2\big] \\
&\le \left(1 - \frac{2\alpha \sigma_r^2(A) - t \rho \alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\end{aligned}
\]

Remark 3

Note that

\[
(A_{:,\mathcal{J}})^T b = (A_{:,\mathcal{J}})^T A x_\star,
\]

where $x_\star$ is any solution of $A^T A x = A^T b$. For inconsistent linear systems, we can prove that DSBGS(1, t) has the following convergence bounds. If $t = n$ and $0 < \alpha < 2\sigma_r^2(A)/\|A\|_F^2$, then

\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 + \alpha^2 - \frac{2\alpha \sigma_r^2(A)}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2.
\]

If $t < n$ and $0 < \alpha < 2\sigma_r^2(A)/(t\rho)$, then

\[
\mathbb{E}\big[\|Ax^k - Ax_\star\|_2^2\big] \le \left(1 - \frac{2\alpha \sigma_r^2(A) - t \rho \alpha^2}{\|A\|_F^2}\right)^k \|Ax^0 - b\|_2^2,
\]

where

\[
\rho = \max_{1 \le j \le t} \sigma_1^2(A_{:,\mathcal{J}_j}).
\]
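To make the step-size conditions in Remark 3 concrete, the sketch below computes the upper limit on α and the resulting contraction factor for a given column partition. It assumes the two formulas exactly as read from the remark; the function name dsbgs_step_and_rate, the parameter frac (the fraction of the limit used as α), and the random test matrix are illustrative and not from the lecture.

```python
import numpy as np


def dsbgs_step_and_rate(A, col_blocks, frac=0.5):
    """Return (alpha, rate): alpha = frac * (step-size limit of Remark 3),
    rate = the corresponding per-iteration contraction factor."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    sigma = np.linalg.svd(A, compute_uv=False)
    # sigma_r^2: square of the smallest nonzero singular value.
    sigma_r2 = sigma[sigma > 1e-12 * sigma[0]].min() ** 2
    t, n = len(col_blocks), A.shape[1]
    if t == n:  # every block is a single column
        alpha = frac * 2.0 * sigma_r2 / fro2
        rate = 1.0 + alpha**2 - 2.0 * alpha * sigma_r2 / fro2
    else:
        # rho = max_j sigma_1^2(A[:, J_j]), the largest squared spectral norm.
        rho = max(np.linalg.norm(A[:, J], 2) ** 2 for J in col_blocks)
        alpha = frac * 2.0 * sigma_r2 / (t * rho)
        rate = 1.0 - (2.0 * alpha * sigma_r2 - t * rho * alpha**2) / fro2
    return alpha, rate


rng = np.random.default_rng(0)
A = rng.standard_normal((9, 4))
alpha_cols, rate_cols = dsbgs_step_and_rate(A, [[0], [1], [2], [3]])   # t = n
alpha_blocks, rate_blocks = dsbgs_step_and_rate(A, [[0, 1], [2, 3]])   # t < n
print(alpha_cols, rate_cols)
print(alpha_blocks, rate_blocks)
```

Any frac strictly between 0 and 1 keeps α inside the admissible interval, so both rates are below 1.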

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2: Randomized extended block Kaczmarz (REBK)

Let $\{\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_s\}$ and $\{\mathcal{J}_1, \mathcal{J}_2, \ldots, \mathcal{J}_t\}$ be partitions of $[m]$ and $[n]$, respectively.
Let $\alpha > 0$. Initialize $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.
for $k = 1, 2, \ldots$ do
    Pick $j \in [t]$ with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$
    Set $z^k = z^{k-1} - \dfrac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T z^{k-1}$
    Pick $i \in [s]$ with probability $\|A_{\mathcal{I}_i,:}\|_F^2 / \|A\|_F^2$
    Set $x^k = x^{k-1} - \dfrac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big)$

By choosing $s = m$, $t = n$, and $\alpha = 1$, we recover the well-known randomized extended Kaczmarz (REK) algorithm of Zouzias and Freris [8].
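A minimal NumPy sketch of Algorithm 2 follows. It assumes the particular initialization z^0 = b and x^0 = 0, which satisfies both initialization conditions; the function name rebk, the fixed iteration count, and the small random test problem are illustrative choices, not part of the lecture.

```python
import numpy as np


def rebk(A, b, row_blocks, col_blocks, alpha=1.0, iters=5000, seed=0):
    """Sketch of REBK (Algorithm 2), started from z0 = b and x0 = 0."""
    rng = np.random.default_rng(seed)
    # Squared Frobenius norms of the row blocks and column blocks.
    row_w = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    z = b.astype(float).copy()      # z0 = b lies in b + range(A)
    x = np.zeros(A.shape[1])        # x0 = 0 lies in range(A^T)
    for _ in range(iters):
        # Column-block step: drives z toward b_perp = (I - A A^+) b.
        j = rng.choice(len(col_blocks), p=col_w / col_w.sum())
        AJ = A[:, col_blocks[j]]
        z = z - (alpha / col_w[j]) * (AJ @ (AJ.T @ z))
        # Row-block step on the corrected right-hand side b - z.
        i = rng.choice(len(row_blocks), p=row_w / row_w.sum())
        I = row_blocks[i]
        AI = A[I, :]
        x = x - (alpha / row_w[i]) * (AI.T @ (AI @ x - b[I] + z[I]))
    return x, z


rng = np.random.default_rng(1)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)         # generically not in range(A): inconsistent system
row_blocks = [np.arange(0, 6), np.arange(6, 12)]
col_blocks = [np.arange(0, 2), np.arange(2, 4)]

x, z = rebk(A, b, row_blocks, col_blocks)
x_ls = np.linalg.pinv(A) @ b        # reference pseudoinverse solution A^+ b
err = np.linalg.norm(x - x_ls)
print("distance to pseudoinverse solution:", err)
```

Note that the iteration itself uses only products with blocks of A and their transposes; the pseudoinverse appears only in the reference solution used for comparison.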

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],
\]

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

\[
\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].
\]

Then, by the law of total expectation, we have

\[
\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].
\]

4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^T)$.

It holds that

\[
\big\|\mathbb{E}\big[x^k - A^\dagger b\big]\big\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right),
\]

where

\[
\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|.
\]

Proof. By $A^T b = A^T A A^\dagger b$ and

\[
x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^T \big(A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}\big),
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[x^k - A^\dagger b\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[x^k - A^\dagger b\big]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T (Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \frac{\alpha (A^T A x^{k-1} - A^T b)}{\|A\|_F^2} - \frac{\alpha A^T}{\|A\|_F^2} \mathbb{E}_{k-1}\big[z^k\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2} \mathbb{E}_{k-1}\big[z^k\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \frac{\alpha A^T}{\|A\|_F^2} \left(I - \frac{\alpha A A^T}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2} z^{k-1}.
\end{aligned}
\]

Taking expectation gives

\[
\begin{aligned}
\mathbb{E}\big[x^k - A^\dagger b\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[x^k - A^\dagger b\big]\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \mathbb{E}\big[x^{k-1} - A^\dagger b\big] - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \frac{A^T}{\|A\|_F^2} \mathbb{E}\big[z^{k-1}\big] \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right) \mathbb{E}\big[x^{k-1} - A^\dagger b\big] - \alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^2 \mathbb{E}\big[x^{k-2} - A^\dagger b\big] - 2\alpha \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2}.
\end{aligned}
\]

Applying the norms to both sides, we obtain

\[
\begin{aligned}
\big\|\mathbb{E}\big[x^k - A^\dagger b\big]\big\|_2
&= \left\| \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) - \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \frac{\alpha A^T A}{\|A\|_F^2}\right)^k \frac{A^T z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^T z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}
\]

Here the last inequality follows from the facts that

$x^0 - A^\dagger b \in \mathrm{range}(A^T)$, $A^T z^0 \in \mathrm{range}(A^T)$,

and Lemma 3.
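The closed form for the expected error derived in this proof is deterministic, so the bound of Theorem 8 can be checked numerically without running the randomized iteration. The sketch below does this for a small random full-column-rank matrix; the problem sizes, the value of k, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha, k = 8, 5, 1.0, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
fro2 = np.linalg.norm(A, "fro") ** 2

x_star = np.linalg.pinv(A) @ b              # pseudoinverse solution A^+ b
x0 = A.T @ rng.standard_normal(m)           # x0 in range(A^T)
z0 = b + A @ rng.standard_normal(n)         # z0 in b + range(A)

# M = I - alpha * A^T A / ||A||_F^2 and its k-th power.
M = np.eye(n) - alpha * (A.T @ A) / fro2
Mk = np.linalg.matrix_power(M, k)

# Closed form for E[x^k - A^+ b] from the last line of the derivation.
mean_err = Mk @ (x0 - x_star) - alpha * k * (Mk @ (A.T @ z0)) / fro2
lhs = np.linalg.norm(mean_err)

# delta = max_i |1 - alpha * sigma_i^2 / ||A||_F^2| over nonzero singular values.
sigma = np.linalg.svd(A, compute_uv=False)
sigma = sigma[sigma > 1e-12 * sigma[0]]
delta = np.max(np.abs(1.0 - alpha * sigma**2 / fro2))
bound = delta**k * (np.linalg.norm(x0 - x_star)
                    + alpha * k * np.linalg.norm(A.T @ z0) / fro2)

print("norm of expected error:", lhs)
print("Theorem 8 bound:       ", bound)
```

The factor αk in the bound grows linearly, but it is dominated by the geometric decay δ^k whenever δ < 1.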

4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\, \sigma_r^2(A)}{\|A\|_F^2}.
\]

In the following lemma, we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

\[
b_\perp = (I - AA^\dagger) b,
\]

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^T z = 0\}$.

Lemma 9

For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with

$z^0 \in b + \mathrm{range}(A)$.

It holds that

\[
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]

Proof. By $(A_{:,\mathcal{J}_j})^T b_\perp = 0$, we have

\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp).
\tag{1}
\]

By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$z^k - b_\perp \in \mathrm{range}(A)$.

It follows from (1) that

\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^T (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]

Taking the conditioned expectation on the first $k-1$ iterations yields

\[
\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^T (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]

This completes the proof.
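Two facts used in this proof hold on every sample path, not only in expectation: the distance from z^k to b_perp never increases when 0 < α < 2, and z^k - b_perp stays in range(A). The sketch below runs only the z-update of REBK and records both; the step size α = 1.5, the column partition, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
alpha = 1.5                                   # any step size with 0 < alpha < 2
col_blocks = [np.array([0]), np.array([1, 2])]

P = A @ np.linalg.pinv(A)                     # orthogonal projector onto range(A)
b_perp = b - P @ b                            # b_perp = (I - A A^+) b
z = b + A @ rng.standard_normal(3)            # z0 in b + range(A)

col_w = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
dists = [np.linalg.norm(z - b_perp)]
for _ in range(2000):
    # Column-block update of z from Algorithm 2.
    j = rng.choice(len(col_blocks), p=col_w / col_w.sum())
    AJ = A[:, col_blocks[j]]
    z = z - (alpha / col_w[j]) * (AJ @ (AJ.T @ z))
    dists.append(np.linalg.norm(z - b_perp))

# Per-step inequality from the proof: the distance to b_perp is non-increasing.
monotone = all(d1 <= d0 + 1e-12 for d0, d1 in zip(dists, dists[1:]))
# Induction claim from the proof: z^k - b_perp remains in range(A).
residual = z - b_perp
in_range = np.linalg.norm(residual - P @ residual) < 1e-10

print("initial distance:", dists[0], " final distance:", dists[-1])
print("monotone:", monotone, " in range(A):", in_range)
```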

Page 28: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

4 Randomized extended block Kaczmarz (REBK) [1]

Algorithm 2 Randomized extended block Kaczmarz (REBK)

Let I1 I2 Is and J1J2 Jt be partitions of [m]and [n] respectivelyLet α gt 0 Initialize z0 isin b+ range(A) and x0 isin range(AT)for k = 1 2 do

Pick j isin [t] with probability 983042AJj9830422F983042A9830422FSet zk = zkminus1 minus α

983042AJj9830422FAJj (AJj )

Tzkminus1

Pick i isin [s] with probability 983042AIi9830422F983042A9830422FSet xk = xkminus1 minus α

983042AIi9830422F(AIi)

T(AIixkminus1 minus bIi + zkIi)

By choosing s = m t = n and α = 1 we recover the well-knownrandomized extended Kaczmarz (REK) algorithm of Zouzias and Freris[8]

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 28 43

Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of REBK. That is,

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}],$$

where $j_l$ is the $l$th column block chosen and $i_l$ is the $l$th row block chosen.

We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column block chosen as

$$\mathbb{E}_{k-1}^{i}[\cdot] = \mathbb{E}[\,\cdot \mid j_1, i_1, j_2, i_2, \ldots, j_{k-1}, i_{k-1}, j_k].$$

Then, by the law of total expectation, we have

$$\mathbb{E}_{k-1}[\cdot] = \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[\cdot]\big].$$
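As a worked instance of these conditional expectations, the following step shows how the sampling probabilities cancel the block normalization when averaging over the row block $i$. It is a sketch that uses only the partition property; the vector $v$ is a placeholder (an assumption of this example) for any vector that is fixed once the first $k-1$ iterations and $j_k$ are known, and every block is assumed to have nonzero Frobenius norm.

```latex
\begin{align*}
\mathbb{E}_{k-1}^{i}\!\left[\frac{(A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:}}{\|A_{\mathcal{I}_i,:}\|_F^2}\, v\right]
&= \sum_{i=1}^{s} \frac{\|A_{\mathcal{I}_i,:}\|_F^2}{\|A\|_F^2}
   \cdot \frac{(A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:}}{\|A_{\mathcal{I}_i,:}\|_F^2}\, v \\
&= \frac{1}{\|A\|_F^2} \sum_{i=1}^{s} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:}\, v
 = \frac{A^\top A}{\|A\|_F^2}\, v .
\end{align*}
```

The last equality holds because the row blocks partition $[m]$. The same computation over the column blocks gives $A A^\top / \|A\|_F^2$ for the $z$ update.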


4.1 Convergence of the norms of the expectations

Theorem 8

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^\top)$. It holds

$$\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2 \le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right),$$

where

$$\delta = \max_{1 \le i \le r} \left| 1 - \frac{\alpha \sigma_i^2(A)}{\|A\|_F^2} \right|.$$


Proof. By $A^\top b = A^\top A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (A_{\mathcal{I}_i,:} x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i}),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}[x^k - A^\dagger b]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}[x^k - A^\dagger b]\big] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^\top (A x^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha \frac{A^\top A x^{k-1} - A^\top b}{\|A\|_F^2} - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \mathbb{E}_{k-1}[z^k] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \frac{A^\top}{\|A\|_F^2} \left(I - \alpha \frac{A A^\top}{\|A\|_F^2}\right) z^{k-1} \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \frac{A^\top}{\|A\|_F^2} z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}[x^k - A^\dagger b]
&= \mathbb{E}\big[\mathbb{E}_{k-1}[x^k - A^\dagger b]\big] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \frac{A^\top}{\|A\|_F^2} \mathbb{E}[z^{k-1}] \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right) \mathbb{E}[x^{k-1} - A^\dagger b] - \alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} \frac{A^\top z^0}{\|A\|_F^2} \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{2} \mathbb{E}[x^{k-2} - A^\dagger b] - 2\alpha \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} \frac{A^\top z^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) - \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} \frac{A^\top z^0}{\|A\|_F^2}.
\end{aligned}$$

Applying the norms to both sides, we obtain

$$\begin{aligned}
\big\|\mathbb{E}[x^k - A^\dagger b]\big\|_2
&= \left\| \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) - \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \left\| \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} (x^0 - A^\dagger b) \right\|_2 + \left\| \alpha k \left(I - \alpha \frac{A^\top A}{\|A\|_F^2}\right)^{k} \frac{A^\top z^0}{\|A\|_F^2} \right\|_2 \\
&\le \delta^k \left( \|x^0 - A^\dagger b\|_2 + \frac{\alpha k \|A^\top z^0\|_2}{\|A\|_F^2} \right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \mathrm{range}(A^\top), \qquad A^\top z^0 \in \mathrm{range}(A^\top),$$

and Lemma 3.
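The closed form for the expected error obtained in this proof can be checked exactly on a tiny example by enumerating every possible sequence of block choices. The sketch below is only an illustration under stated assumptions: the 3-by-2 matrix, the right-hand side, the block partitions, and the helper names `step`, `expected_error` and `closed_form` are all made up for this example, and $A^\dagger b = (0, 1)$ was computed by hand from the normal equations. The closed form relies on $\mathbb{E}[z^{k-1}] = (I - \alpha A A^\top / \|A\|_F^2)^{k-1} z^0$ together with $A^\top (I - \alpha A A^\top / \|A\|_F^2) = (I - \alpha A^\top A / \|A\|_F^2) A^\top$.

```python
import itertools

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy 3x2 matrix (assumption)
b = [1.0, 2.0, 0.0]                         # inconsistent right-hand side
x_star = [0.0, 1.0]                         # A^+ b, from the normal equations
row_blocks, col_blocks = [[0, 1], [2]], [[0], [1]]
alpha, m, n = 1.0, 3, 2
fro = sum(v * v for row in A for v in row)  # ||A||_F^2
col_w = [sum(A[r][c] ** 2 for r in range(m) for c in J) for J in col_blocks]
row_w = [sum(A[r][c] ** 2 for r in I for c in range(n)) for I in row_blocks]


def step(x, z, j, i):
    """One REBK iteration with column block j and row block i."""
    J, I = col_blocks[j], row_blocks[i]
    u = [sum(A[r][c] * z[r] for r in range(m)) for c in J]  # A_J^T z
    z = [z[r] - alpha / col_w[j] * sum(A[r][c] * uc for c, uc in zip(J, u))
         for r in range(m)]
    res = [sum(A[r][c] * x[c] for c in range(n)) - b[r] + z[r] for r in I]
    x = [x[c] - alpha / row_w[i] * sum(A[r][c] * rr for r, rr in zip(I, res))
         for c in range(n)]
    return x, z


def expected_error(k):
    """Exact E[x^k - A^+ b], enumerating all (j, i) choices over k iterations."""
    pairs = [(j, i) for j in range(len(col_blocks)) for i in range(len(row_blocks))]
    mean = [0.0] * n
    for seq in itertools.product(pairs, repeat=k):
        x, z, prob = [0.0] * n, list(b), 1.0   # x^0 = 0, z^0 = b
        for j, i in seq:
            prob *= (col_w[j] / fro) * (row_w[i] / fro)
            x, z = step(x, z, j, i)
        mean = [mc + prob * (xc - sc) for mc, xc, sc in zip(mean, x, x_star)]
    return mean


def closed_form(k):
    """M^k (x^0 - A^+ b) - alpha k M^k A^T z^0 / ||A||_F^2 with M = I - alpha A^T A / ||A||_F^2."""
    M = [[(1.0 if p == q else 0.0)
          - alpha * sum(A[r][p] * A[r][q] for r in range(m)) / fro
          for q in range(n)] for p in range(n)]
    e0 = [-s for s in x_star]                                        # x^0 - A^+ b
    g = [sum(A[r][c] * b[r] for r in range(m)) for c in range(n)]    # A^T z^0
    v = [ec - alpha * k * gc / fro for ec, gc in zip(e0, g)]
    for _ in range(k):                                               # apply M k times
        v = [sum(M[p][q] * v[q] for q in range(n)) for p in range(n)]
    return v
```

For small `k` the two functions should agree up to rounding. The enumeration grows like $(st)^k$, so this check is only practical for a handful of iterations.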


4.2 Convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$

The convergence of $\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$b_\perp = (I - A A^\dagger) b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^\top z = 0\}$.


Lemma 9

For any given consistent or inconsistent linear system $Ax = b$, let $z^k$ be the vector generated in REBK with $z^0 \in b + \mathrm{range}(A)$. It holds

$$\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big] \le \rho^k \|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,\mathcal{J}_j})^\top b_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp). \qquad (1)$$

By $z^0 - b_\perp = A A^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \mathrm{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha \|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4} (z^{k-1} - b_\perp)^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|(A_{:,\mathcal{J}_j})^\top (z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}$$

Taking the conditional expectation given the first $k-1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2) \|A^\top (z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}\big[\|z^k - b_\perp\|_2^2\big]
&= \mathbb{E}\big[\mathbb{E}_{k-1}\big[\|z^k - b_\perp\|_2^2\big]\big] \\
&\le \rho\, \mathbb{E}\big[\|z^{k-1} - b_\perp\|_2^2\big] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.
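Lemma 9 can be illustrated by computing the expectation exactly on a small example, enumerating every sequence of column-block choices instead of sampling. The following is a sketch under stated assumptions: the toy matrix, the vectors `z0` and `b_perp` (worked out by hand for $b = (1, 2, 0)$), the single-column blocks, and the names `col_step` and `expected_sq_error` are all hypothetical choices for this example. The value of $\sigma_r^2(A)$ is taken from the closed-form eigenvalues of the 2-by-2 matrix $A^\top A$, which works here only because the toy matrix has two columns and full column rank.

```python
import itertools
import math

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy 3x2 matrix (assumption)
z0 = [1.0, 2.0, 0.0]                        # z^0 = b
b_perp = [1.0, 1.0, -1.0]                   # (I - A A^+) b, computed by hand
col_blocks = [[0], [1]]
alpha, m = 1.0, 3
fro = sum(v * v for row in A for v in row)  # ||A||_F^2
col_w = [sum(A[r][c] ** 2 for r in range(m) for c in J) for J in col_blocks]


def col_step(z, j):
    """Column-block update z - alpha / ||A_J||_F^2 * A_J (A_J^T z)."""
    J = col_blocks[j]
    u = [sum(A[r][c] * z[r] for r in range(m)) for c in J]
    return [z[r] - alpha / col_w[j] * sum(A[r][c] * uc for c, uc in zip(J, u))
            for r in range(m)]


def expected_sq_error(k):
    """Exact E[||z^k - b_perp||_2^2] over all column-block sequences of length k."""
    total = 0.0
    for seq in itertools.product(range(len(col_blocks)), repeat=k):
        z, prob = list(z0), 1.0
        for j in seq:
            prob *= col_w[j] / fro
            z = col_step(z, j)
        total += prob * sum((zi - pi) ** 2 for zi, pi in zip(z, b_perp))
    return total


# Smallest eigenvalue of the 2x2 Gram matrix A^T A equals sigma_r^2(A) here.
g00 = sum(A[r][0] ** 2 for r in range(m))
g01 = sum(A[r][0] * A[r][1] for r in range(m))
g11 = sum(A[r][1] ** 2 for r in range(m))
tr, det = g00 + g11, g00 * g11 - g01 ** 2
sigma_r_sq = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
rho = 1.0 - (2.0 * alpha - alpha ** 2) * sigma_r_sq / fro

initial = sum((zi - pi) ** 2 for zi, pi in zip(z0, b_perp))
bounds_hold = all(expected_sq_error(k) <= rho ** k * initial + 1e-12
                  for k in range(7))
```

On this example the exact expectation stays below the bound for every `k` tried, with visible slack, which is consistent with the lemma being an upper bound rather than an equality.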


Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with $z^0 \in b + \mathrm{range}(A)$ and $x^0 \in \mathrm{range}(A^\top)$. For any $\varepsilon > 0$, it holds

$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$

Proof. Let

$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = A A^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})$$

that

$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top (b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}) \\
&\le \frac{\alpha^2 \|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:} A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&= \mathbb{E}_{k-1}\big[\mathbb{E}_{k-1}^{i}\big[\|x^k - \widehat{x}^k\|_2^2\big]\big] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2 \|b - A A^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \qquad \text{(by (2))}
\end{aligned}$$

that

$$\begin{aligned}
\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big]
&\le \frac{\alpha^2}{\|A\|_F^2} \mathbb{E}\big[\|b - A A^\dagger b - z^k\|_2^2\big] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}$$

By $x^0 \in \mathrm{range}(A^\top)$ and $A^\dagger b \in \mathrm{range}(A^\top)$, we have

$$x^0 - A^\dagger b \in \mathrm{range}(A^\top).$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^\top)$ by induction. By

$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4} (x^{k-1} - A^\dagger b)^\top (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^\top A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \qquad \text{(by Lemma 1)},
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2) \|A (x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho \|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}$$

which yields

$$\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \le \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big]. \qquad (4)$$

Note that for any $\varepsilon > 0$, we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \big(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\big)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2 \|x^k - \widehat{x}^k\|_2 \|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon) \|\widehat{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}$$

Combining (3), (4) and (5) yields

$$\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right) \mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1+\varepsilon)\, \mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon) \rho\, \mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1+\varepsilon)\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2 \rho^2\, \mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right) \big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big) \frac{\alpha^2 \rho^k}{\|A\|_F^2} \|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k \rho^k \|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2 \|z^0 - b_\perp\|_2^2}{\varepsilon^2 \|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}$$
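The last inequality of this chain rests on a geometric-sum estimate that is not spelled out. A short derivation, valid for any $\varepsilon > 0$ and using only the standard formula for a finite geometric series (the summation index $l$ is introduced here for illustration):

```latex
\begin{align*}
\left(1+\frac{1}{\varepsilon}\right)\sum_{l=0}^{k-1}(1+\varepsilon)^{l}
&= \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^{k}-1}{\varepsilon} \\
&\le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^{2}}
 = (1+\varepsilon)^{k}\,\frac{1+\varepsilon}{\varepsilon^{2}} .
\end{align*}
```

Multiplying by $\alpha^2 \rho^k \|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the final bracket.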


Remark 4

For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k - A^\dagger b)^\top (x^k - \widehat{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k \left( \frac{k \|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).$$
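To compare the general REBK bound with the sharper REK bound, it can help to evaluate both right-hand sides numerically. The two helpers below are a small sketch, not part of the lecture: the function names `rebk_bound` and `rek_bound` and all argument names are assumptions, and the inputs are the squared initial errors and the squared Frobenius norm as plain numbers.

```python
def rebk_bound(k, eps, rho, alpha, z_err0_sq, x_err0_sq, fro_sq):
    """Right-hand side of the REBK mean-square bound for a given eps > 0."""
    growth = ((1.0 + eps) * rho) ** k
    z_term = (1.0 + eps) * alpha ** 2 * z_err0_sq / (eps ** 2 * fro_sq)
    return growth * (z_term + x_err0_sq)


def rek_bound(k, rho, z_err0_sq, x_err0_sq, fro_sq):
    """Right-hand side of the REK bound (alpha = 1, single rows and columns)."""
    return rho ** k * (k * z_err0_sq / fro_sq + x_err0_sq)


# Example values (assumptions): rho = 0.75, ||z^0 - b_perp||^2 = 2,
# ||x^0 - A^+ b||^2 = 1, ||A||_F^2 = 4.
small_eps = [rebk_bound(k, 0.1, 0.75, 1.0, 2.0, 1.0, 4.0) for k in (50, 100)]
large_eps = [rebk_bound(k, 0.5, 0.75, 1.0, 2.0, 1.0, 4.0) for k in (50, 100)]
rek_one = rek_bound(1, 0.75, 2.0, 1.0, 4.0)
```

Because the REBK bound carries the factor $((1+\varepsilon)\rho)^k$, it decays only when $(1+\varepsilon)\rho < 1$, that is, when $\varepsilon < 1/\rho - 1$. Smaller $\varepsilon$ improves the rate but inflates the constant through the $1/\varepsilon^2$ term, so in practice one would tune `eps` for the iteration count of interest.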


[1] Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

[2] Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

[3] Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615-624, 1951.

[4] D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641-654, 2010.

[5] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590-1604, 2015.

[6] Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465-496, 2019.

[7] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009.

[8] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773-793, 2013.


Thomas Strohmer and Roman Vershynin

A randomized Kaczmarz algorithm with exponential convergenceJ Fourier Anal Appl 15(2)262ndash278 2009

Anastasios Zouzias and Nikolaos M Freris

Randomized extended Kaczmarz for solving least squaresSIAM J Matrix Anal Appl 34(2)773ndash793 2013

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 43 43

Proof. By $A^T b = A^T A A^\dagger b$ and

$$x^k - A^\dagger b = x^{k-1} - A^\dagger b - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T\left(A_{I_i,:}x^{k-1} - b_{I_i} + z^k_{I_i}\right),$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}\left[x^k - A^\dagger b\right]
&= \mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}\left[x^k - A^\dagger b\right]\right] \\
&= x^{k-1} - A^\dagger b - \mathbb{E}_{k-1}\left[\frac{\alpha A^T(Ax^{k-1} - b + z^k)}{\|A\|_F^2}\right] \\
&= x^{k-1} - A^\dagger b - \alpha\frac{A^TAx^{k-1} - A^Tb}{\|A\|_F^2} - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\left[z^k\right] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\mathbb{E}_{k-1}\left[z^k\right] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\frac{A^T}{\|A\|_F^2}\left(I - \alpha\frac{AA^T}{\|A\|_F^2}\right)z^{k-1} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)(x^{k-1} - A^\dagger b) - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}z^{k-1}.
\end{aligned}$$

Taking expectation gives

$$\begin{aligned}
\mathbb{E}\left[x^k - A^\dagger b\right]
&= \mathbb{E}\left[\mathbb{E}_{k-1}\left[x^k - A^\dagger b\right]\right] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}\left[x^{k-1} - A^\dagger b\right] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\frac{A^T}{\|A\|_F^2}\mathbb{E}\left[z^{k-1}\right] \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)\mathbb{E}\left[x^{k-1} - A^\dagger b\right] - \alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2} \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{2}\mathbb{E}\left[x^{k-2} - A^\dagger b\right] - 2\alpha\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2} \\
&= \cdots \\
&= \left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}.
\end{aligned}$$
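The replacement of $\mathbb{E}[z^{k-1}]$ by a power applied to $z^0$ in the chain above rests on two facts that the slide leaves implicit. The following is a short derivation, assuming only the recursion $\mathbb{E}_{k-1}[z^k] = (I - \alpha AA^T/\|A\|_F^2)\,z^{k-1}$ that is used earlier in this proof:

```latex
\begin{aligned}
\mathbb{E}\big[z^{k-1}\big]
  &= \Big(I - \alpha\frac{AA^T}{\|A\|_F^2}\Big)^{k-1} z^0
  && \text{(apply the recursion $k-1$ times)}, \\
A^T\Big(I - \alpha\frac{AA^T}{\|A\|_F^2}\Big)
  &= \Big(I - \alpha\frac{A^TA}{\|A\|_F^2}\Big)A^T
  && \text{(move $A^T$ through the factor)}, \\
\Big(I - \alpha\frac{A^TA}{\|A\|_F^2}\Big)A^T\,\mathbb{E}\big[z^{k-1}\big]
  &= \Big(I - \alpha\frac{A^TA}{\|A\|_F^2}\Big)^{k} A^T z^0 .
\end{aligned}
```

Because the $z^0$ term carries the same power $k$ at every level of the recursion, unrolling it $k$ times simply accumulates the factor $\alpha k$.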

Applying the norms to both sides, we obtain

$$\begin{aligned}
\left\|\mathbb{E}\left[x^k - A^\dagger b\right]\right\|_2
&= \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b) - \alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \left\|\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}(x^0 - A^\dagger b)\right\|_2 + \left\|\alpha k\left(I - \alpha\frac{A^TA}{\|A\|_F^2}\right)^{k}\frac{A^Tz^0}{\|A\|_F^2}\right\|_2 \\
&\le \delta^k\left(\|x^0 - A^\dagger b\|_2 + \frac{\alpha k\|A^Tz^0\|_2}{\|A\|_F^2}\right).
\end{aligned}$$

Here the last inequality follows from the facts that

$$x^0 - A^\dagger b \in \mathrm{range}(A^T), \qquad A^Tz^0 \in \mathrm{range}(A^T),$$

and Lemma 3.

4.2 Convergence of $\mathbb{E}\left[\|x^k - A^\dagger b\|_2^2\right]$

The convergence of $\mathbb{E}\left[\|x^k - A^\dagger b\|_2^2\right]$ depends on the positive number $\rho$ defined as

$$\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.$$

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \mathrm{range}(A)$ converges to

$$b_\perp = (I - AA^\dagger)b,$$

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^Tz = 0\}$.
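To get a feel for the size of $\rho$, the sketch below evaluates the definition from a list of singular values, using the identity $\|A\|_F^2 = \sum_i \sigma_i^2(A)$. The function name `contraction_factor` and the sample singular values are illustrative assumptions, not part of the lecture.

```python
def contraction_factor(alpha, singular_values):
    """Return rho = 1 - (2*alpha - alpha**2) * sigma_r(A)**2 / ||A||_F**2.

    `singular_values` is a list of the singular values of A; zeros are
    ignored, so sigma_r is the smallest nonzero singular value.
    """
    nonzero = [s for s in singular_values if s > 0.0]
    if not nonzero:
        raise ValueError("A must have at least one nonzero singular value")
    fro_sq = sum(s * s for s in nonzero)   # ||A||_F^2 = sum of sigma_i^2
    sigma_r_sq = min(nonzero) ** 2         # sigma_r(A)^2
    return 1.0 - (2.0 * alpha - alpha * alpha) * sigma_r_sq / fro_sq


if __name__ == "__main__":
    # Hypothetical matrix with singular values 3 and 4, so ||A||_F^2 = 25.
    for alpha in (0.5, 1.0, 1.5):
        print(alpha, contraction_factor(alpha, [3.0, 4.0]))
```

Since $2\alpha - \alpha^2 > 0$ exactly when $0 < \alpha < 2$, the factor satisfies $\rho < 1$ on that interval, and it is smallest at $\alpha = 1$.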

Lemma 9

For any given consistent or inconsistent linear system

$$Ax = b,$$

let $z^k$ be the vector generated in REBK with

$$z^0 \in b + \mathrm{range}(A).$$

It holds

$$\mathbb{E}\left[\|z^k - b_\perp\|_2^2\right] \le \rho^k\|z^0 - b_\perp\|_2^2.$$

Proof. By $(A_{:,J_j})^Tb_\perp = 0$, we have

$$z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,J_j}\|_F^2}A_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp). \qquad (1)$$

By $z^0 - b_\perp = AA^\dagger z^0 \in \mathrm{range}(A)$, we can show by induction that

$$z^k - b_\perp \in \mathrm{range}(A).$$

It follows from (1) that

$$\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - 2\alpha\frac{\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,J_j}\|_F^4}(z^{k-1} - b_\perp)^TA_{:,J_j}(A_{:,J_j})^TA_{:,J_j}(A_{:,J_j})^T(z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\frac{\|(A_{:,J_j})^T(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,J_j}\|_F^2} \qquad \text{(by Lemma 1)}.
\end{aligned}$$

Taking the expectation conditioned on the first $k-1$ iterations yields

$$\begin{aligned}
\mathbb{E}_{k-1}\left[\|z^k - b_\perp\|_2^2\right]
&\le \|z^{k-1} - b_\perp\|_2^2 - (2\alpha - \alpha^2)\frac{\|A^T(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|z^{k-1} - b_\perp\|_2^2 \qquad \text{(by Lemma 2)}.
\end{aligned}$$

Taking expectation again gives

$$\begin{aligned}
\mathbb{E}\left[\|z^k - b_\perp\|_2^2\right]
&= \mathbb{E}\left[\mathbb{E}_{k-1}\left[\|z^k - b_\perp\|_2^2\right]\right] \\
&\le \rho\,\mathbb{E}\left[\|z^{k-1} - b_\perp\|_2^2\right] \\
&\le \rho^k\|z^0 - b_\perp\|_2^2.
\end{aligned}$$

This completes the proof.
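Lemma 9 can be observed numerically on a small example. The sketch below runs the $z$-update with single-column blocks (every $J_j$ is one column) and assumes that column $j$ is drawn with probability $\|A_{:,j}\|_2^2/\|A\|_F^2$, the sampling rule implied by the expectation taken in the proof. The function name `z_iteration`, the test matrix, the iteration count, and the seed are all illustrative choices.

```python
import random


def z_iteration(A, z0, alpha=1.0, iters=2000, seed=0):
    """Single-column z-update: z <- z - alpha * (a_j^T z / ||a_j||^2) * a_j."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    col_sq = [sum(v * v for v in c) for c in cols]   # squared column norms
    z = list(z0)
    for _ in range(iters):
        # Assumed sampling rule: P(j) = ||A_{:,j}||^2 / ||A||_F^2.
        j = rng.choices(range(n), weights=col_sq)[0]
        coef = alpha * sum(c * v for c, v in zip(cols[j], z)) / col_sq[j]
        z = [v - coef * c for v, c in zip(z, cols[j])]
    return z


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 0.0]
    # z^0 = b lies in b + range(A). For this A and b the least-squares
    # solution is (0, 1), so b_perp = b - A(0, 1)^T = (1, 1, -1).
    print(z_iteration(A, b))   # should approach (1, 1, -1)
```

Starting from $z^0 = b$ is the simplest admissible choice, since $b \in b + \mathrm{range}(A)$ trivially.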

Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

$$z^0 \in b + \mathrm{range}(A) \quad \text{and} \quad x^0 \in \mathrm{range}(A^T).$$

For any $\varepsilon > 0$, it holds

$$\mathbb{E}\left[\|x^k - A^\dagger b\|_2^2\right] \le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$

Proof. Let

$$\widehat{x}^k = x^{k-1} - \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b),$$

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.

It follows from

$$x^k - \widehat{x}^k = \frac{\alpha}{\|A_{I_i,:}\|_F^2}(A_{I_i,:})^T\left(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\right)$$

that

$$\begin{aligned}
\|x^k - \widehat{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}\left(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\right)^TA_{I_i,:}(A_{I_i,:})^T\left(b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\right) \\
&\le \frac{\alpha^2\|b_{I_i} - A_{I_i,:}A^\dagger b - z^k_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2} \qquad \text{(by Lemma 1)}. \qquad (2)
\end{aligned}$$

It follows from

$$\begin{aligned}
\mathbb{E}_{k-1}\left[\|x^k - \widehat{x}^k\|_2^2\right]
&= \mathbb{E}_{k-1}\left[\mathbb{E}_{k-1}^{i}\left[\|x^k - \widehat{x}^k\|_2^2\right]\right] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \qquad \text{(by (2))}
\end{aligned}$$
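The passage from the block quantities in (2) to the full vectors is an averaging step over the row blocks. The display below spells it out under the assumption, not restated on this slide, that row block $I_i$ is selected with probability $\|A_{I_i,:}\|_F^2/\|A\|_F^2$; the shorthand $r$ is introduced here only for readability.

```latex
\mathbb{E}_{k-1}^{i}\!\left[\frac{\|r_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2}\right]
  = \sum_{i=1}^{s}\frac{\|A_{I_i,:}\|_F^2}{\|A\|_F^2}\cdot
    \frac{\|r_{I_i}\|_2^2}{\|A_{I_i,:}\|_F^2}
  = \frac{1}{\|A\|_F^2}\sum_{i=1}^{s}\|r_{I_i}\|_2^2
  = \frac{\|r\|_2^2}{\|A\|_F^2},
\qquad r = b - AA^\dagger b - z^k .
```

The last equality uses that $I_1, \dots, I_s$ form a partition of $[m]$, so the squared block norms add up to $\|r\|_2^2$.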

that

$$\begin{aligned}
\mathbb{E}\left[\|x^k - \widehat{x}^k\|_2^2\right]
&\le \frac{\alpha^2}{\|A\|_F^2}\mathbb{E}\left[\|b - AA^\dagger b - z^k\|_2^2\right] \\
&\le \frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 \qquad \text{(by Lemma 9)}. \qquad (3)
\end{aligned}$$

By $x^0 \in \mathrm{range}(A^T)$ and $A^\dagger b \in \mathrm{range}(A^T)$, we have

$$x^0 - A^\dagger b \in \mathrm{range}(A^T).$$

Then we can show that $x^k - A^\dagger b \in \mathrm{range}(A^T)$ by induction. By

$$\begin{aligned}
\|\widehat{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - 2\alpha\frac{\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{I_i,:}\|_F^4}(x^{k-1} - A^\dagger b)^T(A_{I_i,:})^TA_{I_i,:}(A_{I_i,:})^TA_{I_i,:}(x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\frac{\|A_{I_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{I_i,:}\|_F^2} \qquad \text{(by Lemma 1)}
\end{aligned}$$

we have

$$\begin{aligned}
\mathbb{E}_{k-1}\left[\|\widehat{x}^k - A^\dagger b\|_2^2\right]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - (2\alpha - \alpha^2)\frac{\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\|x^{k-1} - A^\dagger b\|_2^2 \qquad \text{(by Lemma 3)},
\end{aligned}$$

which yields

$$\mathbb{E}\left[\|\widehat{x}^k - A^\dagger b\|_2^2\right] \le \rho\,\mathbb{E}\left[\|x^{k-1} - A^\dagger b\|_2^2\right]. \qquad (4)$$

Note that for any $\varepsilon > 0$, we have

$$\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \widehat{x}^k + \widehat{x}^k - A^\dagger b\|_2^2 \\
&\le \left(\|x^k - \widehat{x}^k\|_2 + \|\widehat{x}^k - A^\dagger b\|_2\right)^2 \\
&\le \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2 + 2\|x^k - \widehat{x}^k\|_2\|\widehat{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \widehat{x}^k\|_2^2 + (1+\varepsilon)\|\widehat{x}^k - A^\dagger b\|_2^2. \qquad (5)
\end{aligned}$$
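The last step of (5) is the weighted arithmetic-geometric mean (Young) inequality applied to the cross term. As a brief worked check, write $p$ and $q$ for the two norms appearing in (5); these letters are shorthand introduced here, not notation from the lecture.

```latex
0 \le \Big(\frac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\Big)^{2}
\;\Longrightarrow\;
2pq \le \frac{p^{2}}{\varepsilon} + \varepsilon q^{2}
\;\Longrightarrow\;
p^{2} + q^{2} + 2pq \le \Big(1 + \frac{1}{\varepsilon}\Big)p^{2} + (1+\varepsilon)\,q^{2}.
```

A small $\varepsilon$ keeps the factor on the second term close to 1 at the price of a large factor $1 + 1/\varepsilon$ on the first.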

Combining (3), (4) and (5) yields

$$\begin{aligned}
\mathbb{E}\left[\|x^k - A^\dagger b\|_2^2\right]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\left[\|x^k - \widehat{x}^k\|_2^2\right] + (1+\varepsilon)\mathbb{E}\left[\|\widehat{x}^k - A^\dagger b\|_2^2\right] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\rho\,\mathbb{E}\left[\|x^{k-1} - A^\dagger b\|_2^2\right] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\left[\|x^{k-2} - A^\dagger b\|_2^2\right] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k\left(\frac{(1+\varepsilon)\alpha^2\|z^0 - b_\perp\|_2^2}{\varepsilon^2\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}$$
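The final inequality in this chain is a geometric-sum estimate that the slide does not write out. A short derivation (the summation index $l$ is introduced here):

```latex
\begin{aligned}
\sum_{l=0}^{k-1}(1+\varepsilon)^{l}
  &= \frac{(1+\varepsilon)^{k} - 1}{\varepsilon}
   \le \frac{(1+\varepsilon)^{k}}{\varepsilon}, \\
\Big(1 + \frac{1}{\varepsilon}\Big)\sum_{l=0}^{k-1}(1+\varepsilon)^{l}
  &\le \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^{k}}{\varepsilon}
   = \frac{(1+\varepsilon)^{k+1}}{\varepsilon^{2}} .
\end{aligned}
```

The resulting bound decays with $k$ only when $(1+\varepsilon)\rho < 1$, that is, when $\varepsilon < 1/\rho - 1$.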

Remark 4

For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$(\widehat{x}^k - A^\dagger b)^T(x^k - \widehat{x}^k) = 0,$$

the equation (5) becomes

$$\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,$$

which yields the following convergence for REK:

$$\mathbb{E}\left[\|x^k - A^\dagger b\|_2^2\right] \le \rho^k\left(\frac{k\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).$$
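The REK special case in Remark 4 is small enough to try directly. The following is a minimal sketch, not the lecture's reference implementation: it uses single rows and columns ($s = m$, $t = n$), takes $\alpha = 1$, and assumes rows and columns are sampled with probability proportional to their squared norms. The function name `rek`, the test problem, the iteration count, and the seed are illustrative assumptions; all rows and columns are assumed nonzero.

```python
import random


def rek(A, b, iters=5000, seed=0):
    """Randomized extended Kaczmarz sketch with alpha = 1."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_sq = [sum(v * v for v in r) for r in A]      # squared row norms
    col_sq = [sum(v * v for v in c) for c in cols]   # squared column norms
    z = list(b)          # z^0 = b, which lies in b + range(A)
    x = [0.0] * n        # x^0 = 0, which lies in range(A^T)
    for _ in range(iters):
        # z-update: remove the component of z along a random column.
        j = rng.choices(range(n), weights=col_sq)[0]
        cz = sum(c * v for c, v in zip(cols[j], z)) / col_sq[j]
        z = [v - cz * c for v, c in zip(z, cols[j])]
        # x-update: Kaczmarz step on row i with right-hand side b - z.
        i = rng.choices(range(m), weights=row_sq)[0]
        res = (sum(a * v for a, v in zip(A[i], x)) - b[i] + z[i]) / row_sq[i]
        x = [v - res * a for v, a in zip(x, A[i])]
    return x, z


if __name__ == "__main__":
    # Inconsistent 3x2 system whose least-squares solution is (0, 1).
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 0.0]
    x, z = rek(A, b)
    print(x, z)   # x should approach (0, 1) and z should approach (1, 1, -1)
```

Here $A$ has full column rank, so $A^\dagger b$ is the unique least-squares solution; the $z$-iterate removes the part of $b$ outside $\mathrm{range}(A)$, after which the $x$-update behaves like a Kaczmarz step on a consistent system.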




Page 34: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

4.2 Convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$

The convergence of $\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]$ depends on the positive number $\rho$ defined as

\[
\rho = 1 - \frac{(2\alpha - \alpha^2)\,\sigma_r^2(A)}{\|A\|_F^2}.
\]

In the following lemma we show that the vector $z^k$ generated in REBK with $z^0 \in b + \operatorname{range}(A)$ converges to

\[
b_\perp = (I - AA^\dagger)b,
\]

which is the orthogonal projection of $z^0$ onto the set $\{z \mid A^{\mathrm T} z = 0\}$.
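As a small numerical illustration of the two definitions above, the following NumPy sketch computes $\rho$ and $b_\perp$ from a thin SVD and checks that projecting any $z^0 \in b + \operatorname{range}(A)$ onto $\{z \mid A^{\mathrm T} z = 0\}$ gives $b_\perp$. The function name `rho_and_b_perp`, the rank cutoff `rtol`, and the test matrix are assumptions made for this example only; they do not come from the lecture.

```python
import numpy as np

def rho_and_b_perp(A, b, alpha, rtol=1e-10):
    """Return rho, b_perp = (I - A A^+) b, and an orthonormal basis of range(A)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rtol * s[0]))      # numerical rank (rtol is an assumed cutoff)
    sigma_r = s[r - 1]                    # smallest nonzero singular value
    fro2 = np.sum(s**2)                   # ||A||_F^2 equals the sum of squared singular values
    rho = 1.0 - (2.0 * alpha - alpha**2) * sigma_r**2 / fro2
    Ur = U[:, :r]                         # columns span range(A)
    return rho, b - Ur @ (Ur.T @ b), Ur

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))   # 8 x 5 matrix of rank 3
b = rng.standard_normal(8)                                       # generically inconsistent
rho, bp, Ur = rho_and_b_perp(A, b, alpha=1.0)

# Any z0 in b + range(A) has the same projection onto {z : A^T z = 0}.
z0 = b + A @ rng.standard_normal(5)
proj_z0 = z0 - Ur @ (Ur.T @ z0)
```

For $\alpha \in (0, 2)$ the factor $2\alpha - \alpha^2$ is positive, so $\rho < 1$; the sketch uses $\alpha = 1$.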


Lemma 9

For any given consistent or inconsistent linear system

\[
Ax = b,
\]

let $z^k$ be the vector generated in REBK with

\[
z^0 \in b + \operatorname{range}(A).
\]

It holds

\[
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr] \le \rho^k \|z^0 - b_\perp\|_2^2.
\]

Proof. By $(A_{:,\mathcal{J}_j})^{\mathrm T} b_\perp = 0$ we have

\[
z^k - b_\perp = z^{k-1} - b_\perp - \frac{\alpha}{\|A_{:,\mathcal{J}_j}\|_F^2}\, A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^{\mathrm T} (z^{k-1} - b_\perp). \tag{1}
\]


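By $z^0 - b_\perp = AA^\dagger z^0 \in \operatorname{range}(A)$ we can show by induction that

\[
z^k - b_\perp \in \operatorname{range}(A).
\]

It follows from (1) that

\[
\begin{aligned}
\|z^k - b_\perp\|_2^2
&= \|z^{k-1} - b_\perp\|_2^2 - \frac{2\alpha\,\|(A_{:,\mathcal{J}_j})^{\mathrm T}(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{:,\mathcal{J}_j}\|_F^4}\,(z^{k-1} - b_\perp)^{\mathrm T} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^{\mathrm T} A_{:,\mathcal{J}_j} (A_{:,\mathcal{J}_j})^{\mathrm T} (z^{k-1} - b_\perp) \\
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|(A_{:,\mathcal{J}_j})^{\mathrm T}(z^{k-1} - b_\perp)\|_2^2}{\|A_{:,\mathcal{J}_j}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned}
\]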


Taking the expectation conditioned on the first $k-1$ iterations yields

\[
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&\le \|z^{k-1} - b_\perp\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A^{\mathrm T}(z^{k-1} - b_\perp)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|z^{k-1} - b_\perp\|_2^2 \quad \text{(by Lemma 2)}.
\end{aligned}
\]

Taking expectation again gives

\[
\begin{aligned}
\mathbb{E}\bigl[\|z^k - b_\perp\|_2^2\bigr]
&= \mathbb{E}\bigl[\mathbb{E}_{k-1}\bigl[\|z^k - b_\perp\|_2^2\bigr]\bigr] \\
&\le \rho\,\mathbb{E}\bigl[\|z^{k-1} - b_\perp\|_2^2\bigr] \\
&\le \rho^k \|z^0 - b_\perp\|_2^2.
\end{aligned}
\]

This completes the proof.
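The contraction in Lemma 9 can be observed numerically. The sketch below is a Monte Carlo illustration under stated assumptions, not the lecture's implementation: the z-update is read off from equation (1), the block $\mathcal{J}_j$ is sampled with probability $\|A_{:,\mathcal{J}_j}\|_F^2 / \|A\|_F^2$ (the choice that makes the conditional-expectation step work), and the partition `col_blocks`, the sizes, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, alpha, steps, trials = 12, 6, 1.0, 30, 200
A = rng.standard_normal((m, n))                 # full column rank with probability 1
b = rng.standard_normal(m)
col_blocks = np.array_split(np.arange(n), 3)    # hypothetical partition J_1, J_2, J_3 of [n]

fro2 = np.linalg.norm(A, "fro") ** 2
block_fro2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
probs = block_fro2 / block_fro2.sum()           # P(j) = ||A_{:,J_j}||_F^2 / ||A||_F^2

U, s, _ = np.linalg.svd(A, full_matrices=False)
b_perp = b - U @ (U.T @ b)                      # (I - A A^+) b, since U spans range(A) here
rho = 1.0 - (2.0 * alpha - alpha**2) * s.min() ** 2 / fro2

err = np.zeros(steps + 1)                       # running sum of ||z^k - b_perp||^2 over trials
for _ in range(trials):
    z = b.copy()                                # z^0 = b lies in b + range(A)
    err[0] += np.sum((z - b_perp) ** 2)
    for k in range(1, steps + 1):
        j = rng.choice(len(col_blocks), p=probs)
        AJ = A[:, col_blocks[j]]
        z = z - alpha / block_fro2[j] * (AJ @ (AJ.T @ z))
        err[k] += np.sum((z - b_perp) ** 2)
err /= trials                                   # empirical estimate of E||z^k - b_perp||^2
bound = rho ** np.arange(steps + 1) * err[0]    # right-hand side of Lemma 9
```

Comparing `err` with `bound` (for example on a semilog plot) shows how conservative the rate $\rho$ is for a given matrix; since `err` is a sample average, small deviations from the expectation are possible.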


Theorem 10

For any given consistent or inconsistent linear system $Ax = b$, let $x^k$ be the $k$th iterate of REBK with

\[
z^0 \in b + \operatorname{range}(A) \quad \text{and} \quad x^0 \in \operatorname{range}(A^{\mathrm T}).
\]

For any $\varepsilon > 0$, it holds

\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le (1+\varepsilon)^k \rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]

Proof. Let

\[
\tilde{x}^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^{\mathrm T} A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b),
\]

which is actually one DSBGS update for the linear system $Ax = AA^\dagger b$ from $x^{k-1}$.
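The role of the auxiliary iterate $\tilde{x}^k$ can be checked on a single step. The sketch below assumes the REBK x-update has the form $x^k = x^{k-1} - \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}(A_{\mathcal{I}_i,:})^{\mathrm T}(A_{\mathcal{I}_i,:}x^{k-1} - b_{\mathcal{I}_i} + z^k_{\mathcal{I}_i})$, which is the form consistent with the difference $x^k - \tilde{x}^k$ used in this proof. The row block `I`, the vector `z_k`, and all sizes are hypothetical values chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, alpha = 9, 4, 0.8
A = rng.standard_normal((m, n))                  # full column rank with probability 1
b = rng.standard_normal(m)
x_prev = A.T @ rng.standard_normal(m)            # x^{k-1} in range(A^T)
z_k = rng.standard_normal(m)                     # arbitrary stand-in for z^k
I = np.array([1, 4, 7])                          # hypothetical row block I_i

x_star = np.linalg.lstsq(A, b, rcond=None)[0]    # A^+ b (least-squares solution)
AI = A[I, :]
w = alpha / np.linalg.norm(AI, "fro") ** 2

# Assumed REBK x-update and the auxiliary iterate from the proof.
x_k = x_prev - w * (AI.T @ (AI @ x_prev - b[I] + z_k[I]))
x_tilde = x_prev - w * (AI.T @ (AI @ (x_prev - x_star)))

# The same row-block step written for the consistent system A x = A A^+ b.
rhs = A @ x_star
block_step = x_prev - w * (AI.T @ (AI @ x_prev - rhs[I]))

# Claimed difference x^k - x~^k and the two sides of inequality (2).
resid = b[I] - AI @ x_star - z_k[I]
gap = w * (AI.T @ resid)
lhs = np.sum((x_k - x_tilde) ** 2)
rhs_bound = alpha**2 * np.sum(resid**2) / np.linalg.norm(AI, "fro") ** 2
```

Here `x_tilde` and `block_step` coincide, `x_k - x_tilde` equals `gap`, and `lhs` does not exceed `rhs_bound` because $\|A_{\mathcal{I}_i,:}\|_2 \le \|A_{\mathcal{I}_i,:}\|_F$.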


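It follows from

\[
x^k - \tilde{x}^k = \frac{\alpha}{\|A_{\mathcal{I}_i,:}\|_F^2}\,(A_{\mathcal{I}_i,:})^{\mathrm T} \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\bigr)
\]

that

\[
\begin{aligned}
\|x^k - \tilde{x}^k\|_2^2
&= \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,\bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\bigr)^{\mathrm T} A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^{\mathrm T} \bigl(b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\bigr) \\
&\le \frac{\alpha^2\,\|b_{\mathcal{I}_i} - A_{\mathcal{I}_i,:}A^\dagger b - z^k_{\mathcal{I}_i}\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}.
\end{aligned} \tag{2}
\]

It follows from

\[
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]
&= \mathbb{E}_{k-1}\bigl[\mathbb{E}^{i}_{k-1}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]\bigr] \\
&\le \mathbb{E}_{k-1}\left[\frac{\alpha^2\,\|b - AA^\dagger b - z^k\|_2^2}{\|A\|_F^2}\right] \quad \text{(by (2))}
\end{aligned}
\]

that

\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr]
&\le \frac{\alpha^2}{\|A\|_F^2}\,\mathbb{E}\bigl[\|b - AA^\dagger b - z^k\|_2^2\bigr] \\
&\le \frac{\alpha^2 \rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 \quad \text{(by Lemma 9)}.
\end{aligned} \tag{3}
\]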

By $x^0 \in \operatorname{range}(A^{\mathrm T})$ and $A^\dagger b \in \operatorname{range}(A^{\mathrm T})$ we have

\[
x^0 - A^\dagger b \in \operatorname{range}(A^{\mathrm T}).
\]

Then we can show that $x^k - A^\dagger b \in \operatorname{range}(A^{\mathrm T})$ by induction. By

\[
\begin{aligned}
\|\tilde{x}^k - A^\dagger b\|_2^2
&= \|x^{k-1} - A^\dagger b\|_2^2 - \frac{2\alpha\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \\
&\quad + \frac{\alpha^2}{\|A_{\mathcal{I}_i,:}\|_F^4}\,(x^{k-1} - A^\dagger b)^{\mathrm T} (A_{\mathcal{I}_i,:})^{\mathrm T} A_{\mathcal{I}_i,:} (A_{\mathcal{I}_i,:})^{\mathrm T} A_{\mathcal{I}_i,:} (x^{k-1} - A^\dagger b) \\
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A_{\mathcal{I}_i,:}(x^{k-1} - A^\dagger b)\|_2^2}{\|A_{\mathcal{I}_i,:}\|_F^2} \quad \text{(by Lemma 1)}
\end{aligned}
\]

we have

\[
\begin{aligned}
\mathbb{E}_{k-1}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr]
&\le \|x^{k-1} - A^\dagger b\|_2^2 - \frac{(2\alpha - \alpha^2)\,\|A(x^{k-1} - A^\dagger b)\|_2^2}{\|A\|_F^2} \\
&\le \rho\,\|x^{k-1} - A^\dagger b\|_2^2 \quad \text{(by Lemma 3)},
\end{aligned}
\]

which yields

\[
\mathbb{E}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr] \le \rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr]. \tag{4}
\]

Note that for any $\varepsilon > 0$ we have

\[
\begin{aligned}
\|x^k - A^\dagger b\|_2^2
&= \|x^k - \tilde{x}^k + \tilde{x}^k - A^\dagger b\|_2^2 \\
&\le \bigl(\|x^k - \tilde{x}^k\|_2 + \|\tilde{x}^k - A^\dagger b\|_2\bigr)^2 \\
&\le \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2 + 2\,\|x^k - \tilde{x}^k\|_2\,\|\tilde{x}^k - A^\dagger b\|_2 \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\|x^k - \tilde{x}^k\|_2^2 + (1+\varepsilon)\,\|\tilde{x}^k - A^\dagger b\|_2^2.
\end{aligned} \tag{5}
\]
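The last inequality in (5) is a weighted form of Young's inequality. A short derivation, using the shorthand $p = \|x^k - \tilde{x}^k\|_2$ and $q = \|\tilde{x}^k - A^\dagger b\|_2$ (symbols introduced here only for this note):

```latex
\[
0 \le \Bigl(\tfrac{p}{\sqrt{\varepsilon}} - \sqrt{\varepsilon}\,q\Bigr)^2
  = \tfrac{1}{\varepsilon}\,p^2 - 2pq + \varepsilon\,q^2
\quad\Longrightarrow\quad
2pq \le \tfrac{1}{\varepsilon}\,p^2 + \varepsilon\,q^2 ,
\]
\[
(p+q)^2 = p^2 + q^2 + 2pq
  \le \Bigl(1 + \tfrac{1}{\varepsilon}\Bigr)p^2 + (1+\varepsilon)\,q^2 .
\]
```

A small $\varepsilon$ keeps the factor on the contracting term $q^2$ close to 1 but inflates the factor on $p^2$, which is the trade-off that appears as $1/\varepsilon^2$ in the bound of Theorem 10.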


Combining (3), (4) and (5) yields

\[
\begin{aligned}
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\bigl[\|x^k - \tilde{x}^k\|_2^2\bigr] + (1+\varepsilon)\,\mathbb{E}\bigl[\|\tilde{x}^k - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)\,\rho\,\mathbb{E}\bigl[\|x^{k-1} - A^\dagger b\|_2^2\bigr] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon)\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^2\rho^2\,\mathbb{E}\bigl[\|x^{k-2} - A^\dagger b\|_2^2\bigr] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\bigl(1 + (1+\varepsilon) + \cdots + (1+\varepsilon)^{k-1}\bigr)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1+\varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1+\varepsilon)^k\rho^k \left( \frac{(1+\varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\end{aligned}
\]
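The final inequality in this chain rests on a geometric sum. Written out, with summation index $l$ introduced only for this note:

```latex
\[
\Bigl(1 + \tfrac{1}{\varepsilon}\Bigr)\sum_{l=0}^{k-1}(1+\varepsilon)^l
  = \frac{1+\varepsilon}{\varepsilon}\cdot\frac{(1+\varepsilon)^k - 1}{\varepsilon}
  \le \frac{(1+\varepsilon)^{k+1}}{\varepsilon^2}
  = (1+\varepsilon)^k\,\frac{1+\varepsilon}{\varepsilon^2}.
\]
```

Multiplying by $\alpha^2\rho^k\|z^0 - b_\perp\|_2^2 / \|A\|_F^2$ gives the first term inside the parentheses of the bound. The bound decays in $k$ only when $(1+\varepsilon)\rho < 1$, so $\varepsilon$ should be taken smaller than $1/\rho - 1$.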


Remark 4

For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

\[
(\tilde{x}^k - A^\dagger b)^{\mathrm T}(x^k - \tilde{x}^k) = 0,
\]

the equation (5) becomes

\[
\|x^k - A^\dagger b\|_2^2 = \|x^k - \tilde{x}^k\|_2^2 + \|\tilde{x}^k - A^\dagger b\|_2^2,
\]

which yields the following convergence for REK:

\[
\mathbb{E}\bigl[\|x^k - A^\dagger b\|_2^2\bigr] \le \rho^k \left( \frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2 \right).
\]
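To see the REK case in action, the following is a minimal NumPy sketch of the two-step iteration with single-row and single-column blocks and $\alpha = 1$. It is an illustration under assumptions rather than the authors' code: the update formulas are the ones implied by equation (1) and by the difference $x^k - \tilde{x}^k$ in the proof of Theorem 10, blocks are sampled with probability proportional to their squared Frobenius norms, and the function name `rebk`, the problem sizes, and the seed are hypothetical.

```python
import numpy as np

def rebk(A, b, row_blocks, col_blocks, alpha, steps, rng):
    """Sketch of REBK with z^0 = b and x^0 = 0; returns all iterates x^0..x^steps."""
    n = A.shape[1]
    row_fro2 = np.array([np.linalg.norm(A[I, :], "fro") ** 2 for I in row_blocks])
    col_fro2 = np.array([np.linalg.norm(A[:, J], "fro") ** 2 for J in col_blocks])
    p_rows = row_fro2 / row_fro2.sum()
    p_cols = col_fro2 / col_fro2.sum()
    z, x = b.astype(float).copy(), np.zeros(n)
    iterates = [x.copy()]
    for _ in range(steps):
        j = rng.choice(len(col_blocks), p=p_cols)          # column block for the z-update
        AJ = A[:, col_blocks[j]]
        z = z - alpha / col_fro2[j] * (AJ @ (AJ.T @ z))
        i = rng.choice(len(row_blocks), p=p_rows)          # row block for the x-update
        I = row_blocks[i]
        AI = A[I, :]
        x = x - alpha / row_fro2[i] * (AI.T @ (AI @ x - b[I] + z[I]))
        iterates.append(x.copy())
    return np.array(iterates)

rng = np.random.default_rng(3)
m, n, steps, trials = 10, 4, 400, 20
A = rng.standard_normal((m, n))                 # full column rank with probability 1
b = rng.standard_normal(m)                      # generically inconsistent system
rows = [np.array([i]) for i in range(m)]        # s = m: one row per block
cols = [np.array([j]) for j in range(n)]        # t = n: one column per block

x_star = np.linalg.lstsq(A, b, rcond=None)[0]   # A^+ b
b_perp = b - A @ x_star                         # (I - A A^+) b
fro2 = np.linalg.norm(A, "fro") ** 2
rho = 1.0 - np.linalg.svd(A, compute_uv=False).min() ** 2 / fro2   # alpha = 1

mean_err = np.zeros(steps + 1)
for _ in range(trials):
    X = rebk(A, b, rows, cols, 1.0, steps, rng)
    mean_err += np.sum((X - x_star) ** 2, axis=1)
mean_err /= trials                              # empirical E||x^k - A^+ b||^2

k = np.arange(steps + 1)
bound = rho**k * (k * np.sum((b - b_perp) ** 2) / fro2 + np.sum(x_star**2))
```

With $z^0 = b$ and $x^0 = 0$ the two constants in the Remark 4 bound are $\|b - b_\perp\|_2^2$ and $\|A^\dagger b\|_2^2$, which is what `bound` evaluates. Passing larger blocks and $\alpha \ne 1$ to the same function gives a block variant, for which the $\varepsilon$-dependent bound of Theorem 10 is the relevant comparison.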





Page 38: Lecture 2: Randomized Iterative Methods for Linear Systemsmath.xmu.edu.cn/group/nona/damc/Lecture02.pdf · Randomized Iterative Methods Lecture 2 February 21 - 28, 2020 4 / 43. For

Theorem 10

For any given consistent or inconsistent linear system Ax = b let xk

be the kth iterate of REBK with

z0 isin b+ range(A) and x0 isin range(AT)

For any ε gt 0 it holds

E983045983042xk minusAdaggerb98304222

983046le (1 + ε)kρk

983061(1 + ε)α2983042z0 minus bperp98304222

ε2983042A9830422F+ 983042x0 minusAdaggerb98304222

983062

Proof Let

983141xk = xkminus1 minus α

983042AIi9830422F(AIi)

TAIi(xkminus1 minusAdaggerb)

which is actually one DSBGS update for the linear systemAx = AAdaggerb from xkminus1

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 38 43

It follows from

xk minus 983141xk =α

983042AIi9830422F(AIi)

T(bIi minusAIiAdaggerbminus zkIi)

that

983042xk minus 983141xk98304222

=α2

983042AIi9830424F(bIi minusAIiA

daggerbminus zkIi)TAIi(AIi)

T(bIi minusAIiAdaggerbminus zkIi

)

leα2983042bIi minusAIiA

daggerbminus zkIi98304222

983042AIi9830422F (by Lemma 1) (2)

It follows from

Ekminus1

983045983042xk minus 983141xk98304222

983046= Ekminus1

983045Eikminus1

983045983042xk minus 983141xk98304222

983046983046

le Ekminus1

983063α2983042bminusAAdaggerbminus zk98304222

983042A9830422F

983064(by (2))

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 39 43

that

E983045983042xk minus 983141xk98304222

983046le α2

983042A9830422FE983045983042bminusAAdaggerbminus zk98304222

983046

le α2ρk

983042A9830422F983042z0 minus bperp98304222 (by Lemma 9) (3)

By x0 isin range(AT) and Adaggerb isin range(AT) we have

x0 minusAdaggerb isin range(AT)

Then we can show that xk minusAdaggerb isin range(AT) by induction By

983042983141xk minusAdaggerb98304222

= 983042xkminus1 minusAdaggerb98304222 minus2α983042AIi(x

kminus1 minusAdaggerb)98304222983042AIi9830422F

+α2

983042AIi9830424F(xkminus1 minusAdaggerb)T(AIi)

TAIi(AIi)TAIi(x

kminus1 minusAdaggerb)

le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042AIi(x

kminus1 minusAdaggerb)98304222983042AIi9830422F

(by Lemma 1)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 40 43

we have

Ekminus1

983045983042983141xk minusAdaggerb98304222

983046

le 983042xkminus1 minusAdaggerb98304222 minus(2αminus α2)983042A(xkminus1 minusAdaggerb)98304222

983042A9830422Fle ρ983042xkminus1 minusAdaggerb98304222 (by Lemma 3)

which yields

E983045983042983141xk minusAdaggerb98304222

983046le ρE

983045983042xkminus1 minusAdaggerb98304222

983046 (4)

Note that for any ε gt 0 we have

983042xk minusAdaggerb98304222= 983042xk minus 983141xk + 983141xk minusAdaggerb98304222le (983042xk minus 983141xk9830422 + 983042983141xk minusAdaggerb9830422)2

le 983042xk minus 983141xk98304222 + 983042983141xk minusAdaggerb98304222 + 2983042xk minus 983141xk9830422983042983141xk minusAdaggerb9830422

le9830611 +

1

ε

983062983042xk minus 983141xk98304222 + (1 + ε)983042983141xk minusAdaggerb98304222 (5)

Randomized Iterative Methods Lecture 2 February 21 - 28 2020 41 43

Combining (3), (4) and (5) yields

$$
\begin{aligned}
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big]
&\le \left(1 + \frac{1}{\varepsilon}\right)\mathbb{E}\big[\|x^k - \widehat{x}^k\|_2^2\big] + (1 + \varepsilon)\,\mathbb{E}\big[\|\widehat{x}^k - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)\,\rho\,\mathbb{E}\big[\|x^{k-1} - A^\dagger b\|_2^2\big] \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon)\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^2\rho^2\,\mathbb{E}\big[\|x^{k-2} - A^\dagger b\|_2^2\big] \\
&\le \cdots \\
&\le \left(1 + \frac{1}{\varepsilon}\right)\big(1 + (1 + \varepsilon) + \cdots + (1 + \varepsilon)^{k-1}\big)\frac{\alpha^2\rho^k}{\|A\|_F^2}\,\|z^0 - b_\perp\|_2^2 + (1 + \varepsilon)^k\rho^k\,\|x^0 - A^\dagger b\|_2^2 \\
&\le (1 + \varepsilon)^k\rho^k\left(\frac{(1 + \varepsilon)\,\alpha^2\,\|z^0 - b_\perp\|_2^2}{\varepsilon^2\,\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
\end{aligned}
$$
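The final inequality of the chain rests on a geometric sum. A short worked check, using only quantities that already appear in the bound:

$$
\left(1 + \frac{1}{\varepsilon}\right)\sum_{j=0}^{k-1}(1 + \varepsilon)^j
= \frac{1 + \varepsilon}{\varepsilon}\cdot\frac{(1 + \varepsilon)^k - 1}{\varepsilon}
\le \frac{(1 + \varepsilon)^{k+1}}{\varepsilon^2}.
$$

Multiplying by $\alpha^2\rho^k\,\|z^0 - b_\perp\|_2^2/\|A\|_F^2$ and factoring out $(1 + \varepsilon)^k\rho^k$ gives the first term inside the parentheses of the last line.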


Remark 4

For the case REBK with $s = m$, $t = n$ and $\alpha = 1$ (i.e., REK), by the orthogonality

$$
(\widehat{x}^k - A^\dagger b)^T (x^k - \widehat{x}^k) = 0,
$$

the equation (5) becomes

$$
\|x^k - A^\dagger b\|_2^2 = \|x^k - \widehat{x}^k\|_2^2 + \|\widehat{x}^k - A^\dagger b\|_2^2,
$$

which yields the following convergence for REK:

$$
\mathbb{E}\big[\|x^k - A^\dagger b\|_2^2\big] \le \rho^k\left(\frac{k\,\|z^0 - b_\perp\|_2^2}{\|A\|_F^2} + \|x^0 - A^\dagger b\|_2^2\right).
$$
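To make the REK special case concrete, here is a minimal sketch of the single-row, single-column iteration with $\alpha = 1$. It is an illustration under assumptions rather than the lecture's reference implementation: the initialization $x^0 = 0$, $z^0 = b$ and the sampling of rows and columns with probabilities proportional to their squared norms follow the usual REK description and are not stated in this section; the function name `rek` and all variable names are hypothetical.

```python
import numpy as np

def rek(A, b, num_iters, rng):
    """Randomized extended Kaczmarz sketch (single rows and columns, alpha = 1)."""
    m, n = A.shape
    row_norms2 = np.sum(A**2, axis=1)
    col_norms2 = np.sum(A**2, axis=0)
    fro2 = row_norms2.sum()
    x = np.zeros(n)        # x^0 = 0 lies in range(A^T)
    z = b.copy()           # z^0 = b (assumed initialization)
    for _ in range(num_iters):
        # Column step: push z toward the part of b orthogonal to range(A).
        j = rng.choice(n, p=col_norms2 / fro2)
        z -= (A[:, j] @ z) / col_norms2[j] * A[:, j]
        # Row step: Kaczmarz projection with the corrected right-hand side b - z.
        i = rng.choice(m, p=row_norms2 / fro2)
        x -= (A[i] @ x - b[i] + z[i]) / row_norms2[i] * A[i]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)            # generic b, so Ax = b is inconsistent
x_star = np.linalg.pinv(A) @ b         # A^dagger b, the least-squares solution
x = rek(A, b, 3000, rng)
err = np.linalg.norm(x - x_star)

# One-step check of the orthogonality in Remark 4 for an arbitrary row and state.
i = 3
a = A[i]
x_prev = rng.standard_normal(5)
z_any = rng.standard_normal(30)
x_hat = x_prev - (a @ (x_prev - x_star)) / (a @ a) * a
x_new = x_prev - (a @ x_prev - b[i] + z_any[i]) / (a @ a) * a
orth = (x_hat - x_star) @ (x_new - x_hat)
print(err, orth)
```

The orthogonality holds because `x_hat - x_star` is the previous error with its component along the sampled row removed, while `x_new - x_hat` is a multiple of that row. With a block of several rows or with $\alpha \ne 1$ this no longer holds in general, which is why the general proof needs the $\varepsilon$-splitting in (5).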


Kui Du, Wutao Si, and Xiaohui Sun. Pseudoinverse-free randomized extended block Kaczmarz for solving least squares. arXiv preprint arXiv:2001.04179, 2020.

Kui Du and Xiaohui Sun. A doubly stochastic block Gauss-Seidel algorithm for solving linear equations. arXiv preprint arXiv:1912.13291, 2019.

Lawrence H. Landweber. An iteration formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.

D. Leventhal and A. S. Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res., 35(3):641–654, 2010.

Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.

Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, and Zhi-Quan Luo. A linearly convergent doubly stochastic Gauss-Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program., 176(1-2, Ser. B):465–496, 2019.

Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.

Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.
