M2AA3 Orthogonality
Lectured by John Barrett, LyXed by jm407
December 13, 2008
www.ma.ic.ac.uk/~jwb/teaching
• 2hr exam in the summer term (4 questions)
• 2 small assessed projects (involving computation - MATLAB or whatever you want)
• exam : project weighting 6 : 1
• deadlines for the 2 assessed projects:
  1st project - mid/late November
  2nd project - first week of spring term
Contents

1 Applied Linear Algebra
  1.1 Orthogonality
    1.1.1 Inner Product
    1.1.2 Outer Product
    1.1.3 Dot Product
  1.2 Gram-Schmidt
    1.2.1 Classical Gram-Schmidt Algorithm
  1.3 QR Factorisation
  1.4 Cauchy-Schwarz Inequality
  1.5 Gradients and Hessians
  1.6 Inner Products Revisited and Positive Definite Matrices
  1.7 Least Squares Problems
2 Least Squares Problems
3 Orthogonal Polynomials
4 Polynomial Interpolation
5 Best Approximation in ‖.‖∞
1 Applied Linear Algebra
1.1 Orthogonality
Two vectors are orthogonal when they are perpendicular to each other.

a ∈ R^n ≡ R^{n×1}

a = (a₁, a₂, ..., a_n)ᵀ ∈ R^{n×1} (n rows, 1 column), a_i ∈ R

Transpose of a: aᵀ = (a₁ ... a_n) ∈ R^{1×n} (1 row, n columns)

Given a, b ∈ R^n:
1.1.1 Inner Product:

aᵀb = (a₁ ... a_n)(b₁, ..., b_n)ᵀ   [(1×n)(n×1) → 1×1 ≡ R]
    = ∑_{i=1}^n a_i b_i ∈ R
1.1.2 Outer Product:

abᵀ = (a₁, ..., a_n)ᵀ (b₁ ... b_n)   [(n×1)(1×n) → n×n]

    = ⎛ a₁b₁  ...  a₁b_n ⎞
      ⎜ a₂b₁              ⎟
      ⎜   ⋮          ⋮    ⎟
      ⎝ a_nb₁ ...  a_nb_n ⎠

Therefore abᵀ ∈ R^{n×n} with (abᵀ)_{jk} = a_j b_k, such that j = 1→n, k = 1→n.
Useful for some questions on sheet 1: for u ∈ R^n,

(abᵀ)u = a(bᵀu) = (bᵀu)a

which is always a multiple of a, ∀u, a, b ∈ R^n.

Note: let A & B be matrices of dimensions p×q and r×s respectively, with q = r. Then A·B = C, with C a matrix of dimensions p×s.
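The inner and outer products above are easy to experiment with on a computer. Below is a minimal plain-Python sketch (the course suggests MATLAB, but any language works; no libraries are assumed, and the helper names are my own):

```python
# Plain-Python inner product, outer product, and matrix-vector product,
# matching the component formulas above.

def inner(a, b):
    """<a, b> = a^T b = sum_i a_i b_i."""
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    """(a b^T)_jk = a_j b_k, returned as an n x n list of rows."""
    return [[aj * bk for bk in b] for aj in a]

def mat_vec(M, u):
    """Matrix-vector product, one inner product per row."""
    return [inner(row, u) for row in M]

a, b, u = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
# (a b^T) u equals (b^T u) a, i.e. always a multiple of a:
lhs = mat_vec(outer(a, b), u)
rhs = [inner(b, u) * ai for ai in a]
assert lhs == rhs
```

The final assertion checks the claim above that (abᵀ)u is always a multiple of a.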
Given a, b ∈ R^n, let

〈a,b〉 = aᵀb = ∑_{i=1}^n a_i b_i

〈.,.〉 : R^n × R^n → R, the inner product.

〈a,b〉 = ∑_{i=1}^n a_i b_i = ∑_{i=1}^n b_i a_i = 〈b,a〉 ∀a,b ∈ R^n (symmetric) (1)

(the order doesn't matter)

〈a, µb + λc〉 = aᵀ(µb + λc) = ∑_{i=1}^n a_i (µb_i + λc_i) = µ∑_{i=1}^n a_i b_i + λ∑_{i=1}^n a_i c_i = µ〈a,b〉 + λ〈a,c〉 (2)

linear with respect to the 2nd argument, ∀a,b,c ∈ R^n and ∀µ,λ ∈ R.

(1) + (2) ⇒ (3)
〈µa + λb, c〉 =(1)= 〈c, µa + λb〉 =(2)= µ〈c,a〉 + λ〈c,b〉 =(1)= µ〈a,c〉 + λ〈b,c〉 (3)

linear with respect to the 1st argument.

〈a,a〉 = aᵀa = ∑_{i=1}^n a_i² ≥ 0

Let ‖a‖ = [〈a,a〉]^{1/2} = (∑_{i=1}^n a_i²)^{1/2}, the length or norm of a.

‖a‖ ≥ 0 ∀a ∈ R^n, with ‖a‖ = 0 if and only if a = 0.
Recall - Geometric Vectors in R³

see diagram 20081009.M2AA3.1

a = a₁i + a₂j + a₃k
b = b₁i + b₂j + b₃k

1.1.3 Dot (Scalar) Product

a · b = b · a = |a||b| cos θ (order doesn't matter)

a · a = |a|², therefore |a| = (a · a)^{1/2} (θ = 0, cos θ = 1)

see diagram 20081009.M2AA3.2

⇒ i · i = j · j = k · k = 1 (as i, j, k are unit vectors)
⇒ i · j = j · k = i · k = 0 (as θ = π/2)
Easy to show that

a · (λb + µc) = λa · b + µa · c (4)

Non-trivial vectors a & b (a ≠ 0, b ≠ 0): a and b are orthogonal (perpendicular) if and only if their dot product is zero,

a · b = 0 ⇔ cos θ = 0 ⇔ θ = π/2

a · b = a · (b₁i + b₂j + b₃k) =(4)= b₁a · i + b₂a · j + b₃a · k = b₁a₁ + b₂a₂ + b₃a₃

therefore

a · b = ∑_{i=1}^3 a_i b_i

Given a = a₁i + a₂j + a₃k ≡ a = (a₁, a₂, a₃)ᵀ ∈ R³,

a · b = ∑_{i=1}^3 a_i b_i = aᵀb ≡ 〈a,b〉

which is the inner product of a & b. Non-trivial vectors a & b are orthogonal if and only if (the inner product) 〈a,b〉 = 0.

Definition: Dot product = Inner product
see diagram 20081009.M2AA3.3

〈a,b〉 = aᵀb = ∑_{i=1}^n a_i b_i ∀a,b ∈ R^n

Inner product: takes two vectors in R^n and spits out a number in R. 3 rules:

1. symmetric, order doesn't matter
2. linearity (linear combination in the 2nd argument, see above)
3. again linearity in the other argument

Length/norm as above.

Ex. a,b ∈ R^n orthogonal ⇒ ‖a + b‖² = ‖a‖² + ‖b‖² (Generalised Pythagoras)

see diagram 20081014.M2AA3.1

Proof:

‖a + b‖² =def= 〈a + b, a + b〉 =(2)= 〈a + b, a〉 + 〈a + b, b〉 =(3)= 〈a,a〉 + 〈b,a〉 + 〈a,b〉 + 〈b,b〉 = ‖a‖² + ‖b‖² + 2〈a,b〉, and 〈a,b〉 = 0; hence the result.

{q_k}_{k=1}^n, q_k ∈ R^m, q_k ≠ 0, k = 1→n, is ORTHOGONAL if and only if

〈q_k, q_j〉 = 0, j,k = 1→n, j ≠ k
Kronecker delta notation:

δ_jk = 1 if j = k, 0 if j ≠ k

identity matrix I ∈ R^{n×n} (1's on the diagonal, 0's elsewhere):

I_jk = δ_jk, j,k = 1→n (5)

Definition: {q_k}_{k=1}^n, q_k ∈ R^m, k = 1→n, is ORTHONORMAL if and only if

〈q_k, q_j〉 = δ_jk, j,k = 1→n

i.e. ORTHONORMAL ≡ ORTHOGONAL + each vector has unit length:

‖q_k‖ = [〈q_k,q_k〉]^{1/2} = 1, k = 1→n

Linearly Independent Vectors

{a_k}_{k=1}^n, a_k ∈ R^m, k = 1→n

{a_k}_{k=1}^n is said to be LINEARLY INDEPENDENT if

∑_{k=1}^n c_k a_k = 0 ⇒ c_k = 0, k = 1→n (only choice)

{a_k}_{k=1}^n is said to be LINEARLY DEPENDENT if ∃{c_k}_{k=1}^n not all zero such that ∑_{k=1}^n c_k a_k = 0
(e.g. if c_i ≠ 0 ⇒ a_i = −∑_{k=1, k≠i}^n (c_k/c_i) a_k)

Let A ∈ R^{m×n} have {a_k}_{k=1}^n as its columns:

A = (a₁, a₂, ..., a_n) ∈ R^{m×n}

Ac = (a₁, a₂, ..., a_n)(c₁, ..., c_n)ᵀ = ∑_{k=1}^n c_k a_k ∈ R^m

therefore if the only solution to Ac = 0 is c = 0 then {a_k}_{k=1}^n is linearly independent; however if ∃ a non-trivial solution c ≠ 0, then {a_k}_{k=1}^n is linearly dependent.
Restrict to the case m = n:

A = (a₁, ..., a_n), a_k ∈ R^n, k = 1→n

(a) If A⁻¹ exists, then

Ac = 0 ⇒ A⁻¹Ac = A⁻¹0 ⇒ Ic = 0 ⇒ c = 0 ⇒ {a_k}_{k=1}^n is lin. ind.

(b) If {a_k}_{k=1}^n is lin. ind.

⇒ they form a basis for R^n, i.e. span R^n
⇒ ∀b ∈ R^n ∃{c_k}_{k=1}^n such that b = ∑_{k=1}^n c_k a_k (6)

Is {c_k}_{k=1}^n unique? Assume the contrary:

b = ∑_{k=1}^n d_k a_k (7)

(6) − (7): 0 = ∑_{k=1}^n (c_k − d_k) a_k

{a_k}_{k=1}^n lin. ind. ⇒ c_k − d_k = 0 ⇒ c_k = d_k, k = 1→n

therefore the representation of b by {a_k}_{k=1}^n is unique:

b = ∑_{k=1}^n c_k a_k = Ac

(a linear combination), where A = (a₁, ..., a_n) ∈ R^{n×n} and c = (c₁, ..., c_n)ᵀ.

therefore ∀b ∈ R^n, ∃! c ∈ R^n (! = unique) such that Ac = b

see diagram 20081014.M2AA3.2

Hence (a) & (b) yield for m = n:

A⁻¹ exists ⇐⇒ {a_k}_{k=1}^n lin. indep.   (A = (a₁, ..., a_n) ∈ R^{n×n})
Therefore, given e_i ∈ R^n with a 1 in the ith position, i.e. (e_i)_j = δ_ij, j = 1→n, ∃ a unique s_i ∈ R^n such that

A s_i = e_i, i = 1→n

Letting S = (s₁, ..., s_n), AS = I, therefore S = A⁻¹, i.e. A⁻¹ exists.
Lemma:

{a_k}_{k=1}^n, a_k ∈ R^m, a_k ≠ 0, k = 1→n, and orthogonal:

〈a_j, a_k〉 = 0, j,k = 1→n, j ≠ k

⇒ {a_k}_{k=1}^n linearly independent
⇒ n ≤ m

(Can't have n > m linearly independent vectors in R^m - recall the exchange lemma.)
Proof: If ∑_{k=1}^n c_k a_k = 0

⇒ 〈∑_{k=1}^n c_k a_k, a_j〉 = 〈0, a_j〉 =(3)⇒ ∑_{k=1}^n c_k 〈a_k, a_j〉 = 0, with 〈a_k, a_j〉 = 0 if k ≠ j

⇒ c_j 〈a_j, a_j〉 = 0

a_j ≠ 0 ⇒ ‖a_j‖² = 〈a_j, a_j〉 ≠ 0, therefore c_j = 0

Repeat for j = 1→n: therefore c_j = 0 for j = 1→n, therefore {a_k}_{k=1}^n lin. indep.

orthogonality implies linear independence

therefore non-trivial orthogonal vectors are lin. ind. However, lin. ind. ⇏ orthogonal.

Ex. n = m = 2

a₁ = (2, 0)ᵀ, a₂ = (3, 1)ᵀ

c₁a₁ + c₂a₂ = 0 ⇒ 2c₁ + 3c₂ = 0 and c₂ = 0 ⇒ c₁ = c₂ = 0

therefore {a_i}_{i=1}^2 lin. ind., but 〈a₁, a₂〉 = a₁ᵀa₂ = 6 ≠ 0
1.2 Gram-Schmidt

Given {a_i}_{i=1}^n, a_i ∈ R^m, i = 1→n, lin. ind. (⇒ n ≤ m),

find {q_i}_{i=1}^n, q_i ∈ R^m, i = 1→n, ORTHONORMAL, i.e.

〈q_i, q_j〉 = δ_ij, i,j = 1→n

with span{q_i}_{i=1}^n = span{a_i}_{i=1}^n.

1.2.1 Classical Gram-Schmidt (CGS) Algorithm

v₁ = a₁, q₁ = v₁/‖v₁‖

for k = 2→n:
  v_k = a_k − ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l (8)
  q_k = v_k/‖v_k‖

Proof:

q₁ = v₁/‖v₁‖ = a₁/‖a₁‖

{a_i}_{i=1}^n lin. ind. ⇒ a_i ≠ 0, i = 1→n, therefore ‖a₁‖ ≠ 0

‖q₁‖² = 〈q₁, q₁〉 = 〈a₁/‖a₁‖, a₁/‖a₁‖〉 = (1/‖a₁‖²)〈a₁, a₁〉 = 1

span{a₁} = span{q₁}

Set v₂ = a₂ − 〈a₂, q₁〉q₁ (by (8))

⇒ 〈v₂, q₁〉 = 〈a₂ − 〈a₂, q₁〉q₁, q₁〉 =(2)= 〈a₂, q₁〉 − 〈a₂, q₁〉〈q₁, q₁〉 = 0 (since 〈q₁, q₁〉 = 1)
see diagram 20081015.M2AA3.1

Check: is v₂ = 0? If v₂ = 0 ⇒ a₂ is a multiple of q₁ (so a₂ a multiple of a₁, which is impossible since they are lin. ind.) - contradiction.

therefore v₂ ≠ 0, therefore q₂ = v₂/‖v₂‖

〈v₂, q₁〉 = 0 ⇒ 〈q₂, q₁〉 = 〈v₂/‖v₂‖, q₁〉 = 0

Also 〈q₂, q₂〉 = 〈v₂/‖v₂‖, v₂/‖v₂‖〉 = 1

therefore {q_i}_{i=1}^2 ORTHONORMAL

v₂ is a lin. combination of a₂ and q₁,
so v₂ is a lin. combination of a₂ and a₁,
so q₂ is a lin. combination of a₂ and a₁.
Similarly a₂ is a lin. comb. of q₁ and q₂.

Therefore span{q_i}_{i=1}^2 = span{a_i}_{i=1}^2.

Continue by induction: assume that when we've done up to k − 1,

{q_i}_{i=1}^{k−1} are ORTHONORMAL
q_j = lin. comb. of {a_i}_{i=1}^j, j = 1→k−1
a_j = lin. comb. of {q_i}_{i=1}^j, j = 1→k−1

(true for k = 2 and 3 from the above)

Set

v_k = a_k − ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l
⇒ 〈v_k, q_j〉 =(2)= 〈a_k, q_j〉 − ∑_{l=1}^{k−1} 〈a_k, q_l〉〈q_l, q_j〉, j = 1→k−1
= 〈a_k, q_j〉 − 〈a_k, q_j〉 = 0

therefore 〈v_k, q_j〉 = 0, j = 1→k−1

If v_k = 0, this would tell us that a_k is a lin. comb. of {q_l}_{l=1}^{k−1}; but the inductive hypothesis said that the q's can be written in terms of the a's, so it would tell us that a_k is a lin. comb. of {a_l}_{l=1}^{k−1} ⇒ contradiction to {a_i}_{i=1}^n lin. ind.

therefore v_k ≠ 0, so

q_k = v_k/‖v_k‖

〈v_k, q_j〉 = 0, j = 1→k−1 ⇒ 〈q_k, q_j〉 = 0, j = 1→k−1. Also 〈q_k, q_k〉 = 1.

therefore {q_i}_{i=1}^k ORTHONORMAL

v_k lin. comb. of a_k and {q_l}_{l=1}^{k−1}
⇒ v_k lin. comb. of {a_l}_{l=1}^k
⇒ q_k lin. comb. of {a_l}_{l=1}^k

therefore q_j = lin. comb. of {a_i}_{i=1}^j, j = 1→k.
Similarly a_j = lin. comb. of {q_i}_{i=1}^j, j = 1→k.
Ex. n = m = 2

a₁ = (3, −4)ᵀ, a₂ = (1, 2)ᵀ (clearly lin. ind.)

First step: q₁ = a₁/‖a₁‖. First we need to work out the length of a₁:

‖a₁‖² = 〈a₁, a₁〉 = a₁ᵀa₁ = 3² + (−4)² = 25 ⇒ ‖a₁‖ = 5

so q₁ = (1/5)(3, −4)ᵀ, with ‖q₁‖ = 1.

v₂ = a₂ − 〈a₂, q₁〉q₁ (9)

First calculate 〈a₂, q₁〉 = a₂ᵀq₁ = (1/5)(3 − 8) = −1.

Now put that back into (9):

v₂ = a₂ + q₁ = (1, 2)ᵀ + (1/5)(3, −4)ᵀ = (1/5)(8, 6)ᵀ

so ‖v₂‖² = 〈v₂, v₂〉 = v₂ᵀv₂ = (8/5)² + (6/5)² = 100/25 = 4

⇒ ‖v₂‖ = 2 ⇒ q₂ = v₂/‖v₂‖ = (1/5)(4, 3)ᵀ

{(3, −4)ᵀ, (1, 2)ᵀ} →CGS→ {(1/5)(3, −4)ᵀ, (1/5)(4, 3)ᵀ}
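The CGS algorithm (8) is short to code. A minimal plain-Python sketch (any language would do; MATLAB is suggested for the projects), checked against the worked example above:

```python
# Classical Gram-Schmidt: lin. ind. {a_i} in R^m -> orthonormal {q_i}.
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def cgs(vectors):
    """Apply v_k = a_k - sum_{l<k} <a_k, q_l> q_l, then normalise."""
    qs = []
    for a in vectors:
        v = list(a)
        for q in qs:
            c = inner(a, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        nv = norm(v)  # nonzero precisely because the inputs are lin. ind.
        qs.append([vi / nv for vi in v])
    return qs

q1, q2 = cgs([[3.0, -4.0], [1.0, 2.0]])
# matches the example: q1 = (1/5)(3,-4)^T, q2 = (1/5)(4,3)^T
for got, want in zip(q1 + q2, [0.6, -0.8, 0.8, 0.6]):
    assert abs(got - want) < 1e-12
```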
1.3 QR Factorisation

{a_i}_{i=1}^n lin. ind. →CGS→ {q_i}_{i=1}^n, a_i ∈ R^m, i = 1→n (lin. ind. ⇒ n ≤ m)

Look at this from a different viewpoint. Let

A = (a₁, a₂, ..., a_n) ∈ R^{m×n}
Q̂ = (q₁, q₂, ..., q_n) ∈ R^{m×n}

Let R̂ ∈ R^{n×n} be the upper triangular matrix

R̂_lk = r_lk if l ≤ k, 0 if l > k

(r_lk will be determined later)
Let e_k^(n) ∈ R^n ((n) is to stress it is in R^n as opposed to R^m): a 1 in the kth row, zeros elsewhere, i.e.

(e_k^(n))_j = δ_jk, j,k = 1→n

For B = (b₁, b₂, ..., b_n) ∈ R^{m×n}: B e_k^(n) = b_k ∈ R^m, the kth column of B.

Q̂ R̂ e_k^(n) = Q̂ (r_1k, r_2k, ..., r_kk, 0, ..., 0)ᵀ = ∑_{l=1}^k r_lk q_l (10)

CGS ⇒

a₁ = v₁ = ‖v₁‖q₁, where r₁₁ = ‖v₁‖

a_k = v_k + ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l = ‖v_k‖q_k + ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l = ∑_{l=1}^k r_lk q_l

where r_kk = ‖v_k‖ and r_lk = 〈a_k, q_l〉, l = 1→k−1.

therefore

A e_k^(n) = a_k = ∑_{l=1}^k r_lk q_l =(10)= Q̂ R̂ e_k^(n)
R̂ is the upper triangular n×n matrix with coefficients as above.

therefore the columns of A and Q̂R̂ are the same, therefore A = Q̂R̂ (A ∈ R^{m×n}, Q̂ ∈ R^{m×n}, R̂ ∈ R^{n×n}).

Q̂ has orthonormal columns, whereas R̂ is a square, upper triangular matrix whose diagonal entries are the lengths of the v_k (so r_kk = ‖v_k‖ > 0, k = 1→n); since the v's are non-trivial, R̂ has strictly positive diagonal entries.

therefore CGS yields a factorisation of A:

A = Q̂R̂

If m > n: A rectangular, Q̂ rectangular, R̂ square. With n ≤ m this is the REDUCED QR FACTORISATION of A.

QR Factorisation of A:

A = QR, Q ∈ R^{m×m}, R ∈ R^{m×n}
where

Q = [Q̂ q_{n+1} ... q_m] ∈ R^{m×m}   (first n columns Q̂, then m−n extra columns)

with {q_j}_{j=n+1}^m chosen so that all columns of Q are orthonormal:

〈q_i, q_j〉 = δ_ij, i,j = 1→m

and

R = [R̂ ; 0] ∈ R^{m×n}   (R̂ ∈ R^{n×n} above an (m−n)×n block of zeros)

Then

QR = [Q̂ q_{n+1} ... q_m][R̂ ; 0] = Q̂R̂ = A

A = QR, A ∈ R^{m×n}, Q ∈ R^{m×m}, R ∈ R^{m×n}
Note:

QᵀQ has rows q_jᵀ against columns q_k:

(QᵀQ)_jk = q_jᵀ q_k = 〈q_j, q_k〉 = δ_jk, j,k = 1→m

QᵀQ = I^(m) ∈ R^{m×m}, the identity matrix

therefore Qᵀ = Q⁻¹, therefore QᵀQ = I^(m) = QQᵀ

therefore the columns of Q orthonormal ⇔ rows of Q orthonormal (columns of Qᵀ orthonormal)

Definition: Q ∈ R^{m×m} is called ORTHOGONAL if QᵀQ = I^(m) = QQᵀ (orthonormal would be a better name, however due to historical reasons it is named orthogonal)
Definition: If A ∈ R^{m×n} and A = QR, where Q ∈ R^{m×m} is orthogonal and R ∈ R^{m×n} is an upper triangular matrix, then we say that we have a QR factorisation of A.

Proposition

Orthogonal matrices preserve length and angle: if Q ∈ R^{m×m} and QᵀQ = I^(m), then ∀v,w ∈ R^m

〈Qv, Qw〉 = 〈v,w〉   'angle' (⋆)

and

‖Qv‖ = ‖v‖   'length' (⋆⋆)

Proof:

〈Qv, Qw〉 = (Qv)ᵀQw = vᵀQᵀQw = vᵀI^(m)w = vᵀw = 〈v,w〉

‖Qv‖ = [〈Qv, Qv〉]^{1/2} = [〈v,v〉]^{1/2} = ‖v‖ (by the above)

Geometric vectors: a · b = |a||b| cos θ

see diagram 20081009.M2AA3.2

One can show, see section 1.4 Cauchy-Schwarz, that

〈v,w〉 = ‖v‖‖w‖ cos θ

〈Qv, Qw〉 = ‖Qv‖‖Qw‖ cos φ =(⋆⋆)= ‖v‖‖w‖ cos φ

But (⋆) ⇒ cos θ = cos φ ⇒ θ = φ, as θ,φ ∈ [0, π].

Proposition

Q₁, Q₂ ∈ R^{m×m} orthogonal, i.e. Q₁ᵀQ₁ = I^(m) = Q₂ᵀQ₂; then Q₁Q₂ ∈ R^{m×m} is orthogonal.

Proof:

(Q₁Q₂)ᵀQ₁Q₂ = Q₂ᵀQ₁ᵀQ₁Q₂ = Q₂ᵀI^(m)Q₂ = Q₂ᵀQ₂ = I^(m)

therefore Q₁Q₂ is orthogonal.
Ex: (rotation matrices) m = 2

Q = ( cos θ  −sin θ ; sin θ  cos θ ) = (q₁, q₂)

therefore 〈q_i, q_j〉 = δ_ij, i,j = 1→2

therefore Q is orthogonal, as its columns are orthonormal. Q represents rotation through an angle θ.

Write (x, y)ᵀ = l (cos φ, sin φ)ᵀ, where l = (x² + y²)^{1/2}, and let (a, b)ᵀ = Q (x, y)ᵀ.

To transform (x, y)ᵀ → (l, 0)ᵀ, choose θ = −φ:

cos θ = cos(−φ) = cos φ = x/l
sin θ = sin(−φ) = −sin φ = −y/l

therefore

Q = ( x/l  y/l ; −y/l  x/l ), where l = (x² + y²)^{1/2}

therefore the rotation matrix

Q = (1/(x² + y²)^{1/2}) ( x  y ; −y  x )   (orthogonal)

takes (x, y)ᵀ → ((x² + y²)^{1/2}, 0)ᵀ.
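As a quick numerical illustration (a sketch in plain Python; the function name is my own), the 2×2 matrix above does send (x, y)ᵀ to (l, 0)ᵀ:

```python
# Q = (1/l) [[x, y], [-y, x]] maps (x, y)^T to (l, 0)^T, l = sqrt(x^2 + y^2).
import math

def rotate_to_axis(x, y):
    l = math.hypot(x, y)            # l = (x^2 + y^2)^(1/2)
    Q = [[x / l, y / l],
         [-y / l, x / l]]           # orthogonal: columns are orthonormal
    b = [Q[0][0] * x + Q[0][1] * y,
         Q[1][0] * x + Q[1][1] * y]
    return Q, b

Q, b = rotate_to_axis(3.0, 4.0)
# (3, 4) -> (5, 0), the same computation used in the Givens example below
assert abs(b[0] - 5.0) < 1e-12 and abs(b[1]) < 1e-12
```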
For 1 ≤ p < q ≤ m introduce G_pq(θ) ∈ R^{m×m}: the identity matrix except in rows/columns p and q, where

(G_pq)_pp = cos θ, (G_pq)_pq = −sin θ, (G_pq)_qp = sin θ, (G_pq)_qq = cos θ

Column by column:

G_pq(θ) e_j^(m) = e_j^(m) if j ≠ p, q

G_pq(θ) e_p^(m) = (0, ..., cos θ [pth], ..., sin θ [qth], ..., 0)ᵀ

G_pq(θ) e_q^(m) = (0, ..., −sin θ [pth], ..., cos θ [qth], ..., 0)ᵀ

Each column of G_pq(θ) has unit length, and the columns are also orthogonal;

therefore the columns of G_pq(θ) are orthonormal

⇒ G_pq(θ) ∈ R^{m×m} is an orthogonal matrix.
For a ∈ R^m:

G_pq(θ) a = b ⇒ b_j = a_j if j ≠ p, q; b_p = cos θ a_p − sin θ a_q; b_q = sin θ a_p + cos θ a_q

Similarly G_pq(θ) A = B: all rows of B are the same as A except rows p and q.

G_pq(θ) are called GIVENS Rotation Matrices (circa 1950).

Obtain a QR factorisation of A using a sequence of Givens rotations (an alternative procedure: Householder reflections).
Ex: m = 3, n = 2

A = ⎛ 3  65 ⎞
    ⎜ 4   0 ⎟
    ⎝12  13 ⎠

Take a sequence of Givens rotations so that

A → ⎛ ×  × ⎞
    ⎜ 0  × ⎟ = R
    ⎝ 0  0 ⎠

Choose G₁₂(θ) such that

G₁₂(θ)A = ⎛ ×  × ⎞
          ⎜ 0  × ⎟   (last row not affected)
          ⎝12  13 ⎠

G₁₂(θ) = ⎛ cos θ  −sin θ  0 ⎞
         ⎜ sin θ   cos θ  0 ⎟
         ⎝   0       0    1 ⎠

x = 3, y = 4, l = 5:

(1/5) ( 3  4 ; −4  3 ) (3, 4)ᵀ = (5, 0)ᵀ

therefore choose

G₁₂(θ) = ⎛ 3/5  4/5  0 ⎞
         ⎜−4/5  3/5  0 ⎟
         ⎝  0    0   1 ⎠

A^(1) = G₁₂(θ)A = ⎛ 5  39 ⎞
                  ⎜ 0 −52 ⎟
                  ⎝12  13 ⎠
Use a rotation matrix to obtain 0 in row 3, column 1: choose either G₁₃(φ) or G₂₃(φ)? Choose G₁₃(φ), as G₂₃(φ) would affect row 2, column 1, which would be counterproductive.

G₁₃(φ) = ⎛ cos φ  0  −sin φ ⎞
         ⎜   0    1     0   ⎟
         ⎝ sin φ  0   cos φ ⎠

Choose φ based on x = 5 and y = 12 ⇒ l = 13:

G₁₃(φ) = ⎛  5/13  0  12/13 ⎞
         ⎜    0   1     0  ⎟
         ⎝−12/13  0   5/13 ⎠

A^(2) = G₁₃(φ)A^(1) = ⎛13  27 ⎞
                      ⎜ 0 −52 ⎟
                      ⎝ 0 −31 ⎠

Now use G₁₃(ψ) or G₂₃(ψ)? G₁₃(ψ) would mess up the 0 in position (3,1), therefore use G₂₃(ψ).

x = −52, y = −31 ⇒ l = √3665

A^(3) = G₂₃(ψ)A^(2) = ⎛13    27  ⎞
                      ⎜ 0  √3665 ⎟ = R, upper triangular
                      ⎝ 0    0   ⎠

with strictly positive diagonal entries.

Note: G_pq(.) makes the (q,p)th element in the current A zero.

Therefore

R = A^(3) = G₂₃(ψ)A^(2) = G₂₃(ψ)G₁₃(φ)A^(1) = G₂₃(ψ)G₁₃(φ)G₁₂(θ)A = GA

G is a product of Givens rotations; each G_pq(.) is orthogonal; therefore G is orthogonal (a product of orthogonal matrices):

GᵀG = I = GGᵀ
therefore

GA = R ⇒ GᵀGA = GᵀR ⇒ A = QR, where Q = Gᵀ

Note: QᵀQ = (Gᵀ)ᵀGᵀ = GGᵀ = I, therefore Q orthogonal, therefore it is a QR Factorisation of A.

General A ∈ R^{m×n} with m ≥ n: apply a sequence of Givens rotations to take A to R ∈ R^{m×n}, upper triangular with strictly positive diagonal entries:

GA = R, where

G = (G_nm ... G_{n,n+1}) ... (G_2m ... G_23)(G_1m ... G_12)
      [column n]              [column 2]      [column 1]

G_pq makes the (q,p)th element zero; if y = 0 already, then G_pq = I. G_pq ∈ R^{m×m}.

Let Q = Gᵀ ∈ R^{m×m} ⇒ Q is orthogonal.

GA = R ⇒ A = QR (QR factorisation)
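The column-by-column sweep above can be sketched in plain Python (an illustration, not the course's prescribed implementation), and checked against the 3×2 worked example:

```python
# QR by Givens rotations: G_pq zeroes the (q,p) entry; Q accumulates G^T.
import math

def givens_qr(A):
    """Return (Q, R) with A = Q R, A given as a list of rows."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for p in range(n):                 # column p
        for q in range(p + 1, m):      # zero entry (q, p)
            x, y = R[p][p], R[q][p]
            if y == 0.0:
                continue               # G_pq = I in this case
            l = math.hypot(x, y)
            c, s = x / l, -y / l       # cos, sin sending (x, y) -> (l, 0)
            for j in range(n):         # rows p, q of R change: R <- G R
                rp, rq = R[p][j], R[q][j]
                R[p][j] = c * rp - s * rq
                R[q][j] = s * rp + c * rq
            for j in range(m):         # columns p, q of Q change: Q <- Q G^T
                qp, qq = Q[j][p], Q[j][q]
                Q[j][p] = c * qp - s * qq
                Q[j][q] = s * qp + c * qq
    return Q, R

A = [[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]]
Q, R = givens_qr(A)
# matches the worked example: R = [[13, 27], [0, sqrt(3665)], [0, 0]]
assert abs(R[0][0] - 13.0) < 1e-9 and abs(R[0][1] - 27.0) < 1e-9
assert abs(R[1][1] - math.sqrt(3665.0)) < 1e-9
```

Updating Q by right-multiplication with Gᵀ keeps the invariant QR = A at every step.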
We might be interested in solving

Ax = b, A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m (m ≥ n)

Apply G ∈ R^{m×m} to Ax = b:

GAx = Gb ⇒ Rx = c ∈ R^{m×1} (an equivalent system to Ax = b)

see diagram 20081028.M2AA3.1

If m > n and c_i ≠ 0 for some i = n+1→m, there is no solution to Rx = c (≡ there is no solution to Ax = b): an INCONSISTENT SYSTEM. (Return to this later in the course.)

Otherwise, i.e. c_i = 0, i = n+1→m, ∃! x ∈ R^n (! = unique) such that Ax = b (Rx = c).

Solve by backward substitution:

x_n = c_n/r_nn

x_i = (c_i − ∑_{j=i+1}^n r_ij x_j)/r_ii, i = n−1, n−2, ..., 2, 1

This is all that's needed to do the questions on sheet 1.
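The backward substitution formulas above translate directly into code; a minimal plain-Python sketch:

```python
# Solve R x = c for upper triangular R with nonzero diagonal,
# working from the last row upwards as in the formulas above.
def back_substitute(R, c):
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

# e.g. R = [[2, 1], [0, 4]], c = [4, 8]  ->  x = [1, 2]
assert back_substitute([[2.0, 1.0], [0.0, 4.0]], [4.0, 8.0]) == [1.0, 2.0]
```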
1.4 Cauchy-Schwarz Inequality

For geometric vectors in R³: a · b = |a||b| cos θ

see diagram 20081009.M2AA3.2

Generalises to R^n:

〈a,b〉 = aᵀb = ‖a‖‖b‖ cos θ ⇒ |〈a,b〉| = ‖a‖‖b‖ |cos θ| ≤ ‖a‖‖b‖

Theorem (Cauchy-Schwarz Inequality)

For any vectors a,b ∈ R^n,

|〈a,b〉| ≤ ‖a‖‖b‖

with equality if and only if a and b are linearly dependent.

Proof: If a = 0, then 〈a,b〉 = 0 and ‖a‖ = 0, so the result is trivial.

If a ≠ 0, let q = a/‖a‖ ⇒ 〈q,q〉 = ‖q‖² = 1.

Let c = b − 〈b,q〉q

⇒ 〈c,q〉 = 〈b − 〈b,q〉q, q〉 =(3)= 〈b,q〉 − 〈b,q〉〈q,q〉 = 0

see diagram 20081028.M2AA3.2

0 ≤ ‖c‖² = 〈c,c〉 = 〈c, b − 〈b,q〉q〉 =(2)= 〈c,b〉 − 〈b,q〉〈c,q〉 = 〈c,b〉
= 〈b − 〈b,q〉q, b〉 =(3)= 〈b,b〉 − 〈b,q〉〈q,b〉 = ‖b‖² − [〈q,b〉]²

∴ [〈q,b〉]² ≤ ‖b‖² ∴ [〈a,b〉]² ≤ ‖a‖²‖b‖²

Taking square roots ⇒ the desired result.

Equality if and only if c = 0, i.e. b a multiple of q, i.e. b a multiple of a, i.e. a and b lin. dep.
1.5 Gradients and Hessians

f : R → R, f(x), one independent variable

see diagram 20081029.M2AA3.1

Taylor Series:

f(a + h) = f(a) + h f′(a) + (h²/2) f″(a) + R, where |R| ≤ Ch³; write R = O(h³)
We want to generalise this to functions of n independent variables:

f : R^n → R, f(x₁, x₂, ..., x_n)

Write f(x), where x = (x₁, ..., x_n)ᵀ ∈ R^n.

Partial derivative of f with respect to x_i: write ∂f/∂x_i (x)

(differentiate f with respect to x_i, holding x₁, ..., x_{i−1}, x_{i+1}, ..., x_n as constants)

Ex. n = 2, f(x₁, x₂), x = (x₁, x₂)ᵀ ∈ R²

f(x) = sin x₁ sin x₂

∂f/∂x₁ (x) = cos x₁ sin x₂
∂f/∂x₂ (x) = sin x₁ cos x₂

∂²f/∂x_i∂x_j = ∂/∂x_i [∂f/∂x_j] =(⋆)= ∂/∂x_j [∂f/∂x_i] = ∂²f/∂x_j∂x_i, i,j = 1→n

(⋆): if both derivatives exist and are continuous
Ex.

∂²f/∂x₂∂x₁ (x) = ∂/∂x₂ [∂f/∂x₁ (x)] = cos x₁ cos x₂

∂²f/∂x₁∂x₂ (x) = ∂/∂x₁ [∂f/∂x₂ (x)] = cos x₁ cos x₂ (equal, as expected)

∂²f/∂x₁² = ∂/∂x₁ [∂f/∂x₁ (x)] = −sin x₁ sin x₂

∂²f/∂x₂² = ∂/∂x₂ [∂f/∂x₂ (x)] = −sin x₁ sin x₂
Chain Rule

(n = 1) f : R → R, f(x)

Change variables t = t(x) ⇔ x = x(t), e.g. x(t) = t² ⇔ t(x) = x^{1/2}

Let w(t) = f(x(t)):

dw/dt (t) = df/dx (x(t)) · dx/dt (t)

Extend this to

f : R^n → R, f(x), x = (x₁, ..., x_n)ᵀ ∈ R^n
∈ RnExample:
x (t) =√
a + t
√
h (√
= fixed)
see diagram 20081029.M2AA3.2
⇒ xi (t) = ai + thi i = 1→ n
In generalf (x) , x (t)
Let w (t) = f (x (t))
dw
dt(t) =
δf
δx1(x (t))
dx1
dt(t) + . . .
δf
δxn(x (t))
dxndt
(t)
dw
dt(t) =
n∑i=1
δf
δxi(x (t))
dxidt
(t) (11)
Ex. n = 2, f(x₁, x₂), x = (x₁, x₂)ᵀ ∈ R²

f(x) = sin x₁ sin x₂, x₁(t) = t², x₂(t) = cos t

⇒ w(t) = f(x(t)) = sin t² · sin(cos t)   (= u·v)

dw/dt (t) = cos t² · sin(cos t) · 2t + sin t² · cos(cos t) · (−sin t)   (= u′v + uv′)

= ∂f/∂x₁ (x(t)) dx₁/dt (t) + ∂f/∂x₂ (x(t)) dx₂/dt (t)
Going back to the example:

f : R^n → R, f(x), x = (x₁, ..., x_n)ᵀ ∈ R^n, x(t) = a + t h

see diagram 20081029.M2AA3.3

⇒ x_i(t) = a_i + t h_i ⇒ dx_i/dt (t) = h_i, i = 1→n

Let

w(t) = f(x(t)) = f(a + t h) (12)

see diagram 20081029.M2AA3.4

Taylor series for w(t) ⇒

w(1) = w(0) + 1 · w′(0) + (1²/2) w″(0) + ...
     = w(0) + w′(0) + (1/2) w″(0) + ...

(11), (12) ⇒

f(a + h) = f(a) + ∑_{i=1}^n ∂f/∂x_i (a) h_i + ... (13)
From (11),

dw/dt (t) = ∑_{i=1}^n h_i ∂/∂x_i f(x(t))

∴ d/dt ≡ ∑_{i=1}^n h_i ∂/∂x_i ⇒ (d/dt)^m ≡ (∑_{i=1}^n h_i ∂/∂x_i)^m

∴ d²w/dt² (t) = ∑_{j=1}^n h_j ∂/∂x_j (∑_{i=1}^n h_i ∂/∂x_i) f(x(t)) = ∑_{j=1}^n ∑_{i=1}^n h_j h_i ∂²f/∂x_j∂x_i (x(t))

⇒ w″(0) = ∑_{i=1}^n ∑_{j=1}^n h_i h_j ∂²f/∂x_j∂x_i (a)

Inserting this into (13):

⇒ f(a + h) = f(a) + ∑_{i=1}^n h_i ∂f/∂x_i (a) + (1/2) ∑_{i=1}^n ∑_{j=1}^n h_i h_j ∂²f/∂x_j∂x_i (a) + O(‖h‖³)

Compare this with the n = 1 Taylor series above.
We introduce the GRADIENT of f (grad f, the vector of first order partial derivatives):

∇f(x) ∈ R^n, ∇f(x) = (∂f/∂x₁ (x), ..., ∂f/∂x_n (x))ᵀ

i.e.

[∇f(x)]_i = ∂f/∂x_i (x), i = 1→n

Introduce the HESSIAN of f (the matrix of second derivatives):

D²f(x) ∈ R^{n×n}, [D²f(x)]_ij = ∂²f/∂x_i∂x_j (x), i,j = 1→n

"smooth" f ⇒ D²f(x) is symmetric

n = 2:

D²f(x) = ( ∂²f/∂x₁² (x)    ∂²f/∂x₁∂x₂ (x) ;
           ∂²f/∂x₂∂x₁ (x)  ∂²f/∂x₂² (x) )
For A ∈ R^{n×n} with entries A_ij and x ∈ R^n:

[Ax]_i = ∑_{j=1}^n A_ij x_j

xᵀAx = xᵀ(Ax) = ∑_{i=1}^n x_i (Ax)_i = ∑_{i=1}^n ∑_{j=1}^n x_i A_ij x_j

∴ f(a + h) = f(a) + hᵀ∇f(a) + (1/2) hᵀD²f(a)h + O(‖h‖³)
Ex.

f(x) = xᵀAx ∀x ∈ R^n, where A ∈ R^{n×n} and is symmetric; ∴ f : R^n → R.

Find (i) ∇f(x) (the gradient of f), (ii) D²f(x) (the Hessian of f).

(i)

f(x) = xᵀAx = ∑_{i=1}^n ∑_{j=1}^n A_ij x_i x_j

[∇f(x)]_p = ∂f/∂x_p (x) = ∑_{i=1}^n ∑_{j=1}^n A_ij ∂/∂x_p (x_i x_j)

∂/∂x_p (x_i x_j) = (∂x_i/∂x_p) x_j + x_i (∂x_j/∂x_p)

x₁, ..., x_n are independent variables ⇒ ∂x_i/∂x_p = δ_ip, i,p = 1→n

∴ [∇f(x)]_p = ∑_{i=1}^n ∑_{j=1}^n A_ij (δ_ip x_j + x_i δ_jp) = ∑_{j=1}^n A_pj x_j + ∑_{i=1}^n A_ip x_i = [Ax]_p + [Aᵀx]_p

⇒ ∇f(x) = Ax + Aᵀx = 2Ax if Aᵀ = A

(ii)

[D²f(x)]_qp = ∂²f/∂x_q∂x_p (x)

We know that ∂f/∂x_p (x) = ∑_{j=1}^n A_pj x_j + ∑_{i=1}^n (Aᵀ)_pi x_i

⇒ ∂²f/∂x_q∂x_p (x) = ∑_{j=1}^n A_pj δ_jq + ∑_{i=1}^n (Aᵀ)_pi δ_iq = A_pq + (Aᵀ)_pq

Note: δ is the Kronecker delta; ∂f/∂x is a partial derivative.

∴ D²f(x) = A + Aᵀ = 2A if Aᵀ = A

∴ for f(x) = xᵀAx, f : R^n → R (Aᵀ = A):

∇f(x) = 2Ax ∈ R^n, D²f(x) = 2A ∈ R^{n×n}

Analogue of f(x) = ax², a ∈ R, f : R → R: f′(x) = 2ax, f″(x) = 2a.
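The formula ∇f(x) = 2Ax for f(x) = xᵀAx (A symmetric) can be sanity-checked numerically; below is a sketch using a central finite difference (the helper names and the particular A, x are my own choices):

```python
# Compare the exact gradient 2Ax of f(x) = x^T A x (A symmetric)
# with a central-difference approximation.
def f(A, x):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def grad_fd(A, x, h=1e-6):
    g = []
    for p in range(len(x)):
        xp, xm = x[:], x[:]
        xp[p] += h
        xm[p] -= h
        g.append((f(A, xp) - f(A, xm)) / (2 * h))  # central difference
    return g

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric
x = [1.0, -2.0]
exact = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]  # 2Ax
approx = grad_fd(A, x)
assert all(abs(e - a) < 1e-4 for e, a in zip(exact, approx))
```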
Definition:

f : R^n → R has a LOCAL MINIMUM (MAXIMUM) at x = a if ∀u ∈ R^n with ‖u‖ = 1, ∃ε > 0 such that

f(a + hu) ≥ (≤) f(a) ∀h ∈ [0, ε]

see diagram 20081030.M2AA3.1

n = 1, f : R → R

see diagram 20081030.M2AA3.2

Reminder: Taylor Series

f(a + h) = f(a) + hᵀ∇f(a) + (1/2) hᵀD²f(a)h + O(‖h‖³) (14)

Proposition

Let n = 1. Then f′(a) = 0 and f″(a) > 0 (< 0) are sufficient conditions for f to have a local minimum (maximum) at x = a.

Proof: n = 1 ⇒ u = ±1

see diagram 20081104.M2AA3.1

f(a ± h) = f(a) ± h f′(a) + (1/2)(±h)² f″(a) + O(h³)
         = f(a) + (1/2) h² f″(a) + O(h³)   (as f′(a) = 0)

≥ (≤) f(a) for h sufficiently small, since the (1/2)h²f″(a) term dominates the O(h³) remainder

⇒ x = a is a local minimum (maximum) of f.
Proposition

If ∇f(a) ≠ 0, then f(x) does not have a local minimum or maximum at x = a.

Proof: Put h = hu, ‖u‖ = 1, in (14):

⇒ f(a + hu) = f(a) + h uᵀ∇f(a) + O(h²) for h ≥ 0

∇f(a) ≠ 0; let

u = ±∇f(a)/‖∇f(a)‖ ⇒ ‖u‖ = 1

∴ f(a ± h ∇f(a)/‖∇f(a)‖) = f(a) ± (h/‖∇f(a)‖)‖∇f(a)‖² + O(h²)
= f(a) ± h‖∇f(a)‖ + O(h²)

> (<) f(a) for h sufficiently small, so there is no local min or max.

∴ ∇f(a) = 0 is a necessary condition for f(x) to have a local minimum or maximum at x = a. Points a where ∇f(a) = 0 are called stationary points of f(x).
Proposition

If ∇f(a) = 0 and

wᵀD²f(a)w > 0 (< 0) ∀w ∈ R^n, w ≠ 0

then f(x) has a local minimum (maximum) at x = a.

Proof: h = hu, ‖u‖ = 1, in (14):

⇒ f(a + hu) = f(a) + h uᵀ∇f(a) [= 0, as ∇f(a) = 0] + (1/2) h² uᵀD²f(a)u + O(h³)

≥ (≤) f(a) for h suff. small, since uᵀD²f(a)u > 0 (< 0) for u ≠ 0 ⇒ local min (max).
Ex.

n = 2, x = (x₁, x₂)ᵀ ∈ R²

f(x) = (x₁² − 2x₁ + 1) + (x₂² − 2x₂ + 1), f : R² → R

∇f(x) = (∂f/∂x₁ (x), ∂f/∂x₂ (x))ᵀ = (2x₁ − 2, 2x₂ − 2)ᵀ = 2(x₁ − 1, x₂ − 1)ᵀ ∈ R²

∇f(a) = 0 ⇔ a = (1, 1)ᵀ: only one stationary point.

Find

D²f(x) = ( ∂²f/∂x₁²  ∂²f/∂x₁∂x₂ ; ∂²f/∂x₂∂x₁  ∂²f/∂x₂² ) = ( 2  0 ; 0  2 ) = 2I ∈ R^{2×2}

wᵀD²f(a)w = 2wᵀw = 2‖w‖² > 0 ∀w ≠ 0

∴ a = (1, 1)ᵀ is a local minimum (also a global minimum, as it's the only stationary point).

[Obvious, as f(x) = (x₁ − 1)² + (x₂ − 1)².]
Definition

A ∈ R^{n×n} is called positive definite if

xᵀAx > 0 ∀x ∈ R^n, x ≠ 0

or negative definite if

xᵀAx < 0 ∀x ∈ R^n, x ≠ 0

or non-negative definite if

xᵀAx ≥ 0 ∀x ∈ R^n

or non-positive definite if

xᵀAx ≤ 0 ∀x ∈ R^n

Example:

n = 2, A = ( 1  −1 ; −1  1 ), x = (x₁, x₂)ᵀ ∈ R²

xᵀAx = ∑_{i=1}^2 ∑_{j=1}^2 A_ij x_i x_j = x₁² + x₂² − 2x₁x₂ = (x₁ − x₂)² ≥ 0 ∀x ∈ R²

∴ A is non-negative definite but not positive definite: e.g. a = (1, 1)ᵀ gives aᵀAa = 0.

Using these definitions, we can rewrite the above proposition.
Proposition

If ∇f(a) = 0 and D²f(a) is positive (negative) definite, then f(x) has a local minimum (maximum) at x = a.
1.6 Inner Products Revisited and Positive Definite Matrices

Let A ∈ R^{n×n} be symmetric (Aᵀ = A) and positive definite (xᵀAx > 0 ∀x ∈ R^n, x ≠ 0).

Generalise the idea of an inner product by defining

〈u,v〉_A = uᵀAv ∀u,v ∈ R^n

(previously 〈u,v〉 ≡ 〈u,v〉_I = uᵀIv = uᵀv)

Make sure the properties of the inner product still hold with this new definition: 〈.,.〉_A : R^n × R^n → R

〈v,u〉_A = vᵀAu = (vᵀAu)ᵀ = uᵀAᵀv = uᵀAv = 〈u,v〉_A   (using Aᵀ = A)

∴ 〈v,u〉_A = 〈u,v〉_A ∀u,v ∈ R^n (symmetric). Easy to show that

〈u, αv + βw〉_A = α〈u,v〉_A + β〈u,w〉_A
〈αu + βv, w〉_A = α〈u,w〉_A + β〈v,w〉_A     ∀u,v,w ∈ R^n, ∀α,β ∈ R

Introduce the idea of a generalised norm (length) by defining

‖u‖_A = [〈u,u〉_A]^{1/2} ∀u ∈ R^n

Note:

〈u,u〉_A = uᵀAu > 0 if u ≠ 0, and = 0 if and only if u = 0

∴ ‖u‖_A ≥ 0 ∀u ∈ R^n, with ‖u‖_A = 0 if and only if u = 0.

A key property of positive definite matrices is that they are invertible, i.e.

Ax = 0 ⇒ xᵀAx = 0 ⇒ x = 0

⇒ the columns of A are linearly independent ⇒ A⁻¹ exists.
Theorem (Generalised Cauchy-Schwarz Inequality)

If A ∈ R^{n×n} is symmetric positive definite, then

|〈u,v〉_A| ≤ ‖u‖_A ‖v‖_A ∀u,v ∈ R^n

with equality if and only if u and v are linearly dependent.

Proof: Simply replace 〈.,.〉 by 〈.,.〉_A and ‖.‖ by ‖.‖_A in the original proof.

It is easy to generate symmetric positive definite matrices. Given P ∈ R^{n×n} which is invertible, i.e. P⁻¹ exists, then A = PᵀP is symmetric positive definite.

Check:

A ∈ R^{n×n} ✓

Aᵀ = (PᵀP)ᵀ = Pᵀ(Pᵀ)ᵀ = PᵀP = A ✓

For any x ∈ R^n: xᵀAx = xᵀPᵀPx = (Px)ᵀ(Px) = ‖Px‖² ≥ 0

Note ‖Px‖ = 0 ⇔ Px = 0 ⇔ x = 0, as P⁻¹ exists. ✓

∴ A is positive definite.
We now prove the reverse implication.

Theorem

Let A ∈ R^{n×n} be any symmetric positive definite matrix. Then ∃ an invertible P ∈ R^{n×n} such that A = PᵀP. Furthermore we can choose P to be upper triangular with P_ii > 0, i = 1→n (diagonal entries strictly positive), in which case we say that A = PᵀP is a Cholesky Decomposition/Factorisation of A.

Proof:

Let {v_i}_{i=1}^n be any n linearly independent vectors in R^n. Using the inner product 〈.,.〉_A induced by A, 〈a,b〉_A = aᵀAb, we apply the Classical Gram-Schmidt process to {v_i}_{i=1}^n:

u₁ = v₁/‖v₁‖_A ⇒ ‖u₁‖_A = 1

w_i = v_i − ∑_{j=1}^{i−1} 〈v_i, u_j〉_A u_j ⇒ 〈w_i, u_j〉_A = 0, j = 1→i−1, i = 2→n

u_i = w_i/‖w_i‖_A ⇒ ‖u_i‖_A = 1, i = 2→n
⇒ u_i is a linear combination of {v_j}_{j=1}^i, i = 1→n, and

〈u_i, u_j〉_A = δ_ij, i,j = 1→n

Let

U = [u₁, u₂, ..., u_n] ∈ R^{n×n} ⇒ AU = [Au₁, Au₂, ..., Au_n] ∈ R^{n×n}

[UᵀAU]_ij = u_iᵀ (ith row of Uᵀ) · Au_j (jth col. of AU) = 〈u_i, u_j〉_A = δ_ij, i,j = 1→n

∴ UᵀAU = I^(n)

∴ U⁻¹ = UᵀA exists ⇒ (U⁻¹)ᵀ = (UᵀA)ᵀ = AᵀU = AU

Let P = U⁻¹ ∈ R^{n×n} (P⁻¹ = U exists):

PᵀP = (U⁻¹)ᵀ U⁻¹ = AU U⁻¹ = A ✓
To show that we can choose P upper triangular with P_ii > 0, i = 1→n, we choose particular {v_i}_{i=1}^n.

Let v_i = e_i^(n) ∈ R^n (1 in the ith position), (e_i^(n))_j = δ_ij, i,j = 1→n

u₁ is a multiple of e₁ = (×, 0, ..., 0)ᵀ

u_i is a linear combination of {e_j^(n)}_{j=1}^i, i = 2→n

∴ u_i ∈ R^n with (u_i)_k = 0 if k > i, i = 1→n

⇒ U = [u₁, u₂, ..., u_n] ∈ R^{n×n} upper triangular
We now show that (u_i)_i > 0, i = 1→n.

u₁ = e₁^(n)/‖e₁^(n)‖_A = (×, 0, ..., 0)ᵀ, with × strictly positive

u_i = w_i/‖w_i‖_A ∴ (u_i)_i > 0 if and only if (w_i)_i > 0, i = 2→n

w_i = e_i^(n) − ∑_{j=1}^{i−1} 〈e_i^(n), u_j〉_A u_j

(u_j)_k = 0 if k > j, j = 1→n

⇒ (w_i)_i = (e_i^(n))_i = 1 > 0

∴ U ∈ R^{n×n} is upper triangular with U_ii = (u_i)_i > 0, i = 1→n.

Find P = U⁻¹ = [p₁, p₂, ..., p_n]:

UP = I^(n) = [e₁^(n), e₂^(n), ..., e_n^(n)]

i.e. [Up₁, Up₂, Up₃, ...] = [e₁^(n), e₂^(n), ...]

i.e.

U p_i = e_i^(n), i = 1→n (15)

Solve by backwards substitution:

∴ (p_i)_n = (p_i)_{n−1} = ... = (p_i)_{i+1} = 0

i.e. (p_i)_k = 0 for k > i, i = 1→n

ith row of (15):

U_ii (p_i)_i + ∑_{k=i+1}^n U_ik (p_i)_k [= 0, as (p_i)_k = 0 for k > i] = (e_i^(n))_i = 1

∴ (p_i)_i = 1/U_ii > 0, i = 1→n

∴ P is upper triangular with P_ii = (p_i)_i > 0, i = 1→n.
Proposition

A ∈ R^{n×n} symmetric positive definite ⇒ A_kk > 0, k = 1→n, and

|A_jk| < (A_jj)^{1/2}(A_kk)^{1/2}, j,k = 1→n, j ≠ k

Proof: From the above theorem A = PᵀP, P ∈ R^{n×n}, P⁻¹ exists.

Let P = [p₁, p₂, ..., p_n], p_i ∈ R^n. P⁻¹ exists ⇒ {p_i}_{i=1}^n lin. indep.

A = PᵀP ⇒ A_jk = p_jᵀ p_k, j,k = 1→n

∴ A_kk = p_kᵀ p_k = ‖p_k‖² > 0, k = 1→n, as p_k ≠ 0

|A_jk| = |p_jᵀ p_k| = |〈p_j, p_k〉| < ‖p_j‖ ‖p_k‖, j,k = 1→n, j ≠ k (Cauchy-Schwarz inequality)

It is a strict inequality as {p_i}_{i=1}^n lin. ind.

Using the result ‖p_k‖ = (A_kk)^{1/2}, k = 1→n:

⇒ |A_jk| < (A_jj)^{1/2}(A_kk)^{1/2}, j,k = 1→n, j ≠ k
Compute a Cholesky Decomposition of A.

Let L = Pᵀ: find a lower triangular matrix L ∈ R^{n×n} with L_ii > 0, i = 1→n, such that A = LLᵀ.

Could compute L = Pᵀ, where P = U⁻¹ and U = [u₁ ... u_n] with {e_i^(n)}_{i=1}^n →CGS, 〈.,.〉_A→ {u_i}_{i=1}^n; there is, however, an easier way.

Let L = [l₁ l₂ ... l_n], l_i ∈ R^n (lower triangular and L_ii > 0, i = 1→n).

A = LLᵀ:

A_ij = ∑_{k=1}^n L_ik (Lᵀ)_kj = ∑_{k=1}^n L_ik L_jk = ∑_{k=1}^n (l_k)_i (l_k)_j

Note: l_k l_kᵀ ∈ R^{n×n} with (l_k l_kᵀ)_ij = (l_k)_i (l_k)_j

∴ A_ij = ∑_{k=1}^n (l_k l_kᵀ)_ij ⇒

A = ∑_{k=1}^n l_k l_kᵀ
Example

n = 3. Find the Cholesky Decomposition of

A = ⎛ 2   −1    0 ⎞
    ⎜−1   5/2  −1 ⎟
    ⎝ 0   −1   5/2⎠

i.e. find L ∈ R^{3×3} lower triangular, L_ii > 0, i = 1→3, such that A = LLᵀ.

Is A symmetric positive definite? Clearly Aᵀ = A ✓ and A_kk > 0, k = 1→3 ✓

|A₁₂| = |−1| = 1 < √2 · √(5/2) = √5 = (A₁₁)^{1/2}(A₂₂)^{1/2}, etc.

The above are necessary, not sufficient. Check directly that A is positive definite: for x = (x₁, x₂, x₃)ᵀ ≠ 0,

xᵀAx = ∑_{i=1}^3 ∑_{j=1}^3 A_ij x_i x_j = 2x₁² + (5/2)x₂² + (5/2)x₃² − 2x₁x₂ − 2x₂x₃

Note:

(r + s)² = r² + 2rs + s² ≥ 0 ⇒ rs ≥ −(1/2)(r² + s²) ∀r,s ∈ R (16)

Applying (16) to −2x₁x₂ (with r = −x₁, s = x₂) and to −2x₂x₃ (with r = −x₂, s = x₃):

xᵀAx ≥ 2x₁² + (5/2)x₂² + (5/2)x₃² − (x₁² + x₂²) − (x₂² + x₃²)

= x₁² + (1/2)x₂² + (3/2)x₃² ≥ (1/2) ∑_{i=1}^3 x_i² = (1/2)‖x‖² > 0 ∀x ∈ R³, x ≠ 0
Recap from the exercise: n = 3,

A = ⎛ 2   −1    0 ⎞
    ⎜−1   5/2  −1 ⎟
    ⎝ 0   −1   5/2⎠

symmetric, positive definite ✓

L = [l₁ l₂ l₃] lower triangular:

A = l₁l₁ᵀ + l₂l₂ᵀ + l₃l₃ᵀ (17)

Find L, with

l₁ = (×, ×, ×)ᵀ, l₂ = (0, ×, ×)ᵀ, l₃ = (0, 0, ×)ᵀ

In (17), l₁l₁ᵀ is a full symmetric 3×3 matrix, l₂l₂ᵀ is symmetric with zero first row and column, and l₃l₃ᵀ is zero except for its (3,3) entry.

∴ the first column/row of A is generated by l₁ alone.

Equate the first columns of l₁l₁ᵀ and A:

((l₁)₁(l₁)₁, (l₁)₂(l₁)₁, (l₁)₃(l₁)₁)ᵀ = (A₁₁, A₂₁, A₃₁)ᵀ

⇒ (l₁)_i = A_i1/(l₁)₁, i = 1→3
but [(l₁)₁]² = A₁₁

∴ (l₁)_i = A_i1/√A₁₁, i = 1→3

∴ in this example

l₁ = (1/√2)(2, −1, 0)ᵀ

Let A^(1) = A − l₁l₁ᵀ

= ⎛ 2   −1    0 ⎞     ⎛ 4  −2  0 ⎞
  ⎜−1   5/2  −1 ⎟ − ½ ⎜−2   1  0 ⎟
  ⎝ 0   −1   5/2⎠     ⎝ 0   0  0 ⎠

⇒ A^(1) = ⎛ 0   0    0 ⎞
          ⎜ 0   2   −1 ⎟ = l₂l₂ᵀ + l₃l₃ᵀ by (17)
          ⎝ 0  −1   5/2⎠

The 2nd column/row of A^(1) is generated by l₂:

(l₂)_i = A^(1)_i2/√(A^(1)₂₂) ⇒ l₂ = (1/√2)(0, 2, −1)ᵀ

A^(2) = A^(1) − l₂l₂ᵀ

= ⎛ 0   0    0 ⎞     ⎛ 0   0   0 ⎞   ⎛ 0  0  0 ⎞
  ⎜ 0   2   −1 ⎟ − ½ ⎜ 0   4  −2 ⎟ = ⎜ 0  0  0 ⎟ = l₃l₃ᵀ by (17)
  ⎝ 0  −1   5/2⎠     ⎝ 0  −2   1 ⎠   ⎝ 0  0  2 ⎠

∴ l₃ = (1/√2)(0, 0, 2)ᵀ

∴ L = [l₁ l₂ l₃] = (1/√2) ⎛ 2   0  0 ⎞
                          ⎜−1   2  0 ⎟
                          ⎝ 0  −1  2 ⎠

lower triangular with L_ii > 0, i = 1→3.

Check A = LLᵀ ✓
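The column-peeling procedure above generalises directly to code. A plain-Python sketch (one possible implementation, verified against this 3×3 example):

```python
# Constructive Cholesky: peel off l_k l_k^T one column at a time.
import math

def cholesky(A):
    """Return lower triangular L (list of rows) with A = L L^T, A s.p.d."""
    n = len(A)
    A = [row[:] for row in A]               # working copy, A^(k)
    cols = []
    for k in range(n):
        d = math.sqrt(A[k][k])              # (l_k)_k = sqrt(A^(k)_kk)
        lk = [A[i][k] / d for i in range(n)]  # (l_k)_i = A^(k)_ik / d
        for i in range(n):                  # A^(k+1) = A^(k) - l_k l_k^T
            for j in range(n):
                A[i][j] -= lk[i] * lk[j]
        cols.append(lk)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

A = [[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]]
L = cholesky(A)
# the example gives L = (1/sqrt(2)) [[2,0,0],[-1,2,0],[0,-1,2]]
s = 1 / math.sqrt(2.0)
expected = [[2*s, 0.0, 0.0], [-s, 2*s, 0.0], [0.0, -s, 2*s]]
assert all(abs(L[i][j] - expected[i][j]) < 1e-12 for i in range(3) for j in range(3))
```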
Now consider the above constructive algorithm in the general case, i.e. A ∈ R^{n×n} symmetric positive definite. Since A₁₁ > 0, we can start the algorithm:

l₁ = (1/√A₁₁)(A₁₁, A₂₁, ..., A_n1)ᵀ

Let A^(1) = A − l₁l₁ᵀ ∈ R^{n×n} (symmetric, as A and l₁l₁ᵀ are both symmetric), which has the structure

A^(1) = ⎛ 0  0 ⎞
        ⎝ 0  B ⎠   (zero first row and column), where B ∈ R^{(n−1)×(n−1)} is symmetric

To continue we need

A^(1)₂₂ = B₁₁ > 0

We will now prove that B is positive definite ⇒ B_kk > 0, k = 1→n−1.

To do this, we note that

e₁ = (1, 0, ..., 0)ᵀ ∈ R^n ⇒ Ae₁ = (A₁₁, A₂₁, ..., A_n1)ᵀ, the first column of A

e₁ᵀAe₁ = A₁₁

∴ l₁ = (1/√(e₁ᵀAe₁)) Ae₁

Theorem

B ∈ R^{(n−1)×(n−1)}, as defined above, is positive definite.

Proof: We need to show that uᵀBu > 0 ∀u ∈ R^{n−1}, u ≠ 0.

Given u ∈ R^{n−1}, u ≠ 0, let v = (0, u)ᵀ ∈ R^n, so v ≠ 0.

e₁ᵀv = 0 and e₁, v ≠ 0 ⇒ e₁, v lin. ind.
uᵀBu = vᵀA^(1)v = vᵀ(A − l₁l₁ᵀ)v = vᵀAv − (vᵀl₁)(l₁ᵀv) = vᵀAv − (l₁ᵀv)²

l₁ = (1/√(e₁ᵀAe₁)) Ae₁ ⇒ l₁ᵀ = (1/√(e₁ᵀAe₁)) e₁ᵀA   (as Aᵀ = A)

∴ uᵀBu = vᵀAv − (e₁ᵀAv)²/(e₁ᵀAe₁) = 〈v,v〉_A − [〈e₁,v〉_A]²/〈e₁,e₁〉_A

= (‖v‖²_A ‖e₁‖²_A − [〈e₁,v〉_A]²)/‖e₁‖²_A

Apply the Cauchy-Schwarz inequality:

|〈e₁,v〉_A| < ‖e₁‖_A ‖v‖_A (strict, as e₁, v lin. ind.)

⇒ uᵀBu > 0 ∀u ∈ R^{n−1}, u ≠ 0

∴ B is positive definite, and B = Bᵀ ⇒ B_kk > 0, k = 1→n−1.

∴ A^(1)₂₂ = B₁₁ > 0 ⇒ the Cholesky Decomposition can continue, etc.
Application of the Cholesky Decomposition

Given A ∈ R^{n×n} symmetric positive definite. If we find the Cholesky Decomposition of A, i.e.

A = LLᵀ, L ∈ R^{n×n} lower triangular with L_ii > 0, i = 1→n,

then it is easy to solve Ax = b for a given b ∈ R^n:

Ax = b ⇔ L(Lᵀx) = b, with z = Lᵀx

∴ Lz = b (lower triangular) and Lᵀx = z (upper triangular)

Solve for z by a forward solve:

z₁ = b₁/L₁₁

z_k = (b_k − ∑_{j=1}^{k−1} L_kj z_j)/L_kk, k = 2→n

Solve for x by a back solve:

x_n = z_n/(Lᵀ)_nn = z_n/L_nn

x_k = (z_k − ∑_{j=k+1}^n (Lᵀ)_kj x_j)/(Lᵀ)_kk = (z_k − ∑_{j=k+1}^n L_jk x_j)/L_kk, k = (n−1)→1
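The forward solve / back solve pair above can be sketched in plain Python, working directly with L (so (Lᵀ)_kj is read as L_jk):

```python
def forward_solve(L, b):
    """Solve L z = b, L lower triangular with nonzero diagonal."""
    n = len(L)
    z = [0.0] * n
    for k in range(n):
        z[k] = (b[k] - sum(L[k][j] * z[j] for j in range(k))) / L[k][k]
    return z

def back_solve_LT(L, z):
    """Solve L^T x = z, using (L^T)_kj = L_jk."""
    n = len(L)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (z[k] - sum(L[j][k] * x[j] for j in range(k + 1, n))) / L[k][k]
    return x

# e.g. L = [[2, 0], [1, 3]], so A = L L^T = [[4, 2], [2, 10]]; b = A(1,1)^T = (6, 12)^T
z = forward_solve([[2.0, 0.0], [1.0, 3.0]], [6.0, 12.0])
x = back_solve_LT([[2.0, 0.0], [1.0, 3.0]], z)
assert x == [1.0, 1.0]
```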
1.7 Least Squares Problems
Give A ∈ Rm×n (m ≥ n) , b ∈ RmFind x ∈ Rn such that
Ax = b(m equationsn unknowns
)If m > n, then generally there is no solution x to Ax = b.So find an approximate solution in some sense.Find x∗ ∈ Rn such that Ax∗ − b is “small”. Make this precise.
Example
Pendulum
see diagram 20081113.M2AA3.1
length l, period T √g
lT = 2π
estimate g (acceleration due to gravity) from the above relationship.
46
Let L =√l and c = 2π√
g ⇒√
Lc =√
T
Do m experiments.Lm×1
c1×1
= Tm×1
L =
L1...Lm
T =
T1...Tm
see diagram 20081113.M2AA3.2
Fit a straight line through the origin to the dots
Choose c ∈ R to minimise the sum of the squares of the errors
minc∈R
m∑i=1
(Ti − cLi)2 = minc∈R‖T− cL‖2
Let
\[
S(c) = \|\mathbf{T} - c\mathbf{L}\|^2 = \langle \mathbf{T} - c\mathbf{L},\ \mathbf{T} - c\mathbf{L}\rangle
= \langle \mathbf{T},\mathbf{T}\rangle - 2c\langle \mathbf{L},\mathbf{T}\rangle + c^2\langle \mathbf{L},\mathbf{L}\rangle
\]
\[
\therefore\ S(c) = \|\mathbf{T}\|^2 - 2c\langle \mathbf{L},\mathbf{T}\rangle + c^2\|\mathbf{L}\|^2
\]
\[
\frac{dS}{dc}(c) = -2\langle \mathbf{L},\mathbf{T}\rangle + 2c\|\mathbf{L}\|^2,
\qquad
\frac{d^2S}{dc^2}(c) = 2\|\mathbf{L}\|^2 > 0
\]
\[
\frac{dS}{dc}(c^*) = 0 \iff c^* = \frac{\langle \mathbf{L},\mathbf{T}\rangle}{\|\mathbf{L}\|^2}
\]
\[
\therefore\ S(c^*) \leq S(c) \quad \forall c \in \mathbb{R}
\]
$c^*$ is the global minimum of $S(c)$.
Note: $c^* \in \mathbb{R}$ is such that
\[
-\langle \mathbf{L},\mathbf{T}\rangle + c^*\langle \mathbf{L},\mathbf{L}\rangle = 0
\Rightarrow \langle \mathbf{T} - c^*\mathbf{L},\ \mathbf{L}\rangle = 0
\]
see diagram 20081113.M2AA3.3
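The formula $c^* = \langle\mathbf{L},\mathbf{T}\rangle/\|\mathbf{L}\|^2$ can be checked numerically. A sketch with synthetic data (the periods below are generated from the model itself with $g = 9.81$, so the fit recovers $g$ exactly):

```python
import numpy as np

l = np.array([0.25, 0.5, 0.75, 1.0])       # lengths (m), illustrative
T = 2 * np.pi * np.sqrt(l / 9.81)          # "measured" periods (s)

L = np.sqrt(l)                             # model: T = c L with c = 2*pi/sqrt(g)
c_star = (L @ T) / (L @ L)                 # c* = <L,T> / ||L||^2
g_est = (2 * np.pi / c_star) ** 2          # invert c = 2*pi/sqrt(g)
```

With noisy measurements, $c^*$ would instead be the least squares slope of the line through the origin.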
Generalise to
\[
\underset{m\times n}{A}\ \underset{n\times 1}{x} = \underset{m\times 1}{b}
\]
For $m > n$ there is generally no solution $x$, as we have an overdetermined system. Find $x^* \in \mathbb{R}^n$ such that
\[
\|Ax^* - b\| \leq \|Ax - b\| \quad \forall x \in \mathbb{R}^n,
\]
i.e.
\[
\min_{x\in\mathbb{R}^n} \|Ax - b\| \equiv \min_{x\in\mathbb{R}^n} \|Ax - b\|^2
\]
Let $Q(x) = \|Ax - b\|^2$, $Q : \mathbb{R}^n \to \mathbb{R}$; minimise $Q(x)$. (Q1, Sheet 2.)
Recall: given $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ ($m \geq n$). For $m > n$ there is, in general, no solution $x \in \mathbb{R}^n$ to $Ax = b$. Find an approximate solution $x^* \in \mathbb{R}^n$ such that
\[
Q(x^*) \leq Q(x) = \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n, \qquad Q : \mathbb{R}^n \to \mathbb{R}
\]
\[
Q(x) = \|Ax - b\|^2 = \langle Ax - b,\ Ax - b\rangle = (Ax - b)^T (Ax - b)
= x^T A^T A x - b^T A x - x^T A^T b + b^T b
\]
\[
= x^T A^T A x - 2 b^T A x + b^T b \quad \text{since } x^T A^T b = \left(x^T A^T b\right)^T = b^T A x \in \mathbb{R}
\]
\[
= x^T G x - 2 \mu^T x + \|b\|^2
\]
where $G = A^T A \in \mathbb{R}^{n\times n}$ and $\mu = A^T b \in \mathbb{R}^n$.

Note: $G^T = \left(A^T A\right)^T = A^T A = G$, so $G \in \mathbb{R}^{n\times n}$ is symmetric, and $\mu = A^T b \Rightarrow \mu^T = b^T A$.

Recall Q1 on Sheet 2:
\[
\nabla Q(x) = 2(Gx - \mu), \qquad D^2 Q(x) = 2G
\]
Theorem
Let $A \in \mathbb{R}^{m\times n}$ ($m \geq n$) have $n$ linearly independent columns, and let $b \in \mathbb{R}^m$. Then $A^T A \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Moreover, $\exists$ a unique $x^* \in \mathbb{R}^n$ such that $A^T A x^* = A^T b$ [the Normal Equations of $Ax = b$], and $x^*$ is the global minimum of $Q(x) = \|Ax - b\|^2$. $x^*$ is called the least squares solution of $Ax = b$.

Proof: $A^T A$ is symmetric — see above.
\[
A = [a_1\ a_2\ \ldots\ a_n], \quad a_i \in \mathbb{R}^m \text{ lin. ind.}
\]
\[
c^T A^T A c = (Ac)^T Ac = \|Ac\|^2 \geq 0,
\]
with equality if and only if $Ac = 0$, i.e. $\sum_{i=1}^n c_i a_i = 0$; $\{a_i\}_{i=1}^n$ lin. ind. $\Rightarrow c = 0$.
\[
\therefore\ c^T A^T A c > 0 \quad \forall c \in \mathbb{R}^n,\ c \neq 0
\]
$\therefore A^T A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, so $\left(A^T A\right)^{-1}$ exists and $\exists$ a unique $x^* \in \mathbb{R}^n$ solving $A^T A x^* = A^T b$.

Now show $x^*$ is the global minimum of $Q(x)$:
\[
Q(x) = \|Ax - b\|^2 = x^T A^T A x - 2\left(A^T b\right)^T x + \|b\|^2
\]
\[
\Rightarrow \nabla Q(x) = 2\left(A^T A x - A^T b\right), \qquad D^2 Q(x) = 2 A^T A
\]
For $x^* \in \mathbb{R}^n$ a local minimum of $Q(x)$, we require $\nabla Q(x^*) = 0$ and $D^2 Q(x^*)$ symmetric positive definite. Now $\nabla Q(x^*) = 0 \iff A^T A x^* = A^T b$, so $\exists!\ x^*$ with $\nabla Q(x^*) = 0$, and $D^2 Q(x^*) = 2A^T A$ is s.p.d. (symmetric positive definite) $\Rightarrow x^*$ is the global minimum of $Q(x)$.
Ex: $m = 3$, $n = 2$,
\[
A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]
It is easy to show that no $x \in \mathbb{R}^2$ solves $Ax = b$, so compute the least squares solution $x^* \in \mathbb{R}^2$ such that $A^T A x^* = A^T b$:
\[
A^T A = \begin{pmatrix} 3 & 4 & 12 \\ 65 & 0 & 13 \end{pmatrix}
\begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}
= \begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix},
\qquad
A^T b = \begin{pmatrix} 19 \\ 78 \end{pmatrix}
\]
\[
\begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix} x^* = \begin{pmatrix} 19 \\ 78 \end{pmatrix}
\Rightarrow
x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix}
\]
Note $Ax^* - b \neq 0$.
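The worked example above can be reproduced directly from the normal equations (a NumPy sketch):

```python
import numpy as np

# The worked example from the notes: m = 3, n = 2.
A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.array([1.0, 1.0, 1.0])

G = A.T @ A                      # Gram matrix A^T A = [[169, 351], [351, 4394]]
mu = A.T @ b                     # A^T b = [19, 78]
x_star = np.linalg.solve(G, mu)  # least squares solution via normal equations
residual = A @ x_star - b        # nonzero: Ax* != b
```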
In practice, it is not a good idea to solve the normal equations ($A^T A x^* = A^T b$) since $A^T A$ is generally ill-conditioned. A matrix $B \in \mathbb{R}^{n\times n}$ is ill-conditioned if small changes to the right-hand side of the system $Bx = b$ lead to large changes in the solution — unacceptable errors on a computer. i.e.
Bx = b, B (x + δx) = b + δb
B ill-conditioned: “small” δb⇒”large” δx
Alternative procedure to find x∗
In practice, find $x^*$ using the QR approach. Take a sequence of Givens rotations
\[
G = G_{mn} \cdots G_{13} G_{12} \quad \text{such that} \quad
\underset{m\times m}{G}\ \underset{m\times n}{A} = \underset{m\times n}{R},
\]
upper triangular with $r_{ii} > 0$, $i = 1 \to n$. Each $G_{pq}$ ($p < q$), $G_{pq} \in \mathbb{R}^{m\times m}$, is orthogonal:
\[
G_{pq}^T G_{pq} = I^{(m)} = G_{pq} G_{pq}^T
\]
Apply $G$ to the original system:
\[
Ax = b \Rightarrow GAx = Gb \Rightarrow Rx = Gb \in \mathbb{R}^m
\]
\[
Gb = \begin{pmatrix} (Gb)_1 \\ \vdots \\ (Gb)_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ (Gb)_{n+1} \\ \vdots \\ (Gb)_m \end{pmatrix}
= \underline{\alpha} + \underline{\beta},
\qquad \underline{\alpha}, \underline{\beta} \in \mathbb{R}^m, \quad \langle \underline{\alpha}, \underline{\beta}\rangle = 0
\]
Recall the structure of $R$: the first $n$ rows form an upper triangular $n \times n$ block (with $r_{ii} > 0$), and the remaining $m - n$ rows are zero.
\[
Ax = b \Rightarrow Rx = \underline{\alpha} + \underline{\beta}
\]
If $\underline{\beta} = 0$, $\exists$ a unique solution $x \in \mathbb{R}^n$ to $Rx = \underline{\alpha} = Gb$, and hence a unique solution $x \in \mathbb{R}^n$ to the original system $Ax = b$.

If $\underline{\beta} \neq 0$, $Rx = Gb = \underline{\alpha} + \underline{\beta}$ is an inconsistent system, so there is no solution $x$, and hence no solution $x$ to $Ax = b$. In this case, solve the consistent system
\[
Rx^* = \underline{\alpha};
\]
$\exists!\ x^*$ solving this — find it by a back solve.
Claim: $x^* \in \mathbb{R}^n$ is the least squares solution to $Ax = b$, i.e.
\[
\|Ax^* - b\|^2 \leq \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n
\]
Orthogonal matrices preserve length: $\|Gy\| = \|y\|$ $\forall y \in \mathbb{R}^m$.
\[
\therefore\ \|Ax - b\|^2 = \|G(Ax - b)\|^2 = \left\| Rx - (\underline{\alpha} + \underline{\beta}) \right\|^2
= \left\langle (Rx - \underline{\alpha}) - \underline{\beta},\ (Rx - \underline{\alpha}) - \underline{\beta} \right\rangle
\]
\[
\therefore\ \|Ax - b\|^2 = \|Rx - \underline{\alpha}\|^2 + \|\underline{\beta}\|^2
- 2\underbrace{\left\langle Rx - \underline{\alpha},\ \underline{\beta}\right\rangle}_{=0}
\quad \text{since } \langle Rx, \underline{\beta}\rangle = \langle \underline{\alpha}, \underline{\beta}\rangle = 0 \ \ \forall x \in \mathbb{R}^n
\]
\[
\therefore\ \min_{x\in\mathbb{R}^n} \|Ax - b\|^2 = \min_{x\in\mathbb{R}^n} \|Rx - \underline{\alpha}\|^2 + \|\underline{\beta}\|^2
\]
\[
Rx^* = \underline{\alpha} \Rightarrow \|Rx^* - \underline{\alpha}\| = 0
\]
\[
\therefore\ x^* \text{ is such that } \|\underline{\beta}\|^2 = \|Ax^* - b\|^2 \leq \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n
\]
Example
Use the QR approach on the example above: $m = 3$, $n = 2$,
\[
A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]
\[
G = G_{23}(\psi)\, G_{13}(\phi)\, G_{12}(\theta) \quad \text{(recall from the QR notes)}
\]
\[
GA = R = \begin{pmatrix} 13 & 27 \\ 0 & (3665)^{\frac12} \\ 0 & 0 \end{pmatrix},
\qquad
Gb = \begin{pmatrix} \frac{19}{13} \\[2pt] \frac{501}{13(3665)^{\frac12}} \\[2pt] \frac{41}{(3665)^{\frac12}} \end{pmatrix}
= \begin{pmatrix} 1.46154\ldots \\ 0.63659\ldots \\ 0.67725\ldots \end{pmatrix}
\]
\[
\Rightarrow \underline{\alpha} = \begin{pmatrix} 1.46154\ldots \\ 0.63659\ldots \\ 0 \end{pmatrix},
\qquad
\underline{\beta} = \begin{pmatrix} 0 \\ 0 \\ 0.67725\ldots \end{pmatrix}
\]
Solve $Rx^* = \underline{\alpha}$:
\[
x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix},
\qquad
\|Ax^* - b\|^2 = \|\underline{\beta}\|^2 = (0.67725\ldots)^2
\]
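The same numbers come out of a library QR factorisation (NumPy uses Householder reflections rather than Givens rotations, but the resulting $R$ and residual agree up to signs — a sketch):

```python
import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.array([1.0, 1.0, 1.0])

Q, R = np.linalg.qr(A)                 # reduced QR: A = Q R, Q is 3x2, R is 2x2
# Sign conventions may differ from the Givens construction; fix R_ii > 0.
s = np.sign(np.diag(R))
Q, R = Q * s, (R.T * s).T

alpha = Q.T @ b                        # the first n components of Gb
x_star = np.linalg.solve(R, alpha)     # back solve R x* = alpha
residual = np.linalg.norm(A @ x_star - b)
```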
2 Least Squares Problems (A more abstract approach)
A more abstract definition of an inner product
Definition:
Let $V$ be a real vector space. An inner product on $V$ is a function $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ such that

(i) $\langle \lambda u + \mu v,\ w\rangle = \lambda \langle u, w\rangle + \mu \langle v, w\rangle$

(ii) $\langle u, v\rangle = \langle v, u\rangle$

((i)+(ii) $\Rightarrow \langle w,\ \lambda u + \mu v\rangle = \lambda \langle w, u\rangle + \mu \langle w, v\rangle$)

(iii) $\langle u, u\rangle \geq 0$ with equality if and only if $u = 0 \in V$
An inner product induces a norm
\[
\|u\| = [\langle u, u\rangle]^{\frac12} \quad \forall u \in V
\]
⇒ ‖u‖ = 0 if and only if u = 0
Example: $V = C[a,b]$, continuous functions on the closed interval $[a,b]$. Let $w \in C[a,b]$ with $w(x) > 0$ $\forall x \in [a,b]$ — $w$ is the weight function. Define, for two continuous functions $f, g$ on $[a,b]$,
\[
\langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx \quad \forall f, g \in C[a,b]
\]
Clearly $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ and (i), (ii) clearly hold. For (iii):
\[
\langle f, f\rangle = \int_a^b w(x)\, [f(x)]^2\, dx \geq 0 \quad \forall f \in C[a,b],
\]
with equality if and only if $f \equiv 0$ (the zero function).
Cauchy-Schwarz Inequality
|〈u, v〉| ≤ ‖u‖ ‖v‖ ∀u, v ∈ V
with strict inequality if and only if u, v are linearly independent.
Proof: same as before.
Abstract Form of the Least Squares Problem
Let $V$ be a real vector space with inner product $\langle\cdot,\cdot\rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\phi_i\}_{i=1}^n$; by basis we mean the $\phi_i$ are linearly independent and span the subspace $U$.

Given $v \in V$, find $u^* \in U$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U
\]

Example

\[
V = C[a,b], \qquad \langle f, g\rangle = \int_a^b f(x)\, g(x)\, dx \quad (\text{i.e. } w(x) \equiv 1)
\]
$U = P_{n-1}$ (polynomials of degree $\leq n-1$), basis $\phi_i = x^{i-1}$, $i = 1 \to n$.
\[
\|v - u^*\|^2 \leq \|v - u\|^2 = \int_a^b [v(x) - u(x)]^2\, dx
\]
Return to the general case
\[
u \in U \iff u = \sum_{i=1}^n \lambda_i \phi_i \ \text{ for some } \lambda = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} \in \mathbb{R}^n
\]
\[
u^* \in U \iff u^* = \sum_{i=1}^n \lambda_i^* \phi_i \iff \lambda^* \in \mathbb{R}^n
\]
Finding $u^* \in U$ such that $\|v - u^*\|^2 \leq \|v - u\|^2$ $\forall u \in U$ is equivalent to finding $\lambda^* \in \mathbb{R}^n$ such that
\[
\left\| v - \sum_{i=1}^n \lambda_i^* \phi_i \right\|^2 \leq \left\| v - \sum_{i=1}^n \lambda_i \phi_i \right\|^2 \quad \forall \lambda \in \mathbb{R}^n
\]
Let
\[
E(\lambda) = \left\| v - \sum_{i=1}^n \lambda_i \phi_i \right\|^2, \qquad E : \mathbb{R}^n \to \mathbb{R}_{\geq 0}
\]
Find $\lambda^* \in \mathbb{R}^n$ such that $E(\lambda^*) \leq E(\lambda)$ $\forall \lambda \in \mathbb{R}^n$.
\[
E(\lambda) = \left\langle v - \sum_{i=1}^n \lambda_i \phi_i,\ v - \sum_{j=1}^n \lambda_j \phi_j \right\rangle \quad (i, j \text{ dummy indices})
\]
\[
= \langle v, v\rangle - \sum_{i=1}^n \lambda_i \langle \phi_i, v\rangle - \sum_{j=1}^n \lambda_j \langle v, \phi_j\rangle + \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \langle \phi_i, \phi_j\rangle
\]
(the two middle sums are the same). Let $\mu \in \mathbb{R}^n$ with $\mu_i = \langle v, \phi_i\rangle$, $i = 1 \to n$, and $G \in \mathbb{R}^{n\times n}$ with $G_{ij} = \langle \phi_i, \phi_j\rangle$, $i, j = 1 \to n$.
\[
\therefore\ E(\lambda) = \|v\|^2 - 2\mu^T \lambda + \lambda^T G \lambda
\]
\[
\Longrightarrow \nabla E(\lambda) = 2(G\lambda - \mu), \qquad D^2 E(\lambda) = 2G
\]
$E(\lambda^*)$ is a local minimum of $E(\lambda)$ if $\nabla E(\lambda^*) = 0$ and $G$ is positive definite.
\[
\nabla E(\lambda^*) = 0 \iff G\lambda^* = \mu \quad \text{(normal equations)}
\]
$G$ is called the Gram matrix; it depends on the basis $\{\phi_i\}_{i=1}^n$ for $U$. [Sometimes written $G(\phi_1\ \phi_2\ \ldots\ \phi_n)$.]

Lemma: $\{\phi_i\}_{i=1}^n$ a basis for $U$ (so $\{\phi_i\}_{i=1}^n$ lin. ind.) $\Rightarrow G$ is symmetric positive definite.

Proof: $G \in \mathbb{R}^{n\times n}$, $G_{ij} = \langle \phi_i, \phi_j\rangle$, $i, j = 1 \to n$.
\[
G_{ji} = \langle \phi_j, \phi_i\rangle \overset{(ii)}{=} \langle \phi_i, \phi_j\rangle = G_{ij} \quad i, j = 1 \to n
\]
\[
\lambda^T G \lambda = \sum_{i=1}^n \sum_{j=1}^n \underbrace{G_{ij}}_{\langle \phi_i, \phi_j\rangle} \lambda_i \lambda_j
\overset{(i),(ii)}{=} \left\langle \sum_{i=1}^n \lambda_i \phi_i,\ \sum_{j=1}^n \lambda_j \phi_j \right\rangle
= \left\| \sum_{i=1}^n \lambda_i \phi_i \right\|^2 \geq 0,
\]
with equality if and only if $\sum_{i=1}^n \lambda_i \phi_i = 0 \in V \iff \lambda = 0$, as $\{\phi_i\}_{i=1}^n$ are linearly independent.
\[
\therefore\ \lambda^T G \lambda > 0 \quad \forall \lambda \in \mathbb{R}^n,\ \lambda \neq 0 \Longrightarrow G \text{ is symmetric positive definite.}
\]
$G$ positive definite $\Rightarrow G^{-1}$ exists $\Rightarrow \exists!\ \lambda^* \in \mathbb{R}^n$ solving $G\lambda^* = \mu$ (normal equations) $\Rightarrow \nabla E(\lambda^*) = 0$ (and $\lambda^*$ is unique — there are no other stationary points). $D^2 E(\lambda^*) = 2G$ is symmetric positive definite, so $\lambda^* \in \mathbb{R}^n$ solving the normal equations is the global minimum of $E(\lambda)$, and $u^* = \sum_{i=1}^n \lambda_i^* \phi_i$.
Recall: $V$ a real vector space with inner product $\langle\cdot,\cdot\rangle$; $U$ a finite dimensional subspace with basis $\{\phi_i\}_{i=1}^n$. Given $v \in V$, find $u^* \in U$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U
\]
Then $u^* = \sum_{i=1}^n \lambda_i^* \phi_i$, where $\lambda^* \in \mathbb{R}^n$ is the unique solution of
\[
G\lambda^* = \mu \quad \text{(Normal Equations)}
\]
$G \in \mathbb{R}^{n\times n}$ is the GRAM MATRIX (it depends on the basis for $U$),
\[
G_{ij} = \langle \phi_i, \phi_j\rangle \quad i, j = 1 \to n \quad \text{(symmetric positive definite)},
\qquad
\mu \in \mathbb{R}^n, \quad \mu_i = \langle v, \phi_i\rangle \quad i = 1 \to n
\]

Theorem (orthogonality property):
\[
\langle \underbrace{v - u^*}_{\text{error}},\ u\rangle = 0 \quad \forall u \in U
\]
see diagram 20081124.M2AA3.1
Proof:
\[
G\lambda^* = \mu \Rightarrow \lambda^T G \lambda^* = \lambda^T \mu \quad \forall \lambda \in \mathbb{R}^n
\]
The implication goes the other way as well: suppose $\lambda^T G \lambda^* = \lambda^T \mu$ $\forall \lambda \in \mathbb{R}^n$, and let $\lambda = e_i$ (the $i$th standard basis vector, with a 1 in the $i$th position and zeros elsewhere); then $(G\lambda^*)_i = \mu_i$. Repeating for $i = 1 \to n$ gives $G\lambda^* = \mu$.
\[
\therefore\ G\lambda^* = \mu \iff \lambda^T G \lambda^* = \lambda^T \mu \quad \forall \lambda \in \mathbb{R}^n
\]
\[
\iff \sum_{i=1}^n \sum_{j=1}^n \underbrace{G_{ij}}_{\langle \phi_i, \phi_j\rangle} \lambda_i \lambda_j^* = \sum_{i=1}^n \lambda_i \underbrace{\mu_i}_{\langle v, \phi_i\rangle}
\]
\[
\iff \left\langle \underbrace{\sum_{i=1}^n \lambda_i \phi_i}_{u \in U},\ \underbrace{\sum_{j=1}^n \lambda_j^* \phi_j}_{u^*} \right\rangle
= \left\langle v,\ \underbrace{\sum_{i=1}^n \lambda_i \phi_i}_{u \in U} \right\rangle \quad \forall \lambda \in \mathbb{R}^n
\]
\[
\iff \langle v - u^*,\ u\rangle = 0 \quad \forall u \in U
\]
Example
1. $V = C[0,1]$, $\langle f, g\rangle = \int_0^1 f(x)\, g(x)\, dx$; $U = P_{n-1}$, basis $\{x^{i-1}\}_{i=1}^n$, i.e. $\phi_i = x^{i-1}$.
\[
u \in P_{n-1} \iff u(x) = \sum_{i=1}^n \lambda_i x^{i-1}
\]
Given $v \in C[0,1]$, find $u^*(x) = \sum_{i=1}^n \lambda_i^* x^{i-1}$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in P_{n-1}
\iff \int_0^1 (v - u^*)^2\, dx \leq \int_0^1 (v - u)^2\, dx
\]
Find $\lambda^*$ by solving the normal equations $G\lambda^* = \mu$:
\[
\mu_i = \langle v, \phi_i\rangle = \int_0^1 v(x)\, x^{i-1}\, dx \quad i = 1 \to n
\]
\[
G_{ij} = \langle \phi_i, \phi_j\rangle = \int_0^1 x^{i-1} x^{j-1}\, dx = \int_0^1 x^{i+j-2}\, dx = \frac{1}{i+j-1} \quad i, j = 1 \to n
\]
\[
\Longrightarrow G = \begin{pmatrix}
1 & \frac12 & \ldots & \frac1n \\
\frac12 & \frac13 & \ldots & \frac{1}{n+1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac1n & \frac{1}{n+1} & \ldots & \frac{1}{2n-1}
\end{pmatrix},
\]
the $n \times n$ Hilbert matrix — badly conditioned: the columns approach linear dependence as $n \to \infty$.
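The ill-conditioning of the Hilbert matrix is easy to see numerically (a sketch; the condition number grows very fast with $n$):

```python
import numpy as np

def hilbert(n):
    """The n x n Hilbert matrix G_ij = 1/(i+j-1) (1-based i, j)."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

# With the monomial basis {1, x, ..., x^{n-1}}, the Gram matrix becomes
# nearly singular even for modest n.
conds = [np.linalg.cond(hilbert(n)) for n in (2, 4, 8)]
```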
2. $V = \mathbb{R}^m$, $\langle a, b\rangle = a^T b$ $\forall a, b \in \mathbb{R}^m$; $U = \mathrm{span}\{a_i\}_{i=1}^n$ where $n \leq m$ and the $a_i$ are linearly independent, i.e. $\phi_i = a_i \in \mathbb{R}^m$. Given $v \in \mathbb{R}^m$, find $u^* = \sum_{i=1}^n \lambda_i^* a_i$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U \tag{18}
\]
Let $A = [a_1\ a_2\ \ldots\ a_n] \in \mathbb{R}^{m\times n}$, so $A\lambda^* = \sum_{i=1}^n \lambda_i^* a_i$, and
\[
(18) \iff \|v - A\lambda^*\| \leq \|v - A\lambda\| \quad \forall \lambda \in \mathbb{R}^n
\]
Find $\lambda^*$ by solving the normal equations $G\lambda^* = \mu$:
\[
\mu \in \mathbb{R}^n, \quad \mu_i = \langle v, \phi_i\rangle = \langle v, a_i\rangle = a_i^T v \quad i = 1 \to n
\]
\[
G \in \mathbb{R}^{n\times n}, \quad G_{ij} = \langle \phi_i, \phi_j\rangle = \langle a_i, a_j\rangle = a_i^T a_j \quad i, j = 1 \to n
\]
With
\[
A^T = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix} \in \mathbb{R}^{n\times m},
\qquad
\left(A^T A\right)_{ij} = a_i^T a_j \Rightarrow G = A^T A,
\]
\[
\left(A^T v\right)_i = a_i^T v \quad i = 1 \to n \quad \therefore\ \mu = A^T v
\]
\[
\therefore\ G\lambda^* = \mu \Rightarrow A^T A \lambda^* = A^T v \quad \text{— the Normal Equations for } A\lambda = v
\]
Change basis

1. $\{x^{i-1}\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthonormal:
\[
G_{ij} = \langle \psi_i, \psi_j\rangle = \delta_{ij} \quad i, j = 1 \to n \Longrightarrow G \equiv I \Rightarrow \lambda^* = \mu,
\quad \text{where } \mu_i = \langle v, \psi_i\rangle, \quad i = 1 \to n.
\]

2. $\{x^{i-1}\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthogonal:
\[
G_{ij} = \langle \psi_i, \psi_j\rangle = 0 \quad i \neq j, \qquad G_{ii} = \|\psi_i\|^2 > 0 \quad i = 1 \to n
\]
$\Rightarrow G$ is a diagonal matrix, so
\[
G\lambda^* = \mu \Rightarrow \lambda_i^* = \frac{\mu_i}{\|\psi_i\|^2} \quad i = 1 \to n
\qquad
\therefore\ u^* = \sum_{i=1}^n \frac{\langle v, \psi_i\rangle}{\|\psi_i\|^2}\, \psi_i
\]
It is very easy to construct this orthogonal basis.
3 Orthogonal Polynomials
\[
V = C[a,b], \qquad \langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx
\]
Weight function $w \in C(a,b)$ with $w(x) > 0$ $\forall x \in (a,b)$ (possibly $w(x) \to \infty$ as $x \to a$ or $x \to b$).
see diagram 20081126.M2AA3.1
We require the integral to be well-defined:
\[
|\langle f, g\rangle| = \left| \int_a^b w(x)\, f(x)\, g(x)\, dx \right|
\leq \int_a^b |w(x)\, f(x)\, g(x)|\, dx
= \int_a^b w(x)\, |f(x)|\, |g(x)|\, dx
\leq \int_a^b w(x)\, dx\ \underbrace{\max_{a\leq x\leq b} [|f(x)|\, |g(x)|]}_{<\infty \text{ as } f, g \in C[a,b]}
\]
\[
\therefore \text{ require } \int_a^b w(x)\, dx < \infty
\]
Ex
$[a,b] = [0,1]$, $w(x) = x^{-\alpha}$, $\alpha > 0$
see diagram 20081126.M2AA3.2
$w \in C(0,1)$, and
\[
\int_0^1 x^{-\alpha}\, dx = \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_0^1 < \infty \quad \text{if } \alpha < 1.
\]
Note, for $\alpha = 1$:
\[
\int_0^1 x^{-1}\, dx = [\ln x]_0^1 = \infty \quad \text{(divergent)}.
\]
$U = P_n$, polynomials of degree $\leq n$. The canonical basis $\{x^i\}_{i=0}^n$ gives an ill-conditioned Gram matrix, so construct a new basis $\{\phi_i(x)\}_{i=0}^n$ for $P_n$, where $\phi_j(x)$ is a monic polynomial of degree $j$ and the basis is orthogonal: $\langle \phi_i, \phi_j\rangle = 0$, $i \neq j$.
\[
\phi_j(x) = x^j + \sum_{i=0}^{j-1} a_{ji} x^i \quad \text{(monic: leading coefficient 1)}
\]
$\therefore\ \phi_0(x) = 1$, $\phi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ is chosen so that $\langle \phi_0, \phi_1\rangle = 0$.
Theorem

Monic orthogonal polynomials $\phi_j \in P_j$ satisfy the three term recurrence relation
\[
\phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad \text{for } j \geq 1,
\]
where
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2} \quad \text{and} \quad b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}, \quad \text{also for } j \geq 1.
\]
Proof: $\phi_j(x) \in P_j$ monic $\Rightarrow \phi_{j+1}(x) - x\phi_j(x) \in P_j$ (the leading $x^{j+1}$ terms cancel), so
\[
\phi_{j+1}(x) - x\phi_j(x) = \sum_{k=0}^{j} c_k \phi_k(x)
\]
Find $c_k$, $k = 0 \to j$: take the inner product with $\phi_i(x)$, $i = 0 \to j$,
\[
\left\langle \sum_{k=0}^{j} c_k \phi_k(x),\ \phi_i(x) \right\rangle = \langle \phi_{j+1}(x) - x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0 \to j
\]
\[
c_i \|\phi_i(x)\|^2 = -\langle x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0 \to j \tag{19}
\]
since $\{\phi_j\}$ are orthogonal (and $\phi_{j+1} \perp P_j$). Now
\[
\langle x\phi_j(x), \phi_i(x)\rangle = \int_a^b w(x)\, x\, \phi_j(x)\, \phi_i(x)\, dx
= \left\langle \phi_j(x),\ \underbrace{x\phi_i(x)}_{\in P_{i+1}} \right\rangle
\]
$\phi_j(x)$ is orthogonal to $\{\phi_k\}_{k=0}^{j-1}$, hence orthogonal to $P_{j-1}$ (degree $\leq j-1$). So if $i + 1 \leq j - 1$, i.e. $i \leq j - 2$, then $\langle x\phi_j(x), \phi_i(x)\rangle = 0$.
\[
\therefore\ (19) \Rightarrow c_i = 0 \ \text{ if } i \leq j - 2
\qquad
\therefore\ \phi_{j+1}(x) - x\phi_j(x) = c_{j-1}\phi_{j-1}(x) + c_j\phi_j(x),
\]
all other coefficients being 0, where
\[
c_{j-1} = \frac{-\langle x\phi_j(x), \phi_{j-1}(x)\rangle}{\|\phi_{j-1}(x)\|^2},
\qquad
c_j = \frac{-\langle x\phi_j(x), \phi_j(x)\rangle}{\|\phi_j(x)\|^2} = \frac{-\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}
\]
For $c_{j-1}$, note ($x\phi_{j-1} - \phi_j \in P_{j-1}$, both being monic of degree $j$):
\[
\langle \phi_j,\ x\phi_{j-1}\rangle = \underbrace{\left\langle \phi_j,\ \overbrace{x\phi_{j-1} - \phi_j}^{\in P_{j-1}} \right\rangle}_{=0} + \langle \phi_j, \phi_j\rangle = \|\phi_j\|^2
\]
\[
\therefore\ c_{j-1} = \frac{-\|\phi_j\|^2}{\|\phi_{j-1}\|^2}
\]
Let $a_j = -c_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}$ and $b_j = -c_{j-1} = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}$; then
\[
\phi_{j+1}(x) - x\phi_j(x) = -a_j\phi_j(x) - b_j\phi_{j-1}(x)
\]
\[
\Rightarrow \phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad j \geq 1,
\]
where
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}, \qquad b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}, \quad j \geq 1. \qquad \checkmark \tag{20}
\]
For the start of the recurrence: $\phi_0(x) = 1$, $\phi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ is such that $\langle \phi_1, \phi_0\rangle = 0$, i.e. $\langle x - a_0 \cdot 1,\ 1\rangle = 0$, so $a_0\langle 1,1\rangle = \langle x,1\rangle$ and
\[
a_0 = \frac{\langle x, 1\rangle}{\|1\|^2} = \frac{\langle x\phi_0, \phi_0\rangle}{\|\phi_0\|^2}
\]
$\therefore$ (20) extends to $j \geq 0$:
\[
\phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad j \geq 0,
\quad \text{with } \phi_0(x) = 1,\ \phi_{-1}(x) = 0,
\]
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2} \quad j \geq 0, \qquad
b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2} \quad j \geq 1 \tag{21}
\]
Recall: $g(x)$ is even if $g(-x) = g(x)$ $\forall x$, in which case $\int_{-a}^{a} g(x)\, dx = 2\int_0^a g(x)\, dx$.
see diagram 20081127.M2AA3.1
g (x) is odd if g (−x) = −g (x) ∀x⇒∫ 2−2 g (x) dx = 0
see diagram 20081127.M2AA3.2
Ex
\[
\langle f, g\rangle = \int_{-1}^{1} f(x)\, g(x)\, dx,
\]
i.e. $[a,b] = [-1,1]$, $w(x) = 1$ $\forall x \in [-1,1]$. Find the monic orthogonal polynomials with respect to this inner product. Apply (21):
\[
\phi_0(x) = 1, \qquad \phi_1(x) = x - a_0, \qquad
a_0 = \frac{\langle x\phi_0, \phi_0\rangle}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x\, dx}{\|\phi_0\|^2} = 0 \ \text{ as } x \text{ is odd}
\]
\[
\Rightarrow \phi_1(x) = x
\]
\[
\phi_2(x) = (x - a_1)\,\phi_1(x) - b_1\phi_0(x) = (x - a_1)x - b_1
\]
\[
a_1 = \frac{\langle x\phi_1, \phi_1\rangle}{\|\phi_1\|^2} = \frac{\int_{-1}^{1} x^3\, dx}{\|\phi_1\|^2} = 0,
\qquad
b_1 = \frac{\|\phi_1\|^2}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x^2\, dx}{\int_{-1}^{1} 1^2\, dx} = \frac{2\int_0^1 x^2\, dx}{2} = \frac13
\]
\[
\Rightarrow \phi_2(x) = x^2 - \frac13, \quad \text{etc.} \qquad \phi_3(x) = x^3 - \frac35 x
\]
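The recurrence (21) is easy to run by machine. A small exact-arithmetic sketch in Python, representing polynomials as coefficient lists (ascending powers) and computing the inner product $\int_{-1}^{1} pq\,dx$ exactly:

```python
from fractions import Fraction

def polymul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polysub(p, q):
    m = max(len(p), len(q))
    p = p + [Fraction(0)] * (m - len(p))
    q = q + [Fraction(0)] * (m - len(q))
    return [a - b for a, b in zip(p, q)]

def inner(p, q):
    """<p, q> = integral of p*q over [-1, 1] with weight w = 1 (exact):
    only even powers contribute, each x^k integrating to 2/(k+1)."""
    prod = polymul(p, q)
    return sum(2 * c / Fraction(k + 1) for k, c in enumerate(prod) if k % 2 == 0)

def monic_orthogonal(n):
    """phi_0 .. phi_n via phi_{j+1} = (x - a_j) phi_j - b_j phi_{j-1}."""
    x = [Fraction(0), Fraction(1)]
    phis = [[Fraction(1)]]
    for j in range(n):
        pj = phis[j]
        a = inner(polymul(x, pj), pj) / inner(pj, pj)
        rec = polymul(polysub(x, [a]), pj)
        if j >= 1:
            b = inner(pj, pj) / inner(phis[j - 1], phis[j - 1])
            rec = polysub(rec, [b * c for c in phis[j - 1]])
        phis.append(rec)
    return phis

phis = monic_orthogonal(3)   # phi_2 = x^2 - 1/3, phi_3 = x^3 - (3/5) x
```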
Summary
$V = C[a,b]$,
\[
\langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx \quad \forall f, g \in C[a,b],
\]
with the constraints that $w \in C(a,b)$, $w(x) > 0$ $\forall x \in (a,b)$, and $w$ is integrable:
\[
\int_a^b w(x)\, dx < \infty
\]
Given $f \in C[a,b]$, we approximate it by a polynomial of degree $n$: find $p_n^*(x) \in P_n$ such that the norm associated with this inner product is minimal,
\[
\|f - p_n^*\| \leq \|f - p_n\| \quad \forall p_n \in P_n.
\]
With an orthogonal basis $\{\phi_j(x)\}_{j=0}^n$ for $P_n$,
\[
p_n^*(x) = \sum_{i=0}^n \frac{\langle f, \phi_i\rangle}{\|\phi_i\|^2}\, \phi_i(x).
\]
$p_n^* \in P_n$ is the best approximation to $f$ from $P_n$ in that norm $\|\cdot\|$.
Ex
Show that the polynomials $T_k(x) = \cos\left(k \cos^{-1} x\right)$, for $x \in [-1,1]$, are orthogonal with respect to the inner product
\[
\langle f, g\rangle = \int_{-1}^{1} \underbrace{\left(1 - x^2\right)^{-\frac12}}_{w(x)} f(x)\, g(x)\, dx
\]
see diagram 20081127.M2AA3.3
Is $T_k(x)$ a polynomial? $T_0(x) = 1$, $T_1(x) = x$. Introduce the change of variable
\[
\theta = \cos^{-1} x \iff x = \cos\theta
\]

see diagram 20081127.M2AA3.4

$x \in [-1,1] \iff \theta \in [0,\pi]$, and $T_k(x) = \cos k\theta$. Recall the trigonometric identity
\[
\cos(k+1)\theta + \cos(k-1)\theta = 2\cos k\theta\, \cos\theta
\]
\[
\Rightarrow T_{k+1}(x) = 2x\, T_k(x) - T_{k-1}(x) \quad k \geq 1
\]
\[
\Rightarrow T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1 \in P_2
\]
\[
T_3(x) = 2xT_2(x) - T_1(x) = 2x\left(2x^2 - 1\right) - x = 4x^3 - 3x
\]
and by induction $T_k(x) = 2^{k-1} x^k + \cdots \in P_k$ — not monic.
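The recurrence $T_{k+1} = 2xT_k - T_{k-1}$ can be run on coefficient arrays (a sketch; coefficients are stored in ascending powers of $x$):

```python
import numpy as np

def chebyshev_T(n):
    """Coefficient arrays (ascending powers) of T_0 .. T_n via
    T_{k+1} = 2x T_k - T_{k-1}."""
    Ts = [np.array([1.0]), np.array([0.0, 1.0])]
    for k in range(1, n):
        two_x_Tk = np.concatenate(([0.0], 2.0 * Ts[k]))   # multiply T_k by 2x
        Tkm1 = np.concatenate((Ts[k - 1], np.zeros(len(two_x_Tk) - len(Ts[k - 1]))))
        Ts.append(two_x_Tk - Tkm1)
    return Ts[: n + 1]

Ts = chebyshev_T(5)   # leading coefficient of T_k is 2^{k-1} for k >= 1
```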
Show $\{T_k(x)\}_{k\geq 0}$ is orthogonal with respect to
\[
\langle f, g\rangle = \int_{-1}^{1} \underbrace{\left(1 - x^2\right)^{-\frac12}}_{w(x)} f(x)\, g(x)\, dx
\]
see diagram 20081202.M2AA3.1
\[
\int_{-1}^{1} \left(1 - x^2\right)^{-\frac12}\, T_k(x)\, T_j(x)\, dx
\]
With $x = \cos\theta$, $\frac{dx}{d\theta} = -\sin\theta$ and $(1-x^2)^{-\frac12} = (\sin\theta)^{-1}$ on $(0,\pi)$, so
\[
\int_{\pi}^{0} (\sin\theta)^{-1} \cos k\theta\, \cos j\theta\, (-\sin\theta)\, d\theta
= \int_0^{\pi} \cos k\theta\, \cos j\theta\, d\theta
\]
\[
= \frac12 \int_0^{\pi} \left[ \cos[(k+j)\theta] + \cos[(k-j)\theta] \right] d\theta
= \frac12 \left[ \frac{\sin(k+j)\theta}{k+j} + \frac{\sin(k-j)\theta}{k-j} \right]_0^{\pi}
\]
(the last step is not valid if $k = j$ or $k = j = 0$; treat those cases separately)
\[
= \begin{cases}
0 & \text{if } k \neq j \\
\frac{\pi}{2} & \text{if } k = j \neq 0 \\
\pi & \text{if } k = j = 0
\end{cases}
\]
$\therefore\ \{T_k(x)\}_{k\geq 0}$ are orthogonal, but not orthonormal. These polynomials are called the Chebyshev Polynomials.
4 Polynomial Interpolation
Abandon best approximation, and consider the more practical approach of polynomial interpolation.

Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, find $p_n(z) \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$. (Ex: $z_j, f_j \in \mathbb{R}$, $j = 0 \to n$.)
see diagram 20081202.M2AA3.2
pn is called the interpolating polynomial for this data.
Natural Questions
1. Does pn exist?
2. Is pn unique?
3. What is the construction of pn?
1. Prove the existence by a construction proof. Clearly {zj}nj=0 should be distinct.
Lemma
Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, $z_j$ distinct. Let
\[
l_j(z) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)} \quad j = 0 \to n.
\]
Then $l_j(z) \in P_n$, $j = 0 \to n$, and $l_j(z_r) = \delta_{jr}$, $j, r = 0 \to n$.
Proof
$l_j(z)$ is a product of $n$ factors of the form $\frac{z - z_k}{z_j - z_k}$, $k \neq j$, so $l_j(z) \in P_n$.
\[
l_j(z_r) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{z_r - z_k}{z_j - z_k}
\]
If $r = j$, every factor equals 1, so $l_j(z_j) = 1$. If $r \neq j$, one factor (the one with $k = r$) is 0, so $l_j(z_r) = 0$.
Example
zj ∈ R j = 0→ n
see diagram 20081202.M2AA3.3
$\{l_j(z)\}_{j=0}^n$ are the Lagrange basis functions.
Lemma
The interpolating polynomial $p_n(z) \in P_n$ for the data $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, $z_j$ distinct, is
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z)
\]
Proof
$l_j(z) \in P_n$, $j = 0 \to n$, so $p_n(z) = \sum_{j=0}^n f_j l_j(z) \in P_n$. Evaluating at each data point $z_r$, we need it to return $f_r$:
\[
p_n(z_r) = \sum_{j=0}^n f_j\, l_j(z_r) = \sum_{j=0}^n f_j\, \delta_{jr} = f_r \quad r = 0 \to n
\]
$\therefore\ p_n(z)$ interpolates the data $\{(z_j, f_j)\}_{j=0}^n$.
2. Is pn unique?
Theorem (Fundamental Theorem of Algebra)
Let
\[
p_n(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n, \quad a_i \in \mathbb{C},\ i = 0 \to n.
\]
Then $p_n(z)$ has at most $n$ distinct roots (zeros) in $\mathbb{C}$, unless $a_i = 0$, $i = 0 \to n$, i.e. $p_n(z) \equiv 0$.
Recall
Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct; find the interpolating polynomial $p_n \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$.

Lagrange Construction
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z), \quad \text{where } l_j(z) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)} \in P_n \quad j = 0 \to n,
\qquad l_j(z_r) = \delta_{jr} \quad j, r = 0 \to n
\]
Is the interpolating polynomial unique? Assume the contrary: $\exists\, p_n, q_n \in P_n$ such that
\[
p_n(z_j) = q_n(z_j) = f_j \quad j = 0 \to n.
\]
To get a contradiction, we use the fundamental theorem of algebra: $p_n - q_n \in P_n$ and
\[
(p_n - q_n)(z_j) = 0 \quad j = 0 \to n,
\]
$\therefore\ p_n - q_n \in P_n$ has $(n+1)$ distinct roots (zeros), as the $z_j$ are distinct. F.T.A. $\Rightarrow p_n - q_n \equiv 0 \Rightarrow p_n = q_n$, so the interpolating polynomial is unique.
Example
Find $p_2 \in P_2$ such that
\[
p_2(\underbrace{0}_{z_0}) = \underbrace{a}_{f_0}, \qquad
p_2(\underbrace{1}_{z_1}) = \underbrace{b}_{f_1}, \qquad
p_2(\underbrace{4}_{z_2}) = \underbrace{c}_{f_2}
\]
$n = 2$:
\[
p_2(z) = \sum_{j=0}^2 f_j\, l_j(z)
\]
\[
l_0(z) = \frac{(z - z_1)(z - z_2)}{(z_0 - z_1)(z_0 - z_2)} = \frac{(z-1)(z-4)}{(-1)(-4)} = \frac14\left(z^2 - 5z + 4\right)
\]
\[
l_1(z) = \cdots = -\frac13\left(z^2 - 4z\right), \qquad
l_2(z) = \cdots = \frac{1}{12}\left(z^2 - z\right)
\]
\[
\therefore\ p_2(z) = a\, l_0(z) + b\, l_1(z) + c\, l_2(z) \quad \text{(Lagrange form)}
\]
\[
= \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right) z^2 - \left(\frac{5a}{4} - \frac{4b}{3} + \frac{c}{12}\right) z + a \quad \text{(canonical form)}
\]
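The Lagrange form can be evaluated directly without ever expanding to canonical form (a sketch; the values $a, b, c$ below are illustrative):

```python
def lagrange_eval(zs, fs, z):
    """Evaluate p_n(z) = sum_j f_j l_j(z) from the Lagrange form."""
    total = 0.0
    for j, (zj, fj) in enumerate(zip(zs, fs)):
        lj = 1.0
        for k, zk in enumerate(zs):
            if k != j:
                lj *= (z - zk) / (zj - zk)   # one factor per k != j
        total += fj * lj
    return total

# The worked example: nodes 0, 1, 4 with values a, b, c.
a, b, c = 2.0, -1.0, 5.0
zs, fs = [0.0, 1.0, 4.0], [a, b, c]
p = lambda z: lagrange_eval(zs, fs, z)
```

(With $a=2$, $b=-1$, $c=5$ the canonical form is $1.25 z^2 - 4.25 z + 2$.)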
One could find the coefficients in the canonical form directly by using $p_n(z) = \sum_{k=0}^n a_k z^k$. We know that
\[
p_n(z_j) = \sum_{k=0}^n a_k z_j^k = f_j, \quad j = 0 \to n,
\]
\[
\Rightarrow
\begin{pmatrix}
1 & z_0 & \ldots & z_0^n \\
1 & z_1 & \ldots & z_1^n \\
\vdots & \vdots & & \vdots \\
1 & z_n & \ldots & z_n^n
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_n \end{pmatrix},
\]
i.e. $Va = f$, with $V \in \mathbb{C}^{(n+1)\times(n+1)}$, $a, f \in \mathbb{C}^{n+1}$, $V_{jk} = z_j^k$, $j, k = 0 \to n$. $V$ is called the Vandermonde matrix (Q4, Sheet 5). In general $V$ is ill-conditioned: as $z_j$ gets close to $z_i$, rows $i$ and $j$ become nearly linearly dependent (this is why it is ill-conditioned).
Canonical basis: $p_n(z) = \sum_{k=0}^n a_k z^k$, $\{z^k\}_{k=0}^n \Rightarrow Va = f$. You should certainly not use the canonical basis; it looks as if we should use the Lagrange basis instead, and it is indeed far better, but it has a flaw, as we will see.

Lagrange basis: $p_n(z) = \sum_{k=0}^n f_k l_k(z)$, $\{l_k(z)\}_{k=0}^n \Rightarrow If = f$. The Lagrange basis is far better conditioned than the canonical basis. However, it has to be constructed afresh for each set of points: assume we have found $p_{n-1} \in P_{n-1}$ interpolating $\{(z_j, f_j)\}_{j=0}^{n-1}$ and are then given a new data point $(z_n, f_n)$. One cannot reuse $p_{n-1} \in P_{n-1}$ to find $p_n \in P_n$; one has to compute new Lagrange basis functions in $P_n$.
We now look for an alternative construction. Suppose $p_{n-1} \in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$, $j = 0 \to n-1$; now find $p_n \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$. Let
\[
p_n(z) = p_{n-1}(z) + C \underbrace{\prod_{k=0}^{n-1} (z - z_k)}_{\in P_n,\ \text{vanishes at } z_j,\ j = 0 \to n-1}
\]
\[
\Rightarrow p_n(z_j) = p_{n-1}(z_j) = f_j, \quad j = 0 \to n-1.
\]
Then choose $C \in \mathbb{C}$ such that
\[
p_n(z_n) = p_{n-1}(z_n) + C \prod_{k=0}^{n-1} (z_n - z_k) = f_n;
\]
since $\{z_j\}_{j=0}^n$ are distinct,
\[
C = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1} (z_n - z_k)}.
\]
$\therefore\ C$ depends on all the data points $\{(z_j, f_j)\}_{j=0}^n$.

Classical notation: $C = f[z_0, z_1, \ldots, z_n]$. This is called a divided difference of order $n$ (it depends on $(n+1)$ points).
\[
\therefore\ p_n(z) = p_{n-1}(z) + f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k),
\]
so the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \ldots, z_n]$.

Note that $p_n$ is unique and $p_n(z_j) = f_j$, $j = 0 \to n$,
\[
\Rightarrow f[z_{\pi_0}, z_{\pi_1}, \ldots, z_{\pi_n}] = f[z_0, z_1, \ldots, z_n]
\]
for any permutation $\pi$ of the points $\{z_0, z_1, \ldots, z_n\}$.
Lemma
If $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct, then
\[
f[z_0, z_1, z_2, \ldots, z_n] = \sum_{j=0}^n \frac{f_j}{\prod_{\substack{k=0 \\ k\neq j}}^{n} (z_j - z_k)}.
\]
Furthermore, if $f_j = f(z_j)$, $j = 0 \to n$, for some function $f(z)$, then $f[z_0, z_1, \ldots, z_n] = 0$ if $f \in P_{n-1}$.
Proof
Compare the coefficient of $z^n$ in the Lagrange form of $p_n(z)$ with
\[
p_n(z) = p_{n-1}(z) + f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k). \tag{22}
\]
The coefficient of $z^n$ in (22) is $f[z_0, z_1, \ldots, z_n]$. Recall the Lagrange form
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z) = \sum_{j=0}^n f_j \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)}, \tag{23}
\]
so the coefficient of $z^n$ in (23) is
\[
\sum_{j=0}^n f_j \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{1}{(z_j - z_k)},
\]
hence the result.

If $f_j = f(z_j)$, $j = 0 \to n$, where $f \in P_{n-1}$, then by the uniqueness of the interpolating polynomial,
\[
\Rightarrow p_n(z) = f(z) \in P_{n-1}.
\]

see diagram 20081204.M2AA3.1

The coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \ldots, z_n]$. But $p_n \in P_{n-1}$ in this case,
\[
\Rightarrow f[z_0, z_1, \ldots, z_n] = 0.
\]
Note that
\[
\underbrace{p_n(z)}_{\text{interpolates } \{(z_j,f_j)\}_{j=0}^n}
= \underbrace{p_{n-1}(z)}_{\text{interpolates } \{(z_j,f_j)\}_{j=0}^{n-1}}
+\ f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k),
\]
\[
p_{n-1}(z) = \underbrace{p_{n-2}(z)}_{\{(z_j,f_j)\}_{j=0}^{n-2}}
+\ f[z_0, z_1, \ldots, z_{n-1}] \prod_{k=0}^{n-2} (z - z_k),
\]
\[
\vdots
\]
\[
\underbrace{p_1(z)}_{\{(z_j,f_j)\}_{j=0}^{1}} = \underbrace{p_0(z)}_{(z_0,f_0);\ f[z_0]=f_0} + f[z_0, z_1]\,(z - z_0),
\]
\[
\therefore\ p_n(z) = \underbrace{f[z_0]}_{f_0} + \sum_{j=1}^n f[z_0, \ldots, z_j] \prod_{k=0}^{j-1} (z - z_k).
\]
This is the Newton Form of the Interpolating Polynomial.

Note that $f[z_0, z_1, \ldots, z_j]$ is the coefficient of $z^j$ in $p_j(z)$, where $p_j \in P_j$ and $p_j(z_k) = f_k$, $k = 0 \to j$.
Theorem
For any distinct $z_0, z_1, \ldots, z_{n+1} \in \mathbb{C}$, the divided difference based on all the points satisfies
\[
\underbrace{f[z_0, z_1, \ldots, z_{n+1}]}_{n+2 \text{ points}}
= \frac{\overbrace{f[z_0, z_1, \ldots, z_n]}^{n+1 \text{ points}} - \overbrace{f[z_1, z_2, \ldots, z_{n+1}]}^{n+1 \text{ points}}}{z_0 - z_{n+1}}
\]
Proof
Given $\{(z_j, f_j)\}_{j=0}^{n+1}$, construct $p_n, q_n \in P_n$ such that
\[
p_n(z_j) = f_j,\ j = 0 \to n \Rightarrow \text{the coefficient of } z^n \text{ in } p_n(z) \text{ is } f[z_0, z_1, \ldots, z_n],
\]
\[
q_n(z_j) = f_j,\ j = 1 \to n+1 \Rightarrow \text{the coefficient of } z^n \text{ in } q_n(z) \text{ is } f[z_1, z_2, \ldots, z_{n+1}].
\]
Let
\[
r_{n+1}(z) = \frac{(z - z_{n+1})\, p_n(z) - (z - z_0)\, q_n(z)}{z_0 - z_{n+1}} \in P_{n+1}. \tag{24}
\]
\[
r_{n+1}(z_0) = p_n(z_0) = f_0,
\]
\[
r_{n+1}(z_j) = \frac{(z_j - z_{n+1}) f_j - (z_j - z_0) f_j}{z_0 - z_{n+1}} = f_j, \quad j = 1 \to n,
\]
\[
r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1},
\]
\[
\therefore\ r_{n+1}(z) \in P_{n+1} \text{ is such that } r_{n+1}(z_j) = f_j, \quad j = 0 \to n+1,
\]
so by uniqueness $r_{n+1} = p_{n+1}$. Compare the coefficient of $z^{n+1}$ in (24):
\[
\Rightarrow f[z_0, z_1, \ldots, z_{n+1}] = \frac{f[z_0, z_1, \ldots, z_n] - f[z_1, \ldots, z_{n+1}]}{z_0 - z_{n+1}},
\]
hence the result. This is the divided difference recurrence relation.
Divided Difference Tableau
\[
\begin{array}{llll}
z_0 & f[z_0] = f_0 & & \\[4pt]
z_1 & f[z_1] = f_1 & f[z_0,z_1] = \dfrac{f[z_0] - f[z_1]}{z_0 - z_1} & \\[4pt]
z_2 & f[z_2] = f_2 & f[z_1,z_2] = \dfrac{f[z_1] - f[z_2]}{z_1 - z_2} & f[z_0,z_1,z_2] = \dfrac{f[z_0,z_1] - f[z_1,z_2]}{z_0 - z_2} \\[4pt]
\vdots & \vdots & \vdots & \\[4pt]
z_n & f[z_n] = f_n & f[z_{n-1},z_n] = \dfrac{f[z_{n-1}] - f[z_n]}{z_{n-1} - z_n} & f[z_{n-2},z_{n-1},z_n] \quad \text{etc.}
\end{array}
\]
The diagonal entries in this tableau appear in the Newton form of $p_n(z)$.
Example
$n = 2$; $z_0 = 0$, $z_1 = 1$, $z_2 = 4$; $f_0 = a$, $f_1 = b$, $f_2 = c$.
\[
\begin{array}{llll}
z_0 = 0 & f[z_0] = a & & \\[4pt]
z_1 = 1 & f[z_1] = b & f[z_0,z_1] = \dfrac{a - b}{-1} = b - a & \\[4pt]
z_2 = 4 & f[z_2] = c & f[z_1,z_2] = \dfrac{b - c}{-3} = \dfrac{c - b}{3} & f[z_0,z_1,z_2] = \dfrac{(b-a) - \left(\frac{c-b}{3}\right)}{-4} = \dfrac{a}{4} - \dfrac{b}{3} + \dfrac{c}{12}
\end{array}
\]
so
\[
p_2(z) = f[z_0] + f[z_0,z_1]\,(z - z_0) + f[z_0,z_1,z_2]\,(z - z_0)(z - z_1)
\]
\[
= a + (b - a)\, z + \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right) z\, (z - 1).
\]
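The tableau and Newton form can be sketched in a few lines of Python (the values $a, b, c$ below are illustrative):

```python
def divided_differences(zs, fs):
    """Diagonal of the divided difference tableau:
    [f[z0], f[z0,z1], ..., f[z0,...,zn]], built in place by the
    recurrence f[z_i..z_j] = (f[z_{i+1}..z_j] - f[z_i..z_{j-1}])/(z_j - z_i)."""
    coeffs = list(fs)
    n = len(zs)
    for order in range(1, n):
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (zs[i] - zs[i - order])
    return coeffs

def newton_eval(zs, coeffs, z):
    """Evaluate the Newton form by Horner-like nesting."""
    p = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        p = coeffs[k] + (z - zs[k]) * p
    return p

# The worked example: z = (0, 1, 4), f = (a, b, c).
a, b, c = 2.0, -1.0, 5.0
zs, fs = [0.0, 1.0, 4.0], [a, b, c]
coeffs = divided_differences(zs, fs)   # [a, b - a, a/4 - b/3 + c/12]
```

Adding a new data point only appends one new coefficient — exactly the advantage over the Lagrange basis noted above.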
We may be interested in approximating a function $f(z)$ that is complicated to evaluate by a polynomial $p_n(z) \in P_n$: evaluate $f(z)$ at distinct points $\{z_j\}_{j=0}^n$, form the interpolating polynomial $p_n(z)$ with $p_n(z_j) = f(z_j)$, $j = 0 \to n$, and then approximate $f(z)$ by $p_n(z)$.
see diagram 20081209.M2AA3.1
Theorem
Let $p_n(z)$ interpolate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^n$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is such that
\[
e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j,\ j = 0 \to n.
\]
(Note that $e(z_j) = 0$, $j = 0 \to n$.)
Proof
$p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^n$. We now add a new point different from the points we already have, $z_{n+1} \neq z_j$, $j = 0 \to n$. Then $p_{n+1}(z) \in P_{n+1}$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n+1}$, and the Newton form of $p_{n+1}(z)$ is
\[
p_{n+1}(z) = p_n(z) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z - z_k)
\]
\[
\Rightarrow f(z_{n+1}) = p_{n+1}(z_{n+1}) = p_n(z_{n+1}) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k)
\]
\[
\Rightarrow e(z_{n+1}) = f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k),
\]
but $z_{n+1}$ was an arbitrary point with $z_{n+1} \neq z_j$, $j = 0 \to n$; writing $z_{n+1} = z$,
\[
\Rightarrow e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k). \quad \checkmark
\]
For the above result to be useful, we need to bound $f[z_0, z_1, \ldots, z_n, z]$.

We restrict ourselves from now on to the real case: $z_j = x_j \in \mathbb{R}$, $j = 0 \to n$, distinct, and $f(z) = f(x)$ a real function.
\[
f[x_j] = f(x_j), \quad j = 0 \to n,
\]
is the zero order divided difference, based on one point. The first order divided difference is based on 2 points, e.g.
\[
f[x_0, x_1] = \frac{f[x_0] - f[x_1]}{x_0 - x_1} = \frac{f(x_0) - f(x_1)}{x_0 - x_1}.
\]
Mean Value Theorem:
\[
f(x_1) = f(x_0) + \underbrace{(x_1 - x_0)}_{\text{distance moved}} f'(\xi), \quad \text{where } \xi \text{ lies between } x_0 \text{ and } x_1;
\]
this assumes that $f \in C^1[x_0, x_1]$ if $x_0 < x_1$ (or $C^1[x_1, x_0]$ if $x_1 < x_0$).
\[
\therefore\ f[x_0, x_1] = f'(\xi)
\]
— the first order divided difference is a derivative at an intermediate point.
Recall
\[
e(z) = f(z) - p_n(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j,\ j = 0 \to n
\]
$(e(z_j) = 0,\ j = 0 \to n)$
Theorem
Let $f \in C^n[x_0, x_n]$, i.e. $f$ and its first $n$ derivatives are continuous on $[x_0, x_n]$, where for ease of exposition we have assumed that the real interpolation points are ordered, $x_0 < x_1 < \cdots < x_n$. Then $\exists\, \xi \in [x_0, x_n]$ such that
\[
f[x_0, x_1, \ldots, x_n] = \frac{1}{n!}\, f^{(n)}(\xi)
\]
($n+1$ points: $n$th order divided difference).
Proof
Let $p_n \in P_n$ interpolate $f(x)$ at $x_i$, $i = 0 \to n$, and let
\[
e(x) = f(x) - p_n(x) \Rightarrow e(x_i) = 0, \quad i = 0 \to n,
\]
so $e(x)$ has at least $(n+1)$ zeros in $[x_0, x_n]$.

see diagram 20081210.M2AA3.1

Rolle's Theorem: $e'(x)$ has at least $n$ zeros in $[x_0, x_n]$; $e''(x)$ has at least $(n-1)$ zeros in $[x_0, x_n]$; $\ldots$; $e^{(n)}(x)$ has at least 1 zero in $[x_0, x_n]$. Let $\xi \in [x_0, x_n]$ be such that $e^{(n)}(\xi) = 0$, where
\[
e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x).
\]
Recall the Newton form of $p_n(x)$:
\[
p_n(x) = f[x_0, x_1, \ldots, x_n]\, x^n + \ldots
\Rightarrow p_n^{(n)}(x) = n!\, f[x_0, x_1, \ldots, x_n] \in \mathbb{R},
\]
\[
\therefore\ f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!\, f[x_0, x_1, \ldots, x_n],
\]
hence the result.
We now combine the above theorems.
Theorem
Let $f \in C^{n+1}[a,b]$, and let $\{x_i\}_{i=0}^n$ be distinct interpolation points in $[a,b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^n$, then the error $e(x) = f(x) - p_n(x)$ satisfies
\[
|e(x)| \leq \frac{1}{(n+1)!} \left| \prod_{i=0}^{n} (x - x_i) \right| \max_{a\leq y\leq b} \left| f^{(n+1)}(y) \right| \quad \forall x \in [a,b].
\]
Proof
The result is clearly true at the interpolation points $x = x_i$, $i = 0 \to n$, since $e(x_i) = 0$ and the product $\prod_{i=0}^{n} (x - x_i)$ also vanishes there ($0 \leq 0$ $\checkmark$).

For $x \neq x_i$, $i = 0 \to n$: the 1st theorem gives
\[
e(x) = f[x_0, x_1, \ldots, x_n, x] \prod_{k=0}^{n} (x - x_k),
\]
and the 2nd theorem (applied to the $n+2$ points $x_0, \ldots, x_n, x$) gives
\[
e(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \quad \text{for some } \xi \in [a,b],
\]
\[
\Rightarrow |e(x)| = \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \left| f^{(n+1)}(\xi) \right|
\leq \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a\leq y\leq b} \left| f^{(n+1)}(y) \right| \quad \checkmark
\]
Let $\|g\|_\infty = \max_{a\leq x\leq b} |g(x)|$ (the infinity norm). Then
\[
\therefore\ \|e\|_\infty \leq \frac{1}{(n+1)!} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \left\| f^{(n+1)} \right\|_\infty.
\]
Does ‖e‖∞ → 0 as n→∞, assuming f ∈ C∞ [a, b]?
Ex. 1
$[a,b] = \left[-\frac12, \frac12\right]$, $f(x) = e^x$. We know that
\[
x, x_i \in \left[-\tfrac12, \tfrac12\right] \Rightarrow |x - x_i| \leq 1
\Rightarrow \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \leq 1 \quad \forall n,
\]
\[
\left\| f^{(n+1)} \right\|_\infty = \|e^x\|_\infty = e^{\frac12},
\]
\[
\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!}\, e^{\frac12} \to 0 \ \text{ as } n \to \infty.
\]
Ex 2.
General $[a,b]$, $f(x) = \cos x$, so $\left\| f^{(n+1)} \right\|_\infty \leq 1$ and $x, x_i \in [a,b] \Rightarrow |x - x_i| \leq b - a$,
\[
\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!}\, (b-a)^{n+1} \to 0 \ \text{ as } n \to \infty.
\]
Ex 3.
$f(x) = (1+x)^{-1}$ on $[0,1]$: $f'(x) = -(1+x)^{-2}$, and in general
\[
f^{(n+1)}(x) = (-1)^{n+1}\, (n+1)!\, (1+x)^{-(n+2)}.
\]
Does $\|e\|_\infty \to 0$ as $n \to \infty$? No:
\[
\|f - p_n\|_\infty \nrightarrow 0 \ \text{ as } n \to \infty,
\]
see Sheet 5, Q12.
Can we choose the interpolation points {xi}ni=0 in a smart way?
Fix $[a,b]$, fix $n$, and suppose we are given $f$. Choose distinct interpolation points $\{x_i\}_{i=0}^n \subset [a,b]$ so as to minimise the product of factors:
\[
\min_{\{x_i\}_{i=0}^n} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty. \tag{25}
\]
Since $\prod_{i=0}^{n} (x - x_i)$ is a monic polynomial of degree $n+1$, i.e. $x^{n+1} - q_n(x)$ for some $q_n \in P_n$, this is related to
\[
\min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty. \tag{26}
\]
Solve (26), i.e. find $q_n^* \in P_n$ such that
\[
\left\| x^{n+1} - q_n^*(x) \right\|_\infty \leq \left\| x^{n+1} - q_n(x) \right\|_\infty \quad \forall q_n \in P_n.
\]
If $x^{n+1} - q_n^*(x)$ has $n+1$ distinct zeros $\{x_i\}_{i=0}^n$ in $[a,b]$, then we have solved (25):
\[
\min_{\{x_i\}_{i=0}^n} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty = \min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty.
\]
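The effect of the choice of points is easy to see numerically. A sketch on $[-1,1]$ comparing equispaced nodes with the Chebyshev points (the zeros of $T_{n+1}$, which, as the next section shows, achieve the optimal value $2^{-n}$):

```python
import numpy as np

n = 10
xs_fine = np.linspace(-1.0, 1.0, 2001)

def node_product_norm(nodes):
    """|| prod_i (x - x_i) ||_inf on [-1, 1], sampled on a fine grid."""
    vals = np.prod(xs_fine[:, None] - nodes[None, :], axis=1)
    return np.max(np.abs(vals))

equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # zeros of T_{n+1}

norm_equi = node_product_norm(equi)
norm_cheb = node_product_norm(cheb)   # approx 2^{-n}, much smaller than norm_equi
```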
5 Best Approximation in $\|\cdot\|_\infty$

(Best approximation in the uniform sense, or "minimax" approximation.)

Given $g \in C[a,b]$, find $q_n^* \in P_n$ such that
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n
\iff \|g - q_n^*\|_\infty = \min_{q_n \in P_n} \left\{ \max_{a\leq x\leq b} |g(x) - q_n(x)| \right\}
\]
Theorem
Let g ∈ C [a, b] and n ≥ 0.
Suppose $\exists\, q_n^* \in P_n$ and $(n+2)$ distinct points $\{x_j^*\}_{j=0}^{n+1}$, where $a \leq x_0^* < x_1^* < \cdots < x_n^* < x_{n+1}^* \leq b$, such that
\[
g(x_j^*) - q_n^*(x_j^*) = (-1)^j\, \sigma\, \|g - q_n^*\|_\infty \quad j = 0 \to n+1, \tag{27}
\]
where $\sigma = +1$ or $-1$. Then $q_n^* \in P_n$ is the Best Approximation to $g$ from $P_n$ in $\|\cdot\|_\infty$, i.e.
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n.
\]
Example: $n = 3$, $\sigma = +1$ and $E = \|g - q_n^*\|_\infty$
see diagram 20081211.M2AA3.1
5 alternating extremes.
Proof
Let $E = \|g - q_n^*\|_\infty$. If $E = 0$ then $q_n^* = g$ is the best approximation. Assume $E > 0$ and suppose $\exists\, q_n \in P_n$ doing better than $q_n^*$, i.e.
\[
\|g - q_n\|_\infty < \|g - q_n^*\|_\infty = E.
\]
Consider $q_n^* - q_n \in P_n$ at the $n+2$ points $\{x_j^*\}_{j=0}^{n+1}$:
\[
q_n^*(x_j^*) - q_n(x_j^*) = \left[ q_n^*(x_j^*) - g(x_j^*) \right] + \left[ g(x_j^*) - q_n(x_j^*) \right]
= (-1)^{j+1}\sigma E + \gamma_j, \quad |\gamma_j| < E,
\]
\[
\therefore\ \mathrm{sign}\left( (q_n^* - q_n)(x_j^*) \right) = \mathrm{sign}\left( (-1)^{j+1}\sigma E \right) \quad j = 0 \to n+1,
\]
$\therefore\ q_n^* - q_n \in P_n$ changes sign at least $n+1$ times, so $q_n^* - q_n \in P_n$ has $(n+1)$ distinct zeros.
\[
\text{F.T.A.} \Rightarrow q_n^* - q_n \equiv 0 \Rightarrow q_n = q_n^*,
\]
a contradiction to $\|g - q_n\|_\infty < \|g - q_n^*\|_\infty$.
$\therefore\ q_n^* \in P_n$ is the best approximation.

A polynomial satisfying condition (27) in the above theorem is said to have the Equioscillation Property (or the error $g - q_n^*$ is said to equioscillate; note that $q_n^*$ may degenerate and have degree $< n$ — see Sheet 5, Q10). The above theorem is one half of the Chebyshev Equioscillation Theorem:

Let $g \in C[a,b]$ and $n \geq 0$. Then $\exists$ a unique $q_n^* \in P_n$ such that
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n,
\]
and it satisfies (27).
Proof
Omitted (straightforward apparently...)
Construction of $q_n^*$ is difficult in general — hence why we study best least squares approximation and interpolation. However, for $g(x) = x^{n+1}$ it is easy to construct $q_n^*$.
Theorem
Let $[a,b] \equiv [-1,1]$ and consider $g(x) = x^{n+1}$. Then the best approximation to $x^{n+1}$ from $P_n$ in $\|\cdot\|_\infty$ on $[-1,1]$ is
\[
q_n^*(x) = x^{n+1} - 2^{-n}\, T_{n+1}(x),
\]
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n+1$.

Proof

Recall $T_n(x) = \cos\left(n \cos^{-1} x\right)$, $n \geq 0$, and the change of variable $\theta = \cos^{-1} x \iff x = \cos\theta$, $[-1,1] \iff [0,\pi]$, $T_n(x) = \cos n\theta$:
\[
T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x) \quad n \geq 1, \qquad T_0(x) = 1,\ T_1(x) = x
\]
\[
\Rightarrow T_{n+1}(x) = 2^n x^{n+1} + \ldots
\]
\[
\therefore\ q_n^*(x) = x^{n+1} - 2^{-n}\, T_{n+1}(x) \in P_n.
\]
$\therefore$ the error
\[
x^{n+1} - q_n^*(x) = 2^{-n}\, T_{n+1}(x) = 2^{-n} \cos(n+1)\theta.
\]
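The equioscillation of this error is easy to verify numerically: $2^{-n}T_{n+1}(x)$ attains $\pm 2^{-n}$ with alternating signs at the $n+2$ points $x_j^* = \cos\left(\frac{j\pi}{n+1}\right)$, $j = 0 \to n+1$ (a sketch):

```python
import numpy as np

# Error of the best approximation: x^{n+1} - q_n*(x) = 2^{-n} T_{n+1}(x).
n = 4
x_star = np.cos(np.arange(n + 2) * np.pi / (n + 1))    # extremes of T_{n+1}
err = 2.0 ** (-n) * np.cos((n + 1) * np.arccos(x_star))  # 2^{-n} T_{n+1}(x_j*)
```

Since $T_{n+1}(x_j^*) = \cos(j\pi) = (-1)^j$, the error alternates exactly as condition (27) requires, with $\|x^{n+1} - q_n^*\|_\infty = 2^{-n}$.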