M2AA3 Orthogonality
Lectured by John Barrett, LyXed by jm407
December 13, 2008
www.ma.ic.ac.uk/~jwb/teaching
• 2hr exam in the summer term (4 questions)
• 2 small assessed projects (involving computation - MATLAB or whatever you want)
• exam : project weighting 6 : 1
• deadlines for the 2 assessed projects:
  1st project - mid/late November
  2nd project - first week of spring term
Contents

1 Applied Linear Algebra
  1.1 Orthogonality
    1.1.1 Inner Product
    1.1.2 Outer Product
    1.1.3 Dot Product
  1.2 Gram-Schmidt
    1.2.1 Classical Gram-Schmidt Algorithm
  1.3 QR Factorisation
  1.4 Cauchy-Schwarz Inequality
  1.5 Gradients and Hessians
  1.6 Inner Products Revisited and Positive Definite Matrices
  1.7 Least Squares Problems
2 Least Squares Problems
3 Orthogonal Polynomials
4 Polynomial Interpolation
5 Best Approximation in ‖.‖∞
1 Applied Linear Algebra
1.1 Orthogonality
Two vectors are orthogonal when they are perpendicular to each other.

a ∈ R^n ≡ R^{n×1}

a = (a₁, a₂, ..., a_n)ᵀ ∈ R^{n×1} (n rows, 1 column), a_i ∈ R

Transpose of a: aᵀ = (a₁ ... a_n) ∈ R^{1×n} (1 row, n columns)

Given a, b ∈ R^n:
1.1.1 Inner Product:

aᵀb = (a₁ ... a_n)(b₁, ..., b_n)ᵀ   [(1×n)(n×1) → 1×1 ≡ R]
    = ∑_{i=1}^n a_i b_i ∈ R
1.1.2 Outer Product:

abᵀ = (a₁, ..., a_n)ᵀ (b₁ ... b_n)   [(n×1)(1×n) → n×n]

    = ⎛ a₁b₁  ...  a₁b_n ⎞
      ⎜ a₂b₁              ⎟
      ⎜   ⋮          ⋮    ⎟
      ⎝ a_nb₁ ...  a_nb_n ⎠

Therefore abᵀ ∈ R^{n×n} with (abᵀ)_{jk} = a_j b_k, such that j = 1→n, k = 1→n.
Useful for some questions on sheet 1: for u ∈ R^n,

(abᵀ)u = a(bᵀu) = (bᵀu)a

which is always a multiple of a, ∀u, a, b ∈ R^n.

Note: let A & B be matrices of dimensions p×q and r×s respectively, with q = r. Then A·B = C, with C a matrix of dimensions p×s.
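The inner and outer products above are easy to experiment with on a computer. Below is a minimal plain-Python sketch (the course suggests MATLAB, but any language works; no libraries are assumed, and the helper names are my own):

```python
# Plain-Python inner product, outer product, and matrix-vector product,
# matching the component formulas above.

def inner(a, b):
    """<a, b> = a^T b = sum_i a_i b_i."""
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    """(a b^T)_jk = a_j b_k, returned as an n x n list of rows."""
    return [[aj * bk for bk in b] for aj in a]

def mat_vec(M, u):
    """Matrix-vector product, one inner product per row."""
    return [inner(row, u) for row in M]

a, b, u = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
# (a b^T) u equals (b^T u) a, i.e. always a multiple of a:
lhs = mat_vec(outer(a, b), u)
rhs = [inner(b, u) * ai for ai in a]
assert lhs == rhs
```

The final assertion checks the claim above that (abᵀ)u is always a multiple of a.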
Given a, b ∈ R^n, let

〈a,b〉 = aᵀb = ∑_{i=1}^n a_i b_i

〈.,.〉 : R^n × R^n → R, the inner product.

〈a,b〉 = ∑_{i=1}^n a_i b_i = ∑_{i=1}^n b_i a_i = 〈b,a〉 ∀a,b ∈ R^n (symmetric) (1)

(the order doesn't matter)

〈a, µb + λc〉 = aᵀ(µb + λc) = ∑_{i=1}^n a_i (µb_i + λc_i) = µ∑_{i=1}^n a_i b_i + λ∑_{i=1}^n a_i c_i = µ〈a,b〉 + λ〈a,c〉 (2)

linear with respect to the 2nd argument, ∀a,b,c ∈ R^n and ∀µ,λ ∈ R.

(1) + (2) ⇒ (3)
〈µa + λb, c〉 =(1)= 〈c, µa + λb〉 =(2)= µ〈c,a〉 + λ〈c,b〉 =(1)= µ〈a,c〉 + λ〈b,c〉 (3)

linear with respect to the 1st argument.

〈a,a〉 = aᵀa = ∑_{i=1}^n a_i² ≥ 0

Let ‖a‖ = [〈a,a〉]^{1/2} = (∑_{i=1}^n a_i²)^{1/2}, the length or norm of a.

‖a‖ ≥ 0 ∀a ∈ R^n, with ‖a‖ = 0 if and only if a = 0.
Recall - Geometric Vectors in R³

see diagram 20081009.M2AA3.1

a = a₁i + a₂j + a₃k
b = b₁i + b₂j + b₃k

1.1.3 Dot (Scalar) Product

a · b = b · a = |a||b| cos θ (order doesn't matter)

a · a = |a|², therefore |a| = (a · a)^{1/2} (θ = 0, cos θ = 1)

see diagram 20081009.M2AA3.2

⇒ i · i = j · j = k · k = 1 (as i, j, k are unit vectors)
⇒ i · j = j · k = i · k = 0 (as θ = π/2)
Easy to show that

a · (λb + µc) = λa · b + µa · c (4)

Non-trivial vectors a & b (a ≠ 0, b ≠ 0): a and b are orthogonal (perpendicular) if and only if their dot product is zero,

a · b = 0 ⇔ cos θ = 0 ⇔ θ = π/2

a · b = a · (b₁i + b₂j + b₃k) =(4)= b₁a · i + b₂a · j + b₃a · k = b₁a₁ + b₂a₂ + b₃a₃

therefore

a · b = ∑_{i=1}^3 a_i b_i

Given a = a₁i + a₂j + a₃k ≡ a = (a₁, a₂, a₃)ᵀ ∈ R³,

a · b = ∑_{i=1}^3 a_i b_i = aᵀb ≡ 〈a,b〉

which is the inner product of a & b. Non-trivial vectors a & b are orthogonal if and only if (the inner product) 〈a,b〉 = 0.

Definition: Dot product = Inner product
see diagram 20081009.M2AA3.3

〈a,b〉 = aᵀb = ∑_{i=1}^n a_i b_i ∀a,b ∈ R^n

Inner product: takes two vectors in R^n and spits out a number in R. 3 rules:

1. symmetric, order doesn't matter
2. linearity (linear combination in the 2nd argument, see above)
3. again linearity in the other argument

Length/norm as above.

Ex. a,b ∈ R^n orthogonal ⇒ ‖a + b‖² = ‖a‖² + ‖b‖² (Generalised Pythagoras)

see diagram 20081014.M2AA3.1

Proof:

‖a + b‖² =def= 〈a + b, a + b〉 =(2)= 〈a + b, a〉 + 〈a + b, b〉 =(3)= 〈a,a〉 + 〈b,a〉 + 〈a,b〉 + 〈b,b〉 = ‖a‖² + ‖b‖² + 2〈a,b〉, and 〈a,b〉 = 0; hence the result.

{q_k}_{k=1}^n, q_k ∈ R^m, q_k ≠ 0, k = 1→n, is ORTHOGONAL if and only if

〈q_k, q_j〉 = 0, j,k = 1→n, j ≠ k
Kronecker delta notation:

δ_jk = 1 if j = k, 0 if j ≠ k

identity matrix I ∈ R^{n×n} (1's on the diagonal, 0's elsewhere):

I_jk = δ_jk, j,k = 1→n (5)

Definition: {q_k}_{k=1}^n, q_k ∈ R^m, k = 1→n, is ORTHONORMAL if and only if

〈q_k, q_j〉 = δ_jk, j,k = 1→n

i.e. ORTHONORMAL ≡ ORTHOGONAL + each vector has unit length:

‖q_k‖ = [〈q_k,q_k〉]^{1/2} = 1, k = 1→n

Linearly Independent Vectors

{a_k}_{k=1}^n, a_k ∈ R^m, k = 1→n

{a_k}_{k=1}^n is said to be LINEARLY INDEPENDENT if

∑_{k=1}^n c_k a_k = 0 ⇒ c_k = 0, k = 1→n (only choice)

{a_k}_{k=1}^n is said to be LINEARLY DEPENDENT if ∃{c_k}_{k=1}^n not all zero such that ∑_{k=1}^n c_k a_k = 0
(e.g. if c_i ≠ 0 ⇒ a_i = −∑_{k=1, k≠i}^n (c_k/c_i) a_k)

Let A ∈ R^{m×n} have {a_k}_{k=1}^n as its columns:

A = (a₁, a₂, ..., a_n) ∈ R^{m×n}

Ac = (a₁, a₂, ..., a_n)(c₁, ..., c_n)ᵀ = ∑_{k=1}^n c_k a_k ∈ R^m

therefore if the only solution to Ac = 0 is c = 0 then {a_k}_{k=1}^n is linearly independent; however if ∃ a non-trivial solution c ≠ 0, then {a_k}_{k=1}^n is linearly dependent.
Restrict to the case m = n:

A = (a₁, ..., a_n), a_k ∈ R^n, k = 1→n

(a) If A⁻¹ exists, then

Ac = 0 ⇒ A⁻¹Ac = A⁻¹0 ⇒ Ic = 0 ⇒ c = 0 ⇒ {a_k}_{k=1}^n is lin. ind.

(b) If {a_k}_{k=1}^n is lin. ind.

⇒ they form a basis for R^n, i.e. span R^n
⇒ ∀b ∈ R^n ∃{c_k}_{k=1}^n such that b = ∑_{k=1}^n c_k a_k (6)

Is {c_k}_{k=1}^n unique? Assume the contrary:

b = ∑_{k=1}^n d_k a_k (7)

(6) − (7): 0 = ∑_{k=1}^n (c_k − d_k) a_k

{a_k}_{k=1}^n lin. ind. ⇒ c_k − d_k = 0 ⇒ c_k = d_k, k = 1→n

therefore the representation of b by {a_k}_{k=1}^n is unique:

b = ∑_{k=1}^n c_k a_k = Ac

(a linear combination), where A = (a₁, ..., a_n) ∈ R^{n×n} and c = (c₁, ..., c_n)ᵀ.

therefore ∀b ∈ R^n, ∃! c ∈ R^n (! = unique) such that Ac = b

see diagram 20081014.M2AA3.2

Hence (a) & (b) yield for m = n:

A⁻¹ exists ⇐⇒ {a_k}_{k=1}^n lin. indep.   (A = (a₁, ..., a_n) ∈ R^{n×n})
Therefore, given e_i ∈ R^n with a 1 in the ith position, i.e. (e_i)_j = δ_ij, j = 1→n, ∃ a unique s_i ∈ R^n such that

A s_i = e_i, i = 1→n

Letting S = (s₁, ..., s_n), AS = I, therefore S = A⁻¹, i.e. A⁻¹ exists.
Lemma:

{a_k}_{k=1}^n, a_k ∈ R^m, a_k ≠ 0, k = 1→n, and orthogonal:

〈a_j, a_k〉 = 0, j,k = 1→n, j ≠ k

⇒ {a_k}_{k=1}^n linearly independent
⇒ n ≤ m

(Can't have n > m linearly independent vectors in R^m - recall the exchange lemma.)
Proof: If ∑_{k=1}^n c_k a_k = 0

⇒ 〈∑_{k=1}^n c_k a_k, a_j〉 = 〈0, a_j〉 =(3)⇒ ∑_{k=1}^n c_k 〈a_k, a_j〉 = 0, with 〈a_k, a_j〉 = 0 if k ≠ j

⇒ c_j 〈a_j, a_j〉 = 0

a_j ≠ 0 ⇒ ‖a_j‖² = 〈a_j, a_j〉 ≠ 0, therefore c_j = 0

Repeat for j = 1→n: therefore c_j = 0 for j = 1→n, therefore {a_k}_{k=1}^n lin. indep.

orthogonality implies linear independence

therefore non-trivial orthogonal vectors are lin. ind. However, lin. ind. ⇏ orthogonal.

Ex. n = m = 2

a₁ = (2, 0)ᵀ, a₂ = (3, 1)ᵀ

c₁a₁ + c₂a₂ = 0 ⇒ 2c₁ + 3c₂ = 0 and c₂ = 0 ⇒ c₁ = c₂ = 0

therefore {a_i}_{i=1}^2 lin. ind., but 〈a₁, a₂〉 = a₁ᵀa₂ = 6 ≠ 0
1.2 Gram-Schmidt

Given {a_i}_{i=1}^n, a_i ∈ R^m, i = 1→n, lin. ind. (⇒ n ≤ m),

find {q_i}_{i=1}^n, q_i ∈ R^m, i = 1→n, ORTHONORMAL, i.e.

〈q_i, q_j〉 = δ_ij, i,j = 1→n

with span{q_i}_{i=1}^n = span{a_i}_{i=1}^n.

1.2.1 Classical Gram-Schmidt (CGS) Algorithm

v₁ = a₁, q₁ = v₁/‖v₁‖

for k = 2→n:
  v_k = a_k − ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l (8)
  q_k = v_k/‖v_k‖

Proof:

q₁ = v₁/‖v₁‖ = a₁/‖a₁‖

{a_i}_{i=1}^n lin. ind. ⇒ a_i ≠ 0, i = 1→n, therefore ‖a₁‖ ≠ 0

‖q₁‖² = 〈q₁, q₁〉 = 〈a₁/‖a₁‖, a₁/‖a₁‖〉 = (1/‖a₁‖²)〈a₁, a₁〉 = 1

span{a₁} = span{q₁}

Set v₂ = a₂ − 〈a₂, q₁〉q₁ (by (8))

⇒ 〈v₂, q₁〉 = 〈a₂ − 〈a₂, q₁〉q₁, q₁〉 =(2)= 〈a₂, q₁〉 − 〈a₂, q₁〉〈q₁, q₁〉 = 0 (since 〈q₁, q₁〉 = 1)
see diagram 20081015.M2AA3.1

Check: is v₂ = 0? If v₂ = 0 ⇒ a₂ is a multiple of q₁ (so a₂ a multiple of a₁, which is impossible since they are lin. ind.) - contradiction.

therefore v₂ ≠ 0, therefore q₂ = v₂/‖v₂‖

〈v₂, q₁〉 = 0 ⇒ 〈q₂, q₁〉 = 〈v₂/‖v₂‖, q₁〉 = 0

Also 〈q₂, q₂〉 = 〈v₂/‖v₂‖, v₂/‖v₂‖〉 = 1

therefore {q_i}_{i=1}^2 ORTHONORMAL

v₂ is a lin. combination of a₂ and q₁,
so v₂ is a lin. combination of a₂ and a₁,
so q₂ is a lin. combination of a₂ and a₁.
Similarly a₂ is a lin. comb. of q₁ and q₂.

Therefore span{q_i}_{i=1}^2 = span{a_i}_{i=1}^2.

Continue by induction: assume that when we've done up to k − 1,

{q_i}_{i=1}^{k−1} are ORTHONORMAL
q_j = lin. comb. of {a_i}_{i=1}^j, j = 1→k−1
a_j = lin. comb. of {q_i}_{i=1}^j, j = 1→k−1

(true for k = 2 and 3 from the above)

Set

v_k = a_k − ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l
⇒ 〈v_k, q_j〉 =(2)= 〈a_k, q_j〉 − ∑_{l=1}^{k−1} 〈a_k, q_l〉〈q_l, q_j〉, j = 1→k−1
= 〈a_k, q_j〉 − 〈a_k, q_j〉 = 0

therefore 〈v_k, q_j〉 = 0, j = 1→k−1

If v_k = 0, this would tell us that a_k is a lin. comb. of {q_l}_{l=1}^{k−1}; but the inductive hypothesis said that the q's can be written in terms of the a's, so it would tell us that a_k is a lin. comb. of {a_l}_{l=1}^{k−1} ⇒ contradiction to {a_i}_{i=1}^n lin. ind.

therefore v_k ≠ 0, so

q_k = v_k/‖v_k‖

〈v_k, q_j〉 = 0, j = 1→k−1 ⇒ 〈q_k, q_j〉 = 0, j = 1→k−1. Also 〈q_k, q_k〉 = 1.

therefore {q_i}_{i=1}^k ORTHONORMAL

v_k lin. comb. of a_k and {q_l}_{l=1}^{k−1}
⇒ v_k lin. comb. of {a_l}_{l=1}^k
⇒ q_k lin. comb. of {a_l}_{l=1}^k

therefore q_j = lin. comb. of {a_i}_{i=1}^j, j = 1→k.
Similarly a_j = lin. comb. of {q_i}_{i=1}^j, j = 1→k.
Ex. n = m = 2

a₁ = (3, −4)ᵀ, a₂ = (1, 2)ᵀ (clearly lin. ind.)

First step: q₁ = a₁/‖a₁‖. First we need to work out the length of a₁:

‖a₁‖² = 〈a₁, a₁〉 = a₁ᵀa₁ = 3² + (−4)² = 25 ⇒ ‖a₁‖ = 5

so q₁ = (1/5)(3, −4)ᵀ, with ‖q₁‖ = 1.

v₂ = a₂ − 〈a₂, q₁〉q₁ (9)

First calculate 〈a₂, q₁〉 = a₂ᵀq₁ = (1/5)(3 − 8) = −1.

Now put that back into (9):

v₂ = a₂ + q₁ = (1, 2)ᵀ + (1/5)(3, −4)ᵀ = (1/5)(8, 6)ᵀ

so ‖v₂‖² = 〈v₂, v₂〉 = v₂ᵀv₂ = (8/5)² + (6/5)² = 100/25 = 4

⇒ ‖v₂‖ = 2 ⇒ q₂ = v₂/‖v₂‖ = (1/5)(4, 3)ᵀ

{(3, −4)ᵀ, (1, 2)ᵀ} →CGS→ {(1/5)(3, −4)ᵀ, (1/5)(4, 3)ᵀ}
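The CGS algorithm (8) is short to code. A minimal plain-Python sketch (any language would do; MATLAB is suggested for the projects), checked against the worked example above:

```python
# Classical Gram-Schmidt: lin. ind. {a_i} in R^m -> orthonormal {q_i}.
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

def cgs(vectors):
    """Apply v_k = a_k - sum_{l<k} <a_k, q_l> q_l, then normalise."""
    qs = []
    for a in vectors:
        v = list(a)
        for q in qs:
            c = inner(a, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        nv = norm(v)  # nonzero precisely because the inputs are lin. ind.
        qs.append([vi / nv for vi in v])
    return qs

q1, q2 = cgs([[3.0, -4.0], [1.0, 2.0]])
# matches the example: q1 = (1/5)(3,-4)^T, q2 = (1/5)(4,3)^T
for got, want in zip(q1 + q2, [0.6, -0.8, 0.8, 0.6]):
    assert abs(got - want) < 1e-12
```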
1.3 QR Factorisation

{a_i}_{i=1}^n lin. ind. →CGS→ {q_i}_{i=1}^n, a_i ∈ R^m, i = 1→n (lin. ind. ⇒ n ≤ m)

Look at this from a different viewpoint. Let

A = (a₁, a₂, ..., a_n) ∈ R^{m×n}
Q̂ = (q₁, q₂, ..., q_n) ∈ R^{m×n}

Let R̂ ∈ R^{n×n} be the upper triangular matrix

R̂_lk = r_lk if l ≤ k, 0 if l > k

(r_lk will be determined later)
Let e_k^(n) ∈ R^n ((n) is to stress it is in R^n as opposed to R^m): a 1 in the kth row, zeros elsewhere, i.e.

(e_k^(n))_j = δ_jk, j,k = 1→n

For B = (b₁, b₂, ..., b_n) ∈ R^{m×n}: B e_k^(n) = b_k ∈ R^m, the kth column of B.

Q̂ R̂ e_k^(n) = Q̂ (r_1k, r_2k, ..., r_kk, 0, ..., 0)ᵀ = ∑_{l=1}^k r_lk q_l (10)

CGS ⇒

a₁ = v₁ = ‖v₁‖q₁, where r₁₁ = ‖v₁‖

a_k = v_k + ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l = ‖v_k‖q_k + ∑_{l=1}^{k−1} 〈a_k, q_l〉 q_l = ∑_{l=1}^k r_lk q_l

where r_kk = ‖v_k‖ and r_lk = 〈a_k, q_l〉, l = 1→k−1.

therefore

A e_k^(n) = a_k = ∑_{l=1}^k r_lk q_l =(10)= Q̂ R̂ e_k^(n)
R̂ is the upper triangular n×n matrix with coefficients as above.

therefore the columns of A and Q̂R̂ are the same, therefore A = Q̂R̂ (A ∈ R^{m×n}, Q̂ ∈ R^{m×n}, R̂ ∈ R^{n×n}).

Q̂ has orthonormal columns, whereas R̂ is a square, upper triangular matrix whose diagonal entries are the lengths of the v_k (so r_kk = ‖v_k‖ > 0, k = 1→n); since the v's are non-trivial, R̂ has strictly positive diagonal entries.

therefore CGS yields a factorisation of A:

A = Q̂R̂

If m > n: A rectangular, Q̂ rectangular, R̂ square. With n ≤ m this is the REDUCED QR FACTORISATION of A.

QR Factorisation of A:

A = QR, Q ∈ R^{m×m}, R ∈ R^{m×n}
where

Q = [Q̂ q_{n+1} ... q_m] ∈ R^{m×m}   (first n columns Q̂, then m−n extra columns)

with {q_j}_{j=n+1}^m chosen so that all columns of Q are orthonormal:

〈q_i, q_j〉 = δ_ij, i,j = 1→m

and

R = [R̂ ; 0] ∈ R^{m×n}   (R̂ ∈ R^{n×n} above an (m−n)×n block of zeros)

Then

QR = [Q̂ q_{n+1} ... q_m][R̂ ; 0] = Q̂R̂ = A

A = QR, A ∈ R^{m×n}, Q ∈ R^{m×m}, R ∈ R^{m×n}
Note:

QᵀQ has rows q_jᵀ against columns q_k:

(QᵀQ)_jk = q_jᵀ q_k = 〈q_j, q_k〉 = δ_jk, j,k = 1→m

QᵀQ = I^(m) ∈ R^{m×m}, the identity matrix

therefore Qᵀ = Q⁻¹, therefore QᵀQ = I^(m) = QQᵀ

therefore the columns of Q orthonormal ⇔ rows of Q orthonormal (columns of Qᵀ orthonormal)

Definition: Q ∈ R^{m×m} is called ORTHOGONAL if QᵀQ = I^(m) = QQᵀ (orthonormal would be a better name, however due to historical reasons it is named orthogonal)
Definition: If A ∈ R^{m×n} and A = QR, where Q ∈ R^{m×m} is orthogonal and R ∈ R^{m×n} is an upper triangular matrix, then we say that we have a QR factorisation of A.

Proposition

Orthogonal matrices preserve length and angle: if Q ∈ R^{m×m} and QᵀQ = I^(m), then ∀v,w ∈ R^m

〈Qv, Qw〉 = 〈v,w〉   'angle' (⋆)

and

‖Qv‖ = ‖v‖   'length' (⋆⋆)

Proof:

〈Qv, Qw〉 = (Qv)ᵀQw = vᵀQᵀQw = vᵀI^(m)w = vᵀw = 〈v,w〉

‖Qv‖ = [〈Qv, Qv〉]^{1/2} = [〈v,v〉]^{1/2} = ‖v‖ (by the above)

Geometric vectors: a · b = |a||b| cos θ

see diagram 20081009.M2AA3.2

One can show, see section 1.4 Cauchy-Schwarz, that

〈v,w〉 = ‖v‖‖w‖ cos θ

〈Qv, Qw〉 = ‖Qv‖‖Qw‖ cos φ =(⋆⋆)= ‖v‖‖w‖ cos φ

But (⋆) ⇒ cos θ = cos φ ⇒ θ = φ, as θ,φ ∈ [0, π].

Proposition

Q₁, Q₂ ∈ R^{m×m} orthogonal, i.e. Q₁ᵀQ₁ = I^(m) = Q₂ᵀQ₂; then Q₁Q₂ ∈ R^{m×m} is orthogonal.

Proof:

(Q₁Q₂)ᵀQ₁Q₂ = Q₂ᵀQ₁ᵀQ₁Q₂ = Q₂ᵀI^(m)Q₂ = Q₂ᵀQ₂ = I^(m)

therefore Q₁Q₂ is orthogonal.
Ex: (rotation matrices) m = 2

Q = ( cos θ  −sin θ ; sin θ  cos θ ) = (q₁, q₂)

therefore 〈q_i, q_j〉 = δ_ij, i,j = 1→2

therefore Q is orthogonal, as its columns are orthonormal. Q represents rotation through an angle θ.

Write (x, y)ᵀ = l (cos φ, sin φ)ᵀ, where l = (x² + y²)^{1/2}, and let (a, b)ᵀ = Q (x, y)ᵀ.

To transform (x, y)ᵀ → (l, 0)ᵀ, choose θ = −φ:

cos θ = cos(−φ) = cos φ = x/l
sin θ = sin(−φ) = −sin φ = −y/l

therefore

Q = ( x/l  y/l ; −y/l  x/l ), where l = (x² + y²)^{1/2}

therefore the rotation matrix

Q = (1/(x² + y²)^{1/2}) ( x  y ; −y  x )   (orthogonal)

takes (x, y)ᵀ → ((x² + y²)^{1/2}, 0)ᵀ.
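As a quick numerical illustration (a sketch in plain Python; the function name is my own), the 2×2 matrix above does send (x, y)ᵀ to (l, 0)ᵀ:

```python
# Q = (1/l) [[x, y], [-y, x]] maps (x, y)^T to (l, 0)^T, l = sqrt(x^2 + y^2).
import math

def rotate_to_axis(x, y):
    l = math.hypot(x, y)            # l = (x^2 + y^2)^(1/2)
    Q = [[x / l, y / l],
         [-y / l, x / l]]           # orthogonal: columns are orthonormal
    b = [Q[0][0] * x + Q[0][1] * y,
         Q[1][0] * x + Q[1][1] * y]
    return Q, b

Q, b = rotate_to_axis(3.0, 4.0)
# (3, 4) -> (5, 0), the same computation used in the Givens example below
assert abs(b[0] - 5.0) < 1e-12 and abs(b[1]) < 1e-12
```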
For 1 ≤ p < q ≤ m introduce G_pq(θ) ∈ R^{m×m}: the identity matrix except in rows/columns p and q, where

(G_pq)_pp = cos θ, (G_pq)_pq = −sin θ, (G_pq)_qp = sin θ, (G_pq)_qq = cos θ

Column by column:

G_pq(θ) e_j^(m) = e_j^(m) if j ≠ p, q

G_pq(θ) e_p^(m) = (0, ..., cos θ [pth], ..., sin θ [qth], ..., 0)ᵀ

G_pq(θ) e_q^(m) = (0, ..., −sin θ [pth], ..., cos θ [qth], ..., 0)ᵀ

Each column of G_pq(θ) has unit length, and the columns are also orthogonal;

therefore the columns of G_pq(θ) are orthonormal

⇒ G_pq(θ) ∈ R^{m×m} is an orthogonal matrix.
For a ∈ R^m:

G_pq(θ) a = b ⇒ b_j = a_j if j ≠ p, q; b_p = cos θ a_p − sin θ a_q; b_q = sin θ a_p + cos θ a_q

Similarly G_pq(θ) A = B: all rows of B are the same as A except rows p and q.

G_pq(θ) are called GIVENS Rotation Matrices (circa 1950).

Obtain a QR factorisation of A using a sequence of Givens rotations (an alternative procedure: Householder reflections).
Ex: m = 3, n = 2

A = ⎛ 3  65 ⎞
    ⎜ 4   0 ⎟
    ⎝12  13 ⎠

Take a sequence of Givens rotations so that

A → ⎛ ×  × ⎞
    ⎜ 0  × ⎟ = R
    ⎝ 0  0 ⎠

Choose G₁₂(θ) such that

G₁₂(θ)A = ⎛ ×  × ⎞
          ⎜ 0  × ⎟   (last row not affected)
          ⎝12  13 ⎠

G₁₂(θ) = ⎛ cos θ  −sin θ  0 ⎞
         ⎜ sin θ   cos θ  0 ⎟
         ⎝   0       0    1 ⎠

x = 3, y = 4, l = 5:

(1/5) ( 3  4 ; −4  3 ) (3, 4)ᵀ = (5, 0)ᵀ

therefore choose

G₁₂(θ) = ⎛ 3/5  4/5  0 ⎞
         ⎜−4/5  3/5  0 ⎟
         ⎝  0    0   1 ⎠

A^(1) = G₁₂(θ)A = ⎛ 5  39 ⎞
                  ⎜ 0 −52 ⎟
                  ⎝12  13 ⎠
Use a rotation matrix to obtain 0 in row 3, column 1: choose either G₁₃(φ) or G₂₃(φ)? Choose G₁₃(φ), as G₂₃(φ) would affect row 2, column 1, which would be counterproductive.

G₁₃(φ) = ⎛ cos φ  0  −sin φ ⎞
         ⎜   0    1     0   ⎟
         ⎝ sin φ  0   cos φ ⎠

Choose φ based on x = 5 and y = 12 ⇒ l = 13:

G₁₃(φ) = ⎛  5/13  0  12/13 ⎞
         ⎜    0   1     0  ⎟
         ⎝−12/13  0   5/13 ⎠

A^(2) = G₁₃(φ)A^(1) = ⎛13  27 ⎞
                      ⎜ 0 −52 ⎟
                      ⎝ 0 −31 ⎠

Now use G₁₃(ψ) or G₂₃(ψ)? G₁₃(ψ) would mess up the 0 in position (3,1), therefore use G₂₃(ψ).

x = −52, y = −31 ⇒ l = √3665

A^(3) = G₂₃(ψ)A^(2) = ⎛13    27  ⎞
                      ⎜ 0  √3665 ⎟ = R, upper triangular
                      ⎝ 0    0   ⎠

with strictly positive diagonal entries.

Note: G_pq(.) makes the (q,p)th element in the current A zero.

Therefore

R = A^(3) = G₂₃(ψ)A^(2) = G₂₃(ψ)G₁₃(φ)A^(1) = G₂₃(ψ)G₁₃(φ)G₁₂(θ)A = GA

G is a product of Givens rotations; each G_pq(.) is orthogonal; therefore G is orthogonal (a product of orthogonal matrices):

GᵀG = I = GGᵀ
therefore

GA = R ⇒ GᵀGA = GᵀR ⇒ A = QR, where Q = Gᵀ

Note: QᵀQ = (Gᵀ)ᵀGᵀ = GGᵀ = I, therefore Q orthogonal, therefore it is a QR Factorisation of A.

General A ∈ R^{m×n} with m ≥ n: apply a sequence of Givens rotations to take A to R ∈ R^{m×n}, upper triangular with strictly positive diagonal entries:

GA = R, where

G = (G_nm ... G_{n,n+1}) ... (G_2m ... G_23)(G_1m ... G_12)
      [column n]              [column 2]      [column 1]

G_pq makes the (q,p)th element zero; if y = 0 already, then G_pq = I. G_pq ∈ R^{m×m}.

Let Q = Gᵀ ∈ R^{m×m} ⇒ Q is orthogonal.

GA = R ⇒ A = QR (QR factorisation)
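The column-by-column sweep above can be sketched in plain Python (an illustration, not the course's prescribed implementation), and checked against the 3×2 worked example:

```python
# QR by Givens rotations: G_pq zeroes the (q,p) entry; Q accumulates G^T.
import math

def givens_qr(A):
    """Return (Q, R) with A = Q R, A given as a list of rows."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for p in range(n):                 # column p
        for q in range(p + 1, m):      # zero entry (q, p)
            x, y = R[p][p], R[q][p]
            if y == 0.0:
                continue               # G_pq = I in this case
            l = math.hypot(x, y)
            c, s = x / l, -y / l       # cos, sin sending (x, y) -> (l, 0)
            for j in range(n):         # rows p, q of R change: R <- G R
                rp, rq = R[p][j], R[q][j]
                R[p][j] = c * rp - s * rq
                R[q][j] = s * rp + c * rq
            for j in range(m):         # columns p, q of Q change: Q <- Q G^T
                qp, qq = Q[j][p], Q[j][q]
                Q[j][p] = c * qp - s * qq
                Q[j][q] = s * qp + c * qq
    return Q, R

A = [[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]]
Q, R = givens_qr(A)
# matches the worked example: R = [[13, 27], [0, sqrt(3665)], [0, 0]]
assert abs(R[0][0] - 13.0) < 1e-9 and abs(R[0][1] - 27.0) < 1e-9
assert abs(R[1][1] - math.sqrt(3665.0)) < 1e-9
```

Updating Q by right-multiplication with Gᵀ keeps the invariant QR = A at every step.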
We might be interested in solving

Ax = b, A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m (m ≥ n)

Apply G ∈ R^{m×m} to Ax = b:

GAx = Gb ⇒ Rx = c ∈ R^{m×1} (an equivalent system to Ax = b)

see diagram 20081028.M2AA3.1

If m > n and c_i ≠ 0 for some i = n+1→m, there is no solution to Rx = c (≡ there is no solution to Ax = b): an INCONSISTENT SYSTEM. (Return to this later in the course.)

Otherwise, i.e. c_i = 0, i = n+1→m, ∃! x ∈ R^n (! = unique) such that Ax = b (Rx = c).

Solve by backward substitution:

x_n = c_n/r_nn

x_i = (c_i − ∑_{j=i+1}^n r_ij x_j)/r_ii, i = n−1, n−2, ..., 2, 1

This is all that's needed to do the questions on sheet 1.
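The backward substitution formulas above translate directly into code; a minimal plain-Python sketch:

```python
# Solve R x = c for upper triangular R with nonzero diagonal,
# working from the last row upwards as in the formulas above.
def back_substitute(R, c):
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

# e.g. R = [[2, 1], [0, 4]], c = [4, 8]  ->  x = [1, 2]
assert back_substitute([[2.0, 1.0], [0.0, 4.0]], [4.0, 8.0]) == [1.0, 2.0]
```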
1.4 Cauchy-Schwarz Inequality

For geometric vectors in R³: a · b = |a||b| cos θ

see diagram 20081009.M2AA3.2

Generalises to R^n:

〈a,b〉 = aᵀb = ‖a‖‖b‖ cos θ ⇒ |〈a,b〉| = ‖a‖‖b‖ |cos θ| ≤ ‖a‖‖b‖

Theorem (Cauchy-Schwarz Inequality)

For any vectors a,b ∈ R^n,

|〈a,b〉| ≤ ‖a‖‖b‖

with equality if and only if a and b are linearly dependent.

Proof: If a = 0, then 〈a,b〉 = 0 and ‖a‖ = 0, so the result is trivial.

If a ≠ 0, let q = a/‖a‖ ⇒ 〈q,q〉 = ‖q‖² = 1.

Let c = b − 〈b,q〉q

⇒ 〈c,q〉 = 〈b − 〈b,q〉q, q〉 =(3)= 〈b,q〉 − 〈b,q〉〈q,q〉 = 0

see diagram 20081028.M2AA3.2

0 ≤ ‖c‖² = 〈c,c〉 = 〈c, b − 〈b,q〉q〉 =(2)= 〈c,b〉 − 〈b,q〉〈c,q〉 = 〈c,b〉
= 〈b − 〈b,q〉q, b〉 =(3)= 〈b,b〉 − 〈b,q〉〈q,b〉 = ‖b‖² − [〈q,b〉]²

∴ [〈q,b〉]² ≤ ‖b‖² ∴ [〈a,b〉]² ≤ ‖a‖²‖b‖²

Taking square roots ⇒ the desired result.

Equality if and only if c = 0, i.e. b a multiple of q, i.e. b a multiple of a, i.e. a and b lin. dep.
1.5 Gradients and Hessians

f : R → R, f(x), one independent variable

see diagram 20081029.M2AA3.1

Taylor Series:

f(a + h) = f(a) + h f′(a) + (h²/2) f″(a) + R, where |R| ≤ Ch³; write R = O(h³)
We want to generalise this to functions of n independent variables:

f : R^n → R, f(x₁, x₂, ..., x_n)

Write f(x), where x = (x₁, ..., x_n)ᵀ ∈ R^n.

Partial derivative of f with respect to x_i: write ∂f/∂x_i (x)

(differentiate f with respect to x_i, holding x₁, ..., x_{i−1}, x_{i+1}, ..., x_n as constants)

Ex. n = 2, f(x₁, x₂), x = (x₁, x₂)ᵀ ∈ R²

f(x) = sin x₁ sin x₂

∂f/∂x₁ (x) = cos x₁ sin x₂
∂f/∂x₂ (x) = sin x₁ cos x₂

∂²f/∂x_i∂x_j = ∂/∂x_i [∂f/∂x_j] =(⋆)= ∂/∂x_j [∂f/∂x_i] = ∂²f/∂x_j∂x_i, i,j = 1→n

(⋆): if both derivatives exist and are continuous
Ex.

∂²f/∂x₂∂x₁ (x) = ∂/∂x₂ [∂f/∂x₁ (x)] = cos x₁ cos x₂

∂²f/∂x₁∂x₂ (x) = ∂/∂x₁ [∂f/∂x₂ (x)] = cos x₁ cos x₂ (equal, as expected)

∂²f/∂x₁² = ∂/∂x₁ [∂f/∂x₁ (x)] = −sin x₁ sin x₂

∂²f/∂x₂² = ∂/∂x₂ [∂f/∂x₂ (x)] = −sin x₁ sin x₂
Chain Rule

(n = 1) f : R → R, f(x)

Change variables t = t(x) ⇔ x = x(t), e.g. x(t) = t² ⇔ t(x) = x^{1/2}

Let w(t) = f(x(t)):

dw/dt (t) = df/dx (x(t)) · dx/dt (t)

Extend this to

f : R^n → R, f(x), x = (x₁, ..., x_n)ᵀ ∈ R^n
∈ RnExample:
x (t) =√
a + t
√
h (√
= fixed)
see diagram 20081029.M2AA3.2
⇒ xi (t) = ai + thi i = 1→ n
In generalf (x) , x (t)
Let w (t) = f (x (t))
dw
dt(t) =
δf
δx1(x (t))
dx1
dt(t) + . . .
δf
δxn(x (t))
dxndt
(t)
dw
dt(t) =
n∑i=1
δf
δxi(x (t))
dxidt
(t) (11)
Ex. n = 2, f(x₁, x₂), x = (x₁, x₂)ᵀ ∈ R²

f(x) = sin x₁ sin x₂, x₁(t) = t², x₂(t) = cos t

⇒ w(t) = f(x(t)) = sin t² · sin(cos t)   (= u·v)

dw/dt (t) = cos t² · sin(cos t) · 2t + sin t² · cos(cos t) · (−sin t)   (= u′v + uv′)

= ∂f/∂x₁ (x(t)) dx₁/dt (t) + ∂f/∂x₂ (x(t)) dx₂/dt (t)
Going back to the example:

f : R^n → R, f(x), x = (x₁, ..., x_n)ᵀ ∈ R^n, x(t) = a + t h

see diagram 20081029.M2AA3.3

⇒ x_i(t) = a_i + t h_i ⇒ dx_i/dt (t) = h_i, i = 1→n

Let

w(t) = f(x(t)) = f(a + t h) (12)

see diagram 20081029.M2AA3.4

Taylor series for w(t) ⇒

w(1) = w(0) + 1 · w′(0) + (1²/2) w″(0) + ...
     = w(0) + w′(0) + (1/2) w″(0) + ...

(11), (12) ⇒

f(a + h) = f(a) + ∑_{i=1}^n ∂f/∂x_i (a) h_i + ... (13)
From (11),

dw/dt (t) = ∑_{i=1}^n h_i ∂/∂x_i f(x(t))

∴ d/dt ≡ ∑_{i=1}^n h_i ∂/∂x_i ⇒ (d/dt)^m ≡ (∑_{i=1}^n h_i ∂/∂x_i)^m

∴ d²w/dt² (t) = ∑_{j=1}^n h_j ∂/∂x_j (∑_{i=1}^n h_i ∂/∂x_i) f(x(t)) = ∑_{j=1}^n ∑_{i=1}^n h_j h_i ∂²f/∂x_j∂x_i (x(t))

⇒ w″(0) = ∑_{i=1}^n ∑_{j=1}^n h_i h_j ∂²f/∂x_j∂x_i (a)

Inserting this into (13):

⇒ f(a + h) = f(a) + ∑_{i=1}^n h_i ∂f/∂x_i (a) + (1/2) ∑_{i=1}^n ∑_{j=1}^n h_i h_j ∂²f/∂x_j∂x_i (a) + O(‖h‖³)

Compare this with the n = 1 Taylor series above.
We introduce the GRADIENT of f (grad f, the vector of first order partial derivatives):

∇f(x) ∈ R^n, ∇f(x) = (∂f/∂x₁ (x), ..., ∂f/∂x_n (x))ᵀ

i.e.

[∇f(x)]_i = ∂f/∂x_i (x), i = 1→n

Introduce the HESSIAN of f (the matrix of second derivatives):

D²f(x) ∈ R^{n×n}, [D²f(x)]_ij = ∂²f/∂x_i∂x_j (x), i,j = 1→n

"smooth" f ⇒ D²f(x) is symmetric

n = 2:

D²f(x) = ( ∂²f/∂x₁² (x)    ∂²f/∂x₁∂x₂ (x) ;
           ∂²f/∂x₂∂x₁ (x)  ∂²f/∂x₂² (x) )
For A ∈ R^{n×n} with entries A_ij and x ∈ R^n:

[Ax]_i = ∑_{j=1}^n A_ij x_j

xᵀAx = xᵀ(Ax) = ∑_{i=1}^n x_i (Ax)_i = ∑_{i=1}^n ∑_{j=1}^n x_i A_ij x_j

∴ f(a + h) = f(a) + hᵀ∇f(a) + (1/2) hᵀD²f(a)h + O(‖h‖³)
Ex.

f(x) = xᵀAx ∀x ∈ R^n, where A ∈ R^{n×n} and is symmetric; ∴ f : R^n → R.

Find (i) ∇f(x) (the gradient of f), (ii) D²f(x) (the Hessian of f).

(i)

f(x) = xᵀAx = ∑_{i=1}^n ∑_{j=1}^n A_ij x_i x_j

[∇f(x)]_p = ∂f/∂x_p (x) = ∑_{i=1}^n ∑_{j=1}^n A_ij ∂/∂x_p (x_i x_j)

∂/∂x_p (x_i x_j) = (∂x_i/∂x_p) x_j + x_i (∂x_j/∂x_p)

x₁, ..., x_n are independent variables ⇒ ∂x_i/∂x_p = δ_ip, i,p = 1→n

∴ [∇f(x)]_p = ∑_{i=1}^n ∑_{j=1}^n A_ij (δ_ip x_j + x_i δ_jp) = ∑_{j=1}^n A_pj x_j + ∑_{i=1}^n A_ip x_i = [Ax]_p + [Aᵀx]_p

⇒ ∇f(x) = Ax + Aᵀx = 2Ax if Aᵀ = A

(ii)

[D²f(x)]_qp = ∂²f/∂x_q∂x_p (x)

We know that ∂f/∂x_p (x) = ∑_{j=1}^n A_pj x_j + ∑_{i=1}^n (Aᵀ)_pi x_i

⇒ ∂²f/∂x_q∂x_p (x) = ∑_{j=1}^n A_pj δ_jq + ∑_{i=1}^n (Aᵀ)_pi δ_iq = A_pq + (Aᵀ)_pq

Note: δ is the Kronecker delta; ∂f/∂x is a partial derivative.

∴ D²f(x) = A + Aᵀ = 2A if Aᵀ = A

∴ for f(x) = xᵀAx, f : R^n → R (Aᵀ = A):

∇f(x) = 2Ax ∈ R^n, D²f(x) = 2A ∈ R^{n×n}

Analogue of f(x) = ax², a ∈ R, f : R → R: f′(x) = 2ax, f″(x) = 2a.
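The formula ∇f(x) = 2Ax for f(x) = xᵀAx (A symmetric) can be sanity-checked numerically; below is a sketch using a central finite difference (the helper names and the particular A, x are my own choices):

```python
# Compare the exact gradient 2Ax of f(x) = x^T A x (A symmetric)
# with a central-difference approximation.
def f(A, x):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def grad_fd(A, x, h=1e-6):
    g = []
    for p in range(len(x)):
        xp, xm = x[:], x[:]
        xp[p] += h
        xm[p] -= h
        g.append((f(A, xp) - f(A, xm)) / (2 * h))  # central difference
    return g

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric
x = [1.0, -2.0]
exact = [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]  # 2Ax
approx = grad_fd(A, x)
assert all(abs(e - a) < 1e-4 for e, a in zip(exact, approx))
```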
Definition:

f : R^n → R has a LOCAL MINIMUM (MAXIMUM) at x = a if ∀u ∈ R^n with ‖u‖ = 1, ∃ε > 0 such that

f(a + hu) ≥ (≤) f(a) ∀h ∈ [0, ε]

see diagram 20081030.M2AA3.1

n = 1, f : R → R

see diagram 20081030.M2AA3.2

Reminder: Taylor Series

f(a + h) = f(a) + hᵀ∇f(a) + (1/2) hᵀD²f(a)h + O(‖h‖³) (14)

Proposition

Let n = 1. Then f′(a) = 0 and f″(a) > 0 (< 0) are sufficient conditions for f to have a local minimum (maximum) at x = a.

Proof: n = 1 ⇒ u = ±1

see diagram 20081104.M2AA3.1

f(a ± h) = f(a) ± h f′(a) + (1/2)(±h)² f″(a) + O(h³)
         = f(a) + (1/2) h² f″(a) + O(h³)   (as f′(a) = 0)

≥ (≤) f(a) for h sufficiently small, since the (1/2)h²f″(a) term dominates the O(h³) remainder

⇒ x = a is a local minimum (maximum) of f.
Proposition

If ∇f(a) ≠ 0, then f(x) does not have a local minimum or maximum at x = a.

Proof: Put h = hu, ‖u‖ = 1, in (14):

⇒ f(a + hu) = f(a) + h uᵀ∇f(a) + O(h²) for h ≥ 0

∇f(a) ≠ 0; let

u = ±∇f(a)/‖∇f(a)‖ ⇒ ‖u‖ = 1

∴ f(a ± h ∇f(a)/‖∇f(a)‖) = f(a) ± (h/‖∇f(a)‖)‖∇f(a)‖² + O(h²)
= f(a) ± h‖∇f(a)‖ + O(h²)

> (<) f(a) for h sufficiently small, so there is no local min or max.

∴ ∇f(a) = 0 is a necessary condition for f(x) to have a local minimum or maximum at x = a. Points a where ∇f(a) = 0 are called stationary points of f(x).
Proposition

If ∇f(a) = 0 and

wᵀD²f(a)w > 0 (< 0) ∀w ∈ R^n, w ≠ 0

then f(x) has a local minimum (maximum) at x = a.

Proof: h = hu, ‖u‖ = 1, in (14):

⇒ f(a + hu) = f(a) + h uᵀ∇f(a) [= 0, as ∇f(a) = 0] + (1/2) h² uᵀD²f(a)u + O(h³)

≥ (≤) f(a) for h suff. small, since uᵀD²f(a)u > 0 (< 0) for u ≠ 0 ⇒ local min (max).
Ex.

n = 2, x = (x₁, x₂)ᵀ ∈ R²

f(x) = (x₁² − 2x₁ + 1) + (x₂² − 2x₂ + 1), f : R² → R

∇f(x) = (∂f/∂x₁ (x), ∂f/∂x₂ (x))ᵀ = (2x₁ − 2, 2x₂ − 2)ᵀ = 2(x₁ − 1, x₂ − 1)ᵀ ∈ R²

∇f(a) = 0 ⇔ a = (1, 1)ᵀ: only one stationary point.

Find

D²f(x) = ( ∂²f/∂x₁²  ∂²f/∂x₁∂x₂ ; ∂²f/∂x₂∂x₁  ∂²f/∂x₂² ) = ( 2  0 ; 0  2 ) = 2I ∈ R^{2×2}

wᵀD²f(a)w = 2wᵀw = 2‖w‖² > 0 ∀w ≠ 0

∴ a = (1, 1)ᵀ is a local minimum (also a global minimum, as it's the only stationary point).

[Obvious, as f(x) = (x₁ − 1)² + (x₂ − 1)².]
Definition

A ∈ R^{n×n} is called positive definite if

xᵀAx > 0 ∀x ∈ R^n, x ≠ 0

or negative definite if

xᵀAx < 0 ∀x ∈ R^n, x ≠ 0

or non-negative definite if

xᵀAx ≥ 0 ∀x ∈ R^n

or non-positive definite if

xᵀAx ≤ 0 ∀x ∈ R^n

Example:

n = 2, A = ( 1  −1 ; −1  1 ), x = (x₁, x₂)ᵀ ∈ R²

xᵀAx = ∑_{i=1}^2 ∑_{j=1}^2 A_ij x_i x_j = x₁² + x₂² − 2x₁x₂ = (x₁ − x₂)² ≥ 0 ∀x ∈ R²

∴ A is non-negative definite but not positive definite: e.g. a = (1, 1)ᵀ gives aᵀAa = 0.

Using these definitions, we can rewrite the above proposition.
Proposition

If ∇f(a) = 0 and D²f(a) is positive (negative) definite, then f(x) has a local minimum (maximum) at x = a.
1.6 Inner Products Revisited and Positive Definite Matrices

Let A ∈ R^{n×n} be symmetric (Aᵀ = A) and positive definite (xᵀAx > 0 ∀x ∈ R^n, x ≠ 0).

Generalise the idea of an inner product by defining

〈u,v〉_A = uᵀAv ∀u,v ∈ R^n

(previously 〈u,v〉 ≡ 〈u,v〉_I = uᵀIv = uᵀv)

Make sure the properties of the inner product still hold with this new definition: 〈.,.〉_A : R^n × R^n → R

〈v,u〉_A = vᵀAu = (vᵀAu)ᵀ = uᵀAᵀv = uᵀAv = 〈u,v〉_A   (using Aᵀ = A)

∴ 〈v,u〉_A = 〈u,v〉_A ∀u,v ∈ R^n (symmetric). Easy to show that

〈u, αv + βw〉_A = α〈u,v〉_A + β〈u,w〉_A
〈αu + βv, w〉_A = α〈u,w〉_A + β〈v,w〉_A     ∀u,v,w ∈ R^n, ∀α,β ∈ R

Introduce the idea of a generalised norm (length) by defining

‖u‖_A = [〈u,u〉_A]^{1/2} ∀u ∈ R^n

Note:

〈u,u〉_A = uᵀAu > 0 if u ≠ 0, and = 0 if and only if u = 0

∴ ‖u‖_A ≥ 0 ∀u ∈ R^n, with ‖u‖_A = 0 if and only if u = 0.

A key property of positive definite matrices is that they are invertible, i.e.

Ax = 0 ⇒ xᵀAx = 0 ⇒ x = 0

⇒ the columns of A are linearly independent ⇒ A⁻¹ exists.
Theorem (Generalised Cauchy-Schwarz Inequality)

If A ∈ R^{n×n} is symmetric positive definite, then

|〈u,v〉_A| ≤ ‖u‖_A ‖v‖_A ∀u,v ∈ R^n

with equality if and only if u and v are linearly dependent.

Proof: Simply replace 〈.,.〉 by 〈.,.〉_A and ‖.‖ by ‖.‖_A in the original proof.

It is easy to generate symmetric positive definite matrices. Given P ∈ R^{n×n} which is invertible, i.e. P⁻¹ exists, then A = PᵀP is symmetric positive definite.

Check:

A ∈ R^{n×n} ✓

Aᵀ = (PᵀP)ᵀ = Pᵀ(Pᵀ)ᵀ = PᵀP = A ✓

For any x ∈ R^n: xᵀAx = xᵀPᵀPx = (Px)ᵀ(Px) = ‖Px‖² ≥ 0

Note ‖Px‖ = 0 ⇔ Px = 0 ⇔ x = 0, as P⁻¹ exists. ✓

∴ A is positive definite.
We now prove the reverse implication.

Theorem

Let A ∈ R^{n×n} be any symmetric positive definite matrix. Then ∃ an invertible P ∈ R^{n×n} such that A = PᵀP. Furthermore we can choose P to be upper triangular with P_ii > 0, i = 1→n (diagonal entries strictly positive), in which case we say that A = PᵀP is a Cholesky Decomposition/Factorisation of A.

Proof:

Let {v_i}_{i=1}^n be any n linearly independent vectors in R^n. Using the inner product 〈.,.〉_A induced by A, 〈a,b〉_A = aᵀAb, we apply the Classical Gram-Schmidt process to {v_i}_{i=1}^n:

u₁ = v₁/‖v₁‖_A ⇒ ‖u₁‖_A = 1

w_i = v_i − ∑_{j=1}^{i−1} 〈v_i, u_j〉_A u_j ⇒ 〈w_i, u_j〉_A = 0, j = 1→i−1, i = 2→n

u_i = w_i/‖w_i‖_A ⇒ ‖u_i‖_A = 1, i = 2→n
⇒ u_i is a linear combination of {v_j}_{j=1}^i, i = 1→n, and

〈u_i, u_j〉_A = δ_ij, i,j = 1→n

Let

U = [u₁, u₂, ..., u_n] ∈ R^{n×n} ⇒ AU = [Au₁, Au₂, ..., Au_n] ∈ R^{n×n}

[UᵀAU]_ij = u_iᵀ (ith row of Uᵀ) · Au_j (jth col. of AU) = 〈u_i, u_j〉_A = δ_ij, i,j = 1→n

∴ UᵀAU = I^(n)

∴ U⁻¹ = UᵀA exists ⇒ (U⁻¹)ᵀ = (UᵀA)ᵀ = AᵀU = AU

Let P = U⁻¹ ∈ R^{n×n} (P⁻¹ = U exists):

PᵀP = (U⁻¹)ᵀ U⁻¹ = AU U⁻¹ = A ✓
To show that we can choose P upper triangular with P_ii > 0, i = 1→n, we choose particular {v_i}_{i=1}^n.

Let v_i = e_i^(n) ∈ R^n (1 in the ith position), (e_i^(n))_j = δ_ij, i,j = 1→n

u₁ is a multiple of e₁ = (×, 0, ..., 0)ᵀ

u_i is a linear combination of {e_j^(n)}_{j=1}^i, i = 2→n

∴ u_i ∈ R^n with (u_i)_k = 0 if k > i, i = 1→n

⇒ U = [u₁, u₂, ..., u_n] ∈ R^{n×n} upper triangular
We now show that (u_i)_i > 0, i = 1→n.

u₁ = e₁^(n)/‖e₁^(n)‖_A = (×, 0, ..., 0)ᵀ, with × strictly positive

u_i = w_i/‖w_i‖_A ∴ (u_i)_i > 0 if and only if (w_i)_i > 0, i = 2→n

w_i = e_i^(n) − ∑_{j=1}^{i−1} 〈e_i^(n), u_j〉_A u_j

(u_j)_k = 0 if k > j, j = 1→n

⇒ (w_i)_i = (e_i^(n))_i = 1 > 0

∴ U ∈ R^{n×n} is upper triangular with U_ii = (u_i)_i > 0, i = 1→n.

Find P = U⁻¹ = [p₁, p₂, ..., p_n]:

UP = I^(n) = [e₁^(n), e₂^(n), ..., e_n^(n)]

i.e. [Up₁, Up₂, Up₃, ...] = [e₁^(n), e₂^(n), ...]

i.e.

U p_i = e_i^(n), i = 1→n (15)

Solve by backwards substitution:

∴ (p_i)_n = (p_i)_{n−1} = ... = (p_i)_{i+1} = 0

i.e. (p_i)_k = 0 for k > i, i = 1→n

ith row of (15):

U_ii (p_i)_i + ∑_{k=i+1}^n U_ik (p_i)_k [= 0, as (p_i)_k = 0 for k > i] = (e_i^(n))_i = 1

∴ (p_i)_i = 1/U_ii > 0, i = 1→n

∴ P is upper triangular with P_ii = (p_i)_i > 0, i = 1→n.
Proposition

A ∈ R^{n×n} symmetric positive definite ⇒ A_kk > 0, k = 1→n, and

|A_jk| < (A_jj)^{1/2}(A_kk)^{1/2}, j,k = 1→n, j ≠ k

Proof: From the above theorem A = PᵀP, P ∈ R^{n×n}, P⁻¹ exists.

Let P = [p₁, p₂, ..., p_n], p_i ∈ R^n. P⁻¹ exists ⇒ {p_i}_{i=1}^n lin. indep.

A = PᵀP ⇒ A_jk = p_jᵀ p_k, j,k = 1→n

∴ A_kk = p_kᵀ p_k = ‖p_k‖² > 0, k = 1→n, as p_k ≠ 0

|A_jk| = |p_jᵀ p_k| = |〈p_j, p_k〉| < ‖p_j‖ ‖p_k‖, j,k = 1→n, j ≠ k (Cauchy-Schwarz inequality)

It is a strict inequality as {p_i}_{i=1}^n lin. ind.

Using the result ‖p_k‖ = (A_kk)^{1/2}, k = 1→n:

⇒ |A_jk| < (A_jj)^{1/2}(A_kk)^{1/2}, j,k = 1→n, j ≠ k
Compute a Cholesky Decomposition of A.

Let L = Pᵀ: find a lower triangular matrix L ∈ R^{n×n} with L_ii > 0, i = 1→n, such that A = LLᵀ.

Could compute L = Pᵀ, where P = U⁻¹ and U = [u₁ ... u_n] with {e_i^(n)}_{i=1}^n →CGS, 〈.,.〉_A→ {u_i}_{i=1}^n; there is, however, an easier way.

Let L = [l₁ l₂ ... l_n], l_i ∈ R^n (lower triangular and L_ii > 0, i = 1→n).

A = LLᵀ:

A_ij = ∑_{k=1}^n L_ik (Lᵀ)_kj = ∑_{k=1}^n L_ik L_jk = ∑_{k=1}^n (l_k)_i (l_k)_j

Note: l_k l_kᵀ ∈ R^{n×n} with (l_k l_kᵀ)_ij = (l_k)_i (l_k)_j

∴ A_ij = ∑_{k=1}^n (l_k l_kᵀ)_ij ⇒

A = ∑_{k=1}^n l_k l_kᵀ
Example

n = 3. Find the Cholesky Decomposition of

A = ⎛ 2   −1    0 ⎞
    ⎜−1   5/2  −1 ⎟
    ⎝ 0   −1   5/2⎠

i.e. find L ∈ R^{3×3} lower triangular, L_ii > 0, i = 1→3, such that A = LLᵀ.

Is A symmetric positive definite? Clearly Aᵀ = A ✓ and A_kk > 0, k = 1→3 ✓

|A₁₂| = |−1| = 1 < √2 · √(5/2) = √5 = (A₁₁)^{1/2}(A₂₂)^{1/2}, etc.

The above are necessary, not sufficient. Check directly that A is positive definite: for x = (x₁, x₂, x₃)ᵀ ≠ 0,

xᵀAx = ∑_{i=1}^3 ∑_{j=1}^3 A_ij x_i x_j = 2x₁² + (5/2)x₂² + (5/2)x₃² − 2x₁x₂ − 2x₂x₃

Note:

(r + s)² = r² + 2rs + s² ≥ 0 ⇒ rs ≥ −(1/2)(r² + s²) ∀r,s ∈ R (16)

Applying (16) to −2x₁x₂ (with r = −x₁, s = x₂) and to −2x₂x₃ (with r = −x₂, s = x₃):

xᵀAx ≥ 2x₁² + (5/2)x₂² + (5/2)x₃² − (x₁² + x₂²) − (x₂² + x₃²)

= x₁² + (1/2)x₂² + (3/2)x₃² ≥ (1/2) ∑_{i=1}^3 x_i² = (1/2)‖x‖² > 0 ∀x ∈ R³, x ≠ 0
Recap from the exercise: n = 3,

A = ⎛ 2   −1    0 ⎞
    ⎜−1   5/2  −1 ⎟
    ⎝ 0   −1   5/2⎠

symmetric, positive definite ✓

L = [l₁ l₂ l₃] lower triangular:

A = l₁l₁ᵀ + l₂l₂ᵀ + l₃l₃ᵀ (17)

Find L, with

l₁ = (×, ×, ×)ᵀ, l₂ = (0, ×, ×)ᵀ, l₃ = (0, 0, ×)ᵀ

In (17), l₁l₁ᵀ is a full symmetric 3×3 matrix, l₂l₂ᵀ is symmetric with zero first row and column, and l₃l₃ᵀ is zero except for its (3,3) entry.

∴ the first column/row of A is generated by l₁ alone.

Equate the first columns of l₁l₁ᵀ and A:

((l₁)₁(l₁)₁, (l₁)₂(l₁)₁, (l₁)₃(l₁)₁)ᵀ = (A₁₁, A₂₁, A₃₁)ᵀ

⇒ (l₁)_i = A_i1/(l₁)₁, i = 1→3
but [(l₁)₁]² = A₁₁

∴ (l₁)_i = A_i1/√A₁₁, i = 1→3

∴ in this example

l₁ = (1/√2)(2, −1, 0)ᵀ

Let A^(1) = A − l₁l₁ᵀ

= ⎛ 2   −1    0 ⎞     ⎛ 4  −2  0 ⎞
  ⎜−1   5/2  −1 ⎟ − ½ ⎜−2   1  0 ⎟
  ⎝ 0   −1   5/2⎠     ⎝ 0   0  0 ⎠

⇒ A^(1) = ⎛ 0   0    0 ⎞
          ⎜ 0   2   −1 ⎟ = l₂l₂ᵀ + l₃l₃ᵀ by (17)
          ⎝ 0  −1   5/2⎠

The 2nd column/row of A^(1) is generated by l₂:

(l₂)_i = A^(1)_i2/√(A^(1)₂₂) ⇒ l₂ = (1/√2)(0, 2, −1)ᵀ

A^(2) = A^(1) − l₂l₂ᵀ

= ⎛ 0   0    0 ⎞     ⎛ 0   0   0 ⎞   ⎛ 0  0  0 ⎞
  ⎜ 0   2   −1 ⎟ − ½ ⎜ 0   4  −2 ⎟ = ⎜ 0  0  0 ⎟ = l₃l₃ᵀ by (17)
  ⎝ 0  −1   5/2⎠     ⎝ 0  −2   1 ⎠   ⎝ 0  0  2 ⎠

∴ l₃ = (1/√2)(0, 0, 2)ᵀ

∴ L = [l₁ l₂ l₃] = (1/√2) ⎛ 2   0  0 ⎞
                          ⎜−1   2  0 ⎟
                          ⎝ 0  −1  2 ⎠

lower triangular with L_ii > 0, i = 1→3.

Check A = LLᵀ ✓
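The column-peeling procedure above generalises directly to code. A plain-Python sketch (one possible implementation, verified against this 3×3 example):

```python
# Constructive Cholesky: peel off l_k l_k^T one column at a time.
import math

def cholesky(A):
    """Return lower triangular L (list of rows) with A = L L^T, A s.p.d."""
    n = len(A)
    A = [row[:] for row in A]               # working copy, A^(k)
    cols = []
    for k in range(n):
        d = math.sqrt(A[k][k])              # (l_k)_k = sqrt(A^(k)_kk)
        lk = [A[i][k] / d for i in range(n)]  # (l_k)_i = A^(k)_ik / d
        for i in range(n):                  # A^(k+1) = A^(k) - l_k l_k^T
            for j in range(n):
                A[i][j] -= lk[i] * lk[j]
        cols.append(lk)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

A = [[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]]
L = cholesky(A)
# the example gives L = (1/sqrt(2)) [[2,0,0],[-1,2,0],[0,-1,2]]
s = 1 / math.sqrt(2.0)
expected = [[2*s, 0.0, 0.0], [-s, 2*s, 0.0], [0.0, -s, 2*s]]
assert all(abs(L[i][j] - expected[i][j]) < 1e-12 for i in range(3) for j in range(3))
```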
Now consider the above constructive algorithm in the general case, i.e. A ∈ R^{n×n} symmetric positive definite. Since A₁₁ > 0, we can start the algorithm:

l₁ = (1/√A₁₁)(A₁₁, A₂₁, ..., A_n1)ᵀ

Let A^(1) = A − l₁l₁ᵀ ∈ R^{n×n} (symmetric, as A and l₁l₁ᵀ are both symmetric), which has the structure

A^(1) = ⎛ 0  0 ⎞
        ⎝ 0  B ⎠   (zero first row and column), where B ∈ R^{(n−1)×(n−1)} is symmetric

To continue we need

A^(1)₂₂ = B₁₁ > 0

We will now prove that B is positive definite ⇒ B_kk > 0, k = 1→n−1.

To do this, we note that

e₁ = (1, 0, ..., 0)ᵀ ∈ R^n ⇒ Ae₁ = (A₁₁, A₂₁, ..., A_n1)ᵀ, the first column of A

e₁ᵀAe₁ = A₁₁

∴ l₁ = (1/√(e₁ᵀAe₁)) Ae₁

Theorem

B ∈ R^{(n−1)×(n−1)}, as defined above, is positive definite.

Proof: We need to show that uᵀBu > 0 ∀u ∈ R^{n−1}, u ≠ 0.

Given u ∈ R^{n−1}, u ≠ 0, let v = (0, u)ᵀ ∈ R^n, so v ≠ 0.

e₁ᵀv = 0 and e₁, v ≠ 0 ⇒ e₁, v lin. ind.
uᵀBu = vᵀA^(1)v = vᵀ(A − l₁l₁ᵀ)v = vᵀAv − (vᵀl₁)(l₁ᵀv) = vᵀAv − (l₁ᵀv)²

l₁ = (1/√(e₁ᵀAe₁)) Ae₁ ⇒ l₁ᵀ = (1/√(e₁ᵀAe₁)) e₁ᵀA   (as Aᵀ = A)

∴ uᵀBu = vᵀAv − (e₁ᵀAv)²/(e₁ᵀAe₁) = 〈v,v〉_A − [〈e₁,v〉_A]²/〈e₁,e₁〉_A

= (‖v‖²_A ‖e₁‖²_A − [〈e₁,v〉_A]²)/‖e₁‖²_A

Apply the Cauchy-Schwarz inequality:

|〈e₁,v〉_A| < ‖e₁‖_A ‖v‖_A (strict, as e₁, v lin. ind.)

⇒ uᵀBu > 0 ∀u ∈ R^{n−1}, u ≠ 0

∴ B is positive definite, and B = Bᵀ ⇒ B_kk > 0, k = 1→n−1.

∴ A^(1)₂₂ = B₁₁ > 0 ⇒ the Cholesky Decomposition can continue, etc.
Application of the Cholesky Decomposition

Given A ∈ R^{n×n} symmetric positive definite. If we find the Cholesky Decomposition of A, i.e.

A = LLᵀ, L ∈ R^{n×n} lower triangular with L_ii > 0, i = 1→n,

then it is easy to solve Ax = b for a given b ∈ R^n:

Ax = b ⇔ L(Lᵀx) = b, with z = Lᵀx

∴ Lz = b (lower triangular) and Lᵀx = z (upper triangular)

Solve for z by a forward solve:

z₁ = b₁/L₁₁

z_k = (b_k − ∑_{j=1}^{k−1} L_kj z_j)/L_kk, k = 2→n

Solve for x by a back solve:

x_n = z_n/(Lᵀ)_nn = z_n/L_nn

x_k = (z_k − ∑_{j=k+1}^n (Lᵀ)_kj x_j)/(Lᵀ)_kk = (z_k − ∑_{j=k+1}^n L_jk x_j)/L_kk, k = (n−1)→1
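The forward solve / back solve pair above can be sketched in plain Python, working directly with L (so (Lᵀ)_kj is read as L_jk):

```python
def forward_solve(L, b):
    """Solve L z = b, L lower triangular with nonzero diagonal."""
    n = len(L)
    z = [0.0] * n
    for k in range(n):
        z[k] = (b[k] - sum(L[k][j] * z[j] for j in range(k))) / L[k][k]
    return z

def back_solve_LT(L, z):
    """Solve L^T x = z, using (L^T)_kj = L_jk."""
    n = len(L)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (z[k] - sum(L[j][k] * x[j] for j in range(k + 1, n))) / L[k][k]
    return x

# e.g. L = [[2, 0], [1, 3]], so A = L L^T = [[4, 2], [2, 10]]; b = A(1,1)^T = (6, 12)^T
z = forward_solve([[2.0, 0.0], [1.0, 3.0]], [6.0, 12.0])
x = back_solve_LT([[2.0, 0.0], [1.0, 3.0]], z)
assert x == [1.0, 1.0]
```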
1.7 Least Squares Problems
Give A ∈ Rm×n (m ≥ n) , b ∈ RmFind x ∈ Rn such that
Ax = b(m equationsn unknowns
)If m > n, then generally there is no solution x to Ax = b.So find an approximate solution in some sense.Find x∗ ∈ Rn such that Ax∗ − b is “small”. Make this precise.
Example
Pendulum
see diagram 20081113.M2AA3.1
length l, period T √g
lT = 2π
estimate g (acceleration due to gravity) from the above relationship.
46
Let L =√l and c = 2π√
g ⇒√
Lc =√
T
Do m experiments.Lm×1
c1×1
= Tm×1
L =
L1...Lm
T =
T1...Tm
see diagram 20081113.M2AA3.2
Fit a straight line through the origin to the dots
Choose c ∈ R to minimise the sum of the squares of the errors
minc∈R
m∑i=1
(Ti − cLi)2 = minc∈R‖T− cL‖2
Let
\[
S(c) = \|\mathbf{T} - c\mathbf{L}\|^2 = \langle \mathbf{T} - c\mathbf{L},\ \mathbf{T} - c\mathbf{L}\rangle
= \langle \mathbf{T},\mathbf{T}\rangle - 2c\langle \mathbf{L},\mathbf{T}\rangle + c^2\langle \mathbf{L},\mathbf{L}\rangle
\]
\[
\therefore\ S(c) = \|\mathbf{T}\|^2 - 2c\langle \mathbf{L},\mathbf{T}\rangle + c^2\|\mathbf{L}\|^2
\]
\[
\frac{dS}{dc}(c) = -2\langle \mathbf{L},\mathbf{T}\rangle + 2c\|\mathbf{L}\|^2,
\qquad
\frac{d^2S}{dc^2}(c) = 2\|\mathbf{L}\|^2 > 0
\]
\[
\frac{dS}{dc}(c^*) = 0 \iff c^* = \frac{\langle \mathbf{L},\mathbf{T}\rangle}{\|\mathbf{L}\|^2}
\]
\[
\therefore\ S(c^*) \leq S(c) \quad \forall c \in \mathbb{R}
\]
$c^*$ is the global minimum of $S(c)$.
Note: $c^* \in \mathbb{R}$ is such that
\[
-\langle \mathbf{L},\mathbf{T}\rangle + c^*\langle \mathbf{L},\mathbf{L}\rangle = 0
\Rightarrow \langle \mathbf{T} - c^*\mathbf{L},\ \mathbf{L}\rangle = 0
\]
see diagram 20081113.M2AA3.3
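The formula $c^* = \langle\mathbf{L},\mathbf{T}\rangle/\|\mathbf{L}\|^2$ can be checked numerically. A sketch with synthetic data (the periods below are generated from the model itself with $g = 9.81$, so the fit recovers $g$ exactly):

```python
import numpy as np

l = np.array([0.25, 0.5, 0.75, 1.0])       # lengths (m), illustrative
T = 2 * np.pi * np.sqrt(l / 9.81)          # "measured" periods (s)

L = np.sqrt(l)                             # model: T = c L with c = 2*pi/sqrt(g)
c_star = (L @ T) / (L @ L)                 # c* = <L,T> / ||L||^2
g_est = (2 * np.pi / c_star) ** 2          # invert c = 2*pi/sqrt(g)
```

With noisy measurements, $c^*$ would instead be the least squares slope of the line through the origin.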
Generalise to
\[
\underset{m\times n}{A}\ \underset{n\times 1}{x} = \underset{m\times 1}{b}
\]
For $m > n$ there is generally no solution $x$, as we have an overdetermined system. Find $x^* \in \mathbb{R}^n$ such that
\[
\|Ax^* - b\| \leq \|Ax - b\| \quad \forall x \in \mathbb{R}^n,
\]
i.e.
\[
\min_{x\in\mathbb{R}^n} \|Ax - b\| \equiv \min_{x\in\mathbb{R}^n} \|Ax - b\|^2
\]
Let $Q(x) = \|Ax - b\|^2$, $Q : \mathbb{R}^n \to \mathbb{R}$; minimise $Q(x)$. (Q1, Sheet 2.)
Recall: given $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$ ($m \geq n$). For $m > n$ there is, in general, no solution $x \in \mathbb{R}^n$ to $Ax = b$. Find an approximate solution $x^* \in \mathbb{R}^n$ such that
\[
Q(x^*) \leq Q(x) = \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n, \qquad Q : \mathbb{R}^n \to \mathbb{R}
\]
\[
Q(x) = \|Ax - b\|^2 = \langle Ax - b,\ Ax - b\rangle = (Ax - b)^T (Ax - b)
= x^T A^T A x - b^T A x - x^T A^T b + b^T b
\]
\[
= x^T A^T A x - 2 b^T A x + b^T b \quad \text{since } x^T A^T b = \left(x^T A^T b\right)^T = b^T A x \in \mathbb{R}
\]
\[
= x^T G x - 2 \mu^T x + \|b\|^2
\]
where $G = A^T A \in \mathbb{R}^{n\times n}$ and $\mu = A^T b \in \mathbb{R}^n$.

Note: $G^T = \left(A^T A\right)^T = A^T A = G$, so $G \in \mathbb{R}^{n\times n}$ is symmetric, and $\mu = A^T b \Rightarrow \mu^T = b^T A$.

Recall Q1 on Sheet 2:
\[
\nabla Q(x) = 2(Gx - \mu), \qquad D^2 Q(x) = 2G
\]
Theorem
Let $A \in \mathbb{R}^{m\times n}$ ($m \geq n$) have $n$ linearly independent columns, and let $b \in \mathbb{R}^m$. Then $A^T A \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Moreover, $\exists$ a unique $x^* \in \mathbb{R}^n$ such that $A^T A x^* = A^T b$ [the Normal Equations of $Ax = b$], and $x^*$ is the global minimum of $Q(x) = \|Ax - b\|^2$. $x^*$ is called the least squares solution of $Ax = b$.

Proof: $A^T A$ is symmetric — see above.
\[
A = [a_1\ a_2\ \ldots\ a_n], \quad a_i \in \mathbb{R}^m \text{ lin. ind.}
\]
\[
c^T A^T A c = (Ac)^T Ac = \|Ac\|^2 \geq 0,
\]
with equality if and only if $Ac = 0$, i.e. $\sum_{i=1}^n c_i a_i = 0$; $\{a_i\}_{i=1}^n$ lin. ind. $\Rightarrow c = 0$.
\[
\therefore\ c^T A^T A c > 0 \quad \forall c \in \mathbb{R}^n,\ c \neq 0
\]
$\therefore A^T A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, so $\left(A^T A\right)^{-1}$ exists and $\exists$ a unique $x^* \in \mathbb{R}^n$ solving $A^T A x^* = A^T b$.

Now show $x^*$ is the global minimum of $Q(x)$:
\[
Q(x) = \|Ax - b\|^2 = x^T A^T A x - 2\left(A^T b\right)^T x + \|b\|^2
\]
\[
\Rightarrow \nabla Q(x) = 2\left(A^T A x - A^T b\right), \qquad D^2 Q(x) = 2 A^T A
\]
For $x^* \in \mathbb{R}^n$ a local minimum of $Q(x)$, we require $\nabla Q(x^*) = 0$ and $D^2 Q(x^*)$ symmetric positive definite. Now $\nabla Q(x^*) = 0 \iff A^T A x^* = A^T b$, so $\exists!\ x^*$ with $\nabla Q(x^*) = 0$, and $D^2 Q(x^*) = 2A^T A$ is s.p.d. (symmetric positive definite) $\Rightarrow x^*$ is the global minimum of $Q(x)$.
Ex: $m = 3$, $n = 2$,
\[
A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]
It is easy to show that no $x \in \mathbb{R}^2$ solves $Ax = b$, so compute the least squares solution $x^* \in \mathbb{R}^2$ such that $A^T A x^* = A^T b$:
\[
A^T A = \begin{pmatrix} 3 & 4 & 12 \\ 65 & 0 & 13 \end{pmatrix}
\begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}
= \begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix},
\qquad
A^T b = \begin{pmatrix} 19 \\ 78 \end{pmatrix}
\]
\[
\begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix} x^* = \begin{pmatrix} 19 \\ 78 \end{pmatrix}
\Rightarrow
x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix}
\]
Note $Ax^* - b \neq 0$.
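The worked example above can be reproduced directly from the normal equations (a NumPy sketch):

```python
import numpy as np

# The worked example from the notes: m = 3, n = 2.
A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.array([1.0, 1.0, 1.0])

G = A.T @ A                      # Gram matrix A^T A = [[169, 351], [351, 4394]]
mu = A.T @ b                     # A^T b = [19, 78]
x_star = np.linalg.solve(G, mu)  # least squares solution via normal equations
residual = A @ x_star - b        # nonzero: Ax* != b
```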
In practice, it is not a good idea to solve the normal equations ($A^T A x^* = A^T b$) since $A^T A$ is generally ill-conditioned. A matrix $B \in \mathbb{R}^{n\times n}$ is ill-conditioned if small changes to the right-hand side of the system $Bx = b$ lead to large changes in the solution — unacceptable errors on a computer. i.e.
Bx = b, B (x + δx) = b + δb
B ill-conditioned: “small” δb⇒”large” δx
Alternative procedure to find x∗
In practice, find $x^*$ using the QR approach. Take a sequence of Givens rotations
\[
G = G_{mn} \cdots G_{13} G_{12} \quad \text{such that} \quad
\underset{m\times m}{G}\ \underset{m\times n}{A} = \underset{m\times n}{R},
\]
upper triangular with $r_{ii} > 0$, $i = 1 \to n$. Each $G_{pq}$ ($p < q$), $G_{pq} \in \mathbb{R}^{m\times m}$, is orthogonal:
\[
G_{pq}^T G_{pq} = I^{(m)} = G_{pq} G_{pq}^T
\]
Apply $G$ to the original system:
\[
Ax = b \Rightarrow GAx = Gb \Rightarrow Rx = Gb \in \mathbb{R}^m
\]
\[
Gb = \begin{pmatrix} (Gb)_1 \\ \vdots \\ (Gb)_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ (Gb)_{n+1} \\ \vdots \\ (Gb)_m \end{pmatrix}
= \underline{\alpha} + \underline{\beta},
\qquad \underline{\alpha}, \underline{\beta} \in \mathbb{R}^m, \quad \langle \underline{\alpha}, \underline{\beta}\rangle = 0
\]
Recall the structure of $R$: the first $n$ rows form an upper triangular $n \times n$ block (with $r_{ii} > 0$), and the remaining $m - n$ rows are zero.
\[
Ax = b \Rightarrow Rx = \underline{\alpha} + \underline{\beta}
\]
If $\underline{\beta} = 0$, $\exists$ a unique solution $x \in \mathbb{R}^n$ to $Rx = \underline{\alpha} = Gb$, and hence a unique solution $x \in \mathbb{R}^n$ to the original system $Ax = b$.

If $\underline{\beta} \neq 0$, $Rx = Gb = \underline{\alpha} + \underline{\beta}$ is an inconsistent system, so there is no solution $x$, and hence no solution $x$ to $Ax = b$. In this case, solve the consistent system
\[
Rx^* = \underline{\alpha};
\]
$\exists!\ x^*$ solving this — find it by a back solve.
Claim: $x^* \in \mathbb{R}^n$ is the least squares solution to $Ax = b$, i.e.
\[
\|Ax^* - b\|^2 \leq \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n
\]
Orthogonal matrices preserve length: $\|Gy\| = \|y\|$ $\forall y \in \mathbb{R}^m$.
\[
\therefore\ \|Ax - b\|^2 = \|G(Ax - b)\|^2 = \left\| Rx - (\underline{\alpha} + \underline{\beta}) \right\|^2
= \left\langle (Rx - \underline{\alpha}) - \underline{\beta},\ (Rx - \underline{\alpha}) - \underline{\beta} \right\rangle
\]
\[
\therefore\ \|Ax - b\|^2 = \|Rx - \underline{\alpha}\|^2 + \|\underline{\beta}\|^2
- 2\underbrace{\left\langle Rx - \underline{\alpha},\ \underline{\beta}\right\rangle}_{=0}
\quad \text{since } \langle Rx, \underline{\beta}\rangle = \langle \underline{\alpha}, \underline{\beta}\rangle = 0 \ \ \forall x \in \mathbb{R}^n
\]
\[
\therefore\ \min_{x\in\mathbb{R}^n} \|Ax - b\|^2 = \min_{x\in\mathbb{R}^n} \|Rx - \underline{\alpha}\|^2 + \|\underline{\beta}\|^2
\]
\[
Rx^* = \underline{\alpha} \Rightarrow \|Rx^* - \underline{\alpha}\| = 0
\]
\[
\therefore\ x^* \text{ is such that } \|\underline{\beta}\|^2 = \|Ax^* - b\|^2 \leq \|Ax - b\|^2 \quad \forall x \in \mathbb{R}^n
\]
Example
Use the QR approach on the example above: $m = 3$, $n = 2$,
\[
A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]
\[
G = G_{23}(\psi)\, G_{13}(\phi)\, G_{12}(\theta) \quad \text{(recall from the QR notes)}
\]
\[
GA = R = \begin{pmatrix} 13 & 27 \\ 0 & (3665)^{\frac12} \\ 0 & 0 \end{pmatrix},
\qquad
Gb = \begin{pmatrix} \frac{19}{13} \\[2pt] \frac{501}{13(3665)^{\frac12}} \\[2pt] \frac{41}{(3665)^{\frac12}} \end{pmatrix}
= \begin{pmatrix} 1.46154\ldots \\ 0.63659\ldots \\ 0.67725\ldots \end{pmatrix}
\]
\[
\Rightarrow \underline{\alpha} = \begin{pmatrix} 1.46154\ldots \\ 0.63659\ldots \\ 0 \end{pmatrix},
\qquad
\underline{\beta} = \begin{pmatrix} 0 \\ 0 \\ 0.67725\ldots \end{pmatrix}
\]
Solve $Rx^* = \underline{\alpha}$:
\[
x^* = \begin{pmatrix} 0.090587\ldots \\ 0.010515\ldots \end{pmatrix},
\qquad
\|Ax^* - b\|^2 = \|\underline{\beta}\|^2 = (0.67725\ldots)^2
\]
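The same numbers come out of a library QR factorisation (NumPy uses Householder reflections rather than Givens rotations, but the resulting $R$ and residual agree up to signs — a sketch):

```python
import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.array([1.0, 1.0, 1.0])

Q, R = np.linalg.qr(A)                 # reduced QR: A = Q R, Q is 3x2, R is 2x2
# Sign conventions may differ from the Givens construction; fix R_ii > 0.
s = np.sign(np.diag(R))
Q, R = Q * s, (R.T * s).T

alpha = Q.T @ b                        # the first n components of Gb
x_star = np.linalg.solve(R, alpha)     # back solve R x* = alpha
residual = np.linalg.norm(A @ x_star - b)
```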
2 Least Squares Problems (A more abstract approach)
A more abstract definition of an inner product
Definition:
Let $V$ be a real vector space. An inner product on $V$ is a function $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ such that

(i) $\langle \lambda u + \mu v,\ w\rangle = \lambda \langle u, w\rangle + \mu \langle v, w\rangle$

(ii) $\langle u, v\rangle = \langle v, u\rangle$

((i)+(ii) $\Rightarrow \langle w,\ \lambda u + \mu v\rangle = \lambda \langle w, u\rangle + \mu \langle w, v\rangle$)

(iii) $\langle u, u\rangle \geq 0$ with equality if and only if $u = 0 \in V$
An inner product induces a norm
\[
\|u\| = [\langle u, u\rangle]^{\frac12} \quad \forall u \in V
\]
⇒ ‖u‖ = 0 if and only if u = 0
Example: $V = C[a,b]$, continuous functions on the closed interval $[a,b]$. Let $w \in C[a,b]$ with $w(x) > 0$ $\forall x \in [a,b]$ — $w$ is the weight function. Define, for two continuous functions $f, g$ on $[a,b]$,
\[
\langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx \quad \forall f, g \in C[a,b]
\]
Clearly $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ and (i), (ii) clearly hold. For (iii):
\[
\langle f, f\rangle = \int_a^b w(x)\, [f(x)]^2\, dx \geq 0 \quad \forall f \in C[a,b],
\]
with equality if and only if $f \equiv 0$ (the zero function).
Cauchy-Schwarz Inequality
|〈u, v〉| ≤ ‖u‖ ‖v‖ ∀u, v ∈ V
with strict inequality if and only if u, v are linearly independent.
Proof: same as before.
Abstract Form of the Least Squares Problem
Let $V$ be a real vector space with inner product $\langle\cdot,\cdot\rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\phi_i\}_{i=1}^n$; by basis we mean the $\phi_i$ are linearly independent and span the subspace $U$.

Given $v \in V$, find $u^* \in U$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U
\]

Example

\[
V = C[a,b], \qquad \langle f, g\rangle = \int_a^b f(x)\, g(x)\, dx \quad (\text{i.e. } w(x) \equiv 1)
\]
$U = P_{n-1}$ (polynomials of degree $\leq n-1$), basis $\phi_i = x^{i-1}$, $i = 1 \to n$.
\[
\|v - u^*\|^2 \leq \|v - u\|^2 = \int_a^b [v(x) - u(x)]^2\, dx
\]
Return to the general case
\[
u \in U \iff u = \sum_{i=1}^n \lambda_i \phi_i \ \text{ for some } \lambda = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} \in \mathbb{R}^n
\]
\[
u^* \in U \iff u^* = \sum_{i=1}^n \lambda_i^* \phi_i \iff \lambda^* \in \mathbb{R}^n
\]
Finding $u^* \in U$ such that $\|v - u^*\|^2 \leq \|v - u\|^2$ $\forall u \in U$ is equivalent to finding $\lambda^* \in \mathbb{R}^n$ such that
\[
\left\| v - \sum_{i=1}^n \lambda_i^* \phi_i \right\|^2 \leq \left\| v - \sum_{i=1}^n \lambda_i \phi_i \right\|^2 \quad \forall \lambda \in \mathbb{R}^n
\]
Let
\[
E(\lambda) = \left\| v - \sum_{i=1}^n \lambda_i \phi_i \right\|^2, \qquad E : \mathbb{R}^n \to \mathbb{R}_{\geq 0}
\]
Find $\lambda^* \in \mathbb{R}^n$ such that $E(\lambda^*) \leq E(\lambda)$ $\forall \lambda \in \mathbb{R}^n$.
\[
E(\lambda) = \left\langle v - \sum_{i=1}^n \lambda_i \phi_i,\ v - \sum_{j=1}^n \lambda_j \phi_j \right\rangle \quad (i, j \text{ dummy indices})
\]
\[
= \langle v, v\rangle - \sum_{i=1}^n \lambda_i \langle \phi_i, v\rangle - \sum_{j=1}^n \lambda_j \langle v, \phi_j\rangle + \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \langle \phi_i, \phi_j\rangle
\]
(the two middle sums are the same). Let $\mu \in \mathbb{R}^n$ with $\mu_i = \langle v, \phi_i\rangle$, $i = 1 \to n$, and $G \in \mathbb{R}^{n\times n}$ with $G_{ij} = \langle \phi_i, \phi_j\rangle$, $i, j = 1 \to n$.
\[
\therefore\ E(\lambda) = \|v\|^2 - 2\mu^T \lambda + \lambda^T G \lambda
\]
\[
\Longrightarrow \nabla E(\lambda) = 2(G\lambda - \mu), \qquad D^2 E(\lambda) = 2G
\]
$E(\lambda^*)$ is a local minimum of $E(\lambda)$ if $\nabla E(\lambda^*) = 0$ and $G$ is positive definite.
\[
\nabla E(\lambda^*) = 0 \iff G\lambda^* = \mu \quad \text{(normal equations)}
\]
$G$ is called the Gram matrix; it depends on the basis $\{\phi_i\}_{i=1}^n$ for $U$. [Sometimes written $G(\phi_1\ \phi_2\ \ldots\ \phi_n)$.]

Lemma: $\{\phi_i\}_{i=1}^n$ a basis for $U$ (so $\{\phi_i\}_{i=1}^n$ lin. ind.) $\Rightarrow G$ is symmetric positive definite.

Proof: $G \in \mathbb{R}^{n\times n}$, $G_{ij} = \langle \phi_i, \phi_j\rangle$, $i, j = 1 \to n$.
\[
G_{ji} = \langle \phi_j, \phi_i\rangle \overset{(ii)}{=} \langle \phi_i, \phi_j\rangle = G_{ij} \quad i, j = 1 \to n
\]
\[
\lambda^T G \lambda = \sum_{i=1}^n \sum_{j=1}^n \underbrace{G_{ij}}_{\langle \phi_i, \phi_j\rangle} \lambda_i \lambda_j
\overset{(i),(ii)}{=} \left\langle \sum_{i=1}^n \lambda_i \phi_i,\ \sum_{j=1}^n \lambda_j \phi_j \right\rangle
= \left\| \sum_{i=1}^n \lambda_i \phi_i \right\|^2 \geq 0,
\]
with equality if and only if $\sum_{i=1}^n \lambda_i \phi_i = 0 \in V \iff \lambda = 0$, as $\{\phi_i\}_{i=1}^n$ are linearly independent.
\[
\therefore\ \lambda^T G \lambda > 0 \quad \forall \lambda \in \mathbb{R}^n,\ \lambda \neq 0 \Longrightarrow G \text{ is symmetric positive definite.}
\]
$G$ positive definite $\Rightarrow G^{-1}$ exists $\Rightarrow \exists!\ \lambda^* \in \mathbb{R}^n$ solving $G\lambda^* = \mu$ (normal equations) $\Rightarrow \nabla E(\lambda^*) = 0$ (and $\lambda^*$ is unique — there are no other stationary points). $D^2 E(\lambda^*) = 2G$ is symmetric positive definite, so $\lambda^* \in \mathbb{R}^n$ solving the normal equations is the global minimum of $E(\lambda)$, and $u^* = \sum_{i=1}^n \lambda_i^* \phi_i$.
Recall: $V$ a real vector space with inner product $\langle\cdot,\cdot\rangle$; $U$ a finite dimensional subspace with basis $\{\phi_i\}_{i=1}^n$. Given $v \in V$, find $u^* \in U$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U
\]
Then $u^* = \sum_{i=1}^n \lambda_i^* \phi_i$, where $\lambda^* \in \mathbb{R}^n$ is the unique solution of
\[
G\lambda^* = \mu \quad \text{(Normal Equations)}
\]
$G \in \mathbb{R}^{n\times n}$ is the GRAM MATRIX (it depends on the basis for $U$),
\[
G_{ij} = \langle \phi_i, \phi_j\rangle \quad i, j = 1 \to n \quad \text{(symmetric positive definite)},
\qquad
\mu \in \mathbb{R}^n, \quad \mu_i = \langle v, \phi_i\rangle \quad i = 1 \to n
\]

Theorem (orthogonality property):
\[
\langle \underbrace{v - u^*}_{\text{error}},\ u\rangle = 0 \quad \forall u \in U
\]
see diagram 20081124.M2AA3.1
Proof:
\[
G\lambda^* = \mu \Rightarrow \lambda^T G \lambda^* = \lambda^T \mu \quad \forall \lambda \in \mathbb{R}^n
\]
The implication goes the other way as well: suppose $\lambda^T G \lambda^* = \lambda^T \mu$ $\forall \lambda \in \mathbb{R}^n$, and let $\lambda = e_i$ (the $i$th standard basis vector, with a 1 in the $i$th position and zeros elsewhere); then $(G\lambda^*)_i = \mu_i$. Repeating for $i = 1 \to n$ gives $G\lambda^* = \mu$.
\[
\therefore\ G\lambda^* = \mu \iff \lambda^T G \lambda^* = \lambda^T \mu \quad \forall \lambda \in \mathbb{R}^n
\]
\[
\iff \sum_{i=1}^n \sum_{j=1}^n \underbrace{G_{ij}}_{\langle \phi_i, \phi_j\rangle} \lambda_i \lambda_j^* = \sum_{i=1}^n \lambda_i \underbrace{\mu_i}_{\langle v, \phi_i\rangle}
\]
\[
\iff \left\langle \underbrace{\sum_{i=1}^n \lambda_i \phi_i}_{u \in U},\ \underbrace{\sum_{j=1}^n \lambda_j^* \phi_j}_{u^*} \right\rangle
= \left\langle v,\ \underbrace{\sum_{i=1}^n \lambda_i \phi_i}_{u \in U} \right\rangle \quad \forall \lambda \in \mathbb{R}^n
\]
\[
\iff \langle v - u^*,\ u\rangle = 0 \quad \forall u \in U
\]
Example
1. $V = C[0,1]$, $\langle f, g\rangle = \int_0^1 f(x)\, g(x)\, dx$; $U = P_{n-1}$, basis $\{x^{i-1}\}_{i=1}^n$, i.e. $\phi_i = x^{i-1}$.
\[
u \in P_{n-1} \iff u(x) = \sum_{i=1}^n \lambda_i x^{i-1}
\]
Given $v \in C[0,1]$, find $u^*(x) = \sum_{i=1}^n \lambda_i^* x^{i-1}$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in P_{n-1}
\iff \int_0^1 (v - u^*)^2\, dx \leq \int_0^1 (v - u)^2\, dx
\]
Find $\lambda^*$ by solving the normal equations $G\lambda^* = \mu$:
\[
\mu_i = \langle v, \phi_i\rangle = \int_0^1 v(x)\, x^{i-1}\, dx \quad i = 1 \to n
\]
\[
G_{ij} = \langle \phi_i, \phi_j\rangle = \int_0^1 x^{i-1} x^{j-1}\, dx = \int_0^1 x^{i+j-2}\, dx = \frac{1}{i+j-1} \quad i, j = 1 \to n
\]
\[
\Longrightarrow G = \begin{pmatrix}
1 & \frac12 & \ldots & \frac1n \\
\frac12 & \frac13 & \ldots & \frac{1}{n+1} \\
\vdots & \vdots & \ddots & \vdots \\
\frac1n & \frac{1}{n+1} & \ldots & \frac{1}{2n-1}
\end{pmatrix},
\]
the $n \times n$ Hilbert matrix — badly conditioned: the columns approach linear dependence as $n \to \infty$.
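The ill-conditioning of the Hilbert matrix is easy to see numerically (a sketch; the condition number grows very fast with $n$):

```python
import numpy as np

def hilbert(n):
    """The n x n Hilbert matrix G_ij = 1/(i+j-1) (1-based i, j)."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

# With the monomial basis {1, x, ..., x^{n-1}}, the Gram matrix becomes
# nearly singular even for modest n.
conds = [np.linalg.cond(hilbert(n)) for n in (2, 4, 8)]
```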
2. $V = \mathbb{R}^m$, $\langle a, b\rangle = a^T b$ $\forall a, b \in \mathbb{R}^m$; $U = \mathrm{span}\{a_i\}_{i=1}^n$ where $n \leq m$ and the $a_i$ are linearly independent, i.e. $\phi_i = a_i \in \mathbb{R}^m$. Given $v \in \mathbb{R}^m$, find $u^* = \sum_{i=1}^n \lambda_i^* a_i$ such that
\[
\|v - u^*\| \leq \|v - u\| \quad \forall u \in U \tag{18}
\]
Let $A = [a_1\ a_2\ \ldots\ a_n] \in \mathbb{R}^{m\times n}$, so $A\lambda^* = \sum_{i=1}^n \lambda_i^* a_i$, and
\[
(18) \iff \|v - A\lambda^*\| \leq \|v - A\lambda\| \quad \forall \lambda \in \mathbb{R}^n
\]
Find $\lambda^*$ by solving the normal equations $G\lambda^* = \mu$:
\[
\mu \in \mathbb{R}^n, \quad \mu_i = \langle v, \phi_i\rangle = \langle v, a_i\rangle = a_i^T v \quad i = 1 \to n
\]
\[
G \in \mathbb{R}^{n\times n}, \quad G_{ij} = \langle \phi_i, \phi_j\rangle = \langle a_i, a_j\rangle = a_i^T a_j \quad i, j = 1 \to n
\]
With
\[
A^T = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix} \in \mathbb{R}^{n\times m},
\qquad
\left(A^T A\right)_{ij} = a_i^T a_j \Rightarrow G = A^T A,
\]
\[
\left(A^T v\right)_i = a_i^T v \quad i = 1 \to n \quad \therefore\ \mu = A^T v
\]
\[
\therefore\ G\lambda^* = \mu \Rightarrow A^T A \lambda^* = A^T v \quad \text{— the Normal Equations for } A\lambda = v
\]
Change basis

1. $\{x^{i-1}\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthonormal:
\[
G_{ij} = \langle \psi_i, \psi_j\rangle = \delta_{ij} \quad i, j = 1 \to n \Longrightarrow G \equiv I \Rightarrow \lambda^* = \mu,
\quad \text{where } \mu_i = \langle v, \psi_i\rangle, \quad i = 1 \to n.
\]

2. $\{x^{i-1}\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthogonal:
\[
G_{ij} = \langle \psi_i, \psi_j\rangle = 0 \quad i \neq j, \qquad G_{ii} = \|\psi_i\|^2 > 0 \quad i = 1 \to n
\]
$\Rightarrow G$ is a diagonal matrix, so
\[
G\lambda^* = \mu \Rightarrow \lambda_i^* = \frac{\mu_i}{\|\psi_i\|^2} \quad i = 1 \to n
\qquad
\therefore\ u^* = \sum_{i=1}^n \frac{\langle v, \psi_i\rangle}{\|\psi_i\|^2}\, \psi_i
\]
It is very easy to construct this orthogonal basis.
3 Orthogonal Polynomials
\[
V = C[a,b], \qquad \langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx
\]
Weight function $w \in C(a,b)$ with $w(x) > 0$ $\forall x \in (a,b)$ (possibly $w(x) \to \infty$ as $x \to a$ or $x \to b$).
see diagram 20081126.M2AA3.1
We require the integral to be well-defined:
\[
|\langle f, g\rangle| = \left| \int_a^b w(x)\, f(x)\, g(x)\, dx \right|
\leq \int_a^b |w(x)\, f(x)\, g(x)|\, dx
= \int_a^b w(x)\, |f(x)|\, |g(x)|\, dx
\leq \int_a^b w(x)\, dx\ \underbrace{\max_{a\leq x\leq b} [|f(x)|\, |g(x)|]}_{<\infty \text{ as } f, g \in C[a,b]}
\]
\[
\therefore \text{ require } \int_a^b w(x)\, dx < \infty
\]
Ex
$[a,b] = [0,1]$, $w(x) = x^{-\alpha}$, $\alpha > 0$
see diagram 20081126.M2AA3.2
$w \in C(0,1)$, and
\[
\int_0^1 x^{-\alpha}\, dx = \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_0^1 < \infty \quad \text{if } \alpha < 1.
\]
Note, for $\alpha = 1$:
\[
\int_0^1 x^{-1}\, dx = [\ln x]_0^1 = \infty \quad \text{(divergent)}.
\]
$U = P_n$, polynomials of degree $\leq n$. The canonical basis $\{x^i\}_{i=0}^n$ gives an ill-conditioned Gram matrix, so construct a new basis $\{\phi_i(x)\}_{i=0}^n$ for $P_n$, where $\phi_j(x)$ is a monic polynomial of degree $j$ and the basis is orthogonal: $\langle \phi_i, \phi_j\rangle = 0$, $i \neq j$.
\[
\phi_j(x) = x^j + \sum_{i=0}^{j-1} a_{ji} x^i \quad \text{(monic: leading coefficient 1)}
\]
$\therefore\ \phi_0(x) = 1$, $\phi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ is chosen so that $\langle \phi_0, \phi_1\rangle = 0$.
Theorem

Monic orthogonal polynomials $\phi_j \in P_j$ satisfy the three term recurrence relation
\[
\phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad \text{for } j \geq 1,
\]
where
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2} \quad \text{and} \quad b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}, \quad \text{also for } j \geq 1.
\]
Proof: $\phi_j(x) \in P_j$ monic $\Rightarrow \phi_{j+1}(x) - x\phi_j(x) \in P_j$ (the leading $x^{j+1}$ terms cancel), so
\[
\phi_{j+1}(x) - x\phi_j(x) = \sum_{k=0}^{j} c_k \phi_k(x)
\]
Find $c_k$, $k = 0 \to j$: take the inner product with $\phi_i(x)$, $i = 0 \to j$,
\[
\left\langle \sum_{k=0}^{j} c_k \phi_k(x),\ \phi_i(x) \right\rangle = \langle \phi_{j+1}(x) - x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0 \to j
\]
\[
c_i \|\phi_i(x)\|^2 = -\langle x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0 \to j \tag{19}
\]
since $\{\phi_j\}$ are orthogonal (and $\phi_{j+1} \perp P_j$). Now
\[
\langle x\phi_j(x), \phi_i(x)\rangle = \int_a^b w(x)\, x\, \phi_j(x)\, \phi_i(x)\, dx
= \left\langle \phi_j(x),\ \underbrace{x\phi_i(x)}_{\in P_{i+1}} \right\rangle
\]
$\phi_j(x)$ is orthogonal to $\{\phi_k\}_{k=0}^{j-1}$, hence orthogonal to $P_{j-1}$ (degree $\leq j-1$). So if $i + 1 \leq j - 1$, i.e. $i \leq j - 2$, then $\langle x\phi_j(x), \phi_i(x)\rangle = 0$.
\[
\therefore\ (19) \Rightarrow c_i = 0 \ \text{ if } i \leq j - 2
\qquad
\therefore\ \phi_{j+1}(x) - x\phi_j(x) = c_{j-1}\phi_{j-1}(x) + c_j\phi_j(x),
\]
all other coefficients being 0, where
\[
c_{j-1} = \frac{-\langle x\phi_j(x), \phi_{j-1}(x)\rangle}{\|\phi_{j-1}(x)\|^2},
\qquad
c_j = \frac{-\langle x\phi_j(x), \phi_j(x)\rangle}{\|\phi_j(x)\|^2} = \frac{-\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}
\]
For $c_{j-1}$, note ($x\phi_{j-1} - \phi_j \in P_{j-1}$, both being monic of degree $j$):
\[
\langle \phi_j,\ x\phi_{j-1}\rangle = \underbrace{\left\langle \phi_j,\ \overbrace{x\phi_{j-1} - \phi_j}^{\in P_{j-1}} \right\rangle}_{=0} + \langle \phi_j, \phi_j\rangle = \|\phi_j\|^2
\]
\[
\therefore\ c_{j-1} = \frac{-\|\phi_j\|^2}{\|\phi_{j-1}\|^2}
\]
Let $a_j = -c_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}$ and $b_j = -c_{j-1} = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}$; then
\[
\phi_{j+1}(x) - x\phi_j(x) = -a_j\phi_j(x) - b_j\phi_{j-1}(x)
\]
\[
\Rightarrow \phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad j \geq 1,
\]
where
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2}, \qquad b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}, \quad j \geq 1. \qquad \checkmark \tag{20}
\]
For the start of the recurrence: $\phi_0(x) = 1$, $\phi_1(x) = x - a_0$, where $a_0 \in \mathbb{R}$ is such that $\langle \phi_1, \phi_0\rangle = 0$, i.e. $\langle x - a_0 \cdot 1,\ 1\rangle = 0$, so $a_0\langle 1,1\rangle = \langle x,1\rangle$ and
\[
a_0 = \frac{\langle x, 1\rangle}{\|1\|^2} = \frac{\langle x\phi_0, \phi_0\rangle}{\|\phi_0\|^2}
\]
$\therefore$ (20) extends to $j \geq 0$:
\[
\phi_{j+1}(x) = (x - a_j)\,\phi_j(x) - b_j\,\phi_{j-1}(x) \quad j \geq 0,
\quad \text{with } \phi_0(x) = 1,\ \phi_{-1}(x) = 0,
\]
\[
a_j = \frac{\langle x\phi_j, \phi_j\rangle}{\|\phi_j\|^2} \quad j \geq 0, \qquad
b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2} \quad j \geq 1 \tag{21}
\]
Recall: $g(x)$ is even if $g(-x) = g(x)$ $\forall x$, in which case $\int_{-a}^{a} g(x)\, dx = 2\int_0^a g(x)\, dx$.
see diagram 20081127.M2AA3.1
g (x) is odd if g (−x) = −g (x) ∀x⇒∫ 2−2 g (x) dx = 0
see diagram 20081127.M2AA3.2
Ex
\[
\langle f, g\rangle = \int_{-1}^{1} f(x)\, g(x)\, dx,
\]
i.e. $[a,b] = [-1,1]$, $w(x) = 1$ $\forall x \in [-1,1]$. Find the monic orthogonal polynomials with respect to this inner product. Apply (21):
\[
\phi_0(x) = 1, \qquad \phi_1(x) = x - a_0, \qquad
a_0 = \frac{\langle x\phi_0, \phi_0\rangle}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x\, dx}{\|\phi_0\|^2} = 0 \ \text{ as } x \text{ is odd}
\]
\[
\Rightarrow \phi_1(x) = x
\]
\[
\phi_2(x) = (x - a_1)\,\phi_1(x) - b_1\phi_0(x) = (x - a_1)x - b_1
\]
\[
a_1 = \frac{\langle x\phi_1, \phi_1\rangle}{\|\phi_1\|^2} = \frac{\int_{-1}^{1} x^3\, dx}{\|\phi_1\|^2} = 0,
\qquad
b_1 = \frac{\|\phi_1\|^2}{\|\phi_0\|^2} = \frac{\int_{-1}^{1} x^2\, dx}{\int_{-1}^{1} 1^2\, dx} = \frac{2\int_0^1 x^2\, dx}{2} = \frac13
\]
\[
\Rightarrow \phi_2(x) = x^2 - \frac13, \quad \text{etc.} \qquad \phi_3(x) = x^3 - \frac35 x
\]
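The recurrence (21) is easy to run by machine. A small exact-arithmetic sketch in Python, representing polynomials as coefficient lists (ascending powers) and computing the inner product $\int_{-1}^{1} pq\,dx$ exactly:

```python
from fractions import Fraction

def polymul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polysub(p, q):
    m = max(len(p), len(q))
    p = p + [Fraction(0)] * (m - len(p))
    q = q + [Fraction(0)] * (m - len(q))
    return [a - b for a, b in zip(p, q)]

def inner(p, q):
    """<p, q> = integral of p*q over [-1, 1] with weight w = 1 (exact):
    only even powers contribute, each x^k integrating to 2/(k+1)."""
    prod = polymul(p, q)
    return sum(2 * c / Fraction(k + 1) for k, c in enumerate(prod) if k % 2 == 0)

def monic_orthogonal(n):
    """phi_0 .. phi_n via phi_{j+1} = (x - a_j) phi_j - b_j phi_{j-1}."""
    x = [Fraction(0), Fraction(1)]
    phis = [[Fraction(1)]]
    for j in range(n):
        pj = phis[j]
        a = inner(polymul(x, pj), pj) / inner(pj, pj)
        rec = polymul(polysub(x, [a]), pj)
        if j >= 1:
            b = inner(pj, pj) / inner(phis[j - 1], phis[j - 1])
            rec = polysub(rec, [b * c for c in phis[j - 1]])
        phis.append(rec)
    return phis

phis = monic_orthogonal(3)   # phi_2 = x^2 - 1/3, phi_3 = x^3 - (3/5) x
```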
Summary
$V = C[a,b]$,
\[
\langle f, g\rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx \quad \forall f, g \in C[a,b],
\]
with the constraints that $w \in C(a,b)$, $w(x) > 0$ $\forall x \in (a,b)$, and $w$ is integrable:
\[
\int_a^b w(x)\, dx < \infty
\]
Given $f \in C[a,b]$, we approximate it by a polynomial of degree $n$: find $p_n^*(x) \in P_n$ such that the norm associated with this inner product is minimal,
\[
\|f - p_n^*\| \leq \|f - p_n\| \quad \forall p_n \in P_n.
\]
With an orthogonal basis $\{\phi_j(x)\}_{j=0}^n$ for $P_n$,
\[
p_n^*(x) = \sum_{i=0}^n \frac{\langle f, \phi_i\rangle}{\|\phi_i\|^2}\, \phi_i(x).
\]
$p_n^* \in P_n$ is the best approximation to $f$ from $P_n$ in that norm $\|\cdot\|$.
Ex
Show that the polynomials $T_k(x) = \cos\left(k \cos^{-1} x\right)$, for $x \in [-1,1]$, are orthogonal with respect to the inner product
\[
\langle f, g\rangle = \int_{-1}^{1} \underbrace{\left(1 - x^2\right)^{-\frac12}}_{w(x)} f(x)\, g(x)\, dx
\]
see diagram 20081127.M2AA3.3
Is $T_k(x)$ a polynomial? $T_0(x) = 1$, $T_1(x) = x$. Introduce the change of variable
\[
\theta = \cos^{-1} x \iff x = \cos\theta
\]

see diagram 20081127.M2AA3.4

$x \in [-1,1] \iff \theta \in [0,\pi]$, and $T_k(x) = \cos k\theta$. Recall the trigonometric identity
\[
\cos(k+1)\theta + \cos(k-1)\theta = 2\cos k\theta\, \cos\theta
\]
\[
\Rightarrow T_{k+1}(x) = 2x\, T_k(x) - T_{k-1}(x) \quad k \geq 1
\]
\[
\Rightarrow T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1 \in P_2
\]
\[
T_3(x) = 2xT_2(x) - T_1(x) = 2x\left(2x^2 - 1\right) - x = 4x^3 - 3x
\]
and by induction $T_k(x) = 2^{k-1} x^k + \cdots \in P_k$ — not monic.
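The recurrence $T_{k+1} = 2xT_k - T_{k-1}$ can be run on coefficient arrays (a sketch; coefficients are stored in ascending powers of $x$):

```python
import numpy as np

def chebyshev_T(n):
    """Coefficient arrays (ascending powers) of T_0 .. T_n via
    T_{k+1} = 2x T_k - T_{k-1}."""
    Ts = [np.array([1.0]), np.array([0.0, 1.0])]
    for k in range(1, n):
        two_x_Tk = np.concatenate(([0.0], 2.0 * Ts[k]))   # multiply T_k by 2x
        Tkm1 = np.concatenate((Ts[k - 1], np.zeros(len(two_x_Tk) - len(Ts[k - 1]))))
        Ts.append(two_x_Tk - Tkm1)
    return Ts[: n + 1]

Ts = chebyshev_T(5)   # leading coefficient of T_k is 2^{k-1} for k >= 1
```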
Show $\{T_k(x)\}_{k\geq 0}$ is orthogonal with respect to
\[
\langle f, g\rangle = \int_{-1}^{1} \underbrace{\left(1 - x^2\right)^{-\frac12}}_{w(x)} f(x)\, g(x)\, dx
\]
see diagram 20081202.M2AA3.1
\[
\int_{-1}^{1} \left(1 - x^2\right)^{-\frac12}\, T_k(x)\, T_j(x)\, dx
\]
With $x = \cos\theta$, $\frac{dx}{d\theta} = -\sin\theta$ and $(1-x^2)^{-\frac12} = (\sin\theta)^{-1}$ on $(0,\pi)$, so
\[
\int_{\pi}^{0} (\sin\theta)^{-1} \cos k\theta\, \cos j\theta\, (-\sin\theta)\, d\theta
= \int_0^{\pi} \cos k\theta\, \cos j\theta\, d\theta
\]
\[
= \frac12 \int_0^{\pi} \left[ \cos[(k+j)\theta] + \cos[(k-j)\theta] \right] d\theta
= \frac12 \left[ \frac{\sin(k+j)\theta}{k+j} + \frac{\sin(k-j)\theta}{k-j} \right]_0^{\pi}
\]
(the last step is not valid if $k = j$ or $k = j = 0$; treat those cases separately)
\[
= \begin{cases}
0 & \text{if } k \neq j \\
\frac{\pi}{2} & \text{if } k = j \neq 0 \\
\pi & \text{if } k = j = 0
\end{cases}
\]
$\therefore\ \{T_k(x)\}_{k\geq 0}$ are orthogonal, but not orthonormal. These polynomials are called the Chebyshev Polynomials.
4 Polynomial Interpolation
Abandon best approximation, and consider the more practical approach of polynomial interpolation.

Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, find $p_n(z) \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$. (Ex: $z_j, f_j \in \mathbb{R}$, $j = 0 \to n$.)
see diagram 20081202.M2AA3.2
pn is called the interpolating polynomial for this data.
Natural Questions
1. Does pn exist?
2. Is pn unique?
3. What is the construction of pn?
1. Prove the existence by a construction proof. Clearly {zj}nj=0 should be distinct.
Lemma
Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, $z_j$ distinct. Let
\[
l_j(z) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)} \quad j = 0 \to n.
\]
Then $l_j(z) \in P_n$, $j = 0 \to n$, and $l_j(z_r) = \delta_{jr}$, $j, r = 0 \to n$.
Proof
$l_j(z)$ is a product of $n$ factors of the form $\frac{z - z_k}{z_j - z_k}$, $k \neq j$, so $l_j(z) \in P_n$.
\[
l_j(z_r) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{z_r - z_k}{z_j - z_k}
\]
If $r = j$, every factor equals 1, so $l_j(z_j) = 1$. If $r \neq j$, one factor (the one with $k = r$) is 0, so $l_j(z_r) = 0$.
Example
zj ∈ R j = 0→ n
see diagram 20081202.M2AA3.3
$\{l_j(z)\}_{j=0}^n$ are the Lagrange basis functions.
Lemma
The interpolating polynomial $p_n(z) \in P_n$ for the data $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $j = 0 \to n$, $z_j$ distinct, is
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z)
\]
Proof
$l_j(z) \in P_n$, $j = 0 \to n$, so $p_n(z) = \sum_{j=0}^n f_j l_j(z) \in P_n$. Evaluating at each data point $z_r$, we need it to return $f_r$:
\[
p_n(z_r) = \sum_{j=0}^n f_j\, l_j(z_r) = \sum_{j=0}^n f_j\, \delta_{jr} = f_r \quad r = 0 \to n
\]
$\therefore\ p_n(z)$ interpolates the data $\{(z_j, f_j)\}_{j=0}^n$.
2. Is pn unique?
Theorem (Fundamental Theorem of Algebra)
Let
\[
p_n(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n, \quad a_i \in \mathbb{C},\ i = 0 \to n.
\]
Then $p_n(z)$ has at most $n$ distinct roots (zeros) in $\mathbb{C}$, unless $a_i = 0$, $i = 0 \to n$, i.e. $p_n(z) \equiv 0$.
Recall
Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct; find the interpolating polynomial $p_n \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$.

Lagrange Construction
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z), \quad \text{where } l_j(z) = \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)} \in P_n \quad j = 0 \to n,
\qquad l_j(z_r) = \delta_{jr} \quad j, r = 0 \to n
\]
Is the interpolating polynomial unique? Assume the contrary: $\exists\, p_n, q_n \in P_n$ such that
\[
p_n(z_j) = q_n(z_j) = f_j \quad j = 0 \to n.
\]
To get a contradiction, we use the fundamental theorem of algebra: $p_n - q_n \in P_n$ and
\[
(p_n - q_n)(z_j) = 0 \quad j = 0 \to n,
\]
$\therefore\ p_n - q_n \in P_n$ has $(n+1)$ distinct roots (zeros), as the $z_j$ are distinct. F.T.A. $\Rightarrow p_n - q_n \equiv 0 \Rightarrow p_n = q_n$, so the interpolating polynomial is unique.
Example
Find $p_2 \in P_2$ such that
\[
p_2(\underbrace{0}_{z_0}) = \underbrace{a}_{f_0}, \qquad
p_2(\underbrace{1}_{z_1}) = \underbrace{b}_{f_1}, \qquad
p_2(\underbrace{4}_{z_2}) = \underbrace{c}_{f_2}
\]
$n = 2$:
\[
p_2(z) = \sum_{j=0}^2 f_j\, l_j(z)
\]
\[
l_0(z) = \frac{(z - z_1)(z - z_2)}{(z_0 - z_1)(z_0 - z_2)} = \frac{(z-1)(z-4)}{(-1)(-4)} = \frac14\left(z^2 - 5z + 4\right)
\]
\[
l_1(z) = \cdots = -\frac13\left(z^2 - 4z\right), \qquad
l_2(z) = \cdots = \frac{1}{12}\left(z^2 - z\right)
\]
\[
\therefore\ p_2(z) = a\, l_0(z) + b\, l_1(z) + c\, l_2(z) \quad \text{(Lagrange form)}
\]
\[
= \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right) z^2 - \left(\frac{5a}{4} - \frac{4b}{3} + \frac{c}{12}\right) z + a \quad \text{(canonical form)}
\]
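The Lagrange form can be evaluated directly without ever expanding to canonical form (a sketch; the values $a, b, c$ below are illustrative):

```python
def lagrange_eval(zs, fs, z):
    """Evaluate p_n(z) = sum_j f_j l_j(z) from the Lagrange form."""
    total = 0.0
    for j, (zj, fj) in enumerate(zip(zs, fs)):
        lj = 1.0
        for k, zk in enumerate(zs):
            if k != j:
                lj *= (z - zk) / (zj - zk)   # one factor per k != j
        total += fj * lj
    return total

# The worked example: nodes 0, 1, 4 with values a, b, c.
a, b, c = 2.0, -1.0, 5.0
zs, fs = [0.0, 1.0, 4.0], [a, b, c]
p = lambda z: lagrange_eval(zs, fs, z)
```

(With $a=2$, $b=-1$, $c=5$ the canonical form is $1.25 z^2 - 4.25 z + 2$.)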
One could find the coefficients in the canonical form directly by using $p_n(z) = \sum_{k=0}^n a_k z^k$. We know that
\[
p_n(z_j) = \sum_{k=0}^n a_k z_j^k = f_j, \quad j = 0 \to n,
\]
\[
\Rightarrow
\begin{pmatrix}
1 & z_0 & \ldots & z_0^n \\
1 & z_1 & \ldots & z_1^n \\
\vdots & \vdots & & \vdots \\
1 & z_n & \ldots & z_n^n
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_n \end{pmatrix},
\]
i.e. $Va = f$, with $V \in \mathbb{C}^{(n+1)\times(n+1)}$, $a, f \in \mathbb{C}^{n+1}$, $V_{jk} = z_j^k$, $j, k = 0 \to n$. $V$ is called the Vandermonde matrix (Q4, Sheet 5). In general $V$ is ill-conditioned: as $z_j$ gets close to $z_i$, rows $i$ and $j$ become nearly linearly dependent (this is why it is ill-conditioned).
Canonical basis: $p_n(z) = \sum_{k=0}^n a_k z^k$, $\{z^k\}_{k=0}^n \Rightarrow Va = f$. You should certainly not use the canonical basis; it looks as if we should use the Lagrange basis instead, and it is indeed far better, but it has a flaw, as we will see.

Lagrange basis: $p_n(z) = \sum_{k=0}^n f_k l_k(z)$, $\{l_k(z)\}_{k=0}^n \Rightarrow If = f$. The Lagrange basis is far better conditioned than the canonical basis. However, it has to be constructed afresh for each set of points: assume we have found $p_{n-1} \in P_{n-1}$ interpolating $\{(z_j, f_j)\}_{j=0}^{n-1}$ and are then given a new data point $(z_n, f_n)$. One cannot reuse $p_{n-1} \in P_{n-1}$ to find $p_n \in P_n$; one has to compute new Lagrange basis functions in $P_n$.
We now look for an alternative construction. Suppose $p_{n-1} \in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$, $j = 0 \to n-1$; now find $p_n \in P_n$ such that $p_n(z_j) = f_j$, $j = 0 \to n$. Let
\[
p_n(z) = p_{n-1}(z) + C \underbrace{\prod_{k=0}^{n-1} (z - z_k)}_{\in P_n,\ \text{vanishes at } z_j,\ j = 0 \to n-1}
\]
\[
\Rightarrow p_n(z_j) = p_{n-1}(z_j) = f_j, \quad j = 0 \to n-1.
\]
Then choose $C \in \mathbb{C}$ such that
\[
p_n(z_n) = p_{n-1}(z_n) + C \prod_{k=0}^{n-1} (z_n - z_k) = f_n;
\]
since $\{z_j\}_{j=0}^n$ are distinct,
\[
C = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1} (z_n - z_k)}.
\]
$\therefore\ C$ depends on all the data points $\{(z_j, f_j)\}_{j=0}^n$.

Classical notation: $C = f[z_0, z_1, \ldots, z_n]$. This is called a divided difference of order $n$ (it depends on $(n+1)$ points).
\[
\therefore\ p_n(z) = p_{n-1}(z) + f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k),
\]
so the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \ldots, z_n]$.

Note that $p_n$ is unique and $p_n(z_j) = f_j$, $j = 0 \to n$,
\[
\Rightarrow f[z_{\pi_0}, z_{\pi_1}, \ldots, z_{\pi_n}] = f[z_0, z_1, \ldots, z_n]
\]
for any permutation $\pi$ of the points $\{z_0, z_1, \ldots, z_n\}$.
Lemma
If $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j \in \mathbb{C}$, $z_j$ distinct, then
\[
f[z_0, z_1, z_2, \ldots, z_n] = \sum_{j=0}^n \frac{f_j}{\prod_{\substack{k=0 \\ k\neq j}}^{n} (z_j - z_k)}.
\]
Furthermore, if $f_j = f(z_j)$, $j = 0 \to n$, for some function $f(z)$, then $f[z_0, z_1, \ldots, z_n] = 0$ if $f \in P_{n-1}$.
Proof
Compare the coefficient of $z^n$ in the Lagrange form of $p_n(z)$ with
\[
p_n(z) = p_{n-1}(z) + f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k). \tag{22}
\]
The coefficient of $z^n$ in (22) is $f[z_0, z_1, \ldots, z_n]$. Recall the Lagrange form
\[
p_n(z) = \sum_{j=0}^n f_j\, l_j(z) = \sum_{j=0}^n f_j \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{(z - z_k)}{(z_j - z_k)}, \tag{23}
\]
so the coefficient of $z^n$ in (23) is
\[
\sum_{j=0}^n f_j \prod_{\substack{k=0 \\ k\neq j}}^{n} \frac{1}{(z_j - z_k)},
\]
hence the result.

If $f_j = f(z_j)$, $j = 0 \to n$, where $f \in P_{n-1}$, then by the uniqueness of the interpolating polynomial,
\[
\Rightarrow p_n(z) = f(z) \in P_{n-1}.
\]

see diagram 20081204.M2AA3.1

The coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \ldots, z_n]$. But $p_n \in P_{n-1}$ in this case,
\[
\Rightarrow f[z_0, z_1, \ldots, z_n] = 0.
\]
Note that
\[
\underbrace{p_n(z)}_{\text{interpolates } \{(z_j,f_j)\}_{j=0}^n}
= \underbrace{p_{n-1}(z)}_{\text{interpolates } \{(z_j,f_j)\}_{j=0}^{n-1}}
+\ f[z_0, z_1, \ldots, z_n] \prod_{k=0}^{n-1} (z - z_k),
\]
\[
p_{n-1}(z) = \underbrace{p_{n-2}(z)}_{\{(z_j,f_j)\}_{j=0}^{n-2}}
+\ f[z_0, z_1, \ldots, z_{n-1}] \prod_{k=0}^{n-2} (z - z_k),
\]
\[
\vdots
\]
\[
\underbrace{p_1(z)}_{\{(z_j,f_j)\}_{j=0}^{1}} = \underbrace{p_0(z)}_{(z_0,f_0);\ f[z_0]=f_0} + f[z_0, z_1]\,(z - z_0),
\]
\[
\therefore\ p_n(z) = \underbrace{f[z_0]}_{f_0} + \sum_{j=1}^n f[z_0, \ldots, z_j] \prod_{k=0}^{j-1} (z - z_k).
\]
This is the Newton Form of the Interpolating Polynomial.

Note that $f[z_0, z_1, \ldots, z_j]$ is the coefficient of $z^j$ in $p_j(z)$, where $p_j \in P_j$ and $p_j(z_k) = f_k$, $k = 0 \to j$.
Theorem
For any distinct $z_0, z_1, \ldots, z_{n+1} \in \mathbb{C}$, the divided difference based on all the points satisfies
\[
\underbrace{f[z_0, z_1, \ldots, z_{n+1}]}_{n+2 \text{ points}}
= \frac{\overbrace{f[z_0, z_1, \ldots, z_n]}^{n+1 \text{ points}} - \overbrace{f[z_1, z_2, \ldots, z_{n+1}]}^{n+1 \text{ points}}}{z_0 - z_{n+1}}
\]
Proof
Given $\{(z_j, f_j)\}_{j=0}^{n+1}$, construct $p_n, q_n \in P_n$ such that
\[
p_n(z_j) = f_j,\ j = 0 \to n \Rightarrow \text{the coefficient of } z^n \text{ in } p_n(z) \text{ is } f[z_0, z_1, \ldots, z_n],
\]
\[
q_n(z_j) = f_j,\ j = 1 \to n+1 \Rightarrow \text{the coefficient of } z^n \text{ in } q_n(z) \text{ is } f[z_1, z_2, \ldots, z_{n+1}].
\]
Let
\[
r_{n+1}(z) = \frac{(z - z_{n+1})\, p_n(z) - (z - z_0)\, q_n(z)}{z_0 - z_{n+1}} \in P_{n+1}. \tag{24}
\]
\[
r_{n+1}(z_0) = p_n(z_0) = f_0,
\]
\[
r_{n+1}(z_j) = \frac{(z_j - z_{n+1}) f_j - (z_j - z_0) f_j}{z_0 - z_{n+1}} = f_j, \quad j = 1 \to n,
\]
\[
r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1},
\]
\[
\therefore\ r_{n+1}(z) \in P_{n+1} \text{ is such that } r_{n+1}(z_j) = f_j, \quad j = 0 \to n+1,
\]
so by uniqueness $r_{n+1} = p_{n+1}$. Compare the coefficient of $z^{n+1}$ in (24):
\[
\Rightarrow f[z_0, z_1, \ldots, z_{n+1}] = \frac{f[z_0, z_1, \ldots, z_n] - f[z_1, \ldots, z_{n+1}]}{z_0 - z_{n+1}},
\]
hence the result. This is the divided difference recurrence relation.
Divided Difference Tableau
\[
\begin{array}{llll}
z_0 & f[z_0] = f_0 & & \\[4pt]
z_1 & f[z_1] = f_1 & f[z_0,z_1] = \dfrac{f[z_0] - f[z_1]}{z_0 - z_1} & \\[4pt]
z_2 & f[z_2] = f_2 & f[z_1,z_2] = \dfrac{f[z_1] - f[z_2]}{z_1 - z_2} & f[z_0,z_1,z_2] = \dfrac{f[z_0,z_1] - f[z_1,z_2]}{z_0 - z_2} \\[4pt]
\vdots & \vdots & \vdots & \\[4pt]
z_n & f[z_n] = f_n & f[z_{n-1},z_n] = \dfrac{f[z_{n-1}] - f[z_n]}{z_{n-1} - z_n} & f[z_{n-2},z_{n-1},z_n] \quad \text{etc.}
\end{array}
\]
The diagonal entries in this tableau appear in the Newton form of $p_n(z)$.
Example
$n = 2$; $z_0 = 0$, $z_1 = 1$, $z_2 = 4$; $f_0 = a$, $f_1 = b$, $f_2 = c$.
\[
\begin{array}{llll}
z_0 = 0 & f[z_0] = a & & \\[4pt]
z_1 = 1 & f[z_1] = b & f[z_0,z_1] = \dfrac{a - b}{-1} = b - a & \\[4pt]
z_2 = 4 & f[z_2] = c & f[z_1,z_2] = \dfrac{b - c}{-3} = \dfrac{c - b}{3} & f[z_0,z_1,z_2] = \dfrac{(b-a) - \left(\frac{c-b}{3}\right)}{-4} = \dfrac{a}{4} - \dfrac{b}{3} + \dfrac{c}{12}
\end{array}
\]
so
\[
p_2(z) = f[z_0] + f[z_0,z_1]\,(z - z_0) + f[z_0,z_1,z_2]\,(z - z_0)(z - z_1)
\]
\[
= a + (b - a)\, z + \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right) z\, (z - 1).
\]
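The tableau and Newton form can be sketched in a few lines of Python (the values $a, b, c$ below are illustrative):

```python
def divided_differences(zs, fs):
    """Diagonal of the divided difference tableau:
    [f[z0], f[z0,z1], ..., f[z0,...,zn]], built in place by the
    recurrence f[z_i..z_j] = (f[z_{i+1}..z_j] - f[z_i..z_{j-1}])/(z_j - z_i)."""
    coeffs = list(fs)
    n = len(zs)
    for order in range(1, n):
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (zs[i] - zs[i - order])
    return coeffs

def newton_eval(zs, coeffs, z):
    """Evaluate the Newton form by Horner-like nesting."""
    p = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        p = coeffs[k] + (z - zs[k]) * p
    return p

# The worked example: z = (0, 1, 4), f = (a, b, c).
a, b, c = 2.0, -1.0, 5.0
zs, fs = [0.0, 1.0, 4.0], [a, b, c]
coeffs = divided_differences(zs, fs)   # [a, b - a, a/4 - b/3 + c/12]
```

Adding a new data point only appends one new coefficient — exactly the advantage over the Lagrange basis noted above.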
We may be interested in approximating a function $f(z)$ that is complicated to evaluate by a polynomial $p_n(z) \in P_n$: evaluate $f(z)$ at distinct points $\{z_j\}_{j=0}^n$, form the interpolating polynomial $p_n(z)$ with $p_n(z_j) = f(z_j)$, $j = 0 \to n$, and then approximate $f(z)$ by $p_n(z)$.
see diagram 20081209.M2AA3.1
Theorem
Let $p_n(z)$ interpolate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^n$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is such that
\[
e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j,\ j = 0 \to n.
\]
(Note that $e(z_j) = 0$, $j = 0 \to n$.)
Proof
$p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^n$. We now add a new point different from the points we already have, $z_{n+1} \neq z_j$, $j = 0 \to n$. Then $p_{n+1}(z) \in P_{n+1}$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n+1}$, and the Newton form of $p_{n+1}(z)$ is
\[
p_{n+1}(z) = p_n(z) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z - z_k)
\]
\[
\Rightarrow f(z_{n+1}) = p_{n+1}(z_{n+1}) = p_n(z_{n+1}) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k)
\]
\[
\Rightarrow e(z_{n+1}) = f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k),
\]
but $z_{n+1}$ was an arbitrary point with $z_{n+1} \neq z_j$, $j = 0 \to n$; writing $z_{n+1} = z$,
\[
\Rightarrow e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k). \quad \checkmark
\]
For the above result to be useful, we need to bound $f[z_0, z_1, \ldots, z_n, z]$.

We restrict ourselves from now on to the real case: $z_j = x_j \in \mathbb{R}$, $j = 0 \to n$, distinct, and $f(z) = f(x)$ a real function.
\[
f[x_j] = f(x_j), \quad j = 0 \to n,
\]
is the zero order divided difference, based on one point. The first order divided difference is based on 2 points, e.g.
\[
f[x_0, x_1] = \frac{f[x_0] - f[x_1]}{x_0 - x_1} = \frac{f(x_0) - f(x_1)}{x_0 - x_1}.
\]
Mean Value Theorem:
\[
f(x_1) = f(x_0) + \underbrace{(x_1 - x_0)}_{\text{distance moved}} f'(\xi), \quad \text{where } \xi \text{ lies between } x_0 \text{ and } x_1;
\]
this assumes that $f \in C^1[x_0, x_1]$ if $x_0 < x_1$ (or $C^1[x_1, x_0]$ if $x_1 < x_0$).
\[
\therefore\ f[x_0, x_1] = f'(\xi)
\]
— the first order divided difference is a derivative at an intermediate point.
Recall
\[
e(z) = f(z) - p_n(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j,\ j = 0 \to n
\]
$(e(z_j) = 0,\ j = 0 \to n)$
Theorem
Let $f \in C^n[x_0, x_n]$, i.e. $f$ and its first $n$ derivatives are continuous on $[x_0, x_n]$, where for ease of exposition we have assumed that the real interpolation points are ordered, $x_0 < x_1 < \cdots < x_n$. Then $\exists\, \xi \in [x_0, x_n]$ such that
\[
f[x_0, x_1, \ldots, x_n] = \frac{1}{n!}\, f^{(n)}(\xi)
\]
($n+1$ points: $n$th order divided difference).
Proof
Let $p_n \in P_n$ interpolate $f(x)$ at $x_i$, $i = 0 \to n$, and let
\[
e(x) = f(x) - p_n(x) \Rightarrow e(x_i) = 0, \quad i = 0 \to n,
\]
so $e(x)$ has at least $(n+1)$ zeros in $[x_0, x_n]$.

see diagram 20081210.M2AA3.1

Rolle's Theorem: $e'(x)$ has at least $n$ zeros in $[x_0, x_n]$; $e''(x)$ has at least $(n-1)$ zeros in $[x_0, x_n]$; $\ldots$; $e^{(n)}(x)$ has at least 1 zero in $[x_0, x_n]$. Let $\xi \in [x_0, x_n]$ be such that $e^{(n)}(\xi) = 0$, where
\[
e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x).
\]
Recall the Newton form of $p_n(x)$:
\[
p_n(x) = f[x_0, x_1, \ldots, x_n]\, x^n + \ldots
\Rightarrow p_n^{(n)}(x) = n!\, f[x_0, x_1, \ldots, x_n] \in \mathbb{R},
\]
\[
\therefore\ f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!\, f[x_0, x_1, \ldots, x_n],
\]
hence the result.
We now combine the above theorems.
Theorem
Let $f \in C^{n+1}[a,b]$, and let $\{x_i\}_{i=0}^n$ be distinct interpolation points in $[a,b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^n$, then the error $e(x) = f(x) - p_n(x)$ satisfies
\[
|e(x)| \leq \frac{1}{(n+1)!} \left| \prod_{i=0}^{n} (x - x_i) \right| \max_{a\leq y\leq b} \left| f^{(n+1)}(y) \right| \quad \forall x \in [a,b].
\]
Proof
The result is clearly true at the interpolation points $x = x_i$, $i = 0 \to n$, since $e(x_i) = 0$ and the product $\prod_{i=0}^{n} (x - x_i)$ also vanishes there ($0 \leq 0$ $\checkmark$).

For $x \neq x_i$, $i = 0 \to n$: the 1st theorem gives
\[
e(x) = f[x_0, x_1, \ldots, x_n, x] \prod_{k=0}^{n} (x - x_k),
\]
and the 2nd theorem (applied to the $n+2$ points $x_0, \ldots, x_n, x$) gives
\[
e(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \quad \text{for some } \xi \in [a,b],
\]
\[
\Rightarrow |e(x)| = \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \left| f^{(n+1)}(\xi) \right|
\leq \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a\leq y\leq b} \left| f^{(n+1)}(y) \right| \quad \checkmark
\]
Let $\|g\|_\infty = \max_{a\leq x\leq b} |g(x)|$ (the infinity norm). Then
\[
\therefore\ \|e\|_\infty \leq \frac{1}{(n+1)!} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \left\| f^{(n+1)} \right\|_\infty.
\]
Does ‖e‖∞ → 0 as n→∞, assuming f ∈ C∞ [a, b]?
Ex. 1
$[a,b] = \left[-\frac12, \frac12\right]$, $f(x) = e^x$. We know that
\[
x, x_i \in \left[-\tfrac12, \tfrac12\right] \Rightarrow |x - x_i| \leq 1
\Rightarrow \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \leq 1 \quad \forall n,
\]
\[
\left\| f^{(n+1)} \right\|_\infty = \|e^x\|_\infty = e^{\frac12},
\]
\[
\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!}\, e^{\frac12} \to 0 \ \text{ as } n \to \infty.
\]
Ex 2.
General $[a,b]$, $f(x) = \cos x$, so $\left\| f^{(n+1)} \right\|_\infty \leq 1$ and $x, x_i \in [a,b] \Rightarrow |x - x_i| \leq b - a$,
\[
\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!}\, (b-a)^{n+1} \to 0 \ \text{ as } n \to \infty.
\]
Ex 3.
$f(x) = (1+x)^{-1}$ on $[0,1]$: $f'(x) = -(1+x)^{-2}$, and in general
\[
f^{(n+1)}(x) = (-1)^{n+1}\, (n+1)!\, (1+x)^{-(n+2)}.
\]
Does $\|e\|_\infty \to 0$ as $n \to \infty$? No:
\[
\|f - p_n\|_\infty \nrightarrow 0 \ \text{ as } n \to \infty,
\]
see Sheet 5, Q12.
Can we choose the interpolation points {xi}ni=0 in a smart way?
Fix $[a,b]$, fix $n$, and suppose we are given $f$. Choose distinct interpolation points $\{x_i\}_{i=0}^n \subset [a,b]$ so as to minimise the product of factors:
\[
\min_{\{x_i\}_{i=0}^n} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty. \tag{25}
\]
Since $\prod_{i=0}^{n} (x - x_i)$ is a monic polynomial of degree $n+1$, i.e. $x^{n+1} - q_n(x)$ for some $q_n \in P_n$, this is related to
\[
\min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty. \tag{26}
\]
Solve (26), i.e. find $q_n^* \in P_n$ such that
\[
\left\| x^{n+1} - q_n^*(x) \right\|_\infty \leq \left\| x^{n+1} - q_n(x) \right\|_\infty \quad \forall q_n \in P_n.
\]
If $x^{n+1} - q_n^*(x)$ has $n+1$ distinct zeros $\{x_i\}_{i=0}^n$ in $[a,b]$, then we have solved (25):
\[
\min_{\{x_i\}_{i=0}^n} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty = \min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty.
\]
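The effect of the choice of points is easy to see numerically. A sketch on $[-1,1]$ comparing equispaced nodes with the Chebyshev points (the zeros of $T_{n+1}$, which, as the next section shows, achieve the optimal value $2^{-n}$):

```python
import numpy as np

n = 10
xs_fine = np.linspace(-1.0, 1.0, 2001)

def node_product_norm(nodes):
    """|| prod_i (x - x_i) ||_inf on [-1, 1], sampled on a fine grid."""
    vals = np.prod(xs_fine[:, None] - nodes[None, :], axis=1)
    return np.max(np.abs(vals))

equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # zeros of T_{n+1}

norm_equi = node_product_norm(equi)
norm_cheb = node_product_norm(cheb)   # approx 2^{-n}, much smaller than norm_equi
```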
5 Best Approximation in $\|\cdot\|_\infty$

(Best approximation in the uniform sense, or "minimax" approximation.)

Given $g \in C[a,b]$, find $q_n^* \in P_n$ such that
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n
\iff \|g - q_n^*\|_\infty = \min_{q_n \in P_n} \left\{ \max_{a\leq x\leq b} |g(x) - q_n(x)| \right\}
\]
Theorem
Let g ∈ C [a, b] and n ≥ 0.
Suppose $\exists\, q_n^* \in P_n$ and $(n+2)$ distinct points $\{x_j^*\}_{j=0}^{n+1}$, where $a \leq x_0^* < x_1^* < \cdots < x_n^* < x_{n+1}^* \leq b$, such that
\[
g(x_j^*) - q_n^*(x_j^*) = (-1)^j\, \sigma\, \|g - q_n^*\|_\infty \quad j = 0 \to n+1, \tag{27}
\]
where $\sigma = +1$ or $-1$. Then $q_n^* \in P_n$ is the Best Approximation to $g$ from $P_n$ in $\|\cdot\|_\infty$, i.e.
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n.
\]
Example: $n = 3$, $\sigma = +1$ and $E = \|g - q_n^*\|_\infty$
see diagram 20081211.M2AA3.1
5 alternating extremes.
Proof
Let $E = \|g - q_n^*\|_\infty$. If $E = 0$ then $q_n^* = g$ is the best approximation. Assume $E > 0$ and suppose $\exists\, q_n \in P_n$ doing better than $q_n^*$, i.e.
\[
\|g - q_n\|_\infty < \|g - q_n^*\|_\infty = E.
\]
Consider $q_n^* - q_n \in P_n$ at the $n+2$ points $\{x_j^*\}_{j=0}^{n+1}$:
\[
q_n^*(x_j^*) - q_n(x_j^*) = \left[ q_n^*(x_j^*) - g(x_j^*) \right] + \left[ g(x_j^*) - q_n(x_j^*) \right]
= (-1)^{j+1}\sigma E + \gamma_j, \quad |\gamma_j| < E,
\]
\[
\therefore\ \mathrm{sign}\left( (q_n^* - q_n)(x_j^*) \right) = \mathrm{sign}\left( (-1)^{j+1}\sigma E \right) \quad j = 0 \to n+1,
\]
$\therefore\ q_n^* - q_n \in P_n$ changes sign at least $n+1$ times, so $q_n^* - q_n \in P_n$ has $(n+1)$ distinct zeros.
\[
\text{F.T.A.} \Rightarrow q_n^* - q_n \equiv 0 \Rightarrow q_n = q_n^*,
\]
a contradiction to $\|g - q_n\|_\infty < \|g - q_n^*\|_\infty$.
$\therefore\ q_n^* \in P_n$ is the best approximation.

A polynomial satisfying condition (27) in the above theorem is said to have the Equioscillation Property (or the error $g - q_n^*$ is said to equioscillate; note that $q_n^*$ may degenerate and have degree $< n$ — see Sheet 5, Q10). The above theorem is one half of the Chebyshev Equioscillation Theorem:

Let $g \in C[a,b]$ and $n \geq 0$. Then $\exists$ a unique $q_n^* \in P_n$ such that
\[
\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n,
\]
and it satisfies (27).
Proof
Omitted (straightforward apparently...)
Construction of $q_n^*$ is difficult in general — hence why we study best least squares approximation and interpolation. However, for $g(x) = x^{n+1}$ it is easy to construct $q_n^*$.
Theorem
Let $[a,b] \equiv [-1,1]$ and consider $g(x) = x^{n+1}$. Then the best approximation to $x^{n+1}$ from $P_n$ in $\|\cdot\|_\infty$ on $[-1,1]$ is
\[
q_n^*(x) = x^{n+1} - 2^{-n}\, T_{n+1}(x),
\]
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n+1$.

Proof

Recall $T_n(x) = \cos\left(n \cos^{-1} x\right)$, $n \geq 0$, and the change of variable $\theta = \cos^{-1} x \iff x = \cos\theta$, $[-1,1] \iff [0,\pi]$, $T_n(x) = \cos n\theta$:
\[
T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x) \quad n \geq 1, \qquad T_0(x) = 1,\ T_1(x) = x
\]
\[
\Rightarrow T_{n+1}(x) = 2^n x^{n+1} + \ldots
\]
\[
\therefore\ q_n^*(x) = x^{n+1} - 2^{-n}\, T_{n+1}(x) \in P_n.
\]
$\therefore$ the error
\[
x^{n+1} - q_n^*(x) = 2^{-n}\, T_{n+1}(x) = 2^{-n} \cos(n+1)\theta.
\]
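The equioscillation of this error is easy to verify numerically: $2^{-n}T_{n+1}(x)$ attains $\pm 2^{-n}$ with alternating signs at the $n+2$ points $x_j^* = \cos\left(\frac{j\pi}{n+1}\right)$, $j = 0 \to n+1$ (a sketch):

```python
import numpy as np

# Error of the best approximation: x^{n+1} - q_n*(x) = 2^{-n} T_{n+1}(x).
n = 4
x_star = np.cos(np.arange(n + 2) * np.pi / (n + 1))    # extremes of T_{n+1}
err = 2.0 ** (-n) * np.cos((n + 1) * np.arccos(x_star))  # 2^{-n} T_{n+1}(x_j*)
```

Since $T_{n+1}(x_j^*) = \cos(j\pi) = (-1)^j$, the error alternates exactly as condition (27) requires, with $\|x^{n+1} - q_n^*\|_\infty = 2^{-n}$.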