M1 Numerical Analysis Course

Numerical Analysis, M1 SMA International

Ecole Centrale de Nantes

Anthony NOUY

[email protected]

Office: F231


Part I

Introduction

1 Origin of problems in numerical analysis

2 References


1 Origin of problems in numerical analysis


Origin of problems in numerical analysis I

How to translate reality into a computer language: from a continuous world to a discrete world.

Numerical solution of a differential equation

Find $u : x \in \Omega \mapsto u(x)$ such that

$$A(u) = b$$

Example 1 (1D diffusion equation, beam in traction, ...)

$$-\frac{d}{dx}\left(\alpha \frac{du}{dx}\right) = b(x) \quad \text{for } x \in \Omega = (0, 1), \qquad u(0) = u(1) = 0$$


Origin of problems in numerical analysis II

Approximation (from a continuous to a discrete representation)

Represent a function $u$ on a (finite-dimensional) approximation space:

$$u(x) = \sum_{i=1}^{n} u_i h_i(x)$$

The solution is then represented by $u = (u_1, \dots, u_n) \in \mathbb{R}^n$.

Different alternatives exist for defining the expansion (i.e. the coefficients $u_i$), such as methods based on a weak formulation of the problem.

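Example 2 (Galerkin approximation)

Find $u \in V = \{v : (0, 1) \to \mathbb{R} \,;\ v(0) = v(1) = 0\}$ such that

$$\int_\Omega \frac{dv}{dx}\, \alpha\, \frac{du}{dx}\, dx = \int_\Omega v\, b\, dx \quad \forall v \in V$$

and replace the function space $V$ by the approximation space

$$V_n = \left\{ v(x) = \sum_{i=1}^{n} v_i h_i(x) \right\} \subset V$$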
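To make the Galerkin step concrete, here is a minimal sketch under assumptions of our own: $\alpha = 1$, a constant right-hand side $b = 1$, a uniform mesh with $n$ interior nodes and spacing $\Delta x = 1/(n+1)$, and piecewise-linear "hat" functions $h_i$ (equal to 1 at node $i$ and 0 at the other nodes). With these choices, the entries $\int_\Omega \alpha h_i' h_j'\,dx$ form the tridiagonal matrix $(\alpha/\Delta x)\,\mathrm{tridiag}(-1, 2, -1)$ and $\int_\Omega b\, h_i\,dx = b\,\Delta x$. The function names are illustrative. The system is solved by Gauss elimination specialized to tridiagonal matrices, without pivoting (acceptable here because the matrix is symmetric positive definite).

```python
def assemble_p1(n, alpha=1.0, b=1.0):
    """Galerkin system for -(alpha u')' = b on (0, 1), u(0) = u(1) = 0,
    with n interior nodes and hat functions on a uniform mesh
    (alpha and b are assumed constant)."""
    dx = 1.0 / (n + 1)
    diag = [2.0 * alpha / dx] * n      # integral of alpha * h_i' * h_i'
    off = [-alpha / dx] * (n - 1)      # integral of alpha * h_i' * h_{i+1}'
    rhs = [b * dx] * n                 # integral of b * h_i
    return diag, off, rhs


def solve_tridiagonal(diag, off, rhs):
    """Solve a symmetric tridiagonal system by elimination without pivoting
    (Thomas algorithm): forward elimination, then backward substitution."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        m = off[i - 1] / d[i - 1]
        d[i] -= m * off[i - 1]
        r[i] -= m * r[i - 1]
    u = [0.0] * n
    u[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (r[i] - off[i] * u[i + 1]) / d[i]
    return u


n = 9
u = solve_tridiagonal(*assemble_p1(n))
nodes = [(i + 1) / (n + 1) for i in range(n)]
exact = [x * (1.0 - x) / 2.0 for x in nodes]   # exact solution for alpha = b = 1
print(max(abs(ui - ei) for ui, ei in zip(u, exact)))
```

For this particular 1D problem, the computed nodal values match the exact solution $u(x) = x(1-x)/2$ up to round-off, which is a known property of piecewise-linear elements in 1D. In general, the Galerkin solution is only an approximation.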


Origin of problems in numerical analysis III

If A is a linear operator, the initial continuous equation is then transformed into

Linear systems of equations

Find $u \in \mathbb{R}^n$ such that

$$Au = b$$

where $A \in \mathbb{R}^{n\times n}$ is a matrix and $b \in \mathbb{R}^n$ a vector.

To construct the system of equations (matrix $A$ and right-hand side $b$), one needs:

Numerical integration

$$\int_\Omega f(x)\, dx \approx \sum_{k=1}^{K} \omega_k f(x_k)$$
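To illustrate such a quadrature rule, here is a sketch using the two-point Gauss-Legendre rule. This rule is our choice, since the text does not fix one. Its nodes $x_k$ and weights $\omega_k$ on $[a, b]$ make it exact for polynomials of degree at most 3. The composite version applies the rule on $m$ equal subintervals.

```python
import math


def gauss2(f, a, b):
    """Two-point Gauss-Legendre rule on [a, b]: exact for polynomials of degree <= 3."""
    mid, half = (a + b) / 2.0, (b - a) / 2.0
    xs = [mid - half / math.sqrt(3.0), mid + half / math.sqrt(3.0)]   # nodes x_k
    ws = [half, half]                                                 # weights omega_k
    return sum(w * f(x) for w, x in zip(ws, xs))


def composite(rule, f, a, b, m):
    """Apply a quadrature rule on m equal subintervals of [a, b]."""
    h = (b - a) / m
    return sum(rule(f, a + j * h, a + (j + 1) * h) for j in range(m))


print(gauss2(lambda x: x ** 3, 0.0, 1.0))              # the exact integral is 1/4
print(composite(gauss2, math.sin, 0.0, math.pi, 4))    # the exact integral is 2
```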

If A is a nonlinear operator:


Origin of problems in numerical analysis IV

Nonlinear systems of equations

Find $u \in \mathbb{R}^n$ such that $A(u) = b$, where $A : u \in \mathbb{R}^n \mapsto A(u) \in \mathbb{R}^n$.

Remedy: iterative solution techniques, which transform the solution of a nonlinear equation into the solution of a sequence of linear equations.

Example 3

$$-\frac{d}{dx}\left(\alpha(x, u) \frac{du}{dx}\right) = b(x, u) \quad \text{for } x \in \Omega = (0, 1), \qquad u(0) = u(1) = 0$$
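Newton's method is one classical iterative technique of this kind, although the text does not single out a specific method. Each iteration solves the linear system $J(u_k)\,\delta = b - A(u_k)$, where $J$ is the Jacobian matrix of $A$, and sets $u_{k+1} = u_k + \delta$. The sketch below applies it to a hypothetical $2 \times 2$ system with $A(u) = (u_1^2 + u_2^2,\ u_1 - u_2)$ and $b = (4, 0)$. For brevity, each $2 \times 2$ linear system is solved by Cramer's rule.

```python
import math


def A_op(u):
    """Hypothetical nonlinear operator A(u) = (u1^2 + u2^2, u1 - u2)."""
    return [u[0] ** 2 + u[1] ** 2, u[0] - u[1]]


def jacobian(u):
    """Jacobian matrix of A_op at u."""
    return [[2.0 * u[0], 2.0 * u[1]], [1.0, -1.0]]


def solve_2x2(J, r):
    """Solve the 2x2 linear system J d = r by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(r[0] * J[1][1] - J[0][1] * r[1]) / det,
            (J[0][0] * r[1] - r[0] * J[1][0]) / det]


def newton(u, b, tol=1e-12, max_iter=50):
    """Newton iterations: one linear solve per step."""
    for _ in range(max_iter):
        Au = A_op(u)
        r = [b[0] - Au[0], b[1] - Au[1]]       # residual b - A(u_k)
        if math.hypot(r[0], r[1]) < tol:
            break
        d = solve_2x2(jacobian(u), r)           # J(u_k) d = b - A(u_k)
        u = [u[0] + d[0], u[1] + d[1]]
    return u


u = newton([1.0, 0.5], b=[4.0, 0.0])
print(u)   # both components approach sqrt(2)
```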

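Eigenproblems

Find $(u, \lambda) \in \mathbb{C}^n \times \mathbb{C}$, with $u \neq 0$, such that

$$Au = \lambda u \quad \text{or} \quad Au = \lambda B u$$

where $A, B \in \mathbb{C}^{n\times n}$ are matrices.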


Origin of problems in numerical analysis V

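Example 4 (Eigenmodes of a beam)

Wave equation: find $u(x, t)$ such that

$$-\frac{\partial}{\partial x}\left(\alpha \frac{\partial u}{\partial x}\right) + \rho \frac{\partial^2 u}{\partial t^2} = 0 \quad \text{for } x \in \Omega = (0, 1), \qquad u(0, t) = u(1, t) = 0$$

for which we search solutions of the form $u(x, t) = w(x)\cos(\omega t)$:

$$w \in V_n, \qquad \int_\Omega \frac{\partial v}{\partial x}\, \alpha\, \frac{\partial w}{\partial x}\, dx = \omega^2 \int_\Omega \rho\, v\, w\, dx \quad \forall v \in V_n$$

Ordinary differential equations in time

$$\frac{d}{dt} u(t) + A(u(t); t) = b(t)$$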


2 References


References for the course

G. Allaire and S. M. Kaber.

Numerical linear algebra. Springer, 2007.
→ materials for chapters 1 (Linear algebra), 2 (Linear systems), 3 (Eigenvalues)

K. Atkinson and W. Han.

Theoretical Numerical Analysis: A Functional Analysis Framework. Springer, 2009.
→ materials for chapters 4 (Nonlinear equations), 5 (Approximation/Interpolation)
→ a quite abstract introduction to numerical analysis (very instructive), with an introduction to functional analysis

E. Süli and D. Mayers.

An Introduction to Numerical Analysis. Cambridge University Press, 2003.
→ a clear and simple presentation of all the ingredients of the course

G. Allaire.

Numerical Analysis and Optimization. Oxford University Press, 2007.
→ additional material for the numerical solution of PDEs and optimization problems
→ a natural continuation of the course


Part II

Linear algebra

3 Matrices

4 Reduction of matrices

5 Vector and matrix norms


3 Matrices




Vector space

Let $V$ be a vector space of finite dimension $n$ over the field $K$ ($\mathbb{R}$ or $\mathbb{C}$). Let $E = \{e_1, \dots, e_n\}$ be a basis of $V$. A vector $v \in V$ admits a unique decomposition

$$v = \sum_{i=1}^{n} v_i e_i$$

where the $(v_i)_{i=1}^n$ are the components of $v$ in the basis $E$. When a basis is chosen and there is no ambiguity, we can identify $V$ with $K^n$ ($\mathbb{R}^n$ or $\mathbb{C}^n$) and let $v = (v_i)_{i=1}^n$, represented by the column vector

$$v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$$

We denote respectively by $v^T$ and $v^H$ the transpose and the conjugate transpose of $v$, which are the following row vectors

$$v^T = \begin{pmatrix} v_1 & \dots & v_n \end{pmatrix}, \qquad v^H = \begin{pmatrix} \bar v_1 & \dots & \bar v_n \end{pmatrix}$$

where $\bar a$ denotes the complex conjugate of $a$.


Canonical inner product

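We denote by $(\cdot, \cdot) : V \times V \to K$ the canonical inner product, defined for all $u, v \in V$ by

$$(u, v) = u^T v = v^T u = \sum_{i=1}^{n} u_i v_i \quad \text{if } K = \mathbb{R}$$

$$(u, v) = u^H v = \overline{v^H u} = \sum_{i=1}^{n} \bar u_i v_i \quad \text{if } K = \mathbb{C}$$

It is called the Euclidean inner product if $K = \mathbb{R}$ and the Hermitian inner product if $K = \mathbb{C}$.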
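Here is a minimal sketch of the canonical inner product, assuming vectors are stored as Python lists of floats or complex numbers. The name `inner` is ours. Conjugating the first argument matches the convention $(u, v) = u^H v$. For real entries, `conjugate()` leaves the values unchanged, which gives $u^T v$.

```python
def inner(u, v):
    """Canonical inner product (u, v) = u^H v = sum_i conj(u_i) * v_i.
    For real vectors conjugate() does nothing, so this is also u^T v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


u = [1j, 2]
v = [1, 1j]
print(inner(u, v))   # conj(1j) * 1 + 2 * 1j = 1j
print(inner(v, u))   # the complex conjugate of (u, v), i.e. -1j
print(inner(u, u))   # |1j|^2 + |2|^2 = 5 (stored as a complex number)
```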


Orthogonality

Orthogonality on a vector space $V$ must be understood with respect to an inner product $(\cdot, \cdot)$. If not mentioned, we consider the canonical inner product. Two vectors $u, v \in V$ are said to be orthogonal with respect to the inner product $(\cdot, \cdot)$ if and only if $(u, v) = 0$. A vector $v$ is said to be orthogonal to a linear subspace $U \subset V$, which is denoted $v \perp U$, if and only if $(v, u) = 0$ for all $u \in U$. Two linear subspaces $U \subset V$ and $U' \subset V$ are said to be orthogonal, which is denoted $U \perp U'$, if

$$(u, u') = 0 \quad \forall u \in U, \ \forall u' \in U'$$

For a given subspace $U \subset V$, we denote by $U^\perp$ its orthogonal complement, which is the largest subspace orthogonal to $U$. The orthogonal complement of a vector $v \in V$ is denoted by $v^\perp$.


Matrices

Let $V$ and $W$ be two vector spaces of dimensions $n$ and $m$ respectively, with bases $E = (e_j)_{j=1}^n$ and $F = (f_i)_{i=1}^m$. A linear map $\mathcal{A} : V \to W$ is represented, relative to these bases, by a matrix $A$ with $m$ rows and $n$ columns

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

where the coefficients $a_{ij}$ are such that

$$\mathcal{A} e_j = \sum_{i=1}^{m} a_{ij} f_i, \quad 1 \le j \le n$$

We denote $(A)_{ij} = a_{ij}$. The $j$-th column of $A$ represents the vector $\mathcal{A} e_j$ in the basis $F$.

Definition 5

The set of matrices with $m$ rows and $n$ columns with entries in the field $K$ is a vector space, denoted $\mathcal{M}_{m,n}(K)$ or $K^{m\times n}$.


Transpose

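We denote by $A^H$ the adjoint (or conjugate transpose) of a complex matrix $A = (a_{ij}) \in \mathbb{C}^{m\times n}$, defined by

$$(A^H)_{ij} = \overline{a_{ji}}$$

We denote by $A^T$ the transpose of a real matrix $A = (a_{ij}) \in \mathbb{R}^{m\times n}$, defined by

$$(A^T)_{ij} = a_{ji}$$

We have the following characterization of $A^H$ and $A^T$:

$$(Au, v) = (u, A^H v) \quad \forall u \in \mathbb{C}^n,\ v \in \mathbb{C}^m$$

$$(Au, v) = (u, A^T v) \quad \forall u \in \mathbb{R}^n,\ v \in \mathbb{R}^m$$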
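The characterization $(Au, v) = (u, A^H v)$ can be checked numerically. The sketch below stores matrices as lists of rows and uses helper names of our own. It builds $A^H$ entrywise from $(A^H)_{ij} = \overline{a_{ji}}$ and compares both sides for an arbitrary complex $3 \times 2$ matrix.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def adjoint(A):
    """Conjugate transpose A^H, with (A^H)_ij = conj(a_ji)."""
    m, n = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(m)] for j in range(n)]


def inner(u, v):
    """Canonical inner product (u, v) = u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


A = [[1 + 2j, 3], [0, 1j], [2, -1j]]    # A in C^{3x2}
u = [1, 1j]                              # u in C^2
v = [2, 1 - 1j, 1j]                      # v in C^3
lhs = inner(matvec(A, u), v)             # (Au, v)
rhs = inner(u, matvec(adjoint(A), v))    # (u, A^H v)
print(lhs, rhs)
```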


Product

To the composition of two linear maps corresponds the product of the associated matrices. If $A = (a_{ik}) \in K^{m\times q}$ and $B = (b_{kj}) \in K^{q\times n}$, the product $AB \in K^{m\times n}$ is defined by

$$(AB)_{ij} = \sum_{k=1}^{q} a_{ik} b_{kj}$$

We have

$$(AB)^T = B^T A^T, \qquad (AB)^H = B^H A^H$$

The set of square matrices $\mathcal{M}_{n,n}(K)$ is simply denoted $\mathcal{M}_n(K) = K^{n\times n}$. In the following, unless otherwise mentioned, we only consider square matrices.


Inverse

We denote by $I_n$ the identity matrix in $K^{n\times n}$, associated with the identity map from $V$ to $V$. If there is no ambiguity, we simply write $I_n = I$, with

$$(I)_{ij} = \delta_{ij}$$

where $\delta_{ij}$ is the Kronecker delta. A matrix $A$ is invertible if there exists a matrix, denoted $A^{-1}$ (unique if it exists) and called the inverse of $A$, such that $AA^{-1} = A^{-1}A = I$. A matrix which is not invertible is said to be singular. If $A$ and $B$ are invertible, we have

$$(AB)^{-1} = B^{-1}A^{-1}, \qquad (A^T)^{-1} = (A^{-1})^T \equiv A^{-T}, \qquad (A^H)^{-1} = (A^{-1})^H \equiv A^{-H}$$


Particular matrices

Definition 6

A matrix $A \in \mathbb{C}^{n\times n}$ is said to be

Hermitian if $A = A^H$

Normal if $AA^H = A^HA$

Unitary if $AA^H = A^HA = I$

Definition 7

A matrix $A \in \mathbb{R}^{n\times n}$ is said to be

Symmetric if $A = A^T$

Orthogonal if $AA^T = A^TA = I$


Particular matrices

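A matrix $A \in K^{n\times n}$ is said to be diagonal if $a_{ij} = 0$ for $i \neq j$, and we denote

$$A = \operatorname{diag}(a_{ii}) = \operatorname{diag}(a_{11}, \dots, a_{nn}) = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{pmatrix}$$

A matrix $A$ is said to be upper triangular if $a_{ij} = 0$ for $i > j$:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & a_{nn} \end{pmatrix}$$

A matrix $A$ is said to be lower triangular if $a_{ij} = 0$ for $j > i$:

$$A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$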


Properties of triangular matrices

Let $L_n \subset K^{n\times n}$ be the set of lower triangular matrices, and $U_n \subset K^{n\times n}$ be the set of upper triangular matrices.

Theorem 8

If A,B ∈ Ln, then AB ∈ Ln

If A,B ∈ Un, then AB ∈ Un

A ∈ Ln (or Un) is invertible if and only if all its diagonal terms are nonzero.

If A ∈ Ln, A−1 ∈ Ln (if it exists)

If A ∈ Un, A−1 ∈ Un (if it exists)


Trace

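Definition 9

The trace of a matrix $A \in K^{n\times n}$ is defined as

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

Property 10

$$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B), \qquad \operatorname{tr}(AB) = \operatorname{tr}(BA)$$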


Determinant

Let $S_n$ denote the set of permutations of $\{1, \dots, n\}$. For $\sigma \in S_n$, we denote by $\operatorname{sign}(\sigma)$ the signature of the permutation, with $\operatorname{sign}(\sigma) = +1$ (resp. $-1$) if $\sigma$ is an even (resp. odd) permutation of $\{1, \dots, n\}$.

Definition 11

The determinant of a matrix $A \in K^{n\times n}$ is defined as

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\, a_{\sigma(1)1} \cdots a_{\sigma(n)n}$$

Property 12

$$\det(AB) = \det(BA) = \det(A)\det(B)$$
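Definition 11 translates literally into code, as in the sketch below (function names are ours). The sum has $n!$ terms, so this approach is only usable for very small $n$. The Gauss elimination described later computes $\det(A)$ in $O(n^3)$ operations. The example also checks Property 12 on two integer matrices, for which the arithmetic is exact.

```python
from itertools import permutations


def sign(perm):
    """Signature of a permutation: (-1)^(number of inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_leibniz(A):
    """det(A) = sum over sigma of sign(sigma) * a_{sigma(1),1} * ... * a_{sigma(n),n}."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j]][j]
        total += term
    return total


def matmul(A, B):
    """Matrix product (AB)_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]
print(det_leibniz(A), det_leibniz(B))                                    # 18 31
print(det_leibniz(matmul(A, B)) == det_leibniz(A) * det_leibniz(B))      # True
```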


Image, Kernel I

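Definition 13

The image of $A \in K^{m\times n}$ is the linear subspace of $K^m$ defined by

$$\operatorname{Im}(A) = \{Av \,;\ v \in K^n\}$$

The rank of a matrix $A$, denoted $\operatorname{rank}(A)$, is the dimension of $\operatorname{Im}(A)$:

$$\operatorname{rank}(A) = \dim(\operatorname{Im}(A)) \le \min(m, n)$$

Definition 14

The kernel of $A \in K^{m\times n}$ is the linear subspace of $K^n$ defined by

$$\operatorname{Ker}(A) = \{v \in K^n \,;\ Av = 0\}$$

The dimension of $\operatorname{Ker}(A)$ is called the nullity of $A$.

Property 15

$$\dim(\operatorname{Im}(A)) + \dim(\operatorname{Ker}(A)) = n$$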


Image, Kernel II

Property 16

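For $A \in \mathbb{R}^{m\times n}$,

$$\operatorname{Ker}(A^T) \oplus \operatorname{Im}(A) = \mathbb{R}^m, \qquad \operatorname{Ker}(A^T) = \operatorname{Im}(A)^\perp$$

$$\operatorname{Ker}(A) \oplus \operatorname{Im}(A^T) = \mathbb{R}^n, \qquad \operatorname{Ker}(A) = \operatorname{Im}(A^T)^\perp$$

Proof.

Let us prove that $\operatorname{Ker}(A^T) = \operatorname{Im}(A)^\perp$, which implies $\operatorname{Ker}(A^T) \oplus \operatorname{Im}(A) = \mathbb{R}^m$. First, $u \in \operatorname{Ker}(A^T) \Rightarrow A^Tu = 0 \Rightarrow v^TA^Tu = 0 \ \forall v \Rightarrow u^Ty = 0 \ \forall y \in \operatorname{Im}(A) \Rightarrow \operatorname{Ker}(A^T) \subset \operatorname{Im}(A)^\perp$. Secondly, $u \in \operatorname{Im}(A)^\perp \Rightarrow u^TAv = 0 \ \forall v \Rightarrow v^T(A^Tu) = 0 \ \forall v \Rightarrow A^Tu = 0 \Rightarrow \operatorname{Im}(A)^\perp \subset \operatorname{Ker}(A^T)$.

Exercise.

Finish the proof.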


Eigenvalues and eigenvectors I

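Definition 17

The eigenvalues $\lambda_i = \lambda_i(A)$, $1 \le i \le n$, of a matrix $A \in K^{n\times n}$ are the $n$ roots (counted with multiplicity) of its characteristic polynomial

$$p_A : \lambda \in \mathbb{C} \mapsto p_A(\lambda) = \det(A - \lambda I)$$

The eigenvalues may be real or complex. An eigenvalue is said to be of multiplicity $k$ if it is a root of $p_A$ with multiplicity $k$. The spectrum of the matrix $A$ is the following subset of the complex plane

$$\operatorname{sp}(A) = \{\lambda_i(A)\}_{i=1}^{n}$$

We have

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i(A), \qquad \det(A) = \prod_{i=1}^{n} \lambda_i(A)$$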


Eigenvalues and eigenvectors II

Definition 18

The spectral radius $\rho(A)$ of a matrix $A$ is defined by

$$\rho(A) = \max_{1 \le i \le n} |\lambda_i(A)|$$

Property 19

$\lambda \in \operatorname{sp}(A)$ if and only if the following equation has (at least) one nontrivial solution $v \in \mathbb{C}^n \setminus \{0\}$:

$$Av = \lambda v$$

Definition 20

For $\lambda \in \operatorname{sp}(A)$, a nonzero vector $v$ satisfying $Av = \lambda v$ is called an eigenvector of $A$ associated with $\lambda$. The linear subspace $\{v \in K^n \,;\ Av = \lambda v\}$ (of dimension at least one) is called the eigenspace associated with $\lambda$.


4 Reduction of matrices




Reduction of matrices

Let $V$ be a vector space of dimension $n$ and $\mathcal{A} : V \to V$ a linear map on $V$. Let $A$ be the matrix associated with $\mathcal{A}$, relative to the basis $E = (e_i)_{i=1}^n$ of $V$.

Relative to another basis $F = (f_i)_{i=1}^n$ of $V$, the map $\mathcal{A}$ is associated with another matrix $B$ such that

$$B = P^{-1}AP$$

where $P$ is the invertible matrix whose $j$-th column is composed of the components of $f_j$ in the basis $E$.

Definition 21

Matrices $A$ and $B$ are said to be similar when they represent the same linear map in two different bases, i.e. when there exists an invertible matrix $P$ such that $B = P^{-1}AP$.


Theorem 22 (Triangularization)

For $A \in \mathbb{C}^{n\times n}$, there exists a unitary matrix $U$ such that $U^{-1}AU$ is a triangular matrix. When it is upper triangular, it is called the Schur form of $A$.

Remark.

The previous theorem says that there exist a nested sequence of $A$-invariant subspaces $\{0\} = V_0 \subset V_1 \subset \dots \subset V_n = \mathbb{C}^n$ and an orthonormal basis of $\mathbb{C}^n$ such that $V_i$ is the span of the first $i$ basis vectors.

Theorem 23 (Diagonalization)

For a normal matrix $A \in \mathbb{C}^{n\times n}$, i.e. such that $A^HA = AA^H$, there exists a unitary matrix $U$ such that $U^{-1}AU$ is diagonal.

For a symmetric matrix $A \in \mathbb{R}^{n\times n}$, there exists an orthogonal matrix $O$ such that $O^{-1}AO$ is diagonal.


Singular values and vectors

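Definition 24

The singular values of $A \in K^{m\times n}$ are the eigenvalues of $\sqrt{A^HA} \in K^{n\times n}$, i.e. the square roots of the eigenvalues of the Hermitian positive semidefinite matrix $A^HA$.

The singular values of $A$ are nonnegative real numbers.

Definition 25

$\sigma \in \mathbb{R}_+$ is a singular value of $A$ if and only if there exist normalized vectors $u \in K^m$ and $v \in K^n$ such that we have simultaneously

$$Av = \sigma u \quad \text{and} \quad A^Hu = \sigma v$$

$u$ and $v$ are respectively called the left and right singular vectors of $A$ associated with the singular value $\sigma$.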


Singular value decomposition (SVD) I

Theorem 26

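For $A \in K^{m\times n}$, there exist two orthogonal (if $K = \mathbb{R}$) or unitary (if $K = \mathbb{C}$) matrices $U \in K^{m\times m}$ and $V \in K^{n\times n}$ such that

$$A = USV^H$$

where $S = \operatorname{diag}(\sigma_i) \in \mathbb{R}^{m\times n}$ is a diagonal matrix, with $\sigma_i$ the singular values of $A$. The columns of $U$ are the left singular vectors of $A$, and the columns of $V$ are the right singular vectors of $A$.

If $n = m$, $S = \operatorname{diag}(\sigma_i) = \operatorname{diag}(\sigma_1, \dots, \sigma_m)$. If $n \neq m$, $S = \operatorname{diag}(\sigma_i) \in \mathbb{R}^{m\times n}$ must be interpreted as follows ($0_{kl}$ is a $k \times l$ matrix with zero entries):

$$S = \begin{pmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_m) & 0_{m(n-m)} \end{pmatrix} \ \text{if } n > m, \qquad S = \begin{pmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_n) \\ 0_{(m-n)n} \end{pmatrix} \ \text{if } n < m$$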


Truncated Singular Value Decomposition (SVD)

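The SVD of $A$ can be written

$$A = USV^H = \sum_{i=1}^{\min(n,m)} \sigma_i u_i v_i^H$$

where $u_i$ and $v_i$ are the columns of $U$ and $V$. After ordering the singular values in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots$), the matrix $A$ can be approximated by a rank-$K$ matrix $A_K$ obtained by truncating the SVD:

$$A_K = \sum_{i=1}^{K} \sigma_i u_i v_i^H$$

We have the following error estimate in the Frobenius norm:

$$\|A - A_K\|_F^2 = \sum_{i=K+1}^{\min(n,m)} \sigma_i^2, \qquad \text{i.e.} \qquad \frac{\|A - A_K\|_F^2}{\|A\|_F^2} = \frac{\sum_{i=K+1}^{\min(n,m)} \sigma_i^2}{\sum_{i=1}^{\min(n,m)} \sigma_i^2}$$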


Illustration: SVD for data compression

[Figure: initial image (778 × 643), its singular values, and its rank-10 SVD approximation]

5 Vector and matrix norms




Vector norms

Definition 27

A norm on a vector space $V$ is a map $\|\cdot\| : V \to \mathbb{R}_+$ satisfying

$\|v\| = 0$ if and only if $v = 0$

$\|\alpha v\| = |\alpha|\,\|v\|$ for all $v \in V$ and all $\alpha \in K$

$\|u + v\| \le \|u\| + \|v\|$ for all $u, v \in V$ (triangle inequality)

Example 28 (For $V = K^n$)

(2-norm) $\|v\|_2 = \left(\sum_{i=1}^n |v_i|^2\right)^{1/2}$

(1-norm) $\|v\|_1 = \sum_{i=1}^n |v_i|$

(∞-norm) $\|v\|_\infty = \max_{i \in \{1, \dots, n\}} |v_i|$

(p-norm) $\|v\|_p = \left(\sum_{i=1}^n |v_i|^p\right)^{1/p}$ for $p \ge 1$.
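The norms of Example 28 can be transcribed directly, as in the following sketch. The function name `p_norm` is ours, and $p = \infty$ is handled as a separate case.

```python
def p_norm(v, p):
    """p-norm (sum |v_i|^p)^(1/p) for p >= 1; p = float('inf') gives max |v_i|."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


v = [3.0, -4.0]
print(p_norm(v, 1), p_norm(v, 2), p_norm(v, float("inf")))   # 7.0 5.0 4.0
print(p_norm(v, 3))   # lies between the infinity-norm and the 2-norm here
```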


Useful inequalities

$(\cdot, \cdot)$ denotes the canonical inner product.

Theorem 29 (Cauchy-Schwarz inequality)

$$|(u, v)| \le \|u\|_2 \|v\|_2$$

Theorem 30 (Hölder's inequality)

Let $1 \le p, q \le \infty$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$|(u, v)| \le \|u\|_p \|v\|_q$$

Theorem 31 (Minkowski inequality)

Let $1 \le p \le \infty$. Then

$$\|u + v\|_p \le \|u\|_p + \|v\|_p$$

The Minkowski inequality is in fact the triangle inequality for the norm $\|\cdot\|_p$.


Matrix norms I

Definition 32

A norm on $K^{m\times n}$ is a map $\|\cdot\| : K^{m\times n} \to \mathbb{R}_+$ which satisfies

$\|A\| = 0$ if and only if $A = 0$

$\|\alpha A\| = |\alpha|\,\|A\|$ for all $A \in K^{m\times n}$ and all $\alpha \in K$

$\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in K^{m\times n}$ (triangle inequality)

For square matrices ($n = m$), a matrix norm is a norm which satisfies the following additional inequality

$$\|AB\| \le \|A\|\,\|B\| \quad \text{for all } A, B \in K^{n\times n}$$

An important class of matrix norms is the class of subordinate matrix norms.

Definition 33 (subordinate matrix norm)

Given norms $\|\cdot\|$ on $K^n$ and $K^m$, we can define a natural norm on $K^{m\times n}$, subordinate to the vector norms, by

$$\|A\| = \max_{v \in K^n,\ v \neq 0} \frac{\|Av\|}{\|v\|} = \max_{v \in K^n,\ \|v\| \le 1} \|Av\| = \max_{v \in K^n,\ \|v\| = 1} \|Av\|$$


Matrix norms II

Example 34

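When considering the classical vector norms on $K^n$, we have the following characterization of the subordinate norms of a square matrix $A \in K^{n\times n}$:

$$\|A\|_1 = \max_{v \neq 0} \frac{\|Av\|_1}{\|v\|_1} = \max_j \sum_i |a_{ij}|$$

$$\|A\|_\infty = \max_{v \neq 0} \frac{\|Av\|_\infty}{\|v\|_\infty} = \max_i \sum_j |a_{ij}|$$

$$\|A\|_2 = \max_{v \neq 0} \frac{\|Av\|_2}{\|v\|_2} = \sqrt{\rho(A^HA)} = \sqrt{\rho(AA^H)} = \|A^H\|_2$$

Note that $\|A\|_2$ is the largest singular value of $A$.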
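The 1-norm and ∞-norm formulas can be transcribed directly. For the 2-norm, the sketch below estimates $\sqrt{\rho(A^TA)}$ for a real matrix by power iteration on $A^TA$. This iterative estimate is our choice for illustration and is not a method prescribed in the text. It assumes that $A^TA$ has a dominant eigenvalue and that the starting vector is not orthogonal to the corresponding eigenvector. Function names are ours.

```python
import math


def norm_1(A):
    """Subordinate 1-norm: maximum absolute column sum."""
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))


def norm_inf(A):
    """Subordinate infinity-norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_2(A, iters=500):
    """Subordinate 2-norm sqrt(rho(A^T A)) of a real matrix, estimated by
    power iteration on the symmetric positive semidefinite matrix A^T A."""
    n = len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
           for i in range(n)]
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(AtA[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(t * t for t in y))   # ||A^T A x||, x normalized after step 1
        x = [t / lam for t in y]
    return math.sqrt(lam)


A = [[1.0, -2.0], [3.0, 4.0]]
print(norm_1(A), norm_inf(A), norm_2(A))
```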

Property 35

For every unitary matrix $U$ (i.e. $UU^H = I$), we have

$$\|A\|_2 = \|AU\|_2 = \|UA\|_2 = \|U^HAU\|_2$$

If $A$ is normal (i.e. $AA^H = A^HA$), then $\|A\|_2 = \rho(A)$.


Matrix norms III

Theorem 36

Let $A$ be a square matrix and $\|\cdot\|$ an arbitrary matrix norm. Then

$$\rho(A) \le \|A\|$$

For every $\varepsilon > 0$, there exists at least one subordinate matrix norm such that

$$\|A\| \le \rho(A) + \varepsilon$$


Part III

Systems of linear equations

6 Conditioning

7 Direct methods: Triangular systems; Gauss elimination; LU factorization; Cholesky factorization; Householder method and QR factorization; Computational work

8 Iterative methods: Generalities; Jacobi, Gauss-Seidel, Relaxation; Projection methods; Krylov subspace methods


The aim is to introduce different strategies for the solution of a system of linear equations

$$Ax = b$$

with $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$.


6 Conditioning




Condition number

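Let us consider the following two systems of equations:

$$\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix} x = \begin{pmatrix} 32 \\ 23 \\ 33 \\ 31 \end{pmatrix} \;\Rightarrow\; x = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$

$$\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix} x = \begin{pmatrix} 32.1 \\ 22.9 \\ 33.1 \\ 30.9 \end{pmatrix} \;\Rightarrow\; x = \begin{pmatrix} 9.2 \\ -12.6 \\ 4.5 \\ -1.1 \end{pmatrix}$$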

We observe that a small modification of the right-hand side leads to a large modification of the solution. If an error is made on the input data (here the right-hand side), the error on the solution may be drastically amplified. This phenomenon is due to the bad conditioning of the matrix $A$. It shows that, for badly conditioned matrices, solutions of systems of equations obtained with finite-precision computers must be considered with care, and may not even be acceptable as good solutions.
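The experiment can be reproduced with a short Gauss elimination solver. This is a sketch, and the helper names are ours. In the ∞-norm, the relative error on the solution is about 4488 times the relative error on the right-hand side. For this matrix, 4488 equals $\operatorname{cond}_\infty(A) = \|A\|_\infty\|A^{-1}\|_\infty = 33 \times 136$, the condition number defined just below. The perturbation used here is therefore a worst case for the bound of Property 38.

```python
def solve(A, b):
    """Gauss elimination with partial pivoting on the augmented matrix [A | b]."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                         # backward substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def norm_inf(v):
    """Infinity-norm of a vector."""
    return max(abs(t) for t in v)


A = [[10, 7, 8, 7], [7, 5, 6, 5], [8, 6, 10, 9], [7, 5, 9, 10]]
b = [32, 23, 33, 31]
b_eps = [32.1, 22.9, 33.1, 30.9]

x = solve(A, b)
x_eps = solve(A, b_eps)
print([round(t, 6) for t in x])       # close to [1, 1, 1, 1]
print([round(t, 6) for t in x_eps])   # close to [9.2, -12.6, 4.5, -1.1]

rel_b = norm_inf([p - q for p, q in zip(b_eps, b)]) / norm_inf(b)
rel_x = norm_inf([p - q for p, q in zip(x_eps, x)]) / norm_inf(x)
amplification = rel_x / rel_b
print(amplification)   # about 4488
```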


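Definition 37

Let $A \in K^{n\times n}$ be an invertible matrix and let $\|\cdot\|$ be a matrix norm subordinate to the vector norm $\|\cdot\|$. The condition number of $A$ is defined as

$$\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|$$

Let $b \in K^n$ be the right-hand side of a system and let $\delta A \in K^{n\times n}$ and $\delta b \in K^n$ be perturbations of the matrix $A$ and the vector $b$.

Property 38

If $x$ and $x_\varepsilon$ are the solutions of the following systems

$$Ax = b, \qquad A_\varepsilon x_\varepsilon = b_\varepsilon,$$

with $\|A - A_\varepsilon\| = O(\varepsilon)$ and $\|b - b_\varepsilon\| = O(\varepsilon)$, then

$$\frac{\|x - x_\varepsilon\|}{\|x\|} \le \operatorname{cond}(A)\left(\frac{\|A - A_\varepsilon\|}{\|A\|} + \frac{\|b - b_\varepsilon\|}{\|b\|}\right) + O(\varepsilon^2)$$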


Property 39

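For every invertible matrix $A$ and every matrix norm, $\operatorname{cond}(A) \ge 1$, $\operatorname{cond}(A) = \operatorname{cond}(A^{-1})$, and $\operatorname{cond}(\alpha A) = \operatorname{cond}(A)$ for all $\alpha \neq 0$.

For every invertible matrix $A$, the condition number $\operatorname{cond}_2(A) = \|A\|_2\|A^{-1}\|_2$ associated with the 2-norm satisfies

$$\operatorname{cond}_2(A) = \frac{\max_i \sigma_i(A)}{\min_i \sigma_i(A)}$$

where the $\sigma_i(A)$ are the singular values of $A$.

For a normal matrix $A$,

$$\operatorname{cond}_2(A) = \frac{\max_i |\lambda_i(A)|}{\min_i |\lambda_i(A)|}$$

where the $\lambda_i(A)$ are the eigenvalues of $A$.

For a unitary or orthogonal matrix $A$, $\operatorname{cond}_2(A) = 1$.

The condition number $\operatorname{cond}_2(A)$ is invariant under unitary transformations: $\operatorname{cond}_2(A) = \operatorname{cond}_2(AU) = \operatorname{cond}_2(UA) = \operatorname{cond}_2(U^HAU)$ for every unitary matrix $U$.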

7 Direct methods




Principle of direct methods I

For solving

$$Ax = b,$$

direct methods consist in determining an invertible matrix $M$ such that

$$MAx = Mb$$

is an upper triangular system. This is called the elimination step. Then, a simple backward substitution can be performed to solve this triangular system.

Do not compute the inverse!

In practice, the solution $x$ of $Ax = b$ is not obtained by first computing the inverse $A^{-1}$ and then the matrix-vector product $A^{-1}b$. Indeed, computing $A^{-1}$ would be equivalent to solving $n$ systems of linear equations (one for each column of $I$) instead of one.

For simplicity, we sometimes use the notation $M^{-1}x$, but the inverse is never computed in practice. This operation corresponds to the solution of a system of equations (generally easy, due to the properties of $M$: diagonal, triangular).






Triangular systems of equations I

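If $A$ is lower triangular, the system

$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$

is solved by forward substitution.

Algorithm 40 (Forward substitution for lower triangular system)

Step 1. $a_{11}x_1 = b_1$

Step 2. $a_{22}x_2 = b_2 - a_{21}x_1$

...

Step n. $a_{nn}x_n = b_n - \sum_{j=1}^{n-1} a_{nj}x_j$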
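Algorithm 40 translates directly into a loop costing about $n^2$ operations. Here is a minimal sketch; the names are ours.

```python
def forward_substitution(L, b):
    """Solve L x = b for lower triangular L with nonzero diagonal (Algorithm 40)."""
    n = len(L)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))   # uses the already computed x_j, j < i
        x[i] = (b[i] - s) / L[i][i]
    return x


L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [-1.0, 2.0, 4.0]]
b = [2.0, 7.0, 9.0]
print(forward_substitution(L, b))   # [1.0, 2.0, 1.5]
```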


Triangular systems of equations II

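If $A$ is upper triangular, the system

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$

is solved by backward substitution.

Algorithm 41 (Backward substitution for upper triangular system)

Step 1. $a_{nn}x_n = b_n$

Step 2. $a_{n-1,n-1}x_{n-1} = b_{n-1} - a_{n-1,n}x_n$

...

Step n. $a_{11}x_1 = b_1 - \sum_{j=2}^{n} a_{1j}x_j$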
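In the same way, Algorithm 41 runs the loop from the last unknown to the first. Here is a minimal sketch; the names are ours.

```python
def backward_substitution(U, b):
    """Solve U x = b for upper triangular U with nonzero diagonal (Algorithm 41)."""
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))   # already computed x_j, j > i
        x[i] = (b[i] - s) / U[i][i]
    return x


U = [[2.0, 1.0, -1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 4.0]]
b = [1.0, 8.0, 4.0]
print(backward_substitution(U, b))   # [0.0, 2.0, 1.0]
```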






Gauss elimination I

Definition 42 (Pivoting matrix)

A pivoting matrix $P(i, j)$, associated with a linear mapping written in a basis $E = (e_k)_{k=1}^n$, is defined as follows

$$P(i, j) = I - (e_i - e_j)(e_i - e_j)^H$$

For $A \in K^{n\times n}$, $P(i, j)A$ is the matrix $A$ with rows $i$ and $j$ permuted, and $AP(i, j)$ is the matrix $A$ with columns $i$ and $j$ permuted. Note that $P(i, i) = I$.

We now describe the Gauss elimination procedure for an invertible matrix $A$.

Step 1.

Let $A = A_1 = (a^1_{ij})$. Select a nonzero element $a^1_{i^*1}$ of the first column (such an element exists since $A$ is invertible) and permute rows $1$ and $i^*$. Let $P_1 = P(1, i^*)$ and set

$$\tilde A_1 = P_1 A_1 = (\tilde a^1_{ij})$$


Gauss elimination II

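Let us introduce the matrix

$$E_1 = \begin{pmatrix} 1 & & & \\ -\frac{\tilde a^1_{21}}{\tilde a^1_{11}} & 1 & & \\ \vdots & & \ddots & \\ -\frac{\tilde a^1_{n1}}{\tilde a^1_{11}} & & & 1 \end{pmatrix}$$

such that

$$A_2 = E_1 \tilde A_1 = \begin{pmatrix} \tilde a^1_{11} & \tilde a^1_{12} & \cdots & \tilde a^1_{1n} \\ 0 & a^2_{22} & \cdots & a^2_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & a^2_{n2} & \cdots & a^2_{nn} \end{pmatrix}$$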

Step 2.

We have $\det(A_2) = \det(E_1 P_1 A_1) = \det(E_1)\det(P_1)\det(A) = \pm\det(A)$ ($-\det(A)$ if a row permutation has been made, $+\det(A)$ if not). Therefore $A_2$ is invertible and, since $\det(A_2) = \tilde a^1_{11}\det\big((a^2_{ij})_{2 \le i, j \le n}\big)$, so is the submatrix $(a^2_{ij})_{2 \le i, j \le n}$. We can then operate as in step 1 on this submatrix to eliminate the subdiagonal elements of column 2: introduce a permutation matrix $P_2 = P(2, i^*)$, with $i^* \ge 2$, and a row operation matrix $E_2$, and let $\tilde A_2 = P_2 A_2$ and $A_3 = E_2 \tilde A_2$.

Gauss elimination III

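Step k.

After $k - 1$ steps, we have the matrix

$$A_k = E_{k-1}P_{k-1} \cdots E_1 P_1 A_1 = \begin{pmatrix} a^k_{11} & a^k_{12} & \cdots & \cdots & \cdots & a^k_{1n} \\ 0 & a^k_{22} & \cdots & \cdots & \cdots & a^k_{2n} \\ \vdots & \ddots & \ddots & & & \vdots \\ 0 & \cdots & 0 & a^k_{kk} & \cdots & a^k_{kn} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a^k_{nk} & \cdots & a^k_{nn} \end{pmatrix}$$

After a possible pivoting with a pivoting matrix $P_k$, we define $\tilde A_k = P_k A_k$ and $A_{k+1} = E_k \tilde A_k$ with

$$E_k = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -\frac{\tilde a^k_{k+1,k}}{\tilde a^k_{kk}} & 1 & & \\ & & \vdots & & \ddots & \\ & & -\frac{\tilde a^k_{nk}}{\tilde a^k_{kk}} & & & 1 \end{pmatrix}$$

Gauss elimination IV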

Last step

After $n - 1$ steps, we obtain an upper triangular matrix

$$A_n = E_{n-1}P_{n-1} \cdots E_1 P_1 A$$

The matrix $M = E_{n-1}P_{n-1} \cdots E_1 P_1$ is then an invertible matrix such that $MA$ is upper triangular.


Gauss elimination V

Remark. Choice of pivoting

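To avoid dramatic round-off errors on finite-precision computers, we adopt one of the following pivoting strategies.

Partial pivoting. At step $k$, we select $P_k = P(k, i^*)$ such that $|a^k_{i^*k}| = \max_{k \le i \le n} |a^k_{ik}|$.

Total pivoting. At step $k$, we select $i^*$ and $j^*$ such that $|a^k_{i^*j^*}| = \max_{k \le i, j \le n} |a^k_{ij}|$, and we permute rows and columns by defining

$$\tilde A_k = P(k, i^*)\, A_k\, P(j^*, k)$$

The column permutation also exchanges the unknowns $x_k$ and $x_{j^*}$, and this must be taken into account when recovering $x$.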


Gauss elimination VI

Remark. Computing the determinant of a matrix

Gauss elimination is an efficient technique for computing the determinant of a matrix. Indeed, since $\det(E_k) = 1$ and $\det(P_k) = \pm 1$,

$$\det(A) = \det(A_n)\det(M)^{-1} = \pm \prod_{i=1}^{n} a^n_{ii}$$

where the sign depends on the number of permutations that have been performed.

Remark.

In practice, for solving a system $Ax = b$, we do not compute the matrix $M$. Instead, we operate simultaneously on $b$ by computing

$$Mb = b_n = E_{n-1}P_{n-1} \cdots E_1 P_1 b$$

Then, we solve the triangular system $MAx = Mb$, or equivalently $A_n x = b_n$.
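Here is a sketch of Gauss elimination with partial pivoting that operates on $b$ at the same time as on $A$, following the remark above, and reads the determinant off the pivots. The names are ours. The example matrix has $a_{11} = 0$, so elimination without pivoting would fail at the first step.

```python
def gauss_solve_det(A, b):
    """Gauss elimination with partial pivoting, applied to b simultaneously.
    Returns the solution x of A x = b and det(A) = (+/-) product of the pivots."""
    n = len(A)
    U = [list(map(float, row)) for row in A]   # becomes A_n = M A
    c = list(map(float, b))                    # becomes b_n = M b
    sign = 1.0
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(U[i][k]))   # partial pivoting
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:                              # P_k = P(k, p): swap rows k and p
            U[k], U[p] = U[p], U[k]
            c[k], c[p] = c[p], c[k]
            sign = -sign
        for i in range(k + 1, n):               # E_k: eliminate below the pivot
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
            c[i] -= m * c[k]
    det = sign
    for i in range(n):
        det *= U[i][i]
    x = [0.0] * n                               # backward substitution on A_n x = b_n
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0.0:
            raise ValueError("matrix is singular")
        x[i] = (c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x, det


A = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 3.0]]
b = [3.0, 3.0, 6.0]
x, det = gauss_solve_det(A, b)
print(x, det)   # [1.0, 1.0, 1.0] -3.0
```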


Gauss elimination VII

Computational work of Gauss elimination

$$O\left(\tfrac{2}{3}n^3\right)$$

For an arbitrary matrix, this computational work in $O(n^3)$ seems close to the best that can be expected. That is the reason why Gauss elimination can be used when no additional information is given on the matrix.

Theorem 43

For $A \in K^{n\times n}$ (invertible or not), there exists at least one invertible matrix $M$ such that $MA$ is an upper triangular matrix.

Proof.

For $A$ invertible, the Gauss elimination procedure is a constructive proof of this theorem. Otherwise, the matrix $A$ is singular if and only if there is a step $k$ at which the matrix $A_k$ has elements $a^k_{ik} = 0$ for all $k \le i \le n$. In this case, we can set $E_k = I$ and $P_k = I$ at step $k$ of the Gauss elimination and go to the next step.






LU factorization I

The LU factorization of a matrix consists in constructing lower and upper triangular matrices $L$ and $U$ such that $A = LU$. In fact, this factorization is obtained by the Gauss elimination procedure. Let us consider the Gauss elimination without pivoting, i.e. with $\tilde A_k = A_k$. This is possible if, at each step $k$, $a^k_{kk} \neq 0$. We then let

$$M = E_{n-1} \cdots E_1$$

and obtain

$$MA = U$$

where $U$ is the desired upper triangular matrix

$$U = \begin{pmatrix} a^1_{11} & a^1_{12} & \cdots & a^1_{1n} \\ & a^2_{22} & \cdots & a^2_{2n} \\ & & \ddots & \vdots \\ & & & a^n_{nn} \end{pmatrix}$$

Since $M$ is a product of lower triangular matrices, it is a lower triangular matrix, and so is its inverse $M^{-1}$. We then have the desired decomposition with

$$L = M^{-1} = E_1^{-1} \cdots E_{n-1}^{-1}$$


LU factorization II

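The matrix $L = (l_{ij})$ is directly obtained from the matrices $E_k$, with the multipliers $l_{ik} = a^k_{ik} / a^k_{kk}$ for $i > k$:

$$E_k = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -l_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & -l_{nk} & & & 1 \end{pmatrix}, \qquad E_k^{-1} = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & l_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & l_{nk} & & & 1 \end{pmatrix}$$

so that

$$L = E_1^{-1} \cdots E_{n-1}^{-1} = \begin{pmatrix} 1 & & & \\ l_{21} & 1 & & \\ \vdots & \ddots & \ddots & \\ l_{n1} & \cdots & l_{n,n-1} & 1 \end{pmatrix}$$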


LU factorization III

Theorem 44

Let $A \in K^{n\times n}$ be such that the diagonal submatrices

$$\Delta_k = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix} \in K^{k\times k}, \quad 1 \le k \le n,$$

are invertible. Then, there exist a lower triangular matrix $L$ and an upper triangular matrix $U$ such that

$$A = LU$$

If we further impose that the diagonal elements of $L$ are equal to 1, this decomposition is unique.

Proof.

The condition on the invertibility of the submatrices ensures that, at each step $k$, the diagonal term $a^k_{kk}$ is nonzero, and therefore that pivoting can be omitted.
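Here is a minimal sketch of LU factorization by Gauss elimination without pivoting. It stores the multipliers $l_{ik}$ in a unit lower triangular $L$. The names are ours. If a zero pivot appears, the function simply raises an error instead of pivoting. The example matrix has nonzero leading minors (4, 4 and 12), so the hypotheses of Theorem 44 hold.

```python
def lu_no_pivot(A):
    """LU factorization without pivoting: A = L U with unit diagonal L.
    Assumes every pivot a^k_kk is nonzero."""
    n = len(A)
    U = [list(map(float, row)) for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0.0:
            raise ValueError("zero pivot: pivoting would be required")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]      # multiplier l_ik
            U[i][k] = 0.0                    # entry eliminated by E_k
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


A = [[4.0, 3.0, 2.0], [8.0, 7.0, 6.0], [4.0, 5.0, 9.0]]
L, U = lu_no_pivot(A)
print(L)   # [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [1.0, 2.0, 1.0]]
print(U)   # [[4.0, 3.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 3.0]]
```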






Cholesky factorization I

Theorem 45

If $A \in \mathbb{R}^{n\times n}$ is a symmetric positive definite matrix, there exists at least one lower triangular matrix $B = (b_{ij}) \in \mathbb{R}^{n\times n}$ such that

$$A = BB^T$$

If we further impose that the diagonal elements satisfy $b_{ii} > 0$, the decomposition is unique.


Cholesky factorization II

Proof.

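We first show that the diagonal submatrices $\Delta_k = (a_{ij})_{1 \le i, j \le k}$ are positive definite: for $v \in \mathbb{R}^k \setminus \{0\}$, $v^T\Delta_k v = \hat v^T A \hat v > 0$, where $\hat v \in \mathbb{R}^n$ extends $v$ by zeros. Therefore, they are invertible, and there exists a unique LU factorization $A = LU$ such that $L$ has unit diagonal terms. Since the $\Delta_k$ are positive definite, we have $\prod_{i=1}^{k} u_{ii} = \det(\Delta_k) > 0$ for all $k \ge 1$, hence $u_{ii} > 0$ for all $i$. We then define the diagonal matrix $\Lambda = \operatorname{diag}(\sqrt{u_{ii}})$ and write

$$A = (L\Lambda)(\Lambda^{-1}U) = BC$$

where $B = L\Lambda$ and $C = \Lambda^{-1}U$ both have diagonal terms $b_{ii} = c_{ii} = \sqrt{u_{ii}}$. The symmetry of the matrix $A$ imposes that $BC = C^TB^T$, and therefore

$$CB^{-T} = \begin{pmatrix} 1 & \times & \cdots & \times \\ & 1 & \ddots & \vdots \\ & & \ddots & \times \\ & & & 1 \end{pmatrix} = \begin{pmatrix} 1 & & & \\ \times & 1 & & \\ \vdots & \ddots & \ddots & \\ \times & \cdots & \times & 1 \end{pmatrix} = B^{-1}C^T$$

and this last equality is only possible if $CB^{-T} = I$, i.e. $C = B^T$. (Prove the uniqueness of the decomposition.)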
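The proof goes through the LU factorization. In practice, $B$ is usually computed directly, column by column, from the identities $a_{ij} = \sum_{k} b_{ik} b_{jk}$. Here is a minimal sketch of that standard column-wise algorithm, which takes a different route from the proof. It assumes a symmetric positive definite input stored as a list of rows, and the names are ours.

```python
import math


def cholesky(A):
    """Cholesky factorization A = B B^T of a symmetric positive definite matrix,
    with B lower triangular and b_ii > 0 (column-by-column algorithm)."""
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(B[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        B[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            B[i][j] = (A[i][j] - sum(B[i][k] * B[j][k] for k in range(j))) / B[j][j]
    return B


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
B = cholesky(A)
print(B)   # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```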






Householder matrices

Definition 46

For $v$ a nonzero vector in $\mathbb{C}^n$, we introduce the following matrix, called the Householder matrix associated with $v$:

$$H(v) = I - 2\frac{vv^H}{v^Hv}$$

By convention, we also consider the identity $I$ as a Householder matrix, although it is not of the form $H(v)$.

Theorem 47

For $x = (x_i)_{i=1}^n \in \mathbb{C}^n$, there exist two Householder matrices $H$ such that

$$(Hx)_i = 0 \quad \text{for } i \ge 2.$$

Proof.

Denoting by $e_1$ the first basis vector of $\mathbb{C}^n$, one verifies that the two Householder matrices $H(v)$ are associated with the vectors $v = x \pm \|x\|_2 e^{i\alpha} e_1$, where $\alpha \in \mathbb{R}$ is the argument of $x_1 \in \mathbb{C}$, i.e. $x_1 = |x_1| e^{i\alpha}$, and we have

$$H(v)x = \mp \|x\|_2 e^{i\alpha} e_1$$


Householder method I

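The Householder method for solving $Ax = b$ consists in finding $n - 1$ Householder matrices $\{H_i\}_{i=1}^{n-1}$ such that $H_{n-1} \cdots H_1 A$ is upper triangular. Then, we solve the following triangular system by backward substitution:

$$H_{n-1} \cdots H_1 A x = H_{n-1} \cdots H_1 b$$

Suppose that $A_k = H_{k-1} \cdots H_1 A$ is of the form

$$A_k = \begin{pmatrix} a^k_{11} & a^k_{12} & \cdots & \cdots & \cdots & a^k_{1n} \\ 0 & a^k_{22} & \cdots & \cdots & \cdots & a^k_{2n} \\ \vdots & \ddots & \ddots & & & \vdots \\ 0 & \cdots & 0 & a^k_{kk} & \cdots & a^k_{kn} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a^k_{nk} & \cdots & a^k_{nn} \end{pmatrix}$$

Let $c = (c_i)_{i=1}^{n-k+1} \in \mathbb{C}^{n-k+1}$ be the vector with components $c_i = a^k_{i+k-1,k}$. There exists a Householder matrix $H(\tilde v_k)$, with $\tilde v_k \in \mathbb{C}^{n-k+1}$, such that $H(\tilde v_k)c$ has zero components except the first one. Then, we denote $v_k = \begin{pmatrix} 0 \\ \tilde v_k \end{pmatrix} \in \mathbb{C}^n$ and we let $H_k = H(v_k)$ be the Householder matrix associated with $v_k$.

Householder method II

Note that

$$H_k = H(v_k) = \begin{pmatrix} I_{k-1} & 0 \\ 0 & H(\tilde v_k) \end{pmatrix}$$

Performing this operation for $k = 1, \dots, n - 1$, we obtain the desired upper triangular matrix $A_n = H_{n-1} \cdots H_1 A$.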


QR factorization I

The QR factorization is a matrix interpretation of the Householder method.

Theorem 48

For $A \in K^{n\times n}$, there exist a unitary matrix $Q \in K^{n\times n}$ and an upper triangular matrix $R \in K^{n\times n}$ such that

$$A = QR$$

Moreover, one can choose the diagonal elements of $R$ such that $r_{kk} \ge 0$. Then, if $A$ is invertible, the corresponding QR factorization (with $r_{kk} > 0$) is unique.


QR factorization II

Proof.

The previous Householder construction proves the existence of an upper triangular matrix

$$R = H_{n-1} \cdots H_1 A$$

where the $H_i$ are Householder matrices. The matrix

$$Q = (H_{n-1} \cdots H_1)^{-1} = H_1^{-1} \cdots H_{n-1}^{-1} = H_1 \cdots H_{n-1}$$

is unitary (recall that the $H_k$ are unitary and Hermitian, i.e. $H_k^{-1} = H_k^H = H_k$).

This proves the existence of a QR decomposition. Let us now denote by $\alpha_k \in \mathbb{R}$ the arguments of the diagonal elements $r_{kk} = |r_{kk}| e^{i\alpha_k}$ (with $\alpha_k = 0$ if $r_{kk} = 0$), and let $D = \operatorname{diag}(e^{i\alpha_k})$. The matrix $\tilde Q = QD$ is still unitary, and the matrix $\tilde R = D^{-1}R$ is still upper triangular, with diagonal elements $|r_{kk}| \ge 0$. We then have the existence of a QR factorization $A = \tilde Q \tilde R$ with $\tilde r_{kk} \ge 0$ (and $\tilde r_{kk} > 0$ if $A$ is invertible). The uniqueness of this decomposition for invertible $A$ is left as an exercise.

Remark.

If $A \in \mathbb{R}^{n\times n}$, then $Q, R \in \mathbb{R}^{n\times n}$, with $Q$ an orthogonal matrix.
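Here is a minimal real-valued sketch of the Householder construction. It assumes that $A$ is a real square matrix stored as a list of rows; function and variable names are ours. At step $k$ it uses $v = c + \operatorname{sign}(c_1)\|c\|_2 e_1$, which is the real case of Theorem 47 with the sign chosen to avoid cancellation. It accumulates $Q = H_1 \cdots H_{n-1}$ as in the proof. The diagonal of the resulting $R$ may contain negative entries. Obtaining the normalization $r_{kk} \ge 0$ of Theorem 48 would require the additional diagonal scaling by $D$.

```python
import math


def householder_qr(A):
    """Householder QR of a real square matrix A (list of rows).
    Returns Q (orthogonal) and R (upper triangular) with A = Q R."""
    n = len(A)
    R = [list(map(float, row)) for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        c = [R[i][k] for i in range(k, n)]          # entries k..n-1 of column k
        norm_c = math.sqrt(sum(t * t for t in c))
        if norm_c == 0.0:
            continue                                # H_k = I by convention
        v = c[:]
        v[0] += math.copysign(norm_c, c[0])         # v = c + sign(c_1) ||c|| e_1
        vtv = sum(t * t for t in v)
        # R <- H_k R, where H_k acts on rows k..n-1 only.
        for j in range(n):
            s = sum(v[p] * R[k + p][j] for p in range(len(v)))
            for p in range(len(v)):
                R[k + p][j] -= 2.0 * v[p] * s / vtv
        # Q <- Q H_k, where H_k acts on columns k..n-1 only.
        for i in range(n):
            s = sum(Q[i][k + p] * v[p] for p in range(len(v)))
            for p in range(len(v)):
                Q[i][k + p] -= 2.0 * s * v[p] / vtv
    return Q, R


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
Q, R = householder_qr(A)
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(max(abs(QR[i][j] - A[i][j]) for i in range(3) for j in range(3)))   # round-off level
```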






Computational complexity

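With classical algorithms (leading-order operation counts for an $n \times n$ matrix):

Algorithm            Operations
LU                   $O(\tfrac{2}{3}n^3)$
Cholesky             $O(\tfrac{1}{3}n^3)$
QR (Householder)     $O(\tfrac{4}{3}n^3)$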

8 Iterative methods








Basic iterative methods I

For the solution of a linear system of equations $Ax = b$, basic iterative methods consist in constructing a sequence $\{x_k\}_{k \ge 0}$ defined by

$$x_{k+1} = Bx_k + c$$

from an initial vector $x_0$. The matrix $B$ and the vector $c$ are to be defined such that the iterative method converges towards the solution $x$, i.e.

$$\lim_{k \to \infty} x_k = x$$

$B$ and $c$ are chosen such that $I - B$ is invertible and such that $x$ is the unique solution of $x = Bx + c$.

Theorem 49

Let $B \in K^{n\times n}$. The following assertions are equivalent

(1) $\lim_{k \to \infty} B^k = 0$

(2) $\lim_{k \to \infty} B^k v = 0$ for all $v$

(3) $\rho(B) < 1$

(4) $\|B\| < 1$ for at least one subordinate matrix norm $\|\cdot\|$


Basic iterative methods II

Proof.

(1) ⇒ (2). $\|B^k v\| \le \|B^k\|\,\|v\| \to 0$ as $k \to \infty$.

(2) ⇒ (3). If $\rho(B) \ge 1$, there exists a vector $v \neq 0$ such that $Bv = \lambda v$ with $|\lambda| \ge 1$, and then $B^k v = \lambda^k v$ does not converge towards 0, a contradiction.

(3) ⇒ (4). Consequence of Theorem 36.

(4) ⇒ (1). $\|B^k\| \le \|B\|^k \to 0$ as $k \to \infty$.

Theorem 50

The following assertions are equivalent

(i) The iterative method is convergent

(ii) ρ(B) < 1

(iii) ‖B‖ < 1 for at least one subordinate matrix norm ‖ · ‖

Proof.

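The iterative method is convergent if and only if $\lim_{k \to \infty} e_k = 0$ for every initial vector $x_0$, with $e_k = x_k - x = B^k e_0$ (obtained by subtracting $x = Bx + c$ from $x_{k+1} = Bx_k + c$). The proof then follows from Theorem 49.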






Jacobi, Gauss-Seidel, Relaxation (SOR) I

We decompose $A$ in the form

$$A = M - N$$

where $M$ is an invertible matrix. Then

$$Ax = b \iff Mx = Nx + b$$

and we compute the sequence

$$x_{k+1} = M^{-1}Nx_k + M^{-1}b \equiv Bx_k + c$$

In practice, at each iteration, we solve the system $Mx_{k+1} = Nx_k + b$. The method is then efficient if $M$ has a simple form (diagonal or triangular).

Definition 51

We decompose $A = D - E - F$, where $D$ is the diagonal part of $A$, and $-E$ and $-F$ are its strict lower and upper parts.


Jacobi, Gauss-Seidel, Relaxation (SOR) II

Definition 52 (Jacobi)

$$M = D, \qquad N = E + F$$

Definition 53 (Gauss-Seidel)

$$M = D - E, \qquad N = F$$

Definition 54 (Successive Over Relaxation (SOR))

$$M = \omega^{-1}D - E, \qquad N = \omega^{-1}(1 - \omega)D + F$$
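Here is a sketch of the three splittings. The names, the tolerance and the residual-based stopping test are our own choices. Instead of forming $M^{-1}N$, each sweep solves $Mx_{k+1} = Nx_k + b$. For Jacobi, $M = D$ is diagonal. For SOR, the lower triangular $M = \omega^{-1}D - E$ is handled row by row, and $\omega = 1$ gives Gauss-Seidel. The test matrix is symmetric positive definite and strictly diagonally dominant.

```python
def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration (M = D, N = E + F): every component uses the old iterate."""
    n = len(A)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
        if res < tol:
            return x, k
    return x, max_iter


def sor(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Relaxation (M = D/omega - E, N = (1/omega - 1) D + F); omega = 1 is Gauss-Seidel.
    Solving M x_{k+1} = N x_k + b row by row gives the in-place update below."""
    n = len(A)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        for i in range(n):
            # x[j] already holds the new value for j < i and the old one for j > i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
        if res < tol:
            return x, k
    return x, max_iter


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]                    # exact solution x = (1, 1, 1)
x_j, it_j = jacobi(A, b)
x_gs, it_gs = sor(A, b, omega=1.0)     # Gauss-Seidel
x_sor, it_sor = sor(A, b, omega=1.1)
print(it_j, it_gs, it_sor)
```

On this example, Gauss-Seidel needs noticeably fewer sweeps than Jacobi.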


Convergence results I

Theorem 55

Let $A$ be a Hermitian positive definite matrix, decomposed in the form $A = M - N$ with $M$ invertible. If the matrix $(M^H + N)$ is positive definite, then $\rho(M^{-1}N) < 1$.


Convergence results II

Proof.

From Theorem 36, we know that it suffices to find a matrix norm for which $\|M^{-1}N\| < 1$. We will show this property for the matrix norm subordinate to the vector norm $\|v\| = \sqrt{v^HAv}$. Let us first note that $(M^H + N)$ is Hermitian since

$$(M^H + N)^H = M + N^H = A + N + N^H = A^H + N^H + N = M^H + N.$$

We have

$$\|M^{-1}N\| = \|I - M^{-1}A\| = \sup_{\|v\| = 1} \|v - M^{-1}Av\|$$

Denoting $w = M^{-1}Av$ (so that $Av = Mw$), we have, for $v$ such that $\|v\| = 1$,

$$\|v - w\|^2 = 1 - v^HAw - w^HAv + w^HAw = 1 - w^HM^Hw - w^HMw + w^HAw = 1 - \underbrace{w^H(M^H + N)w}_{> 0}$$

where we used $M - A = N$ and $w \neq 0$ (since $v \neq 0$). Therefore $\|v\| = 1 \Rightarrow \|v - M^{-1}Av\| < 1$. The function $v \in \mathbb{C}^n \mapsto \|v - M^{-1}Av\| \in \mathbb{R}$ is continuous on the unit sphere, which is a compact set. Therefore the supremum is attained, and $\|M^{-1}N\| < 1$.


Convergence results III

Theorem 56 (Sufficient condition for convergence of relaxation)

If $A$ is Hermitian positive definite, the relaxation method converges if $0 < \omega < 2$.

Proof.

Since $A$ is Hermitian, $F = E^H$, and therefore $M^H + N = \omega^{-1}D - E^H + \omega^{-1}(1 - \omega)D + F = \frac{2 - \omega}{\omega} D$. Since $A$ is positive definite, we have, for the canonical basis vectors $e_i$, $e_i^HAe_i = e_i^HDe_i = a_{ii} > 0$. The matrix $M^H + N$ is then Hermitian positive definite if and only if $0 < \omega < 2$, and the proof ends with Theorem 55.

Theorem 57 (Necessary condition for convergence of relaxation)

The spectral radius of the matrix $B_\omega = M^{-1}N$ of the relaxation method satisfies

$$\rho(B_\omega) \ge |\omega - 1|$$

and therefore the relaxation method converges only if $0 < \omega < 2$.


Convergence results IV

Proof.

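We have

$$B_\omega = (\omega^{-1}D - E)^{-1}\big(\omega^{-1}(1 - \omega)D + F\big)$$

and then, since both factors are triangular,

$$\det(B_\omega) = \frac{\det\big(\omega^{-1}(1 - \omega)D + F\big)}{\det(\omega^{-1}D - E)} = (1 - \omega)^n = \prod_{i=1}^{n} \lambda_i(B_\omega)$$

Then

$$\rho(B_\omega) \ge \left(\prod_{i=1}^{n} |\lambda_i(B_\omega)|\right)^{1/n} = |1 - \omega|$$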






Projection methods I

We consider a real system of equations $Ax = b$. Projection techniques consist in searching for an approximate solution $\tilde x$ in a subspace $\mathcal{V}$ of $\mathbb{R}^n$. The approximate solution is defined by

$$\tilde x \in \mathcal{V}, \qquad b - A\tilde x \perp \mathcal{W}$$

where $\mathcal{W}$ is a subspace of $\mathbb{R}^n$ with the same dimension as $\mathcal{V}$. The approximate solution is thus defined by orthogonality constraints on the residual. $\tilde x$ is called a projection of $x$ onto the subspace $\mathcal{V}$, orthogonally to the subspace $\mathcal{W}$. The case $\mathcal{V} = \mathcal{W}$ corresponds to an orthogonal projection, and the orthogonality constraint is called Galerkin orthogonality. The case $\mathcal{V} \neq \mathcal{W}$ corresponds to an oblique projection, and the orthogonality constraint is called Petrov-Galerkin orthogonality.

Let $V = (v_1, \dots, v_m)$ and $W = (w_1, \dots, w_m)$ be matrices whose columns form bases of $\mathcal{V}$ and $\mathcal{W}$. The approximation is then defined by $\tilde x = Vy$, with $y \in \mathbb{R}^m$ such that

$$W^TAVy = W^Tb \;\Rightarrow\; y = (W^TAV)^{-1}W^Tb$$


Projection methods II

Projection method

Until convergence

1 Select $V = (v_1, \dots, v_m)$ and $W = (w_1, \dots, w_m)$

2 $r = b - Ax$

3 $y = (W^T A V)^{-1} W^T r$

4 $x = x + Vy$
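Here is a minimal plain-Python sketch of the four steps. The subspaces are given as lists of basis vectors, and the small system $W^T A V y = W^T r$ is solved by Gaussian elimination. As an illustrative (hypothetical) choice of subspaces, it cycles through $\mathcal V = \mathcal W = \operatorname{span}\{e_i, e_j\}$ for a few index pairs on a symmetric positive definite matrix. This is one admissible choice among many, not the only one.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (T[i][n] - sum(T[i][j] * y[j] for j in range(i + 1, n))) / T[i][i]
    return y


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection_step(A, b, x, V, W):
    """x <- x + V y with y = (W^T A V)^{-1} W^T r; V and W are lists of basis vectors."""
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    AV = [matvec(A, v) for v in V]
    WtAV = [[dot(w, av) for av in AV] for w in W]   # entry (i, j): w_i^T A v_j
    y = solve(WtAV, [dot(w, r) for w in W])         # W^T A V y = W^T r
    return [xi + sum(yj * v[i] for yj, v in zip(y, V)) for i, xi in enumerate(x)]


def unit(n, i):
    return [1.0 if k == i else 0.0 for k in range(n)]


A = [[4.0, 1.0, 0.0, 0.0], [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0], [0.0, 0.0, 1.0, 4.0]]   # symmetric positive definite
b = [1.0, 2.0, 3.0, 4.0]
x = [0.0] * 4
for sweep in range(50):
    for i, j in [(0, 1), (2, 3), (1, 2)]:
        V = [unit(4, i), unit(4, j)]
        x = projection_step(A, b, x, V, V)           # orthogonal projection: W = V
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
print(residual)
```

With coordinate subspaces, each step is a block Gauss-Seidel update, which shows that relaxation methods fit into the projection framework.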

The subspaces must be chosen such that $W^T A V$ is nonsingular. Two important particular choices satisfy this property.

Theorem 58

$W^T A V$ is nonsingular under either of the following conditions:

A is positive definite and $\mathcal V = \mathcal W$;

A is nonsingular and $\mathcal W = A\mathcal V$.


Projection methods III

Theorem 59

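Assume that A is symmetric positive definite and $\mathcal V = \mathcal W$. Then $\tilde x \in \mathcal V$ satisfies $A\tilde x - b \perp \mathcal V$ if and only if

$$\|\tilde x - x\|_A^2 = \min_{y \in \mathcal V} \|y - x\|_A^2, \qquad \|y\|_A^2 = y^T A y$$

where $x = A^{-1}b$ is the exact solution.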

Theorem 60

Let A be a nonsingular matrix and $\mathcal W = A\mathcal V$. Then $\tilde x \in \mathcal V$ satisfies $A\tilde x - b \perp \mathcal W$ if and only if it minimizes the 2-norm of the residual:

$$\|b - A\tilde x\|_2 = \min_{y \in \mathcal V} \|b - Ay\|_2$$


Basic one-dimensional projection algorithms I

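Basic one-dimensional projection schemes consist in selecting $\mathcal V$ and $\mathcal W$ of dimension 1. Let us denote $\mathcal V = \operatorname{span}\{v\}$ and $\mathcal W = \operatorname{span}\{w\}$. Denoting by $r = b - Ax_k$ the residual at iteration k, the next iterate is defined by

$$x_{k+1} = x_k + \alpha v, \qquad \alpha = \frac{(w, r)}{(w, Av)} = \frac{w^T r}{w^T A v}$$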

Definition 61 (Steepest descent)

We let $v = r$ and $w = r$. We then have

$$x_{k+1} = x_k + \alpha r, \qquad \alpha = \frac{(r, r)}{(Ar, r)}$$

If A is a symmetric positive definite matrix, $x_{k+1}$ is the solution of

$$\min_{\alpha} f(x_k + \alpha r), \qquad f(y) = \frac{1}{2}\|y - x\|_A^2 = \frac{1}{2}(y - x, A(y - x))$$

where x is the exact solution. We note that $-\nabla f(x_k) = A(x - x_k) = b - Ax_k = r$, and therefore $x_{k+1} = x_k - \alpha \nabla f(x_k)$. The method thus corresponds to a steepest descent algorithm for minimizing the convex function f, with an optimal choice of the step α.
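Here is a minimal plain-Python sketch of the steepest descent iteration ($v = w = r$, optimal step). The function name and the 2×2 symmetric positive definite test matrix are illustrative choices.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steepest_descent(A, b, x0, tol=1e-12, max_iter=10_000):
    """Steepest descent for a symmetric positive definite A (v = w = r)."""
    x = list(x0)
    for k in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            return x, k
        Ar = matvec(A, r)
        alpha = rr / dot(Ar, r)                  # optimal step (r, r) / (Ar, r)
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]                                   # exact solution (1/11, 7/11)
x, its = steepest_descent(A, b, [0.0, 0.0])
print(its, x)
```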


Basic one-dimensional projection algorithms II

Theorem 62 (Convergence of steepest descent)

If A is a symmetric positive definite matrix, the steepest descent algorithm converges for any initial guess $x_0$.

Definition 63 (Minimal residual)

We let $v = r$ and $w = Ar$. We then have

$$x_{k+1} = x_k + \alpha r, \qquad \alpha = \frac{(Ar, r)}{(Ar, Ar)}$$

where α is the solution of $\min_\alpha \|b - A(x_k + \alpha r)\|_2$.
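The minimal residual iteration differs from steepest descent only in the step α. This plain-Python sketch runs it on an illustrative nonsymmetric matrix whose symmetric part is positive definite ($x^T A x = 3x_1^2 + 2x_2^2$). The matrix and names are assumptions chosen for the example.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def minimal_residual(A, b, x0, tol=1e-12, max_iter=10_000):
    """Minimal residual iteration: v = r, w = A r."""
    x = list(x0)
    for k in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ar = matvec(A, r)
        alpha = dot(Ar, r) / dot(Ar, Ar)          # minimizes ||r - alpha A r||_2
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


A = [[3.0, 1.0], [-1.0, 2.0]]                     # nonsymmetric, positive definite
b = [1.0, 1.0]                                    # exact solution (1/7, 4/7)
x, its = minimal_residual(A, b, [0.0, 0.0])
print(its, x)
```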

Theorem 64

If A is positive definite (i.e. $x^T A x > 0$ for all $x \neq 0$, A not necessarily symmetric), the minimal residual algorithm converges.


Basic one-dimensional projection algorithms III

Definition 65 (Residual norm steepest descent)

We let $v = A^T r$ and $w = Av = AA^T r$. We then have

$$x_{k+1} = x_k + \alpha A^T r, \qquad \alpha = \frac{(Av, r)}{(Av, Av)} = \frac{\|v\|^2}{\|Av\|^2}$$

which is the solution of

$$\min_\alpha f(x_k + \alpha v), \qquad f(y) = \frac{1}{2}\|b - Ay\|^2 = \frac{1}{2}(Ay - b, Ay - b)$$

Note that $-\nabla f(x_k) = A^T(b - Ax_k) = A^T r = v$. The method thus corresponds to a steepest descent algorithm on the convex function f, with an optimal choice of the step α.
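A plain-Python sketch of residual norm steepest descent follows. It is tried on an illustrative matrix that is nonsingular but indefinite (its symmetric part has eigenvalues 3.5 and −1.5), a case where only nonsingularity is needed. Function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_T(A, x):
    """Computes A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_norm_sd(A, b, x0, tol=1e-12, max_iter=100_000):
    """Residual norm steepest descent: v = A^T r, w = A v."""
    x = list(x0)
    for k in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        v = matvec_T(A, r)                        # v = A^T r = -grad f(x_k)
        Av = matvec(A, v)
        alpha = dot(v, v) / dot(Av, Av)           # ||v||^2 / ||A v||^2
        x = [xi + alpha * vi for xi, vi in zip(x, v)]
    return x, max_iter


A = [[1.0, 2.0], [3.0, 1.0]]                      # nonsingular, indefinite
b = [1.0, 0.0]                                    # exact solution (-1/5, 3/5)
x, its = residual_norm_sd(A, b, [0.0, 0.0])
print(its, x)
```

The method is steepest descent on the normal equations $A^T A x = A^T b$, so its speed is governed by the conditioning of $A^T A$.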

Theorem 66

If A is nonsingular, the residual norm steepest descent algorithm converges.






Krylov subspace methods

Krylov subspace methods are projection methods in which the subspace $\mathcal V$ is the m-dimensional Krylov subspace of the matrix A associated with $r_0 = b - Ax_0$, where $x_0$ is an initial guess. This Krylov subspace is defined by

$$\mathcal V = \mathcal K_m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \dots, A^{m-1}r_0\}$$

The different Krylov subspace methods differ in the choice of the space $\mathcal W$ and of the preconditioner. A first class of methods takes $\mathcal W = \mathcal K_m(A, r_0)$ or $\mathcal W = A\mathcal K_m(A, r_0)$. A second class takes $\mathcal W = \mathcal K_m(A^T, r_0)$.
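In floating point, the raw vectors $r_0, Ar_0, \dots$ quickly become nearly collinear, so a basis of $\mathcal K_m(A, r_0)$ is usually orthonormalized as it is built. The sketch below uses the Arnoldi process with modified Gram-Schmidt, which is one standard way to do this, in plain Python. The matrix, $x_0$ and b are illustrative. It also returns the Hessenberg coefficients H with $Av_j = \sum_{i \le j+1} h_{ij} v_i$.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def arnoldi(A, r0, m):
    """Orthonormal basis V[0..m-1] of K_m(A, r0) (plus V[m]) and the Hessenberg
    matrix H, built with the Arnoldi process (modified Gram-Schmidt)."""
    beta = math.sqrt(dot(r0, r0))
    V = [[x / beta for x in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])                       # next Krylov direction A v_j
        for i in range(j + 1):                    # orthogonalize against v_0..v_j
            H[i][j] = dot(w, V[i])
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < 1e-14:                   # Krylov subspace is A-invariant: stop
            break
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H


A = [[2.0, 1.0, 0.0, 0.0], [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0], [0.0, 0.0, 1.0, 5.0]]
x0 = [0.0, 0.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0]
r0 = [bi - axi for bi, axi in zip(b, matvec(A, x0))]   # r0 = b - A x0
V, H = arnoldi(A, r0, m=3)
```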

A complete reference on iterative methods

Yousef Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.


Part IV

Eigenvalue problems

9 Jacobi method

10 Givens-Householder method

11 QR method

12 Power iterations

13 Methods based on Krylov subspaces


Eigenvalue problems

The aim is to present different techniques for finding the eigenvalues and eigenvectors $(\lambda_i, v_i)$ of a matrix A:

$$A v_i = \lambda_i v_i$$





Jacobi method I

The Jacobi method finds all the eigenvalues of a symmetric matrix A. It is well suited to full matrices. There exists an orthogonal matrix O such that $O^T A O = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, where the $\lambda_i$ are the eigenvalues of A, distinct or not. The Jacobi method consists in constructing a sequence of elementary orthogonal matrices $(\Omega_k)_{k \ge 1}$ such that the sequence $(A_k)_{k \ge 1}$, defined by $A_1 = A$ and

$$A_{k+1} = \Omega_k^T A_k \Omega_k = (\Omega_1 \cdots \Omega_k)^T A (\Omega_1 \cdots \Omega_k) = O_k^T A O_k, \qquad O_k = \Omega_1 \cdots \Omega_k$$

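converges towards the diagonal matrix $\operatorname{diag}(\lambda_1, \dots, \lambda_n)$ (up to a permutation). Each transformation $A_k \to A_{k+1}$ eliminates two symmetric off-diagonal entries $a_{pq} = a_{qp} \neq 0$, $p \neq q$, by a rotation (in the classical Jacobi method, $(p, q)$ is chosen such that $|a_{pq}|$ is maximal). Let $A = A_k$ and $B = A_{k+1}$. The matrix $\Omega_k$ is selected as follows

$$\Omega_k = I + (\cos\theta - 1)(e_p e_p^T + e_q e_q^T) + \sin\theta \, e_p e_q^T - \sin\theta \, e_q e_p^T$$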

where $\theta \in (-\pi/4, 0) \cup (0, \pi/4]$ is the unique angle such that $b_{pq} = b_{qp} = 0$. θ is the solution of

$$\cot(2\theta) = \frac{a_{qq} - a_{pp}}{2 a_{pq}}$$
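Here is a minimal plain-Python sketch of the classical Jacobi method. Each step picks the off-diagonal entry of largest modulus and takes θ from $\cot(2\theta) = (a_{qq} - a_{pp})/(2a_{pq})$. For readability it forms $\Omega_k$ explicitly and multiplies full matrices. This costs $O(n^3)$ per rotation, which a practical implementation would avoid by updating only rows and columns p and q. Names and the tolerance are assumptions.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def jacobi_eigen(A, tol=1e-12, max_rotations=1000):
    """Classical Jacobi method for a symmetric matrix A (n >= 2).
    Returns the diagonal of the last A_k and O_k = Omega_1 ... Omega_k."""
    n = len(A)
    Ak = [row[:] for row in A]
    O = identity(n)
    for _ in range(max_rotations):
        # classical choice: off-diagonal entry of largest modulus
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(Ak[ij[0]][ij[1]]))
        apq = Ak[p][q]
        if abs(apq) < tol:
            break
        app, aqq = Ak[p][p], Ak[q][q]
        # cot(2 theta) = (a_qq - a_pp) / (2 a_pq), theta in (-pi/4, 0) U (0, pi/4]
        theta = math.pi / 4 if app == aqq else 0.5 * math.atan(2.0 * apq / (aqq - app))
        c, s = math.cos(theta), math.sin(theta)
        Om = identity(n)
        Om[p][p] = Om[q][q] = c
        Om[p][q], Om[q][p] = s, -s
        Ak = matmul(matmul(transpose(Om), Ak), Om)   # A_{k+1} = Omega_k^T A_k Omega_k
        O = matmul(O, Om)                            # O_k = Omega_1 ... Omega_k
    return [Ak[i][i] for i in range(n)], O


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
eigs, O = jacobi_eigen(A)
print(sorted(eigs))   # compare with 2 - sqrt(2), 2, 2 + sqrt(2)
```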


Jacobi method II

Theorem 67 (Convergence of eigenvalues)

The sequence $(A_k)_{k \ge 1}$ obtained with the Jacobi method converges and

$$\lim_{k \to \infty} A_k = \operatorname{diag}(\lambda_{\sigma(i)})$$

where σ is a permutation of $\{1, \dots, n\}$.

Theorem 68 (Convergence of eigenvectors)

Suppose that all the eigenvalues of A are distinct. Then the sequence $(O_k)_{k \ge 1}$ of the Jacobi method converges to an orthogonal matrix whose columns form an orthonormal set of eigenvectors of A.





Givens-Householder method I

The Givens-Householder method is suited to the computation of selected eigenvalues of a symmetric matrix A, such as the eigenvalues lying in a given interval. It proceeds in two steps:

1 Determine an orthogonal matrix P such that $P^T A P$ is tridiagonal, with the Householder method.

2 Compute the eigenvalues of the symmetric tridiagonal matrix $P^T A P$ with the Givens method.

Theorem 69

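For a symmetric matrix A, there exists an orthogonal matrix P, product of $n-2$ Householder matrices $H_k$, such that $P^T A P$ is tridiagonal: $P = H_1 H_2 \cdots H_{n-2}$, with

$$H_1^T A H_1 = \begin{pmatrix} \times & \times & 0 & 0 & \cdots \\ \times & \times & \times & \times & \cdots \\ 0 & \times & \times & \times & \cdots \\ 0 & \times & \times & \times & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad H_2^T H_1^T A H_1 H_2 = \begin{pmatrix} \times & \times & 0 & 0 & \cdots \\ \times & \times & \times & 0 & \cdots \\ 0 & \times & \times & \times & \cdots \\ 0 & 0 & \times & \times & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad \dots$$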
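For the second step, a common form of the Givens method combines bisection with a Sturm-sequence count. The sketch below covers only that step and starts from an already tridiagonal matrix, so the Householder reduction is not shown. It counts the eigenvalues below x as the number of negative pivots of $T - xI = LDL^T$. By Sylvester's law of inertia this equals the sign-change count of the Sturm sequence. The function names, the zero-pivot guard `pivmin` and the test matrix are assumptions.

```python
import math


def count_below(a, b, x, pivmin=1e-30):
    """Number of eigenvalues < x of the symmetric tridiagonal matrix T with
    diagonal a[0..n-1] and off-diagonal b[0..n-2]: number of negative pivots
    d_i of the factorization T - x I = L D L^T (Sylvester's law of inertia)."""
    count = 0
    d = 1.0
    for i in range(len(a)):
        d = a[i] - x - (b[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = pivmin          # exact zero pivot: treated as for x slightly smaller
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(a, b, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0..n-1) of T by bisection on count_below."""
    n = len(a)
    radius = [(abs(b[i - 1]) if i > 0 else 0.0) + (abs(b[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(a[i] - radius[i] for i in range(n)) - 1.0   # enlarged Gershgorin bounds
    hi = max(a[i] + radius[i] for i in range(n)) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(a, b, mid) > k:    # more than k eigenvalues below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


a = [2.0, 2.0, 2.0]
b = [1.0, 1.0]
eigs = [kth_eigenvalue(a, b, k) for k in range(3)]
expected = [2 - math.sqrt(2), 2.0, 2 + math.sqrt(2)]
n_in_1_3 = count_below(a, b, 3.0) - count_below(a, b, 1.0)   # eigenvalues in [1, 3)
print(eigs, expected, n_in_1_3)
```

The count alone already answers "how many eigenvalues lie in [α, β)?" without computing them. This is what makes the method suited to selected eigenvalues.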





QR method I

The QR method is the most commonly used method to compute the whole set of eigenvalues of an arbitrary matrix A, including nonsymmetric ones.

QR algorithm

Let $A_1 = A$. For $k \ge 1$, repeat until convergence:

$A_k = Q_k R_k$ (QR factorization)

$A_{k+1} = R_k Q_k$

All matrices $A_k$ are similar to A, since $A_{k+1} = R_k Q_k = Q_k^{-1} A_k Q_k$. Under certain conditions, $A_k$ converges towards an upper triangular matrix, which is a Schur form of A, whose diagonal entries are the eigenvalues of A.
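This is a minimal sketch of the basic (unshifted) QR algorithm in plain Python. The QR factorization is computed by modified Gram-Schmidt, and the test matrix is symmetric with eigenvalues of distinct moduli, so $A_k$ converges to a diagonal matrix. Practical implementations add shifts and a preliminary Hessenberg reduction, which are not shown. Names are illustrative.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def qr_mgs(A):
    """QR factorization of a square nonsingular matrix by modified Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    Q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j]
        for i in range(j):
            R[i][j] = sum(q * vk for q, vk in zip(Q_cols[i], v))
            v = [vk - R[i][j] * q for vk, q in zip(v, Q_cols[i])]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        Q_cols.append([vk / R[j][j] for vk in v])
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def qr_algorithm(A, iters=500):
    Ak = [row[:] for row in A]
    for _ in range(iters):
        Q, R = qr_mgs(Ak)        # A_k = Q_k R_k
        Ak = matmul(R, Q)        # A_{k+1} = R_k Q_k
    return Ak


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
Ak = qr_algorithm(A)
eigs = [Ak[i][i] for i in range(3)]
print(eigs)   # compare with 2 + sqrt(2), 2, 2 - sqrt(2)
```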





Power iterations method I

The power iteration method computes the dominant (largest magnitude) eigenvalue and an associated eigenvector of a real matrix A.

Power iteration algorithm

Start with an arbitrary normalized vector $x^{(0)}$ and compute the sequences

$$x^{(k+1)} = \frac{Ax^{(k)}}{\|Ax^{(k)}\|} \qquad \text{and} \qquad \beta^{(k+1)} = (Ax^{(k)}, x^{(k)})$$

Theorem 70

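If the dominant eigenvalue is real, of multiplicity 1, and strictly larger in modulus than all other eigenvalues, and if $x^{(0)}$ has a nonzero component along the associated eigenvector, then the sequences $(x^{(k)})_{k \ge 0}$ and $(\beta^{(k)})_{k \ge 0}$ respectively converge towards the dominant eigenvector (up to sign) and eigenvalue.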
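The algorithm translates directly into a few lines of plain Python. This sketch uses an illustrative symmetric 2×2 matrix with eigenvalues $(5 \pm \sqrt 5)/2$ and a starting vector that is not orthogonal to the dominant eigenvector.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def normalize(x):
    s = math.sqrt(sum(xi * xi for xi in x))
    return [xi / s for xi in x]


def power_iteration(A, x0, iters=100):
    x = normalize(x0)
    beta = None
    for _ in range(iters):
        y = matvec(A, x)                               # A x^(k)
        beta = sum(yi * xi for yi, xi in zip(y, x))    # beta^(k+1) = (A x^(k), x^(k))
        x = normalize(y)                               # x^(k+1)
    return beta, x


A = [[2.0, 1.0], [1.0, 3.0]]
beta, x = power_iteration(A, [1.0, 0.0])
print(beta, (5 + math.sqrt(5)) / 2)
```

The error decreases like $|\lambda_2/\lambda_1|^k$ (here about $0.38^k$), so a fixed number of iterations is used for simplicity instead of a stopping test.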


Power iterations method II

Proof.

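Let us prove the convergence of the method when A is symmetric. Then there exists an orthonormal basis of eigenvectors $(v_1, \dots, v_n)$, associated with the eigenvalues $(\lambda_1, \dots, \lambda_n)$. Let us assume that $|\lambda_1| > |\lambda_i|$ for all $i > 1$. The initial vector can be decomposed on this basis, $x^{(0)} = \sum_{i=1}^n a_i v_i$, with $a_1 \neq 0$ by assumption. Then, since $Av_i = \lambda_i v_i$,

$$x^{(k)} = \frac{Ax^{(k-1)}}{\|Ax^{(k-1)}\|} = \frac{A^k x^{(0)}}{\|A^k x^{(0)}\|}, \qquad A^k x^{(0)} = \sum_{i=1}^n a_i \lambda_i^k v_i = a_1 \lambda_1^k w^{(k)}, \qquad w^{(k)} = v_1 + \sum_{i=2}^n \frac{a_i}{a_1}\left(\frac{\lambda_i}{\lambda_1}\right)^k v_i$$

and since $|\lambda_i / \lambda_1| < 1$ for $i > 1$, $w^{(k)} \to v_1$. We obtain

$$x^{(k)} = \frac{a_1 \lambda_1^k w^{(k)}}{\|a_1 \lambda_1^k w^{(k)}\|}, \qquad x^{(k)} - \operatorname{sign}(a_1 \lambda_1^k)\, v_1 \xrightarrow[k \to \infty]{} 0, \qquad \beta^{(k)} \xrightarrow[k \to \infty]{} (Av_1, v_1) = \lambda_1$$

so that $x^{(k)}$ converges to $v_1$ up to a sign, which alternates when $\lambda_1 < 0$. For general matrices, a similar proof based on the Jordan form can be used.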


Power iterations method III

Exercise. Power method with deflation

Under certain conditions, the power method with deflation can compute the whole set of eigenvalues of a matrix. See the exercises.

Definition 71 (Inverse power method)

For an invertible matrix A, applying the power method to the matrix $A^{-1}$ yields the eigenvalue of A with smallest magnitude and an associated eigenvector (if the smallest magnitude eigenvalue is of multiplicity 1). In practice, $A^{-1}$ is not formed: each iteration solves a linear system with the matrix A.

Definition 72 (Shifted inverse power method)

The shifted inverse power method consists in applying the inverse power method to the shifted matrix $A_\sigma = A - \sigma I$. It captures the eigenvalue (and associated eigenvector) closest to the value σ. Indeed, if we denote by $(v_i, \lambda_i)$ the eigenpairs of A, the eigenpairs of $A_\sigma$ are $(v_i, \lambda_i - \sigma)$. Therefore the inverse power method on $A_\sigma$ converges towards the eigenvalue $\lambda_i - \sigma$ such that $|\lambda_i - \sigma| = \min_j |\lambda_j - \sigma|$.
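In plain Python, the shifted inverse power method can be sketched as follows. Each iteration solves $(A - \sigma I)y = x$ by Gaussian elimination instead of forming an inverse. The Rayleigh quotient μ approximates $1/(\lambda - \sigma)$, hence $\lambda \approx \sigma + 1/\mu$. The test matrix, the shift σ = 1.9 and the function names are illustrative assumptions.

```python
import math


def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (T[i][n] - sum(T[i][j] * y[j] for j in range(i + 1, n))) / T[i][i]
    return y


def normalize(x):
    s = math.sqrt(sum(xi * xi for xi in x))
    return [xi / s for xi in x]


def shifted_inverse_power(A, sigma, x0, iters=50):
    """Power method on (A - sigma I)^{-1}; returns (eigenvalue closest to sigma, eigenvector)."""
    n = len(A)
    A_sigma = [[A[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = normalize(x0)
    mu = None
    for _ in range(iters):
        y = solve(A_sigma, x)                          # y = A_sigma^{-1} x, inverse never formed
        mu = sum(yi * xi for yi, xi in zip(y, x))     # estimate of 1 / (lambda - sigma)
        x = normalize(y)
    return sigma + 1.0 / mu, x


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]   # eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2)
lam, v = shifted_inverse_power(A, 1.9, [1.0, 0.0, 0.0])
print(lam, v)
```

The closer σ is to the target eigenvalue, the faster the convergence, since the rate is $|\lambda_i - \sigma| / |\lambda_j - \sigma|$ for the second closest $\lambda_j$.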





Methods based on Krylov subspaces

A complete reference for the solution of eigenvalue problems

Yousef Saad. Numerical Methods for Large Eigenvalue Problems. SIAM, 2011.


Part V

Nonlinear equations

14 Fixed point theorem

15 Nonlinear equations with monotone operators

16 Differential calculus for nonlinear operators

17 Newton method


Solving nonlinear equations

The aim is to introduce different techniques for finding the solution u of a nonlinear equation

A(u) = b, u ∈ K ⊂ V

where K is a subset of a vector space V and A : K → V is a nonlinear mapping. We will equivalently consider the nonlinear equation

F (u) = 0, u ∈ K ⊂ V

where F : K → V .


Infinite-dimensional framework

Definition 73

A Banach space V is a complete normed vector space. This means that V is a vector space (over the real or complex field) equipped with a norm ‖ · ‖ such that every Cauchy sequence with respect to this norm has a limit in V.

Definition 74

A Hilbert space is a Banach space V whose norm ‖ · ‖ is associated with a scalar (or Hermitian) product (·, ·), with $\|v\|^2 = (v, v)$.

Example 75

$V = \mathbb{R}^n$ equipped with the natural Euclidean scalar product is a finite-dimensional Hilbert space. $V = \mathbb{C}^n$ equipped with the natural Hermitian product is a finite-dimensional Hilbert space over the complex field.




Fixed point theorem I

We consider here nonlinear problems of the form

$$T(u) = u, \qquad u \in K \subset V \tag{1}$$

where T : K → V is a nonlinear operator.

Definition 76

A solution u of the equation T(u) = u is called a fixed point of the mapping T.

We are interested in the existence of a solution to equation (1) and in the possibility of approximating this solution by the sequence $(u_k)_{k \ge 0}$ defined by

$$u_{k+1} = T(u_k)$$

Remark.

Let us note that nonlinear equations F(u) = 0 can be recast (in different ways) in the form (1), by letting

$$T(u) = F(u) + u, \qquad T(u) = \alpha F(u) + u \;\; (\alpha \neq 0), \qquad \dots$$


Fixed point theorem II

Definition 77

Let V be a Banach space endowed with a norm ‖ · ‖. A mapping T : K ⊂ V → V is said to be

contractive if there exists a constant α, with 0 ≤ α < 1, such that

‖T (u)− T (v)‖ ≤ α‖u − v‖ ∀u, v ∈ K

α is called the contractivity constant.

non-expansive if

‖T (u)− T (v)‖ ≤ ‖u − v‖ ∀u, v ∈ K

Lipschitz continuous if there exists a constant β ≥ 0 such that

‖T (u)− T (v)‖ ≤ β‖u − v‖ ∀u, v ∈ K

β is called the Lipschitz-continuity constant.


Fixed point theorem III

Theorem 78 (Banach fixed-point theorem)

Assume that K is a nonempty closed set in a Banach space V and that T : K → K is a contractive mapping with contractivity constant α. Then we have the following results:

There exists a unique u ∈ K such that T(u) = u.

For any $u_0 \in K$, the sequence $(u_k)_{k \ge 0}$ in K, defined by $u_{k+1} = T(u_k)$, converges to u, i.e.

$$\|u - u_k\| \xrightarrow[k \to \infty]{} 0$$


Fixed point theorem IV

Proof.

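Let us prove that $(u_k)$ is a Cauchy sequence. We have

$$\|u_{k+1} - u_k\| = \|T(u_k) - T(u_{k-1})\| \le \alpha \|u_k - u_{k-1}\| \le \alpha^k \|u_1 - u_0\|$$

For $m \ge k \ge 1$, we then have

$$\|u_m - u_k\| \le \sum_{i=k}^{m-1} \|u_{i+1} - u_i\| \le \sum_{i=k}^{m-1} \alpha^i \|u_1 - u_0\| = \|u_1 - u_0\| \, \alpha^k \sum_{i=0}^{m-1-k} \alpha^i = \frac{\alpha^k (1 - \alpha^{m-k})}{1 - \alpha} \|u_1 - u_0\| \le \frac{\alpha^k}{1 - \alpha} \|u_1 - u_0\|$$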

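Since α ∈ [0, 1), $\|u_m - u_k\| \to 0$ as $m, k \to \infty$, and therefore $(u_k)$ is a Cauchy sequence. Since it is a Cauchy sequence in the Banach space V, it converges to some u ∈ V, and since K is closed, the limit u belongs to K. Taking the limit $k \to \infty$ in the relation $u_{k+1} = T(u_k)$, we obtain u = T(u) by continuity of T (a contractive mapping is Lipschitz continuous). Hence u is a fixed point of T. Letting $m \to \infty$ in the previous estimate also gives the a priori error bound $\|u - u_k\| \le \frac{\alpha^k}{1 - \alpha}\|u_1 - u_0\|$. For the uniqueness, suppose that u and u′ are two fixed points. Then we have

$$\|u' - u\| = \|T(u') - T(u)\| \le \alpha \|u' - u\|$$

which is possible only if u′ = u, since α < 1.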
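A small scalar illustration in plain Python follows. T = cos maps K = [0, 1] into [cos 1, 1] ⊂ K, and |T′(x)| = |sin x| ≤ sin 1 < 1 on K. It is therefore a contraction with α = sin 1, and the iteration converges to its unique fixed point (≈ 0.739085). The stopping rule on $|u_{k+1} - u_k|$ and the names are illustrative choices. The last lines evaluate the a priori bound $\alpha^k \|u_1 - u_0\| / (1-\alpha)$ that follows from the proof.

```python
import math


def fixed_point(T, u0, tol=1e-12, max_iter=1000):
    """Iterates u_{k+1} = T(u_k) until |u_{k+1} - u_k| <= tol."""
    u = u0
    for k in range(1, max_iter + 1):
        u_next = T(u)
        if abs(u_next - u) <= tol:
            return u_next, k
        u = u_next
    raise RuntimeError("no convergence within max_iter iterations")


alpha = math.sin(1.0)          # contractivity constant of cos on K = [0, 1]
u0 = 0.5
u, k = fixed_point(math.cos, u0)

# a priori bound from the proof: |u - u_k| <= alpha^k / (1 - alpha) |u_1 - u_0|, here k = 10
bound_10 = alpha ** 10 / (1 - alpha) * abs(math.cos(u0) - u0)
print(u, k, bound_10)
```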


Fixed point theorem V

Example 79

Let $V = \mathbb{R}$ and $T(x) = ax + b$. If $a \neq 1$, the sequence $x_{k+1} = T(x_k)$ is characterized by

$$x_k = a x_{k-1} + b = a^k x_0 + \frac{1 - a^k}{1 - a} b$$

If |a| < 1, $x_k$ converges to $\frac{b}{1-a}$, which is the unique fixed point of T. If |a| > 1, the sequence diverges (unless $x_0$ is the fixed point). Let us note that

$$|T(x) - T(y)| = |a| \, |x - y|$$

and therefore T is a contractive mapping if and only if |a| < 1.




Nonlinear equations with monotone operators I

We consider the application of the fixed point theorem to the analysis of the solvability of a class of nonlinear equations

A(u) = b, u ∈ V

where V is a Hilbert space and A : V → V is a Lipschitz continuous and strongly monotone operator.

Definition 80 (Monotone operator)

A mapping A : V → V on a Hilbert space V is said to be

monotone if

(A(u) − A(v), u − v) ≥ 0 ∀u, v ∈ V

strictly monotone if

(A(u) − A(v), u − v) > 0 ∀u, v ∈ V, u ≠ v

strongly monotone if there exists a constant α > 0 such that

(A(u)− A(v), u − v) ≥ α‖u − v‖2 ∀u, v ∈ V

α is called the strong monotonicity constant.


Nonlinear equations with monotone operators II

Theorem 81

Let V be a Hilbert space and A : V → V a strongly monotone and Lipschitz continuous operator, with strong monotonicity constant α and Lipschitz-continuity constant β. Then, for any b ∈ V, there exists a unique u ∈ V such that

A(u) = b

Moreover, if $A(u_1) = b_1$ and $A(u_2) = b_2$, then

$$\|u_1 - u_2\| \le \frac{1}{\alpha} \|b_1 - b_2\|$$

which means that the solution depends continuously on the right-hand side b.


Nonlinear equations with monotone operators III

Proof.

The equation A(u) = b is equivalent to $T_\gamma(u) = u$, with $T_\gamma(u) = u - \gamma(A(u) - b)$, for any γ ≠ 0. The idea is to prove that there exists a γ > 0 such that $T_\gamma : V \to V$ is contractive. The Banach fixed point theorem then gives the existence and uniqueness of a fixed point of $T_\gamma$, and therefore the existence and uniqueness of a solution to A(u) = b. For γ > 0, we have

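$$\begin{aligned} \|T_\gamma(w) - T_\gamma(v)\|^2 &= \|(w - v) - \gamma(A(w) - A(v))\|^2 \\ &= \|w - v\|^2 - 2\gamma (A(w) - A(v), w - v) + \gamma^2 \|A(w) - A(v)\|^2 \\ &\le (1 - 2\gamma\alpha + \gamma^2\beta^2) \|w - v\|^2 \end{aligned}$$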

For $0 < \gamma < 2\alpha/\beta^2$, we have $1 - 2\gamma\alpha + \gamma^2\beta^2 < 1$, and $T_\gamma$ is a contraction. Now if $A(u_1) = b_1$ and $A(u_2) = b_2$, we have $A(u_1) - A(u_2) = b_1 - b_2$ and

$$\alpha \|u_1 - u_2\|^2 \le (A(u_1) - A(u_2), u_1 - u_2) = (b_1 - b_2, u_1 - u_2) \le \|b_1 - b_2\| \, \|u_1 - u_2\|$$

where the last inequality is the Cauchy-Schwarz inequality satisfied by the inner product of a Hilbert space. Dividing by $\alpha \|u_1 - u_2\|$ (when $u_1 \neq u_2$) proves the continuity of the solution u with respect to b.
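The proof is constructive: iterating $T_\gamma$ computes the solution. This plain-Python sketch uses a hypothetical scalar operator on V = ℝ, $A(u) = 2u + \sin u$. Since $A'(u) = 2 + \cos u \in [1, 3]$, it is strongly monotone with α = 1 and Lipschitz with β = 3. The choice γ = α/β² lies in (0, 2α/β²) and gives the contraction factor $\sqrt{1 - \alpha^2/\beta^2}$.

```python
import math


def A_op(u):
    """Hypothetical operator on V = R: A(u) = 2u + sin(u).
    A'(u) = 2 + cos(u) lies in [1, 3], so A is strongly monotone (alpha = 1)
    and Lipschitz continuous (beta = 3)."""
    return 2.0 * u + math.sin(u)


alpha, beta = 1.0, 3.0
gamma = alpha / beta ** 2       # any 0 < gamma < 2 alpha / beta^2 gives a contraction
b = 1.0

u = 0.0
for k in range(1000):
    u_next = u - gamma * (A_op(u) - b)    # u_{k+1} = T_gamma(u_k)
    converged = abs(u_next - u) < 1e-14
    u = u_next
    if converged:
        break
print(k, u, A_op(u) - b)
```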




Fréchet and Gâteaux derivatives I

Let F : K ⊂ V → W be a nonlinear mapping, where K is an open subset of a normed space V and W is a normed space. We denote by L(V, W) the set of continuous linear maps from V to W.

Definition 82 (Fréchet derivative)

F is Fréchet-differentiable at u if and only if there exists A ∈ L(V, W) such that

F(u + v) = F(u) + Av + o(‖v‖) as ‖v‖ → 0

A is denoted F′(u) and is called the Fréchet derivative of F at u. If F is Fréchet-differentiable at all points in K, we denote by F′ : K ⊂ V → L(V, W) the Fréchet derivative of F on K.

Property 83

If F admits a Fréchet derivative F ′(u) at u, then F is continuous at u.


Fréchet and Gâteaux derivatives II

Definition 84 (Gâteaux derivative)

F is Gâteaux-differentiable at u if and only if there exists A ∈ L(V, W) such that

$$\lim_{t \to 0} \frac{F(u + tv) - F(u)}{t} = Av \quad \forall v \in V \tag{2}$$

A is denoted F′(u) and is called the Gâteaux derivative of F at u. If F is Gâteaux-differentiable at all points in K, we denote by F′ : K ⊂ V → L(V, W) the Gâteaux derivative of F on K.

Property 85

If a mapping F is Fréchet-differentiable, it is also Gâteaux-differentiable and the derivatives F′ coincide. Conversely, if a mapping F is Gâteaux-differentiable in a neighborhood of u and its Gâteaux derivative F′ is continuous at u, or if the limit in (2) is uniform with respect to v such that ‖v‖ = 1, then F is also Fréchet-differentiable at u and the two derivatives coincide.


Convex functions I

Definition 86

A subset K of a vector space V is said to be convex if

∀u, v ∈ K, ∀t ∈ [0, 1], tu + (1 − t)v ∈ K

Definition 87

A function J : K → R, defined on a convex set K of V, is said to be

convex if for all u, v ∈ K

J(tu + (1 − t)v) ≤ tJ(u) + (1 − t)J(v) ∀t ∈ [0, 1]

strictly convex if for all u, v ∈ K with u ≠ v,

J(tu + (1 − t)v) < tJ(u) + (1 − t)J(v) ∀t ∈ (0, 1)


Convex functions II

Theorem 88

Let K ⊂ V be convex and let J : K → R be Gâteaux-differentiable. The following statements are equivalent:

(1) J is convex

(2) J(v) ≥ J(u) + (J ′(u), v − u), for all u, v ∈ K

(3) J ′ is monotone, i.e. (J ′(v)− J ′(u), v − u) ≥ 0 , for all u, v ∈ K

Theorem 89

Let K ⊂ V be convex and let J : K → R be Gâteaux-differentiable. The following statements are equivalent:

(1) J is strictly convex

(2) J(v) > J(u) + (J′(u), v − u), for all u, v ∈ K with u ≠ v

(3) J′ is strictly monotone, i.e. (J′(v) − J′(u), v − u) > 0, for all u, v ∈ K with u ≠ v


Convex functions III

Definition 90

A function J : K ⊂ V → R is said to be strongly convex if it is Gâteaux-differentiable and if its Gâteaux derivative is strongly monotone, i.e. if there exists a constant α > 0 such that

(J ′(v)− J ′(u), v − u) ≥ α‖u − v‖2


Convex optimization I

Theorem 91

Let K be a closed convex subset of a Hilbert space V, and let J : K → R be a convex and Gâteaux-differentiable mapping. Then u ∈ K satisfies

$$J(u) = \inf_{v \in K} J(v) \tag{3}$$

if and only if

$$(J'(u), v - u) \ge 0 \quad \forall v \in K \tag{4}$$

When K is a linear subspace, the last inequality reduces to

$$(J'(u), v) = 0 \quad \forall v \in K \tag{5}$$


Convex optimization II

Proof.

Assume (3). Then ∀v ∈ K and ∀t ∈ [0, 1],

$$J(u) \le J(tv + (1 - t)u) \le tJ(v) + (1 - t)J(u)$$

and then

$$0 \le \frac{J(u + t(v - u)) - J(u)}{t} \le J(v) - J(u) \quad \forall t \in (0, 1]$$

Taking the limit $t \to 0^+$, we obtain

$$0 \le (J'(u), v - u) \le J(v) - J(u)$$

which gives (4). Now, assume (4). Since J is convex, we have ∀v ∈ K

$$J(v) \ge J(u) + (J'(u), v - u) \ge J(u)$$

which gives (3). Finally, if K is a subspace, then for all v ∈ K, u ± v ∈ K and therefore

$$(J'(u), \pm v) \ge 0 \quad \Rightarrow \quad (J'(u), v) = 0 \quad \forall v \in K$$




Newton method I

Let U and V be two Banach spaces and F : U → V a Fréchet-differentiable function. We want to solve

F(u) = 0

The Newton method consists in constructing a sequence $\{u_n\}_{n \in \mathbb{N}}$ by solving successive linearized problems. At iteration n, we introduce the linearization $\tilde F$ of F at $u_n$, defined by

$$\tilde F(v) = F(u_n) + F'(u_n)(v - u_n)$$

and we define $u_{n+1}$ such that $\tilde F(u_{n+1}) = 0$. The Newton iterations are then defined as follows.

Newton iterations

Start from an initial guess $u_0$ and compute the sequence $\{u_n\}_{n \in \mathbb{N}}$ defined by

$$u_{n+1} = u_n - F'(u_n)^{-1} F(u_n)$$
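In the scalar case, $F'(u_n)^{-1}F(u_n)$ is just a quotient. This plain-Python sketch applies the iteration to the illustrative equation $u^2 - 2 = 0$ from $u_0 = 1$ and records the errors with respect to $\sqrt 2$. These errors display the quadratic convergence stated in the next theorem: each error is roughly the square of the previous one.

```python
import math


def newton(F, dF, u0, tol=1e-14, max_iter=50):
    """Scalar Newton iterations u_{n+1} = u_n - F(u_n) / F'(u_n); returns the last iterate and history."""
    u = u0
    history = [u]
    for _ in range(max_iter):
        step = F(u) / dF(u)
        u = u - step
        history.append(u)
        if abs(step) <= tol * max(1.0, abs(u)):
            break
    return u, history


u, hist = newton(lambda v: v * v - 2.0, lambda v: 2.0 * v, 1.0)
errors = [abs(h - math.sqrt(2.0)) for h in hist]
for n, e in enumerate(errors[:6]):
    print(n, e)
```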


Newton method II

Theorem 92 (Local convergence of the Newton method)

Assume that $u^*$ is a solution of $F(u^*) = 0$ and that $F'(u^*)^{-1}$ exists and is a continuous linear map from V to U. Assume that F′ is locally Lipschitz continuous at $u^*$, i.e.

$$\|F'(u) - F'(v)\| \le L \|u - v\| \quad \forall u, v \in N(u^*)$$

where $N(u^*)$ is a neighborhood of $u^*$. Then there exists δ > 0 such that, if $\|u_0 - u^*\| \le \delta$, the sequence $\{u_n\}_{n \ge 1}$ of the Newton method is well-defined and converges to $u^*$. Moreover, there exists a constant M < 1/δ such that

$$\|u_{n+1} - u^*\| \le M \|u_n - u^*\|^2 \qquad \text{and} \qquad \|u_n - u^*\| \le \frac{(M\delta)^{2^n}}{M}$$

Proof.

See [Atkinson & Han (2009, section 5.4)]


Newton method for nonlinear systems of equations I

Let F : Rm → Rm and consider the nonlinear system of equations

F (u) = 0

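The iterations of the Newton method are defined by

$$u_{n+1} = u_n - F'(u_n)^{-1} F(u_n)$$

where $F'(u_n) \in \mathbb{R}^{m \times m}$ is called the tangent matrix at $u_n$. In algebraic notation, F(u) and F′(u) can be expressed as follows:

$$u = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}, \quad F(u) = \begin{pmatrix} F_1(a_1, \dots, a_m) \\ \vdots \\ F_m(a_1, \dots, a_m) \end{pmatrix}, \quad F'(u) = \begin{pmatrix} \frac{\partial F_1}{\partial a_1}(u) & \cdots & \frac{\partial F_1}{\partial a_m}(u) \\ \vdots & & \vdots \\ \frac{\partial F_m}{\partial a_1}(u) & \cdots & \frac{\partial F_m}{\partial a_m}(u) \end{pmatrix}$$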
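In practice the inverse of the tangent matrix is never formed. Each iteration solves $F'(u_n)\delta_n = -F(u_n)$. This plain-Python sketch does so with Gaussian elimination on an illustrative system with m = 2: $F(a_1, a_2) = (a_1^2 + a_2^2 - 4,\; a_1 - a_2)$, whose solution from the chosen starting point is $(\sqrt 2, \sqrt 2)$. The functions `F` and `tangent` are hypothetical examples.

```python
import math


def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    T = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (T[i][n] - sum(T[i][j] * y[j] for j in range(i + 1, n))) / T[i][i]
    return y


def F(u):
    a1, a2 = u
    return [a1 ** 2 + a2 ** 2 - 4.0, a1 - a2]


def tangent(u):
    """Tangent matrix F'(u): entry (i, j) is dF_i / da_j."""
    a1, a2 = u
    return [[2.0 * a1, 2.0 * a2],
            [1.0, -1.0]]


def newton_system(F, tangent, u0, tol=1e-12, max_iter=50):
    u = list(u0)
    for n in range(max_iter):
        Fu = F(u)
        if max(abs(v) for v in Fu) < tol:
            return u, n
        delta = solve(tangent(u), [-v for v in Fu])   # F'(u_n) delta_n = -F(u_n)
        u = [ui + di for ui, di in zip(u, delta)]
    return u, max_iter


u, n_iter = newton_system(F, tangent, [1.0, 2.0])
print(n_iter, u, math.sqrt(2.0))
```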


Modified Newton method

One iteration of the (full) Newton method can be written as a linear system of equations

$$A_n \delta_n = -F(u_n), \qquad \delta_n = u_{n+1} - u_n$$

where $A_n = F'(u_n)$. In order to avoid the computation of the tangent matrix $F'(u_n)$ at each iteration, we can use modified Newton iterations, where $A_n$ is only an approximation of $F'(u_n)$. For example, we could update $A_n$ when the convergence is too slow, or every k iterations:

$$A_n = F'(u_{mk}) \quad \text{for } n = mk + j, \; j \in \{0, \dots, k-1\}$$

Remark.

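The convergence of the modified Newton method is usually slower than that of the (full) Newton method. However, more iterations can be performed for the same computation time, since the matrix $A_n$ (and, in practice, its factorization) is reused over several iterations.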
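This plain-Python sketch refreshes the tangent matrix only every k iterations, following $A_n = F'(u_{mk})$ for $n = mk + j$. With k = 1 it reduces to full Newton, so the two variants can be compared on the same illustrative 2×2 system. For brevity the 2×2 solves use Cramer's rule. The system and function names are assumptions, and the printed iteration counts are left for the reader to observe rather than asserted.

```python
import math


def F(u):
    a1, a2 = u
    return [a1 ** 2 + a2 ** 2 - 4.0, a1 - a2]


def tangent(u):
    a1, a2 = u
    return [[2.0 * a1, 2.0 * a2], [1.0, -1.0]]


def solve2(M, rhs):
    """Solves a 2x2 system by Cramer's rule (M assumed nonsingular)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]


def modified_newton(F, tangent, u0, k=3, tol=1e-12, max_iter=200):
    """Newton-type iterations with A_n = F'(u_{mk}) for n = mk + j (k = 1 is full Newton)."""
    u = list(u0)
    for n in range(max_iter):
        Fu = F(u)
        if max(abs(v) for v in Fu) < tol:
            return u, n
        if n % k == 0:
            A_n = tangent(u)                 # refresh the tangent matrix every k iterations
        delta = solve2(A_n, [-v for v in Fu])
        u = [ui + di for ui, di in zip(u, delta)]
    return u, max_iter


u_full, n_full = modified_newton(F, tangent, [1.0, 2.0], k=1)
u_mod, n_mod = modified_newton(F, tangent, [1.0, 2.0], k=3)
print(n_full, n_mod, u_mod, math.sqrt(2.0))
```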


Part VI

Interpolation / Approximation

18 Interpolation: Lagrange interpolation, Hermite interpolation, Trigonometric interpolation

19 Best approximation: Elements on topological vector spaces, General existence results, Existence and uniqueness of best approximation, Best approximation in Hilbert spaces

20 Orthogonal polynomials: Weighted L2 spaces, Classical orthogonal polynomials


Introduction

Principle of approximation

The aim is to replace a function f, known exactly or approximately, by an approximating function p which is more convenient for numerical computation.

The most commonly used approximating functions p are polynomials, piecewise polynomials or trigonometric polynomials. There are several ways of defining the approximating function within a given class of functions: interpolation, projection, ...

Preliminary definitions

We denote by $P_n(I)$ the space of polynomials of degree at most n defined on the closed interval I ⊂ R:

$$P_n(I) = \left\{ v : I \to \mathbb{R} \; ; \; v(x) = \sum_{i=0}^n v_i x^i, \; v_i \in \mathbb{R} \right\}$$

We denote by C(I) the space of continuous functions f : I → R. For a bounded closed interval I, C(I) is a Banach space when equipped with the norm

$$\|f\|_{C(I)} = \sup_{x \in I} |f(x)|$$

We denote by $f^{(i)}$ the i-th derivative of f. We denote by $C^m(I)$ the space of m times differentiable functions f whose derivatives $f^{(i)}$ of order i ≤ m are all continuous. $C^m(I)$ is a Banach space when equipped with the norm

$$\|f\|_{C^m(I)} = \max_{i \le m} \|f^{(i)}\|_{C(I)}$$


Lagrange interpolation

Let f ∈ C([a, b]) be a continuous function defined on the interval [a, b]. We introduce a set of n + 1 distinct points $\{x_i\}_{i=0}^n$ on [a, b], such that

$$a \le x_0 < \dots < x_n \le b$$

The Lagrange interpolant $p_n \in P_n$ of f is the unique polynomial of degree at most n such that

$$p_n(x_i) = f(x_i) \quad \text{for all } i \in \{0, \dots, n\}$$

We can represent $p_n$ as follows:

$$p_n(x) = \sum_{i=0}^n f(x_i) \ell_i(x), \qquad \ell_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^n \frac{x - x_j}{x_i - x_j}$$

where the $\{\ell_i\}_{i=0}^n$ form a basis of $P_n$, called the Lagrange interpolation basis. It is the unique basis of functions satisfying the interpolation conditions

$$\ell_i(x_j) = \delta_{ij} \quad \forall i, j \in \{0, \dots, n\}$$
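The Lagrange form can be evaluated directly from the definition. This plain-Python sketch uses illustrative nodes and a cubic f. Since f ∈ $P_3$ and there are four nodes, the interpolant reproduces f exactly, and the basis satisfies $\ell_i(x_j) = \delta_{ij}$. The direct product formula costs $O(n^2)$ per evaluation point. The barycentric form, not shown here, is the usual cheaper and more stable alternative.

```python
def lagrange_basis(nodes, i, x):
    """l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value


def lagrange_interpolant(nodes, values, x):
    """p_n(x) = sum_i f(x_i) l_i(x)."""
    return sum(fi * lagrange_basis(nodes, i, x) for i, fi in enumerate(values))


nodes = [-1.0, 0.0, 0.5, 2.0]           # n + 1 = 4 distinct points
f = lambda x: x ** 3 - 2 * x            # f in P_3 is reproduced exactly
values = [f(x) for x in nodes]
print(lagrange_interpolant(nodes, values, 1.3), f(1.3))
```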



Theorem 93

Assume $f \in C^{n+1}([a, b])$. Then, for x ∈ [a, b], there exists $\xi_x \in [a, b]$ such that

$$f(x) - p_n(x) = \frac{\omega_n(x)}{(n+1)!} f^{(n+1)}(\xi_x), \qquad \omega_n(x) = \prod_{i=0}^n (x - x_i)$$

Influence of the interpolation grid: function $\omega_n(x)$ on [−1, 1]

Gauss-Legendre grid (blue), Uniform grid (red), Random grid (black)

[Figure: two panels, n = 5]




Influence of the interpolation grid: function $\omega_n(x)$ on [−1, 1]

Gauss-Legendre grid (blue), Uniform grid (red), Random grid (black)

[Figure: two panels, n = 11; vertical scale of the first panel ×10⁻³]


Lagrange interpolation: a famous example...

Runge function $f(x) = \frac{1}{1 + x^2}$ on [−5, 5]

Uniform grid: n = 5, 11, 19

[Figure: Lagrange interpolation of the Runge function on a uniform grid, n = 5, 11, 19]

Gauss-Legendre grid: n = 5, 11, 19

[Figure: Lagrange interpolation of the Runge function on a Gauss-Legendre grid, n = 5, 11, 19]
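The Runge phenomenon can be reproduced numerically in plain Python by estimating $\max |f - p_n|$ on 1001 sample points. One substitution is labeled here: the comparison grid uses Chebyshev points (zeros of $T_{n+1}$) instead of the Gauss-Legendre points of the figures, because they have a simple closed form. Both families cluster near the endpoints. Function names are illustrative.

```python
import math


def lagrange_interpolant(nodes, values, x):
    """p_n(x) = sum_i f(x_i) l_i(x) with the Lagrange basis l_i."""
    total = 0.0
    for i, xi in enumerate(nodes):
        li = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                li *= (x - xj) / (xi - xj)
        total += values[i] * li
    return total


def runge(x):
    return 1.0 / (1.0 + x * x)


def uniform_nodes(n, a=-5.0, b=5.0):
    return [a + (b - a) * i / n for i in range(n + 1)]


def chebyshev_nodes(n, a=-5.0, b=5.0):
    # zeros of the Chebyshev polynomial T_{n+1}, mapped from [-1, 1] to [a, b]
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * n + 2))
            for i in range(n + 1)]


def max_error(nodes, a=-5.0, b=5.0, m=1001):
    """Max of |f - p_n| over m equally spaced sample points of [a, b]."""
    values = [runge(x) for x in nodes]
    grid = [a + (b - a) * k / (m - 1) for k in range(m)]
    return max(abs(runge(x) - lagrange_interpolant(nodes, values, x)) for x in grid)


for n in (5, 11, 19):
    print(n, max_error(uniform_nodes(n)), max_error(chebyshev_nodes(n)))
```

On the uniform grid the error grows with n near the endpoints, whereas on the clustered grid it decreases, consistent with the behavior of $\omega_n$ shown above.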


Hermite polynomial interpolation: first order interpolation

First order Hermite polynomial interpolation consists in interpolating a function f(x) and its derivative f′(x). Assume f ∈ C¹([a, b]). We introduce a set of n + 1 distinct points $\{x_i\}_{i=0}^n$ on [a, b], with

$$a \le x_0 < \dots < x_n \le b$$

The Hermite interpolant $p_{2n+1} \in P_{2n+1}$ of f is uniquely defined by the following interpolation conditions:

$$p_{2n+1}(x_i) = f(x_i), \quad p'_{2n+1}(x_i) = f'(x_i), \qquad 0 \le i \le n$$
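One classical way to build $p_{2n+1}$ is the Newton divided-difference table on the doubled nodes $(x_0, x_0, x_1, x_1, \dots)$. There, a first divided difference on a repeated node is replaced by the derivative value. This plain-Python sketch uses that construction, which is a technique chosen here and not necessarily the one used in the course. The test cases are illustrative: two nodes reproduce $x^3$, and three nodes reproduce $x^5$ exactly.

```python
def hermite_newton_coeffs(xs, fs, dfs):
    """Divided differences on the doubled nodes z = (x0, x0, x1, x1, ...).
    Returns z and the Newton-form coefficients of p_{2n+1}."""
    z = [x for x in xs for _ in range(2)]
    m = len(z)
    Q = [[0.0] * m for _ in range(m)]
    for i in range(m):
        Q[i][0] = fs[i // 2]
    for i in range(1, m):
        if i % 2 == 1:                   # repeated node: f[x_i, x_i] = f'(x_i)
            Q[i][1] = dfs[i // 2]
        else:
            Q[i][1] = (Q[i][0] - Q[i - 1][0]) / (z[i] - z[i - 1])
    for j in range(2, m):
        for i in range(j, m):
            Q[i][j] = (Q[i][j - 1] - Q[i - 1][j - 1]) / (z[i] - z[i - j])
    return z, [Q[i][i] for i in range(m)]


def newton_form_eval(z, c, x):
    """Evaluates c0 + c1 (x - z0) + c2 (x - z0)(x - z1) + ... by nested multiplication."""
    result = c[-1]
    for k in range(len(c) - 2, -1, -1):
        result = result * (x - z[k]) + c[k]
    return result


z, c = hermite_newton_coeffs([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])   # f(x) = x^3
print(newton_form_eval(z, c, 0.5))                                  # x^3 at 0.5

xs = [-1.0, 0.0, 2.0]                                               # f(x) = x^5, degree 2n + 1 = 5
z5, c5 = hermite_newton_coeffs(xs, [x ** 5 for x in xs], [5 * x ** 4 for x in xs])
print(newton_form_eval(z5, c5, 1.5), 1.5 ** 5)
```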


General Hermite polynomial interpolation: higher order interpolation

Hermite interpolation can be generalized to the interpolation of higher order derivatives. At a given point $x_i$, it interpolates the function and its derivatives up to the order $m_i \in \mathbb{N}$. Let $N = \sum_{i=0}^n (m_i + 1) - 1$. A generalized Hermite interpolant $p_N \in P_N$ is uniquely defined by the following conditions

$$p_N^{(j)}(x_i) = f^{(j)}(x_i), \qquad 0 \le j \le m_i, \; 0 \le i \le n$$

Theorem 94

Assume $f \in C^{N+1}([a, b])$. Then, for x ∈ [a, b], there exists $\xi_x \in [a, b]$ such that

$$f(x) - p_N(x) = \frac{\omega_N(x)}{(N+1)!} f^{(N+1)}(\xi_x), \qquad \omega_N(x) = \prod_{i=0}^n (x - x_i)^{m_i + 1}$$


Trigonometric polynomials

A trigonometric polynomial is defined as follows

$$p_n(x) = a_0 + \sum_{j=1}^n \left(a_j \cos(jx) + b_j \sin(jx)\right), \qquad x \in [0, 2\pi)$$

$p_n$ is said to be of degree n if $|a_n| + |b_n| \neq 0$. An equivalent notation is as follows:

$$p_n(x) = \sum_{j=-n}^n c_j e^{ijx}, \qquad \text{with } a_0 = c_0, \quad a_j = c_j + c_{-j}, \quad b_j = i(c_j - c_{-j})$$

or equivalently (in a polynomial-like form)

$$p_n(x) = \sum_{j=-n}^n c_j z^j = z^{-n} \sum_{k=0}^{2n} c_{k-n} z^k, \qquad z = e^{ix}$$


Trigonometric interpolation

We introduce 2n + 1 distinct interpolation points $\{x_j\}_{j=0}^{2n}$ in [0, 2π). Classically, we use uniformly distributed points

$$x_j = \frac{2\pi j}{2n + 1}, \qquad 0 \le j \le 2n$$

The trigonometric interpolant of degree n of the function f is defined by the following conditions

$$p_n(x_j) = f(x_j), \qquad 0 \le j \le 2n$$

It can be equivalently reformulated as an interpolation problem in the complex plane: find $\{c_k\}_{k=-n}^n$ such that

$$\sum_{k=0}^{2n} c_{k-n} z_j^k = z_j^n f(x_j), \qquad 0 \le j \le 2n$$

where we have introduced the complex points $z_j = e^{ix_j}$.
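For the uniform points, the discrete orthogonality $\sum_{j=0}^{2n} e^{i(k-l)x_j} = (2n+1)\,\delta_{kl}$ for $|k - l| \le 2n$ gives the coefficients in closed form: $c_k = \frac{1}{2n+1}\sum_{j=0}^{2n} f(x_j)\, e^{-ikx_j}$. This plain-Python sketch evaluates that formula directly in $O(n^2)$; an FFT would do it faster. It checks the relations $a_j = c_j + c_{-j}$ and $b_j = i(c_j - c_{-j})$ on an illustrative trigonometric polynomial of degree 2.

```python
import cmath
import math


def trig_interp_coeffs(f, n):
    """Coefficients c_k, k = -n..n, of the trigonometric interpolant of f
    at the uniform points x_j = 2 pi j / (2n + 1), j = 0..2n."""
    N = 2 * n + 1
    xs = [2.0 * math.pi * j / N for j in range(N)]
    fx = [f(x) for x in xs]
    return {k: sum(fj * cmath.exp(-1j * k * xj) for fj, xj in zip(fx, xs)) / N
            for k in range(-n, n + 1)}


def trig_eval(c, x):
    """p_n(x) = sum_k c_k e^{ikx}; the real part is returned (real data f)."""
    return sum(ck * cmath.exp(1j * k * x) for k, ck in c.items()).real


f = lambda x: 1.0 + 2.0 * math.cos(x) - 3.0 * math.sin(2.0 * x)   # degree-2 trigonometric polynomial
c = trig_interp_coeffs(f, 2)
a1 = (c[1] + c[-1]).real           # a_j = c_j + c_{-j}
b2 = (1j * (c[2] - c[-2])).real    # b_j = i (c_j - c_{-j})
print(a1, b2, trig_eval(c, 0.7), f(0.7))
```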

The problem of the best approximation

The aim is to find the best approximation p of a function f in a set of functions K (e.g. a polynomial space, a piecewise polynomial space, ...)

$$\min_{p \in K} \|f - p\|$$

The obtained best approximation p depends on the norm selected for measuring the error (e.g. $L^2$-norm, $L^\infty$-norm, ...).

We will first introduce some general results about optimization problems

$$\inf_{v \in K} J(v)$$

by giving some general conditions on the set K and on the function J for the existence of a minimizer.


A first illustrative case: extrema of real-valued functions I

Consider a real-valued continuous function J ∈ C([a, b]). The problem is to find a minimizer of J

$$\inf_{v \in [a, b]} J(v)$$

The classical result of Weierstrass states that a continuous function on a closed and bounded interval K = [a, b] attains its minimum (and its maximum) in K. We recall the main steps of a typical proof in order to obtain more general requirements on K and J.

1 We denote by

$$\alpha = \inf_{v \in K} J(v)$$

By definition of the infimum, there exists a sequence $\{v_n\} \subset K$ such that $\lim_{n \to \infty} J(v_n) = \alpha$.

2 K is a closed and bounded interval in R, and therefore it is a compact set. Therefore, from the sequence $\{v_n\} \subset K$, we can extract a subsequence $\{v_{n_k}\}$ which converges to some $v^* \in K$,

$$v_{n_k} \xrightarrow[k \to \infty]{} v^*$$


A first illustrative case: extrema of real-valued functions II

3 Using the continuity of J, we obtain

$$J(v^*) = \lim_{k \to \infty} J(v_{n_k}) = \alpha$$

which proves that $v^*$ is a minimizer of J in K.

Now we revisit the different steps of the proof in order to generalize the existence result to functionals J defined on a subset K of a Banach space V.

1 The existence of a minimizing sequence $\{v_n\} \subset K$ follows from the definition of the infimum.

2 In an infinite-dimensional Banach space V, a bounded sequence does not necessarily admit a convergent subsequence. However, in a reflexive Banach space V, a bounded sequence admits a weakly convergent subsequence. We then suppose that V is a reflexive Banach space and that K ⊂ V is a bounded set. In order for K to contain the limit of this subsequence, K has to be weakly closed.

3 Finally, we want the weak limit of the subsequence to be a minimizer of J. We could impose that J be continuous with respect to weak limits. However, this condition is too restrictive, and it is sufficient to impose that J is weakly lower semicontinuous (which allows discontinuities).


Elements on topological vector spaces I

In the following, V denotes a normed space, i.e. a vector space equipped with a norm ‖ · ‖.

Definition 95 (Strong convergence on V)

A sequence $\{v_n\} \subset V$ is said to converge strongly to v ∈ V if

$$\lim_{n \to \infty} \|v_n - v\| = 0$$

It is denoted $v_n \to v$.

Definition 96 (Cauchy sequence)

A sequence $\{v_n\} \subset V$ is Cauchy if

$$\lim_{n \to \infty} \sup_{i, j \ge n} \|v_i - v_j\| = 0$$

or equivalently, if ∀ε > 0, there exists n ∈ N such that for all i, j ≥ n, $\|v_i - v_j\| \le \varepsilon$.


Elements on topological vector spaces II

Definition 97 (Closed set)

A subset K ⊂ V is said to be closed if it contains all the limits of its convergent sequences:

$$\{v_n\} \subset K \text{ and } v_n \to v \;\Rightarrow\; v \in K$$

The closure $\bar K$ of a set K is the union of this set and of the limits of all convergent sequences in K.

Definition 98 (Compact set)

A subset K of a normed space V is said to be (sequentially) compact if every sequence $\{v_n\}_{n \in \mathbb{N}}$ in K contains a subsequence $\{v_{n_k}\}_{k \in \mathbb{N}}$ converging to an element in K. A set K whose closure $\bar K$ is compact is said to be relatively compact.

Definition 99 (Banach space)

A Banach space is a complete normed vector space, i.e. a normed vector space V such that every Cauchy sequence in V has a limit in V.


Elements on topological vector spaces III

Definition 100 (Dual of a normed space V)

The dual space of a normed space V is the space V′ = L(V, R) of continuous linear maps from V to R. V′ is a Banach space for the norm

$$\|L\| = \sup_{v \in V : \|v\| \le 1} |L(v)| = \sup_{v \in V, v \neq 0} \frac{|L(v)|}{\|v\|}, \qquad L \in V'$$

Definition 101 (Reflexive normed space)

A normed space V is said to be reflexive if V″ = V, where V″ = (V′)′ is the dual of the dual of V, also called the bidual of V. The identification is made through the canonical injection $v \mapsto (L \mapsto L(v))$ from V into V″, which is required to be surjective.

Definition 102 (Strong convergence on V′)

A sequence $\{L_n\} \subset V'$ is said to converge strongly to L ∈ V′ if

$$\lim_{n \to \infty} \|L_n - L\| = 0$$


Elements on topological vector spaces IV

The dual space can be used to define a new topology on V, called the weak topology. The notions of convergence, closure, continuity, ... can be redefined with respect to this new topology.

Definition 103 (Weak convergence on V)

A sequence $\{v_n\} \subset V$ is said to converge weakly to v ∈ V if

$$\lim_{n \to \infty} L(v - v_n) = 0 \quad \forall L \in V'$$

It is denoted $v_n \rightharpoonup v$.

Definition 104 (Weakly closed set in V)

A subset K ⊂ V is said to be weakly closed if it contains all the limits of its weakly convergent sequences:

$$\{v_n\} \subset K \text{ and } v_n \rightharpoonup v \;\Rightarrow\; v \in K$$


Elements on topological vector spaces V

Definition 105 (Weakly compact set)

A subset K of a normed space V is said to be weakly compact if every sequence $\{v_n\}_{n \in \mathbb{N}}$ in K contains a subsequence $\{v_{n_k}\}_{k \in \mathbb{N}}$ weakly converging to an element in K. A set K whose closure in the weak topology is weakly compact is said to be weakly relatively compact.

Theorem 106 (Reflexive Banach spaces and converging bounded sequences)

A Banach space V is reflexive if and only if every bounded sequence in V has a subsequence weakly converging to an element in V.

Let us note that the above theorem can be reformulated as follows: a Banach space is reflexive if and only if its unit ball is relatively compact in the weak topology.

Theorem 107

In a reflexive Banach space V, a set K ⊂ V is bounded and weakly closed if and only if it is weakly compact.


Lower semicontinuity I

Definition 108 (Lower semicontinuity)

A function J : V → R is lower semicontinuous (l.s.c.) on a set K ⊂ V if

$\{v_n\} \subset K \text{ and } v_n \to v \in K \;\Rightarrow\; J(v) \le \liminf_{n\to\infty} J(v_n)$

Definition 109 (Weak lower semicontinuity)

A function J : V → R is weakly lower semicontinuous (w.l.s.c.) on a set K ⊂ V if

$\{v_n\} \subset K \text{ and } v_n \rightharpoonup v \in K \;\Rightarrow\; J(v) \le \liminf_{n\to\infty} J(v_n)$

Proposition 110

Continuity implies lower semicontinuity (but the converse statement is not true).

Weak lower semicontinuity implies lower semicontinuity, since strong convergence implies weak convergence (but the converse statement is not true).


Lower semicontinuity II

Example 111

Let us prove that the norm function ‖·‖ : v ∈ V ↦ ‖v‖ ∈ R in a normed space V is w.l.s.c. Let {v_n} ⊂ V be a weakly convergent sequence with $v_n \rightharpoonup v$. There exists a linear form L ∈ V′ such that L(v) = ‖v‖ and ‖L‖ = 1 (corollary of the generalized Hahn-Banach theorem). We then have

$L(v_n) \le \|L\|\,\|v_n\| = \|v_n\|$

and therefore

$\|v\| = L(v) = \lim_{n\to\infty} L(v_n) \le \liminf_{n\to\infty} \|v_n\|$

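If V is an inner product space, there is a simpler proof. Indeed, since (·, v) ∈ V′ and by the Cauchy-Schwarz inequality,

$\|v\|^2 = (v, v) = \lim_{n\to\infty} (v_n, v) \le \liminf_{n\to\infty} \|v\|\,\|v_n\|$

Dividing by ‖v‖ gives ‖v‖ ≤ liminf ‖v_n‖ when v ≠ 0, and the case v = 0 is trivial.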
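The inequality in Example 111 can be strict. As an illustration of our own choosing, in V = L²(0, 1) the sequence v_n(x) = sin(nπx) converges weakly to 0 (Riemann-Lebesgue lemma), while ‖v_n‖ = 1/√2 for every n. The sketch below evaluates the pairing (v_n, g) for one hypothetical test function g(x) = x(1 − x) with a composite midpoint rule; this illustrates weak convergence but does not prove it, since the definition involves every L ∈ V′.

```python
import math

def inner(f, g, N=20000):
    """(f, g) = integral over (0, 1) of f*g, approximated by the composite midpoint rule with N cells."""
    h = 1.0 / N
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(N))

def v(n):
    """v_n(x) = sin(n*pi*x): converges weakly, but not strongly, to 0 in L2(0, 1)."""
    return lambda x: math.sin(n * math.pi * x)

def g(x):
    """Hypothetical fixed test function used to probe the pairing (v_n, g)."""
    return x * (1 - x)

for n in (1, 11, 51):
    pairing = inner(v(n), g)               # tends to 0 as n grows
    norm = math.sqrt(inner(v(n), v(n)))    # stays equal to 1/sqrt(2)
    print(f"n={n:3d}  (v_n, g)={pairing:.3e}  ||v_n||={norm:.6f}")
```

The weak limit 0 has norm 0, strictly smaller than liminf ‖v_n‖ = 1/√2: the norm is w.l.s.c. but not weakly continuous.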


Part VI






General existence results I

We introduce the problem

$\inf_{v \in K} J(v) \qquad (\pi)$

Theorem 112

Assume V is a reflexive Banach space. Let K ⊂ V denote a bounded and weakly closed set. Let J : V → R denote a weakly l.s.c. function. Then, problem (π) has a solution in K.

Proof.

Denote $\alpha = \inf_{v\in K} J(v)$ and let {v_n} ⊂ K be a minimizing sequence such that $\lim_{n\to\infty} J(v_n) = \alpha$. Since K is bounded, {v_n} is a bounded sequence in a reflexive Banach space and therefore (Theorem 106) we can extract a subsequence $\{v_{n_k}\}$ weakly converging to some u ∈ V. Since K is weakly closed, u ∈ K. Since J is w.l.s.c.,

$J(u) \le \liminf_{k\to\infty} J(v_{n_k}) = \alpha$

Since also J(u) ≥ α because u ∈ K, we get J(u) = α, and therefore u ∈ K is a minimizer of J.


General existence results II

We now remove the boundedness of the set K by adding a coercivity condition on J.

Definition 113

A functional J : V → R is said to be coercive if

$J(v) \to +\infty \text{ as } \|v\| \to \infty$

Theorem 114

Assume V is a reflexive Banach space. Let K ⊂ V denote a weakly closed set. Let J : V → R denote a weakly l.s.c. and coercive function. Then, the problem (π) has a solution in K.


General existence results III

Proof.

Pick an element v_0 ∈ K with J(v_0) < ∞ and let K_0 = {v ∈ K ; J(v) ≤ J(v_0)}. Since J is coercive, K_0 is bounded. Moreover, K_0 is weakly closed. Indeed, if {v_n} ⊂ K_0 is such that $v_n \rightharpoonup v^*$, then v* ∈ K (since K is weakly closed) and $J(v^*) \le \liminf_n J(v_n) \le J(v_0)$, and therefore v* ∈ K_0. The optimization problem is then equivalent to the optimization problem

$\inf_{v \in K_0} J(v)$

of a w.l.s.c. function on a bounded and weakly closed set. Theorem 112 then gives the existence of a minimizer.

Lemma 115 (Convex closed sets are weakly closed)

A convex and closed set K ⊂ V is weakly closed.

Lemma 116 (Convex l.s.c. functions are w.l.s.c.)

A convex and l.s.c. function is also w.l.s.c.


General existence results IV

For convex sets and convex functions, Theorems 112 and 114 can then be replaced by the following theorem.

Theorem 117

Assume V is a reflexive Banach space. Let K ⊂ V denote a convex and closed set. Let J : V → R denote a convex l.s.c. function. Then, if either (i) K is bounded, or (ii) J is coercive on K, the minimization problem (π) has a solution in K. Moreover, if J is strictly convex, this solution is unique.

Proof.

The existence simply follows from Theorems 112 and 114 and from Lemmas 115 and 116. It remains to prove the uniqueness if J is strictly convex. Assume that u_1, u_2 ∈ K are two solutions such that u_1 ≠ u_2. We have $J(u_1) = J(u_2) = \min_{v\in K} J(v)$. Since K is convex, αu_1 + (1 − α)u_2 ∈ K for α ∈ (0, 1), and by strict convexity of J, we have

$J(\alpha u_1 + (1-\alpha) u_2) < \alpha J(u_1) + (1-\alpha) J(u_2) = \min_{v\in K} J(v)$

which contradicts the fact that u_1 and u_2 are solutions.


General existence results V

In the case of a non-reflexive Banach space V (e.g. V = C([a, b])), the above theorems do not apply. However, reflexivity is only used to extract a weakly convergent subsequence from a bounded sequence in K. In fact, we only need this extraction property for sequences in the set K, not in the whole space V. In particular, for a finite-dimensional subset K, bounded sequences have convergent subsequences, and we have the following result.

Theorem 118

Assume V is a normed space. Let K ⊂ V denote a finite-dimensional convex and closed set. Let J : V → R denote a convex l.s.c. function. Then, if either (i) K is bounded, or (ii) J is coercive on K, the minimization problem (π) has a solution in K. Moreover, if J is strictly convex, this solution is unique.



Existence and uniqueness of best approximation I

We apply the general results about optimization to the following best approximation problem. For a given element u ∈ V, where V is a normed space, we want to find the elements of a subset K ⊂ V which are the closest to u. The problem writes

$\inf_{v \in K} \|u - v\|$

Denoting J(v) = ‖u − v‖, the problem can then be written in the form $\inf_{v\in K} J(v)$.

Property 119

The function J(v) = ‖u − v‖ is convex, continuous (hence, being convex, w.l.s.c. by Lemma 116), and coercive.

We then have the following two existence results.

Theorem 120

Let V be a reflexive Banach space and K ⊂ V a nonempty closed convex subset. Then there exists a best approximation ū ∈ K verifying

$\|u - \bar u\| = \min_{v \in K} \|u - v\|$


Existence and uniqueness of best approximation II

Theorem 121

Let V be a normed space and K ⊂ V a nonempty finite-dimensional closed convex subset. Then there exists a best approximation ū ∈ K verifying

$\|u - \bar u\| = \min_{v \in K} \|u - v\|$

For the uniqueness of the best approximation, we have to look at the properties of the norm.

Theorem 122

If there exists a p > 1 such that v ↦ ‖v‖^p is strictly convex, then, for a convex set K, a solution ū of the best approximation problem is unique.

Example 123

If V is a Hilbert space equipped with the inner product (·, ·) and associated norm ‖·‖, v ↦ ‖v‖² is a strictly convex function.

If V = L^p(Ω) with p ∈ (1, +∞), $v \mapsto \|v\|^p_{L^p(\Omega)}$ is strictly convex.
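To see that some strict convexity is needed, consider a hypothetical example of our own: V = R² with the max-norm ‖w‖_∞ = max(|w_1|, |w_2|), whose powers are not strictly convex, K = {(t, 0); t ∈ R} and u = (0, 1). Then ‖u − (t, 0)‖_∞ = max(|t|, 1) equals 1 for every t ∈ [−1, 1], so the best approximation is not unique, whereas with the Euclidean norm (the Hilbert case above) it is unique (t = 0). The sketch below checks this by brute force on a grid of t values.

```python
import math

def norm_inf(w):
    return max(abs(w[0]), abs(w[1]))

def norm_2(w):
    return math.hypot(w[0], w[1])

def minimizers(norm, u, ts, tol=1e-12):
    """Grid values t for which (t, 0) is (up to tol) closest to u in the given norm."""
    d = [norm((u[0] - t, u[1])) for t in ts]
    dmin = min(d)
    return [t for t, di in zip(ts, d) if di <= dmin + tol]

u = (0.0, 1.0)
ts = [k / 100 for k in range(-200, 201)]       # grid on [-2, 2]
print(minimizers(norm_2, u, ts))                # [0.0]: unique best approximation
print(len(minimizers(norm_inf, u, ts)))         # 201: every grid point of [-1, 1]
```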



Best approximation in Hilbert spaces I

Let V be a Hilbert space equipped with inner product (·, ·) and associated norm ‖·‖.

Lemma 124

Let K be a closed convex set in a Hilbert space V. An element ū ∈ K is a best approximation of u ∈ V if and only if

$(u - \bar u, v - \bar u) \le 0 \quad \forall v \in K$


Best approximation in Hilbert spaces II

Proof.

First suppose that ū ∈ K is a best approximation of u ∈ V. Then,

$\|\bar u - u\|^2 \le \|w - u\|^2 \quad \forall w \in K.$

By selecting w = ū + α(v − ū), with α ∈ (0, 1) and v ∈ K (w ∈ K since K is convex), we have

$0 \ge \|\bar u - u\|^2 - \|(\bar u - u) + \alpha(v - \bar u)\|^2 = -\alpha^2 (v - \bar u, v - \bar u) - 2\alpha(\bar u - u, v - \bar u)$

for all α ∈ (0, 1). Dividing by α and letting α → 0 implies (ū − u, v − ū) ≥ 0 ∀v ∈ K, i.e. (u − ū, v − ū) ≤ 0. Conversely, if (ū − u, v − ū) ≥ 0 ∀v ∈ K, then

$\|v - u\|^2 = \|(v - \bar u) + (\bar u - u)\|^2 = \|v - \bar u\|^2 + 2(v - \bar u, \bar u - u) + \|\bar u - u\|^2 \ge \|\bar u - u\|^2$

for all v ∈ K.


Best approximation in Hilbert spaces III

Corollary 125

Let K be a closed convex set in a Hilbert space V. For any u ∈ V, the best approximation in K is unique.

Proof.

Let u_1, u_2 ∈ K be two best approximations of u ∈ V. Then, (u − u_1, u_2 − u_1) ≤ 0 and (u − u_2, u_1 − u_2) ≤ 0. Adding these inequalities, we obtain

$(u_2 - u_1, u_2 - u_1) = \|u_2 - u_1\|^2 \le 0$

and therefore u_1 = u_2.

We then conclude with the following theorem.


Best approximation in Hilbert spaces IV

Theorem 126

Let K ⊂ V be a nonempty closed convex set in a Hilbert space V. For any u ∈ V, there exists a unique best approximation ū ∈ K defined by

$\|u - \bar u\| = \min_{v \in K} \|u - v\|$


Best approximation in Hilbert spaces V

Remark.

Let us give another classical proof of the existence of a best approximation, which uses the inner product structure of the space V. Let $\{u_n\}_{n\in\mathbb{N}} \subset K$ be a minimizing sequence such that $\lim_{n\to\infty} \|u - u_n\| = \alpha = \inf_{v\in K} \|u - v\|$. Using the parallelogram law satisfied by the norm ‖·‖ in an inner product space, we have

$2\|u - u_n\|^2 + 2\|u - u_m\|^2 = \|u_n - u_m\|^2 + \|2u - u_n - u_m\|^2$

Since K is convex, we have (u_n + u_m)/2 ∈ K, hence ‖u − (u_n + u_m)/2‖ ≥ α, and therefore

$\|u_n - u_m\|^2 = 2\|u - u_n\|^2 + 2\|u - u_m\|^2 - 4\|u - (u_n + u_m)/2\|^2 \le 2\|u - u_n\|^2 + 2\|u - u_m\|^2 - 4\alpha^2 \xrightarrow[m,n\to\infty]{} 0$

which proves that {u_n} is a Cauchy sequence. Since V is complete, {u_n} ⊂ K converges to an element ū ∈ V and, since K is closed, ū ∈ K. By continuity of the norm, ‖u − ū‖ = α.


Best approximation in Hilbert spaces: Projection I

Definition 127 (Projector on a convex set)

The best approximation ū ∈ K of u ∈ V in a closed convex set K is called the projection of u onto K and is denoted

$\bar u = P_K(u)$

where P_K : V → K is called the projection operator of V onto K.

Proposition 128

The projection operator is monotone

$(P_K(v) - P_K(u), v - u) \ge 0 \quad \forall u, v \in V$

and non-expansive

$\|P_K(v) - P_K(u)\| \le \|v - u\| \quad \forall u, v \in V$


Best approximation in Hilbert spaces: Projection II

Proof.

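From the characterizations of P_K(u) ∈ K and P_K(v) ∈ K (Lemma 124), we have respectively

$(P_K(u) - u, P_K(v) - P_K(u)) \ge 0, \qquad (P_K(v) - v, P_K(u) - P_K(v)) \ge 0$

Adding these inequalities, we obtain the monotonicity

$(v - u, P_K(v) - P_K(u)) \ge (P_K(v) - P_K(u), P_K(v) - P_K(u)) \ge 0$

and, by the Cauchy-Schwarz inequality,

$\|P_K(v) - P_K(u)\|^2 \le (v - u, P_K(v) - P_K(u)) \le \|v - u\|\,\|P_K(v) - P_K(u)\|$

Dividing by ‖P_K(v) − P_K(u)‖ (the case P_K(v) = P_K(u) being trivial) gives the non-expansiveness.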
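Lemma 124 and Proposition 128 can be checked numerically on a simple case of our own choosing: V = R^m with the Euclidean inner product and K the box {v; lo ≤ v ≤ hi}. Since the squared distance splits into independent one-dimensional terms, P_K(u) is obtained by clipping each component (numpy.clip). The variational inequality is only tested on random points w ∈ K, so this is a sanity check, not a proof.

```python
import numpy as np

def project_box(u, lo, hi):
    """Euclidean projection onto the box K = {v : lo <= v <= hi}: clip each component."""
    return np.clip(u, lo, hi)

rng = np.random.default_rng(0)
m = 5
lo, hi = -np.ones(m), np.ones(m)
u = 3.0 * rng.standard_normal(m)
v = 3.0 * rng.standard_normal(m)
Pu, Pv = project_box(u, lo, hi), project_box(v, lo, hi)

# Lemma 124: (u - P_K(u), w - P_K(u)) <= 0 for all w in K (tested on random w)
W = rng.uniform(lo, hi, size=(1000, m))
vi_max = np.max((W - Pu) @ (u - Pu))

# Proposition 128: monotonicity and non-expansiveness
monotone = (Pv - Pu) @ (v - u)
nonexpansive = np.linalg.norm(Pv - Pu) <= np.linalg.norm(v - u)
print(vi_max, monotone, nonexpansive)
```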

We now introduce the following particular case when K is a subspace of V .


Best approximation in Hilbert spaces: Projection III

Theorem 129 (Projection on linear subspaces)

Let K be a complete subspace of V. Then, for any u ∈ V, there exists a unique best approximation ū = P_K(u) ∈ K characterized by

$(u - P_K(u), v) = 0 \quad \forall v \in K$

Proof.

We have

$(u - \bar u, w - \bar u) \le 0 \quad \forall w \in K$

and since K is a subspace, for all v ∈ K, w = ū ± v ∈ K, and therefore

$\pm(u - \bar u, v) \le 0 \quad \forall v \in K$

that is, (u − ū, v) = 0 for all v ∈ K.

In the case where K is a subspace, u − P_K(u) is orthogonal to K, and therefore P_K is called an orthogonal projection operator.


Best approximation in Hilbert spaces: Projection IV

Assume that an orthonormal basis $\{\varphi_i\}_{i=1}^n$ of K = K_n is known. The projection P_{K_n}(u) is then given by

$P_{K_n}(u) = \sum_{i=1}^{n} (\varphi_i, u)\,\varphi_i$
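In R^m with the Euclidean inner product, the formula above can be sketched with NumPy: a reduced QR factorization (numpy.linalg.qr) of a matrix A whose columns span K_n gives an orthonormal basis φ_1, ..., φ_n as the columns of Q, so that P_{K_n}(u) = Q Qᵀ u. The data are random and purely illustrative; the checks confirm the orthogonality of Theorem 129 and compare with a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))   # columns span K_n (illustrative random data)
Q, _ = np.linalg.qr(A)            # reduced QR: columns of Q = orthonormal basis phi_1..phi_n
u = rng.standard_normal(m)

coeffs = Q.T @ u                  # (phi_i, u), i = 1..n
proj = Q @ coeffs                 # P_{K_n}(u) = sum_i (phi_i, u) phi_i
residual = u - proj

# Theorem 129: the residual is orthogonal to K_n, hence to every column of A
print(np.abs(A.T @ residual).max())

# The same element is obtained from a least-squares solve in the (non-orthonormal) basis A
c_ls, *_ = np.linalg.lstsq(A, u, rcond=None)
print(np.allclose(proj, A @ c_ls))
```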

Example 130 (Least squares approximation by polynomials)

Let V = L²(−1, 1) and K_n = P_n(−1, 1) the space of polynomials of degree less than or equal to n. An orthonormal basis of K_n is given by the normalized Legendre polynomials $\{L_i\}_{i=0}^n$ defined by

$L_i(x) = \sqrt{\frac{2i+1}{2}}\;\frac{1}{2^i\, i!}\;\frac{d^i}{dx^i}\big((x^2 - 1)^i\big)$
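A sketch of Example 130 with NumPy's Legendre module: L_i = √((2i+1)/2) P_i, where P_i is the standard Legendre polynomial evaluated by numpy.polynomial.legendre.legval, and the coefficients (L_i, u) are computed with the Gauss-Legendre rule of numpy.polynomial.legendre.leggauss (Gauss quadrature is discussed in Part VII). The test function u = exp and the 60 quadrature points are arbitrary choices. Since the spaces P_n are nested, the L² error of the projection cannot increase with n.

```python
import numpy as np
from numpy.polynomial import legendre as leg

NQUAD = 60                             # Gauss-Legendre points used for all integrals
XQ, WQ = leg.leggauss(NQUAD)

def orthonormal_legendre(i, x):
    """L_i(x) = sqrt((2i+1)/2) P_i(x), orthonormal in L2(-1, 1)."""
    e = np.zeros(i + 1)
    e[i] = 1.0                         # coefficient vector selecting P_i
    return np.sqrt((2 * i + 1) / 2) * leg.legval(x, e)

def projection_coefficients(u, n):
    """c_i = (L_i, u), i = 0..n, so that P_{K_n}(u) = sum_i c_i L_i."""
    ux = u(XQ)
    return np.array([np.sum(WQ * ux * orthonormal_legendre(i, XQ)) for i in range(n + 1)])

def evaluate(c, x):
    return sum(ci * orthonormal_legendre(i, x) for i, ci in enumerate(c))

def l2_error(u, c):
    return np.sqrt(np.sum(WQ * (u(XQ) - evaluate(c, XQ)) ** 2))

for n in (1, 2, 4, 8):
    print(n, l2_error(np.exp, projection_coefficients(np.exp, n)))
```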


Best approximation in Hilbert spaces: Projection V

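Example 131 (Least squares approximation by trigonometric polynomials)

Let V = L²(0, 2π) and K_n the space of trigonometric polynomials of degree at most n. The best approximation u_n = P_{K_n}(u) is given by

$u_n(x) = \frac{a_0}{2} + \sum_{j=1}^{n} \big(a_j \cos(jx) + b_j \sin(jx)\big)$

with

$a_j = \frac{(u(x), \cos(jx))}{(\cos(jx), \cos(jx))} = \frac{1}{\pi}\int_0^{2\pi} u(x)\cos(jx)\,dx, \quad j \ge 1, \qquad a_0 = \frac{1}{\pi}\int_0^{2\pi} u(x)\,dx$

$b_j = \frac{(u(x), \sin(jx))}{(\sin(jx), \sin(jx))} = \frac{1}{\pi}\int_0^{2\pi} u(x)\sin(jx)\,dx, \quad j \ge 1$

(the constant term is a_0/2 = (u, 1)/(1, 1), since (1, 1) = 2π). Note that u_n is the truncated Fourier series of u, which converges to u in L²(0, 2π) as n → ∞.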
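The coefficients a_j and b_j can be approximated numerically. In this sketch (the discretization is our own choice), the integrals over (0, 2π) are replaced by the trapezoidal rule on m equally spaced points; for a 2π-periodic integrand it reduces to h times the sum of the values and is exact for trigonometric polynomials of degree less than m. The check recovers the coefficients of a hypothetical trigonometric polynomial.

```python
import numpy as np

def fourier_coefficients(u, n, m=256):
    """a_0..a_n and b_0..b_n (b_0 = 0) of Example 131, with the integrals over
    (0, 2*pi) computed by the periodic trapezoidal rule on m points."""
    h = 2 * np.pi / m
    x = h * np.arange(m)                     # uniform grid, endpoint 2*pi omitted
    ux = u(x)
    a = np.array([h * np.sum(ux * np.cos(j * x)) / np.pi for j in range(n + 1)])
    b = np.array([h * np.sum(ux * np.sin(j * x)) / np.pi for j in range(n + 1)])
    return a, b

def partial_sum(a, b, x):
    """u_n(x) = a_0/2 + sum_{j=1}^n (a_j cos(jx) + b_j sin(jx))."""
    s = np.full_like(x, a[0] / 2, dtype=float)
    for j in range(1, len(a)):
        s += a[j] * np.cos(j * x) + b[j] * np.sin(j * x)
    return s

u = lambda x: 1 + 2 * np.cos(x) - 3 * np.sin(2 * x)   # hypothetical test function
a, b = fourier_coefficients(u, 3)
print(a, b)   # a ≈ [2, 2, 0, 0], b ≈ [0, 0, -3, 0]
```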

Weighted L2 spaces

Let I ⊂ R and ω : I → R be a weight function which is integrable on I and almost everywhere positive. We introduce the weighted function space

$L^2_\omega(I) = \Big\{ v : I \to \mathbb{R};\ v \text{ is measurable on } I,\ \int_I |v(x)|^2\,\omega(x)\,dx < +\infty \Big\}$

L²_ω(I) is a Hilbert space for the inner product

$(u, v) = \int_I u(x)\,v(x)\,\omega(x)\,dx$

and associated norm

$\|u\| = \Big(\int_I u(x)^2\,\omega(x)\,dx\Big)^{1/2}$

Two functions u, v ∈ L²_ω(I) are said to be orthogonal if (u, v) = 0.


Classical orthogonal polynomials I

A system of orthonormal polynomials {p_n}_{n≥0}, with p_n ∈ P_n(I), can be constructed by applying the Gram-Schmidt procedure to the basis of monomials {1, x, x², ...}. For a given interval I and weight function ω, it leads to a uniquely defined system of polynomials. In the following table, we indicate classical families of polynomials for different interval domains I and weight functions.

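Classical orthogonal polynomials

I | ω(x) | p_n
(−1, 1) | $\frac{(1+x)^{a-1}(1-x)^{b-1}}{2^{a+b-1}B(a,b)}$ | Jacobi
(−1, 1) | $\frac{1}{2}$ | Legendre
(−1, 1) | $\frac{(1-x^2)^{-1/2}}{B(1/2,1/2)}$ | Chebyshev of the first kind
(−1, 1) | $\frac{(1-x^2)^{1/2}}{4B(3/2,3/2)}$ | Chebyshev of the second kind
R | $\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$ | Hermite
(0, +∞) | $\frac{1}{\Gamma(a)}x^{a-1}\exp(-x)$ | Laguerre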
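The Gram-Schmidt construction described above can be sketched by storing each polynomial as its vector of monomial coefficients, so that the weighted inner product only requires the moments m_k = ∫_I x^k ω(x) dx. As an illustration we take the Legendre weight ω = 1/2 on (−1, 1), whose moments are 1/(k+1) for even k and 0 for odd k; the result should be √(2n+1) times the standard Legendre polynomial P_n. Another weight can be handled by supplying its own moment function. Gram-Schmidt on monomials becomes ill-conditioned as the degree grows, so this sketch is only meant for small degrees.

```python
import numpy as np

def legendre_moment(k):
    """m_k = integral over (-1, 1) of x^k * (1/2) dx."""
    return 1.0 / (k + 1) if k % 2 == 0 else 0.0

def gram_schmidt_polynomials(moment, nmax):
    """Orthonormal p_0..p_nmax for (p, q) = int p q omega dx, given the moments of omega.
    Row n holds the monomial coefficients of p_n (constant term first)."""
    G = np.array([[moment(i + j) for j in range(nmax + 1)] for i in range(nmax + 1)])

    def inner(p, q):
        return p @ G @ q              # (p, q) = sum_ij p_i q_j m_{i+j}

    basis = []
    for n in range(nmax + 1):
        v = np.zeros(nmax + 1)
        v[n] = 1.0                    # start from the monomial x^n
        for p in basis:
            v = v - inner(p, v) * p   # remove the components along p_0..p_{n-1}
        basis.append(v / np.sqrt(inner(v, v)))
    return np.array(basis)

P = gram_schmidt_polynomials(legendre_moment, 3)
print(P)   # row 2 ≈ sqrt(5) * (3x^2 - 1)/2
```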


Classical orthogonal polynomials II

Γ denotes the Euler Gamma function defined by

$\Gamma(a) = \int_0^{\infty} x^{a-1} \exp(-x)\,dx$

B(a, b) denotes the Euler Beta function defined by

$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$

Remark.

The given weight functions are such that

$\int_I \omega(x)\,dx = 1$

Each of them then defines a measure µ with density ω (dµ(x) = ω(x)dx) and with unit mass. Equivalently, µ (resp. ω) can be interpreted as the probability law (resp. probability density function) of a random variable.
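These normalizations can be checked numerically. The sketch below integrates several weights with a composite Simpson rule, for arbitrary parameter values (a = 2, b = 3 for Jacobi, a = 3 for Laguerre), with the infinite domains truncated to [−12, 12] and [0, 60] where the neglected tails are negligible. The Laguerre weight is taken as x^{a−1}exp(−x)/Γ(a), which integrates to 1 by the definition of Γ. The Chebyshev weight of the first kind is left out because it is singular at ±1; with x = cos θ its integral reduces to (1/π)∫_0^π dθ = 1.

```python
import math

def beta(a, b):
    """Euler Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with m (even) subintervals."""
    h = (hi - lo) / m
    s = f(lo) + f(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3

def jacobi(a, b):
    c = 2 ** (a + b - 1) * beta(a, b)
    return lambda x: (1 + x) ** (a - 1) * (1 - x) ** (b - 1) / c

def laguerre(a):
    return lambda x: x ** (a - 1) * math.exp(-x) / math.gamma(a)

masses = {
    "Jacobi (a=2, b=3)": simpson(jacobi(2, 3), -1.0, 1.0),
    "Legendre": simpson(lambda x: 0.5, -1.0, 1.0),
    "Chebyshev, 2nd kind": simpson(lambda x: math.sqrt(max(0.0, 1 - x * x)) / (4 * beta(1.5, 1.5)), -1.0, 1.0),
    "Hermite": simpson(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), -12.0, 12.0),
    "Laguerre (a=3)": simpson(laguerre(3), 0.0, 60.0),
}
for name, mass in masses.items():
    print(f"{name:20s} {mass:.6f}")
```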


Classical orthogonal polynomials III

Exercise.

Construct by the Gram-Schmidt procedure the orthonormal polynomials of degree n = 0, 1, 2 on the interval I = (0, 1) for the weight function ω(x) = log(1/x).


Part VII

Numerical integration

21 Basic quadrature formulas

22 Gauss quadrature



Given a function f : Ω → R, the aim is to approximate the value of the integral

$I(f) = \int_\Omega f(x)\,dx$

using evaluations of the function

$I(f) \approx \sum_{k=1}^{n} f(x_k)\,\omega_k$

or possibly of the function and its derivatives

$I(f) \approx \sum_{k=1}^{n} f(x_k)\,\omega_k + \sum_{k=1}^{n} f'(x_k)\,\tilde\omega_k + \dots$

These approximations are called quadrature formulas. A quadrature formula is said to be of interpolation type if it uses only evaluations of the function.


Integration error and precision

We denote by I_n(f) the quadrature formula.

Definition 132

A quadrature formula has a degree of precision k if it integrates exactly all polynomials of degree less than or equal to k, but not all polynomials of degree k + 1:

$I_n(f) = I(f) \quad \forall f \in P_k(\Omega)$

$I_n(f) \neq I(f) \quad \text{for some } f \in P_{k+1}(\Omega)$



Basic quadrature formulas

Rectangle formula (precision degree 1)

$\int_a^b f(x)\,dx \approx (b - a)\, f\Big(\frac{a + b}{2}\Big)$

Trapezoidal formula (precision degree 1)

$\int_a^b f(x)\,dx \approx (b - a)\,\frac{f(a) + f(b)}{2}$

Simpson formula (precision degree 3)

$\int_a^b f(x)\,dx \approx \frac{b - a}{6}\Big(f(a) + 4 f\Big(\frac{a + b}{2}\Big) + f(b)\Big)$

...
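Definition 132 can be checked mechanically on these basic formulas: by linearity, a rule integrates all of P_k(a, b) exactly as soon as it integrates the monomials 1, x, ..., x^k exactly. The sketch below uses exact rational arithmetic (fractions.Fraction) on [0, 1], an arbitrary choice, so that "exact" means exactly equal rather than equal up to round-off.

```python
from fractions import Fraction

def midpoint(f, a, b):          # rectangle formula
    return (b - a) * f((a + b) / 2)

def trapezoid(f, a, b):
    return (b - a) * (f(a) + f(b)) / 2

def simpson(f, a, b):
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def degree_of_precision(rule, a=Fraction(0), b=Fraction(1), kmax=10):
    """Largest k such that the rule integrates 1, x, ..., x^k exactly on [a, b]."""
    for k in range(kmax + 1):
        exact = (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        if rule(lambda x: x ** k, a, b) != exact:
            return k - 1
    return kmax

for rule in (midpoint, trapezoid, simpson):
    print(rule.__name__, degree_of_precision(rule))   # 1, 1, 3
```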


Composite quadrature formulas

In order to compute

$I(f; \Omega) = \int_\Omega f(x)\,dx,$

we divide the domain Ω into m subdomains $\{\Omega_i\}_{i=1}^m$ such that

$I(f; \Omega) = \sum_{i=1}^{m} I(f; \Omega_i)$

and we introduce a basic quadrature formula on each subdomain

$I(f; \Omega) \approx \sum_{i=1}^{m} I_n(f; \Omega_i)$
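A possible implementation of a composite formula uses equal subdomains Ω_i = [a + (i−1)h, a + ih], h = (b − a)/m. It is applied here (our own test case) to ∫_0^π sin x dx = 2. Halving h should divide the error by about 2² = 4 for the trapezoidal formula and 2⁴ = 16 for the Simpson formula; these rates are standard results that are not derived in this section.

```python
import math

def trapezoid(f, a, b):
    return (b - a) * (f(a) + f(b)) / 2

def simpson(f, a, b):
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def composite(rule, f, a, b, m):
    """Sum of the basic rule applied on m equal subintervals of [a, b]."""
    h = (b - a) / m
    return sum(rule(f, a + i * h, a + (i + 1) * h) for i in range(m))

exact = 2.0                                   # integral of sin over (0, pi)
ratios = {}
for rule in (trapezoid, simpson):
    errs = [abs(composite(rule, math.sin, 0.0, math.pi, m) - exact) for m in (16, 32, 64)]
    ratios[rule.__name__] = [errs[i] / errs[i + 1] for i in range(len(errs) - 1)]
    print(rule.__name__, errs, ratios[rule.__name__])
```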



Gauss quadrature I

We want to approximate the weighted integral of a function f

$I_w(f) = \int_a^b f(x)\,w(x)\,dx$

where w(x)dx defines a measure of integration. A Gauss quadrature formula with n points is defined by

$I_w(f) \approx I_w^n(f) = \sum_{i=1}^{n} \omega_i\, f(x_i)$

with points and weights such that it integrates exactly all polynomials f ∈ P_{2n−1}(a, b). The x_i (resp. ω_i) are called Gauss points (resp. Gauss weights) associated with the present measure. We introduce the function space L²_w(a, b) and its natural inner product

$(f, g)_w = \int_a^b f(x)\,g(x)\,w(x)\,dx$


Gauss quadrature II

Theorem 133

$I_w^n(f) = I_w(f)$ for all f ∈ P_{2n−1}(a, b) if and only if

(i) the points x_i are such that the polynomial $z_n(x) = \prod_{i=1}^{n} (x - x_i) \in P_n(a, b)$ is orthogonal to P_{n−1}(a, b), i.e.

$(z_n, p)_w = 0 \quad \forall p \in P_{n-1}(a, b)$

(ii) the weights are defined by ω_i = I_w(L_i), where L_i is the Lagrange basis polynomial associated with x_i, defined by $L_i(x) = \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$
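Theorem 133 can be turned into a construction for w(x) = 1 on (−1, 1). In the sketch below, the nodes are the roots of the degree-n Legendre polynomial (numpy.polynomial.legendre.legroots) and the weights are the integrals of the Lagrange basis polynomials, built and integrated with numpy.polynomial.polynomial. The result can be compared with NumPy's numpy.polynomial.legendre.leggauss, and the rule should integrate x^k exactly for k ≤ 2n − 1 but not for k = 2n.

```python
import numpy as np
from numpy.polynomial import legendre as leg
from numpy.polynomial import polynomial as poly

def gauss_legendre(n):
    """n-point Gauss rule for w(x) = 1 on (-1, 1), following Theorem 133."""
    e = np.zeros(n + 1)
    e[n] = 1.0
    x = np.sort(np.real(leg.legroots(e)))    # roots of P_n: z_n is orthogonal to P_{n-1}
    w = np.empty(n)
    for i in range(n):
        others = np.delete(x, i)
        Li = poly.polyfromroots(others) / np.prod(x[i] - others)   # Lagrange basis L_i
        Fi = poly.polyint(Li)                                       # antiderivative of L_i
        w[i] = poly.polyval(1.0, Fi) - poly.polyval(-1.0, Fi)       # omega_i = I_w(L_i)
    return x, w

def monomial_error(x, w, k):
    """|sum_i w_i x_i^k - integral over (-1, 1) of x^k|."""
    exact = 2.0 / (k + 1) if k % 2 == 0 else 0.0
    return abs(np.sum(w * x**k) - exact)

n = 4
x, w = gauss_legendre(n)
print([monomial_error(x, w, k) for k in range(2 * n + 1)])   # ~0 up to k = 2n-1 = 7
```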

Corollary 134

The n Gauss points of an n-point Gauss quadrature are the n roots of the degree-n polynomial orthogonal with respect to w.

For (a, b) = (−1, 1) and w(x) = 1, the x_i are the n roots of the degree-n Legendre polynomial.

For (a, b) = (−∞, ∞) and w(x) = exp(−x²), the x_i are the n roots of the degree-n Hermite polynomial.

...
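For the Hermite case, NumPy provides the rule directly: numpy.polynomial.hermite.hermgauss(n) returns the nodes and weights of the n-point Gauss rule for w(x) = exp(−x²) on R. The sketch below also shows how to use it for the normalized Gaussian weight exp(−x²/2)/√(2π) of the classical polynomials table, through the substitution x = √2 t, which gives ∫ f(x) exp(−x²/2)/√(2π) dx = (1/√π) ∫ f(√2 t) exp(−t²) dt.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite(f, n):
    """Approximate the integral over R of f(x) exp(-x^2) with the n-point Gauss-Hermite rule."""
    x, w = hermgauss(n)
    return np.sum(w * f(x))

def gaussian_expectation(f, n):
    """E[f(X)] for X ~ N(0, 1): integral of f(x) exp(-x^2/2)/sqrt(2 pi), via x = sqrt(2) t."""
    t, w = hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * t)) / np.sqrt(np.pi)

print(gauss_hermite(lambda x: x**2, 2), np.sqrt(np.pi) / 2)   # exact: degree 2 <= 2n-1 = 3
print(gaussian_expectation(lambda x: x**4, 3))                 # exact value E[X^4] = 3
print(gaussian_expectation(lambda x: x**4, 2))                 # degree 4 > 3: not exact
```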
