MATH 3795
Lecture 6. Sensitivity of the Solution of a Linear System
Dmitriy Leykekhman
Fall 2008
Goals
- Understand how the solution of $Ax = b$ changes when $A$ or $b$ changes.
- Condition number of a matrix (with respect to inversion).
- Vector and matrix norms.
D. Leykekhman - MATH 3795 Introduction to Computational Mathematics Sensitivity of the Solution – 1
Linear Systems
- Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, we are interested in the solution $x \in \mathbb{R}^n$ of
\[ Ax = b. \]
- Suppose that instead of $A$ and $b$ we are given $A + \Delta A$ and $b + \Delta b$, where $\Delta A \in \mathbb{R}^{n \times n}$ and $\Delta b \in \mathbb{R}^n$. How do these perturbations in the data change the solution of the linear system?
- First we need to understand how to measure the size of vectors and of matrices. This leads to vector norms and matrix norms.
Vector Norms
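Definition
A (vector) norm on $\mathbb{R}^n$ is a function
\[ \|\cdot\| : \mathbb{R}^n \to \mathbb{R}, \qquad x \mapsto \|x\|, \]
which for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$ satisfies
1. $\|x\| \ge 0$, and $\|x\| = 0 \Leftrightarrow x = 0$,
2. $\|\alpha x\| = |\alpha| \, \|x\|$,
3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).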
Vector Norms
The most frequently used norms on $\mathbb{R}^n$ are the following.
- The 2-norm,
\[ \|x\|_2 = \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2}, \]
computed by MATLAB's built-in function norm(x) or norm(x, 2).
- More generally, for any $p \in [1, \infty)$, the p-norm,
\[ \|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p}, \]
computed by MATLAB's built-in function norm(x, p).
- The ∞-norm,
\[ \|x\|_\infty = \max_{i=1,\dots,n} |x_i|, \]
computed by MATLAB's built-in function norm(x, inf).
Vector Norms
Example
Let $x = (1, -2, 3, -4)^T$. Then
\[ \|x\|_1 = 1 + 2 + 3 + 4 = 10, \]
\[ \|x\|_2 = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \approx 5.48, \]
\[ \|x\|_\infty = \max\{1, 2, 3, 4\} = 4. \]
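The three values in this example can be reproduced with a few lines of code. The sketch below is plain Python rather than MATLAB, and the helper name `p_norm` is an illustrative choice, not something from the lecture; it simply evaluates the p-norm formula, treating p = ∞ as the maximum absolute entry.

```python
import math

def p_norm(x, p):
    """Vector p-norm for p in [1, inf]; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [1, -2, 3, -4]
norm_1 = p_norm(x, 1)           # 1 + 2 + 3 + 4
norm_2 = p_norm(x, 2)           # sqrt(30)
norm_inf = p_norm(x, math.inf)  # largest absolute entry

print(norm_1, norm_2, norm_inf)
```

In MATLAB the corresponding calls would be norm(x, 1), norm(x, 2) and norm(x, inf).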
Vector Norms
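Figure: the boundaries of the unit balls $\{x \in \mathbb{R}^n : \|x\|_p \le 1\}$.
One can show the following useful inequalities:
- For all $x \in \mathbb{R}^n$,
\[ \|x\|_\infty \le \|x\|_2 \le \|x\|_1. \]
- Let $\|\cdot\|$ be any vector norm on $\mathbb{R}^n$. Then
\[ \|x + y\| \ge \big| \|x\| - \|y\| \big| \quad \text{for all } x, y \in \mathbb{R}^n. \]
- Cauchy-Schwarz inequality:
\[ x^T y \le \|x\|_2 \|y\|_2 \quad \text{for all } x, y \in \mathbb{R}^n. \]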
Vector Norms
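Theorem
Vector norms on $\mathbb{R}^n$ are equivalent, i.e. for every two vector norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on $\mathbb{R}^n$ there exist constants $c_{ab}, C_{ab} > 0$ (depending on the vector norms $\|\cdot\|_a$ and $\|\cdot\|_b$, but not on $x$) such that
\[ c_{ab} \|x\|_b \le \|x\|_a \le C_{ab} \|x\|_b \quad \forall x \in \mathbb{R}^n. \]
In particular, for any $x \in \mathbb{R}^n$ we have the inequalities
\[ \frac{1}{\sqrt{n}} \|x\|_1 \le \|x\|_2 \le \|x\|_1, \]
\[ \|x\|_\infty \le \|x\|_2 \le \sqrt{n} \, \|x\|_\infty, \]
\[ \|x\|_\infty \le \|x\|_1 \le n \|x\|_\infty. \]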
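A quick numerical check of the three inequality chains can make the constants concrete. The sketch below is plain Python; the helper names and the sample vectors are illustrative assumptions. It also shows that the constants cannot be improved: the vector of all ones attains the bounds involving $\sqrt{n}$ and $n$, and the unit vector $e_1$ attains the bounds with constant 1.

```python
import math

def p_norm(x, p):
    """Vector p-norm; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def equivalence_holds(x, tol=1e-12):
    """Check the three inequality chains of the theorem, up to rounding."""
    n = len(x)
    n1, n2, ninf = p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf)
    return (
        n1 / math.sqrt(n) <= n2 + tol and n2 <= n1 + tol
        and ninf <= n2 + tol and n2 <= math.sqrt(n) * ninf + tol
        and ninf <= n1 + tol and n1 <= n * ninf + tol
    )

ones = [1.0, 1.0, 1.0, 1.0]  # ||x||_1 = n, ||x||_2 = sqrt(n), ||x||_inf = 1
e1 = [1.0, 0.0, 0.0, 0.0]    # all three norms equal 1
samples = [ones, e1, [1, -2, 3, -4], [0.5, 0.0, -7.0, 2.0]]

all_hold = all(equivalence_holds(x) for x in samples)
print(all_hold)
```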
Matrix Norms
Definition
A matrix norm on $\mathbb{R}^{m \times n}$ is a function
\[ \|\cdot\| : \mathbb{R}^{m \times n} \to \mathbb{R}, \qquad A \mapsto \|A\|, \]
which for all $A, B \in \mathbb{R}^{m \times n}$ and $\alpha \in \mathbb{R}$ satisfies
1. $\|A\| \ge 0$, and $\|A\| = 0 \Leftrightarrow A = 0$ (zero matrix),
2. $\|\alpha A\| = |\alpha| \, \|A\|$,
3. $\|A + B\| \le \|A\| + \|B\|$ (triangle inequality).
Warning: Matrix norms and vector norms are denoted by the same symbol $\|\cdot\|$. However, as we will see shortly, vector norms and matrix norms are computed very differently. Thus, before computing a norm we need to examine carefully whether it is applied to a vector or to a matrix. It should be clear from the context which norm, a vector norm or a matrix norm, is used.
Matrix Norms. First Approach.
- View a matrix $A \in \mathbb{R}^{m \times n}$ as a vector in $\mathbb{R}^{mn}$, by stacking the columns of the matrix into a long vector.
- Apply a vector norm to this vector of length $mn$.
- This gives a matrix norm. For example, if we apply the vector 2-norm, then
\[ \|A\|_F = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \Big)^{1/2}. \]
This is called the Frobenius norm. (We will use $\|A\|_2$ to denote a different matrix norm.)
- This approach is not very useful.
Matrix Norms. Second Approach.
- We want to solve linear systems $Ax = b$: find a vector $x$ such that if we multiply $A$ by this vector (we apply $A$ to this vector), then we obtain $b$.
- View a matrix $A \in \mathbb{R}^{m \times n}$ as a linear mapping, which maps a vector $x \in \mathbb{R}^n$ into a vector $Ax \in \mathbb{R}^m$:
\[ A : \mathbb{R}^n \to \mathbb{R}^m, \qquad x \mapsto Ax. \]
- How do we define the size of a linear mapping?
- Compare the size of the image $Ax \in \mathbb{R}^m$ with the size of $x$. This leads us to look at
\[ \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \]
Here $Ax \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ are vectors and $\|\cdot\|$ are vector norms (in $\mathbb{R}^m$ and $\mathbb{R}^n$).
Matrix Norms
- Let $p \in [1, \infty]$. The following identities are valid:
\[ \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \sup_{\|x\|_p = 1} \|Ax\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p. \]
- One can show that
\[ \|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} \tag{1} \]
defines a matrix norm. Note that on the left hand side of (1) the symbol $\|\cdot\|_p$ refers to the p-matrix-norm, while on the right hand side of (1) the symbol $\|\cdot\|_p$ refers to the p-vector-norm applied to the vectors $Ax \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$, respectively.
Matrix Norms
For the most commonly used matrix norms (1), with $p = 1$, $p = 2$, or $p = \infty$, there exist rather simple representations.
Let $\|\cdot\|_p$ be the matrix norm defined in (1). Then
\[ \|A\|_1 = \max_{j=1,\dots,n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(maximum column norm)}, \]
\[ \|A\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(maximum row norm)}, \]
\[ \|A\|_2 = \sqrt{\lambda_{\max}(A^T A)} \quad \text{(spectral norm)}, \]
where $\lambda_{\max}(A^T A)$ is the largest eigenvalue of $A^T A$.
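To see why the maximum in (1) reduces to a column sum for $p = 1$ and to a row sum for $p = \infty$, it helps to evaluate the ratio $\|Ax\|_p / \|x\|_p$ at the vectors that attain it. The sketch below uses a small matrix chosen only for illustration (it does not appear in the lecture): for $p = 1$ the ratio at the unit vector $e_j$ is the j-th absolute column sum, and for $p = \infty$ the ratio at the sign vector of the heaviest row is that row's absolute sum.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_1(v):
    return sum(abs(vi) for vi in v)

def norm_inf(v):
    return max(abs(vi) for vi in v)

A = [[1, -2],
     [3, 4]]  # illustrative matrix, not taken from the slides
m, n = len(A), len(A[0])

col_sums = [sum(abs(A[i][j]) for i in range(m)) for j in range(n)]
row_sums = [sum(abs(a) for a in row) for row in A]

# p = 1: the ratio at the unit vector e_j equals the j-th column sum.
unit_ratios = []
for j in range(n):
    e_j = [1 if k == j else 0 for k in range(n)]
    unit_ratios.append(norm_1(matvec(A, e_j)) / norm_1(e_j))

# p = inf: the ratio at the sign vector of the row with the largest
# absolute sum equals that row sum.
i_max = row_sums.index(max(row_sums))
s = [1 if a >= 0 else -1 for a in A[i_max]]
sign_ratio = norm_inf(matvec(A, s)) / norm_inf(s)

print(col_sums, unit_ratios)       # column sums and ratios at e_1, e_2
print(max(row_sums), sign_ratio)   # largest row sum and ratio at s
```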
Matrix Norms
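Example
Let
\[ A = \begin{pmatrix} 1 & 3 & -6 \\ -2 & 4 & 2 \\ 2 & 1 & -1 \end{pmatrix}. \]
Then
\[ \|A\|_1 = \max\{5, 8, 9\} = 9, \]
\[ \|A\|_\infty = \max\{10, 8, 4\} = 10, \]
\[ \|A\|_2 = \sqrt{\max\{3.07, 23.86, 49.06\}} \approx 7.0045, \]
\[ \|A\|_F = \sqrt{76} \approx 8.718. \]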
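The four norms of this example matrix can be checked numerically. The following is a sketch in plain Python, not the method used in the lecture: the 1-, ∞- and Frobenius norms follow directly from the formulas, while for the 2-norm the largest eigenvalue of $A^T A$ is approximated by power iteration (a stand-in technique chosen here to avoid any library dependency; the starting vector and iteration count are arbitrary assumptions).

```python
import math

A = [[1, 3, -6],
     [-2, 4, 2],
     [2, 1, -1]]
n = len(A)

norm_1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))  # max column sum
norm_inf = max(sum(abs(a) for a in row) for row in A)                # max row sum
norm_fro = math.sqrt(sum(a * a for row in A for a in row))           # Frobenius

# Gram matrix M = A^T A (symmetric positive semidefinite).
M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]

def matvec(B, v):
    return [sum(b * vi for b, vi in zip(row, v)) for row in B]

# Power iteration for the largest eigenvalue of M.
v = [1.0, 1.0, 1.0]
for _ in range(200):
    w = matvec(M, v)
    length = math.sqrt(sum(wi * wi for wi in w))
    v = [wi / length for wi in w]

lambda_max = sum(vi * wi for vi, wi in zip(v, matvec(M, v)))  # Rayleigh quotient
norm_2 = math.sqrt(lambda_max)

print(norm_1, norm_inf, norm_2, norm_fro)
```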
Matrix Norms
Two important inequalities.
Theorem
For any $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times k}$ and $x \in \mathbb{R}^n$, the following inequalities hold:
\[ \|Ax\|_p \le \|A\|_p \|x\|_p \quad \text{(compatibility of matrix and vector norm)} \]
and
\[ \|AB\|_p \le \|A\|_p \|B\|_p \quad \text{(submultiplicativity of matrix norms)}. \]
Note that for the identity matrix $I$,
\[ \|I\|_p = \max_{x \ne 0} \frac{\|Ix\|_p}{\|x\|_p} = 1. \]
Compare this with the first approach, in which we view $I$ as a vector of length $n^2$. For example, the Frobenius norm (vector 2-norm) is
\[ \|I\|_F = \sqrt{n}. \]
Error Analysis
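- Let
\[ Ax = b \tag{2} \]
be the original system, where $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$.
- Let
\[ (A + \Delta A)\tilde{x} = b + \Delta b \tag{3} \]
be the perturbed system, where $\Delta A \in \mathbb{R}^{n \times n}$ and $\Delta b \in \mathbb{R}^n$ represent the perturbations in $A$ and $b$, respectively.
- What is the error $\Delta x = \tilde{x} - x$ between the solution $x$ of the exact linear system (2) and the solution $\tilde{x}$ of the perturbed linear system (3)?
- Use the representation $\tilde{x} = x + \Delta x$.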
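Error Analysis. Perturbation in b only
The original linear system is
\[ Ax = b, \]
where $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$. The perturbed linear system is
\[ A(x + \Delta x) = b + \Delta b, \]
where $\Delta b \in \mathbb{R}^n$ represents the perturbations in $b$. Subtracting, we get
\[ A \Delta x = \Delta b, \quad \text{or} \quad \Delta x = A^{-1} \Delta b. \]
Take norms:
\[ \|\Delta x\| = \|A^{-1} \Delta b\| \le \|A^{-1}\| \, \|\Delta b\|. \tag{4} \]
To estimate the relative error, note that $Ax = b$ and, as a result,
\[ \|b\| = \|Ax\| \le \|A\| \, \|x\| \quad \Rightarrow \quad \frac{1}{\|x\|} \le \|A\| \frac{1}{\|b\|}. \tag{5} \]
Combining (4) and (5) we get
\[ \frac{\|\Delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \frac{\|\Delta b\|}{\|b\|}. \tag{6} \]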
Error Analysis. Perturbation in b only
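Definition
The (p-) condition number $\kappa_p(A)$ of a matrix $A$ (with respect to inversion) is defined by
\[ \kappa_p(A) = \|A\|_p \|A^{-1}\|_p. \]
Set $\kappa_p(A) = \infty$ if $A$ is not invertible. MATLAB's built-in function is cond(A), which uses the 2-norm by default; cond(A, p) selects another norm.
If
\[ Ax = b \]
and
\[ A(x + \Delta x) = b + \Delta b, \]
then the relative error between the solutions obeys
\[ \frac{\|\Delta x\|_p}{\|x\|_p} \le \kappa_p(A) \frac{\|\Delta b\|_p}{\|b\|_p}. \]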
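A small worked instance shows how pessimistic or sharp this bound can be. The sketch below uses a nearly singular 2-by-2 matrix picked purely for illustration (it is not from the lecture) and exact rational arithmetic from Python's `fractions` module, so no rounding enters; all helper names are hypothetical. A perturbation of size $10^{-4}$ in one entry of $b$ changes the solution by 100 percent, and the bound $\kappa_\infty(A) \|\Delta b\|_\infty / \|b\|_\infty$ overestimates this by only a factor of about two.

```python
from fractions import Fraction

def norm_inf_vec(v):
    return max(abs(vi) for vi in v)

def norm_inf_mat(M):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)

def matvec(M, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in M]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

eps = Fraction(1, 10000)
A = [[Fraction(1), Fraction(1)],
     [Fraction(1), 1 + eps]]          # illustrative, nearly singular
x = [Fraction(1), Fraction(1)]        # chosen exact solution
b = matvec(A, x)                      # right-hand side (2, 2 + eps)
delta_b = [Fraction(0), eps]          # perturbation of b

A_inv = inverse_2x2(A)
delta_x = matvec(A_inv, delta_b)      # exact change in the solution

kappa = norm_inf_mat(A) * norm_inf_mat(A_inv)
rel_err = norm_inf_vec(delta_x) / norm_inf_vec(x)
bound = kappa * norm_inf_vec(delta_b) / norm_inf_vec(b)

print(float(kappa), float(rel_err), float(bound))
```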
Error Analysis. General Case.
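- Let
\[ Ax = b \tag{7} \]
be the original system, where $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$.
- Let
\[ (A + \Delta A)(x + \Delta x) = b + \Delta b \tag{8} \]
be the perturbed system, where $\Delta A \in \mathbb{R}^{n \times n}$ and $\Delta b \in \mathbb{R}^n$ represent the perturbations in $A$ and $b$, respectively.
- If $\|A^{-1}\|_p \|\Delta A\|_p < 1$, then
\[ \frac{\|\Delta x\|_p}{\|x\|_p} \le \frac{\kappa_p(A)}{1 - \kappa_p(A) \frac{\|\Delta A\|_p}{\|A\|_p}} \left( \frac{\|\Delta A\|_p}{\|A\|_p} + \frac{\|\Delta b\|_p}{\|b\|_p} \right). \tag{9} \]
If $\kappa_p(A)$ is small, we say that the linear system is well conditioned. Otherwise, we say that the linear system is ill conditioned.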
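The slides state (9) without a derivation. One standard route is sketched below, assuming that $\|\cdot\|_p$ is the induced matrix norm from (1), so that compatibility and submultiplicativity hold and $\|I\|_p = 1$; this is offered as a plausible reconstruction, not necessarily the argument given in the lecture.

```latex
% Step 1: subtract (7) from (8) and multiply by A^{-1}.
(A + \Delta A)\,\Delta x = \Delta b - \Delta A\,x
\quad\Longrightarrow\quad
(I + A^{-1}\Delta A)\,\Delta x = A^{-1}(\Delta b - \Delta A\,x).

% Step 2: since \|A^{-1}\Delta A\|_p \le \|A^{-1}\|_p \|\Delta A\|_p < 1,
% the matrix I + A^{-1}\Delta A is invertible (Neumann series) and
\|(I + A^{-1}\Delta A)^{-1}\|_p \le \frac{1}{1 - \|A^{-1}\|_p \|\Delta A\|_p}.

% Step 3: take norms and divide by \|x\|_p.
\frac{\|\Delta x\|_p}{\|x\|_p}
\le \frac{\|A^{-1}\|_p}{1 - \|A^{-1}\|_p \|\Delta A\|_p}
\left( \|\Delta A\|_p + \frac{\|\Delta b\|_p}{\|x\|_p} \right).

% Step 4: use 1/\|x\|_p \le \|A\|_p / \|b\|_p from (5), factor out \|A\|_p,
% and write \|A^{-1}\|_p \|\Delta A\|_p = \kappa_p(A)\,\|\Delta A\|_p / \|A\|_p.
\frac{\|\Delta x\|_p}{\|x\|_p}
\le \frac{\kappa_p(A)}{1 - \kappa_p(A)\,\frac{\|\Delta A\|_p}{\|A\|_p}}
\left( \frac{\|\Delta A\|_p}{\|A\|_p} + \frac{\|\Delta b\|_p}{\|b\|_p} \right).
```

Setting $\Delta A = 0$ in the last line recovers the bound (6) for a perturbation in $b$ only.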
Error Analysis. Example. Hilbert Matrix
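Example
The Hilbert matrix $H \in \mathbb{R}^{n \times n}$ has entries
\[ h_{ij} = \int_0^1 x^{i+j-2} \, dx = \frac{1}{i + j - 1}. \]
For $n = 4$,
\[ H = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \end{pmatrix}, \]
\[ H^{-1} = \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix}. \]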
Error Analysis. Example. Hilbert Matrix.
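Example
The condition number of a Hilbert matrix grows very fast with $n$. For $n = 4$ we compute
\[ \|H\|_1 = \tfrac{25}{12}, \quad \|H^{-1}\|_1 = 13620, \quad \kappa_1(H) = 28375, \]
\[ \|H\|_\infty = \tfrac{25}{12}, \quad \|H^{-1}\|_\infty = 13620, \quad \kappa_\infty(H) = 28375, \]
\[ \|H\|_2 \approx 1.5, \quad \|H^{-1}\|_2 \approx 1.03 \times 10^4, \quad \kappa_2(H) \approx 1.55 \times 10^4. \]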
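The 1-norm values for $n = 4$ can be confirmed exactly. The sketch below builds the Hilbert matrix with Python's `fractions` module and inverts it by Gauss-Jordan elimination in exact rational arithmetic; the function names are illustrative, and the elimination routine is a generic textbook version rather than anything prescribed in the lecture.

```python
from fractions import Fraction

def hilbert(n):
    """Hilbert matrix with exact entries 1/(i + j - 1) for 1-based i, j."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def inverse(M):
    """Gauss-Jordan inversion in exact rational arithmetic."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [a / pivot for a in aug[col]]
        # Eliminate the column from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def norm_1(M):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(M[0])
    return max(sum(abs(row[j]) for row in M) for j in range(n))

H = hilbert(4)
H_inv = inverse(H)
kappa_1 = norm_1(H) * norm_1(H_inv)

print(norm_1(H), norm_1(H_inv), kappa_1)
```

Because $H$ is symmetric, its maximum row sum equals its maximum column sum, which is why the ∞-norm values coincide with the 1-norm values.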
Error Analysis. Example. Hilbert Matrix.
Example
We consider the linear systems
\[ Hx = b. \]
For given $n$ we set $x_{ex} = (1, \dots, 1)^T \in \mathbb{R}^n$ and compute $b = H x_{ex}$. Then we compute the solution of the linear system $Hx = b$ using the LU-decomposition and compute the relative error between the exact solution $x_{ex}$ and the computed solution $x$.
 n    κ∞(H)            ‖x_ex − x‖∞ / ‖x_ex‖∞
 4    2.837500e+004    2.958744e-013
 5    9.436560e+005    5.129452e-012
 6    2.907028e+007    5.096734e-011
 7    9.851949e+008    2.214796e-008
 8    3.387279e+010    1.973904e-007
 9    1.099651e+012    4.215144e-005
10    3.535372e+013    5.382182e-004
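The experiment in the table can be re-created roughly as follows. This is a sketch under stated assumptions: it is written in plain Python with double-precision floats, and it uses a hand-written Gaussian elimination with partial pivoting as a stand-in for the LU-decomposition mentioned above. The individual digits will generally differ from the table, since they depend on the implementation and on the order of floating point operations; only the growth of the error with $n$ should be comparable.

```python
def hilbert(n):
    """Hilbert matrix in floating point, entries 1/(i + j - 1) for 1-based i, j."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def relative_error(n):
    """Relative infinity-norm error for the exact solution (1, ..., 1)^T."""
    H = hilbert(n)
    x_ex = [1.0] * n
    b = [sum(row) for row in H]  # b = H * x_ex
    x = solve(H, b)
    err = max(abs(xe - xi) for xe, xi in zip(x_ex, x))
    return err / max(abs(xe) for xe in x_ex)

errors = {n: relative_error(n) for n in range(4, 11)}
for n in sorted(errors):
    print(n, errors[n])
```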
Error Analysis.
- If we use finite precision arithmetic, then rounding causes errors in the input data. In m-digit floating point arithmetic it holds that
\[ \frac{|x - fl(x)|}{|x|} \le 0.5 \times 10^{-m+1}. \]
- Thus, if we solve the linear system in m-digit floating point arithmetic, then, as a rule of thumb, we may approximate the input errors due to rounding by
\[ \frac{\|\Delta A\|}{\|A\|} \approx 0.5 \times 10^{-m+1}, \qquad \frac{\|\Delta b\|}{\|b\|} \approx 0.5 \times 10^{-m+1}. \]
- If the condition number of $A$ is $\kappa(A) = 10^{\alpha}$, then by (9)
\[ \frac{\|\Delta x\|}{\|x\|} \le \frac{10^{\alpha}}{1 - 10^{\alpha - m + 1}} \left( 0.5 \times 10^{-m+1} + 0.5 \times 10^{-m+1} \right) \approx 10^{\alpha - m + 1}, \]
provided $10^{\alpha - m + 1} < 1$.
- Rule of thumb: If the linear system is solved in m-digit floating point arithmetic and if the condition number of $A$ is of the order $10^{\alpha}$, then only $m - \alpha - 1$ digits in the solution can be trusted.
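The rule of thumb is easy to apply mechanically. The sketch below wraps it in a small function; the name `trusted_digits`, the choice $\alpha = \lfloor \log_{10} \kappa \rfloor$, and the value $m = 16$ for IEEE double precision are assumptions made for illustration. Applied to the Hilbert matrix condition numbers reported earlier, it predicts about 11 trustworthy digits for $n = 4$ and about 2 for $n = 10$, which is in line with (and slightly more cautious than) the observed relative errors of roughly $3 \times 10^{-13}$ and $5 \times 10^{-4}$.

```python
import math

def trusted_digits(m, kappa):
    """Rule of thumb: m - alpha - 1 digits, where kappa is of order 10**alpha."""
    alpha = math.floor(math.log10(kappa))
    return max(m - alpha - 1, 0)  # never report a negative digit count

M_DOUBLE = 16  # assumed number of decimal digits carried by IEEE double precision

digits_n4 = trusted_digits(M_DOUBLE, 2.837500e+004)   # kappa_inf(H) for n = 4
digits_n10 = trusted_digits(M_DOUBLE, 3.535372e+013)  # kappa_inf(H) for n = 10

print(digits_n4, digits_n10)
```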
Summary.
- If the condition number of a matrix $A$ is large, then small errors in the data may lead to large errors in the solution.
- Rule of thumb: If the linear system is solved in m-digit floating point arithmetic and if the condition number of $A$ is of the order $10^{\alpha}$, then only $m - \alpha - 1$ digits in the solution can be trusted.