
Math 342 - Linear Algebra II Notes

1. Inner Products and Norms

One knows from a basic introduction to vectors in Rn (Math 254 at OSU) that the length of a vector x = (x1 x2 ... xn)T ∈ Rn, denoted ‖x‖, or |x|, is given by the formula

‖x‖ = √(x1² + x2² + ··· + xn²).

We will call this function ‖ − ‖ : Rn → [0,∞) the (standard) norm on Rn.

Recall the familiar dot product on Rn:

Let x = (x1 x2 ... xn)T ,y = (y1 y2 ... yn)T ∈ Rn. Then

x · y = ∑_{i=1}^{n} xiyi.

We will call the dot product on Rn the (standard) inner product on Rn, and we will use the fancier notation (x,y) for x · y. Formally, we can regard the inner product as a function (−,−) : Rn × Rn → R.

An easy and important consequence is that (x,x) = ‖x‖², equivalently, ‖x‖ = √(x,x).

We also have another expression for this inner product:

For all x,y ∈ Rn, (x,y) = yTx (same as xTy).

Now let’s explore the (standard) norm and the (standard) inner product on Cn:

Recall that for z ∈ C, z = a + bi, and |z|² = a² + b² = zz̄ (really just the R2 (standard) norm!)

Now let z = (z1 z2 ... zn)T ∈ Cn with zj = aj + bj i for j = 1, 2, ..., n.

It is natural to define the (standard) norm on Cn by

‖z‖ = √(∑_{j=1}^{n} |zj|²) = √(∑_{j=1}^{n} (aj² + bj²)) for z ∈ Cn.

Furthermore, with w = (w1 w2 ... wn)T ∈ Cn, we define the (standard) inner product on Cn

by

(z,w) = ∑_{i=1}^{n} zi w̄i.

Again, an easy and important consequence is that (z, z) = ‖z‖², equivalently, ‖z‖ = √(z, z).

Let A ∈ Mmn have entries in C. Let A∗ = (Ā)T, where Ā is determined by applying the conjugate operation entry-wise. This combination (conjugate transpose) is the Hermitian adjoint, or just the adjoint, of A.

For w, w∗ = (w̄)T = (w̄1 w̄2 ... w̄n) and thus

(z,w) = w∗z.
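As a quick numerical sanity check (a minimal sketch in NumPy; the particular vectors are arbitrary illustrative data, not from the notes), the sum formula for (z,w), the matrix expression w∗z, and the induced norm can be compared directly:

```python
import numpy as np

# Two illustrative vectors in C^3 (arbitrary sample data).
z = np.array([1 + 2j, 3 - 1j, 0 + 1j])
w = np.array([2 - 1j, 1 + 1j, 4 + 0j])

# (z, w) as a sum: z_i times the conjugate of w_i.
ip_sum = np.sum(z * np.conj(w))

# (z, w) as the matrix product w* z (conjugate transpose of w times z).
ip_matrix = np.conj(w) @ z

# The induced norm ||z|| = sqrt((z, z)).
norm_from_ip = np.sqrt(np.sum(np.abs(z) ** 2))

print(ip_sum, ip_matrix)                  # the same complex number
print(norm_from_ip, np.linalg.norm(z))    # the same real number
```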

Now let’s get more abstract. We will let F denote either R or C.

Let V be an arbitrary vector space over F. An inner product on V is a function

(−,−) : V × V → F,

that takes in two vectors u,v ∈ V and outputs a scalar (u,v) ∈ F, satisfying these four fundamental properties:

(1) Positivity (aka non-negativity): For all v ∈ V , (v,v) ∈ R and (v,v) ≥ 0.

(2) Definiteness (aka non-degeneracy): Let v ∈ V . Then (v,v) = 0 iff v = 0.

(3) Linearity: For all c ∈ F, for all u,v,w ∈ V , (u + cv,w) = (u,w) + c(v,w).

(4) Conjugate symmetry: For all u,v ∈ V , (u,v) is the complex conjugate of (v,u) (just symmetry if F = R).

If V is a vector space over F and (−,−) is an inner product on V then we call V an inner product space over F.

Examples:

(In all of these, if F = R then there is no need for the conjugation.)

(1) Let V = Fn. We have already covered the standard (aka Euclidean) inner product on V given by

For all u,v ∈ V, (u,v) = v∗u.

(ELFY) Show that this function does indeed satisfy the four fundamental properties of an inner product...

(2) Let V = Pn, polynomials of degree at most n over F. Let the interval I = [a, b]. Then for f(x), g(x) ∈ Pn define

(f(x), g(x))I = ∫_a^b f(x) ḡ(x) dx.

(ELFY) Show that this function does indeed satisfy the four fundamental properties of an inner product...

(3) Let V = Mm,n (over F). For A,B ∈ Mmn define (A,B) = trace(B∗A).

(ELFY) Show that this function does indeed satisfy the four fundamental properties of an inner product...

Note:

trace(B∗A) = ∑_{i=1}^{n} (B∗A)ii = ∑_{i=1}^{n} (b̄1i a1i + b̄2i a2i + ··· + b̄mi ami).
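The following minimal NumPy sketch (with arbitrary sample matrices, not taken from the notes) checks that trace(B∗A) agrees with the entry-wise sum above:

```python
import numpy as np

# Illustrative 3x2 complex matrices (arbitrary sample data).
A = np.array([[1 + 1j, 2 + 0j],
              [0 + 0j, 3 - 2j],
              [0 + 1j, 1 + 0j]])
B = np.array([[2 + 0j, 1 - 1j],
              [1 + 0j, 4 + 0j],
              [0 + 0j, 0 + 2j]])

# (A, B) = trace(B* A), with B* the conjugate transpose of B.
ip_trace = np.trace(B.conj().T @ A)

# The same number as an entry-wise sum of conj(b_ij) * a_ij.
ip_entrywise = np.sum(np.conj(B) * A)

print(ip_trace, ip_entrywise)   # equal
```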

Properties of inner products:

From here until otherwise indicated, let V be an inner product space (over F).

(ELFY) Show that for all c ∈ F, for all u,v,w ∈ V , (u,v + cw) = (u,v) + c̄(u,w).

Hint: Use conjugate symmetry, linearity, and then conjugate symmetry again!

Proposition: Let x ∈ V . Then,

x = 0 iff for all y ∈ V , (x,y) = 0.

Proof : The forward direction is trivial:

For all y ∈ V , (0,y) = (0y,y) = 0(y,y) = 0... (also (y,0) = (y, 0y) = 0(y,y) = 0).

Now assume x ∈ V satisfies for all y ∈ V , (x,y) = 0. Then in particular, (x,x) = 0. Using the definiteness property of an inner product we get x = 0 □

(ELFY) Use the proposition above to prove that for all x,y ∈ V , x = y iff for all z ∈ V ,(x, z) = (y, z).

Proposition: Let T, S ∈ L(V ). Suppose that for all x,y ∈ V , (Tx,y) = (Sx,y). Then T = S.

Proof : Let T, S ∈ L(V ).

(*) Assume for any x,y ∈ V that (Tx,y) = (Sx,y).

Assume T ≠ S. Then there exists x ∈ V such that Tx ≠ Sx. However, (Tx,y) = (Sx,y) for all y ∈ V by assumption (*). Using the exercise left for you on the previous page, Tx = Sx, which is a contradiction. So T = S □

An inner product on V induces a norm on V corresponding to the inner product, which is a function ‖ − ‖ : V → R given by

‖v‖ = √(v,v) for v ∈ V.

Basic norm property:

‖cv‖ = √(cv, cv) = √(c c̄ (v,v)) = √(|c|² (v,v)) = |c|‖v‖.

Also, we call two vectors u,v ∈ V orthogonal if (u,v) = 0 (by conjugate symmetry, (v,u) =0 would also hold).

(ELFY) if u,v are orthogonal, then ‖u + v‖2 = ‖u‖2 + ‖v‖2 (Pythagorean Theorem).

Orthogonal decomposition:

Let u,v ∈ V . Assume v ≠ 0. Then (v,v) ≠ 0. Let c ∈ F. Clearly,

u = cv + (u− cv).

Let’s find c such that v, (u− cv) are orthogonal:

(u − cv, v) = 0 → (u,v) − c(v,v) = 0 → (u,v) = c(v,v) → c = (u,v)/(v,v) = (u,v)/‖v‖².

We will call projv(u) = ((u,v)/‖v‖²) v the orthogonal projection of u onto v.

We will call u = projv(u) + (u − projv(u)) the orthogonal decomposition of u along v.
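A small numerical illustration (a sketch in NumPy over R for simplicity; the vectors are arbitrary sample data) of the orthogonal projection and the resulting decomposition:

```python
import numpy as np

# Illustrative vectors in R^3 (arbitrary sample data).
u = np.array([2.0, 1.0, 3.0])
v = np.array([1.0, -1.0, 2.0])

# proj_v(u) = ((u, v) / ||v||^2) v, and the leftover piece u - proj_v(u).
proj = (np.dot(u, v) / np.dot(v, v)) * v
rest = u - proj

print(np.dot(rest, v))                                          # ~0: rest is orthogonal to v
print(np.dot(u, u), np.dot(proj, proj) + np.dot(rest, rest))    # Pythagorean Theorem holds
```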

Consider that

(∗) ‖projv(u)‖ = (|(u,v)|/‖v‖²)·‖v‖ = |(u,v)|/‖v‖.

Notice that since (u− projv(u),v) = 0 it follows that

(u − projv(u), projv(u)) = (u − projv(u), ((u,v)/‖v‖²) v) = ((v,u)/‖v‖²)·(u − projv(u), v) = 0.

Thus if u = projv(u) + (u − projv(u)) is the orthogonal decomposition of u along v, then the vector terms satisfy the Pythagorean Theorem:

(∗∗) ‖u‖2 = ‖projv(u)‖2 + ‖u− projv(u)‖2.

Then (∗) and (∗∗) imply

‖u − projv(u)‖² = ‖u‖² − |(u,v)|²/‖v‖².

You could of course calculate this directly:

‖u − projv(u)‖² = (u − projv(u), u − projv(u))

= (u, u − projv(u)) − (projv(u), u − projv(u))

= (u,u) − (u, projv(u)) − (projv(u), u) + (projv(u), projv(u))

= ‖u‖² + |(u,v)|²/‖v‖² − (u, ((u,v)/‖v‖²) v) − (((u,v)/‖v‖²) v, u)

= ‖u‖² + |(u,v)|²/‖v‖² − ((v,u)/‖v‖²)(u,v) − ((u,v)/‖v‖²)(v,u)

= ‖u‖² + |(u,v)|²/‖v‖² − |(u,v)|²/‖v‖² − |(u,v)|²/‖v‖²

= ‖u‖² − |(u,v)|²/‖v‖².

In the last steps we used conjugate symmetry: (v,u) is the complex conjugate of (u,v), so (v,u)(u,v) = (u,v)(v,u) = |(u,v)|², which is a real number.

The following proposition is of great importance.

Proposition (Cauchy-Schwarz Inequality): If u,v ∈ V then

|(u,v)| ≤ ‖u‖‖v‖.

Proof : Let u,v ∈ V . If v = 0 then the inequality holds as both sides are clearly 0.

So we assume that v ≠ 0. Consider the orthogonal decomposition u = projv(u) + w where w = (u − projv(u)).

0 ≤ ‖w‖² = ‖u‖² − |(u,v)|²/‖v‖² →

|(u,v)|²/‖v‖² ≤ ‖u‖² →

|(u,v)|/‖v‖ ≤ ‖u‖ →

|(u,v)| ≤ ‖u‖‖v‖ □

Proposition (Triangle Inequality): If u,v ∈ V then

‖u + v‖ ≤ ‖u‖ + ‖v‖.

Proof : Let u,v ∈ V . Consider that for all z = a + bi ∈ C,

Re(a + bi) = a ≤ |a| = √(a²) ≤ √(a² + b²) = |a + bi|.

Then

‖u + v‖2 = (u + v,u + v)

= (u,u) + (v,v) + (u,v) + (v,u)

= ‖u‖2 + ‖v‖2 + 2Re[(u,v)]   (since (v,u) is the complex conjugate of (u,v))

≤ ‖u‖2 + ‖v‖2 + 2|(u,v)|

≤ ‖u‖2 + ‖v‖2 + 2‖u‖‖v‖

= (‖u‖+ ‖v‖)2 .

Hence, ‖u + v‖ ≤ ‖u‖+ ‖v‖ ¤

These are the so-called polarization identities:

Proposition: For all x,y ∈ V ,

(1) (x,y) = (1/4)(‖x + y‖² − ‖x − y‖²) if F = R.

(2) (x,y) = (1/4)(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²) if F = C.

Proof : Let x,y ∈ V . Assume F = R. Then conjugation does nothing and (x,y) = (y,x).

Consider that

(1/4)(‖x + y‖² − ‖x − y‖²) = (1/4)((x + y, x + y) − (x − y, x − y))

= (1/4)((x,x) + 2(x,y) + (y,y) − [(x,x) − 2(x,y) + (y,y)]) = (x,y).

Assume F = C. Then

z = (1/4)(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²)

= (1/4)((x + y, x + y) − (x − y, x − y) + i(x + iy, x + iy) − i(x − iy, x − iy)).

Let’s split z into two parts:

c1 = (1/4)((x + y, x + y) − (x − y, x − y)),

and

c2 = (1/4)(i(x + iy, x + iy) − i(x − iy, x − iy)).

Clearly z = c1 + c2.

c1 = (1/4)((x,x) + (x,y) + (y,x) + (y,y) − [(x,x) − (x,y) − (y,x) + (y,y)]) →

c1 = (1/4)(2(x,y) + 2(y,x)) = (1/2)((x,y) + (y,x)).

c2 = (i/4)((x,x) + (x, iy) + (iy,x) + (iy, iy) − [(x,x) − (x, iy) − (iy,x) + (iy, iy)]) →

c2 = (i/4)(2(x, iy) + 2(iy,x)) = (i/2)[(−i)(x,y) + i(y,x)] = (1/2)((x,y) − (y,x)).

Finally, z = c1 + c2 = (x,y) ¤

Proposition (Parallelogram Identity): For all x,y ∈ V ,

‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).

I leave the proof of this one to you...it’s pretty easy.

Now let V be a vector space over F, but we no longer assume that there is an inner product on V .

The function n : V → R is a norm on V if it satisfies the following properties:

(1) Homogeneity: For all v ∈ V and for all c ∈ F, n(cv) = |c|n(v).

(2) Triangle Inequality: For all u,v ∈ V , n(u + v) ≤ n(u) + n(v).

(3) Positivity (aka non-negativity): For all v ∈ V , n(v) ≥ 0.

(4) Definiteness (aka non-degeneracy): Let v ∈ V . Then n(v) = 0 iff v = 0.

Of course, the norm ‖ − ‖ : V → R induced from an inner product space V satisfies these properties:

We have already verified that (1) and (2) hold. Properties (3) and (4) are direct consequences of the positivity and definiteness of the inner product.

So all inner product spaces are normed spaces, via the induced norm ‖v‖ = √(v,v).

However, there are normed spaces where the norm does not correspond to an inner product.

When V is a normed vector space, with norm n, we will still use the notation n(v) = ‖v‖.

Let V = Rn or V = Cn. It turns out that for any p, 1 ≤ p < ∞, the following function ‖ − ‖p : V → R given by

‖x‖p = (∑_{i=1}^{n} |xi|^p)^{1/p},

is a norm called the p-norm. Of course the 2-norm (aka Euclidean norm) is the norm inducedby the standard inner product on V .

We can also define a norm called the ∞-norm:

‖x‖∞ = max{|xi| : i = 1, 2, ..., n}.
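A minimal NumPy sketch (arbitrary sample vector) comparing a hand-rolled p-norm with the library's np.linalg.norm, which accepts p (and np.inf) as its ord argument:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])   # arbitrary sample data

def p_norm(x, p):
    """(sum_i |x_i|^p)^(1/p) for 1 <= p < infinity."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(p_norm(x, 1), np.linalg.norm(x, 1))            # 1-norm
print(p_norm(x, 2), np.linalg.norm(x, 2))            # 2-norm (Euclidean)
print(p_norm(x, 3), np.linalg.norm(x, 3))            # a p-norm with p = 3
print(np.max(np.abs(x)), np.linalg.norm(x, np.inf))  # infinity-norm
```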

For ‖ − ‖p and ‖ − ‖∞ it is easy to check (ELFY) that they satisfy defining properties (1),(3), and (4) of a norm.

The triangle inequality is a different story. Of course we have already proven the triangle inequality for ‖ − ‖2.

The triangle inequality for ‖ − ‖p isn’t too hard to check (ELFY) when p = 1 and p = ∞.

For arbitrary values of p it is fairly involved to prove the triangle inequality, but it does hold,and is called Minkowski’s Inequality. We won’t do that here...

It turns out that ‖ − ‖p for p ≠ 2 cannot be induced by an inner product (these norms don't satisfy the parallelogram identity)!

The proof is beyond the scope of this course (because of the reverse implication), but thefollowing proposition characterizes norms induced from inner products:

Proposition: Let V be a normed space with a norm ‖ − ‖. The norm ‖ − ‖ is induced from an inner product (−,−) on V iff ‖ − ‖ satisfies the parallelogram identity, that is, for all x,y ∈ V ,

‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).

Suggested Homework: 1.1, 1.2, 1.3, 1.5, 1.7, and 1.9.

2. Orthogonal and Orthonormal Bases

Recall two vectors u,v in an inner product space V are orthogonal if (u,v) = 0. We will use the notation u⊥v.

A vector v ∈ V is orthogonal to a subspace U of V if for all u ∈ U , u⊥v. We use the notation v⊥U . Two subspaces U1, U2 are orthogonal subspaces if for all u1 ∈ U1, u1 is orthogonal to the subspace U2.

Furthermore, we can define for any subspace U of V , the orthogonal complement to U (in V ): U⊥ = {v ∈ V | v⊥U}.

Also, u is called a unit vector if (u,u) = 1 (and thus ‖u‖ = 1).

The vectors v1,v2, ...,vk are called an orthogonal system if they are pairwise orthogonal, and an orthonormal system if they are an orthogonal system of unit vectors.

Proposition: Any orthogonal system of non-zero vectors is linearly independent.

Proof : Let v1,v2, ...,vk be an orthogonal system of non-zero vectors. Suppose

∑_{i=1}^{k} ci vi = 0 for ci ∈ F, i = 1, 2, ..., k.

Then for any j = 1, 2, ..., k we get

(∑_{i=1}^{k} ci vi, vj) = ∑_{i=1}^{k} ci (vi,vj) = cj ‖vj‖² = 0.

For all j = 1, 2, ..., k, ‖vj‖² ≠ 0 as vj ≠ 0. Therefore, c1 = c2 = ··· = ck = 0, which completes the proof that the orthogonal system is linearly independent □

Proposition (General Pythagorean Theorem): Let v1,v2, ...,vk be an orthogonal system. Then for all c1, c2, ..., ck ∈ F,

‖∑_{i=1}^{k} ci vi‖² = ∑_{i=1}^{k} |ci|² ‖vi‖².

Proof : Let v1,v2, ...,vk be an orthogonal system. Let ci ∈ F for i = 1, 2, ..., k.

‖∑_{i=1}^{k} ci vi‖² = (∑_{i=1}^{k} ci vi, ∑_{j=1}^{k} cj vj) = ∑_{i=1}^{k} ∑_{j=1}^{k} ci c̄j (vi,vj) = ∑_{i=1}^{k} ci c̄i (vi,vi) = ∑_{i=1}^{k} |ci|² ‖vi‖² □

A basis B = {v1,v2, ...,vn} of V is an orthogonal basis (orthonormal basis) if B is an orthogonal system (orthonormal system).

Due to a previous proposition, if n = dim V , then any orthogonal (orthonormal) system of n non-zero vectors forms an orthogonal (orthonormal) basis.

Let B = {v1,v2, ...,vn} be an orthogonal basis of V . Suppose x ∈ V . Usually finding [x]B = (c1 c2 · · · cn)T involves solving a linear system, but it is easier when B is an orthogonal basis. For any j = 1, 2, ..., n,

(x,vj) = (∑_{i=1}^{n} ci vi, vj) = ∑_{i=1}^{n} ci (vi,vj) = cj (vj,vj) = cj ‖vj‖² → cj = (x,vj)/‖vj‖².

(This should look familiar! Recall that projv(u) = ((u,v)/‖v‖²) v.)

Furthermore, if B is an orthonormal basis, this result reduces to cj = (x,vj).

When B = {v1,v2, ...,vn} is an orthonormal basis, then for any v ∈ V , the formula

v = ∑_{i=1}^{n} (v,vi) vi,

is called an orthogonal Fourier decomposition.
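A short NumPy sketch (an orthonormal basis of R2 chosen purely for illustration) showing that the coordinates really do come from inner products, with no linear system to solve:

```python
import numpy as np

# An orthonormal basis of R^2: the standard basis rotated by 45 degrees.
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)

x = np.array([3.0, -2.0])        # an arbitrary vector to expand

# Fourier coefficients: c_j = (x, v_j).
c1, c2 = np.dot(x, v1), np.dot(x, v2)

print(c1, c2)
print(np.allclose(c1 * v1 + c2 * v2, x))   # True: x = (x, v1) v1 + (x, v2) v2
```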

Suggested Homework: 2.2, 2.3, 2.5, 2.6, 2.8, and 2.9.

3. Orthogonalization and Gram-Schmidt

Let W be a subspace of V . For each v ∈ V , w ∈ W is called an orthogonal projection of v onto W if (v − w)⊥W .

Proposition: Let W be a subspace of V and let w be an orthogonal projection of v onto W . Then

(1) For all x ∈ W , ‖v −w‖ ≤ ‖v − x‖ (w minimizes distance from v to W ).

(2) If x ∈ W and ‖v −w‖ = ‖v − x‖ then x = w (orthogonal projections are unique).

Proof : Let W be a subspace of V and let w ∈ W be an orthogonal projection of v onto W . Let x ∈ W . Set y = w − x ∈ W . Then

v − x = (v −w) + y.

Since (v −w)⊥y as y ∈ W , by the Pythagorean Theorem we get

‖v − x‖2 = ‖v −w‖2 + ‖y‖2 → ‖v − x‖ ≥ ‖v −w‖,

and ‖v − x‖ = ‖v −w‖ iff y = 0. Hence ‖v − x‖ = ‖v −w‖ iff x = w ¤

(Now we can say the orthogonal projection of v onto W .)

Let W be a subspace of V with an orthogonal basis B = {v1,v2, ...,vk}. Let v ∈ V and PWv be the orthogonal projection of v onto W .

Let w = ∑_{i=1}^{k} ((v,vi)/‖vi‖²) vi ∈ W.

Then v − w = v − ∑_{i=1}^{k} ((v,vi)/‖vi‖²) vi.

Let x ∈ W . Then there exist scalars c1, c2, ..., ck such that x = ∑_{i=1}^{k} ci vi.

Then

(v − w, x) = (v − ∑_{i=1}^{k} ((v,vi)/‖vi‖²) vi, ∑_{j=1}^{k} cj vj)

= (v, ∑_{i=1}^{k} ci vi) − (∑_{i=1}^{k} ((v,vi)/‖vi‖²) vi, ∑_{j=1}^{k} cj vj)

= ∑_{i=1}^{k} c̄i (v,vi) − ∑_{i=1}^{k} ∑_{j=1}^{k} ((v,vi)/‖vi‖²) c̄j (vi,vj)

= ∑_{i=1}^{k} c̄i (v,vi) − ∑_{i=1}^{k} ((v,vi)/‖vi‖²) c̄i (vi,vi)

= ∑_{i=1}^{k} c̄i (v,vi) − ∑_{i=1}^{k} ((v,vi)/‖vi‖²) c̄i ‖vi‖²

= ∑_{i=1}^{k} c̄i (v,vi) − ∑_{i=1}^{k} c̄i (v,vi)

= 0.

This shows that w ∈ W satisfies (v − w)⊥W , and so, w is the orthogonal projection of v onto W . Hence we get the formula:

PWv = ∑_{i=1}^{k} ((v,vi)/‖vi‖²) vi.

Claim: Let W be a subspace of V . PW ∈ L(V ).

Let u,v ∈ V and c ∈ F. Let w ∈ W . Then PWu + cPWv ∈ W and

([u + cv] − [PWu + cPWv],w) = ([u − PWu] + c[v − PWv],w) = (u − PWu,w) + c (v − PWv,w) = 0.

Thus PW (u + cv) = PWu + cPWv showing that PW is a linear operator on V .

The Gram-Schmidt algorithm:

Input: A linearly independent set of vectors {x1,x2, ...,xk} spanning a subspace W of V .

Output: An orthogonal system {v1,v2, ...,vk} such that the span is W .

Here are the steps:

(1) Set v1 = cx1 for any non-zero c ∈ F (usually 1).

Set W1 = span{x1} = span{v1}.

(2) Set v2 = c(x2 − PW1x2) = c(x2 − ((x2,v1)/‖v1‖²) v1) for any non-zero c ∈ F.

Set W2 = span{v1,v2}. Note that W2 = span{x1,x2}.

(3) Set v3 = c(x3 − PW2x3) = c(x3 − ((x3,v1)/‖v1‖²) v1 − ((x3,v2)/‖v2‖²) v2) for any non-zero c ∈ F.

Set W3 = span{v1,v2,v3}. Note that W3 = span{x1,x2,x3}.

(4) Continue in this manner until vk is determined (it’s not necessary to determine Wk).
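Here is a minimal Python sketch of the procedure just described (real case, with c = 1 at every step, so the output is orthogonal but not normalized). The function name and the sample basis are illustrative choices, not part of the notes; the basis is the one used in the worked example below, which then rescales v2 and v3 to clear denominators.

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalize linearly independent vectors x_1, ..., x_k (c = 1 at each step).

    Returns v_1, ..., v_k with span{v_1,...,v_i} = span{x_1,...,x_i} for every i.
    """
    vs = []
    for x in xs:
        v = np.array(x, dtype=float)
        for w in vs:
            # Subtract the projection of x onto each previously found vector.
            v -= (np.dot(x, w) / np.dot(w, w)) * w
        vs.append(v)
    return vs

basis = [np.array([1, 2, 3]), np.array([1, -1, 0]), np.array([2, 1, -1])]
for v in gram_schmidt(basis):
    print(v)   # (1, 2, 3), (15/14, -6/7, 3/14), (4/3, 4/3, -4/3)
```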

Suppose V = R3.

Consider the basis B = { (1, 2, 3)T, (1, −1, 0)T, (2, 1, −1)T }.

Let’s apply the Gram-Schmidt procedure to turn this into an orthogonal basis:

(1) Set v1 = (1, 2, 3)T and W1 = span{(1, 2, 3)T}.

(2) Consider that

(1, −1, 0)T − PW1 (1, −1, 0)T = (1, −1, 0)T − (1/14)·((1, −1, 0)T, (1, 2, 3)T)·(1, 2, 3)T

= (1, −1, 0)T + (1/14)·(1, 2, 3)T = (15/14, −6/7, 3/14)T.

So for convenience set v2 = (5, −4, 1)T, and set W2 = span{(1, 2, 3)T, (5, −4, 1)T}.

(3) Consider that

(2, 1, −1)T − PW2 (2, 1, −1)T

= (2, 1, −1)T − (1/14)·((2, 1, −1)T, (1, 2, 3)T)·(1, 2, 3)T − (1/42)·((2, 1, −1)T, (5, −4, 1)T)·(5, −4, 1)T

= (2, 1, −1)T − (1/14)·(1, 2, 3)T − (5/42)·(5, −4, 1)T = (4/3, 4/3, −4/3)T.

So for convenience set v3 = (1, 1, −1)T.

In conclusion, B′ = { (1, 2, 3)T, (5, −4, 1)T, (1, 1, −1)T } is an orthogonal basis such that the first i vectors of B and B′ have the same spans for i = 1, 2, 3.
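A quick NumPy check of the result (illustrative verification only): the three vectors of B′ are pairwise orthogonal, and x2 = (1, −1, 0)T indeed lies in span{v1, v2}.

```python
import numpy as np

v1, v2, v3 = np.array([1, 2, 3]), np.array([5, -4, 1]), np.array([1, 1, -1])

# Pairwise inner products vanish, so B' is an orthogonal system (hence a basis of R^3).
print(np.dot(v1, v2), np.dot(v1, v3), np.dot(v2, v3))   # 0 0 0

# x2 is a combination of v1 and v2, so span{v1, v2} = span{x1, x2}.
x2 = np.array([1, -1, 0])
coeffs, *_ = np.linalg.lstsq(np.column_stack([v1, v2]), x2, rcond=None)
print(coeffs, np.allclose(np.column_stack([v1, v2]) @ coeffs, x2))
```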

Recall if W is a subspace of V , then W⊥ = {v ∈ V |v⊥W}.

Claim: Let W be a subspace of V . W⊥ (orthogonal complement) is also a subspace of V .

This is pretty easy to justify:

First notice W⊥ is not empty as 0⊥W .

Let u,v ∈ W⊥ and c ∈ F. Let w ∈ W . Then

(u + cv,w) = (u,w) + c(v,w) = 0.

Thus u + cv ∈ W⊥ proving that W⊥ is a subspace of V .

More generally, if U,W are subspaces of V then U + W = {u + w | u ∈ U,w ∈ W} is called the subspace sum.

(ELFY) U + W is a subspace of V for any subspaces U,W .

A subspace sum U + W is called a direct sum, denoted U ⊕W , if U ∩W = {0}.

Proposition: Let U,W be subspaces of V . V = U ⊕ W iff each vector v ∈ V has a unique representation v = u + w for u ∈ U,w ∈ W .

Proof : Let U,W be subspaces of V . Assume V = U ⊕W .

Let v ∈ V . Then there exists u ∈ U and w ∈ W such that v = u+w. Suppose v = u1 +w1

where u1 ∈ U and w1 ∈ W .

Then v = u + w = u1 + w1 implies u − u1 = w1 − w.

Then u − u1 = w1 − w ∈ U ∩ W = {0}. Hence u = u1 and w = w1, so we have uniqueness.

Now assume for all v ∈ V there exist unique vectors u ∈ U,w ∈ W such that v = u + w. We are assuming each vector in V has a unique decomposition into a vector from U plus a vector from W .

Clearly V = U + W . The question is whether it's a direct sum. To yield a contradiction, suppose U ∩ W ≠ {0}. Let y ∈ U ∩ W such that y ≠ 0.

Let v ∈ V . Since V = U + W there exists u ∈ U,w ∈ W such that v = u + w.

Set u1 = u + y ∈ U . Since y ≠ 0, u1 ≠ u.

Set w1 = w − y ∈ W . Since y ≠ 0, w1 ≠ w.

Then u1 + w1 = u + w = v, which contradicts the uniqueness of the decomposition. This completes the proof □

Let v ∈ V and W be a subspace of V . Then let v1 = PWv ∈ W and v2 = v − PWv ∈ W⊥. We have already shown that PWv ∈ W is the unique vector for which v − PWv ∈ W⊥. Therefore we have a unique decomposition of v into a vector in W plus a vector in W⊥. Hence V = W ⊕ W⊥.

(ELFY) For any subspace W of V , we get (W⊥)⊥ = W .

Suggested Homework: 3.2, 3.3, 3.4, 3.5, 3.7, and 3.8.

4. Least Squares Solution

Let A ∈ Mmn and b ∈ Fm. How do we determine an x ∈ Fn such that Ax is as close as possible to b?

If we can find x such that ‖Ax − b‖ = 0 then Ax = b is consistent and we say x is an exact solution.

In the case where we cannot find x such that ‖Ax − b‖ = 0, we call an x which minimizes ‖Ax − b‖ a least squares solution to Ax = b (which is inconsistent).

Why least squares? Well, minimizing ‖Ax− b‖ is equivalent to minimizing

‖Ax − b‖² = ∑_{i=1}^{m} |(Ax)i − bi|² = ∑_{i=1}^{m} |∑_{j=1}^{n} Aij xj − bi|².

Let W = range A = {Ax|x ∈ Fn}. W is a subspace of Fm.

In particular, the minimum value of ‖Ax − b‖ is the distance from b to W , which is attained by an x which solves the equation (*) Ax = PWb.

Furthermore, if v1,v2, ...,vk is an orthogonal basis for W = range A then we get an explicit formula for PWb:

PWb = ∑_{i=1}^{k} ((b,vi)/‖vi‖²) vi.

So, if we have a basis for W (which we can create by taking columns of A that are pivot columns) then with Gram-Schmidt we can create an orthogonal basis and compute PWb and then solve (*). That is actually a lot of work... maybe there is an easier method?

Here is a nice approach:

Let A = [a1 a2 · · · an].

Ax = PWb iff (b− Ax)⊥W iff (b− Ax)⊥ai for i = 1, 2, ..., n.

Hence Ax = PWb iff ((b− Ax), ai) = ai∗(b− Ax) = 0 for i = 1, 2, ..., n.

Here is the clever part: (A∗(b− Ax))i = ai∗(b− Ax) = 0 for i = 1, 2, ..., n.

So Ax = PWb iff A∗(b− Ax) = 0 iff A∗Ax = A∗b.

So we get the following result:

So the easiest way to find a least squares solution to Ax = b is to solve the so-called normal equation A∗Ax = A∗b.

Also, for a least squares solution x, PWb = Ax.

In particular, this means that a least squares solution is unique iff A∗A ∈ Mnn is invertible.

In this case PWb = A(A∗A)−1A∗b, which would hold for all b ∈ Fm, so then the matrix of PW is A(A∗A)−1A∗.
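A minimal NumPy sketch (random illustrative data, not from the notes) comparing the normal-equation approach with the library routine np.linalg.lstsq, and checking that the residual b − Ax is orthogonal to range A:

```python
import numpy as np

rng = np.random.default_rng(0)

# An overdetermined real system: 8 equations, 3 unknowns (random sample data).
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)

# Least squares solution from the normal equation A*A x = A*b (A* = A^T here).
x = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer from the built-in least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))             # True

# Ax = P_W b, so the residual is orthogonal to every column of A.
print(np.allclose(A.T @ (b - A @ x), 0))   # True (up to round-off)
```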

Proposition: Let A ∈ Mmn. Ker A = Ker(A∗A). In other words, A∗A is invertible iff rank A = n.

Proof : Let A ∈ Mmn.

Suppose x ∈ Ker A. Then Ax = 0 and thus (A∗A)x = A∗(Ax) = A∗0 = 0. Hence x ∈ Ker(A∗A) and Ker A ⊆ Ker(A∗A).

Suppose x ∈ Ker(A∗A). Then (A∗A)x = A∗Ax = 0.

Thus ‖Ax‖2 = (Ax, Ax) = (Ax)∗Ax = (x∗A∗)Ax = x∗(A∗Ax) = x∗0 = 0. This implies that Ax = 0. So x ∈ Ker A and Ker(A∗A) ⊆ Ker A.

This completes the proof ¤

Line of best fit:

A classic application is finding a line y = mx+b to best fit some given data {(x1, y1), (x2, y2), ..., (xk, yk)}.

The best solution minimizes the following sum of squared differences:

∑_{i=1}^{k} |m xi + b − yi|².

Equivalently, we are looking for a least squares solution to

[ x1 1 ]            [ y1 ]
[ x2 1 ]  [ m ]  =  [ y2 ]
[ ...  ]  [ b ]     [ ... ]
[ xk 1 ]            [ yk ]

As long as not all x-coordinates are the same, a least squares solution is unique.

EXAMPLE:

Suppose we have the following data points in the xy-plane: {(1, 2), (3, 5), (4, 5), (6, 8), (6, 9), (7, 10)}. Find an equation for the line of best fit.

So we are looking for the least squares solution to

[ 1 1 ]            [ 2  ]
[ 3 1 ]            [ 5  ]
[ 4 1 ]  [ m ]  =  [ 5  ]
[ 6 1 ]  [ b ]     [ 8  ]
[ 6 1 ]            [ 9  ]
[ 7 1 ]            [ 10 ]

First we create the normal equation: left-multiplying both sides by the transpose of the coefficient matrix,

[ 1 3 4 6 6 7 ]
[ 1 1 1 1 1 1 ],

gives

[ 147 27 ] [ m ]    [ 209 ]
[ 27   6 ] [ b ]  = [ 39  ]

→

[ m ]             [  6  −27 ] [ 209 ]    [ 67/51 ]
[ b ]  =  (1/153) [ −27 147 ] [ 39  ]  = [ 30/51 ].

Hence the line of best fit is

y = (67/51)x + 30/51.
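The arithmetic above can be double-checked numerically (a small NumPy sketch of this specific example):

```python
import numpy as np

xs = np.array([1, 3, 4, 6, 6, 7], dtype=float)
ys = np.array([2, 5, 5, 8, 9, 10], dtype=float)

# Coefficient matrix with rows (x_i, 1); unknowns (m, b).
A = np.column_stack([xs, np.ones_like(xs)])

# Solve the normal equation A^T A (m, b)^T = A^T y.
m, b = np.linalg.solve(A.T @ A, A.T @ ys)

print(m, b)              # approximately 1.3137 and 0.5882
print(67 / 51, 30 / 51)  # the exact values found above
```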

General approach to curve-fitting or surface-fitting:

(1) Find equations your data should fit if there were an exact solution. (The equations must be linear with respect to the parameters.)

(2) Write these equations as a linear system in a matrix form (inconsistent unless there is an exact solution).

(3) Find the least squares solution (left-multiply both sides by the conjugate transpose,aka adjoint, of the coefficient matrix and solve).

Suggested Homework: 4.2, 4.3, and 4.4.

5. Adjoint of a Linear Transformation

Let V = Fn. Let A ∈ Mmn with entries in F (R or C). Recall the adjoint of A is A∗ = (Ā)T, the conjugate transpose.

Proposition: Let A ∈ Mmn. For all x ∈ Fn and y ∈ Fm,

(Ax,y) = (x, A∗y).

Proof : Let A ∈ Mmn, x ∈ Fn, and y ∈ Fm. Then

(Ax,y) = y∗(Ax) = (y∗A)x = (A∗y)∗x = (x, A∗y) □
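A numerical check of the identity (a NumPy sketch with random illustrative data; the helper inner below just encodes the convention (u, v) = v∗u used in these notes):

```python
import numpy as np

rng = np.random.default_rng(1)

def inner(u, v):
    """The standard inner product (u, v) = v* u; np.vdot conjugates its first argument."""
    return np.vdot(v, u)

# An illustrative complex 3x2 matrix and compatible vectors (random sample data).
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
x = rng.normal(size=2) + 1j * rng.normal(size=2)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

A_star = A.conj().T   # the adjoint (conjugate transpose)

print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))   # True
```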

This proposition gives rise to the notion of an adjoint operator. Now let V, W be (finite dimensional) inner product spaces and T ∈ L(V, W ). S ∈ L(W,V ) is called an adjoint operator if (Tx,y) = (x, Sy) for all x ∈ V and y ∈ W . We will use the notation T ∗ for S.

Does it exist? Yes. Let T ∈ L(V, W ). By having B,C be orthonormal bases for V, W respectively, define T ∗ ∈ L(W,V ) by having [T ∗]BC = ([T ]CB)∗.

Let’s justify that it is indeed adjoint:

Let x ∈ V and y ∈ W . Let B = {v1,v2, ...,vn} and C = {w1,w2, ...,wm} be orthonormal bases for V,W respectively. Then Tvi = c1iw1 + c2iw2 + · · · + cmiwm and y = d1w1 + d2w2 + · · · + dmwm, where cij, dk are scalars for i = 1, 2, ...,m, j = 1, 2, ..., n, and k = 1, 2, ..., m.

Then (Tvi,y) = ∑_{k=1}^{m} cki d̄k = ([T ]CB[vi]B, [y]C) = ([vi]B, ([T ]CB)∗ [y]C) for i = 1, 2, ..., n.

By linearity, we get (Tx,y) = ([T ]CB[x]B, [y]C) = ([x]B, ([T ]CB)∗ [y]C).

Also T ∗wi = a1iv1 + a2iv2 + · · · + anivn and x = b1v1 + b2v2 + · · · + bnvn, where aij, bk are scalars for i = 1, 2, ..., n, j = 1, 2, ...,m, and k = 1, 2, ..., n.

Then (x, T ∗wi) = ∑_{k=1}^{n} āki bk = ([x]B, [T ∗]BC[wi]C) = ([x]B, ([T ]CB)∗ [wi]C) for i = 1, 2, ...,m.

By linearity, we get (x, T ∗y) = ([x]B, ([T ]CB)∗ [y]C) = (Tx,y).

So T ∗ is indeed an adjoint operator to T . Is it unique? Suppose S is another adjoint operator. Then for all x ∈ V and y ∈ W , (Tx,y) = (x, T ∗y) = (x, Sy) implies that for all y ∈ W , T ∗y = Sy. Thus S = T ∗. So the adjoint operator is unique.

Proposition: Let T ∈ L(V, W ) where V, W are inner product spaces. Then

(1) Ker T ∗ = (range T )⊥.

(2) Ker T = (range T ∗)⊥.

(3) range T = (Ker T ∗)⊥.

(4) range T ∗ = (Ker T )⊥.

Proof : First recall that for any subspace U , (U⊥)⊥ = U .

Thus (1) and (3) are equivalent as well as (2) and (4).

It turns out for any linear transformation T , (T ∗)∗ = T (ELFY - Use conjugate symmetry!).

Thus (1) and (2) are equivalent as well as (3) and (4).

So we can just prove (1). It’s pretty straightforward:

y ∈ (range T )⊥ iff (Tx,y) = (x, T ∗y) = 0 for all x ∈ V . Also, (x, T ∗y) = 0 for all x ∈ V iff T ∗y = 0, which means y ∈ Ker T ∗.

This completes the proof ¤

Suggested Homework: 5.2. (Hint: If PW is the orthogonal projection onto a subspace W of an inner product space V then IV − PW is the orthogonal projection onto W⊥.)

6. Isometries and Unitary Operators

Let V and W be inner product spaces over F. T ∈ L(V,W ) is an isometry if for all x ∈ V ,‖Tx‖ = ‖x‖.

Proposition: T ∈ L(V,W ) is an isometry iff for all x,y ∈ V ,

(Tx, Ty) = (x,y).

Proof : Let’s start with the reverse direction. Suppose for all x,y ∈ V ,

(Tx, Ty) = (x,y).

Then for all x ∈ V , ‖Tx‖ = √(Tx, Tx) = √(x,x) = ‖x‖. So T is an isometry.

Now assume T is an isometry. We split this direction into two cases, F = R and F = C:

Assume F = R. Let x,y ∈ V . Then by the polarization identity,

(Tx, Ty) = (1/4)(‖Tx + Ty‖² − ‖Tx − Ty‖²) = (1/4)(‖T (x + y)‖² − ‖T (x − y)‖²)

= (1/4)(‖x + y‖² − ‖x − y‖²) = (x,y).

Assume F = C. Let x,y ∈ V . Then by the polarization identity,

(Tx, Ty) = (1/4)(‖Tx + Ty‖² − ‖Tx − Ty‖² + i‖Tx + iTy‖² − i‖Tx − iTy‖²)

= (1/4)(‖T (x + y)‖² − ‖T (x − y)‖² + i‖T (x + iy)‖² − i‖T (x − iy)‖²)

= (1/4)(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²) = (x,y) □

So an isometry preserves the inner product.

Proposition: T ∈ L(V,W ) is an isometry iff T ∗T = IV .

Proof : Let T ∈ L(V, W ) be an isometry. Let y ∈ V . Then for all x ∈ V , (x, T ∗Ty) =(Tx, Ty) = (x,y). Hence, T ∗Ty = y for all y ∈ V . Thus T ∗T = IV .

Now assume T ∗T = IV . Let x,y ∈ V . Then (Tx, Ty) = (x, T ∗Ty) = (x, IV y) = (x,y).Thus T is an isometry ¤

The proposition above guarantees an isometry is left-invertible, but not necessarily invertible.An isometry T ∈ L(V, W ) is a unitary operator if it is invertible.

A matrix A ∈ Mmn is an isometry if A∗A = In. Also, a square matrix U ∈ Mnn is unitary if U∗U = In.

(ELFY) The columns of a matrix A ∈ Mnn form an orthonormal basis iff A is unitary.

If F = R then a unitary matrix is also called orthogonal.

Basic properties of unitary matrices: Let U be a unitary matrix.

(1) U−1 = U∗ is also unitary.

(2) If v1,v2, ...,vn is an orthonormal basis of Fn then Uv1, Uv2, ..., Uvn is an orthonormalbasis of Fn.
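A small NumPy illustration (the particular 2×2 matrix is an arbitrary example of a unitary matrix, not from the notes) of the definition and of properties (1) and (2):

```python
import numpy as np

# A 2x2 complex unitary matrix; its columns form an orthonormal basis of C^2.
U = (1 / np.sqrt(2)) * np.array([[1, 1j],
                                 [1j, 1]])

print(np.allclose(U.conj().T @ U, np.eye(2)))   # U*U = I, so U is unitary
print(np.allclose(U @ U.conj().T, np.eye(2)))   # U^{-1} = U*, as in property (1)

# Property (2): the images of an orthonormal basis are again orthonormal.
e1, e2 = np.eye(2)
u1, u2 = U @ e1, U @ e2
print(np.isclose(np.vdot(u1, u2), 0), np.isclose(np.linalg.norm(u1), 1.0))
```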

Suppose V, W have the same dimension. Let v1,v2, ...,vn be an orthonormal basis of V and w1,w2, ...,wn be an orthonormal basis of W . Define T ∈ L(V, W ) by Tvi = wi for i = 1, 2, ..., n. Then T is a unitary operator:

Let x ∈ V . There exist c1, c2, ..., cn ∈ F such that x = c1v1 + c2v2 + · · · + cnvn and ‖x‖2 = |c1|2 + |c2|2 + · · · + |cn|2. Then

‖Tx‖² = ‖T (∑_{i=1}^{n} ci vi)‖² = ‖∑_{i=1}^{n} ci Tvi‖² = ‖∑_{i=1}^{n} ci wi‖² = ∑_{i=1}^{n} |ci|² = ‖x‖².

Proposition: Let U ∈ Mnn be a unitary matrix.

(1) |det U | = 1. (For an orthogonal matrix U , det U = ±1.)

(2) If λ is an eigenvalue of U then |λ| = 1.

Proof : Let U ∈ Mnn be a unitary matrix.

Since det U∗ is the complex conjugate of det U , it follows that

1 = det In = det(U∗U) = (det U∗)(det U) = |det U |2.

Thus |det U | = 1, which proves (1).

Let λ be an eigenvalue of U . Then there is a non-zero x ∈ Fn such that Ux = λx.

Then ‖x‖ = ‖Ux‖ = ‖λx‖ = |λ|‖x‖ implies |λ| = 1, which proves (2) ¤

Two operators S, T ∈ L(V ) (or matrices S, T ∈ Mnn) are called unitarily equivalent if there exists a unitary operator U ∈ L(V ) (or matrix U ∈ Mnn) such that S = UTU∗. In particular, since U−1 = U∗, the operators/matrices would also be similar. However, not all similar operators/matrices are unitarily equivalent.

Proposition: A ∈ Mnn is unitarily equivalent to a diagonal matrix D iff A has an orthogonal basis of eigenvectors.

Proof : Let A ∈ Mnn.

Assume A is unitarily equivalent to a diagonal matrix D. Then there is a unitary matrix U such that A = UDU∗. Since U∗U = In, it is easy to check that the vectors Uei for i = 1, 2, ..., n are eigenvectors of A (ELFY). Finally, since the vectors e1, e2, ..., en form an orthogonal (orthonormal) basis, so do the eigenvectors Ue1, Ue2, ..., Uen since U is unitary.

Now assume A has an orthogonal basis of eigenvectors v1,v2, ...,vn. Let ui = (1/‖vi‖) vi for i = 1, 2, ..., n. Then B = {u1,u2, ...,un} is an orthonormal basis of eigenvectors of A. Let D = [A]BB and U be the matrix whose columns are u1,u2, ...,un. Clearly D is diagonal and U is unitary (columns are an orthonormal basis).

In particular, U = [In]SB, where S is the standard basis. Hence U∗ = U−1 = [In]BS and A = [A]SS = [In]SB[A]BB[In]BS = UDU∗. So A is unitarily equivalent to D □
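As a concrete illustration (a NumPy sketch; the matrix is arbitrary sample data, and the fact that a real symmetric matrix admits an orthonormal basis of eigenvectors is assumed here rather than proven in these notes), np.linalg.eigh returns exactly the U and D of the proof:

```python
import numpy as np

# A real symmetric matrix (illustrative sample data).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigenvalues, U = np.linalg.eigh(A)   # columns of U: orthonormal eigenvectors of A
D = np.diag(eigenvalues)

print(np.allclose(U.conj().T @ U, np.eye(3)))   # U is unitary (here: orthogonal)
print(np.allclose(U @ D @ U.conj().T, A))       # A = U D U*, so A is unitarily equivalent to D
```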

Suggested Homework: 6.1, 6.4, 6.6, and 6.8.

7. Rigid Motions in Rn

A rigid motion in an inner product space V is a transformation f : V → V (not necessarily linear) that preserves distance, that is, for all x,y ∈ V , ‖f(x) − f(y)‖ = ‖x − y‖.

Clearly every unitary operator T ∈ L(V ) is a rigid motion.

A nice example of a rigid motion that is not a linear operator is translation:

Let y ∈ V be non-zero. Let fy : V → V be given by fy(x) = x + y. This operator is not linear since fy(0) = y ≠ 0.

Proposition: Let f be a rigid motion in a real inner product space V . Let T : V → V be given by T (x) = f(x) − f(0). Then for all x,y ∈ V ,

(1) ‖T (x)‖ = ‖x‖.

(2) ‖T (x) − T (y)‖ = ‖x − y‖.

(3) (T (x), T (y)) = (x,y).

Proof : Let f be a rigid motion in a real inner product space V . Let T : V → V be given by T (x) = f(x) − f(0). Let x,y ∈ V .

‖T (x)‖ = ‖f(x)− f(0)‖ = ‖x− 0‖ = ‖x‖.

‖T (x)− T (y)‖ = ‖(f(x)− f(0))− (f(y)− f(0))‖ = ‖f(x)− f(y)‖ = ‖x− y‖.

Since V is a real inner product space, it follows that

‖x−y‖2 = ‖x‖2−2(x,y)+‖y‖2 and ‖T (x)−T (y)‖2 = ‖T (x)‖2−2(T (x), T (y))+‖T (y)‖2.

By using (1) and (2) in the equations above, (T (x), T (y)) = (x,y) ¤

The following proposition states that every rigid motion in a real inner product space is a composition of a translation and an orthogonal transformation.

Proposition: Let f be a rigid motion in a real inner product space V . Let T : V → V be given by T (x) = f(x) − f(0). T is an orthogonal transformation (unitary operator in a real inner product space).

Proof : Let x,y ∈ V and c ∈ R. Consider that

‖T (x + cy)− (T (x) + cT (y))‖2 = ‖(T (x + cy)− T (x))− cT (y)‖2 =

= ‖T (x + cy)− T (x)‖2 − 2c(T (x + cy)− T (x), T (y)) + c2‖T (y)‖2 =

= ‖x + cy − x‖2 − 2c(T (x + cy), T (y)) + 2c(T (x), T (y)) + c2‖y‖2 =

= c2‖y‖2 − 2c(x + cy,y) + 2c(x,y) + c2‖y‖2 =

= 2c2‖y‖2 − 2c(x,y)− 2c(cy,y) + 2c(x,y) = 2c2‖y‖2 − 2c2(y,y) = 0.

So T (x + cy) − (T (x) + cT (y)) = 0, which means T (x + cy) = T (x) + cT (y). Therefore T is linear, and thus by the previous proposition, T is an isometry. However, since T is an isometry from V to itself, it follows that T is invertible and thus unitary, which in a real inner product space is called an orthogonal transformation □

Note: We will not cover section 8 of this chapter.

