
Chapter 3

Linear equations

3.1 Introduction

We will study linear systems of the form

\[ y'(t) = A(t)y(t) + B(t) \tag{3.1.1} \]

where $B(t), y(t) \in \mathbb{R}^n$ and $A$ is $n \times n$. Recall, for instance, that we can always rewrite an $n$th order linear equation

\[ z^{(n)} + a_1(t)z^{(n-1)}(t) + \cdots + a_n(t)z(t) = b(t) \tag{3.1.2} \]

in the form of (3.1.1). Indeed, if we define

\[ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} z \\ z^{(1)} \\ \vdots \\ z^{(n-1)} \end{bmatrix}, \]

then

\[ y'(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -a_n(t) & -a_{n-1}(t) & \cdots & & -a_1(t) \end{bmatrix} y + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ b(t) \end{bmatrix} = A(t)y + B(t). \]


Recall that the matrix norm is defined by

\[ \|A\| = \sup_{|y| = 1} |Ay| = \sup_{y \neq 0} \frac{|Ay|}{|y|}. \]

Here we use the sup norm, $|y| = |y|_\infty = \max_i |y_i|$, in which case

\[ \|A\| = \max_i \sum_k |a_{ik}|. \]
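The row-sum formula is easy to check numerically. Here is a minimal sketch (Python with numpy; the matrix is an arbitrary illustrative choice, not from the notes):

```python
import numpy as np

# With the sup norm on vectors, the induced matrix norm is the
# maximum absolute row sum.
A = np.array([[1.0, -2.0, 3.0],
              [0.5,  0.0, 1.5],
              [4.0,  1.0, -1.0]])

max_row_sum = np.abs(A).sum(axis=1).max()
print(max_row_sum)                       # 6.0
print(np.linalg.norm(A, ord=np.inf))     # numpy's induced inf-norm: same value
```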

Theorem 3.1.1. If $A(t)$, $B(t)$ are continuous on $[a, b]$ and $t_0 \in (a, b)$, then the IVP

\[ y' = A(t)y + B(t), \qquad y(t_0) = y_0 \]

has a unique solution that exists on $(a, b)$.

Proof. In the notation of the Fundamental Existence Theorem,

\[ f(t, y) = A(t)y + B(t), \]

so $f$ is continuous on $(a, b) \times \mathbb{R}^n$ and $f$ is locally Lipschitz in the second variable. To verify this last claim, note that the partials

\[ \frac{\partial f_i}{\partial y_j}(t) = a_{ij}(t) \]

are continuous; alternatively, observe that $|f(t, w) - f(t, u)| \leq \|A\|\,|w - u|$.

To show the solution exists for all $t$ we need only show it is finite for all $t \in (a, b)$. If $t > t_0$,

\[ y(t) = y_0 + \int_{t_0}^{t} \bigl(A(s)y(s) + B(s)\bigr)\,ds \]

\[ \Rightarrow\; |y(t)| \leq |y_0| + \int_{t_0}^{t} \bigl(\|A(s)\|\,|y(s)| + |B(s)|\bigr)\,ds \leq |y_0| + \max_{s \in [a,b]} |B(s)|\,(b - a) + \max_{s \in [a,b]} \|A(s)\| \int_{t_0}^{t} |y(s)|\,ds, \]

or

\[ |y(t)| \leq K_1 + K_2 \int_{t_0}^{t} |y(s)|\,ds. \]


Hence by Gronwall,

\[ |y(t)| \leq K_1 e^{K_2 (t - t_0)}, \qquad t_0 \leq t < b. \]

If $t < t_0$, then let $t = \eta(s) = (a - t_0)s + t_0$, so that as $s$ varies from 0 to 1, $t$ varies from $t_0$ down to $a$. We let $z(s) = y(\eta(s)) = y(t)$ and we can write

\[ z(s) = y(t) = y_0 + \int_{t_0}^{(a - t_0)s + t_0} \bigl(A(\tau)y(\tau) + B(\tau)\bigr)\,d\tau \]

(let $\tau = (a - t_0)u + t_0$)

\[ = y_0 + \int_0^s \bigl(A(\eta(u))y(\eta(u)) + B(\eta(u))\bigr)(a - t_0)\,du = y_0 + (a - t_0)\int_0^s \bigl(A(\eta(u))z(u) + B(\eta(u))\bigr)\,du. \]

Thus with $K_1 = |y_0| + \max_{s \in [a,b]} |B(s)|\,(b - a)$ and $K_2 = \max_{s \in [a,b]} \|A(s)\|$ we have

\[ |z(s)| \leq K_1 + |t_0 - a|\,K_2 \int_0^s |z(u)|\,du. \]

Hence by Gronwall,

\[ |z(s)| \leq K_1 e^{|t_0 - a| K_2 s}, \qquad 0 \leq s < 1, \]

so that

\[ |y(t)| \leq K_1 e^{(t_0 - a) K_2 (t - t_0)/(a - t_0)}, \qquad a \leq t \leq t_0. \]

Thus finally we have

\[ |y(t)| \leq K_1 e^{K_2 |t - t_0|}, \qquad t \in [a, b]. \]
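The a priori bound just derived can be checked numerically. Below is a sketch (Python with scipy; the system $A(t)$, $B(t)$ and the interval are arbitrary illustrative choices, with $K_1$, $K_2$ computed as in the proof):

```python
import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    return np.array([[0.0, 1.0], [-np.cos(t), -0.5]])

def B(t):
    return np.array([np.sin(t), 1.0])

a, b, t0 = 0.0, 5.0, 1.0
y0 = np.array([1.0, -2.0])

sol = solve_ivp(lambda t, y: A(t) @ y + B(t), (t0, b), y0,
                dense_output=True, rtol=1e-9, atol=1e-12)

ts = np.linspace(t0, b, 200)
K2 = max(np.abs(A(t)).sum(axis=1).max() for t in ts)     # max over [t0,b] of ||A(s)||
K1 = np.abs(y0).max() + (b - a) * max(np.abs(B(t)).max() for t in ts)

lhs = np.abs(sol.sol(ts)).max(axis=0)      # |y(t)| in the sup norm
rhs = K1 * np.exp(K2 * np.abs(ts - t0))    # the Gronwall bound
print(bool(np.all(lhs <= rhs)))            # True
```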

The following corollary is an obvious but important consequence of the preceding theorem.

Corollary 3.1.2. If $a_i(t) \in C^0[a, b]$, $a_n(t) \neq 0$, then the IVP

\[ y^{(n)} + a_1(t)y^{(n-1)} + \cdots + a_n(t)y = 0, \]
\[ y(t_0) = \gamma_1, \; \cdots, \; y^{(n-1)}(t_0) = \gamma_n \]

has a unique solution that exists on $(a, b)$.


The above arguments show that $y(t)$ is defined on $(a, b)$ if $A(t)$, $B(t)$ are continuous on $[a, b]$. In fact, $y(t)$ can be extended to $[a, b]$. First observe that $y(a^+)$ and $y(b^-)$ exist. Indeed, let $t_n \to b$ and note

\[ |y(t_m) - y(t_n)| \leq \int_{t_n}^{t_m} |A(s)y(s) + B(s)|\,ds \leq \int_{t_n}^{t_m} \bigl(\|A(s)\|\,|y(s)| + |B(s)|\bigr)\,ds. \]

From the above arguments we see that

\[ |y(t)| \leq K_1 e^{K_2 (b - a)} = C. \]

Hence

\[ |y(t_m) - y(t_n)| \leq \Bigl(\max_{[a,b]} \|A(s)\|\,C + \max_{[a,b]} |B(s)|\Bigr)(t_m - t_n). \]

Thus $\{y(t_n)\}$ is Cauchy and so the limit exists. Since $y'(t) = A(t)y(t) + B(t)$ for all $t \in (a, b)$, we get

\[ \lim_{t \to b^-} y'(t) = A(b)y(b^-) + B(b). \]

Hence the differential equation is satisfied on $(a, b]$, and in a similar way one argues that the equation is satisfied on $[a, b]$.

3.2 Nth order linear equations

We will consider (3.1.2), assuming $a_i(t), \beta(t) \in C[a, b]$ and $a_n(t) \neq 0$ for $t \in [a, b]$. Thus we consider

\[ y^{(n)} + a_1(t)y^{(n-1)} + a_2(t)y^{(n-2)} + \cdots + a_n(t)y = \beta(t), \tag{3.2.1} \]

whose companion matrix, as in Section 3.1, is

\[ \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & & \ddots & \vdots \\ -a_n & -a_{n-1} & \cdots & & & -a_1 \end{bmatrix}. \]

Let

\[ L(\cdot) = \frac{d^n(\cdot)}{dt^n} + a_1 \frac{d^{n-1}(\cdot)}{dt^{n-1}} + \cdots + a_n(\cdot). \]


Then (3.2.1) may be written as the linear nonhomogeneous equation

\[ L(y) = \beta. \tag{3.2.2} \]

The operator $L$ is said to be linear since

\[ L(c_1 y_1(t) + c_2 y_2(t)) = c_1 L(y_1(t)) + c_2 L(y_2(t)). \]

It easily follows that the set of solutions of the homogeneous linear equation

\[ L(y) = 0 \tag{3.2.3} \]

is a vector space. The Wronskian of $\{y_j(t)\}_{j=1}^n$ is defined as

\[ W(t) \equiv W\bigl(\{y_j\}_{j=1}^n\bigr)(t) = \begin{vmatrix} y_1(t) & \cdots & y_n(t) \\ \vdots & & \vdots \\ y_1^{(n-1)}(t) & \cdots & y_n^{(n-1)}(t) \end{vmatrix}. \]

Theorem 3.2.1 (Abel's Formula). If $y_1, \ldots, y_n$ are solutions of (LH) and $t_0 \in (a, b)$, then

\[ W(t) = W(t_0) \exp\Bigl[-\int_{t_0}^{t} a_1(s)\,ds\Bigr]. \]

Thus the Wronskian of $\{y_1, \cdots, y_n\}$ is either never 0 or identically 0.

Proof. We compute, differentiating one row at a time,

\[ W'(t) = \begin{vmatrix} y_1' & \cdots & y_n' \\ y_1' & \cdots & y_n' \\ \vdots & & \vdots \\ y_1^{(n-1)} & \cdots & y_n^{(n-1)} \end{vmatrix} + \begin{vmatrix} y_1 & \cdots & y_n \\ y_1'' & \cdots & y_n'' \\ y_1'' & \cdots & y_n'' \\ \vdots & & \vdots \\ y_1^{(n-1)} & \cdots & y_n^{(n-1)} \end{vmatrix} + \cdots + \begin{vmatrix} y_1 & \cdots & y_n \\ y_1' & \cdots & y_n' \\ \vdots & & \vdots \\ y_1^{(n-2)} & \cdots & y_n^{(n-2)} \\ y_1^{(n)} & \cdots & y_n^{(n)} \end{vmatrix}. \]

Every determinant except the last has a repeated row and so vanishes. Using $y_j^{(n)} = -\sum_{k=1}^{n} a_k y_j^{(n-k)}$ in the last row and subtracting multiples of the earlier rows,

\[ W'(t) = \begin{vmatrix} y_1 & \cdots & y_n \\ y_1' & \cdots & y_n' \\ \vdots & & \vdots \\ y_1^{(n-2)} & \cdots & y_n^{(n-2)} \\ -\sum_{j=1}^{n} a_j y_1^{(n-j)} & \cdots & -\sum_{j=1}^{n} a_j y_n^{(n-j)} \end{vmatrix} = \begin{vmatrix} y_1 & \cdots & y_n \\ y_1' & \cdots & y_n' \\ \vdots & & \vdots \\ -a_1(t) y_1^{(n-1)} & \cdots & -a_1(t) y_n^{(n-1)} \end{vmatrix} = -a_1(t) W(t). \]

Hence $W(t) = K e^{-\int_{t_0}^{t} a_1(s)\,ds}$ or, more explicitly,

\[ W(t) = W(t_0)\, e^{-\int_{t_0}^{t} a_1(s)\,ds}. \]

Definition 3.2.2. A collection of functions $\{y_i(t)\}_{i=1}^k$ is linearly independent on $(a, b)$ if

\[ \sum_{i=1}^{k} c_i y_i(t) = 0 \text{ for all } t \in (a, b) \;\Rightarrow\; c_j = 0 \text{ for } j = 1, \cdots, k. \]

Otherwise we say the set $\{y_i(t)\}$ is linearly dependent.

Theorem 3.2.3. Suppose $y_1, \ldots, y_n$ are solutions of (LH). If the functions are linearly dependent on $(a, b)$, then $W(t) = 0$ for all $t \in (a, b)$. Conversely, if there is a $t_0 \in (a, b)$ such that $W(t_0) = 0$, then $W(t) = 0$ for all $t \in (a, b)$ and the $y_i(t)$ are linearly dependent on $(a, b)$.

Proof. If the $\{y_i(t)\}$ are linearly dependent, then there are constants $c_i$, not all zero, such that

\[ \sum_i c_i y_i(t) = 0 \text{ for all } t \in (a, b) \;\Rightarrow\; \sum_i c_i y_i^{(k)}(t) = 0 \text{ for all } t \text{ and any } k. \]

Hence, defining

\[ M(t) = \begin{bmatrix} y_1(t) & \cdots & y_n(t) \\ \vdots & \ddots & \vdots \\ y_1^{(n-1)}(t) & \cdots & y_n^{(n-1)}(t) \end{bmatrix}, \qquad C = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}, \]

the system can be written as

\[ M(t)C = 0, \]

and since $C \neq 0$ we see that $M(t)$ is singular and therefore

\[ W(t) = \det(M(t)) = 0 \text{ for all } t \in (a, b). \]


Conversely, if $\det(M(t_0)) = W(t_0) = 0$, then

\[ M(t_0)C = 0 \]

has a nontrivial solution. For this choice of $c_i$'s, let

\[ y(t) = \sum c_i y_i(t). \]

Then

\[ y(t_0) = 0, \quad y'(t_0) = 0, \quad \cdots, \quad y^{(n-1)}(t_0) = 0, \]

and since $y$ is a solution of $Ly = 0$, the uniqueness part of the fundamental existence-uniqueness theorem forces $y(t) = 0$ for all $t \in (a, b)$.

Example 3.2.4. Consider $y_1 = t^2$, $y_2 = t|t|$. Then

\[ W(t) = \begin{vmatrix} t^2 & t|t| \\ 2t & 2t\,\mathrm{sgn}(t) \end{vmatrix} = 2t^3\,\mathrm{sgn}(t) - 2t^2 |t| \equiv 0. \]

However, $y_1(t), y_2(t)$ are not linearly dependent. For suppose

\[ c_1 y_1(t) + c_2 y_2(t) = 0 \text{ for all } t. \]

Then, dividing by $t$,

\[ c_1 t + c_2 |t| = 0 \text{ for all } t \neq 0. \]

If $t > 0$,

\[ c_1 t + c_2 t = 0 \;\Rightarrow\; c_1 = -c_2, \]

while for $t < 0$,

\[ c_1 t - c_2 t = 0 \;\Rightarrow\; c_1 = c_2. \]

Hence $c_1 = c_2 = 0$, and so $y_1, y_2$ are linearly independent on any interval $(a, b)$ containing 0. Thus $y_1, y_2$ cannot both be solutions of a linear homogeneous 2nd order equation on $(a, b)$.

Theorem 3.2.5. 1. There are $n$ linearly independent solutions of (3.2.3).

2. If $\{y_1, \cdots, y_n\}$ are linearly independent solutions of (3.2.3), then given any solution $y$ of (3.2.3), there exist unique constants $c_1, \ldots, c_n$ such that

\[ y(t) = c_1 y_1 + \cdots + c_n y_n. \]


Proof. For part 1, we denote by $\{e_j\}_{j=1}^n$ the usual unit basis vectors in $\mathbb{R}^n$, where $e_j$ is 1 in the $j$th position and 0 otherwise. Then let $y_j$ denote the solution of the initial value problem $Ly = 0$ with

\[ \begin{bmatrix} y_j(t_0) \\ y_j^{(1)}(t_0) \\ \vdots \\ y_j^{(n-1)}(t_0) \end{bmatrix} = e_j, \]

so that $W(t_0) = \det(I_n) = 1$ ($I_n$ the $n \times n$ identity matrix), and so the solutions are linearly independent.

For part 2, let $y$ be a solution of $L(y) = 0$ and let $\{y_j\}_{j=1}^n$ be linearly independent solutions. Consider the solution

\[ \varphi(t) = c_1 y_1(t) + \cdots + c_n y_n(t), \]

where $c_1, \cdots, c_n$ are arbitrary constants. We seek $c_1, \cdots, c_n$ so that

\[ \varphi(t_0) = c_1 y_1(t_0) + \cdots + c_n y_n(t_0) = y(t_0), \;\; \ldots, \;\; \varphi^{(n-1)}(t_0) = c_1 y_1^{(n-1)}(t_0) + \cdots + c_n y_n^{(n-1)}(t_0) = y^{(n-1)}(t_0), \]

or

\[ \begin{bmatrix} y_1(t_0) & \cdots & y_n(t_0) \\ \vdots & & \vdots \\ y_1^{(n-1)}(t_0) & \cdots & y_n^{(n-1)}(t_0) \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} y(t_0) \\ \vdots \\ y^{(n-1)}(t_0) \end{bmatrix}. \]

This system has a unique solution since $W(t_0) \neq 0$. Hence by uniqueness

\[ \varphi(t) = y(t) \text{ for all } t. \]

Definition 3.2.6. A set of $n$ linearly independent solutions is called a fundamental set. The above result shows that a fundamental set is a basis for the vector space of all solutions of (3.2.3). If $\{y_1, \cdots, y_n\}$ is a fundamental set we call

\[ c_1 y_1(t) + \cdots + c_n y_n(t) \]

a general solution of (3.2.3). A general solution is a complete solution: every solution is obtained by some choice of the constants.


Theorem 3.2.7. If $y_p$ is any particular solution of $Ly = \beta$, then any other solution of $Ly = \beta$ can be expressed as

\[ y(t) = y_p(t) + y_h(t), \]

where $y_h(t)$ is a solution of the homogeneous problem $Ly = 0$.

Proof. Since $L(y - y_p) = 0$, we can write

\[ y - y_p = c_1 y_1 + \cdots + c_n y_n = y_h, \]

where $\{y_1, \cdots, y_n\}$ is a fundamental set.

We call $y_p(t)$ a particular solution of (3.2.2) and $y_p + y_h$ the general solution of (3.2.2).

The problem of finding a basis of solutions of the homogeneous problem is, in general, very difficult. Indeed, a considerable effort was directed at constructing solutions of specific equations with non-constant coefficients, and many of the equations and solutions now bear the names of those who first worked out the details. Most of these solutions were obtained from a theory that we will not discuss in this class: the method of power series. The basic idea is that if an operator $L = a_0 D^n + a_1 D^{n-1} + \cdots + a_n$ has analytic coefficients, then the solutions of $Ly = 0$ will be analytic functions. So, if $a_0(t) \neq 0$, we seek solutions in the form $y = \sum_{k=0}^{\infty} c_k t^k$. Substituting this expression into the equation, we attempt to determine the coefficients $\{c_k\}_{k=0}^{\infty}$. Most important is the case in which $a_0$ is zero at some point (called a singular point). In the special case of a so-called regular singular point we can apply the method of Frobenius and seek solutions in the form $y = t^r \sum_{k=0}^{\infty} c_k t^k$, where $r$ is obtained from the so-called indicial equation.

We will not pursue this any further at this point. Rather, we now turn to a classical method that can be used to reduce the order of an equation if one solution can be found by some method (like guessing, for example).

Reduction of Order:

Consider

\[ Ly = y^{(n)} + a_1 y^{(n-1)} + \cdots + a_n y = 0 \]

under the assumption that we have a solution $y_1$, i.e., $L(y_1) = 0$. We seek a second solution in the form $y = u y_1$. To find $u$ we substitute $y$ into the equation to obtain

\[ (u y_1)^{(n)} + a_1 (u y_1)^{(n-1)} + \cdots + a_n (u y_1) = 0. \]


Now we apply the Leibniz formula

\[ (f \cdot g)^{(n)} = \sum_{j=0}^{n} \binom{n}{j} f^{(n-j)} g^{(j)} \]

to obtain

\[ \sum_{j=0}^{n} \binom{n}{j} u^{(n-j)} y_1^{(j)} + a_1 \sum_{j=0}^{n-1} \binom{n-1}{j} u^{(n-1-j)} y_1^{(j)} + \cdots + a_n u y_1 = 0. \]

The coefficient of the term involving $u$ but no derivatives of $u$ is $u L(y_1)$, which is zero. Thus if we let $v = u'$ we obtain an equation for $v$ of order $(n-1)$ in the form

\[ y_1 v^{(n-1)} + \cdots + \Bigl[ n y_1^{(n-1)} + a_1 (n-1) y_1^{(n-2)} + \cdots + a_{n-1} y_1 \Bigr] v = 0. \]

Provided that $y_1(t) \neq 0$, we know that there exists a fundamental set of solutions to this problem, $\{v_j\}_{j=2}^n$. With this set, a fundamental set of solutions to the original problem is given in terms of $u_j = \int v_j(s)\,ds$ as

\[ y_1, \; y_1 u_2, \; \cdots, \; y_1 u_n. \]

For the special case $Ly = y'' + a_1 y' + a_2 y = 0$, if $y_1$ is a solution we seek a second solution $y = u y_1$, which as above, substituting $y$ into the equation, leads to the first order equation for $v = u'$

\[ y_1 v' + (2 y_1' + a_1 y_1) v = 0. \]

If $y_1 \neq 0$, we multiply by $y_1$ to obtain

\[ (y_1^2 v)' + a_1 (y_1^2 v) = 0, \]

which implies

\[ y_1^2(t)\, v(t) = c \exp\Bigl[ -\int_{t_0}^{t} a_1(s)\,ds \Bigr]. \]

We set $c = 1$ to obtain

\[ v(t) = \frac{1}{y_1(t)^2} \exp\Bigl[ -\int_{t_0}^{t} a_1(s)\,ds \Bigr]. \]

Now we can find $u$ and thus obtain a second solution $y_2$ of $Ly = 0$ given by

\[ y_2(t) = y_1(t) \int_{t_0}^{t} \frac{1}{y_1(s)^2} \exp\Bigl[ -\int_{t_0}^{s} a_1(\xi)\,d\xi \Bigr]\,ds. \]
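The second-solution formula is easy to test symbolically. Here is a sketch (Python with sympy; the equation $y'' - 2y' + y = 0$ with $y_1 = e^t$ and $t_0 = 0$ is an illustrative choice, and the formula should return $y_2 = t e^t$):

```python
import sympy as sp

t, s, xi = sp.symbols('t s xi')
a1 = -2                     # coefficient of y' in y'' - 2y' + y = 0
y1 = sp.exp(t)              # known solution

# v(s) = exp(-int_0^s a1) / y1(s)^2, then y2 = y1 * int_0^t v(s) ds
v = sp.exp(-sp.integrate(a1, (xi, 0, s))) / (y1.subs(t, s))**2
y2 = y1 * sp.integrate(v, (s, 0, t))

print(sp.simplify(y2))                                   # t*exp(t)
print(sp.simplify(sp.diff(y2, t, 2) - 2*sp.diff(y2, t) + y2))   # 0
```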

We now introduce two methods for constructing a particular solution of the nonhomogeneous equation when $n = 2$.


Variation of Parameters for Nonhomogeneous Equations:

Consider the second order case of the equation $L(y) = \beta$ and suppose $\{y_1, y_2\}$ is a fundamental set. Then $c_1 y_1(t) + c_2 y_2(t)$ is a general solution of $L(y) = 0$. A method due to Lagrange for solving $L(y) = \beta$ is based on the idea of seeking a solution as

\[ y_p(t) = c_1(t) y_1(t) + c_2(t) y_2(t). \]

Then

\[ y_p' = c_1 y_1' + c_2 y_2' + c_1' y_1 + c_2' y_2. \]

To simplify the algebra, we impose the auxiliary condition

\[ c_1' y_1 + c_2' y_2 = 0. \]

Then

\[ y_p'' = c_1 y_1'' + c_2 y_2'' + c_1' y_1' + c_2' y_2'. \]

If we substitute into $L(y) = \beta$, we want

\[ c_1(t)\bigl(y_1'' + a_1 y_1' + a_2 y_1\bigr) + c_2(t)\bigl(y_2'' + a_1 y_2' + a_2 y_2\bigr) + c_1' y_1' + c_2' y_2' = \beta(t). \]

Note that the two parenthesized expressions are zero because $y_1$ and $y_2$ are solutions of the homogeneous equation. Thus we need to solve

\[ c_1' y_1 + c_2' y_2 = 0, \qquad c_1' y_1' + c_2' y_2' = \beta. \]

By Cramer's rule,

\[ c_1'(t) = \frac{-y_2(t)\beta(t)}{W(y_1, y_2)(t)}, \qquad c_2'(t) = \frac{y_1(t)\beta(t)}{W(y_1, y_2)(t)}. \]

Thus a particular solution is given as

\[ y_p(t) = -y_1(t) \int_{t_0}^{t} \frac{y_2(s)\beta(s)}{W(s)}\,ds + y_2(t) \int_{t_0}^{t} \frac{y_1(s)\beta(s)}{W(s)}\,ds = \int_{t_0}^{t} \Bigl[ \frac{y_1(s)y_2(t) - y_1(t)y_2(s)}{W(y_1, y_2)(s)} \Bigr] \beta(s)\,ds = \int_{t_0}^{t} g(t, s)\,\beta(s)\,ds. \]

$g(t, s)$ is called a fundamental solution.
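A small symbolic sketch of this formula (Python with sympy; the equation $y'' + y = \beta$ with $\beta = t$ and $t_0 = 0$ is an illustrative choice — here $W = 1$ and $g(t, s) = \sin(t - s)$):

```python
import sympy as sp

t, s = sp.symbols('t s')
y1, y2 = sp.cos(t), sp.sin(t)
W = sp.simplify(y1*sp.diff(y2, t) - y2*sp.diff(y1, t))   # Wronskian = 1

g = (y1.subs(t, s)*y2 - y1*y2.subs(t, s)) / W.subs(t, s)
print(sp.simplify(g))                                    # sin(t - s)

beta = s                                # beta(s) = s, an illustrative right side
yp = sp.integrate(g*beta, (s, 0, t))
print(sp.simplify(sp.diff(yp, t, 2) + yp - t))           # 0, i.e. L(yp) = beta
```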


The same method works, if not as smoothly, in the general case. Consider the equation $L(y) = \beta$ where $L$ has order $n$, and let $\{y_1, \ldots, y_n\}$ be a fundamental set of solutions of the homogeneous problem $L(y) = 0$. We seek a solution of $L(y) = \beta$ in the form

\[ y_p(t) = u_1(t) y_1(t) + \cdots + u_n(t) y_n(t). \]

We seek a system of equations that can be solved to find $u_1', \ldots, u_n'$. To this end we note that, by applying the product rule to $y_p$ and collecting terms carefully, we can conclude that

\[
\begin{aligned}
u_1' y_1 + u_2' y_2 + \cdots + u_n' y_n = 0 \;&\Rightarrow\; y_p' = u_1 y_1' + u_2 y_2' + \cdots + u_n y_n' \\
u_1' y_1' + u_2' y_2' + \cdots + u_n' y_n' = 0 \;&\Rightarrow\; y_p'' = u_1 y_1'' + u_2 y_2'' + \cdots + u_n y_n'' \\
u_1' y_1'' + u_2' y_2'' + \cdots + u_n' y_n'' = 0 \;&\Rightarrow\; y_p''' = u_1 y_1''' + u_2 y_2''' + \cdots + u_n y_n''' \\
\vdots \;&\Rightarrow\; \vdots \\
u_1' y_1^{(n-2)} + \cdots + u_n' y_n^{(n-2)} = 0 \;&\Rightarrow\; y_p^{(n-1)} = u_1 y_1^{(n-1)} + \cdots + u_n y_n^{(n-1)} \\
u_1' y_1^{(n-1)} + \cdots + u_n' y_n^{(n-1)} = \beta \;&\Rightarrow\; y_p^{(n)} = u_1 y_1^{(n)} + \cdots + u_n y_n^{(n)} + \beta,
\end{aligned}
\]

which implies

\[ L(y_p) = u_1 L(y_1) + u_2 L(y_2) + \cdots + u_n L(y_n) + \beta = \beta. \]

Now we note that the system of equations becomes

\[ \begin{bmatrix} y_1 & y_2 & \cdots & y_n \\ y_1' & y_2' & \cdots & y_n' \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{bmatrix} \begin{bmatrix} u_1' \\ u_2' \\ \vdots \\ u_n' \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \beta \end{bmatrix}. \]

The determinant of the coefficient matrix is nonvanishing since it is the Wronskian $W(t)$ of a set of linearly independent solutions of an $n$th order linear differential equation. Applying Cramer's rule we can write the solutions as

\[ u_k'(t) = \frac{W_k(t)}{W(t)}, \qquad k = 1, \cdots, n, \]


where $W_k(t)$ is the determinant of the matrix obtained from the coefficient matrix by replacing the $k$th column $\bigl[y_k \;\; y_k' \;\; \cdots \;\; y_k^{(n-1)}\bigr]^T$ by the vector $\bigl[0 \;\; 0 \;\; \cdots \;\; \beta\bigr]^T$. Expanding along that column shows $W_k(t) = \beta(t)\,\widehat{W}_k(t)$, where $\widehat{W}_k$ is the same determinant with $\beta$ replaced by 1. If we define

\[ g(t, s) = \sum_{k=1}^{n} \frac{y_k(t)\,\widehat{W}_k(s)}{W(s)}, \]

then a particular solution of $L(y) = \beta$ is

\[ y_p = \int_{t_0}^{t} g(t, s)\,\beta(s)\,ds. \]

Linear Constant Coefficient Equations Revisited:

We have already learned, in Chapter 1, how to find a set of $n$ solutions to any homogeneous equation of the form $Ly = 0$ with $L = D^n + a_1 D^{n-1} + \cdots + a_{n-1} D + a_n$. Namely, we factor the operator into a product of factors $(D - r)^k$ and $(D^2 - 2\alpha D + \alpha^2 + \beta^2)^m$. Having done this, we simply observe that the general solution of the associated homogeneous problem for each of these types of operators is easy to write out. Namely, we have

\[ (D - r)^k y = 0 \;\Rightarrow\; y = \sum_{j=1}^{k} c_j t^{j-1} e^{rt}, \tag{3.2.4} \]

\[ (D^2 - 2\alpha D + \alpha^2 + \beta^2)^m y = 0 \;\Rightarrow\; y = \sum_{j=1}^{m} c_j t^{j-1} e^{\alpha t} \cos(\beta t) + \sum_{j=1}^{m} d_j t^{j-1} e^{\alpha t} \sin(\beta t). \tag{3.2.5} \]

In the case that the coefficients $a_i$ are constant, it is possible to describe the solutions explicitly by simply solving the homogeneous equation for each factor and adding these terms together. What we have not proved is that all such solutions give a basis for the null space of $L$; i.e., we have not shown that the solutions are linearly independent. To show that these solutions are linearly independent is not really difficult, but to do it completely rigorously and carefully is a bit lengthy.

First we note:

Lemma 3.2.8. If $\lambda = \alpha + \beta i$ is a real ($\beta = 0$) or complex number, then

\[ y = \Bigl( \sum_{j=1}^{k} c_j t^{j-1} \Bigr) e^{\lambda t} \]

is the complete solution of $(D - \lambda)^k y = 0$.

Proof. Showing that the solutions of $(D - \lambda)^k y = 0$ are linearly independent amounts to showing that

\[ \Bigl( \sum_{j=1}^{k} c_j t^{j-1} \Bigr) e^{\lambda t} = 0 \text{ for all } t \in \mathbb{R} \;\Rightarrow\; c_j = 0, \; j = 1, 2, \cdots, k. \]

But, on noting that $e^{\lambda t} \neq 0$ and dividing, this is clear: a nonzero polynomial of degree at most $k - 1$ has at most $k - 1$ zeros, so it cannot vanish for all $t$.

Lemma 3.2.9. If $\lambda_1 \neq \lambda_2$ are two complex numbers and

\[ p(t) = \sum_{j=1}^{k} c_j t^{j-1} \quad \text{and} \quad q(t) = \sum_{j=1}^{\ell} d_j t^{j-1} \]

are two polynomials, then

\[ p(t) e^{\lambda_1 t} = q(t) e^{\lambda_2 t} \text{ for all } t \in \mathbb{R} \;\Rightarrow\; p(t) \equiv 0, \; q(t) \equiv 0. \]

Proof. To see that this is true, we first multiply both sides of the equation by $e^{-\lambda_1 t}$, so that

\[ p(t) = q(t) e^{(\lambda_2 - \lambda_1)t} \text{ for all } t \in \mathbb{R}. \]

Now consider the cases $\alpha < 0$, $\alpha > 0$ and $\alpha = 0$, where $(\lambda_2 - \lambda_1) \equiv \alpha + \beta i$. If $\alpha < 0$, then (using L'Hospital's rule in the first limit)

\[ \lim_{t \to +\infty} q(t) e^{(\lambda_2 - \lambda_1)t} = 0, \quad \text{while} \quad \lim_{t \to +\infty} p(t) = \pm\infty \; (\text{as the leading coefficient is positive or negative}) \text{ unless } p \equiv 0. \]

So we must have $p(t) \equiv 0$ and then $q(t) \equiv 0$. If $\alpha > 0$ we repeat the same argument with the first limit replaced by $t \to -\infty$. Finally, in the case $\alpha = 0$ we divide both sides of the equation by $q(t)$ and collect real and imaginary parts to obtain

\[ r_1(t) + i r_2(t) = \frac{p(t)}{q(t)} = e^{\beta i t} = \cos(\beta t) + i \sin(\beta t), \]

where $r_1(t)$ and $r_2(t)$ are rational functions with real coefficients. Equating real and imaginary parts we see that this would imply

\[ r_1(t) = \cos(\beta t), \qquad r_2(t) = \sin(\beta t), \]

which is impossible unless $r_1(t) = 0$ and $r_2(t) = 0$, since the right sides have infinitely many zeros while a nonzero rational function can have only a finite number. This in turn implies that $p(t) = 0$ and also $q(t) = 0$.


Lemma 3.2.10. If $\ell > 0$, $\lambda_1 \neq \lambda_2$ are real or complex numbers and

\[ (D - \lambda_2)^{\ell} \bigl( p(t) e^{\lambda_1 t} \bigr) = 0, \]

where $p(t)$ is a polynomial, then $p(t) \equiv 0$.

Proof. We know that every solution of $(D - \lambda_2)^{\ell} y = 0$ can be written as $y = q(t) e^{\lambda_2 t}$ for some polynomial $q(t)$ of degree at most $(\ell - 1)$. So the question is whether or not there exists a polynomial $q(t)$ so that

\[ p(t) e^{\lambda_1 t} = q(t) e^{\lambda_2 t}. \]

By Lemma 3.2.9 this is only possible when $p(t) = 0$ and $q(t) = 0$.

Lemma 3.2.11. If $p(t)$ is any polynomial of degree less than or equal to $(n - 1)$, then

\[ (D - \lambda_1)^{m} \bigl( p(t) e^{\lambda_2 t} \bigr) = q(t) e^{\lambda_2 t}, \]

where $q(t)$ is a polynomial of degree at most the degree of $p(t)$.

Proof. Consider the case $m = 1$. We have

\[ (D - \lambda_1) \bigl( p(t) e^{\lambda_2 t} \bigr) = q(t) e^{\lambda_2 t}, \]

where $q(t) = p'(t) + (\lambda_2 - \lambda_1) p(t)$, which is a polynomial of degree at most that of $p(t)$. You can now iterate this result for general $m > 0$.

Lemma 3.2.12. Suppose $L(y) = y^{(n)} + a_1 y^{(n-1)} + \cdots + a_n y$ has real coefficients and $p(r) = r^n + a_1 r^{n-1} + \cdots + a_n$. Then $\overline{p(z)} = p(\bar{z})$ for all $z \in \mathbb{C}$. Therefore if $p(\alpha + \beta i) = 0$ then $p(\alpha - \beta i) = 0$.

Proof. For every $z_1, z_2 \in \mathbb{C}$ we have $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$ and $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$, which also implies $\overline{z_1^{\,n}} = \bar{z}_1^{\,n}$.

From Lemma 3.2.12 we know that for a differential operator $L$ with real coefficients, all complex roots must occur in complex conjugate pairs (counting multiplicity), and from Lemma 3.2.8 we know that for a pair of complex conjugate roots $\lambda = \alpha + \beta i$, $\bar{\lambda} = \alpha - \beta i$, each of multiplicity $k$, a set of $2k$ linearly independent solutions is given for $j = 0, \cdots, (k - 1)$ by

\[ t^j e^{\lambda t} = t^j e^{\alpha t} \bigl( \cos(\beta t) + i \sin(\beta t) \bigr), \qquad t^j e^{\bar{\lambda} t} = t^j e^{\alpha t} \bigl( \cos(\beta t) - i \sin(\beta t) \bigr). \]


From this we see that there is a set of real solutions given as linear combinations of these solutions by

\[ t^j e^{\alpha t} \cos(\beta t) = \frac{1}{2} t^j \bigl( e^{\lambda t} + e^{\bar{\lambda} t} \bigr) \quad \text{and} \quad t^j e^{\alpha t} \sin(\beta t) = \frac{1}{2i} t^j \bigl( e^{\lambda t} - e^{\bar{\lambda} t} \bigr). \]

We already know from Lemma 3.2.8 that $t^j e^{\lambda t}$ and $t^j e^{\bar{\lambda} t}$ are linearly independent. Suppose we have a linear combination

\[ c_j t^j e^{\alpha t} \cos(\beta t) + d_j t^j e^{\alpha t} \sin(\beta t) = 0. \]

This would imply that

\[ \frac{(c_j - d_j i)}{2} t^j e^{\lambda t} + \frac{(c_j + d_j i)}{2} t^j e^{\bar{\lambda} t} = 0, \]

but since these functions are independent this implies

\[ (c_j - d_j i) = 0, \quad (c_j + d_j i) = 0, \quad \text{which implies} \quad c_j = d_j = 0. \]

Combining these results we have the main theorem:

Theorem 3.2.13. Suppose $Ly = y^{(n)} + a_1 y^{(n-1)} + \cdots + a_n y$ has real coefficients and the polynomial $p(r) = r^n + a_1 r^{n-1} + \cdots + a_n$ has zeros

\[ r_1, \bar{r}_1, r_2, \bar{r}_2, \cdots, r_\ell, \bar{r}_\ell, r_{2\ell + 1}, \cdots, r_s, \]

where $r_j = \alpha_j + \beta_j i$, $j = 1, \cdots, \ell$, with $\alpha_j, \beta_j \in \mathbb{R}$, $\beta_j \neq 0$, and $r_j$ for $j = 2\ell + 1, \cdots, s$ real. Let $r_j$ have multiplicity $m_j$ for all $j$. Then if $p_j(t)$ and $q_j(t)$ denote arbitrary polynomials (with real coefficients) of degree $(m_j - 1)$, the general solution of $Ly = 0$ can be written as

\[ y = \sum_{j=1}^{\ell} e^{\alpha_j t} \bigl[ p_j(t) \cos(\beta_j t) + q_j(t) \sin(\beta_j t) \bigr] + \sum_{j = 2\ell + 1}^{s} p_j(t) e^{r_j t}. \]

Proof. We need only prove that all the functions making up this general linear combination are linearly independent. We already know that each particular term, i.e., a term of the form $p_j(t) e^{r_j t}$ or $e^{\alpha_j t} \bigl[ p_j(t) \cos(\beta_j t) + q_j(t) \sin(\beta_j t) \bigr]$, consists of linearly independent functions. Note also that by rewriting this last expression in terms of complex exponentials, we obtain functions of the form $p_j(t) e^{r_j t}$ and $\bar{p}_j(t) e^{\bar{r}_j t}$. Thus let us suppose that we have a general linear combination of the form

\[ \sum_{j=1}^{m} p_j(t) e^{r_j t} = 0, \quad \text{for some } m, \]

where all we assume is that $r_i \neq r_j$ for $i \neq j$. We want to show this implies that every polynomial $p_j \equiv 0$. We prove this by induction:


1. The case $m = 1$ we have already done (Lemma 3.2.8).

2. Assume that the statement holds for sums of $k - 1$ terms, i.e., $\sum_{j=1}^{k-1} p_j(t) e^{r_j t} = 0$ implies that every $p_j(t) \equiv 0$.

3. Assume that $\sum_{j=1}^{k} p_j(t) e^{r_j t} = 0$. We now apply $(D - r_k)^{m_k}$ to this expression and note that $(D - r_k)^{m_k} \bigl( p_k(t) e^{r_k t} \bigr) = 0$, so that the sum reduces to

\[ \sum_{j=1}^{k-1} (D - r_k)^{m_k} \bigl( p_j(t) e^{r_j t} \bigr) = 0. \]

By Lemma 3.2.11 this sum can be written as

\[ \sum_{j=1}^{k-1} q_j(t) e^{r_j t} = 0, \]

where

\[ (D - r_k)^{m_k} \bigl( p_j(t) e^{r_j t} \bigr) = q_j(t) e^{r_j t}. \]

By the induction hypothesis we have $q_j(t) = 0$ for all $j = 1, \cdots, (k - 1)$. But this implies that

\[ (D - r_k)^{m_k} \bigl( p_j(t) e^{r_j t} \bigr) = 0, \qquad j = 1, \cdots, (k - 1), \]

which by Lemma 3.2.10 implies that $p_j(t) = 0$ for all $j = 1, \cdots, (k - 1)$. Finally we see that the original expression reduces to

\[ p_k(t) e^{r_k t} = 0, \]

which implies that $p_k(t) = 0$.

Method of Undetermined Coefficients

As we have already learned, the method of variation of parameters provides a way of representing a particular solution of a nonhomogeneous linear problem

\[ Ly = y^{(n)} + a_1 y^{(n-1)} + \cdots + a_{n-1} y^{(1)} + a_n y = f \]


in terms of a basis of solutions $\{y_j\}_{j=1}^n$ of the linear homogeneous problem. In the special case in which the operator $L$ has constant coefficients, we have just seen that it is possible to construct such a basis of solutions for the homogeneous problem. Thus given any $f$ we can write out a formula for a particular solution in integral form:

\[ y_p(t) = \int_{t_0}^{t} g(t, s) f(s)\,ds. \]

Unfortunately, the method of variation of parameters often requires much more work than is needed. As an example consider the problem

\[ Ly = y''' + y'' + y' + y = 1, \qquad y(0) = 0, \; y'(0) = 1, \; y''(0) = 0. \]

Example 3.2.14. For the homogeneous problem we have

\[ (D^3 + D^2 + D + 1) y = (D + 1)(D^2 + 1) y = 0, \]

so we can take

\[ y_1 = \cos t, \quad y_2 = \sin t, \quad y_3 = e^{-t}. \]

Thus the Wronskian is

\[ W(t) = \begin{vmatrix} \cos t & \sin t & e^{-t} \\ -\sin t & \cos t & -e^{-t} \\ -\cos t & -\sin t & e^{-t} \end{vmatrix}, \]

and we can apply Abel's theorem to obtain

\[ W(t) = W(0)\, e^{-\int_0^t 1\,ds} = \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{vmatrix} e^{-t} = 2 e^{-t}. \]

Thus by the variation of parameters formula $y_p = u_1 y_1 + u_2 y_2 + u_3 y_3$, where

\[ u_1' = \frac{e^t}{2} \begin{vmatrix} 0 & \sin t & e^{-t} \\ 0 & \cos t & -e^{-t} \\ 1 & -\sin t & e^{-t} \end{vmatrix} = -\frac{1}{2} (\cos t + \sin t), \]

which implies

\[ u_1(t) = \frac{1}{2} (\cos t - \sin t). \]

Similarly, we obtain

\[ u_2' = \frac{1}{2} (\cos t - \sin t), \qquad u_3' = \frac{1}{2} e^t, \]

which imply

\[ u_2(t) = \frac{1}{2} (\sin t + \cos t), \qquad u_3(t) = \frac{1}{2} e^t. \]

So we get

\[ y_p = u_1 y_1 + u_2 y_2 + u_3 y_3 = \frac{1}{2} (\cos t - \sin t) \cos t + \frac{1}{2} (\sin t + \cos t) \sin t + \frac{1}{2} e^t e^{-t} = 1. \]

Well, yes, in retrospect we see that it would have been easy to see that $y_p = 1$ is a particular solution.

Now the general solution is

\[ y = 1 + c_1 \cos t + c_2 \sin t + c_3 e^{-t}, \]

and we can apply the initial conditions to determine the constants, which yields

\[ y = 1 + \frac{1}{2} \bigl( \sin t - \cos t - e^{-t} \bigr). \]

We note that if we were to apply the method for finding a particular solution with the properties

\[ y_p(0) = 0, \quad y_p'(0) = 0, \quad y_p''(0) = 0 \]

(as given in the proof of the $n$th order case), then we would get

\[ y_p(t) = 1 - \frac{1}{2} \bigl( \cos t + \sin t + e^{-t} \bigr). \]

We note that the second term is part of the homogeneous solution, so it can be excluded.
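We can confirm the example with a computer algebra system. A sketch (Python with sympy, purely as a check of the computation above):

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# y''' + y'' + y' + y = 1 with y(0) = 0, y'(0) = 1, y''(0) = 0
ode = sp.Eq(y(t).diff(t, 3) + y(t).diff(t, 2) + y(t).diff(t) + y(t), 1)
sol = sp.dsolve(ode, y(t), ics={y(0): 0,
                                y(t).diff(t).subs(t, 0): 1,
                                y(t).diff(t, 2).subs(t, 0): 0})
print(sp.simplify(sol.rhs))   # 1 + (sin(t) - cos(t) - exp(-t))/2
```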

In any case this is a lot of work to find such a simple particular solution. In case the function $f$ is a linear combination of:

1. polynomials,

2. polynomials times exponentials, or

3. polynomials times exponentials times sine or cosine,

i.e., if $f$ is a solution of a linear constant coefficient homogeneous differential equation, one can apply the method of undetermined coefficients.

The method goes as follows:

1. Let $L = P(D) = D^n + \sum_{j=1}^{n} a_j D^{n-j}$.

2. Let $Ly = 0$ have general solution $y_h = \sum_{j=1}^{n} c_j y_j$.

3. Assume that $M = Q(D) = D^m + \sum_{j=1}^{m} b_j D^{m-j}$ is a constant coefficient linear differential operator such that $Mf = 0$.

4. Then $\widetilde{L} = ML$ is a constant coefficient polynomial differential operator.

5. If $y = y_h + y_p$, where $y_p$ is a particular solution of $Ly = f$ and $y_h$ is the general solution of the homogeneous problem, then we have $\widetilde{L} y = 0$.

6. On the other hand, we can write the general solution of this problem by simply factoring $\widetilde{L}$ and applying the results of the previous section.

7. Note that the solution $y_h = \sum_{j=1}^{n} c_j y_j$ is part of this general solution.

8. So let us denote the general solution by

\[ y = \sum_{j=1}^{n} c_j y_j + \sum_{j=1}^{m} d_j w_j. \]

9. Now we also know, by the variation of parameters formula, that there exists a particular solution of $L y_p = f$.

10. This particular solution must also be a part of the full general solution of the larger homogeneous problem, i.e., $y_p = \sum_{j=1}^{n} c_j y_j + \sum_{j=1}^{m} d_j w_j$.

11. We know that $L y_h = 0$, so we can choose $y_p = \sum_{j=1}^{m} d_j w_j$.

Example 3.2.15. $Ly = (D^2 - 2D + 2)y = t^2 e^t \sin(t)$.

The general solution of the homogeneous equation $Ly = 0$ is

\[ y_h = c_1 e^t \cos(t) + c_2 e^t \sin(t). \]

According to the above discussion we seek a differential operator $M$ so that

\[ M(t^2 e^t \sin(t)) = 0. \]

We immediately choose

\[ M = (D^2 - 2D + 2)^3 \]

and we need to compute the general solution of the homogeneous problem

\[ MLy = (D^2 - 2D + 2)^3 (D^2 - 2D + 2) y = (D^2 - 2D + 2)^4 y = 0, \]

which implies

\[ y = (c_1 + c_2 t + c_3 t^2 + c_4 t^3) e^t \cos(t) + (d_1 + d_2 t + d_3 t^2 + d_4 t^3) e^t \sin(t). \]

If we now remove the part of this function corresponding to the solution of the homogeneous problem $Ly = 0$, we have

\[ y_p = (a t^3 + b t^2 + c t) e^t \cos(t) + (d t^3 + f t^2 + g t) e^t \sin(t). \]

After a lengthy calculation, the first derivative is

\[ y' = \bigl( (d + a)t^3 + (3a + b + f)t^2 + (2b + c + g)t + c \bigr) e^t \cos(t) + \bigl( (-a + d)t^3 + (-b + f + 3d)t^2 + (-c + g + 2f)t + g \bigr) e^t \sin(t), \]

and the second derivative is

\[
\begin{aligned}
y'' = {}& \bigl( 2dt^3 + (6d + 6a + 2f)t^2 + (6a + 4b + 4f + 2g)t + 2b + 2c + 2g \bigr) e^t \cos(t) \\
& + \bigl( -2at^3 + (-6a + 6d - 2b)t^2 + (-4b + 4f + 6d - 2c)t + 2f + 2g - 2c \bigr) e^t \sin(t).
\end{aligned}
\]


Plugging all of this into the equation yields

\[ y'' - 2y' + 2y = \bigl( 6dt^2 + (6a + 4f)t + 2g + 2b \bigr) e^t \cos(t) + \bigl( -6at^2 + (-4b + 6d)t - 2c + 2f \bigr) e^t \sin(t). \]

Equating coefficients with the right hand side leads to the equations

\[ 6d = 0, \quad 6a + 4f = 0, \quad 2g + 2b = 0, \quad -6a = 1, \quad -4b + 6d = 0, \quad -2c + 2f = 0, \]

which have the solution $a = -1/6$, $b = 0$, $c = 1/4$, $d = 0$, $f = 1/4$ and $g = 0$. Thus a particular solution of the equation is

\[ y_p = \Bigl( -\frac{t^3}{6} + \frac{t}{4} \Bigr) e^t \cos(t) + \frac{t^2}{4} e^t \sin(t). \]
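A quick symbolic check of this particular solution (Python with sympy, purely as a verification of the example):

```python
import sympy as sp

t = sp.symbols('t')
yp = (-t**3/6 + t/4)*sp.exp(t)*sp.cos(t) + (t**2/4)*sp.exp(t)*sp.sin(t)

# L(yp) - f should be identically zero
residual = sp.diff(yp, t, 2) - 2*sp.diff(yp, t) + 2*yp - t**2*sp.exp(t)*sp.sin(t)
print(sp.simplify(residual))   # 0
```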

The following table contains a guide for generating a particular solution when one applies the method of undetermined coefficients. In particular, consider $Ly = f$. (Here $s \geq 0$ is the smallest integer for which no term of $y_p$ solves the homogeneous equation $Ly = 0$.)

If $f(t) =$ | seek $y_p(t) =$
$P_m(t) = c_0 t^m + \cdots + c_m$ | $t^s(a_0 t^m + \cdots + a_m)$
$P_m(t) e^{\alpha t}$ | $t^s(a_0 t^m + \cdots + a_m) e^{\alpha t}$
$P_m(t) e^{\alpha t} \bigl\{ \sin \beta t \text{ or } \cos \beta t \bigr\}$ | $t^s e^{\alpha t} \bigl[ (a_0 t^m + \cdots + a_m) \cos \beta t + (b_0 t^m + \cdots + b_m) \sin \beta t \bigr]$

Example 3.2.16. Returning to Example 3.2.14, we see that the operator $M = D$ annihilates $f = 1$, so we seek a constant particular solution; substituting into $Ly = 1$ gives $y_p = 1$.


3.3 Linear Systems

Recall that

\[ z^{(n)} + \alpha_1(t) z^{(n-1)} + \cdots + \alpha_n(t) z = \beta(t) \]

may be written as a 1st order nonhomogeneous system (hereafter referred to as (LNH))

\[ y'(t) = A(t) y(t) + B(t), \]

where

\[ A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ -\alpha_n & -\alpha_{n-1} & \cdots & & -\alpha_1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \beta(t) \end{bmatrix}. \]

Here we will study the more general case of (LNH), when $A$ is a general $n \times n$ matrix and $B$ is a general $n \times 1$ vector function:

\[ A = \begin{bmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ a_{21}(t) & \cdots & a_{2n}(t) \\ \vdots & \ddots & \vdots \\ a_{n1}(t) & \cdots & a_{nn}(t) \end{bmatrix}, \qquad B(t) = \begin{bmatrix} b_1(t) \\ b_2(t) \\ \vdots \\ b_n(t) \end{bmatrix}, \]

with the assumption that the entries of $A$ and $B$ are all in $C^0[a, b]$.

To study the (LNH) problem we proceed just as in the $n$th order case, by first studying the linear homogeneous (LH) problem, i.e., the case $B = 0$:

\[ y' = Ay. \tag{3.3.1} \]

Definition 3.3.1. A set of vector functions $\{y_j\}_{j=1}^n \subset \mathbb{R}^n$ is linearly independent on $[a, b]$ if

\[ \sum_{j=1}^{n} c_j y_j(t) = 0 \text{ for all } t \in [a, b] \quad \text{implies} \quad c_1 = c_2 = \cdots = c_n = 0. \]

The vectors are said to be linearly dependent on $[a, b]$ if there exist numbers $\{c_j\}_{j=1}^n$, not all zero, such that

\[ \sum_{j=1}^{n} c_j y_j(t) = 0 \text{ for all } t \in [a, b]. \]


Below we will use the following notation for the components of the vectors $\{y_j\}_{j=1}^n$:

\[ y_1 = \begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \\ \vdots \\ y_{n1} \end{bmatrix}, \quad y_2 = \begin{bmatrix} y_{12} \\ y_{22} \\ y_{32} \\ \vdots \\ y_{n2} \end{bmatrix}, \quad \cdots, \quad y_j = \begin{bmatrix} y_{1j} \\ y_{2j} \\ y_{3j} \\ \vdots \\ y_{nj} \end{bmatrix}, \quad \cdots, \quad y_n = \begin{bmatrix} y_{1n} \\ y_{2n} \\ y_{3n} \\ \vdots \\ y_{nn} \end{bmatrix}. \]

A linearly independent set $\{y_j\}_{j=1}^n$ of solutions to the (LH) problem (3.3.1) is called a fundamental set. Given a fundamental set we define a fundamental matrix by

\[ \Phi(t) = \begin{bmatrix} y_{11}(t) & \cdots & y_{1n}(t) \\ y_{21}(t) & \cdots & y_{2n}(t) \\ \vdots & & \vdots \\ y_{n1}(t) & \cdots & y_{nn}(t) \end{bmatrix}. \tag{3.3.2} \]

Theorem 3.3.2 (Abel's Formula). If $y_1(t), \ldots, y_n(t)$ are solutions of (LH) and $t_0 \in (a, b)$, then the Wronskian $W(t) = \det \Phi(t)$ satisfies

\[ W(t) = W(t_0) \exp\Bigl( \int_{t_0}^{t} \operatorname{tr} A(s)\,ds \Bigr), \quad \text{where } \operatorname{tr} A(s) = \sum_{j=1}^{n} a_{jj}(s). \]

Thus the Wronskian of $\{y_1, \cdots, y_n\}$ is either never 0 or identically 0.

Proof. We give the proof for $n = 2$; the method of proof is identical in higher dimensions. Note that

\[ W'(t) = \begin{vmatrix} y_{11}' & y_{12}' \\ y_{21} & y_{22} \end{vmatrix} + \begin{vmatrix} y_{11} & y_{12} \\ y_{21}' & y_{22}' \end{vmatrix} = \begin{vmatrix} a_{11} y_{11} + a_{12} y_{21} & a_{11} y_{12} + a_{12} y_{22} \\ y_{21} & y_{22} \end{vmatrix} + \begin{vmatrix} y_{11} & y_{12} \\ a_{21} y_{11} + a_{22} y_{21} & a_{21} y_{12} + a_{22} y_{22} \end{vmatrix}. \]

Subtracting $a_{12}$ times the second row from the first row in the first determinant, and $a_{21}$ times the first row from the second row in the second determinant, gives

\[ W'(t) = \begin{vmatrix} a_{11} y_{11} & a_{11} y_{12} \\ y_{21} & y_{22} \end{vmatrix} + \begin{vmatrix} y_{11} & y_{12} \\ a_{22} y_{21} & a_{22} y_{22} \end{vmatrix} = a_{11} W(t) + a_{22} W(t) = \operatorname{tr} A \; W(t). \]


Hence

\[ W(t) = W(t_0) \exp\Bigl( \int_{t_0}^{t} \operatorname{tr} A(s)\,ds \Bigr). \]
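Abel's formula for systems is easy to verify numerically. A sketch (Python with scipy; the $2 \times 2$ system $A(t)$ below is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.integrate import solve_ivp, quad

def A(t):
    return np.array([[np.sin(t), 1.0], [t, -0.3]])

t0, t1 = 0.0, 2.0

# Solve the matrix ODE Phi' = A(t) Phi, Phi(t0) = I (flattened to a vector).
def rhs(t, y):
    return (A(t) @ y.reshape(2, 2)).ravel()

sol = solve_ivp(rhs, (t0, t1), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
W_end = np.linalg.det(sol.y[:, -1].reshape(2, 2))

# Abel: W(t1) = W(t0) * exp(int tr A(s) ds), and W(t0) = det I = 1.
trace_int, _ = quad(lambda s: np.sin(s) - 0.3, t0, t1)
print(W_end, np.exp(trace_int))   # the two values agree to high accuracy
```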

The proof of the next theorem is identical to that of Theorem 3.2.3.

Theorem 3.3.3. Suppose $\{y_1, \cdots, y_n\}$ are solutions of (LH). If the functions are linearly independent on $[a, b]$, then $W(t) \neq 0$ for all $t \in (a, b)$. Conversely, if there exists $t_0$ such that $W(t_0) = 0$, then $W(t) \equiv 0$ and the $y_i(t)$ are linearly dependent on $(a, b)$.

Theorem 3.3.4. 1. A fundamental set exists.

2. The set of solutions $S = \{y : y' = Ay\}$ of (LH) forms an $n$-dimensional vector space.

This implies that if $\{y_1, \cdots, y_n\}$ is a linearly independent set of solutions of (LH), then given any solution $y(t)$ of (LH) there exist unique constants $c_1, \cdots, c_n$ such that

\[ y(t) = c_1 y_1(t) + \cdots + c_n y_n(t). \]

Proof. For part 1) we need only let $y_j$ denote the solution of

\[ y' = Ay, \qquad y_j(t_0) = e_j \; (\text{the } j\text{th standard unit basis vector in } \mathbb{R}^n). \]

Then $W(t_0) = 1$, and so the set $\{y_1, \cdots, y_n\}$ is linearly independent.

For part 2), we show that the set $\{y_1, \cdots, y_n\}$ from part 1) forms a basis for $S$. We have already shown that it is a linearly independent set, so we need only show that it is a spanning set. To this end, let $z(t)$ be a solution of (LH) and let $z(t_0) = z^0 = \bigl[ z_1^0 \; \cdots \; z_n^0 \bigr]^T \in \mathbb{R}^n$. Then note that

\[ z^0 = \begin{bmatrix} z_1^0 \\ z_2^0 \\ \vdots \\ z_n^0 \end{bmatrix} = z_1^0 e_1 + z_2^0 e_2 + \cdots + z_n^0 e_n. \]

Consider the vector

\[ y(t) \equiv z_1^0 y_1(t) + z_2^0 y_2(t) + \cdots + z_n^0 y_n(t). \]

Then, as a sum of solutions of (LH), $y$ is also a solution of (LH), and at $t = t_0$ we have

\[ y(t_0) = z_1^0 y_1(t_0) + z_2^0 y_2(t_0) + \cdots + z_n^0 y_n(t_0) = z_1^0 e_1 + z_2^0 e_2 + \cdots + z_n^0 e_n = z^0 = z(t_0). \]

Now by the fundamental existence-uniqueness theorem $z(t) = y(t)$ for all $t$. Thus the set is a spanning set and we see that $S$ is an $n$-dimensional vector space.

Note that, in the proof of the last theorem, the set of linearly independent solutions could have been any such set. In particular, if $\{y_j\}_{j=1}^n$ is any linearly independent set of solutions of (LH) with fundamental matrix $\Phi(t)$ (which is therefore nonsingular), then given any solution $z(t)$ of (LH) with $z(t_0) = z^0$, there exist $c_1, \cdots, c_n$ so that

\[ z(t) = c_1 y_1 + \cdots + c_n y_n = \Phi(t) \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \equiv \Phi(t) C. \]

Namely,

\[ C = \Phi^{-1}(t_0) z(t_0) = \Phi^{-1}(t_0) z^0. \]

Thus

\[ z(t) = \Phi(t) \Phi^{-1}(t_0) z^0. \]

Given any fundamental matrix $\Phi(t)$, let

\[ \Psi(t) = \Phi(t) \Phi^{-1}(t_0); \]

then any solution $z(t)$ of (LH) can be written as

\[ z(t) = \Psi(t) z^0, \]

and we have

\[ \det \Psi(t_0) = \det\bigl( \Phi(t_0) \Phi^{-1}(t_0) \bigr) = \det(I) = 1. \]

For the special fundamental set $\{y_1, \cdots, y_n\}$ satisfying $y_j(t_0) = e_j$ we have $\Psi(t) = \Phi(t)$, since $\Phi(t_0) = I$.


Suppose $y_1, \cdots, y_n$ are any solutions of (LH). Let

\[ \Psi(t) = \bigl[ y_1 \; \cdots \; y_n \bigr]. \]

Then

\[ \Psi'(t) = \bigl[ y_1' \; \cdots \; y_n' \bigr] = \bigl[ Ay_1 \; \cdots \; Ay_n \bigr] = A \bigl[ y_1 \; \cdots \; y_n \bigr] = A \Psi(t). \]

We call $\Psi$ a solution matrix. The next theorem answers the question: when is a solution matrix a fundamental matrix?

Theorem 3.3.5. A solution matrix $\Psi$ of (LH) is a fundamental matrix iff $\det \Psi(t) \neq 0$ for all $t \in (a, b)$.

Proof. Since the columns of $\Psi$ are solutions of (LH), we know $\{y_1, \cdots, y_n\}$ is a fundamental set iff

\[ \det \Psi(t) \neq 0 \text{ for all } t. \]

Theorem 3.3.6. If $\Phi(t)$ is a fundamental matrix and $C$ is a nonsingular constant matrix, then $\Phi(t) C$ is a fundamental matrix.

Proof. Let $C = \bigl[ C_1 \; C_2 \; \cdots \; C_n \bigr]$ denote a column-delimited matrix and compute

\[ (\Phi(t) C)' = \bigl[ \Phi'(t) C_1 \; \Phi'(t) C_2 \; \cdots \; \Phi'(t) C_n \bigr] = \bigl[ A\Phi(t) C_1 \; A\Phi(t) C_2 \; \cdots \; A\Phi(t) C_n \bigr] = A \Phi(t) C. \]

Moreover,

\[ \det(\Phi C) = \det(\Phi(t)) \det(C) \neq 0. \]

Theorem 3.3.7. If $\Phi$ and $\Psi$ are fundamental matrices for (LH), then there is a nonsingular constant matrix $C$ so that

\[ \Psi = \Phi C. \]

Proof. Let

\[ \Psi = \bigl[ \psi_1 \; \cdots \; \psi_n \bigr], \qquad \Phi = \bigl[ \varphi_1 \; \cdots \; \varphi_n \bigr]. \]

Since the sets $\{\psi_j\}_{j=1}^n$ and $\{\varphi_j\}_{j=1}^n$ are both bases for the solution space, for each column $\psi_j$ there exist constants $c_{1j}, \cdots, c_{nj}$ such that

\[ \psi_j = c_{1j} \varphi_1 + \cdots + c_{nj} \varphi_n = \bigl[ \varphi_1 \; \cdots \; \varphi_n \bigr] \begin{bmatrix} c_{1j} \\ \vdots \\ c_{nj} \end{bmatrix} = \Phi C_j. \]

Take

\[ C = \bigl[ C_1 \; \cdots \; C_n \bigr]. \]

Then $\Psi = \Phi C$. Since $\det \Phi \, \det C = \det \Psi \neq 0$, the result follows.

Nonhomogeneous Linear Systems

We now consider the nonhomogeneous linear system (LNH)

\[ y' = Ay + B. \]

To construct a solution, we imitate the method of variation of parameters. That is, we seek a particular solution

\[ y_p(t) = \Phi(t) v(t), \qquad v(t) \in \mathbb{R}^n. \]

Then

\[ y_p'(t) = \Phi' v + \Phi v' = A \Phi v + \Phi v' = A y_p + \Phi v'. \]

Thus we need

\[ \Phi v' = B \;\Rightarrow\; v' = \Phi^{-1} B, \]

or

\[ v(t) - v(t_0) = \int_{t_0}^{t} \Phi^{-1}(s) B(s)\,ds, \]

and we can take the solution that is zero at $t_0$ to be

\[ v(t) = \int_{t_0}^{t} \Phi^{-1}(s) B(s)\,ds. \]


Thus

\[ y_p = \Phi(t) \int_{t_0}^{t} \Phi^{-1}(s) B(s)\,ds. \]

Now, if $y(t)$ is any other solution of (LNH), then $y(t) - y_p(t)$ satisfies (LH) and hence

\[ y(t) - y_p(t) = \Phi(t) C \]

for some constant vector $C$. Therefore the general solution of (LNH) is

\[ y(t) = \Phi(t) C + \Phi(t) \int_{t_0}^{t} \Phi^{-1}(s) B(s)\,ds. \]

For the IVP with $y(t_0) = y_0$, we have

\[ y(t_0) = \Phi(t_0) C \;\Rightarrow\; C = \Phi^{-1}(t_0) y_0. \]

We have derived the Variation of Parameters Formula

\[ y(t) = \Phi(t) \Phi^{-1}(t_0) y_0 + \int_{t_0}^{t} \Phi(t) \Phi^{-1}(s) B(s)\,ds. \]

If $A$ is constant, it is left as an exercise to show that the formula becomes

\[ y(t) = \Phi(t - t_0) y_0 + \int_{t_0}^{t} \Phi(t - s) B(s)\,ds, \]

where $\Phi$ is the fundamental matrix with $\Phi(0) = I$.
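For constant $A$ the formula can be checked directly, using $\Phi(t) = e^{At}$ (proved in the next section). A sketch (Python with scipy; $A$, $B$ and $y_0$ are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = lambda s: np.array([np.cos(s), 1.0])
t0, t1 = 0.0, 1.5
y0 = np.array([1.0, 0.0])

# y(t1) = e^{A(t1-t0)} y0 + int_{t0}^{t1} e^{A(t1-s)} B(s) ds
integral, _ = quad_vec(lambda s: expm(A*(t1 - s)) @ B(s), t0, t1)
y_formula = expm(A*(t1 - t0)) @ y0 + integral

sol = solve_ivp(lambda t, y: A @ y + B(t), (t0, t1), y0,
                rtol=1e-10, atol=1e-12)
print(y_formula, sol.y[:, -1])   # the two answers agree to high accuracy
```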

3.4 Linear Systems with Constant Coefficients

We conclude this chapter by examining the special case in which the coefficient matrix $A$ is constant. As in the scalar case, we work over the complex numbers and then specialize to the real case.

Here is one idea for solving the system

\[ y' = Ay. \tag{LH} \]

Seek solutions of the form $y(t) = e^{\lambda t} v$ with $v$ a constant vector. Then

\[ y' = \lambda e^{\lambda t} v \quad \text{and} \quad Ay = A(e^{\lambda t} v) = e^{\lambda t} A v. \]

Thus we get a solution iff

\[ \lambda e^{\lambda t} v = e^{\lambda t} A v, \quad \text{i.e.,} \quad Av = \lambda v; \]

that is, $(\lambda, v)$ must be an eigenpair. When we constructed solutions to the linear scalar differential equation we considered cases determined by the roots of the characteristic polynomial. For systems, cases are distinguished by the roots of the characteristic polynomial of $A$.

It's not hard to do the analysis in the case where the matrix can be diagonalized.

Theorem 3.4.1. Suppose $A$ is $n \times n$ and has distinct e-values $\{\lambda_1, \cdots, \lambda_n\}$. Then there exist $n$ linearly independent e-vectors $\{v_1, \ldots, v_n\}$, and

\[ \{ e^{\lambda_1 t} v_1, \ldots, e^{\lambda_n t} v_n \} \]

is a fundamental set.

Proof. It is clear, since each $\lambda_j$ is a simple eigenvalue, that for each $j$ there must be a nonzero solution to the equation $(\lambda_j - A)v = 0$, since $\det(\lambda_j - A) = 0$. For each $j$ let this solution be called $v_j$. Now let us suppose that

\[ c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0. \]

On multiplying this equation by $A$ and using $A v_j = \lambda_j v_j$, we have

\[ c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_n \lambda_n v_n = 0. \]

Repeating, we arrive at the system

\[
\begin{aligned}
c_1 v_1 + c_2 v_2 + \cdots + c_n v_n &= 0 \\
c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_n \lambda_n v_n &= 0 \\
&\;\;\vdots \\
c_1 \lambda_1^{n-1} v_1 + c_2 \lambda_2^{n-1} v_2 + \cdots + c_n \lambda_n^{n-1} v_n &= 0.
\end{aligned}
\]

Defining the Vandermonde matrix

\[ V = \begin{bmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{n-1} \\ 1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{n-1} \\ \vdots & \vdots & & & \vdots \\ 1 & \lambda_n & \lambda_n^2 & \cdots & \lambda_n^{n-1} \end{bmatrix}, \]

we note that the determinant of this Vandermonde matrix is zero if and only if $\lambda_i = \lambda_j$ for some $i \neq j$. Under the assumption that the eigenvalues are distinct, we see that $V$ is nonsingular. So the system above can be written, first as

\[ \bigl[ c_1 v_1 \;\; c_2 v_2 \;\; \cdots \;\; c_n v_n \bigr] V = 0, \]

and then, on multiplying by $V^{-1}$ on the right,

\[ \bigl[ c_1 v_1 \;\; c_2 v_2 \;\; \cdots \;\; c_n v_n \bigr] = 0, \]

which implies

\[ c_j v_j = 0, \; j = 1, \cdots, n \;\Rightarrow\; c_j = 0, \; j = 1, \cdots, n, \]

since, by assumption, $v_j \neq 0$ for all $j$. So the e-vectors are linearly independent.

To see that the solutions $e^{\lambda_j t} v_j$ are linearly independent, we note that

\[ W\{e^{\lambda_1 t} v_1, \ldots, e^{\lambda_n t} v_n\} = \bigl| e^{\lambda_1 t} v_1, \ldots, e^{\lambda_n t} v_n \bigr| = e^{\lambda_1 t} \bigl| v_1, e^{\lambda_2 t} v_2, \ldots, e^{\lambda_n t} v_n \bigr| = \cdots = e^{(\lambda_1 + \cdots + \lambda_n)t} \bigl| v_1, \ldots, v_n \bigr| \neq 0, \]

since, as we have shown above, the e-vectors are linearly independent. Hence the general solution is

\[ y(t) = c_1 e^{\lambda_1 t} v_1 + \cdots + c_n e^{\lambda_n t} v_n, \]

and a fundamental matrix is given by

\[ \Phi(t) = \bigl[ e^{\lambda_1 t} v_1 \;\; \ldots \;\; e^{\lambda_n t} v_n \bigr]. \tag{3.4.3} \]

Theorem 3.4.2. Let $A$ be an $n \times n$ matrix. There exists a unitary matrix $P$ (i.e., $P^{-1} = P^*$) such that $A = PDP^*$ with $D = \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n)$ if and only if $A$ is normal, i.e., $AA^* = A^*A$. Thus $A$ is normal if and only if the eigenvectors of $A$ can be chosen to form an orthonormal basis for $\mathbb{C}^n$.


Proof. ($\Rightarrow$) Assume that there exists $P$ such that $P^{-1} = P^*$ and $A = PDP^*$ with $D$ a diagonal matrix. Then $A^* = PD^*P^*$ and we have

\[ AA^* = (PDP^*)(PD^*P^*) = P(DD^*)P^* = P(D^*D)P^* = (PD^*P^*)(PDP^*) = A^*A, \]

where we have used the fact that diagonal matrices commute. Thus $A$ is normal.

($\Leftarrow$) For this direction we need a well known result in matrix theory, Schur's Theorem:

Theorem. For every $n \times n$ matrix $A$ there exists a unitary matrix $P$ such that $A = PTP^*$ with $T$ an upper triangular matrix.

The proof of this theorem is very simple but will not be included here; a proof can be found in any book on matrix theory (see, for example, Matrix Theory by James M. Ortega, published by Plenum Publishing Co.). Assuming Schur's theorem, we have that there exists an upper triangular matrix $T$ so that $T = P^*AP$ and, assuming that $A$ is normal, we show that $T$ is also normal. We have

\[ TT^* = (P^*AP)(P^*A^*P) = P^*(AA^*)P = P^*(A^*A)P = (P^*A^*P)(P^*AP) = T^*T, \]

and we see that $T$ is normal. But a normal upper triangular matrix must be a diagonal matrix. We consider the $2 \times 2$ case:

\[ \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} \begin{bmatrix} \bar{a}_{11} & 0 \\ \bar{a}_{12} & \bar{a}_{22} \end{bmatrix} = \begin{bmatrix} \bar{a}_{11} & 0 \\ \bar{a}_{12} & \bar{a}_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix}. \]

From the $(1, 1)$-entry we have

\[ |a_{11}|^2 + |a_{12}|^2 = |a_{11}|^2, \]

which implies that $a_{12} = 0$, so $T$ is diagonal.

Now, in case $A = PDP^*$, then $AP = PD$. Let us designate $P$ as a column-delimited matrix by $P = \bigl[ P_1 \; P_2 \; \cdots \; P_n \bigr]$. Then we can write $AP = PD$ as

\[ \bigl[ AP_1 \;\; AP_2 \;\; \cdots \;\; AP_n \bigr] = \bigl[ \lambda_1 P_1 \;\; \lambda_2 P_2 \;\; \cdots \;\; \lambda_n P_n \bigr], \]


which implies $AP_j = \lambda_j P_j$ for all $j = 1, \cdots, n$. Also $A^*P = PD^*$ implies that $A^* P_j = \bar{\lambda}_j P_j$ for all $j = 1, \cdots, n$. So the eigenvalues of $A$ are $\{\lambda_j\}$ with eigenvectors $\{P_j\}$. To see that the eigenvectors are orthonormal, just note that $P$ unitary implies $I = PP^* = P^*P$, which says exactly that $\langle P_j, P_k \rangle = \delta_{jk}$.

Example 3.4.3.

\[ y' = \begin{bmatrix} 1 & 12 \\ 3 & 1 \end{bmatrix} y. \]

Here the e-pairs are

\[ \Bigl( 7, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \Bigr), \qquad \Bigl( -5, \begin{bmatrix} -2 \\ 1 \end{bmatrix} \Bigr). \]

Two linearly independent solutions are

\[ y_1(t) = e^{7t} \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad y_2(t) = e^{-5t} \begin{bmatrix} -2 \\ 1 \end{bmatrix}. \]

Hence a fundamental matrix is

\[ \Phi(t) = \begin{bmatrix} 2e^{7t} & -2e^{-5t} \\ e^{7t} & e^{-5t} \end{bmatrix}. \]
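A small numerical companion to this example (Python with numpy; the choice $t = 0.3$ is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 12.0], [3.0, 1.0]])
lams, V = np.linalg.eig(A)
print(lams)   # eigenvalues 7 and -5 (the order numpy returns may vary)
print(V)      # columns proportional to (2, 1) and (-2, 1)

# Phi(t) has columns e^{lambda_j t} v_j; check the matrix ODE Phi' = A Phi.
t = 0.3
Phi = V * np.exp(lams * t)            # scales column j by e^{lambda_j t}
Phi_prime = V * (lams * np.exp(lams * t))
print(np.allclose(Phi_prime, A @ Phi))   # True
```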

Consider next the case of a real matrix with distinct eigenvalues, some of which are complex. We wish to find real solutions.

Suppose $\lambda = \alpha + i\beta$ is an e-value with e-vector $v = v_1 + i v_2$. Then

\[ e^{\lambda t} v \]

is a complex solution of (LH). We obtain two real solutions by writing

\[ y(t) = e^{\alpha t} (\cos(\beta t) + i \sin(\beta t))(v_1 + i v_2) = e^{\alpha t} (\cos(\beta t) v_1 - \sin(\beta t) v_2) + i e^{\alpha t} (\sin(\beta t) v_1 + \cos(\beta t) v_2) \]

and taking

\[ y_1(t) = e^{\alpha t} (\cos(\beta t) v_1 - \sin(\beta t) v_2), \qquad y_2(t) = e^{\alpha t} (\sin(\beta t) v_1 + \cos(\beta t) v_2). \]


Example 3.4.4. The characteristic equation for the system

\[ y' = Ay = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{bmatrix} y \]

is

\[ \varphi(\lambda) = |A - \lambda I| = (1 - \lambda)\bigl( (1 - \lambda)^2 + 1 \bigr) = 0. \]

The e-values are $\lambda_1 = 1$ and $\lambda = 1 \pm i$. For $\lambda_1 = 1$ we see

\[ (A - I)v = 0 \;\Longrightarrow\; v = c \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad c \in \mathbb{C}. \]

If $\lambda = 1 + i$, then

\[ (A - (1 + i)I)v = \begin{bmatrix} -i & 0 & 0 \\ 0 & -i & -1 \\ 0 & 1 & -i \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0, \]

and so we may take $v_1 = 0$ and $i v_2 = -v_3$, i.e., $v_2 = i$, $v_3 = 1$, or

\[ v = \begin{bmatrix} 0 \\ i \\ 1 \end{bmatrix}. \]

We get a solution

\[ y(t) = e^t (\cos(t) + i \sin(t)) \begin{bmatrix} 0 \\ i \\ 1 \end{bmatrix}. \]

Two linearly independent real solutions are

\[ y_2(t) = \Re(y(t)) = e^t \begin{bmatrix} 0 \\ -\sin(t) \\ \cos(t) \end{bmatrix}, \qquad y_3(t) = \Im(y(t)) = e^t \begin{bmatrix} 0 \\ \cos(t) \\ \sin(t) \end{bmatrix}. \]


A fundamental matrix is given by

\[ \Phi(t) = e^t \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(t) & -\sin(t) \\ 0 & \sin(t) & \cos(t) \end{bmatrix}. \]

These techniques leave open the question of what to do in the case of repeated roots. For this reason, and for theoretical reasons, we introduce the concept of the exponential of a matrix.

Recall that for a real number $t$,

\[ e^t = \sum_{n=0}^{\infty} \frac{t^n}{n!} = 1 + t + \frac{t^2}{2} + \frac{t^3}{3!} + \cdots, \]

and this series converges absolutely for all values of $t$. If $A$ is an $n \times n$ real or complex matrix, we define

\[ e^{At} = \sum_{j=0}^{\infty} \frac{1}{j!} t^j A^j = I + At + \frac{A^2 t^2}{2!} + \frac{A^3 t^3}{3!} + \cdots. \]

To see that $e^{At}$ is well defined, let

\[ S_k(t) = I + At + \cdots + \frac{A^k t^k}{k!}. \]

Then for any $T > 0$ and $t \in [-T, T]$,

\[ \| S_m(t) - S_p(t) \| = \Bigl\| \sum_{k=p+1}^{m} \frac{A^k t^k}{k!} \Bigr\| \leq \sum_{k=p+1}^{m} \frac{\|A\|^k T^k}{k!}, \]

where we may assume $p < m$ without loss of generality. This can be made arbitrarily small by choosing $p$ large. Hence the series converges: if we view the set of $n \times n$ matrices as elements of $\mathbb{K}^{n \times n}$, the result follows since $\mathbb{K}^{n \times n}$ is complete.
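The rapid convergence of the partial sums is easy to see numerically. A sketch (Python with scipy; the rotation generator below is an illustrative choice whose exponential is known in closed form):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # e^{At} is the rotation by angle t
t = 1.0

S = np.eye(2)
term = np.eye(2)
for k in range(1, 20):            # build the partial sum S_19(t)
    term = term @ (A * t) / k     # term is now A^k t^k / k!
    S = S + term

print(np.round(S, 8))
print(np.round(expm(A*t), 8))     # [[cos t, sin t], [-sin t, cos t]]; matches
```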

Theorem 3.4.5. The matrix exponential satisfies the following properties.

1. For the zero matrix $0$, $e^0 = I$.

2. If $B$ commutes with $A$, then $B$ commutes with $e^{At}$. In particular, $A$ commutes with $e^{At}$. If $B$ commutes with $A$, then $e^B$ commutes with $e^{At}$.

3. The matrix valued function $S(t) = e^{At}$ is differentiable, with

\[ \frac{d}{dt} e^{At} = A e^{At} = e^{At} A. \]

4. If $A$ and $B$ commute, then $e^{(A+B)t} = e^{At} e^{Bt} = e^{Bt} e^{At}$.

5. For real numbers $t$ and $s$, $e^{A(t+s)} = e^{At} e^{As}$.

6. For any matrix $A$, $e^{At}$ is invertible and $(e^{At})^{-1} = e^{-At}$.

7. If $P$ is invertible, then $e^{P^{-1}AP\,t} = P^{-1} e^{At} P$.

Proof. We will slightly alter the order of proof.

1) The first statement is obvious on setting $t = 0$ in the infinite series and noting that $A^0 = I$ and $0! = 1$.

2) By induction we readily see that $BA = AB$ implies $BA^j = A^j B$ for all $j = 1, 2, \cdots$, which implies $S_k(t) B = B S_k(t)$. So given $\varepsilon > 0$ we can choose $K > 0$ so that for $k > K$ we have $\| S_k(t) - S(t) \| \leq \varepsilon / (2\|B\|)$ for all $t \in [-T, T]$, and we can write

\[ \| B S(t) - S(t) B \| \leq \| B S(t) - B S_k(t) \| + \| S_k(t) B - S(t) B \| = \| B(S(t) - S_k(t)) \| + \| (S_k(t) - S(t)) B \| \leq 2 \|B\| \, \| S_k(t) - S(t) \| \leq \varepsilon. \]

Since $\varepsilon$ is arbitrary, we are done.

3) We first recall a result from "baby reals": if $\{u_j\} \subset C^1[a, b]$ and both $\sum_{j=1}^{\infty} u_j(t)$ and $\sum_{j=1}^{\infty} u_j'(t)$ converge uniformly on $[a, b]$, then

\[ \Bigl( \sum_{j=1}^{\infty} u_j(t) \Bigr)' = \sum_{j=1}^{\infty} u_j'(t). \]


Now we need only note that

\[ S_k'(t) = \sum_{j=0}^{k} \frac{j A^j t^{j-1}}{j!} = A \sum_{j=1}^{k} \frac{A^{j-1} t^{j-1}}{(j-1)!} = A \sum_{j=0}^{k-1} \frac{A^j t^j}{j!} = A S_{k-1}(t). \]

Since $A$ is constant and $S_{k-1}(t)$ converges uniformly, we see that $S_k'(t)$ converges uniformly, and we can apply the above result to conclude

\[ S'(t) = \lim_{k \to \infty} S_k'(t) = \lim_{k \to \infty} A S_{k-1}(t) = A \lim_{k \to \infty} S_{k-1}(t) = A S(t) = S(t) A. \]

6) To show that $S(t)$ is invertible and to compute the inverse we use 5) and 1). Namely, $I = S(0) = S(t - t) = S(t) S(-t)$, which implies that $S(t)$ is invertible and its inverse satisfies $(S(t))^{-1} = S(-t)$.

5) To prove that $S(t_1 + t_2) = S(t_1) S(t_2)$, we let $t_2 = s$ be fixed and consider $t = t_1$ as a variable. Then we define $X(t) = e^{As} e^{At}$ and $Y(t) = e^{A(t+s)}$. By part 3) we have

\[ X'(t) = A X(t), \quad X(0) = e^{As}, \qquad Y'(t) = A Y(t), \quad Y(0) = e^{As}. \]

So by the fundamental existence and uniqueness theorem, $X(t) = Y(t)$ for all $t$.

4) ($\Rightarrow$) Let $Y(t) = e^{(A+B)t}$ and $Z(t) = e^{At} e^{Bt}$. Note that $Y(0) = Z(0) = I$, so we need only show that $Y$ and $Z$ satisfy the same linear ordinary differential equation.

First we note that

\[ Y'(t) = (A + B) Y(t) \]

is obvious, and then by the product rule we have

\[ Z'(t) = A e^{At} e^{Bt} + e^{At} B e^{Bt}. \]

But it was proved in 2) that $AB = BA$ implies $e^{At} B = B e^{At}$, so we have

\[ Z'(t) = (A + B) Z(t) \]

and we are done.

($\Leftarrow$) Conversely, assume that $Y(t) = Z(t)$ for all $t$. Then $Y'(t) = Z'(t)$, which gives

\[ (A + B) e^{At} e^{Bt} = (A + B) e^{(A+B)t} = Y'(t) = Z'(t) = A e^{At} e^{Bt} + e^{At} B e^{Bt}. \]


This in turn implies

\[ B e^{At} e^{Bt} = e^{At} B e^{Bt}. \]

Now multiply by $e^{-Bt}$ on the right and use 6) to get

\[ B e^{At} = e^{At} B, \]

which we differentiate with respect to $t$ and then set $t = 0$ to obtain

\[ B A e^{At} \Bigr|_{t=0} = A e^{At} B \Bigr|_{t=0} \;\Rightarrow\; BA = AB. \]

(Thus the product formula holds for all $t$ if and only if $A$ and $B$ commute.)

7) If we let $B = P^{-1} A P$, we want to show that $\exp(Bt) = P^{-1} \exp(At) P$. It is easy to see that $B^j = P^{-1} A^j P$, and so

\[ \sum_{j=0}^{k} \frac{B^j t^j}{j!} = \sum_{j=0}^{k} \frac{P^{-1} A^j P\, t^j}{j!} = P^{-1} \Bigl( \sum_{j=0}^{k} \frac{A^j t^j}{j!} \Bigr) P, \]

and passing to the limit on each side we arrive at

\[ e^{Bt} = \sum_{j=0}^{\infty} \frac{B^j t^j}{j!} = P^{-1} \Bigl( \sum_{j=0}^{\infty} \frac{A^j t^j}{j!} \Bigr) P = P^{-1} \exp(At) P. \]
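Property 7 is easy to test numerically. A sketch (Python with scipy; $A$ and $P$ are random illustrative choices, and a random $P$ is invertible with probability one):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))
t = 0.7

# e^{P^{-1} A P t}  versus  P^{-1} e^{At} P
lhs = expm(np.linalg.inv(P) @ A @ P * t)
rhs = np.linalg.inv(P) @ expm(A * t) @ P
print(np.allclose(lhs, rhs))    # True
```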

Theorem 3.4.6. $\Phi(t) = e^{At}$ is a fundamental matrix for (LH).

Proof. Since

\[ \Phi' = A e^{At} = A \Phi, \]

it is a solution matrix. Since $\det(\Phi(0)) = 1$, by Abel's formula $\det(\Phi(t)) \neq 0$ for all $t$, and hence by Theorem 3.3.5 it is fundamental.

Hence every solution of (LH) when $A$ is constant is given by

\[ y(t) = e^{At} c \]

for a suitably chosen constant vector $c$; in fact, plugging in $t = 0$ we see that $c = y(0)$. Recall that the nonhomogeneous problem is solved by the variation of parameters formula

\[ y' = Ay + B, \; y(t_0) = y_0 \;\Rightarrow\; y(t) = \Phi(t) \Phi^{-1}(t_0) y_0 + \int_{t_0}^{t} \Phi(t) \Phi^{-1}(s) B(s)\,ds. \]


In the case where $A$ is constant, the variation of parameters formula becomes

\[ y(t) = e^{At} e^{-At_0} y_0 + \int_{t_0}^{t} e^{At} e^{-As} B(s)\,ds, \]

or

\[ y(t) = e^{A(t - t_0)} y_0 + \int_{t_0}^{t} e^{A(t - s)} B(s)\,ds. \]

We now turn to the computation of $e^{At}$.

Definition 3.4.7. An $n \times n$ matrix $A$ is diagonalizable if and only if there exists a nonsingular matrix $P$ such that

\[ P^{-1} A P = D \equiv \operatorname{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n). \]

Here the $\{\lambda_j\}$ are the eigenvalues of $A$.

Note that if $A$ is diagonalizable then $AP = PD$, and this can be written, in terms of columns, as

\[ \bigl[ AP_1 \;\; AP_2 \;\; \cdots \;\; AP_n \bigr] = \bigl[ \lambda_1 P_1 \;\; \lambda_2 P_2 \;\; \cdots \;\; \lambda_n P_n \bigr], \]

which implies that not only are the $\{\lambda_j\}$ the eigenvalues of $A$, but also the columns $P_j$ of $P$ are the associated eigenvectors. Notice also that since $P$ is nonsingular, its columns are linearly independent, which implies that the eigenvectors are linearly independent. Also, by part 7) of Theorem 3.4.5 we have

\[ e^{At} = e^{PDP^{-1} t} = P e^{Dt} P^{-1}. \]

Thus to compute $e^{At}$ we first consider the case of $e^{Dt}$ with $D$ diagonal.

Suppose $D$ is diagonal, say

\[ D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}. \]

Then

\[ e^{Dt} = I + \begin{bmatrix} \lambda_1 t & & 0 \\ & \ddots & \\ 0 & & \lambda_n t \end{bmatrix} + \begin{bmatrix} \frac{\lambda_1^2 t^2}{2!} & & 0 \\ & \ddots & \\ 0 & & \frac{\lambda_n^2 t^2}{2!} \end{bmatrix} + \cdots = \begin{bmatrix} 1 + \lambda_1 t + \frac{\lambda_1^2 t^2}{2!} + \cdots & & 0 \\ & \ddots & \\ 0 & & 1 + \lambda_n t + \frac{\lambda_n^2 t^2}{2!} + \cdots \end{bmatrix} = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}. \]


So if $A$ is diagonalizable, then

\[ P^{-1} A P = D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \]

and so

\[ e^{At} = P \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix} P^{-1} = \bigl[ e^{\lambda_1 t} P_1 \;\; \cdots \;\; e^{\lambda_n t} P_n \bigr] P^{-1} = \Phi(t) P^{-1}, \]

where $\Phi(t)$ is the matrix constructed in (3.4.3).

In the special case of a normal matrix (e.g., a selfadjoint or symmetric matrix), for which the eigenvectors are orthonormal so that $P^{-1} = P^*$, we see that

\[ e^{At} y_0 = \bigl[ e^{\lambda_1 t} P_1 \;\; \cdots \;\; e^{\lambda_n t} P_n \bigr] P^* y_0 = \bigl[ e^{\lambda_1 t} P_1 \;\; \cdots \;\; e^{\lambda_n t} P_n \bigr] \begin{bmatrix} P_1^* y_0 \\ P_2^* y_0 \\ \vdots \\ P_n^* y_0 \end{bmatrix} = \bigl[ e^{\lambda_1 t} P_1 \;\; \cdots \;\; e^{\lambda_n t} P_n \bigr] \begin{bmatrix} \langle y_0, P_1 \rangle \\ \langle y_0, P_2 \rangle \\ \vdots \\ \langle y_0, P_n \rangle \end{bmatrix} = \sum_{j=1}^{n} e^{\lambda_j t} \langle y_0, P_j \rangle P_j. \]

Thus we can conclude:


Proposition 3.4.8. If A is an n × n matrix with a set of n orthonormal eigenvectors {vj} and eigenvalues {λj}, then the solution of

y′ = Ay,  y(0) = y0

is

y(t) = e^{At}y0 = Σ_{j=1}^{n} e^{λjt} ⟨y0, vj⟩ vj.
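A short numerical illustration of the proposition, assuming numpy and scipy are available (the symmetric matrix and data are invented for the example): for a real symmetric A, numpy's eigh returns orthonormal eigenvectors, and the spectral sum should match scipy's expm.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric, so eigenvectors are orthonormal
y0 = np.array([1.0, -1.0])
t = 0.7

lam, V = np.linalg.eigh(A)  # eigenvalues and orthonormal eigenvector columns

# y(t) = sum_j e^{lambda_j t} <y0, v_j> v_j
y_spec = sum(np.exp(lam[j] * t) * np.dot(y0, V[:, j]) * V[:, j] for j in range(len(lam)))

print(np.allclose(y_spec, expm(A * t) @ y0))  # True
```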

For the nonhomogeneous problem

y′ = Ay + B,  y(0) = y0,

the variation of parameters formula gives

y(t) = e^{At}y0 + ∫_0^t e^{A(t−s)}B(s) ds = Σ_{j=1}^{n} e^{λjt}⟨y0, vj⟩vj + Σ_{j=1}^{n} vj ∫_0^t e^{λj(t−s)} ⟨B(s), vj⟩ ds.

Unfortunately, as we know, even the simple example A = [0 1; 0 0] shows that a matrix need not be diagonalizable. This means it does not have a complete set of linearly independent eigenvectors. For this reason we need a more general procedure to compute e^{At}. One method is the following algorithm.¹

Theorem 3.4.9. Let A be an n × n constant matrix. Let P(λ) = det(A − λI) be the characteristic polynomial of A. Let r1, ..., rn denote the solutions of the scalar constant coefficient linear equation P(D)r = 0 that satisfy the following initial conditions:

r1(0) = 1, r1′(0) = 0, ..., r1^{(n−1)}(0) = 0;
r2(0) = 0, r2′(0) = 1, ..., r2^{(n−1)}(0) = 0;
...
rn(0) = 0, rn′(0) = 0, ..., rn^{(n−1)}(0) = 1.        (3.4.4)

Then,

e^{At} = r1(t)I + r2(t)A + r3(t)A² + ··· + rn(t)A^{n−1}.        (3.4.5)

¹ I. E. Leonard, “The matrix exponential,” SIAM Review, Vol. 38, No. 3, 507–512, September 1996.


Before proving the theorem, we give some examples. First, let's develop some computational techniques. The roots of the characteristic polynomial of A are, of course, the eigenvalues of A. We only need to know the roots of the characteristic polynomial and their multiplicities to find the general solution of P(D)r = 0. So, suppose that s1, ..., sn form a fundamental set of solutions for this equation. Any solution r can be written in the form r = c1s1 + ··· + cnsn for constants c1, ..., cn.

Formally, we could write this as

r = [s1  s2  ···  sn] [c1; c2; ⋮; cn] = Sc,

where S is the row vector of fundamental solutions.

If we want r to satisfy the initial conditions r(0) = γ1, r′(0) = γ2, ..., r^{(n−1)}(0) = γn, we need to find the cj's by solving the equations

c1 s1^{(j)}(0) + c2 s2^{(j)}(0) + ··· + cn sn^{(j)}(0) = γ_{j+1},   j = 0, ..., n − 1.

We can write this in matrix form as

Wc = γ,

where W is the matrix

[s1(0)        s2(0)        ...  sn(0)
 s1′(0)       s2′(0)       ...  sn′(0)
 ⋮            ⋮                 ⋮
 s1^{(n−1)}(0)  s2^{(n−1)}(0)  ...  sn^{(n−1)}(0)],

i.e., the Wronskian matrix at 0. (We write W rather than S here, since S already denotes the row vector of fundamental solutions.) Of course, we are using the notation

c = [c1; c2; ⋮; cn],   γ = [γ1; γ2; ⋮; γn],

so γ is the vector of initial conditions.

Suppose that we want to solve several initial value problems with right hand side vectors β1, ..., βk. If rj is the solution with initial conditions βj, then rj = Scj, where cj is the solution of Wcj = βj. If we introduce the matrices

C = [c1  c2  ···  ck],   B = [β1  β2  ···  βk],


we can combine the equations Wcj = βj into the single matrix equation WC = B. The solution is C = W^{-1}B. Thus, we can summarize the solutions rj as

[r1  r2  ···  rk] = SC,

where C = W^{-1}B. In the case of the collection of initial value problems (3.4.4), the right hand side vectors are the standard basis vectors e1, e2, ..., en, and so B is the identity matrix. Thus, the solutions r1, ..., rn required in the theorem are given by

[r1  r2  ···  rn] = SW^{-1}.
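This recipe is easy to carry out symbolically. The sketch below assumes sympy is available; the fundamental solutions are hard-coded for a double root λ = 2 (the situation of the next example), though the recipe itself is general.

```python
import sympy as sp

t = sp.symbols('t')
# Row vector of fundamental solutions for (D - 2)^2 r = 0.
S = sp.Matrix([[sp.exp(2*t), t*sp.exp(2*t)]])

# Wronskian matrix at t = 0: row j holds the j-th derivatives of the s_i at 0.
W = sp.Matrix(2, 2, lambda j, i: S[i].diff(t, j).subs(t, 0))

r = sp.expand(S * W.inv())  # the row vector [r_1, r_2]
print(r)  # r_1 = e^{2t} - 2t e^{2t},  r_2 = t e^{2t}
```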

Example 3.4.10. Construct e^{At} and find the solution of y′ = Ay, y(0) = y0, where

A = [3 −1; 1 1],   y0 = [2; 1].

The characteristic polynomial is P(λ) = det(A − λI) = λ² − 4λ + 4 = (λ − 2)². Thus, 2 is a root of multiplicity two. To find the eigenvectors, we calculate the kernel of

A − 2I = [1 −1; 1 −1].

This matrix is row equivalent to the matrix

[1 −1; 0 0].

Thus, the λ = 2 eigenspace is spanned by the vector [1; 1]. It follows that A is not diagonalizable.

To apply the algorithm, we need to solve the initial value problem (D² − 4D + 4)r = 0 for the two sets of initial conditions r1(0) = 1, r1′(0) = 0 and r2(0) = 0, r2′(0) = 1. A fundamental set of solutions for this equation is S = [e^{2t}, te^{2t}]. The general Wronskian matrix is

[e^{2t}  te^{2t}; 2e^{2t}  e^{2t} + 2te^{2t}]

and the Wronskian matrix at 0 is

W = [1 0; 2 1].


We compute that

W^{-1} = [1 0; −2 1].

Thus, we have

[r1  r2] = SW^{-1} = [e^{2t}, te^{2t}] [1 0; −2 1],

or, to put it another way,

r1 = e^{2t} − 2te^{2t},
r2 = te^{2t}.

Thus, according to the theorem,

e^{At} = r1I + r2A = (e^{2t} − 2te^{2t}) [1 0; 0 1] + te^{2t} [3 −1; 1 1] = [e^{2t} + te^{2t}  −te^{2t}; te^{2t}  e^{2t} − te^{2t}].

The solution of the initial value problem is y = e^{At}y0, or

y = [e^{2t} + te^{2t}  −te^{2t}; te^{2t}  e^{2t} − te^{2t}] [2; 1] = [2e^{2t} + te^{2t}; e^{2t} + te^{2t}].
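A quick numerical spot check of this closed form, assuming numpy and scipy are available (not part of the original example):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[3.0, -1.0], [1.0, 1.0]])
t = 1.0
e2t = np.exp(2 * t)
closed_form = np.array([[e2t + t * e2t, -t * e2t],
                        [t * e2t, e2t - t * e2t]])
print(np.allclose(closed_form, expm(A * t)))  # True
```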

Example 3.4.11. Find e^{At}, where

A = [1 1 0; 0 1 0; −1 2 2].

The characteristic polynomial is P(λ) = det(A − λI) = (1 − λ)²(2 − λ). Thus, 1 is a root of multiplicity two and 2 is a root of multiplicity one. A fundamental set of solutions for the equation P(D)r = 0 is e^t, te^t, e^{2t}. The general Wronskian matrix is

[e^t  te^t  e^{2t}; e^t  (1 + t)e^t  2e^{2t}; e^t  (2 + t)e^t  4e^{2t}],


so the Wronskian matrix at t = 0 is

W = [1 0 1; 1 1 2; 1 2 4].

Thus, we compute that

[r1  r2  r3] = SW^{-1} = [e^t  te^t  e^{2t}] [0 2 −1; −2 3 −1; 1 −2 1] = [−2te^t + e^{2t}   (2 + 3t)e^t − 2e^{2t}   (−1 − t)e^t + e^{2t}].

By the theorem,

e^{At} = (−2te^t + e^{2t})I + ((2 + 3t)e^t − 2e^{2t})A + ((−1 − t)e^t + e^{2t})A²

= (−2te^t + e^{2t}) [1 0 0; 0 1 0; 0 0 1] + ((2 + 3t)e^t − 2e^{2t}) [1 1 0; 0 1 0; −1 2 2] + ((−1 − t)e^t + e^{2t}) [1 2 0; 0 1 0; −3 5 4]

= [e^t  te^t  0; 0  e^t  0; e^t − e^{2t}  (−1 + t)e^t + e^{2t}  e^{2t}].
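For comparison, a computer algebra system reproduces this matrix exactly. A minimal sketch assuming sympy, whose Matrix.exp method computes matrix exponentials in closed form:

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 1, 0], [0, 1, 0], [-1, 2, 2]])
print(sp.simplify((A * t).exp()))  # matches the matrix displayed above
```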

Example 3.4.12. Find e^{At}, where

A = [1 −1 0; 2 3 0; 0 −1 1].

The characteristic polynomial is P(λ) = det(A − λI) = (1 − λ)(λ² − 4λ + 5). The roots are 1, 2 + i, 2 − i. Since A is real, e^{At} must be real. Thus, it makes sense to


use a real fundamental set of solutions for P(D)r = 0. A fundamental set of solutions is S = [e^t  e^{2t}cos(t)  e^{2t}sin(t)]. The Wronskian matrix of this at t = 0 is

W = [1 1 0; 1 2 1; 1 3 4].

Thus, the vector of the r's is

[r1  r2  r3] = [(5/2)e^t − (3/2)e^{2t}cos(t) + (1/2)e^{2t}sin(t)   −2e^t + 2e^{2t}cos(t) − e^{2t}sin(t)   (1/2)e^t − (1/2)e^{2t}cos(t) + (1/2)e^{2t}sin(t)].

Thus, by the theorem,

e^{At} = r1I + r2A + r3A²

= [e^{2t}cos(t) − e^{2t}sin(t)   −e^{2t}sin(t)   0; 2e^{2t}sin(t)   e^{2t}cos(t) + e^{2t}sin(t)   0; −e^t + e^{2t}cos(t) − e^{2t}sin(t)   −e^{2t}sin(t)   e^t].

We now turn to the proof of the theorem. The basic ingredient is the Cayley-Hamilton theorem. Suppose that A is an n × n matrix and let P(λ) = det(A − λI) be the characteristic polynomial of A. We can write

P(λ) = αnλ^n + αn−1λ^{n−1} + ··· + α1λ + α0,

where it's not hard to show that αn = (−1)^n. It makes sense to plug A into the polynomial P(λ), namely

P(A) = αnA^n + ··· + α1A + α0I.

The Cayley-Hamilton Theorem says that P(A) = 0. A brief proof of the Cayley-Hamilton Theorem is given in Section 3.5.
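A quick numerical illustration of the Cayley-Hamilton Theorem, assuming numpy is available. Note that numpy's poly returns the coefficients of det(λI − A), which is (−1)^n det(A − λI); it has the same roots as P and annihilates A exactly when P does.

```python
import numpy as np

A = np.random.rand(4, 4)
coeffs = np.poly(A)  # coefficients of det(lambda I - A), highest power first
n = len(coeffs) - 1
P_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.allclose(P_A, 0))  # True, up to roundoff
```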

The next ingredient of the proof of the theorem is the following lemma.

Lemma 3.4.13. Let A be an n × n matrix (real or complex) and let P(λ) = Σ αjλ^j be the characteristic polynomial of A. Consider the nth-order differential equation for an n × n matrix valued function t ↦ Φ(t),

αnΦ^{(n)}(t) + αn−1Φ^{(n−1)}(t) + ··· + α1Φ′(t) + α0Φ(t) = 0.        (3.4.6)

Then, the unique solution Φ of (3.4.6) subject to the initial conditions

Φ(0) = I,  Φ′(0) = A,  Φ″(0) = A²,  ...,  Φ^{(n−1)}(0) = A^{n−1}        (3.4.7)

is Φ(t) = e^{At}.


Proof. Consider the uniqueness of the solution first. So, suppose that Φ1 and Φ2 are solutions of (3.4.6) subject to the initial conditions (3.4.7). Then Ψ = Φ2 − Φ1 is a solution of (3.4.6) which satisfies the initial conditions

Ψ(0) = 0,  Ψ′(0) = 0,  Ψ″(0) = 0,  ...,  Ψ^{(n−1)}(0) = 0.

But this means that each entry ψij(t) of Ψ(t) is a solution of the scalar nth-order differential equation

αny^{(n)}(t) + αn−1y^{(n−1)}(t) + ··· + α1y′(t) + α0y(t) = P(D)y = 0

which satisfies the conditions

y(0) = 0,  y′(0) = 0,  y″(0) = 0,  ...,  y^{(n−1)}(0) = 0.

Thus, each entry of Ψ is identically zero. This shows that Φ1 = Φ2.

To show that Φ(t) = e^{At} is a solution, note that

Φ^{(k)}(t) = A^k e^{At}.        (3.4.8)

Thus, Φ^{(k)}(0) = A^k, so this function Φ satisfies the initial conditions (3.4.7). If we plug Φ into the left hand side of (3.4.6), using (3.4.8), we get

αnA^n e^{At} + αn−1A^{n−1} e^{At} + ··· + α1A e^{At} + α0 e^{At} = P(A)e^{At} = 0,

since P(A) = 0 by the Cayley-Hamilton Theorem. Thus, Φ(t) = e^{At} is a solution of (3.4.6) with initial conditions (3.4.7).

We can now complete the proof of Theorem 3.4.9. By the last Lemma, all we need to do is to show that the expression

Φ(t) = Σ_{j=0}^{n−1} r_{j+1}(t)A^j

given in the theorem satisfies the differential equation (3.4.6) and the initial conditions (3.4.7). We have

Φ^{(k)}(t) = Σ_{j=0}^{n−1} r_{j+1}^{(k)}(t)A^j.        (3.4.9)

Thus, we have

Φ^{(k)}(0) = Σ_{j=0}^{n−1} r_{j+1}^{(k)}(0)A^j = A^k,


since the rj's satisfy the initial conditions (3.4.4). The differential equation is

Σ_{k=0}^{n} αkΦ^{(k)}(t) = 0.

Substituting (3.4.9) into the left hand side, we have

Σ_{k=0}^{n} αkΦ^{(k)}(t) = Σ_{k=0}^{n} αk Σ_{j=0}^{n−1} r_{j+1}^{(k)}(t)A^j = Σ_{j=0}^{n−1} [ Σ_{k=0}^{n} αk r_{j+1}^{(k)}(t) ] A^j.

But each of the coefficients

Σ_{k=0}^{n} αk r_{j+1}^{(k)}(t),   j = 0, ..., n − 1,

is zero, because each rj is a solution of the scalar equation

P(D)r = Σ_{k=0}^{n} αk r^{(k)} = 0.

This completes the proof of the theorem.

In the next chapter we will establish some estimates on how fast e^{At} grows as t goes to infinity.


Exercises for Chapter 3

1. Prove Corollary 2.3.1 (in the notes). Use Thm. 2.3.4.

2. Prove the special case of Gronwall's Inequality when p(t) = k and f2(t) = δ are constant. That is, show that in this case one gets f1(t) ≤ δe^{k|t−a|}.

3. Suppose that u, v are linearly independent and continuous on an interval I. Suppose that w is defined on I and has only finitely many zeros. Show that wu, wv are linearly independent on I. Show that the result fails if u, v are not continuous.

4. Show that the solution of the initial value problem

ay″ + by′ + cy = g(t),  y(t0) = 0,  y′(t0) = 0,

where a, b, c are constants, has the form

y = φ(t) = ∫_{t0}^{t} K(t − s)g(s) ds.

The function K depends only on the linearly independent solutions y1, y2 of the corresponding homogeneous equation and is independent of the inhomogeneous term g. Note also that K depends on t and s only through the combination t − s and hence is actually a function of a single variable. Think of g as the input and φ(t) as the output. The result shows that the output depends on the input over the entire interval from the initial time t0 to the current time t. The integral is called the convolution of K and g.

5. Find the general solution.

a) y″ + y = sin x sin 2x.

b) y″ − 4y′ + 4y = e^x + e^{2x}.

c) y‴ + y″ + y′ = 1 + cos(√3 x/2).

6. Find e^{At} for

a) A = [0 1; −1 0]

b) A = [0 1; 8 −2]

c) A = [0 0 0; 1 0 0; 1 0 1]

7. Find e^{At} for


a) A = [2 1; 0 2]

b) A = [4 5; −4 −4]

c) A = [2 1 0; 0 2 1; 0 0 2]

d) A = [2 0 0; 0 2 1; 0 0 2]

e) A = [−1 1 0; 0 2 1; 0 0 2]

8. Solve the equation x′ = Ax + B with x(0) = x0.

(a) A = [0 1; 8 −2] and B = 0, x0 = [2; 3]

(b) A = [0 1; −9 6] and B = [0; t], x0 = [1; 1]

(c) A = [0 1; 8 −2] and B = [0; 4], x0 = [1; 2]


3.5 Appendix: The Cayley-Hamilton Theorem

Some readers will not have seen a proof of the Cayley-Hamilton Theorem, and many references where you might look it up go farther into the structure theory of linear transformations than is necessary for us before they get to the Cayley-Hamilton Theorem. In this section, we give a concise, self-contained proof of the Cayley-Hamilton Theorem.

It is convenient to approach the problem in terms of linear transformations. Suppose that V is a vector space over K (= R or C). Let T : V → V be a linear transformation.

If v1, ..., vn is an ordered basis of V, we can find a matrix A = [aij] that represents T with respect to this basis. The matrix A can be described as

T(vj) = Σ_{i=1}^{n} aij vi,   j = 1, ..., n,

i.e., the jth column of A gives the coefficients of T(vj) with respect to our given basis.

i.e., the jth column of A gives the coefficients of T (vj) with respect to our given basis.Suppose that u1, . . . , un is another basis, and let B be the matrix of T with respect to

this basis. There is a nonsingular matrix P such that

uj =n∑i=1

pijvi, j = 1, . . . , n.

It is then pretty easy to calculate that

B = P−1AP.

Now, note that

det(B) = det(P−1AP ) = det(P−1) det(A) det(P ) = det(P )−1 det(A) det(P ) = det(A).

Thus, we may define det(T ) by det(T ) = det(A), where A is the matrix representation of Twith respect to some basis—we get the same number no matter which basis we use.

There are standard algebraic operations defined on the set L(V) of linear transformations V → V, namely

(ST)(v) = S(T(v)),
(S + T)(v) = S(v) + T(v),
(kT)(v) = kT(v),  k ∈ K.

If Q(λ) = anλ^n + ··· + a1λ + a0 is a polynomial with coefficients in K, we define

Q(T) = anT^n + an−1T^{n−1} + ··· + a1T + a0I,


where I is the identity linear transformation on V. The algebraic operations on matrices are defined precisely to correspond to the algebraic operations on linear transformations. Thus, if we choose a basis for V and A is the matrix of T with respect to this basis, the matrix of Q(T) with respect to this basis is

Q(A) = anA^n + an−1A^{n−1} + ··· + a1A + a0I,

where I is the identity matrix.

Suppose that V is a complex vector space and T : V → V is a linear transformation. The function PT : C → C defined by PT(z) = det(T − zI) makes sense. If we choose a basis of V and let A be the matrix of T with respect to this basis, we have PT(z) = det(A − zI). Thus, PT(z) is a polynomial, which we call the characteristic polynomial of T. We see that PT(z) is the same as the characteristic polynomial PA(z) = det(A − zI) of any matrix representation A of T. If we show that PT(T) = 0, it will follow that PA(A) = 0 for every matrix representation A of T (since PA(A) is the matrix representation of PT(T)).

On the other hand, every matrix A with complex entries (which includes the case of real entries) is the matrix representation of some linear transformation on a complex vector space, for example, T : C^n → C^n : v ↦ Av. Thus, if we show that PT(T) = 0 for every linear transformation, it will follow that PA(A) = 0 for every matrix A.

Thus, it will suffice to prove the following version of the Cayley-Hamilton Theorem.

Theorem 3.5.1 (Cayley-Hamilton). Let V be a finite dimensional vector space over C and let T : V → V be a linear transformation. Then, if P(λ) = det(T − λI) is the characteristic polynomial of T, P(T) = 0.

The proof will occupy the remainder of this section. To begin, let n be the dimension of V. We make the following definition: a fan for T is a collection {Vj}, j = 1, ..., n, of subspaces of V with the following properties:

V1 ⊂ V2 ⊂ ··· ⊂ Vn−1 ⊂ Vn = V,

dim(Vj) = j,

T(Vj) ⊂ Vj,   j = 1, ..., n.

The main step in the proof is the following lemma.

Lemma 3.5.2. If T : V → V is a linear transformation on a finite dimensional complex vector space V, there exists a fan for T.

Proof of Lemma. The proof is by induction on the dimension of V. The lemma is trivially true in the case where the dimension of V is one (the fan is just {V}).

So, suppose that the lemma is true when the dimension of the vector space is n − 1. Assume that T : V → V, where V has dimension n.


Since we are working over the complex numbers, T must have an eigenvector (the characteristic polynomial has at least one complex root). Thus, there is a non-zero vector v1 and a complex number λ such that T(v1) = λv1. Let V1 be the one dimensional subspace of V spanned by v1.

We can find a subspace W ⊂ V of dimension n − 1 such that

V = V1 ⊕ W.        (3.5.10)

One way to see this is to note that the set {v1} is linearly independent. Any linearly independent set can be completed to a basis, so we can find vectors {wj}_{j=1}^{n−1} such that v1, w1, ..., wn−1 is a basis of the n-dimensional space V. We can then take W to be the span of {wj}_{j=1}^{n−1}. The direct sum decomposition (3.5.10) means that every v ∈ V can be written as v = ζv1 + w for a unique scalar ζ ∈ C and vector w ∈ W.

Let P : V → V and Q : V → V be the projections onto V1 and W respectively. In other words, if v ∈ V, write v = ζv1 + w, where w ∈ W. Then

P(v) = P(ζv1 + w) = ζv1,
Q(v) = Q(ζv1 + w) = w.

It's easy to check that P and Q are linear transformations, and clearly v = P(v) + Q(v). Since Q(V) ⊂ W, surely QT(W) ⊂ W. Thus, we may restrict QT to W and get a linear transformation W → W. By the induction hypothesis, there is a fan for this linear transformation. Thus, we have subspaces

W1 ⊂ W2 ⊂ ··· ⊂ Wn−1 = W

such that dim(Wk) = k and QT(Wk) ⊂ Wk. Now define subspaces V2, ..., Vn by

V2 = V1 + W1
V3 = V1 + W2
⋮
Vn = V1 + Wn−1

(V1 is already defined). Thus, Vj = V1 + Wj−1 for j = 2, ..., n and Vn = V1 + W = V. From the direct sum decomposition (3.5.10), it is clear that these sums are direct and so dim(Vj) = j. To prove that {Vj} is a fan for T, it remains to prove that T(Vj) ⊂ Vj for each j.

In the case of V1, this is clear because v1 is an eigenvector of T. An arbitrary element of V1 is of the form ζv1. Then T(ζv1) = λζv1 ∈ V1.


For j > 1, a typical element v of Vj can be written as v = ζv1 + w, where w ∈ Wj−1. We need to show that T(v) ∈ Vj. We have T(v) = PT(v) + QT(v). Certainly PT(v) ∈ V1 ⊂ Vj, since P is the projection onto V1. For QT(v), we have

QT(v) = QT(ζv1 + w) = Q(ζT(v1) + T(w)) = ζλQ(v1) + QT(w) = QT(w),

since Q(v1) = 0. But Wj−1 is invariant under QT, so QT(v) = QT(w) ∈ Wj−1 ⊂ Vj. Thus, Vj is invariant under T.

To proceed with the proof of the theorem, suppose that T : V → V where V has dimension n, and let {Vj} be a fan for T.

We construct a basis of V as follows. Choose any non-zero v1 in the one dimensional space V1. Obviously V1 = span{v1}.

Now, since V1 ⊂ V2 and the dimension of V2 is 2 (which is greater than the dimension of V1), we can find a vector v2 ∈ V2 that is not in V1. We claim that the collection v1, v2 is linearly independent. To see this, suppose that c1v1 + c2v2 = 0. If c2 = 0, then c1 must be zero because v1 ≠ 0. On the other hand, if we had c2 ≠ 0, we would have v2 = (−c1/c2)v1, which implies that v2 is in the subspace V1. This contradiction shows that we must have c2 = 0, and so also c1 = 0. Since v1, v2 ∈ V2 and V2 has dimension 2, we must have span{v1, v2} = V2.

Suppose that we have constructed linearly independent vectors {vj}_{j=1}^{k}, where k < n, such that

Vj = span{v1, ..., vj},   j = 1, ..., k.

To extend this set, choose a vector vk+1 ∈ Vk+1 \ Vk. We claim that v1, ..., vk, vk+1 is linearly independent. To see this, suppose that

c1v1 + ··· + ckvk + ck+1vk+1 = 0.

If ck+1 = 0, we must have c1 = 0, ..., ck = 0, since we have assumed that v1, ..., vk are linearly independent. But, if we had ck+1 ≠ 0, we could write

vk+1 = (−c1/ck+1)v1 − ··· − (ck/ck+1)vk.

This implies that vk+1 ∈ span{v1, ..., vk} = Vk, which contradicts the choice of vk+1. Thus, we conclude that all of the cj's are zero, and so v1, ..., vk+1 is linearly independent. Since span{v1, ..., vk+1} ⊂ Vk+1 and the dimensions are equal, we must have Vk+1 = span{v1, ..., vk+1}.


Continuing in this way, we get a basis v1, ..., vn of V such that

Vk = span{v1, ..., vk},   k = 1, ..., n.

Let's see what the matrix of T is with respect to this basis. Since v1 ∈ V1 and V1 is invariant under T, we must have T(v1) ∈ V1 = span{v1}. Thus, the expansion of T(v1) in terms of the basis is

T(v1) = a11v1

for a scalar a11. In general, we have vk ∈ Vk and Vk is invariant under T, so T(vk) ∈ Vk = span{v1, ..., vk}, and so the expansion of T(vk) is

T(vk) = a1kv1 + a2kv2 + ··· + akkvk.        (3.5.11)

Thus, the matrix A of T with respect to this basis is of the form

[a11  a12  a13  ...  a1,n−1    a1n
 0    a22  a23  ...  a2,n−1    a2n
 0    0    a33  ...  a3,n−1    a3n
 ⋮    ⋮    ⋮    ⋱    ⋮         ⋮
 0    0    0    ...  an−1,n−1  an−1,n
 0    0    0    ...  0         ann].

In other words, the matrix is upper triangular. Since it's easy to compute the determinant of an upper triangular matrix, we see that the characteristic polynomial of T (which is the same as the characteristic polynomial of A) is given by

P(λ) = (a11 − λ)(a22 − λ) ··· (ann − λ)

and so

P(T) = (a11I − T)(a22I − T) ··· (annI − T).

To complete the proof, we will show by induction on k that

(a11I − T) ··· (akkI − T)Vk = 0        (3.5.12)

for k = 1, ..., n. The case k = n then shows that P(T)Vn = P(T)V = 0, so P(T) = 0.

For k = 1, we see that (a11I − T) is zero on V1, because every vector in V1 is an eigenvector of T with eigenvalue a11.


For the induction step, assume that (3.5.12) holds for some k < n. An arbitrary vector v ∈ Vk+1 can be written as v = c1v1 + ··· + ckvk + αvk+1 = αvk+1 + u, where u = c1v1 + ··· + ckvk ∈ Vk. Then, we have

T(v) = αT(vk+1) + T(u).

Since Vk is invariant under T, T(u) is some vector in Vk, call it u′. If we apply (3.5.11), we have

T(v) = αak+1,k+1vk+1 + α[ Σ_{j=1}^{k} aj,k+1vj ] + u′.

The term in brackets is in Vk = span{v1, ..., vk} and u′ ∈ Vk, so we have

T(v) = αak+1,k+1vk+1 + u″,

where u″ ∈ Vk. Thus, we have

ak+1,k+1v − T(v) = ak+1,k+1αvk+1 + ak+1,k+1u − αak+1,k+1vk+1 − u″ = ak+1,k+1u − u″ ∈ Vk.

Thus, we have shown that (ak+1,k+1I − T)Vk+1 ⊂ Vk. But then

(a11I − T) ··· (akkI − T)(ak+1,k+1I − T)Vk+1 ⊂ (a11I − T) ··· (akkI − T)Vk,

and the right hand side is 0 by the induction hypothesis (3.5.12).

This completes the proof of the Cayley-Hamilton Theorem.
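As a closing remark, the triangularizing basis produced by the fan is, in numerical terms, what the complex Schur factorization A = ZTZ* computes. A hedged illustration assuming scipy is available, using the matrix of Example 3.4.12:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, -1.0, 0.0],
              [2.0, 3.0, 0.0],
              [0.0, -1.0, 1.0]])
T, Z = schur(A, output='complex')  # A = Z T Z*, T upper triangular, Z unitary
print(np.allclose(np.tril(T, -1), 0))  # True: T is upper triangular
print(np.diag(T))  # eigenvalues 1, 2+i, 2-i on the diagonal (in some order)
```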

