
Explicit Solutions of Linear Matrix Equations. Author: Peter Lancaster. Source: SIAM Review, Vol. 12, No. 4 (Oct. 1970), pp. 544-566. Published by: Society for Industrial and Applied Mathematics. Stable URL: http://www.jstor.org/stable/2028490


SIAM REVIEW, Vol. 12, No. 4, October 1970

EXPLICIT SOLUTIONS OF LINEAR MATRIX EQUATIONS*

PETER LANCASTER†

1. Introduction. Let $A_1, A_2, \ldots, A_k$ be $m \times m$ complex matrices (members of $\mathbb{C}^{m\times m}$) and $B_1, B_2, \ldots, B_k \in \mathbb{C}^{n\times n}$. In this paper we review some techniques for obtaining explicit representations for a solution matrix $X \in \mathbb{C}^{m\times n}$ for equations of the form

(1) $A_1XB_1 + A_2XB_2 + \cdots + A_kXB_k = C$,

where $C \in \mathbb{C}^{m\times n}$ is given, and such a solution is known to exist. Certain special cases of this equation arise frequently in applications and will be considered in some detail. The problem in its full generality is far from tractable, although the transformation to a matrix-vector equation, which we shall discuss in a moment, allows us to use the considerable arsenal of numerical weapons currently available for the solution of such problems.

Our objective is to bring together some results concerning solutions of our equation and present them in a connected treatment which in parts allows us to improve the theory and in others allows us to make simplifications. In particular, we wish to present and make available to a wider audience some of the elegant results of Krein [7]. By presenting these in a matrix context we can significantly abbreviate some of the theory.

2. The equivalent matrix-vector equation. If $A \in \mathbb{C}^{m\times m}$ and $B \in \mathbb{C}^{n\times n}$, then the direct product (or tensor product) of A and B, written $A \otimes B$, is defined to be the partitioned matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B \\ a_{21}B & a_{22}B & \cdots & a_{2m}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mm}B \end{bmatrix} \in \mathbb{C}^{mn\times mn}.$$

It must be assumed that the reader has some familiarity with this concept. An account of the uses and applications of this operation can be found in [8, Chap. 8] or in MacDuffee [10]. With any matrix D of $\mathbb{C}^{m\times n}$ we associate a column vector $d \in \mathbb{C}^{mn}$ whose elements are just those of D written out in a particular order. We start with $d_{11}$, work across the first row, then the second, and so on, down to the mth row when all the elements of D have been exhausted.

With these definitions it is a routine matter to verify that if $G = A_1 \otimes B_1' + \cdots + A_k \otimes B_k'$ (the prime denotes transposition), then (1) is equivalent to

(2) $Gx = c$.

In fact, by viewing (1) as a set of mn scalar equations for the elements of X, (2) is precisely the same set of equations written out in a different way.
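As a concrete illustration of this equivalence, the following minimal sketch (with assumed random example data) checks numerically that the row-by-row vectorization turns equation (1) into the system (2) with $G = \sum_j A_j \otimes B_j'$:

```python
import numpy as np

# Minimal sketch (assumed example data): equation (1), sum_j A_j X B_j = C,
# is the same linear system as (2), G x = c, when G = sum_j A_j (kron) B_j'
# and x, c are the row-by-row vectorizations of X and C.
rng = np.random.default_rng(0)
m, n, k = 3, 4, 2
A = [rng.standard_normal((m, m)) for _ in range(k)]
B = [rng.standard_normal((n, n)) for _ in range(k)]
X = rng.standard_normal((m, n))

C = sum(A[j] @ X @ B[j] for j in range(k))         # left-hand side of (1)
G = sum(np.kron(A[j], B[j].T) for j in range(k))   # matrix G of (2)
x, c = X.ravel(), C.ravel()                         # row-wise vectorization

assert np.allclose(G @ x, c)                        # (1) and (2) agree
```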

* Received by the editors August 22, 1969, and in revised form February 16, 1970.

† Department of Mathematics, Faculty of Arts and Sciences, University of Calgary, Calgary 44, Alberta, Canada. This work was supported by the National Research Council of Canada.


However, in order to make pronouncements about existence, uniqueness and techniques for the solution of (2), we generally need some information about the eigenvalues of G. We call the set of complex numbers which are eigenvalues of a matrix M the spectrum of M and denote this set by $\sigma(M)$.

In the general case it is difficult to say anything very useful about $\sigma(G)$, even if we know $\sigma(A_j)$ and $\sigma(B_j)$ for $j = 1, 2, \ldots, k$. Indeed, we cannot say when $0 \in \sigma(G)$; that is, when G will be singular. Our first special case, then, is one which still admits some generality, but allows us to determine $\sigma(G)$ in terms of $\sigma(A_j)$ and $\sigma(B_j)$, $j = 1, 2, \ldots, k$. We consider the case in which each $A_j$ is a scalar polynomial in a fixed matrix $A \in \mathbb{C}^{m\times m}$ and each $B_j$ is a scalar polynomial in a fixed $B \in \mathbb{C}^{n\times n}$.

Equation (1) then takes the form

(3) $\sum_{j,k=0} \alpha_{jk} A^jXB^k = C$,

where the $\alpha_{jk}$ are complex numbers. If we define the polynomial in two scalar variables $\phi(x, y) = \sum \alpha_{jk}x^jy^k$, then we define the polynomial $\phi$ with matrix arguments by

$$\phi(A; B) = \sum_{j,k=0} \alpha_{jk} A^j \otimes B^k.$$

Then we see that (3) is equivalent to an equation of the form (2) with $G = \phi(A; B')$. The remarkable thing about this class of problems is the following result.

THEOREM 1. The spectrum of the matrix $\phi(A; B)$ is the set of all numbers of the form $\phi(\lambda, \mu)$, where $\lambda \in \sigma(A)$, $\mu \in \sigma(B)$.

The reader is referred to [8] or [10] for a proof of this important theorem. Thus, given $\sigma(A)$ and $\sigma(B)$ we can immediately say, for example, that G is singular if and only if $\phi(\lambda, \mu) = 0$ for some eigenvalues $\lambda$, $\mu$ of A, B, respectively.
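Theorem 1 is easy to check numerically. The sketch below uses the assumed example polynomial $\phi(x, y) = xy + 2x + 1$ and random matrices, and confirms that the eigenvalues of $\phi(A; B)$ are exactly the values $\phi(\lambda, \mu)$:

```python
import numpy as np

# Hedged check of Theorem 1 for the assumed polynomial phi(x, y) = x*y + 2x + 1:
# the spectrum of phi(A; B) consists exactly of the values phi(lambda, mu).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
Im, In = np.eye(3), np.eye(4)

phi_AB = np.kron(A, B) + 2 * np.kron(A, In) + np.kron(Im, In)
lam, mu = np.linalg.eigvals(A), np.linalg.eigvals(B)
predicted = np.sort_complex(np.array([l * m + 2 * l + 1 for l in lam for m in mu]))
computed = np.sort_complex(np.linalg.eigvals(phi_AB))
assert np.allclose(predicted, computed)
```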

3. Special cases. A special case which will occupy us is the much studied equation

(4) AX + XB =C,

in which case $\phi(\lambda, \mu) = \lambda + \mu$, $\phi(A; B') = G = A \otimes I + I \otimes B'$ and $\sigma(G) = \{\lambda + \mu : \lambda \in \sigma(A), \mu \in \sigma(B)\}$. A further important special case is obtained by putting $B = A^*$, the conjugate transpose of A. We then have m = n and seek an $X \in \mathbb{C}^{n\times n}$ for which

(5) $AX + XA^* = C$.

We shall refer to this as the Lyapunov equation and it is of particular importance because properties of a solution matrix X can sometimes be used to decide whether A is a stable matrix or not. (The matrix A is said to be stable if and only if all of its eigenvalues have negative real parts.) We mention also the special case of (4) in which $B = -A$ and $C = 0$. We then include the problem of finding all matrices X which commute with the given matrix A, i.e., for which $AX = XA$.
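For reference, both special cases (4) and (5) are handled by standard library solvers; the hedged sketch below (with assumed example data) simply confirms the defining equations, since the paper's concern is with explicit formulas rather than with these numerical routines:

```python
import numpy as np
from scipy import linalg

# Hedged sketch, with assumed example data: library routines for the special
# cases (4) and (5); the shift -4I is only there to make the matrices stable.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) - 4 * np.eye(4)
B = rng.standard_normal((4, 4)) - 4 * np.eye(4)
C = rng.standard_normal((4, 4))

X_sylv = linalg.solve_sylvester(A, B, C)            # equation (4): AX + XB = C
assert np.allclose(A @ X_sylv + X_sylv @ B, C)

X_lyap = linalg.solve_continuous_lyapunov(A, C)     # equation (5) with B = A*
assert np.allclose(A @ X_lyap + X_lyap @ A.conj().T, C)
```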

Another case arising frequently in practice can be derived by a transformation of (4). Suppose that a is a nonzero real number and let $f(z) = (z + a)(z - a)^{-1}$.


Define¹ $U = f(A)$ and $V = f(B)$. It is easily verified that (4) is equivalent to

$$(aI + A)X(aI + B) - (aI - A)X(aI - B) = 2aC.$$

If we now suppose that a is chosen so that $(aI - A)^{-1}$ and $(aI - B)^{-1}$ exist, then on pre- and postmultiplying by these matrices, respectively, we obtain

$$UXV - X = 2a(aI - A)^{-1}C(aI - B)^{-1}.$$

Observing that $f(z) - 1 = 2a(z - a)^{-1}$, we obtain

(6) $X - UXV = -\dfrac{1}{2a}(U - I)C(V - I)$.

Thus, with an a chosen as indicated, X satisfies (4) if and only if it satisfies (6).

The transformed equation is particularly useful in connection with the stability problem. Thus, suppose it is known that A and B are stable. The function f maps the imaginary axis in the z-plane ($\mathrm{Re}(z) = 0$) onto the unit circle, and the left half-plane ($\mathrm{Re}(z) < 0$) onto the interior of this circle. Thus, if A and B are stable, then $U = f(A)$ and $V = f(B)$ have all their eigenvalues inside the unit circle. If we define the spectral radius of a square matrix M, $\rho(M)$, to be the maximum of the moduli of the eigenvalues of M, then we have $\rho(U), \rho(V) < 1$.

Note that in the application of Theorem 1 to (6) we have $\phi(\lambda, \mu) = 1 - \lambda\mu$ and $G = \phi(U; V') = I - U \otimes V'$. Thus, when $\rho(U)\rho(V) < 1$, G is nonsingular, and (6) has a unique solution.

4. Remarks on existence and uniqueness theorems. For the general problem (1), the best we can do for existence and uniqueness results is to formulate them in terms of the equivalent matrix-vector problem (2). Thus, there exists a solution if and only if the vector c in (2) is in $\mathcal{R}(G)$, the range of G. Or, what is equivalent, the rank of the augmented matrix [G, c] is equal to the rank of G. A third formulation is: There exists a solution of (1) if and only if $y'c = 0$ for all vectors y in $\mathcal{N}(G')$, the null space of G', i.e., all y for which $y'G = 0'$. Finally, we have the uniqueness theorem: There exists a unique solution of (1) if and only if G is nonsingular.

We have noted that we cannot say in general when G will be singular and so it seems unlikely that these results can be related more directly to the spectral properties of the coefficient matrices, $A_j$ and $B_j$. In the case of (3) we can say when G will be singular, and hence when we have uniqueness. But in analyzing the problem completely and trying to characterize the set of all solutions, (4) seems to be the first tractable case, and even this is far from pretty. We shall not pursue this question. Our interest is in cases where explicit solutions can be written down in matrix form and hence primarily in the case of a unique solution, but also in the nonunique case when special assumptions will simplify the nature of the problem.

There are several good accounts of the general solution of (4) to be found in the literature, among which we mention one of the earlier accounts by Rutherford [15] which has been reformulated by Ma [9], and the account given by Gantmacher [2]. Givens [3] has given a detailed discussion of the structure of $G = A \otimes I - I \otimes B$ in terms of its elementary divisors. Trampus goes further

¹ Refer to § 6 for more information on the definition of f(A), f(B).


and in [18] and [19] he shows how complete sets of generalized eigenvectors can be formed for operators of the form $L_1(X) = AX + XB$ (cf. (4)) and $L_2(X) = AXB$. He avoids our use of the direct product of matrices by observing that eigenelements (or vectors) of $L_1$ are matrices from $\mathbb{C}^{m\times n}$ of rank one. (This divergence in methods of attack follows from two distinct identifications of the tensor product of the spaces $\mathbb{C}^m$ and $\mathbb{C}^n$.)

Much the most elegant criterion for the existence of a solution of (4) in the case m = n was proved by Roth [14]. He proves that a solution exists if and only if the following partitioned $2n \times 2n$ matrices are similar:

$$\begin{bmatrix} A & C \\ 0 & -B \end{bmatrix}, \qquad \begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix}.$$

A simple, but significant, necessary condition for the existence of a solution of (3) can be formulated as follows.

THEOREM 2. A necessary condition for the existence of a solution of (3) is that, for every $\lambda_r \in \sigma(A)$ and $\mu_s \in \sigma(B)$ with $\phi(\lambda_r, \mu_s) = 0$, we have $a'Cb = 0$ for all $a \in \mathcal{N}(A' - \lambda_rI)$ and all $b \in \mathcal{N}(B - \mu_sI)$.

This is easily proved on multiplying (3) on the left by a', on the right by b and noting that $a'A^j = \lambda_r^ja'$, $B^kb = \mu_s^kb$. However, it is instructive to view this another way. We have noted that a necessary and sufficient condition for existence is that, whenever $\phi(\lambda_r, \mu_s) = 0$, we must have $y'c = 0$ for all $y \in \mathcal{N}(G')$. We claim that, in the above notation, $y = a \otimes b \in \mathcal{N}(G')$, for

$$(a \otimes b)'G = (a' \otimes b')\Big(\sum_{j,k}\alpha_{jk}A^j \otimes B^{k\prime}\Big) = \sum_{j,k}\alpha_{jk}(a'A^j) \otimes (B^kb)' = \phi(\lambda_r, \mu_s)(a \otimes b)' = 0.$$

Thus, when a solution exists, we must also have $(a \otimes b)'c = 0$, and this is another way of writing $a'Cb = 0$.

Using this approach we can easily see that if $d_1, d_2$ are the dimensions of $\mathcal{N}(A' - \lambda_rI)$ and $\mathcal{N}(B - \mu_sI)$, respectively, then the dimension of $\mathcal{N}(G')$ is not less than $\sum d_1d_2$, where the summation is over all pairs $\lambda_r, \mu_s$ for which $\phi(\lambda_r, \mu_s) = 0$. Unfortunately it can exceed this, so that not all the left eigenvectors of G are necessarily formed in this way.

A first insight into the difficulties of the problem is obtained on simplifying (3) by reducing A and B to Jordan normal form. Thus, let

$$A = SJ_1S^{-1}, \qquad B = TJ_2T^{-1},$$

where $J_1$ and $J_2$ are matrices in Jordan normal form. It is then a simple verification to see that (3) is equivalent to $\sum_{j,k}\alpha_{jk}J_1^j\tilde XJ_2^k = \tilde C$, where $\tilde X = S^{-1}XT$, $\tilde C = S^{-1}CT$. Then G takes the form

$$G = \phi(J_1; J_2') = \sum_{j,k}\alpha_{jk}J_1^j \otimes J_2^{k\prime}.$$

Thus, we see that G is considerably simplified if either A or B has only linear elementary divisors (in which case $J_1$ or $J_2$ is diagonal). If A and B both have only linear elementary divisors, then we may write $J_1 = \mathrm{diag}\{\lambda_1, \ldots, \lambda_m\}$,


$J_2 = \mathrm{diag}\{\mu_1, \ldots, \mu_n\}$, and G is diagonal with diagonal elements $\phi(\lambda_r, \mu_s)$, $r = 1, 2, \ldots, m$, $s = 1, 2, \ldots, n$. If, in addition, $\phi(\lambda_r, \mu_s) \neq 0$ for all r and s, then the unique solution is given by

$$\tilde x_{rs} = \frac{\tilde c_{rs}}{\phi(\lambda_r, \mu_s)}.$$
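For the special case (4), where $\phi(\lambda, \mu) = \lambda + \mu$, the diagonal-case recipe above can be sketched in a few lines. The example data are assumed, and it is assumed that no $\lambda_r + \mu_s$ vanishes for the sampled matrices:

```python
import numpy as np

# Hedged sketch of the diagonal-case formula for AX + XB = C (assumed example
# data): diagonalize A and B, divide the transformed C entrywise by
# lambda_r + mu_s, and transform back.
rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

lam, S = np.linalg.eig(A)                    # A = S J1 S^{-1}, J1 diagonal
mu, T = np.linalg.eig(B)                     # B = T J2 T^{-1}, J2 diagonal
C_t = np.linalg.solve(S, C @ T)              # C~ = S^{-1} C T
X_t = C_t / (lam[:, None] + mu[None, :])     # x~_rs = c~_rs / (lambda_r + mu_s)
X = S @ X_t @ np.linalg.inv(T)               # undo X~ = S^{-1} X T

assert np.allclose(A @ X + X @ B, C)
```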

5. A solution in series. We consider equations which can be written in the form

$$X - \sum_{j=1}^{k}A_jXB_j = C.$$

The corresponding vector equation is $x - Gx = c$, where $G = \sum_jA_j \otimes B_j'$, or $(I - G)x = c$. When $\det(I - G) \neq 0$ we have the unique solution $x = (I - G)^{-1}c$. Now it is well known [8, Theorem 7.1.1] that we may write

$$(I - G)^{-1} = I + G + G^2 + \cdots$$

if and only if $\rho(G) < 1$. Thus, in this case we have the series representation for x,

(7) $x = c + Gc + G^2c + \cdots$,

and the bound

(8) $\|x\| \le \dfrac{1}{1 - \nu}\|c\|$,

where $\nu = \|G\| < 1$. Here we use any vector norm together with its induced matrix norm (the Euclidean vector norm and spectral matrix norm, for example). Now the vector series (7) has a matrix representation

$$X = C + \sum_jA_jCB_j + \sum_{j,k}A_jA_kCB_kB_j + \cdots,$$

which is just the solution proposed by Wedderburn [20]. Wedderburn arrived at his series in the following way. Put $x = c + Gx$ in the second term of $x - Gx = c$ to obtain

$$x - G^2x = c + Gc.$$

Substitute in a similar way n − 1 times, and we get

$$x - G^3x = c + Gc + G^2c, \qquad \ldots, \qquad x - G^nx = c + Gc + \cdots + G^{n-1}c.$$

Thus, if $x_n$ is the nth partial sum of (7), we have

(9) $x - x_n = G^nx$,

and combining this with (8) we obtain the error estimate

$$\|x - x_n\| \le \frac{\nu^n}{1 - \nu}\|c\|.$$
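The Wedderburn series can be summed by the obvious fixed-point iteration. The hedged sketch below uses assumed example data, scaled so that $\rho(G) < 1$:

```python
import numpy as np

# Hedged sketch of the Wedderburn-type series: the partial sums of (7) are the
# fixed-point iterates X <- C + sum_j A_j X B_j. The example data are assumed
# and scaled so that rho(G) < 1.
rng = np.random.default_rng(4)
m, n, k = 3, 3, 2
A = [0.15 * rng.standard_normal((m, m)) for _ in range(k)]
B = [0.15 * rng.standard_normal((n, n)) for _ in range(k)]
C = rng.standard_normal((m, n))

X = C.copy()
for _ in range(100):                         # partial sums of the series
    X = C + sum(Aj @ X @ Bj for Aj, Bj in zip(A, B))

residual = X - sum(Aj @ X @ Bj for Aj, Bj in zip(A, B)) - C
assert np.linalg.norm(residual) < 1e-10
```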


In the application of the technique one problem is likely to be that of deciding when $\rho(G) < 1$. If we are in luck, we may be able to apply Theorem 1 to estimate $\rho(G)$. The important equation (6) is such a case, which we may write in the form

(10) $X - UXV = C_1$.

If $G = U \otimes V'$, then $G^\nu = U^\nu \otimes (V^\nu)'$. We see from Theorem 1 that $\rho(G) = \rho(U)\rho(V)$ and so we obtain the following theorem.

THEOREM 3. Equation (10) has a unique solution X with the series representation

$$X = \sum_{j=1}^{\infty}U^{j-1}C_1V^{j-1}$$

if and only if $\rho(U)\rho(V) < 1$.

When (10) is obtained by transformation from $AX + XB = C$, we can certainly use the series solution if it is known that A and B are stable. But note that we may have $\rho(V) \ge 1$ (B unstable) provided $\rho(U)\rho(V) < 1$ is still satisfied. This corresponds to the fact that X satisfies (4) if and only if

$$(A - \alpha I)X + X(B + \alpha I) = C.$$

Thus, with $\alpha > 0$ and A, B stable, we can arrange for $B + \alpha I$ to be unstable and still have the same solution matrix X. We refer to this as the "spectral shift" property of (4).

A simple a priori bound for X is easily found. Write $X = C_1 + UXV$ and, with any matrix norm, we have

$$\|X\| \le \|C_1\| + \|U\|\,\|X\|\,\|V\|.$$

If $\kappa = \|U\|\,\|V\| < 1$, we have

(11) $\|X\| \le (1 - \kappa)^{-1}\|C_1\|$.

In this case (9) yields $X - X_n = U^nXV^n$, and combining this with (11) we obtain the error estimates (in any matrix norm)

(12) $\|X - X_n\| \le \dfrac{\kappa^n}{1 - \kappa}\|C_1\|$.

COROLLARY (to Theorem 3). If $\|\cdot\|$ denotes any matrix norm for which $\kappa = \|U\|\,\|V\| < 1$, and we write

$$X_n = \sum_{j=1}^{n}U^{j-1}C_1V^{j-1},$$

then condition (12) obtains for $n = 1, 2, \ldots$.

Smith [17] discusses a technique for accelerating the convergence of the series in numerical practice.

We should note that (4) can generally be solved without first transforming to the form (10). In [6], Jameson describes a finite recursive technique for direct computation of the solution of (4) in the nonsingular case. This involves the inversion of a matrix of order min (m, n).
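Combining the transformation of § 3 with Theorem 3 gives a workable scheme for stable A and B. The sketch below is hedged: the data, the shift used to make the matrices stable, and the truncation at 200 terms are all assumptions of the example:

```python
import numpy as np

# Hedged sketch: Cayley-transform A, B (assumed stable example data) as in
# section 3, then sum the series of Theorem 3 for equation (6)/(10).
rng = np.random.default_rng(5)
n, a = 4, 1.0
A = 0.5 * rng.standard_normal((n, n)) - 2 * np.eye(n)
B = 0.5 * rng.standard_normal((n, n)) - 2 * np.eye(n)
C = rng.standard_normal((n, n))
I = np.eye(n)

U = (A + a * I) @ np.linalg.inv(A - a * I)          # U = f(A)
V = (B + a * I) @ np.linalg.inv(B - a * I)          # V = f(B)
C1 = -(U - I) @ C @ (V - I) / (2 * a)               # right-hand side of (6)

X, term = np.zeros_like(C), C1
for _ in range(200):                                 # partial sums of Theorem 3
    X, term = X + term, U @ term @ V
assert np.allclose(A @ X + X @ B, C)                 # X also solves (4)
```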


6. Results from spectral theory. We shall next discuss Krein's analysis of (3). In order to use his techniques we need some of the results of the spectral theory of operators as developed in generality by Dunford and Schwartz [1], for example, and in the present context by Lancaster [8]. We summarize here some of these results.

If a matrix $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$, define the index of $\lambda_k$, $k = 1, 2, \ldots, s$, to be the least positive integer, $m_k$, for which $\mathcal{N}(\lambda_kI - A)^{m_k} = \mathcal{N}(\lambda_kI - A)^{m_k+1}$. We then say that a complex-valued function f of a complex variable is defined on the spectrum of A if f and its first $m_k - 1$ derivatives exist at $\lambda_k$, $k = 1, 2, \ldots, s$. For such a function f the matrix f(A) is well-defined and there exist component matrices $Z_{kj}$, $k = 1, 2, \ldots, s$, $j = 1, 2, \ldots, m_k$, depending only on A, which are linearly independent members of $\mathbb{C}^{n\times n}$ and

(13) $f(A) = \displaystyle\sum_{k=1}^{s}\sum_{j=1}^{m_k}f_k^{(j-1)}Z_{kj}$,

where $f_k^{(j-1)}$ is the (j − 1)st derivative of f evaluated at $\lambda_k$. The component matrices are polynomials in A (and hence commute with A) and may also be expressed as contour integrals

(14) $Z_{kj} = \dfrac{1}{(j-1)!}\,\dfrac{1}{2\pi i}\displaystyle\oint_{C_k}(z - \lambda_k)^{j-1}R(z, A)\,dz$,

where $C_k$ is a sufficiently small circle with center $\lambda_k$ and $R(z, A)$ is the resolvent of A, $(Iz - A)^{-1}$. The reader will easily verify the resolvent equation

(15) $R(\lambda, A) - R(\mu, A) = (\mu - \lambda)R(\lambda, A)R(\mu, A)$.

The matrices $Z_{k1}$ are particularly important. They are idempotent, $Z_{k1}^2 = Z_{k1}$, and have the property

(16) $\mathcal{R}(Z_{k1}) = \mathcal{N}(\lambda_kI - A)^{m_k}$.

We also describe an idempotent matrix K as a projection onto $\mathcal{R}(K)$. Since $I = Z_{11} + Z_{21} + \cdots + Z_{s1}$ (deduce from (13)), it follows that (using the direct sum of subspaces)

$$\mathbb{C}^n = \mathcal{R}(Z_{11}) \oplus \cdots \oplus \mathcal{R}(Z_{s1}) = \sum_{k=1}^{s}\oplus\,\mathcal{R}(Z_{k1}).$$

The matrices $Z_{k1}$ are also orthogonal, in the sense that $Z_{j1}Z_{k1} = 0$ when $j \neq k$. Important special cases of (13) are:

(17) $A = \displaystyle\sum_{k=1}^{s}(\lambda_kZ_{k1} + Z_{k2})$,

(18) $R(\lambda, A) = \displaystyle\sum_{k=1}^{s}\sum_{j=1}^{m_k}\dfrac{(j-1)!}{(\lambda - \lambda_k)^j}Z_{kj}$.
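For a diagonalizable A all indices are 1, and the component matrices $Z_{k1}$ can be assembled directly from right and left eigenvectors. The following hedged sketch (assumed random example data) checks the resolution of the identity, formula (17), and idempotency:

```python
import numpy as np

# Hedged sketch (assumed example data): for a diagonalizable A the component
# matrices Z_k1 are the eigenprojections, built from right eigenvectors and
# the rows of the inverse eigenvector matrix (left eigenvectors).
rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
lam, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)
Z = [np.outer(S[:, k], S_inv[k, :]) for k in range(4)]

assert np.allclose(sum(Z), np.eye(4))                        # I = Z_11 + ... + Z_s1
assert np.allclose(sum(l * Zk for l, Zk in zip(lam, Z)), A)  # (17), all Z_k2 = 0
assert all(np.allclose(Zk @ Zk, Zk) for Zk in Z)             # idempotent
```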

A contour in the complex plane will always mean a finite number of rectifiable Jordan curves oriented in the usual sense. If $\Gamma$ is a contour, $\mathring\Gamma$ denotes the interior of $\Gamma$ and $\bar\Gamma = \Gamma \cup \mathring\Gamma$, the union of $\Gamma$ and its interior.


Now let $\Gamma$ be a contour with no member of $\sigma(A)$ on $\Gamma$ ($\Gamma \cap \sigma(A) = \emptyset$, the empty set). The operator

$$P = \frac{1}{2\pi i}\oint_\Gamma R(z, A)\,dz$$

is a projection and depends only on the poles of $R(z, A)$ in $\mathring\Gamma$, that is, on the members of $\sigma(A)$ in $\mathring\Gamma$. If $\mathring\Gamma \cap \sigma(A) = \sigma_1$, we refer to P as the projection defined by A and $\sigma_1$. In fact, we have

$$P = \sum_kZ_{k1}, \qquad \mathcal{R}(P) = \sum_k\oplus\,\mathcal{R}(Z_{k1}),$$

summed over those k for which $\lambda_k \in \sigma_1$. The operator I − P is also a projection and $\mathbb{C}^n = \mathcal{R}(P) \oplus \mathcal{R}(I - P)$. Note also that AP = PA.

If f is continuous on $\Gamma$ and regular in $\mathring\Gamma$, then we have, more generally,

(19) $Pf(A) = f(A)P = \dfrac{1}{2\pi i}\displaystyle\oint_\Gamma f(z)R(z, A)\,dz$.

Note that this integral is well-defined even though f(A) itself may not be defined as a result of singularities in f at points of the spectrum of A outside $\Gamma$. To allow for such cases we may also write $Pf(A) = f_P(A)$.

Finally, we note that if P is any projection (idempotent) and $x \in \mathcal{R}(P)$, then Px = x. To see this, observe that $x \in \mathcal{R}(P)$ implies that there is a y with x = Py. Multiplying by P and putting $P^2 = P$, we get

$$Px = P^2y = Py = x.$$

7. The theorem of Krein. We first need a preliminary result.

LEMMA. Let $A \in \mathbb{C}^{n\times n}$, let $F_1, F_2$ be regular functions in a neighborhood N of $\sigma(A)$ with values in $\mathbb{C}^{l\times n}$, $\mathbb{C}^{n\times m}$ respectively, and let $\Gamma$ be a contour in the interior of N with $\sigma(A) \cap \Gamma = \emptyset$. Then

(20) $\left[\dfrac{1}{2\pi i}\displaystyle\oint_\Gamma F_1(\lambda)R(\lambda, A)\,d\lambda\right]\left[\dfrac{1}{2\pi i}\displaystyle\oint_\Gamma R(\lambda, A)F_2(\lambda)\,d\lambda\right] = \dfrac{1}{2\pi i}\displaystyle\oint_\Gamma F_1(\lambda)R(\lambda, A)F_2(\lambda)\,d\lambda$.

Proof. Choose a new contour $\Gamma_1$ with $\Gamma \subset \mathring\Gamma_1$, $\bar\Gamma_1 \subset N$ and $\mathring\Gamma_1 \cap \sigma(A) = \mathring\Gamma \cap \sigma(A)$. Then write

$$\frac{1}{2\pi i}\oint_\Gamma R(\lambda, A)F_2(\lambda)\,d\lambda = \frac{1}{2\pi i}\oint_{\Gamma_1}R(\mu, A)F_2(\mu)\,d\mu.$$


Using the resolvent equation (15) we manipulate the left-hand side of (20) as follows:

$$\left[\frac{1}{2\pi i}\oint_\Gamma F_1(\lambda)R(\lambda, A)\,d\lambda\right]\left[\frac{1}{2\pi i}\oint_{\Gamma_1}R(\mu, A)F_2(\mu)\,d\mu\right] = -\frac{1}{4\pi^2}\oint_\Gamma\oint_{\Gamma_1}F_1(\lambda)R(\lambda, A)R(\mu, A)F_2(\mu)\,d\mu\,d\lambda$$

$$= \frac{1}{4\pi^2}\oint_\Gamma\oint_{\Gamma_1}F_1(\lambda)\,\frac{R(\lambda, A) - R(\mu, A)}{\lambda - \mu}\,F_2(\mu)\,d\mu\,d\lambda.$$

From our definition of $\Gamma_1$, we see that $F_2(\mu)/(\lambda - \mu)$ is a regular function of $\mu$ in $\mathring\Gamma_1$ save for a simple pole at $\mu = \lambda$, and that $F_1(\lambda)/(\lambda - \mu)$ is regular in $\mathring\Gamma$. Hence

$$\oint_{\Gamma_1}\frac{F_2(\mu)}{\lambda - \mu}\,d\mu = -2\pi iF_2(\lambda), \qquad \oint_\Gamma\frac{F_1(\lambda)}{\lambda - \mu}\,d\lambda = 0,$$

and the above expression reduces to

$$\frac{1}{2\pi i}\oint_\Gamma F_1(\lambda)R(\lambda, A)F_2(\lambda)\,d\lambda,$$

which is the right-hand side of (20). We can now prove the main result concerning (3). We first recall the definition

$$\phi(x, y) = \sum_{j,k}\alpha_{jk}x^jy^k.$$

THEOREM 4 (M. G. Krein). Let $A \in \mathbb{C}^{m\times m}$, $B \in \mathbb{C}^{n\times n}$, $C \in \mathbb{C}^{m\times n}$ and suppose that there exist $\sigma_1 \subset \sigma(A)$ and $\sigma_2 \subset \sigma(B)$ such that $\phi(\lambda, \mu) \neq 0$ for $\lambda \in \sigma_1$, $\mu \in \sigma_2$. If $P_1, P_2$ are the projections defined by A, $\sigma_1$ and B, $\sigma_2$ respectively, and if $P_1CP_2 = C$, then (3) has the solution

(21) $X = -\dfrac{1}{4\pi^2}\displaystyle\oint_{\Gamma_1}\oint_{\Gamma_2}\dfrac{R(\lambda, A)\,C\,R(\mu, B)}{\phi(\lambda, \mu)}\,d\mu\,d\lambda$,

where $\Gamma_1, \Gamma_2$ are contours containing and sufficiently close to $\sigma_1, \sigma_2$, respectively.

Proof. The proof is by verification. From (19) we have

$$A^jP_1 = \frac{1}{2\pi i}\oint_{\Gamma_1}\lambda^jR(\lambda, A)\,d\lambda, \qquad P_2B^k = \frac{1}{2\pi i}\oint_{\Gamma_2}\mu^kR(\mu, B)\,d\mu.$$


Then, putting C = P1CP2 in (21), we have

$$A^jXB^k = A^jP_1\left[-\frac{1}{4\pi^2}\oint_{\Gamma_1}\oint_{\Gamma_2}\frac{R(\lambda, A)CR(\mu, B)}{\phi(\lambda, \mu)}\,d\mu\,d\lambda\right]P_2B^k$$

$$= \left[\frac{1}{2\pi i}\oint_{\Gamma_1}\lambda^jR(\lambda, A)\,d\lambda\right]\left[\frac{1}{2\pi i}\oint_{\Gamma_1}R(\lambda, A)C\left(\frac{1}{2\pi i}\oint_{\Gamma_2}\frac{R(\mu, B)}{\phi(\lambda, \mu)}\,d\mu\right)d\lambda\right]\left[\frac{1}{2\pi i}\oint_{\Gamma_2}\mu^kR(\mu, B)\,d\mu\right].$$

Since $\phi(\lambda, \mu) \neq 0$ for $\lambda \in \sigma_1$, $\mu \in \sigma_2$, we can choose $\Gamma_1, \Gamma_2$ so close to $\sigma_1, \sigma_2$ that the integral in round brackets exists and is a regular function of $\lambda$ on $\Gamma_1$. We may then apply the lemma to the first two square brackets and obtain

$$A^jXB^k = \left[\frac{1}{2\pi i}\oint_{\Gamma_1}\lambda^jR(\lambda, A)C\left(\frac{1}{2\pi i}\oint_{\Gamma_2}\frac{R(\mu, B)}{\phi(\lambda, \mu)}\,d\mu\right)d\lambda\right]\left[\frac{1}{2\pi i}\oint_{\Gamma_2}\mu^kR(\mu, B)\,d\mu\right]$$

$$= \frac{1}{2\pi i}\oint_{\Gamma_2}\left(\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{\lambda^jR(\lambda, A)}{\phi(\lambda, \mu)}\,d\lambda\right)CR(\mu, B)\mu^k\,d\mu,$$

having used the lemma a second time. Thus

$$A^jXB^k = -\frac{1}{4\pi^2}\oint_{\Gamma_1}\oint_{\Gamma_2}\frac{\lambda^j\mu^k}{\phi(\lambda, \mu)}R(\lambda, A)CR(\mu, B)\,d\mu\,d\lambda$$

and

$$\sum_{j,k}\alpha_{jk}A^jXB^k = -\frac{1}{4\pi^2}\oint_{\Gamma_1}\oint_{\Gamma_2}R(\lambda, A)CR(\mu, B)\,d\mu\,d\lambda = \left(\frac{1}{2\pi i}\oint_{\Gamma_1}R(\lambda, A)\,d\lambda\right)C\left(\frac{1}{2\pi i}\oint_{\Gamma_2}R(\mu, B)\,d\mu\right) = P_1CP_2 = C.$$

Remark 1. The hypothesis P1CP2 = C is equivalent to (I - P1)C = 0 and C(I - P2) = 0.

To see this we multiply $P_1CP_2 = C$ on the left by $P_1$ and on the right by $P_2$ to obtain $P_1C = C$ and $CP_2 = C$, whence $(I - P_1)C = 0$ and $C(I - P_2) = 0$. Conversely, substituting $C = P_1C$ into $C = CP_2$, we get $P_1CP_2 = C$.

Remark 2. The solution (21) also satisfies the hypothesis on C:

(22) X = P1XP2.

Remark 3. If $\sigma_1 = \sigma(A)$, $\sigma_2 = \sigma(B)$, then $\phi(\lambda, \mu) \neq 0$ for $\lambda \in \sigma(A)$, $\mu \in \sigma(B)$, and X is the unique solution of (3). Note that in this case $P_1 = P_2 = I$.

In general X is the unique solution of the pair of equations (3) and (22).


Remark 4. Using the functional notation introduced at the end of § 6, we may write

$$\frac{1}{2\pi i}\oint_{\Gamma_1}\frac{R(\lambda, A)}{\phi(\lambda, \mu)}\,d\lambda = \phi_{P_1}(A, \mu I)^{-1}.$$

Hence

(23) $X = \dfrac{1}{2\pi i}\displaystyle\oint_{\Gamma_2}\phi_{P_1}(A, \mu I)^{-1}CR(\mu, B)\,d\mu$.

Similarly, we may write

(24) $X = \dfrac{1}{2\pi i}\displaystyle\oint_{\Gamma_1}R(\lambda, A)C\,\phi_{P_2}(\lambda I, B)^{-1}\,d\lambda$.

8. The singular case. The most useful applications of Theorem 4 will doubtless be in the nonsingular case, when $\phi(\lambda, \mu) \neq 0$ for all $\lambda \in \sigma(A)$ and $\mu \in \sigma(B)$. However, it also provides us with sufficient conditions for the existence of a solution in the singular case. Let us investigate this case a little more closely. Let $\tau_1$ be the complement of $\sigma_1$ in $\sigma(A)$ and $\tau_2$ be the complement of $\sigma_2$ in $\sigma(B)$. Assume that, in addition to the hypothesis of Theorem 4, for each $\lambda \in \tau_1$, $\phi(\lambda, \mu) = 0$ for some $\mu \in \tau_2$, and conversely. Then $Q_1 = I - P_1$ is the projection defined by A, $\tau_1$ and $Q_2 = I - P_2$ is the projection defined by B, $\tau_2$, and in Remark 1 above we have seen that if $Q_1C = CQ_2 = 0$, then X is a solution.

Now, if x is a right eigenvector of B with eigenvalue $\mu_k \in \tau_2$, then $x \in \mathcal{R}(Q_2)$. (Note that $Q_2 = \sum Z_{k1}$ summed over those k with $\mu_k \in \tau_2$, and also (16).) Hence $Q_2x = x$ and $CQ_2 = 0$ implies that Cx = 0. Thus, $CQ_2 = 0$ implies that Cx = 0 for all right eigenvectors x of the eigenvalues of B in $\tau_2$. Similarly, it can be shown that $Q_1C = 0$ implies that $y'C = 0'$ for all left eigenvectors y of the eigenvalues of A in $\tau_1$.

Comparing this with Theorem 2 we see that either of the conditions CQ2 = 0 or Q1C = 0 alone is stronger than the necessary condition for existence given there.

9. The equation AX + XB = C. In this case (21) reduces to

(25) $X = -\dfrac{1}{4\pi^2}\displaystyle\oint_{\Gamma_1}\oint_{\Gamma_2}\dfrac{R(\lambda, A)CR(\mu, B)}{\lambda + \mu}\,d\mu\,d\lambda$.

Using (23) and (24) we may also write

(26) $X = \dfrac{1}{2\pi i}\displaystyle\oint_{\Gamma_2}(\mu I + A)_{P_1}^{-1}C(\mu I - B)^{-1}\,d\mu = \dfrac{1}{2\pi i}\displaystyle\oint_{\Gamma_1}(\lambda I - A)^{-1}C(\lambda I + B)_{P_2}^{-1}\,d\lambda$.

There is an interesting case in which X can be expressed as a real, improper integral. We follow the notation of the last theorem.

THEOREM 5 (M. G. Krein). If $\mathrm{Re}(\lambda + \mu) < 0$ for all $\lambda \in \sigma_1$ and $\mu \in \sigma_2$, and if $P_1CP_2 = C$, then

(27) $X = -\displaystyle\int_0^\infty e^{At}Ce^{Bt}\,dt$

is a solution of AX + XB = C.


Proof. With the proper choice of $\Gamma_1$ and $\Gamma_2$ we will have $\mathrm{Re}(\lambda + \mu) < 0$ for all $\lambda \in \Gamma_1$, $\mu \in \Gamma_2$, and in (25) we may write

$$\frac{1}{\lambda + \mu} = -\int_0^\infty e^{(\lambda + \mu)t}\,dt.$$

We then have

$$X = \frac{1}{4\pi^2}\int_0^\infty\left(\oint_{\Gamma_1}e^{\lambda t}R(\lambda, A)\,d\lambda\right)C\left(\oint_{\Gamma_2}e^{\mu t}R(\mu, B)\,d\mu\right)dt.$$

Using (19) we obtain

$$X = -\int_0^\infty e^{At}P_1CP_2e^{Bt}\,dt = -\int_0^\infty e^{At}Ce^{Bt}\,dt.$$

Note, in particular, that for eigenvalues of A with $\mathrm{Re}(\lambda) < 0$ and of B with $\mathrm{Re}(\mu) < 0$ we have $\mathrm{Re}(\lambda + \mu) < 0$. Thus, if A and B are stable, then $P_1 = P_2 = I$ and (27) is then the unique solution.
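For stable A and B the integral (27) can also be evaluated numerically; the hedged sketch below (assumed example data, and an assumed truncation of the improper integral at t = 50) checks it against equation (4):

```python
import numpy as np
from scipy import linalg
from scipy.integrate import quad_vec

# Hedged numerical check of (27) (assumed example data; the improper integral
# is truncated at t = 50, which is ample here since the integrand decays
# exponentially for stable A and B).
rng = np.random.default_rng(7)
n = 3
A = rng.standard_normal((n, n)) - 3 * np.eye(n)
B = rng.standard_normal((n, n)) - 3 * np.eye(n)
C = rng.standard_normal((n, n))

X, _ = quad_vec(lambda t: -linalg.expm(A * t) @ C @ linalg.expm(B * t), 0, 50)
assert np.allclose(A @ X + X @ B, C, atol=1e-6)
```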

Let us now return to the second integral expression of (26) and recall the constraints on $\Gamma_1$. We must have $\sigma_1 \subset \mathring\Gamma_1$, no other points of $\sigma(A)$ in $\bar\Gamma_1$ and no members of $\sigma(-B)$ in $\bar\Gamma_1$. Bearing this in mind we ask to what extent $\Gamma_1$ can be continuously deformed without changing the value of the integral. We may suppose that we begin the deformation from a set of small circles with centers at the points of $\sigma_1$. However, noting that $P_1 = \sum Z_{r1}$ summed over those r for which $\lambda_r \in \sigma_1$, and using (18) together with the orthogonality of the component matrices, we can write for the integrand of (26):

$$(\lambda I - A)^{-1}P_1C(\lambda I + B)_{P_2}^{-1} = \left(\sum_{r,j}\frac{(j-1)!}{(\lambda - \lambda_r)^j}Z_{rj}\right)C\left(\sum_{s,k}\frac{(-1)^{k-1}(k-1)!}{(\lambda + \mu_s)^k}Y_{sk}\right),$$

and the s sum is over selected members of $\sigma(-B)$ outside $\Gamma_1$ (that is, those corresponding to $\sigma_2$). The matrices $Y_{sk}$ are, of course, the component matrices of B.

Thus, the property $C = P_1CP_2$ implies that we may continue the deformation so that $\Gamma_1$ contains other eigenvalues of A, provided they are not also eigenvalues of −B corresponding to $\sigma_2$. By doing so, we introduce no further singularities into the integrand. We may also include some eigenvalues of −B in $\mathring\Gamma_1$, namely, the members of $\sigma(-B)$ not corresponding to $\sigma_2$.

Similar arguments may be applied to deformations of $\Gamma_1$ and $\Gamma_2$ in the general solution (21).

The study of solutions in the form (26) is simplified if there are simple geometrical boundaries in the complex plane separating $\sigma_1$ from $\sigma(-B)$ which can then be used as the contour $\Gamma_1$ (or part of it). For example, if $\sigma_1$ is contained in a circle and $\sigma(-B)$ is outside this circle (the circle may be supposed to have its center at the origin using the spectral shift property), we can use this circle for $\Gamma_1$, thus transforming to what may be a more tractable real integral.


As another example we develop a formal proof of Theorem 5. We shall need the following lemma (which looks like a Laplace transform).

LEMMA. If $\mathrm{Re}(\mu) < \mathrm{Re}(\lambda)$ for all $\mu \in \sigma(A)$, then

$$(\lambda I - A)^{-1} = \int_0^\infty e^{-(\lambda I - A)t}\,dt.$$

The proof is left to the reader. Now suppose that $\max_{\mu\in\sigma(A)}\mathrm{Re}(\mu) < c_1$, $\max_{\mu\in\sigma(B)}\mathrm{Re}(\mu) = c_2$ and that $c_1 + c_2 < 0$. We consider two semicircular contours as indicated in the sketch. By choosing their common radius sufficiently large we can arrange for $\sigma(A)$, $\sigma(-B)$ to be inside the left and right contours respectively.

[FIG. 1. The complex λ-plane: the two semicircular contours, with the point $c_1 + iR$ marked on the dividing line $\mathrm{Re}(\lambda) = c_1$.]

Choose the left-hand contour for $\Gamma_1$ and name the right-hand contour $\Gamma_2$. It is a routine matter to show that in (26) the integral over the semicircle approaches zero as $R \to \infty$ and hence

$$X = \frac{1}{2\pi i}\int_{c_1 - i\infty}^{c_1 + i\infty}(\lambda I - A)^{-1}C(\lambda I + B)^{-1}\,d\lambda.$$

Using the lemma, we have

$$X = \frac{1}{2\pi i}\int_{c_1 - i\infty}^{c_1 + i\infty}\left(\int_0^\infty e^{-(\lambda I - A)t}\,dt\right)C(\lambda I + B)^{-1}\,d\lambda = \int_0^\infty e^{At}C\left(\frac{1}{2\pi i}\int_{c_1 - i\infty}^{c_1 + i\infty}e^{-\lambda t}(\lambda I + B)^{-1}\,d\lambda\right)dt = -\int_0^\infty e^{At}Ce^{Bt}\,dt,$$

the inner integral being evaluated by closing the path with the contour $\Gamma_2$,

and we have used (19) at the last step.


It is of interest to ask more generally when integrals of the kind found in (26) can be solutions of (4), taking a general contour $\Gamma$ for the integration. The answer is provided in the following theorem.

THEOREM 6. Let $\Gamma$ be any contour with no points of $\sigma(A) \cup \sigma(-B)$ on $\Gamma$. Let $T_1 = \sigma(A) \cap \mathring\Gamma$ and $T_2 = \sigma(-B) \cap \mathring\Gamma$ and let $K_1, K_2$ be the projections defined by A, $T_1$ and −B, $T_2$ respectively. Let $\delta$ be any subset of $\sigma(-B)$ and J be the projection defined by −B, $\delta$. Then

(28) $X = \dfrac{1}{2\pi i}\displaystyle\oint_\Gamma(\lambda I - A)^{-1}C(\lambda I + B)^{-1}J\,d\lambda$

is a solution of $AX + XB = C$ if and only if $K_1CJ = C$ and $CJ_0 = 0$, where $J_0 = K_2J$ (the projection defined by −B and $T_2 \cap \delta$).

Proof. Note first that

$$A(\lambda I - A)^{-1} = \lambda(\lambda I - A)^{-1} - I, \qquad (\lambda I + B)^{-1}BJ = J - \lambda(\lambda I + B)^{-1}J,$$

and supposing that (28) is a solution we have

$$AX + XB = \frac{1}{2\pi i}\oint_\Gamma A(\lambda I - A)^{-1}C(\lambda I + B)^{-1}J\,d\lambda + \frac{1}{2\pi i}\oint_\Gamma(\lambda I - A)^{-1}C(\lambda I + B)^{-1}BJ\,d\lambda$$

$$= \frac{1}{2\pi i}\oint_\Gamma\{\lambda(\lambda I - A)^{-1} - I\}C(\lambda I + B)^{-1}J\,d\lambda + \frac{1}{2\pi i}\oint_\Gamma(\lambda I - A)^{-1}C\{J - \lambda(\lambda I + B)^{-1}J\}\,d\lambda$$

$$= -C\left(\frac{1}{2\pi i}\oint_\Gamma(\lambda I + B)^{-1}\,d\lambda\right)J + \left(\frac{1}{2\pi i}\oint_\Gamma(\lambda I - A)^{-1}\,d\lambda\right)CJ = K_1CJ - CK_2J = C.$$

Since this argument is reversible we see that (28) is a solution if and only if $(K_1C - CK_2)J = C$. We prove the theorem by showing that this single condition is equivalent to $CJ_0 = 0$ and $K_1CJ = C$.

That these two conditions imply $(K_1C - CK_2)J = C$ is obvious. Conversely, premultiply the latter equation by $K_1$ and postmultiply by J and we see that $K_1CK_2J = 0$. By postmultiplying $(K_1C - CK_2)J = C$ by $K_2J$ we find that $CK_2J = 0$. It follows immediately that $K_1CJ = C$ and the proof is complete.

It is easily seen that (28) itself also satisfies the conditions on C; namely, $XJ_0 = 0$ and $K_1XJ = X$.

We obtain the result of (26) when $\mathring\Gamma$ contains no eigenvalues of −B so that $T_2 = \emptyset$ and we may take $J = P_2$. Then $K_2J = J_0 = 0$ and $K_1 = P_1$, so that the two conditions on C reduce to $P_1CP_2 = C$, the hypothesis of Theorem 4.


10. Solutions in terms of component matrices. As before we have $A \in \mathbb{C}^{m\times m}$, $B \in \mathbb{C}^{n\times n}$ and $C \in \mathbb{C}^{m\times n}$. Let A, B have distinct eigenvalues $\lambda_1, \ldots, \lambda_s$ and $\mu_1, \ldots, \mu_t$ respectively, and suppose that $\lambda_1 = -\mu_1, \ldots, \lambda_r = -\mu_r$, with no other coincidences among the $\lambda_j$, $-\mu_k$. Let $B_\alpha = B + \alpha I$, $\alpha \neq 0$, and choose $\alpha$ small enough so that $\lambda_j \neq -(\mu_k + \alpha)$ for all j, k. There is then a unique $X_\alpha \in \mathbb{C}^{m\times n}$

such that

$$AX_\alpha + X_\alpha B_\alpha = C.$$

We now choose a contour $\Gamma$ with $\sigma(A) \subset \mathring\Gamma$ and $\sigma(-B_\alpha) \cap \bar\Gamma = \emptyset$. Let $Z_{kj}$, $k = 1, 2, \ldots, s$; $j = 1, 2, \ldots, m_k$, be the component matrices for A, and, with similar conventions, $Y_{ih}$ the component matrices for B (and hence $B_\alpha$). Using (18) we find that, if $\mu_i$ has index $n_i$, $i = 1, 2, \ldots, t$, then

(29) $(\lambda I + B_\alpha)^{-1} = \displaystyle\sum_{i=1}^{t}\sum_{h=1}^{n_i}\dfrac{(-1)^{h-1}(h-1)!}{(\lambda + \mu_i + \alpha)^h}Y_{ih}$.

Using this expansion in (28) we have

$$X_\alpha = \frac{1}{2\pi i}\oint_\Gamma(\lambda I - A)^{-1}C(\lambda I + B_\alpha)^{-1}\,d\lambda = \frac{1}{2\pi i}\oint_\Gamma(\lambda I - A)^{-1}C\sum_{i,h}\frac{(-1)^{h-1}(h-1)!}{(\lambda + \mu_i + \alpha)^h}Y_{ih}\,d\lambda.$$

Using (19) we have

$$\frac{1}{2\pi i}\oint_\Gamma\frac{(\lambda I - A)^{-1}}{(\lambda + \mu_i + \alpha)^h}\,d\lambda = \big((\mu_i + \alpha)I + A\big)^{-h},$$

and so, using (13) for this function of A,

(30) $X_\alpha = \displaystyle\sum_{i,h}(-1)^{h-1}(h-1)!\,\big((\mu_i + \alpha)I + A\big)^{-h}CY_{ih}$,

(31) $X_\alpha = \displaystyle\sum_{i,h}\sum_{k,j}\dfrac{(-1)^{h+j}(h+j-2)!}{(\mu_i + \lambda_k + \alpha)^{h+j-1}}\,Z_{kj}CY_{ih}$.

We see at once that we may run into trouble if we approach the limit $\alpha = 0$ in those terms for which $\mu_i + \lambda_k = 0$. We also see that, when r = 0, we may simply put $\alpha = 0$ in (31) and obtain the unique solution X for the nonsingular case. We go on to investigate when solutions of the same kind obtain in the singular case, r > 0.

Now suppose that $X_\alpha$ is written out in powers of $\alpha$. Thus

$$X_\alpha = \sum_{v=-p}^{\infty}X_v\alpha^v, \qquad X_{-p} \neq 0.$$

Then we see that

(32) $X_0 = \displaystyle\sum_{i=1}^{t}\sum_{h=1}^{n_i}\sum_{k=1}^{s}\sum_{j=1}^{m_k}{}'\;\dfrac{(-1)^{h+j}(h+j-2)!}{(\mu_i + \lambda_k)^{h+j-1}}\,Z_{kj}CY_{ih}$


and the prime on the summation means that in this sum we simply exclude all those terms for which $\mu_i + \lambda_k = 0$. We also have

(33) $X_{-1} = \displaystyle\sum_{i,k=1}^{r}Z_{k1}CY_{i1}, \qquad X_{-2} = -\displaystyle\sum_{i,k=1}^{r}(Z_{k2}CY_{i1} + Z_{k1}CY_{i2}), \quad \ldots$

Then from $AX_\alpha + X_\alpha B_\alpha = C$ we deduce that

$$AX_{-p} + X_{-p}B = 0,$$
$$AX_{-p+1} + X_{-p+1}B + X_{-p} = 0,$$
$$\cdots$$

(34) $AX_{-1} + X_{-1}B + X_{-2} = 0$,

(35) $AX_0 + X_0B + X_{-1} = C$.

Now substitute (33) into (34), write (using (17))

(36) $AZ_{k1} = \lambda_kZ_{k1} + Z_{k2}, \qquad Y_{i1}B = \mu_iY_{i1} + Y_{i2}$,

and multiply the result from left and right by $Z_{k1}$, $Y_{i1}$, respectively, to obtain

$$(\lambda_kZ_{k1} + Z_{k2})CY_{i1} + Z_{k1}C(\mu_iY_{i1} + Y_{i2}) = Z_{k2}CY_{i1} + Z_{k1}CY_{i2}.$$

This simplifies to

$$(\lambda_k + \mu_i)Z_{k1}CY_{i1} = 0.$$

Thus, if $\lambda_k + \mu_i \neq 0$, $1 \le i, k \le r$, then $Z_{k1}CY_{i1} = 0$ and since

$$(j-1)!\,Z_{kj} = (A - \lambda_kI)^{j-1}Z_{k1}, \qquad (h-1)!\,Y_{ih} = Y_{i1}(B - \mu_iI)^{h-1},$$

we deduce that

$$Z_{kj}CY_{ih} = 0 \quad\text{for } 1 \le i, k \le r,\ i \neq k.$$

The expressions (33) therefore simplify and we have, in particular,

$$X_{-1} = \sum_{k=1}^{r}Z_{k1}CY_{k1}.$$

Substituting in (35) we obtain the following theorem.

THEOREM 7. Let $A \in \mathbb{C}^{m\times m}$, $B \in \mathbb{C}^{n\times n}$ and $\sigma(A) = \{\lambda_1, \ldots, \lambda_s\}$, $\sigma(B) = \{\mu_1, \ldots, \mu_t\}$. Suppose that $\lambda_j = -\mu_k$ if and only if $1 \le j = k \le r$. Then

$$AX_0 + X_0B + \sum_{k=1}^{r}Z_{k1}CY_{k1} = C,$$

where $X_0$ is given by (32).

We can use this at once to improve on a result of Rosenblum [12].

THEOREM 8. With the hypothesis of Theorem 7 we have:
(i) If $Z_{k1}CY_{k1} = 0$ for $k = 1, 2, \ldots, r$, then there exists a solution of $AX + XB = C$ and $X_0$ is such a solution.


(ii) If the common eigenvalues $\lambda_1, \ldots, \lambda_r$ have linear elementary divisors in both A and −B and there exists a solution of $AX + XB = C$, then $Z_{k1}CY_{k1} = 0$, $k = 1, 2, \ldots, r$.

Proof. Part (i) follows immediately from Theorem 7. For part (ii) we deduce from $AX + XB = C$, by multiplying from left and right by $Z_{k1}$, $Y_{k1}$ respectively and using (36),

$$(\lambda_kZ_{k1} + Z_{k2})XY_{k1} + Z_{k1}X(\mu_kY_{k1} + Y_{k2}) = Z_{k1}CY_{k1},$$

whence

$$Z_{k1}CY_{k1} = Z_{k2}XY_{k1} + Z_{k1}XY_{k2}.$$

If $\lambda_1, \ldots, \lambda_r$ have linear elementary divisors in both A and −B, then $Z_{k2} = Y_{k2} = 0$ for $k = 1, 2, \ldots, r$ and we deduce that $Z_{k1}CY_{k1} = 0$.

To see that $Z_{k1}CY_{k1} = 0$ is not necessary in every case, take

$$A = B = \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \qquad C = \begin{bmatrix}0 & c\\ 0 & 0\end{bmatrix}.$$

Then

$$X = \begin{bmatrix}c & 0\\ 0 & 0\end{bmatrix}$$

is a solution of $AX + XB = C$ and $Z_{11}CY_{11} = C \neq 0$.

We obtain the following corollary as a special case of the theorem.

COROLLARY. If A and B are simple matrices, then there exists a solution of $AX + XB = C$ if and only if $Z_{k1}CY_{k1} = 0$ for $k = 1, 2, \ldots, r$.

Note that when A and B are simple the summation (32) reduces to

$$X_0 = \sum_{i=1}^{t}\sum_{k=1}^{s}{}'\;\frac{Z_{k1}CY_{i1}}{\mu_i + \lambda_k}.$$

In the case of a Lyapunov equation with A simple we have

$$X_0 = \sum_{i=1}^{s}\sum_{k=1}^{s}{}'\;\frac{Z_{k1}CZ_{i1}^*}{\lambda_k + \bar\lambda_i}.$$
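In the nonsingular, simple case this formula is easy to exercise numerically. The sketch below is hedged (assumed example data, A taken stable so that no $\lambda_k + \bar\lambda_i$ vanishes) and builds the $Z_{k1}$ from eigenvector bases:

```python
import numpy as np

# Hedged sketch (assumed example data): the component-matrix formula for the
# Lyapunov equation AX + XA* = C with A simple and stable, so that no
# lambda_k + conj(lambda_i) vanishes.
rng = np.random.default_rng(8)
n = 4
A = 0.5 * rng.standard_normal((n, n)) - 2 * np.eye(n)
C = rng.standard_normal((n, n))

lam, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)
Z = [np.outer(S[:, k], S_inv[k, :]) for k in range(n)]     # Z_k1

X = sum(Z[k] @ C @ Z[i].conj().T / (lam[k] + lam[i].conj())
        for i in range(n) for k in range(n))
assert np.allclose(A @ X + X @ A.conj().T, C)
```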

To make a connection with preceding results let us apply Theorem 4, taking

$$P_1 = \sum_{u=r+1}^{s}Z_{u1}, \qquad P_2 = \sum_{v=r+1}^{t}Y_{v1}.$$

The hypothesis $P_1CP_2 = C$ gives

$$\sum_{u=r+1}^{s}Z_{u1}\;C\sum_{v=r+1}^{t}Y_{v1} = C,$$

and multiplying from left and right by $Z_{k1}$, $Y_{k1}$, respectively, with $1 \le k \le r$, we obtain $Z_{k1}CY_{k1} = 0$ again.

Conversely, $Z_{k1}CY_{k1} = 0$ for $k = 1, 2, \ldots, r$ does not necessarily imply $P_1CP_2 = C$, so the hypotheses of Theorem 8 are weaker than those of Theorem 4. We can certainly use the solutions (26), obtained from Theorem 4, to obtain


summations of the form (32), but the i, k summations will run from r + 1 to s and r + 1 to t, respectively, thus removing the need for the "primed" summation convention.

11. Solutions in terms of adjoint matrices. We first recall some definitions ([2] or [8]). The reduced adjoint of $\lambda I - A$ is the adjoint matrix, $\mathrm{adj}(\lambda I - A)$, divided by the greatest common divisor of its elements. The minimal polynomial of A is the characteristic polynomial of A divided by this same factor. Let $E(\lambda)$, $F(\lambda)$ be the reduced adjoints of $\lambda I - A$, $\lambda I - B$ respectively and let $\theta$, $\psi$ be the minimal polynomials of A, B. It is easily seen that we have

(37) $E(\lambda)(\lambda I - A) = (\lambda I - A)E(\lambda) = \theta(\lambda)I$.

It should also be noted that if a matrix has all its eigenvalues distinct then the reduced adjoint coincides with the adjoint, and the characteristic and minimal polynomials coincide.

Now it is known (Gantmacher [2]) that the component matrices can be expressed in terms of the reduced adjoint and minimal polynomials. We have, for example,

(38) $Y_{ih} = \dfrac{1}{(h-1)!\,(m_i - h)!}\left[\dfrac{F(\lambda)}{\psi_i(\lambda)}\right]^{(m_i - h)}_{\mu_i}$,

where the superscript denotes a derivative, the subscript says the derivative is to be evaluated at $\mu_i$, and $\psi_i(\lambda) = \psi(\lambda)/(\lambda - \mu_i)^{m_i}$.

Thus, all the results obtained in the last section in terms of component matrices can also be expressed in terms of adjoints and minimal polynomials. Let us illustrate in the relatively simple case in which either A or B is simple. Thus, if B is simple and A and −B have no eigenvalues in common, we can use (30) to write the unique solution in the form

$$X = \sum_{i=1}^{t}(\mu_iI + A)^{-1}CY_{i1}.$$

From (38) we have

$$Y_{i1} = \frac{F(\mu_i)}{\psi_i(\mu_i)},$$

and from (37),

$$(\mu_iI + A)^{-1} = -\frac{E(-\mu_i)}{\theta(-\mu_i)}.$$

Hence, with B simple and a unique solution, we may write

$$X = -\sum_{i=1}^{t}\frac{E(-\mu_i)\,C\,F(\mu_i)}{\theta(-\mu_i)\,\psi_i(\mu_i)}.$$

This generalizes considerably solutions given by Givens [3] in the case of Lyapunov's equation with no repeated eigenvalues in the coefficient matrix, A.


12. A priori bounds on solutions of AX + XB = C. We present bounds in cases related to the stability problem. The first result is due to E. Heinz [4].

LEMMA 1. If $a = \max\sigma((A + A^*)/2)$, then in the spectral norm we have

$$\|e^A\| \le e^a.$$

The reader is referred to the paper of Heinz for a proof.

LEMMA 2. If $a = \max\sigma((A + A^*)/2)$, then for all $\lambda \in \sigma(A)$ we have $\mathrm{Re}(\lambda) \le a$.

Proof. Let $Ax = \lambda x$ and $x^*x = 1$. Then $x^*A^* = \bar\lambda x^*$ and

$$x^*\big(\tfrac{1}{2}(A + A^*)\big)x = \tfrac{1}{2}(x^*Ax + x^*A^*x) = \tfrac{1}{2}(\lambda + \bar\lambda) = \mathrm{Re}(\lambda).$$

We then obtain $\mathrm{Re}(\lambda) \le a$ from the extreme value of the Rayleigh quotient for $(A + A^*)/2$.

THEOREM 9 (E. Heinz). If $a = \max\sigma((A + A^*)/2)$, $b = \max\sigma((B + B^*)/2)$ and $a + b < 0$, then (27) obtains and, in the spectral norm,

$$\|X\| \le |a + b|^{-1}\|C\|.$$

Proof. First, it follows from Lemma 2 that for $\lambda \in \sigma(A)$, $\mathrm{Re}(\lambda) \le a$ and for $\mu \in \sigma(B)$, $\mathrm{Re}(\mu) \le b$. Hence, the hypothesis of Theorem 5 follows from $a + b < 0$ (and $P_1 = P_2 = I$). The integral of (27) is therefore the unique solution. We now have

$$\|X\| \le \int_0^\infty\|e^{At}\|\,\|e^{Bt}\|\,dt\cdot\|C\|,$$

and from Lemma 1,

$$\int_0^\infty\|e^{At}\|\,\|e^{Bt}\|\,dt \le \int_0^\infty e^{(a+b)t}\,dt = \frac{1}{|a + b|}.$$

The bound of Heinz's theorem will coincide with a similar bound obtained from a theorem of Smith [17] when a = b. When $a \neq b$, Heinz's theorem is the stronger. As Smith points out, this kind of estimate may find some application in estimating and improving approximations to the solution of AX + XB = C. Thus, suppose the approximation is $X + \Delta$. The discrepancy

$$D = A(X + \Delta) + (X + \Delta)B - C = A\Delta + \Delta B$$

is computed, and the error $\Delta$ satisfies an equation involving the same operator as that for X itself. If enough is known about A and B, the theorem can be used to obtain an estimate for the norm of the error, $\|\Delta\|$, in terms of $\|D\|$.
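A hedged sketch of this use of Theorem 9 (the data and the artificial perturbation are assumptions of the example): the residual of an approximate solution bounds the error in the spectral norm.

```python
import numpy as np
from scipy import linalg

# Hedged sketch of the error estimate via Theorem 9 (assumed example data and
# an assumed artificial perturbation): the residual of an approximate solution
# of AX + XB = C bounds the error by ||error|| <= ||residual|| / |a + b|.
rng = np.random.default_rng(9)
n = 4
A = 0.3 * rng.standard_normal((n, n)) - 2 * np.eye(n)
B = 0.3 * rng.standard_normal((n, n)) - 2 * np.eye(n)
C = rng.standard_normal((n, n))

X_true = linalg.solve_sylvester(A, B, C)
X_approx = X_true + 1e-3 * rng.standard_normal((n, n))

a = np.linalg.eigvalsh((A + A.T) / 2).max()
b = np.linalg.eigvalsh((B + B.T) / 2).max()
assert a + b < 0

residual = A @ X_approx + X_approx @ B - C
bound = np.linalg.norm(residual, 2) / abs(a + b)
assert np.linalg.norm(X_approx - X_true, 2) <= bound + 1e-12
```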

We next present some bounds for the spectrum of the solution matrix X of the Lyapunov equation in the case that the coefficient matrix, A, is stable. In this case it is known (and will be proved in § 13) that if W is a positive definite matrix then the solution X of

(39) $AX + XA^* = -W$

is also positive definite. For this result it is convenient to write, for any square matrix A,

$$m(A) = \min\mathrm{Re}(\lambda), \qquad M(A) = \max\mathrm{Re}(\lambda),$$


the minimum and maximum being taken over the spectrum of A. The following results can be found in a paper by Smith [16].

THEOREM 10. If W is positive definite, and A is stable, then for the solution of (39) we have

(40) $\dfrac{m(W)}{|m(A + A^*)|} \le m(X) \le \dfrac{M(W)}{2|m(A)|}$,

(41) $\dfrac{m(W)}{2|M(A)|} \le M(X) \le \dfrac{M(W)}{|M(A + A^*)|}$.

Proof. Let $x^*A = \lambda x^*$ with $x \neq 0$. Then $A^*x = \bar\lambda x$ and multiplying (39) on left and right by $x^*$, x respectively, we obtain

$$(\lambda + \bar\lambda)x^*Xx = -x^*Wx.$$

Since W is positive definite, X is also positive definite and $\mathrm{Re}(\lambda) < 0$. We have

$$x^*Xx = -\frac{x^*Wx}{2\,\mathrm{Re}(\lambda)} = \frac{x^*Wx}{2|\mathrm{Re}(\lambda)|},$$

and using the extremal properties of the Rayleigh quotient we obtain

$$m(X) \le \frac{M(W)}{2|m(A)|} \quad\text{and}\quad M(X) \ge \frac{m(W)}{2|M(A)|},$$

which gives one inequality for each of (40) and (41). Now let $Xy = \mu y$ with $y \neq 0$. Then $y^*X = \mu y^*$, $\mu > 0$, and

$$\mu\,y^*(A + A^*)y = -y^*Wy.$$

Thus, $y^*(A + A^*)y < 0$. Choosing $\mu = M(X)$ we obtain

$$M(X) \le \frac{M(W)}{|M(A + A^*)|},$$

and choosing $\mu = m(X)$ we get

$$m(X) \ge \frac{m(W)}{|m(A + A^*)|}.$$

This completes the four inequalities (40) and (41).
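The four inequalities are easy to exercise on data. The sketch below is hedged (assumed example data with A chosen so that $A + A^*$ is negative definite as well as A stable):

```python
import numpy as np
from scipy import linalg

# Hedged check of the bounds (40)-(41) on assumed example data (A stable, and
# constructed so that A + A* is negative definite; W positive definite).
rng = np.random.default_rng(11)
n = 4
A = 0.3 * rng.standard_normal((n, n)) - 2 * np.eye(n)
M = rng.standard_normal((n, n))
W = M @ M.T + np.eye(n)

X = linalg.solve_continuous_lyapunov(A, -W)          # AX + XA* = -W
mX, MX = np.linalg.eigvalsh(X)[[0, -1]]
mW, MW = np.linalg.eigvalsh(W)[[0, -1]]
reA = np.real(np.linalg.eigvals(A))
mA, MA = reA.min(), reA.max()                        # m(A), M(A)
mAA, MAA = np.linalg.eigvalsh(A + A.T)[[0, -1]]      # m(A + A*), M(A + A*)

assert mX > 0                                        # X positive definite
assert mW / abs(mAA) <= mX <= MW / (2 * abs(mA))     # (40)
assert mW / (2 * abs(MA)) <= MX <= MW / abs(MAA)     # (41)
```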

13. A theorem on the inertia of a matrix. We wish finally to present Krein's proof of an important theorem due to Ostrowski and Schneider [11], although we simplify Krein's result here to the matrix case.

For any $M \in \mathbb{C}^{n\times n}$ we denote the fact that M is positive (negative) definite by writing $M \gg 0$ ($\ll 0$).

Note first of all that if the Lyapunov equation

AX + XA* = C,

with C Hermitian, has a solution then this may be supposed to be Hermitian.


For we also have

X*A* + AX* = C,

and adding these equations we find that (X + X*)/2 is also a solution and is Hermitian.

THEOREM 11. There exists a Hermitian matrix W such that $AW + WA^* \ll 0$ if and only if A has no pure imaginary eigenvalues.

Proof. Suppose first that A has no pure imaginary eigenvalues. We introduce projections $P_+$, $P_-$ defined by A and the parts of $\sigma(A)$ in the right and left half-planes, respectively.

Suppose $H \gg 0$ and consider the equation

$$-AX - XA^* = P_+HP_+^*.$$

We apply Theorem 5, taking $\sigma_1$ as the set of eigenvalues of −A in the left half-plane. Then $P_+ = P_1$, $\sigma_2 = \bar\sigma_1$ and $P_+^* = P_2$. Hence $\mathrm{Re}(\lambda + \mu) < 0$ for all $\lambda \in \sigma_1$, $\mu \in \sigma_2$ and we can write

$$X = -\int_0^\infty e^{-At}P_+HP_+^*e^{-A^*t}\,dt.$$

Similarly it is found that a solution of

$$AY + YA^* = -P_-HP_-^*$$

is

$$Y = \int_0^\infty e^{At}P_-HP_-^*e^{A^*t}\,dt.$$

If we define the Hermitian matrix W = X + Y, then adding the equations for X and Y we have $AW + WA^* = -H_1$, where $H_1 = P_+HP_+^* + P_-HP_-^*$. We have finished this part of the proof if we can prove that $H_1 \gg 0$. But this is the case, for if $x \in \mathbb{C}^n$ and $x \neq 0$ we can write $x = x_1 + x_2$, where $x_1 = P_+^*x$, $x_2 = P_-^*x$. Then

$$x^*H_1x = x^*P_+HP_+^*x + x^*P_-HP_-^*x = x_1^*Hx_1 + x_2^*Hx_2,$$

and this is positive since $H \gg 0$ and $x \neq 0$ implies that $x_1$ and $x_2$ cannot be zero simultaneously.

Conversely, suppose there exists a Hermitian W such that $AW + WA^* = -2H$ with $H \gg 0$, and that $x^*A = i\mu x^*$ with $\mu$ real and $x \neq 0$. Then $A^*x = -i\mu x$ and

$$i\mu x^*Wx - i\mu x^*Wx = -2x^*Hx.$$

But this is impossible because the left-hand side is zero and the right-hand side is negative. So we have a contradiction, and A has no pure imaginary eigenvalue.

Ostrowski and Schneider define the inertia of a matrix M to be the triple of nonnegative integers $(\pi, \nu, \delta)$, where $\pi$ is the number of eigenvalues of M with positive real parts, there are $\nu$ with negative real parts, and $\delta$ eigenvalues are pure imaginary. We now complete the Ostrowski–Schneider result with the following theorem.


THEOREM 12. A Hermitian solution W of $AW + WA^* = -H$, where $H \gg 0$, has the same inertia as −A.

Proof. Let $T = f(A)$, where $f(z) = (z + a)(z - a)^{-1}$ and $a > 0$. This is the transformation used in § 3 and we see that our equation for W transforms to (see (6)):

$$W - TWT^* = \frac{1}{2a}(T - I)H(T^* - I).$$

Since W exists we have from Theorem 11 that A has no pure imaginary eigenvalues and so T has no eigenvalues on the unit circle. Thus $T - I$ is nonsingular and $H \gg 0$ implies that

$$TWT^* - W \ll 0.$$

Furthermore, $(zI - T)^{-1}$ exists for all z with $|z| = 1$. It is easily verified that

$$TWT^* - |z|^2W = (zI - T)\{W - z(zI - T)^{-1}W - \bar zW(\bar zI - T^*)^{-1}\}(\bar zI - T^*),$$

and so on the circle $|z| = 1$ we have

(42) $W - z(zI - T)^{-1}W - \bar zW(\bar zI - T^*)^{-1} \ll 0$.

[FIG. 2. The unit circle in the complex z-plane.]

On $|z| = 1$ we have

$$z\,d\theta = -i\,dz, \qquad \bar z\,d\theta = i\,d\bar z,$$

so integrating (42) with respect to $\theta$ from 0 to $2\pi$ we obtain

$$2\pi W - \frac{1}{i}\left(\oint_{|z|=1}(zI - T)^{-1}\,dz\right)W - \frac{1}{i}\,W\left(\oint_{|z|=1}(zI - T^*)^{-1}\,dz\right) \ll 0,$$

or

(43) $W - P_-W - WP_-^* \ll 0$,

where $P_-$ is the projection defined by

$$P_- = \frac{1}{2\pi i}\oint_{|z|=1}(zI - T)^{-1}\,dz.$$

Thus $P_-$ is the projection defined by T and the part of $\sigma(T)$ inside the unit circle. But this part of $\sigma(T)$ is just the image of that part of the spectrum of A in the left


half-plane. It follows that $P_-$ is also the projection defined by A and the part of its spectrum with negative real part. We then deduce that $P_+ = I - P_-$ is the projection defined by A and the part of its spectrum in the right half-plane.

Now, if $x \in \mathcal{N}(P_-^*)$, we have $P_-^*x = 0$, and from (43) we obtain $x^*Wx < 0$. Then if $x \in \mathcal{R}(P_-^*)$, we have $P_-^*x = x$, and this time (43) yields $x^*Wx > 0$. These results imply: (i) that the number of negative eigenvalues of W (counted according to multiplicities) is equal to the dimension of the range of $P_+^*$, and (ii) the number of positive eigenvalues of W is the dimension of the range of $P_-^*$. From the above characterization of $P_-$ and $P_+$ we see that the number of negative (positive) eigenvalues of W is equal to the number of eigenvalues of A with positive (negative) real parts. That is, the inertia of W coincides with the inertia of −A.
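The theorem is easy to confirm numerically. The hedged sketch below uses an assumed example matrix with a mixed spectrum (and no pure imaginary eigenvalues for the sampled data) and a library Lyapunov solver:

```python
import numpy as np
from scipy import linalg

# Hedged illustration of Theorem 12 (assumed example data): for A with no pure
# imaginary eigenvalues and H positive definite, a Hermitian W solving
# AW + WA* = -H has the same inertia as -A.
def inertia(eigvals, tol=1e-10):
    re = np.real(eigvals)
    return (int((re > tol).sum()), int((re < -tol).sum()), int((abs(re) <= tol).sum()))

rng = np.random.default_rng(10)
A = 0.1 * rng.standard_normal((5, 5)) + np.diag([3.0, 1.0, -2.0, -4.0, -6.0])
M = rng.standard_normal((5, 5))
H = M @ M.T + np.eye(5)                              # positive definite

W = linalg.solve_continuous_lyapunov(A, -H)          # AW + WA* = -H
assert np.allclose(W, W.T)                           # Hermitian (real symmetric)
assert inertia(np.linalg.eigvalsh(W)) == inertia(-np.linalg.eigvals(A))
```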

COROLLARY (Lyapunov theorem). The matrix A is stable if and only if there is a positive definite W such that $AW + WA^* \ll 0$.

Proof. If we are given that A is stable, then from Theorem 11 we deduce the existence of a Hermitian W with $AW + WA^* \ll 0$, and from Theorem 12 we deduce that W has the inertia of −A. That is, W is positive definite.

Conversely, if we are given the existence of $W \gg 0$ with $AW + WA^* \ll 0$, then we apply Theorem 12 at once and see that A must be stable.

REFERENCES

[1] N. DUNFORD AND J. T. SCHWARTZ, Linear Operators, Part I, Interscience, New York, 1966.
[2] F. R. GANTMACHER, The Theory of Matrices, vol. 1, Chelsea, New York, 1960.
[3] W. GIVENS, Elementary divisors and some properties of the Lyapunov mapping X → AX + XA*, Argonne National Laboratory, Illinois, 1961.
[4] E. HEINZ, Beiträge zur Störungstheorie der Spektralzerlegung, Math. Ann., 123 (1951), pp. 415-438.
[5] A. S. HOUSEHOLDER, The Theory of Matrices in Numerical Analysis, Blaisdell, New York, 1964.
[6] A. JAMESON, Solution of the equation AX + XB = C by inversion of an M × M or N × N matrix, SIAM J. Appl. Math., 16 (1968), pp. 1020-1023.
[7] M. G. KREIN, Lectures on Stability Theory in the Solution of Differential Equations in a Banach Space, Inst. of Math., Ukrainian Acad. Sci., 1964. (In Russian.)
[8] P. LANCASTER, Theory of Matrices, Academic Press, New York, 1969.
[9] E. C. MA, A finite series solution of the matrix equation AX − XB = C, SIAM J. Appl. Math., 14 (1966), pp. 490-495.
[10] C. C. MACDUFFEE, The Theory of Matrices, Chelsea, New York, 1956.
[11] A. OSTROWSKI AND H. SCHNEIDER, Some theorems on the inertia of general matrices, J. Math. Anal. Appl., 4 (1962), pp. 72-84.
[12] M. ROSENBLUM, On the operator equation BX − XA = Q, Duke Math. J., 23 (1956), pp. 263-270.
[13] M. ROSENBLUM, The operator equation BX − XA = Q with self-adjoint A and B, Proc. Amer. Math. Soc., 20 (1969), pp. 115-120.
[14] W. E. ROTH, The equations AX − YB = C and AX − XB = C in matrices, Proc. Amer. Math. Soc., 3 (1952), pp. 392-396.
[15] D. E. RUTHERFORD, On the solution of the matrix equation AX + XB = C, Nederl. Akad. Wetensch. Proc. Ser. A, 35 (1932), pp. 53-59.
[16] R. A. SMITH, Bounds for quadratic Lyapunov functions, J. Math. Anal. Appl., 12 (1965), pp. 425-435.
[17] R. A. SMITH, Matrix equations XA + BX = C, SIAM J. Appl. Math., 16 (1968), pp. 198-201.
[18] A. TRAMPUS, A canonical basis for the matrix transformation X → AXB, J. Math. Anal. Appl., 14 (1966), pp. 153-160.
[19] A. TRAMPUS, A canonical basis for the matrix transformation X → AX + XB, J. Math. Anal. Appl., 14 (1966), pp. 242-252.
[20] J. H. M. WEDDERBURN, Note on the linear matrix equation, Proc. Edinburgh Math. Soc., 22 (1904), pp. 49-53.
