
New algorithm for analog optical matrix inversion

David Casasent and John-Scott Smokelin

A new matrix inversion algorithm is described. It provides a meaningful estimate of the inverse A^-1 of a matrix A on an analog optical processor in a reduced calculation time (compared to other methods). The new nested iterative algorithm has no convergence conditions on the matrix and requires fewer operations than prior iterative neural net and other algorithms.

I. Introduction

We consider a new algorithm for matrix inversion on an analog processor. The algorithm is easily modified to allow the solution of a set of linear algebraic equations. With these basic operations and the matrix-vector multiplier used, nonlinear matrix equations can also be solved. Our present concern is only matrix inversion using an analog (low-accuracy) processor (optical or analog very-large-scale integrated or fixed-point digital). Many direct or decomposition algorithms for matrix inversion and the solution of linear algebraic equations exist,1-3 but they generally require high-accuracy processors and are not tolerant of errors in individual matrix-vector products or vector inner products. We thus consider iterative algorithms since they correct individual errors, avoid accumulation of roundoff errors, and allow a solution accurate to 1% (in a mean-squared-error sense) on a 1%-accuracy analog processor. We use 1% accuracy, since it is the best accuracy one can expect on analog processors.

The new aspect of this work is an algorithm that is more efficient in the number of operations it requires. It utilizes several preconditioning stages, each followed by an iterative matrix-vector multiplier. Section II presents an overview of the algorithm. Another new aspect of our work is that the algorithm is useful for any matrix without restrictions on its structure (positive definite, etc.). This is believed to be necessary on an analog processor, in which the analog representation of the matrix, with expected component errors and accuracy, is still not expected to result in a positive definite, etc., matrix being recorded. We use the iterative Richardson or simultaneous displacement algorithm recently described4 as a neural net energy minimization problem. Section III analyzes its convergence rate. To reduce the number of iterations and hence the calculation time required, we also use the algorithm to obtain an estimate of the matrix inverse to precondition the matrix to reduce its condition number and hence improve the convergence rate of the algorithm (Section III). The combination of several preconditioning steps and iterative solutions is preferable to other5 preconditioning methods and to an iterative solution alone. We do not consider ridge regression techniques6 to reduce the condition number of the matrix since they change the original problem being solved. We do not consider bimodal, etc., processors since we found7 them not to be competitive in terms of the number of operations required. Section IV quantifies the number of preconditioning stages (k) used and the number of iterations (N) per stage. Section V presents simulation results to show that the algorithm achieves 1% accuracy in the matrix-inverse solution on a 1% analog processor. Section VI provides a discussion.

The authors are with the Center for Excellence in Optical Data Processing, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213.
Received 27 September 1990.
0003-6935/91/233281-07$05.00/0.
© 1991 Optical Society of America.

II. Iterative Multistage Preconditioning

Figure 1 shows a block diagram of the steps in our new algorithm. We denote matrices (vectors) by uppercase (lowercase) underlined letters, which are boldfaced in the text. One basic step in the internal iterative algorithm involves estimating the condition number C of the matrix. We use the trace of the matrix as an easily calculated weak upper bound on the maximum eigenvalue λ_max. We use C to estimate the k and N values to use (k is the number of outer loops or preconditioning stages and N is the number of iterations used in each inner loop).


Fig. 1. Block diagram of our nested iterative matrix-inverse algorithm.

Fig. 2. Optical realization of the iterative matrix-inverse algorithm.

From this, we specify a fixed number of iterations N1 (in stage 1) in our iterative algorithm (Section III) to obtain an estimate P1 = A1^-1 of the inverse of the matrix A. The first three boxes show these steps in the first preconditioning stage (precondition-1) of our algorithm. We then use P1 to precondition A to obtain the new matrix AP1 = A2 with reduced condition number (C2 < C). We then operate again on this new matrix (precondition-2) to obtain a new estimate of A^-1. These stages continue, and the final A^-1 estimate is given by the matrix product of the individual estimates Pk. We now detail (Section III) and analyze the various portions of this algorithm.

III. Iterative Matrix-Vector Solutions: Analysis

1. Iterative Pseudoinverse Algorithm

An attractive iterative algorithm to solve Ax = b for x is the Richardson algorithm

x(n + 1) = x(n) - ω[Ax(n) - b],  (1)

where n is the iteration index and ω is the acceleration parameter or step-size parameter. For convergence, ω < 2/λ_max (where λ_max is the dominant eigenvalue of A). This algorithm is easily modified as

X(n + 1) = X(n) - ω[AX(n) - I]  (2)

to solve for A^-1, where X is now a matrix and I is the identity matrix. The steady-state solution is X = A^-1. However, for convergence, A must be positive definite (all the eigenvalues of the matrix A must be strictly greater than zero). We desire a solution with no restrictions on A. To achieve this, we use neural net concepts of energy (error) minimization. Specifically, we minimize the mean-square-error energy function

E = (1/2)‖AX - I‖^2,  (3)

where ‖ · ‖ defines the Frobenius (Euclidean) matrix norm that measures the squared error in the difference AX - I. Other measures besides mean-square error can be used and may be considered more useful when the problem is ill conditioned. However, this energy function requires less restrictive convergence conditions on the matrices and is thus attractive. Using this measure, the minimum E occurs when AX = I or X = A^-1, as desired. To obtain this, we let X evolve in time in the direction of the negative gradient of E, i.e.,

dX/dt = -η ∂E/∂X = -η[A^T A X - A^T].  (4)

This is the standard Hopfield neural net concept, where the change in E with time is dE/dt = (∂E/∂X)(dX/dt), which converges. When the derivative in Eq. (4) is zero, E is minimized, and this occurs when X = (A^T A)^-1 A^T = A^-1. The iterative algorithm to be used is thus

X(n + 1) = X(n) - ω[A^T A X(n) - A^T],  (5)

which is similar to Eq. (2) with ω = ηΔt (where Δt is the discrete time step size).

This form of the algorithm in Eq. (5) was derived earlier.4 It is easily realized on an optical matrix-vector processor as in Fig. 2. The P1 data are the lexicographically ordered vector version of X, the P2 matrix M is -ωA^T A [with the first X(n) term in Eq. (5) included as a set of ones added to the diagonal elements of the P2 matrix], and the external vector (at P4) added to the matrix-vector output at P3 is the vector version of ωA^T. This architecture thus forms a matrix-vector product (of the P2 matrix times the P1 vector) plus an added external vector (at P4). It thus implements the right-hand side of Eq. (5) that is fed back to produce the new P1 input X(n + 1) at the next iteration. A practical and attractive aspect of this algorithm is the block Toeplitz structure of the P2 matrix, which makes a realization of the system and algorithm using acousto-optic devices at P1 and P2 attractive. This was fully detailed earlier4 and thus we only note it here for completeness. We also note that this acousto-optic realization easily partitions to allow large matrices to be processed. Our present concern is that this algorithm is preferable to the standard Richardson algorithm in Eq. (2) since it does not impose the positive-definite restriction on A, and hence it is suitable for use with any matrix A (specifically ill-conditioned matrices). This is the iterative pseudoinverse (or Moore-Penrose inverse) algorithm, since it solves for the generalized inverse [A^T A]^-1 A^T. Although this algorithm does not require conditions (e.g., positive definite) on A for convergence, a unique solution will not be obtained unless the columns of A are linearly independent (otherwise A^T A will be singular). Thus, in general, A^T A will be a nonnegative-definite matrix (its eigenvalues are greater than or equal to zero) and will be positive definite only if the columns of A are linearly independent. We note this for completeness and do not concern ourselves with the case of uniqueness (rather, we desire an A^-1 solution with a small mean-square error).

Without some nonlinearity at P3 in Fig. 2, the architecture and implementation is an iterative matrix-vector processor and not a neural net. The P3 nonlinear operation originally suggested4 is to restrict the X elements to lie between 0 and 1 (or ±1) by truncating small and large elements. We found that this was not easily implemented because no obvious scaling exists for the P1, P2, and P4 data, since the final P3 output (A^-1) has elements whose values exceed unity even for an original matrix A with all elements scaled to be unity or less. When a nonlinearity was used at P3, the algorithm did not converge. Hence, we do not employ any P3 nonlinearity, and thus the architecture and algorithm are not (strictly speaking) a neural net.

The only parameter required in the algorithm in Eq. (5) is ω. For convergence, we require ω < 2/λ_m (where λ_m is now the maximum eigenvalue of A^T A). Since by definition A^T A is nonnegative definite for any matrix A, there is no restriction on A as in Eq. (2). Calculating λ_m is computationally intensive, and thus we approximate it by the trace (Tr) and use

ω = 2/[Tr(A^T A)].  (6)

In the first box in our preconditioning algorithm (Fig. 1) we thus use the trace to estimate the condition number C of A^T A and to choose ω in Eq. (5) as in Eq. (6). This is a good upper bound, as we quantify in Section VI.
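For illustration, this inner loop can be sketched in a few lines of NumPy; the function name, the random test matrix, and the iteration count below are illustrative assumptions on our part, and an actual implementation would replace the exact digital products with the analog processor of Fig. 2.

```python
import numpy as np

def iterative_pinv_estimate(A, num_iters, X0=None):
    """Iterate X(n+1) = X(n) - w[A^T A X(n) - A^T] (Eq. (5)) for num_iters steps."""
    AtA = A.T @ A
    At = A.T
    w = 2.0 / np.trace(AtA)          # Eq. (6): trace as a weak upper bound on lambda_max
    X = np.eye(A.shape[0]) if X0 is None else X0.copy()
    for _ in range(num_iters):
        X = X - w * (AtA @ X - At)   # gradient-descent step on E = (1/2)||AX - I||^2
    return X

# Example: relative (Frobenius-norm) error of the estimate for a random 3 x 3 matrix.
A = np.random.rand(3, 3)
X = iterative_pinv_estimate(A, 500)
exact = np.linalg.inv(A)
print(np.linalg.norm(X - exact) / np.linalg.norm(exact))
```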

2. Convergence Rate Analysis

In exercising the algorithm in Eq. (5) on various matrices, we noted that many iterations were required for convergence within 1% of the solution. Thus, the number of multiplications required exceeds O[N^3], the typical range required in direct decomposition algorithms (where N is the one-dimensional size of the N × N matrix), and the algorithm is not attractive (although it requires only analog accuracy). The major purpose of this work is to reduce the number of iterations required. We begin by analyzing the convergence of the basic Richardson algorithm. The following analysis is adapted from Ref. 8 for our current problem in Eq. (5).

We first rearrange Eq. (5) as

X(n + 1) = [I - ωA^T A]X(n) + ωA^T.  (7)

We then subtract A^-1 from both sides of Eq. (7) and rearrange it as

X(n + 1) - A^-1 = [I - ωA^T A][X(n) - A^-1].  (8)

If we denote the initial estimate as X(0) and note that the error [the left-hand side of Eq. (8)] improves by a factor of [I - ωA^T A] at each iteration n, we can write8

X(n) - A^-1 = [I - ωA^T A]^n [X(0) - A^-1].  (9)

This form provides the information we desire on the convergence rate of the algorithm. The last factor [X(0) - A^-1] on the right-hand side is the initial error at the start (iteration n = 0). The left-hand side is the final error after iteration n. As seen, the original error is reduced by a factor of [I - ωA^T A] on each iteration. Since this is less than unity, the right-hand side decreases as n increases and X(n) approaches A^-1. To quantify the difference error at different iterations, we form the matrix 2-norm of both sides of Eq. (9), and we use the Cauchy-Schwarz inequality

‖X(n) - A^-1‖ ≤ ‖I - ωA^T A‖^n ‖X(0) - A^-1‖
             ≤ [1 - 1/C_A^2]^n ‖X(0) - A^-1‖  (10)

(where the second expression follows from the symmetry of A^T A, the fact that ‖I - ωA^T A‖ is the largest eigenvalue of [I - ωA^T A], and that ω = 1/λ_max), where C_A^2 = λ_max/λ_min is the condition number of A^T A or the square of the condition number of A. To obtain a comparison measurement that is independent of the size of A (note that the norm of a matrix increases with the matrix size), we divide both sides of inequality (10) by the norm of A^-1 to obtain

‖X(n) - A^-1‖/‖A^-1‖ ≤ [1 - 1/C_A^2]^n ‖X(0) - A^-1‖/‖A^-1‖.  (11)

The left-hand side of inequality (11) measures the average percent error in all the elements of our computed solution for A^-1. When we refer to a 1% error solution, we mean that the left-hand side of inequality (11) is 0.01, i.e., the square root of the sum of the squares of the differences between the elements of our solution X and the exact solution A^-1, divided by ‖A^-1‖, is less than or equal to 0.01.

From inequality (11) we see that the condition number C of A determines the rate at which the algorithm approaches a solution. The left-hand side of inequality (11) is the fractional error in the solution as a function of iterations n. For larger C, the improvement factor per iteration [1 - 1/C^2] is smaller, and thus for large C many iterations n will be required. The initial estimate X(0) also affects the number of iterations n required, but not the rate of convergence. In specific cases, good estimates X(0) are possible [we do not consider the initial X(0) choice]. To quantify the number of iterations required for a 1% accuracy solution [the left-hand side of inequality (11) equals 0.01], we assume an error of 50% in the initial estimate X(0) [the last factor on the right-hand side of inequality (11) is 0.5]. If the condition number of A is C = 100, the error will decrease by a factor of 0.9999 with each iteration and 4 × 10^4 iterations are required to reach a final percent error of 1%. Specifically, (0.9999)^n = (1 - 10^-4)^n ≈ 1 - n(10^-4) + [(n^2 - n)/2!](10^-8) + ..., which equals 0.01 if n ≈ 4 × 10^4. Note that with n large, we must keep many of the higher-order terms in the binomial expansion to be accurate. Each iteration requires N^4 multiplications (a matrix-vector product with the N^2 × N^2 version of the matrix A^T A and the N^2-dimensional vector version of X). Thus, reducing the number of iterations is vital for this algorithm to be competitive. This is our present concern. It is also related to the number of iterations required in various neural net optimization algorithms.
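The required iteration count follows directly from the bound in inequality (11); a short check with the numbers used above (C = 100, 50% initial error, 1% target):

```python
import math

C, init_err, target = 100.0, 0.5, 0.01
factor = 1.0 - 1.0 / C**2          # per-iteration improvement factor, 0.9999 for C = 100
n = math.ceil(math.log(target / init_err) / math.log(factor))
print(n)                           # roughly 39,000 iterations, i.e., on the order of 4 x 10^4
```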

3. Preconditioning Estimation

From inequality (11) we see that we must reduce C to reduce n. The method we use to achieve this is matrix-inverse preconditioning. Specifically, we calculate an estimate A1^-1 = P1 of A^-1 and multiply the original matrix A = A0 (at iteration zero) by this to obtain a matrix A0P1 with a reduced condition number. The condition number depends on how good the estimate A1^-1 of A^-1 is. Many matrix-inversion preconditioning techniques exist,2 but we desire one that yields only an approximate solution for A^-1, one that has no restrictions (positive definite) on the matrix, and one that allows use of an analog processor. Iterative matrix-inverse estimation algorithms are thus preferable. Previous iterative inverse estimation techniques5 for preconditioning required the matrices to be symmetric and positive definite. The preconditioning method we use is just our original algorithm in inequality (11). Specifically, we operate the algorithm in inequality (11) for a specified number of iterations N and use its output X(N) = P1 as an estimate A1^-1 of the matrix inverse.

4. Preferable New Nested Iterative Algorithm

We thus use P1 to precondition the original matrix A to form A2 = AP1, which has a smaller condition number C2. We then use our algorithm in inequality (11) on this new matrix for a fixed number of iterations N2 to obtain an estimate A2^-1 = P2 of (AP1)^-1. We continue this procedure using k iterative outer loops (each calculates one of k matrix inverses or preconditioning matrices Pk using our iterative algorithm). Each of these outer loops uses Nk inner loop iterations. Figure 1 shows this sequence. The precondition-1 cycle (k = 1) calculates P1. The precondition-2 cycle (k = 2) uses AP1 as the matrix and calculates P2, etc. The final estimate of A^-1 is simply the product of the Pk (last box in Fig. 1). For example, after iterating on AP1, we obtain P2, and we form the product P1P2 = P1A2^-1 ≈ P1[AP1]^-1 = P1P1^-1A^-1 = A^-1 and thus obtain a better estimate of A^-1. In general (after k cycles), our A^-1 estimate is

P1P2 ... PkPk+1 ≈ P1P2 ... Pk[AP1P2 ... Pk]^-1 = P1P2 ... PkPk^-1 ... P1^-1A^-1 = A^-1.  (12)

We refer to this new algorithm as a nested iterative algorithm since it involves use of the original iterative algorithm in an inner loop (index nk), perturbation of the matrix (by preconditioning) in an outer loop (index k), and repeating the inner iterative loop for new matrices (with reduced condition numbers C).
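A compact sketch of this nested loop (the flow of Fig. 1) is given below, with the Eq. (5)/(6) update written inline as the inner solve. The function name, the fixed number of inner iterations per stage, and the exact NumPy arithmetic are our illustrative assumptions; the analog realization of Fig. 2 would perform the inner products.

```python
import numpy as np

def nested_inverse(A, k_stages, N_per_stage):
    """k preconditioning stages, each running N inner iterations of Eq. (5);
    the final A^-1 estimate is the product P1 P2 ... Pk of Eq. (12)."""
    n = A.shape[0]
    Ak = A.copy()                              # A0 = A
    P_product = np.eye(n)
    for _ in range(k_stages):
        AtA, At = Ak.T @ Ak, Ak.T
        w = 2.0 / np.trace(AtA)                # Eq. (6)
        Pk = np.eye(n)                         # X(0) = I for each stage
        for _ in range(N_per_stage):
            Pk = Pk - w * (AtA @ Pk - At)      # Eq. (5) inner iteration
        P_product = P_product @ Pk             # accumulate P1 P2 ... Pk
        Ak = Ak @ Pk                           # A P1 ... Pk, with a reduced condition number
    return P_product                           # estimate of A^-1

# Example: four stages of 25 inner iterations each on a random 3 x 3 matrix.
A = np.random.rand(3, 3)
X = nested_inverse(A, k_stages=4, N_per_stage=25)
```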

To describe the convergence of this algorithm, we multiply both sides of inequality (10) by A and use the identity matrix X(0) = I as our initial estimate of X. The error norm after the kth cycle (assuming a constant N = nk inner loop iterations per cycle k) satisfies

‖AX(n) - I‖ ≤ [1 - 1/C_{k-1}^2]^N [1 - 1/C_{k-2}^2]^N ... [1 - 1/C_0^2]^N ‖A0 - I‖,  (13)

where Ak is the kth preconditioned product matrix, Ck is its condition number, A0 is the original matrix A, and C0 is its condition number. The condition number Ck decreases in each cycle k, and the preconditioned product Ak = AP1 ... Pk approaches the identity matrix. Thus, the algorithm converges. In successive outer loops k, the improvement factor [1 - 1/Ck^2] decreases (improves) significantly (since Ck becomes much less) and the error in the final answer improves dramatically. From inequality (13) we can quantify the reduced number of iterations possible for a specific matrix. For C0 = 100 and N = 150 inner loop iterations, four outer loops (k = 4) reduce the original error (assumed to be 0.5 or 50%) to (0.9759)^150(0.9984)^150(0.9997)^150(0.9999)^150 × 0.5 = 0.0095, or better than 1% accuracy. This is a total of kN = 4(150) = 600 iterations in our nested algorithm, compared to 40,000 iterations required in the original algorithm to achieve the same 0.01 error. Thus our new algorithm significantly reduces the total number of iterations required (by a factor of 66 in this example).

We now quantify how Ck reduces as k increases. With λ_min = 0.02 and λ_max = 2.0 for the first k = 1 iteration cycle, the initial condition number is C0 = 100 and [1 - 1/C0^2] = 0.9999; after N = 150 iterations we have reduced the error by only a factor of [0.9999]^150 = 0.9851, to (0.9851)(0.5) = 0.4925, after the first k = 1 cycle. The condition number of the new AP1 = A1 matrix is less (C1 = 57.4), but the improvement factor per iteration during the second k = 2 cycle is only slightly less (0.9997). After 150 iterations the error is reduced by an additional factor of only (0.9997)^150 = 0.9560, or by a total factor of (0.9560)(0.9851) = 0.9417. In the k = 3 cycle, C2 = 25.2 and the improvement factor after 150 iterations is better, (0.9984)^150 = 0.7865. In the final k = 4 cycle, C3 = 6.4 and the error is now significantly reduced after 150 iterations by the very significant factor of (0.9759)^150 = 0.0258. Thus, the fractional amount by which the solution error improves increases with each outer loop iteration, with the increase being very significant in the last iteration. Since the preconditioned product Ak in inequality (13) approaches the identity matrix as k increases, the eigenvalues of Ak must likewise approach unity. As a result, the condition number (λ_max/λ_min) becomes smaller and C approaches 1 as the algorithm converges. We now quantify the iterative performance of our algorithm and the choice of k and Nk for matrices with different condition numbers C.
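The arithmetic of this example is easy to reproduce from inequality (13); a short sketch with the per-cycle condition numbers quoted above:

```python
N = 150
conds = [100.0, 57.4, 25.2, 6.4]      # C0, C1, C2, C3 for the four outer loops
err = 0.5                             # assumed 50% initial error
for C in conds:
    err *= (1.0 - 1.0 / C**2) ** N    # one factor of inequality (13) per outer loop
print(err)                            # ~0.009, better than the 1% target after kN = 600 iterations
```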


IV. Nested Algorithm Performance

We now quantify the performance of our nested (k-stage) algorithm compared to the single-stage original algorithm. We also discuss the choice of k and Nk. In both algorithms, we used I = X(0) as the original estimate and Tr[A^T A] = λ_max in determining ω. Table 1 shows the results obtained for four different matrices with different condition numbers C. Each matrix was of size 3 × 3 with no specific matrix structure or eigenvalue distribution. The matrices generated had random elements obtained from a random number generator. Thus, their condition numbers also varied. We calculated the condition numbers for about fifty such matrices and used six different matrices in our tests, with C varying from 9.2 to 135. In Table 1 we list the condition number C (of the original matrix A) and the data (number of iterations required) for both algorithms for four matrices with different C values. For our nested algorithm, we used k = 4 and an equal number of iterations N (column 2) for the iterative inner loop in each of the k cycles (we increased N from 15 to 200 as the initial C is increased). We list the total number of iterations (columns 3 and 4) required in both algorithms. For the nested algorithm, the number of iterations on the last (k = 4) cycle was terminated when the error in the final answer was <0.01 or 1% (thus we use less than kN iterations). As seen, our new nested algorithm offers a considerable improvement. The improvement factor becomes larger (from 7.4 to 50) as C increases. Thus, our algorithm is more advantageous for problems with larger C; these ill-conditioned problems are specifically those that require such a new algorithm.

Table 1. Performance of Nested and Standard Iterative Algorithms for 1% Solution Accuracy

Condition     Nested Algorithm (k = 4 Stages)         Standard Algorithm
Number C      N = Iter/Stage    Total No. Iters       Total No. Iters
9.2           15                50                    370
18.3          25                80                    1,435
38.1          100               333                   6,579
95.2          200               611                   30,000

From inequality (13) we can determine the number of iterations k and N required to achieve a given percent error (0.01) for a given initial condition number C0. However, this requires knowledge (or approximation) of λ_max = Tr(Ak) and λ_min = Ck^-1 Tr(Ak) and hence requires Tr[A] and not merely C. We can use inequality (13) to obtain an estimate of the total number of iterations kN (matrix-vector multiplications) required for different choices of k and N. To provide initial guidelines, we varied k and N in our nested algorithm for four matrices with different condition numbers. The size of each matrix was 3 × 3, and all the results were obtained by simulation. As before, the iterations on the last kth cycle were terminated when the error was below 0.01 or 1%. Table 2 summarizes our results. Many other N choices were used (besides those shown) for each k choice and each C value. We always reduced N as we increased k and, in general, we reduced N by more than we increased k since we expect k > 1 or 2 to be best. For simplicity, we used the same number of iterations N in the inner loop in each cycle k.

From these tests we find that increasing k and decreasing N are preferable since fewer total iterations are required. This is apparent for the C = 95.2 data, where we needed N = 500 for convergence (with a 1% error) with k = 2 and only N = 20 for convergence with k = 8. This reduces the total number of iterations from 888 to 151. We consistently found the largest improvement to occur as k was increased to 2, 3, and 4. In general, we found that, for k > 5, there was a trend for the improvement (total number of iterations) to level off. This is expected since, as N decreases, the inverse estimates are less accurate and the advantage of preconditioning is reduced. The initial estimate [X(0) = I] somewhat affects these results, since it is a better estimate of A^-1 for the C = 95.2 and C = 135 matrix cases than for the other cases. To quantify this for the four cases (C = 18.3, 38.1, 95.2, 135) in Table 2, we calculated the scaled norm ‖X(0) - A^-1‖/‖A^-1‖ of the initial estimate and found it to be 0.936, 1.023, 0.995, and 0.996, respectively. Values further from unity indicate a larger initial error. More extensive tests are needed to remove the effect of the initial X(0) = I choice on these data. However, the data for the C = 18.3 and C = 38.1 cases have a similar initial error, and the initial error in the other two cases is also nearly identical. We note that there is an N value below which the k value required increases, making kN larger and forcing the total number of iterations to increase, e.g., N = 4 and k = 8 for the C = 18.3 matrix; N = 35 and k = 6 for the C = 38.1 matrix; N = 16 and k = 10 for the C = 95.2 case; and N = 20 and k = 11 for the C = 135 case.

Table 2. Effect of k and N on Performance of the Nested Algorithm

Cond. No.    Stages (k)    Iter/Stage (N)    Total Iterations
18.3         1             723               723
18.3         2             50                88
18.3         3             15                45
18.3         4             9                 33
18.3         5             6                 30
18.3         6             5                 27
18.3         8             4                 30
38.1         1             3,128             3,128
38.1         2             170               297
38.1         3             70                187
38.1         4             50                166
38.1         5             40                169
38.1         6             35                180
95.2         1             30,000            30,000
95.2         2             500               888
95.2         3             150               367
95.2         4             70                250
95.2         5             45                202
95.2         6             35                182
95.2         7             25                161
95.2         8             20                151
95.2         10            16                155
135          1             40,000            40,000
135          2             800               1,333
135          3             200               511
135          4             100               335
135          5             60                262
135          8             30                217
135          11            20                220


Fig. 3. Test results for our nested iterative matrix-inverse algorithm on a 1% analog processor: (a) original matrix, (b) exact inverse, (c) calculated inverse.

(a)   0.585   0.819   0.254
      0.768   0.033   0.788
      0.977   0.997   0.875

(b)   4.294   2.629  -3.614
     -0.555  -1.496   1.509
     -4.162  -1.231   3.459

(c)   4.247   2.595  -3.583
     -0.539  -1.483   1.501
     -4.118  -1.204   3.430

Future work (Section VI) is necessary to determine the best N, k, and X(0) choices and whether to vary Nk within each of the k outer iterative loops. We prefer to estimate C0 and from this to fix k and Nk. This avoids the added calculations in measuring X(n + 1) - X(n) for each iteration, deciding on some difference for this at which to stop the iterations, and relating this difference to the error in the final solution.

V. Simulation Results

To verify our new nested iterative algorithm and specifically its performance on a low-accuracy analog optical processor, we considered inversion of the matrix shown in Fig. 3(a). It has no structure, and its condition number is C0 = 18.3 (the first matrix used in Table 2). We used a simple model of the analog optical processor of Fig. 2 in which a new uncorrelated Gaussian random variable with standard deviation σ = 0.003 (this models 1% noise with 99% confidence) was added to the output P3 vector, and the results for each output element were truncated to 7 bits (128 levels of ~1% accuracy) before feeding the P3 output back to P1 for the next iteration. We used a nested iterative algorithm with k = 4 and N = 9 (the same number of iterations was used for each inner iterative loop associated with each outer loop or cycle k). The exact matrix inverse is shown in Fig. 3(b). The matrix inverse obtained on the analog processor after a total of 34 iterations (N = 9 iterations in each of the first three cycles plus 7 iterations in the last k = 4 cycle) is shown in Fig. 3(c) (this represents the matrix inverse obtained with our algorithm). The average percent error of the result is 1.06%. This error is the left-hand side of inequality (11), the norm of the difference [X(n) - A^-1] divided by the norm of the exact solution A^-1. Our theory (Section IV) for our algorithm (Section III) predicts 1% accuracy with N = 9 and k = 4 in 33 total iterations (the fourth entry in Table 2), in excellent agreement with the results obtained here in Fig. 3.
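A sketch of this processor model as we apply it: Gaussian noise with σ = 0.003 is added to the P3 output, and each element is then truncated to one of 128 levels before feedback to P1. The full-scale reference for the 7-bit truncation is not specified, so the scaling below (relative to the largest output element) is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def analog_P3_output(x, sigma=0.003, levels=128):
    """Model of the fed-back P3 output of Fig. 2 on a ~1%-accuracy analog processor."""
    y = x + rng.normal(0.0, sigma, size=x.shape)   # uncorrelated Gaussian noise, ~1% at 99% confidence
    scale = np.max(np.abs(y))                      # assumed full-scale reference
    step = 2.0 * scale / levels                    # 128 levels across [-scale, +scale], ~1% resolution
    return np.trunc(y / step) * step               # 7-bit truncation of each element

# Applied each iteration to the right-hand side of Eq. (5) before it is fed back to P1, e.g.,
# X = analog_P3_output(X - w * (AtA @ X - At))
```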

VI. Discussion

Prior neural net and other iterative algorithms to calculate the inverse of a matrix were limited to use on positive-definite matrices or required many iterations. The new nested algorithm described is suitable for use with any matrix and significantly reduces the number of iterations required. The algorithm is suitable for implementation on analog low-accuracy processors (optical, analog very-large-scale integrated, or fixed-point digital processors) and provides an accuracy equal to that of the processor, without roundoff errors and with the ability to correct errors in intermediate steps. Although we assumed nearly the best analog processor accuracy to be expected (1%), our results should be useful for lower-accuracy processors, since we considered cases where the condition number of the matrix is larger than the reciprocal of the processor accuracy. Further results are needed to quantify these conclusions.

Many variations of the basic algorithm are possible. We have presented its use for the case of general matrices. By using our nested algorithm with A and Eq. (2) rather than A^T A and Eq. (5), improved performance results (since the condition number of A^T A is the square of that for A). However, matrix A must be positive definite and must remain so even when recorded to limited (1%) analog accuracy. By applying our nested algorithm to Eq. (1), it can be used to provide improved solutions to systems of linear algebraic equations and nonlinear matrix equations. Various improvements are possible in the basic algorithm. We now discuss several of these.

We used X(0) = I as our initial choice. When other information is available and in certain specific applications, better X(0) choices are possible [e.g., in adaptive phased-array radar, the inverse of the prior estimate of the noise covariance would be used for X(0)]. If the matrices being processed are symmetric, that information can be used to choose an X(0) that is symmetric. In all our initial tests, we used X(0) = I, even for subsequent k outer loops. This results in the compact form in inequality (13) for the final error. Faster convergence in a smaller kN total number of iterations might be expected if we used the Ak^-1 estimate obtained at the prior k iteration as the initial Xk+1(0) value for the next (k + 1) outer loop iterations. This requires further analysis.

We used the same number of iterations N within each inner loop. One can vary Nk within each outer loop k to reduce the total number of iterations (kN). Extensive simulation tests can be used to provide such guidelines for matrices with different C values.


Table 3. Comparison of Tr[A^T A] and λ_max

Condition    Calculated    Estimate from
Number C     λ_max         Trace
18.3         4.493         5.004
38.1         2.105         2.260
95.2         1.242         1.600
135          3.172         3.571

The empirically determined k and Nk values can then be used with matrices of comparable C values. Our tests in Table 2 indicated that a maximum of k = 4 appears sufficient. Thus, such an off-line search of k and Nk is not unreasonable. To employ any such rules, a good C estimate must be obtained. Use of an iterative algorithm to estimate C is possible5 since C = ‖A‖ ‖A^-1‖, where ‖ · ‖ is any matrix norm. Thus, from A and an estimate of A^-1, we can calculate the norms and hence estimate C. We can estimate λ_max from the trace. If a good estimate of λ_min can be obtained (without extensive matrix-vector multiplications as in the power method), C could be estimated. No such algorithms to estimate λ_min accurately are known.

We now consider the accuracy of the estimate of λ_max from the trace, since this is used to select the acceleration parameter ω in the gradient descent algorithm. The estimate λ_max = Tr(A^T A) is a weak bound. To test its accuracy, we used the four matrices in Table 2 (which have a wide range of condition numbers). We calculated the exact λ_max and the Tr(A^T A) for each. The values obtained are listed in Table 3. The trace is always larger than λ_max (since it is the sum of all the eigenvalues). For the examples shown, the trace estimate is ~10% high, and thus the ω step size used is ~10% below the optimum and ~10% more iterations are used. The ease with which the trace estimate can be calculated makes it quite suitable. The estimate is quite good (~10% high). The accuracy of the estimate will not always be 10% but will depend on the distribution of the eigenvalues. For matrices with large C, we expect the estimate to be better (since λ_max dominates when C is large and since the trace is the sum of all the eigenvalues). With sufficient case testing, we need not recalculate ω for each outer loop k. Rather, we can calculate the trace once, use a larger ω at low k = 1 and 2 iterations, and then the proper ω for k = 3 and 4 iterations (with the ω value calculated from the estimated reductions expected in C). Since the eigenvalues of A^T A are all nonnegative, the trace estimate of λ_max is better than it would be for A (since several of its eigenvalues are negative for the examples in Table 3).
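Both estimates mentioned here are inexpensive to form; a small sketch (P stands for any iterative estimate of A^-1, e.g., the output of a few Eq. (5) iterations, and the Frobenius norm is our choice of matrix norm):

```python
import numpy as np

def lambda_max_estimate(A):
    """Trace of A^T A as a weak upper bound on its largest eigenvalue (cf. Table 3)."""
    return np.trace(A.T @ A)

def condition_estimate(A, P):
    """C ~ ||A|| ||A^-1||, with an iterative inverse estimate P in place of A^-1."""
    return np.linalg.norm(A) * np.linalg.norm(P)
```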

In a full comparison with other algorithms and techniques, the speed of an analog versus a digital vector inner product must be considered, together with all the operations required in the algorithm (specifically, the matrix-matrix multiplications of the inverse estimates Pk in Fig. 1) and data storage (analog storage is required in this and any analog processor). Initial results are promising, and we hope that future work will address practical speed issues, comparisons to other algorithms, various applications, and a lengthy case study of the k and N values to be used for various C values. The nested iterative algorithm appears to be of general use in solutions of all nonlinear matrix equations and in any iterative algorithm (since all iterative algorithms have a convergence rate that is inversely proportional to the condition number of the matrices2). The specific iterative algorithm into which it is best to embed our algorithm depends on specific aspects of the matrix structure, symmetry, and conditions such as positive definiteness.

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency monitored by the U.S. Army Missile Command.

References

1. G. H. Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins U. Press, Baltimore, Md., 1983).

2. L. A. Hageman and D. M. Young, Applied Iterative Methods (Academic, London, 1981).

3. G. Strang, Linear Algebra and Its Applications (Harcourt Brace Jovanovich, San Diego, Calif., 1988).

4. E. Barnard and D. Casasent, "Optical neural net for matrix inversion," Appl. Opt. 28, 2499-2504 (1989).

5. A. Ghosh and P. Paparao, "High speed matrix preprocessing on analog optical associative processors," Opt. Eng. 28, 354-363 (1989).

6. D. Casasent and A. Ghosh, "Reduced sensitivity algorithm for optical processors using constraints and ridge regression," Appl. Opt. 27, 1607-1611 (1988).

7. E. Pochapsky and D. Casasent, "Hybrid digital-optical processors: a performance assessment," in Real-Time Signal Processing XII, J. P. Letellier, ed., Proc. Soc. Photo-Opt. Instrum. Eng. 1154, 254-274 (1989).

8. D. Casasent, A. Ghosh, and C. P. Neuman, "A quadratic matrix algorithm for linear algebra processors," J. Large-Scale Syst. 9, 35-49 (Sept. 1985).
