
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 12, NO. 2, JUNE 1993

Maximum Likelihood, Least Squares, and Penalized Least Squares for PET

Linda Kaufman

Abstract - The EM algorithm is the basic approach used to maximize the log likelihood objective function for the reconstruction problem in PET. The EM algorithm is a scaled steepest ascent algorithm that elegantly handles the nonnegativity constraints of the problem. We show that the same scaled steepest descent algorithm can be applied to the least squares merit function, and that it can be accelerated using the conjugate gradient approach. Our experiments suggest that one can cut the computation by about a factor of 3 by using this technique. Our results also apply to various penalized least squares functions which might be used to produce a smoother image.

I. INTRODUCTION

POSITRON emission tomography (PET) is used to study blood flow and metabolism of a particular organ. The patient is given a tagged substance (such as glucose for a brain study) which emits positrons. Each positron annihilates with an electron and emits two photons in opposite directions. The patient is surrounded by a ring of detectors, which are wired so that whenever any pair of detectors senses a photon within a very small time interval, the size of which is system-dependent, the count for that pair is incremented. In a matter of minutes, several million photon pairs may be detected. The reconstruction problem in PET is to determine a memory map of the annihilations, and hence a map of the blood flow, given the data gathered by the ring of detectors. There are two main approaches given in the literature: convolution backprojection [28], which was originally devised for CAT, and the probability matrix approach, which better captures the physics of the positron annihilation but in practice has not been as popular as convolution backprojection. There are two main arguments usually leveled against the probability approach: in the first place, the images are often speckled, and secondly, they can be expensive to produce.

There have been various proposals for different merit functions (maximum likelihood (ML) [31], least squares (LS), maximum a posteriori) intended to give a better image. Various smoothers have been proposed, which tend to consider nearest neighbor interactions (see Green [9], Hebert and Leahy [11], Lange [19], Geman and McClure [7], and Levitan and Herman [22]). These smoothing techniques also choose a particular solution when there is no unique one with the ML or LS approaches. A disadvantage is that they have parameters which must be determined. Herman and Odhner [12] have arrested some of the controversy over the desirability of some

Manuscript received September 19, 1991; revised July 23, 1992. The author is with AT&T Bell Laboratories, Murray Hill, NJ 07974. IEEE Log Number 9208162.

of the merit functions by showing that the suitability of an approach depends on the medical application, and sometimes the speckling is inconsequential.

The EM algorithm proposed in [29] and [21] is the basic approach for solving the ML problem. Techniques have been suggested for speeding up each iteration of the EM algorithm by taking advantage of the fact that the algorithm is well suited for parallel computation [4], [24], [13], and by decreasing the number of unknowns by multigridding and adaptive gridding [27], [26]. Various people have suggested treating the steps of the EM algorithm as a direction, and then using an inexact line search to speed up convergence. The EM algorithm is a scaled steepest ascent algorithm. The scaling is a very good way to incorporate the nonnegativity constraints. However, as a steepest ascent technique, it is a linearly convergent scheme. Steepest ascent algorithms are notorious for going across steep canyons rather than along canyons, and for taking very small steps whenever the level curves are ellipsoidal. Using a line search usually improves the rate of convergence, but the algorithm is still linearly convergent. However, one can create a superlinearly convergent scheme using the ideas of the conjugate gradient algorithm. The conjugate gradient algorithm uses a linear combination of the current step and the previous one to create directions which are A orthogonal, where A is an approximation to the Hessian matrix. It tends to go along canyons. Tsui et al. [30] have used the conjugate gradient algorithm with the least squares objective function, and showed that in one setting, LS-CG was ten times faster than ML-EM.

In Section II, we develop the EM algorithm for both the maximum likelihood and least squares merit functions, and we show that EM for ML is equivalent to applying EM to a continually reweighted least squares problem. The Kuhn-Tucker conditions which are used to develop the EM algorithm elegantly incorporate the nonnegativity constraints. Most algorithms that have been used for the LS problem either do not incorporate the constraints (see [30]), include them more as an afterthought (see [16]), or force one to decide whether a variable is small or 0 (see [2] and [17]). Moreover, the same techniques apply to various smoothing penalty functions. Using the same type of technique with various merit functions eliminates some of the factors that tend to obscure the issue of determining which, if any, is the best merit function.

In Section III, we discuss ways of accelerating the EM algorithm for the least squares computation. Our techniques are similar to those discussed in [15] for the maximum likelihood



function. However, since differences in function values can be computed more easily for LS than for ML, acceleration techniques based on function differences should be much more acceptable to the medical imaging community than these same techniques were when applied to the maximum likelihood function in [15]. Like [30], we turn to the conjugate gradient algorithm, but we suggest a preconditioned conjugate gradient algorithm (PCG) based on the scaled steepest descent algorithm in order to take into consideration the nonnegativity constraints that they ignore. Our algorithm is similar to the one proposed by Kawata and Nalcioglu [17], but we give a bit more freedom in choosing a diagonal scaling matrix. We show that with little modification, the algorithms can be used with a merit function with a smoothing penalty term. We also suggest that the scaling in the EM algorithm might not be optimal, especially if it is used in a multigrid setting.

In Section IV, numerical results are given. In general, there is little difference between the images produced using EM for least squares and EM for ML. Applying our PCG algorithm to the least squares function tends to reduce the number of iterations by about a factor of 3 over the EM-ML algorithm. The main features of the image appear early in the sequence of pictures produced by the EM algorithm applied to LS and in those produced by the EM-based PCG algorithm. As in the case with the EM algorithm for ML, as the algorithms converge, the images become snowier. Some researchers suggest terminating the EM algorithm before the speckling obscures the image, while others suggest some type of smoothing. Adding a smoothing penalty term, such as a squared difference as in [22] or an ln(cosh) as in Green [9], decreases the amount of speckling, but the PCG approach is still just as effective. The appropriate weighting parameters in these penalty approaches depend on the total number of annihilations counted and the shape of the image, and adjusting them might not be an easy task.

When comparing various merit functions, the algorithm used to optimize a particular merit function must be considered. Images obtained using different algorithms or starting guesses to optimize the same objective function might be radically different. Just because an algorithm is producing iterates that appear to have converged does not mean that the optimum of the function has been obtained. An algorithm that stops when the gradient is small may terminate prematurely in a region that is almost flat. Iterates could be bouncing back and forth between the sides of a steep canyon, and thus might appear to be converging. Different algorithms may take different paths to the solution, and using even the same stopping criteria may produce different results. The initial guess also tends to be a big contributing factor to the appearance of an image, as the results in Kaufman [15] indicate. Furthermore, when the solution is not unique or there are multiple local optima, various algorithms will approach different optima. For example, with an initial guess of 0, for an underdetermined system, the conjugate gradient algorithm is guaranteed, if roundoff error is not considered, to find the least squares solution that has minimum norm. In general, it will determine the solution that is closest to the starting point.

II. EM APPLIED TO LS AND ML AND PENALIZED LIKELIHOOD

In the discrete reconstruction problem, one has data η, where η_t represents the number of photon pairs detected in tube t. One would like to determine x(z), the number of photon pairs emitted at each point z. However, this is not computationally feasible, but one can impose a grid of B boxes on the affected organ and try to compute as unknowns x_b, the number emitted in box b.

We would like the x's to be nonnegative, and it would be nice if the sum of the emitted pairs equals the sum of the detected pairs. We assume that a matrix P can be constructed such that p_{b,t} represents the probability that a photon pair emitted in box b will be detected in tube t.

There are various mathematical approaches to determine the map of the annihilations, i.e., x. One approach, suggested by Shepp and Vardi [29], is based on the assumption that the emissions occur according to a spatial Poisson point process in a certain region. If the η_t's are assumed to be independent Poisson variables with means u_t = Σ_{b=1}^{B} x_b p_{bt}, Shepp and Vardi show that x can be found by maximizing the likelihood function

L(x) = Π_{t=1}^{T} e^{-u_t} u_t^{η_t} / η_t!   (2.1)

where T is the number of detecting tubes. The vector x which maximizes L(x) also maximizes l(x) = log(L(x)), whose gradient is much simpler to compute than that of L(x). Assuming that

Σ_{t=1}^{T} p_{bt} = 1   for b = 1, ..., B,

where p_{b,t} is the probability that photons emitted from box b will be detected by detecting pair t, the gradient of l(x) is given by

∂l/∂x_b = Σ_{t=1}^{T} η_t p_{bt} / u_t - 1,   b = 1, ..., B.   (2.2)

The Kuhn-Tucker conditions (see [8] or [23]) for maximizing l subject to nonnegativity constraints are

∂l/∂x_b = 0   for b = 1, ..., B and x_b > 0   (2.3)

and

∂l/∂x_b ≤ 0   for b = 1, ..., B and x_b = 0.   (2.4)

The Kuhn-Tucker conditions along with the formula given in (2.2) lead to the EM algorithm of Dempster et al. [5], proposed for PET by Shepp and Vardi [29] and Lange and Carson [21]:

x_b^{(new)} = x_b^{(old)} Σ_{t=1}^{T} η_t p_{bt} / ( Σ_{b'=1}^{B} x_{b'}^{(old)} p_{b't} ),   b = 1, ..., B.   (2.5)
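To make (2.5) concrete, the following sketch applies one EM-ML update to a small synthetic system. The matrix P, the simulated counts, and the problem sizes are invented purely for illustration (they are not the paper's phantom or detector geometry); the row normalization enforces the assumption Σ_t p_{bt} = 1.

```python
import numpy as np

# Hypothetical toy system: B boxes, T detector tubes, and a random B x T
# probability matrix P whose row sums are 1, as assumed in the text.
rng = np.random.default_rng(0)
B, T = 50, 120
P = rng.random((B, T))
P /= P.sum(axis=1, keepdims=True)               # enforce sum_t p_bt = 1

x_true = rng.uniform(0.0, 100.0, size=B)        # emission intensities (unknown in practice)
eta = rng.poisson(P.T @ x_true).astype(float)   # simulated tube counts eta_t

def em_ml_step(x, P, eta):
    """One EM-ML update, eq. (2.5): x_b <- x_b * sum_t eta_t p_bt / u_t."""
    u = P.T @ x                                 # u_t = sum_b x_b p_bt
    return x * (P @ (eta / u))

x = np.full(B, eta.sum() / B)                   # positive uniform start, scaled to the counts
x = em_ml_step(x, P, eta)
print(x.sum(), eta.sum())                       # the update restores the total tube count
```

One EM-ML step always returns the total Σ_b x_b to Σ_t η_t, regardless of the (positive) starting guess, which is the tube-count preservation property used in Section III.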


Another way of rewriting the EM algorithm is

x_b^{(new)} = x_b^{(old)} + x_b^{(old)} ∂l(x^{(old)})/∂x_b   (2.6)

so that the EM algorithm might be thought of as a scaled steepest ascent algorithm, with the distance of each element to the nonnegativity constraint used as the scale factor. In the rest of the paper, any scaled steepest descent or ascent method using this scale factor we will call an EM-like algorithm.

Another approach for determining the annihilation map is the least squares approach in which, given the tube counts η, one minimizes

f(x) = (1/2) ||P^T x - η||_2^2   (2.7)

subject to nonnegativity constraints. We note that for a rather small problem, one may impose a 128 × 128 grid leading to 16 384 unknowns, and that there might be 128 detectors or 128 × 127/2 columns in P. Thus, P would have 125 million elements, of which only about 1.6 million are nonzero. Because of its size and density structure, using a factorization of the P matrix to solve (2.7) is ill advised. The gradient of (2.7) is

∇f = P(P^T x - η).   (2.8)

Using the Kuhn-Tucker conditions for minimizing a function subject to nonnegativity constraints, one can derive an EM-like algorithm for the least squares function, namely,

x_b^{(k+1)} = x_b^{(k)} - x_b^{(k)} z_b,   where z = P(P^T x^{(k)} - η).   (2.9)

More formally, one can write the algorithm in matrix form as follows.

EM-LS:
For k = 1, 2, ... until convergence:
1) Set z = P(P^T x^{(k)} - η).
2) Set x_b^{(k+1)} = x_b^{(k)} - x_b^{(k)} z_b.

The EM-LS algorithm and the EM algorithm for the maximum likelihood function, which we will call EM-ML, are rather similar. This becomes more apparent when EM-ML is written in matrix form as follows.

EM-ML:
For k = 1, 2, ... until convergence:
1) Set u = P^T x^{(k)}.
2) Set ψ_t = η_t / u_t for t = 1, 2, ..., T.
3) Set y = Pψ.
4) Set x_b^{(k+1)} = x_b^{(k)} y_b for b = 1, 2, ..., B.

Both EM-LS and EM-ML require matrix-vector multiplications with the matrices P and P^T, and thus require roughly the same amount of work per iteration.

Let σ_t = (u_t - η_t)/u_t, so that ψ_t = 1 - σ_t. Let e represent the vector containing all 1's. Because all the row sums of P are 1, Pe = e. Thus,

y = Pψ = P(e - σ) = e - Pσ.   (2.10)

Equation (2.10) implies that in the EM-ML algorithm,

x_b^{(k+1)} = x_b^{(k)} - x_b^{(k)} z_b,   where z = Pσ,   (2.11)

which, if u_t = 1 for each t, is the EM-LS step.
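The two listings can be compared directly in a short sketch; the data are the same kind of invented toy system as before. As discussed in Section III, the unit step built into EM-LS guarantees neither nonnegativity nor descent, and the sketch simply reports whether a unit EM-LS step happened to leave the feasible region.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T = 50, 120
P = rng.random((B, T)); P /= P.sum(axis=1, keepdims=True)   # row sums of P are 1
eta = rng.poisson(P.T @ rng.uniform(0.0, 100.0, B)).astype(float)

def em_ls_step(x, P, eta):
    """EM-LS listing: z = P(P^T x - eta); x_b <- x_b - x_b z_b (unit step)."""
    z = P @ (P.T @ x - eta)
    return x - x * z

def em_ml_step(x, P, eta):
    """EM-ML listing: u = P^T x, psi_t = eta_t/u_t, y = P psi, x_b <- x_b y_b."""
    u = P.T @ x
    psi = eta / u
    return x * (P @ psi)

x0 = np.full(B, eta.sum() / B)               # uniform start scaled to the total count
x_ls = em_ls_step(x0, P, eta)
x_ml = em_ml_step(x0, P, eta)

print("unit-step EM-LS produced negative components:", bool((x_ls < 0).any()))
print("EM-ML iterate is nonnegative:", bool((x_ml >= 0).all()))
```

Each step costs one product with P^T and one with P, which is why the text regards the two algorithms as comparable in work per iteration.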

Another way to look at the correspondence between the two methods is to consider minimizing

w(x) = (1/2) ||D(P^T x - η)||_2^2   (2.12)

where D is a diagonal weighting matrix. The gradient of w(x) is simply

∇w = P D^2 (P^T x - η).

As we did for f(x), we can derive an EM-like algorithm for w(x), namely,

x_b^{(k+1)} = x_b^{(k)} - x_b^{(k)} z_b

where

z = P D^2 (P^T x^{(k)} - η) = P D^2 (u - η).

If D were allowed to change each iteration and

d_{tt} = (1/u_t)^{1/2},

then it should be obvious from (2.11) and (2.9) that the iterates obtained from the EM-ML algorithm would be exactly those obtained from applying an EM-like algorithm to a continually reweighted least squares problem.
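This equivalence is easy to verify numerically: freezing D^2 = diag(1/u_t) at the current iterate, one EM-like step on the weighted problem reproduces the EM-ML step exactly. The toy data below are again invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
B, T = 40, 90
P = rng.random((B, T)); P /= P.sum(axis=1, keepdims=True)    # row sums of P are 1
eta = rng.poisson(P.T @ rng.uniform(0.0, 100.0, B)).astype(float)
x = rng.uniform(1.0, 50.0, B)                  # any positive current iterate

u = P.T @ x                                    # u_t = sum_b x_b p_bt
d2 = 1.0 / u                                   # D^2 with d_tt = (1/u_t)^{1/2}

em_ml  = x * (P @ (eta / u))                   # EM-ML step, eqs. (2.5)/(2.11)
z      = P @ (d2 * (u - eta))                  # z = P D^2 (u - eta)
em_wls = x - x * z                             # EM-like step on the reweighted LS problem

print(np.allclose(em_ml, em_wls))              # True: the two updates coincide
```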

Our development of the EM algorithm can also be extended to merit functions that include a penalized potential function. In the likelihood situation, these can take the form of

l(x) - U(x)   (2.13)

where U(x) is designed to penalize large differences in estimated parameters for neighboring boxes and has the general form

U(x) = γ Σ_j Σ_{i ∈ N_j} v(x_j, x_i)   (2.14)

where γ is a positive constant and N_j are usually the eight or so nearest neighbors. Various suggestions have been given for v, including

v(x_j, x_i) = (x_i - x_j)^2   (2.15)

suggested in [22], where N_j represent the eight nearest neighbors; Green's suggestion in [9] of

v(x_j, x_i) = ln(cosh((x_i - x_j)/δ))   (2.16)

where δ is another parameter to be set; and the nonlinear function of Hebert and Leahy [11]

v(x_j, x_i) = ln(1 + (x_i - x_j)^2/δ)   (2.17)


which also has an additional parameter. These, and others suggested in [11] and [7], are all easy to differentiate, nonnegative, and even.

One can obviously add U(x) to f(x) to form a penalized least squares function and apply the EM algorithm as given above. The form of (2.15) is very conducive to a least squares situation, but as suggested by various authors, it penalizes high deviations between neighboring boxes excessively. As shown by Lange [19], the ln(cosh) function has many desirable properties, but it is critical that δ in (2.16) be appropriate to the problem. The term suggested by Hebert and Leahy seems to have most of the advantages of (2.16), is easier to evaluate, and seems to have fewer numerical considerations. In the remainder of this paper, MAP will denote a penalized least squares function using (2.15), LC one using the ln(cosh) function in (2.16), and LN one using the penalty term in (2.17).
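As a concrete illustration of the three neighbor penalties, the sketch below evaluates (2.15)-(2.17) on a small image. The neighbor bookkeeping (each unordered eight-neighbor pair counted once, boundary boxes simply having fewer neighbors) and the values of γ and δ are illustrative choices, not necessarily the exact conventions behind (2.14).

```python
import numpy as np

def v_quad(d):                     # (2.15), squared difference (MAP)
    return d**2

def v_lncosh(d, delta=100.0):      # (2.16), Green's ln(cosh) term (LC)
    return np.log(np.cosh(d / delta))

def v_ln(d, delta=100.0):          # (2.17), Hebert-Leahy term (LN)
    return np.log1p(d**2 / delta)

def penalty(img, v, gamma=1.0):
    """U(x) = gamma * sum of v over eight-nearest-neighbor differences (sketch only)."""
    H, W = img.shape
    total = 0.0
    for di, dj in [(0, 1), (1, 0), (1, 1), (1, -1)]:      # each unordered pair once
        a = img[max(di, 0):H + min(di, 0), max(dj, 0):W + min(dj, 0)]
        b = img[max(-di, 0):H + min(-di, 0), max(-dj, 0):W + min(-dj, 0)]
        total += v(a - b).sum()
    return gamma * total

img = np.random.default_rng(2).uniform(0.0, 100.0, (8, 8))
print(penalty(img, v_quad), penalty(img, v_lncosh), penalty(img, v_ln))
```

The quadratic term grows fastest for large neighbor differences, which is the excessive penalization referred to above, while the ln(cosh) and ln(1 + ·) terms flatten out.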

III. ALTERNATIVE WAYS FOR SOLVING THE LEAST SQUARES PROBLEM

In this section, we will discuss several general approaches for minimizing

f(x) = (1/2) ||P^T x - η||_2^2   (3.1a)

such that

x ≥ 0.   (3.1b)

Our aim in this discussion is to determine ways to accelerate the EM-LS algorithm given in Section II. We also point out that extending our results to merit functions involving a penalized smoothing term as in (2.14) is easy.

Many algorithms for minimizing (3.1a) subject to (3.1b) have the following general form.

General Minimization Algorithm:
1) Determine x^{(1)}.
2) For k = 1, 2, ... until convergence:
   a) Determine a search direction s^{(k)}.
   b) Determine a step size α^{(k)}.
   c) Set x^{(k+1)} = x^{(k)} + α^{(k)} s^{(k)}.   (3.2)

Step 1) of the above algorithm should not be treated lightly. For certain algorithms for solving (3.1), one has to be "close enough" to get convergence. For some methods, like the EM-LS algorithm of Section II, starting at x = 0 spells disaster. For others, it may be fine. As our data indicate in the next section, certain algorithms produce much better pictures when the initial guess is uniform.

The parameter α in step 2b) is often used to obtain a sufficient decrease in f along s and to maintain feasibility. For EM-LS, described in Section II, it is set to 1, which does not guarantee feasibility. However, the EM-ML algorithm follows the general outline given above with α set to 1, and as proved in [31], the iterates will always be nonnegative. Moreover, if the EM-ML algorithm were modified to include a step size parameter, as long as α^{(k)} = 1, one would have

Σ_{b=1}^{B} x_b^{(k+1)} = Σ_{t=1}^{T} η_t   (3.3)

so that there is a preservation of tube counts in the tomography problem. For the least squares problem, the EM-LS algorithm, which sets α = 1, does not guarantee that the tube counts will be preserved or that f(x^{(k+1)}) ≤ f(x^{(k)}). Allowing α to vary in the EM-like algorithms gives much greater flexibility.

A. Ensuring Nonnegativity

There are four main approaches to handling the constraints in the general algorithm given above. In the first place, s can be determined without considering (3.1b), as in the well-known active set techniques sometimes used for such problems; nonnegativity is maintained by restricting α. Secondly, if for some θ, x_b + θ s_b is 0, then for α > θ, one might consider setting s_b = 0. Thus, s might be considered a bent line, and as one travels along s, whenever a component of x + αs becomes 0, s would bend; the constraints would determine the breakpoints in the bent line. Thirdly, the constraints can be explicitly used while forming s, as in the EM algorithm and in barrier methods. Finally, the constraints can be used in initially forming s as in the third approach, and then a bent line approach can be applied.

In the active set procedures, at each iteration, one separates the elements of x into those which should be kept at 0 and the variables that are free to vary. The direction s is chosen to minimize some approximation to f in the space of the free variables. One travels along s until some approximation to f has been sufficiently decreased or a variable becomes negative. The approach assumes that it is important to determine whether variables are 0, and works well when one knows a priori almost all the variables that will be at bound. If there are initially many variables that are positive, which will eventually be driven to zero, many small steps might be required. Because the ultimate use of the variables in tomography is a picture, in which elements that are small and elements that are zero might be displayed by the same color, it is not that important whether a variable is 0 or just close to 0. Moreover, there may be a large number of variables at 0 which a priori were thought to be positive. Thus, the active set procedures are rarely cost effective for the tomography problem.

The problem of small steps is partially overcome in the bent line approach. Here, α can be larger than in the active set approach, so that some of the elements of x may become initially negative. After the step is taken, negative elements are set to 0. Various algorithms have been suggested for determining α, and the reader is referred to a recent paper by Bierlaire et al. [2]. The bent line approaches assume that the cost of reevaluating f at the projections of x^{(k)} for various values of α is much less than that of computing a new direction s. The tests in [2] indicate that when there are more than a few variables at bound, which is often the case in tomography, it is better to do an inexact line search that stops when f has been sufficiently decreased than to do an exact line search.

The third approach involves the explicit incorporation of the constraints into the search direction s. Included in this category is the EM algorithm, with a line search, and various interior point methods given in [18] and other recent papers. Often, the objective function is changed to reflect the constraints, and


a standard unconstrained method is used to determine s for the new function. The advantage of incorporating the constraints into s is that, usually, elements of s corresponding to elements close to 0 are kept small. Thus, large steps in the "freer" variables are tolerated. The active set methods and the bent line methods eventually reach this state, but rarely as quickly as those that involve the distance to the constraints in the determination of s. The bounded line search EM algorithm, given below, follows the third approach.

Bounded Line Search EM-LS:
1) Determine x^{(1)}.
2) Set g = P(P^T x^{(1)} - η).
3) Let W be the diagonal matrix containing x^{(1)}.
4) Set γ = g^T W g.
5) For k = 2, 3, ... until convergence:
   a) Set s = -W g, the new search direction.
   b) Set u = P^T s.
   c) Set θ = γ / (u^T u), the minimizer of f along s.
   d) Set α = min(θ, min_{s_b < 0} (-x_b^{(k-1)} / s_b)).
   e) Set x^{(k)} = x^{(k-1)} + α s.
   f) Reset W to the diagonal matrix containing x^{(k)}.
   g) Set z = P u.
   h) Set g = g + α z.
   i) Set γ = g^T W g.
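A sketch of the listing above, with the diagonal of W stored as a vector; the iteration count stands in for the "until convergence" test, and the starting image x1 must be positive (the text notes that starting at 0 spells disaster for EM-LS).

```python
import numpy as np

def bounded_line_search_em_ls(P, eta, x1, iters=32):
    """Bounded line search EM-LS (notation as in the listing above); a sketch only."""
    x = x1.copy()
    g = P @ (P.T @ x - eta)                  # gradient of f at x
    W = x                                    # diagonal of W, stored as a vector
    gamma = g @ (W * g)
    for _ in range(iters):
        s = -W * g                           # scaled steepest descent direction
        u = P.T @ s
        theta = gamma / (u @ u)              # unconstrained minimizer of f along s
        neg = s < 0
        bound = np.min(-x[neg] / s[neg]) if neg.any() else np.inf
        alpha = min(theta, bound)            # do not step past the nearest constraint
        x = x + alpha * s
        W = x                                # reset the scaling to the new iterate
        z = P @ u
        g = g + alpha * z                    # gradient update without another P^T product
        gamma = g @ (W * g)
    return x
```

Note that the gradient is updated as g + αPu, so each pass still costs only one product with P^T (forming u) and one with P (forming z).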

If one had a penalized least squares function of the form f(x) + U(x), where U(x) might be something like (2.16), then one would add ∂U(x)/∂x to g in the bounded line search EM-LS listing and change the formula for θ in step 5c) accordingly.

B. Determining s and Acceleration Schemes

In (3.2), the direction s is chosen to approximately minimize f, or a modification of f that includes constraint information. If the approximation is linear, as in the EM algorithm, one quickly gets somewhat close to the solution, but then little progress is made. When the level curves of a function are very elongated hyperellipsoids and minimization is tantamount to finding the lowest point in a flat, steep-sided valley, a steepest descent algorithm, such as the EM algorithm, tends to traverse across the valley rather than going along the valley. The directions generated are too similar, and information is not gathered in subspaces orthogonal to these directions.

Several alternatives exist which remove or decrease the effect of ellipsoidal level curves. One can use a quadratic approximation, as in the method suggested by Han et al. [10]. However, their algorithm involves solving a huge linear system involving P, not an easy task. Often, an iterative method is used to approximately solve the system.

The conjugate gradient algorithm is an easy to use algorithm which generates gradients that are mutually orthogonal. The search directions tend to go along steep valleys, rather than across them. The search directions s satisfy the condition that s_i^T P P^T s_j = 0 for i ≠ j. (See [8].) The linear conjugate gradient algorithm, originally proposed by Hestenes and Stiefel in 1952 [14], is used to iteratively solve a symmetric positive semi-definite linear system when it is easy to do a matrix-vector multiplication with the coefficient matrix.

The nonlinear conjugate gradient algorithm, proposed first by Fletcher and Reeves in 1964 [6], is used in function minimization. If the function is quadratic, as in (3.1a), and there are no constraints, then the standard nonlinear conjugate gradient algorithm will produce the same sequence of iterates as the linear conjugate gradient method applied to the system P P^T x = Pη. If P^T has m positive distinct singular values, the conjugate gradient algorithm is guaranteed to converge in at most m steps, each involving a matrix by vector multiplication to determine s. (See [3].) In our experience, good pictures are obtained in significantly fewer than m steps, and for the least squares problem, in many fewer steps than the bounded line search EM-LS algorithm. As we shall see in Section IV, dramatic decreases in function values can be obtained very quickly.

The strength of the conjugate gradient method is captured in the following theorem recast from Luenberger [23].

Theorem: Assume x^{(0)} is the initial guess of an iteration procedure and Q = P P^T, and consider the class of procedures given by

x^{(k+1)} = x^{(0)} + R_k(Q) ∇f(x^{(0)})

where R_k(Q) is a polynomial of degree k. Assume x* is the solution to the problem P P^T x = Pη, so that

x^{(k+1)} - x* = (I + R_k(Q) Q)(x^{(0)} - x*).

Let

E(x^{(k+1)}) = (1/2)(x^{(k+1)} - x*)^T Q (x^{(k+1)} - x*),

implying

E(x^{(k+1)}) = (1/2)(x^{(0)} - x*)^T (I + Q R_k(Q)) Q (I + Q R_k(Q)) (x^{(0)} - x*).

The point x^{(k+1)} generated by the conjugate gradient method satisfies

E(x^{(k+1)}) = min_{R_k} (1/2)(x^{(0)} - x*)^T (I + Q R_k(Q)) Q (I + Q R_k(Q)) (x^{(0)} - x*),

where the minimum is taken with respect to all possible polynomials R_k of degree k.

The above theorem states that in one sense, the conjugate gradient method is optimal over a class of procedures that is easy to implement. In particular, every step of the conjugate gradient method is at least as good as the steepest descent step would be from the same point.

There are various ways of stating the conjugate gradient algorithm for minimizing a quadratic function, all of which are equivalent in infinite precision arithmetic. The variant that seems least sensitive to roundoff error in finite precision arithmetic is LSQR [25].
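For the unconstrained problem, LSQR is available directly in SciPy, and the sketch below applies it to an invented toy system. It ignores the nonnegativity constraints entirely; handling them is exactly what the modifications discussed next, and the PCG algorithm below, are about.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Hypothetical toy system; the real P is large and sparse, the setting LSQR targets,
# since it only needs products with P^T and P and never forms P P^T.
rng = np.random.default_rng(3)
B, T = 50, 120
P = rng.random((B, T)); P /= P.sum(axis=1, keepdims=True)
eta = rng.poisson(P.T @ rng.uniform(0.0, 100.0, B)).astype(float)

result = lsqr(P.T, eta, iter_lim=32)     # min_x ||P^T x - eta||_2, unconstrained
x_ls = result[0]                         # the (possibly negative) least squares iterate
print(result[3])                         # residual norm after at most 32 iterations
```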


As explained in Section III-A in general terms, there are a number of ways LSQR can be modified to take into consideration nonnegativity constraints. One can use an active set strategy in the space of free variables, and whenever taking a step of α to minimize f, or an approximation thereof, violates a nonnegativity constraint, one only goes as far as the constraint and restarts at the top of the algorithm. Computational tests suggest that this strategy makes little progress in tomography problems. It is probably a bad strategy in general. One can do a bent line search approach, and again restart whenever one goes past the first bend. (See [2].) The efficacy of this approach is problem-dependent, and also depends on how the bent line search is implemented. One can be very lucky, and after a few iterations, all the elements that eventually will be 0 are determined, and one gets the superlinear convergence associated with the conjugate gradient technique.

A third possibility involves using the constraints more explicitly, as in the EM algorithm. Let W be the diagonal matrix with w_{ii} = x_i. The line search EM algorithm, given above, would use s = -Wg, where g is the gradient of f(x). To accelerate the EM algorithm in the same way that the conjugate gradient algorithm accelerates the steepest descent algorithm, one might consider a conjugate gradient EM algorithm with

s^{(k+1)} = -W g + β s^{(k)}   (3.4)

for a suitable scalar β. Let us derive such an algorithm for the quadratic problem (3.1a). The traditional linear conjugate gradient algorithm is designed to solve the system

P P^T x = Pη.   (3.5)

Solving (3.5) is also equivalent to solving

W^{1/2} P P^T W^{1/2} x̃ = W^{1/2} Pη   (3.6)

where W^{1/2} x̃ = x. The matrix W is usually called a preconditioner because it is assumed that the new system is better conditioned than the old one and the algorithm will converge faster. (See [8].) If it is assumed that W is a constant matrix in (3.6) that is updated each time there is a restart, then the linear conjugate gradient algorithm, retaining nonnegativity, applied to (3.6) is as follows.

PCG:
1) Determine x^{(1)}. Set k to 1.
2) Set d = η - P^T x^{(k)}. Set W to the diagonal matrix containing x^{(k)}.
3) Let φ^{(k)} = ||d||_2 and u = d / φ^{(k)}.
4) Set y = Pu.
5) Let g = y / (y^T W y)^{1/2}.
6) Set s^{(k)} = W g, γ = (y^T W y)^{1/2}, and ρ^{(k)} = γ.
7) Until convergence iterate:
   a) Set d = P^T y - γ u.
   b) Set β = ||d||_2 and u = d / β.
   c) Set c = γ / ρ^{(k)} and τ = β / ρ^{(k)}.
   d) Set ρ^{(k)} = ((ρ^{(k)})^2 + β^2)^{1/2}, φ^{(k)} = c φ^{(k)}, and θ = c φ^{(k)} / ρ^{(k)}.
   d') If θ > 0, set α = min(θ, min_{s_b^{(k)} < 0} (-x_b^{(k)} / s_b^{(k)})); if θ < 0, set α = max(θ, max_{s_b^{(k)} > 0} (-x_b^{(k)} / s_b^{(k)})).
   e) Set x^{(k+1)} = x^{(k)} + α s^{(k)}.
   e') If |α| < |θ|, increment k by 1 and go back to step 2).
   f) Set y = Pu - β y.
   g) Set γ = (y^T W y)^{1/2}.
   h) Set g = y / γ.
   i) Set ρ^{(k+1)} = -c γ, φ^{(k+1)} = τ φ^{(k)}, θ = τ γ, and δ = θ / ρ^{(k)}.
   j) Set s^{(k+1)} = W g - δ s^{(k)}.

In algorithm PCG, whenever a constraint is hit, |α| < |θ|, and the algorithm is restarted with a new preconditioner, and a line search EM step is taken. If the algorithm never hits a constraint, the whole algorithm is just the standard linear conjugate gradient algorithm applied to the preconditioned system. Because the objective function is quadratic, termination is assured. In practice, the role of the preconditioner is to ensure that the distance to the nearest constraint is large enough so that progress is not hindered. It also reweights the problem so that information corresponding to nonnegligible x's is considered more important. Although the algorithm does not seem simple, most of the work is involved in multiplication by P and P^T, operations that must be done with the standard EM algorithm. Notice that the search direction generated in step j) is just a linear combination of the EM step and the previous direction. The main differences between the standard LSQR algorithm as given in [25] and algorithm PCG given above are steps d') and e') and the inclusion of W.

One problem with the above algorithm is that once a variable becomes 0, it will never increase. Thus, although the algorithm will terminate in a finite number of steps, there is no guarantee that if x_j = 0, f(x) could not be further decreased by letting x_j become positive again. One way around this problem is to check on termination whether ∂f/∂x_j is positive for all x_j = 0, and if not, restart the algorithm. If x_j has the largest negative gradient among all x's that are 0, reset w_{jj} to max_i x_i. In theory, allowing only one element to change and checking the sign of the gradient only after termination of the inner PCG algorithm guarantees convergence in a finite number of steps. In practice, one usually checks the gradient for zero x's in the inner loop and restarts immediately.
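The following is a simplified sketch of the same idea rather than the paper's exact listing: it runs an ordinary preconditioned conjugate gradient (CGNR) recurrence on the normal equations, with W = diag(x) frozen between restarts, and restarts with a fresh W whenever the line minimum would cross a nonnegativity constraint, which is the mechanism described above. The LSQR-style recurrences of steps a)-j) are not reproduced, and all names are invented.

```python
import numpy as np

def pcg_em_ls(P, eta, x0, outer_iters=32):
    """Sketch: preconditioned CG for min 0.5*||P^T x - eta||^2 with x >= 0.
    W = diag(x) is frozen between restarts; a restart occurs when a constraint is hit."""
    x = x0.copy()
    k = 0
    while k < outer_iters:
        W = x.copy()                          # diagonal preconditioner, frozen until restart
        g = P @ (P.T @ x - eta)               # gradient of f
        s = -(W * g)                          # first direction = scaled steepest descent (EM step)
        gWg = g @ (W * g)
        while k < outer_iters:
            u = P.T @ s
            theta = -(g @ s) / (u @ u)        # unconstrained minimizer of f along s
            neg = s < 0
            amax = np.min(-x[neg] / s[neg]) if neg.any() else np.inf
            alpha = min(theta, amax)
            x = x + alpha * s
            k += 1
            if alpha < theta:                 # hit a constraint: restart with a new W
                break
            g = g + alpha * (P @ u)           # gradient at the new iterate
            gWg_new = g @ (W * g)
            beta = gWg_new / gWg              # Fletcher-Reeves-type coefficient
            gWg = gWg_new
            s = -(W * g) + beta * s           # EM step combined with the previous direction
    return x
```

A uniform positive start scaled to the total tube count, as used in Section IV, is a sensible choice for x0 here as well.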

Of course, one never sees the power of the PCG approach if there is a restart every iteration. In our examples in Section IV, a restart, because a constraint was hit, occurred about every six iterations. The EM algorithm for maximum likelihood can, in principle, be accelerated just as the EM algorithm for LS is accelerated in PCG. The problem we encountered in practice was that for each iteration, a restart was necessary, so that the conjugate gradient inner iteration was never activated. As reported in [15], this can be overcome by instituting a bent line approach initially, so that one momentarily permits infeasibility and then sets negative elements to 0. This returns us to the old problem, however, of trying to determine which elements are "0" rather than letting the EM algorithm itself do it.

The PCG algorithm given above is similar to the algorithm given by Kawata and Nalcioglu [17] (KN), but the differences are very important. In the KN algorithm, the elements of W corresponding to nonzero values of x are set to 1. Thus, if x_i is small but nonzero and s_i < 0, then little progress can be made in that iteration. Moreover, one is forced to determine whether an element is small or 0, which can be numerically difficult and rather unnecessary. The smoothness with which the nonnegativity constraint is handled in the EM algorithm does not appear in the KN algorithm.

However, there is one problem with the EM algorithm, and its variants like PCG above, that the KN algorithm overcomes. When s_i > 0 or there is no fear of hitting the boundary, it would be nice not to scale the search direction by x_i. The EM algorithm is very sensitive to the initial guess, and starting with a random start is disastrous, whether it be in the


ML setting or the LS setting. Little improvement is made in those components which are very small initially but should be relatively large. The picture is particularly speckled. An initial homogeneous guess is a good starting point for W_{ii} = x_i. In a multigrid or an adaptive grid situation, one may want to use as an initial guess information that might come from a coarser grid. With the EM approach, if one is not very careful, the coarser grid tends to bleed through and give unnecessary artifacts.

There is another change which one may wish to make with the W matrix in the PCG algorithm. When the problem is underdetermined, which could be the case in tomography, the conjugate gradient algorithm seeks out the solution with the least norm if the initial guess is 0. Even in our situation with the preconditioner changing, theoretically, in a roundoff-error-free environment, the iterates will always lie in the range space of P^T if the starting guess is 0. However, starting with an initial guess of 0 is not an option in our current PCG scheme with the W matrix defined above. If initially w_{ii} is set to 1 if y_i > 0, then one could take advantage of this property.

In order to take advantage of a penalty smoothing term as in (2.14), the PCG algorithm may be reworked as follows.

PCG-Penalized:
1) Determine x^{(1)}. Set k to 1.
2) Set d = η - P^T x^{(k)}. Set W to the diagonal matrix containing x^{(k)}.
3) Set y = P d.
4) Let g = y - ∇U(x).
5) Set ρ = g^T W g.
6) Set s^{(k)} = W g.
7) Until convergence iterate:
   a) Determine θ, the minimizer over α of the penalized function f(x^{(k)} + α s^{(k)}) + U(x^{(k)} + α s^{(k)}).
   b) Set α = min(θ, min_{s_b^{(k)} < 0} (-x_b^{(k)} / s_b^{(k)})).


Picture 1. The phantom used in the computer simulation.

mistaken for tumors. The measure of the success is usually visual, but one must realize that in a gray scale picture, it is much easier to determine relative intensities rather than absolute intensities. An area in a picture might stand out when it is slightly different from the background, when it really is not very close to the actual solution. Line plots of a one-dimensional cross section tend to be more informative if absolute intensities of a tumor are important. Secondly, one should be concerned with the rate of convergence: how much computational time is required before the tumors appear. Pictures at various iterations give some clue to the rate of convergence, but it is also very helpful to look at numerical measures, like function values. Such numerical information is sometimes missing in the tomography literature.

In our comparisons, we will usually be looking at algorithms that use a line search. With a maximum likelihood (ML) merit function and the standard EM algorithm, a step size of 1.0 is guaranteed to yield nonnegative iterates that converge to the solution. There is no such guarantee with the least squares (LS) merit function or the penalized LS merit function and the EM algorithm. In fact, in one of our examples, a step size of 0.0051 on the first iteration with LS produced a negative iterate. Moreover, continuing with a step size of 1 and setting negative elements back to 0 produced a divergent rather than a convergent algorithm. Thus, some sort of line search technique is essential for LS. A bounded line search procedure in which α is small enough to ensure nonnegativity is extremely cheap in LS. For ML, as shown in [15], the cost of doing a bounded line search involves function evaluations that are cheaper than performing a matrix-vector multiplication with the P matrix, as needed in computing a search direction. The efficacy of a bounded line search is problem-dependent. In [15], we show some examples in which approximately the same picture is obtained in about one third of the work with a line search. In the main example of this paper, which uses a square grid, α was bounded for each iteration with the ML function by about 1.06. Thus, a bounded line search produced only a slight improvement. In every ML example seen by the author for PET using a bounded line search, the maximum function improvement is attained when α is at the bound. This means the cost of doing the search is small. Since the cost of doing

Fig. 1. (a), (b) Function value versus iteration count for 10 million tube counts.

a crude line search usually is so outweighed by its potential benefits, doing a bounded line search is highly recommended. Thus, for the rest of this section, the terms EM-LS and EM-ML imply that a bounded line search has been performed.

Our computations were done on the SGI 4D/240, a fast scalar machine. Each of the algorithms was stopped after 32 iterations, which required about 4 minutes of computation time. The number 32 was a bit arbitrary. After 32 iterations, the functions involving penalty terms had converged. Although the function values for the case of the PCG algorithm for the nonpenalized function were still decreasing slowly, the features in the test phantom were well defined by iteration 16, and one could have stopped there. By going as far as 32 iterations, one sees the deterioration of the image as the algorithm tries to fit the noise in the data. By the 32nd iteration, the features of the phantom for the EM-ML and EM-LS computations were quite well defined, but speckling had not really occurred.

The computations could have taken advantage of the hardware of a parallel machine. The reader is referred to [16] for suggestions on how to adapt the algorithms to these machines and data showing that the computation would have been finished on a Cray within 10 seconds.

A. Function Plots

Fig. 1(a) gives the least squares value defined in (2.7) for each iteration of the bounded line search EM-ML algorithm, the bounded line search EM-LS algorithm, and the PCG algorithm of Section III. Unless stated otherwise, the algorithms were started with the same approximation, which is uniform within the inscribed circle of the grid and 0 outside that circle. The guess is scaled so that the sum of the unknowns equals the computed tube count. Note that no attempt was made to minimize the least squares value during the EM-ML algorithm; the least squares values were just computed and printed. That it did better than the EM-LS algorithm was a complete surprise. However, there was no surprise that the PCG algorithm, which used the conjugate gradient technique, greatly improved the situation. This is the type of result that numerical analysts have seen for years. In 32 iterations, there were six restarts when the line search suggested that the optimum was at the boundary.

Because of noise in the data, the least squares value of (2.7) at the optimum may not be the best test for goodness of fit. A weighted sum of squares which would take into consideration


detector efficiency might be more appropriate. Although a small change in the function value is often used as a criterion for convergence, many iterations with small changes can often add up to a large change. It is often suggested that one stop the EM algorithm before convergence, when one has obtained a good fit of the signal in the data but before the image has been affected greatly by the noise in the data. Assuming that there are iterates that have captured the signal with little noise, determining that this is the case with a process like EM-LS, where function values and projected gradients decrease gradually, is much more difficult than with a procedure like PCG, where there is an initial rapid decrease in these values.

In Fig. 1(b), all the algorithms are some form of LSQR, with the constraints handled in a variety of ways. The RESTRICTED LSQR algorithm was the algorithm mentioned, but not recommended, in [2], which is the same algorithm as PCG with nonzero elements of W always set to 1. The graph shows the difference that a preconditioner makes. The CONTINUOUS PCG algorithm is similar to the PCG algorithm, but changes W each iteration so that its diagonal reflects the current iterate, not the iterate at the last restart. There were only four restarts for 32 iterations. Because the scale factor in the search direction is changed with each iteration, one would expect fewer restarts than with the PCG algorithm. The PROJGR algorithm used the inexact bent line search algorithm given in [2], which is a cross between the restricted LSQR algorithm and a bent line search projected gradient algorithm. Recall that in a bent line search algorithm for the nonnegativity constraint case, an element of the iterate that becomes negative is reset to 0 before the merit function is evaluated. In the PROJGR algorithm, whenever the restricted LSQR algorithm is restarted and the first step violates the nonnegativity constraints, the algorithm takes a projected gradient step which would obtain a sufficient decrease satisfying the Goldstein-Armijo criteria for convergence. (See [8].) Thus, several function evaluations might be required on each restart. These extra evaluations are not counted in our graph. In our example, a restart was necessary immediately almost every time the restricted LSQR algorithm was begun. This accounts for the wavy nature of the PROJGR curve. Only the projected gradient steps produced a reasonable descent. Again, the preconditioned algorithms outperform the algorithms that do not use a preconditioner.

Fig. 2(a) considers the penalized smoothing approach with the penalty term MAP with two values of γ in (2.14), 1.0 and 0.1. In Fig. 2(b), the preconditioned conjugate gradient algorithm is applied to problems using the penalizing terms LC and LN with γ = δ = 100 for both cases. For the penalized function, the function value plotted is that of the nonpenalized term, not the sum of f(x) and U(x). There are two significant features of these graphs. In the first place, Fig. 2(a) shows the power of the conjugate gradient approach over the scaled gradient approach with the penalized function MAP. It strongly suggests that those advocating a smoothing term consider using the conjugate gradient approach as an accelerator. Secondly, those curves with the penalized term which use the PCG approach tend to follow the nonpenalized curve for a while and then flatten out. Choosing the penalty parameter

Fig. 2. (a), (b) Function value versus iteration for 10 million tube counts using penalty term.

Fig. 3. Function values versus iteration for problem with 256 detectors, 128 × 128 grid, and 10 million tube counts.

Fig. 4. Function value versus iteration for 1 million tube counts.

really seems to be quite similar to choosing when to stop the nonpenalized PCG iteration. Our data support the methodology of the comparison in [12].

Thus far, we have considered a problem which, even with the nonnegativity constraints, is underdetermined. Fig. 3 shows the efficacy of the preconditioned conjugate gradient approach on a problem with a ring of 256 detectors which circumscribes a 128 × 128 square grid. Again, the curve tracing f(x) with the MAP function using the preconditioned conjugate gradient algorithm is initially similar to the curve for that algorithm applied to the nonpenalized function. Of course, similar function values do not guarantee similar iterates. In the next section, where we give some line plots of early iterations, e.g., Fig. 6, there are slight differences. The differences are largest where there are large transitions between regions in the phantom (e.g., near the skull), and the smoothing penalty function tends to smooth out these transitions. In fact, we

compared the PCG image x_P at iteration 8 and the image x_M produced by PCG on the MAP function with γ = 0.1 at the same iteration, and ||x_P - x_M||_2 / ||x_P||_2 was about 0.0071, so that the overall differences were quite small.

Fig. 4 gives the least squares values for each iteration for some of the same algorithms used in Fig. 1 for a problem with 1 million tube counts, a more common situation. Again, the


Fig. 5. Iteration 8, Y coordinate of phantom at X = 0. Ten million count.

Fig. 6. Iteration 8, Y coordinate of phantom at X = 0. Ten million count.

PCG algorithm is faster than the standard EM algorithm. The parameters for the penalized functions were changed to give a more realistic picture. For the penalty term in LN, γ was set to 10 and δ to 100.

B. Line Plots

Let us now consider comparing the algorithms visually, as one might in an application. This will give us an idea whether low function values actually correspond to useful information. We begin with line plots, which are important if absolute intensities are important. Figs. 5-10 are line plots at X = 0 using the problem with 10 million tube counts and 128 detectors. In Picture 1, below the point (0, 0) there is a tumor at which the emission count should be almost at 2000. In Fig. 5, we compare the line plot for the EM-ML and EM-LS algorithms at the eighth iteration with a histogram of the true emission count that generated the data. Fig. 6 compares the eighth iteration for the PCG algorithm and the PCG algorithm on the penalized function with penalty term MAP and γ set to 0.1 with the true histogram for 10 million tube counts. Notice that the intensity of the tumor is just about correct by the eighth iteration for the PCG algorithm, and the plots for the unpenalized function and the penalized one are almost identical. The tumor is hardly defined by the EM-ML algorithm. A graph of the CONTINUOUS PCG algorithm is almost indistinguishable from that of the PCG algorithm, and hence is not given.

In many iterative processes, the answers before the iterates have converged bear little resemblance to the final solution. By showing plots for reconstructed images stopped well before convergence, we have indicated that this is not the case in our studies. If for some reason, like a budget limit or computer failure, one must stop before convergence, the produced images can often give useful information. The main signal in the data seems to emerge quickly. Since all the algorithms and merit functions require about the same amount of work per iteration,

Fig. 7. Iteration 16, Y coordinate of phantom at X = 0. Ten million tube count.

Fig. 8. Iteration 16, Y coordinate of phantom at X = 0. Ten million tube count.

our images taken at early iterations indicate their quality if one is forced to stop the processes at a specific time for whatever reason.

Figs. 7 and 8 compare the algorithms for the 10 million count problem at the 16th iteration. One could obviously have stopped the PCG algorithm at iteration 8, and one is now seeing the raggedness that usually characterizes the EM-ML algorithm at late iterations. Thus, the least squares function also has this unpleasant feature. The tumor defined by the EM-ML algorithm is better defined than in the eighth iteration, but the absolute intensity is still lagging. The tumor is well defined for the EM-LS algorithm, an unexpected result. The curve tends to be smoother for the PROJGR algorithm, but the tumor is ill defined. The added effort of the PROJGR is simply not worth the cost.

Figs. 9 and 10 consider different penalizing terms with different parameters. With MAP, when γ was set to 1, the tumor was slightly damped. Iterating further did not improve the situation. For LC and LN, γ and δ were set to 100. Determining suitable parameters was a rather tedious process. Particularly with LC, one has to be careful with δ to avoid overflow while evaluating the cosh function during line searches. The penalty approaches did give a rather smooth picture with a well-defined tumor, as did the unpenalized function at iteration 8. Assuming that the penalty parameters do not have to be reset often, using a penalty function is a viable approach. However, as our next problem suggests, if there is a change in the tube counts, the least squares function would be changed, and if the penalty parameter is not adjusted, a tumor may be missed.

Figs. 11-16 look at the Y coordinate of the phantom at X = 0 for various algorithms for the 1 million tube count problem. All the difficulties seem enlarged in this case. The tumor is well defined by the eighth iteration for the PCG algorithm, and further iteration tends just to produce artifacts


Fig. 9. Iteration 16, Y coordinate of phantom at X = 0. Ten million count.

Fig. 10. Iteration 16, Y coordinate of phantom at X = 0. Ten million count.

Fig. 11. Iteration 8, Y coordinate of phantom at X = 0.

Fig. 12. Iteration 8, Y coordinate of phantom at X = 0.

Fig. 13. Iteration 16, Y coordinate of phantom at X = 0.

Fig. 14. Iteration 16, Y coordinate of phantom at X = 0.

Fig. 15. Iteration 16, Y coordinate of phantom at X = 0. One million tube count problem.

that may be mistaken for tumors. At least for this problem, the EM-LS algorithm seems to capture the intensity of the tumor better than the EM algorithm applied to the maximum likelihood function, but the differences are really minimal. The penalty parameters used in the previous example with 10 million counts for MAP and LN were completely wrong. With MAP, a larger parameter was needed to smooth the plot. With LN, the previous parameters smoothed out the tumor. The extra parameter in LN was useful, and finally gave a most satisfying result. However, adjusting the parameters is not a task one wishes to repeat often. Because this problem has fewer counts, reducing both γ and δ in LC is appropriate. Reducing both to 20 gave a smoother picture. Reducing δ to 10 caused overflow problems during

the line search procedure. In general, smoothing by adjusting a penalty parameter is at best annoying and at worst hazardous. I would rather apply smoothing afterwards to the unpenalized function than try to guess beforehand what to do.

Fig. 16. Iteration 16, Y coordinate of phantom at X = 0. One million tube count problem.

C. Pictures

When comparing objective functions and algorithms designed to optimize these functions, the pictures produced are the most important measure. However, as pointed out by Herman and Odhner [12], the utility of a picture depends on the ultimate application: a pleasing picture that does not give sufficient information is useless, and a bespeckled picture that



Picture 2. Images after iteration 8 with the 10 million tube count problem.

Picture 4. Images after iteration 32 with the 10 million tube count problem.

Picture 3. Images after iteration 16 with the 10 million tube count problem.

Picture 5. Images after iteration 32 with the 10 million tube count problem with various penalizing terms.

captures exactly the right information is useful. Moreover, a picture that would be good for one application might not be good for another.

Pictures 2-4 give iterations 8, 16, and 32 for the 10 million count problem with 128 detectors. Since the major portion of the work of each iteration for each method is the two matrix multiplications with the P matrix, the pictures show the methods at comparable points in the computation. The speckling that five years ago caused concern about the EM algorithm applied to the maximum likelihood function also appears when the EM algorithm and its variants are applied to the least squares merit function. For the PCG algorithm, the speckling appears sooner, but the small tumors appear more distinct earlier than in the nonaccelerated algorithms. The different orientations of the two lower tumors seem to be evident by iteration 16 for PCG and iteration 32 for EM-LS, but are not discernible for the EM-ML approach. For some medical applications, this may be important. In Pictures 2-4, PCGO indicates starting with an initial guess of 0. If the ith component of the gradient is positive, then w_{ii} is initially set to 0; otherwise, it is initially set to 1. Theoretically, in the absence of roundoff error, the solution will lie entirely in the range space of P^T. Thus, when the least squares solution is not unique, PCGO will choose the one with minimum norm. There appears to be little difference between the solutions obtained using PCG and PCGO.

Picture 5 gives the 32nd iteration for various penalized least squares functions obtained using the preconditioned conjugate gradient algorithm outlined in Section III. For LN and LC, both γ and δ were set to 100. For MAP, a penalty parameter of 1.0 smoothed out the tumor too much. However, later in Picture 10, for the problem with 1 million tube counts, the picture produced with γ set to 1.0 is better than that obtained with γ = 0.1, the better setting in the 10 million tube count case. Thus, we see the sensitivity of the parameter to the tube count. Hopefully, in a production situation, tube counts will not differ by a factor of 10, or else one really needs a trained person manipulating penalty parameters. In truth, the penalty parameters for all of the penalty functions were determined by first guessing one parameter, looking at the function values produced, and adjusting the parameters so that convergence would be obtained when f(x) was about that given by iteration 10 or 14 using the preconditioned conjugate gradient algorithm on the nonpenalized function. With all the penalty functions, one could choose parameters which on convergence gave a much more pleasing picture than the converged nonpenalty function case. However, there does not appear to be that much difference between iteration 8 of the PCG algorithm for the nonpenalized case and the converged penalized pictures.

Picture 6 treats the problem with 256 detectors and 10 million tube counts. With this problem, there is no question whether there is a unique answer, since the number of boxes


Picture 6. Images after iteration 16 with the 10 million tube count problem with 256 detectors.

    Picture 7. Images after iteration 8 with the 1 million tube count problem.

remained at 128². The picture tends to corroborate our previous results. The preconditioned conjugate gradient algorithm converged very quickly to a bespeckled picture. Adding a penalty term produced a less snowy picture.

Pictures 7-10 consider the 1 million tube count problem. For the PCG algorithm, the speckling is much more of a hindrance than in the previous problem, and the EM algorithm applied to the maximum likelihood function gives a clearer picture, but this might not be important for a particular application. Starting with an initial guess of 0 does not affect the situation. Smoothing using a penalty function approach does improve the pictures, but one has to be careful not to smooth out the tumors and to adjust the penalty parameter to account for the fact that f(x) will be smaller and that small changes are more significant than in the previous case. In Picture 8, the LC penalty term with γ = δ = 20 removed much more of the annoying speckling than the parameter setting used in the 10 million tube count problem that is shown in Picture 9. The reconstructions in Picture 10 obtained using the preconditioned conjugate gradient algorithm applied to various penalty functions are all rather strikingly different from each other.

The most striking comparison occurs in Picture 11, which compares the line search EM-LS algorithm with one in which the scaled gradient direction is modified slightly. The same random starting guess was used with both algorithms.

    Picture 8. Images after iteration 16 with the 1 million tube count problem.

Picture 9. Images after iteration 32 with the 1 million tube count problem.

Picture 10. Images after iteration 32 with the 1 million tube count problem with various penalizing terms.

Certainly, starting with a random guess is a bad idea for the EM approach. Variables with low values will not recover with an EM-like algorithm that scales an element of the search direction by the numerical value of the variable. The modified algorithm is a compromise between the EM algorithm and a standard active constraint approach. Whenever the gradient is negative, implying that the corresponding variable should be enlarged, that component of the gradient is multiplied by the maximum element of x when determining the search direction. Whenever a component of the gradient is positive, that component of the search direction is handled like the standard algorithm.




Picture 11. These images all used the same random starting guess. The left-hand images used the standard EM-LS algorithm. In the right-hand images, whenever the gradient was negative, it was scaled by the largest x_i to determine the search direction.

Picture 13. The left-hand images used the random starting guess of Picture 11, and the right-hand images used the starting guess of Picture 12. The images are those of iterations 8 and 16 of the EM-LS algorithm applied to the MAP function with γ = 0.1.

Picture 12. These images all used the same starting guess, with the inner square having smaller values. The left-hand images used the standard EM-LS algorithm. In the right-hand images, whenever the gradient was negative, it was scaled by the largest x_i to determine the search direction.

Thus, the modified algorithm treats decreasing variables as smoothly as the EM algorithm, but it does not hinder increasing variables. The modified algorithm is not as sensitive to the initial guess. In Picture 13, we see the iterates produced using the same random starting guess and the MAP merit function with the EM-LS algorithm. The smoothing helped a bit, but not as much as applying the modified algorithm to the nonpenalized function.

For Picture 12, the inner square was begun with numbers 3/7 less than the outer portion, and the same algorithms, which were used in Picture 11, were compared. Although the higher starting values do not seem to give a worse or different picture within the square, there is an undershoot-overshoot problem just at the boundary of the square with the standard EM-LS algorithm. This suggests that any algorithm which initially uses a coarse grid to obtain an initial approximation to the solution in an efficient manner, and then tries to refine the grid, must use care when refining the grid to avoid a shadow of the coarse grid appearing during the fine grid iteration. The modified algorithm is much less sensitive to the initial guess. It is much better than applying the unmodified algorithm to

the smoothing MAP function shown in Picture 13. Perhaps there is a penalty parameter setting and a penalty term that are not as sensitive as our results suggest. However, it seems that, in general, appending a penalty term is less effective than modifying the scaling matrix when the initial guess is not uniform.
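A minimal sketch of this modified scaling follows; it is illustrative code, not the implementation used to produce the pictures. Here x is the current nonnegative iterate, g is the gradient of the least squares or penalized least squares function, and any additional positive column scaling the full algorithm may apply is ignored.

    import numpy as np

    def modified_scaled_direction(x, g):
        # Standard EM-like scaling multiplies each gradient component by x_i,
        # which freezes variables that are currently near 0.
        # Modification: a negative gradient component (the variable should grow)
        # is scaled by the largest element of x instead, so small components are
        # not held back; a nonnegative component keeps the standard scaling, so
        # decreasing variables still approach 0 smoothly.
        scale = np.where(g < 0.0, x.max(), x)
        return -scale * g

Only the scaled gradient direction changes; the direction is then used with the same line search as in the EM-LS algorithm.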

V. CONCLUSIONS

Our results show that a preconditioned conjugate gradient approach, which marries the EM algorithm to the conjugate gradient method, applied to the least squares function or to a penalized least squares function, produces satisfactory images much faster than a line search EM-ML approach. The preconditioner takes into consideration the nonnegativity constraints as easily as is done in the EM algorithm, and one is not forced to determine whether an element is small or 0. Our choice of preconditioner also permits much more progress

in the nonzero variables than in an approach that does not use a preconditioner.

In general, the pictures do not conclusively indicate the best merit function. As many others have suggested, there are very good theoretical statistical reasons for considering a penalized approach, but penalty functions usually entail determining a penalty parameter. Our results suggest that one must use care in choosing an algorithm to optimize whatever merit function one uses. The initial guess, the stopping criteria, the parameter in the penalty function, and the ability of the algorithm to handle the nonnegativity constraints all affect the pictures.

ACKNOWLEDGMENT

The author wishes to thank E. Grosse, B. Coughran, and T. Duff for writing the local software which produced the pictures.

