
Journal of Magnetic Resonance 212 (2011) 412–417


Second order gradient ascent pulse engineering

P. de Fouquieres a, S.G. Schirmer a,⇑, S.J. Glaser b,⇑, Ilya Kuprov c,⇑

a Centre for Quantum Computation, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, United Kingdom
b Department of Chemistry, Technische Universität München, 85747 Garching, Germany
c Oxford e-Research Centre, University of Oxford, 7 Keble Road, Oxford OX1 3QG, United Kingdom

Article info

Article history: Received 17 April 2011; Revised 22 July 2011; Available online 4 August 2011

Keywords: Optimal control; GRAPE; BFGS

1090-7807/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jmr.2011.07.023

⇑ Corresponding authors. E-mail addresses: [email protected] (S.G. Schirmer), [email protected] (S.J. Glaser), [email protected] (I. Kuprov).

Abstract

We report some improvements to the gradient ascent pulse engineering (GRAPE) algorithm for optimal control of spin ensembles and other quantum systems. These include more accurate gradients, convergence acceleration using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton algorithm, as well as faster control derivative calculation algorithms. In all test systems, the wall clock time and the convergence rates show a considerable improvement over the approximate gradient ascent.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

An optimal control problem consists in bringing a dynamic system from one state to another to a given accuracy with minimum expenditure of effort [1–3]. Such tasks are encountered in optical spectroscopy [4–6], magnetic resonance [7–12], spin dynamics [13,14] and the emerging field of quantum information processing [15–19]. While many variations exist in practice [20–22], depending on the desired outcome and the constraints placed on the solution by instrumental limitations [1,23,24], they can all be broadly classified into gate design problems [11,25], where a specific unitary transformation of the entire state space is sought, and state control problems [10,26–28], where the population is to be moved from one specific state to another without conditions on the dynamics of other states. Because any gate design problem can be represented as a state control problem in a space of higher dimension [23,25], we will only consider the state control formulation below.

The state of a quantum system can be described by a density operator $\hat{\rho}(t)$, whose evolution is governed by the quantum Liouville equation [29]:

$$\frac{\partial}{\partial t}\hat{\rho}(t) = -i\left[\hat{H}(t), \hat{\rho}(t)\right] + \hat{\hat{R}}\hat{\rho}(t) \tag{1}$$

where $\hat{H}(t)$ is a possibly time-dependent Hamiltonian and $\hat{\hat{R}}$ is the relaxation superoperator. It is often convenient to carry out the calculations in Liouville space by replacing a matrix representation of $\hat{\rho}(t)$ with a vector $|\hat{\rho}(t)\rangle$ obtained by stacking the columns of $\hat{\rho}(t)$. In this representation, the equation acquires the following form:

$$\frac{\partial}{\partial t}|\hat{\rho}(t)\rangle = -i\hat{\hat{L}}(t)|\hat{\rho}(t)\rangle, \qquad \hat{\hat{L}}(t) = E \otimes \hat{H}(t) - \hat{H}(t)^{T} \otimes E + i\hat{\hat{R}} \tag{2}$$

where $E$ is the unit matrix of the same dimension as $\hat{H}$ [29] and $\hat{\hat{R}}$ is the Liouville space representation of the relaxation superoperator. The general solution may be formally written as:

$$|\hat{\rho}(t)\rangle = \exp_{(O)}\left[-i\int_{0}^{t}\hat{\hat{L}}(t')\,dt'\right]|\hat{\rho}(0)\rangle \tag{3}$$

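where $\exp_{(O)}$ indicates Dyson's time-ordered exponential [30]. Given a fixed grid of points $\{t_1, \ldots, t_N\}$, the density matrix at a particular grid point $n$ is then given by:

$$|\hat{\rho}(t_n)\rangle = \hat{\hat{P}}_n \cdots \hat{\hat{P}}_2\hat{\hat{P}}_1|\hat{\rho}(0)\rangle, \qquad \hat{\hat{P}}_n = \exp_{(O)}\left[-i\int_{t_{n-1}}^{t_n}\hat{\hat{L}}(t)\,dt\right] \tag{4}$$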
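The column-stacking convention behind Eq. (2) can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `liouvillian` and `vec`, the single spin-1/2 test system and all numerical values are hypothetical and not taken from the paper, and relaxation is neglected.

```python
import numpy as np

def liouvillian(H):
    """L = E (x) H - H^T (x) E for column-stacked density matrices (no relaxation)."""
    E = np.eye(H.shape[0])
    return np.kron(E, H) - np.kron(H.T, E)

def vec(rho):
    """Stack the columns of rho into a single vector (column-major order)."""
    return rho.reshape(-1, order="F")

# Single spin-1/2 operators; the Hamiltonian and state below are illustrative only
Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]])
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
H = 2.0 * Sz + 0.7 * Sx
rho = 0.5 * np.eye(2) + 0.3 * Sy

# Right-hand side of Eq. (2) versus the vectorized right-hand side of Eq. (1)
lhs = -1j * liouvillian(H) @ vec(rho)
rhs = vec(-1j * (H @ rho - rho @ H))
print(np.allclose(lhs, rhs))
```

The agreement relies on the identity vec(AXB) = (Bᵀ ⊗ A) vec(X) for column-major stacking; with row-major stacking the two Kronecker factors would swap places.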

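A state control problem consists in finding such $\hat{H}(t)$ as would maximize the population of a given target density matrix $|\hat{\sigma}\rangle$ after evolution from a given initial state $|\hat{\rho}(0)\rangle$ under the total Liouvillian $\hat{\hat{L}}(t)$ [23,28]:

$$\hat{H}_{\mathrm{opt}}(t) \in \arg\max_{\hat{H}(t)}\left(\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle\right) \tag{5}$$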

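where the maximum is sought in the class of square-integrable Hermitian matrix valued functions of time. From the experimental perspective, not every part of $\hat{H}(t)$ can be modified at will, and it is common to separate it into the "drift" and the "control" parts:

$$\hat{H}(t) = \hat{H}_0 + \sum_k c^{(k)}(t)\hat{H}_k \;\Rightarrow\; \hat{\hat{L}}(t) = \hat{\hat{L}}_0 + \sum_k c^{(k)}(t)\hat{\hat{L}}_k$$

$$\hat{\hat{L}}_0 = E \otimes \hat{H}_0 - \hat{H}_0^{T} \otimes E + i\hat{\hat{R}}, \qquad \hat{\hat{L}}_k = E \otimes \hat{H}_k - \hat{H}_k^{T} \otimes E \tag{6}$$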


with $\hat{\hat{L}}_0$ being the "drift" component, deemed to be beyond our direct influence, and $\hat{\hat{L}}_k$ being the "control" components, whose contributions may be varied experimentally [13]. Various constraints are often placed on the control functions $c^{(k)}(t)$, mostly to enforce the instrumental limitations [20,21]. The optimization problem in Eq. (5) is difficult to solve in full generality, and it is common to simplify the description of $\hat{H}(t)$ by assuming the control functions to be piecewise-constant with fixed switching times [9,12]:

$$c^{(k)}(t) = c^{(k)}_n, \qquad t_{n-1} < t < t_n \tag{7}$$

This makes the problem finite-dimensional and facilitates numerical solutions. In practice this is often not an approximation, since the actual output of many hardware devices, e.g. waveform generators in NMR spectroscopy, can be made piecewise-constant. Under this assumption, the time-ordered exponential (a notoriously complicated object from the numerical calculation perspective) in Eq. (4) simplifies into a simple matrix exponential:

$$\hat{\hat{P}}_n = \exp_{(O)}\left[-i\int_{t_{n-1}}^{t_n}\hat{\hat{L}}(t)\,dt\right] = \exp\left[-i\left(\hat{\hat{L}}_0 + \sum_k c^{(k)}_n\hat{\hat{L}}_k\right)\Delta t\right] \tag{8}$$

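where $\Delta t$ is the time grid spacing. Progress can then be made with the optimization problem in Eq. (5), because the gradient of the error functional $1 - \langle\hat{\sigma}|\hat{\rho}(t_N)\rangle$ with respect to the amplitude of control $k$ at time step $n$ is now easily computed:

$$\begin{aligned}
\frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle
&= \frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\hat{P}}_N \cdots \hat{\hat{P}}_n \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle \\
&= \langle\hat{\sigma}|\hat{\hat{P}}_N \cdots \hat{\hat{P}}_{n+1}\frac{\partial\hat{\hat{P}}_n}{\partial c^{(k)}_n}\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle \\
&= \left(\hat{\hat{P}}^{\dagger}_{n+1} \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\frac{\partial\hat{\hat{P}}_n}{\partial c^{(k)}_n}\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right)
\end{aligned} \tag{9}$$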

This effectively means that the destination state has to be propagated backward to time point $t_n$, the source state has to be propagated forward to time point $t_{n-1}$ and the scalar product with the derivative of the propagator for the time step $n$ has to be taken. In practice [7,9,12,28], the entire forward trajectory is computed from $\hat{\rho}(0)$, the entire backward trajectory is computed from $\hat{\sigma}$ and then the two are folded in each step with the propagator derivatives as prescribed by Eq. (9).
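The forward and backward trajectories described above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function names, the single-spin system, the random amplitudes and the time step are all assumptions, and relaxation is neglected so that the Liouvillian is Hermitian and its exponential can be taken by diagonalization.

```python
import numpy as np

def liouvillian(H):
    """E (x) H - H^T (x) E, acting on column-stacked density matrices."""
    E = np.eye(H.shape[0])
    return np.kron(E, H) - np.kron(H.T, E)

def step_propagator(L, dt):
    """exp(-i L dt) for a Hermitian L (relaxation neglected in this sketch)."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def trajectories(c, L0, Lk, dt, rho0, sigma):
    """fwd[n] = P_n...P_1|rho(0)> and bwd[n] = P_{n+1}^+...P_N^+|sigma>, n = 0..N."""
    P = [step_propagator(L0 + sum(a * Lc for a, Lc in zip(cn, Lk)), dt) for cn in c]
    fwd = [rho0]
    for Pn in P:                       # forward sweep from the source state
        fwd.append(Pn @ fwd[-1])
    bwd = [sigma]
    for Pn in reversed(P):             # backward sweep from the target state
        bwd.insert(0, Pn.conj().T @ bwd[0])
    return fwd, bwd

# Illustrative single spin-1/2 with an offset term and x, y controls
Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]])
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
L0 = liouvillian(2 * np.pi * Sz)
Lk = [liouvillian(Sx), liouvillian(Sy)]
rho0 = Sz.reshape(-1, order="F") / np.linalg.norm(Sz)   # normalized source state
sigma = -rho0                                           # inversion target
c = np.random.default_rng(0).uniform(-5, 5, size=(8, 2))
fwd, bwd = trajectories(c, L0, Lk, 0.05, rho0, sigma)

# <sigma|rho(t_N)> can be read off at any grid point as <bwd[n]|fwd[n]>
overlaps = [np.vdot(b, f).real for b, f in zip(bwd, fwd)]
print(overlaps[0], overlaps[-1])
```

Because the overlap of the two trajectories is the same at every grid point, both sweeps need to be computed only once per gradient evaluation; the propagator derivative is then inserted between `bwd[n]` and `fwd[n-1]` for each step.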

2. Calculation of control derivatives

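The expression for the propagator derivative suggested in the paper introducing the GRAPE method [9] reasonably assumes that the control sequence discretization step is small:

$$\begin{aligned}
\frac{\partial}{\partial c^{(k)}_n}\hat{\hat{P}}_n
&= \frac{\partial}{\partial c^{(k)}_n}\exp\left[-i\left(\hat{\hat{L}}_0 + \sum_k c^{(k)}_n\hat{\hat{L}}_k\right)\Delta t\right] \\
&= \exp\left[-i\left(\hat{\hat{L}}_0 + \sum_k c^{(k)}_n\hat{\hat{L}}_k\right)\Delta t\right]\left(-i\hat{\hat{L}}_k\Delta t + O(\Delta t^2)\right) \\
&= \hat{\hat{P}}_n\left(-i\hat{\hat{L}}_k\Delta t\right) + O(\Delta t^2)
\end{aligned} \tag{10}$$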

This assumption makes the evaluation of the control gradient very computationally affordable: it introduces no new matrix–matrix multiplications beyond those used to compute the propagators, because $-i\hat{\hat{L}}_k\Delta t$ and $\hat{\hat{P}}_n$ can be multiplied sequentially into the vectors on either side of the derivative in Eq. (9). The cost of the control gradient is therefore approximately equal to the cost of the trajectory calculation. From Eq. (9), we have:

$$\frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle = \left(\hat{\hat{P}}^{\dagger}_n \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\left(-i\hat{\hat{L}}_k\Delta t\right)\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right) + O(\Delta t^2) \tag{11}$$

While the gradient ascent using this equation does in most cases yield acceptable accuracy solutions, it has been recognized for some time that the $O(\Delta t^2)$ residual tends to limit both the convergence rate and the final accuracy achievable. As the gradient becomes smaller during the minimization, it is the first term in Eq. (11) that gets reduced, and the situation eventually emerges where the approximation error dominates the gradient. This leads to the often observed and much lamented "slowdown" of the GRAPE algorithm as it approaches high transfer fidelities. It also scrambles the approximate Hessians used by quasi-Newton methods, essentially preventing their use. This may be seen directly by following the $O(\Delta t^2)$ residual through Newton's method:

$$\begin{aligned}
f(\vec{x} + \Delta\vec{x}) &= f(\vec{x}) + \left(\nabla f(\vec{x})^{T} + O(\Delta t^2)\right)\Delta\vec{x} + \frac{1}{2}\Delta\vec{x}^{T}\mathbf{H}\Delta\vec{x} + O(|\Delta\vec{x}|^3) \\
&= f(\vec{x}) + \nabla f(\vec{x})^{T}\Delta\vec{x} + \frac{1}{2}\Delta\vec{x}^{T}\mathbf{H}\Delta\vec{x} + O(\Delta t^2|\Delta\vec{x}|) + O(|\Delta\vec{x}|^3)
\end{aligned} \tag{12}$$

Unless the time step $\Delta t$ is chosen to be extremely small, the $O(\Delta t^2|\Delta\vec{x}|)$ error term, which is linear in $|\Delta\vec{x}|$, completely obscures the Hessian term, which is quadratic in $|\Delta\vec{x}|$. For typical NMR systems, this accuracy constraint places $\Delta t$ into the nanosecond range, which makes the number of steps very large and causes difficulties on the instrumental side.
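The behaviour of the first-order gradient in Eq. (11) can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy example rather than the authors' code: the function names, the single-spin system, the amplitudes and both time steps are hypothetical, relaxation is neglected, and a central finite difference of the fidelity is used as the reference.

```python
import numpy as np

def liouvillian(H):
    E = np.eye(H.shape[0])
    return np.kron(E, H) - np.kron(H.T, E)

def step_propagator(L, dt):
    """exp(-i L dt) for a Hermitian L (no relaxation in this sketch)."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def fidelity(c, L0, Lk, dt, rho0, sigma):
    rho = rho0
    for cn in c:
        rho = step_propagator(L0 + sum(a * Lc for a, Lc in zip(cn, Lk)), dt) @ rho
    return np.vdot(sigma, rho).real

def gradient_first_order(c, L0, Lk, dt, rho0, sigma):
    """Eq. (11) without the O(dt^2) residual; step index n is 0-based here."""
    P = [step_propagator(L0 + sum(a * Lc for a, Lc in zip(cn, Lk)), dt) for cn in c]
    fwd = [rho0]                               # fwd[n]: state before step n
    for Pn in P[:-1]:
        fwd.append(Pn @ fwd[-1])
    bwd, chi = [None] * len(P), sigma          # bwd[n] = P_n^+ ... P_N^+ |sigma>
    for n in reversed(range(len(P))):
        chi = P[n].conj().T @ chi
        bwd[n] = chi
    grad = np.empty(c.shape)
    for n in range(len(P)):
        for k, Lc in enumerate(Lk):
            grad[n, k] = np.vdot(bwd[n], -1j * dt * (Lc @ fwd[n])).real
    return grad

def gradient_reference(c, L0, Lk, dt, rho0, sigma, h=1e-6):
    """Central finite difference of the fidelity, used only as a check."""
    grad = np.empty(c.shape)
    for idx in np.ndindex(*c.shape):
        cp, cm = c.copy(), c.copy()
        cp[idx] += h
        cm[idx] -= h
        grad[idx] = (fidelity(cp, L0, Lk, dt, rho0, sigma)
                     - fidelity(cm, L0, Lk, dt, rho0, sigma)) / (2 * h)
    return grad

Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]])
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
L0, Lk = liouvillian(2 * np.pi * Sz), [liouvillian(Sx), liouvillian(Sy)]
rho0 = Sz.reshape(-1, order="F") / np.linalg.norm(Sz)
sigma = -rho0
c = np.random.default_rng(0).uniform(-5, 5, size=(10, 2))

def max_error(dt):
    args = (c, L0, Lk, dt, rho0, sigma)
    return np.max(np.abs(gradient_first_order(*args) - gradient_reference(*args)))

err_fine, err_coarse = max_error(1e-3), max_error(5e-2)
print(err_fine, err_coarse)
```

In this toy setting the deviation of the first-order gradient grows rapidly with the time step, which is the mechanism behind the residual term in Eq. (12).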

In our experience, these problems can be removed at a reasonable computational cost if the exact propagator gradient is used, rather than the first order approximation. The most straightforward avenue is to differentiate the Taylor or Chebyshev expansion for the exponential directly [31]. In the case of the Taylor series, this yields:

$$\frac{\partial}{\partial c^{(k)}_n}\exp\left[-i\hat{\hat{L}}\Delta t\right] = \sum_{p=1}^{\infty}\frac{(-i\Delta t)^p}{p!}\sum_{q=0}^{p-1}\hat{\hat{L}}^{q}\hat{\hat{L}}_k\hat{\hat{L}}^{p-q-1}, \qquad \hat{\hat{L}} = \hat{\hat{L}}_0 + \sum_k c^{(k)}_n\hat{\hat{L}}_k \tag{13}$$

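The second sum appears because $\hat{\hat{L}}$ and $\hat{\hat{L}}_k$ do not necessarily commute. This formulation is computationally about as expensive as the original exponential because the matrices involved (particularly $\hat{\hat{L}}_k$) are often very sparse [32,33], but it is rather inconvenient because it involves double summation. A more computer-friendly version is given by a commutator series, which is the direct extension of Eq. (10):

$$\begin{aligned}
\frac{\partial}{\partial c^{(k)}_n}\hat{\hat{P}}_n &= \frac{\partial}{\partial c^{(k)}_n}\exp\left[-i\hat{\hat{L}}\Delta t\right] \\
&= \exp\left[-i\hat{\hat{L}}\Delta t\right]\left(-i\hat{\hat{L}}_k\Delta t + \frac{\Delta t^2}{2}\left[\hat{\hat{L}}, \hat{\hat{L}}_k\right] + \frac{i\Delta t^3}{6}\left[\hat{\hat{L}}, \left[\hat{\hat{L}}, \hat{\hat{L}}_k\right]\right] - \frac{\Delta t^4}{24}\left[\hat{\hat{L}}, \left[\hat{\hat{L}}, \left[\hat{\hat{L}}, \hat{\hat{L}}_k\right]\right]\right] + \cdots\right)
\end{aligned} \tag{14}$$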

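This expression can be obtained by rotating summation indices in Eq. (13):

$$\sum_{p=1}^{\infty}\frac{(-i\Delta t)^p}{p!}\sum_{q=0}^{p-1}\hat{\hat{L}}^{q}\hat{\hat{L}}_k\hat{\hat{L}}^{p-q-1} = \sum_{p=0}^{\infty}\sum_{q=0}^{\infty}\frac{A^{p}BA^{q}}{(p+q+1)!}, \qquad A = -i\hat{\hat{L}}\Delta t, \quad B = -i\hat{\hat{L}}_k\Delta t, \tag{15}$$

splitting the factorial in the denominator, then summing the series into matrix exponentials

$$\frac{1}{(p+q+1)!} = \frac{1}{p!\,q!}\int_0^1 (1-\alpha)^{p}\alpha^{q}\,d\alpha \;\Rightarrow\; \sum_{p=0}^{\infty}\sum_{q=0}^{\infty}\frac{A^{p}BA^{q}}{(p+q+1)!} = e^{A}\int_0^1 e^{-\alpha A}Be^{\alpha A}\,d\alpha, \tag{16}$$

evaluating the integral

$$\int_0^1 e^{-\alpha A}Be^{\alpha A}\,d\alpha = \int_0^1 e^{-\alpha\,\mathrm{ad}_A}B\,d\alpha = \gamma[-\mathrm{ad}_A]B, \qquad \gamma(z) = \frac{e^{z}-1}{z} = \sum_{n=0}^{\infty}\frac{z^{n}}{(n+1)!}, \quad \mathrm{ad}_x y = [x, y] \tag{17}$$

and expressing powers of $\mathrm{ad}_A$ as nested commutators with $A$:

$$\sum_{m=0}^{\infty}\frac{(-\mathrm{ad}_A)^{m}}{(m+1)!}B = \sum_{m=0}^{\infty}\frac{(-1)^{m}}{(m+1)!}[A, B]_m, \qquad [A, B]_m = [A, [A, B]_{m-1}], \quad [A, B]_0 = B \tag{18}$$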

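With Eqs. (14)–(18) in place, the expression for the control gradient becomes:

$$\frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle = -\left(\hat{\hat{P}}^{\dagger}_n \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\left(\sum_{m=0}^{\infty}\frac{(i\Delta t)^{m+1}}{(m+1)!}\left[\hat{\hat{L}}_0 + \sum_k c^{(k)}_n\hat{\hat{L}}_k,\; \hat{\hat{L}}_k\right]_m\right)\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right) \tag{19}$$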

where the summation of the series is to be continued until the desired accuracy (as indicated by the residual norm) is achieved. A few initial orders of accuracy in the Taylor expansion of $\gamma(z)$ are plotted in Fig. 1.
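The commutator series of Eqs. (14) and (18) lends itself to a simple recursion, sketched below. The function names, the random Hermitian test matrices, the stopping rule and the time step are illustrative assumptions; the reference propagator is obtained by diagonalization, which presumes a Hermitian generator (no relaxation).

```python
import numpy as np

def propagator(L, dt):
    """exp(-i L dt) for Hermitian L via eigendecomposition (reference only)."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def propagator_derivative(L, Lk, dt, tol=1e-15, max_terms=100):
    """d/dc exp(-i L dt) = exp(-i L dt) * sum_m (-1)^m [A,B]_m / (m+1)!,
    with A = -i L dt and B = -i Lk dt, summed until the terms are negligible."""
    A = -1j * dt * L
    term = -1j * dt * Lk                  # m = 0 term: B
    series = term.copy()
    for m in range(1, max_terms):
        # recursion: T_m = -[A, T_{m-1}] / (m + 1)
        term = -(A @ term - term @ A) / (m + 1)
        series = series + term
        if np.linalg.norm(term) <= tol * np.linalg.norm(series):
            break
    return propagator(L, dt) @ series

rng = np.random.default_rng(1)

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

L0, Lk, c, dt = random_hermitian(4), random_hermitian(4), 0.7, 0.1
analytic = propagator_derivative(L0 + c * Lk, Lk, dt)

# Central finite difference of the step propagator, used only as a check
h = 1e-6
numeric = (propagator(L0 + (c + h) * Lk, dt) - propagator(L0 + (c - h) * Lk, dt)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))
```

Each term is obtained from the previous one with a single commutator, so the cost per order is two matrix products; truncating the loop after the first term gives the first-order expression of Eq. (10).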

The numerical accuracy of Eq. (19) in finite-precision arithmetic merits further discussion. As with all power series, the perfect scenario from the numerical point of view is to have the norm of the argument $-i\hat{\hat{L}}\Delta t$ scaled into the unit interval; this avoids the "hump" in the convergence and keeps the terms well within the accuracy limits imposed by 64-bit floating-point arithmetic. As Fig. 2 demonstrates, adequate numerical accuracy is maintained for $\|-i\hat{\hat{L}}\Delta t\| < 30$, but deteriorates rapidly thereafter. The standard technique used to resolve this issue is known as "scaling and squaring" [31,34]:

$$\exp\left(-i\hat{\hat{L}}\Delta t\right) = \left[\exp\left(\frac{-i\hat{\hat{L}}\Delta t}{2}\right)\right]^{2} \tag{20}$$

The product rule for the derivative makes the scaling and squaring procedure for the derivative propagator slightly different:

$$\frac{\partial}{\partial c^{(k)}_n}\exp\left(-i\hat{\hat{L}}\Delta t\right) = \exp\left(\frac{-i\hat{\hat{L}}\Delta t}{2}\right)\frac{\partial}{\partial c^{(k)}_n}\left[\exp\left(\frac{-i\hat{\hat{L}}\Delta t}{2}\right)\right] + \frac{\partial}{\partial c^{(k)}_n}\left[\exp\left(\frac{-i\hat{\hat{L}}\Delta t}{2}\right)\right]\exp\left(\frac{-i\hat{\hat{L}}\Delta t}{2}\right) \tag{21}$$

Fig. 1. Taylor approximation accuracy for |γ(z)| as a function of approximation order and matrix norm (curves shown for second, third and fourth order). It should be noted that only power series are in practice affordable in Eq. (17); any rational approximation would require computationally expensive matrix inversions.

Because the exponential propagator itself is computed elsewhere in the GRAPE procedure [9], the cost of the squaring step is modest: two sparse matrix multiplications. Our experience with this procedure has been very positive. It tolerates scaling factors in excess of $10^6$, thus encompassing all practically encountered GRAPE algorithm application situations.
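The combined use of Eqs. (20) and (21) can be sketched as a short recursion. The code below is an illustration under assumptions, not the published implementation: the function names, the number of series terms, the number of squarings and the random Hermitian test matrices are hypothetical, and dense NumPy arrays stand in for the sparse matrices mentioned in the text.

```python
import numpy as np

def taylor_pair(L, Lk, dt, n_terms=20):
    """exp(-i L dt) and its control derivative from truncated series.
    Intended only for small ||L dt||, i.e. after scaling."""
    A = -1j * dt * L
    P = np.eye(L.shape[0], dtype=complex)
    term = P.copy()
    for p in range(1, n_terms):            # Taylor series of the exponential
        term = term @ A / p
        P = P + term
    comm = -1j * dt * Lk
    series = comm.copy()
    for m in range(1, n_terms):            # commutator series, Eq. (18)
        comm = -(A @ comm - comm @ A) / (m + 1)
        series = series + comm
    return P, P @ series

def scaled_pair(L, Lk, dt, n_squarings):
    """Scale dt by 2**n_squarings, then square back up."""
    P, dP = taylor_pair(L, Lk, dt / 2 ** n_squarings)
    for _ in range(n_squarings):
        dP = P @ dP + dP @ P                # Eq. (21), uses P before squaring
        P = P @ P                           # Eq. (20)
    return P, dP

def reference_propagator(L, dt):
    """Diagonalization-based exp(-i L dt) for Hermitian L, used as a check."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

rng = np.random.default_rng(2)

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

L0, Lk, dt = random_hermitian(4), random_hermitian(4), 20.0   # large ||L dt||
P, dP = scaled_pair(L0, Lk, dt, n_squarings=10)

h = 1e-6
dP_ref = (reference_propagator(L0 + h * Lk, dt)
          - reference_propagator(L0 - h * Lk, dt)) / (2 * h)
print(np.max(np.abs(P - reference_propagator(L0, dt))), np.max(np.abs(dP - dP_ref)))
```

The derivative update must use the propagator from before the squaring, which is why the two assignments inside the loop appear in that order.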

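If relaxation is negligible and the Hamiltonian dimension is small enough to permit numerical diagonalization, the series in Eq. (19) may be avoided because we can evaluate $\gamma[\mathrm{ad}(i\hat{H}\Delta t)](-i\hat{H}_k\Delta t)$ directly by diagonalizing $\hat{H}$ [8,22,31]. Let $\hat{H} = V\Lambda V^{\dagger}$, where $V$ is a unitary matrix whose columns are eigenvectors of $\hat{H}$ and $\Lambda$ is a diagonal matrix with the corresponding eigenvalues $\lambda_r$ along the diagonal. We then have:

$$D^{(k)} = \gamma\left[\mathrm{ad}(i\hat{H}\Delta t)\right]\left(-i\hat{H}_k\Delta t\right) = V\left(G \odot B\right)V^{\dagger}, \qquad G_{rs} = \gamma[i(\lambda_r - \lambda_s)\Delta t], \quad B = V^{\dagger}\left(-i\hat{H}_k\Delta t\right)V \tag{22}$$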

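where $\odot$ denotes element-wise (Hadamard) matrix multiplication. Using this formula, we have:

$$\frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle = \left(\hat{\hat{P}}^{\dagger}_n \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\hat{\hat{D}}^{(k)}\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right), \qquad \hat{\hat{D}}^{(k)} = D^{(k)} \otimes E - E \otimes D^{(k)T} \tag{23}$$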

This method is the adaptation of the diagonalization method for matrix functions: the matrix is transformed into its eigenframe, the function is applied to the eigenvalues and the result is transformed back into the original frame. It is not applicable to Hamiltonians with dimensions in excess of $10^4$, because the eigenvector array $V$ is often dense even for sparse Hamiltonians and overflows the computer memory.
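The Hilbert-space part of this procedure, Eq. (22), can be sketched in a few lines. Everything named here (`gamma_fn`, `propagator_derivative_eig`, the random test Hamiltonians, the amplitude and the time step) is an illustrative assumption; the block only checks the derivative of exp(−iĤΔt) itself and does not assemble the Liouville-space object of Eq. (23).

```python
import numpy as np

def gamma_fn(z):
    """gamma(z) = (exp(z) - 1)/z, with a short Taylor expansion near z = 0."""
    z = np.asarray(z, dtype=complex)
    small = np.abs(z) < 1e-4
    safe = np.where(small, 1.0, z)          # avoids division by zero
    return np.where(small, 1 + z / 2 + z ** 2 / 6, (np.exp(safe) - 1) / safe)

def propagator_derivative_eig(H, Hk, dt):
    """d/dc exp(-i H dt) = exp(-i H dt) V (G o B) V^+ for Hermitian H, Eq. (22)."""
    lam, V = np.linalg.eigh(H)
    G = gamma_fn(1j * dt * (lam[:, None] - lam[None, :]))
    B = V.conj().T @ (-1j * dt * Hk) @ V
    D = V @ (G * B) @ V.conj().T             # Hadamard product in the eigenframe
    P = (V * np.exp(-1j * lam * dt)) @ V.conj().T
    return P @ D

def propagator(H, dt):
    lam, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * lam * dt)) @ V.conj().T

rng = np.random.default_rng(3)

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

H0, Hk, c, dt = random_hermitian(5), random_hermitian(5), 0.4, 0.3
analytic = propagator_derivative_eig(H0 + c * Hk, Hk, dt)

h = 1e-6   # central finite difference, used only as a check
numeric = (propagator(H0 + (c + h) * Hk, dt) - propagator(H0 + (c - h) * Hk, dt)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))
```

The diagonal and degenerate entries of G correspond to γ(0) = 1, which is why the function needs a well-defined limit at zero argument.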

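Eqs. (13)–(23) present an analytical formalism for the calculation of the control derivatives. A popular numerical alternative is to use finite-difference approximations, e.g.:

$$\frac{\partial\hat{\hat{P}}_n}{\partial c^{(k)}_n} = \frac{\hat{\hat{P}}_n(\ldots, c^{(k)}_n + h, \ldots) - \hat{\hat{P}}_n(\ldots, c^{(k)}_n, \ldots)}{h} + O(h) \tag{24}$$

$$\frac{\partial\hat{\hat{P}}_n}{\partial c^{(k)}_n} = \frac{\hat{\hat{P}}_n(\ldots, c^{(k)}_n + h, \ldots) - \hat{\hat{P}}_n(\ldots, c^{(k)}_n - h, \ldots)}{2h} + O(h^2) \tag{25}$$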

where the amplitude of the $k$th control at the $n$th time point is varied by a finite amount $h$. Eqs. (24) and (25) are indicative: they are the simplest examples of a large class of numerical finite-difference approximations for the derivative [35]. The primary balance to be maintained in this approach is between the approximation accuracy, the numerical accuracy and the computational cost of the derivative.

The forward finite difference approximation in Eq. (24) has the advantage of being computationally affordable: it only requires the calculation of one extra $\exp(-i\hat{\hat{L}}\Delta t)\hat{\rho}$ product per step, which may be carried out using Krylov subspace techniques, thus avoiding matrix multiplications. From Eq. (9):

$$\begin{aligned}
\frac{\partial}{\partial c^{(k)}_n}\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle
&= \left(\hat{\hat{P}}^{\dagger}_{n+1} \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\frac{\partial\hat{\hat{P}}_n}{\partial c^{(k)}_n}\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right) \\
&= \left(\hat{\hat{P}}^{\dagger}_{n+1} \cdots \hat{\hat{P}}^{\dagger}_N|\hat{\sigma}\rangle\right)^{\dagger}\frac{\hat{\hat{P}}_n(c^{(k)}_n + h) - \hat{\hat{P}}_n(c^{(k)}_n)}{h}\left(\hat{\hat{P}}_{n-1} \cdots \hat{\hat{P}}_1|\hat{\rho}(0)\rangle\right) + O(h)
\end{aligned} \tag{26}$$

Fig. 2. Numerical convergence profiles in double precision for the Taylor approximation of γ(z) as functions of approximation order and matrix norm (curves shown for z = 10 and z = 80; horizontal axis: order of approximation). The Y axis shows the norm of the difference between the value of γ(z) evaluated directly and using the series expansion γ_a(z) to a given order. Due to the presence of numerical round-off errors, the scaling and squaring procedure is mandatory for z > 30.

Fig. 3. Norm of the deviation of the finite difference derivative (fourth order central finite difference approximation) from the limit of the commutator series in Eq. (19) as a function of the differentiation parameter step (horizontal axis: finite difference step / a.u.).

Approximations with a higher order of accuracy may be used at the expense of having to calculate further $\exp(-i\hat{\hat{L}}\Delta t)\hat{\rho}$ products for the extra stencil points. In common with the commutator series approach in Eq. (19), the finite difference method is applicable to dissipative quantum systems, where anti-Hermitian terms may be present in the Liouvillian. It should be noted that Eq. (26) only involves a finite difference with respect to the step propagator; the rest of the trajectory does not need to be recomputed. This is a much more efficient arrangement as compared to the brute-force finite-differencing of $\langle\hat{\sigma}|\hat{\rho}(t_N)\rangle$.

The accuracy of finite difference methods depends on the step size $h$. In practical situations, the choice is constrained in two ways: if the step is too large, the finite difference would not be a good approximation to the derivative, and if the step is too small, the number of accurate digits in the floating point representation of the difference would reduce to none. In general, we do not have sufficient information to make an a priori estimate for the finite difference approximation error (it requires the knowledge of higher derivatives), but the numerical round-off error is a somewhat more straightforward quantity. A reasonable strategy therefore is to choose the smallest $h$ for which the round-off error is guaranteed to be below a given threshold. Assuming the approximation error is indeed small for that choice of $h$, we can approximate a function $f(x)$ with a linear polynomial $f(x + h) = f(x) + f'(x)h$ for the purpose of obtaining the required round-off error bound.

The evaluation of matrix exponentials is accurate up to a fixed purely absolute error $\varepsilon_A$, which is a few orders of magnitude larger than the machine precision $\varepsilon_M$ (equal to $2.22 \times 10^{-16}$ in 64-bit arithmetic), because the norm of exponential time propagators in quantum mechanics is less than or equal to 1. The error incurred in computing $f(x + h) - f(x)$ is then $2\varepsilon_A$ plus at most $\varepsilon_M(|f(x)| + |f'(x)|h)$ from truncating their difference. The finite difference approximation $|h^{-1}[f(x + h) - f(x)] - f'(x)|$ then carries an absolute error bounded by:

$$\frac{1}{|h|}\left(2\varepsilon_A + \varepsilon_M|f(x)|\right) + \varepsilon_M|f'(x)|. \tag{27}$$

This expression may be equated to the chosen error threshold and solved for $h$, assuming that an order of magnitude estimate of $|f'(x)|$ is available. Even if the norm of $f'(x)$ cannot be estimated a priori, Eq. (27) can still be used to validate the choice of step a posteriori, using the finite difference approximation to $f'(x)$.
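Solving Eq. (27) for the step size is a one-line rearrangement, sketched below. The function names and all numerical inputs (the threshold, the assumed exponential accuracy and the magnitude estimates) are hypothetical values chosen for illustration only.

```python
import sys

def roundoff_bound(h, eps_A, f_abs, fprime_abs, eps_M=sys.float_info.epsilon):
    """Right-hand side of Eq. (27) for a given step h."""
    return (2 * eps_A + eps_M * f_abs) / abs(h) + eps_M * fprime_abs

def smallest_step(threshold, eps_A, f_abs, fprime_abs, eps_M=sys.float_info.epsilon):
    """Smallest |h| for which the bound of Eq. (27) equals the threshold."""
    margin = threshold - eps_M * fprime_abs
    if margin <= 0:
        raise ValueError("threshold is below the round-off floor eps_M * |f'(x)|")
    return (2 * eps_A + eps_M * f_abs) / margin

# Illustrative numbers: exponentials accurate to 1e-13, |f| and |f'| of order one
h = smallest_step(threshold=1e-8, eps_A=1e-13, f_abs=1.0, fprime_abs=1.0)
print(h, roundoff_bound(h, eps_A=1e-13, f_abs=1.0, fprime_abs=1.0))
```

The same `roundoff_bound` function can be used for the a posteriori check mentioned above, by inserting the finite difference estimate of |f′(x)| once it is available.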

The dependence of the approximation accuracy on the finite difference step size is illustrated in Fig. 3. For large steps the error is dominated by the approximation error of the finite difference, which drops smoothly when the step is reduced. For small steps the error is dominated by the numerical round-off errors, which increase erratically as the numerical accuracy decreases.

The choice of the differentiation algorithm is ultimately left to the user's discretion. The considerable improvement that better gradient accuracy brings to the asymptotic convergence rate is illustrated in Fig. 4. For the spin chain in question, a pulse with 100 microsecond time stepping is clearly outside the validity range of the first-order approximation in Eq. (11), and further terms in Eq. (19) are necessary to prevent the minimization process from halting when the insufficiently accurate gradient effectively sends the system uphill.

3. GRAPE with quasi-Newton optimizers

The close relationship between optimal control and numerical optimization methods is well researched [36–38] and the emerging numerical optimization methods (such as SQP [39]) have been specialized to optimal control and successfully applied, for example, to controlling fluid flow [40] or biological processes [41]. In the magnetic resonance context, the above noted fact that the control gradient of the objective function is relatively cheap to compute means that it is almost always advantageous to use the gradient history to build an approximation to the Hessian matrix, which can then be used in quasi-Newton optimization algorithms, which can exhibit super-linear convergence [42]. Because GRAPE is a concurrent update algorithm [9], the standard quasi-Newton methods may be used directly.

Fig. 4. (Left panel) Quality of state transfer, $1 - \langle\hat{\sigma}|\hat{\rho}(t_N)\rangle$, as a function of iteration number of the BFGS algorithm (as implemented in the interior-point algorithm used by Matlab's fmincon function [46,47]); curves are shown for the 1st, 2nd, 3rd and 4th order and for the exact gradient. The fidelity parameter refers to the quality of magnetization inversion under a 50-point shaped radiofrequency pulse applied to a chain of 31 protons with chemical shifts spread at regular intervals over the range of 8 ppm with strong nearest neighbor J-couplings of 20 Hz in a 600 MHz magnet. Pulse duration 5 ms (100 μs per waveform step), pulse amplitude capped at 2500 Hz. State space restriction to three-spin orders involving adjacent spins was used to reduce the matrix dimension involved in the simulation. The starting points in the optimization were set to sequences of uniformly distributed random numbers from the ±1000 Hz interval. The "kth order" labels refer to the number of commutator series terms in Eq. (19), "exact" refers to the series that has been summed to machine precision. (Right panel, horizontal axis: frequency / Hz) A: pulse-acquire NMR spectrum of the spin system described above; B: magnetization inversion profile under the pulse waveform obtained after 100 iterations with the first-order approximation to the gradient; C: magnetization inversion profile obtained after 100 iterations with the "exact" gradient computed using Eq. (19).

Fig. 5. Quality of state transfer, $1 - \langle\hat{\sigma}|\hat{\rho}(t_N)\rangle$, as a function of iteration number for three Hessian update schemes (BFGS, LBFGS(10) and DFP) as compared to cubic line search steepest descent. DFP stands for the Davidon–Fletcher–Powell method. The fidelity parameter refers to the quality of the magnetization inversion pulse in a 31-spin system as described in the caption to Fig. 4. It should be noted that the steepest descent minimization requires 5–10 function evaluations per iteration (line search) and is therefore considerably slower on the wall clock, as well as in iteration count, than the three quasi-Newton methods.

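Several schemes exist for generating approximate Hessians from the gradient history, the most notable being DFP [43] and BFGS [42]:

$$\mathbf{H}^{\mathrm{DFP}}_{k+1} = \left(E - \frac{\vec{g}_k\vec{s}_k^{T}}{\vec{g}_k^{T}\vec{s}_k}\right)\mathbf{H}_k\left(E - \frac{\vec{s}_k\vec{g}_k^{T}}{\vec{g}_k^{T}\vec{s}_k}\right) + \frac{\vec{g}_k\vec{g}_k^{T}}{\vec{g}_k^{T}\vec{s}_k},$$

$$\mathbf{H}^{\mathrm{BFGS}}_{k+1} = \mathbf{H}_k + \frac{\vec{g}_k\vec{g}_k^{T}}{\vec{g}_k^{T}\vec{s}_k} - \frac{(\mathbf{H}_k\vec{s}_k)(\mathbf{H}_k\vec{s}_k)^{T}}{\vec{s}_k^{T}\mathbf{H}_k\vec{s}_k},$$

$$\vec{g}_k = \nabla f(\vec{x}_{k+1}) - \nabla f(\vec{x}_k), \qquad \vec{s}_k = \vec{x}_{k+1} - \vec{x}_k \tag{28}$$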

These pseudo-Hessians are constructed to satisfy the natural finite difference condition:

$$\nabla f(\vec{x}_k) = \nabla f(\vec{x}_{k+1}) - \mathbf{H}_{k+1}(\vec{x}_{k+1} - \vec{x}_k) \tag{29}$$

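A necessary condition for $\mathbf{H}_{k+1}$ to be negative definite is therefore that

$$(\vec{x}_{k+1} - \vec{x}_k)^{T}\mathbf{H}_{k+1}(\vec{x}_{k+1} - \vec{x}_k) = (\vec{x}_{k+1} - \vec{x}_k)^{T}\left(\nabla f(\vec{x}_{k+1}) - \nabla f(\vec{x}_k)\right) < 0 \tag{30}$$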

but a useful property of the BFGS and DFP update rules is that it is also a sufficient condition, assuming that $\mathbf{H}_0$ was chosen to be negative definite [42]. The iteration step (with optional line search) is then performed as:

$$\vec{x}_{k+1} = \vec{x}_k - \alpha_k\mathbf{H}_k^{-1}\nabla f(\vec{x}_k), \qquad \alpha_k > 0 \tag{31}$$

where $\alpha_k$ is the line search parameter ($\alpha_k = 1$ corresponds to Newton iteration).

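Because matrix inversions are expensive, it is in practice necessary to use the corresponding update schemes for the inverse of the Hessian:

$$(\mathbf{H}^{-1})^{\mathrm{BFGS}}_{k+1} = \left(E - \frac{\vec{s}_k\vec{g}_k^{T}}{\vec{g}_k^{T}\vec{s}_k}\right)\mathbf{H}_k^{-1}\left(E - \frac{\vec{s}_k\vec{g}_k^{T}}{\vec{g}_k^{T}\vec{s}_k}\right)^{T} + \frac{\vec{s}_k\vec{s}_k^{T}}{\vec{g}_k^{T}\vec{s}_k}$$

$$(\mathbf{H}^{-1})^{\mathrm{DFP}}_{k+1} = \mathbf{H}_k^{-1} + \frac{\vec{s}_k\vec{s}_k^{T}}{\vec{g}_k^{T}\vec{s}_k} - \frac{\left(\mathbf{H}_k^{-1}\vec{g}_k\right)\left(\mathbf{H}_k^{-1}\vec{g}_k\right)^{T}}{\vec{g}_k^{T}\mathbf{H}_k^{-1}\vec{g}_k} \tag{32}$$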
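A minimal sketch of the BFGS inverse-Hessian update is given below, written in the standard form for which the finite difference condition of Eq. (29) holds after the update. The function name, the concave quadratic test problem and the starting guess are assumptions made for illustration; the sign convention follows the text (maximization, negative definite Hessians).

```python
import numpy as np

def bfgs_inverse_update(Hinv, s, g):
    """BFGS update of the inverse Hessian from a step s = x_{k+1} - x_k and a
    gradient difference g = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / (g @ s)
    M = np.eye(len(s)) - rho * np.outer(s, g)
    return M @ Hinv @ M.T + rho * np.outer(s, s)

# Concave quadratic f(x) = -x^T Q x / 2 + b^T x, so grad f(x) = -Q x + b
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Q = A @ A.T + 4 * np.eye(4)           # positive definite, the Hessian is -Q
s = rng.normal(size=4)                # an arbitrary step
g = -Q @ s                            # exact gradient difference over that step

Hinv0 = -np.eye(4)                    # negative definite starting guess
Hinv1 = bfgs_inverse_update(Hinv0, s, g)

# Secant condition of Eq. (29): H_{k+1} s = g, equivalently Hinv_{k+1} g = s
print(np.allclose(Hinv1 @ g, s))
```

With the inverse available, the step of Eq. (31) is a single matrix-vector product, `x - alpha * Hinv @ grad`, with no linear solve.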

In the case of BFGS, a very memory-efficient procedure is available for generating the next step vector directly from the past gradient history, requiring no matrix storage. It is known as limited-memory BFGS, or L-BFGS [44,45]. In the context of optimal control, the number of variables often exceeds $10^4$, and L-BFGS is the only quasi-Newton method that is capable of handling such problems. The performance of DFP, BFGS and L-BFGS for the optimization of a broadband magnetization inversion pulse in NMR spectroscopy is illustrated in Fig. 5.
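The matrix-free procedure mentioned above is commonly implemented as the two-loop recursion of Refs. [44,45]. The sketch below is an illustrative version under assumptions: the function name, the scaled-identity initial guess and the quadratic test problem are hypothetical, and no line search or safeguarding of the curvature condition is included.

```python
import numpy as np

def lbfgs_direction(grad, s_list, g_list):
    """Apply the implicit L-BFGS inverse Hessian to grad using stored pairs
    (s_i, g_i), oldest first, without forming any matrix."""
    q = grad.copy()
    alphas = []
    for s, g in zip(reversed(s_list), reversed(g_list)):   # newest to oldest
        a = (s @ q) / (g @ s)
        q -= a * g
        alphas.append(a)
    s, g = s_list[-1], g_list[-1]
    r = (s @ g) / (g @ g) * q          # scaled-identity initial inverse Hessian
    for s, g, a in zip(s_list, g_list, reversed(alphas)):  # oldest to newest
        b = (g @ r) / (g @ s)
        r += (a - b) * s
    return r

# Concave quadratic test problem: grad f(x) = -Q x + b, so g_i = -Q s_i
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
Q = A @ A.T + 6 * np.eye(6)
s_list = [rng.normal(size=6) for _ in range(3)]
g_list = [-Q @ s for s in s_list]

# The implicit matrix reproduces the most recent pair: Hinv g_last = s_last
recovered = lbfgs_direction(g_list[-1], s_list, g_list)
print(np.allclose(recovered, s_list[-1]))

# One step of Eq. (31) from a point x, with alpha = 1
x = rng.normal(size=6)
b_vec = rng.normal(size=6)
x_new = x - lbfgs_direction(-Q @ x + b_vec, s_list, g_list)
```

Only the stored vector pairs enter the computation, so the memory cost grows linearly with the number of control variables and with the history length.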

4. Conclusions and outlook

The GRAPE algorithm for control sequence optimization has the benefit of computationally affordable gradients. Using the equations reported in this paper, their accuracy may be improved beyond the first order approximation and the result used to generate approximate Hessians for quasi-Newton optimization. In all test systems, the wall clock time and the convergence rates show a considerable improvement over the approximate gradient ascent, and the "slowdown" problem disappears. The BFGS-GRAPE procedure reported in this paper is implemented in the Spinach software library [33].

Acknowledgments

The authors would like to thank Burkhard Luy, Shai Machnes, Ivan Maximov, Uwe Sander, Thomas Schulte-Herbrüggen, Thomas Skinner, Luke Edwards and David Tannor for stimulating discussions. We are also grateful to the anonymous reviewer, whose comments have helped significantly to improve the presentation of this paper. This work is supported by EPSRC (EP/F065205/1, EP/H003789/1, EP/D07192X/1, CASE/CNA/07/47), DFG (Gl 203/6-1, SFB 631), Hitachi Corporation and the EU program QESSENCE.

References

[1] F.L. Lewis, V.L. Syrmos, Optimal Control, Wiley, 1995.
[2] L.S. Pontryagin, V.G. Boltyanski, R.V. Gamkrelidze, E.F. Mischenko, Mathematical Theory of Optimal Processes, Nauka, Moscow, 1961.
[3] V.F. Krotov, Global Methods in Optimal Control Theory, Marcel Dekker, New York, 1996.
[4] C.P. Koch, M. Ndong, R. Kosloff, Two-photon coherent control of femtosecond photoassociation, Faraday Discuss. 142 (2009) 389–402.
[5] C.P. Koch, J.P. Palao, R. Kosloff, F. Masnou-Seeuws, Stabilization of ultracold molecules using optimal control theory, Phys. Rev. A 70 (2004) 013402.
[6] J.P. Palao, R. Kosloff, C.P. Koch, Protecting coherence in optimal control theory: state-dependent constraint approach, Phys. Rev. A 77 (2008).
[7] N.I. Gershenzon, K. Kobzar, B. Luy, S.J. Glaser, T.E. Skinner, Optimal control design of excitation pulses that accommodate relaxation, J. Magn. Reson. 188 (2007) 330–336.
[8] T.O. Levante, T. Bremi, R.R. Ernst, Pulse-sequence optimization with analytical derivatives. Application to deuterium decoupling in oriented phases, J. Magn. Reson. 121 (1996) 167–177.
[9] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, S.J. Glaser, Optimal control of coupled spin dynamics: design of NMR pulse sequences by gradient ascent algorithms, J. Magn. Reson. 172 (2005) 296–305.
[10] K. Kobzar, T.E. Skinner, N. Khaneja, S.J. Glaser, B. Luy, Exploring the limits of broadband excitation and inversion: II. Rf-power optimized pulses, J. Magn. Reson. 194 (2008) 58–66.
[11] Z. Tošner, S.J. Glaser, N. Khaneja, N.C. Nielsen, Effective Hamiltonians by optimal control: solid-state NMR double-quantum planar and isotropic dipolar recoupling, J. Chem. Phys. 125 (2006) 184502.
[12] Z. Tošner, T. Vosegaard, C. Kehlet, N. Khaneja, S.J. Glaser, N.C. Nielsen, Optimal control in NMR spectroscopy: numerical implementation in SIMPSON, J. Magn. Reson. 197 (2009) 120–134.
[13] N. Khaneja, R. Brockett, S.J. Glaser, Time optimal control in spin systems, Phys. Rev. A 63 (2001) 323081.
[14] N. Khaneja, T. Reiss, B. Luy, S.J. Glaser, Optimal control of spin dynamics in the presence of relaxation, J. Magn. Reson. 162 (2003) 311–319.
[15] T. Calarco, M.A. Cirone, M. Cozzini, A. Negretti, A. Recati, E. Charron, Quantum control theory for decoherence suppression in quantum gates, Int. J. Quant. Inf. 5 (2007) 207–213.
[16] T. Caneva, M. Murphy, T. Calarco, R. Fazio, S. Montangero, V. Giovannetti, G.E. Santoro, Optimal control at the quantum speed limit, Phys. Rev. Lett. 103 (2009).
[17] U.V. Poulsen, S. Sklarz, D. Tannor, T. Calarco, Correcting errors in a quantum gate with pushed ions via optimal control, Phys. Rev. A 82 (2010) 012339.
[18] N. Khaneja, B. Heitmann, A. Spörl, H. Yuan, T. Schulte-Herbrüggen, S.J. Glaser, Shortest paths for efficient control of indirectly coupled qubits, Phys. Rev. A 75 (2007) 012322.
[19] A. Spörl, T. Schulte-Herbrüggen, S.J. Glaser, V. Bergholm, M.J. Storcz, J. Ferber, F.K. Wilhelm, Optimal control of coupled Josephson qubits, Phys. Rev. A 75 (2007) 012302.
[20] V.F. Krotov, Quantum system control optimization, Dokl. Math. 78 (2008) 949–952.
[21] V.F. Krotov, I.N. Fel'dman, Iterative method for solving optimal control problems, Eng. Cyber. 21 (1983) 123–130.
[22] S. Machnes, U. Sander, S.J. Glaser, P.d. Fouquieres, A. Gruslys, S. Schirmer, T. Schulte-Herbrueggen, Comparing, optimising and benchmarking quantum control algorithms in a unifying programming framework. Available from: <http://link.aps.org/doi/10.1103/PhysRevA.84.022305>.
[23] H. Mabuchi, N. Khaneja, Principles and applications of control in quantum systems, Int. J. Rob. Nonl. Con. 15 (2005) 647–667.
[24] L.S. Pontryagin, Mathematical Theory of Optimal Processes, Pergamon Press, 1964.
[25] B. Luy, K. Kobzar, T.E. Skinner, N. Khaneja, S.J. Glaser, Construction of universal rotations from point-to-point transformations, J. Magn. Reson. 176 (2005) 179–186.
[26] K. Kobzar, B. Luy, N. Khaneja, S.J. Glaser, Pattern pulses: design of arbitrary excitation profiles as a function of pulse amplitude and offset, J. Magn. Reson. 173 (2005) 229–235.
[27] N. Pomplun, B. Heitmann, N. Khaneja, S.J. Glaser, Optimization of electron-nuclear polarization transfer, Appl. Magn. Reson. 34 (2008) 331–346.
[28] T.E. Skinner, T.O. Reiss, B. Luy, N. Khaneja, S.J. Glaser, Application of optimal control theory to the design of broadband excitation pulses for high-resolution NMR, J. Magn. Reson. 163 (2003) 8–15.
[29] R.R. Ernst, G. Bodenhausen, A. Wokaun, Principles of nuclear magnetic resonance in one and two dimensions, Clarendon, 1987.
[30] F.J. Dyson, The radiation theories of Tomonaga, Schwinger, and Feynman, Phys. Rev. 75 (1949) 486–502.
[31] I. Kuprov, C.T. Rodgers, Derivatives of spin dynamics simulations, J. Chem. Phys. 131 (2009) 234108.
[32] R.S. Dumont, S. Jain, A. Bain, Simulation of many-spin system dynamics via sparse matrix methodology, J. Chem. Phys. 106 (1997) 5928–5936.
[33] H.J. Hogben, M. Krzystyniak, G.T.P. Charnock, P.J. Hore, I. Kuprov, Spinach – A software library for simulation of spin dynamics in large spin systems, J. Magn. Reson. 208 (2011) 179–194.
[34] T.C. Fung, Computation of the matrix exponential and its derivatives by scaling and squaring, Int. J. Numer. Meth. Eng. 59 (2004) 1273–1286.
[35] G.E. Forsythe, W.R. Wasow, Finite-difference Methods for Partial Differential Equations, John Wiley and Sons, New York; London, 1960.
[36] M.D. Canon, C.D. Cullum, E. Polak, Theory of Optimal Control and Mathematical Programming, McGraw-Hill, New York, London, 1970.
[37] E. Polak, Historical survey of computational methods in optimal control, SIAM Rev. 15 (1973) 553–584.
[38] D. Tabak, B.C. Kuo, Optimal Control by Mathematical Programming, Prentice-Hall, Englewood Cliffs, NJ, 1971.
[39] A. Barclay, P.E. Gill, J. Ben Rosen, SQP methods and their application to numerical optimal control, variational calculus, Optim. Contr. Appl. 124 (1998) 207–222.
[40] M. Hinze, K. Kunisch, Second order methods for optimal control of time-dependent fluid flow, SIAM J. Contr. Optim. 40 (2002) 925–946.
[41] E. Balsa-Canto, J.R. Banga, A.A. Alonso, V.S. Vassiliadis, Efficient optimal control of bioprocesses using second-order information, Ind. Eng. Chem. Res. 39 (2000) 4287–4295.
[42] R. Fletcher, Practical Methods of Optimization, second ed., Wiley, 1987.
[43] W.C. Davidon, Variable metric method for minimization, SIAM J. Opt. 1 (1991) 1–17.
[44] R.H. Byrd, J. Nocedal, R.B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Program. 63 (1994) 129–156.
[45] D.C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Math. Progr. 45 (1989) 503–528.
[46] T.F. Coleman, Y. Li, On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds, Math. Program. 67 (1994) 189–224.
[47] T.F. Coleman, Y.Y. Li, An interior trust region approach for nonlinear minimization subject to bounds, SIAM J. Optim. 6 (1996) 418–445.
