
SubRiemannian geodesics and cubics for efficient quantum circuits

Michael Swaddle, School of Physics

Supervisors

Lyle Noakes, School of Mathematics and Statistics

Jingbo Wang, School of Physics

September 6, 2017

This thesis is presented for the requirements of the degree of Master of Philosophy of the University of Western Australia.

Acknowledgements

We would like to thank Harry Smallbone, Kooper de Lacy, and Liam Salter for valuable thoughts and discussion. This research was supported by an Australian Government Research Training Program (RTP) Scholarship.


Abstract

Nielsen et al [1]–[5] first argued that efficient quantum circuits can be obtained from special curves called geodesics. Previous work focused mainly on computing geodesics in Riemannian manifolds equipped with a penalty metric where the penalty was taken to infinity [1]–[6]. Taking such limits seems problematic, because it is not clear that all extremals of a limiting optimal control problem can be arrived at as limits of solutions. To rectify this, we examine subRiemannian geodesics, using the Pontryagin Maximum Principle on the special unitary group SU(2ⁿ) to obtain equations for the normal subRiemannian geodesics. The normal subRiemannian geodesics were found to coincide with the infinite penalty limit.

However, the infinite penalty limit does not give all the subRiemannian geodesics: there are potentially abnormal geodesics, which do not satisfy the normal differential equations. The abnormal geodesics may have lower energy, and hence generate a more efficient quantum circuit. Sometimes abnormals can be ruled out via contradiction; in other cases it is possible to construct new equations for them.

In SU(8), the space of operations on three qubits, we allow subRiemannian geodesics to move tangent to directions corresponding to one and two qubit operations. In this case, we find new closed form solutions to the normal geodesic equations. Additionally, several examples of abnormal geodesics are constructed. In higher dimensional groups a series solution to the normal geodesic equations is also provided.

Furthermore, we present numerical solutions to the normal geodesic boundary value problem in SU(2ⁿ) for some n. Building on numerical methods found in [6], we find that a modification that bounds the energy leads to lower energy geodesics. Unfortunately, the geodesic boundary value problem is computationally expensive to solve. We consider a new, more efficient discrete method to compute geodesics: instead of solving the differential equations describing geodesics, a discrete curve is found by optimising the energy and error directly.

Geodesics, however, may not be the most accurate continuous description of an efficient quantum circuit. To refine Nielsen et al's original ideas we introduce a new type of variational curve called a subRiemannian cubic. We derive their equations from the Pontryagin Maximum Principle and examine the behaviour of subRiemannian cubics in some simple groups.


Solving the geodesic and cubic boundary value problems is still a difficult task. In search of a better practical tool, we investigate whether neural networks can be trained to help generate quantum circuits. A correctly trained neural network should reasonably guess the types, number and approximate order of gates required to synthesise a U. This guess could then be refined with another optimisation algorithm.

Contents

Acknowledgements

Abstract

1 Introduction

1.1 Quantum Computation

1.2 Basic background

1.2.1 Pauli basis

1.2.2 Qudits and generalised Pauli matrices

1.2.3 Lie product formula

1.3 Explicit Circuits

1.3.1 Circuits for basis elements

2 SubRiemannian Geodesics

2.1 SubRiemannian manifolds

2.2 SubRiemannian geodesic equations

2.2.1 Normal case

2.2.2 Abnormal case

2.3 Analytical results

2.3.1 SU(2)

2.3.2 SU(8)

2.3.3 SU(2ⁿ)

2.4 Higher Order PMP

2.4.1 General approach

2.4.2 Abnormal geodesics

2.5 Magnus expansions

2.6 Discretising subRiemannian geodesics

3 SubRiemannian cubics

3.1 Background

3.2 Riemannian cubics

3.3 SubRiemannian cubics

3.3.1 Normal case

3.3.2 Abnormal case - SU(2)

3.3.3 SubRiemannian cubics with tension

3.3.4 Higher Order PMP for cubics

3.4 Normal cubics in three dimensional groups

3.4.1 SU(2)

4 Numerical calculations of geodesics and cubics

4.1 Geodesics

4.1.1 Direct Search

4.1.2 Discrete Geodesics

4.1.3 Results

4.2 Cubics

4.2.1 Direct search for cubics

4.2.2 Discrete Cubics

4.2.3 Indefinite cubics

5 Neural Networks

5.1 Background

5.2 Training data

5.3 Network Design - SU(8)

5.3.1 Global decomposition

5.3.2 Local decomposition

5.4 Results - SU(8)

5.4.1 Global decomposition

5.4.2 Local decomposition

5.5 Discussion

6 Summary

Bibliography

Appendix A

6.1 Commutation Relations

6.1.1 Commutation relations in su(8)

6.1.2 Commutation relations in su(16)

6.1.3 Adjoint action in SU(8)

Appendix B

6.2 Gradients for discrete geodesics and cubics

6.2.1 Trace error gradient

6.2.2 Discrete cubics gradient

1. Introduction

1.1 Quantum Computation

In 1982, Richard Feynman argued that a classical Turing machine would not be able to efficiently simulate quantum mechanical systems [7]. Feynman went on to propose a model of computation based on quantum mechanics, which would not suffer the same limitations. Feynman's ideas were later refined by Deutsch, who proposed a universal quantum computer [8]. In this scheme, computation is performed by a series of quantum gates, which are the quantum analog of classical binary logic gates. A series of gates is called a quantum circuit [9]. Quantum gates act on qubits, which are the quantum analog of a classical bit.

Seth Lloyd later proved that a quantum computer would be able to simulate a quantum mechanical system efficiently, provided the system evolves according to local interactions [10]. Stated in terms of unitaries: given some special unitary operation U ∈ SU(2ⁿ), U†U = I, where n is the number of qubits, there exists some quantum circuit that approximates U. One pertinent question that remains is how to find the circuit which implements this U. In certain situations the circuit can be found exactly. In general, however, it is a difficult problem, and it is acceptable to approximate U. Previously, circuits for U have been found via expensive algebraic means [11]–[14]. Another novel attempt at finding an approximate circuit for U has been to use the tools of Riemannian geometry [1]–[5].

Michael Nielsen et al originally proposed calculating special curves called geodesics between two points, I and U, in SU(2ⁿ). Geodesics are stationary points of the energy functional [15]. Nielsen et al claimed that when an energy minimising geodesic is discretised into a quantum circuit, this would efficiently simulate U [1]–[5]. In practice, however, finding the geodesics is a difficult task. Computing geodesics requires one to solve a boundary value problem in a high dimensional space. Furthermore, Nielsen et al originally formulated the problem on a Riemannian manifold equipped with a so called penalty metric, where the penalty was made large. This complicated solving the boundary value problem [6].

The Nielsen et al approach can be refined by considering subRiemannian geodesics. A subRiemannian geodesic is only allowed to evolve in directions from a horizontal subspace of the tangent space [16]. However, geodesics may not provide the most accurate description of an efficient quantum circuit.


A circuit to implement U is considered efficient if it uses a number of gates polynomial in n. Roughly, the application of a quantum gate can be viewed as an acceleration in U(2ⁿ). To capture this more accurately, we investigate another class of curves called cubics [17]. Cubics are curves which minimise the mean squared norm of the covariant acceleration.

This approach still involves solving a complicated boundary value problem. For a practical tool, a much faster methodology to synthesise a U is required. With recent advances in computing power, neural networks are an attractive option. We propose that a neural network be trained to produce approximate subRiemannian geodesics.

1.2 Basic background

1.2.1 Pauli basis

Operations on n qubits are represented as unitary matrices from the unitary group U(2ⁿ). Associated with the Lie group is its Lie algebra. The Lie algebra of U(2ⁿ), u(2ⁿ), is the space of skew-Hermitian matrices, and can be constructed out of Kronecker products of the three Pauli matrices together with the 2×2 identity:

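$$\sigma_0 = I_{2\times 2},\quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$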

Then the set of all n-fold Kronecker products of the form

$$\frac{i}{\sqrt{2^n}}\,\underbrace{\sigma_{\mu_1} \otimes \cdots \otimes \sigma_{\mu_n}}_{n\ \text{times}},$$

where µᵢ = 0, 1, 2, 3, forms a basis for u(2ⁿ). It is often sufficient to work in SU(2ⁿ) instead; to form a basis for su(2ⁿ), simply exclude the Kronecker product of identity matrices from the basis of u(2ⁿ). In Section 1.3, exponentials of these basis elements are constructed in terms of quantum gates. We use the nomenclature that m-body refers to a basis element containing m ≤ n non-identity Pauli matrices in the Kronecker product. Additionally, a basis element will be referred to as an instruction.

Furthermore, the set of one and two body basis elements of su(2ⁿ) is bracket generating: repeated Lie brackets of one and two body elements span the whole of su(2ⁿ).
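Both claims are easy to check numerically for small n. The following is a minimal sketch, assuming NumPy and the σ2 sign convention defined above; the helper names `pauli_basis`, `body_count` and `lie_closure_dim` are hypothetical. It builds the normalised basis of su(2ⁿ), classifies elements by body count, and computes the real dimension of the Lie algebra generated by repeated brackets. For n = 3 the 36 one and two body elements should generate all 63 dimensions of su(8).

```python
import itertools
from functools import reduce

import numpy as np

# Pauli matrices, with the sigma_2 sign convention used in the text.
SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, 1j], [-1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_basis(n):
    """Map (mu_1, ..., mu_n) -> i/sqrt(2^n) sigma_mu1 (x) ... (x) sigma_mun for su(2^n)."""
    basis = {}
    for mus in itertools.product(range(4), repeat=n):
        if all(mu == 0 for mu in mus):
            continue  # the all-identity product is excluded for su(2^n)
        basis[mus] = 1j * reduce(np.kron, [SIGMA[mu] for mu in mus]) / np.sqrt(2 ** n)
    return basis


def body_count(mus):
    """Number of non-identity factors, i.e. the m in m-body."""
    return sum(mu != 0 for mu in mus)


def lie_closure_dim(generators, tol=1e-9):
    """Real dimension of the Lie algebra generated by `generators` under repeated brackets."""
    span, ortho = [], []  # matrices added so far, and an orthonormal basis of their real span

    def try_add(m):
        v = np.concatenate([m.real.ravel(), m.imag.ravel()])
        for q in ortho:  # Gram-Schmidt against the current span
            v = v - (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > tol:
            ortho.append(v / norm)
            span.append(m)
            return True
        return False

    for g in generators:
        try_add(g)
    grew = True
    while grew:  # keep bracketing until no new direction appears
        grew = False
        for a, b in itertools.combinations(list(span), 2):
            grew = try_add(a @ b - b @ a) or grew
    return len(ortho)
```

For example, applying `lie_closure_dim` to the one body elements alone gives 9, since one body elements acting on different qubits commute and never produce two or three body directions.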


1.2.2 Qudits and generalised Pauli matrices

Qudits are a generalisation of qubits, and n qudits transform under U(dⁿ), d > 2. The Lie algebra for this space can be constructed out of n-fold Kronecker products of the d×d generalised Pauli matrices λ [18] and the d×d identity matrix.

Alternatively, the set of all generalised Pauli matrices of size dⁿ×dⁿ forms a basis, but the Kronecker product adds additional structure, as it neatly separates the number of qudits that the exponential of the instruction acts on.

Let $S_{rs}$ represent a d×d matrix with a 1 in the (r, s) entry, and 0 elsewhere. Define the following set of matrices over $\mathbb{C}^{d\times d}$:

• For 1 ≤ r ≤ d − 1: $\lambda^1_r = \sqrt{\dfrac{2}{r(r+1)}}\left(\sum_{s=1}^{r} S_{ss} - r\,S_{r+1,r+1}\right)$

• For 1 ≤ s < r ≤ d: $\lambda^2_{rs} = -i(S_{sr} - S_{rs})$

• For 1 ≤ r < s ≤ d: $\lambda^3_{rs} = S_{rs} + S_{sr}$

Along with the d×d identity matrix, the set

$$\left\{\tfrac{i}{\sqrt{d}}\,I,\; i\lambda^1_r,\; i\lambda^2_{rs},\; i\lambda^3_{rs}\right\} \tag{1.1}$$

forms a basis for u(d) [18]. This is extended to a basis for u(dⁿ) by taking Kronecker products with the d×d identity matrix, in a way analogous to u(2ⁿ). As in the qubit case, it is convenient to consider SU(dⁿ) instead.
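A short sketch of this construction, assuming NumPy. `unit`, `generalised_paulis` and `u_basis` are hypothetical helper names, and indices are 1-based to match the definitions above, with the diagonal family summing over s from 1 to r. The checks confirm that there are d² − 1 traceless Hermitian λ with tr(λₐλ_b) = 2δₐ_b, and that multiplying by i and adjoining i/√d I gives d² independent skew-Hermitian matrices, i.e. a basis of u(d).

```python
import numpy as np


def unit(d, r, s):
    """S_rs: d x d matrix with a 1 in entry (r, s), using 1-based indices as in the text."""
    m = np.zeros((d, d), dtype=complex)
    m[r - 1, s - 1] = 1.0
    return m


def generalised_paulis(d):
    """The d^2 - 1 generalised Pauli matrices lambda^1_r, lambda^2_rs, lambda^3_rs."""
    mats = []
    for r in range(1, d):  # diagonal family, 1 <= r <= d - 1
        diag = sum(unit(d, s, s) for s in range(1, r + 1)) - r * unit(d, r + 1, r + 1)
        mats.append(np.sqrt(2.0 / (r * (r + 1))) * diag)
    for r in range(1, d + 1):
        for s in range(1, d + 1):
            if s < r:  # antisymmetric family lambda^2_rs
                mats.append(-1j * (unit(d, s, r) - unit(d, r, s)))
            if r < s:  # symmetric family lambda^3_rs
                mats.append(unit(d, r, s) + unit(d, s, r))
    return mats


def u_basis(d):
    """Skew-Hermitian basis of u(d): i/sqrt(d) I together with i * lambda."""
    return [1j * np.eye(d) / np.sqrt(d)] + [1j * lam for lam in generalised_paulis(d)]
```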

1.2.3 Lie product formula

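The Lie product formula gives a rudimentary method to decompose arbitrary U ∈ SU(2ⁿ) in terms of products of matrix exponentials

$$\exp\left(\sum_{i=1}^{m} c_i t\,\tau_i\right) = \lim_{n\to\infty}\left(\exp\left(\frac{c_1 t}{n}\tau_1\right)\cdots\exp\left(\frac{c_m t}{n}\tau_m\right)\right)^{n}.$$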

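If this expansion is truncated at some finite n,

$$\exp\left(\sum_{i=1}^{m} c_i t\,\tau_i\right) = \left(\exp\left(\frac{c_1 t}{n}\tau_1\right)\cdots\exp\left(\frac{c_m t}{n}\tau_m\right)\right)^{n} + O\!\left(\frac{t^2}{n}\right),$$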


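it can be used to approximate U. Higher order variants of the Lie product formula exist, for example the symmetric form

$$\exp\left(\sum_{i=1}^{m} c_i t\,\tau_i\right) = \lim_{n\to\infty}\left(\exp\left(\frac{c_m t}{2n}\tau_m\right)\cdots\exp\left(\frac{c_2 t}{2n}\tau_2\right)\exp\left(\frac{c_1 t}{n}\tau_1\right)\exp\left(\frac{c_2 t}{2n}\tau_2\right)\cdots\exp\left(\frac{c_m t}{2n}\tau_m\right)\right)^{n},$$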

where truncation at finite n has error $O(t^3/n^2)$. The Lie product formula and its variants can be used as a rudimentary starting point to construct circuits. One significant disadvantage is that $\log(U) = \sum_i c_i\tau_i$ in general has $4^n - 1$ nonzero components, so a circuit generated by the Lie product formula is hopelessly impractical. If a circuit requires exponentially many operations, then there is little gain over classical simulation. One option is to examine U with some structure which enables the design of efficient quantum circuits. Another option is to use the tools of Riemannian geometry, which is investigated in the following chapters (2–4).
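The difference between the two truncations shows up clearly in a small example. This sketch assumes NumPy and SciPy (`scipy.linalg.expm`); the three two-qubit instructions and the coefficients cᵢ are arbitrary illustrative choices, not taken from the text. The symmetric variant is implemented as a palindromic product of forward half steps around a full τ1 step. Doubling n should roughly halve the error of the first order product and quarter the error of the symmetric one.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Illustrative non-commuting two-qubit instructions tau_i and coefficients c_i.
TAUS = [0.5j * np.kron(X, I2), 0.5j * np.kron(Z, Z), 0.5j * np.kron(Y2, X)]
COEFFS = [0.7, -1.1, 0.4]


def first_order(t, n):
    """(exp(c_1 t/n tau_1) ... exp(c_m t/n tau_m))^n."""
    step = np.eye(4, dtype=complex)
    for c, tau in zip(COEFFS, TAUS):
        step = step @ expm(c * t / n * tau)
    return np.linalg.matrix_power(step, n)


def symmetric(t, n):
    """Half steps for tau_m..tau_2, a full step for tau_1, then half steps for tau_2..tau_m."""
    halves = [expm(c * t / (2 * n) * tau) for c, tau in zip(COEFFS[1:], TAUS[1:])]
    step = np.eye(4, dtype=complex)
    for h in reversed(halves):
        step = step @ h
    step = step @ expm(COEFFS[0] * t / n * TAUS[0])
    for h in halves:
        step = step @ h
    return np.linalg.matrix_power(step, n)


def errors(t, n):
    """Spectral-norm errors of both truncations against the exact exponential."""
    exact = expm(t * sum(c * tau for c, tau in zip(COEFFS, TAUS)))
    return (np.linalg.norm(first_order(t, n) - exact, 2),
            np.linalg.norm(symmetric(t, n) - exact, 2))
```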

1.3 Explicit Circuits

1.3.1 Circuits for basis elements

For each basis element τᵢ in the Pauli basis for su(2ⁿ), it is possible to construct a circuit for the exponential exp(ϑτᵢ) in SU(2ⁿ). The first step is to observe that exp(iϑσ3 ⊗ · · · ⊗ σ3) has the following circuit representation

[Circuit diagram: a staggered cascade of CNOT gates, a single-qubit rotation R3(ϑ), then the CNOT cascade in reverse order.]

where ... denotes staggered CNOT gates. When there is an identity matrix in the place of a σ3, the staggered CNOT gates skip the respective qubit. For example, exp(iϑσ3 ⊗ I ⊗ σ3) has the circuit

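[Circuit diagram: CNOT gates between the first and third qubits, skipping the second, then R3(ϑ), then the CNOT gates in reverse order.]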
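The CNOT cascade can be checked directly on three qubits. In this sketch (NumPy and SciPy assumed) the cascade is CNOT(1→2) followed by CNOT(2→3), so the last qubit accumulates the parity of all three. R3(ϑ) is assumed here to denote exp(iϑσ3); the exact gate convention behind the original diagrams is an assumption, and the helper names are hypothetical. Listing only qubits 1 and 3 reproduces the skipping rule for exp(iϑσ3 ⊗ I ⊗ σ3).

```python
from functools import reduce

import numpy as np
from scipy.linalg import expm

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)  # projector |0><0|
P1 = np.diag([0, 1]).astype(complex)  # projector |1><1|


def embed(ops, n):
    """Kronecker product with ops[q] on qubit q (0-based, leftmost factor first) and I elsewhere."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])


def cnot(control, target, n):
    """CNOT = |0><0| (x) I + |1><1| (x) X on the chosen qubits."""
    return embed({control: P0}, n) + embed({control: P1, target: X}, n)


def z_string_circuit(theta, qubits, n):
    """Staggered CNOTs, R3(theta) on the last listed qubit, then the CNOTs in reverse.

    Assumption: R3(theta) = exp(i theta sigma_3) on a single qubit.
    """
    ladder = np.eye(2 ** n, dtype=complex)
    for a, b in zip(qubits[:-1], qubits[1:]):
        ladder = cnot(a, b, n) @ ladder  # gates applied later multiply on the left
    rotation = embed({qubits[-1]: expm(1j * theta * Z)}, n)
    return ladder.conj().T @ rotation @ ladder
```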


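To obtain circuits for exponentials which contain σ1 and σ2 terms, make use of the Hadamard gate, denoted H, and the Y gate. They have the following matrix representations:

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\qquad Y = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix},$$

and have the following properties:

$$Y^\dagger\sigma_3 Y = \sigma_2,\qquad H\sigma_3 H^\dagger = \sigma_1.$$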

These gates can be used to swap the indices in the Kronecker product iσ3 ⊗ · · · ⊗ σ3. For example, exp(iϑσ1 ⊗ I ⊗ σ2) has the circuit

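[Circuit diagram: H on the first qubit and Y on the third qubit, then the circuit for exp(iϑσ3 ⊗ I ⊗ σ3) with R3(ϑ) on the third qubit, then H† on the first qubit and Y† on the third qubit.]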

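This can be seen from the formula for the exponential of a single basis element. Since the circuit for

$$\exp\big(i\vartheta\,\sigma_3 \otimes \cdots \otimes \underbrace{\sigma_3}_{j\text{-th entry}} \otimes \cdots \otimes \sigma_3\big)$$

is known, to swap the j-th sigma, simply apply the H or Y gate to the j-th qubit. For example, to obtain a σ1 in the j-th entry, let $\mathcal{H} = I_{2\times2} \otimes \cdots \otimes H \otimes \cdots \otimes I_{2\times2}$, with H in the j-th position. Then

$$\begin{aligned}
\mathcal{H}\exp(i\vartheta\,\sigma_3 \otimes \cdots \otimes \sigma_3 \otimes \cdots \otimes \sigma_3)\mathcal{H}^\dagger
&= \mathcal{H}\big(\cos(\vartheta)\, I_{2^n\times 2^n} + i\sin(\vartheta)\,(\sigma_3 \otimes \cdots \otimes \sigma_3 \otimes \cdots \otimes \sigma_3)\big)\mathcal{H}^\dagger \\
&= \cos(\vartheta)\, I_{2^n\times 2^n} + i\sin(\vartheta)\,(\sigma_3 \otimes \cdots \otimes \sigma_1 \otimes \cdots \otimes \sigma_3) \\
&= \exp(i\vartheta\,\sigma_3 \otimes \cdots \otimes \sigma_1 \otimes \cdots \otimes \sigma_3).
\end{aligned}$$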
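A numerical check of the basis change, assuming NumPy and SciPy and the σ2 convention of Section 1.2.1. Reading the circuit left to right, H and Y act first and their inverses last, so the overall matrix is (H ⊗ I ⊗ Y)† exp(iϑσ3 ⊗ I ⊗ σ3)(H ⊗ I ⊗ Y). The angle ϑ = 0.61 is an arbitrary illustrative value. The last check confirms the cos/sin expansion used above, which holds because the Pauli string squares to the identity.

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2, dtype=complex)
S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)  # sigma_2 as defined in Section 1.2.1
S3 = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Y = np.array([[1, 1j], [1j, 1]], dtype=complex) / np.sqrt(2)


def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)


def dagger(m):
    return m.conj().T


theta = 0.61
# Core rotation exp(i theta sigma_3 (x) I (x) sigma_3), whose CNOT circuit skips qubit 2.
core = expm(1j * theta * kron3(S3, I2, S3))
# H on qubit 1 and Y on qubit 3 act first in time, their inverses last.
B = kron3(H, I2, Y)
circuit = dagger(B) @ core @ B
```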

Note that these constructions also apply to qudits: instead of the qubit CNOT gate, the same constructions work with two-qudit operations. These gates can be the building blocks for more complicated circuits. In the Nielsen approach, these circuit elements would be used to discretise geodesics.

2. SubRiemannian Geodesics

2.1 SubRiemannian manifolds

It has been argued [1]–[6], [19] that efficient quantum circuits correspond to special curves called geodesics. Previous work has focused on computing geodesics in Riemannian manifolds (for a good reference to Riemannian geometry see [20]) equipped with a penalty metric, where the penalty needs to be taken to infinity [1]–[4], [6], [19], [21]. Taking such limits seems problematic, because it is not clear that all extremals of a limiting optimal control problem can be arrived at as limits of solutions. This can be rectified by using subRiemannian geometry.

Instead of applying a large penalty to the undesirable instructions, they can be forbidden entirely. To do this, the problem can be recast in terms of subRiemannian geometry. In subRiemannian geometry curves are only allowed to move tangent to a horizontal subspace of the tangent space. However, if this horizontal subspace is bracket generating, then a horizontal curve is able to join any two points in the subRiemannian manifold [16]. In the context of quantum computing, allowed directions come from a universal gate set. A universal gate set is a set of unitary matrices which generate SU(2ⁿ) [9]. The matrix logarithms of universal gates would form a basis for a bracket generating subset of su(2ⁿ). For clarity, define the matrix logarithm of a gate as an instruction. As seen previously, the one and two body Pauli basis elements form a bracket generating set. Other work [22], [23] has considered qutrit and qudit based quantum computers respectively, which can be considered as operations in SU(3ⁿ) or SU(dⁿ). The equations describing subRiemannian geodesics given here also apply to these groups.

Given a differentiable manifold M, a distribution ∆ is a subbundle of the tangent bundle TM, ∆ ⊂ TM. A curve x is called horizontal if its velocity satisfies ẋ ∈ ∆. Let g be a smooth positive-definite quadratic form on ∆. A subRiemannian manifold is then defined by the triple (M, ∆, g). A subRiemannian manifold has a naturally defined metric, taken to be the infimum of the lengths L of horizontal curves x joining two points,

$$d(a,b) = \inf L[x] = \inf \int_0^1 dt\, \big(g(\dot x, \dot x)\big)^{1/2},$$

where x(0) = a and x(1) = b. When M is a Lie group G, ∆ can be identified with a bracket generating subset of the Lie algebra g of G. This can be done through left- or right-Lie reduction. To follow the convention set in the Schrödinger equation, we will use right-Lie reduction.

For simplicity, from now on let ∆ ⊂ g be the right-Lie reduction of the distribution.

To compute subRiemannian geodesics, it is much easier to find stationary points of the energy functional

$$\mathcal{E}[x] = \frac{1}{2}\int_0^1 dt\, g(\dot x, \dot x).$$

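This is because the length functional is invariant under reparameterisation: there are infinitely many curves with the same length but with energies dependent on the parameterisation. By the Cauchy–Schwarz inequality, $L[x]^2 \le 2\mathcal{E}[x]$, with equality exactly when x has constant speed, so minimising the energy finds a minimal length curve, uniquely parameterised proportionally to arc length.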
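A small numerical illustration of why the energy, rather than the length, is minimised. The sketch below assumes NumPy; the half circle and its two parameterisations are illustrative choices. It discretises the same curve once at constant speed and once with s = πt². Both have length π, but the constant speed version has the smaller energy π²/2 compared with 2π²/3, consistent with L² ≤ 2E from the Cauchy–Schwarz inequality.

```python
import numpy as np


def length_and_energy(curve, num=20001):
    """Discrete length sum|dx| and energy (1/2) sum |dx|^2/dt of curve(t) for t in [0, 1]."""
    t = np.linspace(0.0, 1.0, num)
    points = curve(t)  # shape (num, dim)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dt = np.diff(t)
    return seg.sum(), 0.5 * np.sum(seg ** 2 / dt)


def half_circle(s):
    return np.stack([np.cos(s), np.sin(s)], axis=1)


def constant_speed(t):
    return half_circle(np.pi * t)


def accelerating(t):
    return half_circle(np.pi * t ** 2)


L_const, E_const = length_and_energy(constant_speed)
L_acc, E_acc = length_and_energy(accelerating)
```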

The subRiemannian geodesic equations on a Lie group G may be derived from the Pontryagin Maximum Principle (PMP) [24]. Traditionally, the PMP applies to dynamical systems x ∈ ℝⁿ [25]. While it can be written for systems on smooth manifolds [26], this adds additional complexity. To simplify things, we use the traditional PMP, noting that $SU(2^n) \subset \mathbb{C}^{2^n\times 2^n} \cong \mathbb{R}^{2\cdot 4^n}$. Then, given that x is defined by the Schrödinger equation

$$\dot x = ux,$$

since u is in su(2ⁿ), x will always be contained in SU(2ⁿ), provided x(0) ∈ SU(2ⁿ).

Given some dynamical system

$$\dot x = p(x,u),\quad x \in \mathbb{R}^n,\quad u \in \Delta \subset \mathbb{R}^m,\qquad x(a) = x_0,\quad x(b) = x_1,$$

to minimise the functional

$$S[x] = \int_a^b dt\, \psi(x,u)$$

we can use the Pontryagin Maximum Principle.

Theorem 2.1.1 (Pontryagin Maximum Principle). Along the optimal state trajectory x, the optimal control u must maximise the Hamiltonian H,

$$H(u,x,\lambda,\nu) = \lambda(p(x,u)) + \nu\,\psi(x,u), \tag{2.1}$$

so that

$$H(u,x,\lambda,\nu) = \max_{u^*\in\Delta} H(u^*,x,\lambda,\nu), \tag{2.2}$$

where the maximum is over all permissible controls u*. Additionally, the costate equation and the conditions

$$\dot\lambda = -dH_x, \tag{2.3}$$

$$(\lambda,\nu) \neq (0,0), \tag{2.4}$$

$$\nu \le 0, \tag{2.5}$$

must be satisfied for some λ ∈ T*M and ν ∈ ℝ.

∆, as a subset of g, can also be identified with a real vector space, so the PMP above applies. Extremals with ν ≠ 0 are termed normal. When ν = 0 the extremal is termed abnormal, and typically the PMP does not provide enough information to determine it. The PMP is analogous to Lagrange multipliers, where instead of scalar Lagrange multipliers, the multipliers are now functions. An extension of the PMP has been proposed, which attempts to deal with the abnormal case [27].

2.2 SubRiemannian geodesic equations

Define the subRiemannian energy as

$$\mathcal{E}[x] = \frac{1}{2}\int_0^1 dt\, \langle \dot x, \dot x\rangle_J = \frac{1}{2}\int_0^1 dt\, \langle u, u\rangle_J,$$

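where we are interested in simple metrics of the form g := ⟨·,·⟩_J, built from the trace inner product (for J = I this is a positive multiple of the negative of the Killing form),

$$\langle W, V\rangle_J = \langle JW, V\rangle = \operatorname{tr}(JW V^\dagger), \tag{2.6}$$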

with the requirement V, W ∈ ∆. J is the inertia transformation. For quantum computing, J simply weights the cost of moving in a particular basis direction. To calculate JW, expand W ∈ g in the basis τᵢ: if

$$W = \sum_i w_i \tau_i,$$

then

$$JW = \sum_i c_i w_i \tau_i,$$

where cᵢ is the cost of moving in the τᵢ direction. The cᵢ can be defined by the physical cost of implementing exp(τᵢ). For example, single body directions correspond to a single gate, so could be weighted with cost 1. Two body directions can require at most five gates, so could cost 5.
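Because J is diagonal in the Pauli basis, JW and ⟨W, V⟩_J reduce to weighted coordinate sums. A minimal sketch, assuming NumPy, the normalised Pauli basis of Section 1.2.1 restricted to one and two body elements, and the illustrative costs 1 and 5 mentioned above; the function names are hypothetical. Coordinates are read off with tr(W τᵢ†), which works because the τᵢ are orthonormal for this inner product.

```python
import itertools
from functools import reduce

import numpy as np

SIGMA = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, 1j], [-1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def allowed_basis(n, cost_one=1.0, cost_two=5.0):
    """One and two body elements i/sqrt(2^n) sigma_mu1 (x) ... with illustrative costs c_i."""
    taus, costs = [], []
    for mus in itertools.product(range(4), repeat=n):
        weight = sum(mu != 0 for mu in mus)
        if weight in (1, 2):
            taus.append(1j * reduce(np.kron, [SIGMA[mu] for mu in mus]) / np.sqrt(2 ** n))
            costs.append(cost_one if weight == 1 else cost_two)
    return taus, np.array(costs)


def coordinates(W, taus):
    """w_i = Re tr(W tau_i^dagger)."""
    return np.array([np.trace(W @ t.conj().T).real for t in taus])


def apply_inertia(W, taus, costs):
    """JW = sum_i c_i w_i tau_i."""
    return sum(c * w * t for c, w, t in zip(costs, coordinates(W, taus), taus))


def inner_J(W, V, taus, costs):
    """<W, V>_J = tr(J W V^dagger) as in (2.6), for W, V in the allowed set."""
    return np.trace(apply_inertia(W, taus, costs) @ V.conj().T).real
```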


To minimise $\mathcal{E}[x]$, form the PMP Hamiltonian,

$$H(\nu,\lambda,x,u) = \lambda(ux) + \frac{\nu}{2}\langle u,u\rangle_J, \tag{2.7}$$

where λ is in the cotangent space at x, $\lambda \in T^*_xG$, and ν ∈ ℝ. By the PMP, find (λ, ν) ≠ (0, 0), ν ≤ 0, such that the optimal control maximises (2.7), subject to the constraint $\dot\lambda = -dH_x$. Clearly

$$\dot\lambda(z) = -dH_x(z) = -\lambda(uz). \tag{2.8}$$

Now define Λ* in the dual g* of g by Λ*(v) := λ(R(x)v) = λ(vx), where R(x) denotes right multiplication by x and v ∈ g. Corresponding to Λ*, define Λ ∈ g by Λ*(v) =: ⟨Λ, v⟩, where ⟨·,·⟩ is the usual bi-invariant inner product on g.

First, differentiating Λ*(v),

$$\begin{aligned}
\dot\Lambda^*(v) &= \dot\lambda(vx) + \lambda(v\dot x)\\
&= \dot\lambda(vx) + \lambda(vux).
\end{aligned}$$

Using the constraint (2.8), this can be rearranged to give

$$\begin{aligned}
\dot\Lambda^*(v) &= -\lambda(uvx) + \lambda(vux)\\
&= -\Lambda^*(uv) + \Lambda^*(vu).
\end{aligned}$$

As Λ* is a linear function,

$$\dot\Lambda^*(v) = \Lambda^*([v,u]).$$

Finally, pulling back onto the inner product on g,

$$\langle \dot\Lambda, v\rangle = \langle \Lambda, [v,u]\rangle.$$

As ⟨·,·⟩ is bi-invariant, ⟨X, [Y, Z]⟩ = ⟨X, [−Z, Y]⟩ = ⟨[X, −Z], Y⟩, so

$$\dot\Lambda = -[\Lambda, u]. \tag{2.9}$$

By the PMP there are two cases to examine.

2.2.1 Normal case

In the normal case, the PMP requires ν < 0. Without loss of generality, choose ν = −1, as the equations can be rescaled by a constant. Then

$$H = \lambda(ux) - \frac{1}{2}\langle u,u\rangle_J = \Lambda^*(u) - \frac{1}{2}\langle u,u\rangle_J.$$


Now identifying λ with Λ,

$$H = \langle \Lambda, u\rangle - \frac{1}{2}\langle u,u\rangle_J.$$

The optimal control satisfies $dH_u(u^*) = 0$ for all u* ∈ ∆, where $dH_u(u^*)$ denotes the pushforward, or total derivative, of H at u in the direction of u*. This stationarity condition must hold for every u* ∈ ∆; since H is concave in u, the stationary point is the unique maximum. Computing the derivative,

$$\langle \Lambda, u^*\rangle - \langle u^*, u\rangle_J = \langle \Lambda - Ju, u^*\rangle = 0.$$

Therefore the optimal control satisfies proj_∆(Λ) = Ju. Denote the orthogonal complement of ∆ by ⊥, and the projection of Λ onto ⊥ by Λ⊥. When the metric is the restriction of the bi-invariant metric (J = I), Λ = Λ⊥ + u and the normal equations can be written as

$$\dot u + \dot\Lambda^\perp = -[\Lambda^\perp, u].$$

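Lemma 2.2.1. Normal subRiemannian geodesics have constant norm.

Proof. Taking the inner product of the Euler equations with u,

$$\langle \dot\Lambda, u\rangle = \langle \dot\Lambda^\perp + J\dot u, u\rangle = \langle [u,\Lambda], u\rangle.$$

Since $\dot\Lambda^\perp \in \perp$ is orthogonal to u ∈ ∆, and by bi-invariance ⟨[u, Λ], u⟩ = −⟨Λ, [u, u]⟩ = 0, we obtain $\langle J\dot u, u\rangle = \frac{1}{2}\frac{d}{dt}\langle u,u\rangle_J = 0$. Integrating, $\langle u,u\rangle_J = C_1^2$, C₁ ∈ ℝ, hence $\|u\|_J = C_1$. When J = I, the same argument applied to Λ, whose norm is conserved because $\dot\Lambda = [u,\Lambda]$, gives $\langle\Lambda^\perp,\Lambda^\perp\rangle = \langle\Lambda,\Lambda\rangle - \langle u,u\rangle = C_2^2$.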
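Lemma 2.2.1 can be checked numerically by integrating (2.9) with J = I in SU(8), where ∆ is spanned by the one and two body Pauli elements. This is a sketch only: it assumes NumPy, uses a classical fourth-order Runge–Kutta step, and starts from a random, hypothetical initial costate. Along the solution ⟨u, u⟩ and ⟨Λ⊥, Λ⊥⟩ should stay constant up to integration error while u itself rotates.

```python
import itertools
from functools import reduce

import numpy as np

SIGMA = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, 1j], [-1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def split_basis(n):
    """Orthonormal Pauli basis of su(2^n) as stacked arrays: (one/two body, complement)."""
    allowed, perp = [], []
    for mus in itertools.product(range(4), repeat=n):
        weight = sum(mu != 0 for mu in mus)
        if weight == 0:
            continue
        tau = 1j * reduce(np.kron, [SIGMA[mu] for mu in mus]) / np.sqrt(2 ** n)
        (allowed if weight <= 2 else perp).append(tau)
    return np.array(allowed), np.array(perp)


def project(M, taus):
    """Projection onto span(taus): sum_k Re tr(M tau_k^dagger) tau_k."""
    coeffs = np.einsum("ij,kij->k", M, taus.conj()).real
    return np.tensordot(coeffs, taus, axes=1)


def inner(A, B):
    """Bi-invariant inner product <A, B> = Re tr(A B^dagger)."""
    return np.trace(A @ B.conj().T).real


def rhs(lam, allowed):
    """Normal equation with J = I: dLambda/dt = -[Lambda, u], u = proj_Delta(Lambda)."""
    u = project(lam, allowed)
    return -(lam @ u - u @ lam)


def integrate(lam0, allowed, t_final=1.0, steps=1000):
    """Classical fourth-order Runge-Kutta integration of the costate equation."""
    h = t_final / steps
    lam = lam0.copy()
    for _ in range(steps):
        k1 = rhs(lam, allowed)
        k2 = rhs(lam + 0.5 * h * k1, allowed)
        k3 = rhs(lam + 0.5 * h * k2, allowed)
        k4 = rhs(lam + h * k3, allowed)
        lam = lam + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return lam
```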

Previous work has considered geodesics on a Riemannian manifold equipped with a penalty metric. It has been suggested that taking the infinite penalty limit coincides with optimal solutions [6]. Let u still denote the desired instructions, u ∈ ∆, and now let v ∈ ⊥; the Riemannian geodesic equations can be written as [6]

$$\dot u + p\,\dot v = (1-p)[v,u], \qquad \dot x = (u+v)x.$$

If pv → Λ⊥ as p → ∞, then these equations coincide with the normal equations found by the PMP. However, taking the limit of penalty metric geodesics does not always give optimal results: there are cases where the minimising solution to the original functional is not a solution of the equations given by the limit of the penalty metric. Some examples can be found in [28]. The PMP, however, may provide enough information to rule out abnormals.


2.2.2 Abnormal case

The abnormal case occurs when the PMP fails to provide sufficient information to determine minima or maxima. In the abnormal case ν = 0 and λ ≠ 0, and the optimal u must maximise H(0, λ, x, u) = λ(ux). Equivalently, the optimal u maximises the quantity

$$H = \langle \Lambda, u\rangle,$$

where Λ ≠ 0. As before, differentiating H, we find

$$\langle \Lambda, u^*\rangle = 0$$

for all u* ∈ ∆. The only way this can occur is if Λ is in the orthogonal complement of ∆, denoted ⊥, so proj_∆Λ = 0. Let the component of Λ in the orthogonal complement be denoted by Λ⊥. In this situation (2.9) becomes

$$\dot\Lambda^\perp = -[\Lambda^\perp, u]. \tag{2.10}$$

For certain distributions, finding a contradiction to (2.10) can rule out abnormals. If this is not possible, then the abnormal case could have a lower energy, which may lead to a more efficient quantum circuit.

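For an example where abnormals can be ruled out, consider SU(2) where ∆ = span{iσ1, iσ2} and ⊥ = span{iσ3}. In this case, for u ∈ ∆ and Λ⊥ ∈ ⊥, we have [Λ⊥, u] ∈ ∆, whereas $\dot\Lambda^\perp \in \perp$. Both sides of (2.10) must therefore vanish, so [Λ⊥, u] = 0. As Λ⊥ must be nonzero, and no nonzero element of span{iσ1, iσ2} commutes with iσ3, the only choice for u satisfying (2.10) is u = 0. The same argument can be made for SO(3).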

This can be generalised to so called fat distributions, where for any V ∈ ⊥, with a slight abuse of notation, [∆, V] ⊂ ∆. Abnormals can be ruled out in these cases via contradiction. Any [Λ⊥, u] is contained in ∆, while $\dot\Lambda^\perp$ lies in ⊥, so (2.10) requires both $\dot\Lambda^\perp = 0$ and [Λ⊥, u] = 0. The only possibilities to satisfy this are u = 0, the trivial case, or Λ⊥ = 0. Since proj_∆Λ = 0, the latter means Λ = 0, which together with ν = 0 contradicts (2.4); so there are no abnormals.
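For the SU(2) example, the contradiction rests on two facts: ad_{iσ3} maps ∆ = span{iσ1, iσ2} into itself, and it is invertible there, so [Λ⊥, u] = 0 forces u = 0. A short check, assuming NumPy and the σ2 convention of Section 1.2.1; `ad_matrix` (a hypothetical name) holds ad_{iσ3} in the basis (iσ1, iσ2) and should be a rotation generator with nonzero determinant.

```python
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)  # sigma_2 convention of Section 1.2.1
S3 = np.array([[1, 0], [0, -1]], dtype=complex)

DELTA = [1j * S1, 1j * S2]  # allowed directions
PERP = 1j * S3              # orthogonal complement


def bracket(a, b):
    return a @ b - b @ a


def coords(M, basis):
    """Coordinates of M in an orthogonal basis, for the inner product Re tr(A B^dagger)."""
    return np.array([np.trace(M @ b.conj().T).real / np.trace(b @ b.conj().T).real
                     for b in basis])


# Columns are the Delta-coordinates of [PERP, d] for each basis element d of Delta.
ad_matrix = np.column_stack([coords(bracket(PERP, d), DELTA) for d in DELTA])
```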

2.3 Analytical results

Analytical expressions for normal geodesics can be found in several groups. In these cases the commutation relationships allow the equations to be separated into simpler components. In general such a separation does not occur.


2.3.1 SU(2)

Normal case

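SubRiemannian geodesics in SU(2) are well understood, see for example [29]. However, we use SU(2) as an example to recall a technique due to Noakes [30]. Let $\Delta = \frac{i}{\sqrt2}\,\mathrm{span}\{\sigma_1, \sigma_2\}$ and $\perp = \frac{i}{\sqrt2}\,\mathrm{span}\{\sigma_3\}$. In this case equation (2.9) separates directly into

$$\dot u = -[\Lambda^\perp, u], \tag{2.11}$$

$$\dot\Lambda^\perp = 0, \tag{2.12}$$

which can be integrated directly to give

$$\Lambda^\perp = \Lambda^\perp_0, \qquad u = \mathrm{Ad}\big(\exp(-\Lambda^\perp_0 t)\big)u_0.$$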

To solve the Schrödinger equation for x, we use the work of Noakes [30].

Given the equation $\dot Z = [W, Z]$, if Z and W are known and there exists a y such that Ad(y)(Z) = Z₀ is constant, then the solution to the Schrödinger equation $\dot x = Wx$ is given by Noakes to be

$$x = y^\dagger \exp\left(\int_0^t ds\, v(s)\right)x_0, \tag{2.13}$$

where

$$v = \mathrm{Ad}(y)\big(y^\dagger\dot y + W\big).$$

Note the changes in sign relative to Noakes, as we are computing right invariant geodesics. Taking Z = Λ = Λ⊥ + u and W = u, the choice $y = \exp(\Lambda^\perp_0 t)$ satisfies $\mathrm{Ad}(y)(Z) = \Lambda^\perp_0 + u_0$. This gives $v = \Lambda^\perp_0 + u_0$, and so subRiemannian geodesics in (right invariant) SU(2) are given by

$$x = \exp(-\Lambda^\perp_0 t)\exp\big((\Lambda^\perp_0 + u_0)t\big)x_0, \tag{2.14}$$

which is the product of two one parameter subgroups. This also covers SO(3) through a change of basis.
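The closed form (2.14) can be verified by finite differences: the right-reduced velocity ẋx⁻¹ should equal u(t) = Ad(exp(−Λ⊥₀t))u₀, and u should satisfy (2.11) while staying in ∆. This sketch assumes NumPy and SciPy, takes x₀ = I, and uses randomly chosen (hypothetical) initial data.

```python
import numpy as np
from scipy.linalg import expm

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)

rng = np.random.default_rng(2)
a, b, c = rng.normal(size=3)
u0 = 1j / np.sqrt(2) * (a * S1 + b * S2)  # initial control in Delta
lam0 = 1j / np.sqrt(2) * c * S3           # constant Lambda_perp in the complement


def bracket(p, q):
    return p @ q - q @ p


def x_closed(t):
    """Equation (2.14) with x0 = I: exp(-Lambda0 t) exp((Lambda0 + u0) t)."""
    return expm(-lam0 * t) @ expm((lam0 + u0) * t)


def u_closed(t):
    """u(t) = Ad(exp(-Lambda0 t)) u0."""
    g = expm(-lam0 * t)
    return g @ u0 @ g.conj().T


def velocity(t, h=1e-6):
    """Right-reduced velocity x'(t) x(t)^(-1), by central differences."""
    dx = (x_closed(t + h) - x_closed(t - h)) / (2 * h)
    return dx @ x_closed(t).conj().T
```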

2.3.2 SU(8)

Normal case

In su(8), ∆ is the span of the one and two body Pauli matrices, and ⊥ is the linear span of the three body Pauli matrices. Specifically, $\Delta = \frac{i}{2\sqrt2}\,\mathrm{span}\{\sigma^r_i,\ \sigma^s_j\sigma^t_k\}$, where $\sigma^r_i$ denotes σ_r acting on qubit i. Denote by u = S + D the decomposition into one and two body components respectively. Furthermore, let Λ⊥ = T. In this case equation (2.9) separates into the three equations

$$\dot S = 0, \tag{2.15}$$

$$\dot D = -[T, D], \tag{2.16}$$

$$\dot T = -[T, S]. \tag{2.17}$$

Clearly S = S₀ is a constant. To integrate the other equations, make the following observation. Given an equation of the form $\dot Z = [W, Z]$, where W is known and W, Z ∈ g, solutions are given by Z = Ad(w)Z₀, where w solves $\dot w = Ww$ with w(0) = I, and Z₀ is a constant. The solution to the three body equation is immediately

$$T = \mathrm{Ad}(\exp(S_0 t))T_0.$$

The two body components now satisfy

$$\dot D = -[\mathrm{Ad}(\exp(S_0 t))T_0, D].$$

To integrate this, look for w : [0, 1] → SU(8) such that

$$\dot w = -\mathrm{Ad}(\exp(S_0 t))(T_0)\,w.$$

This equation has the solution

$$w = \exp(S_0 t)\exp(-(S_0 + T_0)t),$$

recalling w(0) = I. Therefore the solution to the two body equation is given by

$$D = \mathrm{Ad}\big(\exp(S_0 t)\exp(-(S_0+T_0)t)\big)(D_0).$$

Now the geodesic equation can be integrated using the work of Noakes [30] to give

$$x = \exp(S_0 t)\exp(-(S_0+T_0)t)\exp((S_0 + D_0 + T_0)t)\,x_0. \tag{2.18}$$

Interestingly, this geodesic is a product of three one parameter subgroups. Furthermore, this extends naturally to SU(d³), where the allowed set consists of one and two qudit instructions. These solutions can also cover SU(3) and SO(4) with particular distributions.
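Similarly, (2.18) can be checked numerically in SU(8). In this sketch (NumPy and SciPy assumed, random hypothetical S₀, D₀, T₀ with x₀ = I), the control recovered from finite differences should equal S₀ + D(t), with no three body component and a one body component that stays equal to S₀.

```python
import itertools
from functools import reduce

import numpy as np
from scipy.linalg import expm

SIGMA = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, 1j], [-1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Normalised Pauli strings of su(8), grouped by body count.
STRINGS = {1: [], 2: [], 3: []}
for mus in itertools.product(range(4), repeat=3):
    weight = sum(mu != 0 for mu in mus)
    if weight:
        STRINGS[weight].append(1j * reduce(np.kron, [SIGMA[mu] for mu in mus]) / (2 * np.sqrt(2)))


def random_component(weight, rng, scale=0.3):
    """Random real combination of the Pauli strings with the given body count."""
    return sum(scale * rng.normal() * t for t in STRINGS[weight])


def body_part(M, weight):
    """Projection of M onto the Pauli strings with the given body count."""
    return sum(np.trace(M @ t.conj().T).real * t for t in STRINGS[weight])


rng = np.random.default_rng(3)
S0, D0, T0 = (random_component(w, rng) for w in (1, 2, 3))


def x_closed(t):
    """Equation (2.18) with x0 = I."""
    return expm(S0 * t) @ expm(-(S0 + T0) * t) @ expm((S0 + D0 + T0) * t)


def d_closed(t):
    """D(t) = Ad(exp(S0 t) exp(-(S0 + T0) t)) D0."""
    w = expm(S0 * t) @ expm(-(S0 + T0) * t)
    return w @ D0 @ w.conj().T


def control(t, h=1e-6):
    """Right-reduced velocity x'(t) x(t)^(-1), by central differences."""
    dx = (x_closed(t + h) - x_closed(t - h)) / (2 * h)
    return dx @ x_closed(t).conj().T
```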

Abnormal case

In SU(8) the abnormal case is given by ⟨Λ, u*⟩ = 0 for all u* ∈ ∆, which occurs when Λ is in the orthogonal complement of ∆. In this case the costate equation can be written as

$$\dot\Lambda^\perp = -[\Lambda^\perp, u].$$

While there is no explicit equation for u, examples of abnormals can be constructed in SU(8) which have the same energy as normal solutions.

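As before, let the allowed set be $\Delta = \mathrm{span}\{\frac{i}{2\sqrt2}\sigma^r_i,\ \frac{i}{2\sqrt2}\sigma^s_j\sigma^t_k\}$. Now consider a u in the allowed set and the corresponding Λ⊥ (summing over repeated indices),

$$u = \frac{i}{2\sqrt2}S^i_r\,\sigma^r_i + \frac{i}{2\sqrt2}D^{ij}_{rs}\,\sigma^r_i\sigma^s_j = S + D, \qquad \Lambda^\perp = \frac{i}{2\sqrt2}T^{ijk}_{rst}\,\sigma^r_i\sigma^s_j\sigma^t_k = T.$$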

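For arbitrary T and D the terms $[T^{ijk}_{rst}\sigma^r_i\sigma^s_j\sigma^t_k,\ D^{ij}_{rs}\sigma^r_i\sigma^s_j]$ are always contained in ∆ (see Appendix A). In the abnormal equation $\dot T = -[T, S] - [T, D]$, both $\dot\Lambda^\perp = \dot T$ and [T, S] lie in ⊥, while [T, D] lies in ∆. To avoid a contradiction, it must be that [T, D] = 0.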

If [T, D] = 0, the PMP still requires at least one $T^{ijk}_{rst} \neq 0$. In SU(8) there are subspaces of the two body directions which commute with a single T.

From the normal case, $\dot D = -[T, D]$, and so a constant D = D₀ in the abnormal case is covered by the normal equations. Now we construct variations of D = D₀ which preserve the commutation relationship. For example, suppose D is also an isospectral flow, so D = Ad(d)D₀, where d : [0, 1] → SU(8). Since T is also an isospectral flow, the solution to the abnormal equation can be written as T = Ad(τ)T₀, where τ : [0, 1] → SU(8). If T₀ and D₀ are chosen so that [T₀, D₀] = 0, then one possibility is d = τ. However, τ must be chosen so that T always remains perpendicular to ∆. Hence τ must be the exponential of single body terms, for example τ = exp(S₀t). Such a solution for T, however, is also given by the normal case.

Likewise, the abnormal equations do not give any information about the single body components. The normal solution for S is when the $S^i_r$ are all constant, $S = \frac{i}{2\sqrt2}(S_0)^i_r\sigma^r_i = S_0$. The abnormal equation for T can then be satisfied by $T = \exp(S_0 t)T_0\exp(-S_0 t) = \mathrm{Ad}(\exp(S_0 t))T_0$. Note this is always contained in ⊥ and still satisfies the normal geodesic equations.

A more complicated S is given by rotations of the form

$$S = \mathrm{Ad}(\exp(-S'_0 t))S_0,$$

where $S'_0 = \frac{i}{2\sqrt2}(S'_0)^i_j\sigma^j_i$ is also a single body element. S in this case has constant norm, like the normal case, but now

$$T = \mathrm{Ad}\big(\exp(-S'_0 t)\exp((S_0 + S'_0)t)\big)T_0,$$

which is not given by the normal case. This T is always contained in ⊥, as $\mathrm{Ad}\big(\exp(-S'_0 t)\exp((S_0 + S'_0)t)\big)$ simply rotates T₀ in ⊥. Furthermore,

$$D = \mathrm{Ad}\big(\exp(-S'_0 t)\exp((S_0 + S'_0)t)\big)D_0.$$

Likewise, this will remain in ∆ by a similar argument.

Clearly this choice of S, D and T is not given by the normal geodesic equations, but in SU(8) it is a rotation of a normal solution by $\exp(-S'_0 t)$. More exotic solutions could easily be created by adding extra rotations. For practical implementation as a circuit, the constant single body controls would be simpler, and that case is already covered by the normal case.

2.3.3 SU(2n)

Normal case

As before the allowed set is composed of one and two bodies

u = S +D.

Now denote the odd m > 1 bodies as T and the even n > 2 bodies as F . In thissituation equation (2.9) separates into

S = 0

D + F = [D,T ] + [S, F ]

T = [D,F ] + [S, T ].

It is not possible to completely integrate these equations in closed form. ClearlyS = S0 as before. Now let D = Ad(exp(S0t))D, F = Ad(exp(S0t))F and T =

Ad(exp(S0t))T which removes the Lie brackets of T and F , with S0.

Now the equations become˙T = [D, F ],

˙D +

˙F = [D, S0] + [D, T ].

Now set Z = D+ F , and let p denote the projection onto the allowed set. This givesthe two coupled equations

˙T = [p(Z), Z],

Z = [p(Z), S0] + [p(Z), T ].

Potentially these equations can be solved with a power series. A power series maybe a useful computational tool for generating approximate solutions. Let

T =∞∑i=0

qiti,

Z =∞∑i=0

ziti,


0.0 0.2 0.4 0.6 0.8 1.0

0.5

1.0

1.5

2.0

t

Series v exact results

Exact

n = 2

n = 4

n = 6

n = 8

n = 10

Figure 2.1: Truncated series results vs exact result in SU(8). Figure shows thecontrol function in the σ3 ⊗ σ3 ⊗ I direction. n is the number of terms in the seriesexpansion. Control functions in other directions displayed similar behaviour.

where zi and qi are matrices. To find the coefficients, compare the powers of t∞∑r=1

rqrtr−1 =

∞∑j=0

∞∑k=0

[p(zj), zk]tj+k

∞∑r=1

rzrtr−1 =

∞∑j=0

[p(zj), S0]tj +

∞∑j=0

∞∑k=0

[p(zj), qk]tj+k

Computing the first few terms

q1 = [p(z0), z0]

q2 =1

2([p(z1), z0] + [p(z0), z1]) ,

q3 =1

3([p(z2), z0] + [p(z1), z1] + [p(z0), z1]) ,

and likewise

z1 = [p(z0), S0 + q0],

z2 =1

2([p(z1), S0 + q0] + [p(z0), q1]) ,

z3 =1

3([p(z2), S0 + q0] + [p(z1), q1] + [p(z0), q2]) .

In general for k ≥ 1,

qk =1

k

(k−1∑r=0

[p(zk−1−r), zr]

)

zk =1

k

([p(zk−1), S0 + q0] +

k−1∑r=1

[p(zk−1−r), qr]

)

Figure (2.1) shows the series expansions to various orders, compared with the exactresult for randomly chosen initial conditions. For convergence, first note that the

2.4. Higher Order PMP 17

norm of p(Z0) and T0 is bounded above by Λ0, and an n deep nesting of Lie bracketsand projections will be bounded approximately by ∥Λ0∥n. For an n th term inthe series, Lie brackets are nested at most n + 2 deep. Furthermore, over countingrepeated indices, there are at most 3n−2+1 terms for the n th power of t. Additionallythe 1/k coefficients for each term in the expansion contribute to, at worst, an overallfactor of 1/n!. Therefore a crude upper bound on the norm for the n term is(3n−2∥Λ0∥n+2/n!). Clearly this converges, but this does not assist with solving theboundary value problem.

Abnormal case

Again suppose Λ is entirely perpendicular to ∆. In this case, abnormals in SU(2n)

must obey the equationF + T = [u, F + T ].

If u is a constant control, this equation follows from the equations for the normalgeodesics. In the constant u case U must be the exponential of a matrix in ∆.Suppose u is not constant. Let u = S +D, where S is a single, and D a two body.As seen in the normal case, the term [D,T ] can be contained in ∆. If T contains oddbody terms, then there is a two body in ∆ so [D,T ] is a two body which contradictsthe abnormal equation. Therefore the control cannot always move in all two bodydirections. To satisfy the abnormal case it must belong to a subset of the two bodiesin ∆.

This does not rule out abnormals in general. For example a piecewise u such that itrespects the commutation relationships could move to an arbitrary x. u can switchbetween moving in different basis directions. This behaviour has a nice physicalinterpretation, which is simply switching between different gates. In this example ifu is piecewise constant, then the energy is the sum of the magnitudes of the differentsegments of u. This somewhat obfuscates the meaning of complexity. It would bebetter to simply count the number of switches.

2.4 Higher Order PMP

2.4.1 General approach

The PMP fails in the abnormal case when the derivative dHu(u∗) = 0 does not give

enough information to determine u. This is because there are not enough constraintsin the original control problem. We propose a new method to try and find u in the

18 2.4. Higher Order PMP

abnormal case, when the original state equation

x1 = f(u, x1),

is affine in u. Specifically, let

x1 = G(x1, t) +K(x1, t)u

where x1 : [t1, t2] → Rn, u : [t1, t2] → Rp. G(x1) is an n× 1 matrix valued functionof x1, and K(x1) is an n× p matrix valued function of x1. Furthermore we restrictourselves to the case where the original optimal control problem, was to minimisean action of the form

S[x1] =1

2

∫dt u(t)Tg(x1)u(t),

where g(x1) is a Riemannian metric, namely a position-dependent, positive definite,symmetric matrix. Briefly, the new method is to take any constraints found in theabnormal case and re-apply the PMP with these constraints. The original PMPHamiltonian H we now denote H(1) is simply

H(1) = λ1 · (G(x1) +K(x1)u) +ν12u(t)Tg(x1)u(t).

where λ1 denotes the co-vector in the dual of Rn, λ1 : [t1, t2] → Rn∗. By the PMPλ1 is required to satisfy

λ1 = −(∂H(1)

∂x1

)T

which denotes the gradient with respect to components of x. In the normal casethere will always be enough information to determine u. While the state equationis linear in u, the integrand of the action is quadratic, leaving a u after computingthe derivative dHu. This means u can be solved for in terms of λ1. However in theabnormal case, the quadratic term will always vanish, which is where the problemoccurs. However it does give some information. In the abnormal case we requireν1 = 0, λ = 0 and

λ · (K(x)u) = 0,

∀u ∈ Rp. Therefore, we require λ1 to be perpendicular to all the columns of K(x)

for all t ∈ [t1, t2]. This is equivalent to requiring the p vector,

χ(t) =

⟨λ1, k1⟩⟨λ1, k2⟩. . .

⟨λ1, kp⟩

,

2.4. Higher Order PMP 19

where ki denotes the i-th column, to be zero, χ1(t) = 0 ∀ t ∈ [t1, t2]. Clearly this isstill affine in the original u. The new method is to add this constraint on λ1 backinto PMP.

Analogous to finite dimensional Lagrange multipliers, for each constraint we add aLagrange multiplier. As long as we only check the cases where λ1 satisfies χ1(t) = 0

reapplying the PMP with these extra constraints hopefully gives extra equations foru.

Constraints in the PMP are given by state equations. So to add the χ1 constraint,we upgrade χ1 and λ1 to state vectors. A state equation for χ1 is found by thederivative, χ1, and requiring the initial condition χ1(0) = 0. Clearly χ1 will still beaffine in the control, and there will be no u terms. An equation for λ1 is alreadygiven by the PMP. Hence we then take our new state to be

x2(t) = (x1(t), λ1(t), χ1(t)),

where x2 : [t1, t2] → Rn × Rn × Rp. Then re-apply the PMP looking to minimiseS[x1] with respect to u subject to,

x2 = (x1, λ1, χ1).

Note this system will still be affine in u. Now the new PMP Hamiltonian can bewritten as

H(2) = λ2 · (x1, λ1, χ1) +ν22uTg(x1)u,

where λ2 : [t1, t2] → (Rn × Rn × Rp)∗ is required to satisfy

λ2 =

((∂H(2)

∂x1

)T

,

(∂H(2)

∂λ1

)T

,

(∂H(2)

∂χ

)T ).

Clearly there are abnormal and normal cases of this to examine. In the abnormalcase, dH(2)

u will always be affine in u, so u can be solved for in terms of λ2 indH

(2)u = 0. However in the abnormal case of H(2) we will find new orthogonality

constraints for λ2, which we will set as χ2. If this does not lead to a contradiction,for example the SU(2) case, we will upgrade these and λ2 to state variables andrepeat the procedure. This will give a chain of H(k) and λk, where at each level wewill call the control u found k-th abnormal. Suppose the original state vector was ofdimension m. Then the new state will have dimension 2m+p, as a χk will always beof dimension p, and λk of dimension m. More than doubling the dimension at eachstep means this procedure is expensive for control problems in large dimensions.

20 2.4. Higher Order PMP

2.4.2 Abnormal geodesics

Previously it was found that Λ11 = −[Λ1

1, u1]. In the abnormal case, Λ was requiredto satisfy ⟨Λ1

1, u∗⟩ = 0, ∀u∗ ∈ ∆, so by the PMP we require proj∆ (Λ1

1) = 0. As anexample of the previous method, we will find equations for 2-abnormal geodesics.Along with the equation for Λ1

1, this constraint is upgraded to a state equationχ1 = d

dtproj∆ (Λ1

1) = −proj∆ ([Λ11, u]). Then the new iterated PMP Hamiltonian is

given by

H(2) = λ21(u1x)− λ22([Λ11, u1])− λ23

(proj∆

([Λ1

1, u1]))

+ν22∥u1∥2.

where λ2i are new costates, so λ21 ∈ T ∗Gx , λ22 ∈ g∗ and λ23 ∈ ∆∗. Computing thederivatives

λ21(v1) = −λ21(v1x),

λ22(v2) = λ22 ([v2, u1]) + λ23 (proj∆ ([v2, u1])) ,

λ23(v3) = 0,

where v1 ∈ TGx, v2, and v3 ∈ g. These equations can be rewritten as equations ong. λ21 can be identified with a Λ2

1 ∈ g as before. Additionally λ22 can be identifiedwith a Λ2

2 ∈ g via λ22(v) = ⟨Λ22, v⟩ where v ∈ g. Similarly for λ23 can be identified

with a Λ23.

Λ21 = −[Λ2

1, u1],

Λ22 = −[Λ2

2, u1]− [proj∆(Λ2

3

), u1],

Λ23 = 0.

Therefore take Λ23 as a constant in ∆. As usual there are two cases to examine, the

normal case ν ≤ 0 and the abnormal case ν = 0. Additionally H(2) can be writtenin terms of inner products.

H(2) = ⟨Λ21 − [Λ2

2,Λ11]− Λ2

3, u1⟩+ν

2⟨u1, u1⟩.

In the normal case, without loss of generality, ν2 = −1. Maxima of the Hamiltonianoccur when dH

(2)u1 (u

∗) = 0, ∀u∗ ∈ ∆. Therefore

⟨Λ21 − [Λ2

2,Λ11]− Λ2

3 − u1, u∗⟩ = 0.

Hence this requiresproj∆

(Λ2

1 − [Λ22,Λ

11])− Λ2

3 = u1.

2.5. Magnus expansions 21

This gives enough equations to determine u1. In the abnormal case, ν2 = 0, a similarsituation as the original abnormal case occurs,

proj∆(Λ2

1 − [Λ22,Λ

11]− Λ2

3

)= 0.

Then the process can be repeated. The Λ2i can be upgraded to state variables, along

with a new state for the projection. One significant problem with this approach isthat the dimension more than doubles at each iteration. For SU(8) this becomes aproblem in a 252 dimension space. There is also no guarantee that these abnormalshave lower energy. For the quantum computing application it might be better toaccept the possibly higher energy normal geodesics. However this approach mightbe useful for other control problems.

2.5 Magnus expansions

The Magnus Expansion uses Picard fixed point iteration to solve the matrix Schrödingerequation as a power series which respects the geometric structure [31]–[34] . As anextension, the same technique can be used to constructs series solutions for Laxequations. Furthermore, this will also yield a solution for the Schrödinger equation,giving a general series solution for subRiemannian geodesics.

Recall the derivative of the matrix exponential is given by

d

dtexp(X) =

(I − exp(−adX)

adX

(X)

)exp(X) = dexpX(X) exp(X),

where the operator, dexp and its inverse can be computed by a power series

dexpX :=

(I − exp(−adX)

adX

)=

∞∑k=0

(−1)k

(k + 1)!(adX)

k (2.19)

dexp−1X :=

∑ Bk

k!adk

X , (2.20)

where Bk is the k−th Bernoulli number. Consider the Lax equation

Z = [p(Z), Z].

As Z is an isospectral flow, there exists a W ∈ G such that Z = WZ0W†. Addition-

ally let p be the orthogonal projection onto the allowed set ∆,

p(X) =∑τi∈∆

⟨X, τi⟩τi.

Now W must satisfy

W = p(WZ0W†)W = p(Ad(W )(Z0))W.

22 2.5. Magnus expansions

Suppose W = exp(Ω) exp(Z0t). The exp(Z0t) term is included to increase theaccuracy of Ω constructed from the series expansion, and to match known solutions.Then

dexpΩ(Ω) exp(Ω) exp(Z0t) + exp(Ω) exp(Z0t)Z0 = p(Ad(expΩ)(Z0)) exp(Ω) exp(Z0t),

dexpΩ(Ω) = p(Ad(exp(Ω))(Z0))− Ad((exp(Ω))(Z0)).

Let p⊥ be the negative of the orthogonal projection onto ⊥. Now Ω satisfies theequation

Ω = dexp−1Ω

(p⊥(Ad(exp(Ω))(Z0))

)=

∞∑k=0

Bk

k!adk

Ω

(p⊥(Ad(exp(Ω))(Z0)

)).

This can be written as the integral equation

Ω(t) =∞∑k=0

Bk

k!

∫ t

0

ds adkΩ(s)p

⊥(Ad(expΩ(s))(Z0)),

assuming Ω(0) = 0. To find an approximate solution perform the Picard iterativeprocess

Ω[0] = 0,

Ω[1] =

∫ t

0

ds p⊥(Z0) = p⊥(Z0)t,

Ω[n+1] =

∫ t

0

ds∞∑k=0

Bk

k!adk

Ω[n]p⊥(Ad(exp(Ω[n]))(Z0)).

Assuming small t, t ≤ 1, the calculations can be simplified by truncating the powerseries for dexp−1 and Ad(exp) at the order of accuracy as the iterated solution. Thisis because we would like Ω[n] to be accurate to order O(tn). A k-fold nesting of Liebrackets of Ω[n] will at least have a O(tk) term. So we should ignore n + 1 nestedLie brackets as they will contribute higher order terms. Then, the procedure foriterating can be taken to be

Ω[n+1] =

∫ t

0

dsn∑

k=0

Bk

k!adk

Ω[n]p⊥( n∑

j=0

1

j!adj

Ω[n](Z0)

).

By linearity of p⊥ this at least simplifies to

Ω[n+1] =

∫ t

0

ds

n∑k=0

n∑j=0

Bk

j!k!adk

Ω[n]p⊥(adj

Ω[n](Z0)

). (2.21)

Given that Ω[1] is a first order polynomial, additional iterated solutions will be higherorder polynomials. Furthermore if the Ω[n] term is accurate to O(tn), we should only

2.6. Discretising subRiemannian geodesics 23

take terms up to O(tn) in the integrand, so after integration we are left with O(tn+1)

order terms. The second order solution is given by

Ω[2] = p⊥(Z0)t+1

2p⊥([p⊥(Z0), Z0])t

2.

To find the third order solution, we ignore O(t3) terms in the integrand.

Ω[3] = p⊥(Z0)t+1

2p⊥([p⊥(Z0), Z0]

)t2+

1

3

(1

2p⊥([[Z0, p

⊥(Z0)], p⊥(Z0)]

)+

1

4[p⊥([p⊥(Z0), Z0]

), p⊥(Z0)]

+1

2p⊥([p⊥([p⊥(Z0), Z0]

), Z0]

))t3.

As a check, this matches exactly with the truncated solution in SU(8). Recall inthat case exp(Ω) = exp(S0t) exp(−(S0 + T0)t) exactly. Expanding this with theBaker-Campbell-Hausdorff formula to O(t3) we find Ω[3].

In general Ω can be represented by a power series.

Ω =∞∑k=0

Ωktk,

where the Ωk term involves permutations of (k− 1)-deep Lie brackets and k projec-tions.

W also provides a solution to the Schrödinger equation,

x = p(WZ0W†)x.

Suppose a y can be found such that x = Wy. Differentiating

p(WZ0W†)y +Wy = p(WZ0W

†)Wy.

Clearly the only solution is y = y0 a constant, so x = W . This series for W willbe useful for numerical calculations, as it can be used to estimate initial conditions.It can also be used to construct numerical integrators which respect the unitaryproperty of x. This can be achieved by replacing the integral with a quadrature.

2.6 Discretising subRiemannian geodesics

Nielsen et al proved three lemmas guaranteeing that Riemannian geodesics can beapproximated to arbitrary precision with a quantum circuit. The first lemma is au-tomatically satisfied by subRiemannian geodesics. Our control function u is always

24 2.6. Discretising subRiemannian geodesics

in terms of one and two body gates, it does not need to be approximated. Theremaining two lemmas can be modified for subRiemannian geodesics. Namely thatthe error of the approximation can be controlled by choosing the size of discretisa-tion. Additionally, as a departure from previous work, we choose a more naturaldefinition of error.

For a small time interval [0,∆t], let U = x(∆t), be the unitary matrix generated bythe control u. Now define the average control u,

u =1

∆t

∫ ∆t

0

dt u(t).

and let U = exp(u∆t) be an approximation to U . We define the error to be givenby the length of the minimal bi-invariant Riemannian geodesic joining U and U .This can be computed by

err(U , U) = ∥ log(U †U)∥,

where log is taken on the principal branch, and ∥V ∥ =√

tr(V V †) is the Riemannianinner product. While err does not respect any subRiemannian structure, a subRie-mannian metric might be crudely approximated by replacing ∥..∥ with a penaltymetric ∥...∥p.

First, the error of approximating U by U is boundedLemma 2.6.1. err(U, U) is O(∆t2).

Proof. Using the BCH formula, let U † = exp(V ), correspond to the exact solutionto the Schrödinger equation, x = ux. Now

err(U, U) = ∥ log(U † exp(u∆t))∥

=

∥∥∥∥V + u∆t+1

2[V, u∆t] +

1

12

([V, [V,∆tu]] + ∆t2[u, [u, V ]]

)+ . . .

∥∥∥∥ .If∫ ∆t

0dt ∥u∥ = ∆t C1 ≤ π, then the Magnus expansion, as a solution for x, in

x = ux , is guaranteed to converge. So choosing a sufficiently small ∆t, V is givenby the Magnus expansion as

V = −∫ ∆t

0

dt u− 1

2

∫ ∆t

0

dt1

∫ t1

0

dt2 [u(t1), u(t2)] +O(∆t3),

Denoting the higher order integrals with the notation O(∆t) ∼∫ ∆t

0dt. Additionally

2.6. Discretising subRiemannian geodesics 25

by lemma (2.2.1) ∥u∥ = C1. Now

err(U, U) =∥∥− 1

2

∫ ∆t

0

dt1

∫ t1

0

dt2 [u(t1), u(t2)] +O(∆t3)

− 1

4[

∫ ∆t

0

dt1

∫ t1

0

dt2 [u(t1), u(t2)] +O(∆t3), u∆t] +O(∆t3)∥∥

err(U, U) ≤ C21∆t

2 + ∥O(∆t3)∥.

Then

lim∆t2→0

err(U, U)∆t2

≤ C21 .

To ensure the overall error is bounded by ε2 we must choose ∆t ≤ ε/C1.

Therefore the error of approximating U by constant controls can be controlled bymaking ∆t sufficiently small. It is currently unknown how C1 changes with respectto the dimension, the target and the distribution in general.

To convert a constant segment U into a quantum circuit, use the Lie Product for-mula. First divide [0,∆t] into M segments of length ∆t2. Then

U =(U∆t2

)M+O(∆t2),

where

U∆t2 = exp(u1τ1∆t2) . . . exp(umτm∆t2).

From Chapter (1), a two body gate requires 2 CNOT gates and at most 5 singlequbit gates. Also note that the single body gates can be written as phase gates. Insu(2n) there are m = (9/2)n(n − 1) + 3n one and two body basis elements. So intotal there are at most q = (63/2)n(n − 1) + 3n fundamental gates (assuming fornow phase gate can be implemented directly) required to implement a single step ofthe Lie Trotter formula.Lemma 2.6.2. err(U , UM

∆t2) is O(∆t3).

Proof. First by the Lie product formula

exp(u1τ1∆t2) . . . exp(umτm∆t2) = exp(u∆t2 +O(∆t4m2)).

Next (exp(u∆t2 +O(∆t4m2))

)M= exp(u∆t+O(∆t2m)).

26 2.6. Discretising subRiemannian geodesics

Then

err(U , UM∆t2) = ∥ log(exp(−u∆t) exp(u∆t+O(∆t2m)))∥

= ∥ − [∆tu,O(∆t2m)] +O(∆t4)]∥

≤ 2C1m∆t∥O(∆t2)∥+ ∥O(∆t4)∥.

Therefore as ∆t3 → 0,err(U , UM

∆t2)

∆t3≤ 2C2m,

where C2 is another constant at most order O(C31).

Overall to approximate a general x(1) = U with error ε, in terms of one and twobody gates, divide [0, 1] into small segments of length O(ε/m C1). Clearly each localsection of U can be approximated by O(q2C2

1) gates, and so overall O(q3C31) gates.

Note C1 may become quite large. A lower bound is determined by the fact that thesubRiemannian distance is bounded below by the Riemannian distance. So at leastC1 ≥ ∥ log(U)∥. However for an arbitrary U , the subRiemannian distance is expectedto scale exponentially, and hence the number of segments and gates required.

3. SubRiemannian cubics

3.1 Background

While the energy of a geodesic is a reasonable measure of complexity, we believethere is a better description. Consider the simple geodesic x = exp(ϑτjt), whereϑ ∈ R is some constant variable. The energy of x is 1

2ϑ2. Physically, x can be

implemented by CNOT gates and a phase gate which depends on ϑ. If we are onlycounting CNOTs and phase gates, ϑ does not add extra complexity, but it doeschange the energy. So the energy of x includes contributions from phenomena thatdo not contribute to computational complexity. One measure of complexity is tocount the number of operations in an algorithm. The continuous description ofcomplexity should try to mimic this. Hence a more appropriate measure, would beto count the number of changes in direction. On a Riemannian manifold one waythis can be achieved is with the following functional

S[x] =

∫ 1

0

dt ⟨∇tx,∇tx⟩J ,

where ∇t denotes the covariant derivative. Also ⟨A,A⟩J = ⟨JA,A⟩ and J is theinertia transform. When the Lie reduction of x is restricted to a ∆, the PMPcan be used to determine stationary points. However ⟨, ⟩J does not need to be asubRiemannian metric. The Lie reduction of the covariant acceleration does nothave to be in ∆. Measuring the norm squared of the covariant acceleration inperpendicular directions is important, because changing directions in certain waysmay be more expensive.

In the Riemannian case it is true that geodesics are cubics with zero initial velocityand acceleration. Riemannian geodesics are fixed points of the Riemannian cubicfunctional. However, this might not necessarily be true on a subRiemannian man-ifold. We study subRiemannian cubics out of mathematical interest and hopefullyas a better measure of complexity of a quantum circuit.

Let x ∈ G, and x ∈ TGx. Denote the right-Lie reduction of x as ˙x,

˙x = xx−1.

Now define X st x =: Xx = X ieix where ej is a vector field in TGx. For a vector fieldY (t) = Y iei along x, the right-Lie reduction of the covariant derivative is defined by

∇tY = Y iei +X iY j∇eiej.

27

28 3.2. Riemannian cubics

The right-Lie reduction of the covariant derivative for two vector fields, for rightinvariant G, is given by

∇eiej =1

2([ej, ei] + h(ei, ej)),

where the standard notation [W,V ] = WV − VW applies. Note the order of theLie bracket in the definition. The right-Lie reduction of a Lie bracket [ei, ej] =

−[ei, ej] = [ej, ei]. We take into account the factors of −1 in our definition.

To save space, the notation

h(V,W ) = B(V,W ) +B(W,V ) (3.1)

where B is the bi-linear operator defined by

⟨B(V,W ), Z⟩J := ⟨[Z,W ], V ⟩J= ⟨[W,Z],J V ⟩

= ⟨W, [Z,J −1(V )]⟩

= ⟨[J V,W ], Z⟩

= ⟨J −1([J V,W ]),JZ⟩

= ⟨J −1([J (V ),W ]), Z⟩J ,

has been used. Therefore

B(V,W ) = J −1([J V,W ]), (3.2)

which gives∇tY = ˜Y +

1

2([Y , X] + h(X,Y )). (3.3)

When Y = x this equation reduces to

∇tx = X + J −1[JX,X] (3.4)

This is the negative of the left-Lie reduction of ∇tx as seen in [17], because we usedright-Lie reductions.

3.2 Riemannian cubics

Before examining the subRiemannian case, we investigate the Riemannian case inthe metric, ⟨, ⟩J .

S[x] =∫ b

a

dt ⟨∇tx,∇tx⟩J . (3.5)

3.2. Riemannian cubics 29

It can be shown that the critical points of equation (3.5) are given by

∇3t x+R(∇tx, x)x = 0. (3.6)

Given that x must satisfy the Schrödinger equation x = V x, we can find an equationfor V via right-Lie reduction. From the previous section we found that

∇tx = V +1

2h(V, V ),

and then

∇2tx =

d

dt∇tx+

1

2

([∇tx, V ] + h(V, ∇tx)

)=

d

dt(X +

1

2h(V, V )) +

1

2

([V +

1

2h(V, V ), V ] + h(V, V +

1

2h(V, V ))

)= V +

3

2h(V, V ) +

1

2[V , V ] +

1

4[h(V, V ), V ] +

1

4h(V, h(V, V ))

where the derivative of h is given by

d

dth(X,Y ) =

d

dt(B(X,Y ) +B(Y,X))

= h(X, Y ) + h(X, Y ).

Finally

∇3t x =

d

dt(∇2

t x) +1

2([∇2

t x, V ] + h(V, ∇2t x)

=d

dt

(V +

3

2h(V, V ) +

1

2[V , V ] +

1

4[h(V, V ), V ] +

1

4h(V, h(V, V ))

)+

1

2

([V +

3

2h(V, V ) +

1

2[V , V ] +

1

4[h(V, V ), V ] +

1

4h(V, h(V, V )), V

]+ h(V,

(V +

3

2h(V, V ) +

1

2[V , V ] +

1

4[h(V, V ), V ] +

1

4h(V, h(V, V ))

))=

...V + [V , V ] +

3

2h(V , V ) + 2h(V, V )

+1

4[h(V, V ), V ] +

1

4h(V , h(V, V )) +

5

4h(V, h(V, V ))

+5

4[h(V, V ), V ] +

1

4[[V , V ], V ] +

1

8[[h(V, V ), V ], V ]

+1

8[h(V, h(V, V )), V ] +

1

4h(V, [V , V ])

+1

8h(V, [h(V, V ), V ]) +

1

8h(V, h(V, h(V, V )))

Now consider the curvature term R(∇tx, x)x. The curvature operator is defined by

R(V,W )Z = ∇V∇WZ −∇W∇VZ −∇[V,W ]Z.

30 3.3. SubRiemmannian cubics

where V,W,Z ∈ TG. Denote the right Lie reduction as R. Writing this in localcoordinates,

R(ei, ej)ek = ∇ei∇ejek −∇ej∇eiek −∇[ei,ej ]ek

=⇒ R(ei, ej)ek =1

2([∇ejek, ei] + h(∇ejek, ei))

− 1

2([∇eiek, ej] + h(∇eiek, ej))

− 1

2([ek, [ej, ei]] + h(ek, [ej, ei]))

=−1

4[[ei, ej], ek] +

1

4[h(ej, ek), ei] +

1

4h(ei, [ek, ej]) +

1

4h(ei, h(ej, ek))

− 1

4[h(ei, ek), ej]−

1

4h(ej, [ek, ei])−

1

4h(ej, h(ei, ek))

+1

2h([ei, ej], ek)

Then

R(∇tx, x)x =−1

4[[∇tx, V ], V ] +

1

4[h(V, V ), ∇tx] +

1

4h(∇tV, h(V, V ))

− 1

4[h(∇tx, V ), V ]− 1

4h(V, [V, ∇tx])−

1

4h(V, h(∇tx, V )) +

1

2h([∇tx, V ], V )

=−1

4[[V , V ], V ]− 1

8[[h(V, V ), V ], V ] +

1

4[h(V, V ), V ]

+1

4h(V , h(V, V )) +

1

8h(h(V, V ), h(V, V ))

− 1

4[h(V , V ), V ]− 1

8[h(h(V, V ), V ), V ]− 3

4h(V, [V, V ]) +

3

8h(V, [h(V, V ), V ])

− 1

4h(V, h(V , V ))− 1

8h(V, h(h(V, V ), V )

).

When J = I, the bi-invariant metric, substituting all the terms back into theoriginal cubic expression, we find

∇3t x+ R(∇tx, x)x =

...V + [V , V ] = 0,

which simply gives...V = −[V , V ].

This is the negative of the Left-invariant cubics. Specifically when x = xW , wefind W = −V . This expression then matches the equations for the cubics found byNoakes et al in [17].

3.3 SubRiemmannian cubics

The equations for subRiemannian cubics in a matrix Lie group G can also be derivedfrom the PMP. The PMP only applies however when there are no derivatives of a

3.3. SubRiemmannian cubics 31

control function, w : [0, 1] → ∆ ⊂ g in the object functional. When x : [0, 1] → G isrequired to satisfy

x = wx,

the Lie reduction of covariant acceleration gives a first order derivative of the controlfunction w,

S[x] =1

2

∫ 1

0

dt⟨∇tx,∇tx⟩J =1

2

∫ 1

0

dt⟨w + J −1[Jw,w], w + J −1[Jw,w]⟩J

where ⟨, ⟩J , is a Riemannian metric and J corresponds to an inertia transformation.Clearly ∇tx is not always in ∆, so it is important to measure the accelerations inthe perpendicular directions. Only w is constrained to be in ∆.

One option so we can apply the PMP is to define a new control function u : [0, 1] →∆, and an additional costate ϑ ∈ g∗. Then w is treated as a state variable,

x = wx,

w = u.

This is so the action, S[x], contains no derivatives of the control u. Now the PMPHamiltonian can be written as

H = λ(wx) + ϑ(u) +ν

2⟨u+ J −1[Jw,w], u+ J −1[Jw,w]⟩J .

Where λ ∈ T ∗G, ϑ ∈ g∗. The PMP requires that λ(z) = −dH(z)x, z ∈ TG, andϑ(v) = −dH(v)w, v ∈ g. Computing the derivatives

λ(z) = −λ(wz),

andϑ(v) = −λ(vx) + ν⟨J −1[J v, w] + J −1[Jw, v], u+ J −1[Jw, v]⟩J .

λ can be identified with a Λ∗ ∈ g∗ by λ(vx) =: Λ∗(v). Λ∗ can then be identified witha Λ ∈ g via the bi-invariant inner product Λ∗(v) =: ⟨Λ, v⟩, Λ ∈ g. Note ⟨, ⟩J doesnot need to be used as it will only scale both sides of the resulting equations by J .

Likewise ϑ ∈ g∗ can be assigned to a Θ ∈ g, also by the bi-invariant inner productϑ(v) =: ⟨Θ, v⟩.

Differentiating Λ∗,

Λ∗(v) = λ(vx) + λ(vwx)

= −λ(wvx) + λ(vwx)

= Λ∗([v, w]).


which gives

⟨Λ, v⟩ = ⟨Λ, [v, w]⟩ =⇒ Λ = −[Λ, w].

Furthermore

⟨Θ, v⟩ = −⟨Λ, v⟩ − ν

(⟨J [w, u], v⟩+ ⟨J [w,J −1[Jw,w]], v⟩

− ⟨[Jw, u], v⟩ − ⟨[Jw,J −1[Jw,w]], v⟩)

=⇒ Θ = −Λ− ν

(J [w, u] + J [w,J −1[Jw,w]]

− [Jw, u]− [Jw,J −1[Jw,w]]).

3.3.1 Normal case

In the normal case ν < 0, the optimal control u∗ must maximise H. When the metricis not bi-invariant the costates depend on ν which adds additional complexity. Fornow we will only consider the bi-invariant metric, so J = I. Then without loss ofgenerality we can set ν = −1. Writing H in terms of inner products,

H = ⟨Λ, w⟩+ ⟨Θ, u⟩ − 1

2⟨u, u⟩.

Maxima occur when dH(u∗)u = 0, so

⟨Θ, u∗⟩ − ⟨u, u∗⟩ = 0,

∀u∗ ∈ ∆. So clearly this gives proj∆(Θ) = u. The equations for the costates in g

can then be reduced into a single equation,

Θ = −[Θ, w], (3.7)

where Θ = Θ⊥ + w, and Θ⊥ ∈ g/∆. As a check of this calculation, let ∆ = g,then necessarily Θ⊥ = 0. The resulting normal equations match the bi-invariantRiemannian case, ...

w = −[w, w].

There are several conserved quantities in (3.7). First, take the inner product of(3.7) and w, to find ⟨...w,w⟩ = ⟨w +Θ⊥, [w,w]⟩ = 0. Second, take the inner productof equation (3.7) and w + Θ⊥, to find ⟨...w, w⟩ + ⟨Θ⊥, Θ⊥⟩ = 0. Integrating theseequations,

⟨w, w⟩ = C2 −1

2⟨w, w⟩, (3.8)

⟨w, w⟩+ ⟨Θ⊥, Θ⊥⟩ = C3. (3.9)

3.3. SubRiemmannian cubics 33

Clearly C3 > 0. In general, solutions to (3.7) are difficult to find. One subset ofsolutions are the linear Lie quadratics

w = (q0 + q1t+ q2t2)w0, Θ⊥ = Θ0,

where qi ∈ R and w0 ∈ ∆ and Θ0 ∈ g/∆ are constant matrices.

3.3.2 Abnormal case - SU(2)

The abnormal case is given by ν = 0. The PMP Hamiltonian can be written as

H = ⟨Λ, w⟩+ ⟨Θ, u⟩.

Computing dHu(u∗), maxima occur when ⟨Θ, u∗⟩ = 0. This occurs when proj∆Θ =

0, so Θ = Θ⊥ ∈⊥ where ⊥ is the orthogonal complement of ∆. Unlike geodesics,even in simple groups such as SO(3) and SU(2), abnormals can not be ruled out bya contradiction. Equation (3.7) reduces to

Θ⊥ = −[Θ⊥, w]. (3.10)

Using SU(2) as an example, let B∆ = τ1, τ2, and B⊥ = τ3, clearly the only onlyway to satisfy (3.10) is if Θ⊥ = 0. Then Θ⊥ = Θ0, a constant. So the PMP doesnot give any information on how to choose w in this case.

3.3.3 SubRiemannian cubics with tension

The abnormal case might be removed by adding an additional term in the action.With the bi-invariant metric, consider cubics with tension,

S[x] =1

2

∫ 1

0

dt ⟨w, w⟩+ T ⟨w,w⟩

where T ∈ R≥0 is the tension. Here ⟨, ⟩ will just be the restriction of the bi-invariantnorm to the distribution. The more general case can be handled as before. Anothermotivation for the tension, is if the norm of a cubic is smaller, it will make it cheaperto approximate in terms of quantum circuit elements. Specifically because a largerstep size can be used in approximating x. As before, introduce the additional controlfunction u and costate ϑ, so x = wx, w = u. Writing the PMP Hamiltonian

H = λ(wx) + ϑ(u) +ν

2

(⟨u, u⟩+ T ⟨w,w⟩

).

The equations for the co-states can be computed in the same way as the cubicswithout tension. For the first costate

˙λ(z) = −dHx(z) = −λ(zx)


and for the second costate

ϑ(v) = −dHw(v) = −λ(vx)− ν T ⟨v, w⟩.

As before, these can be written as equations in the Lie algebra,

Λ = −[Λ, w]

Λ = −Θ− ν T w.

Writing as a single equation

Θ + ν T w = −[Θ, w].

In the abnormal case, minima of H occur when Λ and Θ are perpendicular to w asbefore. All time derivatives of w vanish, giving no extra information. Hence addingthe tension constraint does not remove the abnormal case.

3.3.4 Higher Order PMP for cubics

Using the method described in Chapter (2.4), we can attempt to find equationsfor 2-abnormal cubics in SU(2n). As required, introduce the new state equationsχ = proj∆(Γ), χ(0) = 0, and Γ = −[Γ, w], where Γ = Θ. Also x = wx and w = u asbefore. For now we will only consider cubics in the bi-invariant metric. This controlsystem requires the PMP Hamiltonian

H = λ1(wx) + λ2(u)− λ3([Γ, w]) + λ4(proj∆(Γ)) +ν

2⟨u, u⟩,

where λ1 ∈ T ∗G, λ2, λ3 ∈ g∗ and λ4 ∈ ∆∗. As before equations can be found in g

for these quantities. Let Λi denote the corresponding quantity in g.

Λ1 = −[Λ1, w]

Λ2 = −Λ1 − [Λ3,Γ]

Λ3 = −[Λ3, w] + proj∆(Λ4)

Λ4 = 0.

Clearly it is sufficient to take Λ4 as a constant in ∆. In the normal case, to solvefor u, re-write H in terms of inner products,

H = ⟨Λ1, w⟩+ ⟨Λ2, u⟩ − ⟨Λ3, [Γ, w]⟩+ ⟨Λ4,Γ⟩+ν

2⟨u, u⟩.

The the optimal u is required to minimise H. In the normal case, without loss ofgenerality, take ν = −1 and find dHu(u

∗) = 0, ∀u∗ ∈ ∆ =⇒ proj∆(Λ2) = u .These are complicated coupled differential equations and would need to be solvednumerically.

3.4. Normal cubics in three dimensional groups 35

3.4 Normal cubics in three dimensional groups

3.4.1 SU(2)

In SU(2), ∆ is the span of the x and y Pauli matrices, ∆ = spanτ1 = iσ1, τ2 = iσ2,while ⊥ is spanned by the Pauli z matrix ⊥= spanτ3 = iσ3. Let Θ⊥ = ϕ. Theequations for the control functions in su(2) expand into two components ,

ϕ = −[w, w] (3.11)...w = −[ϕ, w]. (3.12)

(3.11) can be integrated to give

ϕ = −[w, w] + C1τ3.

subRiemannian cubics in SU(2) are termed null when C1 = 0. Suppose w can bewritten as

w = Ad(W )(rw0).

where r : [0, 1] → R and w0 ∈ ∆. In SU(2), suppose

W =

(eiφ 0

0 e−iφ

),

and φ ∈ R. If r(0) = r(0) = 0, then φ = φ0 t , φ0 ∈ R , and the solutions reduce togeodesics.Lemma 3.4.1. subRiemannian geodesics are normal subRiemannian cubics with zeroinitial velocity in SU(2).

For non zero initial velocity and acceleration, in the limit t → ∞, certain solutionsto (3.11), are observed to behave like quadratics. To see this, simplify the problemby taking w → ω ∈ C. Now equation (3.11) is written as a complex equation

...ω =

1

2ω( ˙ωω − ωω)− iC1ω.

In the null case, C1 = 0, in polar coordinates ω = reiϑ, the equations separate intocoupled radial and angular components

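$$\begin{aligned}
\dddot r - 3\dot r\dot\vartheta^2 - 3r\dot\vartheta\ddot\vartheta &= 0,\\
3\dot\vartheta\ddot r + 3\dot r\ddot\vartheta - r\dot\vartheta^3 + r\dddot\vartheta &= -\dot\vartheta r^3.
\end{aligned}$$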

Figure 3.1: Radial and angular components of a null cubic control function in su(2), with unit initial conditions: (a) $r$, $r'$, $r''$; (b) $\theta$, $\theta'$, $\theta''$.

First make the change of variable $\psi = \dot\vartheta$. Multiply the first equation by $r$ and integrate. Using $r\dddot r = \frac{d}{dt}\big(r\ddot r - \frac{1}{2}\dot r^2\big)$ and $3r\dot r\psi^2 + 3r^2\psi\dot\psi = \frac{d}{dt}\big(\frac{3}{2}r^2\psi^2\big)$, this gives

$$-\frac{1}{2}\dot r^2 + r\ddot r - \frac{3}{2}r^2\psi^2 + C_1 = 0,$$

where $C_1$ is now a new constant of integration (the earlier $C_1$ vanishes in the null case). Under certain conditions the radial component $r$ is observed to become asymptotically quadratic, while $\psi, \dot\psi \to 0$. In this case, in the large-time limit, the radial equation behaves like

$$-\frac{1}{2}\dot r^2 + r\ddot r + C_1 = 0.$$

This has solutions

$$r = \frac{(C_2t + 2C_3)^2 - 2C_1t^2}{4C_3},$$

where $C_2$ and $C_3$ are further constants. Linearising in $\psi$ (since $\psi \to 0$), the angular equation becomes

$$r\ddot\psi + 3\dot r\dot\psi + (3\ddot r + r^3)\psi = 0.$$

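Recall the transformed Bessel equation. Its exponent is written here as $s$, to avoid a clash with the radial variable $r$:

$$x^2\frac{d^2y}{dx^2} + (2p+1)x\frac{dy}{dx} + \left(a^2x^{2s} + b^2\right)y = 0.$$

It has solutions

$$y = x^{-p}\left(D_1J_{q/s}\!\left(\frac{a}{s}x^s\right) + D_2Y_{q/s}\!\left(\frac{a}{s}x^s\right)\right),$$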

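where $J$ and $Y$ are Bessel functions of the first and second kind, $q = \sqrt{p^2 - b^2}$ and $D_1, D_2 \in \mathbb{R}$. Suppose $r = q_2(t + q_1)^2$, which is the case $C_1 = 0$ of the quadratic above. First make the change of variable $z = t + q_1$; this case can then be integrated in terms of Bessel functions.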
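As a check on this route, write $\psi$ for the angular rate $\dot\vartheta$. For $r = q_2(t+q_1)^2$ and $z = t + q_1$, the linearised angular equation $r\ddot\psi + 3\dot r\dot\psi + (3\ddot r + r^3)\psi = 0$ becomes, after dividing by $q_2$,

$$z^2\psi'' + 6z\psi' + (6 + q_2^2z^6)\psi = 0.$$

This is the transformed Bessel equation with $p = 5/2$, $b^2 = 6$, $a = q_2$ and exponent 3. So $q = 1/2$, the order is $q/3 = 1/6$, and

$$\psi = z^{-5/2}\big(D_1J_{1/6}(q_2z^3/3) + D_2Y_{1/6}(q_2z^3/3)\big).$$

The Python sketch below is our own numerical check, not part of the original computation. It assumes $q_2 > 0$ and picks arbitrary $D_1, D_2$. It evaluates the Bessel functions from their power series and measures the finite-difference residual of the ODE.

```python
import math


def bessel_j(nu, x, terms=60):
    """J_nu(x) from its power series; accurate for the small arguments used below."""
    return sum(
        (-1) ** k * (x / 2) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1))
        for k in range(terms)
    )


def bessel_y(nu, x):
    """Y_nu(x) for non-integer nu, via (J_nu cos(nu pi) - J_{-nu}) / sin(nu pi)."""
    return (bessel_j(nu, x) * math.cos(nu * math.pi) - bessel_j(-nu, x)) / math.sin(nu * math.pi)


def psi(z, q2=1.0, d1=1.0, d2=0.5):
    """Candidate solution z^(-5/2) [D1 J_{1/6}(q2 z^3 / 3) + D2 Y_{1/6}(q2 z^3 / 3)]."""
    arg = q2 * z ** 3 / 3
    return z ** -2.5 * (d1 * bessel_j(1 / 6, arg) + d2 * bessel_y(1 / 6, arg))


def relative_residual(z, q2=1.0, h=1e-3):
    """|z^2 psi'' + 6 z psi' + (6 + q2^2 z^6) psi| relative to the sum of the term sizes."""
    p_minus, p_0, p_plus = (psi(z + s, q2) for s in (-h, 0.0, h))
    dpsi = (p_plus - p_minus) / (2 * h)             # central first difference
    d2psi = (p_plus - 2 * p_0 + p_minus) / h ** 2   # central second difference
    terms = (z ** 2 * d2psi, 6 * z * dpsi, (6 + q2 ** 2 * z ** 6) * p_0)
    return abs(sum(terms)) / sum(abs(t) for t in terms)


for z in (0.8, 1.0, 1.2):
    print(f"z = {z}: relative residual {relative_residual(z):.1e}")
```

A residual at the level of the finite-difference error, rather than of order one, indicates that the closed form is consistent with the linearised equation.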

4. Numerical calculations of geodesics and cubics

4.1 Geodesics

Integrating the subRiemannian geodesic equations analytically is a difficult task. A more practical approach is to compute geodesics numerically. The geodesic equations can easily be integrated forward, but solving the boundary value problem adds complexity. For quantum computing the geodesics are also required to have minimal energy.

This boundary value problem can be treated as a constrained optimisation problem. Find the initial condition $\Lambda_0$ of smallest norm such that, for the resulting trial solution $x$, the error $\mathrm{err}(x(1), U)$ is small.

Other work has attempted to find subRiemannian geodesics by slowly increasing the penalty $p$ in the Riemannian geodesic equations, using the solution from the previous iteration as the initial guess for the current value of $p$ [6]. There is no guarantee that this method gives an optimal solution. The method proposed here does not rely on increasing $p$, as it solves the subRiemannian equations directly.

From numerical experimentation, the trace error

$$\mathrm{err}(x, U) = 1 - \frac{1}{2^n}\mathrm{tr}(U^\dagger x)$$

was found to give faster convergence than the Riemannian distance. It is also much easier to calculate, since the Riemannian distance involves a matrix logarithm. The only downside is that the trace error is less accurate. The results it generates can be refined by switching back to the Riemannian distance.
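The two error measures can be compared with a short NumPy sketch. It rests on three assumptions:

- the real part of the trace is taken, so the error is real;
- the Riemannian distance is measured as the Frobenius norm of the principal logarithm, computed from eigenvalue phases;
- normalisation conventions in the text may differ from these by a constant.

For $x$ close to $U$ the trace error behaves like $d^2/(2\cdot 2^n)$, where $d$ is the Riemannian distance. This illustrates why the trace error is cheap but flat near the target.

```python
import numpy as np


def trace_error(x, u):
    """err(x, U) = 1 - tr(U^dagger x) / 2^n, taking the real part (an assumption) so err is real."""
    return 1.0 - np.real(np.trace(u.conj().T @ x)) / u.shape[0]


def riemannian_distance(x, u):
    """Frobenius norm of the principal log of U^dagger x, from its eigenvalue phases in (-pi, pi]."""
    phases = np.angle(np.linalg.eigvals(u.conj().T @ x))
    return float(np.sqrt(np.sum(phases ** 2)))


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


rng = np.random.default_rng(0)
u = random_unitary(8, rng)
theta = 1e-3 * np.arange(1, 9)          # small eigenphases of the deviation U^dagger x
x = u @ np.diag(np.exp(1j * theta))     # a point close to the target U
d = riemannian_distance(x, u)
print(trace_error(x, u), d ** 2 / (2 * 8))  # the two numbers nearly agree
```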

Recall that geodesics can be found by solving the equation

$$\dot x = \mathrm{proj}_\Delta(x\Lambda_0x^\dagger)\,x,$$

where $\Lambda_0 = u_0 + \Lambda_0^\perp$ and $\mathrm{proj}_\Delta$ is projection onto $\Delta$. Standard numerical integration methods can be used, but they do not preserve the unitarity of $x$ (the error is of the order of the integration method). The matrix exponential can be computed accurately, so given $x_j$ at step $j$, the next point $x_{j+1}$ can be found by the modified Euler discretisation

$$x_{j+1} = \exp\!\big(h\,\mathrm{proj}_\Delta(x_j\Lambda_0x_j^\dagger)\big)\,x_j.$$
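A minimal NumPy sketch of this Lie-group Euler step follows. It assumes the projection is orthogonal with respect to $\langle A, B\rangle = \mathrm{Re}\,\mathrm{tr}(A^\dagger B)$. It exponentiates anti-Hermitian matrices through an eigendecomposition, which keeps each step unitary up to round-off. The SU(2) example, with $\Delta = \mathrm{span}\{i\sigma_1, i\sigma_2\}$, is our own choice.

```python
import numpy as np


def expm_skew(a):
    """exp(A) for anti-Hermitian A, via the eigendecomposition of the Hermitian matrix -iA."""
    evals, evecs = np.linalg.eigh(-1j * a)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T


def make_projector(basis):
    """Orthogonal projection onto span(basis) for <A, B> = Re tr(A^dagger B); basis assumed orthogonal."""
    def proj(a):
        out = np.zeros_like(a)
        for tau in basis:
            weight = np.real(np.trace(tau.conj().T @ a)) / np.real(np.trace(tau.conj().T @ tau))
            out = out + weight * tau
        return out
    return proj


def lie_euler(lam0, proj, steps=1000):
    """Integrate x' = proj(x Lam0 x^dagger) x on [0, 1] with x_{j+1} = exp(h proj(x_j Lam0 x_j^dagger)) x_j."""
    h = 1.0 / steps
    x = np.eye(lam0.shape[0], dtype=complex)
    for _ in range(steps):
        x = expm_skew(h * proj(x @ lam0 @ x.conj().T)) @ x
    return x


# SU(2) example: Delta = span{i sigma_1, i sigma_2}; Lam0 also has a component along i sigma_3.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
proj = make_projector([1j * s1, 1j * s2])
lam0 = 0.7j * s1 + 0.3j * s2 + 1.1j * s3
x1 = lie_euler(lam0, proj)
print(np.abs(x1.conj().T @ x1 - np.eye(2)).max())  # unitarity defect stays at round-off level
```

Halving the step size changes $x(1)$ only at first order in $h$. The unitarity defect stays at round-off level whatever $h$ is used.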

In a similar fashion, higher-order geometric integrators analogous to Runge-Kutta methods, which respect the group structure, are easily constructed [35].


4.1.1 Direct Search

The direct search method simply solves the geodesic equations forward and adjusts the initial conditions to minimise the error. This can be done with a quasi-Newton method. The main problem with this approach is that the error function has many local minima, slowing convergence.

Further complicating matters, there are infinitely many geodesics, with differing energies, joining two points. Optimal geodesics are the solutions of smallest energy that satisfy the boundary conditions. To eliminate nonoptimal solutions, constrained optimisation can be used to systematically rule out higher-energy geodesics.

The energy is completely determined by the norm of the initial conditions, so it is simply a matter of constraining this norm at each stage to forbid nonoptimal geodesics. The subRiemannian distance is bounded below by the (bi-invariant) Riemannian distance, $d_{\mathrm{sub}}(I, U) \ge d_{\mathrm{Riem}}(I, U)$. This holds because every horizontal curve is also admissible for the Riemannian problem. Hence for normal subRiemannian geodesics there is a lower bound on the norm of the initial conditions, $\|\mathrm{proj}_\Delta(\Lambda_0)\| = \|u_0\| \ge \|\log(U)\|$. Here the Riemannian distance is simply the norm of the principal branch of the logarithm, $\|\log(U)\|$.

To find subRiemannian geodesics, an initial $\Lambda_0$ is chosen such that $\|u_0\| = c$, where $c$ is larger than the Riemannian distance. A simple initial guess is a scalar multiple of $\log(U)$. The procedure then runs as follows:

1. Adjust $\Lambda_0$ via constrained optimisation, with constraint $c$, until $\mathrm{err}(x(1), U) \le \epsilon$ for some small parameter $\epsilon$.
2. Lower the constraint by a small amount $\delta$, $c \to c - \delta$.
3. Adjust the previous $\Lambda_0$ so that $\|u_0\| \le c$, then optimise again with constraint $c$ until the error is small.

This procedure is iterated until the error fails to converge below $\epsilon$, or the norm equals the Riemannian distance. In the latter case the target is the exponential of an element of the allowed set.
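The norm-lowering loop can be sketched in Python as below. This is a minimal sketch under stated assumptions:

- `solve` is a hypothetical stand-in for the constrained optimiser (in the text, a quasi-Newton search on $\Lambda_0$ with $\|u_0\| \le c$);
- the vector passed around stands for $u_0$ only;
- rescaling is just one way to re-impose the tightened bound;
- `lower` plays the role of the Riemannian distance $\|\log(U)\|$.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: solve(c, guess) runs the constrained optimisation with ||u0|| <= c
# starting from `guess`, and returns (u0, err) for the best initial condition it found.
Solver = Callable[[float, List[float]], Tuple[List[float], float]]


def norm(v: Sequence[float]) -> float:
    return sum(t * t for t in v) ** 0.5


def shrink_to(v: Sequence[float], c: float) -> List[float]:
    """Rescale v so that ||v|| <= c (one simple way to re-impose the tightened constraint)."""
    n = norm(v)
    return list(v) if n <= c else [t * c / n for t in v]


def lower_norm_bound(solve: Solver, guess: Sequence[float], c: float, lower: float,
                     delta: float, eps: float) -> Optional[Tuple[List[float], float]]:
    """Lower c by delta while the solve still reaches err <= eps; stop at the lower bound."""
    best = None
    while c >= lower:
        u0, err = solve(c, shrink_to(guess, c))
        if err > eps:
            break                     # failed to converge: keep the last successful solution
        best, guess = (u0, c), u0
        c -= delta
    return best


# Toy stand-in for the optimiser: pretend the target is reachable only when c >= 2.5.
def toy_solve(c: float, guess: List[float]) -> Tuple[List[float], float]:
    return guess, (0.0 if c >= 2.5 else 1.0)


print(lower_norm_bound(toy_solve, [3.0, 4.0], c=4.0, lower=1.0, delta=0.5, eps=1e-8))
```

The returned pair is the last initial condition that reached the tolerance, together with the smallest norm bound $c$ at which it did.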

A more sophisticated way to generate an initial guess for $\Lambda_0$ is to use the series solution for $x$ given in the previous chapter, truncated to some order. This also requires optimisation, but can improve overall convergence. With the series truncated to first order, it is often adequate to choose $\Lambda_0$ such that $\mathrm{err}(y, U) \le \epsilon$, where $y = \exp(-\mathrm{proj}_\perp(\Lambda_0))\exp(\Lambda_0)$.

The accuracy of this initial guess can be further improved by using Leapfrog [36]. Leapfrog divides $x$ into a number of shorter segments and improves them iteratively.


4.1.2 Discrete Geodesics

A significant problem with the direct search method is that the size of the basis for $su(2^n)$ scales exponentially. Worse, integrating the geodesic equations requires all the basis matrices to be stored in memory. Even with sparse data structures this presents significant challenges.

A much more efficient way to find geodesics is to discretise the problem entirely. First consider the discrete curve $x : \mathbb{R}^{N\dim\Delta} \to SU(2^n)$, composed of $N$ segments,

$$x = \exp\!\left(\frac{u^1_i}{N}\tau_i\right)\cdots\exp\!\left(\frac{u^N_i}{N}\tau_i\right),$$

where each $u^j_i \in \mathbb{R}$ and $\tau_i \in \Delta$ (repeated indices summed). The curve $x$ is a discrete subRiemannian geodesic when it minimises the discretised energy functional

$$E[x] = \frac{h}{2}\sum_{i=0}^{N-2}\Big(\langle u(t_{i+1}), u(t_{i+1})\rangle + \langle u(t_i), u(t_i)\rangle\Big), \tag{4.1}$$

where $u(t_i) = u^i_j\tau_j$. As before, reaching an arbitrary $U \in SU(2^n)$ will in general still require exponentially many segments. The main advantage here is that only the allowed set needs to be generated, along with a vector of real coefficients $u^j_i$. This is still significantly smaller than the entire basis for $su(2^n)$.
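A minimal NumPy sketch of the discrete curve and of (4.1) follows. It assumes an orthonormal allowed set, so that $\langle u, u\rangle$ is the sum of squared coefficients, and $h = 1/N$; neither normalisation is stated explicitly in the text. Only the $m$ allowed generators and an $N \times m$ array of coefficients are stored.

```python
import numpy as np


def expm_skew(a):
    """exp(A) for anti-Hermitian A, via the eigendecomposition of the Hermitian matrix -iA."""
    evals, evecs = np.linalg.eigh(-1j * a)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T


def discrete_curve(coeffs, basis):
    """x = exp(u^1_i tau_i / N) ... exp(u^N_i tau_i / N) for coeffs of shape (N, m)."""
    n_seg = coeffs.shape[0]
    x = np.eye(basis[0].shape[0], dtype=complex)
    for row in coeffs:                                   # one segment per row, left to right
        generator = sum(c * tau for c, tau in zip(row, basis)) / n_seg
        x = x @ expm_skew(generator)
    return x


def discrete_energy(coeffs):
    """Equation (4.1) with h = 1/N, assuming an orthonormal basis so <u, u> = sum_i (u_i)^2."""
    h = 1.0 / coeffs.shape[0]
    sq = np.sum(coeffs ** 2, axis=1)                     # <u(t_i), u(t_i)> for each segment
    return h / 2 * np.sum(sq[1:] + sq[:-1])              # terms i = 0 .. N-2


# SU(2) example with the orthonormal allowed set tau_k = i sigma_k / sqrt(2), k = 1, 2.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
basis = [1j * s1 / np.sqrt(2), 1j * s2 / np.sqrt(2)]
coeffs = np.tile([0.9, -0.4], (10, 1))                   # N = 10 identical segments
x = discrete_curve(coeffs, basis)
print(discrete_energy(coeffs))
```

With identical segments the product collapses to a single exponential. The energy equals $(N-1)/N$ times the squared norm of the coefficients.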

To find the discrete geodesic joining $I$ and $U$, minimise the objective function $F$, which is the weighted sum of the error and the energy,

$$F = \kappa\,\mathrm{err}(x, U) + \varepsilon\,E[x],$$

where $\kappa$ and $\varepsilon$ are experimentally chosen weightings.

One common way to minimise such a function is Newton's method or a quasi-Newton method. For this approach, $\kappa$ and $\varepsilon$ were found to require scaling with the error function. Without this scaling, the optimisation procedure would become stuck at a local minimum with a suboptimal error. Experimentally, the scalings $\varepsilon = 10^{-p}\,\mathrm{err}(x, U) + 10^{-q}$ and $\kappa = 10^3$ gave the best results. This stops the energy term from dominating the error term. Figure (4.1) shows the improved convergence with the adaptive scaling.
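The adaptive weighting can be written out as follows. This is a sketch under two assumptions. The exponents $p$ and $q$ are not given in the text, so the defaults below are hypothetical placeholders. $\varepsilon$ is simply recomputed from the current error at each evaluation, which is one reading of the scheme.

```python
def adaptive_weights(err, p=2.0, q=6.0, kappa=1e3):
    """Return (kappa, eps) with eps = 10^-p * err + 10^-q.

    p and q are placeholders: the text does not state the values used.
    """
    return kappa, 10.0 ** (-p) * err + 10.0 ** (-q)


def objective(err, energy, **kw):
    """F = kappa * err + eps * E[x], with eps recomputed from the current error."""
    kappa, eps = adaptive_weights(err, **kw)
    return kappa * err + eps * energy


# Compare the error term with the energy term for E[x] = 25 at a few error levels.
for err in (1e-1, 1e-5, 1e-9):
    kappa, eps = adaptive_weights(err)
    print(err, kappa * err, eps * 25.0)
```

With these placeholder values the error term dominates while the error is large. The energy term only becomes comparable once the error is very small, so the curve is first pulled onto the target and then refined to lower energy.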

For $x$ to approximate $U$ accurately, the trace error needs to be of order $10^{-8}$ or smaller. The weighting ensures that early in the optimisation procedure a greater emphasis is placed on the error. Once a nonoptimal curve approximately joining $I$ and $U$ is found, it can be refined to have lower energy.

Figure 4.1: Convergence of the trace error against iteration. Red: without adaptive weighting. Blue: with adaptive weighting.

The Hessian of $F$ is expensive to calculate. Up to the constant factor $-1/2^n$ from the definition of the trace error, the second derivatives of the error term are

$$\frac{\partial^2\,\mathrm{err}}{\partial u^k_j\,\partial u^q_p} = \frac{1}{N^2}\mathrm{tr}\big(U^\dagger x_1\cdots\tau_jx_k\cdots\tau_px_q\cdots x_N\big), \tag{4.2}$$

and each entry involves many products of matrix exponentials. The calculation of the gradient of $F$ can be found in Appendix B. To avoid evaluating this expensive function, the Hessian can instead be approximated at each step, as in quasi-Newton methods. In general, the problem will scale poorly to higher dimensions if the Hessian has to be evaluated. Future work should investigate alternative optimisation methods which do not calculate the gradient.

4.1.3 Results

Direct

Figure 4.2 shows two of the 36 control functions generated by the direct search for a randomly chosen target in SU(8). The first iteration, shown in red, gave an energy of 36.806. The second iteration, shown in blue, gave an energy of 24.74. Beyond this the algorithm failed to converge. The trace error was $O(10^{-8})$ in both cases. The algorithm was implemented in Mathematica 11.0 and took an average of 9.5 hours over several trials on an i7-5960X.

Interestingly, the reduction in norm of the control functions was often accompanied by a reflection. This type of behaviour is more easily visualised on the sphere $S^2$. Starting from the north pole, a geodesic is a line of longitude. The minimal solution is a geodesic consisting of a single arc, while higher-energy geodesics may wrap around the sphere multiple times. The same situation can occur in SU(8), but this is a 63-dimensional space, so the differences may be more subtle.

Figure 4.2: Comparison of two of the 36 control functions, plotted against $t$ (panels (a) and (b)), after two iterations of the direct search. Red is the first iteration, blue is the second. Other control functions displayed similar behaviour.

Discrete

Figure 4.3 shows a sample of the discrete control functions generated for randomly chosen targets in SU(16) and SU(32).

Figure 4.3: Sample of the control functions for discrete geodesics in (a) SU(16) and (b) SU(32), plotted against $t$, generated by optimisation.

Direct vs Discrete

We benchmarked the two methods on three random unitary matrices in SU(8). Both methods were run until the trace error was $O(10^{-8})$. The discrete curves had $N = 10$ segments, and the direct search used a step size of $h = 0.001$. Both algorithms were implemented in Mathematica 11.0 and run on a 3.7 GHz (fixed) i7-5960X. More segments could have been used, but each trial converged to the desired accuracy with $N = 10$, which highlights the increased efficiency of the discrete method. More segments will likely be needed as $n$ increases.

Figure 4.4: Comparison of some discrete and direct control functions in SU(8), plotted against $t$ (panels (a) to (d)). Red is the result of the direct search, blue is discrete. Often the discrete result would show a reflection.

| Target | Timing (direct) | Timing (discrete) | Energy (direct) | Energy (discrete) |
|---|---|---|---|---|
| U1 | 9.94 hrs | 1.52 hrs | 20.175 | 25.6217 |
| U2 | 8.78 hrs | 1.45 hrs | 25.785 | 26.541 |
| U3 | 1.94 hrs | 48.45 mins | 23.054 | 25.399 |

Comparing control functions generated by the discrete and direct searches, the discrete result was often a reflection of the exact solution. This is analogous to the situation on the sphere $S^2$. Consider the geodesics from the north pole to the south pole. These are the lines of longitude, and there are infinitely many of them, all with equal energy.

The energy of the discrete result was comparable to that of the exact result for all targets tested. The discrete energy was consistently somewhat higher, by roughly 3% to 27% in the table above. This suggests that the most efficient way to find subRiemannian geodesics is to use the initial data of a discrete curve as an initial guess for the direct search.

4.2 Cubics


4.2.1 Direct search for cubics

The simplest way to calculate cubics is to solve the cubic equations forward from an initial velocity and acceleration, $\Lambda(0)$ and $\dot\Lambda(0)$. These initial conditions are then adjusted based on the error at the boundary, so that $x(1) = U$. Recall that for subRiemannian geodesics this process was relatively inefficient compared to a discrete method.

4.2.2 Discrete Cubics

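Like discrete geodesics, discrete cubics are minima of the discretised cubic functional

$$S[x] = \frac{h}{2}\sum_{j=1}^{N-1}\Big(\langle\dot w(t_j), \dot w(t_j)\rangle + \langle\dot w(t_{j-1}), \dot w(t_{j-1})\rangle\Big) \tag{4.3}$$

$$\approx \frac{1}{h}\sum_{j=1}^{N-1}\|w(t_j) - w(t_{j-1})\|^2, \tag{4.4}$$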

where $\langle\cdot,\cdot\rangle$ is the restriction of the bi-invariant metric. Using other metrics would not change the procedure significantly. To find the discrete cubics, numerically minimise the weighted sum

$$F = \omega\,\mathrm{err}(x_A(u), U) + \kappa\sum_{j=1}^{N-1}\|w(t_j) - w(t_{j-1})\|^2,$$

where $\kappa \ge 0$ and $\omega > 0$ are chosen experimentally, and err is the error function defined above. To solve the continuous cubic equations, the value of $\dot w(0)$ also needs to be specified. In the discrete case this can be achieved by adding a constraint on the components so that $w(t_2) - w(t_1) = h\dot w(0)$. For the quantum computing application the need for such a constraint is not clear, as the initial acceleration does not correspond to any particular physical quantity.
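A minimal NumPy sketch of the discrete cubic penalty and of the optional initial-acceleration constraint follows. It makes these assumptions:

- the basis of $\Delta$ is orthonormal;
- $w$ is stored as an $N \times m$ array of coefficients;
- $t_1, t_2$ map to zero-based rows 0 and 1;
- the weights $\omega$ and $\kappa$ are placeholders, since the text chooses them experimentally.

```python
import numpy as np


def cubic_penalty(w):
    """sum_j ||w(t_j) - w(t_{j-1})||^2 for controls w of shape (N, m).

    Assumes an orthonormal basis of Delta, so the norm is the Euclidean norm of coefficients.
    """
    return float(np.sum(np.diff(w, axis=0) ** 2))


def cubic_objective(err_value, w, omega=1.0, kappa=1e-3):
    """F = omega * err + kappa * penalty; omega and kappa are placeholder weights."""
    return omega * err_value + kappa * cubic_penalty(w)


def initial_acceleration_residual(w, wdot0, h):
    """Residual of the optional constraint w(t_2) - w(t_1) = h * wdot(0), using zero-based rows 1 and 0."""
    return w[1] - w[0] - h * np.asarray(wdot0)


n_seg, h = 10, 0.1
ramp = np.outer(np.arange(n_seg), [0.2, -0.1])   # w(t_j) = j * a with a = (0.2, -0.1)
print(cubic_penalty(np.ones((n_seg, 2))), cubic_penalty(ramp))
```

A constant control has zero penalty. A linear ramp pays $(N-1)\|a\|^2$, so the penalty only measures changes in $w$.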

If no constraints are given, the optimisation procedure would often converge to a control function, for example Figure (4.5), which has zero acceleration for a duration and then a sudden spike. This type of behaviour is not apparent from the normal equations given by the PMP. In the normal case, controls whose initial acceleration and its rate of change both vanish are required to be geodesics, and such behaviour does not occur in subRiemannian geodesics. This is potentially a numerical example of an abnormal cubic.

Compared to geodesics, this could be considered a simpler circuit, as gates can be reused. When the previous geodesics are approximated as a circuit, the phase gates change constantly. For the curve shown in Figure (4.5), the phase gates change only once. However, a significant problem remains, as the cubic is allowed to move in a linear combination of directions at each time step.

Figure 4.5: Sample of some discrete cubic control functions with no constraints on initial conditions in SU(8) for a random target.

4.2.3 Indefinite cubics

Unfortunately, cubics are still not the most accurate (continuous) description of an efficient quantum circuit. A cubic is still allowed to move in a linear combination of directions in $\Delta$. Over a small time step $x(t+h) \approx \exp(hw^i\tau_i)x(t)$, and each $\exp(hw^i\tau_i)$ requires further approximation. The naive way to approximate this exponential is to use the Lie product formula. However, there might be an exact product of exponentials that implements the exponential of the linear combination in fewer circuit elements than the Lie product formula.
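To illustrate the Lie product formula, the NumPy sketch below uses an su(2) example of our own choosing. It approximates $\exp(hw^i\tau_i)$ by $\big(\exp(hw^1\tau_1/k)\exp(hw^2\tau_2/k)\big)^k$. Increasing $k$ multiplies the number of circuit elements, while the error only falls roughly like $1/k$. This is why a shorter exact product would be preferable.

```python
import numpy as np


def expm_skew(a):
    """exp(A) for anti-Hermitian A, via the eigendecomposition of the Hermitian matrix -iA."""
    evals, evecs = np.linalg.eigh(-1j * a)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T


def lie_product(terms, k):
    """(exp(A_1/k) ... exp(A_m/k))^k, the Lie product approximation of exp(A_1 + ... + A_m)."""
    step = np.eye(terms[0].shape[0], dtype=complex)
    for a in terms:
        step = step @ expm_skew(a / k)
    return np.linalg.matrix_power(step, k)


sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
h = 0.5
terms = [h * 1j * sx, h * 0.8j * sy]   # h w^i tau_i with w = (1, 0.8) and tau = (i sx, i sy)
exact = expm_skew(sum(terms))
for k in (1, 10, 100):
    print(k, np.linalg.norm(lie_product(terms, k) - exact))
```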

Clearly the cost of a circuit must depend on the number of changes in direction, but it should also be more expensive to move in a linear combination of directions. The ideal solution is nonsmooth, so a small regularisation is an acceptable compromise: as long as it is small, it can approximately be removed in a quantum circuit. To try to force curves to move in only one direction, modify the cubic functional by adding the indefinite inner product,

$$S_I[x] = \frac{1}{2}\int_0^1\Big(\langle\nabla_t\dot x, \nabla_t\dot x\rangle + \langle w, w\rangle_I\Big)\,dt, \tag{4.5}$$

where $\langle w, w\rangle_I = w^i\mathcal{I}_{ij}w^j$ is what we call the indefinite inner product. The $w^i$ are real coefficients, $\mathcal{I} = J - I$, and $J$ denotes the all-ones matrix. This is to try to make movements in linear combinations of directions more expensive than moving in a single direction.
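Take $J$ to be the all-ones matrix, an assumption consistent with the name "indefinite". Then $\mathcal{I} = J - I$ has zeros on the diagonal and ones elsewhere, so

$$\langle w, w\rangle_I = \sum_{i \ne j} w^iw^j = \Big(\sum_i w^i\Big)^2 - \sum_i (w^i)^2.$$

A pure-Python sketch comparing the two forms:

```python
def indefinite_product(w):
    """<w, w>_I = w^T (J - I) w = sum_{i != j} w_i w_j, taking J to be the all-ones matrix."""
    s = sum(w)
    return s * s - sum(c * c for c in w)


def indefinite_product_explicit(w):
    """The same quantity from the matrix definition, with entries (J - I)_{ij} = 1 - delta_ij."""
    m = len(w)
    return sum(w[i] * (0 if i == j else 1) * w[j] for i in range(m) for j in range(m))


print(indefinite_product([0.0, 3.0, 0.0]))   # a single direction
print(indefinite_product([1.0, 1.0, 0.0]))   # a same-sign combination
print(indefinite_product([1.0, -1.0, 0.0]))  # an opposite-sign combination
```

A control along a single direction contributes nothing, and same-sign combinations are penalised. Because the form is indefinite, opposite-sign combinations give negative values. So this term on its own does not penalise every linear combination.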

As before, the most efficient way to compute minima of these functionals is to discretise the problem entirely. We first attempted to find indefinite cubics by adding the indefinite inner product to the discrete objective function

$$F = \omega\,\mathrm{err}(x, U) + \kappa\sum_{j=1}^{N-1}\|w(t_j) - w(t_{j-1})\|^2 + \eta\sum_{j=0}^{N-1}\langle w(t_j), w(t_j)\rangle_I,$$


where $\eta$ is another experimentally chosen weighting. It took significantly longer to find the minima of this function. Even for a simple target, for example $U_4 = \exp\big(i(\sigma_1\otimes\sigma_2\otimes\sigma_3 + \sigma_3\otimes\sigma_2\otimes\sigma_1)\big)$, the process took 4.1 hours to find a curve joining $I$ to $U_4$ (Figure (4.6)). Note that at each step $i$ the control $w$ is approximately aligned with a single direction, which is the desired behaviour.

For more complicated targets the time increased significantly, because many more segments are required. When each $x_j$ is the exponential of a linear combination, products of consecutive $x_j$ can bracket generate more perpendicular directions. When each $x_j$ is the exponential of only a single allowed instruction, more terms are needed to bracket generate perpendicular directions. This is evident from the Baker-Campbell-Hausdorff formula

$$\exp(X)\exp(Y) = \exp\!\left(X + Y + \frac{1}{2}[X, Y] + \cdots\right).$$
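The role of the bracket term can be seen numerically. In the su(2) sketch below (NumPy; our own example), $X = \epsilon\,i\sigma_1$ and $Y = \epsilon\,i\sigma_2$ both lie in $\Delta$, while $\frac{1}{2}[X, Y]$ points along $i\sigma_3$, a perpendicular direction. Dropping the bracket leaves a second-order error; including it leaves a third-order one.

```python
import numpy as np


def expm_skew(a):
    """exp(A) for anti-Hermitian A, via the eigendecomposition of the Hermitian matrix -iA."""
    evals, evecs = np.linalg.eigh(-1j * a)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T


def bracket(x, y):
    return x @ y - y @ x


sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
a, b = 1j * sx, 1j * sy   # two allowed directions in su(2); [a, b] is proportional to i sigma_3


def bch_errors(eps):
    """Errors of exp(X + Y) and exp(X + Y + [X, Y] / 2) as approximations to exp(X) exp(Y)."""
    x, y = eps * a, eps * b
    lhs = expm_skew(x) @ expm_skew(y)
    without_bracket = np.linalg.norm(lhs - expm_skew(x + y))
    with_bracket = np.linalg.norm(lhs - expm_skew(x + y + 0.5 * bracket(x, y)))
    return without_bracket, with_bracket


for eps in (0.1, 0.05):
    print(eps, bch_errors(eps))
```

Halving $\epsilon$ should reduce the two errors by factors of about 4 and 8 respectively.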

Currently it might be more practical to use geodesics to create circuits and accept some approximation error. Alternatively, one might simply minimise the error between $U$ and products of exponentials of single instructions. The problem with this approach is that there is no unique way to do this. Cubics and geodesics are (at least locally) unique, which gives the discrete optimisation a better chance of succeeding.

Figure (4.6) shows that it is indeed possible to generate indefinite cubics. More advanced optimisation procedures may improve this process in future.

Figure 4.6: Sample of indefinite cubic control functions $w^j_i$, plotted against the step $i$, with no constraints on initial conditions in SU(8) for $U_4$.

5. Neural Networks

5.1 Background

Solving the cubic and geodesic boundary value problems requires expensive optimisation procedures, which significantly hinders practical usage. A much faster method needs to be devised. One option is to explore better optimisation methods. Alternatively, with advances in computational power, neural networks might be an attractive option for generating approximate circuits.

Due to the high dimension and the noncommutativity of the matrix exponential, it is very difficult to obtain initial guesses analytically for the approaches in Chapter (4). Being able to generate a good initial guess, which can later be refined by those methods, would be a significant step forward.

The most basic type of neural network is a multilayer perceptron, which consists of several fully connected layers of neurons. The layers are divided into input, hidden and output layers, and each neuron in a layer is connected to all the neurons in the previous layer. A neuron accepts a vector of real numbers as input and outputs a single real number. This output is computed by first taking the weighted sum of the inputs and then passing this sum to an activation function. Common activation functions include the logistic function, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU).

Figure 5.1: A basic neuron computing $S = \sum_i w_i\sigma_i$ and $\sigma_4 = f(S)$. Here $\sigma_1$ to $\sigma_3$ are the inputs, $w_i$ are the input weights, $f$ is an activation function, and $\sigma_4$ is the output of the neuron.
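The neuron of Figure 5.1 can be written in a few lines of Python. This sketch has no bias term, matching the description above; practical layers usually add one.

```python
import math


def neuron(inputs, weights, activation):
    """Weighted sum S = sum_i w_i * sigma_i passed through an activation f, as in Figure 5.1."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)


def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))


def relu(s):
    return max(0.0, s)


inputs = [0.5, -1.0, 2.0]
weights = [1.0, 2.0, 0.25]   # weighted sum S = 0.5 - 2.0 + 0.5 = -1.0
for f in (relu, math.tanh, logistic):
    print(f.__name__, neuron(inputs, weights, f))
```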

Training a neural network uses a technique called backpropagation. Training data consists of inputs and desired outputs. First, the input data is fed through the network, and the error between the output and the desired output is computed. Gradient descent is then used to minimise this error by adjusting the weights. The gradient with respect to each weight is obtained via the chain rule, working backwards from the final output layer to the input layer, so that intermediate results from later layers are reused for earlier ones. More complicated networks exist, featuring different connectivity and types of neurons. Two common types are Convolutional Neural Networks (CNNs) and Long Short-Term Memory neural networks (LSTMs).

Figure 5.2: A very simple neural network. $\sigma_i$ are the inputs/outputs, $w_i$ are the input weights, and $f$ is an activation function. Numbered circles denote neurons.
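The following toy example illustrates gradient descent with chain-rule gradients for the simplest case: a single tanh neuron trained on squared error. It is a pure-Python sketch of our own, not the Keras training used later. Full backpropagation applies the same chain rule layer by layer, from the output back to the input.

```python
import math


def forward(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def loss(samples, w):
    return sum((forward(w, x) - t) ** 2 for x, t in samples)


def train_neuron(samples, weights, lr=0.1, epochs=200):
    """Per-sample gradient descent on squared error for one tanh neuron."""
    w = list(weights)
    for _ in range(epochs):
        for x, target in samples:
            y = forward(w, x)
            # Chain rule: dL/dw_i = dL/dy * dy/dS * dS/dw_i = 2 (y - t) * (1 - y^2) * x_i
            delta = 2.0 * (y - target) * (1.0 - y * y)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    return w


# Toy data generated by a "true" neuron with weights (0.8, -0.4).
true_w = (0.8, -0.4)
samples = [((a, b), forward(true_w, (a, b))) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 1.0)]
start = [0.0, 0.0]
trained = train_neuron(samples, start)
print(loss(samples, start), loss(samples, trained), trained)
```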

Recall that the problem is to find $U$ approximately as a product of exponentials

$$U \approx E(c) = \exp(c^1_1\tau_1)\cdots\exp(c^1_m\tau_m)\cdots\exp(c^N_1\tau_1)\cdots\exp(c^N_m\tau_m), \tag{5.1}$$

where we call $E$ the embedding function, $c = (c^1_1, \ldots, c^N_m)$, and the $\tau_i$ are a basis for a bracket-generating subspace $\Delta \subset su(2^n)$ of dimension $m$. Bracket generating means that repeated Lie brackets of elements of $\Delta$ span all of $su(2^n)$. Products of matrix exponentials generate Lie bracket terms,

$$\exp(A)\exp(B) = \exp\!\left(A + B + \frac{1}{2}[A, B] + \cdots\right),$$

so any $U \in SU(2^n)$ can be written in the form of Equation (5.1) with sufficiently many products. We restrict ourselves to $U$ which can be written as a product of polynomially many (in $n$) terms. An example of such a $\Delta$ could be the matrix logarithms of universal gates. For convenience, it is easier to work with all permutations of Kronecker products of one and two Pauli matrices, so

$$\Delta = \mathrm{span}\left\{\frac{i}{\sqrt{2^n}}\sigma^j_i,\ \frac{i}{\sqrt{2^n}}\sigma^k_i\sigma^l_j\right\},$$

where $\sigma^j_i$ denotes the $n$-fold Kronecker product $I\otimes\cdots\otimes\sigma_i\otimes\cdots\otimes I$, with $\sigma_i$ inserted in the $j$-th slot and $I$ the $2\times 2$ identity matrix. Exponentials of these basis elements have very simple circuits; for more detail see Chapter (1).
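The allowed set $\Delta$ and the embedding function $E$ can be constructed directly. The NumPy sketch below (function names are ours) builds:

- $i\sigma^j_i/\sqrt{2^n}$ for every slot $j$;
- $i\sigma^k_i\sigma^l_j/\sqrt{2^n}$ for every unordered pair of slots $k < l$.

It then evaluates (5.1) for an $N \times m$ array of coefficients. For $n = 3$ this gives $3\cdot 3 + 9\cdot 3 = 36$ elements, matching the 36 control functions in SU(8) reported in Chapter (4). The basis is orthonormal for $\langle A, B\rangle = \mathrm{Re}\,\mathrm{tr}(A^\dagger B)$.

```python
import itertools

import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def embed(ops, n):
    """Kronecker product with ops[slot] in the given slots and the 2x2 identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for slot in range(n):
        out = np.kron(out, ops.get(slot, np.eye(2)))
    return out


def delta_basis(n):
    """i / sqrt(2^n) times all one- and two-qubit Pauli products on n qubits."""
    scale = 1j / np.sqrt(2 ** n)
    basis = [scale * embed({j: s}, n) for j in range(n) for s in PAULI]
    for k, l in itertools.combinations(range(n), 2):
        for si, sj in itertools.product(PAULI, repeat=2):
            basis.append(scale * embed({k: si, l: sj}, n))
    return basis


def expm_skew(a):
    """exp(A) for anti-Hermitian A, via the eigendecomposition of the Hermitian matrix -iA."""
    evals, evecs = np.linalg.eigh(-1j * a)
    return (evecs * np.exp(1j * evals)) @ evecs.conj().T


def embedding(c, basis):
    """E(c): product over segments, and over basis elements within a segment, of exp(c^j_i tau_i)."""
    x = np.eye(basis[0].shape[0], dtype=complex)
    for segment in c:
        for coeff, tau in zip(segment, basis):
            x = x @ expm_skew(coeff * tau)
    return x


basis = delta_basis(3)
print(len(basis))   # dim(Delta) for three qubits
```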

We propose that a neural network be trained to learn $E^{-1}$. The neural network tries to find all the coefficients $c^k_i$ so that the product approximates $U$. In this approach, the neural network takes a unitary matrix $U$ as input and returns a list $c$ of the $c^k_i$. A segment is a product of $m$ exponentials, one for each basis element, and in total there are $N$ segments. We only examine $U$ which are implementable in a reasonable number of segments.

We found that two neural networks were required to achieve this. The first is a Gated Recurrent Unit (GRU) network [37], [38], which factors $U$ into a product of $U_j$,

$$U \approx U_1U_2\cdots U_j\cdots U_N,$$

where each $U_j$ is implementable in polynomially many gates; we call this the global decomposition. The second network simply consists of several dense, fully connected layers, and decomposes each $U_j$ into a product of exponentials

$$U_j \approx \exp(c^j_1\tau_1)\cdots\exp(c^j_m\tau_m),$$

which we term the local decomposition. These procedures can also be carried out with traditional optimisation methods, but without a good initial guess this took on the order of an hour in SU(8). The output from the neural network may not implement $U$ to the required tolerance. It does, however, provide a good initial guess, as the error will be small.

5.2 Training data

To generate the training data, $c$ should not be chosen randomly. If there is no structure to how $c$ is chosen, it introduces extra redundancy. More seriously, $E^{-1}$ will not be well defined, since there are infinitely many ways to factor a $U$ into some unordered product of matrix exponentials. Geometrically, this can be visualised as taking any path from $I$ to $U$ on $SU(2^n)$. Randomly generated data may give two different decompositions for the same $U$, so $E$ is not one-to-one. To ensure the training data is unique, we propose that these paths be chosen to be, at least approximately, minimal normal subRiemannian geodesics.

The choice of geodesics is not particularly special. Other types of curves could be used, as long as they uniquely join $I$ and $U$, so that $E^{-1}$ is well defined. Random geodesics can be generated simply by generating random initial conditions. However, the geodesics must also be minimal. The first way to try to ensure they are minimal is to bound the norms of the initial conditions.


From Chapter (3), the normal subRiemannian geodesic equations can be written as

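$$\dot x = ux,\qquad \dot\Lambda = [\Lambda, u],\qquad u = \mathrm{proj}_\Delta(\Lambda),$$

where $\Lambda : [0, 1] \to su(2^n)$, $u : [0, 1] \to \Delta \subset su(2^n)$ and $\mathrm{proj}_\Delta$ is projection onto $\Delta$. This can be rewritten as the single equation

$$\dot x = \mathrm{proj}_\Delta(x\Lambda_0x^\dagger)\,x, \tag{5.2}$$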

where $\Lambda_0 = \Lambda(0)$. Choosing $\Lambda_0$ completely determines the geodesic. To generate the training data for the $U_j$, first randomly choose a $\Lambda_0$. The $U_j$ are then the matrices which forward solve the geodesic equations,

$$x(t_{j+1}) \approx U_jx(t_j),$$

where $[0, 1]$ has been divided into $N$ segments of width $h$. Here we used the simple first-order integrator

$$U_j = \exp\!\big(h\,\mathrm{proj}_\Delta(x_j\Lambda_0x_j^\dagger)\big),$$

since approximating the geodesic is sufficient. There are infinitely many bi-invariant Riemannian geodesics joining $I$ and $U$, corresponding to the different branches of $\log(U)$. SubRiemannian geodesics behave similarly, depending on the norm of $\Lambda_0$. To generate the training data we bounded the norms by $\dim(\Delta) = O(n^2)$, to try to ensure the geodesics are unique.

Further, the norm $\|\mathrm{proj}_\Delta(\Lambda_0)\| = \|u_0\|$ determines the distance between $I$ and $U$. Nielsen showed that this distance can be thought of as approximately the complexity of implementing $U$: Lemma 3 in [1] shows that a $U$ further from $I$ requires more gates. For generic $U$, however, the distance is likely to scale exponentially. Bounding the norm by a polynomial ensures that the training data only contains $U$ reachable with a polynomial number of quantum gates.

Naively, a neural network could take the real and imaginary parts of all entries of $U$ as inputs. For most $U$ the number of inputs can be halved by taking only the real or only the imaginary components. Let $U = B + iC$. If $B$ or $C$ is given, it is likely that the other component can be recovered. Given one of them, the other can be found when the system of equations

$$(B + iC)(B - iC)^T = I \tag{5.3}$$

has a unique solution for $C$ or $B$. Taking the real and imaginary components,

$$BB^T + CC^T - I = 0,\qquad BC^T - CB^T = 0.$$

This is equivalent to asking whether the map $\phi(B) = (BB^T + CC^T, BC^T - CB^T)$ has full rank. Differentiating,

$$d\phi_B(W) = (WB^T + BW^T,\ WC^T - CW^T).$$

Suppose $d\phi_B(W) = (S, A)$, where $S$ is symmetric and $A$ is antisymmetric. Then

$$W = \frac{1}{2}S(BB^T)^{-1}B \tag{5.4}$$

solves the symmetric component, $WB^T + BW^T = S$, and

$$W = \frac{1}{2}A(CC^T)^{-1}C \tag{5.5}$$

solves the antisymmetric component, $WC^T - CW^T = A$. So provided $BB^T$ or $CC^T$ is invertible, solutions will likely exist. This then halves the total number of inputs.
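The identities above can be checked numerically. The NumPy sketch below is our own check, using a random unitary from a QR factorisation. It confirms that $U = B + iC$ satisfies both real equations. It also confirms that (5.4) and (5.5) solve the symmetric and antisymmetric components respectively, for random $S$ and $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
u, _ = np.linalg.qr(z)          # a random unitary matrix U = B + iC
b, c = u.real, u.imag

# Real and imaginary parts of (B + iC)(B - iC)^T = I.
sym_residual = b @ b.T + c @ c.T - np.eye(8)
anti_residual = b @ c.T - c @ b.T

# Particular solutions (5.4) and (5.5) for a random symmetric S and antisymmetric A.
m = rng.normal(size=(8, 8))
s, a = m + m.T, m - m.T
w_sym = 0.5 * s @ np.linalg.inv(b @ b.T) @ b
w_anti = 0.5 * a @ np.linalg.inv(c @ c.T) @ c
print(np.abs(sym_residual).max(), np.abs(anti_residual).max())
print(np.abs(w_sym @ b.T + b @ w_sym.T - s).max(), np.abs(w_anti @ c.T - c @ w_anti.T - a).max())
```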

5.3 Network Design - SU(8)

5.3.1 Global decomposition

The neural network for the global decomposition takes $U$ as input and returns a list of $U_j$. To do this, $U$ is decomposed into its $2^n$ rows, giving $2^n$ real vectors of length $2^n$. Each row is treated as a single timestep in the GRU layer. The output $U_j$ are also decomposed into their rows, and these rows are treated as timesteps in the output, giving $2^nN$ output vectors of length $2^n$. In particular we examined the $n = 3$ qubit case, for which 10 stacked GRU layers were found to be sufficient to give reasonable results.

In SU(8) we chose $N = 10$ segments, so there were 8 input vectors of length 8 and 80 output vectors of length 8. The network was implemented in the Keras Python library [39] with the TensorFlow backend, on an Nvidia GTX 1080.
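The sequence layout described above can be sketched with NumPy reshapes. This assumes, as "real vectors" suggests, that only $\mathrm{Re}(U)$ is fed to the network. The function names are ours, and the Keras model itself is not reproduced here.

```python
import numpy as np


def to_input_sequence(u):
    """Rows of Re(U) as GRU timesteps: shape (2^n, 2^n). Using only Re(U) is an assumption."""
    return np.real(u).copy()


def to_output_sequence(u_list):
    """Stack the rows of each Re(U_j): N blocks of 2^n rows give shape (2^n * N, 2^n)."""
    return np.concatenate([np.real(uj) for uj in u_list], axis=0)


def from_output_sequence(seq, dim):
    """Undo to_output_sequence: split the timesteps back into N matrices of shape (dim, dim)."""
    return seq.reshape(-1, dim, dim)


# SU(8), N = 10: 8 input timesteps and 80 output timesteps, each a vector of length 8.
rng = np.random.default_rng(3)
u = np.linalg.qr(rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8)))[0]
u_list = [np.eye(8, dtype=complex) for _ in range(10)]   # placeholder U_j, used for shapes only
seq_in, seq_out = to_input_sequence(u), to_output_sequence(u_list)
print(seq_in.shape, seq_out.shape)
```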

5.3.2 Local decomposition

For SU(8), a network with 2 fully connected dense hidden layers of 2000 neurons and the ReLU activation function was found to be sufficient. The input layer took a vectorised $U_j$, and the output layer produced $\dim(\Delta)$ values. The network was implemented in the Keras Python library with the TensorFlow backend, on an Nvidia GTX 1080.


5.4 Results - SU(8)

5.4.1 Global decomposition

The global decomposition network was trained on $U_j$ taken from 5000 randomly generated geodesics in SU(8), with 500 used as validation data. The loss function was the Euclidean distance between the output vector and the desired output. The Euclidean distance is a monotone function of the standard mean squared error: it is the square root of the MSE times the number of components, so both losses have the same minimisers. However, the Euclidean distance performed better, as it made the local minima more distinguishable. After 1500 training epochs the validation loss reached about 0.9 and did not decrease further. This was found to be sufficient to generate $U_j$ close to the training data.
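The relation between the two losses can be checked directly: for $d$ components, the Euclidean distance equals $\sqrt{d\cdot\mathrm{MSE}}$. A small pure-Python sketch (the function names are ours):

```python
import math


def mse(y, target):
    return sum((a - b) ** 2 for a, b in zip(y, target)) / len(y)


def euclidean(y, target):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, target)))


y, t = [0.1, 0.4, -0.2, 0.9], [0.0, 0.5, 0.0, 1.0]
print(euclidean(y, t), math.sqrt(len(y) * mse(y, t)))
```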

Figure (5.3) shows the validation and training loss. Figures (5.4(a)) and (5.4(b)) show a randomly chosen $U_j$ from a list of $U_j$ generated by the network, and the corresponding $U_j$ from the training data, for some random $U$. Most $U_j$ appeared to be very similar. Figures (5.5(a)) and (5.5(b)) show the same entry in consecutive $U_i$ for validation data. Again, the network was able to output values very close to the values in the validation dataset, and this similarity was typical. This shows the network is able to reasonably approximate the $U_j$.

Figure 5.3: The loss and validation loss (Euclidean error against epoch) from training the global decomposition.

Figure 5.4: A known $U_j$ from the validation data and the $U_j$ generated by the NN in SU(8) for global decomposition. (a) Real components of a $U_j$ generated by the NN. (b) The respective known real components of a $U_j$ from the validation dataset. Each $U_j$ is close to the identity matrix. The shading from blue to orange represents $[-1, 1]$.

Figure 5.5: Real entries of validation $U_i$ vs the $U_i$ generated by the NN, plotted against $i$. In (a) and (b), the same real entry from the 10 $U_i$ in the validation data set (blue) is compared with the predicted output (red). Recall the $U_i$ are not constant, and solve equation (5.2). The behaviour displayed here was typical in other entries.

5.4.2 Local decomposition

The network implementing the local decomposition was trained on $U_j$ generated by choosing a random $m$-vector of coefficients $c^j_i$, with each $c^j_i$ of order $1/N$. In total there were 5000 pairs in the training set and 500 in the validation set. Figure (5.6) shows the validation and training loss. After 500 epochs the network was able to compute the local decomposition to reasonable error (0.16 on average). Figures (5.7(a)) and (5.7(b)) show a matrix generated by the neural network and the target matrix.

Figure 5.6: The loss and the validation loss (Euclidean error against epoch) from training the local decomposition. There was no significant improvement after 500 epochs.

Figure 5.7: A known $U_j$ from the validation data and the $U_j$ generated by the NN in SU(8), for the local decomposition network. (a) Real components of a $U_j$ generated by the NN. (b) The respective known real components of a $U_j$ from the validation dataset. The shading from blue to orange represents $[-1, 1]$.

5.5 Discussion

We trained two neural networks to decompose $U$ into $c^j_i$ via a two-step approach: global decomposition followed by local decomposition. This was found to be successful when the training data was restricted to paths which approximate minimal normal subRiemannian geodesics. The restriction limited the training data to one-to-one pairs, eliminating redundancy.

For the global decomposition, a network of stacked GRU layers allowed efficient training, with the validation loss approaching its minimum at 500 epochs for SU(8). A simple dense network with two hidden layers proved sufficient for the local decomposition. In SU(8), both networks were small enough to be trained on a desktop machine with a single Nvidia GTX 1080 GPU. The two-stage decomposition proved more successful than single-stage attempts, with the decomposition of a given $U$ into $U_j$ being crucial for this increase in effectiveness. This approach demonstrates a novel use of neural networks.

Although this approach works well for systems with few qubits (such as the SU(8) example), it does not scale well with the number of qubits. The size of the network scales with the number of entries of matrices in $SU(2^n)$, that is, as $4^n$. This is not a significant problem for currently realisable quantum computers, or those of the near future, but it will become increasingly problematic as quantum computing advances. To partly counteract this, the complexity of the problem can be reduced by restricting the set of $U$ on which the neural network is trained. For example, if the $U$ are sparse, the size of the network may be reduced. Investigating this will be important for the practical usefulness of this approach.
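To make the scaling concrete, the short calculation below counts three quantities for $n$ qubits:

- the real numbers needed to feed the entries of $U$ to a network (assuming, as an encoding choice, that real and imaginary parts are input separately);
- the dimension $4^n - 1$ of $su(2^n)$;
- the number $m$ of one- and two-qubit directions (assuming the same choice of horizontal directions as in the SU(8) example).

```python
def network_sizes(n_qubits):
    """Return (real inputs for U, dim su(2^n), number m of one- and two-qubit directions)."""
    dim = 2 ** n_qubits
    real_inputs = 2 * dim * dim  # real and imaginary part of each of the 4^n entries
    algebra_dim = dim * dim - 1  # dimension of su(2^n)
    # 3 Paulis per qubit, plus 9 Pauli pairs per unordered pair of qubits.
    m = 3 * n_qubits + 9 * n_qubits * (n_qubits - 1) // 2
    return real_inputs, algebra_dim, m


for n in range(2, 8):
    print(n, *network_sizes(n))
```

For three qubits this gives 128 real inputs against $m = 36$ directions. The input size grows as $4^n$, while $m$ grows only quadratically in $n$.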

As noted in Section 5.2, the choice of using geodesics to restrict the training data is fairly arbitrary. Other ways of restricting the training data, which still keep the input/output map one-to-one, may produce a better dataset and improve the accuracy of the networks. This is closely related to the nature of $\Lambda_0$, which is not yet fully understood. Exploring this problem is a possible avenue for future work, which may improve the effectiveness of the approach described here.

Finally, note that training the network is the most computationally expensive part of this approach. Once the network is trained, propagating an input through the network is much more efficient than conventional optimisation techniques for compiling $U$.

6. Summary

The ability to efficiently compile a specified $U$ into elementary gates is an assumption made in many quantum algorithms. Before Nielsen et al [1]–[5], quantum compilation was treated as an algebraic problem; describing an optimal circuit using geometry is a novel approach. Previous work [1]–[6] in this direction focused on computing geodesics on a Riemannian manifold with a penalty metric in which the penalty is made large. This adds complexity, as the penalty has to be increased gradually in the numerics. Recasting the problem in subRiemannian geometry removes the need for a penalty.

SubRiemannian geodesics and cubics, while interesting mathematical objects, are still of limited practicality for constructing quantum circuits. This is due to the difficulty of solving the boundary value problem in high dimensions. To significantly reduce the computational cost, we numerically calculated discretised cubics and geodesics. In SU(8) the discrete approach gave a speedup of two to several times, with a much smaller memory footprint.

Naively computing geodesics also ignores information about the $U$ that are commonly required, which often belong to a low-dimensional subset of $SU(2^n)$. Solving the continuous boundary value problem, as it stands, requires adjusting an exponential number of parameters, which is computationally prohibitive for larger $n$. The discrete methods can better account for low-dimensional $U$ by taking only a polynomial number of products. Future work could relate the sparsity of $U$ to the number of parameters needed in a more rigorous way.

Another caveat with the geodesic approach is that it does not consider error correction. The CNOT, H and Y gates can be implemented in an error-correcting way, but a phase gate with an arbitrary angle cannot. It must instead be approximated by a product of H and T gates, which adds further complexity. This is unavoidable, however: the geodesic approach requires continuously variable parameters in the phase gates so that a continuous optimisation method can be used. This could even be considered an advantage. A search algorithm working directly with discrete gates would be still more computationally expensive, as it must check permutations of discrete gates.

The neural network approach offers an interesting practical tool. For three qubits, the neural network was able to generate circuits of bounded complexity for arbitrary $U$ with small error. Additionally, once the network had been trained, this was significantly faster than performing an optimisation procedure. Future work will extend this approach to higher dimensions. All data and programs used to produce this work can be found at https://github.com/Swaddle/nnQcompiler.

Bibliography

[1] M. Nielsen, “Quantum computation as geometry,” Science, vol. 311, pp. 1133–1134, 2006.

[2] M. Nielsen, “A geometric approach to quantum circuit lower bounds,” Quantum Information and Computation, vol. 6, pp. 213–262, 3 2006.

[3] M. Nielsen and M. Dowling, “The geometry of quantum computation,” Quantum Information and Computation, vol. 8, pp. 861–899, 10 2008.

[4] M. Nielsen, M. Gu, M. Dowling, et al., “Optimal control, geometry, and quantum computing,” Physical Review A, vol. 73, p. 062323, 6 2006.

[5] M. Nielsen, A. Doherty, and M. Gu, “Quantum control via geometry: An explicit example,” Physical Review A, vol. 78, p. 032327, 3 2008.

[6] X. Wang, M. Allegra, K. Jacobs, S. Lloyd, C. Lupo, and M. Mohseni, “Quantum brachistochrone curves as geodesics: Obtaining accurate minimum-time protocols for the control of quantum systems,” Physical Review Letters, vol. 114, p. 170501, 2015.

[7] R. P. Feynman, “Simulating physics with computers,” International Journal of Theoretical Physics, vol. 21, no. 6, pp. 467–488, 1982.

[8] D. Deutsch, “Quantum theory, the Church-Turing principle and the universal quantum computer,” Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 400, no. 1818, pp. 97–117, 1985. doi: 10.1098/rspa.1985.0070.

[9] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information. Cambridge University Press, 2010.

[10] S. Lloyd, “Universal quantum simulators,” Science, vol. 273, no. 5278, p. 1073, 1996.

[11] Y. G. Chen and J. B. Wang, “Qcompiler: Quantum compilation with the CSD method,” Computer Physics Communications, vol. 184, pp. 854–865, 2013.

[12] T. Loke, Y. H. Chen, and J. B. Wang, “OptQC: An optimized parallel quantum compiler,” Computer Physics Communications, vol. 185, pp. 3307–3316, 12 2014.

[13] Y. Kawano, Y. Nakajima, M. Nakanishi, Y. Nakashima, H. Sekigawa, and S. Yamashita, “Synthesis of quantum circuits for d-level systems by using cosine-sine decomposition,” Quantum Information and Computation, vol. 9, pp. 423–443, 2009.


[14] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa, “Quantum circuits for general multiqubit gates,” Physical Review Letters, vol. 93, p. 130502, 2004.

[15] W. Kühnel, Differential geometry, curves and surfaces. American Mathematical Society, 2006.

[16] R. Montgomery, A Tour of Subriemannian Geometries, Their Geodesics and Applications. American Mathematical Society, 2006.

[17] L. Noakes and T. Ratiu, “Bi-Jacobi fields and Riemannian cubics for left-invariant SO(3),” Communications in Mathematical Sciences, vol. 14, 1 2016.

[18] R. Bertlmann and P. Krammer, “Bloch vectors for qudits,” Journal of Physics A: Mathematical and Theoretical, vol. 41, no. 23, p. 235303, 2008.

[19] H. Brandt, “Tools in the Riemannian geometry of quantum computation,” Quantum Information Processing, vol. 11, pp. 787–839, 3 2012.

[20] J. Milnor, Morse Theory. Princeton University Press, 1960.

[21] K. Shizume, T. Nakajima, R. Nakayama, and Y. Takahashi, “Quantum computational Riemannian and sub-Riemannian geodesics,” Progress of Theoretical Physics, vol. 127, pp. 997–1008, 6 2012.

[22] M. Luo, X.-B. Chen, Y.-X. Yang, and X. Wang, “Geometry of Quantum Computation with Qudits,” Scientific Reports, vol. 4, p. 4044, 2014. [Online]. Available: http://www.nature.com/articles/srep0404.

[23] Z.-H. Yu, B. Li, and S.-M. Fei, “Geometry of quantum computation with qutrits,” Scientific Reports, vol. 3, p. 2594, 2013. [Online]. Available: http://www.nature.com/articles/srep02594.

[24] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, “The mathematical theory of optimal processes,” Journal of Applied Mathematics and Mechanics, vol. 43, no. 10-11, pp. 514–515, 1963.

[25] D. F. Lawden, Analytical Methods of Optimization. Dover Publications, 2003.

[26] Y. Sachkov, “Control theory on Lie groups,” Journal of Mathematical Sciences, vol. 156, pp. 381–439, 3 2009.

[27] A. J. Krener, “The high order maximal principle and its application to singular extremals,” SIAM Journal on Control and Optimization, vol. 15, no. 2, pp. 256–293, 1977.

[28] R. Montgomery, “A survey of singular curves in sub-Riemannian geometry,” Journal of Dynamical and Control Systems, vol. 1, no. 1, pp. 49–90, 1995, issn: 1573-8698.


[29] A. D. Boozer, “Time-optimal synthesis of SU(2) transformations for a spin-1/2 system,” Physical Review A, vol. 85, p. 012317, 1 2012.

[30] L. Noakes, “Lax constraints in semi-simple Lie groups,” The Quarterly Journal of Mathematics, vol. 57, pp. 527–538, 2006.

[31] W. Magnus, “On the exponential solution of differential equations for a linear operator,” Communications on Pure and Applied Mathematics, vol. 7, pp. 649–673, 1954.

[32] F. Casas, S. Blanes, J. Oteo, and J. Ros, “The Magnus expansion and some of its applications,” Physics Reports, vol. 470, pp. 151–238, 5-6 2009.

[33] F. Casas, S. Blanes, J. Oteo, and J. Ros, “A pedagogical approach to Magnus expansion,” European Journal of Physics, vol. 31, p. 907, 4 2010.

[34] K. Ebrahimi-Fard and D. Manchon, “The Magnus expansion, trees and Knuth’s rotation correspondence,” Foundations of Computational Mathematics, vol. 14, pp. 1–25, 1 2013.

[35] H. Munthe-Kaas, Applied Numerical Mathematics, vol. 29, pp. 115–127, 1999.

[36] L. Noakes, “Global algorithm for geodesics,” Journal of the Australian Mathematical Society, vol. 64, pp. 37–50, 1998.

[37] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” CoRR, vol. abs/1502.02367, 2015. [Online]. Available: http://arxiv.org/abs/1502.02367.

[38] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” CoRR, vol. abs/1412.3555, 2014. [Online]. Available: http://arxiv.org/abs/1412.3555.

[39] F. Chollet, Keras, https://github.com/fchollet/keras, 2015.


Appendix A

6.1 Commutation Relations

6.1.1 Commutation relations in su(8)

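The Pauli basis has several useful properties:

$$[v_1, v_1] \subset v_1 \tag{6.1}$$

$$[v_1, v_2] \subset v_2 \tag{6.2}$$

$$[v_1, v_3] \subset v_3 \tag{6.3}$$

$$[v_2, v_3] \subset v_2 \tag{6.4}$$

where $v_1 = \operatorname{span}\{i\sigma^i_r\}$, $v_2 = \operatorname{span}\{i\sigma^i_r\sigma^j_s\}$ and $v_3 = \operatorname{span}\{i\sigma^i_r\sigma^j_s\sigma^k_t\}$. These relations can be verified by computing Lie brackets of basis elements of these subspaces. First, for elements in $v_3$ and $v_2$: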

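$$
\begin{aligned}
i^2[\sigma_i \otimes \sigma_j \otimes I,\ \sigma_r \otimes \sigma_s \otimes \sigma_t]
&= -\left(\sigma_i\sigma_r \otimes \sigma_j\sigma_s \otimes \sigma_t - \sigma_r\sigma_i \otimes \sigma_s\sigma_j \otimes \sigma_t\right) \\
&= 2i\,\epsilon_{ri\theta}\,\delta_{js}\,\sigma_\theta \otimes I \otimes \sigma_t + 2i\,\delta_{ir}\,\epsilon_{sj\mu}\, I \otimes \sigma_\mu \otimes \sigma_t \in v_2,
\end{aligned}
$$

where $\sigma_i\sigma_j = i\epsilon_{ij\theta}\sigma_\theta + \delta_{ij}I$. Other choices of $m$ and $n$ in $\sigma^m_i\sigma^n_j$ follow similarly, so $[v_2, v_3] \subset v_2$, since the $v_i$ are closed under addition. Next, to examine $[v_1, v_2]$, consider Lie brackets of terms such as $i\sigma^m_i$ and $i\sigma^n_j\sigma^p_k$: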

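$$[i\sigma_i \otimes I \otimes I,\ i\sigma_j \otimes \sigma_k \otimes I] = [i\sigma_i, i\sigma_j] \otimes \sigma_k \otimes I = -2i\,\epsilon_{ij\theta}\,\sigma_\theta \otimes \sigma_k \otimes I,$$

which lies in $v_2$. If instead the Pauli matrices do not overlap, for example $[I \otimes I \otimes \sigma_i,\ \sigma_j \otimes \sigma_k \otimes I] = 0$, the Lie bracket vanishes and so trivially lies in $v_2$. Hence $[v_1, v_2] \subset v_2$. The case $[v_1, v_3] \subset v_3$ follows similarly.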

These relations can be used to separate the normal equations in su(8). Given $S \in v_1$, $D \in v_2$ and $T \in v_3$, their derivatives are confined to the same subspaces: $\dot S \in v_1$, $\dot D \in v_2$ and $\dot T \in v_3$. Using the commutation relations above, the normal equations in su(8) separate into the form presented in the main body of the text.
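Relations (6.1) to (6.4) can also be checked mechanically. The sketch below is an illustrative brute-force check, not part of the original derivation, and its function names are hypothetical. It works as follows:

- a three-qubit Pauli string is a tuple of labels (0 = I, 1 = X, 2 = Y, 3 = Z);
- products are formed qubit by qubit using $\sigma_a\sigma_b = \delta_{ab}I + i\epsilon_{abc}\sigma_c$;
- the body order of every nonzero commutator $[iP, iQ]$ is recorded.

```python
from itertools import product


def pauli_mult(a, b):
    """sigma_a sigma_b = phase * sigma_c for labels 0=I, 1=X, 2=Y, 3=Z."""
    if a == 0:
        return 1, b
    if b == 0:
        return 1, a
    if a == b:
        return 1, 0
    c = 6 - a - b  # the remaining label in {1, 2, 3}
    return (1j if (a, b) in ((1, 2), (2, 3), (3, 1)) else -1j), c


def commutator(p, q):
    """Return (coeff, r) with [iP, iQ] = coeff * R, or None if P and Q commute."""
    # Pauli strings anticommute iff they clash (both non-identity and different)
    # on an odd number of qubits; otherwise they commute.
    clashes = sum(1 for a, b in zip(p, q) if a and b and a != b)
    if clashes % 2 == 0:
        return None
    phase, r = 1, []
    for a, b in zip(p, q):
        ph, c = pauli_mult(a, b)
        phase *= ph
        r.append(c)
    # [iP, iQ] = -(PQ - QP) = -2 PQ for anticommuting P and Q.
    return -2 * phase, tuple(r)


def order(p):
    """Body order: the number of non-identity factors."""
    return sum(1 for a in p if a)


def bracket_orders(n_qubits):
    """Map (order of P, order of Q) -> set of body orders occurring in [iP, iQ]."""
    strings = [p for p in product(range(4), repeat=n_qubits) if any(p)]
    orders = {}
    for p in strings:
        for q in strings:
            out = orders.setdefault((order(p), order(q)), set())
            result = commutator(p, q)
            if result is not None:
                out.add(order(result[1]))
    return orders


su8 = bracket_orders(3)
for key in [(1, 1), (1, 2), (1, 3), (2, 3)]:
    print(key, sorted(su8[key]))
print(commutator((1, 2, 0), (2, 2, 3)))  # bracket of i X.Y.I with i Y.Y.Z
```

The last line evaluates $[iX \otimes Y \otimes I,\ iY \otimes Y \otimes Z]$ and returns the coefficient $-2i$ with the string $Z \otimes I \otimes Z$.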

6.1.2 Commutation relations in su(16)

su(16) is spanned by matrices of the form $i\sigma^i_r$, $i\sigma^i_r\sigma^j_s$, $i\sigma^i_r\sigma^j_s\sigma^k_t$ and $i\sigma^i_r\sigma^j_s\sigma^k_t\sigma^l_u$. These form bases for four mutually orthogonal subspaces, denoted $v_1$, $v_2$, $v_3$ and $v_4$ respectively. In the normal geodesic equations, $S \in v_1$, $D \in v_2$, $T \in v_3$ and $F \in v_4$. Examining the commutation relations of these subspaces helps to separate the normal equations. First, consider basis elements in $v_4$ and $v_2$:

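$$
\begin{aligned}
[i\sigma_r \otimes \sigma_s \otimes \sigma_t \otimes \sigma_v,\ i\sigma_j \otimes \sigma_k \otimes I \otimes I]
&= -\Big((i\epsilon_{rjm}\sigma_m + \delta_{rj}I) \otimes (i\epsilon_{skn}\sigma_n + \delta_{sk}I) \\
&\qquad - (-i\epsilon_{rjm}\sigma_m + \delta_{rj}I) \otimes (-i\epsilon_{skn}\sigma_n + \delta_{sk}I)\Big) \otimes \sigma_t \otimes \sigma_v \\
&= -\left(2i\,\epsilon_{rjm}\delta_{sk}\,\sigma_m \otimes I + 2i\,\epsilon_{skn}\delta_{rj}\, I \otimes \sigma_n\right) \otimes \sigma_t \otimes \sigma_v.
\end{aligned}
$$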

This lies in $v_3$, so it contributes only to $\operatorname{proj}_{v_3}$ terms. Next, there are two possible outcomes for commutators of a two-body term $i\sigma^i_r\sigma^j_s$ with a three-body term $i\sigma^i_r\sigma^j_s\sigma^k_t$. First, two Pauli matrices can overlap:

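$$
\begin{aligned}
[i\sigma_r \otimes \sigma_s \otimes \sigma_t \otimes I,\ i\sigma_j \otimes \sigma_k \otimes I \otimes I]
&= -\Big((i\epsilon_{rjm}\sigma_m + \delta_{rj}I) \otimes (i\epsilon_{skn}\sigma_n + \delta_{sk}I) \\
&\qquad - (-i\epsilon_{rjm}\sigma_m + \delta_{rj}I) \otimes (-i\epsilon_{skn}\sigma_n + \delta_{sk}I)\Big) \otimes \sigma_t \otimes I \\
&= -\left(2i\,\epsilon_{rjm}\delta_{sk}\,\sigma_m \otimes I + 2i\,\epsilon_{skn}\delta_{rj}\, I \otimes \sigma_n\right) \otimes \sigma_t \otimes I,
\end{aligned}
$$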

which lies in $v_2$, so this term is nonzero only under $\operatorname{proj}_{v_2}$. Alternatively, if only a single Pauli matrix overlaps,

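$$
\begin{aligned}
[i\sigma_r \otimes \sigma_s \otimes \sigma_t \otimes I,\ i\sigma_j \otimes I \otimes I \otimes \sigma_k]
&= -[\sigma_r, \sigma_j] \otimes \sigma_s \otimes \sigma_t \otimes \sigma_k \\
&= -2i\,\epsilon_{rjm}\,\sigma_m \otimes \sigma_s \otimes \sigma_t \otimes \sigma_k,
\end{aligned}
$$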

which is nonzero only under $\operatorname{proj}_{v_4}$. Summarising these results, and using the linearity of the Lie bracket for linear combinations of matrices,

$$[v_1, v_j] \subset v_j \tag{6.5}$$

$$[v_2, v_4] \subset v_3 \tag{6.6}$$

$$[v_2, v_3] \subset v_4 + v_2. \tag{6.7}$$

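Now consider the Lie bracket of a three-body and a four-body term, such as $[i\sigma_r \otimes \sigma_s \otimes \sigma_t \otimes I,\ i\sigma_j \otimes \sigma_k \otimes \sigma_m \otimes \sigma_n]$. The first term in the Lie bracket can be written as

$$(i\epsilon_{rj\theta}\sigma_\theta + \delta_{rj}I) \otimes (i\epsilon_{sk\phi}\sigma_\phi + \delta_{sk}I) \otimes (i\epsilon_{tm\mu}\sigma_\mu + \delta_{tm}I) \otimes \sigma_n.$$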

This lies in $v_1 + v_2 + v_3 + v_4$. However, all terms in $v_1$ are cancelled by the second term in the Lie bracket, as $\delta$ is symmetric in its indices. Components in $v_3$ have coefficients containing two Levi-Civita symbols. In the second term of the Lie bracket the order of each product is reversed, which swaps two indices in every Levi-Civita symbol. With two symbols the sign is unchanged, so the terms in $v_3$ cancel. Terms in $v_2 + v_4$, however, contain one or three Levi-Civita symbols, so the same swap changes their sign and they do not cancel. Therefore

$$[v_3, v_4] \subset v_2 + v_4. \tag{6.8}$$

The two-body couplings seen in SU(16) extend naturally to higher-dimensional $su(2^n)$. Commutators of a two-body term with a $k$-body term produce terms in the $(k-1)$- and $(k+1)$-body directions:

$$[v_k, v_2] \subset v_{k-1} + v_{k+1}.$$

This is because products of the form $\ldots\sigma_i\sigma_j \otimes \sigma_k\sigma_l\ldots$ in the Lie bracket leave a term like $\ldots I \otimes \sigma_s\ldots$, reducing the order. If a two-body and a $k$-body term have only a single overlapping product of Pauli matrices, the order increases instead, as seen previously.

Also, in $su(2^n)$ the single-body terms preserve the body order:

$$[v_k, v_1] \subset v_k.$$

The normal equations in $SU(2^n)$ become highly coupled, as the Lie brackets do not fall neatly into a single subspace. Hence these relations alone do not give enough information to solve the normal geodesic equations subspace by subspace.
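A companion brute-force sketch, using the label convention 0 = I, 1 = X, 2 = Y, 3 = Z, checks (6.5) to (6.8) on four qubits. It also checks the general rules $[v_k, v_1] \subset v_k$ and $[v_k, v_2] \subset v_{k-1} + v_{k+1}$. It relies on two standard facts about Pauli strings:

- $P$ and $Q$ anticommute exactly when they clash (both non-identity and different) on an odd number of qubits, and commute otherwise;
- $PQ$ is non-identity exactly on the qubits where the labels of $P$ and $Q$ differ.

This is a check for small $n$, not a proof, and the function name is illustrative.

```python
from itertools import product


def bracket_orders(n_qubits):
    """For su(2^n), map (k, l) -> set of body orders present in [v_k, v_l]."""
    strings = [p for p in product(range(4), repeat=n_qubits) if any(p)]
    weight = {p: sum(1 for a in p if a) for p in strings}
    orders = {}
    for p in strings:
        for q in strings:
            out = orders.setdefault((weight[p], weight[q]), set())
            clashes = sum(1 for a, b in zip(p, q) if a and b and a != b)
            if clashes % 2 == 1:
                # [iP, iQ] = -2PQ, and PQ is non-identity where the labels differ.
                out.add(sum(1 for a, b in zip(p, q) if a != b))
    return orders


su16 = bracket_orders(4)
for k in range(1, 5):
    print("[v%d, v1] ->" % k, sorted(su16[(k, 1)]),
          " [v%d, v2] ->" % k, sorted(su16[(k, 2)]))
print("[v2, v4] ->", sorted(su16[(2, 4)]))
print("[v2, v3] ->", sorted(su16[(2, 3)]))
print("[v3, v4] ->", sorted(su16[(3, 4)]))
```

Running the same function with `n_qubits=5` extends the check to su(32), at the cost of roughly a million bracket evaluations.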

6.1.3 Adjoint action in SU(8)

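In SU(8), $e^{S_0 t} T_0 e^{-S_0 t} \in \mathfrak{g}/\Delta$. This follows from Section 6.1.1, since $[v_1, v_3] \subset v_3 = \mathfrak{g}/\Delta$, together with the well-known identity

$$e^{X} Y e^{-X} = Y + [X, Y] + \frac{1}{2}[X, [X, Y]] + \frac{1}{6}[X, [X, [X, Y]]] + \cdots.$$

Nested Lie brackets of the form $[v_1, [v_1, \ldots [v_1, v_3]]]$ are always contained in $v_3$. Since $v_3$ is a linear subspace, and therefore closed under addition, the sum of the series also lies in $v_3$, which gives the desired result. The same argument, using $[v_1, v_1] \subset v_1$ and $[v_1, v_2] \subset v_2$, applies to the adjoint action of $e^{S_0 t}$ on elements of $\Delta = v_1 + v_2$.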

Appendix B

6.2 Gradients for discrete geodesics and cubics

6.2.1 Trace error gradient

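The gradient of the error function err is

$$\frac{\partial}{\partial c^k_i}\operatorname{err}(Y, U) = -\frac{1}{2^n}\frac{\partial}{\partial c^k_i}\operatorname{tr}\left(U^\dagger e^{\sum_{j=1}^m c^1_j\tau_j} \cdots e^{\sum_{j=1}^m c^N_j\tau_j}\right)$$

$$\frac{\partial}{\partial c^k_i}\operatorname{err}(Y, U) \approx -\frac{1}{2^n}\operatorname{tr}\left(U^\dagger e^{\sum_j c^1_j\tau_j} \cdots \tau_i e^{\sum_j c^k_j\tau_j} \cdots e^{\sum_j c^N_j\tau_j}\right).$$

The second line replaces the derivative of $e^{A_k}$, where $A_k = \sum_j c^k_j\tau_j$, by $\tau_i e^{A_k}$. Since the $\tau_j$ do not commute, this is a leading-order approximation for small coefficients. The exact derivative is

$$\frac{\partial}{\partial c^k_i} e^{A_k} = \int_0^1 e^{sA_k}\,\tau_i\, e^{(1-s)A_k}\, ds,$$

which equals $\tau_i e^{A_k}$ when $A_k$ commutes with $\tau_i$ and otherwise differs from it by $O(\|A_k\|)$.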

6.2.2 Discrete cubics gradient

To apply Newton’s method we need the gradient of the discrete cubic functional.

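$$S_A[u] = \sum_{j=1}^{N} \frac{h}{2}\left(\langle \dot u(t_{j+1}), \dot u(t_{j+1})\rangle + \langle \dot u(t_j), \dot u(t_j)\rangle\right)$$

We perform the minimisation with respect to the scalar coefficients $c^j_i$ in $u(t_j) = c^j_i\tau_i$. Here $\dot u$ has been approximated by finite differences, and constant factors of $h$, which do not affect the minimiser, are dropped.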

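$$\frac{\partial S_A}{\partial c^k_l} = \frac{\partial}{\partial c^k_l}\left(\sum_{j=1}^{N}\frac{1}{2}\left(\|c^{j+2}_i\tau_i - c^{j+1}_i\tau_i\|^2 + \|c^{j+1}_i\tau_i - c^j_i\tau_i\|^2\right)\right)$$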

The only nonzero terms remaining after taking the partial derivative are those with $j = k-2, k-1, k$:

$$\frac{\partial S_A}{\partial c^k_l} = \frac{\partial}{\partial c^k_l}\left(\|c^{k+1}_i\tau_i - c^k_i\tau_i\|^2 + \|c^k_i\tau_i - c^{k-1}_i\tau_i\|^2\right),$$

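where the norm is calculated as follows (repeated indices are summed, and $\tau_i^\dagger = -\tau_i$):

$$\|c^{j+1}_i\tau_i - c^j_i\tau_i\|^2 = \operatorname{tr}\left((c^{j+1}_i\tau_i - c^j_i\tau_i)(c^{j+1}_i\tau_i - c^j_i\tau_i)^\dagger\right) = \operatorname{tr}\left(-(c^{j+1}_i\tau_i)^2 + c^{j+1}_i c^j_s\,\tau_i\tau_s + c^j_s c^{j+1}_i\,\tau_s\tau_i - (c^j_i\tau_i)^2\right).$$

In the normalised Pauli basis $\operatorname{tr}(\tau_i\tau_j) = -\delta_{ij}$. Taking the trace,

$$\|c^{j+1}_i\tau_i - c^j_i\tau_i\|^2 = c^{j+1}_i c^{j+1}_i - 2c^{j+1}_i c^j_i + c^j_i c^j_i.$$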

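It is now easy to calculate the partial derivative:

$$\frac{\partial S_A}{\partial c^k_l} = \frac{\partial}{\partial c^k_l}\left(c^{k+1}_i c^{k+1}_i - 2c^{k+1}_i c^k_i + 2c^k_i c^k_i - 2c^k_i c^{k-1}_i + c^{k-1}_i c^{k-1}_i\right) = -2c^{k+1}_l + 4c^k_l - 2c^{k-1}_l.$$

If $k = 1$, there are no $k-1$ terms; similarly, there are no $k+1$ terms for the last vector. Recall that there are $N+1$ coefficient vectors $c^k$, with components $c^k_l$.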
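A minimal sketch of this gradient, checked against central finite differences, is given below. It assumes the functional in the coefficient form used above, $\sum_j \|c^{j+1} - c^j\|^2$, with unit weights and constant factors dropped. The first and last vectors are handled by simply omitting the missing neighbour, which is one reading of the remark about $k = 1$. The sizes (six vectors of length four) and the random data are hypothetical.

```python
import random


def functional(c):
    """S = sum_j ||c^{j+1} - c^j||^2 in coefficient form (||x_i tau_i||^2 = x_i x_i)."""
    return sum(sum((a - b) ** 2 for a, b in zip(c[j + 1], c[j]))
               for j in range(len(c) - 1))


def gradient(c):
    """dS/dc^k_l; interior rows equal -2 c^{k+1}_l + 4 c^k_l - 2 c^{k-1}_l."""
    grad = []
    for k in range(len(c)):
        g = [0.0] * len(c[k])
        for nb in (k - 1, k + 1):
            if 0 <= nb < len(c):  # missing neighbours at the ends are dropped
                for l in range(len(g)):
                    g[l] += 2 * (c[k][l] - c[nb][l])
        grad.append(g)
    return grad


def central_difference(c, k, l, h=1e-5):
    plus = [row[:] for row in c]
    minus = [row[:] for row in c]
    plus[k][l] += h
    minus[k][l] -= h
    return (functional(plus) - functional(minus)) / (2 * h)


rng = random.Random(3)
c = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # N + 1 = 6 vectors
g = gradient(c)
worst = max(abs(g[k][l] - central_difference(c, k, l))
            for k in range(6) for l in range(4))
print(worst)
```

Because the functional is quadratic, the central difference agrees with the analytic gradient up to rounding error.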
