

Physics successfully implements Lagrange multiplier optimization

Sri Krishna Vadlamani^a,1, Tianyao Patrick Xiao^b, and Eli Yablonovitch^a,1

^a Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720; and ^b Sandia National Laboratories, Albuquerque, NM 87185-1084

Contributed by Eli Yablonovitch, August 25, 2020 (sent for review July 27, 2020; reviewed by Thomas Kailath and Stanley Osher)

Optimization is a major part of human effort. While being mathematical, optimization is also built into physics. For example, physics has the Principle of Least Action; the Principle of Minimum Power Dissipation, also called Minimum Entropy Generation; and the Variational Principle. Physics also has Physical Annealing, which, of course, preceded computational Simulated Annealing. Physics has the Adiabatic Principle, which, in its quantum form, is called Quantum Annealing. Thus, physical machines can solve the mathematical problem of optimization, including constraints. Binary constraints can be built into the physical optimization. In that case, the machines are digital in the same sense that a flip–flop is digital. A wide variety of machines have had recent success at optimizing the Ising magnetic energy. We demonstrate in this paper that almost all those machines perform optimization according to the Principle of Minimum Power Dissipation as put forth by Onsager. Further, we show that this optimization is in fact equivalent to Lagrange multiplier optimization for constrained problems. We find that the physical gain coefficients that drive those systems actually play the role of the corresponding Lagrange multipliers.

hardware accelerators | physical optimization | Ising solvers

Optimization is ubiquitous in today's world. Everyday applications of optimization range from aerodynamic design of vehicles and physical stress optimization of bridges to airline crew scheduling and delivery truck routing. Furthermore, optimization is also indispensable in machine learning, reinforcement learning, computer vision, and speech processing. Given the preponderance of massive datasets and computations today, there has been a surge of activity in the design of hardware accelerators for neural-network training and inference (1).

We ask whether physics can address optimization. There are a number of physical principles that drive dynamical systems toward an extremum. These are the Principle of Least Action; the Principle of Minimum Power Dissipation (also called Minimum Entropy Generation); the Variational Principle; Physical Annealing, which preceded computational Simulated Annealing; and the Adiabatic Principle (which, in its quantum form, is called Quantum Annealing).

In due course, we may learn how to use each of these principles to perform optimization. Let us consider the Principle of Minimum Power Dissipation in dissipative physical systems, such as resistive electrical circuits. It was shown by Onsager (2) that the equations of linear systems, like resistor networks, can be reexpressed as the minimization principle of a power-dissipation function $f(i_1, i_2, \ldots, i_n)$ for the currents $i_n$ in the various branches of the resistor network. By reexpressing a merit function in terms of power dissipation, the circuit itself will find the minimum of the merit function, or minimum power dissipation. Optimization is generally accompanied by constraints. For example, perhaps the constraint is that the final answers must be restricted to be ±1. Such a digitally constrained optimization produces answers compatible with any digital computer.

A series of physics-based Ising solvers have been created in the physics and engineering community. The Ising challenge is to find the minimum energy configuration of a large set of magnets. This is very hard even when the magnets are restricted to only two orientations, North Pole up or down (3). Our main insights in this paper are that most of these Ising solvers use hardware based on the Principle of Minimum Power Dissipation and that almost all of them implement the well-known Lagrange multipliers method for constrained optimization.

An early work was by Yamamoto and coworkers in ref. 4, and this was followed by further work from their group (5–8) and other groups (9–15). These entropy-generating machines range from coupled optical parametric oscillators to resistor–inductor–capacitor electrical circuits, coupled exciton–polaritons, and silicon photonic-coupler arrays. These types of machines have the advantage that they solve digital problems orders of magnitude faster, and in a more energy-efficient manner, than conventional digital chips that are limited by latency and the energy cost (8).

Within the framework of these dissipative machines, constraints can be readily included. In effect, these machines perform constrained optimization equivalent to the technique of Lagrange multipliers. We illustrate this connection by surveying seven published physically distinct machines and showing that each minimizes power dissipation in its own way, subject to constraints; in fact, they perform Lagrange multiplier optimization.

In effect, physical machines perform local steepest descent in the power-dissipation rate. They can become stuck in local optima.

Significance

All through human civilization, optimization has played a major role, from aerodynamics to airline scheduling, delivery routing, and telecommunications decoding. Optimization is receiving increasing attention, since it is central to today's artificial intelligence. All of these optimization problems are among the hardest for human or machine to solve. It has been overlooked that physics itself does optimization in the normal evolution of dynamical systems, such as seeking out the minimum energy state. We show that among such physics principles, the idea of minimum power dissipation, also called the Principle of Minimum Entropy Generation, appears to be the most useful, since it can be readily implemented in electrical or optical circuits.

Author contributions: S.K.V., T.P.X., and E.Y. designed research; S.K.V., T.P.X., and E.Y. performed research; S.K.V., T.P.X., and E.Y. analyzed data; and S.K.V. and E.Y. wrote the paper.

Reviewers: T.K., Stanford University; and S.O., University of California, Los Angeles.

The authors declare no competing interest.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

1 To whom correspondence may be addressed. Email: [email protected] or [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2015192117/-/DCSupplemental.

First published October 12, 2020.

www.pnas.org/cgi/doi/10.1073/pnas.2015192117 PNAS | October 27, 2020 | vol. 117 | no. 43 | 26639–26650



At the very least, they perform a rapid search for local optima, thus reducing the search space for the global optimum. These machines are also adaptable toward advanced techniques for approaching a global optimum.

At this point, we note that there are several other streams of work on physical optimization in the literature that we shall not be dealing with in this paper. These works include a variety of Lagrange-like continuous-time solvers (16, 17), Memcomputing methods (18), Reservoir Computing (19, 20), adiabatic solvers using Kerr nonlinear oscillators (21), and probabilistic bit logic (22). A brief discussion of adiabatic Kerr oscillator systems (21) is presented in SI Appendix, section 4.

The paper is organized as follows. In Section 1, we recognize that physics performs optimization through its various principles. Then, we concentrate on the Principle of Minimum Power Dissipation. In Section 2, we give an overview of the minimum power-dissipation optimization solvers in the literature and show how they incorporate constraints. Section 3 has a quick tutorial on the method of Lagrange multipliers. Section 4 studies five published solvers in detail and shows that they all follow some form of Lagrange multiplier dynamics. In Section 5, we look at those published physics-based solvers that are less obviously connected to Lagrange multipliers. Section 6 presents the applications of these solvers to perform linear regression in statistics. Finally, in Section 7, we conclude and discuss the consequences of this ability to implement physics-based Lagrange multiplier optimization for areas such as machine learning.

1. Optimization in Physics

We survey the minimization principles of physics and the important optimization algorithms derived from them. The aim is to design physical optimization machines that converge to the global optimum, or a good local optimum, irrespective of the initial point for the search.

1.A. The Principle of Least Action. The Principle of Least Action is the most fundamental principle in physics. Newton's Laws of Mechanics, Maxwell's Equations of Electromagnetism, Schrödinger's Equation in Quantum Mechanics, and Quantum Field Theory can all be interpreted as minimizing a quantity called Action. For the special case of light propagation, this reduces to the Principle of Least Time, as shown in Fig. 1.

A conservative system without friction or losses evolves according to the Principle of Least Action. The fundamental equations of physics are reversible. A consequence of this reversibility is the Liouville Theorem, which states that volumes in phase space are left unchanged as the system evolves.

Contrariwise, in both a computer and an optimization solver, the goal is to have a specific solution that occupies a smaller zone in the search space than the initial state, incurring an entropy cost first specified by Landauer and Bennett. Thus, some degree of irreversibility, or energy cost, is needed, specified by the number of digits in the answer in the Landauer–Bennett analysis. An algorithm has to be designed and programmed into the reversible system to effect the reduction in entropy needed to solve the optimization problem.

The reduction in entropy implies an energy cost but not necessarily a requirement for continuous power dissipation. We look forward to computer science breakthroughs that would allow the Principle of Least Action to address unsolved problems. An alternative approach to computing would involve physical systems that continuously dissipate power, aiding in the contraction of phase space toward a final solution. This brings us to the Principle of Least Power Dissipation.

Fig. 1. The Principle of Least Time, a subset of the Principle of Least Action. The actual path that light takes to travel from point A to point B is the one that takes the least time to traverse: it bends at the interface between the fast medium (small refractive index) and the slow medium. Recording the correct path entails a small energy cost consistent with the Landauer Limit.

1.B. The Principle of Least Power Dissipation. If we consider systems that continuously dissipate power, we are led to a second optimization principle in physics, the Principle of Least Entropy Generation, or Least Power Dissipation. This principle states that any physical system will evolve into a steady-state configuration that minimizes the rate of power dissipation given the constraints (such as fixed thermodynamic forces, voltage sources, or input power) that are imposed on the system. An early version of this statement is provided by Onsager in his celebrated papers on the reciprocal relations (2). This was followed by further foundational work on this principle by Prigogine (23) and de Groot (24). This principle is readily seen in action in electrical circuits and is illustrated in Fig. 2; a worked example follows below. We shall frequently use this principle, as formulated by Onsager, in the rest of the paper.
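To make the principle concrete, here is a small numerical sketch (ours, not from the paper) of the current divider of Fig. 2: sweeping along the constraint line I1 + I2 = I and minimizing the dissipated power recovers the familiar current-divider rule. Component values are illustrative.

```python
import numpy as np

# Worked sketch (ours) of the Principle of Least Power Dissipation for the
# two-branch current divider of Fig. 2: minimize
#   P(I1) = I1^2 R1 + I2^2 R2   subject to   I1 + I2 = I.
# Lagrange stationarity gives 2 I1 R1 = 2 I2 R2, i.e. the current-divider
# rule I1 = I R2 / (R1 + R2). All component values are illustrative.

R1, R2, I = 1.0, 3.0, 2.0
I1 = np.linspace(0.0, I, 100001)     # sweep the constraint line I2 = I - I1
P = I1**2 * R1 + (I - I1)**2 * R2    # power dissipated in the two branches
I1_min = I1[P.argmin()]

print(I1_min)                  # ~1.5 A, the numerical minimum
print(I * R2 / (R1 + R2))      # 1.5 A, the current-divider prediction
```

The stationarity condition, 2 I1 R1 = 2 I2 R2, is exactly the equal-voltage condition that Kirchhoff's laws impose on the two parallel branches of the physical circuit.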

1.C. Physical Annealing; Energy Minimization. This technique is widely used in materials science and metallurgy and involves the slow cooling of a system starting from a high temperature. As the cooling proceeds, the system tries to maintain thermodynamic equilibrium by reorganizing itself into the lowest energy minimum in its phase space. Energy fluctuations due to finite temperatures help the system escape from local optima, as shown in Fig. 3. This procedure leads to global optima when the temperature reaches zero in theory, but the temperature has to be lowered prohibitively slowly for this to happen.

1.D. Adiabatic Method. The Adiabatic Method, illustrated in Fig. 4, involves the slow transformation of a system from initial conditions that are easily constructed to final conditions that capture the difficult problem at hand.

More specifically, to solve the Ising problem, one initializes the system of spins in the ground state of a simple Hamiltonian and then transforms this Hamiltonian into the Ising problem by slowly varying some system parameters. If the parameters are varied slowly enough, the system stays in the instantaneous ground state throughout and the problem gets solved. In a quantum mechanical system, this is sometimes called "quantum annealing." Several proposals and demonstrations, including the well-known D-Wave machine (25), utilize this algorithm.

The slow rate of variation of the Hamiltonian parameters is determined by the minimum energy spacing between the instantaneous ground state and first excited state that occurs as we move from the initial Hamiltonian to the final one. The smaller the gap is, the slower the rate at which we need to perform the variation to successfully solve the problem. It has been shown that the gap can become exponentially small in the worst case, implying that this algorithm takes exponential time in the worst case for nondeterministic polynomial time (NP)-hard problems.





Fig. 2. The Principle of Least Power Dissipation. In a parallel connection, the current distributes itself in a manner that minimizes the power dissipation, subject to the constraint of fixed input current I. (The figure contrasts a possible final current configuration with the actual final configuration.)

1.E. Minimum Power Dissipation in Multioscillator Arrays. Multioscillator Arrays subject to Parametric Gain were introduced in refs. 4 and 5 for solving Ising problems. This can be regarded as a subset of the Principle of Minimum Power Dissipation, which always requires an input power constraint to avoid the null solution. In this case, gain acts as a constraint for minimum power dissipation, and the oscillator array must arrange itself to dissipate the least power subject to that constraint. If the oscillator array is bistable, this problem becomes analogous to the magnetic Ising problem. This mechanism will be the main point of Section 2.

2. Coupled Multioscillator Array Ising Solvers

The motivation for "Coupled Multioscillator Array Ising Solvers" is best explained using concepts from laser physics. As a laser is slowly turned on, spontaneous emission from the laser-gain medium couples into the various cavity modes and begins to become amplified. The different cavity modes have different loss coefficients due to their differing spatial profiles. As the laser pump/gain increases, the least-loss cavity mode grows faster than the others, and the gain is clamped by saturation. This picture can be incomplete since further nonlinear evolution among all of the modes can occur.

Coupled Multioscillator Array Ising machines try to map the power losses of the optimization machine to the magnetic energies of the Ising problem. If the mapping is correct, the lowest power configuration will match the energetic ground state of the Ising problem. This is illustrated in Fig. 5. The system evolves toward a state of minimum power dissipation, or minimum entropy generation, subject to the constraint of gain being present.

The archetypal solver in this class consists of a network of interconnected oscillators driven by phase-dependent parametric gain. Parametric gain amplifies only the cosine quadrature and causes the electric field to lie along the ±Real Axis in the complex plane. The phase of the electric field (0 or π) can be used to represent ±spin in the Ising problem. The resistive interconnections between the oscillators are designed to favor ferromagnetic or antiferromagnetic "spin–spin" interactions by the Principle of Minimum Power Dissipation, subject to parametric (phase-dependent) gain as the power input.

The gain input is very important to the Principle of Minimum Power Dissipation. If there were no power input, all of the currents and voltages would be zero, and the minimum power dissipated would be zero. In the case of the Coupled Multioscillator circuit, the power input is produced through a gain mechanism, or a gain module. The constraint could be the voltage input to the gain module. However, if the gain were too small, it might not exceed the corresponding circuit losses, and the current and voltage would remain near zero. If the pump gain is then gradually ramped up, the oscillatory mode requiring the least threshold gain begins oscillating. Upon reaching the threshold gain, a nontrivial current distribution of the Coupled Multioscillator circuit will emerge. As the gain exceeds the required threshold, there will be further nonlinear evolution among the modes so as to minimize power dissipation. The final-state "spin" configuration, dissipating the lowest power, is reported as the desired optimum.

With Minimum Power Dissipation, as with most optimizationschemes, it is difficult to guarantee a global optimum.

In optimization, each constraint contributes a Lagrange multiplier. We will show that the gains of the oscillators are the Lagrange multipliers of the constrained system. In Section 3, we provide a brief tutorial on Lagrange multiplier optimization.

3. Lagrange Multiplier Optimization Tutorial

The method of Lagrange multipliers is a very well-known procedure for solving constrained optimization problems in which the optimal point $\mathbf{x}^* \equiv (x, y)$ in multidimensional space locally optimizes the merit function $f(\mathbf{x})$ subject to the constraint $g(\mathbf{x}) = 0$. The optimal point has the property that the slope of the merit function is zero as infinitesimal steps are taken away from $\mathbf{x}^*$, as taught in calculus. However, these deviations are restricted to the constraint curve, as shown in Fig. 6. The isocontours of the function $f(\mathbf{x})$ increase until they are limited by, and just touch, the constraint curve $g(\mathbf{x}) = 0$ at the point $\mathbf{x}^*$.

At the point of touching, $\mathbf{x}^*$, the gradients of $f$ and $g$ are parallel to each other:

$$\nabla f(\mathbf{x}^*) = \lambda^* \nabla g(\mathbf{x}^*). \qquad [1]$$

The proportionality constant $\lambda^*$ is called the Lagrange multiplier corresponding to the constraint $g(\mathbf{x}) = 0$.

When we have multiple constraints $g_1, \ldots, g_p$, we expand Eq. 1 as follows:

$$\nabla f(\mathbf{x}^*) = \sum_{i=1}^{p} \lambda_i^* \nabla g_i(\mathbf{x}^*), \qquad [2]$$

where the gradient vector $\nabla$ represents $n$ equations, accompanied by the $p$ constraint equations $g_i(\mathbf{x}) = 0$, resulting in $n + p$ equations. These equations solve for the $n$ components of the vector $\mathbf{x}^*$ and the $p$ unknown Lagrange multipliers $\lambda_i^*$. That would be $n + p$ equations for $n + p$ unknowns.

Fig. 3. Physical Annealing involves the slow cooling down of a system (Ising energy H plotted against the possible solutions). The system performs gradient descent in configuration space with occasional jumps activated by finite temperature. If the cooling is done slowly enough, the system ends up in the ground state of configuration space.




Fig. 4. A system initialized in the ground state of a simple Hamiltonian continues to stay in the ground state as long as the Hamiltonian is changed slowly enough. (The figure plots the energies of the ground state and the first three excited states against a Hamiltonian parameter p, from the simple Hamiltonian at p = 0 to the hard Hamiltonian at p = p₀.)

Motivated by Eq. 2, we introduce a Lagrange function $L(\mathbf{x}, \boldsymbol{\lambda})$ defined as follows:

$$L(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_{i=1}^{p} \lambda_i g_i(\mathbf{x}), \qquad [3]$$

which can be optimized by gradient descent or other methods to solve for $\mathbf{x}^*$ and $\boldsymbol{\lambda}^*$. The theory of Lagrange multipliers, and the popular "Augmented Lagrange Method of Multipliers" algorithm used to solve for locally optimal $(\mathbf{x}^*, \boldsymbol{\lambda}^*)$, are discussed in great detail in refs. 26 and 27. A gist of the main points is presented in SI Appendix, sections 1–3.
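As a minimal numerical illustration of Eq. 3 (our toy example, not from the paper), the sketch below performs gradient descent of L in the variables and gradient ascent of L in the multiplier for the hypothetical problem: minimize f(x, y) = x² + y² subject to g(x, y) = 1 − x − y = 0. Step sizes and iteration counts are arbitrary.

```python
import numpy as np

# Minimal sketch of first-order Lagrange multiplier optimization of Eq. 3
# (our toy example):
#   f(x, y) = x^2 + y^2,  g(x, y) = 1 - x - y,  L = f + lam * g.
# Gradient descent in (x, y), gradient ascent in lam; this converges to the
# constrained optimum x = y = 1/2 with multiplier lam = 1.

def grad_v(v, lam):
    x, y = v
    return np.array([2.0 * x - lam,     # dL/dx
                     2.0 * y - lam])    # dL/dy

def g(v):
    return 1.0 - v[0] - v[1]            # dL/dlam

v, lam, eta = np.zeros(2), 0.0, 0.1
for _ in range(2000):
    v = v - eta * grad_v(v, lam)        # descend L in the variables
    lam = lam + eta * g(v)              # ascend L in the multiplier

print(v, lam)                           # -> [0.5 0.5] 1.0 (approximately)
```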

For the case of the Ising problem, the objective function is given by $f(\boldsymbol{\mu}) = \sum_{i,j} J_{ij}\, \boldsymbol{\mu}_i \cdot \boldsymbol{\mu}_j$, where $f(\boldsymbol{\mu})$ is the magnetic Ising energy and $\boldsymbol{\mu}_i$ is the $i$th magnetic moment vector. For the optimization method represented in this paper, we need a circuit or other physical system whose power dissipation is also $f(\mathbf{x}) = \sum_{i,j} J_{ij} x_i x_j$, but now $f(\mathbf{x})$ is power dissipation, not energy; $x_i$ is a variable that represents voltage, or current, or electric field; and the $J_{ij}$ are not magnetic energy but rather dissipative coupling elements. The correspondence is between magnetic spins quantized along the z axis, $\mu_{zi} = \pm 1$, and the circuit variable $x_i = \pm 1$.

While "energy" and "power dissipation" are represented by different units, we nonetheless need to establish a correspondence between them. For every optimization problem, there is a challenge of finding a physical system whose power-dissipation function represents the desired equivalent optimization function.

If the Ising problem has $n$ spins, there are also $p = n$ constraints, one for each of the spins. A sufficient constraint is $g_i(\mathbf{x}) = 1 - x_i^2 = 0$. More complicated nonlinear constraints can be envisioned, but $(1 - x_i^2)$ could represent the first two terms in a more complicated constraint Taylor expansion.

Therefore, a sufficient Lagrange function for the Ising problem, with digital constraints, is given by

$$L(\mathbf{x}, \boldsymbol{\lambda}) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} x_i x_j + \sum_{i=1}^{n} \lambda_i (1 - x_i^2)$$

where $\lambda_i$ is the Lagrange multiplier associated with the corresponding constraint. We shall see in Section 4 that most analog algorithms that have been proposed for the Ising problem in the literature actually tend to optimize some version of the above Lagrange function.
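A short sketch (our construction, not the authors' code) of this Ising Lagrange function and its gradient follows; it also checks the Section 3 stationarity condition, Eq. 2, at a digital configuration by solving for the multipliers that make the gradient vanish.

```python
import numpy as np

# Sketch (ours) of the Ising Lagrange function of Section 3:
#   L(x, lam) = sum_{i != j} J_ij x_i x_j + sum_i lam_i (1 - x_i^2),
# for a small random symmetric +/-1 coupling matrix with zero diagonal.

rng = np.random.default_rng(0)
n = 8
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                             # symmetric couplings, J_ii = 0

def lagrange(x, lam):
    return x @ J @ x + lam @ (1.0 - x**2)

def grad_lagrange(x, lam):
    # dL/dx_i = 2 sum_{j != i} J_ij x_j - 2 lam_i x_i
    return 2.0 * (J @ x) - 2.0 * lam * x

s = rng.choice([-1.0, 1.0], size=n)     # any digital configuration, s_i = +/-1
lam_star = (J @ s) * s                  # multipliers that make s stationary
print(lagrange(s, lam_star))
print(grad_lagrange(s, lam_star))       # -> zeros: s satisfies Eq. 2
```

Note that a different multiplier $\lambda_i$ is needed for each spin, anticipating the per-oscillator gains that appear in Section 4.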

4. The Physical Ising Solvers

We now discuss some physical methods proposed in the literature and show how each scheme implements the method of Lagrange multipliers. They all obtain good performance on the Gset benchmark problem set (28), and many of them demonstrate better performance than the heuristic algorithm, Breakout Local Search (29). The main result of our work is the realization that the gains used in all these physical methods are in fact Lagrange multipliers.

We entitle the available physical solvers in the literature as follows: Optical Parametric Oscillators [4.A], Coupled Radio Oscillators on the Real Axis [4.B], Coupled Laser Cavities Using Multicore Fibers [4.C], Coupled Radio Oscillators on the Unit Circle [4.D], and Coupled Polariton Condensates [4.E]. In Section 5, we discuss schemes that might be variants of minimum power dissipation: Iterative Analog Matrix Multipliers [5.A] and Leleu Mathematical Ising Solver [5.B]. In SI Appendix, section 4, we discuss "Adiabatic Coupled Radio Oscillators" (21), which seems unconnected with minimum power dissipation.

Optical Parametric Oscillators, Coupled Radio Oscillators on the Real Axis, and Coupled Radio Oscillators on the Unit Circle use only one gain for all of the oscillators, which is equivalent to imposing only one constraint, while Coupled Laser Cavities Using Multicore Fibers, Coupled Polariton Condensates, and Iterative Analog Matrix Multipliers use different gains for each spin and correctly capture the $n$ constraints of the Ising problem.

4.A. Optical Parametric Oscillators.

4.A.1. Overview. An early optical machine for solving the Ising problem was presented by Yamamoto and coworkers (4, 30). Their system consists of several pulses of light circulating in an optical-fiber loop, with the phase of each light pulse representing an Ising spin. In parametric oscillators, gain occurs at half the pump frequency. If the gain overcomes the intrinsic losses of the fiber, the optical pulse builds up. Parametric amplification provides phase-dependent gain. It restricts the oscillatory phase to the Real Axis of the complex plane. This leads to bistability along the positive or negative real axis, allowing the optical pulses to mimic the bistability of magnets.

Fig. 5. A lossy multioscillator system is provided with gain. The x axis is a list of all of the available modes in the system, whereas the y axis plots the loss coefficient of each mode. Gain is provided to the system and is gradually increased. As in single-mode lasers, the lowest loss mode, illustrated by the blue dot, grows exponentially, saturating the gain. Above the threshold, we can expect further nonlinear evolution among the modes so as to minimize power dissipation.





Fig. 6. Maximization of function f(x, y) subject to the constraint g(x, y) = 0. At the constrained local optimum, the gradients of f and g, namely ∇f(x, y) and ∇g(x, y), are parallel.

In the Ising problem, there is magnetic coupling between spins. The corresponding coupling between optical pulses is achieved by specified interactions between the optical pulses. In Yamamoto and coworkers' approach (30), one pulse $i$ is first plucked out by an optical gate, amplitude modulated by the proper connection weight specified in the $J_{ij}$ Ising Hamiltonian, and then reinjected and superposed onto the other optical pulse $j$, producing constructive or destructive interference, representing ferromagnetic or antiferromagnetic coupling.

By providing saturation to the pulse amplitudes, the optical pulses will finally settle down, each to one of the two bistable states. We will find that the pulse-amplitude configuration evolves exactly according to the Principle of Minimum Power Dissipation. If the magnetic dipole solutions in the Ising problem are constrained to ±1, then each constraint is associated with a Lagrange multiplier. Surprisingly, we find that each Lagrange multiplier turns out to be equal to the gain or loss associated with the corresponding oscillator.

4.A.2. Lagrange multipliers as gain coefficients. Yamamoto and coworkers (5) analyze their parametric oscillator system using slowly varying coupled wave equations for the circulating optical modes. We now show that the coupled wave equation approach reduces to an extremum of their system "power dissipation." The coupled-wave equation for the slowly varying amplitude $c_i$ of the in-phase electric field cosine component of the $i$th optical pulse (representing magnetic spin in an Ising system) is as follows:

$$\frac{dc_i}{dt} = (-\alpha_i + \gamma_i)\, c_i - \sum_{j=1, j \neq i}^{n} J_{ij} c_j \qquad [4]$$

where the weights, $J_{ij}$, are the dissipative coupling rate constants. (The $J_{ij}$ arise from constructive and destructive interference and can be positive or negative: $J_{ij}$ equals $|J_{ij}|$ multiplied by a sign factor, ±1, which is the corresponding weight in the binary Ising problem.) $\gamma_i$ represents the parametric gain (1/sec) supplied to the $i$th pulse, and $\alpha_i$ is the corresponding loss (1/sec). We shall henceforth use normalized, dimensionless $c_i$ in the rest of the paper. The normalization electric field is that which produces an energy of 1/2 joule in the normalization volume, while for voltages, the normalization voltage is that which produces an energy of 1/2 joule in the linear capacitor. For clarity of discussion, we dropped the cubic terms in Eq. 4 that Yamamoto and coworkers (5) originally had. A discussion of these terms is given in SI Appendix, section 3.

Owing to the nature of parametric amplification, the quadrature sine components $s_i$ of the electric fields die out rapidly. The normalized power dissipation, $h$ (in watts divided by one joule), including the negative dissipation associated with gain, can be written:

$$h(\mathbf{c}, \boldsymbol{\gamma}) = \sum_{i=1}^{n} \alpha_i c_i^2 - \sum_{i=1}^{n} \gamma_i c_i^2 + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j \qquad [5]$$

where the electric field cosine amplitudes $c_i$ are rendered dimensionless. If we minimize the power dissipation $h(\mathbf{c})$ without invoking any constraints, that is, with $\gamma_i = 0$, the amplitudes $c_i$ simply go to zero.

If the gain $\gamma_i$ is large enough, some of the amplitudes might go to infinity. To avoid this, we employ the $n$ constraint functions $g_i(c_i) = (1 - c_i^2) = 0$, which enforce a digital $c_i = \pm 1$ outcome.

Adding the constraint function to the power dissipation yields the Lagrange function, $L$ (in units of watts divided by one joule), which includes the constraint functions times the respective Lagrange multipliers:

$$L(\mathbf{c}, \boldsymbol{\gamma}) = \sum_{i=1}^{n} \alpha_i c_i^2 - \sum_{i=1}^{n} \gamma_i (c_i^2 - 1) + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j \qquad [6]$$

The unconstrained Eq. 5 and the constrained Eq. 6 differ only in the (−1) added to the $\gamma_i$ term, which effectively constrains the amplitudes and prevents them from diverging to ∞. Eq. 6 is the Lagrange function given at the end of Section 3. Surprisingly, the gains $\gamma_i$ emerge to play the role of Lagrange multipliers. This means that each mode, represented by the subscripts in $c_i$, must adjust to a particular gain $\gamma_i$ such that power dissipation is minimized. Minimization of the Lagrange function (Eq. 6) provides the final steady state of the system dynamics. In fact, the right-hand side of Eq. 4 is the gradient of Eq. 6, demonstrating that the dynamical system performs gradient descent on the Lagrange function. If the circuit or optical system is designed to dissipate power in a mathematical form that matches the Ising magnetic energy, then the system will seek out a local optimum of the Ising energy.

Such a physical system, constrained to $c_i = \pm 1$, is digital in the same sense as a flip–flop circuit, but unlike the von Neumann computer, the inputs are resistor weights for power dissipation. Nonetheless, a physical system can evolve directly, without the need for shuttling information back and forth as in a von Neumann computer, providing faster answers. Without the communications overhead but with the higher operation speed, the energy dissipated to arrive at the final answer will be less, despite the circuit being required to generate entropy during its evolution toward the final state.

To achieve minimum power dissipation, the amplitudes $c_i$ and the Lagrange multipliers $\gamma_i$ must all be simultaneously optimized using the Lagrange function as discussed in Section 4.E. While a circuit will evolve toward optimal amplitudes $c_i$, the gains $\gamma_i$ must arise from a separate active circuit. Ideally, the active circuit that controls the Lagrange multiplier gains $\gamma_i$ would have its power dissipation included with the main circuit. A more common method is to provide gain that follows a heuristic rule. For example, Yamamoto and coworkers (5) follow the heuristic rule $\gamma_i = a + bt$; a simulation sketch of these dynamics follows below. It is not yet clear whether the heuristic-based approach toward gain evolution will be equally effective as using the complete Lagrange method in Section 4.E and lumping together all main circuit and feedback components and minimizing the total power dissipation.
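For concreteness, the following sketch (ours; all parameters are illustrative) integrates Eq. 4 under the heuristic ramp γ_i = a + bt. Since Eq. 4 as printed omits the cubic saturation terms, we reinstate a generic −c_i³ term as a stand-in, an assumption on our part, so that the amplitudes saturate rather than diverge once gain exceeds loss.

```python
import numpy as np

# Sketch of the Eq. 4 dynamics (parameters and saturation are our
# assumptions, not the authors'):
#   dc_i/dt = (-alpha_i + gamma_i(t)) c_i - sum_{j != i} J_ij c_j - c_i^3,
# with the heuristic pump ramp gamma_i(t) = a + b*t. The cubic term stands
# in for the nonlinear terms dropped from Eq. 4.

rng = np.random.default_rng(1)
n, dt, steps = 8, 1e-3, 20000
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1); J = J + J.T        # symmetric couplings, zero diagonal
alpha = np.ones(n)                    # per-oscillator loss
a, b = 0.0, 0.5                       # gain ramp gamma = a + b*t
c = 1e-3 * rng.standard_normal(n)     # noise-level initial amplitudes

for k in range(steps):
    gamma = a + b * (k * dt)
    dc = (-alpha + gamma) * c - J @ c - c**3
    c += dt * dc

spins = np.sign(c)                    # digital +/-1 readout, flip-flop-like
print(spins, spins @ J @ spins)       # spin configuration and its Ising merit
```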

We conclude this subsection by noting that the Lagrange function, Eq. 6, corresponds to the following merit function, the normalized power dissipation, $f$ (in watts divided by one joule), and constraints:




$$f(\mathbf{c}) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j + \sum_{i=1}^{n} \alpha_i c_i^2$$

$$g_i(c_i) = (1 - c_i^2) = 0, \quad \text{for } i = 1, 2, \ldots, n.$$

4.B. Coupled Radio Oscillators on the Real Axis.

4.B.1. Overview. A coupled inductor–capacitor (LC) oscillator system with parametric amplification was analyzed in the circuit simulator, SPICE, by Xiao (9). This is analogous to the optical Yamamoto system, but this system consists of a network of radio frequency LC oscillators coupled to one another through resistive connections. The LC oscillators contain linear inductors but nonlinear capacitors, which provide the parametric gain. The parallel or cross-connect resistive connections between the oscillators are designed to implement the ferromagnetic or antiferromagnetic couplings $J_{ij}$ between magnetic dipole moments $\mu_i$, as shown in Fig. 7. The corresponding phase of the voltage amplitude $V_i$, 0 or π, determines the sign of magnetic dipole moment $\mu_i$.

The nonlinear capacitors are pumped by voltage $V(2\omega_0)$ at frequency $2\omega_0$, where the LC oscillator natural frequency is $\omega_0$. Second harmonic pumping leads to parametric amplification in the oscillators. As in the optical case, parametric amplification induces gain $\gamma_i$ in the Real Axis quadrature and imposes phase bistability on the oscillators.

Ideally, an active circuit would control the Lagrange multiplier gains $\gamma_i$, and the gain control circuit would have its power dissipation included with the main circuit. A more common approach is to provide gain that follows a heuristic rule. Xiao (9) linearly ramps up the gain as in Optical Parametric Oscillators. Again, as in the previous case, a mechanism is needed to prevent the parametric gain from producing infinite amplitude signals. Zener diodes are used to restrict the amplitudes to finite saturation values. With the diodes in place, the circuit settles into a voltage phase configuration, 0 or π, that minimizes net power dissipation for a given pump gain.

4.B.2. Lagrange function and Lagrange multipliers. The evolution of the oscillator capacitor voltages was derived from Kirchhoff's laws by Xiao (9). The slowly varying amplitude approximation on the cosine component of these voltages, $c_i$, produces the following equation for the $i$th oscillator:

$$\frac{dc_i}{dt} = \left( \sum_{j=1, j \neq i}^{n} J_{ij} c_j \right) - \alpha c_i + \gamma c_i \qquad [7]$$

where the $c_i$ are the peak voltage amplitudes; $R_c$ is the resistance of the coupling resistors; the cross-couplings $J_{ij}$ are assigned values $\pm 1/(4 R_c C_0)$; $C_0$ is the linear part of the capacitance in each oscillator; $n$ is the number of oscillators; $\omega_0$ is the natural frequency of the oscillators; the parametric gain constant $\gamma = \omega_0 |\Delta C| / 4 C_0$, where $|\Delta C|$ is the capacitance modulation at the second harmonic; and the decay constant $\alpha = (n-1)/(4 R_c C_0)$. In this simplified model, all decay constants $\alpha$ are taken as equal, and, moreover, each oscillator experiences exactly the same parametric gain $\gamma$, conditions that can be relaxed if needed.

Fig. 7. Coupled LC oscillator circuit for two coupled magnets. The oscillation of the LC oscillators represents the magnetic moments, while the parallel or antiparallel cross-connections through the coupling resistors $R_c$ represent ferromagnetic ($J_{12} = +1$) or antiferromagnetic ($J_{12} = -1$) coupling, respectively. The nonlinear capacitors are pumped by $V(2\omega_0)$ at frequency $2\omega_0$, providing parametric gain at $\omega_0$.

We note that Eq. 7 performs gradient descent on the net power-dissipation function:

$$h(\mathbf{c}, \gamma) = -\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j + \sum_{i=1}^{n} \alpha c_i^2 - \sum_{i=1}^{n} \gamma c_i^2 \qquad [8]$$

where $h$, $L$, and $f$ are the power-dissipation functions in watts divided by one joule. This is very similar to Section 4.A. The first two terms on the right-hand side together represent the dissipative losses in the coupling resistors, while the third term is the negative of the gain provided to the system of oscillators.

Next, we obtain the following Lagrange function through the same replacement of $(-c_i^2)$ with $(1 - c_i^2)$ that we performed in Section 4.A:

$$L(\mathbf{c}, \gamma) = -\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j + \sum_{i=1}^{n} \alpha c_i^2 - \sum_{i=1}^{n} \gamma (c_i^2 - 1) \qquad [9]$$

where the $c_i$ are normalized to the voltage that produces an energy of 1/2 joule on the capacitor $C_0$. The above Lagrange function corresponds to Lagrange multiplier optimization using the following merit function and constraints:

$$f(\mathbf{c}) = -\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} c_i c_j + \sum_{i=1}^{n} \alpha c_i^2, \qquad g(\mathbf{c}) = \sum_{i=1}^{n} (1 - c_i^2) = 0$$

Again, we see that the gain coefficient $\gamma$ is the Lagrange multiplier of the constraint $g = 0$.

4.B.3. Time dynamics and iterative optimization of the Lagrange function. Although the extremum of Eq. 9 represents the final evolved state of the physical system and represents an optimization outcome, it would be interesting to examine the time evolution toward the optimal state. We shall show in this subsection that iterative optimization of the Lagrange function in time reproduces the slowly varying time dynamics of the circuit. Each iteration is assumed to take time $\Delta t$. In each iteration, the voltage amplitude $c_i$ takes a step antiparallel to the gradient of the Lagrange function:

$$c_i(t + \Delta t) = c_i(t) - \kappa \Delta t \frac{\partial}{\partial c_i} L(\mathbf{c}, \gamma), \qquad [10]$$

where the minus sign on the right-hand side drives the system toward minimum power dissipation. The proportionality constant $\kappa$ controls the size of each iterative step; it also calibrates the dimensional units between power dissipation and voltage amplitude. (Since $c_i$ is voltage amplitude, $\kappa$ has units of reciprocal capacitance.) Converting Eq. 10 to continuous time,

$$\frac{dc_i}{dt} = -\kappa \frac{\partial}{\partial c_i} L(\mathbf{c}, \gamma), \qquad [11]$$

where the $\gamma_j$ play the role of Lagrange multipliers, and the $g_j = 0$ are the constraints. Substituting $L(\mathbf{c}, \gamma)$ from Eq. 9 into Eq. 11, we get

$$\frac{dc_i}{dt} = 2\kappa \left[ \left( \sum_{j=1, j \neq i}^{n} J_{ij} c_j \right) - \alpha c_i + \gamma c_i \right] \qquad [12]$$





The constant $\kappa$ can be absorbed into the units of time to reproduce Eq. 7, the slowly varying amplitude approximation for the coupled radio oscillators. Thus, in this case and many of the others (except Section 4.E), the slowly varying time dynamics can be reproduced from iterative optimization steps on the Lagrange function.
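This equivalence is easy to verify numerically. The sketch below (our check, with arbitrary test values of J, α, and γ) compares the right-hand side of Eq. 7 with a finite-difference gradient of the Lagrange function Eq. 9, absorbing the constant as 2κ = 1.

```python
import numpy as np

# Numerical check (ours) that Eq. 7 performs gradient descent on the
# Lagrange function of Eq. 9: the RHS of Eq. 7 should equal -kappa * dL/dc_i
# with 2*kappa = 1. J, alpha, gamma are arbitrary test values.

rng = np.random.default_rng(2)
n = 6
J = rng.standard_normal((n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)
alpha, gamma = 0.7, 1.3
c = rng.standard_normal(n)

def L(c):
    # Eq. 9: -sum_{i!=j} J_ij c_i c_j + sum_i alpha c_i^2 - sum_i gamma (c_i^2 - 1)
    return -c @ J @ c + alpha * (c @ c) - gamma * (c**2 - 1.0).sum()

def rhs_eq7(c):
    return J @ c - alpha * c + gamma * c   # Eq. 7, slowly varying amplitudes

eps = 1e-6
grad = np.array([(L(c + eps * np.eye(n)[i]) - L(c - eps * np.eye(n)[i]))
                 / (2 * eps) for i in range(n)])
print(np.allclose(rhs_eq7(c), -0.5 * grad, atol=1e-5))   # -> True
```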

4.C. Coupled Laser Cavities Using Multicore Fibers.

4.C.1. Overview. The Ising solver designed by Babaeian et al. (10) makes use of coupled laser modes in a multicore optical fiber. Polarized light in each core of the optical fiber corresponds to each magnetic moment in the Ising problem. The number of cores is equal to the number of magnets in the given Ising instance. The right-hand and left-hand circular polarization of the laser light in each core represent the two polarities (up and down) of the corresponding magnet. The mutual coherence of the various cores is maintained by injecting seed light from a master laser.

The coupling between the fiber cores is achieved through amplitude mixing of the laser modes by Spatial Light Modulators at one end of the multicore fiber (10). These Spatial Light Modulators couple light amplitude from the $i$th core to the $j$th core according to the prescribed connection weight $J_{ij}$.

4.C.2. Equations and comparison with Lagrange multipliers. As in prior physical examples, the dynamics can be expressed using slowly varying equations for the polarization modes of the $i$th core, $E_{iL}$ and $E_{iR}$, where the two electric-field amplitudes are in-phase temporally, are positive real, but have different polarization. They are

$$\frac{d}{dt} E_{iL} = -\alpha_i E_{iL} + \gamma_i E_{iL} + \frac{1}{2} \sum_{j=1, j \neq i}^{n} J_{ij} (E_{jR} - E_{jL}),$$

$$\frac{d}{dt} E_{iR} = -\alpha_i E_{iR} + \gamma_i E_{iR} - \frac{1}{2} \sum_{j=1, j \neq i}^{n} J_{ij} (E_{jR} - E_{jL}),$$

where $\alpha_i$ is the decay rate in the $i$th core, and $\gamma_i$ is the gain in the $i$th core. The third term on the right-hand side represents the coupling between the $j$th and $i$th cores that is provided by the Spatial Light Modulators. They next define the degree of polarization as $\mu_i \equiv E_{iL} - E_{iR}$. Subtracting the two equations above, we obtain the following evolution equation for $\mu_i$:

$$\frac{d}{dt} \mu_i = -\alpha_i \mu_i + \gamma_i \mu_i + \sum_{j=1, j \neq i}^{n} J_{ij} \mu_j \qquad [13]$$

where the electric fields are properly dimensionless and normalized as in Section 4.A. The power dissipation is proportional to $|E_{iL}|^2 + |E_{iR}|^2$. However, this can also be written as $|E_{iL} - E_{iR}|^2 + |E_{iL} + E_{iR}|^2 = |\mu_i|^2 + |E_{iL} + E_{iR}|^2$. The term $|E_{iL} + E_{iR}|^2$ can be regarded as relatively constant as energy switches back and forth between right and left circular polarization. Then, the power dissipation $h(\boldsymbol{\mu})$ would be most influenced by the quadratic terms in $\mu$:

$$h(\boldsymbol{\mu}, \boldsymbol{\gamma}) = \sum_{i=1}^{n} \alpha_i \mu_i^2 + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \mu_i \mu_j - \sum_{i=1}^{n} \gamma_i \mu_i^2.$$

As before, we add the $n$ digital constraints $g_i(\mu_i) = 1 - \mu_i^2 = 0$, where $\mu_i = \pm 1$ represents fully left or right circular polarization, and obtain the Lagrange function:

$$L(\boldsymbol{\mu}, \boldsymbol{\gamma}) = \sum_{i=1}^{n} \alpha_i \mu_i^2 + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \mu_i \mu_j - \sum_{i=1}^{n} \gamma_i (\mu_i^2 - 1). \qquad [14]$$

Once again, the gains $\gamma_i$ play the role of Lagrange multipliers. Thus, a minimization of the power dissipation, subject to the optical gain $\gamma_i$, solves the Ising problem defined by the same $J_{ij}$ couplings. In fact, the right-hand side of Eq. 13 is the gradient of Eq. 14, demonstrating that the dynamical system performs gradient descent on the Lagrange function.

The merit and constraint functions in the Lagrange function above are

$$f(\boldsymbol{\mu}) = \sum_{i=1}^{n} \alpha_i \mu_i^2 + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \mu_i \mu_j$$

$$g_i(\mu_i) = (1 - \mu_i^2) = 0, \quad \text{for } i = 1, 2, \ldots, n.$$

4.D. Coupled Electrical Oscillators on the Unit Circle.

4.D.1. Overview. We now consider a network of nonlinear, amplitude-stable electrical oscillators designed by Wang and Roychowdhury (11) to represent an Ising system for which we seek a digital solution with each dipole $\mu_{iz} = \pm 1$ along the z axis in the magnetic dipole space. Wang and Roychowdhury provide a dissipative system of LC oscillators with oscillation amplitude clamped and oscillation phase $\varphi_i = 0$ or $\pi$ revealing the preferred magnetic dipole orientation $\mu_{iz} = \pm 1$. It is noteworthy that Roychowdhury goes beyond Ising machines and constructs general digital logic gates using these amplitude-stable oscillators in ref. 31.

In their construction, Wang and Roychowdhury (11) use nonlinear elements that behave like negative resistors at low-voltage amplitudes but as saturating resistances at high-voltage amplitudes. This produces amplitude-stable oscillators. In addition, Wang and Roychowdhury (11) provide a second harmonic pump and use a form of parametric amplification (referred to as subharmonic injection locking in ref. 11) to obtain bistability with respect to phase.

With the amplitudes being essentially clamped, it is the readout of these phase shifts, 0 or π, that provides the magnetic dipole orientation $\mu_{iz} = \pm 1$. One key difference between this system and Yamamoto's system is that the latter had fast phase dynamics and slow amplitude dynamics, while Roychowdhury's system has the reverse.

4.D.2. Equations and comparison with Lagrange multipliers. Wang and Roychowdhury (11) derived the dynamics of their amplitude-stable oscillator network using perturbation concepts developed in ref. 32. While a circuit diagram is not shown, ref. 11 invokes the following dynamical equation for the phases of their electrical oscillators:

$$\frac{d\varphi_i}{dt} = -\sum_{j=1, j \neq i}^{n} J_{ij} \sin\big(\varphi_i(t) - \varphi_j(t)\big) - \lambda_i \sin\big(2 \varphi_i(t)\big), \qquad [15]$$

where $R_c$ is a coupling resistance in their system, $\varphi_i$ is the phase of the $i$th oscillator, and the $\lambda_i$ are decay parameters that dictate how fast the phase angles settle toward their steady-state values.

We now show that Eq. 15 can be reproduced by iteratively minimizing the power dissipation in their system. Power dissipation across a resistor $R_c$ is $(V_1 - V_2)^2 / R_c$, where $(V_1 - V_2)$ is the voltage difference. Since $V_1$ and $V_2$ are sinusoidal, the power dissipation consists of constant terms and a cross-term of the form

$$f(\varphi_1, \varphi_2) = \frac{|V|^2 \cos(\varphi_1 - \varphi_2)}{R_c},$$

where $f(\varphi_1, \varphi_2)$ is the power dissipated in the resistors. Magnetic dipole orientation parallel or antiparallel is represented by whether $\varphi_1 - \varphi_2 = 0$ or $\pi$, respectively. We may choose an origin for angle space at $\varphi = 0$, which implies $\varphi_i = 0$ or $\pi$. This can be implemented as




$$g_i(\varphi_i) = (\cos(2\varphi_i) - 1) = 0.$$

Combining the power dissipated in the resistors with the constraint function $g_i(\varphi_i) = 0$, we obtain a Lagrange function:

$$L(\boldsymbol{\varphi}, \boldsymbol{\lambda}) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \cos(\varphi_i - \varphi_j) + \sum_{i=1}^{n} \lambda_i \big(\cos(2\varphi_i) - 1\big) \qquad [16]$$

where $\lambda_i$ is the Lagrange multiplier corresponding to the phase-angle constraint, and the $J_{ij}$ are resistive coupling rate constants. The right-hand side of Eq. 15 is the gradient of Eq. 16, demonstrating that the dynamical system performs gradient descent on the Lagrange function.

The Lagrange function above is isomorphic with the general form in Section 3. The effective merit function $f$ and constraints $g_i$ in this correspondence are

$$f(\boldsymbol{\varphi}) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \cos(\varphi_i - \varphi_j)$$

$$g_i(\varphi_i) = (\cos(2\varphi_i) - 1) = 0, \quad \text{for } i = 1, 2, \ldots, n.$$
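A minimal integration of Eq. 15 (our sketch; the couplings and the binarization strengths λ_i are illustrative) shows the phases settling near 0 or π, from which the spins μ_iz = ±1 are read out.

```python
import numpy as np

# Sketch of the phase dynamics of Eq. 15 (parameters ours):
#   dphi_i/dt = -sum_{j != i} J_ij sin(phi_i - phi_j) - lam_i sin(2 phi_i).
# The second-harmonic term drives each phase toward 0 or pi (spin +1 or -1).

rng = np.random.default_rng(3)
n, dt, steps = 8, 1e-2, 30000
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1); J = J + J.T               # symmetric couplings, zero diagonal
lam = 5.0 * np.ones(n)                       # binarization strength
phi = rng.uniform(0, 2 * np.pi, n)           # random initial phases

for _ in range(steps):
    dphi = -(J * np.sin(phi[:, None] - phi[None, :])).sum(axis=1) \
           - lam * np.sin(2.0 * phi)
    phi += dt * dphi

spins = np.where(np.cos(phi) > 0, 1, -1)     # phase ~0 -> +1, phase ~pi -> -1
print(np.round(np.mod(phi, 2 * np.pi), 2), spins)
```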

4.E. Coupled Polariton Condensates.

4.E.1. Overview. Kalinin and Berloff (12) proposed a system consisting of coupled polariton condensates to minimize the XY Hamiltonian. The XY Hamiltonian is a two-dimensional version of the Ising Hamiltonian and is given by

$$H(\boldsymbol{\mu}) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij}\, \boldsymbol{\mu}_i \cdot \boldsymbol{\mu}_j$$

where $\boldsymbol{\mu}_i$ represents the magnetic moment vector of the $i$th spin restricted to the spin-space XY plane.

Kalinin and Berloff (12) pump a grid of coupled semiconductor microcavities with laser beams and observe the formation of strongly coupled exciton–photon states called polaritons. For our purposes, the polaritonic nomenclature is irrelevant. For us, these are simply coupled electromagnetic cavities that operate by the Principle of Minimum Power Dissipation similar to the previous cases. The complex electromagnetic amplitude in the $i$th microcavity can be written $E_i = c_i + j s_i$, where $c_i$ and $s_i$ represent the cosine and sine quadrature components of $E_i$, and $j$ is the unit imaginary. $c_i$ is mapped to the X-component of the magnetic dipole vector, and $s_i$ to the Y-component. The electromagnetic microcavity system settles into a state of minimum power dissipation as the laser pump and optical gain are ramped up to compensate for the intrinsic cavity losses. The phase angles in the complex plane of the final electromagnetic modes are then reported as the corresponding µ-magnetic moment angles in the XY plane.

Since the electromagnetic cavities experience phase-independent gain, this system does not seek phase bistability. We are actually searching for the magnetic dipole vector angles in the XY plane that minimize the corresponding XY magnetic energy.

4.E.2. Lagrange function and Lagrange multipliers. Ref. 12 uses "Ginzburg–Landau" equations to analyze their system, resulting in equations for the complex amplitudes $\Psi_i$ of the polariton wavefunctions. However, the $\Psi_i$ are actually the complex electric-field amplitudes $E_i$ (properly dimensionless and normalized as in Section 4.A) of the $i$th cavity. The electric-field amplitudes satisfy the slowly varying amplitude equation:

$$\frac{dE_i}{dt} = \left( \gamma_i - \alpha_i - \beta |E_i|^2 \right) E_i - iU |E_i|^2 E_i - \sum_{j=1, j \neq i}^{n} J_{ij} E_j \qquad [17]$$

where $\gamma_i$ is optical gain, $\alpha_i$ is linear optical loss, $\beta$ is nonlinear attenuation, $U$ is nonlinear phase shift, and the $J_{ij}$ are dissipative coupling rate constants. We note that both the amplitudes and phases of the electromagnetic modes are coupled to each other and evolve on comparable timescales. This is in contrast to ref. 11, where the main dynamics were embedded in phase (amplitude was fast and almost fixed) or, conversely, ref. 9, where the dynamics were embedded in amplitude (phase was fast and almost fixed).

We show next that the method of ref. 12 is essentially the method of Lagrange multipliers with an added "rotation." The power-dissipation rate is

$$h(\mathbf{E}) = -\frac{d}{dt} \sum_{i=1}^{n} \left( \frac{E_i^* + E_i}{2} \right)^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \left( E_i^* E_j + E_i E_j^* \right) + \sum_{i=1}^{n} \beta |E_i|^4 + \sum_{i=1}^{n} \alpha_i |E_i|^2 - \sum_{i=1}^{n} \gamma_i |E_i|^2.$$

If we add a saturation constraint, $g_i(E_i) = (1 - |E_i|^2) = 0$, then by analogy to the previous sections, $\gamma_i$ is reinterpreted as a Lagrange multiplier:

$$L(\mathbf{E}, \boldsymbol{\gamma}) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} J_{ij} \left( E_i^* E_j + E_i E_j^* \right) + \sum_{i=1}^{n} \beta |E_i|^4 + \sum_{i=1}^{n} \alpha_i |E_i|^2 - \sum_{i=1}^{n} \gamma_i \left( |E_i|^2 - 1 \right) \qquad [18]$$

where $L$ is the Lagrange function and $h$, $L$, and $f$ are the normalized power-dissipation functions (in watts divided by one joule). Thus, the scheme of coupled polaritonic resonators operates to find the state of minimum power dissipation in steady state, similar to the previous cases.

Dynamical Eq. 17 performs gradient descent on the Lagrange function Eq. 18 in conjunction with a rotation about the origin, $iU$. This rotation term, $iU$, is not captured by the Lagrange multiplier interpretation. It could, however, be useful in developing more sophisticated algorithms than the method of Lagrange multipliers, and we discuss this prospect in Section 5.B, where a system with a more general "rotation" term is discussed.

4.E.3. Iterative evolution of Lagrange multipliers. In the method of Lagrange multipliers, the merit-function Eq. 18 is used to optimize not only the electric-field amplitudes $E_i$ but also the Lagrange multipliers $\gamma_i$. The papers of the previous sections used simple heuristics to adjust their gains/decay constants, which we have shown to be Lagrange multipliers. Kalinin and Berloff (12) employ the Lagrange function itself to adjust the gains, as in the complete Lagrange method discussed next.

We introduce the full method of Lagrange multipliers by briefly shifting back to the notation of Section 3. The full Lagrange method finds the optimal x* and λ* by performing gradient descent of L in x and gradient ascent of L in λ. The reason for ascent in λ, rather than descent, is to more strictly penalize deviations from the constraint. This leads to the iterations

x_i(t + Δt) = x_i(t) − κΔt ∂L(x, λ)/∂x_i,    [19]

λ_i(t + Δt) = λ_i(t) + κ′Δt ∂L(x, λ)/∂λ_i,    [20]

where κ and κ′ are suitably chosen step sizes.


With our identification that the Lagrange multipliers λ are the same as the gains γ, we plug the Lagrange function Eq. 18 into the second iterative equation and take the limit Δt → 0. We obtain the following dynamical equation for the gains γ_i:

dγ_i/dt = κ′(1 − |E_i|²).    [21]

This iterative evolution of the Lagrange multipliers is indeed what Kalinin and Berloff (12) employ in their coupled polariton system.

To Eq. 21, we must add the iterative evolution of the field variables x_i:

dx_i/dt = −κ ∂L(x, λ)/∂x_i.    [22]

Eqs. 21 and 22 represent the full iterative evolution, but in some of the earlier subsections, γ_i(t) was assigned a heuristic time dependence.
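As a concrete illustration (not code from ref. 12), the following Python sketch integrates the field dynamics of Eq. 17 together with the gain update of Eq. 21 by forward Euler steps; the instance size, coupling matrix, and all parameter values are our own arbitrary choices, not tuned for performance.

```python
# Minimal sketch: forward-Euler integration of Eq. 17 (complex field
# dynamics) and Eq. 21 (Lagrange-multiplier/gain evolution).
# All parameters and the random coupling matrix are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 8
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                          # symmetric Ising couplings, zero diagonal

alpha, beta, U = 1.0, 0.2, 0.05      # linear loss, saturation, phase rotation
kappa_p = 0.5                        # gain update rate, kappa' in Eq. 21
dt, steps = 1e-3, 50_000

E = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
gamma = np.zeros(n)                  # gains, playing the role of multipliers

for _ in range(steps):
    dE = (gamma - alpha - beta * np.abs(E)**2) * E \
         - 1j * U * np.abs(E)**2 * E - J @ E          # Eq. 17
    gamma += dt * kappa_p * (1.0 - np.abs(E)**2)      # Eq. 21
    E += dt * dE

theta = np.angle(E)                  # XY magnetic-moment angles
xy_energy = 0.5 * np.sum(J * np.cos(theta[:, None] - theta[None, :]))
print(theta, xy_energy)
```

At convergence, |E_i|² → 1, the gain updates stop, and the reported phase angles θ_i give a local minimum of the XY energy, in the spirit of the discussion above.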

We conclude this subsection by splitting the Lagrange function into the effective merit function f and the constraint function g_i. The extra “phase rotation” U is not captured by this interpretation.

f(E_1, . . . , E_n) = (1/2) Σ_{i=1}^n Σ_{j=1, j≠i}^n J_ij (E_i* E_j + E_i E_j*) + Σ_{i=1}^n β|E_i|⁴ + Σ_{i=1}^n α_i|E_i|²

g_i(E_i) = (1 − |E_i|²) = 0, for i = 1, 2, . . . , n.

4.F. General Conclusions from Coupled Multioscillator Array Ising Solvers. 1) Physical systems minimize the power-dissipation rate subject to input constraints of voltage, amplitude, gain, etc. 2) These systems actually perform Lagrange multiplier optimization, with the gain γ_i playing the role of multiplier for the ith digital constraint. 3) Under the digital constraint, amplitudes c_i = ±1 or phases φ_i = 0 or π, power-dissipation minimization schemes are actually binary, similar to a flip–flop. 4) In many of the studied cases, the system time dependence follows gradient descent on the power-dissipation function as the system approaches a power-dissipation minimum. In one of the cases (Section 4.E), there was a rotation superimposed on this gradient descent.

5. Other Methods in the Literature

We now look at other methods in the literature that do not explicitly implement the method of Lagrange multipliers but nevertheless end up with dynamics that resemble it to varying extents. All of these methods offer operation regimes where the dynamics is not analogous to Lagrange multiplier optimization, and we believe it is an interesting avenue of future work to study the capabilities of these regimes.

5.A. Iterative Analog Matrix Multipliers. Soljacic and coworkers (13) developed an iterative procedure consisting of repeated matrix multiplication to solve the Ising problem. Their algorithm was implemented on a photonic circuit that utilized on-chip optical matrix multiplication units composed of Mach–Zehnder interferometers, which were first introduced for matrix algebra by Zeilinger and coworkers in ref. 33. Soljacic and coworkers (13) showed that their algorithm performed optimization on an effective merit function that is demonstrated to be a Lagrange function in SI Appendix, section 5.

Fig. 8. An optical circuit performing iterative multiplications converges on a solution of the Ising problem. Optical pulses are fed as input from the left-hand side at the beginning of each iteration, pass through the matrix multiplication unit (composed of 2×2 optical splitters and gain), and are passed back from the outputs to the inputs for the next iteration. Distributed optical gain sustains the iterations.

We use our insights from the previous sections to implement a simplified iterative optimization using an optical matrix multiplier. A block diagram of such a scheme is shown in Fig. 8. Let the multiple magnetic moment configuration of the Ising problem be represented as a vector of electric-field amplitudes, E_i, of the spatially separated optical modes. Each mode-field amplitude represents the value of one magnetic moment. In each iteration, the optical modes are fed into the optical circuit, which performs matrix multiplication, and the resulting output optical modes are then fed back to the optical circuit input for the next iteration. Optical gain, or some other type of gain, sustains the successive iterations.

We wish to design the matrix multiplication unit such that it has the following power-dissipation function:

h(E) = Σ_{i=1}^n α_i|E_i|² − Σ_{i=1}^n γ_i|E_i|² + (1/2) Σ_{i=1}^n Σ_{j=1, j≠i}^n J_ij (E_i* E_j + E_i E_j*).

The Lagrange function, including a binary constraint, |E_i|² = 1, is given by

L(E, γ) = Σ_{i=1}^n α_i|E_i|² − Σ_{i=1}^n γ_i (|E_i|² − 1) + (1/2) Σ_{i=1}^n Σ_{j=1, j≠i}^n J_ij (E_i* E_j + E_i E_j*),    [23]

where J_ij is the dissipative loss rate constant associated with electric-field interference between optical modes in the Mach–Zehnder interferometers, and γ_i is the optical gain.

The iterative multiplicative procedure that evolves the electric fields toward the minimum of the Lagrange function Eq. 23 is given by

E_i(t + 1) − E_i(t) = −κΔt (∂/∂E_i) [ Σ_{i=1}^n α_i|E_i(t)|² + Σ_{i=1}^n γ_i (1 − |E_i(t)|²) + (1/2) Σ_{i=1}^n Σ_{j=1, j≠i}^n J_ij (E_i*(t) E_j(t) + E_i(t) E_j*(t)) ],

where κ is a constant step size with the appropriate units, and each iteration involves taking steps in E_i proportional to the gradient ∂/∂E_i of the Lagrange function. (∂/∂E_i represents differentiation with respect to the two quadratures.)


Simplifying and sending all of the terms involving time step t to one side, we get

E_i(t + 1) = Σ_{j=1}^n [ (1 + 2κΔtγ_i − 2κΔtα_i) δ_ij − 2κΔt J_ij (1 − δ_ij) ] E_j(t),    [24]

where δ_ij is the Kronecker delta (1 only if i = j). The Mach–Zehnder interferometers should be tuned to the matrix [(1 + 2κΔtγ_i − 2κΔtα_i) δ_ij − 2κΔt J_ij (1 − δ_ij)]. Thus, we have an iterative matrix multiplier scheme that minimizes the Lagrange function of the Ising problem. In effect, a lump of dissipative optical circuitry, compensated by optical gain, will, in a series of iterations, settle into a solution of the Ising problem.
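In software, the iteration of Eq. 24 is just a repeated matrix–vector multiplication. The sketch below is our construction, not ref. 13’s algorithm: problem size, gain, loss, and step size are arbitrary illustrative values, and a crude renormalization stands in for the saturation that physical gain would enforce.

```python
# Minimal sketch of the Eq. 24 iteration: repeated application of a fixed
# matrix that lumps gain, loss, and the dissipative Ising couplings.
import numpy as np

rng = np.random.default_rng(1)
n = 8
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                                   # symmetric, zero diagonal

alpha, gamma = 0.1, 0.2                       # per-mode loss and gain
k_dt = 0.02                                   # kappa * Delta-t

M = (1 + 2 * k_dt * (gamma - alpha)) * np.eye(n) - 2 * k_dt * J   # Eq. 24

E = 0.1 * rng.standard_normal(n)              # input mode amplitudes
for _ in range(500):
    E = M @ E                                 # one pass through the circuit
    E /= np.abs(E).max()                      # stand-in for gain saturation

spins = np.sign(E)
print(spins, spins @ J @ spins / 2)           # readout and its coupling energy
```

With a uniform, fixed gain γ, this linear iteration relaxes toward the dominant eigenvector of the Eq. 24 matrix, a spectral relaxation of the Ising problem; adjusting the individual γ_i according to Eq. 21 would instead drive each |E_i| toward the binary constraint.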

The simple system above differs from that of Soljacic and coworkers (13) in that their method has added noise and nonlinear thresholding in each iteration. A detailed description of their approach is presented in SI Appendix, section 5.

5.B. Leleu Mathematical Ising Solver. Leleu et al. (8) proposed a modified version of Yamamoto’s Ising machine (5) that significantly resembles the Lagrange method while incorporating important new features. To understand the similarities and differences between Leleu’s method and that of Lagrange multipliers, we recall the Lagrange function for the Ising problem that we encountered in Section 4:

L(x, γ) = Σ_{i=1}^n Σ_{j=1, j≠i}^n J_ij x_i x_j + Σ_{i=1}^n α_i x_i² + Σ_{i=1}^n γ_i (1 − x_i²).    [25]

In the above, x_i are the optimization variables, J_ij is the interaction matrix, γ_i is the gain provided to the ith variable, and α_i is the loss experienced by the ith variable. To find a local optimum (x*, γ*) that satisfies the constraints, one performs gradient descent on the Lagrange function in the x variables and gradient ascent in the γ variables, as discussed in Section 4.E, Eqs. 19 and 20. Substituting Eq. 25 into them and taking the limit of Δt → 0, we get

dx_i/dt = 2κ [ (−α_i + γ_i) x_i − Σ_{j=1, j≠i}^n J_ij x_j ],    [26]

dγ_i/dt = κ′(1 − x_i²).    [27]

On the other hand, Leleu et al. (8) propose the following system:

dx_i/dt = (−α + γ) x_i + e_i Σ_{j=1, j≠i}^n J_ij x_j,    [28]

de_i/dt = β(1 − x_i²) e_i,    [29]

where the x_i are the optimization variables, α is the loss experienced by each variable, γ is a common gain supplied to each variable, β is a positive parameter, and the e_i are error coefficients that capture how far away each x_i is from its saturation amplitude. Leleu et al. also had cubic terms in x_i in ref. 8, and a discussion of these terms is given in SI Appendix, section 3.
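For concreteness, the following sketch integrates Eqs. 28 and 29 by forward Euler steps. The parameters and the random instance are our own illustrative choices, not values from ref. 8, and the cubic terms are omitted as in the text; note also that Eq. 28 couples through +J where Eq. 26 uses −J, a sign convention that can be absorbed into the definition of J.

```python
# Minimal sketch of Leleu's dynamics, Eqs. 28 and 29: each variable x_i
# feels the Ising coupling scaled by its own error coefficient e_i.
import numpy as np

rng = np.random.default_rng(2)
n = 8
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                            # symmetric, zero diagonal

alpha, gamma, beta = 1.0, 0.9, 0.3     # loss, common gain, error-update rate
dt, steps = 1e-3, 50_000

x = 0.01 * rng.standard_normal(n)      # optimization variables
e = np.ones(n)                         # error coefficients, Eq. 29

for _ in range(steps):
    dx = (-alpha + gamma) * x + e * (J @ x)    # Eq. 28 (+J sign convention)
    de = beta * (1.0 - x**2) * e               # Eq. 29
    x += dt * dx
    e += dt * de

spins = np.sign(x)
print(spins, spins @ J @ spins / 2)    # readout and its coupling sum
```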

It is clear that there are significant similarities between Leleu’s system and the Lagrange multiplier system. The optimization variables in both systems experience linear losses and gains and have interaction terms that capture the Ising interaction. Both systems have auxiliary variables that are varied according to how far away each degree of freedom is from its preferred saturation amplitude. However, the similarities end here.

A major differentiation in Leleu’s system is that e_i multiplies the Ising interaction felt by the ith variable, resulting in e_i J_ij. The complementary coefficient is e_j J_ij. Consequently, Leleu’s equations implement asymmetric interactions e_i J_ij ≠ e_j J_ij between vector components x_i and x_j. The inclusion of asymmetry seems to be important because Leleu’s system achieves excellent performance on the Gset problem set, as demonstrated in ref. 8.

We obtain some intuition about this system by splitting the asymmetric term e_i J_ij into a symmetric and an antisymmetric part. This follows from the fact that any matrix A can be written as the sum of a symmetric matrix, (A + Aᵀ)/2, and an antisymmetric matrix, (A − Aᵀ)/2. The symmetric part leads to gradient descent dynamics similar to all of the systems in The Physical Ising Solvers. The antisymmetric part causes an energy-conserving “rotary” motion in the vector space of x_i.
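This decomposition takes only a few lines to verify numerically; the sketch below uses arbitrary values, and the norm-conservation check is the standard property of antisymmetric flows.

```python
# Splitting the asymmetric coupling A_ij = e_i * J_ij into a symmetric
# (gradient) part and an antisymmetric (rotational) part.
import numpy as np

rng = np.random.default_rng(3)
n = 4
e = rng.uniform(0.5, 1.5, size=n)      # error coefficients
J = rng.standard_normal((n, n))
J = np.triu(J, 1)
J = J + J.T                            # symmetric Ising couplings

A = e[:, None] * J                     # asymmetric interaction e_i * J_ij
S = (A + A.T) / 2                      # symmetric part -> gradient descent
K = (A - A.T) / 2                      # antisymmetric part -> rotation
assert np.allclose(A, S + K)

# The antisymmetric flow dx/dt = K x conserves |x|^2, since
# d|x|^2/dt = 2 x.(K x) = 0 for K = -K^T.
x = rng.standard_normal(n)
print(float(x @ (K @ x)))              # ~0, up to floating-point error
```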

The secret of Leleu et al.’s (8) improved performance seems to lie in this antisymmetric part. The dynamical freedom associated with asymmetry might provide a fruitful future research direction in optimization and deserves further study to ascertain its power.

6. Applications in Linear Algebra and Statistics

We have seen that minimum power-dissipation solvers can address the Ising problem and similar problems like the traveling salesman problem. In this section, we provide yet another application of minimum power-dissipation solvers to an optimization problem that appears frequently in statistics, namely curve fitting. In particular, we note that the problem of linear least-squares regression, linear curve fitting with a quadratic merit function, resembles the Ising problem. In fact, the electrical circuit example we presented in Section 4.B can be applied to linear regression. We present such a circuit in this section. Our circuit provides a digital answer but requires a series of binary resistance values, that is, . . . , 2R_0, R_0, 0.5R_0, . . . , to represent arbitrary binary statistical input observations.

Fig. 9. A 2-bit linear regression circuit to find the best two curve-fitting weights w_d, using the Principle of Minimum Power Dissipation.

The objective of linear least-squares regression is to fit a linear function to a given set of data {(x_1, y_1), (x_2, y_2), (x_3, y_3), . . . , (x_n, y_n)}. The x_i are input vectors of dimension d, while the y_i are the observed outputs that we want our regression to capture. The linear function that is being fit is of the form y(a) = Σ_{i=1}^d w_i a_i, where a is a feature vector of length d, and w is a vector of unknown weights. The vector w is calculated by minimizing the sum of the squared errors it causes when used on an actual dataset:


w* = argmin_w Σ_{i=1}^n [ ( Σ_{j=1}^d w_j x_ij ) − y_i ]²,

where x_ij is the jth component of the vector x_i. This functional form is identical to the Ising Hamiltonian, and we may construct an Ising circuit with J_ij = Σ_{k=1}^n x_ki x_kj, with the weights w acting like the unknown magnetic moments. There is an effective magnetic field in the problem, h_i = −2 Σ_{j=1}^n x_ji y_j. A simple circuit that solves this problem for d = 2 (each instance has two features) is provided in Fig. 9. This circuit provides weights to 2-bit precision.

The oscillators on the left-hand side of Fig. 9 represent the 2⁰ and 2¹ bits of the first weight, while the oscillators on the other side represent the second weight.

The cross-resistance R that one would need to represent the J_ij that connects the ith and jth oscillators is calculated as

1/R = b_1/R_{−1} + b_0/R_0 + b_{−1}/R_1,

where R_m = 2^m R_0 is a binary hierarchy of resistances based on a reference resistor R_0, and b_m are the bits of J_ij: J_ij = b_1 × 2¹ + b_0 × 2⁰ + b_{−1} × 2⁻¹. This represents J_ij to 3-bit precision using resistors that span a dynamic range 2² = 4. Further, the sign of the coupling is allotted according to whether the resistors R are parallel-connected or cross-connected. In operation, the resistors R would be externally programmed to the correct binary values, with many more bits than 3-bit precision, as given by the matrix product J_ij = Σ_{k=1}^n x_ki x_kj.
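The mapping from regression data to circuit couplings is a direct computation, as the sketch below checks numerically (the two-feature instance is our own synthetic example, not a dataset from the text): minimizing w·(Jw) + h·w recovers the ordinary least-squares solution.

```python
# Checking the regression-to-Ising mapping: with J_ij = sum_k x_ki x_kj and
# h_i = -2 sum_j x_ji y_j, the quadratic form w.(J w) + h.w equals the
# squared error up to a w-independent constant, so both share a minimizer.
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 2                           # 50 observations, 2 features
X = rng.standard_normal((n, d))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.standard_normal(n)

J = X.T @ X                            # J_ij = sum_k x_ki x_kj
h = -2.0 * X.T @ y                     # effective magnetic field h_i

w_ising = np.linalg.solve(2.0 * J, -h)          # stationary point of w.Jw + h.w
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares
assert np.allclose(w_ising, w_lstsq)
print(w_ising)
```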

We have just solved the regression problem of the form Xw = y, where the matrix X and the vector y were known measurements and the corresponding best-fitting weight vector w was the unknown. We conclude by noting that this same procedure can be adopted to solve linear systems of equations of the form Xw = y.

7. Discussion and Conclusion

Physics obeys a number of optimization principles, such as the Principle of Least Action, the Principle of Minimum Power Dissipation (also called Minimum Entropy Generation), the Variational Principle, Physical Annealing, and the Adiabatic Principle (which, in its quantum form, is called Quantum Annealing).

Optimization is important in diverse fields, ranging from scheduling and routing in operations research to protein folding in biology, portfolio optimization in finance, and energy minimization in physics. In this article, we made the observation that physics has optimization principles at its heart and that they can be exploited to design fast, low-power digital solvers that avoid the limits of standard computational paradigms. Nature thus provides us with a means to solve optimization problems in all of these areas, including engineering, artificial intelligence, machine learning (backpropagation), Control Theory, and reinforcement learning.

We reviewed seven physical machines that purported to solve the Ising problem and found that six of the seven were performing Lagrange multiplier optimization; further, they also obey the Principle of Minimum Power Dissipation (always subject to a power-input constraint). This means that, by appropriate choice of parameter values, these physical solvers can be used to perform Lagrange multiplier optimization orders of magnitude faster and with lower power than conventional digital computers. This performance advantage can be utilized for optimization in machine-learning applications where energy and time considerations are critical.

The following questions arise: What are the action items? What is the most promising near-term application? All of the hardware approaches seem to work comparably well. The easiest to implement would be the electrical oscillator circuits, although the optical oscillator arrays can be compact and very fast. Electrically, there would be two integrated circuits: the oscillator array and the connecting resistors, which would need to be reprogrammed for different problems. The action item could be to design the first chip, consisting of about 1,000 oscillators, and a second chip that would consist of the appropriate coupling resistor array for a specific optimization problem. The resistors should be in an addressable binary hierarchy so that any desired resistance value can be programmed in by switches, within the number-of-bits accuracy. It is possible to imagine solving a new Ising problem every millisecond by reprogramming the resistor chip.

On the software side, a compiler would need to be developed to go from an unsolved optimization problem to the resistor array that matches the desired goal. If the merit function were mildly nonlinear, we believe that the Principle of Minimum Power Dissipation would still hold, but there has been less background science justifying that claim.

With regard to the most promising near-term application, it might be in Control Theory or in reinforcement learning in self-driving vehicles, where rapid answers are required at modest power dissipation.

The act of computation can be regarded as a search among many possible answers, with the circuit finally converging to a correct configuration. Thus, the initial conditions may include a huge phase-space volume of 2^n possible solutions, ultimately transitioning into a final configuration representing a small- or modest-sized binary number. This type of computing implies a substantial entropy reduction. This led to Landauer’s admonition that computation costs kn log 2 of entropy decrease and kTn log 2 of energy, for a final answer with n binary digits.

By the Second Law of Thermodynamics, such an entropy reduction must be accompanied by an entropy increase elsewhere. In Landauer’s viewpoint, the energy and entropy limit of computing was associated with the final act of writing out the answer in n bits, assuming the rest of the computer was reversible. In practice, technology consumes ∼10⁴ times more than the Landauer limit, owing to the insensitivity of transistors operating at ∼1 V when they could be operating at ∼10 mV.
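As a rough numerical check of these figures (our arithmetic, not a calculation from the text): kT ln 2 per bit at room temperature, together with the quadratic scaling of switching energy with supply voltage, reproduces the quoted ∼10⁴ gap.

```python
# Back-of-envelope numbers for the Landauer discussion above.
import math

k_B = 1.380649e-23                   # Boltzmann constant, J/K
T = 300.0                            # room temperature, K

E_bit = k_B * T * math.log(2)        # Landauer energy per erased bit
print(f"{E_bit:.2e} J per bit")      # ~2.87e-21 J

# CMOS switching energy scales as C*V^2, so running at ~1 V instead of
# ~10 mV costs a factor of (1 V / 10 mV)^2 = 1e4 in energy.
print((1.0 / 10e-3) ** 2)            # 1e4
```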

In the continuously dissipative circuits we have described here, the energy consumed would be infinite if we waited long enough for the system to reach the final optimal state. If we terminate the powering of our optimizer systems after they reach the desired final-state answer, the energy consumed becomes finite. By operating at voltage <1 V and by powering off after the desired answer is achieved, our continuously dissipating Lagrange optimizers could actually be closer to the Landauer limit than a conventional computer.

A controversial point relates to the quality of solutions that are obtained for NP-hard problems. The physical systems we are proposing evolve by steepest descent toward a local optimum, not a global optimum. Nonetheless, many of the authors of the seven physical systems presented here have claimed to find better local optima than their competitors, due to special adjustments in their methods. Undoubtedly, some improvements are possible, but none of the seven papers reviewed here claims to always find the one global optimum, which would be NP-hard (34).

We have shown that a number of physical systems that perform optimization are acting through the Principle of Minimum Power Dissipation, although other physics principles could also fulfill this goal. As the systems evolve toward an extremum, they perform Lagrange function optimization, where the Lagrange multipliers are given by the gain or loss coefficients that keep the machine running. Thus, nature provides us with a series of physical optimization machines that are much faster and possibly more energy-efficient than conventional computers.



Data Availability. All study data are included in the article and SI Appendix.

ACKNOWLEDGMENTS. We gratefully acknowledge useful discussions with Dr. Ryan Hamerly, Dr. Tianshi Wang, and Prof. Jaijeet Roychowdhury. The work of S.K.V., T.P.X., and E.Y. was supported by the NSF through the Center for Energy Efficient Electronics Science (E3S) under Award ECCS-0939514 and the Office of Naval Research under Grant N00014-14-1-0505.

1. Y. Shen et al., Deep learning with coherent nanophotonic circuits. Nat. Photonics 11, 441–446 (2017).
2. L. Onsager, Reciprocal relations in irreversible processes. II. Phys. Rev. 38, 2265–2279 (1931).
3. A. Lucas, Ising formulations of many NP problems. Front. Phys. 2, 5 (2014).
4. S. Utsunomiya, K. Takata, Y. Yamamoto, Mapping of Ising models onto injection-locked laser systems. Opt. Express 19, 18091–18108 (2011).
5. Y. Haribara, S. Utsunomiya, Y. Yamamoto, Computational principle and performance evaluation of coherent Ising machine based on degenerate optical parametric oscillator network. Entropy 18, 151 (2016).
6. T. Inagaki et al., Large-scale Ising spin network based on degenerate optical parametric oscillators. Nat. Photonics 10, 415–419 (2016).
7. T. Inagaki et al., A coherent Ising machine for 2000-node optimization problems. Science 354, 603–606 (2016).
8. T. Leleu, Y. Yamamoto, P. L. McMahon, K. Aihara, Destabilization of local minima in analog spin systems by correction of amplitude heterogeneity. Phys. Rev. Lett. 122, 040607 (2019).
9. T. P. Xiao, “Optoelectronics for refrigeration and analog circuits for combinatorial optimization,” PhD thesis, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA (2019).
10. M. Babaeian et al., A single shot coherent Ising machine based on a network of injection-locked multicore fiber lasers. Nat. Commun. 10, 3516 (2019).
11. T. Wang, J. Roychowdhury, “OIM: Oscillator-based Ising machines for solving combinatorial optimisation problems” in Unconventional Computation and Natural Computation, I. McQuillan, S. Seki, Eds. (Lecture Notes in Computer Science, Springer International Publishing, Cham, Switzerland, 2019), vol. 11493, pp. 232–256.
12. K. P. Kalinin, N. G. Berloff, Global optimization of spin Hamiltonians with gain-dissipative systems. Sci. Rep. 8, 17791 (2018).
13. C. Roques-Carmes et al., Heuristic recurrent algorithms for photonic Ising machines. Nat. Commun. 11, 249 (2020).
14. S. Mahler, M. L. Goh, C. Tradonsky, A. A. Friesem, N. Davidson, Improved phase locking of laser arrays with nonlinear coupling. Phys. Rev. Lett. 124, 133901 (2020).
15. D. Pierangeli, G. Marcucci, C. Conti, Large-scale photonic Ising machine by spatial light modulation. Phys. Rev. Lett. 122, 213902 (2019).
16. M. Ercsey-Ravasz, Z. Toroczkai, Optimization hardness as transient chaos in an analog approach to constraint satisfaction. Nat. Phys. 7, 966–970 (2011).
17. B. Molnar, F. Molnar, M. Varga, Z. Toroczkai, M. Ercsey-Ravasz, A continuous-time MaxSAT solver with high analog performance. Nat. Commun. 9, 4864 (2018).
18. F. L. Traversa, M. Di Ventra, Polynomial-time solution of prime factorization and NP-complete problems with digital memcomputing machines. Chaos 27, 023107 (2017).
19. W. Maass, T. Natschlager, H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput. 14, 2531–2560 (2002).
20. G. Tanaka et al., Recent advances in physical reservoir computing: A review. Neural Netw. 115, 100–123 (2019).
21. H. Goto, K. Tatsumura, A. R. Dixon, Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems. Sci. Adv. 5, eaav2372 (2019).
22. W. A. Borders et al., Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393 (2019).
23. I. Prigogine, Etude Thermodynamique des Phenomenes Irreversibles (Editions Desoer, Liege, 1947), chap. V.
24. S. de Groot, Thermodynamics of Irreversible Processes (Interscience Publishers, New York, NY, 1951), chap. X.
25. N. G. Dickson et al., Thermally assisted quantum annealing of a 16-qubit problem. Nat. Commun. 4, 1903 (2013).
26. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, 2004).
27. D. Bertsekas, Nonlinear Programming (Athena Scientific, 1999).
28. Index of /~yyye/yyye/Gset. https://web.stanford.edu/~yyye/yyye/Gset/. Accessed 21 September 2020.
29. U. Benlic, J. K. Hao, Breakout local search for the max-cut problem. Eng. Appl. Artif. Intell. 26, 1162–1173 (2013).
30. P. L. McMahon et al., A fully programmable 100-spin coherent Ising machine with all-to-all connections. Science 354, 614–617 (2016).
31. J. Roychowdhury, Boolean computation using self-sustaining nonlinear oscillators. Proc. IEEE 103, 1958–1969 (2015).
32. A. Demir, A. Mehrotra, J. Roychowdhury, Phase noise in oscillators: A unifying theory and numerical methods for characterization. IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 47, 655–674 (2000).
33. M. Reck, A. Zeilinger, H. J. Bernstein, P. Bertani, Experimental realization of any discrete unitary operator. Phys. Rev. Lett. 73, 58–61 (1994).
34. R. M. Karp, “Reducibility among combinatorial problems” in Complexity of Computer Computations, R. E. Miller, J. W. Thatcher, Eds. (Springer, 1972), pp. 85–103.
