+ All Categories
Home > Documents > IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1...

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1...

Date post: 01-Apr-2020
Category:
Upload: others
View: 2 times
Download: 0 times
Share this document with a friend
12
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 Predictive Control Using an FPGA With Application to Aircraft Control Edward Nicholas Hartley, Juan Luis Jerez, Student Member, IEEE, Andrea Suardi, Jan M. Maciejowski, Fellow, IEEE , Eric C. Kerrigan, Member, IEEE, and George A. Constantinides, Senior Member, IEEE Abstract— Alternative and more efficient computational meth- ods can extend the applicability of model predictive control (MPC) to systems with tight real-time requirements. This paper presents a system-on-a-chip MPC system, implemented on a field-programmable gate array (FPGA), consisting of a sparse structure-exploiting primal dual interior point (PDIP) quadratic program (QP) solver for MPC reference tracking and a fast gradient QP solver for steady-state target calculation. A parallel reduced precision iterative solver is used to accelerate the solution of the set of linear equations forming the computational bottleneck of the PDIP algorithm. A numerical study of the effect of reducing the number of iterations highlights the effectiveness of the approach. The system is demonstrated with an FPGA-in- the-loop testbench controlling a nonlinear simulation of a large airliner. This paper considers many more manipulated inputs than any previous FPGA-based MPC implementation to date, yet the implementation comfortably fits into a midrange FPGA, and the controller compares well in terms of solution quality and latency to state-of-the-art QP solvers running on a standard PC. Index Terms— Aerospace control, field-programmable gate arrays (FPGAs), optimization methods, predictive control. I. I NTRODUCTION A STRENGTH of model predictive control (MPC) is its ability to systematically handle constraints [1]–[4]. At each sampling instant, the solution of a constrained finite- horizon optimal control problem is used as the basis for the control action applied to the plant. With a linear prediction model, a convex quadratic cost function and linear inequality constraints on the plant states and inputs, this can be posed as a convex quadratic program (QP). For MPC with offset- free reference tracking, it is commonplace to also calculate a steady-state equilibrium target to which the MPC regulates Manuscript received November 12, 2012; revised May 1, 2013; accepted June 16, 2013. Manuscript received in final form June 27, 2013. This work was supported in part by the EPSRC under Grant EP/G030308/1, Grant EP/G031576/1, and Grant EP/I012036/1, and the EU FP7 Project EMBOCON under Grant FP7-ICT-2009-4 248940, Xilinx, MathWorks, and the European Space Agency. Recommended by Associate Editor N. E. Wu. E. N. Hartley and J. M. Maciejowski are with the Department of Engineering, Cambridge University, Cambridge CB2 1PZ, U.K. (e-mail: [email protected]; [email protected]). J. L. Jerez, A. Suardi, and G. A. Constantinides are with the Department of Electrical and Electronic Engineering, Imperial College, London SW7 2AZ, U.K. (e-mail: [email protected]; [email protected]; [email protected]). E. C. Kerrigan is with the Department of Aeronautics and the Department of Electrical and Electronic Engineering, Imperial College, London SW7 2AZ, U.K. (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCST.2013.2271791 [5]–[7]. For a system with redundant actuators, this requires the solution of an additional (albeit smaller) constrained opti- mization problem. For small predictive control problems, an attractive option is to precompute the explicit solution of the QP as a piecewise-affine function of the current state (and reference) using multiparametric programming [8], [9]. For large pre- dictive control problems, the computation and storage of the explicit solution is impractical, and the constrained opti- mization problem must be solved online. For a broad range of control applications that could benefit from employing predictive control, the cost and power requirements of general purpose computing platforms necessary to meet hard real-time requirements are unfavorable, and custom circuits designed specifically for the predictive control application are an attrac- tive alternative. In this paper, a field-programmable gate array (FPGA)-based implementation of an MPC controller for a large airliner is demonstrated. Tracking of roll, pitch and airspeed are achieved through the use of a steady-state target calculator and MPC regulator [5], [6], each requiring the solution of a constrained QP. While arguments for the use of MPC in flight control can be found in [10]–[12], the focus of this paper is on the FPGA-based methodology for implementation of the optimization scheme rather than tuning controller parameters or obtaining formal certificates of stability, for which a mature body of theory is already available [3], [13]. A. Background and Motivation FPGAs are reconfigurable chips that can be customized for a specific application. This enables the implementation of com- plex algorithms exploiting wide parallelization. For predictive control applications, FPGAs are easily embedded as a system component, and offer cycle-accurate timing guarantees. The low clock frequencies in comparison with high-performance microprocessors can translate to lower power consumption. Computational speed is regained through parallelization and customizability. Transistor scaling is still improving the per- formance and reducing the power consumption and cost in each new technology generation. There have been several previous FPGA implementations of QP solvers for predictive control. In [21], speeds comparable with MATLAB implementations were achieved for an aircraft example with four states, by using a high-level synthesis tool to convert a C-like description into a hardware description. 1063-6536/$31.00 © 2013 IEEE
Transcript
Page 1: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1

Predictive Control Using an FPGA WithApplication to Aircraft Control

Edward Nicholas Hartley, Juan Luis Jerez, Student Member, IEEE, Andrea Suardi,Jan M. Maciejowski, Fellow, IEEE, Eric C. Kerrigan, Member, IEEE,

and George A. Constantinides, Senior Member, IEEE

Abstract— Alternative and more efficient computational meth-ods can extend the applicability of model predictive control(MPC) to systems with tight real-time requirements. This paperpresents a system-on-a-chip MPC system, implemented on afield-programmable gate array (FPGA), consisting of a sparsestructure-exploiting primal dual interior point (PDIP) quadraticprogram (QP) solver for MPC reference tracking and a fastgradient QP solver for steady-state target calculation. A parallelreduced precision iterative solver is used to accelerate thesolution of the set of linear equations forming the computationalbottleneck of the PDIP algorithm. A numerical study of the effectof reducing the number of iterations highlights the effectivenessof the approach. The system is demonstrated with an FPGA-in-the-loop testbench controlling a nonlinear simulation of a largeairliner. This paper considers many more manipulated inputsthan any previous FPGA-based MPC implementation to date,yet the implementation comfortably fits into a midrange FPGA,and the controller compares well in terms of solution quality andlatency to state-of-the-art QP solvers running on a standard PC.

Index Terms— Aerospace control, field-programmable gatearrays (FPGAs), optimization methods, predictive control.

I. INTRODUCTION

ASTRENGTH of model predictive control (MPC) isits ability to systematically handle constraints [1]–[4].

At each sampling instant, the solution of a constrained finite-horizon optimal control problem is used as the basis for thecontrol action applied to the plant. With a linear predictionmodel, a convex quadratic cost function and linear inequalityconstraints on the plant states and inputs, this can be posedas a convex quadratic program (QP). For MPC with offset-free reference tracking, it is commonplace to also calculatea steady-state equilibrium target to which the MPC regulates

Manuscript received November 12, 2012; revised May 1, 2013; acceptedJune 16, 2013. Manuscript received in final form June 27, 2013. This workwas supported in part by the EPSRC under Grant EP/G030308/1, GrantEP/G031576/1, and Grant EP/I012036/1, and the EU FP7 Project EMBOCONunder Grant FP7-ICT-2009-4 248940, Xilinx, MathWorks, and the EuropeanSpace Agency. Recommended by Associate Editor N. E. Wu.

E. N. Hartley and J. M. Maciejowski are with the Department ofEngineering, Cambridge University, Cambridge CB2 1PZ, U.K. (e-mail:[email protected]; [email protected]).

J. L. Jerez, A. Suardi, and G. A. Constantinides are with the Department ofElectrical and Electronic Engineering, Imperial College, London SW7 2AZ,U.K. (e-mail: [email protected]; [email protected];[email protected]).

E. C. Kerrigan is with the Department of Aeronautics and the Department ofElectrical and Electronic Engineering, Imperial College, London SW7 2AZ,U.K. (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCST.2013.2271791

[5]–[7]. For a system with redundant actuators, this requiresthe solution of an additional (albeit smaller) constrained opti-mization problem.

For small predictive control problems, an attractive optionis to precompute the explicit solution of the QP as apiecewise-affine function of the current state (and reference)using multiparametric programming [8], [9]. For large pre-dictive control problems, the computation and storage ofthe explicit solution is impractical, and the constrained opti-mization problem must be solved online. For a broad rangeof control applications that could benefit from employingpredictive control, the cost and power requirements of generalpurpose computing platforms necessary to meet hard real-timerequirements are unfavorable, and custom circuits designedspecifically for the predictive control application are an attrac-tive alternative. In this paper, a field-programmable gatearray (FPGA)-based implementation of an MPC controllerfor a large airliner is demonstrated. Tracking of roll, pitchand airspeed are achieved through the use of a steady-statetarget calculator and MPC regulator [5], [6], each requiringthe solution of a constrained QP. While arguments for theuse of MPC in flight control can be found in [10]–[12],the focus of this paper is on the FPGA-based methodologyfor implementation of the optimization scheme rather thantuning controller parameters or obtaining formal certificates ofstability, for which a mature body of theory is already available[3], [13].

A. Background and Motivation

FPGAs are reconfigurable chips that can be customized for aspecific application. This enables the implementation of com-plex algorithms exploiting wide parallelization. For predictivecontrol applications, FPGAs are easily embedded as a systemcomponent, and offer cycle-accurate timing guarantees. Thelow clock frequencies in comparison with high-performancemicroprocessors can translate to lower power consumption.Computational speed is regained through parallelization andcustomizability. Transistor scaling is still improving the per-formance and reducing the power consumption and cost ineach new technology generation.

There have been several previous FPGA implementations ofQP solvers for predictive control. In [21], speeds comparablewith MATLAB implementations were achieved for an aircraftexample with four states, by using a high-level synthesis toolto convert a C-like description into a hardware description.

1063-6536/$31.00 © 2013 IEEE

Page 2: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

2 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

TABLE I

CHARACTERISTICS OF EXISTING FPGA-BASED QP SOLVER IMPLEMENTATIONS

A soft-core (sequential) processor implemented on the FPGAfabric was used in [20] to execute a C implementation ofthe QP solver and demonstrate the performance on a two-state drive-by-wire system. In [22] and [23], a mixed soft-ware/hardware implementation is used where the core matrixcomputations are carried out in parallel custom hardware,while the remaining operations are implemented in a generalpurpose microprocessor. The performance was evaluated ontwo-state systems. In contrast, in [17] the linear solvers wereimplemented in software while custom accelerators were usedfor the remaining operations. A motor servo system with twostates was used as a case study. The use of nonstandardnumber representations was studied in [15] with a hybridfixed-point floating-point architecture and a nonstandard MPCformulation tested on a satellite example with six states. While[14] presents a full fixed-point implementation, no analysisor guarantees are provided for handling the large dynamicrange manifested in interior-point methods. Recently, active-set [18] and interior-point [16], [19] architectures were pro-posed using (very) reduced precision floating-point arithmeticand solving a condensed QP. Feasibility was demonstrated onan experimental setup with a 14th-order SISO open-loop stablevibrating beam, with impressive computation times. Manyonline optimization-based FPGA controllers have been forlow-dimensional systems—a domain in which explicit MPCcould also potentially be an option. In [24] a summary ofFPGA-based MPC implementations up until 2010 is presented.Table I in this paper shows a survey of more recent devel-opments, albeit highlighting little progress. A common trendis the use of dense QP formulations in contrast with thecurrent trends in research for structure-exploiting optimizationalgorithms for predictive control.

B. Summary of Contribution

This paper considers a plant model with significantly moremanipulated inputs than prior FPGA-based implementations.In addition, the sparse structure of the uncondensed opti-mization problem is exploited and unlike prior FPGA-basedimplementations of MPC, the steady-state target calculator foroffset free control is included on-chip as a separate constrainedQP. The implementation is also capable of running reliably athigher clock rates than prior designs. Table II summarizes keycharacteristics of the design.

1) MPC Regulator: The MPC regulator is based on amodification to a VHDL-based design, first presented in [24],[25] and applied in [26]. A primal-dual interior-point (PDIP)algorithm is used to solve the constrained QP. A parallelimplementation of the minimum residual (MINRES) algorithmis used to solve the system of linear equations occurring ateach iteration of the PDIP algorithm. Unlike many embeddedmicroprocessors, an FPGA does not preclude standard doubleprecision arithmetic. However, single precision arithmetic isused to reduce the number of hardware resources required,since this reduces the size, cost and power consumption ofthe FPGA device needed to realize the design.

The MINRES algorithm is known to be sensitive to numer-ical precision and conditioning, and a preconditioner is oftenneeded to accelerate convergence and to improve the qualityof the converged solution. In [24] and [25] no precondition-ing was used, while in [26] a scaling of the state spaceprediction matrices was demonstrated as an effective offlinepreconditioner.

To demonstrate the tradeoff between control performanceand solution time, a numerical study is performed to inves-tigate the compromise between the number of iterationsof the inner MINRES algorithm, and the effect of offlinemodel scaling and online matrix preconditioning. All of thesedirectly influence the total solution time and the solutionquality, both in terms of fidelity of the computed controlinput with respect to that obtained from a standard QPsolver, and in terms of the resulting closed-loop controlperformance. Despite using single precision arithmetic, theproposed design using offline scaling and online precondi-tioning, running on an FPGA with the circuit clocked at250 MHz, gives solution accuracies and times comparable withthose of standard matrix factorization-based algorithms usingdouble precision arithmetic on a midrange (circa 2012) laptopcomputer.

2) Target Calculator: The design for the MPC regulatorexploits the sparse structure of the optimization problem[27], [28]. This structure is not present in the steady-statetarget calculation problem and therefore the design cannotbe re-used without substantial modification. FPGA resourceconstraints make implementing a second floating point interiorpoint solver an unrealistic option. Since the target calculationproblem is dense and bound constrained, the steady-state target

Page 3: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

HARTLEY et al.: PREDICTIVE CONTROL USING AN FPGA 3

TABLE II

KEY PARAMETERS CHARACTERIZING MPC IMPLEMENTATION

calculator is realized using the fast gradient method (FGM)[29]. This was carried out using the MathWorks’ HDL coder[30], using fixed point arithmetic.

3) System Integration: The two QP solvers are each imple-mented as peripherals to a Xilinx MicroBlaze soft-core proces-sor [31], which is used to handle data transfer between thetwo QP solvers and the outside world via UDP/IP over100-Mbit/s Ethernet. In contrast to [17], [22], and [23] wherethe custom circuit is used to accelerate key parts of theQP solver with the remainder implemented in software on aconventional processor, the present design uses the MicroBlazesolely to bridge communication. This can be replaced witha custom interface layer to suit the demands of a givenapplication.

The resulting system-on-a-chip is verified using a test-bench system with the plant model (in this case a high-fidelity nonlinear model of the rigid body dynamics of a largeairliner [11]), simulated faster-than-real-time in SIMULINK.The plant state estimates and output reference setpoints arecommunicated to the system running on the FPGA in thepayload of a UDP packet at each sampling instant, with theFPGA returning control commands in the same manner.

C. Outline

The remainder of this paper is organized as follows.Section II summarizes the MPC control problem. Section IIIintroduces the case study application. Section IV describesthe hardware implementation. Section V presents a numericalinvestigation of the tradeoff between MINRES iterations,online and offline preconditioning and control performance.Section VI describes the test bench. Section VII presentsnumerical results and Section VIII concludes this paper.

II. PREDICTIVE CONTROL PROBLEM

In this section, the constrained, linear predictive trackingcontrol problem with quadratic cost is briefly summarized. Theforms of the two QPs to be solved on the FPGA at eachsampling instant are described.

A. Predictive Regulation Control Problem

Let x ∈ Rnx and u ∈ R

nu denote the state and controlledinput of a nonlinear plant at an equilibrium trim point, where

nx is the number of states and nu is the number of inputs.Defining the nonlinear plant state at time k as x(k) ∈ R

nx andinput u(k) ∈ R

nu , respectively, and letting δx(k) � x(k)− x ,and δu(k) � u(k)− u, a linear prediction model of the form

δx(k + 1) = Aδx(k)+ Bδu(k)+ Bdw(k) (1)

can be obtained by linearization of a nonlinear model aroundthis (x, u), where A ∈ R

nx×nx , B ∈ Rnx×nu and Bd ∈ R

nx×nw ,and w(k) ∈ R

nw is an unmeasured disturbance.Letting δxs(r(k)) and δus(r(k)) be state and input setpoints

relative to the equilibrium trim point, N be the control horizon(equal to the prediction horizon), δx(k) be the current estimateof the linearized state, w(k) be an estimate of the unmeasureddisturbance, δxi and δui denote predicted state and inputdeviations from the trim point at a time instant i steps intothe future, θ � [δx T

0 , δuT0 , . . . , δx T

N ]T , ‖ · ‖2M be a shorthandfor · T M · , and P ≥ 0, Q ≥ 0, R > 0 be appropriately sizedweighting matrices, then the finite-horizon optimal controlproblem to solve at each time step is

minθ‖δxN − δxs‖2P +

N−1∑

i=0

(‖δxi − δxs‖2Q + ‖δui − δus‖2R

)

(2a)

s.t. δx0 = δx(k) (2b)

δxi+1 = Aδxi + Bδui + Bdw(k), i = 0, . . . , N − 1 (2c)

δumin ≤ δui ≤ δumax, i = 0, . . . , N − 1. (2d)

For simplicity, state constraints are neglected and with onlyinput constraints considered, no further measure is needed toensure that a feasible solution to the QP exists, hence nomethod to detect infeasibility is required in the QP solver.

Letting ⊗ denote the Kronecker product, ⊕ denote thematrix direct sum, Ii ∈ R

i×i be an identity matrix, and1i ∈ R

i×1 be a vector of ones, let

H � 2 (IN ⊗ (Q ⊕ R))⊕ P (3a)

G �[

IN ⊗[

0nu×nx Inu

0nu×nx −Inu

], 02Nnu×nx

](3b)

F �

⎡⎢⎢⎢⎣

−Inx

A B −Inx...

. . .

−Inx

⎤⎥⎥⎥⎦ (3c)

h � 2[−1T

N ⊗[δx T

s Q δuTs R

] −δx Ts P

]T(3d)

g � 1N ⊗[δuT

max −δuTmin

]T(3e)

f �[−x T (k) −(1T

N ⊗ wT (k)BTd )

]T. (3f)

The optimal control problem (OCP) (2) is posed as aparametric QP

minθ12θT H θ + hT θ s.t. Gθ ≤ g, Fθ = f (4)

where vectors h and f change between problem instances.

B. Steady-State Target Calculation

For nominal offset-free steady-state tracking of referencesetpoints it is common to use a target calculator [5]–[7] to

Page 4: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

4 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

calculate δxs and δus that satisfy

Aδxs(k)+ Bδus(k)+ Bww(k) = δxs(k) (5a)

Crδxs(k) = r(k) (5b)

δumin ≤ δus ≤ δumax

δxmin ≤ δxs ≤ δxmax (5c)

where Cr ∈ Rnr×nx , and r(k) ∈ R

nr is a vector of nr referencesetpoints to be tracked without offset if, for some value r∞,r(k) → r∞ as k → ∞. A feasible solution to (5) is notguaranteed to exist. Let

As �[(A − I ) B

Cr 0

]

Bs �[−Bw 0

0 I

]

θs �[δxs

δus

]

bs �[w(k)r(k)

]

and W > 0 be a weighting matrix. The solution of

minθs12θT

s ATs W Asθs − bT

s BTs W Asθs (6)

subject to (5c) will find a solution satisfying the equalityconstraints if one exists and return a least-squares approxi-mation if one does not. This is a dense QP with no equalityconstraints; however, it is not guaranteed that AT

s W As > 0 andthe solution of (6) may not be unique. Defining Q � Q ⊕ R,A⊥s to be a matrix whose columns form an orthogonal basisof Ker(As) and Hs � AT

s W As+ A⊥s A⊥T

s QA⊥s A⊥T

s , a (strictlyconvex) target calculation problem can now be posed as

minθs12θT

s Hsθs − bTs BT

s W Asθs (7)

subject to (5c). The optimal values δx∗s (r(k)) and δu∗s (r(k))are then used as the setpoints in the regulation problem (2).

III. CASE STUDY—CONTROL OF A LARGE AIRLINER

For this implementation, the control of the roll, pitch andairspeed of a nonlinear Simulink-based model [11], [32] of therigid-body dynamics of a Boeing 747-200 with individuallymanipulable controls surfaces is considered.

A prediction model of the form (1) is obtained by lineariza-tion of the nonlinear model about an equilibrium trim pointfor straight and level flight at an altitude of 600 m and anairspeed of 133 ms−1, discretized with a sample period ofTs = 0.2 s. The linearized model considers 14 states (rollrate, pitch rate, yaw rate, airspeed, angle of attack, sideslipangle, roll, pitch, yaw, altitude, and four engine power states).Yaw angle and altitude are neglected in the prediction modelused for the predictive controller, since they do not affect theroll, pitch and airspeed (leaving 12 remaining states). The17 inputs considered consist of four individually manipulableailerons, left spoiler panels, right spoiler panels, four indi-vidually manipulable elevators, a stabilizer, upper and lowerrudder, and four engines. The effects of landing gear andflaps are not considered as these substantially change the locallinearization. The disturbance input matrix Bd is selected to

TABLE III

COST FUNCTION

TABLE IV

INPUT CONSTRAINTS

describe an input disturbance in discrete time that is equivalentto a constant disturbance to the rate of each of the first 10 statesin continuous time. The cost function (2a) is chosen withthe weights in Table III and the constraints on the inputs aresummarized in Table IV.

The twelve states δx(k) of the model (1) are assumedmeasurable, along with the two variables that were neglectedin the prediction mode: the altitude, and the yaw angle. Thedisturbance w(k) cannot be measured. As is standard practicein predictive control, an observer is therefore used to esti-mate w(k). The observer includes a one step ahead predictionto allow the combined target calculator and predictive regulatora deadline of one sampling period for computation.

The target calculator is configured with Cr = [e4, e7, e8]T ,where ei is the i th column of the 29 × 29 element identitymatrix, in order that the references to be tracked are airspeed,roll, and pitch angle. The weighting matrix W is selected asan appropriately sized identity matrix.

IV. HARDWARE IMPLEMENTATION

While for software running on desktop-oriented platformsthe use of IEEE double precision (64-bit) floating-pointarithmetic is generally unquestioned, for embedded platformssingle precision (32-bit) arithmetic is often preferred due tothe super-linear relationship between word-length and siliconresources [33]. In addition, memory storage and bandwidthrequirements—key issues for resource-constrained embeddedapplications—decrease with the number of bits used. Manyapplications, including predictive control, do not require theaccuracy provided by double precision arithmetic if sufficientcare is taken to formulate the problem in a numericallyfavorable way.

Factors different to those important for a software imple-mentation can motivate the choice of algorithm for a customhardware design. In sequential software, a smaller floating-point operation count leads to shorter algorithm runtimes.In hardware it is the ratio of parallelizable work to sequentialwork that determines the potential speed of an implementation.

Page 5: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

HARTLEY et al.: PREDICTIVE CONTROL USING AN FPGA 5

In addition, the proportion of different types of operations canalso be an important factor. For example, multiplication andaddition have lower latency and use fewer hardware resourcesthan division or square root operations.

A. MPC QP Solver

This subsection describes the main architectural and algo-rithmic details for the design of the solver for problem (4).

1) PDIP Algorithm: The present design uses a PDIPalgorithm [28] with an iterative linear solver at its core(Fig. 1). Iterative linear solvers can be preferable over direct(factorization-based) methods in this context, despite the prob-lem sizes being small in comparison with the problems forwhich iterative methods have been used historically. Firstly,matrix-vector multiplication accounts for most of the com-putation at each iteration, an operation offering multipleparallelization and pipelining opportunities. Secondly, thereare few division and square root operations compared withfactorization-based methods. Finally, these methods allow oneto trade off accuracy for computational time by varying thenumber of iterations. Despite iterative methods being moresensitive to poor conditioning than direct methods, for thepresent application, with suitable problem scaling, a relativelysmall number of iterations can be sufficient to obtain adequateaccuracies, as observed in Section V. Conversely, direct meth-ods are more difficult to parallelize and pipeline, with manypoints at which all subsequent operations depend upon thesolution of a single division operation (with a comparativelylong latency).

Because there is no matrix factorization to reuse, a sim-ple PDIP algorithm [28] is employed instead of Mehrotra’spredictor-corrector algorithm [34], which is commonly foundin software packages. Rather than checking a terminationcriterion, the number of interior-point iterations is fixed to18 since this guarantees that a (possibly suboptimal) solutionis available at a given time. A detailed investigation intothe number of iterations needed by interior-point methods isnot the subject of this paper, however, a posteriori testingindicates that 18 iterations is sufficient to achieve the accuracyof solution needed, and a crude bound on the duality measureat the final iteration indicates μ18 ≥ 1.52 · 0.3518 ≈ 1.4×10−8.Both this and its reciprocal are well within the dynamic rangeof single precision floating point arithmetic. In step 5 of thePDIP algorithm, a backtracking line search algorithm is used,reducing the step-length by a factor of 0.5 per iteration, overa maximum of 17 iterations before rounding to zero.

To accelerate the convergence of the iterative linear solverfor Step 3 in the PDIP algorithm, a positive diagonal precondi-tioner [33], [35] is used whose elements (indexed with bracedsubscripts) are given by

M{ii}(Ak) � 1

/√∑Zj=1 |Ak,{i j }| (8)

where Z is the number of columns of the matrix. The problemM(Ak)Ak M(Ak)wk = M(Ak)bk is solved and the solutionto the original problem is recovered by computing zk =M(Ak)wk . Hardware resource usage motivates the use of asimple diagonal structure. While the original motivation for the

Fig. 1. PDIP algorithm.

derivation of this preconditioner was to bound variables ratherthan to improve convergence [33], [35], it has a positive effecton the convergence of the iterative algorithms for this andother applications. The hardware implementation is outlinedin Section IV-A3 and its effect on reducing the number ofiterations required for convergence is illustrated in Sections Vand VII.

2) Architecture of QP Solver: This design is based onthe initial architecture described in detail in [24] and [25],implemented using VHDL and Xilinx IP-cores for floatingpoint arithmetic and RAM structures. The implementationis split into two distinct blocks: one block accelerates thecomputational bottleneck of the algorithm (solving the linearequations in Fig. 1, step 3) implementing a parallel MINRESsolver; the other block computes all the remaining operations.

The block that prepares the linear systems consists of acustom sequential machine with custom complex instructionsthat reduce instruction storage requirements and is close to100% efficient, i.e., there are no cache misses or pipeline stalls,so a useful result is produced at every clock cycle. The originalsequence of instructions has been modified for problemswith only input bound constraints, significantly reducing theamount of computation for calculating �k . In addition, moreinstructions have been added to calculate the preconditioner Mand to recover the search direction zk from the result for thepreconditioned system given by the linear solver. Since veryfew elements in �k are changing from iteration to iteration,the updating of the preconditioner M is not costly.

The parallel MINRES solver accelerates computations byperforming dot-products in a parallel fashion. This blockconsists of a bank of multipliers that perform all the multipli-cations concurrently followed by an adder reduction tree thataccumulates the results. Since matrix Ak can be rearrangedinto a banded form by interleaving the primal and dualvariables [27], the size of the dot-products is equal to the size

Page 6: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

6 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

TABLE V

VALUES FOR c IN (9) FOR DIFFERENT IMPLEMENTATIONS

of the band 2V − 1, where V = 2nx + nu is the half-band ofthe matrix. A customized storage scheme is used to exploit thefact that many elements in Ak are constant or zero and thereis repetition due to the multistage problem structure and time-invariance. This saves approximately 70% of memory storagerequirements in comparison with a standard banded storagescheme [24]. Exploitation of this structure would not havebeen possible with direct methods, since the nonzero band inthe factorized matrices becomes dense.

Another benefit of FPGA technology for real-time appli-cations is the ability to provide cycle accurate computationtime guarantees. Let II P and IM R be the number of PDIPand MINRES iterations, respectively. Let fc denote the clockfrequency (250 MHz in this implementation) and c relatethe time spent by the sequential block to the time spentin a MINRES iteration (this varies with different imple-mentations as shown in Table V). Let Z = N(2nx +nu) + 2nx denote the number of elements in bk , and P �⌈(

2Z + V + 12 log2(2V − 1)� + 230)/Z

⌉, then for the cur-

rent design, computation time is given by

(II P · P · Z(IM R + c)) / fc seconds. (9)

3) Online Preconditioning Implementation: Two options forimplementing the online preconditioning can be considered.The first option is to compute the preconditioned matrix inthe sequential block and store it in the linear solver. Thisrequires no extra computational resources; however, it imposesan extra computational load on the sequential block, increasingoverall latency. It also prohibits the use of the customizedreduced storage scheme, since the nonzero elements that werepreviously constant between iterations are no longer constantin the preconditioned matrix.

The second option, adopted by the present implementation,only computes the preconditioner in the sequential block.The original matrix is stored in RAM in the linear solvercomponent, and the preconditioner is applied by a bank ofmultipliers inserted at the output of the memory. This requiresapproximately three times as much computation; however,this is not on the critical path, i.e., memories storing thematrix can be read earlier, so there is no effect on executionspeed. The reduced storage scheme is retained at the cost ofa significant increase in the number of multipliers. There isa tradeoff between the extra resources needed to implementthis procedure and the amount of acceleration gained througha reduction in iteration count, as investigated in Section VII.

B. Target Calculator

The QP solver described in Section IV-A2 is designedto directly exploit the sparse structure of the problem (4).

Fig. 2. Fast gradient method (where eig(Hs) ⊆ (0, 1])

The target calculation problem (7) does not exhibit such astructure, thus the solver of Section IV-A2 cannot be applieddirectly. A separate QP solver is therefore required.

Matrix Hs ∈ Rns×ns , where ns = nx+nu is relatively small,

positive definite by construction, and the constraints (5c) aresimple bounds. The FGM algorithm [29] is considered, beingdivision free and well-suited to fixed-point implementation(which uses significantly fewer FPGA resources).

A diagonal scaling matrix Ms is obtained in the samemanner as (8). The scaled matrix Hs � MsHs Ms has itsnumerical values in the range [−1, 1] and eigenvalues inthe range (0, 1] (proven in [35] and [33]). The same scal-ing is applied to calculate Fs � Ms AT

s W Bsbs and θq �Ms [δxT

q , δuTq ]T , q ∈ {max, min}. For the plant described in

Section III, the minimum and maximum eigenvalues of Hs

are λmin(Hs) ≈ 0.002 and λmax(Hs) ≈ 5582. After scaling,λmin(Hs) ≈ 2.79× 10−4 and λmax(Hs) = 1.0.

The implementation is prototyped at a register transfer levelusing SIMULINK and the MATLAB Fixed Point Toolbox,with control logic specified using M-code function blocks andprogrammatically converted to VHDL using MathWorks’ HDLcoder R2012a. The target calculator circuit is implemented asfour distinct subsystems. Subsystem #1 calculates f s � Fsbs

through multiplication of an ns × nb element matrix by nb

element vector (where nb = nd + nr ). Since this occurs onlyonce per sample, this matrix-vector multiplication is carriedout sequentially, with one multiplication and one additionhappening simultaneously per clock cycle, to conserve FPGAresource usage. Subsystem #2 contains the implementation ofthe FGM (Fig. 2).

As with MINRES, the majority of the effort in this algorithmoccurs in a matrix vector multiplication, which happens onceper iteration. The matrix Hs is stored with each column ina separate RAM, so that the multiplication of the elementsof each row of Hs with the elements of y(k−1) can be per-formed in parallel, with the sum of the elementwise productscalculated using a pipelined tree reduction structure. This latterbeing conveniently generated by HDL coder by using the sumof elements block configured to use a tree structure, withregister balancing. Subsystem #3 reverses the preconditioning,element-by-element in series, and subsystem #4 calculatesT−1

Q Qxs , T−1R Rus and T−1

Q Pxs—the input parameters to beused to construct h (3d) for the MPC controller circuit of

Page 7: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

HARTLEY et al.: PREDICTIVE CONTROL USING AN FPGA 7

TABLE VI

DATA TYPES USED WITHIN TARGET CALCULATOR

Section IV-A2, but scaled by diagonal matrices TQ and TR toimprove problem conditioning (Section V).

The fixed point data types (Table VI) are chosen to matchthe resources present in the particular FPGA device targeted.In particular, data types for multiplication operations arechosen with consideration of the DSP48E resources on theFPGA, each of which includes a 25×18 bit integer multiplier.Matrix values are represented with a 25 bit data type andvector variables with a 35 bit data type, meaning that twoDSP48E units are required per vector element to enable athroughput of one element-wise vector product per clockcycle.

The length of the integer portions are chosen based onthe maximum ranges expected based on the input constraintsfrom the case study (Section III). The number of clock cyclesneeded is shown by stage in Table VII, where IFG denotes thenumber of FGM iterations. The time for each FGM iterationis insignificant compared with the interior point regulator, soIFG = 1000 is chosen as a conservative value. A detailedanalysis of the convergence properties of FGM using fixedpoint arithmetic can be found in [36].

V. OFFLINE PRESCALING FOR PDIP/MINRES

For each PDIP iteration, the convergence of the MINRESalgorithm used to solve Akzk = bk , and the accuracy ofthe final estimate of zk are influenced by the eigenvaluedistribution of Ak . When no scaling is performed on theprediction model and cost matrices for this application, and nopreconditioning is applied online, inaccuracy in the estimatesof zk leads the PDIP algorithm to not converge to a satisfactorysolution. Increasing the number of MINRES iterations fails toimprove the solution, yet increases the computational burden.

Preconditioning applied online at each iteration of the PDIPalgorithm can accelerate convergence and reduce the worst-case solution error of zk . In [26], an offline prescaling wasused in lieu of an online preconditioner, with the controlperformance demonstrated competitive with the use of con-ventional factorization-based algorithms on a general purposeplatform. The rationale behind the prescaling is now restatedand numerical results presented to demonstrate that combin-ing systematic offline prescaling with online preconditioning

TABLE VII

TARGET CALCULATOR TIMING

yields better performance compared with mutually exclusiveuse.

Matrix Ak is not constant, but W−1k is diagonal. Since

there are only upper and lower bounds on inputs, the varyingcomponent of Ak , �k only has diagonal elements. In addition,as k → II P−1, the elements of W−1

k corresponding to inactiveconstraints approach zero. Therefore, despite the diagonalelements of W−1

k corresponding to active constraints becominglarge, as long as only a handful of these exist at any point,the perturbation to A is of low rank, and will have a relativelyminor effect on the convergence of MINRES. Hence, rescalingthe control problem to improve the conditioning of A shouldalso improve the conditioning of Ak in some sense.

Prior to scaling, for N = 12, cond(A) = 1.77 × 107.The objective of the following procedure is to obtain diagonalmatrices TQ > 0 and TR > 0 to scale the linear state spaceprediction model and quadratic cost weighting matrices asfollows: A ← TQ AT−1

Q , B ← TQ BT−1R , Bd ← TQ Bd ,

Q ← T−1Q QT−1

Q , R ← T−1R RT−1

R , δumin ← TRδumin,δumax ← TRδumax. This substitution is equivalent to

A← MAM (10)

where

M �((

IN ⊗(

T−1Q ⊕ T−1

R

))⊕ T−1

Q

)⊕ (

IN+1 ⊗ TQ).

(11)

Since TQ and TR are diagonal, the diagonal structure of �k

is retained. The transformation (11) is a function of both TQ

and its inverse, and both of these appear quadratically.In [37] some guidelines are provided for desirable scaling

properties. In particular, it is desirable to scale the rows andcolumns of A so that they are all of similar magnitude. Whilenot exactly the original purpose, it should be noted that if thepreconditioner (8) is applied repeatedly (i.e., repreconditioningthe same matrix multiple times) to a general square matrix offull rank, the one-norm of each of the rows converges asymp-totically to unity. The method proposed here for normalizing Afollows naturally but with the further caveat that the structureof M is imposed to be of the form (11). Therefore, it is not(in general) possible to scale A such that all row norms areequal to an arbitrary value. Instead, the objective is to reducethe variation in row (and column) norms. Empirical testingsuggests that normalizing the two-norm of the rows of A(subject to [11]) gives the most accurate solutions from thePDIP method for the present application.

Noting the structure of A, define the following vectors:

sx �{

sx ∈ Rn : sx,{i} =

(∑nj=1 Q2

i j +∑n

j=1 A2j i + 1

)1/2}

Page 8: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

Fig. 3. Offline prescaling algorithm.

TABLE VIII

EFFECTS OF OFFLINE PRECONDITIONING

su �{

su ∈ Rm : su,{i} =

(∑mj=1 R2

i j +∑n

j=1 B2j i + 1

)1/2}

sN �{

sN ∈ Rn : sN,{i} =

(∑nj=1 P2

i j + 1)1/2

}

sλ �{

sλ ∈ Rn : sλ,{i} =

(∑nj=1 A2

i j +∑m

j=1 B2i j + 1

)1/2}.

Also, define elementwise, l1 �√

su/μ and

l2 � {l2 ∈ Rn > 0 : (l2)

4 = ((Nsx + sN )/(1+ Nsλ))}where μ �

(N

∑sx +∑

sN + N∑

sλ + n)/(2(N + 1)n),

and apply the algorithm in Fig. 3.Table VIII shows properties of A that influence solution

quality, before and after application of the prescaling withε = 10−7: the condition number of A; the standard deviationof the row 1- and 2-norms; and the standard deviation of themagnitude of the eigenvalues, which are substantially reducedby the scaling.

Fig. 4 shows three metrics for the quality of the solutionfrom the MINRES-based PDIP solver over the duration ofa closed-loop simulation with a prediction horizon N = 12.The number of MINRES iterations per PDIP iteration is variedfor four different approaches to preconditioning (none, offline,online, and combined online and offline). These experimentswere performed in software, but a theoretical computation timeusing (9) for the FPGA implementation is also shown.

With neither preconditioning nor offline scaling, the controlperformance is unacceptable. Even when the number ofMINRES iterations is equal to 2Z = 2 × 516 = 1032, themean stage cost over the simulation is high (the controllerfailed to stabilize the aircraft), and the worst-case control errorin comparison with a conventional PDIP solver using doubleprecision arithmetic and a factorization-based approachis of the same order as the range of the control inputs.Using solely online preconditioning, control performance(in terms of the cost function) does not start to deterioratesignificantly until the number of MINRES iterations isreduced to �0.25Z� = 129, although at this stage the worstcase relative accuracy is still poor (but mean relative accuracyis tolerable). With only offline preconditioning, worst case

Fig. 4. Numerical performance for a closed-loop simulation withN = 12, using PC-based MINRES-PDIP implementation with online andoffline preconditioning (missing markers for the mean error indicate that atleast one control evaluation failed due to numerical errors). Offline prescalingappears to be more effective than the online preconditioning alone, but com-bined use yields significantly better numerical performance than using eitheralone. (a) No preconditioning. (b) Offline preconditioning only. (c) Onlinepreconditioning only. d) Online and Offline Preconditioning.

relative control error does not deteriorate until the numberof MINRES iterations is reduced to �0.75Z� = 387 andcontrol performance does not deteriorate until this is reducedto �0.1Z� = 51. When combined, control performance ismaintained with �0.03Z� = 15 iterations, and worst-casecontrol accuracy is maintained with �0.08Z� = 41 iterations.

Page 9: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

HARTLEY et al.: PREDICTIVE CONTROL USING AN FPGA 9

Fig. 5. Hardware-in-the-loop experimental setup.

VI. FPGA-IN-THE-LOOP TESTBENCH

The hardware-in-the-loop experimental setup used to testthe predictive controller design has two goals: providing areliable real-time closed-loop simulation framework for con-troller design verification; and demonstrating that the con-troller could be plugged into a plant presenting an appropriateinterface. Fig. 5 shows a schematic of the experimental setup.The QP solver, running on a Xilinx FPGA ML605 evaluationboard [38], controls the nonlinear model of the B747 aircraftrunning in SIMULINK on a PC. At every sampling instantk, the observer estimates the next state δx(k + 1|k) anddisturbance w(k + 1|k). For the testbench, the roll, pitch andairspeed setpoints comprising the reference signal r(k) in thetarget calculator (5) that the predictive controller is designed totrack, are provided by simple linear control loops, with the rolland pitch setpoints as a function of a reference yaw angle andreference altitude, respectively. The airspeed setpoint is passedthrough a low-pass filter. The vectors δx(k+ 1|k), w(k+ 1|k)and r(k) are represented as a sequence of single-precisionfloating point numbers in the payload of a UDP packet via anS-function and this is transmitted over 100 Mbit/s Ethernet.The FPGA returns the control action in another UDP packet.This is applied to the plant model at the next sampling instant.

On the FPGA the two custom hardware circuits implement-ing the QP solvers for target calculation and MPC regulationare connected to a Xilinx MicroBlaze soft core processor,upon which a software application, bridges the communicationbetween the Ethernet interface and the two QP solvers. As wellas being simpler to implement, this architecture provides somesystem flexibility in comparison with a dedicated custom inter-face, with a small increase in FPGA resource usage (Table IX)and communication delay, and allows easy portability to otherstandard interfaces, e.g., SpaceWire, CAN bus, and so on, aswell as an option for direct monitoring of controller behavior.

Table IX shows the FPGA resource usage of the differentcomponents in the system-on-a-chip testbench, as well as theproportion of the FPGA used for two midrange devices withapproximately the same silicon area from the last two technol-ogy generations (the newer FPGA offers more resources perunit area, meaning a smaller, cheaper, lower power model canbe chosen). The linear solver uses the majority of the resourcesin the MPC QP solver, while the MPC QP solver consumessubstantially more resources than the target calculator, since itis solving a larger optimization problem (refer to Section II).

Fig. 6. Closed-loop state and input trajectory plots from FPGA-in-the-looptestbench. (a) Roll, pitch, yaw, altitude and airspeed trajectories. (b) Selectedinput trajectories (forbidden region shaded red).

Table IX also highlights the cost of using preconditioning.Using Xilinx Power Analyzer it is estimated that the Virtex

6 LX240T FPGA-based system draws 9.75 W of power, ofwhich 4.277 W is quiescent (e.g. leakage) and 5.469 W isdynamic. This is less than the 35 W thermal design powerassociated with the microprocessor in the laptop used forperformance comparison in the following section. A smaller,low-voltage or more modern FPGA model can reduce powerconsumption (particularly leakage). Within the custom circuit,3.00 W is used in the interior point QP solver and 0.12 W inthe FGM target calculator.

VII. FPGA-IN-THE-LOOP CLOSED-LOOP PERFORMANCE

A closed-loop system with the FPGA in the loop, con-trolling the nonlinear model of the Boeing 747 from [32]is compared with running the complete control system ona conventional computer using factorization-based methods.The two QP solvers are first evaluated separately, and thentrajectory plots of the closed-loop trajectories for the completesystem are presented. The reference trajectory is continuous,piecewise continuous in its first derivative, and consists of a

Page 10: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

10 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

TABLE IX

FPGA RESOURCE USAGE

TABLE X

COMPARISON OF FPGA-BASED MPC REGULATOR PERFORMANCE (WITH BASELINE FLOATING POINT TARGET CALCULATION IN SOFTWARE)

period of level flight, followed by a 90◦ change in heading,then by a 200 m descent, followed by a 10 ms−1 deceleration.

A. MPC Regulator

Solution times and control quality metrics for the regulatorQP solver are presented for a 360 s simulation, with N = 12and N = 5 in Table X. Based on the results of Section V,for N = 12, the number of MINRES iterations per PDIPiteration IM R = 51. For N = 5, IM R = 30. This is higherthan was empirically determined to be necessary; however,the architecture of the QP solver requires that the MINRESstage must run for at least as long as the sequential stage. Thecontrol accuracy metrics presented are

emax � maxi,k

∣∣∣uF{i}(k)− u∗{i}(k)∣∣∣/ (

δumax,{i} − δumin,{i})

eμ � meani,k

∣∣∣uF{i}(k)− u∗{i}(k)∣∣∣/ (

δumax,{i} − δumin,{i})

where uF (k) is the calculated control input, and u∗(k) is thehypothetical true solution and the subscript · {i} indicates anelementwise index. Since the true solution is not possible toobtain analytically, the algorithm of [27], implemented usingMATLAB Coder, is used as a baseline.

The metrics are presented alongside those for custom soft-ware QP solvers generated using current versions of CVXGEN[39] (for N = 5 only since for N = 12 the problem was toolarge to handle) and FORCES [40] (for N = 12 and N = 5).PC-based comparisons are made using double precision arith-metic on a laptop with a 2.4 GHz Intel Core 2 Duo processor.The code from CVXGEN and FORCES is modified to usesingle precision arithmetic and timed running directly on the

TABLE XI

COMPARISON OF FPGA-BASED TARGET-CALCULATOR PERFORMANCE

100 MHz MicroBlaze soft core on the FPGA for the numberof iterations observed necessary on the PC. (Double precisionfloating point arithmetic would be emulated in software, andnot provide a useful timing comparison.) While obtainingresults useful for control from the single precision modificationto the CVXGEN solver proved to be too challenging, the tim-ing result is presented assuming random data for the numberof iterations needed on the PC. The MicroBlaze used for thesoftware solvers is configured with throughput optimizations,single precision floating point unit (including square root),maximum cache sizes (64 kB data and 64 kB instruction), andmaximum cache line length (eight words).

For N = 12, the FPGA-based QP solver (at 250 MHz)is slightly faster than the PC-based QP solver generatedusing FORCES (at 2.4 GHz) based on wall-clock time but≈ 10× faster on a cycle-by-cycle basis. It is also ≈ 65× fasterthan the FORCES solver on the MicroBlaze (at 100 MHz),which would fail to meet the real-time deadline of Ts =0.2 s by a factor of approximately 10. By contrast, the clockfrequency for the FPGA-based QP solver could be reduced

Page 11: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

HARTLEY et al.: PREDICTIVE CONTROL USING AN FPGA 11

by a factor of ≈ 15 (reducing power requirements, andmaking a higher FPGA resource utilization factor possible),or the sampling rate increased by the same factor (improvingdisturbance rejection) while still meeting requirements. Worst-case and mean control error are competitive. A similar trend isvisible for N = 5 with the FPGA-based solver only marginallyslower than the CVXGEN solver on the PC in terms of wall-clock time.

The maximum communication time over Ethernet, exper-imentally obtained by bypassing the interface with the QPsolvers in the software component is 0.6990 ms. The valuesfor FPGA-based implementation in Table X are normalized bysubtracting this, since it is independent of the QP solver.

B. Target Calculator

The accuracy and computation time of the fixed-pointFPGA-based target calculator is compared with double andsingle precision variants of the same algorithm generated usingMATLAB Coder and with a solver generated using CVXGEN,in Table X. FORCES is not considered here since the targetcalculation problem does not have a multistage structure.While the accuracy of the FPGA-based implementation isslightly inferior to the other options (but as demonstrated byclosed-loop simulation still adequate), the solution wall-clocktime is faster than the fastest PC-based algorithm compared,and more than 200× faster on a cycle-per-cycle basis thanrunning a single precision FGM directly on the MicroBlaze,and negligible in comparison with the MPC computation time.

C. Combined System

Trajectories from the closed-loop setup, with N = 12 forthe PDIP-based MPC regulator, and the FGM-based targetcalculator running on the FPGA are shown in Fig. 6. Thereference trajectory is tracked, inputs constraints are enforcedduring transients, and the zero-value lower bound on thespoiler panels is not violated in steady state.

VIII. CONCLUSION

This paper has demonstrated the implementation of asystem-on-a-chip MPC control system, including predictivecontrol regulator and steady-state target calculation on anFPGA. A Xilinx MicroBlaze soft-core processor is used tobridge communication between the two custom QP solvers,and the outside world over Ethernet. The controller is tested inclosed-loop with a nonlinear simulation of a large airliner—aplant with substantially more states and inputs than anyprevious FPGA-based predictive controller.

The MPC regulator employs a PDIP algorithm using sin-gle precision floating point arithmetic, with a preconditionediterative method used to solve systems of linear equations.A numerical investigation shows that with preconditioning andthe correct plant model scaling, a relatively small number ofMINRES iterations is required to achieve sufficient controlaccuracy for this application. The steady-state target calculatoruses the FGM implemented using fixed point arithmetic. Thisis economical in terms of FPGA resources, and is faster than

an equivalent algorithm in floating point arithmetic on a laptopPC, while running at approximately 10% of the clock fre-quency. The whole system can fit onto a midrange FPGA, anduses less than 1/3 of the power of the laptop microprocessorwith which it is compared. Lower clock frequencies could beused to further reduce power consumption further, while stillmeeting real-time control deadlines.

REFERENCES

[1] J. M. Maciejowski, Predictive Control with Constraints. Harlow, U.K.:Prentice-Hall, 2002.

[2] E. F. Camacho and C. Bordons, Model Predictive Control. New York,NY, USA: Springer-Verlag, 2004.

[3] J. B. Rawlings and D. Q. Mayne, Model Predictive Control: Theory andDesign. Madison, WI, USA: Nob Hill Publishing, 2009.

[4] S. J. Qin and T. A. Badgwell, “A survey of industrial model predictivecontrol technology,” Control Eng. Pract., vol. 11, no. 7, pp. 733–764,Jul. 2003.

[5] K. R. Muske and T. A. Badgwell, “Disturbance modeling for offset-free linear model predictive control,” J. Process. Control, vol. 12, no. 5,pp. 617–632, 2002.

[6] G. Pannocchia and J. B. Rawlings, “Disturbance models for offset-free model predictive control,” AIChE J., vol. 49, no. 2, pp. 426–437,Feb. 2003.

[7] U. Maeder, F. Borrelli, and M. Morari, “Linear offset-free model pre-dictive control,” Automatica, vol. 45, no. 10, pp. 2214–2222, Oct. 2009.

[8] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicitlinear quadratic regulator for constrained systems,” Automatica, vol. 38,no. 1, pp. 3–20, Jan. 2002.

[9] A. Alessio and A. Bemporad, “A survey on explicit model predictivecontrol,” in Nonlinear Model Predictive Control, vol. 384, New York,NY, USA: Springer-Verlag, 2009, pp. 345–369.

[10] J. M. Maciejowski and C. N. Jones, “MPC fault-tolerant flight controlcase study: Flight 1862,” in Proc. IFAC Safeprocess Conf., Jun. 2003,pp. 9–11.

[11] C. Edwards, T. Lombaerts, and H. Smaili, Fault Tolerant Flight Control:A Benchmark Challenge. Berlin, Germany: Springer-Verlag, 2010.

[12] H.-G. Geisseler, M. Kopf, P. Varutti, T. Faulwasser, and R. Findeisen,“Model predictive control for gust load alleviation,” in Proc. IFAC Conf.Nonlinear Model Predictive Control, 2012, pp. 27–32.

[13] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, “Con-strained model predictive control: Stability and optimality,” Automatica,vol. 36, no. 6, pp. 789–814, Jun. 2000.

[14] K. Basterretxea and K. Benkrid, “Embedded high-speed model predic-tive controller on a FPGA,” in Proc. NASA/ESA Conf. Adapt. Hardw.Syst., Jun. 2011, pp. 327–335.

[15] X. Chen and X. Wu, “Design and implementation of model predictivecontrol algorithms for small satellite three-axis stabilization,” in Proc.Int. Conf. Inf. Autom., Jun. 2011, pp. 666–671.

[16] A. Wills, A. Mills, and B. Ninness, “FPGA implementation of aninterior-point solution for linear model predictive,” in Proc. 18th IFACWorld Congr., Aug. 2011, pp. 14527–14532.

[17] N. Yang, D. Li, J. Zhang, and Y. Xi, “Model predictive controller designand implementation on FPGA with application to motor servo system,”Control Eng. Pract., vol. 20, no. 11, pp. 1229–1235, Nov. 2012.

[18] A. G. Wills, G. Knagge, and B. Ninness, “Fast linear model predictivecontrol via custom integrated circuit architecture,” IEEE Trans. Control.Syst. Technol., vol. 20, no. 1, pp. 59–71, Jan. 2012.

[19] A. Mills, A. G. Wills, S. R. Weller, and B. Ninness, “Implementation oflinear model predictive control using a field-programmable gate array,”IET Control Theory Appl., vol. 6, no. 8, pp. 1042–1054, Jul. 2012.

[20] H. Chen, F. Xu, and Y. Xi, “Field programmable gate array/systemon a programmable chip-based implementation of model predictivecontroller,” IET Control Theory Appl., vol. 6, no. 8, pp. 1055–1063,Jul. 2012.

[21] K.-V. Ling, B. F. Wu, and J. M. Maciejowski, “Embedded modelpredictive control (MPC) using a FPGA,” in Proc. 17th IFAC WorldCongr., Jul. 2008, pp. 15250–15255.

[22] L. G. Bleris, P. D. Vouzis, M. G. Arnold, and M. V. Kothare,“A co-processor FPGA platform for the implementation of real-timemodel predictive control,” in Proc. Amer. Control Conf., Jun. 2006,pp. 1912–1917.

[23] P. D. Vouzis, L. G. Bleris, M. G. Arnold, and M. V. Kothare, “Asystem-on-a-chip implementation for embedded real-time model pre-dictive control,” IEEE Trans. Control. Syst. Technol., vol. 17, no. 5,pp. 1006–1017, Sep. 2009.

Page 12: IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY 1 ...cas.ee.ic.ac.uk/people/gac1/pubs/HartleyTCST13.pdf · Manuscript received November 12, 2012; revised May 1, 2013; accepted June

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

12 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY

[24] J. L. Jerez, G. A. Constantinides, and E. C. Kerrigan, “An FPGA imple-mentation of a sparse quadratic programming solver for constrainedpredictive control,” in Proc. ACM Symp. Field Program. Gate Arrays,Mar. 2011, pp. 209–218.

[25] J. L. Jerez, K.-V. Ling, G. A. Constantinides, and E. C. Kerrigan, “Modelpredictive control for deeply pipelined field-programmable gate arrayimplementation: Algorithms and circuitry,” IET Control Theory Appl.,vol. 6, no. 8, pp. 1029–1041, 2012.

[26] E. N. Hartley, J. L. Jerez, A. Suardi, J. M. Maciejowski, E. C. Kerrigan,and G. A. Constantinides, “Predictive control of a Boeing 747 aircraftusing an FPGA,” in Proc. IFAC Conf. Nonlinear Model PredictiveControl, Aug. 2012, pp. 80–85.

[27] C. V. Rao, S. J. Wright, and J. B. Rawlings, “Application of interior-pointmethods to model predictive control,” J. Optim. Theory Appl, vol. 99,no. 3, pp. 723–757, Dec. 1998.

[28] S. J. Wright, “Interior-point method for optimal control of discrete-timesystems,” J. Optim. Theory Appl., vol. 77, pp. 161–187, Apr. 1993.

[29] S. Richter, C. Jones, and M. Morari, “Computational complexitycertification for real-time MPC with input constraints based on thefast gradient method,” IEEE Trans. Autom. Control, vol. 57, no. 6,pp. 1391–1403, Jun. 2012.

[30] HDL Coder User’s Guide R2012a, The MathWorks, Natick, MA, USA,2012.

[31] MicroBlaze Processor Reference Guide—Embedded Development Kit(UG081), Xilinx Inc., San Jose, CA, USA, 2012.

[32] C. van der Linden, H. Smaili, A. Marcos, G. Balas, D. Breeds, S. Run-ham, C. Edwards, H. Alwi, T. Lombaerts, J. Groeneweg, R. Verhoeven,and J. Breeman (2011). GARTEUR RECOVER Benchmark [Online].Available: http://www.faulttolerantcontrol.nl

[33] E. C. Kerrigan, J. L. Jerez, S. Longo, and G. A. Constantinides, “Numberrepresentation in predictive control,” in Proc. IFAC Conf. NonlinearModel Predictive Control, Aug. 2012, pp. 60–67.

[34] S. Mehrotra, “On the implementation of a primal-dual interior pointmethod,” SIAM J. Optim., vol. 2, no. 4, pp. 575–601, Nov. 1992.

[35] J. L. Jerez, G. A. Constantinides, and E. C. Kerrigan, “Towards afixed point QP solver for predictive control,” in Proc. 51st IEEE Conf.Decision Control, Dec. 2012, pp. 675–680.

[36] J. L. Jerez, P. J. Goulart, S. Richter, G. A. Constantinides, E. C. Kerrigan,and M. Morari, “Embedded predictive control on an FPGA using the fastgradient method,” in Proc. Appear Eur. Control Conf., 2013, pp. 1–6.

[37] J. T. Betts, Practical Methods for Optimal Control and Estimation usingNonlinear Programming, 2nd ed. Philadelphia, PA, USA: Soc. Ind. Appl.Math., SIAM, 2010.

[38] ML605 Hardware User Guide, Xilinx Inc., San Jose, CA, USA,Feb. 2011.

[39] J. Mattingley and S. Boyd, “CVXGEN: A code generator for embeddedconvex optimization,” Optim. Eng., vol. 13, no. 1, pp. 1–27, 2012.

[40] A. Domahidi, A. U. Zgraggen, M. N. Zeilinger, M. Morari, andC. N. Jones, “Efficient interior point methods for multistage problemsarising in receding horizon control,” in Proc. 51st IEEE Conf. DecisionControl, Dec. 2012, pp. 668–674.

Edward Nicholas Hartley received the M.Eng. andPh.D. degrees from the University of Cambridge,Cambridge, U.K., in 2006 and 2010, respectively.

He is currently a Research Associate with theControl Group, University of Cambridge. His currentresearch interests include predictive control designsfor aerospace and space systems and embeddedcontrol implementations.

Juan Luis Jerez (S’11) received the M.Eng. degreein electrical engineering from Imperial College Lon-don, London, U.K., in 2009, where he is currentlypursuing the Ph.D. degree with the Circuits andSystems Research Group.

His research work is on developing tailored linearalgebra and optimization algorithms for efficientimplementation on custom parallel computing plat-forms.

Andrea Suardi received the M.Sc. degree in elec-tronic engineering and the Ph.D. degree in electron-ics and communication engineering from Politecnicodi Milano, Milano, Italy, in 2006 and 2010, respec-tively.

He is currently a Research Associate with ImperialCollege London, London, U.K. His current researchinterests include digital architecture design, in par-ticular high efficiency field-programmable gate arraybased systems for supercomputing and control appli-cations.

Jan M. Maciejowski (M’81–SM’96–F’11) receivedthe B.Sc. degree in automatic control from SussexUniversity, Sussex, U.K., in 1971, and the Ph.D.degree in control engineering from Cambridge Uni-versity, Cambridge, U.K., in 1978.

He was a Systems Engineer with Marconi Spaceand Defence Systems Ltd., Frimley, Surrey, U.K.,from 1971 to 1974, working mostly on attitudecontrol of spacecraft and high altitude balloon plat-forms. Since 1981, he has been with the Universityof Cambridge, Cambridge, where he is currently a

Professor of Control Engineering and the Head in the Information EngineeringDivision. He was a President of the European Union Control Association from2003 to 2005 and a President of the Institute of Measurement and Controlin 2002. His current research interests include the theory and applicationsof predictive control, and its application to fault-tolerant control, in systemidentification, and in the control of autonomous systems.

Prof. Maciejowski was a recipient of the Honeywell International Medalfrom the InstMC in 2008.

Eric C. Kerrigan (S’94–M’02) focuses his currentresearch on the development of efficient numericalmethods and computer architectures for solving opti-mal control, estimation, optimization and modelingproblems that arise in a variety of aerospace, andrenewable energy applications.

He is a member of IET, MOS, and SIAM. He ison the IEEE CSS Conference Editorial Board and anAssociate Editor of the IEEE TRANSACTIONS ON

CONTROL SYSTEMS TECHNOLOGY, Control Engi-neering Practice, and Optimal Control Applications

and Methods.

George A. Constantinides (S’96–M’01–SM’08)received the M.Eng. and Ph.D. degrees from Impe-rial College London, London, U.K., in 1998 and2001.

He has been a Faculty Member with ImperialCollege London, since 2002, where he leads theCircuits and Systems Group.

Dr. Constantinides was a recipient of the Eryl Cad-waladar Davies Prize for the best doctoral thesis inelectrical engineering at Imperial College London in2001, and an Imperial College Research Excellence

Award in 2006. He is a fellow of EPSRC and the British Computer Society.He is a Program (General) Chair of the ACM FPGA Symposium from 2014to 2015 and chaired FPL in 2003 and FPT in 2006.


Recommended