[IEEE 2008 International Conference on Embedded Computer Systems: Architectures, Modeling, and...

A Cost Model for Partial Dynamic Reconfiguration

Markus Rullmann
Dresden University of Technology, 01062 Dresden, Germany
Email: [email protected]

Renate Merker
Dresden University of Technology, 01062 Dresden, Germany
Email: [email protected]

Abstract: We present a new model for dynamic reconfiguration costs. The model targets both reconfiguration time and configuration data. Reconfiguration is modeled at the granularity of reconfigurable resources, hence the model can take advantage of partially identical configurations. We apply the model to binary configuration data to generate cost-efficient bitstreams for reconfigurable systems. We show how the model relates to previously established reconfiguration techniques. The improvements in reconfiguration time and bitstream size are demonstrated on an example. The model also provides a measure for the similarity of reconfigurable circuits. We present the application of the reconfiguration model to high-level designs. The model describes in detail which circuit elements are static and which need to be reconfigured. We use the model as a cost function in a high-level synthesis tool to derive an allocation with minimal reconfiguration costs.

I. INTRODUCTION

Partial dynamic reconfiguration is used to adapt the FPGA configuration at runtime to the application's needs. While this technique allows a more efficient use of logic resources, the reconfiguration incurs a time overhead and an overhead to store configuration data. Most approaches use an area model to assess the extra cost, e.g. for task scheduling [1]. The application is divided into reconfigurable modules that perform the data processing. These reconfiguration techniques treat the reconfigurable modules as inseparable entities. Moreover, reconfiguration is treated independently of the actual configuration. There exists a range of methods that can improve reconfiguration times by configuration prefetching, caching, or smart reconfiguration scheduling [2][3]. These can be seen as system-level approaches. Configuration data can be reduced, e.g. with more efficient coding [4][5]. In [6][7], the binary frame data is compressed by exploiting intra-frame and inter-frame redundancy. Configuration data compression techniques require additional configuration control that must be implemented on the FPGA.

In this paper we develop a reconfiguration model, the module transition model (MTM), which describes configuration inherently as partial reconfiguration. Naturally, not all resources allocated by a module need to be reconfigured if the new module uses the same configuration for some of these resources. We model the re-use potential of reconfigurable resources at two levels: for binary reconfiguration data and at the structural level.

The model is used in a tool that creates a set of configuration bitstreams that is minimal in terms of reconfiguration time or bitstream size. The optimization complements the techniques mentioned above. Existing methods to create partial bitstreams are the Xilinx modular design flow [8] and the CombitGen tool [9]. We demonstrate that a substantial reduction in reconfiguration costs can be achieved when compared to these methods.

At the structural level, the reconfiguration model provides a measure of similarity between reconfigurable modules. We implemented a tool that optimizes the allocation of a design to device resources such that minimal reconfiguration costs are achieved.

II. MODULE TRANSITION MODEL

The MTM describes the states (configurations) of a device and the transitions (reconfigurations) between states. The device configuration that realizes a reconfigurable module puts the device in a certain state. A transition from one state to another is realized by partial reconfiguration of some of the device resources. The model can be described as a configuration transition graph $G(\mathcal{N}, \mathcal{E})$: each state is represented by a node $n_i \in \mathcal{N}$. A transition from state $n_i$ to state $n_j$ is possible if the graph contains the directed edge $e = (n_i, n_j) \in \mathcal{E}$ with $n_i, n_j \in \mathcal{N}$. We further define a reconfiguration bitmap $r : \mathcal{E} \to \{0, 1\}^m$, where $m$ is the number of configurable device resources. If an element in $r(e)$ is set to 1, the corresponding resource must be reconfigured; if an element is set to 0, the corresponding resource does not need to be reconfigured.

With the configuration transition graph, two important measures can be defined: the total configuration size $s$ is given by the sum of the configuration data required for each state, and the total reconfiguration time $t$ is given by the sum of the reconfiguration times for all transitions:

$$ s = \sum_{n_j \in \mathcal{N}} \Bigl| \bigvee_{(n_i, n_j) \in \mathcal{E}} r((n_i, n_j)) \Bigr| \quad \text{and} \quad t = \sum_{e \in \mathcal{E}} |r(e)|. \tag{1} $$

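The reconfiguration costs are normalized to represent the average configuration size $\bar{s}$ and the average reconfiguration time $\bar{t}$:

$$ \bar{s} = \frac{s}{N} \quad \text{and} \quad \bar{t} = \frac{t}{E} \quad \text{with } N = |\mathcal{N}| \text{ and } E = |\mathcal{E}|. $$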

Both quantities count the number of reconfigurable elements that must be stored for a configuration or that must be loaded on reconfiguration. The quantities can be interpreted for binary configuration data as follows: each reconfigurable element corresponds to one configuration frame. The size of a configuration frame (frame size) is used as an architecture-dependent scaling factor. The average storage size for raw configuration data evaluates to:

$$ s_{\text{data}}\,[\text{KB}] = \bar{s} \times \text{frame size}\,[\text{KB}]. $$

978-1-4244-1985-2/08/$25.00 ©2008 IEEE

Similarly, the average reconfiguration time can be computed by using the configuration frame size and the speed of the configuration interface (data rate) of the device as scaling factors:

$$ t_{\text{data}}\,[\text{s}] = \bar{t} \times \frac{\text{frame size}\,[\text{KB}]}{\text{data rate}\,[\text{KB/s}]}. $$
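The two scaling rules can be sketched in a few lines of Python. The function names and the device parameters below (a 0.5 KB frame and a 50,000 KB/s configuration interface) are hypothetical values chosen only for illustration; they are not taken from any device data sheet.

```python
def storage_size_kb(avg_frames: float, frame_size_kb: float) -> float:
    """Average storage size: average frame count times the frame size."""
    return avg_frames * frame_size_kb


def reconfiguration_time_s(avg_frames: float, frame_size_kb: float,
                           data_rate_kb_per_s: float) -> float:
    """Average reconfiguration time: frame data divided by the interface data rate."""
    return avg_frames * frame_size_kb / data_rate_kb_per_s


# Hypothetical device parameters (assumptions, not from the paper).
FRAME_SIZE_KB = 0.5
DATA_RATE_KB_PER_S = 50_000.0

print(storage_size_kb(4.0, FRAME_SIZE_KB))  # 2.0
print(reconfiguration_time_s(4.0, FRAME_SIZE_KB, DATA_RATE_KB_PER_S))
```

Both functions take the normalized frame counts as input, so the same code applies to any row of a cost comparison once the device parameters are known.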

III. RECONFIGURATION COSTS FOR BINARY CONFIGURATION DATA

In this section, we show the benefits of the MTM when it is applied to binary configuration data of Xilinx Virtex FPGAs. To this end, the model is refined such that the special properties of configuration bitstreams are handled correctly. We developed two new techniques to create bitstreams that are minimal in configuration size and reconfiguration time. We also discuss previous models for dynamic reconfiguration in the context of the MTM.

Dynamic reconfiguration is restricted by the configuration logic in the FPGA. For example, in Xilinx Virtex FPGAs the smallest unit of configuration data is a configuration frame. The size of such a frame depends on the FPGA device. A reconfigurable module usually consists of more than one frame. In partially reconfigurable devices, only frames with different configuration data are reconfigured. Since a frame constitutes the smallest reconfigurable resource at the binary level, $m$ is the number of frames occupied by the reconfigurable modules.

The following example demonstrates the application of the MTM to frame-based partial reconfiguration.

Example 1: Figure 1a shows the configuration transition graph for four modules. If the device is in state $n_1$ and must be reconfigured to establish state $n_2$, frames 3 and 5 must be reconfigured, i.e. $r((n_1, n_2)) = [00101]$.
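As a minimal sketch of Equation 1 applied to this example, the following Python code derives $r(e)$ from the frame configurations printed on the nodes of Fig. 1a and evaluates $s$ and $t$. It assumes full reconfigurability (every ordered pair of states is a transition); the function and variable names are illustrative only.

```python
from itertools import permutations

# Frame configurations of the four states as labeled in Fig. 1a
# (one digit per configuration frame).
STATES = {"n1": "65373", "n2": "65275", "n3": "25356", "n4": "65376"}


def reconfiguration_bitmap(src: str, dst: str) -> tuple:
    """r(e): 1 where the frame contents differ, 0 where they match."""
    return tuple(int(a != b) for a, b in zip(src, dst))


# Assumption: the graph is complete, so E holds every ordered pair of states.
EDGES = {(i, j): reconfiguration_bitmap(STATES[i], STATES[j])
         for i, j in permutations(STATES, 2)}


def total_time(edges) -> int:
    """t: sum of |r(e)| over all transitions (Equation 1)."""
    return sum(sum(r) for r in edges.values())


def total_size(edges, states) -> int:
    """s: per target state, count the frames in the OR of all incoming r(e)."""
    total = 0
    for target in states:
        incoming = [r for (_, dst), r in edges.items() if dst == target]
        union = [max(bits) for bits in zip(*incoming)]  # element-wise OR
        total += sum(union)
    return total


t = total_time(EDGES)
s = total_size(EDGES, STATES)
print(EDGES[("n1", "n2")])                        # (0, 0, 1, 0, 1)
print(s / len(STATES), round(t / len(EDGES), 2))  # 4.0 2.33
```

The printed averages are the normalized costs in configuration frames for the four example states.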

The data of configuration frames is usually contained in bitstreams. Most FPGA-based reconfigurable systems rely on these bitstreams because they can be transferred directly from memory to the configuration interface of the device. Apart from the frame data, bitstreams contain a header, commands for the configuration logic, and a checksum. They can be created from raw frame data only with extra processing overhead. However, there are approaches that modify bitstreams at runtime (e.g. [10]). With the MTM we can create minimal bitstreams for reconfigurable systems with respect to configuration size and reconfiguration time. We assume that a fixed set $\mathcal{B}$ of bitstreams must be created at design time for the reconfigurable system.

[Figure 1: (a) Configuration transition graph for Example 1; (b) configuration transition graph for Example 3. Node labels in both panels: n1 [65373], n2 [65275], n3 [25356], n4 [65376].]

Fig. 1: Configuration transition graphs for Examples 1 and 3: the nodes $n_1, \ldots, n_4$ are labeled with a frame configuration (one frame per digit). The edges are labeled with $r(e)$.

A bitstream can be used to perform a transition in the MTM if it contains at least the necessary configuration frames. $b_{l,n} \in \mathcal{B}$ is a bitstream $l$ that performs a partial reconfiguration to establish state $n$. The function $f : \mathcal{B} \to \{0, 1\}^m$ yields the configuration bitmap for a bitstream. The bitmap marks with 1 all frames that are contained in bitstream $b_{l,n}$. The bitstream $b_{l,n_j}$ can only be used to realize transition $(n_i, n_j) \in \mathcal{E}$ if $f(b_{l,n_j}) \ge r((n_i, n_j))$, where the comparison is made element by element.
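The usability condition can be checked with a short Python helper. This is a sketch that reads the comparison $f(b) \ge r(e)$ as an element-wise comparison of the two bitmaps; the function name `covers` is hypothetical.

```python
def covers(f_b, r_e) -> bool:
    """True if bitstream bitmap f(b) contains every frame that r(e) requires."""
    return all(fb >= re for fb, re in zip(f_b, r_e))


# A bitstream with frames 1, 3, 4, 5 can realize a transition needing frames 3 and 5.
print(covers((1, 0, 1, 1, 1), (0, 0, 1, 0, 1)))  # True
# A bitstream with only frame 5 cannot.
print(covers((0, 0, 0, 0, 1), (0, 0, 1, 0, 1)))  # False
```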

A. Model Capabilities

The primary goal is to use the model for the minimization of reconfiguration costs. We describe two novel methods to obtain minimal configuration bitstreams; both are based on the MTM. The first method yields a set of bitstreams such that the transitions in the MTM can be realized directly. The second method yields a reduced set of bitstreams, for which some transitions are replaced by a sequence of transitions.

a) Reconfiguration by Direct Transition: In this model we assume that each transition in $\mathcal{E}$ is mapped to a bitstream in $\mathcal{B}$, i.e. there exists a function $c : \mathcal{E} \to \mathcal{B}$. This function can be chosen to minimize either the total bitstream size or the total reconfiguration time for a configuration transition graph $G$.

The total bitstream size and the total reconfiguration time are given by:

$$ s = \sum_{b \in \mathcal{B}} |f(b)| \quad \text{and} \quad t = \sum_{e \in \mathcal{E}} |f(c(e))|. \tag{2} $$

Example 2 (continued): Reconfiguration to state $n_1$ with minimal time can be achieved with $f(c((n_2, n_1))) = [00101]$, $f(c((n_3, n_1))) = [10011]$, and $f(c((n_4, n_1))) = [00001]$.
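A minimal sketch of the time-optimal direct-transition mapping for the four example states follows. It assumes that each transition gets a bitstream holding exactly the frames in $r(e)$, and that transitions into the same state with identical bitmaps share one bitstream; this sharing rule and all names are assumptions made for illustration.

```python
from itertools import permutations

# Frame configurations of the four states as labeled in Fig. 1.
STATES = {"n1": "65373", "n2": "65275", "n3": "25356", "n4": "65376"}


def bitmap(src: str, dst: str) -> tuple:
    """r((src, dst)): frames whose contents differ between the two states."""
    return tuple(int(a != b) for a, b in zip(STATES[src], STATES[dst]))


EDGES = {(i, j): bitmap(i, j) for i, j in permutations(STATES, 2)}

# c: E -> B. A bitstream is identified by (target state, frame bitmap), so two
# transitions into the same state with the same bitmap map to one bitstream.
c = {(src, dst): (dst, r) for (src, dst), r in EDGES.items()}
B = set(c.values())

s = sum(sum(f) for _, f in B)         # total bitstream size (Equation 2)
t = sum(sum(c[e][1]) for e in EDGES)  # total reconfiguration time (Equation 2)

print(c[("n3", "n1")][1])                         # (1, 0, 0, 1, 1)
print(s / len(STATES), round(t / len(EDGES), 2))  # 6.5 2.33
```

Under these assumptions the averages agree with the "Direct Transition, Min. Time" row for Example 1 in Table I.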

b) Reconfiguration with a Transition Sequence: In contrast to previous methods, it is also possible to replace the direct transition between states by a sequence of alternative transitions. A sequence of transitions is defined as a path $P$. The substitution of a single transition $e$ by a sequence $P$ of transitions can be exploited to reduce the total bitstream size further, but at the expense of increased reconfiguration times.

For the reconfiguration as a sequence, we seek a subset of transitions $\mathcal{E}' \subset \mathcal{E}$ such that for each $(n_i, n_j) \in \mathcal{E}$ there exists a path $P$. The path $P$ is an ordered set of edges $e' \in \mathcal{E}'$. The function $p$ maps each transition $e \in \mathcal{E}$ to such a path $P$. We require the function $p$ to cause minimal reconfiguration time for any path $P$, hence $\sum_{e' \in P} |f(c(e'))|$ is minimal. The total reconfiguration time in this reconfiguration sequence model is then, with $P = p(e)$:

$$ t = \sum_{e \in \mathcal{E}} \sum_{e' \in P} |f(c(e'))|. \tag{3} $$

Example 3: The configuration transition graph shown in Figure 1b contains only the minimal subset of transitions necessary to realize all transitions from Example 1. A reconfiguration $(n_2, n_1)$ is given by $p((n_2, n_1)) = [(n_2, n_4), (n_4, n_1)]$. With $f(c((n_2, n_4))) = [00101]$ and $f(c((n_4, n_1))) = [00001]$, the reconfiguration time is proportional to 3 configuration frames.
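The path function $p$ can be sketched as a shortest-path search over the reduced edge set. The code below uses Dijkstra's algorithm with edge weights $|f(c(e'))|$. The reduced edge set (every state connected to $n_4$ in both directions) and the choice of weights as the number of differing frames are assumptions made for this illustration; only the path for $(n_2, n_1)$ is taken from Example 3.

```python
import heapq

# Frame configurations of the four states as labeled in Fig. 1.
STATES = {"n1": "65373", "n2": "65275", "n3": "25356", "n4": "65376"}


def frames(src: str, dst: str) -> int:
    """Number of frames that differ between two states."""
    return sum(a != b for a, b in zip(STATES[src], STATES[dst]))


# Assumed reduced edge set E': a star through n4, in both directions.
OTHERS = ("n1", "n2", "n3")
E_PRIME = [(n, "n4") for n in OTHERS] + [("n4", n) for n in OTHERS]
WEIGHT = {e: frames(*e) for e in E_PRIME}


def cheapest_path(src: str, dst: str):
    """Dijkstra over E'; returns (frames loaded, list of edges on the path)."""
    queue = [(0, src, [])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for (a, b), w in WEIGHT.items():
            if a == node and b not in done:
                heapq.heappush(queue, (cost + w, b, path + [(a, b)]))
    raise ValueError(f"no path from {src} to {dst} in E'")


print(cheapest_path("n2", "n1"))  # (3, [('n2', 'n4'), ('n4', 'n1')])
```

Summing the path cost over all transitions in $\mathcal{E}$ would give the total reconfiguration time of Equation 3 for the chosen subset.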

B. Previous Methods

The MTM is able to model two existing reconfiguration models: the Xilinx modular design flow and CombitGen. Both models define exactly one bitstream per module, hence $\forall n_i, n_j \in \mathcal{N} : c((n_i, n_j)) = b_{0,n_j}$. The models also assume full reconfigurability, i.e. $G$ is complete. Thus $N = |\mathcal{B}|$ and $E = N(N - 1)$.

c) Xilinx Modular Design Flow: The standard method to implement reconfigurable modules is based on the Xilinx modular design flow [8]. In a floorplanning step, the designer selects the areas for the reconfigurable modules on the device and defines bus macros, through which the reconfigurable modules connect to the static part of the design and to other modules. The configuration data size and reconfiguration time for a reconfigurable module are defined by the selected area, regardless of the module contents.

The modular design flow includes all configuration frames in the bitstream for a module, i.e. $|f(b)| = m$. Equations 2 reduce to:

$$ s = mN \quad \text{and} \quad t = mN(N - 1). \tag{4} $$

Example 4: In this case, $f(c((n_2, n_1)))$, $f(c((n_3, n_1)))$, and $f(c((n_4, n_1)))$ all equal [11111].

d) CombitGen: The CombitGen tool [9] exploits the fact that modules placed in the same area do not necessarily differ in all configuration frames. Some configuration frames might be equal in all modules, if the associated resources are configured equally or if the resources are not used by the modules. CombitGen processes all modules for the same area and excludes configuration frames that are equal in all modules from the reconfiguration data. If the modules are very densely populated or if the resource configuration differs substantially, no configuration frames will be saved and the reconfiguration costs are equal to those of the Xilinx modular design flow.

TABLE I: Comparison of the different reconfiguration models for Example 1 and the ESM video filter application. The table lists the resulting average reconfiguration times (avg. t) and bitstream sizes (avg. s) for each method. The numbers are given in configuration frames. For the reconfiguration by direct transition, the bitstreams have been optimized to achieve minimal bitstream size or minimal reconfiguration time.

                                  Example 1           ESM Video Filter
Method                            avg. t    avg. s    avg. t    avg. s
Xilinx                            5.00      5.00      88.00     88.00
CombitGen                         4.00      4.00      61.00     61.00
Direct Transition, Min. Size      3.50      4.00      61.00     61.00
Direct Transition, Min. Time      2.33      6.50      44.83     108.50
Transition Sequence, Min. Size    2.58      2.25      63.66     41.25

The CombitGen tool includes in a bitstream all frames that need to be reconfigured for any transition to that module configuration. Equations 5 and 6 compute an element $1 \le k \le m$ of the configuration bitmap:

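$$ f(b_{0,n_j})_k = \bigvee_{n_i \in \mathcal{N} \,\wedge\, (n_i, n_j) \in \mathcal{E}} c((n_i, n_j))_k, \tag{5} $$

in case $G$ is complete, $f(b) = \vec{f}_0$ for all bitstreams:

$$ f_{0,k} = \bigvee_{e \in \mathcal{E}} c(e)_k. \tag{6} $$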

Since at most all frames need to be reconfigured, the following inequality holds:

$$ |\vec{f}_0| \le m. \tag{7} $$

For the CombitGen method, Equations 2 reduce to:

$$ s = |\vec{f}_0|\,N \quad \text{and} \quad t = |\vec{f}_0|\,N(N - 1). \tag{8} $$

Example 5: In this case, $f(c((n_2, n_1)))$, $f(c((n_3, n_1)))$, and $f(c((n_4, n_1)))$ all equal [10111].
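The costs of both previous methods can be reproduced for the four example states with a short Python sketch. It assumes a complete graph and reads the OR in Equations 5 and 6 as an OR over the reconfiguration bitmaps $r(e)$, which is an interpretation on our part; the function names are illustrative.

```python
from itertools import permutations

# Frame configurations of the four states as labeled in Fig. 1.
STATES = {"n1": "65373", "n2": "65275", "n3": "25356", "n4": "65376"}
M = 5                # number of frames occupied by the modules
N = len(STATES)      # number of states


def bitmap(src: str, dst: str) -> tuple:
    """r((src, dst)): frames whose contents differ between the two states."""
    return tuple(int(a != b) for a, b in zip(STATES[src], STATES[dst]))


BITMAPS = [bitmap(i, j) for i, j in permutations(STATES, 2)]


def xilinx_costs(m: int, n: int) -> tuple:
    """Equation 4: every bitstream holds all m frames."""
    return m * n, m * n * (n - 1)


def combitgen_costs(bitmaps, n: int) -> tuple:
    """Equations 6 and 8: frames that differ in any transition are kept."""
    f0 = tuple(max(bits) for bits in zip(*bitmaps))  # element-wise OR
    size = sum(f0)
    return f0, size * n, size * n * (n - 1)


print(xilinx_costs(M, N))           # (20, 60)
print(combitgen_costs(BITMAPS, N))  # ((1, 0, 1, 1, 1), 16, 48)
```

Dividing by $N = 4$ and $E = 12$ gives averages of 5.00 frames for the Xilinx flow and 4.00 frames for CombitGen, in line with the Example 1 columns of Table I.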

In both Equations 4 and 8, $s$ and $t$ differ only by the factor $N - 1$. Hence, the total bitstream size and the total reconfiguration time are minimized at the same time if only one bitstream per module is used.

C. Examples

We investigated the reconfiguration costs computed with the MTM for the module configurations of Example 1 and for a real-world example, a set of four reconfigurable video filters implemented on the ESM platform [11]. We developed a tool that generates bitstreams that are minimal in terms of size or reconfiguration time. The function of the bitstreams was successfully verified on the ESM platform. The results are given in Table I. As expected, the average reconfiguration time and bitstream size are equal if the reconfiguration method of Xilinx or CombitGen is used. However, our method allows us to reduce the average reconfiguration time by almost 50 % when compared to the original Xilinx method (44.83 versus 88.00 frames for the video filters). The penalty is an overhead of 23 % in bitstream size (108.50 versus 88.00 frames). Conversely, the bitstream size can be reduced by more than 50 % if the transition sequence method is used (41.25 versus 88.00 frames). The reconfiguration time in this case is comparable to that of the CombitGen approach (63.66 versus 61.00 frames).


IV. RECONFIGURATION COST ESTIMATION AT STRUCTURAL LEVEL

In this section we demonstrate the benefits of the MTM for the estimation of reconfiguration costs at a higher design level. At the structural level, the modules are given as a graph-based model, the module graph. The module graph of a module $i \in I$ is given by $G_i(V_i, C_i)$. The module graph is equivalent either to the digital circuit or to the data flow graph of the module. The nodes $v \in V_i$ in the module graph represent operations or storage elements that are implemented in the logic resources of the device; the edges $c \in C_i$ describe data transfers or connections that are realized as routes on the switch matrix of the FPGA.

At the structural level, there exists no fixed architecture for which reconfiguration costs can be measured in configuration frames. Instead, a virtual architecture (VA) is used as a basis for the cost functions $s$ and $t$. The VA is a graph $G_{VA}(\mathcal{R}, \mathcal{W})$, with $\mathcal{R}$ being a set of resources and $\mathcal{W}$ a set of routed connections.

The allocation functions are defined for each module $i$ as follows: the node allocation function $a_i : V_i \mapsto \mathcal{R}$ allocates a node from the module graph to a resource in the VA. The edge allocation function $a'_i : C_i \mapsto \mathcal{W}$ maps the edges from the module graph to routed connections in the VA. The entirety of resources and connections in the VA is then given by:

$$ \mathcal{R} = \bigcup_{i \in I} a_i(\forall v \in V_i) \quad \text{and} \quad \mathcal{W} = \bigcup_{i \in I} a'_i(\forall c \in C_i). \tag{9} $$

The MTM describes the use of resources in the VA by different modules. Hence, if wires or logic are configured equally by two modules, no reconfiguration is necessary on a transition between them. The MTM for a VA and the allocation describes in detail which elements need to be reconfigured. Thus, the MTM can be used as a cost function to derive a reconfiguration-optimal allocation.

The implementation of a module on $G_{VA}$ can be represented in vectorized form that describes the configuration of the VA. It is assumed that $\mathcal{R}$ and $\mathcal{W}$ are ordered sets, hence we can define a configuration vector $\vec{c}_i$ for each module $i$ with the length $m = |\mathcal{R}| + |\mathcal{W}|$. The configuration vector defines the resource and connection configuration for the VA to implement a given module. The elements in the configuration vector can describe either the use of the element in the VA (e.g. for connections) or the configuration of that element (e.g. logic configuration). With the configuration vector, the reconfiguration function can be derived for the VA:

$$ r(n_i, n_j)_k = \begin{cases} 1 & \text{if } c(n_i)_k \ne c(n_j)_k \\ 0 & \text{otherwise} \end{cases} \tag{10} $$

Example 6: Figure 2 shows an example set of three module graphs (a)–(c). A possible allocation is given in Table II. For example, the nodes 1, 5, and 10 are allocated to the same resource a. The node mapping also defines the connections in the VA: e.g. the three edges (1, 2), (5, 6), and (10, 8) are mapped to the same connection (a, b).

[Figure 2: module graphs (a) with nodes 1–3, (b) with nodes 4–6, and (c) with nodes 7–10; (d) configuration transition graph with nodes (a) 110, (b) 101, (c) 111 and edge labels 011, 001, 010.]

Fig. 2: The three module graphs (a)–(c) for Example 6. (d) shows the configuration transition graph that results from the allocation given in Table II. The nodes are labeled with the module and the part of the configuration vector that describes the use of connections in the VA.

TABLE II: Allocation of nodes and edges for Example 6.

                 (a)       (b)       (c)        a(v) or a'((v1, v2))    u()
Nodes v          1         5         10         a                       3
                 2         6         8          b                       3
                 3         -         7          c                       2
                 -         4         9          d                       2
Edges (v1, v2)   (1, 2)    (5, 6)    (10, 8)    (a, b)                  3
                 (3, 2)    -         (7, 8)     (c, b)                  2
                 -         (4, 5)    (9, 10)    (d, a)                  2
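The connection part of the configuration vectors and the resulting reconfiguration bitmaps can be derived from the edge allocation with a few lines of Python. This sketch follows the definition from Section II that an element of $r$ is 1 where a resource must be reconfigured, i.e. where the two configuration vectors differ. The assignment of edges to modules is read off the module graphs, and the ordering of the connections is an assumption.

```python
# Edge allocation a'_i: module -> {module graph edge: VA connection}.
EDGE_ALLOC = {
    "a": {(1, 2): ("a", "b"), (3, 2): ("c", "b")},
    "b": {(5, 6): ("a", "b"), (4, 5): ("d", "a")},
    "c": {(10, 8): ("a", "b"), (7, 8): ("c", "b"), (9, 10): ("d", "a")},
}

# W as an ordered set of VA connections (ordering assumed).
W = [("a", "b"), ("c", "b"), ("d", "a")]


def connection_vector(module: str) -> tuple:
    """1 for every VA connection the module uses, 0 otherwise."""
    used = set(EDGE_ALLOC[module].values())
    return tuple(int(w in used) for w in W)


def reconfiguration_bitmap(mi: str, mj: str) -> tuple:
    """1 where the two configuration vectors differ."""
    return tuple(int(x != y) for x, y in
                 zip(connection_vector(mi), connection_vector(mj)))


for module in EDGE_ALLOC:
    print(module, connection_vector(module))
print(reconfiguration_bitmap("a", "b"))  # (0, 1, 1)
```

The three vectors and the pairwise bitmaps correspond to the node and edge labels of the configuration transition graph in Fig. 2(d).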

The VA depends on the allocation itself, and there are many different VAs for each set of modules. The allocation also defines which logic and which routes will be used in the final implementation of the modules.

The reconfiguration costs that result from a given allocation can be computed instantly if the following assumptions are made: (a) the MTM describes full reconfigurability, (b) all elements of the VA can be configured individually, and (c) the configuration vector describes the use of an element in the VA. We define a reuse function $u : \mathcal{W} \mapsto \mathbb{N}$ that describes how often a connection in the VA is used by the configurations of modules. The total configuration size for the interconnect is:

$$ s = \sum_{i \in I} |C_i| - \sum_{w \in \mathcal{W}} \bigl(u(w) - 1\bigr) = |\mathcal{W}| \tag{11} $$

and the total reconfiguration time for the interconnect is:

$$ t = 2(N - 1) \sum_{i \in I} |C_i| - \sum_{w \in \mathcal{W}} 2\bigl(u(w) - 1\bigr)u(w). \tag{12} $$

The configuration costs for resources can be computed similarly.

Example 7: In Table II, the reuse function $u$ is given for resources and connections. The connection (c, b) is used in two configurations, hence $u((c, b)) = 2$.
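Equations 11 and 12 can be evaluated for the interconnect of Example 6 as follows. The sketch takes the reuse values and the number of edges per module from Table II; the cross-check at the end counts, for every connection used by $u$ of the $N$ modules, the $2u(N - u)$ ordered module pairs in which its use differs. That counting argument is our own illustration, not part of the paper.

```python
# Reuse function u for the three VA connections (Table II).
U = {("a", "b"): 3, ("c", "b"): 2, ("d", "a"): 2}
# Number of edges |C_i| in each module graph.
EDGE_COUNTS = {"a": 2, "b": 2, "c": 3}
N = len(EDGE_COUNTS)

total_edges = sum(EDGE_COUNTS.values())

# Equation 11: total configuration size for the interconnect.
s = total_edges - sum(u - 1 for u in U.values())

# Equation 12: total reconfiguration time for the interconnect.
t = 2 * (N - 1) * total_edges - sum(2 * (u - 1) * u for u in U.values())

# Cross-check: a connection used by u of N modules differs in 2*u*(N-u)
# ordered module pairs under full reconfigurability.
t_check = sum(2 * u * (N - u) for u in U.values())

print(s, len(U))   # 3 3
print(t, t_check)  # 8 8
```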

Since the input graphs are fixed, the reconfiguration costs can be reduced with an optimal allocation function that maximizes the reuse of elements in the VA and hence minimizes $s$ and $t$.

TABLE III: Resource usage and reconfiguration costs for three task sets, achieved with a simulated annealing optimization. The table shows the average resource use (Module Size) and the reconfigurable resources (Configuration Size) for two optimization criteria: minimal bitstream size (min s) and minimal reconfiguration time (min t).

Module Set          Module Size          Configuration Size
(No. of Modules)                         min s                min t
                    Slices    Wires      Slices    Wires      Slices    Wires
rgb (2)             218.50    844.00     92.50     436.00     92.50     436.00
edge (3)            167.33    562.67     15.33     90.67      15.33     85.33
operators (8)       126.75    284.00     59.75     164.00     44.14     140.00

A. Examples

The MTM provides a reasonable model to optimize module implementations at the structural level. In the following example, we optimized the allocation of a data flow graph to achieve low reconfiguration costs. The nodes of the data flow graph were scheduled and bound to functional macros. The allocation to macro instances determines the area and the connectivity of the resulting data path. The data paths of different modules can be compared in terms of macro instances and connectivity to assess the expected reconfiguration costs, based on the VA. We employed a simulated annealing technique to achieve a solution with minimal cost.

For the simulated annealing, we used two different cost functions: first, we minimized the total configuration size (min s); second, we minimized the total reconfiguration time (min t). The reconfigurable resources were measured in terms of configured slices (logic) and wires (connections).
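The following Python sketch illustrates how a simulated annealing search over node allocations could use the interconnect size $|\mathcal{W}|$ from Equation 11 as its cost function. It is not the tool described here: the move set (swapping two resources within one module), the geometric cooling schedule, the number of VA resources, and the use of the small module graphs of Example 6 as input are all assumptions made for illustration.

```python
import math
import random

# Module graphs of Example 6, given as edge lists (used here as toy input).
MODULES = {
    "a": [(1, 2), (3, 2)],
    "b": [(5, 6), (4, 5)],
    "c": [(10, 8), (7, 8), (9, 10)],
}
NUM_RESOURCES = 4  # assumed size of the VA resource pool


def nodes_of(edges):
    return sorted({v for e in edges for v in e})


def cost(alloc) -> int:
    """|W|: number of distinct VA connections used by all modules."""
    return len({(alloc[m][v1], alloc[m][v2])
                for m, edges in MODULES.items() for v1, v2 in edges})


def anneal(steps=5000, t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    # Initial allocation: the nodes of each module take resources 0, 1, 2, ...
    alloc = {m: {v: k for k, v in enumerate(nodes_of(e))}
             for m, e in MODULES.items()}
    current = cost(alloc)
    best, best_cost = {m: dict(a) for m, a in alloc.items()}, current
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        m = rng.choice(sorted(MODULES))
        r1, r2 = rng.sample(range(NUM_RESOURCES), 2)
        swap = {r1: r2, r2: r1}
        old = alloc[m]
        # Swapping two resources keeps the allocation injective per module.
        alloc[m] = {v: swap.get(r, r) for v, r in old.items()}
        new = cost(alloc)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new
            if new < best_cost:
                best, best_cost = {k: dict(a) for k, a in alloc.items()}, new
        else:
            alloc[m] = old  # reject the move
    return best, best_cost


best, best_cost = anneal()
print(best_cost)
```

Replacing `cost` with an evaluation of Equation 12 would turn the same loop into a search for minimal reconfiguration time.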

Table III shows the results for three module sets: rgb contains an RGB to YCrCb color conversion, edge contains a horizontal, a vertical, and a 2D Sobel filter, and operators contains a 32-bit math operator in each module (+, −, 2×, ×, min, max, abs−, abs+). The results show that a considerable amount of logic and connections can be re-used in the configurations, which reduces the reconfiguration cost in the implementation. The module set operators highlights the difference between the resources that differ across all modules, which must be stored as configuration data (59.75 slices/164 wires), and the resources that differ between individual modules, which need reconfiguration (44.14 slices/140 wires).

V. SUMMARY

We have described the MTM to analyse reconfiguration costs at different levels of abstraction. The model provides measures to evaluate the total bitstream size and the total reconfiguration time, which are not necessarily equivalent. When the model is applied to binary configuration data, absolute costs can be measured. For this case we provided an extension that covers the use of a finite set of bitstreams. The configuration bitstreams that realize all transitions can be minimal in reconfiguration time or in terms of configuration costs.

The model can also be used to estimate the reconfiguration costs at a higher design level. The benefits are twofold. First, the model provides a measure for the similarity of modules, i.e. the reconfiguration costs that are inherent in the modules. Second, the identified similarity can be passed on to the implementation tools. The tools can use the similarity information to generate cost-optimal implementations that decrease the reconfiguration costs at the binary level [12]. The analysis at a higher level has considerable advantages: the resource configuration can be optimized with much better granularity, and the operations have no fixed allocation. The MTM can be used to determine the allocation that minimizes the reconfiguration costs. Another application is reconfigurable architecture design: the VA for a given allocation describes a reconfigurable architecture that can implement the module set with minimum reconfiguration costs.

REFERENCES

[1] H. Walder and M. Platzner, “Online scheduling for block-partitioned reconfigurable devices,” in Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 290–295.

[2] S. Hauck, “Configuration prefetch for single context reconfigurable coprocessors,” in Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, 1998, pp. 65–74.

[3] Z. Li, K. Compton, and S. Hauck, “Configuration caching management techniques for reconfigurable computing,” in FCCM, 2000, pp. 22–38.

[4] P. Stepien and M. Vasilko, “On feasibility of FPGA bitstream compression during placement and routing,” Field Programmable Logic and Applications, 2006. FPL ’06. International Conference on, pp. 1–4, August 2006.

[5] M. Huebner, M. Ullmann, F. Weissel, and J. Becker, “Real-time configuration code decompression for dynamic FPGA self-reconfiguration,” Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, April 2004.

[6] J. H. Pan, T. Mitra, and W.-F. Wong, “Configuration bitstream compression for dynamically reconfigurable FPGAs,” Computer Aided Design, 2004. ICCAD-2004. IEEE/ACM International Conference on, pp. 766–773, November 2004.

[7] Z. Li and S. Hauck, “Configuration compression for Virtex FPGAs,” in Field-Programmable Custom Computing Machines, 2001. FCCM ’01. The 9th Annual IEEE Symposium on, 2001, pp. 147–159.

[8] XAPP290 – Two Flows for Partial Reconfiguration: Module Based or Difference Based, Xilinx Inc., September 2004.

[9] C. Claus, F. H. Müller, J. Zeppenfeld, and W. Stechele, “A new framework to accelerate Virtex-II Pro dynamic partial self-reconfiguration,” in IEEE International Parallel and Distributed Processing Symposium, 2007. IPDPS 2007., March 2007, pp. 1–7.

[10] H. Kalte, G. Lee, M. Porrmann, and U. Rückert, “Replica: A bitstream manipulation filter for module relocation in partial reconfigurable systems,” Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International, April 2005.

[11] D. Göhringer, M. Majer, and J. Teich, “Bridging the gap between relocatability and available technology: The Erlangen Slot Machine,” in Dynamically Reconfigurable Architectures, ser. Dagstuhl Seminar Proceedings, P. M. Athanas, J. Becker, G. Brebner, and J. Teich, Eds. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2006.

[12] M. Rullmann and R. Merker, “A reconfiguration aware circuit mapper for FPGAs,” in IEEE International Parallel & Distributed Processing Symposium - IPDPS 2007, 14th Reconfigurable Architectures Workshop, 2007.
