
UNU/IIST International Institute for Software Technology

UNU/IIST Report No. 286 R

An Optimal Approach to Hardware/software Partitioning for Synchronous Model

Pu Geguang, Wang Yi, Dang Van Hung and He Jifeng

September 2003


UNU/IIST and UNU/IIST Reports

UNU/IIST (United Nations University International Institute for Software Technology) is a Research and Training Centre of the United Nations University (UNU). It is based in Macau, and was founded in 1991. It started operations in July 1992. UNU/IIST is jointly funded by the Governor of Macau and the governments of the People’s Republic of China and Portugal through a contribution to the UNU Endowment Fund. As well as providing two-thirds of the endowment fund, the Macau authorities also supply UNU/IIST with its office premises and furniture and subsidise fellow accommodation.

The mission of UNU/IIST is to assist developing countries in the application and development of software technology.

UNU/IIST contributes through its programmatic activities:

1. Advanced development projects, in which software techniques supported by tools are applied,

2. Research projects, in which new techniques for software development are investigated,

3. Curriculum development projects, in which courses of software technology for universities in developing countries are developed,

4. University development projects, which complement the curriculum development projects by aiming to strengthen all aspects of computer science teaching in universities in developing countries,

5. Schools and Courses, which typically teach advanced software development techniques,

6. Events, in which conferences and workshops are organised or supported by UNU/IIST, and

7. Dissemination, in which UNU/IIST regularly distributes to developing countries information on international progress of software technology.

Fellows, who are young scientists and engineers from developing countries, are invited to actively participate in all these projects. By doing the projects they are trained.

At present, the technical focus of UNU/IIST is on formal methods for software development. UNU/IIST is an internationally recognised center in the area of formal methods. However, no software technique is universally applicable. We are prepared to choose complementary techniques for our projects, if necessary.

UNU/IIST produces a report series. Reports are either Research (R), Technical (T), Compendia (C) or Administrative (A). They are records of UNU/IIST activities and research and development achievements. Many of the reports are also published in conference proceedings and journals.

Please write to UNU/IIST at P.O. Box 3058, Macau or visit UNU/IIST’s home page: http://www.iist.unu.edu, if you would like to know more about UNU/IIST and its report series.

Chris George, Acting Director


Abstract

Computer-aided hardware/software partitioning is one of the key challenges in hardware/software co-design. This paper describes a new approach to hardware/software partitioning for the synchronous communication model. We transform the partitioning into a reachability problem of timed automata. By means of an optimal reachability algorithm, an optimal solution can be obtained under limited hardware resources. To relax the initial condition of the partitioning for optimization, two algorithms are designed to explore the dependency relations among processes in the initial specification. Moreover, we propose a scheduling algorithm to further improve the synchronous communication efficiency after the partitioning stage. Some experiments are conducted with the model checker UPPAAL to show that our approach is both effective and efficient.


Pu Geguang is a fellow at UNU/IIST on leave from the School of Mathematical of Beijing University, China, where he is a Ph.D. candidate. His research interests include programming theory, co-design techniques for real-time systems, and object-oriented technology.

Wang Yi is a professor at Uppsala University, Sweden. His research interests are mainly in methods and tools for the design, verification and implementation of embedded and real-time systems.

Dang Van Hung is a research fellow of UNU/IIST, on leave of absence from the Institute of Information Technology, Hanoi, Vietnam. His research interests include formal techniques of programming, concurrent and distributed systems, and real-time systems.

He Jifeng is a senior research fellow of UNU/IIST. He is also a professor of computer science at the Software Engineering Institute of East China Normal University. His research interests include the theories of programming languages, formal methods for safety-critical systems, parallel and distributed systems, component-oriented computing, and co-design techniques for real-time embedded systems.

Copyright © 2003 by UNU/IIST, Pu Geguang, Wang Yi, Dang Van Hung and He Jifeng


Contents

1 Introduction

2 Overview of Our Partitioning Approach

3 Exploring Dependency Relations between Processes
3.1 Dependency Relations
3.2 Algorithms for Exploring Dependency Relations

4 Formal Model Construction
4.1 Behaviour Analysis
4.2 Model Construction
4.3 An Optimal Reachability Algorithm

5 Improving the Communication Efficiency
5.1 System Modelling
5.2 Scheduling Algorithm

6 Experiments in UPPAAL

7 Conclusion

Report No. 286, September 2003 UNU/IIST, P.O. Box 3058, Macau


1 Introduction

A computer system specification is usually implemented entirely as a software solution. However, stringent performance requirements may demand an implementation fully in hardware. Between these two extremes, Hardware/Software Codesign [24], which systematically studies the design of systems containing both hardware and software components, has emerged as an important field. A critical phase of the codesign process is to partition a specification into hardware and software components.

One of the objectives of hardware/software partitioning is to find a reasonable composition of hardware and software components that not only satisfies constraints such as timing, but also optimizes desired quality metrics such as communication cost and power.

Several algorithmic approaches have been developed, as described, for example, in [2, 20, 21, 23, 25]. All of them emphasize the algorithmic aspects: for instance, integer programming [20, 25], evolutionary algorithms [23] and simulated annealing [21] have been applied to the partitioning process in previous research. These approaches target different architectures and cost functions. For example, in [25], Markus provides a technique based on integer programming to minimise the communication cost and the total execution time in hardware for a certain physical architecture. The common feature of these approaches is that the communication cost is simplified as a linear function of the data transfer or of the relation between adjacent nodes in the task graph. This assumption is reasonable in the asynchronous communication model, but not in the synchronous communication model, in which the cost of waiting for communication between processes is very high. To handle the synchronous model, which is common in many practical systems, a new approach to the partitioning problem is needed.

A number of papers in the literature have introduced formal methods into the partitioning process [17, 22, 3]. Some of them adopt a subset of the Occam language [16] as the specification language [17, 22]. For example, in [22], Qin provided a formal strategy for carrying out the partitioning phase automatically, and presented a set of algebraic laws to prove the correctness of the partitioning process. That work, however, did not deal with the optimization of the partitioning.

Few approaches analyse the initial specification to explore its hidden concurrency and thereby relax the initial condition of the partitioning for optimization. In [17], Juliano et al. provide several algebraic laws to transform the initial description of the system into a parallel composition of a number of simple processes. However, this method delivers a large number of processes and communication channels, which not only makes merging those small processes more difficult, but also raises the communication load between the hardware and software components.

Figure 1: Architecture and Partitioning Flow [figure not reproduced; it shows the partitioning flow, in which the input Program passes through the stages Profiling, Analyzing, Partitioning and Adjusting, and the target architecture, in which the software (SW) and hardware (HW) components each have a local bus and local memory and are connected by a system bus with synchronous communication]

In this paper, we propose an optimal automatic partitioning model. In this model, we adopt an abstract synchronous architecture composed of a coprocessor board and a hardware device such as an FPGA or ASIC, where the communication between them is synchronous. By means of our approach, the following goals will be achieved:

• Explore the hidden concurrency, i.e., find the processes that could be executed in parallel from the initial sequential specification.

• Obtain the optimal performance of the overall program under the limited hardware resources, taking the communication waiting time between software and hardware components into account.

• Improve the communication efficiency after partitioning by moving communication commands to reduce the communication waiting time between hardware and software components.

Given a specification, system designers are required to divide it into a set of basic processes (blocks), which are regarded as candidate processes for the partitioning phase. The next step is to select some processes and put them into hardware components to obtain the best performance. Because the software and hardware components run in parallel, the hidden concurrency among the processes relaxes the precedence conditions of the partitioning; that is, an optimal solution can be sought in a larger search space. We design two algorithms to explore the control and data flow dependencies. To allocate the processes to the software and hardware components, we transform the partitioning into a reachability problem of timed automata [10], and obtain the best solution by means of an optimal reachability algorithm. Since the synchronous communication model is adopted in our target architecture, we further reduce the communication waiting time by adjusting the communication commands in the program of each component with a scheduling algorithm.

The paper is organized as follows. Section 2 presents an overview of our technique. Section 3 explores the dependency relations between processes. Section 4 describes a formal model of hardware/software partitioning using timed automata. In Section 5, we propose a scheduling algorithm to improve the communication efficiency. Some partitioning experiments are conducted in Section 6. Finally, Section 7 is a short summary of the paper.


2 Overview of Our Partitioning Approach

In this section we present our approach to the hardware/software partitioning problem. The partitioning flow is depicted in Figure 1.

In the profiling stage, a system specification is divided into a set of basic candidate processes which are not split further. However, there is a trade-off between the granularity of the candidate processes and the feasibility of optimisation. The smaller the candidate processes are, the greater the number of different partitions is; with n candidate processes there are already 2^n ways to allocate them to software or hardware. The large number of partitions may increase the complexity of computing the optimum. Furthermore, small candidate processes bring heavy communication cost. On the other hand, larger candidate processes restrict the number of possible partitions, but may reduce the concurrency and increase the waiting time for communication. We leave this choice to the designers. Our approach enables them to repeat the profiling process as long as they are not satisfied with the software/hardware partitioning results at the current granularity of candidate processes. Once the designers decide on the process granularity, the initial specification is transformed into a sequential composition of candidate processes, that is, P1; P2; . . . ; Pn, where Pi denotes the ith candidate process.

The analyzing phase in Figure 1 explores the control and data flow dependencies among the sequential processes P1; P2; . . . ; Pn. The data flow dependency is as important as the control flow dependency, and helps to decide whether data transfer occurs between any two processes. The details are discussed in Section 3.

Our goal is to select those processes which yield the highest speedup if moved to hardware; namely, the total execution time of the initial program is minimised under the limited hardware resources. The overhead of the required communication between the software and the hardware should be included, and the synchronous waiting time is considered in the performance of the partitioning as well. In fact, this partitioning turns out to be a scheduling problem constrained by the precedence relation, synchronous communication and limited resources. We transform the scheduling problem into a reachability problem of timed automata (TA) [10] and obtain the optimal result by means of an optimal reachability algorithm. A timed automaton is a finite-state automaton extended with clock variables. TA have proven to be a useful formalism for modelling real-time systems [18]. The verification of a system is usually converted to checking reachability properties, i.e. whether a certain state is reachable or not. In recent years several automatic model checking tools for timed automata have become available, such as UPPAAL [19], KRONOS [7] and HyTech [13]. We use UPPAAL as our modelling tool to conduct some partitioning experiments in Section 6.

When the partitioning process is finished, we obtain a program of the following form:

$$P_{S_1}; P_{S_2}; \ldots; P_{S_m} \;\|\; P_{H_1}; P_{H_2}; \ldots; P_{H_l}$$

where the P_{S_i} (1 ≤ i ≤ m) denote the processes allocated to the software component, and the P_{H_i} (1 ≤ i ≤ l) denote the processes allocated to the hardware component.


At the end of the partitioning phase, communication commands are added to support data exchange between the software and hardware components. To reduce the waiting time, we reorganize the software and hardware components by moving those added communication commands. For example, consider the following partitioned processes P and Q.

    P                   ||   Q

    P1(x);                   Q1(z);
    y := f(x);               q := g(z);
    P2(x);                   Q2(z);
    C!y;                     C?q;
    P3(x);                   Q3(z);

Suppose process P is implemented in software and process Q in hardware, where C!y denotes the output of y on channel C and C?q the input into q. In process P, moving the action C!y before P2(x) or after P3(x), and in process Q, moving the action C?q before Q2(z) or after Q3(z), will not affect the result of the program P || Q. Assume that the estimated execution times of P1, P2 and P3 are 2, 2 and 2 respectively, and those of Q1, Q2 and Q3 are 1, 1 and 1 respectively. Treating the assignments as instantaneous, Q reaches C?q at time 2 while P reaches C!y only at time 4, so Q waits for 2 time units. Moving C!y to the line between y := f(x) and P2(x) removes this wait and makes the program run faster. We will propose a general algorithm, applicable to more than two parallel processes, to improve the communication efficiency.
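The timing argument in this example can be checked with a small sketch. It assumes, as the example does implicitly, that the assignments y := f(x) and q := g(z) take no time and that each process performs exactly one rendezvous on C; the helper name `rendezvous_times` and the step encoding are hypothetical.

```python
def rendezvous_times(p_steps, q_steps):
    """Timing of two sequential processes that meet once on a synchronous channel.

    Each process is a list of steps: a number is a computation of that duration,
    and the string "SYNC" is the rendezvous (C!y on one side, C?q on the other).
    Returns (finish time of P, finish time of Q, time spent blocked at SYNC).
    """
    def split_at_sync(steps):
        elapsed = 0
        for k, step in enumerate(steps):
            if step == "SYNC":
                return elapsed, steps[k + 1:]
            elapsed += step
        raise ValueError("process has no SYNC step")

    tp, p_rest = split_at_sync(p_steps)
    tq, q_rest = split_at_sync(q_steps)
    meet = max(tp, tq)        # synchronous: whoever arrives first blocks
    waiting = abs(tp - tq)    # blocking time of the process that arrived first
    return meet + sum(p_rest), meet + sum(q_rest), waiting


# P = P1; y := f(x); P2; C!y; P3      Q = Q1; q := g(z); Q2; C?q; Q3
before = rendezvous_times([2, 0, 2, "SYNC", 2], [1, 0, 1, "SYNC", 1])
# C!y moved between y := f(x) and P2(x)
after = rendezvous_times([2, 0, "SYNC", 2, 2], [1, 0, 1, "SYNC", 1])
print(before)  # (6, 5, 2)
print(after)   # (6, 3, 0)
```

Under these assumptions the move removes the 2 time units Q spends blocked at C?q and lets Q finish at time 3 instead of 5; P's own completion time stays 6, since P never waits in either version.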

3 Exploring Dependency Relations between Processes

3.1 Dependency Relations

Let P1; P2; . . . ; Pn be the initial sequential specification produced by the system designer in the profiling stage. In this section we explore the dependency relations between any two processes. This is an important step in the analyzing phase. Our intention is to disclose the control and data flow relations of processes in the specification. These relations are passed to the next step as input for partitioning with the timed automata model. Moreover, through the analysis of the control relation among processes, we find those processes that are independent, so that they can be executed in any order on one component or in parallel on two components without any change to the computation result specified by the original specification.

For a process Pi, let Wr(Pi) and Re(Pi) denote the set of variables modified by Pi and the set of variables read by Pi, respectively.

The control flow dependency is represented by the relation ≺ over two processes, defined as follows.


Definition 1

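$$P_i \prec P_j \;\stackrel{\text{def}}{=}\; \big(Wr(P_i) \cap Re(P_j) \neq \emptyset \;\lor\; Re(P_i) \cap Wr(P_j) \neq \emptyset \;\lor\; Wr(P_i) \cap Wr(P_j) \neq \emptyset\big) \land i < j$$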

We call Pi a control predecessor of Pj. If Pi ≺ Pi+1 holds, then process Pi+1 cannot start before process Pi completes. Otherwise, Pi+1 can be activated before Pi, leaving the behaviour of the whole program unchanged.
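Definition 1 depends only on the read and write sets, so it can be checked mechanically. A minimal Python sketch, assuming processes are identified by their index and that Wr and Re are given as dictionaries of variable-name sets (`writes`, `reads` and `control_pred` are illustrative names, not from the report):

```python
from typing import Dict, Set

def control_pred(i: int, j: int,
                 writes: Dict[int, Set[str]], reads: Dict[int, Set[str]]) -> bool:
    """Definition 1: P_i ≺ P_j iff i < j and the two processes conflict.

    A conflict is a write/read, read/write or write/write overlap.
    """
    if not i < j:
        return False
    return bool(writes[i] & reads[j]
                or reads[i] & writes[j]
                or writes[i] & writes[j])


writes = {1: {"a"}, 2: {"b"}, 3: {"c"}}
reads = {1: {"x"}, 2: {"x"}, 3: {"a", "b"}}
print(control_pred(1, 2, writes, reads))  # False: P1 and P2 only share the read of x
print(control_pred(1, 3, writes, reads))  # True: P3 reads a, which P1 writes
```

Sharing only read variables, as P1 and P2 do on x, is not a conflict; this is exactly the situation in which Theorem 1 allows the two processes to be swapped.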

Theorem 1

$$P_i; P_{i+1} = P_{i+1}; P_i \quad \text{if } \neg(P_i \prec P_{i+1})$$

Proof:

To prove this property formally we follow the convention of Hoare and He [14] that every program can be represented as a design. A design has the form pre ⊢ post, where pre denotes the precondition and post denotes the postcondition. Sequential composition is formally defined as follows [14]:

$$P(v, v'); Q(v, v') \;\stackrel{\text{def}}{=}\; \exists m \bullet P(v, m) \land Q(m, v')$$

where the variable lists v and v′ stand for initial and final values respectively, and m is a set of fresh variables which denote the hidden observation.

The following lemma is taken from [14]:

Lemma 1

$$(P_1 \vdash Q_1); (P_2 \vdash Q_2) = \big(P_1 \land \neg(Q_1; \neg P_2)\big) \vdash (Q_1; Q_2)$$

Because processes Pi and Pi+1 do not satisfy the relation Pi ≺ Pi+1, we easily obtain Wr(Pi) ∩ (Wr(Pi+1) ∪ Re(Pi+1)) = ∅ and Wr(Pi+1) ∩ (Wr(Pi) ∪ Re(Pi)) = ∅. For simplicity, assume Wr(Pi) = {y}, Wr(Pi+1) = {z} and Re(Pi) ∩ Re(Pi+1) = {x}, where x, y and z each stand for a list of variables. Let Pi = pre1 ⊢ post1 and Pi+1 = pre2 ⊢ post2. We can write pre1, post1, pre2 and post2 as follows:

$$
\begin{aligned}
Pr_1 &= pre_1(x, y) & \qquad Po_1 &= post_1(x, y, y') \land x = x' \land z = z'\\
Pr_2 &= pre_2(x, z) & \qquad Po_2 &= post_2(x, z, z') \land x = x' \land y = y'
\end{aligned}
$$

From the above, we easily obtain:

$$P_i = pre_1 \vdash post_1 = Pr_1 \vdash Po_1 \qquad P_{i+1} = pre_2 \vdash post_2 = Pr_2 \vdash Po_2$$

(1)

$$
\begin{aligned}
&Po_1; Po_2\\
&= \{\text{rewrite } Po_1 \text{ and } Po_2\}\\
&\quad \big(post_1(x, y, y') \land x = x' \land z = z'\big);\ \big(post_2(x, z, z') \land x = x' \land y = y'\big)\\
&= \{\text{def of };\}\\
&\quad \exists m_1, m_2, m_3 \bullet post_1(x, y, m_2) \land x = m_1 \land z = m_3 \land post_2(m_1, m_3, z') \land m_1 = x' \land m_2 = y'\\
&= \{\text{predicate calculus}\}\\
&\quad \exists m_1, m_2, m_3 \bullet post_1(x, y, m_2) \land m_1 = x' \land m_2 = y' \land post_2(m_1, m_3, z') \land x = m_1 \land z = m_3\\
&= \{\text{predicate calculus}\}\\
&\quad post_1(x, y, y') \land post_2(x, z, z') \land x = x'
\end{aligned}
$$

(2)

$$
\begin{aligned}
&Po_2; Po_1\\
&= \{\text{rewrite } Po_1 \text{ and } Po_2\}\\
&\quad \big(post_2(x, z, z') \land x = x' \land y = y'\big);\ \big(post_1(x, y, y') \land x = x' \land z = z'\big)\\
&= \{\text{def of };\}\\
&\quad \exists m_1, m_2, m_3 \bullet post_2(x, z, m_3) \land x = m_1 \land y = m_2 \land post_1(m_1, m_2, y') \land m_1 = x' \land m_3 = z'\\
&= \{\text{predicate calculus}\}\\
&\quad \exists m_1, m_2, m_3 \bullet post_2(x, z, m_3) \land m_1 = x' \land m_3 = z' \land post_1(m_1, m_2, y') \land x = m_1 \land y = m_2\\
&= \{\text{predicate calculus}\}\\
&\quad post_2(x, z, z') \land post_1(x, y, y') \land x = x'
\end{aligned}
$$

(3) From (1) and (2), we establish

$$Po_1; Po_2 = Po_2; Po_1$$

In the same way, we can prove the following equation:

$$Pr_1 \land \neg(Po_1; \neg Pr_2) = Pr_2 \land \neg(Po_2; \neg Pr_1)$$

From Lemma 1, the theorem is proved. □
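The commutation step Po1; Po2 = Po2; Po1 can be sanity-checked by brute force on a finite domain. The sketch below is not part of the proof: it models designs with trivially true preconditions as sets of (initial state, final state) pairs over a state (x, y, z) with values in {0, 1}, implements sequential composition exactly as the ∃m definition above, and uses hypothetical postconditions chosen only for illustration.

```python
from itertools import product

D = range(2)                               # illustrative value domain
STATES = list(product(D, repeat=3))        # states (x, y, z)

def seq(r, s):
    """Relational sequential composition: (v, w) iff some m has (v, m) in r and (m, w) in s."""
    return {(v, w) for (v, m) in r for (m2, w) in s if m == m2}

def writes_y(post):
    """Po = post(x, y, y') and x' = x and z' = z."""
    return {((x, y, z), (x, y1, z)) for (x, y, z) in STATES for y1 in D if post(x, y, y1)}

def writes_z(post):
    """Po = post(x, z, z') and x' = x and y' = y."""
    return {((x, y, z), (x, y, z1)) for (x, y, z) in STATES for z1 in D if post(x, z, z1)}

# Hypothetical independent processes: P_i writes y (nondeterministically), P_{i+1} writes z.
Po1 = writes_y(lambda x, y, y1: y1 >= x)
Po2 = writes_z(lambda x, z, z1: z1 == (x + z) % 2)
print(seq(Po1, Po2) == seq(Po2, Po1))      # True

# If P_{i+1} also read y (so that P_i ≺ P_{i+1}), the order would matter.
Po3 = {((x, y, z), (x, y, y)) for (x, y, z) in STATES}   # z' = y
print(seq(Po1, Po3) == seq(Po3, Po1))      # False
```

The second comparison fails, for example, from the state (1, 0, 0): running Po1 first forces y to 1 before Po3 copies it into z, while running Po3 first copies the old value 0.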

Let the set Sj (1 ≤ j ≤ n) store all the control predecessors of Pj, and let the constant maxj be the maximum index of the processes in Sj. To uncover the hidden concurrency among the processes, we have the following corollary.

Corollary

$$P_1; \ldots; P_{max_j}; P_{max_j+1}; \ldots; P_j; \ldots; P_n \;=\; P_1; \ldots; P_{max_j}; P_j; P_{max_j+1}; \ldots; P_{j-1}; P_{j+1}; \ldots; P_n \quad \text{if } max_j < j - 1$$

Proof: Apply Theorem 1 (j − 1 − maxj) times. □

This corollary shows that each process Pk with maxj < k < j could be executed in parallel with process Pj. If processes Pk and Pj are allocated to the software and hardware components respectively, this can reduce the execution time of the whole program.
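The corollary can be applied directly once the control predecessors are known. The sketch below assumes the sets Sj are given as sets of indices (`preds[j]`) and treats an empty Sj as maxj = 0, a convention the report does not state; `parallel_window` and `hoist` are hypothetical helper names.

```python
from typing import Dict, List, Set

def parallel_window(j: int, preds: Dict[int, Set[int]]) -> List[int]:
    """Indices k with max_j < k < j: processes that may run in parallel with P_j."""
    max_j = max(preds[j], default=0)
    return list(range(max_j + 1, j))

def hoist(names: List[str], j: int, preds: Dict[int, Set[int]]) -> List[str]:
    """Corollary: move P_j to just after P_{max_j}; names[k - 1] holds P_k."""
    max_j = max(preds[j], default=0)
    return names[:max_j] + [names[j - 1]] + names[max_j:j - 1] + names[j:]


preds = {1: set(), 2: {1}, 3: {2}, 4: {1}}   # S_4 = {P1}, so max_4 = 1
names = ["P1", "P2", "P3", "P4"]
print(parallel_window(4, preds))  # [2, 3]
print(hoist(names, 4, preds))     # ['P1', 'P4', 'P2', 'P3']
```

Here P4 only depends on P1, so it may run alongside P2 and P3; placing it on the other component is what the partitioning step can exploit.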


Table 1: Control and Data Flow Dependency Algorithms

Control flow dependency algorithm
  Input: P1; P2; . . . ; Pn
  Output: set vector S
  Method:
    for j := 1 to n
      Sj := ∅;
    end for
    for j := 2 to n
      for i := 1 to j − 1
        if (Pi ≺ Pj)
          Sj := Sj ∪ {Pi};
        end if
      end for
    end for

Data flow dependency algorithm
  Input: P1; P2; . . . ; Pn
  Output: set vector T
  Method:
    for i := n to 1
      Ti := ∅;
    end for
    for i := n to 1
      j := i − 1;
      while (Re(Pi) ≠ ∅ ∧ j > 0)
        G := Re(Pi) ∩ Wr(Pj);
        if (G ≠ ∅)
          Ti := Ti ∪ {Pj};
          Re(Pi) := Re(Pi) \ G
        end if
        j := j − 1;
      end while
    end for

To capture the data flow specified by the initial specification more precisely, we introduce the relation $\overset{d}{\prec}$ between processes, which is exactly the "read-from" relation in the theory of concurrency control in databases.

Definition 2

$$P_i \overset{d}{\prec} P_j \;=\; i < j \land \exists x \bullet \big(x \in Wr(P_i) \cap Re(P_j) \land \forall k \bullet (i < k < j \Rightarrow x \notin Wr(P_k))\big)$$

If processes Pi and Pj satisfy the relation $\overset{d}{\prec}$, there is direct data transfer between them in any execution. We call process Pi a data predecessor of process Pj. Through this relation, we know from which processes a process needs data, and can estimate the communication time between them.
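Definition 2 can likewise be evaluated directly: Pi is a data predecessor of Pj when some variable written by Pi and read by Pj is not overwritten by any process in between. A brute-force Python sketch, using hypothetical `writes`/`reads` dictionaries keyed by process index:

```python
from typing import Dict, Set

def reads_from(i: int, j: int,
               writes: Dict[int, Set[str]], reads: Dict[int, Set[str]]) -> bool:
    """Definition 2: P_i is a data predecessor of P_j (the "read-from" relation)."""
    if not i < j:
        return False
    # Some shared variable x must survive every intermediate P_k unwritten.
    return any(all(x not in writes[k] for k in range(i + 1, j))
               for x in writes[i] & reads[j])


writes = {1: {"a"}, 2: {"b"}, 3: {"a"}, 4: {"c"}}
reads = {1: set(), 2: set(), 3: {"b"}, 4: {"a", "b"}}
print(reads_from(1, 4, writes, reads))  # False: P3 overwrites a before P4 reads it
print(reads_from(3, 4, writes, reads))  # True
print(reads_from(2, 4, writes, reads))  # True: b is not overwritten by P3
```

Note that P1 ≺ P4 holds under Definition 1 even though no data flows from P1 to P4; this difference is why the two relations are tracked separately.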

3.2 Algorithms for Exploring Dependency Relations

In this section, we present two algorithms. One is for finding control predecessors of each process,and the other is for finding data predecessors of each process. The two algorithms are intuitive,so we will omit the proof of their correctness here.

The set variables S and T are vectors with n components. They store the control and data predecessors of each process, respectively; i.e., the postconditions of the two algorithms are as follows:

$$\forall j \le n \bullet \big(S_j = \{P_k \mid P_k \prec P_j\}\big)$$

$$\forall i \le n \bullet \big(T_i = \{P_k \mid P_k \overset{d}{\prec} P_i\}\big)$$

Obviously, S1 = T1 = ∅. Table 1 shows the two algorithms.

Although the control flow algorithm is very simple, the set Sj (1 ≤ j ≤ n) discloses the hidden concurrency in the sequential specification, by the corollary in the previous subsection.

Both algorithms have complexity O(n²) in the number of set operations. The set variables S and T provide all the information on the temporal order between processes that is needed as input for modelling the partitioning with timed automata in UPPAAL. For simplicity, let Di be the set of indexes of the processes in Ti, i.e. Di = {j | Pj ∈ Ti}.
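For reference, the two algorithms of Table 1 translate almost line for line into Python. This is a sketch under a few assumptions: processes are numbered 1..n, `writes` and `reads` are hypothetical dictionaries giving Wr and Re, the vectors S and T store process indices (so T[i] is directly the index set Di), and the data-flow loop works on a copy of Re(Pi) instead of destroying it as the table does.

```python
from typing import Dict, Set

def control_predecessors(writes: Dict[int, Set[str]],
                         reads: Dict[int, Set[str]]) -> Dict[int, Set[int]]:
    """Left column of Table 1: S[j] = {i | P_i ≺ P_j}."""
    n = len(writes)
    S = {j: set() for j in range(1, n + 1)}
    for j in range(2, n + 1):
        for i in range(1, j):
            if (writes[i] & reads[j] or reads[i] & writes[j]
                    or writes[i] & writes[j]):
                S[j].add(i)
    return S

def data_predecessors(writes: Dict[int, Set[str]],
                      reads: Dict[int, Set[str]]) -> Dict[int, Set[int]]:
    """Right column of Table 1: T[i] = indices of the data predecessors of P_i."""
    n = len(writes)
    T = {i: set() for i in range(1, n + 1)}
    for i in range(n, 0, -1):
        pending = set(reads[i])          # variables of P_i whose writer is not yet found
        j = i - 1
        while pending and j > 0:         # scan backwards to the most recent writer
            found = pending & writes[j]
            if found:
                T[i].add(j)
                pending -= found
            j -= 1
    return T


writes = {1: {"a"}, 2: {"b"}, 3: {"a"}, 4: {"c"}}
reads = {1: set(), 2: set(), 3: {"b"}, 4: {"a", "b"}}
print(control_predecessors(writes, reads))  # {1: set(), 2: set(), 3: {1, 2}, 4: {1, 2, 3}}
print(data_predecessors(writes, reads))     # {1: set(), 2: set(), 3: {2}, 4: {2, 3}}
```

Because the backward scan stops at the most recent writer of each variable, P1 is not reported as a data predecessor of P4: its value of a is overwritten by P3.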

4 Formal Model Construction

In this section, we transform hardware/software partitioning into a reachability problem of timed automata. The timed behaviour of each process is modelled as a timed automaton, and the whole system is composed as a network of timed automata. The timed automata models of all processes are similar except for some guard conditions. After the model is constructed, an optimal reachability algorithm is applied to obtain an optimal trace, which records for each process whether it is allocated to the hardware or the software component. As the model checker UPPAAL implements this algorithm in its latest version, we use UPPAAL as our modelling tool.

4.1 Behaviour Analysis

Here we list some key elements of the timed automata in our model.

Shared variables. Each process has two possible states, which indicate whether the process is allocated to software or to hardware. We use a global variable Sti to record the state of process Pi: Sti is 1 if process Pi is implemented in software, and 0 otherwise.

Precedence constraints. A process can be executed, either in hardware or in software, only after all its control predecessors have terminated. We use a local variable Xi to denote the number of control predecessors of Pi which have finished.

Table 2: Notation
  Pi       Process Pi
  Tokeni   The number of control predecessors of Pi
  Tcomi    The estimated communication time of Pi
  STexei   The estimated execution time of Pi if implemented in software
  HTexei   The estimated execution time of Pi if implemented in hardware
  Rhi      The estimated hardware resources needed for Pi
  Sti      Variable indicating the state of Pi
  Xi       Variable recording the number of terminated control predecessors of Pi
  SR       Variable indicating if the software processor is occupied
  HR       Variable indicating if the hardware is occupied
  Hres     Variable storing the available hardware resources
  CSi      Software clock for Pi
  CHi      Hardware clock for Pi
  CCi      Communication clock for Pi
  Ti       The set of the processes from which Pi reads data
  Di       The set of the indexes of the processes from which Pi reads data

Resource constraints. There is only one general-purpose processor in our architecture, so no more than one process can be executed on the processor at any time. We introduce a global variable SR to indicate whether the processor is occupied: the processor is free if SR is 1, and occupied if SR is 0. The situation for hardware resources is a little more complicated. We introduce a global variable Hres to record the available hardware resources. Since, in our architecture, the processes in hardware are also executed sequentially, as in software, we introduce another global variable HR to denote whether a process is being executed in hardware: HR is 1 if no process occupies the hardware, and 0 otherwise.

Clock variables. The local clock variables CHi and CSi represent the hardware clock and the software clock for process Pi respectively. To measure the communication time between software and hardware we introduce a local clock CCi for process Pi.
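To make the roles of these variables concrete, here is a minimal Python sketch of the shared state and of the guards and updates for starting and finishing a process. It is an illustration, not the UPPAAL model: Xi is kept in a shared dictionary for convenience, the class and function names are hypothetical, and it assumes (consistent with the hardware test on Hres) that hardware resources taken by a process are not returned.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class SharedState:
    """Illustrative encoding of the model variables SR, HR, Hres, St_i and X_i."""
    SR: int = 1                                        # 1: software processor free
    HR: int = 1                                        # 1: hardware free
    Hres: int = 0                                      # hardware resources still available
    St: Dict[int, int] = field(default_factory=dict)   # 1: P_i in software, 0: in hardware
    X: Dict[int, int] = field(default_factory=dict)    # terminated control predecessors of P_i

def try_start(s: SharedState, i: int, token: Dict[int, int],
              rh: Dict[int, int], in_software: bool) -> bool:
    """Take wait -> srun or wait -> hrun if its guard holds; return whether it fired."""
    if s.X.get(i, 0) != token[i]:          # precedence: X_i = Token_i
        return False
    if in_software:
        if s.SR != 1:                      # processor occupied
            return False
        s.SR, s.St[i] = 0, 1
    else:
        if s.HR != 1 or rh[i] > s.Hres:    # hardware busy or not enough resources
            return False
        s.HR, s.St[i] = 0, 0
        s.Hres -= rh[i]                    # assumption: resources stay allocated
    return True

def finish(s: SharedState, i: int, successors: Dict[int, Set[int]]) -> None:
    """srun/hrun -> end: release the component and notify control successors."""
    if s.St[i] == 1:
        s.SR = 1
    else:
        s.HR = 1
    for j in successors.get(i, set()):
        s.X[j] = s.X.get(j, 0) + 1


token = {1: 0, 2: 1, 3: 1}        # P1 precedes P2 and P3
rh = {1: 2, 2: 2, 3: 1}
s = SharedState(Hres=3)
print(try_start(s, 1, token, rh, in_software=False))  # True: P1 in hardware, Hres = 1
finish(s, 1, {1: {2, 3}})
print(try_start(s, 2, token, rh, in_software=False))  # False: needs 2 units, only 1 left
print(try_start(s, 2, token, rh, in_software=True))   # True
print(try_start(s, 3, token, rh, in_software=False))  # True: uses the last unit
```

A failed guard leaves the state unchanged, mirroring a process that simply stays in its wait location.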

Table 2 lists the variables used in our timed automata models together with their intended meaning. Most of these notations have been explained above.

Figure 2: An Ideal Model [figure not reproduced; it shows the timed automaton of process Pi with locations wait, srun, hrun and end. The edge wait → srun has guard Xi = Tokeni, SR = 1 and updates SR := 0, CSi := 0; the edge srun → end has guard CSi = STexei and updates SR := 1 and Xj := Xj + 1 if Pi ≺ Pj. The hardware branch through hrun is analogous, using HR, CHi and HTexei.]

4.2 Model Construction

In this section we present two models: the ideal model, which helps to understand the behaviour of the system, and the complete model, which takes into account all the elements, including resources and communication.

Ideal Model

Figure 2 shows the ideal model, which expresses the timed behaviour of process Pi. There are four states in this model. The wait state denotes that process Pi is waiting. The states srun and hrun denote that process Pi is allocated to software or to hardware respectively. When process Pi finishes its computation task, it is in the state end. Our purpose is to find the fastest system trace in which all processes reach their end states.

If the control predecessors of process Pi have all terminated, i.e. Xi = Tokeni is satisfied, Pi is enabled to be executed in software or in hardware. When both components are free, it chooses one of them nondeterministically. If both components are occupied by other processes, Pi stays in the state wait.

Suppose process $P_i$ is chosen to run in software. Once $P_i$ has occupied the software processor, it sets the global variable $SR$ to 0 to prevent other processes from occupying the processor. It also sets the software clock $CS_i$ to 0. The transition from the state srun to the state end can only be taken when the value of the clock $CS_i$ equals $STexe_i$. As soon as this transition is taken, the variable $X_j$ is incremented by one if $P_i$ is a control predecessor of process $P_j$. At the same time, $P_i$ releases the software processor (it sets $SR$ back to 1). The situation is similar if $P_i$ is implemented in hardware.
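To make the ideal model concrete, here is a minimal Python sketch that computes the fastest trace by explicit search, not by UPPAAL's symbolic (zone-based) exploration. It assumes one software processor and one hardware unit, each running one process at a time, and acyclic control predecessors; the function name `fastest_trace` and its parameters are hypothetical. Starts are tried only at time 0 and at completion instants, on the assumption that postponing a start any further cannot shorten the makespan.

```python
from functools import lru_cache
from typing import List, Optional, Set, Tuple

Job = Optional[Tuple[int, float]]  # (process, finish time), or None when the unit is free


def fastest_trace(st: List[float], ht: List[float], preds: List[Set[int]]):
    """Minimal makespan of the ideal model (hypothetical helper).

    st[p] / ht[p]: execution time of P_p in software (STexe) / hardware (HTexe).
    preds[p]: control predecessors of P_p; P_p is enabled once all of them have ended.
    Returns (makespan, schedule), schedule being (process, "SW" or "HW", start time).
    """
    n = len(st)

    @lru_cache(maxsize=None)
    def search(t: float, done: frozenset, sw: Job, hw: Job):
        if len(done) == n:  # every process has reached its end location
            return t, ()
        busy = {job[0] for job in (sw, hw) if job is not None}
        ready = [p for p in range(n)
                 if p not in done and p not in busy and preds[p] <= done]
        best = (float("inf"), ())
        # Start an enabled process on a free unit (the edge that sets SR := 0 or HR := 0).
        if sw is None:
            for p in ready:
                cost, rest = search(t, done, (p, t + st[p]), hw)
                best = min(best, (cost, ((p, "SW", t),) + rest))
        if hw is None:
            for p in ready:
                cost, rest = search(t, done, sw, (p, t + ht[p]))
                best = min(best, (cost, ((p, "HW", t),) + rest))
        # Or let time pass until the next running process reaches its end location.
        running = [job for job in (sw, hw) if job is not None]
        if running:
            t2 = min(job[1] for job in running)
            finished = frozenset(job[0] for job in running if job[1] == t2)
            sw2 = None if sw is not None and sw[1] == t2 else sw
            hw2 = None if hw is not None and hw[1] == t2 else hw
            best = min(best, search(t2, done | finished, sw2, hw2))
        return best

    return search(0.0, frozenset(), None, None)
```

For example, with three processes where P0 precedes P1 and P2, `fastest_trace([4, 3, 2], [2, 1, 1], [set(), {0}, {0}])` places P0 in hardware first and returns a makespan of 4. Memoising on (time, finished set, unit states) plays a role loosely similar to the PASSED set of a model checker.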

This ideal model shows that every process $P_i$ may be implemented either in software or in hardware. When all the processes reach their end states, the reachability property of the system is satisfied. To obtain the minimal execution time of the whole system, we use a global clock in the UPPAAL tool; once the optimal reachability trace is found, the global clock shows the minimal execution time.

Figure 3: A Complete Model

Complete Model

Now we present the complete model, which takes into account communication, resources, etc. The complete model is depicted in Figure 3.

In addition to the states introduced in the ideal model, we solve two problems that were not addressed before: how to simulate resource allocation in the hardware component, and how to simulate data transfer between the hardware and software components.

The approach to the first problem is simple. When process $P_i$ is considered for implementation in hardware, the automaton tests not only the variable $HR$ but also the variable $Hres$, to see whether enough resources are available for $P_i$ in the hardware component. If the remaining resources are sufficient, $P_i$ may be put into hardware. Although there are reusable resources in hardware, such as adders and comparators, we need not consider them here, because in our target architecture the processes are executed sequentially in hardware. When $P_i$ finishes its computation task, it releases the reusable resources for the next processes.
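The hardware guard just described can be sketched as a small Python class. It is a hypothetical illustration, not part of the model in Figure 3; it assumes that `Hres` counts only non-reusable resources such as gates, so that a process entering hardware permanently consumes `Rh_i`, while leaving hardware only resets the flag `HR`.

```python
class HardwareBudget:
    """Hypothetical bookkeeping for the variables HR and Hres of the complete model."""

    def __init__(self, hres: float) -> None:
        self.hres = hres  # Hres: remaining non-reusable hardware resources (e.g. gates)
        self.hr = 1       # HR = 1 means the hardware component is free

    def can_enter(self, rh_i: float) -> bool:
        # Guard of the edge into hardware: HR = 1 and Rh_i <= Hres.
        return self.hr == 1 and rh_i <= self.hres

    def enter(self, rh_i: float) -> None:
        if not self.can_enter(rh_i):
            raise RuntimeError("hardware busy or not enough resources left")
        self.hr = 0         # HR := 0
        self.hres -= rh_i   # Hres := Hres - Rh_i

    def leave(self) -> None:
        # Reusable units (adders, comparators) are freed implicitly; Hres is not restored.
        self.hr = 1         # HR := 1
```

With a budget of 100, a process needing 60 can enter; after it leaves, a process needing 50 is refused while one needing 40 is accepted.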

To model the communication between software and hardware, the data dependency among processes has to be considered. When process $P_i$ uses data defined in other processes, data transfer occurs between them. If they are all in the same component, the communication time can be ignored; for example, processes that communicate with each other and are all in software exchange data through shared memory. Suppose instead that $P_i$ is put into the software component and at least one process that communicates with $P_i$ is allocated to the hardware component. Then the communication between them takes place via the bus or another physical communication medium. The communication overhead between software and hardware is not negligible, so we take it into account in our model.

Recall that the variable $St_i$ is introduced to denote whether process $P_i$ is implemented in the hardware or the software component. When $P_i$ is chosen to run in the software component, it performs $St_i := 1$. $P_i$ then checks whether the processes that will transfer data to it (i.e., the processes in $T_i$) are in the software or the hardware component. If at least one of them has been put into hardware, the communication must be taken into account. In Figure 3, the condition $\sum_{j \in D_i} St_j < |T_i|$ is a guard expressing that at least one process from which $P_i$ reads data is in hardware. The situation is similar if the process is in the hardware component (the guard becomes $\sum_{j \in D_i} St_j > 0$ in this case).
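A minimal sketch of these two guards, assuming the sum ranges over the same set of data predecessors that appears as $T_i$ in the bound (the text writes $D_i$ under the sum), and that $St_j = 1$ means software and $St_j = 0$ hardware. The function name is hypothetical. Because the guard yields a single Boolean, one hardware/software transfer is charged no matter how many predecessors sit on the other side, in line with the delegate argument made later in this section.

```python
from typing import Iterable, Mapping


def needs_transfer(pi_in_software: bool, data_preds: Iterable[int],
                   st: Mapping[int, int]) -> bool:
    """Guard deciding whether P_i must pay the HW/SW communication cost (hypothetical)."""
    preds = list(data_preds)
    total = sum(st[j] for j in preds)  # sum of St_j over P_i's data predecessors
    if pi_in_software:
        return total < len(preds)      # sum < |T_i|: some predecessor is in hardware
    return total > 0                   # sum > 0: some predecessor is in software
```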

Next, when communication actually occurs between software and hardware, it occupies both the software and the hardware components; that is, no other process can be performed until the communication is finished. Accordingly, the two variables $SR$ and $HR$ are set to 0 simultaneously when the communication takes place, and the clock $CC_i$ is set to 0 as well. Since the communication time $Tcomm_i$ depends on the states of $P_i$'s data predecessors, there are two methods to estimate its value. The first is to let $Tcomm_i$ be the probability-weighted average over all communication combinations. The second is to compute the value of every possible communication in advance and then let $Tcomm_i$ take one of these values according to the current states of $P_i$'s data predecessors.
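Both estimation methods can be sketched in Python. The cost function `comm_time`, the uniform weighting used in the first method, and all function names are assumptions made for illustration; the report does not fix a communication cost model.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Placement = Tuple[int, ...]  # St_j of each data predecessor: 1 = software, 0 = hardware
CostFn = Callable[[Placement], float]


def transfer_placements(n_preds: int, pi_in_software: bool) -> List[Placement]:
    """Placements of P_i's data predecessors that force a HW/SW transfer."""
    combos = product((0, 1), repeat=n_preds)
    if pi_in_software:
        return [c for c in combos if sum(c) < n_preds]  # some predecessor in hardware
    return [c for c in combos if sum(c) > 0]            # some predecessor in software


def tcomm_average(n_preds: int, pi_in_software: bool, comm_time: CostFn) -> float:
    """Method 1: a single Tcomm_i, averaged uniformly over all transfer placements.

    Requires n_preds >= 1; otherwise no placement forces a transfer.
    """
    cases = transfer_placements(n_preds, pi_in_software)
    return sum(comm_time(c) for c in cases) / len(cases)


def tcomm_table(n_preds: int, pi_in_software: bool, comm_time: CostFn) -> Dict[Placement, float]:
    """Method 2: precompute Tcomm_i per placement; the model then picks the entry
    matching the current St_j values of the data predecessors."""
    return {c: comm_time(c) for c in transfer_placements(n_preds, pi_in_software)}
```

The first method keeps the automaton simple (one constant), while the second is more precise at the price of one guarded edge per entry of the table.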

Finally, once the communication of process $P_i$ is finished, it immediately releases control of the hardware and software. $P_i$ then competes for hardware or software resources with the other ready processes.

It is worth pointing out that even if process $P_k$ is one of the data predecessors of process $P_j$, there need not be a non-negligible, time-consuming communication between $P_j$ and $P_k$. Another process $P_i$ may act as a delegate and transfer the data for them; according to the data dependency defined before, $P_i$ does not modify these data. For example, suppose processes $P_i$ and $P_j$ are both implemented in hardware and both have to transfer data to process $P_k$, which is allocated to software. Then $P_i$ or $P_j$ acts as a delegate and sends all the data to $P_k$. Although more than one process communicates with $P_k$, the communication between hardware and software occurs only once.

4.3 An Optimal Reachability Algorithm

We have shown that hardware/software partitioning can be formulated as a scheduling problem constrained by precedence relations, limited resources, etc. In the partitioning model, we need not only to check that all processes can reach their end states, but also to obtain a trace with the shortest accumulated delay. This is the optimal reachability problem of model checking for timed automata.


For model checking, the infinite state space of a timed automaton must be translated into a finite state representation. For pure reachability analysis, symbolic states [11] of the form $(l, Z)$ are often used, where $l$ is a location of the timed automaton and $Z$ is a convex set of clock valuations called a zone. The formal definition of $(l, Z)$ can be found in [11]. Here we give an abstract optimal reachability algorithm based on forward reachability analysis. The function $D(l, Z)$ computes the minimal time delay, measured by a global clock, in zone $Z$. The algorithm is as follows:

Algorithm: Optimal Reachability
    AccumuTime := ∞
    PASSED := ∅
    WAITING := {(l0, Z0)}
    while WAITING ≠ ∅ do
        select and remove (l, Z) from WAITING;
        if l = lreq and D(l, Z) < AccumuTime then
            AccumuTime := D(l, Z);
        if for all (l, Z′) in PASSED: D(l, Z′) > D(l, Z) then
            add (l, Z) to PASSED;
            add each successor (lnext, Znext) of (l, Z) that is not in WAITING to WAITING;
    return AccumuTime

The algorithm uses two sets, WAITING and PASSED, to store the states waiting to be checked and the states already explored, respectively. It always searches the entire state space of the analysed automaton to find the optimal trace.
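For intuition, the PASSED/WAITING scheme can be run on a finite, explicitly enumerated transition system whose edges carry delays, with the accumulated delay standing in for $D(l, Z)$. This is a simplification rather than the zone-based algorithm: it keeps no clock zones, prunes a location only by comparing accumulated delays, assumes non-negative delays, and tolerates duplicates in WAITING instead of filtering them. WAITING is a FIFO queue, i.e. breadth-first order; the function name is hypothetical.

```python
import math
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Tuple


def optimal_reachability(initial: Hashable,
                         successors: Callable[[Hashable], Iterable[Tuple[Hashable, float]]],
                         is_goal: Callable[[Hashable], bool]) -> float:
    """Explicit-state analogue of the PASSED/WAITING algorithm (hypothetical sketch)."""
    accumu_time = math.inf                  # AccumuTime
    passed: Dict[Hashable, float] = {}      # PASSED: location -> smallest stored delay
    waiting = deque([(initial, 0.0)])       # WAITING, explored breadth-first
    while waiting:
        loc, d = waiting.popleft()          # select and remove
        if is_goal(loc) and d < accumu_time:
            accumu_time = d
        if d < passed.get(loc, math.inf):   # all stored delays for loc are larger
            passed[loc] = d
            for nxt, delay in successors(loc):
                waiting.append((nxt, d + delay))
    return accumu_time
```

For the graph a→b (2), a→c (5), b→c (1), b→goal (10), c→goal (1), the search returns 4 via a, b, c; it returns `math.inf` when the goal is unreachable.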

To generalise minimum-time reachability, [5] introduces a general model called linearly priced timed automata (LPTA), which extends timed automata with prices on all transitions and locations, to solve the minimum-cost reachability problem. Uniformly Priced Timed Automata (UPTA) [4], a variant of LPTA, use an algorithm similar to ours, together with techniques such as branch-and-bound to improve search efficiency; this algorithm has been implemented in the latest version of UPPAAL. In Section 6, we use UPPAAL to run experiments on hardware/software partitioning case studies.

5 Improving the Communication Efficiency

After the partitioning stage is finished, we obtain two parallel processes running in the software and hardware components, respectively, and the communication between them is synchronised. We can further improve communication efficiency by moving the communication commands appropriately.


The idea is to find a flexible interval [ASAP, ALAP] for each communication command, within which the communication can occur without changing the program result. This interval denotes the earliest and latest times at which the communication command can execute, relative to the computation time of the program. We then apply a scheduling algorithm to decide where to place each communication command so as to reduce the waiting time between processes. The algorithm we propose is general: it handles more than two processes running in parallel.

5.1 System Modelling

Let $S$ be a system of $m$ processes $\{P_1, \ldots, P_m\}$ running in parallel and synchronised by handshaking. All the processes start at time 0. In our partitioning problem, $m = 2$.

Description of each process $P_i$

• $P_i$ has a communication trace $\alpha_i = c^i_1 c^i_2 \ldots c^i_{n_i}$, $c^i_j \in \Sigma_i$, where $\Sigma_i$ is the alphabet of the communication actions of $P_i$.

• $P_i$ needs a computation time $A_i$ before it completes.

• Each $c^i_j$ has a “flexible” interval $[a^i_j, b^i_j]$ for its starting time, relative to the computation time of $P_i$, with $a^i_j \le b^i_j \le A_i$. This means that $c^i_j$ is enabled when the accumulated execution time of $P_i$ has reached $a^i_j$ time units, and should take place before the accumulated execution time of $P_i$ reaches $b^i_j$ time units. $b^i_j$ and $A_i$ can be infinite, and $a^i_j$ can be 0. To be meaningful, we assume that $a^i_j \le a^i_{j+1}$ and $b^i_j \le b^i_{j+1}$ for $1 \le j < n_i$.

• $P_i$ is either running or waiting when not yet completed. It is waiting iff it is executing a communication action $c^i_j$ whose co-action has not yet been executed.
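One way to encode this process description is the hypothetical Python data structure below (all names are illustrative). Its `well_formed` check tests the assumptions above: $0 \le a^i_j \le b^i_j \le A_i$ for every action, and windows that do not move backwards along the trace. Infinite bounds are written as `float("inf")`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommAction:
    name: str  # illustrative label, e.g. "u!" or "y?"
    a: float   # a^i_j: enabled once accumulated execution time reaches a
    b: float   # b^i_j: must start before accumulated execution time reaches b (may be inf)


@dataclass
class ProcessSpec:
    trace: List[CommAction]  # alpha_i = c^i_1 ... c^i_{n_i}
    A: float                 # total computation time A_i (may be inf)

    def well_formed(self) -> bool:
        """Check 0 <= a <= b <= A per action and monotone windows along the trace."""
        if any(not (0 <= c.a <= c.b <= self.A) for c in self.trace):
            return False
        return all(c.a <= d.a and c.b <= d.b
                   for c, d in zip(self.trace, self.trace[1:]))
```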

The system $S$ completes when all of its processes complete. Our task is to schedule the communication actions such that $S$ completes at the earliest possible time.

We now formulate the problem precisely. The purpose of the formalisation is only to avoid ambiguity and, where applicable, to shorten the text of the proof. Any formalism used to model the problem must be able to express the “accumulated execution time” of processes.

For this reason, we borrow some ideas from Duration Calculus (DC) [8] to formalise the problem.

For each process $P_i$, we introduce four state variables (mappings from $[0, \infty)$ to $\{0, 1\}$), $P_i.running$, $P_i.waiting$, $P_i.completed$ and $P_i.start$, to express the behaviour of $P_i$. At time $t$, the state variable $P_i.running$ ($P_i.waiting$, $P_i.completed$, $P_i.start$) has the value 1 iff $P_i$ is running (waiting, completed, started, respectively) at that time. These state variables are mutually exclusive. A process $P_i$ starts at time 0 and terminates when its accumulated execution time has reached $A_i$. All processes stay in the state “completed” after they terminate.
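Because $P_i$ runs whenever it is neither waiting nor completed, its accumulated execution time $\int_0^t P_i.running(s)\,ds$ is $t$ minus the time it has spent waiting, capped at $A_i$. The hypothetical helper below computes this from a list of waiting intervals; it assumes the intervals are disjoint and all lie before $P_i$ completes.

```python
from typing import List, Tuple


def accumulated_execution(t: float, waiting: List[Tuple[float, float]], A: float) -> float:
    """Integral of P.running over [0, t] for a process that starts at time 0.

    waiting: disjoint intervals (s, e) in which P.waiting = 1, all before completion.
    The process runs whenever it is not waiting, until its total reaches A.
    """
    waited = sum(max(0.0, min(e, t) - s) for s, e in waiting)  # waiting time inside [0, t]
    return min(t - waited, A)
```

For example, with waiting intervals [2, 3] and [5, 7] and $A = 10$, the accumulated execution time is 3 at $t = 4$, 5 at $t = 8$, and stays at 10 from $t = 13$ onwards.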

Systems and Assumptions

• $\alpha_1, \alpha_2, \ldots, \alpha_m$ are assumed to be matched in the sense of handshaking synchronisation. Let $f$ be the matching function, i.e., $f(c^i_j, c^k_h) = 1$ iff $c^i_j$ and $c^k_h$ are matched (they are the partners in the same communication).

• Let $t^i_j$ be the starting time of $c^i_j$ (according to the unique global clock). Then $t^i_j$ must satisfy the constraint for $c^i_j$, i.e., $a^i_j \le \int_0^{t^i_j} P_i.running(t)\,dt \le b^i_j$, and $t^i_j \le t^i_{j+1}$.

• $P_i.waiting(t) = 1$ if and only if there exist $c^i_j$ and $c^k_h$ such that $t^i_j \le t \le t^k_h$ and $f(c^i_j, c^k_h) = 1$ ($P_i$ is waiting iff it has decided to communicate and its partner is not ready).

To formalise the behaviour of communication actions described in the items above, we introduce, for each $c^i_j$ and $c^k_h$ such that $f(c^i_j, c^k_h) = 1$, a state variable $comm(i, j, k, h)$: $comm(i, j, k, h)(t) = 1$ iff at time $t$ one of the partner actions (either $c^i_j$ or $c^k_h$) has started and the communication has not completed. Note that $comm(i, j, k, h) = comm(k, h, i, j)$.

An execution of $S$ is a set of intervals $[t^i_j, {t'}^i_j]$ giving the starting and ending times of the communication actions $c^i_j$. An execution terminates at time $t$ iff $t$ is the termination time of the latest process.

Question: Develop a scheduling procedure that produces an execution of $S$ terminating at the earliest possible time.

In the following algorithm and example, we assume for simplicity that communication takes no time. The algorithm remains correct when communication time is included.

5.2 Scheduling Algorithm

Because $\alpha_1, \ldots, \alpha_m$ are matched, we can easily construct a dependency graph $G$ expressing the synchronised computation of the system $S$ (a Mazurkiewicz trace [1], [9]) as follows. Each node of the graph represents a synchronised action $(c^i_j, c^k_h)$ with $f(c^i_j, c^k_h) = 1$ (and is labelled by $(c^i_j, c^k_h)$). There is a directed edge from $n_1 = (c^i_j, c^k_h)$ to $n_2 = (c^{i'}_{j'}, c^{k'}_{h'})$ iff either $i = i'$ and $j' = j + 1$, or $k = k'$ and $h' = h + 1$. $G$ is used as an additional input to the algorithm.
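The construction of $G$ can be sketched as follows, treating each node as an unordered pair of matched action occurrences so that the edge rule applies on whichever side of a pair a process appears. Occurrences are written `(i, j)` with 0-based positions; the input format and the function name are assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Occ = Tuple[int, int]    # (process i, 0-based position j): the action c^i_j
Node = FrozenSet[Occ]    # a node of G: the matched pair {c^i_j, c^k_h}


def build_dependency_graph(matches: List[Tuple[Occ, Occ]]) -> Tuple[List[Node], Dict[Node, Set[Node]]]:
    """Nodes are matched pairs; an edge leads to the node holding a process's next action."""
    nodes = [frozenset(pair) for pair in matches]
    owner = {occ: node for node in nodes for occ in node}  # node containing each action
    edges: Dict[Node, Set[Node]] = {node: set() for node in nodes}
    for node in nodes:
        for i, j in node:
            succ = owner.get((i, j + 1))  # next communication action of the same process
            if succ is not None:
                edges[node].add(succ)
    return nodes, edges
```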

Algorithm


Input: S, G

Output: The time at which each process starts each of its communication actions, and the time of each synchronous communication action (represented by each node n of G)

Method:

(1) (Initialisation) Set the waiting time vector to zero, $W := (0, 0, \ldots, 0)$ (no process is waiting before doing any synchronisation action), and set the last communication time vector to zero, $V := (0, 0, \ldots, 0)$.

(2) Compute the minimal slice $C$ of $G$, i.e., the set of all nodes of $G$ with no incoming edge. If $C = \emptyset$, halt the algorithm. Otherwise, for each node $n = (c^i_j, c^k_h)$ in $C$:

(2.1) Compute the global real-time intervals in which $c^i_j$ and $c^k_h$ are enabled: $I := [W_i + a^i_j,\ W_i + b^i_j]$ and $J := [W_k + a^k_h,\ W_k + b^k_h]$. Let $K := [\max\{V_i, V_k\}, \infty)$ be the interval of possible times for the synchronous communication action represented by node $n$.

(2.2) (Select the earliest time $t_n$ at which $(c^i_j, c^k_h)$ can take place.) Let $B = I \cap J \cap K$.

(a) If $B \ne \emptyset$, then $t_n := \min B$. In this case, $t^i_j := t^k_h := t_n$ (no waiting time) and $V_i := V_k := t_n$.

(b) If $B = \emptyset$ and $\max J \cap K < \min I \cap K$, then $t_n := \min I \cap K$, and the waiting time is updated: $W_k := W_k + (\min I \cap K - \max J \cap K)$. In this case, $t^i_j := t_n$, $t^k_h := \max J \cap K$, and $V_i := V_k := t_n$. The case $\max I \cap K < \min J \cap K$ is symmetric.

(3) Remove all the nodes in $C$, and the edges leaving them, from the graph $G$.

(4) Repeat Step 2 until G is empty.

(5) Output $t^i_j$ for each $i, j$, and $t_n$ (the scheduled time for the communication actions represented by node $n$) for each node $n = (c^i_j, c^k_h)$ of $G$.
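The five steps above can be sketched in Python as follows, under the same assumption that communication takes no time. $G$ is passed in as a map from each node to its set of successors; nodes are frozensets of (process, 0-based action index) pairs, and `windows[i][j]` holds $(a^i_j, b^i_j)$. The names are hypothetical, and the sketch raises an error if $I \cap K$ or $J \cap K$ is empty, a situation Step (2.2) does not cover.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Occ = Tuple[int, int]   # (process i, 0-based index j) of the action c^i_j
Node = FrozenSet[Occ]   # node of G: the matched pair {c^i_j, c^k_h}


def schedule(windows: Dict[int, List[Tuple[float, float]]], edges: Dict[Node, Set[Node]]):
    """Returns (t_node, t_action, W, V) as produced by steps (1)-(5)."""
    W = {i: 0.0 for i in windows}             # step (1): waiting-time vector
    V = {i: 0.0 for i in windows}             # step (1): last-communication-time vector
    indeg = {n: 0 for n in edges}
    for succs in edges.values():
        for m in succs:
            indeg[m] += 1
    t_node: Dict[Node, float] = {}
    t_action: Dict[Occ, float] = {}
    C = [n for n in edges if indeg[n] == 0]   # step (2): minimal slice
    while C:
        for n in C:
            (i, j), (k, h) = sorted(n)
            ai, bi = windows[i][j]
            ak, bk = windows[k][h]
            lo_k = max(V[i], V[k])                          # K = [lo_k, inf)
            IK = (max(W[i] + ai, lo_k), W[i] + bi)          # I ∩ K
            JK = (max(W[k] + ak, lo_k), W[k] + bk)          # J ∩ K
            if IK[0] > IK[1] or JK[0] > JK[1]:
                raise ValueError(f"empty window for node {sorted(n)}")
            lo, hi = max(IK[0], JK[0]), min(IK[1], JK[1])   # B = I ∩ J ∩ K
            if lo <= hi:                                    # case (a): nobody waits
                tn = lo
                t_action[(i, j)] = t_action[(k, h)] = tn
            elif JK[1] < IK[0]:                             # case (b): P_k waits
                tn = IK[0]
                W[k] += tn - JK[1]
                t_action[(i, j)], t_action[(k, h)] = tn, JK[1]
            else:                                           # symmetric case: P_i waits
                tn = JK[0]
                W[i] += tn - IK[1]
                t_action[(i, j)], t_action[(k, h)] = IK[1], tn
            V[i] = V[k] = tn
            t_node[n] = tn
        nxt = []                                            # step (3): remove C from G
        for n in C:
            for m in edges[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    nxt.append(m)
        C = nxt                                             # step (4): repeat step (2)
    return t_node, t_action, W, V
```

As a hypothetical usage example (not the system of Figure 4), take windows `{0: [(4, 5), (9, 10)], 1: [(3, 4), (6, 8)], 2: [(5, 6), (7, 8)]}` and G with n1 = {(0,0),(1,0)} → {n2, n3}, n2 = {(1,1),(2,0)} → {n3}, n3 = {(0,1),(2,1)}. The three slices give t = 4, 6 and 9, and process 2 waits one time unit before n3.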

Example: Let the system $S$ be as in the previous example. The dependency graph $G$ for $S$ is constructed as in Fig. 4.

The first execution of Step 2 is on the slice $C_1 = \{n_1\}$ and gives $t_1 = 4$, $W = (0, 0, 0)$, $V = (4, 4, 0)$. This means that up to time $t_1$, when the actions represented by $n_1$ finish, no process is waiting, and that the action represented by $n_1$ involves $P_1$ and $P_2$ and terminates at time 4.

The second execution of Step 2 is on the slice $C_2 = \{n_2\}$ and gives $t_2 = 6$, $W = (0, 0, 0)$, $V = (4, 6, 6)$, meaning that up to time $t_2$, when the actions represented by $n_2$ finish, no process is waiting.

The last execution of Step 2 is on the slice $C_3 = \{n_3\}$ and gives $t_3 = 11$, $W = (1, 0, 0)$, $V = (11, 11, 6)$, meaning that up to time $t_3$, when the actions represented by $n_3$ finish, $P_1$ has to wait for 1 time unit. □

Figure 4: An Algorithm Execution (panels: local process models of P1, P2 and P3; dependency graph G with nodes n1, n2, n3; the resulting schedule)

Theorem 2. $t_n$ is the earliest time at which the communication actions represented by $n$ can take place. Hence, the algorithm gives a correct answer to the problem stated above.

Proof: Let $V_i, V'_i$ and $W_i, W'_i$ be the values of the $i$-th components of the vectors $V$ and $W$ just before and just after an execution of Step 2, respectively. We prove by induction on the number of executions of Step 2 on graph $G$ that $t_n$, $W'_i$, $t^i_j$, $t^k_h$ and $V'_i$ produced by the last application of Step 2 have the following properties:

1. $t_n$ is the earliest time at which the communication actions represented by $n$ can take place (i.e., the constraints for the communication actions represented by $n$ are satisfied),

2. $W'_i$ is the total waiting time of process $i$ over the interval $[0, t_n]$, and

3. $\max\{t^i_j, t^k_h\} \le t_n$ and $V'_i = V'_k = t_n$.

First, from the assumption that the $\alpha_i$'s are matched and from the definition of $G$, it follows that $G$ is acyclic and that each set $C$ produced in Step 2 contains, for each process, at most one node whose label includes an action of that process.

Basic step: Before the execution of Step 2, for each node $n = (c^i_j, c^k_h)$ in $C$, $W_i = V_i = W_k = V_k = 0$. Setting $comm(i, j, k, h)(t) = 1$ iff $t \in [\min\{t^i_j, t^k_h\}, t_n]$ makes the constraints for $c^i_j$ and $c^k_h$ satisfied, together with the other desired properties. We verify this for the two cases of Step 2.

When $\min\{t^i_j, t^k_h\} = t^k_h$ and $t^k_h = t_n$ (the case $I \cap J \ne \emptyset$), it holds that $t^i_j = t_n$. Both $P_i.waiting$ and $P_k.waiting$ have the value 0 in the interval $[0, t_n]$. Hence $\int_0^{t_n} P_i.running(t)\,dt = t_n$ and $\int_0^{t_n} P_k.running(t)\,dt = t_n$. Because $t_n = \min B$, the constraints $a^i_j \le \int_0^{t^i_j} P_i.running(t)\,dt \le b^i_j$ and $a^k_h \le \int_0^{t^k_h} P_k.running(t)\,dt \le b^k_h$ are satisfied. Furthermore, $W'_k = W'_i = 0$.

When $\min\{t^i_j, t^k_h\} = t^k_h$ and $t^k_h < t_n$ (the case $I \cap J = \emptyset$ and $\max J \cap K < \min I \cap K$), it holds that in the interval $[0, t_n]$, $P_k.waiting(t)$ has the value 1 iff $t \in [t^k_h, t_n]$, and $P_i.waiting(t)$ has the value 0 for all $t \in [0, t_n]$. Therefore $t^k_h = \int_0^{t^k_h} P_k.running(t)\,dt$ and

$$W'_k = \int_0^{t_n} P_k.waiting(t)\,dt = \int_{t^k_h}^{t_n} P_k.waiting(t)\,dt = t_n - t^k_h.$$

Hence $a^k_h \le t^k_h = \int_0^{t^k_h} P_k.running(t)\,dt \le b^k_h$ is satisfied. Because in this case $t^i_j = t_n$ and $\int_0^{t_n} P_i.running(t)\,dt = t_n$, we have $a^i_j \le \int_0^{t^i_j} P_i.running(t)\,dt \le b^i_j$ and $W'_i = 0$ as well.

Setting $comm(i, j, k, h)(t) = 1$ iff $t \in [\min\{t^i_j, t^k_h\}, t'_n]$ with $t'_n < t_n$ would violate $COM(i, j, k, h)$, because the constraint

$$a^i_j \le \int_0^{t'_n} P_i.running(t)\,dt \le b^i_j \;\wedge\; a^k_h \le \int_0^{t'_n} P_k.running(t)\,dt \le b^k_h$$

is not satisfied.

Because, for $n, n' \in C$, $n \ne n'$ implies that $n$ and $n'$ have no communication action in common, we conclude that the three properties hold for all $n \in C$ in the first application of Step 2.

Induction step: The arguments are almost the same as for the basic step, with some small modifications.

Let $C$ be the slice of $G$ in the $r$-th application of Step 2, and let $n = (c^i_j, c^k_h)$ be any node in $C$. Setting $comm(i, j, k, h)(t) = 1$ iff $t \in [\min\{t^i_j, t^k_h\}, t_n]$ makes the constraints for $c^i_j$ and $c^k_h$ satisfied, together with the other desired properties. We verify this for the two cases of Step 2.

When $\min\{t^i_j, t^k_h\} = t^k_h$ and $t^k_h = t_n$ (the case $I \cap J \ne \emptyset$), it holds that $t^i_j = t_n$. $P_i.waiting(t)$ has the value 0 in the interval $[V_i, t_n]$, and $P_k.waiting(t)$ has the value 0 in the interval $[V_k, t_n]$. Hence, by the inductive hypothesis,

$$\int_0^{t_n} P_i.waiting(t)\,dt = \int_0^{V_i} P_i.waiting(t)\,dt + \int_{V_i}^{t_n} P_i.waiting(t)\,dt = W_i + 0 = W_i$$

and

$$\int_0^{t_n} P_k.waiting(t)\,dt = \int_0^{V_k} P_k.waiting(t)\,dt + \int_{V_k}^{t_n} P_k.waiting(t)\,dt = W_k + 0 = W_k.$$

Therefore $W'_i = W_i$, $W'_k = W_k$, $\int_0^{t_n} P_k.running(t)\,dt = t_n - W_k$ and $\int_0^{t_n} P_i.running(t)\,dt = t_n - W_i$. Because $t_n = \min B$, the constraints $a^i_j \le \int_0^{t^i_j} P_i.running(t)\,dt \le b^i_j$ and $a^k_h \le \int_0^{t^k_h} P_k.running(t)\,dt \le b^k_h$ are satisfied.

When $\min\{t^i_j, t^k_h\} = t^k_h$ and $t^k_h < t_n$ (the case $I \cap J = \emptyset$ and $\max J \cap K < \min I \cap K$), it holds that in the interval $[V_k, t_n]$, $P_k.waiting(t)$ has the value 1 iff $t \in [t^k_h, t_n]$, and $P_i.waiting(t)$ has the value 0 for all $t \in [V_i, t_n]$. Therefore

$$\int_0^{t^k_h} P_k.running(t)\,dt = \int_0^{V_k} P_k.running(t)\,dt + \int_{V_k}^{t^k_h} P_k.running(t)\,dt = (V_k - W_k) + (t^k_h - V_k) = t^k_h - W_k,$$

$$W'_k = \int_0^{t_n} P_k.waiting(t)\,dt = \int_0^{V_k} P_k.waiting(t)\,dt + \int_{V_k}^{t_n} P_k.waiting(t)\,dt = W_k + t_n - t^k_h.$$

Hence $a^k_h \le \int_0^{t^k_h} P_k.running(t)\,dt \le b^k_h$ is satisfied. Because in this case $t^i_j = t_n$ and $\int_0^{t_n} P_i.running(t)\,dt = t_n - W_i$, we have $a^i_j \le \int_0^{t^i_j} P_i.running(t)\,dt \le b^i_j$ and $W'_i = W_i$ as well.

Setting $comm(i, j, k, h)(t) = 1$ iff $t \in [\min\{t^i_j, t^k_h\}, t'_n]$ with $t'_n < t_n$ would violate the constraints for communication actions, because either the constraint

$$a^i_j \le \int_0^{t'_n} P_i.running(t)\,dt \le b^i_j \;\wedge\; a^k_h \le \int_0^{t'_n} P_k.running(t)\,dt \le b^k_h$$

or the temporal order of actions is not satisfied.

Because, for $n, n' \in C$, $n \ne n'$ implies that $n$ and $n'$ have no communication action in common, we conclude that the three properties hold for all $n \in C$ in the $r$-th application of Step 2. □

Table 3: Experimental results of partitioning using UPPAAL

| Case studies | Num. of total processes | Required gates in hardware (K) | Num. of processes in hardware after partitioning | Running time in UPPAAL (seconds) |
|---|---|---|---|---|
| Huffman decoder | 10 | 44.5 | 5 | 121 |
| matrix multiplier | 20 | 97.7 | 3 | 289 |
| packdata | 30 | 221.9 | 5 | 503 |

6 Experiments in UPPAAL

We have used the technique of the previous sections to find optimal solutions for several hardware/software partitioning case studies. In this section we present some of our experiments in solving them with the model checker UPPAAL version 3.3.32, running on a Linux machine with 256 MB of memory.

After modelling a hardware/software partitioning problem as a network of timed automata with n processes, we input the model to the UPPAAL model checker and ask UPPAAL to verify:

E<> P1.end and P2.end and ... and Pn.end

This property, written in the UPPAAL specification language, states that there exists a trace of the automata network in which all n processes eventually reach their end states.
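For larger n, the query is easier to generate than to type. The helper below is a hypothetical convenience, assuming the process automata are instantiated as P1, ..., Pn as in the query above.

```python
def all_end_query(n: int, prefix: str = "P") -> str:
    """Build the UPPAAL reachability query 'E<> P1.end and ... and Pn.end'."""
    return "E<> " + " and ".join(f"{prefix}{i}.end" for i in range(1, n + 1))
```

For example, `all_end_query(3)` yields `E<> P1.end and P2.end and P3.end`.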

To let UPPAAL find the optimal solution to our problem, we choose the breadth-first model checking algorithm (UPPAAL offers several different algorithms) and the option “fastest trace” offered by UPPAAL. A global clock variable is declared to store the execution time. When the reachability property is satisfied, the fastest trace, which records the partitioning scheme, is found, and the global clock records the minimal execution time of all the processes. After the necessary communication statements have been added, this trace can be passed to the software compiler and the hardware compiler to be implemented.

Here we use an Occam-like language as our specification language, and use the hardware compilation technique [6] to estimate the hardware resources required by each process. For simplicity, the only resource listed here is the estimated number of gates required by each problem.

The experimental results for the three case studies are shown in Table 3. We assume that 15,000 gates are available as hardware resources.

The first case study is a Huffman decoder algorithm, the second is a matrix multiplier algorithm, and the last is a data-packing (packdata) algorithm used in networking.

7 Conclusion

This paper presents a new approach to hardware/software partitioning that supports an abstract architecture in which communication is synchronous. After the designer decides the process granularity of the initial specification, the partitioning can be carried out automatically. We explore the relations among processes to find the hidden concurrency and data dependencies in the initial specification. These relations serve as input to the timed automata and ensure that the behaviours of processes are modelled correctly. Once the formal partitioning model is constructed with timed automata, the optimal result can be obtained by means of an optimal reachability algorithm. To further improve the efficiency of synchronous communication between the hardware and software components, a scheduling algorithm is introduced to adjust communication commands after partitioning. The experiments with the model checker UPPAAL demonstrate the feasibility and advantages of the proposed approach.

Acknowledgement: We are grateful to Liu Zhiming and Jin Naiyong for their helpful discussions and suggestions for improving the paper.

References

[1] I. J. Aalbersberg and G. Rozenberg. Theory of Traces. Theoretical Computer Science, Vol. 60, pp. 1-82, 1988.

[2] S. Agrawal and R. Gupta. Dataflow-Assisted Behavioral Partitioning for Embedded Systems. In Proc. Design Automation Conference, ACM, New York, pp. 709-712, 1997.


[3] E. Barros, W. Rosenstiel and X. Xiong. A Method for Partitioning UNITY Language in Hardware and Software. In Proc. EURODAC, pp. 220-225, September 1994.

[4] G. Behrmann, A. Fehnker, T. Hune, K. G. Larsen, P. Pettersson and J. Romijn. Efficient Guiding Towards Cost-Optimality in Uppaal. In Proc. of the 7th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'01), LNCS 2031, pp. 174-188, 2001.

[5] G. Behrmann, A. Fehnker, T. Hune, K. G. Larsen, P. Pettersson, J. Romijn and F. Vaandrager. Minimum-Cost Reachability for Priced Timed Automata. In Proc. of the 4th International Workshop on Hybrid Systems: Computation and Control (HSCC'01), LNCS 2034, pp. 147-161, 2001.

[6] J. Bowen and He Jifeng. An Approach to the Specification and Verification of a Hardware Compilation Scheme. Journal of Supercomputing, 2000.

[7] M. Bozga, C. Daws, O. Maler, A. Olivero, S. Tripakis and S. Yovine. Kronos: A Model-Checking Tool for Real-Time Systems. In CAV'98, LNCS 1427, pp. 546-550, 1998.

[8] Dang Van Hung. Real-time Systems Development with Duration Calculus: an Overview. Technical Report 255, UNU/IIST, P.O. Box 3058, Macau, June 2002.

[9] V. Diekert and G. Rozenberg, editors. The Book of Traces. World Scientific, Singapore, 1995.

[10] R. Alur and D. L. Dill. A Theory of Timed Automata. Theoretical Computer Science, Vol. 126, pp. 183-235, 1994.

[11] D. Dill. Timing Assumptions and Verification of Finite-State Concurrent Systems. In Proc. of Automatic Verification Methods for Finite State Systems, LNCS 407, pp. 197-212, 1989.

[12] A. Fehnker. Bounding and Heuristics in Forward Reachability Algorithms. Technical Report CSI-R0002, Computing Science Institute Nijmegen, 1999.

[13] Thomas A. Henzinger, Pei-Hsin Ho and Howard Wong-Toi. HyTech: A Model Checker for Hybrid Systems. In Proc. of the 9th Int. Conf. on Computer Aided Verification (Orna Grumberg, ed.), LNCS 1254, pp. 460-463, 1997.

[14] C. A. R. Hoare and He Jifeng. Unifying Theories of Programming. Prentice Hall, 1998.

[15] T. Hune, K. G. Larsen and P. Pettersson. Guided Synthesis of Control Programs Using UPPAAL. In Proc. of Workshop on Verification and Control of Hybrid Systems III, pp. E15-E22, 2000.

[16] INMOS Ltd. The Occam 2 Programming Manual. Prentice-Hall, 1988.

[17] J. Iyoda, A. Sampaio and L. Silva. ParTS: A Partitioning Transformation System. In World Congress on Formal Methods 1999 (WCFM'99), pp. 1400-1419, 1999.

[18] K. G. Larsen, P. Pettersson and Wang Yi. Model-Checking for Real-Time Systems. In Proc. of the 10th International Conference on Fundamentals of Computation Theory, LNCS 965, pp. 62-88, 1995.


[19] K. G. Larsen, P. Pettersson and Wang Yi. UPPAAL in a Nutshell. International Journal on Software Tools for Technology Transfer, 1(1-2), pp. 134-152, October 1997.

[20] R. Niemann and P. Marwedel. An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear Programming. Design Automation for Embedded Systems, Special Issue: Partitioning Methods for Embedded Systems, Vol. 2, No. 2, pp. 165-193, Kluwer Academic Publishers, March 1997.

[21] Z. Peng and K. Kuchcinski. An Algorithm for Partitioning of Application Specific System. In Proc. of the European Conference on Design Automation (EuroDAC), IEEE/ACM, pp. 316-321, 1993.

[22] Qin Shengchao and He Jifeng. An Algebraic Approach to Hardware/Software Partitioning. Technical Report 206, UNU/IIST, 2000.

[23] G. Quan, X. Hu and G. W. Greenwood. Preference-Driven Hierarchical Hardware/Software Partitioning. In International Conference on Computer Design (IEEE), pp. 652-657, 1999.

[24] J. Staunstrup and W. Wolf, editors. Hardware/Software Co-Design: Principles and Practice. Kluwer Academic Publishers, 1997.

[25] M. Weinhardt. Integer Programming for Partitioning in Software Oriented Codesign. Lecture Notes in Computer Science 975, pp. 227-234, 1995.

[26] W. Wolf. Hardware-Software Co-Design of Embedded Systems. Proceedings of the IEEE, Vol. 82, No. 7, pp. 967-989, 1994.
