
Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

Date post: 12-Jan-2016
S. M. Farhad, PhD Student

Supervisor: Dr. Bernhard Scholz

Programming Language Group, School of Information Technology, University of Sydney

Abstract

Synchronous data flow (SDF) differs from traditional data flow in that the number of samples each node consumes and produces per invocation is fixed and known a priori

SDF nodes can therefore be scheduled at compile time (statically)

Contribution of this paper: develop the theory for static scheduling of SDF programs on single or multiple processors

Introduction

Need to depart from the simplicity of von Neumann computer architecture

Programming signal processors using large grain data flow (LGDF) languages [W. B. Ackerman 82]:

Eases programming

Enhances the modularity of code

Describes algorithms more naturally

Makes concurrency immediately evident from the program description

Data Flow Analysis [W. B. Ackerman 82]

1. P = X + Y

2. Q = P/Y

3. R = X*P

4. S = R - Q

5. T = R*P

6. RESULT = S/T

Many of these instructions can run in parallel as long as the sequencing constraints are met: an instruction may run only after the instructions that produce its operands

These constraints can be represented by a graph

Nodes represent instructions

Arrows between nodes represent sequencing constraints

So permissible computation sequences include, for example, (1, 3, 5, 2, 4, 6) and (1, 2, 3, 5, 4, 6).
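The sequencing constraints of the six instructions can be checked mechanically. The following is a minimal sketch, not part of the original slides; the table `INSTRUCTIONS` and the helper `is_permissible` are illustrative names, and X and Y are assumed to be available before any instruction runs.

```python
# Each instruction number maps to (result, operands).
# Assumption: the inputs X and Y are available from the start.
INSTRUCTIONS = {
    1: ("P", ("X", "Y")),
    2: ("Q", ("P", "Y")),
    3: ("R", ("X", "P")),
    4: ("S", ("R", "Q")),
    5: ("T", ("R", "P")),
    6: ("RESULT", ("S", "T")),
}


def is_permissible(order, inputs=("X", "Y")):
    """Return True if every instruction runs after its operands are produced."""
    if sorted(order) != sorted(INSTRUCTIONS):
        return False  # each instruction must run exactly once
    available = set(inputs)
    for number in order:
        result, operands = INSTRUCTIONS[number]
        if not all(op in available for op in operands):
            return False  # an operand has not been computed yet
        available.add(result)
    return True


print(is_permissible((1, 3, 5, 2, 4, 6)))  # True
print(is_permissible((1, 2, 3, 5, 4, 6)))  # True
print(is_permissible((2, 1, 3, 4, 5, 6)))  # False: Q needs P first
```

Any order accepted by this check is a topological order of the constraint graph.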

Sequencing Constraints

(1) P = X + Y

(3) R = X*P (2) Q = P/Y

(4) S = R - Q (5) T = R*P

(6) RESULT = S/T


The Data Flow Paradigm

A program is divided into pieces (nodes or blocks) which can execute whenever input data are available

An algorithm can be described as a data flow graph: nodes represent functions and arcs represent data paths

Signal processing algorithms can also be described as data flow graphs: a node is an atomic or non-atomic function and an arc is a signal path

The Data Flow Paradigm (contd.)

The complexity of the functions (the granularity) determines the amount of parallelism available

No attempt is made to exploit concurrency inside a block

The functions within the blocks can be specified using von Neumann programming techniques

A block can itself represent another data flow graph (hierarchical)

LGDF is ideally suited for signal processing

Synchronous Data Flow Graphs

A block is invoked when its input is available

When it is invoked, it consumes a fixed number of input samples on each input path and produces a fixed number of output samples on each output path

A block is synchronous if we can specify a priori the number of input and output samples it consumes and produces each time it is invoked

We assume that the signal processing system repetitively applies an algorithm to an infinite sequence of data

A synchronous data flow graph

[Figure: SDF graph with blocks A, B, and C; the arc ends are annotated with the labels b, c, d, e, f, g, h, i, j]

Implementing an SDF graph requires buffering the data samples passed between blocks and scheduling each block when its input data are available; this can be done statically (at compile time)

It could also be done dynamically by a runtime supervisor, but that approach is costly

A synchronous data flow graph

SDF graphs can be scheduled statically (at compile time) regardless of the number of processors

No dynamic control is needed

Communication between nodes and processors is set up by the compiler, so no runtime control is required

Thus the LGDF paradigm gives the programmer a natural way of programming with evident concurrency

Scheduling an SDF graph

Schedule blocks onto processors in such a way that the input data of each block are available when it is invoked

Assumptions:

The SDF graph is non-terminating (it runs forever without deadlock)

The SDF graph is connected

The goal is to find a periodic admissible parallel schedule (PAPS); its single-processor counterpart is a periodic admissible sequential schedule (PASS)

Construction of a PASS

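Topology matrix

[Figure: SDF graph with nodes 1, 2, 3 and arcs 1, 2, 3; the arc ends are labeled with the sample counts c, d, e, f, g, i]

Entry (i, j) of the topology matrix Γ is the number of samples node j produces on arc i each time it is invoked; the entry is negative if node j consumes samples from that arc

$$\Gamma = \begin{bmatrix} c & -e & 0 \\ d & 0 & -f \\ 0 & i & -g \end{bmatrix} \tag{1}$$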

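Construction of a PASS

Replace each arc with a FIFO queue that passes data from one block to another (the queue sizes vary over time)

The vector b(n) contains the queue sizes of all the buffers at time n

In a sequential schedule only one block can be invoked at a time

v(n) is the vector indicating which block is invoked at time n

$$v(n) = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \text{ or } \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \text{ or } \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \tag{2}$$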

Construction of a PASS

The change in the buffer sizes caused by invoking a node is

$$b(n+1) = b(n) + \Gamma v(n) \tag{3}$$

A unit delay on an arc from A to B means that the n-th sample consumed by node B is the (n-1)-th sample produced by node A

So the first sample consumed by the destination block is not produced by the source block; it is part of the initial state of the arc buffer

[Figure: example SDF graph with nodes 1, 2, 3 and delays D and 2D on its arcs]

Construction of a PASS

For the example graph with delays D and 2D, the initial buffer state is

$$b(0) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{4}$$

Because of this initial condition, block 2 can be invoked once and block 3 can be invoked twice before block 1 is invoked at all

Delays therefore affect the way the system starts up
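The buffer update b(n+1) = b(n) + Γv(n) and the effect of the initial state can be traced in a few lines. This is a sketch under stated assumptions, not the slide's exact graph: it assumes two arcs leaving node 1 (one toward node 2 with delay D, one toward node 3 with delay 2D) and unit sample rates everywhere. `GAMMA` and `try_invoke` are illustrative names.

```python
# Assumed topology matrix: rows are arcs, columns are nodes 1, 2, 3.
# Arc 0: node 1 -> node 2, arc 1: node 1 -> node 3, all rates equal to 1.
GAMMA = [
    [1, -1, 0],
    [1, 0, -1],
]


def try_invoke(gamma, b, node):
    """Return b + gamma * v for the unit vector v selecting `node` (0-based).

    Returns None when the invocation would drive a buffer negative,
    i.e. when the node is not runnable in state b.
    """
    new_b = [size + row[node] for size, row in zip(b, gamma)]
    return new_b if min(new_b) >= 0 else None


b = [1, 2]                    # initial state b(0): one delay and two delays
b = try_invoke(GAMMA, b, 1)   # node 2 runs once      -> [0, 2]
b = try_invoke(GAMMA, b, 2)   # node 3 runs           -> [0, 1]
b = try_invoke(GAMMA, b, 2)   # node 3 runs again     -> [0, 0]
print(b)                          # [0, 0]
print(try_invoke(GAMMA, b, 1))    # None: node 2 must now wait for node 1
print(try_invoke(GAMMA, b, 0))    # [1, 1]: node 1 refills both buffers
```

Under these assumptions the trace reproduces the start-up behaviour described above: node 2 can run once and node 3 twice before node 1 has run at all.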

Construction of a PASS

Given this computation model (eqns. 1-4):

Find necessary and sufficient conditions for the existence of a PASS, and hence of a PAPS

Find practical algorithms that provably find a PASS if one exists

Find practical algorithms that construct a reasonable (not necessarily optimal) PAPS, if a PASS exists

Necessary condition for the existence of a PASS

$$\operatorname{rank}(\Gamma) = s - 1$$

where s is the number of nodes or blocks in the graph

Definition 1: An admissible sequential schedule is a non-empty ordered list of nodes such that, if the nodes are executed in the sequence given by the schedule, the amount of data in the buffers remains non-negative and bounded. Each node must appear in the schedule at least once

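Quick reminder of rank of a matrix

The rank of a matrix is the number of linearly independent rows, which equals the number of nonzero rows left after row reduction. For example,

$$A = \begin{bmatrix} 2 & 4 & 1 & 3 \\ -1 & -2 & 1 & 0 \\ 0 & 0 & 2 & 2 \\ 3 & 6 & 2 & 5 \end{bmatrix} \quad \text{row-reduces to} \quad \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

so rank(A) = 2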
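The rank of a small integer matrix can be computed exactly with Gaussian elimination over rationals. The sketch below is illustrative only; the function name `rank` and the sign pattern assumed for the 4x4 example matrix are not taken verbatim from the slides.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below position `found` with a nonzero entry.
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        pivot_row = rows[found]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / pivot_row[col]
                rows[r] = [a - factor * p for a, p in zip(rows[r], pivot_row)]
        found += 1
    return found


# Assumed reading of the 4x4 example matrix.
A = [
    [2, 4, 1, 3],
    [-1, -2, 1, 0],
    [0, 0, 2, 2],
    [3, 6, 2, 5],
]
print(rank(A))  # 2
```

Exact fractions avoid the tolerance choices that a floating-point rank computation would need.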

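Necessary condition for the existence of a PASS

Theorem 1: For a connected SDF graph with s nodes and topology matrix Γ, rank(Γ) = s - 1 is a necessary condition for a PASS to exist.

[Figure: example SDF graph with nodes 1, 2, 3]

$$\Gamma = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 2 & -1 \\ 2 & 0 & -1 \end{bmatrix}, \qquad q = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \qquad \Gamma q = O$$

For a PASS of period p, applying (3) repeatedly gives

$$b(p) = b(0) + \Gamma q \qquad \text{where} \qquad q = \sum_{n=0}^{p-1} v(n)$$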

Necessary condition for the existence of a PASS

Since the PASS is periodic, we can write

$$b(np) = b(0) + n \Gamma q$$

Since the PASS is admissible, the buffers must remain bounded, by Definition 1. The buffers remain bounded if and only if

$$\Gamma q = O$$

where O is a vector full of zeros

For q ≠ O, this implies that rank(Γ) < s, where s is the dimension of q. But rank(Γ) can be either s or s - 1, and so it must be s - 1 [Lemma 3]

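Necessary condition for the existence of a PASS

Theorem 1 indicates that if we have an SDF graph with a topology matrix of rank s, then the graph is defective (its sample rates are inconsistent) and no PASS can be found for it

[Figure: defective example SDF graph with nodes 1, 2, 3]

$$\Gamma = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 2 & 0 & -1 \end{bmatrix}, \qquad \operatorname{rank}(\Gamma) = 3 = s$$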

Necessary condition for the existence of a PASS

Theorem 2: For a connected SDF graph with s nodes and topology matrix Γ, and with rank(Γ) = s - 1, we can find a positive integer vector q ≠ O such that Γq = O, where O is the zero vector.
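One way to find the vector q of Theorem 2 is to propagate relative firing rates along the arcs and then scale them to the smallest integers. This is a sketch of that idea, not the procedure given in the paper; `repetition_vector` is a hypothetical helper, and it assumes each row of Γ has exactly one positive and one negative entry.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(gamma):
    """Smallest positive integer q with gamma * q = 0, or None if none exists."""
    n_nodes = len(gamma[0])
    rate = [None] * n_nodes
    rate[0] = Fraction(1)  # fix node 1 as the reference rate
    changed = True
    while changed:  # propagate rates across arcs until nothing new is learned
        changed = False
        for row in gamma:
            ends = [j for j, x in enumerate(row) if x != 0]
            if len(ends) != 2:
                continue
            a, b = ends
            if rate[a] is None and rate[b] is not None:
                a, b = b, a
            if rate[a] is not None and rate[b] is None:
                # Balance on this arc: row[a]*rate[a] + row[b]*rate[b] = 0
                rate[b] = -rate[a] * row[a] / row[b]
                changed = True
    if any(r is None or r <= 0 for r in rate):
        return None  # graph not connected, or malformed signs
    if any(sum(x * r for x, r in zip(row, rate)) != 0 for row in gamma):
        return None  # inconsistent sample rates: rank(gamma) = s
    scale = reduce(lambda m, r: m * r.denominator // gcd(m, r.denominator), rate, 1)
    q = [int(r * scale) for r in rate]
    common = reduce(gcd, q)
    return [x // common for x in q]


print(repetition_vector([[1, -1, 0], [0, 2, -1], [2, 0, -1]]))  # [1, 1, 2]
print(repetition_vector([[1, -1, 0], [0, 1, -1], [2, 0, -1]]))  # None
```

The second call uses a rank-3 matrix, so only q = O satisfies Γq = O and the helper reports that no positive solution exists.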

Definition 2: A predecessor to a node x is a node feeding data to x.

Necessary condition for the existence of a PASS

Definition 3 (class S algorithm): Given a positive integer vector q such that Γq = O and an initial state for the buffers b(0), the i-th node is runnable at a given time if it has not been run $q_i$ times and running it will not cause a buffer size to go negative. A class S algorithm is any algorithm that schedules a node if it is runnable, updates b(n), and stops only when no more nodes are runnable. If a class S algorithm terminates before it has scheduled each node the number of times specified in the q vector, then it is said to be deadlocked.

Necessary condition for the existence of a PASS

Theorem 3: Given an SDF graph with topology matrix Γ and given a positive integer vector q such that Γq = O, if a PASS of period $p = \mathbf{1}^T q$ exists, where $\mathbf{1}^T$ is a row vector full of ones, any class S algorithm will find such a PASS.

Necessary condition for the existence of a PASS

[Figure: two SDF graphs, (a) and (b), with consistent sample rates but no admissible schedule]

Necessary condition for the existence of a PASS

Theorem 4: Given an SDF graph with topology matrix Γ and given a positive integer vector q such that Γq = O, a PASS of period $p = \mathbf{1}^T q$ exists if and only if a PASS of period Np exists for any integer N.

Theorem 4 tells us that it does not matter which positive integer vector we use from the null space of the topology matrix, so we can simplify our system by using the smallest such vector, thus obtaining a PASS with minimum period.

Class S algorithm given the theorems

1) Solve for the smallest positive integer vector q in the null space of Γ (Γq = O)

2) Form an arbitrary ordered list L of all nodes in the system

3) For each node in L, schedule it if it is runnable, trying each node once

4) If each node i has been scheduled $q_i$ times, STOP

5) If no node in L can be scheduled, indicate a deadlock

6) Else go to 3 and repeat
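The six steps above can be sketched directly in code. This is a minimal illustration, not the authors' implementation: `class_s_schedule` is a hypothetical name, q is assumed to be given (step 1), the list L is simply the node order 1..s, and the two example graphs are assumptions chosen for the demonstration.

```python
def class_s_schedule(gamma, q, b0):
    """Return a PASS as a list of 1-based node numbers, or None on deadlock."""
    b = list(b0)
    remaining = list(q)            # invocations still owed to each node
    order = list(range(len(q)))    # step 2: an arbitrary ordered list L
    schedule = []
    while any(remaining):          # step 4: stop once every node ran q_i times
        progressed = False
        for node in order:         # step 3: try each node in L once
            if remaining[node] == 0:
                continue
            new_b = [size + row[node] for size, row in zip(b, gamma)]
            if min(new_b) >= 0:    # runnable: no buffer goes negative
                b = new_b
                remaining[node] -= 1
                schedule.append(node + 1)
                progressed = True
        if not progressed:
            return None            # step 5: deadlock
    return schedule


# Assumed three-node example with q = [1, 1, 2] and empty initial buffers.
GAMMA = [[1, -1, 0], [0, 2, -1], [2, 0, -1]]
print(class_s_schedule(GAMMA, [1, 1, 2], [0, 0, 0]))   # [1, 2, 3, 3]

# Assumed two-node cycle: consistent rates, but it deadlocks without a delay.
CYCLE = [[1, -1], [-1, 1]]
print(class_s_schedule(CYCLE, [1, 1], [0, 0]))         # None
print(class_s_schedule(CYCLE, [1, 1], [0, 1]))         # [1, 2]
```

The cycle example shows why rank(Γ) = s - 1 is only a necessary condition: the sample rates are consistent, yet no node can start until a delay places an initial sample on one of the arcs.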

Constructing a PAPS

If a workable schedule for a single processor can be generated then a workable schedule for a multiprocessor system can also be generated

The first step is to construct an acyclic precedence graph for J periods of the PASS found by the class S algorithm

Construct an acyclic precedence graph by example

[Figure: example SDF graph with nodes 1, 2, 3 and delays D and 2D on its arcs]

This graph is neither acyclic nor a precedence graph

Possible minimum-period PASSs are {1, 3, 1, 2}, {3, 1, 1, 2} or {1, 1, 3, 2}, each with period 4.

{2, 1, 3, 1} is not a PASS because node 2 is not immediately runnable

Construct an acyclic precedence graph

[Figure: acyclic precedence graphs for J = 1 and J = 2, built from invocations of nodes 1, 2, 3]

The next step is constructing a parallel schedule

This can be done by the critical path method [Adam 74] or by the Hu-level scheduling algorithm [T. C. Hu 61]

A level is determined for each node in the acyclic precedence graph, where the level of a given node is the largest total of the runtimes of the nodes on any path from the given node to a terminal node of the graph

A terminal node is a node with no successor

If there is no terminal node, then one can be created with zero runtime
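The level computation can be sketched as a memoized longest-path calculation. The precedence graph below is a hypothetical one (two invocations of node 1 feeding one invocation of node 2, and an independent invocation of node 3); only the runtimes 1, 2, 3 for nodes 1, 2, 3 come from the slides. `hu_levels` is an illustrative name.

```python
def hu_levels(successors, runtime):
    """Level = own runtime plus the largest level among the successors."""
    levels = {}

    def level(node):
        if node not in levels:
            tail = max((level(s) for s in successors.get(node, ())), default=0)
            levels[node] = runtime[node] + tail
        return levels[node]

    for node in runtime:
        level(node)
    return levels


# Hypothetical precedence graph: "1a" and "1b" are two invocations of node 1.
RUNTIME = {"1a": 1, "1b": 1, "2a": 2, "3a": 3}
SUCCESSORS = {"1a": ["2a"], "1b": ["2a"]}

print(hu_levels(SUCCESSORS, RUNTIME))
# {'1a': 3, '1b': 3, '2a': 2, '3a': 3}
```

A terminal node has no successors, so its level is just its own runtime; the recursion assumes the graph is acyclic.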

Hu-level scheduling algorithm

[Figure: acyclic precedence graphs for J = 1 and J = 2, with each node annotated with its level]

Constructing a parallel schedule

The Hu-level scheduling algorithm simply schedules the available nodes with the highest level first

When there are more available nodes with the same highest level than there are processors, a reasonable heuristic is to schedule the ones with the longest runtime first
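Highest-level-first scheduling with the longest-runtime tie-break can be sketched as a small event-driven list scheduler. This is an illustration under assumptions, not the paper's implementation: the precedence graph is hypothetical, `hu_schedule` is an illustrative name, and communication costs between processors are ignored.

```python
import heapq


def hu_levels(successors, runtime):
    """Level = own runtime plus the largest level among the successors."""
    levels = {}

    def level(node):
        if node not in levels:
            tail = max((level(s) for s in successors.get(node, ())), default=0)
            levels[node] = runtime[node] + tail
        return levels[node]

    for node in runtime:
        level(node)
    return levels


def hu_schedule(successors, runtime, n_procs):
    """Return {node: (processor, start, finish)} for a highest-level-first schedule."""
    levels = hu_levels(successors, runtime)
    waiting = {node: 0 for node in runtime}   # unfinished predecessors per node
    for succs in successors.values():
        for s in succs:
            waiting[s] += 1
    ready = [node for node in runtime if waiting[node] == 0]
    free = list(range(n_procs))
    running = []                              # heap of (finish, node, processor)
    plan = {}
    now = 0
    while ready or running:
        # Highest level first; ties broken by longest runtime, then by name.
        ready.sort(key=lambda n: (-levels[n], -runtime[n], n))
        while ready and free:
            node, proc = ready.pop(0), free.pop(0)
            plan[node] = (proc, now, now + runtime[node])
            heapq.heappush(running, (now + runtime[node], node, proc))
        now, node, proc = heapq.heappop(running)  # advance to next completion
        free.append(proc)
        free.sort()
        for s in successors.get(node, ()):
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return plan


# Hypothetical precedence graph; runtimes of nodes 1, 2, 3 are 1, 2, 3.
RUNTIME = {"1a": 1, "1b": 1, "2a": 2, "3a": 3}
SUCCESSORS = {"1a": ["2a"], "1b": ["2a"]}
plan = hu_schedule(SUCCESSORS, RUNTIME, n_procs=2)
print(plan["3a"], plan["1a"], plan["1b"], plan["2a"])
# (0, 0, 3) (1, 0, 1) (1, 1, 2) (1, 2, 4)
```

In this assumed example one processor runs the invocation of node 3 while the other runs both invocations of node 1 followed by node 2, for a total schedule length of 4 time units.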

Constructing a parallel schedule

[Figure: two-processor schedules (PROC 1 and PROC 2) for J = 1 and J = 2]

Two processors; the runtimes of nodes 1, 2, 3 are 1, 2, 3 time units respectively

Limitations of Model

Does not handle large-scale conditional control flow the way general-purpose languages do

Asynchronous graphs

Connecting to the outside world

Data-dependent runtime of blocks

Summary

This paper describes the theory necessary to develop a signal processing programming methodology that offers:

Programmer convenience

A natural way to describe signal processing systems

Ready use of the available concurrency

Question?

Thank you
