Post on 01-Jan-2016
Static Translation of Stream Programs
S. M. Farhad
School of Information Technology
The University of Sydney
Parallel Programming Gap
Uniprocessors have hit their physical limits
Multicore and many-core systems
Programming challenges now:
  Multiple flows of control and memories
  Finding algorithms that can run in parallel
  Correctness of the program
  Synchronization issues
  Debugging the program
Stream Programming Paradigm
General-purpose programming paradigm
Suitable for high-performance applications
Inherent parallelism
Two types of elements in a program:
  Actors: computation elements
  Streams: channels to ship data
Stream graph: composition of actors and streams
Input and output: stream
[Figure: an actor with its input channels and output channels]
StreamIt Language
StreamIt: stream programming language
Architecture independent
Modular
[Figure: StreamIt constructs: filter, pipeline, splitjoin (splitter, parallel computation, joiner) and feedback loop (joiner, splitter); each component may be any StreamIt language construct]
A Simple Filter in StreamIt
int->int filter A {
    init {
        // empty
    }
    // Each firing reads two items, emits their sum, and consumes one item.
    work pop 1 peek 2 push 1 {
        push(peek(0) + peek(1));
        pop();
    }
}
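To make the declared rates concrete, here is a minimal Python sketch of what filter A would be expected to compute: each firing peeks at two items, pushes their sum and pops one, so the output is a sliding sum of adjacent inputs. The name `run_filter_a` and the deque-backed channel are illustrative assumptions, not part of StreamIt.

```python
from collections import deque


def run_filter_a(items):
    """Simulate filter A: peek 2, push peek(0) + peek(1), pop 1 per firing."""
    channel = deque(items)  # hypothetical in-memory model of the input channel
    output = []
    # The work function can fire only while `peek` = 2 items are available.
    while len(channel) >= 2:
        output.append(channel[0] + channel[1])  # push(peek(0) + peek(1))
        channel.popleft()                       # pop()
    return output


print(run_filter_a([1, 2, 3, 4]))  # [3, 5, 7]
```

Because the filter peeks further than it pops, the last input item is never consumed in this finite run.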
StreamIt Example
float->float pipeline Example() {
    add A();
    add B();
    add splitjoin {
        split duplicate;
        add D();
        add E();
        join roundrobin;
    }
    add G();
    add H();
}
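The splitjoin in this pipeline can be pictured with a small Python sketch: a duplicate splitter hands every item to each branch, and a round-robin joiner interleaves the branch outputs. The branch functions `d` and `e` are hypothetical stand-ins (the slides do not define D and E), and each branch is assumed to produce one output per input.

```python
def splitjoin_duplicate_roundrobin(items, branches):
    """Duplicate the input to every branch, then join branch outputs round-robin."""
    # split duplicate: every branch sees the full input stream
    branch_outputs = [[branch(x) for x in items] for branch in branches]
    # join roundrobin: take one item from each branch in turn
    joined = []
    for group in zip(*branch_outputs):
        joined.extend(group)
    return joined


def d(x):
    # Hypothetical body for filter D.
    return x * 2.0


def e(x):
    # Hypothetical body for filter E.
    return x + 1.0


print(splitjoin_duplicate_roundrobin([1.0, 2.0], [d, e]))  # [2.0, 2.0, 4.0, 3.0]
```

With two branches, the joiner emits twice as many items as the splitter receives, which is the kind of rate change the bandwidth analysis has to account for.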
[Figure: stream graph with input z, actors A to F, a split and a join; every channel weight is 1]
Research Question?
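[Figure: how should the stream graph (input z; actors A, B, D, E, G, H with a split and a join; every channel weight 1) be mapped onto a parallel system with processors Proc-1, Proc-2, ..., Proc-N?]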
Static Translation of Stream Programs
We propose:
  A novel steady-state analysis describing the output bandwidth of an actor as a function of the input of the stream program
  A quantitative analysis to resolve bottlenecks in stream programs
  An algorithm for mapping actors to a parallel system
  Scheduling of the mapped actors on the processors
Our goal:
  To statically optimize the throughput of a stream program on a parallel system
Overview of the Proposed System
Finding a Closed Form for Steady State
Resolve Bottleneck
Find Mapping by ILP or Approximation
Schedule Actors for Processors
Bandwidth Transfer Model
Synchronous model
Fundamental assumption: actors deliver a constant output bandwidth given a constant input bandwidth
Output bandwidth of an actor is normalized (scaled between 0 and 1)
Bandwidth of channel (i,j): output bandwidth of actor i multiplied by weight $w_{ij}$
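[Figure: channel (i,j) from actor i to actor j carries bandwidth $w_{ij}\, x_i$]
$x_i = f_i\left(\sum_{k=1}^{n} w_{ki}\, x_k\right)$, where $x_i$ is the output bandwidth of actor $i$ and $f_i$ its bandwidth transfer function
A constraint relating $\sum_{j=1}^{n} w_{ij}$ to 1 (the relation symbol is not recoverable from the source)
$x_1 = f_1(z)$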
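Simultaneous Linear Functional Equations System
Bandwidth transfer functions: $f_i$ maps the input bandwidth of actor $i$ to its output bandwidth $x_i$
Resulting in a simultaneous functional equation system:
$x_i = f_i\left(\sum_{k=1}^{n} w_{ki}\, x_k\right)$ for each actor $i$
$x_1 = f_1(z)$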
Solving Equation System
Solve by a Gaussian-style equation solver
Bandwidth variables are successively eliminated:
  Substitution
  Loop-breaking
The solution of the system is a linear function for each actor, $f_i^{*}(z) = \lambda_i z$ with a constant $\lambda_i$, which does not depend on the predecessors; it depends only on the input bandwidth "z"
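The elimination step can be illustrated with a short Python sketch. It assumes, purely for illustration, that every transfer function is a constant gain, so that the output of actor i equals `gains[i]` times its input bandwidth; the slides do not state the form of the transfer functions. Under that assumption the system is linear, and Gauss-Jordan elimination yields one coefficient per actor that depends only on z. The function name and the three-actor graph are hypothetical.

```python
from fractions import Fraction


def steady_state_coefficients(weights, gains, source=0):
    """Return c with x_i = c[i] * z, assuming x_i = gains[i] * (input bandwidth of i).

    weights[k][i] is the weight of channel (k, i); the source actor also receives z.
    """
    n = len(gains)
    rows = []
    for i in range(n):
        # Row i encodes x_i - gains[i] * sum_k weights[k][i] * x_k = rhs * z.
        row = [Fraction(int(i == k)) - Fraction(gains[i]) * Fraction(weights[k][i])
               for k in range(n)]
        rhs = Fraction(gains[i]) if i == source else Fraction(0)
        rows.append(row + [rhs])
    # Gauss-Jordan elimination: one bandwidth variable is eliminated per column.
    for col in range(n):
        # Raises StopIteration if the system is singular.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


half = Fraction(1, 2)
# Hypothetical graph: 0 -> 1 (weight 1), 1 -> 2 (weight 1/2), feedback 2 -> 1 (weight 1/2).
w = [[0, 1, 0],
     [0, 0, half],
     [0, half, 0]]
print([str(c) for c in steady_state_coefficients(w, [1, 1, 1])])  # ['1', '4/3', '2/3']
```

The feedback edge is resolved by the same elimination that handles the forward edges, which is the role loop-breaking plays in the substitution view. Exact fractions are used so that the coefficients are not blurred by rounding.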
Mapping Actors to Processors
An NP-hard problem
  Takes too long to solve exactly
Approximation algorithm
  Much faster: O(n log n)
  Non-optimal solution
We formulate an Integer Linear Programming problem considering:
  Input, output and processing bandwidths of each actor
  Processing capacity of each processor
  Inter-processor communication bandwidths
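The slides do not say which approximation algorithm is used. As one plausible illustration of an O(n log n) heuristic, the sketch below places the heaviest actor first on the currently least-loaded processor, using a heap. It ignores inter-processor communication, and the actor loads (given in percent of one processor) are made-up values.

```python
import heapq


def greedy_mapping(loads, num_procs):
    """Assign actors, heaviest first, to the currently least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor index)
    heapq.heapify(heap)
    mapping = {}
    for actor in sorted(loads, key=loads.get, reverse=True):
        load, proc = heapq.heappop(heap)
        mapping[actor] = proc
        heapq.heappush(heap, (load + loads[actor], proc))
    max_load = max(load for load, _ in heap)
    return mapping, max_load


# Hypothetical processing loads in percent of one processor's capacity.
loads = {"A": 40, "B": 30, "D": 30, "E": 20, "G": 10, "H": 10}
mapping, max_load = greedy_mapping(loads, 3)
print(mapping, max_load)
```

Sorting costs O(n log n) and each heap operation costs O(log p), so the total stays within O(n log n) when there are no more processors than actors. The result is not guaranteed to be optimal.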
[Figure: example mapping of the stream graph (input z; actors A, B, D, E, G, H with a split and a join; channel weights 1.0 and 0.5) onto processors P1, P2 and P3]
Find Optimal Solution by Binary Search
Develop a feasibility test for a given input bandwidth "z" of the stream program
Binary search:
  If the solution is not feasible at the midpoint, the midpoint becomes the new upper bound for z
  If the solution is feasible at the midpoint, the midpoint becomes the new lower bound for z
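The search loop can be sketched in Python as follows. The sketch assumes that feasibility is monotone (if some z is feasible, every smaller z is too) and that z is normalized to the interval from 0 to 1. The `is_feasible` callback is a placeholder for the feasibility test, and the toy test in the example is invented for illustration.

```python
def max_feasible_bandwidth(is_feasible, tolerance=1e-6):
    """Binary-search the largest z in [0, 1] for which is_feasible(z) holds."""
    left, right = 0.0, 1.0
    if is_feasible(right):
        return right
    while right - left > tolerance:
        mid = (left + right) / 2
        if is_feasible(mid):
            left = mid   # feasible: mid is a new lower bound for z
        else:
            right = mid  # infeasible: mid is a new upper bound for z
    return left


# Toy test: a single actor whose load 0.8 * z must fit into a capacity of 0.5.
z_sys = max_feasible_bandwidth(lambda z: 0.8 * z <= 0.5)
print(round(z_sys, 4))  # 0.625
```

Each iteration halves the interval, so about 20 feasibility tests are enough for a tolerance of 1e-6.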
[Figure: solution space for z between 0 and 1; the binary search probes mid between left and right to locate z_sys]
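Test for a given z: Simple Model
$y_{ij}$ is a 0-1 binary variable, $1 \le i \le n$, $1 \le j \le p$ ($y_{ij} = 1$ if actor $i$ is mapped to processor $j$)
$\sum_{j=1}^{p} y_{ij} = 1$ for all i, 1 ≤ i ≤ n …… (1)
$\sum_{i=1}^{n} U_i\, y_{ij} \le 1$ for all j, 1 ≤ j ≤ p …… (2)
where $U_i = c_i \sum_{j=1}^{n} w_{ji}\, f_j^{*}(z)$; $f_j^{*}(z)$ is the steady-state output bandwidth of actor $j$, and $c_i$ stands for a per-actor factor whose symbol is not legible in the source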
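For very small instances, the test for a given z can be illustrated without an ILP solver by enumerating assignments. The sketch below assumes that the per-actor loads have already been computed for the chosen z and that every processor has capacity 1; both the function name and the example loads are hypothetical. A real implementation would hand constraints (1) and (2) to an ILP solver instead.

```python
from itertools import product


def find_mapping(utilization, num_procs, capacity=1.0):
    """Return a tuple mapping actor index -> processor index, or None if infeasible."""
    n = len(utilization)
    # Choosing exactly one processor per actor enforces constraint (1) by construction.
    for assignment in product(range(num_procs), repeat=n):
        load = [0.0] * num_procs
        for actor, proc in enumerate(assignment):
            load[proc] += utilization[actor]
        # Constraint (2): no processor may exceed its capacity (small float tolerance).
        if all(total <= capacity + 1e-12 for total in load):
            return assignment
    return None


print(find_mapping([0.6, 0.5, 0.4], 2))  # (0, 1, 0)
print(find_mapping([0.6, 0.6, 0.6], 2))  # None
```

The enumeration visits up to p^n assignments, so it only serves to show what the constraints mean.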
Bottleneck Analysis
Constraints limiting the input bandwidth of the whole program:
  Input, output and processing bandwidth constraints
Bottleneck-free if SystemBW < ActorBW
If SystemBW > ActorBW, there is a bottleneck in the system
[Figure: bandwidth axis from 0 to 1 showing SystemBW < ActorBW]
Summary
Synchronous model for stream programs
Novel steady-state model
Statically optimize the throughput of stream programs
Resolving bottlenecks by simple quantitative analysis
Finding an approximation for the mapping problem
Related Works
[1] Static Scheduling of SDF Programs for DSP [Lee '87]
[2] StreamIt: A Language for Streaming Applications [Thies '02]
[3] Phased Scheduling of Stream Programs [Thies '03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies '06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott '08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa '09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa '09]
Future Plan to Complete
Year  Duration  Milestones
2009  2 months  Prepare for publication: solving the SLFE system, ILP model, approximation algorithm, bottleneck-resolving technique
2010  3 months  Design an approximation/heuristic algorithm for actor placement with communication constraints between processing elements
2010  4 months  Experiment on different benchmarks and compare the results
2010  3 months  Investigate new scheduling techniques for stream programs
2010  2 months  Further publications covering both topics (at least 2)
2011  4 months  Design a small stream programming language that uses our proposed technique
2011  6 months  Implement the small stream programming language that uses our proposed technique
2011  2 months  Further publications
2012  3 months  Write up PhD thesis and further publication