Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Mapping and Scheduling Streaming Applications using SMT Solvers
Pranav Tendulkar
Supervisors: Dr. Oded Maler, Dr. Peter Poplavko
Verimag, FRANCE
13 October 2014
Tendulkar Mapping/scheduling for many-core 1 / 52
Multi-core Processors Everywhere
Cars
Phones
Space-shuttle
Tablets
Laptops
Cameras
Smart-TV
source : http://www.csl.cornell.edu/courses/ece5745/handouts.html
Multi-core systems
How To:
Deploy the application to the platform
Decide number of processors to use?
Allocate tasks to processors and schedule them
Our Deployment Framework
[Figure: deployment framework. Labels: Constraints; Performance, Memory, #Processors; Platform Model; Application Model; Optimization Techniques; Code; Solution.]
Application Model
Task Graph
[Figure: task graph with nodes A, B, C, D, E, F, G, H, I, J]
Tasks: software procedures
Edges: precedence relations
Annotated with execution time
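Deployment Problem
Task Graph
[Figure: task graph with nodes A, B, C, D, E, F, G, H, I, J]
Tasks: software procedures
Edges: precedence relations
Deployment Solution
[Figure: schedule over time on processors P1 and P2; one processor runs C, F, E, D, H, J and the other runs A, B, G, I]
Mapping: Task ⇒ Processor
Scheduling: Task ⇒ Time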
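A deployment solution can be written down as two plain lookup tables, one for the mapping and one for the schedule. The sketch below is illustrative only: the three tasks, their durations, and the precedence edges are hypothetical and are not the graph from the slide.

```python
# Hypothetical example data: durations and precedence edges are made up.
duration = {"A": 1, "B": 1, "C": 2}
edges = [("A", "B"), ("A", "C")]              # A must finish before B and C start

mapping = {"A": "P1", "B": "P1", "C": "P2"}   # Mapping: Task => Processor
start = {"A": 0, "B": 1, "C": 1}              # Scheduling: Task => Time


def is_valid(mapping, start, duration, edges):
    """Check precedence and that no processor runs two tasks at once."""
    for u, v in edges:
        if start[v] < start[u] + duration[u]:
            return False
    tasks = list(mapping)
    for i, u in enumerate(tasks):
        for v in tasks[i + 1:]:
            if mapping[u] != mapping[v]:
                continue
            # Two intervals overlap if each starts before the other ends.
            if start[u] < start[v] + duration[v] and start[v] < start[u] + duration[u]:
                return False
    return True


def latency(start, duration):
    """Completion time of the last task."""
    return max(start[t] + duration[t] for t in start)
```

With this data the two-processor solution is valid, while putting all three tasks on P1 with the same start times is not, because B and C would overlap.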
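Deployment Problem
Solution 1:
[Figure: schedule over time on processors P1 and P2; one processor runs C, F, E, D, H, J and the other runs A, B, G, I]
Solution 2:
[Figure: schedule over time on a single processor P1 running A, B, C, D, E, F, G, H, I, J in sequence]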
Solution space is large
Deployment problem
How to:
find optimal solutions in an exponential design space
model complex hardware that has processors, a network, and DMA
evaluate multiple criteria: latency, memory used, processors used, ...
Outline
1 Motivation
2 Application Model
3 Deployment using SMT
4 Symmetry elimination
5 Distributed memory scheduling
6 SMT Solving
7 Conclusions
Model of Computation
Synchronous Dataflow (SDF) graphs, by Edward Lee and David Messerschmitt in 1987
represent streaming applications
[Figure: Input → Computation → Output]
Synchronous DataFlow
[Figure: images → Pre → Blur0, Blur1, Blur2, Blur3 (in parallel) → Post → images]
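Synchronous DataFlow
SDF Graph
[Figure: Pre → Blur → Post, with rates 4 and 1 on the edge Pre → Blur and rates 1 and 4 on the edge Blur → Post]
Task Graph
[Figure: Pre → Blur0, Blur1, Blur2, Blur3 (in parallel) → Post]
Actors, Edges, Rates
Actor Blur is a compact representation of data-parallel tasks.
All Blur tasks have the same properties, such as execution time.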
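The number of task instances per actor follows from the rates: on every edge, the tokens produced per iteration must equal the tokens consumed. A minimal sketch of that balance computation is given below; the function name and the edge tuple layout are assumptions for illustration, and the graph is assumed to be connected and consistent.

```python
from fractions import Fraction
from math import gcd


def repetition_counts(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    edges: list of (src, dst, produced, consumed).
    """
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:                       # propagate counts along the edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * produced / consumed
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * consumed / produced
                changed = True
    scale = 1                            # least common multiple of denominators
    for value in q.values():
        scale = scale * value.denominator // gcd(scale, value.denominator)
    return {actor: int(value * scale) for actor, value in q.items()}


# Rates from the slide: Pre -(4, 1)-> Blur -(1, 4)-> Post
counts = repetition_counts(
    ["Pre", "Blur", "Post"],
    [("Pre", "Blur", 4, 1), ("Blur", "Post", 1, 4)],
)
```

For the Pre/Blur/Post graph this gives one Pre task, four Blur tasks, and one Post task, matching the task graph.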
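Split-Join Graphs
we use split-join graphs: a restriction of SDF
still covering perhaps 90% of use cases in the literature
a simple example:
[Figure: A → B → C, with rate α on the edge A → B and rate 1/α on the edge B → C]
α: spawn and split
1/α: wait and join
[Figure: task graph A0 → B0, B1, ..., Bα−1 (in parallel) → C0]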
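The expansion of a split-join chain into a task graph can be sketched as follows. This is an illustrative sketch, not the tool's implementation: it handles only a linear chain, the function name is hypothetical, and instances are numbered with a single flat index.

```python
from fractions import Fraction


def expand_chain(actors, rates):
    """Expand a chain of actors into tasks and precedence edges.

    rates[i] labels the edge actors[i] -> actors[i + 1]:
    an integer alpha means split, Fraction(1, alpha) means join.
    """
    counts = [1]
    for rate in rates:
        n = counts[-1] * Fraction(rate)
        if n.denominator != 1:
            raise ValueError("join rate does not divide the instance count")
        counts.append(int(n))

    tasks = [[f"{a}{i}" for i in range(n)] for a, n in zip(actors, counts)]
    edges = []
    for k in range(len(rates)):
        src, dst = tasks[k], tasks[k + 1]
        if len(dst) >= len(src):
            fan = len(dst) // len(src)   # split: each source spawns `fan` children
            edges += [(src[j // fan], d) for j, d in enumerate(dst)]
        else:
            fan = len(src) // len(dst)   # join: each destination waits for `fan` parents
            edges += [(s, dst[i // fan]) for i, s in enumerate(src)]
    return tasks, edges


# The A -> B -> C example with alpha = 3
tasks, edges = expand_chain(["A", "B", "C"], [3, Fraction(1, 3)])
```

With α = 3 this yields A0, then B0, B1, B2, then C0, with A0 preceding every B task and every B task preceding C0.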
Restrictions compared to general SDF
Split-join does not support:
Stateful actors
Non-proportional rates
Initial tokens and cyclic paths
[Figure: SDF graph with actors A and B; edge annotations 2, 3, 3, 2 and 4]
SATisfiability solver (SAT / SMT)
Boolean variables: in0, in1, in2, ...; out0, out1, out2, ...
Constraints: out0 = in0 ∨ in1 ⊕ in2, ...
[Figure: variables and constraints are fed to the SAT solver]
Query out0 = true & out1 = true: UNSAT
Query out0 = false & out1 = true: SAT, with in0 = true, in1 = false, ...
SMT extends SAT by numeric variables and constants
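To make the SAT / UNSAT answers concrete, here is a brute-force satisfiability check over a tiny Boolean circuit. It is a stand-in for a real solver, which searches far more cleverly, and the circuit is hypothetical (the slide elides the full constraint set, so it is not reproduced here).

```python
from itertools import product


def solve(variables, constraints):
    """Return a satisfying assignment as a dict, or None if UNSAT."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for check in constraints):
            return assignment
    return None


# Hypothetical circuit: out0 = in0 OR in1, out1 = in0 AND in1.
variables = ["in0", "in1", "out0", "out1"]
circuit = [
    lambda a: a["out0"] == (a["in0"] or a["in1"]),
    lambda a: a["out1"] == (a["in0"] and a["in1"]),
]

# out0 = false & out1 = true cannot hold for this circuit.
unsat_query = circuit + [lambda a: not a["out0"], lambda a: a["out1"]]
# out0 = true & out1 = false holds when exactly one input is true.
sat_query = circuit + [lambda a: a["out0"], lambda a: not a["out1"]]
```

A `None` result plays the role of UNSAT; any returned dictionary is a witness for SAT, just as the solver returns values for in0, in1, ... on the slide.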
Encoding deployment with constraints
[Figure: task graph A0 → B0, B1, B2, B3 (in parallel) → C0]
Actors: A, B, C
Tasks: A0, B0, B1, B2, B3, C0

Description       Variables
Start time        xA0 xB0 xB1 xB2 xB3 xC0
Allocated proc.   pA0 pB0 pB1 pB2 pB3 pC0
Duration          dA dB dC

Precedence constraints: xB0 ≥ (xA0 + dA)
Mutual exclusion constraints: if (pB1 = pB2) then xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)
Latency cost: Latency = (xC0 + dC)
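The three kinds of constraints can be exercised without an SMT solver by a small backtracking search over integer start times. The sketch below is only a stand-in for the solver: unit durations are an assumption made for illustration, and the function name is hypothetical.

```python
def find_schedule(tasks, duration, preds, num_procs, latency_bound):
    """Backtracking search standing in for the SMT solver.

    tasks must be listed so that predecessors come first.
    Returns (start, proc) dictionaries, or None if no schedule exists.
    """
    start, proc = {}, {}

    def consistent(t, x, p):
        if x + duration[t] > latency_bound:                    # latency bound
            return False
        if any(x < start[u] + duration[u] for u in preds[t]):  # precedence
            return False
        for u in start:                                        # mutual exclusion
            if proc[u] == p and x < start[u] + duration[u] and start[u] < x + duration[t]:
                return False
        return True

    def place(i):
        if i == len(tasks):
            return True
        t = tasks[i]
        for x in range(latency_bound):
            for p in range(num_procs):
                if consistent(t, x, p):
                    start[t], proc[t] = x, p
                    if place(i + 1):
                        return True
                    del start[t], proc[t]
        return False

    return (dict(start), dict(proc)) if place(0) else None


# Task graph from the slide; unit durations are assumed for illustration.
tasks = ["A0", "B0", "B1", "B2", "B3", "C0"]
duration = {t: 1 for t in tasks}
preds = {"A0": [], "B0": ["A0"], "B1": ["A0"], "B2": ["A0"], "B3": ["A0"],
         "C0": ["B0", "B1", "B2", "B3"]}
```

Under these assumptions a schedule exists for 2 processors with latency bound 4 and for 4 processors with bound 3, but not for 2 processors with bound 3.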
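Multi-criteria Problem
[Figure: schedule over time on processors P1 and P2; one processor runs A0, B0, B2 and the other runs B1, B3, C0. Latency = 4, #Proc = 2]
[Figure: schedule over time on processors P1, P2, P3, P4 with tasks A0, B0, B1, B2, B3, C0. Latency = 3, #Proc = 4]
[Figure: cost space with axes Latency and Processors; Pareto set: (4,2), (3,4)]
Conflicting Criteria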
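A Pareto set keeps exactly those cost points that no other point beats in every criterion. A minimal sketch, assuming both costs are minimized and points are (latency, processors) pairs; only (4, 2) and (3, 4) come from the slide, the other two candidates are hypothetical.

```python
def pareto_set(points):
    """Keep the points that are not dominated by any other point."""
    def dominates(a, b):
        # a is at least as good as b in every cost and differs from b
        return all(x <= y for x, y in zip(a, b)) and a != b

    return sorted(p for p in set(points)
                  if not any(dominates(q, p) for q in points))


# (latency, processors)
candidates = [(4, 2), (3, 4), (5, 3), (4, 4)]
front = pareto_set(candidates)
```

Here (5, 3) and (4, 4) are dropped because (4, 2) is no worse in both costs, while (4, 2) and (3, 4) remain since each is better in one criterion.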
Problem Monotonicity
[Figure: cost space with axes Latency and Processors, each with an upper bound]
SAT: Latency = 4, #Proc = 2 is feasible (P1, P2: A0 B0 B2 / B1 B3 C0), so Latency = 5, #Proc = 3 is also feasible (P1, P2, P3: A0 / B0 B2 / B1 B3 C0).
UNSAT: if Latency = 4, #Proc = 2 is not possible, then Latency = 2, #Proc = 1 is also not possible.
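Monotonicity lets earlier answers decide later queries without calling the solver. A minimal sketch of that inference, with a hypothetical function name and cost points given as (latency, processors) bounds:

```python
def infer(point, sat_points, unsat_points):
    """Classify a cost point from earlier answers.

    Returns "SAT", "UNSAT", or None when a solver query is still needed.
    """
    # A feasible point with costs no larger in every dimension exists.
    if any(all(k <= c for k, c in zip(known, point)) for known in sat_points):
        return "SAT"
    # An infeasible point with costs no smaller in every dimension exists.
    if any(all(c <= k for c, k in zip(point, known)) for known in unsat_points):
        return "UNSAT"
    return None
```

For example, a known SAT answer at (4, 2) settles (5, 3), and a known UNSAT answer at (4, 2) settles (2, 1); a point such as (3, 4) is not covered by either and still needs a query.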
Design Space Exploration
[Figure: Split-join Graph → SMT Constraints → SMT Solver; the Design Space Exploration Algorithm sends cost constraints to the solver and receives solutions]
Query results: (x1, y1) SAT; (x2, y2) UNSAT; (x3, y3) ? TIMEOUT
Timeout: cannot decide SAT / UNSAT within a given time budget.
Exploration Algorithm
Divide cost space using grids
One SMT query per point on the grid
Finer grid after every iteration
Don’t query in known area
[Figure: cost-space grid; legend: sat points, unsat points, not yet explored points]
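The four steps above can be sketched as a loop over progressively finer grids that skips any point already decided by an earlier answer. This is an illustrative sketch under stated assumptions, not the algorithm's actual implementation: the cost space is a two-dimensional integer grid, `first_step` is a power of two, and `query` is a hypothetical callback that returns "SAT", "UNSAT", or "TIMEOUT".

```python
def explore(query, max_x, max_y, first_step=4):
    """Grid exploration of a 2-D cost space with monotonic pruning."""
    sat, unsat, timeout = set(), set(), set()
    queries = 0

    def known(p):
        return (p in timeout
                or any(s[0] <= p[0] and s[1] <= p[1] for s in sat)
                or any(p[0] <= u[0] and p[1] <= u[1] for u in unsat))

    step = first_step
    while step >= 1:
        for x in range(step, max_x + 1, step):
            for y in range(step, max_y + 1, step):
                if known((x, y)):
                    continue              # don't query in a known area
                queries += 1              # one solver query per grid point
                answer = query((x, y))
                if answer == "SAT":
                    sat.add((x, y))
                elif answer == "UNSAT":
                    unsat.add((x, y))
                else:
                    timeout.add((x, y))
        step //= 2                        # finer grid after every iteration
    return sat, unsat, timeout, queries


def toy_oracle(point):
    """Hypothetical monotone oracle standing in for the SMT solver."""
    return "SAT" if point[0] * point[1] >= 8 else "UNSAT"


sat, unsat, timeout, queries = explore(toy_oracle, 8, 8)
```

On the 8 × 8 toy space the loop classifies every point while issuing fewer than 64 queries, because each answer rules out a whole rectangle of the grid.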
Tendulkar Mapping/scheduling for many-core 24 / 52
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Exploration Algorithm
Divide cost space using grids
One SMT query per point on the grid
Finer grid after every iteration
Don’t query in known area
sat points unsat points not yet explored points
Tendulkar Mapping/scheduling for many-core 24 / 52
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Exploration Algorithm
Divide cost space using grids
One SMT query per point on the grid
Finer grid after every iteration
Don’t query in known area
sat points unsat points not yet explored points
Tendulkar Mapping/scheduling for many-core 24 / 52
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Exploration Algorithm
Divide cost space using grids
One SMT query per point on the grid
Finer grid after every iteration
Don’t query in known area
sat points unsat points not yet explored points
Tendulkar Mapping/scheduling for many-core 24 / 52
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Exploration Algorithm
Divide cost space using grids
One SMT query per point on the grid
Finer grid after every iteration
Don’t query in known area
sat points unsat points not yet explored points
Tendulkar Mapping/scheduling for many-core 24 / 52
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions
Overview
1 Motivation
2 Application Model
3 Deployment using SMT
4 Symmetry elimination
5 Distributed memory scheduling
6 SMT Solving
7 Conclusions
Task Symmetry
[Figure: task graph with tasks A0, B0, B1, C00, C01, C10, C11, D0, D1, E0]
[Figure: a schedule over time on processors P1 and P2; rows: B1 C10 C01 C00 D0 E0 / A0 B0 C11 D1]
[Figure: a permuted schedule; rows: B1 C10 C00 C01 D0 E0 / A0 B0 C11 D1 (C00 and C01 exchanged)]
All instances of actor C are similar (symmetric)
No change in latency!
Huge number of such symmetric solutions
Add constraints to eliminate all but one
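A small Python sketch shows why the number of symmetric solutions grows so quickly. The time slots below are hypothetical, and the sketch deliberately ignores precedence constraints; it only shows that relabelling interchangeable instances leaves the occupied slots, and therefore the latency, unchanged.

```python
from itertools import permutations

# Hypothetical (processor, start, end) slots used by the four instances of actor C.
C_SLOTS = [("P1", 2, 3), ("P1", 3, 4), ("P1", 4, 5), ("P2", 2, 3)]
C_TASKS = ["C00", "C01", "C10", "C11"]

def latency(schedule):
    """Latency is the latest end time over all scheduled tasks."""
    return max(end for (_proc, _start, end) in schedule.values())

def symmetric_variants(tasks, slots):
    """Yield every way of assigning the same slots to the interchangeable tasks."""
    for perm in permutations(tasks):
        yield dict(zip(perm, slots))

variants = list(symmetric_variants(C_TASKS, C_SLOTS))
latencies = {latency(v) for v in variants}
```

Four interchangeable instances already give 4! = 24 relabelled schedules with a single latency value, which is the redundancy the added constraints are meant to remove.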
Task Symmetry
[Figure: task graph with tasks A0, B0, B1, C00, C01, C10, C11, D0, D1, E0]
[Figure: a schedule over time on processors P1 and P2; rows: B1 C10 C01 C00 D0 E0 / A0 B0 C11 D1]
[Figure: a lexicographic schedule; rows: B1 C01 C10 C11 D1 E / A0 B0 C00 D0]
Lexicographic order: C00 ≺ C01 ≺ C10 ≺ C11
Enforce lexicographic order in schedule: s(u) ≤ s(u′) for u ≺ u′
s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)
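The ordering constraint above can be sketched in Python as follows. The helper names and the start times are hypothetical, and the canonicalization step assumes the instances are fully interchangeable; it illustrates the idea and is not the proof behind the slide.

```python
def lex_constraints(tasks):
    """Pairs (u, v) for which s(u) <= s(v) is required, following name order."""
    ordered = sorted(tasks)
    return list(zip(ordered, ordered[1:]))

def is_lexicographic(slots):
    """slots maps task name -> (processor, start)."""
    return all(slots[u][1] <= slots[v][1] for u, v in lex_constraints(slots))

def canonicalize(slots):
    """Reassign the same slots so that start times follow the lexicographic order."""
    ordered_tasks = sorted(slots)
    ordered_slots = sorted((slots[t] for t in ordered_tasks), key=lambda s: s[1])
    return dict(zip(ordered_tasks, ordered_slots))

# Hypothetical start times mimicking the order C10, C01, C00 on one processor.
original = {"C10": ("P1", 1), "C01": ("P1", 2), "C00": ("P1", 3), "C11": ("P2", 2)}
canonical = canonicalize(original)
```

Here `original` violates s(C00) ≤ s(C01), while `canonical` uses exactly the same slots and satisfies the whole chain, so a solver that is given the chain as extra constraints only needs to consider the canonical variant.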
Task Symmetry : Theorem
Lexicographic Schedule
[Figure: a schedule over time on processors P1 and P2; rows: B1 C10 C01 C00 D0 E0 / A0 B0 C11 D1]
[Figure: a permuted schedule; rows: B1 C10 C00 C01 D0 E0 / A0 B0 C11 D1 (C00 and C01 exchanged)]
Theorem : Every group of symmetric (permuted) schedules has a lexicographic schedule
Corollary : No feasible cost is lost
Processor Symmetry
[Figure: task graph with tasks A0, B0, B1, C00, C01, C10, C11, D0, D1, E0]
[Figure: a schedule of these tasks over time on processors P1 and P2]
[Figure: the same schedule with P1 and P2 swapped]
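One common way to remove this kind of symmetry is to name processors in order of first use. The slide does not state which constraint the thesis uses, so the Python sketch below is an assumption made for illustration, with hypothetical task-to-processor mappings.

```python
def canonical_processors(mapping, task_order):
    """Rename processors as P1, P2, ... in order of first use along task_order."""
    rename = {}
    for task in task_order:
        proc = mapping[task]
        if proc not in rename:
            rename[proc] = f"P{len(rename) + 1}"
    return {task: rename[mapping[task]] for task in task_order}

TASK_ORDER = ["A0", "B0", "B1", "C00", "C01"]

# A hypothetical mapping and the same mapping with P1 and P2 swapped.
mapping = {"A0": "P2", "B0": "P2", "B1": "P1", "C00": "P1", "C01": "P2"}
swapped = {t: ("P1" if p == "P2" else "P2") for t, p in mapping.items()}

canon_a = canonical_processors(mapping, TASK_ORDER)
canon_b = canonical_processors(swapped, TASK_ORDER)
```

Both mappings collapse to the same canonical form, so only one representative of each pair of swapped schedules needs to be explored.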
Pareto Exploration
Exploration : Processors vs Latency, α = 30
[Figure: two plots, "without symmetry breaking" and "with symmetry breaking"; x-axis Latency (0 to 30), y-axis Processors (0 to 30); markers: Sat Points, Unsat Points, Pareto Curve; solver-time color scale from 0 ("Takes no time") to 56 ("Times out")]
Solver Performance
Timeouts reduce!
The gap between SAT and UNSAT points is smaller.
Video Decoder
3D cost space (CL, CP, CB) exploration, CB - total buffer size
MPEG video decoder: 122 Tasks
[Figure: 3D plot; axes: Latency (·10^3), Buffer Size, Processor; legend: With symmetry, Without symmetry; labelled points [5,367,91], [24,276,1], [14,276,122], [14,333,62], [10,323,122], [17,182,122], [7,205,122], [24,182,1], [19,182,31], [10,229,31]]
Better Pareto points in the same time budget!
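A dominance filter over the ten labelled points gives a feel for what "better Pareto points" means in three dimensions. The extracted text does not say which label belongs to which run, so the Python sketch below simply pools all ten points, read as (latency ·10^3, buffer size, processors) with every cost minimized; both the pooling and the coordinate reading are assumptions.

```python
# The ten point labels visible in the figure, pooled from both runs.
POINTS = [
    (5, 367, 91), (24, 276, 1), (14, 276, 122), (14, 333, 62), (10, 323, 122),
    (17, 182, 122), (7, 205, 122), (24, 182, 1), (19, 182, 31), (10, 229, 31),
]

def dominates(a, b):
    """a dominates b if a is no worse in every cost and differs from b."""
    return a != b and all(x <= y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

front = pareto_front(POINTS)
dominated = [p for p in POINTS if p not in front]
```

Six of the pooled points survive the filter; each of the other four is matched or beaten in all three costs by one of the survivors.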
Distributed memory scheduling
So far we ignored the communication costs
For distributed memory, communication needs to be modeled
Kalray MPPA-256
[Figure: MPPA-256 block diagram; I/O subsystems (quad core, 512 KB; labels USMC, PCIe, Interlaken, DDR, GPIOs, Eth) surround the compute clusters; a cluster shows processors P0 to P15, a system core, shared memory, DMA, DSU, a D-NoC router and a C-NoC router]
16 compute clusters
16 processors
2 MB Shared Memory
DMA
Toroidal 2D network
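On a toroidal network, the hop distance between two clusters wraps around the edges of the grid. The Python sketch below assumes the 16 clusters form a 4 x 4 torus numbered row by row; that layout, the numbering and the `traffic` figures are illustrative assumptions, not details taken from the slide.

```python
def torus_distance(a, b, rows=4, cols=4):
    """Hop count between clusters a and b on a rows x cols torus (row-major ids)."""
    row_a, col_a = divmod(a, cols)
    row_b, col_b = divmod(b, cols)
    d_row = abs(row_a - row_b)
    d_col = abs(col_a - col_b)
    # On a torus one may also go "the other way round" in each dimension.
    return min(d_row, rows - d_row) + min(d_col, cols - d_col)

def placement_cost(traffic, cluster_of):
    """Sum of traffic weighted by hop distance between communicating groups."""
    return sum(
        volume * torus_distance(cluster_of[g], cluster_of[h])
        for (g, h), volume in traffic.items()
    )

# Hypothetical traffic between three groups and one candidate placement.
traffic = {("g0", "g1"): 100, ("g1", "g2"): 40}
cost = placement_cost(traffic, {"g0": 0, "g1": 3, "g2": 15})
```

Clusters 0 and 3 are one hop apart through the wrap-around link, which is why a placement step can treat opposite edges of the grid as neighbours.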
The problem?
Which cluster to allocate?
Which processor to allocate?
Connected tasks in same or different cluster?
If communication tasks are to be added, which DMA?
And the constraints
Precedence
Mutual Exclusion
Costs
For 10 tasks, 256 processors,
1.20892582 × 10^24 potential solutions!
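The quoted figure matches the number of ways to assign each of 10 tasks to one of 256 processors, that is 256^10 = 2^80. The short Python check below assumes that reading; it counts assignments only and ignores ordering and start times.

```python
tasks = 10
processors = 256

# Each task independently picks one of the processors.
assignments = processors ** tasks

# Scientific notation with eight decimals, matching the precision on the slide.
as_text = f"{assignments:.8e}"
```

Even before any timing decisions are made, the mapping choice alone gives about 1.2 × 10^24 candidates, which motivates splitting the problem.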
Split the problem into sub-problems.
Design Flow
Application Graph → Partitioning → Placement → Multi-cluster Scheduling
Partitioning: group the actors
[Figure: application graph with actors A, B, C, D, E, F divided into groups]
Goals: load balance the groups; minimize data exchange
Placement: place the groups
[Figure: the groups of actors A, B, C, D, E, F placed on clusters]
Goals: minimize distance between communicating groups
Multi-cluster Scheduling: schedule tasks and transfers
[Figure: task graph with tasks A0, B0, B1, C0, C1, D0, D1, E0, E1, F0; Gantt chart over time with rows P1, P2, DMA0, P1, P2 across Cluster 0 and Cluster 1, showing the tasks and a transfer]
Goals: minimize latency; minimize buffer size
Output of Design Flow
Tasks and Transfers: Cluster Mapping, Processor and DMA Mapping, Start time
Edges: Communication buffer size
Application: Latency
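The output listed above can be pictured as a small record per task or transfer, from which latency and the two constraint families (precedence, mutual exclusion) are easy to check. The Python sketch below is a hypothetical data layout, not the format produced by the thesis tools; edge buffer sizes are left out for brevity, and all names and numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scheduled:
    name: str
    resource: str   # cluster plus processor or DMA, e.g. "cluster0/P1"
    start: int
    duration: int

    @property
    def end(self):
        return self.start + self.duration

def latency(items):
    """Application latency: the latest completion time."""
    return max(item.end for item in items)

def violations(items, precedences):
    """List precedence and mutual exclusion violations of a schedule."""
    by_name = {item.name: item for item in items}
    problems = []
    for before, after in precedences:
        if by_name[before].end > by_name[after].start:
            problems.append(("precedence", before, after))
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = a.start < b.end and b.start < a.end
            if a.resource == b.resource and overlap:
                problems.append(("mutual exclusion", a.name, b.name))
    return problems

PRECEDENCES = [("A0", "transfer"), ("transfer", "B0")]
good = [
    Scheduled("A0", "cluster0/P1", 0, 2),
    Scheduled("transfer", "cluster0/DMA0", 2, 1),
    Scheduled("B0", "cluster1/P1", 3, 2),
]
# Same schedule, but B0 starts before the transfer has finished.
bad = good[:2] + [Scheduled("B0", "cluster1/P1", 2, 2)]
```

A checker of this kind is useful for validating whatever schedule a solver returns, independently of how the constraints were encoded.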
DMA Model
Tasks communicating via DMA: A → I → G → B
[Figure: Gantt chart over time with rows P1 (Cluster 0), DMA0 and P1 (Cluster 1); blocks A, I, I, G, B, where I occupies both the processor and the DMA]
Task   Description        Resources used      Task duration
I      Initialization     Processor and DMA   Constant
G      Network Transfer   Only DMA            Transfer size dependent
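The table can be turned into a small timing model. In the Python sketch below the constant initialization cost and the linear cycles-per-byte transfer cost are hypothetical parameters; the slide only states that I is constant and that G depends on the transfer size.

```python
from dataclasses import dataclass
from math import ceil

@dataclass(frozen=True)
class DmaCost:
    init_cycles: int        # hypothetical constant duration of task I
    cycles_per_byte: float  # hypothetical slope for task G (assumed linear)

    def transfer_cycles(self, size_bytes):
        return ceil(size_bytes * self.cycles_per_byte)

def earliest_consumer_start(producer_end, dma_free_at, size_bytes, cost):
    """Earliest start of B for the chain A -> I -> G -> B.

    I needs the processor (free once A has ended) and the DMA at the same time;
    G then runs on the DMA alone, and B can start once G has finished.
    """
    init_start = max(producer_end, dma_free_at)
    transfer_start = init_start + cost.init_cycles
    return transfer_start + cost.transfer_cycles(size_bytes)

cost = DmaCost(init_cycles=100, cycles_per_byte=0.5)
b_start = earliest_consumer_start(
    producer_end=1000, dma_free_at=1200, size_bytes=400, cost=cost
)
```

In this example the DMA is still busy when A ends, so I waits until cycle 1200 and B cannot start before cycle 1500.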
Model Transformation
An example application graph:
A → B, edge labelled [α, ω]
Partition-Aware graph:
A → Iwr → Gwr → B, with edges e_wt : [1, w↑], e_wn : [1], e_rt : [α, ω]
Buffer-Aware graph:
[Figure: actors A, Iwr, Gwr, Fst, B, Ird, Grd; edges e_wt : [1, w↑], e_wn : [1], e_rt : [α, ω], e_ws : [1], e_wb : [1, 0, b(e_wt)], e_rs : [1], e_rn : [1], e_rb : [α −1, 0, b(e_rt)]]
Edge legend: DMA : Data; DMA : flow-control; DMA-Completion
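The first step, from the application graph to the partition-aware graph, can be sketched as an edge rewrite: an edge whose endpoints sit in different clusters gets a DMA initialization actor and a DMA transfer actor inserted. The Python sketch below is a simplified assumption of that step; it ignores the edge rates and the buffer-aware stage, and the generated actor names are hypothetical.

```python
def partition_aware(edges, cluster_of):
    """Insert Iwr and Gwr actors on every edge that crosses a cluster boundary."""
    result = []
    for src, dst in edges:
        if cluster_of[src] == cluster_of[dst]:
            result.append((src, dst))  # same cluster: the edge is kept as is
        else:
            init = f"Iwr_{src}_{dst}"      # DMA initialization actor
            transfer = f"Gwr_{src}_{dst}"  # DMA network transfer actor
            result += [(src, init), (init, transfer), (transfer, dst)]
    return result

EDGES = [("A", "B"), ("B", "C")]
# Hypothetical partitioning: A alone in cluster 0, B and C together in cluster 1.
CLUSTER_OF = {"A": 0, "B": 1, "C": 1}

transformed = partition_aware(EDGES, CLUSTER_OF)
```

Only the A to B edge crosses clusters here, so it becomes the chain A, Iwr, Gwr, B while the B to C edge is left untouched.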
JPEG Decoder Example
[Figure: pipeline VLD → IQ/IDCT → COLOR; edge rate labels 12, 1, 12]
VLD : Variable Length Decoder
IQ / IDCT : Inverse Quantization / Inverse Discrete Cosine Transform
COLOR : Color Conversion
JPEG Decoder Example
Design flow: JPEG Decoder, Partitioning, Placement, Multi-cluster Scheduling
Partitioning Solutions:
Cz : No. of Groups
Cη : Total communication cost
Cτ : Max. workload per group

            Allocated group       Exploration cost
Solution    vld   iq   color      Cz    Cη       Cτ
Ps0         0     1    2          3     12384    424012
Ps1         0     0    1          2     2736     758116
Ps2         0     0    0          1     0        934288
Ps3         0     1    1          2     9648     510276

Scheduling Solutions:
[Plot: Buffer Size (bytes, ×10^4) versus Latency (cycles, ×10^6) for Ps0, Ps1, Ps2, Ps3]
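
The three exploration costs of the partitioning solutions can be recomputed from per-actor workloads and per-channel communication costs. The sketch below is only illustrative. The slide does not state the workload and channel values; they are back-derived from the partitioning costs, assuming that Cτ sums the workloads inside a group and that Cη sums the costs of the channels that cross group boundaries. The names WORKLOAD, CHANNEL_COST and partition_costs are hypothetical.

```python
# ASSUMED values, back-derived from the partitioning costs (not given in the source).
WORKLOAD = {"vld": 424012, "iq": 334104, "color": 176172}
CHANNEL_COST = {("vld", "iq"): 9648, ("iq", "color"): 2736}


def partition_costs(group_of):
    """Return (Cz, C_eta, C_tau) for a mapping actor -> group index."""
    # Cz: number of distinct groups in use.
    cz = len(set(group_of.values()))
    # C_eta: total cost of channels whose endpoints sit in different groups.
    c_eta = sum(cost for (src, dst), cost in CHANNEL_COST.items()
                if group_of[src] != group_of[dst])
    # C_tau: largest summed workload over all groups.
    load = {}
    for actor, group in group_of.items():
        load[group] = load.get(group, 0) + WORKLOAD[actor]
    c_tau = max(load.values())
    return cz, c_eta, c_tau


SOLUTIONS = {
    "Ps0": {"vld": 0, "iq": 1, "color": 2},
    "Ps1": {"vld": 0, "iq": 0, "color": 1},
    "Ps2": {"vld": 0, "iq": 0, "color": 0},
    "Ps3": {"vld": 0, "iq": 1, "color": 1},
}

for name, mapping in SOLUTIONS.items():
    print(name, partition_costs(mapping))
```

Under these assumptions the four mappings reproduce the slide's cost rows, which suggests that fewer groups trade communication cost against a higher maximum workload per group.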
JPEG Decoder Example
JPEG decoder latency measured on Kalray platform
[Four plots, one per partitioning solution (Ps0, Ps1, Ps2, Ps3): Buffer Size (bytes, ×10^4) versus Latency (cycles, ×10^6); series: model, measured-min., measured-max.]
Maximum prediction error of 9%
StreamIt Benchmarks
[Bar chart: #Solutions and %error per benchmark, vertical axis 0 to 100]
Benchmarks: JPEG Dec., Beam Former, Insertion Sort, Merge Sort, Radix Sort, Dct1, Dct2, Dct3, Dct4, Dct5, Dct6, Dct7, Dct8, DctCoarse, DctFine, Comp. count, Matrix Mult., Fft
Bar labels as extracted: 25, 155, 7, 37, 6, 4, 8, 4, 8, 8, 24, 7, 10, 3, 6, 4, 8, 8
Overview
1 Motivation
2 Application Model
3 Deployment using SMT
4 Symmetry elimination
5 Distributed memory scheduling
6 SMT Solving
7 Conclusions
Lessons learnt from SMT solver
[Diagram: cost axis marked with lower bound, optimal and upper bound; query results labeled SAT and UNSAT]
Lessons learnt from SMT solver
[Diagram: cost axis marked with lower bound, optimal and upper bound; a region labeled TIMEOUT and a point labeled "found optimal"]
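
The SAT, UNSAT and TIMEOUT labels are consistent with a cost-bounded search, in which each query asks whether a schedule with cost at most some bound exists. The following is a minimal sketch of such a bisection, not the procedure from the source. The solver is replaced by a stub, and make_stub_solver, bisect_cost and hard_margin are hypothetical names. The stub assumes that queries close to the optimum time out, and that the initial upper bound is known to be feasible.

```python
def make_stub_solver(true_optimum, hard_margin):
    """Stand-in for an SMT query 'is there a schedule with cost <= bound?'.

    ASSUMPTION: queries within `hard_margin` of the optimum time out.
    """
    def query(bound):
        if abs(bound - true_optimum) < hard_margin:
            return "TIMEOUT"
        return "SAT" if bound >= true_optimum else "UNSAT"
    return query


def bisect_cost(query, lower, upper):
    """Narrow [lower, upper] around the optimum; return (lo, best).

    Invariant: no solution costs less than `lo`, and `best` is a feasible bound
    (the initial `upper` is assumed feasible).
    """
    lo, best = lower, upper
    while lo < best:
        mid = (lo + best) // 2
        answer = query(mid)
        if answer == "SAT":
            best = mid      # a schedule with cost <= mid exists
        elif answer == "UNSAT":
            lo = mid + 1    # every schedule costs more than mid
        else:
            break           # TIMEOUT: stop and report the remaining gap
    return lo, best


# With timeouts near the optimum (70), the search stops with a gap.
print(bisect_cost(make_stub_solver(70, 3), 0, 100))
# Without timeouts, the bounds meet at the optimum.
print(bisect_cost(make_stub_solver(70, 0), 0, 100))
```

When a query times out, the value reported as best is only an upper bound on the true optimum, which matches the distinction the diagram draws between "optimal" and "found optimal".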
Lessons learnt from SMT solver
[Two Gantt charts over Time on processors P1, P2, P3 with tasks A0, B0, B1, B2, B3, C0; the second chart is labeled "Optimized Schedule"]
Such constraints make the problem harder for the SMT solver
Two-step optimization
Get a loose schedule from the solver
Optimize it for:
  Latency
  Processors used
[Plot: Processors versus Latency, with an upper bound on each axis; labels: SAT, Loose Solution, optimized]
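
As an illustration of the second step, a loose schedule can be tightened by shifting every task as early as its predecessors and its processor allow. This is a sketch under assumptions: the source does not say which post-processing it uses, the pass below only addresses latency (not the number of processors), and the task durations and the names compact and latency are hypothetical.

```python
def compact(tasks, deps):
    """Left-shift a feasible schedule without changing mapping or task order.

    tasks: name -> (processor, start, duration), a loose but valid schedule.
    deps:  list of (producer, consumer) precedence pairs.
    ASSUMPTION: this as-soon-as-possible pass is one possible second step,
    not necessarily the optimization used in the source.
    """
    preds = {name: [] for name in tasks}
    for producer, consumer in deps:
        preds[consumer].append(producer)

    new_start = {}
    proc_free = {}  # processor -> time at which it becomes idle
    # Visiting tasks by original start time keeps the per-processor order and
    # handles predecessors first (durations are assumed positive).
    for name in sorted(tasks, key=lambda n: tasks[n][1]):
        proc, _, duration = tasks[name]
        ready = max((new_start[p] + tasks[p][2] for p in preds[name]), default=0)
        start = max(ready, proc_free.get(proc, 0))
        new_start[name] = start
        proc_free[proc] = start + duration
    return {n: (tasks[n][0], new_start[n], tasks[n][2]) for n in tasks}


def latency(tasks):
    """Completion time of the last task."""
    return max(start + duration for _, start, duration in tasks.values())


# Hypothetical loose schedule: A0 feeds B0..B3, which all feed C0.
loose = {
    "A0": ("P1", 0, 2),
    "B0": ("P1", 4, 3),
    "B1": ("P2", 3, 3),
    "B2": ("P3", 5, 3),
    "B3": ("P2", 8, 3),
    "C0": ("P1", 13, 2),
}
deps = [("A0", b) for b in ("B0", "B1", "B2", "B3")]
deps += [(b, "C0") for b in ("B0", "B1", "B2", "B3")]

tight = compact(loose, deps)
print(latency(loose), latency(tight))
```

The design idea is that the solver only has to satisfy the upper bounds, while the cheap tightening is done outside the solver.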
Overview
1 Motivation
2 Application Model
3 Deployment using SMT
4 Symmetry elimination
5 Distributed memory scheduling
6 SMT Solving
7 Conclusions
Conclusions and Future Work
Conclusions:
Symmetry elimination finds better solutions
Combined Optimization with Communication modeling
Automated design flow for distributed memory
References
P. Tendulkar, P. Poplavko, and O. Maler. “Symmetry Breaking for Multi-criteria Mapping and Scheduling on Multicores”. In: FORMATS. 2013.
P. Tendulkar, P. Poplavko, I. Galanommatis, and O. Maler. “Many-Core Scheduling of Data Parallel Applications using SMT Solvers”. In: DSD. 2014.
P. Tendulkar, P. Poplavko, and O. Maler. Strictly Periodic Scheduling of Acyclic Synchronous Dataflow Graphs using SMT Solvers. Tech. rep. Verimag Research Report, 2014.
Thank You
Questions?