Parallel Discrete Event Simulation (PDES) at ORNL
Presented by
Kalyan S. Perumalla, Ph.D.
Modeling & Simulation Group
Computational Sciences & Engineering
2 Perumalla_PDES_SC07
PDES: Selected application areas
• Network simulation: Internet protocols, security, P2P designs, …
• Traffic simulation: Emergency planning/response, environmental policy analysis, urban planning, …
• Social dynamics simulation: Operations planning, foreign policy, marketing, …
• Sensor simulations: Wide area monitoring, situational awareness, border surveillance, …
• Organization simulations: Command and control, business processes, …
Emergencies
Current and future defense systems
Protection and awareness systems
Global and local events
High-performance PDES kernel requirements
• Global time synchronization
  - Total time-stamped ordering of events
  - Paramount for accuracy
• Fast synchronization
  - Scalable, application-independent, time-advance mechanisms
  - Critical for real-time and as-fast-as-possible execution
• Support for fine-grained events
  - Minimal overhead relative to event processing times
  - Application computation is typically only 5 µs to 50 µs per event
• Conservative, optimistic, and mixed modes
  - Need support for the principal synchronization approaches
  - Useful to choose mode on per-entity basis at initialization
  - Desirable to vary mode dynamically during simulation
• General-purpose API
  - Reusable across multiple applications
  - Accommodates multiple techniques: lookahead, state saving, reverse computation, multicast, etc.
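To make the "total time-stamped ordering of events" requirement concrete, here is a minimal sequential sketch in Python. It is not µsik code: the `EventQueue` class, the `ping` handler, and the insertion-order tie-break are assumptions chosen for illustration.

```python
import heapq
import itertools


class EventQueue:
    """Future event list that releases events in total time-stamp order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: insertion order

    def schedule(self, timestamp, handler, payload=None):
        # The unique sequence number makes the ordering total, so two
        # events with equal timestamps never compare their handlers.
        heapq.heappush(self._heap, (timestamp, next(self._seq), handler, payload))

    def run(self, end_time):
        """Process events with timestamp <= end_time; return the order used."""
        trace = []
        while self._heap and self._heap[0][0] <= end_time:
            timestamp, _, handler, payload = heapq.heappop(self._heap)
            trace.append((timestamp, payload))
            handler(self, timestamp, payload)
        return trace


def ping(queue, now, payload):
    # Hypothetical handler: reschedules itself 1.5 time units later.
    queue.schedule(now + 1.5, ping, payload)


def demo():
    q = EventQueue()
    q.schedule(2.0, ping, "b")  # scheduled first, but has the later timestamp
    q.schedule(1.0, ping, "a")
    return q.run(end_time=4.0)
```

A parallel kernel has to produce the same order as this loop without a single shared queue, which is where the synchronization modes listed above come in.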
µsik—unique PDES “micro-kernel”
Some recent results of fine-grained PDES benchmark (phold)
Among the largest/fastest scalability results in parallel discrete event simulation
Plot legend: LP = 1 thousand, MSG = 1 million; LP = 1 million, MSG = 100 million; LP = 1 million, MSG = 1 billion
Unique mixed-mode kernel
• The only scalable mixed-mode kernel in the world
• Supports conservative, optimistic, and mixed modes in a single kernel
Used in a variety of applications
• DES-based vehicular traffic models
• DES-based plasma physics models
• DES-based neurological models
• Largest Internet simulations
[Figure: Aggregate events/s (millions) vs. number of processors (128 to 1024)]
[Figure: µs per event vs. number of processors (0 to 1024)]
µsik scaled to more than 10⁴ processors
Some recent results of fine-grained PDES benchmark
• On Blue Gene Watson (BGW) at IBM TJ Watson Research Center
• Well-known PHOLD benchmark, with 1 million logical processes, 10 million pucks
The largest and fastest scalability results in PDES recorded to date
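For readers unfamiliar with PHOLD, the following sequential Python sketch shows the general shape of the benchmark: a fixed population of events circulates among logical processes, and each processed event sends exactly one new event to a randomly chosen destination. This is not the benchmark code used for the results reported here; the function name, parameters, and exponential delay distribution are assumptions.

```python
import heapq
import random


def phold(num_lps, population, lookahead, mean_delay, end_time, seed=0):
    """Sequential PHOLD-style sketch.

    Returns (events processed, per-LP event counts, events still pending).
    """
    rng = random.Random(seed)
    fel = []  # future event list of (timestamp, sequence number, destination LP)
    seq = 0

    # Seed the fixed event population at random LPs and random start times.
    for _ in range(population):
        ts = rng.expovariate(1.0 / mean_delay)
        heapq.heappush(fel, (ts, seq, rng.randrange(num_lps)))
        seq += 1

    counts = [0] * num_lps
    processed = 0
    while fel and fel[0][0] <= end_time:
        now, _, lp = heapq.heappop(fel)
        counts[lp] += 1
        processed += 1
        # Each processed event emits one new event, so the population stays
        # constant. The new timestamp is at least `lookahead` in the future.
        dest = rng.randrange(num_lps)
        ts = now + lookahead + rng.expovariate(1.0 / mean_delay)
        heapq.heappush(fel, (ts, seq, dest))
        seq += 1

    return processed, counts, len(fel)
```

Because the handler does almost no work, the measured cost per event is dominated by kernel and synchronization overhead, which is why PHOLD is used for fine-grained scalability tests.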
[Figure: Events/s (millions) vs. number of processors (2,048 to 16,384); series: Conservative, Mixed, Optimistic]
[Figure: Speedup vs. number of processors (thousands); series: Conservative, Mixed, Optimistic]
[Figure: µs per event vs. number of processors (thousands); series: Conservative, Mixed, Optimistic]
[Figure: Parallel efficiency vs. number of processors (thousands); series: Conservative, Mixed, Optimistic]
µsik micro-kernel internals
[Diagram labels: User LPs; Kernel LPs (KP); Micro-kernel; future event list (FEL), processed event list (PEL), local virtual time (LVT); kernel queues: Processable (Pp, EPTS Q), Committable (Pc, ECTS Q), Emittable (Pe, EETS Q)]

When are the kernel queues updated?
• New LP added or deleted
• LP executes an event
• LP receives an event

Abbreviations
LP = Logical process
KP = Kernel process
ECTS = Earliest committable time stamp
EPTS = Earliest processable time stamp
EETS = Earliest emittable time stamp
PEL = Processed event list
FEL = Future event list
LVT = Local virtual time
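The bookkeeping named on this slide can be sketched in Python as follows. This is a hedged illustration, not the µsik implementation. It assumes that EPTS is the smallest time stamp in an LP's FEL, that ECTS is the smallest time stamp in its PEL, and that EETS is EPTS plus a per-LP lookahead. A linear scan stands in for the kernel's priority queues, and all class and method names are hypothetical.

```python
import heapq
import math


class LP:
    """Logical process with a future event list (FEL) and processed event list (PEL)."""

    def __init__(self, name, lookahead=0.0):
        self.name = name
        self.lookahead = lookahead
        self.fel = []   # min-heap of pending event time stamps
        self.pel = []   # time stamps processed but not yet committed
        self.lvt = 0.0  # local virtual time

    def epts(self):
        # Assumption: earliest processable time stamp = head of the FEL.
        return self.fel[0] if self.fel else math.inf

    def ects(self):
        # Assumption: earliest committable time stamp = oldest entry in the PEL.
        return min(self.pel, default=math.inf)

    def eets(self):
        # Assumption: nothing is emitted earlier than the next event plus lookahead.
        return self.epts() + self.lookahead


class Kernel:
    def __init__(self):
        self.lps = {}

    def add_lp(self, lp):
        self.lps[lp.name] = lp

    def deliver(self, name, timestamp):
        """An LP receives an event: its EPTS (and EETS) may change."""
        heapq.heappush(self.lps[name].fel, timestamp)

    def next_processable(self):
        """Return the LP with the smallest EPTS, or None if all FELs are empty."""
        lp = min(self.lps.values(), key=LP.epts, default=None)
        return lp if lp is not None and lp.epts() < math.inf else None

    def execute_one(self):
        """An LP executes an event: the event moves from its FEL to its PEL."""
        lp = self.next_processable()
        if lp is None:
            return None
        ts = heapq.heappop(lp.fel)
        lp.lvt = ts
        lp.pel.append(ts)
        return lp.name, ts

    def commit(self, bound):
        """Drop processed events with time stamp below `bound` from every PEL."""
        for lp in self.lps.values():
            lp.pel = [t for t in lp.pel if t >= bound]
```

The three update triggers listed above map onto `add_lp`, `execute_one`, and `deliver`: each can change an LP's EPTS, ECTS, or EETS, and therefore its position in the corresponding kernel queue.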
libSynk: µsik’s synchronization core
[Diagram: layered architecture. Labels: µsik Process (×3); µsik; libSynk; OS/Hardware; Network. libSynk modules: TM (TM Red, TM Null); RM (RM Bar); FM (FM Myr, FM TCP, FM ShM, FM MPI). Arrow legend: X → Y implies X uses Y.]
µsik micro-kernel capabilities
µsik is currently able to support the following:
• Lookahead-based conservative and/or optimistic execution
• Reverse computation-based optimistic execution
• Checkpointing-based optimistic execution
• Resilient optimistic execution (zero rollbacks)
• Constrained, out-of-order execution
• Preemptive event processing
• Any combinations of the above
• Automated, network-throttled flow control
• User-level event retraction
• Process-specific limits to optimism
• Dynamic process addition/deletion
• Shared and/or distributed memory execution
• Process-oriented views
It accommodates addition of the following:
• Synchronized multicast
• Optimistic dynamic memory allocation
• Automated load-balancing
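As an illustration of reverse computation-based optimistic execution, the Python sketch below undoes events by running an inverse handler instead of restoring a saved checkpoint. It is a generic example of the technique, not µsik's API: `CounterLP`, `rollback`, and the event fields are hypothetical, and delays are integers so that the inverse operations are exact.

```python
class CounterLP:
    """LP whose state changes are constructive and therefore reversible."""

    def __init__(self):
        self.arrivals = 0
        self.total_delay = 0  # integer ticks, so subtraction undoes addition exactly

    def forward(self, event):
        # Forward handler: no state is saved before modification.
        self.arrivals += 1
        self.total_delay += event["delay"]

    def reverse(self, event):
        # Reverse handler: applies the inverse operations in the opposite order.
        self.total_delay -= event["delay"]
        self.arrivals -= 1


def rollback(lp, processed, straggler_ts):
    """Undo processed events with time stamp > straggler_ts, newest first.

    `processed` is the LP's processed event list in execution order.
    Returns the undone events so the caller can re-execute them later.
    """
    undone = []
    while processed and processed[-1]["ts"] > straggler_ts:
        event = processed.pop()
        lp.reverse(event)
        undone.append(event)
    return undone


def demo():
    lp = CounterLP()
    processed = []
    for ts, delay in [(1, 5), (2, 7), (3, 11)]:
        event = {"ts": ts, "delay": delay}
        lp.forward(event)
        processed.append(event)
    # A straggler with time stamp 1 arrives: events at 2 and 3 must be undone.
    undone = rollback(lp, processed, straggler_ts=1)
    return lp.arrivals, lp.total_delay, [e["ts"] for e in undone]
```

Checkpointing-based execution would instead copy the state before each `forward` call and restore the copy on rollback, which costs memory and copy time on every event rather than only when a rollback happens.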
SensorNet: Parallel simulation/immersive test-bed
Seamless integrated testbed to incorporate a variety of important simulations, stimulations, and live devices
Achieves unified capabilities and significant fidelity for test and evaluation of CB sensor device-based designs, concepts, and operations
[Diagram: CB Sensor Network Testbed Framework, current and envisioned. Component labels: TOSSIM; SimDriver; SimComm; TinyViz; Script Interpreter; External Event Buffer; Tython; Reflected Simulation; Script; Python; Environmental Model Proxy; SCIPUFF; Weather Model; Plume Models; Comm. Models; Sensor Net Simulator; Weather Simulator; Operations Simulator; Live Devices]
SensorNet: Simulation-based analysis for plume tracking
Environmental phenomenon exhibits high variability.
Phenomenon drives the sensor network’s computation and communication.
Trace gathered at base station of sensed phenomenon reflects high variability.
Communication effects induce unpredictable gaps in series.
Accurate, integrated simulation of phenomenon and communication captures complex interdependencies.
[Figure: Sensor network layout. Nodes 0 to 25, with the base station and the source release point marked]
[Figure: Surface dosage (CHEM) at T = 17.0 h for three cases: Turbulent Winds, Light Winds, No Winds. Axes: longitude (-96.0 to -86.4) and latitude (40.6 to 50.5); color scale from 1.00E-02 to 1.00E-08 Kgs/cm³]
[Figure: Mean concentration (10⁻⁶ kgs/m³) vs. time (300 s to 600 s), two panels; series: No Winds, Winds, Turbulent]
SCATTER: Ultra-scale PDES-based mobility simulations
• Scalable tool for transportation and energy/event/emergency research
  - Regional scale: multiple states
  - 10⁶–10⁷ intersections
• Current tool capabilities
  - At most 10⁴ intersections
• Faster than real time is very useful

[Figure: Fidelity (lower to higher accuracy) vs. speed (slower to faster) for network flow methods, OREMS, CORSIM, MITSIM, TRANSIMS, and SCATTER, with a region marked "Desirable". Annotations: "Good for emissions estimates, traffic analysis, . . ." and "Good for rough estimates of evacuation delay, . . ."]

Our approach: SCATTER
• DES models vs time-stepped
• Parallel execution vs sequential
• Scalability to high-performance computing: 10²–10³ CPUs
• Important behaviors: kinetic + non-kinetic
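The "DES models vs time-stepped" distinction can be illustrated with a toy Python sketch. It is not SCATTER's vehicle model: it assumes a single vehicle at constant speed on a fixed route, and the function names and units are hypothetical. The event-driven version does one unit of work per intersection arrival, while the time-stepped version does one per time step regardless of whether anything of interest happens.

```python
import heapq


def event_driven(link_lengths_m, speed_mps):
    """Schedule one arrival event per intersection; return the arrival times."""
    fel = [(link_lengths_m[0] / speed_mps, 0)]  # (arrival time, link index)
    arrivals = []
    while fel:
        now, link = heapq.heappop(fel)
        arrivals.append(now)
        nxt = link + 1
        if nxt < len(link_lengths_m):
            # Jump directly to the next intersection arrival.
            heapq.heappush(fel, (now + link_lengths_m[nxt] / speed_mps, nxt))
    return arrivals


def time_stepped(link_lengths_m, speed_mps, dt_s):
    """Advance the vehicle position every dt_s seconds; return the step count."""
    total = sum(link_lengths_m)
    position = 0.0
    steps = 0
    while position < total:
        position += speed_mps * dt_s
        steps += 1
    return steps


def demo():
    route = [300, 500, 200]  # link lengths in metres (illustrative values)
    return event_driven(route, 10), time_stepped(route, 10, 1)
```

For this route, the event-driven run processes 3 events where the time-stepped run performs 100 position updates. That gap is the motivation for event-driven models, although real vehicle behavior (queues, signals, rerouting) adds events that this sketch leaves out.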
SCATTER: Benchmark performance
[Figure: Event processing speed. µs/event (log scale, 1 to 1000) vs. number of vehicles (10,000 to 1,000,000); series: Nodes = 9, Nodes = 1089]
[Figure: Speedup over real-time. Real time/simulated time (log scale, 1 to 10,000) vs. number of vehicles (10,000 to 1,000,000); series: Nodes = 9, Nodes = 1089]
[Figure: Parallel speedup. Speedup over sequential run (0 to 3.5) vs. number of processors (1 to 4)]
Significant speedup with parallel execution!
Significantly faster than real time with 1 million vehicles!
Very low event processing overhead (µs)!
Contact
Kalyan S. Perumalla
Modeling & Simulation Group
Computational Sciences & Engineering
(865) [email protected]