µsik: A Micro-Kernel for PDES Systems
Kalyan S. Perumalla, Research Faculty Member
College of Computing, Georgia Tech
www.cc.gatech.edu/fac/kalyan
1 June 2005
Overriding Policy of Talk
• Closeness to lunch time
• Tolerance to presenter overshooting allotted time
Motivating Project – Multi-scale Model Execution
[Figure: spectrum of modeling alternatives (MHD, Hybrid, PIC) for global kinetic models of Earth's magnetosphere, trading fidelity against scalability; scale: 10 x Re, where Re = radius of Earth]
NSF-ITR: Fujimoto, Pande & Perumalla (Georgia Tech); Karimabadi (SciberNet)
Parallel execution of Novel DES Models
Outline
• Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Conclusions & Future Work
Traditional PDES Systems
• Traditional kernels
– E.g., TWOS, SPEEDES, GTW, DaSSF, ROSS
• Federated systems
– E.g., DIS, HLA, XMSF
• Preliminary hybrid systems
– E.g., PARSEC
• Citations to related efforts & publications are listed in the references.
Why One Extensible, Encompassing Framework?
Without a unifying framework:
– Decide on synchronization a priori
– Develop/use a separate simulator for each synchronization method
– Port the model to multiple different simulators
With one extensible framework:
– Mix multiple synchronization methods in the same run
– Reuse the model almost unmodified in newly developed methods
PDES – Variants & Core
• Conservative variants: Lookback, CCT, Reductions, Null Messages, …
• Optimistic variants: Reverse Computation, State Saving, …
• ? (what is the common core underlying all variants?)
• Retractions
– User-level, Internal
– Conservative, Optimistic
• Modeling views
– Process-oriented, Event-oriented
• I/O (optimistic/conservative)
– File I/O, Dynamic memory
• Checkpointing
– Copy, Incremental, Periodic
• Reverse computation
– System-level, User-level
• Bundling
– Message bundling, Event bundling
• Flow control
– Network-based, Memory-based
• Aggregate event processing
– Fully aggregate, Hybrid
• Time-synchronized multicast
Overview
• Problem: Is there a way to support a wide variety of approaches in a single extensible framework?
• Emphases:
– Synchronization efficiency
– Runtime performance
– Fine granularity
• Solution: Micro-kernel approach
• Emphases:
– Efficient data structures
– Minimalist core interfaces
– Fine-grained control
– Parallel/distributed execution
Mixed Synchronization
• [Conservative + Optimistic] is not very new:
– Bagrodia et al.
– Nutaro et al.
– Rajaei et al.
– …
• Novelty & Contributions:
– Systems approach/framework
– Scalable performance on supercomputing platforms
– Extensibility case studies, applications
Outline
• Motivation & Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Conclusions & Future Work
What’s the Name Again?
• µsik = Micro Simulation Kernel
• Musik (& music) are all about timing!
• Note: Musik [sic], not Music!
Execution Architecture
[Figure: execution architecture: a network connects multiple machines; each machine hosts multiple processors, each processor runs a federate, and each federate contains multiple LPs]
LP=Logical Process with its own timeline
Event Execution
• Process Event
– Tentative processing
• Emit an Event
– New events scheduled during tentative processing
• Rollback Event
– Undo previous tentative processing
• Commit Event
– Finalize the actions performed in tentative processing
Event Processing Phases
3 phases: Process, Rollback, Commit
Conservative: Process + Commit
Optimistic: (Process + Rollback)* + Process + Commit
Example
Process = Execute tentative; Rollback = Undo tentative; Commit = Finalize tentative

Call/Context: execute(e)
– Process: sv = newstate(); copystate(sv); push(sv); execute(e)
– Rollback: sv = pop(); restorestate(sv); freestate(sv)
– Commit: sv = pop(); freestate(sv)
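The copy state-saving pattern in the table can be sketched in a few lines of C++ (a minimal illustration with hypothetical names, not the µsik implementation; the LP state is reduced to a single integer):

```cpp
#include <cassert>
#include <stack>

// Illustration of the Process/Rollback/Commit phases using
// copy state-saving, following the table above.
struct State { int value = 0; };

class CheckpointingLP {
public:
    // Process: save a copy of the state, then tentatively execute.
    void process(int delta) {
        saved_.push(state_);       // copystate(sv); push(sv)
        state_.value += delta;     // execute(e), tentatively
    }
    // Rollback: restore the most recent saved copy.
    void rollback() {
        state_ = saved_.top();     // sv = pop(); restorestate(sv)
        saved_.pop();              // freestate(sv)
    }
    // Commit: discard the saved copy; the tentative result is final.
    void commit() {
        saved_.pop();              // sv = pop(); freestate(sv)
    }
    int value() const { return state_.value; }
private:
    State state_;
    std::stack<State> saved_;
};
```

A rolled-back event leaves the state untouched; a committed one makes the tentative update permanent.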
Event Processing Phases – More Examples
Conservative: Process + Commit
Optimistic: (Process + Rollback)* + Process + Commit
More Examples:
Call/Context: malloc()
– Process: p = malloc(); push(p)
– Rollback: p = pop(); free(p)
– Commit: pop()
Call/Context: free(p)
– Process: push(p)
– Rollback: pop()
– Commit: p = pop(); free(p)
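In the same spirit, optimistic dynamic memory can be sketched as wrappers that defer irreversible actions to commit (an illustrative sketch following the table, not the µsik implementation; all names here are hypothetical):

```cpp
#include <cassert>
#include <cstdlib>
#include <stack>

// Rollback-safe allocation: a tentative malloc is undone on rollback;
// a tentative free is deferred until commit, so rollback can cancel it.
class OptimisticHeap {
public:
    void* opt_malloc(std::size_t n) {  // Process: p = malloc(); push(p)
        void* p = std::malloc(n);
        mallocs_.push(p);
        return p;
    }
    void rollback_malloc() {           // Rollback: p = pop(); free(p)
        std::free(mallocs_.top());
        mallocs_.pop();
    }
    void commit_malloc() {             // Commit: pop(); allocation is final
        mallocs_.pop();
    }
    void opt_free(void* p) {           // Process: push(p); do NOT free yet
        frees_.push(p);
    }
    void rollback_free() {             // Rollback: pop(); block stays live
        frees_.pop();
    }
    void commit_free() {               // Commit: p = pop(); free(p)
        std::free(frees_.top());
        frees_.pop();
    }
private:
    std::stack<void*> mallocs_, frees_;
};
```

The key asymmetry: allocation does real work in the Process phase, while deallocation does its real work only at Commit.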
MicroProcess – Core of a Generalized LP
[Timeline: LCTS ≤ ECTS ≤ EPTS ≤ EETS along simulation time, dividing events into Committed, Committable, Processable, and Emittable]
LCTS = Latest Committed Time Stamp
ECTS = Earliest Committable Time Stamp
EPTS = Earliest Processable Time Stamp
EETS = Earliest Emittable Time Stamp
Process interface expected by MicroKernel:

class MicroProcess {
public:
    virtual SimTime ects() = 0;
    virtual SimTime epts() = 0;
    virtual SimTime eets() = 0;
    virtual void enqueue( SimEventBase* event ) = 0;
    virtual void dequeue( SimEventBase* event ) = 0;
    virtual long advance( SimTime lbts ) = 0;
    virtual long advance_optimistically( SimTime lbts, … ) = 0;
};
Thesis: Almost all traditional PDES techniques can be efficiently built over this thin interface.
Traditional lookahead is supported here.
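As a sketch of this thesis, a purely conservative LP can be layered over the thin interface roughly as follows (simplified, self-contained stand-ins for SimTime and SimEventBase; not the actual µsik classes):

```cpp
#include <cassert>
#include <set>

// Simplified stand-ins for illustration only.
typedef double SimTime;
struct SimEventBase { SimTime ts; };

// A conservative LP never rolls back, so its committable,
// processable, and emittable times all coincide with the
// timestamp of its next unprocessed event.
class ConservativeLP {
public:
    SimTime ects() const { return next_ts(); }
    SimTime epts() const { return next_ts(); }
    SimTime eets() const { return next_ts(); }
    void enqueue(SimEventBase* e) { fel_.insert(e); }
    // Process-and-commit every event safe under the LBTS bound.
    long advance(SimTime lbts) {
        long n = 0;
        while (!fel_.empty() && (*fel_.begin())->ts <= lbts) {
            fel_.erase(fel_.begin());  // process + commit in one step
            ++n;
        }
        return n;
    }
private:
    // Order by timestamp, breaking ties by pointer identity.
    struct ByTs {
        bool operator()(const SimEventBase* a, const SimEventBase* b) const {
            return a->ts < b->ts || (a->ts == b->ts && a < b);
        }
    };
    SimTime next_ts() const {
        return fel_.empty() ? 1e30 : (*fel_.begin())->ts;  // "infinity"
    }
    std::set<SimEventBase*, ByTs> fel_;  // future event list
};
```

Because the three time queries collapse to one value, the micro-kernel's scheduling loop needs no special case for conservative processes.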
Micro-Kernel Services
[Figure: layered architecture: Application Models sit on Classical Services, Extensions, and Convenience Modules, all built over the Micro-Kernel Core]
• Naming
– Referring to & looking up processes
• Routing
– Forwarding (& pulling back) events among processes
• Scheduling
– Ensuring progress, avoiding deadlock & livelock, efficient network & CPU utilization
class MicroKernel {
public:
    int num_feds();
    int fed_id();
    SimPID add_mp( MicroProcess* p );
    void del_mp( MicroProcess* p );
    MicroProcess* ID2MP( SimPID pid );
    void forward( SimPID to, SimEventBase* event );
    void pullback( SimEventBase* event );
    void init();
    void start();
    SimTime run( SimTime max_t, long max_events );
    void stop();
};
Micro-Kernel Interface
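Typical driver code over this interface follows an init / add processes / run / stop lifecycle. A sketch against a hypothetical stub kernel (the stub only records the call sequence; the real kernel does parallel synchronization work behind these calls, and its add_mp takes a MicroProcess*):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stub standing in for the micro-kernel, just to
// show the expected lifecycle: init -> add_mp* -> run -> stop.
class StubKernel {
public:
    void init() { calls.push_back("init"); }
    int add_mp() {                      // returns a SimPID-like local id
        calls.push_back("add_mp");
        return next_id_++;
    }
    void run(double max_t) { calls.push_back("run"); (void)max_t; }
    void stop() { calls.push_back("stop"); }
    std::vector<std::string> calls;
private:
    int next_id_ = 0;
};

// Driver sketch mirroring the MicroKernel interface above.
int drive(StubKernel& mk) {
    mk.init();
    int lp0 = mk.add_mp();   // register two LPs with the kernel
    int lp1 = mk.add_mp();
    mk.run(100.0);           // run until simulation time 100
    mk.stop();
    return lp1 - lp0;        // local ids are assigned sequentially
}
```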
Outline
• Motivation & Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Conclusions & Future Work
Micro-Kernel Implementation
[Figure: the micro-kernel maintains three priority queues over processes, ordered by ECTS (committable), EPTS (processable), and EETS (emittable); both user LPs and kernel LPs (KPs) appear in these queues; each LP maintains a Future Event List (FEL), a Processed Event List (PEL), and its Local Virtual Time (LVT)]
When are the kernel queues updated?
• New LP added or deleted
• LP executes an event
• LP receives an event
Naming and Proxying
Local identifier assignment: user processes (UPs) get ids 0, 1, 2, 3, …; kernel processes (KPs) get ids …, -3, -2, -1
[Figure: on each federate, the application's UPs and the kernel services' KPs all run over the micro-kernel; example: inter-federate event exchanges go via local proxy KPs (KP0 on federate 0, KPi on federate i)]
Kernel services are implemented using kernel processes.
Benefit: KPs are also MicroProcesses, hence time-synchronized & scheduled automatically!
SimPID=(FedID,LocalID)
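The naming scheme can be sketched as a small identifier type (illustrative; field and method names are assumptions, not the µsik definitions):

```cpp
#include <cassert>

// Illustrative SimPID: a federate id plus a local id, where user
// processes get local ids 0, 1, 2, ... and kernel processes get
// negative local ids -1, -2, -3, ... as in the slide.
struct SimPID {
    int fed_id;
    long local_id;
    bool is_kernel_process() const { return local_id < 0; }
};
```

Reserving the negative range for KPs lets the kernel distinguish its own service processes from user processes without any extra lookup.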
Implementation – Software Architecture
[Figure: software architecture: µsik processes run over the µsik kernel, which runs over libSynk; within libSynk, TM (time management: TM Null, TM Red) uses RM (reductions: RM Bar) and FM (flexible messaging: FM ShM, FM Myr, FM TCP, FM MPI), all over the OS/hardware network; an edge X → Y means X uses Y]
Evaluation
• Flexibility – How extensible?
– Case studies
• Performance – How expensive?
– Supercomputing results
Outline
• Motivation & Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Conclusions & Future Work
Extensibility Case Studies
• Run-ahead & Resilience (RA & RS)
• Constrained Out-of-order Processing (COORD)
• Pre-emptive Event Processing (PEP)
How Comprehensive?
• µsik is currently able to support the following:
– Lookahead-based conservative and/or optimistic execution
– Reverse computation-based optimistic execution
– Checkpointing-based optimistic execution
– Resilient optimistic execution (zero rollbacks)
– Constrained, out-of-order execution
– Preemptive event processing
– Any combination of the above
– Automated, network-throttled flow control
– User-level event retraction
– Process-specific limits to optimism
– Dynamic process addition/deletion
– Shared and/or distributed memory execution
– Process-oriented views
• Able to add the following, if/as needed:
– Synchronized multicast
– Optimistic dynamic memory allocation
– Automated load-balancing
Outline
• Motivation & Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Conclusions & Future Work
How Expensive?
Some simple test configurations to measure overheads:
• Sequential speed
• Conservative parallel speed
• Optimistic parallel speed
• Speed with a mixture of conservative & optimistic
Items of interest:
• Event processing cost
• Context-switching cost
Applications
• Spacecraft Charging (optimistic)
• 1-D Hybrid Collision-less Shock (conservative)
• Neurological Models (conservative)
• Diffusion Equation (conservative + optimistic)
• PHOLD
PHOLD
• Relatively fine grained
– ~5 microseconds of computation per event
• Conservative
– LPi.enable_undo( false )
• Optimistic
– LPi.enable_undo( true, RA=10*LA )
– Reverse computation
– Reversible random number generator
• Mixed Mode
– Even-numbered LPs are conservative
– Odd-numbered LPs are optimistic
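The mixed-mode setup above can be sketched as a small configuration loop (a hypothetical driver: PholdLP and make_mixed_phold are stand-ins, with enable_undo mirroring the calls above; RA = run-ahead, LA = lookahead):

```cpp
#include <cassert>
#include <vector>

// Stand-in for a PHOLD LP: it only records its synchronization mode.
struct PholdLP {
    bool optimistic = false;
    double runahead = 0.0;
    void enable_undo(bool on, double ra = 0.0) {
        optimistic = on;
        runahead = ra;
    }
};

// Configure nlps LPs with the even/odd policy from the slide:
// even-numbered LPs conservative, odd-numbered LPs optimistic
// with run-ahead RA = 10 * lookahead.
std::vector<PholdLP> make_mixed_phold(int nlps, double lookahead) {
    std::vector<PholdLP> lps(nlps);
    for (int i = 0; i < nlps; ++i) {
        if (i % 2 == 0)
            lps[i].enable_undo(false);                 // conservative
        else
            lps[i].enable_undo(true, 10 * lookahead);  // optimistic
    }
    return lps;
}
```

Because synchronization is a per-LP property, mixing modes is a configuration choice rather than a change of simulator.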
Computational Platform
• San Diego Supercomputing Center
– IBM DataStar
• 8-way p655 nodes
– 16 GB/node, eight 1.5 GHz Power4 CPUs
• IBM Federation Switch
– Low latency, fair bandwidth
• MPI
1-D Hybrid Shock DES Model
[Plot: speedup vs. number of processors (0 to 512), comparing Cells/CPU = 1500 and Cells/CPU = 150 against the linear (ideal) speedup]
Some of our recent results in parallel DES-based plasma simulations:
• Problem size scaled with processors: Cells/CPU = 150, 1500; Ions/Cell = 100
• Largest: 76.8 million ions (100 ions/cell x 1500 cells/CPU x 512 CPUs)
• Can scale to a much larger number of ions
Simulation Platform: IBM DataStar, San Diego Supercomputing Center
www.sdsc.edu/user_services/datastar
Sequential Speed – PHOLD
[Plot: microseconds per event vs. number of simulation processes (1 to 1,000,000), for R = 1, 10, 100, 1000, where R = events/LP]
Conservative Parallel Speed – PHOLD
[Plot: microseconds per event vs. number of processors (0 to 512), for LP = 1 thousand, MSG = 1 million; LP = 1 million, MSG = 100 million; LP = 1 million, MSG = 1 billion]
Billion event population!
Optimistic Parallel Speed – PHOLD
[Plot: microseconds per event vs. number of processors (0 to 512), for LP = 1 thousand, MSG = 1 million; LP = 1 million, MSG = 100 million; LP = 1 million, MSG = 1 billion]
Billion event population!
Mixed-Mode Parallel Speed – PHOLD
[Plot: microseconds per event vs. number of processors (0 to 512), for LP = 1 thousand, MSG = 1 million; LP = 1 million, MSG = 100 million; LP = 1 million, MSG = 1 billion]
Billion event population!
Mixed-Mode Aggregate Speed – PHOLD
[Plot: aggregate PHOLD speed (y-axis in millions) at 128, 256, and 512 processors, mixed mode, for LP = 1 thousand, MSG = 1 million; LP = 1 million, MSG = 100 million; LP = 1 million, MSG = 1 billion]
Outline
• Motivation & Background
• Concepts & Definitions
• Implementation Details
• Extensibility Case Studies
• Performance Study
• Summary & Future Work
Summary
• Novel systems framework
– Multiple synchronization methods supported, variable per LP
– Dynamically changeable synchronization within an LP
– Very encouraging extensibility
• High-performance implementation, in C++
– Class hierarchies rooted at core concepts
– Micro-kernel, micro-process, …
• Debugged & tested on multiple complex applications
– Spacecraft charging, Hybrid Shock, Neurological Simulation, …
Summary (continued)
• Highly scalable implementation operational
– Tested up to 512 CPUs
• Billion-event PHOLD!
– Conservative, Optimistic, Mixed
Future Work
• Large-scale Time Warp issues
– "Large-scale" used to be dozens of CPUs
– Now, large-scale is 100's to 1000's of CPUs!
• Mixed mode is more challenging than first thought
– Local dependencies need to be accounted for carefully
• Time-synchronized Multicast
• Dynamic load balancing (transparently?)
References
• Operational software downloadable from:
www.cc.gatech.edu/fac/kalyan/musik.htm
• Software release includes:
– micro-kernel, micro-process, etc.
– base LPs
– documentation
– example applications
Questions?