Meta-Simulation Design and Analysis for Large Scale Networks
David W. Bauer Jr.
Department of Computer Science, Rensselaer Polytechnic Institute
OUTLINE
– Motivation
– Contributions
– Meta-simulation: ROSS.Net, BGP4-OSPFv2 Investigation
– Simulation: Kernel Processes, Seven O’clock Algorithm
– Conclusion
“…objective as a quest for general invariant relationships between network parameters and protocol dynamics…”
High-Level Motivation: to gain varying degrees of qualitative and quantitative understanding of the behavior of the system-under-test
– Parameter Sensitivity
– Protocol Stability and Dynamics
– Feature Interactions
Meta-Simulation: capabilities to extract and interpret meaningful performance data from the results of multiple simulations
• Individual experiment cost is high
• Developing useful interpretations
• Protocol performance modeling
Experiment Design Goal: identify a minimum-cardinality set of meta-metrics that maximally models the system
Contributions: Meta-Simulation: OSPF
Problem: which meta-metrics are most important in determining OSPF convergence?
– Step 1: Re-parameterize, re-scale
– Step 2: Search the complete model space; negligible metrics identified and isolated
– Step 3: Optimization-based ED: 750 experiments vs. Full-Factorial ED (FFED): 16,384 experiments
Our approach comes within 7% of Full Factorial using two orders of magnitude fewer experiments
Contributions: Meta-Simulation: OSPF/BGP
Ability: model the BGP and OSPF control plane
Problem: which meta-metrics are most important in minimizing control plane dynamics (i.e., updates)?
Meta-Simulation Perspective: complete view of all domains
All updates belong to one of four categories:
– OO: OSPF-caused OSPF update
– OB: OSPF-caused BGP update
– BO: BGP-caused OSPF update
– BB: BGP-caused BGP update
Findings:
– OB: ~50% of total updates; BO: ~0.1% of total updates
– Global perspective 20-25% better than local perspectives
– Minimizing total BO+OB is 15-25% better than other metrics
Contributions: Simulation: Kernel Processes
Parallel Discrete Event Simulation
– Conservative Simulation: wait until it is safe to process the next event, so that events are processed in time-stamp order
– Optimistic Simulation: allow violations of time-stamp order to occur, but detect them and recover
Benefits of Optimistic Simulation:
i. Not dependent on the network topology simulated
ii. As-fast-as-possible forward execution of events
Contributions: Simulation: Kernel Processes
Problem: parallelizing a simulation requires 1.5 to 2 times more memory than sequential execution, and the additional memory requirement hurts performance and scalability
– Scalability decreases as model size increases, due to the additional memory required to support the model (observed with 4 processors as model size increases)
Solution: Kernel Processes (KPs), a new data structure that supports parallelism and increases scalability
Contributions: Simulation: Seven O’clock
Problem: distributing a simulation requires efficient global synchronization
– Inefficient solution: barrier synchronization between all nodes while performing computation
– Efficient solution: pass messages between nodes, and synchronize in the background of the main simulation
Seven O’clock Algorithm: eliminates message passing, reducing cost from O(n) or O(log n) to O(1)
ROSS.Net: Big Picture
Goal: an integrated simulation and experiment design environment
[Diagram: ROSS.Net (simulation & meta-simulation) exchanges protocol metrics and protocol parameters with Protocol Design; Measurement Data-sets (Rocketfuel) supply measured topology data, traffic and router stats, etc.; Modeling supplies Protocol Models: OSPFv2, BGP4, TCP Reno, IPv4, etc.]
ROSS.Net
– Meta-Simulation: Design of Experiments Tool (DOT), driving input parameters and consuming output metric(s)
  • Experiment design
  • Statistical analysis
  • Optimization heuristic search: Recursive Random Search
  • Sparse empirical modeling
– Simulation: Parallel Discrete Event Network Simulation
  • Optimistic parallel simulation: ROSS
  • Memory-efficient network protocol models
ROSS.Net: Big Picture
Design of Experiments Tool (DOT)
• Traditional Experiment Design (Full/Fractional Factorial) → Statistical or Regression Analysis (R, STRESS) → empirical model
• Feedback loop: metric(s) out, parameter vector in
• Suited to: small-scale systems, linear parameter interactions, small # of params

Design of Experiments Tool (DOT)
• Optimization Search → Statistical or Regression Analysis (R, STRESS) → sparse empirical model
• Feedback loop: metric(s) out, parameter vector in
• Suited to: large-scale systems, non-linear parameter interactions, large # of params (curse of dimensionality)
ROSS.Net: Meta-Simulation Components
• Router topology from Rocketfuel trace data
– took each ISP map as a single OSPF area
– created a BGP domain between ISP maps
– hierarchical mapping of routers
AT&T’s US Router Network Topology
• 8 levels of routers:
– Levels 0 and 1: 155 Mb/s, 4 ms delay
– Levels 2 and 3: 45 Mb/s, 4 ms delay
– Levels 4 and 5: 1.5 Mb/s, 10 ms delay
– Levels 6 and 7: 0.5 Mb/s, 10 ms delay
Meta-Simulation: OSPF/BGP Interactions
• OSPF
– Intra-domain, link-state routing
– Path costs matter
• Border Gateway Protocol (BGP)
– Inter-domain, distance-vector, policy routing
– Reachability matters
• BGP decision-making steps:
– Highest LOCAL_PREF
– Lowest AS path length
– Lowest origin type (0 = IGP, 1 = EGP, 2 = Incomplete)
– Lowest MED
– Lowest IGP cost
– Lowest router ID
[Diagram: multiple OSPF domains, each with iBGP connectivity inside the domain, linked by eBGP connectivity between domains.]
Meta-Simulation: OSPF/BGP Interactions
• Intra-domain routing decisions can affect inter-domain behavior, and vice versa.
• All updates belong to one of four categories:
– OSPF-caused OSPF (OO) update
– OSPF-caused BGP (OB) update – interaction
– BGP-caused OSPF (BO) update – interaction
– BGP-caused BGP (BB) update
[Diagram: a link failure or cost increase (e.g. maintenance) inside an OSPF domain changes the IGP cost toward the destination, triggering an OB update.]
Meta-Simulation: OSPF/BGP Interactions
Intra-domain routing decisions can affect inter-domain behavior, and vice versa.
Identified four categories of updates:
– OO: OSPF-caused OSPF update
– BB: BGP-caused BGP update
– OB: OSPF-caused BGP update – interaction
– BO: BGP-caused OSPF update – interaction
[Diagram: new eBGP connectivity toward the destination becomes available, triggering a BO update.]
These interactions cause route changes to thousands of IP prefixes, i.e. huge traffic shifts!!
Meta-Simulation: OSPF/BGP Interactions
• Three classes of protocol parameters: OSPF timers, BGP timers, BGP decision
• Maximum search space size: 14,348,907
• RRS was allowed 200 trials to optimize (minimize) each response surface: OO, OB, BO, BB, OB+BO, ALL updates
• Applied multiple linear regression analysis to the results
Meta-Simulation: OSPF/BGP Interactions
• Optimized with respect to the OB+BO response surface.
• BGP timers play the major role, i.e. ~15% improvement in the optimal response.
– The BGP KeepAlive timer appears to be the dominant parameter – in contrast to the expected dominance of MRAI!
• OSPF timers have little effect, i.e. at most 5%.
– Low time-scale OSPF updates do not affect BGP.
Meta-Simulation: OSPF/BGP Interactions
~15% improvement when BGP timers included in search space
• Varied response surfaces – each equivalent to a particular management approach.
• The importance of parameters differs for each metric.
• For minimal total updates:
– local perspectives are 20-25% worse than the global one.
• For minimal total interactions:
– other metrics can be 15-25% worse.
• OB updates are more important than BO updates (~50% vs. ~0.1% of total updates).
Meta-Simulation: OSPF/BGP Interactions
Important to optimize OSPF
OB: ~50% of total updates
BO: ~0.1% of total updates
Global perspective 20-25% better than local perspectives
Minimize total BO+OB 15-25% better than other metrics
Meta-Simulation Conclusions:
– The number of experiments was reduced by an order of magnitude in comparison to Full Factorial.
– Experiment design and statistical analysis enabled rapid elimination of insignificant parameters.
– Several qualitative statements and system characterizations could be obtained with few experiments.
OUTLINE
– Problem Statement
– Contributions
– Meta-simulation: ROSS.Net, BGP4-OSPFv2 Investigation
– Simulation: Kernel Processes, Seven O’clock Algorithm
– Conclusion
Simulation: Overview
Parallel Discrete Event Simulation
– a Logical Process (LP) for each relatively parallelizable simulation model, e.g. a router or a TCP host
Local Causality Constraint (LCC): events within each LP must be processed in time-stamp order
Observation: adherence to the LCC is sufficient to ensure that a parallel simulation produces the same result as a sequential simulation

Conservative Simulation – avoid violating the local causality constraint (wait until it’s safe)
I. Null messages (deadlock avoidance) (Chandy/Misra/Bryant)
II. Time-stamp of next event

Optimistic Simulation – allow violations of local causality to occur, but detect them and recover using a rollback mechanism
I. Time Warp protocol (Jefferson, 1985)
II. Limiting the amount of optimistic execution
ROSS: Rensselaer’s Optimistic Simulation System
[Diagram: ROSS event data structures. A tw_event holds its message (receive_ts, src/dest_lp, user data) plus pointers linking it into the free event list (head/tail), the event queue, and the cancel queue. A tw_pe holds the lp_list and the processed-event queue head/tail; a tw_lp holds its lp number, type, and a pointer to its pe. For comparison, GTW uses a top-down PEState GState[NPE] array, each entry holding its event queue, cancel queue, free event list, and lplist[MAX_LP]; an LPState holds a process ptr, and an Event holds its message, init/rev/final proc ptrs, and lp number.]
Example Accesses
• GTW (top-down hierarchy): lp_ptr = GState[LP[i].Map].lplist[LPNum[i]]
• ROSS (bottom-up hierarchy): lp_ptr = event->src_lp; or pe_ptr = event->src_lp->pe;
Key advantages of the bottom-up approach:
• reduces access overheads
• improves locality and processor cache performance
Memory usage is only 1% more than sequential and independent of LP count.
“On the Fly” Fossil Collection
[Diagram: snapshots of PE 0’s internal state. At time 15.0 the free lists hold buffers sorted by virtual time (5.0, 5.0, 5.0, 10.0, 10.0, 10.0, 15.0, 15.0, 15.0) across LPs A, B, and C; after a rollback of LP A and re-execution, the same buffers appear unsorted (5.0, 5.0, 10.0, 10.0, 15.0, 15.0, 5.0, 10.0, 15.0).]
Key Observation: rollbacks cause the free list to become UNSORTED in virtual time. Result: event buffers that could be allocated are not, so the user must over-allocate the free list.
OTFFC works by allocating only those events from the free list whose timestamp is less than GVT. As events are processed, they are immediately placed at the end of the free list.
Contributions: Simulation: Kernel Processes
[Diagram: many LPs (Logical Processes) map onto each KP (Kernel Process); fossil collection and rollback operate on the KP’s shared processed-event list rather than per LP; each PE (Processing Element, one per CPU utilized) manages several KPs.]
Advantages:
i. significantly lowers fossil collection overheads
ii. lowers memory usage by aggregating LP statistics into KP statistics
iii. retains the ability to process events on an LP-by-LP basis in the forward computation
Disadvantages:
i. potential for “false rollbacks”
ii. care must be taken when deciding how to map LPs to KPs
ROSS: Kernel Processes
ROSS: KP Efficiency
Not enough work in system…
Small trade-off: longer rollbacks vs faster FC
ROSS: KP Performance Impact
# KPs does not negatively impact performance
ROSS: Performance vs GTW
ROSS outperforms GTW 2:1 in sequential
ROSS outperforms GTW 2:1 at best parallel
Simulation: Seven O’clock GVT
Optimistic approach
– Relies on a global virtual time (GVT) algorithm to perform fossil collection at regular intervals
– Events with timestamp less than GVT:
  • will not be rolled back
  • can be freed
GVT calculation
– Synchronous algorithms: LPs stop event processing during the GVT calculation
  • cost of synchronization may be higher than the positive work done per interval
  • processes waste time waiting
– Asynchronous algorithms: LPs continue processing events while the GVT calculation runs in the background
Goal: create a consistent cut among the LPs that divides events into past and future in wall-clock time
Two problems: (i) the Transient Message Problem, (ii) the Simultaneous Reporting Problem

Simulation: Mattern’s GVT
– Constructs the cut via message passing
– Cost: O(log n) if tree, O(n) if ring
– Problem: with a large number of processors, the free event pool is exhausted while waiting for GVT to complete

Simulation: Fujimoto’s GVT
– Constructs the cut using a shared-memory flag
– Cost: O(1)
– Problem: limited to shared-memory architectures
– The sequentially consistent memory model ensures proper causal order

Simulation: Memory Model
– Sequentially consistent does not mean instantaneous
– Memory events are only guaranteed to be causally ordered
– Is there a method to achieve sequentially consistent shared memory in a loosely coordinated, distributed environment?
Key observations:
– An operation can occur atomically within a network of processors if all processors observe that the event occurred at the same time.
– The CPU clock time scale (ns) is significantly smaller than the network time scale (ms).
Network Atomic Operations (NAOs):
– an agreed-upon frequency in wall-clock time at which some event is logically observed to have happened across a distributed system.
– a subset of the possible operations provided by a complete sequentially consistent memory model.
[Diagram: along wall-clock time, each processor alternates “Compute GVT” and “Update Tables” phases at the agreed NAO frequency.]
Simulation: Seven O’clock GVT
[Diagram: processors A-E along wall-clock time, with events timestamped 7, 5, 10, and 9 in the system. At the cut, one processor reports LVT 7 and another reports LVT min(5, 9), accounting for an in-flight message timestamped 9; GVT = min(5, 7) = 5.]
• Itanium-2 Cluster
• r-PHOLD
• 1,000,000 LPs
• 10% remote events
• 16 start events
• 4 machines (1-4 CPUs each, 1.3 GHz)
• Round-robin LP-to-PE mapping
Simulation: Seven O’clock GVT – Linear Performance
• Netfinity Cluster
• r-PHOLD
• 1,000,000 LPs
• 10% and 25% remote events
• 16 start events
• 4 machines (2 CPUs, 36 nodes, 800 MHz)
Simulation: Seven O’clock GVT
• Itanium-2 Cluster
• 1,000,000 LPs, each modeling a TCP host (i.e. one end of a TCP connection)
• 2 or 4 machines (1-4 CPUs each, 1.3 GHz)
• Poorly mapped LP/KP/PE
Simulation: Seven O’clock GVT: TCP
Linear Performance
• Netfinity Cluster
• 1,000,000 LPs, each modeling a TCP host (i.e. one end of a TCP connection)
• 4-36 machines (1-2 CPUs each, Pentium III, 800 MHz)
Simulation: Seven O’clock GVT: TCP
• Sith Itanium-2 cluster
• 1,000,000 LPs, each modeling a TCP host (i.e. one end of a TCP connection)
• 4-36 machines (1-2 CPUs each, 900 MHz)
Simulation: Seven O’clock GVT: TCP
Summary – Seven O’Clock Algorithm
• Clock-based algorithm for distributed processors
– creates a sequentially consistent view of distributed memory
• Zero-Cost Consistent Cut
– highly scalable and independent of event memory limits
                              Fujimoto’s          Seven O’Clock          Mattern’s                  Samadi’s
Cut calculation complexity    O(1)                O(1)                   O(n) or O(log n)           O(n) or O(log n)
Parallel / Distributed        P                   P & D                  P & D                      P & D
Global invariant              Shared memory flag  Clock synchronization  Message Passing Interface  Message Passing Interface
Independent of event memory   N                   Y                      N                          N
Simulation: Seven O’clock GVT
Summary: Contributions
Meta-simulation
– ROSS.Net: platform for large-scale network simulation, experiment design and analysis
– OSPFv2 protocol performance analysis
– BGP4/OSPFv2 protocol interactions
Simulation
– Kernel Processes: memory-efficient, large-scale simulation
– Seven O’clock GVT Algorithm: zero-cost consistent cut, high-performance distributed execution
Summary: Future Work
Meta-simulation
– ROSS.Net: incorporate more realistic measurement data and protocol models (CAIDA, multicast, UDP, other TCP variants)
– more complex experiment designs, better qualitative analysis
Simulation
– Seven O’clock GVT Algorithm: compute FFT and analyze the “power” of different models; attempt to eliminate the GVT algorithm by determining the maximum rollback length