Faithful Reproduction of Network Experiments

Dimosthenis Pediaditakis
Charalampos Rotsos
Andrew W. Moore
Computer Laboratory, Systems Research Group, University of Cambridge, UK
http://selena-project.github.io
ANCS 2014, Marina del Rey, California, USA
Research on networked systems: Present

[Figure: example network topologies with 100 Mbps access links, 1 GbE and 10 GbE aggregation links, and 40+ Gbps WAN links]

How can we experiment with new architectures?
Performance of widely available tools

• A simple experiment
  – 2-pod Fat-Tree
  – 1 GbE links
  – 10K 5 MB TCP flows
• Simulation (ns-3)
  – Flat model
  – 2.75x lower throughput
• Emulation (Mininet)
  – 4.5x lower throughput
  – Skewed CDF
Why not simulation?

• Fidelity
  – Modelling abstractions
  – No real stacks or applications
• Scalability
  – Network size
  – Network speed (10 Gbps+)
  – Poor execution-time scalability
• Reproducibility
  – Replication of configuration
  – Repeatability of results (same RNG seeds)

[Figure: fidelity / scalability / reproducibility triangle; simulation (e.g. ns-2 / ns-3) sits on the scalability and reproducibility side]
Why not real-time emulation?

• Fidelity
  – Real stacks and applications
  – Heterogeneity support, SDN devices
• Scalability
  – CPU bottleneck limits network speed and network size
• Reproducibility
  – Replication of configuration
  – Repeatability of results

[Figure: fidelity / scalability / reproducibility triangle; emulation (e.g. Mininet) sits on the fidelity and reproducibility side]
In an ideal world...

• Fidelity
  – Real stacks and applications
  – Heterogeneity support
  – Realistic SDN switch model
• Scalability
  – 10 GbE, 100 Gbps, ...
  – 100s of nodes
• Reproducibility
  – Replication of configuration
  – Repeatability of results

[Figure: our vision combines the fidelity of emulation with the scalability and reproducibility of simulation]
• High-level experiment description and automation
  – Python API (Mininet style)
• Real OS components and applications
  – Xen-based emulation
  – Fine-grained resource control
  – Heterogeneous deployments
• Hardware resource scaling
  – Time dilation (revisiting DieCast), unmodified guests
  – Users can trade execution speed for fidelity and scalability
• Network control-plane fidelity
  – Support for unmodified SDN platforms
  – Empirical OpenFlow switch model (extensible)
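A Mininet-style experiment description could look like the sketch below. The `Topology` class and its method names are purely illustrative (they are not SELENA's actual API); the point is that a few lines of Python declaratively capture hosts, switches, and link properties before compilation to emulation components.

```python
# Hypothetical sketch of a Mininet-style experiment description.
# Class and method names are illustrative only, NOT SELENA's real API.

class Topology:
    def __init__(self):
        self.nodes, self.links = {}, []

    def add_host(self, name, cpus=1, mem_mb=512):
        # Each host maps to a guest VM with its own resource budget.
        self.nodes[name] = {"kind": "host", "cpus": cpus, "mem_mb": mem_mb}
        return name

    def add_switch(self, name):
        self.nodes[name] = {"kind": "switch"}
        return name

    def add_link(self, a, b, bw_mbps=1000, delay_ms=0.0):
        # Link properties drive the emulated QoS configuration.
        self.links.append((a, b, bw_mbps, delay_ms))

topo = Topology()
s1 = topo.add_switch("s1")
for i in range(4):
    h = topo.add_host(f"h{i}")
    topo.add_link(h, s1, bw_mbps=1000)

print(len(topo.nodes), len(topo.links))  # 5 nodes, 4 links
```

A compiler pass would then translate such a description into concrete guests and virtual interfaces, as shown on the deployment slide.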
Deploying an experiment with SELENA

[Figure: the Selena compiler translates the experiment description into Xen guests connected through Open vSwitch bridges]
Scaling resources via Time Dilation

• Create a scenario, choose a TDF (time-dilation factor)
• Linear and symmetric scaling of the resources "perceived" by the guest OS
  – Network I/O, CPU, disk I/O
• Independently control the guest's "perception" of each available resource
  – CPU: Xen Credit2 scheduler
  – Network: Xen VIF QoS, NetEm/DummyNet
  – Disk I/O: within guests, via cgroups/rctl
The concept of Time-Dilation

[Figure: the hypervisor commands the guest's clocks to slow down]

1 tick = (1/C_Hz) seconds

• Real time: transferring 10 Mbits of data over 6 ticks gives
  rate_REAL = 10 / (6 * C_Hz) Mbps
• 2x dilated time (TDF = 2): either halve the tick rate at constant C_Hz, OR keep the tick rate and double C_Hz
• Virtual time: the same 10 Mbit transfer now spans 3 virtual ticks, so the guest perceives
  rate_VIRT = 10 / (3 * C_Hz) Mbps = 2 * rate_REAL
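The arithmetic above can be sketched in a few lines: a guest whose virtual clock advances TDF times slower than wall-clock time perceives the same real-world transfer as TDF times faster. This is a minimal illustration of the scaling relation, not SELENA code.

```python
# Time-dilation arithmetic: a guest whose clock runs TDF times slower
# perceives any fixed real-world transfer as TDF times faster.

def perceived_rate_mbps(bits, real_seconds, tdf):
    """Rate as seen by a guest whose virtual clock advances real_seconds/tdf."""
    virtual_seconds = real_seconds / tdf
    return bits / 1e6 / virtual_seconds

real = perceived_rate_mbps(10e6, 6.0, tdf=1)      # undilated view
dilated = perceived_rate_mbps(10e6, 6.0, tdf=2)   # TDF = 2

print(real, dilated)  # the dilated rate is exactly 2x the real rate
```

This is why a modest physical testbed can faithfully "present" 10 GbE links to its guests: at TDF = 10, a real 1 Gbps channel appears as 10 Gbps in virtual time.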
PV-guest time dilation

[Figure: the Xen hypervisor mediates guest timekeeping: the guest reads the TSC value (rdtsc) and the Xen clock source, receives VIRQ_TIMER interrupts, and programs the next timer event via HYPERVISOR_set_timer_op]

• Wall-clock time
  – Time since epoch
  – System time (since boot)
  – Independent clock mode (rdtsc)
• Timer interrupts
  – Scheduled timers
  – Periodic timers
  – Loop delays
OpenFlow Toolstack X-Ray

[Figure: anatomy of an OpenFlow switch: control applications talk to a network OS, which reaches the OF agent in the switch software over the control channel; the OF agent configures the ASIC through its driver over PCI(e)]

• Control channel: available capacity, synchronicity
• Control application complexity
• Switch software: OS scheduling is non-trivial; scarce co-processor resources
• ASIC driver policy configuration: latency and semantics
• Limited PCI bus capacity

How critical is SDN control-plane performance for data-plane performance?
Building an OpenFlow switch model

• Measure an off-the-shelf switch device
  – Measure message-processing performance (OFLOPS)
  – Extract latency and loss characteristics of:
    • flow-table management
    • the packet interception / injection mechanism
    • statistics-counter extraction
• Configurable switch model
  – Replicate latency and loss characteristics
  – Implementation: Mirage-OS based switch
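One simple way to replay measured latency and loss characteristics is an empirical model that samples per-message delays from the recorded distribution. The sketch below illustrates the idea with made-up numbers; the values are NOT the paper's OFLOPS measurements, and the function names are hypothetical.

```python
import random

# Hypothetical empirical switch model: replay measured per-message control
# latencies by sampling from the recorded distribution. The sample values
# below are illustrative placeholders, not real OFLOPS measurements.

measured_flow_mod_ms = [0.8, 0.9, 1.1, 1.2, 1.5, 2.0, 2.2, 3.1, 5.0, 9.7]

def sample_latency_ms(samples, rng):
    """Draw one latency from the empirical distribution of measurements."""
    return rng.choice(samples)

# A fixed seed keeps runs repeatable, matching the deck's emphasis on
# reproducibility (same RNG seeds => same results).
rng = random.Random(42)
delays = [sample_latency_ms(measured_flow_mod_ms, rng) for _ in range(1000)]

assert min(delays) >= min(measured_flow_mod_ms)
assert max(delays) <= max(measured_flow_mod_ms)
```

A stochastic model fitted to the measurements (rather than raw resampling) is the natural refinement, and is what the backup slides describe for the Mirage-based switch implementation.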
Evaluation roadmap

Methodology
1. Run the experiment on real hardware
2. Reproduce results in:
   – Mininet
   – ns-3
   – SELENA
3. Compare against "real"

Dimensions of fidelity
1. Throughput
2. Latency
3. Control plane
4. Application performance
5. Scalability
Latency fidelity

Setup: 18 nodes, 1 Gbps links, 10000 flows

• Mininet and ns-3 accuracy: 32% and 44%
• SELENA accuracy: 71% with 5x dilation, 98.7% with 20x dilation

Platform          Execution time
Mininet           120 s
ns-3              172 m 51 s
SELENA (TDF=20)   40 m
SDN Control-plane Fidelity

[Figure: completion time of 1 Mb TCP flows, exponential arrivals, λ = 0.02]

• Stepping behaviour: TCP SYN and SYN-ACK loss
• The Mininet switch model does not capture this throttling effect
• The model cannot capture the transient switch-OS scheduling effects of the real switch
Scalability analysis

• Star topology, 1 GbE links, multi-Gbit sink link
• Dom-0 is allocated 4 cores
  – Why does it top out at 250% CPU utilisation?
• Near-linear scalability

[Figure: Xen guests attached to an Open vSwitch bridge in a star topology]
Application fidelity (LAMP)

• 2-pod Fat-Tree
  – 1 GbE links
  – 10x switches
  – 4x clients
  – 4x web servers: Apache2, PHP, MySQL, Redis, Wordpress
SELENA usage guidelines

• SELENA is primarily a NETWORK emulation framework
  – Perfect match: network-bound applications
  – Allows experimentation with:
    • CPU, disk, and network relative performance
    • Real applications / SDN controllers / network stacks
  – Improved fidelity and scalability
    • Outperforms common simulation / emulation tools
• Time dilation is exciting but not a panacea
  – Hardware-specific performance characteristics remain, e.g.:
    • Disks, cache size, per-core lock contention, Intel DDIO
• Rule of thumb for choosing the TDF
  – Keep Dom-0 and Dom-U utilisation low
  – Observation time-scales matter
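The rule of thumb above can be turned into a simple search: raise the TDF until the load the emulation places on the host falls below a safety threshold. The helper below is an illustrative sketch under the assumption that doubling the TDF roughly halves the real-time CPU load; it is not part of SELENA.

```python
# Illustrative TDF chooser, NOT part of SELENA's actual tooling.
# Assumption: the real-time load the emulation places on the host scales
# roughly as 1/TDF, so we pick the smallest power-of-two TDF that keeps
# predicted utilisation under a safety target.

def choose_tdf(utilisation_at_tdf1, target=0.5, max_tdf=64):
    """Smallest power-of-two TDF whose predicted utilisation <= target."""
    tdf = 1
    while utilisation_at_tdf1 / tdf > target and tdf < max_tdf:
        tdf *= 2
    return tdf

print(choose_tdf(3.2))  # a host overloaded 3.2x at TDF=1 -> 8
```

In practice one would measure Dom-0 and Dom-U utilisation at a trial TDF rather than extrapolate from TDF=1, but the monotone trade-off (higher TDF, lower load, longer wall-clock runs) is the same.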
SELENA is free and open. Give it a try: http://selena-project.github.io

Backup slides
Throughput fidelity

• Mininet and ns-3: 2.7 Gbps and 5.3 Gbps
• SELENA, 10x dilation: 99.5% accuracy; executes 9x faster than ns-3

Platform          Execution time
Mininet           120 s
ns-3              175 m 24 s
SELENA (TDF=10)   20 m
Scalability

• Multi-machine emulation
  – Synchronization among hosts
  – Efficient placement
• Optimize guest-to-guest Xen communications
• Auto-tuning of the TDF
A layered SDN controller hierarchy

Setup: 4-pod Fat-Tree topology, 1 GbE links, 32 Gbps aggregate traffic

Question: how does a layered controller hierarchy (1st-layer and 2nd-layer controllers) affect performance?

• More layers
  – Control decisions are taken higher in the hierarchy
  – Flow-setup latency increases
    • Network, request pipelining, CPU load
  – Resilience
Limitations of ns-3

• Layer 2 models
  – CSMA link:
    • Half duplex -> lower throughput
    • The only wired model supporting Ethernet
  – Point-to-point link model:
    • IP only -> cannot use switches
    • Distributed -> synchronisation is not a good fit for DC experiments
    • Time scalability is similar to CSMA
• Layer 3 models
  – TCP socket model: no window scaling
Containers vs Xen

• Heterogeneity (OS, network stacks)
• OS-level time virtualization is easier
• Resource management
  – Containers: cgroups, kernel noise, convoluted tuning
  – Xen: Domain-0 / Xen / Dom-U isolation
• Can run Mininet inside a time-dilated VM
Why not just scale network rates?

• Non-uniform resource and time scaling
  – User-space applications
  – Kernel (protocols, timers, link emulation)
• Does not capture packet-level protocol effects
  – E.g. TCP window sizing
  – Queueing fidelity
• Lessons learned from Mininet use cases
  – Jellyfish topology
  – TCP-incast effect
Related work
Research on networked systems: past, present, future

• Animation: 3 example networks, showing the evolution of the network characteristics on which research is conducted:
  – Past: 2-3 layers, hierarchical, ToR, 100 Mbps, bare-metal OS
  – Present: Fat-tree, 1 Gbps links, virtualization, WAN links
  – Near future: flexible architectures, 10 Gbps, elastic resource management, SDN controllers, OF switches, large scale (DC)
• The point of this slide: real-world systems progress at a fast pace (complexity, size), but common tools have not kept up
• I will challenge the audience to think:
  – Which of the 3 illustrated networks they believe they can model with existing tools
  – At what level of fidelity (incl. protocols, SDN, apps, net emulation)
  – What are the common sizes and link speeds they can model
A simple example with ns-3

• Assume a simple star topology: 10x clients, 1x server, 1x switch (10 Gbps aggregate)
• Provide the throughput plot and explain why performance suffers
• Point out that ns-3 is not appropriate for faster networks
• Simplicity of models + no real applications
• Using DCE: even slower, not fully POSIX-compliant
A simple example with Mininet

• Same setup as before; throughput plot
• Better fidelity in terms of protocols, applications, etc.
  – Penalty in performance
• Explain the bottleneck, especially in relation to Mininet's implementation
Everything is a trade-off

• Nothing comes for free in modelling and the 3 key experimentation properties
• Mininet aims for fidelity
  – Sacrifices scalability
• ns-3 aims for scalability (many abstractions)
  – Sacrifices fidelity, and has scalability limitations of its own
• The importance of reproducibility
  – Mininet is a pioneer, but results are difficult to reproduce from machine to machine
  – Mininet can guarantee reproducibility only at the level of configuration, not at the level of performance
[Figure: fidelity / scalability / reproducibility triangle]
SELENA: standing on the shoulders of giants

• Fidelity: use emulation
  – Unmodified apps and protocols: fidelity + usability
  – Xen: support for common OSes, good scalability, great control over resources
• Reproducible experiments
  – Mininet approach: high-level experiment descriptions, automation
• Maintain fidelity under scale
  – DieCast approach: time dilation (more on that later)
• The user is in control
  – Tuning knob: experiment execution speed
SELENA Architecture

• Animation: 3 steps show how an experiment is
  – specified (Python API)
  – compiled
  – deployed
• Explain the mapping of network entities and features to Xen emulation components
• Give hints of the optimization tweaks we use under the hood

[Figure: experiment description (Python API) -> Selena compiler -> deployment]
Time Dilation and Reproducibility

• Explain how time dilation also FACILITATES reproducibility across different platforms
• Reproducibility
  – Replication of configuration
    • Network architecture, links, protocols
    • Applications
    • Traffic / workloads
    • How we do it in SELENA: Python API, Xen API
  – Reproduction of results and observed performance
    • Each platform must have enough resources to run the experiment faithfully
    • How we do it in SELENA: time dilation
    • An older platform/hardware will require a different minimum TDF to reproduce the same results
Demystifying Time-Dilation 1/3

• Explain the concept in high-level terms
  – Give a solid example with a timeline
  – Similar to slide 8 of: http://sysnet.ucsd.edu/projects/time-dilation/nsdi06-tdf-talk.pdf
• Explain that everything happens at the hypervisor level
  – Guest time sandboxing (experiment VMs)
  – Common time for kernel + user space
  – No modifications for PV guests
    • Linux, FreeBSD, ClickOS, OSv, Mirage
Demystifying Time-Dilation 2/3

• Here we explain the low-level details
• Give credit to DieCast, but also explain the incremental work we did
• Best shown and explained with an animation
Demystifying Time-Dilation 3/3

• Resource scaling
  – Linear and symmetric scaling for network, CPU, RAM bandwidth, disk I/O
  – TDF only increases the perceived performance headroom of the above
  – SELENA allows configuring independently the perceived speeds of:
    • CPU
    • Network
    • Disk I/O (from within the guests at the moment -- cgroups)
• Typical workflow
  1. Create a scenario
  2. Decide the minimum TDF necessary to support the desired speeds (more on that later)
  3. Independently scale resources, based on the requirements of the users and the focus of their studies
Summarizing the elements of Fidelity

• Resource scaling via time dilation (already covered)
• Real stacks and other OS components
• Real applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why this is important
  – How much it can affect observed behaviours
Inside an OF switch

• Present a model of an OF switch's internals
  – Show components
  – Show the paths / interactions that affect performance
    • Data plane (we do not model that currently)
    • Control plane

[Placeholder image]
Building a realistic OF switch model

• Methodology for constructing an empirical model
  – PICA-8
  – OFLOPS measurements
    • Collect, analyze, extract trends
    • Stochastic model
  – Use a Mirage switch to implement the model
    • Flexible, functional, non-bloated code
    • Performant: unikernel, no context switches
    • Small footprint: scalable emulations
Evaluation methodology

1. Run the experiment on real hardware
2. Reproduce results in:
   1. Mininet
   2. ns-3
   3. SELENA (for various TDFs)
3. Compare each one against "real"

• We evaluate multiple aspects of fidelity:
  – Data-plane
  – Flow-level
  – SDN control
  – Application
Data-plane fidelity

• Figure from the paper
• Explain the star topology
• Show the comparison of Mininet + ns-3
  – Same figures as slides 2+3, but now compared against SELENA + real
• Point out how increasing the TDF affects fidelity
Flow-level fidelity

• Figure from the paper
• Explain the Fat-tree topology
Execution speed

• Compare against ns-3 and Mininet
• Point out that SELENA executes faster than ns-3
  – ns-3, however, replicates only a half-speed network, so the difference is even bigger
SDN control-plane fidelity

• Figure from the paper
• Explain the experiment setup
• Point out the shortcomings of Mininet
  – Only as good as OVS is
• Point out the poor support for SDN in ns-3
Application-level fidelity

• Figure from the paper
• Explain the experiment setup
• Latency aspect
• Show how CPU utilisation matters for fidelity
  – Open the dialogue on performance bottlenecks and limitations, and make a smooth transition to the next slide
Near-linear scalability

• Figure from the paper
• Explain how scalability is determined for a given TDF
Limitations discussion

• Explain the effects of running on Xen
• Explain what happens if the TDF is low and utilisation is high
• Explain that insufficient CPU compromises:
  – Emulated network speeds
  – The capability of guests to utilise the available bandwidth
  – The performance of networked applications (skew)
  – Latency (adds excessive delay; scheduling also contributes)
A more complicated example

• Showcase the power of SELENA :P
• Use the MRC2 experiment
Work in progress

• API compatibility with Mininet
• Further improve scalability
  – Multi-machine emulation
  – Optimize guest-to-guest Xen communications
• Features and use cases
  – SDN coupling with workload consolidation
  – Emulation of live VM migration
  – Incorporate energy models
SELENA is free and open. Give it a try: http://selena-project.github.io