OFRewind: Enabling Record & Replay Troubleshooting for Networks
Andreas Wundsam • Dan Levin • Srini Seetharaman • Anja Feldmann
An affiliated institute (An-Institut) of Technische Universität Berlin
USENIX ATC 2011
Quick 101
Classical switch vs. OpenFlow switch
(Figure: an OpenFlow switch exchanges PKT_IN and FLOW_MOD messages with its controller)
OpenFlow entry
Rule (+ mask: which fields to match): Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport
Action:
1. Forward packet to port(s)
2. Encapsulate and forward to controller
3. Drop packet
4. Send to normal processing pipeline
Stats: Packet + byte counters
(Figure from the OpenFlow intro presentation, N. McKeown)
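The Rule / Action / Stats structure of an OpenFlow entry can be illustrated with a tiny flow table lookup. This is a simplified sketch, not the OpenFlow reference implementation: the header field names (`ip_dst`, `tcp_dport`, ...), the use of "field left out = wildcard" as the mask, and the action strings are assumptions chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    # Rule: required header values; a field left out is wildcarded (masked).
    match: dict
    # Action, e.g. "output:2", "controller", "drop" or "normal" (illustrative strings).
    action: str
    # Stats: packet + byte counters.
    packet_count: int = 0
    byte_count: int = 0

    def matches(self, pkt: dict) -> bool:
        return all(pkt.get(name) == value for name, value in self.match.items())


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def process(self, pkt: dict, size: int) -> str:
        # First matching entry wins; real OpenFlow tables also use priorities.
        for entry in self.entries:
            if entry.matches(pkt):
                entry.packet_count += 1
                entry.byte_count += size
                return entry.action
        # Table miss: hand the packet to the controller (PKT_IN), which may
        # answer with a FLOW_MOD that installs a new entry.
        return "controller"


table = FlowTable([FlowEntry({"ip_dst": "10.0.0.2", "tcp_dport": 80}, "output:2")])
print(table.process({"in_port": 1, "ip_dst": "10.0.0.2", "tcp_dport": 80}, 1500))  # output:2
print(table.process({"in_port": 1, "ip_dst": "10.0.0.3", "tcp_dport": 80}, 60))  # controller
```

The table-miss path is what generates the PKT_IN / FLOW_MOD exchange between switch and controller, i.e. exactly the control plane traffic OFRewind records.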
Back to the topic of my talk: OFRewind!
Motivating use case
CPU Utilization of an OpenFlow switch
(Figure: CPU Util (%) and SET_CONFIG message arrivals, Nov-06-2009 20:00 to Nov-07-2009 00:00)
Arrivals of PKT_IN msgs
(Figure: CPU Util (%) and PACKET_IN message arrivals, Nov-06-2009 20:00 to Nov-07-2009 00:00)
No correlation!
Arrivals of FLOW_MOD msgs
(Figure: CPU Util (%) and FLOW_MOD message arrivals, Nov-06-2009 20:00 to Nov-07-2009 00:00)
No correlation!
No correlation!
(Figure: CPU Util (%) vs. arrivals of PACKET_IN, PACKET_OUT, FLOW_EXPIRED, FLOW_MOD, STATS_REPLY and STATS_REQUEST msgs, Nov-06-2009 20:00 to Nov-07-2009 00:00)
Clueless...
• Switch is a black-box component
• Can't inspect internal state or source code
• No analytical explanation for the behavior
• Message arrivals do not correlate with symptoms
• Existing interfaces (CLI, SNMP) are too coarse-grained
Troubleshooting networks is hard
• Huge, critical black boxes
• Timing / races
A solution?
• Record: in production
• Replay: reproduce at a convenient location / pace
• Troubleshoot
Existing approaches
Endhost Replay Debugging
Fully deterministic replay via binary instrumentation / virtualization
✘ no black boxes
✘ scalability?
tcpdump / tcpreplay et al.
Capture/Replay events
✘ Single vantage point, no network-wide view
✘ Scalability limited by data plane data rates
Full recording of all events feasible?
However...
• Not all traffic is equal (control plane: 1% of traffic, but 95-99% of bugs!)*
• Behavior of many network devices is largely deterministic w.r.t. control plane network events
* Altekar / Stoica, 2010
Record: events + traffic
• Selective: record important traffic (control plane)
• Skip / aggregate less important traffic (data plane)
Replay: reinject events + traffic
• "Best-effort replay"
• Replay partial recordings
• Reproduce the problem at a chosen time / location
Go network*-wide / always on!
* controller domain
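A selection rule of the kind described above might look like the following sketch: control plane messages are stored in full, while data plane packets are only aggregated into per-flow packet and byte counters. The set of message types treated as "important", the 5-tuple flow key, and the `SelectiveRecorder` name are assumptions for illustration; OFRewind's actual selection rules are configurable and are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: these OpenFlow message types count as "important" control traffic.
CONTROL_TYPES = {"PKT_IN", "PKT_OUT", "FLOW_MOD", "FLOW_EXPIRED",
                 "STATS_REQUEST", "STATS_REPLY", "SET_CONFIG"}


@dataclass
class SelectiveRecorder:
    # Full copies of control plane messages: (timestamp, type, payload).
    control_log: list = field(default_factory=list)
    # Data plane summaries: flow key -> [packets, bytes].
    data_summary: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def observe(self, ts, msg_type, payload, flow_key=None):
        if msg_type in CONTROL_TYPES:
            # Record important (control plane) traffic in full.
            self.control_log.append((ts, msg_type, payload))
        else:
            # Data plane: drop the payload, keep only aggregate counters.
            counters = self.data_summary[flow_key]
            counters[0] += 1
            counters[1] += len(payload)


rec = SelectiveRecorder()
flow = ("10.0.0.1", "10.0.0.2", 17, 5000, 5001)  # hypothetical UDP 5-tuple
rec.observe(0.00, "PKT_IN", b"\x01" * 60)
rec.observe(0.01, "DATA", b"\x00" * 1500, flow_key=flow)
rec.observe(0.02, "DATA", b"\x00" * 1500, flow_key=flow)
print(len(rec.control_log), rec.data_summary[flow])  # 1 [2, 3000]
```

The storage cost then grows with the number of control messages and flows rather than with data plane byte rates, which is the scalability argument made on the slide.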
Replay Tweaking
Localize problems through:
• Time dilation: scale time to investigate timing issues
• Device mapping: replay on different devices / versions to investigate regressions / vendor implementation issues
• Trace bisection: iteratively replay subselected traffic to localize the events that trigger a failure
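Time dilation can be pictured as rescaling the gaps between recorded events before reinjecting them. The sketch below assumes a trace is a list of `(timestamp, event)` pairs and that a factor greater than 1 slows replay down; `dilate` and `replay` are hypothetical helpers, and the pacing loop is a best-effort illustration, not how OFReplay actually schedules messages.

```python
import time


def dilate(trace, factor):
    """Turn a recorded trace into a replay schedule with scaled gaps.

    trace: list of (timestamp_seconds, event), sorted by timestamp.
    factor > 1 slows the replay down, factor < 1 speeds it up.
    Returns (offset_seconds, event) pairs relative to the first event.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not trace:
        return []
    t0 = trace[0][0]
    return [((ts - t0) * factor, event) for ts, event in trace]


def replay(schedule, send, sleep=time.sleep, clock=time.monotonic):
    """Call send(event) at each scheduled offset; late events go out at once."""
    start = clock()
    for offset, event in schedule:
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        send(event)


trace = [(100.0, "FLOW_MOD"), (100.5, "PKT_IN"), (102.0, "STATS_REQUEST")]
print(dilate(trace, 2.0))  # [(0.0, 'FLOW_MOD'), (1.0, 'PKT_IN'), (4.0, 'STATS_REQUEST')]
```

Injecting `sleep` and `clock` keeps the pacing logic testable without waiting in real time.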
Goals
✓ Record a controller domain
✓ Scalable, selective, consistent
✓ Even with black boxes
✓ Coordinated replay
✓ Replay tweaking
✓ Localize problems
Non-Goals
✘ Root cause analysis
✘ Automatic configuration of what to record
✘ Fully deterministic replay
Introducing the tool
System design
2 components of 2 modules each:
(Figure: recording, with OFRecord between the OpenFlow controller and switches sw1-sw3 (hosts c1-c6) and DataStores attached via switch ports p1, p2, pm; replay, with OFReplay feeding the recording from the DataStores to switches sw1-sw3 and/or an OpenFlow controller)
Typical Usage
• Deploy OFRecord in the production environment as a proxy to the 'regular' controller
• Always on: record OF messages, control plane traffic, and data plane summaries
• Alter selection rules as necessary
• Deploy OFReplay in the lab environment
• Localize bugs / validate bug fixes
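Conceptually, OFRecord in this deployment acts as a transparent proxy: every message between a switch and the 'regular' controller is timestamped, appended to a log, and forwarded unchanged. The sketch below models that with plain Python callables instead of real OpenFlow connections; the `RecordingProxy` name, the direction labels and the log tuple format are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecordingProxy:
    # Forward (switch_id, msg) to the real controller / back to the switch.
    to_controller: Callable[[str, bytes], None]
    to_switch: Callable[[str, bytes], None]
    clock: Callable[[], float] = time.time
    # One shared log for all switches: (timestamp, switch_id, direction, msg).
    log: list = field(default_factory=list)

    def from_switch(self, switch_id: str, msg: bytes) -> None:
        self.log.append((self.clock(), switch_id, "sw->ctrl", msg))
        self.to_controller(switch_id, msg)

    def from_controller(self, switch_id: str, msg: bytes) -> None:
        self.log.append((self.clock(), switch_id, "ctrl->sw", msg))
        self.to_switch(switch_id, msg)


up, down = [], []
proxy = RecordingProxy(lambda s, m: up.append((s, m)), lambda s, m: down.append((s, m)))
proxy.from_switch("sw1", b"PKT_IN")       # recorded, then passed to the controller
proxy.from_controller("sw1", b"FLOW_MOD")  # recorded, then passed to sw1
print([entry[1:3] for entry in proxy.log])  # [('sw1', 'sw->ctrl'), ('sw1', 'ctrl->sw')]
```

Keeping a single log for all switches of the controller domain gives one consistently ordered view of the control plane, which is the property a coordinated replay needs.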
Case studies
1. Debugging Black box components
• CPU inflation in an OpenFlow switch
2. Debugging OpenFlow controllers
• NOX problem
+ Others (see poster/paper)
Back to CPU inflation
• Replay and bisect the trace by message type
(Figure: CPU Util (%) and SET_CONFIG message arrivals, Nov-06-2009 20:00 to Nov-07-2009 00:00)
• When replaying STATS_REQ msgs, the problem is reproduced, even though there is no correlation in arrival times
(Figure: CPU Util (%) and STATS_REQUEST message arrivals, Nov-06-2009 20:00 to Nov-07-2009 00:00)
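Bisecting the trace by message type can be sketched as repeatedly halving the set of message types and keeping whichever half still reproduces the symptom. `reproduces` is a hypothetical stand-in for "replay this sub-trace and check whether the switch CPU inflates"; if neither half reproduces on its own, the sketch stops and reports the remaining types, since the trigger then involves an interaction between them.

```python
def bisect_types(trace, reproduces):
    """Narrow down which message types trigger a problem.

    trace: list of (msg_type, payload) tuples in recorded order.
    reproduces: callback that replays a sub-trace and reports whether the
        symptom appears (hypothetical; in practice a replay + measurement run).
    Returns the smallest set of types found this way, or [] if the full
    trace does not reproduce the problem.
    """
    if not reproduces(trace):
        return []
    candidates = sorted({msg_type for msg_type, _ in trace})
    while len(candidates) > 1:
        mid = len(candidates) // 2
        for half in (candidates[:mid], candidates[mid:]):
            subset = [m for m in trace if m[0] in half]  # keeps recorded order
            if reproduces(subset):
                candidates = half
                break
        else:
            break  # neither half alone reproduces it: the types interact
    return candidates


def cpu_inflates(sub_trace):
    # Toy symptom model for the example: only STATS_REQUEST triggers it.
    return any(msg_type == "STATS_REQUEST" for msg_type, _ in sub_trace)


trace = [("PKT_IN", b""), ("FLOW_MOD", b""), ("STATS_REQUEST", b""), ("SET_CONFIG", b"")]
print(bisect_types(trace, cpu_inflates))  # ['STATS_REQUEST']
```

Each iteration costs at most two replays, so the number of replay runs grows logarithmically with the number of message types.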
Record vs. Replay
(Figure: replayed traffic characteristics; CPU usage (%) and flow setup time (ms) over time, 08:06 to 10:36)
Debugging controllers: NOX problem
• Problem report: messages initiated by one specific device don't reach the NOX controller module
• Not reproducible in the lab
• Record at end user site
• Replay in the lab towards NOX
• Use host-level debugging to analyze NOX behavior
• Trigger: specific source MAC address 00:26:55:da:3a:40
• NOX has an 'intelligent' MAC address parser that handles both binary and ASCII MAC addresses
• 0x3a is the ASCII code of ':' (0x3a == ':'), and this byte appears in the binary form of this MAC :)
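The failure mode can be reproduced in a few lines. `parse_mac_guessing` below is a hypothetical reconstruction of an 'intelligent' parser that guesses the encoding from the content; it is not NOX's actual code. Because the binary form of 00:26:55:da:3a:40 contains the byte 0x3a, the guess goes wrong. The alternative, `parse_mac_by_length`, is likewise an illustrative assumption: it decides by length (6 bytes binary, 17 characters ASCII), which removes the ambiguity.

```python
def parse_mac_guessing(raw: bytes) -> str:
    """Hypothetical 'intelligent' parser: guesses the encoding from content."""
    if b":" in raw:  # any ':' (0x3a) byte is taken to mean "ASCII text"
        return raw.decode("ascii", errors="replace")
    return ":".join(f"{b:02x}" for b in raw)


def parse_mac_by_length(raw: bytes) -> str:
    """Decide by length instead: 6 bytes = binary, 17 bytes = ASCII text."""
    if len(raw) == 6:
        return ":".join(f"{b:02x}" for b in raw)
    if len(raw) == 17:
        return raw.decode("ascii")
    raise ValueError(f"unexpected MAC address length: {len(raw)}")


mac = bytes.fromhex("002655da3a40")  # binary form of 00:26:55:da:3a:40
print(repr(parse_mac_guessing(mac)))  # mangled: misread as ASCII because of 0x3a
print(parse_mac_by_length(mac))       # 00:26:55:da:3a:40
```

Only MAC addresses that happen to contain the byte 0x3a are affected, which is why the bug depended on one specific device and did not show up in the lab.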
Performance Evaluation
• Record: production environment
• OFRecord controller performance
• Impact on switch performance
• Replay: lab environment
• Timing accuracy
OFRecord controller performance
Median # flows handled by different controllers (measured with cbench)
(Figure: flow rate (flows/s) vs. # switches (0 to 60) for flowvisor, nox-pyswitch, nox-switch, ofrecord, ofrecord-data and of-simple; annotated groups: NOX, Flowvisor, OFRecord vs. SimpleController)
Impact on switch performance
(Figure: flows received/s vs. flows sent/s, log scale 5 to 5000, for of-record, of-record-data and of-simple on switches of Vendor A and Vendor B)
• Single UDP packet flows created using hping
• Sent to switches of two different vendors
• Measure # flows successfully forwarded
• Compare OFRecord vs. SimpleCtrl
Vendor B breaks down; Vendor A saturates
OFRecord: limited switch performance penalty
End-to-end performance
Rate [flows/s]   Drop [%]   sd (timing) [ms]
5                0          4.5
10               0          15.6
20               0          21.1
50               0          23.4
100              0          10.9
200              0          13.9
400              19         15.8
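The drop and timing columns can be computed from matched recorded and replayed timestamps. The sketch assumes that "sd (timing)" is the standard deviation of the per-flow difference between replayed and recorded offsets (each relative to the first matched flow of its run) and that flows never seen during replay count as drops; the slide does not spell out the exact definition, and `replay_accuracy` is a hypothetical helper.

```python
import statistics


def replay_accuracy(recorded, replayed):
    """Compare a recording with its replay.

    recorded: dict flow_id -> timestamp (s) in the original trace.
    replayed: dict flow_id -> timestamp (s) observed during replay;
        flows missing here are counted as dropped.
    Returns (drop_percent, sd_of_timing_error_ms).
    """
    if not recorded:
        raise ValueError("empty recording")
    matched = [f for f in recorded if f in replayed]
    drop_percent = 100.0 * (len(recorded) - len(matched)) / len(recorded)
    if len(matched) < 2:
        return drop_percent, float("nan")  # stdev needs at least two samples
    r0 = min(recorded[f] for f in matched)
    p0 = min(replayed[f] for f in matched)
    # Timing error per flow: replayed offset minus recorded offset, in ms.
    errors_ms = [((replayed[f] - p0) - (recorded[f] - r0)) * 1000 for f in matched]
    return drop_percent, statistics.stdev(errors_ms)


recorded = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
replayed = {1: 10.0, 2: 11.002, 3: 11.998}  # flow 4 was dropped
print(replay_accuracy(recorded, replayed))
```

Measuring offsets relative to the start of each run removes the constant shift between recording and replay clocks, so only the jitter introduced by replay shows up in the standard deviation.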
Summary
New primitives:
• Selective, consistent, multi-granularity network recording
• Adaptive, coordinated, best-effort network replay
• Reproduce problems at a convenient time and place
• Combined in OFRewind, an OpenFlow-based tool for network record & replay: http://www.openflow.org/wk/index.php/OFRewind
• Enables practical record and replay of network domains
Future work
• Scale to larger topology sizes, more complex networks
• Extend to production quality tool
• Improve timing for very fast flow rates
• Automated regression tests through standard sets of traces
Thank you.