SCREAM: Sketch Resource Allocation for Software-defined Measurement
Masoud Moshref, Minlan Yu,
Ramesh Govindan, Amin Vahdat
(CoNEXT’15)
Measurement is Crucial for Network Management
Network management on multiple tenants:
• Accounting
• Anomaly detection
• Traffic engineering
• DDoS detection
Measurement tasks:
• Heavy hitter detection (HH)
• Hierarchical heavy hitter detection (HHH)
• Change detection
• Super source detection (SSD)
Need fine-grained visibility of network traffic
Software Defined Measurement
[Figure: A controller running DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15] hosts Task 1 and Task 2. It configures Task 1 and Task 2 counters on Switch A and Switch B and collects them.]
Our Focus: Sketch-based Measurement
Sketches: summaries of streaming data to approximately answer specific queries (e.g., a bitmap for counting unique items)

            OpenFlow counters                 Sketches
            (DREAM [SIGCOMM’14])              (SCREAM [CoNEXT’15])
Memory      Expensive, power-hungry TCAM      Cheaper SRAM
Counters    Volume counters                   Volume and connection counters
Flows       Selected prefixes                 All traffic, all the time
Sketches use a cheaper memory and are more expressive
Sketch Example: Count-Min Sketch
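[Figure: A Count-Min sketch with d rows; row i is indexed by hash function hi]
At packet arrival: (IP, 1 Kbytes) is hashed by h1(IP), h2(IP), h3(IP) to one counter per row, and each counter is incremented: 2+1=3, 4+1=5, 1+1=2
At query: What is the traffic size of IP? = row with min collision = Min(3,5,2) = 2
Resource accuracy trade-off: Provable error bound given traffic properties (e.g., skew)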
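The update and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not SCREAM's implementation: the class name, the SHA-256-based row hashes, and the example IPs are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch with `depth` rows of `width` counters each."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # One hash function per row, derived by salting the key with the row number
        # (an illustrative choice, not taken from the slides).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key: str, amount: int = 1) -> None:
        # At packet arrival: add the packet size to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += amount

    def query(self, key: str) -> int:
        # At query: the row with the least collision gives the tightest estimate.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


sketch = CountMinSketch(depth=3, width=64)
sketch.update("10.0.0.1", 1)  # (IP, 1 Kbytes)
sketch.update("10.0.0.2", 4)
print(sketch.query("10.0.0.1"))  # never below the true value of 1
```

Because counters only grow, the estimate can exceed the true size when other keys collide in every row, but it can never fall below it.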
Challenges: Limited Counters for Many Tasks
Many task instances:
• 3 types (Heavy hitter, Hierarchical heavy hitter, Super source)
• Different flow aggregates (Rack, App, Src/Dst/Port)
• 1000s of tenants
Limited shared resources:
• SRAM capacity (e.g., 128 MB)
• Shared with other functions (e.g., routing)
Too many resources to guarantee accuracy: 1 MB-32 MB per task
• Less than 4-128 tasks in SRAM
Goal: Many Accurate Sketch-based Measurements
Users dynamically instantiate a variety of measurement tasks
SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy
Approach: Dynamic Resource Allocation
Resource accuracy trade-off depends on traffic
Dynamic allocation for current traffic
Worst-case allocation uses >10x more counters than the average case
Count-Min: Provable error bound given traffic properties
Ex: Skew of traffic from each IP
[Figure: Required memory vs. skew]
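For reference, the standard worst-case Count-Min guarantee can be written as below. It comes from the general Count-Min literature rather than from these slides, and the symbols are assumptions for this note: a_i is the true size of item i, \hat{a}_i its estimate, \lVert a \rVert_1 the total traffic, w the row width, and d the number of rows.

```latex
a_i \;\le\; \hat{a}_i \;\le\; a_i + \varepsilon \lVert a \rVert_1
\quad \text{with probability at least } 1 - \delta,
\qquad
w = \left\lceil \frac{e}{\varepsilon} \right\rceil,
\quad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil
```

This bound holds for any traffic, which is why sizing a sketch for it tends to over-provision. When traffic is skewed, fewer counters are typically enough for the same error.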
Opportunity: Temporal Multiplexing
[Figure: Required memory over time for Task 1 and Task 2]
Multiplex memory among tasks over time
Memory requirement varies over time
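A small worked example shows why multiplexing over time saves memory. The per-interval requirements below are made-up numbers for illustration, not measurements from the paper.

```python
# Hypothetical per-interval memory requirements (KB) for two tasks.
task1_need = [40, 35, 10, 10, 15]
task2_need = [10, 15, 40, 45, 20]

# Static allocation must reserve each task's own peak for its whole lifetime.
static_total = max(task1_need) + max(task2_need)

# Dynamic allocation only needs to cover the largest combined demand at any one time.
dynamic_total = max(a + b for a, b in zip(task1_need, task2_need))

print(static_total, dynamic_total)  # 85 55
```

The gap between the two totals is the memory that temporal multiplexing frees up for additional tasks, provided the peaks of different tasks do not coincide.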
Opportunity: Spatial Multiplexing
[Figure: Required memory of Task 1 and Task 2 on Switch A and Switch B]
Memory requirement varies across switches
Multiplex memory among tasks across switches
Key Insight
Leverage spatial and temporal multiplexing
and dynamically allocate switch memory per task
to achieve sufficient accuracy for many tasks
• DREAM has the same insight
• SCREAM applies it for sketches
SCREAM Contributions
1- Supports 3 sketch-based task types:
• Heavy hitter (HH) tasks
• Hierarchical heavy hitter (HHH) tasks
• Super source (SSD) tasks
2- Dynamic resource allocator: allocates memory among sketch-based task instances across switches while maintaining sufficient accuracy
Applications using SCREAM:
• Anomaly detection
• Traffic engineering
• DDoS detection
SCREAM Iterative Workflow
[Figure: Workflow loop repeated every iteration]
1- Collect & report: counters from many switches
2- Estimate accuracy: accuracy of each task
3- Allocate resources: memory size for each task
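One round of the allocation step can be sketched as follows. This is only an illustration of the feedback idea; it is not the DREAM/SCREAM allocation algorithm. The function name, the fixed 4 KB step, and the donor selection rule are all assumptions, and the 80% bound is the accuracy bound used elsewhere in the talk.

```python
def reallocate(memory_kb, accuracy, bound=0.8, step_kb=4, min_kb=4):
    """One allocation round: move memory from satisfied tasks to unsatisfied ones."""
    new_alloc = dict(memory_kb)
    poor = [t for t in accuracy if accuracy[t] < bound]
    rich = [t for t in accuracy if accuracy[t] >= bound]
    for needy in poor:
        # Candidate donors are satisfied tasks that stay above the minimum size.
        donors = [t for t in rich if new_alloc[t] - step_kb >= min_kb]
        if not donors:
            break
        # Hypothetical rule: take from the task with the most accuracy headroom.
        donor = max(donors, key=lambda t: accuracy[t])
        new_alloc[donor] -= step_kb
        new_alloc[needy] += step_kb
    return new_alloc


allocation = {"task1": 30, "task2": 30}
estimated_accuracy = {"task1": 0.65, "task2": 0.95}
allocation = reallocate(allocation, estimated_accuracy)
print(allocation)  # {'task1': 34, 'task2': 26}
```

Running this once per iteration, after collecting counters and estimating accuracy, gives the loop described above: total memory stays fixed while it shifts toward tasks that currently miss the bound.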
[Figure: Accuracy and allocated memory (KB) over time (s) for Task 1 and Task 2]
Task 1 accuracy < 80%: give more memory to Task 1
[Figure: Precision and allocated memory (KB) over time (s) for Task 1 and Task 2]
Skew of traffic for Task 2 changes and Task 2 accuracy drops below 80%: give more memory to Task 2
Collect & report: merge counters from switches
SCREAM Challenges
Collect & report: Network-wide task implementation using sketches
Estimate accuracy: Accuracy estimation without the ground-truth
Allocate resources: Fast & stable allocation in DREAM [SIGCOMM’14]
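Challenge: Merge Sketches of Different Sizes
Network-wide task: Heavy hitter (HH), e.g., source IPs sending > 10 Mbps
[Figure: Switch A and Switch B hold sketches with the same depth d but different widths w1 and w2. A source counted as 10 at Switch A and 15 at Switch B has a network-wide total of 25, which is compared against the threshold (≥).]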
SCREAM Solution to Merge Sketches for HH Detection
[Figure: Counters of one item in each row. Switch A: 10, 30, 70. Switch B: 40, 50, 20.]
Previous work: Min of sums = Min(10+40, 30+50, 70+20) = Min(50, 80, 90) = 50
SCREAM: Sum of mins = Min(10, 30, 70) + Min(40, 50, 20) = 10 + 20 = 30
Both over-approximate; the smaller estimate is more accurate
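The two merge rules can be checked with the counter values shown on this slide. The variable names are illustrative; each list holds the counters that one item hashes to, one per Count-Min row.

```python
# Counters that one source IP hashes to, one per Count-Min row (values from the slide).
switch_a = [10, 30, 70]
switch_b = [40, 50, 20]

# Previous work: add the sketches row by row, then take the minimum.
min_of_sums = min(a + b for a, b in zip(switch_a, switch_b))

# SCREAM: take each switch's own minimum, then add the per-switch estimates.
sum_of_mins = min(switch_a) + min(switch_b)

print(min_of_sums, sum_of_mins)  # 50 30
```

Sum of mins can never exceed min of sums, because each per-switch minimum is at most that switch's counter in any given row. It also needs no row-by-row alignment of counters, so it still works when the two sketches have different widths.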
SCREAM Solutions
Collect & report: Network-wide task implementation using sketches
• Merge sketches of different sizes for HH, HHH, SSD
• SSD algorithm with higher and more stable accuracy
Estimate accuracy: Accuracy estimation without the ground-truth
Allocate resources: Fast & stable allocation in DREAM [SIGCOMM’14]
Precision Estimation for Heavy Hitter Detection
[Figure: Estimated and real sizes of a true HH and a false HH relative to the threshold; Error is compared against Estimate-Threshold]
Precision = True detected HHs / Detected HHs = Sum(P[Detected HH is true]) / Detected HHs
Insight: Relate probability to Error on counters of detected HHs
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
Precision Estimation Step 1: Find a Bound on The Error
Insight: Relate probability to Error on counters of detected HHs
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
Idea 1: Use the average Error in Markov’s inequality to bound P[Error ≥ Estimate-Threshold]
[Figure: A row in Count-Min]
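Idea 1 can be sketched as below, under stated assumptions. Markov's inequality gives P[Error ≥ x] ≤ E[Error] / x for x > 0, so each detected heavy hitter is true with probability at least 1 - E[Error] / (Estimate-Threshold). The function name is hypothetical, and the average error is passed in as an input because the slides do not show how SCREAM computes it.

```python
def estimate_precision(estimates, threshold, avg_error):
    """Conservative precision estimate for detected heavy hitters, without ground truth."""
    detected = [e for e in estimates if e >= threshold]
    if not detected:
        return 1.0  # assumption: no detections means no false positives
    total = 0.0
    for estimate in detected:
        margin = estimate - threshold
        # Markov bound on P[Error >= margin], capped at 1; useless when margin is 0.
        bound = min(1.0, avg_error / margin) if margin > 0 else 1.0
        total += 1.0 - bound
    return total / len(detected)


precision = estimate_precision([30, 12, 11, 4], threshold=10, avg_error=1.0)
print(round(precision, 3))
```

Items far above the threshold contribute a probability close to 1, while items that barely cross it contribute little, so the estimate drops when many detections sit near the threshold.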
Precision Estimation Step 2: Improve The Bound
Insight:
• Average Error = heavy items collision + small items collision
• Counter indices of detected HHs show heavy collisions
Idea 2: Apply Markov’s inequality only to small items
SCREAM Solutions
Collect & report: Network-wide task implementation using sketches
• Merge sketches of different sizes for HH, HHH, SSD
• SSD algorithm with higher and more stable accuracy
Estimate accuracy: Accuracy estimation without the ground-truth
• Precision estimators for HH, HHH and SSD tasks
Allocate resources: Fast & stable allocation in DREAM [SIGCOMM’14]
Evaluation
Metrics:
• Satisfaction of a task: Fraction of task’s lifetime with sufficient accuracy
• % of rejected tasks
Alternatives:
• OpenSketch: Allocate for bounded error for worst-case traffic at task instantiation (test with different bounds)
• Oracle: Knows required resource for a task in each switch in advance
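The satisfaction metric can be computed as in the short sketch below. It assumes a task's lifetime is divided into equal measurement epochs with one accuracy value per epoch; the function name and the sample values are illustrative.

```python
def satisfaction(accuracy_per_epoch, bound=0.8):
    """Fraction of a task's lifetime (in epochs) with accuracy at or above the bound."""
    if not accuracy_per_epoch:
        return 0.0
    good = sum(1 for acc in accuracy_per_epoch if acc >= bound)
    return good / len(accuracy_per_epoch)


print(satisfaction([0.9, 0.7, 0.85, 0.95]))  # 0.75
```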
Evaluation Setting
Simulation for 8 switches:
• 256 task instances (HH, HHH, SSD, combination)
• Accuracy bound = 80%
• 5 min tasks arriving in 20 minutes
• 2 hours CAIDA trace
SCREAM Provides High Accuracy for More Tasks
[Figure: Rejected tasks (%) and average satisfaction vs. switch capacity (128-512 KB) for OS_10, OS_50, OS_90, and SCREAM]
SCREAM: High satisfaction and low rejection
OpenSketch:
• Loose bound → under-provisioning → low satisfaction
• Tight bound → over-provisioning → high rejection
SCREAM’s Performance Is Close to An Oracle
[Figure: Rejected tasks (%) and average satisfaction vs. switch capacity (128-512 KB) for Oracle and SCREAM]
SCREAM’s performance is close to an oracle’s; its satisfaction is slightly lower because:
• Iterative allocation takes time
• Accuracy estimation has error
Other Evaluations
Accuracy estimation error: SCREAM accuracy estimation has 5% error on average
Changing traffic skew: SCREAM supports more accurate tasks than OpenSketch
Other accuracy metrics: Tasks in SCREAM have high recall (low false negative)
Conclusion
Practical sketch-based SDM by dynamic memory allocation
• Implementing network-wide tasks using sketches
• Estimating accuracy for 3 types of tasks
SCREAM is available at github.com/USC-NSL/SCREAM
Measurement is crucial for SDN management in a resource-constrained environment
Thanks! Questions?