8/6/2019 Enabling Flow-Level Latency Measurements Across Routers in Data Centers_ppt
Enabling Flow-level Latency
Measurements across Routers in Data
Centers
Parmjeet Singh, Myungjin Lee
Sagar Kumar, Ramana Rao Kompella
Latency-critical applications in data centers
Guaranteeing low end-to-end latency is important:
Web search (e.g., Google's instant search service)
Retail advertising
Recommendation systems
High-frequency trading in financial data centers
Operators want to troubleshoot latency anomalies
End-host latencies can be monitored locally
Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements
Prior solutions
Lossy Difference Aggregator (LDA), Kompella et al. [SIGCOMM '09]
Aggregate latency statistics
Reference Latency Interpolation (RLI), Lee et al. [SIGCOMM '10]
Per-flow latency measurements
More suitable here due to its more fine-grained measurements
Deployment scenario of RLI
Upgrading all switches/routers in a data center network
Pros: provides the finest granularity of latency anomaly localization
Cons: significant deployment cost; possible downtime of the entire production data center
In this work, we consider partial deployment of RLI
Our approach: RLI across Routers (RLIR)
Overview of RLI architecture
Goal
Latency statistics on a per-flow basis between interfaces
Problem setting
Storing a timestamp for each packet at ingress and egress is infeasible due to high storage and communication cost
Regular packets do not carry timestamps
[Figure: a router with ingress interface I and egress interface E]
Overview of RLI architecture
Premise of RLI: delay locality
Approach
1) The injector sends reference packets regularly
2) Reference packet carries ingress timestamp
3) Linear interpolation: compute per-packet latency estimates at the latency estimator
4) Per-flow estimates by aggregating per-packet estimates
[Figure: reference packets (R) injected at ingress I arrive at egress E among regular packets (L); the latency estimator draws a linear interpolation line between the delays of consecutive reference packets and reads off each regular packet's interpolated delay]
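The interpolation and aggregation steps above can be sketched as follows; this is a minimal illustrative sketch with assumed names, not the authors' implementation:

```python
from collections import defaultdict

def interpolate_delay(ref_before, ref_after, t_egress):
    """Estimate a regular packet's delay by linear interpolation
    between the delays (d1, d2) of the two reference packets that
    bracket its egress time t_egress."""
    (t1, d1), (t2, d2) = ref_before, ref_after
    if t2 == t1:
        return d1
    frac = (t_egress - t1) / (t2 - t1)
    return d1 + frac * (d2 - d1)

# Per-flow aggregation of per-packet estimates: (sum, count) per flow.
flow_stats = defaultdict(lambda: [0.0, 0])

def record(flow_id, estimate):
    flow_stats[flow_id][0] += estimate
    flow_stats[flow_id][1] += 1

def mean_flow_delay(flow_id):
    total, n = flow_stats[flow_id]
    return total / n if n else None
```

A packet leaving halfway between two reference packets with delays 1 ms and 3 ms would get an estimate of 2 ms.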
Full vs. Partial deployment
Full deployment: 16 RLI sender-receiver pairs
Partial deployment: 4 RLI senders + 2 RLI receivers
81.25% deployment cost reduction
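The 81.25% figure follows if each sender-receiver pair is counted as two instrumented elements (an assumption consistent with the slide's numbers):

```python
full = 16 * 2    # 16 RLI sender-receiver pairs -> 32 instrumented elements
partial = 4 + 2  # 4 RLI senders + 2 RLI receivers
reduction = 1 - partial / full
print(f"{reduction:.2%}")  # prints 81.25%
```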
[Figure: six-switch topology; legend: RLI Sender (Reference Packet Injector), RLI Receiver (Latency Estimator)]
Case 1: Presence of cross traffic
Issue: inaccurate link utilization estimation at the sender leads to an excessive reference packet injection rate
Approach: do not actively address the issue
Evaluation shows little impact on the packet loss rate
Details in the paper
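For intuition, a toy sketch of why underestimating utilization inflates the injection rate; the policy and parameters are illustrative assumptions, not RLI's actual controller:

```python
def injection_rate(est_utilization, max_rate=0.10, min_rate=0.01):
    """Toy adaptive policy: inject reference packets aggressively on
    lightly loaded links, backing off as estimated utilization rises.
    (Illustrative parameters; not the scheme from the RLI paper.)"""
    rate = max_rate * (1.0 - est_utilization)
    return max(min_rate, min(max_rate, rate))

# Cross traffic invisible to the sender lowers its utilization
# estimate, so it injects faster than the bottleneck link warrants.
true_util, seen_util = 0.93, 0.50
assert injection_rate(seen_util) > injection_rate(true_util)
```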
[Figure: six-switch topology; cross traffic shares the bottleneck link, distorting link utilization estimation at Switch 1; legend: RLI Sender (Reference Packet Injector), RLI Receiver (Latency Estimator)]
Case 2: RLI Sender side
Issue: traffic may take different routes at an intermediate switch
Approach: the sender sends reference packets to all receivers
[Figure: six-switch topology; legend: RLI Sender (Reference Packet Injector), RLI Receiver (Latency Estimator)]
Case 3: RLI Receiver side
Issue: hard to associate reference packets with regular packets that traversed the same path
Approaches:
Packet marking: requires native support from routers
Reverse ECMP computation: reverse-engineer intermediate routes using the ECMP hash function
IP prefix matching: applicable only in limited situations
[Figure: six-switch topology; legend: RLI Sender (Reference Packet Injector), RLI Receiver (Latency Estimator)]
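The reverse-ECMP idea can be sketched as below; the hash function (CRC32) and the five-tuple key are illustrative assumptions, since real switches use vendor-specific hashes:

```python
import zlib

def ecmp_next_hop(five_tuple, next_hops):
    """Pick a next hop by hashing the flow five-tuple, as an ECMP
    switch would (illustrative hash; real switches differ)."""
    key = ",".join(map(str, five_tuple)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

def same_intermediate_path(ref_tuple, pkt_tuple, next_hops):
    """A receiver associates a reference packet with a regular packet
    only if reverse-computing the ECMP choice maps both flows onto
    the same intermediate switch."""
    return ecmp_next_hop(ref_tuple, next_hops) == ecmp_next_hop(pkt_tuple, next_hops)
```

Because the hash is deterministic, the receiver can replay it offline to recover which intermediate switch a given flow traversed.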
Deployment example in fat-tree topology
[Figure: fat-tree topology with RLI senders (reference packet injectors) and RLI receivers (latency estimators); receivers use IP prefix matching at some positions, and reverse ECMP computation / IP prefix matching at others]
Evaluation
Simulation setup
Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)
Simulator
Results: accuracy of per-flow latency estimates
[Figure: simulation setup; a traffic divider splits the packet trace into regular traffic through the RLI sender and cross traffic through a cross-traffic injector between Switch 1 and Switch 2; reference packets are injected at a 10% or 1% rate toward the RLI receiver]
Accuracy of per-flow latency estimates
[Figure: CDF of relative error of per-flow latency estimates at 10% and 1% injection rates (bottleneck link utilization: 93%); annotated values include 1.2%, 4.5%, 18%, 31%, and 67%]
Summary
Low-latency applications in data centers: localization of latency anomalies is important
RLI provides flow-level latency statistics, but full deployment (i.e., at all routers/switches) is expensive
Proposed a solution enabling partial deployment of RLI
Little loss in localization granularity (i.e., localization to every other router)
Thank you! Questions?