RobinHood: Tail Latency-Aware Caching
Dynamically Reallocating from Cache-Rich to Cache-Poor
Daniel S. Berger, Benjamin Berg (Carnegie Mellon University), Timothy Zhu (Pennsylvania State University),
Siddhartha Sen (Microsoft Research), Mor Harchol-Balter (Carnegie Mellon University)
USENIX OSDI, 10/8/18.
Typical Web Architecture
[Diagram: a user request arrives at the aggregation server, which sends backend queries to Recom., Products, Ads, ...; plot axis: query latency]
Request latency = max of query latencies
Goal: minimize 99th-percentile (P99) request latency
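Because a request finishes only when its slowest backend query returns, tail latency compounds with fan-out. The following is a minimal sketch, assuming for illustration that queries are independent and that each one is slow with the same probability. The function names and all numbers are hypothetical and do not come from the talk.

```python
def request_latency(query_latencies_ms):
    """A request completes only when its slowest backend query returns."""
    return max(query_latencies_ms)


def prob_request_slow(p_query_slow, fanout):
    """Probability that at least one of `fanout` queries is slow.

    Assumes independent queries, each slow with probability
    `p_query_slow` (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_query_slow) ** fanout


# Three backend queries: the 120 ms query determines the request latency.
print(request_latency([3, 120, 8]))

# If each query is slow 1% of the time, a request that issues 10 queries
# is slow far more often than 1% of the time.
print(round(prob_request_slow(0.01, 10), 3))
```

Under these assumptions, a backend whose own P99 looks acceptable can still dominate the request P99 once many queries are combined.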
What Causes High P99 Request Latency?
[Diagram: user request → aggregation server → backend queries to Recom., Products, Ads, ...]
Observations at xbox.com (3/2018):
[Plot: query latency and request latency]
Better load balancing? Elastically scale backends?
Partially implemented
What Else Can We Do?
[Diagram: user request → aggregation server with cache → backend queries to Recom., Products, Ads, ...]
Observations at xbox.com (3/2018):
[Plot: query latency and request latency]
Aggregation cache: currently shared among queries to all backends
Can we use the aggregation cache to reduce the P99 request latency?
Can We Use Caching to Reduce the P99?
State-of-the-art caching systems focus on hit ratio and fairness, not the P99
“Caching layers do not directly address tail latency, aside from configurations where the entire working set can reside in a cache.”
Belief: No
[Diagram: cache in front of backend B; a hit takes 1ms, a miss takes 100ms]
90% hits, 10% misses: P99 = 100ms
95% hits, 5% misses: P99 = 100ms
But: latency is not a constant
[Diagram: cache in front of backend B; a hit takes 1ms, backend latency is 50ms (85%) or 500ms (15%)]
90% hits, 10% misses: P99 = 500ms
95% hits, 5% misses: P99 = 50ms
Caching can reduce P99 request latency!
Effectiveness in web architecture?
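The numbers on these two slides can be checked with a short calculation. The sketch below assumes one reading of the diagram: a cache hit takes 1ms, and a miss pays the backend latency, which is either a constant 100ms or 50ms for 85% of queries and 500ms for 15%. The helper names `percentile` and `cached_latency` are hypothetical.

```python
def percentile(dist, q):
    """Smallest latency x with P(latency <= x) >= q.

    `dist` is a discrete distribution: a list of (latency_ms, probability).
    """
    cumulative = 0.0
    for latency, prob in sorted(dist):
        cumulative += prob
        if cumulative >= q - 1e-12:  # small tolerance for float rounding
            return latency
    return max(latency for latency, _ in dist)


def cached_latency(hit_ratio, hit_ms, backend_dist):
    """Latency distribution seen through a cache with the given hit ratio."""
    miss_ratio = 1.0 - hit_ratio
    return [(hit_ms, hit_ratio)] + [
        (latency, miss_ratio * prob) for latency, prob in backend_dist
    ]


constant_backend = [(100, 1.0)]              # the "belief" slide
variable_backend = [(50, 0.85), (500, 0.15)]  # latency is not a constant

for name, backend in [("constant", constant_backend),
                      ("variable", variable_backend)]:
    for hit_ratio in (0.90, 0.95):
        p99 = percentile(cached_latency(hit_ratio, 1, backend), 0.99)
        print(f"{name} backend, {hit_ratio:.0%} hits: P99 = {p99}ms")
```

With a constant backend latency the P99 stays at 100ms for both hit ratios, because more than 1% of requests still miss. With the variable backend, raising the hit ratio from 90% to 95% pushes the fraction of 500ms requests from 1.5% to 0.75%, which is below the 1% tail, so the P99 drops from 500ms to 50ms.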
RobinHood: Key Idea
Observations for xbox.com (3/2018), during load spike:
[Diagram: user request → aggregation server with cache → backend queries to Recom., Products, Ads, ...]
RobinHood: more cache ⇒ less load ⇒ much lower P99
Dynamic Cache Partitions
[Diagram, repeated across build slides: the aggregation cache (RobinHood Cache) is divided into partitions for Products, Recom., and Ads]
The RobinHood Caching System
- Scalable in # backends, # aggregation servers
- Dynamically partition the aggregation cache
- First caching system to minimize request P99
- Deployable on off-the-shelf software stack
How to Repartition the Cache?
Every 5 seconds: RobinHood taxes every backend 1% of its cache space
How to redistribute the tax?
First idea: give cache to high-latency backends
[Diagram: user requests and the cache, split 99.5% / 0.5%, with one backend marked as high latency]
Recall: not all requests are the same
Small effect on request P99
RobinHood: find the cause of high request P99
[Diagram: request latencies ordered from P0 to P100, with P99 marked]
Who “blocked” this request? Who “blocked” these requests?
⇒ Track “request blocking count” (RBC) for each backend
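The tax-and-redistribute step can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes that the "blocking" backend of a request is the one with the slowest query, that RBC is counted over requests in a window around the P99 (here P98.5 to P99.5), and that the pooled tax is handed out in proportion to RBC. The function names, the window bounds, and the data layout are hypothetical.

```python
from collections import Counter


def request_blocking_counts(requests, lo=0.985, hi=0.995):
    """Count, per backend, how often it blocked a request near the P99.

    `requests` is a list of dicts mapping backend name -> query latency (ms).
    The window [lo, hi) around the P99 is an assumed parameter.
    """
    ranked = sorted(requests, key=lambda queries: max(queries.values()))
    n = len(ranked)
    window = ranked[round(lo * n):round(hi * n)]
    rbc = Counter()
    for queries in window:
        # The backend with the slowest query "blocked" this request.
        rbc[max(queries, key=queries.get)] += 1
    return rbc


def reallocate(alloc, rbc, tax=0.01):
    """Tax every backend `tax` of its cache, redistribute in proportion to RBC."""
    pool = sum(size * tax for size in alloc.values())
    total = sum(rbc.values())
    new_alloc = {}
    for backend, size in alloc.items():
        # With no RBC data at all, fall back to an even split of the pool.
        share = rbc.get(backend, 0) / total if total else 1 / len(alloc)
        new_alloc[backend] = size * (1 - tax) + pool * share
    return new_alloc


alloc = {"products": 100.0, "recom": 100.0, "ads": 100.0}
print(reallocate(alloc, {"products": 3, "recom": 1}))
```

In this example every backend gives up 1% of its cache, "products" receives three quarters of the pool, "recom" one quarter, and "ads" nothing, so the total cache size is unchanged. Repeating the step every 5 seconds would shift cache gradually toward the backends that block tail requests.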
RobinHood Architecture
[Diagram: aggregation server with cache and RH-control, in front of the backends]
RobinHood Controller (RH-control)
- ingests RBC
- calculates / enforces cache allocation
- not on request path
In practice many Ag. servers
⇒ One RH-control per Ag. server
[Diagram: several Ag. servers, each with its own cache and RH-control, in front of the backends]
Distributed RobinHood:
- Local decisions
- Local measurements
- Pooled measurements
Challenge: insufficient # tail data points
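Pooling addresses the shortage of tail data points on any single aggregation server. A minimal sketch, assuming each controller reports its local RBC as a mapping from backend name to count; the function name `pool_rbc` and the report format are hypothetical:

```python
from collections import Counter


def pool_rbc(local_counts):
    """Sum per-backend RBC counts reported by several aggregation servers."""
    pooled = Counter()
    for counts in local_counts:
        pooled.update(counts)  # Counter.update adds counts
    return pooled


# Each server sees only a handful of tail requests on its own.
reports = [{"ads": 1}, {"ads": 2, "products": 1}, {}]
print(pool_rbc(reports))
```

Each controller could then make its local allocation decision from the pooled counts rather than from its own sparse sample.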
Experimental Setup
[Diagram: request generator → 16x Ag. servers, each with cache and RH-control → 20x backends]
Request generator: replay production trace for 4 hours, 200k queries/second
32 GB cache size
Backends: MySQL (I/O bound), Matrix Multiply (CPU bound), K-V Store (CPU bound)
⇒ Emulate query latency spikes
Evaluation Results: P99 Request Latency
[Plot: request P99 latency [ms] for each system]
- RobinHood [our proposal]
- Balance Query Latencies [Hyperbolic, ATC’17]
- Original MS System [OneRF]
- Maximize Overall Hit Ratio [Cliffhanger, NSDI’16]
What Makes RobinHood so Effective?
[Plot: request P99 latency [ms], RobinHood [our proposal] vs. Original MS System [OneRF]]
The RobinHood tradeoff:
- Sacrifice performance of some backends → up to 2.5x higher latency
- Reduce latency of bottleneck backends → typically 4x lower latency
⇒ Reduced request latency
Conclusions
Is it possible to use caches to improve the request P99?
Yes! Huge reduction in P99 spikes and SLO violations. ⇒ Use the cache as a load balancer: “RBC load metric”.
Feasibility in production systems?
Yes! Built using off-the-shelf software stack. Works orthogonally to existing load balancing and data/quality tradeoffs.
Is this the optimal solution? End of this project?
No! There’s a lot to do. Need to consider the effect of other request structures.
Poster #31