Date posted: 17-Dec-2014
Category: Technology
Uploaded by: foundationdb
Load balancing: theory and practice
Welcome
Me:
• Dave Rosenthal
• Co-founder of FoundationDB
• Spent the last three years building a distributed transactional NoSQL database
• It's my birthday
Any time you have multiple computers working on a job, you have a load balancing problem!
Warning
There is an ugly downside to learning about load balancing: TSA checkpoints, grocery store lines, and traffic lights may become even more frustrating.
What is load balancing?
Wikipedia: “…methodology to distribute workload across multiple computers … to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload”
All part of the latency curve
The latency curve
[Chart: latency vs. jobs/second, with regions labeled Nominal, Interesting, Saturation, and Overload]
Goal for real-time systems
[Chart: latency vs. jobs/second; goal: low latency at a given load]
Goal for batch systems
[Chart: latency vs. jobs/second; goal: high jobs/second at a reasonable latency]
The latency curve
[Chart: latency (ms) vs. load from 0 to 1]
Better load balancing strategies can dramatically improve both latency and throughput
Load balancing tensions
• We want shorter queues in the system, to yield better latency
• We want longer queues to keep a "buffer" of work for irregular traffic, to yield better throughput
• For distributed systems, equalizing queue lengths across nodes sounds good
Can we just limit queue sizes?
[Chart: % of dropped jobs vs. queued job limit]
Simple strategies
• Global job queue: for slow tasks
• Round robin: for highly uniform situations
• Random: probably won't screw you
• Sticky: for cacheable situations
• Fastest of N tries: trades throughput for latency. I recommend N = 2 or 3.
Use a global queue if possible
[Chart: latency under 80% load vs. cluster size, random assignment vs. global job queue]
Options for information transfer
• None (rare)
• Latency (most common)
• Failure detection
• Explicit:
– Load average
– Queue length
– Response times
FoundationDB’s approach
1. Request sent to a random one of three servers
2. The server either answers the query or replies "busy" if its queue is longer than its queue limit estimate
3. Queries that came back "busy" are sent to a second random server with a "must do" flag set

Queue limit = 25 * 2^(20*P)
• A global queue limit is implicitly shared by estimating the fraction (P) of incoming requests that are flagged "must do"
• Converges to a P(redirect)/queue-size equilibrium
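As a sketch only (class and function names are mine, not FoundationDB's actual code), the three-step protocol and the shared queue limit above might look like:

```python
import random

# Illustrative sketch of the redirect protocol described above.
# All names are hypothetical, not FoundationDB's real API.

class Server:
    def __init__(self):
        self.queue = []

    def handle(self, job, queue_limit, must_do=False):
        # A "must do" request is always accepted; otherwise the server
        # replies "busy" when its queue exceeds the shared limit estimate.
        if not must_do and len(self.queue) > queue_limit:
            return "busy"
        self.queue.append(job)
        return "accepted"

def submit(job, servers, p_estimate):
    # Shared queue limit from the slide: 25 * 2^(20*P), where P is the
    # estimated fraction of requests that needed the "must do" retry.
    queue_limit = 25 * 2 ** (20 * p_estimate)
    if random.choice(servers).handle(job, queue_limit) == "accepted":
        return False                  # answered on the first try
    random.choice(servers).handle(job, queue_limit, must_do=True)
    return True                       # redirected; this feeds the P estimate
```

Note the feedback loop: more redirects raise P, which raises the limit exponentially, which in turn reduces redirects, giving the equilibrium the slide describes.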
FDB latency curve before/after
[Charts: latency vs. operations per second, before and after]
Tackling load balancing
• Queuing theory: one useful insight
• Simulation: do this
• Instrumentation: do this
• Control theory: know how to avoid this
• Operations research: read about this for fun
– Blackett: shield planes where they are not shot!
The one insight: Little’s law
Q = R*W
• (Q)ueue size = (R)ate * (W)ait-time
• Q is the average number of jobs in the system
• R is the average arrival rate (jobs/second)
• W is the average wait time (seconds)
• Holds for any (!) steady-state system
– Or sub-systems, or joint systems, or…
Little’s law example 1
Q = R*W
• We get 1,000,000 requests per second (R = 1E6)
• We take 100 ms to service each request
• Q = 1E6 * 0.100
• Little's law: average queue depth is 100,000!
Little’s law example 2
W = Q/R
• We have 100 users in the system making continuous requests (Q = 100)
• We get 10,000 requests per second
• W = 100 / 10,000
• Little's law: average wait time is 10 ms
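Both worked examples check out numerically:

```python
# Little's law, Q = R * W, applied to both examples above.

R = 1_000_000                        # example 1: requests per second
W = 0.100                            # 100 ms service time
assert round(R * W) == 100_000       # average queue depth: 100,000

Q = 100                              # example 2: users in the system
R2 = 10_000                          # requests per second
assert abs(Q / R2 - 0.010) < 1e-12   # average wait: 10 ms
```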
Little’s law ramifications
Q = R*W
• In a distributed system:
– R scales up
– W remains the same, or gets a bit worse
• To maintain performance, you're going to need a whole lot of jobs in flight
The rest of queuing theory
Erlang:
• A language
• A man (Agner Krarup Erlang)
• And a unit! (Q from Little's law, AKA offered load, is measured in dimensionless erlang units)
• Erlang-B formula (blocking probability for limited-length queues)
• Erlang-C formula (P(waiting))
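For the curious, the Erlang-B formula has a standard, numerically stable recurrence that is easy to compute (a sketch; the function name is mine):

```python
# Erlang-B: blocking probability when `offered_load` erlangs are offered
# to `servers` servers with no waiting room, via the standard recurrence
#   B(E, 0) = 1,   B(E, m) = E*B(E, m-1) / (m + E*B(E, m-1))

def erlang_b(offered_load, servers):
    b = 1.0
    for m in range(1, servers + 1):
        b = offered_load * b / (m + offered_load * b)
    return b
```

For example, 2 erlangs offered to a single server gives a blocking probability of 2/3; adding servers drives it down fast.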
Abandon hope
[Chart: complexity of math vs. real-world applicability for queuing theory; Little's law is the lone useful outlier]
Simulation
The best way to explore distributed system behavior
Quiz
Model: jobs of random durations. 80% load.
Goal: minimize average job latency.

Which task should the server work a bit more on?
• First task received
• Last task received
• Shortest task
• Longest task
• Random task
• Task with least work remaining
• Task with most work remaining
Simulation code snippets
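The deck's code snippets did not survive extraction, so here is a minimal stand-in (all names are mine, not the talk's actual code): a single-server discrete-event simulation of the quiz model, comparing two of the policies above.

```python
import random

# Single server, Poisson arrivals at `load` jobs/sec, exponential
# service times averaging 1 sec. `policy` picks which queued job runs.

def simulate(policy, n_jobs=20_000, load=0.8, seed=1):
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(load)
        arrivals.append((t, rng.expovariate(1.0)))  # (arrival, duration)
    free_at, queue, latencies, i = 0.0, [], [], 0
    while len(latencies) < n_jobs:
        # admit everything that has arrived by the time the server frees
        while i < n_jobs and arrivals[i][0] <= free_at:
            queue.append(arrivals[i])
            i += 1
        if not queue:                   # server idle: jump to next arrival
            free_at = arrivals[i][0]
            continue
        job = policy(queue)             # the policy under test
        queue.remove(job)
        free_at += job[1]               # run it to completion
        latencies.append(free_at - job[0])
    return sum(latencies) / n_jobs

shortest_task = lambda q: min(q, key=lambda j: j[1])   # shortest task
first_received = lambda q: min(q, key=lambda j: j[0])  # first task received
```

In this toy model, working on the shortest task yields a lower average latency than first-come-first-served; the other policies from the quiz slot in as one-line lambdas.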
Simulation results at 80% load
[Bar chart: average latency (0–50) by strategy, from "task with most work remaining" through "first task received"]
Simulation results at 95% load
[Bar chart: average latency (log scale, 10–100,000) for the same strategies]
FoundationDB’s approach
• Strategy, validated using simulation, is used for a single server's fiber scheduling
• High priority: work on the next task to finish
• But be careful to enqueue incoming work from the network with the highest priority; we want to know about all our jobs to make good decisions
• Low priority: catch up with housekeeping (e.g. non-log writing)
Load spikes
[Charts: queue response to a burst of job requests, low-load system vs. high-load system]
Bursts of job requests can destroy latency. The effect is quadratic: a burst produces a queue of size B that lasts a time proportional to B. On highly loaded systems, the effect is multiplied by 1/(1-load), leading to huge latency impacts.
Burst-avoiding tip
1. Search for any delay/interval in your system
2. If system correctness depends on the delay/interval being exact, first fix that
3. Now change that delay/interval to randomly wait 0.8–1.2 times the nominal time on each execution
YMMV, but this tends to diffuse system events more evenly in time and help utilization and latency.
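The tip is one line of code. A sketch (the housekeeping task is hypothetical):

```python
import random

# Draw each wait uniformly from 0.8-1.2x the nominal interval
# instead of sleeping a fixed amount, per step 3 above.

def jittered_interval(nominal_seconds):
    return nominal_seconds * random.uniform(0.8, 1.2)

# A periodic task would then be scheduled roughly like:
#   while True:
#       do_housekeeping()                    # hypothetical task
#       time.sleep(jittered_interval(5.0))
```

Because each node drifts independently, periodic events that started in lockstep spread out over a few cycles instead of arriving as a synchronized burst.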
Overload
[Chart: latency vs. jobs/second, with the overload region highlighted]
Overload
What happens when work comes in too fast?
• Somewhere in your system a queue is going to get huge. Where?
• Lowered efficiency due to:
– Sloshing
– Poor caching
• Unconditional acceptance of new work means no information transfer to the previous system!
Overload (cont’d): Sloshing
Loading 10 million rows into a popular NoSQL K/V store shows sloshing:
[Chart: throughput sloshing back and forth over 12.5 minutes]
Overload (cont’d): No sloshing
Loading 10 million rows into FDB shows smooth behavior:
System queuing
[Diagram: incoming work items A–E being distributed across the queues of Nodes 1–3]
System queuing
[Diagram: item A lands in Node 1's queue; B–E still incoming]
Internal queue buildup
[Diagram: items A–D piled up in Node 1's queue while Nodes 2 and 3 sit empty]
Even queues, external buildup
[Diagram: one item in each node's queue (A–C), with the rest (D, E, …) waiting in an external queue at the front of the system]
Our approach
"Ratekeeper"
• Active management of internal queue sizes prevents sloshing
• Avoids every subcomponent needing its own well-tuned load balancing strategy
• Queue information is explicitly sent at 10 Hz back to a centrally elected control algorithm
• When queues get large, slow system input
• Pushes latency into an external queue at the front of the system using "tickets"
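A toy illustration of the idea, not FoundationDB's actual algorithm: a central controller sees the reported queue sizes and adjusts the admitted input rate. `TARGET_QUEUE` and the 1.05 ramp factor are invented for the sketch.

```python
# Hypothetical central controller, called on each 10 Hz report.

TARGET_QUEUE = 1000.0   # invented target depth for the sketch

def allowed_rate(current_rate, reported_queue_sizes):
    worst = max(reported_queue_sizes)
    if worst > TARGET_QUEUE:
        # Queues too deep: cut the admitted rate proportionally,
        # pushing the latency into the external ticket queue instead.
        return current_rate * TARGET_QUEUE / worst
    # Headroom available: ramp input back up gently.
    return current_rate * 1.05
```

The key design point survives even in the toy: slowing input at the front door keeps the internal queues bounded, so latency accumulates in one visible external queue rather than sloshing between subcomponents.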
Ratekeeper in action
[Chart: operations per second vs. time (0–600 seconds) with Ratekeeper active]
Ratekeeper internals
What can go wrong
Well, we are controlling the queue depths of the system, so, basically, everything in control theory…
Namely, oscillation:
Recognizing oscillation
• Something moving up and down :)
– Look for low utilization of parallel resources
– Zoom in!
• Think about sources of feedback: is there some way that a machine getting more work done feeds either less or more work to that machine in the future? (Probably yes.)
What oscillation looks like
[Chart: utilization % of Node A and Node B over time, oscillating out of phase]
What oscillation looks like
[Chart: the same utilization traces, zoomed in to t = 2.0–2.3 to reveal fast oscillation]
Avoiding oscillation
• This is control theory: avoid it if possible!
• The major thing to know: control gets harder as frequencies get higher (e.g. Bose headphones)
• Two strategies:
– Control on a longer time scale
– Introduce a low-pass filter in the control loop (e.g. an exponential moving average)
Instrumentation
If you can't measure it, you can't make it better.

Things that might be nice to measure:
• Latencies
• Queue lengths
• Causes of latency?
Measuring latencies
Our approach:
• We want information about the distribution, not just the average
• We use a "Distribution" class
– addSample(X)
– Stores 500+ samples
– Throws away half of them when it hits 1000 samples, and halves the probability of accepting new samples
– Also tracks exact min, max, mean, and stddev
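A reconstruction of that class from the description above (my sketch, not FoundationDB's actual code):

```python
import random

# Bounded reservoir of samples for distribution queries, plus exact
# summary statistics over everything ever added.

class Distribution:
    def __init__(self, limit=1000):
        self.limit = limit        # halve the reservoir at this size
        self.samples = []
        self.accept_p = 1.0       # probability of keeping a new sample
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.min = self.max = None

    def add_sample(self, x):
        # Exact min, max, mean, and stddev over *all* samples.
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.min = x if self.min is None else min(self.min, x)
        self.max = x if self.max is None else max(self.max, x)
        # Probabilistic reservoir: on hitting the limit, throw away half
        # and halve the acceptance probability for future samples.
        if random.random() < self.accept_p:
            self.samples.append(x)
            if len(self.samples) >= self.limit:
                self.samples = random.sample(self.samples, self.limit // 2)
                self.accept_p /= 2

    def mean(self):
        return self.total / self.count

    def stddev(self):
        m = self.mean()
        return max(self.total_sq / self.count - m * m, 0.0) ** 0.5
```

The reservoir stays between 500 and 1000 samples no matter how many are added, and every sample that was ever offered had (asymptotically) the same chance of being retained, so percentiles read off `samples` are unbiased.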
Measuring queue lengths
Our approach:
• Track the % of time that a queue is at zero length
• Measure queue length snapshots at intervals
• Watch out for oscillations
– Slow ones you can see
– Fast ones look like noise (which, unfortunately, is also what noise looks like)
– "Zoom in" to exclude the possibility of micro-oscillations
Measuring latency from blocking
• Easy to calculate:
– L = (b0² + b1² + … + bN²) / elapsed
– Total all squared seconds of blocking time over some interval, then divide by the duration of the interval
• Measures the impact of unavailability on mean latency from random traffic
• Example: is this server's slow latency explained by this lock?
• Doesn't count catch-up time
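The formula translates directly (the function name is mine):

```python
# Sum the squared blocking durations over an interval and divide by
# the interval's length, per the formula above.

def blocking_latency(blocking_durations, elapsed):
    return sum(b * b for b in blocking_durations) / elapsed
```

For instance, a single 1-second stall during a 10-second window contributes 0.1 s to mean latency, while ten separate 0.1-second stalls contribute only 0.01 s: the squaring is what makes rare long stalls dominate.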
Summary
Thanks for listening, and remember:
• Everything has a latency curve
• Little's law: Q = R*W
• Randomize regular intervals
• Validate designs with simulation
• Instrument

May your queues be small, but not empty.
Prioritization/QOS
• Can help in systems under partial load
• Vital in systems that handle batch and real-time loads simultaneously
• Be careful that high-priority work doesn't generate further high-priority work on top of other jobs in the queue; this can lead to poor utilization, analogous to the internal queue buildup case
Congestion pricing
• My favorite topic
• Priority isn't just a function of the benefit of your job
• To be a good citizen, you should subtract the costs to others
• For example, jumping to the front of a long queue has costs proportional to the queue size
Other FIFO alternatives?
• LIFO
– Avoids the reason to line up early
– In situations where there is adequate capacity to serve everyone, it can yield better waiting times for everyone involved