Load balancing theory and practice

Date posted: 17-Dec-2014

Presentation about load balancing by David Rosenthal, one of the founders of FoundationDB.
Page 1: Load balancing theory and practice

Load balancing theory and practice

Page 2: Load balancing theory and practice

Welcome

Me:
• Dave Rosenthal
• Co-founder of FoundationDB
• Spent the last three years building a distributed transactional NoSQL database
• It’s my birthday

Any time you have multiple computers working on a job, you have a load balancing problem!

Page 3: Load balancing theory and practice

Warning

There is an ugly downside to learning about load balancing: TSA checkpoints, grocery store lines, and traffic lights may become even more frustrating.

Page 4: Load balancing theory and practice

What is load balancing?

Wikipedia: “…methodology to distribute workload across multiple computers … to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload”

All part of the latency curve

Page 5: Load balancing theory and practice

The latency curve

[Chart: latency (log scale, 1-10,000) vs. jobs/second, marking the nominal, interesting, saturation, and overload regions of the curve]

Page 6: Load balancing theory and practice

Goal for real-time systems

[Chart: latency vs. jobs/second, highlighting low latency at a given load]

Page 7: Load balancing theory and practice

Goal for batch systems

[Chart: latency vs. jobs/second, highlighting high jobs/sec at a reasonable latency]

Page 8: Load balancing theory and practice

The latency curve

[Chart: latency (ms, log scale) vs. load from 0 to 1]

Better load balancing strategies can dramatically improve both latency and throughput

Page 9: Load balancing theory and practice

Load balancing tensions

• We want to reduce queue lengths in the system to yield better latency

• We want to lengthen queue lengths to keep a “buffer” of work to keep busy during irregular traffic and yield better throughput

• For distributed systems, equalizing queue lengths sounds good

Page 10: Load balancing theory and practice

Can we just limit queue sizes?

[Chart: % of dropped jobs (0-40) vs. queued job limit (0-20)]

Page 11: Load balancing theory and practice

Simple strategies

• Global job queue: for slow tasks
• Round robin: for highly uniform situations
• Random: probably won’t screw you
• Sticky: for cacheable situations
• Fastest of N tries: trades throughput for latency. I recommend N = 2 or 3.
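The “fastest of N” idea can be sketched in a few lines. This is an illustrative toy, not the talk’s code: the Server class and its linear cost model are assumptions. The request goes to N randomly chosen servers and the fastest reply wins.

```python
import random

class Server:
    """Toy server whose response time grows with its queue length."""
    def __init__(self, queue_len):
        self.queue_len = queue_len

    def response_time(self):
        # Assume 1 ms of service per queued job, plus the new job itself
        return (self.queue_len + 1) * 1.0

def fastest_of_n(servers, n=2):
    """Send the request to n randomly chosen servers and keep the
    fastest reply, modeled here as the minimum response time. We pay
    for n times the work, but latency becomes the best of n draws."""
    return min(s.response_time() for s in random.sample(servers, n))

servers = [Server(q) for q in (0, 3, 7, 12)]
latency_ms = fastest_of_n(servers, n=2)
```

With n equal to the cluster size this always finds the idlest server; the recommendation of N = 2 or 3 captures most of that benefit at a fraction of the cost.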

Page 12: Load balancing theory and practice

Use a global queue if possible

[Chart: latency under 80% load (log scale) vs. cluster size (1-10), comparing random assignment to a global job queue]

Page 13: Load balancing theory and practice

Options for information transfer

• None (rare)
• Latency (most common)
• Failure detection
• Explicit
– Load average
– Queue length
– Response times

Page 14: Load balancing theory and practice

FoundationDB’s approach

1. Send the request to one of three random servers
2. The server either answers the query or replies “busy” if its queue is longer than the queue limit estimate
3. Queries that came back “busy” are sent to a second random server with the “must do” flag set

Queue limit = 25 * 2^(20*P)
• A global queue limit is implicitly shared by estimating the fraction of incoming requests (P) that are flagged “must do”
• Converges to a P(redirect)/queue-size equilibrium
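The queue limit formula is simple enough to transcribe directly (the function name is mine; the constants 25 and 20 come from the slide):

```python
def queue_limit(p_must_do):
    """Per-server queue limit estimate: 25 * 2^(20 * P), where P is
    the fraction of recent incoming requests flagged "must do". As
    redirects rise, every server independently raises its limit, so
    a shared global limit emerges without explicit coordination."""
    return 25 * 2 ** (20 * p_must_do)

no_redirects = queue_limit(0.0)     # baseline limit of 25 jobs
some_redirects = queue_limit(0.05)  # 5% "must do" doubles it to 50
```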

Page 15: Load balancing theory and practice

FDB latency curve before/after

[Charts: latency (log scale, 0.1-100) vs. operations per second (0-1,200,000), before and after]

Page 16: Load balancing theory and practice

Tackling load balancing

• Queuing theory: one useful insight
• Simulation: do this
• Instrumentation: do this
• Control theory: know how to avoid this
• Operations research: read about this for fun
– Blackett: shield planes where they are not shot!

Page 17: Load balancing theory and practice

The one insight: Little’s law

Q = R*W

• (Q)ueue size = (R)ate * (W)ait-time
• Q is the average number of jobs in the system
• R is the average arrival rate (jobs/second)
• W is the average wait time (seconds)
• Holds for any (!) steady-state system
– Or sub-systems, or joint systems, or…

Page 18: Load balancing theory and practice

Little’s law example 1

Q = R*W

• We get 1,000,000 requests per second (R = 1E6)
• We take 100 ms to service each request
• Q = 1E6 * 0.100
• Little’s Law: average queue depth is 100,000!

Page 19: Load balancing theory and practice

Little’s law example 2

W = Q/R

• We have 100 users in the system making continuous requests (Q = 100)
• We get 10,000 requests per second
• W = 100 / 10,000
• Little’s Law: average wait time is 10 ms
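Both examples are direct substitutions into the law; a throwaway sketch (function names are mine):

```python
def queue_size(rate, wait):
    """Little's law: Q = R * W."""
    return rate * wait

def wait_time(queue, rate):
    """Little's law rearranged: W = Q / R."""
    return queue / rate

# Example 1: 1,000,000 req/s at 100 ms each -> ~100,000 jobs in flight
example1 = queue_size(1_000_000, 0.100)
# Example 2: 100 continuous users at 10,000 req/s -> 10 ms average wait
example2 = wait_time(100, 10_000)
```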

Page 20: Load balancing theory and practice

Little’s law ramifications

Q = R*W

• In a distributed system:
– R scales up
– W stays the same, or gets a bit worse
• To maintain performance, you’re going to need a whole lot of jobs in flight

Page 21: Load balancing theory and practice

The rest of queuing theory

Erlang is…
• A language
• A man (Agner Krarup Erlang)
• And a unit! (Q from Little’s law, AKA offered load, is measured in dimensionless Erlang units)
• Erlang-B formula (for limited-length queues)
• Erlang-C formula (probability of waiting)

Page 22: Load balancing theory and practice

Abandon hope

[Chart: the complexity of queuing-theory math vs. its real-world applicability; Little’s law is the question-marked outlier]

Page 23: Load balancing theory and practice

Simulation

The best way to explore distributed system behavior

Page 24: Load balancing theory and practice

Quiz

Model: jobs of random durations, 80% load.
Goal: minimize average job latency.

Which task should get a bit more work?
• First task received
• Last task received
• Shortest task
• Longest task
• Random task
• Task with least work remaining
• Task with most work remaining

Page 25: Load balancing theory and practice

Simulation code snippets
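The original snippets were images and did not survive this transcript. As a stand-in, here is a minimal sketch of the kind of experiment the quiz calls for, simplified to a single worker with all jobs queued up front (non-preemptive, so “shortest task” stands in for “least work remaining”):

```python
def mean_latency(job_sizes, pick):
    """Run every queued job to completion on one worker. `pick`
    chooses the next job from the remaining sizes (e.g. min = work
    on the shortest task first). Returns the mean completion time."""
    remaining = list(job_sizes)
    now, latencies = 0.0, []
    while remaining:
        job = pick(remaining)
        remaining.remove(job)
        now += job
        latencies.append(now)
    return sum(latencies) / len(latencies)

jobs = [1.0, 10.0, 3.0]
shortest_first = mean_latency(jobs, min)  # work on the shortest task
longest_first = mean_latency(jobs, max)   # work on the longest task
```

Even this tiny workload shows why shortest-first wins: finishing small jobs early removes them from every later job’s wait.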

Page 26: Load balancing theory and practice

Simulation results at 80% load

[Bar chart: latency (0-50) for each policy: task with most work remaining, task with least work remaining, random task, longest task, shortest task, last task received, first task received]

Page 27: Load balancing theory and practice

Simulation results at 95% load

[Bar chart: latency (log scale, 10-100,000) for the same seven policies at 95% load]

Page 28: Load balancing theory and practice

FoundationDB’s approach

• Strategy validated using simulation, then used for a single server’s fiber scheduling
• High priority: work on the next task to finish
• But be careful to enqueue incoming work from the network at the highest priority; we want to know about all our jobs to make good decisions
• Low priority: catch up on housekeeping (e.g. non-log writing)

Page 29: Load balancing theory and practice

Load spikes

[Charts: job arrivals over time in a low-load system vs. a high-load system]

Bursts of job requests can destroy latency. The effect is quadratic: A burst produces a queue of size B that lasts time proportional to B. On highly-loaded systems, the effect is multiplied by 1/(1-load), leading to huge latency impacts.

Page 30: Load balancing theory and practice

Burst-avoiding tip

1. Search for any delay/interval in your system
2. If system correctness depends on the delay/interval being exact, first fix that
3. Now change that delay/interval to randomly wait 0.8-1.2 times the nominal time on each execution

YMMV, but this tends to diffuse system events more evenly in time and help utilization and latency.
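Step 3 is a one-liner (the function name is mine):

```python
import random

def jittered(nominal):
    """Return a delay of 0.8-1.2x the nominal interval, so periodic
    events across machines drift apart over time instead of firing
    in lockstep and producing synchronized bursts."""
    return nominal * random.uniform(0.8, 1.2)

# e.g. sleep(jittered(5.0)) instead of sleep(5.0) in a polling loop
```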

Page 31: Load balancing theory and practice

Overload

[Chart: latency (log scale) vs. jobs/second, with the overload region marked]

Page 32: Load balancing theory and practice

Overload

What happens when work comes in too fast?
• Somewhere in your system a queue is going to get huge. Where?
• Lowered efficiency due to:
– Sloshing
– Poor caching
• Unconditional acceptance of new work means no information transfer to the previous system!

Page 33: Load balancing theory and practice

Overload (cont’d): Sloshing

Loading 10 million rows into a popular NoSQL K/V store shows sloshing (12.5 minutes):

Page 34: Load balancing theory and practice

Overload (cont’d): No sloshing

Loading 10 million rows into FDB shows smooth behavior:

Page 35: Load balancing theory and practice

System queuing

[Diagram: incoming work A-E waiting to be dispatched to Node 1, Node 2, and Node 3, each with its own queue]

Page 36: Load balancing theory and practice

System queuing

[Diagram: job A dispatched to Node 1’s queue; B-E still waiting as incoming work]

Page 37: Load balancing theory and practice

Internal queue buildup

[Diagram: jobs A-D piled up in Node 1’s queue while Node 2 and Node 3 sit empty; only E remains as incoming work]

Page 38: Load balancing theory and practice

Even queues, external buildup

[Diagram: jobs C, B, and A spread one per node; D and E queue up outside the system]

Page 39: Load balancing theory and practice

Our approach

“Ratekeeper”
• Active management of internal queue sizes prevents sloshing
• Avoids every subcomponent needing its own well-tuned load balancing strategy
• Explicitly sends queue information at 10 Hz back to a centrally-elected control algorithm
• When queues get large, slows system input
• Pushes latency into an external queue at the front of the system using “tickets”
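The talk doesn’t show Ratekeeper’s actual control law. Purely as a hypothetical illustration of “when queues get large, slow system input,” a proportional step might look like:

```python
def adjust_rate(current_rate, worst_queue, target_queue=1000, gain=0.1):
    """One control step, run each time queue reports arrive (all the
    constants here are arbitrary): shrink the admitted input rate when
    the worst internal queue exceeds its target, grow it when there is
    headroom. The gain trades responsiveness against oscillation."""
    error = (target_queue - worst_queue) / target_queue
    return max(current_rate * (1.0 + gain * error), 0.0)
```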

Page 40: Load balancing theory and practice

Ratekeeper in action

[Chart: operations per second (0-1,400,000) vs. time in seconds (0-600)]

Page 41: Load balancing theory and practice

Ratekeeper internals

Page 42: Load balancing theory and practice

What can go wrong

Well, we are controlling the queue depths of the system, so, basically, everything in control theory…

Namely, oscillation:

Page 43: Load balancing theory and practice

Recognizing oscillation

• Something moving up and down :)
– Look for low utilization of parallel resources
– Zoom in!
• Think about sources of feedback: is there some way that a machine getting more work done feeds either less or more work to that machine in the future? (Probably yes.)

Page 44: Load balancing theory and practice

What oscillation looks like

[Chart: utilization % (0-70) of Node A and Node B over time, oscillating out of phase]

Page 45: Load balancing theory and practice

What oscillation looks like

[Chart: utilization % (0-120) of Node A and Node B over a narrow time window, revealing fast oscillation]

Page 46: Load balancing theory and practice

Avoiding oscillation

• This is control theory: avoid it if possible!
• The major thing to know: control gets harder as frequencies get higher (e.g. Bose headphones)
• Two strategies:
– Control on a longer time scale
– Introduce a low-pass filter in the control loop (e.g. an exponential moving average)
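The second strategy is a one-liner in practice; an exponential moving average is the usual choice (alpha here is an arbitrary smoothing constant):

```python
def ema(samples, alpha=0.1):
    """Low-pass filter a measurement stream with an exponential
    moving average. Smaller alpha means heavier smoothing, i.e. a
    lower cutoff frequency for the control loop to chase."""
    value = samples[0]
    smoothed = []
    for x in samples:
        value += alpha * (x - value)
        smoothed.append(value)
    return smoothed
```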

Page 47: Load balancing theory and practice

Instrumentation

If you can’t measure it, you can’t make it better.

Things that might be nice to measure:
• Latencies
• Queue lengths
• Causes of latency?

Page 48: Load balancing theory and practice

Measuring latencies

Our approach:
• We want information about the distribution, not just the average
• We use a “Distribution” class
– addSample(X)
– Stores 500+ samples
– Throws away half of them when it hits 1000 samples, and halves the probability of accepting new samples
– Also tracks exact min, max, mean, and stddev
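A sketch of that sampling scheme (the Python naming and internals are my guesses at the described behavior; only addSample and the tracked statistics come from the slide). The point of halving both the retained set and the acceptance probability together is that every sample ever added has had an equal chance of surviving, so the retained set stays an unbiased sample of the whole distribution.

```python
import random

class Distribution:
    """Keep at most `limit` samples. When full, discard a random half
    and halve the acceptance probability for future samples, so all
    samples ever added have had an equal chance of being retained.
    Count, min, max, and sums are tracked exactly on the side."""
    def __init__(self, limit=1000):
        self.limit = limit
        self.samples = []
        self.accept_p = 1.0
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def add_sample(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        if random.random() < self.accept_p:
            self.samples.append(x)
            if len(self.samples) >= self.limit:
                # Drop a random half; accept future samples half as often
                self.samples = random.sample(self.samples, self.limit // 2)
                self.accept_p /= 2.0

    def mean(self):
        return self.total / self.count

    def stddev(self):
        m = self.mean()
        return max(self.total_sq / self.count - m * m, 0.0) ** 0.5
```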

Page 49: Load balancing theory and practice

Measuring queue lengths

Our approach:
• Track the % of time that a queue is at zero length
• Measure queue length snapshots at intervals
• Watch out for oscillations
– Slow ones you can see
– Fast ones look like noise (which, unfortunately, is also what noise looks like)
– “Zoom in” to exclude the possibility of micro-oscillations

Page 50: Load balancing theory and practice

Measuring latency from blocking

• Easy to calculate:
– L = (b0^2 + b1^2 + … + bN^2) / elapsed
– Total all squared seconds of blocking time over some interval, then divide by the duration of the interval
• Measures the impact of unavailability on mean latency from random traffic
• Example: is this server’s slow latency explained by this lock?
• Doesn’t count catch-up time
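The formula translates directly (the function name is mine):

```python
def blocking_latency(blocking_intervals, elapsed):
    """L = (b0^2 + b1^2 + ... + bN^2) / elapsed. A request arriving
    at a uniformly random time falls inside a blocking interval of
    length b with probability b / elapsed and then waits on the order
    of b, which is why each interval contributes its square."""
    return sum(b * b for b in blocking_intervals) / elapsed

# A 1 s and a 2 s stall in a 10 s window contribute 0.5 by this metric
impact = blocking_latency([1.0, 2.0], 10.0)
```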

Page 51: Load balancing theory and practice

Summary

[email protected]

Thanks for listening, and remember:
• Everything has a latency curve
• Little’s law
• Randomize regular intervals
• Validate designs with simulation
• Instrument

May your queues be small, but not empty

Page 52: Load balancing theory and practice

Prioritization/QOS

• Can help in systems under partial load
• Vital in systems that handle batch and real-time loads simultaneously
• Be careful that high priority work doesn’t generate other high priority work plus other jobs in the queue. This can lead to poor utilization, analogous to the internal queue buildup case.

Page 53: Load balancing theory and practice

Congestion pricing

• My favorite topic
• Priority isn’t just a function of the benefit of your job
• To be a good citizen, you should subtract the costs to others
• For example, jumping to the front of a long queue has costs proportional to the queue size

Page 54: Load balancing theory and practice

Other FIFO alternatives?

• LIFO
– Avoids the reason to line up early
– In situations where there is adequate capacity to serve everyone, can yield better waiting times for everyone involved

