Nick McKeown CS244 Lecture 7 Valiant Load Balancing.

Post on 11-Jan-2016


Nick McKeown

CS244 Lecture 7

Valiant Load Balancing

Simple Model of US Backbone


Designing a Backbone Network

1. Hard to measure current traffic matrix.
   - Harder still to estimate future traffic matrices.

2. Hard to know which traffic matrices can be supported.
   - Harder still under link and node failures.

The Problem

[Figure: N POPs in big cities, drawn as nodes 1, 2, 3, 4, …, N; the access rates r1, r2, r3, r4 are labeled on the first four nodes.]

Q: How much capacity should we provision between two POPs?

The Problem

[Figure: the same N nodes 1, 2, 3, 4, …, N with access rates r1, r2, r3, r4.]

100% Throughput in a Mesh

[Figure: N inputs (In) and N outputs (Out), each at rate r, joined by a full mesh. The mesh links are marked "?" in one build and "r" in the next.]

Router capacity = Nr

Switch capacity = N^2 r

Questions

How would we provision the links if we know the traffic matrix?

What is the cost of not knowing the traffic matrix?


Valiant Load Balancing

[Figure: nodes 1, 2, 3, 4, …, N, each with access rate r, joined by a full mesh in which every link has capacity 2r/N.]
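The 2r/N link capacity can be checked with a few lines of arithmetic. The sketch below is a minimal Python illustration, assuming the usual two-phase VLB routing (every flow is split evenly across all N nodes as intermediates, then forwarded to its destination) and a traffic matrix in which no node sends or receives more than r. The function name `vlb_link_loads` and the example matrix are illustrative and do not come from the slides.

```python
def vlb_link_loads(tm):
    """Per-link load when every flow is split evenly over all N intermediates.

    tm[s][d] is the rate from node s to node d. A share whose intermediate
    is the source or the destination itself needs only one of the two hops.
    """
    n = len(tm)
    load = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for d in range(n):
            share = tm[s][d] / n          # rate sent via each intermediate
            for k in range(n):
                if k != s:
                    load[s][k] += share   # phase 1: source -> intermediate
                if k != d:
                    load[k][d] += share   # phase 2: intermediate -> destination
    return load


# Assumed example: each node sends its whole rate r to a single neighbor.
N, r = 4, 1.0
tm = [
    [0, r, 0, 0],
    [0, 0, r, 0],
    [0, 0, 0, r],
    [r, 0, 0, 0],
]
loads = vlb_link_loads(tm)
max_load = max(max(row) for row in loads)
print(max_load, 2 * r / N)  # both are 0.5 for this matrix
```

In this example direct routing would need a full r on the link from node 0 to node 1, while the load-balanced mesh never puts more than 2r/N on any link.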


Outline for Today

1. Basic idea of load-balancing

2. Packet mis-sequencing

3. An optical switch fabric

If Traffic Is Uniform

[Figure: N inputs (In) and N outputs (Out) at rate R, joined by a full mesh of links at rate R/N. Each input's traffic of rate R divides into N flows of rate R/N, one per output.]

Real Traffic is Not Uniform

[Figure: the mesh of R/N links between inputs and outputs at rate R, repeated over several builds; the final build marks the mesh with "?".]

Can we make traffic “sufficiently uniform” to make the problem trivial?
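A small check makes the contrast between the two cases concrete. The sketch below assumes a single mesh in which traffic from input i to output j can only use the direct link (i, j) of capacity R/N; the function name and both example matrices are illustrative.

```python
def fits_direct_mesh(rates, line_rate):
    """True if every input->output rate fits on its own R/N mesh link."""
    n = len(rates)
    link_capacity = line_rate / n
    return all(rates[i][j] <= link_capacity
               for i in range(n) for j in range(n))


R, N = 1.0, 4

# Uniform traffic: every input sends R/N to every output.
uniform = [[R / N] * N for _ in range(N)]

# Admissible but non-uniform: input i sends its full rate R to output i.
skewed = [[R if i == j else 0.0 for j in range(N)] for i in range(N)]

print(fits_direct_mesh(uniform, R))  # True
print(fits_direct_mesh(skewed, R))   # False
```

Neither matrix oversubscribes any input or output, yet only the uniform one fits the R/N links, which is why making traffic uniform is attractive.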

VLB Switch

[Figure: inputs (In) at rate R feed a first full mesh of R/N links (the load-balancing stage); its outputs feed a second full mesh of R/N links (the forwarding stage), which delivers to outputs (Out) at rate R.]

100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)

VLB Switch

[Figure: two builds of the two-stage switch. Packets labeled 1, 1, 2, 2, 3, 3 appear together in the first build; in the second build they appear grouped as 3 3, 2 2, 1 1.]

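Intuition: 100% Throughput

[Figure: the two-stage switch with inputs and outputs at rate R and second-mesh links at rate R/N, with labels a, b, and C.]

Arrivals to second mesh: $b = \frac{1}{N} U a$

Capacity of second mesh: $C = \frac{R}{N} U$

Second mesh: arrival rate < service rate: $C - b = \frac{1}{N}(R U - U a) > 0$

[C.-S. Chang]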
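The intuition can be checked numerically. The sketch below rests on an assumed reading of the symbols, since the slide does not define them: a is the N×N matrix of arrival rates (a[i][j] from input i to output j), U is the all-ones matrix, b = (1/N) U a is the traffic offered to each link of the second mesh, and C = (R/N) U is the capacity of those links. The function name is illustrative.

```python
def second_mesh_slack(a, line_rate):
    """Return C - b for every link (middle port k -> output j) of the second mesh."""
    n = len(a)
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    # (U a)[k][j] is the j-th column sum of a, the same for every middle port k.
    b = [[col_sums[j] / n for j in range(n)] for _ in range(n)]
    c = line_rate / n                     # every entry of C = (R/N) U
    return [[c - b[k][j] for j in range(n)] for k in range(n)]


R = 1.0
# Admissible and highly non-uniform: each output receives 0.9 R from one input.
a = [
    [0.9, 0.0, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.9],
]
slack = second_mesh_slack(a, R)
print(min(min(row) for row in slack) > 0)  # True: arrival rate < service rate
```

Under this reading the slack is positive whenever no output is oversubscribed (every column of a sums to less than R), regardless of how non-uniform a is.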

Another way of thinking about it

[Figure: External Inputs 1…N connect through a load-balancing cyclic shift to Internal Inputs 1…N, which connect through a switching cyclic shift to External Outputs 1…N.]

Interesting properties:

• 100% throughput, no arbiter (but 2x switching capacity)

• No part of the system need operate faster than the line rate
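A cyclic shift is simple enough to write down directly. The sketch below assumes one common choice of pattern, in which port i is connected to port (i + t) mod N during time slot t; the slides name the cyclic shift but do not give a formula, so the function is illustrative.

```python
def cyclic_shift(n, t):
    """Connection pattern in slot t: port i is connected to port (i + t) mod n."""
    return [(i + t) % n for i in range(n)]


N = 4
schedule = [cyclic_shift(N, t) for t in range(N)]
for t, pattern in enumerate(schedule):
    print(t, pattern)

# Every slot is a permutation, so no two ports ever contend for the same port.
is_permutation = all(sorted(p) == list(range(N)) for p in schedule)

# Over N slots each port meets every other port exactly once, i.e. each
# pair is connected 1/N of the time, matching a fixed link of rate R/N.
covers_all = all({schedule[t][i] for t in range(N)} == set(range(N))
                 for i in range(N))
print(is_permutation, covers_all)  # True True
```

Because the pattern depends only on the slot number, it needs no arbiter and no knowledge of the queues.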

Performance

1. What are the performance tradeoffs between a scheduler and a load-balanced design?

2. How can a load-balanced switch have lower loss than an OQ switch?

3. “I’m surprised that no one came up with the idea earlier”


What you said

“My favorite line in the paper is the following: "If it is possible to build a packet switch with 100% throughput that has no scheduler, no reconfigurable switch fabric, and buffer memories operating without speedup, where does the packet switching actually take place?" The answer of course in the VOQs…”


Outline

1. Basic idea of load-balancing

2. Packet mis-sequencing

3. An optical switch fabric

What you said

“[I]f packet mis-sequencing is such a performance hit due to the way TCP is designed and how the Internet has reached to a point where a new transport protocol adoption is impractical, I wonder how long we will all have to live with TCP before Internet traffic reaches a point where a whole new layering system must be re-architected (and what might David Clark might have to say about that).”


Packet Mis-sequencing

1. Does the Internet allow packets to be mis-sequenced?

2. Why do we (or network operators) care?

3. Will the Internet require packets to stay in sequence in the future?

Packet Reordering

[Figure: the two-stage VLB switch with packets labeled 1 and 2.]
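Reordering in the two-stage switch can be reproduced with a toy model. The sketch below assumes that consecutive packets of one flow go to consecutive middle ports and that each packet waits behind a fixed backlog already queued at its middle port; real backlogs change over time, so this is only an illustration, and the function name is hypothetical.

```python
def departure_order(arrival_slots, middle_backlog):
    """Order in which one flow's packets leave the middle stage.

    Packet number seq arrives in slot arrival_slots[seq], is sent to middle
    port seq mod N, and waits behind middle_backlog[port] cells queued there.
    """
    n = len(middle_backlog)
    departures = []
    for seq, t in enumerate(arrival_slots):
        port = seq % n
        departures.append((t + middle_backlog[port], seq))
    return [seq for _, seq in sorted(departures)]


# Packet 0 lands behind 5 cells at middle port 0; packet 1 finds port 1 empty.
print(departure_order([0, 1], [5, 0, 0]))  # [1, 0]: packet 1 overtakes packet 0

# Equal backlogs at every middle port preserve the order.
print(departure_order([0, 1], [2, 2, 2]))  # [0, 1]
```

The model suggests why later slides focus on the difference in queue occupancy between middle ports: order is preserved when those differences are small enough.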

Bounding Delay Difference Between Middle Ports

[Figure: the two-stage VLB switch with packets labeled 1 and 2; the annotation is in units of cells.]

Uniform Frame Spreading

[Figure: the two-stage VLB switch with packets labeled 1, 2, 3 and 1, 2.]
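The slide gives only the name, so the sketch below follows the usual description of uniform frame spreading in the load-balanced switch literature: an input holds packets per output until it has a full frame of N for one output, then sends that frame with exactly one packet to each middle port. The class and method names are hypothetical.

```python
from collections import deque


class UfsInput:
    """One input port of a hypothetical N-port switch using frame spreading."""

    def __init__(self, n):
        self.n = n
        self.voq = [deque() for _ in range(n)]   # one FIFO per output

    def enqueue(self, output, packet):
        self.voq[output].append(packet)

    def next_frame(self):
        """Return [(middle_port, packet), ...] for one full frame, or None."""
        for q in self.voq:
            if len(q) >= self.n:
                return [(k, q.popleft()) for k in range(self.n)]
        return None


ufs = UfsInput(3)
ufs.enqueue(1, "a")
ufs.enqueue(1, "b")
first = ufs.next_frame()    # None: only 2 of 3 packets, so the input waits
ufs.enqueue(1, "c")
second = ufs.next_frame()   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(first, second)
```

Since every middle port receives one packet of each frame, the middle queues for a given output stay equally long, which keeps packets in order. The cost is that packets of a lightly loaded flow may wait a long time for their frame to fill.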

FOFF (Full Ordered Frames First)

[Figure: the two-stage VLB switch with packets labeled 1 and 2.]

Input Algorithm

• N FIFO queues corresponding to the N output flows

• Spread each flow uniformly: if last packet was sent to middle port k, send next to k+1.

• Every N time-slots, pick a flow:
   - If full frame exists, pick it and spread like UFS
   - Else if all frames are partial, pick one in round-robin order and send it

[Figure: the N input FIFO queues, with packets labeled 1, 2, 3 and 1, 2 in the first queues.]
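The input algorithm above can be sketched as a small class. This is a simplified illustration rather than the authors' implementation: it serves the lowest-numbered full frame instead of rotating among full frames, and the class, method, and field names are hypothetical.

```python
from collections import deque


class FoffInput:
    """Simplified FOFF-style input for a hypothetical N-port switch."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_port = [0] * n   # per flow: middle port for its next packet
        self.rr = 0                # round-robin pointer over partial frames

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def pick(self):
        """Called once every N time slots.

        Returns (output, [(middle_port, packet), ...]) or None if idle.
        """
        n = self.n
        # Full frames first.
        flow = next((f for f in range(n) if len(self.queues[f]) >= n), None)
        if flow is None:
            # All frames are partial: serve the next non-empty flow in turn.
            for step in range(n):
                cand = (self.rr + step) % n
                if self.queues[cand]:
                    flow = cand
                    self.rr = (cand + 1) % n
                    break
        if flow is None:
            return None
        sent = []
        for _ in range(min(n, len(self.queues[flow]))):
            port = self.next_port[flow]            # continue after port k
            sent.append((port, self.queues[flow].popleft()))
            self.next_port[flow] = (port + 1) % n  # next packet goes to k+1
        return flow, sent


foff = FoffInput(3)
for p in ("a0", "a1"):
    foff.enqueue(0, p)
for p in ("c0", "c1", "c2"):
    foff.enqueue(2, p)
print(foff.pick())  # (2, [(0, 'c0'), (1, 'c1'), (2, 'c2')])  full frame first
print(foff.pick())  # (0, [(0, 'a0'), (1, 'a1')])             partial frame
```

The per-flow pointer `next_port` is what keeps a flow spread uniformly even when it is served in partial frames: the next packet of flow 0 in this example would go to middle port 2.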

Bounding Reordering

[Figure: the two-stage VLB switch with packets labeled 1, 2, 3; the label N appears twice.]

FOFF

Output properties

• N FIFO queues corresponding to the N middle ports

• Buffer size less than N^2 packets

• If there are N^2 packets, one of the head-of-line packets is in order

[Figure: the output with its N FIFO queues, holding packets labeled 1, 1, 1; 2, 2; 3, 3, 3.]

VLB + FOFF Properties

With quite a lot of work, packet order is maintained

Interestingly, expected packet delay is within a constant of that of an output-queued (OQ) switch (surprising)

Therefore, VLB with FOFF has 100% throughput


Outline

1. Basic idea of load-balancing

2. Packet mis-sequencing

3. An optical switch fabric

What you said

“They state that their theoretical 100 Tb/s switch should be able to be built in about 3 years, so if they were correct it should have long since have been built. I was unable to find anything about 100 Tb/s optics switches being used, so I’m not sure if it happened or not. There was another paper on the subject in 2010, suggesting that it took more than 3 years for technology to advance sufficiently. If such a switch has been manufactured, did it perform to expectations? If not, what prevented it?”

From Two Meshes to One Mesh

[Figure, three builds: (1) the two-mesh VLB switch, with the In and Out of the same port grouped as one linecard; (2) each linecard (In/Out) connected to both a first mesh and a second mesh, with links labeled R; (3) the two meshes merged into one combined mesh, with each linecard connected at rate 2R.]

Many Fabric Options

Options

• Space: Full uniform mesh

• Time: Round-robin crossbar

• Wavelength: Static WDM

[Figure: linecards (In/Out) attached to any spreading device through channels C1, C2, C3, …, CN; one linecard has N channels, each at rate 2R/N.]
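The 2R/N channel rate follows from folding the two meshes into one: each channel carries R/N of load-balancing traffic plus R/N of forwarding traffic. A minimal sketch of the arithmetic, with hypothetical values for R and N:

```python
def combined_mesh_rates(line_rate, n):
    """Channel rate and total fabric rate per linecard in the combined mesh."""
    per_channel = 2 * line_rate / n   # R/N for spreading + R/N for forwarding
    per_linecard = n * per_channel    # N channels per linecard, 2R in total
    return per_channel, per_linecard


# Hypothetical numbers: 8 linecards at a line rate of 100 (any rate unit).
per_channel, per_linecard = combined_mesh_rates(100.0, 8)
print(per_channel, per_linecard)  # 25.0 200.0
```

The total of 2R per linecard is the "2x switching capacity" paid for removing the scheduler.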

AWGR (Arrayed Waveguide Grating Router): A Passive Optical Component

Wavelength i on input port j goes to output port (i+j-1) mod N

Can shuffle information from different inputs
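The routing rule can be tabulated to see the shuffle. The sketch below applies the slide's formula with 1-based wavelengths and ports, and assumes that a result of 0 means port N, which the slide does not state; the function name is illustrative.

```python
def awgr_output_port(wavelength, input_port, n):
    """Output port for 1-based wavelength i on 1-based input port j.

    Uses the slide's rule (i + j - 1) mod N; a result of 0 is read as
    port N here, which is an assumption about the slide's numbering.
    """
    port = (wavelength + input_port - 1) % n
    return port if port != 0 else n


N = 4
table = {j: [awgr_output_port(i, j, N) for i in range(1, N + 1)]
         for j in range(1, N + 1)}
for j, outputs in table.items():
    print("input", j, "->", outputs)

# Each input reaches every output on exactly one wavelength.
full_mesh = all(sorted(outs) == list(range(1, N + 1)) for outs in table.values())
print(full_mesh)  # True
```

Under that assumption a linecard selects its destination purely by choosing a wavelength, and the passive device provides the same connectivity as a full mesh.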

[Figure: an N×N AWGR. Linecards 1, 2, …, N on the input side each send wavelengths λ1, λ2, …, λN; Linecards 1, 2, …, N sit on the output side.]

Static WDM Switching: Packaging

[Figure: four linecards (In/Out) labeled A, B, C, D connected through an AWGR (passive and almost zero power). On one side of the AWGR each fiber carries A, B, C, D; on the other side the fibers carry A, A, A, A; B, B, B, B; C, C, C, C; and D, D, D, D.]

N WDM channels, each at rate 2R/N

Linecard placement and failure

What happens if a linecard is missing or fails?

Does this happen in practice?