EE 382C - S11 - Lecture 4 1
EE382C
Lecture 4
High-Radix and Non-Blocking Networks
4/7/11
Question of the day
• What topology has an average hop count, $H_{avg}$, for load-balanced traffic that is close to the $\log_{d/2} N$ bound and is able to route arbitrary traffic with an $H_{max}$ of twice this amount?
High-Radix Networks
Bandwidth Trend (ISCA ’05)
[Figure: bandwidth per router node (Gb/s, log scale, 0.1 to 10000) versus year (1985 to 2010). Routers plotted: Torus Routing Chip, Intel iPSC/2, J-Machine, CM-5, Intel Paragon XP, Cray T3D, MIT Alewife, IBM Vulcan, Cray T3E, SGI Origin 2000, AlphaServer GS320, IBM SP Switch2, Quadrics QsNet, Cray X1, Velio 3003, IBM HPS, SGI Altix 3000, Cray XT3, YARC.]
Router bandwidth
As bandwidth increases …
Low-Radix vs. High-Radix Router
Low-radix: small number of fat ports
High-radix: large number of skinny ports
Latency vs. Radix
[Figure: latency (nsec, 0 to 300) versus radix (0 to 250) for 2003 technology and 2010 technology. Annotations: optimal radix ~ 40; optimal radix ~ 128; serialization latency increases with radix; header latency decreases with radix.]
Determining Optimal Radix
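$$
\begin{aligned}
\text{Latency} &= \text{Header Latency} + \text{Serialization Latency} \\
&= H\,t_r + L/b \\
&= 2\,t_r \log_k N + 2kL/B
\end{aligned}
$$

Optimal radix:

$$k \log^2 k = \frac{B\,t_r \log N}{L}$$

where k = radix, B = total bandwidth, N = # of nodes, L = message size.
The right-hand side, $(B\,t_r \log N)/L$, is the aspect ratio.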
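The optimal-radix condition has no closed-form solution for k, so it is usually solved numerically. The following is a minimal sketch, assuming natural logarithms (which is what differentiating the latency expression with respect to k gives). The function names `latency` and `optimal_radix` are illustrative and not from the lecture.

```python
import math

def latency(k, N, B, t_r, L):
    """Latency = 2 t_r log_k N + 2 k L / B, as on the slide."""
    return 2 * t_r * math.log(N) / math.log(k) + 2 * k * L / B

def optimal_radix(aspect_ratio):
    """Solve k * ln(k)^2 = aspect_ratio for k > 1 by bisection.

    The left-hand side grows monotonically for k > 1, so bisection converges.
    """
    f = lambda k: k * math.log(k) ** 2 - aspect_ratio
    lo, hi = 1.0, 2.0
    while f(hi) < 0:          # grow the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Round trip: the aspect ratio that makes k = 40 optimal maps back to 40.
a40 = 40 * math.log(40) ** 2
k40 = optimal_radix(a40)
print(round(a40), round(k40, 3))
```

A higher aspect ratio always yields a higher optimal radix, because k ln² k increases with k.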
Higher Aspect Ratio, Higher Optimal Radix
[Figure: optimal radix (k) versus aspect ratio, log-log axes (aspect ratio 10 to 10000, radix 1 to 1000), with points labeled 1991, 1996, 2003, and 2010.]
$K_N$ – The Ultimate High-Radix Network
High-Radix Butterfly
• Just build a butterfly with large k
• For k = 128
– 128 nodes in 1 stage
– 16K nodes in 2 stages
– 2M nodes in 3 stages
• But vulnerable to adversarial traffic
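The stage counts follow from a k-ary butterfly with n stages reaching k^n terminals. A small sketch of that arithmetic (the function name is illustrative):

```python
def butterfly_terminals(k: int, stages: int) -> int:
    """Terminals reachable by a k-ary butterfly with the given number of stages."""
    return k ** stages

sizes = {n: butterfly_terminals(128, n) for n in (1, 2, 3)}
print(sizes)
```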
High-Radix Clos
• Not vulnerable to adversarial traffic
• But twice the number of stages, even on benign traffic
High-Radix Interconnection Networks
Flattened Butterfly
[Figure: flattened butterfly construction with inputs I0 to I15, outputs O0 to O15, flattened routers R0' to R7', and inter-router channels labeled dimension 1, dimension 2, and dimension 3.]
Dragonfly Topology
[Figure: routers R0, R1, R2, ..., Rn-2, Rn-1 attached to an interconnection network; routers R0, R1, ..., Ra-1 of a group joined by an intra-group interconnection network; groups G0, G1, ..., Gg-1 joined by an inter-group interconnection network.]
Dragonfly Topology Example
[Figure: dragonfly example with two processors per router (P0, P1 on R0 through P14, P15 on R7; P24 to P31 also shown), routers R0 to R7 and R12 to R15 labeled, and groups G0 to G8.]
Dragonfly by the numbers
• Consider k=128
• Divide into g, l, p
– p processors per router
– l connections to other routers in same group
– g global connections – to other groups
• Design method
– Pick g
– Let l = 2g-1
– Let p = 128 - l - g = 129 - 3g
– Compute concentration factor c = p/g
An Example
d           128        128        128        128
p            93         75         51         33
l            23         35         51         63
g            12         18         26         32
p/g        7.75       4.17       1.96       1.03
(l+1)g      288        648       1352       2048
group      2139       2625       2601       2079
N       618,171  1,703,625  3,519,153  4,259,871
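The table can be regenerated from g alone. The sketch below follows the design method (l = 2g - 1) and takes p as the ports left over, p = 128 - l - g, which is what the p row satisfies (for example 128 - 23 - 12 = 93). The formulas for the "group" and N rows are not stated on the slide. They are inferred here from the numbers (group = l × p and N = group × ((l+1)g + 1)) and should be read as assumptions.

```python
def dragonfly_design(g, k=128):
    """Reproduce one column of the table for a radix-k router."""
    l = 2 * g - 1                  # local links, per the design method
    p = k - l - g                  # remaining ports go to processors
    global_links = (l + 1) * g     # the table's "(l+1)g" row
    group = l * p                  # assumption: matches the table's "group" row
    n_total = group * (global_links + 1)   # assumption: matches the table's N row
    return {"p": p, "l": l, "g": g, "p/g": round(p / g, 2),
            "(l+1)g": global_links, "group": group, "N": n_total}

table = [dragonfly_design(g) for g in (12, 18, 26, 32)]
for column in table:
    print(column)
```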
Non-Blocking Networks
Non-blocking networks
• Non-blocking: able to connect any unconnected input to
any unconnected output.
• Circuit switching vs. packet switching
• Strictly vs. rearrangeably non-blocking
• Non-interfering networks
Crossbar Switch
[Figure: crossbar switch with inputs in0 to in3 and outputs out0 to out4; each input line crosses each output line at a crosspoint.]
Old Mechanical Crossbar
Implemented with Multiplexers
[Figure: the crossbar built from multiplexers, with inputs in0 to in3, outputs out0 to out4, and the values 1 0 3 1 2 shown at the multiplexers.]
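A multiplexer-based crossbar can be modeled as one selector per output. The sketch below assumes the five numbers on the slide are the select values for out0 to out4; that reading is an assumption, as is the function name.

```python
def crossbar(inputs, selects):
    """Each output is a multiplexer: output j forwards inputs[selects[j]]."""
    return [inputs[s] for s in selects]

inputs = ["in0", "in1", "in2", "in3"]
selects = [1, 0, 3, 1, 2]          # assumed select values for out0..out4
outputs = crossbar(inputs, selects)
print(outputs)
```

Under this reading in1 drives both out0 and out3, which a multiplexer implementation allows because each output selects independently.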
Crossbar Expansion
[Figure: crossbar expansion, with inputs in0 ... in(n-1) and in(n) ... in(2n-1), and outputs out0 ... out(n-1) and out(n) ... out(2n-1).]
Clos Networks
Basic Clos Structure (3,3,4)
[Figure: (3,3,4) Clos network with n = 3 ports per switch: r = 4 n x m input switches (3x3; ports 1.1 to 4.3), m = 3 r x r middle switches (4x4), and r = 4 m x n output switches (3x3; ports 1.1 to 4.3).]
Simple Clos (2,2,2)
[Figure: (2,2,2) Clos network with switches a1, a2, b1, b2 and ports 1.1, 1.2, 2.1, 2.2 on both the input and output sides.]
• Route
– 1.1 to 1.1
– 2.2 to 2.2
– 1.2 to 2.1
– 2.1 to 1.2
Routing example
• Route
– 1.1 to 1.1
– 2.2 to 2.2
– 1.2 to 2.1
– 2.1 to 1.2
Edge Coloring
• Route
– 1.1 to 1.1
– 2.2 to 2.2
– 1.2 to 2.1
– 2.1 to 1.2
Rearrangement
• Route
– 1.1 to 1.1
– 2.2 to 2.2
– 1.2 to 2.1
– 2.1 to 1.2
• Route next call violating rule
• Fix color conflict at b2 by flipping
other edge
• Final circuit no longer conflicts
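The flip step can be written as edge coloring of a bipartite graph: each call is an edge from its input switch to its output switch, and each color is a middle switch. The following is a minimal sketch, not the lecture's own code. It assumes calls are (input switch, output switch) pairs and colors are middle-switch indices 0 to m-1. The function `route_clos` and the sample call list are hypothetical; the list is chosen so that the last call forces a rearrangement.

```python
def route_clos(calls, m):
    """Assign each call (input_switch, output_switch) a middle switch in range(m)."""
    used = {}                          # (side, switch, color) -> index of the call holding it
    assignment = [None] * len(calls)

    def free_colors(side, switch):
        return [c for c in range(m) if (side, switch, c) not in used]

    def claim(j, color):
        u, v = calls[j]
        assignment[j] = color
        used[("in", u, color)] = j
        used[("out", v, color)] = j

    def release(j):
        u, v = calls[j]
        del used[("in", u, assignment[j])]
        del used[("out", v, assignment[j])]

    for idx, (u, v) in enumerate(calls):
        free_u, free_v = free_colors("in", u), free_colors("out", v)
        if not free_u or not free_v:
            raise ValueError("a switch carries more than m calls")
        common = [c for c in free_u if c in free_v]
        if common:                     # no conflict: use a color free at both ends
            claim(idx, common[0])
            continue
        a, b = free_u[0], free_v[0]    # a is busy at v, b is busy at u
        # Follow the chain of calls alternating a, b, a, ... starting at output v.
        chain, side, switch, color = [], "out", v, a
        while (side, switch, color) in used:
            j = used[(side, switch, color)]
            chain.append(j)
            switch = calls[j][0] if side == "out" else calls[j][1]
            side = "in" if side == "out" else "out"
            color = b if color == a else a
        # Swap a and b along the chain, which frees a at v without touching u.
        flipped = [(j, b if assignment[j] == a else a) for j in chain]
        for j in chain:
            release(j)
        for j, new_color in flipped:
            claim(j, new_color)
        claim(idx, a)
    return assignment

calls = [(1, 1), (2, 3), (2, 2), (1, 2)]   # hypothetical calls; the last one conflicts
assignment = route_clos(calls, m=2)
print(assignment)
```

The chain never reaches the new call's input switch, because that switch has no edge of color a, so the flip always terminates with a free at both ends.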
[Figure: the rearranged coloring on nodes a1, a2, b1, b2, and a second diagram of the same switches with ports 1.1, 1.2, 2.1, 2.2 on each side.]
Strictly non-blocking
[Figure: diagrams on switches a1, a2, b1, b2 with ports 1.1, 1.2, 2.1, 2.2 on each side.]
Strictly non-blocking Routing
• Number of stages
• Vector of free middle-stage switches at every input and output switch
• E.g. with 15 middle-stage switches:
– Input i  : 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1
– Output j : 0 1 1 0 1 1 1 0 1 0 0 0 0 1 1
---------------------------------------
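A call from input switch i to output switch j can use any middle switch that is free at both ends, which is the element-wise AND of the two vectors. The sketch below assumes a 1 marks a free middle switch, as the "vector of free" wording suggests; the function name is illustrative.

```python
input_free  = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
output_free = [0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1]

def common_free(a, b):
    """Indices (0-based) of middle switches free at both the input and output switch."""
    return [k for k, (x, y) in enumerate(zip(a, b)) if x and y]

usable = common_free(input_free, output_free)
print(usable)
```

Each vector has seven busy entries, yet two middle switches remain usable, so the call can be set up without moving any existing connection.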
3D View of a Clos Network
r input switches, each n x m
m middle switches, each r x r
r output switches, each m x n
Rearrangement Example
(3,3,4) Routing Problem
[Figure: routing graph with input switches I1 to I4 and output switches O1 to O4.]
Before Setting Up (2,4)
[Figure: the graph on I1 to I4 and O1 to O4 before the new call is added, with legend entries 1, 2, 3.]
Connect (2,4) using RED
Switch (1,4) to BLUE
The red link being moved cannot lead back to I2, since I2's red link is already accounted for.
Connect (4,3) using BLUE
Chain of calls to be switched BLUE/RED
Chain is guaranteed to be acyclic since input blue and output red links of previous nodes are already accounted for.
After Switching 5 calls
Complete Colored Graph
Question of the day
• What topology has an average hop count, $H_{avg}$, for load-balanced traffic that is close to the $\log_{d/2} N$ bound and is able to route arbitrary traffic with an $H_{max}$ of twice this amount?
Flattened Butterfly Topology
Non-Blocking Summary
• Non-blocking
– Can connect any unconnected input to any unconnected output
• Strictly non-blocking: without moving any other connections
• Rearrangeably non-blocking: may require moving other connections
– Applies to circuit switching
• For packet switching you usually want a non-interfering network
• Crossbar
– Trivially non-blocking
– Also has stiff backpressure
• Clos network
– 3 stages of crossbars – each switch of stage i connects to all switches of stage i+1
– (m,n,r)
– Strictly non-blocking if m ≥ 2n-1, rearrangeable if m ≥ n
– Schedule with the looping algorithm – augmented paths
– Multicast
• Conflict vector representation
• Can fanout in input or middle stage (split calls)
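The two Clos conditions from the summary can be checked directly from (m, n, r). A small sketch, with an illustrative function name:

```python
def clos_class(m, n, r):
    """Classify an (m, n, r) Clos network for circuit switching.

    m = middle switches, n = ports per input/output switch, r = input/output switches.
    r does not affect the classification; it only sets the port count n * r.
    """
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"

examples = {params: clos_class(*params) for params in [(3, 3, 4), (5, 3, 4), (2, 2, 2), (2, 3, 4)]}
for params, kind in examples.items():
    print(params, kind)
```

The (3,3,4) and (2,2,2) networks used in the lecture are both rearrangeable but not strictly non-blocking; a (3,3,4) network would need m = 5 middle switches to be strictly non-blocking.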