
Routing Algorithms ECE 284 On-Chip Interconnection Networks Spring 2014 1.

Date post: 23-Dec-2015
Routing Algorithms

ECE 284 On-Chip Interconnection Networks

Spring 2014


Routing

• This talk assumes a 2D mesh.

• How flits are routed from source to destination can greatly impact network congestion.

• Two types of routing:
  – Oblivious routing: the route does not consider or depend on current traffic conditions.
  – Adaptive routing: the route takes current traffic conditions into account when choosing a path (it tries to get around congested areas).

• Oblivious routing is simpler (less expensive) to implement.

• This talk reviews existing oblivious routing algorithms.


Routing Algorithm Objectives

• Maximize throughput
  – How much load the network can handle

• Minimize latency
  – Minimize routing delay between source and destination


Dimension-Ordered Routing (DOR) (also called XY routing)

• Uses either minimal XY or minimal YX routing to the destination (here, the XY route is used with probability 1.0).

[Figure: example route from source S to destination D]
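A minimal sketch of XY routing, assuming nodes are addressed as (x, y) tuples and using a hypothetical helper name `xy_route` (neither appears on the slide): the route first corrects the X coordinate, then the Y coordinate.

```python
def xy_route(src, dst):
    """Return the nodes visited by minimal XY routing from src to dst.

    Nodes are (x, y) tuples; this coordinate convention is an assumption.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


print(xy_route((0, 0), (3, 3)))
```

Because the route is a fixed function of source and destination only, it is oblivious: no traffic information is consulted.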


DOR (XY) Routing with Uniform Traffic

• For an N = K x K mesh, N/2 nodes are in the top half.

• Under uniform traffic, 1/2 of the traffic from these nodes crosses the bisection.

• The traffic crossing the bisection is uniformly distributed across K channels.

• Therefore, the maximum channel load for DOR with uniform traffic is:

ϒ(DOR, uniform) = [ (N/2) * (1/2) ] / K = (K^2 / 4) / K = K/4
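The K/4 result can be checked numerically. The sketch below is an illustration under stated assumptions, not the course's own code: nodes are (x, y) tuples, every node injects 1 unit of traffic spread evenly over all N nodes (including itself), and the helper names `xy_channels` and `max_load_uniform` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def xy_channels(src, dst):
    """Yield the directed channels (node, next_node) used by minimal XY routing."""
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        yield (x, y), (nx, y)
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        yield (x, y), (x, ny)
        y = ny


def max_load_uniform(k):
    """Maximum channel load for XY routing under uniform traffic on a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    share = Fraction(1, len(nodes))  # each source sends 1/N to every node
    load = defaultdict(Fraction)
    for s in nodes:
        for d in nodes:
            for ch in xy_channels(s, d):
                load[ch] += share
    return max(load.values())


for k in (4, 6, 8):
    print(k, max_load_uniform(k), Fraction(k, 4))
```

For even K the two printed values agree; the most loaded channels are the ones crossing the bisection.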

Problem with DOR (XY) Routing

• Minimal hop count.

• But in the worst case, links can become heavily congested, e.g., under the transpose traffic pattern:

(0, 0) -> (3, 3)

(1, 0) -> (3, 2)

(2, 0) -> (3, 1)

• In the worst case, the maximum channel load is:

ϒwc(DOR) = K – 1 >> ϒ(DOR, uniform)
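The K – 1 load can be reproduced by counting how many flows share each channel. This sketch assumes (x, y) coordinates, one unit of traffic per flow, and that the three example flows generalize to the mapping (x, y) -> (K-1-y, K-1-x); that mapping is inferred from the examples rather than stated on the slide, and the helper names are hypothetical.

```python
from collections import Counter


def xy_channels(src, dst):
    """Yield the directed channels (node, next_node) used by minimal XY routing."""
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        yield (x, y), (nx, y)
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        yield (x, y), (x, ny)
        y = ny


def max_load_dor(k):
    """Largest number of flows sharing one channel under XY routing."""
    load = Counter()
    for x in range(k):
        for y in range(k):
            dst = (k - 1 - y, k - 1 - x)  # assumed generalization of the examples
            load.update(xy_channels((x, y), dst))
    return max(load.values())


print(max_load_dor(4))  # the three example flows all use channel (2, 0) -> (3, 0)
```

All flows that start in row 0 must travel along that row to column K-1 before turning, so the last channel of the row carries K – 1 flows.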


Valiant Load-Balancing (VAL) [1981]

• Minimal XY routing to a randomly chosen intermediate node (any node in the mesh), then minimal XY routing to the destination node.

[Figure: route from source S to destination D via a randomly chosen intermediate node]
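The two-phase structure can be sketched as follows. This is an illustration only: it assumes (x, y) coordinates, a uniformly random intermediate node, and hypothetical helper names `xy_route` and `val_route`.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing from src to dst."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def val_route(src, dst, k, rng=random):
    """Route via a random intermediate node anywhere in the k x k mesh."""
    mid = (rng.randrange(k), rng.randrange(k))
    # Phase 1: src -> mid; phase 2: mid -> dst (drop the duplicated mid node).
    return mid, xy_route(src, mid) + xy_route(mid, dst)[1:]


mid, path = val_route((0, 0), (3, 3), 4, random.Random(0))
print(mid, len(path) - 1, "hops")
```

Since the intermediate node may lie anywhere in the mesh, the resulting path is generally non-minimal.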


Valiant Load-Balancing (VAL)

• Works by turning any traffic pattern, even an adversarial or worst-case one, into 2 phases of uniform traffic.

• In effect, it evenly load-balances the traffic.

• Worst-case channel load:

ϒwc(VAL) = 2 * ϒ(DOR, uniform) = 2 * (K/4) = K/2

which corresponds to 1/2 of the network capacity, where capacity is measured relative to DOR under uniform traffic.


Valiant Load-Balancing (VAL)

• The worst-case throughput of VAL, normalized to network capacity, is 1/2.

• However, the average hop count is 2X that of DOR.

• 1/2 capacity was thought to be the optimal worst-case throughput for any routing algorithm.
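The 1/2 figure follows from the channel loads given on the earlier slides. The sketch below assumes that normalized throughput is the uniform-traffic DOR load divided by the worst-case load, and that K is even so that the K/4 formula applies; `normalized_throughput` is a hypothetical helper name.

```python
from fractions import Fraction


def normalized_throughput(k, worst_case_load):
    """Worst-case throughput as a fraction of capacity (even k assumed)."""
    uniform_load = Fraction(k, 4)  # load(DOR, uniform) = K/4 from the slides
    return uniform_load / Fraction(worst_case_load)


k = 4
val = normalized_throughput(k, Fraction(k, 2))  # worst-case load of VAL: K/2
dor = normalized_throughput(k, k - 1)           # worst-case load of DOR: K - 1
print(val, dor)
```

Under this definition VAL stays at 1/2 for every even K, while the DOR value shrinks as K grows.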


ROMM [1995]

• Minimal XY routing to a randomly chosen intermediate node, picked only in the minimal direction to the destination, then minimal XY routing to the final destination node.

[Figure: route from source S to destination D via an intermediate node randomly chosen only in the minimal direction to the destination]
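A minimal sketch of this two-phase scheme, assuming (x, y) coordinates and reading "in the minimal direction" as "uniformly at random inside the rectangle spanned by source and destination"; that reading and the helper names are assumptions, not taken from the slide.

```python
import random


def xy_route(src, dst):
    """Nodes visited by minimal XY routing from src to dst."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def romm_route(src, dst, rng=random):
    """Route via a random intermediate node inside the minimal rectangle."""
    lo_x, hi_x = sorted((src[0], dst[0]))
    lo_y, hi_y = sorted((src[1], dst[1]))
    mid = (rng.randint(lo_x, hi_x), rng.randint(lo_y, hi_y))  # bounds inclusive
    return mid, xy_route(src, mid) + xy_route(mid, dst)[1:]


mid, path = romm_route((0, 0), (3, 3), random.Random(0))
print(mid, len(path) - 1, "hops")
```

Because the intermediate node never lies outside the rectangle, every path has exactly the minimal hop count.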


ROMM

• Tries to load-balance traffic by randomly distributing it along all possible minimal paths.

• The minimal number of hops is guaranteed, which is good.

• However, it turns out that in the worst case, ROMM performs about as badly as DOR.


O1TURN [2005]

• Uses both minimal XY and minimal YX routing to the destination, each with probability 0.5 (0.5 XY + 0.5 YX).

[Figure: XY and YX routes from source S to destination D]
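The selection rule can be sketched as a coin flip between the two dimension orders. The code assumes (x, y) coordinates; `dor_route` and `o1turn_route` are hypothetical helper names.

```python
import random


def dor_route(src, dst, order):
    """Minimal dimension-ordered route; order is "xy" or "yx"."""
    pos = list(src)
    path = [tuple(pos)]
    dims = (0, 1) if order == "xy" else (1, 0)
    for dim in dims:
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path


def o1turn_route(src, dst, rng=random):
    """Pick the XY or YX route with probability 0.5 each."""
    order = "xy" if rng.random() < 0.5 else "yx"
    return order, dor_route(src, dst, order)


order, path = o1turn_route((0, 0), (3, 3), random.Random(1))
print(order, path)
```

Both candidate routes are minimal and contain at most one turn.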


O1TURN

• Even though it considers only the XY and YX paths, rather than all possible paths as in VAL or all possible minimal paths as in ROMM, it is guaranteed to achieve 1/2 capacity in the even radix case, which has been shown to be optimal.

• For the odd radix case, O1TURN is very near the optimal 1/2 capacity.

• Unlike VAL, O1TURN uses only minimal routing paths, so there is no penalty in hop count.
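To see the load-balancing effect on a concrete pattern, the sketch below compares the maximum expected channel load of DOR and O1TURN on the transpose-style pattern used earlier for DOR. It assumes (x, y) coordinates, one unit of traffic per source, and the mapping (x, y) -> (K-1-y, K-1-x) inferred from the example flows; all names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def dor_channels(src, dst, order):
    """Yield directed channels of the minimal route in the given dimension order."""
    pos = list(src)
    dims = (0, 1) if order == "xy" else (1, 0)
    for dim in dims:
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            prev = tuple(pos)
            pos[dim] += step
            yield prev, tuple(pos)


def max_expected_load(k, weights):
    """weights maps a dimension order ("xy" or "yx") to its selection probability."""
    load = defaultdict(Fraction)
    for x in range(k):
        for y in range(k):
            dst = (k - 1 - y, k - 1 - x)  # assumed traffic pattern
            for order, p in weights.items():
                for ch in dor_channels((x, y), dst, order):
                    load[ch] += p
    return max(load.values())


k = 4
dor = max_expected_load(k, {"xy": Fraction(1)})
o1turn = max_expected_load(k, {"xy": Fraction(1, 2), "yx": Fraction(1, 2)})
print(dor, o1turn)
```

For K = 4 this gives a maximum load of 3 for DOR and 3/2 for O1TURN: splitting each flow across the two orders halves the load on the most congested channel without lengthening any path.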


Comparison

[Figure: comparison of VAL, DOR, ROMM, and O1TURN, with axis ticks from 0.1 to 0.5. Figure annotation: even radix: Opt * 1; odd radix: Opt * (1 - 1/K^2)]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


Simulation Results

• 4 x 4 2D mesh, uniform random traffic pattern

[Figure: average latency (cycles), 0 to 200, versus throughput (flits / node / cycle), 0 to 1, for DOR, ROMM, O1TURN, and DUATO]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


Simulation Results

• 4 x 4 2D mesh, matrix transpose traffic pattern
  – One of the worst-case traffic patterns for DOR

[Figure: average latency (cycles), 0 to 200, versus throughput (flits / node / cycle), 0 to 1, for DOR, ROMM, O1TURN, and DUATO]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


Simulation Results

• 4 x 4 2D mesh, bit complement traffic pattern
  – Already balanced traffic pattern

[Figure: average latency (cycles), 0 to 200, versus throughput (flits / node / cycle), 0 to 1, for DOR, ROMM, O1TURN, and DUATO]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


Simulation Results

• 4 x 4 2D mesh, hot spot traffic pattern
  – 2 nodes have 20% of traffic

[Figure: average latency (cycles), 0 to 200, versus throughput (flits / node / cycle), 0 to 1, for DOR, ROMM, O1TURN, and DUATO]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


Simulation Results

• Delay penalty of adaptive routing
  – How the complexity of the router implementation affects latency
  – Hot spot traffic pattern

[Figure: average latency (FO4), 0 to 2000, versus throughput (flits / node / cycle), 0 to 1, for DOR, ROMM, O1TURN, and DUATO]

[adapted from Seo et al., O1TURN talk, ISCA 2005]


U2TURN [2012]

• 1/2 capacity had been thought to be the optimal worst-case throughput for both odd and even radices, and O1TURN is the state of the art for achieving it in the even radix case.

• However, it turns out that 1/2 capacity is not optimal for the odd radix case.


U2TURN

• U2TURN considers all possible XYX and YXY 2-turn paths, and selects among these paths with equal probability.

• XYX paths: randomly select a node on the same row as the source and route to it, followed by minimal YX routing to the final destination.

• YXY paths: randomly select a node on the same column as the source and route to it, followed by minimal XY routing to the final destination.
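One possible reading of the path selection above, as a sketch only: choose XYX or YXY with probability 0.5, then pick the intermediate node uniformly along the source's row or column. The (x, y) coordinates, this two-step sampling, and the helper names `dor_route` and `u2turn_route` are assumptions, not details from the slides.

```python
import random


def dor_route(src, dst, order):
    """Minimal dimension-ordered route; order is "xy" or "yx"."""
    pos = list(src)
    path = [tuple(pos)]
    dims = (0, 1) if order == "xy" else (1, 0)
    for dim in dims:
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path


def u2turn_route(src, dst, k, rng=random):
    """Return (kind, intermediate node, path) for one randomly chosen 2-turn path."""
    if rng.random() < 0.5:
        kind = "XYX"
        mid = (rng.randrange(k), src[1])  # node on the source's row
        path = dor_route(src, mid, "xy") + dor_route(mid, dst, "yx")[1:]
    else:
        kind = "YXY"
        mid = (src[0], rng.randrange(k))  # node on the source's column
        path = dor_route(src, mid, "yx") + dor_route(mid, dst, "xy")[1:]
    return kind, mid, path


kind, mid, path = u2turn_route((0, 0), (3, 3), 4, random.Random(2))
print(kind, mid, len(path) - 1, "hops")
```

When the intermediate node falls outside the span between source and destination, the path overshoots and comes back, which is where the non-minimal hops come from.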


Analytical Results

• For the even radix case, the worst-case capacity of U2TURN = 1/2, the same as VAL and O1TURN, which is optimal.

• But for the odd radix case, the worst-case capacity of U2TURN =

(K+1)/(2K+1) > 1/2

which is better than any existing routing algorithm.
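The two cases can be evaluated for small radices with exact fractions. This is a direct transcription of the formulas above; `u2turn_worst_case` is a hypothetical helper name.

```python
from fractions import Fraction


def u2turn_worst_case(k):
    """Worst-case capacity of U2TURN for a k x k mesh, per the formulas above."""
    if k % 2 == 0:
        return Fraction(1, 2)
    return Fraction(k + 1, 2 * k + 1)


for k in (3, 4, 5, 6, 7):
    value = u2turn_worst_case(k)
    print(k, value, round(float(value), 3))
```

The odd radix values exceed 1/2 but approach it as K grows, so the advantage is largest for small meshes.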


Worst-Case Throughput

[Figure: worst-case throughput of VAL, DOR, ROMM, O1TURN, U2TURN, and optimal routing]


Throughput Comparison for Odd Radix

3X3 mesh            VAL    DOR     O1TURN   U2TURN
Worst-case          0.5    0.33    0.44     0.57
Average-case        0.5    0.405   0.477    0.604
Transpose           0.5    0.33    0.67     0.8
Random              0.5    1       1        0.72
DOR-WC              0.5    0.33    0.67     0.8
Complement          0.5    0.67    0.67     0.57
Nearest-Neighbor    0.5    1.33    1.33     0.75

5X5 mesh            VAL    DOR     O1TURN   U2TURN
Worst-case          0.5    0.3     0.48     0.55
Average-case        0.5    0.44    0.53     0.632
Transpose           0.5    0.3     0.6      0.75
Random              0.5    1       1        0.685
DOR-WC              0.5    0.3     0.6      0.75
Complement          0.5    0.6     0.6      0.55
Nearest-Neighbor    0.5    2.4     2.4      1.17


Throughput Comparison for Even Radix

4X4 mesh            VAL    DOR     O1TURN   U2TURN
Worst-case          0.5    0.33    0.5      0.5
Average-case        0.5    0.48    0.54     0.64
Transpose           0.5    0.33    0.67     0.8
Random              0.5    1       1        0.7
DOR-WC              0.5    0.33    0.67     0.8
Complement          0.5    0.5     0.5      0.5
Nearest-Neighbor    0.5    2       2        1.1

6X6 mesh            VAL    DOR     O1TURN   U2TURN
Worst-case          0.5    0.3     0.5      0.5
Average-case        0.5    0.47    0.556    0.65
Transpose           0.5    0.3     0.6      0.75
Random              0.5    1       1        0.682
DOR-WC              0.5    0.3     0.6      0.75
Complement          0.5    0.5     0.5      0.5
Nearest-Neighbor    0.5    3       3        1.27


Latency

• A potential concern about U2TURN is that it uses non-minimal routing paths.

• However, U2TURN does a better job of load-balancing traffic than O1TURN for difficult traffic patterns.

• Hence, the queuing delay can be very high for O1TURN on difficult traffic patterns, resulting in longer latency despite the fewer hops.

• Surprisingly, U2TURN's latency is better in both the odd and even radix cases for difficult traffic patterns.


Latency for 7 x 7 Mesh


Latency for 8 x 8 Mesh


References

• Valiant [L. G. Valiant et al., ACM 1981]
• ROMM [T. Nesson et al., ACM 1995]
• O1TURN [D. Seo et al., ISCA 2005]
• U2TURN [G. Sun et al., ICCD 2012]
