O1TURN : Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks

DaeHo Seo, Akif Ali, WonTaek Lim, Nauman Rafique, Mithuna Thottethodi

School of Electrical and Computer Engineering, Purdue University

June 08 2005 Purdue University 2

Motivation

• New routing algorithm for 2D Mesh networks : O1TURN

• Why 2D Mesh networks?
  – Important class of interconnection network
  – Natural topology for on-chip network
  – Many applications

• “yet another routing algorithm”?

Routing Algorithms: Objectives

• Maximize throughput and minimize latency

• O1TURN satisfies all design goals

IDEAL DOR ROMM VALIANT MIN-ADAPTIVE

Average case throughput X X X

Worst case Throughput X X ?

Minimal # of network hops X X X X

Low complexity router X X X

Challenges

• Intuition: path flexibility, load balancing, and throughput are correlated
• Prior results
  – Throughput : Increasing path flexibility [SPAA 2002]
    • May not improve worst-case throughput, and may even decrease it
    • Likely to improve average-case throughput
  – Latency : Increasing path flexibility may increase router complexity

IDEAL DOR ROMM VALIANT MIN-ADAPTIVE

Average case throughput X X X

Worst case Throughput X X ?

Minimal # of network hops X X X X

Low complexity router X X X

# of Paths ? 1 Θ(K'^2) Θ(K^2) Θ(2^K')

Contributions

• Develop new routing algorithm : O1TURN
• Throughput
  – Better than DOR / ROMM for worst-case throughput
    • Near-optimal worst-case throughput for 2D Mesh
  – Captures most of the “opportunity” with limited path flexibility for average-case throughput
    • O1TURN (with 2 paths) as good as ROMM (with Θ(K'^2) paths)
• Latency
  – Router implementation for O1TURN
    • Complexity comparable to a simple DOR router
    • Key point : partition the delay-critical circuitry
• O1TURN is minimal : one goal trivially satisfied

Outline

• Background of interconnection network

• O1TURN routing algorithm

• O1TURN router implementation

• Simulation Results

• Conclusion and Q&A

Background

• Packet-switched, 2D mesh network
  – Each packet independently routed
• Terminology
  – Network Radix = k in a k x k network (NOT Degree)
• Simplifying assumptions for this talk
  – One packet crosses a link in one cycle
  – Square mesh networks (K x K)
  – K is even (K = 2p)
• Analytical method for throughput analysis
  – TD Method [Towles and Dally, SPAA 2002]
  – Worst-case throughput = (Maximum channel load)^-1
  – Given permutation and (oblivious) routing algorithm
    • Find maximum channel load
  – Given only (oblivious) routing algorithm
    • Find permutation that causes maximum channel load

TD-Method Example

• 2 x 2 network: nodes A, B (top row) and C, D (bottom row)
• Traffic (Src -> Dst): A -> D, D -> A
• One path per pair: A -> B -> D, D -> C -> A
  – Load on each used channel = 1
  – Max Channel Load = 1
  – Worst-case Throughput = (1 / 1) = 1
• Two paths per pair: A -> B -> D, A -> C -> D and D -> B -> A, D -> C -> A
  – Load on each used channel = 0.5
  – Max Channel Load = 0.5
  – Worst-case Throughput = (1 / 0.5) = 2
• Unit of worst-case throughput = packets / node / cycle
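The two cases in this example can be checked with a few lines of Python. This is only a sketch of the channel-load bookkeeping, not the full TD method: the function names are made up for illustration, each source is assumed to inject one packet per cycle, and the two-path case is assumed to split traffic 50/50 between its paths.

```python
from collections import defaultdict

def channel_loads(flows):
    """Sum the expected load on each directed channel.

    flows: list of (path, probability) pairs, where path is a sequence of
    node names and each source injects one packet per cycle.
    """
    loads = defaultdict(float)
    for path, prob in flows:
        for u, v in zip(path, path[1:]):
            loads[(u, v)] += prob
    return dict(loads)

def worst_case_throughput(flows):
    """Throughput = 1 / (maximum channel load), as on the slide."""
    return 1.0 / max(channel_loads(flows).values())

# Traffic: A -> D and D -> A on the 2 x 2 network (A B / C D).
single_path = [("ABD", 1.0), ("DCA", 1.0)]
two_paths = [("ABD", 0.5), ("ACD", 0.5), ("DBA", 0.5), ("DCA", 0.5)]

print(worst_case_throughput(single_path))  # one path per pair
print(worst_case_throughput(two_paths))    # two paths per pair
```

With one path per pair every used channel carries load 1; splitting each flow over two paths halves the load on each channel and doubles the throughput, matching the numbers in the example.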

O1TURN routing algorithm

• Orthogonal 1 TURN routing
  – There is no U-TURN => Orthogonal
  – At most 1 turn => 1TURN
• Use 2 routes
  – At most 2 minimal, 1-turn routes in 2D MESH (XY, YX)
  – Two routing algorithms (XY routing, YX routing)
  – With same probability
[Figure: source S and destination D joined by the two one-turn routes, labeled 1 and 2]
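The route selection described on this slide can be sketched as follows. This is an illustrative sketch, not the authors' implementation: nodes are assumed to be (x, y) tuples, and the names `dor_path`, `o1turn_path`, and `count_turns` are hypothetical.

```python
import random

def dor_path(src, dst, order):
    """Dimension-order path from src to dst, both (x, y) tuples.

    order="XY" corrects x first, then y; order="YX" does the opposite.
    """
    pos = list(src)
    path = [tuple(pos)]
    for dim in ((0, 1) if order == "XY" else (1, 0)):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path

def o1turn_path(src, dst, rng=random):
    """Pick the XY or the YX route with equal probability."""
    return dor_path(src, dst, rng.choice(["XY", "YX"]))

def count_turns(path):
    """Number of times the direction of travel changes along a path."""
    dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for d, e in zip(dirs, dirs[1:]) if d != e)

route = o1turn_path((0, 0), (2, 3))
print(route, count_turns(route))
```

Whichever route is drawn, the path length equals the Manhattan distance between source and destination and the path turns at most once, which is what makes the algorithm both minimal and "1TURN".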

O1TURN routing algorithm

• Claim: Maximum channel load of O1TURN is K / 2
• Proof: Two sources of load contributions
  – # of nodes on left side of channel by XY routing
  – # of nodes on right side of channel by YX routing
[Figure: two mesh diagrams with channel C marked; XY routing contributes N * 0.5, YX routing contributes (K - N) * 0.5]
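The two contributions can be added up explicitly. The following is a sketch of the argument for a horizontal channel C with N nodes of its row on one side and K - N on the other; the symbol γ(C) for the load on C is notation introduced here, not taken from the slides. It assumes permutation traffic, so each node sends and receives at most one packet per cycle, and each flow puts half of its traffic on each route. Under XY routing only the N sources upstream of C in its row can reach C; under YX routing only packets destined for the K - N nodes downstream of C in its row can reach it.

```latex
\gamma(C) \;\le\;
  \underbrace{\tfrac{1}{2}\,N}_{\text{XY routes}}
  \;+\;
  \underbrace{\tfrac{1}{2}\,(K - N)}_{\text{YX routes}}
  \;=\; \frac{K}{2}
```

The N terms cancel, so the bound does not depend on where C sits in its row.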

Optimal Worst Case Throughput

• Maximum channel load = K / 2
  – Worst-case Throughput = 2 / K by TD Method
• Consider a permutation where 100% of packets cross the bisection
  – Throughput (X) bounded when bisection links saturated
  – X * (K^2 / 2) = K
  – X = 2 / K packets / node / cycle
• When K is odd, O1TURN is within (1 / K^2) of optimal worst-case throughput
[Figure: K x K mesh]
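The bisection bound can be spelled out term by term. This sketch assumes even K, one packet per link per cycle (as stated in the background assumptions), and counts traffic in one direction across the cut: K^2 / 2 nodes on one side each inject X packets per cycle, and K links carry them across.

```latex
\underbrace{X \cdot \frac{K^{2}}{2}}_{\text{packets per cycle crossing the bisection}}
\;\le\;
\underbrace{K}_{\text{bisection links in that direction}}
\quad\Longrightarrow\quad
X \;\le\; \frac{2}{K}
```

Since the TD-method result for O1TURN is also 2 / K, the algorithm meets this upper bound for even K.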

Worst-case Throughput Trends

• Worst-case channel load as network size changes
  – Normalized to optimal worst-case throughput
  – Worst-case throughput of DOR, ROMM degrades with K
• Recall
  – Even Radix : Opt * 1
  – Odd Radix : Opt * (1 - 1 / K^2)
[Figure: Normalized Throughput (0 to 1) vs. Network Radix (k = 2 to 16) for OPTIMAL, DOR, ROMM, O1TURN]

Average Case Analysis

• Extension of TD method [B. Towles et al., SPAA 2003]
  – Examine randomly chosen permutations
  – Harmonic means of worst-case throughput of various permutations
  – 1 M random permutations
• O1TURN shows better or equal average-case throughput

Average case throughput:
              DOR    ROMM    O1TURN
4 x 4 2D MESH 1      1.113   1.136
8 x 8 2D MESH 1      1.180   1.188
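The averaging procedure can be sketched in Python for DOR and O1TURN. This is a simplified model under stated assumptions, not the authors' tool: only network channels are counted, XY routing stands in for DOR, far fewer than 1 M permutations are sampled, ROMM is omitted, and all function names are hypothetical. It is not expected to reproduce the exact figures in the table.

```python
import random
from collections import defaultdict

def dor_path(src, dst, order):
    """Dimension-order path between (x, y) nodes; order is "XY" or "YX"."""
    pos = list(src)
    path = [tuple(pos)]
    for dim in ((0, 1) if order == "XY" else (1, 0)):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path

def max_channel_load(perm, routes):
    """perm maps source -> destination; routes is [(order, probability)]."""
    loads = defaultdict(float)
    for src, dst in perm.items():
        for order, prob in routes:
            path = dor_path(src, dst, order)
            for hop in zip(path, path[1:]):
                loads[hop] += prob
    return max(loads.values(), default=0.0)

def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)

def average_case_throughput(k, routes, samples, seed=0):
    """Harmonic mean of 1 / (max channel load) over random permutations."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(k) for y in range(k)]
    throughputs = []
    for _ in range(samples):
        dests = nodes[:]
        rng.shuffle(dests)
        load = max_channel_load(dict(zip(nodes, dests)), routes)
        if load > 0:  # skip a permutation that loads no channel
            throughputs.append(1.0 / load)
    return harmonic_mean(throughputs)

DOR = [("XY", 1.0)]
O1TURN = [("XY", 0.5), ("YX", 0.5)]
dor = average_case_throughput(4, DOR, samples=1000)
o1 = average_case_throughput(4, O1TURN, samples=1000)
print(o1 / dor)  # O1TURN relative to DOR on the same sampled permutations
```

Using the same seed for both algorithms evaluates them on identical permutations, so the printed ratio compares like with like.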

O1TURN Summary

• Near-optimal worst-case throughput
  – By TD method
  – Optimal for even K
  – Approaches optimal for large, odd K
• Average case throughput
  – Better than DOR and comparable to ROMM
• Minimal # of network hops
  – O1TURN is minimal routing

Base Router Implementation

• Base Router : Pipelined Virtual Channel Router
  – 4 Stages : Routing, Virtual Channel allocation, Switch allocation, Crossbar & Physical Channel transfer
  – One control block controls all virtual channels
  – Critical Stage : Virtual Channel allocation stage
[Figure: base router with ports INJECT, X+, X-, Y+, Y-, EJECT (VC ID on the inputs), a 5 x 5 crossbar, one control block (Routing Algorithm, VC Allocation, Switch Allocation), and CREDITS IN / CREDITS OUT for all PCs and VCs]

O1TURN Router Implementation

• O1TURN Router
  – Separate Virtual Channels into two virtual networks (VN)
  – One VN for XY routing, the other for YX routing
  – Deadlock prevention in each independent VN due to DOR
[Figure: O1TURN router with ports INJECT, X+, X-, Y+, Y-, EJECT (VC ID on the inputs), a 5 x 5 crossbar, shared Switch Allocation, and two control blocks: Routing (XY) + VC Allocation with CREDITS IN / CREDITS OUT for all PCs and XY VCs, and Routing (YX) + VC Allocation with CREDITS IN / CREDITS OUT for all PCs and YX VCs]
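The virtual-network split can be sketched as a small bookkeeping routine. The details here are assumptions made for illustration, not taken from the slides: VCs are split evenly, ids are assigned contiguously, and the function names are hypothetical.

```python
import random

def partition_vcs(num_vcs):
    """Split VC ids 0..num_vcs-1 into an XY and a YX virtual network."""
    if num_vcs < 2 or num_vcs % 2:
        raise ValueError("need an even number of VCs, at least 2")
    half = num_vcs // 2
    return {"XY": list(range(half)), "YX": list(range(half, num_vcs))}

def choose_virtual_network(rng=random):
    """At injection, pick XY or YX with equal probability; the packet keeps it."""
    return rng.choice(["XY", "YX"])

def candidate_vcs(packet_vn, vns):
    """VCs a packet may request at any hop: only those of its own network."""
    return vns[packet_vn]

vns = partition_vcs(8)
vn = choose_virtual_network()
print(vn, candidate_vcs(vn, vns))
```

Because a packet never leaves its virtual network, each VC allocator only has to consider the VCs of one network, and each network inherits the deadlock freedom of dimension-order routing.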

Delay Analysis

• Existing router delay models for pipelined routers
  – Peh and Dally [HPCA 2001]
• Based on the logical effort method
  – [I. Sutherland, B. Sproull, 1999]
  – FO4 unit
• Complexity comparable to DOR router

Stage delay (FO4):
VCs / PC   DOR VC allocation   DOR SW allocation   O1TURN VC allocation   O1TURN SW allocation
4          17                  14                  14                     14
8          20                  16                  17                     16
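The table can be read off programmatically to compare the slower of the two listed stages for each router. This is a small sketch: the dictionary layout and the function name are hypothetical, the numbers are copied from the table, and treating the slower listed stage as the one that matters is an assumption based on the earlier statement that VC allocation is the critical stage of the base router.

```python
# Stage delays in FO4, copied from the Delay Analysis table.
DELAYS = {
    4: {"DOR": {"vc_alloc": 17, "sw_alloc": 14},
        "O1TURN": {"vc_alloc": 14, "sw_alloc": 14}},
    8: {"DOR": {"vc_alloc": 20, "sw_alloc": 16},
        "O1TURN": {"vc_alloc": 17, "sw_alloc": 16}},
}

def slowest_stage(vcs_per_pc, router):
    """Return (stage, delay) for the slower of the two listed stages."""
    stages = DELAYS[vcs_per_pc][router]
    return max(stages.items(), key=lambda kv: kv[1])

for vcs in (4, 8):
    for router in ("DOR", "O1TURN"):
        print(vcs, router, slowest_stage(vcs, router))
```

In the table, O1TURN with 8 VCs per PC has the same VC allocation delay (17 FO4) as DOR with 4 VCs per PC, which is consistent with each O1TURN allocator handling only half of the VCs.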

O1TURN Summary

• Near-optimal worst-case throughput
• Good average-case throughput
• Minimal network hops
• Low-complexity router implementation
  – Complexity comparable to DOR router

IDEAL O1TURN

Average case throughput X X

Worst case Throughput X X

Minimal # of network hops X X

Low complexity router X X

Evaluation Method

• Modified Popnet network simulator [L. Shang, 2003]
• 4x4 2D MESH (8x8 in paper)
• Full-duplex, bidirectional links
• 8 VCs per PC
• 5 flits per packet
• 500 K cycles
• Synthetic traffic: Uniform Random, Bit Complement (BC), Matrix Transpose (MT), HOT SPOT
• Compared with existing routing algorithms
  – Oblivious routing algorithms (DOR, ROMM)
  – Adaptive routing algorithm (DUATO)

Simulation Results

• 4 x 4 2D MESH – Uniform Random Traffic Pattern
[Figure: Average Latency (cycle, 0 to 200) vs. Throughput (flits / node / cycle, 0 to 1) for DOR, ROMM, O1TURN, DUATO]

Simulation Results

• 4 x 4 2D MESH – Matrix Transpose Traffic Pattern
  – One of the worst-case traffic patterns for DOR
[Figure: Average Latency (cycle, 0 to 200) vs. Throughput (flits / node / cycle, 0 to 1) for DOR, ROMM, O1TURN, DUATO]
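Why matrix transpose is hard on DOR can be seen from channel loads alone. The sketch below is an analytical check, not the Popnet simulation: it assumes node (x, y) sends to node (y, x), one packet per node per cycle, XY routing for DOR, and counts only network channels. The function names are hypothetical.

```python
from collections import defaultdict

def dor_hops(src, dst, order):
    """Yield the directed hops of a dimension-order route between (x, y) nodes."""
    pos = list(src)
    for dim in ((0, 1) if order == "XY" else (1, 0)):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            prev = tuple(pos)
            pos[dim] += step
            yield prev, tuple(pos)

def max_load_transpose(k, routes):
    """Maximum channel load when node (x, y) sends to node (y, x)."""
    loads = defaultdict(float)
    for x in range(k):
        for y in range(k):
            for order, prob in routes:
                for hop in dor_hops((x, y), (y, x), order):
                    loads[hop] += prob
    return max(loads.values())

k = 4
print(max_load_transpose(k, [("XY", 1.0)]))               # DOR
print(max_load_transpose(k, [("XY", 0.5), ("YX", 0.5)]))  # O1TURN
```

For k = 4 the maximum load is 3 under DOR and 1.5 under O1TURN in this model: on any given channel only the XY half or the YX half of the traffic shows up, so the heaviest channel carries half as much.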

Simulation Results

• 4 x 4 2D MESH – Bit Complement Traffic Pattern
  – Already balanced traffic pattern
[Figure: Average Latency (cycle, 0 to 200) vs. Throughput (flits / node / cycle, 0 to 1) for DOR, ROMM, O1TURN, DUATO]

Simulation Results

• 4 x 4 2D MESH – HOT SPOT Traffic Pattern
  – 2 nodes have 20% of traffic
[Figure: Average Latency (cycle, 0 to 200) vs. Throughput (flits / node / cycle, 0 to 1) for DOR, ROMM, O1TURN, DUATO]

Simulation Results

• Delay penalty of adaptive routing
  – How router implementation complexity affects latency
  – Hot Spot Traffic Pattern
[Figure: Average Latency (FO4, 0 to 2000) vs. Throughput (flits / node / cycle, 0 to 1) for DOR, ROMM, O1TURN, DUATO]

Related Work

• Routing algorithms
  – Valiant [L.G. Valiant et al., ACM 1981]
  – ROMM [T. Nesson et al., ACM 1995]
  – DUATO [J. Duato et al., 1993]
• Partitioned router implementation
  – Mad Postman [Jesshope et al., ISCA 1989]
  – PFNF [Upadhyay et al., 1997]
• Analysis methods
  – Worst-case [B. Towles et al., 2002]
  – Throughput centric [B. Towles et al., 2003]
  – Delay model [L.S. Peh et al., HPCA 2001]

Conclusion

• Goals
  – Good average-case throughput
  – Good or optimal worst-case throughput
  – Minimal # of network hops
  – Low-complexity router implementation
• O1TURN
  – Provides near-optimal worst-case throughput
  – Provides better or equal average-case throughput compared with existing routing algorithms
  – Minimal # of network hops
  – Simple router implementation : comparable with DOR router
  – Satisfies all performance aspects

Q & A

