Lecture 3: Topology - II
Tushar Krishna
Assistant Professor
School of Electrical and Computer Engineering
Georgia Institute of Technology
[email protected]
ECE 8823 A / CS 8803 - ICN Interconnection Networks
Spring 2017
http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/
Topology: How to connect the nodes with links

January 18, 2017 | ICN | Spring 2017 | L03: Topology - II © Tushar Krishna, School of ECE, Georgia Tech

~Road Network

Run-Time Metrics
- Hop Count
  - Latency
- Maximum Channel Load
  - Throughput

Maximum Channel Load
- Identify channel with maximum traffic
  - Count total flows through it
- Maximum Throughput = 1 / (max channel load)

Maximum Channel Load

[Figure: eight nodes, A-H, forming two four-node rings joined by a single channel]
- Identify bottleneck channel
  - For uniform random traffic, it is the bisection channel
- Suppose each node generates p messages per cycle
  - 4p messages per cycle are generated in the left ring
  - 2p messages per cycle will cross to the other ring
  - The link can handle one message per cycle
  - So 2p <= 1, giving a maximum injection rate of p = 1/2
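
The bottleneck argument above can be checked in a few lines of Python. This is only a sketch: the function name and its parameters are illustrative, and it assumes (as the 2p figure implies) that half of each node's traffic is destined for the other ring.

```python
from fractions import Fraction

def max_injection_rate(nodes_per_ring, cross_fraction, link_capacity=1):
    """Largest per-node injection rate p the single inter-ring link can sustain.

    Traffic crossing the link per cycle = nodes_per_ring * p * cross_fraction,
    which must not exceed link_capacity.
    """
    crossing_per_unit_p = nodes_per_ring * Fraction(cross_fraction)
    return Fraction(link_capacity) / crossing_per_unit_p

# 4 nodes per ring, half of the traffic crosses to the other ring.
p_max = max_injection_rate(4, Fraction(1, 2))
print(p_max)  # 1/2
```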

Maximum Channel Load

- What if Hot Spot Traffic?
  - Suppose every node sends to node G
- Which is the bottleneck channel?
  - Used by A, B, C, D, E, and F to send to G
  - Max Throughput = 1/6

[Figure: network with nodes A-I, with hot-spot destination G]

Maximum Channel Load

[Figure: 8-node ring, nodes 0 to 7]

With uniform random traffic
- 3 sends 1/8 of its traffic to each of 4, 5, 6
- 3 sends 1/16 of its traffic to 7 over this channel (2 possible shortest paths, so the 1/8 share is split)
- 2 sends 1/8 of its traffic to each of 4, 5
- Etc.

Max channel load = 1
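
The per-channel bookkeeping above can be enumerated mechanically. The sketch below is illustrative, not part of the lecture: it assumes a bidirectional ring, that each node sends 1/n of its traffic to every node (itself included), and that traffic is split evenly when both directions are equally short.

```python
from fractions import Fraction

def ring_channel_loads(n):
    """Load on every directed channel of an n-node bidirectional ring.

    Assumes uniform random traffic (1/n of each node's traffic to every
    node, itself included) and shortest-path routing, with an even split
    when both directions are equally short.
    """
    loads = {}  # (from_node, to_node) of a directed channel -> load
    for s in range(n):
        for d in range(n):
            if s == d:
                continue  # self-traffic uses no channel
            cw = (d - s) % n   # hops going clockwise
            ccw = n - cw       # hops going counter-clockwise
            share = Fraction(1, n)
            if cw < ccw:
                routes = [(+1, share)]
            elif ccw < cw:
                routes = [(-1, share)]
            else:
                routes = [(+1, share / 2), (-1, share / 2)]
            for step, weight in routes:
                hops = cw if step == +1 else ccw
                node = s
                for _ in range(hops):
                    nxt = (node + step) % n
                    loads[(node, nxt)] = loads.get((node, nxt), 0) + weight
                    node = nxt
    return loads

loads = ring_channel_loads(8)
print(max(loads.values()))  # 1
```

By symmetry every channel carries the same load, so the maximum equals the load on the 3-to-4 channel discussed above.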

Traffic Patterns
- Historically derived from particular applications of interest
- Important to stress test the network with different patterns
  - Uniform random can make bad topologies look good
- For a particular topology and traffic pattern, one can derive
  - Avg Hop Count (→ Low-Load Latency)
  - Max Channel Load (→ Peak Throughput)

Is it possible to achieve derived low-load latency & peak throughput?

[Figure: latency vs. offered traffic (bits/sec). Latency floors: min latency given by topology; min latency given by routing algorithm; zero-load latency (topology + routing + flow control). Throughput limits: throughput given by topology; throughput given by routing; throughput given by flow control.]

Uniform Random Traffic on a k x k Mesh

Zero-load latency? ("Ideal Latency")
T = (H+1) * (t_router + t_stall_avg) + (H+2) * t_wire + T_ser

H = number of hops inside network
t_router = per-hop router pipeline delay
t_wire = per-hop link delay
t_stall = per-hop stall delay (due to contention)
T_ser = serialization delay

Let's assume 1-flit packets (T_ser = 0)
Ideal case: t_router = 1, t_wire = 1

Zero-load => t_stall_avg ~ 0
Suppose k = 8, H_avg = 5.333 => T_zero-load = (6.333)(1) + (7.333)(1) = 13.666
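
The numbers above can be reproduced with a short Python sketch. The helper names are illustrative, and it assumes that H_avg = 5.333 comes from averaging the Manhattan distance over all pairs of distinct nodes (minimal routing, uniform random traffic without self-traffic).

```python
def mesh_avg_hops(k):
    """Average hop count between distinct nodes of a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total = sum(abs(sx - dx) + abs(sy - dy)
                for sx, sy in nodes for dx, dy in nodes)
    pairs = len(nodes) * (len(nodes) - 1)  # exclude source == destination
    return total / pairs

def zero_load_latency(h, t_router=1, t_wire=1, t_ser=0, t_stall_avg=0):
    """T = (H+1)(t_router + t_stall_avg) + (H+2) t_wire + T_ser."""
    return (h + 1) * (t_router + t_stall_avg) + (h + 2) * t_wire + t_ser

h_avg = mesh_avg_hops(8)
print(round(h_avg, 3))                     # 5.333
print(round(zero_load_latency(h_avg), 3))  # 13.667 (quoted truncated as 13.666)
```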

Uniform Random Traffic on a k x k Mesh

Saturation Throughput? ("Ideal Throughput" or Peak Injection Rate)

Peak throughput = 1 / (max channel load)
Let's calculate the load on one of the bisection links

- k^2/2 nodes on the left.
- Half their messages (k^2/4) cross the bisection links.
- Total k bisection links from left to right.
- Load on each bisection link = (k^2/4) / k = k/4
- Peak Throughput = 4/k

For k = 4, peak throughput = 1 flit/node/cycle
For k = 8 (64-core mesh), peak throughput = 1/2 flit/node/cycle
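
The bisection argument translates directly into code. This is a minimal sketch with an illustrative function name. It follows the steps listed above and assumes each node injects at unit rate while the load is being counted.

```python
from fractions import Fraction

def mesh_peak_throughput(k):
    """Bisection bound on injection rate (flits/node/cycle) for a k x k mesh."""
    left_nodes = Fraction(k * k, 2)    # nodes on one side of the bisection
    crossing = left_nodes / 2          # half their traffic crosses, per unit rate
    links = k                          # left-to-right bisection links
    load_per_link = crossing / links   # = k/4
    return 1 / load_per_link           # = 4/k

for k in (4, 8):
    print(k, mesh_peak_throughput(k))  # prints "4 1", then "8 1/2"
```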

Latency-throughput for 8x8 Mesh in Lab 1

[Figure: latency vs. offered traffic (flits/node/cycle) for the 8x8 mesh. Min latency given by topology: 13.66. Actual zero-load latency (Lab 1): 34.4. Throughput given by topology: 0.5. Actual throughput (Lab 1): 0.37*. The differences are labeled "Latency Gap" and "Throughput Gap".]

*Garnet injected 1-flit and 5-flit packets with probability 2/3 and 1/3

What is the default router delay in Garnet?

Another representation: Injection Rate as a % of “capacity”

[Figure: latency vs. offered traffic (% of capacity), with the x-axis marked at 50 and 100]

For 4x4 Mesh, 100% => 1 flit/node/cycle
For 8x8 Mesh, 100% => 0.5 flits/node/cycle

This representation makes it easier to see whether we achieve the throughput the network was actually designed for.
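
Converting between the two x-axes is a one-line normalization. The sketch below is illustrative and assumes the capacity of a k x k mesh is the 4/k bisection bound for uniform random traffic. The function name is hypothetical.

```python
def percent_of_capacity(injection_rate, k):
    """Express an injection rate (flits/node/cycle) as % of a k x k mesh's 4/k capacity."""
    capacity = 4 / k
    return 100 * injection_rate / capacity

print(percent_of_capacity(0.5, 8))   # 100.0
print(percent_of_capacity(0.25, 4))  # 25.0
```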

Topology Classification
- Direct
  - Each router is associated with a terminal node
  - All routers are sources and destinations of traffic
  - Examples: Ring, Mesh, Torus
  - Most on-chip networks use direct topologies
- Indirect
  - Routers are distinct from terminal nodes
  - Terminal nodes can source / sink traffic
  - Intermediate nodes switch traffic
  - Examples: Crossbar, Butterfly, Clos, Omega, Benes, …

Crossbar
- Pros
  - Every node connected to all others (non-blocking)
  - Low latency and high bandwidth
  - Used by GPUs
- Cons
  - Area and power go up quadratically (O(N^2) cost)
  - Expensive to lay out
  - Difficult to arbitrate

[Figure: crossbar switch connecting sources (S) to destinations (D)]

Bisection BW = N
Degree = 1
Diameter = 1

Butterfly (k-ary n-fly)

As a convention, source and destination nodes are drawn logically separate on the left and right, though physically the two 0s, two 1s, etc. are often the same physical node.

[Figure: 2-ary 4-fly, sources on the left, destinations on the right]

Radix of each switch = k (i.e., k inputs and k outputs)
Number of stages = n
Total Source/Destination Terminal Nodes = k^n
In each stage, k^(n-1) switches
Each switch is a k x k crossbar

Butterfly (k-ary n-fly): Metrics

[Figure: 2-ary 4-fly]

- Degree? k
- Diameter? n+1
- Bisection Bandwidth? N/4 (where N = k^n)
- Hop Count? n+1
- Channel Load (for uniform traffic)? 1
- Path Diversity? None. Only one route between any pair.
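
The structural parameters of a k-ary n-fly follow directly from k and n. The sketch below collects them. The function and key names are illustrative, and the total switch count is simply the number of stages times the switches per stage.

```python
def butterfly_parameters(k, n):
    """Structural parameters of a k-ary n-fly."""
    return {
        "terminals": k ** n,                 # source/destination nodes
        "stages": n,
        "switches_per_stage": k ** (n - 1),
        "total_switches": n * k ** (n - 1),  # stages * switches per stage
        "switch_radix": k,                   # each switch is a k x k crossbar
        "hop_count": n + 1,                  # identical for every pair
    }

print(butterfly_parameters(2, 4))
```

For the 2-ary 4-fly this gives 16 terminals, 4 stages of 8 switches each, and a hop count of 5.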

Tackling path diversity in a butterfly

Additional Stage

Beneš Network

Pronounced Ben-ish

Back to back butterflies

N alternate paths between any pair

Is non-blocking

Shuffle/Omega Network (Isomorphic Butterfly)

[Figure: a shuffle network and a 2-ary 3-fly, each drawn with switches labeled 00-03, 10-13, 20-23]

Clos Networks: (m, n, r)

3 stages

m = number of middle switches

n = number of input (output) ports on input (output) switches

r = number of input / output switches

Clos (5, 3, 4)

Non-blocking Clos
- A Clos network is strictly non-blocking for unicast traffic iff m >= 2n-1
  - An unused input on an ingress switch can always be connected to an unused output on an egress switch without having to re-arrange existing routes
- Proof (1953):
  - Suppose an input switch has one free terminal and this has to be connected to a free terminal of an output switch
  - Worst case
    - (n-1) input terminals of input switch use (n-1) separate middle switches
    - (n-1) output terminals of output switch use (n-1) separate middle switches
    - We need another middle switch to connect this input to output
  - Total = (n-1) + (n-1) + 1 = 2n-1

Non-blocking Clos
- A Clos network is rearrangeably non-blocking for unicast traffic iff m >= n
  - An unused input on an ingress switch can always be connected to an unused output on an egress switch, but this might require re-arranging existing routes
- Proof (1953):
  - If m = n, each input can use one middle switch to connect to its output
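
The strict (m >= 2n-1) and rearrangeable (m >= n) conditions can be turned into a small classifier. This is an illustrative sketch. The function name and the returned labels are not from the lecture.

```python
def clos_blocking_class(m, n, r):
    """Classify a Clos (m, n, r) network for unicast traffic.

    m: middle switches, n: ports per input/output switch,
    r: number of input/output switches (does not affect the class).
    """
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"

print(clos_blocking_class(5, 3, 4))  # strictly non-blocking
print(clos_blocking_class(3, 3, 4))  # rearrangeably non-blocking
print(clos_blocking_class(2, 3, 4))  # blocking
```

The Clos (5, 3, 4) example has m = 5 = 2n-1, so it sits exactly at the strict non-blocking threshold.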

Binary Fat Tree

[Figure: binary fat tree; link widths 1, 2, 4 grow toward the root]

Diameter? 2 log2 N
Bisection Bandwidth? N

Can be built by folding a multi-stage Clos

Beneš → Folded Clos

Hierarchical Topologies: Concentrators

Advantages:
- Low diameter
- Fewer links

Disadvantages:
- Lower bisection bandwidth
- Link at concentrator can become bottleneck

More Hierarchical Topologies

ATAC: PACT 2010

Which topology should you choose?
- Hard to optimize for everything
  - Desired bandwidth
  - Desired latency
- Physical Constraints
  - Wire budget
    - Indirect topologies popular off-chip
    - On-chip networks often use direct topologies due to wiring constraints
  - Wire layout
    - Topologies should be easy to lay out on a planar 2D substrate
  - Router complexity
    - Number of ports

Lab 2
- Topology Comparison!
- You will implement a Flattened Butterfly Topology
  - Kim et al., "Flattened Butterfly Topology for On-Chip Networks", MICRO 2007
  - Read the paper to understand the topology
  - You can ignore the routing/flow-control details for now as we haven't covered that in class yet
- Compare performance against Mesh keeping the bisection bandwidth constant
  - I will email out details
  - Look at $gem5/configs/topologies to see how topologies are implemented. You will write a FlattenedButterfly.py file
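
As a starting point for thinking about the connectivity, the sketch below enumerates router-to-router links in plain Python. It does not use gem5's topology classes, and the function name is hypothetical. It assumes the 2D flattened butterfly connects every router directly to every other router in its row and in its column. Check this against the paper before relying on it.

```python
def flattened_butterfly_links(rows, cols):
    """Undirected router-to-router links of a 2D flattened butterfly.

    Assumption: every router links directly to every other router in its
    row and in its column. Router ids are row-major.
    """
    links = set()
    for r in range(rows):
        for c in range(cols):
            src = r * cols + c
            for c2 in range(c + 1, cols):  # rest of the row
                links.add((src, r * cols + c2))
            for r2 in range(r + 1, rows):  # rest of the column
                links.add((src, r2 * cols + c))
    return sorted(links)

links = flattened_butterfly_links(4, 4)
print(len(links))  # 48
```

A real FlattenedButterfly.py would need to express these links using the conventions of the existing files in $gem5/configs/topologies.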
