Page 1: 3. Interconnection Networks

3. Interconnection Networks

Page 2: 3. Interconnection Networks

Historical Perspective

• Early machines were:
  • Collections of microprocessors.
  • Communication was performed using bi-directional queues between nearest neighbors.
  • Messages were forwarded by processors along the path ("store and forward" networking).
• There was a strong emphasis on topology in algorithms, in order to minimize the number of hops (and hence time).

Page 3: 3. Interconnection Networks

Network Analogy

• To have a large number of transfers occurring at once, you need a large number of distinct wires.
• Networks are like streets:
  • Link = street.
  • Switch = intersection.
  • Distance (hops) = number of blocks traveled.
  • Routing algorithm = travel plan.
• Properties:
  • Latency: how long it takes to get between nodes in the network.
  • Bandwidth: how much data can be moved per unit time.
    • Bandwidth is limited by the number of wires and the rate at which each wire can accept data.

Page 4: 3. Interconnection Networks

Design Characteristics of a Network

• Topology (how things are connected):
  • Crossbar, ring, 2-D and 3-D meshes or tori, hypercube, tree, butterfly, perfect shuffle, ...
• Routing algorithm (path used):
  • Example in a 2D torus: route all east-west hops first, then all north-south hops (avoids deadlock); see the sketch after this list.
• Switching strategy:
  • Circuit switching: the full path is reserved for the entire message, like the telephone.
  • Packet switching: the message is broken into separately routed packets, like the post office.
• Flow control (what happens if there is congestion):
  • Stall, store data temporarily in buffers, re-route data to other nodes, tell the source node to temporarily halt, discard, etc.
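A minimal Python sketch of the east-west-then-north-south (dimension-ordered) routing mentioned above; the function name and the k x k torus setup are illustrative, not any specific machine's router:

```python
# Dimension-ordered routing in a k x k torus: route fully in X, then in Y.
# Never turning from Y back to X removes X<->Y turn cycles from the channel
# dependence graph; the ring wraparound itself needs extra care in practice
# (e.g., virtual channels), which is not shown here.
def torus_route(src, dst, k):
    """Return the list of nodes visited from src to dst in a k x k torus."""
    def step(a, b):
        # Move one hop in the direction of the shorter way around the ring.
        d = (b - a) % k
        return (a + 1) % k if d <= k // 2 else (a - 1) % k

    x, y = src
    path = [(x, y)]
    while x != dst[0]:            # all east-west hops first
        x = step(x, dst[0])
        path.append((x, y))
    while y != dst[1]:            # then all north-south hops
        y = step(y, dst[1])
        path.append((x, y))
    return path

print(torus_route((0, 0), (2, 3), k=4))
# [(0, 0), (1, 0), (2, 0), (2, 3)] -- the Y move wraps around the ring
```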

Page 5: 3. Interconnection Networks

Performance Properties of a Network: Latency

• Diameter: the maximum, over all pairs of nodes, of the shortest path between a pair of nodes (see the sketch after this list).
• Latency: delay between send and receive times.
  • Latency tends to vary widely across architectures.
  • Vendors often report hardware latencies (wire time).
  • Application programmers care about software latencies (user program to user program).
• Observations:
  • Hardware and software latencies often differ by 1-2 orders of magnitude.
  • Maximum hardware latency varies with diameter, but the variation in software latency is usually negligible.
• Latency is important for programs with many small messages.
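A minimal sketch of computing the diameter as defined above, using BFS from every node over an adjacency list; the 4-node ring is just an illustration:

```python
# Diameter = max over all node pairs of the shortest-path hop count.
from collections import deque

def diameter(adj):
    def bfs_depth(start):
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(bfs_depth(u) for u in adj)

ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(ring4))  # 2, i.e. n/2 for an n = 4 ring
```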

Page 6: 3. Interconnection Networks

Performance Properties of a Network: Bandwidth

• The bandwidth of a link = w * 1/t
  • w is the number of wires.
  • t is the time per bit.
• Bandwidth is typically measured in gigabytes per second (GB/s); 1 GB = 2^30 bytes = 8 * 2^30 bits.
• Effective bandwidth is usually lower than physical link bandwidth due to packet overhead: each packet carries a routing and control header, the data payload, an error code, and a trailer (a toy calculation appears below).
• Bandwidth is important for applications with mostly large messages.
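A toy Python calculation of the effective-bandwidth point above; all the numbers are assumptions for illustration, not measurements:

```python
# Effective bandwidth after packet overhead: only the payload counts as
# useful data, but header, error code, and trailer still consume the link.
link_bw  = 500e6   # physical link bandwidth, bytes/sec (assumed)
payload  = 2048    # payload bytes per packet (assumed)
overhead = 64      # header + error code + trailer bytes (assumed)

effective_bw = link_bw * payload / (payload + overhead)
print(f"effective bandwidth: {effective_bw / 1e6:.1f} MB/s "
      f"({100 * payload / (payload + overhead):.1f}% of link bandwidth)")
```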

Page 7: 3. Interconnection Networks

Performance Properties of a Network: Bisection Bandwidth

• Bisection bandwidth: the bandwidth across the smallest cut that divides the network into two equal halves.
• Bandwidth across the "narrowest" part of the network.
• [Figure: examples of a bisection cut and a cut that is not a bisection; in one example network bisection bw = link bw, in the other bisection bw = sqrt(n) * link bw.]
• Bisection bandwidth is important for algorithms in which all processors need to communicate with all others (see the small sketch below).
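A tiny sketch, assuming a square 2D mesh, of where the sqrt(n) * link bw figure above comes from; the function is illustrative:

```python
# Count the links cut by splitting a sqrt(n) x sqrt(n) mesh down the
# middle -- the standard bisection of a 2D mesh.
import math

def mesh_bisection_links(n):
    k = math.isqrt(n)               # mesh is k x k
    assert k * k == n and k % 2 == 0
    # Cutting between columns k/2 - 1 and k/2 severs one link per row.
    return k

print(mesh_bisection_links(16))  # 4 == sqrt(n) links, each of link bandwidth
```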

Page 8: 3. Interconnection Networks

Network Topology

• In the past, there was considerable research in network topology and in mapping algorithms to topology.
  • Key cost to be minimized: number of "hops" between nodes (e.g., "store and forward").
• Modern networks hide the hop cost (i.e., "wormhole routing"), so topology is no longer a major factor in algorithm performance.
  • Example: on the IBM SP system, hardware latency varies from 0.5 usec to 1.5 usec, but user-level message-passing latency is roughly 36 usec.
• We still need some background in network topology:
  • Algorithms may have a communication topology.
  • Topology affects bisection bandwidth.

Page 9: 3. Interconnection Networks
Page 10: 3. Interconnection Networks
Page 11: 3. Interconnection Networks

Linear and Ring Topologies

• Linear array:
  • Diameter = n - 1; average distance ~ n/3.
  • Bisection bandwidth = 1 (in units of link bandwidth).
• Torus or ring:
  • Diameter = n/2; average distance ~ n/4.
  • Bisection bandwidth = 2.
  • Natural for algorithms that work with 1D arrays.

Page 12: 3. Interconnection Networks

Meshes and Tori

• Two-dimensional mesh:
  • Diameter = 2 * (sqrt(n) - 1).
  • Bisection bandwidth = sqrt(n).
• Two-dimensional torus:
  • Diameter = sqrt(n).
  • Bisection bandwidth = 2 * sqrt(n).
• Generalizes to higher dimensions (the Cray T3D used a 3D torus).
• Natural for algorithms that work with 2D and/or 3D arrays.
• The formulas for these 1D and 2D topologies are collected in the sketch after this list.
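A quick Python check of the diameter and bisection-bandwidth formulas from the last two slides; n is assumed to be a perfect square for the 2D cases:

```python
# Diameter and bisection bandwidth (in units of link bandwidth) for the
# 1D and 2D topologies listed above, with n nodes.
import math

def metrics(topology, n):
    k = math.isqrt(n)
    return {
        "linear":   (n - 1,       1),
        "ring":     (n // 2,      2),
        "2d_mesh":  (2 * (k - 1), k),
        "2d_torus": (k,           2 * k),
    }[topology]

for t in ("linear", "ring", "2d_mesh", "2d_torus"):
    d, b = metrics(t, 64)
    print(f"{t:8s}  diameter={d:3d}  bisection={b}")
```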

Page 13: 3. Interconnection Networks

Hypercubes

• Number of nodes n = 2^d for dimension d.
  • Diameter = d.
  • Bisection bandwidth = n/2.
• [Figure: hypercubes of dimension 0 (0d) through 4 (4d).]
• Popular in early machines (Intel iPSC, NCUBE).
  • Lots of clever algorithms.
• Gray-code addressing (see the sketch after this list):
  • Each node is connected to the d others whose addresses differ in exactly one bit.
• [Figure: 3-cube with nodes labeled 000, 001, 010, 011, 100, 101, 110, 111.]
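A minimal Python sketch of the one-bit-different addressing above; the function name is illustrative:

```python
# In a d-dimensional hypercube, node i's neighbors are exactly the
# addresses that differ from i in one bit: i XOR (1 << k) for each bit k.
def hypercube_neighbors(i, d):
    return [i ^ (1 << k) for k in range(d)]

d = 3
for node in range(2 ** d):
    nbrs = [format(x, f"0{d}b") for x in hypercube_neighbors(node, d)]
    print(format(node, f"0{d}b"), "->", nbrs)
# e.g. 000 -> ['001', '010', '100']
```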

Page 14: 3. Interconnection Networks

Trees

• Diameter = log n.
• Bisection bandwidth = 1.
• Easy layout as a planar graph.
• Many tree algorithms (e.g., summation).
• Fat trees avoid the bisection bandwidth problem:
  • More (or wider) links near the top.
  • Example: Thinking Machines CM-5.

Page 15: 3. Interconnection Networks

Butterflies

• A butterfly has n = (k+1) * 2^k nodes.
  • Diameter = 2k.
  • Bisection bandwidth = 2^k.
• Cost: lots of wires.
• Used in the BBN Butterfly.
• Natural for FFT.
• [Figure: a 2x2 butterfly switch (ports labeled 0 and 1) and a multistage butterfly network.]

Page 16: 3. Interconnection Networks
Page 17: 3. Interconnection Networks

Topologies in Real Machines

(newer machines at the top, older at the bottom)

Red Storm (Opteron + Cray network, future)   3D Mesh
Blue Gene/L                                  3D Torus
SGI Altix                                    Fat tree
Cray X1                                      4D Hypercube*
Myricom (Millennium)                         Arbitrary
Quadrics (in HP Alpha server clusters)       Fat tree
IBM SP                                       Fat tree (approx)
SGI Origin                                   Hypercube
Intel Paragon (old)                          2D Mesh
BBN Butterfly (really old)                   Butterfly

* Many of these are approximations: e.g., the X1 is really a "quad-bristled hypercube", and some of the fat trees are not as fat as they should be at the top.

Page 18: 3. Interconnection Networks

Performance Models

Page 19: 3. Interconnection Networks
Page 20: 3. Interconnection Networks
Page 21: 3. Interconnection Networks
Page 22: 3. Interconnection Networks
Page 23: 3. Interconnection Networks

Latency and Bandwidth Model

• Time to send a message of length n is roughly

  Time = latency + n * cost_per_word = latency + n / bandwidth

• Topology is assumed irrelevant.
• Often called the "α-β model" and written

  Time = α + n * β

• Usually α >> β >> time per flop.
  • One long message is cheaper than many short ones:

    α + n * β  <<  n * (α + 1 * β)

  • Can do hundreds or thousands of flops for the cost of one message.
• Lesson: need a large computation-to-communication ratio to be efficient (a numeric sketch follows this list).
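A minimal Python sketch of the α-β model above; the parameter values are illustrative (roughly the T3E/MPI row from the table on the next slide):

```python
# Alpha-beta model: time = alpha + n * beta for an n-byte message.
alpha = 6.7     # latency, usec
beta  = 0.003   # time per byte, usec/byte

def message_time(n_bytes):
    return alpha + n_bytes * beta

# One long message is cheaper than many short ones:
print(message_time(8 * 1024))   # one 8 KB message:    ~31.3 usec
print(8 * message_time(1024))   # eight 1 KB messages: ~78.2 usec
```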

Page 24: 3. Interconnection Networks

Alpha-Beta Parameters on Current Machines

• These numbers were obtained empirically.
• α is latency in usecs; β is bandwidth cost in usecs per byte.

machine          α (usec)   β (usec/byte)
T3E/Shm          1.2        0.003
T3E/MPI          6.7        0.003
IBM/LAPI         9.4        0.003
IBM/MPI          7.6        0.004
Quadrics/Get     3.267      0.00498
Quadrics/Shm     1.3        0.005
Quadrics/MPI     7.3        0.005
Myrinet/GM       7.7        0.005
Myrinet/MPI      7.2        0.006
Dolphin/MPI      7.767      0.00529
Giganet/VIPL     3.0        0.010
GigE/VIPL        4.6        0.008
GigE/MPI         5.854      0.00872

• How well does the model Time = α + n * β predict actual performance? (A small prediction sketch follows; the measured-time charts come later.)
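A small Python sketch that applies a few (α, β) pairs from the table above to predict message times; the machine subset and message sizes are arbitrary choices for illustration:

```python
# Predicted message time (usec) under Time = alpha + n * beta, using
# measured parameters from the table above.
params = {            # machine: (alpha usec, beta usec/byte)
    "T3E/MPI":      (6.7,   0.003),
    "Quadrics/MPI": (7.3,   0.005),
    "GigE/MPI":     (5.854, 0.00872),
}

for machine, (a, b) in params.items():
    times = [a + n * b for n in (8, 1024, 65536)]
    print(machine, [f"{t:.1f}" for t in times])
```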

Page 25: 3. Interconnection Networks

End to End Latency Over Time

[Chart: end-to-end latency (usec, log scale 1-1000) vs. year, 1990-2002 (approximate), for machines including nCube/2, CM5, CS2, SP1, SP2, Paragon, T3D, KSR, SPP, Cenju3, T3E, SP-Power3, Myrinet, and Quadrics.]

• Latency has not improved significantly, unlike Moore's Law.
• T3E (shmem) was the lowest point, in 1997.

Data from Kathy Yelick, UCB and NERSC

Page 26: 3. Interconnection Networks

Send Overhead Over Time

[Chart: send overhead (usec, 0-14) vs. year, 1990-2002 (approximate), for machines including NCube/2, CM5, Meiko, Paragon, T3D, T3E, Cenju4, SCI, Dolphin, Myrinet, Myrinet2K, SP3, and Compaq.]

• Overhead has not improved significantly; the T3D was best.
• Lack of integration; lack of attention in software.

Data from Kathy Yelick, UCB and NERSC

Page 27: 3. Interconnection Networks

Bandwidth Chart

[Chart: bandwidth (MB/sec, 0-400) vs. message size (2048-131072 bytes) for T3E/MPI, T3E/Shmem, IBM/MPI, IBM/LAPI, Compaq/Put, Compaq/Get, M2K/MPI, M2K/GM, Dolphin/MPI, Giganet/VIPL, and SysKonnect.]

Data from Mike Welcome, NERSC

Page 28: 3. Interconnection Networks

Model Time Varying Message Size & Machines

[Chart: modeled message time (usec, log scale 1-10000) vs. message size (8-131072 bytes) for T3E/Shm, T3E/MPI, IBM/LAPI, IBM/MPI, Quadrics/Shm, Quadrics/MPI, Myrinet/GM, Myrinet/MPI, GigE/VIPL, and GigE/MPI.]

Page 29: 3. Interconnection Networks

Measured Message Time

[Chart: measured message time (usec, log scale 1-10000) vs. message size (8-131072 bytes) for the same machines as the modeled-time chart above.]

Page 30: 3. Interconnection Networks

Results: EEL and Overhead

[Chart: end-to-end latency (EEL) and overhead (usec, 0-25) for T3E/MPI, T3E/Shmem, T3E/E-Reg, IBM/MPI, IBM/LAPI, Quadrics/MPI, Quadrics/Put, Quadrics/Get, M2K/MPI, M2K/GM, Dolphin/MPI, and Giganet/VIPL; each bar shows send overhead (alone), send & receive overhead, receive overhead (alone), and added latency.]

Data from Mike Welcome, NERSC

