
ISPA 2016, August 23 - 26, Tianjin, China

Hosei University ISPA 2016 – 1 / 29

http://cis.k.hosei.ac.jp/~yamin/

Question about k-Ary n-Cube

If we have 27 nodes, we can build a 3-ary 3-cube: $k = 3$, $n = 3$, $N = k^n = 27$.

[Figure: 3-ary 3-cube with nodes labeled (0,0,0) through (2,2,2)]

Suppose we have 10,000,000 nodes. Which k and n give the system high performance at low cost?

Bidirectional 4-Ary 2-Cube (p = 1)

[Figure: bidirectional 4-ary 2-cube with nodes (0,0) through (3,3); each node consists of a router and a compute node]

External ports: 4

Internal ports: 1

p: the number of compute nodes in a node

Unidirectional 4-Ary 2-Cube (p = 1)

[Figure: unidirectional 4-ary 2-cube with nodes (0,0) through (3,3); each node consists of a router and a compute node]

External ports: 2

Internal ports: 1

p: the number of compute nodes in a node
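The port counts on the two 4-ary 2-cube slides follow from how many links leave each router. The sketch below is a minimal illustration, not taken from the slides: the function name `torus_neighbors` is hypothetical, and it assumes k ≥ 3 so that the +1 and −1 neighbors in a dimension are distinct.

```python
from typing import List, Tuple


def torus_neighbors(coord: Tuple[int, ...], k: int,
                    bidirectional: bool = True) -> List[Tuple[int, ...]]:
    """Return the nodes reachable over one outgoing link in a k-ary n-cube torus.

    Assumes k >= 3; for k = 2 the +1 and -1 neighbors coincide.
    """
    steps = (1, -1) if bidirectional else (1,)
    neighbors = []
    for dim in range(len(coord)):
        for step in steps:
            nxt = list(coord)
            nxt[dim] = (coord[dim] + step) % k  # wraparound link of the torus
            neighbors.append(tuple(nxt))
    return neighbors


# 4-ary 2-cube: 4 outgoing links when bidirectional, 2 when unidirectional
print(torus_neighbors((0, 0), 4, bidirectional=True))
print(torus_neighbors((0, 0), 4, bidirectional=False))
```

The number of returned neighbors equals the external port count shown on the slides: 2n for the bidirectional torus and n for the unidirectional one.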

Bidirectional or Unidirectional 3-Ary 3-Cube

[Figure: 3-ary 3-cube with nodes labeled (0,0,0) through (2,2,2)]

Interconnection Network

[Figure: interconnection network; labels: switch (router), CPU/memory board, node, communication port, link (cable)]

Ports are connected by links based on a certain topology.

Used for designing large distributed-memory parallel systems.

Router with Four External Ports (p = 1)

[Figure: router built from multiplexers (mux) and a controller, attached to a compute node. (a) 5 × 5 crossbar; (b) 5 × 5 input buffered crossbar]

Cross-Point Buffered Router (p = 1)

[Figure: 5 × 5 cross-point buffered crossbar; labels: compute node (processing element core or CPU/memory board), demux, mux, crossbar controller, flits]

Diameter Comparison (Bidirectional Torus)

[Figure: diameter (0 to 500) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube, n-cube]

As n becomes larger, the diameter becomes smaller, but the degree becomes larger.

Topological Properties

Network                              # of nodes  Degree  Diameter                Bisection
n-cube                               $2^n$       $n$     $n$                     $2^{n-1}$
k-ary n-cube (mesh)                  $k^n$       $2n$    $n(k-1)$                $k^{n-1}$
Bidirectional k-ary n-cube (torus)   $k^n$       $2n$    $n\lfloor k/2 \rfloor$  $2k^{n-1}$
Unidirectional k-ary n-cube (torus)  $k^n$       $n$     $n(k-1)$                $k^{n-1}$
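The four rows of the topological properties table can be written as small functions of k and n. This is an illustrative sketch only: the `Topology` class and the function names are assumptions introduced here, and the formulas are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    nodes: int
    degree: int
    diameter: int
    bisection: int


def hypercube(n: int) -> Topology:
    return Topology(2 ** n, n, n, 2 ** (n - 1))


def mesh(k: int, n: int) -> Topology:
    return Topology(k ** n, 2 * n, n * (k - 1), k ** (n - 1))


def bidirectional_torus(k: int, n: int) -> Topology:
    # Diameter n * floor(k / 2): at most half a ring per dimension
    return Topology(k ** n, 2 * n, n * (k // 2), 2 * k ** (n - 1))


def unidirectional_torus(k: int, n: int) -> Topology:
    # Diameter n * (k - 1): a full ring minus one hop per dimension
    return Topology(k ** n, n, n * (k - 1), k ** (n - 1))


print(bidirectional_torus(4, 2))
print(unidirectional_torus(4, 2))
```

For the 4-ary 2-cube this reproduces the degrees 4 and 2 shown as external port counts on the earlier slides.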

Number of Compute Nodes in a Node

[Figure: a node consisting of one router and p compute nodes. (a) p = 1; (b) p = 2; (c) p = 3; (d) p = 4]

RCP: Relative Cost Performance

$$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N}$$

d: node degree
p: the number of compute nodes in a node
λ: the router complexity (1.0 ≤ λ ≤ 2.0)
D: diameter
N: the number of nodes in the system

RCP takes p and λ into consideration.

The smaller the RCP, the lower the cost and the higher the performance.
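The RCP definition translates directly into a few lines of code. The following is a minimal sketch; the function name `rcp` and its argument order are choices made here, not part of the slides.

```python
import math


def rcp(d: int, D: int, N: int, p: int = 1, lam: float = 1.0) -> float:
    """Relative cost performance: (d + p)^lam * D / ((log2 N + p)^lam * log2 N)."""
    x = math.log2(N)
    return ((d + p) ** lam * D) / ((x + p) ** lam * x)


# Bidirectional 16-ary 2-cube: d = 4, D = 16, N = 256, with p = 1 and lam = 2.0
print(round(rcp(d=4, D=16, N=256, p=1, lam=2.0), 3))  # 0.617

# Hypercube with n = 10: d = D = 10, N = 2^10, so RCP is 1 for any lam and p
print(rcp(d=10, D=10, N=2 ** 10, p=1, lam=1.5))
```

The first value agrees with the N = 256 row of the recommended bidirectional tori table later in the deck.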

RCP of Hypercube

$$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N} = \frac{(n + p)^{\lambda} n}{(n + p)^{\lambda} n} \equiv 1$$

n-cube:
d = n (node degree)
D = n (diameter)
$N = 2^n$ (the number of nodes in the system)

The result holds irrespective of λ, p, and N.

Derivative of RCP

For the bidirectional torus, $d = 2n$ and $D = n\lfloor k/2 \rfloor \approx kn/2$.

Let $x = \log_2 N$; then $N = 2^x = k^n$, or $k = 2^{x/n}$, and therefore $D = kn/2 = 2^{x/n} n/2$.

Let

$$g(x) = (2n + p)^{\lambda}\, 2^{x/n}\, n/2$$

$$f(x) = (x + p)^{\lambda} x$$

Then

$$\mathrm{RCP}' = \left(\frac{g(x)}{f(x)}\right)' = \frac{g'(x) f(x) - g(x) f'(x)}{f^2(x)}$$

where

$$g'(x) = (2n + p)^{\lambda}\, 2^{x/n} \ln 2 / 2$$

$$f'(x) = \left((x + p)^{\lambda}\right)' x + (x + p)^{\lambda} x' = \lambda (x + p)^{\lambda - 1} x + (x + p)^{\lambda}$$

Derivative of RCP

Let $\mathrm{RCP}' = 0$, i.e.,

$$g'(x) f(x) = g(x) f'(x)$$

The positive value of x can be calculated from the equation

$$\ln 2\,(x + p)\,x = n\left((\lambda + 1)x + p\right)$$

Then we can determine an odd k from

$$k = \lfloor 2^{x/n} \rfloor \quad \text{or} \quad k = \lceil 2^{x/n} \rceil$$

If both are even, $k = 2^{x/n} + 1$.
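The stationarity condition is a quadratic in x, so the positive root has a closed form. The sketch below rearranges the slide's equation into $\ln 2 \cdot x^2 + (p \ln 2 - n(\lambda + 1))x - np = 0$ and then applies the odd-k rule. The function names `optimal_x` and `odd_k` are hypothetical, and this is one straightforward reading of the rule, not necessarily the authors' implementation.

```python
import math


def optimal_x(n: int, p: int = 1, lam: float = 1.0) -> float:
    """Positive root of ln2 * (x + p) * x = n * ((lam + 1) * x + p)."""
    a = math.log(2)
    b = p * math.log(2) - n * (lam + 1)
    c = -n * p  # negative, so the discriminant is positive and one root is positive
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def odd_k(x: float, n: int) -> int:
    """Pick an odd radix from floor/ceil of 2^(x/n); add 1 if both are even."""
    radix = 2 ** (x / n)
    lo, hi = math.floor(radix), math.ceil(radix)
    if lo % 2 == 1:
        return lo
    if hi % 2 == 1:
        return hi
    return lo + 1  # radix is an even integer


x = optimal_x(n=2, p=1, lam=2.0)
print(x, odd_k(x, n=2))
```

Because the derivation uses the continuous approximation D = kn/2, the k obtained this way is a starting point; evaluating RCP at nearby odd k with the exact diameter can refine it.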

RCP Comparison (p = 1, λ = 1.0)

[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube, n-cube]

RCP Comparison (p = 1, λ = 1.5)

[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube, n-cube]

RCP Comparison (p = 1, λ = 2.0)

[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube, n-cube]

RCP Comparison on λ (p = 1)

[Figure: relative cost performance (0.5 to 2.0) versus λ (1.0 to 2.0); curves: $N = 2^{11}$ for the 2-cube, 3-cube, 4-cube, 5-cube, and 6-cube, plus n-cube]

RCP Comparison on p

[Figure: RCP relative to that with p = 1 (0.9 to 1.7) versus number of compute nodes in the system ($2^4$ to $2^{32}$); curves: λ = 1.5, n = 3, p = 4; λ = 1.5, n = 2, p = 4; λ = 1.5, n = 3, p = 2; λ = 1.5, n = 2, p = 2; p = 1]

RCP Comparison on n (p = 1)

[Figure: relative cost performance (0.5 to 2.0) versus n (2 to 14); curves: $N = 2^{10}$ with λ = 1.0, 1.5, and 2.0; $N = 2^{20}$ with λ = 1.0, 1.5, and 2.0; n-cube]

RCP Comparison for Bidirectional Torus

[Figure: relative cost performance (0.50 to 1.00) versus number of nodes in the system ($2^4$ to $2^{32}$); curves for even k and odd k with n = 4 and n = 5, grouped by λ = 1.0, λ = 1.5, and λ = 2.0; n-cube]

Recommended Bidirectional Tori with p = 1

N           n  k   d   D   RCP    λ
121         2  11  4   10  0.576  2.0
256         2  16  4   16  0.617  2.0
343         3  7   6   9   0.794  1.0
1,000       3  10  6   15  0.768  1.5
3,375       3  15  6   21  0.543  2.0
4,913       3  17  6   24  0.545  2.0
14,641      4  11  8   20  0.683  1.5
16,807      5  7   10  15  0.782  1.0
50,625      4  15  8   28  0.525  2.0
117,649     6  7   12  18  0.779  1.0
161,051     5  11  10  25  0.674  1.5
248,832     5  12  10  30  0.742  1.5
759,375     5  15  10  35  0.514  2.0
1,771,561   6  11  12  30  0.668  1.5
2,476,099   5  19  10  45  0.518  2.0
11,390,625  6  15  12  42  0.507  2.0
47,045,881  6  19  12  54  0.728  2.0
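Rows of the bidirectional table can be re-derived from k, n, and λ alone, using $d = 2n$, $D = n\lfloor k/2 \rfloor$, and $N = k^n$. The helper name `bidirectional_torus_rcp` is an assumption made for this sketch; the three (k, n, λ) triples are taken from the first three table rows.

```python
import math


def bidirectional_torus_rcp(k: int, n: int, p: int = 1, lam: float = 1.0) -> float:
    """RCP of a bidirectional k-ary n-cube torus with d = 2n and D = n * floor(k / 2)."""
    d = 2 * n
    D = n * (k // 2)
    x = n * math.log2(k)  # log2 N for N = k^n
    return ((d + p) ** lam * D) / ((x + p) ** lam * x)


# (k, n, lam) for the first three rows of the table
for k, n, lam in [(11, 2, 2.0), (16, 2, 2.0), (7, 3, 1.0)]:
    print(k ** n, n, k, round(bidirectional_torus_rcp(k, n, lam=lam), 3))
# prints RCP values 0.576, 0.617, and 0.794
```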

RCP of Mesh (p = 1, λ = 1.5)

[Figure: relative cost performance (0.8 to 2.8) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube, n-cube]

Dividing Mesh RCP by Torus RCP

[Figure: mesh RCP relative to torus RCP (1.0 to 2.1) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 2-cube, k-ary 3-cube, k-ary 4-cube, k-ary 5-cube, k-ary 6-cube]

The performance of the mesh is worse than that of the torus.
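The mesh-to-torus ratio can be worked out from the topological properties table. Assuming, as that table does, degree $2n$ for both networks and the same $N = k^n$, everything in RCP cancels except the diameter:

```latex
\frac{\mathrm{RCP}_{\text{mesh}}}{\mathrm{RCP}_{\text{torus}}}
  = \frac{(2n + p)^{\lambda}\, n(k - 1)}{(2n + p)^{\lambda}\, n\lfloor k/2 \rfloor}
  = \frac{k - 1}{\lfloor k/2 \rfloor}
  = \begin{cases}
      2 & \text{if } k \text{ is odd} \\
      2 - 2/k & \text{if } k \text{ is even}
    \end{cases}
```

Under these assumptions the ratio does not depend on n, p, or λ, and it approaches 2 as k grows.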

Improvement of Unidirectional Torus

[Figure: bidirectional torus RCP divided by unidirectional torus RCP (1.0 to 3.5) versus number of nodes in the system ($2^4$ to $2^{32}$); curves: k-ary 6-cube, k-ary 5-cube, k-ary 4-cube, k-ary 3-cube, k-ary 2-cube]

The unidirectional torus has better performance than the bidirectional torus.
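The plotted ratio can be sketched from the degree and diameter formulas for the two tori. Since N is the same for both, the denominator of RCP cancels. The function name `bidir_over_unidir` is hypothetical.

```python
def bidir_over_unidir(k: int, n: int, p: int = 1, lam: float = 1.0) -> float:
    """Bidirectional torus RCP divided by unidirectional torus RCP for the same k and n."""
    bidir = (2 * n + p) ** lam * n * (k // 2)  # d = 2n, D = n * floor(k / 2)
    unidir = (n + p) ** lam * n * (k - 1)      # d = n,  D = n * (k - 1)
    return bidir / unidir


# 15-ary 4-cube with p = 1 and lam = 2.0
print(round(bidir_over_unidir(15, 4, lam=2.0), 2))  # 1.62
```

A ratio above 1 means the unidirectional torus has the smaller RCP. In this sketch the ratio grows with λ because the lower degree n is raised to the power λ.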

RCP Comparison for Unidirectional Torus

[Figure: relative cost performance (0.25 to 1.00) versus number of nodes in the system ($2^4$ to $2^{32}$); curves for n = 2, 4, 6, and 8, grouped by λ = 1.0, λ = 1.5, and λ = 2.0; n-cube]

Recommended Unidirectional Tori with p = 1

N           n  k   d  D   RCP    λ
196         2  14  2  26  0.414  2.0
256         4  4   4  12  0.833  1.0
512         3  8   3  21  0.590  1.5
1,024       5  4   5  15  0.818  1.0
2,744       3  14  3  39  0.354  2.0
3,375       3  15  3  42  0.354  2.0
4,096       4  8   4  28  0.557  1.5
15,625      6  5   6  24  0.808  1.0
32,768      5  8   5  35  0.536  1.5
50,625      4  15  4  56  0.324  2.0
59,049      5  9   5  40  0.536  1.5
262,144     6  8   6  42  0.522  1.5
531,441     6  9   6  48  0.522  1.5
759,375     5  15  5  70  0.306  2.0
1,048,576   5  16  5  75  0.306  2.0
7,529,536   6  14  6  78  0.294  2.0
24,137,569  6  17  6  96  0.294  2.0

Summary

The k-ary n-cube has been deeply investigated and widely adopted in real supercomputer designs.

We proposed an analytical model for evaluating the cost performance relative to the hypercube:

$$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N}$$

By using this model, we can configure the k-ary n-cube to achieve high performance at low cost.

We also investigated the k-ary n-dimensional mesh and the unidirectional k-ary n-dimensional torus.

The relative cost performance of the unidirectional k-ary n-dimensional torus is better than that of the bidirectional k-ary n-dimensional torus.

