What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Date post: 31-Dec-2015
Description:
What is the most important kernel of sparse linear solvers for heterogeneous supercomputers? Shengxin Zhu, The University of Oxford. Prof. Xingping Liu and Prof. Tongxiang Gu, National Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics.
Transcript
Page 1: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

23/4/19 SNSCC'12, [email protected] 1

What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Shengxin Zhu, The University of Oxford

Prof. Xingping Liu and Prof. Tongxiang Gu

National Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics

Page 2: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Outline

Brief introduction to heterogeneous supercomputers
Computation kernels of Krylov methods
Influence of communications
Case study: GPBiCG(m,l)
Challenging problems
Conclusion

Page 3: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Introduction to heterogeneous supercomputers

Dawning 5000A. Nodes: Bandwidth: Memory:

Dawning 5000 ranking history (TOP500)

11/2008: 11th
06/2009: 15th
11/2009: 19th
06/2010: 24th
11/2010: 35th
06/2011: 40th
11/2011: 58th

TOP500, November 2011

1st: K (JP)
2nd: NUDT (CN)
3rd: Cray (US)
4th: Dawning (CN)

Page 4: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Computational kernels of Krylov methods

Vector update: parallel in nature.

Mat-vec: computation intensive; suited to multi-core technology (CUDA/OpenMP).

Inner product: communication intensive (CPU/MPI).
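The three kernels can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the matrix is assumed to be stored in CSR format, and the "ranks" are simulated by splitting vectors into chunks, with the final sum standing in for a global reduction such as MPI_Allreduce.

```python
import numpy as np

def vector_update(y, a, x):
    """AXPY kernel y + a*x: every entry is independent, so it is purely local."""
    return y + a * x

def csr_matvec(indptr, indices, data, x):
    """Sparse mat-vec y = A x for A in CSR format, one row at a time."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]
    return y

def distributed_dot(x_parts, y_parts):
    """Inner product over simulated ranks.

    Each rank forms a local partial sum; adding the partials is the step that
    needs a global reduction on a real machine (assumed, not performed here).
    """
    partials = [xp @ yp for xp, yp in zip(x_parts, y_parts)]
    return sum(partials)

# Usage: A = [[2, 1], [0, 3]] in CSR form, and a dot product over 2 "ranks".
indptr = np.array([0, 2, 3])
indices = np.array([0, 1, 1])
data = np.array([2.0, 1.0, 3.0])
y = csr_matvec(indptr, indices, data, np.array([1.0, 2.0]))
x = np.arange(1.0, 5.0)
d = distributed_dot(np.array_split(x, 2), np.array_split(np.ones(4), 2))
```

Only the last kernel forces all ranks to wait for each other, which is the point the following slides develop.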

Page 5: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Influence of communication: first glance

S Zhu, MSc Thesis, CAEP, 2010

Computation cheap

Communication expensive

Based on Aztec by Prof. Tuminaro et al @ Sandia

Page 6: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

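Real reason for time-consuming communications

Small workshops: focused, less preparation time.

Conference: diverse, more preparation time.

Time for k inner products (dots): $t_{inn} = 2kN t_{fl}/P + 2\log_2 P\,(t_s + k\,t_w)$

Time for a vector update: $t_{vec} = 2N t_{fl}/P$

Time for a mat-vec: $t_{mat\_vec} = (2n_z - 1)\,N t_{fl}/P$

Latency: $t_s$. Bandwidth: $t_w$ (transfer time per word).

Here $N$ is the vector length, $P$ the number of processors, $t_{fl}$ the time per floating-point operation and $n_z$ the number of nonzeros per matrix row.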
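The kernel cost model on this slide can be evaluated directly. The sketch below assumes the usual reading of the symbols (N vector length, P processors, t_fl time per flop, t_s latency, t_w per-word transfer time, n_z nonzeros per row); the function names and the machine parameters in the usage lines are hypothetical and only meant to show the trend.

```python
import math

def t_inner(k, N, P, t_fl, t_s, t_w):
    """Time for k inner products: local flops plus one tree reduction of k words."""
    return 2 * k * N * t_fl / P + 2 * math.log2(P) * (t_s + k * t_w)

def t_vec(N, P, t_fl):
    """Time for one vector update (2 flops per entry, no communication)."""
    return 2 * N * t_fl / P

def t_matvec(N, P, t_fl, n_z):
    """Time for one sparse mat-vec with n_z nonzeros per row (local part only)."""
    return (2 * n_z - 1) * N * t_fl / P

# Usage with made-up parameters: the dot product's share of the time grows with P,
# because its communication term grows like log2(P) while the flop terms shrink.
for P in (16, 1024, 65536):
    dot = t_inner(1, 10**7, P, 1e-9, 1e-5, 1e-8)
    mv = t_matvec(10**7, P, 1e-9, 7)
    print(P, dot / (dot + mv))
```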

Page 7: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Strategies for minimizing communications

Replace dot products by other operations (semi-Chebyshev): workshops only, no conference if possible. Inner-product-free methods: Gu, Liu, Mo (2002).

Reorganize the algorithm to reduce the number of conferences and let each conference accept more talks: residual replacement strategies due to van der Vorst (2000s); CA-KSMs, Demmel et al. (2008).

Overlap communication with computation.
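The second strategy can be quantified with a small sketch. It assumes the tree-reduction cost used in this talk, $2\log_2 P\,(t_s + k\,t_w)$ for one reduction carrying k words, with $t_s$ the latency and $t_w$ the per-word transfer time; the function names and the numbers are hypothetical.

```python
import math

def separate_reductions(k, P, t_s, t_w):
    """k dot products, each with its own global reduction of one word."""
    return k * 2 * math.log2(P) * (t_s + t_w)

def fused_reduction(k, P, t_s, t_w):
    """k dot products whose partial sums travel together in one reduction."""
    return 2 * math.log2(P) * (t_s + k * t_w)

# Usage: three dots on 1024 processors with latency much larger than t_w.
sep = separate_reductions(3, 1024, 1e-5, 1e-8)
fus = fused_reduction(3, 1024, 1e-5, 1e-8)
saving = sep - fus   # equals (k - 1) * 2 * log2(P) * t_s in this model
```

In this model the bandwidth term is unchanged by fusing; the whole saving comes from paying the latency once instead of k times.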

Page 8: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

A case study: parallelizing GPBiCG(m,l) (S. Fujino, 2002)

GPBiCG(1,0): BiCGSTAB

GPBiCG(0,1): GPBiCG

GPBiCG(1,1): BiCGSTAB2

Could be used to design a breakdown-free BiCGSTAB method.

Page 9: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

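GPBiCG(m,l) (S. Fujino, 2002)

1. Choose $x_0$ and $r_0^*$; $r_0 = b - Ax_0$, $t_{-1} = w_{-1} = 0$, $\beta_{-1} = 0$
2. for $k = 0, 1, \ldots$ until $\|r_k\| \le tol$ do
3. $p_k = r_k + \beta_{k-1}(p_{k-1} - u_{k-1})$
4. $q_k = Ap_k$; $\alpha_k = (r_0^*, r_k)/(r_0^*, q_k)$
5. $t_k = r_k - \alpha_k q_k$; $s_k = At_k$
6. $y_k = t_{k-1} - t_k - \alpha_k w_{k-1}$
7. if (mod(k, m+l) < m or k = 0) then
8. $\zeta_k = (s_k, t_k)/(s_k, s_k)$, $\eta_k = 0$
9. $u_k = \zeta_k q_k$
10. $z_k = \zeta_k r_k - \alpha_k u_k$
11. $r_{k+1} = t_k - \zeta_k s_k$
else
12. $\zeta_k = \dfrac{(y_k,y_k)(s_k,t_k) - (y_k,t_k)(s_k,y_k)}{(s_k,s_k)(y_k,y_k) - (y_k,s_k)(s_k,y_k)}$
13. $\eta_k = \dfrac{(s_k,s_k)(y_k,t_k) - (y_k,s_k)(s_k,t_k)}{(s_k,s_k)(y_k,y_k) - (y_k,s_k)(s_k,y_k)}$
14. $u_k = \zeta_k q_k + \eta_k(t_{k-1} - r_k + \beta_{k-1}u_{k-1})$
15. $z_k = \zeta_k r_k + \eta_k z_{k-1} - \alpha_k u_k$
16. $r_{k+1} = t_k - \eta_k y_k - \zeta_k s_k$
endif
17. $x_{k+1} = x_k + \alpha_k p_k + z_k$
18. $\beta_k = \dfrac{\alpha_k}{\zeta_k}\cdot\dfrac{(r_0^*, r_{k+1})}{(r_0^*, r_k)}$
19. $w_k = s_k + \beta_k q_k$
enddo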
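A minimal dense NumPy sketch of the GPBiCG(m,l) iteration may help in reading the slide. It follows the standard GPBiCG recurrences with the switching rule mod(k, m+l) < m; the function name, the choice $r_0^* = r_0$, the zero initial guess, the stopping test and the test matrix are assumptions made for illustration, and no breakdown safeguards are included.

```python
import numpy as np

def gpbicg_ml(A, b, m=1, l=1, tol=1e-10, maxit=500):
    """Sketch of GPBiCG(m,l): BiCGSTAB-type steps when k mod (m+l) < m, GPBiCG steps otherwise."""
    n = b.size
    x = np.zeros(n)
    r = b - A @ x
    r0s = r.copy()                       # shadow residual r_0^* (assumed equal to r_0)
    p = np.zeros(n); u = np.zeros(n); z = np.zeros(n)
    t_old = np.zeros(n); w = np.zeros(n)
    beta = 0.0
    bnorm = np.linalg.norm(b)
    for k in range(maxit):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
        p = r + beta * (p - u)
        q = A @ p
        rho = r0s @ r                    # first global reduction
        alpha = rho / (r0s @ q)
        t = r - alpha * q
        s = A @ t
        y = t_old - t - alpha * w
        if k == 0 or k % (m + l) < m:    # one-parameter (BiCGSTAB) step
            zeta, eta = (s @ t) / (s @ s), 0.0
            u = zeta * q
            z = zeta * r - alpha * u
        else:                            # two-parameter (GPBiCG) step
            ss, yy, sy, st, yt = s @ s, y @ y, s @ y, s @ t, y @ t
            det = ss * yy - sy * sy
            zeta = (yy * st - yt * sy) / det
            eta = (ss * yt - sy * st) / det
            u = zeta * q + eta * (t_old - r + beta * u)
            z = zeta * r + eta * z - alpha * u
        x = x + alpha * p + z
        r_new = t - eta * y - zeta * s
        beta = (alpha / zeta) * (r0s @ r_new) / rho   # third global reduction
        w = s + beta * q
        r, t_old = r_new, t
    return x, maxit

# Usage on a small nonsymmetric, diagonally dominant tridiagonal system.
n = 60
A = 4 * np.eye(n) + np.diag(-1.5 * np.ones(n - 1), 1) + np.diag(-0.5 * np.ones(n - 1), -1)
b = A @ np.ones(n)
x, iters = gpbicg_ml(A, b, m=1, l=1)
```

The comments mark where the inner products separate the iteration into phases that each need their own global reduction.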

Page 11: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

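Algorithm design of the PGPBiCG(m,l) method

Notation: $xy_k := (x_k, y_k)$. In product names a leading $r$ denotes $r_0^*$ and a leading $f$ denotes $f = A^T r_0^*$, e.g. $rt_k = (r_0^*, t_k)$ and $fp_k = (f, p_k)$. Inner products are either computed directly or computed indirectly from scalar recurrences.

1. $r_0 = b - Ax_0$, $t_{-1} = w_{-1} = 0$, $f = A^T r_0^*$, $p_0 = r_0$, $q_0 = Ap_0$, $fp_0 = (f, p_0)$, $rr_0 = (r_0^*, r_0)$, $\alpha_0 = rr_0/fp_0$
2. for $k = 0, 1, \ldots$ until $\|r_k\| \le tol$ do
3. $tem = t_{k-1}$; $t_k = r_k - \alpha_k q_k$; $s_k = At_k$
4. $y_k = tem - t_k - \alpha_k w_{k-1}$
5. if (mod(k, m+l) < m or k = 0) then
6. compute $st_k, ss_k, rt_k, rs_k, fs_k, ft_k, fq_k$ (direct)
7. $\zeta_k = st_k/ss_k$; $\eta_k = 0$
8. $rr_{k+1} = rt_k - \zeta_k\,rs_k$ (indirect)
9. $fu_k = \zeta_k\,fq_k$; $fr_{k+1} = ft_k - \zeta_k\,fs_k$ (indirect)
10. $u_k = \zeta_k q_k$
11. $z_k = \zeta_k r_k - \alpha_k u_k$
12. $r_{k+1} = t_k - \zeta_k s_k$
else
13. $h_k = t_{k-1} - r_k + \beta_{k-1}u_{k-1}$
14. compute $st_k, ss_k, sy_k, yt_k, yy_k, rt_k, ry_k, rs_k, fs_k, fy_k, ft_k, fh_k, fq_k$ (direct)
15. $\zeta_k = \dfrac{st_k\,yy_k - yt_k\,sy_k}{ss_k\,yy_k - ys_k\,sy_k}$; $\eta_k = \dfrac{ss_k\,yt_k - sy_k\,st_k}{ss_k\,yy_k - ys_k\,sy_k}$
16. $rr_{k+1} = rt_k - \eta_k\,ry_k - \zeta_k\,rs_k$ (indirect)
17. $fu_k = \zeta_k\,fq_k + \eta_k\,fh_k$; $fr_{k+1} = ft_k - \eta_k\,fy_k - \zeta_k\,fs_k$ (indirect)
18. $u_k = \zeta_k q_k + \eta_k h_k$
19. $z_k = \zeta_k r_k + \eta_k z_{k-1} - \alpha_k u_k$
20. $r_{k+1} = t_k - \eta_k y_k - \zeta_k s_k$
endif
21. $x_{k+1} = x_k + \alpha_k p_k + z_k$
22. $\beta_k = \dfrac{\alpha_k}{\zeta_k}\cdot\dfrac{rr_{k+1}}{rr_k}$
23. $w_k = s_k + \beta_k q_k$
24. $p_{k+1} = r_{k+1} + \beta_k(p_k - u_k)$; $fp_{k+1} = fr_{k+1} + \beta_k(fp_k - fu_k)$ (indirect)
25. $q_{k+1} = Ap_{k+1}$; $\alpha_{k+1} = rr_{k+1}/fp_{k+1}$
enddo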

Page 12: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

PGPBiCG(m,l) method (reducing the number of global communications)

Algorithm reconstruction: three global communications reduced to one!

[Figure: the three global synchronization points per iteration of GPBiCG(m,l) are merged into a single global synchronization after reconstruction.]
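The idea behind the reconstruction can be illustrated with a small sketch: all directly computed inner products of an iteration are gathered in one batch, and an inner product involving the not-yet-formed residual is obtained from scalars. The example assumes the one-parameter update $r_{k+1} = t_k - \zeta_k s_k$, for which $(r_0^*, r_{k+1}) = (r_0^*, t_k) - \zeta_k (r_0^*, s_k)$ by linearity; the ranks are simulated by array chunks and all names are hypothetical.

```python
import numpy as np

def batched_dots(pairs, P):
    """Compute several dot products with a single (simulated) global reduction.

    Each of the P simulated ranks forms its local partial sums for every pair;
    the sum over ranks stands in for one allreduce of a short vector.
    """
    local = np.array([[xp @ yp
                       for xp, yp in zip(np.array_split(x, P), np.array_split(y, P))]
                      for x, y in pairs])          # shape (number of dots, P)
    return local.sum(axis=1)

def demo(n=1000, P=8, seed=0):
    rng = np.random.default_rng(seed)
    r0s, t, s = rng.standard_normal((3, n))
    # One batch delivers st, ss, rt, rs together.
    st, ss, rt, rs = batched_dots([(s, t), (s, s), (r0s, t), (r0s, s)], P)
    zeta = st / ss
    r_new = t - zeta * s
    direct = r0s @ r_new            # would need a further reduction
    indirect = rt - zeta * rs       # scalar arithmetic only
    return direct, indirect

direct, indirect = demo()
```

The two values agree up to rounding, which is also why the accuracy of the inner products themselves becomes an issue later in the talk.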

Page 13: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?


Performance

Based on Aztec by Prof. R.S. Tuminaro et al @ Sandia

Page 14: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Convergence analysis

Residual replacement strategies

Backward stability analysis

[Slide formulas: recurrences for the indirectly computed inner products ($rr$, $fr$, $fp$, $fu$, ...) in IBiCGSTAB (Yang, 2002) and in PGPBiCG, our method with (m,l) = (1,0).]

Page 15: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Challenging problem: accurately computing the dot product

"Why Mindless" by Kahan: accurately computing the inner product.

Ogita, Rump et al., Accurate sum and dot product, SIAM J. Sci. Comput., 2005, cited 188 times. (but) ….

PLASMA team. Backward stability analysis of residual replacement methods.

Carson and Demmel, A residual replacement strategy for improving the maximum attainable accuracy of communication-avoiding Krylov subspace methods, April 20, 2012

Reliable dot computation algorithm
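As an illustration of what a more reliable dot product can look like, here is a sketch of a compensated dot product built from error-free transformations (TwoSum and a Veltkamp-split TwoProduct), in the spirit of the accurate dot product literature cited on this slide. It is not claimed to be the algorithm the authors have in mind; it assumes IEEE double precision with round-to-nearest and no overflow.

```python
def two_sum(a, b):
    """Return (x, y) with x = fl(a + b) and a + b = x + y exactly."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def split(a):
    """Veltkamp splitting of a double into a high and a low part."""
    c = 134217729.0 * a            # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    """Return (x, y) with x = fl(a * b) and a * b = x + y exactly."""
    x = a * b
    a1, a2 = split(a)
    b1, b2 = split(b)
    y = a2 * b2 - (((x - a1 * b1) - a2 * b1) - a1 * b2)
    return x, y

def dot2(x, y):
    """Compensated dot product: the rounding errors are accumulated separately."""
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s

# Usage: heavy cancellation defeats the plain loop but not the compensated one.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
plain = sum(a * b for a, b in zip(xs, ys))
accurate = dot2(xs, ys)
```

In a distributed setting both the running sum and the error term would have to be reduced, so the accuracy gain is paid for with extra local flops and a slightly longer message.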

Page 16: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Conclusion

Avoiding communication
Reliable computation
Inner product computation is very likely to be the most challenging kernel for HHPC, while Mat_vec is important for both…
Software abstraction and threads programming are helpful; together with re-designing algorithms they will do better.

Math/Algorithm, CS/Performance, Applications interface: Aztec, pOSKI, Hypre, PETSc, Trilinos

pOSKI (Parallel Optimized Sparse Kernel Interface Library), pOSKI v.1.0, May 02/2012

Page 17: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?


Thanks !

Page 18: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?


More than ten thousand processors are connected by network

Global Communication becomes more and more serious

Initial study on communication complexity

Page 19: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Methods in the literature

Based on the former two strategies:
de Sturler and van der Vorst: parallel GMRES(m) and CG methods (1995)
Bucker and Sauren: parallel QMR method (1997)
Yang and Brent: improved CGS, BiCG and BiCGSTAB methods (2002-03)
Gu and Liu et al.: ICR, IBiCR, IBiCGSTAB(2) and PQMRCGSTAB methods (2004-2010)
Demmel et al.: CA-KSMs (2008-)

Gu, Liu and Mo: MSD-CG, the multiple search direction conjugate gradient method (2004), replaces the inner product computation by solving small linear systems and so eliminates global inner products completely. The idea has been generalized to MPCG by Greif and Bridson (2006).

Page 20: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Comparison of computational counts of the two algorithms

Method | Mat_vec | Vect_update | No._inn (H / M / L / T) | Syn_points
GPBiCG(m,l) | 2 | 18 | 1 / 2 / 5 / 2 | 3
PGPBiCG(m,l) | 2 | 18 | 0 / 9 / 15 / 0 | 1

H, M, L and T denote the positions of the inner products within one iteration.

Computation kernels and their times:

Vector update time: $t_{vec} = 2N t_{fl}/P$

Mat-vec time: $t_{m\_v} = (2n_z - 1)\,N t_{fl}/P$

k inner products: $t_{inn} = 2kN t_{fl}/P + 2\log_2 P\,(t_s + k\,t_w)$, where the second term is the communication time.

Page 21: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Comparison of computational counts of the two algorithms

The time of inner product operations of GPBiCG(m,l) and PGPBiCG(m,l)

Method | Position | No. | Time
GPBiCG(m,l) | H | 1 | $2t_{fl}N/P + 2\log_2 P\,(t_s + t_w)$
GPBiCG(m,l) | M | 2 | $4t_{fl}N/P + 2\log_2 P\,(t_s + 2t_w)$
GPBiCG(m,l) | L | 5 | $10t_{fl}N/P + 2\log_2 P\,(t_s + 5t_w)$
GPBiCG(m,l) | T | 2 | $4t_{fl}N/P + 2\log_2 P\,(t_s + 2t_w)$
PGPBiCG(m,l) | M | 9 | $18t_{fl}N/P + 2\log_2 P\,(t_s + 9t_w)$
PGPBiCG(m,l) | L | 15 | $30t_{fl}N/P + 2\log_2 P\,(t_s + 15t_w)$

Page 22: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

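Mathematical model of the time consumption

$T_G(P) = \left[\dfrac{32m + 46l}{m + l} + 2(2n_z - 1)\right]\dfrac{N t_{fl}}{P} + \left[6t_s + \dfrac{10m + 16l}{m + l}\,t_w\right]\log_2 P$

$T_{PG}(P) = \left[\dfrac{40m + 60l}{m + l} + 2(2n_z - 1)\right]\dfrac{N t_{fl}}{P} + \left[2t_s + \dfrac{18m + 30l}{m + l}\,t_w\right]\log_2 P$

For large $P$, where the $t_s\log_2 P$ terms dominate, the relative improvement $(T_G - T_{PG})/T_G$ approaches $(6t_s - 2t_s)/(6t_s) \approx 66\%$.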
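The per-iteration time model can be tabulated with a short script. The coefficients below are the ones that survive on the slides (32m+46l, 10m+16l and 6 t_s for GPBiCG(m,l); 40m+60l, 18m+30l and 2 t_s for PGPBiCG(m,l)); the function names and the machine parameters in the usage lines are hypothetical.

```python
import math

def time_gpbicg(P, N, m, l, n_z, t_fl, t_s, t_w):
    """Modelled time per iteration of GPBiCG(m,l): three global reductions."""
    flops = (32 * m + 46 * l) / (m + l) + 2 * (2 * n_z - 1)
    comm = 6 * t_s + (10 * m + 16 * l) / (m + l) * t_w
    return flops * N * t_fl / P + comm * math.log2(P)

def time_pgpbicg(P, N, m, l, n_z, t_fl, t_s, t_w):
    """Modelled time per iteration of PGPBiCG(m,l): one global reduction."""
    flops = (40 * m + 60 * l) / (m + l) + 2 * (2 * n_z - 1)
    comm = 2 * t_s + (18 * m + 30 * l) / (m + l) * t_w
    return flops * N * t_fl / P + comm * math.log2(P)

# Usage with made-up parameters: PGPBiCG does more flops and sends more words,
# but pays the latency once, so it wins when P is large.
args = dict(N=10**6, m=1, l=1, n_z=5, t_fl=1e-9, t_s=1e-5, t_w=1e-8)
tg = time_gpbicg(1024, **args)
tpg = time_pgpbicg(1024, **args)
improvement = (tg - tpg) / tg
```

On a single processor the ordering reverses, since the reduction term vanishes and only the larger flop count remains.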

Page 23: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Scalability analysis

Scaled speedup

Isoefficiency analysis: $N = f(P, E)$ with the efficiency $E$ fixed

[Slide formulas: scaled speedup and isoefficiency functions of GPBiCG(m,l) and PGPBiCG(m,l), expressed in $m$, $l$, $E$, $P$ and $\log_2 P$.]

Page 24: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

The optimal number of processors

$P_{opt}^{G} = \dfrac{\left[\dfrac{32m + 46l}{m + l} + 2(2n_z - 1)\right] N t_{fl}\,\ln 2}{6t_s + \dfrac{10m + 16l}{m + l}\,t_w}$

$P_{opt}^{PG} = \dfrac{\left[\dfrac{40m + 60l}{m + l} + 2(2n_z - 1)\right] N t_{fl}\,\ln 2}{2t_s + \dfrac{18m + 30l}{m + l}\,t_w}$

Brief proof. Let $f(x) = C/x + \log_2 x$ with $x > 0$ and $C = \mathrm{const} > 0$. Then $f'(x) = -C/x^2 + 1/(x\ln 2) = 0$ at $x = C\ln 2$, and $f''(C\ln 2) > 0$, so $f$ attains its minimum at $x = C\ln 2$.
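The optimum can be checked numerically. The sketch assumes a time model of the form $T(P) = a/P + b\log_2 P$, where $a$ collects the computation terms and $b$ the communication terms, so that the stationary point is $P = a\ln 2/b$; the names and the sample values are hypothetical.

```python
import math

def model_time(a, b, P):
    """T(P) = a / P + b * log2(P)."""
    return a / P + b * math.log2(P)

def optimal_processors(a, b):
    """Stationary point of T(P): -a/P**2 + b/(P ln 2) = 0 gives P = a ln 2 / b."""
    return a * math.log(2) / b

# Usage: compare the closed form against a brute-force scan over integer P.
a, b = 0.05, 6.0e-5
p_star = optimal_processors(a, b)
p_scan = min(range(1, 5000), key=lambda P: model_time(a, b, P))
```

Reducing the latency coefficient in $b$ from $6t_s$ to $2t_s$ therefore moves the optimum to a larger processor count.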

Page 25: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Convergence analysis

Lemma. Let $x, y \in R^n$, let $fl(x^T y)$ be the inner product computed by the computer and $x^T y$ the exact value, where $u$ is the machine precision. If $nu \le 0.01$, then

$|fl(x^T y) - x^T y| \le 1.01\,n\,u \sum_{i=1}^{n} |x_i|\,|y_i|$.

[Slide formulas: the inner-product recurrences of IBiCGSTAB (Yang, 2002) and of PGPBiCG, our method with (m,l) = (1,0), as on the earlier convergence analysis slide.]

Conclusion. When $n$ is very large, $|fl(x^T y) - x^T y|$ might be much larger than $u$.
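The rounding error bound for the inner product can be checked on random data with exact rational arithmetic. The sketch assumes IEEE double precision with unit roundoff $u = 2^{-53}$ and a plain left-to-right summation loop; the function names are hypothetical.

```python
from fractions import Fraction
import random

def naive_dot(x, y):
    """Plain floating-point dot product, accumulated left to right."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s

def exact_dot(x, y):
    """Exact dot product of the same floating-point data, as a rational number."""
    return sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))

def check_bound(n, seed=0):
    """Return (actual error, bound 1.01 * n * u * sum |x_i||y_i|) as exact rationals."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    u = Fraction(1, 2**53)
    err = abs(Fraction(naive_dot(x, y)) - exact_dot(x, y))
    bound = Fraction(101, 100) * n * u * sum(abs(Fraction(a) * Fraction(b))
                                             for a, b in zip(x, y))
    return err, bound

err, bound = check_bound(1000)
```

The bound grows linearly in n, whereas the observed error on random data is typically far below it; the concern on the slide is the worst case for very long vectors.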

Page 26: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Numerical experiments: timing and improvements

[Test problem: a two-dimensional second-order PDE in $u$ with coefficients $a$, $b$, $c$, $d$, $e$ on $(x, y) \in (0,1) \times (0,1)$, with boundary conditions; the exact equation and coefficient values are not legible in this transcript.]

Experiment I: each CPU 3600

Experiment II: problem size 960

Methods: GPBiCG(1,0), GPBiCG(0,1), GPBiCG(1,1), GPBiCG(2,8), GPBiCG(8,2)

Reported quantities: a ratio $R$ defined from the communication times $T_c^{G}$ and $T_c^{IG}$, comm/all, speedup, fit.

Page 27: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?


Numerical Experiments: Speedup

Page 28: What is the most important kernel of sparse linear solvers for heterogeneous supercomputers?

Conclusions

The PGPBiCG(m,l) method is more scalable and better suited to parallel solution of large sparse unsymmetric linear systems on distributed parallel architectures.

Performance analysis, isoefficiency analysis and numerical experiments have been carried out for the PGPBiCG(m,l) and GPBiCG(m,l) methods.

The parallel communication performance can be improved by a factor larger than 3.

The PGPBiCG(m,l) method has better parallel speedup than the GPBiCG(m,l) method.

For further performance improvements: overlap of computation with communication, numerical stability.

