Introduction to Parallel FEM
Kengo Nakajima
Information Technology Center, The University of Tokyo


• Introduction to MPI
• Road to Parallel FEM
  – Local Data Structure
• MPI for Parallel FEM
  – Collective Communication
  – Point-to-Point (Peer-to-Peer) Communication

What is MPI? (1/2)
• Message Passing Interface
• "Specification" of a message-passing API for distributed-memory environments
  – Not a program, not a library
• http://phase.hpcc.jp/phase/mpi-j/ml/mpi-j-html/contents.html
• History
  – 1992: MPI Forum
  – 1994: MPI-1
  – 1997: MPI-2; MPI-3 is soon available
• Implementation
  – mpich: ANL (Argonne National Laboratory)
  – OpenMPI, LAM
  – H/W vendors
  – C/C++, FORTRAN, Java; Unix, Linux, Windows, Mac OS

What is MPI? (2/2)
• "mpich" (free) is widely used
  – supports the MPI-2 spec. (partially)
  – MPICH2 after Nov. 2005
  – http://www-unix.mcs.anl.gov/mpi/
• Why is MPI widely used as the de facto standard?
  – Uniform interface through the MPI Forum
    • Portable, can work on any type of computer
    • Can be called from Fortran, C, etc.
  – mpich
    • free, supports every architecture
• PVM (Parallel Virtual Machine) was also proposed in the early 90's, but is not as widely used as MPI

References
• W. Gropp et al., Using MPI, 2nd edition, MIT Press, 1999.
• M.J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2003.
• W. Gropp et al., MPI: The Complete Reference, Vol. I & II, MIT Press, 1998.
• http://www-unix.mcs.anl.gov/mpi/www/
  – API (Application Interface) of MPI

How to learn MPI (1/2)
• Grammar
  – 10-20 functions of MPI-1 will be taught in the class
    • although there are many convenient capabilities in MPI-2
  – If you need further information, you can find it on the web, in books, and from MPI experts.
• Practice is important
  – Programming
  – "Running the codes" is the most important
• Be familiar with, or "grab", the idea of SPMD/SIMD operations
  – Single Program/Instruction Multiple Data
  – Each process does the same operation for different data
    • Large-scale data is decomposed, and each part is computed by each process
  – Global/Local Data, Global/Local Numbering

SPMD
[Figure: M processes run the same program on different data sets (PE #0 with Data #0, PE #1 with Data #1, ..., PE #M-1 with Data #M-1), launched by:]

mpirun -np M <Program>

You understand 90% of MPI if you understand this figure.
PE: Processing Element (Processor, Domain, Process)
Each process does the same operation for different data. Large-scale data is decomposed, and each part is computed by each process. Ideally, a parallel program is not different from a serial one except for communication.

Some Technical Terms
• Processor, Core
  – Processing unit (H/W); Processor = Core for single-core processors
• Process
  – Unit of MPI computation, nearly equal to "core"
  – Each core (or processor) can host multiple processes (but this is not efficient)
• PE (Processing Element)
  – PE originally means "processor", but it is sometimes used as "process" in this class. Moreover, it can mean "domain" (next item).
    • In multicore processors, PE generally means "core"
• Domain
  – domain = process (= PE), each of the "MD" in "SPMD", each data set
• The process ID of MPI (ID of PE, ID of domain) starts from "0"
  – if you have 8 processes (PE's, domains), the IDs are 0-7


How to learn MPI (2/2)
• NOT so difficult.
• Therefore, 2-3 lectures are enough for just learning the grammar of MPI.
• Grab the idea of SPMD!

Actually, MPI is not enough ...
• Multicore Clusters
• Heterogeneous Clusters (CPU+GPU, CPU+Manycore)
• MPI + X (+ Y)
  – X: OpenMP, Pthread, OpenACC, CUDA, OpenCL
  – Y: Vectorization

First Example

hello.f:
implicit REAL*8 (A-H,O-Z)
include 'mpif.h'
integer :: PETOT, my_rank, ierr

call MPI_INIT (ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)
write (*,'(a,2i8)') 'Hello World FORTRAN', my_rank, PETOT
call MPI_FINALIZE (ierr)
stop
end

hello.c:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
  int n, myid, numprocs, i;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  printf("Hello World %d\n", myid);
  MPI_Finalize();
}

Basic/Essential Functions
(the same hello.f / hello.c as above, annotated)

'mpif.h', "mpi.h"   Essential include file; "use mpi" is possible in F90
MPI_Init            Initialization
MPI_Comm_size       Number of MPI processes (cf. mpirun -np XX <prog>)
MPI_Comm_rank       Process ID, starting from 0
MPI_Finalize        Termination of MPI processes

mpi.h, mpif.h
(included at the top of hello.c / hello.f above)
• Various types of parameters and variables for MPI and their initial values.
• The name of each variable starts with "MPI_".
• The values of these parameters and variables cannot be changed by users.
• Users should not declare variables starting with "MPI_" in their own programs.

MPI_Init
• Initializes the MPI execution environment (required)
• It is recommended to put this BEFORE all executable statements in the program.
• C: MPI_Init(argc, argv)
(see hello.c above)

MPI_Finalize
• Terminates the MPI execution environment (required)
• It is recommended to put this AFTER all executable statements in the program.
• Please do not forget this.
• C: MPI_Finalize()
(see hello.c above)

MPI_Comm_size
• Determines the size of the group associated with a communicator
• Not required, but a very convenient function
• MPI_Comm_size (comm, size)
  – comm   MPI_Comm   I   communicator
  – size   int        O   number of processes in the group of the communicator
(see hello.c above)

What is a Communicator?

MPI_Comm_size (MPI_COMM_WORLD, PETOT)

• Group of processes for communication
• A communicator must be specified in an MPI program as the unit of communication
• All processes belong to a default group named "MPI_COMM_WORLD"
• Multiple communicators can be created, and complicated operations are possible.
  – e.g. Computation, Visualization
• Only "MPI_COMM_WORLD" is needed in this class.
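The bullet about creating multiple communicators is only narrative on the slide; as a minimal illustration (not from the lecture), the C sketch below splits MPI_COMM_WORLD into two groups with MPI_Comm_split. The even/odd "color" rule and the variable names are assumptions chosen just for the example.

/* Sketch: split MPI_COMM_WORLD into two sub-communicators (e.g. "computation" and
   "visualization"). The even/odd color rule is an illustrative choice. */
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
  int world_rank, world_size, sub_rank, sub_size, color;
  MPI_Comm sub_comm;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  color = world_rank % 2;                      /* 0: one group, 1: the other          */
  MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

  MPI_Comm_rank(sub_comm, &sub_rank);          /* rank within the sub-communicator    */
  MPI_Comm_size(sub_comm, &sub_size);
  printf("world %d/%d -> group %d: %d/%d\n",
         world_rank, world_size, color, sub_rank, sub_size);

  MPI_Comm_free(&sub_comm);
  MPI_Finalize();
  return 0;
}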

MPI_COMM_WORLD
Communicator in MPI. One process can belong to multiple communicators.
[Figure: overlapping communicators COMM_MANTLE, COMM_CRUST, COMM_VIS]

Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"

Target Application
• Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"
  – "One-way" coupling from "Ground Motion" to "Tanks"
  – The displacement of the ground surface is given as forced displacement of the bottom surface of the tanks.
  – 1 Tank = 1 PE (serial)
• Deformation of the ground surface is given as boundary conditions at the bottom of the tanks.
• 2003 Tokachi Earthquake (M8.0): fire accident of oil tanks due to long-period ground motion (surface waves) developed in the basin of Tomakomai

[Figures: Seismic Wave Propagation, Underground Structure]

Simulation Codes
• Ground Motion (Ichimura): Fortran
  – Parallel FEM, 3D Elastic/Dynamic
    • Explicit forward Euler scheme
  – Each element: 2m×2m×2m cube
  – 240m×240m×100m region
• Sloshing of Tanks (Nagashima): C
  – Serial FEM (Embarrassingly Parallel)
    • Implicit backward Euler, Skyline method
    • Shell elements + inviscid potential flow
  – D: 42.7m, H: 24.9m, T: 20mm
  – Frequency: 7.6 sec.
  – 80 elements in circumference, 0.6m mesh in height
  – Tank-to-tank distance: 60m, 4×4 layout
• Total number of unknowns: 2,918,169

Three Communicators
[Figure: hierarchy of communicators for the coupled simulation]
  meshGLOBAL%MPI_COMM : all processes (basement #0-#3 and tank #0-#8)
  meshBASE%MPI_COMM   : basement #0-#3
  meshTANK%MPI_COMM   : tank #0-#8
  Basement processes: meshGLOBAL%my_rank= 0~3,  meshBASE%my_rank= 0~3,  meshTANK%my_rank= -1
  Tank processes:     meshGLOBAL%my_rank= 4~12, meshTANK%my_rank= 0~8,  meshBASE%my_rank= -1

MPI_Comm_rank
• Determines the rank of the calling process in the communicator
  – The "ID of an MPI process" is sometimes called its "rank"
• MPI_Comm_rank (comm, rank)
  – comm   MPI_Comm   I   communicator
  – rank   int        O   rank of the calling process in the group of comm, starting from "0"
(see hello.c above)

• Introduction to MPI
• Road to Parallel FEM
  – Local Data Structure
• MPI for Parallel FEM
  – Collective Communication
  – Point-to-Point (Peer-to-Peer) Communication

Motivation for Parallel Computing: to solve larger problems faster
• Large-scale parallel computers enable fast computing in large-scale scientific simulations with detailed models. Computational science develops new frontiers of science and engineering.
• Why parallel computing?
  – faster
  – larger
  – "larger" is more important from the viewpoint of "new frontiers of science & engineering", but "faster" is also important.
  – + more complicated
  – Ideal: Scalable
    • Solving an Nx-scale problem using Nx computational resources in the same computation time.
  – finer meshes provide a more accurate solution

What is Parallel Computing?
[Figure: homogeneous vs. heterogeneous porous media, Lawrence Livermore National Laboratory]
• Very fine meshes are required for simulations of heterogeneous fields.
• A PC with 1 GB memory: about 10^6 meshes are the limit for FEM
  – Southwest Japan, (1000 km)^3 with a 1 km mesh -> 10^9 meshes

What is Parallel Computing? (cont.)
• Large Data -> Domain Decomposition -> Local Operations
• Inter-Domain Communication for Global Operations
[Figure: large-scale data is partitioned into local data sets; communication connects the partitions]

What is Communication?
• Parallel Computing -> Local Operations
• Communications are required in Global Operations for Consistency.

Finite Element Procedures
• Initialization
  – Control Data
  – Node, Connectivity of Elements (N: Node#, NE: Elem#)
  – Initialization of Arrays (Global/Element Matrices)
  – Element-Global Matrix Mapping (Index, Item)
• Generation of Matrix: Local Op.: good for parallel computing
  – Element-by-Element Operations (do icel= 1, NE)
    • Element matrices
    • Accumulation to global matrix
• Boundary Conditions
• Linear Solver: Global Op.: Communication needed
  – Conjugate Gradient Method
• Calculation of Stress

Operations in Parallel FEM
• Large-scale data is partitioned into distributed local data sets.
• The FEM code can assemble the coefficient matrix for each local data set: this part can be completely local, the same as serial operations.
• Global operations and communications happen only in the linear solvers: dot products, matrix-vector multiply, preconditioning.
[Figure: SPMD (Single-Program Multiple-Data): each process runs Local Data -> FEM code -> Linear Solver, connected by MPI]

Parallel FEM Procedures
• Design of the "Local Data Structure" is important
  – for the SPMD-type operations on the previous page
• Matrix Generation
• Preconditioned Iterative Solvers for Linear Equations

Bi-Linear Square Elements
• Values are defined on each node.
[Figure: a small quadrilateral mesh divided into two domains]
• Local information is not enough for matrix assembly.
• Divide into two domains in a "node-based" manner, so that the number of "nodes (vertices)" is balanced.
• Information on overlapped elements and connected nodes is required for matrix assembly at boundary nodes.

Local Data of Parallel FEM
• Node-based partitioning, suitable for IC/ILU-type preconditioning methods
• Local data includes information on:
  – Nodes originally assigned to the partition/PE
  – Elements which include those nodes: element-based operations (matrix assembly) are allowed for fluid/structure subsystems
  – All nodes which form those elements but are outside the partition
• Nodes are classified into the following 3 categories from the viewpoint of message passing:
  – Internal nodes: originally assigned nodes
  – External nodes: nodes in the overlapped elements but outside the partition
  – Boundary nodes: internal nodes which are external nodes of other partitions
• Communication table between partitions
• NO global information is required except partition-to-partition connectivity

Node-based Partitioning: internal nodes - elements - external nodes
[Figure: a 25-node example mesh partitioned among PE#0-PE#3, with local node numbering in each partition]

Elements which include Internal Nodes
(node-based partitioning: internal nodes - elements - external nodes)
[Figure: the internal nodes of one partition, the elements connected to them, and the external nodes included in the overlapped region among partitions]
• Partitioned nodes themselves (internal nodes)
• External nodes are included in the elements of the overlapped region among partitions.
• Information on external nodes is required for completely local element-based operations on each processor.

Elements which include Internal Nodes (cont.)
• With the information on external nodes available locally, we do not need communication during matrix assembly!

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data
[Figure, repeated over several slides: each process holds its own Local Data and runs the same FEM code and Linear Solver; the processes are connected by MPI. The accompanying mesh figures show the local node numbering of partitions PE#0-PE#3.]

1D FEM: 12 nodes / 11 elements / 3 domains
[Figure: a 1D mesh with global node numbers 0-11 and element numbers 0-10, split into 3 domains]

# "Internal Nodes" should be balanced
[Figure: the 12 nodes divided as 4 internal nodes per domain (#0, #1, #2)]

Matrices are incomplete!
[Figure: assembling with internal nodes only leaves the matrix rows at the domain boundaries incomplete]

Connected Elements + External Nodes
[Figure: each domain keeps the connected elements and the external nodes needed to complete its rows; domain #0 holds nodes 0 1 2 3 4 and elements 0 1 2 3, domain #1 holds nodes 3 4 5 6 7 8 and elements 3 4 5 6 7, domain #2 holds nodes 7 8 9 10 11 and elements 7 8 9 10]

1D FEM: 12 nodes / 11 elements / 3 domains
[Figures: the three domains shown with their overlapped elements and external nodes, in global numbering]

Local Numbering for SPMD
• Numbering of internal nodes is 1-N (0-N-1), so the same operations as in the serial program can be applied.
• How about the numbering of external nodes?
[Figure: each domain numbers its internal nodes 0-3; the numbering of the external nodes is still open ("?")]


Local Numbering for SPMD
• Numbering of external nodes: N+1, N+2, ... (N, N+1, ... in 0-based numbering)
[Figure: each domain numbers its 4 internal nodes 0-3 and appends its external nodes as 4 (and 5)]
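To make this numbering concrete, a small sketch follows (an assumption for illustration, consistent with the 1D example): the local vector stores its N internal values first and appends the external values received from the neighbors.

/* Sketch: local vector layout under SPMD numbering (illustrative assumption).
   Domain #1 of the 1D example has 4 internal and 2 external nodes.            */
enum { N_INT = 4, N_EXT = 2 };
double x[N_INT + N_EXT];
/* x[0 .. N_INT-1]           : values owned and updated by this process         */
/* x[N_INT .. N_INT+N_EXT-1] : copies of neighbor values, refreshed before use  */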

• Introduction to MPI
• Road to Parallel FEM
  – Local Data Structure
• MPI for Parallel FEM
  – Collective Communication
  – Point-to-Point (Peer-to-Peer) Communication

How to "Parallelize" Iterative Solvers?
e.g. CG method (with no preconditioning)
Parallel procedures are required in:
  • Dot products
  • Matrix-vector multiplication

Compute r(0)= b-[A]x(0)
for i= 1, 2, ...
  z(i-1)= r(i-1)
  ρ(i-1)= r(i-1) z(i-1)
  if i=1
    p(1)= z(0)
  else
    β(i-1)= ρ(i-1)/ρ(i-2)
    p(i)= z(i-1) + β(i-1) p(i-1)
  endif
  q(i)= [A]p(i)
  α(i)= ρ(i-1)/p(i)q(i)
  x(i)= x(i-1) + α(i) p(i)
  r(i)= r(i-1) - α(i) q(i)
  check convergence |r|
end

Preconditioning, DAXPY
Local operations using only internal points: parallel processing is possible.

/*
//-- {x}= {x} + ALPHA*{p}
//   {r}= {r} - ALPHA*{q}
*/
for(i=0;i<N;i++){
  PHI[i] += Alpha * W[P][i];
  W[R][i] -= Alpha * W[Q][i];
}

/*
//-- {z}= [Minv]{r}
*/
for(i=0;i<N;i++){
  W[Z][i] = W[DD][i] * W[R][i];
}

Dot Products
Global summation needed: communication?

/*
//-- ALPHA= RHO / {p}{q}
*/
C1 = 0.0;
for(i=0;i<N;i++){
  C1 += W[P][i] * W[Q][i];
}
Alpha = Rho / C1;

MPI_Reduce
• Reduces values on all processes to a single value
  – Summation, Product, Max, Min etc.
• MPI_Reduce(sendbuf,recvbuf,count,datatype,op,root,comm)
  – sendbuf   choice        I   starting address of send buffer
  – recvbuf   choice        O   starting address of receive buffer; type is defined by "datatype"
  – count     int           I   number of elements in send/receive buffer
  – datatype  MPI_Datatype  I   data type of elements of send/receive buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
  – op        MPI_Op        I   reduce operation
      MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND etc.
      Users can define operations by MPI_OP_CREATE
  – root      int           I   rank of root process
  – comm      MPI_Comm      I   communicator
[Figure: Reduce combines A0-A3, B0-B3, C0-C3, D0-D3 from P#0-P#3 and places op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3 on the root process]
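A minimal usage sketch of the call described above may help; it is not from the original slides. Each process contributes one double (here simply its own rank, an arbitrary choice) and rank 0 receives the sum.

/* Sketch: sum one value per process onto rank 0 with MPI_Reduce. */
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
  int myid, numprocs;
  double local, global;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  local = (double)myid;                       /* each process contributes its rank */
  MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0) printf("sum of ranks = %f\n", global);
  MPI_Finalize();
  return 0;
}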

MPI_Bcast
• Broadcasts a message from the process with rank "root" to all other processes of the communicator
• MPI_Bcast (buffer,count,datatype,root,comm)
  – buffer    choice        I/O  starting address of buffer; type is defined by "datatype"
  – count     int           I    number of elements in send/receive buffer
  – datatype  MPI_Datatype  I    data type of elements of send/receive buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
  – root      int           I    rank of root process
  – comm      MPI_Comm      I    communicator
[Figure: Broadcast copies A0, B0, C0, D0 from P#0 to P#1, P#2, P#3]
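As a small illustration (not from the slides), the sketch below broadcasts a control parameter that only rank 0 knows; the value n = 100 is an arbitrary assumption.

/* Sketch: rank 0 decides a parameter, every rank receives it via MPI_Bcast. */
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
  int myid, n;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  n = (myid == 0) ? 100 : 0;                 /* e.g. a mesh size known only on rank 0 */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("rank %d: n = %d\n", myid, n);      /* every rank now prints n = 100 */
  MPI_Finalize();
  return 0;
}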

MPI_Allreduce
• MPI_Reduce + MPI_Bcast
• Summation (of dot products) and MAX/MIN values are likely to be utilized in each process
• call MPI_Allreduce (sendbuf,recvbuf,count,datatype,op,comm)
  – sendbuf   choice        I   starting address of send buffer
  – recvbuf   choice        O   starting address of receive buffer; type is defined by "datatype"
  – count     int           I   number of elements in send/receive buffer
  – datatype  MPI_Datatype  I   data type of elements of send/receive buffer
  – op        MPI_Op        I   reduce operation
  – comm      MPI_Comm      I   communicator
[Figure: Allreduce places the reduced values op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3 on every process P#0-P#3]

"op" of MPI_Reduce/Allreduce
• MPI_MAX, MPI_MIN      Max, Min
• MPI_SUM, MPI_PROD     Summation, Product
• MPI_LAND              Logical AND

MPI_Reduce (sendbuf,recvbuf,count,datatype,op,root,comm)
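Tying the reduce operations back to the dot-product loop shown earlier, here is a hedged sketch of the parallel dot product; the function name parallel_alpha and its argument list are my own choices, while the pattern (local partial sum followed by MPI_Allreduce with MPI_SUM) is the one implied by the slides.

/* Sketch: parallel alpha = rho / (p,q); N is the local number of internal points,
   p and q are the local arrays of this process. */
#include "mpi.h"
double parallel_alpha(double rho, const double *p, const double *q, int N)
{
  int i;
  double c1 = 0.0, c1_global;
  for (i = 0; i < N; i++) c1 += p[i] * q[i];   /* partial sum over internal points */
  MPI_Allreduce(&c1, &c1_global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  return rho / c1_global;                      /* identical result on every process */
}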

• Introduction to MPI
• Road to Parallel FEM
  – Local Data Structure
• MPI for Parallel FEM
  – Collective Communication
  – Point-to-Point (Peer-to-Peer) Communication

How to "Parallelize" Iterative Solvers?
(The CG pseudocode shown earlier is repeated here.) Parallel procedures are required in:
  • Dot products
  • Matrix-vector multiplication

Matrix-Vector Products
Values at external points: P-to-P communication

/*
//-- {q}= [A]{p}
*/
for(i=0;i<N;i++){
  W[Q][i] = Diag[i] * W[P][i];
  for(j=Index[i];j<Index[i+1];j++){
    W[Q][i] += AMat[j]*W[P][Item[j]];
  }
}
[Figure: local numbering 0-3 for internal nodes, 4-5 for external nodes]

Mat-Vec Products: Local Operation Possible
[Figures: with local numbering (internal nodes first, then external nodes), each domain performs its part of the matrix-vector product locally once the values at the external nodes have been received]

1D FEM: 12 nodes / 11 elements / 3 domains
• Local ID: starting from 0 for nodes and elements in each domain
• Internal/External Nodes
[Figure: domain #0 has internal nodes 0-3 and external node 4; domain #1 has internal nodes 0-3 and external nodes 4, 5; domain #2 has internal nodes 0-3 and external node 4]

What is Peer-to-Peer Communication?
• Collective Communication
  – MPI_Reduce, MPI_Scatter/Gather etc.
  – Communications with all processes in the communicator
  – Application area
    • BEM, Spectral Method, MD: global interactions are considered
    • Dot products, MAX/MIN: global summation & comparison
• Peer-to-Peer / Point-to-Point
  – MPI_Send, MPI_Recv
  – Communication with a limited number of processes
    • Neighbors
  – Application area
    • FEM, FDM: localized methods
[Figure: the three 1D FEM domains exchange values only with their neighbors]

MPI_Isend
• Begins a non-blocking send
  – Sends the contents of the sending buffer (starting from sendbuf, number of elements: count) to dest with tag.
  – Contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall.
• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  – sendbuf   choice        I   starting address of sending buffer
  – count     int           I   number of elements in sending buffer
  – datatype  MPI_Datatype  I   data type of each sending buffer element
  – dest      int           I   rank of destination
  – tag       int           I   message tag. This integer can be used by the application to distinguish messages. Communication occurs if the tags of MPI_Isend and MPI_Irecv match. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I   communicator
  – request   MPI_Request   O   communication request array used in MPI_Waitall

(The MPI_Isend argument list is repeated on the following slides.)

SEND: sending from boundary nodes
Send continuous data to the send buffers for the neighbors.
[Figure: on each PE, values at boundary nodes are packed and sent to the neighboring partitions]

Communication Request: request
• request   MPI_Request   O   communication request array used in MPI_Waitall
  – The size of the array is the total number of neighboring processes.
  – Just define the array.

RECV: receiving to external nodes
Receive continuous data into the receive buffers from the neighbors.
[Figure: on each PE, values received from the neighboring partitions are stored at the external nodes]

MPI_Irecv
• Begins a non-blocking receive
  – Receives into the receiving buffer (starting from recvbuf, number of elements: count) from source with tag.
  – Contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.
• MPI_Irecv (recvbuf,count,datatype,source,tag,comm,request)
  – recvbuf   choice        I   starting address of receiving buffer
  – count     int           I   number of elements in receiving buffer
  – datatype  MPI_Datatype  I   data type of each receiving buffer element
  – source    int           I   rank of source
  – tag       int           I   message tag. Communication occurs if the tags of MPI_Isend and MPI_Irecv match. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I   communicator
  – request   MPI_Request   O   communication request array used in MPI_Waitall

MPI_Waitall
• MPI_Waitall blocks until all communications associated with the requests in the array complete. It is used for synchronizing MPI_Isend and MPI_Irecv in this class.
• At the sending phase, contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall. At the receiving phase, contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.
• MPI_Isend and MPI_Irecv can be synchronized simultaneously with a single MPI_Waitall if it is consistent.
  – The same request array should be used for MPI_Isend and MPI_Irecv.
• Its operation is similar to that of MPI_Barrier, but MPI_Waitall cannot be replaced by MPI_Barrier.
  – Possible troubles when using MPI_Barrier instead of MPI_Waitall: contents of request and status are not updated properly, very slow operation, etc.
• MPI_Waitall (count,request,status)
  – count     int          I    number of requests to be completed
  – request   MPI_Request  I/O  communication requests used in MPI_Waitall (array size: count)
  – status    MPI_Status   O    array of status objects
    MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'

Array of status objects: status
(The MPI_Waitall argument list is repeated on this slide.)
• status   MPI_Status   O   array of status objects (MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h')
  – Just define the array.
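Because the slide notes that sends and receives can be completed together when the request array is consistent, here is a minimal two-process sketch (not from the lecture): each rank posts one MPI_Irecv and one MPI_Isend and finishes both with a single MPI_Waitall.

/* Sketch: two processes exchange one value each; run with mpirun -np 2. */
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
  int myid, other;
  double sendval, recvval;
  MPI_Request req[2];
  MPI_Status  stat[2];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  other   = 1 - myid;            /* assumes exactly 2 processes */
  sendval = (double)myid;

  MPI_Irecv(&recvval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req[0]);
  MPI_Isend(&sendval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req[1]);
  MPI_Waitall(2, req, stat);     /* completes both the receive and the send */

  printf("rank %d received %f\n", myid, recvval);
  MPI_Finalize();
  return 0;
}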

Node-based Partitioning: internal nodes - elements - external nodes
[Figure: the 25-node example mesh partitioned among PE#0-PE#3, shown in both global and local numbering]

Description of Distributed Local Data
• Internal/External Nodes
  – Numbering: starting from internal points, then external points after that
• Neighbors
  – Share overlapped elements
  – Number and IDs of neighbors
• External Nodes
  – From where, how many, and which external points are received/imported?
• Boundary Nodes
  – To where, how many, and which boundary points are sent/exported?
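The items above correspond directly to the arrays used in the code on the following slides (NeibPETot, NeibPE, import_index/import_item, export_index/export_item). Gathering them in a C struct, as sketched below, is only an illustrative packaging of my own, not part of the original data format.

/* Sketch: one distributed local data set, named after the variables used below. */
typedef struct {
  int  N;              /* number of internal nodes                                    */
  int  NP;             /* total local nodes = internal + external                     */
  int  NeibPETot;      /* number of neighboring partitions                            */
  int *NeibPE;         /* [NeibPETot]   ranks of the neighbors                        */
  int *import_index;   /* [NeibPETot+1] offsets into import_item, one range per neighbor */
  int *import_item;    /* local IDs of external nodes received from each neighbor     */
  int *export_index;   /* [NeibPETot+1] offsets into export_item, one range per neighbor */
  int *export_item;    /* local IDs of boundary nodes sent to each neighbor           */
} LocalMesh;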

External Nodes: RECEIVE
PE#2: receives information for its "external nodes"
[Figure: PE#2 receives the values of its external nodes from the neighboring partitions PE#3 and PE#0]

Boundary Nodes: SEND
PE#2: sends information on its "boundary nodes"
[Figure: PE#2 sends the values at its boundary nodes to the neighboring partitions PE#3 and PE#0]

Generalized Comm. Table: Send
• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – export_index[neib], neib= 0, NeibPETot-1
• ID of boundary points
  – export_item[k], k= 0, export_index[NeibPETot]-1
• Messages to each neighbor
  – SendBuf[k], k= 0, export_index[NeibPETot]-1

SEND: MPI_Isend/Irecv/Waitall
[Figure: SendBuf is divided into contiguous segments (neib#0 .. neib#3), delimited by export_index[0] .. export_index[4]; each segment length is BUFlength_e]
Boundary-node values are copied to the sending buffers; export_item (export_index[neib] : export_index[neib+1]-1) are sent to the neib-th neighbor.

for (neib=0; neib<NeibPETot; neib++){
  for (k=export_index[neib]; k<export_index[neib+1]; k++){
    kk= export_item[k];
    SendBuf[k]= VAL[kk];
  }
}

for (neib=0; neib<NeibPETot; neib++){
  tag= 0;
  iS_e= export_index[neib];
  iE_e= export_index[neib+1];
  BUFlength_e= iE_e - iS_e;
  ierr= MPI_Isend
    (&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib], 0,
     MPI_COMM_WORLD, &ReqSend[neib]);
}
MPI_Waitall(NeibPETot, ReqSend, StatSend);

Generalized Comm. Table: Receive
• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – import_index[neib], neib= 0, NeibPETot-1
• ID of external points
  – import_item[k], k= 0, import_index[NeibPETot]-1
• Messages from each neighbor
  – RecvBuf[k], k= 0, import_index[NeibPETot]-1

RECV: MPI_Isend/Irecv/Waitall
[Figure: RecvBuf is divided into contiguous segments (neib#0 .. neib#3), delimited by import_index[0] .. import_index[4]; each segment length is BUFlength_i]
import_item (import_index[neib] : import_index[neib+1]-1) are received from the neib-th neighbor and copied from the receiving buffer to the external nodes.

for (neib=0; neib<NeibPETot; neib++){
  tag= 0;
  iS_i= import_index[neib];
  iE_i= import_index[neib+1];
  BUFlength_i= iE_i - iS_i;
  ierr= MPI_Irecv
    (&RecvBuf[iS_i], BUFlength_i, MPI_DOUBLE, NeibPE[neib], 0,
     MPI_COMM_WORLD, &ReqRecv[neib]);
}
MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

for (neib=0; neib<NeibPETot; neib++){
  for (k=import_index[neib]; k<import_index[neib+1]; k++){
    kk= import_item[k];
    VAL[kk]= RecvBuf[k];
  }
}

Relationship SEND/RECV

do neib= 1, NEIBPETOT
  iS_i= import_index(neib-1) + 1
  iE_i= import_index(neib  )
  BUFlength_i= iE_i + 1 - iS_i
  call MPI_IRECV                                                      &
 &  (RECVbuf(iS_i), BUFlength_i, MPI_INTEGER, NEIBPE(neib), 0,        &
 &   MPI_COMM_WORLD, request_recv(neib), ierr)
enddo

do neib= 1, NEIBPETOT
  iS_e= export_index(neib-1) + 1
  iE_e= export_index(neib  )
  BUFlength_e= iE_e + 1 - iS_e
  call MPI_ISEND                                                      &
 &  (SENDbuf(iS_e), BUFlength_e, MPI_INTEGER, NEIBPE(neib), 0,        &
 &   MPI_COMM_WORLD, request_send(neib), ierr)
enddo

• Consistency of the IDs of sources/destinations, and of the size and contents of messages!
• Communication occurs when NEIBPE(neib) matches.

Relationship SEND/RECV (#0 to #3)
• Consistency of the IDs of sources/destinations, and of the size and contents of messages!
• Communication occurs when NEIBPE(neib) matches.
[Figure: a send on process #0 (NEIBPE(:)=1,3,5,9) matches the receive posted on process #3 (NEIBPE(:)=1,0,10) for neighbor 0]

Example: SEND
PE#2: sends information on its "boundary nodes"
[Figure: PE#2 and its neighbors PE#3 and PE#0]
NEIBPETOT= 2
NEIBPE[0]= 3, NEIBPE[1]= 0
EXPORT_INDEX[0]= 0
EXPORT_INDEX[1]= 2
EXPORT_INDEX[2]= 2+3 = 5
EXPORT_ITEM[0-4]= 1,4,4,5,6

Sending Buffer is nice ...
The numbering of these boundary nodes is not continuous, therefore the following procedure of MPI_Isend cannot be applied directly:
  • starting address of the sending buffer
  • XX messages from that address
Instead, the boundary-node values are first copied into the contiguous SendBuf (as in the code above), and each neighbor's segment of SendBuf is sent with one MPI_Isend.

Example: RECEIVE
PE#2: receives information for its "external nodes"
[Figure: PE#2 and its neighbors PE#3 and PE#0]
NEIBPETOT= 2
NEIBPE[0]= 3, NEIBPE[1]= 0
IMPORT_INDEX[0]= 0
IMPORT_INDEX[1]= 3
IMPORT_INDEX[2]= 3+3 = 6
IMPORT_ITEM[0-5]= 7,8,10,9,11,12

Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• "length_send" of the sending process must be equal to "length_recv" of the receiving process.
  – PE#0 to PE#1, PE#1 to PE#0
• "sendbuf" and "recvbuf": different addresses

Communication Pattern using 1D Structure
[Figure: 1D domain decomposition with a halo region on each side of every domain; credit: Dr. Osni Marques (Lawrence Berkeley National Laboratory)]

Distributed Local Data Structure for Parallel Computation
• A distributed local data structure for domain-to-domain communications has been introduced, which is appropriate for applications with sparse coefficient matrices (e.g. FDM, FEM, FVM etc.).
  – SPMD
  – Local numbering: internal points, then external points
  – Generalized communication table
• Everything is easy if a proper data structure is defined:
  – Values at boundary points are copied into sending buffers
  – Send/Recv
  – Values at external points are updated through receiving buffers
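As a wrap-up, the sketch below collects the SEND and RECV code from the earlier slides into one routine that refreshes the values at the external nodes. The function name update_external and the MAX_NEIB bound are my own choices; everything else follows the arrays defined above.

/* Sketch: update values at external nodes using the generalized comm. table. */
#include "mpi.h"
#define MAX_NEIB 16

void update_external(double *VAL,
                     int NeibPETot, const int *NeibPE,
                     const int *import_index, const int *import_item,
                     const int *export_index, const int *export_item,
                     double *SendBuf, double *RecvBuf)
{
  int neib, k, kk;
  MPI_Request ReqSend[MAX_NEIB], ReqRecv[MAX_NEIB];
  MPI_Status  StatSend[MAX_NEIB], StatRecv[MAX_NEIB];

  /* copy boundary-node values into the sending buffers */
  for (neib=0; neib<NeibPETot; neib++){
    for (k=export_index[neib]; k<export_index[neib+1]; k++){
      kk= export_item[k];
      SendBuf[k]= VAL[kk];
    }
  }

  /* post one receive and one send per neighbor */
  for (neib=0; neib<NeibPETot; neib++){
    MPI_Irecv(&RecvBuf[import_index[neib]],
              import_index[neib+1]-import_index[neib],
              MPI_DOUBLE, NeibPE[neib], 0, MPI_COMM_WORLD, &ReqRecv[neib]);
  }
  for (neib=0; neib<NeibPETot; neib++){
    MPI_Isend(&SendBuf[export_index[neib]],
              export_index[neib+1]-export_index[neib],
              MPI_DOUBLE, NeibPE[neib], 0, MPI_COMM_WORLD, &ReqSend[neib]);
  }
  MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

  /* copy received values to the external nodes */
  for (neib=0; neib<NeibPETot; neib++){
    for (k=import_index[neib]; k<import_index[neib+1]; k++){
      kk= import_item[k];
      VAL[kk]= RecvBuf[k];
    }
  }
  MPI_Waitall(NeibPETot, ReqSend, StatSend);
}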