Transcript
Page 1: Parallel Computing

Parallel Computing

Michael Young, Mark Iredell

Page 2: Parallel Computing

NEMS/GFS Modeling Summer School 2

NWS Computer History

1968 CDC 6600
1974 IBM 360
1983 CYBER 205 - first vector parallelism
1991 Cray Y-MP - first shared memory parallelism
1994 Cray C-90 - ~16 gigaflops
2000 IBM SP - first distributed memory parallelism
2002 IBM SP P3
2004 IBM SP P4
2006 IBM SP P5
2009 IBM SP P6
2013 IBM iDataPlex SB - ~200 teraflops

Page 3: Parallel Computing

Algorithm of the GFS Spectral Model

One time loop is divided into:
- Computation of the tendencies of divergence, surface pressure, temperature, vorticity, and tracers (grid)
- Semi-implicit time integration (spectral)
- First half of the time filter (spectral)
- Physical effects included in the model (grid)
- Damping to simulate subgrid dissipation (spectral)
- Completion of the time filter (spectral)

Page 4: Parallel Computing

Algorithm of the GFS Spectral Model

Definitions: the operational spectral truncation is T574, with a physical grid of 1760 longitudes by 880 latitudes and 64 vertical levels (about 23 km resolution).

θ is latitude
λ is longitude
l is zonal wavenumber
n is total wavenumber (zonal + meridional)
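As a check, the quoted 23 km is consistent with the equatorial grid spacing implied by 1760 longitudes, taking the Earth's radius as roughly a = 6371 km:

\Delta x \approx \frac{2\pi a}{1760} \approx \frac{40{,}030\ \text{km}}{1760} \approx 22.7\ \text{km}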

Page 5: Parallel Computing

Three Variable Spaces

Spectral (L x N x K)
Fourier (L x J x K)
Physical Grid (I x J x K)

I is number of longitude points
J is number of latitudes
K is number of levels
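A minimal sketch of arrays with these shapes for the T574 configuration on the previous slide; the complex storage and the rectangular (rather than triangular) spectral allocation are illustrative assumptions, not the model's actual data layout.

      program variable_spaces
      ! Sketch only: shapes of the three variable spaces for T574, 64 levels.
      ! Complex storage and the rectangular spectral allocation are
      ! illustrative assumptions, not the model's actual data layout.
        implicit none
        integer, parameter :: jcap = 574     ! spectral truncation (T574)
        integer, parameter :: levs = 64      ! K: number of levels
        integer, parameter :: lonf = 1760    ! I: number of longitude points
        integer, parameter :: latg = 880     ! J: number of latitudes
        complex, allocatable :: spec(:,:,:)  ! spectral space (L x N x K)
        complex, allocatable :: four(:,:,:)  ! Fourier space  (L x J x K)
        real,    allocatable :: grid(:,:,:)  ! physical grid  (I x J x K)
        allocate(spec(0:jcap, 0:jcap, levs)) ! only n >= l entries are used
        allocate(four(0:jcap, latg, levs))
        allocate(grid(lonf, latg, levs))
        print *, 'spectral, Fourier, and grid arrays allocated'
      end program variable_spaces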

Page 6: Parallel Computing

The Spectral Technique

All fields possess a spherical harmonic representation:

F(\lambda,\theta) = \sum_{l=-J}^{J} \sum_{n=|l|}^{J} f_n^l \, P_n^l(\sin\theta) \, e^{il\lambda}

(J is the truncation limit; J = 574 for T574)

where

P_n^l(x) = \sqrt{\frac{(2n+1)\,(n-l)!}{2\,(n+l)!}} \; \frac{1}{2^n n!} \, (1-x^2)^{l/2} \, \frac{d^{\,n+l}}{dx^{\,n+l}} (x^2-1)^n , \qquad n \ge l \ge 0

Page 7: Parallel Computing

Spectral to Grid Transform

Legendre transform:

F^l(\theta) = \sum_{n=|l|}^{J} f_n^l \, P_n^l(\sin\theta)

Fourier transform using FFT:

F(\lambda,\theta) = \sum_{l=-J}^{J} F^l(\theta) \, e^{il\lambda}
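A minimal sketch of the Legendre transform step for a single zonal wavenumber l, assuming precomputed spectral coefficients fln(n) and associated Legendre values pnm(n,j); the names and layout are illustrative, not the interface of the model's sumfln_slg_gg. The subsequent FFT over l (four_to_grid in the model) then yields grid values on each latitude circle.

      ! Sketch: Legendre transform for one zonal wavenumber l.
      ! Sums f_n^l * P_n^l(sin(theta_j)) over n to give the Fourier
      ! coefficient F^l at each latitude j.  Names are illustrative.
      subroutine legendre_sum(l, jcap, nlat, fln, pnm, fourier)
        implicit none
        integer, intent(in)  :: l, jcap, nlat
        complex, intent(in)  :: fln(l:jcap)        ! spectral coefficients f_n^l
        real,    intent(in)  :: pnm(l:jcap, nlat)  ! P_n^l(sin(theta_j))
        complex, intent(out) :: fourier(nlat)      ! F^l(theta_j)
        integer :: j, n
        do j = 1, nlat
           fourier(j) = (0.0, 0.0)
           do n = l, jcap
              fourier(j) = fourier(j) + fln(n) * pnm(n, j)
           end do
        end do
      end subroutine legendre_sum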

Page 8: Parallel Computing

Grid to Spectral Transform

Inverse Fourier transform (FFT):

F^l(\theta) = \frac{1}{2\pi} \int_0^{2\pi} F(\lambda,\theta) \, e^{-il\lambda} \, d\lambda \;\approx\; \frac{1}{M} \sum_{j=0}^{M-1} F(\lambda_j,\theta) \, e^{-2\pi i l j / M}

Inverse Legendre (Gaussian quadrature):

f_n^l = \frac{1}{2\pi} \int_{-\pi/2}^{\pi/2} \int_0^{2\pi} F(\lambda,\theta) \, e^{-il\lambda} \, P_n^l(\sin\theta) \, \cos\theta \, d\lambda \, d\theta \;\approx\; \sum_{k=1}^{N} w_k \, F^l(\theta_k) \, P_n^l(\sin\theta_k)

where w_k are the Gaussian quadrature weights at the Gaussian latitudes \theta_k.
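A minimal sketch of the quadrature step for one zonal wavenumber l, assuming precomputed Gaussian weights wgt(k) and Legendre values pnm(n,k); names are illustrative, not the interface of the model's Four2fln_gg.

      ! Sketch: inverse Legendre transform by Gaussian quadrature for one l.
      ! Accumulates f_n^l = sum_k w_k * F^l(theta_k) * P_n^l(sin(theta_k)).
      ! Array names (fourier, pnm, wgt, fln) are illustrative only.
      subroutine gaussian_quadrature(l, jcap, nlat, fourier, pnm, wgt, fln)
        implicit none
        integer, intent(in)  :: l, jcap, nlat
        complex, intent(in)  :: fourier(nlat)      ! F^l(theta_k)
        real,    intent(in)  :: pnm(l:jcap, nlat)  ! P_n^l(sin(theta_k))
        real,    intent(in)  :: wgt(nlat)          ! Gaussian weights w_k
        complex, intent(out) :: fln(l:jcap)        ! spectral coefficients f_n^l
        integer :: k, n
        fln = (0.0, 0.0)
        do k = 1, nlat
           do n = l, jcap
              fln(n) = fln(n) + wgt(k) * fourier(k) * pnm(n, k)
           end do
        end do
      end subroutine gaussian_quadrature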

Page 9: Parallel Computing

MPI and OpenMP

The GFS uses a hybrid layout: a 1-dimensional MPI decomposition with OpenMP threading at the do-loop level.

MPI (Message Passing Interface) is used to communicate between tasks, each of which holds a subgrid of a field.

OpenMP supports shared-memory multiprocessor programming (threading) using compiler directives.
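A minimal, self-contained sketch of the hybrid pattern (not GFS code): MPI is initialized with thread support, and an OpenMP directive threads a do loop inside each MPI task.

      program hybrid_sketch
      ! Sketch of hybrid MPI + OpenMP: MPI tasks plus a threaded do loop.
        use mpi
        implicit none
        integer :: ierr, provided, rank, ntasks, j
        real :: work(880)
        call mpi_init_thread(MPI_THREAD_FUNNELED, provided, ierr)
        call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
        call mpi_comm_size(MPI_COMM_WORLD, ntasks, ierr)
!$omp parallel do private(j)
        do j = 1, 880                   ! e.g. loop over latitudes held by this task
           work(j) = real(j + rank)     ! placeholder computation
        end do
!$omp end parallel do
        if (rank == 0) print *, 'tasks:', ntasks, ' sum:', sum(work)
        call mpi_finalize(ierr)
      end program hybrid_sketch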

Page 10: Parallel Computing

MPI and OpenMP

Data transposes are implemented using MPI_alltoallv.

They are required to switch between the variable spaces, which have different 1-D MPI decompositions.
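A minimal sketch of the mpi_alltoallv pattern (not the GFS implementation): each task exchanges variable-sized pieces with every other task, described by count and displacement arrays. In the model, the buffers and count arrays (works, workr, sendcounts, sdispls) are built for the decompositions shown on the next slides; the sizes here are arbitrary.

      program transpose_sketch
      ! Sketch: variable-sized all-to-all exchange with mpi_alltoallv, the
      ! same call used for the GFS data transposes.  Sizes are arbitrary.
        use mpi
        implicit none
        integer :: ierr, rank, ntasks, i
        integer, allocatable :: sendcounts(:), recvcounts(:)
        integer, allocatable :: sdispls(:), rdispls(:)
        real, allocatable :: works(:), workr(:)
        call mpi_init(ierr)
        call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
        call mpi_comm_size(MPI_COMM_WORLD, ntasks, ierr)
        allocate(sendcounts(ntasks), recvcounts(ntasks))
        allocate(sdispls(ntasks), rdispls(ntasks))
        sendcounts = rank + 1            ! this task sends rank+1 values to everyone
        ! each task must know how much it will receive from every other task
        call mpi_alltoall(sendcounts, 1, MPI_INTEGER, &
                          recvcounts, 1, MPI_INTEGER, MPI_COMM_WORLD, ierr)
        sdispls(1) = 0
        rdispls(1) = 0
        do i = 2, ntasks
           sdispls(i) = sdispls(i-1) + sendcounts(i-1)
           rdispls(i) = rdispls(i-1) + recvcounts(i-1)
        end do
        allocate(works(sum(sendcounts)), workr(sum(recvcounts)))
        works = real(rank)
        call mpi_alltoallv(works, sendcounts, sdispls, MPI_REAL, &
                           workr, recvcounts, rdispls, MPI_REAL, &
                           MPI_COMM_WORLD, ierr)
        if (rank == 0) print *, 'received', size(workr), 'values'
        call mpi_finalize(ierr)
      end program transpose_sketch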

Page 11: Parallel Computing

Spectral to Physical Grid

Call sumfln_slg_gg (Legendre Transform)
Call four_to_grid (FFT)

Data transpose after the Legendre Transform, in preparation for the FFT to physical grid space:

      call mpi_alltoallv(works,sendcounts,sdispls,mpi_r_mpi,
     x                   workr,recvcounts,sdispls,mpi_r_mpi,
     x                   mc_comp,ierr)

Page 12: Parallel Computing

Physical Grid to Spectral

Call Grid_to_four (Inverse FFT)
Call Four2fln_gg (Inverse Legendre Transform)

Data transpose performed before the Inverse Legendre Transform:

      call mpi_alltoallv(works,sendcounts,sdispls,MPI_R_MPI,
     x                   workr,recvcounts,sdispls,MPI_R_MPI,
     x                   MC_COMP,ierr)

Page 13: Parallel Computing

Physical Grid Space Parallelism

1-D MPI distributed over latitudes. OpenMP threading used on longitude points.

Each MPI task holds a group of latitudes, all longitudes, and all levels

A cyclic distribution of latitudes is used to load-balance the MPI tasks, because the number of longitude points per latitude decreases as latitude increases (approaching the poles).

Page 14: Parallel Computing

Physical Grid Space Parallelism

Cyclic distribution of latitudes example: 5 MPI tasks and 20 latitudes would be

Task    1    2    3    4    5
Lat     1    2    3    4    5
Lat    10    9    8    7    6
Lat    11   12   13   14   15
Lat    20   19   18   17   16
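A minimal sketch of how this back-and-forth cyclic assignment could be computed (illustrative, not the GFS code); run as written, it reproduces the 5-task, 20-latitude table above.

      program cyclic_lats
      ! Sketch: back-and-forth cyclic assignment of latitudes to MPI tasks,
      ! reproducing the 5-task / 20-latitude example above.
        implicit none
        integer, parameter :: ntasks = 5, nlats = 20
        integer :: owner(nlats)      ! owner(lat) = MPI task (1-based)
        integer :: lat, round, pos
        do lat = 1, nlats
           round = (lat - 1) / ntasks          ! 0,1,2,... blocks of ntasks lats
           pos   = mod(lat - 1, ntasks) + 1    ! position within the block
           if (mod(round, 2) == 0) then
              owner(lat) = pos                 ! forward sweep
           else
              owner(lat) = ntasks - pos + 1    ! reverse sweep
           end if
        end do
        do lat = 1, nlats
           print '(a,i3,a,i2)', 'lat', lat, ' -> task', owner(lat)
        end do
      end program cyclic_lats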

Page 15: Parallel Computing

Physical Grid Space Parallelism

Physical Grid Vector Length per OpenMP thread

NGPTC (a namelist variable) defines the number of longitude points per block (the vector length per processor) that each thread works on.

Typically set anywhere from 15 to 30 points.
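A minimal sketch of the blocking idea (illustrative names and block size, not the GFS loop): the longitude points of a latitude row are processed in blocks of NGPTC points, and OpenMP distributes the blocks among threads.

      program ngptc_blocks
      ! Sketch: thread over blocks of NGPTC longitude points on one latitude.
        implicit none
        integer, parameter :: lonf = 1760    ! longitudes (T574 grid)
        integer, parameter :: ngptc = 24     ! block size, typically 15-30
        real :: field(lonf)
        integer :: istart, iend, i
        field = 0.0
!$omp parallel do private(iend, i)
        do istart = 1, lonf, ngptc
           iend = min(istart + ngptc - 1, lonf)
           do i = istart, iend               ! each thread works on one block
              field(i) = real(i)             ! placeholder physics computation
           end do
        end do
!$omp end parallel do
        print *, 'processed', lonf, 'points in blocks of', ngptc
      end program ngptc_blocks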

Page 16: Parallel Computing

Spectral Space Parallelism

Hybrid 1-D MPI layout with OpenMP threading.

In spectral space, the 1-D MPI decomposition is over zonal wavenumbers (l's). OpenMP threading is used over a stack of variables times the number of levels.

Each MPI task holds a group of l's, all n's, and all levels.

A cyclic distribution of l's is used to load-balance the MPI tasks, because the number of meridional points per zonal wavenumber decreases as the wavenumber increases.
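A minimal sketch of the spectral-space threading (the flattened variable/level loop and all names are illustrative assumptions): OpenMP distributes the combined variable-times-level index among threads, while the MPI decomposition over l's is not shown.

      program spectral_threading
      ! Sketch: OpenMP over the combined (variable, level) index in spectral
      ! space; MPI (not shown) owns a subset of zonal wavenumbers l.
        implicit none
        integer, parameter :: nvars = 5      ! e.g. div, vor, T, ps, tracer
        integer, parameter :: levs = 64
        integer, parameter :: nspec = 1000   ! spectral coefficients on this task
        complex :: spec(nspec, levs, nvars)
        integer :: iv, k, item
        spec = (0.0, 0.0)
!$omp parallel do private(iv, k)
        do item = 1, nvars*levs              ! one thread unit = one variable/level pair
           iv = (item - 1) / levs + 1
           k  = mod(item - 1, levs) + 1
           spec(:, k, iv) = spec(:, k, iv) * 0.99   ! placeholder spectral operation
        end do
!$omp end parallel do
        print *, 'processed', nvars*levs, 'variable/level slices'
      end program spectral_threading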

Page 17: Parallel Computing

GFS Scalability

1-D MPI scales to about 2/3 of the spectral truncation; for T574 that is about 400 MPI tasks.

OpenMP threading scales to 8 threads, so T574 scales to about 400 x 8 = 3200 processors.

