Overview of High Performance Computing
geco.mines.edu/workshop/aug2011/01mon/HPC-Overview.pdf

Transcript

Page 1

Overview of High Performance Computing

Timothy H. Kaiser, Ph.D.
[email protected]

http://geco.mines.edu/workshop

Page 2


This tutorial will cover all three time slots. In the first session we will discuss the importance of parallel computing to high performance computing. We will show, by example, the basic concepts of parallel computing. The advantages and disadvantages of parallel computing will be discussed. We will present an overview of current and future trends in HPC hardware. The second session will provide an introduction to MPI, the most common package used to write parallel programs for HPC platforms. As tradition dictates, we will show how to write "Hello World" in MPI. Attendees will be shown how to build and run relatively simple examples on a consortium resource. The third session will briefly discuss other important HPC topics. This will include a discussion of OpenMP and hybrid programming, which combines MPI and OpenMP. Some computational libraries available for HPC will be highlighted. We will briefly mention parallel computing using graphics processing units (GPUs).

Page 3

Today’s Overview

• High performance computing in a nutshell

• Basic MPI - run an example

• A few additional MPI features

• A “Real” MPI example

• Scripting

• OpenMP

• Libraries and other stuff

Page 4

Introduction

• What is parallel computing?

• Why go parallel?

• When do you go parallel?

• What are some limits of parallel computing?

• Types of parallel computers

• Some terminology

Page 5

What is Parallelism?

The concept is simple: parallelism = applying multiple processors to a single problem

• Consider your favorite computational application

• One processor can give me results in N hours

• Why not use N processors, and get the results in just one hour?

Page 6

Parallel computing is computing by committee

[Figure: a grid of a problem to be solved, divided into four regions; Process 0 through Process 3 each do the work for one region]

• Parallel computing: the use of multiple computers or processors working together on a common task.

• Each processor works on its section of the problem

• Processors are allowed to exchange information with other processors
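The division of labor in the figure can be sketched in a few lines of Python. The helper below is illustrative (not from the slides): it hands each process a contiguous block of grid rows.

```python
def partition(n_rows, n_procs):
    """Split n_rows of a grid into n_procs contiguous chunks,
    spreading any remainder over the first few processes."""
    base, extra = divmod(n_rows, n_procs)
    bounds = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))  # this process's region
        start += size
    return bounds

# Four processes share a 10-row grid, as in the figure
regions = partition(10, 4)
```

Each process then loops only over its own (start, stop) range, exchanging boundary values with its neighbors when needed.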

Page 7

Why do parallel computing?

• Limits of single CPU computing

• Available memory

• Performance

• Parallel computing allows us to:

• Solve problems that don’t fit on a single CPU

• Solve problems that can’t be solved in a reasonable time

Page 8

Why do parallel computing?

• We can run…

• Larger problems

• Faster

• More cases

• Run simulations at finer resolutions

• Model physical phenomena more realistically

Page 9

Weather Forecasting

• The atmosphere is modeled by dividing it into three-dimensional regions or cells

• Cells are 1 mile x 1 mile x 1 mile (10 cells high): about 500 x 10^6 cells

• The calculations for each cell are repeated many times to model the passage of time

• About 200 floating point operations per cell per time step, or 10^11 floating point operations per time step

• A 10 day forecast with 10 minute resolution => 1.5 x 10^14 flop

• At 100 Mflops this would take about 17 days; at 1.7 Tflops, 2 minutes; at 17 Tflops, 12 seconds
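A quick sanity check of the arithmetic on this slide:

```python
# Reproduce the forecast estimate from the slide
cells = 500e6                       # ~500 x 10^6 cells
flop_per_cell = 200                 # floating point ops per cell per step
flop_per_step = cells * flop_per_cell   # 10^11 flop per time step
steps = 10 * 24 * 6                 # 10 days at 10-minute resolution
total_flop = flop_per_step * steps  # ~1.5 x 10^14 flop

# At a sustained 100 Mflops:
days_at_100mflops = total_flop / 100e6 / 86400   # ~17 days
```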

Page 10

Modeling the Motion of Astronomical Bodies (brute force)

• Each body is attracted to each other body by gravitational forces.

• Movement of each body can be predicted by calculating the total force experienced by the body.

• For N bodies, N - 1 forces per body yields N^2 calculations each time step

• A galaxy has about 10^11 stars => 10^9 years for one iteration

• Using an efficient N log N approximate algorithm => about a year

• NOTE: This is closely related to another hot topic: Protein Folding
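A toy version of the brute-force method, reduced to one dimension with G and all masses set to 1. This is an illustration of the N^2 cost, not a usable simulation:

```python
# Brute force: every body feels a force from every other body,
# so one time step costs N * (N - 1) pair evaluations.
def gravity_step_cost(n_bodies):
    return n_bodies * (n_bodies - 1)

def pairwise_forces(positions):
    """1-D toy: 'force' on body i is the sum of signed inverse-square
    pulls from every other body (G and all masses set to 1)."""
    forces = []
    evals = 0
    for i, xi in enumerate(positions):
        f = 0.0
        for j, xj in enumerate(positions):
            if i != j:
                d = xj - xi
                f += (1.0 if d > 0 else -1.0) / d**2
                evals += 1  # count each pair evaluation
        forces.append(f)
    return forces, evals

forces, evals = pairwise_forces([0.0, 1.0, 3.0])
```

With 10^11 stars the inner double loop is what makes a single iteration take 10^9 years; tree-based N log N methods avoid evaluating every pair.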

Page 11

Types of parallelism - two extremes

• Data parallel

• Each processor performs the same task on different data

• Example - grid problems

• Bag of Tasks or Embarrassingly Parallel is a special case

• Task parallel

• Each processor performs a different task

• Example - signal processing such as encoding multitrack data

• Pipeline is a special case
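The data-parallel extreme is easy to sketch in Python. This is an illustrative toy, not HPC practice: a thread-backed pool (multiprocessing.dummy) stands in for real processors, and squaring stands in for the per-element task.

```python
# Data parallel: every worker runs the SAME function on different data,
# a "bag of tasks" needing no coordination between workers.
# (Thread-backed Pool keeps the sketch portable; real HPC codes would
# use MPI ranks or process-backed pools.)
from multiprocessing.dummy import Pool

def work(cell):
    return cell * cell          # same task, different data

data = list(range(8))
with Pool(4) as pool:           # four workers share the bag of tasks
    results = pool.map(work, data)
```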

Page 12

Simple data parallel program

Starting partial differential equation:

Finite Difference Approximation:

[Figure: a 2-D x-y grid divided among processing elements PE #0 through PE #7]

• Example: integrate a 2-D propagation problem
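A sketch of the idea in one dimension (the slide's example is 2-D, but the pattern is the same): each PE updates only its own slice of the grid, and every interior update needs only nearest-neighbor values, so only edge ("ghost") points must be exchanged. The update rule here is a generic explicit diffusion stencil, chosen for illustration.

```python
def fd_update(u, dt=0.1):
    """Serial reference: u_new[i] = u[i] + dt*(u[i-1] - 2*u[i] + u[i+1]).
    Boundary points are held fixed."""
    return [u[0]] + [u[i] + dt * (u[i-1] - 2*u[i] + u[i+1])
                     for i in range(1, len(u) - 1)] + [u[-1]]

def fd_update_parallel(u, n_pe, dt=0.1):
    """Same update, computed slice-by-slice as n_pe PEs would do it.
    Each slice only reads one point past its own edges."""
    n = len(u)
    chunk = n // n_pe
    out = [u[0]]
    for pe in range(n_pe):
        lo = pe * chunk
        hi = n if pe == n_pe - 1 else lo + chunk
        for i in range(max(lo, 1), min(hi, n - 1)):
            out.append(u[i] + dt * (u[i-1] - 2*u[i] + u[i+1]))
    out.append(u[-1])
    return out

u0 = [0.0]*4 + [1.0] + [0.0]*3   # 8-point grid with a spike
u1 = fd_update_parallel(u0, n_pe=4)
```

The parallel result matches the serial one exactly; the only communication a real code needs is the one ghost value at each slice boundary.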

Page 13

Typical Task Parallel Application

[Figure: pipeline of tasks: DATA -> Normalize Task -> FFT Task -> Multiply Task -> Inverse FFT Task]

• Signal processing

• Use one processor for each task

• Can use more processors if one is overloaded

• This is a pipeline
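The pipeline above can be sketched as chained stages. The transform stages here are stand-ins (simple reversals), not a real FFT; the point is the structure, with each block of data flowing through every stage.

```python
# Four-stage pipeline sketch. In a real task-parallel code each stage
# would run on its own processor with queues between stages; here the
# stages simply chain, one block at a time.
def normalize(block):
    peak = max(abs(x) for x in block)
    return [x / peak for x in block]

def transform(block):            # stand-in for the FFT stage
    return list(reversed(block))

def multiply(block, gain=2.0):
    return [gain * x for x in block]

def inverse_transform(block):    # stand-in for the inverse FFT
    return list(reversed(block))

def pipeline(blocks):
    for b in blocks:             # each block flows through every stage
        yield inverse_transform(multiply(transform(normalize(b))))

out = list(pipeline([[1.0, 2.0, 4.0], [-5.0, 5.0, 10.0]]))
```

With one processor per stage, block k can be in the multiply stage while block k+1 is being transformed, which is exactly what makes a pipeline parallel.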

Page 14

Parallel Program Structure

[Figure: Begin -> start parallel -> each of N processors does its share of the work (work 1a..1d through work (N)a..(N)d) -> End Parallel -> Communicate & Repeat -> End]

Page 15

Parallel Problems

[Figure: the same program structure, now with a serial section in the middle and subtasks of unequal length]

• Serial section (no parallel work)

• Subtasks don't finish together

• Not using all processors

Page 16

A “Real” example


#!/usr/bin/env python
from sys import argv
from os.path import isfile
from time import sleep
from math import sin, cos
#
fname = "message"
my_id = int(argv[1])
print my_id, "starting program"
#
if (my_id == 1):
    sleep(2)
    myval = cos(10.0)
    mf = open(fname, "w")
    mf.write(str(myval))
    mf.close()
else:
    myval = sin(10.0)
    notready = True
    while notready:
        if isfile(fname):
            notready = False
            sleep(3)
            mf = open(fname, "r")
            message = float(mf.readline())
            mf.close()
            total = myval**2 + message**2
        else:
            sleep(5)

print my_id, "done with program"

Page 17

Theoretical upper limits• All parallel programs contain:

• Parallel sections

• Serial sections

• Serial sections are when work is being duplicated or no useful work is being done (e.g., waiting for others)

• Serial sections limit the parallel effectiveness

• If you have a lot of serial computation then you will not get good speedup

• No serial work “allows” perfect speedup

• Amdahl's Law states this formally

Page 18

Amdahl’s Law

t_p = (f_p/N + f_s) t_s

S = t_s / t_p = 1 / (f_p/N + f_s)

• Amdahl’s Law places a strict limit on the speedup that can be realized by using multiple processors.

• Effect of multiple processors on run time

• Effect of multiple processors on speed up

• Where

• f_s = serial fraction of code

• f_p = parallel fraction of code

• N = number of processors

• Perfect speedup: t_p = t_s/N, or S(N) = N
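Amdahl's Law as a one-line function, using the slide's notation (f_p the parallel fraction, N the processor count). With f_p = 0.99, 100 processors give only about 50x:

```python
# Amdahl's Law: speedup with N processors when a fraction
# f_s = 1 - f_p of the code stays serial.
def amdahl_speedup(f_p, n):
    f_s = 1.0 - f_p
    return 1.0 / (f_p / n + f_s)

# Even 1% serial content caps speedup well below the processor count:
s_100 = amdahl_speedup(0.99, 100)   # about 50, not 100
s_limit = 1.0 / (1.0 - 0.99)        # speedup can never exceed 1/f_s = 100
```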

Page 19

Illustration of Amdahl's Law

It takes only a small fraction of serial content in a code to degrade the parallel performance.

Page 20

Amdahl's Law Vs. Reality

• Amdahl's Law provides a theoretical upper limit on parallel speedup, assuming that there are no costs for communications.

• In reality, communications will result in a further degradation of performance.

[Figure: speedup vs. number of processors (0-250) for f_p = 0.99; the measured "reality" curve falls well below the Amdahl's Law curve]

Page 21

Sometimes you don’t get what you expect!

Page 22

Some other considerations

• Writing effective parallel applications is difficult

• Communication can limit parallel efficiency

• Serial time can dominate

• Load balance is important

• Is it worth your time to rewrite your application?

• Do the CPU requirements justify parallelization?

• Will the code be used just once?

Page 23

Parallelism Carries a Price Tag

• Parallel programming

• Involves a steep learning curve

• Is effort-intensive

• Parallel computing environments are unstable and unpredictable

• Don’t respond to many serial debugging and tuning techniques

• May not yield the results you want, even if you invest a lot of time

Will the investment of your time be worth it?

Page 24

Terms related to algorithms

• Amdahl’s Law (talked about this already)

• Superlinear Speedup

• Efficiency

• Cost

• Scalability

• Problem Size

• Gustafson’s Law

Page 25

Superlinear Speedup

S(n) > n may be seen on occasion, but usually this is due to using a suboptimal sequential algorithm or some unique feature of the architecture that favors the parallel formation.

One common reason for superlinear speedup is the extra cache in the multiprocessor system, which can hold more of the problem data at any instant; this leads to less relatively slow memory traffic.

Page 26

Efficiency

Efficiency = execution time using one processor / (execution time using N processors x N)

It's just the speedup divided by the number of processors.

Page 27

Cost

The processor-time product, or cost (or work), of a computation is defined as

Cost = (execution time) x (total number of processors used)

The cost of a sequential computation is simply its execution time, t_s. The cost of a parallel computation is t_p x n. The parallel execution time, t_p, is given by t_s/S(n).

Hence, the cost of a parallel computation is given by

Cost = t_s x n / S(n)

Cost-Optimal Parallel Algorithm: one in which the cost to solve a problem on a multiprocessor is proportional to the cost on a single processor system.
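These definitions translate directly into code. The sample timings below (100 s serial, 25 s on 8 processors) are made up for illustration:

```python
# Speedup, efficiency, and cost in the slides' notation:
# S = t_s / t_p, E = S / n, cost = t_p * n.
def speedup(t_s, t_p):
    return t_s / t_p

def efficiency(t_s, t_p, n):
    return speedup(t_s, t_p) / n

def cost(t_p, n):
    return t_p * n

# A 100 s serial job that takes 25 s on 8 processors:
t_s, t_p, n = 100.0, 25.0, 8
S = speedup(t_s, t_p)        # 4x speedup
E = efficiency(t_s, t_p, n)  # 50% efficiency
C = cost(t_p, n)             # 200 processor-seconds, twice t_s
```

Note that C equals t_s * n / S, matching the formula above: paying for 8 processors but getting only 4x speedup doubles the cost of the computation.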

Page 28

Scalability

Used to indicate a hardware design that allows the system to be increased in size and, in doing so, to obtain increased performance - could be described as architecture or hardware scalability.

Scalability is also used to indicate that a parallel algorithm can accommodate increased data items with a low and bounded increase in computational steps - could be described as algorithmic scalability.

Page 29

Problem size

•Intuitively, we would think of the number of data elements being processed in the algorithm as a measure of size.

• However, doubling the data set size would not necessarily double the number of computational steps. It will depend upon the problem.

•For example, adding two matrices has this effect, but multiplying matrices quadruples operations.

Problem size: the number of basic steps in the best sequential algorithm for a given problem and data set size

Note: Bad sequential algorithms tend to scale well

Page 30

Other names for Scaling

• Strong Scaling (Engineering)

• For a fixed total problem size, how does the time to solution vary with the number of processors?

• Weak Scaling

• How does the time to solution vary with processor count when the problem size per processor is fixed?

Page 31

Some Classes of machines

[Figure: four processor-memory pairs connected by a network]

Distributed Memory: processors only have access to their local memory and "talk" to other processors over a network.

Page 32

Some Classes of machines

[Figure: eight processors sharing a single memory]

Uniform Shared Memory (UMA): all processors have equal access to memory and can "talk" via memory.

Page 33

Some Classes of machines - Hybrid

Shared memory nodes connected by a network

[Figure: several shared-memory nodes linked by a network]

Page 34

Some Classes of machines - More common today

Each node has a collection of multicore chips

[Figure: several multicore nodes linked by a network]

Ra has 268 nodes: 256 dual-socket quad-core and 12 quad-socket dual-core.

Page 35

Some Classes of machines - Hybrid Machines

• Add special purpose processors to normal processors

• Not a new concept, but regaining traction

• Example: our Tesla Nvidia node, cuda1

[Figure: a "normal" CPU paired with a special purpose processor: FPGA, GPU, vector, Cell...]

Page 36

Network Topology

• For ultimate performance you may be concerned with how your nodes are connected.

• Avoid communications between distant nodes.

• For some machines it might be difficult to control or know the placement of applications.

Page 37

Network Terminology

• Latency

• How long to get between nodes in the network.

• Bandwidth

• How much data can be moved per unit time.

• Bandwidth is limited by the number of wires, the rate at which each wire can accept data, and choke points
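A common first-order model ties the two terms together: transfer time = latency + size/bandwidth. The numbers below are hypothetical, chosen only to show the latency-dominated and bandwidth-dominated regimes:

```python
# First-order network model: a message of n bytes costs a fixed
# latency plus the time to push the bytes through the pipe.
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    return latency_s + n_bytes / bandwidth_bytes_per_s

# Hypothetical link: 1 microsecond latency, 2 GB/s bandwidth.
lat, bw = 1e-6, 2e9
t_small = message_time(8, lat, bw)          # latency dominated
t_large = message_time(8_000_000, lat, bw)  # bandwidth dominated
```

The practical consequence: many small messages are paid for almost entirely in latency, so parallel codes try to batch communication into fewer, larger messages.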

Page 38

Ring

Page 39

Grid

Wrapping produces torus

Page 40

Tree / Fat tree

In a fat tree the lines get wider as you go up.

Page 41

Hypercube

[Figure: a 3 dimensional hypercube with nodes labeled 000, 001, 010, 011, 100, 101, 110, 111]

Page 42

4D Hypercube

Some communications algorithms are hypercube based. How big would a 7-d hypercube be?

[Figure: a 4-dimensional hypercube with nodes labeled 0000 through 1111]
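The labels hint at the construction: a d-dimensional hypercube has 2^d nodes, and two nodes are neighbors exactly when their binary labels differ in one bit. A short sketch, which also answers the slide's question (a 7-d hypercube has 2^7 = 128 nodes):

```python
# d-dimensional hypercube: 2**d nodes, each with d neighbors
# reached by flipping one bit of its label.
def hypercube_nodes(d):
    return 2 ** d

def neighbors(label, d):
    return sorted(label ^ (1 << bit) for bit in range(d))

n7 = hypercube_nodes(7)            # size of a 7-d hypercube
nbrs_of_0 = neighbors(0b0000, 4)   # node 0000 in the 4-D cube
```

The one-bit-flip structure is why hypercube algorithms (broadcasts, reductions) finish in d = log2(N) communication steps.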

Page 43

Star

[Figure: a star network, with a "?" marking the central node]

Quality depends on what is in the center.

Page 44

Example: An Infiniband Switch

• InfiniBand, DDR, Cisco 7024 IB Server Switch - 48 port

• Adaptors: each compute node has one DDR 1-port HCA

• 4X DDR => 16 Gbit/sec

• 140 nanosecond hardware latency

• 1.26 microsecond latency at the software level

Page 45

Measured Bandwidth

[Figure: measured bandwidth chart]