(Short) Introduction to Parallel Computing CS 6560: Operating Systems Design.

Post on 20-Jan-2016



Why Parallel Computing? Performance!

Many applications require serious performance. Examples:

Structural biology

Chemical dynamics

Pharmaceutical design

Weather forecasting

Human genome

Ocean modeling


Processor Performance: Need Parallelism!

2-3 GHz


Case Study 1: Simulating Ocean Currents

Model as two-dimensional grids

Discretize in space and time

finer spatial and temporal resolution => greater accuracy

Many different computations per time step

set up and solve equations

Concurrency across and within grid computations

[Figure: (a) Cross sections; (b) Spatial discretization of a cross section]
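As a minimal sketch of one grid computation, assume a simple Jacobi-style averaging update; the slides do not say which equations or solver the ocean model actually uses. Each interior point is recomputed from its four neighbors in the previous grid. The name `jacobi_step` and the fixed-boundary handling are illustrative assumptions. Because every new value depends only on the old grid, all points within one sweep could be updated concurrently.

```python
def jacobi_step(grid):
    """Return a new grid where each interior point is the average of its
    four neighbors in the old grid; boundary values are held fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so boundary values carry over
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


# A 4x4 grid with a "hot" top edge; the interior starts at zero.
grid = [[1.0, 1.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
after_one = jacobi_step(grid)  # one time step
```

Finer grids mean more points per sweep and usually more sweeps, which is where the demand for performance comes from.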


Case Study 2: Simulating Galaxy Evolution

Simulate interactions of many stars evolving over time

Computing forces is expensive

$O(n^2)$ brute force approach

Hierarchical methods take advantage of force law: $\frac{G m_1 m_2}{r^2}$

[Figure labels: Star on which forces are being computed; Star too close to approximate; Small group far enough away to approximate to center of mass; Large group far enough away to approximate]
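The brute force approach can be sketched as a double loop over all ordered pairs of stars, which is where the $O(n^2)$ cost comes from. This is an illustrative sketch only: it assumes two dimensions, distinct star positions (coincident stars would divide by zero), and SI units; the name `brute_force_forces` is not from the slides.

```python
import math

G = 6.674e-11  # gravitational constant in SI units


def brute_force_forces(positions, masses):
    """Return the net 2D force on each body by visiting every ordered pair."""
    n = len(positions)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            r = math.hypot(dx, dy)
            magnitude = G * masses[i] * masses[j] / (r * r)
            # The force on body i points from i toward j.
            forces[i][0] += magnitude * dx / r
            forces[i][1] += magnitude * dy / r
    return forces


# Two unit masses one meter apart attract each other with equal, opposite force.
forces = brute_force_forces([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])
```

The outer loop iterations are independent of one another, so the forces on different stars can be computed concurrently.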


Case Study 2: Barnes-Hut

Many time steps, plenty of concurrency across stars

Locality Goal

Particles close together in space should be on same processor

Difficulties: Non-uniform, dynamically changing

[Figure: Spatial Domain; Quad-tree]
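A quad-tree over the spatial domain can be sketched as follows. This is a simplified illustration, not the Barnes-Hut implementation behind the slides: the class name `QuadNode`, the one-particle-per-leaf rule, and the assumption that all particles have distinct positions inside the root square are choices made here. Each cell tracks the total mass and center of mass of the particles it contains, which is what a far-away star would use as the approximation.

```python
class QuadNode:
    """Square cell covering [x, x + size) by [y, y + size)."""

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.mass = 0.0          # total mass of particles in this cell
        self.cx = self.cy = 0.0  # center of mass of those particles
        self.particle = None     # (px, py, m) if this is an occupied leaf
        self.children = None     # four sub-cells once the cell is split

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[2 * row + col]

    def insert(self, px, py, m):
        # Update the running total mass and center of mass for this cell.
        total = self.mass + m
        self.cx = (self.cx * self.mass + px * m) / total
        self.cy = (self.cy * self.mass + py * m) / total
        self.mass = total

        if self.children is None and self.particle is None:
            self.particle = (px, py, m)  # empty leaf: store the particle
            return
        if self.children is None:
            # Occupied leaf: split into four and push the old particle down.
            half = self.size / 2
            self.children = [QuadNode(self.x + dx * half, self.y + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            old, self.particle = self.particle, None
            self._child_for(old[0], old[1]).insert(*old)
        self._child_for(px, py).insert(px, py, m)


root = QuadNode(0.0, 0.0, 1.0)
for p in [(0.1, 0.1, 1.0), (0.9, 0.9, 1.0), (0.8, 0.9, 2.0)]:
    root.insert(*p)
```

The two particles that sit close together force the tree to subdivide several levels in their corner, while the sparse corner stays a single leaf. This is the non-uniformity that makes balanced, locality-preserving partitioning difficult.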


Case Study 3: Rendering by Ray Tracing

Goal is to produce image from representation of real world

Shoot rays into scene through pixels in projection plane

Result is color for pixel

Rays shot through pixels in projection plane are called primary rays

Reflect and refract when they hit objects

Recursive process generates ray tree per primary ray

Tradeoffs between execution time and image quality

[Figure labels: Viewpoint; Projection Plane; 3D Scene; Ray from viewpoint to upper right corner pixel; Dynamically generated ray]
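The recursive ray tree is what drives the execution time side of the tradeoff. As a rough worst-case sketch, assume every ray hits an object and spawns exactly one reflected and one refracted ray, and that recursion is cut off after a fixed number of bounces; the function name `rays_in_tree` and the cutoff parameter are illustrative assumptions.

```python
def rays_in_tree(max_depth):
    """Count the rays in one ray tree when every hit spawns one reflected
    and one refracted ray and recursion stops after max_depth bounces."""
    if max_depth == 0:
        return 1  # only the primary ray
    return 1 + 2 * rays_in_tree(max_depth - 1)


# Worst-case rays per pixel for a few recursion cutoffs.
counts = {depth: rays_in_tree(depth) for depth in range(5)}
```

Under these assumptions the work per pixel roughly doubles with each extra level of recursion, while real scenes vary widely from pixel to pixel because many rays miss everything.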


Partitioning

Need dynamic assignment

Use contiguous blocks to exploit spatial coherence among neighboring rays, plus tiles for task stealing

[Figure labels: A block, the unit of assignment; A tile, the unit of decomposition and stealing]
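The block and tile scheme can be sketched with one work queue per worker. This is a single-threaded illustration under stated assumptions: a block here is a contiguous run of tiles in row-major order (a simplification of the 2D blocks in the figure), a thief picks the fullest queue as its victim, and the function names are hypothetical. A real implementation would need locking or lock-free queues.

```python
from collections import deque


def make_tiles(width, height, tile):
    """List tile origins (x, y) in row-major order."""
    return [(x, y) for y in range(0, height, tile)
            for x in range(0, width, tile)]


def assign_blocks(tiles, workers):
    """Give each worker one contiguous run of tiles (its block) as a queue."""
    per = -(-len(tiles) // workers)  # ceiling division
    return [deque(tiles[w * per:(w + 1) * per]) for w in range(workers)]


def next_tile(queues, me):
    """Take from the worker's own queue, else steal from the fullest queue."""
    if queues[me]:
        return queues[me].popleft()
    victim = max(range(len(queues)), key=lambda w: len(queues[w]))
    if queues[victim]:
        return queues[victim].pop()  # steal from the far end of the victim
    return None  # no work left anywhere
```

A worker keeps calling `next_tile` until it returns `None`. Working through its own block front to back preserves coherence among neighboring rays, and stealing from the far end of another block keeps the thief away from the tiles its victim will reach next.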


Sample Speedups

Speedups on NUMA multiprocessor

Speedup = (best) time on 1 processor / time on multiple processors


Ideal/Linear Speedup? Amdahl’s Law

If a fraction $s$ of a computation is not parallelizable, then the best achievable speedup is

$$S = \frac{1}{s}$$

With $N$ processors, the speedup is

$$S = \frac{1}{s + (1 - s)/N}$$

[Figure: Speedup for computations with fraction s of sequential work. Speedup (0 to 100) versus number of processors (1 to 100), with one curve for each s in 0, 0.01, 0.025, 0.05, 0.1, 0.2]
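Amdahl's law for $N$ processors, $S = 1 / (s + (1 - s)/N)$, is easy to evaluate directly. The sketch below uses a hypothetical helper named `amdahl_speedup` and compares the speedup on 100 processors with the limit $1/s$ for a few sequential fractions.

```python
def amdahl_speedup(s, n):
    """Best-case speedup on n processors when a fraction s is sequential."""
    return 1.0 / (s + (1.0 - s) / n)


for s in (0.01, 0.05, 0.2):
    on_100 = amdahl_speedup(s, 100)
    limit = 1.0 / s  # speedup as the processor count grows without bound
    print(f"s={s}: speedup on 100 processors {on_100:.2f}, limit {limit:.0f}")
```

With only 5% sequential work, 100 processors give a speedup of about 16.8, against a limit of 20 no matter how many processors are added.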


Pictorial Depiction of Amdahl’s Law

[Figure: execution time on 1 processor and on p processors, divided into parallelizable work and sequential work]


But Goal is not just Performance

At some point, we’re willing to trade some performance for:

Ease of programming

High portability

Low cost

Ease of programming & high portability

Parallel programming for the masses

Leverage new or faster hardware as soon as possible

Low cost

High-end parallel machines are expensive resources


Parallel Applications

Scientific computing not the only class of parallel applications

Examples of non-scientific parallel applications:

Data mining

Real-time rendering

Distributed servers

Today, programmers are encouraged to find parallelism in all sorts of software