Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu http://mleg.cse.sc.edu/edu /csce569/ CSCE569 Parallel Computing University of South Carolina Department of Computer Science and Engineering
Transcript
Page 1: Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu  CSCE569 Parallel Computing University of South Carolina Department of.

Lecture 4, TTH 03:30AM-04:45PM

Dr. Jianjun Hu
http://mleg.cse.sc.edu/edu/csce569/

CSCE569 Parallel Computing

University of South Carolina
Department of Computer Science and Engineering

Page 2:

Outline: Parallel Programming Model and Algorithm Design

Task/channel model
Algorithm design methodology
Case studies

Page 3:

Task/Channel Model

Parallel computation = set of tasks

Task:
  Program
  Local memory
  Collection of I/O ports

Tasks interact by sending messages through channels

Page 4:

Task/Channel Model

[Figure: tasks connected by channels]

Page 5:

Foster’s Design Methodology

Partitioning
Communication
Agglomeration
Mapping

Page 6:

Foster’s Methodology

[Figure: Problem → Partitioning → Communication → Agglomeration → Mapping]

Page 7:

Partitioning

Dividing computation and data into pieces

Domain decomposition
  Divide data into pieces
  Determine how to associate computations with the data

Functional decomposition
  Divide computation into pieces
  Determine how to associate data with the computations
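As an illustration of domain decomposition (not from the original slides), here is a minimal Python sketch of the standard block distribution of n data items among p tasks; the function name and the 10-items/4-tasks example are illustrative assumptions:

```python
def block_range(task_id, p, n):
    """Half-open index range [lo, hi) of the block owned by task_id
    when n items are divided among p tasks as evenly as possible."""
    lo = task_id * n // p
    hi = (task_id + 1) * n // p
    return lo, hi

# Example: 10 items among 4 tasks -> block sizes 2, 3, 2, 3
sizes = [block_range(t, 4, 10)[1] - block_range(t, 4, 10)[0] for t in range(4)]
print(sizes)  # [2, 3, 2, 3]
```

Block sizes differ by at most one, which keeps the primitive tasks roughly the same size, as the partitioning checklist below asks.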

Page 8:

Example Domain Decompositions

Page 9:

Example Functional Decomposition

Page 10:

Partitioning Checklist

At least 10x more primitive tasks than processors in target computer
Minimize redundant computations and redundant data storage
Primitive tasks roughly the same size
Number of tasks an increasing function of problem size

Page 11:

Communication

Determine values passed among tasks

Local communication
  Task needs values from a small number of other tasks
  Create channels illustrating data flow

Global communication
  Significant number of tasks contribute data to perform a computation
  Don’t create channels for them early in design

Page 12:

Communication Checklist

Communication operations balanced among tasks
Each task communicates with only a small group of neighbors
Tasks can perform communications concurrently
Tasks can perform computations concurrently

Page 13:

Agglomeration

Grouping tasks into larger tasks

Goals:
  Improve performance
  Maintain scalability of program
  Simplify programming

In MPI programming, the goal is often to create one agglomerated task per processor

Page 14:

Agglomeration Can Improve Performance

Eliminate communication between primitive tasks agglomerated into a consolidated task
Combine groups of sending and receiving tasks

Page 15:

Agglomeration Checklist

Locality of parallel algorithm has increased
Replicated computations take less time than the communications they replace
Data replication doesn’t affect scalability
Agglomerated tasks have similar computational and communication costs
Number of tasks increases with problem size
Number of tasks suitable for likely target systems
Tradeoff between agglomeration and code modification costs is reasonable

Page 16:

Mapping

Process of assigning tasks to processors

Centralized multiprocessor: mapping done by operating system
Distributed memory system: mapping done by user

Conflicting goals of mapping:
  Maximize processor utilization
  Minimize interprocessor communication

Page 17:

Mapping Example

Page 18:

Optimal Mapping

Finding an optimal mapping is NP-hard
Must rely on heuristics

Page 19:

Mapping Decision Tree

Static number of tasks
  Structured communication
    Constant computation time per task
      Agglomerate tasks to minimize communication
      Create one task per processor
    Variable computation time per task
      Cyclically map tasks to processors
  Unstructured communication
    Use a static load balancing algorithm
Dynamic number of tasks

Page 20:

Mapping Strategy

Static number of tasks
Dynamic number of tasks
  Frequent communications between tasks
    Use a dynamic load balancing algorithm
  Many short-lived tasks
    Use a run-time task-scheduling algorithm

Page 21:

Mapping Checklist

Considered designs based on one task per processor and multiple tasks per processor
Evaluated static and dynamic task allocation
If dynamic task allocation chosen, task allocator is not a bottleneck to performance
If static task allocation chosen, ratio of tasks to processors is at least 10:1

Page 22:

Case Studies

Boundary value problem
Finding the maximum
The n-body problem
Adding data input

Page 23:

Boundary Value Problem

[Figure: a rod, surrounded by insulation, with ice water at both ends]

Page 24:

Rod Cools as Time Progresses

Page 25:

Finite Difference Approximation

Page 26:

Partitioning

One data item per grid point
Associate one primitive task with each grid point
Two-dimensional domain decomposition

Page 27:

Communication

Identify communication pattern between primitive tasks
Each interior primitive task has three incoming and three outgoing channels
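To make the finite-difference scheme concrete, here is a sequential Python sketch of the rod-cooling update (not MPI code; the function name and the constant k = 0.25 are illustrative assumptions). Each interior point's new value depends on itself and its two neighbors, which is exactly the communication pattern described above:

```python
def step(u, k=0.25):
    """One finite-difference time step for the 1-D heat equation.
    Boundary values u[0] and u[-1] are held fixed (ice water at the ends)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + k * (u[i-1] - 2*u[i] + u[i+1])
    return new

u = [0.0] + [100.0] * 8 + [0.0]   # hot rod, cold ends
for _ in range(50):
    u = step(u)
print(max(u) < 100.0)  # True: the rod cools as time progresses
```

In the parallel version each task would own a block of the rod and exchange only the boundary values with its two neighbors each iteration.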

Page 28:

Agglomeration and Mapping

[Figure: agglomeration of primitive tasks into one task per processor]

Page 29:

Sequential Execution Time

χ – time to update an element
n – number of elements
m – number of iterations
Sequential execution time: m(n−1)χ

Page 30:

Parallel Execution Time

p – number of processors
λ – message latency
Parallel execution time: m(χ⌈(n−1)/p⌉ + 2λ)
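As a worked example (the numeric values are illustrative assumptions, not from the slides), plugging numbers into the two formulas above:

```python
from math import ceil

def sequential_time(m, n, chi):
    # m iterations, each updating n-1 interior elements at chi time units each
    return m * (n - 1) * chi

def parallel_time(m, n, p, chi, lam):
    # each iteration: update a block of ceil((n-1)/p) elements,
    # then exchange boundary values with both neighbors (2 messages)
    return m * (chi * ceil((n - 1) / p) + 2 * lam)

# Illustrative values: 100 iterations, 1001 grid points, 8 processors,
# chi = 1 time unit per update, lambda = 10 time units per message
m, n, p, chi, lam = 100, 1001, 8, 1.0, 10.0
print(sequential_time(m, n, chi))        # 100000.0
print(parallel_time(m, n, p, chi, lam))  # 14500.0
```

Even with a fairly large latency, the computation term dominates here, so the speedup (about 6.9 on 8 processors) is close to linear.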

Page 31:

Finding the Maximum Error

Computed:   0.15   0.16   0.16   0.19
Correct:    0.15   0.16   0.17   0.18
Error (%):  0.00%  0.00%  6.25%  5.26%

Maximum error: 6.25%

Page 32:

Reduction

Given associative operator ⊕, compute a0 ⊕ a1 ⊕ a2 ⊕ … ⊕ an−1

Examples:
  Add
  Multiply
  And, Or
  Maximum, Minimum

Page 33:

Parallel Reduction Evolution

Page 34:

Parallel Reduction Evolution

Page 35:

Parallel Reduction Evolution

Page 36:

Binomial Trees

Subgraph of hypercube

Page 37:

Finding Global Sum

4 2 0 7

-3 5 -6 -3

8 1 2 3

-4 4 6 -1

Page 38:

Finding Global Sum

1 7 -6 4

4 5 8 2

Page 39:

Finding Global Sum

8 -2

9 10

Page 40:

Finding Global Sum

17 8

Page 41:

Finding Global Sum

25

Binomial Tree
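The global-sum steps shown above can be simulated in Python: in round i, every still-active task whose rank has bit i set sends its partial sum across that hypercube dimension and drops out (the function name is an assumption; this is a sequential sketch, not MPI code):

```python
def hypercube_sum(vals):
    """Simulate the binomial-tree global sum: in round i, each active task
    whose rank has bit i set sends its partial sum to rank XOR 2^i and
    drops out; after log2(p) rounds task 0 holds the total."""
    vals = list(vals)
    p = len(vals)                    # assumed to be a power of two
    active = set(range(p))
    step = 1
    while step < p:
        for rank in sorted(active):
            if rank & step:          # sender in this round
                vals[rank ^ step] += vals[rank]
                active.discard(rank)
        step *= 2
    return vals[0]

# The 16 values from the global-sum figures above
grid = [4, 2, 0, 7, -3, 5, -6, -3, 8, 1, 2, 3, -4, 4, 6, -1]
print(hypercube_sum(grid))  # 25
```

Sixteen values are combined in four rounds (log2 16), matching the binomial-tree depth in the figures.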

Page 42:

Agglomeration

Page 43:

Agglomeration

[Figure: each agglomerated task computes a local sum; the partial sums are then combined]

Page 44:

The n-body Problem

Page 45:

The n-body Problem

Page 46:

Partitioning

Domain partitioning
Assume one task per particle
Task has particle’s position and velocity vector

Iteration:
  Get positions of all other particles
  Compute new position and velocity

Page 47:

Gather

Page 48:

All-gather

Page 49:

Complete Graph for All-gather

Page 50:

Hypercube for All-gather

Page 51:

Communication Time

(λ – message latency, β – channel bandwidth in data items per unit time)

Hypercube:
  Σ_{i=1}^{log p} (λ + 2^(i−1) n / (βp)) = λ ⌈log p⌉ + n(p−1)/(βp)

Complete graph:
  (p−1)(λ + (n/p)/β) = (p−1)λ + n(p−1)/(βp)
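The log p round count in the hypercube formula can be checked with a small Python simulation (not MPI code; the function name is an assumption): in round i each task exchanges everything it holds with its partner across dimension i, so its data doubles every round:

```python
def hypercube_allgather(items):
    """Simulate all-gather on a hypercube of p = len(items) tasks
    (p assumed a power of two): in round i each task swaps its whole
    buffer with the partner rank XOR 2^i, doubling what it holds."""
    p = len(items)
    have = [[x] for x in items]          # each task starts with its own item
    rounds = 0
    step = 1
    while step < p:
        new = [list(h) for h in have]
        for rank in range(p):
            new[rank] += have[rank ^ step]   # exchange with partner
        have = new
        step *= 2
        rounds += 1
    return have, rounds

have, rounds = hypercube_allgather(list("abcdefgh"))
print(rounds)                                  # 3 == log2(8)
print(sorted(have[0]) == sorted("abcdefgh"))   # True: every task has all items
```

Three rounds for eight tasks matches the λ log p latency term; the doubling message sizes are what sum to the n(p−1)/(βp) transfer term.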

Page 52:

Adding Data Input

Page 53:

Scatter

Page 54:

Scatter in log p Steps

[Figure: root holds blocks 12345678; after step 1 the halves 1234 and 5678 sit on two tasks; further steps split them into 12, 34, 56, 78]
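The halving pattern in the figure can be simulated in Python (a sketch, not MPI code; the function name is an assumption). In each round, every task holding more than one block passes the upper half of its blocks to a partner, so the scatter completes in log2(p) rounds:

```python
def scatter_log_steps(data, p):
    """Simulate scatter in log2(p) steps (p a power of two): the root holds
    all p blocks; each round every holder sends the upper half of its
    blocks to the partner rank + step, keeping the lower half."""
    have = {0: list(data)}        # task 0 (root) holds all blocks
    step = p // 2
    while step >= 1:
        for rank in sorted(have):
            block = have[rank]
            if len(block) > 1:
                half = len(block) // 2
                have[rank + step] = block[half:]   # send upper half
                have[rank] = block[:half]          # keep lower half
        step //= 2
    return have

have = scatter_log_steps([1, 2, 3, 4, 5, 6, 7, 8], 8)
print([have[r][0] for r in range(8)])  # [1, 2, 3, 4, 5, 6, 7, 8]
```

After three rounds each of the eight tasks holds exactly its own block, matching the 12345678 → 1234/5678 → 12/34/56/78 sequence in the figure.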

Page 55:

Summary: Task/Channel Model

Parallel computation:
  Set of tasks
  Interactions through channels

Good designs:
  Maximize local computations
  Minimize communications
  Scale up

Page 56:

Summary: Design Steps

Partition computation
Agglomerate tasks
Map tasks to processors

Goals:
  Maximize processor utilization
  Minimize inter-processor communication

Page 57:

Summary: Fundamental Algorithms

Reduction
Gather and scatter
All-gather

