8/11/2019 Design Parallel Programs
1/29
Tran, Van Hoai
Designing Parallel Programs
Dr. Tran, Van Hoai
Department of Systems & Networking
Faculty of Computer Science and Engineering
HCMC University of Technology
E-mail:[email protected]
2009-2010
Parallel Computing 2009-2010
Issues
Considerations:
Parallel machine architectures
Decomposition strategies
Programming models
Performance aspects: scalability, load balance
Parallel debugging, analysis, tuning
I/O on parallel machines
It is not easy to suggest a methodical approach.
Steps in Designing (I. Foster)
Partitioning (phân hoạch): decomposing the problem into small tasks which can be performed in parallel
Communication (liên lạc): determining communication structures and algorithms to coordinate tasks
Agglomeration (kết tụ): combining the tasks into larger ones, considering performance requirements and implementation costs
Mapping (ánh xạ): assigning tasks to processors to maximize processor utilization and to minimize communication costs
[Diagram: Problem → Partition → Communicate → Agglomerate → Map]
Other practical issues
Data distribution: input/output & intermediate data
Data access: managing access to shared data
Stage synchronization
Partition (Decomposition)
Tasks: programmer-defined units of computation
Tasks can be executed simultaneously
Once defined, tasks are indivisible units of computation
Fine-grained decomposition
Two dimensions of decomposition:
Domain decomposition: data associated with the problem
Functional decomposition: computation operating on the data
Avoiding replication
Domain Decomposition
Steps:
Dividing the data into equally-sized small tasks
Input/output & intermediate data; different partitions may be possible
Different decompositions may exist for different phases
Determining the operations of computation on each task
Task = (data, operations)
Functional Decomposition
Steps:
Dividing the computation into disjoint tasks
Examining data requirements of the tasks; avoiding data replication
[Figure: a climate model composed of atmospheric, ocean, hydrology, and land surface sub-models]
A search tree can be considered as a functional decomposition
Functional decomposition is a program structuring technique (modularity)
Decomposition Methods
Domain decomposition (data decomposition)
Functional decomposition (task decomposition)
Recursive decomposition
Exploratory decomposition
Speculative decomposition
Recursive Decomposition
Suitable for problems that can be solved using the divide-and-conquer paradigm
Each of the subproblems generated by the divide step becomes a task
Quick Sort
QUICKSORT(A, q, r)
if q < r then
    x := A[q]
    s := q
    for i := q+1 to r do
        if A[i] ≤ x then
            s := s + 1
            swap(A[s], A[i])
        end if
    end for
    swap(A[q], A[s])
    QUICKSORT(A, q, s)
    QUICKSORT(A, s+1, r)
end if
[Figure: partitioning steps on the array 3 2 1 5 8 4 3 7; at each step the pivot reaches its final position, and the array ends sorted as 1 2 3 3 4 5 7 8]
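The recursive decomposition above can be sketched in Python (an illustrative sketch, not part of the original slides). Each recursive call on a subarray is an independent task; the two subproblems produced by one partition step could run concurrently. Since the pivot at position s is already in its final place after the partition step, the sketch excludes it from the recursive tasks.

```python
def quicksort(a, q, r):
    """Sort a[q..r] in place (inclusive bounds), mirroring the slide's pseudocode."""
    if q < r:
        x = a[q]          # first element as pivot
        s = q
        for i in range(q + 1, r + 1):
            if a[i] <= x:
                s += 1
                a[s], a[i] = a[i], a[s]
        a[q], a[s] = a[s], a[q]   # pivot lands at its final position s
        # The two recursive calls are independent tasks in the
        # task-dependency graph and could execute in parallel.
        quicksort(a, q, s - 1)
        quicksort(a, s + 1, r)
```

Excluding the pivot from both subproblems also guarantees termination when the array contains duplicate keys.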
Quick Sort
[Figure: quicksort recursion tree — each partition step places the pivot in its final position and spawns two independent subproblems, until single elements remain]
Quicksort task-dependency graph based on recursive decomposition
Minimum Finding
Divide-and-conquer algorithms can also be used to solve problems that are traditionally solved by non-divide-and-conquer approaches.
FINDMIN(A, n)
min := A[0]
for i := 1 to n-1 do
    if A[i] < min then
        min := A[i]
    end if
end for
RECURSIVE_FINDMIN(A, n)
if n = 1 then
    min := A[0]
else
    lmin := RECURSIVE_FINDMIN(A, n/2)
    rmin := RECURSIVE_FINDMIN(&A[n/2], n - n/2)
    if lmin < rmin then
        min := lmin
    else
        min := rmin
    end if
end if
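The recursive variant can be sketched in Python (an illustrative sketch, not from the original slides). The two half-array subproblems are independent tasks that could be evaluated concurrently.

```python
def recursive_findmin(a):
    """Recursive decomposition of minimum finding, as in RECURSIVE_FINDMIN."""
    n = len(a)
    if n == 1:
        return a[0]
    lmin = recursive_findmin(a[: n // 2])   # task on the left half
    rmin = recursive_findmin(a[n // 2 :])   # task on the right half
    return lmin if lmin < rmin else rmin
```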
Domain Decomposition
Operating on large amounts of data
Often performed in two steps:
Partitioning the data
Inducing the computational partitioning from the data partitioning
Data to be partitioned: input/output/intermediate
Domain Decomposition
Dense matrix-vector multiplication
[Figure: row-wise decomposition of y = A·b into n tasks — task i owns row i of A and computes element i of y; a 3-D grid decomposition is shown as another example]
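The row-wise decomposition can be sketched in Python (an illustrative sketch; `matvec_tasks` and `task` are names introduced here, not from the slides). Task i owns row i of A and computes the single output element y[i]; given read access to b, the tasks are mutually independent.

```python
def matvec_tasks(A, b):
    """Row-wise domain decomposition of y = A·b, one task per row."""
    n = len(A)

    def task(i):
        # Task i: dot product of row i of A with b, producing y[i].
        return sum(A[i][j] * b[j] for j in range(len(b)))

    # The n tasks are independent and could run concurrently.
    return [task(i) for i in range(n)]
```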
Matrix-Matrix Multiplication
Partitioning the output data
[A11 A12; A21 A22] · [B11 B12; B21 B22] = [C11 C12; C21 C22]
Partitioning:
Task 1: C11=A11B11+A12B21
Task 2: C12=A11B12+A12B22
Task 3: C21=A21B11+A22B21
Task 4: C22=A21B12+A22B22
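The four output-block tasks can be sketched in Python (an illustrative sketch; `matmul`, `matadd`, and `block_matmul_tasks` are helper names introduced here, with each operand given as a 2×2 grid of blocks).

```python
def matmul(X, Y):
    """Multiply two dense matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Add two dense matrices of the same shape."""
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))]
            for i in range(len(X))]

def block_matmul_tasks(A, B):
    """A and B are 2x2 grids of blocks. Task (i,j) computes
    C_ij = A_i1*B_1j + A_i2*B_2j; the four tasks are independent."""
    return {(i, j): matadd(matmul(A[i][0], B[0][j]),
                           matmul(A[i][1], B[1][j]))
            for i in range(2) for j in range(2)}
```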
Matrix-Matrix Multiplication
There are different decompositions of the computations.
Decomposition 1
Task 1: C11=A11B11
Task 2: C11=C11+A12B21
Task 3: C12=A11B12
Task 4: C12=C12+A12B22
Task 5: C21=A21B11
Task 6: C21=C21+A22B21
Task 7: C22=A21B12
Task 8: C22=C22+A22B22
Decomposition 2
Task 1: C11=A11B11
Task 2: C11=C11+A12B21
Task 3: C12=A12B22
Task 4: C12=C12+A11B12
Task 5: C21=A22B21
Task 6: C21=C21+A21B11
Task 7: C22=A21B12
Task 8: C22=C22+A22B22
Matrix-Matrix Multiplication
Partitioning the intermediate data
Stage 1:
[A11 A12; A21 A22] · [B11 B12; B21 B22] → D1 = [D111 D112; D121 D122], D2 = [D211 D212; D221 D222]
Stage 2:
[D111 D112; D121 D122] + [D211 D212; D221 D222] = [C11 C12; C21 C22]
Matrix-Matrix Multiplication
A decomposition induced by a partitioning of D
Task 01: D111=A11B11
Task 02: D211=A12B21
Task 03: D112=A11B12
Task 04: D212=A12B22
Task 05: D121=A21B11
Task 06: D221=A22B21
Task 07: D122=A21B12
Task 08: D222=A22B22
Task 09: C11=D111+D211
Task 10: C12=D112+D212
Task 11: C21=D121+D221
Task 12: C22=D122+D222
Matrix-Matrix Multiplication
[Figure: task-dependency graph — tasks 1–8 compute the intermediate blocks D from the blocks of A and B, and tasks 9–12 add them to produce C11, C12, C21, C22]
Task-dependency graph
Domain Decomposition
Most widely-used decomposition technique
Large problems often have large amounts of data
Splitting work based on data is a natural way to obtain high concurrency
Can be combined with other methods
[Figure: minimum finding combining domain decomposition at the top level with recursive decomposition within each part]
Exploratory Decomposition
Decomposing computations that correspond to a search of a space of solutions
Not as general-purpose
May result in speedup anomalies: slow-down or superlinear speedup
Solution
[Figure: search trees of subtrees of size m illustrating speedup anomalies — in one case the sequential search does 2m+1 units of work while the parallel search does 1 (superlinear speedup); in the other the sequential search does m units while the parallel search does 4m (slow-down)]
Speculative Decomposition
Extracting concurrency in problems where the next step is one of many possible actions that can only be determined when the current task finishes
Principle:
Assuming a certain outcome of the currently executed task
Executing some of the next steps (speculation)
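The principle can be sketched in Python (an illustrative sketch; `speculative_run` and its parameters are names introduced here). While a slow "condition" task runs, both possible next steps are computed speculatively; once the outcome is known, one result is kept and the other is discarded as wasted work.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_run(condition_task, then_task, else_task):
    """Run both possible next steps speculatively while the condition
    is being evaluated; keep only the result matching the outcome."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        cond = pool.submit(condition_task)
        then_f = pool.submit(then_task)   # speculation on outcome True
        else_f = pool.submit(else_task)   # speculation on outcome False
        # One of the two speculative results is discarded (wasted work).
        return then_f.result() if cond.result() else else_f.result()
```

In a real system a wrong speculation may also require undoing side effects, which this sketch avoids by keeping the tasks side-effect free.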
[Figure: components A–I connected from the system inputs to the system output]
Network for discrete event simulation
Speculative Execution
If predictions are wrong:
Work is wasted
Work may need to be undone (state-restoration overhead)
May be the only way to extract concurrency
Communication
Communication is specified in two phases:
Defining the channel structure (technology-dependent)
Specifying the messages sent and received
Determining communication requirements in functional decomposition is easier than in domain decomposition
Data requirements among tasks are represented as a task-dependency graph (TDG): certain task(s) can only start once some other task(s) have finished
Task-Dependency Graph
Key concepts derived from the task-dependency graph:
Degree of concurrency: the number of tasks that can be executed concurrently
Critical path: the longest vertex-weighted path (weights represent task sizes)
Task granularity affects both of the characteristics above
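Both metrics can be computed for a small TDG, as a sketch in Python (`tdg_metrics` and its input format are introduced here, not from the slides). The critical path is found by a longest-path pass over a topological order; the degree of concurrency is estimated as the largest number of tasks at the same depth, which is a lower bound on the true maximum.

```python
from collections import defaultdict

def tdg_metrics(succ, weight):
    """succ: task -> list of successor tasks; weight: task -> size.
    Returns (critical path length, estimated max degree of concurrency)."""
    tasks = list(weight)
    indeg = {t: 0 for t in tasks}
    for t in tasks:
        for s in succ.get(t, []):
            indeg[s] += 1
    # Kahn's algorithm for a topological order.
    order, ready = [], [t for t in tasks if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succ.get(t, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    # Longest weighted path ending at each task, and each task's depth level.
    dist = {t: weight[t] for t in tasks}
    level = {t: 0 for t in tasks}
    for t in order:
        for s in succ.get(t, []):
            dist[s] = max(dist[s], dist[t] + weight[s])
            level[s] = max(level[s], level[t] + 1)
    per_level = defaultdict(int)
    for t in tasks:
        per_level[level[t]] += 1
    return max(dist.values()), max(per_level.values())
```

For four unit-weight tasks all feeding one final task, the critical path has length 2 and four tasks can run concurrently.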
Task-Interaction Graph (TIG)
Captures the pattern of interaction between tasks
The TIG usually contains the TDG as a subgraph
i.e., there may be interactions between tasks even if there are no dependencies (e.g., accesses to shared data)
[Figure: a task-interaction graph over tasks 0–11]
The TDG and TIG are important in developing an effective mapping (maximizing concurrency and minimizing overheads)