Implementing a Task Decomposition

Date post: 23-Feb-2016
Upload: elam
Page 1: Implementing a Task  Decomposition

INTEL CONFIDENTIAL

Implementing a Task Decomposition
Introduction to Parallel Programming – Part 9

Page 2: Implementing a Task  Decomposition

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Review & Objectives

Previously:
    Described how the OpenMP task pragma is different from the for pragma
    Showed how to code task decomposition solutions for while loop and recursive tasks, with the OpenMP task construct

At the end of this part you should be able to:
    Design and implement a task decomposition solution

Page 3: Implementing a Task  Decomposition

Case Study: The N Queens Problem

Is there a way to place N queens on an N-by-N chessboard such that no queen threatens another queen?

Page 4: Implementing a Task  Decomposition

A Solution to the 4 Queens Problem
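
One way to write a 4-queens solution down is as the column of the queen in each row, for example 1, 3, 0, 2 (0-based). The sketch below assumes that array representation; the name is_solution is illustrative and does not come from the slides. It checks that no two queens share a column or a diagonal.

```c
#include <stdlib.h>

/* Returns 1 if places[0..n-1] (column of the queen in each row) is a
   valid placement: no shared column and no shared diagonal.
   Rows are distinct by construction. Hypothetical helper. */
int is_solution(const int *places, int n)
{
    for (int r1 = 0; r1 < n; r1++) {
        for (int r2 = r1 + 1; r2 < n; r2++) {
            int dc = abs(places[r1] - places[r2]);
            if (dc == 0 || dc == r2 - r1)
                return 0;   /* same column or same diagonal */
        }
    }
    return 1;
}
```

Under this encoding, {1, 3, 0, 2} and its mirror image {2, 0, 3, 1} pass the check, while {0, 1, 2, 3} fails because all four queens lie on one diagonal.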

Page 5: Implementing a Task  Decomposition

Exhaustive Search
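
As a rough sequential baseline, the search can be sketched as a depth-first walk that places one queen per row and abandons a branch as soon as two queens conflict. This is an assumption about what the slide's tree depicts. The names safe, extend, and count_solutions, and the bound MAX_N = 16, are illustrative.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed upper bound on N */

/* 1 if a queen at (row, col) is not attacked by the queens in rows 0..row-1. */
static int safe(const int *places, int row, int col)
{
    for (int r = 0; r < row; r++) {
        int dc = abs(places[r] - col);
        if (dc == 0 || dc == row - r)
            return 0;
    }
    return 1;
}

/* Counts the complete boards reachable from a board filled up to row-1. */
static int extend(int *places, int row, int n)
{
    int count = 0;

    if (row == n)
        return 1;               /* all n rows filled: one solution */
    for (int col = 0; col < n; col++) {
        if (safe(places, row, col)) {
            places[row] = col;  /* try this column, then go one row deeper */
            count += extend(places, row + 1, n);
        }
    }
    return count;
}

/* Number of solutions of the n-queens puzzle, or 0 if n is out of range. */
int count_solutions(int n)
{
    int places[MAX_N];

    if (n < 1 || n > MAX_N)
        return 0;
    return extend(places, 0, n);
}
```

For reference, the 4-queens puzzle has 2 solutions and the 8-queens puzzle has 92.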

Page 6: Implementing a Task  Decomposition

Design #1 for Parallel Search

Create threads to explore different parts of the search tree simultaneously

If a node has children:
    The thread creates child nodes
    The thread explores one child node itself
    The thread creates a new thread for every other child node
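
The shape of Design #1 can be sketched with OpenMP tasks standing in for the new threads. This is a substitution, not the slide's mechanism: tasks are scheduled onto an existing team of threads rather than each getting a thread of its own. The struct node, the helper names, and MAX_N are assumptions made for illustration.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed upper bound on N */

/* Hypothetical by-value node: copying it gives each task its own board. */
struct node {
    int pieces;          /* queens placed so far */
    int places[MAX_N];   /* column of the queen in each filled row */
};

/* 1 if a queen in column col of the next row is not attacked. */
static int safe(const struct node *b, int col)
{
    for (int r = 0; r < b->pieces; r++) {
        int dc = abs(b->places[r] - col);
        if (dc == 0 || dc == b->pieces - r)
            return 0;
    }
    return 1;
}

static void explore(struct node b, int n, int *count)
{
    struct node own;     /* the child this worker keeps for itself */
    int have_own = 0;

    if (b.pieces == n) {
        #pragma omp atomic
        (*count)++;
        return;
    }
    for (int col = 0; col < n; col++) {
        if (!safe(&b, col))
            continue;
        struct node child = b;
        child.places[child.pieces++] = col;
        if (!have_own) {
            own = child;         /* first child: explored by this worker */
            have_own = 1;
        } else {
            /* every other child becomes a new unit of work */
            #pragma omp task firstprivate(child)
            explore(child, n, count);
        }
    }
    if (have_own)
        explore(own, n, count);
    #pragma omp taskwait
}

int count_solutions_design1(int n)
{
    int count = 0;
    struct node root = { 0, { 0 } };

    if (n < 1 || n > MAX_N)
        return 0;
    #pragma omp parallel
    #pragma omp single
    explore(root, n, &count);
    return count;
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs as a plain sequential search.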

Page 7: Implementing a Task  Decomposition

Design #1 for Parallel Search

[Figure: search tree in which Thread W continues down one child while new threads X, Y, and Z are created for the other children]

Page 8: Implementing a Task  Decomposition

Pros and Cons of Design #1

Pros
    Simple design, easy to implement
    Balances work among threads

Cons
    Too many threads created
    Lifetime of threads too short
    Overhead costs too high

Page 9: Implementing a Task  Decomposition

Design #2 for Parallel Search

One thread created for each subtree rooted at a particular depth

Each thread sequentially explores its subtree
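
A minimal sketch of Design #2, assuming the chosen depth is the first row, so there is one subtree per first-row column. The clause num_threads(n) only requests n threads and the runtime may provide fewer. The helper names and MAX_N are illustrative.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed upper bound on N */

/* 1 if a queen at (row, col) is not attacked by the queens in rows 0..row-1. */
static int safe(const int *places, int row, int col)
{
    for (int r = 0; r < row; r++) {
        int dc = abs(places[r] - col);
        if (dc == 0 || dc == row - r)
            return 0;
    }
    return 1;
}

/* Sequential exploration of one subtree. */
static int extend(int *places, int row, int n)
{
    int count = 0;

    if (row == n)
        return 1;
    for (int col = 0; col < n; col++) {
        if (safe(places, row, col)) {
            places[row] = col;
            count += extend(places, row + 1, n);
        }
    }
    return count;
}

int count_solutions_design2(int n)
{
    int total = 0;

    if (n < 1 || n > MAX_N)
        return 0;
    /* schedule(static, 1): iteration i goes to thread i when n threads exist */
    #pragma omp parallel for num_threads(n) schedule(static, 1) reduction(+:total)
    for (int col = 0; col < n; col++) {
        int places[MAX_N];      /* private board for this subtree */
        places[0] = col;
        total += extend(places, 1, n);
    }
    return total;
}
```

Because the assignment of subtrees to threads is fixed up front, a thread that draws a small subtree simply idles once it finishes.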

Page 10: Implementing a Task  Decomposition

Design #2 in Action

[Figure: search tree with one subtree assigned to each of Thread 1, Thread 2, and Thread 3]

Page 11: Implementing a Task  Decomposition

Pros and Cons of Design #2

Pros
    Thread creation/termination time minimized

Cons
    Subtree sizes may vary dramatically
    Some threads may finish long before others
    Imbalanced workloads lower efficiency

Page 12: Implementing a Task  Decomposition

Design #3 for Parallel Search

Main thread creates work pool—list of subtrees to explore

Main thread creates finite number of co-worker threads

Each subtree exploration is done by a single thread

Inactive threads go to pool to get more work

Page 13: Implementing a Task  Decomposition

Work Pool Analogy

More rows than workers

Each worker takes an unpicked row and picks the crop

After completing a row, the worker takes another unpicked row

Process continues until all rows have been harvested
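
The analogy maps onto a shared "next unpicked row" index that workers claim one at a time. The sketch below is only an illustration: ROWS, harvest, and the critical-section name take_row are made up for it.

```c
#define ROWS 12   /* hypothetical number of rows in the field */

/* Every worker repeatedly claims the next unpicked row until none remain.
   picked[r] counts how many times row r was harvested (should end at 1).
   Returns the total number of rows harvested. */
int harvest(int picked[ROWS])
{
    int next_row = 0;   /* shared: index of the next unpicked row */
    int total = 0;      /* shared: rows harvested so far */

    #pragma omp parallel
    {
        for (;;) {
            int row;    /* private to each worker */

            /* claiming a row is the only step that needs exclusive access */
            #pragma omp critical (take_row)
            {
                row = next_row;
                if (next_row < ROWS)
                    next_row++;
            }
            if (row >= ROWS)
                break;          /* nothing left to pick */

            picked[row]++;      /* "pick the crop": only this worker owns row */
            #pragma omp atomic
            total++;
        }
    }
    return total;
}
```

A worker that finishes a short row immediately claims another, which is what evens out the load compared with handing out rows in advance.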

Page 14: Implementing a Task  Decomposition

Design #3 in Action

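[Figure: search tree whose subtrees are labeled Thread 1, Thread 2, Thread 3, Thread 3, Thread 1; threads that finish a subtree take another one from the pool]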

Page 15: Implementing a Task  Decomposition

Pros and Cons of Strategy #3

Pros
    Thread creation/termination time minimized
    Workload balance better than strategy #2

Cons
    Threads need exclusive access to data structure containing work to be done, a sequential component
    Workload balance worse than strategy #1

Conclusion
    Good compromise between designs 1 and 2

Page 16: Implementing a Task  Decomposition

Implementing Strategy #3 for N Queens

Work pool consists of N boards representing N possible placements of queen on first row

Page 17: Implementing a Task  Decomposition

Parallel Program Design

One thread creates list of partially filled-in boards
Fork: Create one thread per core
Each thread repeatedly gets a board from the list, searches for solutions, and adds to the solution count, until no more boards are on the list
Join: Occurs when list is empty
One thread prints number of solutions found

Page 18: Implementing a Task  Decomposition

Search Tree Node Structure

    /* The 'board' struct contains information about a node in the
       search tree; i.e., partially filled-in board. The work pool
       is a singly linked list of 'board' structs. */

    struct board {
        int pieces;           /* # of queens on board */
        int places[MAX_N];    /* Queen's pos in each row */
        struct board *next;   /* Next search tree node */
    };

Page 19: Implementing a Task  Decomposition

Key Code in main Function

    struct board *stack;
    ...
    stack = NULL;
    for (i = 0; i < n; i++) {
        initial = (struct board *)malloc(sizeof(struct board));
        initial->pieces = 1;
        initial->places[0] = i;
        initial->next = stack;
        stack = initial;
    }
    num_solutions = 0;
    search_for_solutions (n, stack, &num_solutions);
    printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

Page 20: Implementing a Task  Decomposition

Insertion of OpenMP Code

    struct board *stack;
    ...
    stack = NULL;
    for (i = 0; i < n; i++) {
        initial = (struct board *)malloc(sizeof(struct board));
        initial->pieces = 1;
        initial->places[0] = i;
        initial->next = stack;
        stack = initial;
    }
    num_solutions = 0;

    #pragma omp parallel
    search_for_solutions (n, stack, &num_solutions);

    printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

Page 21: Implementing a Task  Decomposition

Original C Function to Get Work

    void search_for_solutions (int n, struct board *stack, int *num_solutions)
    {
        struct board *ptr;
        void search (int, struct board *, int *);

        while (stack != NULL) {
            ptr = stack;
            stack = stack->next;
            search (n, ptr, num_solutions);
            free (ptr);
        }
    }

Page 22: Implementing a Task  Decomposition

C/OpenMP Function to Get Work

    void search_for_solutions (int n, struct board *stack, int *num_solutions)
    {
        struct board *ptr;
        void search (int, struct board *, int *);

        while (stack != NULL) {
            #pragma omp critical
            {
                ptr = stack;
                stack = stack->next;
            }
            search (n, ptr, num_solutions);
            free (ptr);
        }
    }

Page 23: Implementing a Task  Decomposition

Original C Search Function

    void search (int n, struct board *ptr, int *num_solutions)
    {
        int i;
        int no_threats (struct board *);

        if (ptr->pieces == n) {
            (*num_solutions)++;
        } else {
            ptr->pieces++;
            for (i = 0; i < n; i++) {
                ptr->places[ptr->pieces-1] = i;
                if (no_threats(ptr))
                    search (n, ptr, num_solutions);
            }
            ptr->pieces--;
        }
    }
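
The slides declare no_threats but never show its body. A minimal sketch, assuming places[r] holds the column of the queen in row r and that only the most recently placed queen needs checking (the earlier ones were validated when they were placed), might look like this. The value of MAX_N is also an assumption.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed; the slides do not give a value */

struct board {
    int pieces;           /* # of queens on board */
    int places[MAX_N];    /* queen's position in each row */
    struct board *next;   /* next search tree node */
};

/* Hypothetical implementation: returns 1 if the queen in the last
   filled row (row pieces-1) shares no column and no diagonal with
   any queen in an earlier row, 0 otherwise. */
int no_threats(struct board *ptr)
{
    int last = ptr->pieces - 1;

    for (int r = 0; r < last; r++) {
        int dc = abs(ptr->places[r] - ptr->places[last]);
        if (dc == 0 || dc == last - r)
            return 0;
    }
    return 1;
}
```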

Page 24: Implementing a Task  Decomposition

C/OpenMP Search Function

    void search (int n, struct board *ptr, int *num_solutions)
    {
        int i;
        int no_threats (struct board *);

        if (ptr->pieces == n) {
            #pragma omp critical
            (*num_solutions)++;
        } else {
            ptr->pieces++;
            for (i = 0; i < n; i++) {
                ptr->places[ptr->pieces-1] = i;
                if (no_threats(ptr))
                    search (n, ptr, num_solutions);
            }
            ptr->pieces--;
        }
    }
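
One detail worth noting: all unnamed critical sections in an OpenMP program share a single lock, so the unnamed critical here also excludes threads that are in the unnamed critical section guarding the stack in search_for_solutions. A possible alternative for a simple increment is the atomic construct, sketched below. The function names record_solution and count_to are illustrative only.

```c
/* Adds one to a shared counter; safe to call from any thread. */
void record_solution(int *num_solutions)
{
    #pragma omp atomic
    (*num_solutions)++;
}

/* Small driver: n concurrent increments must yield exactly n. */
int count_to(int n)
{
    int total = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        record_solution(&total);
    return total;
}
```

Giving the two critical sections distinct names would be another way to stop them from contending with each other.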

Page 25: Implementing a Task  Decomposition

Only One Problem: It Doesn’t Work!

OpenMP program throws an exception
Culprit: Variable stack

[Figure: the variable stack pointing to the linked list of boards on the heap]

Page 26: Implementing a Task  Decomposition

Problem Site

    int main ()
    {
        struct board *stack;
        ...
        #pragma omp parallel
        search_for_solutions(n, stack, &num_solutions);
        ...
    }

    void search_for_solutions (int n, struct board *stack, int *num_solutions)
    {
        ...
        while (stack != NULL) ...
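
The root cause is ordinary C parameter passing: search_for_solutions receives a copy of the pointer, so each thread advances its own copy. The single-threaded sketch below isolates that effect. The struct item and both function names are made up for illustration.

```c
#include <stddef.h>

struct item { struct item *next; };

/* Receives a copy of the caller's pointer: advancing it changes
   only the copy, and the caller's variable still points to the old top. */
void pop_by_value(struct item *stack)
{
    if (stack != NULL)
        stack = stack->next;
    (void)stack;   /* the new value is lost when the function returns */
}

/* Receives the address of the caller's pointer: the update is
   visible to everyone who shares that variable. */
void pop_by_address(struct item **stack)
{
    if (*stack != NULL)
        *stack = (*stack)->next;
}
```

With a two-element list, calling pop_by_value(top) leaves top unchanged, while pop_by_address(&top) moves it to the second element.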

Page 27: Implementing a Task  Decomposition

1. Both Threads Point to Top

[Figure: Thread 1 and Thread 2 each hold their own copy of stack; both copies point to the top element of the list]

Page 28: Implementing a Task  Decomposition

2. Thread 1 Grabs First Element

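[Figure: Thread 1's ptr points to the first element and its stack has advanced to the second; Thread 2's stack still points to the first element]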

Page 29: Implementing a Task  Decomposition

3. Thread 2 Grabs “Next” Element

[Figure: each thread has its own stack and ptr; both ptr variables point to the first element]

Error #1: Thread 2 grabs the same element

Page 30: Implementing a Task  Decomposition

4. Thread 1 Deletes Element

[Figure: the first element has been freed by Thread 1; Thread 2 still refers to it]

Error #2: Thread 2's stack pointer dangles

Page 31: Implementing a Task  Decomposition

Demonstrate error #2

Thread 1 hits the critical region and reads stack

Page 32: Implementing a Task  Decomposition

Demonstrate error #2

Thread 1 copies stack to ptr

Page 33: Implementing a Task  Decomposition

Demonstrate error #2

Thread 1 advances stack

Page 34: Implementing a Task  Decomposition

Demonstrate error #2

Thread 1 exits critical region

Page 35: Implementing a Task  Decomposition

Demonstrate error #2

Thread 1 frees ptr

Thread 2's stack now points to freed memory, so its value is undefined

Page 36: Implementing a Task  Decomposition

Remedy 1: Make stack Static

[Figure: Thread 1 and Thread 2 both refer to a single shared stack variable]

Page 37: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Why would this work?

Page 38: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Thread 1 enters critical region

Page 39: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Thread 1 copies stack to ptr

Page 40: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Thread 1 advances stack

Page 41: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Thread 1 exits critical region

Page 42: Implementing a Task  Decomposition

Remedy 1: Make stack Static

Thread 1 frees ptr; no dangling pointer remains, because the shared stack has already moved past the freed element
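
A sketch of what Remedy 1 could look like in code, assuming "static" means a single file-scope variable that every thread reads and updates directly instead of receiving a copy. The struct work, the function names, and the critical-section name work_stack are illustrative and do not come from the slides.

```c
#include <stdlib.h>

struct work { int id; struct work *next; };

/* One list head shared by all threads (file scope, static storage). */
static struct work *stack = NULL;

/* Pushes a new work item; call before the parallel region starts. */
void push_work(int id)
{
    struct work *w = malloc(sizeof *w);
    if (w == NULL)
        return;
    w->id = id;
    w->next = stack;
    stack = w;
}

/* Removes and returns the top item, or NULL when the list is empty. */
struct work *pop_work(void)
{
    struct work *ptr;

    #pragma omp critical (work_stack)
    {
        ptr = stack;
        if (ptr != NULL)
            stack = ptr->next;   /* every thread sees this update */
    }
    return ptr;
}

/* All threads drain the shared list; returns the number of items processed. */
int drain(void)
{
    int done = 0;

    #pragma omp parallel reduction(+:done)
    {
        struct work *ptr;
        while ((ptr = pop_work()) != NULL) {
            free(ptr);           /* stands in for search() + free() */
            done++;
        }
    }
    return done;
}
```

Because there is only one stack variable, no thread can be left holding a stale copy that points at a freed element.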

Page 43: Implementing a Task  Decomposition

Remedy 2: Use Indirection (Best choice)

[Figure: Thread 1 and Thread 2 both hold &stack, the address of the single stack variable]

Now the data is encapsulated inside function calls and is no longer exposed as a “global/static” variable that could be overwritten

Page 44: Implementing a Task  Decomposition

Corrected main Function

    struct board *stack;
    ...
    stack = NULL;
    for (i = 0; i < n; i++) {
        initial = (struct board *)malloc(sizeof(struct board));
        initial->pieces = 1;
        initial->places[0] = i;
        initial->next = stack;
        stack = initial;
    }
    num_solutions = 0;
    #pragma omp parallel
    search_for_solutions (n, &stack, &num_solutions);
    printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

Page 45: Implementing a Task  Decomposition

Corrected Stack Access Function

    void search_for_solutions (int n, struct board **stack, int *num_solutions)
    {
        struct board *ptr;
        void search (int, struct board *, int *);

        while (*stack != NULL) {
            #pragma omp critical
            {
                ptr = *stack;
                *stack = (*stack)->next;
            }
            search (n, ptr, num_solutions);
            free (ptr);
        }
    }
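
One hazard may remain in the loop above: the test *stack != NULL runs outside the critical section. Two threads can both see a single remaining board, and the second one to enter the critical section would then dereference a NULL pointer. A hedged variant moves the test inside the lock. The helper name take_board, the critical-section name work_pool, and the value of MAX_N are assumptions.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed; the slides do not give a value */

struct board {
    int pieces;
    int places[MAX_N];
    struct board *next;
};

/* Pops and returns the top board, or returns NULL if the pool is empty.
   The emptiness test and the update happen under the same lock, so no
   thread can act on a stale "not empty" observation. */
struct board *take_board(struct board **stack)
{
    struct board *ptr;

    #pragma omp critical (work_pool)
    {
        ptr = *stack;
        if (ptr != NULL)
            *stack = ptr->next;
    }
    return ptr;
}
```

With this helper, the worker loop in search_for_solutions could be written as while ((ptr = take_board(stack)) != NULL) { search (n, ptr, num_solutions); free (ptr); }.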

Page 46: Implementing a Task  Decomposition

References

Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon, Parallel Programming in OpenMP, Morgan Kaufmann (2001).

Barbara Chapman, Gabriele Jost, Ruud van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, MIT Press (2008).

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
