
6.189 - Lecture 9 - Debugging Parallel...

Date post: 30-Jul-2020
Dr. Rodric Rabbah, IBM. 1 6.189 IAP 2007 MIT 6.189 IAP 2007 Lecture 9 Debugging Parallel Programs
Page 1: 6.189 - Lecture 9 - Debugging Parallel Programsgroups.csail.mit.edu/cag/ps3/lectures/6.189-lecture9...Dr. Rodric Rabbah, IBM. 8 6.189 IAP 2007 MIT Pattern-based Approach to Debugging

Dr. Rodric Rabbah, IBM. 1 6.189 IAP 2007 MIT

6.189 IAP 2007

Lecture 9

Debugging Parallel Programs


Debugging Parallel Programs is Hard-er

● Parallel programs are subject to the usual bugs

● Plus: new timing and synchronization errors

● And: parallel bugs often disappear when you add code to try to identify the bug


Visual Debugging of Parallel Programs

● A global view of the multiprocessor architecture
  – Processors and communication links

● See which communication links are used
  – Perhaps even change the data in transmission

● Utilization of each processor
  – Can identify blocked processors, deadlock

● “Step” through functionality?
  – Lack of a global clock

● Likely won’t help with data races


TotalView


Debugging Parallel Programs

● Commercial debuggers
  – TotalView, …

● The printf approach

● gdb, MPI gdb, ppu/spu gdb, …

● Research debuggers
  – StreamIt Debugger, …


StreamIt Debugger


Cell Debugger in Eclipse IDE


Pattern-based Approach to Debugging

● “Defect Patterns”: common kinds of bugs in parallel programs
  – Useful tips to prevent them
  – Recipes for effective resolution

● Inspired by empirical studies at University of Maryland
  – http://fc-md.umd.edu/softwareday//presentations/Session0/Keynote.pdf

● At the end of this course, we will try to identify some common Cell defect patterns based on your feedback and projects


Defect Pattern: Erroneous Use of Language Features

● Examples
  – Inconsistent parameter types for get/send and put/receive
  – Required function calls
  – Inappropriate choice of functions

● Symptoms
  – Compile-time error (easy to fix)
  – Some defects may surface only under specific conditions
    – Number of processors, value of input, alignment issues

● Cause
  – Lack of experience with the syntax and semantics of new language features

● Prevention
  – Check unfamiliar language features carefully


Does Cell have too many functions?

spe_create_thread, spe_wait

spe_write_in_mbox, spe_stat_in_mbox

spe_read_out_mbox, spe_stat_out_mbox

spe_write_signal

spe_get_ls, spe_get_ps_area

spe_mfc_get, spe_mfc_put, spe_mfc_read_tag_status

spe_create_group, spe_get_event

mfc_get, mfc_put, mfc_stat_cmd_queue, mfc_write_tag_mask, mfc_read_tag_status_all/any/immediate

spu_read_in_mbox, spu_stat_in_mbox

spu_write_out_mbox, spu_write_out_intr_mbox

spu_stat_out_mbox, spu_stat_out_intr_mbox

spu_read_signal1/2, spu_stat_signal1/2

spu_write_event_mask, spu_read_event_status, spu_stat_event_status, spu_write_event_ack

spu_read_decrementer, spu_write_decrementer

● Yes! But you may not need all of them

● Understand a few basic features


Defect Pattern: Space Decomposition

● Incorrect mapping between the problem space and the program memory space

● Symptoms
  – Segmentation fault (if array index is out of range)
  – Incorrect or slightly incorrect output

● Cause
  – Mapping in parallel version can be different from that in serial version
    – Array origin is different in every processor
    – Additional memory space for communication can complicate the mapping logic

● Prevention
  – Validate memory allocation carefully when parallelizing code


Example Problem

● N cells, each of which holds an integer [0..9]
  – cell[0]=2, cell[1]=1, …, cell[N-1]=3

● In each step, cells are updated using values of neighboring cells
  – cellnext[x] = (cell[x-1] + cell[x+1]) mod 10
  – cellnext[0]=(3+1), cellnext[1]=(2+6), …
  – Assume the last cell is connected to the first cell

● Repeat for steps times

A sequence of N cells: 2 1 6 8 7 1 0 2 4 5 1 … 3

Example adapted from Taiga Nakamura


Sequential Implementation

● Approach to implementation
  – Use an integer array buffer[] for current cell values
  – Use a second array nextbuffer[] to store the values for next step
  – Swap the buffers


Sequential C Code

    /* Initialize cells */
    int x, n, *tmp;
    int *buffer = (int*)malloc(N * sizeof(int));
    int *nextbuffer = (int*)malloc(N * sizeof(int));
    FILE *fp = fopen("input.dat", "r");
    if (fp == NULL) { exit(-1); }
    for (x = 0; x < N; x++) { fscanf(fp, "%d", &buffer[x]); }
    fclose(fp);

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 0; x < N; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

    /* Final output */
    ...
    free(nextbuffer); free(buffer);

Example adapted from Taiga Nakamura


Approach to a Parallel Version

● Each processor keeps 1/size of the cells
  – size = number of processors

● Each processor needs to:
  – update the locally-stored cells
  – exchange boundary cell values between neighboring processes

[Figure: the sequence of N cells (2 1 6 8 7 1 0 2 4 5 1 … 3) divided among processors P0, P1, P2, …, P(size-1)]

Example adapted from Taiga Nakamura


Decomposition: Where are the bugs?

    nlocal = N / size;
    buffer = (int*)malloc((nlocal+2) * sizeof(int));
    nextbuffer = (int*)malloc((nlocal+2) * sizeof(int));

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 0; x < nlocal; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        /* Exchange boundary cells with neighbors */
        ...
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

[Figure: buffer[], indexed from 0 to (nlocal+1)]

Example adapted from Taiga Nakamura


Decomposition: Where are the bugs?

● N may not be divisible by size, so nlocal = N / size can leave cells unassigned

● The inner loop starts at the wrong index. The local cells occupy buffer[1] through buffer[nlocal]; buffer[0] and buffer[nlocal+1] hold the neighbors' boundary values. The loop should read:

    for (x = 1; x < nlocal+1; x++)

Example adapted from Taiga Nakamura


Defect Pattern: Synchronization

● Improper coordination between processes
  – Well-known defect type in parallel programming
  – Deadlocks, race conditions

● Symptoms
  – Program hangs
  – Incorrect/non-deterministic output

● Causes
  – Some defects can be very subtle
  – Use of asynchronous (non-blocking) communication can lead to more synchronization defects

● Prevention
  – Make sure that all communication is correctly coordinated


Communication: Where are the bugs?

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 1; x < nlocal+1; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        /* Exchange boundary cells with neighbors */
        receive (&nextbuffer[0], (rank+size-1)%size);
        send (&nextbuffer[nlocal], (rank+1)%size);
        receive (&nextbuffer[nlocal+1], (rank+1)%size);
        send (&nextbuffer[1], (rank+size-1)%size);
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

● Deadlock …

[Figure: buffer[], indexed from 0 to (nlocal+1)]

Example adapted from Taiga Nakamura


Modes of Communication

● Recall there are different types of sends and receives
  – Synchronous
  – Asynchronous
  – Blocking
  – Non-blocking

● Tips for orchestrating communication
  – Alternate the order of sends and receives
  – Use asynchronous and non-blocking messages where possible


Defect Pattern: Side-effect of Parallelization

● Ordinary serial constructs may have unexpected side-effects when they are used concurrently

● Symptoms
  – Various correctness and performance problems

● Causes
  – Sequential part of code is overlooked
  – Typical parallel programs contain only a few parallel primitives, and the rest of the code is a sequential program running many times

● Prevention
  – Don’t just focus on the parallel code
  – Check that the serial code is working on one processor, but remember that the defect may surface only in a parallel context


Data I/O in SPMD Program: Where are the bugs?

    /* Initialize cells with input file */
    fp = fopen("input.dat", "r");
    if (fp == NULL) { exit(-1); }
    nskip = ...
    for (x = 0; x < nskip; x++) { fscanf(fp, "%d", &dummy); }
    for (x = 0; x < nlocal; x++) { fscanf(fp, "%d", &buffer[x+1]); }
    fclose(fp);

    /* Main loop */
    ...

Example adapted from Taiga Nakamura


Data I/O in SPMD Program: Where are the bugs?

● File system may cause performance bottleneck if all processors access the same file simultaneously

● Schedule I/O carefully

Example adapted from Taiga Nakamura


Data I/O in SPMD Program: Where are the bugs?

● Often only one processor (master) needs to do the I/O

    /* Initialize cells with input file */
    if (rank == MASTER) {
        fp = fopen("input.dat", "r");
        if (fp == NULL) { exit(-1); }
        for (x = 0; x < nlocal; x++) { fscanf(fp, "%d", &buffer[x+1]); }
        for (p = 1; p < size; p++) {
            /* Read initial data for process p and send it */
        }
        fclose(fp);
    }
    else {
        /* Receive initial data */
    }

Example adapted from Taiga Nakamura


Generating Initial Data: Where are the bugs?

    /* What if we initialize cells with random values... */
    srand(time(NULL));
    for (x = 0; x < nlocal; x++) {
        buffer[x+1] = rand() % 10;
    }

    /* Main loop */
    ...

Example adapted from Taiga Nakamura


Generating Initial Data: Where are the bugs?

● All processors might use the same pseudo-random seed (and hence sequence), spoiling independence

● Hidden serialization in rand() causes performance bottleneck

Seed each processor differently:

    srand(time(NULL) + rank);

Example adapted from Taiga Nakamura


Defect Pattern: Performance Scalability

● Symptoms
  – Sub-linear scalability
  – Performance much less than expected
  – Most time spent waiting

● Causes
  – Unbalanced amount of computation
  – Load balancing may depend on input data

● Prevention
  – Make sure all processors are “working” in parallel
  – Profiling tools might help


Summary

● Some common bugs in parallel programming
  – Erroneous use of language features
  – Space decomposition
  – Side-effect of parallelization
  – Synchronization
  – Performance scalability

● There are other kinds of bugs as well: data race


Comment on Data Race Detection

● Trace analysis can help
  – Execute program
  – Generate trace of all memory accesses and synchronization operations
  – Build a graph of orderings (solid arrows below) and conflicting memory references (dashed lines below)
  – Detect races (when two nodes connected by dashed lines are not ordered by solid arrows)

● Intel Thread Checker is an example
  – More tools available for automatic race detection


Trend in Debugging Technology

● Trace-based

● Checkpointing

● Replay

● One day… you’ll have the equivalent of TiVo for debugging your programs

