Debugging Cluster Programs using symbolic debuggers
Transcript
Slide 1
Debugging Cluster Programs using symbolic debuggers
Slide 2
Debugging Code
  Careful review of your code
  Add debugging code to your code
    print statements at strategic locations in code
    remove later
  Use a symbolic debugger
Slide 3
Careful review of your code
  Rereading your code is often helpful
  Most parallel code errors are serial errors
  Compare your code to specs
  Take a break, review your code with a fresh brain
  Have someone else help you review your code
Slide 4
Common sources of errors
  Beyond what the compiler catches
  Usually run-time errors
  Incorrect use of pointers
    Pointer points out of memory
    Reference should have used a pointer
  Referenced wrong variable
  Index initialized wrong, wrong exit condition
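The index and exit-condition errors listed above can be illustrated with a small sketch. The function name sum_array is hypothetical and not from the slides; the buggy loop appears only in a comment so that the example itself stays well-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy loop (shown only as a comment):
 *
 *     for (i = 1; i <= n; i++) sum += a[i];
 *
 * The index is initialized wrong (element 0 is skipped) and the exit
 * condition is wrong (a[n] is read past the end of the array). The
 * compiler accepts it; the error only shows up at run time.
 */

/* Corrected loop: start at 0 and stop before n. */
double sum_array(const double *a, size_t n)
{
    assert(a != NULL || n == 0);   /* debugging check on the pointer */

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Setting a breakpoint inside the loop and printing i and sum on each pass is usually enough to spot this kind of error.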
Slide 5
Common parallel errors
  Deadlock errors
    Receive before send
    Receive, but no send
  Incorrect arguments in MPI calls
    Mismatch on tags
    Mismatch of source/destination
    Misunderstanding of the use of an argument
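To see why a tag or source/destination mismatch leaves a receive waiting forever, here is a toy model of message matching in plain C. It is not MPI code and does not use the MPI library: the struct, the function, and the two wildcard constants are hypothetical stand-ins (real MPI provides MPI_ANY_SOURCE and MPI_ANY_TAG, and also matches on the communicator, which this sketch leaves out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI's wildcards. */
enum { ANY_SOURCE = -1, ANY_TAG = -1 };

/* Envelope of a sent message: who sent it, to whom, with which tag. */
struct envelope {
    int source;
    int dest;
    int tag;
};

/* Would a receive posted on rank `me`, asking for (want_source, want_tag),
 * accept this message? If no message ever matches, a blocking receive
 * never returns and the program appears to hang. */
bool receive_matches(int me, int want_source, int want_tag,
                     const struct envelope *msg)
{
    assert(msg != NULL);

    if (msg->dest != me) {
        return false;               /* sent to a different rank */
    }
    if (want_source != ANY_SOURCE && want_source != msg->source) {
        return false;               /* source mismatch */
    }
    if (want_tag != ANY_TAG && want_tag != msg->tag) {
        return false;               /* tag mismatch */
    }
    return true;
}
```

When a program hangs, comparing the source, destination, and tag arguments of the paired send and receive calls in this way is a reasonable first check.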
Slide 6
Add Debugging Code
  Add strategically placed code in your code to display critical information
  Watch values of variables as the program progresses
  Can create data-dump functions
    call when you need them
  Have a way to remove them in production code
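One common way to get removable debugging code in C is a macro that is compiled in only when a flag is defined, together with a small data-dump function. The sketch below assumes a DEBUG flag passed at compile time (for example with -DDEBUG); the names DBG_PRINT and dump_doubles are hypothetical, not from the slides.

```c
#include <assert.h>
#include <stdio.h>

/* With -DDEBUG the macro prints to stderr; without it, it expands to
 * nothing, so the debugging statements vanish from production builds. */
#ifdef DEBUG
#define DBG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DBG_PRINT(...) ((void)0)
#endif

/* Data-dump helper: writes one "name[i] = value" line per element to
 * `out` and returns the number of elements written. */
int dump_doubles(FILE *out, const char *name, const double *data, int n)
{
    assert(out != NULL && name != NULL);
    assert(n == 0 || data != NULL);

    int written = 0;
    for (int i = 0; i < n; i++) {
        if (fprintf(out, "%s[%d] = %g\n", name, i, data[i]) < 0) {
            break;                  /* stop on an I/O error */
        }
        written++;
    }
    return written;
}
```

A call such as DBG_PRINT("step %d\n", step) can then stay in the source permanently, and dump_doubles(stderr, "x", x, n) can be called wherever a snapshot of the data is needed.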
Slide 7
Add Debugging Code
  Can be difficult to get the right debugging code in the right place
  Does not scale well in a parallel environment
  Can produce unmanageable or unintelligible output
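Output from many processes is easier to untangle if every line says which process wrote it. The sketch below formats such a prefix; the function name is hypothetical, and it assumes the caller already has its rank and the process count (in an MPI program these would typically come from MPI_Comm_rank and MPI_Comm_size).

```c
#include <assert.h>
#include <stdio.h>

/* Writes "[rank r/size] msg" into buf. Returns what snprintf returns:
 * the length the full string needs, which is >= buflen if the output
 * was truncated. */
int format_rank_message(char *buf, size_t buflen,
                        int rank, int size, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    assert(rank >= 0 && rank < size);

    return snprintf(buf, buflen, "[rank %d/%d] %s", rank, size, msg);
}
```

With the prefix in place, the combined output can be sorted or filtered by rank after the run, which helps a little with the scaling problem, although it does not remove it.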
Slide 8
Symbolic Debuggers
  Allow you to
    inspect your code
    monitor its behavior
    modify the data values on the fly as your code executes
Slide 9
gdb GNU debugger
Slide 10
Frequently used GDB commands:
  break [file:]function - Set a breakpoint at function (in file).
  run [arglist] - Start your program (with arglist, if specified).
  bt - Backtrace: display the program stack.
  print expr - Display the value of an expression.
  c - Continue running your program (after stopping, e.g. at a breakpoint).
  next - Execute next program line (after stopping); step over any function calls in the line.
  step - Execute next program line (after stopping); step into any function calls in the line.
  help [name] - Show information about GDB command name, or general information about using GDB.
  quit - Exit from GDB.
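As a rough illustration of how these commands fit together, here is a small function to practice on, with a possible session in the comment. The file name avg.c, the program name avg, and the main() that calls average() are assumptions and are not shown; the -g option asks gcc to include the debugging symbols gdb needs.

```c
#include <assert.h>
#include <stddef.h>

/* Build with debugging symbols, e.g.:  gcc -g -o avg avg.c
 *
 * A possible session using only the commands listed above:
 *   (gdb) break average     stop when average() is entered
 *   (gdb) run               start the program
 *   (gdb) bt                show who called average()
 *   (gdb) next              execute one source line at a time
 *   (gdb) print sum         inspect the running total
 *   (gdb) c                 continue running
 *   (gdb) quit
 */
double average(const int *values, int n)
{
    assert(values != NULL && n > 0);

    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
    }
    /* Cast before dividing; sum / n alone would be integer division. */
    return (double)sum / n;
}
```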
Slide 11
gdb
Slide 12
Slide 13
Slide 14
Slide 15
Running in X-windows
  Linux (Unix) to Linux
    ssh to host, login and enter X application
  Other platforms (Windows, Mac)
    Use X-windows server application
  VNC in most platforms
    VNC operates as a remote control application in Linux
    VNC operates as an X-windows server viewer for Windows, Macintosh, Solaris
Slide 16
Running in X-windows
  Using VNC
    ssh to host and login
    start vncserver
      pay attention to display id (:n)
    from your desktop run VNCViewer
      select host with correct display id
    After session kill vncserver
      vncserver -kill :n (n is display id number)
Slide 17
Using VNC
Slide 18
Slide 19
x desktop with VNC
Slide 20
ddd
  a graphical front end to gdb
Slide 21
pgdbg
  Debugger from the Portland Group (PGI)
  Can use with PG compilers
  Can use with GNU compilers
Slide 22
pgdbg common commands
  Back to text mode for a bit
  lis[t] [count | low:high | routine | line,count] - display lines from the source code file or routine
  att[ach] [ | ] - attach to a running process or start a local executable and attach to it, or start an executable on
  c[ont] - continue executing from the current location
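Attaching to a running process is easier if the process pauses and reports where it is. A widely used trick, sketched below under stated assumptions, is to spin on a flag until a debugger clears it. The function name and the WAIT_FOR_DEBUGGER environment variable are hypothetical; the calls used (getenv, getpid, gethostname, sleep) are standard POSIX.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* If WAIT_FOR_DEBUGGER is set in the environment, print the host name
 * and process id, then loop until a debugger sets keep_waiting to 0.
 * Returns 1 if it paused, 0 if it returned immediately. */
int wait_for_debugger(unsigned poll_seconds)
{
    assert(poll_seconds > 0);

    if (getenv("WAIT_FOR_DEBUGGER") == NULL) {
        return 0;                   /* normal runs are not affected */
    }

    char host[256];
    if (gethostname(host, sizeof host) != 0) {
        snprintf(host, sizeof host, "unknown");
    }
    host[sizeof host - 1] = '\0';

    /* volatile so the compiler rereads the flag on every pass */
    volatile int keep_waiting = 1;
    fprintf(stderr, "PID %ld on %s is waiting for a debugger\n",
            (long)getpid(), host);
    while (keep_waiting) {
        sleep(poll_seconds);        /* clear keep_waiting from the debugger */
    }
    return 1;
}
```

After attaching to the reported process id on the reported host, move up to this stack frame, set keep_waiting to 0 (in gdb, set var keep_waiting = 0), and continue. Calling the function near the start of the program gives every process a chance to be attached before the interesting code runs.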
Slide 23
pgdbg common commands
  det[ach] - detach from the currently attached process
  halt - halt the executing process or thread
  n[ext] [count] - continue executing and stop after count lines of source code
  nexti [count] - continue executing and stop after count instructions
Slide 24
pgdbg common commands
  q[uit] - terminate pgdbg and exit
  ru[n] [arg0 arg1 argn] - run program from beginning with arguments arg0, arg1
  s[tep] [count] - execute next count lines of source code and stop. Steps into called routines
  s[tep] up - steps out of current routine
  stepi [count] - execute next count instructions and stop. Steps into called routines
Slide 25
pgdbg common commands
  stepi up - steps out of current routine and stops
  Event commands
  break line | function - sets a breakpoint at the specified line or function. If no line or function is specified, lists existing breakpoints. A breakpoint stops execution at the specified point
  clear [all | line | func] - clears all breakpoints, or a breakpoint at line line or at function func
Slide 26
pgdbg common commands
  stop var - break when the value of var changes at a location
  watch expr - stops and displays the value of expr when it changes
  track expr - like watch except does not stop execution
  trace var - displays a trace of source line execution when the value of var changes
Slide 27
pgdbg common commands
  p[rint] var - displays the value of a variable
  edit filename - invokes an editor to edit file filename. If no filename is given, edits the current file
  decl[aration] name - displays the type declaration for the object name
  as[sign] var = expr - assigns the value expr to the variable var
  proc [number] - sets the current process to process number number
Slide 28
Slide 29
Slide 30
Slide 31
Slide 32
Resources
  gdb
    man gdb
    info gdb
    Using GDB: A Guide to the GNU Source-Level Debugger, Richard M. Stallman and Roland H. Pesch, July 1991.
  ddd
    man ddd
  VNC
    http://www.uk.research.att.com/vnc/
    http://www.realvnc.com