Debugging

Kate Hedstrom, August 2006

Overview
• Think before coding
• Common mistakes
• Defensive programming
• Core files
• Interactive debugging
• Other tips
• Parallel bug story
• Demo

Before Programming
• Think about the structure of the program you are writing
• What are the data structures?
• Careful planning can lead to programs that are:
  – Easier to debug
  – Easier to understand later, when modifications prove necessary
• There’s a whole industry around tools for program design

Various Problems
• Failure to compile
  – The first compiler message is valid; the rest could be due to confusion caused by the first error
• Failure to link
  – Missing routines
  – Missing libraries
• Failure to run
• Runs but gives the wrong answer

Some Common Mistakes
• Number and type of arguments
• Misspelled variables
• Uninitialized variables
• Failure to match up do/if and the end do/if
• Index out of range
• Array size too small
• Parallel bugs
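
Uninitialized variables and an index out of range often appear together in a single loop. The C sketch below avoids both by initializing the accumulator explicitly and looping with `i < n` rather than `i <= n`. The helper `sum_array` is hypothetical and does not come from the talk.

```c
/* Hypothetical helper: sum the first n elements of a. */
double sum_array(const double *a, int n)
{
    double total = 0.0;            /* initialize: never rely on leftover memory */
    for (int i = 0; i < n; i++) {  /* "i <= n" would read one element past the end */
        total += a[i];
    }
    return total;
}
```

For `double a[4] = {1.0, 2.0, 3.0, 4.0};`, `sum_array(a, 4)` returns 10. With `i <= n`, the loop would also read the nonexistent `a[4]`.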

Defensive programming

• Let the compiler help you find problems
• Implicit none (Fortran)
• Use modules or interface blocks to let the compiler check the argument count/type for you (Fortran 90)
• Check error codes on function calls
• Write useful comments!
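
Here is a C sketch of checking an error code. `open_input` is a hypothetical helper. It assumes a POSIX system, where `fopen` sets `errno` on failure; the C standard alone does not guarantee that.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a file for reading and say why it failed,
 * instead of letting a NULL FILE pointer crash the program later. */
FILE *open_input(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        /* On POSIX systems fopen sets errno; strerror turns it into text. */
        fprintf(stderr, "open_input: unable to open %s: %s\n",
                path, strerror(errno));
    }
    return fp;
}
```

The caller still has to test the returned pointer, in the same way that the Fortran example checks `status` against `nf_noerr`.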

Messages
• Assert (C/C++):

    #include <assert.h>
    assert(g == 9.8);

• Fortran example:

    GET_2DFLD - unable to find requested variable:
    In input file: /wrkdir/kate/….
    ERROR: Abnormal termination: NetCDF INPUT
    REASON: No error

    if (.not. Got_var) then
      write (stdout,10) trim(Vname(1,ifield)), trim(ncfile)
      exit_flag = 2
      return
    end if
    status = nf_open(trim(ncfile), nf_nowrite, ncid)
    if (status .ne. nf_noerr) then
      write (stdout,20) trim(ncfile)
      exit_flag = 3
      ioerror = status
      return
    end if
    10 format (/, 'GET_2DFLD - unable to find …)
    20 format (/, 'GET_2DFLD - unable to open NetCDF file: ', a)
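
The `assert(g == 9.8)` line on the Messages slide passes only if `g` holds exactly the double nearest 9.8. If `g` is a `float`, it is promoted to a slightly different double and the assertion fails. The sketch below compares against a tolerance instead, using a hypothetical `close_to` helper. Keep in mind that `assert` is compiled out when `NDEBUG` is defined. Appending `&& "text"` to the condition makes the text appear in the failure message.

```c
#include <assert.h>

/* Hypothetical helper: is value within tol of expected? */
int close_to(double value, double expected, double tol)
{
    double diff = value - expected;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= tol;
}

/* A float copy of 9.8 is not exactly the double 9.8, so
 * "assert(g == 9.8)" would fail here; the tolerance check passes. */
void check_gravity(float g)
{
    assert(close_to(g, 9.8, 1.0e-5) && "gravity is not 9.8");
}
```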

• C example:

    if (init_graph(graph) == OKAY) {
        while ((count < MAX_EDGES) && !ferror(source) && !feof(source)) {
            if (fgets(line, MAX_LEN, source) != NULL) {
                linenum++;
                if (sscanf(line, "%d %d: %d\n", …) == 3) {
                    :
                } else {
                    fprintf(stderr, "%s[%s()] Error: sscanf couldn't parse line #%d\n",
                            progname, proc, linenum);
                    fprintf(stderr, "line = \"%s\"\n", line);
                    return(-2);
                }
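
The C example above is a fragment. Below is a self-contained version of the same pattern that compiles on its own. It is a sketch, not the original program:

- `MAX_LEN` and `MAX_EDGES` are given assumed values.
- The function name `read_edges` and the field names are hypothetical.
- Each edge is only counted, not stored.

The loop tests the return value of `fgets` instead of calling `feof` before reading, which detects the end of input more reliably. It then checks `ferror` once the loop ends.

```c
#include <stdio.h>

#define MAX_LEN   256   /* assumed line-buffer size */
#define MAX_EDGES 1000  /* assumed limit on edges read */

/* Hypothetical reader: each line should look like "from to: weight".
 * Returns the number of edges read, -2 on an unparsable line,
 * or -1 on a read error. */
int read_edges(FILE *source, const char *progname)
{
    static const char proc[] = "read_edges";
    char line[MAX_LEN];
    int count = 0, linenum = 0;
    int from, to, weight;

    while (count < MAX_EDGES && fgets(line, sizeof line, source) != NULL) {
        linenum++;
        if (sscanf(line, "%d %d: %d", &from, &to, &weight) == 3) {
            count++;  /* a real program would store the edge here */
        } else {
            fprintf(stderr, "%s[%s()] Error: sscanf couldn't parse line #%d\n",
                    progname, proc, linenum);
            fprintf(stderr, "line = \"%s\"\n", line);
            return -2;
        }
    }
    if (ferror(source)) {
        fprintf(stderr, "%s[%s()] Error: read failed after line #%d\n",
                progname, proc, linenum);
        return -1;
    }
    return count;
}
```

Given a stream containing the lines `1 2: 3` and `4 5: 6`, the function returns 2. If the second line were `bad line`, it would print the line number and contents to `stderr` and return -2.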

Modular Programming and Testing

• Write programs in components or modules
• Test them individually
• There are “test harnesses” for creating and managing tests
  – Many GNU programs can be tested with “make check” after the “make”
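
A test harness can be as small as a macro that records failures. The sketch below is hypothetical; `clamp` stands in for any module routine. The macro reports the file, line, and failing expression. The test function returns the failure count, so a `main` can turn it into an exit status.

```c
#include <stdio.h>

/* Hypothetical module routine under test. */
static int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

static int failures = 0;

/* Report the failing expression with its location, then keep going. */
#define CHECK(expr)                                              \
    do {                                                         \
        if (!(expr)) {                                           \
            fprintf(stderr, "%s:%d: check failed: %s\n",         \
                    __FILE__, __LINE__, #expr);                  \
            failures++;                                          \
        }                                                        \
    } while (0)

/* Run every check; returns the number of failures (0 means all passed). */
int run_clamp_tests(void)
{
    failures = 0;
    CHECK(clamp(5, 0, 10) == 5);    /* inside the range */
    CHECK(clamp(-3, 0, 10) == 0);   /* below: clamped to lo */
    CHECK(clamp(42, 0, 10) == 10);  /* above: clamped to hi */
    return failures;
}
```

Automake's test driver runs behind `make check` and interprets exit statuses as follows:
- 0 counts as a pass.
- 77 marks a skipped test.
- 99 signals a hard error.
- Any other nonzero value is a failure.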

Other tips
• Check cpp labels:
  – ifdefs
  – ifnames
• Bounds checking (-C)
• Floating-point trap (-qflttrap=enable:invalid:imprecise on IBM)
• Try another compiler, and write portable code
• There is no shame in using print statements
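
Floating-point trap options such as `-qflttrap` are compiler-specific. A portable but cruder check in C is to scan a field for NaN or infinity with the C99 `isfinite` macro and report the first bad index. The function below is a hypothetical sketch. Options such as GCC's `-ffast-math` let the compiler assume that NaNs never occur, which can defeat the check.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical check: return the index of the first NaN or infinite value
 * in field[0..n-1], or -1 if all values are finite. */
int first_bad_value(const double *field, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (!isfinite(field[i])) {  /* true for NaN, +Inf and -Inf */
            fprintf(stderr, "%s: non-finite value %g at index %d\n",
                    name, field[i], i);
            return i;
        }
    }
    return -1;
}
```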

Core files
• Contain a binary dump of your program as it crashed
• Can extract a stack trace from it
• If you recompile with -g, the core might contain enough information to solve your problem
• Check your limits; you might be truncating your cores
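
On POSIX systems, a program can check and raise its own core-file limit with `getrlimit`/`setrlimit` and `RLIMIT_CORE`. From the shell, `ulimit -c` shows the same limit. A minimal sketch follows; the name `allow_full_core` is hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit as far as the hard limit allows,
 * so a crash is not cut short by a small soft limit.
 * Returns 0 on success, -1 if getrlimit or setrlimit fails. */
int allow_full_core(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* an unprivileged process may raise soft up to hard */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling it near the top of `main` lifts the soft limit to the hard limit. Raising the hard limit itself needs privileges, and where the core file ends up can still depend on system settings.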

Causes of Core Files

• Not enough memory
• Segmentation violation
  – Not enough stack space
  – Wrong number of function arguments
• Floating-point error, if not using the IEEE standard
• I/O error
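
The stack-space case often comes from large local arrays. In C, a local `double work[2000][2000]` needs about 32 MB of stack. That is well above the 8 MB default stack limit on many Linux systems, so it can end in a segmentation violation. The sketch below shows the usual fix: allocate on the heap instead, where running out of memory shows up as a `NULL` return. The grid size is assumed and `make_work_array` is a hypothetical helper.

```c
#include <stdlib.h>

#define NX 2000  /* assumed grid size */
#define NY 2000

/* Instead of a local "double work[NX][NY];" (about 32 MB on the stack),
 * allocate the work array on the heap. Returns NULL if memory is short,
 * which the caller can report instead of crashing. */
double *make_work_array(void)
{
    /* calloc zero-fills; all-bits-zero is 0.0 for IEEE 754 doubles */
    return calloc((size_t)NX * NY, sizeof(double));
}
```

Element (i, j) is then `work[i * NY + j]`, and the array is released with `free(work)`.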

More on Core Files
• Running “file” on it will tell you the executable name:

    % file core
    core: AIX core file fulldump 64-bit, ncra

• I prefer dbx to Totalview on core files:

    % dbx ncra core
    (dbx) where
    abort() at 0x9000…
    nco_exit(??), line 28 in nco_ctl.c
    main(argc = 47, argv = 0x0fff….), line 548 in ncra.c

Interactive Debuggers

• Totalview:
  – Is on both Cray and IBM
  – Has a GUI
  – Works for parallel programs
  – Is worth learning
  – Isn’t my favorite debugger
• Text-based debuggers:
  – dbx, gdb, etc.
  – Some have had GUI wrappers (xxgdb, for instance)

Debugger Uses

• Finding bugs
• Helping to understand the code
  – Watch variables change
  – Watch the flow of control
  – The Perl debugger helped me learn Perl

Debugger Features
• Set breakpoints
• Execute:
  – run/go
  – step
  – next
• View variables
• Works for each process/thread
• Debug the serial version first!

Tips for Totalview on IBM

• Use -qfullpath as well as the -g compiler option
• When your program reads from standard input, invoke it as:

    totalview roms < roms.in

• Doesn’t work right with -q64 or -qflttrap on IBM

dbx/gdb Commands
• help - list of commands
• where - stack trace (call trace)
• print - give the value of an expression
• break/stop in/stop at - set a breakpoint
• run - start execution, stopping at the first breakpoint
• cont - continue to the next breakpoint
• step - step into a function
• next - execute the next line, stepping over function calls
• list - list the next ten lines of source code
• quit - how to get out

Debugger Caveats

• Debuggers have bugs too
• Debugger developers code in C/C++ and don’t focus on Fortran
• If you don’t know where the problem is, you can spend an awful lot of time in the debugger

Miscellaneous

• Any parallel program should be compared to the serial version
• Did you overflow your quota?
• Can the processor see the filesystem?
• Try recompiling after “make clean”
• Are you solving the equations you think you’re solving?

Compiler Bugs
• More common than you might think
• Again, try other compilers
• Try turning off optimization
• I once had a situation where adding a print statement made the problem go away
• Auto-parallel compilers are especially buggy

Parallel Bug Story

• It’s always a good idea to compare the serial and parallel runs

• I can plot the difference field between the two outputs

• I can create a differences file with ncdiff (part of NCO)

[Plot: Differences after a day]

[Plot: Differences after one step, in a part of the domain without ice]

What’s up?

• A variable was not being initialized properly: an “if” statement without an “else”
• Both the serial and parallel values are random junk
• Fixing this did not fix the one-day plot
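
This class of bug is easy to reproduce. In the hypothetical C sketch below, a result assigned only inside an `if` would hold whatever happened to be in memory when the condition is false. Giving it a default value, or adding the `else`, fixes it. GCC's `-Wall` can often warn about the buggy form, especially with optimization enabled.

```c
/* Hypothetical example of the "if without else" bug and its fix.
 *
 * Buggy form, for comparison only: 'src' is never assigned when
 * x >= threshold, so the caller receives leftover memory.
 *
 *     double src;
 *     if (x < threshold)
 *         src = rate * (threshold - x);
 *     return src;
 */
double source_term(double x, double threshold, double rate)
{
    double src = 0.0;  /* default covers the missing "else" */
    if (x < threshold) {
        src = rate * (threshold - x);
    }
    return src;
}
```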

[Plot: Differences after a few steps. Guess where the tile boundaries are.]

What was That?
• The ocean code does a check for water colder than the local freezing point
• It then forms ice and tells the ice model about the new ice
• It adjusts the local temperature and salinity to account for the ice growth (warmer and saltier)
• It then failed to update the salinity and temperature ghost points
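
A missing ghost-point update is easy to picture in one dimension. In the sketch below, each tile keeps copies of its neighbours' edge values. Any change to an edge value has to be followed by an exchange before those copies are read again. The sketch is hypothetical: the tile count, tile size, and names are all assumptions, and it uses plain arrays in one address space rather than MPI.

```c
#define NT 2  /* number of tiles (assumed) */
#define NI 4  /* interior points per tile (assumed) */

/* Each tile holds NI interior points plus one ghost point per side:
 * t[0] and t[NI + 1] are ghosts, t[1] .. t[NI] are interior. */
typedef struct {
    double t[NI + 2];
} Tile;

/* Copy each tile's edge interior values into its neighbours' ghost points.
 * Must be called after any update that changes an edge interior value.
 * The outer ghosts at the ends of the domain are left to boundary code. */
void exchange_ghosts(Tile tiles[NT])
{
    for (int k = 0; k < NT - 1; k++) {
        tiles[k].t[NI + 1] = tiles[k + 1].t[1];  /* right ghost of tile k */
        tiles[k + 1].t[0] = tiles[k].t[NI];      /* left ghost of tile k+1 */
    }
}
```

In the story above, the adjustment for new ice changed interior values, but the neighbouring tiles' ghost copies were not refreshed. As a result, the differences appeared along the tile boundaries.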

More…
• Plotting the differences in surface temperature after one step failed to show this
• The change was very small and the single-precision plotting code couldn’t catch it
• Differences did show up in timestep two of the ice variables
• Running ncdiff on the first step, then asking for the min/max values in temperature, showed a problem
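
The arithmetic behind the missed difference: near 20, consecutive single-precision values are 2^-19 (about 1.9e-6) apart. A change of 1e-7 therefore disappears when the data are converted to `float`. A small hypothetical C check:

```c
/* Returns 1 if a and b differ in double precision but become identical
 * once converted to float, i.e. a single-precision plot of the two
 * values would show no difference at all. */
int difference_lost_in_float(double a, double b)
{
    return a != b && (float)a == (float)b;
}
```

`difference_lost_in_float(20.0, 20.0 + 1.0e-7)` returns 1, while `difference_lost_in_float(20.0, 20.001)` returns 0. Forming the difference before any conversion to single precision, and then looking at its min/max, avoids the problem.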

Debugging
• I didn’t then know how to use Totalview in parallel (fixed!)
• I don’t have good luck with Totalview and 64-bit code
• Enclosing print statements inside if statements keeps every process from printing; without the guard, processes may try to print out-of-range values
• Find the i,j value of the worst point from the diff file, then print just that point for many fields
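
Both tips can be sketched in C. `worst_point` (hypothetical) scans two fields, for example serial and parallel output, for the largest absolute difference. `owns_point` (also hypothetical, with assumed inclusive tile bounds) is the guard that lets only the process owning that point print it.

```c
/* Hypothetical sketch: find the largest |a - b| between two fields stored
 * row-major as nj rows of ni values, and report where it occurs. */
double worst_point(const double *a, const double *b, int ni, int nj,
                   int *imax, int *jmax)
{
    double worst = -1.0;  /* any real difference is >= 0 */
    *imax = -1;
    *jmax = -1;
    for (int j = 0; j < nj; j++) {
        for (int i = 0; i < ni; i++) {
            double d = a[j * ni + i] - b[j * ni + i];
            if (d < 0.0) {
                d = -d;
            }
            if (d > worst) {
                worst = d;
                *imax = i;
                *jmax = j;
            }
        }
    }
    return worst;
}

/* Nonzero if global point (i, j) falls inside this process's tile,
 * given inclusive bounds [istr, iend] x [jstr, jend]. */
int owns_point(int i, int j, int istr, int iend, int jstr, int jend)
{
    return i >= istr && i <= iend && j >= jstr && j <= jend;
}
```

A typical use is `if (owns_point(i, j, istr, iend, jstr, jend)) printf(...);`, repeated for each field of interest at the same i,j.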

Last Word

• In my field, it is the problems that blow up right away that are the easiest to fix. You can see things go bad in the debugger, perhaps in the very first timestep. The problems that blow up after days and days of CPU time are more challenging and might require a complete rewrite of the model.

