Parallel Debugging Techniques & Introduction to TotalView

Date post:25-Feb-2016
Transcript:

Parallel Debugging Techniques & Introduction to TotalView
Le Yan
Louisiana Optical Network Initiative
7/6/2010, Scaling to Petascale Virtual Summer School

Outline
- Overview of parallel debugging
  - Challenges
  - Tools
  - Strategies
- Get familiar with TotalView through hands-on exercises

Bugs in Parallel Programming
- Parallel programs are prone to the usual bugs found in sequential programs
  - Improper pointer usage
  - Stepping over array bounds
  - Infinite loops
- Plus bugs that are specific to parallel programs

Common Types of Bugs in Parallel Programming
- Erroneous use of language features
  - Mismatched parameters, missing mandatory calls, etc.
- Defective space decomposition
- Incorrect/improper synchronization
- Hidden serialization

Source: http://www.hpcbugbase.org/index.php/Main_Page

Debugging Essentials
- Reproducibility: find the scenario where the error is reproducible
- Reduction: reduce the problem to its essence
- Deduction: form hypotheses on what the problem might be
- Experimentation: filter out invalid hypotheses
Source: Terence Parr, "Learn the Essentials of Debugging", http://www.ibm.com/developerworks/web/library/wa-debug.html?ca=dgr-lnxw03Dbug

Challenges in Parallel Debugging
- Reproducibility: many problems cannot be easily reproduced
- Reduction: the smallest scale might still be too large and complex to handle
- Deduction: need to consider concurrent and interdependent program instances
- Experimentation: cyclic debugging might be very expensive

Bugs: A Little Example
What is the potential problem with a large core count?

    integer*4 :: i,ista,iend
    integer*4 :: chunksize=1024*1024
    call MPI_Comm_Rank(MPI_COMM_WORLD, &
         myrank,error)
    ista=myrank*chunksize+1
    iend=(myrank+1)*chunksize
    do i = ista,iend
    enddo

Bugs: A Little Example (answer)
- A bug that shows up only when running with more than 4096 cores
- Integer overflow at large myrank (the slide puts the threshold at myrank 4096): with chunksize = 2^20, iend = (myrank+1)*chunksize already exceeds the signed 32-bit maximum 2^31 - 1 at myrank = 2047

Debugging with write/printf
- Very easy to use and most portable, but
  - Need to edit, recompile and rerun when additional information is desired
  - May change program behavior
  - Only capable of displaying a subset of the program's state
  - Output size grows rapidly with increasing core count and becomes harder to comprehend
- Not recommended

Compilers Can Help
- Most compilers can (at runtime)
  - Check array bounds
  - Trap floating-point operation errors
  - Provide traceback information
- Relatively scalable, but
  - Overhead added
  - Limited capability
  - Non-interactive

Parallel Debuggers
- Capable of what serial debuggers can do
  - Control program execution
  - Set action points
  - View/edit values of variables
- More importantly
  - Control program execution at various levels: group/process/thread
  - Display communication status between processes
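
The integer overflow from the "Bugs: A Little Example" slides can be checked without running on thousands of cores. The following is a minimal C sketch, not the author's code: it recomputes ista and iend in 64-bit arithmetic and tests whether they still fit in the 32-bit type (integer*4) used on the slide. The names chunk_range, chunk_bounds, fits_in_int32 and first_overflowing_rank are hypothetical, and chunksize is assumed to be positive.

```c
#include <stdint.h>

/* Index range owned by one rank, held in 64-bit integers so that the
 * products below cannot wrap for realistic rank counts. */
struct chunk_range {
    int64_t ista;
    int64_t iend;
};

/* Same formulas as on the slide, evaluated in 64-bit arithmetic. */
struct chunk_range chunk_bounds(int64_t myrank, int64_t chunksize)
{
    struct chunk_range r;
    r.ista = myrank * chunksize + 1;
    r.iend = (myrank + 1) * chunksize;
    return r;
}

/* Returns 1 if both bounds still fit in a signed 32-bit integer,
 * 0 otherwise. */
int fits_in_int32(struct chunk_range r)
{
    return r.ista <= INT32_MAX && r.iend <= INT32_MAX;
}

/* Smallest rank whose bounds no longer fit in 32 bits.
 * Assumes chunksize > 0, otherwise the search would not terminate. */
int64_t first_overflowing_rank(int64_t chunksize)
{
    int64_t rank = 0;
    while (fits_in_int32(chunk_bounds(rank, chunksize)))
        rank++;
    return rank;
}
```

With chunksize = 1024*1024, first_overflowing_rank returns 2047, because (2047+1)*2^20 = 2^31 is one more than the largest signed 32-bit value. Declaring the loop bounds as 64-bit integers (integer*8 in Fortran) is one way to avoid the problem.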

An Ideal Parallel Debugger
- Should allow easy process/thread control and navigation
- Should support multiple high performance computing platforms
- Should not limit the number of processes being debugged and should allow it to vary at runtime

How Parallel Debuggers Work
- Frontend
  - GUI
  - Debugger engine
- Debugger agents
  - Control application processes
  - Send data back to the debugger engine to analyze

[Figure: debugger architecture, with labels: interactive node, GUI, debugger engine, compute nodes, agents, user processes]

Debugging at Very Large Scale
- The debugger itself becomes a large parallel application
- Bottlenecks
  - Debugger framework startup cost
  - Communication between frontend and agents
  - Access to shared resources, e.g. file system

Validation Is Crucial
- Have a solid validation procedure to check correctness
- Test smaller components before putting them together

General Parallel Debugging Strategy
- Incremental debugging
  - Downscale if possible
    - Participating processes, problem size and/or number of iterations
    - Example: run with a single thread to detect scope errors in OpenMP programs
  - Add more instances to reveal other issues
    - Example: run MPI programs on more than one node to detect problems introduced by network delays

Strategy at Large Scale
- Again, downscale if possible
- Reduce the number of processes to which the debugger is attached
  - Reduces overhead
  - Reduces the required number of license seats as well
- Focus on one or a small number of processes/threads
  - Analyze call path and message queues to find problematic processes
  - Control the execution of as few processes/threads as possible while keeping others running
    - Provides the context where the error occurs

Trends in Debugging Technology
- Lightweight trace analysis tools
  - Help to identify processes/threads that have similar behavior and reduce the search space
  - Complementary to full-featured debuggers
  - Example: Stack Trace Analysis Tool (STAT)
- Replay/reverse execution
  - ReplayEngine now available from TotalView
- Post-mortem statistical analysis
  - Detect anomalies by analyzing profile dissimilarity of multiple runs

What Is TotalView
- A powerful debugger for both serial and parallel programs
- Supports Fortran, C/C++ and assembler
- Supported on most platforms
- Both graphical and command line interfaces
- Features
  - Common debugging functions such as execution control and breakpoints
  - Memory debugging
  - Reverse debugging
  - Batch mode debugging
  - Remote debugging client

Three Ways to Start TotalView
- Start with core dumps
- Start by attaching to one or more running processes
- Start the executable within TotalView

User Interface - Root Window
- Columns: host name, status, TotalView ID, MPI rank

  Status Code   Description
  (blank)       Exited
  B             At breakpoint
  E             Error
  H             Held
  K             In kernel
  M             Mixed
  R             Running
  T             Stopped
  W             At watchpoint

User Interface - Process Window
- Stack trace pane: call stack of routines
- Stack frame pane: local variables, registers and function parameters
- Source pane: source code
- Action points, processes, threads pane: manage action points, processes and threads

Control Commands

  TotalView   Description
  Go          Start/resume execution
  Halt        Stop execution
  Kill        Terminate the job
  Restart     Restart a running program
  Next        Run to the next source line without stepping into another function
  Step        Run to the next source line (stepping into called functions)
  Out         Run to the completion of the current function
  Run to      Run to the indicated location

Controlling Execution
- The process window always focuses on one process/thread
- Switch between processes/threads
  - p+/p-, t+/t-, double click in root window, process/thread tab
- Need to set the appropriate scope when
  - Giving control commands
  - Setting action points

Process/Thread Groups
- Scope of commands and action points
  - Group (control): all processes and threads
  - Group (workers): all threads that are executing user code
  - Rank X: current process and its threads
  - Process (workers): user threads in the current process
  - Thread X.Y: current thread
- User defined group
  - Group -> Custom Groups, or
  - Create in call graph

Types of Action Points
- Breakpoints stop the execution of the processes and threads that reach them
- Evaluation points stop and execute a code fragment when reached
  - Useful when testing small patches
- Process barrier points synchronize a set of processes or threads
- Watchpoints monitor a location in memory and stop execution when its value changes
  - Unconditional
  - Conditional

Setting Action Points
- Breakpoints
  - Right click on a source line -> Set breakpoint
  - Click on the line number
- Watchpoints
  - Right click on a variable -> Create watchpoint
- Barrier points
  - Right click on a source line -> Set barrier
- Edit action point properties
  - Right click on an action point in the Action Points tab -> Properties

Viewing/Editing Data
- View values and types of variables
  - At one process/thread
  - Across all processes/threads
- Edit variable value and type
- Array data
  - Slicing
  - Filtering
  - Visualization
  - Statistics

Viewing Dynamic Arrays in C/C++
- Edit the type in the variable window
- Tell TotalView how to access the memory from a starting location
- Example: to view an array of 100 integers
  - Change int * to int[100]*

MPI Message Queues
- Detect
  - Deadlocks
  - Load balancing issues
- Tools -> Message Queue Graph
- More options available
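
The type edit described under "Viewing Dynamic Arrays in C/C++" has a close analogue in plain C, sketched below under the assumption that TotalView's int[100]* reads as "pointer to an array of 100 ints" (written int (*)[100] in C). The names int100, make_sequence and as_array_of_100 are hypothetical and only serve the illustration.

```c
#include <stdlib.h>

/* An array of 100 ints as a named type, so that "pointer to an array
 * of 100 ints" can be written simply as int100 *. */
typedef int int100[100];

/* Allocate n ints on the heap and fill them with 0..n-1.
 * Returns NULL if the allocation fails; the caller must free(). */
int *make_sequence(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return p;
}

/* View a plain int * as a pointer to an array of 100 ints.
 * Only meaningful if p really points at 100 or more contiguous ints;
 * the pointer itself carries no length information. */
int100 *as_array_of_100(int *p)
{
    return (int100 *)p;
}
```

A plain int * only tells a debugger where the first element lives. After the cast, sizeof(*view) is 100 * sizeof(int) and (*view)[99] reaches the last element, which is the extra information the type edit supplies.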

Call Graph
- Tools -> Call Graph
- Quick view of program state
  - Nodes: functions
  - Edges: calls
- Look for outliers

Attaching to/Detaching from Processes
- You can
  - Attach to one or more running processes after launching TotalView
  - Launch the program within TotalView and detach from/reattach to any subset of processes later on

Memory Debugging
- Features
  - Memory usage report
  - Error detection
    - Memory leak
    - Dangling pointer
    - Memory corruption
  - Event notification
    - Deallocation/reallocation
  - Memory comparison between processes
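
As a rough illustration of the three error classes listed above, the C sketch below shows an allocation pattern that avoids them, with each bug described in a comment rather than executed so that the block runs cleanly. The function names copy_ints and release_ints are hypothetical and not part of TotalView.

```c
#include <stdlib.h>
#include <string.h>

/* Copy n ints into a fresh heap buffer; the caller owns the result.
 * Memory leak: dropping the returned pointer without calling free().
 * Memory corruption: allocating n bytes instead of n * sizeof(int),
 * so that memcpy writes past the end of the block. */
int *copy_ints(const int *src, size_t n)
{
    int *dst = malloc(n * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, n * sizeof *dst);
    return dst;
}

/* Free the buffer and clear the caller's pointer.
 * Dangling pointer: calling free(p) and then still reading p[i]. */
void release_ints(int **p)
{
    free(*p);
    *p = NULL;
}
```

Clearing the pointer after free turns a later use into an immediate null dereference, which is usually easier to spot in a debugger than a read of recycled heap memory.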

Memory Debugging - Usage
- Need to link to the TotalView heap library to monitor heap status
  - The name of the library is platform dependent
- To access memory debugging functions
  - Prior to 8.7: Tools -> Memory debugging
  - Since 8.7: Debug -> Open MemoryScape

References
- TotalView user manual: http://www.totalviewtech.com/support/documentation/totalview/index.html
- LLNL TotalView tutorial: https://computing.llnl.gov/tutorials/totalview
- NCSA Cyberinfrastructure Tutor: "Debugging Serial and Parallel Codes" course
- HPCBugBase: http://hpcbugbase.org/index.php/Main_Page

Hands-on Exercise
- Debug MPI and OpenMP programs that solve a simple problem to get familiar with
  - Basic functionality of parallel debuggers
    - TotalView: BigRed, Kraken, Steele and Queen Bee
    - DDT: BigRed, Kraken, Ranger and Lonestar
  - Some common types of bugs in parallel programming
- Programs and instructions can be found at http://www.cct.lsu.edu/~lyan1/summerschool10

Problem
[Figure: a 1-D periodic array of cells holding single-digit values]
- A 1-D periodic array with N elements
- Initial value
  - C: cell(x) = x%10
  - Fortran: cell(x) = mod(x-1,10)
- In each iteration, all elements are updated with the value of two adjacent elements (subscript i is the iteration index):
  cell(x)_{i+1} = [cell(x-1)_i + cell(x+1)_i] % 10
- Execute Niter iterations
- The final outputs are the global maximum and average
Source: http://www.hpcbugbase.org/index.php/Main_Page

Sequential Program
- Use an integer array to hold current values
- Use another integer array to hold the calculated values
- Swap the pointers at the end of each iteration
- The result is used to check the correctness of the parallel programs
  - Chances are that we will not have such a luxury for large jobs

MPI Program
- Divide the array among n processes
- Each process works on its local array
- Exchange boundary data with neighbor processes at the end of each iteration
  - Ring topology
[Figure: the global array divided among Process 1, Process 2, ..., Process n]

OpenMP Program
- Each thread works on its own part of the global array
- All threads have access to the entire array, so no data exchange is necessary
[Figure: the global array divided among Thread 0, Thread 1, ..., Thread n]
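
The sequential program described above can be sketched in C as follows. This is a minimal reconstruction from the slide text, not the code distributed with the exercise: the names cell_stats and run_sequential are hypothetical, and the C initial condition cell(x) = x%10 with 0-based indexing is assumed.

```c
#include <stdlib.h>

/* Final outputs of a run: global maximum and average of the cells. */
struct cell_stats {
    int max;
    double average;
};

/* Run niter iterations on a periodic array of n cells.
 * Returns 0 on success, -1 on invalid input or allocation failure. */
int run_sequential(int n, int niter, struct cell_stats *out)
{
    if (n <= 0 || niter < 0 || out == NULL)
        return -1;

    int *cur = malloc((size_t)n * sizeof *cur);
    int *next = malloc((size_t)n * sizeof *next);
    if (cur == NULL || next == NULL) {
        free(cur);
        free(next);
        return -1;
    }

    /* Initial value: cell(x) = x % 10 */
    for (int x = 0; x < n; x++)
        cur[x] = x % 10;

    for (int it = 0; it < niter; it++) {
        for (int x = 0; x < n; x++) {
            int left = cur[(x + n - 1) % n];   /* periodic boundary */
            int right = cur[(x + 1) % n];
            next[x] = (left + right) % 10;
        }
        /* Swap the pointers instead of copying the array. */
        int *tmp = cur;
        cur = next;
        next = tmp;
    }

    long long sum = 0;
    int max = cur[0];
    for (int x = 0; x < n; x++) {
        sum += cur[x];
        if (cur[x] > max)
            max = cur[x];
    }
    out->max = max;
    out->average = (double)sum / n;

    free(cur);
    free(next);
    return 0;
}
```

For n = 10 and one iteration the cells become 0, 2, 4, 6, 8, 0, 2, 4, 6, 8, so the maximum is 8 and the average is 4.0. A small case like this, worked out by hand, gives a reference value for checking the MPI and OpenMP versions.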
