Transcript
Page 1: TotalView Debugger On Blue Gene

Scalable Debugging with TotalView on Blue Gene

John DelSignore, CTO, TotalView Technologies

Page 2: TotalView Debugger On Blue Gene

TotalView Technologies – Confidential and Proprietary – Plans Subject to Change without Notice www.totalviewtech.com

Agenda

• TotalView on Blue Gene
– A little history
– Current status
• Recent TotalView improvements
– ReplayEngine (reverse debugging)
– Remote Display
– TotalView Script (batch debugging)
• Future work
– BG/*
– Heterogeneous systems
– Many core, transactional memory, speculative execution
– Peta-scale debugging

Page 3: TotalView Debugger On Blue Gene

Supported Blue Gene Architectures and Compilers

• Blue Gene/L and Blue Gene/P
• Languages / Compilers
– C/C++, Fortran, Assembly
– GNU Compilers
– IBM Compilers
– IBM OpenMP (on BG/P)
• Parallel Environments
– IBM MPI
– IBM OpenMP (on BG/P)
– Pthreads (BG/P)
• Runtime linking/loading (BG/P)
– Shared libraries
– Dynamically loaded shared libraries

Page 4: TotalView Debugger On Blue Gene

Blue Gene Architecture

• TotalView client (GUI/CLI) runs on the Front End node

• Client communicates with the TotalView debugger servers running on the I/O nodes via a socket

• The debugger servers communicate with the CIOD to control processes and threads running on the Compute nodes

• Fan-out ratios (CNs/server)
– BG/L: 32-64, 2 cores/CN, 128 threads/server
– BG/P: 128-256, 4 cores/CN, 1024 threads/server
– Ratio increasing (8K thr/svr?)
– Parallelize server operation
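The threads-per-server figures on this slide match the upper end of each fan-out range multiplied by the cores per compute node, assuming one thread per core. A minimal sketch of that arithmetic (the function name is illustrative and not part of TotalView):

```python
def threads_per_server(compute_nodes_per_server: int, cores_per_node: int) -> int:
    """Upper bound on threads one debugger server controls,
    assuming one thread per core (an assumption, not stated on the slide)."""
    return compute_nodes_per_server * cores_per_node


# Upper end of each fan-out range quoted on the slide.
bgl_threads = threads_per_server(64, 2)    # BG/L: 64 CNs/server, 2 cores/CN
bgp_threads = threads_per_server(256, 4)   # BG/P: 256 CNs/server, 4 cores/CN
print(bgl_threads, bgp_threads)
```

Under the same assumption, the "8K thr/svr?" question would correspond to any combination of fan-out and cores per node whose product reaches 8,192.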

Page 5: TotalView Debugger On Blue Gene

TotalView Blue Gene/L Support

• TotalView involvement since 2003
• Support for Blue Gene/L since 2005
• Debugging interfaces developed via close collaboration with IBM
• Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
– Heap memory debugging support added
– Blue Gene/L scaling and performance tuning project
– TotalView has debugged jobs as large as 8,192 processes (LLNL)
• Work on Blue Gene/L facilitated Blue Gene/P support

Page 6: TotalView Debugger On Blue Gene

TotalView Blue Gene/P Support

• Blue Gene/P supported since Q4 2007
• Continued close collaboration with IBM to develop multi-threaded debugging interfaces
• Support for shared libraries and dynamically loaded libraries
• Scalability improvements
• TotalView has debugged jobs as large as 32K (Jülich)

Page 7: TotalView Debugger On Blue Gene

TotalView Blue Gene/P Sites

• Currently running at over 30 sites in Germany, France, UK, and US, including
– Argonne
– Boston University
– Daresbury
– IDRIS
– Jülich
– LLNL
– Max Planck
– ORNL
– Princeton University
– Rensselaer Polytechnic Institute
• Jülich workshop, March 08
• Argonne workshop, May 08

Page 8: TotalView Debugger On Blue Gene

Recent TotalView Improvements on Blue Gene and Linux

• Remote Display
– Run a remote version of the TotalView GUI…
– …display it locally, with fast, interactive performance
– Easy, fast, secure
• tvscript
– Simplifies debugging batch jobs
– Event/action paradigm
– Configurable
• ReplayEngine
– Step execution back in time
– Uses reverse debugging technology
– Linux x86 and x86-64 (currently only)

Page 9: TotalView Debugger On Blue Gene

Remote Display

• Presents a window on your machine that will display TotalView executing on a remote system

• Two components:
– Client, which runs on the local system; available for Linux x86, x86-64, Windows XP, and Vista

– Server, which runs on any system supported by TotalView, invisibly managing the connections between the host and client

• The Client also provides for submission of jobs to batch queuing systems PBS Pro and LoadLeveler

Page 10: TotalView Debugger On Blue Gene

Batch Scripting

• Designed for debugging in a batch environment
• tvscript lets you define the events to act on and the actions to take when an event occurs
• Typical events
– Action point (e.g., breakpoint)
– Memory error (e.g., malloc returns 0, guard block corruption)
– Errors (e.g., SEGV, FPE)
• Typical actions
– Display a backtrace
– List memory leaks
– Print variables and arrays
• Configurable
– Supports external script files
– Allows generation of even more complex actions and events
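The event/action paradigm described on this slide can be illustrated with a small dispatch table. This is a sketch of the general idea only: it does not use tvscript's actual syntax or options, and every name in it (`EventActionTable`, `display_backtrace`, `print_variable`, the event strings) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical action type: takes a context dict, returns one line of report text.
Action = Callable[[dict], str]


class EventActionTable:
    """Maps an event name to the actions that run when the event occurs."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        # Register one more action for this event.
        self._actions[event].append(action)

    def fire(self, event: str, context: dict) -> List[str]:
        # Run the registered actions in order; unknown events produce no output.
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(ctx: dict) -> str:
    return "backtrace: " + " <- ".join(ctx["stack"])


def print_variable(ctx: dict) -> str:
    return f"{ctx['var']} = {ctx['value']}"


table = EventActionTable()
table.on("error", display_backtrace)
table.on("actionpoint", display_backtrace)
table.on("actionpoint", print_variable)

report = table.fire(
    "actionpoint", {"stack": ["solve", "main"], "var": "n", "value": 42}
)
print("\n".join(report))
```

In a batch job there is nobody to interact with the debugger, so everything the user wants to see has to be declared up front as event-to-action pairs like these and written to a log.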

Page 11: TotalView Debugger On Blue Gene

ReplayEngine

Step forward over functions

Step forward into functions

Advance forward out of current function, after the call

Advance forward to selected line 

Step backward over functions

Step backward into functions

Advance backward out of current function, to before the call

Advance backward to selected line

Advance forward to “live” session

• Intuitive user interface, integrated with TotalView
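The forward, backward, and "live" controls listed on this slide can be pictured as a cursor moving over recorded execution history. The toy model below stores every state explicitly. It is an illustration of the navigation concept only, not a description of how ReplayEngine is implemented, and the class and method names are hypothetical.

```python
class ReplayHistory:
    """Toy model: keep every recorded state and move a cursor over them."""

    def __init__(self, initial_state: dict) -> None:
        self._states = [initial_state]
        self._cursor = 0

    def is_live(self) -> bool:
        # "Live" means the cursor sits on the most recent recorded state.
        return self._cursor == len(self._states) - 1

    def record(self, state: dict) -> None:
        # New execution can only be recorded from the live end of the history.
        if not self.is_live():
            raise RuntimeError("return to the live session before recording")
        self._states.append(state)
        self._cursor += 1

    def step_back(self) -> dict:
        # Move one recorded state into the past, stopping at the beginning.
        if self._cursor > 0:
            self._cursor -= 1
        return self._states[self._cursor]

    def step_forward(self) -> dict:
        # Move one recorded state toward the present, stopping at the live end.
        if not self.is_live():
            self._cursor += 1
        return self._states[self._cursor]

    def go_live(self) -> dict:
        # Jump straight to the most recent state.
        self._cursor = len(self._states) - 1
        return self._states[self._cursor]


history = ReplayHistory({"line": 10, "x": 0})
history.record({"line": 11, "x": 1})
history.record({"line": 12, "x": 2})

earlier = history.step_back()    # state recorded at line 11
earliest = history.step_back()   # state recorded at line 10
live = history.go_live()         # back to the state recorded at line 12
```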

Page 12: TotalView Debugger On Blue Gene

Possible Future Blue Gene Work

• BG/* support
– Support future generations of Blue Gene
• Fast conditional breakpoints/watchpoints
– Expressions compiled/patched into target, execute in parallel, about 10 µs/expression
• Asynchronous thread control
– Thread barrier breakpoint, thread single stepping
• User programmable visual data
– Allows users to define complex data access functions
• Debugging optimized code
• Post-mortem debugging
• Fast DLL debugging interface
• LLNL collaboration for scalable subset attach
– Integrates with lightweight tools such as STAT

Page 13: TotalView Debugger On Blue Gene

Possible Other Future Work

• Scalability/performance
– Continue scalability and performance improvements
– Tree-based infrastructure for logarithmic scaling
– Peta-scale debugging
– Hundreds of thousands of threads
• Heterogeneous systems
– IBM Roadrunner (x86-64/Cell)
– GPUs
• Emerging technologies
– Many core
– Transactional memory
– Speculative execution
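The "logarithmic scaling" of a tree-based infrastructure refers to the number of levels between the root and the leaves growing with the logarithm of the leaf count. A minimal sketch, assuming a balanced tree with uniform fan-out (the fan-out of 32 and the endpoint counts are illustrative values, not TotalView parameters):

```python
def tree_depth(endpoints: int, fanout: int) -> int:
    """Levels needed below the root for a balanced tree to reach all endpoints."""
    if endpoints < 1 or fanout < 2:
        raise ValueError("need at least 1 endpoint and a fan-out of 2 or more")
    depth = 0
    reachable = 1  # a tree of depth 0 is just the root
    while reachable < endpoints:
        reachable *= fanout  # each extra level multiplies the reach by the fan-out
        depth += 1
    return depth


# A flat layout would need the root to talk to every endpoint directly;
# with a tree, the root talks to at most `fanout` children.
print(tree_depth(1024, 32))      # 2
print(tree_depth(100_000, 32))   # 4
```

Going from about a thousand endpoints to a hundred thousand adds only two levels in this sketch, which is why a tree is attractive at the thread counts mentioned on the slide.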

Page 14: TotalView Debugger On Blue Gene

Questions?  

More Information

• Blue Gene Technical Development Interest Group
– Contact [email protected]
• Technical support
– [email protected]
• BG LLNL case study
– www.totalviewtech.com/pdf/case_study_scientific_computing.pdf
• Customer training or webinars
– [email protected]
• Web site
– www.totalviewtech.com

