Improving the Scalability of the TotalView Debugger using TBONs

Michael J. Brim, Paradyn Project, University of Wisconsin
John DelSignore, Rogue Wave Software

CScADS, August 1, 2011

Improving the Scalability of the TotalView Debugger using TBON-FS and proc++

The Tool Scalability Problem

Key tasks:
o Application control
o Data collection
o Data centralization/analysis

As scale increases, the front-end becomes a bottleneck.

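[Figure: the TotalView Debugger front-end (FE) connected directly to every back-end (BE), each serving several application processes; labels O(10,000) and O(1,000,000) mark the back-end and application-process counts.]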

Tree-Based Overlay Networks (TBONs)

o Scalable multicast

o Scalable gather

o Scalable data aggregation

o Natural redundancy

[Figure: the front-end (FE) reaches the back-ends (BE) through a tree of communication processes (CP).]
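A rough way to see why the tree helps: in the flat layout the front-end services every back-end directly, while in a balanced tree with fan-out k it services only k children and the tree needs about ⌈log_k N⌉ levels. The C sketch below computes both quantities with integer arithmetic; the function names and the fan-out values used are illustrative assumptions, not MRNet parameters.

```c
/* Hops from the front-end down to the back-ends in a balanced tree with the
 * given fan-out that can reach at least n_backends leaves.  One hop is the
 * flat layout (front-end wired straight to the back-ends), so the number of
 * communication-process levels is hops - 1.  Integer arithmetic avoids the
 * rounding problems of log().  Returns 0 for a fan-out below 2, which
 * cannot form a tree. */
unsigned tbon_hops(unsigned long long n_backends, unsigned long long fanout)
{
    unsigned hops = 1;
    unsigned long long reach = fanout;   /* leaves reachable with `hops` levels */

    if (fanout < 2)
        return 0;
    while (reach < n_backends) {
        reach *= fanout;
        hops++;
    }
    return hops;
}

/* Connections the front-end services directly when a tree is used:
 * at most `fanout` children (the flat layout would need n_backends). */
unsigned long long fe_fanin(unsigned long long n_backends, unsigned long long fanout)
{
    return fanout < n_backends ? fanout : n_backends;
}
```

For O(10,000) back-ends and an assumed fan-out of 32, `tbon_hops` gives 3 (two levels of CPs) and the front-end services 32 connections instead of 10,000.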

MRNet – Multicast / Reduction Network

General-purpose TBON API
o Network: user-defined topology
o Stream: logical data channel
  • to a set of back-ends
  • multicast, gather, and custom reduction
o Packet: collection of data
o Filter: stream data operator
  • synchronization
  • transformation

Widely adopted by HPC tools
o CEPBA toolkit
o Cray ATP & CCDB
o Open|SpeedShop & CBTF
o STAT
o TAU
o …

[Figure: the FE/CP/BE tree, with a filter F(x1,…,xn) applied to the data flowing through the communication processes.]
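To make the filter idea concrete, here is a C sketch of a sum reduction flowing up a tree: each internal node (a CP) applies the same filter to the packets from its children, so the front-end receives one aggregated value per child instead of one per back-end. It is a hypothetical simulation on an implicit k-ary tree stored in an array (children of node i are k*i+1 … k*i+k); it does not use the MRNet API, and `tree_reduce`, `sum_filter`, and `MAX_FANOUT` are names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FANOUT 64   /* limit of this sketch, not of MRNet */

/* A filter combines the values received from a node's children. */
typedef long (*filter_fn)(const long *inputs, size_t n);

/* Example filter: sum of the children's values. */
long sum_filter(const long *inputs, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += inputs[i];
    return total;
}

/* Reduce the subtree rooted at `node`.  Leaves contribute value[node];
 * internal nodes apply `filter` to their children's results.  `*packets`
 * counts child-to-parent messages (one per non-root node). */
long tree_reduce(const long *value, size_t n_nodes, size_t node, size_t k,
                 filter_fn filter, size_t *packets)
{
    long child_results[MAX_FANOUT];
    size_t n_children = 0;

    assert(k >= 1 && k <= MAX_FANOUT);
    for (size_t c = k * node + 1; c <= k * node + k && c < n_nodes; c++) {
        child_results[n_children++] =
            tree_reduce(value, n_nodes, c, k, filter, packets);
        (*packets)++;                 /* one packet from child c to this node */
    }
    if (n_children == 0)
        return value[node];           /* leaf = back-end: sends its own data */
    return filter(child_results, n_children);   /* CP or FE applies the filter */
}
```

For a 7-node binary tree whose four leaves hold 1, 2, 3, and 4, the call returns 10 and counts six packets in total, of which only two arrive at the front-end.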

TBON-FS: the TBON File System

Specialized TBON for distributed file access
o back-end data sinks/sources are files
o simplifies tool front-end development by providing an intuitive interface based on POSIX I/O
o custom tool back-end functionality via synthetic file systems loaded into TBON-FS servers

Uses MRNet for:
o scalable unified name space composition
o scalable group file operations

[Figure: a client linked with libtbonfs at the root of a tree of CPs; each leaf is a TBON-FS server hosting a synthetic file system (FS).]

Group File Operations

gfd = gopen(dir, flags, mode)

Operating on Groups
o Use the group file descriptor with regular file operations (e.g., read and write)
  • avoids iteration: one system call per group operation
o Semantics
  • operation applied to each group member
  • user-controlled aggregation of status and data results
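These semantics can be modelled in a few lines of C: one call applies the same operation to every group member and folds the per-member return codes with a caller-chosen aggregation. The member-operation type, `group_apply`, `fake_write`, and both aggregators are hypothetical names for illustration. In TBON-FS the per-member work runs in parallel on the servers and the folding happens inside the tree, not in a front-end loop as in this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-member operation returning a write(2)-like status:
 * bytes transferred, or -1 on error. */
typedef long (*member_op)(int member_id, void *arg);

/* Combines two statuses; it should be associative so that the fold can be
 * performed in any tree shape. */
typedef long (*status_agg)(long a, long b);

long agg_sum(long a, long b) { return a + b; }           /* total bytes */
long agg_min(long a, long b) { return a < b ? a : b; }   /* -1 if any member failed */

/* Example member operation for the sketch: pretend to write the string in
 * `arg` to the member's file, failing for negative member ids. */
long fake_write(int member_id, void *arg)
{
    return member_id < 0 ? -1 : (long)strlen((const char *)arg);
}

/* Apply `op` to each of the `n` members and fold the statuses with `agg`.
 * An empty group yields 0. */
long group_apply(const int *members, size_t n, member_op op, void *arg,
                 status_agg agg)
{
    long status = 0;
    for (size_t i = 0; i < n; i++) {
        long rc = op(members[i], arg);
        status = (i == 0) ? rc : agg(status, rc);
    }
    return status;
}
```

Writing "stop" to three members yields 12 with the sum aggregation and 4 with the minimum. The minimum turns into -1 as soon as any member fails, so the caller chooses whether it wants totals or an all-succeeded check.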

TBON-FS: Scalable Group File Operations

int rc = read(gfd, databuf, 1024)

[Figure: the TBON-FS client issues the group read through the TBON (MRNet); each TBON-FS server reads its members' /proc files and returns stat() and data() results. Status aggregation: sum. Data aggregation: concatenate. Returned data size: up to 1024×gsize(gfd) bytes.]
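The figure's arithmetic can be sketched as follows: for `read(gfd, databuf, 1024)` each member contributes up to 1024 bytes, the data aggregation concatenates the member results, and the status aggregation sums the byte counts, so `databuf` must hold 1024×gsize(gfd) bytes. The in-memory `struct member_file` and `group_read` below are hypothetical stand-ins for the members' files; packing each member's bytes right after the previous member's is one plausible reading of "concatenate", assumed here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory stand-in for one group member's file. */
struct member_file {
    const char *data;
    size_t len;
};

/* Model of read() on a group descriptor: read up to `count` bytes from each
 * of the `gsize` members, concatenate them into `buf` (which must hold
 * count * gsize bytes), and return the summed byte count. */
ssize_t group_read(const struct member_file *members, size_t gsize,
                   char *buf, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < gsize; i++) {
        size_t n = members[i].len < count ? members[i].len : count;
        memcpy(buf + total, members[i].data, n);   /* data aggregation: concatenate */
        total += n;                                 /* status aggregation: sum */
    }
    return (ssize_t)total;
}
```

With three members each holding one status character, a 1024-byte group read returns 3 and leaves the three characters side by side in the buffer.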

Scalable Distributed Process Monitoring: ptop

[Figure: ptop display of average %CPU over 4096 processes. Files read: /proc/uptime, /proc/loadavg, /proc/stat, /proc/meminfo, and per process /proc/$pid/stat, /proc/$pid/statm, /proc/$pid/status. Labels: 4,096 files; >1,000,000 files.]
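Per-process samples for a tool like ptop come from files such as /proc/$pid/stat. On Linux, utime and stime are fields 14 and 15 of that file, measured in clock ticks, and the command name (field 2) may contain spaces, so parsing is safest after the last ')'. The helpers below are a hypothetical sketch of that parsing plus the usual two-sample %CPU formula; they are not ptop's code.

```c
#include <stdio.h>
#include <string.h>

/* Parse utime and stime (fields 14 and 15, in clock ticks) from the text of
 * a Linux /proc/<pid>/stat file.  Returns 0 on success, -1 if malformed. */
int parse_stat_cpu(const char *line, unsigned long *utime, unsigned long *stime)
{
    const char *p = strrchr(line, ')');   /* comm (field 2) may contain spaces */
    if (p == NULL)
        return -1;
    /* Skip fields 3 (state) .. 13 (cmajflt), then read fields 14 and 15. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               utime, stime) != 2)
        return -1;
    return 0;
}

/* %CPU between two samples of utime+stime taken wall_seconds apart;
 * ticks_per_second is sysconf(_SC_CLK_TCK) on Linux. */
double cpu_percent(unsigned long ticks_before, unsigned long ticks_after,
                   double wall_seconds, long ticks_per_second)
{
    if (wall_seconds <= 0.0 || ticks_per_second <= 0)
        return 0.0;
    return 100.0 * (double)(ticks_after - ticks_before)
           / ((double)ticks_per_second * wall_seconds);
}
```

Averaging %CPU over 4096 processes then reduces to summing these per-process values, which is exactly the kind of aggregation a TBON can do on the way up instead of at the front-end.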

Group Process Control & Inspection

/proc: a good starting point
o write to process/thread control file(s) to run/stop/signal
o read files containing process/thread status
o read/write process address space
o read/write thread registers

But:
o functionality differs by OS (e.g., Linux /proc provides no control files)
o no notion of group operations
o always contains all host processes

proc++: Synthetic File System for Process Control

Improvements over /proc:
1. process/thread groups
   o explicit group management
   o directories containing members' control and inspection files, created automatically
2. high-level debugger operations
   o breakpoints
   o stepping
   o stack walks
3. platform-independent interface

Example: continue group from breakpoint

/proc:
    foreach (member) {
        restore_insn()
        step_target()
        insert_bkpt()
        run_target()
    }

proc++:
    run_group()

[Figure: example group shapes: MPMD, odd/even, diagonal.]
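The /proc column spells out what continuing from a breakpoint costs per member: put back the original instruction, single-step over it, re-insert the breakpoint, and resume. The C sketch below models that sequence on a hypothetical `struct target` and counts the requests the front-end would issue in each style; the struct, the request counter, and the assumption that run_group() costs exactly one front-end request are illustrative, not proc++ code.

```c
#include <stddef.h>

/* Hypothetical debugger view of one stopped process. */
struct target {
    int at_breakpoint;   /* stopped on an inserted breakpoint */
    int bkpt_inserted;   /* breakpoint instruction currently in memory */
    int running;
};

/* Per-member sequence from the /proc column; each step is one request
 * (e.g. a write to a /proc file).  Returns the number of requests. */
static size_t continue_one(struct target *t)
{
    size_t requests = 0;
    t->bkpt_inserted = 0; requests++;   /* restore_insn(): put original insn back */
    t->at_breakpoint = 0; requests++;   /* step_target(): execute that insn */
    t->bkpt_inserted = 1; requests++;   /* insert_bkpt(): re-arm the breakpoint */
    t->running = 1;       requests++;   /* run_target(): resume */
    return requests;
}

/* /proc style: the front-end iterates, issuing 4 requests per member. */
size_t continue_iterative(struct target *g, size_t n)
{
    size_t requests = 0;
    for (size_t i = 0; i < n; i++)
        requests += continue_one(&g[i]);
    return requests;
}

/* proc++ style: the front-end issues one run_group() request; the same
 * per-member steps then run at the back-ends (the uncounted loop here). */
size_t continue_group(struct target *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)continue_one(&g[i]);
    return 1;
}
```

Both styles leave every member running with its breakpoint re-armed; the difference is that the front-end cost is 4N requests in one case and a single request in the other.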

proc++: from the makers of Dyninst

Most capabilities are provided by ProcControlAPI
o Cross-platform component library / C++ API
  • Linux, FreeBSD, BlueGene, Windows
o Process / thread control and inspection
  • Stop / continue processes, single-step threads
  • Read / write process memory, thread registers
  • Insert / remove breakpoints
  • Inferior remote procedure calls
  • Callbacks for asynchronous event notification

Thread stack walks are provided by StackwalkerAPI.

TotalView Parallel Debugger

Commercial debugger from Rogue Wave Software
o Sequential, multi-threaded, and parallel programs
o Fortran, C, C++ code from various compilers
o pthreads, OpenMP, MPI, UPC

20+ years of engineering and HPC experience
o Advanced MPI debugging
o Built-in memory debugger
o Reverse debugging (application DVR)
o Recent support for GPGPU (CUDA) code

TotalView is a great case study

Most widely used HPC debugger
o Lots of happy users

Known scalability limitations
o Lots of users who need it to work at full scale on the largest systems (i.e., at 200K+ processes)

20+ years of engineering
o A real tool that works on real applications
o Modular architecture that evolved over time
o Operations on process and thread groups are a primary focus

TotalView: Original Architecture

[Figure: the TotalView client's User Interface Layer and Debugger Layer each hold group, process, and thread objects; the Debugger Layer's OS Tracer talks to one tracer server per node, each attached to several application processes, arranged as a front-end (FE) with back-ends (BE).]

User Interface Layer group operation:
    dbgGrp = grp.getDebugGrp()
    result = dbgGrp.groupOp(args)
    foreach (targ in grp)
        targ.update(result)

Debugger Layer group operation:
    foreach (targ in grp) {
        tracer = targ.getTracer()
        result = tracer.op(args)
        targ.update(result)
    }

TotalView Integration Challenges

Group Operations
o no group operations at the (lowest) tracer level
  • pushed groups down to use group file operations
o some group operations at the UI level use iteration
  • added group operations at the debugger level
o some group operations require process- or thread-specific context
  • extended the proc++ interface and capabilities

Multi-level object maintenance

TotalView: TBON-FS Architecture

[Figure: same layering as the original architecture, but the Debugger Layer uses a proc++ Tracer that issues group file operations through TBON-FS to one proc++ server per node, each attached to several application processes.]

User Interface Layer group operation:
    dbgGrp = grp.getDebugGrp()
    result = dbgGrp.groupOp(args)
    foreach (targ in grp)
        targ.update(result)

Debugger Layer group operation:
    rep = grp.getMember(0)
    tracer = rep.getTracer()
    result = tracer.groupOp(args)
    foreach (targ in grp)
        targ.update(result)
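The change in the Debugger Layer amounts to a dispatch pattern: instead of looping over targets and calling each target's own tracer, the group object hands the whole group to the tracer of one representative member (grp.getMember(0)), which issues a single group request. The C types and functions below are hypothetical and only mirror the pseudocode on these slides; the echo operations stand in for real tracer work.

```c
#include <stddef.h>

struct target;

/* Hypothetical tracer interface with single-target and group entry points. */
struct tracer {
    int (*op)(struct target *t, int arg);
    int (*group_op)(struct target **members, size_t n, int arg);
    size_t calls;                 /* requests issued through this tracer */
};

struct target {
    struct tracer *tracer;
    int state;                    /* updated from the operation's result */
};

/* Stand-in operations for the sketch: return the argument unchanged. */
int echo_op(struct target *t, int arg) { (void)t; return arg; }
int echo_group_op(struct target **members, size_t n, int arg)
{
    (void)members; (void)n;
    return arg;
}

/* Original architecture: one tracer call per target. */
void group_op_iterative(struct target **grp, size_t n, int arg)
{
    for (size_t i = 0; i < n; i++) {
        struct tracer *tr = grp[i]->tracer;    /* tracer = targ.getTracer() */
        grp[i]->state = tr->op(grp[i], arg);   /* result = tracer.op(args) */
        tr->calls++;
    }
}

/* TBON-FS architecture: delegate to the representative's tracer once. */
void group_op_delegated(struct target **grp, size_t n, int arg)
{
    if (n == 0)
        return;
    struct tracer *tr = grp[0]->tracer;        /* rep = grp.getMember(0) */
    int result = tr->group_op(grp, n, arg);    /* result = tracer.groupOp(args) */
    tr->calls++;
    for (size_t i = 0; i < n; i++)
        grp[i]->state = result;                /* targ.update(result) */
}
```

The tracer call count drops from N to 1, but the per-target `update` loop is still sequential front-end work, which is what limits the overall speed-up.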

TotalView: MRNet Architecture

[Figure: same layering, but the Debugger Layer's OS Tracer issues group tracer operations through MRNet to one tracer server per node, each attached to several application processes.]

User Interface Layer group operation:
    dbgGrp = grp.getDebugGrp()
    result = dbgGrp.groupOp(args)
    foreach (targ in grp)
        targ.update(result)

Debugger Layer group operation:
    rep = grp.getMember(0)
    tracer = rep.getTracer()
    result = tracer.groupOp(args)
    foreach (targ in grp)
        targ.update(result)

Scalability: proc++ group writes

[Figure: two plots of time (seconds) versus number of application processes (thousands, 0 to 50). Left: write stop, write continue, write breakpoint (y-axis 0 to 0.020 s). Right: write attach, write singlestep (y-axis 0 to 1.0 s).]

Scalability: proc++ group reads

[Figure: two plots of time (seconds) versus number of application processes (thousands, 0 to 50). Left: read regs gpr, read regs pc (y-axis 0 to 0.40 s). Right: read addr maps, read events (y-axis 0 to 4.0 s).]

Amdahl’s Law for Scalable Tools

The speed-up from using scalable group file operations is limited by sequential behavior in the front-end. To reduce it:
o reduce the number of objects per target
o reduce the state kept in those objects
o eliminate iterative allocation of objects
o eliminate iterative object state updates
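As a worked form of this argument, let s be the fraction of a group operation's original time spent in sequential front-end work, and let the remaining 1 − s be sped up by a factor p through group file operations. Amdahl's law then bounds the overall speed-up:

```latex
S(p) = \frac{1}{\,s + \dfrac{1 - s}{p}\,} \;\le\; \frac{1}{s}
\qquad\text{e.g. } s = 0.05:\quad S(100) = \frac{1}{0.05 + 0.0095} \approx 16.8,\qquad \lim_{p \to \infty} S(p) = 20
```

The 5% figure is illustrative, not a measurement from these slides. Because per-target object allocation and updates grow with the number of targets, s itself grows with scale, which is why the list above targets exactly those iterative front-end costs.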


Keys to Real Tool Scalability

“iteration is the bane of scalability” - me
o any operation requiring a linear number of steps is a show-stopper

1. Limited sequential behavior in the tool front-end
2. Good group representation
   • efficient creation and update ⇒ distributed group state
3. Constant- or logarithmic-time group operations
   • parallel execution across group members
4. Constant- or logarithmic-size data at the tool front-end
   • tool internal state: O(# of groups), not O(# of targets)
   • user display of group data: scalable aggregation is necessary
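One way to keep front-end state at O(# of groups) is to describe a group by a rule rather than a member list. The sketch below uses a hypothetical representation, not TotalView's: a group is an arithmetic progression of ranks, which covers regular shapes such as the odd/even groups with three integers regardless of group size, and answers membership and indexing queries in constant time.

```c
#include <stdbool.h>

/* Hypothetical constant-size group descriptor:
 * ranks first, first + stride, ..., first + (count - 1) * stride. */
struct rank_group {
    long first;
    long stride;   /* must be > 0 */
    long count;    /* number of members */
};

/* O(1) membership test, independent of the group's size. */
bool group_contains(const struct rank_group *g, long rank)
{
    if (g->count <= 0 || g->stride <= 0 || rank < g->first)
        return false;
    long delta = rank - g->first;
    return delta % g->stride == 0 && delta / g->stride < g->count;
}

/* The i-th member (0 <= i < count), computed on demand rather than stored. */
long group_member(const struct rank_group *g, long i)
{
    return g->first + i * g->stride;
}
```

An "odd ranks" group of 100,000 members is then `{1, 2, 100000}`. Irregular groups would need a richer encoding (for example, a list of such ranges), but the point of rule 4 holds: state grows with the number of groups, not the number of targets.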

Tool Scalability “rules to live by”

1. Single-target operations must be efficient, but should rarely be used

2. On-demand data access (lazy evaluation)
   • do not collect or generate data that is never used

3. Data caching
   • individual target data at the tool front-end is a bad idea
     - leads to iterative cache invalidation and updates
     - see rule #2
   • individual target data at tool back-ends is a time/space tradeoff
   • group data at the tool front-end is a time/space tradeoff
     - caching within a TBON can limit both time and space
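Rules 2 and 3 combine into a small pattern: compute group data only when it is requested, cache it per group, and invalidate all caches by bumping a generation counter (for example, after the group runs), so invalidation is a single increment rather than a walk over per-target entries. All names below are hypothetical; `example_fetch` stands in for whatever aggregated TBON query would actually produce the data.

```c
#include <stddef.h>

/* Hypothetical expensive query, e.g. an aggregated group read via the TBON. */
typedef long (*fetch_fn)(int group_id);

struct group_cache {
    int group_id;
    long value;
    unsigned long valid_gen;   /* generation the cached value belongs to */
    int has_value;
};

/* Bumped whenever targets may have changed state. */
static unsigned long current_gen = 1;

/* O(1) invalidation of every group cache: no per-target walk. */
void invalidate_all(void) { current_gen++; }

/* Lazy evaluation: fetch only on first use or after an invalidation. */
long group_value(struct group_cache *c, fetch_fn fetch)
{
    if (!c->has_value || c->valid_gen != current_gen) {
        c->value = fetch(c->group_id);
        c->valid_gen = current_gen;
        c->has_value = 1;
    }
    return c->value;
}

/* Example fetch for the sketch, counting how often it is called. */
static int fetch_calls = 0;
long example_fetch(int group_id)
{
    fetch_calls++;
    return 100L + group_id;
}
```

Repeated reads of a group's value cost one fetch until the next invalidation; nothing is fetched for groups that are never displayed.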

Questions?

Group File Operations & TBON-FS
o International Conference on High Performance Computing (HiPC 2009), Best Paper
o ftp://ftp.cs.wisc.edu/paradyn/papers/Brim09GroupFile.pdf

Scalable Composition of File System Name Spaces
o International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2011)
o ftp://ftp.cs.wisc.edu/paradyn/papers/Brim11FinalNamespace.pdf

MRNet: http://www.paradyn.org/mrnet/

TBON-FS or proc++ source code: talk to me

TotalView Integration: proc++ Extensions

Problem: dynamic address space mappings

How can we do group address space write/read?

[Figure: ranks 0, i, and N each map the executable, libfoo, and libbar, but the libraries land at different addresses (e.g., 0x400 and 0x800) in different ranks.]

TotalView Integration: proc++ Extensions

Solution: image files that hide dynamic mappings

o one file for each mapped code image
o zero offset corresponds to the map base of the image
o to read / write a symbol in an image, seek to the symbol's offset

[Figure: in rank X, the image files for the executable, libfoo, and libbar each start at offset 0x000.]
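The translation behind image files is plain offset arithmetic: a symbol at image offset `off` lives at `base + off` in a rank whose loader mapped the image at `base`, so a group write at offset `off` of the image file reaches the right address in every rank even though the bases differ. The sketch below uses made-up bases in the spirit of the figure (0x400 and 0x800); `struct image_map` and both helpers are hypothetical, not proc++'s interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Where one rank's loader placed a code image (hypothetical record). */
struct image_map {
    uintptr_t base;   /* map base in this rank's address space */
    size_t size;      /* mapped length of the image */
};

/* Image-file offset -> absolute address in one rank; 0 if out of range. */
uintptr_t image_offset_to_addr(const struct image_map *m, size_t offset)
{
    return offset < m->size ? m->base + offset : 0;
}

/* Absolute address -> image-file offset; (size_t)-1 if not in this image. */
size_t addr_to_image_offset(const struct image_map *m, uintptr_t addr)
{
    if (addr < m->base || addr - m->base >= m->size)
        return (size_t)-1;
    return (size_t)(addr - m->base);
}
```

With libfoo mapped at 0x400 in one rank and 0x800 in another, a breakpoint written at offset 0x10 of libfoo's image file patches 0x410 in the first rank and 0x810 in the second.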

