Query-Based Debugging
Raimondas Lencevicius
Department of Computer Science, UCSB
2
Debugging of OO Programs
• Symbolic debugging
– Control flow debugging
– Object state monitoring
– Data breakpoints
– Conditional breakpoints
• Debugging of abstract relationships?
– Complex object relationships
3
Debugging Object Relationships
• Programmers need to find objects violating relationships
– “Are there any windows that do not reference some child widget?”
• Current debuggers provide only low-level views
• Programmers have to write special testing code
4
Goals of Query-Based Debugging
• Make debugging of data structures easier by answering questions about object relationships
• Explore unfamiliar programs
• Find data structure errors as soon as they occur
5
Query-Based Debugging
• Ask common questions about program state
• Quickly access sets of interesting objects
• Check properties of large groups of objects using single query
• Answer queries while program is running
• Provide functionality efficiently
6
Windows and Widgets
[Figure: a graphical user interface and the corresponding program objects (Window, Widgets). The window references its widget collection (widget1, widget2); each widget references its parent window.]
7
Query Example
• “Are there any windows that do not reference some child widget?”
[Figure: a window with its widget collection, and widget1 with its parent window reference.]
8
Talk Overview
• Query case study
• Query model
• Implementation of debugger
• Dynamic queries
• Experimental results
• Future work
• Conclusions
9
Java Compiler - Case Study
• Goal: understand and debug Java subset compiler written for UCSB compiler course
• Variety of queries
– “Can the current lexer token refer to an uninitialized token?”
– “Can identifiers declared in the same scope have the same name and type?”
– “Can methods have the same name?”
10
Java Compiler - Case Study
• “Can methods have the same name?”
• Experiment with input file containing such methods:
…
static int isOne(int c)
{ return 0;}
…
static int isOne(int c)
{ return 1; }
…
11
Java Compiler - Case Study
• “Can methods have the same name?”
• Debugger gives positive answer
• But not a program error
– Compiler finds duplicate methods in later phase
SemanticException: The name `isOne' at line 27 chars 14 to 20 was already declared.
MethodDeclaration
public Id name >> "isOne" …
Code >> … (ReturnStmt, Num "0") ...
MethodDeclaration
public Id name >> "isOne" …
Code >> … (ReturnStmt, Num "1") ...
12
Java Compiler Example Summary
• Explore unfamiliar program
• Find a possible error
– Further program investigation shows that there is no error
• Use query as invariant to verify program’s execution
– Dynamic query
14
Query Model
• Widget wid; Window win. (wid.window == win) && (! win.widgetCollection.contains(wid))
– Search domain: Widget wid; Window win.
– Constraint expression in conjunctive form: (wid.window == win) && (! win.widgetCollection.contains(wid))
• Arbitrary boolean constraint expression
• Assumption: side-effect free methods
• Selection and join queries
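The query above can be read as a filter over the Cartesian product of its search domains. The following is a minimal sketch of that reading, not the debugger's actual evaluator: the Window and Widget classes are hypothetical stand-ins with only the fields the query mentions, and the domains are passed in as plain lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the GUI classes named in the query.
class Window {
    final List<Widget> widgetCollection = new ArrayList<>();
}

class Widget {
    final Window window; // parent window
    Widget(Window window) { this.window = window; }
}

class QueryModelSketch {
    // Evaluates: Widget wid; Window win.
    //   (wid.window == win) && (! win.widgetCollection.contains(wid))
    // by enumerating every (wid, win) combination of the two domains.
    static List<Widget> unreferencedWidgets(List<Widget> widgets, List<Window> windows) {
        List<Widget> result = new ArrayList<>();
        for (Widget wid : widgets) {
            for (Window win : windows) {
                if (wid.window == win && !win.widgetCollection.contains(wid)) {
                    result.add(wid);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Window win = new Window();
        Widget registered = new Widget(win);
        Widget forgotten = new Widget(win);
        win.widgetCollection.add(registered); // 'forgotten' is never added
        List<Widget> hits = unreferencedWidgets(List.of(registered, forgotten), List.of(win));
        System.out.println(hits.size()); // prints 1
    }
}
```

The constraint calls only contains(), which has no side effects, in line with the side-effect-free assumption stated on the slide.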
15
Java Compiler Example
• “Can methods have the same name?”
MethodDecl x y. (x.name.spelling == y.name.spelling) && (x != y)
17
Static Query Implementation
[Figure: static query pipeline. User input (query string) → Parser → intermediate form → Optimizer → optimized form → Code generator → generated code → Execution module → GUI output. A domain collector works with variable types, domain sizes, and domain collections.]
18
Overview of Implementation
• Enumeration primitive: finds all instances of domain
• Join ordering: finds good order to evaluate query
• Hash joins: speed up equality constraints
• Incremental delivery: shows first result early
19
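Query Execution
[Figure: join pipeline. Declaration d and Method m are joined on (d.contains(m))?; the result is joined with CallExpression ce on (ce.decl == m)?. Example tuples such as (d1, m2) and (ce1, d1, m1) flow between the stages.]
“Find all declared methods returning integers and called at least once”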
Declaration d; Method m; CallExpression ce. (d.contains(m)) && (ce.decl == m) && (m.typeName == "int")
20
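Join Ordering
[Figure: inefficient vs. efficient ordering of the same joins over domains of sizes 200, 100, and 10 with selectivities of 10% and 1%. The inefficient ordering produces an intermediate result of size 2000; the efficient ordering produces an intermediate result of size 10. Both yield a final result of size 200.]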
21
Join Ordering
• Join execution order significantly influences performance
cecil_method a b; cecil_formal c d. (a.formals.includes(c)) && (b.formals.includes(d)) && (c.name == d.name) && (a != c) && (b != d)
– Naïve evaluation of Cartesian product is slow
– Straightforward order takes 37 seconds
– Optimized order takes 6 seconds
• Problem is NP-complete
• System uses heuristics
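A rough way to see why ordering matters is to count nested-loop comparisons for two orders of the same joins. The sketch below is an illustration only: the cost model (comparisons in a left-deep nested-loop plan), the domain sizes, and the selectivities are assumptions chosen for the example, and it does not reproduce the heuristics the system uses.

```java
class JoinOrderSketch {
    // Counts nested-loop comparisons for a left-deep join order.
    // sizes[i] is the size of the i-th domain in evaluation order;
    // selectivityPercent[i] is the assumed percentage of candidate tuples
    // that survive the join with domain i + 1.
    static long comparisons(long[] sizes, long[] selectivityPercent) {
        long intermediate = sizes[0];
        long total = 0;
        for (int i = 1; i < sizes.length; i++) {
            long candidates = intermediate * sizes[i]; // every pair is compared
            total += candidates;
            intermediate = candidates * selectivityPercent[i - 1] / 100;
        }
        return total;
    }

    public static void main(String[] args) {
        // Join the two large domains first (10% selectivity), then the small one.
        long slow = comparisons(new long[] {200, 100, 10}, new long[] {10, 1});
        // Start with the small domain and the most selective constraint (1%).
        long fast = comparisons(new long[] {10, 100, 200}, new long[] {1, 10});
        System.out.println(slow); // prints 40000
        System.out.println(fast); // prints 3000
    }
}
```

Both orders produce the same final result size (200 tuples under these assumptions); only the amount of intermediate work differs.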
22
Hash Joins
• Nested-loop join: X = Y over domains of 200 and 100 objects takes 200 × 100 = 20,000 operations
• Hash join: the same X = Y join takes 200 + 100 = 300 operations
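The operation counts above can be reproduced with a small sketch of both join strategies. This is a generic hash join over integer keys, written under the assumption that one hash insertion or lookup counts as one operation; the class and method names are hypothetical and not taken from the debugger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinSketch {
    static long operations; // comparisons or hash operations done by the last join

    // Compares every x with every y.
    static List<int[]> nestedLoopJoin(List<Integer> xs, List<Integer> ys) {
        operations = 0;
        List<int[]> result = new ArrayList<>();
        for (int x : xs) {
            for (int y : ys) {
                operations++;
                if (x == y) result.add(new int[] {x, y});
            }
        }
        return result;
    }

    // Builds a hash table on ys, then probes it once per x.
    static List<int[]> hashJoin(List<Integer> xs, List<Integer> ys) {
        operations = 0;
        Map<Integer, List<Integer>> table = new HashMap<>();
        for (int y : ys) {                       // build phase
            operations++;
            table.computeIfAbsent(y, k -> new ArrayList<>()).add(y);
        }
        List<int[]> result = new ArrayList<>();
        for (int x : xs) {                       // probe phase
            operations++;
            for (int y : table.getOrDefault(x, List.of())) {
                result.add(new int[] {x, y});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>();
        List<Integer> ys = new ArrayList<>();
        for (int i = 0; i < 200; i++) xs.add(i);
        for (int i = 150; i < 250; i++) ys.add(i);
        nestedLoopJoin(xs, ys);
        System.out.println(operations); // prints 20000
        hashJoin(xs, ys);
        System.out.println(operations); // prints 300
    }
}
```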
23
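Incremental Delivery
• Show first result early by pushing intermediate results through pipeline
[Figure: the same join pipeline as in Query Execution. Declaration d and Method m are joined on (d.contains(m))?, then joined with CallExpression ce on (ce.decl == m)?; tuples such as (d1, m2) and (ce1, d1, m1) move through the pipeline one at a time.]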
24
Incremental Delivery
• Goal: fast response for most queries
• Pipelining
– Joins are separate threads connected in pipeline by limited-size buffers
– Thread blocks on empty input or full output
– Scheduler prefers threads closer to the end of pipeline
• Time-slicing
– Interrupt “slow” threads and reschedule
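The pipelining idea can be sketched with standard Java threads and bounded queues. This is an illustration under simplifying assumptions: each stage is a plain filter rather than a join, the constraints and buffer size are made up, END is a hypothetical end-of-stream marker, and the custom scheduler and time-slicing described above are not modeled (the JVM's default thread scheduling is used instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntPredicate;

class PipelineSketch {
    static final int END = Integer.MIN_VALUE; // hypothetical end-of-stream marker

    // Starts one pipeline stage that forwards values satisfying 'constraint'.
    static void startStage(BlockingQueue<Integer> in, BlockingQueue<Integer> out,
                           IntPredicate constraint) {
        new Thread(() -> {
            try {
                while (true) {
                    int v = in.take();                  // blocks on empty input
                    if (v == END) { out.put(END); return; }
                    if (constraint.test(v)) out.put(v); // blocks on full output
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    // Pushes 0..n-1 through two stages and collects results as they arrive.
    static List<Integer> run(int n) {
        BlockingQueue<Integer> source = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> middle = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> sink = new ArrayBlockingQueue<>(4);
        startStage(source, middle, v -> v % 2 == 0);
        startStage(middle, sink, v -> v % 3 == 0);
        new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) source.put(i);
                source.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
        List<Integer> results = new ArrayList<>();
        try {
            for (int v = sink.take(); v != END; v = sink.take()) {
                results.add(v); // a result can be shown as soon as it arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(20)); // prints [0, 6, 12, 18]
    }
}
```

Because the buffers hold only four elements, an early stage cannot run far ahead of a later one, so the first result reaches the consumer before the whole input has been enumerated.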
26
Gas Tank - Case Study
• Goal: to debug a gas tank simulation applet
• Inter-object constraints
– Molecules should stay inside the gas tank
– Molecules should not occupy the same position
27
Gas Tank - Case Study
• Detecting an error is not enough
• What code led to this error?
• Need dynamic queries!
[Figure: blue molecule at x = 20, y = 25; red molecule at x = 20, y = 25.]
28
Gas Tank - Case Study
• Dynamic query finds error in move method
public void move() {
  …
  x += (int)(v*Math.cos(dtor(dir)));
  y += (int)(v*Math.sin(dtor(dir)));
  …
• Fix the error
  y += (int)(v*Math.sin(dtor(dir)));
  if (collided()) handleCollision();
• But debugger still shows an error
• Exclude “atomic” regions
29
Motivation of Dynamic Queries
• Close cause-effect gap between error and its discovery
– Errors are reported as soon as they occur
• Display dynamics of objects’ relationships - visualization
• Perform continuous invariant or assertion checks
30
Dynamic Query Implementation
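[Figure: the Java program and the query string with its change set are passed to a custom class loader, which produces the instrumented Java program and custom debugger code. These run together with the debugger library code on a standard Java virtual machine and produce the query results.]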
31
Implementation of Dynamic Queries
• Monitor changes that affect query result
• Invoke debugger when change occurs
• Reevaluate query efficiently - incrementally
32
Change Monitoring
Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)
• When to reevaluate?
– What to monitor?
• Change set - objects and fields affecting result of query
– Domain objects
– Referenced fields
– Objects and fields referenced in methods
– Change set for this query: Molecule <init>, x, y
33
Instrumentation
Query: Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)
Source:
… x += … ; …
Compile:
…
22: iadd
23: putfield 37
26: aload_0
…
Load and Instrument:
…
22: iadd
23: invokestatic debug
26: aload_0
…
Generated debugger code:
public final class DebuggingCode implements RunTimeCode {
  public static void debug(Molecule updatedObject, int newValue) {
    …
    updatedObject.x = newValue; // replaces putfield 37
    QueryTool.runTool(updatedObject); // invokes query evaluator
  }
}
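The effect of this instrumentation can be imitated by hand. The sketch below is a source-level stand-in, not the generated code: DebuggingCodeSketch, DOMAIN, debugX, and evaluateQuery are hypothetical names, the hook is called explicitly instead of being patched into bytecode, and the query is reevaluated from scratch on every write.

```java
import java.util.ArrayList;
import java.util.List;

class Molecule {
    int x, y;
    Molecule(int x, int y) {
        this.x = x;
        this.y = y;
        DebuggingCodeSketch.DOMAIN.add(this); // creation monitored in the constructor
    }
}

class DebuggingCodeSketch {
    static final List<Molecule> DOMAIN = new ArrayList<>(); // all created Molecule objects
    static int violations = 0; // result count of the most recent evaluation

    // Plays the role of debug(...): performs the write that putfield would
    // have done, then invokes the query evaluator.
    static void debugX(Molecule updatedObject, int newValue) {
        updatedObject.x = newValue;
        violations = evaluateQuery();
    }

    // Molecule m1, m2. (m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)
    static int evaluateQuery() {
        int count = 0;
        for (Molecule m1 : DOMAIN) {
            for (Molecule m2 : DOMAIN) {
                if (m1.x == m2.x && m1.y == m2.y && m1 != m2) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Molecule blue = new Molecule(19, 25);
        Molecule red = new Molecule(20, 25);
        debugX(blue, 20); // instrumented form of: blue.x = 20;
        System.out.println(violations); // prints 2: (blue, red) and (red, blue)
    }
}
```

Since the check runs inside the write itself, the violation is reported at the assignment that caused it rather than at some later point in the run.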
34
Implementation of Monitoring
• Java bytecode instrumented during load time
– Custom class loader
– Uses modified class file handling tools from BCA library
• Creation and deletion of domain objects
– Creation monitored by instrumenting constructors
– Deletion handled by GC - not implemented yet
• Modification of change set fields
– Instrumentation of field assignments
35
Efficient Query Reevaluation
• Same techniques as static queries
– Join ordering
– Hash joins
• Incremental reevaluation
• Custom code generation for selection queries
36
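Incremental Reevaluation
• Original query: A * B * C
• Incremental query: A * B * C
[Figure: join trees for both queries. The original query joins full domains (sizes 200, 10, and 100, with selectivities of 10% and 1%). The incremental query processes inputs of size 1 and combines its output with the old results.]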
37
Query Reevaluation Optimizations
Molecule m1, m2.(m1.x == m2.x) && (m1.y == m2.y) && (m1 != m2)
• Same value assignments
– Do not change result - no reevaluation required
– Example: … x = 5; … on a Molecule m with x: 5
• Fast selection queries
– Lean custom code
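Two of these ideas, incremental reevaluation and skipping same-value assignments, can be combined in a short sketch. The names GasMolecule, IncrementalSketch, and assignX are hypothetical, and the sketch covers only writes to x for the molecule-collision query; it is an illustration of the technique, not the debugger's generated code.

```java
import java.util.ArrayList;
import java.util.List;

class GasMolecule {
    int x, y;
    GasMolecule(int x, int y) { this.x = x; this.y = y; }
}

class IncrementalSketch {
    final List<GasMolecule> domain = new ArrayList<>();
    int evaluations = 0; // number of reevaluations actually performed

    // Called in place of "m.x = newValue". Returns the molecules that
    // share a position with m after the write.
    List<GasMolecule> assignX(GasMolecule m, int newValue) {
        if (m.x == newValue) {
            return List.of(); // same-value assignment: result is unchanged, skip
        }
        m.x = newValue;
        evaluations++;
        List<GasMolecule> hits = new ArrayList<>();
        // Join only the changed object against the domain, not all pairs.
        for (GasMolecule other : domain) {
            if (other != m && other.x == m.x && other.y == m.y) {
                hits.add(other);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        IncrementalSketch sketch = new IncrementalSketch();
        GasMolecule blue = new GasMolecule(19, 25);
        GasMolecule red = new GasMolecule(20, 25);
        sketch.domain.add(blue);
        sketch.domain.add(red);
        sketch.assignX(blue, 19);                            // skipped
        System.out.println(sketch.assignX(blue, 20).size()); // prints 1
        System.out.println(sketch.evaluations);              // prints 1
    }
}
```

Each reevaluation touches one row of the pair space instead of the full domain-by-domain product.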
39
Static Query Experiments
• Setup: Sun Ultra 2/200 (200 MHz UltraSparc) running modified Self 4.0
• Queries
– Self GUI
– Cecil compiler
– Synthetic stress tests
• Different query structures
40
Static Query Evaluation Time
[Figure: bar chart of time (sec, axis 0 to 3.5) by query number (1 to 19), showing completion time, response time, translation time, and primitive time. Query groups: Self GUI, Cecil compiler, points and rectangles. Two bars run off the scale at 20.7 s and 5.9 s. Annotations: 12 x 146 x 370; 11K x 4.5K hash join; 4.5K x 4.5K; 1804 join; costly selection.]
41
Discussion of Static Query Experiments
• Most queries take less than a second to execute
• Join ordering heuristic performs well
• Hash joins can speed up execution
• Incremental delivery decreases response time
42
Discussion of Results
• Query 17
– 5,000x5,000 = 25,000,000 checks
• Query 18
– Complex, large intermediate results
43
Dynamic Query Experiments
• Implemented in fully portable Java 1.2
• Setup: Sun Ultra 2/2300 (300 MHz UltraSparc II) running Sun Solaris Java 1.2 with JIT compiler
• Queries
– Gas tank
– Decaf compiler
– SPECjvm98 applications: Jess expert system, compress, ray tracer
– Synthetic stress test microbenchmarks
44
Program Slowdown - Selections
• Overhead does not depend on domain size
• Query 4: z.OutCnt < 0; Queries 5-6: z.count() < 0; Query 7: z.costlyMathCount(0)
• Query 12: point.radialDistanceGreaterThan(100M)
[Figure: bar chart of slowdown (axis 0 to 3.5) by query number (1 to 12) for Decaf, gas tank, Jess, compress, and ray tracer. One bar runs off the scale at 5.83. Invocation frequencies of 1.9M/s and 2.3M/s are marked.]
45
Program Slowdown - Joins
• Practical for infrequent invocations
Program          Size       Join type   Slowdown   Invocation frequency
Gas tank         33x33      hash join   2.13       54K
Decaf            120Kx600   hash join   3.43       25K
Ray tracer       85Kx8K     hash join   229        350K
Compress         1x1        hash join   157        1.5M
Compress         1x1        join        77         2.6M
Microbenchmark   1x20       hash join   228        40M
Microbenchmark   1x20       join        930        42M
46
Discussion of Dynamic Query Experiments
• Selections are efficient
• Join queries practical for infrequent evaluations and small query domains
• Can we predict debugger performance for wide class of queries?
– Query execution model
47
Performance Model
T_instrumented = T_original × (1 + T_evaluate × F_evaluate)
• Slowdown depends on
– Frequency of debugger invocations
– Selections: T_evaluate = 131 ns - 4.26 µs
– Joins: T_evaluate = 5.7 µs - 546 µs
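The model can be checked with a few lines of arithmetic. In this sketch the method names are hypothetical, and the evaluation times are taken to be 130 ns and 4.26 µs; the microsecond unit is an assumption, chosen because it is consistent with the 6.5% and 43% overhead estimates quoted in the talk for 500K and 100K assignments per second.

```java
class PerformanceModelSketch {
    // Relative overhead T_evaluate * F_evaluate from the model
    // T_instrumented = T_original * (1 + T_evaluate * F_evaluate).
    static double overhead(double evaluateSeconds, double evaluationsPerSecond) {
        return evaluateSeconds * evaluationsPerSecond;
    }

    // Slowdown factor T_instrumented / T_original.
    static double slowdown(double evaluateSeconds, double evaluationsPerSecond) {
        return 1.0 + overhead(evaluateSeconds, evaluationsPerSecond);
    }

    public static void main(String[] args) {
        // Cheap selection (130 ns) at 500K field assignments per second.
        System.out.println(overhead(130e-9, 500_000));
        // Costly selection (4.26 microseconds) at 100K assignments per second.
        System.out.println(overhead(4.26e-6, 100_000));
    }
}
```

The first call gives an overhead of about 0.065 (6.5%) and the second about 0.426 (roughly 43%).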
48
Field Assignment Frequencies
• Microbenchmark: 40M assignments per second
• SPECjvm98 suite
– Max frequency: 1.9M assignments per second in compress
– 95% of fields have < 100K assignments per second
[Figure: two charts over field assignment frequency (0.1 to 2M assignments per second): the cumulative percentage of fields (0 to 100) and the number of fields (0 to 250).]
49
Selection Slowdown Estimates
• 500K assignments per second
– 6.5% overhead for T_evaluate = 130 ns
– 213% overhead (slowdown factor 3.13) for T_evaluate = 4.26 µs
• 95% of fields have < 100K assignments per second
– 43% overhead for 4.26 µs selection constraints
[Figure: estimated slowdown (axis 0 to 10) versus field assignment frequency (0.1 to 2M assignments per second), with one curve for low-cost and one for high-cost selection constraints.]
50
Summary of Dynamic Queries
• Selection queries are efficient
– Less than factor 2 slowdown in experiments including stress tests
– Projected less than 43% overhead for most selection queries
• Join queries are efficient for infrequent evaluations
– 2-930 factor slowdown on join queries
51
Related Work
• Extensions to symbolic debuggers
– Limited queries on objects [Sefika et al., Hart et al.]
– Script based visualization of data structures [Duel]
– Data structure animation [HotWire]
– Instance filtering and reference visualization [Look!, DDD]
– Method call visualization [Program Explorer, Object Visualizer]
• Rule-based extensions of OO languages [R++]
• Software visualization [Balsa-Zeus, Tango-Polka, Pavane]
• Database query optimization [Ibaraki and Kameda, Krishnamurthy et al., Swami and Iyer]
52
Future Work
• Functionality extensions
– Support for projection, arbitrary computations
– Supporting on-the-fly debugging
– Distributed query-based debugging
– Safe update points
• Execution optimizations
– Delaying monotonic updates
– Lookup caches
53
Conclusions
• New approach to debugging
– Quick access to sets of interesting objects
– Efficient way to check properties of large groups of objects using single query
– Instant error alert with dynamic queries
• Good performance
– Most static queries execute in one or two seconds
– Most dynamic selection queries slow down programs less than 43%
54
Further Information
• Query-Based Debugging
http://www.cs.ucsb.edu/~raimisl/DQBD.html
OOPSLA’97 and ECOOP’99 papers
• Research
http://www.cs.ucsb.edu/~raimisl/Research.html
56
Program Slowdown
• Other join queries - 77-229 slowdown
• Microbenchmark
– Selection - 6.4 slowdown
– Hash join - 228 slowdown
– Nested join - 930 slowdown
[Figure: bar chart of slowdown (axis 0 to 3.5) by query number (1 to 14) for Decaf, gas tank, Jess, compress, and ray tracer. One bar runs off the scale at 5.83.]
57
Breakdown of Query Overhead
• 76% Evaluation time
• 17% Loading
• 7% Garbage collection (128M heap)
[Figure: stacked bar chart of overhead percentage (0 to 100) by query number (1 to 20), broken down into loading, GC, first evaluation, and evaluation.]