DynaMine: Finding Common Error Patterns by Mining Software Revision Histories
Benjamin Livshits, Stanford University
Thomas Zimmermann, Saarland University
A Box Full of Nails
A lot of promise, potential, and excitement
Not that many success stories
Not sure what to apply it to
Let’s try this particularly exciting idea
Miners looking at their tools
Promises, promises…
Interesting usage patterns found by CVS mining
Interesting error patterns found by CVS mining
My Background
Tools for bug detection
Analysis: pointer analysis, etc.
Mostly static, some dynamic
Applications:
  Security: buffer overruns, format string violations, SQL injections, cross-site scripting, HTTP response splitting, data lifetimes
  J2EE patterns: bad session stores, lapsed listeners
  Eclipse patterns: missing calls to dispose, not calling super, forgetting to deregister listeners
Classification of Error Patterns
Generic patterns, the usual suspects: NULL dereferences, buffer overruns, double-deletes, locking errors/threads
App-specific patterns, particular to a system or a set of APIs: bugs in Linux code, bugs in J2EE servlets, device drivers
Error Pattern Iceberg
NULL dereferences, buffer overruns, double-deletes, locks/threads
Classification of Error Patterns
Generic patterns, the usual suspects: NULL dereferences, buffer overruns, double-deletes, locking errors/threads
App-specific patterns, particular to a system or a set of APIs
Intuition: many other application-specific patterns exist; much of the application-specific stuff remains a gray area so far
Goal: let’s figure out what the patterns are
Does anybody know any good error patterns specific to WinAmp plugins?
There are hundreds of WinAmp plugins out there
Motivation: Matching Method Pairs
Start small: matching method pairs
  Only two methods
  A very simple state machine
  Calls must match perfectly, order matters
Very common; our inspiration is
  System calls: fopen/fclose, lock/unlock, …
  GUI operations: addNotify/removeNotify, addListener/removeListener, createWidget/destroyWidget, …
Want to find more of the same
And, if we are lucky, more interesting patterns
DynaMine: Our Insight
Our problem:
  Want to find patterns whose violation causes errors
  Want to find patterns for program understanding
Our technique: look at revision histories
Crucial observation: things that are frequently checked in together often form a pattern
Use data mining techniques to find methods that are often added at the same time
DynaMine: Our Insight (continued)
Now we know the potential patterns
“Profile” the patterns:
  Run the application
  See how many times each pattern
    hits: number of times a pattern is followed
    misses: number of times a pattern is violated
Based on these statistics, classify the patterns:
  Usage patterns: almost always hold
  Error patterns: violated a large number of times, but still hold most of the time
  Unlikely patterns: not validated enough times
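The classification step can be sketched in a few lines of Java. This is only an illustration: the class name `PatternClassifier` and all three thresholds are assumptions made for the example, not values given in the talk, and treating a pattern that fails more often than it holds as "unlikely" is likewise an assumption.

```java
// Hypothetical sketch of classifying a pattern from its runtime hit/miss counts.
// All thresholds are illustrative assumptions, not values from the talk.
final class PatternClassifier {
    enum Kind { USAGE, ERROR, UNLIKELY }

    static final int MIN_OBSERVATIONS = 10;   // assumed: fewer observations means "not validated enough"
    static final double USAGE_RATIO = 0.9;    // assumed: "almost always holds"
    static final double ERROR_RATIO = 0.5;    // assumed: "still holds most of the time"

    static Kind classify(int hits, int misses) {
        int total = hits + misses;
        if (total < MIN_OBSERVATIONS) {
            return Kind.UNLIKELY;
        }
        double ratio = (double) hits / total;
        if (ratio >= USAGE_RATIO) {
            return Kind.USAGE;
        }
        if (ratio > ERROR_RATIO) {
            return Kind.ERROR;   // violated often, but followed in the majority of cases
        }
        return Kind.UNLIKELY;    // assumed: violated more often than followed
    }

    public static void main(String[] args) {
        System.out.println(classify(100, 1));
        System.out.println(classify(70, 30));
        System.out.println(classify(2, 1));
    }
}
```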
Architecture of DynaMine
[Figure: DynaMine architecture in three stages. Revision history mining: mine CVS histories, sort and filter, patterns. Dynamic analysis: instrument relevant method calls, run the application, post-process. Reporting: usage patterns, error patterns, unlikely patterns; report patterns, report bugs.]
Mining approach
Mining Basics
Rely on co-change
Simplification: look at method calls only
Look for interesting patterns in the way methods are called
Example: a sequence of revisions of files Foo.java, Bar.java, Baz.java, Qux.java

File      Revision  Method calls added
Foo.java  1.12      o1.addListener, o1.removeListener
Bar.java  1.47      o2.addListener, o2.removeListener, System.out.println
Baz.java  1.23      o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
Qux.java  1.41      o4.addListener, System.out.println
Qux.java  1.42      o4.removeListener
Mining Matching Method Calls
Use our observation: methods that are frequently added simultaneously often represent a usage pattern
For instance: … addListener(…); … removeListener(…); …
Data Mining Summary
We consider method calls added in each check-in
We want to find patterns of method calls
  Too many potential patterns to consider
  Want to filter and rank them
  Use support and confidence for that
Support and confidence of each pattern
  Standard metrics used in data mining
  Support reflects how many times each pair appears
  Confidence reflects how strongly a particular pair is correlated
  Refer to the paper for details
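As a rough illustration of the two metrics, the sketch below uses the textbook association-rule definitions: support counts the check-ins that add both methods, and confidence of a ⇒ b divides that count by the number of check-ins that add a. The paper's exact definitions may differ; the class `PairMetrics` and the representation of a check-in as a set of method names (receivers dropped) are assumptions made for this example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: textbook support/confidence over check-ins.
// Each check-in is modeled as the set of method names it adds (an assumption).
final class PairMetrics {
    // Support: number of check-ins that add both a and b.
    static int support(List<Set<String>> checkIns, String a, String b) {
        int count = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a) && added.contains(b)) {
                count++;
            }
        }
        return count;
    }

    // Confidence of a => b: fraction of check-ins adding a that also add b.
    static double confidence(List<Set<String>> checkIns, String a, String b) {
        int withA = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a)) {
                withA++;
            }
        }
        return withA == 0 ? 0.0 : (double) support(checkIns, a, b) / withA;
    }

    public static void main(String[] args) {
        // The five revisions of the running example, receivers dropped.
        List<Set<String>> checkIns = List.of(
                Set.of("addListener", "removeListener"),
                Set.of("addListener", "removeListener", "println"),
                Set.of("addListener", "removeListener", "iterator", "hasNext", "next"),
                Set.of("addListener", "println"),
                Set.of("removeListener"));
        System.out.println(support(checkIns, "addListener", "removeListener"));
        System.out.println(confidence(checkIns, "addListener", "removeListener"));
    }
}
```

Confidence is directional, so a pair is usually scored in both directions (a ⇒ b and b ⇒ a).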
Improvements Over the Traditional Approach
Default data mining approach doesn’t quite work
  Filters based on confidence and support
  Still too many potential patterns!
1. Filtering: consider only patterns with the same initial subsequence as potential patterns
2. Ranking: use one-line “fixes” to find likely error patterns
Matching Initial Call Sequences
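Candidate pairs per revision, before and after keeping only calls with the same initial call sequence:

Foo.java 1.12: o1.addListener, o1.removeListener
  1 pair
Bar.java 1.47: o2.addListener, o2.removeListener, System.out.println
  3 pairs before filtering, 1 pair after
Baz.java 1.23: o3.addListener, o3.removeListener, list.iterator, iter.hasNext, iter.next
  10 pairs before filtering, 2 pairs after
Qux.java 1.41: o4.addListener, System.out.println
  1 pair before filtering, 0 pairs after
Qux.java 1.42: o4.removeListener
  0 pairs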
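The pair counts above can be reproduced with a small sketch. It assumes that the "initial call sequence" of a call is everything before the last dot (so `o3.addListener` has the initial sequence `o3`), and that a check-in is given as a list of call strings; the class `InitialSequenceFilter` is hypothetical.

```java
import java.util.List;

// Hypothetical sketch of filtering candidate pairs by matching initial call sequence.
final class InitialSequenceFilter {
    // Assumption: the initial sequence is the text before the last '.'.
    // Calls without a dot share the empty initial sequence.
    static String initialSequence(String call) {
        int dot = call.lastIndexOf('.');
        return dot < 0 ? "" : call.substring(0, dot);
    }

    // Number of unordered pairs among n added calls: n * (n - 1) / 2.
    static int allPairs(List<String> calls) {
        int n = calls.size();
        return n * (n - 1) / 2;
    }

    // Number of unordered pairs whose calls share the same initial sequence.
    static int matchingPairs(List<String> calls) {
        int count = 0;
        for (int i = 0; i < calls.size(); i++) {
            for (int j = i + 1; j < calls.size(); j++) {
                if (initialSequence(calls.get(i)).equals(initialSequence(calls.get(j)))) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> baz = List.of("o3.addListener", "o3.removeListener",
                "list.iterator", "iter.hasNext", "iter.next");
        System.out.println(allPairs(baz) + " -> " + matchingPairs(baz));
    }
}
```

For Baz.java the two surviving pairs are the listener pair on `o3` and the `hasNext`/`next` pair on `iter`; `list.iterator` has no partner with the same initial sequence.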
Using Fixes to Rank Patterns
Look for one-call additions, which likely indicate fixes
Rank patterns with such methods higher
In the running example, revision 1.42 of Qux.java adds only o4.removeListener. This is a fix! Move patterns containing removeListener up
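A minimal sketch of this ranking idea follows. It assumes check-ins and patterns are both lists of method names with receivers dropped, and it simply moves patterns that mention a "fix" method ahead of the rest while keeping the existing order otherwise; the class `FixRanker` and this two-level ordering are illustrative assumptions, not the ranking formula from the paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: use one-call check-ins as evidence of fixes when ranking patterns.
final class FixRanker {
    // Methods that were the only call added in some check-in.
    static Set<String> fixMethods(List<List<String>> checkIns) {
        Set<String> fixes = new HashSet<>();
        for (List<String> added : checkIns) {
            if (added.size() == 1) {
                fixes.add(added.get(0));
            }
        }
        return fixes;
    }

    // Patterns mentioning a fix method come first; List.sort is stable,
    // so the previous order is preserved within each group.
    static List<List<String>> rank(List<List<String>> patterns, Set<String> fixes) {
        List<List<String>> ranked = new ArrayList<>(patterns);
        ranked.sort(Comparator.comparing((List<String> p) -> Collections.disjoint(p, fixes)));
        return ranked;
    }

    public static void main(String[] args) {
        Set<String> fixes = fixMethods(List.of(
                List.of("addListener", "println"),   // like Qux.java 1.41
                List.of("removeListener")));         // like Qux.java 1.42: a one-call addition
        System.out.println(rank(List.of(
                List.of("hasNext", "next"),
                List.of("addListener", "removeListener")), fixes));
    }
}
```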
Applications under Study
Apply these ideas to the revision histories of Eclipse and jEdit
Very large open-source projects
Many people working on both, all over the planet: 122 on Eclipse, 92 on jEdit
Many check-ins: Eclipse 2,837,854; jEdit 144,495
Long histories: Eclipse since 2001, jEdit since 2000
Some patterns
(as promised)
Categories of Patterns
Method calls during execution:
  Care about the methods
  Care about the order
  Care about the parameters/return values
Here are some common cases:
  Matching method pairs
  State machines
  More complex patterns
Some Interesting Method Pairs (1)
kEventControlActivate kEventControlDeactivate
addDebugEventListener removeDebugEventListener
beginRule endRule
suspend resume
NewPtr DisposePtr
addListener removeListener
register deregister
addElementChangedListener removeElementChangedListener
addResourceChangeListener removeResourceChangeListener
addPropertyChangeListener removePropertyChangeListener
createPropertyList reapPropertyList
preReplaceChild postReplaceChild
addWidget removeWidget
stopMeasuring commitMeasurements
blockSignal unblockSignal
HLock HUnlock
OpenEvent fireOpen
…
Some Interesting Method Pairs (2)
Highlighted pair from the list above: addWidget / removeWidget
Register/unregister the current widget with the parent display object for subsequent event forwarding
Some Interesting Method Pairs (3)
Highlighted pair from the list above: addPropertyChangeListener / removePropertyChangeListener
Add/remove a listener for a particular kind of GUI event
Some Interesting Method Pairs (4)
Highlighted pair from the list above: HLock / HUnlock
Use the OS native locking mechanism for resources such as icons, etc.
State Machines
Order captured by a state machine
Must be followed precisely: omitting or repeating a method call is a sign of error
Simplest formalism for describing the object life-cycle
Matching method pairs are a specific case
Very common in C: consider OS code
Less common in Java, but…
State Machines (1)
o.enterAlignment [o.redoAlignment] o.exitAlignment
Part of org.eclipse.jdt.internal.formatter.Scribe, which is responsible for pretty-printing of code
enterAlignment/exitAlignment pairs must match
redoAlignment is invoked in exception cases
State Machines (2)
o.beginCompoundEdit() (o.insert(...) | o.remove(...))+ o.endCompoundEdit()
Compound edits within jEdit: can be undone at once
beginCompoundEdit/endCompoundEdit act as brackets
Other operations in between
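The compound-edit pattern can be checked with a three-state machine over a stream of method names. The sketch below is an assumption-laden illustration: the class `CompoundEditChecker` is hypothetical, events are plain method-name strings for a single object, and calls outside the brackets (including stray `insert`/`remove`) are ignored rather than flagged.

```java
import java.util.List;

// Hypothetical sketch: check beginCompoundEdit (insert | remove)+ endCompoundEdit
// over the calls observed on one object.
final class CompoundEditChecker {
    enum State { OUTSIDE, OPENED, EDITING }

    static boolean follows(List<String> events) {
        State state = State.OUTSIDE;
        for (String event : events) {
            switch (event) {
                case "beginCompoundEdit":
                    if (state != State.OUTSIDE) return false;   // nested or repeated begin
                    state = State.OPENED;
                    break;
                case "insert":
                case "remove":
                    if (state != State.OUTSIDE) state = State.EDITING;
                    break;                                      // edits outside brackets are ignored
                case "endCompoundEdit":
                    if (state != State.EDITING) return false;   // end without begin, or no edit in between
                    state = State.OUTSIDE;
                    break;
                default:
                    break;                                      // unrelated calls are ignored
            }
        }
        return state == State.OUTSIDE;                          // an open bracket must be closed
    }

    public static void main(String[] args) {
        System.out.println(follows(List.of("beginCompoundEdit", "insert", "remove", "endCompoundEdit")));
        System.out.println(follows(List.of("beginCompoundEdit", "insert")));
    }
}
```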
State Machines (3)
OS.PmMemCreateMC [OS.PmMemStart OS.PmMemFlush OS.PmMemStop] OS.PmMemReleaseMC
Memory context manipulation (like memory pools)
Wrappers around underlying OS functionality
The middle part of the pattern is optional
More Complex Stuff (1)
try {
    monitor.beginTask(null, Policy.totalWork);
    int depth = -1;
    try {
        workspace.prepareOperation(null, monitor);
        workspace.beginOperation(true);
        depth = workspace.getWorkManager().beginUnprotected();
        return runInWorkspace(Policy.subMonitorFor(monitor, Policy.opWork,
                SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
    } catch (OperationCanceledException e) {
        workspace.getWorkManager().operationCanceled();
        return Status.CANCEL_STATUS;
    } finally {
        if (depth >= 0)
            workspace.getWorkManager().endUnprotected(depth);
        workspace.endOperation(null, false,
                Policy.subMonitorFor(monitor, Policy.endOpWork));
    }
} catch (CoreException e) {
    return e.getStatus();
} finally {
    monitor.done();
}
Grammar for Workspace Transactions
Requires human intelligence
Requires a lot of it
Is actually an excellent pattern: haven’t seen runtime violations
S → O
O → w.prepareOperation()
    w.beginOperation()
    U
    w.endOperation()
U → w.getWorkManager().beginUnprotected()
    S
    [w.getWorkManager().operationCanceled()]
    w.getWorkManager().endUnprotected()
Dynamic checking
Dynamically Check the Patterns
Home-grown bytecode instrumentor
  Get a list of matching patterns
  Instrument calls to any of the methods to dump parameters
Post-processing of the output
  Process a stream of events
  Find and count matches and mismatches
Example event stream: … o.register(d) … o.deregister(d) … o.deregister(d)
[Figure labels on the calls in the example: matched, mismatched, ???]
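The post-processing step for a matching method pair can be sketched as a counter over the event stream. Everything here is an illustrative assumption rather than the tool's actual logic: the class `PairMatcher`, the string key that combines receiver and argument (such as "o:d"), and the choice to report opening calls that are never closed separately instead of counting them as mismatches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count matches and mismatches for one method pair
// (for example register/deregister) over a stream of instrumented calls.
final class PairMatcher {
    private final String open;
    private final String close;
    private final Map<String, Integer> pending = new HashMap<>();
    int matched;
    int mismatched;

    PairMatcher(String open, String close) {
        this.open = open;
        this.close = close;
    }

    // key identifies the receiver and argument of the call, e.g. "o:d" (an assumed encoding).
    void onCall(String method, String key) {
        if (method.equals(open)) {
            pending.merge(key, 1, Integer::sum);
        } else if (method.equals(close)) {
            int outstanding = pending.getOrDefault(key, 0);
            if (outstanding > 0) {
                pending.put(key, outstanding - 1);
                matched++;          // closes an earlier opening call on the same key
            } else {
                mismatched++;       // closing call with nothing left to close
            }
        }
    }

    // Opening calls that were never closed by the end of the run.
    int unclosed() {
        int total = 0;
        for (int count : pending.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        PairMatcher matcher = new PairMatcher("register", "deregister");
        matcher.onCall("register", "o:d");
        matcher.onCall("deregister", "o:d");
        matcher.onCall("deregister", "o:d");
        System.out.println(matcher.matched + " matched, " + matcher.mismatched + " mismatched");
    }
}
```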
Experiments
Experimental Setup
Applied to Eclipse and jEdit
  3,600,000 lines of Java code combined
  Included many plugins
Times:
  6 days to fetch and process CVS histories
  30 minutes to compute the patterns
  An hour to instrument
  15 minutes to run
  And we are done!
Experimental Summary
Pattern classification: 56 patterns total
  13 are usage patterns
  8 are error patterns
  11 are unlikely patterns
  24 were not hit at runtime
Error patterns resulted in a total of 264 dynamically confirmed pattern violations
Summary
Knowing code patterns is important
We explored using software histories:
  Co-change often indicates patterns
  Use previous fixes (one-line changes) to drive error patterns
Found interesting patterns:
  Matching method pairs
  State machines
  More complex stuff
Confirmed valid patterns
Found pattern violations at runtime
We have a paper in FSE 2005