End-User Program Analysis
Bor-Yuh Evan Chang, University of California, Berkeley
Dissertation Talk, August 28, 2008
Advisor: George C. Necula, Collaborator: Xavier Rival (INRIA)
2Bor-Yuh Evan Chang - End-User Program Analysis
Software errors cost a lot
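~$60 billion annually (~0.5% of US GDP), according to a 2002 National Institute of Standards and Technology report
[Figure: the cost compared with more than the total annual revenue of … and more than 10x the annual budget of …; the logos are not recoverable]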
But there’s hope in program analysis
Microsoft uses and distributes the Static Driver Verifier
Airbus applies the Astrée Static Analyzer
Companies such as Coverity and Fortify market static source code analysis tools
Because program analysis can eliminate entire classes of bugs
For example:
– Reading from a closed file: read( );
– Reacquiring a locked lock: acquire( );
How?
– Systematically examine the program
– Simulate running the program on “all inputs”
– “Automated code review”
Program analysis by example: Checking for double acquires
Simulate running the program on “all inputs”
… code …
// x now points to an unlocked lock
acquire(x);
… code …
[Figure: the analysis state, in which x points to an unlocked lock]
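Simulating the program on “all inputs” can be sketched as a tiny abstract interpreter that tracks, for each lock variable, the set of states the lock may be in. This is a minimal sketch under our own assumptions: the program is a hypothetical list of ("acquire", var), ("release", var), and ("branch", then_stmts, else_stmts) statements, and a variable with no recorded state is treated as possibly locked or unlocked.

```python
from typing import Dict, FrozenSet, List, Tuple

LOCKED, UNLOCKED = "locked", "unlocked"
State = Dict[str, FrozenSet[str]]   # variable -> set of possible lock states
Stmt = Tuple                        # hypothetical statement encoding (see lead-in)

def join(a: State, b: State) -> State:
    """Least upper bound: a lock may be in any state possible on either path."""
    return {v: a.get(v, frozenset()) | b.get(v, frozenset()) for v in a.keys() | b.keys()}

def analyze(stmts: List[Stmt], state: State, alarms: List[str]) -> State:
    """Abstractly execute stmts, recording a warning wherever a double acquire is possible."""
    for stmt in stmts:
        kind = stmt[0]
        if kind == "acquire":
            var = stmt[1]
            # Unknown variables may be in either state.
            if LOCKED in state.get(var, frozenset({LOCKED, UNLOCKED})):
                alarms.append(f"possible double acquire of {var}")
            state = {**state, var: frozenset({LOCKED})}
        elif kind == "release":
            state = {**state, stmt[1]: frozenset({UNLOCKED})}
        elif kind == "branch":
            # Both arms start from the same state; their results are merged.
            state = join(analyze(stmt[1], state, alarms), analyze(stmt[2], state, alarms))
    return state
```

Starting from {"x": frozenset({UNLOCKED})}, a single acquire("x") raises no alarm, while acquiring x after a branch that acquired it on only one arm is flagged, because the merged state says x may be locked.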
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: the ideal analysis state, in which x points to an unlocked node of a list with one node, or two, or three, or …]
The ideal state is an unbounded disjunction, so computing it exactly runs into undecidability.
Must abstract
[Figure: the ideal analysis state (x in a list of one, two, three, … nodes) next to a finite analysis state in which the list is summarized; is x still known to be unlocked?]
For decidability, the analysis must abstract, that is, “model all inputs” with finitely many descriptions (e.g., by merging objects).
If the abstraction is too coarse or not precise enough (e.g., it loses the fact that x is always unlocked), the analysis mislabels good code as buggy.
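Why merging objects can cost precision can be shown with a deliberately coarse, hypothetical abstraction: all nodes of the list collapse into one summary node that records only which lock states occur among them. Because x may point to any node of the summary, acquire(x) is flagged as soon as some other node in the list is locked.

```python
from typing import FrozenSet, List

LOCKED, UNLOCKED = "locked", "unlocked"

def summarize(node_states: List[str]) -> FrozenSet[str]:
    """Merge all list nodes into one summary node, keeping only which states occur."""
    return frozenset(node_states)

def may_double_acquire(summary: FrozenSet[str]) -> bool:
    """x points to *some* node of the summary, so any locked node makes acquire(x) suspicious."""
    return LOCKED in summary

# Concretely, x points to node 0, which is unlocked, so acquire(x) is safe.
concrete = [UNLOCKED, LOCKED, UNLOCKED]
x_index = 0
really_buggy = concrete[x_index] == LOCKED          # False: the code is fine
flagged = may_double_acquire(summarize(concrete))   # True: a false alarm
```

The summary has forgotten which node x points to, which is exactly the “lost x is always unlocked” situation described above.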
To address the precision challenge
Traditional program analysis mentality:
“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.”
End-user approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?”
Summary of overview
Challenge in analysis: finding a good abstraction that is precise enough, but not more than necessary
– Powerful, generic abstractions: expensive, hard to use and understand
– Built-in, default abstractions: often not precise enough (e.g., for data structures)
End-user approach: must involve the user in abstraction, without expecting the user to be a program analysis expert
Overview of contributions
Extensible Inductive Shape Analysis [POPL’08, SAS’07]
– Precise: infers data structure properties; able to check, for instance, the locking example
– Targeted to software developers: uses data structure checking code for guidance; turns testing code into a specification for static analysis
– Efficient: ~10-100x speed-up over generic approaches; builds the abstraction out of developer-supplied checking code
Extensible Inductive Shape Analysis
Precise inference of data structure properties
End-user approach
[POPL’08, SAS’07]
…
Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
…
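Shape analysis by example: Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
[Figure, three columns: Example/Testing, Code Review, Static Analysis. Example/Testing shows concrete lists: l = 2, 2, 4, 4 before the loop; l = 2, 4, 4 with cur partway through; l = 2, 4 after. Code Review and Static Analysis describe l as a “sorted dl list” before and as having “no duplicates” after; these properties are program-specific. The intermediate state is more complicated: a “segment with no duplicates” from l to cur, followed by a “sorted dl list” from cur.]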
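The Example/Testing column corresponds to ordinary code. A hypothetical Python version follows (Node, from_values, to_values, and remove_duplicates are our names): it walks the sorted doubly-linked list with cur and unlinks cur whenever it equals its predecessor, which suffices because duplicates in a sorted list are adjacent.

```python
from typing import List, Optional

class Node:
    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None

def from_values(values: List[int]) -> Optional[Node]:
    """Build a doubly-linked list and return its head."""
    head: Optional[Node] = None
    last: Optional[Node] = None
    for v in values:
        node = Node(v)
        node.prev = last
        if last is None:
            head = node
        else:
            last.next = node
        last = node
    return head

def to_values(head: Optional[Node]) -> List[int]:
    out: List[int] = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def remove_duplicates(l: Optional[Node]) -> Optional[Node]:
    """For each node cur after the head, unlink cur if it equals its predecessor."""
    cur = l.next if l is not None else None
    while cur is not None:
        nxt = cur.next
        if cur.value == cur.prev.value:
            # The update the analysis must interpret: cur.prev.next = cur.next.
            cur.prev.next = nxt
            if nxt is not None:
                nxt.prev = cur.prev
        cur = nxt
    return l
```

The head is never removed, so the function returns l itself as the head of the deduplicated list.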
Shape analysis is not yet practical
Choosing the heap abstraction is difficult for precision.
Traditional approaches:
– Built-in high-level predicates (Space Invader [Distefano et al.]): - Hard to extend; + No additional user effort (if precise enough)
– Parametric in low-level, analyzer-oriented predicates (TVLA [Sagiv et al.]): + Very general and expressive; - Hard for non-experts
End-user approach:
– Parametric in high-level, developer-oriented predicates (Xisa): + Extensible; + Targeted to developers
Key insight for being developer-friendly and efficient: utilize “run-time checking code” as a specification for static analysis.
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
[Figure: the list l before the loop, l and cur during the loop, and l after the loop]
checker:
dll(h, p) =
  if (h = null) then
    true
  else
    h→prev = p and dll(h→next, h)
• p specifies where prev should point
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
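Read as ordinary code, the checker is a recursive run-time test; h→next in the slides reads field next of the object at h (h.next in Python). The transcription below is a sketch: sorted_dll and sorted_dll_nodup are our hypothetical stand-ins for the assertions above, whose remaining arguments the slide elides as …, obtained by adding an ordering check to dll.

```python
from typing import Optional

class Node:
    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None

def dll(h: Optional[Node], p: Optional[Node]) -> bool:
    """dll(h, p): from h, every node's prev points to the node before it (p for h)."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)

def sorted_dll(h: Optional[Node], p: Optional[Node], strict: bool = False) -> bool:
    """dll plus non-decreasing values (strictly increasing, i.e. no duplicates, if strict)."""
    if h is None:
        return True
    if h.next is not None:
        if h.value > h.next.value or (strict and h.value == h.next.value):
            return False
    return h.prev is p and sorted_dll(h.next, h, strict)

def sorted_dll_nodup(h: Optional[Node], p: Optional[Node]) -> bool:
    return sorted_dll(h, p, strict=True)

def link(*values: int) -> Optional[Node]:
    """Helper: build a well-formed doubly-linked list from values."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes[0] if nodes else None
```

Because the checker only reads fields and recurses, it can run as a test assertion and, as the slides argue, also serve as the analysis's description of the data structure.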
Our framework is …
• Extensible and targeted for developers
– Parametric in developer-supplied checkers
• Precise yet compact abstraction for efficiency
– Data structure-specific, based on properties of interest to the developer
An automated shape analysis with a precise memory abstraction based around invariant checkers.
[Figure: developer-supplied checkers, such as dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h), are the input to the shape analyzer]
Shape analysis is an abstract interpretation on abstract memory descriptions with
– splitting of summaries, to reflect updates precisely,
– and summarizing, for termination.
[Figure: a sequence of abstract memory graphs for the list l as cur advances]
Outline
[Figure: developer-supplied checkers, such as dll(h, p), feed the shape analyzer, which combines abstract interpretation (splitting and interpreting updates; summarizing) with type inference on checker definitions; the three parts are numbered 1 to 3]
Learn information about the checker to use it as an abstraction.
Compare and contrast manual code review and our automated shape analysis.
Overview: Split summaries to interpret updates precisely
We want the abstract update to be “exact”, that is, to update one “concrete memory cell”. The example at a high level: iterate using cur, changing the doubly-linked list from purple to red.
[Figure: the list l with cur inside a purple summary; split at cur; update cur from purple to red]
Challenge: How does the analysis “split” summaries, and how does it know where to “split”?
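“Split forward” by unfolding the inductive definition
dll(h, p) =
  if (h = null) then
    true
  else
    h→prev = p and dll(h→next, h)
[Figure: to interpret get: cur→next, the summary dll(cur, p) is unfolded into two graphs: one where cur is null, and one where cur→prev = p and cur→next = n, with the rest summarized by dll(n, cur)]
Analysis doesn’t forget the empty case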
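Unfolding can be sketched on a symbolic state. In this hypothetical representation (ours, not Xisa's internal one), an abstract state is a list of facts: thin points-to edges, thick dll(h, p) summaries, and (dis)equalities. Unfolding replaces a summary by one state per case of the checker, so the empty case survives as its own state.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional, Union

@dataclass(frozen=True)
class PointsTo:   # thin edge: src→field = dst
    src: str
    field: str
    dst: str

@dataclass(frozen=True)
class Dll:        # thick edge: the summary dll(h, p)
    h: str
    p: str

@dataclass(frozen=True)
class Eq:         # pure fact: lhs = rhs, or lhs != rhs when ne is True
    lhs: str
    rhs: str
    ne: bool = False

Fact = Union[PointsTo, Dll, Eq]
_fresh = count()

def unfold(state: List[Fact], summary: Dll) -> List[List[Fact]]:
    """Split dll(h, p) into its cases: h = null, or h→prev = p, h→next = n, dll(n, h)."""
    rest = [f for f in state if f != summary]
    n = f"n{next(_fresh)}"  # fresh symbolic value standing for h→next
    empty = rest + [Eq(summary.h, "null")]
    nonempty = rest + [Eq(summary.h, "null", ne=True),
                       PointsTo(summary.h, "prev", summary.p),
                       PointsTo(summary.h, "next", n),
                       Dll(n, summary.h)]
    return [empty, nonempty]

def read_field(state: List[Fact], src: str, field: str) -> Optional[str]:
    """Return dst if the state contains the cell src→field = dst, else None."""
    for f in state:
        if isinstance(f, PointsTo) and f.src == src and f.field == field:
            return f.dst
    return None
```

To interpret get: cur→next on [Dll("cur", "p")], the analysis finds no cell, unfolds, reads n from the non-empty state, and handles the state with cur = null separately (for example, a loop guard cur != null rules it out).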
“Split backward” is also possible and necessary
Checker: dll(h, p) as before.
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
The update cur→prev→next = cur→next; requires get: cur→prev→next, a cell that lies behind cur, inside the “dll segment” from l.
[Figure: the state with a “dll segment” from l up to cur and dll(n, cur) after cur; unfolding the segment backward materializes the node before cur (with a new predecessor p′ inside a shorter segment), and the empty case ending in null is kept]
Technical details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: segments are also inductively defined.) [POPL’08]
How does the analysis know to do this unfolding?
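Why backward unfolding is sound can be checked on concrete heaps. The sketch below uses a hypothetical segment checker, seg(h, p, t, q): the nodes from h up to but not including t form a doubly-linked prefix whose first prev is p and whose last node is q. seg_forward peels the first node, as dll does; seg_backward peels the last node. On acyclic lists both accept the same heaps, and a whole list splits into a segment followed by dll. The names and argument order are our assumptions; the slides only state that segments are also inductively defined [POPL’08].

```python
from typing import List, Optional

class Node:
    def __init__(self) -> None:
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None

def dll(h: Optional[Node], p: Optional[Node]) -> bool:
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)

def seg_forward(h: Optional[Node], p: Optional[Node],
                t: Optional[Node], q: Optional[Node]) -> bool:
    """Peel the first node: h→prev = p, then the segment from h→next (prev h) to t."""
    if h is t:
        return p is q      # empty segment: the node before t is p itself
    if h is None:
        return False
    return h.prev is p and seg_forward(h.next, h, t, q)

def seg_backward(h: Optional[Node], p: Optional[Node],
                 t: Optional[Node], q: Optional[Node]) -> bool:
    """Peel the last node q: q→next = t, then the segment from h to q, ending at q→prev."""
    if h is t:
        return p is q
    if q is None:
        return False
    return q.next is t and seg_backward(h, p, q, q.prev)

def make_list(n: int) -> List[Node]:
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes
```

With l = nodes[0] and cur = nodes[3], dll(l, None) holds exactly when seg(l, None, cur, cur.prev) and dll(cur, cur.prev) both hold; peeling the segment from the back is what exposes cur→prev→next.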
Outline
[Figure: developer-supplied checkers, such as dll(h, p), feed the shape analyzer, which combines abstract interpretation (splitting and interpreting updates; summarizing) with type inference on checker definitions; the three parts are numbered 1 to 3]
Contribution: Turns testing code into a specification for static analysis
How do we decide where to unfold?
Type inference on checker definitions derives additional information to guide unfolding.
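Abstract memory as graphs
Checker: dll(h, p) as before.
[Figure: two abstract memory graphs for l and cur. Nodes are memory addresses (values) α, β, γ, δ. Thin edges are memory cells (points-to, e.g., γ→next = δ). Thick edges are summaries: a checker summary (inductive predicate), such as dll(δ, γ), or a segment summary. In the first graph, l = α, cur = γ, γ→prev = β, γ→next = δ, and checker summaries carry labels such as dll(null), dll(β), and dll(γ). In the second graph, endpoints and segments are explicit, yet high-level: a “dll segment” from α to γ, the cell γ→next = δ, and dll(δ, γ).]
Contribution: Generalization of the checker (intuitively, the segment covers dll(α, null) up to dll(γ, β)).
A thick edge summarizes some number of memory cells (thin edges).
Which summary (thick edge), in what direction, and how far do we unfold to get the edge β→next (cur→prev→next)?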
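Types for deciding where to unfold
Checker definition: dll(h, p) as before, with parameter types
h : {next⟨0⟩, prev⟨0⟩}   p : {next⟨-1⟩, prev⟨-1⟩}
Checker “run” (call tree/derivation) for the instance null ← α ⇄ β ⇄ γ ⇄ δ → null:
dll(α, null), dll(β, α), dll(γ, β), dll(δ, γ), dll(null, δ)
[Figure: the run drawn with levels -2, -1, 0, 1 relative to dll(γ, β), next to the instance and its summary graph]
If it exists, where is γ→next? Where is β→next?
The types say:
– For h→next/h→prev, unfold from h (level 0): γ→next comes from unfolding dll(γ, β) itself.
– For p→next/p→prev, unfold before h (level -1): β is p in dll(γ, β), so β→next comes from the call one level earlier, dll(β, α).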
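The level annotations can be read as offsets within the checker run. A minimal sketch, assuming the run is a Python list of (h, p) argument pairs and TYPES hard-codes the types shown on the slide: given the call at index focus, a value bound to parameter x, and a field f, the call to unfold sits at focus plus the level of f in x's type. The function name where and the ASCII value names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Parameter types of dll(h, p): field -> level, as shown on the slide.
TYPES: Dict[str, Dict[str, int]] = {
    "h": {"next": 0, "prev": 0},
    "p": {"next": -1, "prev": -1},
}

Call = Tuple[str, str]   # the (h, p) arguments of one dll call

def where(run: List[Call], focus: int, value: str, field: str) -> Optional[Call]:
    """Return the call in the run whose unfolding materializes value→field, if any."""
    h, p = run[focus]
    if value == h:
        param = "h"
    elif value == p:
        param = "p"
    else:
        return None          # value is not an argument of the focus call
    level = TYPES[param].get(field)
    if level is None:
        return None
    j = focus + level
    return run[j] if 0 <= j < len(run) else None

run = [("alpha", "null"), ("beta", "alpha"), ("gamma", "beta"),
       ("delta", "gamma"), ("null", "delta")]
```

Asking for null→next from dll(alpha, null) lands before the start of the run, so no unfolding can produce that cell; this is one way types identify where unfolding won't help.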
Types make the analysis robust with respect to how checkers are written
Doubly-linked list checker (as before):
dll(h, p) =
  if (h = null) then
    true
  else
    h→prev = p and dll(h→next, h)
h : {next⟨0⟩, prev⟨0⟩}   p : {next⟨-1⟩, prev⟨-1⟩}
Alternative doubly-linked list checker:
dll′(h) =
  if (h→next = null) then
    true
  else
    h→next→prev = h and dll′(h→next)
h : {next⟨0⟩, prev⟨-1⟩}
[Figure: the same instance (β, γ, …, null) summarized with each checker; for γ→prev, the alternative checker's type gives level -1, so the analysis unfolds one call earlier]
Different types lead to different unfolding.
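Both checkers accept the same well-formed, non-empty lists; they differ in where prev is inspected. In this hypothetical Python transcription, dll_alt stands for dll′: dll checks h→prev in the call for h, whereas dll_alt checks it in the previous call through h→next→prev, which is why its type assigns prev level -1.

```python
from typing import List, Optional

class Node:
    def __init__(self) -> None:
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None

def dll(h: Optional[Node], p: Optional[Node]) -> bool:
    """Checks h.prev in the call for h."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)

def dll_alt(h: Node) -> bool:
    """dll′(h), h non-null: checks each node's prev from the call for its predecessor."""
    if h.next is None:
        return True
    return h.next.prev is h and dll_alt(h.next)

def make_list(n: int) -> List[Node]:
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes
```

Note that dll_alt never looks at the first node's prev, while dll(h, p) pins it to p; on lists built with a null first prev, both agree.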
Summary of checker parameter types
Tell where to unfold for which fields
Make analysis robust with respect to how checkers are written
Learn where in summaries unfolding won’t help
Can be inferred automatically with a fixed-point computation on the checker definitions
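A minimal sketch of such a fixed-point computation, under simplifying assumptions of our own: a checker body is summarized by the fields it reads directly on each parameter (x→f), the fields it reads one step further (x→g→f), and the arguments of its single recursive call, each either a parameter y or a field y→g. Levels form a small lattice: a field is absent, has one integer level, or is TOP when derivations disagree. This illustrates the idea; it is not Xisa's actual inference algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

TOP = "TOP"          # levels disagree: no single place to unfold
Level = Union[int, str]

@dataclass
class Checker:
    params: List[str]
    reads: List[Tuple[str, str]]              # x→f read in the body
    nested_reads: List[Tuple[str, str, str]]  # x→g→f read in the body
    call_args: List[Tuple[str, ...]]          # per callee parameter: ("param", y) or ("field", y, g)

def infer_types(c: Checker) -> Dict[str, Dict[str, Level]]:
    types: Dict[str, Dict[str, Level]] = {x: {} for x in c.params}

    def add(x: str, f: str, lvl: Level) -> bool:
        """Join lvl into types[x][f]; return True if the entry changed."""
        old = types[x].get(f)
        if old is None:
            types[x][f] = lvl
            return True
        if old != lvl and old != TOP:
            types[x][f] = TOP
            return True
        return False

    changed = True
    while changed:                 # each entry moves absent -> int -> TOP, so this terminates
        changed = False
        for x, f in c.reads:       # read in the current call: level 0
            changed |= add(x, f, 0)
        for callee_param, arg in zip(c.params, c.call_args):
            if arg[0] == "param":  # callee_param is the caller's y: shift y's levels by -1
                y = arg[1]
                for f, lvl in list(types[y].items()):
                    changed |= add(callee_param, f, TOP if lvl == TOP else lvl - 1)
            elif arg[0] == "field":  # callee_param is the caller's y→g
                _, y, g = arg
                for x, g2, f in c.nested_reads:
                    if (x, g2) == (y, g):  # caller reads callee_param→f: level -1
                        changed |= add(callee_param, f, -1)
    return types

dll = Checker(params=["h", "p"], reads=[("h", "prev"), ("h", "next")],
              nested_reads=[], call_args=[("field", "h", "next"), ("param", "h")])
dll_alt = Checker(params=["h"], reads=[("h", "next")],
                  nested_reads=[("h", "next", "prev")], call_args=[("field", "h", "next")])
```

For dll, the call dll(h→next, h) both reads h→next and passes h as the callee's p, which yields p : {next⟨-1⟩, prev⟨-1⟩}; for dll_alt (dll′), the read h→next→prev yields prev⟨-1⟩, matching the types on the slides.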
Summary of interpreting updates
Splitting of summaries needed for precision
Unfolding checkers is a natural way to do splitting when the checker traversal matches the code traversal
Checker parameter types enable, for example, “back pointer” traversal without blindly guessing where to unfold
Outline
[Figure: developer-supplied checkers, such as dll(h, p), feed the shape analyzer, which combines abstract interpretation (splitting and interpreting updates; summarizing) with type inference on checker definitions; the three parts are numbered 1 to 3]
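Summarize by folding into inductive predicates
last = l;
cur = l→next;
while (cur != null) {
  // … cur, last …
  if (…) last = cur;
  cur = cur→next;
}
[Figure: abstract graphs at the loop head over successive iterations. First, l and last coincide, with one next cell to cur and a list from cur; then two, then three next cells separate l from cur, with last in between. Summarizing yields list summaries from l to last and from last to cur, keeping a next cell between last and cur, followed by a list from cur.]
Challenge: Precision (e.g., keeping the fact that last and cur are separated by at least one step)
Previous approaches guess where to fold for each graph. Contribution: Determine where to fold by comparing graphs across the history.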
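A much-simplified sketch of the history-guided idea, under our own assumptions: each abstract graph is reduced to the gaps between consecutive program variables along the list, where a gap is either an exact number of next cells or a list segment of at least m cells. Comparing each graph with the previous one, only gaps that changed are folded into segments, and a segment keeps the smallest length seen. This resembles a widening with lower bounds and is not Xisa's graph-matching algorithm.

```python
from functools import reduce
from typing import Dict, List, Tuple

Gap = Tuple[str, int]                 # ("exact", n) next cells, or ("seg", m): at least m cells
Graph = Dict[Tuple[str, str], Gap]    # (from variable, to variable) -> gap

def lower_bound(gap: Gap) -> int:
    return gap[1]

def fold(old: Graph, new: Graph) -> Graph:
    """Keep gaps that agree across the history; fold the others into segments."""
    out: Graph = {}
    for key in old:
        if old[key] == new[key]:
            out[key] = old[key]
        else:
            out[key] = ("seg", min(lower_bound(old[key]), lower_bound(new[key])))
    return out

history: List[Graph] = [
    {("l", "last"): ("exact", 0), ("last", "cur"): ("exact", 1)},  # l and last coincide
    {("l", "last"): ("exact", 1), ("last", "cur"): ("exact", 1)},  # last advanced
    {("l", "last"): ("exact", 1), ("last", "cur"): ("exact", 2)},  # only cur advanced
]
summary = reduce(fold, history)
```

The result keeps a segment of at least one cell between last and cur, whereas folding every gap into an unconstrained segment would lose that last and cur are separated by at least one step. Folding the summary with any graph in the history leaves it unchanged, so the iteration has stabilized.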
Summary: Given checkers, everything is automatic
[Figure: developer-supplied checkers, such as dll(h, p), feed the shape analyzer, which combines abstract interpretation (splitting and interpreting updates; summarizing) with type inference on checker definitions]
Results: Performance

| Benchmark | Max. num. graphs at a program point | Analysis time (ms) |
|---|---|---|
| singly-linked list reverse | 1 | 0.6 |
| doubly-linked list reverse | 1 | 1.4 |
| doubly-linked list copy | 2 | 5.3 |
| doubly-linked list remove | 5 | 6.5 |
| doubly-linked list remove and back | 5 | 6.8 |
| search tree with parent insert | 5 | 8.3 |
| search tree with parent insert and back | 5 | 47.0 |
| two-level skip list rebalance | 6 | 87.0 |
| Linux scull driver (894 loc; char arrays ignored, functions inlined) | 4 | 9710.0 |

Times are negligible for the data structure operations (often in sec or 1/10 sec). Comparison callouts on the slide: TVLA: 850 ms; TVLA: 290 ms; Space Invader only analyzes lists (built-in).
Expressiveness: different data structures
Verified: the shape invariant as given by the checker is preserved across the operation.
Demo: Doubly-linked list reversal
http://xisa.cs.berkeley.edu
Body of the loop over the elements: swaps the next and prev fields of curr.
[Figure labels: already reversed segment; node whose next and prev fields were swapped; not yet reversed list]
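The loop body of the demo can be sketched in Python together with a run-time dll checker that tests the shape invariant after the operation. This is a hypothetical transcription (reverse, build, and values_of are our names), not the demo's source.

```python
from typing import List, Optional

class Node:
    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None

def dll(h: Optional[Node], p: Optional[Node]) -> bool:
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)

def reverse(l: Optional[Node]) -> Optional[Node]:
    """Swap next and prev of every node; the last node visited becomes the new head."""
    curr, head = l, None
    while curr is not None:
        curr.next, curr.prev = curr.prev, curr.next   # swap the two fields
        head = curr
        curr = curr.prev   # after the swap, prev holds the old next
    return head

def build(values: List[int]) -> Optional[Node]:
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes[0] if nodes else None

def values_of(h: Optional[Node]) -> List[int]:
    out: List[int] = []
    while h is not None:
        out.append(h.value)
        h = h.next
    return out
```

During the loop, the nodes before curr form the already reversed segment and the nodes after it the not yet reversed list, which mirrors the figure labels.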
Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., a red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types are useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays
– More generic checkers: polymorphic (“element kind unspecified”) and higher-order (parameterized by other predicates)
Future evaluation: user study
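As an illustration of how short such checkers can be, here is a hypothetical red-black tree checker in the same recursive style. It is our own Python version, not the 6-line checker mentioned above, and it checks colors only (not key ordering): it returns the black height of a valid subtree, or -1 when a red node has a red parent or two paths disagree on black height.

```python
from typing import Optional

RED, BLACK = "red", "black"

class Tree:
    def __init__(self, color: str, left: Optional["Tree"] = None,
                 right: Optional["Tree"] = None) -> None:
        self.color, self.left, self.right = color, left, right

def rb(t: Optional[Tree], parent_color: str = BLACK) -> int:
    """Black height of t if t is a valid red-black subtree, else -1."""
    if t is None:
        return 1                      # null leaves count as black
    if t.color == RED and parent_color == RED:
        return -1                     # red node with a red parent
    lh, rh = rb(t.left, t.color), rb(t.right, t.color)
    if lh < 0 or rh < 0 or lh != rh:
        return -1
    return lh + (1 if t.color == BLACK else 0)

def is_red_black(root: Optional[Tree]) -> bool:
    return (root is None or root.color == BLACK) and rb(root) >= 0
```

Like dll(h, p), the checker threads information from parent to child (here, parent_color), which is the kind of parameter the analysis's types describe.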
Summary of Extensible Inductive Shape Analysis
Key insight: Checkers as specifications
– Developer view: global, expressed in a familiar style
– Analysis view: capture developer intent; not arbitrary inductive definitions
Constructing the program analysis
– Intermediate states: generalized segment predicates (e.g., a segment from α to β labeled c(γ) and c′(γ′))
– Splitting: checker parameter types with levels (e.g., h : {next⟨0⟩, prev⟨0⟩}, p : {next⟨-1⟩, prev⟨-1⟩})
– Summarizing: history-guided approach
Conclusion
Extensible Inductive Shape Analysis: precision-demanding program analysis improved by novel user interaction
Developer: Gets results corresponding to intuition
Analysis: Focused on what’s important to the developer
Practical precise tools for better software with an end-user approach!