Murali Krishna Ramanathan
Department of Computer Science, Purdue University
(joint work with Suresh Jagannathan and Ananth Grama)
Static Path-Aware Analysis of Program Invariants
Context
What is a program invariant?
  Property that must hold across all program executions
What is a failure?
  Program run does not satisfy an expected invariant
  System crashes, logical bugs, performance bugs
What is a specification?
  Documentation of intended program invariants, e.g., lock must be followed by unlock
  Unavailable or imprecise
Issues
Deriving specifications
  Where do we start?
  Absence of formal documentation
  Legacy code
Identifying the source of failures
  How do we search?
  Exponential number of execution paths to explore
  Representing common information among paths
Specification Inference
Challenges
  What to look for?
    Both relevant and irrelevant information present in the program source
  How to be robust in the presence of bugs?
Assumptions
  Programs are mostly well tested but can have bugs
  Transparent – no programmer annotations
Kinds of specifications
Control-flow preconditions
  A call to fopen must always precede a call to fgets
Data-flow preconditions
  The result of a call to socket must always be checked for error before a call to bind
Control-flow postconditions
  A call to fopen is either followed by a call to fclose or error
Control-flow divergence preconditions
  A call to read can be preceded either by a call to open or socket
…
Preconditions
Predicate
  Captures properties associated with variables and procedure calls
Preconditions for a procedure
  Composed of predicates that must always hold before every call to the procedure
Example:
  fp = fopen(…);
  if (fp != NULL)
      fgets(buf, SIZE, fp);
Predicates at the call to fgets:
  fp := fopen(…)
  fp != null
  fopen <- fgets
Types of predicates
Control-flow
  Define precedence properties among procedures
  e.g., fgets is preceded by fopen
Data-flow
  Capture data-flow properties associated with variables
  e.g., fp is assigned the return of fopen, fp is not null
Example:
  fp = fopen(…);
  if (fp != NULL)
      fgets(buf, SIZE, fp);
Control-flow preconditions (ICSE 07)
RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    /* "Check that RI trigger function was called in expected context" */
    ri_CheckTrigger(...);
    ...
    /* "Get the relation descriptors of the FK and PK tables…" */
    pk_rel = heap_open(...);
    ...
    /* "Convert the MATCH TYPE string into a switchable int" */
    match_type = ri_DetermineMatchType(...);
    ...
    /* "Build up a new hashtable key for a prepared SPI Plan of a constraint trigger of MATCH FULL …" */
    ri_BuildQueryKeyFull(...);
    ...
}
Control-flow preconditions
RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    ri_CheckTrigger(...);
    ...
    pk_rel = heap_open(...);
    if (TRIGGER_FIRED_BY_UPDATE(...)) ...
    else ...
    if (!HeapTupleSatisfies(...)) ...
    ...
    match_type = ri_DetermineMatchType(...);
    if (match_type == RI_MATCH_TYPE_PARTIAL)
        ereport(...);
    ...
    ri_BuildQueryKeyFull(...);
    ...
}
Control-flow preconditions
RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    ri_CheckTrigger(...);
    ...
    pk_rel = heap_open(...);
    ...
    if (tgnargs == 4)
    {
        ri_BuildQueryKeyFull(...);
        ...
    }
    ...
}
ri_BuildQueryKeyFull not preceded by ri_DetermineMatchType
Leads to a potential crash
Static Specification Mining
To generate preconditions for a procedure
  Generate predicates at each call-site of the procedure
  Ideally, common predicates across all the call-sites form the preconditions for the procedure
How to find common predicates?
  Use mining techniques
  Construct patterns built from alignments or permutations of predicate sets
  Approximation: patterns appearing in programs denote preconditions
Approach
Analyze control-flow graph
  Build precedence relation (a <- b)
    A binary relation between procedures a and b
    A call to b is always preceded by a call to a
Necessitates an inter-procedural analysis
  Relations can cross procedure boundaries
  Convergence requires fixpoint calculation
  Procedure signatures
Frequent subsequence mining
  Mine the chains formed by precedence relations
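One way to picture the precedence relation is as a forward "must" dataflow problem over a single control-flow graph: the set of calls that always precede a node is the intersection of what precedes each of its predecessors, iterated to a fixpoint. The sketch below is a minimal intraprocedural illustration in Python, not the authors' implementation; the node names (`check`, `open`, `early`, `match`, `late`, `exit`) and the graph shape are hypothetical and only loosely modelled on the RI_FKey_check slides.

```python
def calls_before(cfg, entry, call_at):
    """For each node, return the procedures called on EVERY path from entry to it."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: [] for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    all_procs = {p for p in call_at.values() if p is not None}
    # Start from "everything" and shrink until nothing changes (greatest fixpoint).
    before = {n: set(all_procs) for n in nodes}
    before[entry] = set()
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [
                before[p] | ({call_at[p]} if call_at.get(p) else set())
                for p in preds[n]
            ]
            # A call must precede n only if it precedes n along every incoming edge.
            new = set.intersection(*incoming) if incoming else set()
            if new != before[n]:
                before[n] = new
                changed = True
    return before


# Hypothetical CFG: one branch calls ri_BuildQueryKeyFull early, the other
# goes through ri_DetermineMatchType first.
cfg = {
    "check": ["open"],
    "open": ["early", "match"],
    "early": ["exit"],
    "match": ["late"],
    "late": ["exit"],
    "exit": [],
}
call_at = {
    "check": "ri_CheckTrigger",
    "open": "heap_open",
    "early": "ri_BuildQueryKeyFull",
    "match": "ri_DetermineMatchType",
    "late": "ri_BuildQueryKeyFull",
    "exit": None,
}
before = calls_before(cfg, "check", call_at)
```

In this toy graph, ri_DetermineMatchType is in `before["late"]` but not in `before["early"]`, which is the kind of per-call-site difference that mining can then flag as a deviation.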
Path Exploration
[Figure: control-flow graph over nodes p, q and r]
Path-Sensitive Exploration: q <- p, q <- r <- p
Path-Insensitive Exploration: q, r <- p
Path-Aware Exploration: q <- p
Inter-procedural Analysis
h() {
    if (cond) lwrap();
    else lwrap();
    …
    uwrap();
}
lwrap() { init(); }
uwrap() { access(); }
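To see why the relation has to cross procedure boundaries, the wrapper example can be modelled as call lists and expanded. The Python sketch below is an illustration under simplifying assumptions, not the procedure-signature mechanism itself: each procedure body is reduced to a straight-line list of callees (both branches of h call lwrap, so one entry suffices), recursion is rejected, and `expand_calls` and `bodies` are hypothetical names.

```python
def expand_calls(proc, bodies, stack=()):
    """Replace calls to known procedures by the calls they make, recursively."""
    if proc in stack:
        raise ValueError("recursion is not handled in this sketch")
    trace = []
    for callee in bodies[proc]:
        if callee in bodies:
            # Known procedure: splice in whatever it calls.
            trace.extend(expand_calls(callee, bodies, stack + (proc,)))
        else:
            # Leaf call with no modelled body.
            trace.append(callee)
    return trace


bodies = {
    "h": ["lwrap", "uwrap"],
    "lwrap": ["init"],
    "uwrap": ["access"],
}
trace = expand_calls("h", bodies)
```

Looking only inside h, neither init nor access is visible; after expansion the trace shows init before access.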
Procedure Signatures
[Figure: graphs over nodes entry, p, q, r, s, t, u and ret]
Procedure signature for s: q <- p
s <- t
s <- q <- p <- t
Mining sequences
Sequence mining:
  Input: set of sequences (I)
  Output: sequences that occur ‘frequently’ as subsequences in I
  Use the Apriori-all algorithm [Agrawal and Srikant, Mining Sequential Patterns, ICDE ’95]
Motivation for sequence mining
Control paths:
  a, b, c, e
  g, a, d, c, e
  a, c, e
  a, c, d, e, f
  e, f, d, a   (Faulty path, no call to a and c before e)
Invariant: a <- c <- e
Intersection of these paths: e is preceded by nothing
Use mining to overcome brittleness of path intersection
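The brittleness of plain intersection can be checked directly on the five control paths listed on this slide. The following Python sketch is only an illustration; `called_before` is a hypothetical helper that collects the calls appearing before the first occurrence of a target on one path.

```python
paths = [
    ["a", "b", "c", "e"],
    ["g", "a", "d", "c", "e"],
    ["a", "c", "e"],
    ["a", "c", "d", "e", "f"],
    ["e", "f", "d", "a"],  # the faulty path
]


def called_before(path, target):
    """Calls that appear before the first occurrence of target on this path."""
    return set(path[: path.index(target)])


per_path = [called_before(p, "e") for p in paths]

# Intersecting over all paths: the single faulty path wipes out everything.
common = set.intersection(*per_path)

# Intersecting over the four well-behaved paths recovers a and c.
common_without_faulty = set.intersection(*per_path[:-1])
```

One deviant path is enough to empty the intersection, which is why a frequency threshold is used instead.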
Sequence Mining - Example
Input sequences (Min Frequency: 4/5):
  a, b, c, e
  g, a, d, c, e
  a, c, e
  a, c, d, e, f
  e, f, d, a
Frequent subsequences and their counts:
  a        5
  c        4
  e        5
  a, c     4
  a, e     4
  c, e     4
  a, c, e  4   (Maximal)
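The counts in this example can be reproduced with a small level-wise search. The Python sketch below is a simplified, brute-force stand-in for Apriori-all rather than the algorithm from the cited paper: it grows candidate patterns one item at a time and keeps those contained as subsequences in at least `min_count` inputs. All function names are hypothetical.

```python
def is_subsequence(pattern, seq):
    """True if pattern's items occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def frequent_subsequences(sequences, min_count):
    def support(pattern):
        return sum(is_subsequence(pattern, s) for s in sequences)

    items = sorted({x for s in sequences for x in s})
    frequent = {}
    level = [(x,) for x in items]
    while level:
        kept = {}
        for pattern in level:
            count = support(pattern)
            if count >= min_count:
                kept[pattern] = count
        frequent.update(kept)
        # Extend each surviving pattern by every frequent single item.
        singles = [p[0] for p in frequent if len(p) == 1]
        level = [p + (x,) for p in kept for x in singles]
    return frequent


def maximal(frequent):
    """Frequent patterns that are not contained in a different frequent pattern."""
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


sequences = [
    ["a", "b", "c", "e"],
    ["g", "a", "d", "c", "e"],
    ["a", "c", "e"],
    ["a", "c", "d", "e", "f"],
    ["e", "f", "d", "a"],
]
frequent = frequent_subsequences(sequences, min_count=4)
maximal_patterns = maximal(frequent)
```

With a minimum frequency of 4 out of 5, the only maximal pattern is (a, c, e), so the faulty fifth path no longer hides the invariant.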
Data-flow preconditions (PLDI 07)
Challenges
  Data-flow predicates may be aliased
  No anchors for data-flow predicates
Examples:
  if (x > 0) f(x);
  if (y > 0) f(y);
  x = g(…); h(x); if (x > 0) f(x);
Motivating Example
main(…) {
    for (ai = options.listen_addrs; …) {
        listen_sock = socket(ai->ai_family, …);
        if (listen_sock < 0) error();
        if (num_listen_socks >= 16) error();
        if ((ret = getnameinfo(…))) …
        if (setsockopt(listen_sock, …) == -1) error();
        if (bind(listen_sock, ai->ai_addr, …) < 0) …
    }
}
• In a call to bind, the first parameter is always assigned the return value of a call to socket and is checked for error
Generate Predicates
main(…) {
    for (ai = options.listen_addrs; …) {
        listen_sock = socket(ai->ai_family, …);
        if (listen_sock < 0) error();
        if (num_listen_socks >= 16) error();
        if ((ret = getnameinfo(…))) …
        if (setsockopt(listen_sock, …) == -1) error();
        if (bind(listen_sock, ai->ai_addr, …) < 0) …
    }
}
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
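As a rough illustration of how such attribute sets could be collected, the Python sketch below walks a hand-written list of events for the loop body above. The event encoding (`assign_call`, `error_if`, `param`) is a hypothetical simplification, not the representation used in the actual analysis. A check that leads to error() contributes the negated condition, since that is what holds on the path that continues to bind.

```python
# Condition that holds on the fall-through path when the tested one leads to error().
NEGATE = {"<": ">=", ">=": "<", ">": "<=", "<=": ">", "==": "!=", "!=": "=="}


def attribute_sets(events):
    """Group data-flow facts by the variable they describe."""
    attrs = {}
    for event in events:
        kind, var = event[0], event[1]
        facts = attrs.setdefault(var, set())
        if kind == "assign_call":      # var = callee(...)
            facts.add(f"return({event[2]})")
        elif kind == "error_if":       # if (var op const) error();
            op, const = event[2], event[3]
            facts.add(f"({NEGATE[op]}, {const})")
        elif kind == "param":          # callee(..., var, ...) at a 1-based position
            facts.add(f"(param_{event[2]}, {event[3]})")
    return attrs


# Hand-written events for the loop body of the motivating example.
events = [
    ("assign_call", "listen_sock", "socket"),
    ("error_if", "listen_sock", "<", 0),
    ("error_if", "num_listen_socks", ">=", 16),
    ("assign_call", "ret", "getnameinfo"),
    ("param", "listen_sock", 1, "setsockopt"),
    ("param", "listen_sock", 1, "bind"),
]
attrs = attribute_sets(events)
```

The resulting sets match the three listed on the slide; producing the events from real C code is the part this sketch leaves out.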
Another call-site
ssh_control_listener(void) {
    if ((control_fd = socket(PF_UNIX, …)) < 0)
        error();
    old_umask = umask(0177);
    if (bind(control_fd, (struct sockaddr *)&addr, …))
        …
control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)
Structural Similarity Problem
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)
How to group the attribute sets that need to be mined together?
  Find maximal matching of attribute sets
    NP-hard
  Use approximations based on program structures
Approximations
Type
  Attribute sets divided based on type of variable
Parameter
  Supplied as arguments to the same parameter for any given procedure
Result
  Variables that are assigned the return values of the same function
…
Example revisited
Variable names are not comparable
  Use positional information
Different number of attributes
  Interspersed with irrelevant operations
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)
Is intersection robust?
Same limitations as with control-flow preconditions
Adopt frequent itemset mining
  Order of events is less critical
  Aggregate collection of data-flow facts at call-sites
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
control_fd: return(socket), (param_1, bind), (>=, 0)
sockfd: return(socket), (param_1, bind)
Precondition: return(socket), (param_1, bind)
(>=, 0) missing!
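The contrast between intersection and frequent itemset mining can be made concrete on the three attribute sets from this slide. The Python sketch below uses a brute-force enumeration rather than an optimized miner, and the support threshold of 2 out of 3 call-sites is an assumption chosen for illustration; the slides do not state one for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Enumerate itemsets contained in at least min_count transactions."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            count = sum(set(combo) <= t for t in transactions)
            if count >= min_count:
                result[frozenset(combo)] = count
                found = True
        if not found:
            break  # no frequent set of size k, so none of size k + 1 either
    return result


sites = {
    "listen_sock": {"return(socket)", "(param_1, bind)", "(param_1, setsockopt)", "(>=, 0)"},
    "control_fd": {"return(socket)", "(param_1, bind)", "(>=, 0)"},
    "sockfd": {"return(socket)", "(param_1, bind)"},
}

# Plain intersection: the unchecked call-site removes the error check.
intersection = set.intersection(*sites.values())

# Assumed threshold: an itemset must hold at 2 of the 3 call-sites.
itemsets = frequent_itemsets(list(sites.values()), min_count=2)
precondition = max(itemsets, key=len)

# Call-sites that violate the mined precondition, with what they lack.
deviants = {
    name: precondition - attrs
    for name, attrs in sites.items()
    if not precondition <= attrs
}
```

Under that threshold the mined precondition keeps the (>=, 0) check, and sockfd is reported as the call-site that lacks it.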
Locality
Interprocedural analysis to capture preconditions crossing procedure boundaries
main() {
    fp = fopen(…);
    if (fp != NULL)
        read_file(fp);
}
read_file(FILE *fp) {
    …
    fgets(buf, SIZE, fp);
    …
}

main() {
    fp = init_file(…);
    fgets(buf, SIZE, fp);
}
init_file(…) {
    fp = fopen(…);
    if (fp != NULL)
        return fp;
    exit(-1);
}
Example
[Figure: graph over procedures q, r, s and t, with predicate sets p1 and p2 attached to nodes; legend: intraprocedural edge, interprocedural edge]
Experiments
Applied on open source C programs
Input to the implementation: control flow graphs
Control flow nodes varied from 16K to 958K
Roughly 2M LoC
Procedure count varied from 298 to 8568
Precondition predicates varied from 189 to 5963
Analysis time varied from 26s to 20m
Experimental Goals
Path awareness improves precision
Useful for bug detection
Generates salient documentation
Effectiveness of path awareness
• Fewer protocols generated using our approach
• Reduction does not come at the expense of more false negatives
• Reduces false positives
Bug Detection: OpenSSH
Procedure prime_test in openssh-4.4p1
  Testing is difficult as it performs Miller-Rabin primality testing
  Program crashes due to the absence of an error check
    e.g., BN_mod_word(p, …): if p is null, the program crashes
  Fixed in openssh-4.5p1
Error check not always necessary
  e.g., BN_is_prime(…, ctx, …): ctx can either be null or pre-allocated
Bug detection Case Study: Linux
Hardware Bug
  Difficult to detect using traditional testing techniques
  Platform dependent error
  Transparently identified using our approach
Performance Bug
  Cache lookup operation was absent
  Not easily specified as a bug for testing
  Deviation delays data write flushes
  Difficult to identify using traditional testing techniques
Related Work
Static techniques
  Inferring Specifications from Within, Kremenek et al., OSDI 06
  Bugs as deviant behavior, Engler et al., SOSP 01
  …
Dynamic techniques
  Strauss, Ammons et al., POPL 02
  Daikon, Ernst et al., TSE 01
  …
Our approach
  Path-aware analysis
  Generates preconditions
  Predicates of arbitrary size
  Annotation free
Future Work
Richer specifications
  Post-conditions, divergence structures, …
More sophisticated mining techniques
  Graph mining, …
Validating generated specifications
  Integration with theorem prover
Specifications and concurrency
  Atomicity violations
Other work
Dynamic analysis
  Detecting cause of assertion failures (under review)
  Static path profiles (under review)
  Impact analysis – ASE 06
  Memory aliasing – FASE 06
  Test case prioritization – SAC 08
Distributed Systems
  Randomized leader election (Distributed Computing 07)
  Eliminating duplicates in P2P systems (TPDS 07)
  Search in P2P systems (P2P 05)
  Efficient tag detection in RFID systems (SECON 05)
Why not mine post-conditions?
Precedence protocol: A call to fclose is always preceded by a call to fopen
  fp = fopen(…);
  if (fp == NULL) exit(-1);
  fclose(…);
Successor protocol: A call to fopen is always succeeded by a call to fclose
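The asymmetry between the two protocols shows up as soon as the fragment is split into its two execution paths. The Python sketch below is an illustration only; the path lists are written by hand from the three-line fragment, and both helper names are hypothetical.

```python
def always_preceded_by(paths, first, second):
    """Every occurrence of `second` has an earlier `first` on the same path."""
    return all(
        first in path[:i]
        for path in paths
        for i, call in enumerate(path)
        if call == second
    )


def always_followed_by(paths, first, second):
    """Every occurrence of `first` has a later `second` on the same path."""
    return all(
        second in path[i + 1:]
        for path in paths
        for i, call in enumerate(path)
        if call == first
    )


# The two paths through the fragment: fopen fails and the program exits,
# or fopen succeeds and the file is closed.
paths = [
    ["fopen", "exit"],
    ["fopen", "fclose"],
]
precedence_holds = always_preceded_by(paths, "fopen", "fclose")
successor_holds = always_followed_by(paths, "fopen", "fclose")
```

The precedence protocol holds on both paths, while the successor protocol fails on the error path, so it cannot be mined as a rule that holds on every path.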
Why is parameter tracing insufficient?
uldap_connection_find(…) {   // code fragment from httpd
    if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock)) {
        …
        compare_client_certs(st->client_certs, l->client_certs)
        …
    }
• In a call to compare_client_certs, the return value of a call to apr_thread_mutex_trylock must be APR_SUCCESS.
• Predicate for compare_client_certs includes “return value of apr_thread_mutex_trylock(…) is APR_SUCCESS”