Date post: 22-Dec-2015
Murali Krishna Ramanathan

Department of Computer Science, Purdue University

(joint work with Suresh Jagannathan and Ananth Grama)

Static Path-Aware Analysis of Program Invariants

2

Motivation

Expert Programmer

How do I use this?

New Programmer

Undocumented Program

BUGS

Tester

3

Context

• What is a program invariant?
    Property that must hold across all program executions
• What is a failure?
    Program run does not satisfy an expected invariant
    System crashes, logical bugs, performance bugs
• What is a specification?
    Documentation of intended program invariants, e.g., lock must be followed by unlock
    Unavailable or imprecise

4

Issues

• Deriving specifications
    Where do we start?
    Absence of formal documentation
    Legacy code
• Identifying the source of failures
    How do we search?
    Exponential number of execution paths to explore
    Representing common information among paths

5

Specification Inference

• Challenges
    What to look for? Both relevant and irrelevant information is present in the program source
    How to be robust in the presence of bugs?
• Assumptions
    Programs are mostly well tested but can have bugs
    Transparent: no programmer annotations

6

Kinds of specifications

• Control-flow preconditions
    A call to fopen must always precede a call to fgets
• Data-flow preconditions
    The result of a call to socket must always be checked for error before a call to bind
• Control-flow postconditions
    A call to fopen is either followed by a call to fclose or error
• Control-flow divergence preconditions
    A call to read can be preceded either by a call to open or socket

7

Preconditions

• Predicate
    Captures properties associated with variables and procedure calls
• Preconditions for a procedure
    Composed of predicates that must always hold before every call to the procedure

fp = fopen(…);
if (fp != NULL)
    fgets(buf, SIZE, fp);

Predicates:
    fp := fopen(…)
    fp != null
    fopen <- fgets

8

Types of predicates

• Control-flow: define precedence properties among procedures
    fgets is preceded by fopen
• Data-flow: capture data-flow properties associated with variables
    fp is assigned the return of fopen, fp is not null

fp = fopen(…);
if (fp != NULL)
    fgets(buf, SIZE, fp);

9

Control-flow preconditions (ICSE 07)

RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    /* “Check that RI trigger function was called in expected context” */
    ri_CheckTrigger(...);
    ...
    /* “Get the relation descriptors of the FK and PK tables…” */
    pk_rel = heap_open(...);
    ...
    /* “Convert the MATCH TYPE string into a switchable int” */
    match_type = ri_DetermineMatchType(...);
    ...
    /* “Build up a new hashtable key for a prepared SPI Plan of a constraint trigger of MATCH FULL …” */
    ri_BuildQueryKeyFull(...);
    ...
}

10

Control-flow preconditions

RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    ri_CheckTrigger(...);
    ...
    pk_rel = heap_open(...);
    ...
    if (TRIGGER_FIRED_BY_UPDATE(...)) ...
    else ...
    ...
    if (!HeapTupleSatisfies(...)) ...
    ...
    match_type = ri_DetermineMatchType(...);
    ...
    if (match_type == RI_MATCH_TYPE_PARTIAL)
        ereport(...);
    ...
    ri_BuildQueryKeyFull(...);
    ...
}

11

Control-flow preconditions

RI_FKey_check(PG_FUNCTION_ARGS)
{
    ...
    ri_CheckTrigger(...);
    ...
    pk_rel = heap_open(...);
    ...
    if (tgnargs == 4)
    {
        ri_BuildQueryKeyFull(...);
        ...
    }
    ...
}

• ri_BuildQueryKeyFull not preceded by ri_DetermineMatchType
• Leads to a potential crash
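
A deviation like this one can be checked mechanically once each control path is abstracted to the sequence of procedures it calls. The following is a minimal sketch in Python, assuming the two paths through RI_FKey_check shown on the slides are reduced to lists of callee names. The helper `violating_paths` and the list encoding are illustrative assumptions, not the tool's implementation.

```python
def violating_paths(paths, before, after):
    """Return the paths on which `after` is called with no earlier call to `before`."""
    bad = []
    for path in paths:
        seen_before = False
        for call in path:
            if call == before:
                seen_before = True
            elif call == after and not seen_before:
                bad.append(path)
                break
    return bad


# Hypothetical abstraction of the two paths through RI_FKey_check.
common_path = ["ri_CheckTrigger", "heap_open",
               "ri_DetermineMatchType", "ri_BuildQueryKeyFull"]
short_path = ["ri_CheckTrigger", "heap_open", "ri_BuildQueryKeyFull"]

deviations = violating_paths([common_path, short_path],
                             before="ri_DetermineMatchType",
                             after="ri_BuildQueryKeyFull")
print(deviations)
```

Only the path through the `tgnargs == 4` branch is reported, since it reaches ri_BuildQueryKeyFull without first calling ri_DetermineMatchType.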

12

Static Specification Mining

• To generate preconditions for a procedure
    Generate predicates at each call-site of the procedure
    Ideally, common predicates across all the call-sites form the preconditions for the procedure
• How to find common predicates?
    Use mining techniques
    Construct patterns built from alignments or permutations of predicate sets
• Approximation: patterns appearing in programs denote preconditions

13

Approach

• Analyze control-flow graph
    Build precedence relation (a <- b): a binary relation between procedures a and b; a call to b is always preceded by a call to a
• Necessitates an inter-procedural analysis
    Relations can cross procedure boundaries
• Convergence requires fixpoint calculation
    Procedure signatures
• Frequent subsequence mining
    Mine the chains formed by precedence relations
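
The "always preceded by" relation can be computed as a forward must-analysis over the control-flow graph, iterated to a fixpoint. Below is a minimal intraprocedural sketch in Python, assuming a hypothetical five-node graph in which q is called on both branches, r on only one of them, and p after the join. The node names, the dictionary encoding, and the function `must_precede` are illustrative assumptions rather than the authors' implementation.

```python
def must_precede(succ, call, entry):
    """For each node, compute the calls made on every path from entry to it."""
    nodes = set(succ) | {m for targets in succ.values() for m in targets}
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for m in targets:
            preds[m].add(n)

    universe = {c for c in call.values() if c is not None}
    # Start from the most optimistic answer and shrink it until stable.
    before = {n: set(universe) for n in nodes}
    before[entry] = set()

    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # Facts leaving a predecessor: what held before it, plus its own call.
            outs = [before[p] | ({call[p]} if call.get(p) else set())
                    for p in preds[n]]
            new = set.intersection(*outs) if outs else set()
            if new != before[n]:
                before[n] = new
                changed = True
    return before


# Hypothetical CFG: n0 branches to n1 (q) or to n2 (r) then n3 (q); both reach n4 (p).
succ = {"n0": ["n1", "n2"], "n1": ["n4"], "n2": ["n3"], "n3": ["n4"], "n4": []}
call = {"n0": None, "n1": "q", "n2": "r", "n3": "q", "n4": "p"}

before = must_precede(succ, call, "n0")
print(before["n4"])
```

At the node calling p, only q survives the intersection over both branches, which corresponds to the relation q <- p; r is dropped because it occurs on one branch only.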

14

Path Exploration

[Figure: control-flow graph over calls p, q, and r]

Path-Sensitive Exploration: q <- p, q <- r <- p

Path-Insensitive Exploration: q, r <- p

Path-Aware Exploration: q <- p

15

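Precedes relation

[Figure: two control-flow graphs, one over calls p, q, and r, the other over calls p, q, and t with an exit node; each is annotated q <- p]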

16

Inter-procedural Analysis

h() {
    if (cond)
        lwrap();
    else
        lwrap();
    …
    uwrap();
}

lwrap() { init(); }

uwrap() { access(); }
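
In this example the relation init <- access only becomes visible once the wrappers are looked through. One simple way to illustrate this, sketched below in Python, is to inline each callee's call sequences into its callers. The sketch assumes non-recursive procedures and a hypothetical encoding in which every procedure maps to the list of call sequences along its paths; the function `expand` is an illustration, not the signature-based fixpoint analysis described in the talk.

```python
from itertools import product


def expand(proc, bodies):
    """Return the call sequences of `proc` with known callees inlined.

    Assumes the call graph is acyclic; wrapper names themselves are dropped.
    """
    result = []
    for path in bodies[proc]:
        # Each call contributes its own alternatives, or itself if it is a leaf.
        options = [expand(c, bodies) if c in bodies else [[c]] for c in path]
        for combo in product(*options):
            result.append([c for seq in combo for c in seq])
    return result


# Hypothetical encoding of h, lwrap and uwrap: one call sequence per path.
bodies = {
    "h": [["lwrap", "uwrap"], ["lwrap", "uwrap"]],  # both branches call lwrap
    "lwrap": [["init"]],
    "uwrap": [["access"]],
}

paths = expand("h", bodies)
print(paths)
```

Every expanded path of h calls init before access, so the precedence holds even though neither call appears in h directly.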

17

Procedure Signatures

[Figure: control-flow graph with nodes entry, t, p, q, r, s, u, and ret]

Procedure signature for s: q <- p

s <- t
s <- q <- p <- t

18

Mining sequences

• Sequence mining:
    Input: set of sequences (I)
    Output: sequences that occur ‘frequently’ as subsequences in I
• Use the Apriori-all algorithm [Agrawal and Srikant, Mining Sequential Patterns, ICDE ’95]

19

Motivation for sequence mining

Invariant: a <- c <- e

Control paths:
    a, b, c, e
    g, a, d, c, e
    a, c, e
    a, c, d, e, f
    e, f, d, a    (Faulty path, no call to a and c before e)

• Intersection of these paths: e is preceded by nothing
• Use mining to overcome brittleness of path intersection
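
The brittleness of intersection can be reproduced directly on the five control paths listed on this slide. A minimal sketch in Python, where the helper `preceding` and the list encoding of the paths are illustrative assumptions:

```python
def preceding(path, target):
    """Calls that occur before the first occurrence of `target` on a path."""
    return set(path[:path.index(target)])


paths = [
    ["a", "b", "c", "e"],
    ["g", "a", "d", "c", "e"],
    ["a", "c", "e"],
    ["a", "c", "d", "e", "f"],
    ["e", "f", "d", "a"],  # faulty path: e is called first
]

# Intersecting over all paths: the single faulty path wipes out everything.
common_all = set.intersection(*(preceding(p, "e") for p in paths))

# Intersecting over the four well-behaved paths only.
common_good = set.intersection(*(preceding(p, "e") for p in paths[:-1]))

print(common_all, common_good)
```

With the faulty path included the intersection is empty, while the remaining four paths agree that a and c precede e.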

20

Sequence Mining - Example

Input sequences (Min Frequency: 4/5):
    a, b, c, e
    g, a, d, c, e
    a, c, e
    a, c, d, e, f
    e, f, d, a

Frequent subsequence    Count
a                       5
c                       4
e                       5
a, c                    4
a, e                    4
c, e                    4
a, c, e                 4    (Maximal)
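
The counts in this example can be reproduced with a small level-wise miner. The sketch below, in Python, is a simplified miner in the spirit of Apriori-style candidate generation, not the Apriori-all algorithm itself; it treats every element as a single item and counts a pattern once per input sequence. All function names are illustrative assumptions.

```python
def is_subsequence(pattern, seq):
    """True if the items of `pattern` appear in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Number of input sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_subsequences(sequences, min_count):
    """Grow frequent patterns one item at a time, pruning infrequent ones."""
    items = sorted({x for s in sequences for x in s})
    level = [(x,) for x in items if support((x,), sequences) >= min_count]
    frequent_items = [p[0] for p in level]
    frequent = {}
    while level:
        for p in level:
            frequent[p] = support(p, sequences)
        # A frequent pattern of length k+1 has a frequent prefix of length k.
        candidates = {p + (x,) for p in level for x in frequent_items}
        level = sorted(c for c in candidates
                       if support(c, sequences) >= min_count)
    return frequent


def maximal(frequent):
    """Patterns that are not a subsequence of any other frequent pattern."""
    return [p for p in frequent
            if not any(p != q and is_subsequence(p, q) for q in frequent)]


sequences = [
    ["a", "b", "c", "e"],
    ["g", "a", "d", "c", "e"],
    ["a", "c", "e"],
    ["a", "c", "d", "e", "f"],
    ["e", "f", "d", "a"],
]

frequent = frequent_subsequences(sequences, min_count=4)  # 4 of 5 sequences
print(frequent)
print(maximal(frequent))
```

With a minimum frequency of 4 out of 5, the only maximal pattern is a, c, e, so the faulty fifth path no longer hides the invariant.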

21

Data-flow preconditions (PLDI 07)

• Challenges
    Data-flow predicates may be aliased
    No anchors for data-flow predicates

if (x > 0) f(x);

if (y > 0) f(y);

x = g(…); h(x); if (x > 0) f(x);

22

Motivating Example

main(…) {
    for (ai = options.listen_addrs; …) {
        listen_sock = socket(ai->ai_family, …);
        if (listen_sock < 0) error();
        if (num_listen_socks >= 16) error();
        if ((ret = getnameinfo(…))) …
        if (setsockopt(listen_sock, …) == -1) error();
        if (bind(listen_sock, ai->ai_addr, …) < 0) …
    }
}

• In a call to bind, the first parameter is always assigned the return value of a call to socket and is checked for error

23

Generate Predicates

main(…) {
    for (ai = options.listen_addrs; …) {
        listen_sock = socket(ai->ai_family, …);
        if (listen_sock < 0) error();
        if (num_listen_socks >= 16) error();
        if ((ret = getnameinfo(…))) …
        if (setsockopt(listen_sock, …) == -1) error();
        if (bind(listen_sock, ai->ai_addr, …) < 0) …
    }
}

listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)

24

Another call-site

ssh_control_listener(void) {
    if ((control_fd = socket(PF_UNIX, …)) < 0) error();
    old_umask = umask(0177);
    if (bind(control_fd, (struct sockaddr *)&addr, …)) …
}

control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)

25

Structural Similarity Problem

listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)

control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)

• How to group the attribute sets that need to be mined together?
• Find maximal matching of attribute sets
    NP-hard
    Use approximations based on program structures

26

Approximations

• Type
    Attribute sets divided based on type of variable
• Parameter
    Supplied as arguments to the same parameter for any given procedure
• Result
    Variables that are assigned the return values of the same function
• …
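
As an illustration of the "Result" approximation, the sketch below groups variables from the two call-sites by the function whose return value they hold. It is written in Python under the assumption that each attribute is encoded as a tuple such as ("return", "socket"); that encoding and the function `group_by_result` are hypothetical, chosen only to make the grouping concrete.

```python
from collections import defaultdict


def group_by_result(attribute_sets):
    """Group variables by the function whose return value they are assigned."""
    groups = defaultdict(list)
    for var, attrs in attribute_sets.items():
        for attr in attrs:
            if attr[0] == "return":
                groups[attr[1]].append(var)
    return dict(groups)


# Hypothetical tuple encoding of the attribute sets from both call-sites.
attribute_sets = {
    "listen_sock": [("return", "socket"), ("param", 1, "bind"),
                    ("param", 1, "setsockopt"), ("cmp", ">=", 0)],
    "num_listen_socks": [("cmp", "<", 16)],
    "ret": [("return", "getnameinfo")],
    "control_fd": [("return", "socket"), ("param", 1, "bind"),
                   ("cmp", ">=", 0)],
    "old_umask": [("return", "umask")],
}

groups = group_by_result(attribute_sets)
print(groups)
```

listen_sock and control_fd land in the same group even though their names differ, so their attribute sets can be mined together; num_listen_socks has no return attribute and stays ungrouped under this approximation.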

27

Example revisited

• Variable names are not comparable
    Use positional information
• Different number of attributes
    Interspersed with irrelevant operations

listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)

control_fd: return(socket), (param_1, bind), (>=, 0)
old_umask: return(umask)

28

Is intersection robust?

• Same limitations as with control-flow preconditions
    Adopt frequent itemset mining
• Order of events is less critical
    Aggregate collection of data-flow facts at call-sites

listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
control_fd: return(socket), (param_1, bind), (>=, 0)
sockfd: return(socket), (param_1, bind)

Precondition:
    return(socket), (param_1, bind)
    (>=, 0) missing!
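
The difference between intersection and frequent itemset mining can be seen on the three attribute sets of this slide. The Python sketch below enumerates itemsets by brute force, which is fine for a handful of attributes but is not how a real miner would scale; the string encoding of attributes, the threshold of 2 out of 3 call-sites, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, sets):
    """Number of attribute sets that contain every item of `itemset`."""
    return sum(itemset <= s for s in sets)


def maximal_frequent_itemsets(sets, min_count):
    """Brute-force enumeration of frequent itemsets, keeping the maximal ones."""
    items = sorted(set().union(*sets))
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(frozenset(c), sets) >= min_count]
    return [f for f in frequent if not any(f < g for g in frequent)]


# Attribute sets of the first argument to bind at three call-sites.
sites = {
    "listen_sock": {"return(socket)", "(param_1, bind)",
                    "(param_1, setsockopt)", "(>=, 0)"},
    "control_fd": {"return(socket)", "(param_1, bind)", "(>=, 0)"},
    "sockfd": {"return(socket)", "(param_1, bind)"},
}
attribute_sets = list(sites.values())

# Requiring all 3 sites is plain intersection: the error check is lost.
strict = maximal_frequent_itemsets(attribute_sets, min_count=3)

# Requiring 2 of 3 sites keeps the error check in the precondition.
mined = maximal_frequent_itemsets(attribute_sets, min_count=2)
precondition = mined[0]

# Call-sites that do not satisfy the mined precondition are candidate bugs.
deviants = [name for name, attrs in sites.items() if not precondition <= attrs]
print(sorted(precondition), deviants)
```

Under the relaxed threshold the mined precondition includes (>=, 0), and the sockfd call-site is flagged as the one deviating from it.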

29

Locality

• Interprocedural analysis to capture preconditions crossing procedure boundaries

main() {
    fp = fopen(…);
    if (fp != NULL)
        read_file(fp);
}

read_file(FILE *fp) {
    …
    fgets(buf, SIZE, fp);
    …
}

main() {
    fp = init_file(…);
    fgets(buf, SIZE, fp);
}

init_file(…) {
    fp = fopen(…);
    if (fp != NULL)
        return fp;
    exit(-1);
}

30

Example

[Figure: interprocedural graph over nodes q, r, s, and t, annotated with the predicate sets p1 and p1, p2; legend: intraprocedural edge, interprocedural edge]

31

Experiments

• Applied on open source C programs
• Input to the implementation: control flow graphs
• Control flow nodes varied from 16K to 958K
    Roughly 2M LoC
• Procedure count varied from 298 to 8568
• Precondition predicates varied from 189 to 5963
• Analysis time varied from 26s to 20m

32

Experimental Goals

• Path awareness improves precision
• Useful for bug detection
• Generates salient documentation

33

Effectiveness of path awareness

• Fewer protocols generated using our approach
• Reduction not at the expense of an increase in false negatives
• Reduces false positives

34

Bug Detection: OpenSSH

• Procedure prime_test in openssh-4.4p1
• Testing is difficult as it performs Miller-Rabin primality testing
• Program crashes due to the absence of an error check
    e.g., BN_mod_word(p, …): if p is null, the program crashes
    Fixed in openssh-4.5p1
• Error check not always necessary
    e.g., BN_is_prime(…, ctx, …): ctx can either be null or pre-allocated

35

Bug detection Case Study: Linux

• Hardware Bug
    Difficult to detect using traditional testing techniques
    Platform dependent error
    Transparently identified using our approach
• Performance Bug
    Cache lookup operation was absent
    Not easily specified as a bug for testing
    Deviation delays data write flushes
    Difficult to identify using traditional testing techniques

36

Change in Confidence

Increase in confidence reduces the number of predicates

37

Related Work

• Static techniques
    Inferring Specifications from Within, Kremenek et al, OSDI 06
    Bugs as deviant behavior, Engler et al, SOSP 01
    …
• Dynamic techniques
    Strauss, Ammons et al, POPL 02
    Daikon, Ernst et al, TSE 01
    …
• Our approach
    Path-aware analysis
    Generates preconditions
    Predicates of arbitrary size
    Annotation free

38

Future Work

• Richer specifications
    Post-conditions, divergence structures, …
• More sophisticated mining techniques
    Graph mining, …
• Validating generated specifications
    Integration with theorem prover
• Specifications and concurrency
    Atomicity violations

39

Other work

• Dynamic analysis
    Detecting cause of assertion failures (under review)
    Static path profiles (under review)
    Impact analysis (ASE 06)
    Memory aliasing (FASE 06)
    Test case prioritization (SAC 08)
• Distributed Systems
    Randomized leader election (Distributed Computing 07)
    Eliminating duplicates in P2P systems (TPDS 07)
    Search in P2P systems (P2P 05)
    Efficient tag detection in RFID systems (SECON 05)

40

41

Why not mine post-conditions?

• Precedence protocol: A call to fclose is always preceded by a call to fopen

fp = fopen(…);
if (fp == NULL) exit(-1);
fclose(…);

• Successor protocol: A call to fopen is always succeeded by a call to fclose
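
The asymmetry between the two protocols shows up as soon as the error path is enumerated. A minimal sketch in Python, assuming the fragment above is abstracted into two call sequences (the error path that exits and the normal path that closes the file); both helper functions are illustrative:

```python
def always_preceded(paths, first, second):
    """Every call to `second` has an earlier call to `first` on the same path."""
    return all(first in path[:i]
               for path in paths
               for i, call in enumerate(path) if call == second)


def always_followed(paths, first, second):
    """Every call to `first` has a later call to `second` on the same path."""
    return all(second in path[i + 1:]
               for path in paths
               for i, call in enumerate(path) if call == first)


# Hypothetical abstraction of the fragment: error path and normal path.
paths = [["fopen", "exit"], ["fopen", "fclose"]]

precedence_holds = always_preceded(paths, "fopen", "fclose")
successor_holds = always_followed(paths, "fopen", "fclose")
print(precedence_holds, successor_holds)
```

The precedence protocol holds on both paths, while the successor protocol fails on the error path even though the code is correct, which is why successor relations need separate treatment of error exits.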

42

Why is parameter tracing insufficient?

uldap_connection_find(…) {    // code fragment from httpd
    if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock)) {
        …
        compare_client_certs(st->client_certs, l->client_certs)
        …
    }
}

• In a call to compare_client_certs, the return value of a call to apr_thread_mutex_trylock must be APR_SUCCESS.

• Predicate for compare_client_certs includes “return value of apr_thread_mutex_trylock(…) is APR_SUCCESS”

43

Predicate size distribution

Majority of predicates are of size less than 3

