
bddbddb: Using Datalog and BDDs for Program Analysis

John Whaley
Stanford University and moka5 Inc.

June 11, 2006


Implementing Program Analysis

…56 pages!

vs.

• 2x faster
• Fewer bugs
• Extensible


Is it really that easy?

• Requires:
  – A different way of thinking
  – Knowledge, experience, and intuition
  – Perseverance to try different techniques
  – A lot of tuning and tweaking
  – Luck

• Despite all this, people who use it swear by it and could “never go back”

Tutorial Structure

Part I: Essential Background
  – Datalog for Program Analysis
  – Binary Decision Diagrams

Part II: Using the Tools
  – bddbddb
  – Compiler interface (Joeq compiler)
  – Datalog editor in Eclipse
  – Interactive mode

Part III: Developing Advanced Analyses
  – Context sensitivity
  – Combining multiple analyses
  – Race detection examples
  – Using advanced bddbddb features

Part IV: Profiling, Debugging, Avoiding Gotchas
  – Variable ordering
  – Iteration order
  – Machine learning
  – What it’s good for, what it isn’t good for

Short break every 30 minutes


Try it yourself…

• Available as moka5 LivePC
  – Non-intrusive installation in a VM
  – Automatically kept up to date
  – Easy to try, easy to share
  – Complete environment on a USB stick


Part I: Essential Background

Program Analysis in Datalog


Datalog

• Declarative language for deductive databases [Ullman 1989]
  – Like Prolog, but no function symbols, no predefined evaluation strategy


Datalog Basics

Atom = Reach(d,x,i)
  – Reach is the predicate.
  – d, x, i are the arguments: variables or constants.

Literal = Atom or NOT Atom

Rule = Atom :- Literal & … & Literal
  – The body (right of :-): for each assignment of values to variables that makes all these literals true …
  – … make the atom on the left (the head) true.


Datalog Example

parent(x,y) :- child(y,x).
grandparent(x,z) :- parent(x,y), parent(y,z).

ancestor(x,y) :- parent(x,y).
ancestor(x,z) :- parent(x,y), ancestor(y,z).
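
To make the rules concrete, here is a minimal Python sketch (not part of bddbddb) that evaluates them over a small, made-up child relation. The family facts and the function name `ancestors` are illustrative assumptions; the loop simply reapplies the recursive ancestor rule until no new tuples appear.

```python
# Hypothetical EDB facts: child(c, p) means c is a child of p.
child = {("bob", "alice"), ("carol", "bob"), ("dave", "carol")}

# parent(x,y) :- child(y,x).
parent = {(p, c) for (c, p) in child}

# grandparent(x,z) :- parent(x,y), parent(y,z).
grandparent = {(x, z) for (x, y) in parent for (y2, z) in parent if y == y2}


def ancestors(parent):
    """Least fixpoint of the two ancestor rules."""
    # ancestor(x,y) :- parent(x,y).
    ancestor = set(parent)
    while True:
        # ancestor(x,z) :- parent(x,y), ancestor(y,z).
        derived = {(x, z) for (x, y) in parent
                   for (y2, z) in ancestor if y == y2}
        if derived <= ancestor:      # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived


ancestor = ancestors(parent)
print(sorted(grandparent))
print(sorted(ancestor))
```

The two non-recursive rules are single set comprehensions; only the recursive rule needs a loop.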


Datalog

• Intuition: subgoals in the body are combined by “and” (strictly speaking: “join”).

• Intuition: Multiple rules for a predicate (head) are combined by “or.”


Another Datalog Example

_ means “don’t-care” (at least one)
! means “not”

hasChild(x) :- child(_,x).
hasNoChild(x) :- !child(_,x).

hasSibling(x) :- child(x,y), child(z,y), z!=x.
onlyChild(x) :- child(x,_), !hasSibling(x).

“!” inverts the relation, not the atom!


Reaching Defs in Datalog

Reach(d,x,j) :- Reach(d,x,i), StatementAt(i,s), !Assign(s,x), Follows(i,j).

Reach(s,x,j) :- StatementAt(i,s), Assign(s,x), Follows(i,j).


Definition: EDB Vs. IDB Predicates

• Some predicates come from the program, and their tuples are computed by inspection.
  – Called EDB, or extensional database predicates.

• Others are defined by the rules only.
  – Called IDB, or intensional database predicates.


Negation

• Negation makes things tricky.
• Semantics of negation
  – No negation allowed [Ullman 1988]
  – Stratified Datalog [Chandra 1985]
  – Well-founded semantics [Van Gelder 1991]


Stratification

• A risk occurs if there are negated literals involved in a recursive predicate.
  – Leads to oscillation in the result.

• Requirement for stratification:
  – Must be able to order the IDB predicates so that if a rule with P in the head has NOT Q in the body, then Q is either EDB or earlier in the order than P.


Example: Nonstratification

P(x) :- E(x), !P(x).

• If E(1) is true, is P(1) true?
• It is after the first round.
• But not after the second.
• True after the third, not after the fourth, …
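
The oscillation can be reproduced with a few lines of Python. This is only an illustration of the round-by-round reading used above, in which each round recomputes P from the previous round's P; the helper name `step` is an assumption.

```python
E = {1}  # EDB fact: E(1)


def step(P):
    """One round of P(x) :- E(x), !P(x), evaluated against the previous P."""
    return {x for x in E if x not in P}


P = set()      # all IDB predicates start empty
history = []   # is P(1) true after each round?
for _ in range(4):
    P = step(P)
    history.append(1 in P)

print(history)  # [True, False, True, False]
```

Because the rule negates the predicate it defines, the rounds never settle, which is exactly what the stratification requirement rules out.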


Iterative Algorithm for Datalog

• Start with the EDB predicates = “whatever the code dictates,” and with all IDB predicates empty.

• Repeatedly examine the bodies of the rules, and see what new IDB facts can be discovered from the EDB and existing IDB facts.


Datalog evaluation strategy

• “Semi-naïve” evaluation
  – Remember that a new fact can be inferred by a rule in a given round only if it uses in the body some fact discovered on the previous round.

• Evaluation strategy
  – Top-down (goal-directed) [Ullman 1985]
  – Bottom-up (infer from base facts) [Ullman 1989]
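
As a rough sketch of the difference, the Python below evaluates a transitive-closure rule pair bottom-up in two ways: naively (every round joins against all known facts) and semi-naively (every round joins only against the facts that were new in the previous round). The `edge`/`path` predicates and the sample data are made up for illustration; this is not how bddbddb itself is implemented.

```python
def naive(edges):
    """Re-derive from every known path fact in each round."""
    path = set(edges)                       # path(x,y) :- edge(x,y).
    while True:
        # path(x,z) :- edge(x,y), path(y,z).
        derived = {(x, z) for (x, y) in edges
                   for (y2, z) in path if y == y2}
        if derived <= path:
            return path
        path |= derived


def semi_naive(edges):
    """Join only against the facts discovered in the previous round."""
    path = set(edges)
    delta = set(edges)                      # facts that are new this round
    while delta:
        derived = {(x, z) for (x, y) in edges
                   for (y2, z) in delta if y == y2}
        delta = derived - path              # keep only the new facts
        path |= delta
    return path


edges = {(1, 2), (2, 3), (3, 4), (4, 5)}    # made-up EDB relation
result = semi_naive(edges)
print(len(result))                          # 10 path facts on a 5-node chain
```

Both functions reach the same fixpoint; the semi-naive version just avoids re-joining facts that cannot produce anything new.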


Our Dialect of Datalog

• Totally-ordered finite domains
  – Domains are of a given, finite size
  – Makes all Datalog programs “safe”
  – Cannot mix variables of different domains

• Constants (named/integers)
• Comparison operators: = != < <= > >=
• Don’t-care: _
• Universe: *


Why Datalog?

• Developed a tool to translate inference rules to BDD implementation
• Later, discovered Datalog (Ullman, Reps)
• Semantics of BDDs match Datalog exactly
  – Obvious implementation of relations
  – Operations occur a set-at-a-time
  – Fast set compare, set difference
  – Wealth of literature about semantics, optimization, etc.


Inference Rules

• Datalog rules directly correspond to inference rules.

Inference rule:

    Assign(v1, v2)   vPointsTo(v2, o)
    ---------------------------------
            vPointsTo(v1, o)

Datalog rule:

    vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).


Flow-Insensitive Pointer Analysis

o1: p = new Object();
o2: q = new Object();
p.f = q;
r = p.f;

[Figure: points-to graph with p → o1, q → o2, o1.f → o2, and r → o2]

Input Tuples
vPointsTo(p, o1)
vPointsTo(q, o2)
Store(p, f, q)
Load(p, f, r)

Output Relations
hPointsTo(o1, f, o2)
vPointsTo(r, o2)


Inference Rule in Datalog

Assignments:

v1 = v2;

vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).

[Figure: v2 points to o, so v1 also points to o]


Inference Rule in Datalog

Stores:

v1.f = v2;

hPointsTo(o1, f, o2) :- Store(v1, f, v2), vPointsTo(v1, o1), vPointsTo(v2, o2).

[Figure: v1 points to o1 and v2 points to o2, so field f of o1 points to o2]


Inference Rule in Datalog

Loads:

v2 = v1.f;

vPointsTo(v2, o2) :- Load(v1, f, v2), vPointsTo(v1, o1), hPointsTo(o1, f, o2).

[Figure: v1 points to o1 and field f of o1 points to o2, so v2 points to o2]


The Whole Algorithm

vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).

hPointsTo(o1, f, o2) :- Store(v1, f, v2), vPointsTo(v1, o1), vPointsTo(v2, o2).

vPointsTo(v2, o2) :- Load(v1, f, v2), vPointsTo(v1, o1), hPointsTo(o1, f, o2).

vPointsTo(v, o) :- vPointsTo0(v, o).
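
The four rules can be run by hand on the small example from the flow-insensitive pointer analysis slide. The Python below is a naive fixpoint sketch over plain sets, written only to show what the rules derive; the function name `points_to` and the tuple layouts are assumptions, and bddbddb evaluates the same rules with BDDs rather than nested loops.

```python
def points_to(vP0, assign, store, load):
    """Naive bottom-up evaluation of the four pointer-analysis rules."""
    vP = set(vP0)   # vPointsTo(v, o) :- vPointsTo0(v, o).
    hP = set()
    while True:
        new_vP, new_hP = set(), set()
        # vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
        for (v1, v2) in assign:
            for (v, o) in vP:
                if v == v2:
                    new_vP.add((v1, o))
        # hPointsTo(o1, f, o2) :- Store(v1, f, v2),
        #                         vPointsTo(v1, o1), vPointsTo(v2, o2).
        for (v1, f, v2) in store:
            for (va, o1) in vP:
                for (vb, o2) in vP:
                    if va == v1 and vb == v2:
                        new_hP.add((o1, f, o2))
        # vPointsTo(v2, o2) :- Load(v1, f, v2),
        #                      vPointsTo(v1, o1), hPointsTo(o1, f, o2).
        for (v1, f, v2) in load:
            for (va, o1) in vP:
                for (oa, g, o2) in hP:
                    if va == v1 and oa == o1 and g == f:
                        new_vP.add((v2, o2))
        if new_vP <= vP and new_hP <= hP:   # fixpoint reached
            return vP, hP
        vP |= new_vP
        hP |= new_hP


# Input tuples from the example: p = new (o1); q = new (o2); p.f = q; r = p.f;
vP, hP = points_to(
    vP0={("p", "o1"), ("q", "o2")},
    assign=set(),
    store={("p", "f", "q")},
    load={("p", "f", "r")},
)
print(sorted(hP))  # [('o1', 'f', 'o2')]
print(sorted(vP))  # [('p', 'o1'), ('q', 'o2'), ('r', 'o2')]
```

The store rule fires in the first round and the load rule in the second, which is why the evaluation has to iterate.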


Format of a Datalog file

• Domains
    Name  Size  ( map file )
    V  65536  var.map
    H  32768

• Relations
    Name ( <attribute list> ) flags
    Store (v1 : V, f : F, v2 : V) input
    PointsTo (v : V, h : H) input, output

• Rules
    Head :- Body .
    PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).


Key Point

• Program information is stored in a relational database.
  – Everything in the program is numbered.

• Write declarative inference rules to infer new facts about the program.

• Negations OK if they are not in a recursive cycle.


Take a break…
(Next up: Binary Decision Diagrams)


Part I: Essential Background

Binary Decision Diagrams


Call graph relation

• Call graph expressed as a relation.
  – Five edges:
    • Calls(A,B)
    • Calls(A,C)
    • Calls(A,D)
    • Calls(B,D)
    • Calls(C,D)

[Figure: call graph with nodes A, B, C, D and the five edges listed above]


Call graph relation

• Relation expressed as a binary function.
  – A=00, B=01, C=10, D=11

Calls(A,B) → 00 01
Calls(A,C) → 00 10
Calls(A,D) → 00 11
Calls(B,D) → 01 11
Calls(C,D) → 10 11

[Figure: the call graph with nodes labeled A=00, B=01, C=10, D=11]


Call graph relation

• Relation expressed as a binary function.
  – A=00, B=01, C=10, D=11

from    to
x1 x2   x3 x4   f
0  0    0  0    0
0  0    0  1    1
0  0    1  0    1
0  0    1  1    1
0  1    0  0    0
0  1    0  1    0
0  1    1  0    0
0  1    1  1    1
1  0    0  0    0
1  0    0  1    0
1  0    1  0    0
1  0    1  1    1
1  1    0  0    0
1  1    0  1    0
1  1    1  0    0
1  1    1  1    0

[Figure: the call graph with nodes labeled A=00, B=01, C=10, D=11]
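
The truth table can be regenerated mechanically from the five Calls tuples and the 2-bit encoding. The short Python sketch below does that; the names `CODE`, `CALLS`, and `f` are chosen here for illustration.

```python
from itertools import product

# 2-bit encoding of the four methods, as on the slide.
CODE = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}

# The call graph relation: Calls(from, to).
CALLS = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}


def f(x1, x2, x3, x4):
    """Characteristic function: 1 iff (x1 x2) calls (x3 x4)."""
    return int(any(CODE[src] + CODE[dst] == (x1, x2, x3, x4)
                   for (src, dst) in CALLS))


# Rows in the same order as the table: 0000, 0001, ..., 1111.
table = [f(*bits) for bits in product((0, 1), repeat=4)]
print(table)  # [0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

Exactly five entries are 1, one per tuple in the relation.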


Binary Decision Diagrams (Bryant 1986)

• Graphical encoding of a truth table.
• Collapse redundant nodes.
• Eliminate unnecessary nodes.

[Figure sequence: the full decision tree for the Calls truth table branches on x1, x2, x3, x4 along 0 edges and 1 edges down to 0/1 leaves. It is reduced step by step: identical leaves and identical subtrees are collapsed, then unnecessary nodes are eliminated. The final BDD has one x1 node, two x2 nodes, two x3 nodes, one x4 node, and the terminals 0 and 1.]


Binary Decision Diagrams

• Size depends on amount of redundancy, NOT size of relation.
  – Identical subtrees share the same representation.
  – As set gets very large, more nodes have identical zero and one successors, so the size decreases.


BDD Variable Order is Important!

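x1x2 + x3x4

[Figure: two BDDs for the same function. With order x1<x2<x3<x4 the BDD has one node per variable. With order x1<x3<x2<x4 it needs two x3 nodes and two x2 nodes.]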
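
The effect of the order on x1x2 + x3x4 can be checked with a tiny reduced-BDD builder. The Python below is a teaching sketch, not a real BDD package: it enumerates all assignments (exponential, fine for four variables), shares identical nodes through a unique table, and drops nodes whose two successors are the same. The names `build_bdd` and `formula` are assumptions made for this example.

```python
def build_bdd(f, order):
    """Build a reduced ordered BDD for f under the given variable order.

    Returns (root, unique), where unique maps (var, low, high) to a node id.
    Ids 0 and 1 are the terminals; internal nodes get ids from 2 upward.
    """
    unique = {}

    def mk(var, low, high):
        if low == high:              # eliminate unnecessary node
            return low
        key = (var, low, high)
        if key not in unique:        # collapse redundant (identical) nodes
            unique[key] = len(unique) + 2
        return unique[key]

    def expand(i, env):
        if i == len(order):          # all variables assigned: a leaf
            return int(bool(f(env)))
        var = order[i]
        low = expand(i + 1, {**env, var: 0})
        high = expand(i + 1, {**env, var: 1})
        return mk(var, low, high)

    return expand(0, {}), unique


def formula(env):
    """x1x2 + x3x4"""
    return (env["x1"] and env["x2"]) or (env["x3"] and env["x4"])


_, good = build_bdd(formula, ["x1", "x2", "x3", "x4"])
_, bad = build_bdd(formula, ["x1", "x3", "x2", "x4"])
print(len(good), len(bad))  # 4 6 (internal nodes only)
```

With only four variables the gap is small; the same construction over more product terms grows quickly under the interleaved order.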


Variable ordering is NP-hard

• No good general heuristic solutions
• Dynamic reordering heuristics don’t work well on these problems
• We use:
  – Trial and error
  – Active learning


Apply Operation

• Concept
  – Basic technique for building OBDD from Boolean formula.

Arguments A, B, op
  – A and B: Boolean functions, represented as OBDDs
  – op: Boolean operation (e.g., ^, &, |)

Result
  – OBDD representing composite function A op B

[Figure: OBDDs for A and B over the variables a, b, c, d, combined with | to give the OBDD for A op B]


Apply Execution Example

• Optimizations
  – Dynamic programming
  – Early termination rules

[Figure: Argument A (nodes A1 to A6, over variables a, b, c, d), Operation |, Argument B (nodes B1 to B5, over variables a, c, d), and the tree of recursive calls on the node pairs A1,B1; A2,B2; A6,B2; A6,B5; A3,B2; A5,B2; A3,B4; A4,B3; A5,B4]


Apply Result Generation

  – Recursive calling structure implicitly defines unreduced BDD
  – Apply reduction rules bottom-up as return from recursive calls

[Figure: the tree of recursive calls (A1,B1 and the node pairs below it) next to the resulting BDD over a, b, c, d, shown without reduction and with reduction]


BDD implementation

• ‘Unique’ table
  – Huge hash table
  – Each entry: level, left, right, hash, next

• Operation cache
  – Memoization cache for operations

• Garbage collection
  – Mark and sweep, free list.


Code for BDD ‘and’.

Base case:

Memo cache lookup:

Recursive step:

Memo cache insert:
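
The code shown on this slide did not survive in the transcript. As a stand-in, here is a minimal Python sketch of a BDD ‘and’ with the same four phases. It is not the code from any of the libraries discussed here: the node representation (nested tuples `(level, low, high)` with integer terminals 0 and 1) and the names `mk`, `bdd_and`, `var`, and `evaluate` are assumptions made for this example.

```python
unique = {}   # 'unique' table: (level, low, high) -> shared node
cache = {}    # operation cache for 'and': (a, b) -> result


def mk(level, low, high):
    """Return the shared node for (level, low, high), applying reductions."""
    if low == high:
        return low
    return unique.setdefault((level, low, high), (level, low, high))


def level(n):
    """Variable level of a node; terminals sort below every variable."""
    return n[0] if isinstance(n, tuple) else float("inf")


def bdd_and(a, b):
    # Base case: terminals and identical arguments.
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1 or a == b:
        return a
    # Memo cache lookup.
    key = (a, b)
    if key in cache:
        return cache[key]
    # Recursive step: split on the topmost variable of the two arguments.
    top = min(level(a), level(b))
    a0, a1 = (a[1], a[2]) if level(a) == top else (a, a)
    b0, b1 = (b[1], b[2]) if level(b) == top else (b, b)
    result = mk(top, bdd_and(a0, b0), bdd_and(a1, b1))
    # Memo cache insert.
    cache[key] = result
    return result


def var(i):
    """BDD for the single variable at level i."""
    return mk(i, 0, 1)


def evaluate(n, env):
    """Follow 0/1 edges according to env (level -> 0 or 1) to a terminal."""
    while isinstance(n, tuple):
        n = n[2] if env[n[0]] else n[1]
    return n


g = bdd_and(var(0), var(1))
print(g)  # (0, 0, (1, 0, 1))
```

A production package would index nodes by integer in a preallocated table, as the unique-table slide describes, instead of hashing nested tuples.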


BDD Libraries

• BuDDy
  – Simple, fast, memory-friendly
  – Identifies BDD by index in unique table

• JavaBDD
  – 100% Java, based on BuDDy
  – Also native interface to BuDDy, CUDD, CAL, JDD

• CUDD
  – Most popular, most feature-complete
  – Not as fast as BuDDy
  – Other types: ZDD, ADD

• JDD
  – 100% Java, fresh implementation
  – Still under development


Depth-first vs. breadth-first

• BDD algorithms have natural depth-first recursive formulations.

• Some work on using breadth-first evaluation for better parallelism and locality
  – CAL: breadth-first BDD package

• General idea: Assume independent, fixup if not.

• Doesn’t perform well in practice.


Take a break…
(Next up: Using the Tools)


Tutorial Structure

Part I: Essential Background
  – Datalog for Program Analysis
  – Binary Decision Diagrams

Part II: Using the Tools
  – bddbddb
  – Compiler interface (Joeq compiler)
  – Datalog editor in Eclipse
  – Interactive mode

Part III: Developing Advanced Analyses
  – Context sensitivity
  – Combining multiple analyses
  – Race detection examples
  – Using advanced bddbddb features

Part IV: Profiling, Debugging, Avoiding Gotchas
  – Variable ordering
  – Iteration order
  – Machine learning
  – What it’s good for, what it isn’t good for


bddbddb (BDD-based deductive database)

Part II: Using the Tools


bddbddb System Overview

[Figure: system diagram with the components Java bytecode, Joeq frontend, Input relations, Datalog program, and Output relations]


Compiler Frontend

• Convert IR into tuples
• Tuples format:

    # V0:16 F0:11 V1:16      (header line)
    0 0 1                    (one tuple per line)
    0 1 2147
    0 0 1464
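
A reader for this format can be sketched in a few lines of Python. The sketch assumes that each header field is `name:bits` (so `V0:16` would be a 16-bit domain) and that every following line holds one integer per field; that reading of the header is an assumption, and `parse_tuples` is a hypothetical helper, not part of bddbddb or Joeq.

```python
def parse_tuples(text):
    """Parse a tuples file into (attributes, tuples).

    Assumed format: a header line '# NAME:BITS ...' followed by one
    whitespace-separated integer tuple per line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if not header.startswith("#"):
        raise ValueError("missing header line")
    attrs = []
    for field in header[1:].split():
        name, bits = field.split(":")
        attrs.append((name, int(bits)))
    tuples = []
    for row in rows:
        values = tuple(int(v) for v in row.split())
        if len(values) != len(attrs):
            raise ValueError(f"wrong arity: {row!r}")
        for value, (name, bits) in zip(values, attrs):
            if not 0 <= value < (1 << bits):   # must fit in the domain
                raise ValueError(f"{value} out of range for {name}")
        tuples.append(values)
    return attrs, tuples


sample = """# V0:16 F0:11 V1:16
0 0 1
0 1 2147
0 0 1464
"""
attrs, tuples = parse_tuples(sample)
print(attrs)
print(tuples)
```

The range check is only there to catch numbering mistakes early.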


Compiler Frontend

• Robust frontends:
  – Joeq compiler
  – Soot compiler
  – SUIF compiler (for C code)

• Still experimental:
  – Eclipse frontend
  – gcc frontend
  – …


Extracting Relations

• Idea: Iterate through the compiler IR, numbering and dumping relations of interest.
  – Types
  – Methods
  – Fields
  – Variables
  – …


joeq.Main.GenRelations

• Generate initial relations for points-to analysis.
  – Does initial pass to discover call graph.

• Options:
    -fly                : dump on-the-fly call graph info
    -cs                 : dump context-sensitive info
    -ssa                : dump SSA representation
    -partial            : no call graph discovery
    -Dpa.dumppath=      : where to save files
    -Dpa.icallgraph=    : location of initial call graph
    -Dpa.dumpdotgraph   : dump call graph in dot


Demo of joeq GenRelations


bddbddb: From Datalog to BDDs

Part II: Using the Tools


An Adventure in BDDs

• Context-sensitive numbering scheme
  – Modify BDD library to add special operations.
  – Can’t even analyze small programs. Time:

• Improved variable ordering
  – Group similar BDD variables together.
  – Interleave equivalence relations.
  – Move common subsets to edges of variable order. Time: 40h

• Incrementalize outermost loop
  – Very tricky, many bugs. Time: 36h

• Factor away control flow, assignments
  – Reduces number of variables. Time: 32h


An Adventure in BDDs

• Exhaustive search for best BDD order
  – Limit search space by not considering intradomain orderings. Time: 10h

• Eliminate expensive rename operations
  – When rename changes relative order, result is not isomorphic. Time: 7h

• Improved BDD memory layout
  – Preallocate to guarantee contiguous. Time: 6h

• BDD operation cache tuning
  – Too small: redo work, too big: bad locality
  – Parameter sweep to find best values. Time: 2h


An Adventure in BDDs

• Simplified treatment of exceptions
  – Reduce number of variables, iterations necessary for convergence. Time: 1h

• Change iteration order
  – Required redoing much of the code. Time: 48m

• Eliminate redundant operations
  – Introduced subtle bugs. Time: 45m

• Specialized caches for different operations
  – Different caches for and, or, etc. Time: 41m


An Adventure in BDDs

• Compacted BDD nodes
  – 20 bytes → 16 bytes. Time: 38m

• Improved BDD hashing function
  – Simpler hash function. Time: 37m

• Total development time: 1 year
  – 1 year per analysis?!?

• Optimizations obscured the algorithm.
• Many bugs discovered, maybe still more.


bddbddb: BDD-Based Deductive DataBase

• Automatically generate from Datalog
  – Optimizations based on my experience with handcoded version.
  – Plus traditional compiler algorithms.

• bddbddb even better than handcoded
  – handcoded: 37m, bddbddb: 19m


Datalog                                 BDDs
Relations                               Boolean functions
Relation ops: ⋈, ∪, select, project     Boolean function ops: ∧, ∨, −, ∼
Relation at a time                      Function at a time
Semi-naïve evaluation                   Incrementalization
Fixed-point                             Iterate until stable


Compiling Datalog to BDDs

1. Apply Datalog source level transforms.
2. Stratify and determine iteration order.
3. Translate into relational algebra IR.
4. Optimize IR and replace relational algebra ops with equivalent BDD ops.
5. Assign relation attributes to physical BDD domains.
6. Perform more optimizations after domain assignment.
7. Interpret the resulting program.


High-Level Transform: Magic Set Transformation

• Add “magic” predicates to control generated tuples [Bancilhon 1986, Beeri 1987]
  – Combines ideas from top-down and bottom-up evaluation

• Doesn’t always help
  – Leads to more iterations
  – BDDs are good at large operations


Predicate Dependency Graph

add edge from RHS to LHS

vPointsTo(v, o) :- vPointsTo0(v, o).

vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).

hPointsTo(o1, f, o2) :- Store(v1, f, v2), vPointsTo(v1, o1), vPointsTo(v2, o2).

vPointsTo(v2, o2) :- Load(v1, f, v2), vPointsTo(v1, o1), hPointsTo(o1, f, o2).

[Figure: dependency graph over the predicates vPointsTo0, Assign, Store, Load, vPointsTo, and hPointsTo]
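
Building the dependency graph and checking it for negation inside a cycle takes only a few lines. The Python below is an illustrative sketch: rules are given as `(head, [(body predicate, negated?), ...])`, a representation chosen for this example, and the stratification test simply asks whether any negated edge lies on a cycle.

```python
def dependency_edges(rules):
    """Edges (body_pred, head_pred, negated): from RHS to LHS of each rule."""
    return {(b, h, neg) for (h, body) in rules for (b, neg) in body}


def reachable(edges, start):
    """Predicates reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for (src, dst, _) in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def is_stratifiable(rules):
    """False iff some negated edge b -> h lies on a cycle (h reaches b)."""
    edges = dependency_edges(rules)
    return not any(neg and b in reachable(edges, h)
                   for (b, h, neg) in edges)


POINTER_RULES = [
    ("vPointsTo", [("vPointsTo0", False)]),
    ("vPointsTo", [("Assign", False), ("vPointsTo", False)]),
    ("hPointsTo", [("Store", False), ("vPointsTo", False),
                   ("vPointsTo", False)]),
    ("vPointsTo", [("Load", False), ("vPointsTo", False),
                   ("hPointsTo", False)]),
]
NONSTRATIFIED = [("P", [("E", False), ("P", True)])]   # P(x) :- E(x), !P(x).

edges = dependency_edges(POINTER_RULES)
print(is_stratifiable(POINTER_RULES))   # True
print(is_stratifiable(NONSTRATIFIED))   # False
```

In the pointer-analysis graph vPointsTo and hPointsTo reach each other, so they form one recursive component and have to be iterated together.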


Determining Iteration Order

• Tradeoff between faster convergence and BDD cache locality
• Static heuristic
  – Visit rules in reverse post-order
  – Iterate shorter loops before longer loops
• Profile-directed feedback
• User can control iteration order
  – pri=# keywords on rules/relations


Predicate Dependency Graph

[Figure: the predicate dependency graph over vPointsTo0, Assign, Store, Load, vPointsTo, and hPointsTo]


Datalog to Relational Algebra

vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).

t1 = ρ_{variable→source}(vPointsTo);

t2 = assign ⋈ t1;

t3 = π_{source}(t2);

t4 = ρ_{dest→variable}(t3);

vPointsTo = vPointsTo ∪ t4;
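
The five steps can be traced with ordinary Python sets. In this sketch a relation is a pair `(attribute names, set of tuples)`; the attribute names `dest`, `source`, `variable`, and `heap` follow the renames above, and the projection is read as projecting the `source` attribute away, which is the reading under which the sequence lines up. Both the representation and that reading are assumptions, and the sample data is made up.

```python
def rename(rel, old, new):
    """ρ: rename attribute old to new."""
    attrs, rows = rel
    return tuple(new if a == old else a for a in attrs), set(rows)


def join(r, s):
    """⋈: natural join on the attributes the two relations share."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        xd = dict(zip(ra, x))
        for y in srows:
            yd = dict(zip(sa, y))
            if all(xd[a] == yd[a] for a in common):
                out.add(x + tuple(yd[a] for a in extra))
    return ra + tuple(extra), out


def project_away(rel, attr):
    """π: drop one attribute (existential quantification)."""
    attrs, rows = rel
    keep = [i for i, a in enumerate(attrs) if a != attr]
    return (tuple(attrs[i] for i in keep),
            {tuple(row[i] for i in keep) for row in rows})


def union(r, s):
    """∪: union of two relations over the same attributes."""
    (ra, rrows), (sa, srows) = r, s
    idx = [sa.index(a) for a in ra]      # align s's columns with r's
    return ra, rrows | {tuple(row[i] for i in idx) for row in srows}


def step(vP, assign):
    t1 = rename(vP, "variable", "source")
    t2 = join(assign, t1)
    t3 = project_away(t2, "source")
    t4 = rename(t3, "dest", "variable")
    return union(vP, t4)


assign = (("dest", "source"), {("a", "b"), ("b", "c")})   # a = b; b = c;
vP = (("variable", "heap"), {("c", "o1")})                # c points to o1
while True:
    nxt = step(vP, assign)
    if nxt[1] == vP[1]:      # no new tuples: fixpoint
        break
    vP = nxt
print(sorted(vP[1]))  # [('a', 'o1'), ('b', 'o1'), ('c', 'o1')]
```

Each pass through `step` pushes the points-to facts one assignment further, so the chain a = b; b = c needs two productive rounds.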


Incrementalization

Before:

t1 = ρ_{variable→source}(vP);
t2 = assign ⋈ t1;
t3 = π_{source}(t2);
t4 = ρ_{dest→variable}(t3);
vP = vP ∪ t4;

After:

vP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρ_{variable→source}(vP’’);
t2 = assign ⋈ t1;
t5 = ρ_{variable→source}(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = π_{source}(t7);
t4 = ρ_{dest→variable}(t3);
vP = vP ∪ t4;


Optimize into BDD operations

Relational algebra:

vP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρ_{variable→source}(vP’’);
t2 = assign ⋈ t1;
t5 = ρ_{variable→source}(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = π_{source}(t7);
t4 = ρ_{dest→variable}(t3);
vP = vP ∪ t4;

BDD operations:

vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’, variable→source);
t3 = relprod(t1, assign, source);
t4 = replace(t3, dest→variable);
vP = or(vP, t4);


Physical domain assignment

Before domain assignment:

vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’, variable→source);
t3 = relprod(t1, assign, source);
t4 = replace(t3, dest→variable);
vP = or(vP, t4);

After domain assignment:

vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t3 = relprod(vP’’, assign, V0);
t4 = replace(t3, V1→V0);
vP = or(vP, t4);

• Minimizing renames is NP-complete
• Renames have vastly different costs
• Priority-based assignment algorithm


Other optimizations

• Dead code elimination
• Constant propagation
• Definition-use chaining
• Redundancy elimination
• Global value numbering
• Copy propagation
• Liveness analysis


Splitting rules

R(a,e) :- A(a,b), B(b,c), C(c,d), R(d,e).

Can be split into:

T1(a,c) :- A(a,b), B(b,c).
T2(a,d) :- T1(a,c), C(c,d).
R(a,e) :- T2(a,d), R(d,e).

Affects incrementalization, iteration.
Use “split” keyword to auto-split rules.
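A quick way to convince yourself that the split is equivalent is to evaluate one step of both forms on sets of pairs. The sketch below does that in Python; the function names are illustrative only, and it applies the rule once rather than to a fixpoint.

```python
def compose(r, s):
    """Join r(x,y) with s(y,z) and project away the shared middle column."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def step_unsplit(A, B, C, R):
    """One application of R(a,e) :- A(a,b), B(b,c), C(c,d), R(d,e)."""
    return {
        (a, e)
        for (a, b) in A
        for (b2, c) in B if b == b2
        for (c2, d) in C if c == c2
        for (d2, e) in R if d == d2
    }


def step_split(A, B, C, R):
    """The same step through the temporaries T1 and T2."""
    T1 = compose(A, B)     # T1(a,c) :- A(a,b), B(b,c).
    T2 = compose(T1, C)    # T2(a,d) :- T1(a,c), C(c,d).
    return compose(T2, R)  # R(a,e)  :- T2(a,d), R(d,e).
```

The split form only ever joins two relations at a time, and T1 and T2 depend only on A, B, and C, so they need not be recomputed when only R changes.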


Other Tools

• Banshee (John Kodumal)
  – Results are harder to use (not relational)

• Paddle/Jedd (Ondrej Lhotak)
  – Imperative style: more expressive
  – Not as efficient, doesn’t scale as well


Jedd code

• Jedd code is like bddbddb internal IR before domain assignment:

vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’, variable→source);
t3 = relprod(t1, assign, source);
t4 = replace(t3, dest→variable);
vP = or(vP, t4);


Demo of using bddbddb


Tutorial Structure

Part I: Essential Background
  – Datalog for Program Analysis
  – Binary Decision Diagrams

Part II: Using the Tools
  – bddbddb
  – Compiler interface (Joeq compiler)
  – Datalog editor in Eclipse
  – Interactive mode

Part III: Developing Advanced Analyses
  – Context sensitivity
  – Combining multiple analyses
  – Race detection examples
  – Using advanced bddbddb features

Part IV: Profiling, Debugging, Avoiding Gotchas
  – Variable ordering
  – Iteration order
  – Machine learning
  – What it’s good for, what it isn’t good for


Part III: Developing Advanced Analyses

Context Sensitivity


Old Technique: Summary-Based Analysis

• Idea: Summarize the effect of a method on its callers.
  – Sharir, Pnueli [Muchnick 1981]
  – Landi, Ryder [PLDI 1992]
  – Wilson, Lam [PLDI 1995]
  – Whaley, Rinard [OOPSLA 1999]


Old Technique: Summary-Based Analysis

• Problems:
  – Difficult to summarize pointer analysis.
  – Composed summaries can get large.
  – Recursion is difficult: Must find fixpoint.
  – Queries (e.g. which context points to x) require expanding an exponential number of contexts.


My Technique: Cloning-Based Analysis

• Simple brute force technique.
  – Clone every path through the call graph.
  – Run context-insensitive algorithm on expanded call graph.

• The catch: exponential blowup


Cloning is exponential!


Recursion

• Actually, cloning is unbounded in the presence of recursive cycles.

• Technique: We treat all methods within a strongly-connected component as a single node.
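Collapsing each strongly-connected component into one node can be sketched as follows. This uses Kosaraju's algorithm, chosen here only because it is short; the slides do not say which SCC algorithm bddbddb uses, and the recursive helpers assume a small call graph.

```python
def sccs(graph):
    """Map each node to a representative of its strongly-connected component.

    graph maps a method to the list of methods it calls.
    """
    order, seen = [], set()

    def visit(n):
        seen.add(n)
        for m in graph.get(n, []):
            if m not in seen:
                visit(m)
        order.append(n)  # post-order finish

    for n in graph:
        if n not in seen:
            visit(n)

    reverse = {}
    for n, callees in graph.items():
        for m in callees:
            reverse.setdefault(m, []).append(n)

    comp = {}

    def assign(n, root):
        comp[n] = root
        for m in reverse.get(n, []):
            if m not in comp:
                assign(m, root)

    for n in reversed(order):
        if n not in comp:
            assign(n, n)
    return comp


def collapse(graph):
    """Return (component map, acyclic graph over component representatives)."""
    comp = sccs(graph)
    out = {}
    for n, callees in graph.items():
        out.setdefault(comp[n], set())
        for m in callees:
            out.setdefault(comp[m], set())
            if comp[m] != comp[n]:
                out[comp[n]].add(comp[m])
    return comp, out
```

After collapsing, the call graph is acyclic, so the number of paths to each node is finite and cloning is well defined.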


Recursion

[Figure: a call graph over methods A, B, C, D, E, F, and G, shown before and after cloning]


Top 20 Sourceforge Java Apps: Number of Clones

[Figure: log-scale plot of number of clones (10^0 to 10^16) against size of program in variable nodes (1,000 to 1,000,000)]


Cloning is infeasible (?)

• Typical large program has ~10^14 paths.
• If you need 1 byte to represent a clone:
  – Would require 256 terabytes of storage
  – >12 times size of Library of Congress
  – Registered ECC 1GB DIMMs: $41.7 million
    • Power: 96.4 kilowatts = Power for 128 homes
  – 500 GB hard disks: 564 x $195 = $109,980
    • Time to read sequential: 70.8 days


Key Insight

• There are many similarities across contexts.
  – Many copies of nearly-identical results.

• BDDs can represent large sets of redundant data efficiently.
  – Need a BDD encoding that exploits the similarities.


Expanded Call Graph

[Figure: a call graph over methods A through H (left) and its expanded form (right) with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5]


Numbering Clones

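[Figure: the call graph over methods A through H with each edge labeled by the clone numbers it carries (0, 0, 0; 0, 1, 2; 0-2, 0-2; 0-2, 3-5), next to the expanded graph with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5]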


Context-sensitive Pointer Analysis Algorithm

1. First, do context-insensitive pointer analysis to get call graph.
2. Number clones.
3. Do context-insensitive algorithm on the cloned graph.

• Results explicitly generated for every clone.
• Individual results retrievable with Datalog query.
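Step 2 can be sketched as a path count over the acyclic call graph: a method gets one clone per path from a root, and each incoming edge is given a contiguous block of the callee's context numbers. The code below is an illustrative reconstruction, not bddbddb's implementation; the function name is hypothetical, and the example edge list in the usage note is an assumption about the A to H graph on the earlier slides.

```python
from collections import defaultdict


def number_clones(edges):
    """Count clones per method and assign a context offset to each edge.

    edges is a list of (caller, callee) pairs forming an acyclic graph.
    A caller context c reaches callee context offset[(caller, callee)] + c.
    """
    callees, callers = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        callees[u].append(v)
        callers[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indegree = {n: len(callers[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    topo = []
    while ready:
        n = ready.pop()
        topo.append(n)
        for m in callees[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)

    count, offset = {}, {}
    for n in topo:
        if not callers[n]:
            count[n] = 1  # a root has a single context
            continue
        total = 0
        for u in callers[n]:
            offset[(u, n)] = total  # this edge's block starts here
            total += count[u]
        count[n] = total
    return count, offset
```

With edges A→B, A→C, A→D, B→E, C→E, D→E, E→F, E→G, F→H, G→H, this yields three clones each of E, F, and G and six of H, with the F→H edge covering contexts 0-2 and the G→H edge covering 3-5.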


Counting rule

IEnum(i,m,vc2,vc1) :- roots(m), mI0(m,i), IE0(i,m). number

• Special rule to define numbering.
• Head: result of numbering
  – First two variables: edge you want to number
  – Second two variables: context numbering
• Subgoals: graph edges
  – Single variable: roots of graph


Demo of context-sensitive


Part III: Developing Advanced Analyses

Example: Race Detection


Object Sensitivity

• k-object-sensitivity (Milanova, Ryder, Rountev 2003)
• k=3 suffices in our experiments
• CHA/context-insensitive/k-CFA too imprecise

static main() {
h1:   C a = new A();
h2:   C b = new B();
p1:   foo(a);
p2:   foo(b);
p3:   foo(a); }
static foo(C c) {
p4:   c.bar(); }

Contexts of method bar():
  1-CFA: { p4 }
  2-CFA: { p1:p4, p2:p4, p3:p4 }
  1-objsens: { h1, h2 }
  2-objsens: { h1, h2 }


Open Programs

• Analyzing open programs is important

– Many “programs” are libraries

– Developers need to understand behavior w/o a client

• Standard approach

– Write a “harness” manually

– A client exercising the interface of the open program

• Our approach

– Generate the harness automatically


Race Detection

• A multi-threaded program contains a race if:
  – Two threads can access a memory location
  – At least one access is a write
  – No ordering between the accesses

• As a rule, races are bad
  – And common …
  – And hard to find …


Running Example

Harness (Note: Single-threaded)

static public void main() {
    A a;
    a = new A();
    a.get();
    a.inc(); }

public A() { f = 0; }
public int get() { return rd(); }
public sync int inc() {
    int t = rd() + (new A()).wr(1);
    return wr(t); }
private int rd() { return f; }
private int wr(int x) { f = x; return x; }


Example: Two Object-Sensitive Contexts

[Code repeated from the Running Example slide]


Example: 1st Context

[Code repeated from the Running Example slide]


Example: 2nd Context

[Code repeated from the Running Example slide]


Computing Original Pairs

All pairs of accesses such that
  – Both references are of one of the following forms:
    • e1.f and e2.f (the same instance field)
    • C.g and C.g (the same static field)
    • e1[e3] and e2[e4] (any array elements)
  – At least one is a write


Example: Original Pairs

[Code of class A repeated from the Running Example slide, shown twice side by side]


Computing Reachable Pairs

• Step 1

– Access pairs with at least one write to same field

• Step 2

– Consider any access pair (e1, e2)

– To be a race, e1 must be:
    • Reachable from a thread-spawning call site s1
        – Without “switching” threads
    • Where s1 is reachable from main

– (and similarly for e2)


Example: Reachable Pairs

[Code repeated from the Running Example slide]


Computing Aliasing Pairs

• Steps 1-2

– Access pairs with at least one write to same field

– And both are executed in a thread in some context

• Step 3

– To have a race, both must access the same memory location

– Use alias analysis


Example: Aliasing Pairs

[Code repeated from the Running Example slide]


Computing Escaping Pairs

• Steps 1-3

– Access pairs with at least one write to same field

– And both are executed in a thread in some context

– And both can access the same memory location

• Step 4

– To have a race, the memory location must also be thread-shared

– Use thread-escape analysis


Example: Escaping Pairs

[Code repeated from the Running Example slide]


Computing Unlocked Pairs

• Steps 1-4

– Access pairs with at least one write to same field

– And both are executed in a thread in some context

– And both can access the same memory location

– And the memory location is thread-shared

• Step 5

– Discard pairs where the memory location is guarded by a common lock in both accesses
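The five steps amount to a chain of filters over candidate access pairs. The sketch below shows only that structure: every input (reachable, may_alias, thread_shared, locks_held) is a hypothetical stand-in for the result of a separate analysis, and the real checker expresses these steps as Datalog rules rather than Python.

```python
from itertools import combinations_with_replacement


def unlocked_pairs(accesses, reachable, may_alias, thread_shared, locks_held):
    """Filter access pairs through steps 1-5.

    accesses:      dict access id -> (field name, is_write)
    reachable:     set of access ids executed in some thread context
    may_alias:     function (a, b) -> bool, from alias analysis
    thread_shared: function (a, b) -> bool, from thread-escape analysis
    locks_held:    dict access id -> set of locks held at the access
    """
    result = []
    for a, b in combinations_with_replacement(sorted(accesses), 2):
        field_a, write_a = accesses[a]
        field_b, write_b = accesses[b]
        # Step 1: same field and at least one write.
        if field_a != field_b or not (write_a or write_b):
            continue
        # Step 2: both accesses are reachable in a thread.
        if a not in reachable or b not in reachable:
            continue
        # Step 3: both may touch the same memory location.
        if not may_alias(a, b):
            continue
        # Step 4: that location is thread-shared.
        if not thread_shared(a, b):
            continue
        # Step 5: discard pairs guarded by a common lock.
        if locks_held[a] & locks_held[b]:
            continue
        result.append((a, b))
    return result
```

In a simplified model of the running example, an unlocked read of f paired with a write of f made under the lock on this survives all five filters, while the write paired with itself is discarded by step 5.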


Example: Unlocked Pairs

[Code repeated from the Running Example slide]


Counterexamples

• Each pair of paths in the context-sensitive call graph from a pair of roots to a pair of accesses along which a common lock may not be held

• Different from most other systems

– Pairs of paths (instead of single interleaved path)

– At call-graph level


Example: Counterexample

// file Harness.java
    static public void main() {
      A a;
      a = new A();
4:    a.get();
5:    a.inc(); }

// file A.java
    public A() {
      f = 0; }
    public int get() {
4:    return rd(); }
    public sync int inc() {
      int t = rd() + (new A()).wr(1);
7:    return wr(t); }
    private int rd() {
10:   return f; }
    private int wr(int x) {
12:   f = x;
      return x; }

field reference A.f (A.java:10) [Rd]
    A.get(A.java:4)
    Harness.main(Harness.java:4)

field reference A.f (A.java:12) [Wr]
    A.inc(A.java:7)
    Harness.main(Harness.java:5)


Race Checker Datalog


Map Sensitivity

• Maps with constant string keys are common

• Augment pointer analysis:

– Model Map.put/get operations specially

...
String username = request.getParameter("user");
map.put("USER_NAME", username);
...
String query = (String) map.get("SEARCH_QUERY");
stmt.executeQuery(query);
...

"USER_NAME" ≠ "SEARCH_QUERY"
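One way to picture the special modeling of Map.put/get: track each put together with its constant key, and let a get return only the values stored under a matching key. The sketch below is a simplified assumption about how such a model could look; an unknown (non-constant) key is represented by None and conservatively matches everything. The function names and object labels are hypothetical.

```python
def map_get_insensitive(puts):
    """Without map sensitivity: a get may return any value ever put."""
    return {value for (_key, value) in puts}


def map_get_sensitive(puts, key):
    """With map sensitivity: match constant keys, fall back when unknown.

    puts is a list of (key, value) pairs; key is a string constant,
    or None when the key is not a compile-time constant.
    """
    return {
        value
        for (put_key, value) in puts
        if put_key is None or key is None or put_key == key
    }


puts = [("USER_NAME", "tainted_username"), ("SEARCH_QUERY", "query_object")]
```

With these puts, the insensitive model lets the tainted user name flow into executeQuery, while the key-sensitive model returns only the object stored under "SEARCH_QUERY".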


Resolving Reflection

• Reflection is a dynamic language feature
• Used to query object and class information
  – static Class Class.forName(String className)
    • Obtain a java.lang.Class object
    • E.g. Class.forName("java.lang.String") gets an object corresponding to class String
  – Object Class.newInstance()
    • Object constructor in disguise
    • Create a new object of a given class

Class c = Class.forName("java.lang.String");
Object o = c.newInstance();

• This makes a new empty string o


What to Do About Reflection?

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;

1. Anything goes
   + Obviously conservative
   - Call graph extremely big and imprecise

2. Ask the user
   + Good results
   - A lot of work for user, difficult to find answers

3. Subtypes of T
   + More precise
   - T may have many subtypes

4. Analyze className
   + Better still
   - Need to know where className comes from


Analyzing Class Names

• Looking at className seems promising

• This is interprocedural const+copy prop on strings

String stringClass = "java.lang.String";
foo(stringClass);
...
void foo(String clazz) {
    bar(clazz);
}
void bar(String className) {
    Class c = Class.forName(className);
}


Pointer Analysis Can Help

[Figure: two columns, stack variables (stringClass, clazz, className) and heap objects (java.lang.String)]


Reflection Resolution Using Points-to

• Need to know what className is
  – Could be a local string constant like java.lang.String
  – But could be a variable passed through many layers of calls
• Points-to analysis says what className refers to
  – className --> concrete heap object

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;


Reflection Resolution

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;

Q: what object does this create?

[Diagram labels: Constants, Specification points]

1. String className = ...;
2. Class c = Class.forName(className);
   Object o = new T1();
   Object o = new T2();
   Object o = new T3();
4. T t = (T) o;


Resolution May Fail!

• Need help figuring out what className is
• Two options
  1. Can ask user for help
     • Call to r.readLine on line 1 is a specification point
     • User needs to specify what can be read from a file
     • Analysis helps the user by listing all specification points
  2. Can use cast information
     • Constrain possible types instantiated on line 3 to subclasses of T
     • Need additional assumptions

1. String className = r.readLine();
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
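The fallback order described above (string constants from points-to first, then a user specification, then the cast) can be sketched as a small decision function. Everything here is an illustrative assumption: the argument shapes, the function name, and the class names in the usage note are hypothetical, and the real analysis is written as Datalog rules integrated with the pointer analysis.

```python
def resolve_new_instance(constants, user_spec, cast_type, subclasses):
    """Pick the classes that c.newInstance() may instantiate.

    constants:  set of class-name strings that className may point to,
                or None if a non-constant source (a specification point)
                reaches className
    user_spec:  set of class names supplied by the user, or None
    cast_type:  the type T of a cast applied to the result, or None
    subclasses: function mapping a type name to its known subtypes
    Returns (set of class names, how they were obtained).
    """
    if constants is not None:
        return set(constants), "points-to"
    if user_spec is not None:
        return set(user_spec), "specification"
    if cast_type is not None:
        # Relies on the cast always succeeding and on knowing all
        # subclasses of T.
        return set(subclasses(cast_type)), "cast"
    return set(), "unresolved"
```

For instance, when className comes from r.readLine() and no specification is given, a cast to a type with two known subtypes narrows the result to those two classes.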


1. Specification Files

loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet4AddressImpl

loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet6AddressImpl

lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.ISO_8859_15

lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.MS1251

tryToLoadClass() @ 29 DataFlavor.java:64 => java.io.InputStream

Format: invocation site => class


2. Using Cast Information

• Providing specification files is tedious, time-consuming, error-prone

• Leverage cast data instead
  – o instanceof T
  – Can constrain type of o if
    1. Cast succeeds
    2. We know all subclasses of T

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;


Analysis Assumptions

1. Assumption: Correct casts. Type cast operations that always operate on the result of a call to Class.newInstance are correct; they will always succeed without throwing a ClassCastException.

2. Assumption: Closed world. We assume that only classes reachable from the class path at analysis time can be used by the application at runtime.


Casts Aren’t Always Present

• Can’t do anything if no cast post-dominates a Class.newInstance call

Object factory(String className) {
    Class c = Class.forName(className);
    return c.newInstance();
}
...
SunEncoder t = (SunEncoder) factory("sun.io.encoder." + enc);
SomethingElse e = (SomethingElse) factory("SomethingElse");


Call Graph Discovery Process

[Figure: flow diagram with boxes Program IR, Call graph construction, Reflection resolution using points-to, Resolved calls, Final call graph, Specification points, User-provided spec, and Cast-based approximation]


Implementation Details

• Call graph construction algorithm in the presence of reflection is integrated with pointer analysis
  – Pointer analysis already has to deal with virtual calls: new methods are discovered, points-to relations for them are created
  – Reflection analysis is another level of complexity

• See Datalog specification


Reflection Resolution Results

• Applied to 6 large Java apps, 190,000 LOC combined

[Figure: bar chart "Call graph sizes compared", in methods (0 to 18,000), for jgap, freetts, gruntspud, jedit, columba, and jfreechart]


Map relations

• Need to map from values in one domain to another?

• Use special operator “=>”
• mapAtoB(a,b) :- a => b.
• Elements in A are appended to domain of B
  – A must have map file.
  – B must have enough space.


Using Code Fragments

• Execute a code fragment before/after every rule invocation.
    A(x,y) :- B(y,a), C(a,z). { code goes here }

• Can access:
  – Relations by name.
  – Rule information.
  – Solver information.

• Can also add code fragment to relations (triggered on change).

• Special keywords: “modifies”, “pre”, “post”


Take a break…(Next up: Profiling, Debugging)


Tutorial Structure

Part I: Essential Background
  – Datalog for Program Analysis
  – Binary Decision Diagrams

Part II: Using the Tools
  – bddbddb
  – Compiler interface (Joeq compiler)
  – Datalog editor in Eclipse
  – Interactive mode

Part III: Developing Advanced Analyses
  – Context sensitivity
  – Combining multiple analyses
  – Race detection examples
  – Using advanced bddbddb features

Part IV: Profiling, Debugging, Avoiding Gotchas
  – Variable ordering
  – Iteration order
  – Machine learning
  – What it’s good for, what it isn’t good for


Part IV: Profiling, Debugging, Avoiding Gotchas

Variable Ordering


TryDomainOrders

• Try all possible domain orders for a given operation and inputs.
  – Bounded: if an order takes longer than current best, abort it.

• To profile slow-running operations:
    Run with -Ddumpslow, -Ddumpcutoff=5000
    java net.sf.bddbddb.TryDomainOrders

• If you know ordering constraints, you can add them to rules/relations
  – Constraints automatically propagated to other rules/relations
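The bounded search in the first bullet can be sketched as a loop over permutations in which each trial is cut off once it exceeds the best cost seen so far. This is an illustration of the idea only, not the TryDomainOrders implementation; the run callback, the weights, and the domain names in the usage example are hypothetical.

```python
from itertools import permutations


def try_orders(domains, run):
    """Exhaustively try domain orders, aborting trials that exceed the best.

    run(order, limit) returns the cost of the operation under `order`,
    or None if it gave up after exceeding `limit` (None means no limit).
    """
    best_order, best_cost = None, None
    for order in permutations(domains):
        cost = run(order, best_cost)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_order, best_cost = order, cost
    return best_order, best_cost


# A stand-in cost model: heavier domains are cheaper near the front.
weights = {"V": 3, "H": 2, "F": 1}
aborted = []


def fake_run(order, limit):
    cost = sum(i * weights[d] for i, d in enumerate(order))
    if limit is not None and cost > limit:
        aborted.append(order)  # a real trial would stop early here
        return None
    return cost


best = try_orders(["V", "H", "F"], fake_run)
```

The cutoff matters because a bad order can be orders of magnitude slower than a good one, so most trials are abandoned quickly once a reasonable order has been found.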


Variable Numbering: Active Machine Learning

• Must be determined dynamically
• Limit trials with properties of relations
• Each trial may take a long time
• Active learning: select trials based on uncertainty
  – Can build up trial database to improve accuracy

• Several hours
• Comparable to exhaustive for small apps


Using Machine Learning

• -Dfindbestorder
  – Enable machine learning
• -Dfbocutoff=#
  – Minimum runtime (in ms) for an operation to be considered
• -Dfbotrials=#
  – Maximum number of trials to run
• -Dtrialfile=
  – Filename to load/store trial information.


Changing Iteration Order

• bddbddb uses simple iteration order heuristic – not always optimal
• If a rule is iterating too many times:
  – Lower its priority with pri=5
  – Increase other rules with pri=-5
  – Can also adjust priority of relations
• Solver prints iteration order on startup
• Also try reformulating the problem or changing input relations


Reformulate the Problem

• Change rule form:
    A(a,c) :- A(a,b), A(b,c).
  vs
    A(a,c) :- A(a,b), A(b,c).

• Change input relations
  – Short-circuit paths

• Filter relations as you go
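
Why rule form matters can be seen by counting fixpoint passes. The sketch below is plain Python over sets, not bddbddb, and it assumes the intended contrast is between the "doubling" rule A(a,c) :- A(a,b), A(b,c) and a linear rule A(a,c) :- A(a,b), E(b,c), where E is a hypothetical input edge relation introduced only for this example.

```python
def closure_iterations(edges, doubling):
    """Naive fixpoint for transitive closure; returns (closure, passes).

    doubling=True  evaluates  A(a,c) :- A(a,b), A(b,c).
    doubling=False evaluates  A(a,c) :- A(a,b), E(b,c).  (assumed linear form)
    A starts out equal to the input edge relation E in both cases.
    """
    closure = set(edges)
    passes = 0
    while True:
        passes += 1
        # Right-hand relation of the join: A itself, or the fixed input E.
        right = closure if doubling else edges
        by_source = {}
        for b, c in right:
            by_source.setdefault(b, set()).add(c)
        derived = {(a, c) for a, b in closure for c in by_source.get(b, ())}
        if derived <= closure:
            return closure, passes  # nothing new: fixpoint reached
        closure |= derived


# A chain 0 -> 1 -> ... -> 16 is the worst case for propagation length.
chain = {(i, i + 1) for i in range(16)}
linear_result, linear_passes = closure_iterations(chain, doubling=False)
doubling_result, doubling_passes = closure_iterations(chain, doubling=True)
```

On this 16-edge chain both forms compute the same 136 tuples, but the linear form needs 16 passes while the doubling form needs 5, because each doubling pass squares the reachable path length. Each pass is a larger join, though, so which form wins in practice depends on the relation sizes.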


Debugging

• Debugging can be tricky
  – Relations are huge
  – Declarative: not so straightforward
• Adding code fragments can help.
• Try it on a small example with full trace information.
• Best: Interactive solver


“Comes from” query

• Special kind of query:
  A(3,5) :- ?
  “What contributed to (3,5) being added to A?”
• Add ‘single’ keyword to get only one path.
• Doesn’t solve the negated problem (missing tuples)
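
The idea behind a come-from query can be sketched by recording, for every derived tuple, the premises that first produced it. This is an illustration in plain Python, not how bddbddb implements the query; the rules A(a,c) :- E(a,c) and A(a,c) :- A(a,b), E(b,c), the relation name E, and both function names are assumptions.

```python
def solve_with_provenance(edges):
    """Evaluate A(a,c) :- E(a,c).  A(a,c) :- A(a,b), E(b,c).

    Returns a dict mapping each tuple of A to the premises recorded the
    first time it was derived (one derivation per tuple, as with 'single').
    """
    why = {edge: [("E", edge)] for edge in edges}
    changed = True
    while changed:
        changed = False
        for a, b in list(why):  # snapshot, since why grows during the pass
            for b2, c in edges:
                if b == b2 and (a, c) not in why:
                    why[(a, c)] = [("A", (a, b)), ("E", (b, c))]
                    changed = True
    return why


def comes_from(why, fact):
    """Trace one derivation of A(fact) back to input tuples of E."""
    trace = []
    pending = [("A", fact)]
    while pending:
        relation, tup = pending.pop()
        trace.append((relation, tup))
        if relation == "A":
            # Expand derived tuples; E tuples are inputs and stay leaves.
            pending.extend(reversed(why[tup]))
    return trace


why = solve_with_provenance({(3, 4), (4, 5)})
trace = comes_from(why, (3, 5))
```

The trace lists A(3,5), then A(3,4) with its input E(3,4), then E(4,5). Asking about a tuple that was never derived raises a KeyError here, which mirrors the limitation noted above: this kind of query explains present tuples, not missing ones.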


Solver options

-Dnoisy
-Dtracesolve
-Dfulltracesolve
-Dbddvarorder=
-Dbddnodes=
-Dbddcache=
-Dbddminfree=
-Dfindbestorder
-Dbasedir=
-Dincludedirs=
-Ddumprulegraph
-Duseir
-Dprintir
-Dsplit_all_rules
-Dsplit_no_rules


Datalog directives

• .include
• .split_all_rules
• .report_stats
• .noisy
• .strict
• .singleignore
• .trace
• .bddvarorder
• .bddnodes
• .bddcache
• .bddminfree
• .findbestorder
• .incremental
• .dot
• .basedir


Relation options

input / inputtuples
output / outputtuples
printtuples
printsize
pri=#
{ code fragment }
x < y

Rule options

split
number
single
cacheafterrename
findbestorder
trace / tracefull
pre / post { code }
modifies R


Experimental Features

• Distributed computation: (dbddbddb?)
• Profile-directed feedback of iteration order
• Eclipse integration
• Touchgraph integration
• Debugging interface
• Tracing information
• Include rules in come-from query


What works well

• Big sets of mostly redundant data
  – Pointer analysis
  – Context-sensitive analysis
• Short propagation paths
  – Each iteration takes quite a bit of time, so >1000 iterations will hurt
  – Try to preprocess/reformulate problem to shorten paths
• Natural ‘flow’ problems
• Pure analysis problems (no transformations)


What doesn’t work well

• Long propagation paths
  – Traditional dataflow analysis (use sparse form instead)
• Huge problems with little redundancy
  – Too much context sensitivity
• Domains that are not easily countable
  – Need to manufacture names on the fly
• Problems that have inherently complicated formulations
• Problems optimized for particular data structures (union-find, etc.)


Using bddbddb in a class

• bddbddb has been very useful in Stanford advanced compiler course
  – Comparing/contrasting analyses becomes easier
  – Students can implement and evaluate multiple techniques without much overhead
• Projects:
  – Implement an algorithm from a paper in Datalog, make a small change and evaluate its effectiveness
  – Experiment with different kinds of context sensitivity on a given problem
  – Improve on BDD solver efficiency
  – Build a tool based on analysis results


Questions?


That’s all, folks!

Thanks for sticking around for all 157 slides!


Experimental Results


Experimental Results

[Figure: Java Context-Insensitive Pointer Analysis, Speed Comparison (Normalized to Handcoded). Y-axis: speed relative to handcoded, 0 to 2.5. Benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek. Series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order.]


Experimental Results

[Figure: Java Context-Sensitive Pointer Analysis, Speed Comparison (Normalized to Handcoded). Y-axis: speed relative to handcoded, 0 to 1.6. Benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek. Series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order.]


Experimental Results

[Figure: C Pointer Analysis, Speed Comparison (Normalized to Handcoded). Y-axis: speed relative to handcoded, 0 to 4. Benchmarks: crafty, enscript, hypermail, monkey. Series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order.]


Experimental Results

[Figure: External Lock Analysis, Speed Comparison (Normalized to No Opts). Y-axis: speed relative to No Opts, 0 to 6. Benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek. Series: No Opts, Incr, +DU, +Dom, +All, +Order.]


Experimental Results

[Figure: SQL Injection Analysis, Speed Comparison (Normalized to Incr). Y-axis: speed relative to Incr, 0 to 5. Benchmarks: personalblog, road2hibernate, snipsnap, roller. Series: No Opts, Incr, +DU, +Dom, +All, +Order.]


Related Work

• Datalog in Program Analysis
  – Specify as Datalog query [Ullman 1989]
  – Toupie system [Corsini 1993]
  – Demand-driven using magic sets [Reps 1994]
  – Program analysis with logic programming [Dawson 1996]
  – Crocopat system [Beyer 2003]
  – Modular class analysis [Besson 2003]
• BDDs in Program Analysis
  – Predicate abstraction [Ball 2000]
  – Shape analysis [Manevich 2002, Yavuz-Kahveci 2002]
  – Pointer Analysis [Zhu 2002, Berndl 2003, Zhu 2004]
  – Jedd system [Lhotak 2004]


Related Work

• BDD Variable Ordering
  – Variable ordering is NP-complete [Bollig 1996]
  – Interleaving [Fujii 1993]
  – Sifting [Rudell 1993]
  – Genetic algorithms [Drechsler 1995]
  – Machine learning for BDD orders [Grumberg 2003]
• Efficient Evaluation of Datalog
  – Semi-naïve evaluation [Balbin 1987]
  – Bottom-up evaluation [Ullman 1989, Ceri 1990, Naughton 1991]
  – Top-down evaluation with tabling [Tamaki 1986, Chen 1996]
  – Rule ordering [Ramakrishnan 1990]
  – Magic sets transformation [Bancilhon 1986]
  – Computing with BDDs [Iwaihara 1995]
  – Time and space guarantees [Liu 2003]


Program Analysis with bddbddb

• Context-sensitive Java pointer analysis
• C pointer analysis
• Escape analysis
• Type analysis
• External lock analysis
• Finding memory leaks
• Interprocedural def-use
• Interprocedural mod-ref
• Object-sensitive analysis
• Cartesian product algorithm
• Resolving Java reflection
• Bounds check elimination
• Finding race conditions
• Finding Java security vulnerabilities
• And many more…

Performance better than handcoded!


Conclusion

• bddbddb: new paradigm in program analysis
  – Datalog compiled into optimized BDD operations
  – Efficiently and easily implement context-sensitive analyses
  – Easier to develop correct analyses
  – Easily experiment with new ideas
  – Growing library of program analyses
  – Easily use and build upon work of others
• Available as open-source LGPL: http://bddbddb.sourceforge.net


My Contribution (2)

bddbddb (BDD-based deductive database)

– Pointer analysis in 6 lines of Datalog (a database language)
  • Hard to create & debug efficient BDD-based algorithms (3451 lines, 1 man-year)
  • Automatic optimizations in bddbddb
– Easy to create context-sensitive analyses using pointer analysis results (a few lines)
– Created many analyses using bddbddb


Outline

• Pointer Analysis
  – Problem Overview
  – Brief History
  – Pointer Analysis in Datalog
• Context Sensitivity
• Improving Performance
• bddbddb: BDD-based deductive database
• Experimental Results
  – Analysis Time
  – Analysis Memory
  – Analysis Accuracy
• Conclusion


Performance is Tricky!

• Context-sensitive numbering scheme
  – Modify BDD library to add special operations.
  – Can’t even analyze small programs. Time:
• Improved variable ordering
  – Group similar BDD variables together.
  – Interleave equivalence relations.
  – Move common subsets to edges of variable order. Time: 40h
• Incrementalize outermost loop
  – Very tricky, many bugs. Time: 36h
• Factor away control flow, assignments
  – Reduces number of variables. Time: 32h


Java Security Vulnerabilities (due to V. Benjamin Livshits)

Application     Classes   Reported Errors                           Actual
Name                      context-insensitive   context-sensitive   Errors
blueblog            306                     1                   1        1
webgoat             349                    81                   6        6
blojsom             428                    48                   2        2
personalblog        611                   350                   2        2
snipsnap            653                  >321                  27       15
road2hiberna        867                    15                   1        1
pebble              889                   427                   1        1
roller              989                  >267                   1        1
Total              5356                 >1508                  41       29


Vulnerabilities Found

             SQL         HTTP        Cross-site   Path
             injection   splitting   scripting    traversal   Total
Header               0           6            4           0      10
Parameter            6           5            0           2      13
Cookie               1           0            0           0       1
Non-Web              2           0            0           3       5
Total                9          11            4           5      29


Summary of Contributions

• The first scalable context-sensitive subset-based pointer analysis.
  – Cloning-based technique using BDDs
  – Clever context numbering
  – Experimental results on the effects of context sensitivity
• bddbddb: new paradigm in program analysis
  – Efficiently and easily implement context-sensitive analyses
  – Datalog compiled into optimized BDD operations
  – Library of program analyses (with many others)
  – Active learning for BDD variable orders (with M. Carbin)
• Artifacts:
  – Joeq compiler and virtual machine
  – JavaBDD library and BuDDy library
  – bddbddb tool


Conclusion

• The first scalable context-sensitive subset-based pointer analysis.
  – Accurate: Results for up to 10^14 contexts.
  – Scales to large programs.
• bddbddb: a new paradigm in program analysis
  – High-level spec → Efficient implementation
• System is publicly available at: http://bddbddb.sourceforge.net

