Data-Flow Analysis (Chapter 8) Mooly Sagiv Make-up class May 4.

Post on 20-Dec-2015

Outline

• What is Data-Flow Analysis?
• Structure of an optimizing compiler
• An example: Reaching Definitions
• Basic Concepts: Lattices, Flow-Functions, and Fixed Points
• Taxonomy of Data-Flow Problems and Solutions
• Iterative Data-Flow Analysis
• Structural Data-Flow Analysis
• DU-Chains and SSA

Data-Flow Analysis

• Input: A control flow graph
• Output: A control flow graph with “global” information at every basic block

Examples
– Constant expressions: x+y*z
– Live variables

Compiler Structure

[Figure: compiler pipeline. String of characters → Scanner → tokens → Parser → AST → Semantic analyzer → IR → Code Generator → object code. The phases share the symbol table and access routines and the OS interface.]

Optimizing Compiler Structure

[Figure: optimizing compiler pipeline. String of characters → Front-End → IR → Control Flow Analysis → CFG → Data Flow Analysis → CFG + information → Program Transformations → IR → instruction selection → object code.]

An Example: Reaching Definitions

• A definition: an assignment to a variable

• A definition d reaches a basic block if there exists an execution path from d to the basic block along which the value assigned at d is still active (not overwritten) when the block is reached

Running Example

unsigned int fib(unsigned int m)
{
    unsigned int f0 = 0, f1 = 1, f2, i;
    if (m <= 1) {
        return m;
    }
    else {
        for (i = 2; i <= m; i++) {
            f2 = f0 + f1;
            f0 = f1;
            f1 = f2;
        }
        return f2;
    }
}

1: receive m(val)
2: f0 ← 0
3: f1 ← 1
4: if m <= 1 goto L3
5: i ← 2
6: L1: if i <= m goto L2
7: return f2
8: L2: f2 ← f0 + f1
9: f0 ← f1
10: f1 ← f2
11: i ← i + 1
12: goto L1
13: L3: return m

[Figure: control flow graph of the running example (entry, the basic blocks of statements 1 to 13, exit). Each block is annotated with the definitions that reach it: either {2, 3} or {2, 3, 5, 8, 9, 10, 11}.]

Difficulties in Data-Flow Analysis

• Input-dependent information

• Undecidability of program analysis
– Reachability of basic blocks
– Arithmetic
– ...

int g(int m, int i);

int f(int n)
{
    int i = 0, j;
    if (n == 1) i = 2;
    while (n > 0) {
        j = i + 1;
        n = g(n, i);
    }
    return j;
}

Conservative data-flow analysis

• Every piece of data-flow information is sound

• Every enabled optimization is correct

• A superset of the execution sequences is considered

• In the reaching definition example a superset of the reaching definitions is computed

Iterative Computation of Reaching Definitions

• Optimistically assume that at every block no definition is reached

• Every basic block “generates” new definitions and “preserves” other definitions

• No definition reaches ENTRY

• Accumulate reaching definitions along different paths

• Iteratively compute more and more definitions at every basic block

• The process must terminate

• The final solution is unique and conservative

Iterative Computation of Reaching Definitions

$RCin(\mathrm{ENTRY}) = \emptyset$

$RCin(B) = \bigcup_{B' \in Pred(B)} RCout(B')$

$RCout(B) = GEN(B) \cup (RCin(B) \cap PRSV(B))$
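
As a rough illustration, the equations can be iterated to a fixed point on the running example. This is a minimal sketch, not the course's implementation: the block names B1..B6 (B1 = statements 1-4, B2 = 13, B3 = 5, B4 = 6, B5 = 7, B6 = 8-12) and the GEN/PRSV tables are assumptions derived by hand from the intermediate code, and statement 1 (receive m) is left out to match the sets shown on the slides.

```python
# Definitions are named by their statement numbers in the intermediate code.
ALL_DEFS = {2, 3, 5, 8, 9, 10, 11}

# Hypothetical block names; B1 is the only successor of ENTRY.
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1"],
        "B4": ["B3", "B6"], "B5": ["B4"], "B6": ["B4"]}
GEN = {"B1": {2, 3}, "B2": set(), "B3": {5},
       "B4": set(), "B5": set(), "B6": {8, 9, 10, 11}}
PRSV = {"B1": {5, 8, 11}, "B2": ALL_DEFS, "B3": {2, 3, 8, 9, 10},
        "B4": ALL_DEFS, "B5": ALL_DEFS, "B6": set()}


def reaching_definitions():
    rc_in = {b: set() for b in PRED}    # optimistic start: nothing reaches
    rc_out = {b: set() for b in PRED}
    changed = True
    while changed:                      # repeat until no set grows any more
        changed = False
        for b in PRED:
            # RCin(B) = union of RCout over the predecessors of B
            new_in = set().union(*(rc_out[p] for p in PRED[b]))
            # RCout(B) = GEN(B) | (RCin(B) & PRSV(B))
            new_out = GEN[b] | (new_in & PRSV[b])
            if new_in != rc_in[b] or new_out != rc_out[b]:
                rc_in[b], rc_out[b] = new_in, new_out
                changed = True
    return rc_in, rc_out


rc_in, rc_out = reaching_definitions()
print(sorted(rc_in["B4"]))
```

Under these assumptions the loop header B4 ends up with every definition reaching it, while B2 and B3 only see definitions 2 and 3.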

Iterative Computation of Reaching Definitions Using Bit-Vectors

• Represent every definition with a bit
• PRSV and GEN are bit-vectors

$RCin(\mathrm{ENTRY}) = \langle 000 \ldots 0 \rangle$

$RCin(B) = \bigvee_{B' \in Pred(B)} RCout(B')$

$RCout(B) = GEN(B) \vee (RCin(B) \wedge PRSV(B))$
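
The bit-vector formulation might look as follows when Python integers stand in for the vectors. This is only a sketch: the bit assignment, the block names B1..B6 and the GEN/PRSV values are assumptions worked out by hand from the running example, and `vec`/`unvec` are hypothetical helpers.

```python
DEFS = [2, 3, 5, 8, 9, 10, 11]                 # statement numbers of definitions
BIT = {d: 1 << i for i, d in enumerate(DEFS)}  # one bit per definition


def vec(*defs):
    """Bit-vector with one bit set for each listed definition."""
    v = 0
    for d in defs:
        v |= BIT[d]
    return v


def unvec(v):
    """List the definitions whose bit is set in v."""
    return [d for d in DEFS if v & BIT[d]]


ALL = vec(*DEFS)
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1"],
        "B4": ["B3", "B6"], "B5": ["B4"], "B6": ["B4"]}
GEN = {"B1": vec(2, 3), "B2": 0, "B3": vec(5),
       "B4": 0, "B5": 0, "B6": vec(8, 9, 10, 11)}
PRSV = {"B1": vec(5, 8, 11), "B2": ALL, "B3": vec(2, 3, 8, 9, 10),
        "B4": ALL, "B5": ALL, "B6": 0}

rc_in = {b: 0 for b in PRED}                   # <000...0> everywhere
rc_out = {b: 0 for b in PRED}
changed = True
while changed:
    changed = False
    for b in PRED:
        new_in = 0
        for p in PRED[b]:
            new_in |= rc_out[p]                # bitwise OR is the join
        new_out = GEN[b] | (new_in & PRSV[b])  # OR / AND replace union / intersection
        if (new_in, new_out) != (rc_in[b], rc_out[b]):
            rc_in[b], rc_out[b] = new_in, new_out
            changed = True

print(unvec(rc_in["B4"]))
```

The appeal of this representation is that each equation costs a handful of word-level operations, independent of how many definitions a block sees.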

Complete Join-Lattices

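• A set L of “data-flow” information
• A partial order $\sqsubseteq$ on the elements of L
• $x \sqsubseteq y$: x “covers” fewer states than y, i.e., x is more precise than y
• $\bot$ is the minimum element
• The height of a lattice is the length of a maximal strictly increasing chain $x_1 \sqsubset x_2 \sqsubset \ldots \sqsubset x_k$

• A “join” confluence operator $\sqcup : L \times L \to L$
– $x \sqsubseteq x \sqcup y$, $y \sqsubseteq x \sqcup y$
– $x \sqsubseteq z,\ y \sqsubseteq z \Rightarrow x \sqcup y \sqsubseteq z$

• Examples: Powersets, Bit-Vectors, ICP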

Properties of Lattices

• $x \sqcup \bot = \bot \sqcup x = x$

• $x \sqcup x = x$ (idempotence)

• $x \sqcup y = y \sqcup x$ (commutativity)

• $(x \sqcup y) \sqcup z = x \sqcup (y \sqcup z)$ (associativity)

Functions on Lattices

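• Model the effects of basic blocks

• A monotonic function $f: L \to L$: $x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$

• A distributive function $f: L \to L$: $f(x) \sqcup f(y) = f(x \sqcup y)$

• A fixed point of a function f: $f(x) = x$

• For a monotonic function f, the effective height of L w.r.t. f is the length of the longest increasing chain $\bot \sqsubseteq f(\bot) \sqsubseteq f^2(\bot) = f(f(\bot)) \sqsubseteq \ldots \sqsubseteq f^k(\bot) = \mathrm{lfp}(f)$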
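
These definitions can be checked by brute force on a small powerset lattice. The sketch below is illustrative only: the three-element universe and the functions `f` and `g` are made up for the example. `f` is monotonic and distributive, `g` is monotonic but not distributive, and `lfp` follows the chain from the bottom element until it stops growing.

```python
from itertools import combinations

U = frozenset({"a", "b", "c"})      # hypothetical universe
BOTTOM = frozenset()                # minimum element of the powerset lattice


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def leq(x, y):                      # partial order: subset inclusion
    return x <= y


def join(x, y):                     # confluence operator: set union
    return x | y


def f(x):                           # adds "a", then "b" after "a", then "c" after "b"
    out = set(x) | {"a"}
    if "a" in x:
        out.add("b")
    if "b" in x:
        out.add("c")
    return frozenset(out)


def g(x):                           # adds "c" only when both "a" and "b" are present
    return x | {"c"} if {"a", "b"} <= x else x


def is_monotonic(fn, elems):
    return all(leq(fn(x), fn(y)) for x in elems for y in elems if leq(x, y))


def is_distributive(fn, elems):
    return all(join(fn(x), fn(y)) == fn(join(x, y))
               for x in elems for y in elems)


def lfp(fn):
    """Follow bottom, fn(bottom), fn(fn(bottom)), ... to the least fixed point."""
    x, steps = BOTTOM, 0
    while fn(x) != x:
        x, steps = fn(x), steps + 1
    return x, steps


L = powerset(U)
print(is_monotonic(f, L), is_distributive(f, L))
print(is_monotonic(g, L), is_distributive(g, L))
print(lfp(f))
```

For `f` the chain needs three steps to reach the whole universe, which is its effective height on this lattice.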

The Join (Meet) Over All Paths

• A data-flow solution which is precise under the assumption that every control flow path is executable

• For a path $p = [B_1, B_2, \ldots, B_n]$: $F_p = F_{B_n} \circ \ldots \circ F_{B_2} \circ F_{B_1}$

• The JOP at a block B

$JOP(B) = \bigsqcup_{p \in Path(B)} F_p(Init)$

• For distributive $F_p$, compute JOP

• Otherwise, find $X(B) \sqsupseteq JOP(B)$

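[Figure: flow graph. entry → test w > 0? with branches labelled Y and N; one branch executes u ← 1, v ← 2, the other u ← 2, v ← 1; both branches join at w ← u+v → exit.]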
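
The two-branch flow graph is a standard illustration of why the join over all paths can be strictly more precise than joining at the merge point when the flow functions are not distributive. A small sketch under assumed conventions: constant propagation with a per-variable value that is either a constant or the marker `NAC` ("not a constant"), and hypothetical helper names throughout.

```python
NAC = "not-a-constant"            # least precise value for a variable


def join_val(a, b):
    return a if a == b else NAC


def join_env(e1, e2):
    return {v: join_val(e1[v], e2[v]) for v in e1}


def assign_const(var, c):         # flow function of: var <- c
    return lambda env: {**env, var: c}


def assign_sum(var, a, b):        # flow function of: var <- a + b
    def flow(env):
        x, y = env[a], env[b]
        return {**env, var: NAC if NAC in (x, y) else x + y}
    return flow


def run(path, env):               # F_p: compose the flow functions along a path
    for flow in path:
        env = flow(env)
    return env


init = {"u": NAC, "v": NAC, "w": NAC}     # nothing is known on entry
branch_a = [assign_const("u", 1), assign_const("v", 2)]
branch_b = [assign_const("u", 2), assign_const("v", 1)]
tail = [assign_sum("w", "u", "v")]

# JOP: evaluate each complete path, then join the results.
jop = join_env(run(branch_a + tail, init), run(branch_b + tail, init))
# Joining at the merge point first loses the correlation between u and v.
merged = run(tail, join_env(run(branch_a, init), run(branch_b, init)))

print(jop["w"], merged["w"])
```

Along either path w is 3, so JOP reports w = 3; after an early join neither u nor v is a constant, and the result for w is only a safe approximation of the JOP.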

Dimensions for Data-Flow Problems

• The information provided

• “Relational” vs. independent attributes

• The type of lattice and functions used: powersets, ICP, ..., unbounded heights

• The direction of information flow: forward, backward, bidirectional

Example Data-Flow Problems

• Reaching Definitions

• Available Expressions

• Live Variables

• Upward Exposed Uses

• Copy-Propagation Analysis

• Constant-Propagation Analysis

• Partial-Redundancy Analysis

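[Figure: example flow graph with the nodes entry, z > 1?, x ← 1, z > y, x ← 2, z ← x-3, y ← x+1 and exit; the two tests have branches labelled Y and N.]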

Data-Flow Analysis Algorithms

• Allen’s strongly connected regions
• Kildall’s iterative algorithm
• Ullman’s T1-T2 analysis
• Kennedy’s node-listing algorithm
• Farrow, Kennedy, and Zucconi’s graph grammar approach
• Rosen’s high-level approach
• Structural analysis
• Slotwise analysis

Iterative Data-Flow Analysis

$in(\mathrm{ENTRY}) = Init$

$in(B) = \bigsqcup_{B' \in Pred(B)} out(B')$

$out(B) = F_B(in(B))$

Iterative Data-Flow Algorithm

Input: a flow graph G = (N, E, r)
       an initial value Init
       a monotonic function F_B for every B in N

Output: in(B) for every B in N

Initialization:
    in(Entry) := Init;
    for each node B in N - {Entry} do in(B) := ⊥
    WL := N

Iteration:
    while WL != {} do
        select and remove a node B from WL
        out := F_B(in(B))
        for all B’ in succ(B) such that in(B’) != in(B’) ⊔ out do
            in(B’) := in(B’) ⊔ out
            WL := WL ∪ {B’}
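
A worklist iteration of this kind could be sketched as below. The function name `iterate`, the dictionary-based graph encoding and the `gen_prsv` helper are assumptions made for the example; the sketch seeds the worklist with every node, entry included, so that the entry's output is propagated to its successors. It is instantiated with reaching definitions on the running example, using hand-derived GEN/PRSV sets and hypothetical block names.

```python
def iterate(succ, entry, init, bottom, join, flow):
    """Forward worklist iteration; returns in(B) for every node B."""
    in_ = {b: bottom for b in succ}
    in_[entry] = init
    worklist = list(succ)            # assumption: every node, entry included
    while worklist:
        b = worklist.pop()           # select and remove a node
        out = flow[b](in_[b])
        for s in succ[b]:
            joined = join(in_[s], out)
            if joined != in_[s]:     # in(s) grows, so s must be revisited
                in_[s] = joined
                if s not in worklist:
                    worklist.append(s)
    return in_


def gen_prsv(gen, prsv):
    """Reaching-definitions flow function x -> gen | (x & prsv)."""
    gen, prsv = frozenset(gen), frozenset(prsv)
    return lambda x: gen | (x & prsv)


ALL = frozenset({2, 3, 5, 8, 9, 10, 11})
SUCC = {"entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
        "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": []}
FLOW = {"entry": gen_prsv((), ALL), "B1": gen_prsv({2, 3}, {5, 8, 11}),
        "B2": gen_prsv((), ALL), "B3": gen_prsv({5}, {2, 3, 8, 9, 10}),
        "B4": gen_prsv((), ALL), "B5": gen_prsv((), ALL),
        "B6": gen_prsv({8, 9, 10, 11}, ()), "exit": gen_prsv((), ALL)}

result = iterate(SUCC, "entry", frozenset(), frozenset(),
                 lambda a, b: a | b, FLOW)
print(sorted(result["B4"]))
```

Because the flow functions are monotonic and the lattice is finite, every in(B) can only grow a bounded number of times, so the loop terminates regardless of the order in which nodes are taken from the worklist.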

Post-ordering

Input: a flow graph G = (N, E, r)

Output: a depth-first spanning tree (N, T) and an ordering Post of N

Method:
    T := Ø;
    for each node n in N do mark n unvisited;
    i := 1;
    call DFS(r)

Using:
    procedure DFS(n) is
        mark n visited;
        for each edge n → s in E do
            if s is not visited then
                add the edge n → s to T;
                call DFS(s);
        Post(n) := i;
        i := i + 1;
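
The same procedure can be written as a short recursive function. This is a sketch with an assumed dictionary encoding of the flow graph and hypothetical block names for the running example; the numbers it produces depend on the order in which successors are listed, so they need not match the numbering on the slides.

```python
def post_order(succ, root):
    """Return the depth-first spanning tree edges and the Post numbering."""
    visited, tree, post = set(), [], {}

    def dfs(n):
        visited.add(n)
        for s in succ[n]:
            if s not in visited:
                tree.append((n, s))      # n -> s becomes a tree edge
                dfs(s)
        post[n] = len(post) + 1          # numbered after all descendants

    dfs(root)
    return tree, post


SUCC = {"entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
        "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": []}

tree, post = post_order(SUCC, "entry")
# Reverse post-order is the usual visiting order for forward problems.
rpo = sorted(post, key=post.get, reverse=True)
print(rpo)
```

Visiting nodes in reverse post-order means a node is processed after all of its predecessors, except along back edges, which tends to reduce the number of passes a forward iteration needs.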

[Figure: control flow graph of the running example (entry, the basic blocks of statements 1 to 13, exit) with the nodes numbered 1 to 8 in post-order.]

[Figure: iteration snapshot on the running example's flow graph, nodes numbered 8 down to 1 (reverse of the post-order numbering). Values shown at the nodes: {2, 3}, {2, 3}, {2, 3, 5}, {2, 3, 5}, {2, 3, 5}, {}, {}.]

[Figure: next iteration snapshot on the same flow graph. Values shown at the nodes: {2, 3}, {2, 3}, {2, 3, 5}, {2, 3, 5}, {2, 3, 5, 8, 9, 10}, {}, {}.]

[Figure: next iteration snapshot on the same flow graph. Values shown at the nodes: {2, 3}, {2, 3}, {2, 3, 5, 8, 9, 10}, {2, 3, 5, 8, 9, 10}, {2, 3, 5, 8, 9, 10}.]

[Figure: final iteration snapshot on the same flow graph. Values shown at the nodes: {2, 3}, {2, 3}, {2, 3, 5, 8, 9, 10}, {2, 3, 5, 8, 9, 10}, {2, 3, 5, 8, 9, 10}, {2, 3, 5, 8, 9, 10}.]

Iterative Backward Data-Flow Analysis

$out(\mathrm{EXIT}) = Init$

$out(B) = \bigsqcup_{B' \in Succ(B)} in(B')$

$in(B) = F_B(out(B))$
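
Live variables is the usual example of a backward problem that fits these equations, with $F_B(x) = USE(B) \cup (x - DEF(B))$ and $Init = \emptyset$. The sketch below applies it to the running example; the block names and the USE/DEF sets are assumptions derived by hand from the intermediate code (B1 = statements 1-4, B2 = 13, B3 = 5, B4 = 6, B5 = 7, B6 = 8-12), and the exit node is left implicit.

```python
SUCC = {"B1": ["B2", "B3"], "B2": [], "B3": ["B4"],
        "B4": ["B5", "B6"], "B5": [], "B6": ["B4"]}
# USE: variables read before any write in the block; DEF: variables written.
USE = {"B1": set(), "B2": {"m"}, "B3": set(),
       "B4": {"i", "m"}, "B5": {"f2"}, "B6": {"f0", "f1", "i"}}
DEF = {"B1": {"m", "f0", "f1"}, "B2": set(), "B3": {"i"},
       "B4": set(), "B5": set(), "B6": {"f2", "f0", "f1", "i"}}


def live_variables():
    live_in = {b: set() for b in SUCC}
    live_out = {b: set() for b in SUCC}
    changed = True
    while changed:
        changed = False
        for b in reversed(list(SUCC)):           # visit blocks back to front
            # out(B) = join of in(B') over the successors B' of B
            out = set().union(*(live_in[s] for s in SUCC[b]))
            # in(B) = F_B(out(B)) = USE(B) | (out(B) - DEF(B))
            new_in = USE[b] | (out - DEF[b])
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out


live_in, live_out = live_variables()
print(sorted(live_in["B1"]))
```

Under these assumptions f2 is reported live on entry to B1: the analysis conservatively considers the path on which the loop body never runs before `return f2`.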

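Lattices of Flow Functions

• For a lattice L, $L^F$ is the set of monotonic functions from L to L: $f \in L^F \Rightarrow (x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y))$

• $L^F$ is a lattice with the order $f \sqsubseteq g \iff$ for all z: $f(z) \sqsubseteq g(z)$
– $\bot^F(z) = \bot$
– $(x \sqcup^F y)(z) = x(z) \sqcup y(z)$

• $L^F$ is closed under composition: $(f \circ g)(z) = f(g(z))$
– $f^0 = id$, $f^n = f \circ f^{n-1}$
– $f^*(z) = \lim_{n \to \infty} (id \sqcup f)^n(z)$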

Structural Data-Flow Analysis

• Phase 1: Compute “the effect” of every program construct in a bottom-up fashion on the tree of control flow constructs (the control tree)

• Phase 2: Propagate the data-flow values in a top-down fashion into the basic blocks

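Bottom-Up Phase (if-then)

[Figure: if-then region. The if node has edge functions $F_{\text{if/Y}}$ (into then) and $F_{\text{if/N}}$ (around then); the then node has $F_{\text{then}}$; the region is reduced to a single node if-then with function $F_{\text{if-then}}$.]

$F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if/Y}}) \sqcup F_{\text{if/N}}$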

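Bottom-Up Phase (Simplified if-then)

[Figure: if-then region in which both edges leaving the if node carry the same function $F_{\text{if}}$; the then node has $F_{\text{then}}$; the region is reduced to a single node if-then with function $F_{\text{if-then}}$.]

$F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcup F_{\text{if}}$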

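Bottom-Up Phase: Reaching Definitions (Simplified if-then)

[Figure: if-then region in which both edges leaving the if node carry $F_{G_1,P_1}$; the then node has $F_{G_2,P_2}$; the region is reduced to a single node if-then with function $F_{G,P}$.]

$F_{G,P} = (F_{G_2,P_2} \circ F_{G_1,P_1}) \sqcup F_{G_1,P_1}$

$= F_{(G_1 \cap P_2) \cup G_2,\ P_1 \cap P_2} \sqcup F_{G_1,P_1}$

$= F_{G_1 \cup G_2,\ P_1}$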
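
The simplification can be sanity-checked mechanically. The sketch below assumes the reaching-definitions reading $F_{G,P}(x) = G \cup (x \cap P)$, represents a flow function as a pair (G, P), and enumerates all pairs over a small made-up universe; the helper names are hypothetical.

```python
from itertools import combinations

U = frozenset({1, 2, 3})              # hypothetical universe of definitions


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def apply(f, x):                      # F_{G,P}(x) = G | (x & P)
    g, p = f
    return g | (x & p)


def compose(f2, f1):                  # f2 after f1, again of the form F_{G,P}
    (g2, p2), (g1, p1) = f2, f1
    return ((g1 & p2) | g2, p1 & p2)


def join(f, h):                       # pointwise join of two flow functions
    return (f[0] | h[0], f[1] | h[1])


def if_then(f_if, f_then):            # (F_then o F_if) joined with F_if
    return join(compose(f_then, f_if), f_if)


def check():
    sets = subsets(U)
    for g1 in sets:
        for p1 in sets:
            for g2 in sets:
                for p2 in sets:
                    f_if, f_then = (g1, p1), (g2, p2)
                    region = if_then(f_if, f_then)
                    if region != (g1 | g2, p1):      # closed form from the slide
                        return False
                    for x in sets:
                        after_if = apply(f_if, x)
                        both = apply(f_then, after_if) | after_if
                        if apply(region, x) != both:
                            return False
    return True


print(check())
```

The check confirms, for this universe, that the region's function agrees both with the closed form $F_{G_1 \cup G_2, P_1}$ and with joining the results of the two paths through the construct.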

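Top-Down Phase (if-then)

[Figure: if-then region with edge functions $F_{\text{if/Y}}$ and $F_{\text{if/N}}$ on the if node, $F_{\text{then}}$ on the then node, and $F_{\text{if-then}}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then})$

$in(\text{then}) = F_{\text{if/Y}}(in(\text{if}))$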

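Top-Down Phase (Simplified if-then)

[Figure: if-then region with $F_{\text{if}}$ on both edges of the if node, $F_{\text{then}}$ on the then node, and $F_{\text{if-then}}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then})$

$in(\text{then}) = F_{\text{if}}(in(\text{if}))$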

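Top-Down Phase: Reaching Definitions (Simplified if-then)

[Figure: if-then region with $F_{G_1,P_1}$ on both edges of the if node, $F_{G_2,P_2}$ on the then node, and $F_{G,P}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then})$

$in(\text{then}) = F_{G_1,P_1}(in(\text{if}))$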

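Bottom-Up Phase (if-then-else)

[Figure: if-then-else region. The if node has edge functions $F_{\text{if/Y}}$ (into then) and $F_{\text{if/N}}$ (into else); the then and else nodes have $F_{\text{then}}$ and $F_{\text{else}}$; the region is reduced to a single node if-then-else with function $F_{\text{if-then-else}}$.]

$F_{\text{if-then-else}} = (F_{\text{then}} \circ F_{\text{if/Y}}) \sqcup (F_{\text{else}} \circ F_{\text{if/N}})$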

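Bottom-Up Phase (Simplified if-then-else)

[Figure: if-then-else region in which both edges leaving the if node carry $F_{\text{if}}$; the then and else nodes have $F_{\text{then}}$ and $F_{\text{else}}$; the reduced node has $F_{\text{if-then-else}}$.]

$F_{\text{if-then-else}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcup (F_{\text{else}} \circ F_{\text{if}})$

$= (F_{\text{then}} \sqcup F_{\text{else}}) \circ F_{\text{if}}$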

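Bottom-Up Phase: Reaching Definitions (Simplified if-then-else)

[Figure: if-then-else region with $F_{G_0,P_0}$ on both edges of the if node, $F_{G_1,P_1}$ on the then node, $F_{G_2,P_2}$ on the else node, and $F_{G,P}$ on the reduced node.]

$F_{G,P} = (F_{G_1,P_1} \sqcup F_{G_2,P_2}) \circ F_{G_0,P_0}$

$= F_{G_1 \cup G_2,\ P_1 \cup P_2} \circ F_{G_0,P_0}$

$= F_{(G_0 \cap (P_1 \cup P_2)) \cup G_1 \cup G_2,\ P_0 \cap (P_1 \cup P_2)}$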

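Top-Down Phase (if-then-else)

[Figure: if-then-else region with edge functions $F_{\text{if/Y}}$ and $F_{\text{if/N}}$ on the if node, $F_{\text{then}}$ and $F_{\text{else}}$ on the branches, and $F_{\text{if-then-else}}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then-else})$

$in(\text{then}) = F_{\text{if/Y}}(in(\text{if}))$

$in(\text{else}) = F_{\text{if/N}}(in(\text{if}))$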

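Top-Down Phase (Simplified if-then-else)

[Figure: if-then-else region with $F_{\text{if}}$ on both edges of the if node, $F_{\text{then}}$ and $F_{\text{else}}$ on the branches, and $F_{\text{if-then-else}}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then-else})$

$in(\text{then}) = F_{\text{if}}(in(\text{if}))$

$in(\text{else}) = F_{\text{if}}(in(\text{if}))$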

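Top-Down Phase: Reaching Definitions (Simplified if-then-else)

[Figure: if-then-else region with $F_{G_0,P_0}$ on both edges of the if node, $F_{G_1,P_1}$ on the then node, $F_{G_2,P_2}$ on the else node, and $F_{G,P}$ on the reduced node.]

$in(\text{if}) = in(\text{if-then-else})$

$in(\text{then}) = F_{G_0,P_0}(in(\text{if}))$

$in(\text{else}) = F_{G_0,P_0}(in(\text{if}))$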

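Bottom-Up Phase (while)

[Figure: while-loop region. The while node has edge functions $F_{\text{while/Y}}$ (into body) and $F_{\text{while/N}}$ (loop exit); the body node has $F_{\text{body}}$ and flows back to while; the region is reduced to a single node while-loop with function $F_{\text{while-loop}}$.]

$F_{\text{while-loop}} = F_{\text{while/N}} \circ (F_{\text{body}} \circ F_{\text{while/Y}})^*$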

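Bottom-Up Phase (Simplified while)

[Figure: while-loop region in which both edges leaving the while node carry $F_{\text{while}}$; the body node has $F_{\text{body}}$; the reduced node has $F_{\text{while-loop}}$.]

$F_{\text{while-loop}} = F_{\text{while}} \circ (F_{\text{body}} \circ F_{\text{while}})^*$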

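Bottom-Up Phase: Reaching Definitions (Simplified while)

[Figure: while-loop region with $F_{G_0,P_0}$ on both edges of the while node, $F_{G_1,P_1}$ on the body node, and $F_{G,P}$ on the reduced node.]

$F_{G,P} = F_{G_0,P_0} \circ (F_{G_1,P_1} \circ F_{G_0,P_0})^*$

$= F_{G_0,P_0} \circ (F_{(G_0 \cap P_1) \cup G_1,\ P_0 \cap P_1})^*$

$= F_{G_0,P_0} \circ F_{(G_0 \cap P_1) \cup G_1,\ U}$

$= F_{(((G_0 \cap P_1) \cup G_1) \cap P_0) \cup G_0,\ P_0}$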
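
The closure step relies on $(F_{G,P})^* = F_{G,U}$, where U is the set of all definitions: joining with the identity makes everything preserved, and further applications add nothing new. The sketch below checks this and the final closed form by enumeration, assuming $F_{G,P}(x) = G \cup (x \cap P)$, a pair (G, P) as the representation, and a small made-up universe; all helper names are hypothetical.

```python
from itertools import combinations

U = frozenset({1, 2})                 # hypothetical universe of definitions


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def apply(f, x):                      # F_{G,P}(x) = G | (x & P)
    g, p = f
    return g | (x & p)


def compose(f2, f1):                  # f2 after f1
    (g2, p2), (g1, p1) = f2, f1
    return ((g1 & p2) | g2, p1 & p2)


def star(f):                          # closure: (F_{G,P})* = F_{G,U}
    return (f[0], U)


def while_loop(f_while, f_body):      # F_while o (F_body o F_while)*
    return compose(f_while, star(compose(f_body, f_while)))


def closure_by_iteration(f, x):
    """Apply (id joined with f) repeatedly until the value stops growing."""
    while True:
        nxt = x | apply(f, x)
        if nxt == x:
            return x
        x = nxt


def check():
    sets = subsets(U)
    for g0 in sets:
        for p0 in sets:
            for g1 in sets:
                for p1 in sets:
                    f_while, f_body = (g0, p0), (g1, p1)
                    closed = ((((g0 & p1) | g1) & p0) | g0, p0)
                    if while_loop(f_while, f_body) != closed:
                        return False
                    once = compose(f_body, f_while)
                    for x in sets:
                        if apply(star(once), x) != closure_by_iteration(once, x):
                            return False
    return True


print(check())
```

For bit-vector problems the closure is therefore available in closed form, which is what lets structural analysis handle loops without iterating.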

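Top-Down Phase (while)

[Figure: while-loop region with edge functions $F_{\text{while/Y}}$ and $F_{\text{while/N}}$ on the while node, $F_{\text{body}}$ on the body node, and $F_{\text{while-loop}}$ on the reduced node.]

$in(\text{while}) = (F_{\text{body}} \circ F_{\text{while/Y}})^*(in(\text{while-loop}))$

$in(\text{body}) = F_{\text{while/Y}}((F_{\text{body}} \circ F_{\text{while/Y}})^*(in(\text{while-loop})))$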

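Top-Down Phase (Simplified while)

[Figure: while-loop region with $F_{\text{while}}$ on both edges of the while node, $F_{\text{body}}$ on the body node, and $F_{\text{while-loop}}$ on the reduced node.]

$in(\text{while}) = (F_{\text{body}} \circ F_{\text{while}})^*(in(\text{while-loop}))$

$in(\text{body}) = F_{\text{while}}((F_{\text{body}} \circ F_{\text{while}})^*(in(\text{while-loop})))$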

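Top-Down Phase: Reaching Definitions (Simplified while)

[Figure: while-loop region with $F_{G_0,P_0}$ on both edges of the while node, $F_{G_1,P_1}$ on the body node, and $F_{G,P}$ on the reduced node.]

$in(\text{while}) = (F_{G_1,P_1} \circ F_{G_0,P_0})^*(in(\text{while-loop}))$

$in(\text{body}) = F_{G_0,P_0}((F_{G_1,P_1} \circ F_{G_0,P_0})^*(in(\text{while-loop})))$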

[Figure: control flow graph of the running example (statements 1 to 13 between entry and exit) with its nodes labelled B0 to B7.]

Handling Arbitrary CFGs

• Need to handle arbitrary acyclic regions

• Need to handle arbitrary cyclic regions
– Reducible regions
– Irreducible components (improper regions)

Handling Improper Regions

• Ignore

• Node splitting

• Solve iteratively for every initial value

• Solve iteratively over LF

Structural Backward Analysis

• Tricky

• For constructs with single exit “reverse” equation direction

• For acyclic constructs with multiple exits use join

• For cyclic reducible constructs with multiple exits: break the cycle and use join

• Cyclic improper regions are handled like the forward case

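Bottom-Up Phase: Backward Problems (if-then)

[Figure: if-then region with edge functions $F_{\text{if/Y}}$ and $F_{\text{if/N}}$ on the if node, $F_{\text{then}}$ on the then node, and $F_{\text{if-then}}$ on the reduced node.]

$F_{\text{if-then}} = (F_{\text{if/Y}} \circ F_{\text{then}}) \sqcup F_{\text{if/N}}$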

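Top-Down Phase: Backward Problems (if-then)

[Figure: if-then region with edge functions $F_{\text{if/Y}}$ and $F_{\text{if/N}}$ on the if node, $F_{\text{then}}$ on the then node, and $F_{\text{if-then}}$ on the reduced node.]

$out(\text{then}) = out(\text{if-then})$

$out(\text{if}) = F_{\text{then}}(out(\text{then})) \sqcup out(\text{if-then})$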

Implementation

• Represent the computation of canonical cases with functions (if-then-else, while)

• Use graphs to represent arbitrary functional computations

Automatic Construction of Data-Flow Analyzers

• Not commonly used so far

• Kildall developed a tool for iterative data flow analysis (1973)

• The PAG (1995) system allows systematic construction of iterative data-flow analyses

• The Sharlit (1992) system generates non-iterative data-flow analyzers
– Finds regular “path-expressions” in the CFG
– Converts them into effect functions

Def-Use, Use-Def Chains

• Sparse data-flow information on flow of variables between assignments

• Can be used to improve the efficiency of iterative data-flow analysis

• A du-chain for a variable v connects a definition of v to all the uses of this definition

• A ud-chain for a variable v connects a use of v to all the definitions that may flow to it

• A web for a variable v is the maximal union of interesting du-chains for v
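
Both kinds of chains fall out of a statement-level reaching-definitions solution. The sketch below builds them for the fib intermediate code; the statement-level successor map and the DEF/USE tables are assumptions written down by hand from the listing, and the function name `chains` is hypothetical.

```python
# Statement-level flow graph of the fib intermediate code (statements 1..13).
SUCC = {1: [2], 2: [3], 3: [4], 4: [5, 13], 5: [6], 6: [7, 8], 7: [],
        8: [9], 9: [10], 10: [11], 11: [12], 12: [6], 13: []}
DEF = {1: "m", 2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}
USE = {4: {"m"}, 6: {"i", "m"}, 7: {"f2"}, 8: {"f0", "f1"}, 9: {"f1"},
       10: {"f2"}, 11: {"i"}, 13: {"m"}}


def chains():
    reach_in = {s: set() for s in SUCC}
    work = list(SUCC)
    while work:
        s = work.pop()
        # Definitions of the variable assigned at s are killed; s itself is generated.
        out = {d for d in reach_in[s] if DEF[d] != DEF.get(s)}
        if s in DEF:
            out.add(s)
        for t in SUCC[s]:
            if not out <= reach_in[t]:
                reach_in[t] |= out
                work.append(t)
    # ud-chain: for each use of v at s, the definitions of v that reach s.
    ud = {(s, v): {d for d in reach_in[s] if DEF[d] == v}
          for s, used in USE.items() for v in used}
    # du-chain: for each definition, the uses it may flow to.
    du = {d: set() for d in DEF}
    for (s, _v), defs in ud.items():
        for d in defs:
            du[d].add(s)
    return ud, du


ud, du = chains()
print(sorted(ud[(8, "f0")]), sorted(du[8]))
```

Under these assumptions the use of f0 in statement 8 is reached by definitions 2 and 9, and the definition of f2 in statement 8 flows to the uses in statements 7 and 10.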

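[Figure: example flow graph with the nodes entry, z > 1, x ← 1, z > 2, x ← 2, y ← x+1, z ← x-3, x ← 4, z ← x+7 and exit; the two tests have branches labelled Y and N.]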

Static Single Assignment(SSA)

• A normal form of the program such that def-use is immediate

• A separate variable for every assignment

• A φ-function combines the values of relevant variables

• Simplifies some optimizations

• Increases program’s size

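[Figure: the same flow graph in SSA form, with the nodes entry, z1 > 1, x1 ← 1, z1 > 2, x2 ← 2, y1 ← x1+1, x3 ← φ(x1, x2); z2 ← x3-3, x4 ← 4, z3 ← x4+7 and exit; the two tests have branches labelled Y and N.]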

Handling Pointers and Arrays

• Complicated!!!

• Treated conservatively in most compilers

• The frontier of research

• A simple “reduction”

• Direct solutions yield more precise solutions

x ← a[i] becomes x ← access(a, i)

a[i] ← 4 becomes a ← update(a, i, 4)

More Ambitious Data-Flow Analysis

• Data-Flow analysis can yield “interesting” information on program behavior

• Signs of variables

• Non-trivial constant values

• Termination properties

• Complicated bugs

• Partial correctness

#include <stdio.h>

int f(int x)
{
    if (x > 100) return x - 10;
    else return f(f(x + 11));
}

int main(void)
{
    int x;
    scanf("%d", &x);
    if (x > 100) printf("%d\n", 91);
    else printf("%d\n", f(x));
    return 0;
}
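
The program is an instance of a non-trivial constant value: whatever the input, it prints 91. The quick check below is only an experiment over a sampled input range, not a proof, and `program_output` is a hypothetical stand-in for what `main` prints.

```python
def f(x):
    # Same recursion as the C function f.
    return x - 10 if x > 100 else f(f(x + 11))


def program_output(x):
    """The value main prints for input x."""
    return 91 if x > 100 else f(x)


# Collect the distinct outputs over a sample of inputs.
outputs = {program_output(x) for x in range(-50, 201)}
print(outputs)
```

An analysis able to establish this fact for all inputs has to reason about the recursive calls, which is well beyond what the bit-vector problems earlier in the deck can express.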