Pietro Ferrara
Chair of Programming Methodology, ETH Zurich, Switzerland
Analisi e Verifica di Programmi, Università Ca’ Foscari, Venice, Italy
An Introduction to Heap Analysis
Pietro Ferrara: “An introduction to heap analysis”, Analisi e Verifica di Programmi, Università Ca’ Foscari, Venice, Italy
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Syntax
Concrete and Abstract Domain
• Concrete domain:
• Abstract non-relational domain:
Non-Relational Domains
• A non-relational domain has to provide:
> Operators on lattices
• Partial ordering
• Upper and lower bound
• Top and bottom elements
> Abstraction and concretization functions
>
>
>
Summary on Non-Relational Domains
• Many different domains:
> Sign
> Parity
> Congruences
> Integers
> Intervals
• Work directly on values
• Evaluation of expressions and conditions
• No relational information!
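As a reminder of what a non-relational domain provides, here is a minimal sketch of the Sign domain. The names `Sign`, `leq`, `join`, `abstract` and `add` are illustrative assumptions and do not come from the course material.

```python
from enum import Enum


class Sign(Enum):
    BOT = "bot"
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "top"


def leq(a, b):
    """Partial order: BOT is below everything, TOP is above everything."""
    return a == b or a == Sign.BOT or b == Sign.TOP


def join(a, b):
    """Least upper bound of two signs."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return Sign.TOP


def abstract(n):
    """Abstraction of a single concrete integer."""
    if n == 0:
        return Sign.ZERO
    return Sign.POS if n > 0 else Sign.NEG


def add(a, b):
    """Abstract addition, working directly on abstract values."""
    if Sign.BOT in (a, b):
        return Sign.BOT
    if a == Sign.ZERO:
        return b
    if b == Sign.ZERO:
        return a
    return a if a == b else Sign.TOP
```

Each variable is abstracted on its own, so a fact such as "x equals y" cannot be expressed in this domain.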
Summary on Relational Domains
• “Generic state”
> We do not have anymore an environment!
• A relational domain has to provide:
> Operators on lattices
• Partial ordering
• Upper and lower bound
• Top and bottom elements
> Abstraction and concretization functions
>
>
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Extended language
• We extend the language to support references
> Dynamic allocation of memory
> Dereferencing pointers
• We can have
> Integer variables
> Pointers to integer variables
> Pointers to pointers to…
• We do not consider pointer arithmetic
Syntax
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Concrete domain
• Previously
> An environment that relates variables to values
• Runtime behaviors described by:
> Set of environments
> Lattice with set operators:
Naïve extension
• Trivial extension of this domain
> Variables can be
• Integer values
• Pointers
• Let’s define pointers as variables
> [x -> y] means that x points to y
• Formally
Counterexample
• This approach is not expressive enough!
• The pointer created by allocInt was not previously assigned to a variable
> Imagine it as the creation of a new object
• We do not have a way to represent it
• We need something more!
x = allocInt;
Environment and Store
• Common approach in programming languages:
> Environment
• Local variables related to addresses
> Heap
• Addresses related to values
More Expressive Languages
• For our language such domain is enough
• But what about objects?
> State of an object
• Environment!
• It relates variables (fields!) to values (integers or pointers to other objects)
An example
class A {
  int node=…;
  A next=null;
}
A head=new A(5);
head.next=new A(4);
After A head=new A(5);
Variable Value
head #1
Address Value
#1 node → #2, next → null
#2 5
After head.next=new A(4);
Variable Value
head #1
Address Value
#1 node → #2, next → #3
#2 5
#3 node → #4, next → null
#4 4
Our approach
• It is enough for our simple language
• For many statements, same semantics
> Concatenation, if, while
> Arithmetic operations
> Evaluation of conditions
Statements
• We have to (re)define the semantics of
Old expressions
New expressions
Statements
An example
int* it=allocInt;
*it=1;
int sum=0;
while(sum<2) {
  sum=sum+*it;
}
After int* it=allocInt;
Variable Value
it #1
Address Value
#1 ?
After *it=1;
Variable Value
it #1
Address Value
#1 1
After int sum=0;
Variable Value
it #1
sum #2
Address Value
#1 1
#2 0
After the first loop iteration
Variable Value
it #1
sum #2
Address Value
#1 1
#2 1
After the second loop iteration
Address Value
#1 1
#2 2
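The trace of this example can be replayed with a small executable model of the concrete state. This is only a sketch: the class `State`, its `alloc` method and the counter-based address generator are assumptions made for illustration, and `None` stands for the unknown content written `?` in the tables.

```python
class State:
    """Concrete state: environment (variable -> address) and heap (address -> value)."""

    def __init__(self):
        self.env = {}
        self.heap = {}
        self._counter = 0  # hypothetical generator of fresh addresses

    def alloc(self):
        """Model of allocInt: return a fresh address whose content is unknown."""
        self._counter += 1
        addr = f"#{self._counter}"
        self.heap[addr] = None  # None plays the role of '?'
        return addr


s = State()
s.env["it"] = s.alloc()                          # int* it = allocInt;
s.heap[s.env["it"]] = 1                          # *it = 1;
s.env["sum"] = s.alloc()                         # int sum = 0;
s.heap[s.env["sum"]] = 0
while s.heap[s.env["sum"]] < 2:                  # while (sum < 2)
    s.heap[s.env["sum"]] += s.heap[s.env["it"]]  # sum = sum + *it;
```

Every read and write goes through the environment first and the heap second, which is the two-level lookup that the abstract semantics has to mimic.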
Summary
• We distinguish between actions
> on pointers
> on numerical values
• Abstract domains on
> Numerical information
• Sign
• Intervals
• …
> Heap structure
• To do now!
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Approach
• The core is how to abstract the heap:
> Variables point to “something” on it
• Previously we had
> In fact, we did not have pointers and store
• Let us be generic w.r.t. what a heap identifier can be
Heap
• We suppose that an abstraction of the heap is provided, that is, we have
• For numerical domains, we had
> Similar approach!
• Now we investigate which semantic primitives we need
> Like eval_const, eval_arithm, etc…
Statements
Old expressions
• Values are stored in the heap
• We do not know what the abstract heap is
• We need a primitive that defines it!
New expressions
Statements
Summary
• The heap analysis has to provide:
> returns a new abstract heap id
> returns the numerical value related to a heap identifier
> assigns the evaluation of a numerical expression to a heap identifier
• Mmm… something is wrong here!
> The heap analysis should not have to deal with numerical values!
Numerical Relational Domains
• A relational domain has to provide:
> Operators on lattices
> Abstraction and concretization functions
>
>
Numerical Values
Approach
• In expressions:
> replace variables with (abstract) pointers
• Relational numerical domain:
> track relations on abstract pointers
• instead of variables!
• Combination of
> Heap analysis
• symbolically represents references
> Numerical domains
• track numerical information
Arithmetic Expressions
• We replace variables with heap ids
> through the environment
• We pass such expression to
• We do not need and
• We simply deal with and
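The replacement of variables with heap identifiers can be sketched as a rewriting of the expression tree. The tuple encoding of expressions and the function name `to_heap_expr` are assumptions made for this illustration only.

```python
def to_heap_expr(expr, env):
    """Rewrite an expression over variables into one over heap identifiers.

    Expressions are nested tuples (an assumed encoding):
      ("var", name), ("const", n), or (operator, left, right).
    env maps each variable name to its abstract heap identifier.
    """
    kind = expr[0]
    if kind == "var":
        return ("id", env[expr[1]])  # look the variable up in the environment
    if kind == "const":
        return expr                  # constants are left untouched
    op, left, right = expr
    return (op, to_heap_expr(left, env), to_heap_expr(right, env))


env = {"x": "h1", "y": "h2"}
rewritten = to_heap_expr(("add", ("var", "x"), ("mul", ("var", "y"), ("const", 2))), env)
```

The rewritten expression mentions only heap identifiers and constants, so it can be handed to a numerical domain that knows nothing about variables or pointers.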
Statements
New expressions
Statements
corresponds to where we replaced with
Summary
• The heap analysis provides:
> Heap identifiers
• Lattice structure!
> An “internal” state
• Lattice structure
>
• A numerical relational domain provides
> Lattice structure
> and
Example
int x=0,y=1,z=2;
x=&y;
z=&y;
while(x>0)
  y--;
return z;
[Figure: heap analysis with nodes 1, 2, 3; environment over x, y, z; numerical domain over nodes 1, 2, 3]
We return [0..0]
Too easy!
• The situation is more complex:
> Rough evaluation of boolean conditions
• A variable may point to different references
int x=0,y=1;
if(random>0.5) x=&y;
[Figure: heap analysis with nodes 1 and 2; environment over x and y; numerical domain over nodes 1 and 2; the resulting state is marked “?”]
Heap identifiers
• A heap id is no longer a single element
> Set of elements!
• Issue:
> If we assign to a variable pointing to many addresses, what happens?
• Weak updates: lub between the previous value and the assigned one
• We preserve the soundness!
• We introduce approximation!
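A weak update can be sketched on an interval store as follows. The functions `join` and `weak_update`, the node names and the encoding of intervals as `(lo, hi)` pairs are assumptions made for this illustration.

```python
def join(a, b):
    """Least upper bound of two intervals (lo, hi); None encodes bottom."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def weak_update(store, ids, value):
    """Assign `value` to every node in `ids`, keeping the old value as well."""
    updated = dict(store)
    for node in ids:
        updated[node] = join(store.get(node), value)
    return updated


# x may point to node n1 or to node n2; we then assign 5 through x.
store = {"n1": (0, 0), "n2": (1, 1)}
after = weak_update(store, {"n1", "n2"}, (5, 5))
```

Both nodes now cover their old value and the new one, which is sound whichever node x really points to, at the price of wider intervals.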
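In the concrete
int x=0,y=1;
if(random>0.5) x=&y;
x++;
[Figure: heap analysis with nodes 1 and 2; environment over x and y; numerical domain over nodes 1 and 2. Case random = 0.6: the condition is true. Node values shown: 0, 1, 2]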
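In the concrete
int x=0,y=1;
if(random>0.5) x=&y;
x++;
[Figure: heap analysis with nodes 1 and 2; environment over x and y; numerical domain over nodes 1 and 2. Case random = 0.1: the condition is false. Node values shown: 0, 1, 1]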
In the abstract
int x=0,y=1;
if(random>0.5) x=&y;
x++;
[Figure: heap analysis with nodes 1 and 2; environment over x and y; numerical domain over nodes 1 and 2; the resulting state is marked “?”]
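Final state
• Concrete
• Abstract
[Figure: the concrete and abstract final states, each drawn over nodes 1 and 2 with an environment over x and y; the node values shown are 0, 1 and 2]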
Nodes
• Heap analysis
> the state of the heap represented by a graph
• Environment
> Part of the heap analysis
• More complex data structures like lists
> edges between nodes
[Figure: heap graph with nodes 1, 2, 3, 4, the variable head, and next edges between nodes]
while(head!=null) head=head.next
Nodes
• Simple language
> No edges between nodes
> Only edges from variables to nodes
• Heap identifiers
> Set of nodes
> Lattice operators: set operators!
• Internal states
> Set of nodes (references)
> Lattice operators: set operators!
The end
• Are you sure???
• As always, things are much more complex…
• Let’s start with a naïve solution:
> Nodes identified by an integer number
> Counter incremented each time we allocate a new integer
• As in the concrete context
• … it should not work
Allocation inside while loop
int x=…;
while(random>0.5)
  x=allocInt;
[Figure: x may point to node 1, 2, 3, 4, … up to ∞]
• Unbounded number of nodes
> The lattice built on set operators has infinite height
> The analysis may not converge!
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Top domain
• Let’s start with a simple domain
> The top domain
> approximates all the concrete references!
> Lattice structure is trivial:
• Upper bound:
• Lower bound:
• Top:
• Bottom:
• Partial order: true
> Semantics:
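One possible reading of the top domain is sketched below. It is not the definition given on the slide: the class `TopHeap`, the method names and the choice of a single element `"r"` that serves as both top and bottom are assumptions made for illustration.

```python
class TopHeap:
    """Heap abstraction with one abstract reference standing for every concrete one."""

    R = "r"  # the only abstract reference (assumed name)

    def alloc(self):
        """Every allocation returns the same abstract reference."""
        return self.R

    def join(self, a, b):
        """Upper bound: there is only one element to return."""
        return self.R

    def meet(self, a, b):
        """Lower bound: there is only one element to return."""
        return self.R

    def leq(self, a, b):
        """Partial order: always true."""
        return True

    def is_summary(self, ref):
        """r may stand for many concrete references, so it is always a summary."""
        return True


heap = TopHeap()
first, second = heap.alloc(), heap.alloc()
```

Two distinct allocations become indistinguishable, which is why updates through `r` have to be handled with care.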
Abstraction and concretization
[Figure: concrete references #12, #43, #03, #77]
Concretization and abstraction
[Figure: concrete references #12, #43, #03, #77]
Allocation of memory
[Figure: effect of allocInt, with concrete references #12, #43, #98, #77]
A (counter)example
int x=0,y=1;
if(random>0.5) x=&y;
x++;
[Figure: the heap analysis has the single node r; the environment is over x and y; the numerical domain tracks r]
• At the end, x and y may both be 1!
> Not sound
• Why?
> r represents many concrete references!
> We have to perform weak updates!
Weak vs. strong updates
• Concrete semantics of assignments:
• (Unsound) Abstract semantics:
• Note: unsound iff x is a heap identifier!
(wrong) Assignments with heap id
Summary nodes
• What happens if ?
>
>
> Environment:
> Initial state:
> Result
Concrete and Abstract
x=0
x=0
Not sound!!!
Weak vs. Strong updates
• Weak updates:
> assign lub(old value, assigned value)
• Strong updates:
> Assign exactly the assigned value
• We can perform strong updates iff
> The assigned variable points to one heap id
> The id represents one concrete reference
• Otherwise we are unsound!
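The choice between the two kinds of update can be sketched as follows. Abstract values are plain sets of integers to keep the example short; the function `assign`, the predicate `is_summary` and the node name `r` are assumptions made for this illustration.

```python
def assign(store, ids, values, is_summary):
    """Assign the abstract value `values` to the nodes in `ids`.

    A strong update is used only when exactly one node is targeted and that
    node stands for a single concrete reference. Otherwise the update is weak.
    """
    strong = len(ids) == 1 and not any(is_summary(node) for node in ids)
    updated = dict(store)
    for node in ids:
        old = store.get(node, frozenset())
        updated[node] = frozenset(values) if strong else old | frozenset(values)
    return updated


# Top domain: the single node r is a summary, so every update is weak.
always_summary = lambda node: True
store = {}
store = assign(store, {"r"}, {0}, always_summary)          # x = 0
store = assign(store, {"r"}, {1}, always_summary)          # y = 1
incremented = {v + 1 for v in store["r"]}                  # value of x + 1
store = assign(store, {"r"}, incremented, always_summary)  # x++
```

In this sketch the final value of `r` covers the concrete outcomes 1 and 2, and it also contains the spurious value 0 because nothing is ever overwritten.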
An example
int x=0,y=1;
if(random>0.5) x=&y;
x++;
[Figure: the heap analysis has the single node r; the environment is over x and y; the numerical domain tracks r]
• At the end, x and y may be
> both 1
> both 2
> Sound!
> … but rough (they are never 0!)
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Top domain
2. Program point-bounded references
3. Shape analysis
Program point-bounded reference
• Intuition:
> Approximate together
• all the concrete references
• allocated by the same statement
> Abstract reference:
• the program point of the statement
• Inside a loop or a recursive procedure:
> One abstract -> many concrete
> Important to track it for weak updates
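The effect of naming references after program points can be sketched by iterating a loop that allocates memory until the set of references pointed to by a variable stabilizes. The helper `fixpoint`, the two allocation strategies and the program point name `l4` are assumptions made for this illustration.

```python
def fixpoint(alloc, max_iter=50):
    """Iterate 'x = allocInt' inside a loop until the references of x stabilize.

    Returns the final set and the number of iterations needed, or None as the
    second component if the set is still growing after max_iter iterations.
    """
    refs = frozenset()
    for n in range(max_iter):
        nxt = refs | {alloc(refs)}
        if nxt == refs:
            return refs, n
        refs = nxt
    return refs, None


def counter_alloc(refs):
    """Naive strategy: a fresh integer-numbered node at every allocation."""
    return f"#{len(refs) + 1}"


def site_alloc(refs):
    """Program point strategy: always the node of the allocating statement."""
    return "l4"


naive_refs, naive_steps = fixpoint(counter_alloc)
site_refs, site_steps = fixpoint(site_alloc)
```

With the counter the set keeps growing and the iteration never stabilizes, while with the program point the set stabilizes immediately, and `l4` then stands for every reference allocated in the loop.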
Heap identifiers
• Approximation:
> Different executions may assign
• references allocated by different statements
• To the same variable
• Heap identifiers
> sets of abstract references
• Lattice operators:
> Common set operators
Lattice structure
{(l2,c4)} {(l1,c3)} {(l3,c2)} …
{(l2,c4),(l1,c3)} {(l1,c3),(l3,c2)} …
{(l2,c4),(l1,c3),(l3,c2)} …
Abstraction and concretization
[Figure: concrete references #12, #43, #03, #77]
1: void main(String[] args) {
2:   int a=allocInt;
3: }
Abstraction and concretization
[Figure: concrete references #12, #43, #03, #77]
1: void main(String[] args) {
2:   int a;
3:   if(random>0.5)
4:     a=allocInt;
5:   else a=allocInt;
6: }
Abstraction and concretization
[Figure: concrete references #12, #43, #03, #77, …]
1: void main(String[] args) {
2:   int a;
3:   while(random>0.5)
4:     a=allocInt;
5: }
Allocation of memory
[Figure: concrete references #12, #43, #03, #77]
1: void main(String[] args) {
2:   int a=allocInt;
3: }
Allocation of memory
[Figure: concrete references #12, #43, #03, #77, …]
1: void main(String[] args) {
2:   int a;
3:   while(random>0.5)
4:     a=allocInt;
5: }
An example
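1: int x=0,
2:     y=1;
3: if(random>0.5)
4:   x=&y;
5: x++;
[Figure: with program point-bounded references, the heap analysis has nodes l1 and l2, the environment is over x and y, and the numerical domain tracks l1 and l2; with the top domain, the heap analysis has the single node r, the environment is over x and y, and the numerical domain tracks r]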
Another example
1: while(random>0.5) {
2:   x=allocInt
3:   x=0
4: }
5: x++;
[Figure: the heap analysis has the single node l2; the environment is over x; the numerical domain tracks l2]
• Sound!
• … but rough (x cannot be 0 at the end!)
Weak vs. Strong updates
• Recall:
> We can perform strong updates iff
• The assigned variable points to one heap id
• The heap id represents one concrete reference
• We require that:
> The heap id consists of a single abstract element
> The reference is not allocated inside a loop
• If we do not have method calls!
> Otherwise, different calls to the same method
> Track if a method is recursive
Refining the analysis
• Create more abstract references
> that represent one concrete reference
> Increase the precision
> Increase the complexity
• For instance,
> Number of iterations of the while loop
• (pp, i): i-th iteration of the loop
• Requires a widening operator: infinitely many references!
> Stack of the called method
Limits
• Serious limits with data structures
> Complex relations between data
> More complex properties
• Acyclic list
• Trees
• Etc..
> Abstract references bound to program points
• Not enough
> Need to track relations between heap nodes
• Something like relational numerical domains
An example
class List {
  int node=0;
  List next=null;
}
1: List list=new List();
2: List it=list;
3: while(random>0.5) {
4:   it.next=new List();
5:   it=it.next;
6: }
[Figure: heap analysis with nodes l2 and l4, environment over list and it, and one next edge]
An example
class List {
  int node=0;
  List next=null;
}
1: List list=new List();
2: List it=list;
3: while(random>0.5) {
4:   it.next=new List();
5:   it=it.next;
6: }
[Figure: heap analysis with nodes l2 and l4, environment over list and it, and two next edges]
• list may be cyclic
• it may not point to the last cell
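An example - Concretization
[Figure: the abstract heap (nodes l2 and l4, environment over list and it, two next edges) and three of its concretizations, built from the cells #1 and #2, then #1, #2 and #3, then #1, #2 and #3 with one more next edge; list and it appear in each]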
Outline
1. Recall of numerical domains
2. Extended language with references
3. Concrete semantics
4. Abstract semantics
5. Abstract domains
1. Program point-bounded references
2. Shape analysis
Shape analysis
• Main trend in the field of static analysis
• Developed mostly by Mooly Sagiv et al.
> Started in 1995
> Still ongoing research efforts
• Many case studies and applications
• Strong practical interest
• Huge literature on the topic
Mooly Sagiv, Thomas W. Reps, Reinhard Wilhelm: “Solving Shape-Analysis Problems in Languages with Destructive Updating”, ACM Trans. Program. Lang. Syst. 20(1): 1-50 (1998)
Intuition
• Describe the shape of the concrete heaps
> Concrete heaps: potentially unbounded
> Abstract heaps: bounded a priori
• One abstract node → many concrete references
• As in the previous approach!
• More flexible
> Materialization
> Explicit representation of sharing
• Focused on lists
> But still approximated!
Shape graphs
[Figure: concrete cells #1, #2, #3, #4]
• Concrete shape graph
> Represented by cons-cells
> Potentially unbounded
• Abstract shape graphs
> nΦ: summary nodes
[Figure: abstract nodes n{x} and nΦ]
Approach
• So far, this is neither more refined nor really different
> Now: nodes bound to variables
> Before: nodes bound to program points
> No difference between cyclic and acyclic lists
> Still using summary nodes
[Figure: the shape graph with nodes n{x} and nΦ, next to the program point graph with nodes l2 and l4, the variable list and two next edges]
Sharing
• First refinement:
> Track an “is-shared” predicate on each node
> means that all the cells represented by a node n are distinct
• That is, there is no sharing
> The length of the list is abstracted away…
• … but in this way the analysis terminates!
• So we know something more…
> But how is this used?
Materialization without sharing
• Access the next element
> Summary node
> Materialize a new single node ad hoc
• Precisely track nodes pointed by variables
> Always single nodes!
> More precise than the previous approach
• Sharing information is essential…
[Figure: x=x.next applied to the graph with nodes n{x,y} and nΦ, giving nodes n{y}, n{x} and nΦ]
Unsound!
Materialization with sharing
• Unshared summary node
>
• Shared summary node
>
[Figure: x=x.next applied to the graph with nodes n{x,y} and nΦ, giving nodes n{y}, n{x} and nΦ, drawn once for an unshared and once for a shared summary node]
Renaming
• Problem
> Identifiers of nodes
• variables that point to them
> What happens if we assign an existing node to an existing variable?
• Renaming of the node
[Figure: Node x=y; renames the node n{y} to n{x,y}]
Liquidization
• Another problem:
> What if the assigned variable already points to something?
• Garbage collection or liquidization
[Figure: x=y; turns the nodes n{y} and n{x} into n{x,y} and n{?}; in the presence of a summary node, n{y}, nΦ and n{x} become n{x,y}, nΦ and n{?}]
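Renaming and liquidization can be sketched on node names alone. This is a simplification under stated assumptions: nodes are identified by the set of variables pointing to them, edges are left out, the summary node nΦ is encoded as the empty set, and a node that loses its last variable is folded into the summary node rather than garbage collected. The name `assign_var` is hypothetical.

```python
SUMMARY = frozenset()  # plays the role of the summary node (assumed encoding)


def assign_var(nodes, x, y):
    """Effect of x = y on node names (x and y are assumed to be different)."""
    result = set()
    for name in nodes:
        name = name - {x}      # liquidization: x no longer names its old node
        if y in name:
            name = name | {x}  # renaming: n{y} becomes n{x,y}
        result.add(frozenset(name))
    return result


before = {frozenset({"x"}), frozenset({"y"})}
after = assign_var(before, "x", "y")
```

After the assignment the node of y is named by both variables, and the node that x used to name no longer has a variable of its own.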
An example
class List {
  int node=0;
  List next=null;
}
1: List list=new List();
2: List it=list;
3: while(random>0.5) {
4:   it.next=new List();
5:   it=it.next;
6: }
[Figure: variables list and it; shape graph nodes n{list,it}, n{list}, n{?}, n{it}]
An example
class List {
  int node=0;
  List next=null;
}
1: List list=new List();
2: List it=list;
3: while(random>0.5) {
4:   it.next=new List();
5:   it=it.next;
6: }
[Figure: variables list and it; shape graph nodes n{list}, n{?}, n{it}, then n{?}, nΦ, n{it}]
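An example - Concretization
[Figure: the program point graph (nodes l2 and l4, variables list and it, two next edges) with three of its concretizations over the cells #1, #2 and #3, compared with the shape graph with nodes n{list}, nΦ and n{it} and variables list and it]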
Conclusion on shape analysis
• Shape analysis improves the precision
> But it is not the final solution
> Shape analysis appeared 13 years ago
> Many improvements in the meantime
> New improvements appear each year
• Not easy to infer information
> Automatically
> Efficiently
> Precisely
Conclusion
• We have
> extended the language with references
> formalized concrete and abstract semantics
> presented several heap analyses
• Reasoning on heap analysis
> without taking into account numerical issues
• Quite similar to numerical domains
> Different levels of precision and efficiency
> Non-relational and relational information
Conclusion
• We can reason separately on
> Numerical domains
> Heap analyses
• Different plugins of the same analysis
> Different levels of complexity and efficiency
• Generic analyzers plugged with different
> numerical domains
> heap analyses
> properties
> etc…