TIP Language and Type Analysis
Yu Zhang
Course web site: http://staff.ustc.edu.cn/~yuzhang/pldpa
Type Analysis and Unification 1
Resources
• Static Program Analysis (Anders Møller)
- http://cs.au.dk/~amoeller/
- TIPC: implemented in C++17
- tipg4: implemented using ANTLR4
Questions about Programs
• Does the program terminate on all inputs?
• How large can the heap/stack frame become during
execution?
• Can sensitive information leak to non-trusted users?
• Can non-trusted users affect sensitive information?
• Data races?
• SQL injections?
• …
Note: SQL injection tricks a server into executing malicious SQL commands by inserting them into, for example, submitted web forms.
Program Points
• A program point is any point in the program, i.e. any value of the program counter (PC)
• Invariants: a property holds at a program point if it holds in every state that reaches that point, for any execution with any input
Questions about Program Points
• Will the value of x be read in the future?
• Is the variable x initialized before it is read?
• What is a lower and upper bound on the value of
the integer variable x?
• Can the pointer p be null?
• Which variables can p point to?
• Do p and q point to disjoint structures in the heap?
• …
Why are the Answers Interesting?
• Increase efficiency
- Resource usage
- Optimization
• Ensure correctness
- Verify behavior
- Catch bugs early
• Support program understanding
• Enable refactorings
Programs that reason about programs
• Soundness: don’t miss any errors
• Completeness: don’t raise false alarms
• Termination: always give an answer
Rice’s theorem, 1953
• H.G. Rice: Classes of recursively enumerable sets and their decision problems
• Rice's theorem: Any nontrivial property of the behavior of programs in a Turing-complete language is undecidable!
• Equivalently: every nontrivial property of the recursively enumerable languages is undecidable
- Trivial property: either true of all programs or false of all programs
- Nontrivial property: any property that is not trivial
Approximation
• Approximate answers may be decidable!
- Output yes/no => output yes/no/unknown
• The approximation must be conservative
• More subtle approximations if not only yes/no
- E.g. memory usage, pointer targets
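A minimal sketch of a conservative yes/no/unknown answer, assuming a toy expression format invented for this illustration (integers, variable names as strings, and tuples `(op, left, right)` for `+` and `*`) and mathematical integers without overflow. The helper names `sign` and `is_zero` are hypothetical. The question asked is "is this expression always zero?"; the analysis answers "yes" or "no" only when it is certain and falls back to "unknown" otherwise.

```python
def sign(e):
    """Abstract value of e: '+', '-', '0', or '?' (sign not known)."""
    if isinstance(e, int):
        return "0" if e == 0 else ("+" if e > 0 else "-")
    if isinstance(e, str):
        # A variable may hold any input value, so nothing is known.
        return "?"
    op, a, b = e
    sa, sb = sign(a), sign(b)
    if op == "*":
        if "0" in (sa, sb):
            return "0"              # zero times anything is zero
        if "?" in (sa, sb):
            return "?"
        return "+" if sa == sb else "-"
    if op == "+":
        if sa == "0":
            return sb
        if sb == "0":
            return sa
        # Same known sign is preserved; everything else is uncertain.
        return sa if sa == sb and sa != "?" else "?"
    raise ValueError(f"unsupported operator: {op}")


def is_zero(e):
    """Conservative three-valued answer to 'is e always zero?'."""
    return {"0": "yes", "+": "no", "-": "no", "?": "unknown"}[sign(e)]
```

For `x*x + 1` the sketch answers "unknown" even though the expression is never zero: the answer is safe but imprecise, which is the price of decidability here.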
False positives and false negatives
• False positive (false alarm): prevented by completeness
• False negative (missed error): prevented by soundness
The Engineering Challenge
• A correct but trivial approximation algorithm may
just give the useless answer every time
• The engineering challenge is to give the useful
answer often enough to fuel the client application
• … and to do so within reasonable time and space
• Hard (but fun) part of static analysis
A Constraint-based Approach
• Conceptually separates the analysis specification
from algorithmic aspects and implementation
details
Challenging Features in Modern PLs
• Higher-order functions
• Mutable records or objects, arrays
• Integer or floating-point computations
• Dynamic dispatching
• Inheritance
• Exceptions
• Reflection
• …
TIP Language
TIP: Tiny Imperative Programming language
TIP and its Implementation
• TIP language
- Minimal C-style syntax
- Enough features to make static analysis challenging
and fun
• Implementation
- Scala: https://github.com/cs-au-dk/TIP/
- C++ 17: https://github.com/matthewbdwyer/tipc
Expressions in TIP
Statements in TIP
• In conditions, 0 is false, all other values are true
• The output statement writes an integer value to
the output stream
Functions in TIP
• The optional var block declares a collection of
uninitialized variables
• Function calls are an extra kind of expression
Pointers
• No pointer arithmetic
Records
• Records are passed by value (like structs in C)
• For simplicity, values of record fields cannot be records
Functions as Values
• Functions are first-class values
• The name of a function is like a variable that
refers to that function
• Generalized function calls
• Function values suffice to illustrate the main
challenges with methods (in OO languages) and
higher-order functions (in functional languages)
Programs
• A program is a collection of functions
• The function named main initiates execution
- Its arguments are taken from the input stream
- Its result is placed on the output stream
• We assume that all declared identifiers are unique
TIP Examples
• Recursive factorial function
• Iterative factorial function
Control flow graphs
• Iterative factorial function
Normalization
• Normalization: flatten nested expressions, using fresh variables
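A minimal sketch of the flattening step, assuming a toy AST invented for this illustration: integers and variable names are atoms, and a binary operation is a tuple `(op, left, right)`. The function name `normalize` and the temporaries `t1`, `t2`, … are hypothetical, and the sketch assumes they do not clash with program variables.

```python
import itertools


def normalize(target, expr, fresh=None):
    """Flatten `target = expr` into assignments whose right-hand sides
    apply at most one operator to atomic operands."""
    fresh = fresh if fresh is not None else itertools.count(1)
    stmts = []

    def atom(e):
        # Variables (str) and integer constants are already atomic.
        if isinstance(e, (str, int)):
            return e
        op, left, right = e
        l, r = atom(left), atom(right)
        tmp = f"t{next(fresh)}"      # fresh variable for the subexpression
        stmts.append((tmp, (op, l, r)))
        return tmp

    if isinstance(expr, (str, int)):
        stmts.append((target, expr))
    else:
        op, left, right = expr
        # Operands are flattened first, so their temporaries are emitted
        # before the final assignment.
        stmts.append((target, (op, atom(left), atom(right))))
    return stmts
```

For example, `normalize("x", ("*", ("+", "a", "b"), ("-", "c", 1)))` yields the three assignments `t1 = a + b`, `t2 = c - 1`, `x = t1 * t2`, in that order.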
Type analysis and unification
Type Errors
• Reasonable restrictions on operations:
- Arithmetic operators apply only to integers
- Comparisons apply only to like values
- Only integers can be input and output
- Conditions must be integers
- Only functions can be called
- The * operator only applies to pointers
- Field lookup can only be performed on records
- The fields being accessed are guaranteed to be present
• Violations result in runtime errors
• No type annotations in TIP
Type Checking
• Can type errors occur during runtime?
- undecidable
• Use conservative approximation
- A program is typable if it satisfies some type constraints
- These are systematically derived from the syntax tree
- If typable, then no runtime errors occur
- But some programs will be unfairly rejected (slack)
[Figure: the typable programs form a subset of the programs with no type errors; the gap between the two sets is the slack]
Challenges
• Fighting slack
- Make the type checker a bit more clever
- An eternal struggle
- And a great source of publications
• The type checker may be unsound
• Ex. covariant arrays in Java
- Covariant arrays: if B is a subclass of A, then Java allows the following code: A[] a = new B[];
- The step from classes to arrays preserves the original subtype relationship
Types
• Types describe the possible values
• These describe integers, pointers, functions, and
records
• Types are terms generated by this grammar
Type constraints
Generating constraints
Generating constraints
Polymorphic types
Exercise
• Generate and solve the constraints
• Then try with y = alloc 8 replaced by y = 42
Generating constraints
• This is the idea, but not directly expressible in TIP types
Generating constraints
• Exercise: Field write statements?
General Terms
Unification
• An equality between two terms with variables
- k(X,b,Y) = k(f(Y,Z), Z, d(Z))
• A solution (a unifier) is an assignment from
variables to terms that makes both sides equal
- X = f(d(b),b)
- Y = d(b)
- Z = b
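The example above can be checked with a small sketch of unification over finite terms. This is a plain substitution-based version with an occurs check, not the linear-time or union-find algorithms discussed on other slides. The encoding is an assumption made for this illustration: a variable is a Python string such as `"X"`, and a constructor term is a tuple whose first element is the constructor name, so the constant `b` is `("b",)`. The names `unify`, `resolve`, and `UnificationError` are hypothetical.

```python
class UnificationError(Exception):
    pass


def walk(t, subst):
    # Follow variable bindings until an unbound variable or a term is reached.
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    # True if variable v appears inside term t under the current bindings.
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(a, b, subst=None):
    """Extend subst so that a and b become equal, or raise UnificationError."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):
        if occurs(a, b, subst):
            raise UnificationError(f"occurs check failed for {a}")
        subst[a] = b
        return subst
    if isinstance(b, str):
        return unify(b, a, subst)
    if a[0] != b[0]:
        raise UnificationError(f"constructor error: {a[0]} vs {b[0]}")
    if len(a) != len(b):
        raise UnificationError(f"arity error for {a[0]}")
    for x, y in zip(a[1:], b[1:]):
        unify(x, y, subst)
    return subst


def resolve(t, subst):
    # Apply the substitution all the way down to obtain a closed term.
    t = walk(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(x, subst) for x in t[1:])
```

With `s = unify(("k", "X", ("b",), "Y"), ("k", ("f", "Y", "Z"), "Z", ("d", "Z")))`, `resolve("X", s)` gives `("f", ("d", ("b",)), ("b",))`, i.e. X = f(d(b),b), matching the unifier listed above.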
Unification errors
• Constructor error
- d(X) = e(X)
• Arity error
- a = a(X)
Linear unification algorithm
• 1978, by Paterson and Wegman
• In time O(n)
- Finds a most general unifier
- Or decides that none exists
• Can be used as a back-end for type checking
• … but only for finite terms
Recursive data structures
• Constraints on p:
- [[p]] = [[alloc null]] = ↑[[null]] = ↑↑t
- [[p]] = ↑[[p]] = ↑↑[[p]]
• Solution: [[p]] = t, where t = ↑t
Regular terms
• Infinite but (eventually) repeating
- e(e(e(e(e(e(…))))))
- d(a, d(a, d(a,…)))
- f(f(f(f(…), f(…)), f(f(…), f(…))), f(f(f(…), f(…)), f(f(…), f(…))))
• Only finitely many different subtrees
• A non-regular term
- f(a, f(d(a), f(d(d(a)), f(d(d(d(a))), …))))
http://users-cs.au.dk/amoeller/spa/
3.3 Solving Constraints with Unification
Regular unification
• 1976, Huet
• Use a union-find algorithm to solve the unification problem for regular terms in O(n*A(n))
• A(n) is the inverse Ackermann function
- Smallest k such that n<Ack(k,k)
- This is never bigger than 5 for any realistic value of n
• See the TIP implementation tipc
Union-Find
• Add a new node x that initially is its own parent
• Find the canonical representative of x by traversing the path to the root, performing path compression on the way
• Find the canonical representatives of x and y, and make one the parent of the other unless they are already equivalent
https://github.com/matthewbdwyer/tipc/blob/main/src/semantic/types/solver/UnionFind.cpp
Union-Find (simplified)
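A minimal Python sketch of the three operations described above. It is not a transcription of the tipc code at the URL; the class and method names (`UnionFind`, `make_set`, `find`, `union`) are assumptions made for this illustration. The sketch uses path compression only and omits union by rank, which the near-linear bound normally also relies on.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        # Add a new node x that initially is its own parent.
        self.parent.setdefault(x, x)

    def find(self, x):
        # First pass: walk up to the root (the canonical representative).
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing each visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Make one representative the parent of the other,
        # unless x and y are already equivalent.
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry
```

A typical use is to call `make_set` once per type variable or type term, then `union` for every equality constraint, and finally `find` to read off each equivalence class.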
Implementation Strategy
• Representation of the different kinds of types
(including type variables)
• Map from AST nodes to type variables
• Union-Find
• Traverse AST, generate constraints, unify
- Reply type error if unification fails
- When unifying a type variable with e.g. a function type, it is
useful to pick the function type as representation
- For outputting the solution, assign names to type variables (that are roots), and be careful about recursive types
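The strategy above can be sketched end to end for a tiny fragment. Everything here is an assumption made for illustration and not the TIP or tipc implementation: statements are tuples `("assign", x, e)` and `("output", e)`, expressions are integers, variable names, `(op, e1, e2)` for arithmetic, `("alloc", e)`, and `("deref", e)`; type variables are strings such as `"[[x]]"`, and constructed types are tuples such as `("int",)` and `("ptr", t)`.

```python
import itertools

INT = ("int",)


class TypeMismatch(Exception):
    pass


class Solver:
    def __init__(self):
        self.parent = {}

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            self.parent[t] = self.parent[self.parent[t]]   # path halving
            t = self.parent[t]
        return t

    def unify(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if isinstance(ra, str):          # type variable: the proper type
            self.parent[ra] = rb         # becomes the representative
        elif isinstance(rb, str):
            self.parent[rb] = ra
        elif ra[0] == rb[0] and len(ra) == len(rb):
            self.parent[ra] = rb         # union first, so cycles terminate
            for x, y in zip(ra[1:], rb[1:]):
                self.unify(x, y)
        else:
            raise TypeMismatch(f"cannot unify {ra} with {rb}")

    def show(self, t, seen=()):
        r = self.find(t)
        if isinstance(r, str):
            return r
        if r in seen:                    # recursive type: stop unfolding
            return "..."
        args = ", ".join(self.show(a, seen + (r,)) for a in r[1:])
        return f"{r[0]}({args})" if args else r[0]


def type_of(e, s, fresh):
    """Return a type term for e, adding constraints to the solver s."""
    if isinstance(e, int):
        return INT
    if isinstance(e, str):
        return f"[[{e}]]"
    op = e[0]
    if op in ("+", "-", "*", "/"):
        s.unify(type_of(e[1], s, fresh), INT)
        s.unify(type_of(e[2], s, fresh), INT)
        return INT
    if op == "alloc":
        return ("ptr", type_of(e[1], s, fresh))
    if op == "deref":
        t = f"t{next(fresh)}"            # fresh type variable for the target
        s.unify(type_of(e[1], s, fresh), ("ptr", t))
        return t
    raise ValueError(f"unsupported expression: {e}")


def infer(program):
    s, fresh = Solver(), itertools.count(1)
    for stmt in program:
        if stmt[0] == "assign":
            s.unify(f"[[{stmt[1]}]]", type_of(stmt[2], s, fresh))
        elif stmt[0] == "output":
            s.unify(type_of(stmt[1], s, fresh), INT)
        else:
            raise ValueError(f"unsupported statement: {stmt}")
    return s
```

For the program `p = alloc 1; y = *p + 2; output y`, the sketch reports `ptr(int)` for `[[p]]` and `int` for `[[y]]`; replacing the last statement by `output p` raises `TypeMismatch`, which corresponds to replying "type error" when unification fails.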
The Complicated Function
Solutions
Recursive types
Infinitely many solutions
• Polymorphic function
(which is not expressible in TIP type language)
Recursive and polymorphic types
Slack – let-polymorphism
Slack – let-polymorphism
Slack – flow-insensitivity
Other programming errors