11/8/2012
Compiler Design and Construction
Semantic Analysis
Slides modified from Louden Book, Dr. Scherger, & Y Chung (NTHU), and Fischer, Leblanc
Any compiler must perform two major tasks
Analysis of the source program
Synthesis of a machine-language program
The Structure of a Compiler (1)
[Figure: Compiler = Analysis + Synthesis]
The Structure of a Compiler (2)
[Figure: Source Program (character stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]
Compiler Stages
January, 2010, Chapter 1: Introduction
[Figure: Source Code → Scanner → Tokens → Parser → Syntax Tree → Semantic Analyzer → Annotated Tree → Source Code Optimizer → Intermediate Code → Code Generator → Target Code → Target Code Optimizer → Target Code. The Literal Table, Symbol Table, and Error Handler support every stage.]
Semantic Processing
April, 2011, Chapter 6: Semantic Analysis
Semantic routines interpret meaning based on syntactic
structure of input (modern compilers do this)
This makes the compilation syntax-directed
Semantic routines finish the analysis
Verify static semantics are followed
Variables declared, compatible operands (type and #), etc.
Semantic routines also start the synthesis
Generate either IR or target machine code
The semantic actions are attached to the productions (or
to subtrees of a syntax tree).
Abstract Syntax Tree
1st step in semantic processing is to build a syntax tree
representing the input program
Don't need a literal parse tree
No intermediate nodes needed solely for precedence and associativity
No ε-rules
Just enough info to drive semantic processing
Or even recreate the input
Semantic processing is performed by traversing the tree 1
or more times
Attributes attached to nodes aid semantic processing
7.1.1
Using a Syntax Tree Representation of a Parse (1)
Parsing:
build the parse tree
Non-terminals for operator precedence
and associativity are included.
Semantic processing:
build and decorate the Abstract Syntax Tree (AST)
Non-terminals used for ease of parsing
may be omitted in the abstract syntax tree.
parse tree:
<assign> → <target> := <exp>
with <exp> → <exp> + <term>, <term> → <term> * <factor>,
and <factor> → const | id
(every operand is reached through a chain of <exp>/<term>/<factor> nodes)
abstract syntax tree:
:=
  id
  +
    *
      const
      id
    id
Abstract Syntax Tree
Abstract syntax tree for Y := 3*X + I:
:=
  Id
  +
    *
      Const
      Id
    Id
Abstract Syntax Tree
Abstract syntax tree for Y := 3*X + I with initial values:
:=
  Id(Y)
  +
    *
      Const(3)
      Id(X)
    Id(I)
Abstract Syntax Tree
Initially, attributes appear only at the leaves
Attributes propagate during static semantic checking
Processing declarations builds the symbol table
Looking up symbols in the ST yields attributes to attach
Determining expression/operand types
Declarations propagate top-down
Expressions propagate bottom-up
A tree is decorated once sufficient info for code
generation has propagated.
Abstract Syntax Tree
Abstract syntax tree for Y := 3*X + I with propagated values:
:=(itof)
  Id(Y,f)
  +(i)
    *(i)
      Const(3,i)
      Id(X,i)
    Id(I,i)
7.1.1
Using a Syntax Tree Representation of a Parse (2)
Semantic routines traverse (post-order) the AST,
computing attributes of the nodes of AST.
Initially, only leaves (i.e. terminals, e.g. const, id) have
attributes
Ex. Y := 3*X + I
:=
  id(Y)
  +
    *
      const(3)
      id(X)
    id(I)
7.1.1
Using a Syntax Tree Representation of a Parse (3)
The attributes are then propagated to other nodes using
some functions, e.g.
build symbol table
attach attributes of nodes
check types, etc.
bottom-up / top-down propagation
[Figure: attribute propagation over a program tree]
<program>
  declaration          (processing declarations builds the symbol table, top-down)
  <stmt>               (:= over id, +, *, const, id; expression types propagate bottom-up)
check types: integer * or floating *
Need to consult the symbol table for the types of id's.
7.1.1
Using a Syntax Tree Representation of a Parse (4)
After attribute propagation is done,
the tree is decorated and ready for code generation;
use another pass over the decorated AST to generate code.
Actually, these can be combined in a single pass:
Build the AST
Decorate the AST
Generate the target code
What we have described is essentially an Attribute Grammar (AG) (details in chap. 14)
Static Semantic Checks
Static semantics can be checked at compile time
Check only propagated attributes
Type compatibility across assignment
  int B;
  B := 5.2;   -- illegal
  B := 3;     -- legal
Use attributes and structure
Correct number and types of parameters
  procedure foo(int a, float b, int c, float d);
  int C;
  float D;
  call foo(C, D, 3, 2.9)    -- legal
  call foo(C, D, 3.3, 2.9)  -- illegal (3rd argument must be int)
  call foo(1, 2, 3, 4, 5)   -- illegal (too many arguments)
Dynamic Semantic Checks
Some checks can't be done at compile time
Array bounds, arithmetic errors, valid pointer addresses,
variables initialized before use.
Some languages allow explicit dynamic semantic checks
e.g. assert denominator != 0
These are handled by the semantic routines inserting
code to check for these semantics
Violating dynamic semantics results in an exception
Translation
Translation task uses attributes as data, but it is driven by
the structure
Translation output can be several forms
Machine code
Intermediate representation
Decorated tree itself
Sent to optimizer or code generator
Compiler Organization
one-pass compiler
Single pass used for both analysis and synthesis
Scanning, parsing, checking, & translation all interleaved
No explicit IR generated
Semantic routines must generate machine code
Only simple optimizations can be performed
Tends to be less portable
7.1.2
Compiler Organization Alternatives (2)
We prefer that the code generator completely hide machine
details and that the semantic routines be machine-independent.
This can be violated to produce better code.
Suppose there are several classes of registers,
each for a different purpose.
Better for register allocation to be performed by semantic
routines than code generator since semantic routines have a
broader view of the AST.
Compiler Organization
one-pass with peephole optimization
Optimizer makes a pass over generated machine code, looking
at a small number of instructions at a time
Allows for simple code generation
Peephole: looking at only a few instructions at a time
Effectively a separate pass
Simple but effective
Simplifies the code generator, since there is a post-processing pass.
Compiler Organization
one-pass analysis and IR synthesis plus a code generation
pass
Adds flexibility
Explicit IR created & sent to code generator
IR typically simple
Optimization can examine as much of IR as wanted
Less machine-dependent analysis
So easier to retarget
Compiler Organization
Multi-pass analysis
Scan, then parse, then check declarations, then static semantics
Usually used to save space (memory usage of the compiler)
Multi-pass synthesis
Separates out machine dependence
Better optimization
Generate IR
Do machine-independent optimization
Generate machine code
Machine-dependent optimization
Many complicated optimization and code generation algorithms require multiple passes
e.g. optimizations that need a more global view:
for I = 1 to N
  foo = 35*bar(I)+16;
bar(i) { return 3; }
(knowing that bar always returns 3 lets the whole call be folded)
7.1.2
Compiler Organization Alternatives (7)
Multi-language and multi-target compilers
Components may be shared and parameterized.
Ex : Ada uses Diana (language-dependent IR)
Ex : GCC uses two IRs.
one is high-level tree-oriented
the other (RTL) is more machine-oriented
[Figure: multi-language, multi-target compiler. Front ends for FORTRAN, PASCAL, ADA, C, ... feed language- and machine-independent IRs through machine-independent optimization to back ends for SUN, PC, mainframe, ....]
7.1.3
Single Pass (1)
In the Micro compiler of chap. 2, scanning, parsing and semantic processing
are interleaved in a single pass.
(+) simple front-end
(+) less storage if no explicit trees
(-) immediately available information is limited since no complete
tree is built.
Relationships
[Figure: the parser calls the scanner to obtain tokens, and calls semantic routines 1..k as productions are recognized; the semantic routines communicate through semantic records.]
7.1.3
Single Pass (2)
Each terminal and non-terminal has a semantic record.
Semantic records may be considered
as the attributes of the terminals and non-terminals.
Terminals
the semantic records are created by the scanner.
Non-terminals
the semantic records are created by a semantic routine when a
production is recognized.
Semantic records are transmitted
among semantic routines
via a semantic stack.
ex: for A → B C D #SR, the action routine #SR runs when the production A → B C D is recognized.
1 pass = 1 post-order traversal of the parse tree
parsing actions -- build parse trees
semantic actions -- post-order traversal
7.1.3
Single Pass (3)
Ex: A := B + 1
parse tree:
<assign>
  ID (A)  :=  <exp>
                <exp>   +   <term>
                <term>      const (1)
                id (B)
When <exp> → <exp> + <term> is reduced:   gencode(+, B, 1, tmp1)
When <assign> → ID := <exp> is reduced:   gencode(:=, A, tmp1)
Fall, 2002 CS 153 - Chapter 6 27
Chapter 6 - Semantic Analysis
Parser verifies that a program is
syntactically correct and constructs a
syntax tree (or other intermediate
representation).
Semantic analyzer checks that the
program satisfies all other static
language requirements (is
“meaningful”) and collects and
computes information needed for code
generation.
Important Semantic Information
Symbol table: collects declaration
and scope information to satisfy
“declaration before use” rule, and to
establish data type and other
properties of names in a program.
Data types and type checking:
compute data types for all typed
language entities and check that
language rules on types are satisfied.
How to build the symbol table
and check types:
Analyze the scope rules for the
language and determine an
appropriate table structure for
maintaining this information.
Analyze the type requirements and
translate them into rules that can be
applied recursively on a syntax tree.
Theoretical framework for semantic analysis
Focus on attributes: computable
properties of language constructs that
are needed to satisfy language
requirements and/or generate code
Describe the computation of attributes
using equations or algorithms.
Associate these equations to grammar
rules and/or kinds of nodes in a syntax
tree.
Analyze the structure of the
equations to determine an order in
which the attributes can be
computed. (Tree traversals of syntax
tree - preorder, postorder, inorder, or
some combination of them.)
Such a set of equations as described
is called an attribute grammar.
While much can be done without a
formal framework, the formality of
equations can help the process
considerably.
Nevertheless, there is currently no
tool in standard use that allows this
process to be automated (languages
differ too much in their
requirements).
Example of an attribute grammar
Grammar:
  exp → exp + term | exp - term | term
  term → term * factor | factor
  factor → ( exp ) | number
Attribute Grammar:
  GRAMMAR RULE              SEMANTIC RULES
  exp1 → exp2 + term        exp1.val = exp2.val + term.val
  exp1 → exp2 - term        exp1.val = exp2.val - term.val
  exp → term                exp.val = term.val
  term1 → term2 * factor    term1.val = term2.val * factor.val
  term → factor             term.val = factor.val
  factor → ( exp )          factor.val = exp.val
  factor → number           factor.val = number.val
Notes:
Different instances of same
nonterminal must be subscripted to
distinguish them.
Some attributes must have been
precomputed (by scanner or parser),
e.g. number.val.
These particular attribute equations
look a lot like a yacc specification,
because they represent a bottom-up
attribute computation.
A Second Example
Grammar:
  decl → type var-list
  type → int | float
  var-list → id , var-list | id
Attribute Grammar:
  GRAMMAR RULE                 SEMANTIC RULES
  decl → type var-list         var-list.dtype = type.dtype
  type → int                   type.dtype = integer
  type → float                 type.dtype = real
  var-list1 → id , var-list2   id.dtype = var-list1.dtype
                               var-list2.dtype = var-list1.dtype
  var-list → id                id.dtype = var-list.dtype
Notes
Data type typically propagates down
a syntax tree via declarations.
No longer something yacc can
handle directly.
Such an attribute is called inherited,
while bottom-up calculation is called
synthesized.
Syntax tree is a standard synthesized
attribute computable by yacc; other
attributes computed on the tree.
Dependency graph
Indicates order in which attributes must
be computed.
Synthesized attributes always flow from
children to parents, and can always be
computed by a postorder traversal.
Inherited attributes can flow any other
way.
L-attributed: a left-to-right traversal
suffices to compute attributes. However,
this may involve a combination of pre-
order, inorder, and postorder traversal.
Data type dependencies (by grammar rule):
[Figure: for decl → type var-list, dtype flows sideways from type to var-list:
  var-list.dtype = type.dtype
For var-list1 → id , var-list2, dtype flows down from the parent to both children:
  id.dtype = var-list1.dtype
  var-list2.dtype = var-list1.dtype]
L-attributed dependencies have three basic mechanisms:
(a) inheritance from parent to siblings
(b) inheritance from sibling to sibling via the parent
(c) sibling inheritance via sibling pointers
[Figure: in each case an attribute a moves among a parent A and its children B and C.]
Sample tree structure:

typedef enum {decl,type,id} nodekind;
typedef enum {integer,real} typekind;
typedef struct treeNode
{ nodekind kind;
  struct treeNode
    * lchild, * rchild, * sibling;
  typekind dtype; /* for type and id nodes */
  char * name;    /* for id nodes only */
} * SyntaxTree;
Sample tree instance:
String: float x, y
Tree:
  decl
    type (dtype = real)
    id (x) → sibling → id (y)
Traversal code:

void evalType (SyntaxTree t)
{ switch (t->kind)
{ case decl:
t->rchild->dtype = t->lchild->dtype;
evalType(t->rchild);
break;
case id:
if (t->sibling != NULL)
{ t->sibling->dtype = t->dtype;
evalType(t->sibling);
}
break;
} /* end switch */
} /* end evalType */
Attributes need not be kept in the
syntax tree:
GRAMMAR RULE                 SEMANTIC RULES
decl → type var-list
type → int                   dtype = integer
type → float                 dtype = real
var-list1 → id , var-list2   insert(id.name, dtype)
var-list → id                insert(id.name, dtype)
dtype is global; use a symbol table to store the type of each identifier
New traversal code:

typekind dtype; /* global */
void evalType (SyntaxTree t)
{ switch (t->kind)
{ case decl:
dtype = t->lchild->dtype;
evalType(t->rchild);
break;
case id:
insert(t->name,dtype);
if (t->sibling != NULL)
evalType(t->sibling);
break;
} /* end switch */
} /* end evalType */
Even better, use a parameter instead of a global variable:

void evalDecl(SyntaxTree t)
{ evalType(t->rchild, t->lchild->dtype);
}
void evalType(SyntaxTree t, typekind dtype)
{ insert(t->name,dtype);
if (t->sibling != NULL)
evalType(t->sibling,dtype);
}
Note: inherited attributes can often be turned into
parameters to recursive traversal functions, while
synthesized attributes can be turned into returned
values.
Alternative to a difficult inherited
situation (not recommended):
Theorem (Knuth [1968]). Given an
attribute grammar, all inherited
attributes can be changed into
synthesized attributes by suitable
modification of the grammar, without
changing the language of the grammar.
Example: New grammar for types:
  decl → var-list id
  var-list → var-list id , | type
  type → int | float
New tree for float x, y might be:
  var-list (dtype = real)
    var-list (dtype = real)
      type (dtype = real)
      id (x)
    id (y)
Our approach:
Compute inherited stuff first (symbol
table) in a separate pass
Then type inference and type
checking turns into a purely
synthesized attribute computation,
since all uses of names have their
types already computed.
Next:
– Symbol table structure
– Synthesized type rules
7.2.2
LR(1) - (1)
Semantic routines are invoked only when a structure is recognized.
LR parsing: a structure is recognized when the RHS is reduced to the LHS.
Therefore, action symbols must be placed at the end.
Ex:
  <stmt> → if <cond> then <stmt> end #ifThen
  <stmt> → if <cond> then <stmt> else <stmt> end #ifThenElse
7.2.2
LR(1) - (2)
After shifting "if <cond>",
the parser cannot decide
which of #ifThen and #ifThenElse should be invoked.
cf. In LL parsing,
the structure is recognized when a non-terminal is expanded.
7.2.2
LR(1) - (3)
However, sometimes we do need to perform semantic
actions in the middle of a production.
Ex:
  <stmt> → if <exp> then <stmt> end
    generate code for <exp>
    (need a conditional jump here)
    generate code for <stmt>
Solution: use two productions:
  <stmt> → <if head> then <stmt> end #finishIf
  <if head> → if <exp> #startIf
(<if head> is a semantic hook, present only for semantic processing)
7.2.2
LR(1) - (4)
Another problem: what if the action is not at the end?
Ex:
  <prog> → #start begin <stmt> end
  We need to call #start.
Solution: introduce a new non-terminal.
  <prog> → <head> begin <stmt> end
  <head> → #start
YACC automatically performs such transformations.
7.2.3
Semantic Record Representation - (1)
Since we need a stack to store semantic records,
all semantic records must have the same type.
variant record in Pascal
union type in C
Ex:
enum kind {OP, EXP, STMT, ERROR};
typedef struct {
  enum kind tag;
  union {
    op_rec_type OP_REC;
    exp_rec_type EXP_REC;
    stmt_rec_type STMT_REC;
    ......
  };
} sem_rec_type;
7.2.3
Semantic Record Representation - (2)
How to handle errors?
Ex: a semantic routine needs to create a record for each identifier in an expression.
What if the identifier is not declared?
The solution is on the next page.
7.2.3
Semantic Record Representation - (3)
Solution 1: make a bogus record
  This method may create a chain of meaningless error messages due to the bogus record.
Solution 2: create an ERROR semantic record
  No error message will be printed when an ERROR record is encountered.
WHO controls the semantic stack?
  action routines
  parser
7.2.4
Action-controlled semantic stack - (1)
Action routines take parameters from the semantic stack directly and push results onto the stack.
Implementing stacks:
  1. array
  2. linked list
Usually, the stack is transparent: any records in the stack may be accessed by the semantic routines.
  (-) difficult to change
7.2.4
Action-controlled semantic stack - (2)
Two other disadvantages:
  (-) Action routines need to manage the stack.
  (-) Control of the stack is distributed among action routines.
      Each action routine pops some records and pushes 0 or 1 record.
      If any action routine makes a mistake, the whole stack is corrupted.
The solution is on the next page.
7.2.4
Action-controlled semantic stack - (3)
Solution 1: let the parser control the stack
Solution 2: introduce additional stack routines
  Ex: Parser → stack routines → parameter-driven action routines
If action routines do not control the stack, we can use an opaque (abstract) stack: only push() and pop() are provided.
  (+) clean interface
  (-) less efficient
7.2.5
parser-controlled stack - (1)
LR: the semantic stack and the parse stack operate in parallel [they shift and reduce in the same way].
Ex:
  <stmt> → if <exp> then <stmt> end
[Figure: the parse stack holds if, <exp>, then, <stmt>, ... while the semantic stack holds the corresponding semantic records; the two stacks may be combined.]
Ex: YACC generates such a parser-controlled semantic stack.
  <exp> → <exp> + <term>
    { $$.value = $1.value + $3.value; }
LL parser-controlled semantic stack
Every time a production A → B C D is predicted:
[Figure: on the parse stack, A is replaced by B C D; on the semantic stack, records for B, C, D are pushed above A's record. Four pointers into the semantic stack are maintained: left (A's record), right, current, and top.]
Need four pointers for the semantic stack (left, right, current, top).
7.2.5
parser-controlled stack - (2)
However, when a new production B → E F G is predicted,
the four pointers will be overwritten.
Therefore, create a new EOP record holding the four pointers on the parse stack.
When an EOP record appears on the stack top,
restore the four pointers, which essentially pops records off the semantic stack.
An example is on the next page.
7.2.5
parser-controlled stack - (3)
[Figure: predicting A → B C D replaces A by B C D plus an EOP record on the parse stack, and pushes records for B, C, D above A's record on the semantic stack. Predicting B → E F G then pushes E F G and EOP(7,9,9,12), which saves the current (left, right, current, top) pointers before they are redirected at E, F, G. When an EOP record reaches the parse-stack top, the saved pointers are restored, discarding the finished production's semantic records.]
7.2.5
parser-controlled stack - (4)
Note
All push() and pop() are done by the parser,
not by the action routines.
Semantic records are passed to the action routines as parameters.
Example
  <primary> → ( <exp> ) #copy($2, $$)
7.2.5
parser-controlled stack - (5)
Initial information is stored in the semantic record of the LHS.
After the RHS is processed, the resulting information is stored back in the semantic record of the LHS.
7.2.5
parser-controlled stack - (6)
[Figure: initially the semantic stack holds A's record; while the RHS is processed it holds records for B, C, D above A's; finally only A's record remains, carrying the resulting attributes (information flow).]
(-) The semantic stack may grow very big.
<fix>
Certain non-terminals never use semantic records,
e.g. <stmt list> and <id list>.
We may insert #reuse before the last non-terminal in each of their productions.
Example
  <stmt list> → <stmt> #reuse <stmt tail>
  <stmt tail> → <stmt> #reuse <stmt tail>
  <stmt tail> → λ
Evaluation
A parser-controlled semantic stack is easy with LR, but not so with LL.
7.2.5
parser-controlled stack - (7)
7.3
Intermediate representation and code generation
Two possibilities:
1. semantic routines → code generation → machine code
   (+) no extra pass for code generation
   (+) allows simple 1-pass compilation
2. semantic routines → IR → code generation → machine code
   Target machine is abstracted to some virtual machine
     Allows language-oriented primitives
   Code generation is separated from the semantic routines
     Semantic routines don't care about temporary registers
   Reduces machine dependence (isolated to code generation)
   Optimization can be done at the intermediate level
     Optimization independent of target machine
     Simpler and better optimization (IR is more high-level)
   (+) allows higher-level operations, e.g. open block, call procedures.
IR vs Machine Code
Generating machine code advantages:
No overhead of extra pass to translate IR
Conceptually simple compilation model
Bottom line
IR valuable if optimization or portability is an important issue
Machine code much simpler
Forms of IR – Postfix Notation
Concise
Simple translation
Useful for interpreters and target machines with a stack architecture
Not particularly good for optimization or code generation
Example:
  Code         Postfix
  a+b          ab+
  a+b*c        abc*+
  (a+b)*c      ab+c*
  a:=b*c+b*d   abc*bd*+:=
Forms of IR – Three-Address Codes
Virtual machine having operations with 3 operands: 2 source, 1 destination
Explicitly reference intermediates
Triples: (op, arg1, arg2)
  More concise
  Position dependency makes moving/removing triples hard,
  such as during optimization
Quadruples: (op, arg1, arg2, arg3)
  More convenient for code generation than postfix
Expression oriented, not so good for other uses
a := b*c + b*d
  Triples             Quadruples
  (1) ( *  b   c   )  (1) ( *  b  c  t1 )
  (2) ( *  b   d   )  (2) ( *  b  d  t2 )
  (3) ( +  (1) (2) )  (3) ( +  t1 t2 t3 )
  (4) ( := (3) a   )  (4) ( := t3 a  _  )
In triples, intermediate results are referenced by instruction #; quadruples use temporary names.
Forms of IR – Three-Address Codes
float a, d; int b, c;
a := b*c + b*d
  Triples                          Quadruples
  (1) (MULTI, Addr(b), Addr(c))    (1) (MULTI, Addr(b), Addr(c), t1)
  (2) (FLOAT, Addr(b), -)          (2) (FLOAT, Addr(b), t2, -)
  (3) (MULTF, (2), Addr(d))        (3) (MULTF, t2, Addr(d), t3)
  (4) (FLOAT, (1), -)              (4) (FLOAT, t1, t4, -)
  (5) (ADDF, (4), (3))             (5) (ADDF, t4, t3, t5)
  (6) (:=, (5), Addr(a))           (6) (:=, t5, Addr(a), -)
Can also add more detail, such as type or address.
These forms translate input, other 3 forms transform it
Forms of IR – Tuples
Tuples allow variable number of operands
A generalization of quadruples
a:=b*c+b*d
(1)(MULTI,Addr(b),Addr(c),t1)
(2)(FLOAT,Addr(b),t2)
(3)(MULTF,t2,Addr(d),t3)
(4)(FLOAT,t1,t4)
(5)(ADDF,t4,t3,t5)
(6)(:=,t5,Addr(a))
Forms of IR – Trees
Syntax trees can also be used
  A directed acyclic graph (DAG) is an option
  Can use an abstract syntax tree
More complex and more powerful
  Tree transformations for optimizations
Ex: a := b*c + b*d
[Figure: the syntax tree for a := b*c + b*d has two separate leaves for b; the DAG variant shares a single node b between both * operators.]
Ex: Ada uses Diana.
Fall, 2002 CS 153 - Chapter 6 - Part 2 73
Symbol Table
Major data structure after syntax tree.
An inherited attribute that may be kept
globally.
May be needed before semantic
analysis (or some form of it, as in C),
but makes sense to put off computing
it until necessary.
Stores declaration information using
name as primary key.
Specific information stored in
symbol table depends heavily on
language, but generally includes:
– Data type
– Scope (see below)
– Size (bytes, array length)
– Potential or actual location information
(addresses, offsets - see later)
One way to finesse the issue of what
information to put into the table is to
just keep pointers in the table that
point to declaration nodes in the
syntax tree. Then symbol table code
doesn’t need to be changed when
changing the information, since it is
stored in the node, not directly in the
table. This is the approach taken in
the TINY compiler, and should be
carried over to C-Minus.
Scope Information
Requires that symbol table have some
kind of “delete” operation in addition to
lookup and insert, since exiting a scope
requires that declarations be removed
from view (that is, lookups no longer
find them, though they may still be
referenced elsewhere).
Delete operation should not in general
re-process individual declarations:
exitScope() should do them all in O(1).
C has simple scope structure:
All names must be declared before use
(although multiple declarations are
possible).
Scopes are nested in a stack-like fashion,
and cannot be re-entered after exit (simple
delete is possible).
Scope information can be kept simply as a
number: the nesting level (needed during
semantic analysis because redeclaration
in same scope is illegal in C).
Example:

typedef int z;
int y;
/* this is legal C! */
void x(double x)
{ char* x;
  { char x;
  }
}

"external" (global) scope: nestLevel 0
nestLevel 1 begins with the parameters
nestLevel 2 begins with the function body
nestLevel 3 is the innermost block
Not all compilers get it right that
parameters have a separate scope
from the function body in C. But gcc
does:
C:\classes\cs153\f02>gcc -c scope.c
scope.c: In function `x':
scope.c:6: warning: declaration of
`x' shadows a parameter
At least all names occupy a single
“namespace” in C, so one symbol
table is enough (compare to Java).
Java has 5 "namespaces", depending on the type of declaration:

package A; // legal Java!!!
class A
{ A A(A A)
  { A:
    for(;;)
    { if (A.A(A) == A) break A; }
    return A;
  }
}
Further complication in Java: local redeclaration even in nested scopes is illegal:

class A
{ A A(A A)
  { for(;;)
    { A A; // oops, now illegal!
      if (A.A(A) == A) break;
    }
    return A;
  }
}
Symbol table data structure
properties:
All operations should be very fast
(preferably O(1)).
Must be able to disambiguate
overloaded name use (depending on
language): add type, scope, nesting
info to lookup.
Must not be affected by typical
programmer “clustered” names: x1,
x11, x12, etc.
Best bet:
Use a hash table (or a list or tree or
hash table of hash tables).
Separate chains better than a closed
array (chains handled as little stacks,
insertions and deletions always at
the front).
Hash function needs to use all
characters in a name (to avoid
collisions), and involve character
position too!
Example:
[Figure: a hash table with indices 0..4; the non-empty buckets point to linked lists of items such as j, i, size, and temp.]
Sample hash function code:
#define SIZE 211 // typically a prime number
#define SHIFT 4
int hash ( char * key )
{ int temp = 0;
int i = 0;
while (key[i] != '\0')
{ temp = ((temp << SHIFT) + key[i]) % SIZE;
++i;
}
return temp;
}
Easy way to get O(1) behavior when exiting a scope: use a linked list (or tree or…) of hash tables, one hash table for each scope:
[Figure: a chain of per-scope hash tables; entries such as j (int) and i (int) sit in one scope's table, while size (int), i (char), temp (char), f (function), and j (char *) sit in others. Exiting a scope discards that scope's whole table at once.]
Some structure similar to the previous
slide is actually required in C++, Ada, and
other languages where scopes can be
arbitrarily re-entered (C++ has the scope
resolution operator ::), since individual
scopes must be attached to names,
allowing them to be “called”:
class A { void f(); }
...
void A::f() // go back inside A
{ ... }
Two additional scope issues (of
many):
Recursion: insertion into table must occur before processing is complete:
  // lookup of f in body must work:
  void f() { … f() … }
Relaxation of declaration before use rule
(C++ and Java class scopes):
all insertions must occur before all
lookups (two passes required): class A
{ int f() { return x; } int x; }
One more scope issue: dynamic scope
Some languages use a run-time
version of scope that does not follow
the layout of the program on the
page, but the execution path: LISP,
perl.
Symbol table then must be part of
runtime system, providing lookup of
names during execution (it better be
really fast in this case).
Called “dynamic scope” (vs. the
more usual lexical or static scope).
A questionable design choice for any
but the most dynamic, interpreted
languages, since there can then be
no static semantic analysis (no static
type checking, for example)
Running the symbol table during
execution also slows down execution
speed substantially
Example of dynamic scope (C syntax):

int i = 1;
void f(void)
{ printf("%d\n",i);}
main()
{ int i = 2;
/* the following call prints 1 using normal lexical
scoping, but prints 2 (the value of the local i)
using dynamic scope */
f();
return 0;
}
TINY symbol table:
All names are global: there are no
scopes.
Declaration is by use: if a lookup
fails, perform an insert.
Virtually no information has to be
kept (all names are int vars), so I had
to invent something to store in the
symbol table (line numbers).
No deletes!
TINY symtab.h:

/* Insert line numbers and memory locs
   into the symbol table */
void st_insert( char * name, int lineno, int loc );

/* Return the memory location of a variable
   or -1 if not found */
int st_lookup ( char * name );

/* Procedure printSymTab prints a formatted
   listing of the symbol table contents
   to the listing file */
void printSymTab(FILE * listing);
Sample TINY code building the
symbol table:
case AssignK:
case ReadK:
if (st_lookup(t->attr.name) == -1)
/* not yet in table, so treat as new definition */
st_insert(t->attr.name,t->lineno,location++);
else
/* already in table, so ignore location,
add line number of use only */
st_insert(t->attr.name,t->lineno,0);
break;
C-Minus Symbol Table
Use basic structure of TINY
Store tree pointers
Add enterScope() and exitScope()
List of tables structure helpful (slide 15)
Add nesting level to tree nodes
Add pointer to declaration in all ID nodes
(found by lookup)
Use best ADT methods (hide all details of
actual symtab structure)
Sample C-Minus symtab.h:

/* Start a new scope; return 0 if malloc fails, else 1 */
int st_enterScope(void);

/* Remove all declarations in the current scope */
void st_exitScope(void);

/* Insert def nodes from the syntax tree;
   return 0 if malloc fails, else 1 */
int st_insert( TreePtr );

/* Return the defnode of a variable, parameter, or
   function, or NULL if not found */
TreePtr st_lookup ( char * name );
Data types and type checking
A data type is constructed recursively out
of simple or base types (int, char, double,
etc.) and type constructors that create
“new” types out of a group of existing
ones: struct, union, * (“pointer to”), enum,
[ ] (“array of”), etc.
Types in code are checked by examining
the “compatibility” of the types of the
components, and by determining a
“result” type, if any, from these.
C Example

Suppose a function is declared as char * f(double d).
The data type of f is then

char*()(double)

(function from double to char*).

The call f(2) type checks because f is a function, 2 is an int,
and int is compatible in C with double (it can be silently
converted). The result then must be of type char*.
In terms of the syntax tree:

[Diagram: a call node with children id: f (type char*()(double)) and
num: 2 (type int). The checks are (1) f is a function, (2) int is
compatible with double, and (3) the result has type char*.]
Type compatibility of constructed types

Generally depends on a notion of when two types are "equal"
(equivalent), or at least closely related.

C example:

struct {} x,z;
struct {} y;
y = x; // illegal! (different types)
z = x; // ok! Same types
On the other hand:

struct A {} x;
struct A y;
y = x; // now it's ok!

struct A {} x; declares a type (with name "struct A") and a
variable x. Reusing the name struct A gives the same type.
Writing struct {} defines a type with a hidden internal name
(so it can't be referred to).
Type Equivalence Algorithms

Structural equivalence: as long as the types have the same
structure, they are equivalent.

Name equivalence: types are equivalent only if they are
identical as names.

Declaration equivalence: types are equivalent if they lead back
(through renaming) to the same original use of a type
constructor.
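Structural equivalence is naturally a recursive comparison over the type trees. A sketch, with invented TypeKind/Type names; note that a naive version like this would recurse forever on recursive types unless visited pairs are tracked:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative type representation: a small tree of type nodes. */
typedef enum { T_INT, T_CHAR, T_POINTER, T_ARRAY } TypeKind;

typedef struct Type {
  TypeKind kind;
  struct Type *base;   /* element/target type for ARRAY and POINTER */
  int size;            /* array length (ignored here, as in C) */
} Type;

/* Structural equivalence: same constructor applied to
   structurally equivalent component types. */
bool structEq(const Type *a, const Type *b)
{ if (a == b) return true;                 /* also covers both NULL */
  if (a == NULL || b == NULL) return false;
  if (a->kind != b->kind) return false;
  switch (a->kind) {
    case T_POINTER:
    case T_ARRAY:  return structEq(a->base, b->base);
    default:       return true;            /* base types: kind suffices */
  }
}
```

Name equivalence, by contrast, would just compare the two type names for identity, and declaration equivalence would chase typedef renamings back to the original constructor before comparing.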
Equivalence Example (C syntax)
struct A {};
typedef struct A A;
typedef struct {} B;
struct A x; A y; B z;
x, y, z all structurally equivalent
x, y declaration equivalent, but z is
not declaration equivalent to these
none are name equivalent
C uses a combination of structural and declaration equivalence:

Declaration equivalence for struct and union.

Structural equivalence for arrays, pointers, and functions.

enum isn't even a type constructor, but constructs a named
subrange of int (unlike C++ - see next slide).
Digression: Enums in C and C++

An enum in C is not a real type constructor:

enum A {one,two,three} x;
enum B {four,five,six} y;
x = y; /* ok in C */

In C++ this assignment is an error:

C:\classes\cs153\f02>gxx enum.cpp
enum.cpp: In function `int main()':
enum.cpp:7: cannot convert `B' to `A' in assignment

Note how the error message implies that C++ automatically
generates a typedef enum A A!
Representing types internally in a
compiler
Since types are built up recursively,
a tree structure must be used (syntax
tree gets another major node kind:
datatype).
Some languages (FORTRAN, TINY,
C-Minus) have flat type spaces, so
that an enum can be used: int,
intarray, function.
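A hedged sketch of such a recursive type tree (the node and constructor names here are invented for illustration), building the type of char* f(double) from the earlier slide and rendering it in the char*()(double) notation used there:

```c
#include <stdlib.h>
#include <string.h>

/* Types as a recursive tree; names here are illustrative. */
typedef enum { TY_INT, TY_CHAR, TY_DOUBLE, TY_POINTER, TY_FUNC } TypeKind;

typedef struct TypeNode {
  TypeKind kind;
  struct TypeNode *child[2];  /* FUNC: [0]=result, [1]=parameter;
                                 POINTER: [0]=target type */
} TypeNode;

TypeNode *mkType(TypeKind k, TypeNode *c0, TypeNode *c1)
{ TypeNode *t = malloc(sizeof *t);
  t->kind = k;
  t->child[0] = c0;
  t->child[1] = c1;
  return t;
}

/* Append a printable form of the type to out, in the
   char*()(double) notation used on the earlier slide. */
void typeToString(const TypeNode *t, char *out)
{ switch (t->kind) {
    case TY_INT:    strcat(out, "int");    break;
    case TY_CHAR:   strcat(out, "char");   break;
    case TY_DOUBLE: strcat(out, "double"); break;
    case TY_POINTER:
      typeToString(t->child[0], out);
      strcat(out, "*");
      break;
    case TY_FUNC:
      typeToString(t->child[0], out);
      strcat(out, "()(");
      typeToString(t->child[1], out);
      strcat(out, ")");
      break;
  }
}
```

For char* f(double), the tree mkType(TY_FUNC, mkType(TY_POINTER, mkType(TY_CHAR, NULL, NULL), NULL), mkType(TY_DOUBLE, NULL, NULL)) renders as char*()(double). A flat type space, by contrast, replaces the whole tree with a single enum value on each node.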
Functions generally are type constructors too, but their types do
not have to be built explicitly, since the return type and
parameter types are available in the syntax tree for checking
(unless, of course, function types can be explicitly written, as
in C: typedef char* F(double); - see the next slide).
Digression on C function types

There are two kinds of function types in C that are almost
identical (and that can almost be used interchangeably) -
function constants and function pointers:

typedef char* F(double);
typedef char* (*G)(double);

F is a "constant" function type (a prototype), while G is a
"pointer to function" type, or function variable:

F f;     // a prototype for a func f
G g = f; // g is a var init'ed to f
f = g;   // illegal - f is const
In many ways, this mirrors the close relationship in C between
pointers and arrays:

int x[10];
int* y = x; // ok
x = y;      // illegal

In calls and params it really doesn't matter which type you use
or assume: f(2), (*f)(2), and (&f)(2) all work fine, and
void p(F ff) and void p(G gg) are identical in effect.
Recursive types

Present special problems:

struct A { int x; struct A next; };

is illegal, because it would represent an "infinite" type (just as
void f(void) { f(); } represents an "infinite" call).

In C one must interpose a pointer:

struct A { int x; struct A* next; };

Some languages use a union instead. Others (like Java) have
implicit pointers.
Other issues (a sample)
Should array size be part of its type?
(C says no)
How far should compatibility of types
go? (Should any two pointers be
compatible?)
Dynamic typing: constructing types
during execution.
Type checking in TINY

Only two types: int and bool.

Only need to check the if statement, while statement,
assignment, and a few other cases.

Type errors may create a "void" type; suppress error
messages in the presence of void.
Sample TINY type checking code

switch (t->kind.exp)
{ case OpK:
    if ((t->child[0]->type != Integer) ||
        (t->child[1]->type != Integer))
      typeError(t,"Op applied to non-integer");
    if ((t->attr.op == EQ) || (t->attr.op == LT))
      t->type = Boolean;
    else
      t->type = Integer;
    break;
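Statement nodes are handled the same way in the postorder pass. A self-contained sketch, where the node kinds and typeError are simplified stand-ins for the real analyze.c; note the tests are written so that the "void" error type triggers no further messages, as the previous slide suggests:

```c
#include <stdio.h>

typedef enum { Void, Integer, Boolean } ExpType;
typedef enum { IfK, WhileK, AssignK } NodeKind;

typedef struct TreeNode {
  NodeKind kind;
  ExpType type;
  struct TreeNode *child[3];
  int lineno;
} TreeNode;

static int typeErrors = 0;

static void typeError(TreeNode *t, const char *msg)
{ printf("Type error at line %d: %s\n", t->lineno, msg);
  typeErrors++;
}

/* Check one statement node, assuming its children have already
   been typed (postorder). The comparisons test for the wrong
   concrete type rather than "not the right type", so the Void
   error type passes silently. */
void checkStmt(TreeNode *t)
{ switch (t->kind) {
    case IfK:
    case WhileK:
      if (t->child[0]->type == Integer)
        typeError(t->child[0], "test expression is not Boolean");
      break;
    case AssignK:
      if (t->child[1]->type == Boolean)
        typeError(t->child[1], "assignment of non-integer value");
      break;
  }
}
```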
Type Checking in C-Minus
Go through Appendix A carefully,
writing out all type rules
As in TINY, there are only a few types
(other than functions). And there are
no explicit function types, or function
variables or parameters. Also no
recursive types. And no typedefs.
Answer questions such as: is x = y
legal if x and y are both arrays?
Example from Appendix A

18. expression → var = expression | simple-expression
19. var → ID | ID [ expression ]

An expression is either an assignment (a var followed by = and an
expression) or just a simple expression. The assignment has the usual
storage semantics: the location of the variable represented by var is
found, then the subexpression to the right of the assignment is
evaluated, and the value of the subexpression is stored at the
given location. This value is also returned as the value of the
entire expression. A var is either a simple (integer) variable or a
subscripted array variable. A negative subscript causes the
program to halt (unlike C). However, upper bounds of subscripts
are not checked.
Making syntax tree traversals easy: use a "generic" traversal
function:

static void traverse( TreeNode * t,
                      void (* preProc) (TreeNode *),
                      void (* postProc) (TreeNode *) )
{ if (t != NULL)
  { preProc(t);
    { int i;
      for (i=0; i < MAXCHILDREN; i++)
        traverse(t->child[i],preProc,postProc);
    }
    postProc(t);
    traverse(t->sibling,preProc,postProc);
  }
}
// builds symtab in preorder:
traverse(syntaxTree,insertNode,nullProc);

// checks types in postorder:
traverse(syntaxTree,nullProc,checkNode);

void nullProc( TreeNode * t)
{ }

etc . . .
Analyze.h - a two-step process:
/* Function buildSymtab constructs the symbol
* table by preorder traversal of the syntax tree
*/
void buildSymtab(TreeNode *);
/* Procedure typeCheck performs type checking
* by a postorder syntax tree traversal
*/
void typeCheck(TreeNode *);
What should C-Minus Print
under TraceAnalyze?
Possibly a representation of the
symbol table, as in TINY
But also another representation of
the tree with types added
PrintTree could be modified to do
this, or a new PrintTypes function
added to util.h/util.c
An Example of C-Minus Symbol
Table Construction
and the use of the symbol table to
link uses of names to their defs.
CS 153 - Fall, 2002 - K. Louden -
11/10/02
11/11/02 K. Louden, CS 153, Fall 2002 121
The Example:

int a;     /*d1*/
int b[10]; /*d2*/
int c /*d3*/ (int a[] /*d4*/, int c /*d5*/)
{ /* Position 1 */
  if (c)
  { int d; /*d6*/ /* Position 2 */
    d = a[c] + b[c];
    return d; }
  return 0; }

void main(void) /*d7*/
{ /* Position 3 */
  output(c(b,a));
}
Syntax tree:

[Diagram: the program's syntax tree. The root lists the declarations
a (d1), b (d2), c (d3), and main (d7). The subtree for c has
parameters a (d4) and c (d5) and a block containing an if with test
c, an inner block declaring d (d6) with the assignment
d = a[c] + b[c] (using subscript nodes) and return d, and a final
return 0. The subtree for main is a block containing the call
output(c(b,a)).]
Symbol Table at Position 1:

nestLevel 2: a → d4, c → d5
nestLevel 1: (empty)
nestLevel 0: input, a → d1, b → d2, c → d3, output
Lookup of c after position 1 produces the following tree with link:

[Diagram: the same syntax tree, now with the use of c in the if test
linked by the lookup to its declaration d5 (the parameter c), the
innermost visible c.]
Symbol Table at Position 2:

nestLevel 3: d → d6
nestLevel 2: a → d4, c → d5
nestLevel 1: (empty)
nestLevel 0: input, a → d1, b → d2, c → d3, output
Lookups of a, b, c, and d after position 2 produce the following
tree with links:

[Diagram: the same syntax tree, with the uses in d = a[c] + b[c] and
return d linked to their declarations: a → d4 (the array parameter),
b → d2 (the global array), c → d5 (the parameter), and d → d6 (the
local).]
Symbol Table at Position 3:

nestLevel 2: (empty)
nestLevel 1: (empty)
nestLevel 0: input, a → d1, b → d2, c → d3, output, main → d7
Lookups of output, a, b, and c after pos. 3 produce the following
tree with links:

[Diagram: the same syntax tree, with the call in main linked up: output
to the predefined output function, the called c to its global
declaration d3, and the arguments b and a to the globals d2 and d1.]