Semantic Analysis (Generating An AST)
CS 471, September 26, 2007
2 CS 471 – Fall 2007
Semantic Analysis
Source code → Lexical Analysis → tokens → Parsing → AST → Semantic Analysis → valid programs: decorated AST
Errors reported along the way: lexical errors (Lexical Analysis), syntax errors (Parsing), semantic errors (Semantic Analysis)
Goals of a Semantic Analyzer
The compiler must do more than recognize whether a sentence belongs to the language…
• Find all possible remaining errors that would make the program invalid
– undefined variables, types
– type errors that can be caught statically
• Figure out useful information for later phases
– types of all expressions
– data layout
Semantic Actions
Can do useful things with the parsed phrases
– Each terminal and nonterminal may be associated with a type, e.g. for exp: INT the type is int
– For a rule A → B C D:
• Type must match A
• Value can be built with B, C, D
Semantic Actions
Semantic action executed when grammar production is reduced
• Recursive-descent parser: semantic code interspersed with control flow
• Yacc: fragments of C code attached to a grammar production
Interpreter
Could develop an interpreter that executes the program as part of the semantic actions!
Example Grammar:
E → id
E → E + E
E → E – E
E → E * E
E → -E
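The idea of interpreting during parsing can be sketched with a small recursive-descent parser in C, where each semantic action computes a value right away instead of building a tree. This is only an illustration under stated assumptions: integer literals stand in for id, the ambiguous grammar is layered to give the usual precedence (unary minus binds tightest, then *, then + and -), there is no error handling, and all names (interpret, parse_expr, parse_term, parse_factor, cur) are hypothetical.

```c
#include <ctype.h>

/* All names here are hypothetical; they are not from the course code. */
static const char *cur;   /* next unread character of the input */

static void skip_spaces(void) {
    while (*cur == ' ') cur++;
}

/* factor : '-' factor | INT   (INT stands in for id) */
static int parse_factor(void) {
    skip_spaces();
    if (*cur == '-') {              /* E -> -E */
        cur++;
        return -parse_factor();     /* semantic action: negate */
    }
    int value = 0;                  /* E -> id */
    while (isdigit((unsigned char)*cur)) {
        value = value * 10 + (*cur - '0');
        cur++;
    }
    return value;
}

/* term : factor ('*' factor)* */
static int parse_term(void) {
    int value = parse_factor();
    skip_spaces();
    while (*cur == '*') {           /* E -> E * E */
        cur++;
        value *= parse_factor();    /* semantic action: multiply */
        skip_spaces();
    }
    return value;
}

/* expr : term (('+' | '-') term)* */
static int parse_expr(void) {
    int value = parse_term();
    skip_spaces();
    while (*cur == '+' || *cur == '-') {   /* E -> E + E, E -> E - E */
        char op = *cur++;
        int rhs = parse_term();
        value = (op == '+') ? value + rhs : value - rhs;
        skip_spaces();
    }
    return value;
}

/* Parse and evaluate a whole expression in one pass. */
int interpret(const char *source) {
    cur = source;
    return parse_expr();
}
```

Calling interpret("1 + 2 * 3") parses and evaluates in a single pass; no tree is ever built, which is exactly what makes this style hard to extend later.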
Unions in Yacc
%union allows us to declare a union datatype used to package the types/attributes of symbols
%union {
    int pos;
    int ival;
    string sval;
    struct {
        int intval;
        enum Types valtype;
    } constantval;
    A_exp exp;
}
Exported as YYSTYPE
Types in Yacc
Using the member names of the %union, tell Yacc the type of each symbol
Terminals
%token <sval> ID STRING
%token <ival> INT
%token <pos> COMMA SEMI LBRACE RBRACE …
And nonterminals (use %type)
%type <exp> expression program
(<exp> is the type; expression and program are LHS of productions)
Symbols in Yacc
• The symbol $n (n > 0) refers to the attribute of the nth symbol on the RHS
• The symbol $$ refers to the attribute of the LHS
• The symbol $n (n ≤ 0) refers to contextual information
Note: actions in the middle of a rule count as a symbol!
expr : expr1 PLUS expr2
 $$     $1         $3
Interpreter in Yacc
%{ declarations of yylex and yyerror %}
%union { int num; string id; }
%token <num> INT
%token <id> ID
%type <num> exp
%start exp
%left PLUS MINUS
%left TIMES
%left UMINUS
%%
[please fill in solution]
E → id
E → E + E
E → E – E
E → E * E
E → -E
Recall
expr : expr1 PLUS expr2
 $$     $1         $3
Internally: A Semantic Stack
Implemented using a stack parallel to the state stack
Stack                         Input          Action
(empty)                       1 + 2 * 3 $    shift
INT: 1                        + 2 * 3 $      reduce
exp: 1                        + 2 * 3 $      shift
exp: 1 +:                     2 * 3 $        shift
exp: 1 +: INT: 2              * 3 $          reduce
exp: 1 +: exp: 2              * 3 $          shift
exp: 1 +: exp: 2 *:           3 $            shift
exp: 1 +: exp: 2 *: INT: 3    $              reduce
exp: 1 +: exp: 2 *: exp: 3    $              reduce
exp: 1 +: exp: 6              $              reduce
exp: 7                        $              accept
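The trace above can be replayed by hand with a small C sketch of a value stack kept in parallel with a symbol stack. This is an illustration only, not how Yacc's generated parser is actually written: the symbol array stands in for the state stack, the caller decides when to shift or reduce, there are no bounds checks, and all names (ParseStack, shift, reduce_int, reduce_binary) are hypothetical.

```c
#define STACK_MAX 64

typedef enum { SYM_INT, SYM_EXP, SYM_PLUS, SYM_TIMES } Symbol;

typedef struct {
    Symbol symbols[STACK_MAX];  /* stands in for the parser's state stack */
    int    values[STACK_MAX];   /* semantic stack: one attribute per entry */
    int    top;                 /* number of entries on both stacks */
} ParseStack;

/* Shift: push a symbol together with its attribute value. */
void shift(ParseStack *s, Symbol sym, int value) {
    s->symbols[s->top] = sym;
    s->values[s->top] = value;
    s->top++;
}

/* exp : INT   { $$ = $1; }   The value is unchanged, only the symbol. */
void reduce_int(ParseStack *s) {
    s->symbols[s->top - 1] = SYM_EXP;
}

/* exp : exp PLUS exp    { $$ = $1 + $3; }
   exp : exp TIMES exp   { $$ = $1 * $3; } */
void reduce_binary(ParseStack *s) {
    int rhs = s->values[s->top - 1];       /* $3 */
    Symbol op = s->symbols[s->top - 2];    /* the operator, $2 */
    int lhs = s->values[s->top - 3];       /* $1 */
    s->top -= 3;                           /* pop the whole RHS */
    shift(s, SYM_EXP, op == SYM_PLUS ? lhs + rhs : lhs * rhs);  /* push $$ */
}
```

Replaying the table means calling shift for each token, reduce_int after each INT, and reduce_binary twice at the end: first for the TIMES on top of the stack, then for the PLUS beneath it.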
Inlined TypeChecker and CodeGen
You can even type check and generate code:
expr : expr PLUS expr {
    if ($1.type == $3.type &&
        ($1.type == IntType ||
         $1.type == RealType))
        $$.type = $1.type;
    else
        error("+ applied on wrong type!");
    GenerateAdd($1, $3, $$);
}
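The typing rule buried in that action can be pulled out into a plain C function, which makes it easier to read and to test on its own. This is a minimal sketch: IntType and RealType come from the slide, while StringType, ErrorType and the function name check_plus are assumptions added for illustration.

```c
/* IntType and RealType appear on the slide; StringType and ErrorType
   are hypothetical additions for this sketch. */
typedef enum { IntType, RealType, StringType, ErrorType } Type;

/* Result type of "left + right": both operands must have the same
   numeric type, otherwise the expression is ill-typed. */
Type check_plus(Type left, Type right) {
    if (left == right && (left == IntType || left == RealType))
        return left;
    return ErrorType;
}
```

A grammar action could then set $$.type from check_plus($1.type, $3.type) and report an error when the result is ErrorType.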
Problems
• Difficult to read
• Difficult to maintain
• Compiler must analyze the program in the order it is parsed
• Instead … we split up tasks
Compiler ‘main program’
void Compile() {
    TokenStream l = Lexer(input);
    AST tree = Parser(l);
    if (TypeCheck(tree)) {
        IR ir = genIntermediateCode(tree);
        emitCode(ir);
    }
}
Thread of control
Data flow: Input Stream → characters → Lexer → tokens → Parser → AST
Call chain: compile → parse → getToken → readStream; the AST is returned to compile
Producing the Parse Tree
Separates issues of syntax (parsing) from issues of semantics (type checking, translation to machine code)
• One leaf for every token
• One internal node for every reduction during parsing
• Concrete parse tree represents concrete syntax
But … parse tree has problems
• Punctuation tokens redundant
• Structure of the tree conveys this info
Enter the Abstract Syntax Tree
AST
• Abstract Syntax Tree is a tree representation of the program. Used for
– semantic analysis (type checking)
– some optimization (e.g. constant folding)
– intermediate code generation (sometimes intermediate code = AST with a somewhat different set of nodes)
• Compiler phases = recursive tree traversals
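To make "phases = recursive tree traversals" concrete, here is a minimal C sketch with a toy expression tree and two independent traversals over it. The node layout and the names (Node, evaluate, count_nodes) are hypothetical and far simpler than the Tiger AST; the point is only that each pass is a separate recursive function over the same tree.

```c
typedef enum { NODE_INT, NODE_ADD, NODE_MUL } NodeKind;

/* A toy AST node; a hypothetical layout, not the Tiger absyn.h one. */
typedef struct Node {
    NodeKind kind;
    int value;                  /* used when kind == NODE_INT */
    const struct Node *left;    /* used by NODE_ADD and NODE_MUL */
    const struct Node *right;
} Node;

/* One traversal: compute the value of the expression. */
int evaluate(const Node *n) {
    switch (n->kind) {
    case NODE_INT: return n->value;
    case NODE_ADD: return evaluate(n->left) + evaluate(n->right);
    case NODE_MUL: return evaluate(n->left) * evaluate(n->right);
    }
    return 0;  /* not reached for well-formed trees */
}

/* Another traversal over the same tree: count its nodes. */
int count_nodes(const Node *n) {
    if (n->kind == NODE_INT)
        return 1;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}
```

Because the tree outlives the parse, passes like these can run in any order and as many times as needed, unlike actions tied to parser reductions.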
Do We Need An AST?
• Old-style compilers: semantic actions generate code during parsing
Problems:
• hard to maintain
• limits language features
• not modular!
expr ::= expr PLUS expr {: emitCode(add); :}
Figure: input → parser (with stack) → code
Interesting Detour
• Old compilers didn’t create ASTs … not enough memory to store the entire program
• This also helps explain why C requires forward declarations: it avoids an extra compilation pass
Positions
In a one-pass compiler, errors are reported using the current position of the lexer as an approximation (global var)
Abstract syntax data structures must have pos fields
• Line number
• Char number
• Line number is unambiguous
• Char number is a matter of style
Abstract Syntax for Tiger
/* absyn.h */
typedef struct A_var_ *A_var;
struct A_var_ {
    enum {A_simpleVar, A_fieldVar, A_subscriptVar} kind;
    A_pos pos;
    union {
        S_symbol simple;
        struct {A_var var; S_symbol sym;} field;
        struct {A_var var; A_exp exp;} subscript;
    } u;
};
More Syntax (Constructors…p.98)
A_var A_SimpleVar(A_pos pos, S_symbol sym);
…
A_exp A_WhileExp(A_pos pos, A_exp test, A_exp body);
…
A_expList A_ExpList(A_exp head, A_expList tail);
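A constructor of this kind typically allocates a node, sets the kind tag, and fills in the matching union member. The sketch below shows that pattern in self-contained C under simplifying assumptions: A_pos is reduced to an int, S_symbol to a plain string, the subscript variant is left out, and the allocation helper new_var is hypothetical. The real definitions live in the Tiger support code.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int A_pos;              /* stand-in: a position in the source */
typedef const char *S_symbol;   /* stand-in: the real S_symbol is an interned symbol */

typedef struct A_var_ *A_var;
struct A_var_ {
    enum { A_simpleVar, A_fieldVar } kind;   /* subscript variant omitted */
    A_pos pos;
    union {
        S_symbol simple;
        struct { A_var var; S_symbol sym; } field;
    } u;
};

/* Hypothetical helper: allocate a node or abort on failure. */
static A_var new_var(void) {
    A_var p = malloc(sizeof(*p));
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return p;
}

/* Build a node for a simple variable such as "a". */
A_var A_SimpleVar(A_pos pos, S_symbol sym) {
    A_var p = new_var();
    p->kind = A_simpleVar;
    p->pos = pos;
    p->u.simple = sym;
    return p;
}

/* Build a node for a field access such as "a.x". */
A_var A_FieldVar(A_pos pos, A_var var, S_symbol sym) {
    A_var p = new_var();
    p->kind = A_fieldVar;
    p->pos = pos;
    p->u.field.var = var;
    p->u.field.sym = sym;
    return p;
}
```

With constructors like these, a grammar action only has to pass along the position and the child nodes already built for the right-hand side.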
Tiger Program
(a := 5; a+1) translates to:
A_SeqExp(2,
  A_ExpList(A_AssignExp(4,
              A_SimpleVar(2, S_Symbol("a")),
              A_IntExp(7, 5)),
    A_ExpList((A_OpExp(11, A_plusOp,
                 A_VarExp(A_SimpleVar(10, S_Symbol("a"))),
                 A_IntExp(12, 1))),
      NULL)))
• AssignExp: the column of ":=" is chosen for pos
• OpExp: the column of "+" is chosen for pos
Some Odd Tiger Features
Tiger allows mutually recursive declarations:
let var a := 5
    function f() : int = g(a)
    function g(i: int) = f()
in f()
end
Thus: FunctionDec constructor takes a list of functions
Correlation to Yacc (and your project)
(Demo)
Checklist
1. Detailed look at the Tiger AST (absyn.h)
2. Edit tiger.grm
3. The Tiger Language Manual
• PA3 and PA4 make heavy use of it
• Follow the structure to generate your yacc file