
SYMBOL TABLES &CODE GENERATION FOR EXECUTABLES

Date post: 05-Jan-2016
Upload: brier
Page 1: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

SYMBOL TABLES &CODE GENERATION FOR EXECUTABLES

Page 2: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

SYMBOL TABLES

Compilers that produce an executable (or the representation of an executable in object module format), as opposed to a program in an intermediate language, need to make use of a symbol table. In fact, for optimization purposes, all compilers need one.

Page 3: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

The symbol table records information about the identifiers in the source program, such as their name, type, number of dimensions, space assignment, etc.

Page 4: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

To illustrate the use of symbol tables, let’s consider a simple compiler, where symbol_stack consists of integers, and the integer associated with an identifier on the stack is the index of the entry for that identifier in the symbol table.

Page 5: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• Our symbol stack entries will provide pointers to the entries in the symbol table where the name of the identifier and the offset assigned to it in the data segment are stored.

• Negative numbers will be employed on the symbol stack as codes to denote the registers AX, BX, etc.

Page 6: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• As identifiers are encountered in the source code, their names are packed into an array, which we will call id_stack, defined as: char id_stack[1000];

• Since strings in C all end in a 00h byte, it is only necessary to specify where on id_stack a name begins, in order to retrieve it.

Page 7: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• The symbol table entry for a name does not contain the name itself, but instead a pointer to the beginning of the name on id_stack.

• The reason for this is that, since the symbol table is an array of symbol table entries, we would otherwise have to provide space in each entry for the largest legal name size.
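The packing of names onto id_stack can be sketched as follows. This is only an illustration: the helper push_name and the counter id_top are hypothetical names that do not appear in the slides, and the sketch simply aborts if the array fills up.

```c
#include <assert.h>
#include <string.h>

#define ID_STACK_SIZE 1000

char id_stack[ID_STACK_SIZE];  /* names packed end to end, each ending in '\0' */
int id_top = 0;                /* hypothetical: index of the first free byte */

/* Copy name onto id_stack and return the index at which it begins.
   That index is what a symbol table entry would keep instead of the name. */
int push_name(const char *name)
{
    size_t len = strlen(name) + 1;                  /* include the '\0' byte */
    assert((size_t)id_top + len <= ID_STACK_SIZE);  /* sketch: no recovery */

    int start = id_top;
    memcpy(&id_stack[start], name, len);
    id_top += (int)len;
    return start;
}
```

Given the index returned by push_name, the name can be read back as the C string &id_stack[index], so each entry costs only one integer regardless of how long the name is.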

Page 8: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• When an identifier is encountered in the source code, the compiler has to search the symbol table to find the entry, if any, for it.

• Various methods have been investigated for making this process more efficient, such as the use of binary trees.

Page 9: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

But the method of choice has been to derive a number, called a hash code, from an identifier, and then link all identifiers with the same hash code in a list, which we will refer to as a hash-list.

Page 10: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• One method for evaluating a hash code is to add up the ASCII codes of the individual characters of the identifier

• and then take, as the hash code, the remainder of this sum after division by a prime number, such as 127.

Page 11: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

The following is sample code for this purpose:

    int hash(char * name) {
        int hash_value = 0;
        int i = 0;
        while (name[i] != '\0') {
            /* cast so that characters above 127 cannot make the sum negative */
            hash_value += (unsigned char) name[i];
            ++i;
        }
        return hash_value % 127;
    }

In this scheme there are 127 hash-lists.
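As a small worked illustration of the additive hash described above, the sketch below repeats the function and adds a hypothetical helper, same_hash_list, which is not part of the slides. Because the sum ignores the order of the characters, any two names made of the same letters fall into the same hash-list, which is one reason the lists are needed at all.

```c
#include <assert.h>
#include <stddef.h>

/* Additive hash: sum of the character codes, modulo the prime 127. */
int hash(const char *name)
{
    assert(name != NULL);
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += (unsigned char)name[i];
    return hash_value % 127;
}

/* Hypothetical helper: do two names share a hash-list? */
int same_hash_list(const char *a, const char *b)
{
    return hash(a) == hash(b);
}
```

For example, "ab" sums to 97 + 98 = 195, and 195 mod 127 = 68; "ba" gives the same sum and therefore the same hash code, so same_hash_list("ab", "ba") is true.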

Page 12: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

A simple symbol table could be defined as follows:

    typedef struct {
        int name_index;
        int offset;
        int hash_link;
    } symbol_table_entry;

    symbol_table_entry symbol_table[1000];

Page 13: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• Here name_index is the index into id_stack where the name is stored,

• offset is the offset in the data segment assigned to the identifier, and

• hash_link is a pointer to the symbol table entry for the next identifier encountered, if any, with the same hash code.

Page 14: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

The entries at symbol_table[0] through symbol_table[126] are reserved for the heads of the 127 hash-lists.

Page 15: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

• For example, if X1 is the first identifier encountered in the source with hash-code (say) 30, then an entry for it will be made at symbol_table[30].

• If, later on, an identifier ZZ is encountered which also has hash-code 30, then an entry will be made for ZZ at the next free index at or beyond 127 in symbol_table (indexes 0 through 126 being reserved for the list heads), and the hash-link in the entry for X1 will be changed from null to point instead to the entry for ZZ.

Page 16: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

Within the rules section of the Lex definition file, the regular expression and associated code for an identifier may take a form such as the following:

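    {letter}({letter}|{digit}|_)*    { yylval = find(yytext); return identifier; }

where the find function returns the index into the symbol_table of the entry for the identifier, creating an entry if one doesn’t already exist. (In Lex the action must start on the same line as its pattern, and single quotes do not quote a character, so the underscore is written bare.)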

Page 17: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

The find function begins as follows:

    int find(char * name) {
        int j;
        j = hash(name);

and proceeds according to the flow-diagram on the next slide.
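The flow-diagram itself is not reproduced in this transcript, so the following is only one plausible way the rest of find could go, reconstructed from the description of the hash-lists on the earlier slides. Everything beyond the slides is an assumption: the value NIL (-1) for a null name_index or hash_link, the counters id_top, next_free and next_offset, the helpers init_symbol_table and fill_entry, and the choice of 2 bytes of data segment per identifier.

```c
#include <assert.h>
#include <string.h>

#define HASH_SIZE  127
#define TABLE_SIZE 1000
#define NIL        (-1)   /* assumed "null" for name_index and hash_link */

typedef struct {
    int name_index;
    int offset;
    int hash_link;
} symbol_table_entry;

char id_stack[1000];
symbol_table_entry symbol_table[TABLE_SIZE];
int id_top = 0;              /* first free byte of id_stack */
int next_free = HASH_SIZE;   /* first index past the reserved list heads */
int next_offset = 0;         /* assumed: 2 bytes per identifier */

/* Must be called once before find: marks every entry as unused. */
void init_symbol_table(void)
{
    for (int i = 0; i < TABLE_SIZE; ++i) {
        symbol_table[i].name_index = NIL;
        symbol_table[i].offset = 0;
        symbol_table[i].hash_link = NIL;
    }
    id_top = 0;
    next_free = HASH_SIZE;
    next_offset = 0;
}

int hash(const char *name)
{
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += (unsigned char)name[i];
    return hash_value % HASH_SIZE;
}

/* Store name on id_stack and set up the entry at the given index. */
static void fill_entry(int index, const char *name)
{
    size_t len = strlen(name) + 1;
    assert((size_t)id_top + len <= sizeof id_stack);
    memcpy(&id_stack[id_top], name, len);
    symbol_table[index].name_index = id_top;
    symbol_table[index].offset = next_offset;
    symbol_table[index].hash_link = NIL;
    id_top += (int)len;
    next_offset += 2;
}

int find(const char *name)
{
    int j = hash(name);

    if (symbol_table[j].name_index == NIL) {  /* list head still unused */
        fill_entry(j, name);
        return j;
    }
    for (;;) {                                /* walk the hash-list */
        if (strcmp(&id_stack[symbol_table[j].name_index], name) == 0)
            return j;                         /* already in the table */
        if (symbol_table[j].hash_link == NIL)
            break;                            /* j is the last entry */
        j = symbol_table[j].hash_link;
    }
    assert(next_free < TABLE_SIZE);           /* sketch: no recovery */
    int k = next_free++;
    fill_entry(k, name);
    symbol_table[j].hash_link = k;            /* append to the list */
    return k;
}
```

With this sketch, after init_symbol_table(), the first name with a given hash code lands in the reserved head entry and every later name with the same code is appended at the next free index from 127 upward, matching the X1 and ZZ example.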

Page 18: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

Code Generation Using the Symbol Table

Let’s consider the code required in our simple compiler within our Yacc definition file for addition.

To avoid complications, let’s assume that the code for our arithmetic expressions requires the use of register AX only.

Page 19: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

So on the symbol stack, positive numbers are indexes of entries for identifiers in symbol_table, and (say) -1 is used as a code for AX:

    expression : expression '+' term
                     { C code as described below }

The C code should check whether $1 and $3 are positive or negative, and generate appropriate object code for each of the 4 cases.

Page 20: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

Case where $1 and $3 are both positive:

Generate machine code corresponding to:

mov AX, symbol_table[$1].offset;

add AX, symbol_table[$3].offset;

and set $$ = -1

Page 21: SYMBOL TABLES  &CODE GENERATION FOR EXECUTABLES

Case where $1 is negative and $3 is positive:

Generate machine code corresponding to:

add AX, symbol_table[$3].offset;

and set $$ = -1
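The two cases above could be handled by a helper called from the Yacc action, sketched below under several assumptions. The function gen_add and the constants REG_AX and NOT_COVERED are hypothetical names; the sketch writes assembly text with the data-segment offset in brackets rather than real machine code; and it treats index 0 as an identifier too, since symbol_table[0] is a valid list head. Only the two cases described here are covered, and any other combination is reported as NOT_COVERED.

```c
#include <assert.h>
#include <stdio.h>

#define REG_AX      (-1)    /* symbol stack code for register AX */
#define NOT_COVERED (-100)  /* hypothetical: case not handled by this sketch */

typedef struct {
    int name_index;
    int offset;
    int hash_link;
} symbol_table_entry;

symbol_table_entry symbol_table[1000];

/* Write the instructions for left + right into out and return the value
   the action would assign to $$. left and right play the roles of $1
   and $3: a symbol_table index, or REG_AX. */
int gen_add(int left, int right, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    if (left >= 0 && right >= 0) {           /* both operands in memory */
        snprintf(out, out_size, "mov ax, [%d]\nadd ax, [%d]\n",
                 symbol_table[left].offset, symbol_table[right].offset);
        return REG_AX;
    }
    if (left == REG_AX && right >= 0) {      /* left operand already in AX */
        snprintf(out, out_size, "add ax, [%d]\n",
                 symbol_table[right].offset);
        return REG_AX;
    }
    return NOT_COVERED;                      /* remaining cases not shown */
}
```

In a Yacc action this might be called as $$ = gen_add($1, $3, buffer, sizeof buffer), with the contents of buffer then appended to the output.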

