Data Layouts

Data Structures For a Simple Compiler

Joseph Bergin, 1/12/99

Source: csis.pace.edu/~bergin/slides/CompilerDataStructures99.pdf

Symbol Tables

Information about user defined names

Symbol Table

● Symbol Tables are organized for fast lookup.
    » Items are typically entered once and then looked up several times.
    » Hash Tables and Balanced Binary Search Trees are commonly used.
    » Each record contains a "name" (symbol) and information describing it.

Simple Hash Table

● Hasher translates "name" into an integer in a fixed range: the hash value.

● Hash Value indexes into an array of lists.
    » Entry with that symbol is in that list or is not stored at all.
    » Items with same hash value = bucket.
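
The bullets above can be sketched as a small chained hash table. This is a minimal illustration, not the deck's implementation: the names `Entry`, `SimpleHashTable`, the `info` payload and the multiplier 31 are all assumptions made for the example.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// One symbol-table record: the name plus (hypothetical) descriptive info.
struct Entry {
    std::string name;
    int info;
};

class SimpleHashTable {
public:
    // bucketCount must be at least 1 in this sketch.
    explicit SimpleHashTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Hasher: translate a name into an integer in the fixed range [0, bucketCount).
    std::size_t hash(const std::string& name) const {
        std::size_t h = 0;
        for (unsigned char c : name) h = h * 31 + c;
        return h % buckets_.size();
    }

    // Enter a name (assumed not to be present already).
    void insert(const std::string& name, int info) {
        buckets_[hash(name)].push_front(Entry{name, info});
    }

    // The entry is either in the bucket selected by the hash value
    // or it is not stored at all.
    const Entry* lookup(const std::string& name) const {
        for (const Entry& e : buckets_[hash(name)])
            if (e.name == name) return &e;
        return nullptr;
    }

private:
    std::vector<std::list<Entry>> buckets_;  // array of lists (buckets)
};
```

A lookup touches only one bucket, so its cost depends on that bucket's length rather than on the total number of symbols.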

Simple Hash Table

[Figure: anObject is passed to the hasher, which produces an index in the range 0..max into the array of buckets.]

Self Organizing Hash Table

● Can achieve constant average time lookup if buckets have bounded average length.

● Can guarantee this if we periodically double number of hash buckets and re-hash all elements.
    » Can be done so as to minimize movement of items.

Self Organizing Hash Table

[Figure: the old hasher maps anObject to an index n in 0..max; the new hasher maps it to an index in 0..2 * max. An item from old bucket n ends up in bucket n or in bucket n + max.]
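
The doubling step can be sketched as follows, assuming the index is computed as hash value modulo bucket count. Under that assumption an item in old bucket n can only land in new bucket n or n + oldSize, so each old bucket is split in place and the moved items are relinked rather than copied. The class name `GrowingHashTable` and the growth threshold (an average of one item per bucket) are choices made for this example only.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <string>
#include <vector>

class GrowingHashTable {
public:
    explicit GrowingHashTable(std::size_t bucketCount = 4) : buckets_(bucketCount) {}

    void insert(const std::string& name) {
        // Keep the average bucket length bounded (here: at most 1) by doubling.
        if (count_ + 1 > buckets_.size()) grow();
        buckets_[indexOf(name, buckets_.size())].push_front(name);
        ++count_;
    }

    bool contains(const std::string& name) const {
        for (const std::string& s : buckets_[indexOf(name, buckets_.size())])
            if (s == name) return true;
        return false;
    }

    std::size_t bucketCount() const { return buckets_.size(); }

private:
    static std::size_t indexOf(const std::string& name, std::size_t n) {
        return std::hash<std::string>{}(name) % n;
    }

    // Double the table. Because the index is hash % size, an item in old
    // bucket n can only belong in new bucket n or in new bucket n + oldSize.
    void grow() {
        const std::size_t oldSize = buckets_.size();
        buckets_.resize(2 * oldSize);
        for (std::size_t n = 0; n < oldSize; ++n) {
            std::list<std::string>& from = buckets_[n];
            std::list<std::string>& to = buckets_[n + oldSize];
            for (auto it = from.begin(); it != from.end();) {
                auto next = std::next(it);
                if (indexOf(*it, 2 * oldSize) != n)
                    to.splice(to.end(), from, it);  // relink the node, no copy
                it = next;
            }
        }
    }

    std::vector<std::list<std::string>> buckets_;
    std::size_t count_ = 0;
};
```

Items whose index does not change stay where they are, which is one way to read "minimize movement of items".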

Balanced Binary Search Tree

● Binary search trees work if they are kept balanced.

● Can achieve logarithmic lookup time.

● Algorithms are somewhat complex.
    » Red-black trees and AVL trees are used.
    » No leaf is much farther from root than any other.

Balanced Binary Search Tree

Symbol Tables + Blocks

● If a language is block structured then each block (scope) needs to be represented separately in the symbol table.

● If the hash table buckets are "stack-like" this is automatic.

● Can use a stack of balanced trees with one entry per scope.
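
The last bullet can be sketched with a stack of `std::map` objects, one per open scope (`std::map` is an ordered container with logarithmic lookup, typically implemented as a balanced tree). The class name `ScopedSymbolTable` and the `int` payload are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterBlock(); }            // outermost scope

    void enterBlock() { scopes_.emplace_back(); }    // push a new, empty scope
    void leaveBlock() { scopes_.pop_back(); }        // discard the innermost scope

    // Declare a name in the innermost scope; `info` is a placeholder payload.
    void declare(const std::string& name, int info) { scopes_.back()[name] = info; }

    // Search from the innermost scope outward; nullptr means "not declared".
    const int* lookup(const std::string& name) const {
        for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope) {
            auto found = scope->find(name);
            if (found != scope->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::map<std::string, int>> scopes_;  // stack of per-scope trees
};
```

An inner declaration hides an outer one with the same name until its block is left, at which point the outer declaration becomes visible again.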

Special Cases

● Some languages partition names into different classes: keywords, variable and function names, struct names, ...

● Separate symbol tables can then be used for each kind of name. The different symbol tables might have different characteristics.
    » hash table, sorted list, binary tree, ...

Parsing Information

Parse Trees

● The structure of a modern computer language is tree-like.

● Trees represent recursion well.

● A grammatical structure is a node with its parts as child nodes.

● Interior nodes are nonterminals.

● The tokens of the language are leaves.

Parse Trees

<statement> ::= <variable> ":=" <expression>

x := a + 5

              statement
            /     |     \
     variable    :=    expression
        |                  |
        x                a + 5

Parse Trees

● There are different node types in the same tree.

● Variant records or type unions are typically used. Object-orientation is also useful here.

● Each node has a tag that distinguishes it, permitting testing on node type.
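
A tagged node type might look like the sketch below, which builds the tree for `x := a + 5` and walks it by testing tags. The names `NodeKind`, `ParseNode`, `makeNode`, `countTokens` and `buildAssignment` are invented for this example, and the expression is kept flat (three token leaves) for brevity.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Tag distinguishing node types; the names are illustrative.
enum class NodeKind { Statement, Variable, Expression, Token };

struct ParseNode {
    NodeKind kind;                                     // the tag
    std::string text;                                  // spelling, used by leaves
    std::vector<std::unique_ptr<ParseNode>> children;  // parts of the structure
};

std::unique_ptr<ParseNode> makeNode(NodeKind kind, const std::string& text = "") {
    std::unique_ptr<ParseNode> node(new ParseNode);
    node->kind = kind;
    node->text = text;
    return node;
}

// Count the leaves (tokens) by testing the tag of each node.
int countTokens(const ParseNode& node) {
    if (node.kind == NodeKind::Token) return 1;
    int total = 0;
    for (const auto& child : node.children) total += countTokens(*child);
    return total;
}

// Build the tree for: x := a + 5
std::unique_ptr<ParseNode> buildAssignment() {
    auto variable = makeNode(NodeKind::Variable);
    variable->children.push_back(makeNode(NodeKind::Token, "x"));

    auto expression = makeNode(NodeKind::Expression);
    expression->children.push_back(makeNode(NodeKind::Token, "a"));
    expression->children.push_back(makeNode(NodeKind::Token, "+"));
    expression->children.push_back(makeNode(NodeKind::Token, "5"));

    auto statement = makeNode(NodeKind::Statement);
    statement->children.push_back(std::move(variable));
    statement->children.push_back(makeNode(NodeKind::Token, ":="));
    statement->children.push_back(std::move(expression));
    return statement;
}
```

An object-oriented design would replace the tag test with a virtual function on a node base class; the tagged form is closer to the variant records the slide mentions.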

Parse Stack

● Parsing is often accomplished with a stack. (Not in this version of GCL)

● The stack holds values representing tokens, nonterminals and semantic symbols from the grammar.
    » It can either hold what is expected next (LL parsing) or what has already been seen (LR parsing).

Parse Stack

● A stack is used because most languages and their grammars are recursive. Stacks can accomplish much of what trees can.

● The contents of the stack are usually numeric encodings of the symbols for compactness of representation and speed of processing.

Parse Stack

Grammar fragment:

<statement> ::= <variable> ":=" <expression> #doAssign

Example being scanned:

max := max + 1;

Stack:

<var>
":="
<expr>
#doAs
...
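
Assuming the picture shows an LL-style stack (what is expected next), the state above could be produced as in the following sketch. The numeric encodings and the names `Symbol`, `expand` and `statementStackAfterExpansion` are arbitrary choices for illustration.

```cpp
#include <vector>

// Numeric encodings of grammar symbols (the values are arbitrary assumptions).
enum Symbol : int {
    TOK_ASSIGN = 1,       // ":="
    NT_STATEMENT = 100,   // <statement>
    NT_VARIABLE = 101,    // <variable>
    NT_EXPRESSION = 102,  // <expression>
    ACT_DO_ASSIGN = 200   // #doAssign (semantic symbol)
};

// Replace the nonterminal on top of the stack by the right-hand side of a
// production, pushed in reverse so that the leftmost symbol ends up on top.
void expand(std::vector<int>& stack, const std::vector<int>& rhs) {
    stack.pop_back();
    for (auto it = rhs.rbegin(); it != rhs.rend(); ++it) stack.push_back(*it);
}

// <statement> ::= <variable> ":=" <expression> #doAssign
std::vector<int> statementStackAfterExpansion() {
    std::vector<int> stack = {NT_STATEMENT};
    expand(stack, {NT_VARIABLE, TOK_ASSIGN, NT_EXPRESSION, ACT_DO_ASSIGN});
    return stack;
}
```

After the expansion the top of the stack is `<variable>`, so the parser next expects the variable `max`; the semantic symbol sits below the whole right-hand side and is reached only after the expression has been parsed.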

Stack vs Parameters

● In recursive descent parsing, no stack is needed.

● This is because the semantic records can be passed directly to the semantic routines as parameters.

● Semantic records can also be returned from the parsing functions.
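
A minimal sketch of this style, for the toy grammar `expression ::= operand { "+" operand }` with integer operands only. `ExprRecord`, `doAdd` and `Parser` are hypothetical names; a real semantic routine would generate code rather than fold constants.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Semantic record for an expression; here just a constant value.
struct ExprRecord {
    int value;
};

// Semantic routine: receives its operands directly as parameters.
ExprRecord doAdd(const ExprRecord& left, const ExprRecord& right) {
    return ExprRecord{left.value + right.value};
}

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // expression ::= operand { "+" operand }
    // The semantic record is returned from the parsing function.
    ExprRecord parseExpression() {
        ExprRecord result = parseOperand();
        while (pos_ < tokens_.size() && tokens_[pos_] == "+") {
            ++pos_;  // consume "+"
            result = doAdd(result, parseOperand());
        }
        return result;
    }

private:
    // operand ::= integer literal (no error handling in this sketch)
    ExprRecord parseOperand() {
        return ExprRecord{std::atoi(tokens_[pos_++].c_str())};
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

No explicit parse or semantic stack appears: the pending records live in local variables of the parsing functions, that is, on the implementation language's own call stack.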

Tokens

Information produced by the Scanner

Token Records

● Token records pass information about symbols scanned. This varies by token type.

● Variant records or type unions are typically used.

● Each value contains a tag - the token type - and additional information.
    » The tag is usually an integer.

Token Examples

● Simple tokens

● No additional info

● Only the tag field
    » endNum

● Others are more complex

● Tag plus other info
    » numeralNum
    » 35
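
The two examples can be sketched as a record with an integer tag and one extra field. The tags `endNum` and `numeralNum` come from the slide; `errorNum`, the `Token` layout and the toy `scan` function are assumptions added for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Token tags are small integers; endNum and numeralNum follow the slide,
// errorNum is an added assumption.
enum TokenTag { endNum = 0, numeralNum = 1, errorNum = 2 };

struct Token {
    TokenTag tag;  // always present
    int value;     // meaningful only when tag == numeralNum
};

// Scan one token starting at text[pos]; recognizes only "end" and numerals.
Token scan(const std::string& text, std::size_t& pos) {
    while (pos < text.size() && text[pos] == ' ') ++pos;
    if (text.compare(pos, 3, "end") == 0) {
        pos += 3;
        return Token{endNum, 0};  // simple token: only the tag matters
    }
    if (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
        int value = 0;
        while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
            value = value * 10 + (text[pos++] - '0');
        return Token{numeralNum, value};  // tag plus other info
    }
    return Token{errorNum, 0};
}
```

With more token kinds carrying different payloads, the single `value` field would become a union or variant selected by the tag.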

Handling Strings

● Strings are variable length and therefore present some problems.

● In C we can allocate a free-store object to hold the spelling, BUT allocation is expensive in time.

● In Pascal, allocating fixed length strings is wasteful.

● Spell buffers are an alternative.

Strings in the Free Store

write "The answer is: ", x;

strval = new char[16];

strval → The answer is: \0

The string is represented by the value of the pointer which can be passed around the compiler.

Strings in a Spell Buffer

write "The answer is: ", x;

before: [N a m e], 3

after: [N a m e T h e   a n s w e r   i s :  ], 18

The string is represented as (3,15) = (start, length)
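
A spell buffer can be sketched as one growing character array plus (start, length) pairs. The names `Spelling` and `SpellBuffer` are invented here, and the sketch assumes 0-based offsets, so its start values need not match the indexing used in the slide's figure.

```cpp
#include <cstddef>
#include <string>

// A string is represented by where it starts in the buffer and how long it is.
struct Spelling {
    std::size_t start;
    std::size_t length;
};

class SpellBuffer {
public:
    // Append the characters and return their (start, length) pair.
    Spelling add(const std::string& text) {
        Spelling s{buffer_.size(), text.size()};
        buffer_ += text;
        return s;
    }

    // Recover the characters when they are actually needed.
    std::string get(const Spelling& s) const {
        return buffer_.substr(s.start, s.length);
    }

private:
    std::string buffer_;  // one growing array shared by all spellings
};
```

The pair is small and fixed in size, so it can be stored in token and semantic records without a separate allocation per string; the buffer itself grows only occasionally.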

Semantic Information

Semantic Information

● Parsing and semantic routines need to share information.

● This information can be passed as function parameters or a semantic stack can be used.

● There are different kinds of semantic information.
    » Variant Records/Type Unions/Objects

Semantic Records

● Each record needs a tag to distinguish its kind. We need to test the tag types.

● Depending on the tag there will be additional information.

● Sometimes the additional information must itself be a tagged union/variant record.

Simple Semantic Records

[ identifier | maximum | 7 ]

[ addoperator | + ]

[ reloperator | <= ]

[ ifentry | J35 | J36 ]

Complex Semantic Records

[ typeentry ] → [ integer | 2 ] *

[ exprentry | const | 33 ]

[ exprentry | variable | 0, 6 | false ]

* see types (later)

Semantic Stack

In some compilers semantic records are stored in a semantic stack. In others, they are passed as parameters.

[ typeentry ] → [ integer | 2 ]

[ identifier | maximum | 7 ]

[ identifier | value | 5 ]

stacktop

Type Information

Type Information

● Type information must be maintained for variables and parameters.

● There are different kinds of types.
    » Variant Records/Type Unions/Objects

● There are different typing rules in different languages.
    » Pointers to records/structs are a simple representation.

Type Information

● Types describe variables.
    » size of a variable of this type (in bytes)
    » kind (tag)
    » additional information for some types.

● There are also recursive types.

Simple Types

[ integer | 2 ]

[ Boolean | 2 ]

[ character | 1 ]

The tag and the size are enough.

Tuple Type

[integer, Boolean]

[ tuple | 4 ]
    → [ integer | 2 ]
    → [ Boolean | 2 ]

Recursive Types

[integer, [integer, Boolean]]

[ tuple | 6 ]
    → [ integer | 2 ]
    → [ tuple | 4 ]
        → ...
        → ...

Range Types

integer range[1..10]

[ range | 2 | 1, 10 | ... ]
    → [ integer | 2 ]

Array Types

Boolean array[1..10][0..4]

[ array | 100 | 1, 10 ]
    → [ array | 10 | 0, 4 ]
        → [ Boolean | 2 ]
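
The sizes shown on the type slides can be reproduced with a small descriptor structure: a tag, a size in bytes, and pointers to component descriptors. This is a sketch under assumptions; `TypeKind`, `Type`, `makeSimple`, `makeTuple` and `makeArray` are hypothetical names, and only the simple, tuple and array kinds are covered.

```cpp
#include <memory>
#include <utility>
#include <vector>

enum class TypeKind { Integer, Boolean, Character, Tuple, Array };

struct Type;
using TypePtr = std::shared_ptr<Type>;

struct Type {
    TypeKind kind;                    // tag
    int size;                         // size in bytes of a variable of this type
    int low, high;                    // index bounds, used by arrays only
    std::vector<TypePtr> components;  // tuple fields, or the array element type
};

// For simple types the tag and the size are enough.
TypePtr makeSimple(TypeKind kind, int size) {
    return std::make_shared<Type>(Type{kind, size, 0, 0, {}});
}

// A tuple is as large as the sum of its components.
TypePtr makeTuple(std::vector<TypePtr> fields) {
    int size = 0;
    for (const TypePtr& f : fields) size += f->size;
    return std::make_shared<Type>(Type{TypeKind::Tuple, size, 0, 0, std::move(fields)});
}

// An array holds (high - low + 1) elements of the element type.
TypePtr makeArray(int low, int high, TypePtr element) {
    int size = (high - low + 1) * element->size;
    return std::make_shared<Type>(Type{TypeKind::Array, size, low, high, {std::move(element)}});
}
```

For `Boolean array[1..10][0..4]` the inner array is 5 × 2 = 10 bytes and the outer array is 10 × 10 = 100 bytes, matching the figure. Because components are shared pointers, several descriptors can refer to the same `integer` or `Boolean` descriptor.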

Array Types (alternate)

Boolean array [range1] [range2]

[ array | 100 ]
    → [ range | 2 | 1, 10 ] → [ integer | 2 ]
    → [ array | 10 ]
        → [ range | 2 | 0, 4 ] → [ integer | 2 ]
        → [ Boolean | 2 ]

Record Types

record [integer x, boolean y ]

[ record | 4 ]
    x → [ integer | 2 ]
    y → [ Boolean | 2 ]

Note similarity to tuple types.

Pointer Types

pointer [integer, Boolean]

[ pointer | 2 ]
    → [ tuple | 4 ]
        → [ integer | 2 ]
        → [ Boolean | 2 ]

Procedure Types

proc (integer, Boolean)

[ proc | 2 | ... ]
    → [ integer | 2 ]
    → [ Boolean | 2 ]

Note: Not all languages have procedure types even when they have procedures.

Function Types

func (integer returns [integer, Boolean])

[ func | 2 | ... ]
    → [ integer | 2 ]
    → [ tuple | 4 ]
        → [ integer | 2 ]
        → [ Boolean | 2 ]

Note: Not all languages have function types even when they have functions.

Self Recursive Types

Some languages (Java, Modula-3) permit a type to reference itself:

class node{
    int value;
    node next;
}

[ class | 8 ]
    value → [ int | 4 ]
    next → (back to this class descriptor)

The internal representation is a pointer (4 bytes)
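
Since descriptors refer to each other through pointers, a self-referencing type needs nothing special in the compiler's own data structures. The sketch below describes `class node`; `TypeDesc`, `Field` and `describeNode` are made-up names, and the 4-byte `int` and 4-byte reference follow the sizes on the slide.

```cpp
#include <string>
#include <vector>

struct TypeDesc;

struct Field {
    std::string name;
    const TypeDesc* type;  // a pointer, so a type may refer to itself
};

struct TypeDesc {
    std::string kind;      // tag, e.g. "int" or "class"
    int size;              // bytes occupied by an object of this type
    std::vector<Field> fields;
};

// Fill in descriptors for: class node { int value; node next; }
// Sizes assume a 4-byte int and a 4-byte reference, as on the slide.
void describeNode(TypeDesc& intType, TypeDesc& nodeType) {
    intType = TypeDesc{"int", 4, {}};
    nodeType = TypeDesc{"class", 0, {}};
    nodeType.fields.push_back(Field{"value", &intType});
    nodeType.fields.push_back(Field{"next", &nodeType});  // self reference
    const int referenceSize = 4;   // "next" is stored as a pointer
    nodeType.size = intType.size + referenceSize;         // 4 + 4 = 8
}
```

The size stays finite because the `next` field contributes only the size of a reference, not the size of another whole `node`.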

Recursive Types Again

[ record [integer array[0..4] x, Boolean y],
  integer range [1..10],
  pointer [integer, integer],
  func (integer, Boolean returns integer array[1..5])
]

Left as an exercise. :-)

