
2.2 A Simple Syntax-Directed Translator


• 2.3 Syntax-Directed Translation
• 2.4 Parsing
• 2.5 A Translator for Simple Expressions
• 2.6 Lexical Analysis
• 2.7 Symbol Tables
• 2.8 Intermediate Code Generation
• 2.9 Summary of Chapter 2


Syntax-Directed Translation

• Syntax-directed translation is done by attaching rules or program fragments to the productions of a grammar.

• For example, consider an expression expr generated by the production

    expr → expr1 + term

• We can translate expr by exploiting its structure, as in the following pseudo-code:

    translate expr1;
    translate term;
    handle +;


Concepts of Syntax-Directed Translation

• Attributes.
  – An attribute is any quantity associated with a programming construct. Examples of attributes are the data types of expressions, the number of instructions in the generated code, or the location of the first instruction in the generated code for a construct, among many other possibilities.

• (Syntax-directed) translation schemes.
  – A translation scheme is a notation for attaching program fragments to the productions of a grammar. The program fragments are executed when the production is used during syntax analysis.


Postfix Notation

• The postfix notation for an expression E can be defined inductively as follows:
  – If E is a variable or constant, then the postfix notation for E is E itself.
  – If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively.
  – If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same as the postfix notation for E1.


• Example 2.8: The postfix notation for (9-5)+2 is 95-2+.

• What about 9-(5+2)? Its postfix notation is 952+-.

• How would we evaluate such postfix strings?


• How do we evaluate a postfix expression?
  – The "trick" is to repeatedly scan the postfix string from the left until you find an operator.
  – Then look to the left for the proper number of operands and group this operator with its operands.
  – Evaluate the operator on the operands, and replace them by the result.
  – Repeat the process, continuing to the right and searching for another operator.
  – In practice the same procedure is usually carried out with a stack; see the sketch below.
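A concrete sketch (not part of the slides; the class and method names are ours): each operand is pushed onto a stack, and each operator pops its two operands and pushes the result, which is exactly the scan-and-replace procedure above.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Minimal sketch: evaluate a postfix string of single-digit operands
    // and the binary operators + - * /.
    public class PostfixEval {
        public static int eval(String postfix) {
            Deque<Integer> stack = new ArrayDeque<>();
            for (char c : postfix.toCharArray()) {
                if (Character.isDigit(c)) {
                    stack.push(c - '0');          // operand: push its value
                } else {
                    int right = stack.pop();      // operators are binary,
                    int left = stack.pop();       // so pop two operands
                    switch (c) {
                        case '+': stack.push(left + right); break;
                        case '-': stack.push(left - right); break;
                        case '*': stack.push(left * right); break;
                        case '/': stack.push(left / right); break;
                        default: throw new IllegalArgumentException("bad character: " + c);
                    }
                }
            }
            return stack.pop();                   // the single remaining value
        }

        public static void main(String[] args) {
            System.out.println(eval("95-2+"));    // (9-5)+2 = 6
            System.out.println(eval("952+-"));    // 9-(5+2) = 2
        }
    }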


• Example 2.9: Evaluate the following postfix notation:

952+-3*
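Working left to right: the first operator is +, which applies to 5 and 2 and is replaced by 7, leaving 97-3*; then - applies to 9 and 7, leaving 23*; finally * applies to 2 and 3, giving 6. So 952+-3* denotes (9-(5+2))*3 = 6, and eval("952+-3*") in the sketch above returns 6.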


Synthesized Attributes

• We associate attributes with nonterminals and terminals. Then we attach rules to the productions of the grammar; these rules describe how the attributes are computed at those nodes of the parse tree where the production in question is used to relate a node to its children. An attribute is said to be synthesized if its value at a parse-tree node is determined from the attribute values at the node's children (and at the node itself).
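To make this concrete, here is a small sketch (not taken from the slides; the class names are ours) in which each node of the parse tree for 9-5+2 carries a synthesized string attribute t holding its postfix translation. The rule attached to each production computes the parent's t from the children's t values, in the style of expr.t = expr1.t followed by term.t followed by '+'.

    // Sketch: parse-tree nodes with a synthesized attribute t (the postfix form).
    abstract class Node {
        String t;                          // synthesized attribute
    }

    class Leaf extends Node {              // a digit, e.g. term -> 9
        Leaf(char digit) { t = String.valueOf(digit); }
    }

    class Op extends Node {                // expr -> expr1 op term
        Op(Node left, Node right, char op) {
            t = left.t + right.t + op;     // rule: children's t values, then the operator
        }
    }

    public class Synthesized {
        public static void main(String[] args) {
            // tree for 9-5+2, grouped as (9-5)+2
            Node tree = new Op(new Op(new Leaf('9'), new Leaf('5'), '-'),
                               new Leaf('2'), '+');
            System.out.println(tree.t);    // prints 95-2+
        }
    }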


Tree Traversals

• Tree traversals will be used for describing attribute evaluation and for specifying the execution of code fragments in a translation scheme.
  – A traversal of a tree starts at the root and visits each node of the tree in some order.
  – A depth-first traversal starts at the root and recursively visits the children of each node in any order, not necessarily from left to right. It is called "depth-first" because it visits an unvisited child of a node whenever it can, so it visits nodes as far away from the root (as "deep") as quickly as it can. A small sketch follows.
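A minimal depth-first (here, postorder) traversal sketch with our own node class: the children of a node are visited recursively before the action at the node itself is performed.

    import java.util.List;

    // Sketch of a depth-first, postorder traversal.
    class TreeNode {
        String label;
        List<TreeNode> children;
        TreeNode(String label, List<TreeNode> children) {
            this.label = label;
            this.children = children;
        }
    }

    public class DepthFirst {
        static void postorder(TreeNode n) {
            for (TreeNode child : n.children) {
                postorder(child);              // visit every child first
            }
            System.out.print(n.label);         // then perform the action at the node
        }

        public static void main(String[] args) {
            // tree for 9-5+2: a '+' node over a '-' subtree and the leaf 2
            TreeNode tree =
                new TreeNode("+", List.of(
                    new TreeNode("-", List.of(
                        new TreeNode("9", List.of()),
                        new TreeNode("5", List.of()))),
                    new TreeNode("2", List.of())));
            postorder(tree);                   // prints 95-2+
            System.out.println();
        }
    }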


Translation Schemes

• Preorder and Postorder Traversals
  – In a preorder traversal, the action at a node is performed before the actions at its children; in a postorder traversal, the action at a node is performed just after the actions at all of its children.


Semantic Actions

• Program fragments embedded within production bodies are called semantic actions.


Example

Actions for translating 9-5+2 into its postfix form 95-2+; the scheme is sketched below.
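The scheme referred to (Fig. 2.15 in the textbook) is along these lines; each semantic action prints its symbol only after the corresponding operands have been processed, so parsing 9-5+2 emits 95-2+.

    expr → expr1 + term    { print('+') }
    expr → expr1 - term    { print('-') }
    expr → term
    term → 0               { print('0') }
    term → 1               { print('1') }
    ...
    term → 9               { print('9') }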


2.4 Parsing

• Parsing is the process of determining how a string of terminals can be generated by a grammar.
  – Top-Down Parsing: the parse tree is constructed starting at the root and working down toward the leaves.


Use of ε-Productions

• Predictive Parsing (self study)

• When to Use ε-Productions
  – A predictive parser uses an ε-production as a default when no other production can be used (see the sketch below).
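As an illustration, here is a sketch of a predictive, recursive-descent parser for the statement grammar used in the textbook's parsing section, stmt → expr ; | if ( expr ) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other, with optexpr → ε | expr. The tokens, class name, and error handling are our own simplification; the point is that optexpr falls back to its ε-production whenever the lookahead does not begin an expression.

    import java.util.List;

    // Sketch of a predictive (recursive-descent) parser. Tokens are simplified
    // to strings; "expr" and "other" stand for whole sub-phrases.
    public class PredictiveSketch {
        private final List<String> tokens;
        private int pos = 0;

        PredictiveSketch(List<String> tokens) { this.tokens = tokens; }

        private String lookahead() {
            return pos < tokens.size() ? tokens.get(pos) : "$";   // $ marks the end
        }

        private void match(String t) {
            if (lookahead().equals(t)) pos++;
            else throw new RuntimeException("syntax error: expected " + t);
        }

        void stmt() {
            switch (lookahead()) {
                case "expr":  match("expr"); match(";"); break;
                case "if":    match("if"); match("("); match("expr"); match(")"); stmt(); break;
                case "for":   match("for"); match("(");
                              optexpr(); match(";"); optexpr(); match(";"); optexpr();
                              match(")"); stmt(); break;
                case "other": match("other"); break;
                default:      throw new RuntimeException("syntax error at " + lookahead());
            }
        }

        void optexpr() {
            if (lookahead().equals("expr")) match("expr");
            // otherwise do nothing: the epsilon-production is the default
        }

        public static void main(String[] args) {
            // parse  for ( ; expr ; expr ) other
            new PredictiveSketch(
                List.of("for", "(", ";", "expr", ";", "expr", ")", "other")).stmt();
            System.out.println("parsed OK");
        }
    }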


2.5 A Translator for Simple Expressions

• Using the techniques of the last three sections, we now construct a syntax-directed translator that converts infix arithmetic expressions into postfix form.
  – A syntax-directed translation scheme often serves as the specification for a translator.
  – The scheme in Fig. 2.21 (repeated from Fig. 2.15) defines the translation to be performed here; a sketch of the resulting translator follows.
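A compact sketch of such a translator (our own condensed version rather than the textbook's exact code): it parses single-digit infix expressions with + and - and prints the postfix form as a side effect of parsing, implementing the left-recursive productions with a loop.

    // Sketch: recursive-descent translator from infix to postfix for
    //   expr -> expr + term { print('+') } | expr - term { print('-') } | term
    //   term -> 0 { print('0') } | ... | 9 { print('9') }
    public class InfixToPostfix {
        private final String input;
        private int pos = 0;

        InfixToPostfix(String input) { this.input = input; }

        private char lookahead() {
            return pos < input.length() ? input.charAt(pos) : '$';
        }

        void expr() {
            term();                              // an expr always starts with a term
            while (lookahead() == '+' || lookahead() == '-') {
                char op = lookahead();
                pos++;                           // consume the operator
                term();
                System.out.print(op);            // semantic action: print it after the term
            }
        }

        void term() {
            char c = lookahead();
            if (Character.isDigit(c)) {
                pos++;
                System.out.print(c);             // semantic action: print the digit
            } else {
                throw new RuntimeException("syntax error at position " + pos);
            }
        }

        public static void main(String[] args) {
            new InfixToPostfix("9-5+2").expr();  // prints 95-2+
            System.out.println();
        }
    }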


Abstract and Concrete Syntax

• A useful starting point for designing a translator is a data structure called an abstract syntax tree.

• In an abstract syntax tree for an expression, each interior node represents an operator; the children of the node represent the operands of the operator.
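For example, the abstract syntax tree for 9-5+2 has the operator + at its root; the left child of + is the subtree for 9-5 (a - node whose children are the operands 9 and 5), and the right child is the operand 2.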


Syntax Tree vs. Parse Tree

• In the syntax tree, interior nodes represent programming constructs, while in the parse tree, the interior nodes represent nonterminals.


2.6 Lexical Analysis

• A lexical analyzer reads characters from the input and groups them into "token objects."

• The lexical analyzer in this section allows numbers, identifiers, and "white space" (blanks, tabs, and newlines) to appear within expressions.


Reading Ahead

• A lexical analyzer may need to read ahead some characters before it can decide on the token to be returned to the parser.

• For example, a lexical analyzer for C or Java must read ahead after it sees the character >.
  – If the next character is =, then > is part of the character sequence >=, the lexeme for the token for the "greater than or equal to" operator.
  – Otherwise > itself forms the "greater than" operator, and the lexical analyzer has read one character too many, so it must give that character back; see the sketch below.
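A small sketch of this one-character read-ahead, using Java's PushbackReader to give the extra character back when it is not part of the token; the method and token names are ours.

    import java.io.IOException;
    import java.io.PushbackReader;
    import java.io.StringReader;

    // Sketch: after seeing '>', peek at the next character to decide between
    // the tokens GE (">=") and GT (">").
    public class ReadAhead {
        static String relop(PushbackReader in) throws IOException {
            int c = in.read();
            if (c != '>') throw new IllegalStateException("expected '>'");
            int next = in.read();
            if (next == '=') {
                return "GE";                 // lexeme ">="
            }
            if (next != -1) {
                in.unread(next);             // read one character too many: give it back
            }
            return "GT";                     // lexeme ">"
        }

        public static void main(String[] args) throws IOException {
            System.out.println(relop(new PushbackReader(new StringReader(">= b"))));  // GE
            System.out.println(relop(new PushbackReader(new StringReader("> b"))));   // GT
        }
    }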


Constants

• When a sequence of digits appears in the input, the lexical analyzer passes to the parser a token consisting of the terminal num together with an integer-valued attribute computed from the digits.


Recognizing Keywords and Identifiers

• Most languages use fixed character strings such as for, do, and if as punctuation marks or to identify constructs.

• Such character strings are called keywords.

• Character strings are also used as identifiers to name variables, arrays, functions, and the like.

• Grammars routinely treat identifiers as terminals to simplify the parser, which can then expect the same terminal, say id, each time any identifier appears in the input. For example, on the input

    count = count + increment ;                                   (2.6)


  the parser works with the terminal stream id = id + id. The token for id has an attribute that holds the lexeme.

• Writing tokens as tuples, we see that the tuples for the input stream (2.6) are

    ⟨id, "count"⟩ ⟨=⟩ ⟨id, "count"⟩ ⟨+⟩ ⟨id, "increment"⟩ ⟨;⟩

• Keywords generally satisfy the rules for forming identifiers, so a mechanism is needed for deciding when a lexeme forms a keyword and when it forms an identifier.

• The problem is easier to resolve if keywords are reserved; i.e., if they cannot be used as identifiers. Then, a character string forms an identifier only if it is not a keyword.
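A sketch of the usual reservation mechanism (in the spirit of the textbook's approach of pre-loading a table with keyword entries; the class and names here are ours): a scanned lexeme is a keyword exactly when it is found in the table, and an identifier otherwise.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: decide whether a scanned lexeme is a reserved keyword or an identifier.
    public class Keywords {
        private final Map<String, String> words = new HashMap<>();

        Keywords() {
            for (String kw : new String[] {"if", "else", "while", "do", "for"}) {
                words.put(kw, kw.toUpperCase());   // reserve the keywords ahead of time
            }
        }

        String tokenFor(String lexeme) {
            String keyword = words.get(lexeme);
            if (keyword != null) return keyword;   // reserved word: its own token
            return "id(" + lexeme + ")";           // otherwise an id token carrying the lexeme
        }

        public static void main(String[] args) {
            Keywords k = new Keywords();
            System.out.println(k.tokenFor("for"));    // FOR
            System.out.println(k.tokenFor("count"));  // id(count)
        }
    }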


2.7 Symbol Tables

• Symbol tables are data structures that are used by compilers to hold information about source-program constructs.

• The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code.

• Entries in the symbol table contain information about an identifier such as its character string (or lexeme), its type, its position in storage, and any other relevant information. Symbol tables typically need to support multiple declarations of the same identifier within a program.


Scope

• Symbol Table Per Scope: The term "scope of identifier x" really refers to the scope of a particular declaration of x.

• Scopes are important, because the same identifier can be declared for different purposes in different parts of a program.

• If blocks can be nested, several declarations of the same identifier can appear within a single block; the most closely nested declaration is the one that applies at a given use. A common implementation chains the symbol table of each scope to the table of its enclosing scope, as in the sketch below.
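A minimal chained-symbol-table sketch, modeled loosely on the textbook's per-scope tables (the names and the stored information here are ours): put records a declaration in the innermost scope's table, and get searches outward through the enclosing scopes, so the most closely nested declaration wins.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: one symbol table per scope, chained to the table of the enclosing scope.
    public class Scope {
        private final Map<String, String> table = new HashMap<>();
        private final Scope enclosing;              // null for the outermost scope

        Scope(Scope enclosing) { this.enclosing = enclosing; }

        void put(String id, String info) {          // record a declaration, e.g. its type
            table.put(id, info);
        }

        String get(String id) {                     // find the most closely nested declaration
            for (Scope s = this; s != null; s = s.enclosing) {
                String info = s.table.get(id);
                if (info != null) return info;
            }
            return null;                            // undeclared
        }

        public static void main(String[] args) {
            Scope outer = new Scope(null);
            outer.put("x", "int");
            Scope inner = new Scope(outer);
            inner.put("x", "bool");                 // a nested declaration of x
            System.out.println(inner.get("x"));     // bool (innermost wins)
            System.out.println(outer.get("x"));     // int
        }
    }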


2.8 Intermediate Code Generation

• Two Kinds of Intermediate Representations
  – Trees, including parse trees and (abstract) syntax trees.
  – Linear representations, especially "three-address code."


Construction of Syntax Trees


L-values and R-values

• There is a distinction between the meaning of identifiers on the left and right sides of an assignment. In assignments such as i = 5; and i = i + 1; the right side specifies an integer value (an r-value), while the left side specifies where the value is to be stored (an l-value).


Type Checking

• Type checking assures that the type of each operand matches the type expected by its operator or by the surrounding context.


Three-Address Code

• Three-address code is a sequence of instructions of the form x = y op z, where x, y, and z are names, constants, or compiler-generated temporaries, and op is an operator.


2.9 Summary of Chapter 2