Page 1: Syntax analysis

SYNTAX ANALYSIS

Page 2: Syntax analysis

Syntax Analyzer

Syntax Analyzer creates the syntactic structure of the given source program.

This syntactic structure is the parse tree.

The syntax analyzer is also known as the parser.

The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not.

If it satisfies, the parser creates the parse tree of that program.

Otherwise, the parser reports error messages.

Page 3: Syntax analysis

INTRODUCTION

Every programming language has precise rules that prescribe the syntactic structure of well-formed programs.

A program is made up of functions, a function of declarations and statements, and a statement of expressions.

The syntax of programming language constructs can be specified by context-free grammars.

A context-free grammar

gives a precise syntactic specification of a programming language.

the design of the grammar is an initial phase of the design of a compiler.

a grammar can be directly converted into a parser by some tools.

Page 4: Syntax analysis

Parser

• Parser works on a stream of tokens.

• The smallest item is a token.

[Diagram: the source program flows into the Lexical Analyzer, which supplies a token to the Parser each time the Parser requests the next token; the Parser produces the parse tree.]

Page 5: Syntax analysis

Parsing

Parsing is the process of determining whether a string of tokens can be generated by a grammar.

Parsing methods:

Top-down methods.

Bottom-up methods.

In top-down parsing, construction starts at the root and proceeds to the leaves.

In bottom-up parsing, construction starts at the leaves and proceeds towards the root.

Top-down parsers are easy to build by hand.

Bottom-up parsers can handle a larger class of grammars. They are not as easy to build, but tools for generating them directly from a grammar are available.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

Page 6: Syntax analysis

Top-Down Parsing

Top-down parsing is done by starting with the root, labeled with the starting nonterminal stmt, and repeatedly performing the following two steps:

At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body.

Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal of the tree.

The current terminal being scanned in the input is frequently referred to as the lookahead symbol.

Page 7: Syntax analysis

Top-Down Parsing

Page 8: Syntax analysis

Top-Down Parsing

Page 9: Syntax analysis

Top-Down Parsing

Page 10: Syntax analysis

Top-Down Parsing

Top-down parsing is an attempt to find a left-most derivation for an input string.

Example:

S → cAd
A → ab | a

Find a derivation for w = cad:

      S                 S      Backtrack       S
    / | \             / | \                  / | \
   c  A  d           c  A  d                c  A  d
                       / \                     |
                      a   b                    a
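To make the backtracking concrete, here is a minimal sketch (not from the slides) of a recursive-descent parser with backtracking for this grammar; the function names and token handling are illustrative assumptions.

# Sketch: recursive descent with backtracking for S -> cAd, A -> ab | a.
def parse_S(tokens, pos):
    # S -> c A d : try each way A can match and backtrack if the rest
    # of S's body cannot be matched afterwards.
    if pos < len(tokens) and tokens[pos] == 'c':
        for after_A in parse_A(tokens, pos + 1):
            if after_A < len(tokens) and tokens[after_A] == 'd':
                return after_A + 1
    return None  # failure

def parse_A(tokens, pos):
    # Yield the end position for every alternative of A that matches,
    # in the order A -> ab, then A -> a (as on the slide).
    if pos + 1 < len(tokens) and tokens[pos] == 'a' and tokens[pos + 1] == 'b':
        yield pos + 2
    if pos < len(tokens) and tokens[pos] == 'a':
        yield pos + 1

w = list("cad")
print(parse_S(w, 0) == len(w))   # True: cad is derivable from S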

Page 11: Syntax analysis

Predictive Parsing

Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.

A simple form of recursive-descent parsing is predictive parsing.

Page 12: Syntax analysis

Syntax Error Handling

Goals in error handling

Report the presence of errors clearly and accurately.

Recover from each error quickly enough to detect subsequent errors.

Add minimal overhead to the processing of correct programs.

Page 13: Syntax analysis

Error-Recovery Strategies

The simplest approach is for the parser to quit with an informative error message when it detects the first error.

Panic-mode recovery

Phrase-level recovery

Error-productions

Global-correction.

Page 14: Syntax analysis

Panic-Mode Recovery

The parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.

The synchronizing tokens are usually delimiters, such as ; or }.

It may skip a considerable amount of input without checking it for additional errors.

It has the advantage of simplicity, and is guaranteed not to go into an infinite loop.
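As a rough illustration (not from the slides), panic-mode recovery amounts to skipping tokens until a synchronizing token; the token names below are assumptions.

# Sketch of panic-mode recovery (illustrative; token names are assumed).
SYNCHRONIZING = {';', '}'}

def panic_mode_recover(tokens, pos):
    # Discard input symbols one at a time until a synchronizing token is
    # found; resume parsing just after it (or at the end of the input).
    while pos < len(tokens) and tokens[pos] not in SYNCHRONIZING:
        pos += 1
    return min(pos + 1, len(tokens))

# Example: after an error at '+' in "x = 1 + ; y = 2 ;" we skip to the first ';'.
tokens = ['x', '=', '1', '+', ';', 'y', '=', '2', ';']
print(panic_mode_recover(tokens, 3))   # -> 5, parsing resumes at 'y'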

Page 15: Syntax analysis

Phrase-Level Recovery

Perform local correction on the remaining input.

It may replace a prefix of the remaining input by some string that allows the parser to continue.

A typical local correction is to replace a comma by a semicolon.

Delete an extraneous semicolon.

Insert a missing semicolon.

Its disadvantage is difficulty in coping with situations in which the actual error occurred before the point of detection.

Page 16: Syntax analysis

Error Productions

Expand the grammar for the language at hand with productions that generate the erroneous constructs.

The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.

Page 17: Syntax analysis

Global Correction

Ideally, the compiler should make as few changes as possible in processing an incorrect input string.

Given an incorrect input string x and grammar G, algorithms exist that find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.

This approach is generally not implemented in practice.

Page 18: Syntax analysis

Syntax Definition

A grammar describes the hierarchical structure of programming language constructs.

Eg: if ( expression ) statement else statement

An if-else statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement.

stmt → if ( expr ) stmt else stmt

Such a rule is called a production.

In a production, lexical elements such as if and the parentheses are called terminals.

Variables like expr and stmt are called nonterminals.

Page 19: Syntax analysis

A Context Free Grammar

A context-free grammar has four components:

A set of terminal symbols, sometimes referred to as "tokens."

A set of nonterminals, sometimes called "syntactic variables."

A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production.

A designation of one of the nonterminals as the start symbol.

Page 20: Syntax analysis

A Context Free Grammar

The terminal symbols are

Page 21: Syntax analysis

Notational Conventions

These symbols are terminals:

Lowercase letters early in the alphabet, such as a, b, c.

Operator symbols such as +, *, and so on.

Punctuation symbols such as parentheses, comma, and so on.

The digits 0, 1, . . . , 9.

Boldface strings such as id or if, each of which represents a single terminal symbol.

Page 22: Syntax analysis

Notational Conventions

These symbols are nonterminals:

Uppercase letters early in the alphabet, such as A, B, C.

The letter S, which, when it appears, is usually the start symbol.

Lowercase, italic names such as expr or stmt.

Uppercase letters may be used to represent nonterminals for particular constructs. For example, nonterminals for expressions, terms, and factors are often represented by E, T, and F, respectively.

Page 23: Syntax analysis

Notational Conventions

Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either nonterminals or terminals.

Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.

Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar symbols.

A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written A → α1 | α2 | ... | αk. We call α1, α2, ..., αk the alternatives for A.

Unless stated otherwise, the head of the first production is the start symbol.

Page 24: Syntax analysis

Notational Conventions

Page 25: Syntax analysis

Derivations

E ⇒ E+E : E+E derives from E

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, and α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)

⇒   : derives in one step
⇒*  : derives in zero or more steps
⇒+  : derives in one or more steps

Page 26: Syntax analysis

CFG - Terminology

L(G) is the language of G (the language generated by G), which is a set of sentences.

A sentence of L(G) is a string of terminal symbols of G.

If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G.

If G is a context-free grammar, L(G) is a context-free language.

Two grammars are equivalent if they produce the same language.

S ⇒* α
- If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.

Page 27: Syntax analysis

Derivation Example

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

OR

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.

If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.

If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Page 28: Syntax analysis

Left-Most and Right-Most Derivations

Left-Most Derivation

E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation

E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)

We will see that top-down parsers try to find the left-most derivation of the given source program.

We will see that bottom-up parsers try to find the right-most derivation of the given source program in reverse order.

Page 29: Syntax analysis

Parse Trees and Derivations

A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals.

Each interior node is labeled with the nonterminal A in the head of a production; the children of the node are labeled, from left to right, by the symbols in the body of that production.

The leaves of a parse tree are labeled by nonterminals or terminals and, read from left to right, constitute a sentential form, called the yield or frontier of the tree.

There is a many-to-one relationship between derivations and parse trees.

Page 30: Syntax analysis

Ambiguity

A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

Page 31: Syntax analysis


Page 32: Syntax analysis

Writing a Grammar

Grammars are capable of describing most of the syntax of programming languages.

The grammar should be unambiguous.

Left-recursion elimination and left factoring are useful transformations for rewriting grammars.

From the resulting grammar we can create top-down parsers without backtracking.

Such parsers are called predictive parsers or recursive-descent parsers.

Page 33: Syntax analysis

Eliminating Ambiguity

An ambiguous grammar can be rewritten to eliminate the ambiguity.

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

is ambiguous, since the string

if E1 then if E2 then S1 else S2

has the two parse trees shown on the next slide.

Page 34: Syntax analysis

Two parse trees for an ambiguous sentence

Page 35: Syntax analysis

Eliminating Ambiguity

The general rule is, "Match each else with the closest unmatched then."

Page 36: Syntax analysis

Left Recursion

A grammar is left recursive if it has a non-terminal A such that there is a derivation

A ⇒+ Aα for some string α.

Top-down parsing techniques cannot handle left-recursive grammars.

The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation.

Page 37: Syntax analysis

Immediate Left-Recursion

A → Aα | β    where β does not start with A

Eliminate the immediate left recursion:

A  → βA'
A' → αA' | ε

In general,

A → Aα1 | ... | Aαm | β1 | ... | βn    where β1, ..., βn do not start with A

Eliminate the immediate left recursion:

A  → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε    (an equivalent grammar)

Page 38: Syntax analysis

Left-Recursion -- Problem

• A grammar may not be immediately left-recursive, but it can still be left-recursive.

S → Aa | b
A → Sc | d

S ⇒ Aa ⇒ Sca and A ⇒ Sc ⇒ Aac cause a left-recursion.

Page 39: Syntax analysis

Eliminate Left-Recursion -- Algorithm

- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
-   for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
  - eliminate the immediate left-recursions among the Ai productions
}
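A compact sketch of this algorithm in Python (illustrative, not from the slides; the grammar representation as a dict of alternatives is an assumption, and it presumes no cycles or ε-productions, as the textbook algorithm requires):

# Grammar representation (assumed): non-terminal -> list of alternatives,
# where each alternative is a list of symbols.
def eliminate_left_recursion(grammar, order):
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj gamma by Ai -> alpha1 gamma | ... | alphak gamma
            new_alts = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    for alpha in g[aj]:
                        new_alts.append(alpha + alt[1:])
                else:
                    new_alts.append(alt)
            g[ai] = new_alts
        _eliminate_immediate(g, ai)
    return g

def _eliminate_immediate(g, a):
    recursive = [alt[1:] for alt in g[a] if alt and alt[0] == a]    # A -> A alpha
    others    = [alt for alt in g[a] if not (alt and alt[0] == a)]  # A -> beta
    if not recursive:
        return
    a_prime = a + "'"
    g[a] = [beta + [a_prime] for beta in others]                    # A  -> beta A'
    g[a_prime] = [alpha + [a_prime] for alpha in recursive] + [[]]  # A' -> alpha A' | eps

# Example from the next slide: S -> Aa | b, A -> Ac | Sd | f, order S, A
grammar = {'S': [['A', 'a'], ['b']], 'A': [['A', 'c'], ['S', 'd'], ['f']]}
print(eliminate_left_recursion(grammar, ['S', 'A']))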

Page 40: Syntax analysis

Eliminate Left-Recursion

S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: S, A

- A → Ac | Aad | bd | f

- Eliminate the immediate left-recursion in A:

  A  → bdA' | fA'
  A' → cA' | adA' | ε

So, the resulting equivalent grammar, which is not left-recursive, is:

S  → Aa | b
A  → bdA' | fA'
A' → cA' | adA' | ε

Page 41: Syntax analysis

Eliminate Left-Recursion – Example 2

S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

- Eliminate the immediate left-recursion in A:

  A  → SdA' | fA'
  A' → cA' | ε

- Replace S → Aa with S → SdA'a | fA'a

- Eliminate the immediate left-recursion in S:

  S  → fA'aS' | bS'
  S' → dA'aS' | ε

So, the resulting equivalent grammar, which is not left-recursive, is:

S  → fA'aS' | bS'
S' → dA'aS' | ε
A  → SdA' | fA'
A' → cA' | ε

Page 42: Syntax analysis

Left-Recursive Grammars III

Here is an example of a (directly) left-recursive grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

This grammar can be re-written as the following non-left-recursive grammar:

E → T E'      E' → + T E' | ε
T → F T'      T' → * F T' | ε
F → ( E ) | id
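To connect this to recursive-descent parsing, here is a minimal sketch (not from the slides) of a predictive recursive-descent parser for the rewritten grammar; the tokenization and method names are illustrative assumptions.

# Sketch: predictive recursive-descent parser for
#   E -> T E',  E' -> + T E' | eps
#   T -> F T',  T' -> * F T' | eps
#   F -> ( E ) | id
# Tokens are pre-split strings; 'id' stands for any identifier token.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ['$']   # end-of-input marker
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead() == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t}, found {self.lookahead()}")

    def E(self):            # E -> T E'
        self.T(); self.E_prime()

    def E_prime(self):      # E' -> + T E' | eps
        if self.lookahead() == '+':
            self.match('+'); self.T(); self.E_prime()

    def T(self):            # T -> F T'
        self.F(); self.T_prime()

    def T_prime(self):      # T' -> * F T' | eps
        if self.lookahead() == '*':
            self.match('*'); self.F(); self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.lookahead() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

p = Parser(['id', '+', 'id', '*', 'id'])
p.E()
p.match('$')          # raises SyntaxError if the input is not a valid expression
print("accepted")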

Page 43: Syntax analysis

Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing.

stmt → if expr then stmt else stmt
     | if expr then stmt

A → αβ1 | αβ2

so it should be left factored as

A  → αA'
A' → β1 | β2

Page 44: Syntax analysis
Page 45: Syntax analysis

Left-Factoring -- Algorithm

For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix α

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A  → αA' | γ1 | ... | γm
A' → β1 | ... | βn
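A small sketch of this transformation in Python (illustrative, not from the slides; it performs one factoring pass, grouping alternatives by their first symbol, and uses the same assumed dict-of-alternatives grammar representation as the earlier sketch):

def common_prefix(alts):
    # Longest common prefix (as a list of symbols) of all alternatives.
    prefix = []
    for symbols in zip(*alts):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor_once(grammar, a):
    # Group alternatives of A by their first symbol and factor out the
    # longest common prefix of each group with two or more members.
    groups = {}
    for alt in grammar[a]:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)
    new_alts, counter = [], 0
    for key, alts in groups.items():
        if key is None or len(alts) < 2:
            new_alts.extend(alts)
            continue
        alpha = common_prefix(alts)
        counter += 1
        a_new = a + "'" * counter               # fresh non-terminal A', A'', ...
        new_alts.append(alpha + [a_new])        # A  -> alpha A'
        grammar[a_new] = [alt[len(alpha):] for alt in alts]   # A' -> beta_i ([] is eps)
    grammar[a] = new_alts

# Example from the next slide: A -> abB | aB | cdg | cdeB | cdfB
g = {'A': [['a','b','B'], ['a','B'], ['c','d','g'], ['c','d','e','B'], ['c','d','f','B']]}
left_factor_once(g, 'A')
print(g)   # A -> a A' | cd A'',  A' -> bB | B,  A'' -> g | eB | fB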

Page 46: Syntax analysis

Left-Factoring – Example1

A → abB | aB | cdg | cdeB | cdfB

A  → aA' | cdg | cdeB | cdfB
A' → bB | B

A   → aA' | cdA''
A'  → bB | B
A'' → g | eB | fB

Page 47: Syntax analysis

Left-Factoring – Example2

A → ad | a | ab | abc | b

A  → aA' | b
A' → d | ε | b | bc

A   → aA' | b
A'  → d | ε | bA''
A'' → ε | c

Page 48: Syntax analysis

Top-Down Parsing

The parse tree is created top to bottom.

Top-down parser

Recursive-descent parsing

Backtracking is needed.

It is a general parsing technique, but not widely used.

Not efficient.

Predictive parsing

No backtracking.

Efficient.

Needs a special form of grammar: LL(1) grammars.

Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.

A non-recursive (table-driven) predictive parser is also known as an LL(1) parser.

Page 49: Syntax analysis

Recursive Predictive Parsing

Each non-terminal corresponds to a procedure.

Ex: A → aBb

proc A {
  - match the current token with a, and move to the next token;
  - call ‘B’;
  - match the current token with b, and move to the next token;
}

Page 50: Syntax analysis

Recursive Predictive Parsing (cont.)

A → aBb | bAB

proc A {
  case of the current token {
    ‘a’: - match the current token with a, and move to the next token;
         - call ‘B’;
         - match the current token with b, and move to the next token;
    ‘b’: - match the current token with b, and move to the next token;
         - call ‘A’;
         - call ‘B’;
  }
}
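A runnable version of this procedure in Python (a minimal sketch, not from the slides; the productions for B are not given on the slide, so a placeholder B → c is assumed purely for illustration):

# Sketch of recursive predictive parsing for A -> aBb | bAB.
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ['$']
        self.pos = 0

    def match(self, t):
        if self.tokens[self.pos] == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t}, found {self.tokens[self.pos]}")

    def A(self):
        tok = self.tokens[self.pos]
        if tok == 'a':            # A -> a B b
            self.match('a'); self.B(); self.match('b')
        elif tok == 'b':          # A -> b A B
            self.match('b'); self.A(); self.B()
        else:
            raise SyntaxError(f"unexpected token {tok} while parsing A")

    def B(self):                  # assumed placeholder production B -> c
        self.match('c')

p = PredictiveParser("bacbc")     # b A B, with A = aBb and B = c
p.A()
p.match('$')
print("accepted")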

Page 51: Syntax analysis

Top-down parse for id + id * id

Page 52: Syntax analysis
Page 53: Syntax analysis

FIRST and FOLLOW

FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol.

FIRST(α), where α is any string of grammar symbols, is the set of terminals that begin strings derived from α.

If α ⇒* ε, then ε is also in FIRST(α).

A ⇒ cγ, so c is in FIRST(A).

FOLLOW(A) is the set of terminals which occur immediately after (follow) the non-terminal A in strings derived from the start symbol:

a terminal a is in FOLLOW(A) if S ⇒* αAaβ

$ is in FOLLOW(A) if S ⇒* αA

Page 54: Syntax analysis

FIRST

1. If X is a terminal, then FIRST(X) = {X}.

2. If X is a nonterminal and X → Y1 Y2 ... Yk is a production for some k ≥ 1, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 Y2 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

3. If X → ε is a production, then add ε to FIRST(X).
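As an illustration of these rules (not from the slides), here is a small Python sketch that computes FIRST sets by iterating the three rules to a fixed point; the grammar representation is an assumption, with ε written as the empty alternative.

# Sketch: compute FIRST sets by applying the three rules until nothing changes.
# Grammar representation (assumed): non-terminal -> list of alternatives,
# each a list of symbols; [] stands for an epsilon body.
EPS = 'ε'

def first_sets(grammar, terminals):
    first = {t: {t} for t in terminals}                      # rule 1
    first.update({a: set() for a in grammar})
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                if not alt:                                   # rule 3: A -> ε
                    changed |= EPS not in first[a]; first[a].add(EPS)
                    continue
                for y in alt:                                 # rule 2
                    before = len(first[a])
                    first[a] |= first[y] - {EPS}
                    changed |= len(first[a]) != before
                    if EPS not in first[y]:
                        break
                else:                                         # every Yi derives ε
                    changed |= EPS not in first[a]; first[a].add(EPS)
    return first

# Expression grammar from the earlier slide:
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}
print(first_sets(grammar, {'+', '*', '(', ')', 'id'}))
# e.g. FIRST(E) = FIRST(T) = FIRST(F) = {'(', 'id'}, FIRST(E') = {'+', 'ε'}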

Page 55: Syntax analysis

FOLLOW

Page 56: Syntax analysis
Page 57: Syntax analysis

LL ( 1 ) Grammars

L : scanning the input from left to right

L : producing a leftmost derivation

1 : one input symbol of lookahead at each step

A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:

1. For no terminal a do both α and β derive strings beginning with a.

2. At most one of α and β can derive the empty string.

3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A), and likewise with α and β interchanged.
Page 58: Syntax analysis

Construction of a predictive parsing table.
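The table-construction algorithm itself is not in the transcript; as a hedged sketch of the standard method (for each production A → α, add it to M[A, a] for every a in FIRST(α), and to M[A, b] for every b in FOLLOW(A) when ε is in FIRST(α)), using hard-coded FIRST and FOLLOW sets for the expression grammar so the example stays self-contained:

# Sketch of predictive (LL(1)) parsing table construction (illustrative).
EPS = 'ε'

def first_of_string(alpha, first):
    # FIRST of a string of grammar symbols alpha (given as a list).
    result = set()
    for x in alpha:
        result |= first[x] - {EPS}
        if EPS not in first[x]:
            return result
    result.add(EPS)                   # every symbol derives ε (or alpha is empty)
    return result

def build_ll1_table(grammar, first, follow):
    table = {}                        # (non-terminal, terminal) -> production body
    for a, alts in grammar.items():
        for alt in alts:
            fs = first_of_string(alt, first)
            for t in fs - {EPS}:
                _add(table, a, t, alt)
            if EPS in fs:
                for t in follow[a]:   # includes '$' for the end of input
                    _add(table, a, t, alt)
    return table

def _add(table, a, t, alt):
    if (a, t) in table and table[(a, t)] != alt:
        raise ValueError(f"grammar is not LL(1): conflict at M[{a}, {t}]")
    table[(a, t)] = alt

grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], []],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], []],
    'F':  [['(', 'E', ')'], ['id']],
}
# FIRST and FOLLOW sets for this grammar, stated directly for brevity:
first = {
    '+': {'+'}, '*': {'*'}, '(': {'('}, ')': {')'}, 'id': {'id'},
    'E': {'(', 'id'}, 'T': {'(', 'id'}, 'F': {'(', 'id'},
    "E'": {'+', EPS}, "T'": {'*', EPS},
}
follow = {
    'E': {')', '$'}, "E'": {')', '$'},
    'T': {'+', ')', '$'}, "T'": {'+', ')', '$'},
    'F': {'+', '*', ')', '$'},
}
table = build_ll1_table(grammar, first, follow)
print(table[('E', 'id')])   # ['T', "E'"]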

