Lecture 4. Parsing (syntax analysis) - Iowa State...

Post on 11-Jul-2020

Lecture 4. Parsing (syntax analysis)

Wei Le

2015.9

Outline

I review: languages and grammars

I introduction of parsing

I CFG: context free grammars

I derivation

I ambiguity

Languages

I a theoretical concept

I a set of strings: the set can be infinite

I Chomsky Hierarchy (1956): regular (type 3), context-free (type 2), context-sensitive (type 1), recursively enumerable (type 0)

I Computational complexity: the expressive power of (and the resources needed in order to process) classes of languages

I The more restricted the rules are, the lower in the hierarchy the languages they generate are

Grammars

I A specification of languages: syntax

I Grammars are used for compression (biology): learn a grammar from a set of strings

I generators (produce strings) or recognizers (match production rules)

I Compiler: recognizers

I A grammar G is a 4-tuple:
  I Σ: a set of terminals
  I V: a set of non-terminals
  I S: start symbol
  I a set of production rules

I L(G ) represents the language of the grammar
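The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not part of the original slides: the class name Grammar, the example grammar S → aSb | ε, and the helper generate are all illustrative assumptions. It uses the grammar as a generator and enumerates the short strings of L(G).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Σ
    nonterminals: frozenset   # V
    start: str                # S
    productions: tuple        # pairs (left side, right side as a tuple of symbols)


# Hypothetical example grammar: S -> a S b | ε
G = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ())),
)


def generate(grammar, max_len):
    """Return all strings of L(G) with at most max_len terminals.

    Assumption: every recursive production adds at least one terminal,
    so pruning on the terminal count makes the search terminate.
    """
    results, seen = set(), set()
    frontier = [(grammar.start,)]
    while frontier:
        form = frontier.pop()
        if form in seen:
            continue
        seen.add(form)
        if sum(1 for s in form if s in grammar.terminals) > max_len:
            continue  # too long already, prune this sentential form
        # Expand the leftmost non-terminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar.nonterminals), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a string of L(G)
            continue
        for left, right in grammar.productions:
            if left == form[idx]:
                frontier.append(form[:idx] + right + form[idx + 1:])
    return sorted(results, key=lambda s: (len(s), s))
```

Under these assumptions, generate(G, 4) lists the empty string, ab, and aabb.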

What Can Regular Languages Express?

I Finite number of characters

I Infinite, allow repeated patterns – intuition: a finite automaton that runs long enough must repeat states

I Strings whose patterns relate to a fixed number of states

I A finite automaton can't remember the number of times it has visited a particular state
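As an illustration of the points above, here is a small sketch of a finite automaton for a*b*. The state names and the transition table are assumptions made for this example. The automaton has a fixed set of states and no counter, so it accepts aab just as readily as aabb; it cannot check that the number of a's equals the number of b's.

```python
# Sketch of a DFA for a*b*. States (hypothetical names):
#   "A"    still reading a's
#   "B"    reading b's
#   "dead" an a appeared after a b, or an unknown character was read
TRANSITIONS = {
    ("A", "a"): "A",
    ("A", "b"): "B",
    ("B", "b"): "B",
    ("B", "a"): "dead",
}


def accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = "A"
    for ch in s:
        # Any missing transition leads to (and stays in) the dead state.
        state = TRANSITIONS.get((state, ch), "dead")
    return state in {"A", "B"}
```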

Context Free Grammar

I the production does not depend on the context

I no terminals on the left side

I A can be replaced by α regardless of the context in which A appears

Examples

The set of strings is expressed using the following:

I {a∗b∗}

I {a^n b^n : n > 0} (e.g., "(())")

I {a^n b^n c^n : n > 0}

Examples

S → aSb

S → ε
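The two productions S → aSb and S → ε translate almost line by line into a recursive recognizer. This is only a sketch; the function name matches_S is an assumption. Note that S → ε also admits the empty string (n = 0).

```python
def matches_S(s):
    """Recognize the strings derivable from S -> a S b | ε."""
    if s == "":
        return True                   # S -> ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return matches_S(s[1:-1])     # S -> a S b: strip one a and one b
    return False
```

The recursion depth plays the role of the counter that a finite automaton lacks.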

I In every derivation of a context-sensitive grammar, the length of the string never decreases

I The term "context-sensitive" comes from a normal form for these grammars, where each production is of the form α1Aα2 → α1βα2 with β ≠ ε

I They permit replacement of the variable A by the string β only in the "context" α1 - α2

Chomsky Hierarchy: A Summary

Programming Languages and Natural Languages

I Programming languages:
  I C: all the programs written in C
  I modern programming languages: context free + some context-sensitive features

I Natural language: English is not regular, not context free (how to prove it?)

Parser

Lexer: scanner, lexical analyzer

Parser: syntax analyzer

Example

Role of Parsers

Parsing and CFG

I Recognize whether a string (program) is in the language (yes or no)

I Generate a parse tree from the input

I Report/handle errors

I Form of the grammar is important
  I Many grammars generate the same language
  I Tools (e.g., Bison) are sensitive to the grammar

Context Free Grammars

BNF: Backus-Naur Form

I developed in the late 1950s and early 1960s by John Backus and Peter Naur

I a notation system for context free grammars

I <SheepNoise> ::= baa <SheepNoise> | baa, where ::= reads "derives"

I Today, many books use an updated form of BNF that writes → instead of ::=
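The SheepNoise rule can be read both as a generator and as a recognizer. The sketch below assumes that the terminal baa is a whitespace-separated token; the function names are illustrative only.

```python
def sheep_noise(n):
    """Generate the sentence with n copies of baa (n >= 1).

    Applies SheepNoise -> baa SheepNoise (n - 1) times, then SheepNoise -> baa.
    """
    if n < 1:
        raise ValueError("the grammar derives at least one baa")
    if n == 1:
        return "baa"                         # SheepNoise -> baa
    return "baa " + sheep_noise(n - 1)       # SheepNoise -> baa SheepNoise


def is_sheep_noise(text):
    """Recognize sentences of the SheepNoise grammar (tokens split on whitespace)."""
    words = text.split()
    return len(words) >= 1 and all(w == "baa" for w in words)
```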

Example CFGs

More Understandings for CFGs

Key Ideas

The Language of a CFG

Terminals

Examples

Derivations

I A parse tree has
  I Terminals at the leaves
  I Non-terminals at the interior nodes

I An in-order traversal of the leaves is the original input

I In-order traversal: left subtree, root, right subtree

I The parse tree shows the associativity of operators; the input string does not

I In programming languages, the associativity (or fixity) of an operator is a property that determines how operators of the same precedence are grouped in the absence of parentheses.
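To make the last two points concrete, here is a sketch with two parse trees for the same input 5 - 3 - 1. It assumes a hypothetical grammar E → E - E | num, which is not taken from the slides. Both trees have the same leaves in the same order, but they group the operators differently and therefore evaluate to different values.

```python
from dataclasses import dataclass


@dataclass
class Node:
    symbol: str      # non-terminal at an interior node
    children: list   # each child is a Node or a terminal (plain str)


def leaves(tree):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree.children:
        out.extend(leaves(child))
    return out


def evaluate(tree):
    """Evaluate a tree of the assumed grammar E -> E - E | num."""
    if isinstance(tree, str):
        return int(tree)
    if len(tree.children) == 1:              # E -> num
        return evaluate(tree.children[0])
    left, _minus, right = tree.children      # E -> E - E
    return evaluate(left) - evaluate(right)


def num(text):
    return Node("E", [text])


# (5 - 3) - 1: left-associative grouping
left_tree = Node("E", [Node("E", [num("5"), "-", num("3")]), "-", num("1")])
# 5 - (3 - 1): right-associative grouping
right_tree = Node("E", [num("5"), "-", Node("E", [num("3"), "-", num("1")])])
```

Here leaves(left_tree) and leaves(right_tree) are identical, while evaluate gives 1 for the left-associative tree and 3 for the right-associative one.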

Leftmost and Rightmost Derivations

Rightmost Derivations

Summary of Derivations

Revisiting Parsing

I A parser knows the grammar of the programming language

I The parser finds the derivation of a particular input
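A derivation can be replayed mechanically once the sequence of productions is known. The sketch below assumes a hypothetical grammar E → E + E | id and a function name chosen for this example. It rewrites either the leftmost or the rightmost non-terminal at each step and records every sentential form.

```python
def derive(start, nonterminals, steps, rightmost=False):
    """Apply productions in order and return the list of sentential forms.

    steps is a list of (left side, right side) pairs; each one is applied to
    the leftmost (or rightmost) non-terminal of the current form.
    """
    form = [start]
    forms = [" ".join(form)]
    for left, right in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        idx = positions[-1] if rightmost else positions[0]
        if form[idx] != left:
            raise ValueError(f"expected {left} at position {idx}, found {form[idx]}")
        form[idx:idx + 1] = right            # replace the non-terminal in place
        forms.append(" ".join(form))
    return forms


# Productions used to derive "id + id" (hypothetical grammar E -> E + E | id)
STEPS = [("E", ["E", "+", "E"]), ("E", ["id"]), ("E", ["id"])]
```

With these steps, the leftmost derivation passes through id + E and the rightmost one through E + id; both end in id + id.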

Ambiguity

Definition: A context-free grammar G is ambiguous if some string has two or more distinct derivation trees

I A language can have many grammars; some of them may be ambiguous

I Some languages have only ambiguous grammars (inherent ambiguity)

I Ambiguity is bad for programming languages, and we want to remove ambiguity
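The definition can be checked on a small case by counting derivation trees. The sketch below assumes the hypothetical grammar E → E - E | num and treats every token other than - as a num. A count greater than one for some input shows that this grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the distinct parse trees of tokens under E -> E - E | num."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees that derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] != "-" else 0   # E -> num
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "-":                  # E -> E - E, split at this "-"
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For the input 1 - 2 - 3 the count is 2, corresponding to the groupings (1 - 2) - 3 and 1 - (2 - 3).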

Ambiguity: Example

Dealing with Ambiguity

Associativity Declaration

Precedence Declaration