MELJUN CORTES Automata Theory (Automata8)

Date post: 26-Jun-2015
Upload: meljun-cortes
Transcript
Page 1: MELJUN CORTES Automata Theory (Automata8)

CSC 3130: Automata theory and formal languages

Context-free languages

MELJUN P. CORTES, MBA, MPA, BSCS, ACS

Page 2: MELJUN CORTES Automata Theory (Automata8)

Context-free grammar

• This is a different model for describing languages

• The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g.

A → 0A1
A → B
B → #

A, B are variables
0, 1, # are terminals
A is the start variable

• Using these rules, we can derive strings like this:

A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

Page 3: MELJUN CORTES Automata Theory (Automata8)

Some natural examples

• Context-free grammars were first used for natural languages

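a girl with a flower likes the boy

[Figure: parse tree of this sentence. The root SENTENCE splits into NOUN-PHRASE and VERB-PHRASE; below them are CMPLX-NOUN, PREP-PHRASE and CMPLX-VERB nodes, and the words are labeled ART NOUN PREP ART NOUN VERB ART NOUN from left to right.]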

Page 4: MELJUN CORTES Automata Theory (Automata8)

Natural languages

• We can describe (some fragments of) the English language by a context-free grammar:

SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB

ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with

variables: SENTENCE, NOUN-PHRASE, …

terminals: a, the, boy, girl, flower, likes, touches, sees, with

start variable: SENTENCE

Page 5: MELJUN CORTES Automata Theory (Automata8)

Programming languages

• Context-free grammars are also used to describe (parts of) programming languages

• For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG

<expr> → <expr> + <expr>
<expr> → <expr> * <expr>
<expr> → (<expr>)
<expr> → 0
<expr> → 1
…
<expr> → 9

Variables: <expr>

Terminals: +, *, (, ), 0, 1, …, 9

Page 6: MELJUN CORTES Automata Theory (Automata8)

Motivation for studying CFGs

• Context-free grammars are essential for understanding the meaning of computer programs

• They are used in compilers

code: (2 + 3) * 5

meaning: “add 2 and 3, and then multiply by 5”

Page 7: MELJUN CORTES Automata Theory (Automata8)

Definition of context-free grammar

• A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where
– V is a finite set of variables or non-terminals
– T is a finite set of terminals (V ∩ T = ∅)
– P is a set of productions or substitution rules of the form

A → α

where A is a symbol in V and α is a string over V ∪ T

– S is a variable in V called the start variable

Page 8: MELJUN CORTES Automata Theory (Automata8)

Shorthand notation for productions

• When we have multiple productions with the same variable on the left like

E → E + E
E → E * E
E → (E)
E → N

N → 0N
N → 1N
N → 0
N → 1

we can write this in shorthand as

E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1

Variables: E, N

Terminals: +, *, (, ), 0, 1

Start variable: E

Page 9: MELJUN CORTES Automata Theory (Automata8)

Derivation

• A derivation is a sequential application of productions:

E ⇒ E * E ⇒ (E) * E ⇒ (E) * N ⇒ (E + E) * N ⇒ (E + E) * 1 ⇒ (E + N) * 1 ⇒ (N + N) * 1 ⇒ (N + 1N) * 1 ⇒ (N + 10) * 1 ⇒ (1 + 10) * 1

α ⇒ β means β can be obtained from α with one production

α ⇒* β means β can be obtained from α after zero or more productions
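The one-step relation can be sketched in Python. This is only an illustration, assuming the grammar E → E + E | E * E | (E) | N, N → 0N | 1N | 0 | 1 written without spaces, with one character per symbol; the names `RULES`, `one_step` and `is_derivation` are made up for this sketch.

```python
# Productions of the example grammar; each symbol is a single character.
RULES = {
    "E": ["E+E", "E*E", "(E)", "N"],
    "N": ["0N", "1N", "0", "1"],
}

def one_step(form):
    """Return every sentential form obtainable from `form` with one production."""
    results = set()
    for i, symbol in enumerate(form):
        # Terminals have no entry in RULES, so they are never rewritten.
        for rhs in RULES.get(symbol, []):
            results.add(form[:i] + rhs + form[i + 1:])
    return results

def is_derivation(forms):
    """Check that each form follows from the previous one by one production."""
    return all(b in one_step(a) for a, b in zip(forms, forms[1:]))

# The derivation of (1 + 10) * 1 from this slide, without spaces.
SLIDE_DERIVATION = [
    "E", "E*E", "(E)*E", "(E)*N", "(E+E)*N", "(E+E)*1",
    "(E+N)*1", "(N+N)*1", "(N+1N)*1", "(N+10)*1", "(1+10)*1",
]
```

Calling `is_derivation(SLIDE_DERIVATION)` checks each step of the chain against the productions.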

Page 10: MELJUN CORTES Automata Theory (Automata8)

Language of a CFG

• The language of a CFG (V, T, P, S) is the set of all strings containing only terminals that can be derived from the start variable S

L = {w | w ∈ T* and S ⇒* w}

• This is a language over the alphabet T

• A language L is context-free if it is the language of some CFG

Page 11: MELJUN CORTES Automata Theory (Automata8)

Example 1

A → 0A1 | B
B → #

variables: A, B
terminals: 0, 1, #
start variable: A

• Is the string 00#11 in L?

• How about 00#111, 00#0#1#11?

• What is the language of this CFG?

L = {0^n#1^n : n ≥ 0}
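A small Python sketch can answer the membership questions by mirroring the two productions directly. The function name `derives_from_a` is an assumption made for this illustration.

```python
def derives_from_a(s):
    """Return True if the string s can be derived from A, where
    A → 0A1 | B and B → #."""
    # A → B → # : the only way to finish a derivation.
    if s == "#":
        return True
    # A → 0A1 : strip one 0 on the left and one 1 on the right, then recurse.
    return (
        len(s) >= 3
        and s[0] == "0"
        and s[-1] == "1"
        and derives_from_a(s[1:-1])
    )
```

For example, `derives_from_a("00#11")` succeeds, while `derives_from_a("00#111")` and `derives_from_a("00#0#1#11")` fail, which agrees with L = {0^n#1^n : n ≥ 0}.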

Page 12: MELJUN CORTES Automata Theory (Automata8)

Example 2

S → SS | (S) | ε

convention: variables in uppercase, terminals in lowercase, start variable first

• Give derivations of (), (()())

S ⇒ (S) (rule 2) ⇒ () (rule 3)

S ⇒ (S) (rule 2) ⇒ (SS) (rule 1) ⇒ ((S)S) (rule 2) ⇒ ((S)(S)) (rule 2) ⇒ (()(S)) (rule 3) ⇒ (()()) (rule 3)

• How about ())?
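The two derivations can be replayed mechanically. The sketch below is an illustration only: it numbers the rules S → SS, S → (S), S → ε as 1, 2, 3, and each step names the rule and which occurrence of S (counted from the left, starting at 0) is rewritten. The names `RULES`, `apply_rule` and `replay` are hypothetical.

```python
# Rule numbers follow the order S → SS | (S) | ε.
RULES = {1: "SS", 2: "(S)", 3: ""}

def apply_rule(form, rule, occurrence=0):
    """Rewrite the given occurrence of S in `form` using the numbered rule."""
    positions = [i for i, ch in enumerate(form) if ch == "S"]
    i = positions[occurrence]
    return form[:i] + RULES[rule] + form[i + 1:]

def replay(steps, start="S"):
    """Return the list of sentential forms produced by a list of
    (rule, occurrence) steps."""
    forms = [start]
    for rule, occurrence in steps:
        forms.append(apply_rule(forms[-1], rule, occurrence))
    return forms

# Derivation of (()()): note that the fourth step rewrites the second S.
STEPS = [(2, 0), (1, 0), (2, 0), (2, 1), (3, 0), (3, 0)]
```

`replay(STEPS)` reproduces the chain S, (S), (SS), ((S)S), ((S)(S)), (()(S)), (()()).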

Page 13: MELJUN CORTES Automata Theory (Automata8)

Examples: Designing CFGs

• Write a CFG for the following languages
– Linear equations over x, y, z, like:

x + 5y – z = 9
11x – y = 2

– Numbers without leading zeros, e.g., 109, 0 but not 019

– The language L = {a^n b^n c^m d^m | n ≥ 0, m ≥ 0}

– The language L = {a^n b^m c^m d^n | n ≥ 0, m ≥ 0}

Page 14: MELJUN CORTES Automata Theory (Automata8)

Context-free versus regular

• Write a CFG for the language (0 + 1)*111

S → A111
A → ε | 0A | 1A

• Can you do so for every regular language?

Every regular language is context-free

• Proof:

[Diagram relating regular expression, NFA and DFA]

Page 15: MELJUN CORTES Automata Theory (Automata8)

From regular to context-free

regular expression        CFG

∅                         grammar with no rules
ε                         S → ε
a (alphabet symbol)       S → a
E1 + E2                   S → S1 | S2
E1E2                      S → S1S2
E1*                       S → SS1 | ε

In all cases, S becomes the new start symbol
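The conversion table can be turned into a short recursive procedure. The following Python sketch is one possible encoding, not the course's own: regular expressions are nested tuples such as `("star", ("sym", "a"))`, variables get fresh names `S0`, `S1`, …, and a production body is a tuple of symbols (the empty tuple stands for ε). It assumes no alphabet symbol is itself named like `S0`.

```python
from itertools import count

def regex_to_cfg(regex):
    """Return (start_variable, productions) for a regular expression AST.

    AST nodes: ("empty",), ("eps",), ("sym", a),
               ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    """
    productions = {}
    fresh = count()

    def build(node):
        var = f"S{next(fresh)}"          # new start symbol for this subexpression
        kind = node[0]
        if kind == "empty":
            productions[var] = []        # grammar with no rules
        elif kind == "eps":
            productions[var] = [()]      # S → ε
        elif kind == "sym":
            productions[var] = [(node[1],)]          # S → a
        elif kind == "union":
            s1, s2 = build(node[1]), build(node[2])
            productions[var] = [(s1,), (s2,)]        # S → S1 | S2
        elif kind == "concat":
            s1, s2 = build(node[1]), build(node[2])
            productions[var] = [(s1, s2)]            # S → S1S2
        elif kind == "star":
            s1 = build(node[1])
            productions[var] = [(var, s1), ()]       # S → SS1 | ε
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        return var

    start = build(regex)
    return start, productions
```

For instance, `regex_to_cfg(("star", ("sym", "a")))` yields the start variable `S0` with productions `S0 → S0 S1 | ε` and `S1 → a`.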

Page 16: MELJUN CORTES Automata Theory (Automata8)

Context-free versus regular

• Is every context-free language regular?

• No! We already saw some examples:

A → 0A1 | B
B → #

L = {0^n#1^n : n ≥ 0}

• This language is context-free but not regular

Page 17: MELJUN CORTES Automata Theory (Automata8)

Parse tree

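• Derivations can also be represented using parse trees

E → E + E | E − E | (E) | V
V → x | y | z

E ⇒ E + E ⇒ V + E ⇒ x + E ⇒ x + (E) ⇒ x + (E − E) ⇒ x + (V − E) ⇒ x + (y − E) ⇒ x + (y − V) ⇒ x + (y − z)

[Figure: parse tree for x + (y − z). The root E has children E, +, E. The left E derives V, then x. The right E has children (, E, ); that inner E has children E, −, E, which derive y and z through V.]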

Page 18: MELJUN CORTES Automata Theory (Automata8)

Definition of parse tree

• A parse tree for a CFG G is an ordered tree with labels on the nodes such that
– Every internal node is labeled by a variable
– Every leaf is labeled by a terminal or ε
– Leaves labeled by ε have no siblings
– If a node is labeled A and has children A1, …, Ak from left to right, then the rule

A → A1…Ak

is a production in G.
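The definition translates into a direct check. In the Python sketch below a tree is a pair `(label, children)`, and the grammar is the expression grammar E → E + E | E − E | (E) | V, V → x | y | z with an ASCII `-`. Since that grammar has no ε-productions, the sketch leaves out ε leaves; the names `GRAMMAR`, `is_parse_tree`, `tree_yield` and `EXAMPLE_TREE` are chosen for this illustration.

```python
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("(", "E", ")"), ("V",)],
    "V": [("x",), ("y",), ("z",)],
}

def is_parse_tree(node, grammar=GRAMMAR):
    """Check that every node labeled A with children A1..Ak
    corresponds to a production A → A1…Ak."""
    label, children = node
    if label not in grammar:             # a terminal must be a leaf
        return children == []
    child_labels = tuple(child[0] for child in children)
    return child_labels in grammar[label] and all(
        is_parse_tree(child, grammar) for child in children
    )

def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)

def leaf(symbol):
    return (symbol, [])

def var(name):
    """Subtree E → V → name for a single variable name."""
    return ("E", [("V", [leaf(name)])])

# Parse tree for x + (y - z).
EXAMPLE_TREE = ("E", [
    var("x"),
    leaf("+"),
    ("E", [leaf("("), ("E", [var("y"), leaf("-"), var("z")]), leaf(")")]),
])
```

`is_parse_tree(EXAMPLE_TREE)` accepts the tree, and `tree_yield(EXAMPLE_TREE)` reads off the derived string `x+(y-z)`.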

Page 19: MELJUN CORTES Automata Theory (Automata8)

Left derivation

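• Always derive the leftmost variable first:

E ⇒ E + E ⇒ V + E ⇒ x + E ⇒ x + (E) ⇒ x + (E − E) ⇒ x + (V − E) ⇒ x + (y − E) ⇒ x + (y − V) ⇒ x + (y − z)

• Corresponds to a left-to-right traversal of parse tree

[Figure: parse tree for x + (y − z). The root E has children E, +, E; the left E derives x, and the right E derives (y − z).]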

Page 20: MELJUN CORTES Automata Theory (Automata8)

Ambiguity

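• A grammar is ambiguous if some strings have more than one parse tree

• Example:

E → E + E | E − E | (E) | V
V → x | y | z

x + y + z

[Figure: two parse trees for x + y + z. In the first, the root expands to E + E, the left E derives x and the right E expands again to E + E, deriving y + z. In the second, the root expands to E + E, the left E expands again to E + E, deriving x + y, and the right E derives z.]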
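Ambiguity can be made concrete by counting parse trees. The Python sketch below is a brute-force count for the example grammar (with an ASCII `-`), assuming tokens are single characters and relying on the fact that this grammar has no ε-productions, so every symbol covers at least one token. The names `GRAMMAR` and `count_parse_trees` are hypothetical.

```python
from functools import lru_cache

GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("(", "E", ")"), ("V",)],
    "V": [("x",), ("y",), ("z",)],
}

def count_parse_trees(tokens, start="E"):
    """Count the parse trees of `tokens` rooted at `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of trees rooted at `symbol` whose yield is tokens[i:j].
        if symbol not in GRAMMAR:        # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbols in `rhs` can jointly yield tokens[i:j].
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        first, rest = rhs[0], rhs[1:]
        total = 0
        # The first symbol takes tokens[i:k]; each remaining symbol needs >= 1 token.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))
```

`count_parse_trees(list("x+y+z"))` finds two trees, while the parenthesized string `(x+y)+z` has only one.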

Page 21: MELJUN CORTES Automata Theory (Automata8)

Why ambiguity matters

• The parse tree represents the intended meaning:

x + y + z

[Figure: the two parse trees for x + y + z]

Tree whose right subtree derives y + z: “first add y and z, and then add this to x”

Tree whose left subtree derives x + y: “first add x and y, and then add z to this”

Page 22: MELJUN CORTES Automata Theory (Automata8)

Why ambiguity matters

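• Suppose we also had multiplication:

E → E + E | E − E | E * E | (E) | V
V → x | y | z

x * y + z

[Figure: two parse trees for x * y + z. One has root E * E, with the right E expanding to y + z: “first y + z, then x *”. The other has root E + E, with the left E expanding to x * y: “first x * y, then + z”.]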

Page 23: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

• Sometimes we can rewrite the grammar to remove the ambiguity

E → E + E | E − E | E * E | (E) | V
V → x | y | z

• Rewrite grammar so * cannot be broken by +:

E → T | E + T | E − T
T → F | T * F
F → (E) | V
V → x | y | z

T stands for term: x * (y + z)
F stands for factor: x, (y + z)

A term always splits into factors

A factor is either a variable or a parenthesized expression
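To see how the rewritten grammar fixes the grouping, here is a minimal recursive-descent sketch in Python. It is an assumption of this illustration, not part of the slides: the left-recursive rules E → E + T and T → T * F are unrolled into loops, `-` is written in ASCII, and the result is a nested tuple `(operator, left, right)`.

```python
def parse_expression(text):
    """Parse text with E → T | E + T | E - T, T → F | T * F,
    F → (E) | V, V → x | y | z and return a nested tuple."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_e():
        nonlocal pos
        tree = parse_t()
        while peek() in ("+", "-"):      # E → E + T | E - T, grouped to the left
            op = tokens[pos]
            pos += 1
            tree = (op, tree, parse_t())
        return tree

    def parse_t():
        nonlocal pos
        tree = parse_f()
        while peek() == "*":             # T → T * F, grouped to the left
            pos += 1
            tree = ("*", tree, parse_f())
        return tree

    def parse_f():
        nonlocal pos
        if peek() == "(":                # F → (E)
            pos += 1
            tree = parse_e()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1
            return tree
        if peek() in ("x", "y", "z"):    # F → V, V → x | y | z
            name = tokens[pos]
            pos += 1
            return name
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree
```

With this sketch, `parse_expression("x * y + z")` groups the product first, giving `("+", ("*", "x", "y"), "z")`, and parentheses are needed to obtain the other grouping.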

Page 24: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

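• Example

E → T | E + T | E − T
T → F | T * F
F → (E) | V
V → x | y | z

x * y + z

[Figure: the parse tree for x * y + z in this grammar. The root E expands to E + T. The left E derives T, which expands to T * F and yields x * y through F and V. The right T derives F, then V, then z.]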

Page 25: MELJUN CORTES Automata Theory (Automata8)

Disambiguation

• Can we always disambiguate a grammar?

• No, for two reasons
– There exists an inherently ambiguous context-free language L: every CFG for this language is ambiguous

– There is no general procedure that can tell if a grammar is ambiguous

• However, grammars used in programming languages can typically be disambiguated

Page 26: MELJUN CORTES Automata Theory (Automata8)

Another Example

S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

• Is ab, baba, abbbaa in L?

• How about a, bba?

• What is the language of this CFG?

• Is the CFG ambiguous?
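One way to explore these questions is to enumerate every string of bounded length that the grammar generates. The Python sketch below is an illustrative brute-force search over leftmost derivations; it relies on the fact that no production of this grammar shortens a sentential form, so forms longer than the bound can be discarded. The names `RULES` and `generated_strings` are made up for this sketch.

```python
from collections import deque

RULES = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}

def generated_strings(max_length, start="S"):
    """Return all terminal strings of length <= max_length derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, or None if the form is all terminals.
        index = next((i for i, s in enumerate(form) if s in RULES), None)
        if index is None:
            words.add(form)
            continue
        for rhs in RULES[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # No rule shrinks a form, so overly long forms can be pruned.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words
```

Inspecting `generated_strings(6)` shows that ab, baba and abbbaa are generated while a and bba are not, and every generated string has as many a's as b's, which suggests a candidate answer for the language.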

