Date post: | 16-Dec-2015 |
Upload: | rudolph-boyd |
2
Chomsky Normal Form: Purpose
A construct used to establish properties of context-free languages (CFLs).
Every CFL without ε can be generated by a CFG in Chomsky normal form.
To show that a language without ε is a CFL, it is sufficient to show that it has a CFG in Chomsky normal form.
Typical approach to closure properties.
3
Chomsky Normal Form: Definition
A context-free grammar (CFG) in which all productions are of the form A->BC or A->a, where A, B and C are variables and a is a terminal.
4
Chomsky Normal Form: method of construction
Eliminate "useless" symbols: variables or terminals that do not appear in any derivation of a terminal string from the start symbol.
Eliminate ε-productions A->ε.
Eliminate unit productions A->B for variables A and B.
5
Chomsky Normal Form: method of construction - 2
For each elimination task, a method will be defined recursively, by induction.
The order in which the tasks are performed is important.
6
Generating and Reachable Symbols
X is generating if X =>* w for some terminal string w.
If X is a terminal, then it can generate itself in zero steps.
X is reachable if S =>* αXβ for some α and β (S is the start symbol).
Any symbol that is not both generating and reachable is useless.
7
Induction to find generating variables
Basis: If there is a production A -> w, where w is a terminal string, then A is generating.
Induction: If there is a production A -> α, where α consists only of terminals and variables known to derive a terminal string, then A derives a terminal string; hence A is generating.
8
Algorithm to eliminate non-generating variables
1. Discover all variables that derive terminal strings.
2. For all other variables, remove all productions in which they appear either on the LHS or RHS of ->.
9
Example: finding generating variables
S->AB|C, A->aA|a, B->bB, C->c
Basis: A and C are generating due to productions A->a and C->c.
Induction: S is generating due to production S->C.
Eliminate B->bB and S->AB.
Result: S->C, A->aA|a, C->c
Still have unreachable variables.
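The basis and induction steps can be sketched as a fixed-point loop. This is a minimal sketch under assumptions that are not part of the slides: a grammar is a dict from variable to a list of right sides, every symbol is a single character, and any symbol that is not a key of the dict is treated as a terminal. The function names are illustrative only.

```python
def generating_variables(grammar):
    """Return the set of variables that derive some terminal string."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # A body qualifies if every symbol is a terminal (not a key of
                # the grammar) or a variable already known to be generating.
                if all(s not in grammar or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def remove_non_generating(grammar):
    """Drop every production that mentions a non-generating variable."""
    gen = generating_variables(grammar)
    return {
        head: [b for b in bodies
               if all(s not in grammar or s in gen for s in b)]
        for head, bodies in grammar.items()
        if head in gen
    }


# The grammar from the example: S->AB|C, A->aA|a, B->bB, C->c
grammar = {"S": ["AB", "C"], "A": ["aA", "a"], "B": ["bB"], "C": ["c"]}
cleaned = remove_non_generating(grammar)
```

On the example grammar this keeps S->C, A->aA|a and C->c, matching the result on the slide.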
10
Finding reachable symbols
Basis: Obviously, start symbol is reachable.
Induction: if we can reach A, and there is a production A->α, then we can reach all symbols of α.
In the result from the previous slide, S->C, A->aA|a, C->c, only S and C (and the terminal c) are reachable.
A is unreachable, so A and its productions are eliminated.
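The reachability induction can be sketched as a worklist search from the start symbol. As an assumption for illustration, the grammar is a dict from variable to a list of right-side strings with one character per symbol; `reachable_symbols` is a hypothetical helper name.

```python
def reachable_symbols(grammar, start):
    """Return every symbol (variable or terminal) reachable from start."""
    reachable = {start}          # basis: the start symbol is reachable
    worklist = [start]
    while worklist:
        head = worklist.pop()
        # Terminals have no productions, so .get() yields an empty list.
        for body in grammar.get(head, []):
            for sym in body:
                if sym not in reachable:
                    reachable.add(sym)
                    worklist.append(sym)
    return reachable


# Result of the previous example: S->C, A->aA|a, C->c
grammar = {"S": ["C"], "A": ["aA", "a"], "C": ["c"]}
reach = reachable_symbols(grammar, "S")
trimmed = {h: bs for h, bs in grammar.items() if h in reach}
```

Here `trimmed` keeps only the productions of S and C, since A is never reached from S.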
11
Epsilon Productions
Theorem: If L is a CFL that does not contain the empty string, then it has a CFG with no ε-productions that can be put in Chomsky normal form.
A->ε is clearly an ε-production.
To eliminate all ε-productions, we must first discover the nullable variables, i.e. variables A such that A =>* ε.
12
Inductive definition of nullable symbols
Basis: If there is a production A -> ε, then A is nullable.
Induction: If there is a production A -> α, and all symbols of α are nullable, then A is nullable.
13
Example: Nullable Symbols
S->AB, A->aA|ε, B->bB|A
A is nullable because of A -> ε.
B is nullable because of B -> A.
S is nullable because of S -> AB.
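The inductive definition of nullable symbols can be sketched as follows. The representation is an assumption for illustration: a dict from variable to a list of right-side strings, one character per symbol, with the empty string standing for ε.

```python
def nullable_variables(grammar):
    """Return the set of variables A with A =>* ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body (ε) passes trivially: that is the basis case.
            # Otherwise every symbol must already be known nullable.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# The grammar from the example: S->AB, A->aA|ε, B->bB|A
grammar = {"S": ["AB"], "A": ["aA", ""], "B": ["bB", "A"]}
nullable = nullable_variables(grammar)
```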
14
Algorithm to eliminate ε-productions
Identify all nullable symbols.
Consider each production A->X1…Xn that contains nullable symbols.
Suppose A->X1…Xn contains m ≤ n nullable symbols.
Construct a family of productions with 2^m members, covering all combinations of the nullable symbols being present or absent.
If m = n, exclude the case with all symbols absent.
15
Eliminating ε-productions
The new CFG with no ε-productions consists of all families of productions derived from productions with nullable symbols,
plus all productions from the original CFG that did not contain nullable symbols.
16
Example: Eliminating ε-Productions
S->ABC, A->aA|ε, B->bB|ε, C->ε
A, B, C, and S are all nullable.
Productions S->ABC|AB|AC|BC|A|B|C come from S->ABC.
Productions A->aA|a come from A->aA.
Productions B->bB|b come from B->bB.
17
Eliminating ε-Productions continued
S->ABC, A->aA|ε, B->bB|ε, C->ε
C->ε makes no contribution to the new CFG, so C is not generating there.
Eliminate C in productions of the new CFG:
S -> ABC | AB | AC | BC | A | B | C becomes S -> AB | A | B
A -> aA | a
B -> bB | b
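The whole example can be reproduced with a short sketch. It assumes a grammar stored as a dict from variable to right-side strings (one character per symbol, empty string for ε); the helper names are illustrative. The final clean-up is a one-pass shortcut that is enough for this grammar; in general the full non-generating elimination would have to be rerun.

```python
from itertools import product


def nullable_variables(grammar):
    """Return the set of variables A with A =>* ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Build the families of productions with nullable symbols kept or dropped."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be present or absent; others must stay.
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                # Skip the all-absent case and avoid duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        if new_bodies:
            result[head] = new_bodies
    return result


# The grammar from the example: S->ABC, A->aA|ε, B->bB|ε, C->ε
grammar = {"S": ["ABC"], "A": ["aA", ""], "B": ["bB", ""], "C": [""]}
no_eps = eliminate_epsilon(grammar)

# C is left with no productions; drop the right sides that mention it.
dead = set(grammar) - set(no_eps)
final = {
    head: [b for b in bodies if not any(s in dead for s in b)]
    for head, bodies in no_eps.items()
}
```

With these assumptions, `no_eps` holds the seven S-productions from the slide and `final` reduces them to S -> AB | A | B.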
18
Define Unit Productions
A unit production is a production whose right side consists of exactly one variable.
A->a is not a unit production if a is a terminal.
Elimination by expansion is the most common approach.
19
Eliminate by expansion
In the CFG defined by
E->T|E+T T->F|T*F F->I|(E) I->a|Ia
E->T is eliminated by E->F|T*F|E+T
E->F is eliminated by E->I|(E)|T*F|E+T
E->I is eliminated by E->a|Ia|(E)|T*F|E+T
20
Eliminate by expansion
Will not work on cycles of unit productions such as A->B, B->C, C->A.
Alternative: find all pairs (A,B) such that A=>*B by a sequence of unit productions
Works in all cases.
21
Alternative to expansion in eliminating unit productions
Basic idea: If A=>*B by a series of unit productions, and B->α is a non-unit production, then add production A->α and drop the unit productions.
Example
22
Example of basic idea
In the CFG defined by
E->T|E+T T->F|T*F F->I|(E) I->a|Ia
E=>*I by the series of unit productions E->T, T->F, F->I.
I->a is a non-unit production, so add E->a.
Doing this for every such pair gives E->a|Ia|(E)|T*F|E+T (same as the expansion method).
23
Pair search defined by induction
Find all pairs (A,B) such that A=>*B by a sequence of unit productions only.
Basis: A=>*A, therefore (A,A).
Induction: If we have found (A,B), and B->C is a unit production, then add (A,C).
24
Example of pair search
In the CFG defined by
E->T|E+T T->F|T*F F->I|(E) I->a|Ia
Obviously (E,T), (T,F), (F,I), plus the basis pairs (E,E), (T,T), (F,F), (I,I).
By induction, (T,I), (E,F) and (E,I) also.
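The pair search and the resulting elimination can be sketched together. As assumptions for illustration, the grammar is a dict from variable to right-side strings with one character per symbol, and a right side counts as a unit production when it is a single symbol that is itself a key of the dict. The function names are hypothetical.

```python
def unit_pairs(grammar):
    """Return all pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in grammar}      # basis: (A, A)
    worklist = list(pairs)
    while worklist:
        a, b = worklist.pop()
        for body in grammar[b]:
            if len(body) == 1 and body in grammar:   # B -> C is a unit production
                pair = (a, body)
                if pair not in pairs:
                    pairs.add(pair)
                    worklist.append(pair)
    return pairs


def eliminate_unit_productions(grammar):
    """For each pair (A, B), give A every non-unit right side of B."""
    result = {a: [] for a in grammar}
    for a, b in sorted(unit_pairs(grammar)):
        for body in grammar[b]:
            is_unit = len(body) == 1 and body in grammar
            if not is_unit and body not in result[a]:
                result[a].append(body)
    return result


# The grammar from the example: E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia
grammar = {
    "E": ["T", "E+T"],
    "T": ["F", "T*F"],
    "F": ["I", "(E)"],
    "I": ["a", "Ia"],
}
pairs = unit_pairs(grammar)
no_units = eliminate_unit_productions(grammar)
```

Because the pair set only grows until nothing new is found, the same sketch also terminates on cyclic unit productions such as A->B, B->C, C->A.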
25
Cleaning up a Grammar
Theorem: if L is a CFL, then there is a CFG for L – {ε} that has:
1. No useless symbols.
2. No ε-productions.
3. No unit productions.
That is, every right side of a production is either a single terminal or has length ≥ 2.
26
Clean-up continued
Proof: Start with a CFG for L. Perform the following steps in order:
1. Eliminate ε-productions. (Must be first. Can create unit productions and useless variables.)
2. Eliminate unit productions.
3. Eliminate variables that derive no terminal string.
4. Eliminate variables not reached from the start symbol.
27
Chomsky Normal Form
A CFG is said to be in Chomsky Normal Form if every production is of one of these two forms:
1. A -> BC (right side is two variables).
2. A -> a (right side is a single terminal).
Theorem: If L is a CFL, then L – {ε} has a CFG in CNF.
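The two allowed forms translate directly into a small check. This is a sketch under assumptions not stated in the slides: a grammar is a dict from variable to a list of right sides, each right side is a tuple of symbol names, and a symbol is a variable exactly when it is a key of the dict. `is_cnf` is an illustrative name.

```python
def is_cnf(grammar):
    """Return True if every production has the form A -> BC or A -> a."""
    for bodies in grammar.values():
        for body in bodies:
            two_variables = len(body) == 2 and all(s in grammar for s in body)
            one_terminal = len(body) == 1 and body[0] not in grammar
            if not (two_variables or one_terminal):
                return False
    return True


in_cnf = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
not_cnf = {"S": [("A", "b")], "A": [("a",)]}   # S -> Ab mixes a variable and a terminal
```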
28
Proof by construction
Step 1: "Clean" the grammar, so every production has a right side that is either a single terminal or of length ≥ 2.
Step 2: For each right side that is not a single terminal, make the right side all variables.
For each terminal a, create a new variable A_a and production A_a -> a (not a unit production).
Replace a by A_a in right sides of length ≥ 2.
29
Example: Step 2
Consider production A -> BcDe.
We need variables A_c and A_e, with productions A_c -> c and A_e -> e.
Note: you create at most one variable for each terminal, and use it everywhere it is needed.
Replace A -> BcDe by A -> B A_c D A_e.
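Step 2 can be sketched as below. Several things here are assumptions for illustration: right sides are tuples of symbol names, a symbol is a variable exactly when it is a key of the dict, the new variables are named `A_<terminal>` and are assumed not to clash with existing names, and the productions for B and D are invented so that B and D count as variables.

```python
def lift_terminals(grammar):
    """Replace terminals in right sides of length >= 2 by dedicated variables."""
    result = {}
    terminal_vars = {}   # terminal -> its single new variable, reused everywhere
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(
                    s if s in grammar else terminal_vars.setdefault(s, f"A_{s}")
                    for s in body
                )
            new_bodies.append(body)
        result[head] = new_bodies
    # Add the productions A_a -> a for every terminal that was replaced.
    for terminal, var in terminal_vars.items():
        result[var] = [(terminal,)]
    return result


# A -> BcDe from the example; B -> b and D -> d are hypothetical fillers.
grammar = {"A": [("B", "c", "D", "e")], "B": [("b",)], "D": [("d",)]}
lifted = lift_terminals(grammar)
```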
30
CNF construction: final step
Step 3: Break right sides longer than 2 into a chain of productions with right sides of two variables.
Example: A -> BCDE is replaced by A -> BF, F -> CG, and G -> DE. F and G must be used nowhere else.
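Step 3 can be sketched as a loop that peels off one symbol at a time. The representation (right sides as tuples of variable names) and the fresh names `N1`, `N2`, ... are assumptions for illustration; the sketch assumes those names are used nowhere else in the grammar.

```python
def break_long_bodies(grammar):
    """Split right sides longer than 2 into chains of two-variable productions."""
    result = {}
    counter = 0
    for head, bodies in grammar.items():
        result.setdefault(head, [])
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                fresh = f"N{counter}"          # new variable, used only here
                result.setdefault(current, []).append((body[0], fresh))
                current, body = fresh, body[1:]
            result.setdefault(current, []).append(body)
    return result


# A -> BCDE from the example
grammar = {"A": [("B", "C", "D", "E")]}
chained = break_long_bodies(grammar)
```

On the example this yields A -> B N1, N1 -> C N2 and N2 -> D E, which is the chain A -> BF, F -> CG, G -> DE with N1 and N2 in the roles of F and G.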