Context Free Grammars
https://courses.missouristate.edu/anthonyclark/333/
Some notes adapted from Professor GianLuigi Ferrari at University of Pisa
Outline
Topics and Learning Objectives
• Formal grammar theory
• Context free grammars
Assessments
• Context free grammars
Picture from “Crafting Interpreters”
Interpreter
// GCD Program (in C)
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
Source Code (Plain Text)
int main ( ) {
int i = getint ( ) , j = getint ( ) ;
while ( i != j ) {
if ( i > j ) i = i - j ;
else j = j - i ;
}
putint ( i ) ;
}
Lexemes/Tokens
Lexer/Scanner
Group characters into smallest meaningful units
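To make "smallest meaningful units" concrete, here is a minimal lexer sketch in Python. It is not the course's lexer: the token categories (COMMENT, NUMBER, IDENT, OP, SKIP) and the function name tokenize are assumptions chosen for illustration, and it only covers the characters used in the GCD program above.

```python
import re

# Hypothetical token categories; the names are illustrative only.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),           # line comment, discarded
    ("NUMBER", r"\d+"),                 # integer literal
    ("IDENT", r"[A-Za-z_]\w*"),         # identifiers and keywords
    ("OP", r"!=|[-+*/<>=(){};,]"),      # operators and punctuation
    ("SKIP", r"\s+"),                   # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token category matches here: an invalid lexeme.
            raise SyntaxError(f"invalid character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("while (i != j) i = i - j;"):
        print(kind, lexeme)
```

Note that each category is a regular expression, which is why the lexer only needs finite-state machinery; keywords such as while come out as IDENT here and would need a separate lookup.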
Parser
Put tokens into a tree-like data structure that represents the structure of the program
Evaluate/Interpret
“Walk” the tree and evaluate the program given optional input from outside of the program
User input / File Input / Sockets / Etc.
Result
We’ll create our own simple calculator language
You’ve already started this
Regular Expressions and Finite State Machines
Self-evident
Formal Grammars and Push-Down Automata
Assignments 5 through 7
We’ll build our own representation
Tree-Walking is pretty straightforward
Assignment 8
[Diagram: the Interpreter, made up of a Lexer, a Parser, and a Tree Walker. Arrow labels: read, request token, send token, request AST, send AST, I/O Console.]
Chomsky Hierarchy
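• Type-0: recursively enumerable languages, recognized by a Turing machine
• Type-1: context-sensitive languages, recognized by a linear bounded automaton
• Type-2: context-free languages, recognized by a pushdown automaton
• Type-3: regular languages, recognized by a finite state automaton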
Scanning vs Parsing
• Regular expressions describe regular languages, which are recognized by lexers
• Context free grammars describe context-free languages, which are recognized by parsers
REs cannot “count”
• Cannot balance parentheses
• Cannot balance if-then-else expressions
• Etc.
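A finite state machine has a fixed number of states, so it cannot track nesting of arbitrary depth. The sketch below shows the kind of unbounded counting that balancing requires; the function name is_balanced is made up for this example, and a single counter is enough only because there is one kind of bracket (several bracket kinds would need a stack).

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')' in the right order."""
    depth = 0  # unbounded counter: this is what a finite state machine lacks
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    return depth == 0


if __name__ == "__main__":
    for sample in ["(()())", "(()", ")("]:
        print(repr(sample), is_balanced(sample))
```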
Example from: Writing An Interpreter In Go
Lexemes/Tokens
Abstract Syntax Tree
No parentheses, semicolons, braces, etc.
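As a rough illustration of why the tree needs no parentheses, the sketch below builds an AST by hand for the expression (1 + 2) * 3 and walks it. The class names Num and BinOp and the function evaluate are assumptions for this example, not the representation used in the assignments.

```python
import operator
from dataclasses import dataclass


@dataclass
class Num:
    """Leaf node: a numeric literal."""
    value: float


@dataclass
class BinOp:
    """Interior node: an operator applied to two subtrees."""
    op: str
    left: object
    right: object


OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Walk the tree bottom-up and return the value of the expression."""
    if isinstance(node, Num):
        return node.value
    left = evaluate(node.left)
    right = evaluate(node.right)
    return OPERATIONS[node.op](left, right)


# (1 + 2) * 3: the grouping is encoded by the tree shape, not by parentheses.
tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(3))

if __name__ == "__main__":
    print(evaluate(tree))
```

The parentheses in the source text only tell the parser where to attach subtrees; once the tree exists they carry no further information.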
We now care about syntax!
Grammar
A grammar is a tool for describing a language
A grammar is a set of rules (productions) for creating valid strings
grammar English;
sentence : subject verbPhrase object;
subject : 'This' | 'Computers' | 'I';
verbPhrase : adverb verb | verb;
adverb : 'never';
verb : 'is' | 'run' | 'am' | 'tell';
object : 'the' noun | 'a' noun | noun;
noun : 'university' | 'world' | 'cheese' | 'lies';
Generating Strings
We can use the grammar to generate strings
Start with the top-level rule: sentence
Replace each nonterminal with the right-hand side (RHS) of one of its rules until only terminals remain
For example: This is a university
sentence : subject verbPhrase object;
subject : 'This' | 'Computers' | 'I';
verbPhrase : adverb verb | verb;
adverb : 'never';
verb : 'is' | 'run' | 'am' | 'tell';
object : 'the' noun | 'a' noun | noun;
noun : 'university' | 'world' | 'cheese' | 'lies';
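The generation procedure can be sketched in Python. The rules below are the English grammar from the slide, rewritten as a dictionary; the names GRAMMAR and generate, and the choice to pick alternatives at random, are assumptions made for this example.

```python
import random

# Each nonterminal maps to its list of alternatives; each alternative is a
# sequence of symbols. Any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "sentence": [["subject", "verbPhrase", "object"]],
    "subject": [["This"], ["Computers"], ["I"]],
    "verbPhrase": [["adverb", "verb"], ["verb"]],
    "adverb": [["never"]],
    "verb": [["is"], ["run"], ["am"], ["tell"]],
    "object": [["the", "noun"], ["a", "noun"], ["noun"]],
    "noun": [["university"], ["world"], ["cheese"], ["lies"]],
}


def generate(symbol, rng):
    """Expand symbol into a list of terminal words, choosing alternatives at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to replace
    alternative = rng.choice(GRAMMAR[symbol])
    words = []
    for part in alternative:
        words.extend(generate(part, rng))
    return words


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(" ".join(generate("sentence", rng)))
```

Every sentence printed is syntactically valid under this grammar, whether or not it means anything.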
Syntax vs Semantics
• You can also create syntactically valid strings that do not make sense semantically
  • “Computers run cheese”
  • “This am a lies”
• These are valid sentences, but they do not have any real meaning
• The same can be true for our programming language rules
• We’ll worry about semantics later
def f():
    return "hi"
float y = f() + 5;
Error types
Invalid lexemes (all languages will catch this problem early)
var x = 5 @ "6"    # '@' is not a valid operator in most languages
Valid lexemes, invalid syntax (all languages will catch this problem early)
x var= 5 5 *    # all valid lexemes, but in the wrong order
Valid lexemes, valid syntax, invalid semantics (catch at compile time or runtime)
var x = 5 * "6"    # many languages will not multiply an integer and a string
Valid lexemes, valid syntax, valid semantics
var x = 5 * 6
Formal Grammars
Grammar: a set of rules for creating valid strings
Nonterminal: a grammar symbol that can be replaced by a sequence of symbols
Terminal: a word in the language (cannot be replaced with something else)
Production: a single rule in the grammar (X → Y1 Y2 …)
Derivation: a sequence of rule applications that produces a valid string
Start Symbol: the nonterminal from which all derivations start
Null Symbol: the ε symbol is used to say that a nonterminal can be replaced with nothing
Example
Write a regular expression for the following regular language:
At least one 0 followed by at least one 1
0 0* 1 1*
Now write a grammar for the same language
Example
• Define a regular expression where you have n 0’s followed by n 1’s
• Define a context-free grammar where you have some number of 0’s followed by the same number of 1’s
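One possible grammar for the second bullet, assuming the count is allowed to be zero, is S → 0 S 1 | ε. The recursive function below mirrors that rule directly; the name matches_s is made up for this sketch. No regular expression can do the same job, because matching the counts needs unbounded memory.

```python
def matches_s(text):
    """Recognize strings derivable from S -> 0 S 1 | ε (n zeros, then n ones)."""
    if text == "":
        return True  # S -> ε
    # S -> 0 S 1: peel one 0 from the front and one 1 from the back,
    # then the middle must itself be derivable from S.
    return text[0] == "0" and text[-1] == "1" and matches_s(text[1:-1])


if __name__ == "__main__":
    for sample in ["", "01", "000111", "001", "0101"]:
        print(repr(sample), matches_s(sample))
```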
Activity
1. Write three strings that can be generated by the following grammar.
S → 1S | 0T | ε
T → 1T | 0S
What does this recognize?
S → 1S | 0T | ε
T → 1T | 0S
Grammar: rules for creating valid strings of a language
Nonterminal: can be replaced by a sequence of symbols
Terminal: a word in the language
Production: a single rule in the grammar
Derivation: a sequence of rule applications that produces a valid string
Start Symbol: the rule used to start all derivations
Null Symbol: the ε symbol, a nonterminal can be replaced with nothing
Language: set of all strings that can be derived from a grammar
S ⇒ 1S ⇒ 11S ⇒ 110T ⇒ 1101T ⇒ 11010S ⇒ 11010
• 1
• “”
• 0101
• 1111
• 100
• 00
• …
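The sample strings can be checked mechanically. In the grammar for S and T, every production emits one terminal followed by at most one nonterminal, so the current nonterminal can be tracked like a state while reading the input. The names RULES, NULLABLE, and derivable below are assumptions for this sketch.

```python
# RULES[nonterminal][terminal] is the nonterminal that follows the terminal,
# taken from S -> 1S | 0T | ε and T -> 1T | 0S.
RULES = {
    "S": {"1": "S", "0": "T"},
    "T": {"1": "T", "0": "S"},
}
NULLABLE = {"S"}  # only S has an ε production


def derivable(text):
    """Return True if text can be derived from the start symbol S."""
    current = "S"
    for ch in text:
        if ch not in RULES[current]:
            return False  # symbol outside the alphabet {0, 1}
        current = RULES[current][ch]
    # The derivation can only finish by applying S -> ε.
    return current in NULLABLE


if __name__ == "__main__":
    for sample in ["1", "", "0101", "1111", "100", "00", "11010", "10"]:
        print(repr(sample), derivable(sample))
```

Each 0 switches between S and T while each 1 leaves the nonterminal unchanged, so the accepted strings are exactly those containing an even number of 0s.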
2. Write a grammar for palindromes where your terminals are the symbols ‘a’ and ‘b’.
3. Write a grammar for representing all strings that start with x number of a’s, followed by y number of b’s, followed by z number of a’s, where y = x + z.