
Recursive-Descent Parsing continued - CS Home · PDF fileRecursive-Descent Parsing CS F331...


Recursive-Descent Parsing (continued)

CS F331 Programming Languages
CSCE A331 Programming Language Concepts
Lecture Slides
Wednesday, February 15, 2017

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks
[email protected]

© 2017 Glenn G. Chappell


Review
Overview of Lexing & Parsing

Two phases:
§ Lexical analysis (lexing)
§ Syntax analysis (parsing)

The output of a parser is often an abstract syntax tree (AST). Specifications of these can vary.

15 Feb 2017 CS F331 / CSCE A331 Spring 2017

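[Figure: The character stream cout << ff(12.6); goes to the lexer, which produces a lexeme stream (id, op, id, op, lit, op, punct). The parser turns the lexeme stream into an AST, or an error. In the AST shown, the root is binOp: << with children id: cout and funcCall; the funcCall node has children id: ff and numLit: 12.6.]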


Review
Introduction to Syntax Analysis — Categories of Parsers

Parsing algorithms can be divided into two broad categories.

Top-Down Parsing Algorithms
§ Go through the derivation from top to bottom, expanding nonterminals.
§ Usually produce a leftmost derivation.
§ Important subclass: LL (read Left-to-right, Leftmost derivation).
§ Common reason a top-down parser may not be LL: doing lookahead.
§ Often hand-coded.
§ Example algorithm we will look at: Recursive Descent.

Bottom-Up Parsing Algorithms
§ Go through the derivation from bottom to top, collapsing substrings.
§ Usually produce a rightmost derivation.
§ Important subclass: LR (read Left-to-right, Rightmost derivation).
§ Common reason a bottom-up parser may not be LR: doing lookahead.
§ Almost always automatically generated.
§ Example algorithm we will look at: Shift-Reduce.


Review
Introduction to Syntax Analysis — Categories of Grammars

Grammars that LL parsers can use are LL grammars.
Grammars that LR parsers can use are LR grammars.


[Figure: Nested sets. All Grammars contains CFGs; CFGs contains LR Grammars; LR Grammars contains LL Grammars.]


Review
Recursive-Descent Parsing — Intro, How It Works, Example #1

Recursive Descent is a top-down, LL parsing algorithm.
§ There is one parsing function for each nonterminal.
§ A parsing function is responsible for parsing all strings that its nonterminal can be expanded into.

We wrote a Recursive-Descent parser based on Grammar 1.

Grammar 1
item → “(” item “)”
item → thing
thing → ID
thing → “%”

Our parser does not generate an AST.


See rdparser1.lua.

Grammar 1a
item  → “(” item “)”
      | thing
thing → ID
      | “%”
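A minimal Lua sketch of a Recursive-Descent recognizer for Grammar 1a follows. It is not the code in rdparser1.lua. It assumes, hypothetically, that the input has already been split into an array of lexeme strings and that any identifier-like lexeme counts as an ID; the names matchString, isID, and parse are illustrative. Each nonterminal gets its own parsing function, and the current lexeme decides which option to take.

```lua
-- Sketch: Recursive-Descent recognizer for Grammar 1a.
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes = {}  -- the lexemes being parsed
local pos = 1       -- index of the current lexeme

-- Hypothetical ID rule: a letter or underscore, then letters, digits, underscores.
local function isID(lex)
  return lex ~= nil and lex:match("^[%a_][%w_]*$") ~= nil
end

-- If the current lexeme is s, consume it and return true; else return false.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local parse_item, parse_thing  -- forward declarations (mutual recursion)

-- item -> "(" item ")" | thing
function parse_item()
  if matchString("(") then
    if not parse_item() then
      return false
    end
    return matchString(")")
  end
  return parse_thing()
end

-- thing -> ID | "%"
function parse_thing()
  if isID(lexemes[pos]) then
    pos = pos + 1
    return true
  end
  return matchString("%")
end

-- Reset the state and parse one item; does NOT check for leftover input.
function parse(lexList)
  lexemes, pos = lexList, 1
  return parse_item()
end
```

For example, parse({"(", "x", ")"}) gives true and parse({"(", "x"}) gives false. Note that parse({"(", "(", "x", ")", ")", ")"}) also gives true: parse_item stops after the second “)”, and nothing looks at the extra one.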


Review
Recursive-Descent Parsing — Handling Incorrect Input

Our parser might report that an input is correct even though it did not parse the entire input.

Example: ((x)))

Two solutions:
§ Introduce a new end-of-input lexeme, and revise the grammar to include it.
§ After parsing, check whether the end of the input was reached.

Our parsers use the second solution.
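Here is one way the second solution might look, as a hedged Lua sketch. It repeats a compact Grammar 1a recognizer so that it runs on its own, assuming (hypothetically) an array of lexeme strings as input. The illustrative function parseAll returns two booleans: whether an item was parsed, and whether all of the input was consumed.

```lua
-- Sketch: Grammar 1a recognizer plus an end-of-input check.
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local parse_item  -- forward declaration

-- thing -> ID | "%"  (hypothetical ID rule: identifier-like lexeme)
local function parse_thing()
  local lex = lexemes[pos]
  if lex ~= nil and lex:match("^[%a_][%w_]*$") then
    pos = pos + 1
    return true
  end
  return matchString("%")
end

-- item -> "(" item ")" | thing
function parse_item()
  if matchString("(") then
    return parse_item() and matchString(")")
  end
  return parse_thing()
end

-- Returns good (an item was parsed) and done (all lexemes were consumed).
-- The input is correct only when both are true.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  local good = parse_item()
  local done = pos > #lexemes
  return good, done
end
```

For ((x))), lexed as {"(", "(", "x", ")", ")", ")"}, parseAll returns true, false, so the caller can reject the input.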


Review
Recursive-Descent Parsing — Example #2: More Complex [1/2]

We wrote a Recursive-Descent parser for the following more complex grammar, whose start symbol is still item.

Grammar 2
item  → “(” item “)”
      | thing
thing → ID { ( “,” | “:” ) ID }
      | “%”
      | [ “*” “-” ] “[” item “]”

All strings in the old language are also in the new language. But now we can get strings like these:
§ ((a,b,c:d))
§ ((*-[([%])]))


Recall:

Braces mean optional, repeatable (0 or more).

Brackets mean optional (0 or 1).

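Note the difference between [ and “[”. The first is grammar notation marking an optional part; the second is a terminal symbol (a left-bracket lexeme).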


Review
Recursive-Descent Parsing — Example #2: More Complex [2/2]

Grammar 2
item  → “(” item “)”
      | thing
thing → ID { ( “,” | “:” ) ID }
      | “%”
      | [ “*” “-” ] “[” item “]”

In a parsing function:

[ … ] Brackets (optional: 0 or 1) become a conditional.
§ Check for the possible initial lexemes inside the brackets. If found, parse everything inside the brackets. Otherwise skip the brackets.

{ … } Braces (optional, repeatable: 0 or more) become a loop.
§ Loop body: Check for the possible initial lexemes inside the braces. If not found, then exit the loop, moving to just after the braces. If found, parse everything inside the braces, and then REPEAT.

TO DO
§ Write a Recursive-Descent parser based on Grammar 2.


Done. See rdparser2.cpp.
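The following hedged Lua sketch shows how those two rules might play out in a recognizer for Grammar 2; it is not the contents of rdparser2.cpp. The input format (an array of lexeme strings) and the identifier pattern used by the illustrative helper matchID are assumptions. The braces in the first option for thing become a while loop, and the brackets in the last option become an if.

```lua
-- Sketch: Recursive-Descent recognizer for Grammar 2.
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

-- If the current lexeme looks like an identifier, consume it and return true.
local function matchID()
  local lex = lexemes[pos]
  if lex ~= nil and lex:match("^[%a_][%w_]*$") then
    pos = pos + 1
    return true
  end
  return false
end

local parse_item  -- forward declaration

-- thing -> ID { ( "," | ":" ) ID } | "%" | [ "*" "-" ] "[" item "]"
local function parse_thing()
  if matchID() then
    -- Braces become a loop: keep going while "," or ":" comes next.
    while matchString(",") or matchString(":") do
      if not matchID() then
        return false
      end
    end
    return true
  end
  if matchString("%") then
    return true
  end
  -- Brackets become a conditional: if "*" is present, "-" must follow.
  if matchString("*") then
    if not matchString("-") then
      return false
    end
  end
  return matchString("[") and parse_item() and matchString("]")
end

-- item -> "(" item ")" | thing
function parse_item()
  if matchString("(") then
    return parse_item() and matchString(")")
  end
  return parse_thing()
end

-- True only if an item was parsed and all lexemes were consumed.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  return parse_item() and pos > #lexemes
end
```

Both example strings, ((a,b,c:d)) and ((*-[([%])])), are accepted once split into lexemes (with a, b, c, d as IDs), while *[%] is rejected because “-” must follow “*”.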


Review
Recursive-Descent Parsing — Example #3: Expressions [1/5]

Now we bump up our standards. We wish to parse arithmetic expressions in their usual form, with variables, numeric literals, binary +, -, *, and / operators, and parentheses. When given a syntactically correct expression, our parser should return an abstract syntax tree (AST).
§ All operators will be binary and left-associative, so that, for example, “a + b + c” means “(a + b) + c”.
§ Precedence will be as usual, so that “a + b * c” means “a + (b * c)”.
§ Precedence and associativity may be overridden using parentheses: “(a + b) * c”.

Due to the limitations of our lexer, the expression “k-4” will need to be rewritten as “k - 4”.


Review
Recursive-Descent Parsing — Example #3: Expressions [2/5]

We begin with the following grammar, with start symbol expr.

Grammar 3
expr   → term
       | expr ( “+” | “-” ) term
term   → factor
       | term ( “*” | “/” ) factor
factor → ID
       | NUMLIT
       | “(” expr “)”

Grammar 3 encodes our associativity and precedence rules.


Review
Recursive-Descent Parsing — Example #3: Expressions [3/5]

The code below is part of a parsing function for nonterminal expr.

Grammar 3
expr   → term
       | expr ( “+” | “-” ) term
term   → factor
       | term ( “*” | “/” ) factor
factor → ID
       | NUMLIT
       | “(” expr “)”

What is wrong with this code?


function parse_expr()
    if parse_term() then
        return true
    elseif parse_expr() then
        …


Review
Recursive-Descent Parsing — Example #3: Expressions [4/5]

function parse_expr()
    if parse_term() then
        return true
    elseif parse_expr() then
        …

What is wrong with this code?
§ First, if the call to parse_term returns false, then the position in the input may have changed. Fixing this requires backtracking, which can lead to extreme inefficiency.

§ But even if we can solve that, there is a more serious problem. Suppose parse_expr is called with input that does not begin with a valid term. What happens? Answer: infinite recursion!


Review
Recursive-Descent Parsing — Example #3: Expressions [5/5]

In fact, without lookahead, it is impossible to write a Recursive-Descent parser for Grammar 3.

Grammar 3
expr   → term
       | expr ( “+” | “-” ) term
term   → factor
       | term ( “*” | “/” ) factor
factor → ID
       | NUMLIT
       | “(” expr “)”

Recall that a Recursive-Descent parser requires an LL grammar. But Grammar 3 is not an LL grammar. Next we look at LL grammars. We return to the expression-parsing problem later.


Review
Recursive-Descent Parsing — LL Grammars: Properties [1/8]

An LL grammar is a CFG that can be handled by an LL parsing algorithm, such as Recursive Descent, if multiple-lexeme lookahead is not done.

Recall the origin of the name: these parsers handle their input in a strictly Left-to-right order, and they go through the steps required to generate a Leftmost derivation.

Now we look at some of the properties that an LL grammar must have.


Review
Recursive-Descent Parsing — LL Grammars: Properties [2/8]

Consider the following grammar.

Grammar A
xx → xx “+” “b” | “a”

A parsing function would begin:

function parse_xx()
    if parse_xx() then
        …

We have recursion without a base-case check.

The trouble lies in the grammar. The right-hand side of the production for xx begins with xx. This is left recursion. It is not allowed in an LL grammar.


Review
Recursive-Descent Parsing — LL Grammars: Properties [3/8]

Left recursion can be more subtle. Below is a variation on Grammar A.

Grammar Ax
xx → yy “b” | “a”
yy → xx “+”

Grammar Ax also contains left recursion: xx can derive yy “b”, which in turn derives xx “+” “b”. So Grammar Ax is not LL.


Review
Recursive-Descent Parsing — LL Grammars: Properties [4/8]

The grammar below illustrates a more general problem.

Grammar B
xx → “a” yy | “a” zz
yy → “*”
zz → “/”

We cannot even begin to write a Recursive-Descent parser for Grammar B. How would the code for function parse_xx begin? Should it take the first or the second option? There is no way to tell without lookahead.

We say the first production in Grammar B is not left-factored. An LL grammar can only contain left-factored productions.


Review
Recursive-Descent Parsing — LL Grammars: Properties [5/8]

Here is another problematic grammar.

Grammar C
xx → yy | zz
yy → “” | “a”
zz → “” | “b”

In Grammar C, the empty string can be derived from either yy or zz. So if there is no more input, then there is no basis for making the yy-or-zz decision in the first production.


Review
Recursive-Descent Parsing — LL Grammars: Properties [6/8]

One last non-LL grammar.

Grammar D
xx → yy “a”
yy → “a” | “”

The strings “a” and “aa” lie in the language generated by Grammar D. But imagine a Recursive-Descent parser based on Grammar D, attempting to parse these strings. What would happen?


Review
Recursive-Descent Parsing — LL Grammars: Properties [7/8]

It turns out that the problems presented by Grammars A–D illustrate all the reasons a CFG might not be LL.

Fact.* Suppose that a context-free grammar G has the following three properties.

1. If A → α and A → β are two different productions in G, then there do not exist two strings, one derived from α, the other derived from β, that begin with the same (terminal) symbol.

2. If A → α and A → β are two different productions in G, then it is not the case that the empty string can be derived from both α and β.

3. If A → α and A → β are two different productions in G, and the empty string can be derived from β, then there is no (terminal) symbol x that begins a string that can be derived from α, such that x can follow a string derived from A.

Then Grammar G is an LL grammar.

*Adapted from A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, 1986, p. 192.


(1) does not hold for Grammars A, Ax, and B; (2) does not hold for Grammar C; and (3) does not hold for Grammar D.


Review
Recursive-Descent Parsing — LL Grammars: Properties [8/8]

In addition:

Fact. Suppose that G is an LL grammar. Then,
§ G is not ambiguous, and
§ G does not contain left recursion.

In general, when there is a choice to be made, an LL parser must be able to make that choice based on the current lexeme. If this cannot be done, then the grammar is not LL.

Now suppose—as in our expression-parsing example—that we wish to write a Recursive-Descent parser, but our grammar is not LL. What can we do about this?


Recursive-Descent Parsing
LL Grammars — Transforming [1/5]

If a grammar is not LL, this does not mean that the grammar must be completely useless as a basis for a Recursive-Descent parser. We might be able to transform the grammar into an LL grammar that generates the same language.

For example, here is Grammar A, which is not LL, along with an LL grammar that generates the same language.

Grammar A
xx → xx “+” “b” | “a”


Grammar Aa
xx → “a” yy
yy → “” | “+” “b” yy
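As a sketch of why Grammar Aa is easy to handle, here is a Lua recognizer for it, assuming (hypothetically) that the input is an array of lexeme strings; parseAll is an illustrative name. Since yy can derive the empty string, parse_yy lets the current lexeme decide: “+” selects the second option, and anything else selects the empty option, which consumes nothing.

```lua
-- Sketch: recognizer for Grammar Aa (xx -> "a" yy ; yy -> "" | "+" "b" yy).
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

-- yy -> "" | "+" "b" yy
local function parse_yy()
  if matchString("+") then
    return matchString("b") and parse_yy()
  end
  return true  -- empty option: consume nothing
end

-- xx -> "a" yy
local function parse_xx()
  return matchString("a") and parse_yy()
end

-- True only if xx was parsed and all lexemes were consumed.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  return parse_xx() and pos > #lexemes
end
```

With this grammar there is no left recursion: every call to parse_yy either consumes “+” first or returns immediately.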


Recursive-Descent Parsing
LL Grammars — Transforming [2/5]

Grammar B, which is not LL, along with an LL grammar that generates the same language.

Grammar B
xx → “a” yy | “a” zz
yy → “*”
zz → “/”


Grammar Ba
xx → “a” yy
yy → “*” | “/”
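In Grammar Ba, the shared “a” is matched once, and only then does the current lexeme choose between “*” and “/”. A minimal Lua sketch, assuming (hypothetically) an array of lexeme strings as input; parseAll is an illustrative name:

```lua
-- Sketch: recognizer for Grammar Ba (xx -> "a" yy ; yy -> "*" | "/").
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

-- yy -> "*" | "/": the current lexeme alone picks the option.
-- A failed matchString consumes nothing, so trying "/" after "*" is safe.
local function parse_yy()
  return matchString("*") or matchString("/")
end

-- xx -> "a" yy: after left-factoring, the common "a" is matched once.
local function parse_xx()
  return matchString("a") and parse_yy()
end

-- True only if xx was parsed and all lexemes were consumed.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  return parse_xx() and pos > #lexemes
end
```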


Recursive-Descent Parsing
LL Grammars — Transforming [3/5]

Grammar C, which is not LL, along with an LL grammar that generates the same language.

Grammar C
xx → yy | zz
yy → “” | “a”
zz → “” | “b”


Grammar Ca
xx → yy | zz | “”
yy → “a”
zz → “b”
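With Grammar Ca, parse_xx can decide by looking at the current lexeme: “a” selects yy, “b” selects zz, and anything else (including no lexeme at all) selects the empty option. A minimal Lua sketch, assuming an array of lexeme strings as input (a hypothetical format); parseAll is an illustrative name:

```lua
-- Sketch: recognizer for Grammar Ca (xx -> yy | zz | "" ; yy -> "a" ; zz -> "b").
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local function parse_yy() return matchString("a") end  -- yy -> "a"
local function parse_zz() return matchString("b") end  -- zz -> "b"

-- xx -> yy | zz | ""
local function parse_xx()
  if lexemes[pos] == "a" then
    return parse_yy()
  elseif lexemes[pos] == "b" then
    return parse_zz()
  end
  return true  -- empty option: consume nothing
end

-- True only if xx was parsed and all lexemes were consumed.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  return parse_xx() and pos > #lexemes
end
```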


Recursive-Descent Parsing
LL Grammars — Transforming [4/5]

And Grammar D, which is not LL, along with an LL grammar that generates the same language.

Grammar D
xx → yy “a”
yy → “a” | “”


Grammar Da
xx → “a” yy
yy → “a” | “”
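For Grammar Da, once the first “a” is matched, parse_yy consumes a second “a” if one is present and otherwise takes the empty option, so no guessing is needed. A minimal Lua sketch, assuming an array of lexeme strings as input (a hypothetical format); parseAll is an illustrative name:

```lua
-- Sketch: recognizer for Grammar Da (xx -> "a" yy ; yy -> "a" | "").
-- Assumption (hypothetical): input is an array of lexeme strings.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

-- yy -> "a" | "": take "a" if it is there; otherwise take the empty option.
local function parse_yy()
  matchString("a")
  return true
end

-- xx -> "a" yy
local function parse_xx()
  return matchString("a") and parse_yy()
end

-- True only if xx was parsed and all lexemes were consumed.
function parseAll(lexList)
  lexemes, pos = lexList, 1
  return parse_xx() and pos > #lexemes
end
```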


Recursive-Descent Parsing
LL Grammars — Transforming [5/5]

It is not at all uncommon to be faced with a grammar that is not LL, but that can be transformed easily to one that is LL. In particular, this is common in the specification of programming-language syntax.

Note, however, that there are context-free languages that cannot be generated by any LL grammar at all.


Recursive-Descent Parsing
Back to Example #3: Expressions — Left-Associativity [1/3]

Now we return to our expression grammar. It is given below. Recall that this is not an LL grammar.

Grammar 3
expr   → term
       | expr ( “+” | “-” ) term
term   → factor
       | term ( “*” | “/” ) factor
factor → ID
       | NUMLIT
       | “(” expr “)”

An easy fix is to reorder the operands; for example, expr ( “+” | “-” ) term becomes term ( “+” | “-” ) expr. I will also use [ … ] to make the grammar more concise.


Recursive-Descent Parsing
Back to Example #3: Expressions — Left-Associativity [2/3]

Here is the resulting grammar. This is an LL grammar.

Grammar 3a
expr   → term [ ( “+” | “-” ) expr ]
term   → factor [ ( “*” | “/” ) term ]
factor → ID
       | NUMLIT
       | “(” expr “)”

But now we have a new problem: Grammar 3a is LL, but it encodes right-associative binary operators. We want our operators to be left-associative.

Fortunately, all is not lost …
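To see the associativity problem concretely, here is a hedged Lua sketch of a parser for Grammar 3a that returns a tree of the form { operator, left, right }, with leaves as strings. That tree format, the array-of-lexeme-strings input, and the isOperand patterns (standing in for the lexer's ID and NUMLIT categories) are assumptions made for illustration. Because the recursion in expr and term is on the right, a - b - c is grouped as a - (b - c).

```lua
-- Sketch: parser for Grammar 3a that builds { op, left, right } trees.
-- Assumptions (hypothetical): input is an array of lexeme strings; an ID or
-- NUMLIT is any lexeme matching the simple patterns in isOperand.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local function isOperand(lex)
  return lex ~= nil and (lex:match("^[%a_][%w_]*$") ~= nil
                         or lex:match("^%d+%.?%d*$") ~= nil)
end

local parse_expr  -- forward declaration

-- factor -> ID | NUMLIT | "(" expr ")"
local function parse_factor()
  local lex = lexemes[pos]
  if isOperand(lex) then
    pos = pos + 1
    return true, lex
  end
  if matchString("(") then
    local good, ast = parse_expr()
    if good and matchString(")") then
      return true, ast
    end
  end
  return false, nil
end

-- term -> factor [ ( "*" | "/" ) term ]   (recursion on the right)
local function parse_term()
  local good, left = parse_factor()
  if not good then return false, nil end
  local op = lexemes[pos]
  if matchString("*") or matchString("/") then
    local good2, right = parse_term()
    if not good2 then return false, nil end
    return true, { op, left, right }
  end
  return true, left
end

-- expr -> term [ ( "+" | "-" ) expr ]   (recursion on the right)
function parse_expr()
  local good, left = parse_term()
  if not good then return false, nil end
  local op = lexemes[pos]
  if matchString("+") or matchString("-") then
    local good2, right = parse_expr()
    if not good2 then return false, nil end
    return true, { op, left, right }
  end
  return true, left
end

-- Returns whether the whole input is a valid expr, and the tree.
function parse(lexList)
  lexemes, pos = lexList, 1
  local good, ast = parse_expr()
  return good and pos > #lexemes, ast
end
```

Here parse({"a", "-", "b", "-", "c"}) returns true and { "-", "a", { "-", "b", "c" } }, which would evaluate as a - (b - c), not the intended (a - b) - c.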


Recursive-Descent Parsing
Back to Example #3: Expressions — Left-Associativity [3/3]

Here is how we do it.

Grammar 3b
expr   → term { ( “+” | “-” ) term }
term   → factor { ( “*” | “/” ) factor }
factor → ID
       | NUMLIT
       | “(” expr “)”

Grammar 3b is what we want. It is LL, and we can use it to parse left-associative binary operators.

However, we still need to generate an AST.

function parse_expr()
    if not parse_term() then
        return false
    end
    while true do
        if not matchString("+")
          and not matchString("-") then
            break
        end
        if not parse_term() then
            return false
        end
    end
    return true
end
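For context, the parse_expr fragment above can be completed into a full recognizer for Grammar 3b. This hedged Lua sketch is not the course's code: the input format (an array of lexeme strings), the behavior of matchString (consume the current lexeme if it equals the given string), and isOperand, which stands in for the lexer's ID and NUMLIT categories, are all assumptions. The while condition in each loop is equivalent to the if-not-and-not-then-break test above.

```lua
-- Sketch: full recognizer for Grammar 3b (no AST yet).
-- Assumptions (hypothetical): input is an array of lexeme strings; isOperand
-- stands in for the lexer's ID and NUMLIT categories.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local function isOperand(lex)
  return lex ~= nil and (lex:match("^[%a_][%w_]*$") ~= nil
                         or lex:match("^%d+%.?%d*$") ~= nil)
end

local parse_expr  -- forward declaration

-- factor -> ID | NUMLIT | "(" expr ")"
local function parse_factor()
  if isOperand(lexemes[pos]) then
    pos = pos + 1
    return true
  end
  if matchString("(") then
    return parse_expr() and matchString(")")
  end
  return false
end

-- term -> factor { ( "*" | "/" ) factor }: braces become a loop
local function parse_term()
  if not parse_factor() then return false end
  while matchString("*") or matchString("/") do
    if not parse_factor() then return false end
  end
  return true
end

-- expr -> term { ( "+" | "-" ) term }
function parse_expr()
  if not parse_term() then return false end
  while matchString("+") or matchString("-") do
    if not parse_term() then return false end
  end
  return true
end

-- True only if an expr was parsed and all lexemes were consumed.
function recognize(lexList)
  lexemes, pos = lexList, 1
  return parse_expr() and pos > #lexemes
end
```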


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [1/7]

We want to write a parser that returns an abstract syntax tree (AST). First, we need to specify the format of an AST.

Recall that a parse tree, or concrete syntax tree, includes one leaf node for each lexeme in the input, and one non-leaf node for each nonterminal in the derivation.

However, an AST is more sparse. For example, below are reasonable ASTs for the expressions a + 2 and (a + 2) * b.

Lexemes that only guide parsing are omitted from an AST: semicolons to end statements, parentheses in expressions, etc.


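[Figure: Two ASTs. For a + 2: root +, with children a and 2. For (a + 2) * b: root *, with children + and b; the + node has children a and 2.]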


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [2/7]

We need to represent these trees in Lua. Represent a single lexeme by its string form. If there is more than one node in an AST, then represent it as an array whose first item represents the root node and whose remaining items each represent one of the subtrees rooted at the child nodes, in order.

Here is the first AST above in Lua: { "+", "a", "2" }

And here is the second AST: { "*", { "+", "a", "2" }, "b" }
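A small helper can make the grouping in this representation visible. This Lua sketch (astToString is a hypothetical name) assumes that every non-leaf node has exactly one operator and two subtrees, which holds for the binary-operator ASTs described here.

```lua
-- Sketch: turn an AST in the nested-array format into a fully
-- parenthesized string, to make the grouping visible.
-- Assumption: every non-leaf node is { op, left, right } (binary operators).
function astToString(ast)
  if type(ast) == "string" then
    return ast  -- leaf: the lexeme's string form
  end
  -- ast[1] is the operator at the root; ast[2] and ast[3] are the subtrees.
  return "(" .. astToString(ast[2]) .. " " .. ast[1] .. " "
             .. astToString(ast[3]) .. ")"
end
```

For example, astToString({ "*", { "+", "a", "2" }, "b" }) returns "((a + 2) * b)". The parentheses appear in the output even though the AST does not store them.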


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [3/7]

It is better to describe our ASTs in a way that does not require drawings of trees. So we specify the format of an AST for each line in our grammar.

Grammar 3b
(1) expr   → term { ( “+” | “-” ) term }
(2) term   → factor { ( “*” | “/” ) factor }
(3) factor → ID
(4)        | NUMLIT
(5)        | “(” expr “)”

(1) expr. If there is only a term, then the AST for the expr is the AST for the term. Otherwise, the AST is { OO, AA, BB }, where OO is the string form of the last operator, AA is the AST for everything before it, and BB is the AST for the last term.


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [4/7]

A term is handled similarly.

Grammar 3b
(1) expr   → term { ( “+” | “-” ) term }
(2) term   → factor { ( “*” | “/” ) factor }
(3) factor → ID
(4)        | NUMLIT
(5)        | “(” expr “)”

(2) term. If there is only a factor, then the AST for the term is the AST for the factor. Otherwise, the AST is { OO, AA, BB }, where OO is the string form of the last operator, AA is the AST for everything before it, and BB is the AST for the last factor.


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [5/7]

A factor has multiple options.

Grammar 3b
(1) expr   → term { ( “+” | “-” ) term }
(2) term   → factor { ( “*” | “/” ) factor }
(3) factor → ID
(4)        | NUMLIT
(5)        | “(” expr “)”

(3) factor: ID. AST for the factor: string form of the ID.
(4) factor: NUMLIT. AST for the factor: string form of the NUMLIT.
(5) factor: “(” expr “)”. AST for the factor: AST for the expr.


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [6/7]

Applying the various rules, the AST for (a + 2) * b is { "*", { "+", "a", "2" }, "b" }

Each parsing function can now return a pair: a boolean and an AST. The boolean indicates a correct parse, as before. The AST is only valid if the boolean is true, in which case it will be in the specified format.


Recursive-Descent Parsing
Back to Example #3: Expressions — ASTs [7/7]

Grammar 3b
(1) expr   → term { ( “+” | “-” ) term }
(2) term   → factor { ( “*” | “/” ) factor }
(3) factor → ID
(4)        | NUMLIT
(5)        | “(” expr “)”

(1) expr. If there is only a term, then the AST for the expr is the AST for the term. Otherwise, the AST is { OO, AA, BB }, where OO is the string form of the last operator, AA is the AST for everything before it, and BB is the AST for the last term.

(2) term. Similar to (1).
(3) factor: ID. AST for the factor: string form of the ID.
(4) factor: NUMLIT. AST for the factor: string form of the NUMLIT.
(5) factor: “(” expr “)”. AST for the factor: AST for the expr.

TO DO
§ Based on Grammar 3b, write a Recursive-Descent parser that produces an AST, as described.

Done. See rdparser3.cpp.
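Here is a hedged Lua sketch of such a parser; it is not the contents of rdparser3.cpp. It assumes an array of lexeme strings as input and uses isOperand as a stand-in for the lexer's ID and NUMLIT categories; parse_binary is a hypothetical helper shared by rules (1) and (2). Each parsing function returns good and ast, and the loop makes the tree built so far the left child of each new operator node, which gives left-associativity.

```lua
-- Sketch: Recursive-Descent parser for Grammar 3b that returns good, ast.
-- Assumptions (hypothetical): input is an array of lexeme strings, and
-- isOperand stands in for the lexer's ID and NUMLIT categories.
local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local function isOperand(lex)
  return lex ~= nil and (lex:match("^[%a_][%w_]*$") ~= nil
                         or lex:match("^%d+%.?%d*$") ~= nil)
end

local parse_expr  -- forward declaration

-- (3)-(5) factor
local function parse_factor()
  local lex = lexemes[pos]
  if isOperand(lex) then
    pos = pos + 1
    return true, lex                -- (3), (4): string form of the lexeme
  end
  if matchString("(") then
    local good, ast = parse_expr()
    if good and matchString(")") then
      return true, ast              -- (5): AST for the inner expr
    end
  end
  return false, nil
end

-- Shared loop for (1) and (2): operand { op operand }, built left-associatively.
local function parse_binary(parse_operand, op1, op2)
  local good, ast = parse_operand()
  if not good then return false, nil end
  while true do
    local op = lexemes[pos]
    if not (matchString(op1) or matchString(op2)) then break end
    local good2, right = parse_operand()
    if not good2 then return false, nil end
    ast = { op, ast, right }        -- tree so far becomes the left child
  end
  return true, ast
end

-- (2) term -> factor { ( "*" | "/" ) factor }
local function parse_term()
  return parse_binary(parse_factor, "*", "/")
end

-- (1) expr -> term { ( "+" | "-" ) term }
function parse_expr()
  return parse_binary(parse_term, "+", "-")
end

-- Returns whether the whole input is a valid expr, and the AST.
function parse(lexList)
  lexemes, pos = lexList, 1
  local good, ast = parse_expr()
  return good and pos > #lexemes, ast
end
```

With this sketch, parse({"(", "a", "+", "2", ")", "*", "b"}) returns true and { "*", { "+", "a", "2" }, "b" }, and a - b - c becomes { "-", { "-", "a", "b" }, "c" }. Factoring the loop into parse_binary is a design choice; writing the loop out separately in parse_expr and parse_term works equally well.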


Recursive-Descent Parsing
Example #4: Better ASTs [1/4]

The ASTs we have specified are not quite what we want.

We need to know whether each node represents an operator, an identifier, etc. The lexer already figured this out, but then we did not store this information in the AST.

And there is other information we could store. For example, in many PLs, “-” can be either a binary operator (a - b) or a unary operator (-x). The lexer does not know which it is. But the parser knows, and the parser could return this information to its caller.


Recursive-Descent Parsing
Example #4: Better ASTs [2/4]

To give the caller additional information, we mark each node in the AST, indicating what kind of entity it is. So far, we have three kinds of nodes: binary operators, identifiers, and numeric literals. So we mark each node as being one of these three.


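[Figure: The AST for (a + 2) * b, first unmarked (root *, children + and b; + has children a and 2), then with each node marked by kind: binOp: * with children binOp: + and id: b; binOp: + has children id: a and numLit: 2.]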


Recursive-Descent Parsing
Example #4: Better ASTs [3/4]

In the Lua form of our ASTs, we can replace each string with a two-item array. The first item in the array will be one of three constants: BIN_OP, ID_VAL, or NUMLIT_VAL. The second item will be the string form of the lexeme.

"/"   → { BIN_OP, "/" }
"abc" → { ID_VAL, "abc" }
"123" → { NUMLIT_VAL, "123" }

So the AST for a + 2 changes as shown below.

{ "+", "a", "2" }
→ { { BIN_OP, "+" }, { ID_VAL, "a" }, { NUMLIT_VAL, "2" } }


Recursive-Descent Parsing
Example #4: Better ASTs [4/4]

"/"   → { BIN_OP, "/" }
"abc" → { ID_VAL, "abc" }
"123" → { NUMLIT_VAL, "123" }

TO DO
§ Rewrite the Recursive-Descent parser based on Grammar 3b, so that it produces the improved ASTs.


Done. See rdparser4.cpp.
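A possible Lua sketch of the improved parser follows; it is not the contents of rdparser4.cpp. The slides name BIN_OP, ID_VAL, and NUMLIT_VAL but do not give their values, so the numbers below are arbitrary assumptions. The input is assumed to be an array of lexeme strings, and ID versus NUMLIT is decided by simple patterns rather than by a real lexer; parse_binary is a hypothetical helper.

```lua
-- Sketch: Grammar 3b parser producing the "better" ASTs.
-- Assumptions (hypothetical): input is an array of lexeme strings; the values
-- of BIN_OP, NUMLIT_VAL, ID_VAL are arbitrary but distinct; ID vs. NUMLIT is
-- decided by simple patterns instead of a real lexer.
BIN_OP = 1
NUMLIT_VAL = 2
ID_VAL = 3

local lexemes, pos = {}, 1

-- If the current lexeme is s, consume it and return true.
local function matchString(s)
  if lexemes[pos] == s then
    pos = pos + 1
    return true
  end
  return false
end

local parse_expr  -- forward declaration

-- factor: leaves become { ID_VAL, str } or { NUMLIT_VAL, str }
local function parse_factor()
  local lex = lexemes[pos]
  if lex ~= nil and lex:match("^[%a_][%w_]*$") then
    pos = pos + 1
    return true, { ID_VAL, lex }
  end
  if lex ~= nil and lex:match("^%d+%.?%d*$") then
    pos = pos + 1
    return true, { NUMLIT_VAL, lex }
  end
  if matchString("(") then
    local good, ast = parse_expr()
    if good and matchString(")") then
      return true, ast
    end
  end
  return false, nil
end

-- operand { op operand }; each operator node is { { BIN_OP, op }, left, right }
local function parse_binary(parse_operand, op1, op2)
  local good, ast = parse_operand()
  if not good then return false, nil end
  while true do
    local op = lexemes[pos]
    if not (matchString(op1) or matchString(op2)) then break end
    local good2, right = parse_operand()
    if not good2 then return false, nil end
    ast = { { BIN_OP, op }, ast, right }
  end
  return true, ast
end

-- term -> factor { ( "*" | "/" ) factor }
local function parse_term()
  return parse_binary(parse_factor, "*", "/")
end

-- expr -> term { ( "+" | "-" ) term }
function parse_expr()
  return parse_binary(parse_term, "+", "-")
end

-- Returns whether the whole input is a valid expr, and the AST.
function parse(lexList)
  lexemes, pos = lexList, 1
  local good, ast = parse_expr()
  return good and pos > #lexemes, ast
end
```

With this sketch, parse({"a", "+", "2"}) returns true and { { BIN_OP, "+" }, { ID_VAL, "a" }, { NUMLIT_VAL, "2" } }, matching the format given earlier for a + 2.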
