Parsing Expression Grammars: A Recognition-Based Syntactic Foundation Bryan Ford Massachusetts Institute of Technology January 14, 2004

Designing a Language Syntax

Textbook Method:

1. Formalize syntax via context-free grammar
2. Write a YACC parser specification
3. Hack on grammar until "near-LALR(1)"
4. Use generated parser

Pragmatic Method:

1. Specify syntax informally
2. Write a recursive descent parser

What exactly does a CFG describe?

Short answer: a rule system to generate language strings

Example CFG:

S → aaS
S → ε

Start symbol: S

Derivation: S ⇒ aaS ⇒ aaaaS ⇒ ...

Output strings: ε, aa, aaaa, ...
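The generative reading of the example CFG can be sketched in Python: start from the start symbol, repeatedly expand the leftmost nonterminal, and collect every form that contains only terminals. This is an illustrative sketch rather than anything from the slides; the names `RULES` and `generate` are assumptions, and breadth-first order is used only so that the shortest strings come out first.

```python
from collections import deque

# S -> aaS | epsilon (the empty string stands for epsilon)
RULES = {"S": ["aaS", ""]}

def generate(start="S", limit=4):
    """Return the first `limit` terminal strings derivable from `start`."""
    out = []
    queue = deque([start])
    while queue and len(out) < limit:
        form = queue.popleft()
        # Find the leftmost nonterminal in the sentential form, if any.
        idx = next((k for k, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            out.append(form)          # only terminals left: an output string
            continue
        for rhs in RULES[form[idx]]:  # apply every rule for that nonterminal
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return out

print(generate())  # ['', 'aa', 'aaaa', 'aaaaaa']
```

Note that the grammar says nothing about how to recognize a given string; it only says how strings are produced.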

What exactly do we want to describe?

Proposed answer: a rule system to recognize language strings

Parsing Expression Grammar (PEG) models recursive descent parsing practice

Example PEG:

S ← aaS / ε

[Figure: the input string "a a a a" and the structure derived from it, a nest of S nodes.]
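Read as a recognizer, the example PEG maps directly onto one recursive function. The sketch below is a minimal illustration under that reading; the function name `parse_S` and the convention of returning the end position of the matched prefix are assumptions made here.

```python
def parse_S(text, pos=0):
    """S <- 'aa' S / epsilon. Returns the position just past the matched prefix."""
    if text.startswith("aa", pos):
        # First alternative: 'aa' followed by S. S can never fail
        # (it can always match epsilon), so no backtracking is needed here.
        return parse_S(text, pos + 2)
    # Second alternative: epsilon, which consumes nothing.
    return pos

print(parse_S("aaaa"))  # 4: the whole input is consumed
print(parse_S("aaa"))   # 2: only the prefix "aa" is consumed
```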

Take-Home Points

Key benefits of PEGs:

● Simplicity, formalism, analyzability of CFGs

● Closer match to syntax practices

– More expressive than deterministic CFGs (LL/LR)
– More of the "right kind" of expressiveness: prioritized choice, greedy rules, syntactic predicates
– Unlimited lookahead, backtracking

● Linear-time parsing for any PEG

What kind of recursive descent parsing?

Key assumptions:

● Parsing functions are stateless: depend only on input string

● Parsing functions make decisions locally: return at most one result (success/failure)

Parsing Expression Grammars

Consists of: (Σ, N, R, e_S)

– Σ: finite set of terminals (character set)
– N: finite set of nonterminals
– R: finite set of rules of the form "A ← e", where A ∈ N, e is a parsing expression.
– e_S: a parsing expression called the start expression.

Parsing Expressions

ε             the empty string
a             terminal (a ∈ Σ)
A             nonterminal (A ∈ N)
e1 e2         a sequence of parsing expressions
e1 / e2       prioritized choice between alternatives
e?, e*, e+    optional, zero-or-more, one-or-more
&e, !e        syntactic predicates
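Each form of parsing expression listed above can be modeled as a small function from (text, position) to either a new position or failure. The following is a sketch under that assumption, not the slides' own implementation; all function names (`empty`, `term`, `ref`, `seq`, `choice`, `star`, `opt`, `plus`, `and_`, `not_`) are chosen here for illustration, and terminals are generalized to literal strings for convenience.

```python
# A parsing expression is a function (text, pos) -> new pos, or None on failure.

def empty():                       # epsilon: succeeds without consuming input
    return lambda s, i: i

def term(a):                       # terminal (generalized to a literal string)
    return lambda s, i: i + len(a) if s.startswith(a, i) else None

def ref(rules, name):              # nonterminal, looked up lazily to allow recursion
    return lambda s, i: rules[name](s, i)

def seq(*es):                      # e1 e2 ...: each must match in turn
    def p(s, i):
        for e in es:
            i = e(s, i)
            if i is None:
                return None
        return i
    return p

def choice(*es):                   # e1 / e2: first alternative that matches wins
    def p(s, i):
        for e in es:
            j = e(s, i)
            if j is not None:
                return j
        return None
    return p

def star(e):                       # e*: greedy, never gives input back
    def p(s, i):
        while True:
            j = e(s, i)
            if j is None or j == i:   # stop on failure (or on an empty match)
                return i
            i = j
    return p

def opt(e):                        # e?  is  e / epsilon
    return choice(e, empty())

def plus(e):                       # e+  is  e e*
    return seq(e, star(e))

def and_(e):                       # &e: succeeds iff e does, consumes nothing
    return lambda s, i: i if e(s, i) is not None else None

def not_(e):                       # !e: succeeds iff e fails, consumes nothing
    return lambda s, i: i if e(s, i) is None else None

rules = {}
rules["S"] = choice(seq(term("aa"), ref(rules, "S")), empty())   # S <- aaS / epsilon
print(rules["S"]("aaaa", 0))       # 4
print(term("bad")("badder", 0))    # 3: matches the prefix "bad"
```

The `j == i` guard in `star` is a practical safeguard added here; it keeps the loop from spinning when the repeated expression matches without consuming input.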

How PEGs Express Languages

Given input string s, a parsing expression either:

– Matches and consumes a prefix s' of s.
– Fails on s.

Example:

S ← bad

S matches "badder"
S matches "baddest"
S fails on "abad"
S fails on "babe"

Prioritized Choice with Backtracking

S ← A / B means:

"To parse an S, first try to parse an A. If A fails, then backtrack and try to parse a B."

Example:

S ← if C then S else S /
    if C then S

S matches "if C then S foo"
S matches "if C then S1 else S2"
S fails on "if C else S"
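The backtracking behavior of the if/then/else example can be sketched over a list of tokens. This is a hedged illustration: the slide's rule has no base case, so a third alternative matching the hypothetical token `stmt` is added here to make the recognizer runnable, and the helper names `expect`, `if_then`, and `parse_S` are assumptions.

```python
def expect(toks, i, word):
    """Match one token; returns the next index, or None on failure."""
    return i + 1 if i is not None and i < len(toks) and toks[i] == word else None

def if_then(toks, i):
    """Shared prefix: if C then S."""
    i = expect(toks, i, "if")
    i = expect(toks, i, "C")
    i = expect(toks, i, "then")
    return parse_S(toks, i) if i is not None else None

def parse_S(toks, i=0):
    # Alternative 1: if C then S else S
    j = if_then(toks, i)
    j = expect(toks, j, "else")
    j = parse_S(toks, j) if j is not None else None
    if j is not None:
        return j
    # Alternative 2: backtrack to position i and try: if C then S
    j = if_then(toks, i)
    if j is not None:
        return j
    # Alternative 3 (assumption, not on the slide): an atomic statement
    return expect(toks, i, "stmt")

print(parse_S("if C then stmt else stmt".split()))  # 6
print(parse_S("if C then stmt foo".split()))        # 4: "foo" is left unconsumed
print(parse_S("if C else stmt".split()))            # None
```

Because the first alternative is always tried first, a dangling `else` attaches to the nearest `if`. Re-parsing the shared prefix after a failed first alternative is also what makes naive backtracking expensive on deeply nested input.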

Example from the C++ standard:

"An expression-statement ... can be indistinguishable from a declaration ... In those cases the statement is a declaration."

statement ← declaration / expression-statement

Greedy Option and Repetition

A ← e? is equivalent to A ← e / ε
A ← e* is equivalent to A ← e A / ε
A ← e+ is equivalent to A ← e e*

Example:

I ← L+
L ← a / b / c / ...

I matches "foobar"
I matches "foo(bar)"
I fails on "123"
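The identifier example, and the consequence of greediness, can be sketched as follows. This is an illustration only: `L` is assumed here to mean any ASCII letter, and the names `letter`, `plus`, `identifier`, and `letters_then_letter` are hypothetical.

```python
import string

def letter(s, i):
    """L <- a / b / c / ... (assumed here to mean any ASCII letter)."""
    return i + 1 if i < len(s) and s[i] in string.ascii_letters else None

def plus(e):
    """e+: match e once, then keep matching greedily; input is never given back."""
    def p(s, i):
        i = e(s, i)
        if i is None:
            return None
        while True:
            j = e(s, i)
            if j is None:
                return i
            i = j
    return p

identifier = plus(letter)              # I <- L+

def letters_then_letter(s, i):
    """L+ L: can never succeed, because L+ has already consumed every letter."""
    i = identifier(s, i)
    return letter(s, i) if i is not None else None

print(identifier("foobar", 0))         # 6
print(identifier("foo(bar)", 0))       # 3: stops at "("
print(identifier("123", 0))            # None
print(letters_then_letter("foo", 0))   # None
```

The last call shows how PEG repetition differs from a regular expression such as `[a-z]+[a-z]`, which would backtrack one letter and match.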

Syntactic Predicates

And-predicate: &e succeeds whenever e does, but consumes no input [Parr '94, '95]

Not-predicate: !e succeeds whenever e fails

Example:

A ← foo &(bar)
B ← foo !(bar)

A matches "foobar"
A fails on "foobie"
B matches "foobie"
B fails on "foobar"
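A small sketch of the two predicates, using the A and B rules from the example. The helper names `lit`, `seq`, `and_`, and `not_` are assumptions made for this illustration.

```python
def lit(word):
    """Match a literal string at the current position."""
    return lambda s, i: i + len(word) if s.startswith(word, i) else None

def seq(*es):
    """Match each expression in turn, failing if any one fails."""
    def p(s, i):
        for e in es:
            i = e(s, i)
            if i is None:
                return None
        return i
    return p

def and_(e):
    """&e: succeed iff e succeeds here, but return the original position."""
    return lambda s, i: i if e(s, i) is not None else None

def not_(e):
    """!e: succeed iff e fails here; also consumes nothing."""
    return lambda s, i: i if e(s, i) is None else None

A = seq(lit("foo"), and_(lit("bar")))   # A <- foo &(bar)
B = seq(lit("foo"), not_(lit("bar")))   # B <- foo !(bar)

print(A("foobar", 0))   # 3: "bar" is checked but not consumed
print(A("foobie", 0))   # None
print(B("foobie", 0))   # 3
print(B("foobar", 0))   # None
```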

Example (nested comments):

C ← B I* E
I ← !E (C / T)
B ← (*
E ← *)
T ← [any terminal]

– B: begin marker
– I*: internal elements
– E: end marker
– I ← !E (C / T): only if an end marker doesn't start here, consume a nested comment, or else consume any single character

C matches "(*ab*)cd"
C matches "(*a(*b*)c*)"
C fails on "(*a(*b*)"
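The nested-comment grammar translates almost line for line into two mutually recursive functions. This is a minimal sketch; the function names `parse_C` and `parse_I` are assumptions, and the marker and any-character rules are inlined rather than given their own functions.

```python
def parse_C(s, i=0):
    """C <- B I* E, with B = "(*" and E = "*)"."""
    if not s.startswith("(*", i):        # B: begin marker
        return None
    i += 2
    while True:                          # I*: greedy repetition
        j = parse_I(s, i)
        if j is None:
            break
        i = j
    return i + 2 if s.startswith("*)", i) else None   # E: end marker

def parse_I(s, i):
    """I <- !E (C / T)."""
    if s.startswith("*)", i):            # !E: refuse if an end marker starts here
        return None
    j = parse_C(s, i)                    # C: try a nested comment first
    if j is not None:
        return j
    return i + 1 if i < len(s) else None # T: otherwise any single character

print(parse_C("(*ab*)cd"))      # 6: the trailing "cd" is not consumed
print(parse_C("(*a(*b*)c*)"))   # 11
print(parse_C("(*a(*b*)"))      # None: the outer comment is never closed
```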

Unified Grammars

PEGs can express both lexical and hierarchical syntax of realistic languages in one grammar

● Example (in paper): complete self-describing PEG in 2/3 column

● Example (on web): unified PEG for Java language

Lexical/Hierarchical Interplay

Unified grammars create new design opportunities

Example:

To get Unicode "∀", instead of "\u2200", write "\(0x2200)" or "\(8704)" or "\(FOR_ALL)"

E ← S / ( E ) / ...         general-purpose expression syntax
S ← " C* "                  string literals
C ← \( E ) / !" !\ T        quotable characters
T ← [any terminal]

Formal Properties of PEGs

● Express all deterministic languages: LR(k)
● Closed under union, intersection, complement
● Some non-context-free languages, e.g., a^n b^n c^n
● Undecidable whether L(G) = ∅
● Predicate operators can be eliminated
  – ...but the process is non-trivial!
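As an illustration of the non-context-free claim, the language a^n b^n c^n (n ≥ 1) is commonly written as the PEG S ← &(A c) a+ B !. with A ← a A? b and B ← b B? c, where `.` is any character. That grammar is not shown on the slide; it and the function names below are supplied here as an assumption-labeled sketch. The and-predicate checks that the a's and b's balance without consuming them, and B then checks that the b's and c's balance.

```python
def parse_A(s, i):
    """A <- 'a' A? 'b'  (matches a^n b^n, n >= 1)."""
    if not s.startswith("a", i):
        return None
    i += 1
    j = parse_A(s, i)          # A? : optional nested A
    if j is not None:
        i = j
    return i + 1 if s.startswith("b", i) else None

def parse_B(s, i):
    """B <- 'b' B? 'c'  (matches b^n c^n, n >= 1)."""
    if not s.startswith("b", i):
        return None
    i += 1
    j = parse_B(s, i)          # B? : optional nested B
    if j is not None:
        i = j
    return i + 1 if s.startswith("c", i) else None

def recognize(s):
    """S <- &(A 'c') 'a'+ B !.  Returns True iff s is a^n b^n c^n, n >= 1."""
    j = parse_A(s, 0)
    if j is None or not s.startswith("c", j):   # &(A 'c'): lookahead only
        return False
    i = 0
    while s.startswith("a", i):                 # 'a'+ (at least one 'a' is guaranteed by A)
        i += 1
    return parse_B(s, i) == len(s)              # B, then !. (end of input)

print(recognize("aabbcc"))   # True
print(recognize("aabbc"))    # False
print(recognize("aabcc"))    # False
```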

Minimalist Forms

Predicate-free PEG ⇩ TS [Birman '70/'73], TDPL [Aho '72]

A ← ε
A ← a
A ← f
A ← BC / D

Any PEG ⇩ gTS [Birman '70/'73], GTDPL [Aho '72]

A ← ε
A ← a
A ← f
A ← B[C, D]

TDPL ⇦⇨ GTDPL

Formal Contributions

● Generalize TDPL/GTDPL with more expressive structured parsing expression syntax
● Negative syntactic predicate: !e
● Predicate elimination transformation
  – Intermediate stages depend on generalized parsing expressions
● Proof of equivalence of TDPL and GTDPL

What can't PEGs express directly?

● Ambiguous languages
  That's what CFGs were designed for!
● Globally disambiguated languages?
  – {a,b}^n a {a,b}^n ?
● State- or semantic-dependent syntax
  – C, C++ typedef symbol tables
  – Python, Haskell, ML layout

Generating Parsers from PEGs

Recursive-descent parsing
☞ Simple & direct, but exponential-time if not careful

Packrat parsing [Birman '70/'73, Ford '02]
☞ Linear-time, but can consume substantial storage

Classic LL/LR algorithms?
☞ Grammar restrictions, but both time- & space-efficient
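The difference between naive backtracking and packrat parsing can be sketched by counting calls to a parsing function with and without memoization. The grammar used here, P ← ( P ) a / ( P ) / x, is a hypothetical example chosen because its first alternative always fails late on the test input; the name `count_calls` and the use of `functools.lru_cache` as the memo table are likewise choices made for this illustration.

```python
from functools import lru_cache

def count_calls(text, memoize):
    """Parse `text` with P <- '(' P ')' 'a' / '(' P ')' / 'x'.

    Returns (end position or None, number of times the rule body ran).
    """
    calls = 0

    def raw(i):
        nonlocal calls
        calls += 1
        for tail in ("a", ""):                 # alternatives 1 and 2
            if text.startswith("(", i):
                j = parse(i + 1)               # nested P
                if j is not None and text.startswith(")" + tail, j):
                    return j + 1 + len(tail)
        return i + 1 if text.startswith("x", i) else None   # alternative 3

    # Packrat: cache the result of the rule at each input position.
    parse = lru_cache(maxsize=None)(raw) if memoize else raw
    return parse(0), calls

print(count_calls("(((x)))", memoize=False))   # (7, 15)
print(count_calls("(((x)))", memoize=True))    # (7, 4)
```

Without memoization each nesting level parses its contents twice, so the call count roughly doubles per level; with memoization the rule body runs once per position, at the cost of storing one result per (rule, position) pair.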

Conclusion

PEGs model common parsing practices
– Prioritized choice, greedy rules, syntactic predicates

PEGs naturally complement CFGs
– CFG: generative system, for ambiguous languages
– PEG: recognition-based, for unambiguous languages

For more info: http://pdos.lcs.mit.edu/~baford/packrat (or Google for "Packrat Parsing")

