ANTLR v3 Overview (for ANTLR v2 users)

Date post: 06-Feb-2016

Terence Parr

University of San Francisco

Topics

Information flow
v3 grammars
Error recovery
Attributes
Tree construction
Tree grammars
Code generation
Internationalization
Runtime support

Block Info Flow Diagram

Grammar Syntax

    header {…}
    /** doc comment */
    kind grammar name;
    options {…}
    tokens {…}
    scopes…
    action
    rules…

    /** doc comment */
    rule[String s, int z] returns [int x, int y] throws E
      options {…}
      scopes
      init {…}
      :
      |
      ;
      exceptions

Trees: ^(root child1 … childN)

Note: No inheritance

Grammar improvements

Single element EBNF like ID*
Combined parser/lexer
Allows 'c' and "literal" literals
Multiple parameters, return values
Labels do not have to be unique

    (x=ID|x=INT) {…$x…}

For combined grammars, warns when tokens are not defined

Example Grammar

    grammar SimpleParser;
    program : variable* method+ ;
    variable: "int" ID ('=' expr)? ';' ;
    method : "method" ID '(' ')' '{' variable* statement+ '}' ;
    statement
        : ID '=' expr ';'
        | "return" expr ';'
        ;
    expr : ID | INT ;
    ID : ('a'..'z'|'A'..'Z')+ ;
    INT : '0'..'9'+ ;
    WS : (' '|'\t'|'\n')+ {channel=99;} ;

Using the parser

    import org.antlr.runtime.*;

    CharStream in = new ANTLRFileStream("inputfile");
    SimpleParserLexer lexer = new SimpleParserLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    SimpleParser p = new SimpleParser(tokens);
    p.program(); // invoke start rule

Improved grammar warnings

they happen less often ;)
internationalized (templates again!)
gives (smallest) sample input sequence
better recursion warnings

Recursion Warnings

    a : a A | B ;

t.g:2:5: Alternative 1 discovers infinite left-recursion to a from a

    // with -Im 0 (secret internal parameter)
    a : b | B ;
    b : c ;
    c : B b ;

t.g:2:5: Alternative 1: after matching input such as B decision cannot predict what comes next due to recursion overflow to c from b

Nondeterminisms

    a : (A B|A B) C ;

t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
t.g:2:5: The following alternatives are unreachable: 2

    a : (A+ B|A+ B) C ;

t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2

Runtime Objects of Interest

Lexer passes all tokens to the parser, but parser listens to only a single “channel”; channel 99, for example, where I place WS tokens, is ignored

Tokens have start/stop index into single text input buffer

Token is an abstract class

TokenSource: anything answering nextToken()

TokenStream: stream pulling from TokenSource; LT(i), …

CharStream: source of characters for a lexer; LT(i), …
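
The channel idea can be sketched in plain Java without the ANTLR runtime. This is only an illustration under stated assumptions: Tok, ChannelFilterStream, and ChannelDemo are hypothetical names, not runtime classes; only the use of channel 99 for whitespace and the start/stop buffer indices come from the slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token; not the ANTLR runtime class. */
final class Tok {
    static final int DEFAULT_CHANNEL = 0;  // assumed number for the parser's channel
    static final int HIDDEN_CHANNEL = 99;  // the channel the slides use for WS

    final String text;
    final int channel;
    final int start; // index of the first char in the shared input buffer
    final int stop;  // index of the last char in the shared input buffer

    Tok(String text, int channel, int start, int stop) {
        this.text = text;
        this.channel = channel;
        this.start = start;
        this.stop = stop;
    }
}

/** Buffers every token but answers LT(i) from a single channel only. */
final class ChannelFilterStream {
    private final List<Tok> tokens;
    private final int channel;
    private int p; // index of the current token; on-channel or at end of input

    ChannelFilterStream(List<Tok> tokens, int channel) {
        this.tokens = new ArrayList<>(tokens);
        this.channel = channel;
        this.p = skipOffChannel(0);
    }

    private int skipOffChannel(int i) {
        while (i < tokens.size() && tokens.get(i).channel != channel) i++;
        return i;
    }

    /** Look ahead i on-channel tokens (i >= 1); null at end of input. */
    Tok LT(int i) {
        int k = p;
        for (int n = 1; n < i && k < tokens.size(); n++) {
            k = skipOffChannel(k + 1);
        }
        return k < tokens.size() ? tokens.get(k) : null;
    }

    void consume() {
        if (p < tokens.size()) p = skipOffChannel(p + 1);
    }
}

class ChannelDemo {
    /** Builds the tokens of "int x" by hand and returns the texts the parser would see. */
    static List<String> visibleTexts() {
        List<Tok> all = new ArrayList<>();
        all.add(new Tok("int", Tok.DEFAULT_CHANNEL, 0, 2));
        all.add(new Tok(" ", Tok.HIDDEN_CHANNEL, 3, 3));
        all.add(new Tok("x", Tok.DEFAULT_CHANNEL, 4, 4));
        ChannelFilterStream in = new ChannelFilterStream(all, Tok.DEFAULT_CHANNEL);
        List<String> seen = new ArrayList<>();
        while (in.LT(1) != null) {
            seen.add(in.LT(1).text);
            in.consume();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visibleTexts()); // the whitespace token is skipped
    }
}
```

The whitespace token stays in the buffer, so its text remains reachable through the start/stop indices even though the parser never sees it.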

Error Recovery

ANTLR v3 does what Josef Grosch does in Cocktail

Does single token insertion or deletion if necessary to keep going

Computes context-sensitive FOLLOW to do insert/delete
  proper context is passed to each rule invocation
  knows precisely what can follow reference to r rather than what could follow any reference to r (per Wirth circa 1970)
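
The insert/delete decision can be sketched as follows. This is a simplified illustration, not the ANTLR runtime code: RecoveringMatcher, match, and parseHeader are hypothetical names, tokens are plain strings, and the FOLLOW sets are written out by hand for one fixed rule invocation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical matcher showing single token deletion and insertion. */
final class RecoveringMatcher {
    private final List<String> input;
    private int p = 0; // index of the next unconsumed token
    final List<String> errors = new ArrayList<>();

    RecoveringMatcher(List<String> input) {
        this.input = input;
    }

    private String la(int i) {
        int k = p + i - 1;
        return k < input.size() ? input.get(k) : "<EOF>";
    }

    /**
     * Match `expected`. `follow` holds what may come right after it in this
     * particular invocation (the context-sensitive FOLLOW set).
     */
    void match(String expected, Set<String> follow) {
        if (la(1).equals(expected)) {
            p++;
            return;
        }
        errors.add("mismatched token '" + la(1) + "'; expecting '" + expected + "'");
        if (la(2).equals(expected)) {
            p += 2; // deletion: skip the extra token, then consume the expected one
        } else if (follow.contains(la(1))) {
            // insertion: act as if `expected` had been present; consume nothing
        } else {
            // a real parser would resynchronize here; out of scope for this sketch
            throw new IllegalStateException("cannot recover with a single token");
        }
    }

    private static Set<String> setOf(String... s) {
        return new HashSet<>(Arrays.asList(s));
    }

    /** Matches the token types "method" ID '(' ')' '{' and returns the reported errors. */
    static List<String> parseHeader(String... tokens) {
        RecoveringMatcher m = new RecoveringMatcher(Arrays.asList(tokens));
        m.match("method", setOf("ID"));
        m.match("ID", setOf("("));
        m.match("(", setOf(")"));
        m.match(")", setOf("{"));
        m.match("{", setOf("}"));
        return m.errors;
    }
}
```

Calling parseHeader("method", "ID", "(", "{") reports one error and recovers by insertion; parseHeader("method", "ID", "(", ")", ")", "{") reports one error and recovers by deletion.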

Example Error Recovery

Note: I put in two errors each so you’ll see it continues properly

One token insertion

    int i = 0;
    method foo( {
      int j = i;
      i = 4
    }

[program, method]: line 2:12 mismatched token:[@14,23:23='{',<14>,2:12]; expecting type ')'

[program, method, statement]: line 5:0 mismatched token:[@31,46:46='}',<15>,5:0]; expecting type ';'

One token deletion

    int i = 0;
    method foo() ) {
      int j = i;
      i = = 4;
    }

[program, method]: line 2:13 mismatched token:[@15,24:24=')',<13>,2:13]; expecting type '{'

[program, method, statement, expr]: line 4:6 mismatched token:[@32,47:47='=',<6>,4:6]; expecting set null

Attributes

New label syntax and multiple return values

Unified token, rule, parameter, return value, tree reference syntax in actions

Dynamically scoped attributes!

    a[String s] returns [float y]
        : id=ID f=field (ids+=ID)+ {$s, $y, $id, $id.text, $f.z; $ids.size();}
        ;
    field returns [int x, int z]
        : …
        ;

Label properties

Token label reference properties
  text, type, line, pos, channel, index, tree

Rule label reference properties
  start, stop; indices of token boundaries
  tree
  text; text matched for whole rule

Rule Scope Attributes

A rule may define a scope of attributes visible to any invoked rule; operates like a stacked global variable

Avoids having to pass a value down

    method
    scope { String name; }
        : "method" ID '(' ')' {$name=$ID.text;} body
        ;
    body: '{' stat* '}' ;
    …
    atom
    init {… $method.name …}
        : ID | INT
        ;
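
One way to picture the "stacked global variable" is a hand-written recursive-descent sketch like the one below. It is an assumption about the general shape, not the code ANTLR generates: ScopeSketch, MethodScope, and the string-based "rules" are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hand-written sketch of a rule scope behaving like a stacked global. */
final class ScopeSketch {
    /** Mirrors `scope { String name; }` on rule `method`. */
    static final class MethodScope {
        String name;
    }

    private final Deque<MethodScope> methodStack = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    void method(String id, String... atoms) {
        methodStack.push(new MethodScope()); // entering the rule pushes a fresh scope
        try {
            methodStack.peek().name = id;    // plays the role of {$name=$ID.text;}
            body(atoms);
        } finally {
            methodStack.pop();               // leaving the rule discards the scope
        }
    }

    // Note that `name` is not a parameter of body() or atom().
    private void body(String... atoms) {
        for (String a : atoms) atom(a);
    }

    private void atom(String text) {
        // plays the role of $method.name: read the innermost active scope
        log.add(methodStack.peek().name + ":" + text);
    }

    static List<String> demo() {
        ScopeSketch p = new ScopeSketch();
        p.method("foo", "x", "1");
        p.method("bar", "y");
        return p.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the scope lives on a stack, a nested invocation of the same rule would get its own name without disturbing the outer one.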

Global Scope Attributes

Named scopes; rules must explicitly request access

    scope Symbols { List names; }
    {int level=0;}

    globals
    scope Symbols;
    init { level++; $Symbols.names = new ArrayList();}
        : decl* {level--;}
        ;

    block
    scope Symbols;
    init { level++; $Symbols.names = new ArrayList();}
        : '{' decl* stat* '}' {level--;}
        ;

    decl : "int" ID ';' {$Symbols.names.add($ID);} ;

*What if we want to keep the symbol tables around after parsing?

Tree Support

TreeAdaptor; How to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are Object type

Tree; used by support code

BaseTree; List of children, w/o payload (no more child-sibling trees)

CommonTree; node wrapping Token as payload

ParseTree; used by interpreter to build trees

Tree Construction

Automatic mechanism is same as v2 except ^ is now ^^

    expr : atom ( '+'^^ atom )* ;

^ implies root of tree for enclosing subrule

    a : ( ID^ INT )* ;

builds (a 1) (b 2) …

Token labels are $label not #label and rule invocation tree results are $ruleLabel.tree

Turn on

    options {output=AST;}

(one can imagine output=text for templates)

Option: ASTLabelType=CommonTree;

Tree Rewrite Rules

Maps an input grammar fragment to an output tree grammar fragment

    variable : type declarator ';' -> ^(VAR_DEF type declarator) ;

    functionHeader
        : type ID '(' ( formalParameter ( ',' formalParameter )* )? ')'
          -> ^(FUNC_HDR type ID formalParameter+)
        ;

    atom
        : …
        | '(' expr ')' -> expr
        ;

Mixed Rewrite/Auto Trees

Alternatives w/o -> rewrite use automatic mechanism

    b : ID INT -> INT ID
      | INT // implies -> INT
      ;

Rewrites and labels

Disambiguates element references or used to construct imaginary nodes

    forStat
        : "for" '(' start=assignStat ';' expr ';' next=assignStat ')' block
          -> ^("for" $start expr $next block)
        ;
    block
        : lc='{' variable* stat* '}' -> ^(BLOCK[$lc] variable* stat*)
        ;

Concatenation += labels useful too:

    /** match string representation of tree and build tree in memory */
    tree
        : '^' '(' root=atom (children+=tree)+ ')' -> ^($root $children)
        | atom
        ;

Loops in Rewrites

Repeated element

    ID ID -> ^(VARS ID+)

yields ^(VARS a b)

Repeated tree

    ID ID -> ^(VARS ID)+

yields ^(VARS a) ^(VARS b)

Multiple elements in loop need same size

    ID INT ID INT -> ^( R ID ^( S INT) )+

yields (R a (S 1)) (R b (S 2))

Checks cardinality of + and * loops
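
The lockstep iteration and the cardinality checks can be sketched in plain Java. This is an illustration only, with trees rendered as strings; RewriteLoopSketch and plusLoop are hypothetical names, and the exception type is an arbitrary choice, not what the generated code throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RewriteLoopSketch {
    /** Emulates -> ^( R ID ^( S INT) )+ over the tokens matched for ID and INT. */
    static String plusLoop(List<String> ids, List<String> ints) {
        if (ids.size() != ints.size()) {
            throw new IllegalStateException("elements in a rewrite loop need the same size");
        }
        if (ids.isEmpty()) {
            throw new IllegalStateException("a + loop needs at least one element");
        }
        List<String> trees = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            // one iteration consumes one ID and one INT
            trees.add("(R " + ids.get(i) + " (S " + ints.get(i) + "))");
        }
        return String.join(" ", trees);
    }

    public static void main(String[] args) {
        System.out.println(plusLoop(Arrays.asList("a", "b"), Arrays.asList("1", "2")));
    }
}
```

A * loop would differ only in accepting empty lists and producing no trees in that case.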

Preventing cyclic structures

Repeated elements get duplicated

    a : INT -> INT INT ; // dups INT!
    a : INT INT -> INT+ INT+ ; // 4 INTs!

Repeated rule references get duplicated

    a : atom -> ^(atom atom) ; // no cycle!

Duplicates whole tree for all but first ref to an element; here 2nd ref to atom results in a duplicated atom tree
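
Why the copy matters can be seen in a small sketch. Node, dupTree, and rewrite are hypothetical stand-ins written for this example, not the runtime's tree classes; the point is only that the second reference receives a copy, so the node never becomes its own child.

```java
import java.util.ArrayList;
import java.util.List;

final class DupSketch {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        /** Deep copy of this node and everything below it. */
        Node dupTree() {
            Node copy = new Node(text);
            for (Node c : children) copy.children.add(c.dupTree());
            return copy;
        }

        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** Emulates a : atom -> ^(atom atom) ; for an already-built atom tree. */
    static Node rewrite(Node atom) {
        Node copy = atom.dupTree(); // second reference: duplicate the whole tree
        Node root = atom;           // first reference: use the tree itself
        root.children.add(copy);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(new Node("x")).toStringTree());
    }
}
```

Adding atom to its own child list instead of the copy would create a cycle, and toStringTree() would never terminate.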

*Useful example "int x,y" -> "^(int x) ^(int y)"

    decl : type ID (',' ID)* -> ^(type ID)+ ;

*Just noticed a bug in this one ;)

Predicated rewrites

Use semantic predicate to indicate which rewrite to choose from

    a : ID INT -> {p1}? ID
               -> {p2}? INT
               ->
      ;

Misc Rewrite Elements

Arbitrary actions

    a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;

rewrite always sets the rule’s AST not subrule’s

    b : "int"
        ( ID -> ^(TYPE "int" ID)
        | ID '=' INT -> ^(TYPE "int" ID INT)
        )
      ;

Reference to previous value (useful?)

    a : (atom -> atom) (op='+' r=atom -> ^($op $a $r) )* ;

Tree Grammars

Syntax same as parser grammars; adds ^(root children…) tree element

Uses LL(*) also; even derives from same superclass!

Tree is serialized to include DOWN, UP imaginary tokens to encode 2D structure for serial parser

    variable
        : ^(VAR_DEF type ID)
        | ^(VAR_DEF type ID ^(INIT expr))
        ;
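
The serialization can be sketched as a preorder walk that brackets each child list with DOWN and UP. TreeSerializer and Node are hypothetical names for this example and tokens are plain strings; only the DOWN/UP idea is taken from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeSerializer {
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Flattens a tree into a token sequence a serial (one-dimensional) parser can read. */
    static List<String> serialize(Node t) {
        List<String> out = new ArrayList<>();
        walk(t, out);
        return out;
    }

    private static void walk(Node t, List<String> out) {
        out.add(t.text);
        if (!t.children.isEmpty()) {
            out.add("DOWN");                       // descend into the child list
            for (Node c : t.children) walk(c, out);
            out.add("UP");                         // return to the parent level
        }
    }

    public static void main(String[] args) {
        // ^(VAR_DEF int x ^(INIT 3))
        Node tree = new Node("VAR_DEF",
                new Node("int"), new Node("x"),
                new Node("INIT", new Node("3")));
        System.out.println(serialize(tree));
    }
}
```

With the nesting made explicit this way, matching ^(VAR_DEF type ID) amounts to matching VAR_DEF DOWN type ID UP as an ordinary token sequence.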

Code Generation

Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!

Separates code gen logic from output; not a single character of output in the Java code

Java.stg: 140 templates, 1300 lines

Sample code gen templates

    /** Dump the elements one per line and stick in debugging
     *  location() trigger in front.
     */
    element() ::= <<
    <if(debug)>dbg.location(<it.line>,<it.pos>);<\n><endif><it.el><\n>
    >>

    /** match a token optionally with a label in front */
    tokenRef(token,label,elementIndex) ::= <<
    <if(label)><label>=input.LT(1);<\n><endif>match(input,<token>,FOLLOW_<token>_in_<ruleName><elementIndex>);
    >>

Internationalization

ANTLR v3 uses StringTemplate to display all errors

Senses locale to load messages; en.stg: 76 templates

ErrorManager error number constants map to a template name; e.g.,

    RULE_REDEFINITION(file,line,col,arg) ::= "<loc()>rule <arg> redefinition"

    /* This factors out file location formatting; file,line,col inherited from
     * enclosing template; don't manually pass stuff in.
     */
    loc() ::= "<file>:<line>:<col>: "

Runtime Support

Better organized, separated:
  org.antlr.runtime
  org.antlr.runtime.tree
  org.antlr.runtime.debug

Clean; Parser has input ptr only (except error recovery FOLLOW stack); Lexer also only has input ptr

4500 lines of Java code minus BSD header

Summary

v3 kicks ass
it sort of works!
http://www.antlr.org/download/…
ANTLRWorks progressing in parallel

