
1 Lecture 7 CUP: An LALR Parser Generator for Java.

Date post: 21-Dec-2015

1

Lecture 7

CUP: An LALR Parser Generator for Java

2

1. Introduction and Example

CUP:

» a system for generating LALR parsers from simple specifications.

» like yacc, but written in Java; it uses specifications that include embedded Java code, and it produces parser code in Java.

3

Grammar File Format in summary

Preamble

» declarations that specify how the parser is to be generated and that supply parts of the runtime code.

» package_decl, import_decl

» user_code, parser_code, init_code, scan_code

Terminal and nonterminal declarations (name and type)

Precedence and associativity of terminals

Grammar rules

4

An example CUP Grammar (1. preamble)

/* CUP specification for a simple expression evaluator (no actions) */
import java_cup.runtime.*;

// Preliminaries to set up and use the scanner.
init with {: scanner.init(); :};
scan with {: return scanner.next_token(); :};

5

2. Declaration of terminals and nonterminals

/* Terminals (tokens returned by the scanner). */

terminal SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;
terminal UMINUS, LPAREN, RPAREN;
terminal Integer NUMBER;

/* Non terminals */
non terminal expr_list, expr_part;
non terminal Integer expr, term, factor;

// no type ==> no value associated with the symbol

6

3. Precedence and associativity

/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE, MOD;
precedence left UMINUS;

// Rules:
// 1. Terminals appearing on the same line have the same precedence.
// 2. A < B iff the line on which A appears is above the line on which B appears.
// 3. Possible associativities: left, right and nonassoc.
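To make the three rules concrete, here is a minimal Java sketch of how a yacc-style generator can use such a table to settle a shift/reduce conflict. It assumes the usual yacc convention (reduce when the rule binds tighter than the lookahead, shift when it binds looser, and use associativity on a tie); the class and method names are hypothetical and this is not CUP's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of yacc-style precedence handling; not CUP's implementation. */
class PrecedenceSketch {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Decision { SHIFT, REDUCE, ERROR }

    // A larger level means the terminal was declared on a later line and binds tighter.
    private final Map<String, Integer> level = new HashMap<>();
    private final Map<String, Assoc> assoc = new HashMap<>();
    private int nextLevel = 1;

    /** Mirrors one "precedence <assoc> T1, T2, ...;" line of the specification. */
    void declare(Assoc a, String... terminals) {
        for (String t : terminals) {
            level.put(t, nextLevel);
            assoc.put(t, a);
        }
        nextLevel++;
    }

    /**
     * Chooses between reducing a rule (whose precedence is taken from ruleTerminal)
     * and shifting the lookahead terminal. Both terminals must have been declared.
     */
    Decision resolve(String ruleTerminal, String lookahead) {
        int r = level.get(ruleTerminal);
        int l = level.get(lookahead);
        if (r > l) return Decision.REDUCE;
        if (r < l) return Decision.SHIFT;
        switch (assoc.get(lookahead)) {
            case LEFT:  return Decision.REDUCE;
            case RIGHT: return Decision.SHIFT;
            default:    return Decision.ERROR;
        }
    }

    /** Builds the table declared on this slide. */
    static PrecedenceSketch exampleTable() {
        PrecedenceSketch p = new PrecedenceSketch();
        p.declare(Assoc.LEFT, "PLUS", "MINUS");
        p.declare(Assoc.LEFT, "TIMES", "DIVIDE", "MOD");
        p.declare(Assoc.LEFT, "UMINUS");
        return p;
    }

    public static void main(String[] args) {
        PrecedenceSketch p = exampleTable();
        System.out.println(p.resolve("PLUS", "TIMES"));   // a + b . * c
        System.out.println(p.resolve("TIMES", "PLUS"));   // a * b . + c
        System.out.println(p.resolve("PLUS", "MINUS"));   // a + b . - c
    }
}
```

Under these assumptions, a rule tagged with UMINUS always wins against a following binary operator, which is why the example grammar gives the unary minus rule that precedence.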

7

4. The production rules

/* The grammar without actions */
expr_list ::= expr_list expr_part
            | expr_part;
expr_part ::= expr SEMI;
expr      ::= expr PLUS expr
            | expr MINUS expr
            | expr TIMES expr
            | expr DIVIDE expr
            | expr MOD expr
            | MINUS expr %prec UMINUS
            | LPAREN expr RPAREN
            | NUMBER
            ;

8

How to use CUP

Prepare a grammar file (say, parser.cup).

Invoke:

» java java_cup.Main < parser.cup, or

» java java_cup.Main parser.cup

Two files are then produced:

» sym.java // contains constant declarations, one for each terminal (and nonterminal); used by the scanner to refer to terminals.

» parser.java // implements the parser; contains two classes.

9

Grammar rules with action codes

/* The grammar */
expr_list ::= expr_list expr_part
            | expr_part;
expr_part ::= expr:e
              {: System.out.println("= " + e); :}
              SEMI
            ;

10

Grammar rules with action codes

expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
       | expr:e1 MINUS expr:e2
         {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
       | expr:e1 TIMES expr:e2
         {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
       | expr:e1 DIVIDE expr:e2
         {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
       | expr:e1 MOD expr:e2
         {: RESULT = new Integer(e1.intValue() % e2.intValue()); :}
       | NUMBER:n
         {: RESULT = n; :}
       | MINUS expr:e
         {: RESULT = new Integer(0 - e.intValue()); :}
         %prec UMINUS
       | LPAREN expr:e RPAREN
         {: RESULT = e; :}
       ;

11

Variables available on the action code

RESULT: bound to the value of the lhs node.

Labels: the names assigned to symbols on the rhs (e.g., e1 in expr:e1).

» <label>left, <label>right: int variables holding the left and right positions of the phrase in the input stream.

Example:

expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}

// here e1left and e1right are both usable.

12

[Figure: Relationship between the related CUP classes. Package java_cup.runtime: Scanner, Symbol, lr_parser. Package myPackage (generated by CUP): sym, myParser, CUP$Parser$action. Also shown: MyScanner.]

13

Typical client code

/* create a parsing object */
parser parser_obj = new parser(new MyScanner(new FileReader("myFileName")));

/* open input files, etc. here */
Symbol parse_tree = null;
try {
    if (do_debug_parse)
        parse_tree = parser_obj.debug_parse();
    else
        parse_tree = parser_obj.parse();
} catch (Exception e) {
    /* do cleanup here - possibly rethrow e */
} finally {
    /* do close out here */
}

14

The java_cup.runtime.Symbol class

public class Symbol {
    public Symbol(int id, int l, int r, Object o)
    public Symbol(int id, Object o)
    public Symbol(int id, int l, int r)
    …
    public int sym;                  // kind of Symbol
    public int parse_state;          // used while the Symbol is on the parse stack
    boolean used_by_parser = false;
    public int left, right;          // left and right positions in the input stream
    public Object value;             // field for storing the semantic value
    public String toString();
}

15

The scanner interface

package java_cup.runtime;

public interface Scanner {
    /** Return the next token, or <code>null</code> on end-of-file. */
    public Symbol next_token() throws java.lang.Exception;
}
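As an illustration of this contract, the following self-contained Java sketch shows a tiny hand-written scanner that returns null at end of input. So that it compiles without the CUP runtime, it declares its own stand-in Symbol and Scanner types that only mirror the shape shown on the slides; the token codes PLUS, SEMI and NUMBER are hypothetical placeholders for the constants that a generated sym class would normally supply.

```java
import java.util.ArrayList;
import java.util.List;

class ScannerSketch {
    // Stand-ins for java_cup.runtime.Symbol and java_cup.runtime.Scanner (assumption:
    // a real project would import the runtime classes instead of declaring these).
    static class Symbol {
        final int sym;        // kind of token
        final Object value;   // semantic value, or null
        Symbol(int sym, Object value) { this.sym = sym; this.value = value; }
    }
    interface Scanner { Symbol next_token() throws Exception; }

    // Hypothetical token codes; normally these come from the generated sym class.
    static final int PLUS = 1, SEMI = 2, NUMBER = 3;

    static class MyScanner implements Scanner {
        private final String input;
        private int pos = 0;
        MyScanner(String input) { this.input = input; }

        /** Returns the next token, or null on end of input. */
        public Symbol next_token() {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos >= input.length()) return null;
            char c = input.charAt(pos);
            if (Character.isDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                return new Symbol(NUMBER, Integer.valueOf(input.substring(start, pos)));
            }
            pos++;
            switch (c) {
                case '+': return new Symbol(PLUS, null);
                case ';': return new Symbol(SEMI, null);
                default:  throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
    }

    /** Drains a scanner and collects the token codes, stopping at the null EOF marker. */
    static List<Integer> tokenCodes(String text) {
        List<Integer> codes = new ArrayList<>();
        MyScanner s = new MyScanner(text);
        for (Symbol t = s.next_token(); t != null; t = s.next_token()) codes.add(t.sym);
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(tokenCodes("12 + 3;"));
    }
}
```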

16

java_cup.runtime.lr_parser

public abstract class lr_parser {

public lr_parser() { }

public lr_parser(Scanner s) { this(); setScanner(s); }

private Scanner _scanner;

public void setScanner(Scanner s) { _scanner = s; }

public Scanner getScanner() { return _scanner; }

public void user_init() throws java.lang.Exception {}

// { %initWithCode } in subclass

17

public Symbol scan() throws java.lang.Exception {

Symbol sym = getScanner().next_token();

return (sym!=null) ? sym : new Symbol(EOF_sym());

}

public void report_fatal_error(String message, Object info) throws java.lang.Exception …

public Symbol parse() throws java.lang.Exception

18

Actual code of parse()

public Symbol parse() throws java.lang.Exception {
    /* the current action code */
    int act;
    // the Symbol/stack element returned by a reduce
    Symbol lhs_sym = null;
    /* information about production being reduced with */
    short handle_size, lhs_sym_num;
    /* set up direct reference to tables to drive the parser */
    production_tab = production_table();
    action_tab = action_table();
    reduce_tab = reduce_table();

19

    /* initialize the action encapsulation object */
    init_actions();   // { $generatedByCUP }
    /* do user initialization */
    user_init();      // { $initWithCode }

    /* get the first token */
    cur_token = scan();

    /* push dummy Symbol with start state to get us underway */
    stack.removeAllElements();
    stack.push(new Symbol(0, start_state()));
    tos = 0;

20

/* continue until we are told to stop */

for (_done_parsing = false; !_done_parsing ; ) {

/* Check current token for freshness. */

if (cur_token.used_by_parser) throw new Error( … );

/* current state is always on the top of the stack */

/* look up action out of the current state with the current input */

act = get_action(((Symbol)stack.peek()).parse_state, cur_token.sym);

21

/* decode the action -- > 0 encodes shift */

if (act > 0) { /* shift */

cur_token.parse_state = act-1;

cur_token.used_by_parser = true;

stack.push(cur_token); tos++;

/* advance to the next Symbol */

cur_token = scan(); }

/* if it's less than zero, then it encodes a reduce action */

else if (act < 0) { /* reduce, first arg is action number */

22

else if (act < 0) { /* reduce, first arg is action number */
    lhs_sym = do_action((-act)-1, this, stack, tos);

    /* look up information about the production */
    lhs_sym_num = production_tab[(-act)-1][0];
    handle_size = production_tab[(-act)-1][1];

    /* pop the handle off the stack */
    for (int i = 0; i < handle_size; i++) { stack.pop(); tos--; }

    /* look up the state to go to from the one popped back to */
    act = get_reduce(((Symbol)stack.peek()).parse_state, lhs_sym_num);

    /* shift to that state */
    lhs_sym.parse_state = act;
    lhs_sym.used_by_parser = true;
    stack.push(lhs_sym);
    tos++;
}

23

/* finally if the entry is zero, we have an error */
else if (act == 0) {
    /* call user syntax error reporting routine */
    syntax_error(cur_token);
    /* try to error recover, arg is debug or not */
    if (!error_recovery(false)) {
        /* if that fails give up with a fatal syntax error */
        unrecovered_syntax_error(cur_token);
        /* just in case that wasn't fatal enough, end parse */
        done_parsing();
    } else {
        // continue to parse
        lhs_sym = (Symbol)stack.peek();
    }
}
} // end of the parse loop
return lhs_sym;
}
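The loop above relies on a compact encoding of table entries: a positive value act means "shift to state act-1", a negative value means "reduce by production (-act)-1", and zero means error. The following Java sketch only illustrates that arithmetic; the class and helper names are hypothetical and are not part of the CUP runtime.

```java
/** Hypothetical illustration of the action encoding used in the parse loop. */
class ActionCodeSketch {
    /** Encodes a shift to the given state; the +1 offset keeps state 0 distinct from error (0). */
    static int encodeShift(int state) { return state + 1; }

    /** Encodes a reduce by the given production; production 0 becomes -1, not 0. */
    static int encodeReduce(int production) { return -(production + 1); }

    /** Decodes a table entry the same way the parse loop does. */
    static String describe(int act) {
        if (act > 0) return "shift to state " + (act - 1);
        if (act < 0) return "reduce by production " + ((-act) - 1);
        return "error";
    }

    public static void main(String[] args) {
        System.out.println(describe(encodeShift(0)));    // shift to state 0
        System.out.println(describe(encodeReduce(5)));   // reduce by production 5
        System.out.println(describe(0));                 // error
    }
}
```

The offset of one in both directions is what lets a single int distinguish state 0 and production 0 from the error entry.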

24

public class parser extends lr_parser {
    $parserCode
    public Symbol scan() { $scanWithCode }
    public void user_init() { $initWithCode }
    …
}

class lr_parser {
    public Symbol parse() {
        …
        user_init();
        …
        // getFirstToken()
        … scan() …
    }
}

class CUP$Parser$action {
    $actionCode
    …
    Symbol CUP$parser$do_action { … $actionCodeInRules … }
}

Where the user code is put in the generated classes

25

Usage of various user code components

Action code // action code {: … :}

» declaration of additional variables used in grammar rules, e.g., a symbol table

» helper methods used in rules

Parser code // parser code {: … :}

» for customizing the generated parser

» e.g., new error reporting methods, scan(), …

User init code // init with {: … :}

» initialize your scanner, tables, and data structures required for semantic actions

Scan code // scan with {: … :}

» tells how to get a token from your scanner

26

Running CUP

-package name // default is the empty package

-parser name // name.java is the generated file; default is parser

-symbols name // default is sym

-interface

» Outputs the symbol constant code as an interface rather than as a class.

-nonterms // also include constants for nonterminal symbols

-expect number // number of conflicts to abort parsing

-compact_red // enable table compaction for reductions

-nowarn // warnings suppressed

27

-nosummary

-progress

-dump_grammar

-dump_states

-dump_tables

-dump

-time

-debug

-nopositions // => using eleft or eright causes an error

-noscanner // obsolete

-version

28

Scanner Interface

1. Your scanner must implement the Scanner interface:

package java_cup.runtime;
public interface Scanner {
    public Symbol next_token() throws java.lang.Exception;
}

2. lr_parser uses scan() to fetch the next token:

public Symbol scan() throws java.lang.Exception {
    Symbol sym = getScanner().next_token();
    return (sym!=null) ? sym : new Symbol(EOF_sym());
}

You can override scan() in your parser code ( $scanWithCode ).

29

3. The generated parser also contains a constructor :

public parser(Scanner)

which calls setScanner() with it. In most cases, the init with and scan with directives may be omitted.

You can simply create the parser with a reference to the desired scanner:

parser parser_obj = new parser(new my_scanner());

or set the scanner after the parser is created:

parser parser_obj = new parser(); parser_obj.setScanner(new my_scanner());

30

4. Integration with JLex

Put the following three lines in the directives region:

%implements java_cup.runtime.Scanner
%function next_token
%type java_cup.runtime.Symbol

or simply:

%cup

Invoking the parser with the JLex scanner is then simply:

parser parser_obj = new parser(new Yylex(some_InputStream_or_Reader));

Note: CUP handles the JLex/JFlex convention of returning null on EOF without a problem, so an %eofval directive is not required in the JLex specification (added in CUP 0.10k).

31

Syntax Error Handling

Programs may contain errors at different levels:

» Lexical, such as misspelling an identifier, keyword, or operator.

» Syntactic, such as an arithmetic expression with unbalanced parentheses.

» Semantic, such as an operator applied to an incompatible operand.

» Logical, such as an infinitely recursive call.

Goals of the error handler in a parser:

» Report the presence of errors clearly and accurately.

» Recover from each error quickly enough to be able to detect subsequent errors.

» Do not significantly slow down the processing of correct programs.

32

Error recovery strategies

Panic mode recovery.

» On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens (normally delimiters such as a semicolon or the keyword end) is found.

» Simple, but may skip a considerable amount of input.

Phrase level recovery.

» Perform a local correction on the remaining input: replace a prefix of the remaining input by some string that allows the parse to continue.

» For example, replace a comma by a semicolon, delete an extra semicolon, or insert a missing semicolon.

» First used in top-down parsing.
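The panic mode strategy can be sketched in a few lines of Java. This is only an illustration under simple assumptions: tokens are plain strings, the synchronizing set is supplied by the caller, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of panic mode recovery over a list of token names. */
class PanicModeSketch {
    /**
     * Discards tokens one at a time, starting at errorPos, until a synchronizing
     * token is found. Returns the index of that token, or tokens.size() if the
     * input ends first.
     */
    static int skipToSync(List<String> tokens, int errorPos, Set<String> sync) {
        int i = errorPos;
        while (i < tokens.size() && !sync.contains(tokens.get(i))) i++;
        return i;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("x", "=", "+", "*", "3", ";", "y", "=", "1", ";");
        Set<String> sync = new HashSet<>(Arrays.asList(";", "end"));
        // The error is detected at index 2 ("+"); parsing can resume at the ";" at index 5.
        System.out.println(skipToSync(tokens, 2, sync));
    }
}
```

The example also shows the cost noted above: every token between the error and the next delimiter is skipped without being checked.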

33

Error Recovery Strategies

Error productions.

» Use the language grammar augmented by some error production rules to detect common errors. These extra rules generate erroneous constructs.

Global correction.

» Still perform a local correction, but with global knowledge leading to the smallest replacements.

34

Error recovery in LL(1) parsers

Use panic mode recovery.

When an error occurs while looking for nonterminal A, scan until an element of FOLLOW(A) is found, then pop A and continue.

If we can't match a terminal on the top of the stack:

» 1. pop the terminal

» 2. print a message saying the terminal was inserted

» 3. continue the parse

35

Error recovery in shift-reduce parsers

The problem

» encounter an invalid token

» bad pieces of tree hanging from the stack

» incorrect entries in the symbol table (type error… )

We want to parse the rest of the file

» Restarting the parser

» find a restartable state on the stack

» move to a consistent place in the input

» print an informative message (line number)

36

Error Recovery in CUP and YACC

Both support a special terminal symbol: error.

» need not be declared.

» can match an erroneous input sequence.

» error marks synchronization points.

When an error is discovered, the parser

» pops the stack until an error transition is legal

» skips input tokens until a certain number (3) of tokens can be parsed successfully

» error productions can have actions
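The first recovery step, popping the stack until a state with a transition on error is on top, can be sketched as follows. The sketch assumes the stack holds bare state numbers and that the set of states with an error transition is known in advance; both are simplifications, and the names are hypothetical rather than taken from the CUP runtime.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the "pop until an error transition is legal" step. */
class ErrorPopSketch {
    /**
     * Pops states until the top of the stack has a transition on the error symbol.
     * Returns that state, or -1 if the stack is emptied (recovery is impossible).
     */
    static int popToErrorState(Deque<Integer> stack, Set<Integer> statesWithErrorTransition) {
        while (!stack.isEmpty() && !statesWithErrorTransition.contains(stack.peek())) {
            stack.pop();
        }
        return stack.isEmpty() ? -1 : stack.peek();
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int state : new int[] {0, 4, 7, 9}) stack.push(state); // 9 ends up on top
        Set<Integer> errorStates = new HashSet<>(Arrays.asList(4)); // only state 4 can shift error
        System.out.println(popToErrorState(stack, errorStates));    // states 9 and 7 are discarded
    }
}
```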

37

Error recovery in yacc

stmtList ::= stmt SEMI
           | stmtList stmt SEMI
           ;

can be augmented with error:

stmtList ::= stmt SEMI
           | stmtList stmt SEMI
           | error SEMI
           ;

This should

» throw out the erroneous statement

» synchronize at SEMI

» invoke syntax_error()

38

Example production 2:

stmt ::= expr SEMI
       | while_stmt SEMI
       | if_stmt SEMI
       | ...
       | error SEMI
       ;

If an error occurs:

» 1. syntax_error() is called to report it, and

» 2. [recovery:] the erroneous expr (or while_stmt or if_stmt) statement is thrown out

» 3. the parser synchronizes at SEMI.

» An error is considered to be recovered iff a sufficient number of tokens past the error symbol can be successfully parsed. (This number is given by the error_sync_size() method of the parser and defaults to 3.)

Other natural places for errors

» missing parentheses or brackets

» extra operator or missing operator

39

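[Figure: Error Recovery in CUP. The diagram shows the parser stack and the input. A state on the stack contains the items S . expr SEMI, S . while_stmt SEMI, S . if_stmt SEMI and S . error SEMI. With state s on top of the stack and input a1 a2 a3 ; …, action(s, a1) is undefined, so the parser pops until an error transition is legal, matches error (S error . SEMI), and then matches SEMI (S error SEMI .).]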

