
Posted on 11-Aug-2020


Master of Applied Computer Science

Theo D’Hondt, Fundamentals of Computer Science

Vrije Universiteit Brussel

Section 4

Grammars

“specifying structure”

Fundamentals of Computer Science, 1st year Master of Applied Computer Science

Faculty of Engineering Sciences, Vrije Universiteit Brussel


Pico

{ QuickSort(V,Low,High):
    { Left: Low;
      Right: High;
      Pivot: V[(Left + Right) // 2];
      Save: 0;
      until(Left > Right,
            { while(V[Left] < Pivot, Left:= Left+1);
              while(V[Right] > Pivot, Right:= Right-1);
              if(Left <= Right,
                 { Save:= V[Left];
                   V[Left]:= V[Right];
                   V[Right]:= Save;
                   Left:= Left+1;
                   Right:= Right-1 },
                 void) });
      display(Low, eoln);
      if(Low < Right, QuickSort(V, Low, Right), void);
      if(High > Left, QuickSort(V, Left, High), void) };
  V[10000]: random();
  QuickSort(V,1,size(V));
  display(V[size(V)]) }
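Readers more at home in a mainstream language may find a transcription of the Pico QuickSort above helpful. The Python sketch below is a reading aid, not part of the course material. It assumes 0-based indexing (Pico tables start at 1), swaps with a tuple assignment instead of Save, and renders until(Left > Right, ...) as a while loop that tests the same condition before each pass. Because QuickSort is only ever called with Low < High, it makes no difference whether until tests before or after the body. The display(Low, eoln) trace is left out.

```python
import random


def quicksort(v, low, high):
    """Sort v[low..high] (inclusive bounds) in place, mirroring the Pico version."""
    left, right = low, high
    pivot = v[(left + right) // 2]
    while not left > right:                      # Pico: until(Left > Right, ...)
        while v[left] < pivot:
            left += 1
        while v[right] > pivot:
            right -= 1
        if left <= right:
            v[left], v[right] = v[right], v[left]  # Pico swaps through Save
            left += 1
            right -= 1
    if low < right:                              # sort the left part
        quicksort(v, low, right)
    if high > left:                              # sort the right part
        quicksort(v, left, high)


v = [random.random() for _ in range(10000)]
quicksort(v, 0, len(v) - 1)
print(v[-1])  # the largest element, as display(V[size(V)]) shows in Pico
```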


Scanning text:

textual representation:

fac(n): if(n>1, n*fac(n-1), 1)

tokenized representation:

{ <NAM, fac>,<LPR>,<NAM, n>,<RPR>,<COL>,<NAM, if>,
  <LPR>,<NAM, n>,<ROP, >>,<NBR, 1>,<COM>,<NAM, n>,
  <MOP, *>,<NAM, fac>,<LPR>,<NAM, n>,<AOP, ->,
  <NBR, 1>,<RPR>,<COM>,<NBR, 1>,<RPR>,<END> }
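To make the step from textual to tokenized representation concrete, the Python sketch below produces exactly the token list above from the fac definition. It takes a different route from the course: it uses regular expressions, where the following slides build a table-driven scanner. It only knows the token kinds that appear in simple examples like this one. It also assumes that the first character of an operator decides its kind (for example `>` is relational, `*` multiplicative, `-` additive), in line with the character categories introduced later. The names TOKEN_SPEC, PATTERN and tokenize are hypothetical.

```python
import re

OPS = r"[-+*/\\&|^<>=#$%~!?]"      # any operator character
# (kind, pattern) pairs; order matters: ":=" must be tried before ":".
# Assumption: the first character of an operator decides its kind.
TOKEN_SPEC = [
    ("WSP", r"\s+"),
    ("NBR", r"\d+"),
    ("NAM", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("CEQ", r":="),
    ("COL", r":"),
    ("LPR", r"\("),
    ("RPR", r"\)"),
    ("COM", r","),
    ("ROP", r"[<>=#]" + OPS + "*"),
    ("MOP", r"[*/\\&]" + OPS + "*"),
    ("AOP", r"[-+|$%~]" + OPS + "*"),
    ("XOP", r"[!?^]" + OPS + "*"),
]
PATTERN = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn text into tokens (kind,) or (kind, attribute), ending with ("END",)."""
    tokens, pos = [], 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "NBR":
            tokens.append((kind, int(value)))      # numbers carry their value
        elif kind in ("NAM", "ROP", "MOP", "AOP", "XOP"):
            tokens.append((kind, value))           # names and operators carry their text
        elif kind != "WSP":
            tokens.append((kind,))                 # punctuation carries no attribute
        pos = m.end()
    tokens.append(("END",))
    return tokens
```

For example, `tokenize("x := 2")` yields `[("NAM", "x"), ("CEQ",), ("NBR", 2), ("END",)]`.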


Scanning text (cont'd):

{ AOP_token: 1; CAT_token: 2; CEQ_token: 3; COL_token: 4; COM_token: 5;
  END_token: 6; FRC_token: 7; LBC_token: 8; LBR_token: 9; LPR_token: 10;
  MOP_token: 11; NAM_token: 12; NBR_token: 13; RBC_token: 14; RBR_token: 15;
  ROP_token: 16; RPR_token: 17; SMC_token: 18; TXT_token: 19; XOP_token: 20;
  scan_data: void;
  scan(): ...

The *_token constants are the token values returned by scan(); scan_data holds the token attribute (a string, a number or a fraction).


Scanning text (cont'd), transcript:

init_scan('fac(n): if(n>1, n*fac(n-1), 1)')
:<void>
scan()
:12
scan_data
:fac
scan()
:10
scan()
:12
scan_data
:n
scan()
:17
scan()
:4
scan()
:12
scan_data
:if


Scanning text (cont'd): character categories

aop: 1; apo: 2; bkq: 3; cat: 4; col: 5;
com: 6; dgt: 7; eol: 8; eql: 9; exp: 10;
ill: 11; lbc: 12; lbr: 13; lpr: 14; ltr: 15;
mns: 16; mop: 17; per: 18; pls: 19; quo: 20;
rbc: 21; rbr: 22; rop: 23; rpr: 24; smc: 25;
wsp: 26; xop: 27;

The category of each ASCII character (except the first, with value 0, which marks the end of the text):

ch_tab: [`end`
         wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, eol, wsp, wsp, wsp,
         wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp, wsp,
         xop, quo, rop, aop, aop, mop, apo, lpr, rpr, mop, pls, com, mns, per, mop, dgt,
         dgt, dgt, dgt, dgt, dgt, dgt, dgt, dgt, dgt, col, smc, rop, eql, rop, xop, cat,
         ltr, ltr, ltr, ltr, exp, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr,
         ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, lbr, mop, rbr, xop, ltr, bkq,
         ltr, ltr, ltr, ltr, exp, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr,
         ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, ltr, lbc, aop, rbc, aop, ill, ill,
         ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill, ill,
         ill, ill, ill, ill, ill, ill, ill ];
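A table like ch_tab can also be generated rather than typed in. The Python sketch below builds a category table for character codes 0 to 127 from the category meanings given on the next slide. It is an illustration, not the course's code. The names SPECIAL, build_ch_tab and category_of are hypothetical. The sketch takes smc (`;`) from ch_tab itself, because the meaning slide does not list it. It also assumes that non-ASCII characters are illegal.

```python
import string

# Category members from the "meaning of the character categories" slide;
# smc (';') is taken from ch_tab itself.
SPECIAL = {
    "exp": "eE", "aop": "$%|~", "rop": "#<>", "mop": "*&/\\", "xop": "!?^",
    "pls": "+", "mns": "-", "apo": "'", "quo": '"', "bkq": "`", "com": ",",
    "per": ".", "col": ":", "eql": "=", "cat": "@", "lpr": "(", "rpr": ")",
    "lbr": "[", "rbr": "]", "lbc": "{", "rbc": "}", "smc": ";",
}


def build_ch_tab():
    """Category of each character code 0..127; code 0 marks the end of the text."""
    tab = ["ill"] * 128
    for code in range(1, 33):
        tab[code] = "wsp"                  # control characters and space
    tab[13] = "eol"                        # carriage return ends a line, as in ch_tab
    for ch in string.ascii_letters + "_":
        tab[ord(ch)] = "ltr"
    for ch in string.digits:
        tab[ord(ch)] = "dgt"
    for category, chars in SPECIAL.items():
        for ch in chars:
            tab[ord(ch)] = category        # also turns e and E from ltr into exp
    tab[0] = "end"
    return tab


CH_TAB = build_ch_tab()


def category_of(ch):
    """Category of one character; anything outside ASCII counts as illegal here."""
    code = ord(ch)
    return CH_TAB[code] if code < len(CH_TAB) else "ill"
```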


Scanning text (cont'd): meaning of the character categories

wsp ⇒ white space

eol ⇒ end of line

ltr ⇒ letters and {_}, except {e E}

dgt ⇒ digit

exp ⇒ {e E}

aop ⇒ {$ % | ~}

rop ⇒ {# < >}

mop ⇒ {* & / \}

xop ⇒ {! ? ^}

pls ⇒ {+}

mns ⇒ {-}

apo ⇒ {'}

quo ⇒ {"}

bkq ⇒ {`}

com ⇒ {,}

per ⇒ {.}

col ⇒ {:}

smc ⇒ {;}

eql ⇒ {=}

cat ⇒ {@}

lpr ⇒ {(}

rpr ⇒ {)}

lbr ⇒ {[}

rbr ⇒ {]}

lbc ⇒ {{}

rbc ⇒ {}}

ill ⇒ illegal


Scanning text (cont'd): masks are vectors with a true/false value for each character category

check(allowed): if(ch = 0, false, allowed[ch_tab[ch]]);
uncheck(allowed): if(ch = 0, false, not(allowed[ch_tab[ch]]));
mask@list:
  { msk[xop]: false;
    for(k: 1, k:= k+1, not(k > size(list)), msk[list[k]]:= true);
    msk };
apo_allowed: mask(apo);
apx_allowed: mask(apo,eol);
bkq_allowed: mask(bkq,eol);
dgt_allowed: mask(dgt);
eql_allowed: mask(eql);
exp_allowed: mask(exp);
nam_allowed: mask(dgt,exp,ltr);
opr_allowed: mask(aop,eql,mns,mop,pls,rop,xop);
per_allowed: mask(per);
quo_allowed: mask(quo);
qux_allowed: mask(eol,quo);
sgn_allowed: mask(pls,mns);
wsp_allowed: mask(wsp,eol);

msk is sized by xop, the highest category number (27), so it has one entry per category. Both check and uncheck answer false at the end of the text (ch = 0).
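The mask idea translates directly into Python. In the sketch below, a mask is a list of booleans indexed by category number, with slot 0 unused to mimic Pico's 1-based tables. One simplification is labelled here as an assumption: the Pico check and uncheck read the global current character ch and look up its category in ch_tab, whereas these hypothetical Python versions receive the category name directly, with None standing for the end of the text.

```python
# Category numbers from the slides: aop: 1, apo: 2, ..., xop: 27.
CATEGORIES = ["aop", "apo", "bkq", "cat", "col", "com", "dgt", "eol", "eql",
              "exp", "ill", "lbc", "lbr", "lpr", "ltr", "mns", "mop", "per",
              "pls", "quo", "rbc", "rbr", "rop", "rpr", "smc", "wsp", "xop"]
NUMBER = {name: i for i, name in enumerate(CATEGORIES, start=1)}


def mask(*categories):
    """True/false vector indexed by category number; slot 0 is unused."""
    msk = [False] * (len(CATEGORIES) + 1)
    for name in categories:
        msk[NUMBER[name]] = True
    return msk


def check(allowed, category):
    """True if category is allowed; None stands for the end of the text (ch = 0)."""
    return False if category is None else allowed[NUMBER[category]]


def uncheck(allowed, category):
    """True if category is not allowed; also false at the end of the text."""
    return False if category is None else not allowed[NUMBER[category]]


nam_allowed = mask("dgt", "exp", "ltr")
wsp_allowed = mask("wsp", "eol")
```

A scanner would typically loop `while check(nam_allowed, current_category)` to collect the characters of a name. Here current_category is a placeholder for whatever tracks the current character.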


Scanning text (cont'd):

a function for each character category:

aop_fun(): { ... };
apo_fun(): { ... };
...
wsp_fun(): { ... };
xop_fun(): { ... };

a dispatch vector with one entry per character category:

fun_tab: [ aop_fun, apo_fun, bkq_fun, cat_fun, col_fun, com_fun, dgt_fun,
           wsp_fun, rop_fun, ltr_fun, ill_fun, lbc_fun, lbr_fun, lpr_fun,
           ltr_fun, aop_fun, mop_fun, ill_fun, aop_fun, quo_fun, rbc_fun,
           rbr_fun, rop_fun, rpr_fun, smc_fun, wsp_fun, xop_fun ];

selecting the correct function for a given character:

scan(): if(ch = 0, END_token, { fun: fun_tab[ch_tab[ch]]; fun() });
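Here is the same dispatch structure as a minimal Python sketch. The category of the current character selects a handler from a table, and each handler consumes characters and returns a token. It assumes a much smaller character set than ch_tab. The class name Scanner, the string token names and the helpers category() and ch() are hypothetical. Only the overall shape follows the slide: one function per category, a dispatch table, and scan() as a table lookup.

```python
import string


class Scanner:
    """Table-driven scanner sketch: scan() dispatches on the current character's category."""

    CATEGORY = {":": "col", "(": "lpr", ")": "rpr", ",": "com"}   # tiny subset of ch_tab

    def __init__(self, text):
        self.text, self.pos, self.data = text, 0, None
        self.fun_tab = {                          # category -> handler (cf. fun_tab)
            "wsp": self.wsp_fun, "ltr": self.ltr_fun, "dgt": self.dgt_fun,
            "col": self.col_fun,
            "lpr": lambda: self.next_ch("LPR"),
            "rpr": lambda: self.next_ch("RPR"),
            "com": lambda: self.next_ch("COM"),
        }

    def ch(self):
        """Current character, or None at the end of the text (Pico uses code 0)."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def category(self, ch):
        if ch in string.whitespace:
            return "wsp"
        if ch in string.ascii_letters or ch == "_":
            return "ltr"
        if ch in string.digits:
            return "dgt"
        return self.CATEGORY.get(ch, "ill")

    def scan(self):
        if self.ch() is None:
            return "END"
        fun = self.fun_tab.get(self.category(self.ch()), self.ill_fun)
        return fun()

    def next_ch(self, token):                     # consume one character, return token
        self.pos += 1
        return token

    def wsp_fun(self):                            # skip white space, then scan on
        while self.ch() is not None and self.category(self.ch()) == "wsp":
            self.pos += 1
        return self.scan()

    def ltr_fun(self):                            # a name: letters followed by letters/digits
        start = self.pos
        while self.ch() is not None and self.category(self.ch()) in ("ltr", "dgt"):
            self.pos += 1
        self.data = self.text[start:self.pos]
        return "NAM"

    def dgt_fun(self):                            # an integer
        start = self.pos
        while self.ch() is not None and self.category(self.ch()) == "dgt":
            self.pos += 1
        self.data = int(self.text[start:self.pos])
        return "NBR"

    def col_fun(self):                            # ':' or ':='
        self.pos += 1
        return self.next_ch("CEQ") if self.ch() == "=" else "COL"

    def ill_fun(self):
        raise SyntaxError(f"illegal character {self.ch()!r}")
```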


Scanning text (cont'd): illustration

col_fun(): { skip_ch(); if(check(eql_allowed), next_ch(CEQ_token), COL_token) };
com_fun(): next_ch(COM_token);
dgt_fun():
  { freeze();
    until(uncheck(dgt_allowed), skip_ch());
    if(check(per_allowed),
       fraction(),
       if(check(exp_allowed),
          exponent(),
          capture_number(NBR_token))) };

freeze() and capture_number(...) start and finish the capture of the characters representing the number; fraction() and exponent() are auxiliary functions.
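dgt_fun delegates to fraction() and exponent(), whose bodies are not shown. The hypothetical Python helper below sketches what number scanning has to cover according to the concrete grammar for <number>, <fraction> and <scale>: digits, an optional period followed by digits, and an optional e or E with an optional sign. It remembers where the number starts (the role of freeze()) and converts the captured characters at the end (the role of capture_number). It assumes that a period or exponent letter not followed by digits simply ends the number; the slides do not specify this case.

```python
def scan_number(text, pos):
    """Scan a number starting at text[pos]; return (token, value, next_pos)."""
    def digits(p):                                # advance past a run of digits
        while p < len(text) and text[p].isdigit():
            p += 1
        return p

    start = pos                                   # 'freeze': remember the start
    pos = digits(pos)
    is_fraction = False
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1].isdigit():
        pos = digits(pos + 1)                     # <number> . <number>
        is_fraction = True
    if pos < len(text) and text[pos] in "eE":
        p = pos + 1
        if p < len(text) and text[p] in "+-":
            p += 1
        if p < len(text) and text[p].isdigit():   # <scale>
            pos = digits(p)
            is_fraction = True
    lexeme = text[start:pos]                      # 'capture': the number's characters
    return ("FRC", float(lexeme), pos) if is_fraction else ("NBR", int(lexeme), pos)
```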


Parsing tokens:

tokenized representation:

{ <NAM, fac>,<LPR>,<NAM, n>,<RPR>,<COL>,<NAM, if>,
  <LPR>,<NAM, n>,<ROP, >>,<NBR, 1>,<COM>,<NAM, n>,
  <MOP, *>,<NAM, fac>,<LPR>,<NAM, n>,<AOP, ->,
  <NBR, 1>,<RPR>,<COM>,<NBR, 1>,<RPR>,<END> }

abstract representation:

[10, fac, [5, [[8, n]]], [11, if, [5, [[11, >, [5, [[8, n], [1, 1]]]], [11, *, [5, [[8, n], [11, fac, [5, [[11, -, [5, [[8, n], [1, 1]]]]]]]]]], [1, 1]]]]]


Parsing tokens (cont'd):

[Figure: character stream → Scanner → token stream → Parser → abstract representation. The concrete representation is described by a concrete grammar, the abstract representation by an abstract grammar.]


Parsing tokens (cont'd): this is a concrete grammar for Pico

<program> ::= <expression>
<expression> ::= <invocation>
<expression> ::= <invocation> : <expression>
<expression> ::= <invocation> := <expression>
<invocation> ::= <comparand>
<invocation> ::= <invocation> <comparator> <comparand>
<comparand> ::= <term>
<comparand> ::= <comparand> <adder> <term>
<term> ::= <factor>
<term> ::= <term> <multiplier> <factor>
<factor> ::= <reference>
<factor> ::= <factor> <power> <reference>
<reference> ::= <number>
<reference> ::= <fraction>
<reference> ::= <text>
<reference> ::= <variable>
<reference> ::= <prefix>
<reference> ::= <application>
<reference> ::= <apply>
<reference> ::= <tabulation>
<reference> ::= <subexpression>
<reference> ::= <sequence>
<reference> ::= <table>
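Left-recursive productions such as <term> ::= <term> <multiplier> <factor> cannot be coded as recursive functions directly: a function for <term> would call itself before consuming any token. The usual remedy, and the one operation() takes on the later slides, is to turn each such production into a loop. The Python sketch below is a small evaluator with one loop per precedence level (comparator, adder, multiplier, power). It assumes single-character operators, integers only, no names, and only the operators `<`, `>`, `=`, `+`, `-`, `*` and `^`.

```python
import re


def evaluate(text):
    """Evaluate an integer expression, one loop per left-recursive production."""
    tokens = re.findall(r"\d+|[<>=+\-*^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def level(operand, ops, combine):
        value = operand()
        while peek() in ops:                  # left-associative: fold from the left
            op = take()
            value = combine(op, value, operand())
        return value

    def reference():                          # <number> or ( <expression> )
        tok = take()
        if tok == "(":
            value = invocation()
            take()                            # the closing ')' (not checked)
            return value
        return int(tok)

    def factor():                             # <factor> ::= <factor> <power> <reference>
        return level(reference, {"^"}, lambda op, a, b: a ** b)

    def term():                               # <term> ::= <term> <multiplier> <factor>
        return level(factor, {"*"}, lambda op, a, b: a * b)

    def comparand():                          # <comparand> ::= <comparand> <adder> <term>
        return level(term, {"+", "-"}, lambda op, a, b: a + b if op == "+" else a - b)

    def invocation():                         # <invocation> ::= <invocation> <comparator> <comparand>
        tests = {"<": lambda a, b: a < b, ">": lambda a, b: a > b, "=": lambda a, b: a == b}
        return level(comparand, set(tests), lambda op, a, b: tests[op](a, b))

    return invocation()
```

Note that the grammar, and therefore this loop, makes even `^` left-associative: `2^3^2` evaluates as `(2^3)^2`.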


Parsing tokens (cont'd): this is a concrete grammar for Pico

<prefix> ::= <operator> <reference>
<application> ::= <variable> ( )
<application> ::= <variable> ( <commalist> )
<apply> ::= <variable> @ <invocation>
<tabulation> ::= <name> [ <expression> ]
<subexpression> ::= ( <expression> )
<sequence> ::= { <semicolonlist> }
<table> ::= [ ]
<table> ::= [ <commalist> ]
<commalist> ::= <expression>
<commalist> ::= <expression> , <commalist>
<semicolonlist> ::= <expression>
<semicolonlist> ::= <expression> ; <semicolonlist>
<variable> ::= <name>
<variable> ::= <operator>
<operator> ::= <power>
<operator> ::= <multiplier>
<operator> ::= <adder>
<operator> ::= <comparator>


Parsing tokens (cont'd): this is a concrete grammar for Pico

<scale> ::= #exponent# + <number>
<scale> ::= #exponent# - <number>
<scale> ::= #exponent# <number>
<number> ::= #digit#
<number> ::= #digit# <number>
<fraction> ::= <number> . <number> <scale>
<fraction> ::= <number> . <number>
<fraction> ::= <number> <scale>
<comparator> ::= #comparator# <operator>
<adder> ::= #adder# <operator>
<multiplier> ::= #multiplier# <operator>
<power> ::= #power# <operator>
<operator> ::= #operator#
<operator> ::= #operator# <operator>
<name> ::= #letter# <rest>
<rest> ::=
<rest> ::= #digit# <rest>
<rest> ::= #letter# <rest>

#letter# = { a ,..., z , A ,..., Z , _ }
#digit# = { 0 ,..., 9 }
#exponent# = { e , E }
#comparator# = { < , = , > }
#adder# = { + , - , | }
#multiplier# = { * , / , \ , & }
#power# = { ^ }
#operator# = #comparator# + #adder# + #multiplier# + #power#


Parsing tokens (cont'd): this is an abstract grammar for Pico

<expression> ::= <number>
<expression> ::= <fraction>
<expression> ::= <text>
<expression> ::= <table>
<expression> ::= <function>
<expression> ::= <native>
<expression> ::= <variable>
<expression> ::= <application>
<expression> ::= <tabulation>
<expression> ::= <definition>
<expression> ::= <assignment>
<expression> ::= <void>
<number> ::= NBR <number>
<fraction> ::= FRC <fraction>
<text> ::= TXT <text>
<table> ::= TAB <table>
<function> ::= FUN <identifier> <arguments> <expression> <dictionary>
<native> ::= NAT <identifier> <function>
<variable> ::= VAR <identifier>
<application> ::= APL <identifier> <arguments>
<tabulation> ::= TBL <identifier> <expression>
<definition> ::= DEF <invocation> <expression>
<assignment> ::= SET <invocation> <expression>
<dictionary> ::= DCT <identifier> <expression> <dictionary>
<void> ::= VOI
<identifier> ::= <text>
<arguments> ::= <table>
<arguments> ::= <invocation>
<invocation> ::= <variable>
<invocation> ::= <application>
<invocation> ::= <tabulation>


Parsing tokens (cont'd): every abstract expression is tagged, and every abstract expression is composed of indexed parts

NBR_tag: 1; NBR(Val): [ NBR_tag, Val ]; NBR_VAL_idx: 2;
FRC_tag: 2; FRC(Val): [ FRC_tag, Val ]; FRC_VAL_idx: 2;
TXT_tag: 3; TXT(Val): [ TXT_tag, Val ]; TXT_VAL_idx: 2;
TAB_tag: 4; TAB(Tab): [ TAB_tag, Tab ]; TAB_TAB_idx: 2;
FUN_tag: 5; FUN(Nam, Par, Bod, Dct): [ FUN_tag, Nam, Par, Bod, Dct ]; FUN_NAM_idx: 2; FUN_PAR_idx: 3; FUN_EXP_idx: 4; FUN_DCT_idx: 5;
NAT_tag: 6; NAT(Nam, Nat): [ NAT_tag, Nam, Nat ]; NAT_Nam_idx: 2; NAT_NAT_idx: 3;
VAR_tag: 7; VAR(Nam): [ VAR_tag, Nam ]; VAR_NAM_idx: 2;


Parsing tokens (cont'd):

APL_tag: 8; APL(Nam, Arg): [ APL_tag, Nam, Arg ]; APL_NAM_idx: 2; APL_ARG_idx: 3;
TBL_tag: 9; TBL(Nam, Idx): [ TBL_tag, Nam, Idx ]; TBL_NAM_idx: 2; TBL_IDX_idx: 3;
DEF_tag: 10; DEF(Inv, Exp): [ DEF_tag, Inv, Exp ]; DEF_INV_idx: 2; DEF_EXP_idx: 3;
SET_tag: 11; SET(Inv, Exp): [ SET_tag, Inv, Exp ]; SET_INV_idx: 2; SET_EXP_idx: 3;
DCT_tag: 12; DCT(Nam, Val, Dct): [ DCT_tag, Nam, Val, Dct ]; DCT_NAM_idx: 2; DCT_VAL_idx: 3; DCT_DCT_idx: 4;
VOI_tag: 13; VOI(): [ VOI_tag ];
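The same tagged representation can be written in Python with plain lists. The sketch below keeps the slide's tag numbers and defines a few of the constructors. FRC, TXT, FUN, NAT and DCT would follow the same pattern. The helper part() is hypothetical: the slide's *_idx constants are 1-based because Pico tables start at 1, so part() subtracts 1 before indexing a Python list. The uppercase function names deliberately mirror the slides rather than Python naming conventions.

```python
# Tags from the slides: NBR_tag: 1 ... VOI_tag: 13.
(NBR_tag, FRC_tag, TXT_tag, TAB_tag, FUN_tag, NAT_tag, VAR_tag,
 APL_tag, TBL_tag, DEF_tag, SET_tag, DCT_tag, VOI_tag) = range(1, 14)

# 1-based part indexes, as on the slides.
APL_NAM_idx, APL_ARG_idx = 2, 3
DEF_INV_idx, DEF_EXP_idx = 2, 3


def NBR(val):
    return [NBR_tag, val]


def TAB(tab):
    return [TAB_tag, tab]


def VAR(nam):
    return [VAR_tag, nam]


def APL(nam, arg):
    return [APL_tag, nam, arg]


def TBL(nam, idx):
    return [TBL_tag, nam, idx]


def DEF(inv, exp):
    return [DEF_tag, inv, exp]


def SET(inv, exp):
    return [SET_tag, inv, exp]


def VOI():
    return [VOI_tag]


def part(node, idx):
    """Return the part of an abstract expression at a 1-based slide index."""
    return node[idx - 1]
```

For instance, `DEF(APL("f", TAB([VAR("x")])), VAR("x"))` builds `[10, [8, "f", [4, [[7, "x"]]]], [7, "x"]]`, the same shape the reader produces for `f(x): x`.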

Parsing tokens (cont'd), transcript:

a number:
read('123')
:[1, 123]

a variable:
read('abc')
:[7, abc]

a call:
read('abc(1,2,3)')
:[8, abc, [4, [[1, 1], [1, 2], [1, 3]]]]

a definition:
read('f(x): x')
:[10, [8, f, [4, [[7, x]]]], [7, x]]

a table:
read('t[360]: sin(h:= h+Pi/180)')
:[10, [9, t, [1, 360]], [8, sin, [4, [[11, [7, h], [8, +, [4, [[7, h], [8, /, [4, [[7, Pi], [1, 180]]]]]]]]]]]]

Parsing tokens (cont'd): we will need a case statement!

{ tag => fun: [tag, fun];
  else: 0;
  case@clauses:
    { default: void;
      siz: size(clauses);
      max: 0;
      for(k: 1, k:= k+1, not(k > siz),
          { clause: clauses[k];
            if(clause[1] = else,
               default:= clause[2],
               if(clause[1] > max, max:= clause[1], void)) });
      tbl[max]: default;
      for(k: 1, k:= k+1, not(k > siz),
          { clause: clauses[k];
            if(clause[1] = else, void, tbl[clause[1]]:= clause[2]) });
      select(tag): if(tag > max, default, tbl[tag]) }

First, build a table tbl large enough to accept the case tags as indexes.
Then store the clauses, and the optional default, in tbl.
Finally, return a function that looks up a tag in tbl and returns the corresponding clause.
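Python has no direct equivalent of a user-defined `=>` operator, so in the sketch below a hypothetical clause() function pairs a tag with a value. Otherwise case() follows the same three steps: find the default and the largest tag, fill a table indexed by tag, and return a selector function. One assumed difference is labelled here and in a comment: the Pico select only guards against tags above max, whereas this version also returns the default for tags below 1.

```python
ELSE = 0                                  # the slides' else: 0


def clause(tag, value):
    """Counterpart of the Pico operator  tag => fun : pair the tag with its value."""
    return (tag, value)


def case(*clauses):
    """Build a selector from (tag, value) clauses with positive integer tags."""
    default, highest = None, 0
    for tag, value in clauses:            # first pass: default and largest tag
        if tag == ELSE:
            default = value
        elif tag > highest:
            highest = tag
    tbl = [default] * (highest + 1)       # slot 0 unused, like a 1-based Pico table
    for tag, value in clauses:            # second pass: store the clauses
        if tag != ELSE:
            tbl[tag] = value

    def select(tag):
        # Unlike the slide version, this also falls back to default for tag < 1.
        return tbl[tag] if 1 <= tag <= highest else default
    return select
```

The vowel example that follows then reads `char_fun = case(*[clause(ord(c), vowel) for c in "aAeEiIoOuU"], clause(ELSE, consonant))`.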


example (transcript):

vowel_test(ch):
  { vowel(ch): display(ch, ' is a vowel', eoln);
    consonant(ch): display(ch, ' is a consonant', eoln);
    char_fun: case(ord('a') => vowel, ord('A') => vowel, ord('e') => vowel, ord('E') => vowel,
                   ord('i') => vowel, ord('I') => vowel, ord('o') => vowel, ord('O') => vowel,
                   ord('u') => vowel, ord('U') => vowel, else => consonant);
    vowel_test(ch):= { fun: char_fun(ord(ch)); fun(ch) };
    vowel_test(ch) }
:<function vowel_test>
vowel_test('Z')
:Z is a consonant
vowel_test('u')
:u is a vowel

Parsing tokens (cont'd):

token: void;
skip(): token:= scan();
next(Dat): { skip(); Dat }

Parsing tokens (cont'd):

identity(Inv): Inv;
definition(Inv): DEF(next(Inv), expression());
assignment(Inv): SET(next(Inv), expression());
exp_case: case(COL_token => definition, CEQ_token => assignment, else => identity);
expression(): { inv: invocation(); cas: exp_case(token); cas(inv) };
read(Str): { init_scan(Str); token:= scan(); expression() }

Parsing tokens (cont'd):

operation(Opr, Tkn):
  { opd: Opr();
    while(token = Tkn,
          { opr: next(scan_data);
            arg: [ opd, Opr() ];
            opd:= APL(opr, TAB(arg)) });
    opd };
factor(): operation(reference, XOP_token);
term(): operation(factor, MOP_token);
comparand(): operation(term, AOP_token);
invocation(): operation(comparand, ROP_token)
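operation() is the loop form of the left-recursive productions. Each time it meets an operator of the given class, it wraps the tree built so far as the left argument of a new APL node. The Python sketch below mirrors it over a ready-made list of (kind, value) tokens. As a simplifying assumption, its reference() only accepts numbers and names. The class name OperatorParser is hypothetical.

```python
APL_tag, TAB_tag, VAR_tag, NBR_tag = 8, 4, 7, 1     # tags from the slides


def APL(nam, arg):
    return [APL_tag, nam, arg]


def TAB(tab):
    return [TAB_tag, tab]


class OperatorParser:
    """Sketch of operation(), factor(), term(), comparand() and invocation()."""

    def __init__(self, tokens):
        self.tokens = tokens + [("END", None)]
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos][0]

    def next(self):
        """Return the current token's attribute and advance (like next(scan_data))."""
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def operation(self, opr, tkn):
        opd = opr()
        while self.token == tkn:
            op = self.next()
            arg = [opd, opr()]
            opd = APL(op, TAB(arg))           # the tree built so far becomes the left argument
        return opd

    def reference(self):                      # simplified: numbers and names only
        kind = self.token
        value = self.next()
        if kind == "NBR":
            return [NBR_tag, value]
        if kind == "NAM":
            return [VAR_tag, value]
        raise SyntaxError(f"unexpected token {kind}")

    def factor(self):
        return self.operation(self.reference, "XOP")

    def term(self):
        return self.operation(self.factor, "MOP")

    def comparand(self):
        return self.operation(self.term, "AOP")

    def invocation(self):
        return self.operation(self.comparand, "ROP")
```

With the tokens of `a+b*2-c`, the `*` is grouped first because term() sits below comparand(), and `+` and `-` group to the left: the result is the application of `-` to `a+(b*2)` and `c`.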


Parsing tokens (cont'd):

number(): NBR(next(scan_data));
fraction(): FRC(next(scan_data));
text(): TXT(next(scan_data));
ref_case: case(NBR_token => number, FRC_token => fraction, TXT_token => text,
               NAM_token => name,
               ROP_token => operator, AOP_token => operator, MOP_token => operator, XOP_token => operator,
               LPR_token => parentheses, LBC_token => braces, LBR_token => brackets,
               else => message);
reference(): { cas: ref_case(token); cas() }

Parsing tokens (cont'd):

tab_str: 'tab';
begin_str: 'begin';
var_case: case(LPR_token => application, LBR_token => tabulation, CAT_token => apply, else => variable);
name(): { var: next(scan_data); cas: var_case(token); cas(var) };
parentheses(): { skip(); exp: expression(); if(token = RPR_token, skip(), message()); exp };
braces(): { skip(); APL(begin_str, list(SMC_token, RBC_token)) };
brackets():
  { skip();
    if(token = RBR_token,
       APL(tab_str, next(Empty)),
       APL(tab_str, list(COM_token, RBR_token))) }

Parsing tokens (cont'd):

prefix(Var): { arg: [ reference() ]; APL(Var, TAB(arg)) };
opr_case: case(NBR_token => prefix, FRC_token => prefix, TXT_token => prefix, NAM_token => prefix,
               ROP_token => prefix, AOP_token => prefix, MOP_token => prefix, XOP_token => prefix,
               LPR_token => application, CAT_token => apply, LBR_token => tabulation,
               else => variable);
operator(): { opr: next(scan_data); cas: opr_case(token); cas(opr) }

Parsing tokens (cont'd):

application(Var):
  { skip();
    if(token = RPR_token,
       APL(Var, next(Empty)),
       APL(Var, list(COM_token, RPR_token))) };
apply(Var): { skip(); ref: reference(); APL(Var, ref) };
tabulation(Var): { skip(); idx: expression(); if(token = RBR_token, skip(), message()); TBL(Var, idx) };
variable(Var): VAR(Var)

Parsing tokens (cont'd):

list(Sep, Trm):
  { loop(count):
      { exp: expression();
        if(token = Sep,
           { skip(); tab: loop(count+1); tab[count]:= exp; tab },
           if(token = Trm,
              { skip(); tab[count]: void; tab[count]:= exp; tab },
              message())) };
    TAB(loop(1)) }

The recursion allocates tab only when the terminator is reached, because only then is the number of elements (count) known. On the way back, each level stores its own element and returns the table.
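Putting the parsing functions together, the Python sketch below is a deliberately reduced reader that reproduces the read() transcript shown earlier. It is not the course's implementation. It supports numbers, names, applications `f(...)`, tabulations `t[...]`, the four operator classes, definitions and assignments. Text, fractions, braces, brackets, parenthesized subexpressions, prefix operators and `@` are left out. For an empty argument list it returns `TAB([])`; this is an assumption standing in for the slides' Empty, whose definition is not shown. Because Python lists grow on demand, list() is written as a loop instead of the recursive loop(count). Unlike the slides' read(), it also requires the whole input to be consumed.

```python
import re

NBR_tag, TAB_tag, VAR_tag, APL_tag, TBL_tag, DEF_tag, SET_tag = 1, 4, 7, 8, 9, 10, 11  # slide tags
OPS = r"[-+*/\\&|^<>=#$%~!?]"                   # any operator character
LEXEME = re.compile(
    r"\s*(?:(?P<NBR>\d+)|(?P<NAM>[A-Za-z_][A-Za-z_0-9]*)|(?P<CEQ>:=)|(?P<COL>:)"
    r"|(?P<LPR>\()|(?P<RPR>\))|(?P<LBR>\[)|(?P<RBR>\])|(?P<COM>,)"
    rf"|(?P<ROP>[<>=#]{OPS}*)|(?P<MOP>[*/\\&]{OPS}*)"
    rf"|(?P<AOP>[-+|$%~]{OPS}*)|(?P<XOP>[!?^]{OPS}*))")


def tokenize(text):  # (kind, attribute) pairs, ending with ("END", None)
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = LEXEME.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        tokens.append((kind, int(value) if kind == "NBR" else value))
        pos = m.end()
    return tokens + [("END", None)]


class Reader:  # recursive descent mirroring expression(), operation(), reference(), list()
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    @property
    def token(self):
        return self.tokens[self.pos][0]

    def next(self):                    # like next(scan_data): current attribute, then skip
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def expect(self, kind):
        if self.token != kind:
            raise SyntaxError(f"unexpected {self.token}, expected {kind}")
        self.pos += 1

    def expression(self):              # exp_case: definition, assignment or plain invocation
        inv = self.invocation()
        if self.token in ("COL", "CEQ"):
            tag = DEF_tag if self.next() == ":" else SET_tag
            return [tag, inv, self.expression()]
        return inv

    def operation(self, opr, tkn):     # left-associative chain of one operator class
        opd = opr()
        while self.token == tkn:
            op = self.next()
            opd = [APL_tag, op, [TAB_tag, [opd, opr()]]]
        return opd

    def invocation(self):
        return self.operation(self.comparand, "ROP")

    def comparand(self):
        return self.operation(self.term, "AOP")

    def term(self):
        return self.operation(self.factor, "MOP")

    def factor(self):
        return self.operation(self.reference, "XOP")

    def reference(self):               # number, name, name(...) or name[...]
        if self.token == "NBR":
            return [NBR_tag, self.next()]
        if self.token != "NAM":
            raise SyntaxError(f"unexpected {self.token}")
        name = self.next()
        if self.token == "LPR":        # application
            self.pos += 1
            return [APL_tag, name, self.list("COM", "RPR")]
        if self.token == "LBR":        # tabulation
            self.pos += 1
            idx = self.expression()
            self.expect("RBR")
            return [TBL_tag, name, idx]
        return [VAR_tag, name]         # variable

    def list(self, sep, trm):          # expressions separated by sep, closed by trm
        items = [] if self.token == trm else [self.expression()]
        while self.token == sep:
            self.pos += 1
            items.append(self.expression())
        self.expect(trm)
        return [TAB_tag, items]


def read(text):
    reader = Reader(text)
    exp = reader.expression()
    reader.expect("END")               # require all input to be consumed
    return exp
```

For example, `read('abc(1,2,3)')` returns `[8, 'abc', [4, [[1, 1], [1, 2], [1, 3]]]]`, matching the transcript.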

msg_tab: [ 'additive operator', 'application', 'assignment', 'definition', 'comma',
           'end of text', 'fraction', 'left brace', 'left bracket', 'left parenthesis',
           'multiplicative operator', 'name', 'number', 'right brace', 'right bracket',
           'relational operator', 'right parenthesis', 'semicolon', 'text', 'exponentiation operator' ];
message@any: error('Unexpected ', msg_tab[token])

Parsing tokens (cont'd):