Date post: 22-Dec-2015
Upload: thomasina-hardy
SYNTAX


Outline
  Programming Language Specification
  Lexical Structure of PLs
  Syntactic Structure of PLs
  Context-Free Grammar / BNF
  Parse Trees
  Abstract Syntax Trees
  Ambiguous Grammar
  Associativity and Precedence
  EBNFs and Syntax Diagrams

Nandigam


Programming Language Specification

PLs require precise definitions (i.e., no ambiguity) of:
  Language form (syntax)
  Language meaning (semantics)

Consequently, PLs are specified using formal notation:

Formal syntax
  Tokens
  Grammar

Formal semantics
  Operational
  Denotational
  Axiomatic


Lexical Structure of PLs


Lexical Structure of PLs (cont.)

Main task of the scanner: identify tokens
  Tokens are the basic building blocks of programs
  E.g., keywords, identifiers, numbers, punctuation marks

Lexeme: an instance of a token.
  One can think of programs as strings of lexemes rather than of characters.

A token of a language is a category of its lexemes (or instances).
  Some tokens have more than one possible lexeme
    E.g., keyword, identifier, number
  In some cases, a token has only one possible lexeme
    E.g., equal_sign, plus_op, mult_op


Lexical Structure of PLs (cont.)

Consider the following Java statement:

  index = 2 * count + 17 ;

The lexemes and tokens of this statement are:

  Lexeme   Token
  index    identifier
  =        equal_sign
  2        int_literal
  *        mult_op
  count    identifier
  +        plus_op
  17       int_literal
  ;        semicolon
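
The lexeme/token pairing above can be reproduced with a small scanner. The following is only a sketch in Python (not the Java tooling the slides refer to): the token names come from the slide, while the regular expressions, the `TOKEN_SPEC` table, and the `scan` function are assumptions made for illustration.

```python
import re

# Token names follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier",  r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("equal_sign",  r"="),
    ("plus_op",     r"\+"),
    ("mult_op",     r"\*"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return a list of (lexeme, token) pairs for the input text."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs


pairs = scan("index = 2 * count + 17 ;")
for lexeme, token in pairs:
    print(f"{lexeme:8}{token}")
```

Each named group in the combined pattern plays the role of one token category, so the scanner reports both the matched lexeme and the category it belongs to.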


Lexical Structure of PLs (cont.)

Tokens in a programming language are described formally by regular expressions.

Regular expressions: descriptions of patterns of characters

Regular expression operations

  Basic operations
    Concatenation             (item sequencing)
    Choice or selection       |
    Repetition                *
    Grouping                  ( )

  Additional operations
    One or more repetitions   +
    Range of characters       [ - ]
    Optional                  ?
    Any character             .


Lexical Structure of PLs (cont.)

Regular expression examples

  (a|b)*c
    Strings that match include ababaac, aac, bbc, c, and babc

  [0-9]+
    Integer constants with one or more digits

  [0-9]+(\.[0-9]+)?
    Floating-point literals (the fractional part is optional, so plain integers also match)

  [a-zA-Z][a-zA-Z0-9_]*
    Identifiers
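
These patterns can be checked mechanically. The sketch below uses Python's standard `re` module rather than Java's `java.util.regex`; the `matches` helper and the sample strings other than those listed on the slide are assumptions chosen for illustration.

```python
import re


def matches(pattern, text):
    """True when the whole string (not just a prefix) matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# The four patterns from the slide.
ABC = r"(a|b)*c"
INTEGER = r"[0-9]+"
FLOAT = r"[0-9]+(\.[0-9]+)?"
IDENT = r"[a-zA-Z][a-zA-Z0-9_]*"

for s in ["ababaac", "aac", "bbc", "c", "babc"]:
    print(s, matches(ABC, s))

print("17", matches(INTEGER, "17"))
print("3.14", matches(FLOAT, "3.14"))
print("3.", matches(FLOAT, "3."))        # rejected: digits must follow the dot
print("count", matches(IDENT, "count"))
print("9lives", matches(IDENT, "9lives"))  # rejected: must start with a letter
```

`re.fullmatch` is used instead of `re.match` so that a string is accepted only when the pattern describes all of it, which is the sense in which a regular expression defines a token.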


Lexical Structure of PLs (cont.)

Scanner generators:
  lex, flex
  ANTLR (ANother Tool for Language Recognition)
  These programs can be used to generate a program (i.e., a scanner) that can extract tokens from a stream of characters.

Many PLs provide good support for regular expressions: Java, C#, Perl, Ruby, …

Support for regular expressions in Java
  java.util.regex package
  split() method of String class


Syntactic Structure of PLs

Specifying the form of a programming language
  Tokens: regular expressions
  Syntax (organization of tokens): context-free grammars (CFGs)


Context-Free Grammar

Context-free grammars (CFGs) are used to describe the syntax of PLs.
  Proposed by Noam Chomsky, a noted linguist

BNF (Backus-Naur Form) is a notation for describing syntax.
  Proposed by John Backus and Peter Naur

CFG and BNF are nearly identical and are used interchangeably.

BNF is a metalanguage for programming languages.
  A metalanguage is a language that is used to describe another language.


Context-Free Grammar (cont.)

A CFG or BNF description consists of a series of rules or productions.

Productions are made up of:
  Nonterminals: structures that are broken down into further structures
  Terminals: things that cannot be broken down
  Metasymbols
    Symbols that are part of the CFG/BNF notation
    These are not actual symbols in the language being described
    Sometimes, a metasymbol is also an actual symbol in a language

One of the nonterminals is designated as the start symbol.
  The start symbol stands for the entire structure being defined.


Context-Free Grammar (cont.)

CFG/BNF Example (Figure 4.2, page 83)

(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets


Context-Free Grammar (cont.)

The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:

sentence ⇒ noun-phrase verb-phrase .         (rule 1)
         ⇒ article noun verb-phrase .        (rule 2)
         ⇒ the noun verb-phrase .            (rule 3)
         ⇒ the girl verb-phrase .            (rule 4)
         ⇒ the girl verb noun-phrase .       (rule 5)
         ⇒ the girl sees noun-phrase .       (rule 6)
         ⇒ the girl sees article noun .      (rule 2)
         ⇒ the girl sees a noun .            (rule 3)
         ⇒ the girl sees a dog .             (rule 4)
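
The derivation can be replayed step by step in code. This is a minimal Python sketch: the `GRAMMAR` dictionary encodes the six rules of the example grammar, and the `leftmost_derivation` function and the `choices` list are hypothetical names introduced here, not part of the slides.

```python
# The example grammar: each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb":        [["sees"], ["pets"]],
}


def leftmost_derivation(choices, start="sentence"):
    """Replay a leftmost derivation.

    choices[i] is the index of the alternative used to replace the
    leftmost nonterminal at step i. Returns every sentential form.
    """
    form = [start]
    steps = [" ".join(form)]
    for alt in choices:
        # Find the leftmost symbol that is still a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][alt]
        steps.append(" ".join(form))
    return steps


# Alternatives picked at each step to derive "the girl sees a dog ."
choices = [0, 0, 1, 0, 0, 0, 0, 0, 1]
steps = leftmost_derivation(choices)
for line in steps:
    print(line)
```

Because the leftmost nonterminal is always the one replaced, a derivation is fully determined by the sequence of alternatives chosen, which is why a plain list of indices is enough here.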


Context-Free Grammar (cont.)

Derivation – Generating sentences of the language through a sequence of applications of rules (or productions), beginning with a special nonterminal called the start symbol.

Leftmost derivation – The replaced nonterminal is always the leftmost nonterminal.

Rightmost derivation – The replaced nonterminal is always the rightmost nonterminal.

A derivation may be neither leftmost nor rightmost.

Derivation order has no effect on the language generated by a grammar.


Context-Free Grammar (cont.)

A grammar for a small language

<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expr>
<expr> → <var> + <var>
       | <var> - <var>
       | <var>
<var> → A | B | C

Derive the following program:

  begin A := B + C ; B := C end

Is the language defined by this grammar finite or infinite?
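
One way to experiment with this grammar is a small recognizer with one function per nonterminal. The Python sketch below is an illustration only: the `accepts` function and its helpers are hypothetical names, and it assumes tokens are separated by whitespace, as in the sample program.

```python
VARS = {"A", "B", "C"}


def accepts(text):
    """Return True when text is a <program> of the small language."""
    toks = text.split()   # assumption: tokens are whitespace-separated
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def var():            # <var> -> A | B | C
        if take() not in VARS:
            raise SyntaxError("expected A, B, or C")

    def expr():           # <expr> -> <var> + <var> | <var> - <var> | <var>
        var()
        if peek() in ("+", "-"):
            take()
            var()

    def stmt():           # <stmt> -> <var> := <expr>
        var()
        take(":=")
        expr()

    def stmt_list():      # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            take(";")
            stmt_list()

    try:
        take("begin")
        stmt_list()
        take("end")
        return pos == len(toks)
    except SyntaxError:
        return False


print(accepts("begin A := B + C ; B := C end"))
print(accepts("begin A := B + C + A end"))
```

The second call is rejected because `<expr>` allows at most one operator, which follows directly from the three alternatives in the rule.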


Context-Free Grammar (cont.)

Left recursive rule – A BNF rule is left recursive if the left-hand side (LHS) appears at the beginning of its right-hand side (RHS).

Right recursive rule – A BNF rule is right recursive if the LHS appears at the right end of the RHS.

Examples:

number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
expr → expr + expr
     | expr * expr
     | ( expr )
     | number

Uses of recursion in BNF:
  to show repetition
  to describe complex structures


Parse Trees

A parse tree is a graphical representation of the hierarchical syntactic structure of sentences.
  It describes graphically the replacement process in a derivation.

A parse tree is labeled by nonterminals at interior nodes and terminals at leaves.

A parse tree better expresses the structure inherent in a derivation.

[Figure: two parse trees. One derives the number 234 with the rule number → number digit | digit; the other derives an expression that multiplies the parenthesized sum 2 + 3 by the number 4.]


Parse Trees (cont.)

Problem 1:

<assign> → <id> := <expr>
<expr> → <id> + <expr>
       | <id> * <expr>
       | ( <expr> )
       | <id>
<id> → A | B | C

Show a leftmost derivation and a parse tree for each of the following statements:

  A := A + ( B * C )
  A := B + C + A
  A := A * ( B + C )
  A := B * ( C * ( A + B ) )


Parse Trees (cont.)

Problem 2: Describe, in English, the language defined by the following grammar:

<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c

Problem 3: Consider the following grammar:

<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a

Which of the following sentences are in the language generated by this grammar?

  baab
  bbbab
  bbaaaaa
  bbaab


Parse Trees (cont.)

Problem 4: Consider the following grammar:

<S> → a <S> c <B>
<S> → <A> | b
<A> → c <A> | c
<B> → d | <A>

Which of the following sentences are in the language generated by the grammar?

  abcd
  acccbd
  acccbcc
  acd
  accc


Abstract Syntax Trees

Parse trees are still too detailed in their structure, since every step in a derivation is expressed as nodes.

An abstract syntax tree (AST), or just syntax tree, shows the essential structure of a parse tree.
  An AST is more compact than the corresponding parse tree.
  An (abstract) syntax tree condenses a parse tree to its essential structure.

Language designers and translator writers are most interested in abstract syntax.

A programmer is most interested in concrete syntax.

Examples on the next two slides…


Abstract Syntax Trees (cont.)

[Figure: parse tree for the number 234 (labeled "Parse Tree") next to the corresponding AST, which keeps only the digits 2, 3, and 4 (labeled "Corresponding AST").]


Abstract Syntax Trees (cont.)

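[Figure: parse tree for an expression that multiplies the parenthesized sum 2 + 3 by the number 4 (labeled "Parse Tree") next to the corresponding AST, in which * has + and 4 as children and + has 2 and 3 as children (labeled "Corresponding AST").]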
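
The condensing step from parse tree to AST can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the nested-tuple representation, the `to_ast` function, and the choice of ( 2 + 3 ) * 4 as the sample expression.

```python
# A parse tree as nested tuples: (nonterminal, child, child, ...).
# Leaves are plain strings (terminals). Sample expression: ( 2 + 3 ) * 4
PARSE_TREE = (
    "expr",
    ("expr",
     "(",
     ("expr",
      ("expr", ("number", ("digit", "2"))),
      "+",
      ("expr", ("number", ("digit", "3")))),
     ")"),
    "*",
    ("expr", ("number", ("digit", "4"))),
)


def to_ast(node):
    """Condense a parse tree to (operator, left, right) tuples and leaves."""
    if isinstance(node, str):
        return node
    _label, *children = node
    # Parentheses only guide parsing; the tree shape already records grouping.
    kids = [c for c in children if c not in ("(", ")")]
    if len(kids) == 1:
        # A chain such as expr -> number -> digit carries no extra structure.
        return to_ast(kids[0])
    if len(kids) == 3 and isinstance(kids[1], str):
        return (kids[1], to_ast(kids[0]), to_ast(kids[2]))
    raise ValueError(f"unsupported node shape: {node!r}")


ast = to_ast(PARSE_TREE)
print(ast)
```

The two simplifications in `to_ast` (dropping parentheses and collapsing single-child chains) are exactly the details a parse tree records that an AST leaves out.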


Ambiguous Grammars

A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string.

Example:
  Grammar:
    expr → expr + expr
         | expr * expr
         | ( expr )
         | NUMBER
  Expression: 2 + 3 * 4
  Parse trees: ambiguity in operator precedence

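[Figure: two parse trees for 2 + 3 * 4. In one, + is at the root, with NUMBER (2) on the left and the subtree for 3 * 4 on the right; in the other, * is at the root, with the subtree for 2 + 3 on the left and NUMBER (4) on the right.]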


Ambiguous Grammars (cont.)

Another example:
  Grammar:
    expr → expr + expr
         | expr - expr
         | ( expr )
         | NUMBER
  Expression: 2 - 3 - 4
  Parse trees: ambiguity in operator associativity

[Figure: two parse trees for 2 - 3 - 4. In one, the root - has the subtree for 2 - 3 on the left and NUMBER (4) on the right; in the other, the root - has NUMBER (2) on the left and the subtree for 3 - 4 on the right.]
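
Ambiguity can be made concrete by enumerating every tree an ambiguous grammar allows and evaluating each one. The Python sketch below assumes a grammar of the form expr → expr op expr | NUMBER with op being +, -, or * (parentheses are left out for brevity); the `parses` and `evaluate` functions are hypothetical names.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def parses(tokens):
    """Yield every tree for tokens under expr -> expr op expr | NUMBER.

    tokens alternates numbers and operators, e.g. [2, "+", 3, "*", 4].
    """
    if len(tokens) == 1:
        yield tokens[0]
        return
    # Any operator may be the root: that freedom is the ambiguity.
    for i in range(1, len(tokens), 2):
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                yield (tokens[i], left, right)


def evaluate(tree):
    """Compute the value that a given tree shape assigns to the expression."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


precedence = sorted(evaluate(t) for t in parses([2, "+", 3, "*", 4]))
associativity = sorted(evaluate(t) for t in parses([2, "-", 3, "-", 4]))
print(precedence)
print(associativity)
```

Each expression has two trees, and the trees disagree on the value, which is why an ambiguous grammar is a practical problem and not just a formal one.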


Ambiguous Grammars (cont.)

Ways to resolve ambiguities in a grammar
  Revise the grammar (the desired approach)
  Provide a disambiguating rule (semantic help)

Revising a grammar to address precedence and associativity ambiguities
  Do not write rules that allow a parse tree to grow on both the left and right sides
  Use left recursive rules for left-associative operators
  Use right recursive rules for right-associative operators
  Add new rules that establish a "precedence cascade" between rules to specify precedence
  Make sure operators with higher precedence appear lower in the cascade of rules

Revised grammar

expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | NUMBER


Ambiguous Grammars (cont.)

Problem 1:

<expr> → <expr> + <expr>
       | <expr> - <expr>
       | <expr> * <expr>
       | <expr> / <expr>
       | ( <expr> )
       | NUMBER
NUMBER = [0-9]+

Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:

  30 + 5 + 2
  30 - 5 - 2
  30 * 5 * 2
  30 / 5 / 2
  30 + 5 * 2


Ambiguous Grammars (cont.)

Revised unambiguous grammar

<expr> → <expr> + <term> | <expr> - <term> | <term>

<term> → <term> * <factor> | <term> / <factor> | <factor>

<factor> → ( <expr> ) | NUMBER

NUMBER = [0-9]+


Ambiguous Grammars (cont.)

Problem 2:

Show that the following grammar is ambiguous:

<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c


Ambiguous Grammars (cont.)

Are there other alternatives for resolving ambiguities?
  Yes, but they change the language!

Fully parenthesized expressions:

expr → ( expr + expr ) | ( expr - expr ) | NUMBER

Prefix expressions:

expr → + expr expr | - expr expr | NUMBER
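
The prefix grammar is unambiguous because the operator always comes first, so each expression can be read in exactly one way. A minimal Python evaluator for it might look like the following; `eval_prefix` is a hypothetical name, and whitespace-separated tokens are an assumption.

```python
def eval_prefix(text):
    """Evaluate a prefix expression: expr -> + expr expr | - expr expr | NUMBER."""
    tokens = text.split()   # assumption: tokens are whitespace-separated

    def expr(pos):
        """Parse one expr starting at pos; return (value, next position)."""
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok in ("+", "-"):
            left, pos = expr(pos + 1)
            right, pos = expr(pos)
            return (left + right if tok == "+" else left - right), pos
        if tok.isdigit():
            return int(tok), pos + 1
        raise SyntaxError(f"unexpected token {tok!r}")

    value, end = expr(0)
    if end != len(tokens):
        raise SyntaxError("extra tokens after a complete expression")
    return value


print(eval_prefix("- - 2 3 4"))   # the grouping (2 - 3) - 4
print(eval_prefix("- 2 - 3 4"))   # the grouping 2 - (3 - 4)
```

The two groupings of 2 - 3 - 4 that were indistinguishable in infix form become two different strings in prefix form, which is the sense in which this fix changes the language.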


Extended BNF

Adds new metasymbols (or operations) to BNF to enhance readability and writability.

These extensions do not enhance the descriptive power of BNF.

EBNF facilitates the development of parsing tools based on an approach called recursive-descent parsing.

New metasymbols added to EBNF:

  { }     zero or more repetitions
  [ ]     optional parts
  ( | )   multiple choice


Extended BNF (cont.)

Examples:

BNF:  <number> → <number> <digit> | <digit>
EBNF: <number> → <digit> {<digit>}

BNF:  <expr> → <expr> + <term> | <term>
EBNF: <expr> → <term> {+ <term>}

BNF:  <expr> → <term> ^ <expr> | <term>
EBNF: <expr> → <term> [^ <expr>]

BNF:  <selection> → if <logic-expr> then <statement>
                  | if <logic-expr> then <statement> else <statement>
EBNF: <selection> → if <logic-expr> then <statement> [else <statement>]

BNF:  <for-stmt> → for <var> := <expr> to <expr> do <statement>
                 | for <var> := <expr> downto <expr> do <statement>
EBNF: <for-stmt> → for <var> := <expr> (to | downto) <expr> do <statement>


Extended BNF (cont.)

More examples:

BNF:
<expr> → <expr> + <term> | <term>
<term> → <term> * <power>
       | <term> / <power>
       | <term> % <power>
       | <power>
<power> → <factor> ^ <power> | <factor>
<factor> → ( <expr> ) | NUMBER
NUMBER = [0-9]+

EBNF:
<expr> → <term> {+ <term>}
<term> → <power> { * <power> | / <power> | % <power> }
<power> → <factor> [^ <power>]
<factor> → ( <expr> ) | NUMBER
NUMBER = [0-9]+
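
The EBNF form maps almost line by line onto a recursive-descent parser: each `{ }` becomes a loop and each `[ ]` becomes a conditional. The Python sketch below is one possible rendering, not a definitive implementation; the `parse` function, its helpers, and the tuple-based tree it returns are assumptions made for illustration.

```python
import re


def parse(text):
    """Parse text with the EBNF grammar and return a nested-tuple tree."""
    # Sketch-level tokenizer: characters outside this set are silently ignored.
    toks = re.findall(r"[0-9]+|[+*/%^()]", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():     # <expr> -> <term> {+ <term>}
        node = term()
        while peek() == "+":              # { } becomes a loop
            take()
            node = ("+", node, term())    # builds a left-leaning tree
        return node

    def term():     # <term> -> <power> { * <power> | / <power> | % <power> }
        node = power()
        while peek() in ("*", "/", "%"):
            op = take()
            node = (op, node, power())
        return node

    def power():    # <power> -> <factor> [^ <power>]
        node = factor()
        if peek() == "^":                 # [ ] becomes a conditional
            take()
            node = ("^", node, power())   # recursion on the right operand
        return node

    def factor():   # <factor> -> ( <expr> ) | NUMBER
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected )")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(toks):
        raise SyntaxError(f"unexpected token {toks[pos]!r}")
    return tree


print(parse("1 + 2 * 3"))
print(parse("8 / 4 / 2"))
print(parse("2 ^ 3 ^ 2"))
```

The loops produce left-associative trees for +, *, /, and %, while the recursive call in `power` produces a right-associative tree for ^, matching the left and right recursion in the BNF version.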


Syntax Diagrams

A graphical representation for a grammar rule
An alternative to EBNF
  Circles or ovals for terminals
  Squares or rectangles for nonterminals
  Terminals and nonterminals are connected with lines and arrows
Visually appealing but takes up space
Rarely seen any more: EBNF is much more compact

[Figure: syntax diagram for if-statement: if ( expression ) statement, optionally followed by else statement.]

