Allyson M. Hoss,

January 28, 2008

CSC 7101 Programming Language Structures

Spring 2008

Louisiana State University

CSC 7101 Programming Language Structures

• Research Assignment

• Miscellaneous Issues

• PL Design Goals

• Syntax & Semantics

• Attribute Grammars

Topic & References

• Sources - published papers (incl. IEEE / ACM)

(not: class websites; blogs; advertisements)

• Limit sources from Wikipedia

• Trouble finding or accessing papers?

• Do your own research!

• Your research is YOURS!

• Example Topic Description

Example of a Good Topic Description

Topic: Parallel Programming Languages

Focus: Usability on today's multi-core and future multi-core / multi-processor systems

Approach: Compare & contrast programming languages, strengths and weaknesses w.r.t. multi-core processors

Focus: Determine if current Software Development Toolkits built on top of existing languages are better suited and easier to use

Systems: IBM's X10 language, Cray Inc's Chapel, possibly Sun's Fortress language.

Toolkits: OpenMP and Intel's Threading Building Blocks

Guidelines for Outline

• Topic sentence(s)

• Focus (narrowing the topic)

• Approach (your review will take)

• Address PL design goals of your topic

• Include a comparative analysis table/diagram of related research organized based on your approach

• Address open issues / research directions

• Review syntax briefly … focus on semantics

• Minimal review of HW

Teaching Assistant

• John W. Burris

• Office Hours

Monday 11:00 AM - 12:30 PM

Tuesday 11:00 AM - 1:00 PM

• Thursday by appointment

(with at least 12 hours notice)

• Coates 162

Website

My main home page will be:

http://www.csc.lsu.edu/~hoss/index.html

Additional Reading Material

Slonneger, K. and Kurtz, B., Formal Syntax and Semantics of Programming Languages: A Laboratory-Based Approach,

Addison-Wesley, Reading, MA, 1995, ISBN: 0-201-65697-3.

http://www.cs.uiowa.edu/~slonnegr/plf/Book/

Please Read:

Chapter 1, pp. 1-8; 21-29

Chapter 3, pp. 59-71

Design Questions

• What design decisions make each language different from the others?

• Are these differences a result of minor syntactic rules or important underlying semantic issues?

• Is a controversial design decision necessary to make the language appropriate for its intended use or was the design decision an accident?

Design Questions

• Could different design decisions result in a language with more strengths and fewer weaknesses?

• Are the good parts of different languages mutually exclusive or could they be efficiently combined?

• Can a language be extended to compensate for its weaknesses?

Design Goals

What do you think are some design goals?

Design Goals

• Initially: time (execution) vs space (memory)

• Next: simplicity, expressiveness, generality

• Then: reliability, maintainability, efficiency

Design Goals

What do you think are the current goals today?

Design Goals NOW

• Simplicity : easy to learn, use, understand

• Robustness: security, safety

(strongly typed; restricts ptrs)

• Portability: architectures

(run-time bytecode interpreters)

• Internet Compatibility: access SW anywhere

(class libraries)

• Concurrency: multi-interaction

(multi-threading & concurrency primitives)

Programming Language Paradigms

• Procedural or Imperative

• Functional

• Declarative

• Object-oriented

• Others: rule-based, event-driven, parallel or concurrent, scripting, markup, specification, assembly, visual, …

Procedural or Imperative Paradigm

• Computation based on commands such as

do this, do that, do the next thing

• Variables represent memory locations

• Assignment statements store values

• Destructive assignment

• Uses iteration for repetition

• Acts on stored data, modifies system state

• Assembly, Fortran, COBOL, C, C++, Java

Functional Paradigm

• View programs as function definitions and sets of expressions

• Computation based on mathematical functions: “black boxes” that accept inputs and return outputs

• Apply functions to arguments

• Minimal use of variables or assignment statements

• No extraneous side effects

• Natural recursion (primary form of repetition)

• Lisp, Scheme, ML

Declarative (Logic) Paradigm

• Not procedural: commands describe what is, not how to

• Computation using symbolic logic, facts, and rules

• Typically interpreted

• Looping via recursion

• Prolog

Object-Oriented Paradigm

• Data abstraction: objects & methods

• Information hiding

• Classes, class hierarchy, instances of classes

• Computation via interaction of objects

• Inheritance

• Smalltalk

• OO features incorporated into most modern PLs: C++, C#, Java

Defining a PL

• Language: a set of strings (possibly infinite) of symbols from a finite alphabet

• Language specification: syntax, semantics, pragmatics

• Syntax: arrangement of symbols; well-formedness (not ambiguous; values well defined); described by a grammar

• Semantics: meaning of syntactically valid strings; relationship between input and output; steps of program execution; rules for legal programs that syntax often cannot describe

• Pragmatics: extra information; usage of the language (ease of use, efficiency); features of the implementation (optimization)

Syntax & Semantics

• Syntax

how does a program “look”

form and structure of language constructs

(programs, procedures, statements, …)

• Semantics

what do the language constructs “do”

meaning (behavior) of the syntactic units

(what does an “if” statement “do”)

Syntax & Semantics

• Syntax

grammar of a natural language statement

a set of rules to define a language

• Semantics

meaning of a natural language statement

English Grammar

<sentence> ::= <noun phrase> <verb phrase>

A sentence is a noun phrase followed by a verb phrase

::= can be read “is defined to be” or “is composed of”; also written →

English Grammar

<sentence> ::= <noun phrase> <verb phrase>

<noun phrase> ::= <determiner> <noun>

| <determiner> <noun> <prepositional phrase>

<verb phrase> ::= <verb> | <verb> <noun phrase>

| <verb> <noun phrase> <prepositional phrase>

<prepositional phrase> ::= <preposition> <noun phrase>

<noun> ::= boy | dog | leash | ball

<determiner> ::= a | the

<verb> ::= walked | threw

<preposition> ::= with | to 

::= can be read “is defined to be” or “may be composed of”; also written →

Parse Tree

[Figure: parse tree for the sentence "the boy walked the dog with a leash." The root <sentence> splits into <noun phrase> (the boy) and <verb phrase>; the <verb phrase> consists of <verb> (walked), <noun phrase> (the dog), and <prep phrase> (with a leash).]

Parse Tree alone can not validate semantics

[Figure: parse tree of the same shape for the sentence "the dog threw the boy to the ball." The <verb phrase> consists of <verb> (threw), <noun phrase> (the boy), and <prep phrase> (to the ball); the sentence is syntactically valid.]

FORMAL SYNTAX: BNF (Backus-Naur Form)

FORMAL SEMANTICS

Static: attribute grammars

Dynamic: operational, axiomatic, denotational

Formal Syntax

• Formal Translation Models

• Grammar – a formal definition of syntax

• Types of Grammars (0..3)

• BNF

Formal Languages

• Language: set of strings containing symbols from alphabet

• What strings can you form over the alphabet {a, b}?

1. {abbb}

2. {baa, baaa, baaaa, baaaaa, . . . }

3. {ab, aabb, aaabbb, aaaabbbb, aaaaabbbbb, . . . }

4. {aba, abba, abbba, …}

Definition of a formal language

model generates & recognizes all (and only) strings of a formal language

A programming language grammar is used to parse a program producing a parse tree.

The parse tree contains every symbol in the input program as well as all sets of symbols used in the program's derivation

Important role in the design and implementation of programming languages

Formal Languages

Grammar

• Required to define a formal language

• Alphabet: finite set Σ of symbols

• String: finite sequence of symbols

Empty string: ε

Σ* : set of all strings over Σ (incl. ε)

Σ+ : set of all non-empty strings over Σ

• Language: set of strings L ⊆ Σ*

• Set of Rules to determine legal strings

Grammars

• G = (T, N, S, P)

• Finite set of terminal symbols T

• Finite set of non-terminal symbols N

• Starting non-terminal symbol S ∈ N

• Finite set of productions P

x → y (x ::= y)

x ∈ (N ∪ T)+, y ∈ (N ∪ T)*

• Applying a production: uxv ⇒ uyv

Grammars

G = (T, N, S, P)

terminal symbols: symbols (words) of an alphabet (word set) from which strings of the language can be created; {a, b, c, ...}

nonterminal symbols: symbols describing sets of strings (syntactic categories); {A, B, C, ...}

start symbol S: marks the starting point for string derivations; unique in the grammar

productions: rules describing how each nonterminal is defined in terms of terminal symbols and nonterminals; ordered pairs of strings (x, y) such that x → y (x ::= y)

Grammar

• String derivation: sequence of rule applications

• w1 ⇒ w2 ⇒ … ⇒ wn; denoted w1 ⇒* wn

• Language generated by a grammar

• L(G) = { w ∈ T* | S ⇒* w }

• Traditional classification

• Regular

• Context-free

• Context-sensitive

• Unrestricted

Chomsky Hierarchy

Type 0: Recursively Enumerable Languages (most unrestricted; recognized by Turing machines)

Type 1: Context-Sensitive Languages (recognized by linear-bounded automata)

Type 2: Context-Free Languages (most PLs; recognized by push-down automata)

Type 3: Regular Languages (most restricted; recognized by finite automata)

Regular Languages (Type 3)

• Most restricted

• LHS is a single non-terminal

• RHS has terminals and at most one nonterminal, which appears at one end

• All productions are A → wB and A → w, with A, B ∈ N and w ∈ T*

• Or all productions are A → Bw and A → w

Regular Languages Examples (Type 3)

• L = { a^n b | n > 0 } is a regular language

S → Ab and A → a | Aa

• What are the strings that can be generated using this grammar?

{ab, aab, aaab, …}
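The derivation relation can be sketched in a few lines of Python for this grammar. This is only an illustration: the names `PRODUCTIONS`, `step`, and `language` are made up for the sketch, single upper-case letters stand for nonterminals, and the length bound is safe only because no production here shrinks a string.

```python
from collections import deque

# Productions of the regular grammar S -> Ab, A -> a | Aa.
PRODUCTIONS = [("S", "Ab"), ("A", "a"), ("A", "Aa")]

def step(form):
    """All sentential forms reachable from `form` by applying one production once."""
    results = set()
    for lhs, rhs in PRODUCTIONS:
        i = form.find(lhs)
        while i != -1:
            results.add(form[:i] + rhs + form[i + len(lhs):])  # uxv => uyv
            i = form.find(lhs, i + 1)
    return results

def language(start="S", max_len=5):
    """Terminal strings of length <= max_len derivable from `start`."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if form.islower():              # no nonterminals left: a word of L(G)
            words.add(form)
            continue
        for nxt in step(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(words, key=len)
```

Calling `language()` enumerates the short members of L(G), which should match the set listed on the slide.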

Regular Languages Examples (Type 3)

• Binary numerals

B → 0

B → 1

B → 0 B

B → 1 B

Uses of Regular Grammar

• Lexical analysis in compilers

e.g. identifier = letter (letter|digit)*

Token sequence for syntactic analysis done by the parser

tokens = terminals for CFG

• Pattern matching

grep "a\+b" foo.txt

Prints every line of foo.txt that contains a string from the language L = { a^n b | n > 0 }, i.e. the language of the reg. expr. a+b
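Both uses can be tried with Python's standard `re` module. This is a minimal sketch: the helper names are hypothetical, and "letter" and "digit" are assumed to mean ASCII letters and digits.

```python
import re

# identifier = letter (letter | digit)*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# a+b: one or more a's followed by b, i.e. { a^n b | n > 0 }
A_PLUS_B = re.compile(r"a+b")

def find_identifiers(text):
    """Return every maximal identifier found in text, left to right."""
    return IDENTIFIER.findall(text)

def matching_lines(lines):
    """Mimic grep: keep the lines that contain a string of the language a+b."""
    return [line for line in lines if A_PLUS_B.search(line)]
```

Note that `search` looks for a match anywhere in the line, like grep, whereas `fullmatch` would test whether the whole line belongs to the language.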

Context-Free Languages (Type 2)

• LHS must be a single nonterminal

• All productions are xAy → xzy

or xAy → xZy

A, Z ∈ N and z ∈ T* and x, y = ε

• A can be rewritten by the strings z or Z on the right regardless of the context in which A finds itself

• A → z, A → Z, Z → z

Context-Free Languages (Type 2)

Example:

L1 = { a^n b^n | n > 0 } is c.f. but not regular

L2 = { a^x b^y | x > 0, y > 0 }

What are the strings that belong to the languages L1 and L2?

L1 = {ab, aabb, aaabbb, …}

L2 = {ab, aab, aaab, …, abb, abbb, abbbb, …, aabb, aabbb, …}

Context-Free Languages (Type 2)

S → AB

S → ASB

A → a

B → b

S ⇒ AB ⇒ aB ⇒ ab

S ⇒ ASB ⇒ aSB ⇒ aABB ⇒ aaBB ⇒ aabB ⇒ aabb

This grammar can be simplified by removing the nonterminals A and B, leaving just two rewrite rules:

S → ab

S → aSb
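The two simplified rules translate almost directly into a recursive recognizer. The function name below is hypothetical, and the sketch covers only this one grammar.

```python
def derives_S(s):
    """True if s can be derived from S using S -> ab | aSb."""
    if s == "ab":                                       # S -> ab
        return True
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":    # S -> aSb
        return derives_S(s[1:-1])
    return False
```

The recursion mirrors the nesting that a finite automaton cannot track: each `a` peeled off the front must be paired with a `b` at the back.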

Uses of Context-Free Languages

• Describe the essential features of all current PLs

• Syntax of a programming language

• e.g. Java

• Terminals: identifiers, keywords, literals, separators, operators

• Starting non-terminal: CompilationUnit

• Implementation of most parsers in a compiler to determine syntactic structure and produce CFG parse trees

• Backus-Naur Form (BNF) : alternative notation for context-free grammars; John Backus and Peter Naur, for ALGOL60

Limitations of Context-Free Languages

• Cannot represent semantics

• e.g. “every variable used in a statement should be declared in advance”

• e.g. “the use of a variable should conform to its type” (type checking)

• cannot say “string s1 divided by string s2”

• Solution: attribute grammars

For certain kinds of semantic analysis

Context-Sensitive Languages (Type 1)

• RHS contains no fewer symbols than LHS

• All productions are xAy → xzy

A ∈ N and x, y, z ∈ (N ∪ T)* and z ≠ ε

• A can be rewritten by z only when it is in the context of x and y

(when the string x precedes A and the string y follows it)

• Example Rule

ABC → AbbC

Context-Sensitive Languages (Type 1)

• Example language

L = { a^n b^n c^n | n >= 1 }

• More powerful than context-free grammars

• All context-free languages are also context-sensitive

• Not all context-sensitive languages are context-free

Recursively Enumerable Languages (Type 0)

• No restrictions – most general grammar (linguists find useless)

e.g. aYb → bY, with Y ∈ N and a, b ∈ T*

• Language accepted by a Turing machine, a general model of computation

(a finite-state machine in which each transition prints a symbol on a tape; the tape head can move in either direction; the tape is infinite to the right)

• models a human being solving a problem in an algorithmic way

Using Grammars

How do we represent

graphically

a sequence of productions

from a formal grammar?

Derivation Tree

[Figure: derivation tree for the sentence "the boy walked the dog with a leash." The root <sentence> splits into <noun phrase> (the boy) and <verb phrase>; the <verb phrase> consists of <verb> (walked), <noun phrase> (the dog), and <prep phrase> (with a leash).]

Derivation Tree

• Derivation tree = parse tree

• Leaf nodes: terminals

• Inner nodes: non-terminals

• Root: starting non-terminal of the grammar

• Describes a particular way to derive a string

• Leaf nodes from left to right are the string

• To get the string: depth-first traversal, following the leftmost unexplored branch

• Begins with the start symbol and replaces one nonterminal at a time by its corresponding right-hand side in some production for that nonterminal

Derivation Tree

• Types of Parsing:

Top-down parsing

Bottom-up parsing

Depth-first left-corner parsing

Derivation Tree

• Top-down parsing : starts from the start symbol (S) and works down to the leaves

+ Only builds trees that are rooted in S

− Wastes time building trees that don’t match the input

Derivation Tree

• Bottom-up parsing : starts from the leaves and works up to the start symbol (S)

+ Only builds trees that match the input

− Wastes time building trees that will never lead to S

Derivation Tree

• Depth-first left-corner parsing : combines the best of both types of parsing:

+ Only build trees that are rooted in S

+ Only build trees that match the input

Derivation Sequence

• Each tree represents a set of derivation sequences

• The tree “filters out” the choice of order of production application

• Filtering out the order

• Leftmost derivation: expand leftmost non-terminal

• Rightmost derivation: expand rightmost non-terminal

• A derivation may be neither leftmost nor rightmost
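To see how one tree corresponds to several derivation sequences, the sketch below replays a list of chosen right-hand sides against the grammar S → AB | ASB, A → a, B → b from the context-free example. The names `PRODUCTIONS` and `derive` are illustrative assumptions, and each symbol is a single character.

```python
# Grammar from the context-free example: S -> AB | ASB, A -> a, B -> b.
PRODUCTIONS = {"S": ["AB", "ASB"], "A": ["a"], "B": ["b"]}

def derive(start, choices, leftmost=True):
    """Apply the chosen right-hand sides in order, always expanding the
    leftmost (or rightmost) nonterminal; return every sentential form."""
    form, forms = list(start), [start]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs} is not a production")
        form[i:i + 1] = list(rhs)          # replace the nonterminal by its RHS
        forms.append("".join(form))
    return forms

leftmost_steps = derive("S", ["ASB", "a", "AB", "a", "b", "b"])
rightmost_steps = derive("S", ["ASB", "b", "AB", "b", "a", "a"], leftmost=False)
```

Both sequences end in `aabb` and use the same productions at the same tree nodes; only the order of application differs.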

Backus-Naur (Normal) Form (BNF)

• Notation for describing context-free languages

• Metalanguage used to describe most PL syntax

• John Backus and Peter Naur

• Algol-60

• Essential in compiler construction

Guides the parser

Should not be ambiguous

Although a parser may not produce a derivation tree, the structure of the tree is embodied in the parsing process

Backus-Naur (Normal) Form (BNF)

Special Symbols

<...> : nonterminals, e.g. <expression>

Terminals use no special symbols, e.g. if, while, (

::= : is defined as / composed of

| : alternatives, e.g. <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

BNF example

<stmt> ::= while <exp> do <stmt>

| if <exp> then <stmt>

| if <exp> then <stmt> else <stmt>

| <exp> := <exp>

| <id> ( <exps> )

<exps> ::= <exp> | <exps> , <exp>

* note: recursion with <stmt>

Limitations of BNF

• Describes only syntax, not semantics

• Does not specify implementation details

• Difficult to impose length limitations

(e.g. maximum length of variable names)

• Impossible to impose requirements such as a variable must be declared before it is used

• Can not indicate issues such as

blank lines in a program

one statement spans multiple lines

• BUT … nothing better yet

Extended BNF (EBNF)

[...] : options (0 or 1 occurrences)

<stmt> ::= if <cond> then <stmt> [ else <stmt> ]

{...} : repetition (0 or more occurrences)

<unsigned> ::= <digit> { <digit> }

BNF: <a> ::= <b> | <a> ; <b>

EBNF: <a> ::= <b> { ; <b> }

BNF: <a> ::= x <c> and <c> ::= <b> | <c> , <b>

EBNF: <a> ::= x <b> { , <b> }
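One practical benefit of the EBNF braces is that they map onto a loop in a hand-written parser, whereas the left-recursive BNF form does not translate directly. The sketch below is an assumption-laden illustration: the function names are invented, and <b> is taken to be <unsigned> so that the list rule has something concrete to repeat.

```python
DIGITS = "0123456789"

def parse_unsigned(text, pos=0):
    """<unsigned> ::= <digit> {<digit>}; return (value, next position)."""
    if pos >= len(text) or text[pos] not in DIGITS:
        raise SyntaxError(f"digit expected at position {pos}")
    end = pos + 1                                    # the mandatory first <digit>
    while end < len(text) and text[end] in DIGITS:   # {<digit>}: zero or more
        end += 1
    return int(text[pos:end]), end

def parse_list(text):
    """<a> ::= <b> {; <b>}, with <b> assumed to be <unsigned>."""
    value, pos = parse_unsigned(text, 0)
    items = [value]
    while pos < len(text) and text[pos] == ";":      # {; <b>}: zero or more
        value, pos = parse_unsigned(text, pos + 1)
        items.append(value)
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return items
```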

Ambiguous Grammar

• Generates two or more distinct parse trees for the same string

• Unambiguous grammars required so compiler can produce correct code (parse tree provides precedence and associativity of operators)

Ambiguous Grammar

[Figure: first parse tree for "the boy walked the dog with a leash." The <verb phrase> consists of <verb> (walked), <noun phrase> (the dog), and <prep phrase> (with a leash), so the prepositional phrase attaches to the verb phrase.]

Ambiguous Grammar

[Figure: second parse tree for the same sentence. The <verb phrase> consists of <verb> (walked) and a single <noun phrase> made of <det> (the), <noun> (dog), and <prep phrase> (with a leash), so the prepositional phrase attaches to the noun phrase.]
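The ambiguity can be checked mechanically by enumerating every parse tree for a sentence. The following is a minimal backtracking sketch over the English grammar given earlier; the names `GRAMMAR`, `parses`, and `parse_trees` are assumptions for illustration, and the final period is left out because the BNF rules do not mention it.

```python
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["determiner", "noun"],
                    ["determiner", "noun", "prepositional phrase"]],
    "verb phrase": [["verb"],
                    ["verb", "noun phrase"],
                    ["verb", "noun phrase", "prepositional phrase"]],
    "prepositional phrase": [["preposition", "noun phrase"]],
    "noun": [["boy"], ["dog"], ["leash"], ["ball"]],
    "determiner": [["a"], ["the"]],
    "verb": [["walked"], ["threw"]],
    "preposition": [["with"], ["to"]],
}

def parses(symbol, words, start):
    """Yield every (tree, end) such that symbol derives words[start:end]."""
    if symbol not in GRAMMAR:              # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for alternative in GRAMMAR[symbol]:
        # Expand the alternative left to right, keeping all partial matches.
        partial = [([], start)]
        for part in alternative:
            partial = [(kids + [tree], end)
                       for kids, pos in partial
                       for tree, end in parses(part, words, pos)]
        for kids, end in partial:
            yield (symbol, kids), end

def parse_trees(text):
    """All complete parse trees of a sentence given as space-separated words."""
    words = text.split()
    return [tree for tree, end in parses("sentence", words, 0)
            if end == len(words)]
```

A sentence with more than one tree in the result is ambiguous under this grammar; a word sequence with no tree is not a sentence of the language. The approach terminates here only because the grammar has no left recursion.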

Ambiguous Grammar

• One famous ambiguity is the “dangling else”

<stmt> ::= if <cond> then <stmt> [else <stmt>]

• Solve syntactically by adding nonterminals & productions

<stmt> ::= <matched> | <unmatched>

<matched> ::= if <cond> then <matched> else <matched>

<unmatched> ::= if <cond> then <stmt> | if <cond> then <matched> else <unmatched>

• Solve semantically by adding constraint “elses are associated with immediately preceding unmatched then”
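A recursive-descent parser that greedily consumes an `else` implements the "immediately preceding unmatched then" rule without changing the grammar. The sketch below is a simplification under stated assumptions: conditions and non-if statements are single tokens, the input is well-formed, and `parse_stmt` is a made-up name.

```python
def parse_stmt(tokens, pos=0):
    """<stmt> ::= if <cond> then <stmt> [else <stmt>] | <other>
    Returns (tree, next position)."""
    if pos < len(tokens) and tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("'then' expected")
        then_part, pos = parse_stmt(tokens, pos + 3)
        else_part = None
        # Greedy choice: an 'else' seen here belongs to this innermost 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return tokens[pos], pos + 1

tree, end = parse_stmt("if c1 then if c2 then s1 else s2".split())
```

In the resulting tree the `else s2` branch sits under the inner `if c2`, and the outer `if c1` has no else branch.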

Syntax Graphs

• Are equivalent to CFGs

• Terminals in circles

• Non-terminals in rectangles

• Lines and arrows indicate how constructs are built

Syntax Graphs

Part of the Context-Free Syntax for Mini-Language Core in BNF

<program> ::= program <declaration-sequence> begin <statement-sequence> end ;

<declaration-sequence> ::= <declaration> | <declaration> <declaration-sequence>

<declaration> ::= <identifier-list> : integer ;

<identifier-list> ::= <identifier> | <identifier> , <identifier-list>

Syntax Graphs

[Figure: syntax graph for the Mini-Language Core: program, declaration, begin, statement, end, ;]

[Figure: declaration syntax graph: identifier : integer ; with a loop through "," for additional identifiers]

Formal Semantics

• Static

Attribute Grammars

• Dynamic

Operational

Axiomatic

Denotational

Formal Semantics - Static

• Not all program properties can be checked by a context free parser

• Context free parsing can be extended with attributes

• Useful to specify things BNF can not

• Determined at compile time

• Attribute Grammars

Attribute Grammars

• Extension of CFG

• Provides context-sensitive information, such as declarations and type checking, to facilitate semantic checking

e.g. boolean state information to help control the parsing process itself; symbol table information

• Adds attributes (typed values) to some nonterminals: synthesized & inherited

• Each attribute has a domain of possible values

Attribute Grammars

• Functions added to productions to assign values to the attributes

• Attributes evaluated in assignments or conditions during parse tree walk

• Conditions to reject invalid parse trees

• Evaluation order depends on attribute dependencies

Attribute Grammars

• Algorithms exist to test for the circularity of attribute dependencies in an attribute grammar

• Incorporated into some parser generator tools

Yacc, for example, is not attribute based, but provides a mechanism for accessing the results of child nodes in the parse tree when performing a reduction

Synthesized vs Inherited Attributes

• Synthesized attribute

gets its values from the attributes attached to the children of its nonterminal

• Inherited attribute

gets its values from the attributes attached to the parent (or siblings) of its nonterminal

[Figure: derivation tree rooted at S with an interior nonterminal A and terminal string t, labeling the synthesized (SYN) and inherited (INH) attributes of A]

Evaluation Rules

• Synthesized attribute associated with N: each alternative in N’s production should contain a rule for evaluating the attribute

• Inherited attribute associated with N: for every occurrence of N on the right-hand side of any alternative, there must be a rule for evaluating the attribute

Attribute Grammar Example

• L = { a^n b^n c^n | n > 0 }; not context-free

• BNF

<start> ::= <A><B><C>

<A> ::= a | a<A>

<B> ::= b | b<B>

<C> ::= c | c<C>

• Attributes (Value domain = integers )

Na: associated with <A>

Nb: associated with <B>

Nc: associated with <C>

Evaluation

• Evaluation rules (similar for <B>, <C>)

<A> ::= a

Na(<A>) := 1

| a <A>2

Na(<A>) := 1 + Na(<A>2)

(<A>2 denotes the occurrence of <A> on the right-hand side)

• Conditions

<start> ::= <A><B><C>

Cond: Na(<A>) = Nb(<B>) = Nc(<C>)

Alternative notation: <A>.Na
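The evaluation rules and the condition can be mimicked in a short Python sketch. It is an approximation, not a general attribute evaluator: the context-free part is matched with a regular expression (sufficient for this particular BNF), and `count` and `accepts` are hypothetical names.

```python
import re

def count(block):
    """Synthesized attribute for <A> ::= a | a<A> (same shape for <B>, <C>):
    1 for a single symbol, otherwise 1 + the value of the nested nonterminal."""
    return 1 if len(block) == 1 else 1 + count(block[1:])

def accepts(s):
    """Parse with the context-free BNF, then check the condition on <start>."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)     # <start> ::= <A><B><C>
    if m is None:
        return False                         # no parse tree at all
    na, nb, nc = (count(group) for group in m.groups())
    return na == nb == nc                    # Cond: Na = Nb = Nc
```

A string such as `aabbc` has a valid parse tree under the BNF but fails the condition, which is exactly the check the context-free grammar alone cannot express.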

Parse Tree

[Figure: attributed parse tree for the input aabbcc]

<start>  (Cond: true)
    <A>  (Na: 2)
        a
        <A>  (Na: 1)
            a
    <B>  (Nb: 2)
        b
        <B>  (Nb: 1)
            b
    <C>  (Nc: 2)
        c
        <C>  (Nc: 1)
            c

Parse Tree for an Attribute Grammar

• Valid tree for the underlying BNF

• Each node has a set of (attribute,value) pairs

One pair for each attribute associated with the terminal or non-terminal in the node

• Some nodes have boolean conditions

• Valid parse tree

Attribute values conform to the evaluation rules

All boolean conditions are true

Example: Binary Numbers

• Context-free grammar

For simplicity, will use X instead of <X>

B ::= D

B ::= D B

D ::= 0

D ::= 1

Goal: compute the value of a binary number

BNF Parse Tree for Input 1010

B
    D → 1
    B
        D → 0
        B
            D → 1
            B
                D → 0

Add attributes

B: synthesized val

B: synthesized pos

D: synthesized val

D: inherited pow

Evaluated Parse Tree

B  (pos:4, val:10)
    D  (pow:3, val:8) → 1
    B  (pos:3, val:2)
        D  (pow:2, val:0) → 0
        B  (pos:2, val:2)
            D  (pow:1, val:2) → 1
            B  (pos:1, val:0)
                D  (pow:0, val:0) → 0
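The slides list the attributes and the evaluated tree but not the evaluation rules themselves. One set of rules consistent with the values shown is assumed here: for B ::= D, pos = 1, D.pow = 0, and val = D.val; for B ::= D B2, pos = B2.pos + 1, D.pow = B2.pos, and val = D.val + B2.val; for D, val = 0 for the digit 0 and 2^pow for the digit 1. A minimal Python sketch under that assumption (function names are invented):

```python
def eval_D(digit, pow_):
    """D ::= 0 | 1 with inherited attribute pow; returns synthesized val."""
    return 0 if digit == "0" else 2 ** pow_

def eval_B(bits):
    """B ::= D | D B; returns the synthesized attributes (pos, val)."""
    if len(bits) == 1:                      # B ::= D
        return 1, eval_D(bits, 0)           # D.pow := 0
    pos2, val2 = eval_B(bits[1:])           # B ::= D B2: evaluate B2 first
    d_val = eval_D(bits[0], pos2)           # D.pow := B2.pos (inherited)
    return pos2 + 1, d_val + val2
```

The order of evaluation follows the attribute dependencies: B2's synthesized pos must be known before it can be passed down to D as pow.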

No Class Next Week