
PL Features: Type Systems

CS F331 Programming Languages
CSCE A331 Programming Language Concepts
Lecture Slides
Monday, February 20, 2017

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks

© 2017 Glenn G. Chappell

Review
Overview of Lexing & Parsing

Two phases:
§ Lexical analysis (lexing)
§ Syntax analysis (parsing)

The output of a parser is often an abstract syntax tree (AST). Specifications of these can vary.

20 Feb 2017 CS F331 / CSCE A331 Spring 2017

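[Figure: Parsing. A character stream (cout << ff(12.6);) goes into the Lexer, which produces a lexeme stream whose lexemes are labeled with categories such as id, op, lit, and punct. The lexeme stream goes into the Parser, which produces an AST or an error. The AST for this code is an expr node binOp: << whose children are the expr id: cout and the expr funcCall; the funcCall node's children are the expr id: ff and the expr numLit: 12.6.]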
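To make the lexing phase concrete, here is a minimal C++ sketch of a lexer for the example line. The function name lex, the category names, and the rules for choosing a category are assumptions for illustration, not the course's lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A lexeme is (category, text). The category names "id", "op", "lit", and
// "punct", and the rules below for assigning them, are assumptions.
using Lexeme = std::pair<std::string, std::string>;

std::vector<Lexeme> lex(const std::string& src) {
    std::vector<Lexeme> out;
    std::size_t i = 0;
    auto isDigitAt = [&](std::size_t k) {
        return k < src.size() && std::isdigit(static_cast<unsigned char>(src[k]));
    };
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        const std::size_t start = i;
        if (std::isspace(c)) {                        // whitespace separates lexemes
            ++i;
        } else if (std::isalpha(c) || c == '_') {     // identifier
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back({"id", src.substr(start, i - start)});
        } else if (std::isdigit(c)) {                 // numeric literal, optional fraction
            while (isDigitAt(i)) ++i;
            if (i < src.size() && src[i] == '.') {
                ++i;
                while (isDigitAt(i)) ++i;
            }
            out.push_back({"lit", src.substr(start, i - start)});
        } else if (src.compare(i, 2, "<<") == 0) {    // two-character operator
            i += 2;
            out.push_back({"op", "<<"});
        } else {                                      // any other single character
            ++i;
            out.push_back({"punct", src.substr(start, 1)});
        }
    }
    return out;
}
```

With these assumed rules, lex("cout << ff(12.6);") yields seven lexemes: id cout, op <<, id ff, punct (, lit 12.6, punct ), punct ;.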

Review
Introduction to Syntax Analysis — Categories of Parsers

Parsing algorithms can be divided into two broad categories.

Top-Down Parsing Algorithms
§ Go through the derivation from top to bottom, expanding nonterminals.
§ Usually produce a leftmost derivation.
§ Important subclass: LL (read Left-to-right, Leftmost derivation).
§ Common reason a top-down parser may not be LL: doing lookahead.
§ Often hand-coded.
§ Example algorithm we will look at: Recursive Descent.

Bottom-Up Parsing Algorithms
§ Go through the derivation from bottom to top, collapsing substrings.
§ Usually produce a rightmost derivation (in reverse).
§ Important subclass: LR (read Left-to-right, Rightmost derivation).
§ Common reason a bottom-up parser may not be LR: doing lookahead.
§ Almost always automatically generated.
§ Example algorithm we will look at: Shift-Reduce.


Review
Introduction to Syntax Analysis — Categories of Grammars

Grammars that LL parsers can use are LL grammars. Grammars that LR parsers can use are LR grammars.

[Figure: nested sets. All Grammars contains CFGs, which contains LR Grammars, which contains LL Grammars.]

Review
Recursive-Descent Parsing [1/2]

Recursive Descent is a top-down, LL parsing algorithm.
§ There is one parsing function for each nonterminal.
§ A parsing function is responsible for parsing all strings that its nonterminal can be expanded into.

The natural grammar for a left-associative binary operator is not an LL grammar. But we can usually transform it appropriately.

Bad (has left recursion):
    e  →  t
       |  e ( “+” | “-” ) t

Okay:
    e  →  t { ( “+” | “-” ) t }
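The Okay grammar maps directly onto a parsing function whose loop handles the repetition in braces. This is a hypothetical sketch: it assumes a term t is a single-digit number (so each character is one lexeme) and returns the AST as a parenthesized prefix string; the class name Parser is illustrative.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Recursive-descent sketch for  e -> t { ( "+" | "-" ) t }
// Assumption: t is a single digit. The AST is returned as a string
// such as "(- (+ 1 2) 3)".
class Parser {
public:
    explicit Parser(std::string input) : in_(std::move(input)) {}

    std::string parseWhole() {
        std::string tree = parseE();
        if (pos_ != in_.size()) throw std::runtime_error("extra input");
        return tree;
    }

private:
    std::string in_;
    std::size_t pos_ = 0;

    // One parsing function per nonterminal.
    std::string parseE() {
        std::string tree = parseT();
        // The loop replaces left recursion. Wrapping the tree built so far
        // as the left operand makes + and - left-associative.
        while (pos_ < in_.size() && (in_[pos_] == '+' || in_[pos_] == '-')) {
            char op = in_[pos_++];
            std::string rhs = parseT();
            tree = std::string("(") + op + " " + tree + " " + rhs + ")";
        }
        return tree;
    }

    std::string parseT() {
        if (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            return std::string(1, in_[pos_++]);
        throw std::runtime_error("expected a digit");
    }
};
```

For example, Parser("1+2-3").parseWhole() returns "(- (+ 1 2) 3)": the grammar has no left recursion, yet the resulting tree groups the operators from the left.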


Review
Recursive-Descent Parsing [2/2]

A parser generally returns an abstract syntax tree (AST). We specify the format of an AST for each line in our grammar. It is helpful to include information in our AST telling what kind of thing each node represents.

Expression: a + 2

AST: a binOp: + node with two children, id: a and numLit: 2.

Lua representation: {{ BIN_OP, "+" },{ ID_VAL, "a" },{ NUMLIT_VAL, "2" }}
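The node-kind tags carry over to a statically typed language. This C++ sketch is an assumed analogue of the Lua format above, not part of the course code; NodeKind, AstNode, and toLuaStyle are hypothetical names.

```cpp
#include <string>
#include <vector>

// Every node carries a tag saying what kind of thing it represents.
enum class NodeKind { BIN_OP, ID_VAL, NUMLIT_VAL };

struct AstNode {
    NodeKind kind;
    std::string text;               // operator, identifier name, or literal text
    std::vector<AstNode> children;  // operands; empty for leaves
};

std::string kindName(NodeKind k) {
    switch (k) {
    case NodeKind::BIN_OP:     return "BIN_OP";
    case NodeKind::ID_VAL:     return "ID_VAL";
    case NodeKind::NUMLIT_VAL: return "NUMLIT_VAL";
    }
    return "?";
}

// Render a node in the same nested-braces style as the Lua representation.
std::string toLuaStyle(const AstNode& node) {
    std::string head = "{ " + kindName(node.kind) + ", \"" + node.text + "\" }";
    if (node.children.empty()) return head;  // leaf: just the tagged item
    std::string out = "{" + head;
    for (const AstNode& child : node.children) out += "," + toLuaStyle(child);
    return out + "}";
}
```

Building the tree for a + 2 as a BIN_OP node with ID_VAL and NUMLIT_VAL children and passing it to toLuaStyle reproduces the Lua representation string exactly.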


Review
Shift-Reduce Parsing [1/3]

Shift-Reduce is a bottom-up, LR parsing algorithm.
§ It is table-driven.
§ Shift-Reduce is no longer used. However, a number of similar, but more complicated, algorithms are heavily used.

A Shift-Reduce parser is a state machine with an associated stack.
§ A stack item holds a symbol (terminal or nonterminal) and a state.
§ The current state is the state in the top-of-stack item.
§ The table has two parts: the action table (rows are states, columns are terminals) and the goto table (rows are states, columns are nonterminals).

Operation
§ Begin by pushing an item holding the start state (and any symbol).
§ At each step, do a lookup in the action table, using the current state and the current input symbol. Do what the action table entry says.



Action Table Entries
§ S# [# is the number of a state]: Shift. Push an item holding the current symbol and the given state. Advance the input.
§ R# [# is the number of a production]: Reduce. Pop the RHS of the given production. Push an item holding the LHS and the state from the goto table (lookup: state before the push + LHS nonterminal).
§ ACCEPT: Terminate: syntactically correct.
§ ERROR (blank table cell): Terminate: syntax error.



Writing a Shift-Reduce Parsing Table
§ Begin by splitting up grammar productions as much as possible.
§ Productions and states should be numbered.
§ When do we add a new state? Two situations can be handled by the same state if they would react identically to all future input.
§ When working on a state, it can be helpful to keep in mind a short input string that would put us into this state, as well as what it means to be in this state.
§ Every time we write a reduce operation, the relevant entries need to be added to the goto table.

See the February 17 slides for the Shift-Reduce parsing table constructed in class.
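The operation described above amounts to a short driver loop around two table lookups. In this C++ sketch the grammar, state numbers, and table entries are hypothetical (a tiny grammar for sums such as n+n+n, not the table constructed in class); each character is one lexeme, and '$' stands for end of input.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar for this sketch:
//   (1) E -> E "+" n
//   (2) E -> n
// Terminals: 'n', '+', '$' (end of input). Nonterminal: 'E'.
enum class Kind { Shift, Reduce, Accept };
struct Action { Kind kind; int num; };           // num: state (Shift) or production (Reduce)
struct Production { char lhs; int rhsLength; };

bool shiftReduceParse(const std::string& input) {
    // Action table: (state, terminal) -> action. A missing entry is ERROR.
    const std::map<std::pair<int, char>, Action> action = {
        {{0, 'n'}, {Kind::Shift, 2}},
        {{1, '+'}, {Kind::Shift, 3}},  {{1, '$'}, {Kind::Accept, 0}},
        {{2, '+'}, {Kind::Reduce, 2}}, {{2, '$'}, {Kind::Reduce, 2}},
        {{3, 'n'}, {Kind::Shift, 4}},
        {{4, '+'}, {Kind::Reduce, 1}}, {{4, '$'}, {Kind::Reduce, 1}},
    };
    // Goto table: (state, nonterminal) -> state.
    const std::map<std::pair<int, char>, int> gotoTable = {{{0, 'E'}, 1}};
    const std::map<int, Production> productions = {{1, {'E', 3}}, {2, {'E', 1}}};

    // Each stack item holds a symbol and a state; start with the start state.
    std::vector<std::pair<char, int>> items = {{'$', 0}};
    std::size_t pos = 0;
    while (true) {
        const int state = items.back().second;
        const char sym = pos < input.size() ? input[pos] : '$';
        const auto it = action.find({state, sym});
        if (it == action.end()) return false;    // blank cell: syntax error
        const Action& act = it->second;
        switch (act.kind) {
        case Kind::Shift:                         // push symbol + given state
            items.push_back({sym, act.num});
            ++pos;
            break;
        case Kind::Reduce: {                      // pop RHS, push LHS + goto state
            const Production& p = productions.at(act.num);
            items.resize(items.size() - p.rhsLength);
            const int before = items.back().second;
            items.push_back({p.lhs, gotoTable.at({before, p.lhs})});
            break;
        }
        case Kind::Accept:
            return true;
        }
    }
}
```

Here shiftReduceParse("n+n+n") returns true, while "n++n" returns false: after shifting the first "+", the parser is in state 3, which has no entry for "+", so it hits a blank (ERROR) cell.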


Review
Parsing Wrap-Up — Lookahead & Parser Categories

An LL(k) parser (k is a number) is one that follows the basic ideas of LL parsers, but looks at the next k lexemes. So LL(1) is the same as LL. A parser that looks one lexeme farther is LL(2). An LL(2) grammar is one that can be used by such a parser.

An LL(2) language is a language that is generated by some LL(2) grammar. There are LL(2) languages that are not LL(1) languages. So adding lookahead to an LL parser can make it more powerful.

We can similarly talk about LR(k), but here lookahead does not allow us to parse more languages: every LR(k) language is also an LR(1) language.

A common parsing method is LALR, which works much like Shift-Reduce, but does a kind of lookahead during the generation of the parsing table. This allows for greater efficiency, while restricting the set of grammars that can be used.


Review
Parsing Wrap-Up — Efficiency of Parsing

Practical parsing algorithms:
§ Cannot handle all CFLs.
§ Run in linear time.

This includes Recursive Descent and Shift-Reduce.

There are parsing algorithms that can handle all CFLs.
§ The Cocke-Younger-Kasami Algorithm (CYK) requires $O(n^3)$ single-lexeme operations.
§ Valiant’s Algorithm is a variation that relies on matrix multiplication. With currently known matrix multiplication algorithms, the running time of Valiant’s Algorithm is $O(n^k)$, for some k in the range 2.372–3.0, depending on the matrix multiplication algorithm used. This k will never be less than 2.

These are mostly theoretical curiosities.
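For comparison with the linear-time parsers, here is a minimal C++ sketch of CYK. CYK expects a grammar in Chomsky normal form (every rule is A → B C or A → terminal); the sample grammar for a^n b^n (n ≥ 1) and all names here are hypothetical. The three nested loops over substring length, start position, and split point account for the $O(n^3)$ bound, times a factor for grammar size.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// CNF rules: A -> B C, or A -> 'x'.
struct BinaryRule { char lhs, left, right; };
struct TerminalRule { char lhs; char terminal; };

bool cykAccepts(const std::string& w,
                const std::vector<BinaryRule>& binary,
                const std::vector<TerminalRule>& terminal,
                char start) {
    const std::size_t n = w.size();
    if (n == 0) return false;  // this sketch has no rule deriving the empty string
    // table[i][len - 1] = nonterminals that derive w.substr(i, len)
    std::vector<std::vector<std::set<char>>> table(n, std::vector<std::set<char>>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (const auto& r : terminal)
            if (r.terminal == w[i]) table[i][0].insert(r.lhs);
    for (std::size_t len = 2; len <= n; ++len)               // substring length
        for (std::size_t i = 0; i + len <= n; ++i)           // start position
            for (std::size_t split = 1; split < len; ++split) // length of left part
                for (const auto& r : binary)
                    if (table[i][split - 1].count(r.left) &&
                        table[i + split][len - split - 1].count(r.right))
                        table[i][len - 1].insert(r.lhs);
    return table[0][n - 1].count(start) > 0;
}

// Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
//   S -> A B | A C,   C -> S B,   A -> 'a',   B -> 'b'
bool acceptsAnBn(const std::string& w) {
    const std::vector<BinaryRule> binary = {{'S', 'A', 'B'}, {'S', 'A', 'C'}, {'C', 'S', 'B'}};
    const std::vector<TerminalRule> terminal = {{'A', 'a'}, {'B', 'b'}};
    return cykAccepts(w, binary, terminal, 'S');
}
```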


PL Features: Type Systems
Basic Concepts [1/3]

These slides are an incomplete summary of the reading “A Primer on Type Systems”.

A type system is a way of classifying entities in a program by the kinds of values they represent, in order to prevent undesirable program states.

The classification assigned to an entity is its type.

int abc;

abc = 123 + 456;

cout << 4.2;


In C++, int is a type.

abc is a variable of type int.

123 and 456 are literals of type int.

123 + 456 is an expression of type int.

4.2 is a literal of type double.

cout is a variable of type std::ostream.
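C++ can confirm these classifications at compile time. This sketch uses decltype, which yields the declared type of a named variable or the type of an expression, together with static_assert; the global abc mirrors the slide's variable.

```cpp
#include <iostream>
#include <type_traits>

int abc;  // a global, so it is zero-initialized

// Each static_assert stops compilation if the stated type is wrong.
static_assert(std::is_same<decltype(abc), int>::value, "abc is a variable of type int");
static_assert(std::is_same<decltype(123), int>::value, "123 is a literal of type int");
static_assert(std::is_same<decltype(123 + 456), int>::value, "123 + 456 has type int");
static_assert(std::is_same<decltype(4.2), double>::value, "4.2 is a literal of type double");
static_assert(std::is_same<decltype(std::cout), std::ostream>::value,
              "cout is a variable of type std::ostream");
```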



The great majority of PLs include some kind of type system.

In the past, many PLs had a fixed set of types. Many modern PLs have an extensible type system: one that allows programmers to define new types.

class Zebra { // New type named "Zebra"…
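As a separate illustration of an extensible type system (the body of Zebra is elided on the slide, so this sketch uses a different, hypothetical type named Money that stores an amount in cents), a new type can come with its own operations and its own type-checking restrictions.

```cpp
#include <type_traits>

// A hypothetical programmer-defined type.
struct Money {
    long cents;
};

// New types can define their own operations.
Money operator+(Money a, Money b) { return Money{a.cents + b.cents}; }

// The type checker enforces Money's restrictions too: an int is not
// implicitly a Money, so `Money m = 5;` would be a type error.
static_assert(!std::is_convertible<int, Money>::value, "int does not convert to Money");
```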

Type checking means checking & enforcing the restrictions associated with a type system.

The various actions involved with a type system (determining types, type checking) are collectively known as typing.



Types are used in three ways. They are used to determine:
§ Which values an entity may take on.

int abc = vector<int>(); // Type error: RHS is not int

§ Which operations are legal.

cout << *abc; // Type error: cannot dereference int

§ Which of multiple possible operations to perform.

cout << 123 + 456;  // + does int addition
string ss1, ss2;
cout << ss1 + ss2;  // + does string concatenation
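The third use, choosing among possible operations, happens at compile time in C++, based on static types. In this sketch the function names addInts, addStrings, and kindOf and the string contents are hypothetical; kindOf shows a programmer-defined overload set where the argument's type selects the body.

```cpp
#include <string>

int addInts() { return 123 + 456; }    // + on int: arithmetic addition

std::string addStrings() {
    std::string ss1 = "abc", ss2 = "def";  // assumed contents
    return ss1 + ss2;                      // + on std::string: concatenation
}

// Overload resolution picks one of these from the argument's static type.
std::string kindOf(int)                { return "int"; }
std::string kindOf(double)             { return "double"; }
std::string kindOf(const std::string&) { return "string"; }
```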


PL Features: Type Systems
Classifying Type Systems — Overview

We classify type systems along three axes.
§ Overall type system: static or dynamic.
§ How are types determined: manifest or implicit.
§ How are types checked: nominal or structural.

Next we look at each of these three.


PL Features: Type Systems
Classifying Type Systems — Static vs. Dynamic

Static type system: types are determined and checked before program execution, typically during compilation. A type error prevents program execution.
§ Examples: C, C++, Java, Haskell, Objective-C, Go, Rust, Swift.

Dynamic type system: types are determined and checked during program execution. Types are tracked by attaching to each value a tag indicating its type. A type error is flagged only when the code containing the error executes.
§ Examples: Python, Lua, JavaScript, Ruby, Scheme, PHP.

Static & dynamic typing are very different things. Some prefer to reserve the word “type” for the static version, referring to the dynamic version as a “tag”. We will use “type” for both flavors.
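Tagged values can be imitated inside C++ to see what dynamic type checking looks like. In this sketch (names hypothetical; requires C++17), a std::variant plays the role of a value carrying a runtime tag, and dynAdd inspects the tags only when it runs, so an ill-typed call in a branch that never executes is never flagged.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// Each DynValue carries a runtime tag: which alternative it currently holds.
using DynValue = std::variant<int, std::string>;

// "+" checked at run time, by looking at the tags.
DynValue dynAdd(const DynValue& a, const DynValue& b) {
    if (std::holds_alternative<int>(a) && std::holds_alternative<int>(b))
        return std::get<int>(a) + std::get<int>(b);
    if (std::holds_alternative<std::string>(a) && std::holds_alternative<std::string>(b))
        return std::get<std::string>(a) + std::get<std::string>(b);
    throw std::runtime_error("type error: + on mismatched tags");
}

// The bad addition sits in a branch; the error appears only if it runs.
DynValue maybeBad(bool runBadBranch) {
    if (runBadBranch) return dynAdd(DynValue(1), DynValue(std::string("x")));
    return dynAdd(DynValue(1), DynValue(2));
}

bool badBranchThrows() {
    try { maybeBad(true); } catch (const std::runtime_error&) { return true; }
    return false;
}
```

maybeBad(false) returns a value tagged int holding 3; maybeBad(true) throws, because the mismatched tags are noticed only when that addition executes.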


PL Features: Type Systems
Classifying Type Systems — Manifest vs. Implicit [1/3]

Manifest typing: set the type of an entity by explicitly stating it. Such a mention of a type is a type annotation.

double sq47(double n)  // C++
{
    double result = 4.7 * n * n;
    return result;
}

Implicit typing: no annotation.

function sq47(n)  -- Lua
    local result = 4.7 * n * n
    return result
end

[Callouts on the slide: “Type annotations” marks the C++ version; “No type annotations, except for this” marks the Lua version.]


In dynamically typed PLs, typing is usually mostly implicit. So it is tempting to conflate manifest typing with static typing. However, the two are not the same.

For example, Haskell has a static type system, but generally does not require type annotations.

sq47 n = result where  -- Haskell
    result = 4.7 * n * n

A Haskell compiler performs type inference. Haskell types are inferred.


[Callout on the Haskell code: No type annotations at all]
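C++ also permits implicit typing within a static type system. A sketch, assuming C++14 or later: auto lets the compiler infer the types of result and of the return value, much as a Haskell compiler infers types, though the parameter is still annotated unless a generic lambda is used. The name sq47Generic is hypothetical.

```cpp
#include <type_traits>

// No annotations on result or the return type; both are inferred as double.
auto sq47(double n) {
    auto result = 4.7 * n * n;
    return result;
}

// A generic lambda drops the parameter annotation too; the parameter type is
// inferred separately for each call, but everything is still checked at compile time.
auto sq47Generic = [](auto n) { return 4.7 * n * n; };

static_assert(std::is_same<decltype(sq47(1.0)), double>::value, "inferred return type is double");
```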


The following table shows how the type systems of various PLs can be classified along our first two axes.


                 Mostly Manifest       Mostly Implicit
    Static       C, C++, Java          Haskell, OCaml
    Dynamic      Not much goes here    Python, Lua, Ruby, JavaScript, Scheme

(Rows: overall type system. Columns: type determination.)

PL Features: Type Systems
Classifying Type Systems — Nominal vs. Structural [1/3]

struct A {  // C++
    int h;
    int m;
};

struct B {
    int h;
    int m;
};

void gg(A x);

B bval;
gg(bval);  // Legal?

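Consider the C++ code above. Can we pass a value of type B to function gg?

No. The call to gg is illegal, because C++ checks ordinary function parameters using nominal typing, which compares types by name: an argument is accepted only if it has the parameter's declared type or a type convertible to it. A and B have identical members, but they are distinct, unrelated types.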
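The nominal check can be observed directly with type traits. In this sketch the body of gg and the helper toA are hypothetical additions, since the slide only declares gg.

```cpp
#include <type_traits>

struct A { int h; int m; };
struct B { int h; int m; };

// Same members, different names: under nominal typing these are unrelated types.
static_assert(!std::is_same<A, B>::value, "A and B are distinct types");
static_assert(!std::is_convertible<B, A>::value, "a B is not accepted where an A is expected");

// Hypothetical body: treat h and m as hours and minutes.
int gg(A x) { return x.h * 60 + x.m; }

// One way to make the call legal: convert explicitly (hypothetical helper).
A toA(const B& b) { return A{b.h, b.m}; }
```

With these definitions, gg(toA(bval)) compiles, while gg(bval) is still rejected.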



Another standard: structural typing. Types are interchangeable if they have the same structure and support the same operations.

A loose version of structural typing: duck typing. Allow an argument to be passed to a function as long as every operation that the function actually uses is defined for the argument.

C++ checks template parameter types using duck typing.

template <typename T>
void ggt(T x)  // Takes parameter of type A or B
{
    cout << x.h << " " << x.m << endl;
}

Lua uses duck typing for all function parameters.
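A testable variant of ggt: this sketch returns a value instead of printing, and accepts any type whose objects have members h and m supporting * and +. The name minutesOf and the struct Clock are hypothetical.

```cpp
#include <string>

struct A { int h; int m; };
struct B { int h; int m; };
struct Clock { int h; int m; std::string label; };  // extra member does not matter

// Duck typing via a template: T is acceptable as long as every operation
// used here (x.h, x.m, *, +) is defined. This is checked at instantiation.
template <typename T>
int minutesOf(const T& x) {
    return x.h * 60 + x.m;
}
```

minutesOf works for A, B, and Clock alike; minutesOf(42) would not compile, because int has no member h, and the error surfaces only when that instantiation is requested.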


An alternative to the nominal and structural standards for type checking is to avoid type checking altogether.

As defined in the 1994 ANSI standard, the Forth PL distinguishes between integer and floating-point values, so it arguably has a notion of type.

However, the two types are dealt with using different syntax. There is no need to check whether an integer parameter is actually an integer; Forth provides no facilities for passing a floating-point value in its place.

Thus, while Forth has types, it has no type checking.

This idea was once common in PL design. However, it does not work well with extensible type systems. Virtually all modern PLs include some form of type checking.


PL Features: Type Systems
Type Safety — Introduction

A PL or PL construct is type-safe if it forbids operations that are incorrect for the types on which they operate.

Some PLs/constructs discourage incorrect operations without forbidding them. We may compare their level of type safety.

The C/C++ printf function is not type-safe. The following assumes age has type int, but does not check. If age has another type, the behavior is undefined, which may show up as odd output.

printf("I am %d years old.", age);

C++ stream I/O is type-safe. Below, age is output correctly, based on its type. This will not compile if that type cannot be output.

cout << "I am " << age << " years old.";
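Stream-style output can also be packaged as a single call without giving up type safety. A sketch assuming C++17 fold expressions; formatAll is a hypothetical name. Each argument goes through operator<< for its own static type, so there is no format string that can disagree with the arguments.

```cpp
#include <sstream>
#include <string>

// Type-safe alternative to a printf-style call: formatting is chosen per
// argument by overload resolution on operator<<.
template <typename... Args>
std::string formatAll(const Args&... args) {
    std::ostringstream out;
    (out << ... << args);  // left fold: ((out << a1) << a2) << ...
    return out.str();
}
```

formatAll("I am ", 42, " years old.") produces "I am 42 years old.", and passing a double instead of an int simply prints the double; an argument type with no operator<< is rejected at compile time.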


PL Features: Type Systems
Type Safety — Strong [ick!] & Weak [ick!]

Two unfortunate terms are often used in discussions of type safety: strong typing (or strongly typed) and weak typing (or weakly typed). These generally have something to do with the overall level of type safety offered by a PL.

But these terms have no standard definitions. They are used in different ways by different people. (I have seen at least three definitions of “strongly typed” in common use. C is strongly typed by one of them and weakly typed by the other two.)

Therefore: avoid using the terms “strong” and “weak” typing, or “strongly” and “weakly” typed.

You may use the terms in an informal, comparative sense: this type system is stronger than that type system.


PL Features: Type Systems
Type Safety — Soundness

A static type system is sound if it guarantees that operations that are incorrect for a type will not be performed; otherwise it is unsound.

Haskell has a sound type system. C & C++ have unsound type systems.

This is not a criticism!

In the world of dynamic typing, there does not seem to be any standard terminology corresponding to soundness. However, we can still talk about whether a dynamic type system strictly enforces type safety.

