
© Kenneth C. Louden, 2003

Chapter 6 - Data Types

Programming Languages:

Principles and Practice, 2nd Ed.

Kenneth C. Louden


Introduction

The data type of an identifier is perhaps its most important attribute.

When computed statically, it can improve the
– efficiency
– security
– correctness
– readability
of a program.

When computed dynamically and attached to values rather than identifiers (as in Scheme), the data type can no longer provide most of these, but it can still provide:
– security


What is a data type?

Some name like int, double, etc. that you have to write before a variable name. Just joking.

But what do those names (int, double, etc.) really mean? They indicate which values can be stored in the location referred to by the name.

Thus, the name int in Java stands for the set of integers, or more precisely for the finite subset of integers:
    { x | x is an integer and -2147483648 ≤ x < 2147483648 }

So we could say: a data type is a set of values. Unfortunately, this ignores some basic properties.
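The finiteness of int can be observed directly. The following is a minimal Java sketch; the class and method names (IntRangeDemo, successor) are made up for illustration.

```java
// Illustrative sketch: Java's int is the finite set { x | -2^31 <= x < 2^31 }.
class IntRangeDemo {
    // Adding 1 to the largest int wraps around instead of leaving the set.
    static int successor(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);            // -2147483648
        System.out.println(Integer.MAX_VALUE);            // 2147483647
        System.out.println(successor(Integer.MAX_VALUE)); // -2147483648 (wraps around)
    }
}
```

The wrap-around is one of the "basic properties" that a pure set-of-values view leaves out: it belongs to the + operation, not to the set.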


What is a data type (2)

Typically, we are interested not only in the actual values themselves, but what we can do with them, i.e. what operations we can apply to them.

For example, +, -, *, /, % are operations on ints.

The data type should provide exact information on what these operations are and how they act.

Thus, a second, more complete definition is: a data type is a set of values, together with a set of operations on those values having certain properties.

Data types are rarely specified with this kind of completeness; usually some assumptions are made about what operations are available and/or what they do.


How to define a data type?

List all of its values:
    enum RGBColor {Red, Green, Blue}; // C++
    datatype RGBColor = Red|Green|Blue; (* ML *)
(what are the operations here?)

Imitate mathematics (built-in types): double, int.

Apply a type constructor to already existing types:
    type IntReal = int * real; (* ML built-in Cartesian product type constructor *)
    class IntReal {int x; double y;} // Java

Type constructors can also imitate mathematical set operations: Cartesian product, union, sequence, function:
    union IntOrReal { int x; double y;}; // C++


Defining a data type (2)

Important to remember that type constructors and built-in types only imitate mathematics: there are always substantial differences (e.g. finiteness, and the kinds of operations that are available).

Also important to separate type constructors from value constructors: type constructors are functions from types to types; value constructors are functions from values to values. In Java, constructors are value constructors, while the class definition mechanism itself is a type constructor.
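The distinction can be made concrete with a generic class, which Java added after these slides were written (Java 5 and later). This is only a sketch; Pair and PairDemo are hypothetical names, not library types.

```java
// Illustrative sketch; Pair is a hypothetical class, not a library type.
// The generic class declaration acts as a type constructor:
// given types A and B it yields the type Pair<A, B>.
final class Pair<A, B> {
    final A first;
    final B second;

    // The constructor is a value constructor: given values it yields a value.
    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

class PairDemo {
    public static void main(String[] args) {
        // Pair<Integer, Double> is a type built from the types Integer and Double;
        // new Pair<>(1, 2.5) is a value built from the values 1 and 2.5.
        Pair<Integer, Double> p = new Pair<>(1, 2.5);
        System.out.println(p.first + ", " + p.second); // 1, 2.5
    }
}
```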

Data types can be studied as math objects—they are algebras. And algebraic ideas can be translated into language syntax as abstract data types (Chapter 9), sometimes called algebraic types.


Defining a data type (3)

Many of the mechanisms for defining types fail to specify the precise operations available. Note that the Java class construct does (most of the time) specify the operations.

Even when the operations are specified, the properties of these operations often are not. Sometimes the properties are put into comments (preconditions and postconditions).

Types may or may not get explicit names:
– In Java, inner classes may be anonymous. Also, there is no way to give a specific array type a name except by wrapping it in a class definition.
– In ML, the built-in type constructors can be used without explicit naming.


Using data types in translation

Each type carries information about the size and structure of its data; this can be used to make code efficient.

Types can be used to check whether the operations of the program make sense: type checking.

During type checking, two basic issues arise:
– How to compare two types: type equivalence.
– How to construct a type that is not given explicitly: type inference.

Many different algorithms are available for both of these: collectively referred to as a type system.

A language is strongly typed if its type system guarantees statically (as far as possible) that no data-corrupting errors can occur during execution.


Using data types (2)

Additional property of a strongly typed language: all errors that cannot be checked statically (such as subscript out of bounds) generate runtime errors.

Java and ML are strongly typed; Scheme and C are not.

Unsafe programs: those with data errors.

Legal programs: those accepted by the type system.

In a strongly typed language all legal programs are safe (could take this as a definition).

Unfortunately, there may be many safe illegal programs.

A type system tries to maximize both flexibility and security, where flexibility means: reduce the number of safe illegal programs and reduce the amount of type information the programmer must supply.
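The subscript case can be sketched in Java: the index is not checked statically, but an out-of-range access is turned into a runtime error rather than silent data corruption. BoundsDemo and its method are illustrative names only.

```java
// Illustrative sketch: an error that cannot be checked statically
// becomes a runtime error in a strongly typed language.
class BoundsDemo {
    // Returns true if reading a[i] is rejected at run time.
    static boolean outOfBoundsIsCaught(int[] a, int i) {
        try {
            int unused = a[i]; // the index is checked on every access
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[3];
        System.out.println(outOfBoundsIsCaught(a, 2)); // false: index 2 is valid
        System.out.println(outOfBoundsIsCaught(a, 3)); // true: index 3 is rejected
    }
}
```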


Overview of Java types

Can be confusing because Java has two type systems, one static and one dynamic:
– The static system is based on the declared type.
– The dynamic system is based on actual class membership of objects during execution.

Built-in types in Java: boolean, byte, char, short, int, long, float, double.

Type constructors:
– Class
– Interface
– Array
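The two systems can be seen side by side in a short sketch. Overload resolution uses the declared (static) type, while getClass and instanceof report the actual (dynamic) class. The names StaticDynamicDemo and describe are invented for this example.

```java
// Illustrative sketch of Java's static and dynamic type systems.
class StaticDynamicDemo {
    static String describe(Object o) { return "Object"; }
    static String describe(String s) { return "String"; }

    public static void main(String[] args) {
        Object o = "hello"; // declared type Object, actual class String

        // The overload is chosen from the declared type, at compile time.
        System.out.println(describe(o));                  // Object

        // The actual class is known only at run time.
        System.out.println(o.getClass().getSimpleName()); // String
        System.out.println(o instanceof String);          // true
    }
}
```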


Simple types

No internal structure; sometimes called scalar types.

Most predefined types are simple, but not all (java.lang types are classes, so not simple).

There can also be user-defined simple types (e.g. enum, subrange, but none in Java).

Many simple types are ordinal types (having an order with a first and last element): int, char are ordinal in Java, but boolean is not.

Values of simple types are usually conditioned by hardware (Java tries for independence).

Operations are implicit and may or may not conform to mathematical rules.
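Two familiar departures from mathematical rules, sketched in Java: integer division truncates toward zero, and floating-point addition is not associative. ArithmeticDemo and its method names are illustrative.

```java
// Illustrative sketch: built-in operations only imitate mathematics.
class ArithmeticDemo {
    // Integer division discards the fractional part (truncation toward zero).
    static int half(int x) {
        return x / 2;
    }

    // In real arithmetic this would always be true.
    static boolean additionIsAssociative(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        System.out.println(half(7));   // 3
        System.out.println(half(-7));  // -3
        System.out.println(-7 % 2);    // -1
        System.out.println(additionIsAssociative(0.1, 0.2, 0.3)); // false
    }
}
```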


Type constructors as set operations

Cartesian products

Unions

Subsets

Arrays/functions

Sequences/lists

Recursive types

No intersection! (In general, types should not overlap)


Cartesian Products

Finite combinations of previously defined types.

In mathematics, the components are selected by position.

In most languages, the components are selected by name.

ML has a very pure form of Cartesian product:
– ("a",2): string * int
– #1 ("a",2) returns "a".

Java classes are related but certainly not identical to Cartesian products: components are selected by name, and there are methods.

C structs are closer than Java classes.


Unions

Values belong to one of a finite set of types.

In mathematics, sets can overlap, so a value could be in more than one set.

As noted previously, we don't really want values to be in different sets (non-empty intersection).

Thus, unions are usually disjoint: a value can only be in one set at a time. True for C/C++; the following code prints garbage:
    union IntOrReal { int x; double y;} u;
    u.x = 1;
    cout << u.y << endl;

Note that in this code we force the compiler to think that a value is in the wrong set. This is dangerous and part of why C++ is not strongly typed.


Unions (2)

Unions can also be discriminated, when the values are tagged with their set membership. If enforced, this makes unions type safe and would outlaw the previous C++ code. Thus, C/C++ unions are not discriminated.

ML has discriminated unions:
    datatype IntOrReal = IsInt of int | IsReal of real;

The constructor names IsInt, IsReal are the tags/discriminants:
    > IsInt 2;
    val it = IsInt 2 : IntOrReal

In C++ we can apply a tag manually:
    struct IntOrReal
    { bool isInt;
      union {int x; double y;};
    };


Unions (3)

Java doesn't have explicit unions. But does it have unions at all?

Yes! Consider:
    public abstract class A {…};
    public class B extends A {…};
    public class C extends A {…};

Now A represents the union of B and C! Are these disjoint? Yes! Are they discriminated? Yes again! (Consider the instanceof operator.)
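As a sketch of this idea, the ML IntOrReal datatype can be imitated with an abstract class and two subclasses, using instanceof as the discriminant. The class names below mirror the ML constructors but are otherwise hypothetical.

```java
// Illustrative sketch: a discriminated union built from inheritance.
abstract class IntOrReal { }

final class IsInt extends IntOrReal {
    final int x;
    IsInt(int x) { this.x = x; }
}

final class IsReal extends IntOrReal {
    final double y;
    IsReal(double y) { this.y = y; }
}

class UnionDemo {
    // instanceof plays the role of the tag test; the cast is checked at run time,
    // so a value can never be read as a member of the wrong set.
    static double toDouble(IntOrReal v) {
        if (v instanceof IsInt) return ((IsInt) v).x;
        if (v instanceof IsReal) return ((IsReal) v).y;
        throw new IllegalArgumentException("unknown variant");
    }

    public static void main(String[] args) {
        System.out.println(toDouble(new IsInt(2)));    // 2.0
        System.out.println(toDouble(new IsReal(1.5))); // 1.5
    }
}
```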


Subsets

A subset type may be an explicit subrange indication for values, either as a runtime check or as a separate type, as in Ada:
    -- runtime check that an int is between 0 and 9:
    subtype Digit1 is integer range 0..9;
    -- new type holding an integer between 0 and 9:
    type Digit2 is range 0..9;

It could also be a subtype: a type that implements all the operations of another type. Note that subsets are not always subtypes and subtypes are not always mathematical subsets. (Digit1 above is not really even a static type, despite the keyword.)

C enums are subranges of int (no runtime check).

Public inheritance is a form of subtyping.
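Java has no subrange types, but the runtime-check flavour can be approximated by a wrapper class whose constructor enforces the range. This is a sketch under that assumption; Digit is a hypothetical class.

```java
// Illustrative sketch: a subrange 0..9 enforced by a runtime check.
final class Digit {
    final int value;

    Digit(int value) {
        if (value < 0 || value > 9) {
            throw new IllegalArgumentException("not in 0..9: " + value);
        }
        this.value = value;
    }
}

class DigitDemo {
    public static void main(String[] args) {
        System.out.println(new Digit(7).value); // 7
        try {
            new Digit(10);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at run time");
        }
    }
}
```

Unlike Ada's Digit1, this Digit is a separate type from int: a Digit cannot be used where an int is expected without reading its value field.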


Arrays and functions

An array in C or Java is like a function from a finite subrange 0...n-1 of the integers.

In Java you can't give an explicit name to an array type, but you can in C/C++:
    typedef int IntArray[10];

More general function types are available in many languages (but not in Java):
    type IntFunc = int -> int; (* ML *)
    val inc:IntFunc = fn x => x+1; (* ML *)

    // a function constant in C/C++:
    int incfn(int x) { return x+1; }

    typedef int (*IntFunc)(int); // why *?

    // a function var, initialized to incfn:
    IntFunc inc = incfn;
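Java releases later than these slides (Java 8 and after) approximate function types with functional interfaces. The sketch below uses the standard java.util.function.IntUnaryOperator interface to mirror the C/C++ example; the class name FunctionValueDemo is invented.

```java
import java.util.function.IntUnaryOperator;

// Illustrative sketch (requires Java 8 or later).
class FunctionValueDemo {
    // A "function constant": an ordinary static method.
    static int incfn(int x) {
        return x + 1;
    }

    // A "function variable", initialized to incfn through a method reference.
    // IntUnaryOperator is an interface type standing in for int -> int.
    static IntUnaryOperator inc = FunctionValueDemo::incfn;

    public static void main(String[] args) {
        System.out.println(inc.applyAsInt(41)); // 42
    }
}
```

The type of inc is still a named interface rather than a structural function type such as ML's int -> int.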


Vectors, lists and sequences

Some languages also have vectors, which are like arrays, but often with more flexibility, especially dynamic resizability.

Lists are similar to vectors, except they can only be accessed by counting down from the first element. Thus, the list type is really a recursive type:
    datatype 'a List = EmptyList | Cons of 'a * 'a List; (* ML *)

All functional languages that I am aware of have built-in lists.

Sequences are also like arrays, except that they are typically (potentially) infinite. In this guise they are called streams. Many functional languages have built-in streams (Scheme but not ML).


Recursive types

A recursive type is a set that contains itself as an element, right?

Wrong! Sets cannot in general contain themselves as elements (Russell's paradox: the set of all sets that do not contain themselves as elements).

A recursive type would be better named a recursively-defined type. Indeed, in math there are many sets that are recursively defined: the set of arithmetic expressions (recursive grammar), the set of integers, and indeed any set that is defined inductively (e.g., if x is an integer, then so is x+1).


Recursive types (2)

The problem with a recursive type is that it is generally infinite, and values in the type can be arbitrarily large.

As with virtually all situations with elements of unpredictable size (lists, arrays, calls), languages use indirection, or pointers, to deal with them.

Many languages require the indirection to be explicit in a recursive type definition:
    struct IntList { int head; IntList* tail; }; // C++ (tail is a pointer)

Java of course has implicit object indirection:
    class IntList { int head; IntList tail; } // Java


Recursive types (3)

Every recursively defined type must (like induction) have a base case and at least one recursive (or inductive) case. In many languages, particularly those with explicit indirection, the base case is implicitly the null pointer:
    IntList* list = 0; // C/C++
    IntList list = null; // Java

In other languages, the base case must be explicitly represented in the recursive definition:
    datatype IntList = Null | Struct of int * IntList; (* ML *)

The ML definition is much closer to the math definition of IntList as a set:
    IntList = {Null} ∪ (Int × IntList)
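A sketch of how the implicit null base case shapes code that processes a recursive type in Java: every recursive function needs one branch for null and one for the recursive case. The constructor and the IntListDemo helper are additions made for this example.

```java
// Illustrative sketch: a recursive type whose base case is the null reference.
class IntList {
    final int head;
    final IntList tail; // implicit indirection: a reference, possibly null

    IntList(int head, IntList tail) {
        this.head = head;
        this.tail = tail;
    }
}

class IntListDemo {
    // The structure of the function follows the structure of the type:
    // base case (null) and inductive case (head plus the rest).
    static int sum(IntList list) {
        return list == null ? 0 : list.head + sum(list.tail);
    }

    public static void main(String[] args) {
        IntList list = new IntList(1, new IntList(2, new IntList(3, null)));
        System.out.println(sum(list)); // 6
        System.out.println(sum(null)); // 0
    }
}
```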


Mathematics of Recursive Sets

The actual values in a recursively defined set must be computed from a recursive equation such as IntList = {Null} ∪ (Int × IntList).

This equation says that IntList is a fixed point (or fixpoint) of the function f(X) = {Null} ∪ (Int × X).

A least fixpoint solution is found as the union of partial solutions: IntList = {Null} ∪ (Int × {Null}) ∪ (Int × Int × {Null}) ∪ …

Least fixpoint solutions occur also for recursive functions (we did not study this).
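One way to write out the partial solutions is as an iteration of f starting from the empty set. The names X_n are introduced here only for illustration; f is the function with f(X) equal to {Null} united with Int times X.

```latex
\begin{aligned}
X_0 &= \emptyset \\
X_{n+1} &= f(X_n) = \{\mathit{Null}\} \cup (\mathit{Int} \times X_n) \\
X_1 &= \{\mathit{Null}\} \\
X_2 &= \{\mathit{Null}\} \cup (\mathit{Int} \times \{\mathit{Null}\}) \\
X_3 &= \{\mathit{Null}\} \cup (\mathit{Int} \times \{\mathit{Null}\}) \cup (\mathit{Int} \times \mathit{Int} \times \{\mathit{Null}\}) \\
\mathit{IntList} &= \bigcup_{n \ge 0} X_n
\end{aligned}
```

Each X_n contains exactly the lists with fewer than n elements, so the union contains every finite list and nothing else.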


Type structure of Java

Java Types
    Primitive
        boolean
        Numeric
            Integral: char, byte, short, int, long
            Floating point: float, double
    Reference
        Array
        class
        interface


Type structure of C

C Types
    Basic
        void
        Numeric
            Integral: (signed) (unsigned) char, short int, int, long int; enum
            Floating: float, double, long double
    Derived
        Pointer
        Array
        Function
        struct
        union


Type equivalence

Languages differ substantially over when their type checking algorithms consider two types to be the same.

Historically, languages like Fortran and Algol used structural equivalence: two types are the same if they have the same structure. Thus, using Java syntax, if we define
    class A { int x; double y;}
and
    class B { int x; double y;}
then the sets that A and B represent are the same: A a = new B() is ok.

Obviously, this is not Java's rule, nor is it C's.

Structural equivalence is also difficult to verify for recursive types.


Type equivalence (2)

Structural equivalence is reasonable for some built-in type constructors, especially non-recursive ones. For example, C uses structural equivalence for pointers, arrays, and functions. Even Java uses structural equivalence for arrays.

ML also uses structural equivalence for types defined in a type declaration:
    type dollars = real;
    type cents = int;
    fun pennies (d:dollars):cents = round (d * 100.0);
    pennies (2.0:real); (* ok *)

Structural equivalence leaves unspecified whether the order in a structure matters, or the field names, or both (the usual choice).


Type equivalence (3)

A strict alternative to structural equivalence is name equivalence: two types are the same if and only if they have the same name.

Easy to implement, but depends on ability to name. Without names, structural equivalence must be used.

Java uses name equivalence for classes and interfaces, structural equivalence for arrays.

ML uses name equivalence for types declared in a datatype declaration (which may be recursive):
    datatype Dollars = Dollars of real;
    (* now Dollars 2.0 is not the same as 2.0 *)

If naming can be mixed with construction, intermediate algorithms can be used.
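Java's two rules can be observed at run time through reflection. In this sketch, A and B are structurally identical classes that are nevertheless unrelated types, while two int arrays of different lengths share a single type. EquivalenceDemo and its method names are illustrative.

```java
// Illustrative sketch: name equivalence for classes, structural equivalence for arrays.
class A { int x; double y; }
class B { int x; double y; }

class EquivalenceDemo {
    // Classes: same structure but different names, so neither type accepts the other.
    static boolean classesInterchangeable() {
        return A.class.isAssignableFrom(B.class) || B.class.isAssignableFrom(A.class);
    }

    // Arrays: there is no name to compare; element type alone determines the type.
    static boolean arraysShareOneType() {
        int[] five = new int[5];
        int[] ten = new int[10];
        return five.getClass() == ten.getClass();
    }

    public static void main(String[] args) {
        System.out.println(classesInterchangeable()); // false
        System.out.println(arraysShareOneType());     // true
    }
}
```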


Type equivalence (4)

Example in C: applying struct constructs a new type, applying typedef doesn't:
    struct A { char x; int y; };
    struct B { char x; int y; };
    typedef struct A C;
    typedef C* P;
    typedef struct A * R;
    typedef int S[10];
    typedef int T[5];
    typedef int Age;
    typedef int (*F)(int);
    typedef Age (*G)(Age);

Types struct A and C are equivalent, but they are not equivalent to struct B; types P and R are equivalent; types S and T are equivalent if the array size is ignored, as it is for array parameters (as complete array types, int[10] and int[5] are distinct in standard C); types int and Age are equivalent, as are function types F and G.


Type checking

Determining whether code uses legitimate operations according to its types.

Involves both type inference and equivalence.

Also involves applying often complex rules for relaxing exact type matching under certain circumstances, usually called type compatibility rules.

Assignment compatibility refers to the compatibility rules governing assignments.

Simple example: x = y / 2 + 3.5. Clearly x and y must be numeric. Can x be an int? Can x be a float? Can y be a long? What are the (implicit) types of the literals 2 and 3.5?


Type checking (2)

Type checking using compatibility rules involves type conversion from one type to another, compatible type (see later slides).

Type checking of assignments involves verifying that the left hand side has a computable address, called an l-value, and that the value of the right hand side (called an r-value) is capable of being stored at that address, or capable of conversion to a value that can be.

OO languages have further assignment compatibility rules: assignment of a subclass object to a superclass variable is allowed; assignment of a superclass object to a subclass variable without a cast is not.


Type checking (3)

Back to previous example: x = y / 2 + 3.5;

Suppose the type of x is float and the type of y is long. Does this statement type check?

First determine the type of y / 2: 2 is implicitly an int. Then, since y is a long, 2 is converted automatically ("promoted") to a long, and the type of y / 2 is long.

Now determine the type of the sum: the left operand is a long and the right operand is a double (implicitly). By the rules of Java, a long can be promoted to a double, so the result is a double.

However, a double cannot be assigned to a float, so a type error occurs at that point.
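The same reasoning can be checked against a Java compiler. In the sketch below, overloads of a hypothetical typeOf method report the static type the compiler assigns to each subexpression, and the rejected assignment is left as a comment.

```java
// Illustrative sketch: tracing the static types in x = y / 2 + 3.5.
class PromotionDemo {
    // The overload selected reveals the static type of the argument expression.
    static String typeOf(int v)    { return "int"; }
    static String typeOf(long v)   { return "long"; }
    static String typeOf(float v)  { return "float"; }
    static String typeOf(double v) { return "double"; }

    public static void main(String[] args) {
        long y = 7;
        System.out.println(typeOf(2));           // int
        System.out.println(typeOf(y / 2));       // long: 2 is promoted to long
        System.out.println(typeOf(y / 2 + 3.5)); // double: the long is promoted

        // float x = y / 2 + 3.5;        // rejected by the compiler: double to float
        float x = (float) (y / 2 + 3.5); // accepted only with an explicit cast
        System.out.println(x);           // 6.5
    }
}
```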


Type conversion

Type conversion can be classified two ways:
– Does the conversion require written code?
– Does the internal representation change, or just the type?

The 1st classification has two categories:
– automatic or implicit conversion (no code)
– manual or explicit conversion (code must be written)

The 2nd classification also has two categories:
– The value representation in memory changes
– The value representation in memory doesn't change, just the type

All four combinations of these can occur.


Type conversion (2)

Implicit conversions in Java include numerical promotion and upcasting. In general, this may involve either representation change (e.g. int to double), or simply changing the perceived type without representation changes (e.g. upcasting).

Explicit conversions in Java also may or may not involve representation changes:
– Reference casts such as downcasts do not change any bits, but numeric casts do change the representation.
– Applying a conversion function, e.g. Math.round, changes the representation.

Some languages outlaw automatic conversions and use conversion functions only (ML, Ada).
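All four combinations can be shown in a few lines of Java. The helper names in ConversionDemo are invented for this sketch; Math.round is the standard library method.

```java
// Illustrative sketch: implicit/explicit crossed with representation change or not.
class ConversionDemo {
    // Implicit, representation changes: int bits become an IEEE double.
    static double widen(int i) {
        return i;
    }

    // Implicit, no representation change: an upcast yields the very same object.
    static boolean upcastKeepsObject(String s) {
        Object o = s;
        return o == s;
    }

    // Explicit, no representation change: a downcast is only checked, nothing is copied.
    static boolean downcastKeepsObject(Object o) {
        String s = (String) o;
        return s == o;
    }

    // Explicit, representation changes: a numeric cast truncates toward zero.
    static int truncate(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(widen(3));                    // 3.0
        System.out.println(upcastKeepsObject("text"));   // true
        System.out.println(downcastKeepsObject("text")); // true
        System.out.println(truncate(3.9));               // 3
        System.out.println(Math.round(3.9));             // 4 (conversion function)
    }
}
```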


Polymorphic type checking

Hindley-Milner style uses type variables ('a, 'b, etc. in ML) and a process called unification (a version of pattern matching):
– Any type variable unifies with any type expression (and is instantiated to, that is, identified with, that expression).
– Any two type constants (i.e., literals like int or double) unify only if they are the same type.
– Any two type constructions (i.e., applications of type constructors) unify if and only if they are applications of the same type constructor and all of their component types also (recursively) unify.

The type of an identifier is the most general type that can result from the application of type unification to its definition (sometimes called a most general unifier, or mgu).


Polymorphic type checking (2)

Every use of an identifier that is polymorphically typed must involve a type that is a specialization of its most general type: a more restricted form that is compatible with the general type.

For example, int -> int is a specialization of 'a -> 'a, and int -> real is a specialization of 'a -> 'b, but int -> real is not a specialization of 'a -> 'a.

In an H-M type system, there is a further restriction of the use of polymorphic types, in that each use of a polymorphic argument in a function call must specialize to the same type (so-called let-bound polymorphism).

Example: f (x, y, g) = (g x, g y) has ML type 'a * 'a * ('a -> 'b) -> 'b * 'b


Polymorphic type checking (3)

Further problem in H-M type checking: what if the same type variable occurs in two different places in two type expressions that are to be unified? An infinite regress can occur!

Consider the ML definition:
    fun f g = g f;
The ML type checker assigns f the type 'a->'b and g the type 'a. Then the rhs says type 'a is actually a function type 'c->'b, and since f is the argument passed to g, 'c = 'a->'b. Thus 'a = 'c->'b = ('a->'b)->'b. But what is 'a? Trying to solve this equation for 'a leads to an infinite process.

To prevent this, unification must implement the occur check to make sure it does not try to unify a type variable 'a with a type expression that contains 'a.
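The unification rules and the occur check can be sketched in Java as follows. This is a simplified illustration, not a full Hindley-Milner checker: all class names (Ty, TVar, TApp, Unifier, UnifyDemo) are hypothetical, type constants are modelled as constructors with no arguments, and bindings made before a failure are not undone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of unification with an occur check; all names are hypothetical.
abstract class Ty { }

// A type variable such as 'a.
final class TVar extends Ty {
    final String name;
    TVar(String name) { this.name = name; }
}

// A type constant (no arguments, e.g. int) or a type construction such as ->(t1, t2).
final class TApp extends Ty {
    final String constructor;
    final List<Ty> args;
    TApp(String constructor, Ty... args) {
        this.constructor = constructor;
        this.args = Arrays.asList(args);
    }
}

final class Unifier {
    // Bindings from type-variable names to the type expressions they stand for.
    private final Map<String, Ty> bindings = new HashMap<>();

    // Follows variable bindings until an unbound variable or a construction is reached.
    Ty resolve(Ty t) {
        while (t instanceof TVar && bindings.containsKey(((TVar) t).name)) {
            t = bindings.get(((TVar) t).name);
        }
        return t;
    }

    // Occur check: does variable v appear anywhere inside t?
    private boolean occurs(TVar v, Ty t) {
        t = resolve(t);
        if (t instanceof TVar) return ((TVar) t).name.equals(v.name);
        for (Ty arg : ((TApp) t).args) {
            if (occurs(v, arg)) return true;
        }
        return false;
    }

    private boolean bind(TVar v, Ty t) {
        if (occurs(v, t)) return false; // would require an infinite type
        bindings.put(v.name, t);
        return true;
    }

    // Returns true (recording bindings) if a and b unify, false otherwise.
    boolean unify(Ty a, Ty b) {
        a = resolve(a);
        b = resolve(b);
        if (a instanceof TVar && b instanceof TVar
                && ((TVar) a).name.equals(((TVar) b).name)) return true;
        if (a instanceof TVar) return bind((TVar) a, b);
        if (b instanceof TVar) return bind((TVar) b, a);
        TApp x = (TApp) a, y = (TApp) b;
        if (!x.constructor.equals(y.constructor) || x.args.size() != y.args.size()) return false;
        for (int i = 0; i < x.args.size(); i++) {
            if (!unify(x.args.get(i), y.args.get(i))) return false;
        }
        return true;
    }
}

class UnifyDemo {
    static Ty fn(Ty from, Ty to) { return new TApp("->", from, to); }

    public static void main(String[] args) {
        Ty a = new TVar("a"), b = new TVar("b"), intT = new TApp("int");
        // 'a -> 'a against int -> 'b succeeds with 'a = int and 'b = int.
        System.out.println(new Unifier().unify(fn(a, a), fn(intT, b))); // true
        // 'a against ('a -> 'b) -> 'b is the equation from fun f g = g f.
        System.out.println(new Unifier().unify(a, fn(fn(a, b), b)));    // false
    }
}
```

Without the occurs test in bind, the second call would record 'a as bound to a type containing 'a, and any later attempt to expand that binding fully would never terminate.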


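Extended H-M example

    fun max (x,y,gt) = if gt(x,y) then x else y;

Possible syntax tree with annotated type variables (using Greek letters and C-style notation for functions):

[Figure: syntax tree. The root define has children max, x, y, gt and an if node; the if node has children call, x and y; the call node has children gt, x and y. Annotations: max : δ (*)(α, β, γ), x : α, y : β, gt : γ.]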


Extended H-M example (2)

Add the type information gathered by the type checker at the call node:

[Figure: the same tree after checking the call node. gt : ε (*)(α, β), so max : δ (*)(α, β, ε (*)(α, β)); the call node has type ε.]


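Extended H-M example (3)

Add the type information gathered by the type checker at the if node:

[Figure: the same tree after checking the if node. The condition must be bool, so the call node has type bool and gt : bool (*)(α, α); both branches must have the same type, so x : α, y : α and the if node has type α; max : δ (*)(α, α, bool (*)(α, α)).]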


Extended H-M example (4)

Finally, at the level of the root, unify the return type of the max function (δ) with the type of the body (α, the type of the if node), and the result is the most general type of the max function:
    α (*)(α, α, bool (*)(α, α))

Or, in ML notation:
    'a * 'a * ('a * 'a -> bool) -> 'a
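For comparison, a Java counterpart of this type can be written with a generic method (Java 8 or later for the lambda syntax). Java does not infer the signature the way ML does; the type parameter A has to be declared by hand. GenericMax is a hypothetical class, and BiPredicate is the standard java.util.function interface.

```java
import java.util.function.BiPredicate;

// Illustrative sketch: a hand-written Java analogue of
// 'a * 'a * ('a * 'a -> bool) -> 'a
class GenericMax {
    static <A> A max(A x, A y, BiPredicate<A, A> gt) {
        return gt.test(x, y) ? x : y;
    }

    public static void main(String[] args) {
        // A is specialized to Integer in the first call and to String in the second.
        int m = GenericMax.<Integer>max(3, 5, (p, q) -> p > q);
        String s = GenericMax.<String>max("pear", "apple", (p, q) -> p.compareTo(q) > 0);
        System.out.println(m); // 5
        System.out.println(s); // pear
    }
}
```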
