
Post on 20-Dec-2015


XMλ

Contents

What is the problem?

Hosoya’s approach

Shields’ approach

XMLambda and the UH

Conclusion

What is the problem?

XML, a standard language of first-order, tree-like datatypes

XML works well for describing static documents, but documents are typically dynamic, generated by a server

Implementing a server for dynamic documents in conventional languages is hard:

no direct support for XML or scripting language syntax

no compile-time checks to ensure valid documents

Can custom languages developed for XML be embedded as combinatory libraries within a Haskell-like language?

element Msg = ( ( (To|Bcc)* & From), Body)
element To = String
element Bcc = String
element From = String
element Body = P*
element P = String

<Msg>
  <To>jrommes@cs.uu.nl</To>
  <Bcc>doaitse@cs.uu.nl</Bcc>
  <From>joep@geevers.com</From>
  <Body>
    <P>Our presentation is finished!</P>
  </Body>
</Msg>


| : union
* : sequence
& : unordered tuple
, : ordered tuple


What we are looking for:

XML → Functional Program.

document-type definition → type definitions

Regular expression → type

element → term

Document validation → type checking

Possible solutions

1. Using a universal datatype

data Element = Atom String
             | Node String (List Element)

Node "Msg" [
  Node "To" [Atom "jrommes@cs.uu.nl"],
  Node "Bcc" [Atom "doaitse@cs.uu.nl"],
  Node "From" [Atom "joep@geevers.com"],
  Node "Body" [
    Node "P" [Atom "Our..."]
  ]
]

No validation possible
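The point of this slide can be sketched in plain Haskell. The block below is a minimal illustration, assuming ordinary Haskell lists in place of the slide's `List`; `invalidMsg` and `childNames` are hypothetical names introduced here, not part of the original material. Both documents have the same type, so the type checker cannot tell the valid one from the invalid one.

```haskell
-- Universal datatype from the slide: every document is an untyped tree.
data Element
  = Atom String              -- character data
  | Node String [Element]    -- element name and its children
  deriving (Eq, Show)

-- The message from the earlier slide, which conforms to the Msg DTD.
validMsg :: Element
validMsg =
  Node "Msg"
    [ Node "To"   [Atom "jrommes@cs.uu.nl"]
    , Node "Bcc"  [Atom "doaitse@cs.uu.nl"]
    , Node "From" [Atom "joep@geevers.com"]
    , Node "Body" [Node "P" [Atom "Our presentation is finished!"]]
    ]

-- Violates the DTD (no From, a Body nested inside a To),
-- yet it has exactly the same type as validMsg.
invalidMsg :: Element
invalidMsg = Node "Msg" [Node "To" [Node "Body" []]]

-- Hypothetical helper: names of the direct child elements of a node.
childNames :: Element -> [String]
childNames (Node _ children) = [name | Node name _ <- children]
childNames (Atom _)          = []
```

Any validation against the DTD would have to be written as a run-time check over `Element`, for instance by inspecting `childNames`.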

Possible solutions

1. Using a universal datatype

2. Using newtype declarations

newtype Msg = Msg (List (Either To Bcc), From, Body)
newtype From = From String
newtype To = To String
newtype Bcc = Bcc String
newtype Body = Body (List P)
newtype P = P String

Msg ([ Left (To "jrommes@cs.uu.nl"), Right (Bcc "doaitse@cs.uu.nl") ],
     From "joep@geevers.com",
     Body [
       P "Our..."
     ]
)

Sound, but not complete.
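A compilable version of the newtype encoding might look as follows. This is a sketch assuming built-in Haskell lists for `List`; `exampleMsg` and `recipients` are hypothetical names added for illustration. One reading of "sound, but not complete" is that every well-typed `Msg` is a valid document, while some valid documents cannot be written down: the tuple fixes the recipients before `From`, whereas `&` in the DTD allows either order.

```haskell
newtype To   = To String   deriving (Eq, Show)
newtype Bcc  = Bcc String  deriving (Eq, Show)
newtype From = From String deriving (Eq, Show)
newtype P    = P String    deriving (Eq, Show)
newtype Body = Body [P]    deriving (Eq, Show)

-- ((To|Bcc)* & From), Body, with the unordered tuple forced into one order.
newtype Msg = Msg ([Either To Bcc], From, Body) deriving (Eq, Show)

-- The message from the slides; an ill-formed message would not type check.
exampleMsg :: Msg
exampleMsg =
  Msg ( [Left (To "jrommes@cs.uu.nl"), Right (Bcc "doaitse@cs.uu.nl")]
      , From "joep@geevers.com"
      , Body [P "Our presentation is finished!"]
      )

-- Hypothetical helper: all recipient addresses, To and Bcc alike.
recipients :: Msg -> [String]
recipients (Msg (rs, _, _)) = map (either (\(To a) -> a) (\(Bcc a) -> a)) rs
```

A document that lists `From` first has no corresponding `Msg` value in this encoding, even though the DTD accepts it.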

Possible solutions

1. Using a universal datatype

2. Using newtype declarations

3. Using regular expression types as primitive (Hosoya)

4. Using type-indexed rows (Shields)

Hosoya’s approach

Why Regular Expression Types?

Static typechecking: generated XML documents conform to DTD

Or: invalid documents can never arise

For example: A <table> must have at least one <tr>

Why Regular Expression Patterns?

Convenient programming constructs for manipulating documents

For instance, jump over arbitrary length data and extract specific data:

type Person = person[Name,Email*,Tel?]
match p with
  person[Name,Email+,Tel] -> …

XDuce: Values

Primitives represent XML documents (trees)

For example:

person[name["Joep"],
       email["Joep@geevers.com"]]

I.e. a value is a sequence of nodes

XDuce: Regular Expression Types

Types correspond to document schemas

Familiar XML regular expressions:

type Tel = tel[String]
type Tels = Tel*
type Recip = Bcc|Cc
(Name, Tel*), Addr
T? = T|()
T+ = T,T*

Subtyping

Many algebraic laws:

Associativity of concatenation and union: A|(B|C) = (A|B)|C

Commutativity of union: A|B = B|A

These laws are crucial for XML processing, but lead to complicated specification

Subtyping

Subtyping as set inclusion

First define which values belong to a type

One type is a subtype of another if the former denotes a subset of the latter

For example: (Name*, Tel*) <: (Name|Tel)*
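The set-inclusion view can be made concrete with a small Haskell sketch. It is not XDuce's algorithm: it assumes a simplified model in which a value is just a sequence of element labels, decides membership with Brzozowski derivatives, and only approximates subtyping by testing every label sequence up to a chosen length. All names (`Ty`, `member`, `subtypeUpTo`, and so on) are hypothetical.

```haskell
-- Regular expression types over element labels (element content is ignored).
data Ty
  = Empty          -- matches no value
  | Eps            -- the empty sequence ()
  | El String      -- one element with the given label
  | Seq Ty Ty      -- concatenation T,U
  | Alt Ty Ty      -- union T|U
  | Star Ty        -- repetition T*

-- Does the type contain the empty sequence?
nullable :: Ty -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (El _)    = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True

-- Brzozowski derivative: the type of everything that may follow label l.
deriv :: String -> Ty -> Ty
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv l (El m)    = if l == m then Eps else Empty
deriv l (Seq a b)
  | nullable a    = Alt (Seq (deriv l a) b) (deriv l b)
  | otherwise     = Seq (deriv l a) b
deriv l (Alt a b) = Alt (deriv l a) (deriv l b)
deriv l (Star a)  = Seq (deriv l a) (Star a)

-- A value (sequence of labels) belongs to a type if the type that remains
-- after consuming all labels accepts the empty sequence.
member :: [String] -> Ty -> Bool
member labels t = nullable (foldl (flip deriv) t labels)

-- All label sequences of length 0..n over the given alphabet.
wordsUpTo :: Int -> [String] -> [[String]]
wordsUpTo n alphabet = concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- Bounded approximation of s <: t: no tested value of s lies outside t.
subtypeUpTo :: Int -> [String] -> Ty -> Ty -> Bool
subtypeUpTo n alphabet s t =
  all (\w -> not (member w s) || member w t) (wordsUpTo n alphabet)

namesThenTels, namesOrTels :: Ty
namesThenTels = Seq (Star (El "name")) (Star (El "tel"))   -- (Name*, Tel*)
namesOrTels   = Star (Alt (El "name") (El "tel"))          -- (Name|Tel)*
```

With these definitions, `subtypeUpTo 4 ["name", "tel"] namesThenTels namesOrTels` holds, while the reverse direction fails because the sequence `tel, name` belongs only to `(Name|Tel)*`. A real checker decides inclusion exactly rather than up to a bound.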

Pattern Matching: Exhaustiveness

type Person = person[Name,Email*,Tel?]
match p with
  person[Name,Email+,Tel?] -> …
  person[Name,Email*,Tel] -> …

Not exhaustive

Use subtyping to check: the input type must be a subtype of the union of the pattern types

Pattern Matching: Irredundancy

match p with
  person[Name,Email*,Tel?] -> …
  person[Name,Email+,Tel] -> …

Second clause redundant

A clause is redundant iff all the input values that can be matched by the pattern can also be matched by preceding patterns
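Both checks can be illustrated with a bounded search over label sequences. The Haskell sketch below is an approximation under stated assumptions, not XDuce's actual procedure: values are modelled as sequences of labels, membership uses Brzozowski derivatives, and only sequences up to a given length are tested. The names `unmatched`, `redundant`, `clauseA`, `clauseB` and `clauseC` are hypothetical.

```haskell
-- Regular expression types over element labels (element content is ignored).
data Ty = Empty | Eps | El String | Seq Ty Ty | Alt Ty Ty | Star Ty

opt, plus :: Ty -> Ty
opt t  = Alt t Eps          -- T? = T|()
plus t = Seq t (Star t)     -- T+ = T,T*

nullable :: Ty -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (El _)    = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True

-- Brzozowski derivative with respect to one label.
deriv :: String -> Ty -> Ty
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv l (El m)    = if l == m then Eps else Empty
deriv l (Seq a b)
  | nullable a    = Alt (Seq (deriv l a) b) (deriv l b)
  | otherwise     = Seq (deriv l a) b
deriv l (Alt a b) = Alt (deriv l a) (deriv l b)
deriv l (Star a)  = Seq (deriv l a) (Star a)

member :: [String] -> Ty -> Bool
member labels t = nullable (foldl (flip deriv) t labels)

-- All label sequences of length 0..n over the given alphabet.
wordsUpTo :: Int -> [String] -> [[String]]
wordsUpTo n alphabet = concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- Input values (up to the bound) that none of the clauses matches.
-- An empty result means no counterexample to exhaustiveness was found.
unmatched :: Int -> [String] -> Ty -> [Ty] -> [[String]]
unmatched n alphabet input clauses =
  [ w | w <- wordsUpTo n alphabet, member w input, not (any (member w) clauses) ]

-- A clause is redundant (up to the bound) if every input value it matches
-- is already matched by one of the preceding clauses.
redundant :: Int -> [String] -> Ty -> [Ty] -> Ty -> Bool
redundant n alphabet input preceding clause =
  null [ w | w <- wordsUpTo n alphabet
           , member w input, member w clause
           , not (any (member w) preceding) ]

labels :: [String]
labels = ["name", "email", "tel"]

name, email, tel, person, clauseA, clauseB, clauseC :: Ty
name    = El "name"
email   = El "email"
tel     = El "tel"
person  = Seq name (Seq (Star email) (opt tel))   -- Name,Email*,Tel?
clauseA = Seq name (Seq (plus email) (opt tel))   -- Name,Email+,Tel?
clauseB = Seq name (Seq (Star email) tel)         -- Name,Email*,Tel
clauseC = Seq name (Seq (plus email) tel)         -- Name,Email+,Tel
```

Here `unmatched 3 labels person [clauseA, clauseB]` yields the single sequence `name`, a person with neither email nor telephone, which is why the first match is not exhaustive. `redundant 3 labels person [person] clauseC` holds, mirroring the redundant second clause of the second match.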

Pattern Matching: Type Inference

type Name = name[String]
match (ps as Person*) with
  person[name[val n as String],Email*,Tel?],rest -> …

Avoid excessive type annotations

Use input type and pattern to infer types of

bare variables (rest)

bound variables (n)

Functions

First-order functions (explicitly typed):

fun f(P):T = e

For example:

fun tels(val ps as Person*):Tel* =
  match ps with
    person[Name,Email*,tel[val t]],rest -> tel[t],tels(rest)
    person[Name,Email*],rest -> tels(rest)
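For comparison, the same function can be approximated in Haskell with algebraic datatypes. This is a rough sketch, assuming `Person` is encoded as a constructor with a list of emails and an optional telephone number; the encoding and its names are illustrative and lose the regular-expression structure of the XDuce type. The slide shows only two clauses, so the base case for the empty sequence is an addition made here.

```haskell
newtype Name  = Name String  deriving (Eq, Show)
newtype Email = Email String deriving (Eq, Show)
newtype Tel   = Tel String   deriving (Eq, Show)

-- person[Name,Email*,Tel?] encoded as a record-like constructor.
data Person = Person Name [Email] (Maybe Tel) deriving (Eq, Show)

-- Collect the telephone numbers of all persons that have one.
tels :: [Person] -> [Tel]
tels (Person _ _ (Just t) : rest) = t : tels rest   -- person with a Tel
tels (Person _ _ Nothing  : rest) = tels rest       -- person without a Tel
tels []                           = []              -- base case (added here)
```

Unlike the XDuce version, the Haskell patterns match on constructors rather than on regular expressions, so skipping over `Email*` is expressed by ignoring the list field.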

Higher-order Functions

Functions as first-class citizens

Why desirable? Abstraction

Not supported by XDuce

What is needed? Subtyping for arrow types

So why not support higher-order functions?

Higher-order Functions

Function definitions given by fixed set G

G is used in T-APP (instead of standard rule)

Consequence: T-ABS fails

Fix: redefine T-APP

Type annotations needed for check of pattern match

Parametric Polymorphism

Generic typing using type variables instead of actual types

Why desirable? Abstraction from structure of problem

What is needed?

Type abstraction

Type application

So why no parametric polymorphism?

Parametric Polymorphism

Problems:

forall X . (U|X) -> (T|X)

Pattern matching problems:

Exhaustiveness / irredundancy checks

Type inference

Typing constraints cannot be represented

forall X {U,T}.(U|X) -> (T|X)

Conclusions

Typed language with XML docs as primitive values

Regular expression types are fundamental

Regular expression pattern matching

No higher-order functions

No parametric polymorphism

Shields’ approach

“It is required that content models in element type declarations be deterministic”

Consequence 1:

regular expressions must be 1-unambiguous

Unions and unordered tuples are formed from distinct members.

( ( To , Bcc ) & (Bcc, To) ) is 1-unambiguous

( (Bcc, To) & Bcc ) is not

( (To | Bcc) & Bcc ) is not


Consequence 2:

possible to transform any XML element into a term:

* sequence → list
, tuple → tuple
| union → type-indexed sum
& unordered tuple → type-indexed product

| and & are both formed from type-indexed rows

Type-Indexed Rows

A type-indexed row is a list of types

Type constructors

Empty: Row

(_#_): Type → Row → Row

For example: (Int # Bool # Empty)

Type-indexed product TIP: (All _): Row → Type

Type-indexed coproduct TIC: (One _): Row → Type

Insertion Constraints

Insertion constraints used to guarantee distinctness of elements:

a ins (Int # Bool # Empty)

constrains a to be any type other than Int or Bool

(List b) ins (Int # Bool # Empty)

is true

Type-indexed product TIP:

Triv: All Empty

(_ && _): extension

forall (a: Type) (b: Row) . a ins b => a → All b → All (a#b)

Type-indexed coproduct TIC:

(Inj _): injection

forall (a: Type) (b: Row) . a ins b => a → One (a#b)

let tuple = \(x && y && Triv) . (x, y)
in tuple (True && 1 && Triv)

Type checking:

Unify All(x#y#Empty) and All(Int#Bool#Empty)

Under constraint: x ins (y#Empty)

Overall term has type (Int, Bool) or (Bool, Int) !
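The flavour of type-indexed products and coproducts can be approximated in GHC Haskell with type-level lists. This is only a sketch under several assumptions: rows are modelled as ordered lists (so, unlike XMλ rows, the term below has exactly one type), the insertion constraint is emulated by a closed type family `Ins`, and the names `Ins`, `:&&`, `Here`, `There`, `XCoord` and `describe` are hypothetical. It relies on GHC extensions and is not XMλ syntax.

```haskell
{-# LANGUAGE ConstraintKinds      #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE GADTs                #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Constraint, Type)
import GHC.TypeLits (ErrorMessage (..), TypeError)

-- Emulation of "a ins r": a must differ from every type in row r.
type family Ins (a :: Type) (r :: [Type]) :: Constraint where
  Ins a '[]      = ()
  Ins a (a ': r) = TypeError ('Text "type already occurs in row")
  Ins a (b ': r) = Ins a r

-- Type-indexed product (All): one value for every type in the row.
data All (r :: [Type]) where
  Triv  :: All '[]
  (:&&) :: Ins a r => a -> All r -> All (a ': r)
infixr 5 :&&

-- Type-indexed coproduct (One): a value of exactly one type in the row.
data One (r :: [Type]) where
  Here  :: Ins a r => a -> One (a ': r)
  There :: One r -> One (a ': r)

-- Ordered counterpart of the slide's: \(x && y && Triv) . (x, y)
tuple :: All '[a, b] -> (a, b)
tuple (x :&& y :&& Triv) = (x, y)

example :: (Bool, Int)
example = tuple (True :&& (1 :: Int) :&& Triv)

-- A fresh type name lets two Int-like fields share a row.
newtype XCoord = XCoord Int deriving (Eq, Show)

point :: All '[XCoord, Int]
point = XCoord 3 :&& (4 :: Int) :&& Triv

-- Rejected at compile time by Ins:  (3 :: Int) :&& (4 :: Int) :&& Triv

describe :: One '[Bool, Int] -> String
describe (Here b)         = "Bool: " ++ show b
describe (There (Here n)) = "Int: " ++ show n
```

Because the list is ordered, this encoding cannot express that `(Int, Bool)` and `(Bool, Int)` are both acceptable results; capturing that requires row equality up to permutation, as in the constraint system described on these slides.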

Equality constraints

( c # d # Empty ) eq ( Int # Bool # Empty )

Propagated until sufficient information is found to simplify them

Simplifying constraints

Simple unification: (a → Int) eq (Bool → b)

becomes: a eq Bool, Int eq b

Row unification: (Int # a # Empty) eq (Bool # b # Empty)

becomes: (Int eq b), (a # Empty) eq (Bool # Empty)

Insertion: (a,b) ins (Bool # c # Empty)

becomes: (a,b) ins (c # Empty)
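The first of these simplifications is ordinary first-order unification, which can be sketched in a few lines of Haskell. The block below is illustrative only and covers just the simple case: it assumes a tiny type language with variables, constants and arrows, and it does not attempt row unification or insertion constraints. `T`, `Subst`, `apply` and `unify` are hypothetical names.

```haskell
-- A tiny type language: variables, constants and function arrows.
data T = TVar String | TCon String | TFun T T
  deriving (Eq, Show)

-- A substitution as an association list from variable names to types.
type Subst = [(String, T)]

-- Apply a substitution, following chains of bound variables.
apply :: Subst -> T -> T
apply s t@(TVar v) = maybe t (apply s) (lookup v s)
apply _ t@(TCon _) = t
apply s (TFun a b) = TFun (apply s a) (apply s b)

-- Occurs check: does variable v appear in the type?
occurs :: String -> T -> Bool
occurs v (TVar w)   = v == w
occurs _ (TCon _)   = False
occurs v (TFun a b) = occurs v a || occurs v b

-- Solve "a eq b" under an existing substitution, if possible.
unify :: T -> T -> Subst -> Maybe Subst
unify a b s = go (apply s a) (apply s b)
  where
    go (TVar v) t = bind v t
    go t (TVar v) = bind v t
    go (TCon c) (TCon d) | c == d = Just s
    go (TFun a1 b1) (TFun a2 b2) = unify a1 a2 s >>= unify b1 b2
    go _ _ = Nothing

    bind v t
      | t == TVar v = Just s                  -- trivial constraint
      | occurs v t  = Nothing                 -- would build an infinite type
      | otherwise   = Just ((v, t) : s)

-- (a → Int) eq (Bool → b)
example :: Maybe Subst
example =
  unify (TFun (TVar "a") (TCon "Int")) (TFun (TCon "Bool") (TVar "b")) []
```

Evaluating `example` binds `a` to `Bool` and `b` to `Int`, matching the simplification `a eq Bool, Int eq b` on the slide.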

Introducing fresh typenames

Monomorphic:

newtype xCoord = Int
All (xCoord # Int # Empty)

Polymorphic:

newtype xCoord = \ (a:Type).a

Allows same newtypes within a record !!

Introducing opaque newtypes: type arguments are ignored in insertion constraints

newtype opaque xCoord = \(a:Type).a

XMLambda and UH

Conclusion

Why regular expression types (Hosoya)?

Fundamental regular expression types

Powerful pattern matching

No higher order functions and polymorphism

Subtyping and parametric polymorphism?

Why type indexed rows (Shields)?

Flexibility: more general than regular expression types

All nice characteristics of FP

Constraint system?