XMλ
Contents
What is the problem?
Hosoya’s approach
Shields’ approach
XMLambda and the UH
Conclusion
What is the problem?
XML is a standard language for first-order, tree-like datatypes
XML works well for describing static documents, but documents are typically dynamic, generated by a server
Implementing a server for dynamic documents in conventional languages is hard:
  no direct support for XML or scripting language syntax
  no compile-time checks to ensure valid documents
Can custom languages developed for XML be embedded as combinator libraries within a Haskell-like language?
element Msg  = (((To|Bcc)* & From), Body)
element To   = String
element Bcc  = String
element From = String
element Body = P*
element P    = String
<Msg>
  <To>[email protected]</To>
  <Bcc>[email protected]</Bcc>
  <From>[email protected]</From>
  <Body>
    <P>Our presentation is finished!</P>
  </Body>
</Msg>
|  : union
*  : sequence
&  : unordered tuple
,  : ordered tuple
What we are looking for:
XML → Functional Program.
document-type definition → type definitions
Regular expression → type
element → term
Document validation → type checking
Possible solutions
1. Using a universal datatype
data Element = Atom String
             | Node String (List Element)
Node "Msg" [
  Node "To" [Atom "[email protected]"],
  Node "Bcc" [Atom "[email protected]"],
  Node "From" [Atom "[email protected]"],
  Node "Body" [
    Node "P" [Atom "Our..."]
  ]
]
No validation possible
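The universal datatype can be sketched in plain Haskell as follows. This is only an illustration: the addresses are placeholders, and `invalidMsg` and `childLabels` are hypothetical names introduced here to show that a document violating the DTD has exactly the same type as a valid one, so any check has to happen at run time.

```haskell
-- Universal tree type: every XML document gets the same Haskell type.
data Element
  = Atom String
  | Node String [Element]
  deriving (Eq, Show)

-- The example message (addresses are placeholders, assumed for illustration).
validMsg :: Element
validMsg =
  Node "Msg"
    [ Node "To" [Atom "to@example.org"]
    , Node "Bcc" [Atom "bcc@example.org"]
    , Node "From" [Atom "from@example.org"]
    , Node "Body" [Node "P" [Atom "Our presentation is finished!"]]
    ]

-- Violates the DTD (no From, no Body), yet it type-checks just as well.
invalidMsg :: Element
invalidMsg = Node "Msg" [Node "P" [Atom "stray paragraph"]]

-- Hypothetical helper: labels of the direct children, as a first step
-- towards a run-time validity check.
childLabels :: Element -> [String]
childLabels (Node _ children) = [label | Node label _ <- children]
childLabels (Atom _) = []
```

Both `validMsg` and `invalidMsg` have type `Element`, which is why the type checker cannot distinguish them.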
2. Using newtype declarations
newtype Msg  = Msg (List (Either To Bcc), From, Body)
newtype From = From String
newtype To   = To String
newtype Bcc  = Bcc String
newtype Body = Body (List P)
newtype P    = P String
Msg ( [ Left (To "[email protected]"),
        Right (Bcc "[email protected]") ],
      From "[email protected]",
      Body [ P "Our..." ]
    )
Sound, but not complete.
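A minimal Haskell sketch of the newtype encoding follows. The addresses are placeholders and `recipientCount` is a hypothetical helper added for illustration. One reading of "not complete" is visible in the type of `Msg`: the tuple fixes the order recipients, then `From`, whereas the DTD's `&` also allows `From` to come first, so some valid documents have no representation.

```haskell
newtype To = To String deriving (Eq, Show)
newtype Bcc = Bcc String deriving (Eq, Show)
newtype From = From String deriving (Eq, Show)
newtype P = P String deriving (Eq, Show)
newtype Body = Body [P] deriving (Eq, Show)

-- (To|Bcc)* becomes a list of Either; the unordered tuple (&) is
-- approximated by an ordered Haskell tuple.
newtype Msg = Msg ([Either To Bcc], From, Body) deriving (Eq, Show)

-- The example message (addresses are placeholders, assumed for illustration).
msg :: Msg
msg =
  Msg
    ( [Left (To "to@example.org"), Right (Bcc "bcc@example.org")]
    , From "from@example.org"
    , Body [P "Our presentation is finished!"]
    )

-- Hypothetical helper: the structure is now visible to the type checker,
-- so no run-time validity test is needed here.
recipientCount :: Msg -> Int
recipientCount (Msg (recipients, _, _)) = length recipients
```

A value such as `Msg ([], Body [])` is rejected at compile time, which is the soundness half of the claim.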
3. Using regular expression types as primitives (Hosoya)
4. Using type-indexed rows (Shields)
Hosoya’s approach
Why Regular Expression Types?
Static typechecking: generated XML documents conform to the DTD
Or: invalid documents can never arise
For example: a <table> must have at least one <tr>
Why Regular Expression Patterns?
Convenient programming constructs for manipulating documents
For instance, jump over arbitrary-length data and extract specific data:
  type Person = person[Name,Email*,Tel?]
  match p with
    person[Name,Email+,Tel] -> …
    …
XDuce: Values
Primitives represent XML documents (trees)
For example:
  person[name["Joep"], email["[email protected]"]]
I.e. a value is a sequence of nodes
XDuce: Regular Expression Types
Types correspond to document schemas
Familiar XML regular expressions:
  type Tel = tel[String]
  type Tels = Tel*
  type Recip = Bcc|Cc
  (Name, Tel*), Addr
  T? = T|()
  T+ = T,T*
Subtyping
Many algebraic laws:
  Associativity of concatenation and union: A|(B|C) = (A|B)|C
  Commutativity of union: A|B = B|A
These laws are crucial for XML processing, but lead to a complicated specification
Subtyping
Subtyping as set inclusion
First define which values belong to a type
One type is a subtype of another if the former denotes a subset of the latter
For example: (Name*, Tel*) <: (Name|Tel)*
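The set-inclusion reading can be made concrete with a small Haskell sketch. It is not XDuce's algorithm: it treats types as regular expressions over element labels only, decides membership with Brzozowski derivatives, and approximates inclusion by testing every label sequence up to a bound. All names (`Ty`, `member`, `subtypeUpTo`, and so on) are hypothetical.

```haskell
import Control.Monad (replicateM)

-- Regular expression types over element labels only (nested content ignored).
data Ty
  = Eps          -- the empty sequence ()
  | Void         -- no values at all (needed internally for derivatives)
  | Elem String  -- a single element such as Name or Tel
  | Seq Ty Ty    -- T, U
  | Alt Ty Ty    -- T | U
  | Star Ty      -- T*
  deriving (Eq, Show)

-- Does the type contain the empty sequence?
nullable :: Ty -> Bool
nullable Eps = True
nullable Void = False
nullable (Elem _) = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _) = True

-- Brzozowski derivative: what may follow after reading one label.
deriv :: String -> Ty -> Ty
deriv _ Eps = Void
deriv _ Void = Void
deriv l (Elem l') = if l == l' then Eps else Void
deriv l (Seq a b)
  | nullable a = Alt (Seq (deriv l a) b) (deriv l b)
  | otherwise = Seq (deriv l a) b
deriv l (Alt a b) = Alt (deriv l a) (deriv l b)
deriv l (Star a) = Seq (deriv l a) (Star a)

-- A value (here: a sequence of labels) belongs to a type.
member :: [String] -> Ty -> Bool
member labels t = nullable (foldl (flip deriv) t labels)

-- Bounded approximation of s <: t: every sequence of length <= n over the
-- given alphabet that belongs to s must also belong to t.
subtypeUpTo :: Int -> [String] -> Ty -> Ty -> Bool
subtypeUpTo n alphabet s t =
  all (\v -> not (member v s) || member v t) candidates
  where
    candidates = concatMap (\k -> replicateM k alphabet) [0 .. n]

-- The example from the slide: (Name*, Tel*) and (Name|Tel)*.
namesThenTels, anyMix :: Ty
namesThenTels = Seq (Star (Elem "Name")) (Star (Elem "Tel"))
anyMix = Star (Alt (Elem "Name") (Elem "Tel"))
```

With these definitions `subtypeUpTo 4 ["Name", "Tel"] namesThenTels anyMix` holds, while the converse fails because the sequence `Tel, Name` belongs only to `anyMix`. A real implementation would decide inclusion exactly rather than up to a bound.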
Pattern Matching: Exhaustiveness
  type Person = person[Name,Email*,Tel?]
  match p with
    person[Name,Email+,Tel?] -> …
    person[Name,Email*,Tel] -> …
Not exhaustive: a person[Name] with no Email and no Tel matches neither clause
Use subtyping to check: the input type must be a subtype of the union of the pattern types
Pattern Matching: Irredundancy
  match p with
    person[Name,Email*,Tel?] -> …
    person[Name,Email+,Tel] -> …
Second clause redundant
A clause is redundant iff all the input values that can be matched by its pattern can also be matched by preceding patterns
Pattern Matching: Type Inference
  type Name = name[String]
  match (ps as Person*) with
    person[name[val n as String],Email*,Tel?],rest -> …
Avoid excessive type annotations
Use input type and pattern to infer types of
  bare variables (rest)
  bound variables (n)
Functions
First-order functions (explicitly typed):
  fun f(P):T = e
For example:
  fun tels(val ps as Person*):Tel* =
    match ps with
      person[Name,Email*,tel[val t]],rest -> tel[t],tels(rest)
      person[Name,Email*],rest -> tels(rest)
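For comparison, a rough Haskell counterpart of `tels` is sketched below. It assumes a hypothetical `Person` record standing in for the regular expression type `person[Name,Email*,Tel?]`, and returns the telephone strings rather than `tel[...]` elements. Unlike the XDuce version, it spells out the empty-sequence case.

```haskell
-- Hypothetical record standing in for person[Name,Email*,Tel?].
data Person = Person
  { personName :: String
  , personEmails :: [String]
  , personTel :: Maybe String
  }
  deriving (Eq, Show)

-- Collect the telephone numbers of all persons that have one.
tels :: [Person] -> [String]
tels [] = []
tels (Person _ _ (Just t) : rest) = t : tels rest
tels (Person _ _ Nothing : rest) = tels rest
```

The optional `Tel?` is modelled with `Maybe`, and `Email*` with a list, so the two XDuce clauses correspond to the `Just` and `Nothing` cases.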
Higher-order Functions
Functions as first-class citizens
Why desirable?
  Abstraction
Not supported by XDuce
What is needed?
  Subtyping for arrow types
So why not support higher-order functions?
Higher-order Functions
Function definitions given by fixed set G
G is used in T-APP (instead of standard rule)
Consequence: T-ABS fails
Fix: redefine T-APP
Type annotations needed for check of pattern match
Parametric Polymorphism
Generic typing using type variables instead of actual types
Why desirable?
  Abstraction from structure of problem
What is needed?
  Type abstraction
  Type application
So why no parametric polymorphism?
Parametric Polymorphism
Problems:
  forall X . (U|X) -> (T|X)
Pattern matching problems:
  Exhaustiveness / irredundancy checks
  Type inference
Typing constraints cannot be represented:
  forall X {U,T}.(U|X) -> (T|X)
Conclusions
Typed language with XML docs as primitive values
Regular expression types are fundamental
Regular expression pattern matching
No higher-order functions
No parametric polymorphism
Shields’ approach
“It is required that content models in element type declarations be deterministic”
Consequence 1:
regular expressions must be 1-unambiguous
Unions and unordered tuples are formed from distinct members.
( ( To , Bcc ) & (Bcc, To) ) is 1-unambiguous
( (Bcc, To) & Bcc ) is not
( (To | Bcc) & Bcc ) is not
Consequence 2:
possible to transform any XML element into a term:
  *  sequence → list
  ,  tuple → tuple
  |  union → type-indexed sum
  &  unordered tuple → type-indexed product
| and & are both formed from type-indexed rows
Type-Indexed Rows
A type-indexed row is a list of types
Type constructors
  Empty: Row
  (_#_): Type → Row → Row
For example: (Int # Bool # Empty)
Type-indexed product TIP: (All _): Row → Type
Type-indexed coproduct TIC: (One _): Row → Type
Insertion Constraints
Insertion constraints are used to guarantee distinctness of elements:
  a ins (Int # Bool # Empty)
constrains a to be any type other than Int or Bool
  (List b) ins (Int # Bool # Empty)
is true
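How such a constraint might be checked can be sketched at the value level in Haskell. This is an illustrative approximation, not Shields' actual constraint solver: `Type`, `apart`, and `simplifyIns` are hypothetical names, and `apart` is deliberately conservative (it only reports types as distinct when their constructors visibly clash).

```haskell
-- A tiny representation of types with variables.
data Type
  = TVar String         -- a type variable such as a or b
  | TCon String [Type]  -- a constructor applied to arguments, e.g. List b
  deriving (Eq, Show)

-- True when the two types can certainly never become equal, whatever the
-- variables are instantiated to. Conservative: False means "unknown".
apart :: Type -> Type -> Bool
apart (TCon c xs) (TCon d ys) =
  c /= d || length xs /= length ys || or (zipWith apart xs ys)
apart _ _ = False

-- Simplify "t ins row", with the row given as a list of its members:
--   Nothing        the constraint is violated (t already occurs in the row)
--   Just []        the constraint is satisfied
--   Just residual  the constraint must be deferred for these members
simplifyIns :: Type -> [Type] -> Maybe [Type]
simplifyIns t row
  | t `elem` row = Nothing
  | otherwise = Just (filter (not . apart t) row)

int, bool :: Type
int = TCon "Int" []
bool = TCon "Bool" []
```

Here `simplifyIns (TCon "List" [TVar "b"]) [int, bool]` yields `Just []`, matching the slide's claim that the second constraint is true, while `simplifyIns (TVar "a") [int, bool]` keeps both members because nothing is known about `a` yet.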
Type-indexed product TIP:
  Triv: All Empty
  (_ && _): extension
    forall (a: Type) (b: Row) .
      a ins b => a → All b → All (a#b)
Type-indexed coproduct TIC:
  (Inj _): injection
    forall (a: Type) (b: Row) .
      a ins b => a → One (a#b)
let tuple = \(x && y && Triv) . (x, y)
in tuple (True && 1 && Triv)
Type checking:
Unify All(x#y#Empty) and All(Int#Bool#Empty)
Under constraint: x ins (y#Empty)
Overall term has type (Int, Bool) or (Bool, Int) !
Equality constraints
  (c # d # Empty) eq (Int # Bool # Empty)
Propagates until sufficient information is found for it to be simplified
Simplifying constraints
Simple unification: (a → Int) eq (Bool → b)
  simplifies to: a eq Bool, Int eq b
Row unification: (Int # a # Empty) eq (Bool # b # Empty)
  simplifies to: (Int eq b), (a # Empty) eq (Bool # Empty)
Insertion: (a,b) ins (Bool # c # Empty)
  simplifies to: (a,b) ins (c # Empty)
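The "simple unification" step is ordinary first-order unification, which can be sketched in Haskell as below. The sketch covers only type variables, constants, and arrows; row unification and insertion constraints are left out, and all names (`Type`, `Subst`, `unify`) are assumptions made for illustration.

```haskell
data Type
  = TVar String     -- type variable
  | TCon String     -- type constant such as Int or Bool
  | TFun Type Type  -- function type a → b
  deriving (Eq, Show)

-- A substitution maps variables to types (possibly mentioning other
-- variables that are themselves bound).
type Subst = [(String, Type)]

-- Apply a substitution exhaustively.
apply :: Subst -> Type -> Type
apply s t@(TVar v) = case lookup v s of
  Just u -> apply s u
  Nothing -> t
apply _ t@(TCon _) = t
apply s (TFun a b) = TFun (apply s a) (apply s b)

-- Does the variable occur in the type?
occurs :: String -> Type -> Bool
occurs v (TVar w) = v == w
occurs _ (TCon _) = False
occurs v (TFun a b) = occurs v a || occurs v b

-- Solve "t eq u" under an existing substitution, or fail with Nothing.
unify :: Subst -> Type -> Type -> Maybe Subst
unify s t u = go (apply s t) (apply s u)
  where
    go (TVar v) (TVar w) | v == w = Just s
    go (TVar v) x = bind v x
    go x (TVar v) = bind v x
    go (TCon c) (TCon d) | c == d = Just s
    go (TFun a b) (TFun c d) = unify s a c >>= \s' -> unify s' b d
    go _ _ = Nothing
    -- The occurs check rejects cyclic solutions such as a eq (a → Int).
    bind v x
      | occurs v x = Nothing
      | otherwise = Just ((v, x) : s)
```

Unifying `TFun (TVar "a") (TCon "Int")` with `TFun (TCon "Bool") (TVar "b")` from the empty substitution binds `a` to `Bool` and `b` to `Int`, which is the decomposition shown on the slide.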
Introducing fresh type names
Monomorphic:
  newtype xCoord = Int
  All (xCoord # Int # Empty)
Polymorphic:
  newtype xCoord = \(a:Type).a
Allows the same newtypes within a record!
Introducing opaque newtypes
Type arguments are ignored in insertion constraints:
  newtype opaque xCoord = \(a:Type).a
XMLambda and UH
Conclusion
Why regular expression types (Hosoya)?
  Fundamental regular expression types
  Powerful pattern matching
  No higher-order functions and polymorphism
  Subtyping and parametric polymorphism?
Why type-indexed rows (Shields)?
  Flexibility: more general than regular expression types
  All nice characteristics of FP
  Constraint system?