Document Value Model: Value-oriented XML processing for the internet

Date post: 01-Jan-2016
Page 1: Document Value Model:  Value-oriented XML processing for the internet

Document Value Model: Value-oriented XML processing for the internet

Fritz Henglein

DIKU, University of Copenhagen

[email protected]

Page 2

Abstract

• XML is all the rage. But how do we store and process XML documents? In this talk we present XML Value Store, a persistent, distributed (peer-to-peer) storage manager with a value-oriented interface, the Document Value Model (DVM), for XML documents whose parts may be distributed around the net and may even be on the move (such as on a cell phone travelling at 140 km/h on a motorway). We compare DVM with existing XML processing languages and specifically with the W3C's Document Object Model (DOM). We argue that, apart from a series of technical advantages, the central benefit of DVM is a simplified programming model that lets the programmer focus on application logic and leaves persistence management, caching, replication, coalescing, encryption, distribution, lookup, routing and internet data transport to the XML middleware. Finally, we sketch a simple extension of XML Value Store with remote execution. Together with storing code in the XML Value Store, this lets users send queries to a remote XML Value Store for execution and promises highly scalable grid computing functionality with a simple, problem-oriented programming model.

Page 3

Abstract (long)

• XML (eXtensible Markup Language) is emerging as the universal language for representing semi-structured data for distributed storage and information interchange on the internet, and as such is destined to be the universal tissue -- the lingua franca -- for interoperable web services and databases interconnecting the internet. This makes XML processing an undisputed growth industry. But how is it done?

We give examples of processing XML documents using the domain-specific languages XSLT and XQuery, and the general-purpose interfaces SAX and DOM for manipulating the structure and contents of XML documents. The latter, the Document Object Model, is based on object-oriented programming principles in which tree nodes are mutable objects, with associated methods for imperatively updating their state. Furthermore, each tree node in DOM is equipped with a (single) parent reference and a (single) document root reference, which means DOM nodes cannot be shared and cannot be moved to other documents. In practice, a DOM program also starts by parsing an XML document completely into memory before performing any processing on it, however small the part of the document actually required, and it finally pretty-prints its tree structure and writes the whole document out to disk, however little of it has changed since it was read.

In this talk we present DVM (Document Value Model), a value-oriented interface for processing XML documents, and XML Value Store, a distributed (peer-to-peer) storage manager for storing XML documents in parsed form. We illustrate processing of XML documents and document nodes based on treating nodes as values -- immutable objects. The immutability of nodes in DVM allows aggressive and safe use of sharing through value references -- universal pointers to immutable objects stored anywhere -- in the XML Value Store.

This has a number of technical advantages over DOM:
  – document nodes are sharable, also across multiple documents;
  – loading and saving of nodes into/from memory is done by need, that is, only those nodes needed by a computation are loaded and only those not already saved on disk are actually saved;
  – node pointers point to nodes wherever they are stored, even if they move around frequently;
  – parsing and unparsing for persisting (storing on disk) are eliminated, since the XML tree, not its linearized form, is stored;
  – nodes can be cached and replicated aggressively for performance without concern for (cache) incoherence;
  – identical nodes (document parts) are saved only once on disk rather than multiple times, even if different users happen to store the same content;
  – no parent and root references are stored in nodes, yet navigation to parent and root is still possible.

The main advantage, we argue, is that these 'generic' (computer sciency) data management concerns can be and are handled in the XML Value Store, not in the programmer's application logic. A planned extension of XML Value Store is the addition of a 'higher-order' interface, which allows remote execution. This allows sending scripts (queries) to an XML Store for remote execution and promises to provide scalable grid computing functionality with a simple, problem-oriented programming model.

Page 4

Abstract (for functional programmers)

• This talk is basically about programming with (disk and network) I/O in a functional, high-level fashion.

Page 5

Overview (buzzy version)

• OOP

• VOP

• XML

• DOM

• DVM

• X

Page 6

Overview

• OOP: object-oriented programming, distribution and mobility
• VOP: value-oriented programming
• XML: XML processing models and languages
• DOM: Document Object Model
• DVM: Document Value Model
• X: The Unknown (future work)

Page 7

Overview

• OOP

• VOP

• XML

• DOM

• DVM

• X

General theme: programming with values (immutable objects)

... and objects (and carefully distinguishing between them).

Page 8

OOP – or rather: imperative programming

• Basic model of programming:
  – primitive in-place update operations: obj.field := obj2ref
  – compound update operations: controlled sequential execution of updates; e.g. for (int i = 0; i < arr.size; i++) arr[i] := newVal(i);

Page 9

Imperative programming theme

• Goal: Global state transition from State0 to Staten; State0 is destroyed.
• Implementation (ephemeral state updates): a sequence State0 -> ... -> Statei -> ... -> Staten of primitive state transitions, where
  – each primitive update destroys the previous state

Page 10

Consequence 1

• software component interfaces are state-oriented and stateful:
  – which operations are available depends on the history of operations executed in the past
  – responses from components depend on the history of operations executed
• Example: Unix file I/O
• NB: Operations on such components are not necessarily atomic (or even recoverable)

Page 11

Copy-and-update programming

[Diagram: two instances of the pipeline input(f), process(s), output(f)]

Note:
• data get copied
• they are not always coherent
• they get copied again

Page 12

Why (and when) it works (well)

• no concurrent access to file
• sequential and synchronous programming (control over sequence of state changes)
• no partial failures: atomic abort due to single point of failure (single-process execution on single processor)
• no replication of stateful data
• ‘random’ access to location of data (rapid access no matter where they are stored)

Page 13

Consequence 2

• Software/hardware component APIs are copy-oriented: data referenced by a pointer get copied before being manipulated to ensure integrity

• Example: Modern operating systems are based on separation of address spaces; require copying of data or delegation of tasks (ask the other process to do something for me)

Page 14

Imperative programming: Problem areas

• caching and replication require heavy coherence protocols; otherwise different states are ‘observable’ by clients and users
  – e.g. file save under NFS (wait for 30 seconds!!)
• atomic (commit of compound) update is difficult to achieve in the presence of partial failures
  – rollback is not ‘naturally’ supported, but normally required in situations where (atomic) updates can fail
• coalescing identical data (storing data only once) cannot be done (easily)

Page 15

Imperative programming: Problem areas...

• programming is mostly synchronous to control degree of nondeterminism due to concurrency

• access to storage locations is not ‘random’ (no modern file system does what’s shown before)

• access to updatable objects is typically ‘location’-based; mobile objects are not ‘naturally’ supported

• lots of data stored multiple times

Page 16

...but, of course

• Updatable objects are excellent for propagating information to an arbitrary number of clients (to any caller of the object; the object need not even know or keep track of the number or identity of its callers)

Page 17

Properties of distributed (mobile) systems

• Partial failures
  – can’t even distinguish network failures from computing node failures
• Concurrency
• Difficult (exact) synchronization of processes
• Widely varying access latency:
  – an RPC may block for an arbitrarily long time

Page 18

Techniques for battling these problems

• Caching, replication, memoization
• (buffered) asynchronous message passing
• relaxed or indeterminate semantics
  – time-outs
  – observational differences between processes running on same machine or on different machines

Not good for mobile code!

Page 19

Central problem

• ...not reading (loading)
• ...not writing (saving, allocating)
  (These ops commute!)
• but updating (overwriting)
  (Breaks commuting)

Note: The more updating, the fewer operations commute and the more their execution needs to be controlled (synchronized).
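
The commuting claim can be illustrated with a small sketch. The names save and overwrite are hypothetical and not part of the talk; the store is modelled as a plain mapping and the cell as a single slot, which is an assumption made only for illustration.

```python
# Hypothetical model: a store maps fresh locations to values, a cell is one slot.

def save(store, key, value):
    """Allocate a fresh entry; the old store is left untouched."""
    assert key not in store, "saving never overwrites an existing entry"
    return {**store, key: value}

def overwrite(cell, value):
    """Update (modelled functionally): the previous content is lost."""
    return value

# Two saves to fresh locations give the same store in either order.
store_ab = save(save({}, "a", 1), "b", 2)
store_ba = save(save({}, "b", 2), "a", 1)

# Two overwrites of the same cell give results that depend on the order.
cell_xy = overwrite(overwrite(None, "x"), "y")
cell_yx = overwrite(overwrite(None, "y"), "x")

print("saves commute:", store_ab == store_ba)
print("overwrites commute:", cell_xy == cell_yx)
```

Only the overwrites need an agreed order, which is why they are the operations that call for synchronization.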

Page 20

VOP: Value-oriented programming

• Programming with:
  – arbitrarily “large” values (immutable objects), stored not only in RAM, but also on disk and on the net
  – location-independent value references (short, probabilistically unique identifiers of values, wherever they are stored); can be thought of as light-weight proxies for actual (big) values
  – plus “small” stateful cells (mutable objects) and cell references, incl.
    • wait-free registers with consensus number infinity (e.g., compare-and-swap registers)

Page 21

Benefits/goals

• Value references:
  – efficient sharing of immutable data
  – efficient message passing
• Arbitrarily large values:
  – programmatic support of efficient atomic update: build new (global) state as value, then perform update atomically by assigning value reference of new value to register holding present state.
• Small registers:
  – guaranteeing atomic update, with no (or minimal) locking
  – wait-freeness: ensure ‘progress’ (doesn’t get blocked forever or for too long) of each client, even in the face of partial failures elsewhere
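
The "build the new state as a value, then swap it in" idea might look roughly like the sketch below. CasRegister and atomic_update are hypothetical names; the lock only stands in for a hardware compare-and-swap instruction, so this sketch shows the retry pattern but is not itself wait-free.

```python
import threading

class CasRegister:
    """A small mutable cell holding a reference to an immutable state value."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()  # assumption: simulates an atomic CAS instruction

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Install `new` only if the register still holds `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

def atomic_update(register, build_new_state):
    """Build the new state off to the side, then try to swap it in; retry on conflict."""
    while True:
        old = register.get()
        new = build_new_state(old)  # the old state is never modified
        if register.compare_and_swap(old, new):
            return new

# Four concurrent clients each append 100 records to an immutable tuple.
register = CasRegister(())

def client(name):
    for i in range(100):
        atomic_update(register, lambda state: state + ((name, i),))

threads = [threading.Thread(target=client, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(register.get()))
```

A client that loses a race simply rebuilds its new state from the current one; no update is ever observed half-done.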

Page 22

XML

• XML info set (“(Minimal) XML tree”): labeled ordered tree, with
  – character data at the leaves
  – key/value pairs (attributes) at the internal nodes
• XML document:
  – linearized representation of XML tree based on pre/post-order traversal of XML tree

Page 23

XML example

[Tree diagram: book with children author (Susanne Staun) and title (Mit smukke lig)]

<?xml ...?>
<book>
  <author> Susanne Staun </author>
  <title> Mit smukke lig </title>
</book>

Page 24

Document Object Model (DOM)

[Tree diagram: book with children author (Susanne Staun) and title (Mit smukke lig), drawn with extra pointers]

Why the extra pointers?

Page 25

DOM characteristics

• object-oriented: nodes are objects with methods that, amongst others, update their properties (children, attributes, parent pointer)
• purely tree-oriented: each node has at most one predecessor, no node sharing
• cloning: used to copy a node to another place in a document

Page 26

DOM specification

• Specified by W3C, see www.w3.org/DOM

• Specification has 3 levels (specifying more and more functionality for document objects)

Page 27

Programming with DOM

• Typical scenario:
  – Read linearized XML document from file or network ‘pipe’ (socket).
  – Parse XML document into an in-memory tree data structure corresponding to DOM
  – Traverse and manipulate in-memory structure
  – Unparse in-memory structure to linearized XML document
  – Write out XML document through file or network pipe interface.
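
As an illustration of this scenario, here is a minimal sketch using Python's standard xml.dom.minidom module. The document is taken from the XML example slide; reading from and writing to an in-memory string instead of a file or socket is a simplification made for this sketch.

```python
from xml.dom.minidom import parseString

# 1. Read the linearized document (here from a string rather than a file or socket).
source = "<book><author>Susanne Staun</author><title>Mit smukke lig</title></book>"

# 2. Parse the complete document into an in-memory DOM tree.
document = parseString(source)

# 3. Traverse and manipulate the tree; the text node is updated in place.
title = document.getElementsByTagName("title")[0]
had_parent_pointer = title.parentNode is document.documentElement
title.firstChild.data = "Blå hav"

# 4. Unparse the whole tree back to linearized XML, although one node changed.
result = document.documentElement.toxml()

# 5. Write it out (printed here instead of written to a file or socket).
print(result)
```

Note that the old title is gone after step 3: the update destroys the previous state of the tree.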

Page 28

Document Value Model (DVM)

[Tree diagram: a book with children author (Susanne Staun) and title (Mit smukke lig), and a second book with title (Blå hav)]

Isn’t that just a picture of the XML tree model?

Sharing!!

Page 29

Navigation

• How do we navigate in an XML tree without parent and root pointers?
• DOM: current node contains complete navigation state, including parent and root pointers
• DVM: navigation state characterized by [n0, ..., nk] where n0 is the root and nk is the ’current’ node
  – allows navigation to parent and root, just as in DOM
  – does not require any storage in nodes, unlike DOM
  – works also for shared nodes (”bread crumbs” method for finding one’s way back in a labyrinth [dag])
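
The path-based navigation state can be sketched as follows. Node, start, down, up, root and current are hypothetical names chosen for this sketch, and the author node shared between two books is an assumed example.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    """Immutable tree node without parent or root pointers."""
    label: str
    children: Tuple["Node", ...] = ()

# The navigation state is the path [n0, ..., nk], kept outside the nodes.
def start(root_node):
    return (root_node,)

def down(path, index):
    return path + (path[-1].children[index],)

def up(path):
    if len(path) == 1:
        raise ValueError("the root has no parent")
    return path[:-1]

def root(path):
    return path[0]

def current(path):
    return path[-1]

# One author node shared by two different books (assumed example data).
author = Node("author", (Node("Susanne Staun"),))
book_a = Node("book", (author, Node("title", (Node("Mit smukke lig"),))))
book_b = Node("book", (author, Node("title", (Node("Blå hav"),))))

path_a = down(start(book_a), 0)
path_b = down(start(book_b), 0)

print(current(path_a) is current(path_b))  # same shared node
print(current(up(path_a)) is book_a)       # parent depends on the path taken
print(current(up(path_b)) is book_b)
```

The shared node has two different parents, one per path, which a single parent pointer stored in the node could not express.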

Page 30

DVM: basic interface

• The type of XML trees is an inductive datatype
• Basic constructors (”factory” methods):
  – Combine attributes, child list, tag into new element node
  – Make chardata node from string
• Basic deconstructors (projections):
  – Get attributes, child list, tag, chardata
• Cells (updatable nodes):
  – setState, getState: atomic operations
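
A minimal sketch of such an interface, under the assumption that immutable nodes are modelled as frozen dataclasses. All function and class names here are hypothetical, the lang attribute is invented for the example, and the lock in Cell is just one way to make getState and setState atomic.

```python
import threading
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class CharData:
    text: str

@dataclass(frozen=True)
class Element:
    tag: str
    attributes: Tuple[Tuple[str, str], ...]
    children: Tuple[Union["Element", CharData], ...]

# Basic constructors ("factory" methods)
def make_element(tag, attributes, children):
    return Element(tag, tuple(sorted(attributes.items())), tuple(children))

def make_chardata(text):
    return CharData(text)

# Basic deconstructors (projections)
def get_tag(node):
    return node.tag

def get_attributes(node):
    return dict(node.attributes)

def get_children(node):
    return node.children

def get_chardata(node):
    return node.text

class Cell:
    """Updatable node: a small mutable holder for an immutable value."""

    def __init__(self, state):
        self._state = state
        self._lock = threading.Lock()

    def get_state(self):
        with self._lock:
            return self._state

    def set_state(self, state):
        with self._lock:
            self._state = state

author = make_element("author", {}, [make_chardata("Susanne Staun")])
title = make_element("title", {}, [make_chardata("Mit smukke lig")])
book = make_element("book", {"lang": "da"}, [author, title])  # "lang" is a made-up attribute

cell = Cell(book)
print(get_tag(cell.get_state()), get_attributes(book))
```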

Page 31

DVM: general interface

• Equip each node with the ability to receive any function and apply it to itself, or to receive a function that is applied to each of its subnodes
• Called Visitor pattern in OO design
• Corresponds to unique homomorphism/type elimination rule (”fold”) known from algebraic datatypes/type theory
• Lets nodes not only receive single ”commands” for execution, but whole programs.
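
The fold view of this interface can be sketched as below. The node types and the fold function are hypothetical, and attributes are left out to keep the sketch short; the linearization does no escaping.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class CharData:
    text: str

@dataclass(frozen=True)
class Element:
    tag: str
    children: Tuple[Union["Element", CharData], ...] = ()

def fold(node, on_element, on_chardata):
    """Replace each constructor of the tree by the corresponding function."""
    if isinstance(node, CharData):
        return on_chardata(node.text)
    results = [fold(child, on_element, on_chardata) for child in node.children]
    return on_element(node.tag, results)

book = Element("book", (
    Element("author", (CharData("Susanne Staun"),)),
    Element("title", (CharData("Mit smukke lig"),)),
))

# Two different "programs" sent to the same tree:
text_length = fold(book, lambda tag, results: sum(results), len)
linearized = fold(
    book,
    lambda tag, results: f"<{tag}>{''.join(results)}</{tag}>",
    lambda text: text,
)

print(text_length)
print(linearized)
```

Each pair of functions is a whole program that the tree runs over itself, rather than a single command.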

Page 32

Share-and-create style updating

[Diagram: a book with children author (Susanne Staun) and title (Mit smukke lig), and a second book with title (Blå hav), related by a (functional) update operation]
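
A functional update in this style might be sketched as follows. Node and with_child are hypothetical names; the example replaces the title of the book from the earlier slides and leaves the original book intact.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()

def with_child(parent, index, new_child):
    """Create a new parent; all other children are shared, not copied."""
    children = parent.children[:index] + (new_child,) + parent.children[index + 1:]
    return Node(parent.label, children)

author = Node("author", (Node("Susanne Staun"),))
old_title = Node("title", (Node("Mit smukke lig"),))
old_book = Node("book", (author, old_title))

new_title = Node("title", (Node("Blå hav"),))
new_book = with_child(old_book, 1, new_title)

print(new_book.children[0] is old_book.children[0])  # author node is shared
print(old_book.children[1] is old_title)             # old version still intact
```

Only the nodes on the path from the root to the change are created anew; everything else is shared between the old and the new version.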

Page 33

Universal references

[Diagram: a book with children author (Susanne Staun) and title (Mit smukke lig), and a second book with title (Blå hav), with regions labelled disk storage and RAM; annotation: Never loaded from disk!]

Page 34

Universal references

• Value references are location independent:
  – always designate value, not where value is stored
  – require routing service to be resolved!
• Value references can point from any place to any place:
  – from RAM to disk, from disk to disk, from disk to network, from disk to RAM (!)...
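
One way to picture location independence is the sketch below. It assumes that a value reference is a SHA-256 hash of the value's bytes, which the talk does not state, and it uses a hypothetical Router class that searches a list of locations modelled as dictionaries.

```python
import hashlib

def value_ref(data: bytes) -> str:
    """Assumed scheme: the reference is derived from the value, not from its location."""
    return hashlib.sha256(data).hexdigest()

class Router:
    """Toy routing service: asks each known location for the value."""

    def __init__(self, locations):
        self.locations = locations  # mapping: location name -> storage dict

    def resolve(self, ref):
        for name, storage in self.locations.items():
            if ref in storage:
                return name, storage[ref]
        raise KeyError(ref)

ram, disk, network = {}, {}, {}
router = Router({"RAM": ram, "disk": disk, "network": network})

title = "<title>Blå hav</title>".encode("utf-8")
ref = value_ref(title)

disk[ref] = title
first_location, first_value = router.resolve(ref)

# The value moves to another location; the reference stays valid unchanged.
network[ref] = disk.pop(ref)
second_location, second_value = router.resolve(ref)

print(first_location, second_location, first_value == second_value)
```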

Page 35

XML Value Store

• Distributed persistence manager for XML elements
• Peer-to-peer architecture
• Global name server for binding and rebinding value references to human-readable names
  – Rebinding: bindings can be updated atomically.

Page 36

XML Store: Basic interface

• Load value: Value load(ValueRef vr)
• Save value: ValueRef save(Value v)
• (That’s it)
• Security/authentication not addressed yet:
  – extended access control based interface
  – encrypted storage
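
The two-operation interface can be sketched with an in-memory stand-in. InMemoryStore is hypothetical; using a SHA-256 hash of a JSON encoding as the value reference is an assumption of this sketch, chosen because it makes identical values coalesce automatically.

```python
import hashlib
import json

class InMemoryStore:
    """Toy store: save returns a value reference, load returns the value."""

    def __init__(self):
        self._blobs = {}

    def save(self, value):
        blob = json.dumps(value, sort_keys=True).encode("utf-8")
        ref = hashlib.sha256(blob).hexdigest()
        self._blobs.setdefault(ref, blob)  # an identical value is stored only once
        return ref

    def load(self, ref):
        return json.loads(self._blobs[ref].decode("utf-8"))

    def __len__(self):
        return len(self._blobs)

store = InMemoryStore()

def save_text(text):
    return store.save({"text": text})

def save_element(tag, child_refs):
    # Children are held as value references, so subtrees are loaded only on demand.
    return store.save({"tag": tag, "children": child_refs})

author_ref = save_element("author", [save_text("Susanne Staun")])
title_ref = save_element("title", [save_text("Mit smukke lig")])
book_ref = save_element("book", [author_ref, title_ref])
count_before = len(store)

# Saving the same title again yields the same reference and no new entry.
duplicate_ref = save_element("title", [save_text("Mit smukke lig")])

print(duplicate_ref == title_ref, len(store) == count_before)
print(store.load(book_ref)["tag"])
```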

Page 37

XML Store: General interface

• The visitor interface allows nodes to receive any function and apply it to their state.
• Let’s do the same with the XML value store interface: Extending it with a visitor interface allows XML value stores to receive arbitrary code and execute it.
• Allows implementation of:
  – query languages
  – general remote processing (e.g. for ‘grid’ computing)

Page 38

Code as values

• Program code = value: Code can be stored in the XML store.

• Remote execution then involves passing a value reference to the code to the receiver. If the receiver already has the corresponding value (code) – e.g. due to caching in the XML value store, no further communication is necessary; otherwise the value is requested (pulled in) by the receiver.

Page 39

XML Value Store architecture

• Base configuration: each peer is a single component made up of:
  – ”raw” disk manager
  – network proxy for group of remote XML-store peers
  – group communication presently based on:
    • IP-multicast (Pedersen/Tejlgaard 2002), or
    • Chord-routing protocol (Baumann/Fennestad/Thorn 2002)

Page 40

Configurable XML Stores

• Goal: Clients can build XML Stores by composing:
  – primitive XML stores (disk manager, in-RAM manager, adapters to databases, file managers etc.), and
  – XML store constructors (“decorators”):
    • caching reads and writes
    • asynchronous load/save
    • buffered load/save requests
    • encryption/decryption
• Target date: August 2003
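
The decorator idea can be sketched for the caching case. DictStore and CachingStore are hypothetical classes, and hash-based references are again an assumption; the point is only that a decorator offers the same load/save interface as the store it wraps.

```python
import hashlib

class DictStore:
    """Primitive store (stand-in for a disk manager); counts loads for the demo."""

    def __init__(self):
        self._data = {}
        self.loads = 0

    def save(self, value: bytes) -> str:
        ref = hashlib.sha256(value).hexdigest()
        self._data[ref] = value
        return ref

    def load(self, ref: str) -> bytes:
        self.loads += 1
        return self._data[ref]

class CachingStore:
    """Decorator: same interface as the wrapped store, plus a cache."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    def save(self, value: bytes) -> str:
        ref = self._inner.save(value)
        self._cache[ref] = value
        return ref

    def load(self, ref: str) -> bytes:
        # Values are immutable, so a cached entry never needs invalidation.
        if ref not in self._cache:
            self._cache[ref] = self._inner.load(ref)
        return self._cache[ref]

base = DictStore()
ref = base.save("<title>Blå hav</title>".encode("utf-8"))

store = CachingStore(base)
first = store.load(ref)
second = store.load(ref)

print(first == second, base.loads)
```

Because both classes expose the same two operations, decorators for buffering or encryption could be stacked on top in the same way.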

Page 41

A simple challenge

• Write a little program that implements a dictionary, e.g. for looking up phone numbers, and inserting and updating records.
• It should work on the net (concurrent access).
• It should work for a while (also after the machine has been taken down and restarted).
• Surprisingly more complex to program than the routines you learned in algorithm class...

Page 42

Summary

• Value-oriented model for manipulating semistructured data:
  – supports light-weight caching, replication, asynchronous computing in the “XML middleware”
• Configurable XML middleware (client can order the properties one wants from the XML store)
• Separation of program logic (in the client code) from generic deal
• Encourages clients to write transaction-safe code programmatically

Page 43

More info

• Website: www.plan-x.org
  – Presently contains material from seminar on “distributed and mobile data and software” (including lots of references not mentioned here)
• Email: [email protected]