Tova Milo – Tel Aviv University
Peer-to-Peer Data Integration with Active XML
Tova Milo
Tel-Aviv University
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Introduction
Distributed data management in P2P
Information is everywhere
• Data warehouses
• Databases
• Web sites
• PC, PDA, cell phones, home appliances, cars…
[Figure: peers holding XML data and exposing (Web) services]
The golden triangle of distributed data management
XML
• a standard for data representation & exchange
Query languages
• XPath, XQuery
Web services
• standards for distributed computing
• activation of methods on remote servers
[Figure: triangle whose corners are XML, XQuery/XPath, and Web services]
What is Active XML (AXML)?
AXML is a declarative language for distributed information management, and an infrastructure to support this language, in a peer-to-peer framework.
Active XML
Active XML documents
XML documents with embedded calls to Web services
Intensional
• Some of the data is given explicitly
• Some is given intensionally (i.e. the means to acquire the data when needed are given)
Dynamic
• If the external sources change, the same document will provide different information
• Reaction to world changes
Not a new idea in databases, nor on the Web
Mixing calls with data is an old idea
• Procedural attributes in relational systems
• Basis of object-oriented databases
In the HTML world
• Sun’s JSP, PHP+MySQL
Calls to Web services inside XML documents
• Macromedia FLEX, Apache Jelly, Microsoft XAML
What is new is the exploitation of the idea…
A sample AXML document
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>
[Figure: the same document drawn as a tree: newspaper with children title ("Le Monde"), date ("06/10/2003"), a GetTemp call with parameter city ("Paris"), and a GetEvents call with parameter "Exhibits"]
AXML documents may contain calls:
• to any existing Web services (e-bay.net, google.com…)
• to any AXML Web services (to be defined)
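To make the structure concrete, here is a minimal Python sketch that parses the sample document and lists its embedded calls. It only uses the standard library; the function name `list_calls` and the way parameters are summarized are assumptions made for illustration, not part of AXML.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide, with plain ASCII quotes.
AXML = """<?xml version="1.0"?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp"><city>Paris</city></call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>"""

def list_calls(xml_text):
    """Return (service name, parameters) for every embedded <call> element."""
    root = ET.fromstring(xml_text)
    calls = []
    for call in root.iter("call"):
        # Parameters are either child elements or plain text content.
        params = [(child.tag, child.text) for child in call]
        if not params and call.text and call.text.strip():
            params = [("text", call.text.strip())]
        calls.append((call.get("svc"), params))
    return calls

calls = list_calls(AXML)
for svc, params in calls:
    print(svc, params)
```

The explicit data (title, date) and the intensional data (the two calls) live side by side in the same tree, which is all the sketch relies on.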
Materialization
We will see later that:
• Replacing the call by its result is not the only option
• Calls are not necessarily RPC-style synchronous invocations
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>
[Figure: the document tree; a SOAP call invokes GetTemp, and its result <temp>16°C</temp> appears in the tree as a temp node ("16°C")]
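A minimal sketch of one materialization step, under the simplest policy (replace the call by its result). The remote service is replaced by a local stub; `STUB_SERVICES` and `materialize_once` are hypothetical names, and no real SOAP traffic is involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical local stand-in for the remote Web service (no SOAP here).
STUB_SERVICES = {
    "Yahoo.GetTemp": lambda call: "<temp>16°C</temp>",
}

def materialize_once(root, services):
    """Replace each <call> whose service is known by the element it returns."""
    replaced = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "call" or child.get("svc") not in services:
                continue
            result = ET.fromstring(services[child.get("svc")](child))
            result.tail = child.tail
            position = list(parent).index(child)
            parent.remove(child)
            parent.insert(position, result)
            replaced += 1
    return replaced

doc = ET.fromstring(
    "<newspaper><title>Le Monde</title><date>06/10/2003</date>"
    '<call svc="Yahoo.GetTemp"><city>Paris</city></call>'
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
)
count = materialize_once(doc, STUB_SERVICES)
tags = [child.tag for child in doc]
```

The GetEvents call stays in place because no stub is registered for it; a document can be partly materialized.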
AXML Web services
Parameters: AXML data
Result: AXML data
Distribute computations: by sending as parameters data containing service calls, one can delegate some work to other peers
Partial computations: by returning data containing service calls, one can give to the receiver the control of these calls
Great flexibility
Calling an AXML service
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <temp>16°C</temp>
  <call svc="TimeOut.GetEvents">exhibits</call>
</newspaper>
SOAP call (still…) returns:
<exhibits>
  <call svc="Yahoo.GetExhibits"><city>Paris</city></call>
</exhibits>
[Figure: the document tree, where the GetEvents call yields an exhibits node that itself contains a GetExhibits call with parameter City ("Paris")]
Materialization is a recursive process
Termination is an issue
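The recursion and the termination concern can be sketched as follows. The stubs are hypothetical: GetEvents answers with the intensional result shown on the slide, and the final GetExhibits answer is invented for illustration. The `max_rounds` bound is one simple way to guard against non-termination, not the approach taken by the AXML system.

```python
import xml.etree.ElementTree as ET

# Hypothetical stubs: GetEvents answers intensionally, with a nested call.
SERVICES = {
    "TimeOut.GetEvents": lambda call: (
        '<exhibits><call svc="Yahoo.GetExhibits"><city>Paris</city></call></exhibits>'
    ),
    "Yahoo.GetExhibits": lambda call: "<exhibit>Le Louvre</exhibit>",  # invented answer
}

def materialize(root, services, max_rounds=10):
    """Invoke calls round by round until none remain or the bound is hit."""
    for round_number in range(max_rounds):
        pending = [(parent, child) for parent in root.iter() for child in parent
                   if child.tag == "call" and child.get("svc") in services]
        if not pending:
            return round_number  # number of rounds that invoked something
        for parent, call in pending:
            result = ET.fromstring(services[call.get("svc")](call))
            position = list(parent).index(call)
            parent.remove(call)
            parent.insert(position, result)
    raise RuntimeError("materialization did not terminate within the bound")

doc = ET.fromstring(
    "<newspaper><title>Le Monde</title>"
    '<call svc="TimeOut.GetEvents">exhibits</call></newspaper>'
)
rounds = materialize(doc, SERVICES)
```

A service that always returns a new call to itself would hit the bound and raise, which is the toy version of "termination is an issue".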
Novel issues
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data (SIGMOD’03, PODS’05)
• Querying Active XML data
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
To call or not to call ?
Materialization can be performed
• by the sender, before sending a document
• or by the receiver, after receiving it
[Figure: the newspaper tree shown twice, with title ("Le Monde"), date ("06/10/2003"), the GetTemp call (city "Paris") with its temp result ("16°C"), and the GetEvents call ("Exhibits")]
Why control the materialization of calls?
For added functionality, e.g.
• Intensional data makes it possible to get up-to-date information
For security reasons or capabilities, e.g.
• I don’t trust this Web service/domain
• I don’t have the right credentials to invoke it
• It costs money
• Maybe the receiver doesn’t know Active XML!
For performance reasons, e.g.
• A proxy can invoke all the services on behalf of a PDA
… and many more reasons you can think of!
How to control it? Using types
We extend XML Schema with intensional types: XMLSchema_int
Casting algorithms use signatures of services: WSDL_int
[Figure: a Sender and a Receiver, each with its own Capabilities, ACL, Cost...; a data exchange Schema between them; document trees rooted at r containing calls to f, g and q]
Rewritings
The Goal: Given
• an AXML document d
• a schema s
Can we rewrite d so that it matches s?
Safe rewriting: one that for sure leads to s (we know without making any call)
Possible rewriting: one that possibly leads to s (depending on the answers of the services)
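The difference between safe and possible rewritings can be illustrated on a deliberately tiny model. This is a toy sketch, not the algorithm of the cited work: documents are flat lists, each call returns exactly one element whose name is drawn from its signature, and the target schema is just a set of allowed element names plus a set of calls allowed to stay intensional. The `performances` output and all names below are assumptions.

```python
def classify(document, signatures, allowed_elements, allowed_calls):
    """Return 'safe', 'possible', or 'none' for a flat, toy AXML document."""
    safe = possible = True
    for kind, name in document:
        if kind == "element":
            ok_safe = ok_possible = name in allowed_elements
        elif name in allowed_calls:
            # The schema lets this call stay in the document: nothing to invoke.
            ok_safe = ok_possible = True
        else:
            # The call must be invoked; look at what its signature may return.
            outputs = signatures[name]
            ok_safe = outputs <= allowed_elements
            ok_possible = ok_safe or bool(outputs & allowed_elements)
        safe = safe and ok_safe
        possible = possible and ok_possible
    return "safe" if safe else "possible" if possible else "none"

DOCUMENT = [("element", "title"), ("element", "date"),
            ("call", "GetTemp"), ("call", "GetEvents")]
SIGNATURES = {"GetTemp": {"temp"}, "GetEvents": {"exhibits", "performances"}}

all_outputs_ok = classify(DOCUMENT, SIGNATURES,
                          {"title", "date", "temp", "exhibits", "performances"}, set())
some_outputs_ok = classify(DOCUMENT, SIGNATURES,
                           {"title", "date", "temp", "exhibits"}, set())
call_may_stay = classify(DOCUMENT, SIGNATURES,
                         {"title", "date", "temp"}, {"GetEvents"})
no_way = classify(DOCUMENT, SIGNATURES, {"title", "date"}, set())
```

In the second case the outcome depends on what GetEvents actually answers, so the rewriting is only possible; in the first and third it is known to succeed before any call is made.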
Results
The general problem is undecidable [MSS04]
Restrictions on the considered rewritings
• Left-to-right: no “going back and forth”
• K-depth: bound on the nesting of function calls (search space still infinite but finitely representable)
Under these restrictions
• We have algorithms to find safe/possible rewritings
• They are PTIME (for deterministic schemas)
• We can also do it between schemas
Implementation
• first demo at VLDB 2003 (customizable news syndication)
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data (SIGMOD’04, PODS’05)
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Querying AXML Data
Given a (tree pattern) query:
/newspaper[temp > 18°C]/exhibits//exhibit[location=“Le Louvre”]
Materialize the document?
Call only the services that may contribute data to the query answer.
The problem: lazy evaluation of service calls
To call or not to call, this time when evaluating a query
[Figure: the newspaper tree with title ("Le Monde"), a getDate call, the GetTemp call (city "Paris") with result temp ("19°C"), and the GetEvents call ("Exhibits") with result exhibits containing a GetExhibits call (City "Paris")]
Lazy evaluation
Difficulties:
• Calls can be found everywhere in the document
• They may appear dynamically (as a result of previous calls)
• They may become (ir)relevant due to previous invocations
• Need to take signatures of calls into consideration
Possible approach: modify the query processor
• Trigger the calls found on the way
• Not so great:
  – Computation is blocked
  – Optimization opportunities are lost
Our solution:
• Derive queries that find the relevant calls (recursively)
• Use service signatures to prune irrelevant calls
• Parallel call invocations
• Pushing queries to capable external sources
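Two of these ingredients, pruning by signatures and parallel invocation, can be sketched in a few lines. This is a toy model under strong assumptions: a signature is just the set of element names a service may return, the query is summarized by the child labels it can match under each path, and the descendant axis is ignored. `Ads.GetBanner`, the `performances` output and the stub answers are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed signatures: element names each service may return.
SIGNATURES = {
    "Yahoo.GetTemp": {"temp"},
    "TimeOut.GetEvents": {"exhibits", "performances"},
    "Ads.GetBanner": {"ad"},
}
# Labels the query /newspaper[temp > 18]/exhibits//exhibit[...] can match per path.
NEEDED = {("newspaper",): {"temp", "exhibits"},
          ("newspaper", "exhibits"): {"exhibit"}}
# Calls found in the document, with the path of their parent element.
CALLS = [("Yahoo.GetTemp", ("newspaper",)),
         ("TimeOut.GetEvents", ("newspaper",)),
         ("Ads.GetBanner", ("newspaper",))]
# Hypothetical local stubs standing in for the remote services.
STUBS = {"Yahoo.GetTemp": lambda: "<temp>19°C</temp>",
         "TimeOut.GetEvents": lambda: "<exhibits/>"}

def relevant_calls(calls, signatures, needed):
    """Keep only calls whose possible outputs can match the query at their position."""
    return [(svc, path) for svc, path in calls
            if signatures[svc] & needed.get(path, set())]

def invoke_in_parallel(calls, stubs):
    """Invoke the selected calls concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda call: stubs[call[0]](), calls))

relevant = relevant_calls(CALLS, SIGNATURES, NEEDED)
results = invoke_in_parallel(relevant, STUBS)
```

The banner call is never invoked because nothing it can return matches the query. In a full solution the step would be repeated, since results may bring new calls.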
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
Novel issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
Conclusion
Active XML peers
Distributed data management in P2P
[Figure: the same network of peers holding XML data and exposing (Web) services, now labeled with AXML at the peers]
What is an AXML peer ?
Repository: manages persistent AXML data
Client: uses (AXML) Web services to dynamically enrich data
Server: easy (declarative) definition of AXML services
[Figure: an AXML peer, communicating via SOAP]
Global architecture
[Figure: global architecture; labeled components: AXML store, service descriptions, AXML engine, query engine; labeled data flows: AXML, XML]
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations
Applications
• P2P auctions
• News syndication
• Other applications
Conclusion
Managing persistent AXML data
“Our newspaper should have its temperature information refreshed daily. New exhibits should be fetched every week and archived for 6 months”
Service call results enrich the document (calls can be kept for possible future reuse)
Main issues:
• When to activate a service call? (pull/push, implicit/explicit)
• What to do with its result? (add/replace/merge)
Example: AXML document with control attributes
<?xml version="1.0" ?>
<newspaper>
  <title>Le Monde</title>
  <date>06/10/2003</date>
  <call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">
    <city>Paris</city>
  </call>
  <call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" merge="append">exhibits</call>
</newspaper>
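How a peer might interpret the `valid` and `merge` attributes can be sketched as below. This is an illustration under assumptions, not the AXML engine: results are stored as siblings of the call, a month is approximated as 30 days, and the helper names are hypothetical. The `mode` attribute (when to fire) is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumption: a month counts as 30 days.
UNIT_DAYS = {"day": 1, "days": 1, "week": 7, "weeks": 7, "month": 30, "months": 30}

def parse_valid(text):
    """Turn a validity string such as '1 day' or '6 months' into a timedelta."""
    amount, unit = text.split()
    return timedelta(days=int(amount) * UNIT_DAYS[unit])

def needs_refresh(call, last_invoked, now):
    """A result is stale if the call never ran or its validity period elapsed."""
    return last_invoked is None or now - last_invoked >= parse_valid(call.get("valid"))

def merge_result(parent, call, result):
    """Apply the call's merge policy; results are kept as siblings of the call."""
    if call.get("merge") == "replace":
        for old in [e for e in parent if e.tag == result.tag]:
            parent.remove(old)
    parent.append(result)

doc = ET.fromstring(
    "<newspaper>"
    '<call svc="Yahoo.GetTemp" mode="lazy" valid="1 day" merge="replace">'
    "<city>Paris</city></call>"
    '<call svc="TimeOut.GetEvents" mode="every Monday morning" valid="6 months" '
    'merge="append">exhibits</call>'
    "</newspaper>"
)
temp_call, events_call = doc.findall("call")
merge_result(doc, temp_call, ET.fromstring("<temp>16°C</temp>"))
merge_result(doc, temp_call, ET.fromstring("<temp>19°C</temp>"))
merge_result(doc, events_call, ET.fromstring("<exhibits>week 1</exhibits>"))
merge_result(doc, events_call, ET.fromstring("<exhibits>week 2</exhibits>"))
temps = [e.text for e in doc.findall("temp")]
```

With `replace` only the latest temperature survives; with `append` both weekly exhibit results are kept, and both calls remain in the document for future reuse.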
Providing declarative AXML services
Services can be defined by queries or updates over the AXML documents of the repository (XQuery, XPath, XUpdate)
Users can subscribe to services
Services can be composed (BPEL4WS)
Which (lazy) service calls may contribute to the answer?
let service GetExhibitsByLocation($loc) be
  for $a in document("newspaper.xml")/newspaper/exhibits,
      $b in $a//exhibit
  where $b/@name = $loc
  return <exhibits> {$b} </exhibits>
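For readers less familiar with XQuery, the same service can be approximated in Python over an in-memory document. This is a sketch only: the sample exhibits are invented, and `get_exhibits_by_location` is a hypothetical stand-in for the declared service, not generated from it.

```python
import copy
import xml.etree.ElementTree as ET

# Invented sample content standing in for newspaper.xml.
NEWSPAPER = ET.fromstring(
    "<newspaper><exhibits>"
    '<exhibit name="Le Louvre"><title>Sample exhibit A</title></exhibit>'
    '<section><exhibit name="Orsay"><title>Sample exhibit B</title></exhibit></section>'
    "</exhibits></newspaper>"
)

def get_exhibits_by_location(newspaper, loc):
    """Return one <exhibits> wrapper per exhibit whose name attribute equals loc."""
    answers = []
    for exhibits in newspaper.findall("exhibits"):   # /newspaper/exhibits
        for exhibit in exhibits.iter("exhibit"):     # $a//exhibit
            if exhibit.get("name") == loc:           # $b/@name = $loc
                wrapper = ET.Element("exhibits")
                wrapper.append(copy.deepcopy(exhibit))
                answers.append(wrapper)
    return answers

answers = get_exhibits_by_location(NEWSPAPER, "Le Louvre")
```

If the exhibits element still contained unmaterialized calls, the service would first have to decide which of them may contribute to the answer, which is the lazy evaluation question raised on the slide.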
Active XML - Outline
Introduction
Active XML
• Active XML documents
• Active XML services
New issues
• Exchanging Active XML data
• Querying Active XML data
• Distribution and replication
• Security and Access control
Active XML Peers
• The peer as a client
• The peer as a server
• Theoretical foundations (PODS’04, PODS’05)
Applications
Conclusion
Applications
Demos
Peer-to-peer auctions (VLDB 2002 demo)
• Discovery of new peers/auctions through intensional answers
RSS News syndication (VLDB 2003 demo)
• Customization of services through schemas + news subscriptions
Decentralized management of patient data (VLDB 2004 demo)
• Use AXML to coordinate the integration of data and privacy enforcement services in a uniform way
Querying Business Processes (VLDB 2005 demo)
• Use AXML to model and query BPEL specifications
Others…
A powerful framework for the fast development of distributed, data-centric applications.
Other applications
Dynamic warehouse on food risk management (E.dot)
• Use AXML as the platform for the warehouse definition, construction and maintenance
Network configuration (SWAN)
• Consider using AXML exchange of information to configure hardware/software components
Software distribution (EDOS)
• Consider using AXML to customize distributions and keep your view of the software fresh
Conclusion
AXML documents and services
A simple paradigm…
…that allows for new, powerful features
• Intensional parameters and results: AXML documents can be exchanged
• Support for continuous services (streams of answers)
• Control over the exchange of AXML data
• Lazy query evaluation
AXML implementation goes Open Source (ObjectWeb consortium)
Thanks:
Serge Abiteboul, Omar Benjelloun, Ioana Manolescu, Bernd Amann, Jerome Baumgarten, Bogdan Cautis, and many others…