Date posted: 19-Dec-2015
WebdamExchange and WebdamLog: some models for web data management
Emilien Antoine, Meghyn Bienvenu, Alban Galland
Webdam WS, 04/03/2011
Organization
• Introduction
• Representing all Web information as logical sentences
• Representing all Web data management as logical rules
• Some clues about WebdamPoor
• Some clues about implementation
• Conclusion
Context of the work presented here
• Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Marilena Oita, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…
Context: Web data management
• Scale: lots of users, servers, large volume of data…
• Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…
• Security heterogeneity: login, https, crypto, hidden URL…
• Terminology heterogeneity: annotation, semantic Web, ontologies…
• Incomplete information: inconsistencies, belief, trust…
• Heterogeneity keeps increasing as new systems and new applications arrive
• Consequence 1: data integration and management become difficult
• Consequence 2: users cannot keep control over their own data
Thesis: Web data = distributed knowledge
• Work plan:
  1. Represent all Web information as logical sentences
  2. Represent all Web data management as logical rules
  3. Develop a system to validate these ideas
• Motivation for the approach:
  • Facilitate the design/implementation of complex systems
  • Facilitate the control/surveillance of complex systems
  • Use reasoning to optimize query evaluation
  • Use reasoning for semantics/ontologies
  • Use reasoning to manage access control and protect data
  • Use reasoning to analyze properties of systems
Motivating example
• Alice: get me the pictures of my friends where I am with Bob
• What is going on:
  • Find Alice’s friends (Alice’s iPhone may remember them)
  • For each answer, say Sue, find where Sue keeps her pictures (she may keep them on Picasa)
  • Find a means to access Sue’s pictures (Alice may ask a common friend for the private URL)
  • Find the photos showing both Alice and Bob (e.g., by querying the metadata)
Motivating example
• Alice: get me the pictures of my friends where I am with Bob
• Issues: heterogeneity of friends
  • Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some keep them on their smartphones…
  • Heterogeneity of access control: some are public, some use login/password, some use a private URL, some use cryptography…
  • Heterogeneity of data description: they may use different metadata models (taxonomies, ontologies…)
The information belongs to someone
• Each piece of information belongs to a principal
  • A principal has an identity (URI) that can be authenticated
• Two kinds of principals: peers and virtual principals
• A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …
  • Has storage and processing capabilities
  • Typically has a URL and can be sent query/update requests
• A virtual principal: alice, alice-friends, roc14
  • Relies on peers for storage and processing
The kind of information we are talking about
• Data: pictures, movies, music, emails, ebooks, reports
• Localization: bookmarks, knowledge such as “Alice has an account on Facebook” or “Sue puts her pictures on Picasa”
• Access: login/password, access rights on servers
• Annotations/ontologies: semantic tags in Picasa, RDFS, OWL
• Services: search engines, yellow pages, dictionaries…
• Incomplete information: beliefs, probabilistic information…
• And more…
Logical statements to represent information
• Data:
  • Document: picture34@alice-iPhone(picture34.jpg, 09/12/2009, …)
  • Collection: pictures@alice(picture34@alice-iPhone)
• Localization: where@alice(picture37, picasa/alice)
• Access right: isOwner@picasa/alice(alice)
• Access secret: ownSecret@picasa/alice(“alice”, “HG-FT23”)
• Ontologies: …@…(“alice”, human-being)
• Services: …@…($Person, $City, $Y)
• Belief: picture34@alice-iPhone(picture34.jpg, 09/12/2009, …, 75%)
• Etc.
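All of these statements share the shape relation@principal(arguments). As a minimal sketch (the `Fact` class and `by_principal` helper are hypothetical, not part of WebdamExchange), such statements can be stored as immutable tuples and grouped by the principal they belong to, which is the portion of the knowledge base that principal is responsible for:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Fact(NamedTuple):
    """A ground statement relation@principal(args); names are illustrative."""
    relation: str
    principal: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.relation}@{self.principal}({', '.join(self.args)})"


def by_principal(facts: Iterable[Fact]) -> Dict[str, Set[Fact]]:
    """Group statements by the principal they belong to."""
    kb: Dict[str, Set[Fact]] = defaultdict(set)
    for fact in facts:
        kb[fact.principal].add(fact)
    return kb


facts = [
    Fact("pictures", "alice", ("picture34@alice-iPhone",)),
    Fact("where", "alice", ("picture37", "picasa/alice")),
    Fact("isOwner", "picasa/alice", ("alice",)),
    Fact("ownSecret", "picasa/alice", ("alice", "HG-FT23")),
]
kb = by_principal(facts)
print(facts[1])    # where@alice(picture37, picasa/alice)
print(sorted(kb))  # ['alice', 'picasa/alice']
```

Annotations such as the 75% belief could simply be carried as an extra argument in this encoding.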
WebdamExchange focus: authenticated knowledge
• Base statement:
  • someone states picture37@alice (….)
  • It is annotated with a proof that “someone” can write alice’s data
  • In the cryptographic setting, the proof is a signature of the whole statement using alice’s write secret key
• Keeping trace of provenance:
  • alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009
  • alice-laptop is the performer (the peer that performed the update of alice’s data)
  • bob is the requester (the peer or user who requested the update)
• The content may be encrypted:
  • alice-laptop states picture37@alice (….) protected for reader@alice requester bob at 12:30, 10/08/2009
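In the cryptographic setting, the proof attached to a base statement is a signature over the whole statement made with alice’s write secret key. The sketch below only illustrates the idea and makes a loud substitution: it uses an HMAC from Python’s standard `hmac` module, a shared-secret MAC, in place of a real public-key signature, so anyone able to verify could also forge. The `Statement` class, its fields and the key value are hypothetical, and encryption (“protected for reader@alice”) is not modelled.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical encoding of 'performer states fact requester ... at time'."""
    performer: str  # peer that performed the update, e.g. alice-laptop
    fact: str       # logical content, e.g. picture37@alice(...)
    requester: str  # peer or user who requested the update
    time: str       # local time of the performer (there is no global clock)

    def payload(self) -> bytes:
        return (f"{self.performer} states {self.fact} "
                f"requester {self.requester} at {self.time}").encode()


def sign(stmt: Statement, write_key: bytes) -> str:
    # Stand-in for a signature with alice's write secret key: HMAC-SHA256
    # over the whole statement. A real system would use public-key signatures.
    return hmac.new(write_key, stmt.payload(), hashlib.sha256).hexdigest()


def verify(stmt: Statement, proof: str, write_key: bytes) -> bool:
    # Constant-time comparison of the recomputed proof with the attached one.
    return hmac.compare_digest(sign(stmt, write_key), proof)


ALICE_WRITE_KEY = b"hypothetical-write-key-of-alice"
s = Statement("alice-laptop", "picture37@alice(...)", "bob", "12:30, 10/08/2009")
proof = sign(s, ALICE_WRITE_KEY)
tampered = Statement("alice-laptop", "picture38@alice(...)", "bob", "12:30, 10/08/2009")
print(verify(s, proof, ALICE_WRITE_KEY))         # True
print(verify(tampered, proof, ALICE_WRITE_KEY))  # False
```

Because the proof covers the performer, requester and time as well as the fact, changing any of them invalidates it.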
WebdamExchange focus: authenticated knowledge
• Communication: external knowledge is knowledge about other principals:
  • alice-laptop says (alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009
  • alice-laptop is the performer of the communication
  • sue-iphone is the receiver of the communication
• External knowledge is authenticated by the performer and stored by the receiver.
• External knowledge keeps a trusted trace of provenance, and communications pile up:
  • sue-iphone says (alice-laptop says (alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009
• Each time is the local time of the performer; there is no global clock
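The piled-up says statements form a nested structure in which each layer records a performer, a receiver and the performer’s local time. A minimal sketch with hypothetical class names (authentication omitted) that recovers the path the knowledge travelled, from the original performer to the last receiver:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class States:
    performer: str  # peer that performed the update
    fact: str
    requester: str
    time: str       # performer's local time


@dataclass(frozen=True)
class Says:
    performer: str  # peer that sends (and authenticates) the communication
    content: Union["Says", States]
    receiver: str   # peer that receives and stores it
    time: str       # performer's local time, not a global clock


def chain(stmt: Union[Says, States]) -> List[str]:
    """Peers the knowledge went through, from its origin to the last receiver."""
    receivers: List[str] = []
    while isinstance(stmt, Says):
        receivers.append(stmt.receiver)  # outermost receiver comes first
        stmt = stmt.content
    # stmt is now the base statement; its performer is the origin.
    return [stmt.performer] + list(reversed(receivers))


base = States("alice-laptop", "picture37@alice(...)", "bob", "12:30, 10/08/2009")
first = Says("alice-laptop", base, "sue-iphone", "13:15, 15/10/2009")
second = Says("sue-iphone", first, "bob-iphone", "13:10, 15/10/2009")
print(chain(second))  # ['alice-laptop', 'sue-iphone', 'bob-iphone']
```

The two timestamps in the example (13:15, then 13:10) are not ordered, which is consistent with each time being the performer’s own clock.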
The model covers a wide range of data
• The model does not prescribe any particular architecture for distribution:
  • Gossiping, DHT, centralized server
  • Combination of these
  • Based on an abstract notion of localization
• The model does not prescribe how access control is enforced, e.g.:
  • Documents on Web servers with access protected by login/password
  • Documents protected by cryptographic keys on public sites
  • Based on an abstract notion of secret and hint
  • See Emilien’s presentation on WebdamPoor
Summary of WebdamExchange
• All the information forms a trusted knowledge base
• Each peer manages some portion of the knowledge base
• Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!
From WebdamExchange to Webdamlog
• The logical part of the WebdamExchange statements can easily be translated into datalog facts.
• Now we want to perform reasoning on these facts in order to locate, exchange, and update information
  • Example: use logical reasoning among peers to locate the pictures of Alice’s friends in which she appears with Bob
• This motivates Webdamlog, a rule-based language for web data management
Why datalog?
• Datalog: very popular in the ’90s, which is prehistory in Web time
  + Natural syntax; reasonably expressive; easy to extend
  - Recursion not really essential in most applications
• Datalog extensions:
  • Negation and aggregate functions: lots of work on these
  • Updates, time, trees, distribution: less work on these
• We use a datalog-like language influenced by:
  • Active XML for distribution and delegation
  • Hellerstein’s Dedalus for time and performance
Webdamlog
• Facts (messages) of the form m@p(a1,...,an)
• Rules of the form R@P(U) :- (¬) R1@P1(U1), …, (¬) Rn@Pn(Un)
• R,Ri are relation terms
• P,Pi are peer terms
• U,Ui are tuples of terms
• Safety condition
• Intuition: if the body holds for some valuation v, the fact vR@vP(vU) is sent to the peer vP
• What happens if the body of the rule mentions different peers?
  • Peers need to collaborate to evaluate the rule: rule delegation
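The intuition for a single rule can be made concrete with a small evaluator for positive rules (negation and delegation omitted). This is a sketch, not the Webdamlog implementation: atoms are (relation, peer, args) triples, strings starting with $ are variables (also allowed in relation and peer positions, as in $R@$P), and every valuation v satisfying the body yields the message v(R)@v(P)(v(U)) addressed to peer v(P). For illustration only, it assumes all facts mentioned in the body are visible to the evaluator; when they live at different peers, this is exactly where delegation is needed. The example facts (photos p1, p2 and metadata m1, m2) are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# An atom or fact is (relation, peer, args); terms starting with "$" are variables.
Atom = Tuple[str, str, Tuple[str, ...]]
Valuation = Dict[str, str]


def match(atom: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend valuation v so that atom maps onto fact, or return None."""
    rel, peer, args = atom
    f_rel, f_peer, f_args = fact
    if len(args) != len(f_args):
        return None
    w = dict(v)
    for term, const in zip((rel, peer) + args, (f_rel, f_peer) + f_args):
        if term.startswith("$"):
            if w.setdefault(term, const) != const:
                return None  # variable already bound to another constant
        elif term != const:
            return None
    return w


def valuations(body: List[Atom], facts: Set[Atom], v: Valuation) -> Iterator[Valuation]:
    """Enumerate valuations that make every (positive) body atom a known fact."""
    if not body:
        yield v
        return
    for fact in facts:
        w = match(body[0], fact, v)
        if w is not None:
            yield from valuations(body[1:], facts, w)


def fire(head: Atom, body: List[Atom], facts: Set[Atom]) -> Set[Atom]:
    """For each satisfying v, build the message v(R)@v(P)(v(U)) sent to peer v(P)."""
    rel, peer, args = head
    return {
        (v.get(rel, rel), v.get(peer, peer), tuple(v.get(a, a) for a in args))
        for v in valuations(body, facts, {})
    }


head = ("result", "alice-iphone", ("$Photo",))
body = [
    ("friends", "alice-iphone", ("$X",)),
    ("findPhotos", "alice-iphone", ("$X", "$R", "$P")),
    ("$R", "$P", ("$X", "$Photo", "$Meta")),
    ("contains", "$P", ("$Meta", "Alice")),
    ("contains", "$P", ("$Meta", "Bob")),
]
facts = {
    ("friends", "alice-iphone", ("Sue",)),
    ("findPhotos", "alice-iphone", ("Sue", "photos", "picasa")),
    ("photos", "picasa", ("Sue", "p1", "m1")),
    ("photos", "picasa", ("Sue", "p2", "m2")),
    ("contains", "picasa", ("m1", "Alice")),
    ("contains", "picasa", ("m1", "Bob")),
    ("contains", "picasa", ("m2", "Alice")),
}
print(fire(head, body, facts))  # {('result', 'alice-iphone', ('p1',))}
```

Only p1 is returned because its metadata m1 mentions both Alice and Bob; the safety condition guarantees that every head variable is bound by the time the message is built.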
Webdamlog
System:
• A finite set of peers
• Each peer p has a local program P(p) and a delegated program D(p), which are both finite sets of rules
• Each peer p also has a database I(p) consisting of a finite set of facts of the form m@p(u)
Semantics:
• In a state (P, D, I), choose some peer p at random:
  • Evaluate (P(p) ∪ D(p))(I(p))
  • This defines the new database I’(p)
  • Send facts to the other peers and update their delegations, which defines (D’(q), I’(q)) for each peer q ≠ p
  • The changes to each q are installed instantaneously; we will see how to avoid this if desired
• Choose another peer and keep going (in a fair way)
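These semantics can be simulated directly. The sketch below makes several simplifying assumptions that are not part of Webdamlog: each program P(p) is a plain Python function standing in for a set of positive rules, delegations D(p) are omitted, facts accumulate rather than being replaced, and a round-robin order stands in for the fair random choice of peers. The peers alice and sue and their rules are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Fact = Tuple[str, str, Tuple[str, ...]]     # (relation, peer, args)
Program = Callable[[Set[Fact]], Set[Fact]]  # P(p): local database -> derived facts


def step(p: str, programs: Dict[str, Program], db: Dict[str, Set[Fact]]) -> bool:
    """One move of peer p: evaluate its program on I(p) and install each derived
    fact at the peer it names (p itself or another peer). True if anything changed."""
    changed = False
    for fact in programs[p](db[p]):
        target = db.setdefault(fact[1], set())
        if fact not in target:
            target.add(fact)  # installed instantaneously, as on the slide
            changed = True
    return changed


def run(programs: Dict[str, Program], db: Dict[str, Set[Fact]],
        order: Iterable[str]) -> Dict[str, Set[Fact]]:
    """Fair round-robin run; stops after a full round in which nothing changes."""
    peers = list(order)
    while True:
        changed = False
        for p in peers:
            changed |= step(p, programs, db)
        if not changed:
            return db


def alice_prog(I: Set[Fact]) -> Set[Fact]:
    # request@$X(alice) :- friends@alice($X)
    return {("request", a[0], ("alice",)) for (r, _, a) in I if r == "friends"}


def sue_prog(I: Set[Fact]) -> Set[Fact]:
    # result@$Y($Ph) :- request@sue($Y), photo@sue($Ph)
    askers = [a[0] for (r, _, a) in I if r == "request"]
    photos = [a[0] for (r, _, a) in I if r == "photo"]
    return {("result", y, (ph,)) for y in askers for ph in photos}


programs = {"alice": alice_prog, "sue": sue_prog}
initial = {
    "alice": {("friends", "alice", ("sue",))},
    "sue": {("photo", "sue", ("p1",)), ("photo", "sue", ("p2",))},
}
# Two different fair schedules of the same positive system.
run1 = run(programs, {k: set(v) for k, v in initial.items()}, ["alice", "sue"])
run2 = run(programs, {k: set(v) for k, v in initial.items()}, ["sue", "alice"])
print(run1 == run2)  # True
```

For this positive example both schedules end in the same state; the second simply needs one more round because sue moves before alice has sent her request.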
Features of Webdamlog illustrated
Alice: get me the pictures of my friends where I am with Bob
result@alice-iphone($Photo) :- friends@alice-iphone($X),
findPhotos@alice-iphone($X, $R, $P),
$R@$P($X, $Photo, $Meta),
contains@$P($Meta, “Alice”),
contains@$P($Meta, “Bob”)
findPhotos@alice-iphone($X, photos, picasa) :- member($X, picasa)
friends@alice-iphone(Sue) member(Sue,picasa)
- Peers and relations treated as data: they are reified
- $R@$P: will be instantiated with a concrete relation and peer
- friends@alice-iphone is extensional: it occurs in the data at alice-iphone
- findPhotos@alice-iphone is intensional: it is derived from data and rules
Features of Webdamlog illustrated
• Partial evaluation at alice-iphone: $X → Sue, $R → photos, $P → picasa
• Then alice-iphone installs the rest of the rule at picasa:
result@alice-iphone($Photo,Sue) :- photos@picasa(Sue,$Photo,$Meta),
contains@picasa($Meta, “Alice”),
contains@picasa($Meta, “Bob”)
• Peer picasa will send the photos as extensional facts to alice-iphone.
• When Alice terminates her query, she cancels all the delegations.
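The partial evaluation step can be sketched as follows: the peer evaluates the longest prefix of the body that refers to its own relations, substitutes each resulting binding into the remaining atoms, and ships the residual rule to the peer named by the first remaining atom. This is a hypothetical illustration, not the Webdamlog implementation: atoms are (relation, peer, args) triples with $-prefixed variables, findPhotos@alice-iphone(Sue, photos, picasa) is assumed to have already been derived locally, and the residual rule keeps the original one-argument head. A residual body that still spans several peers would be delegated again at its target.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, args); "$x" is a variable
Valuation = Dict[str, str]


def match(atom: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend v so that atom maps onto fact, or return None."""
    rel, peer, args = atom
    f_rel, f_peer, f_args = fact
    if len(args) != len(f_args):
        return None
    w = dict(v)
    for term, const in zip((rel, peer) + args, (f_rel, f_peer) + f_args):
        if term.startswith("$"):
            if w.setdefault(term, const) != const:
                return None
        elif term != const:
            return None
    return w


def valuations(body: List[Atom], facts: Set[Atom], v: Valuation) -> Iterator[Valuation]:
    if not body:
        yield v
        return
    for fact in facts:
        w = match(body[0], fact, v)
        if w is not None:
            yield from valuations(body[1:], facts, w)


def subst(atom: Atom, v: Valuation) -> Atom:
    rel, peer, args = atom
    return (v.get(rel, rel), v.get(peer, peer), tuple(v.get(a, a) for a in args))


def delegate(me: str, head: Atom, body: List[Atom], facts: Set[Atom]):
    """Evaluate the longest body prefix stored at peer `me`; return one
    (target peer, head, residual body) triple per binding of that prefix."""
    k = 0
    while k < len(body) and body[k][1] == me:
        k += 1
    local, rest = body[:k], body[k:]
    out = []
    for v in valuations(local, facts, {}):
        residual = [subst(a, v) for a in rest]
        target = residual[0][1] if residual else me
        out.append((target, subst(head, v), residual))
    return out


def show(atom: Atom) -> str:
    rel, peer, args = atom
    return f"{rel}@{peer}({', '.join(args)})"


head = ("result", "alice-iphone", ("$Photo",))
body = [
    ("friends", "alice-iphone", ("$X",)),
    ("findPhotos", "alice-iphone", ("$X", "$R", "$P")),
    ("$R", "$P", ("$X", "$Photo", "$Meta")),
    ("contains", "$P", ("$Meta", "Alice")),
    ("contains", "$P", ("$Meta", "Bob")),
]
local_facts = {
    ("friends", "alice-iphone", ("Sue",)),
    ("findPhotos", "alice-iphone", ("Sue", "photos", "picasa")),
}
for target, h, rest in delegate("alice-iphone", head, body, local_facts):
    print(f"install at {target}: {show(h)} :- {', '.join(show(a) for a in rest)}")
```

The loop prints one delegation, installed at picasa: `result@alice-iphone($Photo) :- photos@picasa(Sue, $Photo, $Meta), contains@picasa($Meta, Alice), contains@picasa($Meta, Bob)`.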
What can we show?
• In general, asynchronicity yields non-deterministic systems
• We identified two types of Webdamlog systems (only positive rules / appropriately stratified negation) for which we have:
  • convergence: all runs eventually reach the same state
  • simulation by a centralized datalog program
• It is interesting to compare the expressivity of different variants of Webdamlog: full / limited / no delegation, presence of timestamps or ordering of peers…
  • For an appropriate notion of simulation, we can show that full delegation > limited delegation > no delegation
More refined asynchronicity
• To model transmission of facts from peer p to peer q, we may use a “peer” netpq that captures the network
• Replace m@q(u) at p by m@netpq(u)
• netpq should just relay messages: $M@q($U) :- $M@netpq($U)
• Problem: all messages stored in netpq arrive at the same time
• Better with time:
  • m@netpq(u, t), where t is the time at p
  • $M@q($U) :- $M@netpq($U, $T), min($T, $M@netpq($U, $T)), using the min aggregate function
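A sketch of the timestamped relay, assuming (the slide does not say this) that a message is removed from netpq once it has been delivered. A heap computes the minimum send time that the min aggregate selects, so messages reach q one at a time and in p’s sending order rather than all at once. The class and method names are hypothetical, and the walrus operator requires Python 3.8 or later.

```python
import heapq
from typing import List, Optional, Tuple

Message = Tuple[str, Tuple[str, ...]]  # (relation name M, tuple U)


class NetPeer:
    """Hypothetical relay netpq between peers p and q."""

    def __init__(self) -> None:
        self._buffer: List[Tuple[int, int, Message]] = []
        self._seq = 0  # tie-breaker so equal timestamps never compare messages

    def receive(self, msg: Message, t: int) -> None:
        # m@netpq(u, t): t is the local time at p when the message was sent
        heapq.heappush(self._buffer, (t, self._seq, msg))
        self._seq += 1

    def relay(self) -> Optional[Message]:
        # $M@q($U) :- $M@netpq($U, $T), min($T, ...): forward only the oldest message,
        # then drop it from the buffer (an assumption of this sketch)
        if not self._buffer:
            return None
        _, _, msg = heapq.heappop(self._buffer)
        return msg


net = NetPeer()
net.receive(("photo", ("p2",)), t=5)
net.receive(("photo", ("p1",)), t=3)
net.receive(("done", ()), t=9)
delivered = []
while (m := net.relay()) is not None:
    delivered.append(m)
print(delivered)  # [('photo', ('p1',)), ('photo', ('p2',)), ('done', ())]
```

The timestamps only need to be comparable at p, since they are p’s local times; no global clock is involved.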