Peer-to-Peer Data Integration

Alan Davoust

28.02.2008

Alan Davoust Peer to Peer Data Integration 28 february 2008


Presentation outline


What is Data Integration?

From Conventional to Peer-to-Peer Data Integration

Some issues with semantics and inconsistencies

Overview of some P2P Data Integration applications

UP2P, our local project


Data Integration: example
A Data Integration System for movies

Query over Global Schema (sent to the Query Engine)
Film:
  Title: ?1
  Director: Woody Allen
  Playing: 10 pm
  Cinema:
    city: Paris
    name: ?2

Sources
IMDB: Film (Title, Director, Year)
Pariscope: Cinema (Name, Address), Film (Title, Hours)

Q(?1, ?2) :- IMDB(?1, “woody allen”, -) ∧ Pariscope(?2, -, ?1, “10pm”)
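The conjunctive query above can be read as a join of the two sources on the film title. The following is a minimal Python sketch of that reading; the rows, film titles, cinema names and the function name `q` are hypothetical and serve only as an illustration.

```python
# Toy source relations (hypothetical rows, for illustration only).
# IMDB(title, director, year)
IMDB = [
    ("Film A", "woody allen", 1977),
    ("Film B", "woody allen", 1979),
    ("Film C", "someone else", 1980),
]
# Pariscope(cinema_name, address, title, hour)
PARISCOPE = [
    ("Cinema X", "1 rue Exemple", "Film A", "10pm"),
    ("Cinema Y", "2 rue Exemple", "Film B", "8pm"),
    ("Cinema Z", "3 rue Exemple", "Film C", "10pm"),
]


def q(imdb, pariscope, director="woody allen", hour="10pm"):
    """Evaluate Q(?1, ?2) :- IMDB(?1, director, -) AND Pariscope(?2, -, ?1, hour)."""
    # First atom: titles of films by the requested director.
    titles = {title for title, d, _year in imdb if d == director}
    # Second atom, joined with the first on the shared variable ?1 (the title).
    return {
        (title, cinema)
        for cinema, _address, title, h in pariscope
        if h == hour and title in titles
    }


answers = q(IMDB, PARISCOPE)
print(answers)
```

The "-" placeholders of the query correspond to the columns that the sketch ignores (`_year`, `_address`).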


Data Integration

Mediator: combines data from different sources

Wrapper (one per data source): translates to a common data model

Data source A, Data source B, Data source C
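As a rough illustration of the mediator/wrapper split, the sketch below uses two hypothetical wrappers that translate differently shaped sources into one common record format, and a mediator that combines them. All class names, field names and rows are assumptions made for this example.

```python
class DictWrapper:
    """Wraps a source whose rows are dicts with source-specific field names."""

    def __init__(self, rows, field_map):
        self.rows = rows
        self.field_map = field_map  # source field name -> common-model field name

    def fetch(self):
        for row in self.rows:
            yield {common: row[src] for src, common in self.field_map.items()}


class TupleWrapper:
    """Wraps a source whose rows are positional tuples."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns  # common-model field name for each position

    def fetch(self):
        for row in self.rows:
            yield dict(zip(self.columns, row))


class Mediator:
    """Combines the records produced by all wrappers and filters them."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, predicate):
        return [rec for w in self.wrappers for rec in w.fetch() if predicate(rec)]


# Hypothetical sources with different native formats.
source_a = [{"Titre": "Film A", "Realisateur": "R1"}]
source_b = [("Film B", "R1"), ("Film C", "R2")]

mediator = Mediator([
    DictWrapper(source_a, {"Titre": "title", "Realisateur": "director"}),
    TupleWrapper(source_b, ["title", "director"]),
])
results = mediator.query(lambda rec: rec["director"] == "R1")
print(results)
```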


Data Integration: views, semantics


Local Schema A

Local Schema B

Local Schema C

Global Schema

GAV view definition: $G_1(x)$ :- $L^A_1(x,y) \land L^B_2(y)$

LAV: $L^A_1(x,y)$ :- $G_1(x) \land G_2(x,y)$

GLAV: $G_1(x) \land G_2(x,y)$ :- $L^A_1(x,y) \land L^C_1(y)$

Logical Semantics of “:-”
➢ $G_1(x) \land G_2(x,y) \leftarrow L^A_1(x,y) \land L^C_1(y)$ (sound)
➢ $G_1(x) \land G_2(x,y) \rightarrow L^A_1(x,y) \land L^C_1(y)$ (complete)
➢ $G_1(x) \land G_2(x,y) \leftrightarrow L^A_1(x,y) \land L^C_1(y)$ (exact)
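On finite instances, the three readings of the GLAV mapping reduce to set containment between the tuples produced by the local side and those produced by the global side. The sketch below checks this on hypothetical toy relations; the function names and the data are assumptions.

```python
def local_side(la1, lc1):
    """Tuples (x, y) such that LA1(x, y) and LC1(y) both hold."""
    return {(x, y) for (x, y) in la1 if y in lc1}


def global_side(g1, g2):
    """Tuples (x, y) such that G1(x) and G2(x, y) both hold."""
    return {(x, y) for (x, y) in g2 if x in g1}


def classify(local, global_):
    """Report which readings of ':-' hold for the given instances."""
    sound = local <= global_      # local side implies global side
    complete = global_ <= local   # global side implies local side
    return {"sound": sound, "complete": complete, "exact": sound and complete}


# Hypothetical instances: LA1 and G2 are binary, LC1 and G1 are unary.
la1 = {("a", 1), ("b", 2)}
lc1 = {1}
g1 = {"a", "c"}
g2 = {("a", 1), ("c", 3)}

report = classify(local_side(la1, lc1), global_side(g1, g2))
print(report)
```

Here the local side yields only `("a", 1)` while the global side also contains `("c", 3)`, so the mapping holds as sound but not as complete.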


Towards P2P architecture

[Figure: Peer A, Peer B and Peer C, each made up of a data source (A, B, C), a wrapper and a P2P mediator.]


P2P architecture : logical view

[Figure: data sources A, B and C, each tied to its peer schema (Peer A, Peer B, Peer C) by a source mapping; the peer schemas are tied together by peer-to-peer mappings.]

Mapping example:
Peer_A.ArticleTitle :- Peer_B.PublicationTitle
Peer_A.Author :- concat(Peer_B.AuthorLastName, Peer_B.AuthorFirstName)
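The mapping example can be sketched as a function that turns a Peer B record into a Peer A record. The separator used by `concat` is not given on the slide and is an assumption here, as is the sample record.

```python
def concat(last_name, first_name, sep=" "):
    # The separator is an assumption; the slide only names the concat function.
    return f"{last_name}{sep}{first_name}"


def peer_b_to_peer_a(record):
    """Apply the two mapping rules to one Peer B record."""
    return {
        # Peer_A.ArticleTitle :- Peer_B.PublicationTitle
        "ArticleTitle": record["PublicationTitle"],
        # Peer_A.Author :- concat(Peer_B.AuthorLastName, Peer_B.AuthorFirstName)
        "Author": concat(record["AuthorLastName"], record["AuthorFirstName"]),
    }


# Hypothetical Peer B record.
peer_b_record = {
    "PublicationTitle": "An Example Title",
    "AuthorLastName": "Doe",
    "AuthorFirstName": "Jane",
}
peer_a_record = peer_b_to_peer_a(peer_b_record)
print(peer_a_record)
```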


Data Integration: purpose

Why get data from other sources?

Get more answers (Open World Assumption): implies “redundant” sources

Other sources more authoritative: implies “redundant” sources

Combine information from several sources (relational join): no redundancy

➔ The purpose will define the semantics of the mappings


Peer-to-peer networks


Advantages

➢ Decentralized and dynamic

➢ Scalable

Disadvantages

➢ No control = no guarantee that sources are reliable

➢ Sometimes very long paths to reach other nodes


Some issues in P2P Data Integration

Complexity / Expressiveness trade-off
➢ Language for queries and views/schema determines complexity
➢ Query answering is undecidable in general
➢ In practice, in a distributed setting, delays can be high

Inconsistencies
➢ Data inconsistencies
➢ Incorrect mappings
➢ Most studies assume the data is consistent


Inconsistencies

How do inconsistencies arise?
➢ Negation
➢ Primary / foreign keys
➢ CWA
➢ Explicit constraints

Resolving Inconsistencies
➢ We may trust all peers equally
➢ We may have one or several more trusted peers

Inconsistency tolerance: theoretical studies
➢ Calvanese et al. using modal logic
➢ Bertossi et al. using answer set programming


Inconsistencies : Calvanese approach

Mappings
$Q_i \rightarrow Q_j$ from $P_i$ to $P_j$
$Q_k \rightarrow Q_j$ from $P_k$ to $P_j$

Case 1: $P_i$ is inconsistent
➢ Ignore it entirely

Case 2: data from $P_i$ inconsistent with data from $P_j$
➢ Ignore imported data
➢ (implicitly a peer trusts itself more)

Case 3: data from $P_i$ inconsistent with data from $P_k$
➢ Ignore both pieces of imported data
➢ (no preference between other peers)

[Figure: peers $P_i$ and $P_k$ mapped to peer $P_j$]


Inconsistencies : Calvanese approach

Summary:
➢ Ignore inconsistent peers
➢ Transfer knowledge only if it does not create inconsistencies
➢ Implicit trust relation: each peer trusts itself more than the others
➢ Compatible with OWA and CWA

Formalization with multi-modal logic:
Mapping $Q_i \rightarrow Q_j$ from $P_i$ to $P_j$ produces semantic rule: [formula lost in extraction]

[Figure: peers $P_i$ and $P_k$ mapped to peer $P_j$]


Inconsistencies : Bertossi approach

Explicit trust relations between peers
Closed-world assumption
Propagation of queries is only for consistency checking

Peer consistent answers based on the notion of peer solution

A solution is a DB instance which respects all the constraints for the peers and stays as close as possible to the original data


Inconsistencies : Bertossi approach

Computing the solutions for a peer:
➢ Import all relevant data from peers
➢ Depending on data exchange constraints: virtually add or remove tuples from relations to resolve inconsistencies
➢ Data imported from more trusted peers does not change (even virtually)

➔ Peer Consistent Answers to a query are those which hold in every solution.
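The idea of answers that hold in every solution can be sketched in Python under strong simplifying assumptions that are not taken from the slides: a single binary relation, one key constraint (the first attribute determines the second), repairs by deletion only, and a trusted peer whose data is itself consistent. This is an illustration of the principle, not the answer-set-programming method itself; all names and rows are hypothetical.

```python
from itertools import product


def solutions(trusted, untrusted):
    """Enumerate repaired instances under the key constraint key -> value.

    Trusted tuples are never changed. Untrusted tuples that clash with a
    trusted key are dropped; clashing untrusted tuples yield one solution
    per possible choice, so each solution stays close to the original data.
    """
    fixed = dict(trusted)  # assumes the trusted data is itself consistent
    groups = {}
    for key, value in untrusted:
        if key in fixed:
            continue  # data from the more trusted peer wins
        groups.setdefault(key, set()).add(value)
    keys = sorted(groups)
    result = []
    for choice in product(*(sorted(groups[k]) for k in keys)):
        result.append(set(fixed.items()) | set(zip(keys, choice)))
    return result


def peer_consistent_answers(query, sols):
    """Keep only the answers that hold in every solution."""
    return set.intersection(*(set(query(s)) for s in sols))


# Hypothetical data: (person, citizenship).
trusted = {("Bob", "Canadian")}
untrusted = {
    ("Bob", "French"),
    ("Alice", "French"),
    ("Alice", "Canadian"),
    ("Carl", "Swiss"),
}

sols = solutions(trusted, untrusted)
all_tuples = peer_consistent_answers(lambda s: s, sols)
persons = peer_consistent_answers(lambda s: {p for p, _ in s}, sols)
print(len(sols), all_tuples, persons)
```

Alice's citizenship differs between the two solutions, so no tuple about it is a consistent answer, although Alice herself is returned by the query that asks only for persons.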


Inconsistencies : Example

Schemas and data at the peers

$P_j$:
Person   birthdate   birthplace
Alice    01/04/75    Paris
Bob      04/05/80    Orleans

$P_i$:
Person   PlaceOfBirth   Citizenship
Albert   Ottawa         Canadian
Bob      Orleans        Canadian

$P_k$:
Person   PlaceOfBirth   Citizenship
Alice    Ottawa         French
Bob      Orleans        French

Queries to $P_j$:
1: People born in Ottawa?
2: Citizenship of Bob?
3: Birthplace of Bob?

[Figure: peers $P_i$ and $P_k$ mapped to peer $P_j$]
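One way to work through this example is to apply the three cases listed for the Calvanese approach (ignore inconsistent peers, prefer local data, drop imported data on which other peers disagree). The sketch below does so under several assumptions: the first table belongs to $P_j$, the second to $P_i$ and the third to $P_k$; PlaceOfBirth maps to birthplace; and each person has at most one value per attribute. It is a simplified reading of the cases, not the modal-logic formalization.

```python
from collections import defaultdict

# Facts are (person, attribute, value) triples; peer assignment is assumed.
P_J = {
    ("Alice", "birthdate", "01/04/75"), ("Alice", "birthplace", "Paris"),
    ("Bob", "birthdate", "04/05/80"), ("Bob", "birthplace", "Orleans"),
}
P_I = {
    ("Albert", "birthplace", "Ottawa"), ("Albert", "citizenship", "Canadian"),
    ("Bob", "birthplace", "Orleans"), ("Bob", "citizenship", "Canadian"),
}
P_K = {
    ("Alice", "birthplace", "Ottawa"), ("Alice", "citizenship", "French"),
    ("Bob", "birthplace", "Orleans"), ("Bob", "citizenship", "French"),
}


def is_consistent(facts):
    """Assumed constraint: one value per (person, attribute)."""
    seen = {}
    for person, attr, value in facts:
        if seen.setdefault((person, attr), value) != value:
            return False
    return True


def integrate(local, imported_by_peer):
    """Build the view of the local peer following the three cases."""
    local_values = {(p, a): v for p, a, v in local}
    candidates = defaultdict(set)
    for facts in imported_by_peer.values():
        if not is_consistent(facts):
            continue  # case 1: ignore an inconsistent peer entirely
        for p, a, v in facts:
            candidates[(p, a)].add(v)
    view = set(local)
    for (p, a), values in candidates.items():
        if (p, a) in local_values:
            continue  # case 2: the local value is kept, imports are ignored
        if len(values) > 1:
            continue  # case 3: other peers disagree, ignore both imports
        view.add((p, a, next(iter(values))))
    return view


view = integrate(P_J, {"P_i": P_I, "P_k": P_K})
born_in_ottawa = {p for p, a, v in view if a == "birthplace" and v == "Ottawa"}
bob_citizenship = {v for p, a, v in view if p == "Bob" and a == "citizenship"}
bob_birthplace = {v for p, a, v in view if p == "Bob" and a == "birthplace"}
print(born_in_ottawa, bob_citizenship, bob_birthplace)
```

Under these assumptions, query 1 returns only Albert (the imported claim that Alice was born in Ottawa clashes with local data), query 2 returns nothing (the two other peers disagree), and query 3 returns Orleans.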


A few P2P data integration applications


Piazza

Edutella

SomeWhere / SomeRDFS

UP2P


PIAZZA (U of Washington, Seattle)


Peer Data Management System

XML, Relational DBs

Recursive rewriting and propagation of queries

Allows different semantics for mappings

Nodes may contribute data and / or mappings


PIAZZA: example 2


Edutella (Stockholm, Hannover, Kassel)


Distributed ontology of educational resources

Super-peer topology

Single schema (meta-model) for super-peers

Queries independently executed on each node (including “join” type queries)

Own query language (an extension of Datalog)


SomeWhere / SomeRDFS (Paris Orsay)

Framework for distributed ontologies

Any peer can extend the ontology by declaring relations with classes defined by other peers

DRAGO built roughly on the same idea


U-P2P (Carleton U :))

Our local project!!
Share metadata and files

P2P file sharing apps

P2P file sharing with searchable metadata

Each client supports several schemas

Data Integration

Integration via P2P mappings across schemas

P2P mappings within peer connect schemas


U-P2P

Multiple schemas (Communities)
Each community is a (P2P) distributed database

[Figure: three UP2P clients, each with its own repository. Their communities are (Root community, Science_Papers, Chem_molecules), (Root community, Science_Papers, Articles) and (Root community, Chem_molecules).]


U-P2P

Query: { doc | Science_Papers.Author(“Einstein”, doc) }?
Query propagated only within community: no network flooding

[Figure: Peer A, Peer B and Peer C with the community sets (Root community, Science_Papers, Chem_molecules), (Root community, Science_Papers, Articles) and (Root community, Chem_molecules).]
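Community-scoped propagation can be sketched as follows: only the peers that are members of the queried community are contacted. The assignment of community sets to Peer A, B and C, the repository contents and the function name are hypothetical and are not read from the figure.

```python
# Hypothetical membership: which communities each peer has joined.
PEERS = {
    "Peer A": {"Root community", "Chem_molecules"},
    "Peer B": {"Root community", "Science_Papers", "Articles"},
    "Peer C": {"Root community", "Science_Papers", "Chem_molecules"},
}

# Hypothetical repository contents, keyed by (peer, community).
REPOSITORIES = {
    ("Peer B", "Science_Papers"): [{"Author": "Einstein", "doc": "paper-1"}],
    ("Peer C", "Science_Papers"): [
        {"Author": "Einstein", "doc": "paper-2"},
        {"Author": "Bohr", "doc": "paper-3"},
    ],
    ("Peer A", "Chem_molecules"): [{"Author": "Einstein", "doc": "molecule-1"}],
}


def query_community(community, attribute, value):
    """Send the query only to members of the community and collect matches."""
    contacted = [peer for peer, joined in PEERS.items() if community in joined]
    docs = set()
    for peer in contacted:
        for record in REPOSITORIES.get((peer, community), []):
            if record.get(attribute) == value:
                docs.add(record["doc"])
    return contacted, docs


contacted, docs = query_community("Science_Papers", "Author", "Einstein")
print(contacted, docs)
```

In this toy network Peer A is never contacted, which is the sense in which the query does not flood the network.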


U-P2P : Data Integration perspective

Data Integration is between heterogeneous communities
Semantic mappings defined as “bridge”
Queries in one community can be propagated to other communities
Meta-model allows for mappings to be shared in a specific community

[Figure: Peer A, Peer B and Peer C and the communities Root community, Mapping, Articles, Chem_molecules and Science_Papers.]
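A bridge between two communities can be pictured as a small table that renames attributes, so that a query posed in one community is rewritten for another. The sketch below assumes a hypothetical bridge from Science_Papers to Articles; the attribute names on the Articles side and the function names are invented for the example.

```python
# Hypothetical bridge: attribute names on the Articles side are invented.
BRIDGE = {
    "source": "Science_Papers",
    "target": "Articles",
    "attributes": {"Author": "Creator", "Title": "Headline"},
}


def translate(query, bridge):
    """Rewrite a (community, attribute, value) query across one bridge.

    Returns None when the bridge does not cover the query.
    """
    community, attribute, value = query
    if community != bridge["source"] or attribute not in bridge["attributes"]:
        return None
    return (bridge["target"], bridge["attributes"][attribute], value)


def propagate(query, bridges):
    """Return the original query plus every one-hop translation."""
    queries = [query]
    for bridge in bridges:
        translated = translate(query, bridge)
        if translated is not None:
            queries.append(translated)
    return queries


queries = propagate(("Science_Papers", "Author", "Einstein"), [BRIDGE])
print(queries)
```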


U-P2P : Challenges

On-going refactoring of application
Integrate formal semantics for mappings

... Any suggestions?


P2P Data Integration : Purpose

File sharing: get { doc | Author(“Einstein”, doc) }
➢ Expected response: metadata about the doc? The doc itself? (search or retrieve)
➢ Usually, each peer can provide answers autonomously
➢ Challenge is to propagate the query and possibly translate it

Process a query over a distributed knowledge base:
get { film, theatre | director(“Spielberg”, film) and PlaysIn(film, theatre, “28.02.2008”) }
➢ Expected response: a set of atomic pieces of data (DB entries)
➢ A response may result from a relational join operation on several peers
➢ Challenge is to rewrite the query


Issues with Query processing

Do queries terminate?
➢ Most studies assume limited network of peers
➢ Issues with topology (cycles)

Certain answers to a query:
➢ True in every possible consistent interpretation
➢ Open-World vs. Closed-World Assumption

Inconsistencies depend on:
➢ Redundancy in DBs
➢ Integrity constraints
➢ Uniqueness of names
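The termination concern with cyclic topologies can be illustrated with a small propagation sketch. Remembering which peers have already received the query is one common way to guarantee termination on a finite network; it is shown here as an assumption, not as the approach of any system discussed in the talk. The topology and names are hypothetical.

```python
from collections import deque

# Hypothetical topology with a cycle: A -> B -> C -> A.
NEIGHBOURS = {"A": ["B"], "B": ["C"], "C": ["A"]}


def propagate(start, neighbours):
    """Forward a query breadth-first, visiting each peer at most once."""
    visited = {start}
    order = [start]
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbour in neighbours.get(peer, []):
            if neighbour not in visited:  # without this check the cycle never ends
                visited.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


order = propagate("A", NEIGHBOURS)
print(order)
```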


P2P data integration

Metadata is a logical predicate about some file or object

Conjunctive query: get x such that...
➢ x can be an object (file, URL)
➢ x can be an atom (logical atom)

We have the logical view where we process logical queries in a distributed knowledge base

