RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013

Post on 20-Jan-2015

1,746 views 5 download

Tags:

description

The Relational Databases to RDF (RDB2RDF) Tutorial at the 2013 International Semantic Web Conference (ISWC2013)

transcript

Relational Database to RDF (RDB2RDF)

TutorialInternational Semantic Web

Conference ISWC2013Juan F. Sequeda

Daniel P. Miranker

Barry Norton

RDB2RDF TutorialIntroduction

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

www.rdb2rdf.org - ISWC2013

What is RDB2RDF?

ID

NAME

AGECID

1 Alice 25 100

2 BobNULL 100

Person

CIDNAM

E

100 Austin

200Madri

d

City

<Person/1>

<City/100>

Alice 25

Austin

<Person/2>

Alice

<City/200> Madrid

foaf:namefoaf:name foaf:age

foaf:name

foaf:name

foaf:based_near

www.rdb2rdf.org - ISWC2013

Context

RDFData Management

Relational Database to RDF(RDB2RDF)

TriplestoresWrapper Systems

Extract-Transform-Load(ETL)

RDBMS-backedTriplestores

NativeTriplestores

NoSQLTriplestores

www.rdb2rdf.org - ISWC2013

Outline

• Historical Overview

• 4 Scenarios

• Overview W3C RDB2RDF Standards– Direct Mapping– R2RML

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

F2F Meeting ISWC 2008

March 2008 February 2009

1. Recommendation to standardize a mapping language

2. RDB2RDF Survey

(2) http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf

(1) http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf-20090126/

October 2008

www.rdb2rdf.org - ISWC2013

Sept 2009 Sept 2012

www.rdb2rdf.org - ISWC2013

Sep-09

Oct-09

Nov-09

Dec-09

Jan-10

Feb-10

Mar-10

Apr-10

May-10

Jun-10Jul-1

0

Aug-10

Sep-10

Oct-10

Nov-10

Dec-10

Jan-11

Feb-11

Mar-11

Apr-11

May-11

Jun-11Jul-1

1

Aug-11

Sep-11

Oct-11

Nov-11

Dec-11

Jan-12

Feb-12

Mar-12

Apr-12

May-12

Jun-12Jul-1

2

Aug-12

Sep-12

Oct-12

0

50

100

150

200

250

First F2F @Semtech 2010

FPWDR2RML

WDR2RML + DM

RecR2RML + DM

Candidate RecR2RML + DM Proposed Rec

R2RML + DM

FPWDDM

WDR2RML+DM

WDR2RML+DM

Photo from cygri http://www.flickr.com/photos/cygri/4719458268/

www.rdb2rdf.org - ISWC2013

Statistics

• 206 Actions• 78 Issues

– 61 Closed– 17 Postponed

• public-rdb2rdf-wg– 3393 emails (Sept 2009 – Oct 2012)

• public-rdb2rdf-comments– 200 emails (Sept 2009 – March 2013)

www.rdb2rdf.org - ISWC2013

Outline

• Historical Overview

• 4 Scenarios

• Overview W3C RDB2RDF Standards– Direct Mapping– R2RML

www.rdb2rdf.org - ISWC2013

How to include relational data in a semantic application?

• Many architectural design choices.

• Technology Development Fluid.

• No established “best-of-breed” sol’n.

www.rdb2rdf.org - ISWC2013

Feature Space of Design Choices • Scope of the application

– Mash-up topic page– Heterogeneous Enterprise Data Application

• Size of the (native) database – Data Model– Contents

• Size of the useful (in application) database– Data Model– Contents

• When to translate the data?– Wrapper – ETL

www.rdb2rdf.org - ISWC2013

Reduction to 4 Scenario’s

www.rdb2rdf.org - ISWC2013

Scenario 1: Direct Mapping

Suppose: • Database of Chinese Herbal Medicine and Applicable Conditions

– Database is static.– Herbs and conditions do not have representation in western medical ontologies.

www.rdb2rdf.org - ISWC2013

Scenario 1: Direct Mapping

Suppose: • Database of Chinese Herbal Medicine and Applicable Conditions

– Database is static.– Herbs and conditions do not have representation in western medical ontologies.

RelationalDatabase

Direct Mapping

EngineTriplestore

Extract Transform Load

SPARQL

www.rdb2rdf.org - ISWC2013

Scenario 1: Direct Mapping

Suppose: • Database of Chinese Herbal Medicine and Applicable Conditions

Then:• Existing table and column names are encoded into URIs• Data is translated into RDF and loaded into an existing, Internet

accessible triplestore.

RelationalDatabase

Direct Mapping

EngineTriplestore

Extract Transform Load

SPARQL

www.rdb2rdf.org - ISWC2013

Scenario 2: R2RML

Suppose: • Database of Chinese Herbal Medicine and Applicable Conditions

+ Clinical Records– Database is static.– Also have, patient names, demographics, outcomes

www.rdb2rdf.org - ISWC2013

Scenario 2: R2RMLSuppose: • Database of Chinese Herbal Medicine and Applicable Conditions

+ Clinical Records

RelationalDatabase

R2RMLMapping

Engine

DomainOntologies

(e.g FOAF, etc)

R2RMLFile

Extract Transform Load

Triplestore

SPARQL

www.rdb2rdf.org - ISWC2013

Scenario 2: R2RML• Database of Chinese Herbal Medicine and Applicable Conditions

+ Clinical Records

• Then:– Developer says, “I know FOAF, I’ll write some R2RML and that data will have canonical

URIs, and people will be able to use the data”.

RelationalDatabase

R2RMLMapping

Engine

DomainOntologies

(e.g FOAF, etc)

R2RMLFile

Extract Transform Load

Triplestore

SPARQL

www.rdb2rdf.org - ISWC2013

Scenario 4: Automatic Mapping

Suppose: • Database of Electronic Medical Records• Application, integration of all of a hospitals IT systems• Database has 100 tables and a total of 7,000 columns• Use of existing ontologies as a unifying data model

– ICDE10 codes (> 12,000 concepts)– SNOMED vocabulary (> 40,000 concepts)

www.rdb2rdf.org - ISWC2013

Scenario 4: Automatic Mapping

Relational Database

RefinedR2RML

Direct Mapping as

Ontology

RDB2RDF WrapperSPARQL

Source Putative Ontology

RDF

AutomaticMapping

DomainOntologies

Suppose: • 7,000 Columns• Use of existing ontologies as a

unifying data model– ICDE10 codes (> 12,000 concepts)– SNOMED vocabulary (> 40,000 concepts)

Then:• Convert the database schema and

data to an ontology.• Apply ontology alignment program

www.rdb2rdf.org - ISWC2013

Scenario 4: Automatic MappingSuppose: • 7,000 Columns• Use of existing ontologies as a

unifying data model– ICDE10 codes (> 12,000 concepts)– SNOMED vocabulary (> 40,000 concepts)

Then:• A semantic system implements the

solution with no human labor

Relational Database

RefinedR2RML

Direct Mapping as

Ontology

RDB2RDF WrapperSPARQL

Source Putative Ontology

RDF

AutomaticMapping

DomainOntologies

www.rdb2rdf.org - ISWC2013

Scenario 3: Semi-automatic Mapping

RelationalDatabase

RefinedR2RML

Direct Mapping as

Ontology

RDB2RDFWrapper

SPARQL

Source Putative Ontology

RDF

Semi-AutomaticMapping

DomainOntologies

www.rdb2rdf.org - ISWC2013

Outline

• Historical Overview

• 4 Scenarios

• Overview W3C RDB2RDF Standards– Direct Mapping– R2RML

W3C RDB2RDF Standards

• Standards to map relational data to RDF

• A Direct Mapping of Relational Data to RDF– Default automatic mapping of relational data to

RDF

• R2RML: RDB to RDF Mapping Language– Customizable language to map relational data to

RDF

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

RDF

Direct Mapping

RelationalDatabase

Direct Mapping

Engine

Input: Database (Schema and Data)Primary KeysForeign Keys

OutputRDF graph

www.rdb2rdf.org - ISWC2013

Direct Mapping Result

ID

NAME

AGECID

1 Alice 25 100

2 BobNULL 100

Person

CIDNAM

E

100 Austin

200Madri

d

City

<Person/ID=1>

<City/CID=100>

Alice25

Austin

<Person/ID=2>

Alice

<City/CID=200> Madrid

<Person#NAME><Person#AGE> <Person#NAME>

<Person#NAME>

<Person#NAME>

<Person#ref-CID><Person#ref-CID>

www.rdb2rdf.org - ISWC2013

www.rdb2rdf.org - ISWC2013

RDF

R2RML

RelationalDatabase

R2RMLMapping

Engine

OWLOntologies

(e.g FOAF, etc)

R2RMLFile

www.rdb2rdf.org - ISWC2013

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ].

Direct Mapping as R2RML

www.rdb2rdf.org - ISWC2013

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

Customized R2RML

www.rdb2rdf.org - ISWC2013

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ];

rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;rr:joinCondition [

rr:child “CID”;rr:parent “CID”;

]]

] .

<TriplesMap2> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”City" ];

rr:subjectMap [ rr:template "http://ex.com/City/{CID}"; rr:class ex:City ];

rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [ rr:column ”TITLE" ] ] .

www.rdb2rdf.org - ISWC2013

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:sqlQuery

“””SELECT ID, NAMEFROM Person WHERE gender = “F” “””];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class <http://www.ex.com/Woman> ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

R2RML View

Questions

Next: Direct Mapping

RDB2RDF TutorialDirect Mapping

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

40

RDF

Direct Mapping

RelationalDatabase

Direct Mapping

Engine

Completely Automatic

41

W3C Direct Mapping

• Input: – Database (Schema and Data)– Primary Keys– Foreign Keys

• Output– RDF graph

What do we need to automatically generate?

• Generate Identifiers– IRI– Blank Nodes

• Generate Triples– Table– Literal– Reference

Generating Identifiers

• Identifier for rows, tables, columns and foreign keys

• If a table has a primary key, – then the row identifier will be an IRI, – otherwise a blank node

• The identifiers for table, columns and foreign keys are IRIs

• IRIs are generated by appending to a given base IRI• All strings are percent encoded

Row Node

1) <http://www.ex.com/Person/ID=1>

Base IRI “Table Name”/“PK attr”=“PK value”

2) <http://www.ex.com/Person/ID=1;SID=123>

Base IRI “Table Name”/“PK attr”=“PK value”

3) Fresh Blank Node

More IRI

1) <http://www.ex.com/Person>

Base IRI “Table Name”

2) <http://www.ex.com/Person#NAME>

Base IRI “Table Name”#“Attribute”

3) <http://www.ex.com/Person#ref-CID>

Base IRI “Table Name”#ref-“Attribute”

46

ID (pk)

NAME AGE

1 Alice 25

2 Bob NULL

Person

Table Triple

<http://www.ex.com/Person/ID=1>

<http://www.ex.com/Person>rdf:type

47

<http://www.ex.com/Person/ID=1>

<http://www.ex.com/Person#NAME>

“Alice” .

Literal Triples

ID (pk)

NAME AGE

1 Alice 25

2 Bob NULL

Person

48

ID (pk)

NAME

AGECID(fk)

1 Alice 25 100

2 BobNULL

200

PersonCID (pk)

TITLE

100Austin

200Madri

d

CityReference Triples

<http://www.ex.com/Person/ID=1>

<http://www.ex.com/Person#ref-CID>

<http://www.ex.com/City/CID=100>.

49

Direct Mapping Result

ID

NAME

AGECID

1 Alice 25 100

2 BobNULL 100

Person

CIDNAM

E

100 Austin

200Madri

d

City

<Person/ID=1>

<City/CID=100>

Alice25

Austin

<Person/ID=2>

Alice

<City/CID=200> Madrid

<Person#NAME><Person#AGE> <Person#NAME>

<Person#NAME>

<Person#NAME>

<Person#ref-CID><Person#ref-CID>

50

Summary: Direct Mapping

• Default and Automatic Mapping• URIs are automatically generated

– <table>– <table#attribute>– <table#ref-attribute>– <Table#pkAttr=pkValue>

• RDF represents the same relational schema• RDF can be transformed by

SPARQL CONSTRUCT– RDF represents the structure and ontology of mapping

author’s choice

51

What else is missing?

• Relational Schema to OWL is *not* in the W3C standard

• NULL values

• Many-to-Many relationships (binary tables)

• “Ugly” IRIs

NULL

“The direct mapping does not generate triples for NULL values. Note that it is not known how to relate the behavior of the obtained RDF graph with the standard SQL semantics of the NULL values of the source RDB.”

A Direct Mapping of Relational Data to RDF. W3C Recommendation

52

Problem

1. How can a relational database schema and data, be automatically mapped to OWL and RDF?

2. How can we assure correctness of mapping?

53

54

ptID label prID

10 ACME Inc 4

11 FooBars 5

prID title loc

4 Foo TX

5 Bar NULL

Product

Producer

ex:Product ex:Producerpt:Producer

Stringpt:label

String

Stringpr:title

pr:loc

ex:Product11 ex:Producer5pt:Producer

Barpt:label

FooBarspr:title

Input• Relational Schema R• Set Σ of Primary Keys PK and

Foreign Keys FK over R• Instance I of R

Output• RDF graph• OWL ontology as a graph

We need to be careful about two issues• Binary Relations• NULLs

rdf:type rdf:type

Mapping

NULLs

• What should we do with NULLs?– Generate a Blank Node

– Don’t generate a triple

55

How do we reconstruct the NULL?

prID title loc

4 Foo TX

5 Bar NULLex:Producer5

Bar

_:a

pr:title

pr:loc

ex:Producer5 Barpr:title

Direct Mapping Properties

• Fundamental Properties– Information Preserving: no information is lost– Query Preserving: no query is lost

• Desirable Properties– Monotonicity– Semantics Preserving:

Information Preservation

57

RDB

Direct Mapping

Inverse Direct Mapping

Query Preservation

58

Direct Mapping

Result of Q*Result of Q =

RDB

Monotonicity

59

Direct MappingRDB

RDB Direct Mapping

subset subset

New Data

Semantics Preservation

60

Direct MappingRDB

Direct MappingRDB

Semantics Preservation

The Nugget

• Defined a Direct Mapping DM

• Formally defined semantics using Datalog

• Considered RDBs that may contain NULL values

• Studied DM wrt 4 properties– Information Preservation– Query Preservation– Monotonicity– Semantics Preservation

62

Sequeda, Arenas & Miranker. On Directly Mapping Relational Databases to RDF and OWL. WWW 2012Sequeda et. al. Survey of Directly Mapping SQL Databases to the Semantic Web. J KER 2011Tirmizi, Sequeda & Miranker. Translating SQL Applications to the Semantic Web. DEXA 2008

Direct Mapping

Input: A relational schema R a set of Σ of primary keys and foreign keys and a database instance I of this schemaOutput: An RDF Graph

63

Definition:

A direct mapping M is a total function from the set of all (R, Σ, I) to the set of all RDF graphs

The Direct Mapping DM

• Relational Schema to OWL – S.H. Tirmizi, J.F. Sequeda and D.P. Miranker.

Translating SQL Applications to the Semantic Web. DEXA 2008

• Relational Data to RDF– M. Arenas, A. Bertails, E. Prud’hommeaux and J.F.

Sequeda. A Direct Mapping of Relational Data to RDF. W3C Recommendation. 27 September 2012

64

65

IR, Σ Predicates to

store (R, Σ, I)Predicates to

Store Ontology O

Datalog Rules to generate O from R, Σ

Datalog Rules to generate

RDF from O and I

Datalog Rules to generate OWL from O

OWL

RDF

Direct Mapping RDB to RDF and OWL

Running Example

Consider the following relational schema:– person(ssn, name, age) : ssn is the primary key– student(id, degree, ssn) : id is the primary key,

ssn is a foreign key to ssn in person

Consider the following instance:

ssn name age

123 Juan 26

456 Marcelo 27

789 Daniel NULL

id degree ssn

1 Math 789

2 EE 456

3 CS 123

personstudent

66

Input: Relational Schema

• Rel(r) : – Rel(student)

• Attr(a, r) : – Attr(degree, student)

• PKn(a1, … , an, r) : – PK1(id, student)

• FKn(a1, … , an, r, b1, … , bn, s) : – FK1(ssn, student, ssn, person)

67

id degree ssn

1 Math 789

2 EE 456

3 CS 123

student

Input: Instances

• Value(v, a, t, r)– Value( 1, id, t1, student)– Value( Math, degree, t1, student)– Value( 789, ssn, t1, student)– Value( 2, id, t2, student)– Value( EE, degree, t2, student)– Value( 456, ssn, t2, student)– Value( 3, id, t3, student)– Value( CS, degree, t3, student)– Value( 123, ssn, t3, student)

68

student

id degree ssn

1 Math 789

2 EE 456

3 CS 123

Mapping to OWL

69

Triple(http://ex.org/person, rdf:type, owl:Class)

Triple(U,"rdf:type","owl:Class") ← Class(R), ClassIRI(R, U)

ClassIRI(R, X) ← Class(R), Concat2(base, R, X)

Class(X) ← Rel(X), ¬IsBinRel(X)

IsBinRel(X) ← BinRel(X, A, B, S, C, T, D)

BinRel(R, A, B, S, C, T, D) ← PK2(A, B, R), ¬ThreeAttr(R), FK1(A,R,C,S),R = S, FK1(B,R,D,T),R = T, ¬TwoFK(A, R), ¬TwoFK (B, R), ¬OneFK(A, B, R), ¬FKTo(R)

Mapping to RDF

70

Table triples: for each relation, store the tuples that belongs to it

Triple(http://ex.org/person#ssn=123, rdf:type,

http://ex.org/person)

Mapping to RDF

71

Table triples: for each relation, store the tuples that belongs to it

Triple(http://ex.org/person#ssn=123 , rdf:type,

http://ex.org/person )Literal triples: for each tuple, store the values in each of its attributes

Triple(http://ex.org/person#ssn=123 , http://ex.org/person#name , “Juan”)

Mapping to RDF

72

Reference triples: store the references generated by the FKs

Triple(http://ex.org/student#id=3 ,

http://ex.org/student,person#ssn,ssn ,

http://ex.org/person#ssn=123 )

Mapping to RDF

73

Triple(http://ex.org/person#ssn=123 , http://ex.org/person#name , “Juan”)

Triple(U,V, W) ← DTP(A,R), Value(W, A, T, R), W != NULL.TupleID(T,R,U), DTP_IRI(A,R,V)

DTP(A,R) Attr(A,R), ¬IsBinRel(X)

DTP_IRI(A, R, X) ← DTP(A,R) , Concat4(base, R,”#”, A, X)

TupleID(T, R, X) Class(R), PKn(A1, …, An, R),Value(V1, A1, T, R), …, Value(Vn, An, T, R),RowIRIn(V1, …, Vn, A1, …, An, T, R, X)

Information Preservation

74

IR, Σ

M(R, Σ, I)

M- (M(R, Σ, I))

Proof: Provide a computable mapping M-

Theorem: The Direct Mapping is information preserving

Relational Algebra tuples vs. SPARQL mappings

75

ssn name age

789 Daniel NULL

persont.ssn = 789t.name = Danielt.age = NULL

Then, tr(t) = μ :• Domain of μ is {?ssn, ?name}• μ(?ssn) = 789• μ(?name) = Daniel

Query Preservation

76

IR, Σ M(R, Σ, I)

Proof: By induction on the structure of QBottom-up algorithm for translating Q into Q*

Theorem: The Direct Mapping is query preserving

eval(Q*, M(R, Σ, I))eval(Q, I) =tr( )

Example of Query Preservation

πname, age( σdegree = EE (student) person)

77

ssn name age

123 Juan 26

456 Marcelo 27

789 Daniel NULL

id degree ssn

1 CS 789

2 EE 456

3 Math 123

personstudent

Example of Query Preservation

πname, age( σdegree = EE (student) person)

78

SELECT ?id ?degree ?ssnWHERE { ?x rdf:type <…/student>. OPTIONAL{?x <…/student#id> ?id. } OPTIONAL{?x <…/student#degree> ?degree. } OPTIONAL{?x <…/student#ssn> ?ssn. }}

id degree ssn

1 CS 789

2 EE 456

3 Math 123

student

Example of Query Preservation

πname, age( σdegree = EE (student) person)

79

SELECT ?id ?degree ?ssnWHERE { ?x rdf:type <…/student>. OPTIONAL{?x <…/student#id> ?id. } OPTIONAL{?x <…/student#degree> ?degree. } OPTIONAL{?x <…/student#ssn> ?ssn. } FILTER(?degree != “EE” && bound(?degree) )}

id degree ssn

1 CS 789

2 EE 456

3 Math 123

student

Example of Query Preservation

πname, age( σdegree = EE(student) person)

80

SELECT ?ssn ?name ?ageWHERE { ?x rdf:type <…/person>. OPTIONAL{?x <…/person#ssn> ?ssn. } OPTIONAL{?x <…/person#name> ?name. } OPTIONAL{?x <…/person#age > ?age. }}

ssn name age

123 Juan 26

456 Marcelo 27

789 Daniel NULL

person

πname,age( σdegree = EE(student) person)

81

{SELECT ?ssn?name ?ageWHERE { ?x rdf:type <…/person>. OPTIONAL{?x <…/person#ssn> ?ssn. } OPTIONAL{?x <…/person#name> ?name. } OPTIONAL{?x <…/person#age > ?age. } FILTER(bound(?ssn)}}

{SELECT ?id ?degree ?ssnWHERE { ?x rdf:type <…/student>. OPTIONAL{?x <…/student#id> ?id. } OPTIONAL{?x <…/student#degree> ?degree. } OPTIONAL{?x <…/student#ssn> ?ssn. } FILTER(?degree != “EE” && bound(?degree) ) FILTER(bound(?ssn)}}

SELECT ?name ?age{

}

Monotonicity

82

M(R, Σ, I1)

M(R, Σ, I2)

I1 I2 M(R, Σ, I1) M(R, Σ, I2)

I1 R, Σ

I2 R, Σ

Proof: All negative atoms in the Datalog rules refer to the schema, where the schema is fixed.

Theorem: The Direct Mapping is monotone

Semantics Preservation

83

M(R, Σ, I)

M(R, Σ, I)

IR, Σ

IR, Σ

I satisfies Σ

I does not satisfies Σ

Consistent under OWL semantics

Not consistent under OWL semantics

DM is not Semantics Preserving

84

ssn name

123 Juan

123 Marcelo

person

ssn is the PK

I does not satisfy Σ

#ssn=123person#name Juan

Marcelo

person#name

however DM(R, Σ, I) is consistentunder OWL semantics

DM(R, Σ, I) 123person#ssn

Theorem: No monotone direct mapping is semantics preserving

Proof: By contradiction.

Extending DM for Semantics Preservation

• Family of Datalog rules to determine violation – Primary Keys– Foreign Keys

• Non-monotone direct mapping• Information Preserving• Query Preserving• Semantics Preserving

85

Summary

• The Direct Mapping DM– Formally defined semantics using Datalog– Consider RDBs that may contain NULL values– Monotone, Information and Query Preserving

• If you migrate your RDB to the Semantic Web using a monotone direct mapping, be prepared to experience consistency when what one would expect is inconsistency.

86

W3C Direct Mapping

• Only maps Relational Data to RDF– Does not consider schema

• Monotone• Not Information Preserving

– Because it does not direct map the schema• Not Semantics Preserving

87

Questions?

Next: From Direct Mapping to R2RML

Backup Slides

89

DM is not Semantics Preserving

90

ssn name

123 Juan

123 Marcelo

person

ssn is the PK

I does not satisfy Σ

#ssn=123person#name Juan

Marcelo

person#name

however DM(R, Σ, I) is consistentunder OWL semantics

DM(R, Σ, I)

123person#ssn

PREFIX ex: <http://ex.org/>PREFIX person: <http://ex.org/person#>

ex:person rdf:type owl:Class .person:name rdf:type owl:DatatypeProperty ;

rdfs:domain ex:person .person:ssn rdf:type owl:DatatypeProperty ;

rdfs:domain ex:person .

What about owl:hasKey

• Student/id=NULL, rdf:type Student• Student/id=1, degree, math

• owl:hasKey can not make me have a value

91

id degree

NULL Math

student

owl:hasKey

• Tuple 1– Student/id=1, student#id, 1– Student/id=1, degree, math

• Tuple 2– Student/id=1, student#id, 1– Student/id=1, degree, EE

• DM generate the same IRI Student/id=1 for two different tuples. This does not violate owl:hasKey

92

id degree

1 Math

1 EE

student

owl:hasKey

• Tuple 1– Student/id=1, student#id, 1– Student/id=1, degree, math

• Tuple 2– Student/id=1, student#id, 1– Student/id=1, degree, EE

• However, UNA works:– Student/id=1 differentFrom Student/id=1

• However a new DM that generates IRIs based on tuple ids– Owl:hasKey would work

93

id degree

1 Math

1 EE

student

Semantics Preserving DMpk

• Find violation of PK• Create artificial triple that will generate

contradiction

94

Semantics Preserving DMpk+fk

• Find violation of FK• Create artificial triple that will generate

contradiction

95

RDB2RDF TutorialFrom Direct Mapping to R2RML

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

97

RDF

R2RML

RelationalDatabase

R2RMLMapping

Engine

OWLOntologies

(e.g FOAF, etc)

R2RMLFile

98

W3C R2RML

• Input– Database (schema and data)– Target Ontologies– Mappings between the Database and Target

Ontologies in R2RML• Output

– RDF graph

99

RDF

RelationalDatabase

R2RMLMapping

Engine

OWLOntologies

(e.g FOAF, etc)

R2RMLFile

Direct Mapping helps to “bootstrap”

100

Direct Mapping as R2RML

ID

NAME

AGECID

1 Alice 25 100

2 BobNULL 100

Person

CIDNAM

E

100 Austin

200Madri

d

City

<Person/ID=1>

<City/CID=100>

Alice25

Austin

<Person/ID=2>

Alice

<City/CID=200> Madrid

<Person#NAME><Person#AGE> <Person#NAME>

<Person#NAME>

<Person#NAME>

<Person#ref-CID><Person#ref-CID>

How can this be represented as R2RML?

101

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ].

Direct Mapping as R2RML

102

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ].

Direct Mapping as R2RML

Logical Table: What is being mapped?

SubjectMap: How to generate the Subject?

PredicateObjectMap: How to generate the Predicate and Object?

103

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ] .

Logical Table

What is being mapped?

104

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ] .

Subject URI Template

Subject URI

<Subject URI> rdf:type <Class URI>

105

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ] .

Predicate URI Constant

Predicate URI

106

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/ID={ID}"; rr:class <http://www.ex.com/Person> ]; rr:predicateObjectMap [ rr:predicate <http://www.ex.com/Person#NAME> ; rr:objectMap [rr:column ”NAME" ] ] .

Object Column Value

Object Literal

107

<http://www.ex.com/Person/ID=1>

<http://www.ex.com/Person#NAME>

<http://www.ex.com/Person/1>

foaf:name

“Ugly” vs “Cool” URIs

foaf:Person

<http://www.ex.com/Person>

108

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

Customization

Customized Subject URI

Customized Class

109

What if …

IDNAM

EGENDE

R

1 Alice F

2 Bob M

Person

<Person/1> Alicefoaf:name

<Woman>

rdf:type

SELECT ID, NAME FROM Person WHERE GENDER = "F"

R2RML View

110

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:sqlQuery

“””SELECT ID, NAMEFROM Person WHERE gender = “F” “””];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class <http://www.ex.com/Woman> ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

R2RML View

Query instead of table

111

Quick Overview of R2RML

• Manual and Customizable Language

• Learning Curve

• Direct Mapping bootstraps R2RML

• RDF represents the structure and ontology of mapping author’s choice

Questions?

Next: R2RML

RDB2RDF TutorialR2RML

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

Outline

• Logical Tables: What is being mapped• Term Maps: How to create RDF terms• How to create Triples from a table• How to create Triples between two tables• Languages• Datatypes

R2RML Mapping

Input Database

Logical Table

Logical Table = base table or view or SQL query

R2RML View = SQL Query

R2RML Mapping

sid name pid

1 Juan 100

2 Martin 200

pid name

100 Dan

200 Marcelo

Student

Professor

ex:Student1 rdf:type ex:Student .ex:Student2 rdf:type ex:Student .ex:Professor100 rdf:type ex:Professor .ex:Professor200 rdf:type ex:Professor .ex:Student1 foaf:name “Juan”.…

R2RML Mapping

R2RML Mapping

R2RML Mapping

• A R2RML Mapping M consists of a finite set TM TripleMaps.

• Each TM ∈TM consists of a tuple (LT, SM, POM)– LT: LogicalTable– SM: SubjectMap– POM: PredicateObjectMap

• Each POM∈POM consists of a pair (PM, OM)*– PM: PredicateMap– OM: ObjectMap

* For simplicity

R2RML Mapping

• An R2RML Mapping is represented as an RDF Graph itself.

• Associated RDFS schema– http://www.w3.org/ns/r2rml

• Turtle is the recommended syntax

119

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

LogicalTable

• Tabular SQL query result that is to be mapped to RDF– rr:logicalTable

1. SQL base table or view– rr:tableName

2. R2RML View– rr:sqlQuery

121

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:sqlQuery

“””SELECT ID, NAMEFROM Person WHERE gender = “F” “””];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class <http://www.ex.com/Woman> ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

How to create RDF terms that define S, P and O?

• RDF term is either an IRI, a blank node, or a literal

• Answer1. Constant Value2. Value in the database

a. Raw Value in a Columnb. Column Value applied to a template

TermMap

• A TermMap is a function that generates an RDF Term from a logical table row.

• RDF Term is either a IRI, or a Blank Node, or a Literal

Logical Table Row

TermMap IRI

Bnode

Literal

RDF Term

TermMap

• A TermMap must be exactly on of the following– Constant-valued TermMap– Column-valued TermMap– Template-valued TermMap

• If TermMaps are used to create S, P, O, then– 3 ways to create a subject– 3 ways to create a predicate– 3 ways to create an object

Stemplate

Ptemplate

Otemplate

Oconstant

Ocolumn

PConstant

Otemplate

Oconstant

Ocolumn

Pcolumn

OtemplateOconstant

Ocolumn

Sconstant

Ptemplate

Otemplate

Oconstant

Ocolumn

PConstant

Otemplate

Oconstant

Ocolumn

Pcolumn

OtemplateOconstant

Ocolumn

Scolumn

Ptemplate

Otemplate

Oconstant

Ocolumn

PConstant

Otemplate

Oconstant

Ocolumn

Pcolumn

OtemplateOconstant

Ocolumn

How many ways to create a Triple?

Constant-valued TermMap

• A TermMap that ignores the logical table row and always generates the same RDF term

• rr:constant

• Commonly used to generate constant IRIs as the predicate

129

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [rr:column ”NAME" ] ] .

Column-valued TermMap

• A TermMap that maps a column value of a column name in a logical table row

• rr:column

• Commonly used to generate Literals as the object

131

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [rr:column ”NAME" ] ] .

Template-valued TermMap

• A TermMap that maps the column values of a set of column names to a string template.

• A string template is a format that can be used to build strings from multiple components.

• rr:template

• Commonly used to generate IRIs as the subject or concatenate different attributes

133

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [rr:column ”NAME" ] ] .

Commonly used…

• … but any of these TermMaps can be used to create any RDF Term (s,p,o). Recall:– 3 ways to create a subject– 3 ways to create a predicate– 3 ways to create an object

• Template-valued TermMap are commonly used to create an IRI for a subject, but can be used to create Literal for an object.

• How to specify the term (IRI or Literal in this case)?

TermType

• Specify the type of a term that a TermMap should generate

• Force what the RDF term should be• Three types of TermType:

– rr:IRI– rr:BlankNode– rr:Literal

136

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [ rr:template ”{FIRST_NAME} {LAST_NAME}”; rr:termType rr:Literal; ] ] .

137

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template ”person{ID}"; rr:termType rr:BlankNode; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [rr:column ”NAME" ] ] .

TermType (cont…)

• Can only be applied to Template and Column valued TermMap

• Applying to Constant-valued TermMap has no effect– i.e If the constant is an IRI, the term type is

automatically an IRI

TermType Rules

• If the Term Map is for a 1. Subject TermType = IRI or Blank Node2. Predicate TermType = IRI 3. Object TermType = IRI or Blank Node or Literal

TermType is Optional

• If a TermType is not specified then– Default = IRI– Unless it’s for an object being defined by a

Column-based TermMap or has a language tag or specified datatype, then the TermType is a Literal

• That’s why if there is a template in an ObjectMap, it will always generate an IRI, unless a TermType to Literal is specified.

141

rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [ rr:template ”{FIRST_NAME} {LAST_NAME}”; rr:termType rr:Literal; ] ]

rr:predicateObjectMap [ rr:predicateMap [rr:constant ex:role ] rr:objectMap [ rr:template ”http://ex.com/role/{role}” ] ]

rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name ] rr:objectMap [ rr:template ”{FIRST_NAME} {LAST_NAME}” ] ]

Now we have the elements to create Triples

Generating SPO

• TermMap that specifies what RDF term should be for S, P, O– SubjectMap– PredicateMap– ObjectMap

SubjectMap

• SubjectMap is a TermMap• rr:subjectMap• Specifies what the subject of a triple should be• 3 ways to create a subject

– Template-valued Term Map– Column-valued Term Map– Constant-valued Term Map

• Has to be an IRI or Blank Node

SubjectMap

• SubjectMaps are usually Template-valued TermMap

• Use-case for Column-valued TermMap– Use a column value to create a blank node– URI exist as a column value

• Use-case for Constant-valued TermMap– For all tuples: <CompanyABC> <consistsOf> <Dep{id}>

SubjectMap

• Optionally, a SubjectMap may have one or more Class IRIs associated– This will generate rdf:type triples

• rr:class

148

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

Optional

PredicateObjectMap

• A function that creates one or more predicate-object pairs for each logical table row.

• rr:predicateObjectMap• It is used in conjunction with a SubjectMap to

generate RDF triples in a TriplesMap.• A predicate-object pair consists of*

– One or more PredicateMaps– One or more ObjectMaps or

ReferencingObjectMaps

150

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name];

rr:objectMap [rr:column ”NAME" ] ] .

PredicateMap

• PredicateMap is a TermMap• rr:predicateMap• Specifies what the predicate of a triple should

be• 3 ways to create a predicate

– Template-valued Term Map– Column-valued Term Map– Constant-valued Term Map

• Has to be an IRI

PredicateMap

• PredicateMaps are usually Constant-valued TermMap

• Use-case for Column-valued TermMap– …

• Use-case for Template-valued TermMap– …

153

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name]; rr:objectMap [rr:column ”NAME" ] ] .

154

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [rr:column ”NAME" ] ] .

Shortcut!

Constant Shortcut Properties

• ?x rr:predicate ?y• ?x rr:predicateMap [ rr:constant ?y ]

• ?x rr:subject ?y• ?x rr:subjectMap [ rr:constant ?y ]

• ?x rr:object ?y• ?x rr:objectMap [ rr:constant ?y ]

ObjectMap

• ObjectMap is a TermMap• rr:objectMap• Specifies what the object of a triple should be• 3 ways to create a predicate

– Template-valued Term Map– Column-valued Term Map– Constant-valued Term Map

• Has to be an IRI or Literal or Blank Node

ObjectMap

• ObjectMaps are usually Column-valued TermMap

• Use-case for Template-valued TermMap– Concatenate values– Create IRIs

• Use-case for Constant-valued TermMap– All rows in a table share a role

158

@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”Person”]; rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ]; rr:predicateObjectMap [ rr:predicateMap [rr:constant foaf:name]; rr:objectMap [rr:column ”NAME" ] ] .

sid name pid

1 Juan 100

2 Martin 200

Student

@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student2 rdf:type ex:Student .

TripleMap

Example 1

• We now have sufficient elements to create a mapping that will generate– A Subject IRI– rdf:Type triple(s)

Example 1@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ].

Logical Table is a Table Name

SubjectMap is aTemplate-valued TermMapAnd it has one Class IRI

sid name pid

1 Juan 100

2 Martin 200

Student

@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student1 ex:name “Juan” .ex:Student2 rdf:type ex:Student .ex:Student2 ex:name “Martin” .

TripleMap

Example 2

Example 2@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:name; rr:objectMap [ rr:column “name”]; ].

Logical Table is a Table Name

SubjectMap is aTemplate-valued TermMapAnd it has one Class IRI

PredicateObjectMap

PredicateMap which is a Constant-valued TermMap

ObjectMap which is a Column-valued TermMap

sid name pid

1 Juan 100

2 Martin 200

Student@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student1 ex:comment “Juan is a Student” .ex:Student2 rdf:type ex:Student .ex:Student2 ex:comment “Martin is a Student” .

TripleMap

Example 3

Example 3@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:comment; rr:objectMap [ rr:template “{name} is a Student”; rr:termType rr:Literal; ]; ].

Logical Table is a Table Name

SubjectMap is aTemplate-valued TermMapAnd it has one Class IRI

PredicateObjectMap

PredicateMap which is a Constant-valued TermMap

ObjectMap which is a Template-valued TermMap

TermType

sid name pid

1 Juan 100

2 Martin 200

Student@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student1 ex:webpage <http://ex.com/Juan>.ex:Student2 rdf:type ex:Student .ex:Student2 ex:webpage <http://ex.com/Martin>.

TripleMap

Example 4

Example 4@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:webpage; rr:objectMap [ rr:template “http://ex.com/{name}”; ]; ].

Logical Table is a Table Name

SubjectMap is aTemplate-valued TermMapAnd it has one Class IRI

PredicateObjectMap

PredicateMap which is a Constant-valued TermMap

ObjectMap which is a Template-valued TermMap

Note that there is not TermType

sid name pid

1 Juan 100

2 Martin 200

Student@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student1 ex:studentType ex:GradStudent.ex:Student2 rdf:type ex:Student .ex:Student2 ex:studentType ex:GradStudent.

TripleMap

Example 5

Example 6@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:studentType; rr:object ex:GradStudent ; ].

Logical Table is a Table Name

SubjectMap is aTemplate-valued TermMapAnd it has one Class IRI

PredicateObjectMap

PredicateMap which is a Constant-valued TermMap

ObjectMap which is a Constant-valued TermMap

RefObjectMap

• A RefObjectMap (Referencing ObjectMap) allows using the subject of another TriplesMap as the object generated by a ObjectMap.

• rr:objectMap• A RefObjectMap defined by

– Exactly one ParentTripleMap, which must be a TripleMap

– May have one or more JoinConditions

171

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ];

rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;rr:joinCondition [

rr:child “CID”;rr:parent “CID”;

]]

] .

<TriplesMap2> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”City" ];

rr:subjectMap [ rr:template "http://ex.com/City/{CID}"; rr:class ex:City ];

rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [ rr:column ”TITLE" ] ] .

RefObjectMap

ParentTripleMap

• The referencing TripleMap• rr:parentTriplesMap

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ];

rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;rr:joinCondition [

rr:child “CID”;rr:parent “CID”;

]]

] .

Parent TriplesMap

JoinCondition

• Join between child and parent attribuets• rr:joinCondition

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ];

rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;rr:joinCondition [

rr:child “CID”;rr:parent “CID”;

]]

] .

JoinCondition

174

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ];

rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}"; rr:class foaf:Person ];

rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;rr:joinCondition [

rr:child “CID”;rr:parent “CID”;

]]

] .

<TriplesMap2> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”City" ];

rr:subjectMap [ rr:template "http://ex.com/City/{CID}"; rr:class ex:City ];

rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [ rr:column ”TITLE" ] ] .

RefObjectMap

Parent TriplesMap

JoinCondition

JoinCondition

• Child Column which must be the column name that exists in the logical table of the TriplesMap that contains the RefObjectMap

• Parent Column which must be the column name that exists in the logical table of the RefObjectMap’s Parent TriplesMap.

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ]; ... rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;

rr:joinCondition [rr:child

“CID”;rr:parent

“CID”;]]

] .<TriplesMap2> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”City" ];

... .

JoinCondition• Child Query

– The Child Query of a RefObjectMap is the LogicalTable of the TriplesMap containing the RefObjectMap

• Parent Query– The ParentQuery of a

RefObjectMap is the LogicalTable of the Parent TriplesMap

• If the ChildQuery and ParentQuery are not identical, then a JoinCondition must exist

<TriplesMap1> a rr:TriplesMap; rr:logicalTable [ rr:tableName”Person" ]; ... rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [

rr:parentTripelMap <TripleMap2>;

rr:joinCondition [rr:child

“CID”;rr:parent

“CID”;]]

] .<TriplesMap2> a rr:TriplesMap; rr:logicalTable [ rr:tableName ”City" ];

... .

sid name pid

1 Juan 100

2 Martin 200

pid name

100 Dan

200 Marcelo

Student

Professor

ex:Student1 rdf:type ex:Student .ex:Student2 rdf:type ex:Student .ex:Professor100 rdf:type ex:Professor .ex:Professor200 rdf:type ex:Professor .ex:Student1 ex:hasAdvisor ex:Professor100 .ex:Student2 ex:hasAdvisor ex:Professor200

R2RML Mapping

Example 7

@prefix rr: <http://www.w3.org/ns/r2rml#>. @prefix ex: <http://example.com/ns/>.

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:hasAdvisor; rr:objectMap [ rr:parentTriplesMap <#TriplesMap2>; rr:joinCondition [ rr:child “pid”; rr:parent “pid”; ] ] ]. <#TriplesMap2>

rr:logicalTable [ rr:tableName ”Professor”]; rr:subjectMap [ rr:template "http://example.com/ns/{pid}"; rr:class ex:Professor; ].

RefObjectMap

Parent TriplesMap

JoinCondition

Summary

Languages

• TermMap with a TermType of rr:Literal may have a language tag

• rr:language <#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:comment; rr:objectMap [ rr:column “comment”; rr:language “en”; ]; ].

sid name comment

1 Juan Excellent Student

2 Martin Wonderful student

Student

@prefix ex: <http://example.com/ns/>.

ex:Student1 rdf:type ex:Student .ex:Student1 ex:comment “Excellent Student”@en .ex:Student2 rdf:type ex:Student .ex:Student2 ex:comment “Wonderful Student”@en .

Issue with Languages

• What happens if language value is in the data?

ID COUNTRY_ID LABEL LANG

1 1 United States en

2 1 Estados Unidos es

3 2 England en

4 2 Inglaterra es

@prefix ex: <http://example.com/ns/>.

ex:country1 rdfs:label “United States”@en .ex:country1 rdfs:label “Estados Unidos”@es .

ex:country2 rdfs:label “England”@en .ex:country2 rdfs:label “Inglaterra”@es .

ID COUNTRY_ID LABEL LANG

1 1 United States en

2 1 Estados Unidos es

3 2 England en

4 2 Inglaterra es

?

Issue with Languages

• Mapping for each language

<#TripleMap_Countries_EN>a rr:TriplesMap;rr:logicalTable [ rr:sqlQuery """SELECT COUNTRY_ID,

LABEL, LANG, FROM COUNTRY WHERE LANG = ’en'""" ]; rr:subjectMap [ rr:template "http://example.com/country{COUNTRY_ID}" ]; rr:predicateObjectMap [ rr:predicate rdfs:label; rr:objectMap [

rr:column “LABEL”; rr:language “en”;];

].

Language Extension

• Single mapping for all languages

<#TripleMap_Countries_EN>a rr:TriplesMap;rr:logicalTable [ rr:tableName ”COUNTRY" ];

rr:subjectMap [ rr:template "http://example.com/country{COUNTRY_ID}" ]; rr:predicateObjectMap [ rr:predicate rdfs:label; rr:objectMap [

rr:column “LABEL”; rrx:languageColumn “LANG”;];

].

Column Value as Language

Datatypes

• TermMap with a TermType of rr:Literal• TermMap does not have rr:language

<#TriplesMap1> rr:logicalTable [ rr:tableName ”Student”]; rr:subjectMap [ rr:template "http://example.com/ns/{sid}"; rr:class ex:Student; ]; rr:predicateObjectMap [ rr:predicate ex:startDate; rr:objectMap [ rr:column “start_date”; rr:datatype xsd:date; ]; ].

Summary of Terminology• R2RML Mapping• Logical Table• Input Database• R2RML View• TriplesMap• Logical Table Row• TermMap• TermType• SubjectMap• PredicateObjectMap• PredicateMap• ObjectMap

• Constant-valued TermMap• Column-valued TermMap• Template-valued TermMap• RefObjectMap• JoinConditions• ChildQuery• ParentQuery• Language• Datatype

Questions?

Next: ETL and Musicbrainz

RDB2RDF TutorialETL and Musicbrainz

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

191

Context

RDFData Management

Relational Database to RDF(RDB2RDF)

TriplestoresWrapper Systems

Extract-Transform-Load(ETL)

RDBMS-backedTriplestores

NativeTriplestores

NoSQLTriplestores

Extract – Transform – Load (ETL)

Triplestore

SPARQL

RelationalDatabase

RDB2RDFDump

193

EUCLID Scenario

Visualization Module

Metadata

Streaming providers

Physical Wrapper

Downloads

Dat

a ac

quis

ition

R2R Transf.LD Wrapper

Musical Content

Appl

icati

on

Analysis & Mining Module

LD D

atas

etAc

cess

LD Wrapper

RDF/ XML

Integrated Dataset

Interlinking CleansingVocabulary Mapping

SPARQL Endpoint

Publishing

RDFa

Other content

194

W3C RDB2RDF• Task: Integrate data from

relational DBMS with Linked Data

• Approach: map from relational schema to semantic vocabulary with R2RML

• Publishing: two alternatives –– Translate SPARQL into SQL on

the fly– Batch transform data into RDF,

index and provide SPARQL access in a triplestore

LD D

atas

etAc

cess

Integrated Data in

Triplestore

Interlinking CleansingVocabulary Mapping

SPARQL Endpoint

Publishing

Dat

a ac

quis

ition

R2RMLEngine

RelationalDBMS

RDB2RDF

195

MusicBrainz Next Gen Schema• artist

As pre-NGS, but further attributes

• artist_creditAllows joint credit

• release_groupCf. ‘album’

versus:

• release• medium

• track• tracklist

• work• recording

https://wiki.musicbrainz.org/Next_Generation_Schema

RDB2RDF

196

Music Ontology• MusicArtist

– ArtistEvent, member_of

• SignalGroup ‘Album’ as per Release_Group

• Release– ReleaseEvent

• Record• Track• Work• Compositionhttp://musicontology.com/

RDB2RDF

197

Scale• MusicBrainz RDF derived via R2RML:

lb:artist_member a rr:TriplesMap ; rr:logicalTable [rr:sqlQuery """SELECT a1.gid, a2.gid AS band FROM artist a1 INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0 INNER JOIN link ON l_artist_artist.link = link.id INNER JOIN link_type ON link_type = link_type.id INNER JOIN artist a2 on l_artist_artist.entity1 = a2.id WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ; rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ; rr:predicateObjectMap [rr:predicate mo:member_of ; rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ; rr:termType rr:IRI]] .

300M Triples

Musicbrainz

• Musicbrainz Dumps:– http://mbsandbox.org/~barry/

• Musicbrainz R2RML Mappings– https://github.com/LinkedBrainz/MusicBrainz-R2RML

• 30 mins to generate 150M triples with Ultrawrap– 8 Xeon cores, 16 GB Ram (2GB are usually free)– Should be less but server was overloaded– It use to be 8+ hours using D2RQ on a dedicated

machine

Musicbrainz Dump Statistics

(Lead) Table Triples Time (s)area 59798 2artist 36868228 423dbpedia 172017 13label 201832 3medium 18069143 163recording 11400354 209release_group 3050818 31release 9764887 151track 75506495 794work 1728955 20

156822527 1809

200

R2RML Class Mapping

• Mapping tables to classes is ‘easy’:

lb:Artist a rr:TriplesMap ; rr:logicalTable [rr:tableName "artist"] ; rr:subjectMap [rr:class mo:MusicArtist ; rr:template "http://musicbrainz.org/artist/{gid}#_"] ; rr:predicateObjectMap [rr:predicate mo:musicbrainz_guid ; rr:objectMap [rr:column "gid" ; rr:datatype xsd:string]] .

RDB2RDF

RDB2RDF 201

R2RML Property Mapping

• Mapping columns to properties can be easy:

lb:artist_name a rr:TriplesMap ; rr:logicalTable [rr:sqlQuery """SELECT artist.gid, artist_name.name FROM artist INNER JOIN artist_name ON artist.name =

artist_name.id"""] ; rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ; rr:predicateObjectMap [rr:predicate foaf:name ; rr:objectMap [rr:column "name"]] .

202

NGS Advanced Relations• Major entities (Artist, Release Group, Track, etc.) plus

URL are paired(l_artist_artist)

• Each pairingof instancesrefers to a Link

• Links have types (cf. RDF properties)and attributes

http://wiki.musicbrainz.org/Advanced_Relationship

RDB2RDF

203

Advanced Relations Mapping• Mapping advanced relationships (SQL joins):lb:artist_member a rr:TriplesMap ; rr:logicalTable [rr:sqlQuery """SELECT a1.gid, a2.gid AS band FROM artist a1 INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0 INNER JOIN link ON l_artist_artist.link = link.id INNER JOIN link_type ON link_type = link_type.id INNER JOIN artist a2 on l_artist_artist.entity1 = a2.id WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ; rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ; rr:predicateObjectMap [rr:predicate mo:member_of ; rr:objectMap [rr:template

"http://musicbrainz.org/artist/{band}#_" ; rr:termType rr:IRI]] .

RDB2RDF

204

Advanced Relations Mapping• Mapping advanced relationships (SQL joins):lb:artist_dbpedia a rr:TriplesMap ; rr:logicalTable [rr:sqlQuery """SELECT artist.gid, REPLACE(REPLACE(url, 'wikipedia.org/wiki', 'dbpedia.org/resource'), 'http://en.', 'http://') AS url FROM artist INNER JOIN l_artist_url ON artist.id = l_artist_url.entity0 INNER JOIN link ON l_artist_url.link = link.id INNER JOIN link_type ON link_type = link_type.id INNER JOIN url on l_artist_url.entity1 = url.id WHERE link_type.gid='29651736-fa6d-48e4-aadc-a557c6add1cb' AND url SIMILAR TO 'http://(de|el|en|es|ko|pl|pt).wikipedia.org/wiki/%'"""] ; rr:subjectMap lb:sm_artist ; rr:predicateObjectMap [rr:predicate owl:sameAs ; rr:objectMap [rr:column "url"; rr:termType rr:IRI]] .

RDB2RDF

205

SPARQL Example• SPARQL versus SQLASK {dbp:Paul_McCartney mo:member dbp:The_Beatles}

SELECT …

INNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOININNER JOINWHERE AND … AND … AND … AND …

RDB2RDF

206

For exercises, quiz and further material visit our website:

@euclid_project EUCLID project EUCLIDproject

http://www.euclid-project.eu

Other channels:

eBook Course

Questions?

Next: Wrappers

RDB2RDF TutorialWrappers

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

Juan F. Sequeda

Daniel P. Miranker

Barry Norton

209

Context

RDFData Management

Relational Database to RDF(RDB2RDF)

TriplestoresWrapper Systems

Extract-Transform-Load(ETL)

RDBMS-backedTriplestores

NativeTriplestores

NoSQLTriplestores

210

SPARQL

RDF

SQL

SQLResults

SPARQL/RDFResults

RelationalDatabase

RDB2RDFMapping

Wrapper Systems

“Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms”

[Bizer and Schultz 2009]

Larger numbers are better

Results of BSBM 2009

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/index.html

Larger numbers are better

Results of BSBM 2009100M Triple Dataset

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/index.html

After March 2009, RDB2RDF systems have not been compared to RDBMS

Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structure data

[Gray et al. 2009]

Why is this happening if …

“SPARQL is equivalent, from an expressive point of you, to relational algebra”

Angles & Gutierrez 2008

Problem

• How can SPARQL queries be efficiently evaluated on a RDBMS?

• Hypothesis: Existing commercial relational database already subsume optimizations for effective SPARQL execution on relationally stored data

219

220

Nugget

1. Defined architecture based on SQL Views which allows RDBMS to do the optimization.

2. Identified two important optimizations that already exist in commercial RDBMS.

Sequeda & Miranker. Ultrawrap: SPARQL Execution on Relational Data. Journal Web Semantics 2013

221

UltrawrapCompile Time1. Translate SQL Schema

to OWL and Mapping2. Define RDF Triples,

as a View

Run Time3. SPARQL to SQL

translation4. SQL Optimizer

creates relational query plan

222

Creating Tripleview

• For every ontology element (Class, Object Property and Datatype property), create a SQL SELECT query that outputs triples

SELECT 'Product’+ptID as s, ‘label’ as p, label as oFROM Product WHERE label IS NOT NULL

S P O

Product1 label ACME Inc

Product2 label Foo Bars

ptID label prID

1 ACME Inc 4

2 Foo Bars 5

Product

223

Creating Tripleview

SELECT ‘Product’+ptID as s, prID as s_id, ‘label’ as p, label as o, NULL as o_idFROM Product WHERE label IS NOT NULL

S S_id P O O_id

Product1 1 label ACME Inc NULL

Product2 2 label Foo Bars NULL

ptID label prID

1 ACME Inc 4

2 Foo Bars 5

Product

S S_id P O O_id

Product1 1 Product#Producer Producer4 4

Product2 2 Product#Producer Producer5 5

Object Property RDF TriplesSELECT ‘Product’+ptID as s, ptID as s_id, ‘Product#Producer’ as p, ‘Producer’+prID as o, prID as o_id FROM Product

SELECT ‘Product’+ptID as s, prID as s_id, ‘rdf:type’ as p, ‘Product’ as o, NULL as o_idFROM Product

S S_id P O O_id

Product1 1 rdf:type Product NULL

Product2 2 rdf:type Product NULL

Class RDF Triples

225

Creating Tripleview (…)

• Create TripleViews (SQL View), which are unions of the SQL SELECT query that have the same datatype

CREATE VIEW Tripleview_varchar ASSELECT ‘Product’+ptID as s, ptID as s_id, ‘label’ as p, label as o, NULL as o_id FROM ProductUNION ALLSELECT ‘Producer’+prID as s, prID as s_id, ‘title’ as p, title as o, NULL as o_id FROM ProducerUNION ALL …

S S_id P O O_id

Product1 1 label ACME Inc NULL

Product2 2 label Foo Bars NULL

Producer4 4 title Foo NULL

Producer5 5 Ttitle Bars NULL

CREATE VIEW Tripleview_int ASSELECT ‘Product’+ptID as s, ptID as s_id, ‘pnum1’ as p, pnum1 as o, NULL as o_id FROM ProductUNION ALLSELECT ‘Product’+ptID as s, ptID as s_id, ‘pnum2’ as p, pnum2 as o, NULL as o_id FROM Product

S S_id P O O_id

Product1 1 pnum1 1 NULL

Product2 2 pnum1 3 NULL

Product1 1 pnum2 2 NULL

Product2 2 pnum2 3 NULL

227

SPARQL and SQL• Translating a SPARQL query to a semantically

equivalent SQL querySELECT ?label ?pnum1WHERE{ ?x label ?label. ?x pnum1 ?pnum1.}

SELECT label, pnum1FROM product

SQL on Tripleview

SELECT t1.o AS label, t2.o AS pnum1FROM tripleview_varchar t1, tripleview_int t2WHERE t1.p = 'label' AND

t2.p = 'pnum1' ANDt1.s_id = t2.s_id

What is the

Query Plan?

228

Tripleview_varchar t1

Product

π Product+’id’ AS s , ‘label’ AS p, label AS o

σlabel = NULLProducer

π Producer+’id’ AS s , ‘title’ AS p, title AS o

σtitle = NULL

U

Tripleview_int t2

Product

π Product+’id’ AS s , ‘pnum1’ AS p, pnum1 AS o

σpnum1 = NULLProduct

π Product+’id’ AS s , ‘pnum2’ AS p, pnum2 AS o

σpnum2 = NULL

U

π t1.o AS label, t2.o AS pnum1

σp = ‘label’ σp = ‘pnum1’

CONTRADICTION

CONTRADICTION

229

Detection of Unsatisfiable Conditions

• Determine that the query result will be empty if the existence of another answer would violate some integrity constraint in the database.

• This would imply that the answer to the query is null and therefore the database does not need to be accessed

Chakravarthy, Grant and Minker. (1990) Logic-Based Approach to Semantic Query Optimization.

230

Product

π Product+’id’ AS s , ‘label’ AS p, label AS o

σlabel = NULL

Product

π Product+’id’ AS s , ‘pnum1’ AS p, pnum1 AS o

σpnum1 = NULL

π t1.o AS label, t2.o AS pnum1

Join on the same table? REDUNDANT

231

Self Join Elimination• If attributes from the same table are projected

separately and then joined, then the join can be dropped

SELECT label, pnum1 FROM product WHERE

id = 1

SELECT p1.label, p2.pnum1 FROM product p1, product p2 WHERE

p1.id = 1 and p1.id = p2.id

SELECT p1.id FROM product p1, product p2 WHERE

p1.pnum1 >100 and p2.pnum2 < 500 and p1.id = p2.id

SELECT id FROM product WHERE

pnum1 > 100 and pnum2 < 500

Self Join Elimination of Projection

Self Join Elimination of Selection

232

Product

σlabel = NULL AND pnum1 = NULL

π label, pnum1

Evaluation

• Use Benchmarks that stores data in relational databases, provides SPARQL queries and their semantically equivalent SQL queries

• BSBM - 100 Million Triples• Barton – 45 million triples

234

Detection of Unsatisfiable

Conditions

Self Join

Elimination

MYSQL

MSSQL

ORACLE

DB2

✖✔

✖✖

✖ ✔✔ ✔

Ultrawrap Experiment

Augmented Ultrawrap Experiment

• Implemented DoUC– Hash predicate to SQL query– Few LOC

SPARQL as Fast as SQL

237

Berlin Benchmark on 100 Million Triples on Oracle 11g using Ultrawrap

Discussion

• Self join elimination

• Push Selects and Join Predicates

• Join Ordering

• Left Outer Join

Questions?

Next: Hands-On