Mapping Hierarchical Sources into RDF using the RML Mapping Language
Anastasia Dimou (1), Miel Vander Sande (1), Jason Slepicka (2), Pedro Szekely (2),
Erik Mannens (1), Craig Knoblock (2), Rik Van de Walle (1)
(1) Ghent University – iMinds – Multimedia Lab
(2) University of Southern California – Information Sciences Institute – Department of Computer Science
http://rml.io
IEEE-ICSC14
Newport Beach, California, 18th June 2014
Most of the data that we would like to be able to query as Linked Open Data
exists in formats other than RDF
There are…
over 11,000 APIs according to ProgrammableWeb.org
only 74 of which return results in RDF
But more than 5000
return results in JSON or XML
Many languages, tools and approaches
were proposed
to convert data from relational databases to RDF
Relational Database to RDF (R2RML W3C)
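[Diagram: the data owner/publisher defines R2RML mappings; an R2RML processor applies them to the DB and generates RDF]
[Diagram: with DB, CSV, JSON and XML sources, the R2RML mappings and processor cover only the DB; CSV, JSON and XML are each converted to RDF separately]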
lack of uniform definitions to describe mapping rules for heterogeneous sources
lack of interoperable definitions that would allow the re-use of mapping rules
across different implementations
lack of reusable definitions that would allow the re-use of mapping rules
for representing data in the same or different formats
mapping data
on a per-source and per-format basis
or on case-specific basis
Uniform way of defining mappings
for heterogeneous sources
that can be re-used across data
in the same or different formats
and be interoperable
across different implementations
[Diagram: the data owner/publisher defines mapping definitions; one processor applies them to DB, CSV, JSON and XML sources and generates RDF]
any format to RDF
RDF Mapping Language (RML)
generic scalable mapping language
for mapping heterogeneous resources into RDF
in an integrable and interoperable fashion
superset of the W3C standardized
R2RML mapping language
http://semweb.mmlab.be/ns/rml
Relational Database to RDF
Mapping Language
(R2RML)
R2RML mapping document
NAME | BIRTH_DATE | DEATH_DATE
Robert Theodore McCall | 1919-12-23 | 2010-02-26
Ronald Anderson | 1929-12-06 |
Triples Map
Logical Table
Table Name
<#ArtistMapping>
rr:logicalTable [
rr:tableName "ARTISTS" ].
R2RML mapping definition
Triples Map
  Logical Table
    Table Name
  Subject Map
  Predicate-Object Map (one or more)
    Predicate Map
    Object Map
R2RML mapping document
Triples Map
Subject Map
<#ArtistMapping>
rr:subjectMap [
rr:template "http://ex.com/{NAME}" ;
rr:class ex:Person ];
<http://ex.com/Robert+Theodore+McCall> a ex:Person .
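The template expansion on this slide can be sketched in a few lines of Python. This is only an illustration, not the behaviour of any particular R2RML processor: the helper name expand_template is hypothetical, and spaces are encoded as "+" (via quote_plus) solely to reproduce the IRI shown on the slide; a conforming processor may encode them differently, for example as %20.

```python
import re
from typing import Optional
from urllib.parse import quote_plus


def expand_template(template: str, row: dict) -> Optional[str]:
    """Fill each {COLUMN} placeholder with the row's value.

    Returns None when a referenced column is missing or NULL,
    so that no subject is generated for that row.
    """
    missing = False

    def fill(match):
        nonlocal missing
        value = row.get(match.group(1))
        if value is None:
            missing = True
            return ""
        # "+" for spaces is an assumption made to match the slide's output.
        return quote_plus(str(value))

    iri = re.sub(r"\{([^{}]+)\}", fill, template)
    return None if missing else iri


row = {"NAME": "Robert Theodore McCall",
       "BIRTH_DATE": "1919-12-23",
       "DEATH_DATE": "2010-02-26"}
subject = expand_template("http://ex.com/{NAME}", row)
print(f"<{subject}> a ex:Person .")
# prints: <http://ex.com/Robert+Theodore+McCall> a ex:Person .
```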
R2RML mapping document
Predicate Map
<#ArtistMapping>
rr:predicateObjectMap [
rr:predicate ex:birth_date;
rr:objectMap [ rr:column "BIRTH_DATE" ] ];
<http://ex.com/Robert+Theodore+McCall> ex:birth_date "1919-12-23" .
Predicate-Object Map
Object Map
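As a rough sketch of what a predicate-object map with a column-valued object map does, the following Python walks the ARTISTS rows and emits one triple per row whose column value is present. The function name map_column is hypothetical, and the subject IRIs are built with the same "+" encoding as on the slides.

```python
# The two rows of the ARTISTS table from the slides; None stands for SQL NULL.
rows = [
    {"NAME": "Robert Theodore McCall", "BIRTH_DATE": "1919-12-23", "DEATH_DATE": "2010-02-26"},
    {"NAME": "Ronald Anderson", "BIRTH_DATE": "1929-12-06", "DEATH_DATE": None},
]


def map_column(rows, subject_column, predicate, object_column):
    """Emit (subject, predicate, object) for every row with a non-NULL object value."""
    triples = []
    for row in rows:
        value = row.get(object_column)
        if value is None:
            # A NULL value produces no triple for this row.
            continue
        subject = "http://ex.com/" + row[subject_column].replace(" ", "+")
        triples.append((subject, predicate, value))
    return triples


birth = map_column(rows, "NAME", "ex:birth_date", "BIRTH_DATE")
death = map_column(rows, "NAME", "ex:death_date", "DEATH_DATE")
for s, p, o in birth + death:
    print(f'<{s}> {p} "{o}" .')
```

Each row is handled on its own and each column reference yields at most one value, which is exactly the assumption that stops holding for hierarchical sources.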
RDF Mapping Language (RML)
mapping hierarchical sources to RDF
deal with hierarchy and heterogeneity
R2RML: each row is a self-contained unit
that can be processed independently
R2RML: the columns in each row
can be referred to unambiguously
R2RML: for each reference to a column in a single row
a unique value is returned
RML: explicit reference to the iteration pattern
R2RML: each row is a self-contained unit that can be processed independently
RML: abstract reference to the input data
R2RML: the columns in each row can be referred to unambiguously
RML: more than one triple per Predicate-Object Map
R2RML: for each reference to a column in a single row a unique value is returned
RDF Mapping Language
(RML)
For hierarchical sources
[ ... …
{ "Title": "Apollo 11 Crew",
"Artist": "Ronald Anderson",
"Ref": "NPG_70_36",
"Sitter": [
{ "Name": "Neil Armstrong",
"Birth Date": "1930-08-05" },
{ "Name": "Buzz Aldrin",
"Birth Date": "1930-01-20" },
{ "Name": "Michael Collins" } ],
"DateOfWork": "1969" },
{ "Title": "Neil Armstrong",
"Artist": "Robert Theodore McCall",
"Ref": "S_NPG_2010_51",
"Sitter": [
{ "Name": "Neil Armstrong" } ],
"DateOfWork": "2009" },
... … ]
<Artists> ... ...
<Artist>
<Name>Robert Theodore McCall</Name>
<Birth_Date>1919-12-23</Birth_Date>
<Death_Date>2010-02-26</Death_Date>
</Artist>
<Artist>
<Name>Ronald Anderson</Name>
<Birth_Date>1929-12-06</Birth_Date>
<Death_Date/>
</Artist> ... ...
</Artists>
Sources shown above: artworks.json (JSON) and artists.xml (XML)
Specifying the input data
R2RML: database
RML: file, API, …
R2RML: Logical Table (rr:logicalTable)
RML: Logical Source (rml:logicalSource)
R2RML: Table Name (rr:tableName)
RML: source (rml:source)
Triples Map
Logical Source
source
<#ArtworkMapping>
rml:logicalSource [ rml:source "http://ex.com/artworks.json" ].
Triples Map
Logical Source
source
<#ArtistMapping>
rml:logicalSource
[ rml:source "artists.xml" ].
Referring to the input data
R2RML: databases
RML: XML or JSON or CSV or ….
R2RML: (SQL)
RML: XPath/XQuery or JSONPath or RFC 4180 (CSV) or …
R2RML: (rr:sqlQuery)
RML: rml:referenceFormulation
<#ArtworkMapping>
rml:logicalSource
[ rml:source "http://ex.com/artworks.json" ;
rml:referenceFormulation ql:JSONPath ].
Triples Map
Logical Source
source
<#ArtistMapping>
rml:logicalSource
[ rml:source "artists.xml";
rml:referenceFormulation ql:XPath ].
Reference Formulation
Triples Map
Logical Source
source
Reference Formulation
Iterating over the input data
R2RML: per row
RML: ?
R2RML: (implicit, no dedicated construct)
RML: rml:iterator
<#ArtistMapping>
rml:logicalSource
[ rml:source "artists.xml"; rml:referenceFormulation ql:XPath ;
rml:iterator "/Artists/Artist" ].
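To make the effect of the iterator concrete, here is a minimal Python sketch that selects the elements matched by "/Artists/Artist" in artists.xml. It uses the standard library's ElementTree, which supports only a subset of XPath and no absolute paths on elements, so the hypothetical helper iterate handles the leading root step by hand. It is an illustration under those assumptions, not how the RML Processor is implemented.

```python
import xml.etree.ElementTree as ET

# artists.xml as shown on the slides, without the elided parts.
ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""


def iterate(xml_text, iterator):
    """Return the elements selected by a simple absolute path such as /Artists/Artist."""
    root = ET.fromstring(xml_text)
    steps = iterator.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # the path does not start at this document's root element
    if len(steps) == 1:
        return [root]
    # ElementTree paths are relative to the element they are called on.
    return root.findall("./" + "/".join(steps[1:]))


artists = iterate(ARTISTS_XML, "/Artists/Artist")
names = [artist.findtext("Name") for artist in artists]
print(names)
# prints: ['Robert Theodore McCall', 'Ronald Anderson']
```

Each selected Artist element plays the role that a row plays in R2RML: the triples map is applied once per element.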
<#ArtworkMapping>
rml:logicalSource
[ rml:source "http://ex.com/artworks.json" ;
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$.[*]" ].
<#SitterMapping>
rml:logicalSource [ rml:source "http://ex.com/artworks.json";
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$.[*].Sitter" ].
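The two JSON iterators can be explored with a small hand-rolled evaluator. Python's standard library has no JSONPath support, so the hypothetical function json_path below implements only the three constructs used on these slides ("$", ".name" and "[*]"); real JSONPath libraries accept far more and may differ in detail.

```python
import json
import re

# artworks.json as shown on the slides, without the elided parts.
ARTWORKS = json.loads("""[
  { "Title": "Apollo 11 Crew", "Artist": "Ronald Anderson", "Ref": "NPG_70_36",
    "Sitter": [ { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
                { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
                { "Name": "Michael Collins" } ],
    "DateOfWork": "1969" },
  { "Title": "Neil Armstrong", "Artist": "Robert Theodore McCall", "Ref": "S_NPG_2010_51",
    "Sitter": [ { "Name": "Neil Armstrong" } ],
    "DateOfWork": "2009" }
]""")


def json_path(data, path):
    """Evaluate a tiny JSONPath subset: '$', '.name' and '[*]' steps only."""
    tokens = re.findall(r"\[\*\]|[^.\[\]$]+", path)
    nodes = [data]
    for token in tokens:
        selected = []
        for node in nodes:
            if token == "[*]":
                if isinstance(node, list):
                    selected.extend(node)  # every element of the array
            elif isinstance(node, dict) and token in node:
                selected.append(node[token])  # the named member
        nodes = selected
    return nodes


artworks = json_path(ARTWORKS, "$.[*]")
sitter_arrays = json_path(ARTWORKS, "$.[*].Sitter")
sitters = json_path(ARTWORKS, "$.[*].Sitter[*]")
print(len(artworks), len(sitter_arrays), len(sitters))
# prints: 2 2 4
```

Under this strict reading, "$.[*].Sitter" selects the two Sitter arrays, while appending "[*]" selects the four sitter objects individually. Which of the two a given processor iterates over for the iterator on the slide is not stated in the source.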
Referring to the extracts of the input data
explicitly and implicitly
R2RML: column name
RML: XML element or JSON object or …
R2RML: rr:column
RML: rml:reference
<#ArtistMapping>
rml:logicalSource [ rml:source "http://ex.com/artists.xml";
rml:referenceFormulation ql:XPath ;
rml:iterator "/Artists/Artist" ] ;
rr:subjectMap [
rr:template "http://ex.com/{Name}" ];
rr:predicateObjectMap [ rr:predicate ex:death_date ; rr:objectMap [
rml:reference "/Artists/Artist/Death_Date"] ].
<http://ex.com/Robert+Theodore+McCall> ex:death_date "2010-02-26".
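The following sketch applies this mapping to artists.xml in Python. Two things are assumptions rather than facts from the slides: the reference is resolved relative to the element selected by the iterator (the hypothetical helper relative strips the iterator prefix from the path), and an empty element such as <Death_Date/> produces no triple.

```python
import xml.etree.ElementTree as ET

ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""

ITERATOR = "/Artists/Artist"
REFERENCE = "/Artists/Artist/Death_Date"


def relative(reference, iterator):
    """Turn a reference that repeats the iterator path into a path relative to it."""
    if reference.startswith(iterator + "/"):
        return reference[len(iterator) + 1:]
    return reference


root = ET.fromstring(ARTISTS_XML)
triples = []
# root is <Artists>, so "./Artist" selects the same elements as ITERATOR.
for artist in root.findall("./Artist"):
    name = artist.findtext("Name")
    death = artist.findtext(relative(REFERENCE, ITERATOR))
    if not death:
        continue  # missing or empty element: assumed to yield no triple
    subject = "http://ex.com/" + name.replace(" ", "+")
    triples.append((subject, "ex:death_date", death))

for s, p, o in triples:
    print(f'<{s}> {p} "{o}" .')
# prints: <http://ex.com/Robert+Theodore+McCall> ex:death_date "2010-02-26" .
```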
<#ArtworkMapping>
rml:logicalSource [ rml:source "http://ex.com/artworks.json";
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$.[*]" ] ;
rr:subjectMap [ rr:template "http://ex.com/{Ref}"];
rr:predicateObjectMap [ rr:predicate rdfs:label ; rr:objectMap [ rml:reference "$.[*].Title" ] ].
<http://ex.com/NPG_70_36> rdfs:label "Apollo 11 Crew".
<#SitterMapping>
rml:logicalSource [ rml:source "http://ex.com/artworks.json";
rml:referenceFormulation ql:JSONPath ;
rml:iterator "$.[*].Sitter" ] ;
rr:subjectMap [ rr:template "http://ex.com/{Name}"];
rr:predicateObjectMap [ rr:predicate ex:birth_date ; rr:objectMap [ rml:reference "$.[*].Sitter.Birth Date" ]].
<http://ex.com/Neil+Armstrong> ex:birth_date "1930-08-05".
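A plain-Python walk over the same data shows why hierarchical sources can yield more than one triple where a table row yields at most one. The traversal below is a sketch that assumes each sitter object is visited individually and that a sitter without a "Birth Date" member produces no triple; it does not use a JSONPath engine.

```python
# artworks.json as shown on the slides, without the elided parts.
artworks = [
    {"Title": "Apollo 11 Crew", "Artist": "Ronald Anderson", "Ref": "NPG_70_36",
     "Sitter": [{"Name": "Neil Armstrong", "Birth Date": "1930-08-05"},
                {"Name": "Buzz Aldrin", "Birth Date": "1930-01-20"},
                {"Name": "Michael Collins"}],
     "DateOfWork": "1969"},
    {"Title": "Neil Armstrong", "Artist": "Robert Theodore McCall", "Ref": "S_NPG_2010_51",
     "Sitter": [{"Name": "Neil Armstrong"}],
     "DateOfWork": "2009"},
]

triples = []
for artwork in artworks:
    # One artwork can contribute several sitters, hence several triples.
    for sitter in artwork["Sitter"]:
        birth_date = sitter.get("Birth Date")
        if birth_date is None:
            continue  # no value, no triple (assumption)
        subject = "http://ex.com/" + sitter["Name"].replace(" ", "+")
        triples.append((subject, "ex:birth_date", birth_date))

for s, p, o in triples:
    print(f'<{s}> {p} "{o}" .')
# prints:
# <http://ex.com/Neil+Armstrong> ex:birth_date "1930-08-05" .
# <http://ex.com/Buzz+Aldrin> ex:birth_date "1930-01-20" .
```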
RDF Mapping Language (RML)
Triples Map
  Logical Source
    Source
    Iterator
    Reference Formulation
  Subject Map
  Predicate-Object Map
    Predicate Map
    Object Map
    Referencing Object Map
      Triples Map
      Join Condition
        Parent column
        Child column
Term Map: template, constant, reference
RDF Mapping Language
(RML)
Editing mappings with Karma http://www.isi.edu/integration/karma/
RDF Mapping Language
(RML)
Processing
mapping-driven processing:
processing driven by the mapping module
data-driven processing:
processing driven by the extraction module
[Diagram: the RML Processor consists of an Extraction Module and a Mapping Module]
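The slides give only a one-line description of each processing mode, so the sketch below is one possible reading rather than the RML Processor's actual design. In the mapping-driven loop the mappings are walked first and the extractor is asked for the matching items; in the data-driven loop the extracted items are walked first and handed to the mappings defined on their source. All names (extract, mapping_driven, data_driven, ex:Artwork) and the toy data are hypothetical.

```python
def extract(source):
    """Stand-in for the extraction module: return the items of a named source."""
    data = {"artists": ["McCall", "Anderson"], "artworks": ["NPG_70_36"]}
    return data[source]


# Stand-in for the mapping module: each mapping names its source and a triple builder.
mappings = [
    {"source": "artists", "apply": lambda item: ("http://ex.com/" + item, "a", "ex:Person")},
    {"source": "artworks", "apply": lambda item: ("http://ex.com/" + item, "a", "ex:Artwork")},
]


def mapping_driven(mappings, extract):
    """The mapping module leads: for each mapping, request the items to iterate over."""
    triples = []
    for mapping in mappings:
        for item in extract(mapping["source"]):
            triples.append(mapping["apply"](item))
    return triples


def data_driven(sources, mappings, extract):
    """The extraction module leads: each extracted item is passed to the mappings on its source."""
    triples = []
    for source in sources:
        for item in extract(source):
            for mapping in mappings:
                if mapping["source"] == source:
                    triples.append(mapping["apply"](item))
    return triples


by_mapping = mapping_driven(mappings, extract)
by_data = data_driven(["artworks", "artists"], mappings, extract)
print(sorted(by_mapping) == sorted(by_data))
```

Both loops produce the same set of triples here; they differ in which module controls the order in which the input is read.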
Mapping Hierarchical Sources into RDF
using the RML mapping language
RML: http://rml.io
RML Namespace: http://semweb.mmlab.be/ns/rml
RML Processor: https://github.com/mmlab/RMLProcessor
Contact us
Anastasia Dimou @natadimou
Miel Vander Sande @Miel_vds