Batch import of large RDF datasets into Semantic MediaWiki

Date post: 21-Jan-2018
Upload: samuel-lampa
Transcript
Page 1: Batch import of large RDF datasets into Semantic MediaWiki

Batch import of large RDF datasets using RDFIO or the new rdf2smw tool

Samuel Lampa - @smllmp

PhD Student in Pharmaceutical Bioinformatics @ pharmb.io

with Assoc. Prof. Ola Spjuth - @ola_spjuth @ Dept. of Pharm. Biosci. / Uppsala University

Semantic MediaWiki Conference Fall 2016, Frankfurt am Main,

Page 2

RDF Import? Who wants that?

Page 3

Research interests

● Large datasets
● Automation
● Scientific workflows
● Machine Learning

● Semantic data
● Reasoning
● Query systems

● Something user friendly
● … and hopefully usable
● “Answer ALL the research questionz”

Page 4

RDFIO

github.com/rdfio/rdfio

Page 5

What’s the problem?

● Semantic MediaWiki has great support for exporting to RDF

Page 6

What’s the problem?

● … but, not really any (proper) RDF import (as in: plain triples → wiki syntax in articles)

Page 7

RDFIO What?!

● SMW extension
● Import plain RDF triples
● No need for an ontology
● RDF URIs → Wiki titles
● Retains original URIs
● Translates back to original URIs on export
● Round-trip SMW ↔ RDF
● tinyurl.com/getrdfio
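The “RDF URIs → Wiki titles” and “translates back to original URIs on export” points amount to keeping a two-way lookup between URIs and page titles. The Go sketch below only illustrates that idea; it is not RDFIO's code (RDFIO is a MediaWiki extension written in PHP). The type `TitleMap` and the rule of using the part after the last `/` or `#` as the title are hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// TitleMap is a hypothetical two-way mapping between RDF URIs and wiki titles.
type TitleMap struct {
	uriToTitle map[string]string
	titleToURI map[string]string
}

// NewTitleMap returns an empty TitleMap.
func NewTitleMap() *TitleMap {
	return &TitleMap{uriToTitle: map[string]string{}, titleToURI: map[string]string{}}
}

// Title returns the wiki title for uri. On first sight it uses the part after
// the last '/' or '#' as the title and records the original URI.
// Assumption: local names are unique. Two URIs sharing a local name would
// overwrite each other here; a real tool has to resolve such collisions.
func (m *TitleMap) Title(uri string) string {
	if t, ok := m.uriToTitle[uri]; ok {
		return t
	}
	t := uri[strings.LastIndexAny(uri, "/#")+1:]
	m.uriToTitle[uri] = t
	m.titleToURI[t] = uri
	return t
}

// URI looks up the original URI for a title, e.g. when exporting back to RDF.
func (m *TitleMap) URI(title string) (string, bool) {
	u, ok := m.titleToURI[title]
	return u, ok
}

func main() {
	m := NewTitleMap()
	title := m.Title("http://ex.org/Stockholm")
	uri, _ := m.URI(title)
	fmt.Println(title, "<->", uri) // Stockholm <-> http://ex.org/Stockholm
}
```

Storing the URI alongside the title is what makes the round trip possible: the title alone is lossy, since `http://ex.org/Stockholm` and `http://other.org/Stockholm` would both become `Stockholm`.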

Page 8

Turning RDF Triples into Wiki Pages

<http://ex.org/Stockholm> <http://ex.org/onto/LocatedIn> <http://ex.org/Sweden>
<http://ex.org/Stockholm> <http://ex.org/onto/Population> "789024"^^xsd:integer
<http://ex.org/Frankfurt> <http://ex.org/onto/LocatedIn> <http://ex.org/Germany>
<http://ex.org/Frankfurt> <http://ex.org/onto/Population> "731095"^^xsd:integer

Page 9

Turning RDF Triples into Wiki Pages

Stockholm

[[Located In::Sweden]]
[[Population::789024]]
[[Original URI::http://ex.org/Stockholm]]

Frankfurt

[[Located In::Germany]]
[[Population::731095]]
[[Original URI::http://ex.org/Frankfurt]]
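The conversion on this slide can be sketched as grouping the triples by subject and writing one `[[Property::Value]]` annotation per triple, plus an `Original URI` fact. The Go sketch below is hypothetical: `Triple`, `SubjectPages` and the camel-case splitting (which turns `LocatedIn` into `Located In`, as the slide shows) are illustrative assumptions, and literal datatypes are simply ignored.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Triple is a simplified RDF triple: Obj holds a URI or a literal's lexical value.
type Triple struct {
	Subj, Pred, Obj string
	ObjIsURI        bool
}

// localName returns the part of a URI after the last '/' or '#'.
func localName(uri string) string {
	return uri[strings.LastIndexAny(uri, "/#")+1:]
}

// spaceCamel inserts a space at lower→upper case boundaries: "LocatedIn" → "Located In".
func spaceCamel(s string) string {
	var b strings.Builder
	rs := []rune(s)
	for i, r := range rs {
		if i > 0 && unicode.IsUpper(r) && unicode.IsLower(rs[i-1]) {
			b.WriteRune(' ')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// SubjectPages returns one wikitext page per subject, keyed by page title.
func SubjectPages(triples []Triple) map[string]string {
	facts := map[string][]string{}
	uris := map[string]string{}
	for _, t := range triples {
		title := localName(t.Subj)
		uris[title] = t.Subj
		val := t.Obj
		if t.ObjIsURI {
			val = localName(t.Obj) // link to the object's own wiki page
		}
		facts[title] = append(facts[title],
			fmt.Sprintf("[[%s::%s]]", spaceCamel(localName(t.Pred)), val))
	}
	pages := map[string]string{}
	for title, fs := range facts {
		fs = append(fs, fmt.Sprintf("[[Original URI::%s]]", uris[title]))
		pages[title] = strings.Join(fs, "\n")
	}
	return pages
}

func main() {
	triples := []Triple{
		{"http://ex.org/Stockholm", "http://ex.org/onto/LocatedIn", "http://ex.org/Sweden", true},
		{"http://ex.org/Stockholm", "http://ex.org/onto/Population", "789024", false},
		{"http://ex.org/Frankfurt", "http://ex.org/onto/LocatedIn", "http://ex.org/Germany", true},
		{"http://ex.org/Frankfurt", "http://ex.org/onto/Population", "731095", false},
	}
	pages := SubjectPages(triples)
	titles := make([]string, 0, len(pages))
	for t := range pages {
		titles = append(titles, t)
	}
	sort.Strings(titles) // map order is random; sort for stable output
	for _, t := range titles {
		fmt.Printf("== %s ==\n%s\n", t, pages[t])
	}
}
```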

Page 10

Turning RDF Triples into Wiki Pages

Sweden

[[Original URI::http://ex.org/Sweden]]

Germany

[[Original URI::http://ex.org/Germany]]

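Object URIs such as `http://ex.org/Sweden` never occur as subjects, yet they still get pages that carry only their `Original URI`, so that links like `[[Located In::Sweden]]` point at a page that remembers where it came from. A hypothetical Go sketch of that step, using the same simplified `Triple` type as an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// Triple is a simplified RDF triple (assumption, not a real library type).
type Triple struct {
	Subj, Pred, Obj string
	ObjIsURI        bool
}

// localName returns the part of a URI after the last '/' or '#'.
func localName(uri string) string { return uri[strings.LastIndexAny(uri, "/#")+1:] }

// ObjectStubs returns a minimal page for every URI object that is not also a
// subject; subjects already get full pages elsewhere.
func ObjectStubs(triples []Triple) map[string]string {
	subjects := map[string]bool{}
	for _, t := range triples {
		subjects[t.Subj] = true
	}
	stubs := map[string]string{}
	for _, t := range triples {
		if t.ObjIsURI && !subjects[t.Obj] {
			stubs[localName(t.Obj)] = fmt.Sprintf("[[Original URI::%s]]", t.Obj)
		}
	}
	return stubs
}

func main() {
	triples := []Triple{
		{"http://ex.org/Stockholm", "http://ex.org/onto/LocatedIn", "http://ex.org/Sweden", true},
		{"http://ex.org/Stockholm", "http://ex.org/onto/Population", "789024", false},
		{"http://ex.org/Frankfurt", "http://ex.org/onto/LocatedIn", "http://ex.org/Germany", true},
	}
	// Map iteration order is random, so Sweden and Germany may print in either order.
	for title, text := range ObjectStubs(triples) {
		fmt.Printf("%s: %s\n", title, text)
	}
}
```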
Page 11

Turning RDF Triples into Wiki Pages

Property:LocatedIn

[[Has type::Page]]
[[Original URI::http://ex.org/onto/LocatedIn]]

Property:Population

[[Has type::Number]]
[[Original URI::http://ex.org/onto/Population]]

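The property pages add an SMW datatype. The Go sketch below assumes a simple rule that is not taken from RDFIO: URI objects give `Page`, numeric XSD literals give `Number`, and everything else gives `Text`. The first triple seen for a predicate decides its type; `smwType` and `PropertyPages` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Triple is a simplified RDF triple (assumption). Datatype is e.g. "xsd:integer";
// it is empty for URI objects and plain literals.
type Triple struct {
	Subj, Pred, Obj string
	ObjIsURI        bool
	Datatype        string
}

// localName returns the part of a URI after the last '/' or '#'.
func localName(uri string) string { return uri[strings.LastIndexAny(uri, "/#")+1:] }

// smwType picks an SMW datatype for a triple's object (hypothetical mapping).
func smwType(t Triple) string {
	switch {
	case t.ObjIsURI:
		return "Page"
	case t.Datatype == "xsd:integer", t.Datatype == "xsd:decimal", t.Datatype == "xsd:double":
		return "Number"
	default:
		return "Text"
	}
}

// PropertyPages renders one Property: page per predicate. Conflicting types
// for the same predicate are not detected; the first triple wins.
func PropertyPages(triples []Triple) map[string]string {
	pages := map[string]string{}
	for _, t := range triples {
		title := "Property:" + localName(t.Pred)
		if _, done := pages[title]; done {
			continue
		}
		pages[title] = fmt.Sprintf("[[Has type::%s]]\n[[Original URI::%s]]", smwType(t), t.Pred)
	}
	return pages
}

func main() {
	triples := []Triple{
		{"http://ex.org/Stockholm", "http://ex.org/onto/LocatedIn", "http://ex.org/Sweden", true, ""},
		{"http://ex.org/Stockholm", "http://ex.org/onto/Population", "789024", false, "xsd:integer"},
	}
	for title, text := range PropertyPages(triples) {
		fmt.Printf("%s\n%s\n\n", title, text)
	}
}
```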
Page 12

RDF Import interface

Page 13

SPARQL Endpoint

Page 14

SPARQL: Output Original URI

Page 15

SPARQL: Query by Original URI

Page 16

RDFIO History Timeline

Page 17

RDFIO – Current Status

● SMW 2.3 support – with some hacks (Ali working on the last minor issues)

● See the Vagrant box for a working automated setup with MW 1.26.4 + SMW 2.3.1:
  – github.com/rdfio/rdfio-vagrantbox

● Some known minor issues

Page 18

New Feature: Commandline Import

Page 19

Problem:

● Importing 300K triples can take something like 24 h…

● What if you only discover a misconfiguration after 24 h?
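Taking the slide's rough figure of 300K triples in about 24 hours at face value, the implied import throughput is only a few triples per second:

```latex
\frac{300\,000\ \text{triples}}{24 \times 3600\ \text{s}}
  = \frac{300\,000}{86\,400}\ \frac{\text{triples}}{\text{s}}
  \approx 3.5\ \frac{\text{triples}}{\text{s}}
```

At that rate, a configuration error noticed only at the end of the run costs the whole day, which is why being able to inspect the result before the slow import step matters.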

Page 20

Solution:

rdf2smw (new tool)

Page 21

The new rdf2smw tool

● Convert RDF → MediaWiki XML (really fast!)
● Import via MediaWiki XML import (still slow...)
● But: you can now preview before the XML import!

Page 22

More rdf2smw facts:

● Written in Go for compiled, multi-core performance
● Very pluggable architecture
● Easy to install: just download and run!
● Get it: github.com/samuell/rdf2smw

Page 23

rdf2smw: Architecture
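The architecture diagram on this slide did not survive extraction. As a hypothetical illustration of how a Go tool can be both “pluggable” and multi-core, the sketch below connects independent stages (parse → group by subject → render) with channels, and each stage runs in its own goroutine. The stage names and the whitespace-separated input format are assumptions, not rdf2smw's actual components.

```go
package main

import (
	"fmt"
	"strings"
)

type Triple struct{ Subj, Pred, Obj string }

type WikiPage struct{ Title, Text string }

// parse emits one Triple per "subject predicate object" line (toy format).
func parse(lines []string) <-chan Triple {
	out := make(chan Triple)
	go func() {
		defer close(out)
		for _, l := range lines {
			if f := strings.Fields(l); len(f) == 3 {
				out <- Triple{f[0], f[1], f[2]}
			}
		}
	}()
	return out
}

// groupBySubject must read all input before emitting, keeping first-seen order.
func groupBySubject(in <-chan Triple) <-chan []Triple {
	out := make(chan []Triple)
	go func() {
		defer close(out)
		groups := map[string][]Triple{}
		var order []string
		for t := range in {
			if _, ok := groups[t.Subj]; !ok {
				order = append(order, t.Subj)
			}
			groups[t.Subj] = append(groups[t.Subj], t)
		}
		for _, s := range order {
			out <- groups[s]
		}
	}()
	return out
}

// render turns each subject group into one wiki page.
func render(in <-chan []Triple) <-chan WikiPage {
	out := make(chan WikiPage)
	go func() {
		defer close(out)
		for g := range in {
			var b strings.Builder
			for _, t := range g {
				fmt.Fprintf(&b, "[[%s::%s]]\n", t.Pred, t.Obj)
			}
			out <- WikiPage{Title: g[0].Subj, Text: b.String()}
		}
	}()
	return out
}

func main() {
	lines := []string{"Stockholm LocatedIn Sweden", "Stockholm Population 789024", "Frankfurt LocatedIn Germany"}
	for p := range render(groupBySubject(parse(lines))) {
		fmt.Printf("%s:\n%s", p.Title, p.Text)
	}
}
```

Each stage depends only on the channel types, so any stage can be swapped out, for example a real Turtle parser in place of `parse`. CPU-heavy stages such as `render` could also be fanned out to several goroutines. Grouping by subject needs the whole input before it can emit anything, so it acts as a synchronisation point in the pipeline.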

Page 24

rdf2smw performance

[Plot: rdf2smw execution time (s) versus number of triples; x-axis 0 to 550,000 triples, y-axis 0 to 600 s]

Page 25

Future outlook

● How to make RDFIO more maintainable for developers with too little time?
● Drastically simplify?
● Break out well-defined sub-modules (SPARQL endpoint, RDF import, etc.)?
● Integrate with the MW REST API instead of a dedicated special page, as per Denny’s original idea with SMWWriter?
● Re-use core SMW functionality more? (Or not?)
● Your ideas?

Page 26

RDFIO Vagrant box

github.com/rdfio/rdfio-vagrantbox

$ vagrant up (20 min)

Page 27

The new Vagrant box: Set up MW + SMW + RDFIO in 7 steps

1) Install dependencies

2) $ git clone https://github.com/rdfio/rdfio-vagrantbox.git

3) $ cd rdfio-vagrantbox

4) $ vagrant up

5) Browse to localhost:8080/w/index.php/Special:RDFIOAdmin

6) Log in with username Admin and password changethis

7) Click “Setup”

Done!

Page 28

Acknowledgements

● Denny Vrandečić (@vrandezo) - Already had basically the same idea for an extension when the (eventually accepted) GSOC proposal was submitted in 2010, and supported the project with valuable ideas and by mentoring the GSOC 2010 project.

● Ali King (@ali_king) – Has done great work updating the extension to the latest standards and versions, and added the new template editing functionality, as part of an OPW 2014 project.

● Joel Sachs (@xjsachs) - Championed the addition of the template editing functionality, provided valuable encouragement and mentored Ali King’s FOSS OPW project.

● Egon Willighagen (@egonwillighagen) - Has supported the project with valuable testing, constructive feedback, encouragement and new ideas.

● Ola Spjuth (@ola_spjuth) – Has provided constructive feedback and encouragement, as well as financed parts of the further development of the project.

● Google Inc. - Supported the initial development through its Summer of Code program (GSOC) in 2010.

● GNOME Foundation - Supported further development as part of its Outreach Program for Women (OPW) in 2014.

