SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Date post: 10-May-2015
Upload: ansgar-scherp
Description:
Slides of the billion triple challenge 2011 on SchemEX. Please download original file to enjoy all animations.
Transcript
Page 1: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

SchemEX Creating the Yellow Pages of the LOD Cloud

Mathias Konrath, Thomas Gottron, Ansgar Scherp

Page 2: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

2 of 12 SchemEX – Mathias Konrath, Thomas Gottron, Ansgar Scherp

Scenario

• People who are politicians and actors
• Who else?
• Where do they live?
• Whom do they know?

Page 3: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Problem

• Execute those queries on the LOD cloud

Relevant sources?

• No single federated query interface provided

“politicians and actors”

Page 4: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Solution in Principle

• Suitable index structure for looking up sources

“politicians and actors”

Page 5: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

The Naive Approach

1. Download the entire LOD cloud
2. Put it into a (really) large triple store
3. Process the data and extract schema
4. Provide lookup

- Big machinery
- Data is processed late
- High effort to scale with the LOD cloud

Can we be smarter?

Page 6: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Yes, we can …

Page 7: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

The SchemEX Approach

• Stream-based schema extraction
• While crawling the data

[Figure: SchemEX architecture. Labels shown: LOD-Crawler, RDF-Dump, Triple Store, Nquad-Stream, Parser (NxParser), Instance-Cache (FIFO), Schema-Extractor, Schema, RDF, RDBMS]

Page 8: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Efficient Instance Cache

• Observe a quadruple stream from LDspider
• Ring queue, backed by a hash map
• Organizes triples with the same subject URI
• Dismiss the oldest entry when the cache is full (FIFO)
→ Runtime complexity O(1)
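
The cache described on this slide can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the class name `InstanceCache`, its methods, and the tuple layout are assumptions, and a Python `deque` stands in for the ring queue. Statements are grouped under their subject URI in a dictionary, and when a new subject arrives at a full cache, the subject that entered first is evicted and handed back for schema extraction.

```python
from collections import deque


class InstanceCache:
    """FIFO cache grouping streamed quads by subject URI (hypothetical sketch)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._order = deque()    # subjects in arrival order (the queue)
        self._by_subject = {}    # subject URI -> list of (predicate, object, source)

    def add(self, subject, predicate, obj, source):
        """Store one quad; return the evicted (subject, statements) pair or None."""
        evicted = None
        if subject not in self._by_subject:
            if len(self._order) >= self.capacity:
                # FIFO: dismiss the instance that entered the cache first.
                oldest = self._order.popleft()
                evicted = (oldest, self._by_subject.pop(oldest))
            self._order.append(subject)
            self._by_subject[subject] = []
        # Known subjects keep their queue position (FIFO, not LRU).
        self._by_subject[subject].append((predicate, obj, source))
        return evicted

    def flush(self):
        """Yield the remaining instances in arrival order and empty the cache."""
        while self._order:
            subject = self._order.popleft()
            yield subject, self._by_subject.pop(subject)
```

Each `add` call performs a constant number of dictionary and queue operations, which matches the O(1) claim on the slide. A statement about a subject that has already been evicted starts a new, incomplete instance, which is one plausible reason why larger cache sizes would give better schema quality.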

Page 9: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

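Building the Schema and Index

[Figure: layered schema index. Layers as labeled: RDF classes (c1, c2, c3, … ck); type clusters (TC1, TC2, … TCm); equivalence classes (EQC1, EQC2, … EQCn); data sources (DS1, DS2, DS3, DS4, DS5, … DSx). Edge labels: consistsOf, hasEQClass, hasDataSource, p1, p2]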
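
A minimal sketch of how such a layered index could be built and queried, under stated assumptions: a type cluster is taken to be the set of `rdf:type` values of one instance, and an equivalence class is simplified to a type cluster plus the set of properties the instance uses, which may differ from the authors' definition. The names `SchemaIndex`, `summarize`, and `sources_for_types`, and the `ex:` identifiers in the usage note, are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def summarize(statements):
    """Split one instance's (predicate, object, source) statements into
    its type cluster, its property set, and the data sources it came from."""
    types, properties, sources = set(), set(), set()
    for predicate, obj, source in statements:
        sources.add(source)
        if predicate == RDF_TYPE:
            types.add(obj)
        else:
            properties.add(predicate)
    return frozenset(types), frozenset(properties), sources


class SchemaIndex:
    """Three lookup layers: RDF class -> type cluster -> equivalence class -> sources."""

    def __init__(self):
        self.class_to_tcs = defaultdict(set)    # consistsOf, inverted for lookup
        self.tc_to_eqcs = defaultdict(set)      # hasEQClass
        self.eqc_to_sources = defaultdict(set)  # hasDataSource

    def add_instance(self, statements):
        type_cluster, properties, sources = summarize(statements)
        eqc = (type_cluster, properties)  # simplified equivalence class (assumption)
        for rdf_class in type_cluster:
            self.class_to_tcs[rdf_class].add(type_cluster)
        self.tc_to_eqcs[type_cluster].add(eqc)
        self.eqc_to_sources[eqc].update(sources)

    def sources_for_types(self, *rdf_classes):
        """Return data sources holding instances typed with all given classes."""
        if not rdf_classes:
            return set()
        per_class = [self.class_to_tcs.get(c, set()) for c in rdf_classes]
        candidates = set.intersection(*per_class)
        found = set()
        for type_cluster in candidates:
            for eqc in self.tc_to_eqcs.get(type_cluster, set()):
                found |= self.eqc_to_sources.get(eqc, set())
        return found
```

With instances added from a stream, a call such as `index.sources_for_types("ex:Politician", "ex:Actor")` would return the data sources that contain resources typed as both, which corresponds to the "politicians and actors" lookup from the scenario.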

Page 10: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Computing SchemEX: TimBL Data Set

• Analysis of a smaller data set
• 11 M triples, TimBL’s FOAF profile
• LDspider with ~2k triples/sec
• Different cache sizes: 100, 1k, 10k, 50k, 100k
• Compared SchemEX with reference schema
• Index queries on all Types, TCs, EQCs
• Good precision/recall ratio at 50k+
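
The slides do not state how precision and recall were computed. One common set-based reading, assumed here, compares the data sources returned by an index query on the stream-built schema with those returned for the same query on the reference schema. The function name `precision_recall` is hypothetical.

```python
def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted results against a reference."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    # Convention chosen for this sketch: an empty set yields 0.0, not an error.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

Averaging these values over all type, type cluster, and equivalence class queries would give one figure per cache size.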

Page 11: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Computing SchemEX: Full BTC 2011 Data

Cache size: 50 k

Page 12: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Conclusions: SchemEX

• Stream-based approach to schema extraction
• Scalable to arbitrary amounts of Linked Data
• Applicable on commodity hardware (4GB RAM, standard single CPU)
• Lookup index to find relevant data sources
• Supports federated queries on the LOD cloud

Page 13: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

BACKUP

Page 14: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

SchemEX Computation: Window Sizes

Crawled TimBL dataset (11M triples)

Runtime hardly increases with greater window sizes

Memory consumption scales with window size

Page 15: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

SchemEX Quality: Precision

Page 16: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

SchemEX Quality: Recall

Page 17: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Example Data Graph

Page 18: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

Output Vocabulary: voiD

Page 19: SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud

SchemEX Extraction: Progress Plot

[Figure: two plots, Type-cluster and Equivalence classes; y-axis: Count; x-axis: # processed instances]

