Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Querying Live Linked Data
Mini Viva presentation (11.02.2011)
by Jürgen Umbrich
Querying in the Linked Data space
millions of diverse but often interrelated data sources
“data everywhere” on the Web
no complete control over the data
[Diagram: QP with a static path (crawl, index; Yars2, Virtuoso) and a dynamic path (live distributed querying)]
Linked Data is Dynamic
Dataset: Web data (’08 to ’09)
24 weekly snapshots
4-hop neighborhood from Tim Berners-Lee's FOAF file
550K RDF/XML docs, 3.3M unique entities
[Umbrich et al. 2010]
Findings (entity level)
68% static, 32% dynamic
Change frequency classes: <1 week; >1 week and <=1 month; >1 month and <=3 months; >3 months and <=6 months
Shares shown in the chart: 52%, 24%, 10%, 14%
Accessing Linked Data
① Use URIs for things
② Use HTTP URIs so that people can look them up
③ Provide useful information, using standards (RDF, SPARQL)
④ Include links to other URIs
Direct correspondence between thing-URI and source-URI
Thing-URI: http://umbrich.net/foaf.rdf#me
HTTP GET: http://umbrich.net/foaf.rdf (RDF/XML, contains #me)
#me foaf:based_near http://dbpedia.org/resource/Galway
Accessing Linked Data
Re-direct correspondence between thing-URI and source-URI
Thing-URI: http://dbpedia.org/resource/Galway
HTTP GET redirects to http://dbpedia.org/data/Galway (RDF/XML) or http://dbpedia.org/page/Galway (HTML)
Direct correspondence between thing-URI and source-URI
Thing-URI: http://umbrich.net/foaf.rdf#me
HTTP GET: http://umbrich.net/foaf.rdf (RDF/XML, contains #me)
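The two correspondences between thing-URI and source-URI can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `source_uri` and the `KNOWN_REDIRECTS` table are assumptions, and the table stands in for the redirect a server would send in response to a real HTTP GET.

```python
from urllib.parse import urldefrag

# Hypothetical redirect table standing in for real HTTP redirects.
KNOWN_REDIRECTS = {
    "http://dbpedia.org/resource/Galway": "http://dbpedia.org/data/Galway",
}


def source_uri(thing_uri, redirects=None):
    """Map a thing-URI to the URI of the document that describes it."""
    redirects = redirects or {}
    # Direct correspondence: a hash URI is served by the document before '#'.
    document, fragment = urldefrag(thing_uri)
    if fragment:
        return document
    # Re-direct correspondence: use the redirect target if one is known,
    # otherwise assume the thing-URI is itself the source.
    return redirects.get(thing_uri, thing_uri)
```

Calling `source_uri("http://umbrich.net/foaf.rdf#me")` yields the FOAF document URI, while the DBpedia thing-URI resolves through the redirect table.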
The Problem
Example Query
SELECT ?friendLabel WHERE { juum:me foaf:knows ?f . ?f foaf:name ?friendLabel . }
What are the query-relevant sources?
juum:me foaf:knows ?f .
?f foaf:name ?friendLabel .
Sources shown: umbrich.net/foaf.rdf, polleres.net/foaf.rdf, sw.deri.org/~aidanh/
Source Selection Approaches
Quad Store (e.g. Yars2): index
juum:me foaf:knows ?f . ?f foaf:name ?friendLabel .
[Diagram: quad store index, HTTP GET requests, and the results “Aidan Hogan” and “Axel Polleres”]
Source Selection Approaches
Quad Store (e.g. Yars2)
Direct execution / graph traversal [Hartig et al. 2009]
juum:me foaf:knows ?f . ?f foaf:name ?friendLabel .
[Diagram: HTTP GET requests issued during traversal; results “Aidan Hogan” and “Axel Polleres”]
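A rough sketch of direct execution by graph traversal for the example query, under stated assumptions: the `WEB` dictionary is a hypothetical in-memory stand-in for the Web, the `example.org` friend URIs are invented for illustration, and `http_get` only simulates a lookup. It is not the algorithm of Hartig et al., just the basic idea of dereferencing URIs discovered while evaluating the triple patterns.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://umbrich.net/foaf.rdf#me"

# Hypothetical stand-in for the Web: source URI -> triples served there.
WEB = {
    "http://umbrich.net/foaf.rdf": [
        (ME, FOAF + "knows", "http://example.org/aidan.rdf#me"),
        (ME, FOAF + "knows", "http://example.org/axel.rdf#me"),
    ],
    "http://example.org/aidan.rdf": [
        ("http://example.org/aidan.rdf#me", FOAF + "name", "Aidan Hogan"),
    ],
    "http://example.org/axel.rdf": [
        ("http://example.org/axel.rdf#me", FOAF + "name", "Axel Polleres"),
    ],
}


def http_get(thing_uri, log):
    """Simulated lookup: strip the fragment and return the source's triples."""
    source = thing_uri.split("#", 1)[0]
    log.append(source)
    return WEB.get(source, [])


def friend_labels(me):
    """Evaluate: me foaf:knows ?f . ?f foaf:name ?friendLabel ."""
    log = []
    labels = []
    for s, p, friend in http_get(me, log):
        if s == me and p == FOAF + "knows":
            # Follow the link to the friend's source to match the second pattern.
            for s2, p2, name in http_get(friend, log):
                if s2 == friend and p2 == FOAF + "name":
                    labels.append(name)
    return sorted(labels), log
```

With this toy data, `friend_labels(ME)` needs three lookups: one for the starting document and one per friend. No index is involved, which is why results are fresh but query time depends on the number of HTTP lookups.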
Source Selection Approaches
Quad Store (e.g. Yars2)
Inverted Indices [Heflin et al. 2010] (e.g. Sindice V1.0)
Schema-Level Indices [Stuckenschmidt et al. 2004]
Data Summaries [Umbrich et al. 2010]
Direct execution / graph traversal [Hartig et al. 2009]
Comparison criteria: index size, query time (query system); recall, freshness (results)
Approximate Data Summaries
Combined description of schema level and instance level
Use approximation to reduce index size (incurs false positives)
Index grows only with the number of sources
Hash-based data summaries: multidimensional numerical dataspace
[Figure: two-dimensional view of the dataspace, axes s and o]
Hash-based Data Summaries
① Input: triple + source information
   juum:me foaf:knows ah:ah <http…foaf.rdf>
② Hash triples
   [ 24 , 5 , 2 ] <http…foaf.rdf>
③ Insert hash-triple into dataspace and store source information with buckets
   INS([ 24 , 5 , 2 ] , http…foaf.rdf )
④ Query for relevant sources
   QUERY ( juum:me ?p ?o ) -> ( 24, ?, ? )
[Figure: equi-width histogram over the dataspace, axes s and o, ticks at 10, 20, 30]
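Steps ① to ④ can be sketched as follows. This is a minimal illustration under assumptions, not the evaluated system: the dimension size `DIM`, the bucket width `WIDTH`, the MD5-based hash `h`, and the class name `Summary` are all invented here, so the hash values will not match the `[ 24 , 5 , 2 ]` shown on the slide. The source URI in the usage note is a placeholder.

```python
import hashlib
from collections import defaultdict

DIM = 30    # assumed size of each dimension of the numerical dataspace
WIDTH = 10  # assumed bucket width of the equi-width histogram


def h(term):
    """Hash an RDF term to a coordinate in [0, DIM)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DIM


class Summary:
    def __init__(self):
        # bucket coordinates -> set of sources with a triple in that bucket
        self.buckets = defaultdict(set)

    def insert(self, triple, source):
        """Steps 1-3: hash the triple and store the source with its bucket."""
        key = tuple(h(term) // WIDTH for term in triple)
        self.buckets[key].add(source)

    def query(self, pattern):
        """Step 4: return candidate sources; None marks a variable."""
        wanted = [None if term is None else h(term) // WIDTH for term in pattern]
        sources = set()
        for key, bucket_sources in self.buckets.items():
            if all(w is None or w == k for w, k in zip(wanted, key)):
                sources |= bucket_sources
        return sources
```

For example, after `summary.insert(("juum:me", "foaf:knows", "ah:ah"), "http://example.org/foaf.rdf")`, the query `summary.query(("juum:me", None, None))` returns a set containing that source. Because several terms can fall into the same bucket, the result may include sources that hold no matching triple (false positives), but it never misses a source that was inserted.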
QTree: Efficient source selection
Equi-width histogram vs. QTree
Combination of histograms and R-tree, inheriting the benefits of both data structures; optimal for sparse data
Buckets store cardinality and set of sources => Top-k source ranking
e.g. R1,1 ( 1: { http://…/foaf.rdf } )
[Figure: the same dataspace (axes s and o) partitioned by an equi-width histogram and by a QTree]
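The top-k ranking step can be illustrated with a small sketch. It covers only the ranking, not the QTree itself, and the scoring rule is an assumption made for this example: each bucket's cardinality is split evenly among the sources stored with it. The function name `top_k_sources` and the bucket representation as `(cardinality, sources)` pairs are likewise hypothetical.

```python
from collections import defaultdict


def top_k_sources(buckets, k):
    """Rank sources over the buckets that match a query.

    buckets: iterable of (cardinality, set_of_sources) pairs.
    Assumed scoring: a bucket's cardinality is shared equally by its sources.
    """
    scores = defaultdict(float)
    for cardinality, sources in buckets:
        for source in sources:
            scores[source] += cardinality / len(sources)
    # Highest score first; ties broken by source name for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [source for source, _ in ranked[:k]]
```

A query processor could then dereference only the k best-ranked sources, trading some recall for fewer HTTP lookups.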
Evaluation: Source Selection
J. Umbrich, K. Hose, M. Karnstedt, A. Harth, A. Polleres. "Comparing Data Summaries for Processing Live Queries over Linked Data." In WWW Journal, Special Issue "Querying the Data Web", 2011
Querying in the Linked Data Space
millions of diverse but often interrelated data sources
“data everywhere” on the Web
no complete control over the data
Combined Query of RDF stores and the Linked Data Web
[Diagram: QP with a static path (crawl, MAT, index) and a dynamic path (live distributed querying)]
Improved Query Time & Fresh Results
[Chart: query time against number of query executions for live querying, index querying, and combined querying; combined querying is annotated “learning about source dynamics”]
combined querying: decrease query time by avoiding unnecessary HTTP lookups while still returning fresh results
Current Research Question
How to query RDF stores and the Linked Data Web in combination?
Combined Query Processing
Live results on top of SPARQL stores
Decide (at query time) whether to access the static store or the Web resources
by integrating knowledge about the dynamics of sources into the query processor
[Diagram: a SPARQL query enters the Query Processor, which uses Source Selection and Dynamics to choose between the index (Yars2, Virtuoso) and the Linked Data Web, and returns live results]
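One way to picture the query-time decision is a per-pattern rule driven by an estimated change rate. The sketch below is an assumption for illustration only: the function `plan`, the `change_rate` table keyed by predicate, the 0.5 threshold, and the choice to treat unknown predicates as dynamic are not taken from the slides.

```python
def plan(patterns, change_rate, threshold=0.5):
    """Decide for each triple pattern whether to use the store or the Web.

    patterns: list of (subject, predicate, object) tuples.
    change_rate: hypothetical estimate, predicate -> probability of change.
    """
    decisions = []
    for pattern in patterns:
        predicate = pattern[1]
        # Unknown predicates are treated as dynamic to favour freshness.
        rate = change_rate.get(predicate, 1.0)
        target = "live" if rate >= threshold else "store"
        decisions.append((target, pattern))
    return decisions
```

With rates such as `{"foaf:name": 0.1, "geo:lat": 0.9}`, a pattern over `foaf:name` would be answered from the static store, while a pattern over `geo:lat` would trigger live lookups.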
Mining Dynamic/Static Patterns
Goal: acquire knowledge about dynamic patterns (e.g. geo:lat, geo:long)
Considering context of a node (e.g. a location value of a city vs. a location value of a GPS sensor)
Dynamics: based on two datasets (started in March 2010)
Daily 3-hop neighborhood crawls from 20 seed URIs
Weekly snapshots over ~10 months: 10% sampling from a billion triples crawl (fixed URI list, contains ~2K web vocabularies)
Learn to predict change events
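As a hedged illustration of mining dynamic patterns from snapshots, the sketch below estimates how often triples with a given predicate change between consecutive snapshots. It is a simple frequency count chosen for this example, not the mining method of the work presented; the function name and the snapshot representation as sets of triples are assumptions.

```python
from collections import Counter


def predicate_change_rates(snapshots):
    """Estimate a change rate per predicate from chronological snapshots.

    snapshots: list of sets of (subject, predicate, object) triples.
    Returns predicate -> fraction of snapshot transitions in which at least
    one triple with that predicate was added or removed.
    """
    seen = Counter()
    changed = Counter()
    for old, new in zip(snapshots, snapshots[1:]):
        present = {p for _, p, _ in old | new}
        # Symmetric difference: triples that appear in only one snapshot.
        modified = {p for _, p, _ in old ^ new}
        for predicate in present:
            seen[predicate] += 1
            if predicate in modified:
                changed[predicate] += 1
    return {p: changed[p] / seen[p] for p in seen}
```

On snapshots where a sensor's `geo:lat` value differs every time while a city's `foaf:name` stays the same, the estimate is 1.0 for `geo:lat` and 0.0 for `foaf:name`. Distinguishing the context of a node, as the slide suggests, would require keying the counts by more than the predicate alone.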
Query Processor
Collaboration with Yuan (APEXLAB)
Elaborate how dynamic query planning can support data-access decisions, taking dynamic patterns into account
Investigation of one of the possible approaches
Evaluation
Based on simulation using our dynamic mining dataset
Based on real-world data
   Linked Stream Data effort
   Using the gathered knowledge from our dynamic mining
Evaluation criteria
   Query time (number of HTTP lookups)
   Result freshness
   Recall (number of results)
How to query RDF stores and the Linked Data Web in combination to return fresh results
[Diagram: SPARQL query, Query Processor, Source Selection, Dynamics, Index, live results]
Questions?
Literature
[Hartig 2009 ] O. Hartig, Ch. Bizer, and J.-Ch. Freytag. Executing SPARQL Queries over the Web of Linked Data. In ISWC’09, 2009.
[Stuckenschmidt] H. Stuckenschmidt, R. Vdovjak, J. Broekstra, and G.-J. Houben. Towards distributed processing of RDF path queries. JWET, 2(2/3):207–230, 2005.
[Umbrich 2010] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, S. Decker. Towards Understanding Dataset Dynamics: Change Frequency of Linked Data Sources. LODW 2010 at WWW 2010, 2010.
[Heflin 2010] Y. Li and J. Heflin. Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources. In Proceedings of the 9th International Semantic Web Conference (ISWC2010), 2010.