Date posted: 14-Apr-2018
Uploaded by: le-duc-thang
7/27/2019 Linked Data+Top-K Query Processing
http://slidepdf.com/reader/full/linked-datatop-k-query-processing 1/24
Top-k Linked Data Query Processing
7/27/2019 Linked Data+Top-K Query Processing
http://slidepdf.com/reader/full/linked-datatop-k-query-processing 2/24
Linked Data - Definition
• Linked Data is about using the Web to connect related data that wasn't previously linked.
Linked Data - Principles
• Use URIs to name things
• Use HTTP URIs so that these things can be referred to and looked up
• Provide useful information in RDF when someone looks up a URI
• Include RDF links to other URIs to enable discovery of related information
Linked Data - Components
• URIs
– Globally unique identifiers for entities
– Pointers to data
• HTTP to access data on the Web
• RDF as a shared data model
• Formats (RDF/XML, RDFa, …) and hyperlinks
RDF (Resource Description Framework)
• A resource can be basically anything
– E.g. persons, places, Web documents, abstract concepts
• Descriptions of resources
– Attributes
– Relations
• The framework contains:
– A data model
– Languages and syntaxes
RDF Data Model
• Data comes as a set of triples (subject, predicate, object)
• Subject: resources
• Predicate: properties
• Object: literals or resources
• Examples:
– ( Mount Baker , last eruption , 1880 )
– ( Mount Baker , location , Washington )
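As a minimal sketch of the triple model, the two example triples can be held as Python tuples and queried with a pattern in which `None` acts as a wildcard. Plain strings stand in for resources and literals here, which is a simplification rather than how an RDF library represents terms, and the helper name `match` is hypothetical.

```python
# The two example triples as (subject, predicate, object) tuples.
# Plain strings stand in for resources and literals (a simplification).
triples = {
    ("Mount Baker", "last eruption", "1880"),
    ("Mount Baker", "location", "Washington"),
}


def match(data, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t
        for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(match(triples, s="Mount Baker", p="location"))
# [('Mount Baker', 'location', 'Washington')]
```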
RDF Data Model
• RDF is also a graph model
– Triples as directed edges
– Subjects and objects as vertices
– Edges labeled by predicate
• Example:
– ( Mount Baker , last eruption , 1880 )
– ( Mount Baker , location , Washington )
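To illustrate the graph reading of the same triples, the sketch below collects subjects and objects as vertices and stores each triple as a directed out-edge labeled with its predicate. The names `vertices` and `out_edges` are illustrative only, not part of any RDF API.

```python
from collections import defaultdict

triples = [
    ("Mount Baker", "last eruption", "1880"),
    ("Mount Baker", "location", "Washington"),
]

# Subjects and objects become vertices.
vertices = {s for s, _, _ in triples} | {o for _, _, o in triples}

# Each triple becomes a directed edge s -> o labeled with predicate p.
out_edges = defaultdict(list)
for s, p, o in triples:
    out_edges[s].append((p, o))

for label, target in out_edges["Mount Baker"]:
    print(f"Mount Baker --{label}--> {target}")
# Mount Baker --last eruption--> 1880
# Mount Baker --location--> Washington
```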
URI
• URIs extend the concept of URLs
– Globally unique identifiers for resources
– The URL of a Web document is usually used as its URI
– Attention: URIs identify not only Web documents
• Example:
– Me: http://olafhartig.de/~hartig/foaf.rdf#olaf
– RDF document about me: http://olafhartig.de/~hartig/foaf.rdf
– HTML document about me: http://olafhartig.de/~hartig/index.h
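The example separates the person (a hash URI) from the RDF document that describes him. HTTP clients never send the fragment, so looking up the person's URI retrieves the document named by the part before `#`. Python's standard `urllib.parse.urldefrag` shows the split; this only illustrates URI structure and is not a Linked Data client.

```python
from urllib.parse import urldefrag

# Hash URI identifying the person (not a document).
person = "http://olafhartig.de/~hartig/foaf.rdf#olaf"

# The part before '#' is what an HTTP client actually requests.
document, fragment = urldefrag(person)
print(document)  # http://olafhartig.de/~hartig/foaf.rdf
print(fragment)  # olaf
```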
Literal
• Literals may occur in the object position of triples
• Represented by strings
• Literal strings are interpreted by datatypes
– Datatype identified by a URI
– Common to use the XML Schema datatypes
– No datatype: interpreted as xsd:string
– Untyped literals may have language tags (e.g. @de)
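A minimal sketch of these literal rules, assuming a hypothetical `Literal` class: a literal without a datatype falls back to xsd:string, and a language tag excludes an explicit datatype. The `rdf:langString` datatype for language-tagged literals comes from RDF 1.1 rather than from the slide, and `value()` interprets only two XML Schema datatypes as an example.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal type: a lexical string plus datatype or language tag."""

    lexical: str
    datatype: Optional[str] = None  # datatype IRI, e.g. XSD + "integer"
    lang: Optional[str] = None      # language tag, e.g. "de"

    def __post_init__(self):
        if self.lang is not None and self.datatype is not None:
            raise ValueError("a literal has either a datatype or a language tag")

    def effective_datatype(self) -> str:
        if self.lang is not None:
            return RDF + "langString"           # RDF 1.1 rule
        return self.datatype or XSD + "string"  # no datatype: xsd:string

    def value(self):
        """Interpret the lexical form for two example XML Schema datatypes."""
        dt = self.effective_datatype()
        if dt == XSD + "integer":
            return int(self.lexical)
        if dt == XSD + "boolean":
            return self.lexical in ("true", "1")
        return self.lexical  # everything else stays a string


year = Literal("1880", XSD + "integer")
name = Literal("Washington", lang="en")
print(year.value() + 1)                             # 1881
print(Literal("Washington").effective_datatype())  # http://www.w3.org/2001/XMLSchema#string
```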
Blank Nodes
• Blank nodes represent unnamed, anonymous resources
– Not identified by a URI
• Blank node identifiers
– Identify blank nodes in triple serializations
• Form: _:xyz
• Scope: a single RDF graph
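Because a blank node identifier is only meaningful inside one RDF graph, merging two graphs that both happen to use `_:xyz` must relabel blank nodes first; otherwise two different anonymous resources would be conflated. The sketch below uses hypothetical helpers `standardize_apart` and `merge`, and assumes terms are plain strings in which blank nodes are exactly those starting with `_:`.

```python
import itertools

_fresh = itertools.count()  # source of new, never-reused blank node labels


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def standardize_apart(graph):
    """Relabel every blank node of one graph with a fresh identifier.

    The same label inside one graph maps to the same new label, because a
    blank node identifier is scoped to its graph.
    """
    mapping = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = f"_:b{next(_fresh)}"
        return mapping[term]

    return {(rename(s), p, rename(o)) for s, p, o in graph}


def merge(g1, g2):
    return standardize_apart(g1) | standardize_apart(g2)


g1 = {("_:xyz", "foaf:name", '"Alice"')}
g2 = {("_:xyz", "foaf:name", '"Bob"')}

naive = g1 | g2         # wrongly treats both _:xyz as one node
merged = merge(g1, g2)  # two distinct anonymous resources
print(len({s for s, _, _ in naive}), len({s for s, _, _ in merged}))  # 1 2
```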
Linked Stream Data model
• Extends the definition of RDF nodes and RDF triples
– RDF nodes: I, B and L are pairwise disjoint infinite sets of Internationalized Resource Identifiers (IRIs), blank nodes and literals
– RDF triple: $(s, p, o) \in IB \times I \times IBL$, where $IL = I \cup L$, $IB = I \cup B$ and $IBL = I \cup B \cup L$
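The triple definition can be checked mechanically. In this sketch the hypothetical classes `IRI`, `BNode` and `Lit` model the pairwise disjoint sets I, B and L, and `is_rdf_triple` tests membership in IB × I × IBL; the stream-specific parts of the model are not covered, and the example IRIs are made up.

```python
from dataclasses import dataclass


# Three distinct classes model the pairwise disjoint sets I, B and L.
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Lit:
    lexical: str


def is_rdf_triple(s, p, o) -> bool:
    """Check (s, p, o) ∈ IB × I × IBL, with IB = I ∪ B and IBL = I ∪ B ∪ L."""
    return (
        isinstance(s, (IRI, BNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BNode, Lit))
    )


baker = IRI("http://example.com/Mount_Baker")  # hypothetical IRIs
eruption = IRI("http://example.com/lastEruption")
print(is_rdf_triple(baker, eruption, Lit("1880")))  # True
print(is_rdf_triple(Lit("1880"), eruption, baker))  # False: literal as subject
```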
System architecture
• White box architecture
– Implements all required components:
– physical operators (e.g. windows, join, triple pattern matching)
– data structures (e.g. B+-Trees, hashtables)
– query generator/optimizer/executor
• Black box architecture
– Uses existing RDF and data stream processing systems as sub-components
– A query rewriter, a data translator and an orchestrator among the sub-components are needed
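Data model
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.com/">
  <!-- FOAF defines no Band class; foaf:Group is the closest FOAF class -->
  <foaf:Group rdf:about="http://example.com/Beatles/">
    <foaf:name>Beatles</foaf:name>
    <!-- A property element cannot carry rdf:about; nest a node element instead -->
    <ex:album>
      <rdf:Description rdf:about="http://example.com/Beatles/Sgt_Pepper/">
        <ex:name>Sgt_Pepper</ex:name>
        <ex:song>
          <rdf:Description rdf:about="http://example.com/Beatles/Sgt_Pepper/Lucy">
            <ex:name>Lucy</ex:name>
          </rdf:Description>
        </ex:song>
      </rdf:Description>
    </ex:album>
    <ex:album>
      <rdf:Description rdf:about="http://example.com/Beatles/Help!/">
        <ex:name>Help!</ex:name>
        <ex:song>
          <rdf:Description rdf:about="http://example.com/Beatles/Help!/Help!">
            <ex:name>Help!</ex:name>
          </rdf:Description>
        </ex:song>
      </rdf:Description>
    </ex:album>
  </foaf:Group>
</rdf:RDF>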
Query
# Get all songs from the Beatles' albums
PREFIX ex:   <http://example.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX :     <http://example.com/resource/>
SELECT * WHERE {
  <http://example.com/Beatles/> ex:album ?album .
  ?album ex:song ?song .
}
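To make the query's semantics concrete, the sketch below evaluates its two triple patterns over the album and song triples of the data-model example with a naive nested-loop join, where each pattern extends the bindings found so far. Full IRIs are plain strings, variables are strings starting with `?`, and the helper names are hypothetical; a real system would hand the query to a SPARQL engine.

```python
EX = "http://example.com/"

# Album and song triples from the data-model example (names omitted).
data = [
    (EX + "Beatles/", EX + "album", EX + "Beatles/Sgt_Pepper/"),
    (EX + "Beatles/", EX + "album", EX + "Beatles/Help!/"),
    (EX + "Beatles/Sgt_Pepper/", EX + "song", EX + "Beatles/Sgt_Pepper/Lucy"),
    (EX + "Beatles/Help!/", EX + "song", EX + "Beatles/Help!/Help!"),
]


def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Return `binding` extended so that `pattern` matches `triple`, else None."""
    new = dict(binding)
    for pat, val in zip(pattern, triple):
        if is_var(pat):
            if new.setdefault(pat, val) != val:
                return None  # variable already bound to another value
        elif pat != val:
            return None      # constant term does not match
    return new


def evaluate(patterns, triples):
    """Nested-loop join: each pattern extends the bindings found so far."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = extend(b, pattern, t)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


query = [
    (EX + "Beatles/", EX + "album", "?album"),
    ("?album", EX + "song", "?song"),
]
for row in evaluate(query, data):
    print(row["?song"])
# http://example.com/Beatles/Sgt_Pepper/Lucy
# http://example.com/Beatles/Help!/Help!
```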
Top-k Linked Data Query Processing
• Purpose
– Efficiency and scalability are essential problems in Linked Data query processing
– Instead of computing all results, top-k query processing approaches produce only the "best" k results
Top-k Linked Data Query Processing: Requirements (binary operators only)
• Source index
– Maps (matches) each triple pattern to the sources containing bindings for it
[Figure: Linked Data query processing engine, source index, and sources]
TP1: ex:beatles ex:album ?album .
TP2: ?album ex:song ?song .
Src.1: ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help.
Src.2: ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy".
Src.3: ex:help foaf:name "Help!"; ex:song "Help!".
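A naive stand-in for a source index: for each triple pattern, list the sources that contain at least one matching triple. This sketch scans the three example sources directly and compares terms as prefixed-name strings; the slides do not say how the index is built, so `build_source_index` is hypothetical, and a real index would be precomputed rather than built by scanning.

```python
# The three example sources as (s, p, o) tuples of prefixed-name strings.
sources = {
    "Src.1": [
        ("ex:beatles", "foaf:name", '"The Beatles"'),
        ("ex:beatles", "ex:album", "ex:sgt_pepper"),
        ("ex:beatles", "ex:album", "ex:help"),
    ],
    "Src.2": [
        ("ex:sgt_pepper", "foaf:name", '"Sgt. Pepper"'),
        ("ex:sgt_pepper", "ex:song", '"Lucy"'),
    ],
    "Src.3": [
        ("ex:help", "foaf:name", '"Help!"'),
        ("ex:help", "ex:song", '"Help!"'),
    ],
}

patterns = {
    "TP1": ("ex:beatles", "ex:album", "?album"),
    "TP2": ("?album", "ex:song", "?song"),
}


def matches(pattern, triple):
    """Variables ('?x') match anything; repeated variables are not checked."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def build_source_index(patterns, sources):
    """Map each triple pattern to the sources holding at least one binding."""
    return {
        name: sorted(
            src for src, triples in sources.items()
            if any(matches(pattern, t) for t in triples)
        )
        for name, pattern in patterns.items()
    }


print(build_source_index(patterns, sources))
# {'TP1': ['Src.1'], 'TP2': ['Src.2', 'Src.3']}
```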
Top-k Linked Data Query Processing
• Ranking function
– Determines the relevance of triple pattern bindings
– For instance, scores for triples can be obtained through PageRank
– However, no triples are indexed (i.e., each source must be scanned)
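The slide names PageRank only as one possible source of scores. As an assumption-laden sketch, the function below runs standard power-iteration PageRank over a hypothetical link graph in which Src.1 links to Src.2 and Src.3 (because it mentions ex:sgt_pepper and ex:help); a triple could then inherit the score of its source. The resulting numbers are not the scores used in the rank-join example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Standard power-iteration PageRank.

    links maps each node to the list of nodes it links to. Rank held by
    nodes without out-links is spread evenly over all nodes.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical source-level link graph: Src.1 refers to resources
# described in Src.2 and Src.3.
links = {"Src.1": ["Src.2", "Src.3"], "Src.2": [], "Src.3": []}
scores = pagerank(links)
print({src: round(r, 3) for src, r in scores.items()})
```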
Top-k Linked Data Query Processing
• Sorted access
[Figure: bindings for TP1 (ex:beatles ex:album ?album) and TP2 (?album ex:song ?song) are retrieved from Src.1, Src.2 and Src.3 in descending score order]
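Sorted access means a join operator can ask an input for its next-best binding. The sketch below assumes every matching triple has already been fetched and scored (consistent with the remark that sources must be scanned) and turns a list of (score, triple) pairs into a generator that yields them highest score first using `heapq`. The TP2 scores 3 and 2 are taken from the rank-join example; `sorted_access` is a hypothetical name.

```python
import heapq


def sorted_access(scored_triples):
    """Yield (score, triple) pairs one at a time, highest score first.

    The position index breaks ties, so equal scores keep their input order.
    """
    heap = [(-score, i, triple) for i, (score, triple) in enumerate(scored_triples)]
    heapq.heapify(heap)
    while heap:
        neg_score, _, triple = heapq.heappop(heap)
        yield -neg_score, triple


# TP2 bindings with the scores used in the rank-join example.
tp2 = [
    (2, ("ex:sgt_pepper", "ex:song", '"Lucy"')),
    (3, ("ex:help", "ex:song", '"Help!"')),
]
stream = sorted_access(tp2)
print(next(stream))  # (3, ('ex:help', 'ex:song', '"Help!"'))
```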
Top-k Linked Data Query Processing
• Push-Based Rank Join
[Figure: sorted access for ex:beatles ex:album ?album . and for ?album ex:song ?song . feeds the join; scored query bindings go to an output queue]
Src.1: ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help.
Src.3: ex:help foaf:name "Help!"; ex:song "Help!".
Score | Seen triples (TP2)
3 | ex:help ex:song "Help!"
2 | ex:sgt_pepper ex:song "Lucy" (skipped for now: 2 < 3, so only "Help!" is pushed)
Score | Seen triples (TP1)
1 | ex:beatles ex:album ex:sgt_pepper
1 | ex:beatles ex:album ex:help (the scores tie, 1 = 1, so both are taken)
Score | Query bindings (output queue)
Top-k Linked Data Query Processing
• Push-Based Rank Join
Score | Seen triples (TP1), sorted access for ex:beatles ex:album ?album .
1 | ex:beatles ex:album ex:sgt_pepper
1 | ex:beatles ex:album ex:help
Score | Seen triples (TP2), sorted access for ?album ex:song ?song .
3 | ex:help ex:song "Help!"
Score | Query bindings (output queue)
4 | ex:beatles ex:album ex:help . ex:help ex:song "Help!" .
Threshold: $\max\{\max_1 + \min_2,\ \max_2 + \min_1\} = \max\{1 + 3,\ 3 + 1\} = 4$
A query binding with score ≥ threshold has been found: STOP
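A compact sketch of a push-based rank join in the style of HRJN, assuming two inputs already in descending score order, a single shared join variable, and a join score equal to the sum of the two binding scores (as in the example, 1 + 3 = 4). Each newly read binding is pushed into the join against everything seen on the other side, and results wait in a max-heap until their score reaches the threshold max{max_1 + min_2, max_2 + min_1}. The function name and data layout are hypothetical, and the sketch simply alternates between inputs instead of following the exact read order of the slides.

```python
import heapq
import itertools

NEG_INF = float("-inf")


def rank_join(left, right, join_var, k):
    """Push-based rank join sketch (HRJN-style) over two ranked inputs.

    left/right: lists of (score, binding) in descending score order, where a
    binding is a dict from variable names to values. A joined result scores
    the sum of its two input scores. Returns up to k (score, binding) pairs.
    """
    inputs = (left, right)
    pos = [0, 0]                        # next unread position per input
    seen = ([], [])                     # bindings read so far per input
    top = [None, None]                  # max_i: first (highest) score read
    last = [None, None]                 # min_i: latest (lowest) score read
    queue, tie = [], itertools.count()  # max-heap of joined results
    results = []

    def exhausted(i):
        return pos[i] >= len(inputs[i])

    def threshold():
        """Upper bound on any result not yet in the queue."""
        bound = NEG_INF
        for i in (0, 1):
            if exhausted(i):
                continue  # no unseen bindings left in input i
            if top[0] is None or top[1] is None:
                return float("inf")  # an input not read yet: no bound
            bound = max(bound, last[i] + top[1 - i])
        return bound

    i = 0
    while len(results) < k:
        t = threshold()
        # Output queued results that no unseen combination can beat.
        while queue and len(results) < k and -queue[0][0] >= t:
            neg, _, binding = heapq.heappop(queue)
            results.append((-neg, binding))
        if len(results) >= k or (exhausted(0) and exhausted(1)):
            break
        if exhausted(i):
            i = 1 - i  # only the other input has bindings left
        score, binding = inputs[i][pos[i]]
        pos[i] += 1
        if top[i] is None:
            top[i] = score
        last[i] = score
        seen[i].append((score, binding))
        # Push the new binding against everything seen on the other side.
        for other_score, other in seen[1 - i]:
            if other[join_var] == binding[join_var]:
                joined = {**other, **binding}
                heapq.heappush(queue, (-(score + other_score), next(tie), joined))
        i = 1 - i  # alternate between inputs
    return results


tp1 = [(1, {"?album": "ex:sgt_pepper"}), (1, {"?album": "ex:help"})]
tp2 = [(3, {"?album": "ex:help", "?song": '"Help!"'}),
       (2, {"?album": "ex:sgt_pepper", "?song": '"Lucy"'})]
print(rank_join(tp1, tp2, "?album", k=1))
# [(4, {'?album': 'ex:help', '?song': '"Help!"'})]
```

With k = 1 this stops with the single binding of score 4 once the threshold has dropped to 4, as on the slide; with k = 2 the ex:sgt_pepper / "Lucy" binding follows with score 3.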
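Top-k Linked Data Query Processing
• Improving threshold estimation
– Original threshold estimation: $\text{threshold} = \max\{\max_1 + \min_2,\ \max_2 + \min_1\}$, where $\max_i$ and $\min_i$ are the highest and lowest scores read so far from input $i$
– How to improve:
– Star-shaped entity query bounds
– Look-ahead bounds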
Top-k Linked Data Query Processing
• Star-shaped entity query bounds
– Problem:
– In Linked Data query processing, every result of an entity query (a star-shaped query whose triple patterns share one subject) is contained in one single source.
– Reason:
– A result here is an entity,
– and the information related to that entity comes exclusively from the one source representing that particular entity.
– Idea:
– Upper-bound the scores of triple pattern bindings via the maximal possible score
Top-k Linked Data Query Processing
• Look-ahead Bounds:
– Provide a more accurate upper bound for the unseen bindings, using the next possible score
Score | Seen triples (TP1)
1 | ex:beatles ex:album ex:sgt_pepper
1 | ex:beatles ex:album ex:help
max_1 = 1, min_1 = 1
Score | Seen triples (TP2), sorted access for ?album ex:song ?song (Src.2)
3 | ex:help ex:song "Help!"
Threshold: $\max\{1 + 2,\ 1 + 3\} = 4$ (here 2 is the next possible TP2 score, the not-yet-read ex:sgt_pepper ex:song "Lucy" from Src.2, and 1 + 3 = min_1 + max_2)
Score | Query bindings (output queue)
4 | ex:beatles ex:album ex:help . ex:help ex:song "Help!" .
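The two estimates differ only in which bound stands in for the unseen bindings of an input: the lowest score read so far (min_i) or the score of that input's next unread binding. A minimal sketch with a hypothetical `threshold` helper reproduces the slide's numbers (the slide appears to apply the look-ahead to TP2 only), then uses made-up scores to show a case where look-ahead gives a tighter bound.

```python
def threshold(max1, bound1, max2, bound2):
    """Rank-join threshold max{max_1 + bound_2, max_2 + bound_1}.

    bound_i is min_i (lowest score read from input i) for the original
    estimate, or the score of input i's next unread binding for look-ahead.
    """
    return max(max1 + bound2, max2 + bound1)


# Slide example: max_1 = min_1 = 1, max_2 = min_2 = 3, next TP2 score = 2.
print(threshold(1, 1, 3, 3))  # original: 4
print(threshold(1, 1, 3, 2))  # look-ahead on TP2 only: 4

# Made-up scores where look-ahead helps (next_i <= min_i always holds).
print(threshold(5, 5, 5, 4))  # original, min_1 = 5, min_2 = 4: 10
print(threshold(5, 2, 5, 4))  # look-ahead, next_1 = 2, next_2 = 4: 9
```

A lower threshold lets the rank join emit a queued result earlier, which is the point of the look-ahead bound.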