Date posted: 23-Feb-2017
Query Rewriting in RDF Stream Processing
Jean-Paul Calbimonte – Jose Mora – Oscar Corcho
LSIR EPFL, OEG UPM
ESWC
Heraklion, Greece. June 2016
@jpcik
Query Rewriting in RDF Stream Processing: What?
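RDF Streams
• Triple: s p o
• Time-stamped triple: s p o t
• Time-stamped graph: a set of triples (s p o) sharing one timestamp t
• RDF stream S: a sequence of time-stamped graphs ... Gi-1, Gi, Gi+1, Gi+2, Gi+3 ...
• Window W(S,t): the portion of the stream S that the RSP engine evaluates at time t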
Windowed Queries in RSP
Graph G:
:obs1 rdf:type ssn:Observation .
:obs2 rdf:type ssn:Observation .
:obs3 rdf:type ssn:Observation .
qG(x) ← ssn:Observation(x)
Answers: :obs1, :obs2, :obs3
Stream S, window W of size 5:
g1 :obs1 rdf:type ssn:Observation .
g2 :obs2 rdf:type ssn:Observation .
g3 :obs3 rdf:type ssn:Observation .
g4 :obs4 rdf:type ssn:Observation .
g5 :obs5 rdf:type ssn:Observation .
g6 :obs6 rdf:type ssn:Observation .
Timestamps shown on the stream timeline: [1] [3] [4] [7] [10] [11] [13]
qW(S,t)(x) ← ssn:Observation(x)
Answers: :obs1, :obs2 [5]; :obs3, :obs4 [10]; :obs5, :obs6 [15]
qW(S,t)(x) ← ssn:observedBy(x,y)
No answers for this query: no matches in the stream
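The windowed evaluation on this slide can be sketched in a few lines of Python. This is only an illustration, not how an RSP engine is implemented: the timestamps assigned to each observation are hypothetical (chosen so that every window of size 5 holds two observations, as in the answers listed above), and `window`, `q_class` and `q_observed_by` are names made up for the sketch.

```python
from typing import NamedTuple


class TimestampedTriple(NamedTuple):
    s: str
    p: str
    o: str
    t: int


# Hypothetical timestamps: the slide does not state which graph carries which timestamp.
STREAM = [
    TimestampedTriple(":obs1", "rdf:type", "ssn:Observation", 3),
    TimestampedTriple(":obs2", "rdf:type", "ssn:Observation", 4),
    TimestampedTriple(":obs3", "rdf:type", "ssn:Observation", 7),
    TimestampedTriple(":obs4", "rdf:type", "ssn:Observation", 10),
    TimestampedTriple(":obs5", "rdf:type", "ssn:Observation", 11),
    TimestampedTriple(":obs6", "rdf:type", "ssn:Observation", 13),
]


def window(stream, t, size=5):
    """W(S,t): the triples whose timestamp falls in (t - size, t]."""
    return [tr for tr in stream if t - size < tr.t <= t]


def q_class(triples, cls):
    """q(x) <- cls(x), evaluated over a finite set of triples."""
    return sorted(tr.s for tr in triples if tr.p == "rdf:type" and tr.o == cls)


def q_observed_by(triples):
    """q(x) <- ssn:observedBy(x, y), evaluated over a finite set of triples."""
    return sorted(tr.s for tr in triples if tr.p == "ssn:observedBy")


for t in (5, 10, 15):
    w = window(STREAM, t)
    print(t, q_class(w, "ssn:Observation"), q_observed_by(w))
```

The second query returns an empty list for every window, because no triple in the stream uses ssn:observedBy.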
RSP Query Languages
RSP languages: CQELS, C-SPARQL, SPARQLStream, EP-SPARQL
SELECT ?obs
WHERE {
  STREAM :stream [RANGE 5s] {
    ?obs rdf:type ssn:Observation.
  }
}
Example: Get the maximum air temperature observed in the last hour, grouped by sensor.
SELECT (MAX(?temp) AS ?maxtemp) ?sensor
WHERE {
  STREAM :stream [RANGE 1h] {
    ?obs ssn:observationResult ?result;
         ssn:observedProperty cf-property:air_temperature;
         ssn:observedBy ?sensor.
    ?result ssn:hasValue ?obsValue.
    ?obsValue qu:numericalValue ?temp.
  }
}
GROUP BY ?sensor
Query Rewriting in RDF Stream Processing: How?
Query Rewriting
Original query: q(x) ← Sensor(x)
TBox: HumiditySensor ⊑ Sensor
Rewritten UCQ, evaluated over the ABox:
q’(x) ← Sensor(x)
q’’(x) ← HumiditySensor(x)
• Ontology expressiveness: e.g. RDFS, ELHIO, etc.
• OBDA systems, query rewriters: Prexto, Requiem, Kyrie, etc.
• Explosion of rewritings, complexity of queries
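For the simplest case, a class query over a subclass hierarchy, the idea can be sketched as follows. This is a toy illustration and not the algorithm of any of the rewriters named above; the subclass axiom is an assumption read off the Sensor/HumiditySensor example, and the ABox individuals `s1` and `s2` are hypothetical.

```python
# Assumed TBox: subclass axioms as (subclass, superclass) pairs.
TBOX = [("HumiditySensor", "Sensor")]

# Hypothetical ABox: class assertions as (individual, class) pairs.
ABOX = [("s1", "Sensor"), ("s2", "HumiditySensor")]


def rewrite_class_query(cls, tbox):
    """Rewrite q(x) <- cls(x) into a UCQ, returned as the list of classes to query."""
    result = [cls]
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sup == current and sub not in result:
                result.append(sub)
                frontier.append(sub)
    return result


def evaluate_ucq(classes, abox):
    """Evaluate the UCQ over the ABox alone; the TBox is no longer needed."""
    return sorted({ind for ind, c in abox if c in classes})


ucq = rewrite_class_query("Sensor", TBOX)
print(ucq)
print(evaluate_ucq(ucq, ABOX))
```

Evaluating only the original query over this ABox would return `s1`; the rewritten UCQ also returns `s2` without any reasoning at query time.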
Semantic Sensor Network Ontology
[Figure: excerpt of the ontology. Classes shown include ssn:Sensor, ssn:SensingDevice, ssn:Platform, ssn:Deployment, ssn:FeatureOfInterest, ssn:Property, ssn:MeasurementCapability, ssn:MeasurementProperty, ssn:Accuracy, ssn:Latency, dul:Place, qu:QuantityKind, dim:Temperature, dim:VelocityOrSpeed, aws:TemperatureSensor, aws:Thermistor, aws:CapacitiveBead. Properties shown include ssn:observes, ssn:onPlatform, ssn:inDeployment, ssn:ofFeature, ssn:hasMeasurementProperty, dul:hasLocation, geo:lat and geo:lng (xsd:double). Other terms shown: cf-prop:air_temperature, cf-prop:soil_temperature, cf-prop:wind_speed, cf-prop:rainfall_rate, cf-feat:Wind, cf-feat:Surface, cf-feat:Medium, cf-feat:air, cf-feat:soil.]
Rewriting in RSP
Stream S, window W(S,t):
g1 :obs1 rdf:type ssn:Observation .
g2 :obs2 rdf:type ssn:Observation .
g3 :obs3 rdf:type ssn:Observation .
g4 :obs4 rdf:type ssn:Observation .
g5 :obs5 rdf:type ssn:Observation .
g6 :obs6 rdf:type ssn:Observation .
Timestamps shown on the stream timeline: [1] [3] [4] [7] [10] [11] [13]
Ontology axiom: ssn:Observation ⊑ ∃ssn:observedBy
Original query: qW(S,t)(x) ← ssn:observedBy(x,y)
Query rewriting gives the rewritten UCQ:
q’W(S,t)(x) ← ssn:observedBy(x,y)
q’’W(S,t)(x) ← ssn:Observation(x)
Answers: :obs1, :obs2 [5]; :obs3, :obs4 [10]; :obs5, :obs6 [15]
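A small sketch of why the rewritten UCQ returns answers where the original query returns none: the UCQ is evaluated as the union of its conjunctive queries over each window. The timestamps are hypothetical, and the helper names (`window`, `answers`, `ORIGINAL`, `REWRITTEN`) are made up for the example.

```python
# Hypothetical timestamps; each tuple is (subject, predicate, object, timestamp).
STREAM = [
    (":obs1", "rdf:type", "ssn:Observation", 3),
    (":obs2", "rdf:type", "ssn:Observation", 4),
    (":obs3", "rdf:type", "ssn:Observation", 7),
    (":obs4", "rdf:type", "ssn:Observation", 10),
    (":obs5", "rdf:type", "ssn:Observation", 11),
    (":obs6", "rdf:type", "ssn:Observation", 13),
]


def window(stream, t, size=5):
    """W(S,t): the triples whose timestamp falls in (t - size, t]."""
    return [tr for tr in stream if t - size < tr[3] <= t]


def q_observed_by(triples):
    """q'(x) <- ssn:observedBy(x, y)"""
    return {s for s, p, _o, _t in triples if p == "ssn:observedBy"}


def q_observation(triples):
    """q''(x) <- ssn:Observation(x)"""
    return {s for s, p, o, _t in triples if p == "rdf:type" and o == "ssn:Observation"}


ORIGINAL = [q_observed_by]
REWRITTEN = [q_observed_by, q_observation]


def answers(ucq, triples):
    """Union of the answers of every conjunctive query in the UCQ."""
    result = set()
    for cq in ucq:
        result |= cq(triples)
    return sorted(result)


for t in (5, 10, 15):
    w = window(STREAM, t)
    print(t, answers(ORIGINAL, w), answers(REWRITTEN, w))
```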
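Rewriting in RSP: Semantics
Query:
$q : q_h(\mathbf{x}) \leftarrow p_1(\mathbf{x}_1) \wedge \dots \wedge p_n(\mathbf{x}_n)$
Certain answers over an ontology $\mathcal{O} = \langle \mathcal{T}, \mathcal{A} \rangle$:
$cert(q, \mathcal{O}) = \{ \boldsymbol{\alpha} \mid q \cup \mathcal{T} \cup \mathcal{A} \models q_h(\boldsymbol{\alpha}) \}$
With the rewritten query $q'$ we get rid of the TBox:
$cert(q, \mathcal{O}) = \{ \boldsymbol{\alpha} \mid q' \cup \mathcal{A} \models q_h(\boldsymbol{\alpha}) \}$
Now for RDF streams, with streaming ABoxes $\mathcal{A}_w$, the query is evaluated only for a window $w$:
$cert(q_w, \langle \mathcal{T}, \mathcal{A}_w \rangle) = \{ \boldsymbol{\alpha} \mid q_w \cup \mathcal{T} \cup \mathcal{A}_w \models q_h(\boldsymbol{\alpha}) \}$
$cert(q_w, \langle \mathcal{T}, \mathcal{A}_w \rangle) = \{ \boldsymbol{\alpha} \mid q'_w \cup \mathcal{A}_w \models q_h(\boldsymbol{\alpha}) \}$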
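As a worked instance of the windowed definition, take the query and stream of the earlier "Rewriting in RSP" slide. The TBox axiom and the contents of the window ABox below are assumptions made for the illustration (one window holding :obs1 and :obs2); the slides do not spell them out in this form.

```latex
\begin{align*}
\mathcal{T} &= \{\, \textit{ssn:Observation} \sqsubseteq \exists\, \textit{ssn:observedBy} \,\} \\
\mathcal{A}_w &= \{\, \textit{ssn:Observation}(\textit{:obs1}),\ \textit{ssn:Observation}(\textit{:obs2}) \,\} \\
q_w &: q_h(x) \leftarrow \textit{ssn:observedBy}(x, y) \\
q'_w &: q_h(x) \leftarrow \textit{ssn:observedBy}(x, y) \ \vee\ \textit{ssn:Observation}(x) \\
cert(q_w, \langle \mathcal{T}, \mathcal{A}_w \rangle)
  &= \{\, \alpha \mid q'_w \cup \mathcal{A}_w \models q_h(\alpha) \,\}
   = \{\, \textit{:obs1},\ \textit{:obs2} \,\}
\end{align*}
```

Evaluating $q_w$ over $\mathcal{A}_w$ alone, without $\mathcal{T}$, would give the empty set, since $\mathcal{A}_w$ contains no ssn:observedBy assertion.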
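Implementation: StreamQR
• StreamQR combines the kyrie rewriter with CQELS
• Inputs: the ontology TBox, a CQELS query (query registration) and an RDF stream
• The kyrie rewriter turns the CQELS query into a CQELS UCQ
• CQELS evaluates the UCQ over the RDF stream and returns continuous answers
• Github StreamQR: https://github.com/jpcik/streapler/tree/streamQR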
StreamQR: Rewriting Steps
• Inputs: CQELS query and ontology TBox
• CQELS query: separate BGP & context, giving a CQ (BGP)
• Ontology TBox: preprocess
• Rewriting: Datalog query, remove functional terms, remove auxiliary predicates, expand, giving a UCQ
• UCQ: syntactic transformation, giving the rewritten CQELS UCQ
StreamQR in Action
TBox:
met:TemperatureObservation ⊑ ∃ssn:observedBy.aws:TemperatureSensor
met:AirTemperatureObservation ⊑ met:TemperatureObservation
met:ThermistorObservation ⊑ met:TemperatureObservation
aws:Thermistor ⊑ aws:TemperatureSensor
aws:CapacitiveBead ⊑ aws:TemperatureSensor
CQELS query:
CONSTRUCT { ?o a :ObservedTemperature. }
WHERE {
  STREAM :stream1 [RANGE 10ms] {
    ?o ssn:observedBy ?t .
    ?t a aws:TemperatureSensor.
  }
}
CQ extraction & syntactic transformation, giving the input of the kyrie rewriter:
Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)
StreamQR in Action
Input CQ:
Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)
Query expansion against the TBox gives the rewritten UCQ:
Q(?0) <- met:TemperatureObservation(?0)
Q(?0) <- met:AirTemperatureObservation(?0)
Q(?0) <- met:ThermistorObservation(?0)
Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)
Q(?0) <- aws:Thermistor(?1), ssn:observedBy(?0,?1)
Q(?0) <- aws:CapacitiveBead(?1), ssn:observedBy(?0,?1)
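The expansion step can be imitated with a small backward-chaining sketch. It is a simplified illustration, not the kyrie algorithm: it handles only subclass axioms and one axiom of the form A ⊑ ∃R.B, and it assumes the TBox of this example consists of exactly such axioms (the subsumption symbols are reconstructed from the rewriting shown above). All function names are hypothetical.

```python
# Assumed TBox. SUBCLASS holds (sub, super) pairs; EXISTS holds (A, R, B) for A ⊑ ∃R.B.
SUBCLASS = [
    ("met:AirTemperatureObservation", "met:TemperatureObservation"),
    ("met:ThermistorObservation", "met:TemperatureObservation"),
    ("aws:Thermistor", "aws:TemperatureSensor"),
    ("aws:CapacitiveBead", "aws:TemperatureSensor"),
]
EXISTS = [("met:TemperatureObservation", "ssn:observedBy", "aws:TemperatureSensor")]

HEAD = "?0"  # the only answer variable of Q(?0)


def one_step(cq):
    """Yield every CQ obtained from cq by applying one axiom backwards."""
    # Subclass axioms: replace a class atom by one of its direct subclasses.
    for atom in cq:
        pred, args = atom
        if len(args) == 1:
            for sub, sup in SUBCLASS:
                if sup == pred:
                    yield (cq - {atom}) | {(sub, args)}
    # A ⊑ ∃R.B: replace R(x, y), B(y) by A(x) when y occurs nowhere else.
    for a, r, b in EXISTS:
        for role in cq:
            if role[0] != r or len(role[1]) != 2:
                continue
            x, y = role[1]
            filler = (b, (y,))
            if filler not in cq or y == HEAD:
                continue
            rest = cq - {role, filler}
            if any(y in args for _pred, args in rest):
                continue
            yield rest | {(a, (x,))}


def rewrite(cq):
    """Apply one_step until no new CQ appears; the result is the UCQ."""
    seen = {cq}
    frontier = [cq]
    while frontier:
        current = frontier.pop()
        for new in one_step(current):
            if new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


def show(cq):
    """Render a CQ in the Datalog-like notation used on the slides."""
    body = ", ".join(f"{p}({','.join(a)})" for p, a in sorted(cq))
    return f"Q({HEAD}) <- {body}"


query = frozenset({
    ("aws:TemperatureSensor", ("?1",)),
    ("ssn:observedBy", ("?0", "?1")),
})
ucq = sorted(show(cq) for cq in rewrite(query))
for line in ucq:
    print(line)
```

On this input the sketch produces six conjunctive queries, the same six listed in the rewritten UCQ above, although a real rewriter covers far more expressive axioms and applies optimizations that this loop ignores.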
StreamQR in Action
Rewritten UCQ (the six queries produced by query expansion), after syntactic transformation into the rewritten CQELS query:
PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
PREFIX aws: <http://purl.oclc.org/NET/ssnx/meteo/aws#>
PREFIX met: <http://purl.org/env/meteo#>
CONSTRUCT { ?o a :ObservedTemperature. }
WHERE {
  STREAM :stream1 [RANGE 10ms] {
    { ?o a met:TemperatureObservation }
    UNION { ?o a met:AirTemperatureObservation }
    UNION { ?o a met:ThermistorObservation }
    UNION { ?s a aws:TemperatureSensor . ?o ssn:observedBy ?s }
    UNION { ?s a aws:Thermistor . ?o ssn:observedBy ?s }
    UNION { ?s a aws:CapacitiveBead . ?o ssn:observedBy ?s }
  }
}
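The syntactic transformation from a UCQ to a CQELS query with UNION blocks can be sketched as plain string generation. This is an assumption about what the step amounts to, not StreamQR's code: the function names are hypothetical, the CONSTRUCT template and stream name are copied from the example, and the PREFIX declarations are left out for brevity.

```python
# The rewritten UCQ, one list of (predicate, arguments) atoms per conjunctive query.
UCQ = [
    [("met:TemperatureObservation", ("?o",))],
    [("met:AirTemperatureObservation", ("?o",))],
    [("met:ThermistorObservation", ("?o",))],
    [("aws:TemperatureSensor", ("?s",)), ("ssn:observedBy", ("?o", "?s"))],
    [("aws:Thermistor", ("?s",)), ("ssn:observedBy", ("?o", "?s"))],
    [("aws:CapacitiveBead", ("?s",)), ("ssn:observedBy", ("?o", "?s"))],
]


def atom_to_pattern(pred, args):
    """Unary atoms become rdf:type patterns, binary atoms become property patterns."""
    if len(args) == 1:
        return f"{args[0]} a {pred}"
    subject, obj = args
    return f"{subject} {pred} {obj}"


def ucq_to_cqels(ucq, stream=":stream1", window="RANGE 10ms"):
    """Build one UNION block per conjunctive query inside the STREAM clause."""
    blocks = [
        "{ " + " . ".join(atom_to_pattern(p, a) for p, a in cq) + " }"
        for cq in ucq
    ]
    union = "\n    UNION ".join(blocks)
    return (
        "CONSTRUCT { ?o a :ObservedTemperature. }\n"
        "WHERE {\n"
        f"  STREAM {stream} [{window}] {{\n"
        f"    {union}\n"
        "  }\n"
        "}"
    )


cqels_query = ucq_to_cqels(UCQ)
print(cqels_query)
```

Keeping the window specification and the CONSTRUCT template outside the rewriting is what "separate BGP & context" suggests: only the basic graph pattern is rewritten, and the streaming context is reattached afterwards.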
Comparison: Rewriting vs No-rewriting
• Throughput under different input data loads
• Modified version of the SRBench benchmark queries
• Ontology based on AWS, which extends SSN-O
• Compared the throughput of StreamQR with CQELS (no rewriting)
• Tested under different input rates, ranging from 10 to 100K triples/s
• Under three different load conditions, such that only 10%, 50% and 90% of the input data matches the continuous query
[Figure: throughput plots in three panels: 10% of matches, 50% of matches, 90% of matches. Annotation: reaches optimal throughput up to this point.]
Varying input distributions
Changing the input stream triple distribution
• Experiments under different input loads
• Varying distribution of the types of triples
• 10, 20, 50, 80 and 90% of the triples match the query
• The more matches, the more time the engine spends on evaluation
• Up to 10K triples/s, StreamQR is able to keep up with the input
• Beyond that, the throughput degrades until it reaches a limit
Throughput for different queries
Different queries producing from 2 to 180 rewritings
• Different queries -> UCQ with a different number of sub-queries
• 9 distinct queries that produce from 2 to over 180 sub-queries
• Throughput decreases for queries that produce more rewritings
• Optimizations: existing techniques used in query rewriting and OBDA
• Pruning queries that may not match any input
• For around 1K triples/s, it still reaches maximum throughput
Comparing with TrOWL
• TrOWL provides incremental reasoning for ABoxes
• Three settings:
  • TrOWL without performing any reasoning (noreclassify)
  • Activating the reasoning, but allowing only additions
  • Including removals
• The removal operation is known to be expensive in incremental materialization
• StreamQR sustains better throughput under fast input rates
• Under lower input rates both are able to reach maximum throughput
Conclusions
• Query answering over ontologies for RDF stream processors
• Novel approach that combines query rewriting techniques and RSP engines
• Implemented StreamQR: incorporates the kyrie rewriter into an existing RSP engine
• Still efficient in terms of throughput, for a large range of scenarios
• Plan to study other criteria such as correctness
• Exploring different expressiveness
• Find a good balance between efficiency and complexity
• Research in approaches that combine rewriting and incremental materialization
Thanks a lot! Any questions?
Jean-Paul Calbimonte
LSIR EPFL
@jpcik