Date post: | 23-Aug-2014 |
Upload: | emanuele-della-valle |
http://streamreasoning.org
Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data
Emanuele Della Valle - http://emanueledellavalle.org
Emanuele Della Valle - http://streamreasoning.org/
Acknowledgements
§ This talk presents the content of a joint paper with Stefan Schlobach (b), Markus Krötzsch (c), Alessandro Bozzon (a), Stefano Ceri (a), and Ian Horrocks (c), to appear in SWJ
(a) Politecnico di Milano, (b) Vrije Universiteit Amsterdam, (c) University of Oxford
§ I also want to thank Frank van Harmelen (b) for his important contribution to the discussion, and Tony Lee (Saltlux), Andreas Schreiber (DLR) and Achim Basermann (DLR) for the valuable discussions on concrete examples of problems that require order-aware reasoning. Moreover, I want to thank Sara Magliacane (b) for her work on SPARQL-RANK and for the slides I use in this presentation, and Marco Balduini (a), Davide Barbieri (a), and Daniele Braga (a) for their work on C-SPARQL
§ Check out the paper: • http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data
Trento, Italy, 6.11.2012
References
§ The numbers in square brackets refer to references in the SWJ paper • http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data
§ A short selection of references to my papers is available at the end of the presentation.
The problem, three use cases, and …
§ More and more applications require real-time processing of massive, dynamically generated data
Space Situational Awareness
Jet Engine Design
Intelligent Surveillance
The Problem – Use case: space junk
[source http://wordlesstech.com/2011/03/26/space-junk/ ]
The Problem – Use case: jet engine design
[Source: http://www.sae.org/mags/aem/10018/ ]
The Problem – Use case: intelligent surveillance
[Source: http://youtu.be/I3iDBfB_ZC0 ]
The Problem … and four common features!
§ their data is ordered
• naturally ordered by recency, proximity, etc.
• intrinsically ordered by precision, popularity, provenance, certainty, trust, etc.
• and, in any case, explicitly sortable through attribute values
§ the answers are also required to come in an ordered fashion
• engineers surveying a satellite orbit need to know the largest pieces of debris in closest proximity with maximal certainty, measured with highest precision, etc.
§ they require immediate answers at runtime
• flight paths have to be adapted as soon as an object on a collision course is detected
§ and they require inference
• rich ontological models describing complex domain knowledge are often used to pose the queries and to interpret the results
The Problem – Performance targets
[Figure: answer quality at time t against computation time t, showing the current situation and the desired situation, the max runtime, the level of fully correct answers, and the target of real-time behaviour]
Note: completeness may not be necessary if all relevant answers are found
The Problem – A running example
§ Imagine a system which
• listens to all micro-posts that are published,
• knows the geographic location of social media users,
• can detect the topic of each micro-post, and
• has modelled relationships between topics in an expressive ontological language
§ Suppose that each of us asks such a system a query like the following:
• Which social media users, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the nearby shopping district?
The solution space
[Figure: a grid of types of reasoning (no reasoning, data-driven, query-driven, combinations) against types of orders (no ordering, natural, cheap to enforce, expensive to enforce, combinations), with approximation and parallelisation as a further dimension]
The solution space – No ordering, no reasoning
[Figure: the solution-space grid (types of reasoning against types of orders), highlighting the no ordering / no reasoning cell]
The solution space – No ordering, no reasoning
§ Most of the big data solutions currently on the market
• BSP (Bulk Synchronous Parallel)
• PRAM (Parallel Random Access Machine)
• PGAS (Partitioned Global Address Space)
• Map-Reduce implementations
• and data-centric workflow systems based on them
§ Some (e.g., Hive and Pig) allow the specification of ordering constraints, but provide no specific optimisation for top-k or streaming queries
§ W.r.t. the running example
• Good performance and scalability
• Limited ability to harness orderings
• No inference capability
The solution space – Order-aware data management
[Figure: the solution-space grid, highlighting the order-aware data management region]
The solution space – Order-aware data management
§ When treating massive data, order matters!
§ If N is the size of the input, a problem is considered to be “well-solved” if a streaming algorithm exists which requires at most O(poly(log N)) space and time [31]
[Figure: data as a sortable entity, where orderings can be enforced easily and logically (e.g., order by sortable literals, popularity, uncertainty, trust), feeds streaming algorithms that return the most relevant answers first]
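To make "most relevant answers first" concrete, here is a minimal Python sketch (ours, not from the paper) of a one-pass top-k over a stream of scored items. It keeps a bounded min-heap, so memory is O(k) and each item costs O(log k), regardless of the stream length N. The function name `streaming_top_k` and the (item, score) input shape are assumptions for illustration.

```python
import heapq
from typing import Iterable, List, Tuple


def streaming_top_k(stream: Iterable[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Single pass over (item, score) pairs, keeping only the k best seen so far."""
    heap: List[Tuple[float, str]] = []  # min-heap: the weakest kept answer sits at heap[0]
    for item, score in stream:
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))  # evict the weakest answer, O(log k)
    # Most relevant answers first.
    return [(item, score) for score, item in sorted(heap, reverse=True)]
```

For example, `streaming_top_k([("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)], 2)` keeps only two entries at any time and returns `[("b", 0.9), ("d", 0.7)]`.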
The solution space – Order-aware data management and approximation
§ Approximate streaming algorithms can outperform classical, data-bound approaches to this problem by several orders of magnitude [6,14].
§ Such approximations can be asymptotic, so that arbitrary accuracy can be achieved [6].
[Figure: answer accuracy at computation time t against computation time t, approaching fully correct answers]
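As one classic illustration of an approximate streaming algorithm whose accuracy can be tuned, the sketch below implements the Misra-Gries frequent-items summary. It is our choice of example, not necessarily the technique behind [6,14]: with k counters, each reported count underestimates the true frequency by at most n/(k+1) for a stream of length n, so accuracy improves as more memory is granted. The function name and the topic-counting use case are assumptions.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Approximate item frequencies in one pass, using at most k counters.

    Each estimate satisfies true_count - n / (k + 1) <= estimate <= true_count.
    """
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement all of them and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

On a stream with six "fashion", two "sport" and two "music" posts, `misra_gries(stream, 2)` keeps only `{"fashion": 4}`: the estimate is off by 2, within the bound 10/3.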
The solution space – Harnessing natural orderings
[Figure: the solution-space grid, highlighting natural orderings]
The solution space – Harnessing natural orderings
§ Continuous queries are registered over streams that, in most cases, are observed through windows
§ Assumption: recent information is more relevant, as it describes the current state of a dynamic system
[Figure: input streams (unbounded and time-varying) are observed through windows by a registered continuous query, which produces streams of answers]
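A minimal sketch of the window mechanism, assuming integer timestamps in seconds, non-decreasing arrival order, and a hypothetical continuous query that counts micro-posts per topic. The `range_s`/`step_s` parameters mirror the RANGE/STEP window used by C-SPARQL, but the code is an illustration, not an engine implementation.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, topic); timestamps are non-decreasing


def _evaluate(window: Deque[Event], t: int, range_s: int) -> Counter:
    """Expire the events that slid out of (t - range_s, t] and count the rest per topic."""
    while window and window[0][0] <= t - range_s:
        window.popleft()
    return Counter(topic for _, topic in window)


def sliding_window_counts(events: Iterable[Event], range_s: int,
                          step_s: int) -> Iterator[Tuple[int, Counter]]:
    """Continuous query: re-evaluated every step_s seconds (t = step_s, 2 * step_s, ...).

    Each answer covers the events with t - range_s < timestamp <= t.
    """
    window: Deque[Event] = deque()
    next_eval = step_s
    for ts, topic in events:
        while ts > next_eval:  # this event belongs to a later window: emit earlier answers
            yield next_eval, _evaluate(window, next_eval, range_s)
            next_eval += step_s
        window.append((ts, topic))
    yield next_eval, _evaluate(window, next_eval, range_s)  # last, possibly partial, window
```

With a 10-minute range and a 5-minute step, each event contributes to two consecutive answers and is then forgotten, so memory depends on the window size, not on the stream length.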
The solution space – Harnessing natural orderings
§ The nature of streams requires a paradigmatic change*
• from persistent data
– to be stored and queried on demand
– a.k.a. one-time semantics
• to transient data
– to be consumed on the fly by continuous queries
– a.k.a. continuous semantics
* This paradigmatic change first arose in the DB community [31]
The solution space – Harnessing natural orderings
§ Two types of solutions
• Data Stream Management Systems (DSMS)
• Complex Event Processors (CEP)
§ Research prototypes
• Amazon/Cougar (Cornell) – sensors
• Aurora (Brown/MIT) – sensor monitoring, dataflow
• Gigascope (AT&T Labs) – network monitoring
• Hancock (AT&T) – telecom streams
• Niagara (OGI/Wisconsin) – Internet DBs & XML
• OpenCQ (Georgia) – triggers, view maintenance
• STREAM (Stanford) – general-purpose DSMS
• Stream Mill (UCLA) – power & extensibility
• Tapestry (Xerox) – publish/subscribe filtering
• Telegraph (Berkeley) – adaptive engine for sensors
• Tribeca (Bellcore) – network monitoring
§ High-tech startups
• StreamBase, Coral8, Apama, Truviso
§ Major DBMS vendors are all adding stream extensions as well
• IBM InfoSphere Streams
• Microsoft StreamInsight
• Oracle CEP
The solution space – Harnessing natural orderings
§ DSMSs are optimised for the simplest portion of the query in our running example
• retrieving the micro-posts that have been posted recently
The solution space – Harnessing other types of orders
[Figure: the solution-space grid, highlighting the other types of orders]
The solution space – Harnessing other types of orders
§ W.r.t. the running example, solutions studied in these two areas make it possible to efficiently
• retrieve nearby shops that are discussed by popular social media users.
§ This is a typical top-k query
• a limited number of results k
• ordered by a scoring function
• that combines several criteria
– e.g., nearby and most discussed
The solution space - Harnessing other types of orders Treating order as a first class citizen
§ Traditional query evaluation schema: materialize then sort
[Figure: shops, discussed and social media user are joined (1,000s of intermediate results); all join results are materialized and ordered by proximity of the shop to the issuer and popularity of the social media user; only then is the output limited to K (10s)]
§ Order-aware query evaluation schema: split and interleave
[Figure: social media users ordered by popularity and shops ordered by proximity to the issuer are joined with discussed, so that only 10s of intermediate results flow to the Limit to K (10s) operator, even though an input contains 100,000s of items]
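The traditional materialize-then-sort schema can be sketched as follows, assuming hypothetical inputs: shop proximity and user popularity scores in [0, 1], a list of (user, shop) `discussed` pairs, and the sum of the two scores as the scoring function. Every join result is built and sorted even though only k are returned, which is the cost that the split-and-interleave schema avoids.

```python
from typing import Dict, List, Tuple


def materialize_then_sort(shop_proximity: Dict[str, float],
                          user_popularity: Dict[str, float],
                          discussed: List[Tuple[str, str]],
                          k: int) -> List[Tuple[str, str, float]]:
    """Baseline: build every join result, score it, sort all of them, keep k.

    Assumption: scores lie in [0, 1] and the scoring function is their sum.
    """
    joined = [(user, shop, user_popularity[user] + shop_proximity[shop])
              for user, shop in discussed
              if user in user_popularity and shop in shop_proximity]
    joined.sort(key=lambda row: row[2], reverse=True)  # sorts the whole join result
    return joined[:k]
```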
The solution space - Harnessing other types of orders The split-and-interleave scheme
§ State of the art
• The RDBMS literature (for a survey see [35]) presents the split-and-interleave scheme:
1. Split the evaluation of the scoring function into the evaluation of the single criteria
2. Interleave them with other operators
3. Use partial orders to construct the final order incrementally
§ Standard assumptions:
• Monotone increasing scoring function
• Sorted access for each criterion
• Random access, when possible, is expensive
• No uncertainty in the scores
• No uncertainty in the scoring function
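One well-known algorithm that relies on the first two assumptions is Fagin's Threshold Algorithm (TA); we use it here only to illustrate how sorted access per criterion allows stopping early, not as the specific method of [35]. This sketch assumes scores in [0, 1], the sum as the monotone scoring function, a score of 0 for an object missing from a list, and that random access is available (its variant without random access, NRA, is not shown).

```python
import heapq
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # one criterion: (object, score), sorted by score descending


def threshold_algorithm(lists: List[Ranked], k: int) -> List[Tuple[str, float]]:
    """Fagin's Threshold Algorithm (TA) for the monotone scoring function 'sum'.

    Reads the lists in parallel by sorted access; each newly seen object is fully
    scored by random access to the other lists. Stops as soon as the k-th best
    score reaches the threshold, the best score an unseen object could still get.
    """
    random_access = [dict(lst) for lst in lists]
    scored: Dict[str, float] = {}
    for depth in range(max(len(lst) for lst in lists)):
        threshold = 0.0
        for lst in lists:
            if depth < len(lst):  # an exhausted list gives 0 to every unseen object
                obj, score = lst[depth]
                threshold += score
                if obj not in scored:
                    scored[obj] = sum(ra.get(obj, 0.0) for ra in random_access)
        top = heapq.nlargest(k, scored.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early termination: no unseen object can beat the current top k
    return heapq.nlargest(k, scored.items(), key=lambda kv: kv[1])
```

With popularity `[("u2", 0.9), ("u1", 0.8), ("u3", 0.1)]` and proximity `[("u1", 0.9), ("u3", 0.7), ("u2", 0.2)]`, the top-1 answer u1 is returned after reading only two rows of each list.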
The solution space - Harnessing other types of orders Be aware, it’s a trade-off
[Figure: the trade-off, spanning several orders of magnitude]
NOTE: typically, users are interested in 1 ≤ k ≤ 100
The solution space – Harnessing all types of orders together
[Figure: the solution-space grid, highlighting all types of orders combined]
The solution space – Harnessing all types of orders together
§ W.r.t. the running example, solutions studied in this area make it possible to efficiently
• retrieve the nearby shops that popular social media users are currently posting positively about.
§ This is a typical case of continuous monitoring of top-k queries over sliding windows [45]
§ A very promising and little-explored research area in data management
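For contrast with the research in [45], the naive way to monitor a top-k query over sliding windows is to recompute it from scratch at every slide, as in this sketch. The input shape (one list of shop mentions per window evaluation) and the function name are assumptions; avoiding this full recomputation is precisely what continuous top-k monitoring techniques aim for.

```python
import heapq
from collections import Counter
from typing import List, Sequence, Tuple


def top_k_per_window(windows: Sequence[Sequence[str]], k: int) -> List[List[Tuple[str, int]]]:
    """Naive continuous top-k: recompute the k most-mentioned shops in every window."""
    answers = []
    for mentions in windows:  # one list of shop mentions per window evaluation
        counts = Counter(mentions)  # rebuilt from scratch at each slide
        answers.append(heapq.nlargest(k, counts.items(), key=lambda kv: kv[1]))
    return answers
```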
The solution space – Wrapping up order-aware data management
§ Two parts of the query in the running example remain difficult to express:
• knowing which topics are related to fashion
– requires at least a taxonomy of fashion-related topics
• computing which recent discussions on social media are popular
– requires computing the transitive closure of the discussion
§ Both
• are difficult to model without an expressive ontological language (such as OWL 2) and
• require complex algorithms that an ontology reasoner can handle natively
§ Moreover, order-aware data management techniques do not cope with heterogeneity
• i.e., data must be translated into one common representation before order-aware data management techniques can be applied.
The solution space – Scalable reasoning
[Figure: the solution-space grid, highlighting the scalable reasoning region]
The Solution Space – Scalable Reasoning
§ Why?
• to handle heterogeneity in the input data through ontology-based information integration
§ In the running example,
• ontological background knowledge can be used to model relationships between more specific and more general topics of interest, which can be used to infer which concrete topics are related to fashion
§ How?
• Data-driven methods
– scalable methods available in the state of the art
• Query-driven methods
– a research trend; implementations are appearing
• Combinations of the previous two
– mostly theoretical results
The Solution Space – Scalable Reasoning – Data-driven
§ Ontological language:
• OWL 2 RL
– aimed at applications that require scalable reasoning without sacrificing too much expressive power
– http://www.w3.org/TR/owl2-profiles/#OWL_2_RL
§ Reasoning approach
• Forward chaining (materialization): from asserted data to all possible entailments
§ Pros: low query latency
§ Cons: the actual information need is not taken into account
§ Implementations
• OWLIM, Virtuoso, AllegroGraph, and OntoBroker
§ Research trend
• Parallelization, using Map-Reduce as the main paradigm
– e.g., [33,65] for OWL 2 RL and [32,64,66,38] for fragments thereof
• Applying similar techniques to more expressive fragments of OWL
– e.g., the ELK reasoner for OWL EL [37]
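Data-driven reasoning can be illustrated with a naive forward-chaining materializer that applies two RDFS rules (subClassOf transitivity and type inheritance, known as rdfs11 and rdfs9) until a fixpoint is reached. It is a toy sketch, far from the full OWL 2 RL rule set or the optimized engines listed above; the CURIE-style strings such as `:Sneakers` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUB, TYPE = "rdfs:subClassOf", "rdf:type"


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining: apply the rules until no new triple is derived."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUB}
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        new = {(a, SUB, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: x type C, C subClassOf D  =>  x type D
        new |= {(x, TYPE, d) for x, p, c in closure if p == TYPE
                for c2, d in sub if c == c2}
        if new <= closure:
            return closure  # fixpoint: every entailment is now explicit
        closure |= new
```

Once materialized, asking which topics are fashion-related is a plain lookup over `closure`, which is why query latency is low; the price is paid upfront, whatever the query.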
The Solution Space – Scalable Reasoning – Query-driven
§ Ontological language
• OWL 2 QL
– designed for query answering in LOGSPACE w.r.t. the size of the data, with the expressivity of conceptual models (e.g., UML class diagrams)
– http://www.w3.org/TR/owl2-profiles/#OWL_2_QL
§ Reasoning approach
• Backward chaining: from the query to the asserted facts
• Query rewriting: from an ontological query to a set of SQL queries
§ Pros: the search space is limited by considering the actual query
§ Cons: the number of rewritings can grow exponentially in the size of the query
§ Implementations
• QuOnto, Owlgres, and Requiem
§ Research trend
• Extending query rewriting to more expressive ontology languages
– e.g., Datalog± [27,4]
• Parallelization using Map-Reduce
– e.g., QueryPIE
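Query-driven reasoning can be illustrated by rewriting a query for the instances of a class into the set of classes whose asserted instances answer it, and then evaluating it directly over the unmaterialized data. This sketch only handles atomic subclass axioms (a small fragment of what OWL 2 QL rewriters support), and all names are hypothetical; the resulting set corresponds to a UNION of simple queries.

```python
from typing import List, Set, Tuple

Axiom = Tuple[str, str]  # (subclass, superclass)


def rewrite(query_class: str, axioms: List[Axiom]) -> Set[str]:
    """Classes whose asserted instances are answers to 'x rdf:type query_class'."""
    rewritings = {query_class}
    frontier = [query_class]
    while frontier:  # apply subclass axioms backwards, from the query towards the data
        cls = frontier.pop()
        for sub, sup in axioms:
            if sup == cls and sub not in rewritings:
                rewritings.add(sub)
                frontier.append(sub)
    return rewritings


def answer(query_class: str, axioms: List[Axiom], assertions: List[Tuple[str, str]]) -> Set[str]:
    """Evaluate the rewritten query over the asserted (unmaterialized) data."""
    classes = rewrite(query_class, axioms)
    return {individual for individual, cls in assertions if cls in classes}
```

Only the axioms reachable from the query class are touched; axioms about, say, sport topics never enter the rewriting.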
The Solution Space – Scalable Reasoning – Combinations
§ Ontological language
• subject to research
§ Reasoning approach
• combine the advantages of data- and query-driven approaches
§ State of the art
• Magic Sets technique [1]
§ Recent theoretical results
• for a limited fragment of OWL EL [44]
• for existential rules [4]
The Solution Space – Scalable Reasoning – Approximation
§ Many rule-based systems compute only part of the entailed consequences by employing a set of rules that cannot derive all results
• e.g., Jena, Sesame, OWLIM, and Virtuoso
§ A typical approach is to approximate the input information by restricting it to a simpler ontology language, which is then processed with a more efficient, sound and complete algorithm
• e.g., TrOWL [48] and SCREECH [62]
§ Approximate reasoning is used as a sub-method in many sound and complete reasoners
• e.g., the OWL reasoner HermiT first computes the syntactically told class hierarchy before using more complex algorithms for a complete subsumption check
§ None of the above, however, deals with or takes advantage of orderings of any kind.
§ A number of interesting research challenges thus remain open.
The solution space – Wrap-up of the talk so far
[Figure: the solution-space grid with two labelled regions: scalable reasoning and order-aware data management]
The solution space – Reasoning with streaming algorithms
[Figure: the solution-space grid with the scalable reasoning and order-aware data management regions, plus three new regions: stream reasoning, top-k reasoning and order-aware reasoning]
The solution space – Reasoning with streaming algorithms
[Figure: the same grid, highlighting the stream reasoning region]
The solution space – Stream Reasoning [IEEE-IS2009]
§ W.r.t. the running example, solutions studied in this area make it possible to efficiently
• compute which recent discussions on social media are popular
§ For instance, how many micro-posts discussed (either by replying or by retweeting) my tweet?
[Figure: micro-posts t1 to t8 linked by reply and retweet edges, with discuss relations inferred from them; following discuss transitively gives the answer: 7!]
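The inference behind the example can be sketched as follows, assuming that reply and retweet are sub-properties of a transitive discuss property. The edges used below are hypothetical (the original figure's exact edges are not recoverable); they are chosen so that all of t2 to t8 discuss t1, giving 7.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumption: reply and retweet are sub-properties of discuss, and discuss is transitive.
DISCUSS_SUBPROPERTIES = {"reply", "retweet", "discuss"}


def discussants(edges: List[Tuple[str, str, str]], post: str) -> Set[str]:
    """All micro-posts that discuss `post`, directly or transitively."""
    discussed_by: Dict[str, Set[str]] = defaultdict(set)
    for src, prop, dst in edges:
        if prop in DISCUSS_SUBPROPERTIES:  # sub-property inference: src discusses dst
            discussed_by[dst].add(src)
    found: Set[str] = set()
    stack = [post]
    while stack:  # depth-first traversal = transitive closure starting from `post`
        for src in discussed_by[stack.pop()]:
            if src not in found:
                found.add(src)
                stack.append(src)
    return found
```

With the hypothetical edges t2 reply t1, t3 retweet t1, t4 reply t2, t5 reply t3, t6 reply t4, t7 retweet t4 and t8 reply t5, `len(discussants(edges, "t1"))` is 7.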
The solution space – Stream Reasoning features
[Table: features compared across what traditional data processing offers, what stream processing offers, what automatic reasoning offers, and what stream reasoning aims at. Features: processing streams; handling large datasets; reactivity (real-time); expressing fine-grained queries; capturing knowledge; access to persistent data]
The solution space – Stream Reasoning definition
§ Making sense [IEEE-IS2010]
• in real time
• of multiple, heterogeneous, gigantic and inevitably noisy data streams
• in order to support the decision process of extremely large numbers of concurrent users
§ Note: making sense of streams necessarily requires processing them against rich background knowledge, an unsolved problem in databases
The solution space – Architecture of a Stream Reasoner
§ Continuous reasoning tasks are registered over streams that, in most cases, are observed through windows
[Figure: input streams are observed through windows by registered continuous reasoning tasks, which produce streams of answers]
The solution space – Stream Reasoning: PoliMi’s Achievements
§ RDF Stream data type [WWW2009]
• (virtually) represents heterogeneous data streams
§ C-SPARQL query language [WWW2009]
• expresses fine-grained continuous queries
• is “compiled down” to keep performance high
§ Incremental RDFS++ reasoning [ESWC2010]
• allows for the exploitation of domain knowledge
§ C-SPARQL Engine [EDBT2010]
• fully operational prototype
• deployed in award-winning applications (e.g., Bottari [JWS2012])
The solution space – Stream Reasoning: PoliMi’s Achievements
[Figure: the solution-space grid with the scalable reasoning and order-aware data management regions]
The solution space – Stream Reasoning “alla PoliMi” RDF Stream
§ RDF Stream data type
• an ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp
§ Timestamps are not required to be unique, but they must be non-decreasing
§ E.g.,
(<:Alice :posts :post1>, 2010-02-12T13:34:41)
(<:post1 :talksAboutPositively :LaScala>, 2010-02-12T13:34:41)
(<:Bob :posts :post2>, 2010-02-12T13:36:28)
(<:post2 :talksAboutNegatively :Duomo>, 2010-02-12T13:36:28)
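A minimal Python model of the RDF stream data type, assuming triples as plain string tuples and `datetime` timestamps. The class and function names are hypothetical; the check encodes the two constraints stated above: a timestamp may repeat, but it may never decrease.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]


class StreamItem(NamedTuple):
    triple: Triple
    timestamp: datetime


def rdf_stream(items: Iterable[StreamItem]) -> Iterator[StreamItem]:
    """Pass items through, enforcing non-decreasing (but not unique) timestamps."""
    last: Optional[datetime] = None
    for item in items:
        if last is not None and item.timestamp < last:
            raise ValueError(f"timestamp {item.timestamp} precedes {last}")
        last = item.timestamp
        yield item
```

The four pairs of the example pass through unchanged, even though they share only two distinct timestamps; feeding them in reverse order raises `ValueError`.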
MEMO: SPARQL
The solution space – Stream Reasoning “alla PoliMi” Where C-SPARQL Extends SPARQL
The solution space – Stream Reasoning “alla PoliMi” An Example of C-SPARQL Query
Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
WHERE {
?opinionMaker ?opinion ?resource .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?resource.
FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
&& ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
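§ C-SPARQL extensions to SPARQL used in this query:
• query registration (for continuous execution): REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
• the FROM STREAM clause
• the window: [RANGE 30m STEP 5m]
• RDF stream added as a new output format
• a built-in to access timestamps: cs:timestamp
• aggregates as in SPARQL 1.1: HAVING ( COUNT(DISTINCT ?follower) > 3 )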
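To clarify what the OpinionMakers query computes over a single window, here is a plain-Python sketch. It makes assumptions the query leaves implicit: results are grouped by (?opinionMaker, ?resource), `cs:timestamp(?follower)` is read as the timestamp of the follower's own opinion triple, and `sioc:follows` triples are not themselves treated as opinions. The names are hypothetical, and this is not how the C-SPARQL Engine evaluates the query.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
TimedTriple = Tuple[Triple, int]  # (triple, timestamp), as in one RDF stream window

NON_OPINIONS = {"sioc:follows", "sd:accesses"}  # assumption: predicates that are not opinions


def opinion_makers(window: List[TimedTriple], min_followers: int = 3) -> Set[Tuple[str, str]]:
    """One evaluation of the OpinionMakers pattern over the triples of one window."""
    follows = {(s, o) for (s, p, o), _ in window if p == "sioc:follows"}
    opinions = [(s, p, o, t) for (s, p, o), t in window if p not in NON_OPINIONS]
    followers: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for maker, opinion, resource, t_maker in opinions:
        for follower, opinion2, resource2, t_follower in opinions:
            # Same opinion on the same resource, expressed later by someone who follows maker.
            if (opinion2 == opinion and resource2 == resource
                    and (follower, maker) in follows and t_follower > t_maker):
                followers[(maker, resource)].add(follower)
    # HAVING ( COUNT(DISTINCT ?follower) > 3 )
    return {key for key, fs in followers.items() if len(fs) > min_followers}
```

If four followers of alice like LaScala after she does, the window yields `{("alice", "LaScala")}`; with only three such followers it yields nothing, matching the strict `> 3`.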
The solution space – Stream Reasoning “alla PoliMi” Efficiency of C-SPARQL Query Evaluation
§ Window-based selection in C-SPARQL outperforms standard FILTER-based selection
The solution space – Stream Reasoning “alla PoliMi” Efficiency of C-SPARQL Query Evaluation
§ The C-SPARQL algebra allows filters and projections to be pushed down
The solution space – Stream Reasoning “alla PoliMi” High Throughputs of C-SPARQL Engine
The solution space – Stream Reasoning “alla PoliMi” Incremental Materialization evaluation
§ baseline: re-computing the materialization from scratch
§ state of the art (incremental maintenance of materialized views)
§ PoliMi’s incremental stream approach [ESWC2010]
[Chart: the three approaches compared as a function of the % of the materialization changed when the window slides]
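The idea of maintaining a materialization incrementally as the window slides can be sketched as follows. This is a hypothetical simplification, not the [ESWC2010] algorithm: the class hierarchy is static background knowledge, already transitively closed; each streamed rdf:type fact carries the time at which it leaves the window; and derived facts inherit that expiry, so a slide only deletes expired facts instead of recomputing the closure.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str]  # (individual, class), i.e. individual rdf:type class


class IncrementalTypeMaterializer:
    """Keeps the rdf:type closure of a window up to date as the window slides."""

    def __init__(self, superclasses: Dict[str, Set[str]]):
        self.superclasses = superclasses  # static, transitively closed hierarchy
        self.expires: Dict[Fact, int] = {}  # every fact in the closure -> expiry time

    def add(self, individual: str, cls: str, expires_at: int) -> None:
        # Derive the entailed types once, on insertion, tagging them with the expiry.
        for c in {cls} | self.superclasses.get(cls, set()):
            fact = (individual, c)
            # A fact supported by several streamed facts lives as long as the longest-lived one.
            self.expires[fact] = max(self.expires.get(fact, expires_at), expires_at)

    def slide(self, now: int) -> None:
        # Deletion is a scan for expired tags: nothing is re-derived.
        for fact in [f for f, t in self.expires.items() if t <= now]:
            del self.expires[fact]

    def facts(self) -> Set[Fact]:
        return set(self.expires)
```

When post1 is typed both :Shoes (expiring at 10) and :Dress (expiring at 20), the derived :Fashion fact survives the first slide and disappears only at the second.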
The solution space – Stream Reasoning “alla PoliMi” Incremental Maintenance and Query Latency
§ comparison of the average time needed to answer a C-SPARQL query using
• a backward reasoner
• the naive approach of re-computing the materialization
• PoliMi’s incremental-stream approach
[Chart: average time in ms]
• backward reasoning: query 5.82, materialization 0
• naive approach: query 1.61, materialization 15.91
• incremental-stream: query 1.61, materialization 0.28
The solution space – Stream Reasoning: Community Achievements
§ RDF Stream data type
• adopted by most of the research groups active on stream reasoning
• an alternative solution, based on two timestamps, is used in eTalis
§ Continuous query language
• C-SPARQL was extended by the community
• alternative solutions have been studied
– without the FROM STREAM clause [CQELS]
– oriented to complex event processing [2]
§ Reasoning
• data-driven for RDFS++ [ESWC2010]
• goal-driven for temporal logics (eTalis) [2]
• time-decaying logic programs [26]
• inductive reasoning [IEEE-IS2010]
§ Implementation experiences
• C-SPARQL Engine
• eTalis / EP-SPARQL
• CQELS
• S2R
The solution space – Stream Reasoning: next steps
§ Scientific
• notions of soundness and completeness
• more expressive reasoning
– with minor loss in throughput
– and predictable loss in scalability
• dealing with incomplete & noisy data
• parallelization and distribution of the processing
§ Technical
• prove effectiveness and efficacy in specific application domains
• better integrate continuous semantics with Linked Data
• design and develop a software framework to simplify stream reasoning application development
§ Organizational
• standardize RDF Stream, C-SPARQL, Streaming Linked Data, etc.
The solution space – Wrap-up of Stream Reasoning
[Figure: the solution-space grid with the scalable reasoning, order-aware data management and stream reasoning regions]
The solution space – Top-k reasoning
[Figure: the solution-space grid with the scalable reasoning, order-aware data management, stream reasoning and top-k reasoning regions]
The solution space – Top-k reasoning approach
§ In traditional reasoning, ranking the results is normally considered a task that makes scaling inference to massive data sets even more hopeless
§ Top-k reasoning should, instead, move beyond this common practice and interleave ordering and reasoning
§ W.r.t. the running example, top-k reasoning should make it possible to efficiently
• compute the top-k social media users who are well known to lead discussions on fashion-related topics and are closest to the requester’s current location.
The solution space – Top-k reasoning attempts
§ SoftFacts [60]
• an ontology-mediated top-k information retrieval system over relational databases
§ SPARQL-Rank [13]
• adds order to the SPARQL algebra as a first-class citizen and experimentally shows the performance gain
§ AnQL [41]
• extends SPARQL to query RDFS annotated with elements of a bounded lattice (and thus comes with a partial ordering)
§ Notion of the exact top-k closure of an ontology w.r.t. a query and a scoring function [53]
The solution space – Top-k queries in SPARQL 1.1
§ Retrieve the best 10 offers, ordered by a function of the user ratings of the product and the offer price:
SELECT ?product ?offer (g1(?avgRat1) + g2(?avgRat2) + g3(?price) AS ?score)
WHERE {
?product hasAvgRat1 ?avgRat1 .
?product hasAvgRat2 ?avgRat2 .
?product hasName ?name .
?product hasOffers ?offer .
?offer hasPrice ?price
}
ORDER BY DESC(?score)
LIMIT 10
§ Slow: tens of seconds on 5M triples (could be improved to milliseconds)
The solution space - Top-k queries in SPARQL 1.1 Challenges
§ Adapting SQL optimizations to SPARQL is not straightforward:
• different algebra
• different cost of data access in native RDF triplestores
– sorted access is slow, random access is fast
• additional optimization dimensions
– pushing the evaluation of BGPs into the storage
§ Research tasks
• a new algebra for SPARQL where order is a first-class citizen
• new algorithms, and
• optimization techniques
The solution space - Top-k queries in SPARQL 1.1 The SPARQL-Rank algebra
§ Extends the standard SPARQL algebra
§ Ranked set of mappings: a set of mappings augmented with an order relation
[Figure: extended operators and new equivalences]
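The solution space – SPARQL-Rank algebra The new Rank Operator
§ Scoring function: F(p1, p2) = ?p1 + ?p2
Ω:

| | ?x | ?y | ?p1 | ?p2 |
|---|---|---|---|---|
| µ1 | 1 | 8 | 0.8 | 0.8 |
| µ2 | 3 | 3 | 0.3 | 0.6 |
| µ3 | 3 | 4 | 0.4 | 0.6 |

ρp1(Ω):

| | ?x | ?y | ?p1 | ?p2 | Fp1 |
|---|---|---|---|---|---|
| µ1 | 1 | 8 | 0.8 | 0.8 | 1.8 |
| µ3 | 3 | 4 | 0.4 | 0.6 | 1.4 |
| µ2 | 3 | 3 | 0.3 | 0.6 | 1.3 |

§ ρp1 orders the mappings by Fp1, an upper bound of F in which only p1 has been evaluated and the not-yet-evaluated p2 takes its maximum possible value, 1 (e.g., for µ1, 0.8 + 1 = 1.8)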
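The rank operator's upper-bound computation can be sketched in Python, assuming that F is a sum of predicates, each with maximum value 1, and that mappings are dictionaries keyed by variable name without the leading `?`. The function names are hypothetical.

```python
from typing import Dict, List, Set

Mapping = Dict[str, float]  # a solution mapping: variable name -> value


def upper_bound(mu: Mapping, evaluated: Set[str], predicates: List[str]) -> float:
    """F_P for F = sum of predicates: evaluated predicates contribute their value,
    not-yet-evaluated ones their maximum possible value, 1.0."""
    return sum(mu[p] if p in evaluated else 1.0 for p in predicates)


def rank(omega: List[Mapping], evaluated: Set[str], predicates: List[str]) -> List[Mapping]:
    """rho_P: order the mappings by decreasing upper bound F_P."""
    return sorted(omega, key=lambda mu: upper_bound(mu, evaluated, predicates), reverse=True)
```

Applied to Ω with `evaluated={"p1"}`, this reproduces the order µ1, µ3, µ2 with bounds 1.8, 1.4 and 1.3.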
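The solution space – SPARQL-Rank algebra The redefined Join Operator
Ωp1:

| | ?x | ?y | ?p1 | ?p2 | Fp1 |
|---|---|---|---|---|---|
| µ1 | 1 | 8 | 0.8 | 0.8 | 1.8 |
| µ3 | 3 | 4 | 0.4 | 0.6 | 1.4 |
| µ2 | 3 | 3 | 0.3 | 0.6 | 1.3 |

Ω’p2:

| | ?x | ?z | ?p2 | Fp2 |
|---|---|---|---|---|
| µ4 | 1 | 9 | 0.8 | 1.8 |
| µ5 | 3 | 0 | 0.6 | 1.6 |

Ωp1 ⋈ Ω’p2:

| | ?x | ?y | ?z | ?p1 | ?p2 | Fp1∪p2 |
|---|---|---|---|---|---|---|
| µ1 ∪ µ4 | 1 | 8 | 9 | 0.8 | 0.8 | 1.6 |
| µ3 ∪ µ5 | 3 | 4 | 0 | 0.4 | 0.6 | 1.0 |
| µ2 ∪ µ5 | 3 | 3 | 0 | 0.3 | 0.6 | 0.9 |

§ Once both p1 and p2 are evaluated, Fp1∪p2 is the exact score (e.g., for µ1 ∪ µ4, 0.8 + 0.8 = 1.6)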
The solution space – SPARQL-Rank algebra Rank Join Algorithms
[Figure: (a) RankJoin over two inputs with sorted access; (b) RankSequence over one input with sorted access and one with random access; (c) RA-RankJoin over inputs offering both sorted and random access]
§ Different algorithms, depending on the type of access available on the inputs:
• Hash Rank-Join
– e.g., HRJN [Ilyas2004]
• Random Access Rank-Join
– e.g., RA-HRJN [Ilyas2004]
• RankSequence (e.g., RSEQ), NEW [ISWC2012]
– minimum sorted access
– leverages random access
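A compact sketch in the style of HRJN [Ilyas2004], assuming two inputs with sorted access only, each sorted by descending score and joined on a single key, with F = the sum of the two scores. It alternates between the inputs, probes the hash table of the other side, and emits a join result only once its score reaches the threshold that bounds every result not produced yet. The names are hypothetical and details differ from the published algorithm (e.g., no adaptive choice of which input to pull).

```python
import heapq
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (join key, score); each input is sorted by score, descending


def _threshold(inputs: Tuple[List[Row], List[Row]], pos: List[int]) -> float:
    """Upper bound on the score of any join result not produced yet."""
    bounds = []
    for i in (0, 1):
        j = 1 - i
        if pos[i] < len(inputs[i]) and inputs[j]:  # side i still has unread rows
            last_i = inputs[i][max(pos[i] - 1, 0)][1]  # last score read (top if none yet)
            bounds.append(last_i + inputs[j][0][1])  # unread row of i with best row of j
    return max(bounds, default=float("-inf"))


def hash_rank_join(left: List[Row], right: List[Row], k: int) -> List[Row]:
    """Top-k join results for F = left score + right score, pulled incrementally."""
    inputs = (left, right)
    seen: Tuple[Dict[str, List[float]], Dict[str, List[float]]] = ({}, {})
    pos = [0, 0]
    candidates: List[Tuple[float, str]] = []  # heap on -score, i.e. best result first
    out: List[Row] = []
    side = 0
    while len(out) < k and any(pos[i] < len(inputs[i]) for i in (0, 1)):
        if pos[side] >= len(inputs[side]):
            side = 1 - side  # this input is exhausted: read from the other one
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        for other in seen[1 - side].get(key, []):  # probe the other side's hash table
            heapq.heappush(candidates, (-(score + other), key))
        seen[side].setdefault(key, []).append(score)
        side = 1 - side  # alternate between the inputs
        threshold = _threshold(inputs, pos)
        while candidates and len(out) < k and -candidates[0][0] >= threshold:
            neg_score, key = heapq.heappop(candidates)
            out.append((key, -neg_score))
    return out
```

On the inputs used in the checks, the best result (key b, score 1.8) is emitted after only four sorted accesses, before either input is fully read.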
The solution space – SPARQL-Rank algebra The new Algebraic Equivalences
[Figure: the Split equivalence]
The solution space – SPARQL-Rank algebra The new Algebraic Equivalences
[Figure: the Interleave equivalence]
The solution space – SPARQL-Rank algebra Planning Strategies
§ Apply the algebraic equivalences
§ Result: three possible strategies
1. Rank of BGPs
2. Interleaved
3. Rank Join
The solution space – SPARQL-Rank algebra Planning Strategies: rank of BGPs (ROB)
§ Substitute the monolithic scoring function with a number of incremental rank operators (ρ)
[Figure: query plans for the top-10 query. Baseline: EXTEND [?score = g1(?a1)+g2(?a2)+g3(?p1)], then ORDER [?score], then SLICE [0,10] over the whole pattern ?pr hasA1 ?a1 . ?pr hasA2 ?a2 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP ?p1. Rank-of-BGPs plans instead apply the rank operators g1(?a1) and g3(?p1) incrementally, over a seqScan or over an orderScan on ?a1, before SLICE [0,10]]
§ Separate the patterns into two groups:
• triple patterns that influence the ranking
• triple patterns that do not influence the ranking
[Figure: the same query plans as on the Rank of BGPs (ROB) slide]
The solution space – SPARQL-Rank algebra – Planning Strategies: Interleaved (INTER)
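One way to form the two groups is sketched below, assuming that triple patterns are (subject, predicate, object) tuples and that the ranking criteria are known by the variables they read (here ?a1, ?a2 and ?p1 from g1(?a1)+g2(?a2)+g3(?p1)). The linking rule is a hypothetical choice: it keeps ?pr hasO ?of with the ranked patterns because it connects ?of hasP ?p1 to the join variable ?pr, which matches the grouping in the plan with the Sequence operator, where ?pr hasN ?n is the only pattern joined separately.

```python
def split_bgp(bgp, ranking_vars, join_var):
    """Partition triple patterns into (influencing the ranking, not influencing it).

    Hypothetical rule: a pattern influences the ranking if it mentions a ranking
    variable, or a variable (other than join_var) shared with such a pattern.
    """
    linked = set(ranking_vars)
    influencing = []
    changed = True
    while changed:  # grow the influencing group until it reaches a fixpoint
        changed = False
        for tp in bgp:
            if tp not in influencing and linked & set(tp):
                influencing.append(tp)
                linked |= {t for t in tp if t.startswith("?") and t != join_var}
                changed = True
    others = [tp for tp in bgp if tp not in influencing]
    return influencing, others


bgp = [
    ("?pr", "hasA1", "?a1"),
    ("?pr", "hasA2", "?a2"),
    ("?pr", "hasN", "?n"),
    ("?pr", "hasO", "?of"),
    ("?of", "hasP", "?p1"),
]
ranked, unranked = split_bgp(bgp, {"?a1", "?a2", "?p1"}, join_var="?pr")
print(ranked)
print(unranked)
```

Excluding the join variable matters: since every group shares ?pr, treating it as a link would pull all patterns into the ranked group.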
§ Split the BGP into one pattern for each ranking criterion
§ Use the most appropriate join operator based on the type of access available
The solution space – SPARQL-Rank algebra – Planning Strategies: Rank-Join (RJ)
[Figure: (a) baseline plan: EXTEND ?score = g1(?a1)+g2(?a2)+g3(?p1) over ?pr hasA1 ?a1 . ?pr hasA2 ?a2 . ?pr hasN ?n . ?pr hasO ?of . ?of hasP ?p1, then ORDER by ?score and SLICE [0,10]; (b) rank-join plan: the inputs ?pr hasO ?of . ?of hasP ?p1 . ?pr hasA1 ?a1 (ranked by g3(?p1) and g1(?a1)), ?pr hasA2 ?a2 (ranked by g2(?a2)) and ?pr hasN ?n are combined on ?pr = ?pr by RankJoin and Join operators, followed by SLICE [0,10]]
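The choice of join by access type can be written as a small planning rule. The mapping below is a hypothetical reading of the rank-join algorithms listed earlier in the talk (HRJN when both inputs offer sorted access, RA-HRJN when both also offer random access, RSEQ when one input is read by sorted access and the other by random access); Access and choose_join are illustrative names, not the actual ARQ-Rank planner.

```python
from enum import Flag, auto


class Access(Flag):
    SORTED = auto()  # the input can be read in descending score order
    RANDOM = auto()  # the input can be probed by join key


def choose_join(left: Access, right: Access) -> str:
    """Hypothetical rule picking a join operator from the access each input offers."""
    common = left & right
    if Access.SORTED in common and Access.RANDOM in common:
        return "RA-HRJN"
    if Access.SORTED in common:
        return "HRJN"
    if (Access.SORTED in left and Access.RANDOM in right) or (
        Access.SORTED in right and Access.RANDOM in left
    ):
        return "RSEQ"  # sorted access on one side, random-access probes on the other
    return "JOIN"      # no rank-aware join applies: plain join, then ORDER


print(choose_join(Access.SORTED, Access.RANDOM))  # RSEQ
```

A real planner would also weigh cardinalities and index costs; the rule only captures which operators are applicable.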
§ Example query over a 5M-triple dataset
§ Assumption: sorted-access indexes are available
The solution space – SPARQL-Rank algebra – Experimental evidence of performance improvements
[Figure: results for the example query, annotated "Two orders of magnitude better"]
§ Benchmark: 8 queries on an extension of BSBM
The solution space – SPARQL-Rank algebra – Experimental evidence of performance improvements
The solution space – Wrap-up of Top-k Reasoning
[Figure: map of scalable reasoning. Types of reasoning (no reasoning, data-driven, query-driven, combinations) against types of orders (no ordering, natural, cheap to enforce, expensive to enforce, combinations); labelled areas: order-aware data management, stream reasoning, top-k reasoning]
The solution space – Full-fledged Order-aware reasoning
[Figure: map of scalable reasoning. Types of reasoning (no reasoning, data-driven, query-driven, combinations) against types of orders (no ordering, natural, cheap to enforce, expensive to enforce, combinations); labelled areas: order-aware data management, stream reasoning, top-k reasoning, order-aware reasoning]
The solution space – Full-fledged Order-aware reasoning
§ In full-fledged order-aware reasoning, data- and query-driven inference methods have to deal with combinations of natural, cheap-to-enforce and expensive-to-enforce types of orders:
• the naive assumption that orderings are independent would have to be relaxed
• theories and methods have to be rethought so that they exploit the mutual relationships between the three types of orders
§ Considering our running example, methods implementing order-aware reasoning are the only ones able to answer the query:
• Which users of social media, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the shopping district nearby?
The solution space – Full-fledged Order-aware reasoning
§ State of the art
• None
§ Promising work
• The Answer Set Programming (ASP) community has recently proposed a streaming algorithm for ASP [25] that
1. ranks the constants referring to domain elements, and
2. fetches them, increasing the domain size until an answer set is found.
§ Challenges
• a theoretical framework that unifies and generalises those defined for stream reasoning and top-k reasoning
• designing and testing scalable data- and query-driven methods that allow for efficient answering of queries that involve all types of orders
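Point 2 of the ASP approach amounts to an iterative-deepening loop over a ranked domain. The sketch below assumes a hypothetical solve callback standing in for a call to an ASP solver restricted to the given constants, and uses a toy problem (two constants summing to 10) instead of a real ASP program; it illustrates the idea only and is not the algorithm of [25].

```python
def ranked_domain_search(ranked_constants, solve, step=1):
    """Grow the domain along the ranking until `solve` returns an answer.

    Returns (domain size used, answer), or (len(ranked_constants), None) if the
    whole domain has been tried without success.
    """
    size = 0
    while size < len(ranked_constants):
        size = min(size + step, len(ranked_constants))
        answer = solve(ranked_constants[:size])  # only the top-`size` constants
        if answer is not None:
            return size, answer
    return len(ranked_constants), None


def toy_solve(domain):
    """Hypothetical stand-in for an ASP solver: find two constants summing to 10."""
    for i, x in enumerate(domain):
        for y in domain[i + 1:]:
            if x + y == 10:
                return {x, y}
    return None


print(ranked_domain_search([7, 1, 9, 3, 5], toy_solve))  # (3, {1, 9})
```

Because constants are tried in rank order, an answer is found after fetching only three of the five constants; a larger `step` trades fewer solver calls for larger domains per call.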
The solution space – Wrap-up of Top-k Reasoning
[Figure: map of scalable reasoning. Types of reasoning (no reasoning, data-driven, query-driven, combinations) against types of orders (no ordering, natural, cheap to enforce, expensive to enforce, combinations); labelled areas: order-aware data management, stream reasoning, top-k reasoning, order-aware reasoning]
References – My papers
[IEEE-IS2009] E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel. It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009).
[EDBT2010] D.F. Barbieri, D. Braga, S. Ceri, M. Grossniklaus. An Execution Environment for C-SPARQL Queries. EDBT 2010.
[WWW2009] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062.
[IEEE-IS2010] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser. Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010.
[JWS2012] M. Balduini, I. Celino, E. Della Valle, D. Dell'Aglio, Y. Huang, T. Lee, S. Kim, V. Tresp. BOTTARI: an Augmented Reality Mobile Application to deliver Personalized and Location-based Recommendations by Continuous Analysis of Social Media Streams. JWS 2012. In press.
[ESWC2010] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. Incremental Reasoning on Streams and Rich Background Knowledge. ESWC 2010.
[SWJ2012] E. Della Valle, S. Schlobach, M. Krötzsch, A. Bozzon, S. Ceri, I. Horrocks. Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data. SWJ. In press.
[ISWC2012] S. Magliacane, A. Bozzon, E. Della Valle. Efficient Execution of Top-k SPARQL Queries. ISWC 2012. In press.
Downloads
§ C-SPARQL Engine (no reasoning support)
• A ready-to-go package for Eclipse
– http://streamreasoning.org/download
• Source code available on request
§ SPARQL-Rank Engine (ARQ-Rank)
• Source code and experimental data
– http://sparqlrank.search-computing.org/
Thank You!
Keep an eye on http://www.streamreasoning.org – there's much more to come!
Any questions? [email protected]