Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data

Editor(s): Pascal Hitzler, Kno.e.sis Center, Wright State University, Dayton, OH, USA; Krzysztof Janowicz, University of California, Santa Barbara, USA
Solicited review(s): Alessandra Mileo, DERI, National University of Ireland, Galway; Denny Vrandečić, Wikimedia Deutschland e.V., Germany; David Carral Martínez, Kno.e.sis Center, Wright State University, Dayton, OH, USA

Emanuele Della Valle a, Stefan Schlobach b, Markus Krötzsch c, Alessandro Bozzon a, Stefano Ceri a, Ian Horrocks c

a DEI, Politecnico di Milano
b Vrije Universiteit Amsterdam
c University of Oxford

Abstract. More and more applications require real-time processing of massive, dynamically generated, ordered data; order is an essential factor as it reflects recency or relevance. Semantic technologies risk being unable to meet the needs of such applications, as they are not equipped with the appropriate instruments for answering queries over massive, highly dynamic, ordered data sets. In this vision paper, we argue that some data management techniques should be exported to the context of semantic technologies, by integrating ordering with reasoning, and by using methods which are inspired by stream and rank-aware data management. We systematically explore the problem space, and point both to problems which have been successfully approached and to problems which still need fundamental research, in an attempt to stimulate and guide a paradigm shift in semantic technologies.

Keywords: Massive data, inference, ordering, streaming algorithms

1. Introduction

Data is massively produced and published at a speed which exceeds by far our current methods and infrastructure for processing it. Science and Engineering have become more and more data-driven: an environmental study of the earth atmosphere using digital telescopes requires collecting streams of measurements; the smooth pathway of satellites through space critically depends on the availability and analysis of detailed information about tiny objects in a flight path through hundreds of kilometers of space; a single simulation of an airplane engine easily produces terabytes of simulation results.

These examples have a common feature: their data is ordered. In some cases, data is naturally ordered by recency. Other data is intrinsically ordered, e.g., by precision, popularity, provenance, certainty, trust. In any case, data is explicitly sortable through attribute values such as latitude, longitude, object size, user-provided ratings, or frequency. Multiple orderings are simultaneously present for almost every available piece of information. Most answers are also required to come in an ordered fashion; for instance, engineers surveying a satellite orbit need to know the largest pieces of debris in closest proximity with maximal certainty, measured with highest precision; social scientists studying the Web want to study the most influential blogs, or the most recent tweets closest to a particular point of interest.

Another common property of the described problems is their time-critical character, requiring immediate answers at runtime: scientists and engineers want to adapt their expensive and complex experiments and simulations while running them based on analysis of incoming results, e.g., flight paths have to be adapted once an object on a collision course is detected, and companies need to know the effects of their commercial campaigns immediately. There is an immense need for reactive tools that critically depend on such runtime solutions.

0000-0000/0-1900/$00.00 © 0 – IOS Press and the authors. All rights reserved

Finally, all these problems require inference. For instance, engineers need to identify complex modelling errors in simulations and classify them by severity; objects in the flight path of a satellite need to be sorted according to their type, material, size, etc. Semantic applications must deal at the same time with rich ontological models describing complex domain knowledge, and highly dynamic data representing recent or relevant information, as produced by streaming or search-enabled data sources. State-of-the-art semantic technologies do not consider ordering as an essential property. Ranking results is often seen as an “added task,” performed after inference, without affecting the inference process, which is order-agnostic. As a result, semantic technologies cannot provide reactive and reliable query answering over such massive datasets that integrate highly dynamic sources; they do not scale to massive ordered data, and therefore cannot be used in these problems and contexts.

However, the data management community has shown that the intrinsic “sorted” nature of data sources can be considered as an opportunity for building efficient data processing techniques, by harnessing ordering before processing, or by exploiting ordering within processing. Harnessing and exploiting orderings in reasoning gives the opportunity for significantly scaling up inferencing. We need a foundational theory, new generic methods and concrete algorithms for reasoning in the presence of orderings. This vision paper offers a systematic study of the solution space and of the challenges, both solved and unsolved, that the semantic community should face, taking advantage of methods which were defined in the data management context. We will identify classes of problems, describe prototypical examples, and indicate applicable approaches for each class. We will also identify the open challenges for the most difficult problems.

In principle the methods we propose are complete, i.e., return the correct answer. However, streaming algorithms lend themselves very naturally to approaches returning partial and approximate answers, with increasing quality over time. Progress in the context of reasoning over massive dynamic data can thus mean two things: runtime performance and quality performance. To make this more concrete, let real-time indicate the minimal time required to meet operational deadlines from event occurrence to system response [10]. Then, the behaviour of a system should be targeted towards exploiting all the inferences that can be performed in real-time, and the target is to reduce the runtime to real-time while retaining an acceptable level of answer quality. These concepts are visualized in Fig. 1.

Fig. 1. Runtime performance and answer quality

In summary, the class of problems that can benefit from a tight integration of orderings and reasoning has the following properties:

– Data is massive.
– Data is ordered.
– Data can be incomplete, heterogeneous and noisy.
– Applications are time sensitive.
– Applications require inference.
– The analytical tasks require ordered answers.

In the remainder of the paper, we describe four concrete examples of time-sensitive applications that need to process massive sets of highly dynamic data (Section 2). We identify the problem and solution space that need to be investigated to implement our vision, and we argue that several areas in the problem space can only be addressed by making ordering a first-class citizen in reasoning (Section 3). The core of the paper is (1) a systematic exploration of the solution space that points both to problems that have been successfully approached and to problems that still need fundamental research (Section 4), and (2) a discussion about the important role of approximation and parallelism in such space (Section 5).

2. Examples of applications

Hereafter, we present four concrete application problems for which order-aware reasoning can significantly boost scalability: three of them have their origin in specific projects with industrial partners, the fourth is more general and will be used as a running example throughout the paper to explain our vision and ideas.

2.1. Space Situational Awareness

Since the start of human excursions into space, the amount of debris left in the atmosphere has exponentially increased. Typical examples of such debris are decommissioned satellites and parts of transportation rockets, but also tools lost by astronauts and, most often, wreckage from explosions or collisions in orbit. For satellites circling earth, collision with space debris has turned into a serious risk.¹ Space situational awareness is the problem of detecting these objects by monitoring space using networks of radar and telescopic installations, and combining those results with existing debris databases. Those installations provide massive streams of measurements that are time- and space-bound, and can be ordered by reliability of the tools and precision of the observations.

Given that the task consists in finding debris in a restricted space and time window, and that human decisions strongly depend on a system’s confidence, this is a typical target problem where order-aware reasoning makes a difference. For example, real-time analysis of the information in those streams could make it possible to adapt the flight path of a satellite to avoid collision, or even to shoot down debris.

2.2. Jet Engine Design

The construction of jet engines heavily depends on simulation. During those simulations, terabytes of data about flow fields, pressure, etc. are produced. In the current routine, such a simulation is performed in high performance computer centres and can last up to months. In a separate step, the data is analysed for design errors, e.g., regarding the distance between rotor and engine boundary. Visualisation is used to detect deformations of rotor or stator, and to derive novel and more favourable design parameters. Given the complexity of running the simulations, reactive adaptation of those parameters according to real-time analytics would be desirable.

Both analysis and visualisation are done offline and decoupled from the simulation, as analytic inference and data selection cannot yet be done efficiently enough for real-time processing. However, engineers perceive the need for effective analysis and visualisation of intermediate simulation results as soon as they are produced, and inference becomes a critical computational bottleneck. More concretely, tools are required that order intermediate results according to their importance for the visualisation process and their analytic importance (e.g., mesh quality problems). By dealing with simulation results at the time they are produced, jet engine design will be turned into a reactive process, which will save critical experimental time and efforts.

¹ http://www.space.com/11314-space-junk-satellite-collision-air-force.html

2.3. Intelligent Surveillance

Surveillance is the monitoring of behaviour, activities, or other information about groups of people. It is, e.g., employed to ensure the safety of workers on the factory floor, to detect crimes occurring in indoor or outdoor settings, or to monitor the flow of large crowds through public spaces. Current systems still require full involvement of human operators, which implies high labour costs, limited capacity to watch multiple screens, inconsistency over long durations, etc. Most surveillance products on the market are based on vision and pattern recognition techniques, but industry and governments call for next generation intelligent surveillance systems that integrate such sensor data with social data streams. For larger cities, such as Seoul, the size of data generated each day from sensor networks exceeds 500 GB and 3 million tweets are posted each day. To make sense of all this information, extremely large amounts of geo-spatial data are also considered. In the workflow of real-time city surveillance, scalable inferencing has been identified as an insurmountable obstacle.

Again, data naturally contains orderings. For instance, tweets can be ordered by popularity, which can be estimated by the ratio of tweets and re-tweets; trustworthiness of information gathered from sensor networks can be ordered by precision of the sensor. Moreover, the final analytics returned to the interested party are necessarily based on choices that are related to those orderings; for instance, aggregating social sensing over a recent time window, a district of the city, age groups, or based on the most reliable information. Making use of the orderings in both data and information need typically calls for order-aware reasoning.

2.4. Social Media Analysis

The final example is about social media analysis, and will be used throughout the paper as running example to motivate and explain our vision and ideas. Each second, thousands of tweets are produced worldwide, forming a rich body of information for companies and governments alike. Imagine a system which listens to all micro-posts that are published (on Twitter, Facebook, Google+, etc.), knows the geographic location of social media users, has the ability of detecting the topic of each micro-post, and has modelled relationships between topics in an expressive ontological language. Such a system would be capable of serving a variety of information needs, e.g.:

Which users of social media, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the shopping district nearby?

Such a query describes a complex information need heavily depending on orderings in the data, and is thus a prototypical example for a problem requiring order-aware reasoning.

3. Inferencing with streaming algorithms

As mentioned in the introduction, research in efficient database querying indicates that many actual applications require order-aware queries, and that answering those queries can be highly efficient. Translated to the problem of inference over large-scale data, this means the following: whenever our applications require only those parts of the solution space that are optimal according to one or more ordering criteria, there is the opportunity to speed up traditional inference methods by focusing on that part of the data that contributes to this significantly smaller solution space. For instance, when dealing with fast-changing data, efficiency can be gained through window-aware main memory processing of streaming data and indexing of static data, which allow efficient random access. Alternatively, when the information need relates to an ordering criterion, there is the opportunity to speed up traditional inference methods by focusing on the ordered data and using order-aware operators.

Figure 2 shows a bird’s-eye view of the order-aware reasoning vision. Data on the left is considered to be sorted – or sortable – according to some ordering. Such orderings can be natural, i.e., already present in the data (e.g., the recency in the micro-post stream), or they can be enforced for the purpose of a given application. Among the enforced orders, we further distinguish cheap from expensive ones. Cheap orderings can be obtained by means of simple operations, such as column sorts, that can be supported efficiently. For instance, indexes for sorted access can be created for any database column of an ordinal type in relational databases or for any geometry column in geo-spatial databases. Expensive orderings, in contrast, can only be produced as the result of complicated operations, such as joining intermediate results (e.g., the reciprocal distance of two flying objects in a given moment in time) or invoking complex custom functions (e.g., the impact of an opinion maker in the last hour as a function of the number of replies and re-tweets). Many forms of cheap or natural sorted data access can be leveraged by algorithms to obtain a core selection criterion, thus reducing the impact of evaluating expensive orders.

Fig. 2. A bird’s-eye view on order-aware reasoning

Data of the given size, frequently changing and intrinsically ordered, calls for streaming algorithms. These algorithms completely avoid random access to data, i.e., require only one pass or a small number of passes over the data, while using a workspace that is much smaller than the size of the data. Examples include many algorithms that perform computations by splitting a problem into the two problems of sorting and solving. Typical streaming algorithms feature low space complexity upper bounds that are polylog in the size of the input (i.e., their memory use is estimated by $O(\log^k n)$, where $n$ is the size of the input and $k$ is a constant). Moreover, although exact time bounds are often not known for streaming algorithms, the majority of these algorithms are also very fast in practice.
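To make the one-pass, small-workspace idea concrete, the following is a minimal sketch of one well-known streaming algorithm, the Misra-Gries frequent-items summary. It is an illustration only and is not taken from the works cited in this paper; the function name `misra_gries` and the topic labels are hypothetical. The stream is read once and at most k - 1 counters are kept, so for a constant k the number of counters does not grow with the length of the stream.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> dict:
    """One-pass frequent-items summary with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to remain in the summary. The stored counts are
    lower bounds on the true frequencies, not exact values.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: dict = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of detected topics, one per micro-post.
topics = ["fashion"] * 6 + ["sport", "food", "travel", "music"]
print(misra_gries(topics, k=2))  # prints {'fashion': 2}
```

The summary never revisits earlier elements, which is the property that matters when random access to the data is slow or impossible. A second pass over the data would be needed to turn the lower bounds into exact counts.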

Fig. 3. Investigation space for order-aware reasoning.

Let us consider the running example of Section 2.4 to show how streaming algorithms can make use of the additional complexity of the orderings in the data to effectively speed up inferencing and overcome the scalability bottleneck. As topic detection can be computationally difficult, it becomes expensive for datasets of the size considered in intelligent surveillance. However, we are looking for the most recent contributions at physical proximity which are expressed by trusted users; thus inference becomes easier, as we can now filter out tweets from distant or untrusted contributors, and those that were published longer ago. Assuming that such ordering can be effectively harnessed, the inference space gets smaller, as we can iteratively evaluate all users by distance and popularity for the topic they posted on, and stop once we have enough good answers. We can stream answers by computing the top-k answers according to some metrics, or else we can stream answers as they are discovered, without ordering them, in an any-k approach.
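The early-termination idea in the running example can be sketched as follows, under the assumption that users arrive already sorted by distance (a cheap or natural order) and that the relevance check stands in for the expensive part, such as topic detection. All names (`User`, `relevant_in_order`, `is_trusted`) are hypothetical. Answers are streamed as they are discovered, and the scan stops as soon as the consumer has taken k of them, so users further down the ordering are never evaluated.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import NamedTuple


class User(NamedTuple):
    name: str
    distance_km: float
    trusted: bool


def relevant_in_order(users_by_distance: Iterable[User],
                      is_relevant: Callable[[User], bool]) -> Iterator[User]:
    """Yield relevant users in input order, checking each one lazily.

    Because the input is sorted by distance, the first k users yielded
    are exactly the k nearest relevant users.
    """
    for user in users_by_distance:
        if is_relevant(user):  # assumed to be the expensive step
            yield user


# Hypothetical data, already sorted by increasing distance.
users = [
    User("alice", 0.2, True),
    User("bob", 0.5, False),
    User("carol", 0.9, True),
    User("dave", 3.0, True),
    User("erin", 7.5, True),
]

checked: list[str] = []


def is_trusted(user: User) -> bool:
    checked.append(user.name)  # record how many checks were needed
    return user.trusted


top2 = list(islice(relevant_in_order(users, is_trusted), 2))
print([u.name for u in top2], checked)
```

With this input only the first three users are checked before two answers are available. Because the input order is the requested order, the same generator serves both modes mentioned above: taking the first k answers gives the top-k by distance, while consuming answers one by one as they appear corresponds to the any-k style.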

The basic approach used in this paper is to consider how streaming algorithms apply to different types of orderings – natural or enforced – and to different types of reasoning, such as data- or query-driven inferencing. The investigation space covers two dimensions, as illustrated in Fig. 3.

The vertical dimension is structured by types of orders of increasing complexity. The baseline is a set of scalable data management solutions that do not consider order as a first-class citizen (i.e., the large majority of those available on the market). On top of that, three different types of orderings, as well as their combinations, can be considered:

– Natural orders
– Cheap orders
– Expensive orders
– Combinations of the three

The horizontal dimension is defined by type of reasoning. Once again, the baseline is a set of scalable data management solutions that do not offer reasoning (i.e., the large majority of those available on the market). On top of that, three different types of reasoning methods can be considered:

– Data-driven: those that reduce query latency by materializing inferences at loading time
– Query-driven: those that rewrite the ontological query into one or several simpler queries
– Combined: those that explore mixed approaches

Fig. 4. A proposal for naming categories of areas in the investigation space for order-aware reasoning.

In Fig. 3, light grey squares indicate problems for which well-established methods exist; grey squares indicate problems that are currently hot topics in research; dark grey squares denote progress areas in which the most innovative techniques can lead to major advances, well beyond the state of the art.

A third dimension, which is not shown in Fig. 3, concerns additional features such as parallelisation and approximation. The former is fundamental for scalability. The latter trades completeness or correctness (or both) for improving performance, and as such is closely related to any-time reasoning, where imperfect results are returned as early as possible and continuously improved if more time is available.

As our framework adds orderings as an extra dimension of inference, new appropriate quality metrics are required. Classical metrics such as soundness, completeness or computational complexity do not take the special requirements of order-based processing into account. For example, with regard to a query answering task, answers could be sound and complete as usual, but now correctness of ordering also has to be established. New qualitative and quantitative measures of appropriateness are needed to characterise the quality of order-based methods.


4. Investigation space

In the previous section, we introduced the investigation space of order-aware reasoning with respect to the two dimensions of types of orders and of reasoning. Next, we use this classification to group the areas of investigation into six categories illustrated in Figure 4: data management solutions available on the market (Area 1), order-aware data management (Areas 2 to 5), scalable reasoning for ontology-based information integration (Areas 6, 11 and 16), stream reasoning as defined in [20] (Areas 7, 12 and 17), top-k reasoning (Areas 8, 9, 13, 14, 18 and 19), and full-fledged order-aware reasoning (Areas 10, 15 and 20). In each category, we draw the line between the established state of the art, current research trends, and open challenges for future investigations. We leave the discussion of approximation and parallelisation for Section 5.

4.1. Solutions currently on the market

The large majority of scalable data management solutions available on the market fall into the area of methods and tools that (almost) completely ignore orderings in the data, and that do not perform any inference. Existing approaches are based on parallel programming models such as BSP (Bulk Synchronous Parallel), PRAM (Parallel Random Access Machine), PGAS (Partitioned Global Address Space), or Map-Reduce. The latter has been implemented in several frameworks: MapReduce [19], Hadoop [30], SkyNet [58], Disco [21] are examples of data-centric workflow systems (based on the Map-Reduce paradigm) that ease the parallel execution of data-intensive processes on a large cluster of commodity machines. Other frameworks based on the MapReduce paradigm such as Hive [61] or Pig [47] allow the specification of ordering constraints for user queries over massive data collections, but no specific optimisation is provided for top-k or any-k queries.

4.2. Order-aware data management

Area 2 is the area of Data Stream Management Systems (DSMS) [24] and Complex Event Processors (CEP) [43], which is about naturally ordered data without reasoning. This type of system was extensively investigated at the end of the 1990s and the beginning of the 2000s. A number of start-ups (e.g., Mike Stonebraker’s StreamBase) were founded and major data management solution vendors have extended their offer in this direction (e.g., Microsoft’s StreamInsight, IBM’s InfoSphere Streams).

The simplest portion of the query in our running example, i.e., the request to retrieve the micro-posts that have been posted recently, is the typical query a DSMS is optimised for.

DSMS and CEP share two ideas: a) processing “on the fly” on data streams while they pass by, and b) exploiting the temporal order of the data stream to optimise the computation. These two ideas are the basis for a well-known class of algorithms: the streaming algorithms, which we already discussed in Section 3. In 1998, “Computing on Data Streams” [31] was the first publication formalising streaming algorithms, but early works on this class of algorithms have their roots in the late 1970s, when, for instance, the first query optimisation technique based on estimation of order statistics in the data was presented [54].

For instance, streaming algorithms that compute graph statistics, matchings in a graph, and random walks [65] can cope with massive graphs that can only be stored in high capacity storage devices where random access is extremely slow (as compared to primary memory devices). Notably, to use streaming algorithms, data does not need to be naturally ordered (e.g., by recency as in DSMS/CEP); it only has to be sortable by the criterion, e.g., popularity or physical distance, required by the streaming algorithm that will read it.

Area 3 is about data on which orderings are cheap to enforce, again without reasoning. It is the area of top-k query answering, which has been the subject of research since the 1990s. One of the earliest works in the area is the famous rank aggregation algorithm by Fagin [23], which allows merging multiple lists of results returned from different databases with one pass on both lists. More recent works (e.g., [34]) focused on rank-aware join algorithms (see [35] for a survey), which return the top-k results of a join of a set of ordered relations while scanning only a minimal part of each relation and avoiding random access.²

Area 4 is about efficient evaluation of top-k queries that include expensive-to-enforce orderings. Some important theoretical results and efficient algorithms (e.g., on the minimal number of probes absolutely required to return correct results [17]) are available in this area, but no existing work tackles the problem of optimal planning of top-k queries considering both predicate correlation and selectivity estimation.

² If random access is possible, some rank-join algorithms perform a minimum amount of random accesses after the sequential scans.

Regarding the running example, solutions studied in Areas 3 and 4 make it possible to retrieve nearby shops that are discussed by popular social media users.
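As an illustration of how such top-k retrieval can avoid reading whole lists, the sketch below follows the threshold-algorithm style of rank aggregation known from the data management literature. It is not claimed to be the exact algorithm of [23], and it uses random access (the dictionary lookups) in addition to sorted access. The shop names, the integer scores and the simplifying assumption that every object appears in every list are hypothetical.

```python
from __future__ import annotations

import heapq


def threshold_top_k(ranked_lists: list[list[tuple[str, int]]],
                    k: int) -> tuple[list[tuple[int, str]], int]:
    """Top-k objects by summed score over lists sorted by descending score.

    Assumes (hypothetically) that every object occurs in every list and
    that all lists have the same length. Returns the top-k as
    (total, object) pairs plus the number of rows read under sorted access.
    """
    if k <= 0:
        return [], 0
    lookups = [dict(lst) for lst in ranked_lists]  # random access by object
    best: list[tuple[int, str]] = []               # min-heap of (total, object)
    seen: set[str] = set()
    rows_read = 0
    for row in zip(*ranked_lists):                 # one sorted access per list
        rows_read += 1
        for obj, _ in row:
            if obj not in seen:
                seen.add(obj)
                total = sum(lookup[obj] for lookup in lookups)
                heapq.heappush(best, (total, obj))
                if len(best) > k:
                    heapq.heappop(best)
        # No unseen object can score above the sum of the last scores read.
        threshold = sum(score for _, score in row)
        if len(best) == k and best[0][0] >= threshold:
            break
    return sorted(best, reverse=True), rows_read


# Hypothetical scores: shops ranked by proximity and by popularity.
proximity = [("shopA", 90), ("shopB", 80), ("shopC", 30), ("shopD", 10)]
popularity = [("shopB", 90), ("shopA", 70), ("shopD", 40), ("shopC", 20)]
print(threshold_top_k([proximity, popularity], k=2))
# prints ([(170, 'shopB'), (160, 'shopA')], 2)
```

In this example the scan stops after two of the four rows, because the k-th best total already reaches the threshold. The stopping rule relies on the aggregation function being monotone, which holds for the sum used here.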

Area 5 is about combining data stream processing with top-k query answering. This is where most of the current research efforts concentrate. “Continuous monitoring of top-k queries over sliding windows” [45] is the leading work in this area. It shows how to efficiently combine sliding windows, which harvest the natural orders of data streams, with top-k query answering.

Using the methods of Area 5, it is possible to retrieve the shops nearby that popular social media users are currently positively posting about.
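The combination of a sliding window with top-k answering can be sketched in its most naive form as follows. This is a baseline that recomputes the top-k on demand, not the optimised monitoring technique of [45]; the class name `SlidingWindowTopK`, the scores and the payload labels are hypothetical. The natural order of the stream (non-decreasing timestamps) is what allows expired elements to be dropped from the front of the queue without scanning it.

```python
from __future__ import annotations

import heapq
from collections import deque


class SlidingWindowTopK:
    """Keep the elements of the last `window_seconds` and answer top-k."""

    def __init__(self, window_seconds: float, k: int) -> None:
        self.window_seconds = window_seconds
        self.k = k
        # (timestamp, score, payload) tuples, oldest first.
        self._items: deque = deque()

    def push(self, timestamp: float, score: float, payload: str) -> None:
        """Add an element; timestamps are assumed to be non-decreasing."""
        self._items.append((timestamp, score, payload))
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Elements with timestamp <= now - window have left the window.
        while self._items and self._items[0][0] <= now - self.window_seconds:
            self._items.popleft()

    def top_k(self, now: float) -> list:
        """Return the k highest-scoring live elements, best first."""
        self._expire(now)
        return heapq.nlargest(self.k, self._items, key=lambda item: item[1])


# Hypothetical micro-posts with a popularity score.
window = SlidingWindowTopK(window_seconds=60, k=2)
window.push(0, 5, "p1")
window.push(30, 9, "p2")
window.push(50, 7, "p3")
print([payload for _, _, payload in window.top_k(now=50)])  # prints ['p2', 'p3']
```

Each query costs time linear in the window size here. Techniques dedicated to continuous top-k monitoring aim to avoid this recomputation by maintaining the result incrementally as elements enter and leave the window.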

However, two parts of the query in the running example remain difficult to express: knowing which topics are related to fashion and computing which recent discussions on social media are popular. Both are difficult to model without an expressive ontological language (such as OWL 2) and both require complex algorithms that an ontology reasoner can handle natively. Moreover, these techniques do not cope with heterogeneity, i.e., data should be translated in one common representation before order-aware data management techniques can be applied.

4.3. Scalable reasoning for ontology-based information integration

Areas 6, 11, and 16 consider ontological background information as a basis for inferring implicit information from the given data. In our target applications, this is particularly useful for ontology-based information integration, i.e., for handling heterogeneity in the input data [39].

In the running example, ontological background knowledge can be used to model relationships between more specific and more general topics of interest, which can be used to infer which concrete topics are related to fashion.

Area 6 covers reasoning methods that draw ontological inferences based on the available data (possibly including ontological information). A prime example of this “bottom-up” approach is materialisation in relational databases [1,29], which is closely related to forward chaining in logic programming. This approach has successfully been applied to ontologies, in particular in the lightweight ontology language OWL RL and fragments thereof. Commercial implementations of this idea include OWLIM, Virtuoso, AllegroGraph, and OntoBroker. Similar methods have been applied with great success to more expressive fragments of OWL under the label consequence-based reasoning [36,57] as implemented in the ELK reasoner for OWL EL [37]. The common advantage of bottom-up techniques is that the computation is driven by the inferences that are possible based on the data. This guides the search for logical consequences and reduces overall computational effort. The major disadvantage of data-driven approaches, however, is that they do not take our actual information-need (query) into account, i.e., they are usually not goal-directed.
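The bottom-up style can be illustrated with a minimal forward-chaining sketch over the topic hierarchy of the running example. It is a naive fixpoint computation with two hard-coded rules, not the rule set of OWL RL and not the procedure of any of the systems named above; the predicate names `subTopicOf` and `about` and all facts are hypothetical.

```python
from __future__ import annotations

Triple = tuple[str, str, str]


def materialise(facts: set[Triple]) -> set[Triple]:
    """Apply two rules until no new fact is derived (naive fixpoint).

    Rule 1: (a subTopicOf b), (b subTopicOf c)  =>  (a subTopicOf c)
    Rule 2: (p about a),      (a subTopicOf b)  =>  (p about b)
    """
    closure = set(facts)
    while True:
        sub = [(s, o) for s, p, o in closure if p == "subTopicOf"]
        derived: set[Triple] = set()
        for s, p, o in closure:
            if p in ("subTopicOf", "about"):
                for narrower, broader in sub:
                    if o == narrower:
                        derived.add((s, p, broader))
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Hypothetical ontology and data.
facts = {
    ("sneakers", "subTopicOf", "footwear"),
    ("footwear", "subTopicOf", "fashion"),
    ("post1", "about", "sneakers"),
}
closure = materialise(facts)
print(("post1", "about", "fashion") in closure)  # prints True
```

Every consequence is computed regardless of whether a query will ever ask for it, which is the lack of goal-direction described above. In exchange, a later query for posts about fashion becomes a plain lookup in the materialised set.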

Area 11 includes approaches that search for logical consequences that lead to the answer of a particular query. Typical examples from relational databases are the numerous techniques for query rewriting [1], which closely relate to backward chaining in logic programming. In the context of ontologies, this approach was mainly applied to OWL QL and related logics [16,50,27]. Implementations include QuOnto, Owlgres, and Requiem. Query rewriting has also been used to implement reasoning capabilities in commercial RDF databases, e.g., in Virtuoso (configurable alternative to materialisation) and 4Store (4sr plugin). The theoretical foundations of query rewriting have also been studied for more expressive ontology languages that are based on existential rules (a.k.a. Datalog±) [27,4]. The advantage of these methods is that they limit the search space by considering the actual information-need (query) instead of computing all possible inferences. On the other hand, rewriting a query in a way that does not depend on the given data may require a very high number of rewritings (exponentially many for OWL QL, possibly infinitely many for existential rules).
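As a hedged illustration of the query-driven alternative, the following Python sketch rewrites a single class query into a union of class queries using only the subclass axioms, and then evaluates the rewritten query directly on the unmodified data. It is a toy for class hierarchies only, not the rewriting procedure of QuOnto, Owlgres, or Requiem; all names and data are assumptions.

```python
# Toy query rewriting for class hierarchies: the data is never materialised.
# Axioms are (subclass, superclass) pairs; type facts are (individual, class) pairs.


def rewrite(target_class, subclass_axioms):
    """Return every class whose instances are answers to a query for target_class."""
    rewritten = {target_class}
    frontier = [target_class]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_axioms:
            if sup == current and sub not in rewritten:
                rewritten.add(sub)
                frontier.append(sub)
    return rewritten


def answer(target_class, subclass_axioms, type_facts):
    """Evaluate the rewritten query (a union of class lookups) over the raw facts."""
    classes = rewrite(target_class, subclass_axioms)
    return {individual for (individual, cls) in type_facts if cls in classes}


axioms = [
    ("Sneakers", "Footwear"),
    ("Footwear", "Fashion"),
    ("Laptop", "Electronics"),
]
type_facts = {("post42", "Sneakers"), ("post7", "Laptop")}
print(answer("Fashion", axioms, type_facts))
```

The rewriting depends only on the query and the axioms, not on the data. Here it stays small, but in richer languages the set of rewritten queries can grow very large, as noted above.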

Area 16 therefore aims to combine the advantages of bottom-up and top-down approaches. A classical example is the Magic Sets technique known in deductive databases, which achieves a goal-directed behaviour in bottom-up computations [1]. For more expressive ontology languages, however, this combination is not clear and is subject to ongoing research. Initial proposals for combined approaches have been made for a limited fragment of OWL EL [44] and, on a purely theoretical level, for existential rules [4]. We note that there are a number of reasoning methods that do not specifically relate to query answering, e.g., tableau methods, which check satisfiability of a logical theory by constructing models. Such general approaches can be a basis for data- or query-driven approaches, but are not combined approaches in our sense since they do not usually combine the advantages of data- and query-driven procedures.

4.4. Stream reasoning

Investigations in Areas 7, 12 and 17 deal with reasoning on rapidly changing information, namely stream reasoning [20]. This new reasoning method removes the common assumption in scalable reasoning that knowledge bases are static or evolving slowly. By harvesting the natural temporal order in data streams, stream reasoning addresses the requirements of a number of modern applications, ranging from sensor networks to social media analysis.

In terms of our example, stream reasoning methods are the most appropriate to compute which recent discussions on social media are popular.

In the last three years, several independent groups elaborated stream reasoning techniques [9,8,22,2,26] applied to sensor networks, healthcare, financial fraud detection and social media analysis. These approaches exploit different stream processing and reasoning techniques, but they share a homogeneous theoretical framework.

All of them except [26] share the notion of RDF stream [9], which logically models a stream of triples annotated with a non-decreasing timestamp. However, other types of RDF stream can be explored. For instance, each triple could be annotated with two timestamps that describe the time interval in which the triple is valid (this is commonly done in CEP). The granularity of the streamed data element can also be rethought. Choosing a triple as streamed data element is appropriate when a single triple carries enough information. For instance, [7] experimentally proved that effective social media analysis can be based on a stream of triples such as Alice likes Wonderland. In semantic sensor networks [56], one observation requires a minimum of ten triples, thus choosing a named graph containing a set of triples as streamed data element can be more appropriate. This is similar to the time-decaying logic programs used in [26]. Punctuation [63] (a mark that identifies substreams, allowing an infinite stream to be viewed as a mixture of finite streams) is a flexible alternative approach still unexplored in stream reasoning.
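The two annotation styles discussed here can be written down as simple data types. The following Python sketch is only an illustration under stated assumptions: the class and field names are hypothetical and do not reproduce the data model of C-SPARQL or any other cited system.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class StreamElement:
    """A triple annotated with a single timestamp (hypothetical field names)."""
    triple: Triple
    timestamp: int


@dataclass(frozen=True)
class IntervalElement:
    """Alternative annotation: the interval in which the triple is valid."""
    triple: Triple
    valid_from: int
    valid_to: int


def checked_stream(elements: Iterable[StreamElement]) -> Iterator[StreamElement]:
    """Pass elements through, rejecting any timestamp that decreases."""
    last = None
    for element in elements:
        if last is not None and element.timestamp < last:
            raise ValueError("timestamps must be non-decreasing")
        last = element.timestamp
        yield element


stream = [
    StreamElement(("Alice", "likes", "Wonderland"), 1),
    StreamElement(("Bob", "likes", "Wonderland"), 1),  # equal timestamps are allowed
    StreamElement(("Alice", "likes", "Chess"), 2),
]
ordered = list(checked_stream(stream))
print(len(ordered))
```

A named graph as streamed data element would replace the single `triple` field by a set of triples that share one annotation.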

Moreover, little effort has been dedicated so far to the formal definition of stream reasoning inference problems. Preliminary work has been done in [8], which formally defines continuous query answering under RDFS++ entailment regime, and in [26], which gives the formal semantics of stream reasoning with answer set programming in terms of a dedicated module theory [46]. However, a general formal definition of soundness and completeness for continuous query answering under expressive OWL 2 entailment regime remains an open problem. The notion of inconsistency in stream reasoning also deserves further investigation. In a static domain, an individual cannot belong to two disjoint classes (e.g., Alice can belong either to Tall or to Short). When we consider a time frame (i.e., a window in DSMS terms), two inconsistent facts can be present, but the content of the window should not be considered inconsistent; only the most recent statement should be considered. However, this is not as simple as it may appear at first glance, and theoretically framing this problem so that it can be efficiently treated in practice is an open issue (e.g., well-known AI techniques, such as belief revision [18], provide a sound theoretical framework, but, in practice, naive implementations do not scale).
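To make the "most recent statement wins" intuition concrete, the following Python sketch resolves conflicts between disjoint classes inside one window by recency. It is a deliberately naive policy written for illustration, not belief revision and not a method from the cited works; the function name, the data layout, and the extra class `Student` are assumptions.

```python
def latest_membership(window, disjoint):
    """Resolve class memberships in a window, letting newer facts override
    older facts that assert a disjoint class.

    window:   iterable of (individual, cls, timestamp)
    disjoint: set of frozenset({class_a, class_b}) pairs declared disjoint
    """
    current = {}  # individual -> {cls: timestamp of the surviving statement}
    for individual, cls, timestamp in sorted(window, key=lambda fact: fact[2]):
        classes = current.setdefault(individual, {})
        for other in list(classes):
            if frozenset((cls, other)) in disjoint:
                del classes[other]  # the older, conflicting statement is dropped
        classes[cls] = timestamp
    return {individual: set(classes) for individual, classes in current.items()}


window = [
    ("Alice", "Short", 1),
    ("Alice", "Student", 2),
    ("Alice", "Tall", 5),
]
disjoint = {frozenset(("Tall", "Short"))}
print(latest_membership(window, disjoint))
```

Even this toy only handles explicitly asserted, pairwise disjoint classes. Conflicts that arise only after inference are exactly where the open problem described above begins.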

Algorithmically, [8,22,2] are all data-driven approaches (Area 7). Area 12, which is about integrating natural orders into query-driven reasoning, has not been explored yet. It may deserve exploration because queries in stream reasoning are registered: the cost of rewriting is paid only once (at query registration time), and inter-query optimization (e.g., sharing of sub-plans) may prove able to handle rewritten queries that would be practically too complex for a DBMS. The biggest challenge for stream reasoning is in applying combined data- and query-driven inferencing techniques to naturally ordered data (Area 17).

4.5. Top-k reasoning

Investigations in Areas 3 and 4 are typically referred to as top-k data processing. Likewise, we refer to the research space that deals with reasoning in presence of both cheap to enforce (Areas 8, 13, and 18) and expensive to enforce (Areas 9, 14, and 19) orders as top-k reasoning. In traditional reasoning, ranking of results is normally considered a task that increases the hopelessness of scaling inference to massive data sets; our proposal, instead, is to overcome such a common practice and interleave order and reasoning.
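What interleaving buys can be seen in a small Python sketch. It assumes that the input is already sorted best-first (a cheap to enforce order) and that relevance is decided by a membership test standing in for a reasoner; the set `FASHION_TOPICS`, the user data, and the function names are all hypothetical.

```python
from itertools import islice

# Assumed to be the output of some reasoning step (topics related to fashion).
FASHION_TOPICS = {"Sneakers", "Footwear"}

# Hypothetical (user, topic, distance_km) rows, already sorted by distance.
USERS_BY_DISTANCE = [
    ("ann", "Sneakers", 0.2),
    ("bob", "Laptop", 0.5),
    ("cat", "Footwear", 0.9),
    ("dan", "Sneakers", 1.4),
    ("eve", "Footwear", 2.0),
]

inspected = []  # records which rows were actually read


def ranked_users():
    """Lazily yield rows in rank order, logging each access."""
    for row in USERS_BY_DISTANCE:
        inspected.append(row[0])
        yield row


def top_k_interleaved(ranked_items, is_relevant, k):
    """Filter a best-first stream and stop as soon as k answers are found."""
    relevant = (item for item in ranked_items if is_relevant(item))
    return list(islice(relevant, k))


nearest = top_k_interleaved(
    ranked_users(), lambda row: row[1] in FASHION_TOPICS, k=2
)
print(nearest, inspected)
```

Because the order is exploited during evaluation rather than applied to a complete result afterwards, only a prefix of the input is read. Ranking then reduces the work instead of adding to it.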


In terms of the running example, top-k reasoning methods are the most appropriate to compute which are the top-k social media users, who are well-known to lead discussions on fashion-related topics and are closest to the requester's current location.

Some investigations have been conducted in Areas 13 and 14, where several works addressed the problem of top-k query answering in presence of orders using query rewriting: [60] studies SoftFacts, an ontology-mediated top-k information retrieval system over relational databases; [13] adds order to SPARQL as a first class citizen; the authors of [41] take a different angle by extending SPARQL to query RDFS annotated with a bounded lattice (which thus comes with a partial ordering).

The theoretical framework for top-k reasoning is unexplored. Progressing in this direction calls for a sound identification of the type of data to be managed, and the classes of queries that can be answered with the given data. [13] made a first step in this direction by defining the notion of ranked sets of mappings, and an order-aware SPARQL algebra that embodies rank-aware algebraic operators. Once these basic building blocks of query answering under simple RDF entailment regime are in place, appropriate inference problems can be defined: e.g., notions of exact top-k closure of an ontology w.r.t. a query and a scoring function (for an attempt see [53]).

From an algorithmic point of view, the only explored areas in top-k reasoning are Areas 13 and 14. Area 8 (methods for top-k materialisation on easy to enforce orders) seems the easiest to explore since order-aware joining algorithms [35] could be applied to rule-based reasoning; Area 9 could benefit from existing works in databases [50]. Perhaps the biggest challenge in top-k reasoning is in interleaving combined data- and query-driven techniques with techniques that harvest easy (Area 18) and expensive to enforce orders (Area 19).

4.6. Order-aware reasoning

With full-fledged order-aware reasoning, we refer to Areas 10, 15, and 20, where data- and query-driven inference methods have to deal with combinations of natural, cheap to enforce and expensive to enforce types of orders. In such a context, the naive assumption of independence of orderings would have to be relaxed, thus theories and methods which exploit mutual relationships between the three types of orders have to be rethought.

Considering our running example, methods implementing order-aware reasoning are the only ones able to answer the query we posed in Section 2.4, i.e., Which users of social media, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the shopping district nearby?

Some promising work is under way in the Answer Set Programming (ASP) community. In Area 10, [25] proposes a streaming algorithm for ASP that first ranks the constants referring to domain elements and then fetches them, increasing the domain size until an answer set is found.

The challenge is to define a theoretical framework that unifies and generalises those defined for stream reasoning and top-k reasoning. Such a framework should pave the way for designing scalable data- and query-driven methods that allow for efficient answering of queries that involve all types of ordering (Areas 10 and 15). As for stream and top-k reasoning, perhaps the biggest challenge is in interleaving combined data- and query-driven techniques with techniques that harvest all types of orders (Area 20).

5. Approximation and parallelisation

While the solution space in Fig. 3 is based on the input and output behaviour of order-aware systems, there are various additional areas of investigation that are relevant in almost every scenario. In this section, we focus on two such orthogonal research dimensions that are of particular importance to order-aware reasoning: approximation and parallelisation.

5.1. Approximate reasoning

Classical approximation tries to reduce the use of computation time and working memory. Item 1 in the list below considers another critical resource, namely the amount of input data that is required for giving useful answers.

In every case, it is desirable that algorithms compute increasingly accurate results when given more resources (time, memory, data). Any-time algorithms are approximation procedures that provide preliminary results of increasing quality at (almost) any stage of the computation. Ideally, the correct result should be approximated arbitrarily closely in this process, but such asymptotic behaviour might not always be possible.
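A minimal Python sketch of the any-time idea follows. It assumes a sound but incomplete evaluation in which the set of answers only grows; the generator name `anytime_answers`, the `step` parameter, and the toy data are illustrative assumptions.

```python
def anytime_answers(facts, matches, step=2):
    """Yield the answers found so far after every `step` facts.

    Each snapshot is sound (contains only true answers) but possibly
    incomplete; later snapshots always contain the earlier ones.
    """
    answers = set()
    for seen, fact in enumerate(facts, start=1):
        if matches(fact):
            answers.add(fact)
        if seen % step == 0:
            yield frozenset(answers)
    yield frozenset(answers)  # final, complete result


# Toy query: which of the numbers 1..7 are even?
snapshots = list(anytime_answers(range(1, 8), lambda n: n % 2 == 0))
for snapshot in snapshots:
    print(sorted(snapshot))
```

A caller that runs out of time can stop consuming the generator after any snapshot and keep the best result obtained so far.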

By approximate reasoning we mean any approach to reasoning that can produce results that are incomplete (missing answers) or unsound (giving wrong answers), but in a deliberate and controlled way that is accepted as a means of improving performance.³ Such algorithms might be used to produce quick preliminary results, to give answers when no other method is feasible, or to estimate outputs in situations where only limited accuracy is needed.

Approximation plays a key role in order-aware reasoning, for the following independent reasons:

1. Restricted access to input data: order-aware reasoning often deals with data that is only partially accessible (in an ordered fashion), making it impossible to guarantee full accuracy in all cases.

2. Soft constraints on output data: order-aware reasoning introduces the order of results as a new output dimension, for which controlled inaccuracy is often more acceptable.

3. Order-aware approximation: reasoning involves computational tasks that are inherently hard to solve, motivating the use of approximation in classical cases; but order provides a new guideline and quality measure for approximation.

Approximation has been considered in some of the investigation areas in Fig. 3. For a classical example that belongs to Area 1, consider the problem of counting the number of triangles in a graph. This is a well-known problem in graph analysis, which is a basic building block for evaluating many practically important graphs, e.g., in social networks, chemical compounds, or networks of Web links. It has been shown that approximate, streaming algorithms can outperform classical, data-bound approaches to this problem by several orders of magnitude [6,14]. Moreover, such approximations can be asymptotic, so that arbitrary accuracy can be achieved [6].
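For illustration only, the Python sketch below estimates the triangle count from a single pass over an edge stream using uniform edge sampling. This is a simpler technique than the algorithms of [6,14] and is swapped in here purely to show the trade-off: each edge is kept independently with probability p, so a triangle survives with probability p³ and the sampled count is scaled by 1/p³. The function names and the `seed` parameter are assumptions.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of an undirected simple graph given as edge pairs."""
    edge_set = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    adjacency = {}
    for edge in edge_set:
        u, v = tuple(edge)
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    shared = sum(len(adjacency[u] & adjacency[v]) for u, v in map(tuple, edge_set))
    return shared // 3


def estimate_triangles(edge_stream, keep_probability, seed=0):
    """One-pass estimate: sample edges, count exactly on the sample, rescale."""
    if not 0.0 < keep_probability <= 1.0:
        raise ValueError("keep_probability must be in (0, 1]")
    rng = random.Random(seed)
    sample = [edge for edge in edge_stream if rng.random() < keep_probability]
    return count_triangles(sample) / keep_probability ** 3


# Complete graph on four vertices: every choice of three vertices is a triangle.
k4 = list(combinations(range(4), 2))
print(count_triangles(k4), estimate_triangles(k4, keep_probability=1.0))
```

Lower values of `keep_probability` reduce the memory needed for the sample at the price of a noisier estimate; with probability 1 the estimate coincides with the exact count.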

Various forms of approximation have been proposed in areas related to ontology-based data access. Many rule-based systems compute only part of the entailed consequences by employing a set of rules that cannot derive all results. This is the case, e.g., for Jena, Sesame, OWLIM, and Virtuoso, all of which support specific fragments of OWL RL but not the whole language. Incompleteness is mostly well-understood in such cases.

³ Note that this is different from (accurate) reasoning with approximate (imprecise) information, as done, e.g., in fuzzy logic.

A variety of other approximation methods have been considered in reasoning [51,28]. A typical approach is to approximate the input information by restricting to a simpler ontology language that is then processed with a more efficient, sound and complete algorithm, see, e.g., [48,62]. This is related to the idea of knowledge compilation [55]. Approximate reasoning is also used as a sub-method in many sound and complete reasoners, e.g., the OWL reasoner HermiT first computes the syntactically told class hierarchy before using more complex algorithms for a complete subsumption check. Reasoning approaches that are based on backward-chaining and query rewriting typically suggest any-time approximation, as more and more answers can be found by continuing the (possibly infinite) generation of queries. This is typical for many Prolog implementations and can also be combined with rewriting-based approaches for DL-Lite [16] or existential rules [5].

None of the above systems, however, deals with or takes advantage of orderings of any kind. A number of interesting research challenges thus remain open.

5.2. Parallelisation approaches

Parallelisation of reasoning means that the task is split into subproblems that can be solved independently with limited exchange of information. This enables concurrent computation (e.g., by sharing computation among several CPUs of one machine) and distribution (sharing computation among multiple networked machines with independent memory). Both aspects are essential for exploiting state-of-the-art computing systems to go beyond the limits of traditional reasoning.

However, order-based reasoning introduces new challenges to parallelisation, e.g., the following:

1. Ordered data access might be inherently sequential, and thus is harder to distribute.

2. Outputs must be integrated into an overall order, and thus cannot be generated in a fully decentralized fashion.

To bring order-aware reasoning to its full potential, these challenges must be addressed.
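The second challenge can be made tangible with a small Python sketch: each worker ranks its own partition independently, but a final merge step is still needed to obtain the overall order. The partitioning, the scores, and the function names are assumptions, and threads are used only to keep the example self-contained.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def score_of(result):
    """Results are hypothetical (name, score) pairs."""
    return result[1]


def local_top(partition, k):
    """Rank one partition on its own; the output is sorted best-first."""
    return heapq.nlargest(k, partition, key=score_of)


def parallel_top_k(partitions, k):
    """Rank partitions concurrently, then merge the partial rankings."""
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda part: local_top(part, k), partitions))
    # Central integration step: the sorted partial results are merged
    # lazily and only the k best overall results are consumed.
    merged = heapq.merge(*partial, key=score_of, reverse=True)
    return list(islice(merged, k))


partitions = [
    [("a", 0.9), ("b", 0.1)],
    [("c", 0.7), ("d", 0.8)],
    [("e", 0.3)],
]
print(parallel_top_k(partitions, k=3))
```

The local ranking parallelises well, whereas the merge is a single point at which all partial outputs must meet, which is why a fully decentralized generation of ordered output is not possible.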

In recent years, parallelisation has been successfully employed in various reasoning tasks, especially in rule-based (bottom-up) materialisation approaches. The two main strands of work are multi-machine distribution and multi-processor concurrency. Distribution is motivated by processing data volumes that are too large for a single machine’s working memory. Approaches in that field often target OWL RL [33,65] or a fragment thereof [32,64,66,38], using MapReduce as the main computation paradigm [19]. Another distribution paradigm suggested for this case is the pre-partitioning of inputs [59]. In contrast to these approaches, concurrent processing on a single machine aims at speeding up reasoning for improving reactivity of user-driven applications. The main examples of this approach are the ELK reasoner for OWL EL [37] and various preliminary works on parallel reasoning in more expressive ontology languages [40,52,3]. Overall, many of the implemented systems demonstrated performance gains of one or several orders of magnitude, yet none of these systems is applicable to a streaming scenario.

6. Conclusions

Our systematic exploration of the investigation space has shown that a huge amount of foundational and applied research is necessary. Starting from lessons learned from data management, we need:

– A theory of semantic processing of massive sets of complex and highly dynamic data. This will include the development of knowledge representation languages, performance metrics and a systematic roadmap about how to process massive, dynamic, ordered data.

– Methods and techniques related to such a theoretical framework. This means at least one method for each area of investigation identified in Fig. 3.

– Implementations of the above methods according to current software development standards.

– Rigorous evaluations of the proposed methods and technology in a comparative way using a purpose-built testing infrastructure.

Putting order as a first class citizen in reasoning may start with incremental steps, but a tight integration of ordering and reasoning will require substantial rethinking of the cornerstones on which current semantic technologies are built.

Acknowledgements

This research has been supported by the ERC Search Computing (SeCo) project, the Dutch national program COMMIT, the EU FP7 project SEALS, and by the EPSRC projects ConDOR, ExODA and LogMap. Many ideas of this paper stem from results of the LarKC project. We also want to thank Frank van Harmelen for his important contribution, and Tony Lee (Saltlux), Andreas Schreiber (DLR) and Achim Basermann (DLR) for the valuable discussion on concrete examples of problems that require order-aware reasoning.

References

[1] Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of Databases. Addison-Wesley.

[2] Anicic, D., Fodor, P., Rudolph, S., & Stojanovic, N. (2011). EP-SPARQL: a unified language for event processing and stream reasoning. In S. Srinivasan, K. Ramamritham, A. Kumar, M. P. Ravindra, E. Bertino, and R. Kumar, editors, WWW, pages 635–644. ACM.

[3] Aslani, M. & Haarslev, V. (2010). Parallel TBox classification in description logics – first experimental results. In H. Coelho, R. Studer, and M. Wooldridge, editors, Proc. 19th European Conf. on Artificial Intelligence (ECAI’10), volume 215 of Frontiers in Artificial Intelligence and Applications, pages 485–490. IOS Press.

[4] Baget, J.-F., Leclère, M., Mugnier, M.-L., & Salvat, E. (2009). Extending decidable cases for rules with existential variables. In [12], pages 677–682.

[5] Baget, J.-F., Leclère, M., Mugnier, M.-L., & Salvat, E. (2011). On rules with existential variables: Walking the decidability line. Artif. Intell., 175(9-10), 1620–1654.

[6] Bar-Yossef, Z., Kumar, R., & Sivakumar, D. (2002). Reductions in streaming algorithms, with an application to counting triangles in graphs. In D. Eppstein, editor, SODA, pages 623–632. ACM/SIAM.

[7] Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., Huang, Y., Tresp, V., Rettinger, A., & Wermser, H. (2010a). Deductive and inductive stream reasoning for semantic social media analytics. IEEE Intelligent Systems, 25(6), 32–41.

[8] Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., & Grossniklaus, M. (2010b). Incremental reasoning on streams and rich background knowledge. In [42], pages 1–15.

[9] Barbieri, D. F., Braga, D., Ceri, S., Della Valle, E., & Grossniklaus, M. (2010c). Querying RDF streams with C-SPARQL. SIGMOD Record, 39(1), 20–26.

[10] Ben-Ari, M. (1990). Principles of concurrent and distributed programming. PHI Series in computer science. Prentice Hall.

[11] Bernstein, A., Karger, D. R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., & Thirunarayan, K., editors (2009). Proc. 8th Int. Semantic Web Conf. (ISWC’09), volume 5823 of LNCS. Springer.

[12] Boutilier, C., editor (2009). IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009.

[13] Bozzon, A., Della Valle, E., & Magliacane, S. (2011). Towards and efficient SPARQL top-k query execution in virtual RDF stores. In S. Chakrabarti and D. Martinenghi, editors, Proc. 5th International Workshop on Ranking in Databases (DBRANK 2011), pages 1–6.


[14] Buriol, L. S., Frahling, G., Leonardi, S., Marchetti-Spaccamela, A., & Sohler, C. (2006). Counting triangles in data streams. In S. Vansummeren, editor, PODS, pages 253–262. ACM.

[15] Calvanese, D. & Lausen, G., editors (2008). Web Reasoning and Rule Systems, Second International Conference, RR 2008, Karlsruhe, Germany, October 31-November 1, 2008. Proceedings, volume 5341 of LNCS. Springer.

[16] Calvanese, D., Giacomo, G. D., Lembo, D., Lenzerini, M., & Rosati, R. (2007). Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3), 385–429.

[17] Chang, K. C.-C. & won Hwang, S. (2002). Minimal probing: supporting expensive predicates for top-k queries. In M. J. Franklin, B. Moon, and A. Ailamaki, editors, SIGMOD Conference, pages 346–357. ACM.

[18] Darwiche, A. & Pearl, J. (1996). On the logic of iterated belief revision. Artificial intelligence, 89, 1–29.

[19] Dean, J. & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Commun. ACM, 51, 107–113.

[20] Della Valle, E., Ceri, S., van Harmelen, F., & Fensel, D. (2009). It’s a streaming world! Reasoning upon rapidly changing information. IEEE Intelligent Systems, 24(6), 83–89.

[21] Disco (2012). Disco project - http://discoproject.org.

[22] Do, T., Loke, S., & Liu, F. (2011). Answer set programming for stream reasoning. In C. Butz and P. Lingras, editors, Advances in Artificial Intelligence, volume 6657 of LNCS, pages 104–109. Springer Berlin / Heidelberg.

[23] Fagin, R. (1996). Combining fuzzy information from multiple systems (extended abstract). In Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS ’96, pages 216–226, New York, NY, USA. ACM.

[24] Garofalakis, M., Gehrke, J., & Rastogi, R. (2007). Data Stream Management: Processing High-Speed Data Streams. Springer-Verlag New York, Inc.

[25] Gebser, M., Sabuncu, O., & Schaub, T. (2011). An incremental answer set programming based system for finite model computation. AI Commun., 24(2), 195–212.

[26] Gebser, M., Grote, T., Kaminski, R., Obermeier, P., Sabuncu, O., & Schaub, T. (2012). Stream Reasoning with Answer Set Programming: Preliminary Report. In T. Eiter and S. McIlraith, editors, Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning (KR’12), pages 613–617. AAAI Press.

[27] Gottlob, G., Orsi, G., & Pieris, A. (2011). Ontological queries: Rewriting and optimization. In S. Abiteboul, K. Böhm, C. Koch, and K.-L. Tan, editors, ICDE, pages 2–13. IEEE Computer Society.

[28] Groot, P., Stuckenschmidt, H., & Wache, H. (2005). Approximating Description Logic Classification for Semantic Web Reasoning. In A. Gómez-Pérez and J. Euzenat, editors, Proc. 2nd European Semantic Web Conf. (ESWC’05), volume 3532 of LNCS, pages 318–332. Springer.

[29] Gupta, A. & Mumick, I. S., editors (1999). Materialized views: techniques, implementations, and applications. MIT Press, Cambridge, MA, USA.

[30] Hadoop (2012). Apache Hadoop framework - http://hadoop.apache.org/.

[31] Henzinger, M. R., Raghavan, P., & Rajagopalan, S. (1999). Computing on data streams. In External Memory Algorithms: Dimacs Workshop External Memory and Visualization, May 20-22, 1998, volume 50, page 107. Amer Mathematical Society.

[32] Hogan, A., Harth, A., & Polleres, A. (2009). Scalable authoritative OWL reasoning for the Web. Int. J. of Semantic Web Inf. Syst., 5(2), 49–90.

[33] Hogan, A., Pan, J. Z., Polleres, A., & Decker, S. (2010). SAOR: template rule optimisations for distributed reasoning over 1 billion linked data triples. In [49], pages 337–353.

[34] Ilyas, I. F., Aref, W. G., & Elmagarmid, A. K. (2003). Supporting top-k join queries in relational databases. In VLDB, pages 754–765.

[35] Ilyas, I. F., Beskales, G., & Soliman, M. A. (2008). A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 40(4), 1–58.

[36] Kazakov, Y. (2009). Consequence-driven reasoning for Horn SHIQ ontologies. In [12], pages 2040–2045.

[37] Kazakov, Y., Krötzsch, M., & Simancík, F. (2011). Concurrent classification of EL ontologies. In Proceedings of the 10th international conference on The semantic web - Volume Part I, ISWC’11, pages 305–320, Berlin, Heidelberg. Springer-Verlag.

[38] Kotoulas, S., Oren, E., & van Harmelen, F. (2010). Mind the data skew: distributed inferencing by speeddating in elastic regions. In Proc. 19th Int. Conf. on World Wide Web (WWW’10), pages 531–540. ACM.

[39] Lenzerini, M. (2002). Data integration: A theoretical perspective. In L. Popa, editor, PODS, pages 233–246. ACM.

[40] Liebig, T. & Müller, F. (2007). Parallelizing tableaux-based description logic reasoning. In R. Meersman, Z. Tari, and P. Herrero, editors, Proceedings of OTM Workshops 2007, Part II, volume 4806 of LNCS, pages 1135–1144. Springer.

[41] Lopes, N., Polleres, A., Straccia, U., & Zimmermann, A. (2010). AnQL: SPARQLing up annotated RDFS. In [49], pages 518–533.

[42] Lora Aroyo et al., editor (2010). The Semantic Web: Research and Applications, 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 - June 3, 2010, Proceedings, Part I, volume 6088 of LNCS. Springer.

[43] Luckham, D. (2008). The power of events: An introduction to complex event processing in distributed enterprise systems. In N. Bassiliades, G. Governatori, and A. Paschke, editors, Rule Representation, Interchange and Reasoning on the Web, volume 5321 of Lecture Notes in Computer Science, pages 3–3. Springer Berlin / Heidelberg.

[44] Lutz, C., Toman, D., & Wolter, F. (2009). Conjunctive query answering in the description logic EL using a relational database system. In [12], pages 2070–2075.

[45] Mouratidis, K., Bakiras, S., & Papadias, D. (2006). Continuous monitoring of top-k queries over sliding windows. In S. Chaudhuri, V. Hristidis, and N. Polyzotis, editors, SIGMOD Conference, pages 635–646. ACM.

[46] Oikarinen, E. & Janhunen, T. (2006). Modular equivalence for normal logic programs. In G. Brewka, S. Coradeschi, A. Perini, and P. Traverso, editors, ECAI, volume 141 of Frontiers in Artificial Intelligence and Applications, pages 412–416. IOS Press.

[47] Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008). Pig latin: a not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD ’08, pages 1099–1110, New York, NY, USA. ACM.

[48] Pan, J. Z. & Thomas, E. (2007). Approximating OWL-DL ontologies. In Proc. 22nd AAAI Conf. on Artificial Intelligence (AAAI’07), pages 1434–1439. AAAI Press.

[49] Patel-Schneider, P. F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J. Z., Horrocks, I., & Glimm, B., editors (2010). The Semantic Web - ISWC 2010 - 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I, volume 6496 of LNCS. Springer.

[50] Pérez-Urbina, H., Horrocks, I., & Motik, B. (2009). Efficient query answering for OWL 2. In A. Bernstein, D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, and K. Thirunarayan, editors, International Semantic Web Conference, volume 5823 of LNCS, pages 489–504. Springer.

[51] Rudolph, S., Tserendorj, T., & Hitzler, P. (2008). What is approximate reasoning? In [15], pages 150–164.

[52] Schlicht, A. & Stuckenschmidt, H. (2009). Distributed resolution for expressive ontology networks. In A. Polleres and T. Swift, editors, Proc. 3rd Int. Conf. on Web Reasoning and Rule Systems (RR 2009), volume 5837 of LNCS, pages 87–101. Springer.

[53] Schlobach, S. (2011). Top-k reasoning for the semantic web. Proceedings of the 11th International Semantic Web Conference ISWC2011, pages 55–59.

[54] Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). Access path selection in a relational database management system. In P. A. Bernstein, editor, SIGMOD Conference, pages 23–34. ACM.

[55] Selman, B. & Kautz, H. A. (1996). Knowledge compilation and theory approximation. J. ACM, 43(2), 193–224.

[56] Sheth, A. P., Henson, C. A., & Sahoo, S. S. (2008). Semantic Sensor Web. IEEE Internet Computing, 12(4), 78–83.

[57] Simancik, F., Kazakov, Y., & Horrocks, I. (2011). Consequence-based reasoning beyond Horn ontologies. In T. Walsh, editor, IJCAI, pages 1093–1098. IJCAI/AAAI.

[58] Skynet (2012). Skynet ruby project - http://skynet.rubyforge.org/.

[59] Soma, R. & Prasanna, V. K. (2008). Parallel inferencing for OWL knowledge bases. In Proc. Int. Conf. on Parallel Processing (ICPP’08), pages 75–82. IEEE Computer Society.

[60] Straccia, U. (2010). Softfacts: A top-k retrieval engine for ontology mediated access to relational databases. In SMC, pages 4115–4122. IEEE.

[61] Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., & Murthy, R. (2009). Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow., 2, 1626–1629.

[62] Tserendorj, T., Rudolph, S., Krötzsch, M., & Hitzler, P. (2008). Approximate OWL-reasoning with screech. In [15], pages 165–180.

[63] Tucker, P. A., Maier, D., Sheard, T., & Fegaras, L. (2003). Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng., 15(3), 555–568.

[64] Urbani, J., Kotoulas, S., Oren, E., & van Harmelen, F. (2009). Scalable distributed reasoning using MapReduce. In [11], pages 634–649.

[65] Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., & Bal, H. E. (2010). OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In [42], pages 213–227.

[66] Weaver, J. & Hendler, J. A. (2009). Parallel materialization of the finite RDFS closure for hundreds of millions of triples. In [11], pages 682–697.

