
Project No: FP7-318338

Project Acronym: Optique

Project Title: Scalable End-user Access to Big Data

Instrument: Integrated Project

Scheme: Information & Communication Technologies

Deliverable D2.1: Specification of the Architecture

Due date of deliverable: (April 30, 2013)

Actual submission date: April 29, 2013

Start date of the project: 1st November 2012 Duration: 48 months

Lead contractor for this deliverable: UOXF

Dissemination level: PU – Public

Final version

Executive Summary: Specification of the Architecture

This document summarises deliverable D2.1 of project FP7-318338 (Optique), an Integrated Project supported by the 7th Framework Programme of the EC. Full information on this project, including the contents of this deliverable, is available online at http://www.optique-project.eu/.

The deliverable presents an initial specification of the architecture of the Optique system, describing the individual system components, their interplay, and their interfaces, and establishing agreement on system-wide conventions and standards.

The deliverable is organized as follows. In Chapter 1 we start with the goal that an Optique system should fulfill and proceed with an overview of OBDA systems: we present a general architecture of such systems and explain why the OBDA approach to data management is a good way to approach our goal. Finally, we discuss existing OBDA solutions and exhibit the limitations that make them insufficient for Optique. In Chapter 2 we present the initial specification of the technical requirements and the set of standards and conventions that members of the Optique project will follow in developing their solutions. In Chapter 3 we present an overview of the Optique OBDA solution architecture and focus on each of its components, namely: query formulation, ontology and mapping management, query answering, and processing and analytics of streaming and temporal data. The conclusions and dissemination efforts are given in Chapter 4. Finally, Appendix A provides an initial specification of the shared interfaces and their importance in each work package.

List of Authors

Martin Giese (UIO)
Peter Haase (FOP)
Ian Horrocks (UOXF)
Ernesto Jiménez-Ruiz (UOXF)
Evgeny Kharlamov (UOXF)
Michael Schmidt (FOP)
Ahmet Soylu (UIO)
Dmitriy Zheleznyakov (UOXF)

Contributors

Dimitris Bilidas (UoA)
Diego Calvanese (FUB)
Manolis Koubarakis (UoA)
Michael Meier (FOP)
Özgür Özçep (TUHH)
Mariano Rodríguez-Muro (FUB)
Riccardo Rosati (UNIROMA1)
Domenico Fabio Savo (UNIROMA1)


Internal Reviewers

Herald Kllapi (UoA)
Rudolf Schlatte (UIO)


Contents

1 Introduction

2 Specific Requirements
  2.1 External Interface Requirements
  2.2 Functional Requirements
    2.2.1 General System Functionalities
    2.2.2 Query Formulation, Execution, and Processing
    2.2.3 Ontology and Mapping Management
    2.2.4 Catalog Management Functionalities
  2.3 Conventions and Standards
    2.3.1 Approaches for Streaming and Temporal Data
  2.4 Performance Requirements
  2.5 Logical Database Requirements
  2.6 Design Constraints
    2.6.1 Standard Compliance
    2.6.2 Hardware Constraints
    2.6.3 Software Constraints
  2.7 Software System Attributes
    2.7.1 Security
    2.7.2 Availability
    2.7.3 Maintainability
    2.7.4 Portability
    2.7.5 Usability

3 Optique Architecture
  3.1 Query Formulation Component
    3.1.1 Widget-based Solution
  3.2 Ontology and Mapping Management Component
  3.3 Query Answering Component
    3.3.1 Query Transformation Subcomponent
    3.3.2 Distributed Query Processing Subcomponent
  3.4 Streaming and Temporal Data in the Optique Architecture

4 Conclusions and Dissemination

Bibliography

Glossary

A Initial Specification of the Shared Interfaces
  A.1 Shared APIs for Ontology Management
  A.2 Shared APIs for Reasoning over Ontologies
  A.3 Shared APIs for Mapping Management
  A.4 Shared APIs for Relational Data and Metadata Management
  A.5 Shared APIs for RDF Data Management
  A.6 Shared APIs for Streaming Data Management
  A.7 Shared APIs for Cloud Automation


Chapter 1

Introduction

A typical problem that end-users face when dealing with Big Data is that of data access, which arises due to the three dimensions (the so-called "3V") of Big Data: volume, since massive amounts of data have been accumulated over the decades; velocity, since the amounts may be rapidly increasing; and variety, since the data are spread over different formats. In this context, accessing the relevant information is an increasingly difficult task, and the Optique project [18] aims at providing solutions for it.

The project is focused around two demanding use cases that provide it with motivation, guidance, and realistic evaluation settings. The first use case is provided by Siemens¹ and encompasses several terabytes of temporal data coming from sensors, with a growth rate of about 30 gigabytes per day. Users need to query this data in combination with many gigabytes of other relational data that describe events. The second use case is provided by Statoil² and concerns more than one petabyte of geological data. The data are stored in multiple databases which have different schemata, and, at present, the users have to manually combine information from many databases, including temporal ones, in order to get the results for a single query.

Figure 1.1: Left: existing approaches to data access; Right: basic OBDA approach

Accessing the relevant data in this context is becoming increasingly difficult for end-users. For example, in large enterprises, such as Statoil, end-users work with applications that allow accessing data through a limited set of predefined queries (cf. Figure 1.1, left, top). In situations where an end-user needs data that these predefined queries do not provide, the help of IT-experts (e.g., database managers) is required to translate the information need of the end-users into specialised queries and optimise them for efficient execution (cf. Figure 1.1, left, bottom). This process is the main bottleneck in the Optique use cases, since it may require several iterations and may take several days. In particular, in the oil and gas industry, IT-experts spend 30–70% of their time gathering and assessing the quality of data [14]. This is clearly very expensive in terms of both time and money.

¹ http://www.siemens.com
² http://www.statoil.com


Figure 1.2: Left: the classical OBDA approach; Right: the Optique OBDA system

The Optique project aims at solutions that reduce the cost of data access dramatically. More precisely, Optique aims at automating the process of going from an information requirement to the retrieval of the relevant data, and at reducing the time needed for this process from days to hours, or even to minutes. A bigger goal of the project is to provide a platform with a generic architecture that can be adapted to any domain that requires scalable data access and efficient query execution.

The semantic approach known as "Ontology-Based Data Access" (OBDA) [49, 10] has the potential to address the data access problem by automating the translation process from the information needs of users (cf. Figure 1.1, right) to data queries. The key idea is to use an ontology, which presents to users a semantically rich conceptual model of the problem domain. The users formulate their information requirements (that is, queries) in terms of the ontology, and then receive the answers in the same intelligible form. These requests should be executed over the data automatically, without an IT-expert's intervention. To this end, a set of mappings is maintained which describes the relationship between the terms in the ontology and the corresponding terminology in the data source specifications, e.g., table and column names in relational database schemas.
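The mapping step described above can be illustrated with a deliberately simplified sketch. The concept names, table names, and SQL fragments below are invented for illustration and are not part of any Optique component; real OBDA systems unfold full conjunctive queries via R2RML mappings, not single concepts via a lookup table.

```java
import java.util.Map;

// Minimal illustration of OBDA query unfolding: an ontology-level atom is
// replaced by the SQL fragment that its mapping associates with it.
public class ObdaUnfolding {

    // Hypothetical mappings: ontology concept -> SQL view over the source schema.
    static final Map<String, String> MAPPINGS = Map.of(
        "Wellbore",   "SELECT id FROM wellbore_table",
        "Completion", "SELECT wb_id AS id FROM completion_log"
    );

    // Unfold a single-concept ontology query into SQL over the data source.
    static String unfold(String concept) {
        String sql = MAPPINGS.get(concept);
        if (sql == null) {
            throw new IllegalArgumentException("No mapping for concept: " + concept);
        }
        return sql;
    }

    public static void main(String[] args) {
        // An end-user asks for all instances of the ontology concept "Wellbore";
        // the system answers it with SQL, without IT-expert intervention.
        System.out.println(unfold("Wellbore"));  // SELECT id FROM wellbore_table
    }
}
```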

State-of-the-art OBDA systems based on the classical OBDA architecture (cf. Figure 1.2, left), however, exhibit, among others, the following four limitations.

1. The usability of OBDA systems regarding the user interface is still an open issue. Even if the vocabulary provided by the ontology is familiar to end-users, they may find it difficult to formulate complex queries when several concepts and relationships are involved.

2. OBDA systems critically depend on a suitable ontology and the corresponding set of mappings, which are in practice expensive to obtain. Even if we assume that the ontology and the mappings are given, they are not static artefacts and should evolve according to new end-user information requirements.

3. Treatment of query answering is usually limited to query rewriting, and there is little support for distributed query optimisation and processing in OBDA systems. Moreover, even in scenarios without data distribution, current state-of-the-art implementations of rewriting-based query answering have shown serious limitations in scalability [49].

4. Temporal data, streaming data (e.g., from sensors), and the corresponding analytical tools are generally ignored by OBDA systems, which seriously limits their applicability in enterprises such as Siemens, where one has to deal with large amounts of streaming data from many turbines and diagnostic centres, in combination with historical, that is, temporal, relational data sources.

In the Optique project, we aim at developing a next-generation OBDA system (cf. Figure 1.2, right) that overcomes these limitations. More precisely, the project aims at a cost-effective approach that requires revising existing OBDA components and developing new ones, in particular, to support (i) novel ontology and mapping management components, (ii) user-friendly query formulation interface(s), (iii) automated query translation, (iv) distributed query optimisation and execution exploiting private and public cloud resources for scale-out, and (v) rigorous treatment of temporal and streaming data.

In this deliverable we present an initial specification of the architecture of the Optique system, describing the individual system components, their interplay, and their interfaces, and establishing agreement on system-wide conventions and standards. This architecture will serve as a guideline for the modules and components developed in the technical work packages, and will evolve according to new requirements. The final architecture will be provided in deliverable D2.2 (Month 30).


Chapter 2

Specific Requirements

This chapter provides an initial specification of the Optique platform requirements, based on the IEEE methodology for software requirements specification [31]. The requirements form the basis for the architecture specification in the subsequent chapter.

2.1 External Interface Requirements

The three main user interfaces of the Optique presentation layer (formulation of queries, visualisation of query answers, and the management of ontologies and mappings) will be accessed via a web browser, using the HTTP or HTTPS protocol. The system must support the latest versions of the most prominent browsers (in particular, the latest versions of Internet Explorer, Firefox, and Chrome). Additionally, external visualization tools (such as, for instance, the dedicated visualization tools used in the Statoil use case) or a fully-fledged ontology editor will also be integrated as external interfaces.

Finally, Optique administrators and external tools (e.g. for data visualization) must have access to the platform's interfaces via programming APIs. To this end, the platform shall offer both CLI and REST interfaces exposing the functionality of the API to authorized users and tools.
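As a rough illustration of such a REST interface, the following sketch exposes a stub SPARQL resource over HTTP using only the JDK's built-in server. The `/sparql` path, the empty result payload, and the class name are our own assumptions for this sketch, not the actual Optique API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Illustrative-only REST endpoint built on the JDK's embedded HTTP server.
public class RestSketch {

    // Start a server exposing a stub /sparql resource; port 0 picks a free port.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/sparql", exchange -> {
            // A real implementation would parse the query parameter and delegate
            // to the query-answering component; here we return an empty result set.
            byte[] body = "{\"results\":{\"bindings\":[]}}"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .add("Content-Type", "application/sparql-results+json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        System.out.println("Listening on port " + server.getAddress().getPort());
        server.stop(0);
    }
}
```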

2.2 Functional Requirements

As part of the requirements analysis, we identified the following functional requirements.

2.2.1 General System Functionalities

User and roles

• Users must be able to register to the system with basic privileges.

• At least three kinds of users will be differentiated: regular end-users, IT-experts, and administrators.

• Users must have personalized logins to the Optique system.

• The system must support LDAP authentication. In particular, there must be a mechanism for mapping arbitrary LDAP groups to the three roles mentioned above.

• Different privileges can be associated with users and user groups (e.g. using certain components, write vs. read access to the ontology, etc.). For example, Table 2.1 summarizes the different privileges associated with end-users and IT-experts.

• The specification of privileges shall be done via access control lists (so-called ACL files). There must be a default setting for the three user groups mentioned above.
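A default privilege assignment along these lines could be sketched as follows. The enum and method names are illustrative, chosen to mirror the privileges of Table 2.1, and do not reflect an actual ACL file format.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of a default privilege assignment for the Optique user roles.
public class DefaultAcl {

    enum Privilege {
        REGISTRATION, QUERY_FORMULATION, BASIC_ONTOLOGY_MANIPULATION,
        COMPLEX_ONTOLOGY_MANIPULATION, MAPPING_MANIPULATION, MAINTENANCE
    }

    enum Role { END_USER, IT_EXPERT, ADMINISTRATOR }

    static Set<Privilege> privilegesFor(Role role) {
        switch (role) {
            case END_USER:
                // End-users may register, formulate queries, and perform
                // basic ontology manipulation (cf. Table 2.1).
                return EnumSet.of(Privilege.REGISTRATION,
                        Privilege.QUERY_FORMULATION,
                        Privilege.BASIC_ONTOLOGY_MANIPULATION);
            case IT_EXPERT:
            case ADMINISTRATOR:
            default:
                // IT-experts (and administrators) hold all privileges.
                return EnumSet.allOf(Privilege.class);
        }
    }

    public static void main(String[] args) {
        System.out.println(privilegesFor(Role.END_USER));
    }
}
```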


Table 2.1: Privileges associated with end-users and IT-experts.

Type of user   Registration   Query         Basic ontology   Complex ontology   Mapping        Ontology and mapping
                              formulation   manipulation     manipulation       manipulation   maintenance
IT-expert      yes            yes           yes              yes                yes            yes
End-users      yes            yes           yes              no                 no             no

Configuration

• The Optique platform must provide means to configure those modules that require an initial set-up.

• The central configuration tasks will be made via the user interface using predefined configuration forms.

• Default configurations shall be provided.

2.2.2 Query Formulation, Execution, and Processing

Query formulation

• Both end-users and IT-experts must be able to formulate one-time queries, as well as streaming and temporal queries.

• The platform will provide both a Query-by-Navigation interface and a direct editing interface, optionally supported by auto-completion techniques (i.e. context-sensitive editing).

Query planning, optimization, and execution

• The system shall be able to execute queries following an OBDA approach, based on a global ontology and specified mappings over the original data sources.

• The system shall support distributed query processing in a transparent way over a set of relational databases.

• The system shall support querying of streaming data, also in combination with non-streaming data held in relational databases.

• The system shall generate optimized queries according to state-of-the-art OBDA optimization techniques.

Answer processing

• The platform must provide basic visualization of the query results.

• More sophisticated visualizations and interpretations will involve the use of external tools. To this end, the platform shall provide mechanisms to delegate answers to such external tools or, alternatively, offer APIs that external tools can use to extract answers for a given query.

Cloud automation

• The system shall have interfaces making it possible to connect private and public cloud resources for scale-out.

• For complex queries, the system shall be able to parallelise query execution using cloud infrastructure resources.

• Using these interfaces, it must be possible to deploy, start, and stop virtual machines and to trigger commands on them remotely in a secure way.


2.2.3 Ontology and Mapping Management

Ontology bootstrapping and extension

• The platform must provide techniques to bootstrap an initial ontology from pre-existing database schemas, and to reuse terms and axioms from state-of-the-art ontologies.

• The platform must provide mechanisms to maintain, modify, and extend ontologies.

– End-users must be able to perform basic extensions over the ontology.

– IT-experts must be able to perform any extension/modification over the ontology.

Mapping bootstrapping and extension

• The platform must provide support in bootstrapping an initial set of mappings.

• Only IT-experts are allowed to perform extensions/modifications over the mappings.

Ontology and mapping maintenance

• IT-experts must be able to analyse the consistency of the ontology and the mappings in order to detect defects.

• IT-experts must be able to fix defects in the ontology and the mappings.

• IT-experts must be able to compare different versions of the ontology and the mappings, as in a version control system.
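The version-comparison requirement can be sketched by treating each ontology version as a set of axioms and computing set differences. Plain strings stand in for OWL axioms here, and the class and axiom names are our own illustration, not an Optique API.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: comparing two ontology versions as sets of axioms, in the spirit
// of a version-control diff (strings stand in for OWL axioms).
public class OntologyDiff {

    // Axioms present only in the new version.
    static Set<String> added(Set<String> oldVersion, Set<String> newVersion) {
        Set<String> diff = new HashSet<>(newVersion);
        diff.removeAll(oldVersion);
        return diff;
    }

    // Axioms present only in the old version.
    static Set<String> removed(Set<String> oldVersion, Set<String> newVersion) {
        return added(newVersion, oldVersion);
    }

    public static void main(String[] args) {
        Set<String> v1 = Set.of("Wellbore subClassOf Asset");
        Set<String> v2 = Set.of("Wellbore subClassOf Asset",
                                "Completion subClassOf Wellbore");
        System.out.println("added: " + added(v1, v2));
        System.out.println("removed: " + removed(v1, v2));
    }
}
```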

2.2.4 Catalog Management Functionalities

Shared store for asset management

• The system shall have a central triple store for all assets required by the individual system components, in particular for storing ontologies, mappings, relational database metadata, etc.

• Transparent access to the ontology will be provided through an ontology API.

• RDF data will be directly accessed through a SPARQL-based RDF API.

• Other assets must be accessible and modifiable via a SPARQL API. On demand, domain-specific APIs may be implemented on top to ease access to the assets.

Data and query catalog

• It must be possible to register relational databases to the system via JDBC.

• Database meta information will be automatically extracted from registered databases.

• There shall be APIs to access, browse, and visualize metadata from these data catalogs (such as tables, attributes, and key and foreign-key relationships).

• There must be mechanisms for creating, modifying, and editing query catalogs containing prominent, use-case-specific queries.

• End-users must be able to browse and investigate both data and query catalogs.
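A query catalog of the kind required above can be sketched as a simple name-to-query map. The catalog API shown is hypothetical and much simpler than what the platform would actually provide; the query name and SPARQL text are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of a query catalog holding named, use-case-specific queries that
// end-users can browse and re-run.
public class QueryCatalog {

    // Preserves registration order, as a catalog listing would.
    private final Map<String, String> queries = new LinkedHashMap<>();

    void register(String name, String sparql) {
        queries.put(name, sparql);
    }

    String lookup(String name) {
        return queries.get(name);
    }

    Set<String> browse() {
        return queries.keySet();
    }

    public static void main(String[] args) {
        QueryCatalog catalog = new QueryCatalog();
        catalog.register("all-wellbores",
                "SELECT ?w WHERE { ?w a :Wellbore }");
        System.out.println(catalog.browse());
    }
}
```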


2.3 Conventions and Standards

Java will be the main programming language. SPARQL 1.0¹ (and its revision 1.1²) will be the query language produced as output by the query formulation component. OWL 2 [15] will be the standard language to represent ontologies; concretely, Optique will initially focus on the OWL 2 QL profile³, based on the DL-Lite family of description logics [2]. The R2RML⁴ and RDB direct mapping⁵ formats will be the standard representation languages for the mappings. The API to manage these mapping formats is under active development by Optique members.

The OWL API [28] and Sesame⁶ will provide generic interfaces for accessing semantic data. Rather than reimplementing ontology and mapping management capabilities from scratch, we plan to tightly integrate the well-known ontology editor Protégé⁷. Dedicated plugins, e.g. based on a RESTful API provided by the core platform, will make it possible to access Optique assets (ontologies, mappings, etc.) directly from within Protégé, thus allowing Optique users to benefit from the established ontology management facilities implemented in Protégé, as well as its extensions for mapping management, like the ontop Protégé plugin⁸.

The Quest reasoner [50], HermiT [45], RacerPro [22], ELK [37], and MORe [51] are examples of off-the-shelf OWL 2 reasoners which will be integrated within the Optique platform. The choice of reasoner will depend on its performance with the Optique ontologies. The ontology alignment system LogMap [32] will be used to link the Optique ontology with state-of-the-art domain ontologies.

The Quest [50, 48] and Pegasus [43] systems will form the core of the query transformation component (see Section 3.3 for details), while the ADP system [39, 56] will be the core of the Optique distributed query processing component. The JDBC API⁹ will be used to connect the Java programming language with the databases.

Finally, the entire Optique system will be integrated via the Information Workbench (IWB) platform [26, 24]. In addition, FOP's eCloudManager platform [25] will provide the basic interface to private and public clouds, allowing the Optique platform to consume virtualised compute and storage resources ad hoc and on demand, via a generic, vendor-independent cloud automation API.

Appendix A presents the work package requirements with respect to the Optique platform shared APIs. Existing state-of-the-art (SOTA) APIs will serve as the basis for these required interfaces. For example, the Optique ontology and reasoning API will reuse and extend the functionality provided by the OWL API and the OWL reasoners built on top of the OWL API. Other interfaces for which no SOTA API exists yet (e.g. the Mapping API) will require an implementation from scratch.

2.3.1 Approaches for Streaming and Temporal Data

The ontological modelling and the query language to represent and access streaming and temporal data in Optique are still at an early stage. We are currently analysing which of the state-of-the-art solutions is a better fit for Optique and what limitations Optique will have to address. The results will be reported in Deliverable D2.2.

Several approaches address the problem of representing and querying temporal data within the general context of ontologies. As the Optique project will follow a weak temporalization of the OBDA paradigm, which will guarantee the conservation of so-called FOL rewritability (which essentially means the possibility to translate ontological queries into SQL queries over the data sources), work on modal-style temporal ontology languages formalised via Description Logics [42] is of minor relevance; because of their bad complexity properties, this is even true for temporalized lightweight logics [3].

The approach in [21] introduces temporal RDF graphs, details a sound and complete inference system, and gives a sketchy introduction to a possible temporal query language. A similar representation of temporal RDF graphs is adopted within the spatio-temporal RDF engine STRABON [41, 6]¹⁰, which also defines the spatio-temporal extension stSPARQL of the W3C recommendation query language SPARQL 1.1. Strabon is currently the only fully implemented spatio-temporal RDF store with rich functionality and very good performance, as shown by the comparison in [41, 6]. The authors of [55] favor a more conservative strategy, modeling time directly with language constructs within RDF and SPARQL, the resulting extensions of RDF and SPARQL being mere syntactic sugar. The logical approach of [44] follows the ideas of [21] but shifts the discussion to the level of genuine ontology languages such as OWL; the semantics of the temporalized RDF and OWL languages are given by a translation to (a fragment of) first-order logic. The temporalized SPARQL query language uses a careful separation of the time component and the thematic component that guarantees feasibility of query answering.

¹ http://www.w3.org/TR/rdf-sparql-query/
² http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
³ http://www.w3.org/TR/owl2-profiles/
⁴ http://www.w3.org/TR/r2rml/
⁵ http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/
⁶ http://www.openrdf.org/
⁷ http://protege.stanford.edu/
⁸ http://ontop.inf.unibz.it/
⁹ http://www.oracle.com/technetwork/java/javase/jdbc/index.html

The concepts of streaming relational data, as well as the concepts underlying complex event processing, are well understood and form the theoretical underpinnings for the highly developed streaming engines used in industrial applications. The picture for stream processing within the OBDA paradigm is quite different: the few implemented streaming engines [7, 4, 46] are still under development and have been shown to lack one or other basic functionality [58]. Though all of these systems are intended to be used within the OBDA paradigm, only C-SPARQL [4] seems to have (minimal) capabilities for reasoning/query answering over ontologies. There is no agreement yet on how to extend SPARQL to work over streams, and so each of the mentioned systems has its own streamified version of SPARQL. However, the core of all extensions seems to be the addition of (sliding) window operators over streams, which are adapted from query languages over relational streams [1].
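The (sliding) window operators mentioned above can be illustrated with a minimal count-based window that maintains a running average over the last n readings of a sensor stream. The class is a sketch of the general idea, not code taken from any of the cited engines.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a count-based sliding window over a stream of sensor readings:
// each new reading evicts the oldest one once the window is full.
public class SlidingWindow {

    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindow(int size) {
        this.size = size;
    }

    // Push one reading and return the average over the current window.
    double push(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            sum -= window.removeFirst();   // evict the oldest reading
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindow w = new SlidingWindow(3);
        for (double v : new double[] {1, 2, 3, 4}) {
            System.out.println(w.push(v));   // 1.0, 1.5, 2.0, 3.0
        }
    }
}
```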

2.4 Performance Requirements

The Optique core platform is built on top of the Information Workbench, a Web-based platform, thus enabling user access via state-of-the-art Web browsers. We define the response time as the interval between a user command and the receipt of feedback from the system. In the context of interaction with a Web-based system, this means the interval between the user action (such as a click on a link or button) and the delivery of an initial page including feedback (though not necessarily the final result). The response time of the Optique HTML interface shall be less than 2 seconds in 90% of the cases, and less than 10 seconds in 99% of user requests. This requirement is relaxed to hold for the main navigation pages only: for complex tasks like query formulation (which may require cost-intensive calculations on top of complex ontologies, for instance), or for the time between query execution and the delivery of results, no guarantees can be made.

The Optique platform shall support up to 20 simultaneous users when set up in a single-server fashion (meaning that the response-time constraints defined above shall still be valid for this number of users). Scale-out to a higher number of users can be achieved by common high-availability setups, with load balancing and multiple Optique servers answering user requests in parallel.

In a production environment (when running on stable, high-availability hardware), the Optique platform shall have an availability of 99.9%. In the case of a critical failure, the platform shall have a fault recovery mechanism allowing it to recover a consistent state of the system within at most two hours.

2.5 Logical Database Requirements

The database behind the Optique core platform shall be able to persistently store all types of assets used in the Optique system, particularly including:

• Ontologies

¹⁰ www.strabon.di.uoa.gr


• Mappings

• Queries

• System configurations

• Database connection and meta information

• Historical information

• Query answers or samples thereof

The list of assets kept in the database may evolve over time; hence, there should be a flexible database that allows the storage of other asset types without a complete schema redesign.

2.6 Design Constraints

This section describes the technology and design constraints for the Optique platform.

2.6.1 Standard Compliance

• The HTML delivered by the Optique frontend shall conform to the HTML5 standard.

• Access to the platform's SPARQL endpoint shall conform to the W3C's latest SPARQL protocol specification.

• The platform must support the latest versions of the most common Web browsers (Firefox, Internet Explorer, Safari, and Google Chrome).

• Ontologies stored and maintained in the platform shall be specified in OWL.

• Mappings stored in the platform shall adhere to the R2RML W3C specification.

2.6.2 Hardware Constraints

• The Optique core platform shall run with 8 GB RAM and a quad-core 2.00 GHz CPU (or comparable hardware).

• For complex queries, additional private or public cloud resources must be available in order to deliver query results.

2.6.3 Software Constraints

• The system shall be implemented as a platform-independent Java application.

• The system requires a Java 7 Runtime Environment.

• The system is designed to run on 64-bit systems only.

The software constraints listed above shall be satisfied by the core system (i.e., the central, coordinating Optique instance), but are not mandatory for associated components. For instance, software for distributed query processing in the cloud that is triggered by the core system may have different software requirements.

2.7 Software System Attributes

This section describes specific requirements of the software system attributes for the Optique platform.


2.7.1 Security

• The system shall make use of common security standards (HTTPS, SSL, ...).

• The system must keep log and audit trails.

• There must be an LDAP-based user authentication mechanism (optionally RSA-certified).

2.7.2 Availability

• The system shall support an active-passive failover configuration.

• The system shall support load-balancing for scale-out.

2.7.3 Maintainability

• Standards shall be used to improve the maintainability of both the system itself and the assets maintained in the system.

• Programmer APIs, in particular a command-line interface and a REST endpoint, shall enable basic system configuration at runtime.

2.7.4 Portability

• Portability shall be guaranteed by using the platform-independent Java language.

• The use of native libraries must be avoided.

As with the software constraints, the portability constraints listed above shall be satisfied by the core system (i.e., the central, coordinating Optique instance), but are not mandatory for associated components such as the cloud-based distributed query processing module.

2.7.5 Usability

• The system shall also be usable by non-IT experts, which demands an intuitive, easy-to-use interface.


Chapter 3

Optique Architecture

This chapter presents the designed Optique OBDA solution, which aims at accomplishing the requirements presented in Chapter 2, especially the functional requirements introduced in Section 2.2. Figure 3.1 shows an overview of the architecture of the Optique OBDA solution and its components. The architecture follows the three-tier approach and has three layers:

• The presentation layer consists of four main user interfaces, which will mainly be Web based: (i) to configure and log into the system, (ii) to compose queries, (iii) to visualise answers to queries, and (iv) to maintain the system by managing ontologies and mappings. The first three interfaces are for both end-users and IT-experts, while the fourth one is meant for IT-experts only.

• The application layer consists of several main components of the Optique system, supports its machinery, and provides the following functionality:

– query formulation,
– ontology and mapping management,
– query answering, and
– processing and analytics of streaming and temporal data.

Additionally, the Optique system will include an LDAP authentication module, to assign different roles to Optique users, and a Configuration module, to provide a custom initial set-up to the Optique components.

• The data and resource layer consists of the data sources that the system provides access to, that is, relational databases, semistructured data, temporal databases, and data streams. It also includes capabilities to exploit virtual resources such as storage and compute infrastructure available through (private or public) cloud offerings.

The entire Optique system will be integrated via the Information Workbench (IWB) platform1 [26, 24]. The IWB is a generic platform for semantic data management which provides a shared triple store for managing the OBDA system assets (such as ontologies, mappings, components' configurations, query logs, (excerpts of) query answers, database metadata, lexical information related to the ontology, etc.), as well as generic interfaces and APIs for semantic data management (e.g., ontology processing APIs). As such, the shared triple store, combined with both generic SPARQL APIs and dedicated, domain-specific APIs, forms the basis for the implementation of the catalog requirements specified in Section 2.2.4. In addition, the IWB provides general functionality for user management, fine-grained access control, and configuration file management, as required by the general system functionality requirements listed in Section 2.2.1.
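To make the role of the shared triple store concrete, the toy class below mimics storing heterogeneous OBDA assets as triples and querying them by wildcard pattern matching. It is a didactic stand-in of our own design; the actual platform accesses the IWB store through the real Sesame and SPARQL APIs, not through a class like this.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a toy in-memory triple store with pattern matching,
// standing in for the Information Workbench's shared store.
class TripleStore {
    static final String ANY = null; // wildcard component in patterns

    static class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

    // Returns all triples matching the pattern; a null component matches anything.
    List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.s)) && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With such a store, listing all registered mappings, query logs, or ontology versions reduces to a single pattern query, which is essentially what the catalog requirements of Section 2.2.4 ask for.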

In addition to these backend data management capabilities, the Information Workbench comes with a flexible user interface that will be used for implementing the query formulation components. The user interface follows a semantic wiki approach, based on a rich, extensible pool of widgets for visualization, interaction, mashup, and collaboration, which can be flexibly integrated into semantic wiki pages, allowing developers to compose comprehensive, actionable user interfaces without any programming effort.

1www.fluidops.com/information-workbench/

[Figure omitted: diagram of the general architecture, showing the presentation layer (query formulation, answer visualisation, configuration, and ontology and mapping management interfaces), the application layer (query formulation, ontology and mapping management, query answering, and stream analytics components around the shared triple store), and the data and resource layer (RDBs, triple stores, temporal DBs, data streams, and the cloud), all integrated via the Information Workbench.]

Figure 3.1: The general architecture of the Optique OBDA system

In the following sections we describe in detail the four main application-layer components together with their respective sub-architectures.


[Figure omitted: diagram of the Query Formulation processing components (editing, query-driven ontology construction, answer manager, export and feedback functionality) and their connections to the shared triple store, the ontology processing components, and the query answering component.]

Figure 3.2: Query Formulation components of the Optique OBDA system

3.1 Query Formulation Component

Covering the user-facing requirements defined in Section 2.2.2, the query formulation component aims at providing a user-friendly interface for non-technical users, combining multiple representation paradigms. We intend to design and implement novel techniques that exploit the semantics of the underlying ontology in order to formulate queries that are both complex and valid. In particular, we intend to look at existing work where query formulation is driven from a Description Logic model of the domain, e.g., [5, 13]. Furthermore, this component will also integrate a query-driven ontology extension subcomponent to insert new end-users' information requirements into the ontology.

Figure 3.2 shows the main query formulation subcomponents of the Optique OBDA solution and their interaction with other components of the system. Next we give a brief overview of each of them. Note that many subcomponents deal with both one-time queries, e.g., SPARQL queries, and continuous queries, e.g., CSPARQL queries.

1. Editing subcomponents. Different users may cooperate on the same query or set of queries; thus, the Optique solution aims at providing (at least) three kinds of query formulation interfaces: direct editing, context-sensitive editing, and query by navigation exploiting faceted search and other navigation paradigms. Technically versed users may prefer to edit the query directly in a formal language (e.g., SPARQL or a stream query language), while other end-users should be provided with a less technical interface such as query by navigation. Additionally, direct editing should also allow the possibility of exploiting the ontology, and provide context-sensitive completion. All three interfaces should provide views on the partially constructed query, and users should be able to switch between views at will.
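The requirement that all three editing interfaces show views of the same partially constructed query, with switching at will, suggests a shared query model with change notification. The sketch below illustrates the idea with a plain observer pattern; the class names are ours, not Optique's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: a shared model of the partially constructed query, so that the
// direct-editing, context-sensitive, and query-by-navigation views stay in
// sync -- each view registers a listener and is notified of any edit.
class SharedQueryModel {
    private String queryText = "";
    private final List<Consumer<String>> views = new ArrayList<>();

    void register(Consumer<String> view) { views.add(view); }

    // Called by whichever view the user is currently editing in.
    void update(String newText) {
        queryText = newText;
        for (Consumer<String> v : views) v.accept(queryText); // push to all views
    }

    String current() { return queryText; }
}
```

Switching views then amounts to rendering the same model state in a different paradigm, rather than converting between per-view query representations.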

2. Query-driven ontology construction subcomponent. The ontology may not include all the vocabulary expected or needed by the end-user. Moreover, the vocabulary is to a certain extent specific to individuals, projects, departments, etc., and subject to change. In Optique we consider it crucial to keep the ontology up-to-date with respect to end-user needs. Thus, the query-driven ontology construction subcomponent will deal with four different change scenarios driven by end-user requirements:

(a) Adding new synonyms. Concept synonyms (e.g. annotation labels) do not represent a new logical extension of the ontology, and hence end-users will be able to add them to the ontology with no (logical) harm. For example, the concept WellBore can be extended with the labels “drill hole” or “borehole”. In order to avoid overloading the ontology with synonyms, we advocate a separation between the ontology (e.g. logical axioms) and the terminological information (e.g. synonyms, descriptions, related terms, etc.), as proposed in [35].

(b) Adding basic extensions. End-user queries may also require basic extensions of the ontology hierarchy, such as adding a new concept GeologicalWellBore under WellBore (i.e. GeologicalWellBore ⊑ WellBore). These types of additions can be considered safe [33] since they represent a conservative extension of the ontology. However, other additions to the ontology may require further analysis by the IT-expert if they are not conservative extensions (e.g. reclassifying the concept WellBore under the new concept PlannedSideTrack).

(c) “On the fly” extensions. This represents the most challenging scenario, where we intend to exploit ontology learning techniques in order to mine formulated queries and to identify relevant new concepts and relations (e.g., [57, 40]). Ontology alignment techniques (e.g. LogMap [32]) will also be required in order to relate the new vocabulary to the existing ontology concepts.

(d) IT-expert assistance. In cases where the manual or on-the-fly extensions are insufficient, the assistance of the IT-expert will be required to extend the ontology accordingly.
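Scenario (a) hinges on keeping terminological information apart from the logical axioms. The sketch below is our own simplification, with strings standing in for OWL axioms; it shows how adding the synonyms from the WellBore example leaves the logical part untouched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch: logical axioms and terminological information (labels, synonyms)
// are stored separately, so adding a synonym never changes the logical content.
class AnnotatedOntology {
    private final Set<String> axioms = new LinkedHashSet<>();            // logical part
    private final Map<String, Set<String>> synonyms = new HashMap<>();   // terminological part

    void addAxiom(String axiom) { axioms.add(axiom); }

    void addSynonym(String concept, String label) {
        synonyms.computeIfAbsent(concept, k -> new LinkedHashSet<>()).add(label);
    }

    Set<String> axioms() { return Collections.unmodifiableSet(axioms); }

    Set<String> labelsOf(String concept) {
        return synonyms.getOrDefault(concept, Collections.<String>emptySet());
    }
}
```

The same separation also keeps reasoning cheap: the reasoner only ever sees the axiom set, never the (potentially large) synonym dictionary.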

3. The Answer Manager subcomponent. This component should deal with the (basic) visualization of the query results and their transformation (i.e. export functionality) into the required output formats (e.g. input formats of external tools).
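As a minimal sketch of the export functionality, the helper below serialises answer rows into CSV, one plausible target format for external tools. The class name and the format choice are assumptions of ours, not a fixed part of the design, and quoting is deliberately minimal.

```java
import java.util.List;

// Sketch: export query answers (rows of bindings) as CSV.
class AnswerExporter {
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", header)).append('\n'); // header line first
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }
}
```

Further exporters (e.g. JSON or the input format of a specific visualisation engine) would follow the same pattern behind a common interface.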

4. The User Feedback subcomponent. This component is intended to allow the user to semi-automatically refine a query if the (partially) obtained results are not the expected ones. Furthermore, queries similar or related to the partially constructed query will also be suggested in order to help end-users in the refinement.

Interaction with other components of the Optique OBDA system:

1. The Ontology Revision Control system. Different versions of the ontology may exist concurrently (e.g. extensions driven by different formulated queries or query requirements). These versions will be managed by the IT-experts through a revision control system (from the Ontology and Mapping management system) in order to detect logical defects (e.g. unsatisfiabilities), logical conflicts among versions as in [34], and OWL 2 profile violations (e.g. a new version is outside the OWL 2 QL profile).

2. The Ontology Processing component. The ontology will be a key element for the query formulation component, and thus the ontology processing component (e.g. OWL API, OWL reasoners) will also play an important role. Furthermore, logic-based ontology modularization techniques [17] will also be exploited to achieve a good balance between overview and focus when dealing with large ontologies. The properties of such modules guarantee that the semantics of the concepts of interest are preserved while providing (in general) a much smaller fragment of the ontology.


3. Shared triple store. The Query formulation component accesses the Shared triple store where, among others, the ontology, query logs, (excerpts of) query answers, and the lexical information are physically stored.

4. The Query Answering component will transform the formulated queries into executable and optimized queries with respect to the data sources (e.g. streaming data, relational databases).

5. Configuration and LDAP authentication. Via these components one can configure the entire Query Formulation component and set access rights.

6. Stream Analytics. By exporting answers to this component one can, e.g., perform data mining on the answers to continuous queries.

7. The Visualisation Engines allow visualising both queries and query answers.

The following subsection presents the technical architecture for the query formulation interface based on widget mashups.

3.1.1 Widget-based Solution

A mashup-based approach (cf. [53]) is promising for the construction of an extensible and flexible query formulation interface. The mashup idea, in our context, is grounded on the possibility to combine the functionality and data of a set of individual applications in a common graphical space, for common tasks. Widgets are the building blocks of mashups, where each widget corresponds to a standalone application with less complex functionality and presentation compared to full-fledged applications. In a query formulation scenario, a set of widgets can be employed: for instance, one for query by navigation and one for faceted search, handling the construction of queries; and one for representing results in a table and one for visualizing results in a graph, handling the communication of results to the end-users.

Widgets are managed by a widget environment, which provides basic communication and persistence services to widgets. The orchestration of widgets relies on the requirement that each widget discloses its functionality to the environment through a client-side interface and notifies any other widget in the environment (e.g., via broadcast, subscription, etc.) and/or the widget environment upon each user action. Then, either each widget decides what action to execute in response, by considering the syntactic or semantic signature of the received event; or the environment decides which widgets to invoke with which functionality. The core benefits of such an approach are that:

1. it becomes easier to deal with the complexity, since the management of functionality and data can be delegated to different widgets;

2. each widget can employ a different visualization paradigm that best suits the functionality that it is expected to provide;

3. widgets can be used alone or together, in different combinations, for different contexts and experiences;

4. the functionality of the overall interface can be extended by introducing new widgets (e.g., for result visualization).

The proposed initial architecture for the query formulation interface based on widget-based mashups is depicted in Figure 3.3. The architecture assumes that each widget has client-side and server-side components (for complex processing), and that widgets can communicate with each other and with the environment through a communication channel. Communication usually happens through the client side, but a server-side communication mechanism can also be realized in order to support remote experiences. The architecture assumes that there exists an environment controller at the client side and a component control logic at the server side. The former is responsible for operational tasks such as collecting the event notifications coming from widgets and submitting control commands to them. The latter is responsible for the orchestration logic, that is, it decides how widgets should react to specific events.
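The notification mechanism just described can be sketched as a small publish/subscribe channel. The widget names, event types, and string payload below are illustrative assumptions; in the real interface the channel would be part of the widget environment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of the communication channel: widgets publish events for user actions,
// and the channel forwards each event to every widget subscribed to that event type.
class WidgetChannel {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A widget subscribes to a named event type, e.g. "concept-selected".
    void subscribe(String eventType, Consumer<String> widget) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(widget);
    }

    // A widget publishes an event; each subscribed widget decides how to react to the payload.
    void publish(String eventType, String payload) {
        for (Consumer<String> w : subscribers.getOrDefault(eventType, Collections.<Consumer<String>>emptyList())) {
            w.accept(payload);
        }
    }
}
```

For example, a query-by-navigation widget publishing a "concept-selected" event would update a subscribed faceted-search widget, while widgets not subscribed to that event type remain unaffected.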

[Figure omitted: widget-based mashup architecture, with the query formulation interfaces (query by navigation, faceted search, direct editing, context-sensitive editing, query-driven ontology construction) on the client side, their corresponding logic components and the component control logic on the server side, connected by a communication channel.]

Figure 3.3: Query Formulation interface based on widget-based mashups

3.2 Ontology and Mapping Management Component

The ontology and mapping management component targets the requirements specified in Section 2.2.3. As such, it will provide tools and methodologies to (i) semi-automatically bootstrap an initial ontology and mappings and (ii) maintain the consistency between the evolving mappings and the evolving ontology.

In Figure 3.4 we present the Ontology and Mapping Management component (the O&M manager) of the Optique OBDA system. The O&M manager has a Web interface at the presentation layer, which will also include the well-known ontology editor Protégé for sophisticated extensions over the ontology. The functionalities of the O&M manager are intended for IT-experts rather than end-users. The manager includes five subcomponents:

1. The O&M Bootstrapper. OBDA systems crucially depend on the existence of suitable ontologies and mappings. Developing them from scratch is likely to be expensive, and a practical OBDA system should support a (semi-)automatic bootstrapping of an initial ontology and set of mappings. Thus, the Optique OBDA system will be equipped with an OBDA bootstrapper: a routine that takes a set of database schemata, and possibly instances over these schemata, as input, and returns an ontology and a set of mappings connecting the ontology entities to the elements of the input schemata.
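A bootstrapping routine of this kind can be sketched in the spirit of a direct mapping, where each table yields a class and each column a data property, each with a mapping entry pointing back to the schema. The naming scheme and output representation below are our illustrative choices, not the actual Optique bootstrapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of direct-mapping-style bootstrapping: table -> class, column -> data property,
// plus one mapping entry per generated entity linking it to a source SQL query.
class Bootstrapper {
    static class Result {
        final List<String> ontologyAxioms = new ArrayList<>();
        final Map<String, String> mappings = new LinkedHashMap<>(); // entity -> source query
    }

    static Result bootstrap(Map<String, List<String>> schema) { // table -> column names
        Result r = new Result();
        for (Map.Entry<String, List<String>> table : schema.entrySet()) {
            String cls = ":" + table.getKey();
            r.ontologyAxioms.add("Declaration(Class(" + cls + "))");
            r.mappings.put(cls, "SELECT * FROM " + table.getKey());
            for (String col : table.getValue()) {
                String prop = ":" + table.getKey() + "#" + col;
                r.ontologyAxioms.add("Declaration(DataProperty(" + prop + "))");
                r.mappings.put(prop, "SELECT " + col + " FROM " + table.getKey());
            }
        }
        return r;
    }
}
```

A realistic bootstrapper would additionally derive class hierarchies and object properties from key and foreign-key constraints, which this sketch omits.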

2. The O&M Matching and alignment system. The bootstrapped ontologies will be aligned with domain ontologies using state-of-the-art ontology alignment techniques2 (e.g. LogMap [32]).

3. The O&M Analyser and reasoner. The ontologies and the mappings are not static objects and are subject to frequent changes. The O&M analyser will check ontologies and mappings for defects caused by these changes. We distinguish between logical and modelling defects. The three types of logical defects that are usually discussed in the literature (see, for example, [54, 36, 52]) are inconsistency, unsatisfiability, and incoherency. In OBDA scenarios, empty mappings will also be an important logical defect: a mapping of an OBDA setting is empty if it does not propagate any individuals (resp. pairs of individuals) into any concept (resp. property) of the ontology. Modelling defects are less intuitive and less formally defined than logical ones. Typical modelling defects are redundancy [20] (e.g. redundant axioms, concepts, and mappings) and unexpected entailments (e.g. entailments that do not correspond to the ones expected by domain experts [29]).

2http://www.ontologymatching.org/

[Figure omitted: the Ontology and Mapping Management interfaces (including Protégé) at the presentation layer, and the O&M processing components (bootstrapper, matching and alignment system, analyser and reasoner, evolution and transformation engine, revision control and editing) connected to the shared triple store and the ontology processing components.]

Figure 3.4: Ontology and Mapping Management component of the Optique OBDA system
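The empty-mapping defect lends itself to a simple operational check: execute the mapping's source query and see whether it returns any rows. In the sketch below, a deliberate simplification of ours, the data source is simulated by an in-memory map from query text to result rows.

```java
import java.util.List;
import java.util.Map;

// Sketch: a mapping is empty if its source query returns no rows over the data,
// so it can never populate its target concept or property.
class MappingAnalyser {
    static boolean isEmpty(String sourceQuery, Map<String, List<String>> dataSource) {
        List<String> rows = dataSource.get(sourceQuery); // simulated query execution
        return rows == null || rows.isEmpty();
    }
}
```

In the real system the check would run the source query (or a cheap EXISTS variant of it) against the actual database rather than a lookup table.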

4. The O&M Evolution and transformation engine. The functionality of this component is twofold. On the one hand, it performs debugging of the defects found by the analyser and returns a debugged version of the ontology and the mappings. In the development of the Optique OBDA system we plan to exploit existing techniques from ontology evolution, e.g. [12, 19]. On the other hand, it may perform several kinds of transformations of the ontology and the mappings: for instance, it should transform an input ontology that is in a language not supported by the OBDA system (e.g., OWL 2) into a version in the supported language (e.g., the OWL 2 QL profile). Also, it may perform transformations of the mappings related to formal properties of the mappings or to optimisation strategies (some initial examples may be found in [47]). This may involve changes in the syntax and/or semantics of the ontology.

5. The O&M Revision, control and editing system will support versioning and editorial processes for both ontologies and mappings. It will also act as a hub, coordinating the interoperation between the analyser and the evolution engine.


The O&M manager also interacts with other components of the Optique OBDA system. In particular,

1. It accesses the Shared triple store, where the ontology and mappings are physically stored. It can both read the ontology and mappings and update them when needed.

2. The analyser, alignment system, and evolution engine have access to reasoning capabilities, e.g., external ontology reasoners, ontology modularisation engines, etc.

3. The Query Formulation Component can call the O&M manager whenever a user decides to extend the ontology. We refer to this as query-driven ontology construction.

4. The O&M manager is connected to a Visualisation engine to visualise both ontology and mappings.

5. Finally, the O&M manager is configured via the Configuration component, and access rights can be set via LDAP authentication.

3.3 Query Answering Component

Covering the backend functionalities (query planning, optimization, execution, and use of cloud resources) defined in Section 2.2.2, the query answering component is composed of two large subcomponents:

• query transformation subcomponent, and

• distributed query processing subcomponent.

The query transformation subcomponent is responsible for translating (usually referred to as rewriting) queries received from the query formulation component, e.g., SPARQL queries, into optimised executable code that can be evaluated over the data sources and streams in the data layer, e.g., into a set of SQL or sliding-window queries. The Quest system [50, 48] is going to be the core part of Optique's query transformation, while we plan to develop novel rewriting and optimisation techniques to deal, e.g., with streaming data, and to employ other techniques, such as those implemented in Pegasus [43]. Both Quest and Pegasus can be used in several stages of the query transformation process. In particular, they implement sophisticated rewriting and optimization techniques.

The distributed query processing subcomponent provides query planning and execution. It distributes queries to individual servers and uses massively parallelised (cloud) computing. The ADP [39, 56] system for complex dataflow processing in the cloud is going to be the core part of Optique's distributed query processing.

3.3.1 Query Transformation Subcomponent

Figure 3.5 presents the architecture of the Query transformation (QT) component of the Optique OBDA system. The QT component interacts with other components of the Optique OBDA system. In particular:

1. The query formulation component provides a query interface for end-users. This component receives queries from an end-user and sends them to the QT component, e.g., via the Sesame API.

2. The configuration module provides the configuration that the QT module requires to perform query transformation.

3. The ontology processing module (a group of components such as ontology reasoners, modularisation systems, etc.) is called by the QT module to perform semantic optimisation.

4. The distributed query processing component receives rewritten queries from the QT module and performs their evaluation over the data sources.

[Figure omitted: the query transformation component, with the Query Transformation Manager orchestrating the query rewriting, semantic and syntactic optimisation, semantic indexing, and query execution modules, connected to the shared triple store, the shared database, the distributed query execution component, and the data sources.]

Figure 3.5: Query transformation component of the Optique OBDA system

5. The (internal) shared database contains the technical data required for the query answering process, such as semantic indices, answers to queries, etc. This database will be shared only by the QT component and the distributed query processing component.

6. The shared triple store contains the data that can be used by (most of) the components of the Optique OBDA system. E.g., it contains the ontology, the mappings, the query history, etc.

7. The stream analytics module provides analysis of the answers to stream queries.

8. Data sources (RDBs, triple stores, data streams) can also be accessed by the QT module during query execution.

The Query Transformation Manager (QTM) is the principal module of the QT component and will orchestrate the QT process. We assume that the QT process is triggered when a query is received from the query formulation component. Next we describe the role of each QT subcomponent and their interplay in the query transformation process. The process can be divided into several stages:

1. Initialisation. At this stage the Configuration module sends the configuration to the Set-up module, which in turn configures the other modules of the QT component. The initialisation includes several steps in which the input ontology and mappings are analysed and optimised so as to allow the rewriting and optimisation algorithms to be fast, and the query evaluation over the data sources to be more efficient. The semantic indexing and materialisation modules are in charge of creating and maintaining the so-called semantic index.
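The semantic index technique, as used in Quest-style OBDA systems, assigns numeric intervals to classes so that retrieving the instances of a class, including those of its subclasses, becomes a range condition rather than ontology reasoning at query time. The sketch below is our simplification of the idea and assumes a tree-shaped class hierarchy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a depth-first traversal gives each class an interval [low, high] covering
// its subtree, so "id is an instance of C (possibly via a subclass)" becomes a range check.
class SemanticIndex {
    final Map<String, int[]> ranges = new HashMap<>(); // class -> [low, high]
    private int counter = 0;

    // children: class -> its direct subclasses (tree-shaped hierarchy for simplicity)
    SemanticIndex(String root, Map<String, List<String>> children) {
        visit(root, children);
    }

    private void visit(String cls, Map<String, List<String>> children) {
        int low = counter++;
        for (String sub : children.getOrDefault(cls, Collections.<String>emptyList())) {
            visit(sub, children);
        }
        ranges.put(cls, new int[] { low, counter - 1 }); // interval spans the whole subtree
    }

    // True iff an individual indexed with the given id is an instance of cls.
    boolean inRange(String cls, int id) {
        int[] r = ranges.get(cls);
        return r != null && id >= r[0] && id <= r[1];
    }
}
```

In SQL terms, "all instances of WellBore" then rewrites to a single condition of the form `idx BETWEEN low AND high`, avoiding a union over all subclasses.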

2. Query rewriting. The QTM sends the (SPARQL) query Q received from the query formulation component to the query rewriting module, which is in charge of rewriting the query into the required format; for example, SQL if it is supposed to be evaluated over RDBs, or Streaming SPARQL for querying data streams. Further, for the sake of simplicity, we will assume that the target query format is SQL. Along with the rewriting, this module optimises the rewritten query both syntactically and semantically.

(a) Syntactic optimisation. During the transformation process, the initial (SPARQL) query might be turned into a number of SQL queries Q1, . . . , Qn such that their union is equivalent to Q. In the syntactic optimisation stage, these queries get optimised to improve performance, e.g., some joins, conditions, and filters within these SQL queries are removed. The methods used to detect which parts of the queries have to be optimised are syntactic, that is, they are based only on the shape of a query and do not involve any reasoning.
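Two purely shape-based simplifications of this kind, dropping duplicate union members and stripping a vacuous condition, can be sketched as follows. The concrete rewrite rules here are illustrative examples of ours, not the actual Quest optimisations.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch: syntactic optimisation of the union Q1,...,Qn, based only on the
// SQL strings themselves (no reasoning involved).
class SyntacticOptimiser {
    static List<String> optimise(List<String> sqlQueries) {
        LinkedHashSet<String> out = new LinkedHashSet<>();
        for (String q : sqlQueries) {
            out.add(q.replace(" WHERE 1=1", "")); // drop a trivially true condition
        }
        return new ArrayList<>(out); // set semantics removes duplicate union members
    }
}
```

Real syntactic optimisers work on a parsed query algebra rather than raw strings, but the principle is the same: rewrite by shape alone.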

(b) Semantic optimisation. The next step is to perform semantic optimisation of the queries. During this stage, the queries get optimised in a similar manner as in the case of syntactic optimisation. The difference is that the methods in this module take into account semantic information such as query containment, integrity constraints of the data sources, ontology reasoning, etc.
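For instance, if the ontology entails that one union member is contained in another (say, a query over GeologicalWellBore inside a query over WellBore), the contained member is redundant in the union and can be pruned. The sketch below takes the containment relation as given input, standing in for the reasoning step; the names and structure are our assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: prune union members that are contained in another member, given an
// explicit containment relation (in practice derived by ontology reasoning).
class SemanticOptimiser {
    // containedIn: query -> queries known to subsume it
    static List<String> prune(List<String> union, Map<String, Set<String>> containedIn) {
        List<String> out = new ArrayList<>();
        for (String q : union) {
            Set<String> supers = containedIn.getOrDefault(q, Collections.<String>emptySet());
            boolean redundant = false;
            for (String other : union) {
                if (!other.equals(q) && supers.contains(other)) { redundant = true; break; }
            }
            if (!redundant) out.add(q); // keep only maximal members of the union
        }
        return out;
    }
}
```

The pruning is sound because union(Qi, Qj) = Qj whenever Qi is contained in Qj, so dropping Qi cannot lose answers.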

3. Query execution. After rewriting and optimization, the queries Q′i1, . . . , Q′im are sent to the query execution module. This module decides which of the queries Q′i1, . . . , Q′im, if any, need to be evaluated using distributed query execution, and which can be evaluated directly by the standard query answering facilities. In the former case, the corresponding queries are sent to the distributed query processing component. In the latter case, the corresponding queries are evaluated directly over the data sources by standard means. If some of the queries have to be evaluated over a federated database system, the query execution module entrusts this task to the federation module.

4. Query answer management. After the query evaluation process, the answers to the queries that have been sent directly to the data sources are returned to the QTM module. The manager transforms them into the required format and sends them back to the query formulation component, which takes care of presenting the answers to end-users. The answers to queries that have been sent to the distributed query processing component do not necessarily go directly to the query answer manager, but rather to a shared database. The reason is that an answer can be very big (up to several GBs), so sending such answers directly to the QTM component would hang the system. Instead, the QTM receives a signal that the answers are in the shared database, together with some metadata about the answers. Then, together with the analytics module, it decides how to proceed further. The answers to one-time queries, e.g. SQL queries over RDBs, eventually go to the query formulation component, while the answers to stream queries go to the stream analytics module.

3.3.2 Distributed Query Processing Subcomponent

The Distributed query processing subcomponent aims at achieving efficiency in Big Data scenarios through both massive parallelism, i.e., running queries with the maximum amount of parallelism at each stage of execution, and elasticity, i.e., the flexibility to execute the same query using an amount of resources that depends on the resource availability for this particular query and on the execution time goals.

Distributed query execution is based on ADP (Athena Distributed Processing) [39, 56], a system for complex dataflow processing in the cloud. ADP has been developed and used successfully in several European projects, such as Health-e-Child [27].

The general architecture of the distributed query processing component within the Optique platform is shown in Figure 3.6. The query is received through the gateway using JDBC. This communication mainly involves interaction with the query transformation component. The Master node (see anatomy in Figure


Optique Deliverable D2.1 Specification of the Architecture


Figure 3.6: General architecture of the ADP component within the Optique System

3.7a) is responsible for initialization and coordination of the process. The optimization engine (see anatomy in Figure 3.7b) produces the execution plan for the query. Next, the execution plan is given to the execution engine (see anatomy in Figure 3.7c), which is responsible for reserving the necessary resources, sending the operators of the graph to the appropriate workers (see anatomy in Figure 3.7d), and monitoring the execution. Next we describe the distribution process in more detail:

• Language and Optimization: The queries are expressed in SQL. Queries are issued to the system through the gateway. The SQL query is transformed into a dataflow language allowing complex graphs, with operators as nodes and edges representing producer-consumer relationships. The first level of optimization is planning. The result of this phase is an SQL query script. We enhanced SQL by adding the table partition as a first-class citizen of the language. A table partition is defined as a set of tuples having a particular property (e.g., the value of a hash function applied on one column is the same for all the tuples in the same partition). A table is defined as a set of partitions. The optimizer produces an execution plan in the form of a directed acyclic graph (DAG), with all the information needed to execute the query.

• Execution Engine: ADP relies on an asynchronous execution engine. As soon as a worker node completes one job, it sends a corresponding signal to the execution engine. The execution engine uses an asynchronous, event-based execution manager, which records the jobs that have been executed and assigns new jobs when all the prerequisite jobs have finished.



Figure 3.7: Anatomy of ADP subcomponents. (a) Master node: Resource Manager, P2P Main Registry, Session Manager, Report Manager, DB Registry, Monitoring, Asynchronous Execution Manager, Asynchronous Streaming Manager. (b) Optimization Engine: Query Parser, Session Manager, Monitoring, P2P Registry, Report Manager, Query Optimizer (One-Time, Stream, Federated). (c) Execution Engine: Session Manager, Asynchronous Event-Based Execution Engine, Monitoring, P2P Registry, Report Manager. (d) Worker: Session Manager, Operator Manager, Buffer Manager, Connection Manager, Statistics Manager, Monitoring, Report Manager, P2P Registry, Table Partitions. (e) Data Connector: Table Transfer Scheduler, Session Manager, Monitoring, P2P Registry, Report Manager. (f) Stream Connector: Asynchronous Stream Event Listener, Session Manager, Monitoring, P2P Registry, Report Manager.



• Worker Pool: The resources needed to execute the queries (machines, network, etc.) are reserved or allocated automatically. Those resources are wrapped into containers. Containers are used to abstract from the details of a physical machine in a cluster or a virtual machine in a cloud. Workers run queries using a Python wrapper of SQLite (http://www.sqlite.org). This part of the system can also be used as a standalone single-node DB (madis, https://code.google.com/p/madis/). Queries are expressed in a declarative language which is an extension of SQL. This language considerably facilitates the use of user-defined functions (UDFs). UDFs are written in Python. The system supports row, aggregate, and virtual table functions.

• Data / Stream Connector: The Data Connector (see anatomy in Figure 3.7e) and the Stream Connector (see anatomy in Figure 3.7f) are responsible for handling and dispatching relational and stream data through the network, respectively. These modules are used when the system receives a request for collecting the results of executed queries. The Stream Connector uses an asynchronous stream event listener to be notified of incoming stream data, whereas the Data Connector utilizes a table transfer scheduler to receive partitions of relational tables from the worker nodes.
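As a flavour of the worker-level UDF support described above, the following sketch registers a Python row function and a Python aggregate function over SQLite. It uses the standard-library sqlite3 module standing in for the madis wrapper actually used by ADP, and the table and function names are made up for illustration:

```python
import sqlite3

# Illustrative sketch of Python UDFs over SQLite, in the spirit of the madis
# wrapper used by ADP workers (plain sqlite3 calls, not madis APIs; the
# table, data, and function names are invented for this example).

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
con.executemany("INSERT INTO measurements VALUES (?, ?)",
                [("t1", 10.0), ("t1", 14.0), ("t2", 7.0)])

# A row function: called once per row.
def fahrenheit(celsius):
    return celsius * 9.0 / 5.0 + 32.0

con.create_function("fahrenheit", 1, fahrenheit)

# An aggregate function: accumulates a value over a group of rows.
class Range:
    def __init__(self):
        self.lo, self.hi = None, None
    def step(self, v):
        self.lo = v if self.lo is None else min(self.lo, v)
        self.hi = v if self.hi is None else max(self.hi, v)
    def finalize(self):
        return self.hi - self.lo

con.create_aggregate("val_range", 1, Range)

# Both UDFs are now usable from declarative SQL, as in ADP's extended SQL.
rows = con.execute(
    "SELECT sensor, fahrenheit(MAX(value)), val_range(value) "
    "FROM measurements GROUP BY sensor ORDER BY sensor").fetchall()
```

Virtual table functions, the third kind mentioned above, go beyond what plain sqlite3 UDF registration shows here.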

3.4 Streaming and Temporal Data in the Optique Architecture

Processing and analytics of streaming and temporal data is primarily motivated by the needs of large industries. For example, Siemens manages several terabytes of temporal data coming from sensors, growing at a rate of about 30 gigabytes per day. Addressing this challenge requires a number of techniques and tools which should be integrated in several modules of the Optique OBDA solution. For example, the query formulation module should support window queries, and the query answering module should support rewriting and optimised execution of such queries. Additionally, the ontology and mapping management component is required to provide appropriate formalisms to support ontological modelling of streaming and temporal data.

The query language that the system provides to end-users should combine (i) temporal operators, which address the time dimension of data and allow users to retrieve data that was true “always” in the past or “sometimes” in the last X months, etc.; (ii) time series analysis operators, such as mean, variance, confidence intervals, standard deviation, as well as trends, regression, correlation, etc.; and (iii) stream-oriented operators, such as sliding windows.
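As an illustration of how a stream-oriented operator can be combined with time series analysis operators, the sketch below computes a mean and standard deviation over a sliding window; it is a generic example of the operator classes listed above, not the Optique query language itself:

```python
from collections import deque
from math import sqrt

# Generic sketch: a sliding-window operator (class iii above) feeding simple
# time-series statistics (class ii above). Not an Optique API.

def sliding_stats(stream, window_size):
    """Yield (mean, std) over the last `window_size` values of the stream."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    for value in stream:
        window.append(value)
        n = len(window)
        mean = sum(window) / n
        var = sum((v - mean) ** 2 for v in window) / n  # population variance
        yield mean, sqrt(var)
```

In a real deployment such operators would run continuously over sensor streams; here a finite list stands in for the stream.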

Given the query, the mapping language, and the ontology, the Optique system should be able to translate queries into highly optimised executable code over the underlying temporal and streaming data. This requires techniques for automated translation of one-time, continuous, and temporal queries, and their combinations. Existing translation techniques are limited, and they do not address query optimisation and distributed query processing. Thus, novel approaches should be developed.



Chapter 4

Conclusions and Dissemination

In this deliverable we have presented an initial, agreed-upon specification of the architecture, describing the individual system components and their interplay. We have also given an initial specification of the interfaces as well as the required standards. The process of designing the architecture and interfaces has involved several teleconferences with members of each work package. We have also gathered the required shared APIs for each work package by means of a questionnaire. Details about the outcome of this process, which fed into the architecture specification and will form the basis for a detailed implementation of the APIs, are summarized in Appendix A.

The architecture described in this document will serve as a guideline for the modules and components developed in the technical work packages, and will evolve according to new requirements. The final architecture will be provided in deliverable D2.2 (Month 30). We intend to follow a similar methodology to gather the new requirements for the final architecture.

As an output of the agreed-upon Optique architecture and subcomponents, we have published the following papers¹ in different venues co-located with the 10th Extended Semantic Web Conference (2013).

• OWL Experiences and Directions Workshop (OWLED)

– The Optique Project: Towards OBDA Systems for Industry (Short Paper) [8]

– Towards Query Formulation and Query-Driven Ontology Extensions in OBDA Systems [16]

– On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper) [11]

– Distributed Query Processing on the Cloud: the Optique Point of View (Short Paper) [38]

• Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM)

– Optique System: Towards Ontology and Mapping Management in OBDA Solutions [23]

• Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (Know@LOD)

– Addressing Streaming and Historical Data in OBDA Systems: Optique’s Approach (Statement of Interest) [30]

• Poster track of the Extended Semantic Web Conference

– Optique: OBDA Solution for Big Data [9]

¹ Papers are available from: http://www.optique-project.eu/results/publications/


Bibliography

[1] Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15:121–142, 2006. doi:10.1007/s00778-004-0147-z.

[2] Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Michael Zakharyaschev. The DL-Lite Family and Relations. J. Artif. Intell. Res. (JAIR), 36:1–69, 2009.

[3] Alessandro Artale, Roman Kontchakov, Vladislav Ryzhikov, and Michael Zakharyaschev. Past and Future of DL-Lite. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press, 2010.

[4] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing, 4(1):3–25, 2010.

[5] Sean Bechhofer and Ian Horrocks. Driving User Interfaces from FaCT. In Proceedings of the 2000 International Workshop on Description Logics, pages 45–54, 2000.

[6] Konstantina Bereta, Panayiotis Smeros, and Manolis Koubarakis. Representation and querying of valid time of triples in linked geospatial data. ESWC, 2013.

[7] Jean-Paul Calbimonte, Oscar Corcho, and Alasdair J. G. Gray. Enabling ontology-based access to streaming data sources. In Proceedings of the 9th International Semantic Web Conference (ISWC'10), Part I, pages 96–111, Berlin, Heidelberg, 2010. Springer-Verlag.

[8] D. Calvanese, M. Giese, P. Haase, I. Horrocks, T. Hubauer, Y. Ioannidis, E. Jiménez-Ruiz, E. Kharlamov, H. Kllapi, J. Klüwer, M. Koubarakis, S. Lamparter, R. Möller, C. Neuenstadt, T. Nordtveit, Ö. Özçep, M. Rodriguez-Muro, M. Roshchin, M. Ruzzi, F. Savo, M. Schmidt, A. Soylu, A. Waaler, and D. Zheleznyakov. The Optique Project: Towards OBDA Systems for Industry (Short Paper). In OWL Experiences and Directions Workshop (OWLED), 2013.

[9] D. Calvanese, M. Giese, P. Haase, I. Horrocks, T. Hubauer, Y. Ioannidis, E. Jiménez-Ruiz, E. Kharlamov, H. Kllapi, J. Klüwer, M. Koubarakis, S. Lamparter, R. Möller, C. Neuenstadt, T. Nordtveit, Ö. Özçep, M. Rodriguez-Muro, M. Roshchin, M. Ruzzi, F. Savo, M. Schmidt, A. Soylu, A. Waaler, and D. Zheleznyakov. Optique: OBDA Solution for Big Data. In Poster track of the Extended Semantic Web Conference, 2013.

[10] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. The MASTRO system for ontology-based data access. Semantic Web, 2(1):43–53, 2011.

[11] Diego Calvanese, Ian Horrocks, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Michael Meier, Mariano Rodriguez-Muro, and Dmitriy Zheleznyakov. On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper). In OWL Experiences and Directions Workshop (OWLED), 2013.


[12] Diego Calvanese, Evgeny Kharlamov, Werner Nutt, and Dmitriy Zheleznyakov. Evolution of DL-Lite Knowledge Bases. In International Semantic Web Conference (1), pages 112–128, 2010.

[13] Tiziana Catarci, Paolo Dongilli, Tania Di Mascio, Enrico Franconi, Giuseppe Santucci, and Sergio Tessaris. An ontology based visual tool for query formulation support. In ECAI, pages 308–312, 2004.

[14] Jim Crompton. Keynote talk at the W3C Workshop on Semantic Web in Oil & Gas Industry: Houston, TX, USA, 9–10 December, 2008. Available from http://www.w3.org/2008/12/ogws-slides/Crompton.pdf.

[15] B. Cuenca Grau, I. Horrocks, B. Motik, B. Parsia, P. F. Patel-Schneider, and U. Sattler. OWL 2: The next step for OWL. J. Web Sem., 6(4):309–322, 2008.

[16] Bernardo Cuenca Grau, Martin Giese, Ian Horrocks, Thomas Hubauer, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Michael Schmidt, Ahmet Soylu, and Dmitriy Zheleznyakov. Towards Query Formulation and Query-Driven Ontology Extensions in OBDA. In OWL Experiences and Directions Workshop (OWLED), 2013.

[17] Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Modular reuse of ontologies: Theory and practice. J. Artif. Intell. Res., 31:273–318, 2008.

[18] Martin Giese, Diego Calvanese, Peter Haase, Ian Horrocks, Yannis Ioannidis, Herald Kllapi, Manolis Koubarakis, Maurizio Lenzerini, Ralf Möller, Özgür Özçep, Mariano Rodriguez-Muro, Riccardo Rosati, Rudolf Schlatte, Michael Schmidt, Ahmet Soylu, and Arild Waaler. Scalable End-user Access to Big Data. In Rajendra Akerkar: Big Data Computing. Florida: Chapman and Hall/CRC. To appear, 2013.

[19] Bernardo Cuenca Grau, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, and Dmitriy Zheleznyakov. Ontology Evolution Under Semantic Constraints. In KR, 2012.

[20] Stephan Grimm and Jens Wissmann. Elimination of Redundancy in Ontologies. In ESWC (1), pages 260–274, 2011.

[21] Claudio Gutierrez, Carlos Hurtado, and Alejandro Vaisman. Temporal RDF. In European Semantic Web Conference (ESWC '05), pages 93–107, 2005.

[22] Volker Haarslev, Kay Hidde, Ralf Möller, and Michael Wessel. The RacerPro knowledge representation and reasoning system. Semantic Web, 3(3):267–277, 2012.

[23] Peter Haase, Ian Horrocks, Dag Hovland, Thomas Hubauer, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Johan Klüwer, Christoph Pinkel, Riccardo Rosati, Valerio Santarelli, Ahmet Soylu, and Dmitriy Zheleznyakov. Optique System: Towards Ontology and Mapping Management in OBDA Solutions. In Workshop on Debugging Ontologies and Ontology Mappings (WoDOOM), 2013.

[24] Peter Haase, Christian Hütter, Michael Schmidt, and Andreas Schwarte. The Information Workbench as a Self-Service Platform for Linked Data Applications. In Proceedings of the WWW 2012 Developer Track, 2012.

[25] Peter Haase, Tobias Mathäß, Michael Schmidt, Andreas Eberhart, and Ulrich Walther. Semantic technologies for enterprise cloud management. In International Semantic Web Conference (2), pages 98–113, 2010.

[26] Peter Haase, Michael Schmidt, and Andreas Schwarte. The Information Workbench as a Self-Service Platform for Linked Data Applications. In Proceedings of the Second International Workshop on Consuming Linked Data (COLD), 2011.

[27] Health-e-Child. Integrated healthcare platform for European paediatrics, 2006. http://www.health-e-child.org/.


[28] Matthew Horridge and Sean Bechhofer. The OWL API: A Java API for OWL ontologies. Semantic Web, 2(1):11–21, 2011.

[29] Ian Horrocks. Tool Support for Ontology Engineering. In Foundations for the Web of Information and Services, pages 103–112, 2011.

[30] Ian Horrocks, Thomas Hubauer, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Manolis Koubarakis, Ralf Möller, Konstantina Bereta, Christian Neuenstadt, Özgür Özçep, Mikhail Roshchin, Panayiotis Smeros, and Dmitriy Zheleznyakov. Addressing Streaming and Historical Data in OBDA Systems: Optique's Approach (Statement of Interest). In Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (Know@LOD), 2013.

[31] IEEE Computer Society. IEEE Recommended Practice for Software Requirements Specifications, 1998. http://dx.doi.org/10.1109%2FIEEESTD.1998.88286.

[32] Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau. LogMap: Logic-based and Scalable Ontology Matching. In Int'l Sem. Web Conf. (ISWC), pages 273–288, 2011.

[33] Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ulrike Sattler, Thomas Schneider, and Rafael Berlanga. Safe and economic re-use of ontologies: A logic-based methodology and tool support. In The 5th European Semantic Web Conference, ESWC, volume 5021, pages 185–199, 2008.

[34] Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks, and Rafael Berlanga Llavori. Supporting concurrent ontology development: Framework, algorithms and tool. Data Knowl. Eng., 70(1):146–164, 2011.

[35] Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga Llavori, and Dietrich Rebholz-Schuhmann. Reuse of terminological resources for efficient ontological engineering in life sciences. BMC Bioinformatics, 10(S-10):4, 2009.

[36] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, and James A. Hendler. Debugging unsatisfiable classes in OWL ontologies. J. Web Sem., 3(4):268–293, 2005.

[37] Yevgeny Kazakov, Markus Krötzsch, and František Simančík. Concurrent classification of EL ontologies. In Int'l Sem. Web Conf. (ISWC), pages 305–320, 2011.

[38] Herald Kllapi, Dimitris Bilidas, Ian Horrocks, Yannis Ioannidis, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Manolis Koubarakis, and Dmitriy Zheleznyakov. Distributed Query Processing on the Cloud: the Optique Point of View (Short Paper). In OWL Experiences and Directions Workshop (OWLED), 2013.

[39] Herald Kllapi, Eva Sitaridi, Manolis M. Tsangaris, and Yannis E. Ioannidis. Schedule optimization for data processing flows on the cloud. In Proc. of SIGMOD, pages 289–300, 2011.

[40] Konstantinos Kotis, Andreas Papasalouros, and Manolis Maragoudakis. Mining query-logs towards learning useful kick-off ontologies: an incentive to semantic web content creation. IJKEDM, 1(4), 2011.

[41] Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis. Strabon: A Semantic Geospatial DBMS. In International Semantic Web Conference, Boston, USA, November 2012.

[42] Carsten Lutz, Frank Wolter, and Michael Zakharyaschev. Temporal Description Logics: A Survey. In Stéphane Demri and Christian S. Jensen, editors, 15th International Symposium on Temporal Representation and Reasoning (TIME-08), pages 3–14, 2008.

[43] Michael Meier. The backchase revisited. Submitted for publication, 2013.


[44] Boris Motik. Representing and querying validity time in RDF and OWL: a logic-based approach. In Proceedings of the 9th International Semantic Web Conference (ISWC'10), Part I, pages 550–565, Berlin, Heidelberg, 2010. Springer-Verlag.

[45] Boris Motik, Rob Shearer, and Ian Horrocks. Hypertableau reasoning for description logics. J. Artif. Intell. Res., 36:165–228, 2009.

[46] Danh Le Phuoc, Minh Dao-Tran, Josiane Xavier Parreira, and Manfred Hauswirth. A native and adaptive approach for unified processing of linked streams and linked data. In Lora Aroyo, Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein, Lalana Kagal, Natasha Fridman Noy, and Eva Blomqvist, editors, 10th International Semantic Web Conference (ISWC 2011), pages 370–388, 2011.

[47] Floriana Di Pinto, Domenico Lembo, Maurizio Lenzerini, Riccardo Mancini, Antonella Poggi, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Optimizing query rewriting in ontology-based data access. In EDBT, pages 561–572, 2013.

[48] Mariano Rodriguez-Muro and Diego Calvanese. Dependencies: Making Ontology Based Data Access Work. In AMW, 2011.

[49] Mariano Rodriguez-Muro and Diego Calvanese. High Performance Query Answering over DL-Lite Ontologies. In KR, 2012.

[50] Mariano Rodriguez-Muro and Diego Calvanese. Quest, an OWL 2 QL Reasoner for Ontology-based Data Access. In OWLED, 2012.

[51] Ana Armas Romero, Bernardo Cuenca Grau, and Ian Horrocks. MORe: Modular Combination of OWL Reasoners for Ontology Classification. In Int'l Sem. Web Conf. (ISWC), pages 1–16, 2012.

[52] Kostyantyn Shchekotykhin, Gerhard Friedrich, Philipp Fleiss, and Patrick Rodler. Interactive Ontology Debugging: Two Query Strategies for Efficient Fault Localization. Web Semantics: Science, Services and Agents on the World Wide Web, 12(0), 2012.

[53] Ahmet Soylu, Felix Mödritscher, Fridolin Wild, Patrick De Causmaecker, and Piet Desmet. Mashups by orchestration and widget-based personal environments: Key challenges, solution strategies, and an application. Program: Electronic Library and Information Systems, 46(4):383–428, 2012.

[54] Heiner Stuckenschmidt. Debugging OWL Ontologies - A Reality Check. In Proceedings of the 6th International Workshop on Evaluation of Ontology-based Tools and the Semantic Web Service Challenge (EON), 2008.

[55] Jonas Tappolet and Abraham Bernstein. Applied Temporal RDF: Efficient Temporal Querying of RDF Data with SPARQL. In Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pages 308–322, Berlin, Heidelberg, 2009. Springer-Verlag.

[56] Manolis M. Tsangaris, George Kakaletris, Herald Kllapi, Giorgos Papanikos, Fragkiskos Pentaris, Paul Polydoras, Eva Sitaridi, Vassilis Stoumpos, and Yannis E. Ioannidis. Dataflow Processing and Optimization on Grid and Cloud Infrastructures. IEEE Data Eng. Bull., 32(1):67–74, 2009.

[57] Jie Zhang, Miao Xiong, and Yong Yu. Mining query log to assist ontology learning from relational database. In Frontiers of WWW Research and Development (APWeb), pages 437–448, 2006.

[58] Ying Zhang, P. Minh Duc, O. Corcho, and J. P. Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. In Proceedings of the International Semantic Web Conference 2012, November 2012.


Glossary

ACL Access Control List
ADP Athena Distributed Processing
API Application Programming Interface
CLI Command Line Interface
CQ Conjunctive Query
CQL Continuous Conjunctive Query
CSPARQL Continuous SPARQL
DAG Directed Acyclic Graph
DL Description Logic
FOL First Order Logic
FOP fluid Operations
HTTP Hypertext Transfer Protocol
HTTPS Hypertext Transfer Protocol Secure
IEEE Institute of Electrical and Electronics Engineers
IT Information Technologies
IWB Information Workbench
LDAP Lightweight Directory Access Protocol
JDBC Java Database Connectivity
OBDA Ontology-based Data Access
OS Operating System
OWL Web Ontology Language
O&M Ontology and Mapping
QA Query Answering
QbN Query by Navigation
QF Query Formulation
QT Query Transformation
QTM Query Transformation Manager
RDB Relational Data Base
RDBMS Relational Data Base Management System
RDF Resource Description Framework
REST Representational State Transfer
R2RML RDB to RDF Mapping Language
SOTA State of the Art
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
SVN Subversion (version control system)
SSH Secure Shell
UCQ Union of Conjunctive Queries
UDF User-defined Functions
URI Uniform Resource Identifier
VM Virtual Machine
WP Work Package
W3C World Wide Web Consortium


Appendix A

Initial Specification of the Shared Interfaces

We distinguish between (i) the shared interfaces among the Optique components and (ii) the interfaces provided by each component itself (e.g. APIs for query formulation). In this initial specification of the interfaces we focus on the former. The concrete interfaces for each component have been introduced in Chapter 3, together with their interplay, and they will be described in more detail in Deliverable D2.2.

We have gathered the requirements in terms of shared APIs for each technical WP, that is, WP3–WP7, by means of a questionnaire. The collection of requirements was split into the following phases:

1. February 1: the questionnaires were sent to the technical WPs.

2. February 8: a preliminary version was filled in by the leaders or delegates of the technical WPs.

3. February 11–15: individual sync-up between FOP-UOXF and the leaders or delegates of the technical WPs.

4. February 22: the questionnaires were finalised by the leaders or delegates of the technical WPs.

The questionnaire addresses the following seven categories of shared APIs, where each category is described in terms of several features.

(1) Shared APIs for Ontology Management with 10 features,

(2) Shared APIs for Reasoning over Ontologies with 9 features,

(3) Shared APIs for Mapping Management with 6 features,

(4) Shared APIs for Relational Data and Metadata Management with 3 features,

(5) Shared APIs for RDF Data Management with 3 features,

(6) Shared APIs for Streaming Data Management with 5 features,

(7) Shared APIs for Cloud Automation with 6 features.

The leaders of technical WPs were asked to specify for each feature its

• importance for the WP, on a scale from 0 to 5, where 0 = lowest and 5 = highest,

• required year to be implemented, that is, from Y1 to Y4.

The following sections present the aggregated content of the completed questionnaires. This information forms the basis for the upcoming API design.


A.1 Shared APIs for Ontology Management

This section covers functionality related to ontologies. Basic management facilities will be implemented using the OWL API, while more advanced features could be implemented by dedicated APIs on top. Protégé will also be integrated to support sophisticated ontology editing.

Each feature below is annotated with its importance/year per WP, in the form (WP3, WP4, WP5, WP6, WP7); "–" means the feature is not required by that WP.

Import external ontology (–, 5/Y1, 5/Y2, 5/Y1, –)
Submit a command to store an ontology from an external file (serialized version) in the central data store.
Additional WP requirements/information:
• File formats accepted by the OWL API.

Load ontology (5/Y1, 5/Y1, 5/Y2, 5/Y1, –)
Submit a command to retrieve an ontology from the central data store; the ontology is returned as an object model that can be processed in memory.
Additional WP requirements/information:
• Besides loading a specific (domain) ontology, there is a need to support ontology versioning. That is, given a logical URI + a version number, one should be able to access the corresponding version of the physical URI.

Ontology editing (5/Y2, 4/Y1, 3/Y2, –, –)
Change or rewrite an axiom in the ontology, e.g., using Protégé.
Additional WP requirements/information:
• Ontology editing will be required for the Query-driven knowledge extension component. In principle a more specialised environment will be required rather than a sophisticated editor like Protégé.
• For Y1 Protégé is a good solution. In the future we may use less sophisticated ontology editing tools focusing on concrete modifications/extensions of the bootstrapped ontology.

Basic ontology manipulation (5/Y2, 5/Y1, 3/Y2, –, –)
Basic ontology manipulation capabilities (add/remove concepts and axioms, etc.).
Additional WP requirements/information:
• Support provided by the OWL API.

Advanced ontology manipulation (4/Y2, 4/Y4, 3/Y2, –, –)
Advanced ontology manipulation capabilities: versioning, ontology SVN, etc.
Additional WP requirements/information:
• Two (or even several) versions of the ontology (e.g. different extensions derived from the queries) should be handled; one can use a versioning system to support conflict resolution between them, merging of them, etc.

Access/retrieve entities (basic querying) (5/Y1, 5/Y1, 3/Y2, –, –)
Obtain references to entities (classes, properties, individuals, etc.).
Additional WP requirements/information:
• Support provided by the OWL API.

Advanced ontology queries (5/Y1, 3/Y2, 3/Y2, –, –)
Obtain references to entities that satisfy complex conditions, e.g. SPARQL queries.
Additional WP requirements/information:
• This facility can help in the communication between the direct editing and the QbN module (e.g. attach a SPARQL query to a concept: SPARQL Inferencing Notation (SPIN)).

Bootstrap an ontology (3/Y1, 5/Y1, 3/Y2, –, –)
Create an ontology from a data source; align it with the existing ontology, etc.
Additional WP requirements/information:
• The ontology will be necessary for both the QbN and Direct query editing components.

Replace imported ontology (–, 4/Y1, –, –, –)
Replace an existing ontology in the repository.

Remove an ontology from the repository (–, 4/Y1, –, –, –)
Remove an ontology loaded in the repository.

A.2 Shared APIs for Reasoning over Ontologies

This section covers functionality related to reasoning over ontologies, which will be implemented using APIs to external reasoners. The OWL API will provide the basic functionality to connect to external reasoning techniques.

Feature Description WP3 WP4 WP5 WP6 WP7Standard rea-soning

Classification of classes and relations; satisfiability ofclasses, relations, and ontologies; entailments, etc.Additional WP requirements/information:• Perhaps we will use the external reasoner RacerPro.• OWL API should be enough to make use of any OWL

2 reasoner.• Required for both the QbN and Direct query editing

components.• Required for semantic validation of a bootstrapped on-

tology and the Ontology versioning module

5/Y2 5/Y1 5/Y2 4/Y2 -

Modularisation Split ontology into relevant parts.Additional WP requirements/information:• May be required for both the QbN and Direct query

editing components.• To be used for semantic validation of a bootstrapped

ontology. We need the support provided by OWL API.

3/Y2 4/Y2 3/Y2 - -

Ranking ofquery results

Rank derived answers, due to potentially high number ofthem, i.e., ranking over concepts, or individuals.Additional WP requirements/information:• May be required for the query formulation component.

3/Y2 - - - -

Justification Justify reasoning or query results; provenance, etc.Additional WP requirements/information:• Needed for abduction.• Might be needed for the query transformation compo-

nent.• To be used for semantic validation of a bootstrapped

ontology. Required for the Ontology versioning module.

3/Y2 5/Y1 5/Y2 3/Y2 -

37

Optique Deliverable D2.1 Specification of the Architecture

Feature Description WP3 WP4 WP5 WP6 WP7Pinpointing Find axioms from which a conclusion was derived (related

to justification)Additional WP requirements/information:• Needed for abduction.• To be used for semantic validation of a bootstrapped

ontology. Required for the Ontology versioning module.

3/Y2 5/Y1 5/Y2 - -

Abduction Find what could have been the reason of the entailment.Additional WP requirements/information:• In the Siemens use case, providing diagnoses (for

events such as start failure of a turbine etc.) is the corerequirement for the diagnostic engineer. This could berealized by some abduction methodology.

• May be necessary for the ontology versioning module.

3/Y2 2/Y3 5/Y2 - -

Ontology align-ment

Align ontologies; reason over the integrated ontology, etc.Additional WP requirements/information:• May be useful for the Siemens use case if different di-

agnostic engineers want to align their individual on-tologies containing different diagnoses (for the sameevents) etc.

• Lexical alignment techniques will be necessary to linknew entities, introduced by the Query-driven ontologyextensions component, to entities in the Ontology.

• We need it to align a SOTA ontology with a boot-strapped one.

4/Y2 5/Y1 3/Y2 - -

Semantic ontology approximation: Given an ontology in a language (e.g., OWL 2), approximate it in a “smaller” language (e.g., OWL 2 QL).
Additional WP requirements/information:
• Relevant, in particular, for the Siemens use case, where the diagnostic engineer will probably use a complex ontology language.
• It may be required in the query transformation component.
• We may need this if an extension of an ontology driven by the query is outside the OWL 2 QL profile.
• This may be integrated with the Ontology modularization module, i.e., extract an OWL 2 QL module for the given signature from an OWL 2 ontology.
• Semantic approximation requires the adoption of an existing OWL reasoner compliant with the OWL API.
WP3 WP4 WP5 WP6 WP7: 3/Y3 5/Y2 5/Y2 4/Y2 -

Syntactic ontology approximation: Given an ontology, approximate it syntactically (i.e., disregarding expressive axioms).
WP3 WP4 WP5 WP6 WP7: 3/Y3 5/Y1 5/Y2 4/Y2 -
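To make the difference between the two approximation features concrete, consider a small hand-worked illustration (the axioms are invented for this example, not taken from the use cases). Suppose the ontology contains, in description logic notation:

```latex
\mathit{Symptom} \sqsubseteq \mathit{MinorFault} \sqcup \mathit{MajorFault}, \qquad
\mathit{MinorFault} \sqsubseteq \mathit{Fault}, \qquad
\mathit{MajorFault} \sqsubseteq \mathit{Fault}.
```

The first axiom uses disjunction and therefore falls outside OWL 2 QL. A purely syntactic approximation simply discards it, whereas a semantic approximation can retain its QL-expressible consequence $\mathit{Symptom} \sqsubseteq \mathit{Fault}$, which the three axioms together entail.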


A.3 Shared APIs for Mapping Management

This section covers functionality related to management of mappings.

Import mappings: Submit a command to store a mapping definition from an external file in the central data store.
Additional WP requirements/information:
• R2RML, RDB-direct mapping, and Quest's own syntax.
• Text files in Turtle syntax.
• We need URIs to get the files.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 5/Y2 5/Y1 -

Load mappings: Submit a command to retrieve a mapping from the central data store; the mapping is returned as an object that can be processed in memory.
Additional WP requirements/information:
• R2RML, RDB-direct mapping, and Quest's own syntax.
• We need URIs to get the files.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 5/Y2 5/Y1 -

Mapping editing: Change or rewrite a mapping, e.g., the SQL query involved in it. Could be done, e.g., using Protégé.
Additional WP requirements/information:
• The first Optique prototype may be based on Protégé. Future prototypes may integrate their own interfaces.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 4/Y2 - -

Access/retrieve mapping: Access mapping rules through a basic query language, e.g., “give me all mappings involving the ontological concept C”, “give me all mappings involving relational tables from the database D”, etc.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 3/Y2 - -

Simple mapping management: Set a mapping (to join a new data source to the system); drop a mapping (e.g., when it is outdated or wrong).
Additional WP requirements/information:
• Required to fix bootstrapped mappings. E.g., given two sets of mappings (one created manually, one automatically bootstrapped), one may want to process them before storing.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 5/Y2 - -

Advanced mapping management: Merge mappings; optimize mappings; split mappings; update mappings.
Additional WP requirements/information:
• May be needed for the Siemens use case when different diagnostic engineers want to merge mappings concerning symptoms.
• Required to fix bootstrapped mappings.
WP3 WP4 WP5 WP6 WP7: - 5/Y3 3/Y4 - -
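As noted above, the supported mapping formats include R2RML, i.e., Turtle text files. Purely as an illustration (the table name `turbine`, the ontology class, and the IRI template are invented for this example, not part of the specification), a minimal R2RML mapping relating a relational table to an ontology class could look as follows:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix :   <http://example.org/ontology#> .

<#TurbineMap>
    rr:logicalTable [ rr:tableName "turbine" ] ;
    rr:subjectMap [
        rr:template "http://example.org/data/turbine/{id}" ;
        rr:class :Turbine
    ] ;
    rr:predicateObjectMap [
        rr:predicate :hasName ;
        rr:objectMap [ rr:column "name" ]
    ] .
```

Importing such a file would register the triples map in the central data store, from which the query transformation component can later load it.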


A.4 Shared APIs for Relational Data and Metadata Management

This section covers functionality required for data access as well as for metadata extraction from connected relational database systems, such as table catalogs, information about table attributes (like datatypes), keys, foreign keys, and other dependencies.

Querying: Send an SQL query to a relational database.
Additional WP requirements/information:
• We require JDBC access to the source.
WP3 WP4 WP5 WP6 WP7: - 5/Y1 - 5/Y1 -

Access to metadata: Access the metadata of a relational database.
Additional WP requirements/information:
• We require JDBC access to the source.
• We need as much (meta) information as possible from the data sources in order to bootstrap an initial ontology and the initial mappings from the database schema (e.g., table catalog, attributes, datatypes, keys, foreign keys, etc.).
WP3 WP4 WP5 WP6 WP7: - 5/Y1 - 5/Y1 -
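To illustrate the kind of schema information the bootstrapper needs, the following sketch collects tables, columns with datatypes, primary keys, and foreign keys. It uses SQLite's introspection pragmas purely for self-containedness; in Optique the sources would be reached via JDBC, and the toy `turbine`/`event` schema is invented for the example:

```python
import sqlite3

def extract_metadata(conn):
    """Collect the schema information needed to bootstrap an ontology
    and mappings: tables, columns with datatypes, and key constraints."""
    meta = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table,) in tables:
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        meta[table] = {
            # (column name, declared datatype, is primary key)
            "columns": [(c[1], c[2], bool(c[5])) for c in columns],
            # (referenced table, local column, referenced column)
            "foreign_keys": [(fk[2], fk[3], fk[4]) for fk in fks],
        }
    return meta

# Toy schema: events reference the turbine they were observed on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE turbine (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, "
             "turbine_id INTEGER REFERENCES turbine(id))")
meta = extract_metadata(conn)
print(meta["event"]["foreign_keys"])  # [('turbine', 'turbine_id', 'id')]
```

From such a dictionary, a bootstrapper could, e.g., turn each table into a class, each column into a data property, and each foreign key into an object property.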

Query containment: Checking that one conjunctive query (or UCQ) is contained in another one, e.g., to avoid redundancy in rewriting via ontologies.
WP3 WP4 WP5 WP6 WP7: - 4/Y1 - - -
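Containment of conjunctive queries is classically decided by the Chandra–Merlin homomorphism test: Q1 ⊆ Q2 iff Q2 maps homomorphically into the frozen (canonical) database of Q1 while preserving the head. A minimal sketch of that test, with an ad hoc query representation (no UCQs, set semantics, no comparisons):

```python
def contained_in(q1, q2):
    """Check Q1 subseteq Q2 via the homomorphism test. A query is
    (head_vars, atoms); an atom is (predicate, (term, ...)); terms
    starting with '?' are variables, everything else is a constant."""
    head1, atoms1 = q1          # atoms1 plays the role of the frozen database
    head2, atoms2 = q2
    if len(head1) != len(head2):
        return False

    def extend(mapping, term, frozen):
        """Extend a partial variable mapping, or return None on conflict."""
        if mapping is None:
            return None
        if term.startswith("?"):
            if mapping.get(term, frozen) != frozen:
                return None
            return {**mapping, term: frozen}
        return mapping if term == frozen else None  # constants must match

    def search(mapping, remaining):
        """Backtracking search for a homomorphism covering all Q2 atoms."""
        if not remaining:
            return True
        pred, args = remaining[0]
        for fpred, fargs in atoms1:
            if fpred != pred or len(fargs) != len(args):
                continue
            m = mapping
            for a, f in zip(args, fargs):
                m = extend(m, a, f)
            if m is not None and search(m, remaining[1:]):
                return True
        return False

    # Head variables of Q2 must map onto the head variables of Q1.
    init = {}
    for v2, v1 in zip(head2, head1):
        init = extend(init, v2, v1)
    return init is not None and search(init, list(atoms2))

# Q1(x) :- R(x,y), R(y,z)  is contained in  Q2(x) :- R(x,y), not vice versa.
q1 = (("?x",), [("R", ("?x", "?y")), ("R", ("?y", "?z"))])
q2 = (("?x",), [("R", ("?x", "?y"))])
print(contained_in(q1, q2), contained_in(q2, q1))  # True False
```

In a rewriting engine, such a test lets redundant disjuncts of a UCQ rewriting (those contained in another disjunct) be pruned before SQL generation.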

A.5 Shared APIs for RDF Data Management

All entities (the ontology, mappings, relational database metadata, ...) will be stored in the central store as RDF. Using the Sesame RDF API, it will be possible to access these entities directly. Currently, the Information Workbench already implements most of the functionality of this API.

Storage: Store custom RDF data in the central store.
Additional WP requirements/information:
• Storage of predefined queries or query templates.
• Storage of lexical and terminological data associated with the ontology.
• Probably storage of mappings, such as R2RML, and also RDB meta-information.
• We believe that other kinds of knowledge could be considered over the course of the project (e.g., data dependencies, epistemic constraints, identification assertions, denial assertions, ...).
WP3 WP4 WP5 WP6 WP7: 4/Y1 3/Y1 2/Y2 - -

Querying: Retrieve RDF data from the central store via SPARQL 1.1 queries.
Additional WP requirements/information:
• Retrieval of predefined queries or query templates.
• Queries over lexical and terminological data.
WP3 WP4 WP5 WP6 WP7: 4/Y1 3/Y1 2/Y2 - -

Update: Execute SPARQL 1.1 UPDATE/DELETE queries on the central store to manipulate RDF data in a schematic way.
Additional WP requirements/information:
• Updates over predefined queries or query templates.
• Updates over lexical and terminological data.
WP3 WP4 WP5 WP6 WP7: 4/Y2 3/Y2 2/Y2 - -
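As an illustration of the intended kind of interaction (the graph vocabulary `opt:` and the template resource are invented for this example; the deliverable does not fix them), a SPARQL 1.1 update that replaces the stored text of a predefined query template could look like this:

```sparql
PREFIX opt: <http://example.org/optique#>

# Replace the query text stored for one predefined query template.
DELETE { opt:template42 opt:queryText ?old }
INSERT { opt:template42 opt:queryText
         "SELECT ?t WHERE { ?t a opt:Turbine }" }
WHERE  { opt:template42 opt:queryText ?old }
```

Executed through the Sesame API against the central store, the same mechanism covers updates over lexical and terminological data.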


A.6 Shared APIs for Streaming Data Management

This section covers functionality related to storing, querying, and reasoning over data streams.

Storage: Store (a fragment of) a stream; store answers over streams.
WP3 WP4 WP5 WP6 WP7: 3/Y3 - 5/Y3 - -

Querying: Sequential queries, continuous queries, e.g., CQL.
WP3 WP4 WP5 WP6 WP7: 3/Y3 - 5/Y3 - -

Ranking of query results: Ranking for continuous queries.
WP3 WP4 WP5 WP6 WP7: 3/Y3 - 3/Y3 - -

Reasoning over streams: Reasoning over streams and ontologies.
WP3 WP4 WP5 WP6 WP7: 3/Y3 - 4/Y3 - -

Subquery detection: Subquery detection for a streamed query sublanguage (streamed UCQs) to handle many different streams.
WP3 WP4 WP5 WP6 WP7: 3/Y3 - 4/Y4 - -
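To give a flavor of the continuous-query style mentioned above, CQL (the Continuous Query Language of the STREAM project) combines a stream-to-relation window operator with ordinary SQL. A hypothetical continuous query over a sensor stream (the stream name, attributes, and window size are illustrative only):

```sql
-- CQL sketch: average temperature per turbine over a sliding window
SELECT turbine_id, AVG(temperature)
FROM   SensorStream [Range 10 Minutes]
GROUP  BY turbine_id;
```

Such a query runs continuously: at each instant the window operator turns the last ten minutes of the stream into a relation, over which the aggregation is evaluated.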

A.7 Shared APIs for Cloud Automation

This section covers functionality for cloud automation. This API will mostly be integrated within the ADP solution.

Provision a VM: API to provision a virtual machine in the cloud from a VM template.
Additional WP requirements/information:
• Linux-based OS (Debian, CentOS).
• Define an instance with specific requirements (CPU, memory, disk).
• Create a specific user (e.g., ADP).
• Add a specific RSA key to the user (for ssh).
WP3 WP4 WP5 WP6 WP7: - - - - 5/Y1
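The requirements above (template, instance sizing, a dedicated user, an RSA key for ssh) can be captured in a single provisioning request. The following sketch only builds such a request payload; every field name is illustrative, since the concrete cloud protocol (e.g., an OCCI-style interface) is not fixed by this deliverable:

```python
import json

def make_provision_request(template, cpu, memory_mb, disk_gb,
                           user="adp", ssh_key=None):
    """Build a JSON payload asking the cloud layer to provision a VM.
    All field names are illustrative, not part of any real cloud API."""
    request = {
        "action": "provision",
        "template": template,  # e.g., a Debian or CentOS image
        "resources": {"cpu": cpu, "memory_mb": memory_mb, "disk_gb": disk_gb},
        "user": user,          # dedicated account, e.g., ADP
    }
    if ssh_key is not None:
        request["authorized_keys"] = [ssh_key]  # RSA key for ssh access
    return json.dumps(request)

payload = make_provision_request("debian-7-template", cpu=2, memory_mb=4096,
                                 disk_gb=50, ssh_key="ssh-rsa AAAA...example")
print(json.loads(payload)["resources"]["cpu"])  # 2
```

Keeping the request declarative like this makes it easy to translate into whatever protocol the chosen cloud provider exposes.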

Delete a VM: API to delete a virtual machine in the cloud.
WP3 WP4 WP5 WP6 WP7: - - - - 3/Y1

Start/stop a VM: API to start or stop a VM.
WP3 WP4 WP5 WP6 WP7: - - - - 4/Y1

Send command to a VM: Communicate with a VM via some protocol/agent.
Additional WP requirements/information:
• Any protocol that can execute a remote command (preferably ssh).
WP3 WP4 WP5 WP6 WP7: - - - - 2/Y1

Create a network: API to create a virtual network.
Additional WP requirements/information:
• This has low priority. We can assume that all VMs use a pre-defined network.
WP3 WP4 WP5 WP6 WP7: - - - - 1/Y1

Cloud protocol: The protocol used to communicate with the cloud provider (e.g., eCloudManager).
Additional WP requirements/information:
• An interface like the OCCI API will be required.
WP3 WP4 WP5 WP6 WP7: - - - - 4/Y1
