
DELIVERABLE

This project has received financial support from the European Union Horizon 2020 Programme under grant agreement no. 688203.

D4.4 Framework for Knowledge Extraction from IoT Data Sources

Project Acronym: bIoTope

Project title: Building an IoT Open Innovation Ecosystem for Connected Smart Objects

Grant Agreement No. 688203

Website: www.bIoTope-project.org

Version: 1.0

Date: 2017-06-30

Responsible Partner: EPFL

Editor: Prodromos Kolyvakis

Contributing Partners: Fraunhofer, BIBA, CSIRO, AALTO:CS, eccenca and Enervent

Dissemination Level: Public X

Confidential – only consortium members and European Commission Services

Ref. Ares(2017)3188041 - 26/06/2017

© 688203 bIoTope Project Partners, 30 June 2017

Revision History

Revision | Date | Author | Organization | Description
0.1 | 15/11/2016 | Prodromos Kolyvakis | EPFL | Initial ToC
 |  |  | EPFL | Sections 2 – 2.4 edited
 |  |  | EPFL | Knowledge Extraction & Fusion added
 |  |  | EPFL | Knowledge Definition added & 3rd section headers revised
 |  |  | EPFL | Section 2.4 extended, 3rd section’s content added
0.6a | 12/05/2016 | Prodromos Kolyvakis | EPFL | 4th section content added
0.6b | 16/05/2016 | Prodromos Kolyvakis | EPFL | 4th section’s content extended
 |  |  | EPFL | Section 1 and 5 revised
0.7a | 31/05/2016 | Prodromos Kolyvakis | EPFL | ToC revised, some cross-references fixed
0.8a | 09/06/2016 | Jérémy ROBERT | Uni.LU | Section 3.2 revised
0.8b | 15/06/2016 | Jérémy Morel | OpenDataSoft | Section 2,3,4 revised
0.9a | 16/06/2016 | Robert Hellbach | BIBA | Whole manuscript’s review & revision
0.9b | 20/06/2016 | Prodromos Kolyvakis | EPFL | Changes based on the received feedback
 |  |  | EPFL | Final Version

Every effort has been made to ensure that all statements and information contained herein are accurate, however the bIoTope Project Partners accept no liability for any error or omission in the same.



Table of Contents

1. Introduction
1.1. Scope
1.2. Audience
1.3. Content of this Document
2. State-of-the-Art in Knowledge Extraction and Fusion
2.1. Sensor Networks
2.2. Knowledge Definition
2.3. Multi-sensor Data fusion and the JDL/DFIG model
2.4. Knowledge Extraction
2.5. Conclusion
3. Knowledge Extraction Framework
3.1. Knowledge Extraction Architecture in bIoTope KnaaS framework
3.2. Integration Workflow with Existing bIoTope Components
3.3. Knowledge Extraction Implementation
4. Case Studies
4.1. Knowledge Extraction and Fusion in bIoTope use cases and pilots
4.1.1. Heat Wave Mitigation: Inform Citizens of Local Heat Conditions
4.1.2. Heat Wave Mitigation: Help citizens to Ventilate their Home
4.1.3. Heat Wave Mitigation: Establish a correlation between traffic and temperature
5. Conclusion and Future Work
6. References
7. Online References
8. Annex 1. bIoTope Big Picture

List of Tables

Table 1: Six Levels of the JDL/DFIG model.
Table 2: The main problems of the Data Fusion process.

List of Figures

Figure 1: Internet of Things Monitoring Cycle [6].
Figure 2: A taxonomy of data fusion methodologies: Different data fusion algorithms can be roughly categorized based on one of the four challenging problems of input data that are mainly tackled: data imperfection, data correlation, data inconsistency, and disparateness of data form [21].
Figure 3: Categorization of Reasoning Systems [40].
Figure 4: Conceptual Architecture of the bIoTope Knowledge Framework
Figure 5: Key processing blocks required in knowledge framework and knowledge processing
Figure 6: KnaaS Architecture highlighting interactions between other bIoTope components and Linked Open Data.
Figure 7: Example workflow for KnaaS integration with bIoTope.
Figure 8: bIoTope KnaaS example
Figure 9: Goal Realization Viewpoint of Lyon's Pilot Case #2 in ArchiMate.
Figure 10: Access Local Heat Conditions & Traffic Through KnaaS
Figure 11: Visualization of local Heat Conditions through KnaaS
Figure 12: Data Exploration of the local Heat Conditions through KnaaS
Figure 13: Gauge Charts and Graphs produced via KnaaS to visually explore the sensor streams.
Figure 14: Node-RED flow that displays the OpenWeatherMap information through dashboard notifications and tweets.
Figure 15: Message notification with the available information in the OpenWeatherMap.
Figure 16: Tweet message with the parsed information from OpenWeatherMap.
Figure 17: Node-RED flow, which enables the Users to ventilate their homes accordingly.
Figure 18: Node for writing Python code in KnaaS
Figure 19: Establish a correlation between traffic and temperature through KnaaS.
Figure 20: MongoDB query that retrieves the geocoordinates of a specific street_name given a specific timestamp.
Figure 21: MongoDB query that finds the closest Netatmo local weather stations given specific geographic coordinates.
Figure 22: Dashboard with the computed Pearson correlation coefficients.



Terms and Acronyms

API Application Programming Interface

AUCS Artificial Use Case Scenario

GSN Global Sensor Networks

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

ICO Internet-Connected Objects

ICT Information and Communication Technologies

IoT Internet of Things

KnaaS Knowledge as a Service

O-MI Open Messaging Interface

O-DF Open Data Format

OWL Web Ontology Language

RDF Resource Description Framework

REST Representational State Transfer

RFID Radio Frequency Identification

SOA Service Oriented Architecture

SOAP Simple Object Access Protocol

SPARQL SPARQL Protocol and RDF Query Language

SQL Structured Query Language

URI Uniform Resource Identifier

XML eXtensible Markup Language



Executive Summary

The bIoTope project lays the foundation for open innovation ecosystems supporting the Internet of Things (IoT) by providing a platform that enables companies to easily create new IoT systems, to rapidly harness available information using advanced Systems-of-Systems (SoS) capabilities for connected smart objects, and to easily create innovative business processes. The main purpose of this deliverable is to propose an architecture and implement a framework that lets users extract valuable knowledge from IoT data sources and thus gain insight into the interactions between various phenomena of everyday life. To this end, a range of knowledge extraction and data fusion mechanisms are presented and implemented inside the Knowledge as a Service (KnaaS) framework. To assess the effectiveness of the proposed conceptual framework and of its prototype implementation, various scenarios of Lyon’s Heat Wave Mitigation use case are examined and concrete implemented solutions are provided. More specifically, the studied scenarios show how the data fusion and knowledge discovery capabilities can be used to inform the citizens of the Urban Community of Lyon about the local heat conditions in their area and to help them ventilate their homes accordingly. In addition, it is demonstrated how a service that establishes the correlation between traffic and temperature data can be set up through the KnaaS framework. Lyon’s Heat Wave Mitigation pilot use case was chosen because it was considered the most representative as far as the knowledge extraction process is concerned. However, the work presented in this deliverable is by no means specific to this use case.
The deliverable takes into account the use cases and interaction scenarios defined in Deliverable D2.1 as well as the Conceptual Architecture of the bIoTope Knowledge Framework introduced in Deliverable D4.2, and shows how the proposed approach supports them. From a technology perspective, the contributions of this deliverable are in accordance with the activities of WP5, and the work reported here has been closely coordinated with D5.4 (led by Fraunhofer). Last but not least, special care was taken to ensure that the Knowledge as a Service framework presented in this deliverable provides a concrete integration of the O-MI/O-DF RDF Integration (OORI) Server introduced in D5.2, with the aim of (a) exploiting its advanced querying capabilities inside the KnaaS framework and (b) facilitating the interaction between User Interaction (UI) as a Service and Knowledge as a Service towards the creation of a sustainable bIoTope ecosystem.



1. Introduction

1.1. Scope

This deliverable illustrates some of the core features of the prototype reference implementation of the Knowledge Extraction Framework for IoT data sources, which enables efficient knowledge discovery in the bIoTope ecosystem through a range of different extraction and fusion algorithms. The Knowledge Extraction Framework is the key technology enabler of KnaaS, and from a practical perspective it can be seen as the key realization component of KnaaS. From a theoretical perspective, however, the KnaaS framework is more general, as it also defines the key requirements that any third-party service providers who are willing to provide domain-specific KnaaS services and join bIoTope ecosystems should conform to.¹

In order to understand the key challenges of bIoTope’s Knowledge Extraction Framework as well as its design choices, this document deals with the following questions:

How can knowledge extraction be formally defined?

What are the primary functions that a Knowledge Extraction Framework must provide in order to play such a role?

Can the reasoning mechanisms offered by the Knowledge Extraction Framework be exhaustive?

Which types of interfacing and interaction mechanisms should be implemented between the Knowledge as a Service framework and the other bIoTope components?

1.2. Audience

The target audience of the deliverable includes groups within and outside the consortium, in particular:

Researchers, developers and integrators of the bIoTope consortium: This deliverable illustrates important aspects of the bIoTope context service formulation and delivery and will therefore serve as valuable input for stakeholders within the bIoTope consortium, notably stakeholders that work on the design of the bIoTope infrastructure and/or its implementation in the scope of the bIoTope open source project.

Researchers within other IERC and IoT-EPI projects: The deliverable illustrates some of the core implementation concepts of bIoTope and will therefore be of interest to researchers in other IERC and IoT European Platforms Initiative (IoT-EPI) projects, notably researchers working on projects that interact closely with bIoTope.

Researchers working on IoT: The deliverable will also be of interest to broader groups of IoT researchers, since it provides new insights into the integration of IoT open innovation ecosystems (e.g., sensors / cloud computing). As a public document, the deliverable will be accessible to such groups.

Open source community: In the medium term, bIoTope intends to build an open source community based on the bIoTope IoT platform ecosystem. This deliverable may serve as a guide to some of the introductory, yet important topics and functionalities of bIoTope.

1.3. Content of this Document

This deliverable reports our work on the implementation of the core knowledge extraction and data fusion mechanisms of the Knowledge Extraction Framework, paving the way to bIoTope’s Knowledge as a Service (KnaaS) framework. After reporting the state of the art of knowledge extraction in the IoT, this report presents the key features of the knowledge extraction functionalities and illustrates their interaction with the other bIoTope components. To demonstrate the strengths of bIoTope’s KnaaS framework and to help readers understand the proposed approach, different scenarios of the Heat Wave Mitigation use case of the Urban Community of Lyon are examined and implemented within KnaaS.

¹ For further information, please refer to Chapter 3.

The rest of this deliverable is structured as follows: Chapter 2 provides state-of-the-art definitions and discussions of knowledge, knowledge reasoning, knowledge extraction and knowledge fusion, particularly taking into account their applications to the Internet of Things. Chapter 3 deals with the core functionalities of the Knowledge Extraction Framework as well as with its reference implementation. In Chapter 4, the application of these functionalities is illustrated by demonstrating step by step how the provided prototype implementation can provide solutions in the Heat Wave Mitigation pilot use case of the Urban Community of Lyon. Finally, Chapter 5 summarizes the work demonstrated in this deliverable and presents future work.



2. State-of-the-Art in Knowledge Extraction and Fusion

The essence of an IoT ecosystem is to enable the secure connection to the Internet of a multitude of heterogeneous sensing and actuating devices with different constraints and capabilities. In the absence of de facto communication standards, sensing and actuating devices from different vendors may follow different interaction patterns and may implement different subsets of the available communication protocols. As a result, the value of an IoT ecosystem arguably grows in proportion to the number and the versatility of the supported devices [1]. Current IoT solutions address the issue of interfacing heterogeneous devices in different ways. Generally, interoperability with devices is ensured either by implementing a gateway that can be expanded, e.g., with the help of plugins, to support new types of devices whenever needed, or by requiring device vendors to use protocols from a limited set of supported ones.

Note, however, that with these approaches either the heterogeneity of the supported devices is limited or the use of a gateway is necessary. In a recent IoT gap analysis [1], the authors concluded that “in order to streamline the integration of new device types, standard object models for IoT devices” and universal messaging standards should be integrated widely in an IoT ecosystem. For a smooth integration with sensing and actuating devices, it is essential that the IoT communities establish standardized protocols that enable the publication, consumption and composition of heterogeneous information sources and services across various platforms. To this end, bIoTope takes full advantage of recent Open API standards for the IoT [2], notably O-MI (Open Messaging Interface) [3] and O-DF (Open Data Format) [4], which can be extended with more specific vocabularies, e.g., using semantic web and ontology technologies.

This architectural choice is an innovative step towards the realization of an IoT ecosystem. It facilitates the integration of heterogeneous data sources, the fusion of the resulting sensor data (sensor fusion), and knowledge discovery and reasoning upon the available sensor data (knowledge extraction). In the rest of this chapter, we provide the background knowledge required for the next sections and a summary of state-of-the-art discussions on knowledge extraction.

2.1. Sensor Networks

Sensor networks are the major enabler of the IoT. A sensor can be defined as a device that detects or measures a physical phenomenon such as humidity or temperature. A sensor node is a physical platform that hosts one or more sensors. Each sensor node has the capability to sense, communicate and process data. A typical sensor network [5] comprises two or more sensor nodes that communicate with each other using wired or wireless means. The sensors in a sensor network can be homogeneous or heterogeneous. Multiple sensor networks can be connected together through different mechanisms, one of which is the Internet.

Typically, sensor nodes are densely deployed around the phenomenon to be sensed [5]. These sensor nodes are low-cost and small, which enables large deployments. The sensor network is not a concept that emerged with the IoT: the concept and the related research existed long before the IoT was defined, as the literature in the field clearly shows. However, the emergence of the IoT has facilitated the mainstream adoption of sensor networks as a major technology for realising the IoT vision.

In recent times, mobile smart devices have become another widely recognised source of sensor data. The ubiquity of mobile smart devices such as smartphones, tablets and smart watches, together with the availability of cheap embedded sensors, has completely revolutionised the smart city application dimensions [6].



2.2. Knowledge Definition

In the literature, there is a wide range of definitions of what knowledge is, and only an unspoken agreement about it. Starting with Hintikka’s [7] pioneering contribution, the notions of knowledge and belief have been studied extensively. Bonanno’s definitions of information and knowledge constitute one of the most widely accepted formal definitions in the literature [8]. Beyond a conceptual framework for information and knowledge, Bonanno provides a strict axiomatic system of what knowledge and information are [8], [9]. According to Bonanno, “information is modelled as possibilities consistent with signal received from the environment. Knowledge is obtained by reasoning about the signals received as well as those that are missing” [8].

One of the most important aspects of Bonanno’s model is that the absence of information can also be valuable input for deriving knowledge. This is explained by the argument presented below, adapted from Conan Doyle’s Silver Blaze mystery. “In the dead of night, someone removed the horse Silver Blaze from the stable in which he was kept. Footprints found outside the stable match those of two individuals, who therefore become the primary suspects. During the investigation, Scotland Yard Inspector Gregory asks Sherlock Holmes” [8], [10]:

Gregory: ‘Is there any other point to which you should wish to draw my attention?’

Holmes: ‘To the curious incident of the dog in the night-time’.

Gregory: ‘The dog did nothing in the night-time’.

Holmes: ‘That was the curious incident’.

Holmes deduces from the fact that the dog did not bark (absence of a signal) that the thief must have been known to the dog, and is therefore able to eliminate one of the two suspects on these grounds [8], [9]. We now present the basic definitions of signal, information and knowledge based on Bonanno’s model.

Information can be thought of as the possibilities associated (or consistent) with the signals (sensor data) received from the environment. Let $\Omega$ be a set of states and $\Omega_K \subseteq \Omega$ be the set of known states. For instance, $\Omega$ could be the set of all diseases and $\Omega_K$ the set of known diseases (e.g., the set of diseases in some database). Let $\Sigma$ be a set of signals (sensor data) and $\sigma: \Omega \to 2^{\Sigma}$ (where $2^{\Sigma}$ denotes the set of subsets of $\Sigma$) be a function that associates with every state $\omega$ the set of signals produced by $\omega$. In the example where states are identified with diseases, signals can be thought of as symptoms, so that $\sigma(\omega)$ is the set of symptoms associated with disease $\omega$ [8].

We define a signal as anything that alters the physical environment. Namely, there is an objectively measurable difference between the situation in which the signal is present and the situation in which it is absent. In an example from the medical domain, signals can be thought of as sensor data, e.g., lung auscultation sounds, radiographs, etc. According to this terminology, the absence of a signal is not itself a signal. This allows us to think of information as the signals received and to represent knowledge as inference based on the signals that are present as well as those that are absent.

Definition: The information function $I: \Omega \to 2^{\Omega}$ is given by $I(\omega) = \{\omega' \in \Omega_K : \sigma(\omega') \supseteq \sigma(\omega)\}$.

Thus $I(\omega)$ is the set of known states that are compatible with the signals produced by the true state $\omega$, in the sense that those states would also have produced those signals (although they might have more signals associated with them). For example, one can imagine a database of known diseases and their associated symptoms, and a computer program that receives the patient’s symptoms as input and outputs the list of diseases in the database that manifest all those symptoms. The next step for a careful doctor would be to research each of the reported possible diseases and to eliminate as true possibilities all those diseases that have extra symptoms not exhibited by the patient. This step corresponds to the notion of knowledge derived from information [8]. Although the treatment in [8], [9] defines several additional axioms that the information function should fulfil, such as secondary reflexivity and transitivity, we have chosen to omit them for the sake of simplicity.
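The disease example above can be sketched in a few lines of Python. This is a minimal illustration and not part of the bIoTope implementation: the toy database `SIGNALS`, the function names, and the choice to pass the observed signal set $\sigma(\omega)$ directly (instead of the unknown true state $\omega$) are all assumptions made for the sake of the example.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy database: each known state (disease) is mapped to
# the set of signals (symptoms) it produces, i.e. sigma(state).
SIGNALS: Dict[str, FrozenSet[str]] = {
    "flu": frozenset({"fever", "cough", "fatigue"}),
    "cold": frozenset({"cough"}),
    "bronchitis": frozenset({"cough", "fatigue"}),
}


def information(observed: FrozenSet[str],
                signals: Dict[str, FrozenSet[str]] = SIGNALS) -> Set[str]:
    """Known states whose signal set is a superset of the observed signals."""
    return {state for state, produced in signals.items() if produced >= observed}


def knowledge(observed: FrozenSet[str],
              signals: Dict[str, FrozenSet[str]] = SIGNALS) -> Set[str]:
    """Candidates that remain after reasoning about the missing signals.

    A candidate is ruled out if it would have produced a signal that
    was not observed (the "dog that did not bark").
    """
    return {state for state in information(observed, signals)
            if signals[state] <= observed}


if __name__ == "__main__":
    symptoms = frozenset({"cough", "fatigue"})
    print(sorted(information(symptoms)))  # candidates compatible with the symptoms
    print(sorted(knowledge(symptoms)))    # candidates left after using absent symptoms
```

With the symptoms cough and fatigue, the information step keeps both flu and bronchitis, while the knowledge step discards flu because the patient shows no fever.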

Based on the above definitions, we proceed to define knowledge. Let $K: \Omega \to 2^{\Omega}$ be the knowledge function. $K(\omega)$ is interpreted as the set of states that the individual, based on what he or she knows, cannot rule out when the true state is $\omega$. We emphasize that we do not impose any additional requirements on the reasoning steps, whether deduction, induction or abduction (for further information, please see section 2.4). We impose only the following requirements [8]:

1. Knowledge should be based on information received.
2. Knowledge should reflect reasoning about the information.
3. Knowledge should be derived exclusively from the available information.

With the above definition and requirements, we have illustrated the key concepts of Bonanno’s model of how information and knowledge can be defined. We did not aim at a complete presentation of Bonanno’s theory. For a strict and formal definition of information and knowledge that is consistent with the above statements, please refer to [8], [9].

2.3. Multi-sensor Data fusion and the JDL/DFIG model

“By 2020, wirelessly networked sensors in everything we own will form a new Web. But it will only be of value if the “terabyte torrent” of data it generates can be collected, analysed and interpreted” [11]. The IoT will produce substantial amounts of data [12], which are of little use unless we are able to derive knowledge from them. Data fusion is a data processing technique that associates, combines, aggregates, and integrates data from different sources. It helps to build knowledge about certain events and environments that cannot be obtained by using individual sensors separately. Data fusion also helps to build a context-awareness model that helps to understand situational context [6]. The boundary between sensor fusion and sensor integration is quite fuzzy, and the terms are sometimes used interchangeably. Joshi and Sanderson describe multi-sensor fusion as part of the multi-sensor integration process [13]. This process refers to the synergistic use of multiple sensors to improve the operation of the system as a whole and includes sensor planning and sensor architecture.

Data fusion in the IoT environment is one of the most important challenges that need to be addressed in order to develop innovative services. In smart city applications in particular, when 50 to 100 billion devices start sensing [14], it will be essential to fuse and reason about the data automatically and intelligently. Recent work by a group of researchers from MIT [15] demonstrates the potential of fusing data from disparate data sources in a smart city to understand a city’s attractiveness. The work focuses on cities in Spain and shows how the fusion of big data sets can provide insights into the way people visit cities. Such a correlation of data from a variety of data sources plays a vital role in delivering services successfully in the smart cities of the future. Fusion is a broad term that can be interpreted in many ways. Hall and Llinas [16] have defined sensor data fusion as a method of combining data from multiple sensors to produce more accurate, more complete, and more dependable information that could not be obtained from a single sensor. Nakamura et al. [17] have defined data fusion based on three key operations: complementary, redundant, and cooperative.

Complementary means putting bits and pieces of a large picture together. A single sensor cannot say much about the environment, as it is focused on measuring a single factor such as temperature. However, when we have data sensed through a number of different sensors, we can understand the environment much better.

Redundant means that the same environmental factor is sensed through different sensors. This helps to increase the accuracy of the data. For example, averaging the temperature values sensed by two sensors located in the same physical location would produce more accurate information than a single sensor. It also reduces the amount of data that needs to be handled, as it combines the two data streams into one.

Cooperative operations combine the sensor data to produce new knowledge. For example, RFID tag readings recorded in a supermarket can be used to identify events such as shoplifting. Consider a scenario where the RFID reader on a supermarket shelf detects that an item has been removed from the shelf. The RFID sensor at the counter does not see the item during payment. Later, the RFID sensor at the exit door detects the item that was removed from the shelf earlier. From this sequence of readings, a shoplifting event can be inferred.
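The redundant and cooperative operations can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names, the reader labels `"shelf"`, `"counter"` and `"exit"`, and the representation of an RFID reading as a `(reader, tag_id)` pair in chronological order are hypothetical and are not taken from [17] or from the bIoTope implementation.

```python
from statistics import mean
from typing import Iterable, List, Tuple


def fuse_redundant(readings: Iterable[float]) -> float:
    """Redundant fusion: average co-located readings of the same quantity."""
    return mean(readings)


def detect_shoplifting(events: List[Tuple[str, str]]) -> List[str]:
    """Cooperative fusion over chronologically ordered (reader, tag_id) events.

    A tag is reported when it was seen leaving a shelf, was never seen
    at the counter, and is later seen at the exit.
    """
    removed, paid, suspects = set(), set(), []
    for reader, tag in events:
        if reader == "shelf":
            removed.add(tag)
        elif reader == "counter":
            paid.add(tag)
        elif reader == "exit" and tag in removed and tag not in paid:
            suspects.append(tag)
    return suspects


if __name__ == "__main__":
    # Two temperature sensors at the same location, fused into one value.
    print(fuse_redundant([21.0, 23.0]))
    # Item A is paid for, item B leaves the shop without passing the counter.
    log = [("shelf", "A"), ("shelf", "B"), ("counter", "A"),
           ("exit", "A"), ("exit", "B")]
    print(detect_shoplifting(log))
```

The averaging function turns two streams into one, which is the data reduction mentioned above, whereas the shoplifting rule derives an event that no single reader could report on its own.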

The ultimate goal of sensor data fusion is to understand the environment and act accordingly. This can be defined as a cycle, as shown in Figure 1, which is called the Internet of Things Monitoring Cycle [6]. It has five steps: Collection, Collation, Evaluation, Decide, and Act. The IoT monitoring cycle has been derived by combining the Intelligence Cycle [18] and the Boyd Control Loop [19]. The Collection step collects raw data from sensors and other IoT data sources (social media, smart city infrastructure, mobile devices, etc.). The Collation step analyses, compares and correlates the collected data. The Evaluation step fuses the data in order to understand and provide a full view of the environment. The Decide step decides which actions need to be taken. The Act step simply applies the actions decided at the previous step; it includes actuator control as well as sensor calibration and re-configuration [6].
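To make the five steps more tangible, the cycle can be sketched as a chain of small Python functions. The sketch is purely illustrative: the function names, the representation of a reading as a dictionary, the use of the mean as the fusion operator, the 30.0 threshold and the `"send_heat_alert"` action are all assumptions, and the Act step here only forwards actions to an actuator callback (sensor calibration and re-configuration are left out).

```python
from statistics import mean
from typing import Callable, Dict, List

Reading = Dict[str, float]  # one raw observation, e.g. {"temperature": 31.5}


def collect(sources: List[Callable[[], Reading]]) -> List[Reading]:
    """Collection: pull one raw reading from every data source."""
    return [read() for read in sources]


def collate(readings: List[Reading]) -> Dict[str, List[float]]:
    """Collation: group raw values by measured quantity for comparison."""
    grouped: Dict[str, List[float]] = {}
    for reading in readings:
        for quantity, value in reading.items():
            grouped.setdefault(quantity, []).append(value)
    return grouped


def evaluate(grouped: Dict[str, List[float]]) -> Dict[str, float]:
    """Evaluation: fuse each group into one estimate (here simply the mean)."""
    return {quantity: mean(values) for quantity, values in grouped.items()}


def decide(state: Dict[str, float], threshold: float = 30.0) -> List[str]:
    """Decide: choose actions from the fused view of the environment."""
    temperature = state.get("temperature")
    if temperature is not None and temperature > threshold:
        return ["send_heat_alert"]
    return []


def act(actions: List[str], actuator: Callable[[str], None]) -> None:
    """Act: hand every decided action to the actuator callback."""
    for action in actions:
        actuator(action)


def run_cycle(sources: List[Callable[[], Reading]],
              actuator: Callable[[str], None],
              threshold: float = 30.0) -> List[str]:
    """Run one pass of Collection, Collation, Evaluation, Decide and Act."""
    actions = decide(evaluate(collate(collect(sources))), threshold)
    act(actions, actuator)
    return actions


if __name__ == "__main__":
    stations = [lambda: {"temperature": 31.0},
                lambda: {"temperature": 33.0, "humidity": 40.0}]
    run_cycle(stations, print)
```

Keeping each step as a separate function mirrors the cycle in Figure 1 and makes it easy to replace a single step, for example swapping the mean for a more robust fusion operator.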

In an attempt to unify the terminology associated with data and information fusion, the Joint Directors of Laboratories formed the Data Fusion Subpanel (which later became known as the Data Fusion Group), which developed the JDL fusion model [20]. With subsequent revisions, it is the most widely used system for understanding data fusion processes. The goal of this model is to facilitate understanding and communication among researchers, designers, developers, evaluators, and users of data and information fusion techniques, so as to permit cost-effective system design, development, and operation [21]. The JDL/DFIG model differentiates data fusion functions into a set of fusion levels and provides a useful distinction among data fusion processes that relate to the refinement of objects, situations, threats, and processes at several levels. With the advent of the World Wide Web, data fusion came to include data, sensor, and information fusion [21]. The JDL/DFIG introduced a model of data fusion that divided the various processes. Currently, the six levels of the Data Fusion Information Group (DFIG) model are:

Figure 1: Internet of Things Monitoring Cycle [6].



Table 1: Six Levels of the JDL/DFIG model.

Level | Description

Level 0: Source Preprocessing/Subject Assessment | Estimation and prediction of signal/object observable states on the basis of pixel/signal level data association and characterization.

Level 1: Object Assessment | Estimation and prediction of entity states on the basis of observation-to-track association, continuous state estimation (e.g. kinematics) and discrete state estimation (e.g. target type and ID).

Level 2: Situation Assessment | Estimation and prediction of relations among entities, to include force structure and cross force relations, communications and perceptual influences, physical context, etc.

Level 3: Impact Assessment | Estimation and prediction of effects on situations of planned or estimated/predicted actions by the participants; to include interactions between action plans of multiple players (e.g. assessing susceptibilities and vulnerabilities to estimated/predicted threat actions given one’s own planned actions).

Level 4: Process Refinement | Adaptive data acquisition and processing to support mission objectives.

Level 5: User Refinement | Adaptive determination of who queries information and who has access to information (e.g. information operations) and adaptive data retrieved and displayed to support cognitive decision making and actions (e.g. human computer interface) [22].

Although the JDL Model (Level 1–4) is still in use today, it is often criticized for its implication that the levels

necessarily happen in order and also for its lack of adequate representation of the potential for a human-in-

the-loop [23]. The DFIG model (Level 0–5) explored the implications of situation awareness, user refinement,

and mission management [24]. Despite these shortcomings, the JDL/DFIG models are useful for visualizing the

data fusion process, facilitating discussion and common understanding [25], and important for systems-level

information fusion design [24].

2.4. Knowledge Extraction

Knowledge extraction – also known as Knowledge Discovery and Data Mining (KDD) – concerns the creation

of knowledge out of information coming from structured and unstructured data sources. Examples of

structured sources include relational databases, NoSQL, or XML format data, whereas unstructured sources

can include text data or documents and images, which are largely accessible via the Web. It is an

interdisciplinary area focusing upon methodologies for extracting useful knowledge from data. The ongoing

rapid growth of online data due to the Internet and the widespread use of databases have created an immense

need for KDD methodologies. The challenge of extracting knowledge from data draws upon research in

statistics, databases, pattern recognition, machine learning, data visualization, optimization, and high-

performance computing, to deliver advanced business intelligence and web discovery solutions. In bIoTope,

ecosystem applications will be accessed through O-MI/O-DF compatible wrappers (footnote 2). Hence, this approach

facilitates the exchange of information and permits the knowledge extraction process to be performed in a

uniform manner. However, knowledge extraction remains a key challenge, as it requires a range of

different advanced techniques to be applied to the data in order to gain valuable insight. Data

Fusion, which is a very important aspect of Knowledge Extraction, is a set of data processing techniques

that permit the association, combination, aggregation, and integration of data coming from different and

heterogeneous data sources. There are a number of issues that make data fusion a challenging task. The

2 For further information, please refer to the section 2.3 in Deliverable D4.2.



majority of these issues arise from the data to be fused, the imperfection and diversity of the sensor technologies,

and the nature of the application environment, as follows:

Table 2: The main problems of the Data Fusion process.

Data Fusion Main Problems | Description

Data imperfection | Data provided by sensors is always affected by some level of imprecision as well as uncertainty in the measurements. Data fusion algorithms should be able to express such imperfections effectively, and to exploit the data redundancy to reduce their effects.

Outliers and spurious data | The uncertainties in sensors arise not only from the imprecision and noise in the measurements, but are also caused by the ambiguities and inconsistencies present in the environment, and from the inability to distinguish between them [26]. Data fusion algorithms should be able to exploit the redundant data to alleviate such effects.

Conflicting data | Fusion of such data can be problematic, especially when the fusion system is based on evidential belief reasoning and Dempster’s rule of combination [27]. To avoid producing counter-intuitive results, any data fusion algorithm must treat highly conflicting data with special care.

Data modality | Sensor networks may collect qualitatively similar (homogeneous) or different (heterogeneous) data such as auditory, visual, and tactile measurements of a phenomenon. Both cases must be handled by a data fusion scheme.

Data correlation | This issue is particularly important and common in distributed fusion settings, e.g. wireless sensor networks, as for example some sensor nodes are likely to be exposed to the same external noise biasing their measurements. If such data dependencies are not accounted for, the fusion algorithm may suffer from over/under confidence in results.

Data alignment/registration | Sensor data must be transformed from each sensor’s local frame into a common frame before fusion occurs. Such an alignment problem is often referred to as sensor registration and deals with the calibration error induced by individual sensor nodes. Data registration is of critical importance to the successful deployment of fusion systems in practice.

Data association | Multi-target tracking problems introduce a major complexity to the fusion system compared to the single-target tracking case [28]. One of these new difficulties is the data association problem, which may come in two forms: measurement-to-track and track-to-track association. The former refers to the problem of identifying from which target, if any, each measurement originated, while the latter deals with distinguishing and combining tracks that estimate the state of the same real-world target [29].

Processing framework | Data fusion processing can be performed in a centralized or decentralized manner. The latter is usually preferable in wireless sensor networks, as it allows each sensor node to process locally collected data. This is much more efficient than the communication burden imposed by a centralized approach, in which all measurements have to be sent to a central processing node for fusion.

Operational timing | The area covered by sensors may span a vast environment composed of different aspects varying at different rates. Also, in the case of homogeneous sensors, the operation frequency of the sensors may be different. A well-designed data fusion method should incorporate multiple time scales in order to deal with such timing variations in data. In distributed fusion settings, different parts of the data may traverse different routes before reaching the fusion center, which may cause out-of-sequence arrival of data. This issue needs to be handled properly, especially in real-time applications, to avoid potential performance degradation.

Static vs. dynamic phenomena | The phenomenon under observation may be time-invariant or varying with time. In the latter case, it may be necessary for the data fusion algorithm to incorporate a recent history of measurements into the fusion process [13]. In particular, data freshness, i.e., how quickly data sources capture changes and update accordingly, plays a vital role in the validity of fusion results.

Data dimensionality | The measurement data could be preprocessed, either locally at each of the sensor nodes or globally at the fusion center, to be compressed into lower dimensional data, assuming a certain level of compression loss is allowed. This preprocessing stage is beneficial as it enables saving on the communication bandwidth and power required for transmitting data, in the case of local preprocessing [30], or limiting the computational load of the central fusion node, in the case of global preprocessing [31].
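
The remark on conflicting data in Table 2 can be made concrete with a small sketch of Dempster's rule of combination. The hypotheses A, B, C and the mass values below are illustrative assumptions (they follow the classic example usually attributed to Zadeh), and the function is a minimal implementation, not code from any bIoTope component.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalised combined masses and the conflict K, i.e. the total
    mass assigned to pairs of focal elements with an empty intersection.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalise by the non-conflicting mass (1 - K).
    return {h: m / (1.0 - conflict) for h, m in combined.items()}, conflict

# Illustrative hypotheses and masses: two sources that almost fully disagree.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
sensor_1 = {A: 0.99, B: 0.01}
sensor_2 = {C: 0.99, B: 0.01}
fused, conflict = dempster_combine(sensor_1, sensor_2)
```

Here almost all of the mass is conflicting, and after normalisation the whole belief ends up on hypothesis B, which both sources considered very unlikely. This is the kind of counter-intuitive result that calls for special treatment of highly conflicting data.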

While many of these problems have been identified and heavily investigated, no single data fusion algorithm

is capable of addressing all the aforementioned challenges. The various methods in the literature each focus on

a subset of these issues, determined by the application at hand. Our

presentation of data fusion literature is organized according to the taxonomy shown in Figure 2. This taxonomy,

presented in [32], illustrates an overview of data-related challenges that are typically tackled by

data fusion algorithms. The input data to the fusion system may be imperfect, correlated, inconsistent, and/or

in disparate forms/modalities. Each of these four main categories of challenging problems can be further

subcategorized into more specific problems, as shown in Figure 2. For each of these categories, many

approaches have been developed to tackle each specific aspect. For instance, a number of mathematical

theories are available to represent data imperfection [33], such as probability theory [34], fuzzy set theory

[35], [36], possibility theory [37], rough set theory [38], and Dempster–Shafer evidence theory (DSET) [39]. It

is clear that none of these techniques can substitute for the others. Thus, the art of data fusion is more about

selecting from a set of appropriate fusion tools than about relying on a single one for every task.
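
As one example of the probabilistic treatment of data imperfection, redundant measurements of the same quantity can be fused by weighting each one with the inverse of its variance. The sketch below assumes independent, unbiased measurements with known variances; the two readings are made-up values, and this is only one of the many tools mentioned above.

```python
def fuse_measurements(measurements):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Assumes the measurements are independent and unbiased; returns the fused
    value and its variance, which is never larger than the smallest input variance.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance

# Illustrative readings: a noisy sensor (variance 4.0) and a precise one (variance 1.0).
readings = [(20.0, 4.0), (22.0, 1.0)]
fused_value, fused_variance = fuse_measurements(readings)
```

The fused estimate lies closer to the more precise sensor, and its variance is smaller than that of either input, which is how redundancy reduces the effect of imperfection. If the measurements are correlated, this simple rule becomes over-confident, as noted under data correlation in Table 2.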

Figure 2: A taxonomy of data fusion methodologies: Different data fusion algorithms can be roughly categorized based on one of the

four challenging problems of input data that are mainly tackled: data imperfection, data correlation, data inconsistency, and

disparateness of data form [21].

With regard to knowledge creation from heterogeneous data sources, the knowledge will be the result of

some inference procedures. Inferences are steps in reasoning, moving from premises to conclusions. Charles

Sanders Peirce divided inference into three kinds: deduction, induction, and abduction. Another, more

refined categorization of reasoning systems is provided in [40] and depicted in Figure 3.

According to this, reasoning is further divided into First Order Logic Reasoning, Probabilistic Reasoning, Causal

Reasoning, Newtonian Mechanics, Spatial Reasoning and Non-falsifiable Reasoning. Deduction is inference

deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being

studied in logic (First Order Logic in Figure 3).

Figure 3: Categorization of Reasoning Systems [40].

Abduction is inference to the best explanation. Statistical inference uses mathematics to draw conclusions in

the presence of uncertainty. This generalizes deterministic reasoning, with the absence of uncertainty as a

special case. Statistical inference uses quantitative or qualitative (categorical) data which may be subject to

random variations. In this direction, machine learning explores the study and construction of

algorithms that can learn from and make predictions on data [41]. Machine learning is a subfield of computer

science that evolved from the study of pattern recognition and computational learning theory in artificial

intelligence. In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the

ability to learn without being explicitly programmed" [42]. Such algorithms operate by building a model from

example inputs in order to make data-driven predictions or decisions, rather than following strictly static

program instructions.

The inability to express causal relationships is a well-known limitation of probabilistic reasoning. For instance, “we can

establish a correlation between the events “it is raining” and “people carry open umbrellas”. This correlation

is predictive: if people carry open umbrellas, we can be pretty certain that it is raining (footnote 3). But this correlation

tells us little about the consequences of an intervention: banning umbrellas will not stop the rain” [40].

Classical mechanics is an extremely successful example of a causal reasoning system and an example

of what is called Newtonian mechanics reasoning. Spatial reasoning provides answers to questions such as

“How would a visual scene change if one changes the viewpoint or if one manipulates one of the objects in

the scene?” [40]. Spatial reasoning does not require the full logic apparatus but certainly benefits from the

definition of specific algebraic constructions [43]. Finally, history provides countless examples of reasoning

systems with questionable predictive capabilities, such as astrology, mythology, etc. Just like non-falsifiable

statistical models, non-falsifiable reasoning systems are unlikely to have useful predictive capabilities [40],

[44].
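
The difference between a predictive correlation and the effect of an intervention in the umbrella example can be shown with a small simulation. This is a toy model under stated assumptions: rain occurs with an assumed probability of 0.3, people carry umbrellas exactly when it rains, and the function names are hypothetical.

```python
import random

def simulate(days, ban_umbrellas=False, seed=0):
    """Simulate (raining, umbrellas) pairs; rain causes umbrellas, never the reverse."""
    rng = random.Random(seed)
    records = []
    for _ in range(days):
        raining = rng.random() < 0.3  # assumed probability of rain
        umbrellas = raining and not ban_umbrellas
        records.append((raining, umbrellas))
    return records

def rain_frequency(records):
    """Fraction of the given days on which it rained."""
    return sum(raining for raining, _ in records) / len(records)

observed = simulate(10_000)
umbrella_days = [record for record in observed if record[1]]

# Observation: seeing umbrellas predicts rain.
p_rain_given_umbrellas = rain_frequency(umbrella_days)
# Intervention: banning umbrellas leaves the rain unchanged.
p_rain_baseline = rain_frequency(observed)
p_rain_after_ban = rain_frequency(simulate(10_000, ban_umbrellas=True))
```

The conditional frequency of rain given umbrellas is high, yet forcing the umbrella variable to "off" does not change how often it rains. A purely probabilistic model of the observed data cannot tell these two situations apart.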

There are two ways to face such a universe of reasoning systems. One approach would be to identify a single

reasoning framework strictly more powerful than all others. Whether such a framework exists and whether it

leads to computationally feasible algorithms is unknown [40]. The other approach is to embrace this diversity. As in the case of data fusion, the creation of

knowledge out of information, whether it comes in semantically annotated or in raw data form, may require many

different reasoning techniques to be employed on the available data. This highlights the importance of a

Knowledge as a Service that provides various mechanisms for reasoning upon data. Last but not least, as

it is very difficult to discover a single reasoning framework more powerful than all others, special consideration

should be given to making such a service easily extensible and adaptable.

3 An alternative conclusion may be that the weather is sunny. Cultural and local aspects play a significant role in this example as well.



2.5. Conclusion

In this chapter, we have provided the background knowledge required for the next sections and we have given

a summary of state-of-the-art discussions in data fusion and knowledge extraction. We have begun by defining

information and knowledge to share a common understanding throughout the chapter, as these notions

correspond to the main concepts that data fusion and knowledge discovery work upon. We have explained in

detail the issues that make data fusion a challenging task and that the majority of these issues

arise from the imperfection of the data and the diversity of the sensor technologies. In addition, we provided an

overview of the key reasoning mechanisms as well as the key difficulties encountered in identifying a single

reasoning or data fusion framework strictly more powerful than all others. This knowledge will be valuable for

understanding the key architectural and implementation choices followed by both the conceptual architecture

of the Knowledge as a Service and its prototype implementation as described in chapter 3.



3. Knowledge Extraction Framework

In deliverable D4.2 – Knowledge Representation and Inference Framework – the conceptual architecture of

‘Knowledge as a Service (KnaaS)’ was described and a recommendation architecture,

depicted in Figure 4, was provided. In this deliverable we go into further detail regarding the “demystification” of the

Knowledge Extraction Framework as well as the whole Knowledge as a Service framework. Special attention

will be given to the connection with the ‘bIoTope Ecosystem’ as well as to the implementation design.

It should be highlighted that within the bIoTope Ecosystem the O-MI (footnote 4) and O-DF (footnote 5) standards will be the ‘glue

technologies’ that make all the IoT devices and platforms interoperable. With this in mind,

communication using O-MI/O-DF is the key requirement that any system or service must comply with. From

the point of view of the ‘bIoTope Ecosystem’ (footnote 6), Knowledge as a Service (KnaaS) also serves as a recommendation

architecture paradigm. Third-party service providers who are willing to provide domain-specific KnaaS

services may be able to join bIoTope ecosystems by making their platforms/software systems compliant with at

least the O-MI/O-DF communication, the recommended semantic vocabularies used in the bIoTope

ecosystem, etc.

Based on the state-of-the-art discussion and the analyses described in [45], [46], four main functions are

identified in order for any sensor data to be integrated within a knowledge framework. The functional blocks

are summarized below:

1. The capability of gathering relevant data from any kind of data sources, e.g., structured and/or

unstructured.

4 https://www2.opengroup.org/ogsys/catalog/C14B

5 https://www2.opengroup.org/ogsys/catalog/C14A

6 Please refer to “bIoTope Big Picture” in Annex I.

Figure 4: Conceptual Architecture of the bIoTope Knowledge Framework


2. The ability to annotate heterogeneous data so that it can be integrated and understood

in an inference engine, e.g., probabilistic, first order logic reasoning, etc.

3. The capacity to infer new information from data and to manage the inferred knowledge.

4. The possibility of appropriately annotating the response of knowledge services for the purpose of posting

the answer on user devices or activating operations as well as republishing it.

Thus, the KnaaS will offer an agent, either a user or a software program, the possibility to gather relevant data,

to further annotate the heterogeneous data throughout the intermediate steps, to infer new information

from the incoming information, and to mark up the response of knowledge services appropriately. We would like to

highlight that as O-DF will be the basic data format to exchange information within the bIoTope ecosystem, a

lot of consideration is given to integrating the semantic annotation into O-DF messages. For that reason, we

mentioned above the further annotation of the heterogeneous data arriving at the KnaaS (footnote 7). Last but not

least, the second requirement is not restrictive as far as the reasoning techniques are concerned. Thus, the

inference capabilities of bIoTope’s KnaaS do not aim to be exhaustive. If a new platform

offering ‘Knowledge as a Service’ capabilities enters the bIoTope ecosystem, it will probably provide similar or

complementary inference capabilities. This is in accordance with the overall objective of bIoTope, which is to

create a system of systems where information from cross domain platforms, devices and other information

sources can be accessed when, and as needed using standardized open APIs.

The above-mentioned functional building blocks are arranged sequentially in Figure 5, respecting the order of

processing.
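
The order of processing of the four functional blocks can be sketched as a chain of functions. Everything in this Python sketch is a stand-in chosen for illustration: the source names, the vocabulary lookup used as "semantic annotation", the threshold rule used as "inference", and the response fields are hypothetical and do not describe the KnaaS implementation.

```python
def gather(sources):
    """Block 1: gather raw values from the available data sources."""
    return [{"source": name, "value": value} for name, value in sources.items()]

def annotate(records, vocabulary):
    """Block 2: attach a semantic type to each record (a lookup stands in for annotation)."""
    return [dict(record, type=vocabulary.get(record["source"], "unknown")) for record in records]

def infer(records, threshold=30.0):
    """Block 3: derive new information (a toy rule, not a real inference engine)."""
    temperatures = [r["value"] for r in records if r["type"] == "Temperature"]
    return {"heat_alert": bool(temperatures) and max(temperatures) > threshold}

def mark_up(result):
    """Block 4: annotate the response so that a consumer can interpret it."""
    return {"service": "toy-knowledge-service", "result": result}

# Hypothetical inputs.
sources = {"roof_sensor": 34.5, "street_counter": 120}
vocabulary = {"roof_sensor": "Temperature", "street_counter": "VehicleCount"}
response = mark_up(infer(annotate(gather(sources), vocabulary)))
```

Because each block only depends on the output of the previous one, an individual block (for example the inference rule) can be replaced without touching the others.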

3.1. Knowledge Extraction Architecture in bIoTope KnaaS framework

We provide a prototype implementation that has some knowledge fusion and inference capabilities, using the

KnaaS conceptual architecture presented in Figure 4. To be in accordance with both the ‘Everything as a

Service’ principles and bIoTope’s open API requirement, we have chosen a service-based approach,

i.e., a Knowledge Service Request will be sent to KnaaS and the communication will be achieved through O-MI/O-DF.

Figure 6 (footnote 8) provides an overview of the components that KnaaS offers (indicated by the dashed green line) and

their interfaces. Inside the KnaaS, different knowledge extraction mechanisms are provided in order to fuse

and/or reason upon the available information. The KnaaS consists of a visual programming interface in which

different knowledge extraction algorithms can be deployed through a drag-and-drop mechanism. At the same

time, the KnaaS framework permits the direct implementation of advanced knowledge extraction methods,

7 For further information, please refer to Section 2.3 in Deliverable D4.2.

8 For the creation of Figure 6, icons from www.flaticon.com and the Linking Open Data cloud diagram 2017 [54] were used.

Figure 5: Key processing blocks required in knowledge framework and knowledge processing (blocks shown: Data from context, big data, other sources; Semantic annotation; Knowledge inferencing; Response markup).



apart from the ones provided as black boxes in the visual programming interface. These custom

functions can be written in either JavaScript or Python. Advanced

knowledge extraction mechanisms are offered in the prototype implementation, such as dimensionality

reduction, clustering algorithms, classification and/or regression algorithms, SPARQL queries, etc. Last but not

least, it provides abstract interfaces in order to parse O-DF formatted data and query the O-MI nodes. In Figure

6, the interaction of KnaaS extraction framework with another bIoTope component (OORI Server) is shown as

well as with the Linked Open Data Cloud. The OORI server consists of a conversion service component, which

receives the URI of an O-MI node and the O-MI/O-DF XML-structure describing the objects that should be

converted into their RDF representation. It is presented in further detail in deliverable D5.2 and constitutes one

of the main results of WP5. KnaaS exploits the OORI server in order to perform advanced SPARQL queries on

O-MI nodes, and extract valuable information out of the new semantic representation of the O-DF messages.

The role of IoTBnB and its interactions with the KnaaS will be described in detail in the next section.
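
To give an idea of what parsing O-DF formatted data involves, the following Python sketch reads a simplified O-DF-like document with the standard library. The sample document, the object and InfoItem names, and the function `parse_odf` are assumptions made for illustration; real O-DF messages use an XML namespace and further attributes that are omitted here, and this is not the abstract interface provided by KnaaS.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample modelled on the Objects/Object/InfoItem/value
# hierarchy of O-DF; names and values are made up.
SAMPLE_ODF = """
<Objects>
  <Object>
    <id>RoofStation</id>
    <InfoItem name="Temperature">
      <value>31.5</value>
    </InfoItem>
    <InfoItem name="Humidity">
      <value>40.0</value>
    </InfoItem>
  </Object>
</Objects>
"""

def parse_odf(xml_text):
    """Return {object id: {InfoItem name: float value}} for an O-DF-like document."""
    root = ET.fromstring(xml_text)
    result = {}
    for obj in root.iter("Object"):
        object_id = obj.findtext("id")
        items = {}
        for info_item in obj.findall("InfoItem"):
            items[info_item.get("name")] = float(info_item.findtext("value"))
        result[object_id] = items
    return result

readings = parse_odf(SAMPLE_ODF)
```

The resulting dictionary can then be passed to any fusion or reasoning step without that step needing to know about the XML representation.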

To clarify the role of KnaaS inside the bIoTope ecosystem, we describe an artificial use

case scenario that demonstrates its capabilities and usage. Later, in chapter 4, a real use case scenario

and implementation will be provided, addressing major aspects of Greater Lyon’s Heat Wave Mitigation

use case.

A user wants to extract knowledge out of available information provided through either an O-MI node

or data available on the Web, e.g., Linked Open Data. For that reason, the user wants to use

bIoTope’s KnaaS. The user chooses among the available knowledge fusion and knowledge reasoning black

boxes and combines them in an easy, visual way, as far as possible without

programming. Next, the user provides a service request in O-DF, which is able to ‘fire’ the

execution of this specific KnaaS-provided solution. Last but not least, the answer should

be marked up and encoded in an O-MI message if it is to be exposed in the bIoTope ecosystem. In

the future, an interested agent, either another person or a software program, could access this

knowledge service in an automated way, without programming, and harvest the resulting

knowledge. This artificially conceived scenario is in full accordance with the conceptual architecture of

the bIoTope Knowledge Framework (Figure 4) as well as with the four requirements provided in

Section 3 (footnote 9).

Figure 6: KnaaS Architecture highlighting interactions with other bIoTope components and Linked Open Data.

3.2. Integration Workflow with Existing bIoTope Components

Figure 7: Example workflow for KnaaS integration with bIoTope.

Figure 7 describes the systems involved in our proposed approach and the information flow between them.

On the left-hand side of the figure, the IoT Environment of a user, here called “Jeremy”, is illustrated

by an area enclosed by a dashed line. It consists of an O-MI node and the user himself, interacting with another

system, the IoTBnB platform. This platform indexes and exposes IoT data/service descriptions so

that IoT stakeholders can easily find IoT data/services. The systems within the dashed-line grey areas already

exist and numerous instances are deployed on the Web. Our contribution in this deliverable, the Knowledge

as a Service (KnaaS), is outlined on the right-hand side of Figure 7 using the green dashed line.

A typical workflow of KnaaS integrated in the bIoTope ecosystem could look as follows:

1. Jeremy decides to expose IoT data/service through an O-MI node/server. To increase the visibility of his data/service (possibly based on financial motives), he registers his node on the IoTBnB marketplace.

2. As Jeremy also wants to publish/expose and potentially sell the results of more advanced data analysis based on his own data, he purchases (possibly for free) access to the KnaaS component through the IoTBnB marketplace.

3. To benefit from this service, he needs to provide some information to the KnaaS, such as his O-MI node URL and the associated InfoItems on which he wants knowledge extraction, data fusion, etc. to be performed.

9 This artificial scenario is referenced again later in this section; we therefore refer to it as AUCS (Artificial Use Case Scenario).



4. Based on this information, Jeremy chooses among the available knowledge fusion and knowledge reasoning black boxes and combines them in an easy, visual way, as far as possible without programming. When this procedure ends, the newly created knowledge extraction process can be started.

5. The KnaaS component sends the results as an O-MI/O-DF message (either as a write request, if Jeremy enables the KnaaS to do that, or as a simple O-MI response that Jeremy would need to handle).
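
The last step of the workflow, returning a result as a write request, might be sketched as follows. This is an illustration under explicit assumptions: the envelope is a simplified, namespace-free rendering of our reading of the O-MI/O-DF structure (an envelope containing a write operation whose message carries O-DF objects), the object and InfoItem names are made up, and `build_write_request` is a hypothetical helper. Exact element names, attributes and namespaces should be checked against the O-MI and O-DF standards.

```python
import xml.etree.ElementTree as ET

def build_write_request(object_id, info_items, ttl="0"):
    """Build a simplified O-MI-like write request carrying O-DF-like objects."""
    envelope = ET.Element("omiEnvelope", {"version": "1.0", "ttl": ttl})
    write = ET.SubElement(envelope, "write", {"msgformat": "odf"})
    msg = ET.SubElement(write, "msg")
    objects = ET.SubElement(msg, "Objects")
    obj = ET.SubElement(objects, "Object")
    ET.SubElement(obj, "id").text = object_id
    for name, value in info_items.items():
        info_item = ET.SubElement(obj, "InfoItem", {"name": name})
        ET.SubElement(info_item, "value").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical result of a knowledge extraction flow for Jeremy's node.
request_xml = build_write_request("JeremyHome", {"HeatAlert": True})
```

The string in `request_xml` would then be sent to the O-MI node, provided that Jeremy has granted the KnaaS permission to write.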

3.3. Knowledge Extraction Implementation

As described in section 2.4, the art of knowledge extraction is more about choosing among

a set of appropriate fusion tools than about relying on a single one for every task. For that reason, it is

crucial to provide the end user with a wide variety of knowledge extraction algorithms in order to permit the

production of new information out of the already available data sources. Moreover, IoT calls for simplicity as

well as ease of use.

Among the best-practice tools in IoT today is Node-RED. Node-RED is a visual tool for wiring the Internet of

Things, but it can also be used for other types of applications to quickly assemble flows of services. The reason

why ‘Node’ is in the name is that the tool is implemented as a Node.js application, but from a consumer point

of view that is really only an internal implementation detail. Node-RED is available as open source and has

been implemented by the IBM Emerging Technology organization. However, even though Node-RED is simple to use

and supports a wide range of functionality, the decision to build it upon Node.js adds the

restriction that the wiring services must be programmed in JavaScript, which has many limitations

regarding the available knowledge extraction libraries. On the other hand, there has been a sharp increase in

knowledge extraction frameworks that are written in Python or offer a Python API,

e.g., [47]–[52]. Based on the conclusions of section 2.4, special consideration should be given to

making the KnaaS easily extensible and adaptable. For that reason, there should be an easy mechanism

to further extend the KnaaS with all the functionalities that these libraries provide.

Our architectural decision was to combine the advantages that Node-RED offers with those of well-known inference libraries.

To that end, we have integrated Python functionality

into Node-RED, which permits the integration of many Knowledge Extraction Frameworks. This Python addition

makes it easy to extend KnaaS with new fusion and reasoning functionalities.
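
How a Python function could sit inside a message-based flow can be sketched as follows. The mechanism by which the Python node exchanges messages with Node-RED is not described in this section, so the sketch simply assumes that a message arrives as a JSON object with a `payload` field (the usual Node-RED message convention) and is returned in the same form. The function names and the statistics computed are hypothetical; a real node could call SciPy or scikit-learn at the same point.

```python
import json
from statistics import mean, pstdev

def process(msg):
    """Replace msg['payload'] (a list of numbers) with summary statistics."""
    values = msg["payload"]
    msg["payload"] = {
        "count": len(values),
        "mean": mean(values),
        "stdev": pstdev(values),
    }
    return msg

def handle(raw_message):
    """Decode a JSON message, process it, and encode the result as JSON again."""
    return json.dumps(process(json.loads(raw_message)))

# Hypothetical incoming message from an upstream node.
reply = handle('{"topic": "temperature", "payload": [20.0, 22.0, 24.0]}')
```

Keeping the exchange format to plain JSON means that new Python functionality can be added without changing the flow that surrounds it.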

Figure 8 shows an example of the visual and user-friendly interface that the KnaaS will provide. KnaaS

provides capabilities for knowledge extraction from heterogeneous data sources. It is currently designed to

be installed as a stand-alone server on the Web, exposing its services as REST endpoints. However, if sensitive

information is to be processed, operating a public KnaaS server may raise security or privacy concerns.

Therefore, it is also possible to run KnaaS within the premises of, e.g., the user Jeremy. In the future we also

plan to integrate KnaaS’s functionality with the capabilities of O-MI nodes so that they can directly deal with

data and avoid privacy, security or performance problems. In addition, the integration of O-MI nodes has

already been implemented (footnote 10) and the full functionality of KnaaS, together with examples, will be provided in the next

release. With regard to the AUCS (Artificial Use Case Scenario) introduced in section 3.1, a user is able to (a)

access data available on the Web through, e.g., an O-MI query or a SPARQL query, (b) fuse them and

extract knowledge out of them through pre-built nodes and/or by writing his/her own custom functions in

JavaScript or Python, and finally (c) harvest the resulting knowledge. Last but not least, the user is able to

mark up the resulting knowledge and encode it in an O-DF message if it is to be exposed in

the bIoTope ecosystem.

10 https://github.com/skubler/Node-Red-OMI and https://github.com/skubler/Node-Red-ODF



Figure 8: bIoTope KnaaS example

We provide the KnaaS under an open source license at GitHub (footnote 11). It is implemented as a flow-based Web visual

programming service using the Node-RED (footnote 12) framework and MongoDB (footnote 13) as a NoSQL database to store the

resulting knowledge if needed. The Node-RED framework is extended with a Python node, which is able to

offer advanced knowledge extraction and fusion capabilities through the integration of state-of-the-art

libraries such as SciPy (footnote 14), scikit-learn (footnote 15), etc. In the future, special consideration will be given to adding

many more knowledge extraction libraries to the Node-RED environment. For an easy testing

setup, we provide instructions on how to build the server as well as a Docker (footnote 16) Compose file for simplified

deployment and demo data. In-depth instructions for building, deploying and using KnaaS are available in the

project repository as well.

11 https://github.com/prokolyvakis/knaas

12 https://nodered.org/

13 https://www.mongodb.com/

14 https://www.scipy.org/

15 http://scikit-learn.org/stable/

16 https://www.docker.com/



4. Case Studies

The implementation of smart city use cases requires the exploration of multiple data sources and their

relationships along with user demand-driven knowledge services for intelligent management and operations.

The purpose of this section is to illustrate the knowledge extraction capabilities of the KnaaS framework while

referencing one of the bIoTope use cases, namely Lyon’s Heat Wave Mitigation Pilot Use Case. In particular,

we will demonstrate how the data fusion and the knowledge discovery capabilities can be used to inform the

citizens of the Urban Community of Lyon about the local heat conditions of their region and help them

ventilate their homes accordingly. In addition, it will be demonstrated how a service can be set up through the

KnaaS framework that makes it possible to establish the correlation between traffic and temperature. Similar to the

work from MIT [15], which was briefly described in section 2.3, the established correlation can provide insights

into the way temperature may be affected by traffic. Based on this work, different actions could be

initiated for proactive environmental management. In section 4.1, we describe the scenarios of Lyon’s

Heat Wave Mitigation Pilot Use Case on which we have focused. We have implemented the different scenarios

through KnaaS and we provide them under an open source license at GitHub (footnote 17). The implementation of these

scenarios is accompanied by in-depth instructions for building, deploying and using the aforementioned

solutions with the currently implemented version of KnaaS.

4.1. Knowledge Extraction and Fusion in bIoTope use cases and pilots

The Heat Wave Mitigation Pilot Use Case aims at improving citizens’ lives during summer, particularly when heat waves hit the city. It also aims to contribute to the metropolitan climate change policy by refining the heat wave model and by controlling the vegetation, which has an impact on heat waves, using a natural resource such as rainwater. The realization goals of this pilot use case are fully described in deliverable D2.1: Ecosystem Stakeholder Requirements Report and Pilot Definition. To demonstrate the knowledge extraction capabilities of KnaaS, we have focused on the specific aspects shown in orange in Figure 9.

Figure 9: Goal Realization Viewpoint of Lyon's Pilot Case #2 in ArchiMate.

More specifically, we will demonstrate step by step how KnaaS can be used to:

1. Inform citizens of local heat conditions.

2. Help citizens to ventilate their homes.

3. Establish a correlation between traffic and temperature.

17 https://github.com/prokolyvakis/knaas#application-to-lyons-heat-wave-mitigation-pilot-use-case

These three aspects require a large network of traffic, temperature and humidity sensors. For that reason, traffic data were accessed through the RESTful APIs18 offered by the Urban Community of Lyon. Additionally, temperature and pressure data were extracted from the Netatmo public API19. Netatmo designs and distributes weather stations that monitor the atmospheric conditions inside and outside buildings.

4.1.1. Heat Wave Mitigation: Inform Citizens of Local Heat Conditions

Figure 10 shows, in a user-friendly, visual programming way, an example of how to access the Netatmo local weather stations and the Urban Community of Lyon’s traffic data. In the upper part of the figure, the Netatmo sensors are accessed and the data are cleaned, pre-processed and stored in a MongoDB database20 for further investigation. The storage of historical data is in full accordance with the Conceptual Architecture of the Knowledge Framework (presented in Section 3). Moreover, this functionality is particularly useful when the data have to be analysed, as shown in section 4.1.3, where the correlation between temperature and traffic data is established.

Figure 10: Access Local Heat Conditions & Traffic Through KnaaS
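The cleaning and pre-processing step of the flow in Figure 10 can be illustrated with a minimal sketch. In the flow itself this step is a custom JavaScript function; the same idea is shown here in Python for consistency with the other listings of this section. The output field names layer, last_update and loc follow the queries of Figures 20 and 21, whereas the layout of the incoming message (keys lon, lat, temperature, pressure) and the function name to_record are assumptions made only for this example.

```python
import json


def to_record(raw_message, layer, accessed_at):
    """Flatten one JSON station message into a document ready for storage.

    The input keys (lon, lat, temperature, pressure) are hypothetical;
    the real Netatmo and traffic payloads may be structured differently.
    """
    msg = json.loads(raw_message)
    record = {
        "layer": layer,
        "last_update": accessed_at,
        # GeoJSON point: longitude first, then latitude.
        "loc": {"type": "Point", "coordinates": [msg["lon"], msg["lat"]]},
    }
    # Basic cleaning: keep only measurements that are present and numeric.
    for key in ("temperature", "pressure"):
        value = msg.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            record[key] = float(value)
    return record


raw = '{"lon": 4.85, "lat": 45.75, "temperature": 25.3, "pressure": null}'
record = to_record(raw, "netatmo", "2017-06-01T12:00:00Z")
```

Storing the position as a GeoJSON point is what later allows nearest-station lookups on the loc field.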

In the lower part of Figure 10, the same procedure is applied: the Urban Community of Lyon’s traffic data are accessed, cleaned, pre-processed and stored in the MongoDB database. The pre-processing is achieved by a custom JavaScript function, which parses the JSON messages produced by the “Lyon Netatmo” and “Lyon’s Traffic Data request” nodes. Finally, in the middle part of Figure 10, the visualization is achieved through the world map functionality that KnaaS provides. The result is shown in Figure 11: the local heat conditions of the Urban Community of Lyon are displayed on OpenStreetMap21, where citizens can be informed about them. In addition, a rule is established which is activated when the temperature is over 24 °C; this is indicated on the map with a black icon. In future work, context-aware events could be initiated when the user’s geographical position is close to temperature-critical regions. Section 4.1.2 demonstrates how the flow shown in Figure 10 could be extended in order to create an event through a “Switch” node.

18 https://data.grandlyon.com/ 19 https://dev.netatmo.com 20 https://www.mongodb.com/ 21 https://www.openstreetmap.org

To go one step further and demonstrate the connection between the results presented here and the work presented in deliverable D5.4: 2D and 3D UI Widgets Library regarding the Context-Sensitive End-user Dashboards (UIaaS), we give an example of how simple widgets can be developed that permit visual exploration of the local heat conditions through KnaaS. Figure 12 demonstrates this capability. As can be seen, the widgets can be used to visualize the various results produced via the KnaaS framework and, in that way, to investigate different aspects of the local heat conditions in the Urban Community of Lyon.

Figure 12: Data Exploration of the local Heat Conditions through KnaaS

Specifically, Figure 13 depicts the dashboards produced from the above flow configuration. Different gauge charts present in a visual way the temperature and humidity sensor data streams offered by the Netatmo local weather stations. Moreover, the temperature and humidity graphs over time are shown, from which the variation and the mean value of the data can be investigated.

Figure 11: Visualization of local Heat Conditions through KnaaS

According to [53], there is a consensus that one of the “next breakthroughs will come from integrated solutions that allow the user to explore their data using graphical metaphors.” This will bring users into direct contact with the data and narrow the gap between knowledge extraction algorithms and knowledge presentation. For that reason, facilitating the interaction with the work of WP5 regarding the UIaaS was of significant importance during the initial design of the KnaaS framework. The initial choice of bIoTope to use O-MI and O-DF as the ‘glue technology’ that makes all the IoT devices and the bIoTope components interoperable was a great facilitator towards this aim. Last but not least, this work permits the KnaaS framework to be interoperable not only with the UIaaS, but with all the bIoTope components and IoT devices that conform to O-MI and O-DF.

Figure 13: Gauge Charts and Graphs produced via KnaaS to visually explore the sensor streams.

4.1.2. Heat Wave Mitigation: Help citizens to Ventilate their Home

In this subsection, the scenario of helping citizens to ventilate their homes properly is investigated. To realize it, weather information, and in particular the weather forecast, is needed. OpenWeatherMap22 is an online service that provides weather data, including current weather data, forecasts and historical data, to the developers of web services and mobile applications. As data sources, it utilizes meteorological broadcast services and raw data from airport weather stations, radar stations and other official weather stations. All data are processed by OpenWeatherMap with the aim of providing accurate online weather forecast data and weather maps, such as those for clouds or precipitation. Beyond that, the service has a social aspect: weather station owners are involved in connecting to the service, thereby increasing weather data accuracy. The approach is inspired by OpenStreetMap and Wikipedia, which make information free and available to everybody.

In Figure 14, a Node-RED flow is shown in which the available data from OpenWeatherMap are accessed, processed and visualized through a message notification in the dashboard created in the previous section. The result is shown in Figure 15. In future work, once a custom mobile application is implemented, this information could easily be displayed in it. The only extension needed to the flow shown in Figure 14 is an additional O-MI node, which will transfer the O-MI/O-DF message to the application.

22 https://openweathermap.org/




Figure 14: Node-RED flow, which displays through dashboards’ notifications and tweets the OpenWeatherMap information.

Figure 15: Message notification with the available information in the OpenWeatherMap.

In Figure 16, the same message is shown through a tweet message, which has been triggered by the same

function as the one which triggered the dashboard’s notification.

Figure 16: Tweet message with the parsed information from OpenWeatherMap.

To extend the above functionality and check whether the temperature exceeds a specific threshold, only a simple modification is needed in the flow shown in Figure 14. Specifically, with the addition of the switch node (shown in orange in Figure 17), a rule can be added which creates a message to be displayed only when the temperature exceeds 24 °C.

Figure 17: Node-RED flow, which enables the Users to ventilate their homes accordingly.
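The rule added through the switch node can be sketched as a small Python function. This is only an illustration of the rule's logic, not the node's actual configuration: the function name ventilation_notice, the input keys city and temperature_c, and the shape of the returned message are assumptions made for this example. Only the 24 °C threshold is taken from the flow described above.

```python
THRESHOLD_C = 24.0  # threshold used by the rule described in the text


def ventilation_notice(forecast, threshold_c=THRESHOLD_C):
    """Return a notification message if the temperature exceeds the threshold.

    Returns None otherwise, which mimics a switch node that forwards
    nothing when its rule does not match. Input keys are hypothetical.
    """
    temp = forecast.get("temperature_c")
    if temp is None or temp <= threshold_c:
        return None
    city = forecast.get("city", "your area")
    return {
        "topic": "heat-alert",
        "payload": f"Forecast for {city}: {temp:.1f} °C exceeds {threshold_c:.0f} °C.",
    }


notice = ventilation_notice({"city": "Lyon", "temperature_c": 27.34})
```

A strict comparison is used, so a forecast of exactly 24 °C does not trigger a message, in line with the wording "exceeds".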

In a future scenario, where the user has exposed a device such as a smart ventilator (possibly an interconnection with another pilot use case of the bIoTope ecosystem) through a private O-MI node, the switch node could adjust the temperature accordingly simply by sending an O-MI/O-DF message to this node.



4.1.3. Heat Wave Mitigation: Establish a correlation between traffic and temperature

In statistics, the Pearson correlation coefficient, also referred to as the Pearson's r or Pearson product-moment

correlation coefficient (PPMCC), is a measure of the linear correlation between two variables X and Y. It has a

value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total

negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related

idea introduced by Francis Galton in the 1880s. Pearson's correlation coefficient when applied to a population

is commonly represented by the Greek letter ρ (rho) and may be referred to as the population correlation

coefficient or the population Pearson correlation coefficient. The formula for ρ is:

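$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

where:

cov: is the covariance, namely: $\operatorname{cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$

$\sigma_X$: is the standard deviation of X, namely: $\sigma_X = \sqrt{E[(X - E[X])^2]}$

$\sigma_Y$: is the standard deviation of Y, namely: $\sigma_Y = \sqrt{E[(Y - E[Y])^2]}$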

The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables. More generally, note that $(X_i - \bar{X})(Y_i - \bar{Y})$ is positive if and only if $X_i$ and $Y_i$ lie on the same side of their respective means. Thus the correlation coefficient is positive if $X_i$ and $Y_i$ tend to be simultaneously greater than, or simultaneously less than, their respective means. The correlation coefficient is negative (anti-correlation) if $X_i$ and $Y_i$ tend to lie on opposite sides of their respective means. Moreover, the stronger either tendency is, the larger the absolute value of the correlation coefficient.
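As a worked illustration of the definition above, the following minimal sketch computes the sample Pearson coefficient directly from the deviations from the means, using only the Python standard library. The function name pearson_r and the numbers are illustrative assumptions; they are not measurements from the pilot.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # The common 1/n factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only.
speed = [50.0, 45.0, 40.0, 30.0, 20.0]        # road speed
temperature = [18.0, 21.0, 22.0, 27.0, 29.0]  # temperature in °C
r = pearson_r(speed, temperature)
```

In this example the speed values mostly lie on the opposite side of their mean from the temperature values, so the coefficient is negative.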

KnaaS supports the execution of Python code thanks to the node shown in Figure 18. This node acts as a proxy to Python modules and libraries which offer advanced knowledge extraction and data mining capabilities. For instance, in deliverable D5.4 (UIaaS), this node acts as the central element for evaluating the current context, as it uses the Python programming extension to execute SPARQL ASK queries against an O-MI/O-DF Integration Server through the Python RDFlib23 library (see Deliverable D5.2 for an introduction to OORI).

Figure 18: Node for writing Python code in KnaaS.

In this subsection, we demonstrate how the node shown in Figure 18 permits us to incorporate in KnaaS knowledge extraction functionalities provided by SciPy24, an open source Python library used for scientific and technical computing, and pandas25, a Python package providing fast, flexible and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Specifically, the python3 function node shown in Figure 19, with the label Get Netatmo & Traffic Data, accesses the Netatmo local weather stations and the Urban Community of Lyon’s traffic data every two minutes, pre-processes the data and stores them in a MongoDB collection. In addition, the timestamps with the exact time at which the data were accessed are stored in a Python list.

23 http://rdflib.readthedocs.io/en/stable/ 24 https://www.scipy.org/

Figure 19: Establish a correlation between traffic and temperature through KnaaS.

In the next step, a MongoDB query, shown in Figure 20, is executed in order to access the geographic coordinates of a road with a specific street_name. Additionally, another MongoDB query, shown in Figure 21, is executed in order to find the closest local weather stations based on the geographic coordinates returned by the previous query. Based on this result, an average temperature value and an average pressure value are computed for each timestamp for the road with the specific street name. Next, these data are stored in a pandas DataFrame and the correlation of the road speed with the temperature and pressure is computed based on the Pearson correlation coefficient.

# Traffic record of the street street_name at timestamp date.
query = {"$and": [
    {"layer": "traffic"},
    {"last_update": {"$eq": date}},
    {"libelle": street_name},
]}

Figure 20: MongoDB query that retrieves the geocoordinates of a specific street_name given a specific timestamp.

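# Netatmo stations at timestamp date, nearest first, around the point geo.
# Assumes geo = (longitude, latitude) and a 2dsphere index on "loc".
query = {"$and": [
    {"layer": "netatmo"},
    {"last_update": {"$eq": date}},
    {"loc": {"$nearSphere": {"$geometry": {
        "type": "Point",
        "coordinates": [geo[0], geo[1]],
    }}}},
]}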

Figure 21: MongoDB query that finds the closest Netatmo local weather stations given specific geographic coordinates.
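The last step of this procedure, averaging the readings of the nearest stations per timestamp and correlating them with the road speed, can be sketched with pandas as follows. This is a minimal illustration under assumptions: the column names, timestamps and values are invented for the example and do not come from the pilot data, and the actual python3 function node may organise the data differently.

```python
import pandas as pd

# Readings of the nearest weather stations (illustrative values only).
stations = pd.DataFrame({
    "timestamp": ["12:00", "12:00", "12:02", "12:02", "12:04", "12:04"],
    "temperature": [18.0, 19.0, 21.0, 22.0, 25.0, 26.0],
    "pressure": [1015.0, 1016.0, 1014.0, 1014.0, 1012.0, 1013.0],
})

# Road speed of the selected street at the same timestamps.
speed = pd.Series([50.0, 40.0, 25.0],
                  index=["12:00", "12:02", "12:04"], name="speed")

# One averaged temperature and pressure value per timestamp.
averaged = stations.groupby("timestamp")[["temperature", "pressure"]].mean()

# Align by timestamp and compute the pairwise Pearson coefficients.
table = averaged.join(speed)
corr = table.corr(method="pearson")
speed_vs_weather = corr.loc["speed", ["temperature", "pressure"]]
```

The resulting speed_vs_weather series holds one coefficient per weather variable, which is the kind of information a table-style dashboard can display.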

To present the results, a simple dashboard was created, shown in Figure 22, which displays the Pearson correlation coefficients in a table. Similar to the work from MIT [15], which was briefly described in section 2.3, the established correlation can provide insights into the way temperature may be affected by traffic. Insights gained from this step may lead, for instance, to a better understanding of critical traffic conditions.

25 http://pandas.pydata.org/pandas-docs/stable/

Figure 22: Dashboard with the computed Pearson correlation coefficients.



5. Conclusion and Future Work

In this deliverable, we demonstrated the core knowledge extraction and data fusion mechanisms of the Knowledge Extraction Framework. We provide a prototypical implementation of the framework in the context of Knowledge as a Service provisioning and outline how it can be integrated with existing services in the bIoTope ecosystem. To demonstrate the strengths of the bIoTope KnaaS framework and to help readers understand the proposed approach, different scenarios of the Heat Wave Mitigation Pilot Use Case of the Urban Community of Lyon were examined and implemented with KnaaS. Our immediate next steps will concentrate on further extending the reasoning and fusion mechanisms of the Knowledge Extraction Framework and on providing more advanced capabilities in a user-friendly way, following the visual programming principles on which the KnaaS implementation is based. Additionally, special consideration will be given to the efficient handling of semantic annotations within O-MI/O-DF messages, based on the new O-DF messages extended with semantics through the coordinated work of WP3, WP4 and WP5. Last but not least, we will work on improving the interactions with other bIoTope components towards realizing a sustainable bIoTope ecosystem.



6. References

[1] J. Mineraud, O. Mazhelis, X. Su, and S. Tarkoma, “A gap analysis of Internet-of-Things platforms,” Comput. Commun., vol. 89, pp. 5–16, 2016.

[2] K. Framling, S. Kubler, and A. Buda, “Universal Messaging Standards for the IoT From a Lifecycle Management Perspective,” IEEE Internet Things J., vol. 1, no. 4, pp. 319–327, Aug. 2014.

[3] The Open Group, “Open Messaging Interface (O-MI), an Open Group Internet of Things (IoT) Standard - C14B.” [Online]. Available: https://www2.opengroup.org/ogsys/catalog/C14B. [Accessed: 21-Feb-2017].

[4] The Open Group, “Open Data Format (O-DF), an Open Group Internet of Things (IoT) Standard.” [Online]. Available: http://www.opengroup.org/iot/odf/index.htm. [Accessed: 21-Feb-2017].

[5] I. F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Commun. Mag., vol. 40, no. 8, pp. 102–114, Aug. 2002.

[6] M. Wang et al., “City Data Fusion:,” Int. J. Distrib. Syst. Technol., vol. 7, no. 1, pp. 15–36, Jan. 2016.

[7] J. Hintikka, “Knowledge and belief: an introduction to the logic of the two notions,” 1962.

[8] G. Bonanno, “Information, knowledge and belief,” Bull. Econ. Res., 2002.

[9] P. Battigalli and G. Bonanno, “Recent results on belief, knowledge and the epistemic foundations of game theory,” Res. Econ., 1999.

[10] A. Doyle, “The Annotated Sherlock Holmes (Vol. 1),” I (New York, 1967), 1967.

[11] M. Raskino, J. Fenn, and A. Linden, “Extracting Value From the Massively Connected World of 2015,” Gartner Research, 1 April 2005.

[12] C. Perera, C. H. Liu, and S. Jayawardena, “The Emerging Internet of Things Marketplace From an Industrial Perspective: A Survey,” IEEE Trans. Emerg. Top. Comput., vol. 3, no. 4, pp. 585–598, Dec. 2015.

[13] R. Joshi and A. C. Sanderson, Multisensor fusion: a minimal representation framework. World Scientific, 1999.

[14] A. Zaslavsky, C. Perera, and D. Georgakopoulos, “Sensing as a Service and Big Data.”

[15] S. Sobolevsky et al., “Scaling of City Attractiveness for Foreign Visitors through Big Data of Human Economical and Social Media Activity,” in 2015 IEEE International Congress on Big Data, 2015, pp. 600–607.

[16] D. L. Hall and J. Llinas, “An introduction to multisensor data fusion,” Proc. IEEE, vol. 85, no. 1, pp. 6–23, 1997.

[17] E. F. Nakamura, A. A. F. Loureiro, and A. C. Frery, “Information fusion for wireless sensor networks,” ACM Comput. Surv., vol. 39, no. 3, p. 9–es, Sep. 2007.

[18] A. N. Shulsky and G. J. Schmitt, Silent warfare : understanding the world of intelligence. Brassey’s, Inc, 2002.

[19] J. R. Boyd, “A discourse on winning and losing. Maxwell Air Force Base, AL: Air University,” Libr. Doc. No. MU, vol. 43947, p. 79, 1987.

[20] O. E. Drummond, “Methodologies for performance evaluation of multitarget multisensor tracking,” 1999, pp. 355–369.

[21] H. Fourati and K. Iniewski, Multisensor data fusion: from algorithms and architectural design to applications.

[22] E. Blasch and S. Plano, “DFIG Level 5 (User Refinement) issues supporting Situational Assessment Reasoning,” in 2005 7th International Conference on Information Fusion, 2005, pp. xxxv–xliii.

[23] E. P. Blasch and S. Plano, “JDL level 5 fusion model: user refinement issues and applications in group tracking,” 2002, pp. 270–279.

[24] E. Blasch, E. Bosse, and D. A. Lambert, High-level information fusion management and systems design. Norwood, MA: Artech House, 2012.

[25] M. E. Liggins, D. L. Hall, and J. Llinas, Handbook of multisensor data fusion: theory and practice. CRC Press, 2009.

[26] M. Kumar, D. P. Garg, and R. A. Zachery, “A generalized approach for inconsistency detection in data fusion from multiple sensors,” in 2006 American Control Conference, 2006, p. 6 pp.

[27] P. Smets, “Analyzing the combination of conflicting belief functions,” Inf. Fusion, vol. 8, no. 4, pp. 387–412, Oct. 2007.

[28] R. P. S. Mahler, “Statistics 101 for multisensor, multitarget data fusion,” IEEE Aerosp. Electron. Syst. Mag., vol. 19, no. 1, pp. 53–64, Jan. 2004.

[29] D. Smith and S. Singh, “Approaches to Multisensor Data Fusion in Target Tracking: A Survey,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 12, pp. 1696–1710, Dec. 2006.

[30] Yunmin Zhu, Enbin Song, Jie Zhou, and Zhisheng You, “Optimal dimensionality reduction of sensor data in multisensor estimation fusion,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1631–1639, May 2005.

[31] B. L. Milenova and M. M. Campos, “Mining high-dimensional data for information fusion: a database-centric approach,” in 2005 7th International Conference on Information Fusion, 2005, p. 7 pp.

[32] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi, “Multisensor data fusion: A review of the state-of-the-art,” Inf. Fusion, vol. 14, no. 1, pp. 28–44, 2013.

[33] F. Sheridan, “A survey of techniques for inference under uncertainty,” Artif. Intell. Rev., 1991.

[34] H. Durrant-Whyte and T. Henderson, “Multisensor data fusion,” Springer Handb. Robot., 2016.

[35] L. Zadeh, “Fuzzy sets,” Inf. Control, 1965.

[36] F. Karray and C. De Silva, “Soft computing and intelligent systems design: theory, tools, and applications,” 2004.

[37] C. Negoita, L. Zadeh, and H. Zimmermann, “Fuzzy sets as a basis for a theory of possibility,” Fuzzy sets Syst., 1978.

[38] Z. Pawlak, “Rough sets: Theoretical aspects of reasoning about data,” 2012.

[39] G. Shafer, “A mathematical theory of evidence,” 1976.

[40] L. Bottou, “From machine learning to machine reasoning,” Mach. Learn., vol. 94, no. 2, pp. 133–149, Feb. 2014.

[41] R. Kohavi and F. Provost, “Glossary of terms,” Mach. Learn., vol. 30, no. 2–3, pp. 271–274, 1998.

[42] A. Munoz, “Machine Learning and Optimization,” URL https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf [accessed 2016-03-02][WebCite Cache ID 6fiLfZvnG], 2014.

[43] I. Pratt-Hartmann, “First-Order Mereotopology,” in Handbook of Spatial Logics, Dordrecht: Springer Netherlands, 2007, pp. 13–97.

[44] V. N. Vapnik, The nature of statistical learning theory. Springer, 2000.

[45] X. Wu et al., “Knowledge Engineering with Big Data,” IEEE Intell. Syst., vol. 30, no. 5, pp. 46–55, Sep. 2015.

[46] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, “Data mining with big data,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, pp. 97–107, Jan. 2014.

[47] J. Bergstra, O. Breuleux, and F. Bastien, “Theano: A CPU and GPU math compiler in Python,” Proc. 9th Python, 2010.

[48] T. Schaul, J. Bayer, D. Wierstra, Y. Sun, and M. Felder, “PyBrain,” Mach. Learn. …, 2010.

[49] F. Pedregosa, G. Varoquaux, and A. Gramfort, “Scikit-learn: Machine learning in Python,” Mach. Learn. …, 2011.

[50] J. Demšar, B. Zupan, G. Leban, and T. Curk, “Orange: From experimental machine learning to interactive data mining,” Princ. Data Min. …, 2004.

[51] S. Sonnenburg, S. Henschel, C. Widmer, and J. Behr, “The SHOGUN machine learning toolbox,” Mach. Learn. …, 2010.

[52] D. King, “Dlib-ml: A machine learning toolkit,” J. Mach. Learn. Res., 2009.

[53] U. M. Fayyad, A. Wierse, and G. G. Grinstein, Information visualization in data mining and knowledge discovery. Morgan Kaufmann, 2002.

[54] A. Abele, J. P. McCrae, P. Buitelaar, A. Jentzsch, and R. Cyganiak, “Linking Open Data cloud diagram 2017.” [Online]. Available: http://lod-cloud.net/.



7. Online References

DBpedia. wiki.dbpedia.org

Dublin Core vocabulary-DCMI Metadata Terms. dublincore.org/documents/dcmi-terms/

FOAF. xmlns.com/foaf/spec/

Google Knowledge Graph. www.google.com/intl/es419/insidesearch/features/search/knowledge.html

Linked Open Data. linkeddata.org

Lyon DATA platform. data.grandlyon.com

SKOS. https://www.w3.org/TR/skos-reference/

SPARQL Protocol And RDF Query Language. www.w3.org/TR/rdf-sparql-query

Pandas. http://pandas.pydata.org/pandas-docs/stable

SciPy. https://www.scipy.org/

OpenStreetMap. https://www.openstreetmap.org

Netatmo. https://dev.netatmo.com

mongoDB. https://www.mongodb.com/

Node-RED. https://nodered.org/

Docker. https://www.docker.com/





8. Annex 1. bIoTope Big Picture

26

26 Initial draft, more elaborated view in deliverable D2.4 and further specification in D2.7
