This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779899. It is the property of the SecureIoT consortium and shall not be distributed or reproduced without the formal approval of the SecureIoT Management Committee. The content of this report reflects only the authors’ view. The Innovation and Networks Executive Agency (INEA) is not responsible for any use that may be made of the information it contains.
Project Acronym: SecureIoT
Grant Agreement number: 779899 (H2020-IoT03-2017 - RIA)
Project Full Title: Predictive Security for IoT Platforms and Networks of Smart Objects
DELIVERABLE
Deliverable Number: D4.3
Deliverable Name: Tools and Techniques for Predictive IoT Security
Dissemination level: PU
Type of Document: R
Contractual date of delivery: M11
Deliverable Leader: INRIA
Status & version: Final - V1.0
WP / Task responsible: INRIA
Keywords: Predictive algorithms, Process mining, Deep learning
Abstract (few lines): This deliverable introduces three major types of machine-learning algorithms for predictive security: process mining, variational autoencoders (deep learning) and behavioural analysis. All of them will be used to predict potential anomalies of the monitored IoT systems. Descriptions and initial results are given, along with open-source solutions that can support their implementation.
Deliverable Leader: Jérôme François (INRIA)
Contributors:
Nikos Kefalakis (INTRA), Abdelkader Lahmadi (INRIA), Remi
Badonnel (INRIA), Adrien Hemmer (INRIA), Jérôme François
(INRIA), Juergen Neises (FUJITSU), Thomas Walloschker
(FUJITSU), Jose Fran Ruiz (ATOS), Mariza Konidi (INTRA)
Reviewers: Sofianna Menesidou (UBI), Stylianos Georgoulas (INTRA)
Approved by: Stylianos Georgoulas (INTRA)
Page | 2
Project Title: SecureIoT Contract No. 779899 Project Coordinator: INTRASOFT International S.A.
D4.3 - Tools and Techniques for Predictive IoT Security,
Version: v1.0- Final, Date 30/11/2018
Executive Summary

The scope of this deliverable is to introduce methods and techniques that can be leveraged for performing predictive security in the context of SecureIoT. Two types of techniques have been identified, described and assessed through a preliminary evaluation: process mining and variational autoencoders. Particular attention has been given to conformity with the requirements defined in WP2. The first technique aims at describing a complex process as a Petri net model, which can then be used to follow the realization of a process and so predict its future states. Process mining has been widely used for the extraction and prediction of behavioural models. In SecureIoT, however, it is applied to predict anomalies of IoT systems rather than in its traditional role of behavioural discovery and analysis in Business Process Management (BPM) systems. Unlike Hidden Markov Models (HMM), which could also be used to predict future states of a system by building its behavioural model from observations, the process mining methods proposed in this deliverable provide models that are more interpretable and less probabilistic. However, recent works have proposed using HMM as an underlying method for process mining to build probabilistic models of system workflows. The second technique is, in our case, used to map sequences of events into low-dimensional representations using neural networks. Sequences can then be clustered and so interpreted as directions towards future states. Finally, we introduce the use of unsupervised machine learning in the context of anomaly detection and its integration in a full process from data collection to alert generation. This deliverable also recommends a list of open-source tools to be considered for the prototype implementation of the WP4 analytics techniques.
Document History
Version Date Contributor(s) Description
0.1 23/05/2018 INRIA Table of contents
0.2 29/10/2018 INTRA Added deep predictive analytics with deep
learning methods description
0.3 30/10/2018 INRIA Description of process mining
0.4 04/11/2018 FUJITSU Description of the Zinrei AI services
0.5 06/11/2018 INRIA Requirements/architecture added and
notes for other partners
0.6 12/11/2018 INTRA Updated deep predictive analytics with
algorithm results and user manual
0.7 13/11/2018 INRIA Updated section on process mining
0.8 14/11/2018 FUJITSU Update on the Zinrei AI services
0.9 19/11/2018 INRIA, INTRA Update on process mining and deep
learning
0.10 25/11/2018 INRIA Review, introduction, conclusion, executive
summary, abstract and keywords
0.11 26/11/2018 INRIA Cleaning of comments and modifications
to make a version ready for review
0.12 27/11/2018 FUJITSU Adaptation to review comments
0.13 28/11/2018 ATOS Integration of cybersecurity prediction
analytics
0.14 29/11/2018 INRIA Last edits (with reviewer comments
addressed)
0.15 29/11/2018 INRIA Clean version (all comments and revisions
validated)
1.0 30/11/2018 INTRA Final version to be submitted to the EC
portal
Table of Contents

Executive Summary ......................................................................................................................... 2
Definitions, Acronyms and Abbreviations ...................................................................................... 7
1 Introduction ............................................................................................................................. 8
1.1 Global vision .......................................................................................................................... 8
1.2 Links with other WPs, tasks and deliverables ....................................................................... 9
1.3 Document Structure .............................................................................................................. 9
2 Requirements ........................................................................................................................ 10
3 Data analysis architecture and processing pipeline .............................................................. 12
4 Predictive techniques for IoT Security ................................................................................... 14
4.1 Prediction models using process mining ...................................................................... 14
4.1.1 Overview ................................................................................................................... 14
4.1.2 Summary of the Data Processing ................................................................................. 15
4.1.3 Process mining algorithms ........................................................................................ 16
4.1.3.1 Inductive miner algorithm .................................................................................... 17
4.1.3.2 Transition system miner algorithm ....................................................................... 22
4.1.4 Evaluation metrics ........................................................................................................ 25
4.1.5 Application to SecureIoT preliminary dataset .............................................................. 27
4.1.6 Requirement mapping .................................................................................................. 32
4.2 Deep learning ................................................................................................................ 33
4.2.1 Overview ....................................................................................................................... 33
4.2.2 Variational Autoencoders ............................................................................................. 34
4.2.3 VAE Architecture .......................................................................................................... 35
4.2.4 VAE Training .................................................................................................................. 36
4.2.5 Application to SecureIoT Use Cases/Results ................................................................ 38
4.2.6 Requirement mapping .................................................................................................. 39
4.2.7 Code Availability & User Manual .................................................................................. 40
5 Zinrai AI services .................................................................................................................... 43
5.1 Description .................................................................................................................... 43
5.2 Potential application to SecureIoT scenarios and datasets .......................................... 44
5.3 Requirement mapping ..................................................................................................... 44
6 Network traffic anomaly detection ....................................................................................... 46
6.1 Live Anomaly Detection System using Machine Learning Methods (L-ADS) ................ 46
6.2 Architecture of the planned solution .................................................................................. 47
6.3 Integration in IoT platforms ................................................................................................ 49
7 Conclusion ............................................................................................................................. 51
References .................................................................................................................................... 52
Table of Figures

Figure 1: Anatomy of the Security Intelligence Layer ................................................................. 13
Figure 2: Process pipeline ........................................................................................................... 14
Figure 3: Example of a Petri net ................................................................................................. 15
Figure 4: Data pre-processing block ........................................................................................... 16
Figure 5: Process mining block ................................................................................................... 17
Figure 6: Petri net deduced from process tree ........................................................................... 17
Figure 7: Directly-follow graph obtained from refined data ...................................................... 18
Figure 8: Cuts of the directly-follow graph [4] ........................................................................... 19
Figure 9: Sequence cut of L ........................................................................................................ 19
Figure 10: Exclusive choice cut of L1 .......................................................................................... 20
Figure 11: Loop cut of L3 ............................................................................................................ 20
Figure 12: Sequence cut of L5 .................................................................................................... 21
Figure 13: Parallel cut of L2 ........................................................................................................ 21
Figure 14: How to transform part of the process tree ............................................................... 22
Figure 15: Parameters to characterize a state ........................................................................... 23
Figure 16: Transition system with states defined by two maximum previous ones .................. 23
Figure 17: Definition of a region S1 to S2 with S2 in S' .............................................................. 24
Figure 18: Definition of a region S1 to S2 with S1 in S' .............................................................. 24
Figure 19: Definition of a region S1 to S2 with S1 and S2 (not) in S' .......................................... 24
Figure 20: Petri net generated by the transition system mining algorithm ............................... 25
Figure 21: Description of a data point extracted from the dataset ............................................ 28
Figure 22: Refined data (XES file) ............................................................................................... 29
Figure 23: Behavioural model generated by the inductive mining algorithm (without filtering) ..... 29
Figure 24: Behavioural model generated by the inductive mining algorithm (with filtering, threshold = 0.07) ..... 30
Figure 25: Behavioural model generated by the inductive mining algorithm (low number of activities, K=1) ..... 31
Figure 26: Extract of the behavioural model generated by the inductive mining algorithm (high number of activities, K=100) ..... 31
Figure 27: An example of a short VAE, based on [7] .................................................................. 34
Figure 28: The general fully connected node (perceptron) ....................................................... 35
Figure 29: The rectifier linear unit function (ReLU) [9] .............................................................. 36
Figure 30: The latent representations of our testing data (CAN dataset) ................................. 38
Figure 31: High-level description of the L-ADS ........................................................................... 47
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices ................................................... 49
List of Tables

Table 1: Maximum cost alignment ............................................................................................. 26
Table 2: Optimal cost alignment ................................................................................................ 26
Table 3: Synthesis of performance metrics with different behavioural models ........................ 31
Table 4: Inductive Miner performance ...................................................................................... 32
Table 5: Transition Miner performance ..................................................................................... 33
Table 6: Samples from Figure 30 ................................................................................................ 39
Table 7: Algorithms properties evaluation ................................................................................. 40
Table 8: Inputs for training Variational Autoencoders ............................................................... 40
Table 9: Inputs for executing Variational Autoencoders ............................................................ 40
Table 10: Mapping of SecureIoT requirements to TensorFlow .................................................. 45
Table 11: Supported algorithms and models ............................................................................. 45
Definitions, Acronyms and Abbreviations

Acronym Title
AI Artificial Intelligence
CE Contextualization Engine
D Demonstrator
DP Data processing
Dx Deliverable (where x defines the deliverable identification number e.g. D1.1.1)
ISKB IoT Security Knowledge Base
ISTE IoT Security Templates Extraction
ML Machine Learning
Mx Month (where x defines a project month e.g. M10)
O Other
OSINT Open-Source Intelligence
OSS Open-Source Software
PaaS Platform-as-a-Service
PM Process mining
PU Public
R Report
RE Restricted to a group specified by the consortium (including Commission Services)
SPEP Security Policy Enforcement Point
TEE Template Execution Engine
TL Task Leader
VAE Variational Autoencoder
WP Work Package
WPL Work Package Leader
WPS Work Package Structure
XES eXtensible Event Stream
1 Introduction

1.1 Global vision

Task T4.2 is dedicated to providing algorithms for enabling predictive security. It is one of the major objectives of the project and can support several types of security mechanisms envisioned in the project: manual analysis (dashboard), attack mitigation, risk assessment and intelligent data collection. Concretely, predictive security refers to the ability to predict that an attack will occur or that a threat will be exploited in the future. It mainly relies on detecting early (weak) signs of an attack. The time horizon of this future can vary from short to long, i.e. seconds to days. There is no a priori known or expected time horizon; it depends on the services that predictive security supports, such as attack mitigation. The prediction must therefore be made in a timeframe that allows the other services to properly integrate its outcome. Such services also expect accurate predictions. More precisely, attack mitigation can leverage prediction outcomes to automatically configure or reconfigure security services. Risk assessment can combine information about the IoT assets with the prediction to refine the scoring. Intelligent data collection must rely on the prediction and its confidence to decide whether additional data should be gathered by monitoring probes in order to improve the accuracy of the prediction, and so help increase the efficiency of the mitigation and the accuracy of the risk assessment.
T4.2 is closely coupled with T4.1. Indeed, the knowledge extracted in T4.1 constitutes a rich source of pre-processed information that predictive techniques may leverage. In this deliverable, we thus recall the requirements that must be satisfied by the predictive security algorithms and, similarly to D4.1 for knowledge extraction, we highlight their integration into the architecture, including the interactions with the other components. Three major predictive techniques are detailed in this deliverable. First, deep learning is investigated by considering autoencoders. To build a predictive approach, it is necessary to embed some historical or long-term representation of a system's behaviour. We can thus concatenate a sequence of data or events and embed it in a single vector, which is then processed by usual algorithms to extract outliers, i.e. the potential attacks. However, those vectors can be very large (with large time frames), leading to long computation times. Autoencoders are therefore helpful, since they reduce the dimensionality of the vectors using a neural network while keeping the representation informative. Another technique, introduced in D4.1, is process mining. We refine it in this deliverable and show how such an approach, designed to learn a model of a process, can also be used to predict the next states of the system. In a nutshell, a learned behavioural model can be used to evaluate whether a new trace of the system execution follows this model or not. The main advantage of the technique is its ability to deal with heterogeneous events, but its drawback is the multiplication of singular events. Therefore, the knowledge extraction using clustering consolidates the raw data into representative states. Finally, we introduce a predictive strategy based on unsupervised machine learning methods. The analysis of the behaviour is based on the processed network traffic of the IoT devices. Any anomaly or significant deviation in
any of the parameters used for the detection of anomalous behaviour is identified as an incident. Such incidents are reported either in the data analysis or as alarms.
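The idea of concatenating event sequences into fixed-size vectors, summarizing them, and extracting outliers can be sketched as follows. This is only an illustrative toy: all function names are our own, and the `encode` step is a deliberately crude stand-in for the learned low-dimensional code a variational autoencoder would produce.

```python
import statistics

def windows(events, size):
    """Concatenate consecutive events into fixed-size vectors (sliding window)."""
    return [events[i:i + size] for i in range(len(events) - size + 1)]

def encode(vec):
    """Toy stand-in for an autoencoder's low-dimensional code: summarise each
    window by (mean, spread). A trained VAE would learn a far more
    informative latent representation."""
    return (statistics.fmean(vec), statistics.pstdev(vec))

def outliers(codes, k=3.0):
    """Flag windows whose mean lies more than k deviations from the global mean."""
    means = [c[0] for c in codes]
    mu, sigma = statistics.fmean(means), statistics.pstdev(means)
    return [i for i, m in enumerate(means) if sigma and abs(m - mu) > k * sigma]

# A steady event stream with one burst: windows covering the burst stand out.
stream = [1.0] * 30 + [9.0] * 3 + [1.0] * 30
codes = [encode(w) for w in windows(stream, 5)]
print(outliers(codes, k=2.0))  # -> [27, 28, 29, 30, 31]
```

The point of the sketch is the pipeline shape, not the statistics: with real data the window vectors become very large, which is exactly where the dimensionality reduction of the autoencoder pays off.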
1.2 Links with other WPs, tasks and deliverables

D4.3 is the first deliverable of T4.2. T4.2 takes as input the following WPs and tasks:
- WP2 defines the requirements that the techniques developed in T4.2 must fulfil. They have been documented in D2.2 and section 2 reviews them.
- WP2 also defines the architecture of the project. It is documented in D2.4 and we refer to it when describing the analytics pipeline in section 3.
- WP4/T4.1 introduces methods for monitoring and extracting knowledge that can be used, as demonstrated in this deliverable, in conjunction with predictive algorithms.
- WP3 and T4.1 jointly define a data model for the data sources used in this project.
- WP3 defines probes that will generate the information/data used for predictive security.
- WP6, based on the (ab)use cases documented in D2.1, will execute the scenarios to generate the data to be analysed.
The output of T4.2 serves as support for the following:
- WP5/T5.1 defines risk assessment and mitigation services that can rely on the analytics
performed in WP4.
- WP3/T3.3 defines the intelligent data collection that can rely on predictive results to
adapt the probe configuration.
- WP3 will consolidate the data model jointly with T4.2 by modelling predictive algorithm
results.
- WP6 will rely on developed techniques for predictive security for testing and validation
purposes.
1.3 Document Structure

This deliverable is structured as follows:
• Section 2 reviews the requirements defined in WP2 related to predictive security.
• Section 3 illustrates the predictive security processing pipeline and shows how it is mapped to the logical components of the SecureIoT architecture.
• Section 4 describes process mining and autoencoders. Theoretical background is given as well as a practical evaluation. Conclusions are drawn regarding the requirements of section 2.
• Section 5 lists OSS that can be used for the algorithms provided in T4.1 and T4.2.
• Section 6 describes a technique for cybersecurity predictive analytics using machine learning and network analysis.
• Section 7 gives a conclusion and introduces the next steps.
2 Requirements

In deliverable D2.2, the analysis led to the identification of particular requirements related to each task. Four are defined for T4.2 and are thus in the scope of this deliverable:
• R4.2.1: Data should be protected during processing by predictive algorithms
• R4.2.2: Predictive analytics must discover and predict threats and vulnerabilities in a timely, scalable, consistent and automated manner
• R4.2.3: Support of multiple prediction algorithms and models
• R4.2.4: Prediction algorithms should describe their constraints

In this deliverable of task T4.2, we particularly aim to investigate different approaches and techniques (R4.2.3). In particular, we propose to work closely on deep learning and process mining methods applied in an IoT context, while also considering existing FIWARE enablers. For each of the proposed methods, this deliverable will assess the following properties from R4.2.2:
• Prediction time: how long a prediction technique takes to process the data and return valuable results, such as predicting a threat. In this project, this cannot be considered only as the absolute time to execute the algorithm. Later, when used in conjunction with mitigation, this time must be estimated against the time needed to properly trigger the counter-measures. Indeed, if a counter-measure is very simple and can be applied very fast, there is no need for predictions over a long-term horizon. Hence, we expect more concrete evaluations in subsequent deliverables produced in T4.2.
• Scalability: IoT data are very heterogeneous. A qualitative analysis of the first datasets provided in SecureIoT was performed in D4.1. Scalability therefore concerns both the volume and the heterogeneity of the provided data. We will thus carefully evaluate how our proposals can meet this requirement.
• Consistency: the predictions of the proposed algorithms and the mitigations they trigger have to be consistent with the monitored system states and the presence of threats. Consistency will be measured using the performance metrics available for each proposed algorithm. For example, for process mining methods, we measure the precision, fitness and generalization of the inferred system models to assess their consistency when predicting future states. Decisions based on the same rulesets and training conditions shall not be contradictory, even if positioned at various levels of the SecureIoT architecture.
• Automated execution: in this deliverable, we will quantify the complexity of configuring the proposed techniques to evaluate the degree of human intervention required.
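As an illustration of how consistency could be quantified, the following sketch computes a simplified replay fitness over directly-follows pairs of events. This is an assumed toy metric for exposition only; the actual process mining evaluation relies on alignment-based fitness, precision and generalization as discussed later in this deliverable.

```python
def directly_follows(traces):
    """Learn the set of allowed directly-follows pairs from training traces."""
    allowed = set()
    for trace in traces:
        allowed.update(zip(trace, trace[1:]))
    return allowed

def fitness(trace, allowed):
    """Simplified replay fitness: share of the trace's consecutive event
    pairs that the learned model allows (1.0 = fully conforming)."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 1.0
    return sum(p in allowed for p in pairs) / len(pairs)

model = directly_follows([["a", "b", "c", "d"], ["a", "c", "b", "d"]])
print(fitness(["a", "b", "c", "d"], model))  # conforming trace -> 1.0
print(fitness(["a", "d", "b", "c"], model))  # deviating trace -> ~0.33
```

A consistency criterion could then require that traces from normal operation score near 1.0 while traces containing threat indicators score measurably lower.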
R4.2.3 is focused on the further integration of the algorithms. For each algorithm, we will thus specify the inputs, optional or required, and the constraints on these inputs if they exist. R4.2.1 is less coupled with the predictive techniques themselves and more related to the process that happens beforehand: the data collection. Therefore, we consider it as a mandatory
requirement to take into account when fulfilling R4.2.4. Our techniques must never request or force the collection of private data on their own.
3 Data analysis architecture and processing pipeline
In D4.1, we provided a general overview of the processing pipeline. While D4.1 focuses on continuous security monitoring and knowledge inference, the pipeline here is similar, because the major difference between the algorithms resides in their ability to characterize the current state of a system (D4.1) or its future state(s) (D4.3).

As inputs of the predictive techniques, we thus again find dynamic data collected from a system in production, assets or static contextual data, and external data. More details about those are given in D4.1. However, in order to leverage the added value provided by the algorithms defined in T4.1, another input for the predictive techniques is the characterization of the current state of the monitored system. It has the advantage of condensing the information (rather than relying on full raw data) and extracting relevant indicators (for example using clustering techniques).
Figure 1 refers to D2.4, which provides the architecture of the project. The figure focuses on WP4 integration and highlights how the analytics process will be integrated. In D4.1, we summarized the different components of the architecture. The following is just a reminder, as their roles do not change for predictive security:
• Dynamic data are provided by the Global Storage module. The latter is in charge of storing
all data sent by the monitoring probes of the live system. It will rely on the modelling described
in section 2.3.4.2.
• Context data are provided to the Contextualization Engine (CE): The role of this engine is
actually to fit the algorithms or pre-established learned models to a specific context, i.e. a
specific IoT deployment.
• External data are collected within IoT Security Knowledge Base (ISKB): This knowledge base
comprises external IoT security knowledge, including for example knowledge about known
threats, attacks, incidents and vulnerabilities.
However, for the others, there exist specificities related to predictive techniques that are
highlighted below.
• IoT Security Templates Extraction (ISTE): This module aims at creating models for both the security monitoring (T4.1) and predictive security (T4.2). In this module, we will build the different algorithms defined in these two tasks. In the case of predictive security, it will produce models capable of predicting the future states of the system, in particular whether an attack will occur or whether the system will be vulnerable and exposed to a threat.
• IoT Security Templates Database: Models previously built by the ISTE are stored in a
persistent manner in this database.
• Template Execution Engine (TEE): In the context of T4.2, the Template Execution Engine will apply the stored models to detect potential future attacks, threats or anomalies, and will provide this information to the Security Policy Enforcement Point (SPEP).
• Security Policy Enforcement Point (SPEP): It then turns the outputs of the predictive
algorithms into mitigation decisions to be applied through actuation such as modifying the
configurations of devices, deploying counter-measures, etc.
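To make the interaction between these components concrete, the following minimal sketch mimics the ISTE, Templates Database, TEE and SPEP flow. All class and function names, and the toy "model", are our own assumptions for illustration, not the actual SecureIoT interfaces.

```python
# Illustrative only: names and logic are assumptions, not SecureIoT APIs.

class TemplatesDatabase:
    """IoT Security Templates Database: persists models built by the ISTE."""
    def __init__(self):
        self._models = {}
    def store(self, name, model):
        self._models[name] = model
    def load(self, name):
        return self._models[name]

def iste_extract(training_events):
    """ISTE: build a (toy) predictive model -- here, simply the set of
    event types observed during normal operation."""
    return set(training_events)

def tee_execute(model, live_events):
    """TEE: apply the stored model to live data and emit predictions
    (here: events never seen in training, treated as signs of a threat)."""
    return [e for e in live_events if e not in model]

def spep_enforce(alerts):
    """SPEP: turn predictions into mitigation decisions."""
    return [f"mitigate:{a}" for a in alerts]

db = TemplatesDatabase()
db.store("baseline", iste_extract(["login", "read", "write"]))
alerts = tee_execute(db.load("baseline"), ["read", "firmware_flash"])
print(spep_enforce(alerts))  # -> ['mitigate:firmware_flash']
```

The real pipeline replaces the set-membership "model" with the process mining and autoencoder models of section 4, but the division of responsibilities between the components is the same.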
Figure 1: Anatomy of the Security Intelligence Layer
4 Predictive techniques for IoT Security

4.1 Prediction models using process mining

We present in this section the process mining methods used to generate prediction models to
infer monitored system models and predict their deviations and anomalies. Process mining
methods have been widely used for the analysis and the building of workflow models for business
process management applications through the analysis of their event logs. However, in this
project we apply them in a new context to build predictive models for IoT systems to identify
their anomalies when they deviate from their expected states. To be applicable, process mining
requires refining the raw data available in the project, since it cannot directly interpret
continuous values as states. In that context, we detail the prior data processing required to
transform raw data into refined data interpretable by a process miner, the algorithms supporting
the process mining activity and how they are used to generate behavioral models for prediction,
and finally different evaluation metrics.
4.1.1 Overview
The processing pipeline, detailed in Figure 2, summarizes the overall architecture of our
predictive approach for supporting IoT security. It is composed of three main blocks: the data
pre-processing block, the process mining block and the prediction block. The first two
correspond to the ISTE and the last one relates to the TEE. This pipeline takes as input raw
data, corresponding either to training datasets used by the process miner, or to live
monitoring data used for prediction purposes against the behavioral models produced by the
process miner.
Figure 2: Process pipeline
During the training phase, the raw data first have to be transformed, during a data pre-processing
step, to generate refined data interpretable by the process miner. We consider a
commonly-used process mining tool, called ProM, which requires an input file specified according
to the XML eXtensible Event Stream (XES) format [1] representing event logs. We have
considered in this deliverable three datasets, provided by LuxAi, ISPRINT and IDIADA, which
describe application data in the JSON format. We have mostly used the dataset provided by
ISPRINT, as an illustrative example. Once the raw data have been transformed into refined data
(XES file), they can be used by process mining algorithms to generate a behavioral model of the
observed system.
This behavioral model is formally expressed as a Petri net representing a discrete event model of
the system. This Petri net corresponds to a bipartite graph, i.e. its nodes can be split into two
disjoint and independent sets. The first set, containing the places (circles), represents the states
of the system. The second one, containing the transitions (boxes), corresponds to the events that
indicate a change of state. The Petri net also contains one or several token(s), whose
distribution over the places determines which transitions can fire.
Figure 3 Example of a Petri net
A simple example of a Petri net is given in Figure 3. In this figure, there are four places,
noted P1, P2, P3 and P4, and two transitions, noted T1 and T2. Events make it possible to fire
(cross) the transitions. However, a transition can only be fired when each of its input places
contains at least one token. When a transition fires, the tokens in the input
places are deleted and new tokens are built in the output places.
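This firing rule can be made concrete with a small sketch. The arc layout chosen below (T1 splits a token from P1 into P2 and P3, T2 joins them into P4) is an assumption for illustration; Figure 3 may wire its places differently.

```python
class PetriNet:
    """Minimal Petri net with token-based firing."""

    def __init__(self, places, transitions):
        self.marking = {p: 0 for p in places}   # tokens currently in each place
        self.transitions = transitions          # name -> (input places, output places)

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        # a transition is enabled only if every input place holds a token
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:                        # tokens in input places are deleted
            self.marking[p] -= 1
        for p in outputs:                       # new tokens are built in output places
            self.marking[p] += 1

net = PetriNet(["P1", "P2", "P3", "P4"],
               {"T1": (["P1"], ["P2", "P3"]),
                "T2": (["P2", "P3"], ["P4"])})
net.marking["P1"] = 1   # initial token
net.fire("T1")          # consumes the token of P1, produces tokens in P2 and P3
net.fire("T2")          # joins the two tokens into P4
```

After the two firings, the single token has moved from P1 to P4 and no transition is enabled anymore.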
During the prediction phase, the raw data correspond to live monitoring data. They are also
transformed by the pre-processing step to build refined data. These refined data are then
compared by the prediction block to the behavioral models, in order to predict any abnormal
behaviors.
In the rest of the section, we will recall the methods used for supporting the data pre-processing
(DP), then detail the process mining (PM) algorithms that are exploited by the process miner,
and overview different evaluation metrics with respect to the generated behavioral models.
4.1.2 Summary of the Data Processing
We will describe in this sub-section how the data pre-processing block generates refined data
(i.e. an interpretable XES file) from raw data. It relies on three sub-blocks, called respectively
feature selection, data normalization and data clustering, as depicted in Figure 4.
First of all, the states of the observed system have to be defined. A state is represented by a tuple
of features (𝑎, 𝑏, 𝑐 … ), where two tuples have to be strictly equal to correspond to the same
state. In particular, (𝑎1, 𝑏1, 𝑐1) and (𝑎2, 𝑏2, 𝑐2) correspond to the same state, if and only if 𝑎1 =
𝑎2, 𝑏1 = 𝑏2 and 𝑐1 = 𝑐2. However, this approach is inadequate to handle non-categorical
features.
Indeed, for continuous numerical attributes (such as temperature values), it does not make sense
to use the raw values directly. For example, in the file “bathroom_environment_it.json” from the
ISPRINT dataset, the temperature, the illuminance and the humidity take so many different values
that almost every tuple (temperature, illuminance, humidity) would be unique, inducing a high
number of states and preventing the inference of a real behavioural model.
Figure 4: Data pre-processing block
As a consequence, as depicted in Figure 4, continuous numerical data are processed by the data
normalization sub-block and clustering sub-block to be grouped into clusters, while the other
data (boolean and categorical ones) are directly used to generate the refined data set.
The data normalization sub-block relies on different techniques, such as min-max, z-score or
modified tanh normalization, which have been detailed in D4.1. The clustering sub-block then
groups continuous numerical data that share similarities into clusters corresponding to
single states. Different clustering algorithms, such as DBSCAN, K-Means and BIRCH, described in
D4.1, are considered in this work. They reduce the number of states that characterize
the system and generate a refined data set exploitable by the process miner.
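A minimal, pure-Python sketch of this two-step discretization is given below. The project relies on the normalization and clustering implementations described in D4.1; the tiny 1-D k-means here is only illustrative.

```python
import statistics

def z_score(values):
    """Normalize continuous readings to zero mean and unit variance."""
    mean, std = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means returning a cluster id per value (stand-in for the
    DBSCAN / K-Means / BIRCH options described in D4.1)."""
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = statistics.mean(members)
    return labels

# Bathroom temperatures: baseline readings vs. bath-time readings
temperatures = [19.4, 19.6, 19.5, 27.8, 28.1, 19.3, 27.9, 28.0]
labels = kmeans_1d(z_score(temperatures), k=2)
```

Each cluster id can then be used as a categorical value, so two readings falling in the same cluster correspond to the same state.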
4.1.3 Process mining algorithms

The next block of the process pipeline corresponds to the process mining (PM) algorithms, as
depicted in Figure 5. They infer a behavioral model from the refined data (XES file). In
that context, we consider two different PM algorithms: (1) the inductive miner
algorithm, and (2) the transition system miner algorithm. Both lead to the building of a Petri
net characterizing the observed system.
Figure 5: Process mining block
4.1.3.1 Inductive miner algorithm
The first process mining approach corresponds to the inductive miner algorithm [4] [5] [6].
Before describing its different steps, some important notions have to be introduced: the directly-
follow graph, the transition 𝜏, and the process tree. We will then illustrate its operations with an
example used in [1], based on the following log 𝐿: [< 𝑎, 𝑏, 𝑐, 𝑎, 𝑏, 𝑒, 𝑓 >^50, < 𝑎, 𝑏, 𝑓, 𝑒 >^100,
< 𝑑, 𝑒, 𝑓 >^100, < 𝑑, 𝑓, 𝑒 >^100], where the exponent of each trace denotes its number of
occurrences in the log.
A directly-follow graph is built from the refined data (event log). It is a graph where
each node corresponds to a state observed in the event log, and where an edge exists between two
states only if this transition exists in the log. Thus, when an edge between two states 𝑆1 and
𝑆2 exists in the graph, a trace of the form < ⋯ , 𝑆1, 𝑆2, … > was present in the refined data.
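Building the directly-follow graph from the example log L amounts to counting consecutive pairs of activities, weighted by the trace frequencies; a sketch (traces written as strings):

```python
from collections import Counter

def directly_follows(log):
    """Directly-follow graph as weighted edges (s1, s2) -> count, from a log
    given as (trace, frequency) pairs, e.g. <a,b,c,a,b,e,f>^50 -> ("abcabef", 50)."""
    edges = Counter()
    for trace, freq in log:
        for s1, s2 in zip(trace, trace[1:]):   # consecutive pairs in the trace
            edges[(s1, s2)] += freq
    return edges

log = [("abcabef", 50), ("abfe", 100), ("def", 100), ("dfe", 100)]
dfg = directly_follows(log)
```

For instance, the edge a→b gets weight 200 (twice per trace in the first variant, once in the second).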
When using this mining algorithm, a transition 𝜏 may appear. This transition corresponds to a “silent
activity” [2], i.e. an activity that cannot be observed in the event log but that permits reaching a
new place in the Petri net. The Petri net generated by the inductive miner algorithm has a unique
start place and a unique end place, so silent activities may have to be added to connect it
completely.
A process tree is a compact and abstract representation of a Petri net. Each leaf is an activity
(a state, in our case), and the internal nodes correspond to operators that describe activity
interactions. There are four operators: the exclusive choice ×, the sequential composition →,
the loop ↺, and the parallel composition ^. For example, the literal expression of the process tree corresponding to
the Petri net given in Figure 6 is the following one:
→ (× (↺ (→ (𝑎, 𝑏), 𝑐), 𝑑), ^(𝑒, 𝑓))
Figure 6 Petri net deduced from process tree
The different steps related to the inductive miner algorithm are detailed below:
1. Build the directly-follow graph from the refined data (event log). Figure 7
corresponds to the directly-follow graph obtained from the event log used to
generate the example given in Figure 6, where the weight of an edge between two
states corresponds to the number of times this transition has been
observed in the event log. In the graph, green represents an input
transition and red an output transition.
Figure 7 Directly-follow graph obtained from refined data
2. Deduce a process tree from this graph. To do so, the algorithm searches for the
most adequate operator to apply. The set of activities is then split into two
disjoint sets, noted 𝑠𝑒𝑡1 and 𝑠𝑒𝑡2. The log 𝐿 is consequently split into two sub-
logs 𝐿1 and 𝐿2, where only activities from 𝑠𝑒𝑡1 appear in 𝐿1, and only activities
from 𝑠𝑒𝑡2 appear in 𝐿2. A new cut is then performed on the obtained sub-logs,
until each activity set contains only one element. Each step cuts the directly-follow
graph, and the chosen operator corresponds to the one that leads to groups with the
highest number of nodes. The different operators to split the directly-follow graph
are illustrated in Figure 8.
Figure 8 Cuts of the directly-follow graph [4]
In the considered example, the first cut is illustrated in Figure 9 and corresponds
to a sequence operator. Consequently, the different activities of the
initial 𝑠𝑒𝑡: {𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓} are split into 𝑠𝑒𝑡1: {𝑎, 𝑏, 𝑐, 𝑑} and 𝑠𝑒𝑡2: {𝑒, 𝑓}, so the
initial log 𝐿: [< 𝑎, 𝑏, 𝑐, 𝑎, 𝑏, 𝑒, 𝑓 >^50, < 𝑎, 𝑏, 𝑓, 𝑒 >^100, < 𝑑, 𝑒, 𝑓 >^100,
< 𝑑, 𝑓, 𝑒 >^100] is split into 𝐿1: [< 𝑎, 𝑏, 𝑐, 𝑎, 𝑏 >^50, < 𝑎, 𝑏 >^100, < 𝑑 >^200]
and 𝐿2: [< 𝑒, 𝑓 >^150, < 𝑓, 𝑒 >^200].
Figure 9 Sequence cut of L
Let us then consider 𝐿1. The algorithm selects an exclusive choice operator to split
this sub-log, as illustrated in Figure 10. The activities from 𝑠𝑒𝑡1: {𝑎, 𝑏, 𝑐, 𝑑} are split
into 𝑠𝑒𝑡3: {𝑎, 𝑏, 𝑐} and 𝑠𝑒𝑡4: {𝑑}. So, the log 𝐿1: [< 𝑎, 𝑏, 𝑐, 𝑎, 𝑏 >^50,
< 𝑎, 𝑏 >^100, < 𝑑 >^200] is split into 𝐿3: [< 𝑎, 𝑏, 𝑐, 𝑎, 𝑏 >^50, < 𝑎, 𝑏 >^100] and 𝐿4: [<
𝑑 >^200]. Since 𝑠𝑒𝑡4 contains only one activity, we have found our first process
tree leaf, and there is no need to split 𝐿4 any further.
Figure 10 Exclusive choice cut of L1
The sub-log L3 is then considered. A loop operator is selected by the algorithm, as
depicted in Figure 11. The activities are split from set3: {a, b, c} into set5: {a, b} and
set6: {c}, so the log L3: [<a, b, c, a, b>^50, <a, b>^100] is split into L5: [<a, b>^200] and
L6: [<c>^50].
Figure 11 Loop cut of L3
The sub-log 𝐿5 is now considered. As shown in Figure 12, the sequence operator
is applied on this sub-log. The activities are therefore split from 𝑠𝑒𝑡5: {𝑎, 𝑏}
into 𝑠𝑒𝑡7: {𝑎} and 𝑠𝑒𝑡8: {𝑏}. Consequently, the log 𝐿5: [< 𝑎, 𝑏 >^200] is split into
𝐿7: [< 𝑎 >^200] and 𝐿8: [< 𝑏 >^200].
Figure 12 Sequence cut of L5
The sub-log 𝐿2 is finally considered and split with the parallel operator, as shown
in Figure 13. The activities are thus split from 𝑠𝑒𝑡2: {𝑒, 𝑓} into 𝑠𝑒𝑡9: {𝑒}
and 𝑠𝑒𝑡10: {𝑓}, and the log 𝐿2: [< 𝑒, 𝑓 >^150, < 𝑓, 𝑒 >^200] is split into
𝐿9: [< 𝑒 >^350] and 𝐿10: [< 𝑓 >^350].
Figure 13 Parallel cut of L2
The complete process tree discovered by the algorithm is given by the literal
expression: → (× (↺ (→ (𝑎, 𝑏), 𝑐), 𝑑), ^(𝑒, 𝑓)).
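Each of the successive cuts above amounts to projecting the log onto an activity set; the sketch below reproduces the first sequence cut of the example (traces written as strings, frequencies as counts):

```python
from collections import Counter

def project(log, activities):
    """Project a log onto an activity set: keep only those activities in each
    trace, drop empty traces, and merge identical sub-traces."""
    sublog = Counter()
    for trace, freq in log.items():
        kept = "".join(a for a in trace if a in activities)
        if kept:
            sublog[kept] += freq
    return dict(sublog)

L = {"abcabef": 50, "abfe": 100, "def": 100, "dfe": 100}
L1 = project(L, set("abcd"))   # first sequence cut: activities {a, b, c, d}
L2 = project(L, set("ef"))     # first sequence cut: activities {e, f}
```

The two projections reproduce the sub-logs of the text: L1 contains <a,b,c,a,b>^50, <a,b>^100 and <d>^200, while L2 contains <e,f>^150 and <f,e>^200.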
3. Infer a Petri net characterizing the behavior of the observed system from the
process tree. In the considered example, the Petri net is detailed in Figure 6. Figure
14 represents how to transform every link between two elements (E1 and E2) of
the process tree literal expression into a Petri net.
Figure 14 How to transform part of the process tree
A filter may also be applied to remove edges that occur rarely. Technically, a noise threshold is
defined so as to keep only the most frequent outgoing edges of each activity.
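One possible reading of this filtering keeps an edge if its weight reaches a fraction of the strongest outgoing edge of its source activity (the exact heuristic implemented in ProM may differ):

```python
def filter_edges(dfg, threshold):
    """Keep, for each source activity, only outgoing edges whose weight is at
    least `threshold` times the weight of its strongest outgoing edge."""
    strongest = {}
    for (src, _), w in dfg.items():
        strongest[src] = max(strongest.get(src, 0), w)
    return {edge: w for edge, w in dfg.items()
            if w >= threshold * strongest[edge[0]]}

dfg = {("a", "b"): 200, ("b", "c"): 50, ("b", "e"): 50,
       ("b", "f"): 100, ("c", "a"): 50}
filtered = filter_edges(dfg, 0.6)   # drops the rare b->c and b->e edges
```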
4.1.3.2 Transition system miner algorithm
The second process mining approach corresponds to the transition system miner algorithm [7].
A transition system is a graph (S, E, T), where S represents the states, E the events in the log,
and T the edges between states, labeled with the associated events.
The different steps of the transition system miner algorithm are described below:
1. Create the transition system. States are not explicitly given by the refined data
(event log), so the definition of these states can be parameterized. The
algorithm can take into account the past and the future (more or less distant), as
well as additional explicit knowledge, to characterize a state. Figure 15 summarizes
the four parameters used by the algorithm to characterize the states, over all
executions of every specific instance, i.e. over all traces.
Figure 15 Parameters to characterize a state
In our scenarios, we cannot consider the future parameter, because our goal is to
determine the next steps from the behavioral model using only information from
the past. Figure 16 gives an example of a transition system, where a state is
defined by at most the two previous events of the observed system.
Figure 16 Transition system with states defined by two maximum previous ones
2. Apply techniques to simplify the transition system, such as deleting loops or
merging states that have the same inputs and/or outputs.
3. Build the Petri net from the transformed transition system, using the concept of
regions. Each minimal region found will become a place in the generated Petri net.
Consider a transition system TS = (S, E, T) and a subset S’ of the states S. S’ is
a region when, for each event e from E, one of the following conditions holds:
o All the transitions S1 → S2 labeled with e go into S’, i.e. S1 is not in S’ and
S2 is in S’, as illustrated in Figure 17.
Figure 17 Definition of a region S1 to S2 with S2 in S'
o All the transitions S1 → S2 labeled with e go out of S’, i.e. S1 is in S’ and S2
is not in S’ (Figure 18).
Figure 18 Definition of a region S1 to S2 with S1 in S'
o No transition S1 → S2 labeled with e crosses S’, i.e. either both S1 and S2 are
in S’, or neither of them is, as illustrated in Figure 19.
Figure 19 Definition of a region S1 to S2 with S1 and S2 (not) in S'
For our example, the Petri net generated by the transition system mining
algorithm using minimal regions is illustrated in Figure 20.
Figure 20 Petri net generated by the transition system mining algorithm
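Step 1 above, with a state defined by the (at most) two previous events as in Figure 16, can be sketched as follows:

```python
def transition_system(traces, k=2):
    """Build a transition system where a state is the tuple of the (at most)
    k last events seen so far in a trace."""
    states, edges = set(), set()
    for trace in traces:
        state = ()              # initial state: no event seen yet
        states.add(state)
        for event in trace:
            nxt = (state + (event,))[-k:]   # keep at most k past events
            edges.add((state, event, nxt))
            states.add(nxt)
            state = nxt
    return states, edges

states, edges = transition_system(["abfe", "dfe"], k=2)
```

On these two traces, the construction yields seven states, with for instance the edge (a, b) --f--> (b, f).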
Once the model is built, it may be used with a new event log in order to predict unusual
behaviour. Indeed, if a deviation from the normal behaviour model is found while checking a log
file, then this deviation is considered as unexpected and potentially dangerous.
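The region conditions of step 3 can also be checked programmatically; a sketch assuming the transition relation is given as (S1, event, S2) triples:

```python
def is_region(edges, region):
    """Check the region conditions of step 3: for every event, all of its
    transitions must enter `region`, all must exit it, or none may cross it.
    `edges` is a set of (s1, event, s2) triples."""
    by_event = {}
    for s1, e, s2 in edges:
        by_event.setdefault(e, []).append((s1, s2))
    for moves in by_event.values():
        kinds = set()
        for s1, s2 in moves:
            if s2 in region and s1 not in region:
                kinds.add("enter")
            elif s1 in region and s2 not in region:
                kinds.add("exit")
            else:
                kinds.add("no-cross")   # both inside or both outside
        # every transition of the event must behave the same way
        if kinds not in ({"enter"}, {"exit"}, {"no-cross"}):
            return False
    return True
```

For example, with edges {("s1", "e", "s2"), ("s3", "e", "s4"), ("s1", "f", "s3")}, the subset {"s2", "s4"} is a region (all e-transitions enter it, the f-transition stays outside), whereas {"s2"} is not.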
According to [3] and [4], the two algorithms described above have three major differences.
The first is their simplicity of use: the inductive miner algorithm and its variants have fewer
parameters than the transition system miner, and for both algorithms a slight change in one
parameter can strongly modify the resulting model. The second is the metrics that can be
focused on. For example, it is possible to create a model with a perfect fitness by using the
inductive miner, but it is much more difficult to improve the other metrics. For its part, the
transition system miner allows focusing on simplicity or generalization during its first and
second steps, and on fitness or precision during the third step. The last difference is their
complexity: the main drawback of the transition system miner is a complexity that can be
exponential with respect to the size of the log, whereas the inductive miner is easily scalable.
4.1.4 Evaluation metrics
It is important to evaluate the performance of the behavioral models generated by PM
algorithms. First, the behavioral model and the corresponding event log have to be aligned using
a dedicated method, which can be described as follows. For each event of the
event log, if the same move (i.e. moving from one state to another) can be done on both the model
and the considered log, then this event is considered as synchronized. The alignment cost is
obtained by adding all the movement costs. A movement cost is equal to 0, when the model and
the log are synchronized, otherwise it is equal to 1. The goal of the alignment method is to find
the optimal alignment, i.e. the alignment with the minimal cost.
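Given one fixed firing sequence of the model, the optimal alignment cost can be computed with a small dynamic program, as in the sketch below. This is a simplification: a full aligner searches over all firing sequences of the Petri net, whereas here the model path is fixed (the path "t, a, b, t, t, e, t", with "t" standing for τ, is the one used for the trace <a,b,f,e> in the tables that follow).

```python
from functools import lru_cache

def alignment_cost(log_trace, model_path, tau="t"):
    """Optimal alignment cost between a log trace and one fixed model path:
    synchronized moves and silent (tau) model moves cost 0, every other
    desynchronized move costs 1."""
    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == len(log_trace) and j == len(model_path):
            return 0
        options = []
        if i < len(log_trace):                        # move on log only
            options.append(1 + cost(i + 1, j))
        if j < len(model_path):                       # move on model only
            options.append((0 if model_path[j] == tau else 1) + cost(i, j + 1))
        if (i < len(log_trace) and j < len(model_path)
                and log_trace[i] == model_path[j]):   # synchronized move
            options.append(cost(i + 1, j + 1))
        return min(options)
    return cost(0, 0)

optimal = alignment_cost("abfe", ["t", "a", "b", "t", "t", "e", "t"])
```

Here the only desynchronized non-silent move is the f of the log, so the optimal cost is 1.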
To give a more concrete idea of the alignment method, the Petri net given in Figure 6 is used as
an example with the trace < 𝒂, 𝒃, 𝒇, 𝒆 >. Table 1 provides a possible configuration for this
alignment, with the maximum possible cost. The symbol “>>” means that the moves in the model and
in the event log are desynchronized; each desynchronized move on a regular activity adds 1 to the
alignment cost, while desynchronized moves on the silent activity 𝜏 cost 0. The optimal cost
alignment for this example is given in Table 2.

Move Log   | a  | b  | f  | e  | >> | >> | >> | >> | >> | >> | >>
Move Model | >> | >> | >> | >> | τ  | a  | b  | τ  | τ  | e  | τ
Table 1 Maximum cost alignment

Move Log   | >> | a | b | >> | >> | f  | e | >>
Move Model | τ  | a | b | τ  | τ  | >> | e | τ
Table 2 Optimal cost alignment
Once the alignment is done, it is possible to quantify the performance of the behavioral models
with regard to the four metrics detailed below:
-Fitness: this metric indicates whether the generated model, noted P, can replay the
event log, noted L, in an accurate manner. The closer this metric is to 1, the more the
model is capable of replaying the given log. The value of this metric is calculated with the
formula below.

𝐹𝑖𝑡𝑛𝑒𝑠𝑠(𝑃, 𝐿) = 1 − 𝑓𝑐𝑜𝑠𝑡(𝑃, 𝐿) / (𝑀𝑜𝑣𝑒(𝐿) + 𝐿𝑒𝑛𝑔𝑡ℎ(𝐿) × 𝑀𝑜𝑣𝑒(𝑃))

Fcost(P,L) is the optimal alignment cost between L and P, Move(L) is the total cost of
desynchronized moves on the log, Move(P) is the same total cost on the model, and
Length(L) is the number of events in the log. The denominator of the formula represents
the maximum possible value of the total alignment cost, reached when there is not a single
synchronized move between the log L and the model P in the optimal alignment.
-Generalization: this metric indicates whether the model P is general enough to include
behaviors that are not in the log L. The closer this metric is to 1, the more general the
model is (i.e. the more unknown behaviour can be played by the model P). Maximizing
this value avoids overfitting, as it allows the model to adapt to a new set of events.
Calculating and using the generalization metric is not trivial, because it refers to unseen
examples and to how the model reacts to them. As given in [8], the objective is to
consider the relationship among the number of activities leaving a state (noted w), the
number of times this state was visited (noted n), and the probability of discovering a
previously unseen activity the next time the given state is visited, noted pnew(w,n). The
value of this metric is calculated with the formula below.

𝐺𝑒𝑛𝑒𝑟𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛(𝑃, 𝐿) = 1 − (∑𝑒∈𝐿 𝑝𝑛𝑒𝑤(𝑤, 𝑛)) / 𝐿𝑒𝑛𝑔𝑡ℎ(𝐿)

According to [9], the pnew(w,n) value can be estimated as follows:

𝑝𝑛𝑒𝑤(𝑤, 𝑛) = 𝑤(𝑤 + 1) / (𝑛(𝑛 − 1)) if 𝑛 ≥ 𝑤 + 2, and 1 otherwise
-Precision: this metric indicates whether the events in the log L strictly follow the model
P. The closer this metric is to 1, the fewer additional behaviours not described
in the log can be played by the given model. Maximizing this value avoids
underfitting, and thus over-generalization. The value of this metric is calculated
with the formula below.

𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛(𝑃, 𝐿) = (1 / 𝐿𝑒𝑛𝑔𝑡ℎ(𝐿)) × ∑𝑒∈𝐿 𝐹𝑖𝑟𝑒𝑑(𝑒) / 𝐸𝑛𝑎𝑏𝑙𝑒𝑑(𝑒)

S is the state right before executing the event e in the log, Fired(e) is the number of
activities of S already activated, Enabled(e) is the number of possible activities of S, and
Length(L) is the number of events in the log.
-Simplicity: this metric characterizes how simple the model is. Its value can typically be
given by the number of states and transitions of the model. At the moment, this metric is
not the most important one: simplifying the behavioral model makes the fitness and
precision drop, and our goal is of course to build models that
describe accurately the observed system.
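The formulas above can be transcribed directly; the helpers below take the alignment quantities as plain numbers (a sketch, not the ProM implementation):

```python
def fitness(fcost, move_l, move_p, length_l):
    """Fitness(P, L) = 1 - fcost / (Move(L) + Length(L) * Move(P))."""
    return 1 - fcost / (move_l + length_l * move_p)

def p_new(w, n):
    """Estimated probability of observing an unseen activity on the next
    visit of a state with w outgoing activities, visited n times [9]."""
    return w * (w + 1) / (n * (n - 1)) if n >= w + 2 else 1.0

def generalization(p_new_values):
    """Generalization(P, L) = 1 - (sum of p_new over the events) / Length(L)."""
    return 1 - sum(p_new_values) / len(p_new_values)

def precision(fired_enabled):
    """Precision(P, L) = average of Fired(e) / Enabled(e) over the events."""
    return sum(f / e for f, e in fired_enabled) / len(fired_enabled)
```

As expected, a zero alignment cost yields a fitness of 1, and a frequently visited state with few outgoing activities contributes a small p_new, i.e. a high generalization.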
4.1.5 Application to SecureIoT preliminary dataset

We will illustrate the application of the processing pipeline described in Figure 2, with the ISPRINT
dataset. The data included in the ISPRINT dataset come from a simulation of CloudCare2U, a solution
enabling chronic disease patients to have a life as normal as possible. The information in the
dataset comes from room sensors, such as temperature and illuminance sensors, or from particular
devices used by the patients, like heartbeat monitors (D4.1).
The principle of process mining algorithms is to work with different traces that record the
behavior of the observed system several times. For this dataset, the JSON file containing
application data can easily be split by considering that each recorded day stands for a new set
of events, and therefore a new trace. Let us focus on the JSON file (“bathroom_environment_it.json”) to bring
out what is a normal behavior inside the bathroom using the sensors. In this file, each data point
is described as given in Figure 21.
{
"id":"2017-03-01T00:00:00.000Z",
"key":"2017-03-01T00:00:00.000Z",
"value":{
"rev":"1-03c148895758691766c23c94e915c6f6"
},
"doc":{
"_id":"2017-03-01T00:00:00.000Z",
"_rev":"1-03c148895758691766c23c94e915c6f6",
"LPG":0,
"door_open":true,
"illuminance":0.0,
"temperature":19.460684278090969,
"humidity":60.54275897145564,
"NG":0,
"CO":0,
"movement":false,
"timestamp":"2017-03-01T00:00:00.000Z"
}
}
Figure 21 Description of a data point extracted from the dataset
Currently, the manual selection of features is mandatory. In the “doc” part of the JSON file, all
the elements except “_id” and “_rev” are selected to be used by the process mining block. As
described above, normalization and clustering are performed on continuous numerical data, namely
Liquefied Petroleum Gas consumption (LPG), illuminance, temperature, humidity, Natural Gas
consumption (NG) and carbon monoxide rate (CO). The goal is to find clusters in which these
numerical features share similarities; each cluster is then exploited as a categorical value to
define a given state in the Petri net.
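A minimal sketch of this feature selection on the data point of Figure 21 (field values abbreviated; the helper name is ours):

```python
import json

SAMPLE = """{"doc": {"_id": "2017-03-01T00:00:00.000Z", "_rev": "1-03c",
  "LPG": 0, "door_open": true, "illuminance": 0.0,
  "temperature": 19.460684278090969, "humidity": 60.54275897145564,
  "NG": 0, "CO": 0, "movement": false,
  "timestamp": "2017-03-01T00:00:00.000Z"}}"""

def select_features(record, excluded=("_id", "_rev")):
    """Keep every field of the "doc" object except the excluded ones."""
    return {k: v for k, v in record["doc"].items() if k not in excluded}

features = select_features(json.loads(SAMPLE))
# Float-typed fields are candidates for normalization/clustering (in this
# sample, LPG, NG and CO happen to be integer-valued)
continuous = [k for k, v in features.items() if isinstance(v, float)]
```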
After looking at the bathroom data, the behavior can easily be summarized as follows:
-During the day, the illuminance rises then drops, probably because of the sunlight.
-At some time periods, the temperature and humidity rise significantly, probably because
someone is taking a bath.
Concerning the continuous numerical data processing, the z-score normalization and the K-
means clustering (with k=4) have been used for the first experiments. Once the data pre-
processing is done, the refined data (XES file) are produced. The file aggregates the values
obtained after clustering with the other features (boolean and categorical values). To facilitate
the execution of some algorithms, a new element, called “name”, is introduced to indicate each
feature that describes an event. An extract of the resulting file is given in Figure 22.
<trace>
<string key="concept:name" value="bathroom_environment_it.json" />
<event>
<string key="concept:clusterId" value="3" />
<float key="concept:LPG_bathroom_environment_it" value="0" />
<boolean key="concept:door_open_bathroom_environment_it" value="true" />
<float key="concept:illuminance_bathroom_environment_it"
value="0.9080663113006865" />
<float key="concept:temperature_bathroom_environment_it"
value="18.60439614860679" />
<float key="concept:humidity_bathroom_environment_it"
value="59.388833510963686" />
<float key="concept:NG_bathroom_environment_it" value="0" />
<float key="concept:CO_bathroom_environment_it" value="0" />
<boolean key="concept:movement_bathroom_environment_it" value="true" />
<date key="time:timestamp" value="2017-03-02T06:02:00.000Z" />
<string key="concept:name" value="0" />
</event>
…
</trace>
Figure 22 Refined data (XES file)
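Refined data in this shape can be produced with the standard library; a sketch generating one trace with one event (the attribute keys follow Figure 22, while the helper itself is ours, not part of the pipeline):

```python
import xml.etree.ElementTree as ET

def make_event(cluster_id, features, timestamp):
    """Build one XES <event> element using the attribute keys of Figure 22."""
    event = ET.Element("event")
    ET.SubElement(event, "string",
                  {"key": "concept:clusterId", "value": str(cluster_id)})
    for name, value in features.items():
        tag = "boolean" if isinstance(value, bool) else "float"
        ET.SubElement(event, tag,
                      {"key": f"concept:{name}", "value": str(value).lower()})
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})
    return event

trace = ET.Element("trace")
ET.SubElement(trace, "string",
              {"key": "concept:name", "value": "bathroom_environment_it.json"})
trace.append(make_event(3, {"door_open": True, "temperature": 18.6},
                        "2017-03-02T06:02:00.000Z"))
xes = ET.tostring(trace, encoding="unicode")
```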
From the refined data, we use the ProM tool to run the process mining algorithms and
obtain behavioral models. In particular, the model generated by the inductive miner algorithm
without filtering is given in Figure 23. Without filtering, each transition between two states in
the event log appears in the behavioral model, so each element of the log can be replayed by
the model.
Figure 23 Behavioral model generated by the inductive mining algorithm (without filtering)
The obtained model is not really exploitable: it is too general and not precise enough, and
the future state is impossible to anticipate. By deleting the less relevant edges, i.e. by using a
noise threshold that avoids under- and over-fitting, the behavioral model detailed in Figure 24
seems more interesting. Zone A (low illuminance, temperature and humidity) and zone
B (medium illuminance, low temperature and humidity) show a globally similar use of the
bathroom; the principal difference is the illuminance value, which is higher in zone B. Moreover,
before reaching zone C, corresponding to a person taking a bath, going through zone B is
mandatory. This means that a slight rise of the illuminance and an opened door are required to
reach this third zone.
Figure 24 Behavioural model generated by the inductive mining algorithm (with filtering, threshold = 0.07)
In order to evaluate the behavioral models, the events are replayed on these models to quantify
the fitness with respect to the event logs. Once the replay is done, the generalization, precision
and simplicity of the model are evaluated.
Considering how the precision and generalization metrics are calculated, it appears
difficult to perform well with respect to both of them. Indeed, when clustering algorithms are
used to determine clusters, many activities can be activated from most of the states, which is
the reason why the precision can be low. There are two ways to increase the precision. On the one
hand, it is possible to strongly reduce the number of possible activities (as shown in Figure 25),
but some characteristics will then be missed.
Figure 25 Behavioural model generated by the inductive mining algorithm (low number of activities, k=1)
On the other hand, we can strongly increase the number of clusters to obtain almost one new
activity per event. In that case, the result will be more linear, as depicted in Figure 26. A
behavioral model with a high precision makes it easy to predict the next state of a system,
whereas a model with a lower precision enables multiple activities, making it more difficult to
know exactly what the next step of the system will be. A synthesis of the results obtained with
these different configurations is given in Table 3.
Figure 26 Extract of the behavioural model generated by the inductive mining algorithm (high number of activities, k=100)
Configuration           | Fitness | Precision | Generalization | Simplicity
k = 1, threshold = 0    | 1       | 0.25      | 1              | 10 places
k = 4, threshold = 0    | 1       | 0.06      | 1              | 3 places
k = 4, threshold = 0.07 | 0.52    | 0.28      | 0.98           | 28 places
k = 100, threshold = 0  | 1       | 0.31      | 0.83           | 200+ places
Table 3 Synthesis of performance metrics with different behavioural models
4.1.6 Requirement mapping
The tables below evaluate the process mining algorithms regarding properties specified by
requirements R4.2.2.
Table 4 evaluates the performance of the inductive miner algorithm.
Prediction time The generation of behavioral models (Petri nets) is relatively fast: a model is expected to be obtained in less than 30 seconds [5], but its evaluation may take more time depending on the model complexity. A behavioral model with a high generalization may be difficult to evaluate due to the huge number of possibilities. With simple models, the generation and the evaluation of the Petri net can be done in a few hundred milliseconds.
Scalability The article [5] highlights the scalability of a variant of the inductive miner algorithm for discovering processes. It was tested on a complex input XES file (77 513 traces, 358 278 events and 3 300 activities), and the result was given in less than 30 seconds.
Consistency Not applicable at this stage of the project since we do not use the model to predict (the attacks) yet.
Automated execution Currently, the process mining part of the pipeline is not automated. The user has to select the inputs and adjust parameters himself. A script will be developed to build and evaluate models in a more automated manner.
Table 4 Inductive Miner Performance
Table 5 evaluates the performance of the transition system miner algorithm.
Prediction time: The generation of behavioral models (Petri nets) appears to be slower than with the inductive miner algorithm. In [4], the worst-case complexity of the algorithm is said to be exponential in the size of the log.
Scalability: A huge number of states and activities might be a problem for this algorithm. To find the behavioral model that best fits the log, the fitness has to be evaluated for each possible final marking. Since the complexity of this algorithm can be exponential in the size of the log, overly complex input files should be avoided.
Consistency: Not applicable at this stage of the project, since we do not yet use the model for prediction (of attacks).
Automated execution: The algorithm requires the final state of the model to be known. As for the inductive mining algorithm, a script has to be developed to evaluate the fitness efficiently and to find the models that adequately describe the system in an automated manner.
Table 5 Transition Miner Performance
Following the requirement R4.2.3, we define below the different inputs (constraints) of the
considered process mining algorithms.
Inputs related to the inductive miner algorithm:
• Data (events log, XES file): event log (XES format) from the considered system
• Parameter (variant): variant of the inductive mining algorithm to be used
• Parameter (noise threshold): value of the filter parameter that keeps only the most relevant outgoing edges of each activity
• Parameter (event classifiers to consider): features to be used as labels in the discovered model
Inputs related to the transition system miner algorithm:
• Data (events log, XES file): event log (XES format) from the considered system
• Parameter (backward and/or forward key): backward and/or forward key classifiers used to represent a state
• Parameter (collection type): structure type of the data used for defining states
• Parameter (transition system size limit): number of elements that define a state
• Parameter (threshold states): threshold indicating how many of the selected states should be matched by any trace
• Parameter (threshold transitions): threshold indicating how many of the selected transitions should be matched by any trace
• Parameter (post-mining conversions): post-mining conversions that may have to be applied
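To illustrate the state abstraction behind the transition system miner, the following toy sketch builds states from the last k event classifiers of each trace (the "backward key" with a size limit) and links consecutive states by transitions. This is a simplified, hypothetical reimplementation for intuition only, not the actual ProM/PM4Py algorithm, and the traces below are invented:

```python
def mine_transition_system(traces, k=2):
    """Toy transition-system miner: a state is the tuple of the last k events."""
    states, transitions = set(), set()
    for trace in traces:
        prev = tuple()            # initial (empty-history) state
        states.add(prev)
        for i, event in enumerate(trace):
            state = tuple(trace[max(0, i + 1 - k):i + 1])  # last k events seen
            transitions.add((prev, event, state))
            states.add(state)
            prev = state
    return states, transitions

# Two toy traces of activities from a hypothetical system
traces = [["open", "read", "close"], ["open", "write", "close"]]
states, transitions = mine_transition_system(traces)
print(len(states), len(transitions))  # 6 5
```

Raising the backward key size k makes states more specific (higher precision, lower generalization), mirroring the trade-off discussed for the real algorithm.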
4.2 Deep learning
4.2.1 Overview
Deep learning is a class of machine learning algorithms where each base input feature
vector is classified, clustered, or otherwise modeled in a multi-stage process that performs both
higher-order feature extraction and class fitting in a unified way. The defining characteristic is
that the entire process is simultaneously fitted on the data; there is no need for a pre-designed
feature extraction or preprocessing step. Deep learning models are further divided into
numerous sub-classes based on their topology, such as Recurrent Neural Networks (RNNs), Deep
Belief Networks (DBNs), and Convolutional Neural Networks (CNNs) [6].
4.2.2 Variational Autoencoders
In the context of IoT, a deep learning method that has been found to work well, especially for
systematic faults and intrusions, is the Variational Autoencoder (VAE) [7]. This deep learning
structure has multiple hidden layers that first reduce the dimensionality of the input, and then
scale up to reproduce it (see Figure 27). It is conceptually separated into the encoder (the initial
layers that perform dimensionality reduction) and the decoder (the final layers that reconstruct
the input), both of which are part of the Template Execution Engine of the SecureIoT architecture
(see Figure 1). In the variational approach, the code layer includes an adaptive probability
distribution for the latent variable.
Figure 27 An example of a short VAE, based on [7]
The VAE has two important characteristics with regard to its predictive capabilities:
• It is generative, i.e. it estimates the joint distribution 𝑃(𝑥, 𝑦) of inputs and outputs (in
contrast to discriminative models that estimate the posterior distribution 𝑃(𝑦|𝑥)). This
allows us to better understand the latent clustering of the data and to use the model in
contexts with scarce labeled data (intrusion data in our context). Any generative model
can be turned into a discriminative one using the Bayes rule 𝑃(𝑦|𝑥) = 𝑃(𝑥, 𝑦)/𝑃(𝑥).
• It can be trained in both a supervised and an unsupervised manner simultaneously, by
incorporating label data in the loss function when available. This allows us to train the
VAE with the full extent of our datasets [8].
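The conversion from a generative joint distribution to a discriminative posterior via the Bayes rule can be sketched as follows; the toy joint table and labels are invented for illustration, not SecureIoT data:

```python
def posterior(joint, x):
    """Compute P(y|x) from a joint table {(x, y): P(x, y)} via Bayes:
    P(y|x) = P(x, y) / P(x), with P(x) obtained by marginalizing over y."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

# Toy joint distribution over an observed input and a label (normal/attack)
joint = {
    ("low_rpm", "normal"): 0.35, ("low_rpm", "attack"): 0.05,
    ("high_rpm", "normal"): 0.45, ("high_rpm", "attack"): 0.15,
}

print(posterior(joint, "high_rpm"))  # {'normal': 0.75, 'attack': 0.25}
```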
The compressive properties of autoencoders often lead to meaningful clustering, which can be
used for predictive analytics. Moreover, if we treat future measurements as missing features, a
prediction problem can be solved as a feature reconstruction problem (see [8] for a relevant
example).
4.2.3 VAE Architecture
A variational autoencoder, being a deep learning model, processes the input in multiple layers.
The first layer is always the input layer, which forwards each input scalar into subsequent nodes.
VAEs are feed-forward models, which means that the input layer has to account for all the
relevant inputs for a single output. So, in a sequence of 5 consecutive measurements of 5
different sensors we need an input layer of length 25 (as is the case for the CAN application
dataset).
Next is the value normalization layer. Each input scalar is offset by its minimum value and scaled
down by its new maximum value so that it always lies in the range from 0 to 1. This improves
numerical stability and balances the model across input dimensions: without normalization,
models tend to be biased towards inputs that naturally take large values. The length of this layer
is equal to that of the input layer.
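The normalization step just described can be sketched as follows; the function names and the toy sensor columns are our own illustration, not the deliverable's code:

```python
def fit_min_max(columns):
    """Learn per-column (min, span) from training data."""
    params = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0   # guard against constant columns
        params.append((lo, span))
    return params

def normalize(columns, params):
    """Offset each value by the column minimum and scale into [0, 1]."""
    return [[(v - lo) / span for v in col] for col, (lo, span) in zip(columns, params)]

# Two sensor columns with very different natural ranges (e.g. RPM vs throttle)
cols = [[1000, 3000, 5000], [0.0, 25.0, 50.0]]
params = fit_min_max(cols)
print(normalize(cols, params))  # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
```

After normalization, both columns contribute on the same scale, which is exactly the bias the text warns about.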
The encoder part is formed by one or more fully connected layers. Each node in a fully connected
layer receives an input from every node of the previous layer (plus a constant bias input),
multiplies it by a learned weight, sums the inputs, and feeds the sum to a non-linear activation
function (see Figure 28). The activation function of the node helps the overall model learn the
non-linearities of the process. The output of the node is the result of the activation function.
Models that perform classification tend to favor sigmoid functions, while regression problems
(such as ours) tend to use the rectified linear unit (ReLU) function (see Figure 29).
Figure 28 The general fully connected node (perceptron)
Figure 29 The rectifier linear unit function (ReLU) [9]
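The fully connected node of Figure 28 can be sketched as a weighted sum plus bias passed through the ReLU activation of Figure 29. The weights below are fixed, invented values for illustration; in a real model they are learned by the optimizer:

```python
def relu(x):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, x)

def node(inputs, weights, bias):
    """One perceptron: activation of (sum of weighted inputs + bias)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(s)

print(node([1.0, -2.0, 0.5], [0.4, 0.3, -0.2], 0.1))  # 0.0 (sum is negative, clipped)
print(node([1.0, 2.0], [0.5, 0.25], 0.0))             # 1.0
```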
In the VAE, the consecutive fully connected layers decrease in length, eventually down to the
desired code dimension.
In the variational approach, a latent Gaussian probability distribution is formed from the code,
using a custom layer with this format:
1. Two fully connected layers in parallel. The first produces the mean of the distribution, and
the second the variance.
2. A sampling layer that follows the previous two, and samples a Gaussian distribution with
the parameters it received.
The probability distribution is generally multi-dimensional. In our application, 2D and 3D latent
representations work well.
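The two-step custom layer above (parallel mean/variance outputs followed by a sampling step) is commonly implemented via the reparameterization trick, z = mean + sigma * eps with eps ~ N(0, 1). The sketch below assumes that formulation; names and shapes are illustrative, not the deliverable's exact code:

```python
import random

def sample_latent(mean, var, rng=random):
    """Draw one latent vector z ~ N(mean, var), per dimension, via
    the reparameterization trick: z = mean + sqrt(var) * eps."""
    return [m + (v ** 0.5) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]

# A 2D latent distribution, as used for the CAN/V2X models
z = sample_latent(mean=[0.1, -0.4], var=[0.05, 0.02])
print(z)  # a 2-element sample; stochastic by construction
```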
The decoder has the same number of fully connected layers as the encoder, but in reverse; the
length of the layers starts from the code dimension and scales up to the number of output nodes
(which here is the same as the input nodes). The same ReLU activation function is used.
Finally, a de-normalization layer is applied to the outputs, to bring them back to their original
range.
4.2.4 VAE Training
Training a variational autoencoder is efficient, as it follows the conventions of training any deep
learning model. There are four steps to designing a training process:
1. Preparation of the dataset
2. Choosing a batch size
3. Choosing an optimizer
4. Choosing a loss function
To prepare the dataset, we form sequences of measurements according to the model we want
to train. For example, in a model that takes a sequence of 5 input vectors, we concatenate each
vector in the dataset with its neighbors, so that every vector eventually appears in 5 different
sequences, at a different position each time. Then the dataset is shuffled, and 20% is removed
and reserved for testing.
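The dataset preparation just described (overlapping windows of 5 vectors, shuffling, 20% hold-out) can be sketched as follows; the variable names and the fake measurements are our own illustration:

```python
import random

SEQ_LEN = 5

def make_sequences(vectors, seq_len=SEQ_LEN):
    """Sliding windows: each vector appears in up to seq_len sequences,
    at a different position each time; windows are flattened into one row."""
    return [sum(vectors[i:i + seq_len], []) for i in range(len(vectors) - seq_len + 1)]

def train_test_split(samples, test_frac=0.2, seed=42):
    """Shuffle and reserve a fraction of the samples for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

# 10 fake 5-sensor measurement vectors -> windows of 25 scalars each
data = [[float(t + s) for s in range(5)] for t in range(10)]
seqs = make_sequences(data)
train, test = train_test_split(seqs)
print(len(seqs), len(seqs[0]))  # 6 25
```

With 5 sensors and a window of 5 this yields rows of 25 scalars, matching the input layer length given for the CAN model.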
The batch size is the number of samples that are passed into the model in parallel. A high batch
size leads to faster training but can lead to reduced accuracy [10]. We use a safe batch size of 128
samples.
The optimizer is the learning algorithm that updates the weights of each node in the model [11].
We use the Nadam optimizer (Nesterov-accelerated Adaptive Moment Estimation), which is the
Adam (Adaptive Moment Estimation) optimizer that incorporates Nesterov momentum [12]. This
algorithm assigns a different learning rate to each parameter and keeps track of recent training
updates to better guide the process and mitigate noise.
The important innovation that variational autoencoders provide is the loss function, which is
developed in terms of the encoder/decoder/code separation:
𝑙𝑖 = −𝐸 [log (𝑝𝜑(𝑥𝑖|𝑧))] + 𝐾𝐿(𝑞𝜃(𝑧|𝑥𝑖)||𝑁(𝑧))
where:
• 𝑥𝑖 is the input with index 𝑖.
• 𝑧 is the latent representation of 𝑥𝑖, i.e. the code that the encoder produces.
• 𝑝𝜑(𝑥|𝑧) is the distribution of the decoder, given a certain code.
• 𝑞𝜃(𝑧|𝑥) is the distribution of the encoder, given a certain input.
• 𝑁(𝑧) is a distribution we define, usually a unit Gaussian 𝑁(0,1).
• 𝐾𝐿(𝑃||𝑄) is the Kullback-Leibler divergence. It measures how much information is lost
when distribution 𝑄 is used to approximate distribution 𝑃.
The first term of the loss function is the expected log-likelihood that the decoder reconstructs
the input accurately. It is zero for a perfect reconstruction and grows as the model becomes less
accurate. In practice, we substitute it with a surrogate for performance and numerical stability,
namely the root of the mean absolute error.
The second term acts as a regularizer. Normally, the encoder can learn very localized, essentially
meaningless representations, leading to overfitting. However, the Kullback-Leibler divergence
will grow as the encoder’s distribution becomes more complex and drifts away from our base
distribution 𝑁(𝑧).
As such, minimizing this loss function leads to an accurate model via the first term, and mitigates
overfitting and keeps latent representations meaningful via the second term.
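Under our simplifying assumptions, the two loss terms can be sketched numerically: the reconstruction term as the root of the mean absolute error (the surrogate named in the text), and the KL term via its closed form for a diagonal Gaussian q(z|x) = N(mean, var) against the unit Gaussian N(0, 1). This is an illustrative sketch, not the deliverable's implementation:

```python
import math

def reconstruction_loss(x, x_hat):
    """Root of the mean absolute error between input and reconstruction."""
    mae = sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)
    return math.sqrt(mae)

def kl_unit_gaussian(mean, var):
    """KL( N(mean, var) || N(0, 1) ), closed form, summed over latent dims:
    0.5 * sum(mean^2 + var - log(var) - 1)."""
    return 0.5 * sum(m * m + v - math.log(v) - 1.0 for m, v in zip(mean, var))

def vae_loss(x, x_hat, mean, var):
    """Reconstruction term plus the KL regularizer."""
    return reconstruction_loss(x, x_hat) + kl_unit_gaussian(mean, var)

# Perfect reconstruction and a latent code matching N(0, 1) give zero loss
print(vae_loss([0.2, 0.8], [0.2, 0.8], mean=[0.0], var=[1.0]))  # 0.0
```

As the encoder's distribution drifts from N(0, 1), the KL term grows, which is exactly the regularizing behavior described above.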
4.2.5 Application to SecureIoT Use Cases/Results
We have trained variational autoencoders of various complexities and successfully applied them
to available datasets.
For the IDIADA CAN and V2X datasets (as analyzed in D4.1), we provide models that receive a
sequence of 5 input vectors (i.e. 25 input scalars for CAN and 35 for V2X), have a total of
20 fully connected layers, and use two-dimensional encodings and latent variables.
For classifying a given sequence of inputs, we use the encoder part of the autoencoder and
inspect its latent representation. Normal operation data concentrate on a structure (here, a line)
that is disconnected from the outliers, which are potentially problematic and may indicate a
fault or an attack (see Figure 30).
Figure 30 The latent representations of our testing data (CAN dataset)
Applying the model to the testing data yields insights that are consistent with our expectations.
For the CAN dataset, looking at good examples inside the cluster, we can see low steering angles;
RPM, speed, and throttle consistent with each other; no braking, or braking only at low speeds;
sharp steering only at low or average speeds, etc. (see Table 6). As divergent examples tending
to leave the cluster, we can see sharp braking at high speed, high RPM but low throttle, and high
values in general, including sharp steering with high throttle and RPM but low speed.
Random samples of normal operation data (blue, average of the 5 inputs)
Brake  RPM   Speed   Steering  Throttle  Mean_x   Mean_y   Description
0      2363  48.10   -112.3    20.5      0.0056   -0.1611  Driving and steering at low/average speed
0      3220  94.49   3.3       31.3      0.0046   -0.5357  Driving on a straight line
0      4461  131.71  3.6       48.8      0.0031   -1.0762  Accelerating on a straight line
Random samples of outlier data (red, average of the 5 inputs)
Brake  RPM   Speed   Steering  Throttle  Mean_x   Mean_y   Description
0      1180  3.88    5.72      23.4      -0.0299  0.4664   Speed/RPM too low for current throttle
0      1000  72.97   -167.8    0.00      -0.129   1.1765   Sharp steering at high speed
210    1000  88.91   1.2       0.00      -0.0134  1.1983   Sharp brakes at high speed
Table 6 Samples from Figure 30
4.2.6 Requirement mapping
Table 7 evaluates the algorithm with regard to the properties required by requirement R4.2.2.
Prediction time: Currently, most of the time is spent on initializing the algorithm, i.e. loading the data and the model and compiling the model for the target platform. On an Intel Core i7-4771 CPU and an nVidia GeForce GTX 770 GPU, initialization takes several seconds, while evaluation of the data currently takes 80 ns for the proposed model.
Scalability: As it stands, the number of parameters of a model of a given depth scales linearly with the dimensionality of the input, and so do the training and evaluation times.
Consistency: There is necessarily a stochastic element in training a deep learning model, which is further increased by the variational approach. Nevertheless, a fully trained encoder will always output the same probabilistic parameters for the same input.
Automated execution: The current implementation requires a script that invokes the model with the correct parameters and input files. However, the parameters do not change between invocations, and the input file may or may not change, depending on the measurement storage process.
Table 7 Algorithms properties evaluation
Following requirement R4.2.3, we define in Table 8 and Table 9 the different inputs (constraints)
of our algorithms:
Action: Model training
Data required: JSON log file with the entire dataset
Data type: the set of data columns that contain measurements (for datasets that have not been accounted for)
Desired model format: defined by code dimension, latent variable dimension, model depth, and length of input sequence
Table 8 Inputs for training Variational Autoencoders
Action: Model execution
Data required: JSON log file with the sequence of data to be evaluated
Data type: model parameter file, as produced by the training process
Desired model format: defined by code dimension, latent variable dimension, model depth, and length of input sequence
Table 9 Inputs for executing Variational Autoencoders
4.2.7 Code Availability & User Manual
The repository containing the code to train and run the deep learning models used to identify
attacks and faults in the context of SecureIoT can be found at SecureIoT GitLab1. Information for
runtime requirements, training, running and testing the deep learning models can be found
below.
1 https://gitlab.atosresearch.eu/dashboard
Prerequisites
This project uses Python 32 and the Pipenv3 package manager to handle dependencies. Currently,
the only top-level dependency is plaidml-keras4, the cross-platform deep learning framework we
use. It is still dependent on very specific versions of its dependencies, so a virtual environment
handled by pipenv is recommended.
• To install pipenv:
pip install pipenv
• Then, to install the dependencies in a virtual environment:
pipenv install
• If you don’t want to wait for dependency locking:
pipenv install --skip-lock
Training
To train a model based on the CAN or V2X application dataset, use train.py from pipenv. For
example:
pipenv run python train.py --can path_to_can_log --format 2 2 10 5
The --format argument defines the architecture of the model. For details, run:
pipenv run python train.py -h
The training runs indefinitely and automatically saves your model every 500 epochs. Press Ctrl-C
to interrupt.
If you want to train a model on a different dataset:
1. Create a class that inherits from models.DenseCAN
2. Override get_filtered_data() based on your measurement fields.
3. Override get_description() to give a different name to your model files.
4. If you want to use the same entry point, add a command line argument and the calling code
to train.py (and run.py)
2 https://www.python.org/download/releases/3.0/ 3 https://pipenv.readthedocs.io/en/latest/ 4 https://pypi.org/project/plaidml-keras/
See how DenseV2X is handled for an example.
Run/Test
To run your model on new data, use run.py. For example:
pipenv run python run.py --model encoder_can_2_2_10_5.h5 --format 2 2 10 5 --can test_can.log
Example output:
Outputs:
[[0.40297362 0.13062605]]
Normal operation
The actual numbers will vary.
Other Info
You may receive a warning that the model is not compiled. It is actually compiled; we are just
using the encoding portion of the autoencoder, which does not have separate compilation
metadata.
On training a new model, you may get NaNs as the losses from the first epoch onwards. This is a
non-deterministic, hardware-dependent issue where the very first gradients lead to numerical
instability. It is recommended to simply rerun the training command a few times and/or try a
different architecture (--format).
5 Zinrai AI services
5.1 Description
Fujitsu has systematized its set of AI-related technologies, products and services as Fujitsu
Human Centric AI Zinrai, and has been offering them as products and services. At the same time,
Fujitsu has worked with customers in co-creation efforts and field trials, driven by the numerous
demands for utilizing AI.
Among other fields, Fujitsu Zinrai targets social infrastructure, maintenance and
manufacturing, in particular providing Fujitsu's image recognition technologies. Zinrai
encompasses standardized platform services and the application of standard elemental AI
frameworks for specific solutions.
The API of the Zinrai Platform Service (PaaS) is a group of programming interfaces prepared for
customers to introduce Fujitsu AI technology easily and quickly, using standardized building
blocks exposed through a REST API. APIs are classified according to elemental AI technologies,
and there are also APIs that combine elemental technologies according to usage scenes. Among
the usage scenarios are:
• Smart Document Handling
o Handwritten character recognition
o Document Translation
o Large Document Knowledge extraction
o Document Knowledge information search
• Image Recognition
• Semantic Search
o Semantic Search by field of Specialization
(advanced document search)
o FAQ Search
o Company Information Search
• Sound Analysis
o Speech to text
o Speech synthesis
o Natural Sentence Analysis
• Emotion recognition
• Optimal Candidate Selection
• Smart Home control
Those services are usually enriched by open elemental AI tools, e.g. Keras, TensorFlow, OpenPose
and h2o, which are applied either dedicatedly in the Zinrai context using a cloud service (PaaS) or
on premise. In this way, Zinrai is a best-of-breed approach that applies the technologies in the
Zinrai tool set that are most suitable for the specific problem. Consequently, a wide range of
tools, each bringing its specific environment and APIs, regularly extends the Zinrai tool box.
Among those tools FUJITSU recommends:
• Clustering: scikit-learn
• Normalization: scipy, scikit, numpy
• Topological Data Analysis: scikit-tda
• Deep Learning: TensorFlow
5.2 Potential application to SecureIoT scenarios and datasets
A major objective of applying a standardized tool set based on services of the Zinrai platform for
anomaly detection in SecureIoT was to utilize available building blocks for fast development.
However, recent observations illustrate the limitations of this kind of building-block approach in
terms of flexibility and accuracy: the application needs to be mapped to the predefined usage
scenarios, and these modules cannot be trained for specific problems beyond their scope.
Moreover, in the last two years, there has been a shift in the AI market towards a commoditization
of available AI frameworks. Existing open-source machine learning frameworks have become
relatively easy to use, and results with high precision can be obtained for almost any use case
without the high investment previously required for effective solutions. Various manufacturers,
e.g. Facebook, Microsoft and Google, also provide publicly available AI frameworks, which are
adapted to their backend systems.
Given this dynamic landscape, it currently makes sense to adapt the approach and assess
common open AI frameworks for their suitability and feasibility in the SECaaS services. Due
to the simplified deployment and applicability, and the improved results of today's publicly
available AI frameworks, Fujitsu regularly recommends the application of Keras in combination
with the latest TensorFlow version for rapid development in AI-related projects. This combination
has demonstrated its broad applicability and its suitability for the simple and rapid
development of solutions for a wide range of models.
Beyond this, we see a promising development for rapid prototyping in the h2o framework, which
may be assessed in the project.
5.2.3 Requirement mapping
Table 10 below evaluates the algorithm with regard to the properties required by requirement R4.2.2.
Prediction time: Prediction time depends on the selected model and the underlying infrastructure. TensorFlow supports optimal performance through recommendations to the developer; moreover, major library providers such as Intel have optimized their code for TensorFlow.
Scalability: Thanks to the flexible architecture of TensorFlow, users can deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. A distributed TensorFlow process uses gRPC to connect the different nodes. However, when deploying training tasks on high-performance computing clusters, the performance of gRPC becomes a bottleneck of the distributed TensorFlow system. Hence, specific attention should be paid to the implementation of TensorFlow and the design of the models for optimal scalability.
Consistency: Consistency of results across various levels and systems is still a challenge today and a subject of research. It depends heavily on the training conditions, taking the various levels of the SecureIoT architecture into account.
Automated execution: A number of available PaaS implementations of TensorFlow illustrate the capabilities for automating operations. For instance, tools like Kubernetes and Docker support the implementation of a TensorFlow PaaS enabling scalable automated operations.
Table 10 Mapping of SecureIoT Requirements to TensorFlow
Following requirement R4.2.3, we define below the different inputs (constraints) of our
algorithms.
Supported algorithms and models: e.g. TensorFlow, as an elemental technology, supports arbitrary algorithms and models.
Table 11 Supported Algorithms and Models
6 Network traffic anomaly detection
The world of IoT is growing each year in different ways. On the one hand, it is changing the way
industry works, being deployed in several scenarios and facilitating that work. On the other
hand, a great amount of data is being obtained from the devices, on a scale that was not even
dreamed of some time ago. A single device, like a mobile phone, can manage and work with a
lot of critical data that needs to be protected. This paradigm faces a critical threat in terms of
system security, as IoT devices are a very attractive target. With the change and evolution of
technologies (and threats), cybersecurity must shift from a reactive approach to a proactive
approach, understanding the threats before an attack can damage the system.
In this sense, predictive analytics is proving to be a potent security solution. It enables IT
security departments to detect breaches and attacks at an early stage, thus giving enough time
to take appropriate cybersecurity measures. This is done by identifying anomalies in known
behavior patterns and fortifying the cybersecurity infrastructure. It can analyze huge volumes of
data (including historical data) in order to understand cause-effect patterns and provide
information about the sources of threats, their probabilities, etc.
We propose a solution for network traffic anomaly detection based on unsupervised
machine learning algorithms that is able to identify anomalies down to a one-second interval. It
is also able to process large quantities of data, which is very relevant to the IoT domain, where
many devices manage and work with data. Our solution is generic and is intended to be
applicable to different IoT platforms, such as FIWARE. In the following, we present our solution
and the planned application strategy in more detail.
6.1 Live Anomaly Detection System using Machine Learning Methods (L-ADS)
The L-ADS uses a predictive strategy based on unsupervised machine learning methods in order
to automatically model the behavior of users and/or applications and to detect anomalies or
significant deviations from this behavior pattern. The behavior analysis is based on the
processed network traffic of the IoT devices.
The anomaly detection sensor can use two different algorithms for identifying anomalies: i)
one-class Support Vector Machine (SVM) and ii) Isolation Forest. These two unsupervised
algorithms evaluate the network traffic of the IoT devices (in particular, the packet headers)
using factors such as, among others, the number of connections between different devices in
the system under monitoring, the connections and traffic between the ports used by the
devices, the duration of the connections between devices (or between a device and a system),
the size of the packets sent and received, the origin and destination of the connections, etc. Any
anomaly or significant deviation in any of the parameters used for the detection of anomalous
behavior is identified as an incident, and an alarm is created for its analysis. We plan to evaluate
which algorithm best fits the needs, requirements and performance constraints of the IoT
devices.
Additionally, we will use other metrics (e.g. accuracy, precision, recall) to evaluate the
percentage of false positives of the machine learning algorithms we use. The final goal of this
task is to better predict false positives when combining these solutions with more
traditional tools for predictive analytics, such as heuristics and data signatures for detecting
anomalies, or even with information coming from different sources such as Open-Source
Intelligence (OSINT).
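In practice, one-class SVM and Isolation Forest are typically provided by a machine learning library. As a dependency-free, purely illustrative stand-in for the same unsupervised idea, the sketch below fits per-feature statistics on legitimate traffic and flags records that deviate strongly; the feature names, data, and threshold are invented assumptions, not the L-ADS implementation:

```python
import math

def fit(baseline):
    """Learn per-feature (mean, std) over legitimate traffic records."""
    n = len(baseline)
    stats = []
    for i in range(len(baseline[0])):
        col = [r[i] for r in baseline]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        stats.append((mean, math.sqrt(var) or 1.0))  # guard zero std
    return stats

def is_anomalous(record, stats, threshold=3.0):
    """Flag a record if any feature deviates more than `threshold` sigmas."""
    return any(abs(v - m) / s > threshold for v, (m, s) in zip(record, stats))

# Features per record: [connections/s, bytes/s, distinct ports] on normal traffic
normal = [[10, 1500, 2], [12, 1600, 3], [11, 1550, 2], [9, 1450, 3]]
stats = fit(normal)
print(is_anomalous([11, 1500, 2], stats))    # False: within the learnt profile
print(is_anomalous([300, 90000, 40], stats)) # True: scan-like traffic burst
```

The real detectors capture far richer structure than per-feature deviations, but the workflow (train on legitimate traffic, then score new records) is the same.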
6.2 Architecture of the planned solution
The architecture of the Live Anomaly Detection System (L-ADS) is composed of different modules
with specific functionalities, such as data gathering, training of the machine learning models, etc.
Figure 31 shows a high-level representation of the modules, inputs and outputs, and the
integration with the different components of the SecureIoT solution from an architectural point
of view.
Figure 31 High-level description of the L-ADS
The input of the L-ADS is the network data of the IoT devices after their headers have been
analyzed. This information is received by our solution and processed for anomalies. The result of
the analysis is then provided together with the generated alerts. In the following, we describe
each component of the architecture in more detail, together with its mapping to the SecureIoT
architecture described in Section 3:
Page | 48
Project Title: SecureIoT Contract No. 779899 Project Coordinator: INTRASOFT International S.A.
D4.3 - Tools and Techniques for Predictive IoT Security,
Version: v1.0- Final, Date 30/11/2018
• Data preparation: it analyzes and prepares the input datasets for the training and prediction
process according to the features requested. Traffic is collected from the IoT devices (after
the initial analysis done by an application for the headers) in real-time and then formatted
and prepared for the next steps of the system. This initial formatting only prepares the data
for the analysis, as it already comes prepared from the real data under monitoring (headers
of the packages).
• Data analysis: this component is a multi-threaded socket-based server capable of listening to
multiple connections simultaneously. The data is analyzed using different strategies and sent
to the prediction service analyzer. Additionally, in case of training for the system (initialization
step), this component communicates with the training component using legitimate datasets
of the systems under monitoring.
• Training: this component uses machine learning to make predictions over the captured data.
The training uses a predefined time window that can vary according to the size of the data to
be analyzed. During the training process, the training dataset is processed with a Principal
Component Analysis (PCA), which reduces the dimensionality of the dataset while preserving
approximately 99% of the variance in the data.
• Storage of training datasets and models: it is used for storing and managing two kinds of
data: processed training data and trained models. On the one hand, the processed training
data is used as a representation of captured traffic, used for identifying malicious behavior.
On the other hand, the trained models are one-class SVMs, one per monitored IP, trained on
the previously processed captures. Additionally, the component acts as a temporary database
for traffic captured in the system. The refresh interval can be modified to better fit the needs
of each specific system (e.g. 1 second, 5 seconds, 30 seconds, etc.).
• Configuration: the configuration component is used for managing the way the components
work and any variable that is used for the processing, data acquisition, output, etc.
• Prediction service and alerts: this component is in charge of analyzing the result of the data
analysis component and identifying it as normal or anomalous. The result is based on the
training performed at the beginning of the process. It provides the output of the network-based
anomaly detector. Additionally, it generates alarms if the data processed are identified as
critical, which is also based on the learnt behavior models.
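The training and prediction steps above (PCA retaining approximately 99% of the variance, followed by a one-class model per monitored IP) can be sketched as follows. This is an illustrative sketch only, not the L-ADS implementation: PCA is written directly with NumPy, and a simple distance-threshold detector stands in for the one-class SVM; all function names and parameters are assumptions.

```python
import numpy as np

def fit_pca(X, variance=0.99):
    """PCA keeping the fewest components that explain `variance` of the data."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2                      # variance carried by each component
    ratio = np.cumsum(explained) / explained.sum()
    k = int(np.searchsorted(ratio, variance)) + 1
    return mean, Vt[:k]

def transform(X, mean, components):
    return (X - mean) @ components.T

def fit_detector(Z, quantile=99.0):
    """Per-IP one-class detector: flag points whose normalized distance in the
    reduced space exceeds the `quantile`-th percentile seen during training."""
    var = Z.var(axis=0)
    dist = np.sqrt(((Z ** 2) / var).sum(axis=1))
    return var, np.percentile(dist, quantile)

def predict(Z, var, threshold):
    dist = np.sqrt(((Z ** 2) / var).sum(axis=1))
    return np.where(dist <= threshold, 1, -1)   # +1 normal, -1 anomalous

# Synthetic 'legitimate' traffic features for one IP: 10 raw features that
# really live in a 3-dimensional subspace, so PCA can compress them.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 10))
X = rng.normal(size=(500, 3)) @ A + 0.01 * rng.normal(size=(500, 10))

mean, comps = fit_pca(X)
var, thr = fit_detector(transform(X, mean, comps))
fresh = rng.normal(size=(50, 3)) @ A + 0.01 * rng.normal(size=(50, 10))
normal_labels = predict(transform(fresh, mean, comps), var, thr)
outlier_label = predict(transform(X[:1] * 100, mean, comps), var, thr)[0]
```

In the real component the feature matrix would come from the prepared traffic of a single monitored IP, and the threshold quantile would be a configuration parameter rather than a constant.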
To summarize, the input of the system comes from the packets of the IoT devices and is
prepared in order to compile and provide the data of the required elements. The output of the
solution (alarms and traffic analysis) is provided to SecureIoT for storage in the database,
further analysis, or presentation to the users of the monitored IoT devices.
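As an illustration of the data-analysis component described above, a multi-threaded socket-based server handling several connections at once, the following minimal sketch accepts clients, reads newline-delimited records, and returns a verdict per record. The `analyze` callback is a hypothetical placeholder for the real prediction service; nothing here reflects the actual L-ADS code.

```python
import socket
import threading

def handle_client(conn, analyze):
    """Serve one connection: read newline-delimited records and reply with
    the analyzer's verdict for each."""
    with conn, conn.makefile("rwb") as stream:
        for line in stream:
            verdict = analyze(line.decode().strip())
            stream.write((verdict + "\n").encode())
            stream.flush()

def make_server(analyze, host="127.0.0.1"):
    """Bind a listening socket and hand each accepted connection to its own
    daemon thread; return the (host, port) address clients should use."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))                      # port 0: let the OS pick one
    srv.listen()

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle_client,
                             args=(conn, analyze), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv.getsockname()

# Usage: a toy analyzer stands in for the real prediction service.
addr = make_server(lambda rec: "ANOMALY" if "attack" in rec else "NORMAL")
with socket.create_connection(addr) as c, c.makefile("rwb") as f:
    f.write(b"flow src=10.0.0.5 attack\n"); f.flush()
    first = f.readline().decode().strip()
    f.write(b"flow src=10.0.0.5 ok\n"); f.flush()
    second = f.readline().decode().strip()
```

One thread per connection keeps the sketch simple; a production component would bound the thread pool and add authentication on the socket.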
6.3 Integration in IoT platforms
We are currently working on using this approach with deployments of IoT systems based on
FIWARE (Figure 32). We will collect the data of the IoT systems compiled by the data probes of
SecureIoT and use it as input for the analysis of malicious behavior. This will allow us to perform
predictive cybersecurity analysis of the system and raise alerts about possible threats before they
can impact the system.
Figure 32: Use of L-ADS in FIWARE-aware SecureIoT devices
Although still at a preliminary stage, we think this approach will allow us to work with IoT
devices implemented with FIWARE or with other IoT platforms, by simply adapting the input for
the data probes of each system.
Application in use cases
Currently we plan to use this approach in the connected cars use case, which will be of great
benefit to us due to the large quantity of data exchanged and the need for fast response times
given the criticality of the system. In a first evaluation, the cybersecurity requirements of the
use case fit the benefits of this approach, both in terms of response and reaction times.
Regarding the other use cases, we are evaluating its applicability to them.
Requirement mapping
In the following, we present how this approach satisfies the properties specified by
requirement R4.2.2.
Prediction time: The generation of the training models depends on how much data is provided. A usual batch of information used in the testing/evaluation process of this tool (approximately 6,000 records) takes approximately 10 minutes. On the other hand, the evaluation and response of the analytics is immediate. Naturally, the training time also depends on the number of features considered for evaluation.
Scalability: Due to how it works, each instance of the L-ADS must be deployed in a network node in order to have access to its traffic. Therefore, no scalability issues have been identified so far.
Consistency: Accuracy is good according to the testing we performed, but we are working on improving it using several different models and sets of features.
Automated execution: The configuration of the entry point (packets) is manual, but the rest of the processes are automatic. Providing data for the training is also manual, and it needs to be done before the monitoring and evaluation of traffic can start. The alarms and events generated are also provided automatically, and can then be used for dashboards, reports, user evaluation, etc.
Additionally, and following requirement R4.2.3:
Action: Model training and data analysis
Data required: NetFlow data of the analyzed datasets
Data type: Different features according to what we want to analyze
Desired model format: To be refined in a more mature version
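As an illustration of the "Data required" and "Data type" rows above, the following hypothetical sketch aggregates NetFlow-style records into per-source-IP feature vectors over a fixed time window, in the spirit of the data-preparation step; the record fields (`ts`, `src`, `bytes`, `packets`, `dst_port`) are assumptions, not the actual probe format.

```python
from collections import defaultdict

def windowed_features(flows, window=30):
    """Aggregate flow records into per-(window, source IP) feature vectors:
    [flow count, total bytes, total packets, distinct destination ports]."""
    buckets = defaultdict(lambda: [0, 0, 0, set()])
    for f in flows:
        key = (int(f["ts"]) // window * window, f["src"])
        b = buckets[key]
        b[0] += 1                   # number of flows in the window
        b[1] += f["bytes"]          # traffic volume
        b[2] += f["packets"]
        b[3].add(f["dst_port"])     # port diversity (scan indicator)
    return {k: [b[0], b[1], b[2], len(b[3])] for k, b in buckets.items()}

# Usage with three toy flow records from one source IP
flows = [
    {"ts": 0,  "src": "10.0.0.1", "bytes": 100, "packets": 2, "dst_port": 80},
    {"ts": 10, "src": "10.0.0.1", "bytes": 200, "packets": 3, "dst_port": 443},
    {"ts": 35, "src": "10.0.0.1", "bytes": 50,  "packets": 1, "dst_port": 80},
]
feats = windowed_features(flows, window=30)
```

The chosen features are only examples; the actual feature set would be selected per use case, as noted in the "Data type" row.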
7 Conclusion
In this deliverable, we introduced three main types of algorithms to support predictive security:
PM, VAE and unsupervised learning. From the Zinrai services, we also identified OSS that may be
used to support the implementation of the proposed approaches and of network predictive
analytics. Our objective in this first deliverable of T4.2 is to describe the mentioned algorithms
along with preliminary experiments showing their expected operation, performance and usability.
There will be two iterations of this work, with the next one planned for M19. We summarize here
the actions to be completed by then. First, we will define a data model for the outcomes of the
predictive algorithms. This will be done in cooperation with WP3 and by extending the model
defined in D4.1. Second, in relation with WP6, the objective is now to gather larger datasets that
include valid data, attacks and malicious behaviors as documented by WP2 (D2.1), in order to
provide more concrete results. Prototype developments will be shifted from the current
standalone versions to versions integrated in the SecureIoT architecture, using its interfaces to
access data. This implementation will be done using the GitLab of the project to ensure continuous
integration. In relation to WP5, we will particularly synchronize our activities with the risk
assessment and mitigation services (T5.1). Intelligent data collection in WP3 would also need to
integrate prediction outcomes, which will thus require synchronization. We expect to have a
first integrated version of the prototypes for predictive security by M19. From a research point of
view, the techniques proposed in this deliverable will continue to be refined according to the
results obtained with other datasets and the given scenarios. For instance, the clusters learned
with VAE or the behavioral models derived from PM need to be refined afterwards to support full
predictive security. The same holds for the training models of the L-ADS in the use cases that plan
to use these solutions. Different approaches can be considered depending on the case (supervised
vs. unsupervised), but the methods need to be properly set up to interpret the results, such as
when replaying an unknown event log against a built PM model.