www.s-cube-network.eu
S-Cube Learning Package
Service Level Agreements:
Runtime Prediction of SLA Violations Based on Service Event Logs
TU Wien (TUW), University of Stuttgart (USTUTT)
Philipp Leitner, TUW
© S-Cube
Learning Package Categorization
S-Cube
SBA Quality Management
Quality Assurance and Quality Prediction
Runtime Prediction of SLA Violations
Based on Service Event Logs
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Let’s Consider a Scenario (1)
Assume you are a provider of a composite service
Commercial customers order products via a Web service
interface
Your process checks an internal stock, orders some missing
parts, and assembles the product
[Figure: order-handling process with activities ReceiveOrder, CheckStock, SelectSupplier, Get PaymentPrefs, ChargeCustomer, Order From Supplier 1, Order From Supplier 2, Ship Order, and Cancel Order; branches on 'everything available' and 'no supplier available'.]
Let’s Consider a Scenario (2)
(Of course) there are contractual obligations
– Delivery in time
– Availability of your service
– Product quality
These can be formulated using Service Level Agreements
– Note: in this context these are not necessarily WSLA-style machine-processable SLAs – they can also be just regular contracts
Let’s Consider a Scenario (3)
As a provider, you want to receive timely notifications if these
obligations are likely to be violated
– Usually there are penalties to be paid if contractual obligations are
(repeatedly) violated
– Even if this is not the case, customer satisfaction will suffer and
existing customers may terminate their contracts
Based on these notifications …
– … countermeasures can be taken (i.e.,
adaptations can be triggered)
– … customers can be notified
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
One S-Cube Approach: Predictions from Event-Log Data
1. Define Checkpoints in the service composition
2. If checkpoints are passed, collect all (important and
available) runtime data …
3. … enrich with estimations for missing data (if possible) …
4. … and use Machine Learning techniques to generate
predictions for objectives
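The four steps can be sketched as a small pipeline. All names below are illustrative, not the prototype's actual API:

```python
# Sketch of the checkpoint prediction flow (steps 2-4); names are illustrative.
def predict_at_checkpoint(facts, estimators, model):
    features = dict(facts)                       # step 2: collect available runtime data
    for name, estimate in estimators.items():    # step 3: enrich with estimates
        features.setdefault(name, estimate())
    return model(features)                       # step 4: apply the trained ML model

# Hypothetical usage: predict the total execution time (ms) of an order.
facts = {"qos_supplier_ms": 1200}                # already measured in the checkpoint
estimators = {"qos_shipping_ms": lambda: 800}    # e.g. a historical average
model = lambda f: f["qos_supplier_ms"] + f["qos_shipping_ms"] + 500
print(predict_at_checkpoint(facts, estimators, model))  # 2500
```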
Necessary Inputs
Evidently, there are some (not unreasonable) assumptions:
1. The provider needs to define a sensible checkpoint for prediction
(quality / timeliness tradeoff!)
2. Important runtime data needs to be defined and monitored
3. Optimally, some estimates for data still missing at the checkpoint are available
4. The system needs to be initialized with some historical data
( to learn the Machine Learning model that implements the actual
prediction)
Types of Runtime Data
Data that is used to generate predictions can be categorized
along 2 dimensions
Data Availability in Checkpoint
– Available (facts)
– Not available, but estimatable (estimates)
– Not available (unknown)
Type of Runtime Data
– External data (not monitorable, needs to be provided by external data
sources)
– Quality-of-Service information
– Domain-specific instance data (e.g., KPIs)
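These two dimensions can be captured in a small data model; the following is a sketch with illustrative names, not part of the actual prototype:

```python
from dataclasses import dataclass
from enum import Enum

class Availability(Enum):
    """Data availability in the checkpoint."""
    FACT = "available"
    ESTIMATE = "not available, but estimatable"
    UNKNOWN = "not available"

class Kind(Enum):
    """Type of runtime data."""
    EXTERNAL = "external data"
    QOS = "quality of service"
    INSTANCE = "domain-specific instance data"

@dataclass
class Factor:
    name: str
    availability: Availability
    kind: Kind

# Example: in an early checkpoint, an external supplier's QoS is not yet
# measured, but can be estimated from historical measurements.
f = Factor("QoS_ExtSupplier", Availability.ESTIMATE, Kind.QOS)
```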
Defining Checkpoints (1)
[Figure: example request-for-quote process (ReceiveRFQ, ProduceOffer, Order Unavailable Parts, Assemble, QualityControl, Ship, ChargeCustomer) with two checkpoints, C1 and C2.
C1 - Facts: {Customer, OrderedProduct, ...}; Estimates: {QoS_ExtSupplier, QoS_Warehouse, ...}; Unknown: {InStock, PaymentPrefs, ...}
C2 - Facts: {QoS_BankingService, ...}; Estimates: {AssemblingTime, QoS_ExtSupplier, ...}; Unknown: {PaymentPrefs, DeliveryTimeShipment}
As the process advances, the prediction error shrinks, but so does the time for reaction - the service composition quality / timeliness tradeoff.]
Defining Checkpoints (2)
There is a tradeoff to consider when defining checkpoints
– Early checkpoints are attractive, as they allow for more / better
reactions: there simply is more time left
– Later checkpoints are attractive as the prediction quality generally
improves with time
Providers need to identify a
checkpoint where …
– … the most important Factors of
Influence are already available
– … adaptation is still possible
Background: Factors of Influence
Factors of Influence of business processes are the main
factors that lead to performance problems
Knowing these factors is essential for targeted optimization of
business processes
– i.e., if a factor is known to lead to performance problems, optimization
needs to minimize this factor
Research work on identifying Factors of Influence:
Wetzstein, Leitner, Rosenberg, Brandic, Dustdar, and Leymann. Monitoring and Analyzing Influential Factors of Business Process Performance. In Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC'09). IEEE Press, Piscataway, NJ, USA, 118-127.
Bodenstaff, Wombacher, Reichert, and Jaeger. Monitoring Dependencies for SLAs: The MoDe4SLA Approach. In Proceedings of the 2008 IEEE International Conference on Services Computing (SCC'08). IEEE Computer Society, Washington, DC, USA, 21-29.
Estimates and External Data
One novelty of our approach: missing data can be
supplemented via Estimates
– Simple example:
- the response time of a service that is invoked after the Checkpoint
cannot be monitored …
- … but of course we can assume that the response time will be
similar to the measured average response time of this service
- the average response time can be used as an Estimate for this
unknown Factor of Influence
Technically, Estimates are not monitored from event streams
but provided by dedicated estimator components
Similarly, external data can be included via External Data
Providers
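The response-time example can be sketched as a minimal estimator component. This is illustrative only; the prototype's actual estimator interface may differ:

```python
class AverageResponseTimeEstimator:
    """Estimates an unknown response time by the measured average of
    past invocations of the same service (sketch; illustrative API)."""

    def __init__(self):
        self.total_ms = 0.0
        self.count = 0

    def record(self, response_time_ms):
        # fed by the monitoring infrastructure after each invocation
        self.total_ms += response_time_ms
        self.count += 1

    def estimate(self):
        # the average serves as Estimate for the unknown Factor of Influence
        if self.count == 0:
            raise ValueError("no historical measurements yet")
        return self.total_ms / self.count

est = AverageResponseTimeEstimator()
for rt in (900, 1100, 1000):
    est.record(rt)
print(est.estimate())  # 1000.0
```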
Event-Based Monitoring
In order to be used for prediction, runtime data needs to be
monitored
Event-based monitoring is an often-used idea to implement
this
Basic principle:
Register for and receive some lifecycle events from the service
composition and use Complex Event Processing (CEP) to extract
monitoring data from raw event data.
Can be used to monitor both QoS and domain-specific data
Implementing Event-Based Monitoring: Lifecycle Events (1)
Many composition engines emit lifecycle (status) events
Example 1: Apache ODE WS-BPEL Engine
– Currently emits 23 different types of events, including
ActivityExecStartEvent, ActivityExecEndEvent, VariableModificationEvent, …
– Full list available online
Example 2: Windows Workflow Foundation Tracking Service
– Emits a smaller number of events by default, but user-defined events
can be emitted at any time in a service composition
Implementing Event-Based Monitoring: Lifecycle Events (2)
These low-level lifecycle events can be aggregated to
produce meaningful higher-level metrics (event processing)
– Example 1: the execution time of the supplier service is the timestamp
of the ‘activity-ended’ event minus the timestamp of the respective
‘activity-started’ event
– Example 2: the customer identifier of the customer who put the order
can be retrieved from the response message contained in a specific
‘variable-assigned’ event
– Example 3: the average failure rate of the shipping service is defined
as the number of ‘failed-execution’ events divided by the number of
‘execution-started’ events
However, event aggregation middleware is needed to perform the
required calculations on the event streams
One possibility: SOA Event Engine implemented in VRESCo
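The three aggregation examples above can be sketched in a few lines. A real CEP engine such as NEsper expresses these as continuous queries over event streams, but the computation is the same; the event tuples and names below are illustrative:

```python
def aggregate(events):
    """Derive higher-level metrics from raw lifecycle events.
    Each event is a tuple (timestamp_ms, event_type, activity)."""
    starts, durations, failed, started_count = {}, {}, {}, {}
    for ts, etype, activity in events:
        if etype == "activity-started":
            starts[activity] = ts
            started_count[activity] = started_count.get(activity, 0) + 1
        elif etype == "activity-ended":
            # execution time = 'ended' timestamp minus 'started' timestamp
            durations[activity] = ts - starts[activity]
        elif etype == "activity-failed":
            failed[activity] = failed.get(activity, 0) + 1
    # failure rate = number of failed executions / number of started executions
    rates = {a: failed.get(a, 0) / n for a, n in started_count.items()}
    return durations, rates

events = [
    (0, "activity-started", "SupplierService"),
    (1200, "activity-ended", "SupplierService"),
    (1300, "activity-started", "ShippingService"),
    (1500, "activity-failed", "ShippingService"),
]
durations, rates = aggregate(events)
print(durations["SupplierService"], rates["ShippingService"])  # 1200 1.0
```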
Background: The VRESCo SOA Runtime (1)
In S-Cube we used the VRESCo SOA Runtime Environment
as basis for this research
[Figure: VRESCo architecture. A service client invokes SOAP services via the VRESCo Client Library (Daios client factory); a QoS monitor measures the services. The VRESCo Runtime Environment contains the Registry Database, Event Database, and Certificate Store, plus a Notification Engine, Query Engine, Composition Engine, Publishing/Metadata Service, and Management Service, exposed through Query, Publishing, Metadata, Notification, Management, and Composition interfaces on top of an ORM layer with access control.]
Background: The VRESCo SOA Runtime (2)
VRESCo is a registry with explicit support for Quality-of-
Service and dynamic binding of services
Additionally, it’s very easy to build dynamic, loosely coupled
service compositions based on Workflow Foundation (WF)
technology in VRESCo
These WF-based compositions emit lifecycle events as discussed
before, which are processed by the VRESCo Event Engine, a
complex event processing engine based on NEsper
Background: The VRESCo SOA Runtime (3)
Check the following S-Cube paper for more information on
VRESCo:
The VRESCo Event Engine, which has been used as the basis
for runtime data monitoring, was introduced in the
following earlier publication:
Michlmayr, Rosenberg, Leitner, and Dustdar. End-to-End Support for QoS-Aware Service Selection, Binding, and Mediation in VRESCo. IEEE Transactions on Services Computing 3, 3 (July 2010), 193-205.
Michlmayr, Rosenberg, Leitner, and Dustdar. Advanced Event Processing and Notifications in Service Runtime Environments. In Proceedings of the Second International Conference on Distributed Event-Based Systems (DEBS '08). ACM, New York, NY, USA, 115-125.
Architectural Overview of Prediction
1. Define Checkpoint as discussed before
2. Define important Factors of Influence and monitor them
3. Train Machine Learning model from monitored data
– For quantitative objectives (e.g., execution time of composition) we
use Artificial Neural Networks
– For qualitative objectives (e.g., quality of product) we use Decision
Trees
4. At the checkpoint, generate predictions using the trained model
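As a rough sketch of steps 3 and 4, the following uses a nearest-neighbor lookup as a stand-in for the actual models (the prototype trains Artificial Neural Networks and Decision Trees via WEKA); all data is made up:

```python
class NearestNeighborPredictor:
    """Tiny stand-in for the actual machine learning models;
    illustrative only, not the prototype's algorithm."""

    def __init__(self):
        self.samples = []

    def train(self, rows):
        # rows: list of (feature_vector, observed_outcome) from the event log
        self.samples = list(rows)

    def predict(self, features):
        # return the outcome of the most similar historical instance
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.samples, key=lambda s: dist(s[0], features))[1]

# Hypothetical history: (supplier QoS ms, shipping QoS ms) -> total time ms
history = [((1200, 800), 14500), ((300, 750), 9800), ((2500, 900), 21000)]
model = NearestNeighborPredictor()
model.train(history)
print(model.predict((1250, 820)))  # 14500
```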
Background: Machine Learning (1)
Machine Learning is a branch of computer science that deals
with algorithms that learn dependencies and relationships
from data
The basic principle is the idea of generalization, i.e., Machine
Learning algorithms aim at finding general rules and relations
in concrete data sets
Machine Learning is usually a two-step procedure:
1. Firstly, a predictor is learned from a training set
2. Secondly, the predictor can be applied to not-yet-seen data
There is a plethora of algorithms available for all kinds of
purposes, but in S-Cube we focused on Artificial Neural
Networks and Decision Trees
Background: Machine Learning (2)
Artificial Neural Networks implement regression functions in a
bio-inspired way
Decision Trees implement classification functions using a
simple tree structure
How Accurate is my Prediction? (1)
Quality of predictor can be evaluated in two ways:
– Past-data based (how well does the predictor perform on the
existing training data?)
– Future-data based (how well does the predictor perform when actually
used on not-yet-seen data?)
Metric for evaluation based on past data:
– Correlation coefficient of predictions and measured values in the
training set
How Accurate is my Prediction? (2)
Unfortunately, this metric is prone to overfitting
– But still useful to judge the performance of a trained predictor before it
has been used the first time
During runtime it is better to compare actual predictions with the
outcomes of instances
– Mean Prediction Error (average difference between predicted value
and actual value)
– Prediction Error Standard Deviation (is the error relatively constant, or
are there many outliers?)
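Both runtime metrics are straightforward to compute once predicted and actual values are available; the numbers below are made up:

```python
import statistics

def prediction_error_metrics(predicted, actual):
    """Mean prediction error and its standard deviation, comparing
    checkpoint predictions against the actual instance outcomes."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return statistics.mean(errors), statistics.pstdev(errors)

predicted = [14000, 9500, 21000, 12000]   # made-up values, in ms
actual    = [14500, 9800, 20000, 12200]
mean_err, err_sd = prediction_error_metrics(predicted, actual)
print(round(mean_err), round(err_sd, 1))  # 500 308.2
```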
VRESCo Open Source Implementation
The prototype used to do these experiments has been
developed as part of the VRESCo project
– Can be downloaded from Sourceforge
The Sourceforge package also includes the assembling case
study
– However, setup is not very user-friendly
– Get in touch with us if interested in reproducing the case study
Uses WEKA Machine Learning toolkit to implement the actual
prediction algorithms
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Now Let’s Check How Accurate Predictions are (1)
[Figure: the order process from before, annotated with five prediction checkpoints C1 to C5.]

Measured accuracy per checkpoint:

Checkpoint  Mean Prediction Error [ms]  Error Std. Dev. [ms]
C1          16076                       5030
C2          2158                        2864
C3          1328                        1541
C4          989                         1604
C5          806                         1516
Now Let’s Check How Accurate Predictions are (2)
Prediction error is decreasing with time
In C2, the error is already reasonably small
Hence, C2 may be an
interesting candidate for a
checkpoint in this case
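Using the mean prediction errors per checkpoint measured in the case study, the checkpoint choice can be expressed as a simple rule, e.g. pick the earliest checkpoint whose expected error is below an application-specific threshold (the 3000 ms threshold below is made up):

```python
def pick_checkpoint(mean_errors_ms, threshold_ms):
    """Earliest checkpoint whose mean prediction error falls below
    the threshold (a simple, illustrative selection rule)."""
    for name, err in mean_errors_ms:
        if err < threshold_ms:
            return name
    return None  # no checkpoint is accurate enough

# Mean prediction errors per checkpoint from the case study (ms)
errors = [("C1", 16076), ("C2", 2158), ("C3", 1328), ("C4", 989), ("C5", 806)]
print(pick_checkpoint(errors, 3000))  # C2
```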
Some Important Earlier Work
We were not the first ones to have similar ideas
Important earlier work includes:
Sahai, Machiraju, Sayal, van Moorsel, and Casati. Automated SLA Monitoring for Web Services. In Proceedings of the 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2002). Springer-Verlag, Berlin, Heidelberg, 28-41.
Zeng, Lingenfelder, Lei, and Chang. Event-Driven Quality of Service Prediction. In Proceedings of the 6th International Conference on Service-Oriented Computing (ICSOC '08), Springer-Verlag, Berlin, Heidelberg, 147-161.
Main Advances Over Earlier Work
The S-Cube approach to prediction based on event logs
improves on earlier work in some important aspects:
– Estimations of missing data can be incorporated via Estimates
– External data can be incorporated via External Data Providers
– Many different algorithms can be used for prediction
- Courtesy of the WEKA backend
– Prototype implementation available as open source software
- Part of the VRESCo project
Discussion - Advantages
Event log and machine learning based prediction of SLA
violations has a number of clear advantages …
– Simplicity – the basic approach is easy to understand
– Openness – the approach is not limited to event logs, but all kinds of
knowledge and data can be included in the prediction
– Proven in the real world – machine learning is by now a proven
technique that has been successfully applied in many areas
– Generality – the approach works both for quantitative and qualitative
objectives
– Efficiency – even though training of the machine learning models takes
some time, generating predictions is very fast
Discussion - Disadvantages
… but of course the approach also has some disadvantages.
– Bootstrapping problem – the approach assumes that some recorded
historical event logs are available for training
– Necessary domain knowledge – in order to define checkpoints some
domain knowledge is necessary
– Availability of monitoring data – one of the basic assumptions of the
approach is that all necessary data can be monitored (if this is not the
case the approach cannot be used)
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Summary
Machine learning based techniques can be used to predict
performance problems in service compositions
Steps:
1. Define a checkpoint in the composition
2. Train machine learning model from historical event log
3. Whenever a composition instance passes the checkpoint, use the
monitored data of the instance as input for the machine learning
based prediction
Allows us to quickly predict problems, so that
countermeasures can still be applied in time
– If the checkpoint is early enough that reaction is still possible …
Further S-Cube Reading
Leitner, Wetzstein, Rosenberg, Michlmayr, Dustdar, and Leymann. Runtime Prediction of Service Level Agreement Violations for Composite Services. In Proceedings of the 2009 International conference on Service-Oriented Computing (ICSOC/ServiceWave'09), Springer-Verlag, Berlin, Heidelberg, 176-186.
Leitner, Michlmayr, Rosenberg, and Dustdar. Monitoring, Prediction and Prevention of SLA Violations in Composite Services. In Proceedings of the 2010 IEEE International Conference on Web Services (ICWS '10). IEEE Computer Society, Washington, DC, USA, 369-376.
Wetzstein, Leitner, Rosenberg, Brandic, Dustdar, and Leymann. Monitoring and Analyzing Influential Factors of Business Process Performance. In Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC'09). IEEE Press, Piscataway, NJ, USA, 118-127.
Acknowledgements
The research leading to these results has
received funding from the European
Community’s Seventh Framework
Programme [FP7/2007-2013] under grant
agreement 215483 (S-Cube).