www.s-cube-network.eu
S-Cube Learning Package
Service Level Agreements:
Runtime Prediction of SLA Violations Based on Service Event Logs
TU Wien (TUW), University of Stuttgart (USTUTT)
Philipp Leitner, TUW
© S-Cube
Learning Package Categorization
S-Cube
SBA Quality Management
Quality Assurance and Quality Prediction
Runtime Prediction of SLA Violations
Based on Service Event Logs
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Let’s Consider a Scenario (1)
Assume you are a provider of a composite service
Commercial customers order products via a Web service
interface
Your process checks an internal stock, orders some missing
parts, and assembles the product
[Figure: order-handling process with activities ReceiveOrder, CheckStock, SelectSupplier, Get PaymentPrefs, ChargeCustomer, Order From Supplier 1, Order From Supplier 2, Ship Order, and Cancel Order; branches on 'everything available' and 'no supplier available'.]
Let’s Consider a Scenario (2)
(Of course) there are contractual obligations
– Delivery in time
– Availability of your service
– Product quality
These can be formulated using Service Level Agreements
– Note: in this context these are not necessarily WSLA-style machine-processable SLAs – they can also be just regular contracts
Let’s Consider a Scenario (3)
As a provider, you want to receive timely notifications if these
obligations are likely to be violated
– Usually there are penalties to be paid if contractual obligations are
(repeatedly) violated
– Even if this is not the case, customer satisfaction will suffer and
existing customers may terminate their contracts
Based on these notifications …
– … countermeasures can be taken (i.e.,
adaptations can be triggered)
– … customers can be notified
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
One S-Cube Approach: Predictions from Event-Log Data
1. Define Checkpoints in the service composition
2. If checkpoints are passed, collect all (important and
available) runtime data …
3. … enrich with estimations for missing data (if possible) …
4. … and use Machine Learning techniques to generate
predictions for objectives
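The four steps can be sketched as a small pipeline. All names below are illustrative, not the prototype's actual API:

```python
# Sketch of the checkpoint prediction flow (steps 2-4); names are illustrative.
def predict_at_checkpoint(facts, estimators, model):
    features = dict(facts)                       # step 2: collect available runtime data
    for name, estimate in estimators.items():    # step 3: enrich with estimates
        features.setdefault(name, estimate())
    return model(features)                       # step 4: apply the trained ML model

# Hypothetical usage: predict the total execution time (ms) of an order.
facts = {"qos_supplier_ms": 1200}                # already measured in the checkpoint
estimators = {"qos_shipping_ms": lambda: 800}    # e.g. a historical average
model = lambda f: f["qos_supplier_ms"] + f["qos_shipping_ms"] + 500
print(predict_at_checkpoint(facts, estimators, model))  # 2500
```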
Necessary Inputs
Evidently, there are some (not unreasonable) assumptions:
1. The provider needs to define a sensible checkpoint for prediction
(quality / timeliness tradeoff!)
2. Important runtime data needs to be defined and monitored
3. Optimally, some estimates for data still missing at the checkpoint are available
4. The system needs to be initialized with some historical data
( to learn the Machine Learning model that implements the actual
prediction)
Types of Runtime Data
Data that is used to generate predictions can be categorized
along 2 dimensions
Data Availability in Checkpoint
– Available (facts)
– Not available, but estimatable (estimates)
– Not available (unknown)
Type of Runtime Data
– External data (not monitorable, needs to be provided by external data
sources)
– Quality-of-Service information
– Domain-specific instance data (e.g., KPIs)
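These two dimensions can be captured in a small data model; the following is a sketch with illustrative names, not part of the actual prototype:

```python
from dataclasses import dataclass
from enum import Enum

class Availability(Enum):
    """Data availability in the checkpoint."""
    FACT = "available"
    ESTIMATE = "not available, but estimatable"
    UNKNOWN = "not available"

class Kind(Enum):
    """Type of runtime data."""
    EXTERNAL = "external data"
    QOS = "quality of service"
    INSTANCE = "domain-specific instance data"

@dataclass
class Factor:
    name: str
    availability: Availability
    kind: Kind

# Example: in an early checkpoint, an external supplier's QoS is not yet
# measured, but can be estimated from historical measurements.
f = Factor("QoS_ExtSupplier", Availability.ESTIMATE, Kind.QOS)
```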
Defining Checkpoints (1)
[Figure: example request-for-quote process (ReceiveRFQ, ProduceOffer, Order Unavailable Parts, Assemble, QualityControl, Ship, ChargeCustomer) with two checkpoints, C1 and C2.
C1 - Facts: {Customer, OrderedProduct, ...}; Estimates: {QoS_ExtSupplier, QoS_Warehouse, ...}; Unknown: {InStock, PaymentPrefs, ...}
C2 - Facts: {QoS_BankingService, ...}; Estimates: {AssemblingTime, QoS_ExtSupplier, ...}; Unknown: {PaymentPrefs, DeliveryTimeShipment}
As the process advances, the prediction error shrinks, but so does the time for reaction - the service composition quality / timeliness tradeoff.]
Defining Checkpoints (2)
There is a tradeoff to consider when defining checkpoints
– Early checkpoints are attractive, as they allow for more / better
reactions: there simply is more time left
– Later checkpoints are attractive as the prediction quality generally
improves with time
Providers need to identify a
checkpoint where …
– … the most important Factors of
Influence are already available
– … adaptation is still possible
Background: Factors of Influence
Factors of Influence of business processes are the main
factors that lead to performance problems
Knowing these factors is essential for targeted optimization of
business processes
– i.e., if a factor is known to lead to performance problems, optimization
needs to minimize this factor
Research work on identifying Factors of Influence:
Wetzstein, Leitner, Rosenberg, Brandic, Dustdar, and Leymann. Monitoring and Analyzing Influential Factors of Business Process Performance. In Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC'09). IEEE Press, Piscataway, NJ, USA, 118-127.
Bodenstaff, Wombacher, Reichert, and Jaeger. Monitoring Dependencies for SLAs: The MoDe4SLA Approach. In Proceedings of the 2008 IEEE International Conference on Services Computing (SCC'08). IEEE Computer Society, Washington, DC, USA, 21-29.
Estimates and External Data
One novelty of our approach: missing data can be
supplemented via Estimates
– Simple example:
- the response time of a service that is invoked after the Checkpoint
cannot be monitored …
- … but of course we can assume that the response time will be
similar to the measured average response time of this service
- the average response time can be used as an Estimate for this
unknown Factor of Influence
Technically, Estimates are not monitored from event streams
but provided by dedicated estimator components
Similarly, external data can be included via External Data
Providers
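The response-time example can be sketched as a minimal estimator component. This is illustrative only; the prototype's actual estimator interface may differ:

```python
class AverageResponseTimeEstimator:
    """Estimates an unknown response time by the measured average of
    past invocations of the same service (sketch; illustrative API)."""

    def __init__(self):
        self.total_ms = 0.0
        self.count = 0

    def record(self, response_time_ms):
        # fed by the monitoring infrastructure after each invocation
        self.total_ms += response_time_ms
        self.count += 1

    def estimate(self):
        # the average serves as Estimate for the unknown Factor of Influence
        if self.count == 0:
            raise ValueError("no historical measurements yet")
        return self.total_ms / self.count

est = AverageResponseTimeEstimator()
for rt in (900, 1100, 1000):
    est.record(rt)
print(est.estimate())  # 1000.0
```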
Event-Based Monitoring
In order to be used for prediction, runtime data needs to be
monitored
Event-based monitoring is an often-used idea to implement
this
Basic principle:
Register for and receive some lifecycle events from the service
composition and use Complex Event Processing (CEP) to extract
monitoring data from raw event data.
Can be used to monitor both QoS and domain-specific data
Implementing Event-Based Monitoring: Lifecycle Events (1)
Many composition engines emit lifecycle (status) events
Example 1: Apache ODE WS-BPEL Engine
– Currently emits 23 different types of events, including
ActivityExecStartEvent, ActivityExecEndEvent, VariableModificationEvent, …
– Full list available online
Example 2: Windows Workflow Foundation Tracking Service
– Emits a smaller number of events by default, but user-defined events
can be emitted at any time in a service composition
Implementing Event-Based Monitoring: Lifecycle Events (2)
These low-level lifecycle events can be aggregated to
produce meaningful higher-level metrics (event processing)
– Example 1: the execution time of the supplier service is the timestamp
of the ‘activity-ended’ event minus the timestamp of the respective
‘activity-started’ event
– Example 2: the customer identifier of the customer who put the order
can be retrieved from the response message contained in a specific
‘variable-assigned’ event
– Example 3: the average failure rate of the shipping service is defined
as the number of ‘failed-execution’ events divided by the number of
‘execution-started’ events
However, event aggregation middleware is needed to perform the
required calculations on the event streams
One possibility: SOA Event Engine implemented in VRESCo
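The three aggregation examples above can be sketched in a few lines. A real CEP engine such as NEsper expresses these as continuous queries over event streams, but the computation is the same; the event tuples and names below are illustrative:

```python
def aggregate(events):
    """Derive higher-level metrics from raw lifecycle events.
    Each event is a tuple (timestamp_ms, event_type, activity)."""
    starts, durations, failed, started_count = {}, {}, {}, {}
    for ts, etype, activity in events:
        if etype == "activity-started":
            starts[activity] = ts
            started_count[activity] = started_count.get(activity, 0) + 1
        elif etype == "activity-ended":
            # execution time = 'ended' timestamp minus 'started' timestamp
            durations[activity] = ts - starts[activity]
        elif etype == "activity-failed":
            failed[activity] = failed.get(activity, 0) + 1
    # failure rate = number of failed executions / number of started executions
    rates = {a: failed.get(a, 0) / n for a, n in started_count.items()}
    return durations, rates

events = [
    (0, "activity-started", "SupplierService"),
    (1200, "activity-ended", "SupplierService"),
    (1300, "activity-started", "ShippingService"),
    (1500, "activity-failed", "ShippingService"),
]
durations, rates = aggregate(events)
print(durations["SupplierService"], rates["ShippingService"])  # 1200 1.0
```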
Background: The VRESCo SOA Runtime (1)
In S-Cube we used the VRESCo SOA Runtime Environment
as basis for this research
[Figure: VRESCo architecture. A service client invokes SOAP services via the VRESCo Client Library (Daios client factory); a QoS monitor measures the services. The VRESCo Runtime Environment contains the Registry Database, Event Database, and Certificate Store, plus a Notification Engine, Query Engine, Composition Engine, Publishing/Metadata Service, and Management Service, exposed through Query, Publishing, Metadata, Notification, Management, and Composition interfaces on top of an ORM layer with access control.]
Background: The VRESCo SOA Runtime (2)
VRESCo is a registry with explicit support for Quality-of-
Service and dynamic binding of services
Additionally, it’s very easy to build dynamic, loosely coupled
service compositions based on Workflow Foundation (WF)
technology in VRESCo
These WF-based compositions emit lifecycle events as discussed
before, which are processed by the VRESCo Event Engine, a
complex event processing engine based on NEsper
Background: The VRESCo SOA Runtime (3)
Check the following S-Cube paper for more information on
VRESCo:
The VRESCo Event Engine, which has been used as the basis
for runtime data monitoring, was introduced in the
following earlier publication:
Michlmayr, Rosenberg, Leitner, and Dustdar. End-to-End Support for QoS-Aware Service Selection, Binding, and Mediation in VRESCo. IEEE Transactions on Services Computing 3, 3 (July 2010), 193-205.
Michlmayr, Rosenberg, Leitner, and Dustdar. Advanced Event Processing and Notifications in Service Runtime Environments. In Proceedings of the Second International Conference on Distributed Event-Based Systems (DEBS '08). ACM, New York, NY, USA, 115-125.
Architectural Overview of Prediction
1. Define Checkpoint as discussed before
2. Define important Factors of Influence and monitor them
3. Train Machine Learning model from monitored data
– For quantitative objectives (e.g., execution time of composition) we
use Artificial Neural Networks
– For qualitative objectives (e.g., quality of product) we use Decision
Trees
4. At the checkpoint, generate predictions using the trained model
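As a rough sketch of steps 3 and 4, the following uses a nearest-neighbor lookup as a stand-in for the actual models (the prototype trains Artificial Neural Networks and Decision Trees via WEKA); all data is made up:

```python
class NearestNeighborPredictor:
    """Tiny stand-in for the actual machine learning models;
    illustrative only, not the prototype's algorithm."""

    def __init__(self):
        self.samples = []

    def train(self, rows):
        # rows: list of (feature_vector, observed_outcome) from the event log
        self.samples = list(rows)

    def predict(self, features):
        # return the outcome of the most similar historical instance
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.samples, key=lambda s: dist(s[0], features))[1]

# Hypothetical history: (supplier QoS ms, shipping QoS ms) -> total time ms
history = [((1200, 800), 14500), ((300, 750), 9800), ((2500, 900), 21000)]
model = NearestNeighborPredictor()
model.train(history)
print(model.predict((1250, 820)))  # 14500
```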
Background: Machine Learning (1)
Machine Learning is a branch of computer science that deals
with algorithms that learn dependencies and relationships
from data
The basic principle is the idea of generalization, i.e., Machine
Learning algorithms aim at finding general rules and relations
in concrete data sets
Machine Learning is usually a two-step procedure:
1. Firstly, a predictor is learned from a training set
2. Secondly, the predictor can be applied to not-yet-seen data
There is a plethora of algorithms available for all kinds of
purposes, but in S-Cube we focused on Artificial Neural
Networks and Decision Trees
Background: Machine Learning (2)
Artificial Neural Networks implement regression functions in a
bio-inspired way
Decision Trees implement classification functions using a
simple tree structure
How Accurate is my Prediction? (1)
Quality of predictor can be evaluated in two ways:
– Past-data based (how well does the predictor perform on the
existing training data?)
– Future-data based (how well does the predictor perform when actually
used on not-yet-seen data?)
Metric for evaluation based on past data:
– Correlation coefficient of predictions and measured values in the
training set
How Accurate is my Prediction? (2)
Unfortunately, this metric is prone to overfitting
– But still useful to judge the performance of a trained predictor before it
has been used the first time
During runtime it is better to compare actual predictions with the
outcomes of instances
– Mean Prediction Error (average difference between predicted value
and actual value)
– Prediction Error Standard Deviation (is the error relatively constant, or
are there many outliers?)
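Both runtime metrics are straightforward to compute once predicted and actual values are available; the numbers below are made up:

```python
import statistics

def prediction_error_metrics(predicted, actual):
    """Mean prediction error and its standard deviation, comparing
    checkpoint predictions against the actual instance outcomes."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return statistics.mean(errors), statistics.pstdev(errors)

predicted = [14000, 9500, 21000, 12000]   # made-up values, in ms
actual    = [14500, 9800, 20000, 12200]
mean_err, err_sd = prediction_error_metrics(predicted, actual)
print(round(mean_err), round(err_sd, 1))  # 500 308.2
```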
VRESCo Open Source Implementation
The prototype used to do these experiments has been
developed as part of the VRESCo project
– Can be downloaded from Sourceforge
The Sourceforge package also includes the assembling case
study
– However, setup is not very user-friendly
– Get in touch with us if interested in reproducing the case study
Uses WEKA Machine Learning toolkit to implement the actual
prediction algorithms
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Now Let’s Check How Accurate Predictions are (1)
[Figure: the order process from before, annotated with five prediction checkpoints C1 to C5.]

Measured accuracy per checkpoint:

Checkpoint  Mean Prediction Error [ms]  Error Std. Dev. [ms]
C1          16076                       5030
C2          2158                        2864
C3          1328                        1541
C4          989                         1604
C5          806                         1516
Now Let’s Check How Accurate Predictions are (2)
Prediction error is decreasing with time
In C2, the error is already reasonably small
Hence, C2 may be an
interesting candidate for a
checkpoint in this case
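Using the mean prediction errors per checkpoint measured in the case study, the checkpoint choice can be expressed as a simple rule, e.g. pick the earliest checkpoint whose expected error is below an application-specific threshold (the 3000 ms threshold below is made up):

```python
def pick_checkpoint(mean_errors_ms, threshold_ms):
    """Earliest checkpoint whose mean prediction error falls below
    the threshold (a simple, illustrative selection rule)."""
    for name, err in mean_errors_ms:
        if err < threshold_ms:
            return name
    return None  # no checkpoint is accurate enough

# Mean prediction errors per checkpoint from the case study (ms)
errors = [("C1", 16076), ("C2", 2158), ("C3", 1328), ("C4", 989), ("C5", 806)]
print(pick_checkpoint(errors, 3000))  # C2
```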
Some Important Earlier Work
We were not the first ones to have similar ideas
Important earlier work includes:
Sahai, Machiraju, Sayal, van Moorsel, and Casati. Automated SLA Monitoring for Web Services. In Proceedings of the 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2002). Springer-Verlag, Berlin, Heidelberg, 28-41.
Zeng, Lingenfelder, Lei, and Chang. Event-Driven Quality of Service Prediction. In Proceedings of the 6th International Conference on Service-Oriented Computing (ICSOC '08), Springer-Verlag, Berlin, Heidelberg, 147-161.
Main Advances Over Earlier Work
The S-Cube approach to prediction based on event logs
improves on earlier work in some important aspects:
– Estimations of missing data can be incorporated via Estimates
– External data can be incorporated via External Data Providers
– Many different algorithms can be used for prediction
- Courtesy of the WEKA backend
– Prototype implementation available as open source software
- Part of the VRESCo project
Discussion - Advantages
Event log and machine learning based prediction of SLA
violations has a number of clear advantages …
– Simplicity – the basic approach is easy to understand
– Openness – the approach is not limited to event logs, but all kinds of
knowledge and data can be included in the prediction
– Proven in the real world – machine learning is by now a proven
technique that has been successfully applied in many areas
– Generality – the approach works both for quantitative and qualitative
objectives
– Efficiency – even though training of the machine learning models takes
some time, generating predictions is very fast
Discussion - Disadvantages
… but of course the approach also has some disadvantages.
– Bootstrapping problem – the approach assumes that some recorded
historical event logs are available for training
– Necessary domain knowledge – in order to define checkpoints some
domain knowledge is necessary
– Availability of monitoring data – one of the basic assumptions of the
approach is that all necessary data can be monitored (if this is not the
case the approach cannot be used)
Learning Package Overview
Problem Description
Event-Based Runtime Prediction
Discussion
Conclusions
Summary
Machine learning based techniques can be used to predict
performance problems in service compositions
Steps:
1. Define a checkpoint in the composition
2. Train machine learning model from historical event log
3. Whenever a composition instance passes the checkpoint, use the
monitored data of the instance as input for the machine learning
based prediction
Allows us to quickly predict problems, so that
countermeasures can still be applied in time
– If the checkpoint is early enough that reaction is still possible …
Further S-Cube Reading
Leitner, Wetzstein, Rosenberg, Michlmayr, Dustdar, and Leymann. Runtime Prediction of Service Level Agreement Violations for Composite Services. In Proceedings of the 2009 International conference on Service-Oriented Computing (ICSOC/ServiceWave'09), Springer-Verlag, Berlin, Heidelberg, 176-186.
Leitner, Michlmayr, Rosenberg, and Dustdar. Monitoring, Prediction and Prevention of SLA Violations in Composite Services. In Proceedings of the 2010 IEEE International Conference on Web Services (ICWS '10). IEEE Computer Society, Washington, DC, USA, 369-376.
Wetzstein, Leitner, Rosenberg, Brandic, Dustdar, and Leymann. Monitoring and Analyzing Influential Factors of Business Process Performance. In Proceedings of the 13th IEEE International Conference on Enterprise Distributed Object Computing (EDOC'09). IEEE Press, Piscataway, NJ, USA, 118-127.
Acknowledgements
The research leading to these results has
received funding from the European
Community’s Seventh Framework
Programme [FP7/2007-2013] under grant
agreement 215483 (S-Cube).