SLALOM Project Technical Webinar 20151111

Post on 10-Jan-2017


SLALOM Webinar
George Kousiouris, Andreas Menychtas, Dimosthenis Kyriazis
ICCS/NTUA

Outline

• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions

SLALOM Project
• EU-funded project started in January 2015
• Aims at developing a core SLA specification
  – As a basis for service interactions between providers and customers
  – Considering the current SLA landscape and research outcomes from various projects and initiatives
• Current status
  – Analysis of the SLA landscape, standardization efforts and relevant research outcomes
  – Development of the SLA specification (v1), including the main blocks and the components for each block
  – Development of the abstract metric / function (v1), applicable to different metrics
  – Submission of our work to the ISO SLA WG for standardization
• SLALOM is officially accepted as a liaison body to the ISO SLA WG
• Attended and presented our work at the last ISO SLA WG meeting (Dublin)

Our Background
• European Commission Expert Group on SLAs, http://ec.europa.eu/digital-agenda/en/news/cloud-computing-service-level-agreements-exploitation-research-results
• Real-time SLAs (IRMOS FP7 project, 2008-2011)
  – Real-Time Cloud for enabling performance guarantees on soft real-time applications via SLAs
• Admission Control and Legal Aspects (OPTIMIS FP7 project, 2010-2013)
  – Risk, eco-efficiency and cost as parameters
  – Data location considerations, contractual terms
• Abstracted Auditing/Monitoring of SLAs (ARTIST FP7 project, 2012-2015)
  – 3ALib abstracted library implementation
  – CloudML@ARTIST definition of a UML profile for SLA descriptions

Outline

• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions

SLA Specification Contribution (1/2)

• The proposed specification / reference model takes into account:
  – Standardization approaches and working group outcomes
  – Current SLAs offered by commercial cloud providers
  – Views expressed by cloud providers and adopters
  – Research outcomes
• With respect to ISO SLA WG outcomes:
  – Follows ISO 19086-2: core blocks (i.e. metric definition, parameter definition, rule definition) and the corresponding elements (e.g. ID, name, unit, scale, etc.) of the SLA
  – Follows ISO 3534-2: metric in different scales (such as interval, ratio, nominal or ordinal)

SLA Specification Contribution (2/2)

• With respect to ISO WG outcomes (cont.):
  – Suggests changes to ISO 19086-2: naming of specific elements (e.g. referenceId is used for metricId, parameterId and ruleId alike, whereas SLALOM proposes the use of distinct identifiers)
  – Suggests changes to ISO 19086-2: inclusion of additional elements and blocks:
    • Dependencies with other metrics, e.g. the availability of a storage service and its dependency on latency, or response time and its dependency on bandwidth
    • Importance of a metric (i.e. gradeOfImportance)

SLA Specification / Reference model: Main blocks

• Follows the ISO 19086-2 SLA specification
• Metric
  – Corresponds to the service metric / objective (e.g. availability)
• Parameter
  – Links the metric with a parameter to express how it can be monitored and validated (e.g. time to provide resources following an elasticity trigger)
• Rule
  – Refers to metric “constraints” (e.g. number of concurrent connections for a number-of-users metric)
• SLALOM proposed the addition of the Dependency block
  – Captures dependencies between metrics (e.g. response time and bandwidth)

Components
• Specific components are used for all building blocks:
  – ID
  – Name
  – Definition / Expression
  – Unit
  – Notes
• SLALOM proposed the following additions to the ISO specification:
  – gradeOfImportance component for the Metric definition, to define the metric's importance within an SLA
  – consequenceOfViolation component for the Rule definition, to define the potential consequence of a violation on the service provisioning

Outline

• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions

Abstract metric function / definition: primary goal

• Have a standard that forces ambiguities to be clarified
• Help in the measurement/auditing process of an SLA
  – Especially by 3rd-party providers
  – What is the purpose of having SLAs if one is not able to measure them non-repudiably?
• Abstract the process

Example of ambiguity in the measurement process: AWS EC2 SLA

• ““Unavailable” and “Unavailability” mean: For Amazon EC2, when all of your running instances have no external connectivity.”
• Determination of external connectivity: how?
  – E.g. pinging (ICMP)?
    • Security threat
  – Application layer (endpoint checking)?
    • Includes application downtime (not the responsibility of AWS EC2)

SLALOM 3-Layer Definition

Metric (Ratio) Layer
• Final ratio calculation of the defined metric
• Correlates individual periods to the overall metric

Period (Time) Layer
• Size of the period, and limits on the base period and on the error rate needed
• Aggregates samples at a time-interval granularity; decisions are taken at this level

Sample (Measurement) Layer
• Boolean expression based on concrete measurement constraints
• Dictates success or failure of a sample

Sample Layer
• Sample Condition - sc: the condition stating whether a sample has been successful.
  – operator: the operator can either be a boolean one (i.e. AND, OR, NOT) or a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual value of the condition, which can be arithmetic, non-arithmetic (e.g. a string such as “exception”) or an enumeration (e.g. HTTP response code == 200).
  – unit: the unit for the value of the condition.
• Sample - s: the sample used to evaluate a parameter against the condition sc.
• Successful Sample - ss: a sample satisfying the condition sc.
• Unsuccessful Sample - us: a sample not satisfying the condition sc.
• Type of Operation field: the defined sampling process, e.g. a protocol-response pair.

Sample definition (for a given type of operation as specified in the corresponding field):
sc = operator + value + unit
ss = s if (sc is true)
us = s if (sc is false)
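The sample definition above can be sketched as follows; this is a minimal illustration only, assuming a hypothetical operation whose samples are numeric measurements (e.g. a response time in ms), and the function names are our own, not part of the SLALOM specification:

```python
# Sketch of the Sample layer: classify a sample s as ss or us
# against a condition sc = operator + value (+ unit).
import operator

# Map the comparison-operator strings of sc to Python functions.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def evaluate_sample(sample_value, op, value):
    """Return True if the sample satisfies sc (i.e. it is an ss),
    False if it does not (i.e. it is a us)."""
    return OPS[op](sample_value, value)

# Hypothetical condition sc: "response time < 500 ms"
print(evaluate_sample(120, "<", 500))  # successful sample (ss) -> True
print(evaluate_sample(900, "<", 500))  # unsuccessful sample (us) -> False
```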


Period Layer
• Boundary Period - bp: the period over which the analysis of a parameter (through samples) should be taken into account. Any sample that does not meet this criterion (i.e. does not fall within the period) is excluded, even if it is successful (i.e. an ss according to the sample definition).
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
  – unit: the unit in this case is always a time unit (e.g. seconds, minutes, etc.).
• Error Condition - ec: the error-condition ratio for which the analysis of a parameter (through samples) should be taken into account. The ratio is always expressed as a percentage (%).
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
• Error Ratio - er: the error ratio calculated from the total set of samples and the unsuccessful samples.
• Period - p: the period in which samples (ss and us) are examined according to the boundary period and the error condition.
• Valid Period - vp: a period for which the error ratio meets the error condition and the boundary period condition is also satisfied.
• Non-valid Period - np: a period for which the error ratio does not meet the error condition (while the boundary period condition is satisfied).

Boundary period and error definitions:
bp = operator + value + unit
ec = operator + value + %
er = Σus / Σs, ∀ us, s ∈ p
vp = p if ((er <= ec) && (p >= bp))
np = p if ((er > ec) && (p >= bp))
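The period definitions can be sketched as below; a minimal illustration assuming samples are already grouped into periods as boolean success flags, with illustrative function names that are not part of the SLALOM specification:

```python
# Sketch of the Period layer: compute er for a period and classify
# the period as vp, np, or excluded (boundary period not satisfied).
def error_ratio(samples):
    """er = (number of unsuccessful samples us) / (total samples s)."""
    unsuccessful = sum(1 for ok in samples if not ok)
    return unsuccessful / len(samples)

def classify_period(samples, period_seconds, bp_seconds, ec_ratio):
    """Return 'vp', 'np', or 'excluded' per the definitions:
    vp if er <= ec and p >= bp; np if er > ec and p >= bp."""
    if period_seconds < bp_seconds:   # p >= bp not satisfied
        return "excluded"
    er = error_ratio(samples)
    return "vp" if er <= ec_ratio else "np"

# 60 s period, bp = 60 s, ec = 10%: 1 failure in 10 samples stays valid
print(classify_period([True] * 9 + [False], 60, 60, 0.10))   # 'vp'
print(classify_period([True] * 5 + [False] * 5, 60, 60, 0.10))  # 'np'
```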


Metric Layer
• Metric Condition - mc: the condition regarding a specific metric. The condition is always expressed as a percentage (Ratio Type?) (%) to enable its evaluation as proposed through the metric evaluation.
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
• Metric Evaluation - me: the evaluation of the metric based on the valid and non-valid periods. For the SLA to be met, the evaluation should be smaller than the condition (i.e. me < mc).

Abstract metric definition:
mc = operator + value + %
me = Σnp / (Σvp + Σnp)
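The metric evaluation can be sketched as follows; a minimal illustration assuming period classifications come from the Period layer, with function names of our own choosing:

```python
# Sketch of the Metric layer: me = Σnp / (Σvp + Σnp), compared
# against the metric condition mc (both as ratios, not percentages).
def metric_evaluation(period_labels):
    """Fraction of non-valid periods among all counted periods."""
    vp = period_labels.count("vp")
    np_count = period_labels.count("np")
    return np_count / (vp + np_count)

def sla_met(period_labels, mc):
    """The SLA holds when me < mc."""
    return metric_evaluation(period_labels) < mc

# E.g. an availability objective of 99.95% allows me < 0.05% downtime.
labels = ["vp"] * 9999 + ["np"]   # 1 non-valid period out of 10000
print(metric_evaluation(labels))  # 0.0001
print(sla_met(labels, 0.0005))    # True: 0.01% < 0.05%
```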


Mapping of the EC2 SLA @ SLALOM (Amazon EC2)

• Sample definition
  – sc: UNDEFINED (assumed ‘ping’ -> ICMP). The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
  – Type of operation: ping. It is not defined how the condition of connectivity can actually be measured (e.g. the ping operation mentioned previously).
• Boundary period and error definitions
  – bp > 60 sec. The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
  – ec = 100%. Error condition reflecting that, for the entire bp, the resource must be continuously “unavailable”.
• Abstract metric definition
  – availability < 99.95%. Availability metric definition given the boundary period and error condition.
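The EC2 mapping can be tied back to the layer formulas in a short end-to-end sketch; the per-minute probe samples and their count are assumptions for illustration, and the function names are ours, not AWS's:

```python
# Hedged sketch of the EC2 mapping: with ec = 100%, a minute (bp = 60 s)
# counts as downtime only if every connectivity probe in it failed, and
# availability (= 1 - me) must stay at or above 99.95%.
def minute_is_unavailable(samples):
    """All probes (e.g. pings) in the minute failed -> non-valid period."""
    return not any(samples)

def monthly_availability(minutes):
    """minutes: one list of boolean probe results per minute.
    Returns availability as a percentage."""
    down = sum(1 for m in minutes if minute_is_unavailable(m))
    return 100.0 * (1 - down / len(minutes))

# 10000 healthy minutes plus 3 minutes whose probes all failed
minutes = [[True] * 4] * 10000 + [[False] * 4] * 3
print(monthly_availability(minutes) >= 99.95)  # True: objective met
```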


Mapping of SLALOM @ GAE Datastore (Google App Engine Datastore)

• Sample definition
  – sc: INTERNAL_ERROR. Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
  – Type of operation: API calls. Several types of operations are defined; an example is provided here.
• Boundary period and error definitions
  – bp > 300 sec. The exact wording is “five consecutive minutes”.
  – ec > 10%. Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
• Abstract metric definition
  – availability < 99.95%. Availability metric definition given the boundary period and error condition.

Mapping of SLALOM @ Microsoft Azure SLA (Microsoft Azure Storage)

• Sample definition
  – sc = 60 sec. Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “Sixty (60) seconds” for PutBlockList and GetBlockList.
  – Type of operation: PutBlockList and GetBlockList. Several types of operations are defined; an example is provided here.
• Boundary period and error definitions
  – bp > 3600 sec. The exact wording is “given one-hour interval”.
  – ec > 0%. Error condition reflecting that all periods should be taken into account for the availability metric evaluation; the exact wording is “is the sum of Error Rates for each hour”.
• Abstract metric definition
  – availability < 99.9%. Availability metric definition given the boundary period and error condition.

Preconditions
• For any SLA to apply, a number of preconditions typically exist per provider
• Examples
  – Deployment: number of Availability Zones used
  – Deployment: replication options used
  – Usage/Measurement: restarting of resources when unavailable
  – Usage/Measurement: applied throttling of requests

Outline

• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions

Snapshot of SLA contributions
• An SLA specification / reference model (captured in SLALOM SLA Specification and Reference Model v1: http://slalom-project.eu/content/d32-%E2%80%93-sla-specification-and-reference-model)
  – Follows and proposes changes to ISO 19086-2
  – Follows ISO 3534-2
• An abstract metric function / definition that can be exploited to specify any SLA metric (captured in SLALOM SLA Metric Specification v1: http://slalom-project.eu/content/slalom-sla-specification-v1-sep-2015)
  – Submitted as a contribution to ISO 19086-2

Feedback

• Please provide us with feedback on the presented concepts at: https://docs.google.com/forms/d/1Ljnc2x2WSaAXrWzHglDyy31xdCNC3qDiQqhC1FoxDZI/viewform

Any questions?

www.slalom-project.eu


3ALib (Availability SLA Benchmark and Auditor)

• Abstracted Availability Auditor Library (Java-based) implementation:
  – Based on the conceptual abstractions of different providers' SLAs
  – Abstracted at the code level, for efficient replacement of processes and inclusion of providers
• Purpose:
  – Align monitoring with specific provider definitions and adapt availability calculations
  – Check preconditions of SLA applicability for a specific deployment and give feedback
  – Adapt to dynamic Cloud Services user behavior

Details for GAE Datastore SLA


Details for Azure SLA

Transaction Type -> Maximum Processing Time*
• PutBlob and GetBlob (includes blocks and pages); Get Valid Page Blob Ranges: Two (2) seconds multiplied by the number of MBs transferred in the course of processing the request
• Copy Blob: Ninety (90) seconds (where the source and destination blobs are within the same storage account)
• PutBlockList; GetBlockList: Sixty (60) seconds
• Table Query; List Operations: Ten (10) seconds (to complete processing or return a continuation)
• Batch Table Operations: Thirty (30) seconds
• All Single Entity Table Operations; All other Blob and Message Operations: Two (2) seconds
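The table above can be encoded as a simple lookup; a hypothetical helper for illustration only, whose names and keys are ours, not from the Azure SLA:

```python
# Sketch encoding the maximum processing times above (in seconds);
# PutBlob/GetBlob limits scale with the number of MBs transferred.
MAX_PROCESSING_TIME = {
    "Copy Blob": 90,
    "PutBlockList": 60,
    "GetBlockList": 60,
    "Table Query": 10,
    "List Operations": 10,
    "Batch Table Operations": 30,
    "Single Entity Table Operations": 2,
}

def within_limit(operation, elapsed_seconds, mbs_transferred=0):
    """True if a transaction finished within its maximum processing time."""
    if operation in ("PutBlob", "GetBlob"):
        return elapsed_seconds <= 2 * mbs_transferred
    return elapsed_seconds <= MAX_PROCESSING_TIME[operation]

print(within_limit("Copy Blob", 45))                    # True
print(within_limit("PutBlob", 25, mbs_transferred=10))  # False (limit 20 s)
```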