SLALOM Project Technical Webinar 20151111

Date post: 10-Jan-2017
Upload: oliver-barreto-rodriguez
SLALOM Webinar George Kousiouris, Andreas Menychtas, Dimosthenis Kyriazis ICCS/NTUA

SLALOM Webinar
George Kousiouris, Andreas Menychtas, Dimosthenis Kyriazis
ICCS/NTUA

Outline

• SLALOM Project
• Our background
• Overview of contributions
• SLA specification / reference model
• Abstract metric function / definition
• Conclusions

SLALOM Project

• EU-funded project that started in January 2015
• Aims at developing a core SLA specification
  – As a basis for service interactions between providers and customers
  – Considering the current SLA landscape and research outcomes from various projects and initiatives
• Current status
  – Analysis of the SLA landscape, standardization efforts and relevant research outcomes
  – Development of the SLA specification (v1), including the main blocks and the components for each block
  – Development of the abstract metric / function (v1), applicable to different metrics
  – Submission of our work to the ISO SLA WG for standardization
    • SLALOM is officially accepted as a liaison body to the ISO SLA WG
    • Attended and presented our work at the last ISO SLA WG meeting (Dublin)

Our Background

• European Commission Expert Group on SLAs, http://ec.europa.eu/digital-agenda/en/news/cloud-computing-service-level-agreements-exploitation-research-results
• Real time SLAs (IRMOS FP7 project, 2008-2011)
  – Real Time Cloud for enabling performance guarantees on soft real-time applications via SLAs
• Admission Control and Legal Aspects (OPTIMIS FP7 project, 2010-2013)
  – Risk, Eco-efficiency and Cost as parameters
  – Data Location considerations, Contractual terms
• Abstracted Auditing/Monitoring of SLAs (ARTIST FP7 project, 2012-2015)
  – 3ALib abstracted library implementation
  – CloudML@ARTIST definition of a UML profile for SLA descriptions

SLA Specification Contribution (1/2)

• The proposed specification / reference model takes into account
  – Standardization approaches and working group outcomes
  – Current SLAs offered by commercial cloud providers
  – Views expressed by cloud providers and adopters
  – Research outcomes
• With respect to ISO SLA WG outcomes
  – Follows ISO 19086-2: Core blocks (i.e. metric definition, parameters definition, rule definition) and the corresponding elements (e.g. ID, name, unit, scale, etc.) of the SLA
  – Follows ISO 3534-2: Metrics in different scales (such as interval, ratio, nominal or ordinal)

SLA Specification Contribution (2/2)

• With respect to ISO WG outcomes (cont.)
  – Suggests changes to ISO 19086-2: Naming of specific elements (e.g. referenceId is used for metricId, for parameterId and for ruleId alike, while SLALOM proposes the use of different identifiers)
  – Suggests changes to ISO 19086-2: Inclusion of additional elements and blocks:
    • Dependencies on other metrics, e.g. the availability of a storage service depends on latency or response time, and on bandwidth
    • Importance of a metric (i.e. gradeOfImportance)

SLA Specification / Reference model: Main blocks

• Follows ISO 19086-2 SLA specification
• Metric
  – Corresponds to the service metric / objective (e.g. availability)
• Parameter
  – Links the metric with a parameter to express how it can be monitored and validated (e.g. time to provide resources following an elasticity trigger)
• Rule
  – Refers to metric “constraints” (e.g. number of concurrent connections for a “number of users” metric)
• SLALOM proposed the addition of the Dependency block
  – Captures dependencies between metrics (e.g. response time and bandwidth)

Components

• Specific components are used for all building blocks
  – ID
  – Name
  – Definition / Expression
  – Unit
  – Notes
• SLALOM proposed the following additions to the ISO specification
  – gradeOfImportance component for the Metric definition, to define the metric's importance within an SLA
  – consequenceOfViolation component for the Rule definition, to define the potential consequence of a violation on the service provisioning
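
The blocks and components listed above can be pictured as a small data model. The following is only a sketch in Python, not the SLALOM or ISO 19086-2 schema: the class layout, the snake_case field names, the integer type chosen for the grade of importance and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Components shared by all building blocks: ID, name, definition, unit, notes."""
    id: str
    name: str
    definition: str
    unit: Optional[str] = None
    notes: str = ""


@dataclass
class Parameter(Block):
    """Expresses how a metric can be monitored and validated."""


@dataclass
class Rule(Block):
    # SLALOM addition (consequenceOfViolation); free text is an assumption.
    consequence_of_violation: Optional[str] = None


@dataclass
class Dependency:
    # SLALOM addition: one metric depending on another (field names are hypothetical).
    metric_id: str
    depends_on_metric_id: str
    notes: str = ""


@dataclass
class Metric(Block):
    # SLALOM addition (gradeOfImportance); an integer scale is an assumption.
    grade_of_importance: Optional[int] = None
    parameters: List[Parameter] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)
    dependencies: List[Dependency] = field(default_factory=list)


# Hypothetical example: response time depending on bandwidth.
response_time = Metric(id="m1", name="response time",
                       definition="time to answer a request", unit="ms",
                       grade_of_importance=1)
response_time.dependencies.append(Dependency("m1", "m2", notes="bandwidth"))
```

Keeping the shared components in one base class mirrors the statement that the same components are used for all building blocks, while the SLALOM additions appear only on the blocks they were proposed for.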

Abstract metric function / definition primary goal

• Have a standard that forces ambiguities to be clarified
• Help in the measurement/auditing process of an SLA
  – Especially by 3rd party providers
  – What is the purpose of having SLAs if one is not able to measure them non-repudiably?
• Abstract the process

Example of ambiguity in the measurement process: AWS EC2 SLA

• ““Unavailable” and “Unavailability” mean:
  – For Amazon EC2, when all of your running instances have no external connectivity.”
• Determination of external connectivity. How?
  – E.g. pinging (ICMP)?
    • Security threat
  – Application layer (endpoint checking)?
    • Includes application downtime (not the responsibility of AWS EC2)

SLALOM 3-Layer Definition

Metric (Ratio) Layer
  – Final ratio calculation of the defined metric
  – Correlates individual periods to the overall metric

Period (Time) Layer
  – Defines the size of the period and the limits on the base period and on the error rate
  – Aggregates samples at a time interval granularity and decides on the period at this level

Sample (Measurement) Layer
  – Boolean expression based on concrete measurement constraints
  – Dictates the success or failure of a sample

Sample Layer

• Sample Condition - sc: the condition stating whether a sample has been successful.
  – operator: the operator can either be a boolean one (i.e. AND, OR, NOT) or a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual value of the condition, which can be arithmetic, non-arithmetic (e.g. a string such as “exception”) or an enumeration (e.g. HTTP response code == 200).
  – unit: the unit for the value of the condition.
• Sample - s: the sample used to evaluate a parameter against the condition sc.
• Successful Sample - ss: the sample satisfying the condition sc.
• Unsuccessful Sample - us: the sample not satisfying the condition sc.
• Type of Operation field: the defined sampling process, e.g. a protocol-response pair

Sample definition
For a given type of operation as specified in the corresponding field (described previously):
sc = operator + value + unit
ss = s if (sc is true)
us = s if (sc is false)
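
As an illustration of the sample definition, the sketch below evaluates a sample condition built from an operator, a value and a unit, and splits samples into ss and us. It is a minimal Python sketch rather than the SLALOM or 3ALib implementation: the names `SampleCondition`, `holds` and `classify` are hypothetical, and only the comparison operators are covered (the boolean AND, OR, NOT combinations are left out).

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Comparison operators named in the sample layer definition.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


@dataclass(frozen=True)
class SampleCondition:
    """sc = operator + value + unit (the unit is carried for documentation only)."""
    op: str
    value: Any
    unit: str = ""

    def holds(self, measured: Any) -> bool:
        return COMPARATORS[self.op](measured, self.value)


def classify(samples: Sequence[Any], sc: SampleCondition) -> Tuple[List[Any], List[Any]]:
    """Split samples into successful (ss) and unsuccessful (us) ones."""
    ss = [s for s in samples if sc.holds(s)]
    us = [s for s in samples if not sc.holds(s)]
    return ss, us


# Example from the slide: an enumeration-style condition on the HTTP response code.
sc = SampleCondition("==", 200, "HTTP response code")
ss, us = classify([200, 200, 503, 200, 500], sc)
print(len(ss), len(us))  # 3 successful samples, 2 unsuccessful ones
```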

Period Layer

• Boundary Period - bp: the period for which the analysis of a parameter (through samples) should be taken into account. Any sample that does not meet this criterion (falling within the period) is excluded, even if it is successful (i.e. ss according to the sample definition).
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
  – unit: the unit in this case is always a time unit (e.g. seconds, minutes, etc.).
• Error Condition - ec: the error condition ratio for which the analysis of a parameter (through samples) should be taken into account. The ratio is always expressed in a percentage (%) format.
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
• Error Ratio - er: the error ratio calculated from the total set of samples and the unsuccessful samples.
• Period - p: the period in which samples (ss and us) are examined according to the boundary period and the error condition.
• Valid Period - vp: the period for which the error ratio value meets the error condition ratio and the boundary period condition is also satisfied.
• Non-valid Period - np: the period for which the error ratio value does not meet the error condition ratio (the boundary period condition is satisfied).

Boundary period and error definitions
bp = operator + value + unit
ec = operator + value + %
er = Σus / Σs, ∀ us ∈ p
vp = p if ((er <= ec) && (p >= bp))
np = p if ((er > ec) && (p >= bp))
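
The period layer can be sketched in the same spirit. The Python below is an illustrative reading of the definitions, not the project's implementation: the function names are hypothetical, durations are assumed to be in seconds, the error ratio is expressed as a percentage, and a period whose error ratio equals the threshold is assumed to count as valid.

```python
from typing import Optional, Sequence


def error_ratio(sample_ok: Sequence[bool]) -> float:
    """er = unsuccessful samples / all samples in the period, as a percentage."""
    if not sample_ok:
        raise ValueError("a period needs at least one sample")
    unsuccessful = sum(1 for ok in sample_ok if not ok)
    return 100.0 * unsuccessful / len(sample_ok)


def classify_period(sample_ok: Sequence[bool], period_seconds: float,
                    bp_seconds: float, ec_percent: float) -> Optional[str]:
    """Return "vp", "np", or None when the boundary period condition fails."""
    if period_seconds < bp_seconds:
        return None  # p >= bp is not satisfied, so the period is not considered
    er = error_ratio(sample_ok)
    # Assumption: er equal to the threshold still counts as a valid period.
    return "vp" if er <= ec_percent else "np"


# Ten samples over a 300-second period, checked against a 10% error threshold.
print(classify_period([True] * 9 + [False], 300, 300, 10.0))      # vp
print(classify_period([True] * 8 + [False] * 2, 300, 300, 10.0))  # np
```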

Metric Layer

• Metric Condition - mc: the condition regarding a specific metric. The condition is always expressed in a percentage (Ratio Type?) (%) format to enable its evaluation as proposed through the metric evaluation.
  – operator: a comparison operator (<, >, <=, >=, ==, !=).
  – value: the actual arithmetic value of the condition.
• Metric Evaluation - me: the evaluation of the metric based on the valid and non-valid periods. The evaluation should be smaller than the condition (i.e. me < mc).

Abstract metric definition
mc = operator + value + %
me = Σnp / (Σvp + Σnp)
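
A small worked sketch of the metric layer follows. It assumes that all periods have equal length, so that Σnp and Σvp can be taken as simple counts, and it uses a 0.05% threshold purely as an illustrative value for mc; the function names are hypothetical.

```python
from typing import Iterable


def metric_evaluation(period_labels: Iterable[str]) -> float:
    """me = Σnp / (Σvp + Σnp), as a percentage, counting equal-length periods."""
    labels = list(period_labels)
    valid = labels.count("vp")
    non_valid = labels.count("np")
    if valid + non_valid == 0:
        raise ValueError("no valid or non-valid periods to evaluate")
    return 100.0 * non_valid / (valid + non_valid)


def meets_condition(me: float, mc_value: float) -> bool:
    """The evaluation should be smaller than the condition (me < mc)."""
    return me < mc_value


# A 30-day month of one-minute periods, 20 of them non-valid.
labels = ["vp"] * 43180 + ["np"] * 20
me = metric_evaluation(labels)
print(round(me, 4), meets_condition(me, 0.05))  # 0.0463 True
```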

Mapping of EC2 SLA @ SLALOM

Amazon EC2

Level / definition | Expression | Notes
Sample definition | sc: UNDEFINED (assumed ‘ping’ -> ICMP) | The sampling condition is not defined in the Amazon EC2 SLA. The concrete wording is “when all of your running instances have no external connectivity”. Nonetheless, the way to specify / measure “external connectivity” is not defined. For example, a customer could use a ping operation or a custom monitoring mechanism.
Sample definition | Type of operation: ping | It is not defined how the condition of connectivity can actually be measured (e.g. the ping operation mentioned previously).
Boundary period and error definitions | bp > 60 sec | The exact wording is “the percentage of minutes”, thus the period is 60 seconds.
Boundary period and error definitions | ec = 100% | Error condition reflecting that the resource must be continuously “unavailable” for the entire bp.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
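
To make the 99.95 % figure concrete, the arithmetic below converts it into a monthly budget of unavailable one-minute periods. This is a sketch only: it assumes a 30-day month and that availability is computed as the share of minutes that were not unavailable; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(days_in_month: int, availability_percent: float) -> float:
    """Minutes that may be unavailable before the threshold is crossed."""
    total = days_in_month * MINUTES_PER_DAY
    return total * (100.0 - availability_percent) / 100.0


def monthly_availability(days_in_month: int, unavailable_minutes: int) -> float:
    """Availability as the percentage of minutes that were not unavailable."""
    total = days_in_month * MINUTES_PER_DAY
    return 100.0 * (total - unavailable_minutes) / total


# A 30-day month has 43200 minutes, so 0.05% of it is roughly 21.6 minutes.
budget = allowed_downtime_minutes(30, 99.95)
print(round(budget, 1))
print(monthly_availability(30, 21) < 99.95)  # False: still above the threshold
print(monthly_availability(30, 22) < 99.95)  # True: the condition is met
```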

Mapping of SLALOM @ GAE Datastore

Google AppEngine Datastore

Level / definition | Expression | Notes
Sample definition | sc: INTERNAL_ERROR | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “INTERNAL_ERROR, TIMEOUT, …” for API calls.
Sample definition | Type of operation: API calls | Several types of operations are defined. An example is provided here.
Boundary period and error definitions | bp > 300 sec | The exact wording is “five consecutive minutes”.
Boundary period and error definitions | ec > 10% | Error condition reflecting that the error ratio is (exact wording) “ten percent Error Rate”.
Abstract metric definition | availability < 99.95 % | Availability metric definition given the boundary period and error condition.
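
One possible reading of the boundary period and error condition in this mapping is sketched below: per-minute error rates are scanned, and only runs of at least five consecutive minutes above a ten percent error rate are counted. This interpretation, the per-minute granularity and the function name are assumptions for illustration, not the wording of the provider's SLA.

```python
from typing import Sequence


def downtime_minutes(error_rates: Sequence[float],
                     ec_percent: float = 10.0,
                     bp_minutes: int = 5) -> int:
    """Count minutes in runs of >= bp_minutes consecutive minutes with er > ec."""
    total = 0
    run = 0
    for er in error_rates:
        if er > ec_percent:
            run += 1
        else:
            if run >= bp_minutes:
                total += run
            run = 0
    if run >= bp_minutes:  # a qualifying run may end with the data
        total += run
    return total


# One run of five bad minutes counts; the later run of two does not.
rates = [0, 20, 20, 20, 20, 20, 0, 50, 50, 0]
print(downtime_minutes(rates))  # 5
```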

Mapping of SLALOM @ Microsoft Azure SLA

Microsoft Azure Storage

Level / definition | Expression | Notes
Sample definition | sc = 60 sec | Several sampling conditions are defined per type of operation. For example, it is specified (exact wording) “Sixty (60) seconds” for PutBlockList and GetBlockList.
Sample definition | Type of operation: PutBlockList and GetBlockList | Several types of operations are defined. An example is provided here.
Boundary period and error definitions | bp > 3600 sec | The exact wording is “given one-hour interval”.
Boundary period and error definitions | ec > 0% | Error condition reflecting that all periods should be taken into account for the availability metric evaluation (exact wording) “is the sum of Error Rates for each hour”.
Abstract metric definition | availability < 99.9 % | Availability metric definition given the boundary period and error condition.

Preconditions

• For any SLA to apply, a number of preconditions typically exist per provider
• Examples
  – Deployment: Number of Availability Zones used
  – Deployment: Replication options used
  – Usage/Measurement: Restarting of resources when unavailable
  – Usage/Measurement: Applied throttling of requests

Snapshot of SLA contributions

• An SLA specification / reference model (captured in SLALOM SLA Specification and Reference Model v1: http://slalom-project.eu/content/d32-%E2%80%93-sla-specification-and-reference-model)
  – Follows and proposes changes to ISO 19086-2
  – Follows ISO 3534-2
• An abstract metric function / definition that can be exploited to specify any SLA metric (captured in SLALOM SLA Metric Specification v1: http://slalom-project.eu/content/slalom-sla-specification-v1-sep-2015)
  – Submitted as contribution to ISO 19086-2

Feedback

• Please provide us feedback on the presented concepts at:

• https://docs.google.com/forms/d/1Ljnc2x2WSaAXrWzHglDyy31xdCNC3qDiQqhC1FoxDZI/viewform

Any questions?

www.slalom-project.eu

3ALib (Availability SLA Benchmark and Auditor)

• Abstracted Availability Auditor Library (Java-based) implementation:
  – Based on the conceptual abstractions of different providers' SLAs
  – Abstracted at the code level, for efficient replacement of processes and inclusion of providers
• Purpose:
  – Align monitoring with specific provider definitions and adapt availability calculations
  – Check preconditions of SLA applicability for a specific deployment and give feedback
  – Adapt to dynamic Cloud Services user behavior

Details for GAE Datastore SLA

Details for Azure SLA

Transaction Type | Maximum Processing Time*
PutBlob and GetBlob (includes blocks and pages); Get Valid Page Blob Ranges | Two (2) seconds multiplied by the number of MBs transferred in the course of processing the request
Copy Blob | Ninety (90) seconds (where the source and destination blobs are within the same storage account)
PutBlockList; GetBlockList | Sixty (60) seconds
Table Query; List Operations | Ten (10) seconds (to complete processing or return a continuation)
Batch Table Operations | Thirty (30) seconds
All Single Entity Table Operations; All other Blob and Message Operations | Two (2) seconds
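
The table above can be read as a per-operation sample condition. The sketch below encodes it as a lookup in Python; the string keys used for the transaction types and both function names are hypothetical, and treating a request that takes exactly the maximum time as successful is an assumption.

```python
# Transaction types whose limit scales with the data transferred (2 s per MB).
PER_MB_TYPES = {"PutBlob", "GetBlob", "GetValidPageBlobRanges"}

# Fixed limits in seconds, taken from the table; keys are hypothetical labels.
FIXED_LIMITS = {
    "CopyBlob": 90.0,
    "PutBlockList": 60.0,
    "GetBlockList": 60.0,
    "TableQuery": 10.0,
    "ListOperations": 10.0,
    "BatchTableOperations": 30.0,
}


def max_processing_seconds(transaction_type: str, megabytes: float = 0.0) -> float:
    """Maximum processing time for a transaction type, in seconds."""
    if transaction_type in PER_MB_TYPES:
        return 2.0 * megabytes
    # Single-entity table operations and all other blob and message
    # operations fall back to the two-second limit.
    return FIXED_LIMITS.get(transaction_type, 2.0)


def sample_successful(transaction_type: str, elapsed_seconds: float,
                      megabytes: float = 0.0) -> bool:
    """A sample succeeds if it finished within the maximum processing time."""
    return elapsed_seconds <= max_processing_seconds(transaction_type, megabytes)


print(max_processing_seconds("PutBlob", megabytes=8))  # 16.0
print(sample_successful("PutBlockList", 61.0))         # False
```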

