Complete Dynamic Multi-cloud Application Management

Project no. 644925
Innovation Action
Co-funded by the Horizon 2020 Framework Programme of the European Union

Call identifier: H2020-ICT-2014-1
Topic: ICT-07-2014 – Advanced Cloud Infrastructures and Services
Start date of project: January 1st, 2015 (36 months duration)

Deliverable D6.3
Solutions for Non-functional Aspects of Cloud Computing

Due date: 30/11/2015
Submission date: 19/12/2015
Deliverable leader: UvA
Editors list: M. Zivkovic (UvA), C. Loomis (SixSq)

H2020-ICT-644925 – CYCLONE D6.3: Solutions for Non-functional Aspects of Cloud Computing

CYCLONE_D6.3_non-functional-aspects-v1.0.docx

Dissemination Level

PU: Public
PP: Restricted to other programme participants (including the Commission Services)
RE: Restricted to a group specified by the consortium (including the Commission Services)
CO: Confidential, only for members of the consortium (including the Commission Services)

List of Contributors

Participant | Short Name | Contributor
Interoute S.P.A. | IRT |
SixSq Sàrl | SIXSQ | C. Loomis, K. Skaburskas, S. Tavera, L. Schaub, K. Basbous
QSC AG | QSC |
Technische Universitaet Berlin | TUB |
Fundacio Privada I2CAT, Internet I Innovacio Digital A Catalunya | I2CAT | J. Aznar
Universiteit Van Amsterdam | UVA | M. Zivkovic
Centre National De La Recherche Scientifique | CNRS |

Change history

Version | Date | Partners | Description/Comments
0.1 | 07/12/2016 | SixSq, UvA | Initial copy into Word format
0.2 | 12/12/2016 | SixSq, UvA | First complete version for review.
0.3 | 14/12/2016 | I2CAT | Incorporate comments from J. Aznar.
1.0 | 15/12/2016 | SixSq, UvA | Improved SoTA, final corrections.

Table of Contents

List of Contributors
Change history
List of Figures
List of Tables
Executive Summary
1. Introduction
2. Motivation
  2.1. Use Cases
    2.1.1. Generic Use Case 1: Provisioning (UC1)
    2.1.2. Generic Use Case 2: Redeployment (UC2)
    2.1.3. Generic Use Case 3: Monitoring (UC3)
    2.1.4. Scientific Image Processing (UC7)
    2.1.5. Benchmark Driven Placement (UC8)
  2.2. Requirements
  2.3. State of the Art
    2.3.1. mOSAIC
    2.3.2. Cloud4SOA
    2.3.3. OPTIMIS
    2.3.4. MODAClouds
3. Proposed Solution
  3.1. Anomaly Detection
  3.2. Application Optimization
4. SlipStream Components for Auto-scaling
  4.1. SlipStream Architecture
  4.2. Application Description Repository
  4.3. Deployment Engine
    4.3.1. Placement and Ranking Service
    4.3.2. Provisioning
    4.3.3. Coordination
  4.4. Monitoring/Optimization Infrastructure
  4.5. Service Catalog
    4.5.1. Sources of Information
    4.5.2. Service Catalog Resources
    4.5.3. Filtering/Query Language
  4.6. Accessing SlipStream
  4.7. Auto-scaling of User Applications
    4.7.1. Configuration of Auto-scale Constraints
    4.7.2. Autoscaler Component
    4.7.3. Publishing Component Metrics to Autoscaler
  4.8. Example Auto-scale Application
    4.8.1. Application Configuration and Deployment
    4.8.2. Usage of the Application After Deployment
5. Analytics Engine
  5.1. Theoretical Background
    5.1.1. Optimal Application Selection
  5.2. Optimal Application Substitution and Optimal Redeployment of Applications
    5.2.1. Introduction
    5.2.2. Problem Definition, Model, and Analysis
  5.3. Frequency of Performance Updates from Applications
  5.4. Performance Deviation
    5.4.1. Background
    5.4.2. Problem Description
  5.5. Evaluating Change-point Detection Methods
    5.5.1. The Algorithm of Brodsky and Darkhovsky
    5.5.2. Change in Variance
    5.5.3. Multidimensional Change-point Detection
    5.5.4. General Conclusions
6. SlipStream Comparative Analysis
  6.1. Cloud Management Solutions
    6.1.1. Competitor or companion?
  6.2. Configuration Management Solutions
    6.2.1. Competitor or companion?
  6.3. Tools and Libraries
    6.3.1. Competitor or companion?
  6.4. Container Management Solutions
    6.4.1. Competitor or companion?
  6.5. PaaS
    6.5.1. Competitor or companion?
  6.6. Brokerage Solutions or Services
    6.6.1. Competitor or companion?
7. Conclusions
References
Glossary

List of Figures

Figure 1: Conceptual Architecture of MODAClouds
Figure 2: The QoS-control Architecture and Interaction with SlipStream
Figure 3: Functional Blocks of SlipStream
Figure 4: Deployment States
Figure 5: Scaling States
Figure 6: Autoscaler and Scalable Application
Figure 7: Components of Example Auto-scale Application
Figure 8: Example Scalable Application Definition in SlipStream
Figure 9: Deployment of Example Scalable Application
Figure 10: Application Entry Point
Figure 11: Locust: Defining Work (as 3 Users) to Load the Application
Figure 12: Riemann-dash: High Load on the Web Layer
Figure 13: Average Response Time as a Function of Time in Graphite
Figure 14: Application Substitution
Figure 15: Evaluation Grid for Expected Revenue in Case of Application Substitution
Figure 16: Closed Loop Control Approach
Figure 17: Occurrence of Change of Mean in Time Series
Figure 18: An Example of Brodsky-Darkhovsky Algorithm
Figure 19: Algorithm by Galeano and Peña

List of Tables

Table 1: Derived Requirements
Table 2: SlipStream Application Parameters
Table 3: IaaS vendors (public and private) vs SlipStream/Nuvla
Table 4: Cloud Resources Management vs SlipStream/Nuvla
Table 5: Application management, in the cloud vs SlipStream/Nuvla
Table 6: Application management, in the cloud vs SlipStream/Nuvla
Table 7: Tools and libraries vs SlipStream/Nuvla
Table 8: Container management solutions vs SlipStream/Nuvla
Table 9: PaaS vs SlipStream/Nuvla
Table 10: PaaS vs SlipStream/Nuvla
Table 11: Status of Requirements

Executive Summary

Non-functional aspects, such as cloud application cost and performance, have a strong impact on the management of cloud applications. Both the research community and the main cloud providers have already developed auto-scaling tools and solutions. These tools avoid underutilization and overutilization of the cloud resources while still maintaining a good level of quality for the hosted services. This has two advantages: for the customer, this reduces the total costs, while the cloud provider can serve more customers with the same infrastructure. Cloud providers’ solutions, however, are limited in terms of prices, availability, reliability, and connectivity because they are locked to a single vendor.

Building on the existing work, this document describes the CYCLONE solution for dealing with non-functional and functional aspects of cloud computing to ensure that an application has the optimal resources allocated to it over its full lifecycle.

The solution relies on SlipStream (with extensions) to handle the provisioning and re-provisioning of resources as the operating conditions for a cloud application change over time. All components of SlipStream contribute to the solution: the Workspace for application policy definition, the deployment engine for (re-)allocating resources, the monitoring to provide performance data, and the Service Catalog for finding appropriate cloud resources. SlipStream also provides an example application to demonstrate how auto-scaling of applications can be accomplished.

In a multi-cloud, multi-user, multi-application environment, there cannot be a unique solution that applies in all cases. The specific decisions on what actions to take (scale-up, scale-down, migration, etc.) are defined by an application-specific policy using input from general algorithms for detecting changes in metrics. We, however, do propose general algorithms to determine when changes in non-functional metrics occur and allow the application developer/operator to define the actions to take when those changes are detected.

Advanced implementations of the auto-scaling SlipStream features and the algorithms to detect changes in application metrics exist. All that remains is to bring these two components together to validate the overall solution and to demonstrate the defined use cases.

The document also reviews the state of the art for autoscaling, both in the academic and commercial worlds. The comparison of SlipStream with competing products and services has shown that many of them offer comparable features, but only for subsets of SlipStream’s full feature set. SlipStream is still the only product to offer a comprehensive multi-cloud solution for all aspects of application management (including scaling) and brokering.

1. Introduction

Cloud computing infrastructures are a recognized alternative to buying and maintaining custom IT infrastructures on premise. The primary economic attraction is the ability to pay for the computational resources only when they are needed: a shift from capital expenditures to operational expenditures. The flexibility to change the allocated resources dynamically, known as elasticity, is the basis of the utility computing model.

The dynamic provisioning of virtualized resources offered by cloud computing infrastructures allows applications deployed in a cloud environment to increase and decrease the allocated resources automatically. The main purpose of this capability, known as auto-scaling, is to minimize automatically the resources allocated to an application while satisfying the varying workload and performance constraints. Auto-scaling is particularly important during workload peaks, during which applications may need to scale up to extremely large-scale systems. Non-functional aspects, such as cloud application cost and performance, have a strong impact on the management and auto-scaling of cloud applications.

Both the research community and the main cloud providers have already developed auto-scaling tools and solutions. These tools avoid underutilization and overutilization of the cloud resources while still maintaining a good level of quality for the hosted services. This has two advantages: for the customer, this reduces the total costs, while the cloud provider can serve more customers with the same infrastructure. Cloud providers’ solutions, however, are limited in terms of prices, availability, reliability, and connectivity because they are locked to a single vendor.

These limitations motivate the current research efforts in cloud computing to find efficient ways to exploit multiple cloud infrastructures for the deployment of an application or service. The capability to federate different cloud providers and cloud services becomes particularly useful when smaller companies want to provide unique offers in the competitive cloud market. With a federated or hybrid-cloud system, application operators can make their own trade-offs among cloud service providers with varying capabilities, guarantees, and stability.

The types of application or service operators that can profit from such platforms are varied. Examples include multimedia (audio and video) transcoding, scientific processing, strong encryption/decryption, distributed compilation, translation, and solving NP-hard problems from a model. Many companies already offer such services through the Internet, such as Zencoder [ZEN16] for video file transcoding. These companies have a strategic interest in reducing the costs for instantiating their services while maintaining the satisfaction of their paying customers.

We present here the application management platform for CYCLONE that takes advantage of cloud federation to avoid the limitations of a single cloud provider, while maintaining a good quality of service (QoS) for the customers and minimizing infrastructure costs for the service provider. The solution is decentralized and self-organizing; none of the cloud providers in the federation has a central role in how resources for the service are placed within the cloud providers. The solution is also self-adapting, reducing the human effort required when redeploying and reconfiguring the system in case of a serious failure in one cloud provider.

The document first describes the motivating use cases and then identifies the requirements of the system. It then describes the components of the CYCLONE application management system (centered on SlipStream) that contribute to the auto-scaling solution. This system provides the means, but not the knowledge, to scale a system automatically. The knowledge is embedded in the algorithms used to scale the application; the document describes the current state of the art and proposes the algorithms to be used in CYCLONE for scaling applications based on both functional and non-functional requirements.

The CYCLONE solution is centered on SlipStream. However, there are other platforms and products that provide similar functionality. A comparative analysis between SlipStream and the alternatives justifies the project’s choice of SlipStream, but also indicates potential markets for exploiting SlipStream and the other CYCLONE components after the project.

The document concludes with a summary of the solution, the main points of the comparative analysis, and points for future work.

2. Motivation

2.1. Use Cases

To motivate the CYCLONE auto-scaling framework, several use cases have been identified. The first three highlight the general functionality (independent of the selected application) required of the system. They rely upon a couple of statistical properties of the non-functional parameters (e.g. costs, response time distributions, etc.) that describe the application and cloud infrastructure performance. The last two use cases are real-world use cases that further illustrate the requirements of the system and demonstrate the utility of the solution.

2.1.1. Generic Use Case 1: Provisioning (UC1)

Through the SlipStream App Store, users can find a list of available applications and then filter them based on their requirements. In this scenario, three functionally equivalent applications are found: A1, A2, and A3. With the SlipStream Service Catalog, users can find all the functional and non-functional characteristics of each cloud infrastructure. In this scenario, two possible cloud infrastructures are identified: C1 and C2.

The user then specifies the application selection and placement criteria (or “policy”), choosing, for example, the cheapest and the fastest application. Within the policy, the user also specifies the relative importance of each criterion: for example, the cost is ten times more significant than the application response time. The user could also specify that there is a maximum budget and/or a deadline.

Based on this input, SlipStream ranks the applications and cloud infrastructures, presenting the user with a ranked list of choices. The user can choose any option (e.g. A2 on C1) and then tell SlipStream to deploy the application.
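The ranking step in UC1 can be illustrated with a minimal sketch. It assumes a simple weighted sum of normalized cost and response time, with the budget and deadline applied as hard filters; this scoring rule, the `Option` type, the `rank` function, and all numbers are hypothetical and are not taken from SlipStream itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One application/cloud pair with made-up non-functional figures."""
    app: str
    cloud: str
    cost: float           # hypothetical cost per run, arbitrary units
    response_time: float  # hypothetical mean response time in seconds


def rank(options, cost_weight=10.0, time_weight=1.0,
         max_budget=None, deadline=None):
    """Return the feasible options sorted best first (lowest score first)."""
    # Hard constraints: drop options over budget or slower than the deadline.
    feasible = [
        o for o in options
        if (max_budget is None or o.cost <= max_budget)
        and (deadline is None or o.response_time <= deadline)
    ]
    if not feasible:
        return []
    # Normalize each criterion by its maximum so the weights are comparable.
    max_cost = max(o.cost for o in feasible) or 1.0
    max_time = max(o.response_time for o in feasible) or 1.0

    def score(o):
        return (cost_weight * o.cost / max_cost
                + time_weight * o.response_time / max_time)

    return sorted(feasible, key=score)


# Illustrative data only: three applications on two clouds.
options = [
    Option("A1", "C1", cost=4.0, response_time=2.0),
    Option("A1", "C2", cost=5.0, response_time=1.0),
    Option("A2", "C1", cost=2.0, response_time=3.0),
    Option("A2", "C2", cost=3.0, response_time=2.5),
    Option("A3", "C1", cost=6.0, response_time=0.5),
    Option("A3", "C2", cost=7.0, response_time=0.4),
]
ranking = rank(options, max_budget=6.0)
```

With cost weighted ten times more heavily than response time, the cheapest pair leads the list even though it is the slowest; changing the weights in the policy changes the order without touching the data.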

2.1.2. Generic Use Case 2: Redeployment (UC2)

Once the optimal deployment solution has been found, the selected application is deployed on the chosen cloud infrastructure(s). Starting virtual machines on cloud infrastructures is not instantaneous; it can take tens of seconds to several minutes, depending on the cloud service provider. The startup latency must be benchmarked to understand when the deployment process has encountered an error or performance degradation on the chosen cloud infrastructure(s). Performance issues or problematic response times can also arise for running applications. In each case, it may be beneficial for the user (with respect to costs and completion time) to interrupt the current deployment process and then re-calculate the placement policy without the problematic cloud infrastructure. The user can then choose a new optimal application/cloud pair from the ranking.
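One way to turn benchmarked startup latencies into a decision is sketched below. It assumes a deployment is suspect once it has run longer than a high percentile of past startup times multiplied by a safety margin; the rule, the function names, and the default values are illustrative assumptions, not the criterion used by CYCLONE.

```python
import statistics


def startup_timeout(benchmark_seconds, quantile=0.95, margin=1.5):
    """Derive a timeout (seconds) from benchmarked startup latencies.

    Hypothetical rule: safety margin times the chosen percentile.
    """
    if len(benchmark_seconds) < 2:
        raise ValueError("need at least two benchmark samples")
    # With n=100, quantiles() returns the 1st to 99th percentile cut points.
    percentiles = statistics.quantiles(
        benchmark_seconds, n=100, method="inclusive"
    )
    index = min(max(int(round(quantile * 100)) - 1, 0), 98)
    return margin * percentiles[index]


def should_redeploy(elapsed_seconds, benchmark_seconds, **kwargs):
    """True when the running deployment has exceeded the derived timeout."""
    return elapsed_seconds > startup_timeout(benchmark_seconds, **kwargs)


# Made-up startup times (seconds) for one cloud infrastructure.
benchmark = [40, 45, 50, 55, 60, 65, 70, 75, 80, 90]
```

A deployment that is still starting after `startup_timeout(benchmark)` seconds would be interrupted and the placement re-calculated without that cloud, as described above.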

Despite the presumption that redeployment can be beneficial, redeployment within a federated or hybrid cloud infrastructure is something that is not yet well-understood and certainly not automated. The conditions under which redeployment is beneficial must be identified before this can be rolled into an algorithm for application management.

2.1.3. Generic Use Case 3: Monitoring (UC3)

Performance volatility on cloud infrastructures is common and such volatility can lead to changes in application performance that need to be addressed. Typically, the performance deteriorates abruptly and the response time for an application increases significantly.

Monitoring is essential for detecting such situations and for the overall performance management of cloud applications. When the monitoring determines that the performance has changed significantly, the CYCLONE platform may request scaling up the application’s resources or redeployment of application components elsewhere. Volatility can also lead to improved performance. Monitoring can also detect these situations and trigger a process to scale down the application.
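The monitoring decision can be sketched as a comparison between a recent window of response times and the window just before it. This is a deliberately simple stand-in for illustration, not one of the change-point detection methods evaluated later in the document; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def detect_shift(response_times, window=10, threshold=3.0):
    """Return 'scale-up', 'scale-down', or None for a series of response times.

    Compares the mean of the last `window` samples against the mean of the
    preceding `window` samples, in units of the baseline standard error.
    """
    if len(response_times) < 2 * window:
        return None  # not enough data to compare two windows
    baseline = response_times[-2 * window:-window]
    recent = response_times[-window:]
    spread = stdev(baseline) or 1e-9  # avoid division by zero
    z = (mean(recent) - mean(baseline)) / (spread / window ** 0.5)
    if z > threshold:
        return "scale-up"    # response time rose: performance deteriorated
    if z < -threshold:
        return "scale-down"  # response time fell: performance improved
    return None


# Made-up response times in milliseconds.
steady = [100, 102, 98, 101, 99] * 4
degraded = steady[:10] + [3 * v for v in steady[:10]]
improved = steady[:10] + [v / 2 for v in steady[:10]]
```

The returned label is only a trigger; which action actually follows (scaling or redeployment elsewhere) remains a matter for the application-specific policy.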

2.1.4. Scientific Image Processing (UC7)

Publicly funded research creates an immense amount of data that has general social, academic, and commercial value. The taxpayers, through funding agencies, increasingly expect these data to be available through “open data” programs. Despite these expectations, finding viable business models that keep the maintenance costs for the public reasonable has been an obstacle to widespread availability of open data. SixSq is exploring solutions with the European Space Agency (ESA) in which public data is hosted on European cloud infrastructures and partially or fully monetized to reduce the need for public subsidies.

Technically, a viable solution requires:

• Detailed knowledge of the storage locations of dataset components,
• Means of placing analysis applications near the data of interest, and
• Ranking of multiple providers based on price or other characteristics.

All but the first are already features of the CYCLONE brokering and matchmaking components. Integrated data management is a crucial feature for this use case that is planned for SlipStream in the last year of the project. Advanced networking may also play a role here, if significant bandwidth is required for remote access to datasets.

More information about this use case can be obtained from the CYCLONE use case portal [UC7].

2.1.5. Benchmark Driven Placement (UC8)

A significant part of the design discussion for the CYCLONE brokering and matchmaking components dealt with benchmarks, both general and application-specific benchmarks. At the UCC 2015 conference [UCC15], a bioinformatics group from Cardiff presented custom tooling that collects benchmarks for common bioinformatics tools on many community and public clouds and that uses this information to optimize the placement of their virtual machines in the future [CON15]. The benchmarks are collected continuously from active scientific analyses and are augmented with information from directed probes of each infrastructure.

Demonstrating their workflow with the CYCLONE tools would be an interesting, direct validation of the project’s brokering and matchmaking design. Similar research groups would benefit by being able to optimize the placement of their scientific analysis applications without having to invest effort in building up and maintaining their own customized management tools.

Although the project will probably not work directly with this group, an example application will be created to demonstrate how benchmark driven placement can be achieved with the CYCLONE tools. More information about this use case can be obtained from the CYCLONE use case portal [UC8].

2.2. Requirements

Based on the above use cases, a set of requirements for the auto-scaling features was derived. Many of the requirements duplicate those that were previously identified for the matchmaking and brokering features described in earlier deliverables. Overall, these requirements call for minor adjustments to existing features of SlipStream rather than significant additional development.

Table 1: Derived Requirements

No.  Requirement
1    Triggering of scaling actions of an application based on application metrics using simple, predefined algorithms (e.g. adding a node based on machine load).
2    Triggering of scaling actions of an application based on application metrics defined by the developer of the application.
3    Ability to publish application-specific benchmarks of cloud providers into the Service Catalog or Open Service Compendium.
4    Placement based on static characteristics (e.g. geographical location) of a cloud service provider.
5    Placement based on dynamic VM monitoring information from SlipStream itself.
6    Placement based on external information pushed into the SlipStream Service Catalog or Open Service Compendium.
7    Placement based on the join of all information associated with a given cloud service provider.
8    Ranking of selected cloud service providers based on predefined algorithms (e.g. price).
9    Ranking based on algorithms provided by the application developer and/or the application operator.
10   Ability to trigger notifications/alerts through SlipStream.
11   Ability to trigger scaling actions from within the application.
12   Ability to search the Service Catalog and Open Service Compendium manually to see the results from various policies and, ideally, to then associate those policies with applications.

2.3. State of the Art

Cloud-based application cases have been reviewed, with attention to unresolved issues related to cloud platforms [DIA16]. Concepts and taxonomy related to auto-scaling schemes have been surveyed [ALI14]. A technical review of auto-scaling for elastic cloud-based applications has been provided; the Gartner group describes auto-scaling as an automatic expansion or contraction of system capacity and indicates that such a capability is a commonly desired feature in cloud infrastructure as a service and platform as a service offerings [LOR14]. In other words, auto-scaling refers to the capability of a cloud computing environment to utilize virtualized computing resources automatically. In this scheme, virtualized resources can be increased or decreased dynamically by adapting resource utilization to satisfy the given requirements. Auto-scaling contributes to cost control. The key features of auto-scaling are the ability to scale out, i.e., the automatic addition of resources during increased demand, and to scale in, i.e., the automatic termination of unused resources when demand decreases. Scale-out and scale-in schemes are referred to as horizontal scaling. Unlike horizontal scaling, vertical scaling increases computation resources in existing nodes. Auto-scaling at the service level is important because services run on a set of connected virtual machines. The optimal model-driven configuration of cloud auto-scaling infrastructure has been studied [DOU12]. An open-source cloud environment with auto-scaling has been implemented to provide access to resources for a flexible period with varying requirements in bioinformatics and biomedical workflows [KRI17], and an auto-scaling model has been implemented in simulation experiments using the Amazon Elastic Compute Cloud (EC2) to reduce resource costs and to test the quality of service in terms of response time and availability [QU16].

The key features of auto-scaling are:

• The ability to scale out (i.e., the automatic addition of extra resources during increased demand) and scale in (i.e., the automatic termination of extra unused resources when demand decreases, to minimize cost).
• The capability of setting rules for scaling out and in.
• The facility to automatically detect and replace unhealthy or unreachable instances.

Auto-scaling is often referred to in the context of resource provisioning, scalability, and elasticity. These terms are often used interchangeably, but they are slightly different concepts.
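The rule-setting capability listed above can be illustrated with a minimal Python sketch of a threshold rule for horizontal scaling. The class name, thresholds, and node limits are hypothetical values chosen for illustration; they are not taken from SlipStream or from any of the surveyed systems.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical threshold rule for horizontal scaling (illustration only)."""
    scale_out_above: float  # average load above which one node is added
    scale_in_below: float   # average load below which one node is removed
    min_nodes: int = 1
    max_nodes: int = 10


def decide_node_count(rule: ScalingRule, load: float, nodes: int) -> int:
    """Return the node count the rule asks for, given the current load."""
    if load > rule.scale_out_above and nodes < rule.max_nodes:
        return nodes + 1  # scale out: demand has increased
    if load < rule.scale_in_below and nodes > rule.min_nodes:
        return nodes - 1  # scale in: resources are unused
    return nodes          # within bounds: no action
```

Keeping the two thresholds apart leaves a band in which no action is taken, which avoids repeatedly adding and removing the same node when the load hovers around a single limit.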

Some challenges for auto-scaling are:

• Insufficient tools for monitoring and aggregating metrics at the platform level and service level to support auto-scaling decisions.
• Auto-scaling in hybrid cloud environments is not well supported. In a hybrid scenario, each cloud may offer different auto-scaling techniques that are not compatible with each other, so there would be an interoperability issue when auto-scaling resources across the two clouds.
• The efficiency of auto-scaling in terms of the reliability of the auto-scaling process is not well managed. Failure of the auto-scaling process can result in violations of the system’s QoS requirements for performance and scalability and can even incur unnecessary cost.
• The relationship between auto-scaling and quality attributes such as availability, reliability, and security is unknown.

2.3.1. mOSAIC

mOSAIC (http://www.mosaic-cloud.eu/) was an FP7 research project that looked at cloud application management, from cloud brokering and interoperability to deployment and execution. The main concepts are:

Component: Represents the basic building block of a cloud application, the atomic deployment and execution unit, which is materialized as one OS process, or a set of tightly coupled OS processes, running in an isolated environment. There are many types of components, each type mapping to an application tier, but they are treated the same by the platform. In general, they fit in one of the following component categories:

1. The "user" component, which embodies the code developed by the user and implements the needed logic;
2. Resource or middleware components, which provide generic services like data storage (MySQL), message brokering (RabbitMQ), etc.;
3. Specialized components, which are of particular use in the mOSAIC platform or in a cloud environment, like the HTTP Gateway serving as a load balancer.

Controller: The orchestration service that initiates the deployment and controls the execution of the components.

Hub: A bus-like or RPC-like system that allows components to discover each other or exchange configuration messages.

This project built an open-source prototype, with web applications as the main application domain. However, the scalability of the solution is manual, and there is no monitoring. The resource providers are Amazon EC2 and Eucalyptus.

Within mOSAIC, any component is able to listen on ports, provided that it requests access beforehand, or to receive inbound requests from the Internet. The resources allocated to a particular component are configured by the operator. The main limitation is that a component must run on Linux and cannot require root access.

This project differs from the CYCLONE platform in important ways: it does not provide any monitoring information, nor does it handle the runtime QoS of the deployed applications.

2.3.2. Cloud4SOA

The Cloud4SOA (http://www.cloud4soa.com/) project provides an open semantic interoperable framework for developers and providers, capitalizing on Service Oriented Architecture (SOA), lightweight semantics, and user-centric design and development principles. The system supports Cloud-based application developers with multiplatform matchmaking, management, monitoring, and migration by interconnecting heterogeneous offerings across different providers that share the same technology.

Cloud4SOA provides four core capabilities implemented by the reference architecture:

Matchmaking. The matchmaking component allows searching among the existing offerings for those that best match the developer’s needs. To this end, the matchmaking algorithm capitalizes on the Semantic layer and, especially, on the PaaS and Application models, while separating the user’s needs into application requirements and user preferences. The degree of relation is computed based on the similarity of the semantic descriptions of the offerings and the application profile, also taking into account the target user’s preferences. The outcome of the matchmaking algorithm is a list of PaaS offerings that satisfy the user’s needs, ranked according to the number of satisfied user preferences.

Management. The module performs an analysis of the application requirements to build a specific application deployment descriptor. It then checks if a valid SLA contract has been previously agreed between the specific offering and the application, and finally initiates the deployment process using the Cloud4SOA standard API exposed by every Cloud4SOA platform adapter.

Monitoring. To account for the heterogeneity of different Cloud architectures, Cloud4SOA provides monitoring functionality based on unified, platform-independent metrics, such as latency and application status, to allow application developers to proactively monitor the performance of applications hosted on multiple Cloud environments.

Migration. The Cloud4SOA framework aims to support seamless migration and to tackle semantic interoperability conflicts. Moving an application between offerings consists of two main steps: i) moving the application data and ii) moving the application itself. During the first step, all the application data is retrieved from the offering where the application is running and moved to the new one. To avoid inconsistent states, the application is stopped before the data is moved. Once the data has been initialized at the new provider, the application is deployed there as well.

2.3.3. OPTIMIS

OPTIMIS aims at optimizing IaaS cloud services by producing an architectural framework and a development toolkit. The optimization covers the full cloud service lifecycle (service construction, cloud deployment, and operation). OPTIMIS gives service providers the capability to easily orchestrate cloud services from scratch, run legacy applications on the cloud, and make intelligent deployment decisions based on their preferences regarding trust, risk, eco-efficiency, and cost (TREC). It supports end-to-end security and compliance with data protection and green legislation. It also gives service providers the choice of developing once and deploying services across all types of cloud environments: private, hybrid, federated, or multi-cloud.

OPTIMIS simplifies the management of infrastructures by automating most processes while retaining control over the decision-making. The TREC solution involves choosing an optimal target platform based on trust, risk, eco-efficiency, and cost data. This dynamic data, which is extracted from target platforms, is used for the benefit of service providers and end users and can be ‘weighted’ to fit their needs, ensuring an automated runtime solution. It can be used at runtime to dynamically manage the optimal platform for a service run, providing cross-platform scalability. For platforms with different hypervisors and cloud software, the OPTIMIS tools provide a common approach and methodology to achieve this.

2.3.4. MODAClouds

The MODAClouds project (www.modaclouds.eu) leverages the results of the OPTIMIS and Cloud4SOA projects to provide a run-time environment whose conceptual architecture is given in Figure 1. The project uses the MAPE-K autonomic loop (Monitor, Analyze, Plan, Execute, Knowledge), which represents a blueprint for the design of autonomic systems in which a managed element is coordinated by a loop structured in four phases and a common knowledge base.

Figure 1: Conceptual Architecture of MODAClouds

In general, MODAClouds specifies:

1. A monitoring platform to characterize the state of applications developed and deployed using MODAClouds.
2. Self-adaptive policies to manage application QoS at runtime. These policies rely on models, shipped with the runtime environment, to perform predictions of application performance and scalability as well as to track or estimate the application’s current status and resource demands.
3. An execution platform for managing application deployment, configuration, and run-time execution. This platform will be utilized by the self-adaptive policies to manage application QoS.
4. Data synchronization and load balancing mechanisms to support the execution of an application that is replicated over multiple clouds to ensure high availability.
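The MAPE-K loop mentioned for MODAClouds can be pictured as a single pass over the four phases with a shared knowledge store. The following Python sketch is a generic illustration only, not MODAClouds code; the function name, the knowledge keys, and the "scale_out" action are hypothetical.

```python
def mape_k_step(read_metric, knowledge, execute):
    """Run one Monitor-Analyze-Plan-Execute pass over shared knowledge.

    read_metric: callable returning the latest observation (e.g. response time)
    knowledge:   dict shared between phases (hypothetical keys, see below)
    execute:     callable applying a planned action to the managed element
    """
    # Monitor: collect an observation and record it in the knowledge store.
    value = read_metric()
    knowledge["history"].append(value)

    # Analyze: compare the observation with the QoS target.
    violated = value > knowledge["max_response_time"]

    # Plan: choose an action; here a single hypothetical action is planned.
    plan = "scale_out" if violated else None

    # Execute: apply the plan to the managed element, if there is one.
    if plan is not None:
        execute(plan)
    return plan
```

In a real system the Analyze and Plan phases would be far richer (prediction models, policies); the sketch only shows how the phases share state through the knowledge element.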

There are, however, some important differences between CYCLONE and MODAClouds. CYCLONE applies optimal substitution models that can be used for the re-deployment of applications. This is a unique feature of the CYCLONE platform. MODAClouds investigates in detail models to perform predictions of application performance. In our opinion, these models are complex, and while they could potentially be used, there is always a question about the accuracy of some modelling approaches, in particular the queueing-theory based models. We have therefore chosen to build a library of anomaly-detection based solutions and will investigate the efficiency of these algorithms. In particular, non-parametric models of anomaly detection require no a priori knowledge of the underlying application performance models. This makes them easier to implement and apply within highly variable cloud environments.

We also consider optimization parameters other than TREC.

3. Proposed Solution

The proposed solution for auto-scaling of cloud applications to maintain a given Quality of Service (QoS) is illustrated in Figure 2. There are two core components of the solution: SlipStream and an Analytics Engine. SlipStream handles all aspects related to resource management for the application. The Analytics Engine encapsulates the knowledge necessary to decide what actions are required (if any) given the current state of an application. Both are described in detail in the subsequent chapters.

In the figure, there are three functionally equivalent applications (A, B, C) that are deployed within two different cloud infrastructures (Cloud 1 and Cloud 2). Based on current conditions, application requirements, and user preferences, SlipStream ranks possible placement solutions to present the user with an ordered list of alternatives. In this case, application B deployed within Cloud 1 was the optimal choice.

When the deployment begins, SlipStream starts the resource monitoring. Once the application is operational, it too can provide application metrics, such as performance or response time. These metrics create a time series of data that is streamed to the Analytics Engine for analysis. The Engine currently supports two different mechanisms for data analysis: anomaly detection and application optimization (via resource scaling, component migration, or application substitution).

Figure 2: The QoS-control Architecture and Interaction with SlipStream

[Figure 2 labels: SlipStream; Cloud 1 and Cloud 2; Application A (t, p: $), Application B (t, p: $$$), Application C (t, p: $$$$), where t is time and p is price; Data sources; Connectors; Analytics Engine; Alert and Decision Making; GUI; Actions; Substitute; Monitoring.]

3.1. Anomaly Detection

Anomalies are items, events, or observations that do not conform to an expected pattern or to other items in a dataset. For example:

• Significant deviations in time series metrics,
• Significant changes in message rates,
• Rare or unusual log messages, or
• Unusual user behavior.

If the expected or normal patterns of behavior are known, anomalies can be identified. Given a time series, the Engine models it to produce an expected value that may later be consumed by the anomaly detection methods. These methods include both outlier and change-point detection algorithms.

For outlier detection, the algorithms identify a time-series element whose observed value is significantly different from the value expected from the rest of the time series.
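As an illustration of the outlier idea, the sketch below uses the median of the series as the expected value and the median absolute deviation (MAD) as a non-parametric measure of spread. This is one common technique chosen here for illustration; it is not stated to be the algorithm used by the Analytics Engine, and the function name and the threshold of 3.5 are assumptions.

```python
from statistics import median


def find_outliers(series, threshold=3.5):
    """Return the indices of values that deviate strongly from the median.

    Assumption: the 'expected value' is the median of the whole series,
    and deviations are scaled by the median absolute deviation (MAD).
    """
    expected = median(series)
    mad = median(abs(x - expected) for x in series)
    if mad == 0:
        # Degenerate case: at least half of the values equal the median,
        # so any value that differs from it is flagged.
        return [i for i, x in enumerate(series) if x != expected]
    # The factor 0.6745 makes the score comparable to a z-score when
    # the data are normally distributed.
    return [i for i, x in enumerate(series)
            if 0.6745 * abs(x - expected) / mad > threshold]
```

For example, in a series of response times around 100 ms containing one value of 250 ms, only the index of that value would be returned.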

For change-point detection, the goal is to identify, for a given time series of observations, whether the current point (or one in the near past) represents a change in the series. One wants to detect a potential change as soon as possible. In general, change-point detection methods monitor a test statistic based on the observations and issue an alarm if this statistic exceeds a certain threshold, chosen such that the calculated probability of not detecting a change-point is kept below a certain predefined value, for example, α = 5%. Statistically speaking, the change-point detection methods perform a hypothesis test at every time step. They can be used to detect, for example, a change in the mean or a change in the distribution of the time series.
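The pattern of monitoring a test statistic and raising an alarm when it crosses a threshold can be sketched with a two-sided CUSUM detector for a change in mean. CUSUM is used here only as a familiar example of the approach; the source does not name the algorithms in the Engine's library. The parameter names are assumptions, and the threshold is given directly rather than derived from a target error probability.

```python
def cusum_change_point(series, target_mean, slack, threshold):
    """Return the index at which a two-sided CUSUM raises an alarm, or None.

    target_mean: the mean expected while no change has occurred
    slack:       deviation tolerated per observation before it accumulates
    threshold:   alarm level for the accumulated statistic (assumed given)
    """
    upper = 0.0  # accumulates evidence of an upward shift in the mean
    lower = 0.0  # accumulates evidence of a downward shift in the mean
    for i, x in enumerate(series):
        upper = max(0.0, upper + (x - target_mean - slack))
        lower = max(0.0, lower + (target_mean - x - slack))
        if upper > threshold or lower > threshold:
            return i  # the test statistic exceeded the threshold here
    return None
```

A larger threshold lowers the rate of false alarms at the price of a longer detection delay, which is the trade-off behind the choice of the predefined probability.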

3.2. Application Optimization

The optimization algorithms provide optimal policies that take as input the cost functions, response-time distributions, and (theoretically) reward/penalty functions. These policies specify when to substitute one application for another and which application to substitute. The value to be optimized may change by application and actor; for example, it may be maximizing revenue (for the provider) or minimizing costs (for the application owner).

The optimization algorithms may also be triggered by the anomaly detection, for instance when a performance anomaly has been identified and a new policy must be derived. Once a new policy is available, a decision is made as to whether to apply it. The decision-making process may be completely automated. If changes are necessary, the specified actions are then applied to the deployed application by SlipStream. In Figure 2, a decision is made to replace Application B in Cloud 1 with Application C in Cloud 2.
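To make the inputs of such a policy concrete, the following Python sketch shows a deliberately simple substitution decision from the application owner's point of view. It is an illustrative assumption, not the CYCLONE optimization algorithm: the expected cost is taken as the price plus a penalty weighted by the observed fraction of responses that miss a deadline, and all names (expected_cost, choose_application, switching_cost) are hypothetical.

```python
def expected_cost(price, response_times, deadline, penalty):
    """Price plus the penalty weighted by the observed deadline-miss fraction."""
    misses = sum(1 for t in response_times if t > deadline)
    return price + penalty * misses / len(response_times)


def choose_application(current, candidates, deadline, penalty, switching_cost=0.0):
    """Return the name of the application to run next.

    candidates maps an application name to a dict with a 'price' and a
    sample of observed 'response_times' (a stand-in for the response-time
    distribution). A substitution is made only if the best alternative is
    cheaper than the current application even after paying switching_cost.
    """
    costs = {
        name: expected_cost(c["price"], c["response_times"], deadline, penalty)
        for name, c in candidates.items()
    }
    best = min(costs, key=costs.get)
    if best != current and costs[best] + switching_cost < costs[current]:
        return best
    return current
```

With such a rule, a more expensive but faster application can still be selected when the penalties avoided outweigh the price difference and the cost of switching.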

4. SlipStream Components for Auto-scaling

4.1. SlipStream Architecture

Non-functional requirements strongly influence the choice of cloud service providers (CSPs) for a given application deployment. These requirements include:

• Security guarantees or certifications of CSPs,
• Historical and current operational quality, and
• Application performance or response under varying loads.

The set of non-functional and operational requirements is often formalized as part of Service Level Agreements (SLAs) between the cloud application owners and cloud service providers. SlipStream, the CYCLONE component that deals with cloud application lifecycle management, provides the means to monitor such agreements and to react in case of violations.

The high-level architecture of SlipStream consists of four functional blocks:

• Application Description Repository: Allows application developers to describe the functional and non-functional requirements, software installation/configuration, and parameters for cloud components and multi-component cloud applications.
• Service Catalog: Provides information concerning the administrative, financial, and operational characteristics of CSPs. The information is collected into “offers” that can then be selected based on both functional and non-functional requirements.
• Deployment Engine: Combines application and CSP information to choose and then provision appropriate resources for an application. This engine handles the full lifecycle of the application, including the initial deployment, scaling actions, and termination.
• Monitoring: SlipStream monitors the state of all deployed cloud resources, allowing the deployment engine to control the state of resources, to raise alerts for abnormal conditions, and to provide usage information.

Users can access these functional blocks through a comprehensive API, a web browser interface, and a command line client. The web browser interface is roughly organized around the four functional blocks described above. The API is a REST-based API running over the HTTP(S) protocol. The API is currently migrating from a proprietary REST API [SAPI15] to one based on the CIMI standard from DMTF [CIMI16].

Figure 3 shows the functional blocks of the SlipStream server, access methods, and the underlying CSPs that provide resources for users’ cloud applications.

Figure 3: Functional Blocks of SlipStream

[Figure 3 labels: SlipStream Server containing the Application Description Repository, Service Catalog, Deployment Engine, and Monitoring blocks; REST Application Programming Interface (API); Web Browser User Interface (UI); Command Line Interface (CLI); IaaS Cloud Service Providers (CSPs): AWS, Exoscale, OpenStack, CloudStack.]

4.2. Application Description Repository

Running applications on any cloud system requires management of virtual machine images. In many cloud management systems, users generate these machine image files and then publish them to make them available on their cloud service provider (CSP). The management overhead associated with the transport, conversion, and evolution of these images discourages the use of multiple CSPs.

SlipStream takes a different approach: users specify the resource requirements, placement constraints, and the software installation and configuration procedures. SlipStream then uses this information to transform existing, minimal images optimized for each cloud provider into the customized VM requested by the user. This has two advantages: 1) the image descriptions are portable and can be used for any cloud supported by SlipStream and 2) all knowledge about the application is captured and managed.

On clouds that support customized user images, binary image files can be produced (“built”) by SlipStream to reduce the startup latency. Users must explicitly request the build of these binary image files, but once produced, SlipStream will use them automatically. This maintains cloud portability while allowing users to optimize for particular CSPs.

SlipStream provides a “workspace” in which users manage their application and application component descriptions. These descriptions can be shared with other users. In addition, system administrators can publish vetted applications into an “App Store” to make them visible to all SlipStream users. Applications in the App Store can be found easily and launched with a “single click”.

Within the component definitions, the resource requirements and placement constraints directly affect the CSPs that are chosen. The placement constraints can include security, location, availability, and other non-functional requirements to support the definition and enforcement of SLAs.

4.3. Deployment Engine

The deployment engine orchestrates the entire process of bringing a cloud application into a working state, including choosing appropriate cloud resources and provisioning them. The implementation maintains a finite state machine for each application and manages the transitions between the defined states.

The Placement and Ranking Service (PRS), part of the deployment engine, matches application requirements and constraints to available CSP offers. It first filters the offers to remove those that do not meet the minimum resource requirements and then ranks the acceptable offers by price, from lowest to highest. Automated provisioning chooses the lowest-priced offer; manual provisioning can choose any offer that passed the filtering.

The deployment engine must interact with CSPs to provision the resources required for an application. SlipStream internally uses a uniform, abstract interface to handle interactions with cloud service providers. “Cloud connectors” then implement this interface, allowing SlipStream to seamlessly support a wide range of different cloud service providers.

All applications need to provide information to their users (e.g. the endpoint of the service) and/or take configuration options from the operator (e.g. what database to configure). Moreover, there is often a need to coordinate the configuration of different machines in an application, for example, ensuring that a database client does not start before the database is ready. To meet these needs, SlipStream provides a parameter “database” for each running application instance. All parts of SlipStream, as well as the users and their applications, can access this database to exchange information.

The following sections provide more details on the placement, provisioning, and coordination aspects of the deployment engine. Of the three, the PRS is the most crucial for handling functional and non-functional requirements of cloud applications.

4.3.1. Placement and Ranking Service

The PRS, a micro-service within the SlipStream server, selects acceptable offers from CSPs. It has a simple API, whereby cloud application requirements in JSON format are passed to the PRS via an HTTP PUT and the service responds with a ranked list of clouds, also in JSON format. SlipStream comes with its own implementation of this service, but an alternate implementation can be used by changing the SlipStream configuration.

To perform its function, the PRS interacts with the service catalog, filtering and ranking the offers it finds there. The PRS obtains information about the application and user cloud configuration through the input document. Consequently, it does not have any direct interaction with the application description repository or the other functional blocks of SlipStream.

The first operation is the filtering of service offers. The filtering only accepts offers for clouds that the user has configured and, from those, only selects offers based on either capacity requirements or an exact match of an instance type or “flavor”.

Capacity requirements are expressed in terms of CPU/RAM/Disk at the component level. Only service offers with CPU/RAM/Disk values that meet or exceed the component requirements are selected. As some clouds do not use fixed values for CPU/RAM/Disk, this rule can be overridden with the schema-org:flexible attribute set to true.

{
  "components": [{
    "module": "moduleURI",
    "cpu.nb": 1,
    "ram.GB": 2,
    "disk.GB": 10,
    "placement-policy": "<optional CIMI filter>",
    "connector-instance-types": {
      "connector-name1": "<size1>",
      "connector-name2": "<size2>"
    }
  }],
  "user-connectors": ["connector-name1", "connector-name2"]
}

When specifying an instance type or “flavor” for a component, only offers with the exact instance type are retained. If both the capacity requirements and an instance type are defined, the instance type filtering takes precedence.

A component may also specify a placement policy, such as:

{"placement-policy": "schema-org:location='ch'"}

which would select only those offers that provide resources in Switzerland. Despite the name “placement”, this policy is completely general and can filter on any characteristics of a CSP offer. If the offer provides security certification information or availability, for example, those values can be used to select acceptable services. These policies are applied in addition to (i.e. with a logical “and”) the resource requirements.

For each connector in the filtered list, the ranking process calculates a metric. Currently this metric is the estimated price per hour in euros. The list is enriched with this information and then sorted by increasing price. (Offers without prices are put at the end of the list.) The result is then returned.
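The filtering and ranking steps described above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the PRS implementation: offers are assumed to be dictionaries carrying the same keys as the PRS output (name, price, cpu, ram, disk, instance_type), a missing price is represented by -1, an instance type given for a connector is treated as an exact-match requirement, and the placement-policy filter and the schema-org:flexible override are not evaluated.

```python
def rank_offers(component, user_connectors, offers):
    """Filter offers for one component, then rank them by increasing price."""
    wanted_types = component.get("connector-instance-types", {})

    def acceptable(offer):
        # Only clouds that the user has configured are considered.
        if offer["name"] not in user_connectors:
            return False
        # An instance type, when given, takes precedence over capacity.
        wanted = wanted_types.get(offer["name"])
        if wanted:
            return offer["instance_type"] == wanted
        # Otherwise the offer must meet the CPU/RAM/Disk requirements.
        return (offer["cpu"] >= component["cpu.nb"]
                and offer["ram"] >= component["ram.GB"]
                and offer["disk"] >= component["disk.GB"])

    kept = [o for o in offers if acceptable(o)]
    # Priced offers come first in increasing price order; offers without
    # a price (price < 0) are put at the end of the list.
    kept.sort(key=lambda o: (o["price"] < 0, o["price"]))
    # Record the final ordering in an 'index' attribute, as in the output.
    return [dict(o, index=i) for i, o in enumerate(kept)]
```

Automated provisioning would then take the first element of the returned list, while manual provisioning could present the whole list to the user.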

The detailed JSON input and output formats are shown below. Note that the connector-instance-types attribute contains the preferred instance types per connector.

The 'node' attribute is only used for applications (i.e. multi-component deployments); it is not present for deployments of simple components. Ordering is indicated with the 'index' attribute. When the price is not available, the value -1 is returned.
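A client consuming a response in the output format might select, for each component, the lowest-ranked connector that has a known price. The Python sketch below assumes only the attributes named in this section (components, module, connectors, name, price, index); the function name is hypothetical, and how the response is obtained from the service is left out.

```python
import json


def cheapest_connectors(response_text):
    """Map each component's module to the best-ranked connector name.

    Connectors are read in 'index' order. A price of -1 means that no
    price is available, so priced connectors are preferred and an
    unpriced one is used only when nothing else is offered.
    """
    response = json.loads(response_text)
    chosen = {}
    for component in response["components"]:
        ranked = sorted(component["connectors"], key=lambda c: c["index"])
        priced = [c for c in ranked if c["price"] >= 0]
        if priced:
            pick = priced[0]
        elif ranked:
            pick = ranked[0]
        else:
            pick = None  # no acceptable offer for this component
        chosen[component["module"]] = pick["name"] if pick else None
    return chosen
```

Keying the result on the module is sufficient for simple components; for multi-component applications the 'node' attribute would be the more natural key.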

{
  "components": [{
    "module": "moduleURI",
    "cpu.nb": 1,
    "ram.GB": 2,
    "disk.GB": 10,
    "placement-policy": "<optional CIMI filter>",
    "connector-instance-types": {
      "connector-name1": "<size1>",
      "connector-name2": "<size2>"
    }
  }],
  "user-connectors": ["connector-name1", "connector-name2"]
}

{
  "components": [{
    "node": "<unused-when-not-an-application>",
    "module": "moduleURI",
    "connectors": [{
      "name": "connector-name2",
      "price": 0.0171,
      "currency": "EUR",
      "cpu": 2,
      "ram": 8,
      "disk": 10,
      "instance_type": "micro",
      "index": 0
    }, {
      "name": "connector-name1",
      "price": 0.0185,
      "currency": "EUR",
      "cpu": 4,
      "ram": 32,
      "disk": 200,
      "instance_type": "medium",
      "index": 1
    }]
  }]
}

4.3.2. Provisioning

The provisioning (and freeing) of cloud resources is handled by the SlipStream server. SlipStream manages a cloud application’s lifecycle by driving it through a well-defined set of states.

For components (simple applications with only one machine), SlipStream itself will directly start the machine in the cloud. However, for applications with multiple machines, SlipStream delegates some of the lifecycle management to an additional machine called an “orchestrator”. The orchestrator is responsible for provisioning all the machines of the application and will also handle scaling actions. For deployments that will not scale, the orchestrator is terminated once the application is ready; for scalable applications, the orchestrator remains to handle provisioning when scaling requests are made.

Throughout the application lifecycle, SlipStream executes and synchronizes the provisioning and configuration actions following a two-level state machine. Figure 4 shows the main state machine with its transitions and synchronization.

The secondary state machine only applies to scalable deployments during a scaling action. When a scaling action is initiated, the main state machine moves to the “Provisioning” state, where the scaling actions will be applied. The state machine then continues until it returns to the “Ready” state. Figure 5 shows the secondary state machine; the elements in blue only apply to vertical scaling actions.

Figure 4: Deployment States

Figure5:ScalingStates

4.3.3. Coordination

For each running instance of a component or application, SlipStream maintains a simple key-value database, which is used to share information between SlipStream, managed virtual machines, and the operator of the application. This “parameter database” can be accessed via the web browser UI, the SlipStream CLI, and the SlipStream API.

Table 2 describes the most important parameters that are managed by SlipStream or treated specially. The table does not include all parameters, nor parameters defined by specific components.

The global parameters provide information about the complete application. Generally, parameters like “ss:state” and “ss:abort” are used to control the state machine of the application. Others like “ss:url.service” are treated as URLs within the browser interface to facilitate access to deployed applications. Those that are not read-only can be set by the user or the application itself through the SlipStream API or the ss-set command in the CLI.

SlipStream provides some “automatic”, read-only parameters for the virtual machines within a deployed application. The most important are “hostname” and “instanceid”, which provide the machine’s IP address and cloud instance ID, respectively.

A “node” in a SlipStream deployment refers to a collection of machines of the same type (or class) and in the same cloud infrastructure. There are a few parameters set by SlipStream that make it possible to iterate over the machines within the node class.

Application components can define any number of input and output parameters. All parameters, including the user-defined ones, can be accessed via the SlipStream command line client using the commands ss-get and ss-set. Note that trying to access an undefined parameter will abort the deployment.


The ss-get command will, by default, block (subject to a timeout) if the parameter has not yet been given a value. This allows parameters to act as simple semaphores to coordinate the behavior between virtual machines within a given deployment. It is common for components to define “ready” flags to facilitate this cross-component coordination. For example, a database component may provide a “ready” flag to indicate to clients when the database is available.
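The blocking behavior described above can be illustrated with a small in-memory sketch. This is not SlipStream code: the class ParameterStore, the parameter name "db:ready", and the timeout values are assumptions chosen for illustration only. It shows how a blocking read with a timeout lets a key act as a simple semaphore between a producer and a consumer.

```python
import threading


class ParameterStore:
    """Toy in-memory key-value store with a blocking get (illustrative only)."""

    def __init__(self):
        self._values = {}
        self._cond = threading.Condition()

    def set(self, key, value):
        # Store the value and wake up every reader waiting for a key.
        with self._cond:
            self._values[key] = value
            self._cond.notify_all()

    def get(self, key, timeout=5.0):
        """Block until `key` has a value; raise TimeoutError otherwise."""
        with self._cond:
            found = self._cond.wait_for(lambda: key in self._values,
                                        timeout=timeout)
            if not found:
                raise TimeoutError(f"no value for {key!r} after {timeout}s")
            return self._values[key]


def demo():
    store = ParameterStore()
    # A hypothetical database component sets its "ready" flag after start-up.
    threading.Timer(0.1, store.set, args=("db:ready", "true")).start()
    # A client blocks here until the flag appears (or the timeout expires).
    return store.get("db:ready", timeout=2.0)
```

Calling demo() returns once the simulated database component has published its flag; a read of a key that is never set raises TimeoutError instead of blocking forever.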

The parameter database probably would not be used directly in the management of SLAs, but could be used indirectly to provide, for example, the number of failed machines or the current number of jobs in a batch queue. Note, however, that the values would need to be set directly by the application. Similarly, triggering state changes must be done externally via the API.

Table 2: SlipStream Application Parameters

Name            Scope    Read-only  Description
ss:state        Global   Yes        Current state of the application
ss:abort        Global   No         Global abort flag for the application
ss:category     Global   Yes        Type of deployment
ss:tags         Global   No         A list of tags set by the user to differentiate deployments
ss:url.service  Global   No         A service URL for the overall application
ids             Node     Yes        List of IDs for machines in this node class
multiplicity    Node     Yes        Number of machines in this node class
hostname        Machine  Yes        Contains the IP address of the VM
instanceid      Machine  Yes        The cloud-specific identifier for the VM
abort           Machine  No         Flag indicating if the application has aborted

4.4. Monitoring/Optimization Infrastructure

SlipStream monitors the state of virtual machines deployed as part of a cloud application, both during and after the deployment. During the deployment process, the information helps synchronize the installation and configuration actions; afterwards, it is displayed on the users’ dashboards to provide an overview of their cloud activity.

As a deployment works its way through the SlipStream state machine, SlipStream generates “events” with information about the progress. Users and users’ applications can similarly generate their own specialized events for their applications. All the events can be accessed from the web browser UI, the CLI, and the various APIs. Eventually, notifications based on these events will be supported, allowing corrective actions to be triggered based on the state of the application.

In parallel, SlipStream periodically polls the cloud infrastructures to obtain information about users’ running virtual machines. The information is correlated with the known virtual machines from SlipStream deployments to provide an overall view of resources both managed and not managed by SlipStream. This information is available on the web browser UI dashboard and through the CLI and APIs. Eventually, discrepancies between the expected and actual deployment states will be reported as events, allowing notifications and automated corrective actions. Similarly, availability statistics will be published into the service catalog, influencing the selection of CSPs for future deployments or scaling actions.


Monitoring will play a crucial role in deploying self-adapting and self-healing applications on clouds. It provides the feedback on the current state of the application that will be the basis for triggering scaling, migration, or other actions.

4.5. Service Catalog

The Service Catalog is a repository of characteristics (stable information) and dynamic properties associated with a given cloud infrastructure. It is composed of three REST resources following the CIMI standard. The main one is the Service Offer; the two others (Service Attribute and Service Attribute Namespace) provide semantic information and namespace functionalities.

4.5.1. Sources of Information

Information is inserted into (or updated in) the Service Catalog both on a regular basis (by “Information Providers” that feed the prices) and continuously by SlipStream itself (via the monitoring).

• Application monitoring from SlipStream itself. SlipStream continuously monitors the state of virtual machines within deployed cloud applications. Currently this information is only available internally, so SlipStream would need to be modified to make this information available through the API. This information could be used to trigger deployments (with optional migration) of application components.

• Price information for clouds. This comes from “Information Providers”, that is, processes that parse web page information and regularly update price values. These data are typically refreshed infrequently.

• Application-specific benchmarks. The applications themselves can publish specific information about the performance of a cloud infrastructure for a given application. This could be used to influence future scheduling decisions for the application. It could also be used to initiate changes in the current application instance.

• Application metrics. These are application-specific metrics that are taken frequently and used inside the deployment to decide on scaling actions. These metrics typically change too quickly and are too specific to an application instance to push into the service catalog.

• External information. Information such as security certifications (e.g. from the Cloud Security Alliance) or generic benchmarks (e.g. CloudHarmony) can be inserted into the Service Catalog or Open Service Compendium to provide a richer set of information to use for resource selection. This may also be extended to “endorsements” where a third party would certify certain cloud services.

4.5.2. Service Catalog Resources

The most important resource is the Service Offer, which contains actual offers from CSPs. The other resources support the Service Offer by providing namespaces (to avoid name collisions across all service offer attributes) and metadata information (mainly a description of an attribute). Each attribute name for a Service Offer must be of the following form: prefix:name. For example, schema-org:location is a valid attribute name if a Service Attribute Namespace resource with this prefix exists. Each service attribute namespace prefix is guaranteed to be unique.
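The naming rule above can be sketched as a small validation helper. The function name split_attribute_name and the set of registered prefixes are hypothetical; the Service Catalog's actual validation code is not shown in this document, so this only illustrates the prefix:name convention and the requirement that the prefix belongs to an existing namespace.

```python
def split_attribute_name(name, known_prefixes):
    """Split 'prefix:name' and check the prefix against known namespaces.

    `known_prefixes` stands in for the prefixes of the existing
    Service Attribute Namespace resources (an assumption of this sketch).
    """
    prefix, sep, attr = name.partition(":")
    if not sep or not prefix or not attr:
        raise ValueError(f"{name!r} is not of the form prefix:name")
    if "." in prefix or "/" in prefix:
        # Namespace prefixes are described as containing no dot or slash.
        raise ValueError(f"prefix {prefix!r} must not contain '.' or '/'")
    if prefix not in known_prefixes:
        raise ValueError(f"no namespace registered for prefix {prefix!r}")
    return prefix, attr


# Example: 'schema-org' is registered, so the attribute name is accepted.
registered = {"schema-org"}
print(split_attribute_name("schema-org:location", registered))
```

An attribute such as location (no prefix) or other-ns:location (unregistered prefix) would be rejected by this sketch with a ValueError.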

4.5.2.1. Offers

Self-contained offers for specific services that are associated with a single, specific price calculation. An example would be a VM with a specific size and duration. Another would be a persistent storage area with a given size, SLA, etc. The schema for a Service Offer is very flexible. This is basically a key-value store (except for the connector/href attribute that identifies the name of the cloud).

{
  "connector" : { "href" : "nuvlabox-christiane-nusslein-volhard" },
  "schema-org:last-online" : "Tue Oct 11 08:52:41 UTC 2016",
  "schema-org:state" : "ok",
  "schema-org:flexible" : "true"
}

4.5.2.2. Attributes

The semantic definition of a single attribute. This associates a URI with the semantic definition of the attribute. This also allows internationalization of the description of the attribute.

{
  "prefix": "NonBlankString",
  "attr-name": "NonBlankString",
  "type": "type",
  "authority": "NonBlankString",
  "major-version": "NonNegInt",
  "minor-version": "NonNegInt",
  "patch-version": "NonNegInt",
  "normative": "true-or-false",
  "en": {
    "name": "Name of the attribute",
    "description": "Description of the attribute",
    "categories": ["category-one", "category-two"]
  },
  "fr": {
    "name": "Nom de l'attribut",
    "description": "Description de l'attribut",
    "categories": ["categorie-un", "categorie-deux"]
  }
}

4.5.2.3. Namespaces

Associates a URI with a given prefix. Note that the pair prefix/URI must be unique across all Service Attribute resources. The JSON structure for Service Attribute Namespace with the CIMI common attributes stripped out looks like:

{
  "prefix": "a-prefix-without-dot-or-slash",
  "uri": "NonBlankString"
}

4.5.3. Filtering/Query Language

The current implementation uses the CIMI filtering [CIMI16] with the filter query parameter to select suitable service offers. For example, the following expression:

((connector/href='connector-name1') or (connector/href='connector-name2')) and ((schema-org:flexible='true') or ((schema-org:descriptionVector/schema-org:vcpu>=4 and schema-org:descriptionVector/schema-org:ram>=16 and schema-org:descriptionVector/schema-org:disk>=50)))

will select offers from the connectors named “connector-name1” or “connector-name2” that either have a flexible schema or provide CPU/RAM/disk of at least 4/16/50.
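To make the logic of this filter explicit, the sketch below evaluates the same condition on service offers held as Python dictionaries. It is only a client-side illustration of what the expression selects, not how the Service Catalog evaluates CIMI filters; the function name matches and the assumption that descriptionVector is stored as a nested object are hypothetical.

```python
def matches(offer):
    """Return True if `offer` satisfies the example filter expression."""
    connector_ok = offer.get("connector", {}).get("href") in {
        "connector-name1",
        "connector-name2",
    }
    flexible = offer.get("schema-org:flexible") == "true"
    # Assumed layout: the '/' in the filter path addresses a nested object.
    vec = offer.get("schema-org:descriptionVector", {})
    big_enough = (
        vec.get("schema-org:vcpu", 0) >= 4
        and vec.get("schema-org:ram", 0) >= 16
        and vec.get("schema-org:disk", 0) >= 50
    )
    return connector_ok and (flexible or big_enough)


offers = [
    {"connector": {"href": "connector-name1"}, "schema-org:flexible": "true"},
    {"connector": {"href": "connector-name2"},
     "schema-org:descriptionVector": {"schema-org:vcpu": 4,
                                      "schema-org:ram": 32,
                                      "schema-org:disk": 200}},
    {"connector": {"href": "connector-name2"},
     "schema-org:descriptionVector": {"schema-org:vcpu": 4,
                                      "schema-org:ram": 8,
                                      "schema-org:disk": 200}},
]
selected = [o for o in offers if matches(o)]
```

With these three hypothetical offers, the first two are selected and the third is rejected because its RAM value is below 16.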

The use of the native Elasticsearch API could bring more flexible ways of querying the Service Offer. For example, it would be possible to use fuzzy matching, phrase or proximity queries, etc. Whether this is required will depend on the current and future use cases.


4.6. Accessing SlipStream

The many features of SlipStream would not be useful unless they could be accessed easily. SlipStream provides three mechanisms for accessing the service:

• REST API: A complete API that allows programmatic access to the service. The API follows recent trends and provides a resource-oriented (REST) interface that works over the HTTP protocol. This allows easy access from all programming languages without having to provide customized libraries for each.

• Web Browser UI: A convenient graphical method of accessing the service that allows people to manage information concerning their cloud applications. Notably, in the context of this document, it is used to manage the placement policies that contain the functional and non-functional requirements for the applications. It uses the REST API of the service.

• Command Line Interface (CLI): Access from the command line, either for quick tests or for scripting, is essential for any complete solution. SlipStream provides a CLI written in Python to make its deployment and use nearly universal. The CLI uses the SlipStream REST API and exposes most of the available functionality.

All three of these access methods play roles in the solutions for functional and non-functional aspects of cloud application management. Specifically, the web browser interface allows policies to be easily written and associated with application components. The REST API and CLI facilitate the manual and automated scaling of cloud applications (see following sections) in response to operating conditions, whether it be the changing load on the service or changing characteristics of the CSPs.

4.7. Auto-scaling of User Applications

With SlipStream one can automatically scale up and down the number of instances of an application component of a (possibly) complex application. Currently only one component can be scaled at a time. Work to remove this limitation is ongoing. There are two ways to take advantage of auto-scaling with the existing SlipStream implementation:

• Black-box autoscaling: The simplest approach is to use the default implementation of the auto-scaling in SlipStream. Currently, it is suitable for applications where only one component will scale based on only one metric. If these requirements are met, users benefit from the automated scalability provided “out of the box” by SlipStream by adding a special autoscaler component to their application, providing a configuration file with the application component’s scalability constraints, and deploying the application as a scalable deployment.

• Do-it-yourself (DIY) autoscaling: The implementation of the autoscaler in SlipStream allows users to supply their own implementations of the autoscaling logic. In this case, SlipStream takes care of deploying the autoscaler platform and running the user-supplied auto-scaling logic. The user is required to include the autoscaler in the application as a component and to provide a public URL with the autoscaling implementation.

In both cases, the chosen components of the user’s application must publish the metrics on which the scaling actions will be based and must support automatic scaling.

SlipStream uses Riemann [RIES16] to implement its auto-scaling decision-making feature and consequently requires that the metrics be published as Riemann events. To facilitate metrics collection, Riemann has a wide range of metric publishers [RIEC16]; these include a Riemann plugin for “collectd” and a Python CLI and library API.


The implementation of the autoscaler in SlipStream is completely open and flexible. It allows users to write their own decision-making logic as Riemann streams and to provide it to the autoscaler component as an input parameter. This can be useful in cases where the default autoscaling implementation that comes with the SlipStream autoscaler does not fully satisfy the needs of the user’s application.

4.7.1. Configuration of Auto-scale Constraints

Below is the example configuration file (in edn format, see [EDN12]) to be used with the black-box autoscaling approach. The configuration defines scalability constraints for an application component called webapp.

[
 {
  ;;
  ;; Mandatory parameters.
  ; name of the component in the application
  :comp-name "webapp"
  ; service tags as sent by Riemann publisher in the event
  :service-tags ["webapp"]
  ; monitored service metric name (as sent with Riemann event)
  :service-metric "avg_response_time"
  ; value of the metric after which start adding component instances
  :metric-thold-up 500.0
  ; value of the metric after which start removing component instances
  :metric-thold-down 200.0
  ; max number of component instances allowed
  :vms-max 4 ; "Price" constraint.
  ;;
  ;; Optional parameters (with defaults).
  ; min number of component instances allowed
  ;:vms-min 1
  ; number of instances by which to scale up
  ;:scale-up-by 1
  ; number of instances by which to scale down
  ;:scale-down-by 1
 }
]

It reads the following way:

1. For an application component webapp the autoscaler expects to receive Riemann events:
   1. with the service name avg_response_time,
   2. with one of the tags being webapp;

2. When the value of the metric in the event is
   1. above metric-thold-up (500.0), then the autoscaler should perform a scale up action,
   2. below metric-thold-down (200.0), then the autoscaler should perform a scale down action;

3. The autoscaler should not perform a scale up action if there are already vms-max (4) component instances running;

4. The minimum number of component instances should not go below vms-min (1). That is, the autoscaler should not attempt a scale down action if the number of currently running component instances is vms-min;

5. The autoscaler should scale up by scale-up-by instances and scale down by scale-down-by instances.
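The five rules above can be condensed into a short decision function. This is a sketch in Python rather than the Riemann/Clojure code used by the actual autoscaler; the function name target_instances is hypothetical, and clamping the result to the vms-min and vms-max bounds when the step size would overshoot is an assumption not stated in the text.

```python
# Constraints mirroring the example edn file; optional keys use the
# documented defaults (vms-min 1, scale-up-by 1, scale-down-by 1).
CONSTRAINTS = {
    "comp-name": "webapp",
    "metric-thold-up": 500.0,
    "metric-thold-down": 200.0,
    "vms-max": 4,
}


def target_instances(metric, running, constraints):
    """Return the desired number of instances for the observed metric."""
    vms_min = constraints.get("vms-min", 1)
    vms_max = constraints["vms-max"]
    up_by = constraints.get("scale-up-by", 1)
    down_by = constraints.get("scale-down-by", 1)

    if metric > constraints["metric-thold-up"] and running < vms_max:
        # Scale up, but never beyond vms-max (clamping is an assumption).
        return min(running + up_by, vms_max)
    if metric < constraints["metric-thold-down"] and running > vms_min:
        # Scale down, but never below vms-min (clamping is an assumption).
        return max(running - down_by, vms_min)
    return running  # within bounds or already at a limit: do nothing
```

For example, a metric of 600.0 with one running instance yields a target of two instances, while the same metric with four running instances leaves the deployment unchanged.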


4.7.2. Autoscaler Component

The autoscaler component is available in the App Store on Nuvla¹, the SlipStream “as a Service” run by SixSq (see Figure 6). Its source code is published on GitHub in the “slipstream-auto-scale-apps” repository² in the “autoscaler” module. The structure of the autoscaler application is:

app/
  dashboard.json                 # Configuration of the layout of Riemann-dash.
  dashboard.rb                   # Configuration of the Riemann-dash service.
  event-example.json             # An example of Riemann event in JSON.
  project.clj                    # Clojure project file for working with Riemann streams file.
  riemann-ss-streams.clj         # SlipStream scaling logic as Riemann streams.
  scale-constraints-example.edn  # Example scale constraints.
deployment/
  deployment.sh                  # Deployment of the autoscaler component.

The application uses Riemann to process incoming component metrics as events. The main part of the application is the Riemann configuration file, app/riemann-ss-streams.clj, a script written in the Clojure programming language.

The default implementation loads the auxiliary library (as the “sixsq.slipstream.riemann.scale” Clojure namespace) that defines custom event processing streams and helper functions. It then uses those functions to read in the scalability constraints for the component(s), to process/update incoming events (or generate new events), to make the scaling decisions, and to request the scaling actions from SlipStream.

The decision-making algorithm uses Riemann’s moving time window stream with a window size of 30 s to smooth out spikes in the incoming metrics’ time series.
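The smoothing effect of a moving time window can be sketched as follows. This Python class is only an illustration of the idea, not Riemann's implementation; the class name MovingTimeWindow is hypothetical, and using the mean of the events in the window is an assumption, since the text does not say which aggregate the autoscaler computes.

```python
from collections import deque


class MovingTimeWindow:
    """Keep the events of the last `seconds` seconds and average them."""

    def __init__(self, seconds=30.0):
        self.seconds = seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record an event and return the mean over the current window."""
        self._events.append((timestamp, value))
        cutoff = timestamp - self.seconds
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        values = [v for _, v in self._events]
        return sum(values) / len(values)


window = MovingTimeWindow(seconds=30.0)
smoothed = [window.add(t, v) for t, v in [(0, 100.0), (10, 300.0), (35, 500.0)]]
```

A single spike therefore moves the smoothed value only partially, which reduces the chance of a scaling action being triggered by one outlier.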

The functions defined in the “sixsq.slipstream.riemann.scale” namespace simplify the main configuration script by providing a number of utility functions that hide the details of interacting with SlipStream to request scaling actions, creating new and/or updating old events, and (re-)indexing and/or publishing them to Graphite. The namespace is defined in the “run-proxy/api” module in the GitHub repository. All the public functions are documented.

Based on the example in app/riemann-ss-streams.clj and taking advantage of the utility functions, application developers can write their own scaling logic and then provide it to the autoscaler component as a public URL via an input parameter. For details, see the autoscaler component in the App Store.

The deployment script of the autoscaler module deploys Riemann, the Riemann dashboard, and Graphite. After deployment, the Riemann dashboard can be found at the URL http://<autoscaler IP>:6006 and Graphite at http://<autoscaler IP>.

Riemann acts as a stream processing engine with little or no memory of the events. Graphite is used to store the historical data of the events. Primarily this is used to plot the historical data, but it could also be read into the Riemann streams to consider the history when making scalability decisions.

¹ https://nuv.la
² https://github.com/slipstream/slipstream-auto-scale-apps


Figure 6: Autoscaler and Scalable Application

4.7.3. Publishing Component Metrics to Autoscaler

The autoscaler makes the scalability decisions based on the metrics coming from the user’s application components and the thresholds provided by the user. Because the autoscaler is a Riemann application, the user must use one of the many Riemann clients to push the metrics (as events) to it. (See [GAL07] for the available clients.)

4.8. Example Auto-scale Application

This section describes an example auto-scale application that publishes its metrics through the write_riemann plugin of collectd and a custom publisher written in Python using the Riemann client library API.

The example is a web application. It is schematically depicted in Figure 7 and consists of the following components:

• webapp: a stateless web application that takes requests, synchronously performs a moderately intensive computation (calculating Pi up to 100 digits), and returns the result,

• nginx: a load balancer based on Nginx [NGIN16] that distributes client requests to the set of stateless web servers,

• client: a test client based on Locust [LOC16] that simulates a varying number of clients, and

• autoscaler: the standard SlipStream autoscaler component that makes scaling decisions.

The application can be found in the App Store on Nuvla (see Figure 6); its source code is in the “client-nginx-webapp” module in the GitHub repository.


Figure 7: Components of Example Auto-scale Application

4.8.1. Application Configuration and Deployment

Figure 8 shows the definition of the application in SlipStream. Note that values for the autoscaler input parameters, namely “riemann_config_url” and “scale_constraints_url”, must be provided. The first one defines the URL of a resource under which the following files are expected:

• riemann-ss-streams.clj: Riemann streams using SlipStream scale up/down actions,
• dashboard.json: Riemann dashboard layout and queries, and
• dashboard.rb: Riemann dashboard configuration.

They contain the processing logic for autoscaling actions and the configuration for the Riemann dashboard. This allows users to provide their own implementation of the scaling logic and dashboard configuration. The second input parameter, “scale_constraints_url”, provides the URL with the application scaling definitions. For this application, they look like:

$ cat scale-constraints.edn
[
 {:comp-name "webapp"
  :service-tags ["webapp"]
  :service-metric "avg_response_time"
  :metric-thold-up 7000.0
  :metric-thold-down 4000.0
  :vms-max 4}
]

This file is application-specific; for the example, it comes from the user application module on GitHub.


Clicking the Deploy button brings up the Application Deployment dialog (see Figure 9). In this dialog, you must check the box to indicate that this is a scalable application; you can optionally change the multiplicity of the webapp component. Select the cloud and proceed with the deployment.

Figure 8: Example Scalable Application Definition in SlipStream

Figure 9: Deployment of Example Scalable Application


4.8.2. Usage of the Application After Deployment

After a successful deployment of the application, one should first open the HTTP URL published by the client component. It provides a page with a description of the application, a deployment diagram, and links to the services running on other components (see Figure 10).

The first link to follow is the one pointing to the Locust load generator; this simulates a varying number of clients. Although Locust can be used as an automatic load generator, this application assumes manual configuration of the load parameters. Figure 11 shows the definition of three parallel users that Locust will use to access the web layer and load it with requests. The endpoint to contact and the resource of the web application were already automatically configured during the component deployment.

After Locust starts loading the web application, a custom Riemann events publisher collects the average response time metric from Locust and publishes it as an event to the Riemann service running on the autoscaler. The event looks like:

{"service": "avg_response_time", "tags": ["webapp"], "ttl": 10, "host": "httpclient", "time": 1479202972, "metric_f": 3167.896}
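As a sketch of how such an event relates to the scale constraints, the snippet below parses the event shown above and checks it against the service-metric and service-tags entries of a constraint. The helper name event_matches and the dictionary form of the constraint are hypothetical; the real matching is done inside the Riemann streams, so this only illustrates the rule that the service name must match and at least one tag must be shared.

```python
import json

# The example event from the text, as it would appear in JSON.
EVENT_JSON = (
    '{"service": "avg_response_time", "tags": ["webapp"], "ttl": 10, '
    '"host": "httpclient", "time": 1479202972, "metric_f": 3167.896}'
)

# Subset of the scale constraints for the webapp component.
CONSTRAINT = {
    "comp-name": "webapp",
    "service-tags": ["webapp"],
    "service-metric": "avg_response_time",
}


def event_matches(event, constraint):
    """True if the event carries the monitored metric for the component."""
    same_service = event.get("service") == constraint["service-metric"]
    shared_tags = set(event.get("tags", [])) & set(constraint["service-tags"])
    return same_service and bool(shared_tags)


event = json.loads(EVENT_JSON)
relevant = event_matches(event, CONSTRAINT)
```

Events with a different service name, such as the load metrics published by the webapp instances, would not be matched by this constraint.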

Other metrics are also collected and sent from the client (Locust) and webapp component instances. For example:

• The client publishes the number of concurrent clients used by Locust, the current requests per second, and the request fail rate;

• The webapp instances publish their current 1, 5, and 15 min load. This metric is published through the Riemann collectd plugin, which is deployed and configured automatically on each webapp instance with the collectd.sh script available from the repository.

Based on the load metrics reported by the webapp instances, a Riemann stream dynamically calculates the current number of instances and indexes it to allow the Riemann dashboard to query and display it.

The current multiplicity of the webapp component is queried by a Riemann stream directly from SlipStream and indexed. As one can see in Figure 12, there is a delay between the provision request (multiplicity as reported by SlipStream) and the availability of the virtual machine (as reported by the load metrics). This is almost entirely due to the provisioning latency at the IaaS level; SlipStream’s control flow contributes negligibly to the latency.

Based on the given constraints, the autoscaler attempted to satisfy the scalability constraints provided for the webapp component by keeping the average response time metric within the requested bounds.

All the application-level metrics (native or generated) are published to Graphite, which is deployed on the autoscaler and runs alongside Riemann. See Figure 13, which shows the historical evolution of the average response time in Graphite.


Figure 10: Application Entry Point

Figure 11: Locust: Defining Work (as 3 Users) to Load the Application


Figure 12: Riemann-dash: High Load on the Web Layer

Figure 13: Average Response Time as a Function of Time in Graphite


5. Analytics Engine

5.1. Theoretical Background

One of the key features of cloud computing is the capability of acquiring and releasing resources on demand. The objective of a cloud application operator in this case is to allocate and deallocate resources from the cloud to satisfy its Service Level Objectives (SLOs), while minimizing its operational cost. To achieve high agility in responding to rapid performance fluctuations, the resource provisioning decisions must be made automatically. An application performance model can be constructed using various techniques, including queuing theory, control theory and statistics.

The resource control decisions can be either proactive or reactive. The proactive approach uses predicted demand to allocate resources based on the anticipated load. This could be used, for example, for day/night load fluctuations. The reactive approach makes decisions based on the current load and performance. Both approaches are necessary for effective resource control in dynamic operating environments.

Within CYCLONE, we use a statistical (stochastic) model for application response time (latency) in the form of probability density functions (PDF). Because the PDF of response time is a continuous function, we need to store a representation of it in an acceptable format within our system. We use histograms, where two parameters are of importance: the number of bins (m) and the bin width (b). The values for these determine what the resulting histogram looks like. Choosing a small bin width will result in a chaotic representation with many isolated peaks. Conversely, choosing a large bin width will obliterate the details of the histogram that are necessary for a good representation of the data. Formulas that may be used for the selection of b [SCO79] and m [STU26] are:

$$ m = 1 + 3.3 \cdot \log_{10}(n) $$

$$ b = 3.49 \cdot s \cdot n^{-1/3} $$

where n is the sample size and s is the sample standard deviation. The response time (RT) may be specified by the application/cloud providers as an SLO within the Service Level Agreement (SLA). Note that this may not be the only way to specify the response time; for example, the PDF could be obtained from benchmarking before a service based on the cloud application is offered.
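As a small worked illustration, the two rules can be computed directly from a sample of response times. The function names are hypothetical, rounding the Sturges value up to a whole number of bins is an assumption, and the sample values are invented for the example; only the two formulas themselves come from the text.

```python
import math
import statistics


def sturges_bin_count(n):
    """Number of bins from Sturges' rule, 1 + 3.3 * log10(n), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))


def scott_bin_width(samples):
    """Bin width from Scott's rule, 3.49 * s * n^(-1/3)."""
    n = len(samples)
    s = statistics.stdev(samples)  # sample standard deviation
    return 3.49 * s * n ** (-1.0 / 3.0)


# Invented response times (in ms), for illustration only.
response_times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bins = sturges_bin_count(len(response_times))
width = scott_bin_width(response_times)
```

In practice only one of the two values needs to be chosen; the other follows from the range of the observed data.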

The model that we considered while implementing the algorithms is specified as follows:

Each one of N applications has a response time represented by the random variable Di ≥ 0. For each random variable Di a respective probability density function (PDF) is given. The response-time distributions may be obtained by benchmarking or specified by the provider. Next, the execution costs ci (in money units) are specified for each application.

Once an application is selected, its performance may deteriorate, with a deadline approaching while the submitted jobs have not yet finished. The Analytics Engine is constantly updated with the measured application performance (via a monitoring connector). When such a deterioration is detected, another application is used. Naturally, this comes at additional cost. This situation may occur when choices are made between relatively slow and cheap applications and applications that are in general faster but more expensive.


Thequestionsthatneedtoberesolvedare:

1. Whendoestheperformanceofthedeployedapplicationdeviatefromwhathasbeenpromised(inanSLAorthroughbenchmarking)?

2. Howoftendoesoneneedtocollecttheperformanceupdatesfromtheapplicationsthatwerenotinitiallyselected?

3. Whatistheoptimalmomenttotakeanaction?Theseactionsmayincluderescaling,orsubstitutingtheinitiallyselectedapplicationwithanotherone.

Theanswerstothesequestionsaregiveninthefollowingsections.

5.1.1. Optimal Application Selection

The specified model is used as the input for the analytic engine component. Let us assume there are two functionally equivalent applications, AP1 and AP2. These applications are deployed on their respective cloud infrastructures and the application operator must select one. When the application operator wants to select one of the applications, he specifies constraints covering functional requirements as well as non-functional requirements like cost and response time. Based solely on functional requirements, both AP1 and AP2 will be listed.

To decide between them, non-functional properties need to be compared. There are many possible choices: deadlines (which implicitly convert to total costs), availability, etc. These parameters are related to one another in an objective function, for which a maximum (or minimum) is found.

Therefore, the application operator needs to specify how these parameters are related to one another; he does this by weighting them. For example, it is possible to specify that the relative weights for cost and response time are 4:1, meaning that cost is 4 times “more important” for him. Other application operators may choose different weights for their objective function.

The maximum of the objective function is then found subject to certain optimization criteria. The optimization criterion may be, for example, the highest expected reward. We will typically have a mono-objective function, in which we want to optimize a trade-off between all the objectives. This means that one should define the weighted-sum objective:

$$w_1 \cdot obj_1 + w_2 \cdot obj_2 + w_3 \cdot obj_3, \quad \text{where } w_i > 0 \text{ and } w_1 + w_2 + w_3 = 1$$

Depending on the ratio between the weighting coefficients, one objective could be made more “important” than the rest. This is expected to result in a unique solution.
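The weighted-sum selection can be sketched in a few lines of Python. This is only an illustration: the objective values for AP1 and AP2 are hypothetical, they are assumed to be already normalised to a common scale where lower is better, and the function name `weighted_score` is not part of the CYCLONE toolkit.

```python
def weighted_score(objectives, weights):
    """Weighted-sum objective: sum_i w_i * obj_i, with w_i > 0 and sum_i w_i = 1."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * o for w, o in zip(weights, objectives))


# Hypothetical normalised objective values (cost, response time); lower is better.
candidates = {"AP1": (0.30, 0.80), "AP2": (0.60, 0.20)}

# Relative weights cost : response time = 4 : 1, scaled so that they sum to 1.
weights = (0.8, 0.2)

scores = {name: weighted_score(obj, weights) for name, obj in candidates.items()}
best = min(scores, key=scores.get)  # the operator minimises the weighted sum here
```

With a 4:1 weighting in favour of cost, the cheaper but slower AP1 is selected; reversing the weights would favour AP2.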

However, for multi-objective functions, each objective could be optimized “separately”. There are many techniques which transform the multi-objective problem into several mono-objective problems that one can solve sequentially (for example, the epsilon-constraint method, or the two-step method of Aneja & Nair). The expected result here is a set of non-dominated (Pareto) solutions (the “Pareto front”); that is, there is no solution that is better than the other solutions with respect to all objectives; otherwise, this solution would be an optimal one.
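As a small illustration of what a set of non-dominated solutions looks like, the following sketch filters a list of candidate points by brute force. It assumes that every objective is to be minimised; the candidate values and the helper names `dominates` and `pareto_front` are hypothetical and do not correspond to the techniques cited above.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (cost, response time) pairs for four application/cloud combinations.
options = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 1.0)]
front = pareto_front(options)
```

Here the option (3.0, 5.0) is dropped because (2.0, 4.0) is better in both objectives; the remaining three options form the Pareto front.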

5.2. Optimal Application Substitution and Optimal Redeployment of Applications

The main points arising from this section are:


• The optimal application substitution and optimal redeployment of applications can be utilized only if the observed distributions (response times for applications and deployment times) exhibit the so-called “decreasing hazard rate”. Therefore, they are not universally applicable.

• Optimization only makes sense if multiple alternatives are available: functionally equivalent applications on different cloud providers or the same application available on multiple cloud providers.

• We derive the policies for optimal deployment or substitution before deployment takes place or before applications are used. If the deployment has not finished by the given deadline, all it takes is to evaluate the calculated policies and determine at which cloud the application should be redeployed, or which application/cloud combination should be used instead of the current one.

5.2.1. Introduction

In many situations, restarting system components, re-issuing a request, or re-establishing a network connection improves the performance or availability of the system under consideration significantly. It is not always known precisely why this is necessary or beneficial. Most Internet users are familiar with the fact that clicking the reload button often helps in speeding up the download of a page, although one typically understands only to a limited extent why the download was slow in the first place. Another example in which restart is beneficial is the case of software “aging”; the re-initialization of the software environment may help in preventing application failures and speeding up processing. In many cases, very little is known about the causes of delay or aging, and consequently one cannot correct the root problem. In practical situations, therefore, we want to determine the optimal time to restart, without knowing or modelling the details of the system.

What characteristics do tasks that benefit from restarts exhibit? In general terms, the completion time when starting fresh must be less than the remaining completion time when not restarting. We can formalize this by letting the random variable $T$ denote the completion time of a task. Assume we are interested in the mean completion time. Under the assumption of independent identically distributed completion times of successive tries, one would restart at time $\tau$ when

$$\mathbb{E}[T] < \mathbb{E}[T - \tau \mid T > \tau]$$

holds.

The question then becomes which distributions of $T$ fulfil this requirement for at least one value of $\tau$. Distributions with heavy tails have the required behavior. For such distributions, the tail decreases at a polynomial rate, leaving considerable probability mass at high values of $T$. Heavy-tailed and similar distributions commonly arise when studying Internet applications. Distributions with exponentially decaying tails demonstrate the required behavior quite often as well [BOU98].

Mathematically, the key to the analysis of restarts is the hazard rate $h(t)$ of a probability distribution, defined as

$$h(t) = \frac{f(t)}{1 - F(t)}$$

where $f(t)$ is the probability density function and $F(t)$ is the cumulative distribution function of the task completion time $T$. The hazard rate at time $t$ can be interpreted as the potential to complete the task, irrespective of what may have happened before. If the hazard rate is monotonically decreasing, the highest completion potential is at time zero and a restart always helps. These distributions (functions) are called decreasing hazard rate (DHR) functions. This is the case for certain heavy-tailed distributions, for example, the Pareto distribution. In contrast, if the hazard rate monotonically increases, the highest completion potential is at infinity and restart never helps. The hazard rate for the exponential distribution is a constant and any restart time is therefore equally effective as is not restarting at all.
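The contrast between a decreasing and a constant hazard rate can be checked numerically. The sketch below evaluates $h(t) = f(t)/(1 - F(t))$ for a Pareto and an exponential distribution; the parameter values (shape 1.5 and scale 1.0 for the Pareto, rate 0.5 for the exponential) are arbitrary choices for illustration.

```python
import math


def hazard(pdf, cdf, t):
    """Hazard rate h(t) = f(t) / (1 - F(t))."""
    return pdf(t) / (1.0 - cdf(t))


# Pareto distribution with shape a and scale xm (defined for t >= xm).
a, xm = 1.5, 1.0


def pareto_pdf(t):
    return a * xm ** a / t ** (a + 1)


def pareto_cdf(t):
    return 1.0 - (xm / t) ** a


# Exponential distribution with rate lam.
lam = 0.5


def exp_pdf(t):
    return lam * math.exp(-lam * t)


def exp_cdf(t):
    return 1.0 - math.exp(-lam * t)


times = [1.0, 2.0, 4.0, 8.0]
pareto_h = [hazard(pareto_pdf, pareto_cdf, t) for t in times]  # equals a / t, decreasing
exp_h = [hazard(exp_pdf, exp_cdf, t) for t in times]           # equals lam, constant
```

For the Pareto case the hazard rate simplifies to $a/t$, so the completion potential shrinks the longer the task has been running, which is exactly the situation in which a restart pays off.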

5.2.2. Problem Definition, Model, and Analysis

We give a short analysis of the optimal application selection, and of the substitution should one of the applications fail to perform as expected. Let us assume there are two functionally equivalent applications, AP1 and AP2. These applications are deployed on their associated cloud infrastructures and we must choose between them. The choice is made using an optimization criterion, which in our case is the highest expected reward for the application operator. When an application completes execution within a given deadline $\delta$, the provider is rewarded with $R$ (money units); otherwise, the provider pays a penalty $V$.

In general, for an application $AP_i$ with execution cost $c_i$ and completion time $X_i$ we have:

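$$\begin{aligned}
\mathbb{E}[W_i] &= -c_i + R \cdot \mathbb{P}(X_i \le \delta) - V \cdot \mathbb{P}(X_i > \delta) \\
&= -c_i + R \cdot F_i(\delta) - V \cdot \bar{F}_i(\delta) \\
&= R - c_i - (R + V)\,\bar{F}_i(\delta)
\end{aligned}$$

where $F_i$ is the cumulative distribution function of $X_i$ and $\bar{F}_i(\delta) = 1 - F_i(\delta)$.

We should simply choose the service provider $i$ for which $\mathbb{E}[W_i]$ is maximal,

$$\begin{aligned}
\mathbb{E}[W_i] > \mathbb{E}[W_j]
&\iff R - c_i - (R + V)\,\bar{F}_i(\delta) > R - c_j - (R + V)\,\bar{F}_j(\delta) \\
&\iff -c_i - (R + V)\,\bar{F}_i(\delta) > -c_j - (R + V)\,\bar{F}_j(\delta) \\
&\iff c_i + (R + V)\,\bar{F}_i(\delta) < c_j + (R + V)\,\bar{F}_j(\delta) \\
&\iff \frac{c_i}{R + V} + \bar{F}_i(\delta) < \frac{c_j}{R + V} + \bar{F}_j(\delta)
\end{aligned}$$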
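The selection rule can be sketched as follows. The sketch assumes exponentially distributed completion times purely as a stand-in (the deliverable works with measured distributions), and the costs, rates, reward, penalty and deadline are hypothetical values. It also checks that maximising the expected reward and minimising $c_i/(R+V) + \bar{F}_i(\delta)$ pick the same application.

```python
import math


def exp_cdf(rate):
    """CDF of an exponential completion time (a stand-in for a measured distribution)."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


def expected_reward(cost, reward, penalty, cdf, deadline):
    """E[W_i] = -c_i + R * F_i(delta) - V * (1 - F_i(delta))."""
    p_on_time = cdf(deadline)
    return -cost + reward * p_on_time - penalty * (1.0 - p_on_time)


R, V, delta = 10.0, 5.0, 1.0
# Hypothetical applications: (cost c_i, completion-time CDF F_i).
apps = {
    "AP1": (1.0, exp_cdf(0.5)),  # cheap but slow
    "AP2": (3.0, exp_cdf(2.0)),  # expensive but fast
}

rewards = {name: expected_reward(c, R, V, F, delta) for name, (c, F) in apps.items()}
best_by_reward = max(rewards, key=rewards.get)

# Equivalent criterion: minimise c_i / (R + V) + (1 - F_i(delta)).
criterion = {name: c / (R + V) + (1.0 - F(delta)) for name, (c, F) in apps.items()}
best_by_criterion = min(criterion, key=criterion.get)
```

With these numbers the faster AP2 wins despite its higher cost, because the short deadline makes a late answer from AP1 too likely.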

Application substitution. Once the application is selected, the performance may deteriorate with a deadline approaching, while the jobs submitted have not yet finished. The derived policies may specify that, in such a case, another application is used. Naturally, this comes at additional cost. The question is at which time this substitution should take place. This situation may occur when choices are made between relatively slow and cheap applications and applications that are in general faster but more expensive. This general situation is illustrated in Figure 14; an application $AP_j$ (in this case $j = 2$) is initially selected. The policy specifies whether it is better to wait until it generates a response or to select service $AP_k$, perform a retry at a given moment $\theta_{j \to k} > 0$, and wait for service $AP_k$ to complete (in this case $k = N$). The expected reward without retry is

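$$\mathbb{E}[W_j(\delta)] = -c_j + R \cdot F_j(\delta) - V \cdot (1 - F_j(\delta)) = -c_j - V + (R + V)\,F_j(\delta)$$

The expected reward when a retry is made after $\theta_{j \to k}$ is given as:

$$\mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})] = \underbrace{-c_j}_{\text{term 1}} + \underbrace{R \cdot F_j(\theta_{j \to k})}_{\text{term 2}} + \underbrace{(1 - F_j(\theta_{j \to k}))}_{\text{term 3}} \cdot \underbrace{\{-c_k - V + (R + V)\,F_k(\delta - \theta_{j \to k})\}}_{\text{term 4}}$$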

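Term 1 represents the costs of executing the application $AP_j$. Term 2 represents the reward $R$ that the provider receives when application $AP_j$ responds before the retry moment $\theta_{j \to k}$, which happens with probability $F_j(\theta_{j \to k})$. Term 3 represents the probability that there is a retry at $\theta_{j \to k}$, and Term 4 represents the expected reward when the retry is made at $\theta_{j \to k}$. The retry implies that the service costs $c_k$ are paid, and the remaining time to the deadline is $\delta - \theta_{j \to k}$.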
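The four terms translate directly into code. The following is a minimal sketch in which the completion-time distributions are passed in as CDF functions; the exponential CDFs and all numeric values in the usage lines are hypothetical.

```python
import math


def reward_with_retry(c_j, c_k, R, V, F_j, F_k, delta, theta):
    """Expected reward when AP_j is started first and AP_k is tried at time theta."""
    term1 = -c_j                                     # cost of executing AP_j
    term2 = R * F_j(theta)                           # AP_j answers before the retry moment
    term3 = 1.0 - F_j(theta)                         # probability that a retry is needed
    term4 = -c_k - V + (R + V) * F_k(delta - theta)  # expected reward after the retry
    return term1 + term2 + term3 * term4


def exp_cdf(rate):
    """Exponential CDF, used here only as an illustrative completion-time distribution."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


value = reward_with_retry(c_j=1.0, c_k=3.0, R=10.0, V=5.0,
                          F_j=exp_cdf(0.5), F_k=exp_cdf(2.0),
                          delta=2.0, theta=1.0)
```

Two limiting cases are a useful sanity check: if $AP_j$ has certainly answered by $\theta$, the result reduces to $R - c_j$; if it has certainly not answered, the result is $-c_j$ plus Term 4.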


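Let $\theta^*$ be the optimal substitution time instance when there are two applications to choose from. As $\mathbb{E}[W \mid \theta]$ is continuous in $\theta$, the optimal value $\theta^*$ on the open interval $(0, \delta)$ satisfies

$$\left.\frac{d\,\mathbb{E}[W \mid \theta]}{d\theta}\right|_{\theta = \theta^*} = 0$$

giving

$$\left.\frac{d\,\mathbb{E}[W \mid \theta]}{d\theta}\right|_{\theta = \theta^*} = c_2 \cdot f_1(\theta^*) + (R + V)\left(f_1(\theta^*) \cdot \bar{F}_2(\delta - \theta^*) - \bar{F}_1(\theta^*) \cdot f_2(\delta - \theta^*)\right) = 0$$

with $\bar{F}_i(t) = 1 - F_i(t)$. Therefore, the optimal substitution moment $\theta^*$ can be determined from the following equation:

$$h_1(\theta^*) \cdot \left(1 + \frac{c_2}{(R + V)\,(1 - F_2(\delta - \theta^*))}\right) = h_2(\delta - \theta^*)$$

where

$$h_i(t) = \frac{f_i(t)}{1 - F_i(t)}$$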

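Naturally, solving the previous equations is not practical. Hence, the policy $\pi(\delta_p)$ for a given deadline $\delta_p$ is determined either from Equation (1)

$$\pi(\delta) = \arg\max_{i = 1, 2, \dots, N} \mathbb{E}[W_i(\delta)]$$

or from Equation (2)

$$\pi(\delta) = \arg\max_{\substack{j, k = 1, 2, \dots, N \\ \theta_{j \to k} \in (0, \delta)}} \left\{\mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})]\right\}$$

The former policy, specified by Equation (1), is used for those values of the deadline $\delta$ for which the expected reward without substitution is bigger than the expected reward with it, that is, when

$$\mathbb{E}[W_j(\delta)] > \mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})], \quad \forall j, k,\ \theta_{j \to k} \in (0, \delta)$$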

In such a case the policy represents the indices of the applications resulting in maximum revenue for the given deadline $\delta$. Equation (2) is used for those values of the deadline $\delta$ for which substitutions are useful. It contains the indices of the applications selected for the given deadline $\delta$, the indices of the applications to be used for substitution, and the optimal substitution moment.

To evaluate $\mathbb{E}[W_j(\delta)]$ and $\mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})]$ numerically, a discretization of both the deadline $\delta \in (0, \delta_p)$ and the retry moment $\theta \in (0, \delta)$ is necessary.

The discretization steps are denoted by $\Delta\tau$ for the deadline, and by $\Delta\theta$ for the retry moment. Using an equal number $m$ of discretization steps for both $\delta$ and $\theta$ is a natural and convenient choice. In such a case, the discretization steps are $\Delta\tau = \Delta\theta = \delta_p / m$.

The evaluation grid of $\mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})]$ is illustrated in Figure 15. For each point within such a grid, the value of the function is calculated based on the presented equations. The number of evaluation points is therefore $m \cdot (m + 1)/2$. The grid points where the function $\mathbb{E}[W_{j \to k}(\delta, \theta_{j \to k})]$ reaches a maximum are represented by the symbol ⋆ in this figure. Therefore, these points also specify the optimal retry moments for a given deadline value $\delta$.
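The grid evaluation can be sketched as a double loop over deadline and retry indices. This is an illustration under stated assumptions: exponential completion times stand in for the measured distributions, the grid uses $\theta = b \cdot \Delta\theta$ with $b = 1, \dots, a$ for the deadline $\delta = a \cdot \Delta\tau$ (which yields the $m(m+1)/2$ points mentioned above), and the function name `evaluate_grid` and all numeric values are hypothetical.

```python
import math


def exp_cdf(rate):
    """Exponential CDF, a stand-in for an empirical response-time distribution."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


def reward_with_retry(c_j, c_k, R, V, F_j, F_k, delta, theta):
    """Expected reward with a retry at theta (terms 1 to 4 of the text)."""
    retry_value = -c_k - V + (R + V) * F_k(delta - theta)
    return -c_j + R * F_j(theta) + (1.0 - F_j(theta)) * retry_value


def evaluate_grid(c_j, c_k, R, V, F_j, F_k, delta_max, m):
    """For each discretised deadline, find the retry moment with the highest expected reward."""
    step = delta_max / m          # same step for deadline and retry moment
    best = {}                     # deadline index a -> (best theta, best expected reward)
    n_points = 0
    for a in range(1, m + 1):
        delta = a * step
        for b in range(1, a + 1):
            theta = b * step
            value = reward_with_retry(c_j, c_k, R, V, F_j, F_k, delta, theta)
            n_points += 1
            if a not in best or value > best[a][1]:
                best[a] = (theta, value)
    return best, n_points


best, n_points = evaluate_grid(1.0, 3.0, 10.0, 5.0, exp_cdf(0.5), exp_cdf(2.0),
                               delta_max=4.0, m=8)
```

The dictionary `best` plays the role of the ⋆ markers in the evaluation grid: one optimal retry moment per discretised deadline. A full policy would additionally compare these values with the expected reward without retry.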

Figure 14: Application Substitution

Figure 15: Evaluation Grid for Expected Revenue in Case of Application Substitution

5.3. Frequency of Performance Updates from Applications

We have seen that response time distributions are an example of probabilistic Quality of Service (QoS) guarantees. These guarantees are used to derive optimal selection and substitution policies. Due to the volatility of the cloud environment(s), these distributions are not time-invariant; they change over time. Because of this, previously determined policies may no longer be accurate.

Our approach for realizing the response-time distributions considers these time-varying changes. The current response-time distribution is compared to the one used for the previous policy update. Using well-known statistical tests, we determine if a significant change occurred and thus if the policy must be recalculated. By tracking response times, the actual response-time behavior can be captured in empirical distributions.

We illustrate our approach using Figure 16. After each execution of a request in (1), the empirical distribution is updated at (2). Once an application is selected, other applications may not be used for a while. In that case, we do not receive any information about these applications. These could become attractive when an application is updated or when it is deployed on a more powerful cloud instance. Therefore, in (3), when an application is not used for a certain time, a probe request will be sent at (4) and the corresponding empirical distribution will be updated at (5a). After each calculation of the policies, the current set of empirical response-time distributions will be stored. These are the empirical distributions that were used for policy calculations and form a reference response-time distribution. Calculating the policies for every new sample is expensive and undesired. Therefore, a strategy is used where the policies will be updated when a significant change in one of the applications is detected. For this purpose, the reference distribution is used for detection of response-time distribution changes. In (4) and (5a), the reference distribution and current distribution are retrieved and a statistical test is applied for detecting a change in the response-time distribution. If no change is detected, the policy remains unchanged; otherwise the policy is recalculated. In (6) the lookup table is updated with the current empirical distributions and these distributions are stored as the new reference distribution. By using empirical distributions, we can directly learn and adapt to (temporary) changes in the behavior of applications.
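The comparison between the reference and the current empirical distribution can be sketched with a two-sample Kolmogorov-Smirnov test. The text does not name the statistical test that is used, so the choice of this test is an assumption made for illustration, as are the function names and the sample data. The constant 1.358 is the usual large-sample coefficient for a 5% significance level.

```python
import math


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance both samples past x so that i and j count the values <= x.
        while i < len(ref) and ref[i] <= x:
            i += 1
        while j < len(cur) and cur[j] <= x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d


def distribution_changed(reference, current, c_alpha=1.358):
    """Flag a change when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Hypothetical response-time samples (seconds).
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

recalculate_policy = distribution_changed(reference, shifted)
```

In the closed loop, a `True` result would trigger the policy recalculation and the replacement of the reference distribution; a `False` result leaves the policy unchanged.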

Figure 16: Closed Loop Control Approach

[Figure 16 flowchart labels: (1) Execute policy; Response time realisations; (2) Update distributions; (3) Probe time expired? (Yes/No); (4) Send probe; (5a) Test distribution change; (5b) Update distribution; (6) Derive policy; branches “Significant change detected” and “No significant change detected”.]


5.4. Performance Deviation

Main points:

• There is no “one size fits all” when these algorithms are applied.

• The Engine needs to have a collection of multiple algorithms.

• We need to verify and test these algorithms in actual deployments to identify those that are most suitable.

• There is a typical trade-off between speed of detection and accuracy. Typically, the longer we wait, the better the detection would be.

5.4.1. Background

Monitoring applications/infrastructure in multi-cloud settings implies that measurements arrive in data streams. A data stream necessarily has a temporal dimension, and the underlying process (e.g. application response time) that generates the data stream can change over time [HUL01, WID96]. It is therefore crucial to be able to detect the points in time at which such changes occur, as these may indicate potential failures, performance deterioration or attacks.

The observed statistical processes are typically modelled as a time series. The point in a time series when the statistical properties of an underlying process change is known as a change-point. In practice, these changes may manifest as

• A shift in mean,

• A shift in variance, or

• A change in the distribution of the process.

The change of mean is illustrated in Figure 17. As we can see from this figure, the underlying statistical distribution of the data suddenly changes. Given the statistical randomness in the data, we cannot simply draw a line, say at 9, and declare a change point when it is crossed. Taking such a simple approach would lead to many false alarms. Only when the data is structurally, on average, lower than 10 can we say that a change has occurred.

The change-point detection methods derive from sequential analysis. Both sequential analysis and change-point detection make use of a test statistic (some function of the observations) and issue an alarm when this test statistic exceeds a certain predefined threshold. The goal is to choose the threshold in such a way that the detection delay is minimized for a given probability of a false alarm (called type I error in statistics). The main difference is that change-point detection tests for a change in the underlying distribution at a point in the sequence, whereas sequential analysis tests the whole sequence. Change-point detection methods are more suitable for the CYCLONE project.

There are many algorithms that could be used for change-point detection. The efforts could be divided into two broad groups, namely

• The offline setting, where inference regarding the detection of a change occurs retrospectively, after all relevant data have been received, and

• The online setting, which features methods in which analysis is performed sequentially: as every new observation is received, the detection method is applied to locate possible change-points in previous observations.

Change-point detection methods can be further categorized into two classes:

• Parametric: incorporates distributional knowledge of the data into the detection scheme, and

• Non-parametric: makes no such distributional assumptions regarding the data.

Figure 17: Occurrence of Change of Mean in Time Series

5.4.2. Problem Description

We assume that the data are observed sequentially; every time step adds one more observation. Given the set of observations up to the current time, we want to know whether a change-point has occurred. In general, change-point detection methods monitor some test statistic, which is based on the observations, and issue an alarm if this test statistic exceeds a certain threshold, such that the calculated probability of not detecting a change-point is kept below a certain predefined value $\alpha$, for example $\alpha = 5\%$. These methods perform a hypothesis test for each time step. We define this hypothesis test below.

More formally, consider a sequence of observations, $X = (X_1, X_2, X_3, \dots)$, which may contain a change-point. In probabilistic terms, the procedure to identify the change-point is:

Suppose there is a $k > 0$ such that the $X_i$ are independent identically distributed (i.i.d.) realizations of a random variable with probability density function $f$ for $i = 1, \dots, k - 1$, while the $X_i$ are i.i.d. with a different density $g$ for $i \ge k$. In this case, we call $k$ a change-point.

At a point in time $n = 1, 2, \dots$, we check whether a change-point has occurred at some time $k \le n$ by evaluating $X^{(n)} := (X_1, \dots, X_n)$; if not, we continue by evaluating $X^{(n+1)} := (X_1, \dots, X_{n+1})$, etc. In terms of hypothesis testing, at time $n$ we want to decide between two hypotheses:

• Under the null hypothesis ($H_0$) the $X_i$ ($i = 1, \dots, n$) are i.i.d. realizations of a random variable with density $f$.

• Under the alternative hypothesis ($H_1$) there is a $1 \le k \le n$ such that up to $k - 1$ the observations are i.i.d. samples from a distribution with density $f$, while from observation $k$ on they are i.i.d. with a different density $g$.

Therefore, under the null hypothesis there has not been a change-point, while under the alternative hypothesis the process changes at some time $k$.

[Figure 17 annotations: Mean = 10; Mean = 8]


5.5. Evaluating Change-point Detection Methods

Typical metrics to evaluate a hypothesis test consider the probability of so-called type I and type II errors; the performance of the algorithm at time $n$ can be quantified by these error probabilities. They are defined as follows:

• A type I error (false positive) occurs if we detect a change-point that has not occurred. The type I error probability is therefore the probability of false positives.

• A type II error (false negative) occurs if we miss a change-point. The type II error probability is the probability that we accept $H_0$ over $H_1$ although a change-point has occurred.

Let us now consider a change-point detection procedure for the entire sequence $X$. That is, a procedure in which a stopping rule issues an alarm at time $\tau \ge 1$, defined as the first time that we decide to reject $H_0$. On the one hand, $\tau$ should occur soon after the change-point $k$; on the other hand, the rate of false alarms should be low. Mathematically, this can be formulated as keeping the distribution of $\tau - k$ stochastically small, given that the change-point takes place at $k$ (i.e., under $H_1(k)$), whereas the distribution of $\tau$ should be stochastically large in case there is no change-point (i.e., under $H_0$).

We will now describe just one of the non-parametric change-point methods used to detect a change in mean. The details of the algorithm and a rigorous analysis are given in [BRO93].

5.5.1. The Algorithm of Brodsky and Darkhovsky

This algorithm makes no a priori assumption that the data follow a particular distribution and is used to identify a change in mean of the underlying time series.

At time $n$ this algorithm considers the observations $X_{n-N+1}, X_{n-N+2}, \dots, X_n$, where $N$ is the window size. To check whether a change in mean has occurred at time $n - N + k + 1$, the average over $X_{n-N+1}, \dots, X_{n-N+k}$ is compared with the average over $X_{n-N+k+1}, \dots, X_n$. In other words, the window (of size $N$) is split in two (unequal) parts, one of size $k$ and the other one of size $N - k$, and the respective averages are calculated. When there is no change-point within the window, the difference tends to be close to zero. An alarm is raised when there is a $k$ for which the difference exceeds some predefined threshold $c > 0$. Note that $c$ should be smaller than the mean change, as otherwise issuing an alarm remains rare, even if there is a change-point. For $k$ close to 1 or $N - 1$, one of the averages contains too few values, and the algorithm may produce errors.

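To prevent this, a parameter $\gamma \in (0, \tfrac{1}{2})$ is chosen and only a subset of the values of $k$ is considered, namely $k \in \{\lceil \gamma N \rceil, \dots, \lfloor (1 - \gamma) N \rfloor\}$. The algorithm can be summarized as follows:

Fix the window width $N$, $\gamma \in (0, \tfrac{1}{2})$ and threshold $c > 0$. Let:

$$U_n(k, N) = \frac{1}{k} \sum_{i=n-N+1}^{n-N+k} X_i - \frac{1}{N-k} \sum_{i=n-N+k+1}^{n} X_i$$

The algorithm raises an alarm when the test statistic

$$U_n(N) = \max_{k \in \{\lceil \gamma N \rceil, \dots, \lfloor (1 - \gamma) N \rfloor\}} U_n(k, N)$$

exceeds the threshold $c$.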
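A minimal sliding-window sketch of this procedure is given below. It follows the signed difference as written above, so it reacts to a drop in the mean; taking the absolute value of the difference to react to shifts in either direction would be a possible variation and is not shown. The rounding of $\gamma N$ and $(1 - \gamma)N$, the function names and the test series are assumptions made for illustration.

```python
import math


def bd_statistic(window, gamma):
    """Return (max_k U(k), argmax k) with U(k) = mean(first k values) - mean(last N - k values)."""
    n_win = len(window)
    k_lo = max(1, math.ceil(gamma * n_win))
    k_hi = min(n_win - 1, math.floor((1.0 - gamma) * n_win))
    prefix = [0.0]                       # prefix[k] = sum of the first k values
    for x in window:
        prefix.append(prefix[-1] + x)
    best_u, best_k = -math.inf, None
    for k in range(k_lo, k_hi + 1):
        u = prefix[k] / k - (prefix[n_win] - prefix[k]) / (n_win - k)
        if u > best_u:
            best_u, best_k = u, k
    return best_u, best_k


def detect_change(series, window_size, gamma, threshold):
    """Return (alarm time n, estimated change time n - N + k + 1), 1-based, or None."""
    for n in range(window_size, len(series) + 1):
        window = series[n - window_size:n]   # observations X_{n-N+1}, ..., X_n
        u, k = bd_statistic(window, gamma)
        if u > threshold:
            return n, n - window_size + k + 1
    return None


# Hypothetical noise-free series whose mean drops from 2.0 to 1.0 after 30 observations.
series = [2.0] * 30 + [1.0] * 30
alarm = detect_change(series, window_size=20, gamma=0.25, threshold=0.5)
```

On this series the alarm is raised three observations after the shift. The estimated change time lags behind the true one because $\gamma$ keeps the split point away from the edges of the window.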

We illustrate the algorithm of Brodsky and Darkhovsky in Figure 18. In this example, the algorithm is used to detect a change in mean from $\mu_1$ to $\mu_2$ at time instance $k = 51$. The settings are $N = 40$, $\gamma = 0.1$ and threshold $c = 0.5$. A change-point at $k^* = 50$ is detected at time $n^* = 53$. The sample averages are $m_1 = \frac{1}{36} \sum_{i=14}^{49} X_i = 1.65$ and $m_2 = \frac{1}{4} \sum_{i=50}^{53} X_i = 0.98$; their difference is 0.67, which is larger than the threshold value.

Figure 18: An Example of Brodsky-Darkhovsky Algorithm

5.5.1.1. Window Size

We use the concept of a window to divide up a continuous stream of data into batches for processing. For example, if we monitor the average response time of a system and receive a data point every 10 minutes, using a window size of one hour means that at the end of each hour we would calculate the average (mean) value of the last hour’s data (6 data points) and compute the anomalousness of that average value compared to previous hours.

The window size has two purposes: it dictates over what time span to look for anomalous features in the data, and it determines how quickly anomalies can be detected. Choosing a shorter window allows anomalies to be detected more quickly, but at the risk of being too sensitive to natural variations or noise in the input data. Choosing too long a window, however, can mean that interesting anomalies are averaged away.

The method requires the user to set three parameters. First, one must decide on the window size, and second, on a parameter $\gamma$. A window of observations is divided into two intervals such that both contain at least a fraction $\gamma$ of the number of observations in the window. Third, a threshold on the size of the mean change must be chosen; this threshold determines when an alarm is raised.

5.5.2. Change in Variance

There is much less literature about the detection of a change in variance than about a change in mean. Under the assumption that the mean does not change, the problem of testing for a change in variance can be converted to that of testing for a change in mean with a simple transformation.

Let us suppose that the initial data is $X_i$ and we define $Y_i = (X_i - \mu)^2$, where $\mu$ is the mean. Then a change in the mean of $Y_i$ corresponds to a change of variance in $X_i$, and any given algorithm for change of mean could be applied after this simple transformation. This results from the equation $\mathbb{E}[Y_i] = \mathbb{E}[(X_i - \mu)^2] = \mathrm{Var}(X_i)$.
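The transformation is a one-liner. In the sketch below the series is hypothetical: its mean stays at zero while its spread triples, so the transformed series shows a clear shift in mean that any change-of-mean detector can pick up.

```python
def variance_proxy(samples, mean):
    """Map X_i to Y_i = (X_i - mean)^2 so that a variance change becomes a mean change."""
    return [(x - mean) ** 2 for x in samples]


# Hypothetical zero-mean series whose amplitude grows from 1 to 3 halfway through.
x = [1.0, -1.0] * 10 + [3.0, -3.0] * 10
y = variance_proxy(x, mean=0.0)

mean_before = sum(y[:20]) / 20   # variance of the first regime
mean_after = sum(y[20:]) / 20    # variance of the second regime
```

The mean of the transformed series moves from 1.0 to 9.0 while the mean of the original series does not move at all.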

5.5.3. Multidimensional Change-point Detection

Performance monitoring (e.g. networks and applications) often involves multidimensionality (more than one sensor) with dependence between sensors and time. Many of the well-known anomaly detection methods, however, assume one-dimensional, independent input. These assumptions are of course too unrealistic in many cases, and straightforward application to more complex problems frequently leads to erroneous results. Dependence is thus a common characteristic of multidimensional data (measurements).

There exist several generic models known to have good capabilities to model data originating from multidimensional, dependent measurements, especially multidimensional autoregressive moving average (ARMA) time series models, which are popular in econometrics research. Existing literature on machine learning techniques shows that violated model assumptions do not necessarily obviate the efficacy of a given algorithm. As an example, we present a parametric method that makes assumptions about the underlying data. We focus on multidimensional data that exhibit “abrupt” change-points and can include multiple change points over time.

5.5.3.1. Likelihood Ratio Test

In Galeano and Peña’s work on detecting covariance changes in multivariate data, they proposed two methods for calculating test statistics from which change points could be identified [GAL07]. These methods model the given data as a vector autoregressive integrated moving average (vARIMA) process, extract the errors (or innovations) from this data, and operate on this error data.

The first such statistic, on which we focus here, uses a likelihood-ratio test (LRT) to compare two hypotheses: the null hypothesis $H_0$ that the covariance of this error data is best characterized by a single covariance matrix $\Sigma$, versus the alternative hypothesis $H_1$ that, at some time point $h$, the data is best characterized by two separate covariance matrices, $\Sigma_1$ before $h$ and $\Sigma_2$ after $h$. The logarithm of a modified form of the likelihood ratio of the two hypotheses then generates a test statistic $LR_h$ that existing literature shows is governed by a chi-squared distribution with degrees of freedom proportional to the dimensionality $k$ of the data. From simulations of this distribution, we can generate a critical value, given some $\alpha$, against which to compare this test statistic to determine whether a change point exists at some time $h$.

5.5.3.2. Algorithm

Given some time-series data $y_t$ and confidence $\alpha$, we use the algorithm described in Figure 19 to identify points of change in covariance.

To implement LRT, we used Python and Scikit’s statsmodels package for fitting data to VAR() models. One should note this restriction to VAR() models is a result of an existing constraint in the statsmodels package.

We also implemented a version of the LRT algorithm that does not rely on calculating the $W$ transformation matrix. Rather than evaluating $W$, we leveraged statsmodels and its maximum likelihood estimation to fit the data to two new VAR() models for each regime. The above algorithm performs better than this secondary implementation because it obviates the need for separate rounds of maximum likelihood estimation for each level of recursion.
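The core of the likelihood-ratio scan can be illustrated for the one-dimensional case ($k = 1$), where the covariance matrices reduce to scalar variances and the determinants in Figure 19 are simply the mean squared residuals. This is a simplified sketch, not the implementation described above: it assumes zero-mean residuals are already available, it omits the model fitting, the simulated critical value and the recursion, and the names `lr_scan` and `min_points` as well as the test data are hypothetical.

```python
import math


def lr_scan(residuals, min_points):
    """Scan candidate change points h and return (h_max, LR[h_max]).

    Scalar case of LR[h] = n * ln(S / (S1**v * S2**(1 - v))) with v = h / n,
    where S, S1 and S2 are the mean squared residuals over the whole series,
    over the first h residuals and over the remaining n - h residuals.
    """
    n = len(residuals)
    prefix = [0.0]                       # prefix[h] = sum of the first h squared residuals
    for e in residuals:
        prefix.append(prefix[-1] + e * e)
    s = prefix[n] / n
    best_h, best_lr = None, -math.inf
    for h in range(min_points, n - min_points):
        s1 = prefix[h] / h
        s2 = (prefix[n] - prefix[h]) / (n - h)
        lr = n * math.log(s) - h * math.log(s1) - (n - h) * math.log(s2)
        if lr > best_lr:
            best_h, best_lr = h, lr
    return best_h, best_lr


# Hypothetical residuals whose variance jumps from 1 to 16 after 50 observations.
residuals = [1.0, -1.0] * 25 + [4.0, -4.0] * 25
h_max, lr_max = lr_scan(residuals, min_points=5)
```

In the full algorithm the maximum `lr_max` would be compared with the simulated critical value, and the scan would be repeated on the two segments on either side of `h_max` if that value is exceeded.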

5.5.4. General Conclusions

As the above discussion shows, there are many algorithms that could be used for the detection of changes in the measurements of, for example, application performance. This means that several algorithms need to be implemented and deployed within our Analytics Engine component. The verification of the algorithms requires a good knowledge of the assumptions used for these algorithms and an estimation of type I and type II errors, which are the main performance indicators of their applicability.


Figure 19: Algorithm by Galeano and Peña

    Function LRT(y_t, α)                      /* algorithm by Galeano and Peña [GAL07] */
        fit VARIMA(p, d′, q) model to y_t;
        compute residuals e_t;
        k ← dimension(y_t);
        d ← k(p + q + 1) + k(k + 1)/2 + 1;     /* minimum points needed */
        n ← len(y_t);
        df ← k(k + 1)/2;                       /* degrees of freedom for χ² */
        C ← simulateChiSquareMax(df, α);       /* obtain the critical value */
        LR ← zeros(n);
        S ← (1/n) Σ_{i=1..n} e_i · e_i′;
        for h ∈ [d, n − d − 1] do
            v ← h/n;
            S1 ← (1/h) Σ_{i=1..h} e_i · e_i′;
            S2 ← (1/(n − h)) Σ_{i=h+1..n} e_i · e_i′;
            LR[h] ← n · ln( |S| / (|S1|^v · |S2|^(1−v)) );
        end
        h_max ← argmax_h(LR);
        Λ_max ← LR[h_max];
        changePoints ← [ ];
        if Λ_max > C then
            changePoints += h_max;
            W ← transformation governing new data regime (see [GAL07]);
            changePoints += apply LRT to e_t[0 : h_max];
            changePoints += apply LRT to W · e_t[h_max + 1 : n];
        end
        return changePoints


6. SlipStream Comparative Analysis

SlipStream provides the bulk of the application management functionality within the CYCLONE toolkit. When developing the CYCLONE project, SlipStream was chosen because of its unique feature set in the multi-cloud domain and SixSq was brought into the project because of this. However, this sector has evolved rapidly and there are now other products and services that offer similar functionality. It is important to understand how SlipStream compares to those other products and services to understand how the CYCLONE toolkit can be exploited in the future. This section provides a comparative analysis of SlipStream (and Nuvla, which is a “Software as a Service” offering of SlipStream) to other products and services in this sector.

SlipStream, by SixSq, forms the technology foundation of its portfolio of products and services:

1. Nuvla: SlipStream as a Service, managed by SixSq to offer brokerage and multi-cloud application automation

2. SlipStream on premise: software product that enables multi-cloud and hybrid cloud strategies for customers

3. NuvlaBox: instant private cloud, delivered as a fan-less appliance, connected to the SlipStream ecosystem for remote management

Nuvla is used in a growing range of industries and fields of application. Similarly, NuvlaBox is targeting “Internet of Things” (IoT) and Smart City markets. SlipStream on premise is generic and can apply to most application domains.

This short comparative study focuses on SlipStream, in the context of large-scale application management in multi-cloud and hybrid environments. When comparing the SlipStream ecosystem with competition and alternatives on the market, we must carefully compare like with like, even in this narrowed context. Products or services can sometimes seem to compete with SlipStream-based products and services where, in reality, they can be companions, with little overlap and great complementarity.

SlipStream provides a wide range of features that span many different markets. We have segmented the analysis into the following areas, showing where SlipStream competes with or complements other solutions:

1. Cloud management solutions

2. Configuration management solutions

3. Tools and libraries

4. Container management solutions

5. PaaS

6. Brokerage solutions or services

We chose these categories since they represent the categories most cloud-related products and services use when communicating their value proposition. As you will see throughout this analysis, there are several categories with overlapping features and benefits, or even alternative ways of delivering a given benefit.

For each of these categories, we highlight the support for the open source movement. This is key as customers, both in industry and academia, increasingly value the open source model, because it creates a different dynamic between customers and providers. Indeed, it allows customers to influence the evolution of products and perhaps even contribute to them.


The following figure compares the products and services mentioned in this analysis based on how application-specific or infrastructure-specific they are. As SixSq values vendor-neutral solutions that apply to a broad range of applications, Nuvla and SlipStream sit in the upper-right hand corner of the diagram.

6.1. Cloud Management Solutions

Cloud management solutions are numerous and varied. Several of these solutions attempt, like SlipStream, to provide a multi-cloud or hybrid cloud solution, with a wide coverage of existing IaaS providers. These management solutions typically sit on top of IaaS APIs.

But even within this category, we can distinguish sub-categories:

1. Cloud Resources Management: management of cloud resources (e.g. compute, storage, network)
2. Application management, in the cloud: application deployment, sometimes referred to as application orchestration

Cloud Resources Management can be delivered as a service, as with SlipStream or Nuvla, or as IaaS solutions for on-premise installations:

1. OpenStack
2. CloudStack
3. OpenNebula
4. vCloud Director

[Figure: positioning of products and services along two axes, application agnostic vs application specific and multi-cloud/hybrid-cloud vs vendor/solution specific. Products shown: Nuvla, SlipStream, Chef, Puppet, Salt, Ansible, AWS CloudFormation, OpenStack Heat, Cloudify, OpenShift, Mesos, Kubernetes, Rancher, Docker Engine / Swarm / Compose, Google App Engine, Heroku.]

Table 3: IaaS vendors (public and private) vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Interoperable?    Comment
SlipStream/Nuvla    •    •    Apache 2.0; supports most IaaS solutions and services
OpenStack    •
CloudStack    •
OpenNebula    •
vCloud Director    Commercial product by VMware

These IaaS solutions offer a range of features to manage cloud resources. Several of these solutions also support or include related projects or products that provide higher-level abstractions, for example towards the PaaS layer (see Section 6.5 for details).

There is also a growing number of IaaS services delivered by providers:

1. Amazon Web Services
2. Microsoft Azure
3. Google Compute
4. IBM SoftLayer
5. Digital Ocean
6. Exoscale
7. T-Systems' Open Telekom Cloud
8. Cloud28+ providers

While some providers offer multi-region or multi-zone services, these services are provided by a single vendor, with the risk of lock-in for customers. There are also attempts to provide both client-side and server-side abstraction, with DeltaCloud as an example of a server-side solution. See Section 6.3 for further details on client-side multi-cloud support.

Table 4: Cloud Resources Management vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Multi-cloud/hybrid?    Available as a Service?    Comment
SlipStream/Nuvla    •    •    •    Open source: Apache 2.0
Amazon Web Services    •    Offers several regions/zones
Microsoft Azure    •    Offers several regions/zones
Google Compute    •    Offers several regions/zones
IBM SoftLayer    •    Offers several regions/zones
Digital Ocean    •    Offers several regions/zones
Exoscale    •    •    Based on CloudStack
T-Systems' Open Telekom Cloud    •    Based on OpenStack
Cloud28+ providers    •    Based, mostly, on OpenStack
DeltaCloud    •    •    Apache Software Foundation project

In the “Application management, in the cloud” range of products and services, recent years have seen significant movement, with several start-ups being acquired by large system integrators or operators. For example, ServiceMesh was acquired by CSC (October 2013), Enstratius by Dell (May 2013), CliQr by Cisco, and Gravitant by IBM. Once acquired, these solutions are frequently either blended into existing portfolios or integrated with existing offerings. It is therefore often difficult to disentangle these products for comparison.

On the service side, ComputeNext launched a few years ago a marketplace for single virtual machine (VM) applications available on several IaaS platforms. BitNami supports a wealth of standard apps, which typically deploy on a single VM. But these solutions do not support custom-made deployment recipes, which are key for many customers who develop their own software or closely combine software components, services and micro-services.

Most ‘fabric management’ systems also offer their own ‘forge’ to facilitate the deployment of common, often open source, software (see Section 6.2 for details).

Finally, several projects also attempt to provide application management. These can come from providers, such as Amazon with CloudFormation, or from related open source projects, such as OpenStack Heat.

Table 5: Application management, in the cloud vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Support custom apps?    Hybrid cloud support?    Comment
SlipStream/Nuvla    •    •    •    Apache 2.0
ServiceMesh    •    •    Sold to CSC
Enstratius    •    Sold to Dell (and since abandoned?)
CliQr    •    Sold to Cisco
Gravitant    •    Sold to IBM
ComputeNext    Now focusing on on-premise
BitNami    Public Apps
OpenStack Heat    •    •    OpenStack specific and command-line tool


6.1.1. Competitor or companion?

Are the products and solutions identified in the previous section competitors or companions to SlipStream and Nuvla? It depends.

SlipStream, and therefore Nuvla, does not provide the IaaS layer. On the contrary, it leverages the IaaS layer via an architectural layer called connectors. SlipStream/Nuvla therefore delivers its value on top of IaaS APIs. This connector architecture is the reason why SlipStream is truly multi-cloud and supports rich hybrid cloud scenarios (e.g. public-public, public-private, private-private).
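The connector idea can be illustrated with a minimal Python sketch. It is not SlipStream's actual connector API: the class names, the `start_vm` method and the cloud labels are hypothetical and only show how one adapter per IaaS lets a single deployment span public and private clouds.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical per-cloud adapter; not SlipStream's real connector API."""

    @abstractmethod
    def start_vm(self, name):
        """Start a VM and return a provider-specific identifier."""


class PublicCloudConnector(Connector):
    def start_vm(self, name):
        # A real connector would call the provider's IaaS API here.
        return f"public:{name}"


class PrivateCloudConnector(Connector):
    def start_vm(self, name):
        # A real connector would call the on-premise IaaS API here.
        return f"private:{name}"


def deploy(placement, connectors):
    """Start each component on the cloud it is assigned to."""
    return {
        component: connectors[cloud].start_vm(component)
        for component, cloud in placement.items()
    }


# Hybrid (public-private) scenario: web tier on a public cloud, database on premise.
connectors = {"public-a": PublicCloudConnector(), "private-b": PrivateCloudConnector()}
placement = {"web": "public-a", "db": "private-b"}
result = deploy(placement, connectors)
print(result)
```

Because the deployment logic only sees the common interface, supporting an additional cloud in this sketch amounts to adding one more connector class.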

Regarding application management in the cloud, SlipStream mostly competes with the solutions listed. Simplicity, multi-cloud, hybrid-cloud and open source are the key differentiators in this segment in favor of SlipStream and Nuvla.

6.2. Configuration Management Solutions

Configuration management or ‘fabric management’ solutions help to configure servers and, to a certain extent, monitor servers to maintain or evolve the specified configuration. They work by actively altering the server state by, for example, updating software packages or running specific scripts.

The main players in this field are:

1. Chef
2. Puppet
3. Ansible
4. Salt

These solutions have historically targeted system administrators, allowing them to manage large operational systems with fewer manual interventions. More recently they have added functionality to work better with cloud-based infrastructures. This applies both to the administration of cloud infrastructures (e.g. managing the IaaS layer on physical servers) and to applications running in the cloud.

Their cloud management features are specific to their own tooling. Their support for multi-cloud is limited and they offer no interoperability with other fabric management systems. They also require significant investment in time and effort, as users must understand and master new domain-specific languages (DSLs) and concepts.

Some vendors of the solutions listed above have tried to deliver an all-inclusive solution, adding for example ancillary projects to create high-level portals. However, since these solutions primarily target system administrators and developers, who often prefer command-line interfaces, they generally offer poorer support for a rich web user interface experience than the SlipStream App Store and Dashboard.

The communities behind these solutions also maintain rich and interesting repositories of recipes (also known as cookbooks, playbooks or manifests). While the quality and stability of these recipes can vary significantly, they can speed up software installation and configuration.

Support for scaling and auto-scaling in these solutions, where available, is generally complex. In comparison, SlipStream’s scaling and auto-scaling solution is simple and multi-cloud.

Support for an App Store and self-provisioning services, which facilitate application deployment for a wider range of users, requires extra software and services, whereas it is a core feature of SlipStream.


Table 6: Configuration management solutions vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Language agnostic?    Web UI and high-level functions?    Interoperable?    Comment
SlipStream/Nuvla    •    •    •    •
Chef    •
Puppet    •
Ansible    •
Salt    •

6.2.1. Competitor or companion?

We see fabric management solutions mostly as companions. If and once the customer has invested time and effort in learning the tool and its language, they shine in their ability to manage the installation and configuration of services. They can also contribute to the maintenance and update of services, as software packages are released.

They also require significant investment in time and effort, as users must understand and master new domain-specific languages (DSLs) and concepts. We argue that SlipStream provides similar benefits, in a simpler manner, with a much lower entry barrier. However, configuration management solutions can easily be integrated in a SlipStream recipe. This can also be a path for integrating diverse configuration management solutions, allowing customers to harness existing investments.

Finally, as more system administrators and DevOps teams opt for architectures with mostly stateless services and explicit, isolated state management, it is often easier to re-deploy services on new resources (i.e. VMs or containers) than to update or migrate existing services. This can be performed more simply and flexibly using SlipStream, on any cloud.

We therefore recommend that users and customers with an investment in these solutions simply use their recipes inside SlipStream application component definitions. See related best practices on how to achieve this.

6.3. Tools and Libraries

To facilitate the management of Infrastructure as a Service (IaaS), several client tools and libraries are available. Currently, the most popular are:

1. Terraform: open source project by HashiCorp; a command-line tool that aims to simplify the creation and management of IaaS resources
2. Libcloud: Apache Software Foundation library, written in Python, providing abstractions for managing IaaS resources
3. Jclouds: Apache Software Foundation library, written in Java, providing abstractions for managing IaaS resources


Jclouds and libcloud are both libraries, allowing Java and Python developers respectively to interface with IaaS APIs to manage virtual resources. Terraform is more of a command-line tool, offering an abstract, configuration-file-based system to define and manage IaaS resources.

SlipStream also ships an open source command-line client. The latest version includes an abstraction of the basic cloud resource requirements (e.g. number of cores, memory, root disk size). This means that users do not have to specify ‘t-shirt’ sizes (i.e. predefined bundles of resource sizes, such as cores, memory and disk) for the virtual machines. Having to provide these breaks interoperability, since users must track and manage the different, often incompatible, t-shirt size definitions. SlipStream allows users to specify the underlying resource sizes directly. It then searches for and matches the right offer on each available cloud, using its service catalogue. This ensures uniform and interoperable provisioning of cloud resources.
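The matching of abstract requirements to provider-specific ‘t-shirt’ sizes can be sketched as follows. This is an illustrative Python example under stated assumptions: the catalogue contents, the cloud and instance names and the `match_size` function are hypothetical and do not reflect the SlipStream service catalogue schema or client interface.

```python
# Hypothetical per-cloud instance catalogues: (name, cores, ram_gb, disk_gb).
CATALOGUE = {
    "cloud-a": [("small", 1, 2, 20), ("medium", 2, 4, 40), ("large", 4, 8, 80)],
    "cloud-b": [("s1", 2, 2, 50), ("s2", 4, 16, 100)],
}


def match_size(cloud, cores, ram_gb, disk_gb):
    """Return the smallest instance type on `cloud` meeting the requirements, or None."""
    candidates = [
        t for t in CATALOGUE[cloud]
        if t[1] >= cores and t[2] >= ram_gb and t[3] >= disk_gb
    ]
    # "Smallest" here means fewest cores, then least memory, then least disk.
    return min(candidates, key=lambda t: t[1:], default=None)


# The user states requirements once; the t-shirt size is resolved per cloud.
requirements = {"cores": 2, "ram_gb": 4, "disk_gb": 30}
matches = {cloud: match_size(cloud, **requirements) for cloud in CATALOGUE}
print(matches)
```

The same requirements resolve to differently named instance types on each cloud, which is the bookkeeping the abstraction removes from the user.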

Table 7: Tools and libraries vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Interoperable?    Abstract ‘t-shirt’ size?    Support filters?    Command-line?    Support native features?    Comment
SlipStream/Nuvla command-line client    •    •    •    •    •
Terraform    •    •    •    •
libcloud    •    •    Apache Software Foundation project
Jclouds    •    •    Apache Software Foundation project

6.3.1. Competitor or companion?

SlipStream interfaces with the IaaS via a system of connectors. Several of the connectors are implemented using libcloud. Earlier versions of SlipStream used Jclouds, but the Jclouds project grew in complexity and the SixSq team found libcloud simpler and more effective. However, libcloud is a developer tool, which requires that developers know some details of each cloud that they want to use.

These tools therefore compete, but can also coexist, as developers and system administrators learn and evolve their tooling.

One key difference is that the SlipStream client provides a higher-level and simpler abstraction of IaaS. Terraform, for example, exposes more of the intricacies of each IaaS provider's native features, at the cost of portability.

Finally, the SlipStream client is truly interoperable: application deployments, simple and complex, can be deployed automatically without requiring any prior knowledge of the IaaS provider. This also includes default matching of the cheapest offer and cloud, and advanced placement policies.

6.4. Container Management Solutions

Currently, the container world is in turmoil. Beyond the hype and the promises, there is a realization that the changes required to embrace the container revolution are intrusive to the way IT is operated and applications are crafted. Adding to that the uncertainty between the champions fighting to deliver container management solutions, the market is understandably skeptical and worried.


Some of the confusion comes from the fact that containers per se are not new, but their management solutions, and underlying philosophy, are.

One clear benefit of containers themselves is the packaging model they propose. Up to now, software developers creating a server had to assemble binary packages (e.g. RPM, DEB), mixed with configuration scripts. This could be organized using fabric management systems (see Section 6.2 for details). With containers, software developers now have another option to create custom servers. However, creating a container also requires acquiring new knowledge and working within the constraints it imposes.

Currently, container management solutions include:

1. Mesos
2. Kubernetes
3. Docker Engine / Docker Swarm / Docker Compose
4. Rancher
5. Cloud Foundry

Cloud Foundry was initially open sourced by VMware, with start-ups building solutions on top and system integrators investing in the solution (e.g. Pivotal). But with the recent development of container management systems, the Cloud Foundry project is finding itself challenged by newcomers.

Container management solutions, such as Mesos and Kubernetes, offer higher-level management functions. These solutions are themselves complex systems. Their benefit is in providing a framework for managing workloads and containers, at scale, inside a pool of VMs.

Docker, which is the current favorite container solution, is also entering the management chase, with a few fast-evolving projects, such as Swarm, Engine and Compose.

Table 8: Container management solutions vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Support cloud resource auto-scaling?    Support containers?    Simple to use?    Automatic deployment in cloud?    Comment
SlipStream/Nuvla    •    •    •    •    •
Mesos    •    •
Kubernetes    •    •
Docker Engine / Docker Swarm / Docker Compose    •    •
Rancher    •    •    •
Cloud Foundry    •    •

6.4.1. Competitor or companion?

Compared to container management solutions, SlipStream takes a different approach, which is in line with its agnostic philosophy. Using simple scripts, SlipStream packages together binary package installation, as well as configuration and parameterization. It can also leverage containers, if they exist. Therefore, it can take advantage of containers without forcing users to take the red pill. This means users see the benefit of automation earlier, with less cost or disruption to their infrastructure and running applications.


Further, SlipStream can be interfaced with, for example, the queue of a container management solution, to grow and shrink the available cloud resources based on the key performance indicators it provides. While this integration is a roadmap item, the possibility of delivering auto-scaled container management solutions to any cloud is interesting and would nicely complement the container management solutions listed above.
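As the integration is a roadmap item, the following Python sketch is purely illustrative of how a queue-length indicator could drive the size of the VM pool. The function names, the per-worker capacity and the worker bounds are assumptions, not part of SlipStream or of any container management solution.

```python
import math


def desired_workers(queue_length, per_worker_capacity, min_workers=1, max_workers=10):
    """Return how many worker VMs are needed to drain the queue, clamped to bounds."""
    needed = math.ceil(queue_length / per_worker_capacity)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, queue_length, per_worker_capacity):
    """Translate the queue indicator into an (action, number of VMs) pair."""
    target = desired_workers(queue_length, per_worker_capacity)
    if target > current_workers:
        return ("scale-up", target - current_workers)
    if target < current_workers:
        return ("scale-down", current_workers - target)
    return ("none", 0)


# 45 queued tasks, 10 tasks per worker, 2 workers running: 3 more are needed.
print(scaling_action(2, 45, 10))
```

A production policy would normally add hysteresis or a cool-down period so that short queue spikes do not cause repeated scale-up and scale-down cycles.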

Finally, container management solutions target very large-scale container deployments. In this age of assembly of services, mixing SaaS and PaaS offerings with custom software, it is not clear that this race for hyperscale addresses the needs of most organizations and software developers. SlipStream’s agnostic and independent approach is a simpler and safer path: it enables many of the benefits containers offer and simplifies the deployment of these management solutions, thereby reducing complexity and risk.

As the container world remains a fast-moving target, investing too heavily and too deeply in one of these solutions is potentially risky. Instead, SlipStream offers a safer route, while providing access to some of the benefits, as well as potentially providing simpler migration from one container management solution to another, thanks to its ability to automate their deployment.

6.5. PaaS

The Platform as a Service landscape is also very dynamic. With the latest consolidation of cloud management solutions (see Section 6.1 for details) and the advent of container management solutions, the Platform as a Service is being challenged. Here are a few well-known PaaS:

1. OpenShift
2. Cloudify
3. Google App Engine
4. Heroku

According to PaaSfinder³, SlipStream, or more specifically Nuvla (SlipStream as a Service, managed by SixSq), is an IaaS-centric PaaS. This taxonomy might seem a little complex, but it is useful when comparing PaaS solutions. Some PaaS, such as Heroku or Google App Engine, are generic PaaS types, which hide the IaaS layer. This is significant, since it means they do not support hybrid deployment (e.g. you cannot connect your private IaaS to the PaaS). Further, they tend to support specific deployments and application architectures (e.g. request/response architectures, such as web applications). They also tend to provide specific state or database management services (e.g. BigTable), pre-integrated with the overall solution. Finally, they have a pricing model specific to them (e.g. based on the number of request/response pairs or resources deployed), with providers deploying resources to match user application demands as loads vary over time.

These generic PaaS types are well adapted to web applications, where deploying the applications only to the PaaS provider is acceptable to users and where the pricing model is also adequate.

Table 9: PaaS vs SlipStream/Nuvla

Product/solution/vendor    Open Source?    Support on-premise installations?    Application architecture agnostic?    Available as a Service?    Comment
SlipStream/Nuvla    •    •    •    •
OpenShift    •    •    •
Cloudify    •    •    •
Google App Engine    •
Heroku    •

³ https://paasfinder.org

6.5.1. Competitor or companion?

In this category, SlipStream is a competitor. The choice boils down to a trade-off between the constraints and benefits of the generic PaaS and the flexibility, portability and multi-cloud benefits of SlipStream.

6.6. Brokerage Solutions or Services

This last category is less mature than the previous ones, yet offers many benefits. Our aim in brokering is to provide a platform (as a service) where users can self-provision resources and deploy simple and complex applications to a wide range of cloud services. Our definition of cloud brokerage, therefore, does not include consultancy, search or relational services where humans perform the brokering between customers and cloud providers.

While we do negotiate regularly with cloud providers, this is done behind the scenes and transparently to the users, such that they only see the result of these negotiations in our service catalogue (e.g. pricing).

Our vision is to automate this process, so that users are provided with relevant, complete and rich information and can select the right offer (i.e. combination of cloud services provided by a given provider). Selection criteria can include, for example:

1. SLA
2. Pricing
3. Location
4. Certification
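A selection over these criteria can be sketched as a filter followed by a ranking. The Python example below is a minimal illustration: the `Offer` fields, the provider names, the prices and the `select_offers` function are hypothetical and are not taken from the Nuvla service catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    price_per_hour: float       # arbitrary currency unit
    location: str               # country code
    sla_availability: float     # percent, e.g. 99.9
    certifications: frozenset


def select_offers(offers, locations=None, min_sla=0.0, required_certs=frozenset()):
    """Filter offers by policy, then rank the survivors by ascending price."""
    kept = [
        o for o in offers
        if (locations is None or o.location in locations)
        and o.sla_availability >= min_sla
        and required_certs <= o.certifications
    ]
    return sorted(kept, key=lambda o: o.price_per_hour)


offers = [
    Offer("provider-a", 0.10, "CH", 99.9, frozenset({"ISO27001"})),
    Offer("provider-b", 0.08, "US", 99.5, frozenset()),
    Offer("provider-c", 0.12, "DE", 99.95, frozenset({"ISO27001"})),
]
ranked = select_offers(
    offers,
    locations={"CH", "DE"},
    min_sla=99.9,
    required_certs=frozenset({"ISO27001"}),
)
print([o.provider for o in ranked])
```

Without any policy, the cheapest offer wins; with the location, SLA and certification constraints above, the cheapest offer overall is excluded and the ranking applies only to the compliant ones.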

With this definition of brokerage, there is little competition on the market. Several IaaS providers offer the possibility to purchase in bulk (e.g. reserved instances) to benefit from cheaper resources. However, the risk and management of these decisions and transactions lie with the customers. The broker, as we define it, can decide to bulk purchase on behalf of its customers and pass part of the corresponding savings on to them. This means the broker performs the analysis and takes the decision, and the risk, to purchase bulk resources.

The benefit for the customer is guaranteed access to the cheapest cloud resources, with the possibility of even cheaper prices if the broker can purchase in bulk.

In return, the broker takes a minimum brokerage fee, with the possibility of increasing this profit if it can successfully sell bulk-purchased resources to customers, thus creating a win-win situation.
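The economics of a managed bulk purchase can be made concrete with a small worked example. All figures below are invented for illustration, the function name is hypothetical, and the separate minimum brokerage fee is left out for simplicity.

```python
def bulk_purchase_outcome(on_demand_cents, bulk_cents, hours_bought, hours_sold, customer_share):
    """Return (customer price per hour, broker profit), both in cents.

    customer_share is the fraction of the per-hour saving passed on to customers.
    """
    saving = on_demand_cents - bulk_cents
    customer_price = on_demand_cents - saving * customer_share
    # The broker pays for every hour bought but earns only on hours actually sold.
    profit = customer_price * hours_sold - bulk_cents * hours_bought
    return customer_price, profit


# On-demand price 10, bulk price 6, half of the saving passed on to customers.
print(bulk_purchase_outcome(10, 6, 1000, 1000, 0.5))  # every bulk hour is resold
print(bulk_purchase_outcome(10, 6, 1000, 600, 0.5))   # only 600 of 1000 hours resold
```

In the first case both sides gain: the customer pays less than the on-demand price and the broker keeps the rest of the saving. In the second case the unsold hours turn the broker's margin into a loss, which is the risk the broker carries instead of the customer.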

We know of few companies attempting to deliver such a service. For example, DBCE (Deutsche Börse Cloud Exchange) tried something similar, but with a much more basic offer compared to the richness of SlipStream (or Nuvla); it did not succeed and stopped operating in March 2016. ComputeNext also proposed a brokerage service, where users can select applications.

As a summary, the following list includes providers that offer partial brokerage services:

1. Amazon Web Services
2. Exoscale
3. T-Systems
4. DBCE

The following table illustrates some of the benefits and shortcomings of the above services, compared to Nuvla’s brokerage offer:

Table 10: Brokerage solutions or services vs SlipStream/Nuvla

Product/solution/vendor    Based on Open Source?    Support policy placement?    Support multi-cloud?    Support bulk purchase?    Managed bulk purchase?    Comment
SlipStream/Nuvla    •    •    •    •    •
Amazon Web Services    •
Exoscale    •
T-Systems    •    •
DBCE    •

6.6.1. Competitor or companion?

As mentioned above, in the brokerage space as defined in this analysis, Nuvla currently has no clear competitor. Further, its growing support for a wide range of public clouds, and its ability to support private clouds, make Nuvla’s brokerage benefit highly valuable.

Building on the enabling foundations provided by the service catalogue feature of SlipStream, the brokerage benefits will grow even further. Indeed, with roadmap items including data-driven placement, for example in the field of Earth Observation data processing, as well as IaaS provider certifications, the brokerage feature of Nuvla will offer richer attributes to use when choosing the right IaaS provider(s) and offers.


7. Conclusions

This document describes the CYCLONE solutions for dealing with functional and non-functional aspects of cloud computing. The solution relies on SlipStream (with extensions) to handle the provisioning and re-provisioning of resources as the operating conditions for a cloud application change over time. The specific decisions on what actions to take (scale-up, scale-down, migration, etc.) are defined by an application-specific policy. The general algorithms for detecting changes in metrics (used to trigger scaling actions) have been defined. These algorithms must now be brought together with SlipStream to validate the overall solution and to demonstrate the defined use cases.

The implementations of the new features in SlipStream and of the change detection algorithms are well advanced. Reviewing the status of the requirements for SlipStream (see Table 11) shows that most of them are already satisfied; we expect that all of them will be satisfied by the final solution.

The document also reviewed the state of the art for autoscaling, both in the academic and commercial worlds. The comparison of SlipStream with competing products and services has shown that many of them offer comparable features, but only for subsets of SlipStream’s full feature set. SlipStream is still the only product to offer a comprehensive multi-cloud solution for all aspects of application management (including scaling) and brokering.

Table 11: Status of Requirements

1 ✔ Triggering of scaling actions of an application based on application metrics using simple, predefined algorithms (e.g. adding node based on machine load).
2 ✔ Triggering of scaling actions of an application based on application metrics defined by the developer of the application.
3 Ability to publish application-specific benchmarks of cloud providers into the Service Catalog or Open Service Compendium.
4 ✔ Placement based on static characteristics (e.g. geographical location) of a cloud service provider.
5 Placement based on dynamic VM monitoring information from SlipStream itself.
6 ✔ Placement based on external information pushed into the SlipStream Service Catalog or Open Service Compendium.
7 Placement based on the join of all information associated with a given cloud service provider.
8 ✔ Ranking of selected cloud service providers based on predefined algorithms (e.g. price).
9 Ranking based on algorithms provided by the application developer and/or the application operator.
10 Ability to trigger notifications/alerts through SlipStream.
11 ✔ Ability to trigger scaling actions from within the application.
12 Ability to search the Service Catalog and Open Service Compendium manually to see the results from various policies and to ideally then associate those policies with applications.




Glossary

B2B    Business to Business
CSP    Cloud Service Provider
DC    Data Center
E2E    End to End
IaaS    Infrastructure-as-a-Service
IPR    Intellectual Property Rights
IT    Information Technology
MaaS    Metal as a Service
NaaS    Network-as-a-Service
Net-HAL    Network Hardware Abstraction Layer
NFV    Network Function Virtualization
PaaS    Platform-as-a-Service
PC    Project Coordinator
PMB    Project Management Board
PoP    Point of Presence
QoS    Quality of Service
SaaS    Software-as-a-Service
SCI    Smart Core Interworks
SDN    Software Defined Networks
SP    Service Provider
TC    Technical Coordinator
TCTP    Trusted Cloud Transfer Protocol
TMB    Technical Management Board
WP    Work Package
WPL    Work Package Leader

