Monitoring and Maintenance - Interim

Deliverable D4.41

Monitoring and Maintenance - Interim

Editor G. Gardikis (SPH)

Contributors I. Koutras, G. Mavroudis, S. Costicoglou (SPH), G. Dimosthenous, D. Christofi (PTL), M. Di Girolamo (HPE), K. Karras (FINT), G. Xilouris, E. Trouva (NCSRD), M. Arnaboldi (ITALTEL), P. Harsh (ZHAW), E. Markakis (TEIC)

Version 1.0

Date November 30th, 2015

Distribution PUBLIC (PU)

Ref. Ares(2016)560809 - 02/02/2016

T-NOVA | Deliverable D4.41 Monitoring and Maintenance - Interim

© T-NOVA Consortium

Executive Summary

This deliverable is an interim report of the work currently being carried out in Task 4.4 (Monitoring and Maintenance). The task focuses on the implementation and integration of a monitoring framework, able to extract, process and communicate monitoring information from both physical and virtual nodes as well as VNFs at IVM level.

The first step is the consolidation of IVM requirements, as expressed in Deliverable D2.32, in order to derive the specific requirements for the monitoring framework. The latter include: monitoring of all NFVI domains (hypervisor/compute/storage/network) as well as VNF applications; processing and generation of events and alarms; communication of monitoring information as well as events/alarms to the Orchestrator in a scalable manner.

In parallel, a comprehensive survey of cloud and network monitoring tools is performed, in order to identify technologies which can be re-used for VIM monitoring. Special emphasis is put on frameworks which integrate smoothly with OpenStack, in particular OpenStack Telemetry/Ceilometer, Monasca, Gnocchi, Cyclops, Zabbix, Nagios as well as relevant OPNFV projects (Doctor and Prediction). It seems that most of the existing technological enablers for VIM monitoring can only partially address all the aforementioned challenges in a lightweight and resource-efficient manner. Although most of them are indeed open and modular, they are already quite complicated and resource-demanding and therefore further expanding them to cover these needs would require considerable effort and would raise efficiency issues. We thus propose a “clean-slate” approach towards NFV monitoring at VIM level, exploiting only some basic enablers and adding only the required functionalities.

The T-NOVA VIM monitoring framework is introduced as a contribution towards this direction. The framework is built around the VIM Monitoring Manager (VIM MM), which is the key component devoted to monitoring at VIM level. The VIM MM exploits OpenStack and OpenDaylight APIs to retrieve a set of metrics for both physical and virtual nodes, which should be sufficient for most NFV handling requirements. However, in order to gain a more detailed insight on the VNF status and operation, a Monitoring Agent, based on the collectd framework, is also introduced in each VNF VM, collecting a large variety of metrics at frequent intervals.

The VIM MM consists of the following components:

• OpenStack and OpenDaylight connectors, used to periodically poll the two platforms via their monitoring APIs.

• A VNF Application connector, which accepts data periodically dispatched by the VNF application. These metrics are specific to each VNF.

• A time-series database (InfluxDB) for data persistence.

• An alarming/anomaly detection engine, currently under development, which utilises statistical methods based on pre-defined but also dynamic thresholds in order to identify possible anomalies in the NFV service and to produce the corresponding alarms/events to be forwarded to the Orchestrator/VNFM.

• A Graphical User Interface (GUI), based on Grafana, which visualizes the stored metrics and presents them as live, time-series graphs.

• A Northbound API, which communicates selected metrics and events to the Orchestrator and, in turn, to the VNF Manager(s). The provided REST API allows metrics to be communicated in either push or pull mode.



The VIM monitoring framework is integrated, validated, evaluated and released as open-source in the frame of the project.

It is concluded that, with the proposed approach, the goal of delivering an effective, efficient and scalable monitoring solution for the T-NOVA IVM layer is achieved. The solution under development is able to expose to the Orchestrator and to the Marketplace enhanced awareness of the IVM status and resources, while at the same time keeping the communication and signalling overhead at minimum.

The next steps in implementation involve the finalization of the Orchestrator API with the alarming functionality, the integration of OpenDaylight, as well as the anomaly detection part. These advances will be reflected in the final version of this deliverable.



Table of Contents

1. INTRODUCTION
2. REQUIREMENTS OVERVIEW AND CONSOLIDATION
3. TECHNOLOGIES AND FRAMEWORKS FOR NFV MONITORING
   3.1. OpenStack Telemetry/Ceilometer
   3.2. Monasca
   3.3. Gnocchi
   3.4. Cyclops
   3.5. Zabbix
   3.6. Nagios
   3.7. OPNFV Projects
      3.7.1. Doctor
      3.7.2. Prediction
   3.8. OpenDaylight monitoring
   3.9. Other relevant monitoring frameworks
   3.10. Technology selection and justification
4. THE T-NOVA VIM MONITORING FRAMEWORK
   4.1. Architecture and functional entities
   4.2. Monitoring metrics list
      4.2.1. Generic metrics
      4.2.2. VNF-specific metrics
   4.3. VNF Monitoring Agent
   4.4. Collection of VNF-specific metrics
   4.5. Monitoring of FPGA-based VNFs
   4.6. VIM Monitoring Manager architecture and components
      4.6.1. VIM MM Architecture
      4.6.2. Interfaces to cloud and network controllers
      4.6.3. Northbound API to Orchestrator
      4.6.4. Anomaly detection
      4.6.5. Time-series Database
      4.6.6. Graphical user interface
   4.7. Packaging, documentation and open-source release
5. VALIDATION
   5.1. Functional testing
   5.2. Benchmarking
   5.3. Fulfillment of requirements
6. CONCLUSIONS AND FUTURE WORK
7. REFERENCES
8. LIST OF ACRONYMS
9. ANNEX I: SURVEY OF RELEVANT IT/NETWORK MONITORING TOOLS
      9.1.1. IT/Cloud monitoring
      9.1.2. Network Monitoring



1. INTRODUCTION

This deliverable is an interim report of the work currently being carried out in Task 4.4 (Monitoring and Maintenance). Task 4.4 focuses on the implementation and integration of a monitoring framework, able to extract, process and communicate monitoring information from both physical and virtual nodes as well as VNFs at IVM level. In other words, the operational scope of the monitoring framework being developed in Task 4.4 corresponds to the two lower layers of the T-NOVA architecture, namely the NFVI and VIM. The metrics¹ collected, along with alarms/events generated, are in turn communicated to the upper layers (Orchestrator and Marketplace), so that the latter have a comprehensive view of the status of the infrastructure resources as well as the network services running on them.

The present document is structured as follows:

• Chapter 2 overviews and consolidates the T-NOVA system and IVM requirements which directly or indirectly affect the monitoring framework.

• Chapter 3 presents a survey of the most prominent enabling technologies for NFV monitoring, as well as relevant OpenStack, OpenDaylight and OPNFV projects, and presents a justification for the technologies used.

• Chapter 4 presents the architecture and the functional blocks of the T-NOVA VIM monitoring framework.

• Chapter 5 presents the testing and validation of the framework against specific test cases.

• Finally, Chapter 6 concludes the document.

¹ It must be clarified that Task 4.4 focuses on the collection of dynamic metrics, i.e. metrics which change frequently in relation to resource usage. Static information reflecting the status and capabilities of infrastructure, e.g. number of installed compute nodes, processing resources per node etc., is assumed to be handled by Task 3.2 (Infrastructure Repository).



2. REQUIREMENTS OVERVIEW AND CONSOLIDATION

Deliverable D2.32 [D232] has defined and identified architectural concepts and requirements for the IVM (NFVI and VIM) layers. The technical requirements which drive the specification and development of the T-NOVA monitoring framework can be directly derived/inherited from the specific IVM requirements. Table 1 below identifies the IVM requirements that, either directly or indirectly, are associated to IVM monitoring, focusing on NFVI-PoP resources, and describes how each of these is translated to a specific requirement for the monitoring framework.

Table 1. IVM requirements which affect the monitoring framework

| IVM Req. ID | IVM Requirement Name | Requirement for the Monitoring Framework |
|---|---|---|
| VIM.1 | Ability to handle heterogeneous physical resources | The MF must provide a vendor agnostic mechanism for physical resource monitoring. |
| VIM.2 | Ability to provision virtual instances of the infrastructure resources | The MF must be able to report the status of virtualized resources as well as from physical resources in order to assist placement decisions |
| VIM.3 | API Exposure | The MF must provide an interface to the Orchestrator for the communication of monitoring metrics. |
| VIM.6 | Translation of references between logical and physical resource identifiers | The MF must re-use resource identifiers when linking metrics to resources. |
| VIM.8 | Control and Monitoring | The MF must monitor in real time the physical network infrastructure as well as the vNets instantiated on top of it, providing measurements of the metrics relevant to service level assurance. |
| VIM.9 | Scalability | The MF must keep up with dynamic increase of the number of resources to be monitored |
| VIM.18 | Query API and Monitoring | The MF must provide an API for communicating metrics (in either push or pull mode) |
| VIM.21 | Virtualised Infrastructure Metrics | The MF must collect performance and utilisation metrics from the virtualised resources in the NFVI. |
| C.7 | Compute Domain Metrics | The MF must collect compute domain metrics. |
| C.12 | Hardware accelerator metrics | The MF must collect hardware accelerator metrics |
| H.1 | Compute Domain Metrics | The MF must collect compute metrics from the Hypervisor. |
| H.2 | Network Domain Metrics | The MF must collect network domain metrics from the Hypervisor. |
| H.12 | Alarm/Error Publishing | The MF must process and dispatch alarms. |
| N.5 | Usage monitoring | The MF must collect metrics from physical and virtual networking devices. |
| N.8 | SDN Management | The MF must leverage SDN monitoring capabilities. |

By consolidating the aforementioned requirements, it becomes clear that the basic required functionalities of the IVM monitoring framework are as follows:

1. Collection of IT and networking metrics from virtual and physical devices of the NFVI. It should be noted that at the IVM level, metrics correspond only to physical and virtual nodes and are not associated to services since the VIM does not have knowledge of the end-to-end Network Service. Metrics are mapped to Network Services at Orchestrator level;

2. Processing and generation of events and alarms;

3. Communication of monitoring information and events/alarms to the Orchestrator in a scalable manner.

The following chapter overviews several technological frameworks for NFV monitoring which could be partially exploited towards fulfilling these requirements.



3. TECHNOLOGIES AND FRAMEWORKS FOR NFV MONITORING

This chapter presents a brief overview of the most relevant monitoring frameworks which can be applied to the NFV domain. It mostly focuses on monitoring tools which provide a satisfactory degree of integration with OpenStack and can be extended for NFV monitoring; a more comprehensive survey of other, more generic IT and network/SDN monitoring tools can be found in Annex I.

3.1. OpenStack Telemetry/Ceilometer

The goal of the Telemetry project within OpenStack [Telemetry] is to reliably collect measurements of the utilisation of physical and virtual resources comprising deployed clouds, store such data for offline usage, and trigger actions on the occurrence of given events. It includes three different services (Aodh, Ceilometer and Gnocchi; see Sec. 3.3), providing the different stages of the data monitoring functional chain: Aodh delivers alarming functions, Ceilometer deals with data collection, Gnocchi provides a time-series database with resource indexing.

The actual data collection service in the Telemetry project is Ceilometer. Ceilometer is an OpenStack service which performs collection of data, normalizes and duly transforms them, making them available to other services (starting from the Telemetry ones). Ceilometer efficiently collects the metering data of virtual machines (VMs) and the computing hosts (Nova), the network, the Operating System images (Glance), the disk volumes (Cinder), the identities (Keystone), the object storage (Swift), the orchestration (Heat), the energy consumption (Kwapi) and also user-defined meters.

Figure 1. OpenStack Telemetry/Ceilometer architecture



Figure 1 depicts an overall summary of the Telemetry/Ceilometer logical architecture. Each of the Telemetry services is designed to scale horizontally. Additional workers and nodes can be added depending on the expected load. The system consists of the following basic components:

• Polling agents; these are:
   o compute agents (ceilometer-agent-compute): they run on each compute node and poll for resource utilisation statistics;
   o central agents (ceilometer-agent-central): they run on one or more central management servers to poll for resource utilisation statistics for resources not tied to instances or compute nodes;

• Notification agents; these run on one or more central management servers to monitor the message queues (for notifications and for metering data coming from the agent);

• Collectors (ceilometer-collector): designed to gather and record event and metering data created by notification and polling agents;

• Databases, containing Events, Meters and Alarms; these are capable of handling concurrent writes (from one or more collector instances) and reads (from the API module);

• An Alarm Evaluator and Notifier (ceilometer-alarm-notifier): runs on one or more central management servers to allow configuration of alarms based on threshold evaluation for a collection of samples. This functionality is now undertaken by the Aodh module, as will be described later.

• An API module (ceilometer-api): runs on one or more central management servers to provide access to the data from the data store.

Ceilometer offers two independent ways to collect metering data, allowing easy integration of any OpenStack-related project which needs to be monitored:

• By listening to events generated on the notification bus, transformed into Ceilometer samples. This is the preferred method of data collection, since it is the simplest and most straightforward. It requires, however, that the monitored entity uses the bus to publish events, which may not be the case for all OpenStack-related projects.

• By polling information via the APIs of monitored components at regular intervals to collect information. The data are usually stored in a database and are available through the Ceilometer REST API. This method is least preferred due to the inherent difficulty in making such a component resilient.

Each meter measures a particular aspect of resource usage or on-going performance. All meters have a string name, a unit of measurement, and a type indicating whether values are monotonically increasing (cumulative), interpreted as a change from the previous value (delta), or a standalone value relating only to the current duration (gauge). Samples are individual data points associated with a particular meter and have a timestamp and a value. The aggregation of a set of samples for a specified duration (start-end time) is called a statistic. Each statistic also has an associated time period, which is a repeating interval of time that the samples are grouped for aggregation. Currently there are five aggregation functions implemented: count, max, min, avg and sum.
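The relation between samples, periods and statistics can be illustrated with a minimal Python sketch. It is not Ceilometer code: the Sample class, the statistics function and the example meter name are hypothetical and only mirror the concepts described above (samples grouped into repeating periods, then aggregated with count, max, min, avg and sum).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    meter: str        # meter name (illustrative, e.g. "cpu_util")
    timestamp: float  # seconds, relative to an arbitrary origin
    value: float


def statistics(samples, start, end, period):
    """Group samples in [start, end) into consecutive periods and aggregate each group."""
    buckets = defaultdict(list)
    for s in samples:
        if start <= s.timestamp < end:
            index = int((s.timestamp - start) // period)
            buckets[index].append(s.value)
    result = []
    for index in sorted(buckets):
        values = buckets[index]
        result.append({
            "period_start": start + index * period,
            "count": len(values),
            "max": max(values),
            "min": min(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        })
    return result


# Four gauge samples, aggregated over two 60-second periods.
samples = [Sample("cpu_util", t, v)
           for t, v in [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0)]]
stats = statistics(samples, start=0, end=120, period=60)
```

Each entry of stats corresponds to one period; periods without samples are simply absent from the result in this sketch.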

Another feature of Telemetry is alarming, which used to be internal to Ceilometer, but moved to a separate project, Aodh [Aodh]. An alarm is a set of rules defining a monitor of a statistic that will trigger when a threshold condition is breached. An alarm can be set on a single meter, or on a combination of meters, and can have three states:

• alarm (the threshold condition is breached)

• ok (the threshold condition is not met)

• insufficient data (not enough data has been gathered to determine if the alarm should fire or not).

The transition to these states can have an associated action, which is either writing to a log file or an HTTP POST to a URL. The concept of meta-alarm is also supported; meta-alarms aggregate over the current state of a set of other basic alarms combined via a logical operator (AND/OR). For example, a meta-alarm could be triggered when three basic alarms become active at the same time.
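The three alarm states and the meta-alarm combination can be sketched in a few lines of Python. This is an illustrative sketch only, not the Aodh implementation: the function names, the "greater than" comparison on the average and the min_samples parameter are assumptions, and the associated actions (log entry or HTTP POST) are left out.

```python
def evaluate(values, threshold, min_samples=1):
    """Evaluate a basic 'average greater than threshold' alarm (assumed rule)."""
    if len(values) < min_samples:
        return "insufficient data"
    average = sum(values) / len(values)
    return "alarm" if average > threshold else "ok"


def meta_alarm(states, operator="and"):
    """Combine the current states of basic alarms with a logical AND or OR."""
    if not states:
        return "insufficient data"
    active = [state == "alarm" for state in states]
    fired = all(active) if operator == "and" else any(active)
    return "alarm" if fired else "ok"


# Three basic alarms on hypothetical CPU, memory and network statistics.
basic_states = [
    evaluate([92.0, 95.0], threshold=80.0),  # breached
    evaluate([85.0, 99.0], threshold=90.0),  # breached (average 92)
    evaluate([10.0, 20.0], threshold=50.0),  # not breached
]
combined_and = meta_alarm(basic_states, "and")
combined_or = meta_alarm(basic_states, "or")
```

With the AND operator the meta-alarm fires only when all basic alarms are in the alarm state, which corresponds to the "three basic alarms active at the same time" example in the text.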

3.2. Monasca

Monasca [Monasca] is an OpenStack project, aiming at developing an open-source multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution, which is integrated within the OpenStack framework. Monasca uses a REST API for high-speed metrics processing and querying, and has a streaming alarm and notification engine. Monasca is being developed by HPE, Rackspace and IBM.

Monasca is conceived to scale up to service provider level of metrics throughput (in the order of 100,000 metrics/sec). The Monasca architecture is natively designed to support scaling, performance and high-availability. Retention period of historical data is not less than one year. Storage of metrics values, and metrics database query, use an HTTP REST API. Monasca is multi-tenant, and exploits OpenStack authentication mechanisms (Keystone) to control submission and access to metrics.

The metric definition model consists of a (key, value) pair named dimension. Basic threshold-based real-time alarms are available on metrics. Furthermore, complex alarm events can be defined and instrumented, based on a simple description grammar with specific expressions and operators.
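To make the dimension concept more concrete, the following minimal Python sketch models a measurement as a metric name plus a dictionary of (key, value) dimensions, and selects the measurements matching a given set of dimensions. The metric names, dimension keys, values and the select helper are hypothetical and are not taken from the Monasca code base.

```python
# Illustrative measurements; all names and values below are assumptions.
measurements = [
    {"name": "cpu.user_perc",
     "dimensions": {"hostname": "vnf-1", "service": "proxy"},
     "value": 35.0},
    {"name": "cpu.user_perc",
     "dimensions": {"hostname": "vnf-2", "service": "proxy"},
     "value": 80.0},
    {"name": "mem.free_mb",
     "dimensions": {"hostname": "vnf-1", "service": "proxy"},
     "value": 512.0},
]


def select(measurements, name, **dimensions):
    """Return measurements with the given name whose dimensions include all given pairs."""
    return [
        m for m in measurements
        if m["name"] == name
        and all(m["dimensions"].get(key) == value
                for key, value in dimensions.items())
    ]


all_cpu = select(measurements, "cpu.user_perc")
vnf1_cpu = select(measurements, "cpu.user_perc", hostname="vnf-1")
```

The same metric name can thus describe many distinct series, which are told apart only by their dimensions.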

Monasca agents embed a number of built-in system and service level checks, plus Nagios checks and statsd.

Figure 2. Monasca architecture

Monasca agents are Python based and consist of several sub-components. They support system metrics, such as CPU utilization and available memory, Nagios plugins, statsd and many built-in checks for services such as MySQL, RabbitMQ, and many others.

The REST API provides an exhaustive set of functions:

• Real-time storage and querying of large amounts of metrics;
• Statistics query for metrics;
• Alarm definition management (create, delete, update);
• Query and cleanup of historical metrics database;
• Compound alarms definition;
• Alarm severity ranking;
• Full storage of alarm transition pattern;
• Management of alarm notification mechanisms;
• Java and Python API available.

Published metrics and events are pushed into a Kafka²-based message queue, from which a component named Persister pulls them out and stores them into the metrics database (HPE Vertica, InfluxDB and Cassandra are supported). Other engine components look after compound metric creation, predictive metrics, notification, and alarm threshold management.

Monasca also includes a multi-publisher plugin for OpenStack Ceilometer, able to convert and publish metric samples to the Monitoring API, plus an OpenStack Horizon dashboard as user interface.

² http://kafka.apache.org/

Monasca features like real-time alarm processing, integration with OpenStack and scalability/extensibility make it a monitoring system potentially well suited to be employed within NFV platforms.

3.3. Gnocchi

Gnocchi [Gnocchi] is a project incubated under the OpenStack Telemetry program umbrella, addressing the development of a TDBaaS (Time Series Database as a Service) framework. Its paramount goal is to fix the significant performance issues experienced by Ceilometer in the time series data collection and storage. The root cause of such issues is the highly generic nature of Ceilometer’s data model, which gave the needed design flexibility in the initial OpenStack releases, but imposed a performance penalty which is no longer deemed acceptable (storing a large amount of metrics over several weeks substantially collapses the storage backend). The current data model on one hand encompasses many options never appearing in real user requests, and on the other hand doesn't handle use cases which are overcomplex or too slow to be run. These remarks sparked the idea of a brand new solution for metrics sample collection, which led to the inception of Gnocchi.

Diving deeper into the problem, whereas the event collection model in Ceilometer is pretty robust, metrics collection and storage suffer from the aforementioned performance flaws. The root of the problem is in the free-form metadata associated to each metric, storing a bevy of redundant information which is hard to query efficiently. Gnocchi proposes a faster and scalable pair of time series storage/resource indexer, with a REST API returning an entity (the measured thing) and a resource (information metadata). Unlike Ceilometer, in Gnocchi data stores are separated for metrics and metadata.

Figure 3. Gnocchi architecture

The storage driver (abstracted) is in charge of metrics storage. Aggregated metrics are actually pre-aggregated before the storage operation occurs, based on the user request at entity creation time. The canonical implementation of time series data (TSD) storage uses Pandas and Swift.
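The idea of pre-aggregating on write, rather than aggregating raw samples at query time, can be sketched as follows. This is a simplified, pure-Python illustration under stated assumptions, not Gnocchi's storage driver (which, as noted above, relies on Pandas and Swift): the class name PreAggregatedSeries and its granularity parameter are hypothetical.

```python
class PreAggregatedSeries:
    """Keeps one running aggregate per time bucket instead of raw samples."""

    def __init__(self, granularity):
        # Bucket width in seconds, chosen once when the series is created.
        self.granularity = granularity
        # Maps bucket start time -> [count, total, minimum, maximum].
        self.buckets = {}

    def add(self, timestamp, value):
        """Fold a new measurement into the bucket that contains its timestamp."""
        start = timestamp - (timestamp % self.granularity)
        bucket = self.buckets.get(start)
        if bucket is None:
            self.buckets[start] = [1, value, value, value]
        else:
            bucket[0] += 1
            bucket[1] += value
            bucket[2] = min(bucket[2], value)
            bucket[3] = max(bucket[3], value)

    def mean(self, start):
        """Return the mean of the bucket starting at the given time."""
        count, total, _, _ = self.buckets[start]
        return total / count


series = PreAggregatedSeries(granularity=60)
for timestamp, value in [(0, 10.0), (30, 20.0), (65, 40.0)]:
    series.add(timestamp, value)
```

Because only the aggregates are kept, the storage cost per series depends on the number of buckets and not on the number of measurements received, at the price of losing the raw samples.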

The indexer driver (abstracted as well) uses SQLAlchemy, to exploit the speed and indexable nature of SQL, which fits indexing storage very well. In the Gnocchi vision, there will be predefined resource schemas (image, instance, …) to improve indexing and querying to the maximum extent.

Additional functional updates envisioned in Gnocchi include configurable per time-series retention policies.

In future perspective, the Gnocchi API should be transitioned to Ceilometer API V3, and its TSDB interaction fully moved into the Ceilometer collector. In an initial phase, Gnocchi should be integrated as self-standing code inside the Ceilometer workflow (via Ceilometer Database Dispatcher).

3.4. Cyclops

Cyclops [Cyclops] is a generic rating-charging-billing (rcb) framework that allows arbitrary pricing and billing strategies to be implemented. The business rules are registered and processed inside of a Drools BPM rule engine [Drools]. The framework itself is organized as a set of micro-services with clear separation of functionalities and communication among them carried over well defined RESTful interfaces.

Figure 4. Cyclops architecture

Figure 4 above shows the key micro services and modules that make up the Cyclops framework. A brief explanation of each module follows:

• udr μ-service: this service is responsible for metric collection for natively supported cloud platforms; currently OpenStack and CloudStack are supported. Metric collection drivers for other popular frameworks, including PaaS and public cloud vendors, are planned. It also persists the collected metrics data into a TSDB (InfluxDB 0.9.x datastore). For non-natively supported applications, it processes the metrics pushed into the messaging queues as and when they arrive, thus enabling the framework to support all sorts of composite and converged billing needs.

• rc μ-service: the rating and charging service processes the udr-records and transforms them into charge-records with cost details. The rating part of the service generates/fetches the rate value for identified services or resources that form the product portfolio of any provider. The rating rules are processed through the Drools BPM rules engine, enabling providers to activate dynamic rating and maximize the revenue potential of their portfolio while maintaining consumer satisfaction and loyalty.

• billing μ-service: as the name suggests, this service processes and aggregates all the charge records created by the rc μ-service; furthermore it processes any SLA violations and associated penalties for a specified time period. This service also processes any pending service credits, discounts and seasonal offers, and applicable regional tax rates, before generating the final bill document.

• Auth-n/z μ-service: Cyclops micro-services validate API requests using secure tokens.

• Messaging service: this feature allows external non-natively supported applications and platforms to send in their usage metrics data for further processing by the Cyclops framework.

• Dashboard: this module provides a rich graphical user interface for customers to manage and view their usage charts and bills, and allows admins to control various parameters of the framework and also manage the pricing and billing rules.
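The flow from usage records to charge records and then to a bill, as described for the udr, rc and billing services above, can be illustrated with a short Python sketch. It is a toy model under stated assumptions and not Cyclops code: the record fields, the static RATES table (Cyclops evaluates rating rules in Drools instead) and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Stands in for a udr record: how much of a resource was consumed."""
    resource: str
    usage: float


# Hypothetical static rates per unit of usage; a real deployment would
# obtain these from its rating rules rather than from a fixed table.
RATES = {"cpu_hours": 0.05, "network_gb": 0.02}


def rate_and_charge(records, rates):
    """Turn usage records into charge records carrying rate and cost."""
    charges = []
    for record in records:
        rate = rates[record.resource]
        charges.append({
            "resource": record.resource,
            "usage": record.usage,
            "rate": rate,
            "cost": round(record.usage * rate, 2),
        })
    return charges


def bill(charges):
    """Aggregate charge records into a single bill total."""
    return round(sum(charge["cost"] for charge in charges), 2)


records = [UsageRecord("cpu_hours", 100.0), UsageRecord("network_gb", 50.0)]
charges = rate_and_charge(records, RATES)
total = bill(charges)
```

SLA penalties, credits, discounts and taxes, which the billing service also handles, are deliberately omitted from this sketch.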

As the framework is implemented as a distributed platform, the health status monitoring of the various services is critical. For this, Sensu [Sensu] is currently used to track the aliveness of each service. Sensu could also be used to manage the scheduling and triggering of data collection tasks, although the framework designers are migrating towards a self-contained scheduler for their data collection and processing requirements. The usage metrics collection depends heavily on the granularity of the service monitoring implementation.

3.5. Zabbix

Zabbix [Zabbix] is an open source, general-purpose, enterprise-class network and application monitoring tool that can be customised for use with OpenStack. It can be used to automatically collect and parse data from monitored cloud resources. It also provides distributed monitoring with centralised Web administration, a high level of performance and capacity, JMX monitoring, SLAs and ITIL KPI metrics on reporting, as well as agent-less monitoring. An OpenStack Telemetry plugin for Zabbix is already available.

Using Zabbix the administrator can monitor servers, network devices and applications, gathering statistics and performance data. Monitoring performance indicators such as CPU, memory, network, disk space and processes can be supported through an agent, which is available as a native process for Linux, UNIX and Windows platforms. For the OpenStack infrastructure it can currently monitor:

• Core OpenStack services: Nova, Keystone, Neutron, Ceilometer (OpenStack Telemetry), Horizon, Cinder, Glance, Swift Object Storage, and OVS (Open vSwitch)

• Core infrastructure components: MySQL, RabbitMQ, HAProxy, memcached, and libvirtd.

• Operating system statistics: Disk I/O, CPU load, free RAM, etc.

Zabbix is not limited to OpenStack cloud infrastructures: it can be used to monitor VMware vCenter and vSphere installations for various VMware hypervisor and virtual machine properties and statistics.

3.6. Nagios

Nagios is an open source tool that provides monitoring and reporting for network services and host resources [Nagios]. The entire suite is based on the open-source Nagios Core which provides monitoring of all IT infrastructure components, including applications, services, operating systems, network protocols, system metrics, and network infrastructure. Nagios does not come as a one-size-fits-all monitoring system with thousands of monitoring agents and monitoring functions; it is rather a small, lightweight system reduced to the bare essentials of monitoring. It is also very flexible since it makes use of plugins in order to set up its monitoring environment.

Nagios Fusion enables administrators to gain insight into the health of the organisation's entire network through a centralised view of their monitored infrastructure. In addition, they can automate the response to various incidents through the usage of Nagios Incident Manager and Reactor. The Network Analyser, which is part of the suite, provides an extensive view of all network traffic sources and potential security threats, allowing administrators to quickly gather high-level information regarding the status and utilisation of the network as well as detailed data for complete and thorough network analysis. All monitoring information is stored in the Log Server that provides monitoring of all mission-critical infrastructure components, including applications, services, operating systems, network protocols, systems metrics, and network infrastructure.

Nagios and Telemetry are quite complementary products which can be used in an integrated solution. The ICCLab, which operates within the ZHAW’s Institute of Applied Information Technology, has developed a Nagios plugin which can be used to capture metrics through the Telemetry API, thus allowing Nagios to monitor VMs inside OpenStack. Finally, the Telemetry plugin can be used to define thresholds and triggers in the Nagios alerting system.

3.7. OPNFV Projects

3.7.1. Doctor

Doctor (Fault Management) [Doctor] is an active OPNFV requirements project. Started in December 2014, its aim is to build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure. The project is supported by engineers from several major telecom vendors as well as telco providers.

So far, the project has produced a report deliverable which was very recently released (October 2015) [DoctorDel]. This report identifies use cases and requirements for an NFV fault detection and management system. Specifically, the following requirements are identified for a VIM-layer monitoring system:

• Monitoring of resources

• Detection of unavailability and failures

• Correlation and Cognition (especially correlation of faults among resources)

• Notification by means of alarms

• Fencing, i.e. isolation of a faulty resource

• Recovery actions

Doctor has also specified an architectural blueprint for the fault management functional blocks within the NFV infrastructure, as shown in Figure 5. In particular, it is envisaged that certain functionalities for control, monitoring, notification and inspection need to be included in the VIM.

Figure 5. Doctor functional blocks

As a first implementation proposal, the report proposes to re-use and integrate some off-the-shelf solutions for these functionalities, namely Ceilometer (see Sec. 3.1) for the Notifier, Zabbix (see Sec. 3.5) for the Monitor and Monasca (see Sec. 3.2) for the Inspector.

However, it is evident that integrating these frameworks results in considerable overlaps, since many functionalities are present in all of them (e.g. metrics collection, storage, alarming etc.) and thus may produce an unnecessarily complex and overprovisioned system. In addition, some key requirements mentioned in the document, such as correlation and root cause detection, are not covered by the present versions of these frameworks.

3.7.2. Prediction

Data Collection for Failure Prediction [Prediction] is another OPNFV project, aiming to implement a system for predicting failures. Notifications produced can be dispatched to the fault management system (see previous section), so that the latter can proactively respond to faults, before these actually happen.

The scope of the project is very promising indeed and also very relevant. However, its progress seems quite limited for the time being. The plan is to introduce just the data collection capabilities in OPNFV Release 2.

3.8. OpenDaylight monitoring

Since OpenDaylight has been selected as the SDN network controller for T-NOVA, it is relevant to investigate the monitoring capabilities it provides.

The OpenDaylight Statistics Manager module implements statistics collection, sending statistics requests to all enabled nodes (managed switches) and storing responses in the statistics operational subtree. The Statistics Manager collects information on the following:

• node-connector (switch port)

• flow

• meter

• table

• group statistics

In the Hydrogen and Helium releases, monitoring metrics were exposed via the northbound Statistics REST API.

The Lithium release introduces the Model-Driven Service Abstraction Layer (MD-SAL), which stores status-related data in the form of a document object model (DOM), known as a “data tree.” MD-SAL's RESTful interfaces for configuration and monitoring are designed based on the RESTCONF protocol. These interfaces are generated dynamically at runtime based on the YANG models that define the data.
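As a rough illustration of how such RESTCONF data could be consumed, the sketch below extracts per-port counters from a JSON document shaped like the operational inventory subtree. The URL path and the JSON key names follow the OpenDaylight inventory and port-statistics YANG models as we understand them and are assumptions to be verified against the deployed controller release; the sample payload is invented for the example.

```python
import json

# Assumed RESTCONF path of the operational inventory subtree (verify on the target release).
INVENTORY_PATH = "/restconf/operational/opendaylight-inventory:nodes"

# Assumed key under which the port-statistics model augments each node connector.
STATS_KEY = "opendaylight-port-statistics:flow-capable-node-connector-statistics"


def port_counters(inventory_json):
    """Return {port_id: counters} for every port that carries statistics."""
    doc = json.loads(inventory_json)
    result = {}
    for node in doc.get("nodes", {}).get("node", []):
        for port in node.get("node-connector", []):
            stats = port.get(STATS_KEY)
            if stats is None:
                continue  # port without collected statistics
            result[port["id"]] = {
                "rx_bytes": int(stats["bytes"]["received"]),
                "tx_bytes": int(stats["bytes"]["transmitted"]),
                "rx_packets": int(stats["packets"]["received"]),
                "tx_packets": int(stats["packets"]["transmitted"]),
            }
    return result


# Invented sample payload, for illustration only.
SAMPLE = json.dumps({"nodes": {"node": [{"id": "openflow:1", "node-connector": [
    {"id": "openflow:1:1", STATS_KEY: {
        "bytes": {"received": 1200, "transmitted": 3400},
        "packets": {"received": 10, "transmitted": 20}}},
    {"id": "openflow:1:LOCAL"}]}]}})

if __name__ == "__main__":
    print(port_counters(SAMPLE))
```

In a real deployment the JSON would be obtained with an authenticated HTTP GET on the controller; that step is omitted here to keep the sketch self-contained.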

3.9. Other relevant monitoring frameworks

Apart from the frameworks surveyed in the preceding sections, there exists a large number of IT/Cloud and Network/SDN monitoring tools, many of them open-source, which could be re-used as components of an NFV monitoring platform. Some of the most popular tools are presented in Figure 6, and are briefly overviewed in Annex I.

Figure 6. Other relevant Cloud/SDN monitoring frameworks

While most of these frameworks require considerable effort in order to be adapted to suit the needs of NFV monitoring, there are certain components which are quite mature and can be re-used. For example, in T-NOVA, the collectd module (the core version) is adopted as the monitoring agent for VNFs and compute nodes, as will be described in the next section.

3.10. Technology selection and justification

With regard to the basic functionalities identified in Section 2 as requirements for VIM monitoring, metrics collection (Functionality 1) can already be achieved by re-using a number of the pre-existing monitoring mechanisms for virtualised infrastructures, as surveyed in the preceding sections. Apart from selecting and properly integrating the appropriate technologies and possibly selecting the appropriate set of metrics, limited progress beyond the state-of-the-art should be expected in this field.

[Figure 6 content: Shinken, Icinga, Zenoss, Ganglia, StackTach, Healthmon, SeaLion, MonALISA, collectd, StatsD and Graphite, vSphere, Amazon CloudWatch, OpenNetMon, Payless, DCM, Flowsense]



On the other hand, the actual challenges and envisaged innovation of the monitoring framework are seen to be associated with Functionalities 2 and 3. Specifically, the following challenges have been identified:

• Events and alarms generation: Moving beyond the typical approach, which is found in most monitoring systems and is based on static thresholds (i.e. generate an alarm when a metric has crossed a pre-defined threshold), the aim is to study and adopt more dynamic methods for fault detection. Such methods should be based on statistical methods and self-learning approaches, identifying outliers in system behaviour and triggering alarms reactively or even proactively (e.g. before the actual fault has occurred). This anomaly detection procedure, in the context of T-NOVA, can clearly benefit from the fact that the monitored services are composed of VNFs rather than generic VMs. As virtual appliances dedicated to traffic processing, VNFs are expected to expose some common characteristics (e.g. the CPU load is expected to rise, though not necessarily linearly, with the increase of processed traffic). A significant deviation from this correlation could, for example, indicate a potential malfunction.

• Communication with the Orchestrator: With this functionality, scalability is the key requirement that needs to be fulfilled. In an operational environment, the Orchestrator is expected to manage tens or hundreds of NFVI-PoPs (or even thousands, if micro-data centres distributed in the access network are envisaged). It is thus impossible for the Orchestrator to handle the full set of metrics from all managed physical and virtual nodes. The challenge is to optimise the communication of monitoring information to the Orchestrator so that only necessary information is transmitted. This optimisation does not only imply fine-tuning of polling frequency, careful definition of a minimal set of metrics or proper design of the communication protocol, but also requires an intelligent aggregation procedure at VIM level. This procedure should achieve the grouping/aggregation of various metrics from different parts of the infrastructure as well as of alarms, and the dynamic identification of the information that is of actual value to the Orchestrator.
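To make the first challenge more concrete, the following minimal sketch flags a sample as anomalous when the CPU load deviates from the value expected for the current traffic level. It is not the T-NOVA detection algorithm (which is still to be designed); it assumes, purely for simplicity, a linear CPU-to-traffic relation learned from baseline samples, and the class name, the threshold factor k and the sample values are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


class CpuTrafficDetector:
    """Learns the CPU-vs-traffic relation from baseline samples and flags outliers."""

    def __init__(self, traffic_mbps, cpu_percent, k=3.0):
        self.a, self.b = fit_line(traffic_mbps, cpu_percent)
        residuals = [c - (self.a * t + self.b)
                     for t, c in zip(traffic_mbps, cpu_percent)]
        # Floor sigma so that a perfectly clean baseline does not yield zero tolerance.
        self.sigma = max(statistics.pstdev(residuals), 1e-6)
        self.k = k

    def is_anomalous(self, traffic_mbps, cpu_percent):
        """True if the CPU load is more than k sigma away from the fitted line."""
        expected = self.a * traffic_mbps + self.b
        return abs(cpu_percent - expected) > self.k * self.sigma


if __name__ == "__main__":
    detector = CpuTrafficDetector([100, 200, 300, 400, 500], [12, 21, 29, 41, 50])
    print(detector.is_anomalous(300, 31))  # close to the learned relation
    print(detector.is_anomalous(300, 75))  # far above the expected CPU load
```

A non-linear relation could be handled in the same way by replacing fit_line with another regression model; the residual test itself stays unchanged.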

To achieve the aforementioned innovations, the Task 4.4 work plan involves in its initial stage the establishment of a baseline framework which fulfils the basic functionalities by collecting and communicating metrics and, as a second step, the study, design and incorporation of innovative techniques for anomaly detection and metrics aggregation.

There are many alternative ways towards this direction, whose assessment is overviewed in the following table.

Table 2. Assessment of various implementation choices for VIM monitoring

Implementation choice for T-NOVA VIM monitoring | Pros | Cons
Integration of required functionalities (push meters, statistical processing, integration of guest OS and VNF metrics) into Ceilometer and Aodh. | Direct integration into OpenStack, contribution to a mainstream project. Takes advantage of Ceilometer’s open and modular architecture. | Will require intrusive interventions into Ceilometer. Solution will be OpenStack-specific and also version-specific. Also, Ceilometer suffers specific scalability issues.
Adoption of Monasca, with some extensions (push meters, statistical processing, integration of guest OS and VNF metrics) | Monasca is a complete monitoring system with remarkable scalability and also quite mature. Its REST API already provides an exhaustive set of functions. Monasca has an open and modular architecture. | Monasca is quite complex and resource-demanding, involving many capabilities which are not required in T-NOVA, given that metrics are also processed at the Orchestrator. Requires a special monitoring agent (monasca-agent).
Extension of Gnocchi (as a TDBaaS framework) with all necessary communication and processing tools | Gnocchi is quite mature and advancing rapidly, and is also well integrated with Ceilometer to provide scalability. | Will need to implement several extensions for communication and processing, since Gnocchi mainly provides a storage solution.
Extension of Cyclops with all necessary communication and processing tools | Quite mature solution, know-how available within the T-NOVA consortium (Cyclops is developed by ZHAW) | Will require extensive modification since Cyclops is mostly a rating-charging-billing platform.
Extension of Nagios or Zabbix with all necessary communication and processing tools | Both are well-proven monitoring frameworks and provide support for multiple systems and applications | Nagios and Zabbix already involve many features and capabilities which are not needed in T-NOVA, and thus their extension would be inefficient, also requiring several modifications.
Integration of specific enablers (agent, time-series DB, existing APIs) into a new framework. | Will result in a tailored solution for T-NOVA needs. Lightweight and directly configurable. | Some functionalities (such as alarming) will have to be redeveloped from scratch.

It seems that most of the existing technological enablers for VIM monitoring, as previously overviewed, can only partially address all the aforementioned challenges in a lightweight and resource-efficient manner. Although most of them are indeed open and modular (such as Monasca), they are already quite complicated and resource-demanding, and therefore further expanding them to cover these needs would require considerable effort and would raise efficiency issues. We argue that a “clean-slate” approach towards NFV monitoring at VIM level, exploiting some basic enablers and adding only the required functionalities, is a more optimised approach.

The VIM monitoring framework, which we describe in the next section, aims to provide a lightweight and NFV-tailored contribution towards this direction.



4. THE T-NOVA VIM MONITORING FRAMEWORK

4.1. Architecture and functional entities

The overall architecture of the T-NOVA VIM monitoring framework can be defined by taking into account the technical requirements, as identified in Section 2, as well as the technical choices made for the NFVI and VIM infrastructure. The specification phase has concluded that the OpenStack platform will be used for the control of the virtualised IT infrastructure, as well as the OpenDaylight controller for the management of the SDN network elements.

In this context, it is proper to leverage the OpenDaylight (Statistics API) and OpenStack (Telemetry API) capabilities for collecting metrics, rather than directly polling the network elements and the hypervisors at NFVI layer, respectively.

Theoretically, it would be possible for the Orchestrator to directly poll the cloud and network controllers of each NFVI-PoP and retrieve resource metrics respectively. This approach, although simple and straightforward, would only poorly address the challenges outlined in Section 3.10 and in particular would introduce significant scalability issues on the Orchestrator side.

Thus, it seems appropriate to introduce a mediator/processing entity at the VIM level to collect, consolidate and process metrics and communicate them to the Orchestrator. We call this entity the VIM Monitoring Manager (VIM MM); it is a stand-alone software component. As justified in Sec. 3.10, the VIM MM is designed and developed in T-NOVA as a novel component, without depending on the modification of existing monitoring frameworks.

With regard to the collection of monitoring information, OpenStack and OpenDaylight already provide a rich set of metrics for both physical and virtual nodes, which should be sufficient for most T-NOVA requirements. However, in order to gain a more detailed insight into the VNF and the NFVI status and operation, we consider it advisable to also collect a rich set of metrics from the guest OS of the VNF container (VM), including information which cannot be obtained via the hypervisor, as well as from the compute node itself.

For this purpose, we introduce an additional VNF Monitoring Agent, deployed within the VNF VMs. The agent is intended to augment VNF monitoring capabilities by collecting a large variety of metrics, as declared in the VNF Descriptor document (VNFD) of each VNF, and at a higher temporal resolution compared to Ceilometer.

The monitoring agent can be either pre-installed in the VNF image or installed upon VNF deployment. It must be noted, however, that in some cases the presence of an agent might not be desired by the VNF developer for several reasons (e.g. resource constraints, incompatibilities etc.). In this case, the system can also work in agent-less mode, solely relying on Ceilometer data for VNFs which do not have an agent installed.

In addition to collecting generic VNF and infrastructure metrics, the VIM MM is also expected to retrieve VNF-specific metrics from the VNF application itself. For this purpose, we have developed specific lightweight libraries (currently in Python, but planned to expand to other languages), which can be used by the VNF developer to dispatch application-specific metrics to the VIM MM.

Although traditionally the VNF metrics are supposed to be sent directly to the VNF Manager, for the sake of simplicity we chose to exploit the already established VIM monitoring framework to collect and forward VNF metrics to the VNF Manager through the VIM, rather than implement a second parallel “monitoring channel”.



Based on the design choices outlined above, the architecture of the T-NOVA VIM monitoring framework can be defined as shown in Figure 7 below.

Figure 7. Overview of the VIM monitoring modules

The VIM MM aggregates metrics by polling the cloud and network controllers and by receiving additional information from the monitoring agents as well as the VNF applications, consolidates these metrics, produces events/alarms if appropriate and communicates them to the Orchestrator. For the sake of scalability and efficiency, it was decided that metrics will be pushed by the VIM MM to the Orchestrator, rather than being polled by the latter. Moreover, the process of metrics collection/communication and event generation can be partially configured by the Orchestrator via a relevant configuration service to be exposed by the VIM MM. More details on the introduced modules can be found in the sections to follow.
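The consolidation step can be pictured with the short sketch below, which buffers raw samples and emits one summary per resource and metric for each reporting window, so that only the summaries need to be pushed northbound. The class and field names are hypothetical and the choice of min/max/average is an assumption made for illustration; the actual aggregation logic of the VIM MM may differ.

```python
from collections import defaultdict


class MetricAggregator:
    """Buffers raw samples and emits one summary per (resource, metric)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def add(self, resource, metric, value):
        """Record one raw sample received from a controller, agent or VNF."""
        self._samples[(resource, metric)].append(float(value))

    def flush(self):
        """Return the summaries for the elapsed window and clear the buffer."""
        report = []
        for (resource, metric), values in sorted(self._samples.items()):
            report.append({
                "resource": resource,
                "metric": metric,
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "avg": sum(values) / len(values),
            })
        self._samples.clear()
        return report


if __name__ == "__main__":
    aggregator = MetricAggregator()
    for sample in (10, 20, 30):
        aggregator.add("vm-1", "cpu_util", sample)
    # In a push model, the result of flush() would be sent to the Orchestrator.
    print(aggregator.flush())
```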

4.2. Monitoring metrics list

4.2.1. Generic metrics

A crucial task when defining the T-NOVA approach for monitoring is the identification of metrics that need to be collected from the virtualised infrastructure. Although the list of metrics that are available via the existing controllers can be quite extensive, it is necessary, for the sake of scalability and efficiency, to restrict this list to include only the information that is actually needed for the implementation of the T-NOVA Use Cases, as defined in Deliverable D2.1. Table 3 below summarises a list of such metrics, which are “generic” in the sense that they are not VNF application-specific [3]. This list is meant to be continuously updated throughout the project in order to align with the technical capabilities and requirements of the components under development and the use cases which are implemented.

Table 3. List of generic monitoring metrics

Domain | Metric | Units | Origin | Relevant UCs
VM/VNF | CPU utilisation (user & system) | % | VNF Mon. Agent | UC3, UC4
VM/VNF | RAM allocated | MB | VNF Mon. Agent | UC3, UC4
VM/VNF | RAM available | MB | VNF Mon. Agent | UC3, UC4
VM/VNF | Disk read/write rate | MB/s | VNF Mon. Agent | UC3, UC4
VM/VNF | Disk read/write rate | Ops/s | VNF Mon. Agent | UC3, UC4
VM/VNF | Network interface in/out bitrate | Mbps | VNF Mon. Agent | UC3, UC4
VM/VNF | Network interface in/out packet rate | pps | VNF Mon. Agent | UC3, UC4
VM/VNF | No. of processes | # | VNF Mon. Agent | UC4
Compute Node | CPU utilisation | % | OS Telemetry | UC2, UC3, UC4
Compute Node | RAM available | MB | OS Telemetry | UC2, UC3, UC4
Compute Node | Disk read/write rate | MB/s | OS Telemetry | UC3, UC4
Compute Node | Network i/f in/out rate | Mbps | OS Telemetry | UC3, UC4
Storage (Volume) | Read/write rate | MB/s | OS Telemetry | UC3, UC4
Storage (Volume) | Free space | GB | OS Telemetry | UC2, UC3, UC4
Network (virtual/physical switch) | Port in/out bitrate | Mbps | ODL Statistics | UC2, UC3, UC4
Network (virtual/physical switch) | Port in/out packet rate | pps | ODL Statistics | UC3, UC4
Network (virtual/physical switch) | Port in/out drops | # | ODL Statistics | UC3, UC4
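Several of the rate metrics above (bitrate, packet rate) are typically not reported directly by the underlying sources, which tend to expose cumulative counters instead. Under that assumption, a rate can be derived from two successive readings as sketched below; the function names are hypothetical, and the handling of a counter reset is one possible convention rather than a prescribed one.

```python
def rate_per_second(prev_value, curr_value, prev_time, curr_time):
    """Rate of a cumulative counter between two readings (times in seconds)."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("readings must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter was reset between readings: count from zero.
        delta = curr_value
    return delta / elapsed


def bitrate_mbps(prev_bytes, curr_bytes, prev_time, curr_time):
    """Convert a byte-counter difference into megabits per second."""
    return rate_per_second(prev_bytes, curr_bytes, prev_time, curr_time) * 8 / 1e6


if __name__ == "__main__":
    # 12.5 MB transferred in 10 s correspond to 10 Mbps.
    print(bitrate_mbps(0, 12_500_000, 0, 10))
```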

With regard to metrics identification, a very relevant reference is the ETSI GS NFV-INF 010 document [NFVINF010], which was released in December 2014. This document aims at defining and describing metrics which relate to the service quality, as perceived by the NFV Consumer. These metrics are overviewed in the table below.

[3] Please refer to Sec. 4.2.2 for a list of VNF-specific metrics.



Table 4. NFV Service Quality Metrics (Source: [NFVINF010])

It can be seen that, apart from the service latency metrics, which are related to the provisioning and/or reconfiguration of the service and essentially refer to the response to management commands (e.g. VM start), the remaining metrics can be directly or indirectly derived from the elementary metrics identified in Table 3 as well as the associated events/alarms. However, it is up to the Orchestrator, which has a complete view of the service, to assemble/exploit VIM metrics in order to derive the service quality metrics to be exposed to the SP and the Customer via the Dashboard. These metrics will be used as input to enforce the Service Level Agreement (SLA), which will finally be evaluated at Marketplace level for the applicability of possible rewards to the customer in case of failure (see D6.4, SLAs and billing).

4.2.2. VNF-specific metrics

Apart from the generic metrics identified in the previous section, each VNF generates specific dynamic metrics to monitor its internal status and performance.

These metrics:

• are specified inside the VNF Descriptor (VNFD) as monitoring-parameters (both for the VDUs and for the whole VNF) to define the expected performance of the VNF under certain resource requirements;

• are sent by the VNF application to the VIM Monitoring Manager, either via the agent or directly (see details in Sec. 4.4);

• are processed, aggregated and forwarded, if required, to the upper layers (Orchestrator and Marketplace).

At the Orchestration level, some of the VNF-specific metrics can be used for automating the selection of the most efficient VNF flavour in terms of usage of resources, to achieve a given SLA (for example using automated scaling procedures; see D3.3).

At the Marketplace level, those VNF-specific metrics that may be part of the SLA agreed between SP and customer will be evaluated for business and commercial clauses (e.g. penalties, rewards, etc.) that will finally impact the billing procedure (see D6.4).

The subsections to follow overview a list of VNF-specific metrics for each of the VNFs being developed in T-NOVA. The lists which follow are tentative and are meant to be continuously updated as the VNF applications evolve.

Please also note that most of these metrics refer to the specific functionality of each VNF as well as its component software modules. For a detailed description of the T-NOVA VNFs, please refer to Deliverable D5.32 [D532].

4.2.2.1. vSBC metrics

The vSBC components (VNFCs) able to generate metrics are: LB, IBCF, BGF and O&M (please refer to [D532] for more details).

These data are collected by the O&M component and sent to the Monitoring Manager via the monitoring agent, using the SNMP protocol.

The following table sums up the VNF-specific metrics for the vSBC functions.

Table 5. vSBC monitoring metrics

Metric | Unit | Notes
Total number of SIP sessions/transactions | Count - Incremental | Generated by the LB and IBCF component
Number of failed SIP sessions/transactions due to vSBC internal problems | Count - Incremental | This counter doesn’t include incoming incorrect SIP Requests, or failures coming from external network
Incoming RTP data throughput (incoming bandwidth consumption) | Count - Incremental | Number of incoming RTP packets/bytes. Generated by the BGF component
Outgoing RTP data throughput (outgoing bandwidth consumption) | Count - Incremental | Number of outgoing RTP packets/bytes.
RTP frame loss | Average % | Due either to source filtering, or to failing a source address/port check, or if the flow rate exceeds the pre-determined bandwidth Context
Latency | msec (Average value) | Average transmission delay.
Interarrival jitter | msec (Average value) | Average inter-packet arrival jitter.
Number of transcoding/transrating procedures | Count - Incremental | Generated by the BGF component
Number of failed transcoding/transrating procedures due to vSBC internal problems | Count - Incremental | Generated by the BGF component

We point out that:

• the first two metrics (SIP sessions) are related to the control plane monitoring

• the remaining metrics are related to the media plane monitoring



Upon receipt of a new SNMP GET request from the monitoring agent, all these metrics are reset and their accumulation starts again.
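The reset-on-read behaviour described above can be illustrated with the following minimal sketch. It is not the vSBC implementation; the class name is hypothetical, and a lock is used on the assumption that the counter is incremented by the data path while being read by a separate request handler.

```python
import threading


class ResetOnReadCounter:
    """Incremental counter whose value is returned and cleared on each read."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount=1):
        with self._lock:
            self._value += amount

    def read_and_reset(self):
        """Return the count accumulated since the previous read, then restart from zero."""
        with self._lock:
            value, self._value = self._value, 0
        return value


if __name__ == "__main__":
    sip_sessions = ResetOnReadCounter()
    for _ in range(5):
        sip_sessions.increment()
    print(sip_sessions.read_and_reset())  # sessions since the previous request
    print(sip_sessions.read_and_reset())  # nothing accumulated in between
```

With this scheme each reported value covers exactly the interval between two consecutive requests, so the collector does not need to compute differences itself.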

These metrics may be strongly influenced by:

• incoming packet sizes (e.g. 64, 128, 256, ..., 1518 bytes)

• hardware and software acceleration technologies (e.g. DPDK or GPU). In particular, GPU hardware accelerators might be used, in tandem with standard processors, in case of intensive processing (e.g. video transcoding/transrating).

4.2.2.2. vTC metrics

Table 6 overviews the metrics reported by the virtual Traffic Classifier VNF (vTC).

Table 6. vTC monitoring metrics

Metric | Unit | Notes
Packets per second | Count - average | Packets processed by the TC
Flows per second | Count - average | Discrete flows per second
Total flows | Count - incremental | The number of unique total flows recognized
Application Protocols | Count - incremental | The number of different applications detected
Throughput | Mbits/sec | The traffic in Mbits processed by the vTC per second

4.2.2.3. vSA metrics

The metrics reported by the virtual Security Appliance (vSA) are shown in Table 7. The vSA metrics correspond to the two vSA components (snort and pfsense).

Table 7. vSA monitoring metrics

Metric | Unit | Notes
Number of errors coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inerrs, lan_outerrs, wan_inerrs, wan_outerrs
Number of bytes coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inbytes, lan_outbytes, wan_inbytes, wan_outbytes
Number of packets coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inpkts, lan_outpkts, wan_inpkts, wan_outpkts
State table size of pfsense | Count | -
Percent of dropped packets, generated by snort | Percent (%) | -
Number of alerts per second, generated by snort | Percent (%) | -
vSA throughput, generated by snort | Kbits/sec | -
Pfsense uptime | String | For example: 01 Hour 17 Minutes 51 Seconds
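As a small illustration of how a consumer of these metrics might handle them, the sketch below enumerates the twelve prefixed pfsense interface metric names listed in the table and converts an uptime string of the form shown in the example into seconds. The exact spacing of the uptime string is not specified here, so the regular expression tolerates optional whitespace; the function names are hypothetical.

```python
import itertools
import re

PREFIX = "vsa_pfsense_"


def pfsense_metric_names():
    """Full names of the per-interface error, byte and packet counters."""
    kinds = ("errs", "bytes", "pkts")
    interfaces = ("lan", "wan")
    directions = ("in", "out")
    return [f"{PREFIX}{iface}_{direction}{kind}"
            for kind, iface, direction in itertools.product(kinds, interfaces, directions)]


# Accepts e.g. "01 Hour 17 Minutes 51 Seconds", with or without spaces.
_UPTIME = re.compile(r"(\d+)\s*Hours?\s*(\d+)\s*Minutes?\s*(\d+)\s*Seconds?", re.IGNORECASE)


def uptime_seconds(text):
    """Convert an 'H Hour M Minutes S Seconds' string into a number of seconds."""
    match = _UPTIME.search(text)
    if match is None:
        raise ValueError(f"unrecognised uptime string: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    print(pfsense_metric_names())
    print(uptime_seconds("01 Hour 17 Minutes 51 Seconds"))
```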

4.2.2.4. vHG metrics

Table 8 summarises the virtual Home Gateway (vHG) metrics.

Table 8. vHG monitoring metrics

Metric | Unit | Notes
Total size of Swift node | Bytes | From Swift
Total objects stored | Count - incremental | From Swift
Total versions stored | Count - incremental | From Frontend xml file
Total unique URLs stored | Count - incremental | From Frontend xml file
Total rules | Count - incremental | From Frontend xml file
Average time for transcoding | Seconds | From workers

4.2.2.5. vProxy metrics

The metrics reported by the virtual proxy (vProxy) VNF are summarized in Table 9. Most of them are bound to the specific proxy implementation (squid), but can be extended to match other implementations as well.

Table 9. vProxy monitoring metrics

Metric | Unit | Notes
Number of HTTP requests received | Count - incremental | The number of HTTP requests received by Squid since the last measurement
Cache hits percentage | Percent (%) | The percentage of HTTP requests that result in a cache hit for the last 5 minutes. It also includes cases in which Squid validates a cached response and receives a 304 (Not Modified) reply.
Memory hits percentage | Percent (%) | The percentage of all cache hits that were served from memory (hits that are logged as TCP_MEM_HIT in Squid’s logs)
Disk hits percentage | Percent (%) | The percentage of all cache hits that were served from disk (hits that are logged as TCP_HIT in Squid’s logs)
Cache disk utilization | Percent (%) | The amount of disk currently being used by the cached objects divided by the total amount of disk that can be allocated for caching.
Cache memory utilization | Percent (%) | The amount of memory (RAM) currently being used by the cached objects divided by the maximum amount of memory that can be allocated for caching.
Number of users accessing the proxy | Count | Squid assumes that each user has a unique IP address
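The relation between the three hit percentages can be sketched as follows, starting from a list of Squid result codes such as those found in its access log. The set of codes treated as hits (including TCP_REFRESH_UNMODIFIED for validated responses answered with 304) is an assumption of this sketch, and the 5-minute window mentioned in the table is left to the caller, who is expected to pass only the codes of the period of interest.

```python
from collections import Counter

# Result codes counted as cache hits in this sketch (assumption; adjust to the Squid setup).
HIT_CODES = ("TCP_HIT", "TCP_MEM_HIT", "TCP_REFRESH_UNMODIFIED")


def percentage(part, whole):
    """Percentage of part in whole, defined as 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def hit_percentages(result_codes):
    """Compute cache, memory and disk hit percentages from Squid result codes."""
    counts = Counter(result_codes)
    total = sum(counts.values())
    hits = sum(counts[code] for code in HIT_CODES)
    return {
        "cache_hits_pct": percentage(hits, total),
        # Memory and disk hits are expressed relative to all cache hits.
        "memory_hits_pct": percentage(counts["TCP_MEM_HIT"], hits),
        "disk_hits_pct": percentage(counts["TCP_HIT"], hits),
    }


if __name__ == "__main__":
    codes = (["TCP_MISS"] * 5 + ["TCP_HIT"] * 3
             + ["TCP_MEM_HIT"] + ["TCP_REFRESH_UNMODIFIED"])
    print(hit_percentages(codes))
```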

4.3. VNF Monitoring Agent

The VNF Monitoring Agent comes either pre-installed within the VM image hosting the VNFC or is installed upon VNFC deployment. It is automatically launched upon VNF start-up and runs continuously in the background. The agent collects a wide range of metrics from the local OS.

For the implementation of the monitoring agent, we exploit the popular collectd-core module [collectd] (also see Annex I, Sec. 9.1.1.8.). Collectd-core comes in a package already available in most Linux distributions and can be directly installed with relatively minimal overhead.

Given that the list of available collectd plugins is quite extensive, we have selected a basic set of plugins to be used in T-NOVA, in order to cover all generic metrics, as identified in Sec. 4.2.1, and also to capture the most vital meters of the system without introducing too much overhead. These plugins, accompanied by a brief description and the metrics which are collected, are overviewed in Table 10 below.

Table 10. Collectd plugins used in T-NOVA

Plugin | Description | Metrics
CPU | Collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for IO operations and being idle. | user, interrupt, softirq, steal, nice, system, idle, wait
Memory | Collects physical memory utilization. | used, buffered, cached, free
Disk | Collects performance statistics of hard disks and, where supported, partitions. | octets.read, octets.write, ops.read, ops.write, time.read, time.write, merged.read, merged.write
Interface | Collects information about the traffic (octets per second), packets per second and errors of interfaces. | if_octets.rx, if_octets.tx, if_packets.rx, if_packets.tx, if_errors.rx, if_errors.tx
Processes | Collects the number of processes, grouped by their state (e.g. running, sleeping, zombies, etc.). | ps_state-running, ps_state-sleeping, ps_state-zombies, ps_state-stopped, ps_state-paging, ps_state-blocked, fork_rate
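The per-state CPU times reported by the CPU plugin can be turned into the single "CPU utilisation" percentage of Table 3 along the lines of the sketch below. It assumes that two successive readings of cumulative per-state time are available and, as a design choice of this example, counts time spent waiting for I/O as non-busy; the function name and the sample values are hypothetical.

```python
BUSY_STATES = ("user", "nice", "system", "interrupt", "softirq", "steal")
ALL_STATES = BUSY_STATES + ("idle", "wait")


def cpu_utilisation(prev, curr):
    """Percentage of non-idle CPU time between two cumulative per-state readings."""
    delta = {state: curr[state] - prev[state] for state in ALL_STATES}
    total = sum(delta.values())
    if total <= 0:
        return 0.0  # no time elapsed between the readings
    busy = sum(delta[state] for state in BUSY_STATES)
    return 100.0 * busy / total


if __name__ == "__main__":
    before = dict.fromkeys(ALL_STATES, 0)
    after = dict(before, user=30, system=10, idle=55, wait=5)
    print(cpu_utilisation(before, after))
```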

For the communication of metrics, the Monitoring Agent features a TCP or UDP dispatcher which pushes measurements to the VIM MM periodically. The push frequency will be configurable (either manually or automatically). The set of metrics (selection among all available ones) to be communicated will also vary among VNFs, and will be defined in the VNFD.

4.4. Collection of VNF-specific metrics

The VIM monitoring framework provides several options for collecting VNF metrics; each VNF developer may choose the option which best suits their requirements, policies and constraints.

The direct communication method involves the VNF application itself reporting selected metrics as key-value pairs to the VIM MM at arbitrary intervals. For this purpose, we have developed a set of lightweight libraries (currently in Python and Java) which the VNF provider/developer (VNFP) can integrate in the application. This way, the VNFP can use the methods provided to easily and quickly dispatch internal application metrics without knowing the internals and interfaces of the monitoring framework.
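The following sketch gives an idea of what such a reporting helper could look like from the VNF developer's point of view. It is not the T-NOVA library: the class name, the JSON message layout and the use of a UDP datagram as transport are all assumptions made for illustration, since the actual interface is not detailed here.

```python
import json
import socket
import time


class MetricReporter:
    """Hypothetical client sending key-value metrics as one JSON datagram over UDP."""

    def __init__(self, host, port, vnf_id):
        self._address = (host, port)
        self._vnf_id = vnf_id
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def report(self, **metrics):
        """Dispatch the given metrics, e.g. report(flows=42, sessions=7)."""
        payload = {"vnf": self._vnf_id, "time": time.time(), "metrics": metrics}
        self._sock.sendto(json.dumps(payload).encode("utf-8"), self._address)

    def close(self):
        self._sock.close()


if __name__ == "__main__":
    # Placeholder address; in practice it would point to the monitoring manager.
    reporter = MetricReporter("127.0.0.1", 9999, "vtc-1")
    reporter.report(flows=42, throughput_mbps=9.5)
    reporter.close()
```

The application only calls report() with its own metric names; addressing, serialisation and transport stay hidden inside the helper, which is the property the direct method aims for.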

The indirect communication method implies that all VNF metrics are collected by the monitoring agent (collectd) by means of plugins. This can be done in three alternative ways:

1. Using the collectd “Snmp” plugin. This option is appropriate in cases when the VNF already exposes an SNMP service. In this case, the metrics to be collected are described by a Management Information Base (MIB). The MIB includes the VNF/VDU identifiers, and uses a hierarchical namespace containing object identifiers (OIDs); each OID identifies a variable that can be read via the SNMP protocol (see RFC 2578). The T-NOVA agent (collectd) periodically issues standard SNMP GET requests to the VNF SNMP service for these specific OIDs and gets back the values, which it in turn communicates to the VIM MM.

2. Using the collectd “Tail” plugin. This is the simplest method, which requires minimal integration with the VNF. With this approach, the VNF application dumps metrics as entries in a log file with a known format. The collectd Tail plugin parses the log file after each update, extracts the metrics and communicates them to the VIM MM.

3. Using a custom collectd plugin. This is the most complicated method and requires the VNFP to develop a special collectd plugin for the VNF. However, this might be the preferred choice for some VNFPs who in any case want to add collectd support in their VNF, given that collectd is very widely used also outside T-NOVA and already integrated with some of the most popular monitoring frameworks.
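For the log-file based option, the essential agreement between the VNF and the agent is the line format. The sketch below shows one possible format together with the regular expression that would extract the metric from it; collectd's Tail plugin is itself configured with regular expressions, and the expression here merely plays that role in Python. The line layout, metric name and function names are assumptions chosen for the example.

```python
import re

# Assumed line format written by the VNF: "<unix-time> <metric-name>=<number>"
LINE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?)\s+(?P<name>[A-Za-z_][\w.]*)=(?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def format_metric(timestamp, name, value):
    """Line the VNF application would append to its metrics log."""
    return f"{timestamp} {name}={value}"


def parse_metric(line):
    """Return (timestamp, name, value), or None for lines that do not match."""
    match = LINE.match(line)
    if match is None:
        return None
    return float(match["ts"]), match["name"], float(match["value"])


if __name__ == "__main__":
    entry = format_metric(1448880000, "active_sessions", 17)
    print(entry)
    print(parse_metric(entry))
    print(parse_metric("INFO service started"))  # unrelated log lines are ignored
```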

4.5. Monitoring of FPGA-based VNFs

The T-NOVA project attempts to expand the NFV purview to heterogeneous compute architectures such as GPUs and FPGAs. Utilizing such specialized hardware has direct effects on the monitoring infrastructure that should be used to support it. This section explores the consequences of the use of programmable logic as compute nodes in the T-NOVA environment.

Monitoring programmable logic devices presents unique challenges in comparison to standard CPUs. Many of the notions present in the latter are not present in programmable logic devices; furthermore, programmable logic-based systems can show large disparities, which makes the task of providing one overarching concept exceedingly difficult.

As a starting point for our measurement architecture definition we use the programmable cloud platform architecture introduced in D4.1. In this architecture we assume an FPGA SoC is used as the compute node. An FPGA SoC consists of a Processing System (PS), which comprises one or more CPUs, and the Programmable Logic (PL). In this architecture the PS executes the OpenStack worker and any software required for the management of the programmable resources available in the PL, while the actual VNFCs to be monitored are deployed to the PL.

In this scheme the monitoring infrastructure is by necessity also divided into two components. One component resides in SW and is executed in the PS. This program collects statistics from the HW components and forwards them to the monitoring manager, and can also be used to monitor the performance of the PS if this is desired.



Figure 8. Monitoring architecture for the T-NOVA FPGA SoC

The HW component, on the other hand, is responsible for collecting all of the relevant parameters on the HW side and forwarding them to the software component. These parameters are specific to each VNFC and thus it is up to the user to provide the appropriate connections and circuits for it. The FPGA SoC platform provides the user with infrastructure which can be used to send that data on to the software component.

This infrastructure consists of a simple AXI4 stream which the user HW statistics module must use to interface with an AXI4 DMA engine. The latter is responsible for all data transfers between the PS and the PL and ensures high-throughput, low-latency transfers between the two. The HW monitoring component shares the AXI DMA with the VNFC also deployed on the FPGA. This is enabled by using separate DMA channels for the data transfers, a feature offered by the AXI DMA block used in the design. The DMA block offers one AXI4S interface for all inputs and uses an additional signal to identify the user which provided the data. It then transfers the data to the appropriate memory address space, from which the SW monitoring application can read it.

This scheme provides a clear, expandable, standard interface for the HW VM to transfer its monitoring data to the SW component, and a straightforward method for the SW to read the data and perform any processing required before sending it on to the monitoring manager.

4.6. VIM Monitoring Manager architecture and components

4.6.1. VIM MM Architecture

Aligned with the requirements and the design choices set in the previous sections, the functional components of the VIM Monitoring Manager are depicted in Figure 9 and described in this section and in the ones which follow.



Figure 9. VIM MM functional components

The Monitoring Backend is the core component of the monitoring framework. It is developed in JavaScript and uses the node.js [nodejs] framework to run as a server-side application. The reason behind this choice is that JavaScript matches an asynchronous, event-driven programming style, optimal for building scalable network applications. The main functionality of the VIM MM is data communication, and the node.js ecosystem offers several services to facilitate communication, especially via web services, as well as event-driven networking.

The backend itself is divided into the following modules:

• Database connector. This module accesses the time-series database (see Sec. 4.6.5) in order to write and to read measurements. This module uses influent [4], an InfluxDB JavaScript driver.

• OpenStack and OpenDaylight connectors. These modules perform requests to various OpenStack and OpenDaylight services in order to acquire cloud- and network-related metrics (see Sec. 4.6.2). The OpenStack connector communicates with Keystone, the OpenStack Identity service, in order to generate tokens that can be used for authentication and authorisation during the rest of the OpenStack queries. It polls Nova, the OpenStack Compute service, in order to get the available instances. Finally, it polls Ceilometer, the OpenStack Telemetry service, in order to receive the latest measurements of the instances. The request-promise [5] npm (node.js package) module provides the HTTP client used to perform all these requests. The OpenDaylight connector is still under development.

[4] https://github.com/gobwas/influent
[5] https://www.npmjs.com/package/request-promise



• Northbound REST API. This module exposes all the recorded measurements via HTTP and offers the ability to subscribe to specific measurement events. hapi.js [6] has been used as a framework to build this. hapi.js plugins were also used, such as joi [7] for validation, and hapi-swaggered [8] and hapi-swagger [9] for Swagger documentation generation. See Sec. 4.6.3 for more details.

• Alarming and anomaly detection. This module, still under development, performs statistical processing in order to derive events and alarms (see Sec. 4.6.4 for more details).

• VNF Application connector. It accepts data periodically dispatched by each VNF application, filters them and stores them. These metrics are specific to each VNF (e.g. number of flows, sessions etc.). The list of metrics to be collected as well as the dispatch frequency are described in the VNF Descriptor (VNFD).

• Configuration. This module allows the use of local files in order to load settings. Node-config [10] has been used here to define a set of default parameters and extend them for different deployment environments, e.g., development, QA, staging and production. Configurations are stored in configuration files within the backend application and can be overridden and extended by environment variables.

The default config file is called config/default.json. The administrator may create multiple files in the config directory with the same format, which can later be used by the backend application if the environment variable NODE_ENV is set to the name of the configuration file without the .json suffix; e.g. for config/production.json the following command needs to be invoked on a Bash-compatible shell:

export NODE_ENV=production

The configuration parameters that are currently available are the following:

Config Parameter | Description
loggingLevel | Sets the logging level. Available levels are debug, warn and info.
database | Connection information for the time-series database. Required information should be entered in the following strings: host, port, username, password and name (for the target database name).
identity | Connection information for the OpenStack Keystone service. Required information should be entered in the following strings: host, port, tenantName, username and password. It should be noted that the tenant whose credentials are used must have sufficient privileges to access all the necessary OpenStack VNF instances.
ceilometer | Connection information for the OpenStack Ceilometer service. Required information should be entered in the following strings: pollingInterval, host and port. The pollingInterval sets the period at which the backend polls Ceilometer for measurements.
nova | Connection information for the OpenStack Nova service. Required information should be entered in the following strings: host and port.

[6] http://hapijs.com/
[7] https://github.com/hapijs/joi
[8] https://github.com/z0mt3c/hapi-swaggered
[9] https://github.com/glennjones/hapi-swagger
[10] https://github.com/lorenwest/node-config

4.6.2. Interfaces to cloud and network controllers

Monitoring of computing, hypervisor and storage status and resources is performed directly via the OpenStack Ceilometer framework. The VIM MM (OpenStack connector) periodically polls the Telemetry API for metrics regarding all deployed physical and virtual resources. Although these metrics could be retrieved by directly accessing the Ceilometer database, the schema of the latter may evolve in future OpenStack versions, so it is more appropriate to use the REST-based Telemetry API. The VIM MM issues GET requests to the service referring to a specific resource and meter, and the results are returned in JSON format.

Fortunately, the Telemetry support for the hypervisor selected for T-NOVA (libvirt) offers the widest list of available monitoring metrics compared to other hypervisors, such as Xen or vSphere.

The current version of the OpenStack connector has the following workflow:

• Token management. Communication with the OpenStack API always requires a valid token. The backend uses the OpenStack Keystone service to acquire a valid token, which is used for every transaction. The token is checked before any request is submitted and, if it has expired, it is renewed.

• Instance information retrieval. The backend does not know a priori which instances are to be monitored. By issuing a request to the Nova API, it gets a list of the active instances in order to proceed with the measurement request.

• Measurement retrieval. Once the backend knows of the existence of an active OpenStack instance, it is able to retrieve specific measurements for it. Currently CPU utilisation and the incoming and outgoing byte rates are supported, but the list can easily be extended with other metrics that OpenStack Ceilometer supports.

This workflow is repeated with a period that can be set with the pollingInterval parameter described above.
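The three workflow steps can be sketched as follows. This is an illustrative outline only, not the code of the backend (which is written for Node.js): the names TokenCache, poll_once, fetch_token, list_instances and get_measurement are hypothetical, and the calls to Keystone, Nova and Ceilometer are deliberately left as injected functions rather than modelled on any specific OpenStack client.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


class TokenCache:
    """Caches an identity token and renews it once it has expired."""

    def __init__(self,
                 fetch_token: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        # fetch_token is assumed to return (token, expiry time in epoch seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, asking the identity service only when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            self._token, self._expires_at = self._fetch_token()
        return self._token


def poll_once(tokens: TokenCache,
              list_instances: Callable[[str], Iterable[str]],
              get_measurement: Callable[[str, str, str], float],
              meters: Iterable[str]) -> List[Tuple[str, str, float]]:
    """One polling cycle: token check, instance discovery, measurement retrieval."""
    token = tokens.get()
    meter_names = list(meters)
    results = []
    for instance_id in list_instances(token):
        for meter in meter_names:
            results.append((instance_id, meter,
                            get_measurement(token, instance_id, meter)))
    return results
```

A complete connector would call poll_once in a loop, sleeping for pollingInterval between cycles and storing each returned tuple in the time-series database.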

Moreover, collecting metrics via the API allows exploiting additional features of Telemetry such as:

• Meter grouping: it is possible to define a set of metrics and retrieve an entire set with a single query;

• Sample processing: it is possible to define basic aggregation rules (average, max/min etc.) and retrieve only the aggregate instead of a set of metrics;

• Alarming: it is possible to set alarms based on thresholds for the collection of samples. An alarm can depend on a single meter, or a combination. The VIM MM may use the API to set an alarm and define an HTTP callback service to be called when the alarm has been set off.

4.6.3. Northbound API to Orchestrator

The VIM Monitoring Framework offers a Northbound API to the Orchestrator in order to inform the latter of the newest measurements and, in the future, of possible alerts. An HTTP RESTful interface provides the latest measurements upon request and the ability to subscribe to measurements.

The latest draft of the ETSI NFV IFA document [NFVIFA005], which provides an insight into the Or-Vi reference point, contains a high-level specification of the requirements and the data structures which need to be adopted for infrastructure management and monitoring. The requirements related to monitoring can be summarised as follows:

• The VIM must support querying information regarding consumable virtualised resources

• The VIM must issue notifications of changes to information regarding resources

• The VIM must offer full support for alarming (alarm creation/modification/subscription/issue/deletion)

• The VIM must issue notifications for infrastructure faults

The Northbound API provided by the T-NOVA VIM Monitoring Framework intends to align with these requirements.

4.6.3.1. Querying

First, a set of REST GET endpoints supports the transmission of the latest measurements of every available type for every instance being monitored.

The template of such a URL is /api/measurements/{instance}.{type}, where instance is the Universally Unique Identifier (UUID) given by the OpenStack deployment to the instance and type is one of the supported measurement types. The currently supported measurement types are:

• cpu_util (CPU utilisation)

• cpuidle (CPU idle usage)

• fsfree (free space on the root filesystem)

• memfree (free memory space)

• network_incoming (the rate of incoming bytes) and

• network_outgoing (the rate of outgoing bytes)

The format of the answer is a JSON object whose fields are the following:

• timestamp: shows the time at which the measurement was taken

• value: shows the actual measurement value

• units: shows the measurement units

These endpoints require constant polling in order to retrieve their values. If a system requires a constant stream of measurements at specific intervals, it could use the subscription endpoint instead.
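A client of the querying endpoints could be sketched as shown below. The URL template, the measurement types and the three response fields are taken from the description above; the helper names, the example base URL and the decision to reject responses with missing fields are assumptions made for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

SUPPORTED_TYPES = ("cpu_util", "cpuidle", "fsfree", "memfree",
                   "network_incoming", "network_outgoing")
RESPONSE_FIELDS = ("timestamp", "value", "units")


def measurement_url(base_url: str, instance: str, measurement_type: str) -> str:
    """Build /api/measurements/{instance}.{type} for a given backend address."""
    if measurement_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported measurement type: " + measurement_type)
    return "{}/api/measurements/{}.{}".format(
        base_url.rstrip("/"), quote(instance, safe=""), measurement_type)


def parse_measurement(body: str) -> dict:
    """Decode a response body and keep the timestamp, value and units fields."""
    data = json.loads(body)
    missing = [field for field in RESPONSE_FIELDS if field not in data]
    if missing:
        raise ValueError("response lacks fields: " + ", ".join(missing))
    return {field: data[field] for field in RESPONSE_FIELDS}


def fetch_measurement(base_url: str, instance: str, measurement_type: str,
                      timeout: float = 5.0) -> dict:
    """Issue the GET request and return the parsed measurement."""
    url = measurement_url(base_url, instance, measurement_type)
    with urlopen(url, timeout=timeout) as response:
        return parse_measurement(response.read().decode("utf-8"))
```

For example, fetch_measurement("http://vim-mm.example:3000", instance_uuid, "cpu_util") would return the latest CPU utilisation sample of that instance, where the host name and port are placeholders for an actual deployment.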

4.6.3.2. Meters/notifications push

The VIM MM enables a publish-subscribe communication model for pushing metrics and events to the Orchestrator. In order to subscribe to measurement events, the following information must be provided in the form of a JSON object:

• types: This is an array of the measurement types. The supported types are the same as the ones in the GET endpoints.

• instances: This is an array of the instances that have to be monitored. The UUIDs of the instances are also used here.

• interval: This is the time the monitoring backend has to wait before sending a new set of measurements. The time should be given in minutes.

• callbackUrl: This is the URL the monitoring backend has to call back in order to submit the newest measurements.

This JSON object has to be submitted as the payload of a POST request to the endpoint /api/subscribe. Upon transmission, a confirmation message is sent back as the response and, after the specified interval, a message similar to the ones returned by the GET endpoints is sent to the callbackUrl.
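The subscription request can be sketched as follows. The four field names and the /api/subscribe endpoint come from the description above; the helper names, the validation rules and the Content-Type header are assumptions, and the format of the confirmation message is not modelled.

```python
import json
from typing import Iterable
from urllib.request import Request, urlopen

SUPPORTED_TYPES = ("cpu_util", "cpuidle", "fsfree", "memfree",
                   "network_incoming", "network_outgoing")


def build_subscription(types: Iterable[str], instances: Iterable[str],
                       interval_minutes: int, callback_url: str) -> str:
    """Return the JSON payload expected by the subscription endpoint."""
    type_list = list(types)
    unknown = [t for t in type_list if t not in SUPPORTED_TYPES]
    if unknown:
        raise ValueError("unsupported measurement types: " + ", ".join(unknown))
    if interval_minutes <= 0:
        raise ValueError("interval must be a positive number of minutes")
    return json.dumps({
        "types": type_list,
        "instances": list(instances),   # OpenStack instance UUIDs
        "interval": interval_minutes,   # expressed in minutes
        "callbackUrl": callback_url,
    })


def subscribe(base_url: str, payload: str, timeout: float = 5.0) -> str:
    """POST the payload to /api/subscribe and return the raw confirmation body."""
    request = Request(base_url.rstrip("/") + "/api/subscribe",
                      data=payload.encode("utf-8"),
                      headers={"Content-Type": "application/json"},
                      method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The subscriber must also run an HTTP service at callbackUrl that accepts the measurement messages pushed by the backend.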

4.6.3.3. Alarming

The VIM MM will offer methods for creating alarms and dispatching callbacks whenever the status of an alarm changes. This feature is under development.

4.6.3.4. API documentation

For the convenience of API consumers, a Swagger-UI endpoint is provided at /docs, to which users can refer for up-to-date information (Figure 10).

Figure 10. Live API documentation via Swagger

4.6.4. Anomaly detection

For detecting critical situations and producing alarms, the T-NOVA VIM MM will, in its first stage, support operations via statically defined thresholds. The Orchestrator will use the alarming methods (see Sec. 4.6.3.3) to define and modify alarms. In turn, the VIM MM will check the affected measurements, evaluate the expressions given by the rules and send a notification if an expression evaluates to true. This is a standard feature provided by several monitoring frameworks, as surveyed in Chap. 3.
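Evaluation of statically defined thresholds can be sketched as below. Since the alarming API is still under development, the structure of a rule (measurement type, comparator, threshold) and the names ThresholdRule and evaluate are purely hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, List, Mapping

_COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


@dataclass(frozen=True)
class ThresholdRule:
    """A static rule such as 'cpu_util gt 90'."""
    measurement_type: str
    comparator: str
    threshold: float

    def is_triggered(self, value: float) -> bool:
        return _COMPARATORS[self.comparator](value, self.threshold)


def evaluate(rules: Iterable[ThresholdRule],
             latest: Mapping[str, float]) -> List[ThresholdRule]:
    """Return the rules whose measurement is available and violates its threshold."""
    return [rule for rule in rules
            if rule.measurement_type in latest
            and rule.is_triggered(latest[rule.measurement_type])]
```

Each rule returned by evaluate would correspond to one notification sent to the party that defined the alarm.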

Going beyond this capability, anomaly detection, i.e. identification of possible malfunctions without pre-defined alarm thresholds, is a very promising research direction which is highly relevant to NFV monitoring.

In the T-NOVA VIM monitoring framework, anomaly detection will be incorporated into the alarming system. The VIM MM will use outlier detection methods to examine a set of meters and samples associated with a resource and identify whether there is a significant deviation from the “standard” operation. Some of the methods for outlier detection which will be investigated for application in T-NOVA are [Hodge04] proximity-based techniques, parametric and non-parametric methods as well as neural networks.

Both semi-supervised and unsupervised approaches will be considered. Semi-supervised approaches are expected to yield better results, yet their application presupposes that the VNF has been properly tested beforehand, yielding an appropriately sized dataset which corresponds to “normal” operation.
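As an illustration of the semi-supervised idea, the sketch below fits a very simple parametric model (mean and standard deviation) to samples recorded during “normal” operation and flags later samples that deviate strongly from it. This z-score test is only one of the simplest parametric techniques and is not necessarily the method that will be adopted in T-NOVA; the class name and the default threshold of three standard deviations are assumptions.

```python
import statistics
from typing import Sequence


class ZScoreDetector:
    """Flags values that lie far from the mean of a 'normal' training set."""

    def __init__(self, normal_samples: Sequence[float], threshold: float = 3.0) -> None:
        if len(normal_samples) < 2:
            raise ValueError("at least two training samples are required")
        self.mean = statistics.fmean(normal_samples)
        self.stdev = statistics.stdev(normal_samples)
        self.threshold = threshold

    def score(self, value: float) -> float:
        """Distance from the training mean, in standard deviations."""
        if self.stdev == 0:
            # Constant training data: any different value is maximally unusual.
            return 0.0 if value == self.mean else float("inf")
        return abs(value - self.mean) / self.stdev

    def is_outlier(self, value: float) -> bool:
        return self.score(value) > self.threshold
```

In practice the training samples would come from the dataset collected while testing the VNF, and one detector would be kept per resource and meter.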

4.6.5. Time-series Database

Since the T-NOVA VIM Monitoring Backend primarily handles measurements, a time-series database is the natural choice for storage. For its implementation we have opted to use InfluxDB [InfluxDB], a time-series database written in Go. By concentrating all data in a performant DB and relying on periodic feeds, we can simplify workflows, reduce inter-component signalling and thus eliminate the need for a message queue, which is commonly used in monitoring frameworks.

Although InfluxDB is a distributed database, for the time being we are evaluating it on a single node until storage issues appear.

The Backend requires that a database has already been created in InfluxDB. The use of a retention policy is also highly recommended, since the database could potentially store multiple gigabytes of measurement data every day. For the development and QA testing of the backend we use a retention policy of 30 days. After 30 days the measurements are erased, in order to free up disk and memory space.

Each meter is stored in a separate table (a measurement, in InfluxDB terms), where multiple instances may store values of the specific measurement type. In addition to the actual value, a timestamp and an instance tag are also stored, in order to identify the measurement’s origin and time. Finally, the data type of every measurement is float64.



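Data are exchanged with InfluxDB by using the influent module: measurements are written in the Line protocol (footnote 11), while queries are expressed in the SQL-like InfluxDB query language. An example query is the following:

SELECT last(value) FROM measurementType WHERE host='instanceA'

This query retrieves the last measurement of a certain type for a certain VNF instance.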
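The storage layout and the example query can be tied together with a short sketch. It assumes, based on the example query, that the instance tag is named host and the field is named value; the helper names are hypothetical and the code only builds strings, leaving the actual communication with InfluxDB to a client library.

```python
_FORBIDDEN = ('"', "'", "\\", "\n")


def _check(name: str) -> str:
    """Reject names that could break out of the quoted query literals."""
    if not name or any(ch in name for ch in _FORBIDDEN):
        raise ValueError("invalid name: %r" % (name,))
    return name


def last_value_query(measurement_type: str, instance: str) -> str:
    """Query returning the most recent value of one meter for one instance."""
    return 'SELECT last(value) FROM "{}" WHERE host = \'{}\''.format(
        _check(measurement_type), _check(instance))


def to_line_protocol(measurement_type: str, instance: str,
                     value: float, timestamp_ns: int) -> str:
    """Encode one sample as '<measurement>,host=<instance> value=<float> <ns>'."""
    # Commas, spaces and equals signs must be escaped inside tag values.
    tag = (instance.replace(",", "\\,")
                   .replace(" ", "\\ ")
                   .replace("=", "\\="))
    return "{},host={} value={!r} {}".format(
        _check(measurement_type), tag, float(value), int(timestamp_ns))
```

Writing the value as a float matches the float64 data type used for every measurement.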

4.6.6. Graphical user interface

The main interface of the VIM Monitoring Framework is the HTTP API of the backend, as described in Sec. 4.6.3. During the development of the monitoring framework, however, the VNF developers requested a graphical way of accessing the monitoring data produced by their VNF instances and, more specifically, by their VNF applications. This led to the integration of a Grafana server. Grafana [Grafana] is a graph builder for visualising time-series metrics. It supports InfluxDB as a data source, so it is easy to visualise all the available measurements directly from the database.

Figure 11. Visualization of measurements with Grafana

4.7. Packaging, documentation and open-source release

In an effort to maximise the impact of T-NOVA on the NFV community, the T-NOVA monitoring framework is released [GH-VIM] under the GNU General Public License v3.0 (footnote 12). Interested stakeholders and prospective contributors are welcome to download the code and submit pull requests on the public GitHub repository (footnote 13). The plan is to move the project to the overall T-NOVA GitHub account, as soon as the latter becomes available.

11 https://influxdb.com/docs/v0.9/write_protocols/write_syntax.html
12 https://github.com/spacehellas/tnova-vim-backend/blob/master/LICENSE.txt
13 https://github.com/spacehellas/tnova-vim-backend

A Docker image containing Node.js and the application is also provided inside this repository. The Docker image allows the seamless usage of the backend in any OpenStack deployment, regardless of whether it is deployed on a physical or a virtual machine. Most of the configuration parameters are exposed as Docker environment variables and can be set during container creation. A YAML file (docker-compose.yml) is also provided inside the repository, so that users can combine the VIM monitoring backend, InfluxDB and Grafana in the same way the backend is being developed and tested.

For further documentation of the backend, please refer to the README file (footnote 14) and the documentation directory (footnote 15) of the repository. The information will be kept up to date while development progresses. For the API documentation, please refer to the /docs endpoint of a working deployment, where the Swagger-UI is hosted.

14 https://github.com/spacehellas/tnova-vim-backend/blob/master/README.md
15 https://github.com/spacehellas/tnova-vim/blob/master/documentation



5. VALIDATION

5.1. Functional testing

For the purpose of functional testing and benchmarking of the current release of the T-NOVA VIM monitoring framework, the latter was integrated into the T-NOVA IVM testbed as shown in Figure 12 below.

Figure 12. Testbed configuration for testing VIM Monitoring

The VIM Monitoring Manager was deployed as a Docker container in a separate physical host, as part of the VIM management and monitoring framework. It interfaced with OpenStack for the collection of metrics via Ceilometer.

The workload used to produce the metrics was the latest version of the vTC (virtual Traffic Classifier) VNF, deployed in a VM. The vTC used the Python library provided (see Sec. 4.4) in order to dispatch VNF metrics to the VIM MM. At the same time, the collectd agent was also installed in the VNF VM, to dispatch generic metrics.

In order to emulate realistic operational conditions, a traffic generator hosted in a separate VM was used to play back a real network traffic dump containing a mix of various services.

Two clients were used behind the VIM Monitoring Manager: one accessing the metrics via the REST API and a second one accessing the web-based GUI.

The chosen functional test was intended to validate most of the functional capabilities of the VIM MM, namely:

• Interfacing with Ceilometer

• Collection of agent metrics

• Collection of VNF metrics

• Persistence of measurements

• GUI operation

The GUI was configured by the user to display the following metrics, integrated in a single view:

• VNF CPU utilization, retrieved from OpenStack

• VNF memory usage and network traffic (cumulative packet count), as reported by the guest OS via the monitoring agent

• VNF-specific metrics. Specifically, the vTC is able to report the packet rate of the different applications detected (e.g. Skype, Bittorrent, Dropbox, Google, Viber etc.)

Figure 13. End-to-end functional test: the VIM MM GUI screenshot, integrating metrics from various sources

It was verified that the multiple samples were collected and displayed properly. The validity of the measurements was verified by accessing the console view of the VNF VM and:

• Checking the system metrics via command-line tools (top, netstat etc.)

• Checking the vTC VNF logs, which contained periodic dumps of the VNF metrics (per-application rate). Note that what is tested here is the ability to communicate and collect metrics and not the accuracy of the vTC.

System stability was also checked by allowing the system to run continuously; the vTC and the VIM MM were found to be operating normally after more than two days of uptime and continuous operation, until they were manually stopped.



5.2. Benchmarking

As mentioned above, apart from the GUI, the VIM MM also exposes a programmatic interface (API) to the orchestration and VNFM components. We tested the scalability and performance of the VIM MM by loading it with a variable number of GET requests, asking for a single specific metric (CPU load) of the vDPI VNF. We used the httperf software [httperf] to generate synthetic HTTP GET requests at various rates and measured the rate of responses received. Then, we repeated the procedure, this time directly polling Ceilometer for the same metric.

The two sets of measurements were made on platforms with similar hardware capabilities. The results are depicted in Figure 14.

Figure 14. VIM Monitoring Manager performance

The results show that the VIM MM can expose metrics with performance comparable to native Ceilometer. It also seems to exhibit better stability when overloaded (at more than 160 requests/sec for the given hardware configuration).

An important added value of the VIM MM is its low communication overhead, which has been reduced to the minimum in order to improve scalability. Figure 15 compares the length (in bytes) of the responses to a single GET request for a specific metric of a VNF VM (CPU load, memory utilization and disk usage). The response of Ceilometer is quite verbose, since it also includes detailed instance information. We try to alleviate this effect by including in the response body only the absolutely necessary elements, i.e. the metric name, the value and the timestamp. The result is a decrease in overhead of about 95%.

Figure 15. Length of responses for single-metric requests

It is thus seen that the VIM MM exhibits acceptable performance when it comes to communicating metrics.

Regarding Ceilometer, however, it must be noted that its performance limitations are known and are expected to be alleviated in upcoming releases, and that the Telemetry project is already aiming to fulfil NFV requirements. In this context, Ceilometer could also offer an efficient solution for NFV monitoring in the near future, bringing at the same time strong community support as well as wide industrial uptake, being a core OpenStack component.

5.3. Fulfillment of requirements

Following the successful execution of the aforementioned validation and assessment tests, the table below explains how the implemented and tested VIM monitoring framework fulfills (or is planned to fulfill) the requirements which were set in Section 2.

Table 11. Compliance to requirements

Requirement for the Monitoring Framework | Status | Justification
The MF must provide a vendor-agnostic mechanism for physical resource monitoring. | Compliant | The mechanisms introduced for measurement collection are vendor-agnostic for most metrics (CPU, memory, storage, network etc.).
The MF must provide an interface to the Orchestrator for the communication of monitoring metrics. | Compliant | The MF exposes a northbound REST API for metrics/alarms communication in either push or pull mode.
The MF must re-use resource identifiers when linking metrics to resources. | Compliant | The VIM MM re-uses the OpenStack UUIDs and hostnames for linking metrics to resources.
The MF must monitor in real time the physical network infrastructure as well as the vNets instantiated on top of it. | Compliant (planned) | Network-related measurements are derived from the monitoring agents (for network interfaces) and also OpenDaylight (for virtual links and networks). (Feature under development)
The MF must provide an API for communicating metrics (in either push or pull mode). | Compliant | The MF exposes a northbound REST API for metrics/alarms communication in either push or pull mode.
The MF must collect utilisation metrics from the virtualised resources in the NFVI. | Compliant | The VIM MM collects metrics from deployed VMs and established vNets.
The MF must collect compute domain metrics. | Compliant | Direct access to compute domain metrics is achieved by means of the collectd monitoring agent.
The MF must collect hardware accelerator metrics. | Compliant (planned) | Hardware accelerator metrics are accessible by means of specific agent (collectd) plugins. (Feature under development)
The MF must collect compute metrics from the Hypervisor. | Compliant | The VIM MM collects hypervisor metrics indirectly via the Ceilometer API.
The MF must collect network domain metrics from the Hypervisor. | Compliant | The VIM MM collects hypervisor metrics indirectly via the Ceilometer API.
The MF must process and dispatch alarms. | Compliant (planned) | The VIM MM allows configuring and subscribing to alarms.
The MF must collect metrics from physical and virtual networking devices. | Compliant (planned) | The VIM MM interfaces with OpenDaylight to collect network device statistics. (Feature under development)
The MF must leverage SDN monitoring capabilities. | Compliant (planned) | The VIM MM collects (mostly port) statistics for SDN devices from OpenDaylight. (Feature under development)



6. CONCLUSIONS AND FUTURE WORK

This document described the design and development of a monitoring framework for the T-NOVA IVM layer. Using a comprehensive state-of-the-art survey as well as a consolidation of T-NOVA requirements, the architecture of the T-NOVA VIM monitoring framework was specified. Taking into account the use of OpenDaylight and OpenStack as the controller technologies in the VIM, infrastructure metrics and statistics available from these controllers are collected. Furthermore, a VNF monitoring agent was also introduced, as an optional component, collecting a rich set of metrics from within VMs and VNF applications. All these metrics are aggregated and filtered in a centralised Monitoring Manager, which exposes status and resource information of the NFVI-PoP to the Orchestrator, as configured by the latter.

It is concluded that, with the proposed approach, the goal of delivering an effective, efficient and scalable monitoring solution for the T-NOVA IVM layer is achieved. The solution under development is able to expose to the Orchestrator and to the Marketplace enhanced awareness of the IVM status and resources, while at the same time keeping the communication and signalling overhead to a minimum.

The current release has been integrated with the T-NOVA IVM testbed and has been demonstrated in operation at the IEEE IM 2015 and IEEE SDN/NFV conferences as part of an integrated demonstrator of the T-NOVA project, monitoring an NFV service with the vTC VNF.

The next steps in implementation involve the finalization of the Orchestrator API with the alarming functionality, the integration of OpenDaylight, as well as the anomaly detection part. These advances will be reflected in the final version of this deliverable.



7. REFERENCES

[Aodh] OpenStack Telemetry alarming, https://github.com/openstack/Aodh

[Cloudwatch] Amazon CloudWatch, http://aws.amazon.com/cloudwatch

[Collectd] collectd – The system statistics collection daemon, https://collectd.org/

[Cyclops] Cyclops framework, http://icclab.github.io/cyclops/

[D232] M. McGrath (Ed.) et al, “Specification of the Infrastructure Virtualisation, Management and Orchestration – Final”, T-NOVA Deliverable D2.32, October 2015

[D532] “Network Functions Implementation and Testing – Final”, T-NOVA Deliverable D5.32, June 2016

[DCM] Ye Yu, C. Qian, and X. Li, "Distributed and Collaborative Traffic Monitoring in Software Defined Networks," presented at the ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN'14), Chicago, IL, USA, 2014.

[Doctor] OPNFV Wiki - Project: Fault Management (Doctor), https://wiki.opnfv.org/doctor

[DoctorDel] Doctor Deliverable: Fault Management and Maintenance, Release 1.0.0, October 2015, http://artifacts.opnfv.org/doctor/DoctorFaultManagementandMaintenance.pdf

[Drools] Drools BPM engine, http://www.drools.org/

[Flowsense] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. Madhyastha, "FlowSense: Monitoring Network Utilization with Zero Measurement Cost," in Passive and Active Measurement, vol. 7799, M. Roughan and R. Chang, Eds., Springer Berlin Heidelberg, 2013, pp. 31-41.

[Ganglia] Ganglia Monitoring System, http://ganglia.sourceforge.net

[GH-VIM] https://github.com/spacehellas/tnova-vim-backend

[Gnocchi] OpenStack Gnocchi project, https://wiki.openstack.org/wiki/Gnocchi

[Grafana] Grafana: An open source, feature rich metrics dashboard and graph editor for Graphite, InfluxDB & OpenTSDB, http://grafana.org/

[Graphite] Graphite: A Highly Scalable Real-time Graphing System, https://github.com/graphite-project/graphite-web

[Hodge04] V. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies”, Artificial Intelligence Review 22 (2004), pp. 85-126

[httperf] https://github.com/httperf/httperf

[Icinga] ICINGA, https://www.icinga.org/

[InfluxDB] InfluxDB: An open-source, distributed, time series database with no external dependencies, http://influxdb.com/

[Monalisa] MONitoring Agents using a Large Integrated Services Architecture, http://monalisa.caltech.edu/monalisa.htm

[Monasca] OpenStack Monasca project, https://wiki.openstack.org/wiki/Monasca

[Nagios] Nagios Is The Industry Standard In IT Infrastructure Monitoring, http://www.nagios.org/

[NFVIFA005] Network Functions Virtualisation (NFV); Management and Orchestration; Or-Vi reference point – Interface and Information Model Specification, work in progress, November 2015

[nodejs] https://nodejs.org/en/

[NVFINF010] ETSI GS NFV-INF 010 V1.1.1 (2014-12), Network Functions Virtualisation (NFV); Service Quality Metrics

[OpenNetMon] N. L. M. van Adrichem, C. Doerr, and F. A. Kuipers, "OpenNetMon: Network monitoring in OpenFlow Software-Defined Networks," in Network Operations and Management Symposium (NOMS), 2014 IEEE, 2014, pp. 1-8.

[Payless] S. R. Chowdhury, M. F. Bari, R. Ahmed, and R. Boutaba, "PayLess: A low cost network monitoring framework for Software Defined Networks," in Network Operations and Management Symposium (NOMS), 2014 IEEE, 2014, pp. 1-9.

[Prediction] OPNFV Wiki – Data Collection for Failure Prediction, https://wiki.opnfv.org/prediction

[SeaLion] Sealion; Quickly Diagnose Problems with your Linux Servers, https://sealion.com/

[Shinken] Shinken, http://www.shinken-monitoring.org/

[Stacktach] Stacktach, Event-based Monitoring & Billing solution for OpenStack, https://github.com/rackerlabs/stacktach

[Statsd] StatsD; Simple daemon for easy stats aggregation, https://github.com/etsy/statsd/

[Telemetry] OpenStack Telemetry, https://wiki.openstack.org/wiki/Telemetry

[vSphere] VMware vSphere, http://www.vmware.com/products/vsphere

[Zabbix] ZABBIX, The Enterprise-class Monitoring Solution for Everyone, http://www.zabbix.com/

[Zenoss] Zenoss User Community, http://www.zenoss.org/



8. LIST OF ACRONYMS

Acronym | Explanation
API | Application Programming Interface
CPU | Central Processing Unit
DPDK | Data Plane Development Kit
FPGA | Field Programmable Gate Array
GPU | Graphics Processing Unit
HW | Hardware
KPI | Key Performance Indicator
NFV | Network Functions Virtualisation
NFVI | NFV Infrastructure
NFVI PoP | NFVI Point-of-Presence
NFVO | NFV Orchestrator
OID | Object Identifier
OPNFV | Open Platform for NFV
OS | Operating System
REST | Representational State Transfer
SNMP | Simple Network Management Protocol
SoC | System-on-Chip
VDU | Virtual Deployment Unit
VIM | Virtualised Infrastructure Manager
VIM MM | VIM Monitoring Manager
VM | Virtual Machine
VNF | Virtual Network Function
VNFD | VNF Descriptor
VNFM | VNF Manager
VNFP | VNF Provider
vTC | Virtual Traffic Classifier
YANG | Yet Another Next Generation



9. ANNEX I: SURVEY OF RELEVANT IT/NETWORK MONITORING TOOLS

This section presents a brief overview of existing frameworks for monitoring virtualized IT infrastructures as well as SDN-enabled networks, and discusses technologies which could be partially re-used in T-NOVA.

9.1.1. IT/Cloud monitoring

9.1.1.1. Shinken

Shinken is an open source system and network monitoring application [Shinken]. It is fully compatible with Nagios plugins. It started as a proof of concept for a new Nagios architecture, but since the proposal was turned down by the Nagios authors, Shinken became an independent tool. It is not a fork of Nagios; it is a total rewrite in Python. It watches hosts and services, gathers performance data and alerts users when error conditions occur and again when the conditions clear. Shinken's architecture is focused on offering easier load balancing and high availability capabilities. The main differences and advantages over Nagios are:

• A more efficient distributed monitoring and high availability architecture

• Graphite integration in the Web UI

• Better performance, mostly due to the use of a distributed database (MongoDB)

9.1.1.2. Icinga

Icinga is an open-source network and system monitoring application which was born out of a Nagios fork [Icinga]. It maintains configuration and plug-in compatibility with the latter. Its new features are as follows:

• A modern Web 2.0 style user interface;

• An interface for mobile devices;

• Additional database connectors (for MySQL, Oracle, and PostgreSQL);

• RESTful API.

Currently there are two flavours of Icinga that are maintained by two different development branches: Icinga 1 (the original Nagios fork) and Icinga 2 (where the core framework is being replaced by a full rewrite).

9.1.1.3. Zenoss

Zenoss is an open source monitoring platform released under the GPLv2 license [Zenoss]. It provides an easy-to-use Web UI to monitor performance, events, configuration, and inventory. Zenoss is well suited to unified monitoring, as it is cloud-agnostic and open source. Zenoss provides powerful plug-ins named Zenpacks, which support monitoring of hypervisors (ESX, KVM, Xen and HyperV), private cloud platforms (CloudStack, OpenStack and vCloud/vSphere), and public cloud (AWS). In OpenStack, Zenoss integrates with Nova, Keystone and OpenStack Telemetry.

9.1.1.4. Ganglia

Ganglia is a scalable distributed system monitoring tool for high-performance computing systems such as clusters and grids [Ganglia]. Its structure is based on a hierarchical design using a tree of point-to-point connections among cluster nodes. Ganglia is based on an XML data representation, XDR for compact data transport and RRDtool for data storage and visualisation. The Ganglia system contains:

1. Two unique daemons, gmond and gmetad

2. A PHP-based web front-end

3. Other small programs

gmond runs on each node to monitor changes in the host state, to announce applicable changes, to listen to the state of all Ganglia nodes via a unicast or multicast channel (depending on the installation), and to respond to requests. gmetad (Ganglia Meta Daemon) polls a collection of data sources at regular intervals, parses the XML and saves all metrics to round-robin databases. Aggregated XML can then be exported.

The Ganglia web front-end is written in PHP. It uses graphs generated by gmetad and provides the collected information, like CPU utilisation for the past day, week, month, or year. Ganglia has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes. However, further work is required in order for it to become more cloud-agnostic.

9.1.1.5. StackTach

StackTach is a debugging and monitoring utility for OpenStack that can work with multiple Data Centres, including multi-cell deployment [Stacktach]. It was initially created as a browser-based debugging tool for OpenStack Nova. Since that time, StackTach has evolved into a tool that can perform debugging, monitoring and auditing. StackTach is quickly moving into Metrics, SLA and Monitoring territory with version 2 and the inclusion of Stacky, the command line interface to StackTach. StackTach contains a worker that reads notifications from OpenStack's RabbitMQ queues and stores them in a database. From there, StackTach reviews the stream of notifications to glean usage information and assemble it in an easy-to-query fashion. Users can inquire on instances, requests, servers, etc. using the browser interface or the Stacky command line tool. Rackspace is working on StackTach integration with Telemetry.

9.1.1.6. SeaLion

SeaLion is a cloud-based system monitoring tool for Linux servers. It installs an agent in the system, which can be run as an unprivileged user [SeaLion]. The agent collects data at regular intervals across servers and these data are made available in the user's workspace. SeaLion provides a high-level view (graphical overview) of Linux server activity. The monitoring data are transmitted over SSL to the SeaLion servers. The service provides graphs, charts and access to the raw gathered data.

9.1.1.7. MonALISA

MONitoring Agents using a Large Integrated Services Architecture (MonALISA) is based on a Dynamic Distributed Service Architecture and is able to provide complete monitoring, control and global optimisation services for complex systems [Monalisa]. The MonALISA system is designed as a collection of autonomous multi-threaded, self-describing agent-based subsystems which are registered as dynamic services, and are able to collaborate and cooperate in performing a wide range of information gathering and processing tasks.

The agents can analyse and process the information in a distributed way, in order to provide optimisation decisions in large-scale distributed applications. The scalability of the system derives from the use of a multithreaded execution engine that hosts a variety of loosely coupled self-describing dynamic services or agents, and the ability of each service to register itself and then to be discovered and used by any other services or clients that require such information. The system is designed to easily integrate existing monitoring tools and procedures and to provide this information in a dynamic, customised, self-describing way to any other services or clients.

By using MonALISA the administrator is able to monitor all aspects of complex systems, including:

• System information for computer nodes and clusters;

• Network information (traffic, flows, connectivity, topology) for WAN and LAN;

• Monitoring the performance of applications, jobs or services; and

• End-user systems and end-to-end performance measurements.

9.1.1.8. collectd, StatsD and Graphite

Cloud instances may also be monitored by using a collection of separate open source tools. collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways [Collectd]. collectd gathers statistics about the system it is running on and stores this information. These statistics can then be used to find current performance bottlenecks (i.e. performance analysis) and predict future system load (i.e. capacity planning). collectd is written in C for performance and portability, allowing it to run on systems without a scripting language or cron daemon, such as embedded systems. At the same time it includes optimisations and features to handle large data sets. StatsD [Statsd] is a Node.js daemon that listens for messages on a UDP or TCP port. StatsD listens for statistics, like counters and timers, and then parses the messages, extracts metrics data, and periodically flushes the data to other services in order to build graphs. A tool that can be used to build graphs afterwards is Graphite [Graphite], which is able to store numeric time-series data and render graphs of the data on demand.
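As an illustration of the kind of messages StatsD listens for, the sketch below formats a counter and a timer in the plain-text StatsD format (name:value|type) and sends a message as a UDP datagram. The helper names and the metric names are hypothetical; port 8125 is the usual StatsD default and is assumed here.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    """Counter message, e.g. 'vnf.sessions:1|c'."""
    return "{}:{}|c".format(name, value)


def format_timer(name: str, milliseconds: float) -> str:
    """Timer message, e.g. 'vnf.lookup:320|ms'."""
    return "{}:{}|ms".format(name, milliseconds)


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one message as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Because UDP is connectionless, send_metric does not confirm that a StatsD daemon actually received the message.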

9.1.1.9. vSphere

The vSphere statistics subsystem collects data on the resource usage of inventory objects[vSphere].Dataonawiderangeofmetricsiscollectedatfrequentintervals,processedandarchived inadatabase.Statistics regardingthenetworkutilisationarecollectedatCluster,Host andVirtualMachine levels. In addition vSphere supports performancemonitoring ofguestoperatingsystems,gatheringstatisticsregardingnetworkutilisationamongothers.

9.1.1.10. AmazonCloudWatch

AmazonCloudWatch isamonitoringservice forAWScloud resourcesand theapplicationsrunningonAWS[Cloudwatch].Itprovidesreal-timemonitoringtoAmazon'sEC2customersontheirresourceutilisationsuchasCPU,diskandnetwork.However,CloudWatchdoesnotprovideanymemory,diskspace,orloadaveragemetricswithoutrunningadditionalsoftwareon the instance. Itwasprimarilydesigned forusewithAmazonElasticLoadBalancingandAutoScalingwithloadbalancinginmind:theservicechecksCPUusageonmultipleinstancesandautomaticallycreatesadditionaloneswhentheloadincreases.

9.1.2. NetworkMonitoring

Networkmonitoring isadomain thathasattractedsignificantattention fromthe researchcommunity over the past decades, withwell-established technologies and standardswithregard to measurement processes (active and passive) as well as the communication ofmonitoringmetrics(SNMP,IPFIX,sFlowetc.).



InthecontextofT-NOVA,wherenetworkmanagement,atleastwithineachNFVI-PoPisbasedonOpenFlow,themeasurementprocesswillleverageOpenFlow’smonitoringcapabilities.

OpenFlow provides the capability to report per-flow and per-port metrics, which are generated by the switch itself. These metrics are then collected by the Controller and communicated to SDN control applications via the northbound API of the Controller itself (Figure 16). Almost all SDN controllers offer the capability to expose monitoring metrics, either via API calls or language bindings. In this respect, the OpenFlow-based architecture provides the capability to monitor all network elements in a uniform and vendor-agnostic manner.

[Figure 16 layers: Network Devices; Southbound API (OpenFlow); Controllers (NOX, POX, OpenDaylight, Floodlight, Beacon, Ryu, Trema, Mul, Jaxon, Maestro, NodeFlow, Ovs-controller, NDDI-OESS); Northbound API; SDN Applications (Monitoring)]

Figure 16. Communication of monitoring metrics in an OpenFlow-enabled architecture
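As an illustration of how a monitoring application might use the per-port counters exposed through a controller, the following Python sketch derives throughput from two successive polls. The PortSample container and the function name are hypothetical; the field names rx_bytes and tx_bytes mirror the cumulative 64-bit byte counters of OpenFlow port statistics. How the samples are obtained from a particular controller's northbound API is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One poll of a switch port's cumulative counters (hypothetical container)."""
    timestamp: float  # seconds
    rx_bytes: int     # cumulative bytes received
    tx_bytes: int     # cumulative bytes transmitted


def port_throughput_bps(prev, curr, counter_bits=64):
    """Return (rx_bps, tx_bps) between two samples, tolerating one counter wrap."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    modulus = 1 << counter_bits

    def rate(before, after):
        # Modular subtraction yields the right delta across a single wrap-around.
        delta = (after - before) % modulus
        return delta * 8 / interval

    return rate(prev.rx_bytes, curr.rx_bytes), rate(prev.tx_bytes, curr.tx_bytes)
```

For example, two samples taken 10 seconds apart whose rx_bytes counter grew by 12500 bytes correspond to an average receive rate of 10000 bit/s over that interval.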

In this context, several monitoring applications have been developed, leveraging OpenFlow capabilities for integrated network management tasks. Some of these applications are overviewed in the table below.

Table 12. OpenFlow monitoring applications

| Monitoring application | Brief description | Controller used | Open source | Available at |
|---|---|---|---|---|
| OpenNetMon | OpenNetMon [OpenNetMon] continuously monitors all flows between predefined link-destination pairs for throughput, packet loss and delay. | POX | Yes | https://github.com/TUDelftNAS/SDN-OpenNetMon/ |
| Payless | Payless [Payless] provides a flexible RESTful API for flow statistics collection at different aggregation levels. It uses an adaptive statistics collection algorithm that delivers highly accurate information in real time without incurring significant network overhead. | POX, NOX, OpenDaylight | Yes | http://github.com/srcvirus/floodlight |
| DCM | DCM [DCM] allows switches to collaboratively achieve flow-monitoring tasks and balance measurement load. | None (native OF) | No | Not available |
| FlowSense | FlowSense [Flowsense] takes a push-based approach to performance monitoring in flow-based networks, where the network reports performance changes rather than being queried for them. | None (native OF) | No | Not available |
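The idea of adaptive statistics collection mentioned in the table can be sketched in Python as follows. This is a generic multiplicative adjustment rule, not the algorithm published for Payless: the function name, the byte thresholds, the scaling factor and the interval bounds are all assumptions chosen for illustration. The intent is only to show how a collector can poll quiet flows less often and busy flows more often, trading accuracy against polling overhead.

```python
def next_poll_interval(current, delta_bytes, low=1_000, high=100_000,
                       factor=2.0, min_interval=0.5, max_interval=5.0):
    """Pick the next polling interval (seconds) for one flow.

    current      -- interval used for the last poll
    delta_bytes  -- growth of the flow's byte counter since the last poll
    All thresholds and bounds are hypothetical tuning parameters.
    """
    if delta_bytes < low:
        # Little change: back off to reduce polling overhead.
        return min(current * factor, max_interval)
    if delta_bytes > high:
        # Large change: poll sooner to keep the statistics accurate.
        return max(current / factor, min_interval)
    # Moderate change: keep the current interval.
    return current
```

With these example parameters, an idle flow polled every second would next be polled after 2 seconds, while a flow that transferred 500000 bytes in the same period would next be polled after 0.5 seconds.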

In addition, many of the monitoring frameworks mentioned in Section 9.1.1 for cloud infrastructures can also be used for monitoring OpenFlow infrastructures, via the appropriate plugins.
