White Paper
Designing the ideal video streaming QoE analysis tool
Contents
1. Introduction
2. Factors that impact QoE
   Encoding Profile
   Network Conditions
   Devices and Players
   Combinations of Factors
3. Measuring QoE – what are the output metrics?
4. What does the ideal tool need to do?
5. Framework design
6. Video capture and analysis
7. Summary
1. Introduction
Eurofins Digital Testing has built a purpose-designed framework for conducting video streaming Quality of Experience (QoE) analysis. In this white paper we describe the problem analysis, requirements definition and design process that produced the tool, which we have since used on multiple projects.
Our basic goal was to play back a range of video streams on a range of devices and measure the resulting QoE. However, this is not as simple as it sounds. In this paper we start by examining all the factors (or inputs) that can ultimately affect the end user's QoE when watching streamed video, and how these inputs can be repeatedly controlled.
We then consider which QoE metrics (or outputs) should be measured, and put this all together to deduce a set of requirements that an ideal QoE analysis framework should meet.
Finally, we describe how we engineered a framework to meet these requirements, resulting in a tool that can measure QoE for different content, network conditions and devices/apps in a way that is automated and easily visualised, while providing deep offline analysis capabilities.
2. Factors that impact QoE
The factors that affect QoE can be summarised in the following picture:
Figure 1: Overview of factors that impact QoE
Broadly speaking, these factors can be put into three categories: encoding profile, end-to-end network conditions, and the device/application the content is played back on. Let's consider each of these in turn.
Encoding Profile
For our purposes, an “encoding profile” refers to the entire set of audio and video data and parameters describing a single content asset (containing audio and video, in multiple adaptation sets in the case of ABR content). This is typically stored as a single fragmented MP4 file, but doesn't have to be. An encoding profile can vary in many ways:
a) Content characteristics – e.g. sport/movie/news/cartoon
b) Codec – e.g. H.264/H.265/VP9/AV1
c) Encoder vendor choice – identical codecs can have very different performance according to the encoder used
d) ABR format:
   • HLS vs MPEG-DASH
   • Segment length
e) Number and bitrate of representations or variants – i.e. the design of the bitrate ladder
f) Video resolution
g) Video frame-rate
h) Other elementary stream encoding parameters, including:
   • Profile/Level
   • GOP structure
   • CBR/VBR
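To keep these factors controllable and repeatable across test runs, it helps to record each encoding profile as a structured record. The Python sketch below is illustrative only: the class names, field names and example values are our own assumptions, chosen to mirror factors a) to h) above, and are not taken from the framework itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Representation:
    """One rung of the bitrate ladder (a representation or variant)."""
    bitrate_kbps: int
    width: int
    height: int
    frame_rate: float


@dataclass(frozen=True)
class EncodingProfile:
    """Hypothetical record of the encoding-profile factors a) to h)."""
    content_type: str                    # a) e.g. "sport", "movie", "news"
    codec: str                           # b) e.g. "H.264", "AV1"
    encoder: str                         # c) encoder vendor/product
    abr_format: str                      # d) "HLS" or "MPEG-DASH"
    segment_length_s: float              # d) segment duration in seconds
    ladder: Tuple[Representation, ...]   # e) to g) bitrates, resolutions, frame rates
    codec_profile_level: str             # h) e.g. "High@4.1"
    gop_length_frames: int               # h) GOP length in frames
    rate_control: str                    # h) "CBR" or "VBR"

    def bitrates_kbps(self) -> List[int]:
        """Ladder bitrates in ascending order."""
        return sorted(r.bitrate_kbps for r in self.ladder)


# Placeholder values for illustration only.
example = EncodingProfile(
    content_type="sport", codec="H.264", encoder="encoder-A",
    abr_format="HLS", segment_length_s=6.0,
    ladder=(Representation(6000, 1920, 1080, 50.0),
            Representation(3000, 1280, 720, 50.0),
            Representation(1200, 960, 540, 25.0)),
    codec_profile_level="High@4.1", gop_length_frames=50, rate_control="VBR",
)
```

Making the records frozen (and therefore hashable) means a profile can be used directly as a dictionary key when indexing results by test-run inputs.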
Typically, higher bitrates are used for higher picture resolutions and frame rates and are implicitly associated with higher QoE. However, it's also widely understood that, for a given video resolution, the relationship between bitrate and video quality is non-linear and that, above a certain threshold, simply increasing the bitrate available to encode the video yields insignificant increases in picture quality. This means there are situations where, if more bandwidth is available, simply allocating more bitrate to the profile/resolution/frame-rate currently used in the encoding profile may not noticeably improve QoE; instead, perhaps a higher-resolution picture should be offered.
Of course, most of the above video factors have an audio equivalent. For most AV content, the video bitrate is an order of magnitude greater than the audio bitrate and so has a greater impact on QoE, which is why we focussed on varying and controlling video characteristics.
Network Conditions
In an ABR streaming scenario, the video stream to consume is selected from a set of video streams of varying bitrates based on the prevailing network conditions at the time of viewing. Fluctuations in the network conditions can cause the quality of the viewing experience to degrade or improve. There is no simple model of the effect that different network impairments have on any given viewing event, since the algorithms used by different players respond differently and the impact on quality is often highly non-linear.
The significant factors that impact network conditions for one client playing one stream include the following:
a) Effective bandwidth/throughput (Mb/s)
b) Packet latency, jitter, reordering and loss
c) HTTP connection error rate
All stages in the end-to-end network connection are relevant: the performance of the origin server, the CDN, the ISP's last-mile connection and the consumer's home network all combine to determine the overall network conditions. Our goal was to be able to model and control the end-to-end network characteristics, not any individual stage.
The factors listed above all vary over time, and the nature of this variation is a critical factor in modelling real-world behaviour. We are able to simulate a wide range of behaviour through access to a very large and rich data set of real-world end-to-end network conditions.
Since the majority of ABR video today is carried via HTTP over TCP, we assumed that packet-level behaviour (i.e. factor b) above) is hidden by the TCP layer, and we constrained just the effective available bandwidth rather than individually controlling factors such as latency and jitter. With many CDNs, the HTTP connection error rate (i.e. the number of times a client receives an HTTP 404 response and has to re-request a segment) is so small that it does not have a material impact on QoE.
We call the time-varying bitrate throughput available to the client the “network profile”.
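A network profile in this sense can be represented, for example, as a sequence of steps, each holding a throughput cap for a fixed duration. The sketch below assumes that step-wise representation (the `NetworkProfile` type, `bandwidth_at` function and example values are hypothetical); traces derived from real-world data would typically have much finer granularity, but the lookup works the same way.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

# Hypothetical network profile: (duration in seconds, throughput cap in Mb/s) steps.
NetworkProfile = List[Tuple[float, float]]


def bandwidth_at(profile: NetworkProfile, t: float) -> float:
    """Return the throughput cap (Mb/s) in force at time t seconds.

    Each step starts where the previous one ends; the last step is held
    once the profile has been exhausted.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    ends = list(accumulate(d for d, _ in profile))  # cumulative step end times
    i = bisect_right(ends, t)                       # index of the step containing t
    return profile[min(i, len(profile) - 1)][1]


# Placeholder profile: 30 s at 8 Mb/s, 20 s at 2.5 Mb/s, then 40 s at 5 Mb/s.
example_profile: NetworkProfile = [(30.0, 8.0), (20.0, 2.5), (40.0, 5.0)]
```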
Devices and Players
The device and viewing environment used to watch the content have a clear impact on QoE due to the following factors:
a) Decoder capabilities – format support
b) Device resource load (CPU/memory)
c) Display size
d) Display resolution
e) Other display properties:
   • Pixel refresh speed
   • Display filtering and “pixel improvement” algorithms
   • Brightness
   • Viewing angle
f) ABR implementation and buffering algorithm
g) Network stack characteristics
h) Viewing distance & lighting conditions
Factors a) and b) are characteristics of the device's processing hardware, while c), d) and e) are characteristics of the device's display hardware.
Sometimes the content might not be supported by the device at all, because the media codec level of the content is not supported by the hardware components within the device (especially on mobile devices such as phones and tablets). This is a particular problem on Android devices, where the mandated level of codec support is quite low relative to other devices, although many devices are known to support higher levels in practice.
While the end-user device hardware and display are obviously associated with a viewer's QoE, the impact of the video player client software used to watch the content is often not considered explicitly. Factors f) and g) above are both determined by the device's software. The player software, whether at the application layer or part of the built-in media player, has a material impact on QoE because, in ABR video consumption, the player is responsible for choosing which variant bitrate to stream, when to switch, and what buffering strategy to adopt. ABR algorithms can vary very considerably between player applications running on the same device.
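To make the point concrete, consider a generic throughput-based selection rule of the kind many players use in some form. The sketch below is a deliberately simplified, hypothetical heuristic (the function name and the `safety_factor` parameter are our assumptions), not the algorithm of any particular player. Even this toy rule picks different variants from the same ladder under the same network conditions once a single tuning parameter changes, which is why the player is treated as part of the device under test.

```python
from typing import Sequence


def select_variant(bitrates_kbps: Sequence[int],
                   measured_throughput_kbps: float,
                   safety_factor: float = 0.8) -> int:
    """Pick the highest variant whose bitrate fits within a fraction of the
    measured throughput; fall back to the lowest variant otherwise.

    Generic illustrative heuristic only, not any specific player's algorithm.
    """
    if not bitrates_kbps:
        raise ValueError("bitrate ladder is empty")
    budget = measured_throughput_kbps * safety_factor
    ladder = sorted(bitrates_kbps)
    chosen = ladder[0]                  # lowest rung is the fallback
    for b in ladder:
        if b <= budget:
            chosen = b                  # highest rung that fits the budget so far
    return chosen


ladder = [1200, 3000, 6000]
conservative = select_variant(ladder, 5000, safety_factor=0.5)
aggressive = select_variant(ladder, 5000, safety_factor=0.8)
```

With 5,000 kb/s of measured throughput, the conservative setting stays on the 1,200 kb/s rung while the more aggressive one selects 3,000 kb/s.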
Understanding how the software impacts QoE and affects the choice of encoding profiles is complicated by the fact that there are different types of player agent software, and the ability of an OTT service provider to affect the behaviour of the player depends on the type of player in use. Broadly speaking, there are two primary categories of player:
Embedded. This is where the choice of bitrate to stream and present is determined by an underlying capability of the platform the player application is running on. In the case of Apple iOS, HLS playback and bitrate selection are controlled by iOS itself, and the app author can't affect the adaptation behaviour directly (ignoring one or two exceptional cases, such as Netflix). In the case of Android, native HLS playback is available via the MediaPlayer API, but its capabilities depend on the specific version of Android. Another example is HbbTV devices, which support MPEG-DASH by passing the manifest (MPD) directly to an HTML5 video element, with the underlying middleware controlling the adaptation algorithm; in other words, although it's an HTML application, HTML Media Source Extensions (MSE) are not used.
Application controlled. This is where the service provider's application itself controls the selection of the bitrate and has an algorithm that determines when to switch profile. Examples include: HTML-based applications using MSE, e.g. the open-source dash.js and hls.js players; native media player libraries that can be embedded in the application, e.g. the open-source ExoPlayer library for Android; and commercially available third-party player applications whose UI the service provider can customise.
In our framework, the combination of device and player software is treated as a single variable dimension when considering the impact on QoE.
Combinations of Factors
All of the factors described within these three categories come together to affect QoE. They can be visualised as dimensions of a search space, where each point in that space corresponds to a different set of input conditions that will result in a different output QoE.
Figure 2: A search space representation of the factors that impact QoE. Input conditions (device/app combinations DA1…DAm, encoding profiles EP1…EPn and network profiles NP1…NPo) map to the resulting output QoE metrics (QoE metric 1, QoE metric 2, …). The table rows indicate the test run results for ‘m’ device/app combinations, ‘n’ encoding profiles and ‘o’ network profiles.
A brief consideration of all the factors listed above quickly suggests that testing all possible combinations leads to a vast search space due to combinatorial explosion. For example, the following experiment would result in 25,000 test runs!
• 50 encoding profiles, from, say, 5 content types, 2 codec types and 5 bitrate ladder designs
• 10 network profiles
• 50 device/player combinations, from, say, 10 devices with 5 different web-player applications
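The 25,000 figure is simply the product of the three dimensions (50 × 10 × 50). Enumerating the full cross product explicitly, as in the Python sketch below, makes the size of the test matrix concrete; the labels are placeholders standing in for real content types, codecs, ladders, devices and players.

```python
from itertools import product

# Placeholder labels for the factors in the example experiment above.
content_types = [f"content{i}" for i in range(5)]
codecs = ["codecA", "codecB"]
ladders = [f"ladder{i}" for i in range(5)]
encoding_profiles = list(product(content_types, codecs, ladders))  # 5 x 2 x 5

network_profiles = [f"NP{i}" for i in range(1, 11)]                # 10

devices = [f"device{i}" for i in range(10)]
players = [f"player{i}" for i in range(5)]
device_apps = list(product(devices, players))                      # 10 x 5

# Every combination of the three dimensions is one test run.
test_runs = list(product(device_apps, encoding_profiles, network_profiles))
print(len(encoding_profiles), len(device_apps), len(test_runs))
```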
Even a relatively narrow experiment – e.g. examining which software player on device X performs best for a specific type of content and codec – can require hundreds of test runs to enable statistically robust evaluation. For this reason, enabling efficient and repeatable execution of a large number of test runs was a fundamental requirement of our tool.
3. Measuring QoE – what are the output metrics?
The following measures are all relevant to assessing the QoE of streaming video.
• Video start-up time: time from initiation of playback to the first video frames being displayed.
• Buffering events: the number, duration and location of events during playback where the video is paused or not showing while further segments are acquired.
• Switching events: the number and location of events where the player adapts from one bitrate to another.
• Frame-by-frame objective video quality. To calculate this, we used the SSIMPLUS algorithm – a full-reference, objective video quality metric. We made this choice after evaluating a number of alternatives, including VMAF; the reasons for choosing SSIMPLUS are explained in a separate article.
• Audio quality, including audio-video sync.
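Given a per-capture timeline recording which content frame and representation was on screen (the kind of frame-by-frame output produced by the capture analysis described in Section 6), the first three metrics can be derived with a single pass over the timeline. The sketch below is one possible approach under stated assumptions: the `Sample` record, the function names and the 0.2 s minimum stall duration are hypothetical, and timings are only as precise as the capture frame rate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One captured video frame, as decoded by the capture analysis (hypothetical record)."""
    t: float                # capture timestamp in seconds
    frame: Optional[int]    # content frame number shown, None if blank/spinner
    rep: Optional[str]      # representation the frame came from, None if unknown


def startup_time(samples: List[Sample], play_pressed_at: float) -> Optional[float]:
    """Time from initiating playback to the first content frame on screen."""
    for s in samples:
        if s.frame is not None:
            return s.t - play_pressed_at
    return None  # playback never started


def buffering_events(samples: List[Sample],
                     min_duration_s: float = 0.2) -> List[Tuple[float, float]]:
    """(start, duration) of post-start-up periods where displayed content did not advance.

    Short non-advancing runs (e.g. a frame legitimately shown on two consecutive
    captures) are ignored via min_duration_s, an assumed threshold. A stall still
    open when the capture ends (e.g. after the final frame) is not reported.
    """
    events: List[Tuple[float, float]] = []
    last_frame: Optional[int] = None
    stall_start: Optional[float] = None
    for s in samples:
        advancing = s.frame is not None and (last_frame is None or s.frame > last_frame)
        if advancing:
            if stall_start is not None:
                duration = s.t - stall_start
                if duration >= min_duration_s:
                    events.append((stall_start, duration))
                stall_start = None
            last_frame = s.frame
        elif last_frame is not None and stall_start is None:
            stall_start = s.t  # content has stopped advancing after start-up
    return events


def switching_events(samples: List[Sample]) -> List[Tuple[float, str, str]]:
    """(time, from_rep, to_rep) for every change of representation on screen."""
    events: List[Tuple[float, str, str]] = []
    prev: Optional[str] = None
    for s in samples:
        if s.rep is None:
            continue
        if prev is not None and s.rep != prev:
            events.append((s.t, prev, s.rep))
        prev = s.rep
    return events


# Placeholder timeline: start-up, a switch from 540p to 720p, then a stall on frame 2.
samples = [Sample(0.0, None, None), Sample(0.5, None, None),
           Sample(1.0, 0, "540p"), Sample(1.04, 1, "540p"),
           Sample(1.08, 2, "720p"), Sample(1.12, 2, "720p"),
           Sample(1.6, 2, "720p"), Sample(2.0, 3, "720p")]
```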
Measuring ABR QoE is an active and relatively immature research topic, and there is no agreed way of combining these measurements into a simple overall QoE metric. Worse, for some of the metrics, research experiments have reached contradictory conclusions about how even a single measure impacts QoE. For example, consider the frequency and visibility of switching events. Some researchers have concluded that switching events should be avoided as far as possible, leading to “widely spaced” encoding ladders with fewer representations. Other researchers have shown that barely perceptible switching events have little impact on QoE, leading to “tightly packed” encoding ladders with a higher number of representations, where adjacent representations differ in quality by only a just-distinguishable amount. In practice, encoding and storage costs seem to be the more important factor in determining this choice: OTT providers with high-value, relatively static catalogues (e.g. Netflix) will prefer “tightly packed” ladders, whereas those with fast turnover or live content (e.g. broadcaster catch-up services) are likely to prefer “widely spaced” ladders.
The impact on QoE of the number and duration of buffering events is equally poorly understood. While less buffering is obviously desirable, there is no conclusive research on whether, say, five 6 s buffering events are better or worse than a single 60 s event.
4. What does the ideal tool need to do?
So, having evaluated the impact factors that need to be varied (network conditions, encoding profiles, devices/players) and the output metrics that need to be measured, we arrived at the following set of requirements for our analysis framework.
• Automated. This is the only way to run hundreds of experiments cost-efficiently: it is simply not viable to use manual execution. Additionally, test automation is the best way to measure QoE accurately and repeatably.
• Device agnostic. The framework must be usable on any type of device (set-top box, HDMI stick, mobile phone, tablet, computer, smart TV) and a wide variety of screen types, without relying on frame capture and analysis techniques that work on some devices but not on others.
• Content and encoder agnostic. We wanted the solution to be agnostic of the codec, encoding and packaging platform, without relying on tricks or features of a particular encoder pipeline. This was to make sure we could evaluate different codecs, encoders and ABR formats without having to re-architect the framework.
• Network reproducibility. To control network conditions precisely and repeatably, the framework has to run within a dedicated local network, where a variety of typical network conditions can be accurately reproduced.
• Powerful results visualisation. All the measurements must be automatically captured into a spreadsheet to allow detailed and comprehensive analysis. However, we also wanted to provide easy, visual and intuitive browsing of individual test runs, allowing the user to see how metrics vary over time while also being able to see the corresponding video that was being displayed on the client device. It was also important to enable deep analysis of the results without having to go back and re-run a particular test, meaning that all relevant test data, such as network traffic logs, should be stored for later use.
• Accurate. The framework must offer frame-accurate analysis of the results when deducing metrics such as the amount of buffering time, switching events and start-up time. Gathering these metrics with such high precision is not trivial.
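The spreadsheet requirement can be met with very little code once each run has been reduced to a row of metrics. The sketch below writes one CSV row per test run using Python's standard `csv` module; the column names and values are placeholders of our own, not the framework's actual schema or results.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set: one row per test run.
FIELDS = ["device_app", "encoding_profile", "network_profile",
          "startup_time_s", "buffering_event_count", "buffering_total_s",
          "switching_event_count", "mean_ssimplus"]


def runs_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise per-run metrics as CSV text that any spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder values, not measured results.
text = runs_to_csv([{
    "device_app": "DA1", "encoding_profile": "EP1", "network_profile": "NP1",
    "startup_time_s": 1.0, "buffering_event_count": 1, "buffering_total_s": 0.88,
    "switching_event_count": 1, "mean_ssimplus": 82.5,
}])
```

By default `csv.DictWriter` raises `ValueError` for keys not listed in `fieldnames`, which catches misspelt metric names early.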
5. Framework design
The above requirements resulted in the following framework design, shown in Figure 3.
Figure 3: Schematic representation of test framework
Each test run consists of playing back a single piece of video content on a single test device, while modifying properties of the network link between the video server and the device under test. A test run can be broken into the following steps:
• Using Appium, Selenium or IR, the device is driven to run the app and play a video file. The precise sequence of control steps depends on the device in question and the app being used. Careful time synchronisation is maintained between the device control script, the network emulator and the video capture unit.
• The network profile shaping script is executed, varying the bandwidth for the duration of the test as required. The network shaping unit is also responsible for logging all network traffic during the tests.
• The camera or HDMI capture unit records the display screen of the device under test for twice the known duration of the content (to allow for any buffering events). The recorded video is stored for every test run for subsequent analysis and for use within the results visualisation.
• The device starts to play back the local video content and continues until playback is complete.
• The recorded video is stored and analysed offline, giving a list of all the frames/representations played at every point in time during the content playback.
• This data is used to calculate a set of quality metrics (listed above) and store them in a spreadsheet. At the same time, an HTML/JavaScript visualisation of all the results is created and hosted in the cloud to enable visualisation and analysis of the results from any location.
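This paper does not state which shaping mechanism the network shaping unit uses. As an assumption for illustration only, the sketch below targets a Linux machine using the `tc` traffic-control tool with a token bucket filter (`tbf`) qdisc on the interface facing the devices under test. The interface name `eth1` and the `burst`/`latency` values are hypothetical; the schedule reuses the (duration in seconds, Mb/s) step format for a network profile.

```python
import shlex
import subprocess
import time
from typing import List, Tuple


def shaping_schedule(profile: List[Tuple[float, float]],
                     dev: str = "eth1") -> List[Tuple[float, List[str]]]:
    """Turn (duration_s, Mb/s) steps into (start_offset_s, tc argv) pairs."""
    schedule = []
    t = 0.0
    for duration_s, mbps in profile:
        kbit = int(round(mbps * 1000))  # express the cap in kbit/s for tc
        argv = ["tc", "qdisc", "replace", "dev", dev, "root", "tbf",
                "rate", f"{kbit}kbit", "burst", "32kbit", "latency", "400ms"]
        schedule.append((t, argv))
        t += duration_s
    return schedule


def run_schedule(schedule: List[Tuple[float, List[str]]], start: float) -> None:
    """Apply each tc command at its offset from `start` (a time.monotonic() value).

    Requires root privileges on the shaping machine.
    """
    for offset, argv in schedule:
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        subprocess.run(argv, check=True)


# Dry run: print the commands instead of executing them.
for offset, argv in shaping_schedule([(30.0, 8.0), (20.0, 2.5)]):
    print(offset, shlex.join(argv))
```

Passing a shared start time to `run_schedule` is one simple way to keep the shaping steps aligned with device control and video capture.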
Figure 3 elements: Test Run Description (device/app script, encoding profile, network profile); Test control & sync; Device Control (via IR, Appium or Selenium); Network Shaping; Video Server (HLS segments over HTTP); LAN; Devices under test; Video Capture (camera, HDMI capture); Results Analysis; Results and visualisation.
These steps are then repeated for each test run, with the URL of the video content and the network profile adjusted to what is required for that test run. Between test runs, the device is rebooted and the application restarted to ensure that all caches are cleared.
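The outer loop over test runs can be sketched as follows. The `reboot` and `run_once` callables are hypothetical hooks standing in for the device-specific control (IR, Appium or Selenium), capture and shaping steps; the point of the sketch is only the structure of the loop, with a reboot before every run so that no cached segments carry over between runs.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def run_matrix(device_apps: Iterable[str],
               encodings: Iterable[str],
               networks: Iterable[str],
               reboot: Callable[[str], None],
               run_once: Callable[[str, str, str], None]) -> List[Tuple[str, str, str]]:
    """Run every combination, rebooting the device before each run."""
    done: List[Tuple[str, str, str]] = []
    for device_app, encoding, network in product(device_apps, encodings, networks):
        reboot(device_app)                       # hypothetical hook: reboot device, relaunch app
        run_once(device_app, encoding, network)  # hypothetical hook: shape, capture, play, store
        done.append((device_app, encoding, network))
    return done


# Usage with stub hooks that only record what would happen.
log: List[tuple] = []
done = run_matrix(["DA1", "DA2"], ["EP1"], ["NP1", "NP2"],
                  reboot=lambda d: log.append(("reboot", d)),
                  run_once=lambda d, e, n: log.append(("run", d, e, n)))
```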
6. Video capture and analysis
Each source video has two small QR codes embedded in the content that vary for every frame and for every representation/variant. Using a camera recording or HDMI capture from the device/app playing the streamed video, we robustly and automatically detect every frame that was played out on the device/app under test and determine which representation within the ABR format the frame was taken from. This allows detection of dropped frames, blank frames or buffering, repeated frames, and switching events.
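Once the QR codes in each captured frame have been decoded (for example with a QR detection library such as OpenCV's `QRCodeDetector`), classifying frames is a matter of comparing consecutive frame numbers. The sketch below assumes, hypothetically, that decoding yields one content frame number per captured frame, or `None` when no code is found; it counts new, repeated, dropped and blank frames.

```python
from collections import Counter
from typing import List, Optional


def classify_frames(decoded: List[Optional[int]]) -> Counter:
    """Count new, repeated, dropped and blank frames in a capture.

    decoded[i] is the content frame number read from the QR code in captured
    frame i, or None when no code was found (blank screen, spinner, etc.).
    """
    counts: Counter = Counter()
    prev: Optional[int] = None  # highest content frame number seen so far
    for f in decoded:
        if f is None:
            counts["blank"] += 1
            continue
        if prev is None or f == prev + 1:
            counts["new"] += 1
        elif f == prev:
            counts["repeated"] += 1
        elif f > prev + 1:
            counts["new"] += 1
            counts["dropped"] += f - prev - 1  # content frames never shown
        else:
            counts["out_of_order"] += 1
        if prev is None or f > prev:
            prev = f
    return counts


counts = classify_frames([None, 0, 1, 1, 2, 5, None, 6])
```

When the capture rate exceeds the content frame rate, some repeats are expected on every frame, so in practice repeats would be judged against the expected cadence rather than counted raw.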
The captured video and the QR codes it contains are used only to determine exactly which frame is being displayed at a given moment in time. The quality of the video displayed by the device is not determined from the recorded video. Instead, each encoding of the source content is pre-processed to determine the SSIMPLUS score of every frame within every representation/variant. When analysing a test run, this SSIMPLUS score is then looked up for every captured frame, meaning that the quality of the recorded video has no impact on the SSIMPLUS scores.
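Because the score is looked up rather than measured from the capture, this analysis step reduces to a table lookup keyed by what the QR codes identified. The sketch below assumes the offline pre-processing has produced a mapping from (representation, frame number) to score; the type name, function name and scores are placeholders, not real SSIMPLUS output.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Pre-computed offline for every (representation, content frame number).
ScoreTable = Dict[Tuple[str, int], float]


def scores_for_capture(table: ScoreTable,
                       displayed: List[Tuple[str, int]]) -> List[float]:
    """Look up the pre-computed score of each displayed (representation, frame).

    The recorded video itself is never scored, so capture quality cannot
    influence the result. A KeyError signals a frame missing from the table.
    """
    return [table[key] for key in displayed]


# Placeholder scores, not real SSIMPLUS output.
table: ScoreTable = {("720p", 0): 80.0, ("720p", 1): 81.0, ("1080p", 2): 90.0}
scores = scores_for_capture(table, [("720p", 0), ("720p", 1), ("1080p", 2)])
average = mean(scores)
```

Scoring per captured frame means a frozen frame keeps contributing its score for as long as it stays on screen; whether to weight it that way is a choice left to the analysis.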
The result set for each test run is very rich, containing information about how the displayed frame, SSIMPLUS score, available bandwidth and utilised bandwidth vary over time. To facilitate interpretation of this information, the results can be presented interactively in any web browser, displayed alongside the video and the inferred QoE metrics. By hovering the mouse over the timeline, the user can see the value of various properties at that moment in time, as well as the precise video frame that was captured at that time.
Figure 4 shows a typical results visualisation for a single test run.
Figure 4: A typical results visualisation for a single test run
7. Summary
Our QoE analysis tool has fulfilled our design requirements and has been used successfully to evaluate video streaming QoE performance across multiple projects. This has involved more than 20 device/app combinations, 10 encoding profiles and over 10 network profiles, leading to hundreds of individual test runs. It has already given our customers deeper insight into how their system choices impact QoE, in many cases leading to conclusions that are non-obvious and counter-intuitive.
If you would like to see a demonstration of the system, please contact us at [email protected]. Alternatively, you can explore a subset of the results by visiting https://ott-qoe.eurofins-digitaltesting.com