Project co-funded by the European Commission under the Horizon 2020 Programme.

Programmable edge-to-cloud virtualization fabric for the 5G Media industry

D6.3: 5G-MEDIA Mobile Contribution, Remote and Smart Production Pilot

Work Package: WP6: 5G-MEDIA Use Case Scenarios and Validation

Lead partner: IRT

Authors: Panagiotis Athanasoulis, Stamatia Rizou [SILO], George Agapiou [OTE], Oscar Prieto, Gabriel Solsona [RTVE], David Griffin, Khoa Phan, Morteza Kheirkhah, Miguel Rio [UCL], David Jiménez, Federico Alvarez, Javier Serrano, José Manuel Menéndez [UPM], Gordana Macher, Madeleine Keltsch, Felix Oberhardt, Christoph Brendes [IRT], Igor Fritzsch, Truong-Sinh An [BIT], Francesco Iadanza [ENG], Alberto Florez, Rocío Ortiz [TID], David Breitgand, Avi Weit [IBM], Kourtis Michail-Alexandros [NCSRD]

Delivery date (DoA): 29 February, 2020

Actual delivery date: 09 April, 2020

Dissemination level: Public

Version number: 1.0

Status: Final

Grant Agreement N°: 761699

Project Acronym: 5G-MEDIA

Project Title: Programmable edge-to-cloud virtualisation fabric for the 5G Media industry

Instrument: IA

Call identifier: H2020-ICT-2016-2

Topic: ICT-08-2017, 5G PPP Convergent Technologies, Strand 2: Flexible network applications

Start date of the project: 1 June, 2017

Duration: 33 months


Revision History

Revision Date Who Description

0.1 13/06/2019 IRT ToC version

0.2 10/02/2020 IRT • Revision of ToC to avoid duplications

• Integration of ENG's testbed contribution

• Addition to the production profile table by UCL

0.3 13/02/2020 IRT • Additional refinements on the ToC as agreed in the Plenary Meeting in Thessaloniki

0.4 27/02/2020 IRT • Integration of TID's testbed contribution

• Integration of the RTVE contribution

0.5 13/03/2020 IRT • Integration of NCSRD testbed contribution

• Integration of the vCE GUI work

• Integration of BIT's contribution

• Integration of RTVE's “Deployment of a real live event” contribution and the “Remote Production” Conclusion

• Integration of UPM's contribution to MPE, vProbe, vUnpacker and Section 3.4 QoE

• Integration of SiLo’s MAPE contribution

0.6 19/03/2020 IRT • Integration of the Telefónica testbed additions

• Integration of IBM’s contribution to the Mobile Journalist scenario

• Integration of UCL’s contribution to the SS-CNO and O-CNO

0.7 24/03/2020 IRT • Integration of ENG’s contribution to the billing part

0.8 26/03/2020 IRT • Integration of OTE’s minor comments on Section 1 and Section 4.1.2

• Integration of ENG’s minor comments on Section 4.1.1

• Integration of NCSRD's minor comments on Section 4.1.4

• Integration of SiLo’s minor comments on Section 4.1.1

0.9 30/03/2020 IRT • Integration of OTE’s minor comments

• Integration of ENG’s updated figures in Section 3.2.3

0.10 01/04/2020 IRT • Integration of the updated 5G-MEDIA Platform figure (Figure 1)

• Integration of UCL’s minor comments

0.11 06/04/2020 IRT • Integration of UCL’s contribution on the Multi-UC scenario in Section 4.3.2

0.12 08/04/2020 IRT • Integration of UCL’s minor updates

Quality Control

Role Date Who Approved/Comment

Reviewer 28/03/2020 RTVE Approved

Reviewer 06/04/2020 UCL Approved with minor comments.

Page 4: Programmable edge-to-cloud virtualization fabric for the ......Diagram of VNF deployment in TID testbed and connector to OSM..... 70 5G-MEDIA - Grant Agreement number: 761699 D6.3:

5G-MEDIA - Grant Agreement number: 761699 D6.3: 5G-MEDIA Mobile Contribution, Remote and Smart Production Pilot

Page 4 / 126

Disclaimer

This document may contain material that is copyright of certain 5G-MEDIA project beneficiaries and may not be reproduced or copied without permission. The commercial use of any information contained in this document may require a license from the proprietor of that information. The 5G-MEDIA project is part of the European Community's Horizon 2020 Program for research and development and is as such funded by the European Commission. All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability. For the avoidance of all doubt, the European Commission has no liability with respect to this document, which merely represents the authors' view.

The 5G-MEDIA Consortium consists of the following organisations:

No. Participant organisation name Short name Country

01 ENGINEERING – INGEGNERIA INFORMATICA SPA ENG Italy

02 IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD IBM Israel

03 SINGULARLOGIC ANONYMI ETAIREIA PLIROFORIAKON SYSTIMATON KAI EFARMOGON PLIROFORIKIS SILO Greece

04 HELLENIC TELECOMMUNICATIONS ORGANIZATION S.A. - OTE AE (ORGANISMOS TILEPIKOINONION TIS ELLADOS OTE AE) OTE Greece

05 CORPORACION DE RADIO Y TELEVISION ESPANOLA SA RTVE Spain

06 UNIVERSITY COLLEGE LONDON UCL United Kingdom

07 TELEFÓNICA INVESTIGACION Y DESARROLLO SA TID Spain

08 UNIVERSIDAD POLITECNICA DE MADRID UPM Spain

09 INSTITUT FUER RUNDFUNKTECHNIK GMBH IRT Germany

10 NEXTWORKS NXW Italy

11 ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS CERTH Greece

12 NETAS TELEKOMUNIKASYON ANONIM SIRKETI NET Turkey

13 INTERINNOV SAS IINV France

14 BITTUBES GMBH BIT Germany

15 NATIONAL CENTER FOR SCIENTIFIC RESEARCH “DEMOKRITOS” NCSRD Greece


Table of Contents

EXECUTIVE SUMMARY .................................................................................................... 12

1 INTRODUCTION ....................................................................................................... 13

1.1 SCOPE OF 5G-MEDIA .............................................................................................................................. 13

1.2 SCOPE OF THE DOCUMENT .......................................................................................................................... 16

1.3 STRUCTURE OF THE DOCUMENT ................................................................................................................... 16

2 MOBILE CONTRIBUTION, REMOTE AND SMART PRODUCTION PILOT OVERVIEW....... 17

2.1 MOBILE CONTRIBUTION, REMOTE AND SMART PRODUCTION STORYLINE DESCRIPTIONS ......................................... 17

2.1.1 Current situation and where to go from here ................................................................................. 17

2.1.2 “Remote Production” scenario storyline descriptions .................................................................... 21

2.1.3 “Mobile Contribution” scenario storyline descriptions ................................................................... 22

2.2 KEY PERFORMANCE INDICATORS.................................................................................................................. 23

3 TECHNICAL DESCRIPTION ......................................................................................... 26

3.1 5G-MEDIA COMPONENTS USED IN USE CASE 2 ............................................................................................ 26

3.1.1 Cognitive Services ........................................................................................................................... 26

3.1.2 Media Process Engine ..................................................................................................................... 37

3.1.3 vCompression Engine ...................................................................................................................... 37

3.1.4 vProbe ............................................................................................................................................. 42

3.1.5 vUnpacker ....................................................................................................................................... 42

3.1.6 5G-MEDIA App ................................................................................................................................ 42

3.1.7 Splitter ............................................................................................................................................. 43

3.1.8 5G-MEDIA Endpoint ........................................................................................................................ 46

3.1.9 5G-MEDIA Gateway ........................................................................................................................ 46

3.2 5G-MEDIA SERVICES USED IN USE CASE 2 ................................................................................................... 47

3.2.1 Monitor Analyze Plan Execute (MAPE) ........................................................................................... 47

3.2.2 Service Optimization with the Cognitive Network Optimizer ......................................................... 50

3.2.3 Quality of Experience (QoE) Data Gathering for billing model ....................................................... 60

3.2.4 Serverless with Function as a Service (FaaS) .................................................................................. 62

4 VALIDATION ............................................................................................................ 65

4.1 TESTBEDS ................................................................................................................................................ 65

4.1.1 Engineering Ingegneria Informatica (ENG) Testbed ....................................................................... 65

4.1.2 Hellenic Telecommunications Organization (OTE) Testbed ............................................................ 67

4.1.3 Telefónica (TID) Testbed ................................................................................................................. 68

4.1.4 National Centre for Scientific Research “DEMOKRITOS” (NCSRD) Testbed .................................... 71

4.2 DEPLOYMENTS OF THE “REMOTE PRODUCTION” SCENARIO .............................................................................. 73

Page 6: Programmable edge-to-cloud virtualization fabric for the ......Diagram of VNF deployment in TID testbed and connector to OSM..... 70 5G-MEDIA - Grant Agreement number: 761699 D6.3:

5G-MEDIA - Grant Agreement number: 761699 D6.3: 5G-MEDIA Mobile Contribution, Remote and Smart Production Pilot

Page 6 / 126

4.2.1 “Remote Production” scenario — Initial version (1st Cornerstone, CS1.1) ..................................... 73

4.2.2 “Remote Production” scenario with SMPTE ST 2110 (2nd Cornerstone, CS1.2) .............................. 74

4.2.3 “Remote Production” — Proof of Concept (3rd Cornerstone, CS1.3) .............................................. 78

4.2.4 “Remote Production” — Multi-Instance-Scenario (4th Cornerstone, CS1.4) ................................... 88

4.2.5 Description of the testbed updates and the configuration of the VNFs used in the implemented workflows ...................................................................................................................................................... 90

4.3 DEPLOYMENT OF THE “MOBILE CONTRIBUTION” SCENARIO.............................................................................. 93

4.3.1 “Mobile Contribution” scenario — General Demo Workflow (1st Cornerstone, CS2.1) .................. 93

4.3.2 “Mobile Contribution” scenario — Multi-Use Case scenario (2nd Cornerstone, CS2.2) ................ 103

4.4 RESULTS................................................................................................................................................ 106

5 CONCLUSIONS ....................................................................................................... 109

5.1 MAIN ACHIEVEMENTS ............................................................................................................................. 109

5.2 LESSONS LEARNED .................................................................................................................................. 110

5.3 SUMMARY ............................................................................................................................................. 110

6 REFERENCES .......................................................................................................... 112

7 ANNEX................................................................................................................... 113

7.1 CONVENTIONAL PRODUCTION VS. 5G-MEDIA REMOTE PRODUCTION............................................. 113

ABSTRACT .................................................................................................................................................... 113

PROGRAMME DETAILS ................................................................................................................................ 113

RESOURCE ENUMERATION ......................................................................................................................... 113

5G-MEDIA PRODUCTION ENUMERATION ................................................................................................... 114

RESOURCE AND COST COMPARISON .......................................................................................................... 115

CONCLUSIONS AND FURTHER ACTIONS ...................................................................................................... 123

7.2 BROADCAST CONTENT EXCHANGE - PARADIGMS AND APPROACH TO 5G TECHNOLOGY .............................. 124

ABSTRACT .................................................................................................................................................... 124

BROADCAST CONTENT EXCHANGE TAXONOMY ......................................................................................... 124

CONTRIBUTION ........................................................................................................................................... 125

DISTRIBUTION ............................................................................................................................................. 126


List of Figures

Figure 1: 5G-MEDIA Architecture ............................................................................................. 14

Figure 2: Use Case 2: "Mobile Contribution, Remote and Smart Production" Signal flow...... 15

Figure 3: Use Case 2 - Scenarios overview ............................................................................... 20

Figure 4: Use Case 2, “Mobile Contribution” scenario with FaaS architecture ....................... 22

Figure 5: Speech-to-Text Engines of Use Case 2 Remote and Smart Production and “Mobile Contribution” Scenario .......................................................................................... 27

Figure 6: Image/Face Recognition Engine: Design approach ................................................... 29

Figure 7: Image/Face Recognition diagrams for the pre-trained models TinyYOLO v2 (blue) and SSD MobileNet v1 (red); CPU version on the left, GPU version on the right .................................................. 34

Figure 8: vCE GUI screenshot ................................................................................................... 40

Figure 9: vCE GUI scheme ......................................................................................................... 40

Figure 10: GUI of Periscope App and ZDF Reporter App .......................................................... 43

Figure 11: Splitter design and implementation ........................................................................ 44

Figure 12: Involved MAPE services in UC2 ............................................................................... 47

Figure 13: Provision of OpenNebula-NFVI level monitoring data using its XML-RPC API ........ 48

Figure 14: Provision of FaaS-NFVI level monitoring data using the Prometheus API in kubernetes cluster ............................................................................................................... 48

Figure 15: Virtual Compression Engine Day-1, 2, … n configuration through the MAPE services ................................................................................................................................ 50

Figure 16: A3C........................................................................................................................... 52

Figure 17: CNO-RL..................................................................................................................... 56

Figure 18: CNO hierarchical architecture ................................................................................. 58

Figure 19: Billing Data workflow .............................................................................................. 61

Figure 20: QoE range comparison ............................................................................................ 61

Figure 21: QoE range 3.0-4.0 (left table) and QoE range 3.0-4.5 (right table) for premium users ..................................................................................................................................... 62

Figure 22: QoE range 2.0-3.0 (left table) and QoE range 2.0-4.0 (right table) for standard users ..................................................................................................................................... 62

Figure 23: OTE labs infrastructure ............................................................................................ 67

Figure 24: OnLife Infrastructure ............................................................................................... 69

Figure 25: Logical architecture and components of the CTpd project ..................................... 69

Figure 26: Diagram of VNF deployment in TID testbed and connector to OSM ...................... 70


Figure 27: Diagram of how the 5G-MEDIA universe is implemented over the infrastructure ....................................................................................................................... 70

Figure 28: NCSRD spine – leaf network topology..................................................................... 71

Figure 29: NCSRD Data Center ................................................................................................. 72

Figure 30: Use Case 2, “Remote Production” scenario CS1.1 - Screenshot of Ericsson Video Processor GUI ............................................................................................................ 74

Figure 31: Overall Architecture of the “Remote Production” scenario (C1.2) ......................... 75

Figure 32: Sony PDW-F1600 video player ................................................................................ 75

Figure 33: Use Case 2, “Remote Production” scenario CS1.2 - Screenshot of a typical Nevion Virtuoso IP Production Platform GUI ...................................................................... 76

Figure 34: Overall overview of the vCE-Dashboard ................................................................. 77

Figure 35: Overall overview of the Traffic Manager-Dashboard ............................................. 77

Figure 36: Website of RTVE’s radio 3 ....................................................................................... 78

Figure 37: The different locations of the Use Case 2 Matadero Demo ..................................... 79

Figure 38: Simplified Architecture of the “Remote Production” scenario 1a (CS1.3) ............. 80

Figure 39: Internal test to measure NTP clock accuracy in several Android devices............... 81

Figure 40: Audio Video lip sync test with a conventional clapboard ....................................... 82

Figure 41: Real end-to-end time lag measurement ................................................................. 82

Figure 42: PMW-500 Sony camcorder ..................................................................................... 83

Figure 43: Floor layout of Cineteca's Matadero Cultural Space with camera and audio positions ............................................................................................................................... 84

Figure 44: Torrespaña HQ Edge – Technical room on Gallery ................................................. 85

Figure 45: The two dual-input HD-SDI to SMPTE-2110 IP Embrionix SFP gateways attached to a Cisco Catalyst 9300 switch .............................................................................. 85

Figure 46: The event location Matadero in Madrid. On the right the Cineteca building and on the left the building with Telefónica’s point of presence ....................................... 86

Figure 47: Use Case 2, “Remote Production” Multi-Instance Scenario ................................... 89

Figure 48: Overall overview of the vCE-Dashboard of a multi-instance production ............... 90

Figure 49: Use Case 2 – 5G-MEDIA UC2 VMs deployed in OnLife ........................................... 91

Figure 50: Use Case 2.b: mobile contribution. High level description ..................................... 94

Figure 51: UC2.b Orchestration Flow ....................................................................................... 95

Figure 52: Integration of UC2.b Orchestration Flow with CNO ............................................... 99

Figure 53: End-to-end latency in mobile contribution ........................................................... 102

Figure 54: Multi-Use Case-Scenario, O-CNO and SS-CNO interaction ................................... 104


List of Tables

Table 1: Overview of the different mobile units ...................................................................... 18

Table 2: Use Case 2 - KPIs ......................................................................................................... 23

Table 3: Overview of Image/Face Recognition relevant Parameters ...................................... 35

Table 4: Lookup table for Production Profiles with H.264 and H.265 encoding ...................... 39

Table 5: Overview of the splitter functions .............................................................................. 45

Table 6: Lookup table for H.264 Compression Levels .............................................................. 55

Table 7: Engineering testbed components................................................................................ 65

Table 8: Components overview of the OTE testbed ................................................................ 67

Table 9: Overview of the NCSRD testbed infrastructure ......................................................... 72

Table 10: Use Case 2, Multi-Instance scenario – VNF parameters/properties in OnLife ........ 92

Table 11: Overview of the deployed VMs in Telefónica's testbed OnLife for the “Mobile Contribution” scenario ........................................................................................................ 93

List of Messages

Message 1: JSON example of a message received as input ..................................................... 30

Message 2: JSON example of a metadata result of the Recognition processing step, which will be added to the Report object ...................................................................................... 31

Message 3: Lines of an SSA/ASS subtitle file that show an example of a rectangle (vector graphic) around the face of a person labelled as Albert Einstein ...................................... 33

Message 4: JSON example of an "app.vce.metrics" message received as input for the GUI .... 41

Message 5: JSON example of an "ns.instances.demandconf" message sent to the Kafka bus after it is confirmed in the GUI ..................................................................................... 41

Message 6: Permission enquiry from SS-CNO UC2_MC to O-CNO to use one GPU .............. 100

Message 7: Reply message with a confirmation from the O-CNO to the SS-CNO UC2_MC to grant a GPU ................................................................................................................... 100

Message 8: Edge Selector to SS-CNO Message ...................................................................... 101

Message 9: SS-CNO Response Message to Edge Selector ..................................................... 102


Definitions and acronyms

5G-PPP  5G Infrastructure Public Private Partnership
AAA  Authentication, Authorization, Accounting
ALM  Application Lifecycle Management
AVB  Audio Video Bridging
AVC  Advanced Video Coding
CDN  Content Delivery Network
CI  Continuous Integration
CMAF  Common Media Application Format
CNO  Cognitive Network Optimizer
CO  Central Office
CORD  Central Office Re-architected as a Datacentre
CPU  Central Processing Unit
CTpd  Central Telefónica de procesamiento de datos
DASH  Dynamic Adaptive Streaming over HTTP
FaaS  Function as a Service
FPGA  Field Programmable Gate Array
GoP  Group of Pictures
GPU  Graphics Processing Unit
HbbTV  Hybrid broadcast broadband TV
HD  High Definition
HLS  HTTP Live Streaming
HQ  Headquarters
ISR  Integrated Services Router
KPI  Key Performance Indicator
MANO  Management and Orchestration
MAPE  Monitor Analyze Plan Execute
MM  Multimode
MOS  Mean Opinion Score
MPE  Media Process Engine
MPEG  Moving Picture Experts Group
MPEG-TS  MPEG Transport Stream
MSP  Media Service Provider
NFV  Network Function Virtualisation
NFVI  Network Function Virtualisation Infrastructure
NS  Network Service
NTP  Network Time Protocol
O-CNO  Overarching Cognitive Network Optimizer
OCP  Open Compute Project
ODL  OpenDaylight
OIDC  OpenID Connect
ONOS  Open Network Operating System
OSM  Open Source MANO
OTT  Over-The-Top
OW  OpenWhisk
PaaS  Platform as a Service
PD  Progressive Download
PDV  Packet Delay Variation
PNF  Physical Network Function
PoC  Proof of Concept
PSTN  Public Switched Telephone Network
PTS  Presentation Timestamp
QoE  Quality of Experience
QoS  Quality of Service
R-CNN  Region-based Convolutional Neural Network
RO  Resource Orchestrator
RTT  Round Trip Time
SDI  Serial Digital Interface
SDK  Service Development Kit
SDN  Software-Defined Networking
SFP  Small Form-factor Pluggable
SLA  Service Level Agreement
SM  Single Mode
SMPTE  Society of Motion Picture and Television Engineers
SO  Service Orchestrator
SS-CNO  Service-Specific Cognitive Network Optimizer
SVP  Service Virtualisation Platform
TCP  Transmission Control Protocol
TI  Tele-immersive
TVM  Time Varying Mesh
UHD  Ultra High Definition
VAD  Voice Activity Detection
VF  Virtual Function
VIM  Virtualised Infrastructure Manager
VM  Virtual Machine
VNF  Virtual Network Function
VNF-FG  VNF Forwarding Graph
VoD  Video on Demand
vOLT  virtual Optical Line Terminal
vROADM  virtual Reconfigurable Optical Add/Drop Multiplexer
WAN  Wide Area Network
XGS-PON  10 Gigabit-capable Symmetrical Passive Optical Network
YOLO  You Only Look Once


Executive Summary

The 5G PPP 5G-MEDIA Phase 2 project has worked on the design and implementation of a platform that supports media service lifecycle management, providing a holistic solution encompassing mechanisms and tools for the development, testing and continuously optimized deployment of media services.

With Use Case 2 (UC2), “Mobile Contribution, Remote and Smart Production”, a pilot covering several scenarios was set up to validate the capabilities of the platform to flexibly develop, deploy and optimize media service applications.

This deliverable describes and reports the various validation activities and the outcomes of the tests and pilots intended to demonstrate this goal. For the “Remote Production” scenario, different single-instance workflows in the versions 1a, 1b, and 1c were set up, and the different MAPE (Monitor Analyze Plan Execute) services that provide the metrics and allocate the available bitrate with respect to the existing bandwidth were tested. Even more complex scenarios were implemented to test the bitrate allocation under aggravated conditions: a multi-instance scenario, in which workflow implementation 1a runs in parallel with workflow implementation 1c, was implemented for further tests. To complete the “Remote Production” tests, a proof of concept (PoC) was carried out: a remote production with three cameras was set up in Matadero (Madrid, Spain) to broadcast a radio event. This proof of concept of the “Remote Production” scenario was successful. Within the PoC, the end-to-end latency KPI was met with a latency of 10 frames. The intended setup was realized, the live production ran smoothly, and the distribution made it possible to reach RTVE’s viewers via the web. Additionally, these different setups showed that the various KPIs could be met (cf. Section 4.4). With the development of specific production profiles, a lower and an upper bandwidth border was defined for every production profile, based on the codec and the category of the content (cf. Table 4: Lookup table for Production Profiles with H.264 and H.265 encoding). These borders are later used by the CNO (Cognitive Network Optimizer) to select the appropriate compression level for the available bandwidth (cf. Section 3.2.2 of this document), while the lower border guarantees a minimum data rate.
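
As a rough illustration of this selection mechanism, the following minimal Python sketch shows how a CNO-style selector could clamp a stream's target bitrate to the available bandwidth within a profile's borders. The profile values and all function names are hypothetical and not taken from the 5G-MEDIA codebase.

```python
# Hypothetical production profiles: (lower, upper) video bitrate borders in
# Mbps, modelled loosely on the idea behind Table 4 (values illustrative only).
PRODUCTION_PROFILES = {
    ("H.264", "sports"): (10.0, 25.0),
    ("H.265", "sports"): (6.0, 15.0),
}

def select_bitrate(codec: str, category: str, available_mbps: float) -> float:
    """Pick a target bitrate inside the profile borders.

    The lower border guarantees a minimum data rate; the upper border caps
    the demand so that other streams keep their share of the link.
    """
    lower, upper = PRODUCTION_PROFILES[(codec, category)]
    usable = available_mbps / 1.1  # ~10% network-layer overhead (cf. Table 2)
    return max(lower, min(upper, usable))

print(select_bitrate("H.264", "sports", 18.0))  # ~16.4 Mbps target
```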

The other scenario of UC2, “Mobile Contribution”, covers the work of mobile journalists. The cognitive services of the 5G-MEDIA Platform offer media services that enhance their produced content on its way to the broadcaster’s headquarters. Additionally, the UC2 “Mobile Contribution” scenario was tested, and the operation of the O-CNO (Overarching Cognitive Network Optimizer) was validated in a multi-use case (Multi-UC) scenario together with UC1. In this scenario, the O-CNO is in charge of allocating GPUs between the two use cases, with UC1 and UC2 “Mobile Contribution” sharing GPU resources in the same Edge.
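
As a rough sketch of this arbitration (the actual request/grant messages are shown later as Message 6 and Message 7), the Python fragment below models an O-CNO granting GPUs to competing SS-CNOs. The class, method and field names are invented for illustration, not the real 5G-MEDIA interfaces.

```python
# Illustrative sketch of O-CNO GPU arbitration between use cases.
class OCNO:
    def __init__(self, gpus_per_edge: dict):
        self.free = dict(gpus_per_edge)  # e.g. {"edge-1": 1}

    def request_gpu(self, ss_cno_id: str, edge: str) -> dict:
        """Handle a permission enquiry from an SS-CNO (cf. Messages 6/7)."""
        if self.free.get(edge, 0) > 0:
            self.free[edge] -= 1
            return {"requester": ss_cno_id, "edge": edge, "granted": True}
        return {"requester": ss_cno_id, "edge": edge, "granted": False}

ocno = OCNO({"edge-1": 1})
print(ocno.request_gpu("SS-CNO-UC2_MC", "edge-1"))  # granted
print(ocno.request_gpu("SS-CNO-UC1", "edge-1"))     # denied: GPU in use
```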


1 Introduction

1.1 Scope of 5G-MEDIA

Media applications such as “Mobile Contribution, Remote and Smart Production” are among the most challenging applications in terms of QoS, since they demand near real-time delivery and processing of media content across distributed geographical locations. Recent advancements in 5G technologies can provide advantageous solutions for the realization of complex, demanding media-related scenarios, enabling well-timed media content delivery and processing through intelligent and flexible service orchestration.

The 5G PPP 5G-MEDIA Phase 2 project has worked on the design and implementation of a platform that supports media service lifecycle management, providing a holistic solution encompassing mechanisms and tools for the development, testing and continuously optimized deployment of media services. The main innovations of the proposed platform lie in the following parts: (a) the offering of a complete Service Development Kit, including tools for media service validation and emulation, for the testing of optimisation algorithms in an emulation environment, and for the continuous corrective sizing of the resources required after media service deployment; (b) the development of two auxiliary services, the 5G-MEDIA Service Catalogue and the AAA mechanisms, facilitating media service management as well as accounting and billing; (c) the introduction of a multi-hierarchical cognitive network optimizer catering for the continuous optimisation of media services during and after deployment; and (d) the enablement of both traditional and serverless service orchestration, allowing the on-demand deployment of services at run time. Although some of the main innovation aspects of the 5G-MEDIA solution can be applied in other application domains beyond the media industry, the 5G-MEDIA project has focused on the validation and evaluation of the solution in the media vertical, providing promising results for the value and use of the platform (cf. Figure 1).


Figure 1: 5G-MEDIA Architecture


The 5G-MEDIA approach has been tested in three different media-related use cases validating the capabilities of the platform to flexibly develop, deploy and optimize media service applications. One of these use cases is Use Case 2 (UC2): “Mobile Contribution, Remote and Smart Production”.

Figure 2: Use Case 2: "Mobile Contribution, Remote and Smart Production" Signal flow

The goal of UC2: “Mobile Contribution, Remote and Smart Production” in the 5G-MEDIA project was to examine how professional remote and smart broadcast productions can benefit from the advancement in 5G technology today and in the future. This use case enables the remote production of an event without the need for dedicated infrastructure to be specifically deployed at the event venue. In this context, cameras and audio equipment at the venue are connected via a 5G network to media production applications deployed and orchestrated in the Cloud. Another variation of this use case considers the streaming of live events via smartphones or tablets by spectators or journalists. The role of the 5G-MEDIA Platform in this use case is to ensure that the media processing functions are efficiently deployed in the cloud infrastructure, enabling the low latency and high throughput required by live streaming and media processing. This is achieved by using the 5G-MEDIA Service MAPE to optimize the bitrate/compression levels of media streams and ensure QoE for the realization of the remote production scenario. In addition, this use case demonstrates the use of FaaS orchestration by automatically deploying media services at the start of a mobile contribution session.
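
How the Service MAPE drives this optimisation can be pictured as a classic control loop. The sketch below is a generic Python skeleton under assumed interfaces; the four callables are placeholders, not the actual 5G-MEDIA MAPE services.

```python
import time

def mape_loop(monitor, analyze, plan, execute, period_s: float = 5.0) -> None:
    """Generic Monitor-Analyze-Plan-Execute loop (illustrative only).

    monitor()        -> metrics dict (e.g. measured bandwidth, packet loss)
    analyze(metrics) -> state (e.g. "bandwidth below profile border")
    plan(state)      -> action or None (e.g. a new compression level)
    execute(action)  -> applies it (e.g. reconfigure the vCE via the bus)
    """
    while True:
        metrics = monitor()
        state = analyze(metrics)
        action = plan(state)
        if action is not None:
            execute(action)
        time.sleep(period_s)
```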


1.2 Scope of the document

This deliverable reports the outcomes of the experimentations and pilots with applications using 5G-MEDIA for (mobile) contribution as well as for remote and smart production. It is a follow-up to Deliverable D6.1, Section 3 [1]. It reports on the work done regarding the realization, setups, testing, demos and pilots with applications using the 5G-MEDIA Application Platform for the “Mobile Contribution, Remote and Smart Production Pilot”. At its core, this is a review of the Key Performance Indicators (KPIs). The KPIs were described in Deliverable D2.2 [2] from a general point of view. In Deliverable D6.1 [1] they were mapped to the UC2 scenarios and specific testbeds. Furthermore, minimum requirements for the KPIs were defined (cf. Section 2.2, Table 2). This document also provides the outcomes of tests and pilots.

1.3 Structure of the document

This deliverable is structured as follows.

Section 1 provides the general scope of 5G-MEDIA and of this document and gives an overview of the document's structure. Section 2 provides an overview of UC2: “Mobile Contribution, Remote and Smart Production Pilot”. This is realized by providing a summary and recap of the pilot storylines for the “Remote Production” and “Mobile Contribution” scenarios, in terms of main goals and media-related objectives, concluded by a summary report of the KPI assessment. Section 3 describes the technical details of the UC2 pilot, including its media service application view and how the 5G-MEDIA Platform components have been integrated and used for media service orchestration and cognitive optimisation in the “Remote Production” and “Mobile Contribution” scenarios. The integration of the FaaS paradigm and the accounting and billing part are also covered. Section 4 provides the details of the various pilot validation activities performed for the “Remote Production” and “Mobile Contribution” scenarios; it explains the different testbeds used, the deployment of each demo, and the results. Section 5 completes the UC2 testing/activities and provides some closing remarks. Section 6 lists the references used in this document. Section 7 closes the document with an Annex, which provides additional, detailed information on broadcast contribution and contains material referred to in the previous Sections in full.


2 Mobile Contribution, Remote and Smart Production Pilot overview

This Section provides the fundamentals of UC2 and is structured as follows. Section 2.1 gives general information about the current situation of broadcast productions as well as a summary and recap of the pilot storylines for the “Remote Production” and “Mobile Contribution” scenarios. Section 2.2 closes this Section with a table on the KPI assessment.

2.1 Mobile Contribution, Remote and Smart Production storyline descriptions

2.1.1 Current situation and where to go from here

Public service broadcasters have the task of informing, educating, advising and entertaining the public. In Germany they are intended to "meet the democratic, social and cultural needs of society", as stated in the Broadcasting Act, the so-called State Broadcasting Treaty. With their program they should offer topics on education, information, advice and entertainment. Entertainment should also correspond to a public service profile.

To fulfil these tasks, broadcasters use different production methods, such as studio and outside broadcast productions. Outside broadcast productions in particular rely on various mobile units: Outside Broadcasting Vans (OB-Vans) weighing more than 7.5 t, equipped with several cameras, tripods, all sorts of cables, a fully operative control room as used in studio productions, and transmitting devices such as transportable satellite antennas; and smaller OB-Vans, also called SNGs (Satellite News Gathering vehicles), ranging from the size of a Mercedes Sprinter, equipped with two to three cameras and lightweight broadcasting equipment, down to the size of a Mercedes Smart, equipped with one camera and a transportable satellite antenna to enable direct delivery of the stream to the broadcaster. The smallest mobile units are reporter backpacks like LiveU that use bonded uplink techniques, or tools like iPads/iPhones with additional hardware, as used by mobile journalists (cf. Table 1: Overview of the different mobile units).


Table 1: Overview of the different mobile units

Mobile unit 1: OB-Truck

RTVE's mobile studio at the UEFA Champions League 2014 ©RTVE¹

Description: OB-Vans are the largest mobile units on the broadcaster side. These units can be extended laterally so that there is space for numerous workplaces. Since they are no longer vehicles with just “transmission” equipment, but rather completely mobile video and audio control rooms, we refer to them here as “MPs” (mobile production facilities). In MPs of the latest generation, up to around 30 employees find their workstations, processing up to 16 HD camera signals and working with up to two slow-motion units.

Types: Trucks > 7.5 t up to lorries with trailers

Production team size: 7 - 30 persons

Operational area: Big-sized productions

¹ Source: www.svgeurope.org (UEFA Champions League 2014 coverage)


Mobile unit 2: OB-Van / SNGs

2a) 2b)

© Studio Hamburg Media Consult International

(MCI) GmbH

© Raimond Spekking / CC BY-SA 4.0 (via Wikimedia Commons), Smart OB van (Smart-Übertragungswagen) of the WDR

Description: 2a) Primarily used as an uplink vehicle during live broadcasts. Equipped with lightweight broadcasting equipment, it can also serve as a video control room for mobile productions. 2b) Primarily used as an uplink vehicle with a transportable satellite antenna during live broadcasts.

Types: 2a) Mercedes Sprinter; 2b) Smart

Production team size: 2a) 2-3 persons; 2b) 1 person

Operational area: 2a) from larger medium-sized down to medium-sized productions; 2b) smaller medium-sized productions

Mobile unit 3: Mobile Journalist Units

3a) 3b)

© Studio Hamburg Media Consult International (MCI) GmbH

© http://www.film-tv-video.de/newsdetail.html

Description: 3a) Backpack units that use bonded uplink techniques to deliver the produced content to the broadcaster. 3b) iPad or iPhone units with a rig and extensions for light, sound, lens, etc., and an app for photo/video editing, file sync and upload.

Types: 3a) Backpacks; 3b) iPads and iPhones

Production team size: 3a) 2 persons; 3b) 1 person

Operational area: 3a) Small-sized productions; 3b) Small-sized productions


Under the scope of 5G-MEDIA, UC2 examines how professional remote and smart broadcast productions can benefit from the advancement in 5G technology today and in the future.

Therefore, UC2 examined two production methods, “Remote Production” and “Mobile Contribution”, in different scenarios (cf. Figure 3).

Figure 3: Use Case 2 - Scenarios overview

The scenarios in the “Remote Production” branch aim to replace today's common broadcast productions of events, which are characterized by large teams required on location, one or several OB-Vans (cf. Table 1: Overview of the different mobile units, examples 1, 2a and 2b), and long preparation times for the placement and adjustment of audio and video equipment. Another time-consuming part is the set-up and operation of a control room for the audio and video engineers as well as the directing team. Moreover, steadily rising cost pressure and complexity force broadcasters to look for new, low-cost and time-saving production methods like remote and smart production. In a remote production, the control room at the broadcaster's facilities is used; therefore, less equipment and crew need to be present on site during the production process. Today, dedicated connections are established between the event location and the broadcasting center to guarantee the required high performance and quality of the transmission. Smart mobile production, as part of the “Mobile Contribution” branch, on the other hand addresses the needs of mobile reporters in the field who need to transfer content to the studio while fulfilling the high broadcasting requirements. Due to slow Internet connections and bad reception conditions, such streams often have poor or unreliable quality and cannot be broadcast. Another aspect of “Mobile Contribution” covers the enhancement of the content by adding Cognitive Services to the workflow.

5G-MEDIA aims to overcome the limitations posed today on traditional broadcast productions by implementing orchestrated mobile contribution, remote and smart production over 5G networks for low-latency and high-bandwidth media streaming. 5G-MEDIA enables remote productions from anywhere without the need for dedicated infrastructure to be specifically deployed for the event. Cameras and audio equipment at the venue are connected via a 5G network to media production applications deployed and orchestrated by the 5G-MEDIA Service Virtualisation Platform (SVP) to ensure that the media processing functions are embedded within the network and cloud infrastructure, enabling the low latency and high throughput required by live streaming and media processing. More details can be found in Deliverables D2.2 [2] and D6.1 [1].

2.1.2 “Remote Production” scenario storyline descriptions

All the scenarios described for the “Remote Production” branch are based on the following basic storyline.

An outside live event takes place that is going to be covered by the local broadcaster. A camera crew is sent out to capture the event, and the broadcaster will produce a live stream of the event and distribute it (e.g. via its web channel). Three cameras capture the event, sending out three typical broadcast signals in HD-SDI in the format 720p50. To feed these signals into the network, a converter/gateway (Physical Network Function, PNF) is used. The gateway converts the signals from HD-SDI to IP; then they are fed into the 5G-MEDIA network (Network Function Virtualisation Infrastructure, NFVI). Inside the network the media streams are processed by different media-specific Virtual Network Functions (VNFs), which are instantiated upon request from the 5G-MEDIA catalogue and launched/onboarded (automatically) in the nearest/most appropriate NFVI (5G-MEDIA network) via the 5G-MEDIA MANO (Management and Orchestration), which gives directions to the NFVI's VIM (Virtualised Infrastructure Manager). At first, the video signals are routed to the Media Process Engine (MPE), a virtual video switcher responsible for switching/mixing the input signals and creating the final broadcast signal. The switching is done on the basis of preview video streams that are transferred back (via the Internet) to the broadcaster/studio where the director is situated. Based on those streams, the director decides which signal is to be broadcast, in line with the editorial content and creative style of the program. That information is sent back to the MPE in the form of control signals, and those decisions are then carried out by the MPE. After the MPE, the final signal is sent to the vCompression Engine, which is responsible for compressing and encoding the signal to a lower-bitrate stream suitable for Wide Area Network (WAN) transfer and distribution (over the public Internet). Finally, the stream is sent to a “Speech-to-Text Engine” (Cognitive Service) where the spoken word is converted into text, which is then added to the video stream as subtitles. This final stream can then be accessed via a browser (through the Internet), either by the broadcaster for further in-house use as a contribution signal, or by the public as an online service offered by the broadcaster (cf. Deliverable D6.1 [1]).
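
To summarise this chain, the following Python sketch threads the camera feeds through the VNFs in processing order. It is purely illustrative: in reality the VNFs are separate network functions instantiated by the 5G-MEDIA MANO, not local function calls, and all names and values here are invented.

```python
# Purely illustrative model of the "Remote Production" VNF chain.
def mpe(feeds: dict, on_air: str) -> str:
    """Media Process Engine: the director's control signal selects the feed."""
    return feeds[on_air]

def vcompression_engine(signal: str, bitrate_mbps: float) -> dict:
    """vCompression Engine: encode the final signal for WAN transfer."""
    return {"signal": signal, "codec": "H.264", "bitrate_mbps": bitrate_mbps}

def speech_to_text(stream: dict) -> dict:
    """Cognitive Service: generate subtitles and attach them to the stream."""
    stream["subtitles"] = "(generated subtitles)"
    return stream

# Three 720p50 HD-SDI camera signals, converted to IP by the gateway (PNF).
feeds = {f"cam{i}": f"720p50 feed {i}" for i in (1, 2, 3)}
final_stream = speech_to_text(vcompression_engine(mpe(feeds, "cam2"), 15.0))
```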


2.1.3 “Mobile Contribution” scenario storyline descriptions

Scenario 2, “Mobile Contribution” (cf. Figure 4) concentrates on the news gathering aspect.

Figure 4: Use Case 2, “Mobile Contribution” scenario with FaaS architecture

An unexpected event, like the fire at Notre-Dame in Paris, happens. Instead of sending out an OB-Van, which due to the firefighting operations could not get close enough to the scene anyway, the broadcaster sends a mobile journalist. He takes his bag with the additional broadcast equipment, including smartphones, and can get there on foot. When he arrives, the mobile journalist takes his smartphone and starts his 5G-MEDIA App. He configures the basic information, such as the broadcaster address and the Cognitive Service VNFs, like “Speech-to-Text” or “Image/Face Recognition”, that he wants to use to enrich his content. Once he has done this, he presses the video button and starts streaming to the broadcast center.

In the background, the App connects to the 5G-MEDIA network for live streaming video and/or audio to the broadcasting center. The signal is compressed and encoded on the smartphone for transmission. Then the stream passes through a Cognitive Service function, triggered using FaaS, where it is enriched with additional information. In our case a face recognition service is applied that tags and identifies people in the video and provides the broadcaster with supplemental information and metadata from a database for further value-added services to enhance the viewing experience. At the network edge near the receiver, a vCompression Engine decodes and decompresses the stream for further use and processing at the broadcaster (cf. Deliverable D2.2, Sections 4.1.3.2 and 4.4.1.1 [2]).
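
Since the Cognitive Service is triggered via FaaS, and the platform's serverless layer builds on Apache OpenWhisk (OW), such a function could look roughly like the Python action below. OpenWhisk Python actions receive and return JSON-serialisable dicts; the recognize_faces() helper and all field names are hypothetical.

```python
# Sketch of an OpenWhisk-style Python action enriching a stream segment.
def recognize_faces(segment_url: str) -> list:
    """Placeholder for the actual Image/Face Recognition inference call."""
    return [{"label": "unknown person", "box": [0, 0, 64, 64]}]

def main(params: dict) -> dict:
    """Entry point of an OpenWhisk Python action."""
    segment_url = params.get("segment_url", "")
    tags = recognize_faces(segment_url)
    # The enriched metadata would be forwarded to the broadcaster, e.g. as
    # a message on the platform's Kafka bus, for value-added services.
    return {"segment_url": segment_url, "metadata": tags}
```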


2.2 Key Performance Indicators

KPIs are the cornerstones around which the benchmarking of the system during the testing phase is planned. To keep this document readable, this Section presents the list of Key Performance Indicators (KPIs) that were mapped to the UC2 scenarios and specific testbeds in Deliverable D6.1 [1] (cf. Table 2). For further details, please consult the respective documents.
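
Several of the KPIs in Table 2 below call for application-level latency measurements. As a minimal sketch (not the professional measurement equipment used in the pilots), a one-way latency probe can embed the sender's wall-clock time in a UDP packet and compare it at the receiver; this assumes both hosts are NTP-synchronised.

```python
import socket
import struct
import time

def send_probe(addr: tuple) -> None:
    """Sender: embed the current wall-clock time in a UDP probe packet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(struct.pack("!d", time.time()), addr)

def recv_probe(port: int) -> float:
    """Receiver: return the one-way latency of one probe in milliseconds.

    Only meaningful if sender and receiver clocks are NTP-synchronised;
    otherwise measure the round-trip time (RTT) instead.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        data, _ = sock.recvfrom(64)
        (t_sent,) = struct.unpack("!d", data)
        return (time.time() - t_sent) * 1000.0
```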

Table 2: Use Case 2 - KPIs

5G-MEDIA KPI

Definition Specific Requirements Measurement/Assessment

Guaranteed user data rate (end-to-end)

Guaranteed/minimum bit rate or connection bandwidth to allow for broadcast quality video transfer/streaming in contribution. The quality has to be good enough to allow for later processing within the production chain.

The required bandwidth depends on the used video format/codec and the exact parameters specified.

• Uncompressed HD (1080i50) 1.5 Gbps

• H.264/AVC-Intra 25 Mbps

• H.265/HEVC 15 Mbps

(Listed here are pure video data rates per stream; for the resulting data rates on the network layer, add 10% overhead, e.g. a 25 Mbps H.264/AVC-Intra stream corresponds to roughly 27.5 Mbps on the wire.)

• Subjective Expert Viewing to assess video quality

• Measurement of the utilization of links at various points inside the network

• Measurement at the end points with test generators and professional network and video measurement equipment


KPI: End-to-end latency

Definition: Maximum tolerable time a packet needs from the sender to the receiving side. This includes the infrastructure, the time needed for the uplink, any necessary routing in the infrastructure, and the downlink.

Specific Requirements: End-to-end latency can be distinguished into pure network latency, comprising the latency of the whole network path excluding end devices on-site, and signal-transport latency, which comprises the latency of the whole signal path including the processing of end devices on-site.

• Network latency (RTT): <= 50 ms
• Signal transport latency (one way):
  (a) Audio/video: t <= 500 ms
  (b) Intercom: t <= 100 ms (according to ITU G.114)
  (c) Control: t <= 50 ms

Measurement/Assessment:

• Measurement at the end points with test generators and professional network and video measurement equipment
• Measurement of the latency inside the network
• Application-level latency measurements

KPI: Service deployment time

Definition: Duration required for setting up end-to-end logical network slices characterized by the respective network-level guarantees (such as bandwidth guarantees, end-to-end latency, reliability, etc.) required for supporting the media services.

Specific Requirements:

• Service deployment: t <= 5 min
• For immediate access (breaking news event) a deployment time well below 5 minutes is targeted

Measurement/Assessment:

• Launch virtual functions, services or workflows and measure the deployment time until full functionality is reached.

KPI: Service reliability

Definition: Maximum tolerable packet loss rate at the application layer within the maximum tolerable end-to-end latency for that application.

Specific Requirements: An error-free/lossless transport of signals is targeted, as signal loss and errors are not tolerable in live broadcast productions.

• Max. packet-loss rate < 10^-12

Measurement/Assessment:

• Subjective expert viewing
• Measurement at the end points with test generators and professional network and video measurement equipment
• Measurement of the packet loss rate of links at various points inside the network.


KPI: Service and SLA monitoring

Definition: Ability to define service and network metrics to monitor performance.

Specific Requirements: Full traceability of the microservice components throughout their lifecycle, even when placed on/migrated to nodes administered by different actors. Automatic negotiation and monitoring of specific SLAs between different actors.

Measurement/Assessment:

• Compare predefined SLAs with real conditions and values.

KPI: (Virtualization infrastructure) scalability

Definition: Ability to support, seamlessly instantiate, migrate and up-/downscale media-related virtualized services over different NFVIs, with various virtualized functions.

Specific Requirements: Fast scalability of network resources, bandwidth and virtual media functions is needed, especially in ad-hoc live productions that happen spontaneously or on short notice. Adaptation of the virtualization infrastructure to the size of the media coverage: from one camera up to 20 or more cameras and other elements like microphones.

Measurement/Assessment:

• Measure the time for upscaling/downscaling of resources and/or the time for instantiating new functions

KPI: QoS

Definition: Ability to provide different priority to different applications, users, or data flows, or to guarantee/reserve a certain level of performance/resources for a data flow.

Specific Requirements: Classification and prioritization of audio/video streams. Due to the high requirements on latency, jitter, errors and bandwidth, media streams have to be prioritized in the network at all times.

Measurement/Assessment:

• Measurement at the end points with test generators and professional network and video measurement equipment
• Measurement of the packet loss rate of links at various points inside the network.


3 Technical Description

3.1 5G-MEDIA components used in Use Case 2

This Section provides an overview of the Virtual and Physical Network Functions that were used in UC2.

3.1.1 Cognitive Services

During the 5G-MEDIA project, two different cognitive services, the “Speech-to-Text Engine” and “Image/Face Recognition”, were developed.

Speech-to-Text Engine

The “Speech-to-Text Engine” (vSpeech) belongs to the Media Network Functions.

It is a media-specific function based on deep learning technologies that recognizes speech within any audio and video material and converts it into a human-readable text format, either for offline use or in real time. Supported input signals include both raw and encoded audio and video files as well as streams based on one of the following input protocols: SRT, RTP, RTMP, RTSP, UDP and TCP. The text derived from the input stream can either be rendered directly into the output stream, e.g. as subtitles, output as a metadata file based on the W3C Video Text Tracks (VTT) specification2, or provided separately as metadata via appropriate (real-time) interfaces such as an HTTP RESTful API, WebSockets or Webhooks. The list below summarizes the most important features:

• Integration of Mozilla DeepSpeech3 as an open-source implementation based on Baidu's Deep Speech research paper4 and Google's TensorFlow Framework5.

• Support of both Google Speech API and Mozilla DeepSpeech.

• Support for real-time speech recognition

• GPU support for real-time speech recognition

• A graphical user interface with integrated low-latency video player for the playback of live streams and support for real-time subtitling

• Support of different output formats including JSON, VTT and pre-rendered video with subtitles

• Provides several output interfaces including the HTTP Restful API, WebSocket and Webhooks

2 Source: https://www.w3.org/TR/webvtt1/
3 Source: https://github.com/mozilla/DeepSpeech
4 Source: https://arxiv.org/abs/1707.07413
5 Source: https://www.tensorflow.org/


Since the release of Deliverable D4.2 [3], the Speech-to-Text Engine has been refined and further developed into two different variants, one for the UC2 “Remote and Smart Production” scenario and the other for the UC2 “Mobile Contribution” scenario.

The main reason for this decision was to support serverless architectures where the function to be executed is essentially limited to exactly one task. The requirements on the part of the media-specific VNF included a headless design, GPU support, a strict non-use of voice activity detection (VAD) and the abstraction of the actual speech recognition and all other logic based on pure high-level programming languages such as Python or JavaScript.

Regarding VAD, it should be noted that UC2 “Remote and Smart Production” can basically be assumed to be a live production, whereas in the case of a mobile contribution the A/V material is often only released for further processing on demand, i.e. after receipt of the journalist's contribution. This results in two different scenarios, one for an offline and one for a live production. Due to technical limitations of the open-source framework Mozilla DeepSpeech, no VAD may be included in the signal flow in the case of an offline production; otherwise, the metadata cannot be created properly. For the “Mobile Contribution” case, the VAD was therefore deliberately omitted, whereas it has been added to the Speech-to-Text Engine of the UC2 “Remote and Smart Production” scenario for an optimized speech-recognition performance.

In addition, the input formats and interfaces differ between the two Speech-to-Text variants. While the primary implementation expects A/V material via SRT, RTP, RTMP, RTSP, UDP or TCP, the serverless form of the Speech-to-Text Engine requires a raw mono audio stream with a sampling rate of 16000 Hz to be provided via the integrated WebSocket server.
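As a minimal illustration of this serverless interface, the following sketch feeds raw 16000 Hz mono PCM audio into the engine's WebSocket server. The host name and chunk size are assumptions for illustration; the port mirrors the VSPEECH_PORT default listed later in Table 5, and FFmpeg performs the extraction and resampling.

import asyncio
import subprocess

import websockets  # pip install websockets

VSPEECH_WS = "ws://vspeech.example.net:8885"  # hypothetical host; port as in Table 5
SOURCE = "srt://reporter.example.net:9000"    # any FFmpeg-readable A/V source

async def stream_audio():
    # FFmpeg drops the video (-vn) and resamples the audio track to raw
    # 16-bit little-endian PCM, 16000 Hz, mono, written to stdout.
    ffmpeg = subprocess.Popen(
        ["ffmpeg", "-i", SOURCE, "-vn", "-f", "s16le",
         "-ar", "16000", "-ac", "1", "pipe:1"],
        stdout=subprocess.PIPE)
    async with websockets.connect(VSPEECH_WS) as ws:
        # 3200 bytes = 1600 samples = 100 ms of audio at 16 kHz
        while chunk := ffmpeg.stdout.read(3200):
            await ws.send(chunk)

asyncio.run(stream_audio())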

An overview of the refinement and further development, as well as the differences between the two Speech-to-Text variants, can be seen in Figure 5.

Figure 5: Speech-to-Text Engines of the Use Case 2 “Remote and Smart Production” and “Mobile Contribution” scenarios

For further details please consult the Deliverable D4.2, Section 7.2.1.4 [3].


Image/Face Recognition

The Image/Face Recognition Engine (vDetection) belongs to the Media Network Functions.

It is a media-specific function that detects and recognizes objects, such as faces, within a given image. Typical steps in machine learning for computer vision are to localize, classify and recognize objects in images. Well-known use cases like object detection or face recognition are combinations of these steps; video analysis is a continuous application of these steps to a sequence of images. In addition to the indicator of how accurately an object or face has been recognized, the time factor also plays a significant role: depending on the use case, images must be analyzed in real time, or the analysis may take a little more time to achieve higher accuracy. The current state of research is formed by the two model families Region-based Convolutional Neural Network (R-CNN) and You Only Look Once (YOLO); the latter is designed for real-time analysis and usually achieves a lower accuracy than the R-CNNs. The Engine uses the TensorFlow framework for JavaScript with pre-trained models for face detection. The Engine can start with one of two models: SSD (Single Shot Multibox Detector) based on MobileNetV1, as part of the R-CNN family, or Tiny Yolo V2 as a lightweight YOLO approach. The Tiny Yolo V2 Tiny Face Detector is a very performant, real-time face detector, which is much faster, smaller and less resource-consuming compared to the SSD MobileNet V1 face detector; in return it performs slightly less well on detecting small faces. For face-recognition purposes, the detected faces and their metadata need to be processed a second time with a model trained for "known" faces. Conceptually, the Engine looks up images of known persons within a pre-defined subfolder on start-up and trains its own face-recognition model with these given images; the filenames of the images provide the labels. In the following, the basic design and its parts are presented.


Figure 6: Image/Face Recognition Engine: Design approach

Design

The Image/Face Recognition Engine is built up of five different main (grouped) parts, which are chained together and where each fulfils a particular task (cf. Figure 6). These are:

• Input: Describes the input source, which is an image (image buffer stream) with its corresponding metadata and timestamp. The Image/Face Recognition Engine starts a WebSocket server and awaits messages in JSON format containing the image as an image buffer, the metadata with information about the resolution of the image and, if the image was extracted from a video, the frames per second of the video, the resolution of the video and the presentation timestamp (PTS) at which the image was extracted from the video, as well as the timestamp at which the message was sent by a connected WebSocket client. Each image buffer is converted into an image as a tensor2d model for further processing. The following JSON shows an example of a message received as input; a hedged client sketch that produces such messages is given after Message 3 below.


Message 1: JSON example of a message received as input

{
  timestamp: 1582977600,
  metadata: {
    id: "12700",
    fps: 25,
    pts_time: 12.7,
    image_width: 320,
    image_height: 180,
    video_width: 1920,
    video_height: 1080
  },
  buffer: ….
}

• Processing Detection: Each tensor2d image is processed in the sense of object detection based on the pre-trained model for face-detection. The resulting metadata is passed on for further processing.

  o The metadata is passed on to Processing Recognition.
  o The metadata is passed on to Handler Add to Report.

• Processing Recognition: The metadata resulting from the processing step Detection can contain a list of detections; each detection is processed in the sense of object recognition based on the engine's own pre-trained model for face-recognition, which was trained on start-up of the function. The resulting metadata is passed on for further processing.

  o The metadata is passed on to Handler Add to Report.

The following JSON shows an example of a metadata result of the processing step Recognition which will be added to the Report object (JSON). A detection in the list of results contains the detection with the coordinates where the object (face) was detected, the landmarks of the detected face, the descriptor comparing the match possibility against the engine's own pre-trained model of persons of interest, some further data like age, gender and face expressions, and the best match of all recognitions as label.


Message 2: JSON example of a metadata result of the processing Recognition step which will be added to the Report object

"12700": {

id: 12700,

ts: 637,

metadata: {

id: 12700,

fps: 25,

pts_time: 12.7,

image_width: 320,

image_height: 180,

video_width: 1920,

video_height: 1080

},

results: [

{

detection: {

….

},

landmarks: {

….

},

descriptor: [

….

],

gender: "male",

genderProbability: 0.8751861229538918,

age: 38.889286041259766,

expressions: {

neutral: 0.9488733410835266,

happy: 0.0004964579711668193,

sad: 0.011710616759955883,

angry: 0.03139818459749222,

fearful: 0.000050215581723023206,

disgusted: 0.005097310524433851,

surprised: 0.0023738357704132795

},

bestMatch: {

_label: "AlbertEinstein",

_distance: 0.535172176188348

}

]

}


• Handler Add to Report: The metadata resulting from either the processing step Detection or Recognition is added to a Report together with the corresponding metadata of the image. If configured, the Report is stored locally on the filesystem as a JSON file.

• Handler Convert to SSA/ASS: The Report is converted into a subtitles file following the SSA/ASS specification6, which is sent to a predefined endpoint. The SSA/ASS format is an alternative specification to W3C WebVTT for subtitles which allows additional commands for drawing vector graphics. The advantage of this approach is that media players natively support SSA/ASS as well as W3C WebVTT; they render the commands on demand, so the user can decide whether to display detections/recognitions or not. The following lines of an SSA/ASS subtitles file show an example of a rectangle (vector graphic) drawn around the face of a person labelled as Albert Einstein.

6 Source: https://www.matroska.org/technical/specs/subtitles/ssa.html


Message 3: Lines of an SSA/ASS subtitles file that show an example of a rectangle (vector graphic) around the face of a person labelled as Albert Einstein

[Script Info]
Title: object detections
Original Script: BitTubes GmbH
ScriptType: v4.00+
Collisions: Normal
PlayResX: 1920
PlayResY: 1080
PlayDepth: 0
Timer: 100.0000

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Detection, Arial, 12, &HFF000000, &H66000000, &H00B4FCFC, &HFF000000, 0, 0, 0, 0, 100, 100, 0, 0, 1, 1, 0, 10, 0, 0, 0, 0
Style: Label, Arial, 24, &H00B4FCFC, &H00B4FCFC, &H00000008, &H80000008, -1, 0, 0, 0, 100, 100, 0, 0, 1, 1, 2, 10, 20, 20, 20, 0
Style: Subtitle, Arial, 24, &H00B4FCFC, &H00B4FCFC, &H00000008, &H80000008, -1, 0, 0, 0, 100, 100, 0, 0, 1, 1, 2, 2, 0, 0, 100, 0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 1, 0:0:12.70, 0:0:12.73, Detection, NTP, 0000, 0000, 0000, , {\pos(319, 61)\p1}m 0 0 l 52 0 l 52 81 l 0 81 {\p0}
Dialogue: 1, 0:0:12.70, 0:0:12.73, Label, NTP, 0000, 0000, 0000, , Gender: male (87%)\NAge: 38.89\NExpressions: neutral\NLabel: AlbertEinstein (0.53)
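To make the input side of this design concrete, the following hedged sketch shows a WebSocket client that submits one extracted frame to the engine in the shape of Message 1. The host name and the base64 encoding of the image buffer are assumptions for illustration; the port is the VDETECTION_PORT default from Table 5, and the id is derived from the PTS in milliseconds, matching the values in Message 1.

import asyncio
import base64
import json
import time

import websockets  # pip install websockets

VDETECTION_WS = "ws://vdetection.example.net:9995"  # hypothetical host

async def send_frame(jpeg_bytes, pts_time, fps=25):
    message = {
        "timestamp": int(time.time() * 1000),
        "metadata": {
            "id": str(int(pts_time * 1000)),  # cf. Message 1: id "12700" at pts 12.7 s
            "fps": fps,
            "pts_time": pts_time,
            "image_width": 320, "image_height": 180,
            "video_width": 1920, "video_height": 1080,
        },
        # The wire encoding of the raw image buffer is an assumption:
        "buffer": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
    async with websockets.connect(VDETECTION_WS) as ws:
        await ws.send(json.dumps(message))

with open("frame.jpg", "rb") as f:
    asyncio.run(send_frame(f.read(), pts_time=12.7))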

For better integration of TensorFlow into the JavaScript runtime environment Node.js, the Node.js package/extension face-api.js7 was used, which provides a TensorFlow core-binding for Node.js. Compared to a previous version, the framework was changed from OpenCV to TensorFlow. This decision was based on the insight of a leaner implementation and better support for GPU-accelerated image processing under containerized runtime environments like Docker. Previously, under the OpenCV framework, face-api.js was also used as a Node.js package/extension, together with the extension opencv4nodejs8 as binding between Node.js and OpenCV. The interaction of these extensions with OpenCV as a native library under Linux required more resources and computing power compared to TensorFlow. Based on the face-api.js framework, the output format of detections and recognitions does not change. As a result, this service provides computer-vision-specific metadata derived from the detected and recognized images/faces. This metadata can be provided as JSON via WebSockets or as a subtitles file following the SSA/ASS specification, an alternative to W3C WebVTT which allows drawing vector graphics via commands.

7 Source: https://github.com/justadudewhohacks/face-api.js

The new implementation of the Image/Face Recognition Engine based on TensorFlow supports GPU-accelerated processing for more performant, real-time processing. With GPU-based processing, the Image/Face Recognition Engine achieved almost twice the performance of standard CPU processing with the same video. Face detection and recognition at 10, 15, 20, 25 and up to 30 images per second with GPU support were analyzed in real time (cf. Figure 7). Furthermore, GPU-supported processing allows selecting the more accurate R-CNN model SSD MobileNet V1, with better results in face detection and face recognition, at the same processing time as the YOLO model Tiny Yolo V2 (cf. Figure 7, compare the red and the blue lines).

8 Source: https://github.com/justadudewhohacks/opencv4nodejs

Figure 7: Image/Face Recognition diagrams for the pre-trained models TinyYOLO v2 (blue) and SSD MobileNet v1 (red); CPU version on the left, GPU version on the right. Each diagram shows 5 runs of the Image/Face Recognition Engine at 10, 15, 20, 25 and 30 frames per second (x-axis) and the corresponding processing time (y-axis). The left diagram shows CPU processing with a linear increase of processing time with increasing frames per second, whereas on the right the GPU-supported processing time stays nearly constant and only slightly increases at 30 fps.

Configuration

To configure the function, it needs at least the PORT of its WebSocket server to receive messages containing the images to process. Optional parameters concern the service itself, such as the timeout (how long after the last processed image the Report or subtitles file will be sent), the address of the endpoint to send to, the selection of the algorithm and the thresholds for face-detection and -recognition, and whether to use GPU support or not. The following table shows all the relevant parameters:

Table 3: Overview of Image/Face Recognition relevant Parameters

Parameter name Parameter description

DEBUG Flag whether to output the LOG or not

STREAM_TIMEOUT Timeout after the last processed image

REQ_URL The Endpoint URL to send the Report or SSA ASS subtitles file

REQ_METHOD Select from the following HTTP request methods: POST or PUT

USE_GPU Flag whether to use GPU-support or not

USE_RECOGNITION Flag whether to process with face-recognition or just face-detection

CNN_MODEL Select from the following detection models9: SSDMobileNetV1 or TinyYoloV2

● SSDMobileNetV1: SSD (Single Shot Multibox Detector) based on MobileNetV1.

● TinyYoloV2: Tiny Yolo V2. The Tiny Face Detector is a very performant, real-time face detector, which is much faster, smaller and less resource-consuming compared to the SSD MobileNet V1 face detector; in return it performs slightly less well on detecting small faces.

9 Source: https://github.com/justadudewhohacks/face-api.js/blob/master/README.md#face-detection-models


MIN_CONFIDENCE Processing detection minimal confidence in percentage (default: 60% as 0.6, best and max: 100% as 1.0)

MAX_DESCRIPTOR_DISTANCE Processing recognition maximal distance (default: 0.6, best and min: 0)

PROCESSING_TIMEOUT Flag whether to use a timeout for the processing steps. If enabled, the time allowed for processing a detection is 1000 ms/{fps}; e.g. at 25 fps the maximum timeout is 40 ms.

USE_AGE_GENDER_EXPRESSIONS Flag whether to detect age, gender and face-expressions for each face-detection result.

EXPRESSION_THRESHOLD Processing face-expression minimal confidence in percentage (default: 75% as 0.75, best and max: 100% as 1.0)

WS_PORT Port of the own WebSocket server

Packaging

The “Image/Face Recognition” Engine is packaged to the following formats:

• Docker: Build script included

Further Information

For further information, please see the documentation of the “Image/Face Recognition” Engine in the README.md of its Git repository.

Since the release of D4.2 [3], the Image/Face Recognition Engine has been adapted to a new approach to unify the interfaces of cognitive services. The following changes have been made:

• Integration of Google's TensorFlow Framework with the corresponding JavaScript bindings for Node.js and the Node.js package/extension face-api.js; removal of OpenCV, its JavaScript bindings for Node.js and the Node.js package/extension opencv4nodejs.

• Improved synchronization of video and metadata based on the presentation timestamp (PTS) of each image (frame) and the subtitles specification SSA/ASS (drawing rectangles around detected and recognized objects, like faces, with their corresponding labels).

• Extension by an additional output format/interface:
  o Metadata as JSON via WebSocket and API (client- or server-side rendering).
  o Metadata converted into “subtitles” as an SSA/ASS file.


• Decoupling of FFmpeg's10 low-level tasks from the actual inference task by using WebSockets. This allows the actual Image/Face Recognition process to be executed as a true function within serverless infrastructures.

• Integration of GPU support regarding real-time Image/Face recognition.

For further details please consult the Deliverable D4.2, Section 7.2.1.10 [3].

3.1.2 Media Process Engine

The Media Process Engine (MPE) belongs to the Media Network Functions.

It is a media-specific function which acts as a video signal switcher, based on the open-source tools Voctomix11 and GStreamer12. Concretely, the MPE VNF is composed of voctocore13, the processing part of the Voctomix solution. This python-based processing core allows switching between three different input video streams and composing a new video stream by combining the selected input with a background.

For further details please consult the Deliverable D4.2, Section 7.2.1.3 [3].

3.1.3 vCompression Engine

The vCompression Engine (vCE) belongs to the Media Network Functions.

It is a media-specific function for the compression/decompression and encoding/decoding of incoming audio and video streams with low latency. The vCompression Engine can be adapted to different content categories by applying different prototypical production profiles (see below). It is based on open-source encoding techniques and uses the latest video standards such as SMPTE ST 2110 and H.264/H.265. It is intended for use before and after WAN transfer and is capable of dynamically adjusting encoding properties during runtime. Properties such as the bitrate can be set on-the-fly via the RESTful API, and performance metrics such as the actual and average bitrate, quality, and CPU or RAM usage can be monitored. The vCE is implemented completely in software and, due to its flexibility and agility, especially with regard to remote productions, will replace traditional hardware en-/decoders in the future.

Since the release of D4.2 the vCE has been extended in terms of changing the video resolution at runtime. The approach that has been developed and implemented describes the parallel execution of several FFmpeg instances, which all share the same input stream but provide video streams with different resolutions but otherwise identical parameters. The downstream video switcher receives these video streams as its input and produces the final stream as output, which can be configured via the RESTful API.

10 Source: https://ffmpeg.org/

11 Source: https://github.com/voc/voctomix

12 Source: https://gstreamer.freedesktop.org/

13 Source: https://github.com/voc/voctomix/tree/master/voctocore
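As an illustration of the on-the-fly reconfiguration described above, a client could adjust the vCE target bitrate through its RESTful API roughly as in the following sketch. The endpoint path and payload field names are hypothetical; the actual API is documented in Deliverable D4.2 [3].

import requests

VCE_API = "http://vce.example.net:8080/api/v1"  # hypothetical base URL

def set_bitrate(kbps):
    # A single call re-configures the running encoder instance at runtime.
    response = requests.put(f"{VCE_API}/bitrate", json={"bitrate": kbps})
    response.raise_for_status()

set_bitrate(15000)  # e.g. 15 Mbps, within the H.264 NEWS "high" range of Table 4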


The approach of using FFmpeg's ZeroMQ14 filter to change the parameters of the scale filter cannot be used to adjust the resolution on-the-fly, because after changing the filter graph the stream is basically no longer played back properly on the client (VLC, ffplay, etc.).

Other important enhancements of the vCE concern its adaptation especially for the multi-session scenario, regarding application-side identity management, format, connection and fault tolerance towards 5G-MEDIA's MAPE/SVP messaging system.

vCE Production Profiles

In order to further refine the performance, three types of production profiles (low, standard and high) were defined in the encoding formats H.264 and H.265.

These production profiles are based on different factors, like i) the different television production guidelines, e.g. those of the Spanish broadcaster RTVE or the “Guideline TPRF-HDTV”15 of the German public broadcasters, ii) the different standards, like ETSI ES 201 803-4 “Dynamic synchronous Transfer Mode (DTM); Part 4: Mapping of DTM frames into SDH containers”16 or ITU-T Recommendation G.826 “End-to-end error performance parameters and objectives for international, constant bit-rate digital paths and connections”17, iii) the codec used (in our case H.264 or H.265), and iv) the type of the produced content. Here we distinguish between the categories “SPORTS”, “MIXED” and “NEWS”.

Every production profile defines a lower and an upper bandwidth, based on the codec, the category of the content, and the different factors mentioned above (cf. Table 4). These lower and upper bandwidths are later used by the CNO as bounds to select the appropriate compression level with regard to the available bandwidth (cf. Section 3.2.2 of this document).

In the “low” production profile the focus is on using less bandwidth, going along with a higher compression rate and therefore worse video quality. In the “standard” production profile, bandwidth, compression, and video quality are in balance. In the “high” production profile the focus is on high video quality, going along with a low compression rate.

14 Source: https://ffmpeg.org/ffmpeg-filters.html#zmq_002c-azmq

15 Source: https://www.irt.de/en/publications/technical-guidelines/technical-guidelines-download

16 Source: https://www.etsi.org/deliver/etsi_es/201800_201899/20180304/01.01.01_50/es_20180304v010101m.pdf

17 Source: https://www.itu.int/rec/T-REC-G.826-200212-I/en


Table 4: Lookup table for Production Profiles with H.264 and H.265 encoding (bandwidth in Mbps, lower-upper)

Encoding format | Category | Low     | Standard | High
H.264           | SPORTS   | 10-20   | 20-25    | 25-50
H.264           | MIXED    | 5-10    | 15-20    | 20-40
H.264           | NEWS     | 5-10    | 10-15    | 15-20
H.265           | SPORTS   | 10-15   | 15-20    | 20-50
H.265           | MIXED    | 5-10    | 10-15    | 15-35
H.265           | NEWS     | 5-10    | 10-10    | 10-15

Note: The point of the example table above is to illustrate that the quality/production profile has two separate factors: the content category/genre affects the bitrate (depending, of course, on the type of codec used), and each content category/genre can be compressed at different quality levels. The broadcaster needs to decide which quality level they want for a certain type of content, and this then defines the range of bitrates (compression levels) that are suitable for that production. This range is then used to inform the CNO of the upper and lower bounds on the bitrate/compression level for that stream. This table can also be related to the “quality profiles and compression levels” table above; each cell in the latter table can be mapped to one of the profile ranges in the first table.
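Expressed as code, the lookup could take the following shape. This is only a sketch of how the bounds from Table 4 can be applied; the values are taken directly from the table, while the function and dictionary names are chosen for illustration.

# (codec, category, profile) -> (lower_mbps, upper_mbps), values from Table 4
PROFILES = {
    ("H.264", "SPORTS", "low"): (10, 20), ("H.264", "SPORTS", "standard"): (20, 25),
    ("H.264", "SPORTS", "high"): (25, 50),
    ("H.264", "MIXED", "low"): (5, 10), ("H.264", "MIXED", "standard"): (15, 20),
    ("H.264", "MIXED", "high"): (20, 40),
    ("H.264", "NEWS", "low"): (5, 10), ("H.264", "NEWS", "standard"): (10, 15),
    ("H.264", "NEWS", "high"): (15, 20),
    ("H.265", "SPORTS", "low"): (10, 15), ("H.265", "SPORTS", "standard"): (15, 20),
    ("H.265", "SPORTS", "high"): (20, 50),
    ("H.265", "MIXED", "low"): (5, 10), ("H.265", "MIXED", "standard"): (10, 15),
    ("H.265", "MIXED", "high"): (15, 35),
    ("H.265", "NEWS", "low"): (5, 10), ("H.265", "NEWS", "standard"): (10, 10),
    ("H.265", "NEWS", "high"): (10, 15),
}

def clamp_bitrate(codec, category, profile, requested_mbps):
    # Keep a CNO-selected bitrate inside the profile's allowed range.
    lower, upper = PROFILES[(codec, category, profile)]
    return max(lower, min(requested_mbps, upper))

clamp_bitrate("H.264", "NEWS", "high", 30)  # -> 20 (capped at the upper bound)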

vCE GUI

With the introduction of the production profiles, and due to the fact that the vCE needs to be fed starting values (bitrate, encoding and more), it was necessary to provide a configuration tool. For this initial configuration of a vCE an easy-to-use tool was needed, and the vCE GUI was implemented (cf. Figure 8).

With the vCE GUI, the configuration of a vCE is easy and can be done with a few mouse clicks. Firstly, the user has to select the encoder; for now, only H.264 is supported, but others are prepared (H.265 and JPEG2000). Secondly and thirdly, the production profile and the content type have to be selected; they define the upper and lower bandwidth limits (not directly user-selectable). Finally, by accepting the configuration, the user initiates an instance of the vCE with the respective settings.


Figure 8: vCE GUI screenshot

Figure 9 shows a simple scheme of how this tool works.

Figure 9: vCE GUI scheme

(Flowchart steps: START → Connect to Kafka bus → Wait for new vCE → Add new vCE config tab → Accept/discard vCE config → Write vCE config into the Kafka bus → Remove vCE config tab)


The tool connects to the Kafka bus on the topic “app.vce.metrics” (cf. Message 4) and waits for new vCEs.

Message 4: JSON example of an “app.vce.metrics” message received as input for the GUI

{
  "id": "06:00:cc:74:72:99",
  "utc_time": 1583486232425,
  "pid": 6025,
  "pid_cpu": 46,
  "pid_ram": 97464320,
  "gop_size": 25,
  "num_fps": 25,
  "num_frame": 16590344,
  "enc_quality": 69,
  "enc_dbl_time": 663613.76,
  "enc_str_time": "1",
  "max_bitrate": 10,
  "avg_bitrate": 14723.7,
  "act_bitrate": 157,
  "enc_speed": 1
}

If a new vCompression Engine is found, the tool adds a new configuration page; the title identifies the vCE. After accepting or discarding the configuration for a vCompression Engine, this page is closed. If the new configuration was accepted, the tool writes a JSON string into the Kafka bus on the topic “ns.instances.demandconf” (cf. Message 5).

Message 5: JSON example of a “ns.instances.demandconf” message sent to the Kafka bus after it is confirmed in the GUI

{
  "id": "06:00:cc:74:72:99",
  "utc_time": 1583486232425,
  "pid": 6025,
  "pid_cpu": 46,
  "pid_ram": 97464320,
  "gop_size": 25,
  "num_fps": 25,
  "num_frame": 16590344,
  "enc_quality": 69,
  "enc_dbl_time": 663613.76,
  "enc_str_time": "1",
  "max_bitrate": 10,
  "avg_bitrate": 14723.7,
  "act_bitrate": 157,
  "enc_speed": 1
}
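A condensed sketch of this Kafka interaction is given below, assuming the kafka-python client and a hypothetical broker address; the id field follows Message 4, while the configuration payload stands in for the values collected by the GUI form.

import json

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKER = "kafka.example.net:9092"  # hypothetical broker address

consumer = KafkaConsumer("app.vce.metrics", bootstrap_servers=BROKER,
                         value_deserializer=lambda v: json.loads(v))
producer = KafkaProducer(bootstrap_servers=BROKER,
                         value_serializer=lambda v: json.dumps(v).encode())

known_vces = set()
for record in consumer:
    vce_id = record.value["id"]        # e.g. "06:00:cc:74:72:99", cf. Message 4
    if vce_id not in known_vces:
        known_vces.add(vce_id)         # the GUI would open a new config tab here
        config = {"id": vce_id, "max_bitrate": 10}  # stand-in for the form values
        producer.send("ns.instances.demandconf", config)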


For further details please consult the Deliverable D4.2, Section 7.2.1.2 [3].

3.1.4 vProbe

The vProbe belongs to the generic Network Functions. Its functionality is to analyze the multimedia content passing through it and to provide a measure of Quality of Experience (QoE).

The vProbe network function aims to measure the QoE based on the media content that the platform is providing to the end user. This python-based function uses different open-source tools and libraries for its running process: Keras18, sklearn19 and FFmpeg.

The media workflow in UC2 carries regular video content, so the model obtains the Mean Opinion Score (MOS) measurement in real time. More details are given in Deliverable D4.2, Section 7.2.1.5 [3].

3.1.5 vUnpacker

The vUnpacker belongs to the Media Network Functions.

It is a VNF that enables the support of the UDP protocol and the IP SMPTE standard ST 2110. This VNF is based on a modification of the open-source tool FFmpeg provided by the Canadian Broadcasting Corporation20, allowing the decoding of SMPTE ST 2110 RTP video streams over IP and creating a regular TCP workflow at the output for the rest of the VNFs.

For further details please consult the Deliverable D4.2, Section 7.2.1.13 [3].

3.1.6 5G-MEDIA App

The “5G-MEDIA Journalist” App is a mobile application developed for journalists. It enables journalists to a) record and upload the recorded content and/or b) stream audio/video content back to the broadcaster, for example in a traditional reporting use case, over the 5G-MEDIA Platform. The basic functionality of the app is recording, uploading of recorded content, and direct streaming of audio and/or video to a central infrastructure for storage or further distribution. The app targets journalists and reporters and could later be extended for spectators as well as for producing user-generated content.

18 Source: https://keras.io/

19 Source: https://scikit-learn.org/stable/modules/classes.html

20 Source: https://github.com/cbcrc/FFmpeg


Apps of this kind are currently broadly used among broadcasters to record direct quotes and interviews in the field and send them back in a timely manner to the editorial offices. Examples include popular apps like Periscope as well as broadcast-specific apps like the ZDF Reporter App (cf. Figure 10).

Figure 10: GUI of Periscope App21 and ZDF Reporter App22

3.1.7 Splitter

The Splitter belongs to the Media Network Functions.

The Splitter is a media-specific function that separates exactly one A/V input stream into its constituent parts (audio, video and images) and provides these at its output as independent audio, video and image streams. According to this description, the Splitter has one input and three outputs, of which at least one must be configured for proper execution. The encoded and extracted output signals are directly forwarded to their endpoints without buffering. To ensure that the transmission path to the input and the input itself cause minimal delays as well, the Splitter supports Secure Reliable Transport23 (SRT). The special added value of this open-source transport technology compared to Adobe’s proprietary RTMP is its dynamic adaptation to real-time conditions between transport endpoints, its reliability and its low-latency features.

Design

The Splitter is essentially based on FFmpeg compiled with Haivision’s open-source implementation24 of the SRT protocol. At the input, an FFmpeg instance handles the reception of the input stream and forwards/pipes it directly to the next level without changing it. Depending on the configuration, at least one and up to three FFmpeg instances are located at the following level, which are used for the de-/encoding into the corresponding audio, video and image formats. The results are piped into separate WebSocket streams (for audio, video and image) that were established with the endpoints during instantiation. The only exception is the video output, where FFmpeg streams directly to the remote endpoint without a WebSocket connection.

21 Source: https://www.periscope.tv/press

22 Source: https://www.zdf-digital.com/reporter-app/

23 Source: https://www.srtalliance.org/

24 Source: https://github.com/Haivision/srt

Figure 11: Splitter design and implementation

The Splitter is built up into 5 different main (grouped) parts. These are:

• Input/ingress: The first FFmpeg consumes a video via SRT and outputs the stream into the I/O channel 1 (stdout).

• Audio: The FFmpeg for the audio stream consumes a video via I/O channel 0 (stdin), extracts the audio (by removing the video with the FFmpeg flag -vn) and outputs the stream into the I/O channel 3.

• Image: The FFmpeg for the image stream consumes a video via I/O channel 0 (stdin), extracts images (frames) from the video as well as metadata by parsing the FFmpeg output of showinfo. The image buffer and the metadata as JSON are combined into a message (JSON), and the message is written into the I/O channel 4.

• Video: The FFmpeg for the video stream consumes a video via I/O channel 0 (stdin) and either outputs the stream via FFmpeg directly to an endpoint (URL) or into the I/O channel 5.


• WebSocket client: For each use-case (audio, image and video) a WebSocket client is started with an endpoint (WebSocket server). The message stream (image) or data stream (audio, video) is sent as a data stream via WebSocket.

Configuration

To configure the function, at least one use-case of audio, image or video needs to be enabled, together with the corresponding URL and PORT of the endpoint (e.g. WebSocket server). The following table shows all the relevant parameters:

Table 5: Overview of the Splitter's relevant parameters

Parameter name Parameter description

DEBUG Flag whether to output the LOG or not

STREAM_INPUT The input stream to be split

ENABLE_STREAM Flag whether to enable the use-case Video

STREAM_OUTPUT_URL Host/endpoint for the video, default: udp://0.0.0.0:5004?pkt_size=1316

STREAM_OUTPUT_FORMAT The output format of the video stream, default: MPEG-TS

ENABLE_VSPEECH Flag whether to enable the use-case Audio

VSPEECH_HOST Host of the WebSocket server for audio, default: 0.0.0.0 (localhost)

VSPEECH_PORT Port of the WebSocket server for audio, default: 8885

ENABLE_VDETECTION Flag whether to enable the use-case Image

VDETECTION_HOST Host of the WebSocket server for images, default: 0.0.0.0 (localhost)

VDETECTION_PORT Port of the WebSocket server for images, default: 9995

VDETECTION_RESIZE_WIDTH Width to which FFmpeg and image2pipe resize the extracted image (size at which the image will be processed; the smaller the faster, but less precise in detecting smaller faces; must be divisible by 32, common sizes are 128, 160, 224, 320, 416, 512, 608), default: 320

VDETECTION_FPS Frames per second at which FFmpeg and image2pipe extract images, default: 10
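To illustrate the audio path of this design, the following condensed sketch chains two FFmpeg processes: the first ingests the SRT input and pipes it on unchanged, the second drops the video (-vn) and emits raw 16 kHz mono audio that is forwarded over a WebSocket. The input URL is hypothetical, the port mirrors the VSPEECH_PORT default from Table 5, and the image and video paths as well as error handling are omitted.

import asyncio
import subprocess

import websockets  # pip install websockets

STREAM_INPUT = "srt://0.0.0.0:7001?mode=listener"  # hypothetical SRT ingest
VSPEECH_WS = "ws://0.0.0.0:8885"                   # VSPEECH default, cf. Table 5

async def run_audio_path():
    # Ingress FFmpeg: receive via SRT and pipe the stream on unchanged.
    ingress = subprocess.Popen(
        ["ffmpeg", "-i", STREAM_INPUT, "-c", "copy", "-f", "mpegts", "pipe:1"],
        stdout=subprocess.PIPE)
    # Audio FFmpeg: remove the video and extract raw 16 kHz mono PCM.
    audio = subprocess.Popen(
        ["ffmpeg", "-i", "pipe:0", "-vn", "-f", "s16le",
         "-ar", "16000", "-ac", "1", "pipe:1"],
        stdin=ingress.stdout, stdout=subprocess.PIPE)
    async with websockets.connect(VSPEECH_WS) as ws:
        while chunk := audio.stdout.read(4096):
            await ws.send(chunk)

asyncio.run(run_audio_path())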


Packaging

The Splitter is packaged to the following formats:

• Docker: Build script included

Further Information

For further information, please see the documentation of the Splitter in the README.md file of its Git repository.

3.1.8 5G-MEDIA Endpoint

For the “Mobile Contribution” scenario two Safe-Local endpoints were installed, one at the Spanish broadcaster RTVE and one at IRT.

The function of a Safe-Local endpoint is to store the ready-made contributed media stream and its metadata for later consumption by the broadcaster. The amount and kind of metadata, like captions and/or face-recognition features, depends on whether any of the cognitive services like Speech-to-Text (cf. Section 3.1.1.1) or Image/Face Recognition (cf. Section 3.1.1.2) were requested by the journalist and therefore used in the workflow.

In our prototype, the Safe-Local endpoint is a Linux Ubuntu 18.04 VM (it could also be a bare-metal server) with 2 vCPUs, 8 GB RAM and 100 GB of disk, installed with Docker CE. It runs a dockerized version of nginx with the nginx-rtmp-module25 plugin, capable of storing and serving RTMP streams (via ports 1935 and 8080). The Safe-Local endpoint is also pre-installed at the edge.
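For illustration, a ready-made contribution could be pushed to such an endpoint as sketched below; the endpoint address and the application/stream names are hypothetical and depend on the nginx-rtmp configuration, while the ports match the setup described above.

import subprocess

SAFE_LOCAL = "safe-local.example.net"  # hypothetical endpoint address

# Push the contribution to the RTMP ingest on port 1935; playback and
# retrieval are then served over HTTP on port 8080.
subprocess.run(
    ["ffmpeg", "-re", "-i", "contribution.mp4",
     "-c", "copy", "-f", "flv", f"rtmp://{SAFE_LOCAL}:1935/live/report42"],
    check=True)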

3.1.9 5G-MEDIA Gateway

The 5G-MEDIA Gateway is a PNF and belongs to the Media Network Functions.

The purpose of the 5G-MEDIA Gateway PNF is to convert incoming SDI signals from cameras or video servers to IP in order to make use of them inside the network, or vice versa to convert those signals back to SDI.

Gateways are typically hardware appliances, so UC2 uses a vendor solution here, the Nevion Virtuoso IP production platform. This platform is a software-defined media node platform designed to meet the challenges of an IP-based live broadcast production environment. The platform runs virtualized media functions depending on the user's needs; the functions range from encoding/decoding and transport protection to monitoring and signal processing.

In UC2 the Nevion Virtuoso IP production platform is used to packetize incoming SDI signals without any encoding. After packetization, the signal is transmitted at an uncompressed bitrate of about 1.5 Gbps per signal stream.

For further details please consult the Deliverable D4.2, Section 7.2.1.7 [3].

25 Source: https://github.com/arut/nginx-rtmp-module


3.2 5G-MEDIA services used in Use Case 2

This Section provides an overview of services of the 5G-MEDIA Platform that were used in UC2.

3.2.1 Monitor Analyze Plan Execute (MAPE)

Figure 12: Involved MAPE services in UC2

Figure 12 depicts the involved MAPE services in UC2. While a network service is running, the MAPE services collect infrastructure, application/VNF-specific and QoE monitoring data, adapt them to a common format and correlate them with the running service. Each VNF publishes monitoring data to a specific topic (based on its type) of the publish/subscribe broker (Kafka bus). Figure 13 and Figure 14 depict the exact flow of the monitoring data collection from the infrastructures.


Figure 13: Provision of OpenNebula-NFVI level monitoring data using its XML-RPC API

Figure 14: Provision of FaaS-NFVI level monitoring data using the Prometheus API in kubernetes cluster

The next service involved in the UC2 scenarios is the Translation service (including its caching mechanism), which correlates the original monitoring data with the running VNFs/NSs and adapts them to a unified structure. The adapted data are published on the topic ns.instances.trans of the publish/subscribe broker (Apache Kafka26 bus).

After that, three parallel flows occur. Through the Metrics importer, the adapted data are stored in the database, for which the InfluxDB27 time-series database is used, and visualized in pre-built UI dashboards in Grafana28. In the second flow, the adapted monitoring data are filtered

26 Source: https://kafka.apache.org/

27 Source: https://www.influxdata.com/

28 Source: https://grafana.com/


by the accounting agent service (keeping the metrics relevant to the VNFs' resource consumption) and are provided to the Accounting service through its API. In the third parallel flow, the service-specific CNO (SS-CNO), part of the hierarchical CNO, consumes the adapted data from the pub/sub broker, analyses them and suggests an optimisation action. A possible optimisation action could be, for example, changing the virtual Compression Engine (vCE VNF) bitrate at runtime with respect to the selected production profile and its range. More details about the cognitive network optimisation in the scenarios of this use case follow in the next Sections. To close the loop, a specific plugin of the Execution service is used that receives the optimisation action from the ns.instances.exec topic and converts it to a configuration message for the vCE VNF. This configuration message is published on the publish/subscribe broker under the topic ns.instances.conf, using the VNF type and the VNF uuid as identifiers. Afterwards, the proper vCE VNF instance will process and apply the new configuration. Figure 15 depicts the exact flow.
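A minimal sketch of such an Execution-service plugin is given below: it consumes optimisation actions from ns.instances.exec and republishes them as vCE configuration messages on ns.instances.conf, keyed by VNF type and uuid. The field names inside the messages are assumptions for illustration.

import json

from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKER = "kafka.example.net:9092"  # hypothetical broker address

consumer = KafkaConsumer("ns.instances.exec", bootstrap_servers=BROKER,
                         value_deserializer=lambda v: json.loads(v))
producer = KafkaProducer(bootstrap_servers=BROKER,
                         value_serializer=lambda v: json.dumps(v).encode())

for action in consumer:
    if action.value.get("vnf_type") != "vce":    # only handle vCE actions
        continue
    config = {
        "vnf_type": "vce",                       # identifies the VNF type
        "vnf_uuid": action.value["vnf_uuid"],    # identifies the instance
        "max_bitrate": action.value["bitrate"],  # CNO-selected target bitrate
    }
    producer.send("ns.instances.conf", config)   # the vCE instance applies it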

The recommendation service is another involved service; it analyzes the historical monitoring data related to the VNFs' consumption and suggests new flavors for the defined VDUs per VNF when over-utilization or under-utilization is noticed. The recommendation is registered in the Issue Tracking Server through its API and allows the developer of the target VNF descriptor to provide a new flavor through the SDK.


Figure 15: Virtual Compression Engine day1, 2, …n configuration though the MAPE services

3.2.2 Service Optimization with the Cognitive Network Optimizer

The connection of the Telefónica OnLife testbed to the 5G-MEDIA Platform makes it possible to use the Cognitive Network Optimizer (CNO) to achieve better performance. This was done by refining the bandwidth usage through different CNO algorithms, depending on whether they were used in the single-instance or the multi-instance production scenario.

Role of Cognitive Network Optimization algorithms in the “Remote Production” service

In Section 4.2 we discuss the single-instance (cf. Section 4.2.2) and multi-instance (cf. Section 4.2.4) workflows of this use case in detail. In this Section we first focus on the role of the Service-Specific Cognitive Network Optimizer (SS-CNO) per-session reinforcement learning (RL) algorithm (cf. Section 3.2.2.2), which is mainly about selecting a suitable target bitrate (i.e. compression level) for each stream from the vCE VNF to the broadcaster site. The vCE is used for compressing the audio and video streams for Internet transfer, supporting a much lower bitrate. The SS-CNO facilitates this task by dynamically selecting a suitable target bitrate as the underlying network condition changes.

In Section 3.2.2.3, we describe the arbitrator algorithm of the SS-CNO, which is mainly designed to ensure that the total network capacity available to a service is distributed fairly across multiple active sessions according to each session's priority as stated in the video quality profile. The arbitrator algorithm of the SS-CNO is able to control the bandwidth distribution in such a way because it is aware of the active sessions and their network conditions, current bandwidth usage, requirements, and constraints.

SS-CNO reinforcement learning algorithm

For the remote and smart production use case we adopt a reinforcement learning algorithm to adjust the target bitrate (compression level) of a video streaming session dynamically according to the current underlying network condition, with the aim of achieving high QoE while maximizing the total number of video sessions sharing the same resources.

Several different reinforcement learning (RL) algorithms could be used to train the SS-CNO learning agent in 5G-MEDIA's remote production use case. For instance, the following have previously been used in the context of streamed media content: Asynchronous Advantage Actor-Critic (A3C) [4], Deep Q-Network (DQN) [5], REINFORCE [6], and tabular Q-Learning [7].

With tabular Q-learning, all possible actions (e.g. choosing a video chunk target bitrate, switching between CPU/GPU) and states (i.e. observations from the environment such as available network capacity, loss rate, latency) are stored in a single table called a “Q-table”. Each row of the Q-table represents a separate state and each column maps to a distinct action. The key drawback of table-based algorithms is that they do not scale well when the state-action space is relatively large.
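As a toy illustration of these mechanics, the sketch below maintains such a Q-table for bitrate selection; the state buckets, action set, learning parameters and reward values are illustrative assumptions rather than values from the 5G-MEDIA implementation.

from collections import defaultdict

ACTIONS = [5, 10, 15, 20]        # candidate target bitrates in Mbps
ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor

q_table = defaultdict(float)     # (state, action) -> expected long-term reward

def update(state, action, reward, next_state):
    # One Q-learning step:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)])

# e.g. throughput bucket "10-20Mbps", chose 15 Mbps, observed QoE reward 0.8
update("10-20Mbps", 15, 0.8, "5-10Mbps")

The table grows by one row for every new state bucket and one column for every new action, which is exactly the scalability problem noted above.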

A typical approach to solve the scalability issue of tabular RL solutions is to decrease the state space by making some simplifications that often translate to unrealistic conditions. For example, the state-of-the-art tabular RL algorithm for choosing the correct bitrate of video chunks assumes that network throughput follows a Markovian model [8]. This means that the agent does not need to hold throughput information of past video chunks; knowing the throughput of the last chunk is sufficient to choose a bitrate for the next chunk. However, this kind of assumption can lead to inaccurate estimations of future network conditions and, in turn, result in the incorrect selection of chunk target bitrates, especially in networks with highly dynamic conditions. This is because simple network models such as Markovian dynamics [8] may fail to capture the intricacies of real dynamic networks. For this reason, we consider an alternative approach based on a deep reinforcement learning algorithm utilizing a neural network rather than explicit state-action tables [4]. A3C can incorporate a large amount of history information into its state space to overcome the limitations of tabular methods, and in turn it can predict the future. A3C has been successfully applied to several learning problems [9], [10], [11].


Asynchronous Advantage Actor-Critic (A3C)

Unlike conventional approaches, such as Deep Q-Networks (DQN), where a single agent implemented by a single neural network interacts with a single environment, A3C can be trained via parallel agents asynchronously and thus learns more efficiently and quickly. Each learning agent sees a different set of input parameters during the training phase. The agents continually send their local information (e.g. state, action, and reward) to a central agent, which aggregates all of the information and updates a single learning model. The updated model is then sent back to the agents and the learning process continues in this way; note that the central agent propagates the updated model to each agent asynchronously. The parallelism of A3C reduces the training time by several orders of magnitude compared to conventional techniques like DQN, mainly because each agent can be exposed to a completely different environment. Because each agent's experience is independent of the others', the aggregated learning at the central agent draws on highly diverse experience, which is known to produce better results.

A3C runs two separate neural networks simultaneously: an actor network and a critic network. The main responsibility of the critic is to give feedback to the actor in order to refine the actor network's policy function so that a better action can be selected in a given state. Note that the critic network is only used during the training phase to improve the actor network's policy; in the post-training phase, only the actor network is required to execute the desired behavior in the real environment. Figure 16 demonstrates the high-level interaction between the actor and critic networks. Both networks receive the same set of states from the environment. The actor network generates an action (at) based on a policy π(at, st), which is the probability that action at is taken in state st. The critic network produces the value function V(st), which estimates (from empirically observed rewards) the expected total reward starting at state st for a particular policy. In this way, the critic network helps the actor network to better understand the consequences of choosing an action in a particular state.
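In standard actor-critic notation, and only as a sketch of the general method rather than the exact losses used in 5G-MEDIA, the actor parameters θ and critic parameters θv are updated as:

```latex
\[
\theta \leftarrow \theta + \eta \,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, A(s_t, a_t),
\qquad A(s_t, a_t) = R_t - V_{\theta_v}(s_t),
\]
\[
\theta_v \leftarrow \theta_v - \eta_v \,\nabla_{\theta_v}\big(R_t - V_{\theta_v}(s_t)\big)^2,
\qquad R_t = \sum_{k \ge 0} \gamma^k r_{t+k},
\]
```

where Rt is the observed discounted return and the advantage A(st, at) quantifies how much better the chosen action turned out to be than the critic's estimate, which is exactly the "feedback" role of the critic described above.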

Note also that A3C is designed to run on a machine with multiple CPUs, i.e. one can train an A3C model reasonably quickly without expensive GPUs; each A3C agent can be trained on a CPU core, although GPUs can also be used.

Figure 16: A3C


Training methodology

The most crucial part of any machine learning algorithm (whether supervised, unsupervised, or reinforcement learning) is the training phase. If an algorithm is not sufficiently trained, it may react very poorly to unseen conditions and make inappropriate decisions.

Unlike supervised learning algorithms, which require diverse labelled datasets, a reinforcement learning algorithm needs to be exposed to diverse environments in order to learn effectively from its mistakes before it can be used in real environments. In practice, however, exposing an RL algorithm to a large number of real conditions is a time-consuming process, although it could be considered more accurate than simulated/emulated approaches. For example, in the "Remote Production" scenario of UC2, the SS-CNO per-session RL algorithm (i.e. the RL agent) should explore a live video streaming environment, ideally using an actual live video streaming client, to learn which target bitrate (compression level) to employ for video chunks under a wide range of network conditions in order to improve the quality of experience (QoE) at the video receiver. However, creating such conditions requires the receiver (i.e. the broadcaster site at RTVE in this case) to continuously watch live video streamed by journalists under a wide variety of network conditions introducing different levels of congestion, packet latency and loss. Alternatively, this can be achieved more simply if the receiver watches pre-recorded videos rather than live streams from journalists. Unfortunately, both approaches require a large number of training sessions to be set up and executed, with the QoE assessed in each case; the QoE is a function of the original video, the selected compression parameters (including target bitrate) and the distortion introduced by the network, since the network throughput, latency and loss are affected by the background traffic over the path from the remote venue to the broadcaster site.

To make this process faster, it is reasonable to train the RL agent in a simulated environment where it can be exposed to a wide range of conditions. In this way, the RL agent can explore combinations of video profiles, network conditions and target bitrates that in practice would require weeks or months of training.

Choosing an appropriate simulation platform is also crucial. A packet-level simulator like Network Simulator 3 (NS-3) would be an ideal choice because it models network environments more accurately than flow-level or chunk-level simulators [9]. However, NS-3 is slow compared to a simple chunk-level simulator such as the one used in [9], which is specifically designed to model video streaming content. It has been argued that although no simulator can capture all artefacts of a real environment, an RL algorithm that is sufficiently trained, even with a simple simulator, can operate well in real systems [9]. We therefore designed and implemented a new, simple chunk-based simulator to train the SS-CNO RL algorithm for the "Remote Production" scenario in UC2. Our simulator source code is publicly available [15].
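The essence of a chunk-level simulator is that each chunk's outcome is derived directly from a throughput trace rather than from per-packet events. The following is a minimal, self-contained sketch of this idea, greatly simplified relative to the actual simulator in [15]; the random policy stands in for the RL agent and the loss proxy is illustrative:

```python
import random

BITRATES_MBPS = [5 * i for i in range(1, 11)]   # the 10 levels of Table 6

def pick_bitrate(capacity, last):
    """Stand-in for the RL policy: a random feasible choice."""
    feasible = [b for b in BITRATES_MBPS if b <= capacity] or BITRATES_MBPS[:1]
    return random.choice(feasible)

def simulate_session(trace_mbps, alpha=100.0, beta=1.0):
    """Chunk-level simulation: each trace sample is the capacity seen by
    one chunk; rewards follow the shape of the UC2 reward function."""
    rewards, last = [], None
    for capacity in trace_mbps:
        bitrate = pick_bitrate(capacity, last)
        loss = max(0.0, (bitrate - capacity) / bitrate)      # overload proxy
        smooth = abs(bitrate - last) if last is not None else 0.0
        rewards.append(bitrate - alpha * loss - beta * smooth)
        last = bitrate
    return sum(rewards)

print(simulate_session([30, 25, 12, 18, 40, 35]))
```

Because one episode is just a loop over a trace, millions of chunks can be replayed per minute, which is what makes offline training practical.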

Offline vs online learning

There are several ways to deploy an RL algorithm in real systems. The first approach is referred to as "offline learning", where the algorithm is trained in simulated/emulated environments and then deployed in the real environment with no further learning. In this way, if the RL algorithm is well trained, it can perform well when deployed in the real system.


A drawback of this approach is that a simulation is never identical to the real-world environment. On the one hand, a simulation will not model all aspects of a real system's behavior with complete accuracy, so the algorithm is trained against a less realistic model. On the other hand, a simulation environment is more controlled and can readily expose metrics that would be difficult to extract from a real-world system. We followed this approach for training our neural network model for the remote and smart production use cases in the 5G-MEDIA project.

The alternative is an online approach, where the RL algorithm explores actions in a real-world setting and uses the rewards obtained from its actions to update its policy; in the machine learning literature this is referred to as "online learning". A drawback of online learning is that setting up and executing a wide range of training scenarios is time-consuming: as mentioned in [9], offline simulation-based training can compress the equivalent of 100 hours of online training into only 10 minutes. Online learning also has the disadvantage that the exploration of the RL algorithm's action space results in real-world, rather than simulated, behavior.

Another variation is a hybrid of the above two techniques, whereby the algorithm is first trained in a simulated environment and then continues to learn in the operational environment.
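As a sketch of this hybrid variant, the structure below separates the two phases; the agent, simulator and live-environment interfaces are hypothetical and stand in for whatever implementation is used:

```python
def train_hybrid(agent, simulator, live_env, sim_episodes=10_000):
    """Hypothetical hybrid training loop: offline pre-training on a fast
    chunk-level simulator, then continued online learning in production."""
    # Phase 1: offline pre-training against simulated traces.
    for _ in range(sim_episodes):
        trajectory = simulator.run_episode(agent.policy)
        agent.update(trajectory)

    # Phase 2: keep learning online from slower, riskier real feedback.
    while live_env.active():
        state = live_env.observe()
        action = agent.policy(state)
        reward = live_env.apply(action)   # e.g. QoE-based reward from probes
        agent.update_online(state, action, reward)
```

The offline phase supplies a safe starting policy; the online phase then adapts it to artefacts the simulator cannot capture.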

Optimizing compression levels

The vCompression engine (vCE) is mainly responsible for compressing and encoding the signal into a lower-bitrate stream that is well suited to transfer and distribution over the Internet. The compression level of the vCompression engine VNF is configured dynamically by SS-CNO based on the perceived broadcaster quality of experience (QoE). Currently SS-CNO adjusts the compression level at the vCE by changing the target bitrate of a video session; each compression level corresponds to a particular target bitrate. In this project we consider 10 different target bitrates ranging from 5 Mbps to 50 Mbps (cf. Table 6 below).


Table 6: Lookup table for H.264 Compression Levels

Compression Level for H.264 | Bitrate in Mbps

Compression Level 1 | 5 Mbps
Compression Level 2 | 10 Mbps
Compression Level 3 | 15 Mbps
Compression Level 4 | 20 Mbps
Compression Level 5 | 25 Mbps
Compression Level 6 | 30 Mbps
Compression Level 7 | 35 Mbps
Compression Level 8 | 40 Mbps
Compression Level 9 | 45 Mbps
Compression Level 10 | 50 Mbps
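Since the mapping of Table 6 is linear (level i corresponds to 5·i Mbps), it can be held in a trivial lookup structure. A minimal sketch, where the helper function name is ours and only illustrative:

```python
# Compression level -> target bitrate in Mbps (Table 6): level i -> 5 * i.
COMPRESSION_LEVELS = {level: 5 * level for level in range(1, 11)}

def level_for_bitrate(target_mbps):
    """Return the highest compression level whose bitrate fits the target."""
    feasible = [lvl for lvl, mbps in COMPRESSION_LEVELS.items()
                if mbps <= target_mbps]
    return max(feasible) if feasible else 1

assert COMPRESSION_LEVELS[4] == 20
assert level_for_bitrate(27) == 5      # 25 Mbps is the largest fit under 27
```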

Selecting an appropriate compression level can significantly improve the overall QoE perceived by a user (in this case a broadcaster). For example, if the network between the vCE VNF and the broadcaster site deteriorates, due to congestion for instance, it may be appropriate to switch to a compression level (target bitrate) better suited to the present network conditions.

Generally, a highly compressed video requires less network bandwidth to stream than a lightly compressed one. Thus, if the available bandwidth suddenly becomes limited, it may be reasonable to adopt a higher compression level (i.e. a lower target bitrate) at the vCE. However, switching to a level that compresses the video heavily can degrade video quality and, in turn, the QoE perceived by users. The core responsibility of SS-CNO here is to strike the right trade-off: to make decisions that achieve the best QoE for the broadcaster given the current network conditions and the computational resources that can be allocated to the vCompression VNF.

Reinforcement learning algorithm outline

Figure 17 shows the high-level interaction between the components in the environment (right box) and the RL agent (left box) for both UC2 scenarios. The per-session RL algorithm of SS-CNO (i.e. the RL agent in this case) receives a set of inputs from the environment, some of which are used as state inputs to the RL algorithm (see the dotted boxes at the bottom left of the RL agent). Currently, in UC2, SS-CNO considers the following state inputs to the neural network: (1) the history of available capacity (e.g. the past eight samples); (2) the history of loss rate (e.g. the past eight samples); and (3) the last selected target bitrate of the video stream.
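Concretely, the state fed to the neural network can be assembled as below. This is a sketch: the history length of eight matches the description above, while the normalization constant and function names are our own illustrative choices:

```python
from collections import deque

HISTORY = 8  # past samples kept per metric, as described above

capacity_hist = deque([0.0] * HISTORY, maxlen=HISTORY)   # Mbps
loss_hist = deque([0.0] * HISTORY, maxlen=HISTORY)       # fraction in [0, 1]

def build_state(capacity_mbps, loss_rate, last_bitrate_mbps, max_mbps=50.0):
    """Append the newest measurements and return a flat, normalized state
    vector: 8 capacity samples + 8 loss samples + the last target bitrate."""
    capacity_hist.append(capacity_mbps)
    loss_hist.append(loss_rate)
    return ([c / max_mbps for c in capacity_hist]
            + list(loss_hist)
            + [last_bitrate_mbps / max_mbps])

state = build_state(32.0, 0.01, 25.0)
assert len(state) == 2 * HISTORY + 1
```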

The output of the RL algorithm is an action, which can be a target bitrate for a live video stream passing through the vCompression engine VNF (in the case of the remote production) or a switch between CPU and GPU for the Speech-To-Text VNF (in the case of the mobile contribution).


The impact of the selected action is then returned to SS-CNO from the environment through a reward function, which may also incorporate a QoE score. The QoE score is a value between 0 and 5, calculated at a QoE probe from a set of metrics tailored to assessing the user's quality of experience. Combining QoE metrics with other metrics, such as the cost of computational resources and network-related measurements, forms the reward function and, in turn, the reward. The aim of the RL agent is to adjust the weights of its neural network so as to maximize the cumulative discounted reward.

Figure 17: CNO-RL

In the "Remote Production" scenario of UC2, the RL agent's neural network model is trained offline (using our custom-written Python simulator [12]), in which the reward function is emulated. In this way, once the offline-trained model is deployed in a real system, it is not essential for SS-CNO to receive live feedback from the environment (i.e. SS-CNO can operate without a QoE probe). The reward function we have defined for the UC2 "Remote Production" use case has the following form:

Reward = Bitrate - α · LossRate - β · Smoothness

Bitrate in the above formula is the last target bitrate selected by the SS-CNO RL agent. Alpha (α) is the weight of the loss-rate term: the higher α is, the less tolerant SS-CNO becomes of packet loss; we currently set this parameter empirically to a value between 100 and 500. Beta (β) weights the smoothness term, which ensures that SS-CNO does not fluctuate rapidly and widely from one bitrate to another.


In other words, it is treated as a negative contribution to the reward if SS-CNO rapidly changes the compression level of a video from a very high level to a very low one, as this may damage the user's perceived quality of experience. If SS-CNO predicts a shortage of network resources in the near future, it begins, in advance, smoothly decreasing the compression level (i.e. target bitrate) at the vCE. Overall, the reward function defines the objectives of the RL agent: maximizing the positive terms and minimizing the negative terms.
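A direct transcription of this reward into code might look as follows. The α value is illustrative within the stated 100-500 range, β is our own illustrative choice, and the smoothness term is taken here as the absolute bitrate change, one plausible reading of the description above:

```python
def reward(bitrate_mbps, loss_rate, prev_bitrate_mbps, alpha=300.0, beta=1.0):
    """Reward = Bitrate - alpha * LossRate - beta * Smoothness, where
    smoothness penalizes the jump from the previously selected bitrate."""
    smoothness = abs(bitrate_mbps - prev_bitrate_mbps)
    return bitrate_mbps - alpha * loss_rate - beta * smoothness

# A lossless, steady 25 Mbps chunk scores higher than a lossy jump to 50 Mbps:
assert reward(25, 0.0, 25) > reward(50, 0.05, 25)   # 25.0 > 10.0
```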

As illustrated in Figure 17, several entities in the environment may send information to SS-CNO via a Kafka bus: the vCompression engine VNF and/or the Speech-To-Text VNF, the network links, the QoE probe VNF, and the video client at RTVE (the broadcaster site).

The vCE sends information related to the video quality profile, which includes a set of acceptable compression levels/target bitrates. The video quality profile essentially defines a range of compression levels/target bitrates with fixed upper and lower bounds, and SS-CNO then selects a compression level within this predefined range. The range can differ per video, depending mainly on the nature of the video and the customer's quality preferences. For example, high-resolution, high-motion video content is a better match for less compressed options (i.e. higher target bitrates) than low-resolution, low-motion content, because high-resolution, high-motion videos are generally harder to compress, as they typically exhibit less inter-frame redundancy.

The monitoring system (e.g. Ceilometer) collects metrics such as available computational resources from the machine hosting the vCompression engine VNF (e.g. available CPU/GPU and memory usage). Network-related measurements such as available capacity, latency, and packet loss rate can also be collected by monitoring systems from the virtual and/or physical network links between the vCompression engine VNF and the QoE probe.

The QoE probe is mainly responsible for calculating the QoE score, as briefly discussed above. It calculates the score from video-related metrics such as bitrate, smoothness, rebuffering, and compression artefacts (e.g. image blockiness and blur). SS-CNO can include the QoE score in its reward function alongside other parameters such as the cost of computational resources. The QoE probe can be located at RTVE (the broadcaster site), so that it can easily receive video-related measurements from the video client. It is also possible to place this VNF at the network edge; in this way a 5G-MEDIA service provider can verify that it is optimizing resources to achieve the expected QoE for each user between the servers/compute nodes and the network edge. Alternatively, SS-CNO can use multiple QoE probes at different locations, for example one at the network edge (e.g. at a Telefónica edge) and another at the receiver site (e.g. at RTVE). The latter, multi-probe approach provides the best outcome, as it gives the CNO full end-to-end measurements.

Finally, the monitoring systems periodically send the relevant measurements to the CNO (both SS-CNO and O-CNO). The measurement interval dictates how fast the CNO can react to changes in the underlying network conditions: the lower the interval, the faster the CNO can react.


SS-CNO arbitrator algorithm

The CNO hierarchical architecture is discussed in detail in D2.4 [13]. Here we focus on the SS-CNO arbitrator algorithm, which sits in the middle layer of the hierarchy (cf. Figure 18); the bottom layer corresponds to the SS-CNO RL algorithm discussed above.

SS-CNO mainly optimizes the resources allocated/planned by the Overarching CNO (O-CNO) for each service. Thus, each service potentially needs a tailored SS-CNO, designed according to the service-specific requirements, constraints, and service level agreement (SLA).

The arbitrator algorithm of the SS-CNO is similar to the one in the O-CNO (cf. D2.4 [13]) and is distinct from the machine learning algorithm discussed above. The main responsibilities of the SS-CNO arbitrator algorithm are to optimize resources across multiple instances and/or sessions and to ensure application fairness between them. Note that the arbitrator algorithm of the O-CNO is in charge of dynamically distributing resources (e.g. GPUs, CPUs, bandwidth) among the services in need from a shared pool of resources, ensuring application fairness.

In summary, the O-CNO dynamically allocates resources to each service (SS-CNO); each SS-CNO can further dynamically distribute and adjust its available resources across its instances and/or sessions; and finally, within each session, a machine learning algorithm may perform a particular task or optimisation with the goal of meeting application requirements and constraints.

There are several ways the SS-CNO arbitrator algorithm can allocate network bandwidth to its users/sessions. In this document we explore two approaches, both of which can potentially be deployed (emulated) for the "Remote Production" use case in our small testbed in the Telefónica network in Madrid.

Figure 18: CNO hierarchical architecture


Arbitrator with soft bandwidth allocation

In this model the SS-CNO arbitrator algorithm tries to allocate bandwidth to each user/session according to the agreed service level agreement (SLA). If the SLA of a particular user/application dictates that SS-CNO should provide a particular set of resources (notably network bandwidth), then SS-CNO will try to meet those requirements/constraints. With this model, the SS-CNO arbitrator function does not need to reserve bandwidth for users by coordinating with the underlying network devices, such as existing switches/routers (whether virtual or physical).

SS-CNO is aware of the network conditions, the active sessions and their requirements/constraints; it should therefore be able to split the available resources across all of them with the objectives of providing the minimum quality of experience (QoE) for each user while maximizing the total number of users sharing the same resources.

When spare resources exist, SS-CNO may allocate more resources than a user/session requires for its minimum QoE. SS-CNO may enforce an upper limit on this allocation according to a policy devised by the service provider or requested by users.

To get a better sense of this bandwidth allocation model, let us explore the behavior of the SS-CNO arbitrator algorithm in the remote and smart production scenario, which is one of the main parts of this deliverable and, in turn, of this project. Each video stream of a broadcaster is bundled with a quality profile that includes a set of compression levels (target bitrates). Each quality profile (e.g. low, standard, high/premium) thus corresponds to a bandwidth range with a lower and an upper bound.

For example, for a premium-profile session, SS-CNO should allocate bandwidth with a lower bound of 15 Mbps and an upper bound of 40 Mbps. This implies that the target bitrate for this video session should be no lower than 15 Mbps and no higher than 40 Mbps; SS-CNO actively allocates bandwidth within this range according to the underlying network conditions. Note that SS-CNO should not allocate more than 40 Mbps in this case, because the content generator of this video session (i.e. FFmpeg) may be unable to utilize more than the maximum bandwidth stated in the video quality profile. For example, allocating 50 Mbps of bandwidth to a low-quality, low-motion video that only needs 10 Mbps does nothing to improve the QoE of its recipient.

Furthermore, if SS-CNO struggles to maintain the minimum QoE of its users/sessions due to a lack of network resources, it may follow a particular policy to reduce each user's bandwidth in a way that does not excessively damage QoE and preserves stream continuity (without artefacts). Alternatively, the arbitrator component of the SS-CNO can ask the arbitrator component of the O-CNO to provide extra capacity, so that SS-CNO can gracefully deal with sudden surges in the number of sessions and/or service demand. Typically, this extra allocation is temporary and should be released back to the O-CNO once the unexpected demand returns to the normal, predicted level. SS-CNO also considers scenarios where it reduces the bandwidth of low/standard users before that of high/premium ones. We follow these latter two policies in the final demo of the "Remote Production" use case.


SS-CNO may also apply a particular policy for distributing spare network bandwidth across its users/sessions based on their priorities (i.e. quality profiles). By spare we mean a condition in which SS-CNO has already satisfied the minimum QoE of all users/sessions and yet resources (e.g. network bandwidth) remain to be distributed across them. In such conditions, SS-CNO may allocate more resources to users with the premium profile than to users with the standard and low profiles. This approach may yield higher revenue if we assume that premium users pay more than other users. We also adopt this policy for the final demo of the "Remote Production" use case, as sketched below.
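A minimal sketch of such a soft allocation policy follows, under the assumptions that each session declares a [min, max] range taken from its quality profile, that a numeric weight encodes its priority (premium > standard > low), and that the total does not fall below the sum of the minimums; the data layout and weights are illustrative, not the project's implementation:

```python
def allocate(total_mbps, sessions):
    """sessions: dicts with 'id', 'min'/'max' bounds (Mbps, from the quality
    profile) and a priority 'weight'. Minimums are granted first; spare
    capacity is then shared by weight, capped at each session's maximum."""
    alloc = {s['id']: float(s['min']) for s in sessions}
    spare = total_mbps - sum(alloc.values())          # assumed >= 0 here
    for _ in range(len(sessions)):                    # at most n passes
        open_s = [s for s in sessions if alloc[s['id']] < s['max'] - 1e-9]
        if spare <= 1e-9 or not open_s:
            break
        total_w = sum(s['weight'] for s in open_s)
        for s in open_s:
            grant = min(spare * s['weight'] / total_w,
                        s['max'] - alloc[s['id']])    # never exceed the cap
            alloc[s['id']] += grant
        spare = total_mbps - sum(alloc.values())      # redistribute leftovers
    return alloc

sessions = [{'id': 'premium', 'min': 15, 'max': 40, 'weight': 3},
            {'id': 'standard', 'min': 10, 'max': 25, 'weight': 1}]
print(allocate(60, sessions))   # premium: 40 Mbps (capped), standard: 20 Mbps
```

The weight-proportional split directs most of the spare capacity to premium sessions while the per-profile caps prevent wasting bandwidth the encoder cannot use, matching the two policies described above.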

Arbitrator with hard bandwidth allocation

This model is similar to the soft bandwidth allocation scheme, but the arbitrator function of the SS-CNO reserves bandwidth with the help of network elements (e.g. the underlying network switches and routers). In this way, a set of resources (both computational and network) is allocated to a user/session regardless of whether that user/session is capable of using it. This approach may only be appealing for users/sessions with very strict QoE requirements.

Given that the bandwidth is reserved by the network in this model, it is less likely that the service provider fails to deliver the minimum QoE of a user/session. A penalty policy can still be applied in case the network itself fails to do so.

Note that another variant of this scheme would be a model in which SS-CNO reserves bandwidth for the maximum QoE of a user. With this scheme, however, SS-CNO would be unnecessary, so we do not consider such scenarios.

3.2.3 Quality of Experience (QoE) Data Gathering for billing model

For the "Remote Production" scenario, a QoE-based billing model is defined. To this end, two vProbes are deployed for each instance of the "Remote Production" scenario, where an instance is a full workflow with an MPE, vCE and vProbes. The setup is presented in Figure 19. The model gathers data from the vProbes before and after the public link, providing the inputs for the billing model described in D4.2, Section 5.2 [3].


Figure 19: Billing Data workflow

For UC2, the data produced were used for the cost analysis/billing, as described in D2.4 [13], and to help define the best QoE ranges for standard and premium users for shaping a possible simple commercial offering.

A specific report on QoE ranges made it possible to measure the benefits of applying the CNO optimisation to a scenario with one standard and one premium user.

Figure 20: QoE range comparison

Figure 20 clearly shows how the CNO lowers the percentage of time the UC2 sessions run in the lower QoE ranges (1-2 and 2-3 shrink when using the CNO) while increasing the percentages for the higher QoE ranges (3-4 and 4-5 grow with the CNO). These values were derived from the QoE measured by the vProbes on the end-users' devices.
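The range percentages in such reports can be reproduced from raw QoE samples with a simple aggregation; a sketch, assuming one QoE sample per measurement interval (the function name and bin edges are illustrative):

```python
def qoe_range_shares(samples, edges=(1, 2, 3, 4, 5)):
    """Return the percentage of samples falling into each QoE range
    (1-2, 2-3, 3-4, 4-5), as reported in Figure 20."""
    bins = list(zip(edges[:-1], edges[1:]))
    counts = {f"{lo}-{hi}": 0 for lo, hi in bins}
    for q in samples:
        for lo, hi in bins:
            # The top edge is closed so a perfect score of 5 lands in 4-5.
            if lo <= q < hi or (hi == edges[-1] and q == hi):
                counts[f"{lo}-{hi}"] += 1
                break
    n = len(samples) or 1
    return {rng: 100.0 * c / n for rng, c in counts.items()}

print(qoe_range_shares([2.5, 3.1, 3.7, 4.2, 4.9, 1.8]))
```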

Another interesting report concerns the values used to define the standard and premium users' QoE ranges with the CNO activated, which could serve as an initial assessment of the best commercial offering based on historical data.

The initial assumption of lower/upper thresholds of 2-3 for standard users and 3-4 for premium users has been corrected using historical data.


The corrected ranges reflect the expected QoE percentages that a service provider can sustain, given its resource availability, when using the CNO optimisation. This of course depends strongly on the available resources and, above all, on the mean properties of the services provided, such as sports vs. interviews, which have very different entropy values.

For example, for premium users, the initial 3-4 range was not efficient, with a high percentage of service provisioning above the upper QoE level (around 50% of the time). A more efficient allocation, assuming similar video content (e.g. similar entropy values), would be the range 3.0-4.5, as the figure below shows.

Figure 21: QoE range 3.0-4.0 (left table) and QoE range 3.0-4.5 (right table) for premium users

Similarly, for standard users, the initial range of 2-3 was corrected to 2.0-4.0, as shown in the figure below; even if still inefficient (46% of the time above the upper threshold), this reflects the current availability of resources with respect to the services used for the demo.

Figure 22: QoE range 2.0-3.0 (left table) and QoE range 2.0-4.0 (right table) for standard users

By using smaller QoE ranges and taking the video content entropy as an additional input, it is possible to tailor specific services to end-users' needs.

To conclude, the availability of historical data through the Grafana console opens the door to additional analyses, for example checking whether the quality profile requested (and paid for) by a user (tracked via session IDs) is suitable for the video service currently provided (e.g. based on its entropy).

3.2.4 Serverless with Function as a Service (FaaS)

Serverless computing29 has become a very popular cloud computing paradigm over the last few years. However, prior to the 5G-MEDIA project, it had not been applied to media-intensive applications. In a typical serverless framework, functions do not communicate with each other over the network, they are not configurable after they are launched, and they do not support specialized hardware such as GPUs. Also, orchestration of serverless functions is not covered by the existing ETSI MANO standards. Our approach to integrating serverless computing with 5G-MEDIA is described in D3.2 [14], D2.4 [13] and D3.4 [15].

29 Source: https://en.wikipedia.org/wiki/Serverless_computing


In this document, we only briefly describe what serverless computing is and why it is relevant to the “Mobile Contribution” scenario.

Serverless computing is a cloud computing paradigm that simplifies the deployment of code to production. A typical serverless framework provides several pre-built runtime environments (e.g., Docker container images) to support different programming languages. In the function-as-a-service (FaaS) model, the unit of execution is a function. A developer creates a function using SDK tools accompanying the framework. At the FaaS provider side, function creation boils down to storing metadata about the function in a database. This metadata includes the code itself (or an executable image in the case of "blackbox" actions for which no standard prebuilt language runtime exists) and other information important at runtime, such as intended memory consumption, tags, the name of the function, etc.

When a function is invoked, the serverless framework injects the code into a stem runtime (i.e., a non-specialized pre-built runtime) via an HTTP (or other protocol) endpoint that is part of the base image of the prebuilt runtime. In the case of a blackbox function (i.e., a function not written in one of the languages supported out of the box), the function's metadata points to the executable (e.g., a Docker image), which is executed on a backend container orchestrator (e.g., Kubernetes).
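For instance, with Apache OpenWhisk (the framework used in the 5G-MEDIA FaaS VIM, cf. D3.4 [15]), a Python function follows this pattern; the function content, names and Docker image below are hypothetical illustrations:

```python
# A minimal OpenWhisk function ("action") in Python: the framework injects
# this code into a pre-built runtime and calls main() on each invocation.
def main(params):
    name = params.get("name", "world")       # parameters arrive as a dict
    return {"greeting": f"Hello, {name}!"}   # the returned dict is the result

# Deployment and invocation then use the OpenWhisk CLI, e.g.:
#   wsk action create hello hello.py
#   wsk action invoke hello --param name journalist --result
# A "blackbox" action instead points at a Docker image (hypothetical name):
#   wsk action create myblackbox --docker example/my-vnf-image
```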

The resources (i.e., the servers) for executing functions are allocated and managed by the FaaS provider (i.e., by the SVP provider). Functions are executed in response to events (an invocation request by a user is just one type of event). The FaaS provider charges the owner of a function on a pay-as-you-go basis. In this sense, FaaS is the first cloud computing model that truly delivers on the original promise of cloud computing: that customers pay only for the cloud resources they actually use. To that end, FaaS providers bill customers per GB-second (typically at a resolution of 100 ms); for example, a function configured with 512 MB of memory that runs for 3 seconds is billed 1.5 GB-seconds.

It should be noted that FaaS is an extreme form of renting capacity on demand: there is no commitment whatsoever on the customer side. While very attractive for the customer, this complicates capacity planning for the FaaS provider and therefore makes it harder to meet target profit margins. To simplify scheduling, and therefore capacity planning, serverless frameworks usually impose a hard limit on function lifetime: rather than being generally distributed, the lifetime of a function is limited to 10-15 minutes. This limit is what makes FaaS economically feasible; it is an essential feature of the model rather than an incidental one. While the 10-15 minute lifetime can be relaxed and configured, making function lifetimes generally distributed is, in general, against the spirit of FaaS.

These features of FaaS, namely (a) event orientation, (b) pay-as-you-go billing, and (c) limited function lifetimes, make this model suitable for relatively short-lived, irregular workloads triggered by events.

The "Mobile Contribution" scenario is a clear-cut example of this type of workload. The identification of this workload as a candidate for a FaaS-based implementation was done early in the project, as described in Deliverables D2.2 [2] and D3.2 [14].

For completeness, we recap UC2.b (mobile contribution) as follows. A mobile journalist witnesses a newsworthy event. The journalist's goal is to produce a contribution as fast as possible and make it accessible to the broadcaster she works for ahead of the competition.


To that end, the mobile journalist uses a special application on her mobile phone. The application captures the stream and interacts with the 5G-MEDIA SVP, which helps determine which 5G edge the contribution should be streamed to. The edge selection is based on the functionality the mobile journalist selects for this contribution (in our prototype, a journalist can select real-time caption production and image recognition) and the target endpoint to which the final contribution should be streamed.

The initial selection of a 5G-MEDIA edge for the "Mobile Contribution" scenario is based simply on the GPS coordinates of the journalist, with the geographically closest edge being preferred. This initial selection is done by the application itself. However, it can be overridden by the CNO of the 5G-MEDIA SVP, which might suggest a different edge based on aspects such as overall SVP optimisation, the match between the functions selected by the mobile journalist and the availability of special hardware (e.g., GPUs), network bandwidth availability, network latency, etc.
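A sketch of this client-side, purely geographic selection is shown below; the edge list and coordinates are hypothetical, and the distance is the standard haversine (great-circle) formula:

```python
from math import radians, sin, cos, asin, sqrt

EDGES = {  # hypothetical edge sites: name -> (latitude, longitude)
    "madrid-penuelas": (40.398, -3.706),
    "athens-ncsrd": (37.999, 23.816),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def closest_edge(journalist_pos):
    """Initial, purely geographic choice; the CNO may later override it."""
    return min(EDGES, key=lambda e: haversine_km(journalist_pos, EDGES[e]))

print(closest_edge((40.42, -3.70)))     # -> "madrid-penuelas"
```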

There are cognitive services, such as caption production (vSpeech) and face recognition (vDetection), that can be spawned on demand on a per-session basis, and there are two possible targets to which the contribution can be sent: (a) safe-local and (b) safe-remote. The two endpoints are identical, except that the former is installed at a 5G-MEDIA edge and the latter at the broadcaster's premises. The goal of the safe-local endpoint is to allow contributions to be created at the edge, stored temporarily, and served to the broadcaster on demand. The goal of the safe-remote endpoint is to receive the contribution, store it at the broadcaster's premises and allow the broadcaster to fetch it for pre-processing and inclusion in a program.

Based on the combination of cognitive functions and the endpoint selected by the journalist for a specific session, the VNFs for speech recognition (vSpeech) and image/face recognition (vDetection) are spawned on demand via FaaS and orchestrated to connect to the correct endpoint (either safe-local or safe-remote). The VNF creation is performed via OSM, which uses the 5G-MEDIA FaaS VIM as described in Deliverable D3.4 [15], and the event-driven orchestration is performed by the Serverless Orchestrator collaboratively with OSM as described in Deliverable D2.4 [13].

Furthermore, because the raw media stream must be split into two identical streams, one for caption production and one for image recognition, an additional FaaS-based function, called vSplitter, is spawned on demand on a per-session basis and is orchestrated via the Serverless Orchestrator to connect its outputs to vSpeech and vDetection. We describe this in more detail in Section 4.3.1.1 of this document.
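Functionally, a splitter of this kind can be realized with FFmpeg's tee muxer, which duplicates one input into two identical outputs without re-encoding. The sketch below illustrates the concept only (it is not the project's vSplitter code) and the endpoints are placeholders:

```python
import subprocess

def split_stream(src, out_speech, out_detection):
    """Duplicate one MPEG-TS input to two UDP outputs using FFmpeg's
    tee muxer, copying the streams without re-encoding."""
    subprocess.run([
        "ffmpeg", "-i", src, "-map", "0", "-c", "copy", "-f", "tee",
        f"[f=mpegts]{out_speech}|[f=mpegts]{out_detection}",
    ], check=True)

# Placeholder endpoints standing in for the vSpeech and vDetection inputs:
split_stream("udp://0.0.0.0:5000",
             "udp://vspeech.local:6000", "udp://vdetection.local:6001")
```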


4 Validation

4.1 Testbeds

The design of the 5G-MEDIA testbed infrastructure was developed based on an evaluation of the various infrastructure requirements arising from the definition of the three 5G-MEDIA use cases, cross-checked against the availability of specific resources at the laboratories of the testbed owners: the National Center for Scientific Research "Demokritos" (NCSRD), Telefónica Investigación y Desarrollo SA (TID) and Engineering Ingegneria Informatica SpA (ENG). In addition, specific hardware and software configurations and the procurement of additional elements (e.g. GPUs or dedicated OpenStack-based infrastructures) were carried out to enable the deployment of the planned scenarios described in Section 4.2, Deployments of the "Remote Production" Scenario, and Section 4.3, Deployment of the "Mobile Contribution" Scenario.

Due to the specific requirements of UC2 regarding computing (incl. hardware acceleration via GPUs), storage and network resources, the TID and NCSRD testbeds were chosen as edge testbeds for UC2. The chosen testing locations were also motivated by the proximity of the partners to the specific testbed facilities.

4.1.1 Engineering Ingegneria Informatica (ENG) Testbed

The ENG (Engineering) testbed hosts the 5G-MEDIA Platform and is the Central Cloud in the 5G-MEDIA project.

ENG Testbed topology

As described in Deliverable D6.1 [1], the Engineering testbed is composed of four dedicated servers that provide a relevant validation environment based on OpenStack Ocata30, where several combinations of edge/core deployments can be configured. Table 7 below provides a brief description of each Engineering testbed component.

Table 7: Engineering testbed components

Testbed component | Hardware capabilities | Description

Controller Node | HP ProLiant DL580 G5; CPU: 16 cores; RAM: 72 GB | The OpenStack controller node, where all OpenStack services are deployed, including Horizon.

30 Source: https://www.openstack.org/software/ocata/


Compute Nodes (x3) | 3x HP ProLiant DL585 G5; CPU: 16 cores each; RAM: 80 GB each | The OpenStack compute nodes, which can be configured into several availability zones to model edge/core deployments. They host all of the virtualized media applications (i.e. VNFs) as well as all of the 5G-MEDIA Platform components.

In total, the Engineering testbed resources available for 5G-MEDIA can be summarized as:

• 48 CPU Cores (192 vCPU, allocation ratio is 4)

• 228 GB of RAM (same vRAM, allocation ratio is 1)

• 1 TB of available block storage, 300 GB for volumes, 63 GB for images and snapshots

Use of the ENG testbed in 5G-MEDIA

From a resource perspective, including the testbed configuration, the ENG testbed has not received major updates with respect to what is described in Deliverable D6.1 [1].

The few advancements concerning the platform services are that all 5G-MEDIA services are now hosted on the dedicated OpenStack Ocata31 instance, except for two services still deployed on OpenStack Kilo32: the CNO services, because they need more recent CPUs to run TensorFlow33 algorithms, and the Jenkins34 slave node, which has a dedicated network connection to the CI/CD services provided by ENG.

Some platform services have been updated: the central OSM instance, based on release 5.0.5, now supports OpenWhisk35, OpenNebula36 and the OpenID Connect37 integration for the 5G-MEDIA Catalogue. Furthermore, MAPE services such as the Translator, the hierarchical CNO and the Executor have been updated. In the context of MAPE, new services have been introduced, including the logging server (Graylog stack38) that keeps the logs of the MAPE services, the Recommendation service that gives feedback to the SDK regarding VNF behavior in the production environment, and the Issue Tracking Server (Redmine) that stores all the reported recommendations from the SVP towards the SDK.

31 Source: https://www.openstack.org/software/ocata/

32 Source: https://www.openstack.org/software/kilo/

33 Source: https://www.tensorflow.org/

34 Source: https://jenkins.io/

35 Source: https://openwhisk.apache.org/

36 Source: https://opennebula.org/

37 Source: https://openid.net/connect/

38 Source: https://docs.graylog.org/en/3.1/pages/ideas_explained.html


The Authentication, Authorization, Accounting (AAA) portal publishes the cost analysis/billing Grafana39 reports, which measure the benefits of the CNO in terms of better quality of experience (QoE) and lower resource consumption. Both of these activities are described in detail in Deliverables D4.1 [16] and D2.4 [13].

The main role of the ENG testbed in UC2 has been to host the central OSM and the CNO services for orchestration.

4.1.2 Hellenic Telecommunications Organization (OTE) Testbed

The OTE testbed is configured in OTE's research labs to be connected with the NCSRD lab, providing a second infrastructure for testing multi-UC scenarios, especially across UC1 and UC2. The infrastructure is composed of two servers hosting the core components (VIM, OpenWhisk, Catalogue, etc.) and three GPU-equipped workstations hosting the VNF clients.

OTE labs infrastructure

The OTE testbed offers additional resources and is connected with the NCSRD testbed via a 1 Gbps/10 Gbps line, as depicted in Figure 23.

Figure 23: OTE labs infrastructure

An overview of the testbed components is given in the table below:

Table 8: Components overview of the OTE testbed

System | CPU | RAM | GPU

PowerEdge R640 server | Intel(R) Xeon Gold 6138 2.0 GHz, 20C/40T | 384 GB | Quadro M4000
PowerEdge R640 server | Intel(R) Xeon Gold 6138 2.0 GHz, 20C/40T | 384 GB | Quadro M4000
Alienware Aurora R7 | Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz | 32 GB | GeForce GTX 1080 Ti

39 Source: https://grafana.com/


Alienware Aurora R8 | Intel(R) Core(TM) i7-9700K CPU @ 3.70GHz | 32 GB | GeForce RTX 2080 Ti
Alienware Aurora R8 | Intel(R) Core(TM) i7-9700K CPU @ 3.70GHz | 32 GB | GeForce RTX 2080 Ti

The NFV infrastructure uses the OpenStack "Queens" release as its cloud operating system, deployed on top of the two Dell servers. The additional GPU resources are also available to enable UC1 service scaling. The testbed is managed through the IP range 195.167.80.32/27.

4.1.3 Telefónica (TID) Testbed

For the purposes of this project, TID provides two edge data centers where the OnLife platform is integrated. One is located in the city center of Madrid at the Peñuelas central office; this is a pre-commercial implementation and also houses other commercial equipment of Telefónica España that provides live connectivity to customers. Additionally, TID provides a lab environment, safer for testing, at the Boecillo Lab in Valladolid, on Telefónica premises.

Telefónica Testbed Topology

OnLife is an innovation project of Telefónica, explained in Deliverables D2.4 [13] and D2.3 [17], which is integrated in the Peñuelas central office and aims to provide services to several users as part of validating the value of edge computing for real customers. Additionally, there is a replicated environment in the Boecillo lab with the same architectural characteristics.

With OnLife, Telefónica provides a testbed environment to validate the VNFs and their implementation, as well as a real customer scenario with conditions similar to a real venue site.

The OnLife architecture components are explained in Deliverable D2.3, Section 2.2.1 "OnLife Platform" [17], with further information in Deliverable D6.1 [1]. Figure 24 shows an overview of the system, where the servers function as the hosts of the VNFs.


Figure 24: OnLife Infrastructure

All network applications are similar to those in CORD40 (Central Office Re-architected as a Datacenter) but have been redesigned to work with the Central Telefónica de procesamiento de datos (CTpd) architecture. Service management is coordinated by OneFlow41, which relies on OpenNebula and the Open Network Operating System (ONOS)42 for infrastructure and network management, respectively, as shown in Figure 25.

Figure 25: Logical architecture and components of the CTpd project

40 Source: https://www.opennetworking.org/cord/

41Source: https://docs.opennebula.io/5.8/advanced_components/application_flow_and_auto-scaling/appflow_use_cli.html

42 Source: https://www.opennetworking.org/onos/


Use of the Telefónica testbeds in 5G-MEDIA

The Telefónica TID testbed is composed of two environments: the lab in Boecillo (Valladolid) and another closer to customers (a pre-commercial environment) in the Peñuelas central office in Madrid. Both have the same architecture, and the 5G-MEDIA VMs have been instantiated in both places, depending on the needs of the project (Peñuelas for pilots with customers, Boecillo for further evolutions and testing). The central OSM instantiated at Engineering interacts with the infrastructure already described through the OpenNebula connector into the edge computing infrastructure at the central office.

Figure 26: Diagram of VNF deployment in TID testbed and connector to OSM

Figure 27: Diagram of how the 5G-MEDIA universe is implemented over the infrastructure


4.1.4 National Centre for Scientific Research “DEMOKRITOS” (NCSRD) Testbed

NCSRD Testbed Topology: SDN Spine - Leaf Network

The WAN backbone network on the NCSRD site is composed of several physical SDN switches forming a spine-leaf architecture. All of the switches are OpenFlow-enabled and support OpenFlow protocol version 1.3. They are controlled by a centralized OpenDayLight (ODL) SDN controller, which is responsible for installing forwarding rules (flows) on each switch.

Figure 28 presents the NCSRD spine-leaf network topology of Site 1 of the Athens platform. Every lower-tier switch (leaf layer) is connected to each of the top-tier switches (spine layer) in a full-mesh topology. The leaf layer consists of access switches that connect to any physical or virtual device located on the NCSRD site, while the spine layer is the backbone of the network and is responsible for interconnecting all leaf switches and establishing connectivity with the Internet and the other sites of the 5G-MEDIA platform. The SDN backbone network can offer isolation and QoS policies for each network slice instantiated on the platform.

Figure 28: NCSRD spine – leaf network topology

Use of the NCSRD testbed in 5G-MEDIA: Core Network Gateway

An Integrated Services Router (ISR) by Cisco, alongside a firewall (a Cisco ASA 5510), is used to realize the core network gateway on the NCSRD site. Through these nodes the NCSRD core network is connected to the Internet via the access provided by the Greek academic network provider (GRNET). The gateway is also used as the endpoint for the interconnection between the NCSRD and OTE sites using QinQ Ethernet transport. Finally, a VPN concentrator server allows remote users to connect to the NCSRD testbed via VPN, offering all the standard tunnel types (i.e. OpenVPN, IPsec, AnyConnect).

The core data-center (DC) domain is physically located at the NCSRD campus in the city of Athens. The DC domain implements two different services. First, it offers the resources for the deployment of the 5G-MEDIA virtualized components (OSM, OpenWhisk). Second, it offers computing resources to be used by the NFV Orchestrator for the deployment of Network Services and VNFs (i.e. NFVI-PoP). Furthermore, the core DC domain also supports a Kubernetes cluster, either for cloud-native NFV service deployment (i.e. container-based VNFs) or for other types of applications related to 5G-MEDIA.

The overview of the testbed infrastructure is presented in detail in the table below:


Table 9: Overview of the NCSRD testbed infrastructure

System | CPU | RAM | GPU

HP ProLiant DL380 Gen9 | Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50GHz | 100 GB | Generic
HP Z230 Tower Workstation | Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz | 50 GB | Quadro M4000
Alienware Aurora R7 | Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz | 32 GB | GeForce GTX 1080 Ti
Alienware Aurora R7 | Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz | 32 GB | GeForce GTX 1080 Ti

Figure 29: NCSRD Data Center

The first NFV infrastructure uses the OpenStack "Queens" release as its cloud operating system and NFVI enabler. It is currently deployed over two physical HP servers. The dedicated management network for this NFVI is 10.100.176.0/24. Components from the MANO layers are deployed as VMs on this cloud.

The second 5G-MEDIA infrastructure is a GPU-enabled Kubernetes cluster, enabling container-based deployment of VNFs. The management and external network of the cluster is 10.30.2.0/24.


4.2 Deployments of the “Remote Production” Scenario

The scenarios in the "Remote Production" branch aim to complement, or provide an alternative to, today's common broadcast productions of events, which are characterized by large teams and one or several OB trucks or OB vans required on location (cf. Mobile Unit examples 1, 2a and 2b in Table 1: Overview of the different mobile units). Due to its complexity, the realization of the "Remote Production" part started first. The scenario was put into practice from two points of view. The first is the perspective of a production team, where the focus lies on a single production. The second is the perspective of a broadcaster, where the focus lies on multi-scenario production; for this, the single-production scenario was extended. These points of view were realized, as described in the following sections, through four cornerstones: 1) the "Remote Production" scenario 1c in an initial version (CS1.1), 2) the "Remote Production" scenario 1c with different refinements of the VNFs (CS1.2), 3) a proof of concept of "Remote Production" with ultra-low latency (CS1.3), and 4) the "Remote Production" multi-instance scenario 1c in one edge (CS1.4).

4.2.1 “Remote Production” scenario — Initial version (1st Cornerstone, CS1.1)

The initial workflow was realized from the perspective of a production team and focused on a single-production scenario.

Technical Description of the Scenario/Demo Workflow

The scenario was realized in Telefónica's (TID) OnLife testbed. This testbed is a real customer network in TID's office at Peñuelas. In the initial workflow, first versions of the media-specific VNFs "Media Process Engine" (MPE) (cf. Section 3.1.2 of this document) and "vCompression Engine" (vCE) (cf. Section 3.1.3 of this document) and the cognitive service "Speech-2-Text Engine" (vSpeech) (cf. Section 3.1.1.1 of this document) were developed and deployed. In this implementation, all VNF modules run on their own virtual machines (VMs). The VM deployment and resource allocation are performed by OpenNebula, which interfaces with the ONOS SDN controller. The interfaces between OnLife and OSM have been implemented within 5G-MEDIA. Since most common cameras today have an SDI output, it is necessary to bring the SDI signal to IP. To process the video signal in the 5G-MEDIA NFVI (Telefónica OnLife network), it is fed into a gateway where it is converted from HD-SDI (720p50) to IP. The gateway is a Physical Network Function (PNF), and a vendor solution by Ericsson is used here. The Tandberg/Ericsson Video Processor (cf. Deliverable D4.1 [16], Section 7.3.2.7) is responsible for the conversion from HD-SDI to IP. At the same time, it also encodes the video signal to H.264/MPEG-TS (cf. Figure 30).


Figure 30: Use Case 2, “Remote Production” scenario CS1.1 - Screenshot of Ericsson Video Processor GUI

The work done on these was presented in the 1st Review and is reported in more detail in Deliverable D6.1 [1].

4.2.2 “Remote Production” scenario with SMPTE ST 2110 (2nd Cornerstone, CS1.2)

In the further development, scenario 1c of the "Remote Production" branch was extended to support SMPTE43 (Society of Motion Picture and Television Engineers) ST 211044. Additionally, refinements of the VNFs used and of the workflow performance were made. The workflow was connected to the MAPE (cf. Section 3.2.1 of this document) of the 5G-MEDIA Platform. This setup was later presented at the 2nd Review.

The SMPTE ST 2110 workflow is a further development of CS1.1; it too was realized from the perspective of a production team and focused on a single-production scenario.

Technical Description of the Scenario/Demo Workflow

The “Remote Production” scenario 1c was set up in Telefónica’s testbed OnLife. In the workflow, the VNFs “Media Process Engine” (MPE) (cf. Section 3.1.2 of this document) and “vCompression Engine” (vCE) (cf. Section 3.1.3 of this document), the Cognitive Service “Speech-2-Text” (vSpeech) (cf. Section 3.1.1.1 of this document) and, to make the MPE ST 2110-ready, the new “vUnpacker” (cf. Section 3.1.5 of this document) were used, as shown in Figure 31. To process the video signal in the 5G-MEDIA NFVI (Telefónica OnLife network), it is necessary to bridge the SDI signal to IP. This bridging is done by a gateway that connects the event location with the 5G-MEDIA network, here likewise using SMPTE ST 2110.

43 Source: https://www.smpte.org/

44 Source: SMPTE ST 2110 – SMPTE Standard – Professional Media Over Managed IP Networks, 2017


This gateway is a Physical Network Function (PNF) and a vendor solution by Nevion (cf. Deliverable D4.2, Section 7.2.1.7 [3] and D4.1, Section 7.3.2.7 [16]).

SMPTE ST 2110 (Professional Media Over Managed IP Networks) is a new suite of standards for uncompressed video transport over IP. It specifies the carriage, synchronization, and description of separate elementary essence streams over IP for real-time production, playout, and other professional media applications.

Figure 31: Overall Architecture of the “Remote Production” scenario (C1.2)

As video sources for the demo, a BlackMagic Micro Studio Camera 4K and two Sony PDW-F1600 video players (cf. Figure 32) were installed at the Telefónica Peñuelas Lab (OnLife). The camera and the video players send uncompressed HD video signals in 1080i/25 over HD-SDI (Serial Digital Interface, the established broadcast transmission standard) to the gateway, a Nevion Virtuoso IP production platform. SDI is still predominant in broadcasting, with much equipment such as cameras, players and mixers providing only these in-/outputs, but IP is increasingly penetrating the broadcast market, with a growing number of devices supporting basic or advanced IP functionalities.

Figure 32: Sony PDW-F1600 video player


Uncompressed professional video signals are very demanding in terms of bandwidth. One uncompressed HD video signal in 1080i/25 needs about 1 Gbps (pure video data rate, without audio and ancillary data). To process the video signal in the 5G-MEDIA NFVI (Telefónica OnLife network), it is fed into the Nevion Virtuoso IP Production Platform gateway where it is converted from HD-SDI to IP.
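As a plausibility check of this figure, a back-of-the-envelope calculation (assuming 10-bit 4:2:2 sampling, as used by SMPTE ST 2110-20, and counting the active picture only; these sampling parameters are an assumption, not taken from the test setup):

    \[
    1920 \times 1080\ \text{px} \times 25\ \tfrac{\text{frames}}{\text{s}} \times 2\ \tfrac{\text{samples}}{\text{px}} \times 10\ \tfrac{\text{bit}}{\text{sample}} \approx 1.04\ \text{Gbps}
    \]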

Figure 33: Use Case 2, “Remote Production” scenario CS1.2 – Screenshot of a typical Nevion Virtuoso IP Production Platform GUI

The gateway supports the new SMPTE ST 2110 video mapping standard, since ST 2110 allows the transport of uncompressed media content over IP.

After passing through the gateway, the signals go to a switch that converts them from 1 GbE copper to 10 GbE fiber and feeds them into the NFVI/OnLife network (for a detailed description of OnLife cf. Deliverables D2.3 [17] and D2.4 [13]). For now, a layer 3 IPv6 network with static IP configuration is used.

After entering the network, the media signals are routed to the Media Process Engine (MPE), which had been a VNF in the first-year setup (when handling only compressed audio/video streams) but had to be changed into a PNF because of the significantly higher demands on network interface capabilities. The MPE itself plays out another video file which is also fed into the MPE’s video switching component. The switching/video mixing between the two signals is performed and the final broadcast stream is produced. This is done on the basis of preview streams that are sent to the studio (RTVE Torrespaña), where the director remotely controls the MPE based on those streams. The final output signal of the MPE is also sent to the director as a preview. For transmission, the previews are compressed to low-resolution, low-latency video streams in the video switching core of the MPE. It is worth highlighting that the lag between the real-time acquisition and the monitoring of this signal by the TV Director has to be as short as possible, no more than 20 frames (i.e. 800 ms at 25 frames per second). This is because the loop between the Director’s order and the camera operator’s reaction has to be as tight in time as possible, so the delay of this loop has to be minimized. This is one of the paramount constraints for the success of this project as a practical service.


From the MPE, the final broadcast stream is then fed into the vCompression Engine (a VNF extended for use in the 5G-MEDIA project), which compresses the audio/video streams for WAN/Internet transfer at a much lower bitrate. The compressed output stream of the vCompression Engine is finally routed to the Speech-to-Text Engine (another VNF developed within the 5G-MEDIA project), where the audio is analyzed and text is extracted. This text is then added as subtitles to the video stream. The final signal with subtitles can then be accessed via a browser by the broadcaster for further in-house use as a contribution signal, or it can be offered to the public as an online service.

With the connection of TID’s OnLife network to the SVP of the 5G-MEDIA Platform, the MAPE loop was closed, making it possible to use the Cognitive Network Optimizer (CNO) (cf. Section 3.2.2 of this document) to achieve better performance and to collect metrics. These metrics were displayed for the user in two dashboards: the vCE-Dashboard (cf. Figure 34) and the Traffic Manager-Dashboard presented by the CNO (cf. Figure 35).

Figure 34: Overall overview of the vCE-Dashboard

Figure 35: Overall overview of the Traffic Manager-Dashboard


The work done on these was presented in the 2nd Review and is reported in part in Deliverable D6.1 [1].

4.2.3 “Remote Production” — Proof of Concept (3rd Cornerstone, CS1.3)

To validate the “Remote and Smart Production Pilot” scenario of UC2 and its goals, UC2 carried out a live-event field test as proof of concept (PoC) (CS1.3). The radio show “La Radio es Sueño”, organized by Radio 3, was selected for the field test. This radio show took place at the Cineteca de Matadero (Madrid) on stage with a live audience and was broadcast via radio. Additionally, the event was produced remotely and the outcome was published as a stream on the Radio 3 website (cf. Figure 36).

Figure 36: Website of RTVE’s Radio 3

This Section describes the realization and lessons learned from this demo.


Technical Description of the Scenario/Demo Workflow

The “Remote Production” scenario 1a was set up in Telefónica’s testbed OnLife at Peñuelas (cf. Figure 37, ③). The radio show “La Radio es Sueño” by Radio 3 took place at the Cineteca de Matadero (Madrid) (cf. Figure 37, ①) and was produced remotely from RTVE’s headquarters Torrespaña (cf. Figure 37, ②).

Figure 37: The different locations of the Use Case 2 Matadero Demo


In the workflow, the VNFs “Media Process Engine” (MPE) (cf. Section 3.1.2 of this document) and “vCompression Engine” (vCE) (cf. Section 3.1.3 of this document) were used, as shown in Figure 38. Since the Cognitive Service “Speech-2-Text” (vSpeech) supports only the English language and the radio show was produced in Spanish, this VNF was not part of the workflow this time.

Figure 38: Simplified Architecture of the “Remote Production” scenario 1a (CS1.3)

Preparations for the “Remote Production” of the radio show “La Radio es Sueño” (CS1.3)

Once the previous proofs of concept had progressed properly, further steps were needed. The next targets were to measure the latency of the round-trip loop between TV Director and camera operator and, beyond that, to mock up a full-fledged deployment of a real live event. Moreover, this cornerstone should address the fulfilment of the KPIs related to this use case.

Measurement of the monitoring delay

The audio/video signal has two points in the timeline, and they are separated in time. The first point is where the real event happens and where the camera operators are. The second point is where the remote TV Director is; it lies later in time than the point of the real event.

Those two points have to be as close as possible. Since the TV signal is digital, the coincidence in time of those two points is lost. Depending on the genre of the TV production, a larger or smaller separation can be tolerated: sports and drama are the most demanding genres, whereas others such as news can be more relaxed.

Some figures for this distance were estimated prior to the design of the system and before the KPI was established. But the final assessment has to be made subjectively by the people who are the target of this project: the TV Directors.


The time lag has two components: the network latency and the time needed for compressing the audio and video signal. The former is by far smaller than the latter. The compression time can be traded against bitrate (bandwidth) consumption: the higher the bitrate, the less processing time is needed for compression, and vice versa.

All in all, a theoretical time lag was designed in advance. As explained before, this figure depends almost entirely on the compression processing time. It was intended to be less than 15 frames, i.e. less than 600 ms.

The approach to measuring such a lag in real time is to place two probes at the very ends of the end-to-end chain. In other words, an audio-visual probe is needed that acts like a clapboard, only more sophisticated. For this, a setup with two smartphones was chosen, each running an app showing a real-time clock with millisecond resolution, synchronized and compensated via the Network Time Protocol (NTP).
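To illustrate the principle, the following is a minimal Python sketch of such an NTP-compensated millisecond clock; it is not the Android app used in the test, and the NTP server name is an illustrative assumption:

    # Minimal sketch of an NTP-compensated millisecond clock ("digital clapboard").
    import time
    import ntplib  # third-party NTP client library

    client = ntplib.NTPClient()
    response = client.request("pool.ntp.org", version=3)
    offset = response.offset  # estimated local clock error in seconds

    while True:
        corrected = time.time() + offset
        ms = int((corrected % 1) * 1000)
        stamp = time.strftime("%H:%M:%S", time.localtime(corrected))
        print(f"{stamp}.{ms:03d}", end="\r")
        time.sleep(0.005)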

The first internal test was to validate this solution. For this, three identical Android devices were placed next to each other, synchronized with the same NTP server and recorded with another Android camera device.

Figure 39: Internal test to measure NTP clock accuracy in several Android devices

Several runs were made, and the accuracy achieved was better than 20 ms, i.e. half a frame. So, fortunately, the uncertainty of the measurement system was below half a frame (± 1/4 frame). To sum up, the end-to-end figure can be determined with sub-frame resolution.

On the other hand, audio/video lip-sync was taken as given, since the compression algorithm is a well-known, proven solution. Furthermore, RTP transport ensures that the streams are reconstructed without slipping artifacts. Nevertheless, a conventional clapboard and the same end-to-end configuration were used to check that the time lag between the audio and video components of the signal was zero.


Figure 40: Audio Video lip sync test with a conventional clapboard

The final test involved the real end-to-end measurement. For this, one Android device was placed in front of the camera and another was placed next to the video monitor at the remote production edge (Torrespaña edge), with the monitor explicitly showing the first Android device. A third Android device recorded both devices. This way, a real end-to-end measurement with less than one frame of accuracy was achieved.

Figure 41: Real end-to-end time lag measurement

As Figure 41 shows, there is a time difference of around 500 ms, i.e. around 12 frames. Thus, the designed time lag was achieved. The final assessment by the TV Directors then had to be made once the real set-up was deployed.


The “Remote Production” of the radio show La Radio es Sueño (CS1.3)

Deployment of a real live event

One of the objectives of WP6 is the proof of concept of the use cases, so a real deployment was planned for this purpose. In the case of Use Case 2, scenario 1a, a remote production was the aim. The intended situation consists of the coverage of a TV production on site, specifically a drama. A real event was chosen, coinciding with a Radio Nacional de España (RNE) programmed event: an adaptation of the classical drama “La vida es Sueño” by the playwright Calderón de la Barca, named “La Radio es Sueño”.

The event lasted around one hour, with up to six speakers standing in front of the audience and reading out the script; this format is called radio drama. The audience seated in front numbered around 130 people. The location was the “Centro cultural el Matadero”, a compound of performance and arts venues run by the City of Madrid. Specifically, the show ran in the Cineteca space (cf. Figure 43).

The deployment was covered by three TV cameras: two Sony PMW-500 (cf. Figure 42) and one Sony PDW-700. Two of them recorded to memory cards, the other to XDCAM magneto-optical disks (the technology on which Blu-ray is based). Two were positioned at the back (cf. Figure 43, C1 and C2) and one in the middle on the right side (cf. Figure 43, C3) of the Cineteca space. The audio from the speakers, music and effects was provided by RNE and embedded as digital audio in the video signal (cf. Figure 43, AUDIO).

Figure 42: PMW-500 Sony camcorder

The executive production was handled jointly by RNE and TVE producers.

There were two TV directors, Mr. Jorge Alonso and Mr. Iván López Olmos, from the Toledo TV Center and the Torrespaña News HQ respectively. Three TV camera operators came from Toledo, Valencia, and Barcelona; one of them took responsibility for the technical deployment. Finally, auxiliary staff were called in for rigging and physical deployment.

The location was chosen because it belongs to the coverage area of Telefónica’s Peñuelas central office (C.O.), around 3 km away. Therefore, XGS-PON 10 Gbps passive optical fiber was available for this proof of concept. Each camera signal takes 1.5 Gbps of SMPTE ST 2110 IP traffic; all in all, around 4.5 Gbps was steadily reached, so no option other than 10 Gbps was feasible.


Figure 43: Floor layout of Cineteca’s Matadero Cultural Space with camera and audio positions

On the other hand, the other edge, the Torrespaña HQ edge, is where the “Remote Production” took place. For that, a multipurpose technical room was set up with the servers and a ~70 inch screen serving as a multi-camera viewer monitor. Audio was fed into a pair of self-amplified loudspeakers. The control surface for the video-switching chores was a regular QWERTY keyboard.

Figure 44: Torrespaña HQ Edge – Technical room on Gallery

The video format chosen was HD-SDI 720p50 (1280x720 pixels, 50 complete frames per second, 8-bit depth per color channel). The three TV cameras were synchronized with each other by a tri-level sync (TriSync) signal, with one of them chosen as master and the others slaved in a daisy-chain configuration. This digital video signal was converted to the SMPTE ST 2110 IP protocol by two double-input HD-SDI to SMPTE ST 2110 IP Embrionix SFP gateways (cf. Figure 45) attached to a Cisco Catalyst 9300 switch, in order to aggregate the traffic towards a 10 Gbps copper port. This port was linked to Telefónica’s router, which converted the signal to XGS-PON fiber towards Peñuelas. It is worth noting that, despite the Nevion solution being used in the previous tests, it was necessary to opt for Embrionix here to have a more complete implementation of the SMPTE ST 2110 standard.

Figure 45: The two double-input HD-SDI to SMPTE ST 2110 IP Embrionix SFP gateways attached to a Cisco Catalyst 9300 switch


Unfortunately, Telefónica’s point of presence had to be located in the building opposite the Cineteca, around 12 meters in front of it (cf. Figure 46). This forced the deployment of a cable protector across the street to connect both buildings.

Figure 46: The event location Matadero in Madrid. On the right the Cineteca building and on the left the building with Telefónica’s point of presence

2nd December was the day of rigging and powering. All the technical and human resources were at Matadero installing and testing the whole system, feeding the signal to the edge and, finally, managing the monitoring signal and doing the TV direction from the Torrespaña HQ edge.

All the tests were satisfactory, especially regarding stability. The weakest link perceived was Telefónica’s SFP from the Cisco Catalyst 9300 switch towards Telefónica’s router (Alpha). Unfortunately, the LAN-side host port of this router was 10 Gbps copper instead of single-mode or multi-mode fiber optic. This is uncommon at these bitrates and, as a result, some instabilities appeared.

All in all, the rehearsal day showed that the system was fully up and running and responding in near real time.

3rd December was the day of the event. There were two complete rehearsals with the technical and artistic teams, the first at 16:00 h and the second at 17:00 h.

Finally, the event took place at 20:00 h; the hall was packed.

The TV Directors attended the rehearsal at the Cineteca. At the same time, the connection with RTVE Interactivos was being finalized for feeding the output of the Matadero event to the RTVE website, while UPM and Telefónica staff were checking and monitoring the status of both edges.

Half an hour before the start of the event, several people moved to Torrespaña to manage or attend the event: both TV Directors and people from IRT, UPM and RTVE.

This was the very first time the TV Directors could feel the tight loop (low latency) between the orders they gave and the reaction of the TV camera operators. The event started smoothly.

Around 10 minutes in, the system shut down for a few seconds; image and sound went to blank/silence. Everyone knew that the SFP was not stable enough for this. We were prepared for such a condition and managed to bring the system back up in a short time.


Apart from that, the outcome and the working atmosphere were as usual, as if the production had been a conventional one.

Feedback on the “Remote Production” of the radio show La Radio es Sueño (CS1.3)

To gather the experiences and impressions of the TV Directors during the event, five questions regarding such remote productions were prepared for the TV Directors and technical staff. A few days after the event, the questionnaire was submitted to Mr. Jorge Alonso, who gave us his assessment. The questions (Q) and his answers (A) follow (translated to English):

Q: Can you explain the benefits of a remote production for the broadcasting industry?

A: Remote production of broadcast audio-visual events has an important advantage: cost reduction. It is no longer necessary to move a large part of the human and technical resources to the location of the event, and this has direct consequences. The offer presented to the audience can be expanded; these events are much faster to prepare because no logistics are needed to bring trucks, nor do we depend on the space available for parking and powering. Ubiquity is also achieved: we are no longer bound to a single geographic point but can connect different points anywhere in the world.

Q: And for the general public?

A: The general public will not notice the difference between a conventional production and a remote production, and I believe the improvement for the audience is that it will have a broader offer.

Q: How can this solution facilitate your work?

A: Remote production means that the gallery (TV direction control room) no longer needs to be at the place where the event is made, which makes saving time much easier. It is no longer necessary to travel there, and most of the time we need one day to get there and another to return. This time saving will greatly improve productivity, and we can devote more effort to the result.

Q: What were your expectations before the proof of concept?

A: We are still at the beginning of the revolution that the arrival of 5G will mean, not only in the audio-visual field but also in everyday life, and we approached this proof of concept as a discovery to open new paths. Our main objective and biggest challenge was to shorten the latency between the orders and the response of the machines, and I think that goal was met; we are very proud to have had that almost direct communication.


Q: What impressed you most about this solution?

A: What 5G technology allows is to reduce latency to practically zero. This means that the time that elapses between giving an order to a machine and receiving the response is almost immediate. We are not accustomed to such short latencies when working over long distances, and it is surprising that within a few minutes of managing an event in remote production you forget that the other side is many kilometers away, or even in a different country; the distance simply disappears.

4.2.4 “Remote Production” — Multi-Instance-Scenario (4th Cornerstone, CS1.4)

In the last workflow implementation of the 5G-MEDIA “Remote Production” scenario, the perspective changed from a production team’s focus on a single-instance production scenario to a broadcaster’s focus on multi-instance scenarios in one edge.

Technical Description of the Scenario/Demo Workflow

The “Remote Production” multi-instance scenario 1c was set up in Telefónica’s testbed OnLife, in one edge in Boecillo. The workflow comprised two productions, “Production 1” and “Production 2”. In “Production 1”, the VNFs “Media Process Engine” (MPE) (cf. Section 3.1.2 of this document) and “vCompression Engine” (vCE) (cf. Section 3.1.3 of this document) and the cognitive service “Speech-2-Text Engine” (vSpeech) (cf. Section 3.1.1.1 of this document) were used. In “Production 2”, the VNFs “Media Process Engine” (MPE) and “vCompression Engine” (vCE) were used. The other VNFs shown in Figure 47, “Traffic Manager” and “Background traffic generator”, were used by both productions, which shared the resources in the edge.

Since the previous demos had proven that the system is able to deal with our high input video data rates of 1.5 Gbps per stream, GStreamer45 was used as video player for this demo.

45 Source: https://gstreamer.freedesktop.org/


Figure 47: Use Case 2, “Remote Production” Multi-Instance Scenario

The video players in both productions send uncompressed HD video signals in 1080i/25 through the NFVI/OnLife network directly to the Media Process Engine (MPE) (for a detailed description of OnLife cf. Deliverables D2.3 [17] and D2.4 [13]). The MPE itself plays out another video file which is also fed into the MPE’s video switching component. The switching/video mixing between the two signals is performed and the final broadcast stream is produced. This is done on the basis of preview streams that are sent to the studio (RTVE Torrespaña), where the director remotely controls the MPE based on those streams. The final output signal of the MPE is also sent to the director as a preview. For transmission, the previews are compressed to low-resolution, low-latency video streams in the video switching core of the MPE. It is worth highlighting that the lag between the real-time acquisition and the monitoring of this signal by the TV Director has to be as short as possible, i.e. no more than 20 frames. This is because the loop between the Director’s command and the camera operator’s reaction has to be as tight in time as possible, so the delay of this loop has to be minimized. This is one of the paramount constraints for the success of this project as a practical service.

From the MPE, the final broadcast stream is then fed into the vCompression Engine (a VNF extended for use in the 5G-MEDIA project), which compresses the audio/video streams for WAN/Internet transfer at a much lower bitrate. In “Production 1”, the compressed output stream of vCompression EngineP1 is routed to the Speech-to-Text Engine (another VNF developed within the 5G-MEDIA project), where the audio is analyzed and text is extracted. This text is then added as subtitles to the video stream. The final signal with subtitles can be accessed via a browser by the broadcaster for further in-house use as a contribution signal, or it can be offered to the public as an online service. In “Production 2”, the compressed output stream of vCompression EngineP2 is the final signal, which can likewise be accessed via a browser by the broadcaster for in-house use as a contribution signal or offered to the public as an online service.


The collected metrics were displayed for the broadcaster in the Multi-Production Dashboard (cf. Figure 48). With the change of viewpoint to the overall broadcaster perspective, it was necessary to adapt the dashboard from CS1.2 to this multi-instance production scenario, so the displayed metrics were reduced to the following information: the box at the top left shows the “active” productions with their IDs; the box at the top middle shows the “Quality Profile” of “Production 1” and the box at the top right that of “Production 2”; the gauge box at the bottom left shows the actual bitrate and the gauge box at the bottom right the targeted bitrate.

Figure 48: Overall overview of the vCE-Dashboard of a multi-instance production

The work done on this multi-instance production scenario will be presented in the End Review of the 5G-MEDIA project.

4.2.5 Description of the testbed updates and the configuration of the VNFs used in the implemented workflows

Telefónica testbed OnLife

During the development of CS1.2, the internal connectivity between the different virtual functions in Telefónica OnLife was moved to IPv6; the first version had been developed on IPv4. The connection between hosts following the architecture defined in Deliverable D6.1 [1] must be IPv6, because it allows virtual machines that need to interact to be instantiated on different hosts. Thus, the allocated computing resources are shared between physical nodes.


As explained in Deliverable D6.1 [1], all VNFs run on their own virtual machine (VM) in Telefónica OnLife, with the VM deployment and resource allocation performed by OpenNebula and the connectivity between them managed by the ONOS SDN Controller at the Clos fabric. It is worth highlighting that the interfaces between OnLife and OSM were implemented within 5G-MEDIA.

Figure 49: Use Case 2 – 5G-MEDIA UC2 VMs deployed in OnLife


The VNFs are configured with the following parameters/properties in OnLife (cf. Table 10):

Table 10: Use Case 2, Multi-Instance scenario – VNF parameters/properties in OnLife

VNF                  vCPUs   Memory (GB)   Storage (GB)   Number of VMs
MPE                  8       8             10             2
vCompression         24      8             20             2
Speech-to-Text       2       4             10             1
Demonstrator         2       2             2              1
RTMP                 2       2             10             1
Publisher            1       2             10             1
QoE                  8       16            20             1
Traffic Manager      2       2             11             1
Background Traffic   2       2             11             1

Media Process Engine

The Media Process Engine is a media-specific function which acts as a pure video signal switcher, based on the open-source tools Voctomix46, a video mixer software, and GStreamer47, a multimedia framework (for details cf. Deliverable D4.1, Section 7.3.2.3 [16]). Concretely, the MPE VNF is composed of voctocore, the processing part of the Voctomix solution. This processing core is Python-based code that allows switching between different input streams and composing a new video stream by combining an input with a background.

As voctocore is Python-based, the required Python libraries have been installed in the VM of the VNF. Apart from this, FFmpeg is also installed, together with the desired libraries for handling the video in its correct format, in our case the H.264 library. Finally, the GStreamer libraries are also installed.
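To make the director’s remote control of the MPE more concrete, the following is a hedged Python sketch of sending a switching command to voctocore’s text-based TCP control interface. The host name and source name are illustrative assumptions; port 9999 is voctomix’s documented default control port, and the exact command set should be checked against the Voctomix documentation:

    # Hedged sketch: remotely cutting the MPE's program output to another source
    # by sending a text command to voctocore's TCP control interface.
    import socket

    def vocto_cmd(command, host="mpe.example.net", port=9999):
        with socket.create_connection((host, port), timeout=2) as sock:
            sock.sendall((command + "\n").encode())
            return sock.recv(4096).decode().strip()

    # Cut the program signal to input "cam1", as the director would do remotely.
    print(vocto_cmd("set_video_a cam1"))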

vCompression Engine

The vCompression Engine is responsible for the compression/decompression and encoding/decoding of the audio-visual material before and after the WAN transfer, as uncompressed video material requires very high bandwidth that is often not available in today’s networks or much too expensive to use. It relies on the open-source tool FFmpeg and uses the latest and most common video standards (e.g. H.264). Today, mostly hardware en-/decoders are used for this, so a virtual, software-only approach facilitates flexibility and agility in remote production. For detailed information on the vCompression Engine please refer to Deliverable D4.1, Section 7.3.2.2 [16] and D4.2 [3].
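As an illustration of the kind of FFmpeg invocation the vCE wraps, here is a hedged sketch; the input/output endpoints, bitrates and tuning flags are assumptions chosen for low latency, not the project’s actual configuration:

    # Hedged sketch of one vCE compression leg: ingest a contribution stream,
    # encode it to H.264 at a WAN-friendly bitrate and push it out as MPEG-TS.
    import subprocess

    cmd = [
        "ffmpeg",
        "-i", "udp://0.0.0.0:5004",       # high-bitrate contribution input (assumed)
        "-c:v", "libx264",
        "-preset", "ultrafast",            # trade compression efficiency for speed
        "-tune", "zerolatency",
        "-b:v", "8M",
        "-c:a", "aac", "-b:a", "192k",
        "-f", "mpegts",
        "udp://studio.example.net:5006",   # WAN leg towards the broadcaster (assumed)
    ]
    subprocess.run(cmd, check=True)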

46 Source: https://github.com/voc/voctomix

47 Source: https://gstreamer.freedesktop.org/


Cognitive Service: Speech-to-text Engine

The Cognitive Services are a collection of additional functions and services that extract supplemental information from the A/V material using state-of-the-art machine learning algorithms, in order to make this information available to a wide variety of value-added services.

4.3 Deployment of the “Mobile Contribution” Scenario

The scenarios in the “Mobile Contribution” branch deal with the streaming of live events via smartphones or tablets and the enrichment of the streamed content. Journalists or spectators (producing user-generated content) use a smartphone app (5G-MEDIA App) to connect to the 5G-MEDIA network for live streaming video and/or audio to the broadcasting center. This content can be enriched during streaming by the different Cognitive Services provided by the 5G-MEDIA Platform. The scenario extends the Mobile Unit examples 3a and 3b (cf. Table 1: Overview of the different mobile units) with the enrichment of the streamed content, providing both an additional enhancement of the stream and time savings for the broadcaster. Furthermore, the “Mobile Contribution” scenario was extended to a Multi-Use Case scenario that looks into the behavior of the CNO’s functionalities when resources are shared across two use cases. These different aspects were realized as described in the following sections in two cornerstones: 1) “Mobile Contribution” scenario — General Demo Workflow (CS2.1), and 2) “Mobile Contribution” scenario — Multi-Use Case scenario (CS2.2).

4.3.1 “Mobile Contribution” scenario — General Demo Workflow (1st Cornerstone, CS2.1)

Technical Description of the implemented “Mobile Contribution” scenario 2.b

The scenario was realized in the “DEMOKRITOS” testbed of the National Center for Scientific Research (NCSRD) and in Telefónica’s (TID) testbed OnLife. The VMs deployed to provide the “Mobile Contribution” scenario in Telefónica’s testbed OnLife are listed in Table 11:

Table 11: Overview of the deployed VMs in Telefónica’s testbed OnLife for the “Mobile Contribution” scenario

VNF                    vCPUs   Memory (GB)   Storage (GB)   Number of VMs
OpenWhisk Controller   2       4             35             1
OpenWhisk Worker       16      16            40             2
OpenWhisk Master       2       4             20             1

In the workflow, the media-specific Cognitive Service VNFs “Speech-2-Text Engine” and “Image/Face Recognition” were updated/reworked and deployed.


Technical Description of the Scenario/Demo Workflow

In the background, the App connects to the 5G-MEDIA network for live streaming video and/or audio to the broadcasting center. The signal is compressed and encoded on the smartphone for transmission. The stream then passes through a Cognitive Service function, triggered using FaaS, where it is enriched with additional information. In our case, a face recognition service is applied that tags and identifies people in the video and provides the broadcaster with supplemental information and metadata from a database for further value-added services to enhance the viewing experience. At the network edge near the receiver, a vCompression Engine decodes and decompresses the stream for further use and processing at the broadcaster (cf. Deliverable D2.2, Sections 4.1.3.2 and 4.4.1.1 [2]).

Figure 50: Use Case 2.b, mobile contribution: high-level description

Figure 50 depicts the high-level architecture of the use case. Suppose a mobile journalist working for RTVE witnesses a newsworthy event; in our example, a spontaneous demonstration taking place in Madrid in the vicinity of Puerta del Sol. The mobile journalist wishes to create a mobile contribution for the news program of her broadcaster. To this end, she uses a mobile application on her smartphone. Each mobile journalist account is associated with several 5G edges in different geographic locations, where the mobile journalist can set up the infrastructure for mobile production on demand.

The VNFs comprising the mobile contribution service are based on FaaS technology (cf. Deliverable D3.2 [14]). The cognitive functions vDetection and vSpeech and the supportive function vSplitter are containers derived from the 5G-MEDIA base image (cf. Deliverable D3.2 [14]) for serverless VNFs. These functions are invoked dynamically, based on the mobile journalist’s selection of cognitive services, when she requests to establish a mobile contribution session.


The application is smart in that it knows how to select the edge closest geographically to the current GPS reading of the journalist’s mobile phone. In Figure 50, an edge in the Telefónica region in Madrid was selected, as shown on the attached maps.
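A minimal sketch of this geographic pre-selection (the edge names follow the document, while their coordinates here are rough illustrative values, not measured ones):

    # Rank eligible edges by great-circle distance to the phone's GPS fix.
    from math import asin, cos, radians, sin, sqrt

    EDGES = {"tid": (40.39, -3.70), "ncsrd": (37.99, 23.82), "ote": (37.98, 23.73)}

    def haversine_km(a, b):
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(h))

    def rank_edges(gps):
        # Descending order of preference = ascending distance (cf. Message 8 below).
        return sorted(EDGES, key=lambda e: haversine_km(gps, EDGES[e]))

    print(rank_edges((40.4169, -3.7035)))  # near Puerta del Sol -> ['tid', ...]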

The application then contacts the Apache OpenWhisk Web API Gateway (OW GW), a pre-installed extension of the SVP to the edge that assists applications with session establishment, and asks to establish a mobile contribution session supported by the face recognition (vDetection) and caption production (vSpeech) cognitive services; the metadata and the media stream itself should be forwarded to the remote broadcaster’s site.

The session initiation request triggers serverless orchestration flows that, in collaboration with the SVP OSM, invoke and configure the serverless functions for the session. Moreover, these flows entail interaction with the CNO component. The basic application orchestration flow is described below, and the interaction with the CNO is further elaborated in the subsection “Integration with CNO”.

Orchestration Workflows

Figure 51: UC2.b Orchestration Flow

Figure 51 depicts an interaction sequence diagram for the basic orchestration flow of the “Mobile Contribution” scenario of UC2. The entities and message exchanges shown in red belong to the application components; the blue ones belong to the 5G-MEDIA SVP.


There are three static application components:

1. A smartphone client (frontend): an application offering a convenient GUI for capturing media streams. It may have multiple functionalities, but for the sake of this scenario the application offers controls that allow the user to (a) request edge cloud selection to establish the most efficient connection to the SVP, (b) start a remote mobile production with selected supporting cognitive services, and (c) stop the production session and request contribution finalization.

2. Edge Selector Web Service: a Web service that keeps a database of the edges this mobile journalist account is eligible to use. Note that this Web service can be packaged in several ways and hosted anywhere. One option is to package it as a 5G-MEDIA network service that runs in the SVP and serves multiple mobile journalism sessions of the journalist. Another option is to package it as a 5G-MEDIA network service shared among multiple sessions of multiple journalists working for the same broadcaster. Yet another option is to run this service as an application component on the smartphone itself.

3. Safe-Local endpoint: the endpoint stores the resulting contributed media stream and its metadata, such as captions and face recognition features (if any of these cognitive functions were requested by the journalist), for later consumption by the broadcaster. In our prototype, the Safe-Local endpoint is a Linux Ubuntu 18.04 VM (it could also be a bare-metal server) with 2 vCPUs, 8 GB RAM and 100 GB of disk, installed with Docker CE. It runs a dockerized version of nginx with the nginx-rtmp-module48 plugin, capable of storing and serving RTMP streams (via ports 1935 and 8080). The Safe-Local endpoint is pre-installed at the edge.

There are three dynamic application components implemented as FaaS functions deriving from the 5G-MEDIA base FaaS VNF Docker image:

1. vSplitter: multiplies the stream to vSpeech, vDetection, and Safe-Local;
2. vSpeech: produces caption metadata;
3. vDetection: produces face recognition metadata.

The network service descriptor (NSD) of the mobile journalism network service contains the following VNFDs:

1. Bootstrapper: is a special function that sets up serverless orchestration end point and the underlying infrastructure for the per-session orchestration based on Argo Workflows and Argo Events as described in Deliverable D3.4, Section 4 [15];

2. vSplitter;
3. vSpeech_CPU: produces caption metadata and is supposed to run on CPU;
4. vSpeech_GPU: produces caption metadata and is supposed to run on GPU;
5. vDetect_CPU: produces face recognition metadata and is supposed to run on CPU;
6. vDetect_GPU: produces face recognition metadata and is supposed to run on GPU.

48 Source: https://github.com/arut/nginx-rtmp-module


The CPU vs GPU differentiation at the VNFD level is required because of the way the Apache OpenWhisk extensions that we developed in 5G-MEDIA handle GPU-based functions. Our implementation is based on annotations49, a standard OpenWhisk mechanism for adding metadata to functions. These annotations are used when the FaaS VIM offloads a VNF to be executed as a pod in K8s. The offloading service (cf. Deliverable D3.2 [14]) uses the annotations to label pods in such a way that the K8s scheduler places them on CPU or GPU nodes as appropriate.
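For illustration, a hedged sketch of registering a GPU-flavoured action with such an annotation through the OpenWhisk REST API; the annotation key ("placement"), image name, host and credentials are assumptions, while the annotations mechanism itself is standard OpenWhisk:

    # Register a blackbox (Docker-based) action carrying a placement annotation
    # that an offloading service could map to a K8s node selector.
    import requests

    URL = "https://openwhisk.example.net/api/v1/namespaces/_/actions/vdetect_gpu"
    AUTH = ("user", "pass")  # placeholder credentials

    body = {
        "exec": {"kind": "blackbox", "image": "example/vdetect:gpu"},  # assumed image
        "annotations": [{"key": "placement", "value": "gpu"}],
    }
    resp = requests.put(URL, json=body, params={"overwrite": "true"}, auth=AUTH)
    resp.raise_for_status()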

The flow starts with the mobile journalist selecting a combination of cognitive functions (vSpeech, vDetect, vSpeech+vDetect, None) and a remote endpoint (Safe-Local, Safe-Remote, Live-Remote).

Then the journalist uses an application GUI control to request edge selection for this session, passing the session UUID, the GPS coordinates, the combination of cognitive functions and the endpoint. The frontend sends this edge selection request to the Edge Selector Web service, which can either operate in standalone mode or be supported by the CNO to make more optimal edge-selection decisions. In this flow, we assume that the Edge Selector requires support from the CNO. The Edge Selector sends an edge selection message to the CNO via the SVP Kafka bus. The CNO makes an optimisation decision and publishes an edge selection (i.e., the PoP selection) on the SVP Kafka bus, which might differ from the geographically closest edge. The reason might be a global optimisation performed by the CNO, or the availability of special hardware resources, such as GPUs, in specific edges. In the same response message, the CNO publishes the resource allocation authorized for this session; for example, it might include how many GPUs this mobile journalism session can use in the selected PoP.

The Edge Selector component picks up the CNO response from the Kafka bus and passes it to the application, which polls the Edge Selector Web service for the edge selection decision.

With this information, the application uses the URL of the OW GW at the selected PoP to issue a request to establish a streaming session. The OW GW invokes a special FaaS action that requests instantiation of the mobile contribution service via the SVP OSM. The latter instantiates the service in the regular way (including the Bootstrapper) via the FaaS VIM. The Bootstrapper sets up the serverless orchestration endpoint (the Argo Gateway, as described in Deliverables D3.4 [15] and D2.4 [13]) that triggers the S-NFVO for this service (i.e., the Argo Sensor, as described in Deliverables D3.4 [15] and D2.4 [13]), and the latter executes the orchestration workflow that matches the journalist’s selection of functionality and endpoint and the resource allocation made by the CNO for this session.

After the requested VNFs are invoked by the serverless orchestrator (in the sequence diagram we show only one cognitive function to save space), the IP:PORT of the vSplitter, which serves as ingress, becomes available via OSM. The mechanism that ensures this is as follows: OSM has a periodic VNFR refresh task that requests metadata about each VNF from the VIM that started it. The FaaS VIM supports the same functionality. When metadata about a serverless VNF is requested by the OSM VNFR refresh task, the FaaS VIM executes a special

49 Source: https://medium.com/openwhisk/using-gpus-with-apache-openwhisk-c6773efcccfb


serverless action that discovers the metadata, such as IP:PORT, of the VNF via the K8s API (as described in Deliverable D3.2 [14]).
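The discovery step can be pictured with the following hedged sketch based on the official Kubernetes Python client; the namespace and label selector are illustrative assumptions:

    # Look up the pod(s) backing a serverless VNF and report their IP:PORT,
    # roughly what the FaaS VIM's discovery action does via the K8s API.
    from kubernetes import client, config

    config.load_incluster_config()  # the discovery action runs inside the cluster
    v1 = client.CoreV1Api()

    pods = v1.list_namespaced_pod("5gmedia", label_selector="vnf=vsplitter")
    for pod in pods.items:
        ports = pod.spec.containers[0].ports or []
        port = ports[0].container_port if ports else "?"
        print(f"{pod.metadata.name}: {pod.status.pod_ip}:{port}")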

The application polls OSM for the IP:PORT of the ingress (vSplitter) and, once the VNFR becomes populated as described above, the application is ready for production. The mobile journalist gets an indication of this and requests media streaming via the application control. The application starts streaming the media to the vSplitter.

At some point, when the journalist decides to stop capturing, she stops the session, which automatically triggers an application request to the OW GW to finalize the contribution. The latter executes a serverless function that requests closing the media stream at the Safe-Local endpoint and flushing it to storage. A URL is produced for the finalized contribution and returned to the application, which polls the OW GW for it.


Integration with CNO

Figure 52: Integration of UC2.b Orchestration Flow with CNO

In this subsection we provide additional details about the integration of the UC2.b “Mobile Contribution” flow with the CNO. First, in 5G-MEDIA there are two types of CNO: the Service-Specific CNO (SS-CNO) and the Overarching CNO (O-CNO). As explained in Deliverable D2.4 [13], the O-CNO is part of the SVP and is concerned with global optimisation. The SS-CNO is specific to the service it optimizes and takes a narrower approach by greedily optimizing instances of


the same service. The long-term Planner CNO is a component that inspects historic data on how services utilize resources and how this impacts the QoE of their users. The Planner CNO calculates quotas of resources that every SS-CNO can arbitrate among the sessions of the service it optimizes. When the resources within the quota are insufficient, the SS-CNO requests resources from the O-CNO. Communication between SS-CNO and O-CNO occurs over the SVP Kafka bus, and the format of the exchanged messages is standardized. For instance, a message from the SS-CNO UC2_MC to the O-CNO asking permission to use one GPU in the edge “ncsrd” looks as follows:

Message 6: Permission enquiry from SS-CNO UC2_MC to O-CNO to use one GPU

{
    "sender": "SS-CNO UC2_MC",
    "receiver": "O-CNO",
    "session_uuid": "123abc",
    "resource": {
        "GPU": 1,
        "CPU": 0
    },
    "nfvi": "ncsrd"
}

If the O-CNO agrees to grant a GPU to the SS-CNO UC2_MC, it replies with a message as follows:

Message 7: Reply message with a confirmation from the O-CNO to the SS-CNO UC2_MC to grant a GPU

{
    "sender": "O-CNO",
    "receiver": "SS-CNO UC2_MC",
    "session_uuid": "123abc",
    "resource": {
        "GPU": 1,
        "CPU": 0
    },
    "nfvi": "ncsrd"
}

The O-CNO sets “GPU”: 0 in the reply message if it decides not to grant a GPU to the SS-CNO UC2_MC.

It is important to stress that the SS-CNO is a component developed by the service developer and deployed as a VNF in the SVP. In UC2.b, the SS-CNO optimizes multiple sessions of mobile contribution. Figure 52 provides more details on how the orchestration flow of UC2.b is integrated with the SS-CNO and the O-CNO. In our current implementation, the SS-CNO is packaged as a separate service that is instantiated via OSM in the regular way. An important design principle is that the Edge Selector should know neither the physical location nor the address of the SS-CNO, and the SS-CNO should know neither the physical location nor the address of the O-CNO. These components communicate via the SVP Kafka bus using topics named by convention.

Therefore, it is also not important in which order the SS-CNO service and the rest of the VNFs of the service are started. As Figure 52 shows, in Step 3 the Edge Selector publishes a message for the SS-CNO on the SVP Kafka bus, on a predefined topic named ss-cno-mc, requesting


optimisation of the edge selection and of the resource allocation for the selected cognitive functions. The Edge Selector provides a list of eligible edges in descending order of preference, with the one closest to the journalist’s current location being most preferable. The message content is shown in Message 8.

Message 8: Edge Selector to SS-CNO Message

{
    "sender": "edge-selector",
    "receiver": "SS-CNO-UC2-MC",
    "session_uuid": "123abc",
    "payload": {
        # possible function values:
        # 'vspeech', 'vspeech_vdetection', 'vdetection', 'none'
        # none - means just the
        "function": "vspeech_vdetection",
        # possible mode values: ['safe-remote', 'live-remote', 'safe-local']
        "mode": "safe-local",
        # edges in descending order of geolocation preference;
        # an empty list is considered a wild card
        "nfvi_uuid_list": [
            "tid",
            "ncsrd",
            "ote"
        ]
    }
}

The SS-CNO responds with a message shown in Message 9.

The response shown in Message 9 tells the Edge Selector to use the PoP at NCSRD and authorizes the allocation of two GPUs for the two functions. The serverless orchestrator will therefore invoke the GPU-annotated VNFs for each of the required cognitive functions.


Message 9: SS-CNO Response Message to Edge Selector

{
    "sender": "SS-CNO-UC2-MC",
    "receiver": "edge-selector",
    "session_uuid": "123abc",
    "resource": {
        "GPU": ["vdetection", "vspeech"],
        "CPU": 0,
        "nfvi_uuid": "ncsrd"
    }
}

If the SS-CNO does not respond within a preconfigured time-out, the Edge Selector returns the first edge on the list (i.e., the geographically closest one) to the application. Note that the next mobile journalism session is independent of this one; if the SS-CNO responds for that session, its response will be used to establish the session at the selected PoP and to allocate resources within this PoP according to the SS-CNO/O-CNO prescription.
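The fallback rule can be sketched as follows (a hedged illustration using the kafka-python client; the broker address, the response topic name and the timeout value are assumptions, while the request topic ss-cno-mc is taken from the text):

    # Publish the edge-selection request, wait a bounded time for the SS-CNO
    # response, and fall back to the geographically closest edge on timeout.
    import json
    from kafka import KafkaConsumer, KafkaProducer

    BROKER = "kafka.svp.example:9092"  # placeholder SVP bus address

    def select_edge(request, nfvi_list, timeout_ms=3000):
        producer = KafkaProducer(bootstrap_servers=BROKER,
                                 value_serializer=lambda v: json.dumps(v).encode())
        producer.send("ss-cno-mc", request)
        producer.flush()

        consumer = KafkaConsumer("ss-cno-mc-response",  # assumed response topic
                                 bootstrap_servers=BROKER,
                                 value_deserializer=json.loads)
        records = consumer.poll(timeout_ms=timeout_ms)
        for batch in records.values():
            for msg in batch:
                if msg.value.get("session_uuid") == request["session_uuid"]:
                    return msg.value["resource"]["nfvi_uuid"]
        return nfvi_list[0]  # no answer in time: the closest edge wins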

When the SS-CNO UC2_MC receives a request from the Edge Selector, it needs to make a decision, for instance whether to use GPU or CPU and at which edge location the user should start the session. The SS-CNO UC2_MC can decide directly if the reserved resources for UC2 are available; otherwise it consults the O-CNO to ask for the use of the resources shared between multiple services (UCs).

In brief, the SS-CNO UC2_MC needs to compute the end-to-end latency (cf. Figure 53) between the mobile journalist and the broadcaster, which is the sum of the following latencies (written out as a formula after this list):

• Network latency from the mobile journalist’s app to the edge (e.g. TID, NCSRD or OTE)

• Processing latency at the edge

• Network latency from the edge to the broadcaster (e.g. IRT or RTVE)
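Written out, with the three terms as listed above:

    \[
    L_{\text{e2e}} = L_{\text{net}}^{\text{journalist}\rightarrow\text{edge}} + L_{\text{proc}}^{\text{edge}} + L_{\text{net}}^{\text{edge}\rightarrow\text{broadcaster}}
    \]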

Figure 53: End-to-end latency in mobile contribution


After calculating the end-to-end latency, ideally a utility-based algorithm should be deployed at the SS-CNO UC2_MC to make a decision trading off latency against cost/revenue. In general, the algorithm can try to minimize the latency within a cost budget constraint (e.g. the cost of deploying a service using a GPU in TID) or minimize the cost under an upper-bound constraint on latency. The decision on which objective function and constraints to use is service specific. For instance, the algorithm tends to minimize latency in a live-event case to reduce lag/jitter effects; for a recorded case, on the other hand, latency is not a critical requirement, and minimizing cost is preferred in the objective function.
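A hedged sketch of such a decision rule (the latency and cost numbers are invented placeholders; only the rule "minimize latency within a cost budget" is taken from the text):

    # Among candidate (edge, processor) deployments, pick the lowest-latency
    # option whose cost fits the budget; report failure if none fits.
    def choose_deployment(options, budget):
        """options: dicts with 'edge', 'proc', 'latency_ms', 'cost'."""
        feasible = [o for o in options if o["cost"] <= budget]
        if not feasible:
            return None  # the SS-CNO would then ask the O-CNO for shared resources
        return min(feasible, key=lambda o: o["latency_ms"])

    options = [
        {"edge": "tid",   "proc": "GPU", "latency_ms": 180, "cost": 8},
        {"edge": "tid",   "proc": "CPU", "latency_ms": 260, "cost": 3},
        {"edge": "ncsrd", "proc": "GPU", "latency_ms": 210, "cost": 5},
    ]
    print(choose_deployment(options, budget=6))  # -> the ncsrd/GPU option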

Processing time at the edge depends on which cognitive service is running (e.g. vDetection, vSpeech), on whether it is a live or a recorded case, and on whether a GPU or a CPU is used. For example, based on our measurements in the NCSRD testbed, the vDetection service is always faster on GPU, in both the live and the recorded case. vSpeech is also faster on GPU in the live case; in the recorded case, however, there is little difference in processing time between GPU and CPU. Depending on the locations of the mobile journalists, the edges and the broadcasters, the end-to-end latency can be dominated either by the network latency (mobile journalist to edge and edge to broadcaster) or by the processing latency at the edge.

In our current implementation, the objective of the SS-CNO UC2_MC is to minimize the end-to-end latency. The SS-CNO UC2_MC therefore decides whether to use a GPU or a CPU and at which edge location to start the session so as to minimize this latency. However, if the latency gain from using a GPU is small, the SS-CNO UC2_MC may decide to use a CPU instead. For example, in the recorded case the end-to-end latency of UC2 might be reduced by 100 ms by using a GPU, but this might require preempting GPUs currently used by UC1; in that case it is better for UC2 to use a CPU, since the GPU gains little latency while potentially disrupting the system by preempting UC1's GPUs. In future work, different levels of disruption could be defined and added to the utility-based objective function for better decisions. In addition, the SS-CNO UC2_MC can make a poor decision because of inaccurate measurements of network or processing latency, so there should be a mechanism for the SS-CNO UC2_MC to obtain QoE feedback and adjust its decision accordingly.
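The GPU-vs-CPU trade-off described above can be sketched as follows; the disruption penalty and the latency figures are assumed values for illustration only, not part of the implemented algorithm.

# Use the GPU only when its latency gain over the CPU outweighs a
# disruption penalty charged if the GPU must be preempted from another
# use case. Penalty and latencies below are assumed values.
def pick_processor(lat_gpu_ms, lat_cpu_ms, gpu_needs_preemption,
                   disruption_penalty_ms=500):
    gain = lat_cpu_ms - lat_gpu_ms
    penalty = disruption_penalty_ms if gpu_needs_preemption else 0
    return "GPU" if gain > penalty else "CPU"

# Recorded case: only ~100 ms gain and the GPU is held by UC1 -> CPU.
print(pick_processor(400, 500, gpu_needs_preemption=True))
# Live case: large gain and a free GPU -> GPU.
print(pick_processor(150, 900, gpu_needs_preemption=False))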

4.3.2 “Mobile Contribution” scenario — Multi-Use Case scenario (2nd Cornerstone, CS2.2)

Additionally, the “Mobile Contribution” scenario was tested and the operation of the O-CNO was validated in a Multi-Use Case (Multi-UC) scenario together with UC1. In this scenario the O-CNO is in charge of allocating GPUs between the two use cases (UC1 and UC2 “Mobile Contribution”), which share GPU resources in the same edge.


Multi-Use Case-Scenario, O-CNO Orchestration Flow

Figure 54: Multi-Use Case-Scenario, O-CNO and SS-CNO interaction

Figure 54 shows the interactions of the O-CNO (O-CNO-predictive-optimizer and O-CNO-arbitrator) and the SS-CNO in the Multi-UC scenario. The O-CNO consists of the following components:

• O-CNO (Overarching CNO): comprises two CNO components, the O-CNO-predictive-optimizer and the O-CNO-arbitrator.

• O-CNO-predictive-optimizer: an offline predictive optimizer algorithm, executed periodically (e.g. once per hour, once per day, etc.) or triggered by the O-CNO-arbitrator.

• SS-CNO (Service-Specific CNO): handles new requests and asks the O-CNO-arbitrator for permission to use the shared resources if needed.

• O-CNO-arbitrator: has visibility of the resources shared between use cases and executes an online version of the utility-maximising algorithm to decide who is permitted to use the shared resources.


The O-CNO performs the following processes/steps:

(1) The O-CNO-predictive-optimizer executes the offline predictive optimizer algorithm and notifies the O-CNO-arbitrator about the amount of resources reserved for each use case and the resources shared between use cases.

(2) The O-CNO-arbitrator tells the SS-CNO the amount of reserved resources each use case can use.

(3) The SS-CNO receives a request from a user application. If there is free capacity in the reserved resources, steps (a), (b) and (c) are skipped.

(a) The SS-CNO asks the O-CNO-arbitrator for permission to use the shared resources.

(b) The O-CNO-arbitrator executes the online arbitration algorithm to decide whether or not to allow the user to use the shared resources.

(c) The O-CNO-arbitrator may signal the O-CNO-predictive-optimizer, e.g. to raise an alarm when the utilization of the shared resources exceeds a threshold.

(4) The SS-CNO responds to the user application.
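A minimal sketch of the SS-CNO side of this flow is given below; reserved_free and ask_arbitrator are hypothetical stand-ins for the real resource accounting and the O-CNO-arbitrator API, which this document does not specify.

# Sketch of steps (3), (a), (b) and (4) from the SS-CNO's point of
# view; reserved_free and ask_arbitrator are placeholders for the real
# resource accounting and O-CNO-arbitrator API.
def handle_request(request, reserved_free, ask_arbitrator):
    # Step (3): serve from the use case's reserved pool if possible,
    # skipping steps (a)-(c).
    if reserved_free(request):
        return {"granted": True, "pool": "reserved"}
    # Steps (a) and (b): ask the O-CNO-arbitrator for the shared pool.
    if ask_arbitrator(request):
        return {"granted": True, "pool": "shared"}
    # Step (4): respond to the user application either way.
    return {"granted": False, "pool": None}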

O-CNO-arbitrator algorithms

A simplified version of the overall optimization algorithm (cf. CNO-platform in D3.4 [15]) can be used. In this simplified algorithm the underlying routing part may be omitted to reduce complexity. In addition, the algorithm optimizes only for the new request and possibly a subset of on-going sessions, which helps it react quickly to every new user request. There are two possible online arbitration algorithms:

• Non-preemptive algorithm: each time a new user request arrives, the algorithm considers the availability of shared resources and finds the optimal solution for the new request only; on-going sessions are not changed. This can lead to a suboptimal overall solution.

• Preemptive algorithm: when a new request arrives, the arbitration algorithm considers the new request together with the set of on-going sessions that may be preempted (e.g. those with lower priority). As a result, some on-going sessions can be preempted to release resources for the new request.
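The two modes can be contrasted in a minimal sketch, shown before the worked example below. The single shared GPU pool, the priority values and the session names are assumptions for illustration; the demo lines at the end mirror step 4b of the example that follows.

# Toy arbitrator for one shared GPU pool; priorities, session names
# and the rejection behaviour are assumptions for illustration.
class GpuArbitrator:
    def __init__(self, shared_gpus, preemptive=False):
        self.free = shared_gpus
        self.preemptive = preemptive
        self.sessions = []  # (session_id, priority); higher wins

    def request(self, session_id, priority):
        if self.free > 0:
            self.free -= 1
            self.sessions.append((session_id, priority))
            return "granted"
        if self.preemptive and self.sessions:
            # Preempt the lowest-priority on-going session if the new
            # request outranks it.
            victim = min(self.sessions, key=lambda s: s[1])
            if priority > victim[1]:
                self.sessions.remove(victim)
                self.sessions.append((session_id, priority))
                return "granted (preempted %s)" % victim[0]
        # Non-preemptive mode, or nothing worth preempting.
        return "rejected"

arb = GpuArbitrator(shared_gpus=1, preemptive=True)
print(arb.request("uc1-session", priority=1))     # granted
print(arb.request("uc2-mc-session", priority=2))  # granted (preempted uc1-session)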

Example of the Multi-Use Case (UC1 and UC2.b “Mobile Contribution”) arbitration algorithm:

1. There are two GPUs. The O-CNO-predictive-optimizer reserves one for UC1; the other GPU is shared between the two use cases. UC2 “Mobile Contribution” is assumed to have a higher priority than UC1.

2. A new request from UC1 asks for a GPU; the SS-CNO responds directly (skipping steps (a), (b) and (c) in the sequence diagram), as the reserved GPU is free.

3. Another new request from UC1 asks for a GPU; the SS-CNO asks the O-CNO-arbitrator for permission to use the shared GPU. The O-CNO-arbitrator allows it, as the shared GPU is currently free.

4. A new request from UC2 “Mobile Contribution” asks for a GPU; the SS-CNO asks the O-CNO-arbitrator for permission to use the shared GPU:

a. If the O-CNO-arbitrator uses the non-preemptive algorithm, UC2 “Mobile Contribution” cannot use the shared GPU, as it is currently occupied; UC2 “Mobile Contribution” has to wait until it is free, or the O-CNO-arbitrator rejects the UC2 request for the GPU.

b. If the O-CNO-arbitrator uses the preemptive algorithm, UC1 has to release the GPU for UC2 “Mobile Contribution”.

4.4 Results

Use Case 2 (UC2): “Mobile Contribution, Remote and Smart Production” in the 5G-MEDIA project enables remote production of an event without the need for dedicated infrastructure to be specifically deployed at the event venue. In this context, cameras and audio equipment at the venue are connected via a 5G network to media production applications deployed and orchestrated in the cloud. Another variation of this use case considers the streaming of live events via smartphones or tablets by spectators or journalists. The role of the 5G-MEDIA Platform in this use case is to ensure that the media processing functions are efficiently deployed in cloud infrastructure, enabling the low latency and high throughput required by live streaming and media processing. To ensure successful production and delivery of the produced content over the 5G-MEDIA Platform, the following latency and quality conditions should be met: guaranteed end-to-end user data rate, end-to-end latency, service deployment time, service reliability, virtualization infrastructure/scalability, and quality of service. These conditions were addressed and verified during the various validation tests.

Guaranteed end-to-end user data rate

For a successful contribution the broadcaster/journalist/user requires a guaranteed end-to-end user data rate. This minimum bit rate or connection bandwidth assures the audio-visual transfer/streaming quality and depends on the video format/codec used.

Within UC2 this requirement was met by developing specific production profiles. For every production profile a lower and an upper bandwidth were defined, based on the codec and the category of the content (cf. Table 4: Lookup table for Production Profiles with H.264 and H.265 encoding). These bounds are later used by the CNO as borders when selecting the appropriate compression level for the available bandwidth (cf. Section 3.2.2 of this document; a sketch follows below). The lower border additionally guarantees a minimum data rate.
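A minimal sketch of this corridor-based selection is shown below; the profile entries and numbers are placeholders, not the actual values of Table 4.

# Corridor per (profile, codec): (lower, upper) bandwidth in Mbps.
# These entries are placeholders, not the real Table 4 values.
PROFILES = {
    ("news", "h264"): (8, 20),
    ("news", "h265"): (4, 10),
}

def pick_bitrate(profile, codec, available_mbps):
    lower, upper = PROFILES[(profile, codec)]
    if available_mbps < lower:
        # Below the lower border the minimum data rate cannot be
        # guaranteed for this profile.
        raise RuntimeError("cannot guarantee the minimum data rate")
    # Use what is available, capped at the upper border.
    return min(available_mbps, upper)

print(pick_bitrate("news", "h265", available_mbps=7))   # -> 7
print(pick_bitrate("news", "h265", available_mbps=25))  # -> 10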

End-to-end latency

Another important requirement is the end-to-end latency. Because of the distributed teams, the signal propagation time needs to be as short as possible, i.e. near real-time. This requirement describes the maximum tolerable time a packet needs from the sender to the receiving side, including the infrastructure, the time needed for the uplink, any necessary routing in the infrastructure, and the downlink. End-to-end latency can be split into pure network latency, comprising the latency of the whole network path excluding the end devices on site, and signal-transport latency, comprising the latency of the whole signal path including the processing of the end devices on site. The different workflow components have to meet different latencies: the round-trip time (RTT), or network latency, should be t <= 50 ms, while the one-way signal-transport latency should be t <= 500 ms for audio/video, t <= 100 ms for the intercom (according to ITU-T G.114), and t <= 50 ms for the control traffic.

Within UC2 this requirement was met by the proof-of-concept implementation of the Matadero demo. In the pretest a time difference of around t = 500 ms, i.e. around 12 frames, was measured; during the final remote production this could be decreased to 10 frames.
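As a small illustration, measured latencies can be checked against the budgets listed above; the measured values in this sketch are assumptions, not the pilot figures.

# One-way latency budgets from the requirement above, in milliseconds.
BUDGETS_MS = {
    "network_rtt": 50,   # round-trip time
    "audio_video": 500,  # one-way signal transport
    "intercom": 100,     # ITU-T G.114
    "control": 50,
}

def check_latency(measured_ms):
    """Return pass/fail per component for the measured values given."""
    return {k: measured_ms[k] <= v
            for k, v in BUDGETS_MS.items() if k in measured_ms}

print(check_latency({"audio_video": 480, "intercom": 120}))
# -> {'audio_video': True, 'intercom': False}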

Service deployment time

No less important, the service deployment time has to be met as well, especially from the “Mobile Contribution” scenario point of view. The service deployment time describes the duration required for setting up end-to-end logical network slices characterized by the respective network-level guarantees (such as bandwidth, end-to-end latency, reliability, etc.). It is required for supporting the media services and should be t <= 5 min for news gathering and breaking-news events.

Within UC2 this requirement could not be measured due to time constraints. For the “Remote Production” scenario a measurement was done: there, the service deployment time was 7 minutes.

Service reliability

Because contributed content is normally post-processed inside the broadcaster's facility before distribution, it is important to have the content available in the best possible quality, so a faultless transfer to the broadcaster is targeted. Errors can occur during a transmission; this cannot be prevented, and the error-correction mechanisms applied to handle it increase the end-to-end latency as a side effect. Therefore, service reliability has to be guaranteed. The service reliability describes the maximum tolerable packet loss rate at the application layer within the maximum tolerable end-to-end latency for that application. As errors are not tolerable in a live broadcast production, an error-free/lossless transport of signals is targeted; the maximum packet-loss rate should therefore be < 10⁻¹².

Within UC2 this requirement was out of scope.

Service and Service Level Agreements (SLAs) monitoring

For a broadcaster it is also important to be able to monitor the used services as well as the service level agreements (SLAs). The broadcaster/user should therefore have the ability to define service and network metrics to monitor performance. Service and SLA monitoring should provide full traceability of the used microservice components throughout their lifecycle, even when they are placed on or migrated to nodes administered by different actors. Automatic negotiation and monitoring of specific SLAs between different actors should be provided.

Within UC2 the service monitoring requirement was met by the MAPE services, which enable the collection of metrics. These metrics were displayed for the user in different dashboards, like the vCE Dashboard (cf. Figure 34), the Traffic Manager Dashboard presented by the CNO (cf. Figure 35) or the Multi-Production Dashboard (cf. Figure 48).

Virtualization infrastructure / scalability

Since productions are not always operated in the same setup, and especially since ad-hoc live productions happen spontaneously or at short notice, fast scalability of network resources, bandwidth, and virtual media functions is needed. The virtualization infrastructure should therefore provide scalability.

Within UC2 the scalability of the virtualized infrastructure was verified by the different workflow setups of the “Remote Production” scenarios. These scenarios grew increasingly complex and so required the 5G-MEDIA Platform to scale.

Quality of Service

Due to the high requirements on latency, jitter, errors and bandwidth, media streams always have to be prioritized in the network. The 5G-MEDIA Platform should therefore provide Quality of Service (QoS) mechanisms for the classification and prioritization of audio/video streams.

Within UC2 this requirement was met indirectly by the CNO, which, in combination with the production profiles and the vProbe, prioritized the video in the way the user requested.


5 Conclusions

5.1 Main Achievements

This deliverable demonstrates how Use Case 2 (UC2): “Mobile Contribution, Remote and Smart Production” can be supported by, and benefits from, the media services and tools provided by the 5G-MEDIA Platform. The tests performed during the 5G-MEDIA project show the following achievements.

• Performance overview

With the MAPE service, which collects infrastructure, application/VNF-specific and QoE monitoring data, adapts them to a common format and correlates them with the running service, users gain better insight into the performance of virtual services through metrics that are normally out of reach.

• Use of capacity and cost reduction

The CNO service and QoE service made it possible to take better advantage of the available bandwidth. This leads not only to a better utilization of the bandwidth with respect to quality, but also to a better economic utilization and thus to better billing.

The use of VNFs in the “Remote Production” scenario shows that VNFs lead to a reduction of broadcaster hardware and human resources on site. Personnel shift from the venue to the headquarters, where more people with network and IP knowledge are needed, creating a new cost item. In total, however, costs are reduced by 34.65% (cf. Section 7.1, Sub-Section “Total Costs”).

• Time saving and content enrichment

The Cognitive Services used in the “Mobile Contribution” workflow provide a transcript and metadata along with the video, thus enriching the content and saving time and resources.

The Cognitive Services were implemented in two versions for the “Mobile Contribution” scenario. During the tests with these two implementations we saw that the GPU-based implementation is faster than the CPU-based one. For time-critical workflows, the GPU-based VNF should therefore be used where possible.

The development of single-task VNFs that follow the serverless architecture paradigm, as used in the Cognitive Services of the “Mobile Contribution” use case, demonstrated a time-saving potential as well.

• Platform validation

During the different demos and tests, the “Remote Production” environment managed up to 6 Gbps of constant video traffic in real time with good performance. This test also made it possible to validate the XGS-PON access technology as part of the OnLife Edge platform.


• Flexibility

Flexibility is given to the user by providing the cognitive services both as a VM-based VNF version that works only with CPUs and as a Docker-based FaaS VNF version that works with CPUs or GPUs.

5.2 Lessons learned

All of the “Mobile Contribution, Remote and Smart Production” activities showed that a successful pilot realization requires intensive collaborative work on implementation, integration and testing across all 5G-MEDIA partners involved.

Another lesson learned was that, with the reduction of hardware on site, it is necessary to keep the distance to the edge cloud as short as possible, because of the high data volumes productions have to deal with.

We have seen that, depending on the workflow, it may be necessary to offer VNFs in different versions. This makes the system more flexible for the user, but for 5G-MEDIA it means an increase in complexity.

With the proof of concept of the “Remote Production” scenario in Matadero (Madrid, ES) we showed that a crew without prior experience can handle a remote production with our 5G-MEDIA framework. The live production went smoothly, and the video stream was distributed to RTVE's web viewers. The most important lesson was that the main objective, and the biggest challenge, was achieving low latency. Low latency is needed in the communication between the TV director and the camera operators (preview video to the TV director and commands to the camera operators); this happened in near real time. The VNFs also have to respond to the TV director's orders with low latency; this requirement was met as well, with reactions likewise in near real time.

5.3 Summary

In the 5G-MEDIA project, Use Case 2 (UC2): “Mobile Contribution, Remote and Smart Production” set up a pilot covering several scenarios, intended to validate the capabilities of the platform to flexibly develop, deploy, and optimize media service applications. In this deliverable we described and reported the various validation activities and the outcomes of the tests and pilots that pursued this goal. For the “Remote Production” scenario, different single-instance workflows in versions 1a, 1b, and 1c were set up, and the different MAPE services, which provide the metrics and the allocation of the available bitrate with respect to the available bandwidth by the CNO, were tested. Even more complex scenarios were implemented to test the bitrate allocation under aggravated conditions: a multi-instance scenario running workflow implementation 1a in parallel with workflow implementation 1c was implemented for further tests. To complete the “Remote Production” tests, a proof of concept (PoC) was performed: a remote production with three cameras was set up in Matadero (Madrid, ES) to broadcast a radio event. This proof of concept was fulfilled successfully: the intended setup was realized, the live production went smoothly, and the distribution made it possible to reach RTVE's viewers via the web. The other scenario, “Mobile Contribution”, covers the work of mobile journalists. The cognitive services of the 5G-MEDIA Platform offer media services that enhance their produced content during delivery to their broadcaster's headquarters.


6 References

[1] 5G-MEDIA Consortium, Deliverable D6.1: “5G MEDIA Use Case Scenarios and Testbed”, August 2018.

[2] 5G-MEDIA Consortium, Deliverable D2.2: “5G-MEDIA Requirements and Use Case Refinement”, April 2018.

[3] 5G-MEDIA Consortium, Deliverable D4.2: “5G-MEDIA Catalogue APIs and Network Apps”, November 2019.

[4] V. Mnih et al., “Asynchronous Methods for Deep Reinforcement Learning”, in the International Conference on Machine Learning, 1928–1937, 2016.

[5] V. Mnih et al., “Human-level Control through Deep Reinforcement Learning”, Nature 518 (2015), 529–533.

[6] R. S. Sutton et al., “Policy Gradient Methods for Reinforcement Learning with Function Approximation”, in NIPS, Vol. 99. 1057–1063, 1999.

[7] C.J.C.H Watkins, “Learning from Delayed Rewards”, PhD Thesis, University of Cambridge, England, 1989.

[8] F. Chiariotti, S. D'Aronco, L. Toni, and P. Frossard, “Online Learning Adaptation Strategy for DASH Clients”, in ACM MMSys, 2016.

[9] H. Mao, R. Netravali, and M. Alizadeh, “Neural Adaptive Video Streaming with Pensieve”, in ACM SIGCOMM 2017.

[10] M. Jaderberg et al., “Reinforcement Learning with Unsupervised Auxiliary Tasks”, in ICLR, 2017.

[11] Y. Wu and Y. Tian, “Training Agent for First-person Shooter Game with Actor-critic Curriculum Learning”, in ICLR, 2017.

[12] M. Kheirkhah, https://github.com/mkheirkhah/5gmedia/tree/deployed/cno. Last checked in March 2020.

[13] 5G-MEDIA Consortium, Deliverable D2.4: “Final Report on Architecture, Requirements and Specification”, February 2020.

[14] 5G-MEDIA Consortium, Deliverable D3.2: “Specification of the 5G-MEDIA Serverless Computing Framework”, August 2018.

[15] 5G-MEDIA Consortium, Deliverable D3.4: “5G-MEDIA Operations and Configuration Platform”, November 2019.

[16] 5G-MEDIA Consortium, Deliverable D4.1: “5G-MEDIA Catalogue APIs and Network Apps”, August 2018.

[17] 5G-MEDIA Consortium, Deliverable D2.3: “5G-MEDIA-Platform Architecture”, November 2018.


7 Annex

7.1 CONVENTIONAL PRODUCTION VS. 5G-MEDIA REMOTE PRODUCTION

RTVE REPORT V3

ABSTRACT

5G-MEDIA “Remote Production” is a paradigm that joins the broadcast industry with IT internetworking. This combination yields advantages for both sides and offers a new business model. The broadcast industry can take advantage of a 5G-MEDIA “Remote Production” service: the main benefit is a cost reduction, from which other profits derive, for instance enhanced capabilities for covering more live TV events simultaneously, and with that an enlarged viewer target.

The purpose of this report is to compare a conventional production with the new 5G-MEDIA remote production. The production example is the coverage of a live sport event, in this case an indoor football match.

PROGRAMME DETAILS

This data is extracted from a real scenario. The regular Indoor Football League production comprises the following features:

• This kind of production is covered by 9 cameras with 7 camera operators; 7 cameras are transmitted simultaneously. 2 of the cameras are unattended and placed on the football goal posts.

• Microphones are placed on 3 cameras. Usually a stereo main pair is placed in the middle.

• Some images are recorded before the match and played later.

• There is a pre-match programme segment with a set of two commentators. They talk while the players are coming into the arena and the previously recorded images are played. This segment is 15 minutes to half an hour long, just before the match.

• The match itself lasts two hours with a pause in between.

• Finally, a post-match segment is aired with the commentators talking about the match summary and playing out the match highlights. Usually it is 5 minutes long.

RESOURCE ENUMERATION

Let us divide them into human resources and technical resources.

HUMAN RESOURCES

For this event, the following personnel need to be on site:

• 2 commentators: they speak during the match over the ambient sound and also conduct interviews before the match.

• 2 directors (Director and Director Assistant): in charge of TV signal production, coordinating all the artistic resources, preparing the script and directing the on-air signal in the O.B.

• 7 camera operators: these resources do not change substantially compared to the new remote production schema.

• 1 producer: in charge of coordinating the event deployment and managing the budget.

• 2 broadcast technicians: a technical chief and a plain technician. They ensure that all the technical resources in the O.B. (signal link etc.) are on duty.

• 3 installation personnel: one of them is usually the O.B. truck driver. They are in charge of cabling deployment, camera assembly and general rigging.

• 3 audio operators: one of them is the sound supervisor, the other two are sound technicians. They are in charge of sound operation in the O.B., sound cable deployment, microphone placement, attending the commentator position and coach/players boom operation, as well as intercommunication chores.

• 3 EVS operators: these technicians are in charge of recording video material, playing it out on the TV producer's demand, and editing the highlights and melting reel.

• 1 rigging operator: in charge of the mechanical fixtures for camera stands and other mechanical concerns.

• 1 video mixer technician: operates the video switcher in close contact with the TV producer.

• 1 image control technician: in charge of adjusting camera parameters like colorimetry, diaphragm aperture, dynamic range, etc. One of the main goals is to ensure that all the cameras show the same perceptual aspect.

• 1 self-generator operator: power usually has to be self-generated where the arena infrastructure cannot cope with such an energy demand, so an operator is needed to manage the power supply.

• 1 uplink technician: in charge of the international satellite delivery.

TECHNICAL RESOURCES

On the other hand, a conventional production needs the following material resources:

• 1 O.B. (Outside Broadcast) truck, E or G type. O.B. trucks are classified by the number of cameras; in this case it is a 9-camera O.B. It needs a 63 A input power supply.

• 1 auxiliary truck. It serves to stow the rest of the material: cameras, camera pods, cables, antennas, rigging and all the miscellaneous material needed for covering the broadcast event.

• 1 satellite link truck. The purpose of this resource is transporting the signal to a broadcast centre. This step is called signal contribution.

• 1 self-generator truck. Most venues are not able to supply the power needed for these technical trucks.

• 1 Super Slow camera, not included in the default material provided by the O.B. set.

• 2 mini-cameras, likewise not included in the set. They cover the goal angle shot.

5G-MEDIA PRODUCTION ENUMERATION

At this point, and taking into consideration the features that 5G-MEDIA brings to a remote production model, the deployment of human and technical resources will be:


HUMAN RESOURCES (ON SITE)

• 2 commentators: normally they have to stay on site for interviews, though in a remote production model even they might not need to be part of the staff moved to the venue.

• 1 Director or Director Assistant, in charge of both duties.

• 7 camera operators.

• 1 producer.

• 1 broadcast technician.

• 3 installation personnel.

• 1 audio operator assistant.

• 1 rigging operator.

• 1 self-generator operator: as before, needed to manage the self-generated power supply where the arena infrastructure cannot cope with the energy demand.

HUMAN RESOURCES (AT TVE HEADQUARTERS)

• 1 Director or Director Assistant.

• 1 Broadcast technician (studio control room).

• 2 Audio operator assistants.

• 3 EVS operators.

• 1 Video mixer technician.

• 1 Image Control Technician.

TECHNICAL RESOURCES (ON SITE)

On the other hand, since neither an O.B. truck (where the production is made: video switcher, audio desk, technical control, etc.) nor a link truck is needed on site, the technical resources are reduced to a single truck: an auxiliary truck for stowing cameras and rigging, with one small modification. This is a very small rack area housing the interface to the 5G-MEDIA gateway; the rack also serves as a checking and boundary point for troubleshooting.

Therefore, the enumeration would be:

• 1 Auxiliary / Technical Endpoint truck, as explained above.

• 1 Super Slow camera.

• 2 Mini-cameras.

TECHNICAL RESOURCES (AT TVE HEADQUARTERS)

• 1 TV Studio Control Room. It will be used to mix all the feeds from the origin.

• 1 EVS Room.

• 1 Satellite Link Truck. In this particular case it is needed to deliver a World Feed for other broadcasters.

RESOURCE AND COST COMPARISON

First, let’s show the figures for Conventional Production:


Table 1: Overview of costs of a Conventional Production


This table shows the days involved in the production: day N-1 is the day before the event, used mainly for parking and powering the trucks and for rigging; day N is the event day; day N+1 is the day after, for derigging and returning to the headquarters.

Each cell indicates the total number of hours per day and resource. In this kind of event everyone usually surpasses a regular shift, reaching 10 or even 12 hours.

The next columns indicate the total hours per resource, the total cost per resource and the Per Diem cost (transport + food expenses).

As before, this table is divided into human resources and technical resources.

Next, here is the table for a 5G-MEDIA “Remote Production” of the exact same event:


Table 2: Overview of costs of a 5G-MEDIA Production


The total costs are presented here:

HUMAN RESOURCES COSTS

                          On site        HQ            TOTAL COST    REDUCTION
Conventional Production   23.189,09 €    –             23.189,09 €   27.04%
5G-Remote Production      14.981,18 €    1.938,00 €    16.919,18 €

The cost of human resources on location has been reduced from 23.189,09 € to 14.981,18 €. Nevertheless, as some people must attend to their role in the main facilities, there is a new cost item of 1.938,00 €. All in all, the cost is reduced by 27.04%.

[Bar chart: Human resources costs (On site and HQ), Conventional Production vs. 5G-Remote Production]


PER DIEM COSTS

                          COST           REDUCTION
Conventional Production   11.938,70 €    35.06%
5G-Remote Production      7.753,50 €

Because fewer people travel, the Per Diem cost is reduced by 35.06%.

[Bar chart: Per Diem costs, Conventional Production vs. 5G-Remote Production]


AGGREGATED HUMAN RESOURCES COSTS

                          On site        HQ            PER DIEM      TOTAL COST    REDUCTION
Conventional Production   23.189,09 €    –             11.938,70 €   35.127,79 €   29.76%
5G-Remote Production      14.981,18 €    1.938,00 €    7.753,50 €    24.672,68 €

Aggregating the human resource cost and the Per Diem cost, the total human resource cost decreases by 29.76% of the budget.

[Bar chart: Aggregated human resources costs (On site, HQ, Per Diem), Conventional Production vs. 5G-Remote Production]


TECHNICAL RESOURCES COSTS

On the other hand, the technical resources break down in this way:

                          On site        HQ            TOTAL COST    REDUCTION
Conventional Production   15.835,84 €    –             15.835,84 €   45.48%
5G-Remote Production      6.663,75 €     1.969,85 €    8.633,60 €

Here we can see a new cost item on the main facility side. Nevertheless, the cost reduction reaches 45.48%.

[Bar chart: Technical resources costs (On site and HQ), Conventional Production vs. 5G-Remote Production]


TOTAL COSTS

To sum up, the total cost reduction is:

                          COST           REDUCTION
Conventional Production   50.963,63 €    34.65%
5G-Remote Production      33.306,29 €

CONCLUSIONS AND FURTHER ACTIONS

This report demonstrates, from an economic point of view, the advantage of using breakthrough 5G-MEDIA edge computing in regular broadcast productions: a net cost reduction of 34.65%.

To be complete, this report would need an estimation of the renting cost of a 5G-MEDIA edge cloud service provided by telco companies or third-party service providers. These costs are difficult to estimate because they depend on the business model applied, and they are out of the scope of this report.

[Bar chart: Total costs, Conventional Production vs. 5G-Remote Production]


7.2 BROADCAST CONTENT EXCHANGE - Paradigms and approach to 5G Technology

ABSTRACT

The broadcast industry consists in flowing content streams from one or several origins to multiple destinations. Most of the business models are directly related to the exchange of content. These flows follow a classification scheme, as this document explains.

BROADCAST CONTENT EXCHANGE TAXONOMY

The need for a centralized broadcast facility is the main determining factor that explains the taxonomy. It consists of a stream concentration towards the main facility and a stream expansion towards the final destinations.

The main facility is called:

• Central Control Area (CCA),

• Master Control Room (MCR), or simply

• Broadcast Main Facility or Headquarters.

The core element of this facility is a router. Its mission is to interconnect the concentrated stream inputs with the stream outputs going to the final destinations.


CONTRIBUTION

The set of streams that go from every content origin to the broadcast main facility is called Contribution. Several real scenarios fit this schema.

Scenario 1 - Broadcast permanent facility

Here, a public broadcast company (such as RTVE) or a private company owns a complete infrastructure. It comprises:

• content origin points,

• transport lines that connect the origin points to the router, and

• the Central Control Area.

Scenario 2 – Broadcast Ephemeral Facility

In this case, a big broadcast facility is built for a short period of time. The reason is the coverage of a special event such as the Olympic Games, the FIFA World Cup and others.

Contribution Elements

The content origin points are:

• Studios. They can stream signals for on-air live programmes or for recording. These studios can be in the same building complex or far from the broadcast main facility.

• Ephemeral remote locations. A mobile infrastructure has to be deployed in trucks; this is called O.B. (Outside Broadcast).

• Occasional Electronic News Gathering (ENG). It comprises a lightweight, streamlined team that gets to the location as fast as possible to cover news. Increasingly, personal mobile devices are used for this purpose for the sake of news coverage.

When the origin is far from the broadcast main facility, the concept of Contribution plays a special role, due to the need to transport the signal via a carrier. Nowadays the carrier is a third party, normally a telco company; signals are carried by satellite, terrestrial microwave link, terrestrial optical fibre or perhaps submarine cable.

The transported signal has evolved from an analogue stream (video signal) to a digital stream. For some time now, the digital signal has been compressed to avoid high bandwidth demands, but at the cost of added latency. The signal is then encapsulated in IP packets (a so-called IP transport stream).

The next natural step, with the arrival of processing at the edges, is Remote Production. The main advantage, in short, is cost reduction, achieved by moving fewer technical resources to the content origin points (venues), together with the capability of reaching more locations simultaneously thanks to that cost reduction. Another advantage of edge-to-cloud virtualization is creating as many virtual functions as needed, for instance for assuring signal quality in an IP packet environment, or for replicating regular broadcast functions such as a video switcher, audio desk, camera control, signal format conversion, etc.

5G-MEDIA Use Case 2, Scenarios 1 and 2, are the materialization of the Contribution concept in the broadcast industry.


DISTRIBUTION

Once the streams are collected at the router in the main facility, the next step is the diffusion of the signal. This is called Distribution.

The final destinations can be classified as:

• Internal departments (in-house). These signal streams must pass through storage areas for editing and archiving or for further transmission. Normally such an installation is in the main building. 5G-MEDIA edge computing could help to decentralize these facilities, allowing programme contents to be stored in the cloud.

• Terrestrial coverage. This involves the coverage to the end user (i.e. the viewer at home). Terrestrial diffusion is normally handled by a third-party company in the form of DVB-T signal transmission. Several distribution signals are packed into a channel transmission called a multiplex. Depending on whether the multiplex's signals belong to the same broadcaster or to several, the multiplex encoding can be allocated and managed by the broadcaster or by the terrestrial network company.

• Third-party rights holders. These are mainly companies that can distribute the broadcaster's signal at the same time as the terrestrial coverage. Such a network could be satellite (DVB-S) or a cable platform, normally over IP networking. The contribution can also be served on demand or in another static broadcast schedule different from the terrestrial coverage.

