Hybrid Backup Resource Optimization for VNF Placement over Optical Transport Networks
João Pedro, António Eira
© 2019 Infinera. All rights reserved. Company Confidential.
Outline
• Motivation
• Network Scenario
• Survivability Mechanisms
– Hop Protection
– Chain Protection
– Hybrid Protection
• Optimization Model
• Simulation Results
• Conclusions
Introduction
• Advent of 5G brings increasingly diverse service requirements
– High BW, high-availability, latency-critical
• Edge computing vital to meet these requirements
– Introduce application-awareness in the network
• Carriers re-purposing central offices (COs) as data centers (DCs)
– Opportunity to cost-effectively host computing resources at metro-aggregation nodes, which are closer to end-users
• Converged nodes combining DC/virtualization capabilities with packet/optical transport
– Joint visibility of both IT and networking resources
– Opportunity to exploit more efficient resource dimensioning methods
Network Scenario
• Network Topology
– Metro DWDM ring network
– Each optical node may be co-located with a DC
• Logical Topology
– Service chains deployed between two end nodes by instantiating the set of required VNFs at one or more DCs
– Logical topology must ensure each set of VNFs in a chain can be traversed in the target order
– Placement of VNFs across the network determines both IT requirements and the logical topology to support them
• Survivability to Link Failures: paramount to support critical 5G Services
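[Figure: metro ring with nodes A to H, some marked as DC nodes. Example service chain from source G through VNF sets f1, f2 and f3 to destination B, annotated with the bandwidth per hop B1 to B4 [Gb/s] and the maximum E2E latency [ms].]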
Survivability Mechanisms: Hop Protection
• Link-failure survivability via protecting each optical hop of the VNF chain
– (+) Low transponder count: backup paths only require additional spectrum
– (+) Low IT resource usage: no duplication of VNFs to protect against link failures
– (-) Reduced lightpath capacity: working and backup lightpaths share one channel format, set by the worse-performing of the two (the difference can be extreme in a ring network)
– (-) High worst-case latency: a single link failure can trigger multiple backup paths
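[Figure: hop protection example on ring nodes G, A, F and B, with working VNFs f1, f2 and f3 and working and backup paths per chain hop; lightpaths use 16-QAM, 16-QAM and QPSK. Legend: working path, backup path, transponder (*), working VNF i, backup VNF i.]
– Lightpath rate is constrained by the longest path when transponders are shared between working/backup
– Link-disjointness is enforced between working/backup links on each chain hop (not required when VNFs are at the same node)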
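The capacity penalty of sharing a transponder between working and backup paths can be illustrated with a small sketch. The rate-versus-hop-count table below is hypothetical (the slides give no numeric values here), and `shared_rate` and `lightpaths_needed` are illustrative names, not part of the authors' model. The sketch also assumes that the backup path of a hop simply takes the opposite ring direction.

```python
import math

# HYPOTHETICAL maximum line rate [Gb/s] per ring hop count (illustrative values only).
MAX_RATE_BY_HOPS = {1: 600, 2: 500, 3: 400, 4: 400, 5: 300,
                    6: 300, 7: 200, 8: 200, 9: 100}


def shared_rate(working_hops, ring_nodes, table=MAX_RATE_BY_HOPS):
    """Rate of a lightpath whose transponder serves both working and backup paths.

    The backup path is assumed to go the other way around the ring, so it
    spans the remaining hops; the worse of the two paths sets the rate.
    """
    backup_hops = ring_nodes - working_hops
    return min(table[working_hops], table[backup_hops])


def lightpaths_needed(bandwidth_gbps, rate_gbps):
    """Number of lightpaths required to carry a given hop bandwidth."""
    return math.ceil(bandwidth_gbps / rate_gbps)


# 8-node ring, 1-hop working path, 600 Gb/s of hop bandwidth.
unshared = lightpaths_needed(600, MAX_RATE_BY_HOPS[1])
shared = lightpaths_needed(600, shared_rate(1, 8))
```

With these assumed values, a 1-hop working path on an 8-node ring has a 7-hop backup path, so the shared lightpath runs at 200 Gb/s instead of 600 Gb/s and the hop needs three lightpaths instead of one.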
Survivability Mechanisms: Chain Protection
• Link-failure survivability via replicating the VNF chain over a disjoint path
– (+) High lightpath capacity: independent lightpaths, each using the best channel format
– (+) Low worst-case latency: bounded by the working or backup chain (independent of the failed link)
– (-) High transponder count: additional transponders required to support backup lightpaths
– (-) High IT resource usage: additional storage/compute resources to duplicate VNFs…
– (+) … but the duplication provides resilience against failures within DCs
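[Figure: chain protection example on ring nodes G, H, A, B, E and F, with f1, f2 and f3 each instantiated on both the working and the backup chain; lightpaths use 64-QAM, 64-QAM, 64-QAM and 16-QAM.]
– Link-disjointness is enforced between working/backup links across the entire chain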
Survivability Mechanisms: Hybrid Protection
• Link-failure survivability via a combination of per-hop protection with replicating segments of the VNF chain over disjoint paths
– (+) Lowest transponder count: selectively combines the high lightpath capacity of Chain protection with Hop protection's lack of dedicated backup transponders
– (+) Customizable and adaptable to the specific network scenario and optimization priorities…
– (-) … but at the expense of a more complex design process
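[Figure: hybrid protection example on ring nodes G, A, B, E and F, with f1 and f2 instantiated on both working and backup segments and f3 instantiated once; lightpaths use 64-QAM, 16-QAM and 16-QAM.]
– Link-disjointness is enforced between working/backup links for all chain hops in the same cycle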
Optimization Model
• Multifactorial problem structure
– Optimal protection mechanism depends on several factors
» e.g. service BW, possible DC placements, latency constraints
– Protection mechanism impacts lightpath capacity
» e.g. mitigate the impact of working/backup lightpaths with large performance differences by introducing VNF redundancy
• Results in a complex optimization problem
– Routing and spectrum assignment on top of VNF placement, considering the specific optical performance constraints associated with each protection mechanism
• Solved via a single ILP model
– Jointly captures all interdependencies
– Applicable to small/medium-sized networks, hence suitable for Metro ring networks
[Figure: converged node with access, aggregation and core interfaces, storage/compute resources and a transponder bank.]
Enables fair comparison of the 3 protection mechanisms
Optimization Model
• ILP Objective Function
– Min. total number of transponders required for working and backup chains
• Main ILP Constraints
– Flow conservation; working/backup chain segment disjointness
– Shared vs dedicated backup transponder; lightpath capacity defined by worst-case working/backup in case of shared backup transponders; max. number of optical channels per link
– Max. number of nodes hosting a DC; max. IT resource capacity per node; max. working chain latency
• Chain/Hop protection modelled by manipulating variables and adding constraints
• Number of variables grows with N² (N = number of nodes)
– Model all candidate paths between arbitrary node pairs (for every chain hop)
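To see why the variable count scales with N², the candidate paths on a ring can be enumerated directly: every node pair has exactly two, one per ring direction. The sketch below is illustrative only; `ring_candidate_paths` is a hypothetical helper and not the authors' formulation, which additionally indexes paths by chain hop.

```python
from itertools import combinations


def ring_candidate_paths(n):
    """Enumerate both ring paths (clockwise, counter-clockwise) for every node pair.

    Nodes are numbered 0 .. n-1 around the ring. Returns a list of
    (src, dst, node_sequence) tuples, two per unordered node pair.
    """
    paths = []
    for src, dst in combinations(range(n), 2):
        # Clockwise: src, src+1, ..., dst
        cw = [(src + i) % n for i in range((dst - src) % n + 1)]
        # Counter-clockwise: src, src-1, ..., dst (wrapping around node 0)
        ccw = [(src - i) % n for i in range((src - dst) % n + 1)]
        paths.append((src, dst, cw))
        paths.append((src, dst, ccw))
    return paths


# Path count is n * (n - 1): quadratic in the number of nodes.
counts = {n: len(ring_candidate_paths(n)) for n in (5, 10)}
```

Doubling the ring from 5 to 10 nodes raises the number of candidate paths from 20 to 90, before multiplying by the number of chain hops.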
Simulation Results: Network Scenario
• Network topology and traffic load
– Ring topologies with {200, 400} km, comprising {5, 10} nodes
– Up to {40, 80} % of nodes host a DC
– 10 Tb/s of traffic (summing over all VNF hops of every chain) generated uniformly between all nodes
• Channel formats and optical design
– Flex-rate interfaces with line rates 100-600 Gb/s
» Modulation formats BPSK, QPSK, 8QAM, 16QAM, 32QAM, 64QAM
» Symbol rate of 64 Gbaud; 75 GHz frequency slots
– Reach estimation model accounts for filtering penalties, crosstalk levels and express losses
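A quick way to relate the listed formats to the line rates is sketched below. It assumes, since the slide does not state this explicitly, that the six formats map one-to-one onto 100 Gb/s steps between 100 and 600 Gb/s at 64 Gbaud (net rate proportional to bits per symbol). The names are illustrative.

```python
# Bits per symbol (per polarization) for each modulation format listed on the slide.
BITS_PER_SYMBOL = {"BPSK": 1, "QPSK": 2, "8QAM": 3,
                   "16QAM": 4, "32QAM": 5, "64QAM": 6}

SLOT_GHZ = 75          # frequency slot per channel, from the slide
RATE_STEP_GBPS = 100   # ASSUMPTION: net line rate per bit/symbol at 64 Gbaud


def line_rate_gbps(fmt):
    """Net line rate of a channel under the assumed one-to-one mapping."""
    return BITS_PER_SYMBOL[fmt] * RATE_STEP_GBPS


def spectral_efficiency(fmt):
    """Spectral efficiency in b/s/Hz for a channel occupying one 75 GHz slot."""
    return line_rate_gbps(fmt) / SLOT_GHZ


rates = {fmt: line_rate_gbps(fmt) for fmt in BITS_PER_SYMBOL}
```

Under this assumption the extremes are BPSK at 100 Gb/s and 64QAM at 600 Gb/s, the latter corresponding to 8 b/s/Hz in a 75 GHz slot.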
Virtual Network Functions
1 - Network Address Translation
2 - Firewall
3 - WAN Optimization Controller
4 - Intrusion Detection Prevention System
5 - Video Optimization Controller
6 - Traffic Monitor
Service Type VNF Chain
Web Services 1-2-6-3-4
VoIP 1-2-6-2-1
Video Conferencing 1-2-6-5-4
Cloud Gaming 1-2-5-3-4
5G Service 1-2-6-3-5
From Savi et al., “To distribute or not to distribute? Impact of latency on virtual network function distribution at the edge of FMC networks”, ICTON 2016, We.C3.4
[Figure: maximum data rate [Gb/s] (0 to 600) versus ring hop count (1 to 9), for 10 nodes/200 km, 10 nodes/400 km, 5 nodes/200 km and 5 nodes/400 km.]
Simulation Results: Transponder Count
• Chain protection requires the highest number of transponders
– 23 to 94% more transponders wrt Hop protection
– Less efficient in shorter rings, where working/backup optical performance differences are smaller
• Hybrid protection requires the lowest number of transponders
– 3% reduction wrt Hop protection for 200 km rings: small working/backup performance gap discourages using chain protection
– 9% reduction wrt Hop protection for 400 km rings: uses chain protection in some segments to mitigate working/backup performance gap
[Figure: number of transponders (0 to 300) for Chain, Hop and Hybrid protection, by ring length (200 km, 400 km), node count (5, 10) and maximum DC node share (40%, 80%). Results averaged over 10 independent runs.]
Simulation Results: IT Resources
[Figure: IT capacity units (0 to 2000) for Chain, Hop and Hybrid protection, by ring length (200 km, 400 km), node count (5, 10) and maximum DC node share (40%, 80%). Results averaged over 10 independent runs.]
• Chain protection requires the highest amount of IT resources
– VNFs duplicated at every node
• Hop protection requires the lowest amount of IT resources
– No VNF duplication
• Hybrid protection only requires slightly more IT resources than Hop protection
– Extent of increase related to transponder savings
– 9% transponder savings obtained at expense of 24% extra IT resources
Simulation Results: Latency
[Figure: working and worst-case backup latency [ms] (0 to 2) for Chain, Hop and Hybrid protection, by ring length (200 km, 400 km), node count (5, 10) and maximum DC node share (40%, 80%). Results averaged over 10 independent runs.]
• Working chain latency
– Only depends on path selected
– Chain protection tends to increase latency: VNF replication can result in a higher spread of functions across DCs, hence a higher hop count per chain
• Backup chain latency
– Requires simulating every link failure to determine the worst-case value
– Slightly higher latency with hop protection: extra latency from rerouting around the ring between the end nodes of the failed hop
• Hybrid protection exhibits smoother fluctuations in working & backup latency
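The worst-case backup latency under hop protection can be estimated by failing each ring link in turn, as described above. The sketch below is a simplified illustration, not the authors' simulator: it counts propagation delay only, assumes equal-length links, routes each chain hop over the shorter ring direction, and reroutes a hop the other way around when its working route contains the failed link. The figure of roughly 5 µs per km of fiber is a common approximation, and all function names are hypothetical.

```python
FIBER_DELAY_MS_PER_KM = 0.005  # ~5 microseconds per km (common approximation)


def cw_links(src, dst, n):
    """Links crossed going clockwise from src to dst; link i joins node i and i+1."""
    return [(src + i) % n for i in range((dst - src) % n)]


def hop_routes(src, dst, n):
    """Return (working, backup) link lists: the shorter direction is the working route."""
    cw = cw_links(src, dst, n)
    ccw = cw_links(dst, src, n)  # same links as the counter-clockwise route
    return (cw, ccw) if len(cw) <= len(ccw) else (ccw, cw)


def chain_latency_ms(chain, n, link_km, failed_link=None):
    """Propagation latency of a chain (list of visited nodes), optionally with one failed link."""
    links = 0
    for src, dst in zip(chain, chain[1:]):
        if src == dst:
            continue  # consecutive VNFs at the same node: no optical hop
        working, backup = hop_routes(src, dst, n)
        links += len(backup) if failed_link in working else len(working)
    return links * link_km * FIBER_DELAY_MS_PER_KM


def worst_case_backup_latency_ms(chain, n, link_km):
    """Highest chain latency over all single-link failures."""
    return max(chain_latency_ms(chain, n, link_km, f) for f in range(n))


# Hypothetical chain G -> A -> B on an 8-node, 200 km ring (nodes 0..7, 25 km links).
working_ms = chain_latency_ms([6, 0, 1], 8, 25)
worst_ms = worst_case_backup_latency_ms([6, 0, 1], 8, 25)
```

In this hypothetical case the worst failure is the single link of the A to B hop, which forces that hop onto the seven remaining links of the ring.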
Conclusions
• Metro networks with optical nodes co-located with DCs
• 3 mechanisms for VNF chains to survive link failures
– Hop, chain and hybrid protection
• ILP model to evaluate and compare the effectiveness of the 3 strategies
• Simulation results highlight the merits of hybrid protection
– Requires the lowest number of transponders by selectively using VNF replication where there are significant performance differences between working and backup paths…
– … effectively trading off IT resources for transponders…
– … and without compromising latency
Motivation
• Rational re-use of the existing fiber layout
– Often ring topologies interconnecting aggregation sites
– One node interfaces with the core network
• Basic requirements
– Capacity: support very high BW towards the core
– Flexibility: support reconfiguration of existing services in response to traffic dynamics and VNF re-optimization
– Resiliency: services should survive failures while meeting their requirements with minimal resource overprovisioning
• Evaluate optical node architectures for Metro rings
– Broadcast-and-Select (B&S), i.e., ROADM node
– Drop-and-Waste (D&W), i.e., Filterless node
– Fixed Filter, i.e., FOADM node
Metro Transport Architectures
ROADM Node
[Figure: ROADM node with an SC at the drop stage and a WSS at the add stage, each with ports 1 … N.]
– Drop stage: power splitter
– Add stage: low-port count pluggable WSS
– (+) Flexible use/re-use of frequencies between different node pairs
– (+) Noise bandwidth filtered at each node
– (+) Easier upgrade to higher nodal degrees (i.e., from ring/chain to mesh)
– (-) WSS at the add stage increases cost
– (-) Susceptible to cascaded filtering penalties
Metro Transport Architectures
Filterless Node
[Figure: filterless node with an SC at both the drop stage and the add stage, each with ports 1 … N.]
– Drop stage: power splitter
– Add stage: power combiner
– (+) Lowest cost realization of add and drop stages
– (+) Not susceptible to cascaded filtering penalties (natively gridless)
– (+) Any frequency can be used between any node pair…
– (-) … but frequency re-use is not possible (each frequency used for a single node pair)
– (-) Core-facing node still requires a WSS to avoid a lasing loop
Metro Transport Architectures
FOADM Node
[Figure: FOADM node with a fixed filter (FF) at both the drop stage and the add stage, each with colored ports λ1 to λ4.]
– Drop stage: fixed filter
– Add stage: fixed filter
– (+) Low cost fixed filters at the add and drop stages
– (+) Frequency re-use by different node pairs is possible…
– (-) … but only according to the installed filter configurations
– (-) Colored ports further limit dynamic resource allocation
– (-) Requires reserving spectrum bands for hub-tributary and single-hop connections
Protection in Service Chains
• 1+1 protection at the optical layer
– Every lightpath replicated in the opposite ring direction
– Line-side protection allows the same transponder to be used for working/backup paths
• Impact on latency
– Latency constraints are imposed on the working service chain end-to-end
– Additional latency due to failures is evaluated as the highest latency added over the whole chain for any single link failure
• Filterless nodes impose an additional requirement
– Working/protection signals are split between the fiber pairs
– Tx/Rx signals must use different frequencies to avoid interference
[Figure: service chain from source to destination through VNF 1, VNF 2 and VNF 3, with working and protection paths; detail of two filterless nodes (SC at the drop and add stages, ports 1 … N) in which Tx and Rx use different frequencies λ1 and λ2 on the working and protection directions.]
Simulation Results
Network Scenario
• Network topology and traffic
– Ring topology with 100 km and containing 5, 10 or 15 evenly spaced aggregation nodes
– 2.5 to 7.5 Tb/s of total traffic load, with varying shares of locally processed traffic (25-75%)
• Each scenario is re-optimized four times
– Changing traffic patterns and re-configuring VNF instantiation
• Channel formats and optical design
– Flex-rate interfaces with line rates between 100 and 600 Gb/s
– Symbol rates between 32 and 64 Gbaud
– Modulation formats between QPSK and 64-QAM
– Full C-band (4.8 THz), flexible grid (12.5 GHz granularity)
– Pre-amplifiers only deployed when attenuation exceeds 20 dB
– Reach estimation model for the different architectures accounts for filtering penalties, crosstalk levels and express losses
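The spectrum budget implied by these figures can be worked out directly. The sketch below derives the number of 12.5 GHz slots in the 4.8 THz C-band and the resulting channel count. The 75 GHz channel width is an assumption (the width paired with 64 Gbaud earlier in this material), and the rule of two frequencies per lightpath on a filterless ring is a simplification of the Tx/Rx requirement on the protection slide, combined with the absence of frequency re-use. Function names are illustrative.

```python
import math

C_BAND_GHZ = 4800.0   # full C-band, from the slide
SLOT_GHZ = 12.5       # flexible-grid granularity, from the slide


def total_slots(band_ghz=C_BAND_GHZ, slot_ghz=SLOT_GHZ):
    """Number of grid slots available in the band."""
    return int(band_ghz // slot_ghz)


def max_channels(channel_ghz, band_ghz=C_BAND_GHZ, slot_ghz=SLOT_GHZ):
    """Channels of a given width that fit in the band, rounded up to whole slots."""
    slots_per_channel = math.ceil(channel_ghz / slot_ghz)
    return total_slots(band_ghz, slot_ghz) // slots_per_channel


def filterless_max_lightpaths(channel_ghz, freqs_per_lightpath=2):
    """Upper bound on lightpaths in a filterless ring.

    ASSUMPTION: no frequency re-use, and each bidirectional lightpath
    consumes two distinct frequencies (one for Tx, one for Rx).
    """
    return max_channels(channel_ghz) // freqs_per_lightpath


slots = total_slots()
channels_75 = max_channels(75)
filterless_bound = filterless_max_lightpaths(75)
```

Under these assumptions the ring offers 384 slots, i.e. 64 channels of 75 GHz, and a filterless ring is capped at 32 such lightpaths in total, which is consistent with faster spectrum exhaustion as node count and load grow.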
Simulation Results
Number of Transponders & Average Lightpath Spectral Efficiency
• Filterless viability hampered by (1) number of nodes, (2) total traffic load, (3) share of locally processed traffic
– An increase in any of these leads to faster spectrum exhaustion
• ROADM requires fewer transponders in the majority of cases, being the most robust architecture
• In some cases, FOADM and even Filterless benefit from slightly better optical performance than ROADM
– More spectrally efficient formats occasionally possible due to lower express losses and reduced filtering penalties
[Figure: total transponder count (0 to 200) for ROADM, Filterless and FOADM, by ring nodes (5, 10, 15), offered load (2.5, 5.0, 7.5 Tb/s) and share of service-chaining traffic (25%, 75%).]
[Figure: average lightpath SE [b/s/Hz] (0 to 8) for ROADM, Filterless and FOADM, by ring nodes (5, 10, 15), offered load (2.5, 5.0, 7.5 Tb/s) and share of service-chaining traffic (25%, 75%).]
Simulation Results
Latency
• Filterless and ROADM outperform FOADM with respect to latency
– No clear advantage of one architecture over the other
• FOADM can introduce significantly more latency
– Higher number of lightpath hops per chain, in order to escape the connectivity restrictions imposed by the fixed filters
– Further aggravated in case of a single link failure
[Figure: latency [ms] (0 to 3), broken down into propagation, O/E/O and protection components, for ROADM, Filterless and FOADM, by ring nodes (5, 10, 15) and offered load (2.5, 7.5 Tb/s), with 25% service-chained traffic.]
Conclusions
• Evaluated the capacity and scalability of 3 possible Metro optical transport architectures for supporting 5G services with survivability constraints
– Coexistence of Metro-Core traffic with service-chained traffic within the Metro network
– Vary network node count, total traffic load and share of service-chained traffic
– Optimize VNF placement (IT resources) and transponder count (transport resources)
• ROADM
– Most robust and scalable architecture: any network size, any traffic pattern, with possible upgrade of nodal degree (WSS still implies a premium)
• FOADM
– Reasonably scalable but with substantial trade-offs: increased transponder count and worst-case latency, and a need for careful fixed frequency allocation
• Filterless
– Deployment “sweet spot”: small node count, highly dominant hub traffic (ill-suited when increasing the amount of IT resource sharing within the Metro network)