Date post: 03-Oct-2021
Achieving Five Nines of VNF Reliability in Telco-grade OpenStack Cloud

Panel Discussion

Kandan Kathirvel, AT&T; Eoin Walsh, Intel; Rimma Iontel, Red Hat Inc.; and Fausto Marzi, Ericsson

Moderated by Haseeb Akhtar, Ericsson

1:50 PM – 2:30 PM on Wednesday, April 27, 2016
Page 2

Converting PNF to VNF without cloud awareness is not optimal

[Comparison chart: Physical Network Function vs. Virtual Network Function across the layers Datacenter / Central Office, Compute, Software-defined network, Host OS & virtualization, Orchestration software, and Application. Three cells are marked "Same".]

Physical Network Function:
• Purpose-built and dedicated for field of use
• Network fabric purpose-built and dedicated (most cases)
• Physical connection – no SDN
• Not virtualized (most cases); vendor-provided OS
• Mostly manual or contained
• Purpose-built software, operationally perfected over decades

Virtual Network Function:
• Commodity hardware (any vendor)
• Multi-tenant (common framework)
• Early adoption and rapid evolution
• Cloud provided (common)
• Common automation, rapidly evolving
• New and evolving; requires significant innovation

Page 3

VNF availability depends on Cloud & VNF resiliency

A single instance of an OpenStack region is about 99.9% available (8.76 hours of unplanned downtime per year).

VNF availability levels:
• 99.9% (8.76 hrs down/year)
• 99.99% (52.56 mins down/year)
• 99.999% (5.26 mins down/year)

VNF deployment options, ranked 1 to 5 from high risk of application outages to low risk (cloud-aware applications):
• VNF - single VM
• VNF HA in a region
• VNF HA in 2 regions at the same DC
• VNF HA across 2 DCs
• VNF HA across 4 DCs

[Chart annotations: most VNFs sit at the high-risk end and few at the low-risk end. The chart also marks "Some VNFs current state" and "Optimal VNF", and shows OpenStack Region 1 and Region 2 within a single DC as well as geo locations DC1, DC2 and DC4.]
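
The downtime figures on this slide follow directly from the availability percentages. A minimal sketch of that arithmetic is below; the second function is an added illustration (not from the slides) of why spreading a VNF over several regions helps, and it assumes region failures are independent and failover is instantaneous, which real deployments only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` instances when any one of them is enough.

    Assumption (not from the slides): failures are independent and
    failover between instances takes no time.
    """
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for a in (0.999, 0.9999, 0.99999):
        print(f"{a:.3%}: {downtime_minutes_per_year(a):.2f} min/year")
    for n in (1, 2, 4):
        print(f"{n} region(s) at 99.9% each: {redundant_availability(0.999, n):.7%}")
```

Under those idealized assumptions, two 99.9% regions already exceed five nines on paper; correlated failures and failover time reduce that in practice.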

Page 4

Proposed OpenStack Enhancements

• Hitless upgrades – reduce overall platform downtime

• Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support

• Multi-location awareness & workload placement

• Resiliency/Stability testing framework in OpenStack Rally – measure and report

• Auto healing framework for OpenStack Controllers

VNF evolution

• Support HA both locally and globally

• Leverage OpenStack/cloud platform resiliency features, e.g. anti-affinity to place VMs on different servers
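
To make the anti-affinity idea concrete, here is a toy placement function. It is a sketch only, not Nova's scheduler; the function name and host names are hypothetical. In OpenStack itself, the same intent is typically expressed by putting the VMs in a server group with an anti-affinity policy.

```python
from typing import Dict, List


def place_with_anti_affinity(vms: List[str], hosts: List[str]) -> Dict[str, str]:
    """Assign every VM of one group to a different host.

    Raises ValueError when a VM cannot get a host of its own, which is how
    a strict anti-affinity policy behaves: it refuses to co-locate members.
    """
    placement: Dict[str, str] = {}
    used = set()
    for vm in vms:
        # Only hosts that do not already run a member of this group qualify.
        candidates = [h for h in hosts if h not in used]
        if not candidates:
            raise ValueError(f"no free host left for {vm}")
        placement[vm] = candidates[0]
        used.add(candidates[0])
    return placement


if __name__ == "__main__":
    print(place_with_anti_affinity(["vnfc-a", "vnfc-b"], ["compute-1", "compute-2", "compute-3"]))
```

With this constraint, the loss of one server takes down at most one member of the group.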

Page 5

NFV Ready Architecture

[Architecture diagram. Labeled components: Open APIs; OSS/BSS; Service Catalog; Service Orchestration; Service Assurance; Network Orchestration; Analytics; Security; Services; Descriptor Repositories; VNF Manager; SDN Controller; VIM; VNFs composed of VNFCs; Virtual Resource Monitoring & Reporting; vCompute, vStorage, vNetwork; Virtualization; Enhanced Platform Awareness; Platform Resource Monitoring & Reporting; Compute, Storage, Network.]

Page 6

Proposed OpenStack Enhancements

• Automated provisioning and monitoring (Ceilometer, Heat and Ironic)

• Intelligent workload placement (Nova scheduler)
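
As a rough illustration of intelligent placement, the sketch below filters hosts by capacity and platform features, then weighs the survivors. The Nova scheduler follows a similar filter-then-weigh pattern, but everything here (class names, feature labels such as "sriov" and "hugepages", the free-RAM weighting) is a hypothetical simplification, not Nova's API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int
    features: FrozenSet[str] = frozenset()  # e.g. {"sriov", "hugepages"}


@dataclass(frozen=True)
class Request:
    vcpus: int
    ram_mb: int
    required_features: FrozenSet[str] = frozenset()


def passes_filters(host: Host, req: Request) -> bool:
    """Filter step: enough capacity and every required feature present."""
    return (
        host.free_vcpus >= req.vcpus
        and host.free_ram_mb >= req.ram_mb
        and req.required_features <= host.features
    )


def select_host(hosts: List[Host], req: Request) -> Optional[Host]:
    """Weigh step: among passing hosts, prefer the one with most free RAM."""
    candidates = [h for h in hosts if passes_filters(h, req)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_ram_mb)


if __name__ == "__main__":
    hosts = [
        Host("compute-1", 8, 16384, frozenset({"sriov"})),
        Host("compute-2", 16, 32768, frozenset({"sriov", "hugepages", "cpu_pinning"})),
        Host("compute-3", 32, 65536),
    ]
    req = Request(4, 8192, frozenset({"sriov", "hugepages"}))
    print(select_host(hosts, req))
```

A request with no feature requirements lands on the roomiest host, while one that needs SR-IOV and huge pages is restricted to hosts that advertise both.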

Page 8

Proposed OpenStack Enhancements

• Tools to measure, monitor and report end-to-end platform SLA
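
One way such a tool might report against an SLA is to turn recorded outage intervals into a measured availability. The sketch below is illustrative only; the function names are hypothetical and it assumes the outage intervals do not overlap and all fall inside the observation window.

```python
from typing import Iterable, Tuple


def measured_availability(window_s: float, outages: Iterable[Tuple[float, float]]) -> float:
    """Fraction of the observation window during which the service was up.

    `outages` holds (start, end) timestamps in seconds.
    Assumption: intervals are non-overlapping and lie inside the window.
    """
    down = sum(end - start for start, end in outages)
    if window_s <= 0 or not 0 <= down <= window_s:
        raise ValueError("outage time must fit inside a positive window")
    return 1.0 - down / window_s


def meets_sla(availability: float, target: float = 0.99999) -> bool:
    """True when the measured availability reaches the target (five nines by default)."""
    return availability >= target


if __name__ == "__main__":
    thirty_days = 30 * 24 * 3600  # seconds
    a = measured_availability(thirty_days, [(100.0, 130.0)])
    print(f"{a:.6%}", meets_sla(a))
```

Over a 30-day window, five nines leaves a budget of roughly 26 seconds of downtime, so a single 30-second outage already misses the target.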

Page 9

Compute Node HA – Local

Prerequisite:
• Compute nodes with shared storage

Disaster Workflow:
• Disaster is detected
• Compute node evacuation
• Users connect to the new service

Risks:
• Node fencing

[Diagram: an Active HA Controller supervises Compute Node 1, Compute Node 2, ... Compute Node n; each compute node runs VMs and an HA Agent. Control Nodes 1 to 3 host the database and are clustered for HA with Corosync + Pacemaker.]
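
The local disaster workflow can be sketched as a small simulation: detect a silent node, require fencing, then evacuate its VMs. This is a toy model under stated assumptions, not the HA controller shown on the slide; heartbeat-based detection, the function names, and the least-loaded placement rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def detect_failed_nodes(last_heartbeat: Dict[str, float], now: float, timeout_s: float) -> List[str]:
    """Nodes whose HA agent has been silent for longer than `timeout_s`."""
    return sorted(n for n, seen in last_heartbeat.items() if now - seen > timeout_s)


def evacuate(failed: str, placement: Dict[str, List[str]], fenced: Set[str]) -> Dict[str, List[str]]:
    """Move the failed node's VMs onto surviving nodes, least loaded first.

    Refuses to act until the failed node is fenced: restarting a VM that
    may still be running against shared storage risks corrupting its disk.
    """
    if failed not in fenced:
        raise RuntimeError(f"{failed} must be fenced before evacuation")
    survivors = {n: list(vms) for n, vms in placement.items() if n != failed}
    if not survivors:
        raise RuntimeError("no surviving compute node")
    for vm in placement.get(failed, []):
        # Pick the node with the fewest VMs; break ties by node name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    placement = {"compute-1": ["vm1", "vm2"], "compute-2": ["vm3"], "compute-3": []}
    beats = {"compute-1": 0.0, "compute-2": 95.0, "compute-3": 99.0}
    for node in detect_failed_nodes(beats, now=100.0, timeout_s=10.0):
        print(evacuate(node, placement, fenced={node}))
```

The fencing check reflects the risk listed on the slide: a node that only looks dead must be isolated before its VMs are restarted elsewhere.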

Page 10

Compute Node HA – Global

Prerequisite:
• Data replication

Operational Workflow:
• Floating IPs are retrieved from Nova
• IPs are announced with BGP or OSPF

Disaster Workflow:
• Disaster is detected
• Compute node evacuation
• On the other compute nodes, the floating IPs are retrieved
• Floating IPs are announced with BGP or OSPF
• Users connect to the new service

Risks:
• Node and DC fencing

[Diagram: DC1 (1.1.1.0/24) and DC2 (1.1.2.0/24) reach the Internet through BGP AS XXXXX. Each DC runs Freezer-dr-api. VM1 is evacuated between the DCs, and the floating IP 100.100.1.15 is shown at both sites.]
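
The global workflow hinges on the floating IP following the VM to another DC. The sketch below models only that step, with a toy in-memory table standing in for a BGP or OSPF speaker. The class and function names are hypothetical, announcing each floating IP as a /32 host route is an assumption, and retrieving the floating IPs from Nova is left out.

```python
import ipaddress
from typing import Dict, List, Set


class RouteAnnouncer:
    """Toy stand-in for a routing speaker: tracks which DC announces which prefix."""

    def __init__(self) -> None:
        self.announced: Dict[str, Set[ipaddress.IPv4Network]] = {}

    def announce(self, dc: str, fip: str) -> None:
        # Assumption: each floating IP is announced as a /32 host route.
        self.announced.setdefault(dc, set()).add(ipaddress.ip_network(f"{fip}/32"))

    def withdraw(self, dc: str, fip: str) -> None:
        self.announced.get(dc, set()).discard(ipaddress.ip_network(f"{fip}/32"))

    def reachable_via(self, address: str) -> List[str]:
        """DCs currently announcing a prefix that covers `address`."""
        addr = ipaddress.ip_address(address)
        return sorted(
            dc for dc, nets in self.announced.items() if any(addr in n for n in nets)
        )


def fail_over(announcer: RouteAnnouncer, floating_ips: List[str], failed_dc: str, standby_dc: str) -> None:
    """Withdraw the failed DC's routes (models DC fencing), then announce from standby."""
    for fip in floating_ips:
        announcer.withdraw(failed_dc, fip)
        announcer.announce(standby_dc, fip)


if __name__ == "__main__":
    bgp = RouteAnnouncer()
    bgp.announce("DC1", "100.100.1.15")
    fail_over(bgp, ["100.100.1.15"], failed_dc="DC1", standby_dc="DC2")
    print(bgp.reachable_via("100.100.1.15"))
```

Withdrawing before announcing mirrors the fencing risk on the slide: if both DCs announced the same address at once, traffic could still reach the failed site.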

Page 11

Proposed OpenStack Enhancements

• Global and Local Compute HA management

Page 12

Thanks! We need to build together.

