1
High Availability for OPNFV
May 2015
2
Agenda
• Project introduction
• Scenarios discussion
• Requirement doc.
• Gap analysis
• Blue prints
• Open questions: Storage HA
3
Project Introduction
Fu Qiao
4
Project Detail
• Project page:
  – https://wiki.opnfv.org/high_availability_for_opnfv
• Weekly meeting:
  – Wednesdays, 13:00-14:00 UTC
  – https://wiki.opnfv.org/high_availability_project_meetings
• Mailing list:
  – opnfv-tech-discussion [availability]
• Participants:
  – Hui Deng
  – Ian Jolliffe
  – Maria Toeroe
  – Qiao Fu
  – Xue Yifei
  – Yuan Yue
  – Yao Cheng LIANG
  – Sean Winn
  – Joe Huang
  – Georg Kunz
  – Basavaprabhu Badami
  – …
5
Project progress
• The ongoing work on the first OPNFV release already includes some HA schemes, e.g. OpenStack HA and active/active or active/passive configurations of RabbitMQ and MySQL, as described in Section 5 of the requirement document.
• In this project, we further discuss the scenarios, framework, detailed requirements and API definitions for HA in the OPNFV platform.
• Project outputs:
  – Service HA scenario analysis
  – Requirement document
    » https://etherpad.opnfv.org/p/High_Availabiltiy_Requirement_for_OPNFV
  – Gap analysis of the OpenStack HA scheme
    » https://etherpad.opnfv.org/p/ha_gap_analysis
  – Blueprints:
    » https://etherpad.opnfv.org/p/Blue_Print_From_HA_project
    » https://blueprints.launchpad.net/keystone/+spec/keystone-ha-multisite
  – HA API description
6
Scenarios Discussion
Fu Qiao
7
Service Availability Levels (SAL) for Carrier Grade VNFs
SAL 1
  – Recovery time: e.g. 5-6 seconds
  – Customer type: network operator control traffic; government/regulatory emergency services
  – Recommendation: redundant resources to be made available on-site to ensure fast recovery.
SAL 2
  – Recovery time: e.g. 10-15 seconds
  – Customer type: enterprise and/or large-scale customers; network operators' service traffic
  – Recommendation: redundant resources to be available as a mix of on-site and off-site, as appropriate. On-site resources are used to recover real-time services; off-site resources are used to recover data services.
SAL 3
  – Recovery time: e.g. 20-25 seconds
  – Customer type: general consumer public and ISP traffic
  – Recommendation: redundant resources to be mostly available off-site. Real-time services should be recovered before data services.
Source: ETSI GS NFV-REL 001 V1.1.1
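To make the levels concrete, the sketch below checks a measured recovery time against per-SAL budgets. The budgets are an assumption: they take the upper end of each "e.g." range above and are illustrative, not normative. `SAL_BUDGET_S` and `best_sal` are hypothetical names.

```python
from typing import Optional

# Illustrative recovery-time budgets in seconds: the upper end of the
# "e.g." ranges on the SAL slide (assumption, not normative thresholds).
SAL_BUDGET_S = {1: 6.0, 2: 15.0, 3: 25.0}


def best_sal(recovery_time_s: float) -> Optional[int]:
    """Return the most demanding SAL whose example budget is met.

    Returns None if the recovery time exceeds every example budget.
    """
    for level in sorted(SAL_BUDGET_S):
        if recovery_time_s <= SAL_BUDGET_S[level]:
            return level
    return None


if __name__ == "__main__":
    for t in (4.2, 12.0, 30.0):
        print(t, best_sal(t))
```

A recovery time of 12 s, for example, would only qualify for SAL 2 or SAL 3 traffic under these example budgets.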
8
Scenarios
State       Redundancy in VNF   Failure detection   Use case
Stateful    yes                 VNF only            UC1
Stateful    yes                 VNF & NFVI          UC2
Stateful    no                  VNF only            UC3
Stateful    no                  VNF & NFVI          UC4
Stateless   yes                 VNF only            UC5
Stateless   yes                 VNF & NFVI          UC6
Stateless   no                  VNF only            UC7
Stateless   no                  VNF & NFVI          UC8
UC9: Repeated failure in VNF
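The table is a full combination of three binary dimensions, so the use case number can be computed from them. The function below is a hypothetical helper that simply encodes the table's row order; UC9 (repeated failure) lies outside this grid.

```python
def use_case(stateful: bool, redundant: bool, nfvi_detects: bool) -> str:
    """Map the three scenario dimensions to UC1-UC8 in the table's row order.

    nfvi_detects=False: failure detection by the VNF only.
    nfvi_detects=True:  failure detection by both the VNF and the NFVI.
    """
    index = 1
    if not stateful:
        index += 4  # stateless rows follow the four stateful rows
    if not redundant:
        index += 2  # within each state, "no redundancy" follows "yes"
    if nfvi_detects:
        index += 1  # "VNF & NFVI" follows "VNF only"
    return f"UC{index}"
```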
9
UC1: Stateful VNF with Redundancy
Failure detection: VNF only
[Figure: NFVO, VNFM and VIM above a VNF with an active (ACT) and a standby (STB) VNFC on VMs in the NFVI; the recovery time is marked]
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF repairs VNFC
Nothing new in this scenario.
*Steps 1 and 2 are simultaneous; they are separated for clarity.
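The VNF-internal steps of UC1 can be sketched as a small active/standby model. This is a minimal sketch under stated assumptions: detection is by heartbeat loss with a hypothetical `heartbeat_timeout_s`, the standby is assumed to already hold replicated state, and `Vnfc` and `RedundantVnf` are hypothetical names, not part of any OPNFV API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vnfc:
    name: str
    last_heartbeat: float = 0.0
    isolated: bool = False


@dataclass
class RedundantVnf:
    """Hypothetical VNF with one active and one standby VNFC (UC1)."""
    active: Vnfc
    standby: Vnfc
    heartbeat_timeout_s: float = 1.0  # assumed detection threshold
    log: List[str] = field(default_factory=list)

    def check(self, now: float) -> None:
        # Step 3: the VNF detects the failure through heartbeat loss.
        if now - self.active.last_heartbeat <= self.heartbeat_timeout_s:
            return
        failed = self.active
        self.log.append(f"detected failure of {failed.name}")
        # Step 4: isolate the failed VNFC so it cannot serve traffic.
        failed.isolated = True
        self.log.append(f"isolated {failed.name}")
        # Step 5: fail over; assumes state is already replicated to the standby.
        self.active, self.standby = self.standby, failed
        self.active.last_heartbeat = now
        self.log.append(f"failed over to {self.active.name}")

    def repair_standby(self, now: float) -> None:
        # Step 7: repair the failed VNFC and return it to service as standby.
        self.standby.isolated = False
        self.standby.last_heartbeat = now
        self.log.append(f"repaired {self.standby.name} as standby")
```

Usage: with `vnf = RedundantVnf(Vnfc("vnfc-a"), Vnfc("vnfc-b"))`, a call to `vnf.check(now=2.0)` without any heartbeat since t=0 makes `vnfc-b` active; service recovery (step 6) does not wait for `repair_standby`, which mirrors the slide's ordering.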
10
UC2: Stateful VNF with Redundancy
Failure detection: VNF & NFVI
[Figure: NFVO, VNFM and VIM above a VNF with an active (ACT) and a standby (STB) VNFC on VMs in the NFVI; the recovery time is marked]
1. VM fails
2. VM service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
5b. NFVI detects the failure
6a. VNF fails over
6b. NFVI reports to VIM
7a. NF recovers
7b. VIM reports to VNFM
8. VNFM sends OK to VIM
9. VIM repairs VM
10. VM service recovers
11. VNF repairs VNFC
*Steps 1-4 are simultaneous; they are separated for clarity.
11
UC3: Stateful VNF with No Redundancy
Failure detection: VNF only
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI, checkpointing its state to a virtual disk (VD); the recovery time is marked]
The VNFC checkpoints its state to the VD, which is itself HA.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF repairs VNFC
6. VNFC gets its state from the VD
7. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
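The checkpoint/restore pattern behind UC3 can be sketched as follows. This is a minimal sketch: the HA virtual disk is modeled as an in-memory key/value store and state is serialized as JSON; `VirtualDisk` and `StatefulVnfc` are hypothetical names.

```python
import json
from typing import Dict, Optional


class VirtualDisk:
    """Stand-in for the HA virtual disk (VD): an in-memory key/value store."""

    def __init__(self) -> None:
        self._blocks: Dict[str, str] = {}

    def write(self, key: str, data: str) -> None:
        self._blocks[key] = data

    def read(self, key: str) -> Optional[str]:
        return self._blocks.get(key)


class StatefulVnfc:
    def __init__(self, name: str, vd: VirtualDisk) -> None:
        self.name = name
        self.vd = vd
        self.state: Dict[str, int] = {}

    def checkpoint(self) -> None:
        # Normal operation: persist the current state to the VD.
        self.vd.write(self.name, json.dumps(self.state))

    def restore(self) -> None:
        # Step 6: after repair, the VNFC gets its state back from the VD.
        saved = self.vd.read(self.name)
        self.state = json.loads(saved) if saved is not None else {}
```

Any update made after the last checkpoint is lost when the VNFC fails, so the checkpoint interval bounds how much state a repaired VNFC can recover.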
12
UC4: Stateful VNF with No Redundancy
Failure detection: VNF & NFVI
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI, checkpointing its state to a virtual disk (VD); the recovery time is marked]
The VNFC checkpoints its state to the VD, which is itself HA.
1. VM fails
2. VM service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
5b. NFVI detects the failure
6a. VNF reports to VNFM
6b. NFVI reports to VIM
7. VIM reports to VNFM
8. VNFM sends OK to VIM
9. VIM repairs VM
10. VM service recovers
11. VIM informs VNFM
12. VNFM repairs VNFC
13. VNFC gets its state from the VD
14. NF recovers
*Steps 1-4 are simultaneous; they are separated for clarity.
13
High Availability Flow Chart
Service failure happens (caused by a failure of the VNF or of the NFVI)
→ Failure detection (by service heartbeat loss or an NFVI failure report)
→ Service is unavailable
Step 1 - Service recovery (time-constrained; a carrier grade VNF should be recovered within seconds):
  – Service failover
Step 2 - NFVI recovery or repair (less time-constrained):
  – VNF failure only: VNFC repair/restart
  – NFVI failure: VM recovery and VNFC recovery
  – Repeated failure (see UC9)
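The two-step ordering of the flow chart (restore the service first, repair the failed layer afterwards) can be sketched as a plan generator. This is a hypothetical illustration: `FailureKind` and `recovery_plan` are invented names, and the repeated-failure branch is left out (it is covered by UC9).

```python
from enum import Enum
from typing import List


class FailureKind(Enum):
    VNF_ONLY = "VNF failure only"
    NFVI = "NFVI failure"


def recovery_plan(kind: FailureKind) -> List[str]:
    """Order actions as in the flow chart: service recovery first, repair second."""
    plan = [
        "detect failure (heartbeat loss or NFVI report)",
        # Step 1: time-constrained; must complete within the SAL budget.
        "service failover",
    ]
    # Step 2: less time-constrained repair of the failed layer.
    if kind is FailureKind.VNF_ONLY:
        plan.append("VNFC repair/restart")
    else:
        plan.append("VM recovery and VNFC recovery")
    return plan
```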
14
Requirement doc & Gap Analysis doc.
Ian Jolliffe
15
Requirement Doc. Details
• Framework
  – The ultimate goal is to provide high availability of upper-layer services.
  – Service HA is provided by recovering the service (including service restart and failover) within seconds, in line with the SAL.
  – Repair or recovery of the failed layer should happen afterwards.
  – No failure in one layer should cause a cascading failure in other layers.
  – A layer can detect failures in other layers and help recover failed components.
[Figure: layered HA framework]
  – Service layer: Service
  – Application/VM layer: VNF/VNFC, VNFM
  – NFVI/VIM layer: NFVI, VIM
  – Hardware layer: Hardware
16
Requirement doc. outline
• 1 Overall Principle for High Availability in NFV
  – 1.1 Framework for High Availability in NFV
  – 1.2 Definitions
  – 1.3 Overall requirements
  – 1.4 Time requirement
• 2 Hardware HA
• 3 Virtualization Facilities (Host OS, Hypervisor)
• 4 Virtual Infrastructure HA – Requirements:
  – 4.1 Virtual Compute
  – 4.2 Virtual Storage
  – 4.3 Virtual Network
• 5 VIM High Availability
  – 5.1 Architecture requirement of VIM HA
  – 5.2 Fault detection and alarm requirement of VIM
  – 5.3 HA mechanism of VIM provided for NFV
  – 5.4 SDN controller
• 6 VNF HA
  – 6.1 Service Availability
  – 6.2 Service Continuity
• 7 Storage
17
Gap Analysis
At least 14 HA-related gaps have been identified, including:
Nova: 6 gaps, covering the scheduler, consoleauth and the health status of compute nodes.
Neutron: 2 gaps, covering the L3 agent and the DHCP agent.
Cinder: 2 gaps, covering HA configuration and multi-attach.
VIM NBI: 1 gap, for error reporting.
QoS: 1 gap, for QoS management.
References:
https://etherpad.openstack.org/p/kilo-crossproject-ha-integration
https://etherpad.openstack.org/p/kilo-summit-ops-ha
https://blueprints.launchpad.net/openstack
18
Blue Prints
Joe Huang
19
Escape from site-level KeyStone failure
[Figure: two sites; in each, an API request (Nova/Cinder/Neutron…) reaches the OpenStack service, whose KeyStoneMiddleware validates the token (Fernet, UUID) or retrieves the revocation list (PKI) from the local KeyStone (Site1 KeyStone, Site2 KeyStone)]
• Currently, only one KeyStone server can be configured for token validation or revocation-list retrieval.
• Proposal: allow a secondary KeyStone server to be configured, for use in case of a site-level KeyStone failure.
• Cons of DNS-based load balancing: delayed failover due to caching, and unpredictable routing.
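The proposed behavior (primary KeyStone first, secondary on site-level failure) can be sketched on the client side. This is not the keystonemiddleware API: `KeystoneUnavailable`, `validate_with_fallback` and the injected `validate` callable are hypothetical, standing in for whatever performs the actual token check.

```python
from typing import Callable, Optional, Sequence


class KeystoneUnavailable(Exception):
    """Raised by a (hypothetical) validator when its KeyStone endpoint is unreachable."""


def validate_with_fallback(
    token: str,
    endpoints: Sequence[str],
    validate: Callable[[str, str], bool],
) -> bool:
    """Try each configured KeyStone endpoint in order (primary first, then secondary).

    Returns the result from the first reachable endpoint; raises
    KeystoneUnavailable only if every endpoint is down.
    """
    last_error: Optional[KeystoneUnavailable] = None
    for endpoint in endpoints:
        try:
            return validate(endpoint, token)
        except KeystoneUnavailable as exc:
            last_error = exc  # site-level failure: move on to the next site
    raise KeystoneUnavailable("no KeyStone endpoint reachable") from last_error
```

Because the client switches endpoints as soon as a request fails, this approach avoids the cached-DNS failover delay listed above; an invalid token is still rejected rather than retried elsewhere.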
20
Open Questions: Storage HA
Georg Kunz
21
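Storage Architecture
[Figure: storage architecture. Hardware: hosts and a storage array with redundant controllers (Ctrl1, Ctrl2), interconnected by two switches. NFVI: a distributed storage layer across the hosts and the storage array, each offering block, file and object storage, plus a storage service component offering file (and object) storage. VIM and VNF/VNFC layers on top.]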
22
Storage HA – Network Failure
[Figure: storage architecture with a failed storage network link]
1. Storage network link fails
2. Storage network detects the failure
3. Storage network switches to standby link(s), e.g. via
   • iSCSI multipathing
   • bonding
4. Report failure to O&M
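The link switchover in steps 1-4 can be sketched as a simple path selector. This only models the active/standby behavior that iSCSI multipathing or NIC bonding provide; it is not their configuration or API, and `StoragePath` and `MultipathDevice` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoragePath:
    name: str
    healthy: bool = True


@dataclass
class MultipathDevice:
    """Hypothetical host-side view of one storage target reachable over several links."""
    paths: List[StoragePath]
    alarms: List[str] = field(default_factory=list)
    current: Optional[StoragePath] = None

    def __post_init__(self) -> None:
        self.current = self._first_healthy()

    def _first_healthy(self) -> Optional[StoragePath]:
        return next((p for p in self.paths if p.healthy), None)

    def link_down(self, name: str) -> None:
        # Steps 1-2: a link fails and the failure is detected.
        for p in self.paths:
            if p.name == name:
                p.healthy = False
        # Step 3: switch I/O to a standby link if the active one failed.
        if self.current is not None and not self.current.healthy:
            self.current = self._first_healthy()
        # Step 4: report the failure to O&M (modeled as an alarm list).
        self.alarms.append(f"storage link {name} down")
```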
23
Storage HA – Failure in Storage Array
[Figure: storage architecture with a failed component inside the storage array]
1. A component within the storage array fails
2. Array-internal failover kicks in, e.g.
   • RAID
   • Redundant controllers, NICs, …
3. Report failure to O&M
24
Storage HA – Host Failure
[Figure: storage architecture with a failed storage host]
1. Storage host fails
2. Distributed storage layer detects the failure
3. Distributed storage layer rebalances data
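Step 3 can be illustrated with a deliberately simple re-replication routine. This is a sketch under assumptions: each object's replicas are listed per host, and a lost replica is re-created on the least-loaded surviving host that does not already hold that object. Real distributed storage systems use their own placement logic; `rebalance` is a hypothetical name.

```python
from typing import Dict, List


def rebalance(placement: Dict[str, List[str]], failed_host: str,
              hosts: List[str]) -> Dict[str, List[str]]:
    """Re-create replicas lost on failed_host so each object keeps its replica count.

    placement maps object id -> hosts holding a replica. Each lost replica goes
    to the surviving host with the fewest replicas that lacks the object.
    """
    survivors = [h for h in hosts if h != failed_host]
    load = {h: 0 for h in survivors}
    for replicas in placement.values():
        for h in replicas:
            if h in load:
                load[h] += 1
    result: Dict[str, List[str]] = {}
    for obj, replicas in placement.items():
        kept = [h for h in replicas if h != failed_host]
        missing = len(replicas) - len(kept)
        for _ in range(missing):
            candidates = [h for h in survivors if h not in kept]
            if not candidates:
                break  # not enough hosts left to restore full redundancy
            target = min(candidates, key=lambda h: (load[h], h))
            kept.append(target)
            load[target] += 1
        result[obj] = kept
    return result
```

Until the copies are rebuilt, the affected objects run with reduced redundancy, and the rebuild traffic competes with normal I/O.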
25
Non-HA Block Storage (legacy)
• Mirroring of block devices at VNF level
[Figure: VNF with an active and a passive VNFC in the NFVI, whose block devices are mirrored]
26
HA Block Storage
• Active/passive configuration
  – Failover supervised by clustering software in the VNF
  – Requires multi-attach capability of Cinder
[Figure: VNF with active and standby VNFCs attached to the same block storage in the NFVI; VNFM and VIM shown]
27
HA Block Storage
• Active/active configuration
  – Clustered file system enables concurrent access
  – Requires multi-attach capability of Cinder
[Figure: VNF with two active VNFCs attached to the same block storage in the NFVI; VNFM and VIM shown]
28
VNF-level HA for Multiple Backends
• Block devices provided by multiple backends
  – Mirroring of block devices at VNF level
  – Proactive failover possible
[Figure: VNF with an active and a passive VNFC; their block devices on backend 1 and backend 2 are mirrored at VNF level; VNFM and VIM shown]
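VNF-level mirroring across two backends behaves like RAID-1 performed inside the VNF. The sketch below is a simplification under stated assumptions: devices are in-memory, a `failed` flag simulates a backend outage, and resynchronization after a backend returns is not modeled. `BlockDevice` and `Mirror` are hypothetical names.

```python
from typing import Dict, List


class BlockDevice:
    """In-memory stand-in for a block device; `failed` simulates a backend outage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}
        self.failed = False

    def write(self, lba: int, data: bytes) -> None:
        if self.failed:
            raise OSError(f"{self.name} unavailable")
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        if self.failed:
            raise OSError(f"{self.name} unavailable")
        return self.blocks[lba]


class Mirror:
    """RAID-1-style mirroring done inside the VNF, across independent backends."""

    def __init__(self, devices: List[BlockDevice]) -> None:
        self.devices = devices

    def write(self, lba: int, data: bytes) -> int:
        """Write to every reachable leg; return how many legs were written."""
        ok = 0
        for dev in self.devices:
            try:
                dev.write(lba, data)
                ok += 1
            except OSError:
                pass  # degraded mode: keep writing to the surviving legs
        if ok == 0:
            raise OSError("all mirror legs failed")
        return ok

    def read(self, lba: int) -> bytes:
        # Serve the read from the first leg that is still reachable.
        for dev in self.devices:
            try:
                return dev.read(lba)
            except OSError:
                continue
        raise OSError("all mirror legs failed")
```

Because the VNF itself sees which leg failed, it can switch over before the NFVI reports anything, which is one reading of the slide's "proactive failover".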
29
Open Questions
• Can the NFVI storage system provide a sufficient level of HA to meet the SALs?
  – Failover/recovery times depend heavily on the deployed solution
• How much does rebuilding data impact performance?
30
File Storage
• Legacy deployments
  – File storage service provided by a VNFC
  – Layered on top of block storage services
• NFVI
  – File storage service provided by the NFVI / hardware
  – OpenStack Manila
31
Ephemeral Storage
• Ephemeral storage
  – Main use: file systems of VMs booted from an image
  – Location:
    • On local disks of the compute host
      – Isolation of failure domains
        » VM is unaffected by a failure of the storage system
      – A disk failure corresponds to a host failure
      – Limits live-migration capabilities
    • On distributed or external storage
      – Correlated failures are possible
        » A failure of the storage backend impacts the VMs
      – Properties of the respective storage backend apply
32
Appendix
34
UC3-b: Stateful VNF with No Redundancy
Failure detection: VNF only
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI, checkpointing its state to a virtual disk (VD); the recovery time is marked]
The VNFC checkpoints its state to the VD, which is itself HA.
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNFM isolates VNFC
6. VNFM repairs VNFC
7. VNFC gets its state from the VD
8. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
36
UC5: Stateless VNF with Redundancy
Failure detection: VNF only
[Figure: NFVO, VNFM and VIM above a VNF with an active (ACT) and a spare VNFC on VMs in the NFVI; the recovery time is marked]
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF isolates VNFC
5. VNF fails over
6. NF recovers
7. VNF restores redundancy
The spare VNFC may or may not be instantiated.
Nothing new in this scenario.
*Steps 1 and 2 are simultaneous; they are separated for clarity.
37
UC6: Stateless VNF with Redundancy
Failure detection: VNF & NFVI
[Figure: NFVO, VNFM and VIM above a VNF with an active (ACT) and a spare VNFC on VMs in the NFVI; the recovery time is marked]
1. VM fails
2. VM service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
5b. NFVI detects the failure
6a. VNF fails over
6b. NFVI reports to VIM
7a. NF recovers
7b. VIM reports to VNFM
8. VNFM sends OK to VIM
9. VIM repairs VM
10. VM service recovers
11. VNF restores redundancy
The spare VNFC may or may not be instantiated.
*Steps 1-4 are simultaneous; they are separated for clarity.
38
UC7: Stateless VNF with No Redundancy
Failure detection: VNF only
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI; the recovery time is marked]
1. VNFC fails
2. NF fails*
3. VNF detects the failure
4. VNF reports to VNFM
5. VNF isolates VNFC
6. VNF repairs VNFC
7. NF recovers
*Steps 1 and 2 are simultaneous; they are separated for clarity.
39
UC8: Stateless VNF with No Redundancy
Failure detection: VNF & NFVI
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI; the recovery time is marked]
1. VM fails
2. VM service fails
3. VNFC fails
4. NF fails*
5a. VNF detects the failure
5b. NFVI detects the failure
6a. VNF reports to VNFM
6b. NFVI reports to VIM
7. VIM reports to VNFM
8. VNFM sends OK to VIM
9. VIM repairs VM
10. VM service recovers
11. VIM informs VNFM
12. VNF repairs VNFC
13. NF recovers
*Steps 1-4 are simultaneous; they are separated for clarity.
40
UC9: Stateless VNF with No Redundancy
Failure detection: VNF only, but repeatedly
[Figure: NFVO, VNFM and VIM above a VNF with a single active (ACT) VNFC on a VM in the NFVI, failing and being repaired over and over]
1. VNFC fails
2. NF fails
3. VNF detects the failure and counts it
4. VNF isolates VNFC
5. VNF repairs VNFC
6. NF recovers
Each cycle is handled as in UC7, but the VNFC fails again (2nd, 3rd, 4th time …).
Fault is not in the VNFC!
41
UC9: Stateless VNF with No Redundancy
Failure detection: VNF only, but repeatedly
[Figure: the repeatedly failing VNFC from the previous slide, now escalated to the VNFM and VIM]
1. VNFC fails
2. NF fails
3. VNF detects the failure and counts it
4. VNF isolates VNFC
5. VNF repairs VNFC
6. NF recovers
… VNFC fails (2nd, 3rd, 4th time) …
N. VNF reports to VNFM
N+1. VNFM reports to VIM
N+2. VIM isolates VM
N+3. VIM repairs VM
N+4. VM service recovers
N+5. VNF repairs VNFC
N+6. NF recovers
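The "detects the failure and counts" step of UC9 can be sketched as a sliding-window counter that switches from local repair to escalation. The threshold and window are assumptions (the slides show escalation after the 4th failure but give no time window); `RepeatedFailureDetector` is a hypothetical name.

```python
import collections
from typing import Deque


class RepeatedFailureDetector:
    """Count VNFC failures in a sliding window and escalate at a threshold.

    threshold and window_s are hypothetical tuning knobs; the slides only say
    that the VNF counts failures and eventually reports to the VNFM.
    """

    def __init__(self, threshold: int = 4, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.failures: Deque[float] = collections.deque()

    def on_failure(self, now: float) -> str:
        self.failures.append(now)
        # Forget failures that fall outside the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self.failures.clear()
            # Step N: the fault is probably not in the VNFC, so report to the
            # VNFM, which can ask the VIM to isolate and repair the VM.
            return "report to VNFM"
        # Steps 3-6: local handling, as in UC7.
        return "isolate and repair VNFC locally"
```

The window keeps rare, unrelated VNFC failures from being mistaken for an underlying NFVI fault.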
42
• Scenario chart
• Scenarios 1, 2, 5, 6
• Add all the scenarios as an appendix
• Should the NFVI provide an HA API to the VNF? OpenSAF is a PaaS; it actually acts as HA middleware.
• Stateful and stateless VNFs may require different schemes in the NFVI. If the VNF has no redundancy, we may need VM redundancy; in that case, however, the VNF-level problem may not be solved.