OpenStack Cluster Zero-Downtime Upgrade ft. Kolla

Duong Ha-Quang and Hieu LE
Fujitsu Vietnam Limited
2017 May 11

Copyright 2017 Fujitsu Vietnam Limited

Who are we?

Duong Ha-Quang, Software Engineer at Fujitsu Vietnam

- Core reviewer of Kolla
- Email: [email protected]
- IRC: duonghq

Hieu LE, Software Engineer at Fujitsu Vietnam

- Official Vietnam OpenStack UG organizer
- Email: [email protected]
- IRC: hieulq

Introduction

Some reviews of, and thoughts about, zero-downtime upgrades for OpenStack services.

The ideas in this presentation are only concepts.

A proof of concept (PoC) in Kolla.

Agenda

1. OpenStack Upgrading overview

   - OpenStack upgrade assertion tag
   - OpenStack rolling upgrade requirements

2. From minimal to zero

3. Zero downtime upgrade proposal in Kolla

   - Kolla support for configuration management
   - Kolla support for OSM
   - Proposal/Demo

OpenStack Upgrading

One of the most in-demand features for any system.

Cold upgrade is easy.

Service-level agreements are stricter nowadays, so we need to reduce system downtime.

Source: http://imgur.com/a/kZCrS

Review of current upgrade models

BlueGreen deployment and Canary release

Rolling upgrade

BlueGreen deployment

[Figure: User(s) -> Router -> Old or New environment]

https://martinfowler.com/bliki/BlueGreenDeployment.html

Canary release

[Figure: User(s) -> Router -> Old and New versions]

https://martinfowler.com/bliki/CanaryRelease.html

Rolling upgrade

Eliminates the need to restart all services on new code simultaneously.

Requires mixed-version services to work together properly mid-upgrade.

Some services may be down at any given time.
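As a rough illustration of the rolling pattern described above, the following Python sketch upgrades a set of services one at a time and records the mix of versions after each step. The service names A, B, C and the version labels are placeholders for illustration only, not taken from a real deployment.

```python
def rolling_upgrade(services, new_version):
    """Upgrade services one at a time; yield the version map after each step.

    `services` maps a service name to its current version. Only one service
    is touched per step, so the others keep serving on their old version.
    """
    state = dict(services)
    for name in sorted(state):
        state[name] = new_version  # restart just this service on new code
        yield dict(state)


# Hypothetical three-service cluster, all starting at version "X".
steps = list(rolling_upgrade({"A": "X", "B": "X", "C": "X"}, "X+1"))
for step in steps:
    mixed = len(set(step.values())) > 1
    print(step, "mixed versions" if mixed else "uniform")
```

Every intermediate step except the last has services on both versions, which is why mixed-version interoperability is a hard requirement.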

[Figure: services A(X), B(X) and C(X); upgraded services are labeled X + 1]

OpenStack rolling upgrade requirements

1. Online Schema Migration (OSM)

2. Maintenance Mode

3. Live Migration

4. Multi-version Interoperability

5. Graceful Shutdown

6. Upgrade Orchestration

7. Upgrade Gating

8. Project Tagging

https://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/rolling-upgrades.html

https://github.com/openstack/governance/blob/master/reference/projects.yaml

OpenStack upgrade assertion tag

The OpenStack Technical Committee (TC) defines five upgrade-related tags:

1. assert:supports-upgrade

2. assert:supports-accessible-upgrade

3. assert:supports-rolling-upgrade

4. assert:supports-zero-downtime-upgrade

5. assert:supports-zero-impact-upgrade

https://governance.openstack.org/tc/reference/tags/

From minimal downtime to zero-downtime upgrade

Zero-downtime upgrade requires enhancing:

Configuration management (CM)

Database migration (OSM)

[Figure: an old-version and a new-version service connected over RPC while the service upgrades the database. Labels on the configuration side: new config, deprecated config, removed config, RPC pinning.]

Icons made by Freepik, Google and Madebyoliver from www.flaticon.com, licensed by CC 3.0 BY.

Two main approaches for OSM

- Trigger-based
  - E.g. Keystone, Glance
  - Other: Facebook [1]

- Triggerless
  - E.g. Neutron
  - Binary log-based [2]

[1] https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/

[2] https://github.com/github/gh-ost
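To make the trigger-based approach more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not how Keystone or Glance implement their migrations; the table `users` and the columns `name` and `display_name` are made up for illustration. The idea shown: add the new column, install a trigger so that writes from old-version code also populate the new column, then backfill the rows that already existed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Old schema (hypothetical): services at version X write only `name`.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Expand: add the column that version X + 1 will read.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Trigger keeps the new column in sync for writes from old-version code.
conn.execute(
    """
    CREATE TRIGGER users_sync_insert AFTER INSERT ON users
    WHEN NEW.display_name IS NULL
    BEGIN
        UPDATE users SET display_name = NEW.name WHERE id = NEW.id;
    END
    """
)

# Backfill rows that existed before the trigger was installed.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# An old-version service keeps writing with the old column list.
conn.execute("INSERT INTO users (name) VALUES ('bob')")

rows = conn.execute("SELECT name, display_name FROM users ORDER BY id").fetchall()
print(rows)
```

Both the pre-existing row and the row written by old-version code end up with the new column populated, so old and new services can share the table during the upgrade.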

Zero downtime upgrade proposal

Database online schema migration: 2 candidate solutions

1. Buffer requests during the upgrade period.

2. Use a checkpoint/snapshot and the binary log of the database.

Zero downtime upgrade proposal (1)

Buffering HTTP and RPC requests in upgrade period.

[Figure: three steps.
Step 0: HA/LB stack -> Service (ver X) -> Database (ver X).
Step 1 (upgrading): HA/LB stack -> Requests buffer; Service (X -> X + 1) and Database (X -> X + 1) are unreachable.
Step 2: HA/LB stack -> Requests buffer -> Service (X + 1) -> Database (X + 1); requests are resent in their original order.]

Zero downtime upgrade proposal (1)

Buffering requests during the upgrade period

Two types of requests need buffering:

- RESTful HTTP requests from users and between projects.
- Internal service RPC requests (through the MQ).

Requests must be put in the buffer in the order received so that they can be replayed correctly (ideally with a timestamp).
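The ordering requirement can be sketched as a small timestamp-ordered buffer. This is only a toy model in Python, not the Intermission/OpenResty mechanism; the class name `RequestBuffer`, its methods, and the sample requests are all hypothetical.

```python
import heapq
import itertools


class RequestBuffer:
    """Hold requests during an upgrade and replay them in arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def put(self, timestamp, request):
        heapq.heappush(self._heap, (timestamp, next(self._seq), request))

    def replay(self, handler):
        """Pop every buffered request, oldest first, and pass it to `handler`."""
        results = []
        while self._heap:
            _, _, request = heapq.heappop(self._heap)
            results.append(handler(request))
        return results


buf = RequestBuffer()
# Requests may reach the buffer out of order (e.g. HTTP vs. RPC paths).
buf.put(3.0, "DELETE /networks/1")
buf.put(1.0, "POST /networks")
buf.put(2.0, "rpc: port_update")

replayed = buf.replay(lambda req: req)
print(replayed)
```

Keying on the timestamp rather than on insertion order is what lets HTTP and RPC requests, which arrive over different paths, be merged back into a single replay sequence.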

Zero downtime upgrade proposal (1)

Buffering requests during the upgrade period

Pros:

- From the user’s POV: no perceivable service downtime if the migration time is short enough.

Cons:

- From the user’s POV: the system lags while requests are queued in the buffer.
- If the migration takes long (mainly the database migration), some requests can time out [1].
- The buffer can grow very large if the database migration takes long.

[1] https://blueprints.launchpad.net/keystone/+spec/allow-expired

Next idea

Zero downtime upgrade proposal (2)

Utilize checkpoint/snapshot and binary log of database.

[Figure: database timeline. Labels: create checkpoint; turn on binary log; data before the checkpoint (X); new data (Y), recorded by the binary log; migrate X to the next release (X in new schema) while new data (Y) is still in the old schema; shut down database-related services; turn off binary log; migrate Y using the binary log (database in new schema); start new version and bring up system. The downtime window is annotated "can be eliminated with previous approach".]

Zero downtime upgrade proposal (2)

Utilize checkpoint/snapshot and binary log of database.

Pros:

- Internal system downtime is much smaller than with the previous approach (downtime only for the delta change rather than the whole database).

Cons:

- From the user’s POV: there is a short downtime.
- Implementation is more complicated than the previous approach.
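The checkpoint-plus-log idea can be sketched with plain Python dictionaries. This is a toy model under stated assumptions: a real binary log records statements or row events rather than whole rows, and the schema change here (renaming `name` to `display_name`) is invented for illustration.

```python
import copy


def migrate_row(row):
    """Hypothetical schema change: rename `name` to `display_name`."""
    return {"id": row["id"], "display_name": row["name"]}


# Live database in the old schema (data X).
old_db = {1: {"id": 1, "name": "net-a"}, 2: {"id": 2, "name": "net-b"}}

# 1. Create a checkpoint and start recording changes (the "binary log").
checkpoint = copy.deepcopy(old_db)
binlog = []


def write_old(row):
    """Old-version service keeps writing; each write is also logged."""
    old_db[row["id"]] = row
    binlog.append(row)


# 2. Migrate the checkpoint while the old system stays online.
new_db = {key: migrate_row(row) for key, row in checkpoint.items()}
write_old({"id": 3, "name": "net-c"})          # new data Y, old schema
write_old({"id": 1, "name": "net-a-renamed"})  # update to an existing row

# 3. Short downtime: stop writers, then replay only the delta.
for row in binlog:
    new_db[row["id"]] = migrate_row(row)

print(new_db)
```

Only the two logged writes are processed during the downtime window, no matter how large the checkpointed data set is, which is the advantage claimed above.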

Zero downtime upgrade proposal

The two candidate methods can be combined to get the advantages of both:

- Use the 2nd approach, but add a buffer layer and turn it on while the database is shut down.

→ Zero downtime from the user’s POV, only a bit of lag.
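A rough sketch of the combined sequence, again as a toy Python model: the function name `combined_upgrade`, its parameters, and the key/value "database" are hypothetical, and the schema change itself is omitted so that only the ordering of the steps is visible.

```python
from collections import deque


def combined_upgrade(snapshot, writes_during_migration, requests_during_cutover):
    """Return (new_db, log) for the combined flow; all inputs are toy data."""
    log = []
    binlog = []

    log.append("create checkpoint, turn on binary log")
    new_db = dict(snapshot)  # stands in for "migrate current db to new schema"
    for key, value in writes_during_migration:
        binlog.append((key, value))  # old system still serves these writes
    log.append("migrate current db to new schema")

    log.append("turn on buffer")
    buffer = deque(requests_during_cutover)  # held, not rejected

    for key, value in binlog:  # only the delta is migrated while buffering
        new_db[key] = value
    log.append("migrate new data to new database")
    log.append("turn off binary log")

    while buffer:  # replay in the original order against the new version
        key, value = buffer.popleft()
        new_db[key] = value
    log.append("finish")
    return new_db, log


db, steps = combined_upgrade(
    snapshot={"a": 1},
    writes_during_migration=[("b", 2)],
    requests_during_cutover=[("c", 3), ("a", 10)],
)
print(db)
print(steps)
```

The buffer only has to cover the delta replay, so it stays small compared with buffering for the whole database migration.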

[Figure: create checkpoint and binary log -> migrate current db to new schema -> turn on buffer -> migrate new data to new database -> turn off binary log -> finish. The steps are annotated as coming from Idea 2 (checkpoint/binary log) and Idea 1 (buffer).]

PoC in Kolla

Kolla’s mission is to provide production-ready containers and deployment tools for operating OpenStack clouds.

Three official deliverables:

- kolla(-image) -> Docker images
- kolla-ansible -> Deploy OpenStack using Ansible
- kolla-kubernetes -> Deploy OpenStack inside k8s cluster

Kolla support for configuration management

Kolla-Ansible has implemented a mechanism for configuration management: configuration overrides [1].

Kolla-Kubernetes shows good potential for automating CM.

[1] http://docs.openstack.org/developer/kolla/advanced-configuration.html

Kolla support for OSM (1)

For OpenStack projects with native OSM support: patches for Neutron and Keystone OSM are in progress.

https://blueprints.launchpad.net/kolla-ansible/+spec/apply-service-upgrade-procedure

Kolla support for OSM (2)

For projects without OSM support:

Implement the above ideas at the HA/LB layer.

Request buffer: Intermission/OpenResty

PoC scenario for 1st approach

PoC for HTTP request buffering

Intermission/OpenResty configuration:

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

Intermission is bound to VIP:5000 and VIP:35357; HAProxy: 5000 -> 5050, 35357 -> 35387.

Scenario:

- Continuously send 200 create and delete network requests to an Ocata cluster.
- Upgrade Neutron to master-based code while the requests are being sent.

Demonstration

Scripts used: https://github.com/vietstacker/zero-downtime-upgrade-scenario

Rolling upgrade with Kolla (downtime occurs here): https://www.youtube.com/watch?v=CfCBLeV1kIM

PoC of request buffering with Kolla: https://www.youtube.com/watch?v=6UDQXDINw84
