Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.

Keisuke Suzuki, Software engineer

How to Upgrade Major Version of Your Production PostgreSQL

2018/12/12 PGConf.ASIA

Who am I?

• Keisuke Suzuki: Backend Engineer @ Treasure Data K.K.
  – PlazmaDB: distributed storage
  – Datatank: data mart

Both use PostgreSQL internally

• Interest: DB / Distributed system / Performance optimization

• Twitter: @yajilobee

PostgreSQL Versions

• Major version: released yearly, with new features
  – On-disk data is not compatible between different major versions

• Minor version: released at least every 3 months, with bug and security fixes
  – On-disk data is compatible as long as the major version is the same

e.g., 9.6.11 = major 9.6, minor 11; 10.6 = major 10, minor 6
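The numbering rule above can be sketched in Python. These helpers are hypothetical illustrations, not part of any PostgreSQL tool; they encode the two-part major version used before 10 (9.6) and the single-number major version from 10 onward.

```python
def split_pg_version(version: str) -> tuple[str, int]:
    """Split a PostgreSQL version string into (major, minor)."""
    parts = version.split(".")
    if int(parts[0]) >= 10:
        # 10 and later: the first number alone is the major version.
        major, rest = parts[0], parts[1:]
    else:
        # Before 10: the first two numbers form the major version.
        major, rest = ".".join(parts[:2]), parts[2:]
    # A bare major version such as "10" or "9.6" is treated as minor 0.
    minor = int(rest[0]) if rest else 0
    return major, minor


def needs_major_upgrade(current: str, target: str) -> bool:
    """True if the major versions differ, so on-disk data is not compatible."""
    return split_pg_version(current)[0] != split_pg_version(target)[0]


print(split_pg_version("9.6.11"))  # ('9.6', 11)
print(split_pg_version("10.6"))    # ('10', 6)
print(needs_major_upgrade("9.6.11", "10.6"))  # True
```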

EOL of Major Versions

https://www.postgresql.org/support/versioning/

The PostgreSQL Global Development Group supports a major version for 5 years after its initial release.
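The five-year rule gives a quick planning estimate of when a version goes out of support. A minimal Python sketch, assuming EOL falls exactly five years after the initial release date; the official dates on the versioning page are authoritative, and `approx_eol` is a hypothetical helper name.

```python
from datetime import date

SUPPORT_YEARS = 5  # support window from the versioning policy


def approx_eol(initial_release: date) -> date:
    """Rough end-of-life date: the initial release date plus five years."""
    try:
        return initial_release.replace(year=initial_release.year + SUPPORT_YEARS)
    except ValueError:
        # Feb 29 has no counterpart five years later; fall back to Feb 28.
        return initial_release.replace(year=initial_release.year + SUPPORT_YEARS, day=28)


print(approx_eol(date(2013, 9, 9)))  # 2018-09-09
```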

Q: Should we follow the latest Major Version?

A: It depends on the case.
-> Upgrading requires downtime and may cause incompatibility

• Extended support is provided by vendors
  – For mission-critical systems

• SaaS vendors may stop providing old major versions
  – e.g., 9.3 retirement on Amazon RDS
    ✓ New deployments stopped: Aug. 2018
    ✓ Forced major version upgrade: Nov. 2018

Our Version

We were on PostgreSQL 9.3 until May 2018...

PostgreSQL Usage in Treasure Data

Arm Treasure Data eCDP

PlazmaDB: Core Storage Layer of TD

Streaming data / Bulk load -> PlazmaDB
  – Metadata: PostgreSQL
  – Data: AWS S3 / Riak CS

Daily Workload & Storage Size of PlazmaDB

• Import: 500 billion records / day (~5.8 million records / sec)
• Analytical query: 600,000 queries / day, 15 trillion records / day
• Storage size: 5 PB (+5~10 TB / day), 55 trillion records
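The per-second import rate follows from the daily figure. A quick Python check, assuming a uniform rate over 24 hours (a simplification, since real traffic is burstier):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

records_per_day = 500e9  # import volume from the slide
records_per_sec = records_per_day / SECONDS_PER_DAY
print(f"{records_per_sec / 1e6:.1f} million records/sec")  # 5.8 million records/sec
```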

Server Structure of Meta DB (AWS RDS)

• Master: r3.8xlarge (32 vcores, 244GB RAM), Multi-AZ
  – Connections from applications

• Read replica (asynchronous streaming replication): m3.large (2 vcores, 7.5GB RAM)
  – No production workload

Data Volume

PlazmaDB
  – Meta DB (PostgreSQL), 1 TB: partition metadata for Realtime Storage and Archive Storage (GiST indexes)
  – AWS S3 / Riak CS, 5 PB: Realtime Storage and Archive Storage data

CPU & IO utilization of PostgreSQL (Meta DB)

CPU: AVG 13%, MAX 25%

IOPS: AVG 700 read / 1100 write, MAX 2500 read / 3000 write

TPS & WAL Throughput

TPS: AVG 1.2k, MAX 1.5k

WAL throughput: AVG 4 MB/sec, MAX 20 MB/sec

If PlazmaDB were down..

Streaming data / Bulk load -> PlazmaDB
  – Metadata: PostgreSQL
  – Data: AWS S3 / Riak CS

Planning Major Version Upgrade

Long Way to Complete Upgrade

• Choose upgrade strategy

• Choose the next major version

• Plan operations and evaluation (including rollback operations)

• Regression Test on the new major version

Major Version Upgrade Strategies

• pg_dump/pg_dumpall: Old ver -> dump -> SQL -> restore -> New ver

• pg_upgrade: Old ver -> convert data -> New ver

• Logical replication: Old ver -> log shipping -> New ver

Upgrade with pg_dump/pg_dumpall

1. Stop DB
2. pg_dump/pg_dumpall: dump the old version to SQL
3. psql / pg_restore: restore the SQL into the new version
4. Start DB

Downtime: steps 1 to 4, including a full data copy
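Steps 2 and 3 can be chained so the dump never touches disk. A minimal Python sketch, assuming both servers are reachable on one host at hypothetical ports 5432 (old) and 5433 (new), authentication is handled by libpq (for example a ~/.pgpass file), and applications were already stopped in step 1. The PostgreSQL documentation recommends running the newer version's pg_dumpall against the old server.

```python
import subprocess


def dump_restore_commands(old_port: int, new_port: int,
                          host: str = "localhost", user: str = "postgres"):
    """Build the pg_dumpall (old server) and psql (new server) command lines."""
    dump = ["pg_dumpall", "--host", host, "--port", str(old_port),
            "--username", user]
    restore = ["psql", "--host", host, "--port", str(new_port),
               "--username", user, "--dbname", "postgres",
               # Stop at the first SQL error instead of continuing past it.
               "--set", "ON_ERROR_STOP=1"]
    return dump, restore


def run_dump_restore(dump: list[str], restore: list[str]) -> None:
    """Pipe the SQL dump straight into psql (steps 2 and 3)."""
    with subprocess.Popen(dump, stdout=subprocess.PIPE) as dumper:
        # psql reads the SQL script from its standard input.
        subprocess.run(restore, stdin=dumper.stdout, check=True)
    # Leaving the with-block waits for pg_dumpall; report its failure too.
    if dumper.returncode != 0:
        raise subprocess.CalledProcessError(dumper.returncode, dump)
```

Streaming avoids an intermediate file, but the downtime still scales with the full data size, which is the drawback named above.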

Upgrade with pg_upgrade (Link Mode)

1. Stop DB
2. pg_upgrade
   2.1. Rebuild the system tables for the new version
   2.2. Create hard links to the old version's user data files
3. Start DB

Downtime: steps 1 to 3, with no copy of user data
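Outside a managed service, the link-mode steps above map onto a single pg_upgrade invocation. A sketch that only builds the command line; `pg_upgrade_command` is a hypothetical helper, and the four directory arguments are placeholders for your installation's bin and data directories.

```python
def pg_upgrade_command(old_bindir: str, new_bindir: str,
                       old_datadir: str, new_datadir: str,
                       check_only: bool = False) -> list[str]:
    """Build a pg_upgrade command line that uses link mode."""
    cmd = [
        "pg_upgrade",
        "--old-bindir", old_bindir,
        "--new-bindir", new_bindir,
        "--old-datadir", old_datadir,
        "--new-datadir", new_datadir,
        # Hard-link user data files into the new cluster instead of copying them.
        "--link",
    ]
    if check_only:
        # Only verify that the clusters are compatible; no data is changed.
        cmd.append("--check")
    return cmd
```

Run the `--check` variant first (for example via `subprocess.run(cmd, check=True)`), then the real upgrade, as the database OS user with both clusters stopped and the new cluster already initialized by the new version's initdb. In link mode the two clusters share data files, so once the new cluster has been started the old one can no longer be used safely; a separate rollback path is needed.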

Upgrade with Logical Replication

Old ver -> New ver (logical replication), with applications in front

1. Start replication and wait for synchronization
2. Stop connections
3. Start connections

Downtime: steps 2 to 3 only; the DBs don't stop during the upgrade

Downtime

pg_dump/pg_dumpall >> pg_upgrade > Logical Replication

Downtime comes from...
  – pg_dump/pg_dumpall: full data copy -> not acceptable
  – pg_upgrade: rebuilding system tables -> how long?
  – Logical replication: switching servers -> how long?

Downtime: pg_upgrade in RDS

• Operation
  – Upgrade the major version via the modify-db-instance API
  – RDS performs pg_upgrade in link mode internally

• Downtime -> 7-8 mins

• Note
  – Downtime starts several minutes after invoking the API; the exact time cannot be specified
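The modify-db-instance call can also be issued from code. A minimal sketch using boto3's `modify_db_instance`, with a hypothetical instance identifier and target version; which target versions are valid depends on what RDS offers at the time.

```python
def major_upgrade_request(instance_id: str, target_version: str,
                          apply_immediately: bool = False) -> dict:
    """Keyword arguments for the RDS ModifyDBInstance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "EngineVersion": target_version,
        # Required by RDS whenever EngineVersion is a different major version.
        "AllowMajorVersionUpgrade": True,
        # False defers the change to the instance's next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def start_major_upgrade(instance_id: str, target_version: str,
                        apply_immediately: bool = False) -> dict:
    import boto3  # imported lazily so the request builder has no dependency

    rds = boto3.client("rds")
    return rds.modify_db_instance(
        **major_upgrade_request(instance_id, target_version, apply_immediately))
```

Separating the request builder from the API call makes the parameters easy to review or test before anything touches production.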

Upgrade of Replicas

Master: Old ver -> New ver; Replica: Old ver -> New ver

• Downtime will be longer if replicas must also be upgraded
  1. Stop DBs
  2. Upgrade master
  3. Upgrade slave
  4. Start DBs

Logical Replication: How to switch servers?

• Update application configuration
  – Distribute the new DB configuration to every app
  – Release the new DB only after all connections to the old DB are closed, to avoid write conflicts

• Overwrite the DB endpoint (IP / DNS / proxy etc.)
  – Apps keep using the same endpoint; only 1 point is updated

Downtime: Logical Replication

• Operation
  – Change DB endpoints (DNS) via the modify-db-instance API
    1. plazma.***.com (old ver) -> plazma-old.***.com
    2. plazma-new.***.com (new ver) -> plazma.***.com

• Downtime -> 1-2 mins
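On RDS an instance's DNS endpoint is derived from its instance identifier, so the two renames above can be expressed as two `modify_db_instance` calls with `NewDBInstanceIdentifier`. A sketch under that assumption; the identifiers `plazma`, `plazma-new` and `plazma-old` are hypothetical stand-ins for the elided hostnames, and `wait_until_renamed` is a hypothetical caller-supplied callback (for example, one that polls `describe_db_instances`).

```python
def endpoint_swap_requests(live_id: str, new_id: str, retired_id: str) -> list[dict]:
    """Two ModifyDBInstance requests that move the live name to the new DB."""
    return [
        # 1. Free the live name: rename the old-version instance.
        {"DBInstanceIdentifier": live_id,
         "NewDBInstanceIdentifier": retired_id,
         "ApplyImmediately": True},
        # 2. Give the live name to the new-version instance.
        {"DBInstanceIdentifier": new_id,
         "NewDBInstanceIdentifier": live_id,
         "ApplyImmediately": True},
    ]


def swap_endpoints(requests: list[dict], wait_until_renamed) -> None:
    import boto3  # imported lazily so the request builder has no dependency

    rds = boto3.client("rds")
    for request in requests:
        rds.modify_db_instance(**request)
        # The second rename needs the name freed by the first one,
        # so block until this rename has finished.
        wait_until_renamed(request["NewDBInstanceIdentifier"])
```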

Feasibility of Each Strategy

pg_upgrade
• Pros
  – Easier operation, especially in AWS RDS
• Cons
  – Longer downtime
  – Only one major version can be upgraded at a time (AWS RDS)

Logical Replication
• Pros
  – Shorter downtime
• Cons
  – PostgreSQL 9.3 doesn't have native logical replication

Logical Replication on PostgreSQL

• 9.3 and earlier: third-party tools based on triggers
  – Bucardo, Londiste, Slony
  – Write amplification happens because deltas must be recorded
  – For AWS RDS, a replication server is required outside of the DB servers

• 9.4 - 9.6: third-party tools based on logical decoding
  – pglogical, AWS Database Migration Service (DMS)
  – DMS provides a managed replication server

• 10 and later: native logical replication
  – No external process is required

Our Plan

• Step 1: 9.3 -> 9.4 by pg_upgrade
  – Simple operation and manageable downtime
  – Done in May 2018

• Step 2: 9.4 -> latest by DMS
  – Less downtime, and a managed service is available
  – Coming soon..

Things to Consider on Upgrade

How to handle downtime?

About 8 minutes of downtime is expected.

Endpoint -> Queue -> PlazmaDB Worker, for Import and Query requests (sync and async paths, each with retry)
  – The queue has capacity for 8 mins
  – Requests were delayed, but no error was returned
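The slide does not show the retry logic itself. As a generic illustration of how a client can ride out a few minutes of DB downtime without surfacing errors, here is a minimal Python sketch of retry with capped exponential backoff and an overall time budget. `call_with_retry` is a hypothetical name, `ConnectionError` stands in for whatever the real DB driver raises, and the 8-minute default mirrors the expected downtime.

```python
import time


def call_with_retry(fn, budget_seconds=8 * 60, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fn() until it succeeds or the retry budget is exhausted."""
    deadline = clock() + budget_seconds
    delay = base_delay
    while True:
        try:
            return fn()
        except ConnectionError:
            if clock() + delay > deadline:
                raise  # budget exhausted: surface the error to the caller
            sleep(delay)
            # Exponential backoff, capped so retries keep probing the DB.
            delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` keeps the function testable without actually waiting.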

Impact of Cold Start

• Shared buffers are empty after booting the DB
  – In our env, shared buffers = 160GB
  – PostgreSQL page size = 8kB
  – Pages to load into the buffer = 160GB / 8kB = 20M pages

• The number of IOs issued depends on several factors
  – Number of queries
  – Number of pages read by each query
  – Locality
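The page count on the slide uses decimal units; with binary units it comes out slightly higher. A short Python calculation, extended with a hypothetical 10k IOPS budget to show how long a fully cold buffer would take to fill if every page needed its own read. In practice only the hot data set is read back, so this is an upper-end illustration rather than a prediction.

```python
GiB = 1024 ** 3
KiB = 1024

shared_buffers = 160 * GiB
page_size = 8 * KiB  # PostgreSQL's default block size
pages = shared_buffers // page_size
print(pages)  # 20971520, i.e. ~20M pages

# Time to read every page once at a fixed IOPS budget, assuming one 8 kB
# page per I/O and no help from the OS page cache.
iops = 10_000
print(f"{pages / iops / 60:.0f} minutes")  # 35 minutes
```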

Bonus of pg_upgrade: Page Cache is Hot

• The OS page cache survives the upgrade
  – The binary format of user data files is compatible, so the new version reuses the same files through hard links; only the system tables are rebuilt
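Why the cache stays hot can be seen directly: on Linux the page cache belongs to a file's inode, not to its path, and a hard link is just a second name for the same inode. A small Python demonstration with a throwaway file standing in for a relation data file (the file names are hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    old_file = os.path.join(workdir, "old_16384")  # hypothetical relation file
    new_file = os.path.join(workdir, "new_16384")
    with open(old_file, "wb") as f:
        f.write(b"\0" * 8192)  # one 8 kB page
    os.link(old_file, new_file)  # link mode does this for each user data file
    same_inode = os.stat(old_file).st_ino == os.stat(new_file).st_ino
    link_count = os.stat(new_file).st_nlink

print(same_inode, link_count)  # True 2
```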

Mitigate Impact of Cold Start

• Set RDS Provisioned IOPS to 10k
  – Our major SELECT workload scans ~10k pages/sec
  – RAM = 244GB, shared buffers = 160GB -> page cache ~ 80GB
    ✓ The page cache is ~50% of the buffer size, so ~50% of buffer misses should hit it -> expected IOPS ~ 5k
    ✓ +5k just in case

• The page cache isn't hot if you switch to different servers
  – Use pg_prewarm to load data back into the cache
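The sizing above reduces to a few lines of arithmetic. A Python restatement using the slide's numbers; treating the page-cache-to-buffer size ratio as the page cache hit rate is the slide's simplifying assumption, not a general rule.

```python
scan_pages_per_sec = 10_000  # major SELECT workload, pages read per second
shared_buffers_gb = 160
page_cache_gb = 80           # roughly what is left of 244 GB RAM for the OS cache

# Slide's assumption: the page cache holds about half as much as shared
# buffers, so about half of the buffer misses are served from it, not disk.
page_cache_hit = page_cache_gb / shared_buffers_gb
expected_iops = scan_pages_per_sec * (1 - page_cache_hit)

safety_margin = 5_000
provisioned_iops = expected_iops + safety_margin
print(expected_iops, provisioned_iops)  # 5000.0 10000.0
```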

Rollback Plan

1. Promote replica: 1-2 mins
   – Rename plazma.***.com (master) -> plazma-old.***.com
   – Rename plazma-rr.***.com (replica) -> plazma.***.com
   – Promote the replica

2. Restore from backup (if step 1 doesn't work): 30-60 mins

Evaluate New Major Version

Importance of Evaluation

• Rollback is the worst case
  – Longer downtime
  – Data created after the upgrade must be recovered manually

• Detect degradation by testing
  – Interface incompatibility
    ✓ System test on the staging environment
  – Performance degradation
    ✓ Production != staging in terms of server resources

Real-World Applications Are Complicated..

Many performance-related factors:
• Type of workload
• Users' behaviour
  – # of import requests / analytical query requests
  – Data skew
• Metadata storage size
• Server size (CPU cores, RAM, storage, network)
• etc.

-> Model the workload and define target performance

Use Metrics for Modeling Workload

• Metrics collectors
  – Arm Treasure Data: detailed analysis
  – Datadog: visualization

• Metrics
  – CPU / IOPS
  – Table size
  – # of queries / query types
  – Cache hit ratio
  – SEL / INS / DEL / UPD rows
  – … 50+ metrics
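Several of these metrics come straight from PostgreSQL's statistics views. A minimal sketch of a cache hit ratio computed from `pg_stat_database`; the helper names are hypothetical, and the talk does not show how its collectors gathered the numbers. `blks_hit` counts only PostgreSQL buffer hits, so a "read" here may still be served by the OS page cache.

```python
# Cumulative per-database counters; sample twice and diff to get a rate.
STATS_SQL = """
SELECT blks_hit, blks_read,
       tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = current_database();
"""


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Share of block requests served from shared buffers."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0


def interval_hit_ratio(before: dict, after: dict) -> float:
    """Hit ratio between two samples of the cumulative counters."""
    return cache_hit_ratio(after["blks_hit"] - before["blks_hit"],
                           after["blks_read"] - before["blks_read"])
```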

Created Benchmark Tool based on Metrics

• Enables production-scale performance measurement without full system integration
  – Iteratively refined the benchmark workload by adding missing parameters
    ● Number of queries / types of queries
    ● DB size
    ● Data access locality
    ● Data skew

• Goal of measurement
  – Check whether pg 9.4 can handle the current production workload (+α)
  – Detect differences in metric behavior

Workload of DB

Workloads on the Plazma Meta DB:
  – Streaming Import, BulkLoad, Merge, GC
  – Presto Select, Presto Insert / Delete
  – Hive Select, Hive Insert

• 80% of the query workload: SELECT, 10k rows/sec
• 93% of the import workload: INSERT, 1k rows/sec

Selectivity

• Randomly sample selectivity from the actual selectivity distribution
  e.g.,
  – 40% of queries: sl = 1
  – 5% of queries: sl in [0.01, 0.5]
  – 5% of queries: sl in [0.5, 0.99]

(Figure: selectivity vs. percentile of total # of queries)
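Sampling selectivity this way is straightforward to sketch. The following hypothetical Python helper draws a selectivity for each benchmark query from weighted buckets; it uses only the three example buckets shown, renormalized, whereas a real benchmark would cover the full measured distribution.

```python
import random

# (relative weight, low, high): the three example buckets from the slide.
# The remaining share of queries is not given there, so it is left out.
EXAMPLE_BUCKETS = [
    (40, 1.0, 1.0),
    (5, 0.01, 0.5),
    (5, 0.5, 0.99),
]


def sample_selectivity(rng: random.Random, buckets=EXAMPLE_BUCKETS) -> float:
    """Draw one query selectivity from a bucketed distribution."""
    weights = [weight for weight, _, _ in buckets]
    # random.choices treats weights as relative, so they need not sum to 100.
    _, low, high = rng.choices(buckets, weights=weights, k=1)[0]
    return low if low == high else rng.uniform(low, high)
```

Passing an explicit `random.Random` with a fixed seed makes benchmark runs reproducible.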

Data Access Locality

• Metadata size = 1TB
• Shared buffer size = 160GB
• But the hot data set is smaller than the shared buffer
  e.g.,
  – 85% of the workload comes from 1% of data sets
  – 95% of the workload comes from 5% of data sets

(Figure: # of partitions read / day vs. # of reads requested on each data set / day)
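The locality above can be reproduced in a benchmark by skewing which data set each request touches. A minimal Python sketch of a hypothetical tiered sampler that follows the slide's example (85% of requests to the hottest 1%, 95% to the hottest 5%); the exact tiers in the real tool are not shown.

```python
import random


def sample_dataset(rng: random.Random, n_datasets: int) -> int:
    """Pick a data set index; index 0 is the hottest, n_datasets - 1 the coldest."""
    if n_datasets < 100:
        raise ValueError("need at least 100 data sets for 1% / 5% tiers")
    hot1 = n_datasets // 100  # hottest 1%
    hot5 = n_datasets // 20   # hottest 5%
    r = rng.random()
    if r < 0.85:
        return rng.randrange(0, hot1)     # 85% of requests
    if r < 0.95:
        return rng.randrange(hot1, hot5)  # next 10%, so 95% within the top 5%
    return rng.randrange(hot5, n_datasets)  # remaining 5% over the cold tail
```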

Result of Performance Measurement

No performance degradation was observed

Operation in Production

Final Upgrade Plan

• Preparation
  – Increase Provisioned IOPS to 10k for the cold start
  – Scale up the replica server to the same size as the master, for rollback

• Operation including downtime
  – Invoke pg_upgrade via the modify-db-instance API in a scheduled maintenance window
    ✓ ~8 mins of downtime starts several minutes after invoking the API
    ✓ Expected read IOPS after the upgrade ~5k

Actual Impact

Requests were delayed by 9 minutes (~ DB downtime)

Request Queue Depth

Max 38k reqs * 12 queues ≈ 450k reqs

Consumed in 11 mins

Normally < 1; alert if > 200
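The queue figures give a rough drain rate. A Python calculation from the slide's numbers, assuming the 38k peak applies to each of the 12 queues at once (which makes the backlog an upper bound):

```python
queues = 12
max_depth_per_queue = 38_000  # peak depth from the slide
backlog = queues * max_depth_per_queue  # upper bound: assumes all queues peaked together
drain_seconds = 11 * 60

# New requests kept arriving while the backlog drained, so this is the rate
# at which workers outpaced arrivals, not their total throughput.
net_drain_rate = backlog / drain_seconds
print(backlog, round(net_drain_rate))  # 456000 691
```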

Actual Impact of Cold Start

(Figure: IOPS and buffer cache misses after the upgrade)

This could be stopped..

5k IOPS was expected

9k / 19k ≈ 47% of buffer cache misses were served from the page cache

Summary

• Choose the appropriate strategy for your environment
  – Downtime: pg_dump >> pg_upgrade > logical replication
  – Operational complexity: pg_dump < pg_upgrade < logical replication
  – Consider not only the DB impact but also the impact on the entire system
  – Think about downtime, cold start, and rollback

• Metrics are very important
  – Estimate impact, model the workload, evaluate performance, and measure system health

Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!
