Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Keisuke Suzuki, Software engineer
How to Upgrade Major Version of Your Production PostgreSQL
2018/12/12 PGConf.ASIA
Who am I?
Keisuke Suzuki
• Backend Engineer @ Treasure Data K.K.
– PlazmaDB: distributed storage
– Datatank: data mart
Both use PostgreSQL internally
• Interest: DB / Distributed system / Performance optimization
• Twitter: @yajilobee
PostgreSQL Versions
• Major version: released yearly, with new features
– DB data are not compatible between different major versions
• Minor version: released at least every 3 months, with bug & security fixes
– DB data are compatible if major versions are the same
Version number examples: 9.6.11 (major 9.6, minor 11), 10.6 (major 10, minor 6)
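The two example version numbers follow different schemes: up to 9.6 the first two numbers form the major version, while from 10 onward only the first number does. A minimal sketch of that rule, with hypothetical helper names `split_version` and `data_compatible`:

```python
from typing import Tuple


def split_version(version: str) -> Tuple[str, int]:
    """Split a PostgreSQL version string into (major, minor)."""
    parts = version.split(".")
    if int(parts[0]) >= 10:
        # 10 and later: "10.6" -> major "10", minor 6
        major, minor = parts[0], parts[1]
    else:
        # 9.6 and earlier: "9.6.11" -> major "9.6", minor 11
        major, minor = ".".join(parts[:2]), parts[2]
    return major, int(minor)


def data_compatible(a: str, b: str) -> bool:
    """DB data are compatible only when the major versions are the same."""
    return split_version(a)[0] == split_version(b)[0]


print(split_version("9.6.11"))
print(split_version("10.6"))
print(data_compatible("9.3.22", "9.4.17"))
```

The last call compares two different major versions (9.3 and 9.4), which is the case that needs one of the upgrade strategies discussed later.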
EOL of Major Versions
https://www.postgresql.org/support/versioning/
The PostgreSQL Global Development Group supports a major version for 5 years after its initial release.
Q: Should we follow the latest Major Version?
A: Depends on the case
-> Upgrading requires downtime and may cause incompatibility
• Extended support is provided by vendors
– Mission critical systems
• SaaS vendors may stop providing old major versions
– e.g. 9.3 retirement on Amazon RDS
✓ Stop Deployment: Aug. 2018
✓ Force Major Version Upgrade: Nov. 2018
Our Version
We were here until May 2018...
PostgreSQL Usage in Treasure Data
Arm Treasure Data eCDP
PlazmaDB: Core Storage Layer of TD
[Diagram: Streaming data, Bulk load -> PlazmaDB (Metadata: PostgreSQL; data: AWS S3 / Riak CS)]
Daily Workload & Storage Size of PlazmaDB
• Import: 500 Billion Records / day (~5.8 Million Records / sec)
• Analytical Query: 600,000 Queries / day, 15 Trillion Records / day
• Storage size: 5 PB (+5~10 TB / day), 55 Trillion Records
Server Structure of Meta DB
AWS RDS
• Master: r3.8xlarge (32 vcores, 244GB RAM), Multi-AZ
– Connection from applications
• Read replica (asynchronous streaming replication): m3.large (2 vcores, 7.5GB RAM)
– No production workload
Data Volume
[Diagram: PlazmaDB]
• Meta DB (PostgreSQL): 1 TB
– Partition Metadata (GiST) for Realtime Storage
– Partition Metadata (GiST) for Archive Storage
• Realtime Storage / Archive Storage (AWS S3 / Riak CS): 5 PB
CPU & IO utilization of PostgreSQL (Meta DB)
• CPU: AVG 13%, MAX 25%
• IO: AVG 700 read IOPS / 1100 write IOPS, MAX 2500 read IOPS / 3000 write IOPS
TPS & WAL Throughput
• TPS: AVG 1.2k TPS, MAX 1.5k TPS
• WAL throughput: AVG 4MB/sec, MAX 20MB/sec
If PlazmaDB were down...
[Diagram: Streaming data, Bulk load -> PlazmaDB (Metadata: PostgreSQL; data: AWS S3 / Riak CS)]
Planning Major Version Upgrade
Long Way to Complete Upgrade
• Choose upgrade strategy
• Choose the next major version
• Plan operation and evaluation (including rollback operations)
• Regression Test on the new major version
Major Version Upgrade Strategies
• pg_dump/pg_dumpall: Old ver -> dump -> SQL -> restore -> New ver
• pg_upgrade: Old ver -> convert data -> New ver
• Logical replication: Old ver -> log shipping -> New ver
Upgrade with pg_dump/pg_dumpall
1. Stop DB
2. pg_dump/pg_dumpall (Old ver -> SQL)
3. psql / pg_restore (SQL -> New ver)
4. Start DB
Downtime: Step 1 to 4, including a full data copy
Upgrade with pg_upgrade Link mode
1. Stop DB
2. pg_upgrade
2.1. Rebuild system tables
2.2. Create hard links to user data
3. Start DB
Downtime: Step 1 to 3, not including a user data copy
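On a self-managed server, step 2 corresponds to running `pg_upgrade` with `--link`. The sketch below only builds the command line; the directory paths are placeholders, and the helper name `pg_upgrade_command` is hypothetical. The `--check` flag asks pg_upgrade to verify the clusters without changing any data, so it is a reasonable first run.

```python
import shlex
from typing import List


def pg_upgrade_command(old_bin: str, new_bin: str,
                       old_data: str, new_data: str,
                       check_only: bool = True) -> List[str]:
    """Build a pg_upgrade link-mode command line (nothing is executed)."""
    cmd = [
        "pg_upgrade",
        "--old-bindir", old_bin,
        "--new-bindir", new_bin,
        "--old-datadir", old_data,
        "--new-datadir", new_data,
        "--link",  # hard-link user data files instead of copying them
    ]
    if check_only:
        cmd.append("--check")  # dry run: check compatibility only
    return cmd


# Placeholder paths for a 9.3 -> 9.4 upgrade; adjust to the real layout.
cmd = pg_upgrade_command(
    "/usr/pgsql-9.3/bin", "/usr/pgsql-9.4/bin",
    "/var/lib/pgsql/9.3/data", "/var/lib/pgsql/9.4/data",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing `check_only=False` yields the command for the real run, which belongs between step 1 (Stop DB) and step 3 (Start DB).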
Upgrade with Logical Replication
1. Start replication & wait for synchronization (Old ver -> New ver)
2. Stop connections from applications
3. Start connections from applications
Downtime: Step 2 to 3; DBs don't stop during upgrade
Downtime
pg_dump/pg_dumpall >> pg_upgrade > Logical Replication
Downtime comes from...
• pg_dump/pg_dumpall: full data copy (not acceptable)
• pg_upgrade: rebuild system tables (how long?)
• Logical Replication: switch servers (how long?)
Downtime: pg_upgrade in RDS
• Operation
– Upgrade major version by modify-db-instance API
– Performs pg_upgrade link mode internally
• Downtime -> 7-8 mins
• Note
– Downtime happens several minutes after invoking API
– Exact time to happen cannot be specified
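As a rough illustration, the modify-db-instance call can be prepared as below. The parameter names are those of the RDS ModifyDBInstance API; the instance identifier and target version are placeholders, the helper names are hypothetical, and `submit_upgrade` needs the third-party `boto3` package plus AWS credentials, so it is defined but not called here.

```python
from typing import Any, Dict


def major_upgrade_request(instance_id: str, target_version: str,
                          apply_immediately: bool = False) -> Dict[str, Any]:
    """Build the parameters for an RDS major version upgrade request."""
    return {
        "DBInstanceIdentifier": instance_id,
        "EngineVersion": target_version,
        # Required whenever the target is a different major version.
        "AllowMajorVersionUpgrade": True,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def submit_upgrade(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to RDS (assumes boto3 and credentials are set up)."""
    import boto3  # third-party; imported lazily so the sketch runs without it
    return boto3.client("rds").modify_db_instance(**request)


# Placeholder identifier and version, for illustration only.
request = major_upgrade_request("example-meta-db", "9.4.17")
print(request)
```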
Upgrade of Replicas
[Diagram: master and replica, each Old ver -> New ver]
• Downtime will be longer if upgrade of replicas is required
– 1. Stop DBs
– 2. Upgrade master
– 3. Upgrade slave
– 4. Start DBs
Logical Replication: How to switch servers?
• Update application configuration
– Distribute new DB configuration to every app
– Release new DB after connections to the old DB are closed, to avoid write conflicts
• Overwrite DB endpoint
– IP / DNS / proxy etc.
– Update only 1 point
Downtime: Logical Replication
• Operation
– Change DB endpoints (DNS) by modify-db-instance API
1. plazma.***.com -> plazma-old.***.com (Old ver)
2. plazma-new.***.com -> plazma.***.com (New ver)
• Downtime -> 1-2 mins
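One way to express the two endpoint changes is as a pair of instance renames, since an RDS endpoint hostname is derived from the instance identifier. The slide does not say how the swap was issued, so treat this as an assumption; the helper name `swap_plan` and the short identifiers are hypothetical stand-ins for the masked hostnames.

```python
from typing import Any, Dict, List


def swap_plan(live: str, new: str, retired: str) -> List[Dict[str, Any]]:
    """Return two rename requests that move the live name to the new DB.

    Order matters: the live name must be freed before the new instance
    can take it, and the first rename must finish before the second starts.
    """
    return [
        # 1. old DB gives up the live name
        {"DBInstanceIdentifier": live,
         "NewDBInstanceIdentifier": retired,
         "ApplyImmediately": True},
        # 2. new DB takes over the live name
        {"DBInstanceIdentifier": new,
         "NewDBInstanceIdentifier": live,
         "ApplyImmediately": True},
    ]


for step, request in enumerate(swap_plan("plazma", "plazma-new", "plazma-old"), 1):
    print(step, request)
```

The window between the two renames is when applications cannot connect, which is where the 1-2 minutes of downtime would come from under this assumption.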
Feasibility of each Strategy
pg_upgrade
• Pros
– Easier operation, especially in AWS RDS
• Cons
– Longer downtime
– Upgrade only 1 major version at once (AWS RDS)
Logical Replication
• Pros
– Shorter downtime
• Cons
– PostgreSQL 9.3 doesn't have native logical replication
Logical Replication on PostgreSQL
• 9.3 and earlier: 3rd party tools based on triggers
– Bucardo, Londiste, Slony
– Write amplification happens to record delta
– For AWS RDS, a replication server is required outside of DB servers
• 9.4 - 9.6: 3rd party tools based on logical decoding
– pglogical, AWS Database Migration Service (DMS)
– DMS provides a managed replication server
• 10 and later: Native logical replication
– No external process is required
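For the native case (PostgreSQL 10 and later), replication is set up with `CREATE PUBLICATION` on the old server and `CREATE SUBSCRIPTION` on the new one. The sketch below only renders the two statements; the object names and connection string are placeholders, and the helper names are hypothetical. It does not apply to the 9.3 / 9.4 servers in this talk.

```python
def publication_sql(name: str) -> str:
    """Statement to run on the old (publisher) server."""
    return f"CREATE PUBLICATION {name} FOR ALL TABLES;"


def subscription_sql(name: str, publication: str, conninfo: str) -> str:
    """Statement to run on the new (subscriber) server."""
    escaped = conninfo.replace("'", "''")  # escape quotes inside the literal
    return (f"CREATE SUBSCRIPTION {name} "
            f"CONNECTION '{escaped}' PUBLICATION {publication};")


# Placeholder names and connection string, for illustration only.
print(publication_sql("upgrade_pub"))
print(subscription_sql("upgrade_sub", "upgrade_pub",
                       "host=old-db.example.com dbname=meta user=repl"))
```

Two prerequisites are worth keeping in mind: the publisher needs `wal_level = logical`, and table definitions must already exist on the subscriber because DDL is not replicated.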
Our Plan
• Step 1: 9.3 -> 9.4 by pg_upgrade
– Operation simplicity and manageable downtime
– Done in May 2018
• Step 2: 9.4 -> latest by DMS
– Less downtime and a managed service is available
– Coming soon..
Things to Consider on Upgrade
How to handle downtime?
8 minutes of downtime is expected
[Diagram: Import and Query requests pass through Endpoint -> Queue -> Worker -> PlazmaDB; arrows labeled Sync, Async, Retry, Retry; the queue has capacity for 8 mins]
Requests were delayed but no error was returned
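The retry behaviour that turns an outage into a delay can be sketched as follows. This is not the actual worker code: the function name `retry_until_up`, the use of `ConnectionError`, and the fixed retry interval are all assumptions made for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_until_up(task: Callable[[], T], max_wait_sec: float,
                   interval_sec: float,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task, retrying on ConnectionError until max_wait_sec is used up."""
    waited = 0.0
    while True:
        try:
            return task()
        except ConnectionError:
            if waited >= max_wait_sec:
                raise  # outage outlasted the budget: surface the error
            sleep(interval_sec)
            waited += interval_sec


# Simulated outage: the first three attempts fail, then the DB is back.
attempts = {"count": 0}


def write_metadata() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise ConnectionError("DB is down")
    return "ok"


slept: List[float] = []  # record waits instead of really sleeping
result = retry_until_up(write_metadata, max_wait_sec=600,
                        interval_sec=10, sleep=slept.append)
print(result, slept)
```

The wait budget should exceed the expected downtime (8 minutes here), otherwise callers start to see errors instead of delays.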
Impact of Cold Start
• Shared buffers are empty after booting DB
– In our env, shared buffers = 160GB
– PostgreSQL page size = 8kB
– Pages to load into buffer = 160GB / 8kB = 20M pages
• IOs are issued based on some factors
– Number of queries
– Number of pages read by a query
– Locality
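The page count above can be checked with a few lines of arithmetic. The warm-up estimate at the end is an extra illustration, not a figure from the talk: it assumes every read IO brings in exactly one new page, which ignores locality and repeated reads.

```python
GIB = 1024 ** 3
KIB = 1024

shared_buffers_bytes = 160 * GIB  # shared buffers size from the slide
page_size_bytes = 8 * KIB         # PostgreSQL page size from the slide

pages_to_load = shared_buffers_bytes // page_size_bytes
print(f"pages to load: {pages_to_load:,}")


def warmup_seconds(pages: int, read_iops: int) -> float:
    """Rough warm-up time, assuming one new page per read IO."""
    return pages / read_iops


for iops in (5_000, 10_000):
    minutes = warmup_seconds(pages_to_load, iops) / 60
    print(f"{iops} read IOPS -> roughly {minutes:.0f} min to fill the buffer")
```

With binary units the count is 20,971,520 pages, which the slide rounds to 20M.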
Bonus of pg_upgrade: Page Cache is Hot
• Page cache of OS remains during upgrade
– Binary format of user data file is compatible
[Diagram: Old ver -> New ver; system tables are rebuilt, user data is shared via hard link and stays in the page cache]
Mitigate Impact of Cold Start
• Adjust RDS Provisioned IOPS to 10k
– Our major select workload scans ~10k pages/sec
– RAM = 244GB, Buffer = 160GB -> Page cache ~ 80GB
✓ 50% is filled by page cache -> expected IOPS ~ 5k
✓ +5k just in case
• Page cache isn't hot if you need to change servers
– pg_prewarm
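The sizing reasoning above, written out as arithmetic. The helper name `expected_read_iops` is hypothetical, and the model is deliberately simple: every scanned page misses the (cold) shared buffers, and the share of pages found in the OS page cache equals page cache size divided by buffer size.

```python
def expected_read_iops(pages_per_sec: float, page_cache_gb: float,
                       buffer_gb: float) -> float:
    """Reads that miss both the cold shared buffers and the OS page cache."""
    page_cache_hit_ratio = min(1.0, page_cache_gb / buffer_gb)
    return pages_per_sec * (1.0 - page_cache_hit_ratio)


scan_pages_per_sec = 10_000  # major select workload, from the slide
page_cache_gb = 80           # 244GB RAM - 160GB buffer, rounded as on the slide
buffer_gb = 160

expected = expected_read_iops(scan_pages_per_sec, page_cache_gb, buffer_gb)
headroom = 5_000             # "+5k just in case"
provisioned = expected + headroom

print(f"expected read IOPS: {expected:.0f}")
print(f"IOPS to provision:  {provisioned:.0f}")
```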
Rollback Plan
1. Promote Replica: 1-2 mins
2. Restore from Backup (if No.1 doesn't work): 30-60 mins
[Diagram: Master, Replica, Applications]
Steps to promote the replica:
1. plazma.***.com -> plazma-old.***.com
2. plazma-rr.***.com -> plazma.***.com
3. promote
Evaluate New Major Version
Importance of Evaluation
• Rollback is the worst case
– Longer downtime
– Data created after upgrade should be recovered manually
• Detect degradation by test
– Interface incompatibility
✓ System test on staging environment
– Performance degradation
✓ production != staging in terms of server resources
Real World Application is Complicated..
Many Performance Related Factors
• Type of Workload
• Users' behaviour
– # of import requests / analytical query requests
– Data skew
• Metadata Storage size
• Server Size (CPU cores, RAM, Storage, Network)
• etc..
Model the workload and define target performance
Use Metrics for Modeling Workload
• Metrics Collectors
– Arm Treasure Data: detailed analysis
– DataDog: visualization
• Metrics
– CPU / IOPS
– Table size
– # of Query / Query Types
– Cache hit ratio
– SEL / INS / DEL / UPD rows
– … 50+ metrics
Created Benchmark Tool based on Metrics
• Enable production size perf measurement without integration
– Iteratively refined benchmark workload by adding missing parameters
● Number of queries / Types of queries
● DB size
● Data access locality
● Data skew
• Goal of Measurement
– Check if pg 9.4 is capable of processing current production workload (+α)
– Detect different behavior of metrics
Workload of DB
[Diagram: workloads on the Plazma Meta DB]
• Streaming Import
• Bulk Load
• Merge
• Presto Select
• Presto Insert / delete
• Hive Select
• Hive Insert
• GC
Callouts:
• 80% of query workload: select 10k rows/sec
• 93% of import workload: insert 1k rows/sec
Selectivity
• Random sampling from actual selectivity distribution
e.g.
– 40% queries: sl = 1
– 5% queries: sl = [0.01, 0.5]
– 5% queries: sl = [0.5, 0.99]
[Figure: selectivity (y-axis) vs. percentile of total # of queries (x-axis)]
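A benchmark can draw a selectivity per query from such a bucketed distribution. The sketch below uses the three example buckets from the slide; they only cover 50% of queries, so the last bucket (the remaining 50% placed below 0.01) is purely an assumption added to make the weights sum to 1, and the real tool presumably sampled from the measured distribution instead.

```python
import random
from typing import List, Tuple

# (share of queries, (lowest selectivity, highest selectivity))
BUCKETS: List[Tuple[float, Tuple[float, float]]] = [
    (0.40, (1.0, 1.0)),    # from the slide: sl = 1
    (0.05, (0.01, 0.5)),   # from the slide
    (0.05, (0.5, 0.99)),   # from the slide
    (0.50, (0.0, 0.01)),   # ASSUMPTION: not given on the slide
]


def sample_selectivity(rng: random.Random) -> float:
    """Pick a bucket by its share, then a selectivity inside the bucket."""
    weights = [share for share, _ in BUCKETS]
    _, (low, high) = rng.choices(BUCKETS, weights=weights, k=1)[0]
    return rng.uniform(low, high)


rng = random.Random(2018)  # fixed seed so a benchmark run is repeatable
samples = [sample_selectivity(rng) for _ in range(10_000)]
full_scans = sum(1 for s in samples if s == 1.0) / len(samples)
print(f"share of queries with sl = 1: {full_scans:.2f}")
```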
Data Access Locality
• Metadata size = 1TB
• Shared Buffer size = 160GB
• But, Hot Data size is smaller than Shared Buffer
e.g.
– 85% of workload comes from 1% of data sets
– 95% of workload comes from 5% of data sets
[Figure: # of partitions read / day vs. # of read requests on data set / day]
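The two example figures imply three tiers: the hottest 1% of data sets gets 85% of accesses, the next 4% gets 10%, and the remaining 95% gets 5%. A minimal sketch of a sampler with that skew follows; the function name `sample_dataset`, the tier names, and uniform access within a tier are assumptions.

```python
import random


def sample_dataset(rng: random.Random, n_datasets: int) -> int:
    """Return a data set id in [0, n_datasets) with 85/10/5 tiered skew."""
    assert n_datasets >= 100, "need at least 100 data sets for a 1% tier"
    hot_end = n_datasets // 100        # ids [0, hot_end): hottest 1%
    warm_end = n_datasets * 5 // 100   # ids [hot_end, warm_end): next 4%
    tier = rng.choices(["hot", "warm", "cold"], weights=[85, 10, 5], k=1)[0]
    if tier == "hot":
        return rng.randrange(0, hot_end)
    if tier == "warm":
        return rng.randrange(hot_end, warm_end)
    return rng.randrange(warm_end, n_datasets)


rng = random.Random(2018)  # fixed seed so a benchmark run is repeatable
ids = [sample_dataset(rng, 10_000) for _ in range(20_000)]
top_1_percent = sum(1 for i in ids if i < 100) / len(ids)
top_5_percent = sum(1 for i in ids if i < 500) / len(ids)
print(f"top 1%: {top_1_percent:.2f}, top 5%: {top_5_percent:.2f}")
```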
Result of Performance Measurement
Observed no performance degradation
Operation in Production
Final Upgrade Plan
• Preparation
– Increase Provisioned IOPS to 10k for cold start
– Scale up replica server to the same size as master for rollback
• Operation including downtime
– Invoke pg_upgrade via modify-db-instance API in scheduled maintenance window
✓ ~8 mins downtime starts several minutes after invoking API
✓ Expected Read IOPS after upgrade ~5k
Actual Impact
Requests were delayed 9 minutes (~ DB downtime)
Request Queue Depth
• Max 38k reqs * 12 queues ≈ 450k reqs
• Consumed in 11 mins
• Normally < 1; alert if > 200
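These numbers give a rough feel for how fast the backlog was drained. The calculation below is an illustration, not a figure from the talk, and it assumes all 12 queues peaked at about the same time.

```python
peak_per_queue = 38_000  # max queue depth, from the slide
queues = 12
drain_minutes = 11       # time to consume the backlog, from the slide

peak_backlog = peak_per_queue * queues
net_drain_rate = peak_backlog / (drain_minutes * 60)

print(f"peak backlog:   {peak_backlog:,} reqs")
print(f"net drain rate: {net_drain_rate:.0f} reqs/sec")
```

New requests kept arriving while the backlog drained, so the workers' actual processing rate was higher than this net rate.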
Actual Impact of Cold Start
[Figures: IOPS; Buffer cache misses]
• This could be stopped..
• 5k IOPS was expected
• 9k / 19k ~ 47% was hit on page cache
Summary
• Choose appropriate strategy for your environment
– Downtime: pg_dump >> pg_upgrade > logical replication
– Operation complexity: pg_dump < pg_upgrade < logical replication
– Consider not only DB impact but also entire system impact
– Think of Downtime, Cold Start, Rollback
• Metrics are very important
– Estimate impact, Model workload, Evaluate performance, Measure system health
Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!