etcd based PostgreSQL HA Cluster

TL;DR: github.com/compose/template-etcd-based-postgres-ha

Introduction

Chris Winslett

@winsletts

compose.io

reading the top 5 comments on Imgur since 2012

How we started using PostgreSQL

MongoDB was a primary datastore

launched project to understand financial metrics

required data exploration, which is brutal in MongoDB

Our database product

our platform runs databases

these databases scale automatically as a customer increases data size

Our database product

could we run PostgreSQL on our platform?

Database operational requirements

• replicated
• highly-available
• no human interaction for failover
• minimize core-engine modifications
• customers use entire deployment

Tools investigated

repmgr with pgpool II

required human interaction for failover

does not use PostgreSQL streaming

pgpool was flakey on failover

Tools investigated

PostgreSQL streaming replication

no automatic failover

Tools investigated

bi-directional replication, i.e. master-master

only runs on one database per cluster

requires a patch on core engine

is automated failover too ambitious with PostgreSQL?

Learned from tools investigation

PostgreSQL should not be the canonical store of its own state; investigated:

serf - not consensus based

consul - runs with consensus

etcd - runs with consensus

Consul

we built the prototype on Consul

using:

locking sessions

health checks

code at: https://github.com/MongoHQ/consul_ha

Consul

code at: https://github.com/MongoHQ/consul_ha

Tight coupling between:

Consul interaction and

HA decision loop

Consul Diagram 1

Final Consul Diagram

Consul Results

amazing

automatically growing and shrinking Consul clusters

health checks to prevent unhealthy secondaries from acquiring locks

Consul

until we ran into massive swap allocation.

40 GB swap allocation.

fine for prototypes, not for production.

Results from Consul

HA PostgreSQL is possible

but, we need a tool which uses our resources more wisely.

Switch to etcd

because of what we’d learned in Consul, the switch to etcd took a day to have a working sample

Modern etcd diagram

[Flowchart in two parts, "Start-up Process" and "Running Loop"; the arrows are not recoverable from this transcript. Node labels, in extraction order:]

• Start
• Connect to etcd?
• Is data directory empty?
• Win race to set initialization key? (yes)
• Initialize database
• Take over lead TTL key
• Start PostgreSQL as a leaderless Secondary
• Leader owns key?
• pg_basebackup from leader
• Do I own leader key?
• Acquire leader lock?
• Update leader TTL lock
• Promote to leader
• Is leader key owned?
• Am I following the correct leader?
• Am I the healthiest member?
• Am I the leader?
• Wait 30 seconds
• Start Postgres
• wait 5 seconds
• follow proper leader
• Start Postgres
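
The running-loop questions in the diagram can be read as one decision function per iteration. This is only a sketch: the diagram's arrows are not available here, so the branch order is an assumption pieced together from the node labels, and `Member`, `next_action`, and the action strings are hypothetical names rather than anything from the compose code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Member:
    """What one node knows at the top of a loop iteration.

    Field names are hypothetical; each mirrors one flowchart question.
    """
    leader_key_owned: bool        # "Is leader key owned?"
    owns_leader_key: bool         # "Do I own leader key?"
    is_healthiest: bool           # "Am I the healthiest member?"
    follows_correct_leader: bool  # "Am I following the correct leader?"


def next_action(m: Member, try_acquire: Callable[[], bool]) -> str:
    """Pick this iteration's action (assumed branch order, see lead-in)."""
    if not m.leader_key_owned:
        # Nobody holds the lock: only the healthiest member races for it,
        # and only the winner of that race promotes itself.
        if m.is_healthiest and try_acquire():
            return "promote to leader"
        return "follow proper leader"
    if m.owns_leader_key:
        # Still the leader: refresh the lock before its TTL runs out.
        return "update leader TTL lock"
    if m.follows_correct_leader:
        return "no change"
    return "follow proper leader"
```

Keeping the decision separate from the etcd and PostgreSQL calls makes each branch easy to check with plain booleans.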

etcd features used

consensus

recursive

ttl

prevValue

prevExist

https://coreos.com/docs/distributed-configuration/etcd-api/

etcd: recursive

used to find all members known to a cluster
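
A rough sketch of member discovery with a recursive read against the etcd v2 keys API. The key layout (`/service/<scope>/members`), the port, and the function names are assumptions for illustration; the sample body is hand-written in the shape of a v2 directory listing, not captured from a real cluster.

```python
import json


def members_url(etcd_base: str, scope: str) -> str:
    """URL for a recursive GET of the (hypothetical) members directory."""
    return f"{etcd_base}/v2/keys/service/{scope}/members?recursive=true"


def parse_members(body: str) -> dict:
    """Map member name -> stored value from an etcd v2 directory listing."""
    node = json.loads(body)["node"]
    return {
        child["key"].rsplit("/", 1)[-1]: child.get("value")
        for child in node.get("nodes", [])
    }


# Hand-written sample shaped like an etcd v2 response (assumed values).
sample = json.dumps({
    "action": "get",
    "node": {
        "key": "/service/demo/members",
        "dir": True,
        "nodes": [
            {"key": "/service/demo/members/pg1", "value": "10.0.0.1:5432"},
            {"key": "/service/demo/members/pg2", "value": "10.0.0.2:5432"},
        ],
    },
})
members = parse_members(sample)
```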

etcd: ttl

used with our keep alive from a PostgreSQL runner
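
A keep-alive along these lines could be a PUT that rewrites the member's key with a fresh TTL on every loop. The sketch below only builds the request and does not send it; the key path, port, and the 30-second value are assumptions, not settings from the talk.

```python
from urllib.parse import urlencode
from urllib.request import Request


def keepalive_request(etcd_base: str, key: str, value: str, ttl: int) -> Request:
    """Build (but do not send) a PUT that writes `key` with an expiry in seconds."""
    body = urlencode({"value": value, "ttl": ttl}).encode()
    return Request(f"{etcd_base}/v2/keys{key}", data=body, method="PUT")


# Hypothetical member key and address.
req = keepalive_request(
    "http://127.0.0.1:4001", "/service/demo/members/pg1", "10.0.0.1:5432", ttl=30
)
```

If the runner stops sending this request, the key expires on its own and the member drops out of the listing.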

etcd: prevValue

used in conjunction with TTL to ensure the leader remains the leader when updating the TTL
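
In etcd v2 terms this is a compare-and-swap: the write carries `prevValue`, and etcd applies it only if the key still holds that value. A minimal sketch that builds such a request, assuming a hypothetical `/service/<scope>/leader` key whose value is the leader's name:

```python
from urllib.parse import urlencode
from urllib.request import Request


def refresh_leader_request(etcd_base: str, leader_key: str, me: str, ttl: int) -> Request:
    """Build a PUT that renews the leader TTL only if `leader_key` still holds `me`."""
    query = urlencode({"prevValue": me})
    body = urlencode({"value": me, "ttl": ttl}).encode()
    return Request(f"{etcd_base}/v2/keys{leader_key}?{query}", data=body, method="PUT")


# Hypothetical key, member name, and TTL.
req = refresh_leader_request("http://127.0.0.1:4001", "/service/demo/leader", "pg1", 30)
```

If another member has taken the key in the meantime, etcd rejects the write instead of overwriting it, so a stale leader cannot silently extend its own lease.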

etcd: prevExist

used to create a deployment initialization race
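
The race can be sketched as every member attempting the same create-only write: with `prevExist=false`, etcd accepts the PUT only when the key does not exist yet, so exactly one member wins. The key name and helper names below are hypothetical, and treating any HTTP error as "lost the race" is a simplifying assumption.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def init_race_request(etcd_base: str, init_key: str, me: str) -> Request:
    """Build a PUT that creates `init_key` only if it does not already exist."""
    query = urlencode({"prevExist": "false"})
    body = urlencode({"value": me}).encode()
    return Request(f"{etcd_base}/v2/keys{init_key}?{query}", data=body, method="PUT")


def try_win_race(req: Request) -> bool:
    """Send the request; True if etcd accepted the create, False on an HTTP error."""
    try:
        with urlopen(req):
            return True
    except HTTPError:
        # Simplification: any HTTP error is treated as losing the race.
        return False


# Hypothetical key and member name; not sent here.
req = init_race_request("http://127.0.0.1:4001", "/service/demo/initialize", "pg1")
```

The winner would go on to initialize the database; everyone else would wait for a leader to copy from.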

Improved with etcd

removed tight coupling in classes:

HA decision process

etcd state interaction

PostgreSQL handler

Issues with etcd

overly aggressive about consensus

instructions for optimization at https://coreos.com/docs/cluster-management/debugging/etcd-tuning/

Issues with etcd

overly aggressive about consensus

we quit running etcd alongside PostgreSQL because we wanted expanding PostgreSQL clusters

Time for live demo?
