Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen


Spilo: the high-available PostgreSQL cluster

Feike Steenbergen, Zalando SE

• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014

Zalando

150 000+ products

We are growing!

Our databases

• > 150 production PostgreSQL databases
• > 13.5 TB data
• > 5 TB biggest DB
• 400-1000+ write tps
• > 2 DB failures/month

Zalando never sleeps

Infrastructure bottleneck

ACID Team: create, alter, deploy, migrate, failover, upgrade

80+ teams

Radical Agility

Purpose

Autonomy

Mastery

• 2013: ZCloud

• 2014: project Pequod

• 2015: Let’s just use AWS…

Amazon abbreviations

• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational DB Service
• CF - Cloud Formation
• ASG - Auto Scaling Group

• One account per team

• Microservices

• REST/OAuth2

• Deployment with Docker

Autonomous teams on AWS

INTERNET

Autonomous teams

• Team decides which product to build
• … and which technologies to use

• REST/OAuth2 mandatory

• Team is responsible for its infrastructure

• Developers should take care of infrastructure

• … including production databases

• On AWS!

Databases?

Isn’t it dangerous?

DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958

ACID team provides

PostgreSQL trainings

What about failover?

• Detect the master failure

• Elect a new master

• Redirect clients

Autofailover tasks

Autofailover issues

• Discarded writes

• Split-brain

• False positives
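
One common way to reduce false positives is to treat the master as failed only after several consecutive failed health checks, so that a single dropped probe does not trigger a failover. The sketch below illustrates that idea only; it is not how Spilo or Patroni detects failure, and the `FailureDetector` class and its threshold of 3 are hypothetical.

```python
class FailureDetector:
    """Declare a node failed only after `threshold` consecutive failed checks.

    Hypothetical helper for illustration; the threshold value is an assumption.
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok: bool) -> bool:
        """Record one health-check result; return True if the node counts as failed."""
        if check_ok:
            self.failures = 0  # any success resets the streak
        else:
            self.failures += 1
        return self.failures >= self.threshold


detector = FailureDetector(threshold=3)
results = [detector.record(ok) for ok in (False, False, True, False, False, False)]
print(results)
```

A higher threshold means fewer false positives but a slower reaction to a real outage, which is the trade-off behind any such setting.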

• Support for PostgreSQL

• Automatic failover

• Most extensions

• Automatic backups

RDS?

• Vendor lock-in

• No superuser

• No untrusted languages

• No logical decoding plugins

• Costs

Spilo (სპილო)

Spilo does

• Rapid deployment of PostgreSQL on AWS EC2 instances

• Streaming replication with auto-failover

Spilo on AWS

[Diagram: a Spilo master and a Spilo replica. Labels: master connection; application DB request; etcd cluster status update.]

Failover

[Diagram sequence, three slides: (1) a Spilo replica and the master connection; (2) a Spilo master, a Spilo replica, the master connection, and the note "New Spilo starts…"; (3) a Spilo master with two Spilo replicas and the master connection.]

What is Spilo?

[Diagram: three nodes, each running Patroni (one master, two replicas), placed in auto-scaling groups.]

Demo: Deploying Spilo

• We use stups
• First we define a template
• We create a Cloud Formation Stack from this template

Patroni (პატრონი)

• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
• Runs everywhere

Compose Governor idea

● Use etcd for failover decision

● Run etcd on every node
● Run 1 node with HAProxy + etcd

Distributed configuration systems

• Fault tolerant

• Reliably store small amounts of strongly-consistent data between distributed nodes

• Good for storing the PostgreSQL cluster state
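
Fault tolerance in a consensus-based store such as etcd comes from majority agreement: a write is accepted once more than half of the members acknowledge it. The arithmetic can be sketched as below; the function names are only for this example.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 members does not add any tolerance, which is why such clusters are usually run with an odd number of members.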

Distributed consensus

[Diagram, two slides: three clients and a leader; a write request is sent to the leader.]

Cluster state in etcd

$ export ETCD=172.17.0.2:4001
$ etcdctl -C $ETCD ls /service/ --recursive
/service/dm
/service/dm/optime
/service/dm/optime/leader
/service/dm/members
/service/dm/members/postgresql_172_17_0_3
/service/dm/members/postgresql_172_17_0_4
/service/dm/members/postgresql_172_17_0_5
/service/dm/initialize
/service/dm/leader

Leader key

• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
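
The leader-key properties above (TTL, auto-expiry, exclusive lock) can be sketched with a small in-memory stand-in for etcd. This is an illustration under assumptions, not Patroni's code: the `TtlStore` class, its method names, and the explicit `now` argument are hypothetical, and a real DCS performs the create-if-absent step atomically on the server.

```python
from typing import Dict, Optional, Tuple


class TtlStore:
    """In-memory stand-in for a DCS such as etcd (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)  # expired keys disappear on their own
            return None
        return entry[0]

    def acquire(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Create the key only if it is absent: the exclusive-lock step."""
        if self.get(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def refresh(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Extend the TTL only if `value` still holds the key."""
        if self.get(key, now) != value:
            return False
        self._data[key] = (value, now + ttl)
        return True


store = TtlStore()
LEADER = "/service/dm/leader"
print(store.acquire(LEADER, "postgresql_172_17_0_3", ttl=30, now=0))   # key is free
print(store.acquire(LEADER, "postgresql_172_17_0_4", ttl=30, now=10))  # lock is held
print(store.refresh(LEADER, "postgresql_172_17_0_3", ttl=30, now=20))  # holder renews
print(store.acquire(LEADER, "postgresql_172_17_0_4", ttl=30, now=60))  # TTL ran out
```

If the current leader stops refreshing (crash, network partition), the key expires and another member can acquire it, which is what makes the TTL a failure detector as well as a lock.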

Leader TTL

$ http http://$ETCD/v2/keys/service/dm/leader
...
{
  "action": "get",
  "node": {
    "createdIndex": 8,
    "expiration": "2015-11-20T09:56:43.59367038Z",
    "key": "/service/dm/leader",
    "modifiedIndex": 85,
    "ttl": 22,
    "value": "postgresql_172_17_0_3"
  }
}
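
As a small illustration, the response shown on this slide can be read with Python's standard `json` module to find out who holds the leader key and for how long. The JSON body is copied from the slide; the variable names are only for this sketch.

```python
import json

# Response body copied from the slide; only the field access below is new.
body = """{ "action": "get", "node": { "createdIndex": 8,
  "expiration": "2015-11-20T09:56:43.59367038Z", "key": "/service/dm/leader",
  "modifiedIndex": 85, "ttl": 22, "value": "postgresql_172_17_0_3" }}"""

node = json.loads(body)["node"]
leader_name = node["value"]   # name of the member that holds the lock
seconds_left = node["ttl"]    # seconds until the key expires unless refreshed
print(f"{leader_name} holds {node['key']} for another {seconds_left}s")
```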

Member key

$ etcdctl -C $ETCD get /service/dm/members/postgresql_172_17_0_5
{
  "conn_url": "postgres://un:pw@172.17.0.5:5432/postgres",
  "api_url": "http://172.17.0.5:8008/patroni",
  "tags": {},
  "state": "running",
  "role": "replica",
  "xlog_location": 67109176
}
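
A client that wants to connect to a member could, for instance, split `conn_url` into its parts with the standard library. The snippet below is a sketch using the member value from the slide; nothing here is Patroni's own code.

```python
import json
from urllib.parse import urlsplit

# Member value copied from the slide.
member = json.loads(
    '{ "conn_url": "postgres://un:pw@172.17.0.5:5432/postgres",'
    ' "api_url": "http://172.17.0.5:8008/patroni", "tags": {},'
    ' "state": "running", "role": "replica", "xlog_location": 67109176}'
)

conn = urlsplit(member["conn_url"])
database = conn.path.lstrip("/")  # "/postgres" -> "postgres"
print(conn.hostname, conn.port, conn.username, database)
print(member["role"], member["state"], member["xlog_location"])
```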

[Diagram sequence: two Patroni nodes, a master and a replica, behind a master load balancer (MASTER LB). Labels: PostgreSQL connection; API HealthCheck (master); Connection and API URL. The final slide adds a REPLICA LB with API HealthCheck (slave).]
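
The load balancers in these diagrams route traffic based on an HTTP health check against each node's API: a node answers the master check with success only while it is the master, and likewise for the replica check. The function below sketches that rule; it is not Patroni's actual handler, and the name `health_status` and the use of 200/503 as the two outcomes are assumptions of this example.

```python
def health_status(node_role: str, check: str) -> int:
    """Return the HTTP status a node would give to a 'master' or 'replica' check.

    Assumption of this sketch: 200 means "send traffic here", 503 means "do not".
    """
    if check not in ("master", "replica"):
        raise ValueError(f"unknown check: {check}")
    return 200 if node_role == check else 503


for role in ("master", "replica"):
    print(role, health_status(role, "master"), health_status(role, "replica"))
```

After a failover the promoted node starts passing the master check and the old master (if it is still reachable) stops passing it, so the master LB follows the new master without any client-side change.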

Initialize key

$ etcdctl -C $ETCD get /service/dm/initialize
6219169399948550171

• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with different system ID are not allowed to join
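
The join rule for the initialize key can be sketched as follows. A plain dict stands in for etcd here, and `may_join` is a hypothetical helper; in a real DCS the "create if absent" step would have to be an atomic operation.

```python
from typing import Dict

INITIALIZE_KEY = "/service/dm/initialize"


def may_join(dcs: Dict[str, str], local_system_id: str) -> bool:
    """The first node stores its system ID; later nodes must match it."""
    stored = dcs.setdefault(INITIALIZE_KEY, local_system_id)
    return stored == local_system_id


dcs: Dict[str, str] = {}  # stand-in for etcd (assumption of this sketch)
print(may_join(dcs, "6219169399948550171"))  # first node creates the key
print(may_join(dcs, "6219169399948550171"))  # node from the same cluster
print(may_join(dcs, "1111111111111111111"))  # made-up ID from another cluster
```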

Patroni modules

[Diagram: Patroni modules]
• etcd and ZooKeeper backends behind an abstract DCS layer
• PostgreSQL
• REST API
• High availability
• Asynchronous executor
• Callbacks
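
The point of an abstract DCS layer is that the high-availability logic does not need to know whether etcd or ZooKeeper sits underneath. The sketch below illustrates that separation with an in-memory backend; the class and method names are hypothetical and are not taken from Patroni's source.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class AbstractDCS(ABC):
    """Interface the HA logic talks to (hypothetical names, for illustration)."""

    @abstractmethod
    def get_leader(self) -> Optional[str]: ...

    @abstractmethod
    def attempt_to_acquire_leader(self, name: str) -> bool: ...


class InMemoryDCS(AbstractDCS):
    """Toy backend; an etcd or ZooKeeper backend would implement the same methods."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}

    def get_leader(self) -> Optional[str]:
        return self._keys.get("leader")

    def attempt_to_acquire_leader(self, name: str) -> bool:
        # Succeeds if the key was free or is already held by `name`.
        return self._keys.setdefault("leader", name) == name


def run_ha_cycle(dcs: AbstractDCS, name: str) -> str:
    """One pass of the HA logic: the leader-key holder is master, the rest replicas."""
    return "master" if dcs.attempt_to_acquire_leader(name) else "replica"


dcs = InMemoryDCS()
print(run_ha_cycle(dcs, "node1"), run_ha_cycle(dcs, "node2"))
```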

Demo time!

https://asciinema.org/a/2ttvu50yehjo2712s1w43udio

• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API
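
Running a long task such as a base backup in a separate thread keeps the main loop free to go on with its regular work, for example refreshing the leader key. The sketch below shows one way to do this with Python's `threading` module; `AsyncExecutor` here is a hypothetical class written for this example, and `fake_base_backup` merely stands in for a slow task.

```python
import threading
from typing import Callable, Optional


class AsyncExecutor:
    """Run at most one long task in a background thread (illustrative sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None

    def busy(self) -> bool:
        with self._lock:
            return self._thread is not None and self._thread.is_alive()

    def run_async(self, task: Callable[[], object]) -> bool:
        """Start `task` in a thread; refuse if another task is still running."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=task, daemon=True)
            self._thread.start()
            return True

    def wait(self) -> None:
        with self._lock:
            thread = self._thread
        if thread is not None:
            thread.join()


done = threading.Event()
release = threading.Event()


def fake_base_backup() -> None:
    release.wait()  # stands in for a slow base backup
    done.set()


executor = AsyncExecutor()
print(executor.run_async(fake_base_backup))  # task starts in the background
print(executor.run_async(fake_base_backup))  # refused while the first one runs
release.set()
executor.wait()
print(done.is_set(), executor.busy())
```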

Patroni improvements

• Configurable replica imaging
• Support for pg_rewind
• patronictl
• Packaged: pip install patroni

Patroni improvements

• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)
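
A tag such as nofailover can be read as "never promote this member". The sketch below shows how candidate selection might honour it. The selection rule (highest `xlog_location` among running replicas), the `name` field, and the second `xlog_location` value are assumptions made for this example and are not taken from Patroni's source.

```python
from typing import Dict, List, Optional


def pick_failover_candidate(members: List[Dict]) -> Optional[str]:
    """Pick the running replica with the highest xlog_location, skipping nofailover."""
    eligible = [
        m for m in members
        if m["role"] == "replica"
        and m["state"] == "running"
        and not m.get("tags", {}).get("nofailover", False)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m["xlog_location"])["name"]


members = [
    # Most advanced replica, but tagged so that it is never promoted.
    {"name": "postgresql_172_17_0_4", "role": "replica", "state": "running",
     "tags": {"nofailover": True}, "xlog_location": 67109300},
    {"name": "postgresql_172_17_0_5", "role": "replica", "state": "running",
     "tags": {}, "xlog_location": 67109176},
]
print(pick_failover_candidate(members))
```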

• Spilo: github.com/zalando/spilo, spilo.readthedocs.org

• Patroni: github.com/zalando/patroni, patroni.readthedocs.org

• Stups: github.com/zalando-stups/, stups.io

• Feedback: @ekief

Thank you!
