Outline
Heroku and Databases as a Service
How we manage 1 million+ Postgres DBs
Monitoring for 1m+
Incident response for 1m+
Disaster Recovery for 1m+
What is Heroku?
Applications
Monitoring
Logging
Build And Deploy
Add-ons
$ heroku addons:create heroku-postgresql --app demo-app
Creating postgresql-solid-8793... done, (free)
Adding postgresql-solid-8793 to demo-app... done
Setting HEROKU_POSTGRESQL_GRAY_URL and restarting demo-app... done, v2782
Database has been created and is available
 ! This database is empty. If upgrading, you can transfer
 ! data from another database with pgbackups:restore
Use `heroku addons:docs heroku-postgresql` to view documentation.
heroku-postgresql
Applications
Monitoring
Logging
Build And Deploy
Databases
Monitoring
Logging
Export and Import of data
Backups for HA and DR
heroku-postgresql
heroku-redis
Single Tenant Production
One Server, one Cluster, one Database
Multi-tenant Production
One Server, several Clusters, one Db per Cluster
Hobby
One Server, one Cluster, many Dbs
Single Tenant Production
One Server, one lxc container, one Database
Multi-tenant Production
One Server, several lxc containers, one Db per container
Hobby
One Server, one lxc container, many Dbs
Plan                                  vCPU  RAM    PIOPs  Multi-tenant  Connections
standard-0, premium-0                 2     1GB    200    Yes           120
standard-2, premium-2                 2     3.5GB  200    Yes           400
standard-4, premium-4                 2     15GB   1000   No            500
standard-5, premium-5                 4     30GB   2000   No            500
standard-6, premium-6                 8     60GB   3000   No            500
standard-7, premium-7, enterprise-7   16    120GB  4000   No            500
enterprise-8                          32    240GB  4000   No            500
Shogun
Monitoring
psql
SSH
AWS APIs
Service {
database: 'd3lwi9ef2',
port: 5432,
username: 'u23f8doife9',
password: 'dfwefujp',
created_at: '2012-05-02',
state: 'available'
}
Server {
ip: '192.168.0.1',
instance_id: 'i-2fidj3c8',
ami: 'pg-prod',
availability_zone: 'us-east-1a',
created_at: '2012-05-02',
state: 'booting'
}
available
creating
uncertain
unavailable
deprovisioning
deprovisioned
service.feel
service.tick
Need to do this all the time
SSH: echo '1'
psql: select 1
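One way to picture the feel and tick loop with these two checks is the Python sketch below. The state names come from the slides; the `Service` class shape, the split of work between `feel` (observe) and `tick` (change state), and every transition rule are assumptions made for illustration, not Heroku's actual implementation. The checks are plain callables standing in for `echo '1'` over SSH and `select 1` over psql.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A check returns True when the probe succeeded (hypothetical interface).
Check = Callable[[], bool]


@dataclass
class Service:
    checks: Dict[str, Check]
    state: str = "creating"
    observations: Dict[str, bool] = field(default_factory=dict)

    def feel(self) -> None:
        """Run every check and record the result; state is not changed here."""
        for name, check in self.checks.items():
            try:
                self.observations[name] = bool(check())
            except Exception:
                # A probe that raises counts as a failed probe.
                self.observations[name] = False

    def tick(self) -> str:
        """Advance the state one step from the last observations (assumed rules)."""
        if self.state in ("deprovisioning", "deprovisioned") or not self.observations:
            return self.state
        results = list(self.observations.values())
        if all(results):
            self.state = "available"
        elif self.state == "creating":
            pass  # still booting: failed probes are expected
        elif any(results):
            self.state = "uncertain"  # probes disagree
        else:
            self.state = "unavailable"
        return self.state
```

Calling `feel()` and then `tick()` on a timer for every service is the "all the time" part; keeping observation separate from transition makes each step cheap to retry.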
agentless-ish
collectd
pglogplex-collector
wal-e
Incident Workers
Automated incident resolution
await_resolution
triggered
human_intervention
resolved
archived
resource down?
restart resource and file a ticket
HA leader down?
fail over to standby and file a ticket
server down?
stop and start AWS instance
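The question and answer pairs above read like a lookup table from incident kind to a fixed list of actions. A minimal Python sketch of that idea follows. The incident state names and the action strings are taken from the slides; the `Incident` class, the `PLAYBOOKS` keys, the `handle` function, and the choice of which state follows which are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    kind: str
    state: str = "triggered"
    log: List[str] = field(default_factory=list)


# Incident kind -> ordered actions. Keys are illustrative names.
PLAYBOOKS: Dict[str, List[str]] = {
    "resource_down": ["restart resource", "file a ticket"],
    "ha_leader_down": ["fail over to standby", "file a ticket"],
    "server_down": ["stop and start AWS instance"],
}


def handle(incident: Incident, run: Callable[[str], None]) -> Incident:
    """Run the playbook for a known incident kind, else hand off to a human."""
    actions = PLAYBOOKS.get(incident.kind)
    if actions is None:
        incident.state = "human_intervention"
        return incident
    for action in actions:
        run(action)  # the caller supplies the side effect
        incident.log.append(action)
    # Assumed: after acting, a worker waits for checks to confirm recovery.
    incident.state = "await_resolution"
    return incident
```

Passing `run` in as a callable keeps the sketch free of real side effects; a worker would later move the incident to `resolved` and `archived`.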
Stuff happens constantly
Incidents let us not worry about 99% of it
Circuit Breakers
everything down?
page a human
EBS disk apocalypse?
page a human
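A circuit breaker of this kind can be sketched in Python as a counter over a sliding time window: when too many incidents arrive at once, automation stops and a human is paged. The class name, the threshold, the window length, and the manual `reset` are assumptions for illustration; the slides do not say how the real breakers are tuned.

```python
from collections import deque
from typing import Deque


class CircuitBreaker:
    def __init__(self, max_incidents: int, window_seconds: float) -> None:
        self.max_incidents = max_incidents
        self.window_seconds = window_seconds
        self._seen: Deque[float] = deque()
        self.tripped = False

    def record(self, now: float) -> str:
        """Record one incident at time `now` and say who should handle it."""
        self._seen.append(now)
        # Drop incidents that have fallen out of the sliding window.
        while self._seen and now - self._seen[0] > self.window_seconds:
            self._seen.popleft()
        if len(self._seen) > self.max_incidents:
            self.tripped = True
        return "page a human" if self.tripped else "automate"

    def reset(self) -> None:
        """Assumed to be called by the human once the wider outage is understood."""
        self._seen.clear()
        self.tripped = False
```

Once tripped, the breaker stays open until reset, so automated restarts and failovers do not pile onto a region-wide failure.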
database disk full?
add a new EBS disk and xfs_growfs
Before: /dev/xvdg in LVM, mounted at /database
After: /dev/xvdg and /dev/xvdh in LVM, mounted at /database
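The disk growth shown here can be sketched as a short command sequence, built below in Python so it can be inspected before anything runs. The device `/dev/xvdh` and the mount point `/database` come from the slides; the volume group and logical volume names are hypothetical, and attaching the EBS volume through the AWS API is assumed to have happened already.

```python
from typing import List


def grow_commands(new_device: str, volume_group: str,
                  logical_volume: str, mount_point: str) -> List[List[str]]:
    """Return the commands that fold a new disk into an LVM-backed XFS mount."""
    return [
        # Mark the new disk as an LVM physical volume.
        ["pvcreate", new_device],
        # Add it to the existing volume group.
        ["vgextend", volume_group, new_device],
        # Give all newly free extents to the logical volume.
        ["lvextend", "-l", "+100%FREE", f"/dev/{volume_group}/{logical_volume}"],
        # Grow the mounted XFS filesystem to fill the logical volume.
        ["xfs_growfs", mount_point],
    ]


# Hypothetical names: vg_database and lv_database are not from the talk.
commands = grow_commands("/dev/xvdh", "vg_database", "lv_database", "/database")
```

Each entry could be passed to `subprocess.run(cmd, check=True)` on the server. XFS grows while mounted, so the database can keep running.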
Server Features
Infrastructure Feature Flags
Immutable-ish Infrastructure
Durability and Availability
INSERT INTO …
1. Write to WAL
2. Keep it in memory
3. Respond to client
4. Flush to disk
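The ordering of those four steps is what makes an acknowledged write durable before the data files are updated. The toy Python store below illustrates the ordering only: it is not how PostgreSQL is implemented, and the class name, the tab-separated file format, and both file paths are invented for the sketch.

```python
import os
from typing import Dict


class ToyStore:
    def __init__(self, wal_path: str, data_path: str) -> None:
        self.wal_path = wal_path
        self.data_path = data_path
        self.memory: Dict[str, str] = {}

    def insert(self, key: str, value: str) -> str:
        # 1. Write to WAL and force it to disk before anything else.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(f"{key}\t{value}\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Keep it in memory.
        self.memory[key] = value
        # 3. Respond to client (the data file has not been touched yet).
        return "INSERT 0 1"

    def checkpoint(self) -> None:
        # 4. Flush to disk: write the in-memory state to the data file.
        with open(self.data_path, "w", encoding="utf-8") as data:
            for key, value in sorted(self.memory.items()):
                data.write(f"{key}\t{value}\n")
            data.flush()
            os.fsync(data.fileno())

    @classmethod
    def recover(cls, wal_path: str, data_path: str) -> "ToyStore":
        """Rebuild memory after a crash by replaying the WAL."""
        store = cls(wal_path, data_path)
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    key, value = line.rstrip("\n").split("\t", 1)
                    store.memory[key] = value
        return store
```

Because the log alone is enough to rebuild state, copying that log elsewhere is enough to rebuild the database elsewhere.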
Ship WAL at least every 60s
S3
fork
follower
timeline T0: participants, followers, fork
Point In Time Recovery
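Point in time recovery combines a base backup with the WAL shipped after it. The Python sketch below shows only the planning step: pick the newest base backup at or before the target time, then list the WAL segments to replay. The function name and the use of plain numeric timestamps are assumptions; real tooling identifies backups and segments by name and lets PostgreSQL stop replay at the recovery target.

```python
from bisect import bisect_right
from typing import List, Tuple


def plan_recovery(base_backups: List[float],
                  wal_segments: List[Tuple[float, float]],
                  target: float) -> Tuple[float, List[Tuple[float, float]]]:
    """Choose a base backup and the WAL segments needed to reach `target`.

    base_backups: sorted completion times of base backups.
    wal_segments: sorted (start_time, end_time) pairs, one per shipped segment.
    """
    idx = bisect_right(base_backups, target)
    if idx == 0:
        raise ValueError("no base backup at or before the target time")
    backup = base_backups[idx - 1]
    # Replay every segment that overlaps the span from the backup to the target.
    replay = [seg for seg in wal_segments if seg[1] > backup and seg[0] <= target]
    return backup, replay
```

For example, with backups at 0, 100, and 200 and a target of 150, the plan starts from the backup at 100 and replays only the segments that overlap 100 to 150.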
disaster
HA recovery
STONITH
complicated project
modularize and build APIs
composable services
abstract-able services
Thanks!
@gregburek
@herokupostgres