The Database Designer’s Modern-Day Cookbook
Preetam Jinka, Software Engineer
Percona Live 2017
Topics
● Immutability
● Transactions & ACID
● Replication
But mainly about trade-offs.
Immutability
Not changing something once it’s created.
An immutable database?
Just use a log.
● Write optimized
● Transactional
● Everything else is just a read optimization, right?
● Space might become a problem...
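A minimal sketch of the "just use a log" idea, in Python. The class name `AppendOnlyLog` and its methods are hypothetical, chosen only for illustration. Writes are a single append, reads have to scan (which is where read optimizations would come in), and old values are never reclaimed (which is where space becomes a problem).

```python
class AppendOnlyLog:
    """A toy immutable database: records are only ever appended."""

    def __init__(self):
        self._records = []  # (key, value) pairs in write order

    def put(self, key, value):
        # A write is a single append; existing records are never modified.
        self._records.append((key, value))

    def get(self, key):
        # A read scans backwards for the most recent record for the key.
        for k, v in reversed(self._records):
            if k == key:
                return v
        return None

    def size(self):
        # Number of records kept, including superseded ones.
        return len(self._records)


log = AppendOnlyLog()
log.put("a", 1)
log.put("a", 2)
print(log.get("a"), log.size())  # prints: 2 2
```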
Something more realistic
● Two ways to update data.
○ In-place
○ Copy-on-write (the immutable approach)
[Diagram: in-place update vs. copy-on-write]
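The two update styles can be sketched with plain Python dictionaries. The function names are hypothetical and a "row" here is just a dict; the point is only what a reader holding the old row sees after each kind of update.

```python
def update_in_place(row, column, value):
    # Mutates the existing row; anyone holding a reference sees the change.
    row[column] = value
    return row


def update_copy_on_write(row, column, value):
    # Leaves the original untouched and returns a new version.
    new_row = dict(row)
    new_row[column] = value
    return new_row


original = {"id": 1, "name": "old"}
reader_view = original  # a reader holding the current row

v2 = update_copy_on_write(original, "name", "new")
print(reader_view["name"], v2["name"])  # prints: old new

update_in_place(original, "name", "new")
print(reader_view["name"])  # prints: new
```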
Concurrency
Databases need to handle multiple readers and writers.
And they need to provide certain guarantees (transactions).
In-place updates with ARIES
Algorithms for Recovery and Isolation Exploiting Semantics
● Write-ahead logging
● Redo logs
● Undo logs
Systems like MySQL, Oracle, SQL Server, and DB2 use something like ARIES to manage transactions.
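A heavily simplified sketch of how the three pieces fit together, in Python. This is not ARIES itself (there are no LSNs, checkpoints, or analysis pass), and the class `TinyWAL` is hypothetical. It assumes two transactions never write the same key concurrently (the job of row locks in a real system) and that `None` is never stored as a value.

```python
class TinyWAL:
    """Toy in-place store with a write-ahead log holding undo and redo info."""

    def __init__(self):
        self.data = {}  # the "pages", updated in place
        self.log = []   # (txn, key, old, new) change records or (txn, "COMMIT")

    def write(self, txn_id, key, value):
        # Log first (write-ahead), then update in place.
        self.log.append((txn_id, key, self.data.get(key), value))
        self.data[key] = value

    def commit(self, txn_id):
        self.log.append((txn_id, "COMMIT"))

    def recover(self):
        """Rebuild state from the log: redo everything, then undo losers."""
        committed = {rec[0] for rec in self.log if len(rec) == 2}
        data = {}
        # Redo pass: replay every logged change in order.
        for rec in self.log:
            if len(rec) == 4:
                _, key, _, new = rec
                data[key] = new
        # Undo pass: roll back uncommitted changes, newest first.
        for rec in reversed(self.log):
            if len(rec) == 4 and rec[0] not in committed:
                _, key, old, _ = rec
                if old is None:
                    data.pop(key, None)
                else:
                    data[key] = old
        self.data = data


db = TinyWAL()
db.write(1, "x", "a")
db.commit(1)
db.write(2, "x", "b")  # transaction 2 never commits
db.write(2, "y", "c")
db.recover()           # simulate crash recovery
print(db.data)         # prints: {'x': 'a'}
```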
Copy-on-write with row versioning
● Systems like PostgreSQL create a copy of data when it needs to be changed.
● Immutability is inherently free of data races!
● But you need to get rid of old versions through vacuuming.
● You also need to manage the overhead of multiple versions.
● This is why systems like PostgreSQL don’t have an “undo log.”
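A rough sketch of row versioning and vacuuming, in Python. The class `VersionedTable` and its logical clock are assumptions made for illustration; PostgreSQL's real visibility rules are based on transaction IDs and are considerably more involved.

```python
class VersionedTable:
    """Toy copy-on-write table: each update appends a new row version."""

    def __init__(self):
        self.versions = {}  # key -> list of (created_at, value), oldest first
        self.clock = 0      # logical timestamp, bumped on every write

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        # A reader remembers the clock; later writes are invisible to it.
        return self.clock

    def read(self, key, snapshot):
        # Return the newest version created at or before the snapshot.
        for created_at, value in reversed(self.versions.get(key, [])):
            if created_at <= snapshot:
                return value
        return None

    def vacuum(self, oldest_active_snapshot):
        # For each key, keep the newest version visible to the oldest active
        # snapshot plus anything newer; drop the rest. Returns the number of
        # versions removed.
        removed = 0
        for key, chain in self.versions.items():
            keep_from = 0
            for i, (created_at, _) in enumerate(chain):
                if created_at <= oldest_active_snapshot:
                    keep_from = i
            removed += keep_from
            self.versions[key] = chain[keep_from:]
        return removed


t = VersionedTable()
t.write("k", "v1")
old_snapshot = t.snapshot()
t.write("k", "v2")
t.write("k", "v3")
print(t.read("k", old_snapshot))  # prints: v1
print(t.read("k", t.snapshot()))  # prints: v3
```

Readers never block writers here because nothing is overwritten, but nothing is freed either until `vacuum` runs with a snapshot that is new enough.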
> … PG must do more work at commit time, right?
No. Commit and abort are both O(1). Where we pay the piper is in
having to run VACUUM to clean up no-longer-needed row versions.
This is a better design in principle, because the necessary maintenance
can be done in background processes rather than making clients wait
for transactions to finish. In practice, it's still pretty annoying,
just in different ways than Oracle's UNDO.
http://www.postgresql-archive.org/PG-and-undo-logging-td5850789.html
Secondary indexes
● Secondary indexes need to point to the original row.
● For MySQL, the index entry just needs the primary key.
● For PostgreSQL, the index entry needs to identify a specific row version, not just the primary key.
○ The primary key alone isn’t enough because there could be several versions of the same row!
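The difference can be sketched with two toy index layouts in Python. All names here are hypothetical and the structures are far simpler than real B-tree indexes; the sketch only shows why an update to an unindexed column touches the index in one design and not the other.

```python
# Design A: the secondary index maps a column value to primary keys.
rows_by_pk = {1: {"name": "ann", "city": "NYC"}}
city_index_a = {"NYC": {1}}

# Updating an unindexed column in place leaves the index untouched.
rows_by_pk[1]["name"] = "anne"

# Design B: the secondary index maps a column value to row-version slots.
heap = [{"pk": 1, "name": "ann", "city": "NYC"}]  # slot 0
city_index_b = {"NYC": {0}}

# A copy-on-write update puts the new version in a new slot, so the index
# needs a new entry even though "city" did not change.
heap.append({**heap[0], "name": "anne"})          # slot 1
city_index_b["NYC"].add(1)

print(city_index_a)  # prints: {'NYC': {1}}
print(city_index_b)  # prints: {'NYC': {0, 1}}
```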
What’s better?
It’s a trade-off!
Uber’s Migration to MySQL
Uber migrated from PostgreSQL to MySQL. Their reasons:
● Inefficient architecture for writes
● Inefficient data replication
● Issues with table corruption
● Poor replica MVCC support
● Difficulty upgrading to newer releases
https://eng.uber.com/mysql-migration/
In other words: PostgreSQL probably didn’t have a set of trade-offs that worked well for them.
Transactions & ACID
Transactions are complicated!
MVCC, ARIES, undo logs, redo logs, row locks, isolation levels, ACID, write skew, snapshot isolation, write-ahead log, consistency, commit
This is about making them simple.
ACID transactions
● Atomicity
● Consistency
● Isolation
● Durability
What do you need for ACID?
1. A snapshot view of the data
2. Durable, atomic writes
● Immutability makes #1 easier.
● Single writer makes #2 easier.
You can get both from a log.
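A minimal sketch of getting both from a log, in Python. The class `LogStore` is hypothetical, and it lives purely in memory: a real system would also have to flush each appended batch to stable storage before acknowledging it, which is the durability half of #2 and is not shown here.

```python
import threading


class LogStore:
    """Toy store: an append-only log of committed batches plus one writer lock."""

    def __init__(self):
        self._log = []                        # committed batches (dicts), in order
        self._writer_lock = threading.Lock()  # enforces a single writer

    def commit(self, batch):
        # The whole batch becomes visible with one append: all or nothing.
        with self._writer_lock:
            self._log.append(dict(batch))

    def snapshot(self):
        # A snapshot is just a log length; earlier entries never change.
        return len(self._log)

    def read(self, key, snapshot):
        # Look only at batches committed before the snapshot was taken.
        for batch in reversed(self._log[:snapshot]):
            if key in batch:
                return batch[key]
        return None


store = LogStore()
store.commit({"a": 1, "b": 1})
snap = store.snapshot()
store.commit({"a": 2})
print(store.read("a", snap))              # prints: 1
print(store.read("a", store.snapshot()))  # prints: 2
```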
...but transactions & ACID in the real world tend to be much more complicated...
...because not everything uses immutability, and most systems are not single writer.
Replication
Copying data to several places.
Replication choices
● Asynchronous
● Synchronous
○ Semi-synchronous is another option with MySQL
As usual… trade-offs.
Replication spectrum
Synchronous ↔ Semi-sync ↔ Asynchronous
● Synchronous: most guarantees, least flexible
● Asynchronous: least guarantees, most flexible
Synchronous
● Requires coordination at the master
○ Coordination can get complicated...
● There’s waiting involved
○ Replica lag doesn’t exist
● Safe
● You need some sort of “master” or “leader” server handling coordination.
● Leader election and consensus are ways of selecting a master automatically.
○ Paxos is a consensus algorithm that’s widely used. MySQL Group Replication uses a variant in its multi-master approach.
[Diagram: a master and two replicas]
Master pushes to replicas
Asynchronous
[Diagram: a master and two replicas]
Replicas pull from the master
● Less coordination
● Pro: Master doesn’t wait for replicas.
○ It’s faster because there’s no waiting.
● Con: Master doesn’t wait for replicas.
○ Replicas can fall behind.
● Delayed replicas can be really useful when things go wrong!
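The push and pull models can be contrasted in a small Python sketch. The classes and the `pull` and `lag` helpers are hypothetical, there is no network or failure handling, and "lag" is measured simply as the number of log entries a replica is missing.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, entries):
        self.log.extend(entries)


class Master:
    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous

    def write(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Push to every replica before acknowledging the write.
            for replica in self.replicas:
                replica.apply([entry])
        return "ack"


def pull(replica, master):
    # An asynchronous replica fetches whatever it is missing, whenever it runs.
    replica.apply(master.log[len(replica.log):])


def lag(replica, master):
    return len(master.log) - len(replica.log)


sync_replica = Replica()
sync_master = Master([sync_replica], synchronous=True)
sync_master.write("w1")
print(lag(sync_replica, sync_master))    # prints: 0

async_replica = Replica()
async_master = Master([async_replica], synchronous=False)
async_master.write("w1")
async_master.write("w2")
print(lag(async_replica, async_master))  # prints: 2
pull(async_replica, async_master)
print(lag(async_replica, async_master))  # prints: 0
```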
Disaster recovery with a delayed replica
DigitalOcean’s April 2017 Outage:
“Within three minutes of the initial alerts, we discovered that our primary
database had been deleted. Four minutes later we commenced the recovery
process, using one of our time-delayed database replicas. Over the next four
hours, we copied and restored the data to our primary and secondary replicas.”
https://www.digitalocean.com/company/blog/update-on-the-april-5th-2017-outage/
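Why a delayed replica helps can be shown with a toy model in Python. This is not how DigitalOcean's setup worked; the class `DelayedReplica`, the integer timestamps, and the use of `None` to mean "delete" are all assumptions made for illustration. Changes are assumed to arrive in timestamp order.

```python
class DelayedReplica:
    """Toy replica that applies a change only once it is `delay` seconds old."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = []  # (timestamp, key, value) not yet applied
        self.data = {}

    def receive(self, timestamp, key, value):
        # value None stands for a delete in this sketch.
        self.pending.append((timestamp, key, value))

    def apply_until(self, now):
        # Apply everything that is at least `delay` old; keep the rest pending.
        still_pending = []
        for ts, key, value in self.pending:
            if now - ts >= self.delay:
                if value is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = value
            else:
                still_pending.append((ts, key, value))
        self.pending = still_pending


replica = DelayedReplica(delay=3600)
replica.receive(0, "users", "rows")
replica.receive(5000, "users", None)  # accidental delete on the master
replica.apply_until(5100)
print(replica.data)                   # prints: {'users': 'rows'}
```

The delete is still sitting in `pending`, so an operator who notices the mistake within the delay window can stop the replica and recover from its data.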
The diagrams look similar but they’re very different.
[Diagram: a master and two replicas] Replicas pull from the master
[Diagram: a master and two replicas] Master pushes to replicas
Use the right tool for the job.
Final thoughts
● There are trade-offs everywhere.
● You’re not limited to a single technology or implementation.
● Things keep getting more exciting.
Questions?