Unique ID generation in distributed systems

Post on 29-Nov-2014



A run through of the various options available for generating unique IDs


ID generation

PHP London, 2012-08-02

@davegardnerisme

hailoapp.com/dave (for a £5 discount)

MySQL auto increment

[Diagram: a web app and a single MySQL server in DC 1; IDs are issued as 1, 2, 3, 4…]

MySQL auto increment

• Numeric IDs

• Go up with time

• Not resilient

MySQL multi-master replication

[Diagram: a web app in DC 1 backed by two MySQL masters; one issues 1, 3, 5, 7… and the other 2, 4, 6, 8…]

MySQL multi-master replication

• Numeric IDs

• Do not go up with time

• Some resilience
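
The odd/even split on the previous slide can be sketched in a few lines of PHP. In MySQL itself this interleaving is normally configured with the auto_increment_increment and auto_increment_offset system variables; the masterIds() helper below is purely illustrative and only simulates the sequences each master would hand out.

```php
<?php
// Illustrative helper (not a MySQL API): the IDs one master issues
// for a given offset and increment.
function masterIds(int $offset, int $increment, int $count): array
{
    $ids = [];
    for ($i = 0; $i < $count; $i++) {
        $ids[] = $offset + $i * $increment;
    }
    return $ids;
}

$masterA = masterIds(1, 2, 4); // odd IDs
$masterB = masterIds(2, 2, 4); // even IDs

echo implode(',', $masterA), "\n";
echo implode(',', $masterB), "\n";
```

The two sequences never collide, but each master advances at its own pace, so a higher ID does not imply a later insert.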

Going global…

[Diagram: six data centres, DC 1 to DC 6]

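MySQL in multi DC setup

[Diagram: a web app and MySQL in DC 1 issuing 1, 2, 3…; a web app in DC 2 is connected only by a WAN link, and where it gets its IDs from is marked "?"]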

Flickr MySQL ticket server

[Diagram: DC 1 and DC 2 each have a web app with a local ticket server, joined by a WAN link; one ticket server issues 1, 3, 5… and the other 4, 6, 8…]

WAN link not required to generate an ID

Flickr MySQL ticket server

• Numeric IDs

• Do not go up with time

• Resilient and distributed

• ID generation separated from data store

The anatomy of a ticket server

[Diagram: four web apps in one DC sharing a single ticket server]

Making things simpler

[Diagram: four web apps in one DC, each with its own local ID generator]

UUIDs

• 128 bits

• Could use type 4 (Random) or type 1 (MAC address with time component)

• Can generate on each machine with no co-ordination

Type 4 – random

xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

f47ac10b-58cc-4372-a567-0e02b2c3d479

• version: the fixed 4 at the start of the third group

• variant: y at the start of the fourth group (8, 9, A or B)
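
A minimal sketch of how a type 4 UUID with this layout could be produced in PHP 7 or later. The function name uuid4() is an assumption made for illustration; it is not from the talk or from any library.

```php
<?php
// Sketch of a type 4 (random) UUID generator; uuid4() is an illustrative name.
function uuid4(): string
{
    $bytes = random_bytes(16);                        // 128 random bits
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // force version nibble to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // force variant bits to 10xx (8, 9, a or b)

    // 32 hex digits, split into 8 groups of 4 and joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid4(), "\n";
```

Six of the 128 bits are fixed by the version and variant, which leaves 122 random bits.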

5.3 × 10^36

possible values for a type 4 UUID

1.1 × 10^19

UUIDs we could generate per second since the Universe began

2.1 × 10^27

Olympic swimming pools filled if each possible value contributed a millilitre
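
The three figures can be roughly reproduced with a little arithmetic. The age of the Universe (13.8 billion years) and the pool size (50 m × 25 m × 2 m) used below are assumptions, not values given in the talk. With that age the per-second rate comes out near 1.2 × 10^19, the same order of magnitude as the slide's figure, which presumably used a slightly different age.

```php
<?php
// A type 4 UUID has 128 bits; 4 version bits and 2 variant bits are fixed.
$possibleValues = pow(2, 122);                      // float, about 5.3e36

// Assumption: the Universe is about 13.8 billion years old.
$universeAgeSeconds = 13.8e9 * 365.25 * 24 * 3600;
$perSecond = $possibleValues / $universeAgeSeconds;

// Assumption: an Olympic pool holds 50 m x 25 m x 2 m = 2,500 m^3 = 2.5e9 ml.
$pools = $possibleValues / 2.5e9;

printf("possible values:  %.1e\n", $possibleValues);
printf("per second:       %.1e\n", $perSecond);
printf("pools at 1 ml/ID: %.1e\n", $pools);
```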

Type 1 – MAC address

51063800-dc76-11e1-9fae-001c42000009

• Time component is based on 100 nanosecond intervals since October 15, 1582

• The timestamp is stored low bits first (time_low, time_mid, time_hi), so its most significant bits end up in less significant positions of the UUID

Type 1 – MAC address

• The address (MAC) of the computer that generated the ID is encoded into it

• Lexical ordering essentially meaningless

• Deterministically unique
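
To make the timestamp layout concrete, here is a sketch that pulls the time back out of the example type 1 UUID. It assumes 64-bit PHP, and the function name uuid1ToUnixSeconds() is made up for illustration. The constant is the number of 100 nanosecond intervals between October 15, 1582 and the Unix epoch.

```php
<?php
// Sketch: recover the Unix time from a type 1 UUID (assumes 64-bit PHP).
function uuid1ToUnixSeconds(string $uuid): int
{
    // The first three groups hold the timestamp, lowest bits first.
    [$timeLow, $timeMid, $timeHiAndVersion] = explode('-', $uuid);

    // Drop the leading version digit, then reassemble high bits first.
    $hex = substr($timeHiAndVersion, 1) . $timeMid . $timeLow;
    $ticks = hexdec($hex);                      // 100 ns intervals since 1582-10-15

    $gregorianToUnix = 122192928000000000;      // 100 ns intervals from 1582-10-15 to 1970-01-01
    return intdiv($ticks - $gregorianToUnix, 10000000);
}

$seconds = uuid1ToUnixSeconds('51063800-dc76-11e1-9fae-001c42000009');
echo gmdate('Y-m-d H:i:s', $seconds), " UTC\n"; // 2012-08-02 07:47:32 UTC
```

The decoded time falls on the day of the talk. Because the low bits of the timestamp lead the string, two UUIDs generated moments apart can sort in either order.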

There are some other options…

No co-ordination needed

Deterministically unique

K-ordered (roughly time-ordered when sorted lexically)

Twitter Snowflake

• Under 64 bits

• No co-ordination (after startup)

• K-ordered

• Scala service, Thrift interface, uses Zookeeper for configuration

Twitter Snowflake

41 bits: timestamp (millisecond precision, bespoke epoch)

10 bits: configured machine ID

12 bits: sequence number

Twitter Snowflake

77669839702851584

= (timestamp << 22) | (machine << 12) | sequence
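
The formula can be run in both directions. The sketch below assumes 64-bit PHP and the 41/10/12 bit layout from the previous slide; composeId() and decomposeId() are illustrative names, not part of Snowflake.

```php
<?php
// Sketch for 64-bit PHP; function names are illustrative, not Snowflake's API.
function composeId(int $timestamp, int $machine, int $sequence): int
{
    return ($timestamp << 22) | ($machine << 12) | $sequence;
}

function decomposeId(int $id): array
{
    return [
        'timestamp' => $id >> 22,            // milliseconds since the bespoke epoch
        'machine'   => ($id >> 12) & 0x3ff,  // 10-bit machine ID
        'sequence'  => $id & 0xfff,          // 12-bit sequence number
    ];
}

print_r(decomposeId(77669839702851584));
```

Under this layout the example ID decodes to timestamp 18517932821, machine 0 and sequence 0. Since the timestamp occupies the highest bits, sorting IDs numerically sorts them by time first.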

Boundary Flake

• 128 bits

• No co-ordination at all

• K-ordered

• Erlang service

Boundary Flake

64 bits: timestamp (millisecond precision, 1970 epoch)

48 bits: MAC address

16 bits: sequence number
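
A rough sketch of how the three fields could be packed into a 128-bit value, shown here as a 32-digit hex string. This is not Flake's actual code or output format: flakeId() is an invented name, 64-bit PHP is assumed, and the MAC address is borrowed from the type 1 UUID example earlier in the deck.

```php
<?php
// Sketch for 64-bit PHP; flakeId() is an illustrative name, not Flake's API.
function flakeId(int $timestampMs, int $mac, int $sequence): string
{
    // 16 + 12 + 4 hex digits = 64 + 48 + 16 bits
    return sprintf('%016x%012x%04x', $timestampMs, $mac, $sequence);
}

$mac = 0x001c42000009;                 // example MAC, treated as a 48-bit integer
$a = flakeId(1343893652143, $mac, 0);
$b = flakeId(1343893652143, $mac, 1);  // same millisecond, next sequence number
$c = flakeId(1343893652144, $mac, 0);  // next millisecond

echo $a, "\n", $b, "\n", $c, "\n";
```

With fixed-width fields and the timestamp first, plain string comparison orders the IDs by time, then by machine, then by sequence.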

PHP Cruftflake

• Based on Twitter Snowflake

• No co-ordination (after startup)

• K-ordered

• PHP, ZeroMQ interface, uses Zookeeper for configuration

Questions?

References

Flickr distributed ticket server: http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/

UUIDs: http://tools.ietf.org/html/rfc4122

How random are random UUIDs?: http://stackoverflow.com/a/2514722/15318

Twitter Snowflake: https://github.com/twitter/snowflake

Boundary Flake: https://github.com/boundary/flake

PHP Cruftflake: https://github.com/davegardnerisme/cruftflake

private function mintId64($timestamp, $machine, $sequence)
{
    // 64-bit PHP: a native integer can hold the whole ID
    $timestamp = (int)$timestamp;
    $value = ($timestamp << 22) | ($machine << 12) | $sequence;
    return (string)$value;
}

private function mintId32($timestamp, $machine, $sequence)
{
    // 32-bit PHP: build the ID as two 32-bit halves
    $hi = (int)($timestamp / pow(2, 10));
    $lo = (int)($timestamp * pow(2, 22));

    // stick in the machine + sequence to the low bits
    $lo = $lo | ($machine << 12) | $sequence;

    // reconstruct into a string of numbers
    $hex = pack('N2', $hi, $lo);
    $unpacked = unpack('H*', $hex);
    $value = $this->hexdec($unpacked[1]);
    return (string)$value;
}

public function generate()
{
    $t = floor($this->timer->getUnixTimestamp() - $this->epoch);
    if ($t !== $this->lastTime) {
        // new time unit: restart the sequence
        $this->sequence = 0;
        $this->lastTime = $t;
    } else {
        // same time unit: take the next sequence number (12 bits, max 4095)
        $this->sequence++;
        if ($this->sequence > 4095) {
            throw new \OverflowException('Sequence overflow');
        }
    }

    if (PHP_INT_SIZE === 4) {
        return $this->mintId32($t, $this->machine, $this->sequence);
    } else {
        return $this->mintId64($t, $this->machine, $this->sequence);
    }
}