Sharding using MySQL and PHP

Post on 27-Jan-2015

description

In deploying MySQL, scale-out techniques can be used to scale out reads, but for scaling out writes, other techniques have to be used. To distribute writes over a cluster, it is necessary to shard the database and store the shards on separate servers. This session provides a brief introduction to traditional MySQL scale-out techniques in preparation for a discussion on the different sharding techniques that can be used with MySQL server and how they can be implemented with PHP. You will learn about static and dynamic sharding schemes, their advantages and drawbacks, techniques for locating and moving shards, and techniques for resharding.

transcript

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Sharding using PHP

Mats Kindahl (Senior Principal Software Developer)

About the Presentation

After this presentation you should know what sharding is and the basic caveats surrounding sharding. You should also have an idea of what is needed to develop a sharding solution.

Program Agenda

Why do we shard

Introduction to sharding

High-level sharding architecture

Elements of a sharding solution

Sharding planning

What is sharding?

● Slice your database into independent data “shards”

● Queries execute only on one shard

● Shards can be stored on different servers

● Also known as splintering or horizontal partitioning

Sharding for locality

● “Big Data” close to user

Sharding for performance

Reduced working set

Parallel processing

Database vs. cache

Sharding Limitations

● Auto-increment

– Composite key

– Distributed key generation

– UUID?

● Cross-shard joins

– Very expensive: avoid them

– Federated tables?
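The auto-increment limitation arises because every shard would hand out the same counter values. A minimal sketch of the composite-key idea follows, assuming a 16-bit shard field packed below a per-shard counter; the function names and the bit layout are hypothetical and not taken from the slides.

```php
<?php
// Assumption: shard IDs fit in 16 bits and local counters in 47 bits,
// so the packed value stays a positive 64-bit integer (needs 64-bit PHP).
const SHARD_BITS = 16;
const LOCAL_BITS = 47;

// Pack a shard ID and a per-shard counter value into one global key.
function make_global_id(int $shardId, int $localId): int
{
    if ($shardId < 0 || $shardId >= (1 << SHARD_BITS)) {
        throw new InvalidArgumentException('shard ID out of range');
    }
    if ($localId < 0 || $localId >= (1 << LOCAL_BITS)) {
        throw new InvalidArgumentException('local ID out of range');
    }
    return ($localId << SHARD_BITS) | $shardId;
}

// Recover both parts from a global key.
function split_global_id(int $globalId): array
{
    return [
        'shard_id' => $globalId & ((1 << SHARD_BITS) - 1),
        'local_id' => $globalId >> SHARD_BITS,
    ];
}
```

Two shards can now both use local counter value 1 without colliding, and the shard that owns a row can be read back from the key itself.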

Developing a Sharding Solution

High-level Architecture

● Broker

– Distributes queries

● Sharding Database

– Information about the shards

– If it goes down, all goes down

– Needs to be highly available (HA)

Running Example: Employees sample database

Table          Rows
salaries       2 844 047
titles         443 308
employees      300 024
dept_emp       331 603
dept_manager   24
departments    9

Areas to cover in a sharding solution

● Data

● Meta-Data

● Query

● Operations

Data

● Partition Data

– Key Columns

– Dependent Columns

– Tables to Shard

● Mapping Keys

– Range Mapping

– Hash Mapping

– List Mapping

● Shard Allocation

– Single Shard

– Multiple Shards

Partitioning the data

Table          Rows
salaries       284 404 700
titles         44 330 800
employees      30 002 400
dept_emp       33 160 300
dept_manager   2 400
departments    900

Partitioning the data: sharding column(s)

● Sharding columns dictated by queries

– Queries should give same result before and after sharding

● One or more columns

– Does not have to be primary key, but easier if it is

● Sharding key is needed for re-sharding

emp_no   birth_date   first_name   last_name   gender   hire_date
4711     1989-06-13   John         Smith       M        2009-12-24
19275    1954-11-12   Sally        Smith       F        1975-01-01
27593    1477-05-19   Mats         Kindahl     M        2002-02-27
587003   1830-08-28   Charles      Bell        M        2003-11-31

Partitioning the data: sharding column(s)

● Choice of sharding columns

– Distribution

– Locality

● Avoid non-unique keys

– Difficult to get good distribution

– Avoid: Country

– Prefer: Employee ID

[Figure: uneven distribution by country, with the labels US, SE, 9 million and 200 million]

Partitioning the data: dependent columns

● Referential Integrity Constraint

– Example query joining salaries and employees

– Same key, same shard

● JOIN within a shard

SELECT first_name, last_name, salary
FROM salaries JOIN employees USING (emp_no)
WHERE emp_no = 21012
  AND CURRENT_DATE BETWEEN from_date AND to_date;

Partitioning the data: dependent columns

● Referential Integrity

– Foreign Keys

● Dependent rows

– Same shard

– Join on equality

● Sharding Columns

– Follow foreign keys

Handy query to find all dependent columns:

mysql> SELECT table_schema, table_name, column_name
    -> FROM
    ->   information_schema.key_column_usage
    -> JOIN
    ->   information_schema.table_constraints
    -> USING
    ->   (table_schema, table_name, constraint_name)
    -> WHERE constraint_type = 'FOREIGN KEY'
    ->   AND referenced_table_schema = 'employees'
    ->   AND referenced_table_name = 'employees'
    ->   AND referenced_column_name = 'emp_no';
+--------------+--------------+-------------+
| table_schema | table_name   | column_name |
+--------------+--------------+-------------+
| employees    | dept_emp     | emp_no      |
| employees    | dept_manager | emp_no      |
| employees    | salaries     | emp_no      |
| employees    | titles       | emp_no      |
+--------------+--------------+-------------+
4 rows in set (0.56 sec)

Partitioning the data: unsharded tables

● Referential Integrity Constraint

– Join with sharded tables

– Tables dept_emp (and dept_manager) reference two tables

● Shard table departments?

– Not necessary: small table

– Difficult to get right: keeping shards of two tables in same location

SELECT first_name, last_name, GROUP_CONCAT(dept_name)
FROM employees JOIN dept_emp USING (emp_no)
     JOIN departments USING (dept_no)
WHERE emp_no = 21012
GROUP BY emp_no;

Partitioning the data: unsharded tables

● Solution: do not shard departments

– Keep table on all shards

– Joins will only need to address one shard

● You need to consider

… how to update unsharded table

SELECT first_name, last_name, GROUP_CONCAT(dept_name)
FROM employees JOIN dept_emp USING (emp_no)
     JOIN departments USING (dept_no)
WHERE emp_no = 21012
GROUP BY emp_no;

Mapping Keys to Shards

● Given

– Sharding key value

– Optional other information (tables accessed, RO or RW, etc.)

● Provide the following

– Shard location (host, port)

– Shard identifier (if you have multiple shards for each server)

Mapping Keys to Shards

● Range Mapping: range of values for each shard

– Type-dependent

● Hash Mapping: hash of key to find shard

– Type-independent

– Complicated?

● List Mapping: list of keys for each shard

– Does not offer good distribution

Shard Allocation: Single Shard per Server

● Idea: there is only one shard on each server

● Advantage: Cross-database queries do not require rewrite

● Disadvantage: Expensive to balance server load

… moving hot data from server requires re-sharding

SELECT first_name, last_name
FROM employees.employees JOIN expenses.reciepts USING (emp_no)
WHERE currency = 'USD'

Shard Allocation: Multiple Shards per Server

● Idea: Keep several “virtual shards” on each server

● Advantages

– Easier to balance load of servers

… move hot virtual shards to other server

– Improves performance

– Increases availability

Shard Allocation: Multiple Shards per Server

● Disadvantage: cross-database queries require rewrite

– Error-prone

– Expensive?

● Queries that go to one database are not a problem

SELECT first_name, last_name
FROM employees.employees JOIN expenses.reciepts USING (emp_no)
WHERE currency = 'USD'

Shard Allocation: Multiple Shards per Server

● Idea: Add suffix to database name (optionally table name)

employees_N.employees

employees_N.employees_N

● Idea: Keep substitution pattern in query string

SELECT first_name, last_name
FROM {employees.employees} JOIN {expenses.reciepts} USING (emp_no)
WHERE currency = 'USD'

Shard Allocation: Multiple Shards per Server

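class my_mysqli extends mysqli {
  // Identifier of the virtual shard this connection should address
  public $shard_id;

  public function query($query, $resultmode = MYSQLI_STORE_RESULT)
  {
    // Rewrite each {db.table} placeholder to db_<shard_id>.table
    $real_query = preg_replace('/\{(\w+)\.(\w+)\}/',
                               "$1_{$this->shard_id}.$2",
                               $query);
    return parent::query($real_query, $resultmode);
  }
}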
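To try the substitution pattern without a database connection, the rewrite step can be pulled out into a plain function. This is a sketch only: `rewrite_for_shard` is a hypothetical name, and it uses the `${1}` form of the back-reference so that the group number is unambiguous next to the shard suffix.

```php
<?php
// Stand-alone version of the placeholder rewrite (hypothetical helper).
// Every {db.table} in the query becomes db_<shardId>.table.
function rewrite_for_shard(string $query, int $shardId): string
{
    // Single quotes keep PHP from interpolating; preg_replace itself
    // expands ${1} and $2 to the captured database and table names.
    $replacement = '${1}_' . $shardId . '.$2';
    return preg_replace('/\{(\w+)\.(\w+)\}/', $replacement, $query);
}
```

For shard 3, `{employees.employees}` is rewritten to `employees_3.employees`, while text outside the braces is left alone.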

Areas that we need to cover: Data, Meta-Data, Query, Operations

Meta Data

● Mapping Schemes

– Range Mapping

– Hash Mapping

– List Mapping

● Shard Information

– Shard ID

– Shard Host

– Shard Specifics*

● Mapping Methods

– Static Sharding

– Dynamic Sharding

* If you use multiple shards per server

Mapping Methods: Static Sharding

● Idea: Compute shard statically

● Advantages

– Simple

– No extra lookups

– No single point of failure

● Disadvantage

– Lack of flexibility

Mapping Methods: Static Sharding, in code

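class Dictionary {
  private $emp_no;
  // $this->shardinfo is assumed to be filled in by the constructor
  public function __construct() { ... }
  public function set_key($emp_no) { $this->emp_no = $emp_no; }

  public function get_connection() {
    // Static mapping: the key modulo the number of shards picks the entry
    $i = $this->shardinfo[$this->emp_no % count($this->shardinfo)];
    // The "p:" host prefix requests a persistent connection
    return new mysqli("p:{$i->host}", $i->user, $i->passwd, $i->db, $i->port);
  }
}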

● Dictionary class

● Input: sharding key

● Output: connection

Mapping Methods: Static Sharding, in code

$HIRED = <<<END_OF_QUERY
SELECT first_name, last_name, hire_date, salary
  FROM employees AS e, salaries AS s
 WHERE s.emp_no = e.emp_no
   AND e.emp_no = ?
   AND CURRENT_DATE BETWEEN s.from_date AND s.to_date
END_OF_QUERY;

$DICTIONARY = new Dictionary();

$DICTIONARY->set_key($emp_no);
$link = $DICTIONARY->get_connection();
if ($stmt = $link->prepare($HIRED)) {
  $stmt->bind_param('i', $emp_no);
  $stmt->execute();
  $stmt->bind_result($first, $last, $hire, $salary);
  while ($stmt->fetch())
    printf("%s %s was hired on %s and has a salary of %s\n",
           $first, $last, $hire, $salary);
}

Mapping Methods: Dynamic Sharding

● Idea: use a sharding database to keep track of shard locations

● Advantages:

– Easy to migrate shards

– Easy to re-shard

● Disadvantages:

– Complex

● Performance?

Dynamic sharding, in code

$FETCH_SHARD = <<<END_OF_QUERY
shard selection query
END_OF_QUERY;

class Dictionary {
  private $dict;
  private $emp_no;

  public function __construct() {
    $this->dict = new mysqli('shardinfo.example.com', ...);
  }

  public function set_key($emp_no) { $this->emp_no = $emp_no; }

  public function get_connection() {
    global $FETCH_SHARD;  // the query text is defined at file scope
    $stmt = $this->dict->prepare($FETCH_SHARD);
    $stmt->bind_param('i', $this->emp_no);
    $stmt->execute();
    $stmt->bind_result($no, $host, $user, $passwd, $db, $port);
    $stmt->fetch();
    return new mysqli("p:{$host}", $user, $passwd, $db, $port);
  }
}

Mapping Schemes: Range Mapping

● Most basic scheme

● One row for each range

● Just store lower bound

Shard ID   Lower
0          0
1          20000
2          50000

SELECT shard_id, hostname, port
FROM shard_ranges JOIN shard_locations USING (shard_id)
WHERE key_id = 1 AND 2345 >= shard_ranges.lower_bound
ORDER BY shard_ranges.lower_bound DESC LIMIT 1;
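The same lookup, picking the greatest lower bound that does not exceed the key, can be sketched in PHP against an in-memory copy of the range table. The function name and the array layout (shard ID mapped to lower bound) are assumptions made for illustration.

```php
<?php
// Hypothetical in-memory copy of the range table: shard ID => lower bound.
// Returns the shard whose lower bound is the largest one not above $key.
function find_range_shard(array $lowerBounds, int $key): int
{
    $best = null;
    foreach ($lowerBounds as $shardId => $lower) {
        if ($key >= $lower && ($best === null || $lower > $lowerBounds[$best])) {
            $best = $shardId;
        }
    }
    if ($best === null) {
        throw new OutOfRangeException("no shard covers key $key");
    }
    return $best;
}
```

With the bounds 0, 20000 and 50000 from the table, key 2345 falls in shard 0 and key 20000 is the first key of shard 1.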

Mapping Schemes: Regular Hashing

● Computing a hash from the key

ShardID = SHA1(key) mod N

● Adding (or removing) a shard

… can require moving rows between many shards

… often a lot of rows
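How many rows move can be estimated with a small experiment. The sketch below assumes that only the first 7 hex digits of the SHA1 digest are used, so that the value fits in a native integer on any platform; the function names are hypothetical.

```php
<?php
// ShardID = SHA1(key) mod N, using a 28-bit prefix of the digest
// (an assumption made to stay within native integer range).
function hash_shard(string $key, int $numShards): int
{
    return hexdec(substr(sha1($key), 0, 7)) % $numShards;
}

// Count the keys whose shard changes when going from $n to $n + 1 shards.
function count_moved(array $keys, int $n): int
{
    $moved = 0;
    foreach ($keys as $key) {
        if (hash_shard((string)$key, $n) !== hash_shard((string)$key, $n + 1)) {
            $moved++;
        }
    }
    return $moved;
}
```

For uniformly distributed hash values, a key keeps its shard when going from 4 to 5 shards only if the hash modulo 20 is below 4, so roughly four out of five keys are expected to move.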

Mapping Schemes: Regular Hashing

[Figure: the keys emp_no=36912, emp_no=23456, emp_no=43210 and emp_no=20101 placed on shards 0 to 4 by HASH(key) mod N, before (N) and after (N+1) adding a shard]

Mapping Schemes: Consistent Hashing

● Computing a hash from the key

SHA1(key)

● Adding (or removing) a shard

… only require moving rows from one shard to the new shard

Shard ID   Hash
6          08b1286ad1bebe6...
2          1c2d4132144211a...
4          9893238ed75cfc9...
1          989bb9d2bc381f4...
5          cab8c76b85c4e24...
3          eccf30f69fe850f...
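A ring lookup of this kind can be sketched as follows. The ring is held in memory rather than in a table, each shard's ring point is assumed to be the SHA1 of its name, and `ring_add` and `ring_lookup` are hypothetical helper names.

```php
<?php
// Hypothetical in-memory ring: ring point (SHA1 hex string) => shard name.
// Assumption: a shard's ring point is the SHA1 of its name.
function ring_add(array $ring, string $shardName): array
{
    $ring[sha1($shardName)] = $shardName;
    ksort($ring, SORT_STRING);   // keep ring points in ascending order
    return $ring;
}

// A key belongs to the first ring point at or after its own hash,
// wrapping around to the lowest point when none is larger.
function ring_lookup(array $ring, string $key): string
{
    if (count($ring) === 0) {
        throw new LogicException('the ring has no shards');
    }
    $hash = sha1($key);
    foreach ($ring as $point => $shardName) {
        if (strcmp($hash, (string)$point) <= 0) {
            return $shardName;
        }
    }
    return reset($ring);
}
```

Adding a shard inserts one new point on the ring, so a key either keeps its shard or moves to the new one; no rows move between the existing shards.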

Mapping Schemes: Consistent Hashing

[Figure: hash ring holding shard1, shard2, shard3 and shard4, with the keys emp_no=20101, emp_no=43210, emp_no=23456 and emp_no=36912 placed on the ring and shard5 being added]

Areas that we need to cover: Data, Meta-Data, Query, Operations

Query Handling

● Query Dispatch

– Mechanism

– Single/Multi Cast

– Handling Reads

– Handling Updates

– Transaction Handling

● Sharding Key

– Parsing

– Application provided

● Connector Caches

– Time (TTL)

– On Error

– Explicit

Query Dispatch: Mechanism

● Proxy

– Sharding key extracted from query

– Requires extra hop

● Application level

– Application provides sharding key

– No extra hop

Query Dispatch: Query Type

● Read Query

– How do you ensure that it is executing on the right shard?

– How do you ensure that it is not cross-shard?

● Update Query

– Updating an unsharded table – think about consistency

Query Dispatch: Handling Transactions

● All statements of a transaction should go to the same session

– Sharding key on start of transaction?

– Is it a read-only or read-write transaction?

● Statements for different transactions can go to different sessions

– How to detect transaction boundaries

● Maintaining the session state

Query Dispatch: Handling Transactions

BEGIN;
SELECT salary INTO @s FROM salaries WHERE emp_no = 20101;
SET @s = 1.1 * @s;
INSERT INTO salaries VALUES (20101, @s);
COMMIT;
BEGIN;
INSERT INTO ...
COMMIT;

Annotations on the statements above:

● Sharding key? Ah, there it is!

● Session state?

● Hmm... looks like a read transaction

● Oops... it was a write transaction!

● Transaction done! Clear session state?

● New transaction! Different connection?
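The bookkeeping behind these questions can be illustrated with a small state tracker. This is a simplified sketch: the class name is hypothetical, and it classifies statements by their leading keyword only, which real SQL parsing cannot rely on.

```php
<?php
// Hypothetical broker-side helper. It only inspects the first keyword of
// each statement, which is a simplification of real SQL parsing.
class TransactionTracker
{
    private $open = false;   // inside BEGIN ... COMMIT?
    private $wrote = false;  // has the open transaction modified data?

    public function observe(string $sql): void
    {
        if (preg_match('/^\s*(BEGIN|START\s+TRANSACTION)\b/i', $sql)) {
            $this->open = true;
            $this->wrote = false;
        } elseif (preg_match('/^\s*(COMMIT|ROLLBACK)\b/i', $sql)) {
            // Transaction done: forget the per-transaction state
            $this->open = false;
            $this->wrote = false;
        } elseif ($this->open
                  && preg_match('/^\s*(INSERT|UPDATE|DELETE|REPLACE)\b/i', $sql)) {
            $this->wrote = true;
        }
    }

    public function inTransaction(): bool { return $this->open; }
    public function hasWritten(): bool { return $this->wrote; }
}
```

The tracker shows why the read-or-write question is hard: after the SELECT the transaction still looks read-only, and only the later INSERT reveals that it should have gone to a writable server from the start.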

Extracting Sharding Key

● Parsing the query

– Locating the key

– Handling Transactions

● Application-provided sharding key

– Annotating queries

– Separate function in connector

Extracting Sharding Key: Parsing Query

● Problem: Locating the key

● No generic parser

– Application specific parser

– Constrain application developer

● Transactions

– Key needed for first statement

INSERT INTO titles(emp_no, title, from_date)
SELECT emp_no, '', CURRENT_DATE
FROM titles JOIN employees USING (emp_no)
WHERE first_name = 'Keith'

BEGIN;
SELECT …
INSERT …
COMMIT;

Extracting Sharding Key: Application Provided

● Idea: Provide key explicitly

● Annotate the statement

● Extend connection manager

– Demonstrated previously

/* emp_no=20101 */ BEGIN;
SELECT …
INSERT …
COMMIT;

…
$DICT->set_key($key);
$link = $DICT->get_connection();
…
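Reading the key back out of an annotated statement could look like the sketch below. It assumes the annotation is a leading comment of the form `/* column=value */` with an integer value; the function name and the default column are illustrative.

```php
<?php
// Returns the sharding key from a leading comment such as
// "/* emp_no=20101 */ BEGIN;", or null when no annotation is present.
// Hypothetical helper; assumes integer keys.
function extract_sharding_key(string $sql, string $column = 'emp_no'): ?int
{
    $pattern = '~^\s*/\*\s*' . preg_quote($column, '~') . '\s*=\s*(\d+)\s*\*/~';
    return preg_match($pattern, $sql, $m) === 1 ? (int)$m[1] : null;
}
```

A broker or connector wrapper could call this on the first statement of a transaction and fall back to an error, or to a default server, when it returns null.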

Areas that we need to cover: Data, Meta-Data, Query, Operations

Operations: Monitoring the System

● Monitor load of each node

… to see if any node gets an unfair number of queries

● Monitor load of each shard (multiple shards per node)

… to see if a shard gets an unfair number of queries
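What counts as an "unfair" share is a policy decision. As one possible illustration, the sketch below flags a shard as hot when it serves more than a chosen multiple of the mean query count; the function name and the factor of 2.0 are arbitrary assumptions.

```php
<?php
// Assumption: a shard is "hot" when it serves more than $factor times the
// mean number of queries. $queryCounts maps shard ID => query count.
function find_hot_shards(array $queryCounts, float $factor = 2.0): array
{
    if (count($queryCounts) === 0) {
        return [];
    }
    $mean = array_sum($queryCounts) / count($queryCounts);
    $hot = [];
    foreach ($queryCounts as $shardId => $count) {
        if ($count > $factor * $mean) {
            $hot[] = $shardId;
        }
    }
    return $hot;
}
```

The same function works per node if the counts are aggregated by server instead of by shard.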

Operations: Re-balancing the System

● If an instance is hot:

– Move Shard: Move one shard to another instance

● If a shard is hot:

– Split Shard: Split the shard into multiple shards

– Move Shard: Move one of the shards to another instance

● If a shard is cold:

– Merge Shard: Merge a shard with other shards

● Avoid it – very difficult to do on-line

Operations: Moving a Shard

● Offline (trivial)

– Bring source and target nodes down

– Copy shard from source to target

– Update dictionary

● Online (tricky)

– We go through it on the following slides

Operations: Online Move of Shard

1. Backup shard

– Might be multiple databases

– Note down binary log position

● “Backup position”

– Online backup

● mysqldump

● MySQL Enterprise Backup

2. Restore backup on destination

Operations: Online Move of Shard

3. Start replication

– Source to target

– Start replication from backup position

– Only replicate shard?

replicate-wild-do-table=db_1.*

Operations: Online Move of Shard

4. Wait until destination is close enough

5. Write lock on source

LOCK TABLES

6. Note binary log position

– “Catch-up Position”

Operations: Online Move of Shard

7. Wait for destination to reach catch-up position

START SLAVE UNTIL

MASTER_POS_WAIT

Operations: Online Move of Shard

8. Update sharding database

… will re-direct queries

9. Stop replication

RESET SLAVE

10. Drop old shard

… unless you just wanted a copy
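The replication statements behind steps 3, 7 and 9 can be sketched as small helpers that only build SQL text, all of it to be run on the destination server. The helper names, host, log file name and positions are hypothetical, replication credentials (MASTER_USER, MASTER_PASSWORD) are left out, and the statement names are the ones used in these slides; newer MySQL releases provide renamed equivalents.

```php
<?php
// Quote a string literal for the generated statements (illustration only;
// a real tool would escape through the connection instead).
function sql_quote(string $s): string
{
    return "'" . addslashes($s) . "'";
}

// Step 3: replicate from the source, starting at the backup position.
function start_replication_sql(string $srcHost, int $srcPort,
                               string $logFile, int $logPos): array
{
    return [
        sprintf('CHANGE MASTER TO MASTER_HOST=%s, MASTER_PORT=%d, '
                . 'MASTER_LOG_FILE=%s, MASTER_LOG_POS=%d',
                sql_quote($srcHost), $srcPort, sql_quote($logFile), $logPos),
        'START SLAVE',
    ];
}

// Step 7: block until the destination has applied the catch-up position.
function wait_for_position_sql(string $logFile, int $logPos): string
{
    return sprintf('SELECT MASTER_POS_WAIT(%s, %d)', sql_quote($logFile), $logPos);
}

// Step 9: stop replicating once the sharding database redirects queries.
function stop_replication_sql(): array
{
    return ['STOP SLAVE', 'RESET SLAVE'];
}
```

The backup position comes from step 1 and the catch-up position from step 6, so the wait statement can only be built after the source has been locked.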

Operations: Splitting a Shard

● Application dependent

– Change sharding key?

– Change sharding scheme?

● Can be expensive

● You will have to do it

… eventually

Operations: Splitting a Shard

1. Copy shard to new location

– Use on-line move described on previous slides

2. Update sharding database

– Will re-direct queries

3. Remove rows from both shards

– Remove rows that do not belong to the shard
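For step 3, the clean-up statements depend on the mapping scheme. The sketch below assumes range mapping with a single split point, where keys below the split point stay on the old shard and the rest belong to the new one; the function name is hypothetical.

```php
<?php
// Assumption: the shard is split by range at $splitAt. Keys below it stay
// on the old shard; keys at or above it belong to the new shard.
// List dependent tables before the table they reference.
function split_cleanup_sql(array $tables, string $column, int $splitAt): array
{
    $old = [];
    $new = [];
    foreach ($tables as $table) {
        $old[] = sprintf('DELETE FROM %s WHERE %s >= %d', $table, $column, $splitAt);
        $new[] = sprintf('DELETE FROM %s WHERE %s < %d', $table, $column, $splitAt);
    }
    return ['old_shard' => $old, 'new_shard' => $new];
}
```

With hash mapping the WHERE clause would instead test which shard each key hashes to, which is one reason a split can be expensive.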

[Figure: shards placed on one.example.com and two.example.com]

Great! Let's shard!

Wait a minute...

When to shard?

● Inherently more complex

– Requires careful planning

– Application design?

● Alternatives?

– Functional partitioning?

– Archiving old data?

Preparations for sharding

● Monitor the system

– Types of queries

● What are the join queries

– Access patterns

● What tables are accessed

● Find natural partition keys

– Robust and easy to implement

– Watch out for cross-shard joins

Summary

● What are your goals?

● Do your homework

● Don't be too eager

● Plan

● Develop sharding solution

● Revise the plans

Thanks for attending!

● Questions? Comments?

● Download MySQL!

http://dev.mysql.com

● Read our book!

– Covers replication, sharding, scale-out, and much much more