Post on 27-Jan-2015
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
MySQL High Availability: Managing Farms of Distributed Servers (MySQL Fabric)
Mats Kindahl, Alfranio Correia, Narayanan Venkateswaran
3 | 21/09/2013 | Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and
should not be relied upon in making purchasing decisions. The development,
release, and timing of any features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Safe Harbor Statement
Agenda
MySQL High Availability Options
MySQL Fabric – New kid on the block
MySQL Fabric – Failure detection and Failover
MySQL Fabric-aware connectors
MySQL Fabric – Playing with the new kid
MySQL High Availability Options
What Causes Downtime?
System Failures
– Server faults
– Software bugs or crashes
Physical Disasters
Scheduled Maintenance
User Errors
Effect and Impact
Effect:
– Service Unavailability
– Bad response time
Impact:
– Revenue loss
– Negative impact on customer relationships
– Reduced employee productivity
– Regulatory issues
Another Amazon Outage Exposes the Cloud's Dark Lining
By Brad Stone - Bloomberg Businessweek
“The entire incident lasted all of 49 minutes...”
Causes of Downtime in Production MySQL Servers
By Baron Schwartz – Percona
“It is ironic but true that high-availability tools can cause downtime.”
Failures are inevitable so design your systems taking this into account.
High Availability Solutions
Primary-Secondary
Shared Nothing Clusters
Tightly-coupled Clusters
Simple to configure
Different Platforms
Configured over LAN or WAN
No Shared Storage or Virtual IP required
Primary-Secondary
Characteristics
MySQL Replication in 5.6
[Diagram: a master replicating to four slaves]
Asynchronous Replication: risk of data loss (unless using semi-sync)
Performance overhead to master
No automatic failover or switchover (unless using MySQL Utilities)
Primary-Secondary
Characteristics
MySQL Replication in 5.6
Multi-master architecture
No single point of failure
Support for SQL and NoSQL Interfaces
Synchronous replication
Shared Nothing Clusters
Characteristics
MySQL Cluster
[Diagram: MySQL Cluster data nodes and MySQL Servers]
Tightly Coupled Clusters
Provide Active/Passive Solution
Examples:
– DRBD
– WSFC
– Solaris Clustering
– Oracle Virtual Machines
Linux Kernel module integrated into Oracle Linux
Synchronous replication
Only one MySQL operational
Distributed Replicated Block Device
Characteristics
DRBD (Regular Operation)
[Diagram: two nodes, each running MySQL on DRBD; Pacemaker at the services layer and Corosync at the cluster layer]
Cluster Management System required
Virtual IP migration
Distributed Replicated Block Device
Characteristics
DRBD (Failover)
Required:
– Windows Clustering
– Shared Storage
Only one MySQL operational
Virtual IP migration
Shared storage used to vote
Shared Storage
Characteristics
Windows Server Failover Clustering (Regular Operation)
[Diagram: two servers, each running MySQL on Windows Clustering, attached to shared storage that holds the vote, the data, and the binary log]
MySQL Fabric – New kid on the block
Distributed framework
Extensions are first-class Citizens
Supported by a variety of connectors
Fault-tolerant solution
You can suggest features, report bugs and contribute patches
MySQL Fabric
Still early alpha, long journey ahead
Farms of MySQL 5.6 Servers
Support for Primary-Secondary
Focus on MySQL 5.6 and later
Written in Python
Bird's-eye View
Characteristics
High Availability Groups
MySQL Fabric Application
XML-RPC
SQL
Key Components
Fabric-aware connectors:
– Route Transactions
– Cache Information
– Currently Python, Java, PHP
Bird's-eye View
Characteristics
High Availability Groups
MySQL Fabric Application
XML-RPC
SQL
Fabric-aware Connectors
XML-RPC is widely available
Extensible Framework
Failures taken into account
Architecture
Characteristics
[Diagram: the MySQL Fabric framework with an Executor and a State Store (Persister); extensions (Sh, HA); protocols (MySQL, AMQP, XML-RPC); MySQL as the backing store]
MySQL Fabric: Prerequisites
MySQL Servers 5.6.10 (or later):
– Backing Store
– Managed Servers
Python 2.6 or 2.7
MySQL Utilities 1.4.0
– Available at labs (http://labs.mysql.com)
MySQL Fabric – Failure Detection and Failover
Fabric keeps information on groups
Application defines the group that it will use
Connection failures regularly propagated
HA Overview
Characteristics
[Diagram: Operator, Application, MySQL Fabric, and a High Availability Group]
Failure Detection and Failover
Current Status:
– Simple failure detector/recovery per group
Considering:
– Make connectors report failures
– Support external/custom failure detectors
– Improve failover/switchover algorithm
– Extend servers/system to avoid the split-brain problem
Failure Detection
Enabled per group
    group = Group.fetch(self.__group_id)
    for server in group.servers():
        if server.is_alive():
            continue
        if group.master == server.uuid:
            trigger("FAIL_OVER", [], self.__group_id)
        else:
            trigger("SERVER_LOST", [], self.__group_id, server.uuid)
        server.status = MySQLServer.FAULTY
Failover if master has gone
Notification if not master
Server marked as faulty
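The detection pass above can be tried out without a running farm. The following is a minimal, self-contained sketch: `Server`, `Group`, and `check_group` are hypothetical stand-ins rather than Fabric's own classes, and events are collected in a list instead of being triggered.

```python
# Hypothetical stand-ins for illustration; these are not MySQL Fabric's classes.
FAULTY = "FAULTY"


class Server(object):
    def __init__(self, uuid, alive=True):
        self.uuid = uuid
        self.alive = alive
        self.status = "RUNNING"

    def is_alive(self):
        return self.alive


class Group(object):
    def __init__(self, group_id, master_uuid, servers):
        self.group_id = group_id
        self.master = master_uuid
        self.servers = servers


def check_group(group):
    """Return the events that one detection pass would raise for the group."""
    events = []
    for server in group.servers:
        if server.is_alive():
            continue
        if group.master == server.uuid:
            # The master is gone: the group needs a failover.
            events.append(("FAIL_OVER", group.group_id))
        else:
            # A slave is gone: only a notification is needed.
            events.append(("SERVER_LOST", group.group_id, server.uuid))
        server.status = FAULTY
    return events


group = Group("YYZ", "uuid-1", [
    Server("uuid-1", alive=False),   # the master, unreachable
    Server("uuid-2"),
    Server("uuid-3", alive=False),   # a slave, unreachable
])
events = check_group(group)
```

Collecting events in a list keeps the sketch testable; a real detector would hand them to the event system instead.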
Failover
[Diagram: a master and four slaves that have applied different subsets of transactions T1 to T3]
– Master fails
– Choosing a candidate
– Pointing to the new master
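The "choosing a candidate" step can be illustrated with a small sketch. The selection rule used here (promote the slave that has applied the most transactions) is an assumption made for illustration; the slides do not state how Fabric picks the candidate. The function name and the server names are hypothetical.

```python
def choose_candidate(slaves):
    """Pick the slave that has applied the most transactions.

    `slaves` maps a server name to the set of transaction ids it has applied.
    Ties are broken by name so that the choice is deterministic.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    # max() returns the first maximal item, so sorting the names first
    # makes the alphabetically smallest name win a tie.
    return max(sorted(slaves), key=lambda name: len(slaves[name]))


slaves = {
    "slave-1": {"T1", "T2", "T3"},
    "slave-2": {"T1"},
    "slave-3": {"T1", "T2"},
    "slave-4": {"T1"},
}
candidate = choose_candidate(slaves)
```

Once a candidate is chosen, the remaining slaves would be pointed at it as their new master.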
Making Fabric Itself HA
Current Status:
– Fabric can automatically resume on-going activities
– Backing store is not left in an inconsistent state
– Information is cached in the connector
Considering:
– Replicated State Machine among Fabric nodes
– Use MySQL Cluster as backing store
Crash-safe Procedures
[Diagram: the MySQL Fabric framework (Executor, State Store (Persister), Sh and HA extensions, MySQL/AMQP/XML-RPC protocols) running a procedure with steps 1 to 3 against the MySQL backing store]
– Regular Execution
– Failover/Recovery Execution
– Resuming Execution
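The idea of resuming a procedure after a crash can be sketched as follows. This is an illustration under stated assumptions, not Fabric's executor: `StateStore` is an in-memory stand-in for the persistent state store, and `run_procedure` and its `crash_before` parameter are hypothetical names used to simulate a crash.

```python
class StateStore(object):
    """In-memory stand-in for the persistent state store (hypothetical)."""

    def __init__(self):
        self.next_step = {}   # procedure id -> index of the next step to run


def run_procedure(store, proc_id, steps, crash_before=None):
    """Run the steps in order, recording progress after each one.

    Execution starts at the recorded position, so a call made after a crash
    resumes where the previous run stopped. `crash_before` simulates a crash
    just before the step with that index.
    """
    start = store.next_step.get(proc_id, 0)
    for index in range(start, len(steps)):
        if index == crash_before:
            raise RuntimeError("simulated crash before step %d" % (index + 1))
        steps[index]()
        store.next_step[proc_id] = index + 1


log = []
steps = [
    lambda: log.append("step 1"),
    lambda: log.append("step 2"),
    lambda: log.append("step 3"),
]
store = StateStore()
try:
    run_procedure(store, "proc-1", steps, crash_before=2)
except RuntimeError:
    pass                                # the node "crashed" before step 3
run_procedure(store, "proc-1", steps)   # a restarted node resumes at step 3
```

In this sketch a crash between running a step and recording it would cause the step to run again on resume, so steps should be idempotent or come with a compensating operation.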
Writing a procedure
    @_events.on_event(STEP_1)
    def do_something(group_id):
        _do_it(group_id)
        _events.trigger_within_procedure(STEP_2, group_id)

    @do_something.undo
    def undo_something(group_id):
        _undo_it(group_id)
Trigger the next step
Compensate Operation
Transactional Context
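To show how a step and its compensating operation can be tied together, here is a toy event registry. It only mimics the decorator shape used on the slide; `on_event`, `trigger`, and the `undo_function` attribute are hypothetical and do not reproduce Fabric's `_events` module.

```python
_handlers = {}   # event name -> list of registered step functions


def on_event(event):
    """Register the decorated function as a step for `event`."""
    def decorator(func):
        def undo(undo_func):
            # Attach the compensating operation to the step.
            func.undo_function = undo_func
            return undo_func
        func.undo = undo
        func.undo_function = None
        _handlers.setdefault(event, []).append(func)
        return func
    return decorator


def trigger(event, *args):
    """Run every step registered for `event`; compensate a step that fails."""
    for step in _handlers.get(event, []):
        try:
            step(*args)
        except Exception:
            if step.undo_function is not None:
                step.undo_function(*args)
            raise


journal = []


@on_event("STEP_1")
def do_something(group_id):
    journal.append("do " + group_id)
    raise RuntimeError("step failed")


@do_something.undo
def undo_something(group_id):
    journal.append("undo " + group_id)


try:
    trigger("STEP_1", "YYZ")
except RuntimeError:
    pass
```

The failing step leaves both a "do" and an "undo" entry in `journal`, which is the compensation behaviour the slide describes.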
MySQL Fabric: Using MySQL Cluster
[Diagram: two MySQL Fabric nodes, each with its own Executor, using MySQL Cluster as the shared state store]
MySQL Fabric: Using Replicated State Machine
[Diagram: several MySQL Fabric nodes, each with its own Executor and MySQL state store, coordinated through a replicated state machine (RSM)]
MySQL Fabric-aware Connectors
Writing an application
Use MySQLFabricConnection
    import mysql.connector.fabric as connector

    conn = connector.MySQLFabricConnection(
        fabric={"host": "fabric.example.com", "port": 8080},
        user='mats', passwd='passwd', database="employees")
    conn.set_property(group='YYZ')
    cur = conn.cursor()
Connecting to a Group
Define a group
Get a cursor to master in YYZ
Connectors cannot hide failures
– Multi-statement transaction
– Single-statement transaction
Writing an application
    try:
        conn.start_transaction()
        cur.execute('INSERT...')
        cur.execute('UPDATE...')
        conn.commit()
    except InterfaceError as error:
        # The connection failed: get a new cursor before retrying.
        cur = conn.cursor()
Handling Connection Failures
Connectors cannot safely retry or reconnect
Plan your application to retry after a failure.
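One way to plan for retries is to wrap each transaction in a small helper. The sketch below is an assumption-laden illustration: `ConnectionLost` stands in for the connector's connection error, and `run_with_retry`, `FakeConnection`, and `transfer` are hypothetical names used so the example runs without a server.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for the connector's connection error (hypothetical)."""


def run_with_retry(connect, transaction, attempts=3, delay=0.0):
    """Run `transaction(conn)` on a fresh connection, retrying on failure.

    `connect` must return a new connection and rebuild any session state the
    transaction relies on. `attempts` must be at least 1.
    """
    last_error = None
    for _ in range(attempts):
        try:
            conn = connect()
            result = transaction(conn)
            conn.commit()
            return result
        except ConnectionLost as error:
            last_error = error
            time.sleep(delay)
    raise last_error


class FakeConnection(object):
    """Test double: the first two transactions fail, the third succeeds."""
    failures_left = 2

    def __init__(self):
        self.committed = False

    def commit(self):
        self.committed = True


def transfer(conn):
    if FakeConnection.failures_left > 0:
        FakeConnection.failures_left -= 1
        raise ConnectionLost("server went away")
    return "done"


result = run_with_retry(FakeConnection, transfer)
```

Retrying the whole transaction is only safe when the application knows the failed attempt did not commit, or when repeating the transaction is harmless.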
Good practices
Handle session information in the retry logic:
– Temporary tables
– Session variables
– Prepared statements
Check the server's wait_timeout property
Do not set connection_timeout
Blogs
http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html
http://alfranio-distributed.blogspot.com/2013/09/tips-to-build-fault-tolerant-database.html
Documents
http://www.mysql.com/why-mysql/white-papers/mysql-guide-to-high-availability-solutions/
http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
Code
MySQL Fabric available at http://labs.mysql.com/
References
MySQL Fabric – Playing with the new kid
Use MTR
Do it manually, use sandbox, whatever you like
Starting MySQL Servers
Quick Setup
rpl_fabric_gtid.cnf:
    !include ../my.cnf
    [mysqld.n]
    report-host=localhost
    log-slave-updates
    innodb
    gtid-mode=on
    enforce-gtid-consistency
    master-info-repository=TABLE
rpl_fabric_gtid.test:
    source include/have_innodb.inc
Python 2.6 or 2.7
MySQL Utilities 1.4.0
Check configuration file
MySQL Fabric Installation
Quick Setup
fabric.cfg:
    [storage]
    address = localhost:3306
    user = fabric
    password =
    database = fabric
    connection_timeout = 6

    [protocol.xmlrpc]
    address = localhost:8080
    threads = 5
    url = file:///var/log/fabric.log
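The settings in fabric.cfg can be sanity-checked before starting Fabric. The sketch below parses the same settings with Python 3's standard `configparser`; it is not how Fabric itself loads the file, and `split_address` is a hypothetical helper.

```python
import configparser

# The fabric.cfg settings from the slide, inlined so the example is self-contained.
FABRIC_CFG = """
[storage]
address = localhost:3306
user = fabric
password =
database = fabric
connection_timeout = 6

[protocol.xmlrpc]
address = localhost:8080
threads = 5
url = file:///var/log/fabric.log
"""


def split_address(address):
    """Split 'host:port' into a (host, port) pair with an integer port."""
    host, port = address.rsplit(":", 1)
    return host, int(port)


config = configparser.ConfigParser()
config.read_string(FABRIC_CFG)

storage_host, storage_port = split_address(config.get("storage", "address"))
xmlrpc_host, xmlrpc_port = split_address(config.get("protocol.xmlrpc", "address"))
timeout = config.getint("storage", "connection_timeout")
password = config.get("storage", "password")   # empty on the slide
```

A check like this catches a malformed address or a non-numeric timeout before the state store is set up.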
Configure the state store
Start fabric
Manage your groups
Run MySQL Fabric
Quick Setup
Terminal 1:
    mysqlfabric manage setup
    mysqlfabric manage start
Terminal 2:
    mysqlfabric listcommands
    mysqlfabric group create YYZ
    mysqlfabric group add localhost:1300 root ''
Thoughts for the Future
● Connector multi-cast
  – Scatter-gather
● Internal interfaces
  – Improve extension support
  – Improve procedures support
● Command-line interface
  – Improving usability
  – Focus on ease-of-use
● More protocols
  – MySQL-RPC Protocol?
  – AMQP?
● More frameworks?
● More HA group types
  – DRBD
  – MySQL Cluster
● Fabric-unaware connectors?
Thoughts for the Future
● “More transparent” sharding
  – Single-query transactions
  – Cross-shard joins are a problem
● Multiple shard mappings
  – Independent tables
● Multi-way shard split
  – Efficient initial sharding
  – Better use of resources
● High-availability executor
  – Node failure stops execution
  – Replicated State Machine
  – Fail over to other Fabric node
● Distributed failure detector
  – Connectors report failures
  – Custom failure detectors
Thank you!
Your Feedback is Highly Appreciated!
http://forums.mysql.com/list.php?144