
MySQL High Availability - Managing Farms of Distributed Servers

Date post: 28-Nov-2014
Upload: narayanan-venkateswaran
Description:
MySQL Fabric High Availability Presentation for Oracle Open World 2014
Page 1: MySQL High Availability - Managing Farms of Distributed Servers

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

MySQL High Availability: Managing Farms of Distributed Servers

Narayanan Venkateswaran

Alfranio Correia

Mats Kindahl

Page 2: MySQL High Availability - Managing Farms of Distributed Servers

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 4: MySQL High Availability - Managing Farms of Distributed Servers

Causes of Downtime

● System Failures

● Server faults

● Software bugs or crashes

● Physical Disasters

● Scheduled Maintenance

● User Errors

Page 5: MySQL High Availability - Managing Farms of Distributed Servers

High Availability is an integral part of designing a reliable system

Page 6: MySQL High Availability - Managing Farms of Distributed Servers

High Availability Solutions

● Primary – Secondary

● E.g. MySQL Replication

● Shared Nothing Clusters

● MySQL Cluster

● Tightly Coupled Clusters

● DRBD

● WSFC

● Solaris Clustering

● Oracle VM High Availability

Page 7: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 8: MySQL High Availability - Managing Farms of Distributed Servers

MySQL Fabric

An extensible and easy-to-use framework for managing a farm of MySQL servers supporting high-availability and sharding.

Page 9: MySQL High Availability - Managing Farms of Distributed Servers

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |2014-10-02

What does all that mean?

● Management System
– Manages a MySQL Farm
– Distributed Framework

● Framework
– Procedure execution
– State store
– Transaction Routing

● Extensible
– High-availability groups
– “Semi-automatic” sharding

● Written in Python

● Latest Release 1.5.2 GA

● Open Source
– You can participate
– Suggest features
– Report bugs
– Contribute patches

● MySQL 5.6 is focus

Page 10: MySQL High Availability - Managing Farms of Distributed Servers

Birds-eye View

[Diagram. Labels: Operator; Application; Connector (×3); XML-RPC / MySQL-RPC; SQL; MySQL Fabric Node; High Availability Groups; Database Servers]

Page 11: MySQL High Availability - Managing Farms of Distributed Servers

High-Level Components

● Fabric-aware Connectors
– Python, PHP, Java, .NET, C
– Enhanced Connector API

● MySQL Fabric Node
– Manage information about farm
– Provide status information
– Execute procedures

● MySQL Servers
– Organized in high-availability groups
– Handling application data

[Diagram. Labels: Application; Connector (×3); MySQL Fabric Node; High Availability Group]

Page 12: MySQL High Availability - Managing Farms of Distributed Servers

MySQL Fabric Node Architecture

[Diagram. Labels: Connector (×3); Protocols: MySQL-RPC, AMQP, XML-RPC, ?; MySQL Fabric Framework: Executor, State Store (Persister); Extensions: Sh, HA; Backing Store: MySQL]

Page 13: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 14: MySQL High Availability - Managing Farms of Distributed Servers

Groups

● Group of Servers
– Hardware Redundancy
– Data Redundancy

● Generic Concept
– Default Master-Slave configuration
– Supports generic High Availability methods
  ● DRBD
  ● MySQL Cluster
  ● MySQL Replication
  ● Etc.

[Diagram. Labels: DRBD; ndbd (×4)]

Page 15: MySQL High Availability - Managing Farms of Distributed Servers

Creating a MySQL Fabric Group

Creating a Group

Command:
mysqlfabric group create <group_id> [--description=NONE]

E.g.
mysqlfabric group create group-1 --description="Example Group"

Creates a logical entity named group-1 that represents a farm of MySQL servers in an HA configuration.

Page 16: MySQL High Availability - Managing Farms of Distributed Servers

MySQL Fabric Group – Add Servers

Adding Servers to a Group

Command:
mysqlfabric group add <group_id> <address>

E.g.
mysqlfabric group add group-1 server1.example.com
mysqlfabric group add group-1 server2.example.com

Adds two servers to group-1.

Page 17: MySQL High Availability - Managing Farms of Distributed Servers

MySQL Fabric Group – Create Master / Slave Setup

Promoting a Server to Master

Command:
mysqlfabric group promote <group_id> [--slave_id=NONE]

E.g.
mysqlfabric group promote group-1 --slave_id="server1.example.com"

Promotes server1 to master.

Page 18: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 19: MySQL High Availability - Managing Farms of Distributed Servers

Built-In Failure Detector

● Group level failure detector

● Monitor servers within groups

● On master failure
– Mark master as faulty
– Trigger fail-over

● On slave failure
– Mark slave as faulty

● Solution is only for servers managed by Fabric

Page 20: MySQL High Availability - Managing Farms of Distributed Servers

Built-in Failure Detector Configuration

[failure_tracking]
detections = 3
detection_interval = 6
detection_timeout = 1

detections
Number of times a server must fail an alive check before being marked as FAULTY.

detection_interval
Interval at which a server must be checked to be alive.

detection_timeout
Elapsed time after which a server alive ping must timeout.
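
As a rough illustration of how the detections setting is used, the sketch below parses the [failure_tracking] section with Python's configparser and counts failed alive checks for one server. It is not Fabric's detector. The ServerCheck class, the "OK" label, and the assumption that only consecutive failures count are made up for this example.

```python
import configparser

CONFIG_TEXT = """
[failure_tracking]
detections = 3
detection_interval = 6
detection_timeout = 1
"""


def load_failure_tracking(text):
    """Parse the [failure_tracking] section into a dict of integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["failure_tracking"]
    return {key: section.getint(key) for key in section}


class ServerCheck:
    """Hypothetical per-server counter of failed alive checks."""

    def __init__(self, detections):
        self.detections = detections
        self.failed = 0
        self.status = "OK"  # illustrative label, not a Fabric status name

    def record(self, alive):
        """Record one alive check and return the resulting status."""
        if self.status == "FAULTY":
            return self.status
        # Assumption: a successful check resets the failure count.
        self.failed = 0 if alive else self.failed + 1
        if self.failed >= self.detections:
            self.status = "FAULTY"
        return self.status
```

With detections = 3, two failed checks followed by a successful one leave the server in its normal state under this sketch; three failed checks in a row mark it FAULTY.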

Page 21: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection

● Connector reports error to Fabric

● Error used to update backing store

● Trigger Failover

– Error count reaches threshold

● The report API can be used by any generic process to report failures

Page 22: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – MySQL Fabric Configuration

[failure_tracking]
notifications = 300
notification_clients = 50
notification_interval = 60
failover_interval = 0
prune_time = 3600

notifications
The number of error notifications required to mark a server FAULTY.

notification_clients
The number of clients that need to report an error to mark a server FAULTY.

Page 23: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – MySQL Fabric Configuration

notification_interval
The interval of interest in which we want to verify error notifications.

failover_interval
The interval at which we can safely do a failover in a group without causing system instability.

prune_time
Interval at which we need to prune the error logs.
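
To make the interplay of notifications, notification_clients and notification_interval concrete, here is a minimal sketch of a sliding-window error log. It is an illustration only, not Fabric's code: the ErrorWindow class is hypothetical, and it assumes that both thresholds must be met within the interval before a server is considered FAULTY.

```python
from collections import deque


class ErrorWindow:
    """Hypothetical sliding window of error reports for one server."""

    def __init__(self, notifications, notification_clients, notification_interval):
        self.notifications = notifications
        self.notification_clients = notification_clients
        self.notification_interval = notification_interval
        self._events = deque()  # (timestamp, reporter) pairs, oldest first

    def report(self, reporter, now):
        """Record a report; return True if the server should be marked FAULTY."""
        self._events.append((now, reporter))
        # Drop reports that fell out of the interval of interest.
        cutoff = now - self.notification_interval
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        reporters = {name for _, name in self._events}
        # Assumption: both thresholds must be reached.
        return (len(self._events) >= self.notifications
                and len(reporters) >= self.notification_clients)
```

Timestamps are passed in explicitly so the behaviour can be checked without waiting; a real caller would pass something like time.time().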

Page 24: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – Connector – Enabling Error Reporting

● The option to report errors is part of the Fabric configuration

● “report_errors” can be turned on as follows

fabric_config = {
    'host': ..,
    'report_errors': True,
}

cnx = mysql.connector.connect(fabric=fabric_config)

● Causes an error during a connection to a server to be reported

Page 25: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – Connector – Error Reporting

● The following errors are reported by default

REPORT_ERRORS = (
    errorcode.CR_SERVER_LOST,
    errorcode.CR_SERVER_GONE_ERROR,
    errorcode.CR_CONN_HOST_ERROR,
    errorcode.CR_CONNECTION_ERROR,
    errorcode.CR_IPSOCK_ERROR,
)

● The above can be extended by setting extra_failure_report

from mysql.connector.fabric import extra_failure_report
extra_failure_report([error_code_0, error_code_1, ...])

Page 27: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – Connector – Cache Invalidation

● Define which errors cause invalidation of the connector cache

● The following errors cause a cache invalidation by default

RESET_CACHE_ON_ERROR = (
    errorcode.CR_SERVER_LOST,
    errorcode.ER_OPTION_PREVENTS_STATEMENT,
)

● The above can be extended by

from mysql.connector.fabric import RESET_CACHE_ON_ERROR
RESET_CACHE_ON_ERROR.append(error_code_0)
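
One caveat when trying this out: the default shown on the slide is written as a Python tuple, and tuples have no append method. The sketch below uses stand-in integer constants rather than the connector's errorcode module, and the helper names extended and should_reset_cache are hypothetical. It shows how an extended tuple can be built and checked. Whether the connector picks up a rebound value depends on how it reads this setting, which the slides do not show.

```python
# Stand-ins for errorcode.CR_SERVER_LOST and
# errorcode.ER_OPTION_PREVENTS_STATEMENT (MySQL error numbers).
CR_SERVER_LOST = 2013
ER_OPTION_PREVENTS_STATEMENT = 1290

# Same shape as the default on the slide: an immutable tuple.
RESET_CACHE_ON_ERROR = (CR_SERVER_LOST, ER_OPTION_PREVENTS_STATEMENT)


def extended(defaults, *extra_codes):
    """Return a new tuple with extra error codes added, skipping duplicates."""
    result = list(defaults)
    for code in extra_codes:
        if code not in result:
            result.append(code)
    return tuple(result)


def should_reset_cache(errno, reset_on=RESET_CACHE_ON_ERROR):
    """Hypothetical check: does this error number invalidate the cache?"""
    return errno in reset_on
```

For example, extended(RESET_CACHE_ON_ERROR, 1205) yields a three-element tuple that also covers error 1205 (lock wait timeout), while the original tuple is left unchanged.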

Page 28: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – Avoiding thundering herds

[Diagram. Four Application instances each send an error report (1. Error, 2. Error, 3. Error, 4. Error) to the MySQL Fabric Node]

● When to report errors?

● Reporting errors at the session level can cause flooding

● E.g. Orders Page
– Orders Page writes customer orders to the database
– Every failing order will cause a report
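
The slides do not say how the flooding is avoided. One possible client-side mitigation, shown here purely as an assumption and not as what the connectors do, is to report a given server at most once per interval. The ReportThrottle class and its parameters are hypothetical.

```python
class ReportThrottle:
    """Hypothetical limiter: at most one report per server per interval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_report = {}  # server id -> timestamp of last report

    def should_report(self, server_id, now):
        """Return True if an error for server_id should be sent now."""
        last = self._last_report.get(server_id)
        if last is not None and now - last < self.min_interval:
            return False  # reported recently, stay quiet
        self._last_report[server_id] = now
        return True
```

In the Orders Page example, every failing order would call should_report, but only the first failure in each interval would actually reach the Fabric node.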

Page 29: MySQL High Availability - Managing Farms of Distributed Servers

Distributed Failure Detection – Connector – Role

● MySQL user role for accessing Fabric dump commands and reporting

● Adding users is done using

mysqlfabric user add <username> --role

● connector role

3  connector  Role for MySQL Connectors
              + Access to dump commands
              + Reporting to Fabric

● Causes an error during a connection to a server to be reported

Page 30: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 31: MySQL High Availability - Managing Farms of Distributed Servers

Update Only

Scenario

● Using Fabric as a lookup server

● Servers managed outside Fabric

– E.g. using DRBD

Requirement

● Update Fabric without affecting servers

Page 32: MySQL High Availability - Managing Farms of Distributed Servers

Group Based and Virtual IP Based solutions

● Group Based Solutions

● Offer a notion of Group Membership

● Virtual IP Based Solutions

● Only one virtual server identified by a virtual IP Address

● E.g. WSFC, DRBD

Page 33: MySQL High Availability - Managing Farms of Distributed Servers

Group Based System

● Notion of membership

● Dead servers automatically excluded from the group

● Use update_only to store topology information in Fabric

Page 34: MySQL High Availability - Managing Farms of Distributed Servers

Group Based System

● Creating a Group
● Adding Servers to a Group
● Promote

Creating a Group
mysqlfabric group create "group-1"

Adding Servers to a Group
mysqlfabric group add group-1 server1.example.com --update_only
mysqlfabric group add group-1 server2.example.com --update_only

Promote
mysqlfabric group promote group-1 --slave_id=7bcb0804-41bb-11e4-b4d2-f4ce2963772b --update_only

Page 35: MySQL High Availability - Managing Farms of Distributed Servers

Group Based System – Updating Failures

● Updating the state when all the servers are dead
● Updating the state when the system is brought back online

When all the servers are dead
mysqlfabric threat report_error 7bcb0804-41bb-11e4-b4d2-f4ce2963772b --error=FAULTY --update_only

When the system is brought back online
mysqlfabric group promote group-1 --slave_id=7bcb0804-41bb-11e4-b4d2-f4ce2963772b --update_only
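
An external HA solution would typically run these commands from a hook or monitoring script. The sketch below only assembles the argument lists for the two commands shown on this slide and does not execute anything. The event names "all_down" and "back_online" and the function names are made up for the example.

```python
def report_faulty_cmd(server_uuid):
    """Arguments for marking a server FAULTY in Fabric's state only."""
    return ["mysqlfabric", "threat", "report_error", server_uuid,
            "--error=FAULTY", "--update_only"]


def promote_cmd(group_id, server_uuid):
    """Arguments for recording a server as master in Fabric's state only."""
    return ["mysqlfabric", "group", "promote", group_id,
            "--slave_id=" + server_uuid, "--update_only"]


def commands_for_event(event, group_id, server_uuid):
    """Map a hypothetical external HA event to the commands to run."""
    if event == "all_down":
        return [report_faulty_cmd(server_uuid)]
    if event == "back_online":
        return [promote_cmd(group_id, server_uuid)]
    raise ValueError("unknown event: %r" % (event,))
```

Each returned list could be handed to subprocess.run by the calling script once the commands have been verified against the installed Fabric version.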

Page 36: MySQL High Availability - Managing Farms of Distributed Servers

Virtual IP System – Integrating DRBD

● Creating a Group
● Adding Virtual IP to Group

Creating a Group
mysqlfabric group create "group-1"

Adding Servers to a Group
mysqlfabric group add group-1 IPAddress:Portno --update_only

● Add the virtual IP of the DRBD setup to the group definition.
● The DRBD setup handles the maintenance of the cluster.

Page 37: MySQL High Availability - Managing Farms of Distributed Servers

Virtual IP System – Updating Failures

● Updating the state when all the servers are dead
● Updating the state when the system is brought back online

When all the servers are dead
mysqlfabric threat report_error 7bcb0804-41bb-11e4-b4d2-f4ce2963772b --error=FAULTY --update_only

When the system is brought back online
mysqlfabric group promote group-1 --slave_id=7bcb0804-41bb-11e4-b4d2-f4ce2963772b --update_only

Page 38: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 39: MySQL High Availability - Managing Farms of Distributed Servers

HA @ The MySQL Fabric Node

● Fabric Node Fails

● Crash-Safe Procedures

● State Store Fails

● Safeguarding meta-data

● Both Fabric and State Store Fail

Page 40: MySQL High Availability - Managing Farms of Distributed Servers

Crash Safe Procedures

● Each Fabric Operation is a procedure

● Procedure consists of a set of steps/jobs

● Steps/Jobs

● Can be Rolled Back

● Are Idempotent
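
The idea of a procedure built from idempotent, reversible steps can be sketched as follows. This is a generic illustration, not Fabric's executor: the Step class, run_procedure, and the journal list (standing in for progress recorded in a state store) are all hypothetical names.

```python
class Step:
    """One job in a procedure: an idempotent action plus its undo."""

    def __init__(self, name, do, undo):
        self.name = name
        self.do = do
        self.undo = undo


def run_procedure(steps, state, journal):
    """Run steps in order, skipping any already recorded in the journal.

    If a step raises, the recorded steps are undone in reverse order,
    the journal is cleared, and the exception is re-raised.
    """
    by_name = {step.name: step for step in steps}
    for step in steps:
        if step.name in journal:
            continue  # completed before an earlier crash
        try:
            step.do(state)
        except Exception:
            for name in reversed(journal):
                by_name[name].undo(state)
            del journal[:]
            raise
        journal.append(step.name)
    return state


def add_server(state):
    state["servers"].add("server1.example.com")  # adding twice is harmless


def remove_server(state):
    state["servers"].discard("server1.example.com")


def set_master(state):
    state["master"] = "server1.example.com"


def clear_master(state):
    state["master"] = None


STEPS = [
    Step("add_server", add_server, remove_server),
    Step("set_master", set_master, clear_master),
]
```

Because each action is idempotent, re-running a step whose completion was not yet journaled when the node crashed leaves the state unchanged, which is what makes resuming after a restart safe in this sketch.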

Page 41: MySQL High Availability - Managing Farms of Distributed Servers

State Store failure

● Storing Data in MySQL Cluster

● Storing Data in replicated Storage

Page 42: MySQL High Availability - Managing Farms of Distributed Servers

Agenda

➢ Building Reliable Systems

➢ MySQL Fabric Overview

➢ Groups

➢ Failure Detection

➢ Integration with other HA Solutions

➢ High Availability @ Fabric Node

➢ Future Work

Page 43: MySQL High Availability - Managing Farms of Distributed Servers

Future Work

● Ideas for the future

Page 44: MySQL High Availability - Managing Farms of Distributed Servers

Detecting and Restarting when Fabric Fails

● Detecting Fabric Failure

● Using Pacemaker

● Avoiding split-brain

● Re-Starting Fabric

● Providing script for restart

Page 45: MySQL High Availability - Managing Farms of Distributed Servers

Handling Failure of MySQL Fabric and State Store

● Building a replicated state machine

● Multiple Fabric instances that can take over from any of the other instances

● Uses Paxos/Raft-like implementation

Page 46: MySQL High Availability - Managing Farms of Distributed Servers

Handling Connector Failover when Fabric goes down

● Alternate Fabric Addresses

– Connector has a list of potential addresses

– Try these addresses when a Fabric node fails

● Connector notices server failure

– PRIMARY failure

– SECONDARY failure

– All servers fail
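
The alternate-address idea is listed as future work, so no connector API for it is shown in the slides. A minimal sketch of the behaviour, assuming a caller-supplied connect function (hypothetical) that raises ConnectionError when a Fabric node is unreachable, might look like this:

```python
def connect_to_fabric(addresses, connect):
    """Try each candidate Fabric address in order.

    Returns (address, connection) for the first node that answers and
    raises ConnectionError if none of them can be reached.
    """
    failed = []
    for address in addresses:
        try:
            return address, connect(address)
        except ConnectionError:
            failed.append(address)  # remember it and try the next one
    raise ConnectionError("no Fabric node reachable, tried: %s"
                          % ", ".join(failed))
```

The list order doubles as a preference order here; a real implementation might also remember which address last worked.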

Page 47: MySQL High Availability - Managing Farms of Distributed Servers

Error reporting when Fabric goes down

● PRIMARY failure

– Report error to application

– Remove from cache

– Set group to read only

● SECONDARY failure

– Remove from cache

– Pick alternate server

● All server failure

– Report Error to Application

Page 48: MySQL High Availability - Managing Farms of Distributed Servers

Reading for the Interested

● MySQL Forum: Fabric, Sharding, HA, Utilities
http://forums.mysql.com/list.php?144

● MySQL Fabric Documentation
http://dev.mysql.com/doc/mysql-utilities/1.4/en/fabric.html

● Migrating From an Unsharded to a Sharded Setup
http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-migration.html

● Configuring and running MySQL Fabric
http://alfranio-distributed.blogspot.com/2014/03/mysqlfabric-installation.html

Page 49: MySQL High Availability - Managing Farms of Distributed Servers

Want to contribute?

● Check it
– … and send us use-case and feature suggestions

● Test it
– … and send comments to the forum

● Break it
– … and send in bugs to http://bugs.mysql.com

Page 50: MySQL High Availability - Managing Farms of Distributed Servers

Keeping in Touch

Mats Kindahl
Twitter: @mkindahl
http://mysqlmusings.blogspot.com

Alfranio Correia
Twitter: @alfranio
http://alfranio-distributed.blogspot.com

Narayanan Venkateswaran
Twitter: @vn_tweets
http://vnwrites.blogspot.com

Geert Vanderkelen
Twitter: @geertjanvdk
http://geert.vanderkelen.org

Page 51: MySQL High Availability - Managing Farms of Distributed Servers

Thank You!

