Safety-Critical Embedded Systems

Course No 02228, Fall 08

Technical University of Denmark

Dept of Informatics

MySQL Cluster Availability

Baldur Þór Emilsson (s081854)

Evangelos Katsikaros (s080997)

Paul Pop (Supervisor)

February 15, 2009

Abstract

MySQL is an RDBMS that also offers a cluster version of the system. Besides fast access under high-throughput conditions, MySQL Cluster provides a fault-tolerant architecture to achieve high data availability. The purpose of this study is to propose a model of the cluster in order to study its availability.

Contents

1 Introduction

2 Introduction to MySQL
2.1 Overview of MySQL
2.2 Storage Engines

3 The MySQL Cluster
3.1 MySQL Cluster and NDB storage engine
3.2 Cluster architecture
3.3 Replication and Partitioning
3.4 Points of Failure
3.5 Recovery from Failures

4 Using MySQL Cluster
4.1 Introduction to QEMU
4.2 Creating the VMs and Installing MySQL
4.3 Configuring MySQL Cluster
4.4 Running the Cluster

5 Modelling the Cluster
5.1 Related Work
5.2 Method used
5.3 General model

6 Model evaluation

7 Conclusion

References

A Conventions Used

B Installation
B.1 QEMU Virtual Machines
B.2 Operating System
B.3 VMs and networking
B.4 MySQL: Installation and Start-up Configuration

C Configuration of the Cluster's Nodes

D Starting the Cluster

1 Introduction

The goal of this report is to study the MySQL Cluster database system and, in particular, how to achieve the highest availability of a database using that technology. The official website of MySQL Cluster states that "MySQL Cluster provides a fault tolerant architecture that ensures your organization's mission critical applications achieve 99.999% availability. [That] means less than 5 minutes downtime per year, including scheduled maintenance time."
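The relation between an availability percentage and yearly downtime is simple arithmetic, and a short sketch makes it concrete. The function name and the choice of a 365-day year are assumptions made here for illustration. Under that assumption 99.999% corresponds to about 5.26 minutes per year, which is roughly the five minutes quoted above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare "two nines" up to "five nines".
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.5%} -> {downtime_minutes_per_year(a):.2f} minutes/year")
```

Each additional "nine" divides the allowed downtime by ten, which is why the last step from 99.99% to 99.999% is the hard one in practice.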

It is this number, 99.999%, that is the main focus of the report. Which characteristics of the architecture are important to the availability of the database? What hazards can cause a running system to stop responding? And which strategy keeps the availability as high as possible while keeping the size and structure of the system within reasonable bounds?

This report is organized as follows. In section 2 we give an overview of the MySQL RDBMS. In section 3 we discuss the MySQL Cluster in depth, giving an overview of the architecture and analyzing what can go wrong and how the Cluster can recover from failures. In section 4 we present our experience using the cluster. In section 5 we present the modelling of the Cluster and in section 6 we evaluate the model. Finally, we summarize the report in section 7.

2 Introduction to MySQL

This chapter provides an overview of the MySQL RDBMS, focusing on the parts of its architecture that are needed in order to understand the MySQL Cluster. In section 2.1 we give an overview of the RDBMS. In section 2.2 we introduce the concept of storage engines and how they are used in MySQL.

2.1 Overview of MySQL

MySQL is a Relational Database Management System (RDBMS), developed by "MySQL AB", which is currently owned by "Sun Microsystems, Inc" [15]. The application is offered in two versions: community and enterprise. The enterprise version is a commercial product, whereas the community version is open source and is released under the GPL licence. The release cycle of the community version is not strictly defined; however, it incorporates the latest patches and fixes of the enterprise edition. The current production community version is 5.0, and the release candidate is 5.1.

MySQL is a client/server RDBMS, but also offers an embedded version. For this study we focus on the client/server architecture. An abstraction of the way the client interacts with the MySQL server is presented in Figure 1. The dashed lines represent time, and the numbers 1, 2, 3 and 4 mark the steps of the client/server transaction in the order in which they occur. The client connects to the server and uses SQL queries to read from and write to the database. The client's query might be, for example, "SELECT * FROM table1". The server receives the query and tries to execute it. If the execution is successful the server returns the data of the table; if not, it returns an error, for example because the user does not have the privileges to perform SELECT on this table. The server then sends the result of the query back to the client. The client uses the result to perform the actions it is designed to perform, such as presenting the table's data in a GUI or at the command prompt, or using the data in calculations.

Figure 1: The client/server architecture

2.2 Storage Engines

Looking in more detail at what happens on the server side, we see that the following procedure takes place (Figure 2). The SQL query is parsed and then the server decides what data it needs to read or write. Each database managed by the server has tables and each table has a specific storage engine. Once the server has "understood" what actions must be performed to execute the query, it asks the storage engine of the table to perform these actions. The storage engine then accesses the filesystem and performs the actions. The existence of a particular storage engine is transparent to the client. The only cases where the client has to deal with storage engine details are during the creation of a table, when it has to choose a storage engine, and when an SQL query is issued that contains SQL commands not supported by the storage engine (this will become clearer in the next paragraphs). In this abstraction we intentionally do not cover issues like authentication, caching and optimization of execution, in order to keep the abstraction simple.

Figure 2: Abstract overview of MySQL server and storage engines

So, the storage engines are an interface to the actual reading from and writing to the filesystem. This allows the server not to worry about how to actually read and write the data and to focus instead on handling it correctly. The actual read/write is performed by the storage engine.

There is a variety of storage engines, which is helpful when the reading and writing of the data must meet specific requirements. For example, one might need a transaction-safe environment, where a set of read/write SQL queries is applied only if all the queries of the set complete successfully. In that case a storage engine that offers transactions can be used. Of course, the implementation of transactions adds overhead, due to the checks the RDBMS performs for each SQL query of the set. In another case transactions might not be needed and the overhead they impose may cause performance problems. To avoid these performance issues, one can choose a storage engine that does not offer transactions. Another example is the ARCHIVE storage engine. With this storage engine one can write data to the database, but once written the data can be neither deleted nor changed (using SQL). This can be helpful if we use the database to store logs or other data and want to make sure that they will never be altered in the future.

To return to the client transparency issue: the client should be aware of the storage engine each table is using, but only so that it does not run into errors by issuing an SQL query that uses a feature not supported by the storage engine of that table. MySQL provides a wide variety of built-in storage engines. Third parties provide additional storage engines, and it is also possible to implement a custom one and use it.

MySQL also supports clustering. This capability is implemented in a storage engine called NDB. Instead of just reading from and writing to a specific local or remote filesystem, NDB is able to replicate the data across several nodes, thus providing high availability and performance, and the ability to sustain the failure of a number of nodes without interrupting the availability of the database. The NDB storage engine is covered in depth in the next chapter.

In this chapter we presented the MySQL RDBMS and, more precisely, the role of storage engines in the client/server architecture and in the reading and writing of data. The storage engine we are going to focus on is the NDB storage engine. NDB is responsible for adding clustering capabilities to MySQL and is covered in depth in chapter 3.

3 The MySQL Cluster

This chapter provides an overview of the MySQL Cluster system. In section 3.1 we briefly introduce the NDB storage engine. In section 3.2 we present the architecture of the cluster. In section 3.3 we explain the way the data is replicated and partitioned. In section 3.4 we present possible points of failure and in section 3.5 the recovery from these failures.

3.1 MySQL Cluster and NDB storage engine

In 2003 MySQL AB acquired Alzato, a company started by Ericsson. Alzato had been developing and marketing the "NDB Cluster", a high-availability data management system designed for the telecom/IP environment. The goal was to integrate the NDB Cluster technology into the existing MySQL RDBMS, providing a clustered environment [11]. MySQL Cluster first appeared in the production version 4.1.10 (2005) and has been heavily developed since then [5].

MySQL Cluster is an implementation that enables the clustering of databases in main memory using a shared-nothing architecture. This means that the different parts of the cluster share no common resources at all, such as common hard drives or common memory. Each part of the cluster is independent and can be based on inexpensive hardware.

The part of the MySQL Cluster that enables this in-memory clustering is a specific storage engine called NDB. Since NDB is a storage engine, its internals are not visible to the database user. From the user's point of view, the creation and manipulation of clustered databases and tables is the same as with non-clustered ones. The only thing that changes is that the clustered data is inside tables and databases that use the NDB storage engine instead of another storage engine. From the database administrator's point of view, on the other hand, there are a lot of changes, since the administrator has to install and configure the appropriate software on the nodes that comprise the cluster.

During the MySQL version 5.1 series, MySQL Cluster NDB began following a somewhat different release pattern from the mainline MySQL 5.1 series of releases. It branched out as a separate product that integrates the latest MySQL changes and follows a different versioning system [6].

Each MySQL Cluster release has a simple and a full version identifier. The simple identifier is the version of the NDB storage engine, and the full identifier is composed of two parts:

• the MySQL server version, on which MySQL Cluster was based

• and the version of the NDB storage engine.

For example, the simple version is "MySQL Cluster NDB 6.2.15" and the full version is "mysql-5.1.24 ndb-6.2.15". From this information, we can understand that:

• MySQL Cluster NDB 6.2.15 derives from MySQL 5.1.24, and contains all feature enhancements and bugfixes from MySQL 5.1 up to and including MySQL 5.1.24.

• MySQL Cluster NDB 6.2.15 uses version 6.2.15 of the NDB storage engine.
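The two-part identifier can be taken apart mechanically. The following is a minimal sketch that assumes the full identifier always has the shape of the example above ("mysql-X.Y.Z ndb-X.Y.Z"). The names `ClusterVersion`, `parse_full_version` and `simple_version` are hypothetical helpers introduced here, not part of any MySQL tool.

```python
import re
from typing import NamedTuple


class ClusterVersion(NamedTuple):
    mysql: tuple  # MySQL server version the release is based on
    ndb: tuple    # NDB storage engine version


# Assumed shape of the full identifier, e.g. "mysql-5.1.24 ndb-6.2.15".
_FULL_VERSION = re.compile(r"^mysql-(\d+)\.(\d+)\.(\d+)\s+ndb-(\d+)\.(\d+)\.(\d+)$")


def parse_full_version(identifier: str) -> ClusterVersion:
    """Split a full MySQL Cluster version identifier into its two parts."""
    match = _FULL_VERSION.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a full version identifier: {identifier!r}")
    numbers = tuple(int(group) for group in match.groups())
    return ClusterVersion(mysql=numbers[:3], ndb=numbers[3:])


def simple_version(version: ClusterVersion) -> str:
    """Render the simple identifier, which names only the NDB engine version."""
    return "MySQL Cluster NDB " + ".".join(str(n) for n in version.ndb)
```

For the example in the text, `parse_full_version("mysql-5.1.24 ndb-6.2.15")` yields the MySQL part (5, 1, 24) and the NDB part (6, 2, 15), and `simple_version` turns the latter back into "MySQL Cluster NDB 6.2.15".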

3.2 Cluster architecture

A MySQL cluster is a set of computers, each running one or more processes. These processes may include an NDB API node (probably a MySQL server), a data node and a management node [2], [6].

A management node (MGM node) is a special node that manages the other nodes of the cluster. Management actions include stopping nodes, running backups, configuring other nodes etc. Since a management node is responsible for the configuration of other nodes, it must be started first, before any other node. The process of a management node is ndb_mgmd.

The data node is where data is stored. In order to achieve redundancy, more than one data node must be deployed. The process of a data node is ndbd.

The API node is a node that uses the NDB API in order to access the data (which is stored in the data nodes). We can divide the API nodes into two categories. This categorization is made purely for "educational" reasons, to aid the understanding of the cluster architecture:

• an SQL node, which is a typical MySQL server that uses the NDB storage engine. The NDB storage engine uses the NDB API in order to access the data. The process of a MySQL server is mysqld.

• any other application that accesses the data of the cluster by using the NDB API.

Figure 3 summarizes the above description. Any typical MySQL client can connect to a MySQL server that is using the NDB storage engine, since the usage of a storage engine is transparent to the client. To start the operation of the cluster, a management node must be started first, followed by the data nodes and finally the SQL nodes. The management node is responsible for managing the other nodes (the SQL nodes and the data nodes). The data is stored on the data nodes; the specifics of the procedure are described in section 3.3. The SQL nodes receive SQL queries from the clients, read from and write to the data nodes, and return the results of the queries to the clients. Instead of a typical client, we can also have clients that use the NDB API directly, like an NDB management client.

3.3 Replication and Partitioning

In the previous section we introduced the architecture of the cluster and described the data flow from the clients through the SQL nodes and finally to the data nodes. In Figure 3 the data nodes were presented without any further information about the way they work. In this section we look inside the data nodes and see how the data is replicated and partitioned.

Figure 3: Overview of the MySQL Cluster architecture

Figure 4: Example of partitioning the data

In order to present the internals of the data nodes, we will use the following terms:

• Partition: A portion of the data stored in the cluster.

• Replica: A copy of a partition. Each partition has one primary replica and a number of backup replicas. If the primary replica is on a node that fails, one of the remaining backup replicas becomes the primary.

• Node Group: A node group consists of one or more data nodes.

The data of the database is divided into partitions. Each partition has a number of replicas, which are stored in the data nodes. For each partition there is a primary replica and a number of backup replicas [7].

For example, let's suppose our data is divided into 4 partitions: P0, P1, P2 and P3 (see Figure 4). Each of the four has one primary replica, and we can decide the number of backup replicas we want to introduce (however, there is a limit to this number). Every data node is responsible for keeping a number of primary replicas and a number of backup replicas.

During the configuration of the cluster the administrator defines two important parameters:

• the number of data nodes per node group

• and the total number of data nodes. Each data node has an ID to identify it.

These two numbers control how the terms mentioned above are configured, according to the following rules:

• Each node group has a number of data nodes (currently only 1 or 2 is supported [10]) and all node groups have the same number of data nodes [8]. The node groups are formed implicitly, depending on the order of declaration in the configuration files and on N, where N is the number of data nodes per node group. The first node group is formed by the first N data nodes, the next node group by the next N data nodes, and so on.

• The number of partitions is equal to the number of data nodes participating in the cluster [8].

• The number of replicas is equal to the number of nodes per node group [8].

• Each data node should be located on a separate computer [8]. Each data node can keep more than one primary replica and more than one backup replica [4].

We give an example to make things clearer (see Figure 5). Let's suppose that we have 4 machines used as data nodes in the cluster (Data Node 0, Data Node 1, ..., Data Node 3). This means that our data is divided into 4 partitions, P0, P1, P2 and P3. If we decide that each Node Group has 2 nodes, then the 4 Data Nodes are implicitly divided into 2 Node Groups (Node Group 0, Node Group 1). So, the number of replicas of each partition is 2. Each partition has a primary replica (replica 0) and a backup replica (replica 1). For example "P0 - replica 0" is the primary replica of P0, and "P0 - replica 1" is the backup replica.
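The rules above are mechanical enough to express in a few lines. The sketch below derives the node groups, the number of partitions and the number of replicas from the ordered list of data node IDs and the number of nodes per group. The function names are hypothetical, and the code only mirrors the rules as stated in this section, not the cluster's actual implementation.

```python
def form_node_groups(data_node_ids, nodes_per_group):
    """Group data nodes in declaration order, nodes_per_group at a time."""
    if nodes_per_group not in (1, 2):
        raise ValueError("only 1 or 2 data nodes per node group are supported")
    if len(data_node_ids) % nodes_per_group != 0:
        raise ValueError("all node groups must have the same number of data nodes")
    return [
        list(data_node_ids[i:i + nodes_per_group])
        for i in range(0, len(data_node_ids), nodes_per_group)
    ]


def describe_layout(data_node_ids, nodes_per_group):
    """Apply the rules listed above to derive groups, partitions and replicas."""
    return {
        "node_groups": form_node_groups(data_node_ids, nodes_per_group),
        "partitions": len(data_node_ids),  # one partition per data node
        "replicas": nodes_per_group,       # one replica per node in a group
    }
```

Calling `describe_layout([0, 1, 2, 3], 2)` reproduces the example: two node groups, [0, 1] and [2, 3], four partitions and two replicas per partition.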

What have we achieved so far? We are able to sustain the loss of one data node and still access the whole database. For example, if a data node from Node Group 0, let's say Data Node 1, becomes unavailable, then we can still access the replicas of P0 and P1 from Data Node 0. On top of that, we can sustain the loss of another node from Node Group 1, let's say Data Node 3. We are still able to access the replicas of P2 and P3 from Data Node 2.

3.4 Points of Failure

The Cluster can fail in several different ways depending on which part of it goes wrong. If an SQL server dies or stops handling queries it can simply be restarted and connect to any data node when it is back up. Other servers and programs can still query the data nodes in the meantime, which gives an option for redundancy. The second part is the management node, but that does not have to be operational most of the time, so it too can be restarted without any damage. The only time a management node is critical is when the system is changed, so it could lead to greater issues if other parts of the cluster had failed and had to be brought up again while the management node was down. There is also an option for a secondary management node that can take over if the primary one is not operational when it is needed.

Figure 5: Example of data nodes structure

A data node can stop responding for a number of reasons. There can be network problems, the load on the node can be too high, or it could simply crash. When it stops responding for any reason, the node trying to communicate with it notifies the rest of the cluster and the faulty node is classified as failed. It is possible that the node becomes available again automatically, e.g. if a network problem is resolved; the node then simply synchronizes its data and resumes its position within the cluster. Otherwise it has to be restarted manually.

A single node failure is easy to deal with, but if two or more nodes fail at the same time the situation can become complicated. If all the nodes are still running but have lost the connection between them, we may end up with several different instances of the whole database. Imagine a setup with data nodes S1...S4 in node groups {S1, S2} and {S3, S4}. We then experience network problems so that S1 sees only S3 and S2 sees only S4, and vice versa. If more than one SQL node is connected to the cluster we might begin to experience desynchronization when they send different queries to different nodes in each node group. To prevent this we run an arbitration service on the management node and the SQL nodes, which is consulted each time one or more nodes go down.

If all nodes in a node group become unavailable the cluster shuts down and needs to be restarted from scratch.
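The shutdown condition amounts to a simple predicate: the cluster stays up as long as every node group keeps at least one live data node. A minimal sketch follows, using the S1...S4 setup from above. The function name is hypothetical, and the check deliberately ignores network partitions and arbitration, which the predicate alone cannot capture.

```python
def cluster_is_up(node_groups, failed_nodes):
    """Return True if every node group still has at least one live data node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in group)
        for group in node_groups
    )


if __name__ == "__main__":
    groups = [["S1", "S2"], ["S3", "S4"]]
    print(cluster_is_up(groups, ["S1", "S4"]))  # one node lost in each group
    print(cluster_is_up(groups, ["S1", "S2"]))  # a whole node group lost
```

Losing one node from each group leaves the data fully reachable, whereas losing both nodes of the same group takes the cluster down even though half of the machines are still running.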

3.5 Recovery from Failures

The recovery process is not the same for all nodes. As said before, the SQL nodes and the management nodes can be brought up again without any extra precaution. When data nodes go down, however, measures need to be taken to keep the data on the cluster synchronised.

When one data node goes offline and is restarted, it begins by updating its partitions to match those of the other nodes in its node group. As long as at least one node in each group is functional this process can bring all nodes back online, but as soon as one node group has failed completely the whole cluster stops and has to be restarted. This introduces new problems, as the cluster is typically run in memory and the data is lost if all the machines go down. Regular checkpointing is used to save the state of the database, and the checkpoint can be used to restore it in memory after a crash. For the time between the last checkpoint and the crash, a redo log is kept on each node; it is used to update the restored database to include all changes made since the checkpoint.
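The interplay between checkpoint and redo log can be illustrated with a toy in-memory store. This is only a sketch of the general idea, not NDB's actual implementation: the class `ToyDataNode` and its methods are hypothetical, and the checkpoint and redo log are assumed to live on disk and therefore survive a crash.

```python
import copy


class ToyDataNode:
    """In-memory key/value store with a checkpoint and a redo log."""

    def __init__(self):
        self.memory = {}      # live data, lost on a crash
        self.checkpoint = {}  # last saved state (assumed to be on disk)
        self.redo_log = []    # changes since the last checkpoint (assumed on disk)

    def write(self, key, value):
        self.redo_log.append((key, value))  # log the change, then apply it
        self.memory[key] = value

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.memory)
        self.redo_log.clear()  # older changes are now covered by the checkpoint

    def crash(self):
        self.memory = {}  # main-memory contents are gone

    def recover(self):
        self.memory = copy.deepcopy(self.checkpoint)
        for key, value in self.redo_log:  # replay changes in their original order
            self.memory[key] = value
```

A typical sequence is: write some data, take a checkpoint, write more data, crash, recover. After `recover()` the store contains both the checkpointed data and the later writes, because the redo log bridges the gap between the last checkpoint and the crash.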

4 Using MySQL Cluster

The best way to become familiar with a software implementation and understand the way it works is to acquire hands-on experience. So, we installed, configured and used the MySQL Cluster. The problem we encountered is that running the cluster requires one machine for each node of the cluster, so even a simple setup needs six to eight machines. Since we did not have access to that many machines we decided to run the cluster on Virtual Machines (VMs), all installed on one machine. In section 4.1 we briefly introduce QEMU, which was used for the VMs. We describe the installation procedure in section 4.2, the configuration in section 4.3 and a demonstration of running the cluster in section 4.4. These sections provide an outline of the procedures followed. Specific details and instructions can be found in Appendix B for the installation, in Appendix C for the configuration and in Appendix D for the way the cluster is launched.

4.1 Introduction to QEMU

QEMU is a "generic and open source machine emulator and virtualizer" [14]. QEMU gives us the opportunity to run different OSes on our own PC, thus providing virtualization of an OS, as long as the OS supports x86-compatible processors. We call the machine that runs the VMs (our PC) the "host" and the VMs the "guests". QEMU provides emulation of peripherals like network cards, USB ports, optical drives etc.

In order to use QEMU we first create a big file on the host. This file is used as the hard disk of the guest. We can then install the OS of our choice on this virtual hard disk. We used an Intel machine running Ubuntu as the host and Debian as the OS of the cluster's VMs.

A total of 8 guest VMs were created:

• switch: a virtual switch that connects all the other VMs, as if they were connected to a physical switch. We are not using any firewall or routing capabilities on the guest OS. Instead we use QEMU's networking capabilities to make this work.

• tserver: the VM running the MySQL server.

• tclient: the VM running a MySQL client application, used to demonstrate the usage of the cluster.

• tmgm: the VM running the NDB management node.

• dn0, dn1, dn2, dn3: the VMs running Data Node 0, 1, 2, 3.

4.2 Creating the VMs and Installing MySQL

The OS installed on the guest VMs is Debian 4.0r5 [1], more specifically the version that supports the Intel x86 architecture. Each VM was given a virtual hard disk of 1 GB, access to the network card and 256 MB of RAM. Table 1 presents the software installed on each of the VMs after the installation of the OS.

More details on the creation of the VMs, the installation of the OS on the VMs and the installation of the software needed to run the cluster in each VM can be found in Appendix B.

4.3 Configuring MySQL Cluster

The structure of the cluster is presented in Figure 6.

Each VM can have a number of VLANs connected to it. When launched, the switch VM directs QEMU to listen on a TCP port on the host machine and connects its own VLAN to this port. The other VMs, when launched, connect their VLANs to this port. The result is that the VMs can communicate as if they were all connected to a switch. The reason the switch is a separate VM is that it gives us the flexibility to shut down (gracefully or not) any of the nodes of the cluster while the remaining nodes can still communicate with each other, thus facilitating experiments with the cluster. One thing we have not implemented is a firewall on the switch machine that could block the communication between specific VMs. This would have allowed us to simulate network problems.

VM        IP               Software installed - process running
switch    192.168.10.1     none
tserver   192.168.10.2     MySQL - mysqld: port 3306
tmgm      192.168.10.3     MySQL - ndb_mgmd: port 1186
dn0       192.168.10.10    MySQL - ndbd: port 1186
dn1       192.168.10.11    MySQL - ndbd: port 1186
dn2       192.168.10.12    MySQL - ndbd: port 1186
dn3       192.168.10.13    MySQL - ndbd: port 1186
tclient   192.168.10.200   MySQL - mysql (command line client)

Table 1: The software that was installed and the MySQL processes that were used in each VM

Figure 6: The structure of the virtual cluster

We configured the SQL node and the data nodes so that they were aware of the management node. The management node was configured so that it was aware of the SQL node, the data nodes and the number of replicas per Node Group. The details of the configuration can be found in Appendix C.

4.4 Running the Cluster

After the cluster had been configured and launched we were able to use it. More details on starting the cluster can be found in Appendix D.

We successfully ran the cluster with the simplest configuration available, two data nodes in one Node Group. However, we had problems running the cluster with more data nodes and two nodes per Node Group, or with 2 or more nodes and one node per Node Group. We tried to find the solution to the problem but had no luck. So, we were not able to try a more complex scenario that is closer to the real world. Moreover, we were not able to run any benchmarks.

5 Modelling the Cluster

In this section, we present an effort to model the availability of the cluster. In order to do this we make some assumptions:

• We mainly focus on the data nodes, so our model will not include information about multiple management or SQL nodes.

• We model the cluster under the assumption that each Node Group has 2 data nodes (currently the maximum number of nodes per Node Group).

In 5.1 we present related work on the issue. In 5.2 we give some examples of Markov chains for specific numbers of data nodes and in 5.3 we describe the general model of a Markov chain for the data nodes of the cluster.

5.1 Related Work

To get an idea of how the cluster can be modelled we looked for other reports that addressed the same problem. Our search turned up one such report [13] about the Sun Cluster, a software product from Sun Microsystems Inc. designed to bring high availability to the Solaris Operating System. This report presented a simple Markov model of the cluster followed by testing results. These results were applied to the model and used to derive best practices for running the cluster.

We used the presented model as the foundation of our Markov chains, which diverged and evolved from it.

5.2 Method used

We are going to see three specific examples of Markov chains, for clusters with two, four and six data nodes. In the figures presenting the states of the cluster, each state is drawn as a circle, with the number of the state outside the circle in bold type. Inside the circle we note the number of nodes that are "up" and running (for example 4up), and with 1 or 0 whether the cluster is functional or not. Moreover, λ denotes the constant failure rate and µ the constant repair rate of the data nodes.

Figure 7: Two Data Nodes Markov Chain

As we can clearly see in Figure 7, the structure with 2 data nodes provides minimal fault tolerance. The cluster can sustain the loss of one node (state 1). When the other node fails, the whole cluster fails too (state 0).
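For this smallest chain the long-run availability can be computed by hand or with a few lines of code. The sketch below assumes the transition rates obtained by setting N = 2 in the general equations of section 5.3: failures at rate 2λ from state 2 to state 1 and λ from state 1 to state 0, repairs at rate 2µ from state 0 to state 1 and µ from state 1 to state 2. These rates are an assumption of the sketch, since Figure 7 is not reproduced here, and the function names are hypothetical.

```python
def two_node_steady_state(lam, mu):
    """Steady-state probabilities (P0, P1, P2) of the two-data-node chain."""
    if lam <= 0 or mu <= 0:
        raise ValueError("failure and repair rates must be positive")
    w2 = 1.0                  # unnormalised weight of state 2 (both nodes up)
    w1 = w2 * (2 * lam) / mu  # balance between states 2 and 1: 2*lam*P2 = mu*P1
    w0 = w1 * lam / (2 * mu)  # balance between states 1 and 0: lam*P1 = 2*mu*P0
    total = w0 + w1 + w2
    return w0 / total, w1 / total, w2 / total


def two_node_availability(lam, mu):
    """Long-run probability that at least one of the two data nodes is up."""
    p0, p1, p2 = two_node_steady_state(lam, mu)
    return p1 + p2  # states 1 and 2 are the functional ones
```

Under these assumed rates the result equals 1 - (λ/(λ+µ))², that is, the cluster is down only when both nodes happen to be down at the same time.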

Figure 8: Four Data Nodes Markov Chain

In the case of four data nodes (Figure 8), the cluster is able to sustain the loss of a number of nodes, thus providing fault tolerance. When all 4 nodes are up (state 5) it can lose any one of them. When three nodes remain (state 4), it can lose a node either from the other Node Group or from the same one as before. If the two nodes of one Node Group are lost the cluster is not functional (state 3). However, if the cluster loses a node from the other Node Group it is still functional (state 2). When one more node is lost the cluster is not functional (state 1). If the remaining node fails (state 0) then nothing changes, since the cluster is already down.

The case of six data nodes (Figure 9) follows closely the case of four data nodes, so we are not going to describe it. However, it can help the reader understand the general Markov chain model that we present in the following section.

Figure 9: Six Data Nodes Markov Chain

5.3 General model

In this section we present the general model of a Markov chain that can describe the states for any possible number of data nodes. To find a generic model we proceed as follows:

Number of states. First, we find the number of states of the chain. By observing Markov chains for different numbers of data nodes (Figure 10) we notice that the following equation gives the number of states:

\[
S = \frac{3N}{2}
\]

where S is the total number of states of the Markov chain and N is the total number of data nodes (which is always a multiple of 2). The numbering of the states begins with zero, so in the rest of the section S denotes the total number of states minus one, that is, the index of the last state (S = 3N/2 - 1).

Figure 10: The way the Markov Chain "grows"

Number of nodes "up" in each state. By observing Markov chains for different numbers of data nodes (Figure 10) we notice a way to calculate the number of nodes that are up in each state. We let U : {0, 1, 2, ..., S} → {0, 1, ..., N} denote the function that gives the number of nodes that are up and running in state s, where S is the index of the last state of the Markov chain and N is the total number of data nodes (which is always a multiple of 2). U(s) is defined as:

\[
U(s) =
\begin{cases}
s & \text{if } s \in \{0, \dots, \frac{N}{2}\} \\
\frac{N}{2} + \lceil s/2 \rceil - 2 & \text{if } s \in \{\frac{N}{2}+1, \dots, S-2\} \\
N - 1 & \text{if } s = S - 1 \\
N & \text{if } s = S
\end{cases}
\]

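The model. The Markov Chain for the data nodes is presented in Figure 11. S is the index of the last state of the Markov chain (the total number of states minus one) and i is the number of nodes that are up in each state. For the state s, s ∈ {0, 1, 2, ..., S}, we have the following differential equations describing the Markov Chain:

\[
\frac{dP_s(t)}{dt} =
\begin{cases}
\lambda P_1(t) - N\mu P_0(t)
  & \text{if } s = 0 \\[6pt]
U(s+1)\lambda P_{s+1}(t) + \bigl(N - U(s-1)\bigr)\mu P_{s-1}(t)
  - \bigl(U(s)\lambda + (N - U(s))\mu\bigr) P_s(t)
  & \text{if } s \in \{1, \dots, \frac{N}{2}\} \\[6pt]
U(s+2)\lambda P_{s+2}(t) + \bigl(N - U(s-1)\bigr)\mu P_{s-1}(t)
  - \bigl(U(s)\lambda + (N - U(s))\mu\bigr) P_s(t)
  & \text{if } s = \frac{N}{2} + 1 \\[6pt]
U(s+2)\lambda P_{s+2}(t) + \bigl(N - U(s-2)\bigr)\mu P_{s-2}(t)
  + \bigl(N - U(s-1)\bigr)\mu P_{s-1}(t)
  - \bigl(2U(s)\lambda + (N - U(s))\mu\bigr) P_s(t)
  & \text{if } s \in \{\frac{N}{2}+2, \frac{N}{2}+4, \dots, S-1\} \\[6pt]
U(s+1)\lambda P_{s+1}(t) - U(s)\mu P_s(t)
  & \text{if } s \in \{\frac{N}{2}+3, \frac{N}{2}+5, \dots, S-2\} \\[6pt]
U(s+1)\lambda P_{s+1}(t) + \bigl(N - U(s-2)\bigr)\mu P_{s-2}(t)
  + \bigl(N - U(s-1)\bigr)\mu P_{s-1}(t)
  - \bigl(2U(s)\lambda + (N - U(s))\mu\bigr) P_s(t)
  & \text{if } s = S - 1 \\[6pt]
\mu P_{S-1}(t) - N\lambda P_S(t)
  & \text{if } s = S
\end{cases}
\]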

From the differential equations we can produce the probabilities of each state and use them to calculate the availability. The equations might look intimidating at first but are actually quite simple to use, and are trivial for a small number of nodes. For example, it can easily be verified manually that if N = 6 (Figure 9) or N = 8 the Markov chain is constructed in the proper way.

The availability A(t) of the data nodes is calculated by:

\[
A(t) = P_S(t) + \sum_{k} P_k(t)
\]

where S is the index of the last state of the Markov chain, and k ∈ {N/2 + 3, N/2 + 5, …, S − 1} ∪ {S}.
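As a rough illustration of how state probabilities turn into an availability figure, the sketch below computes the steady-state distribution of a small continuous-time Markov chain (all derivatives set to zero) and sums the probabilities of the "up" states. It is not the general chain of Figure 11: it uses a deliberately simple two-state chain for a single node, and the function names and the rate values λ = 0.01 and µ = 0.5 are assumptions made only for this example.

```python
def steady_state(q):
    """Solve pi * Q = 0 with sum(pi) = 1 for a generator matrix Q.

    q is a list of rows; q[i][j] is the transition rate from state i to
    state j, and each diagonal entry is minus the sum of its row.
    """
    n = len(q)
    # Balance equations are the columns of Q; the last one is replaced
    # by the normalisation condition sum(pi) = 1.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    a[-1] = [1.0] * n
    b[-1] = 1.0

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            raise ValueError("singular system; is the chain irreducible?")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - tail) / a[r][r]
    return pi


def availability(pi, up_states):
    """Sum the probabilities of the states in which the system is up."""
    return sum(pi[k] for k in up_states)


# Assumed example rates: failure rate lam and repair rate mu.
lam, mu = 0.01, 0.5
# State 0: node down, state 1: node up.
q = [[-mu, mu],
     [lam, -lam]]
pi = steady_state(q)
avail = availability(pi, up_states=[1])
print(avail)
```

For this two-state chain the result can be checked against the closed form µ / (λ + µ). The same two functions could in principle be applied to a larger generator matrix built from the equations above, with the set of "up" states taken from the availability formula.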

6 Model evaluation

The model we presented has some advantages as well as limitations. The main advantage of the model is that it successfully models the data nodes of the cluster and that it is usable.

Among the limitations of the model is that it only focuses on the data nodes. However, a real high-traffic production system will probably have more than one MySQL server (SQL node), more than one management node and probably a proxy server that will distribute the client traffic to the available MySQL servers. The model we presented doesn't take the existence of these nodes into account.

Moreover, the model doesn't take into account that the cluster needs some readjustment when a node fails, and that it also needs readjustment when a node recovers. The period during which the cluster makes the proper readjustments is not covered and might be significant for critical applications.

Finally, there is one point that is not a drawback of this model in particular but must be taken into account, since it applies to any model we might develop: in order to use the model, we must take measurements for λ and µ. Such a benchmark can also be implemented for the data nodes only.
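A minimal sketch of how such measurements could be turned into rates is shown below. It assumes that times to failure and times to repair are exponentially distributed, which is what a Markov model implies, so that each rate can be estimated as the reciprocal of the observed mean. The function name and the sample durations are hypothetical and do not come from any benchmark in this report.

```python
def estimate_rates(times_to_failure, times_to_repair):
    """Estimate the failure rate (lambda) and repair rate (mu).

    Assumes exponentially distributed durations, so each rate is the
    reciprocal of the sample mean (1 / MTTF and 1 / MTTR).
    """
    if not times_to_failure or not times_to_repair:
        raise ValueError("at least one observation of each kind is needed")
    mttf = sum(times_to_failure) / len(times_to_failure)
    mttr = sum(times_to_repair) / len(times_to_repair)
    return 1.0 / mttf, 1.0 / mttr


# Hypothetical observations, all in the same time unit (for example hours).
observed_uptimes = [100.0, 200.0, 300.0]
observed_repairs = [1.0, 3.0]
lam_hat, mu_hat = estimate_rates(observed_uptimes, observed_repairs)
print(lam_hat, mu_hat)
```

The estimates are only as good as the number of observed failures and repairs, so a real benchmark would need to run long enough to collect a meaningful sample.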

In conclusion, the model we presented is a good start, and it certainly has room for development. Moreover, for the available model or for a concrete generic model that includes the other parts of the cluster, we must properly define a benchmark and take measurements. We think that the report sufficiently covered its goals, since it paved the way for further work and at the same time presented concrete and usable results.

7 Conclusion

The problem that this report had to deal with was the availability of the MySQL Cluster and the way we could actually calculate it. In order to solve this problem we presented the way the cluster works as well as the procedure we followed to actually install it. Then, by following the available documentation and the working example, we focused on one part of the cluster, the data nodes, where the data is actually stored. For this part of the cluster we described a Markov chain that can represent the states of the data nodes, for any number of data nodes, thus serving as a generic model.

Figure 11: The general Markov Chain for the data nodes of the cluster

We managed to make a first concrete step towards the formulation of the MySQL Cluster's availability by modeling one important part of it. So, the project is considered successful, but it has room for improvement, thus opening the way for future work including the modeling of all the parts of the cluster and running well-defined benchmarks to measure the failure and recovery rates of the nodes of the cluster.

Finally, this project gave us the opportunity to work with an enterprise-level cluster application and put the course's theory into practice, thus acquiring valuable hands-on experience with the cluster as well as a better understanding of the theory involved.


References

[1] Debian. Debian GNU/Linux 4.0 updated. http://www.debian.org/News/2008/20081023, 2008.

[2] MySQL Documentation. The MySQL Cluster API Developers' Guide. http://downloads.mysql.com/docs/ndbapi-en.a4.pdf, 2008.

[3] QEMU forum users. The QEMU forum. http://qemu-forum.ipi.fi/, 2008.

[4] MySQL Internals mailing list. Re: MySQL Cluster availability. http://lists.mysql.com/internals/36074, 2008.

[5] MySQL Manual. B.1.15. Changes in MySQL 4.1.11 (01 April 2005). http://dev.mysql.com/doc/refman/4.1/en/news-4-1-11.html, 2005.

[6] MySQL Manual. 17 MySQL Cluster. http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html, 2008.

[7] MySQL Manual. 17.1.1. MySQL Cluster core concepts. http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-basics.html, 2008.

[8] MySQL Manual. 17.1.2. MySQL Cluster nodes, node groups, replicas, and partitions. http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-nodes-groups.html, 2008.

[9] MySQL Manual. 17.2. Simple multi-computer how-to. http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-multi-computer.html, 2008.

[10] MySQL Manual. 17.3.4.5. Defining data nodes. http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html, 2008.

[11] MySQL. MySQL AB acquires Alzato. http://www.mysql.com/news-and-events/generate-article.php?id=2003_30, 2003.

[12] MySQL. MySQL Cluster architecture overview - high availability features of MySQL Cluster. http://www.mysql.com/why-mysql/white-papers/, 2008.

[13] Ira Pramanick. Modeling Sun Cluster availability. www.sun.com/blueprints/1202/817-0905.pdf, 2002.

[14] QEMU. QEMU. http://bellard.org/qemu/, 2008.

[15] SUN. Sun Microsystems announces agreement to acquire MySQL. http://www.sun.com/aboutsun/pr/2008-01/sunflash.20080116.1.xml, 2008.


A Conventions Used

The following typographical conventions are used in the report:

> foo

This style suggests that a command "foo" must be executed on the shell of a VM or the host. The symbol ">" represents the prompt of the shell.

> foo

bash: foo: command not found

If further text follows and the lines don't start with the symbol ">", then this text is either output of the command or input. This will be stated explicitly or it will be clear from the context.

When there is no name before the symbol ">" it means that the command must be executed on the host machine (our PC).

When there is a name before the symbol ">" it means that the command must be executed on the VM whose name precedes the symbol:

server> foo

bash: foo: command not found

When the name before the symbol ">" is not that of a VM but of a MySQL or NDB client such as mysql, ndb_mgm, etc., it means that the command must be executed in the appropriate client on the mentioned VM.

server> mysql -u root

mysql> SHOW ENGINES;


B Installation

This chapter describes the procedure followed in order to install the virtual cluster. The machine we used as a host in order to install the VMs was running Ubuntu Linux 8.04.

> uname -a

Linux host_name 2.6.24-21-generic #1 SMP Tue Oct 21 23:09:30 UTC 2008 x86_64 GNU/Linux

In section B.1 we describe the creation of the QEMU Virtual Machines (VMs) and the way the OS is going to be installed. In section B.2 the steps of the OS installation are presented. In section B.3 we show how we set up the virtual network, so that the VMs are able to communicate with each other. In section B.4 we give the outline of the installation of the MySQL software.

B.1 QEMU Virtual Machines

We create the virtual hard disk on the host with the "qemu-img" tool, which uses the QEMU native file format:

> qemu-img create name_of_the_machine.img 1G

where name_of_the_machine is one of: server, client, mgm, dn0, dn1, dn2, dn3, switch.

The OS we installed is Debian Linux 4.0r5 for i386, with the ability to install the packages from the network. We downloaded the ISO image debian-40r5-i386-netinst.iso. In order to install the OS, we started the VM and instructed it to boot from the cdrom, with the ISO image of the OS installation disk loaded in the cdrom drive.

> qemu -boot d -cdrom debian-40r5-i386-netinst.iso \
    -hda name_of_the_machine.img

where name_of_the_machine is one of: server, client, mgm, dn0, dn1, dn2, dn3, switch. After that, the installation of Debian is ready to begin.

B.2 Operating System

In this section we look at the steps of the installation that are needed in order to recreate the environment we used in this report.

• Installation Language: English

• Choose country: Denmark. This will automatically select the closest apt mirrors.

• Choose the hostname of the machine: one of switch, tserver, tclient, dn0, dn1, dn2, dn3, tmgm.


• Set the same domain "testnet" for all the VMs.

• We can choose automatic partitioning and all files on one partition (no separate partition for /home, for example). In the end we had 900MB of ext3 and 100MB of swap.

• Use a simple root and user password, since we are going to use it repeatedly and security is not an issue.

B.3 VMs and networking

After the installation of the OS (Debian) on the guest machines, we want the nodes of the cluster to be able to communicate with each other through TCP/IP connections. QEMU can create a socket file on the host machine and then we can connect the VMs to this socket. So, we launch the switch VM, which creates a VLAN, and then we specify that this VLAN will wait for incoming connections on port 1234:

> qemu -hda switch.img -net nic -net socket,listen=:1234

A listening port is indeed created on the host machine:

> netstat -lpt

(Not all processes could be identified, non-owned process info

will not be shown, you would have to be root to see it all.)

Active Internet connections (only servers)

Proto ... Local Address Foreign Address State PID/Program name

...

tcp ... *:1234 *:* LISTEN 18830/qemu

...

We then launch the other VMs so that they connect their own VLAN to the VLAN of the switch VM:

> qemu -hda name_of_the_machine.img -net nic \
    -net socket,connect=127.0.0.1:1234

where name_of_the_machine is one of: server, client, mgm, dn0, dn1, dn2, dn3.

B.4 MySQL: Installation and Start-up Configuration

All the VMs. After installing Debian we installed additional software by using "apt" (Advanced Packaging Tool), Debian's package manager. This is performed in all VMs.

VM> su

VM> nano /etc/apt/sources.list


We comment out the line that mentions the cdrom, since we are going to install packages from the network. We then install the mysql package that includes server, clients and cluster support:

VM> apt-get install mysql-server

We should note here that at the time of writing the "mysql-server" package included the MySQL Cluster. In the future this might change, since the versioning and management of the MySQL Cluster is now separate from the MySQL Server (see section 3.1).

Finally we set up the IP of the VMs as static by editing the "interfaces" file:

VM> nano /etc/network/interfaces

auto eth0

iface eth0 inet static

address 192.168.10.XXX

netmask 255.255.255.0

where XXX is a different number for each VM (see Table 1).

Server VM. We must make the following adjustments in the my.cnf file:

• comment out the line "bind-address = 127.0.0.1", so that the mysql server is reachable from the whole network

• add the directive "ndbcluster" in the "[mysqld]" section, so that the NDB storage engine is enabled

We also added a database user "dbuser", who is able to connect from any machine on the network, has full privileges on the server and has no password:

server> mysql -u root

mysql> GRANT ALL ON *.* TO 'dbuser'@'%';

We don't have to modify anything in the init.d files. We then modified the OS startup files so that during startup only the processes that are needed are loaded (see Table 1).


C Configuration of the Cluster's Nodes

In order to configure the cluster we followed closely the documentation at [9], but we also present some minor extra steps we had to perform.

First of all we must stop the mysqld process on all the VMs (if it is not already stopped).

• SQL Node: In VM tserver, in the "[mysqld]" section of the "my.cnf" file we added:

tserver> nano /etc/mysql/my.cnf

ndbcluster # enable NDB storage engine

ndb-connectstring=192.168.10.3 # IP of the MGM node

• Data Nodes: In VMs dn0 and dn1, in the "my.cnf" file of each node we added:

dnX> nano /etc/mysql/my.cnf

[mysql_cluster]

ndb-connectstring=192.168.10.3 # IP of the MGM node

We must also create the directory where the DB data will be stored:

dnX> mkdir /usr/local/mysql/

dnX> mkdir /usr/local/mysql/data

• MGM Node: In VM tmgm, we created the configuration file of the management node, "config.ini":

tmgm> nano /var/lib/mysql-cluster/config.ini

# Options affecting ndbd processes on all data nodes:

[ndbd default]

NoOfReplicas=2 # Number of replicas

DataMemory=80M # memory to allocate for data storage

IndexMemory=18M # memory to allocate for index storage

# these ^^^ are the default values

# TCP/IP options:

[tcp default]

# we do not specify port, use default instead

# Management process options:

[ndb_mgmd]

hostname=192.168.10.3 # IP of management node

datadir=/var/lib/mysql-cluster # management node log files

#--------------------------------

# one [ndbd] section per data node


# Options for data node A:

[ndbd]

hostname=192.168.10.10 # Hostname or IP address

datadir=/usr/local/mysql/data # Directory for this data node's data files

# Options for data node B:

[ndbd]

hostname=192.168.10.11 # Hostname or IP address

datadir=/usr/local/mysql/data # Directory for this data node's data files

#--------------------------------

# SQL node options:

[mysqld]

hostname=192.168.10.2 # IP of SQL node


D Starting the Cluster

In order to start the cluster we must start the nodes in a specific order. Each cluster node process must be started separately, and on the VM where it resides.

1. MGM Node:

tmgm> ndb_mgmd -f /var/lib/mysql-cluster/config.ini

2. Data Nodes: In VMs dn0 and dn1, we start the ndbd process on each node:

dnX> ndbd

3. SQL Node: In VM tserver, we start the MySQL server:

tserver> /etc/init.d/mysql start

If the cluster has been set up correctly and the processes encountered no problems, the cluster should be operational at this stage. We can verify that by running the "ndb_mgm" client on the management node, to see the status of the cluster. Figure 12 shows the output of the SHOW command on the client. We can see that the MGM node, the API node (the mysql server) and 2 data nodes (which belong to Node Group 0) are reported:

Figure 12: The cluster runs successfully with 2 data nodes!
