High Availability - How to get 99.99% service availability - Designing clusters (DOs & DON'Ts)

Post on 12-Jan-2015

description

Presentation at BarCamp Saigon 2013, RMIT, 7th July. Presenter: Lukas Rypl

transcript

High Availability

Lukas Rypl

Twitter: @LukasRypl

7th July 2013

Agenda

Intro - what and why (3 mins)

Describing Requirements (10 mins)

Solutions (7 mins)

Q&A

Lukas Rypl (BarCamp Saigon) High Availability 7th July 2013 2 / 17

Customers Talking about Requirements

fault tolerance

fail-over solution

high availability

disaster recovery

geographic redundancy

cluster

What You Need to Know

Why do they need it?

Budget

Use Only One, Some or All

Active-Active (Master-Master)

Active-Standby (Master-Slave)

Operations:

read/write

read-only

none

RPO and RTO

RTO: Recovery Time Objective

RPO: Recovery Point Objective

Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-based Systems by Eric Bauer, Randee Adams, Daniel Eustace
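
As a rough illustration of how the two objectives are used: RPO bounds how much recent data may be lost (measured in time), and RTO bounds how long the service may stay down. The sketch below checks a proposed design against both. The class, the function name and the example figures are hypothetical, not taken from the talk or the book.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time until service is restored


def meets_objectives(
    objectives: RecoveryObjectives,
    replication_interval: timedelta,
    estimated_restore_time: timedelta,
) -> bool:
    """Check a design against RPO and RTO (simplified, worst-case view)."""
    # Worst case: the failure happens just before the next replication or
    # backup run, so up to one full interval of data is lost.
    meets_rpo = replication_interval <= objectives.rpo
    # The restore estimate should cover detection, failover and verification.
    meets_rto = estimated_restore_time <= objectives.rto
    return meets_rpo and meets_rto


if __name__ == "__main__":
    # Hypothetical requirement: lose at most 15 minutes of data,
    # be back online within 1 hour.
    required = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=1))
    # Nightly backups cannot satisfy a 15-minute RPO.
    print(meets_objectives(required, timedelta(hours=24), timedelta(minutes=30)))
    # Replication every 5 minutes with a 20-minute failover can.
    print(meets_objectives(required, timedelta(minutes=5), timedelta(minutes=20)))
```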

Availability

24x7

Maximum downtime allowed per period:

SLA        Weekly     Monthly    Yearly
99%        1h 40m     7h 12m     3 days 15 hrs
99.9%      10m        43m 12s    8h 45m
99.99%     1m         4m 19s     52m 33s
99.999%    6s         26s        5m 15s
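
The downtime figures above follow directly from the SLA percentage: allowed downtime = period length x (1 - availability). A minimal sketch, assuming a 7-day week, a 30-day month and a 365-day year (these period lengths reproduce the slide's numbers; the function name is made up for illustration):

```python
from datetime import timedelta

# Assumed period lengths: 7-day week, 30-day month, 365-day year.
PERIODS = {
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime that still meets the SLA over the period."""
    unavailable_fraction = 1.0 - sla_percent / 100.0
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999):
        row = ", ".join(
            f"{name}: {allowed_downtime(sla, period)}"
            for name, period in PERIODS.items()
        )
        print(f"{sla}% -> {row}")
```

For 24x7 operation at 99.99%, this leaves under an hour of total downtime per year, planned maintenance included.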

What You Need to Know

How to handle network partitioning?

Manual or automatic failover and recovery?

Local or geographical redundancy?

Connection parameters (BW, RTT, L2/L3)

Other goals - load balancing?

Solutions - Layers

High Availability and Disaster Recovery: Concepts, Design, Implementation by Klaus Schmidt

Solutions - Hardware

disks: RAID 1, 10, 5, ... (controller with BBWC)

power supply

network interface: teaming/bonding

out-of-band management (HP iLO, Dell DRAC, IBM RSA)
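
Why duplicating disks, power supplies and network interfaces helps can be shown with the standard availability formulas. This is a sketch under a strong assumption that component failures are independent; the function names and the component availabilities are illustrative, not measured values.

```python
from math import prod
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """All components are required: the availabilities multiply."""
    return prod(availabilities)


def parallel_availability(availability: float, copies: int) -> float:
    """Any one of `copies` identical components is enough.

    Assumes independent failures; shared causes (same rack, same firmware
    bug, same power feed) make the real figure lower.
    """
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    psu = 0.99  # hypothetical availability of a single power supply
    print(f"one PSU:  {psu:.6f}")
    print(f"two PSUs: {parallel_availability(psu, 2):.6f}")
    # A server still needs every subsystem to work, so subsystems combine
    # in series even when each one is internally redundant.
    server = series_availability(
        [
            parallel_availability(0.99, 2),  # redundant power supplies
            parallel_availability(0.99, 2),  # mirrored disks
            parallel_availability(0.99, 2),  # bonded network interfaces
        ]
    )
    print(f"server:   {server:.6f}")
```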

Solutions - Data

snapshots

block-level replication: DRBD

NAS, SAN

CAP (Brewer's) theorem

http://blog.rizzif.com/2011/08/31/intro-to-nosql/

Solutions - Databases

MySQL - Master-Master replication

PostgreSQL - Master-Slave, 3rd party for Master-Master

Oracle RAC

Solutions - CRM

corosync + pacemaker

resources (application, IP address, ...)

rules (where, when, what requires)

master-slave

Partitioning not allowed

STONITH required
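
The reasoning behind these two rules can be sketched with a majority-quorum check: after a network split, only the side holding a strict majority of votes may keep running resources, and nodes that cannot be reached must be fenced (STONITH) before their resources are taken over. This is a simplified illustration, not how corosync or pacemaker implement it; both function names are hypothetical.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if this partition holds a strict majority of all votes."""
    if not 0 <= reachable_votes <= total_votes:
        raise ValueError("reachable_votes must be between 0 and total_votes")
    return 2 * reachable_votes > total_votes


def may_take_over_resources(
    reachable_votes: int, total_votes: int, lost_nodes_fenced: bool
) -> bool:
    """Hypothetical decision rule: quorum AND confirmed fencing of lost nodes.

    Without fencing, an unreachable node might still be running the resource,
    and starting a second copy could corrupt shared data.
    """
    return has_quorum(reachable_votes, total_votes) and lost_nodes_fenced


if __name__ == "__main__":
    # A 5-node cluster splits into 3 + 2: only the 3-node side has quorum.
    print(has_quorum(3, 5), has_quorum(2, 5))
    # A 2-node cluster splits into 1 + 1: neither side has a majority,
    # so a plain majority vote cannot decide who continues.
    print(has_quorum(1, 2))
    # Quorum alone is not enough if the lost nodes are not confirmed down.
    print(may_take_over_resources(3, 5, lost_nodes_fenced=False))
```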

www.clusterlabs.org

Q & A
