SUSE® Linux Enterprise High Availability
Kai Dupke
Senior Product Manager
SUSE Linux Enterprise Server
Kristoffer Grönlund
Senior Software Engineer
HA Architect
SUSE High Availability: Easy, Quick, Anywhere
● Overview
● High Availability
● Geo Cluster
● Roadmap
Topics
Challenge
● Murphy's Law is universal
● Faults will occur
– Hardware crash, flood, fire, power outage, earthquake
● Service outage and loss of data
– You might afford a five-second blip, but can you afford a longer outage?
● Can you afford low availability?
HA or no HA?
Quis custodiet custodes?
Reboot instead of failing over
• (Virtualized) hardware needs to be available
Re-deployment instead of failing over
• Monitor needs to be always available
Farmed services
• Client needs to handle fail-over ('F5', SMTP)
• 3rd-party application must support scale-out
• Backend needs to be available
Overview
Most modern and complete open source solution for high availability Linux clusters
A suite of robust open source technologies that is
• Easy to use
• Integrated
• Virtualization agnostic
Used with SUSE Linux Enterprise Server, it helps to
• Maintain business continuity
• Protect data integrity
• Reduce unplanned downtime for mission-critical workloads
• Service Failover
• Cluster File Systems
• Clustered Samba
• Virtualization Agnostic
• Full support for x86, x86_64, POWER, and System z
• Network Load-Balancer
• Data Replication
• Node Recovery
• HAWK Web GUI
• Unlimited Geo Clustering
Features
SUSE unique!
Targets
Quickly and easily install, configure and manage
Continuous access to mission-critical systems and data
Transparent to Virtualization
Meet Service Level Agreements
Increase service availability
Key Use Cases – Mission-Critical Services
• Active/active services – OCFS2, databases, Samba file servers
• Active/passive service fail-over – traditional databases, SAP setups, regular services
• High availability across guests – fine-granular monitoring and HA on top of virtualization
• Network load-balancing with transparent fail-over
• All topologies – local, metro, and geographical-area clusters
• Simple Stack – Enqueue Replication
• DRBD Data Sync – HA in Virtual Environments
Sample Use Cases – SAP
Pharmaceutical drugs & products
Part of STADA group
Running Highly Available SAP
Reference – Ciclum Pharma
"SUSE Linux Enterprise offers the perfect combination of flexibility and reliability."
"100 percent uptime for SAP since the solution is live."
"The partnership between SUSE and SAP gave us confidence."
"SUSE Linux Enterprise High Availability Extension gives us powerful tools."
— ANTÓNIO DAMAS, IT Manager
Ciclum Farma
Geo Cluster
Cluster fail-over between different locations
• Provide disaster resilience in case of site failure
• Each site is a self-contained, autonomous cluster
• Support manual and automatic switch-over / fail-over
Extends Metro Cluster capabilities
• No distance limit between data centers
• No unified storage / network needed
Storage replicated as active / passive
• Leverage SUSE-included data replication (DRBD)
• Integrate third-party solutions via scripts
Geo Cluster – Overview
Local cluster
• Negligible network latency
• Typically synchronous concurrent storage access
Metro area (stretched) cluster
• Network latency <15 ms (~20 miles)
• Unified / redundant network between sites
• Usually some form of replication at the storage level
Geo clustering
• High network latency, limited bandwidth
• Asynchronous storage replication
Geo Cluster – From Local to Geo
Geo Cluster – Setup
[Diagram: Site A (Node 1, Node 2) and Site B (Node 7, Node 8) each run boothd; an arbitrator boothd runs at Site C]
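The boothd instances at each site and the arbitrator are driven by a small shared configuration file. A minimal sketch of such a file is shown below; the IP addresses and the ticket name are illustrative assumptions, not taken from the original:

```
# /etc/booth/booth.conf – geo-cluster ticket manager (illustrative sketch)
transport = UDP
port = 9929

# The two cluster sites and the arbitrator (example IPs)
site       = 192.168.201.100
site       = 192.168.202.100
arbitrator = 192.168.203.100

# A ticket grants exactly one site the right to run its resources;
# if the holder fails to renew it, it can fail over to the other site
ticket = "ticket-db"
    expire = 600
```

The same file is deployed to both sites and the arbitrator; cluster resources are then made dependent on ownership of the ticket.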
Service failover at any distance – from local to geo
Up to 99.9999% availability
Rolling updates for less planned downtime
Easy setup, administration, management
Virtualization agnostic
Leading open source High Availability
Fighting Murphy's Law
When will you start?
Summary
Roadmap
2015 – 2018

SERVICE PACK 1
High Availability
• Host-based mirroring optimization
• AWS cloud support
GEO Cluster
• Virtualization for standard workloads

SERVICE PACK 2
High Availability
• HAWK GUI redesign
• HA for POWER
• md-cluster data mirroring

SERVICE PACK 3
High Availability
• Azure cloud support
GEO Cluster
• Bootstrap support
• Wizard support
This information is forward-looking and is subject to change at any time.
Recent Improvements – 12 SP1
Hawk 2
• Redesigned and updated interface
• Many new wizards
• Command log
SUSE Linux Enterprise High Availability 12 Service Pack 2
● Hawk 2 now default
● Hawk Batch Mode
● Pacemaker 1.1.15: Event-based Alerts
● Clustered RAID 1 (cluster-md)
● HAProxy 1.6
● AWS fencing agent, tool support
● Power LE
● UEFI support in ReaR
Setup & Management
Easy to bootstrap
node1 # ha-cluster-init -i bond0 -t ocfs2 -p /dev/disk/by-id/...
node[2...N] # ha-cluster-join -c node1
Web interface for cluster management & wizards
Easy Setup – Bootstrap & Wizards
Hawk 2 – Batch Mode
Hawk 2 – History Explorer
crm shell – Cluster Scripts
# crm script run virtual-ip id=admin ip=10.13.37.98
INFO: Virtual IP
INFO: Nodes: alice, bob
OK: Configure cluster resources
# crm cfg show admin
primitive admin IPaddr2 \
params ip=10.13.37.98 \
op start timeout=20 interval=0 \
op stop timeout=20 interval=0 \
op monitor interval=10 timeout=20
● Shared device RAID-1
● Avoid SAN as SPOF
● High performance
See dedicated talk on cluster-md!
cluster-md
Outlook
Public Cloud
● AWS / EC2
● Azure
Geo Cluster
● Bootstrap
● Wizards
Interface / Tools
● Hawk 2 – Fencing Topology
● Hawk 2 – Alerts
Upcoming Improvements – 12 SP3
This information is forward-looking and is subject to change at any time.
Future – Beyond the Next Frontier
This information is forward-looking and is subject to change at any time.
Failure will occur
• How to predict & avoid failures?
Virtualization, Containers and Cloud
• Monitor from outside or inside the guests?
Local, Metro, Geo...
• What is the next new cluster scenario?
Scalability
• What is the right cluster size – 2 nodes, 20 nodes, 200 nodes, 2000+ nodes?
Usability
• What makes cluster deployment, operation, support easier?
Questions
Backup
● Remote monitoring of resources
– no HA components needed
– re-use of Nagios/icinga plugins
● Improved handling of virtual guests
– monitor virtual services from the hypervisor
– improve protection of VMs as cluster workload
– guests remain unaltered – monitoring is external
● Extends pacemaker to include the concept of “container” resources
External Remote Monitoring
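In crm shell syntax, the "container" concept above typically looks like the sketch below: a VM resource is grouped with a Nagios-plugin check that monitors a service inside the unaltered guest. Resource names, the config path, and the hostname are illustrative assumptions:

```
# VM as a cluster resource (illustrative names and paths)
primitive vm1 ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vm1.xml" \
    op monitor interval=10 timeout=30
# Monitor SSH inside the guest via a Nagios plugin –
# no HA components are installed in the guest itself
primitive vm1-sshd nagios:check_ssh \
    params hostname="vm1" \
    op monitor interval=10
# The group's container meta attribute ties the check to the VM
group g-vm1-and-services vm1 vm1-sshd \
    meta container="vm1"
```

If the monitored service fails, the cluster restarts or fails over the VM as a whole, since the guest is treated as a container resource.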
● Core is a traditional cluster (up to 32 nodes)
● Core drives arbitrary number of remote nodes
– Remote nodes can be virtual or physical
● Remote management and monitoring
– Remote agent (pacemaker-remote) needed
– Uses resource agents & system init scripts
– More feature-rich than external monitoring
● Remote nodes can host (almost) all resources
– Exceptions: DLM, cLVM2, OCFS2, GFS2
Scale-out via Remote Nodes
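A remote node is typically attached by running the pacemaker-remote agent on it and defining an ocf:pacemaker:remote resource on the core cluster. A hedged sketch (host name and IP are illustrative):

```
# On the remote node (virtual or physical):
#   zypper install pacemaker-remote
#   systemctl enable --now pacemaker_remote
# (the cluster's /etc/pacemaker/authkey must be shared with the remote node)

# On a core cluster node, in crm shell:
primitive remote1 ocf:pacemaker:remote \
    params server=192.168.1.50 \
    op monitor interval=30s
```

Once the connection resource starts, remote1 appears as a node and can host resources, subject to the exceptions listed above.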
Architecture

Cluster Software Stack

[Diagram: Corosync provides the messaging / infrastructure layer across the master nodes. Above it sits the resource allocation layer: the Cluster Resource Manager with the Policy Engine and the Cluster Information Base (CIB) runs on the Designated Coordinator (DC), while the other nodes run a Cluster Resource Manager with a CIB replica. Local Resource Managers invoke Resource Agents, which manage the individual resources.]
Linux High Availability Stack
The stack includes:
• corosync – cluster infrastructure
• Pacemaker – cluster resource manager
• resource-agents – manage and monitor availability of services
• stonith – IO fencing support (also Xen and VMware VMs)
• Hawk – Web console for cluster monitoring and administration
• crm shell – Advanced cluster command line interface
• DRBD – network cluster storage
• cLVM – Cluster-aware LVM
• OCFS2, GFS2 – active/active file systems
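Combined, these components let a simple active/passive service be expressed in a few crm shell lines. A minimal, illustrative sketch (device, mount point, and IP are assumptions, not from the original):

```
# DRBD-backed filesystem plus a floating IP, failing over as one unit
primitive data-fs Filesystem \
    params device="/dev/drbd0" directory="/srv/data" fstype="xfs" \
    op monitor interval=20s
primitive service-ip IPaddr2 \
    params ip=192.168.1.100 \
    op monitor interval=10s
group data-service data-fs service-ip
```

A complete setup would additionally define the DRBD master/slave resource and order the group after DRBD promotion; this fragment only shows the grouping idea.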