SUSE® Linux Enterprise High Availability
Kai Dupke
Senior Product Manager
SUSE Linux Enterprise Server
Kristoffer Grönlund
Senior Software Engineer
HA Architect
SUSE High Availability: Easy, Quick, Anywhere
● Overview
● High Availability
● Geo Cluster
● Roadmap
Topics
Challenge
● Murphy's Law is universal
● Faults will occur
– Hardware crash, flood, fire, power outage, earthquake
● Service outage and loss of data
– You might afford a five-second blip, but can you afford a longer outage?
● Can you afford low availability?
HA or no HA?
Quis custodiet custodes?
Reboot instead of failing over
• (Virtualized) hardware needs to be available
Re-deployment instead of failing over
• Monitor needs to be always available
Farmed services
• Client needs to handle fail-over ('F5', SMTP)
• 3rd-party application must support scale-out
• Backend needs to be available
Overview
Most modern and complete open source solution for high availability Linux clusters
A suite of robust open source technologies that is
• Easy to use
• Integrated
• Virtualization agnostic
Used with SUSE Linux Enterprise Server, it helps to
• Maintain business continuity
• Protect data integrity
• Reduce unplanned downtime for mission-critical workloads
• Service Failover
• Cluster File Systems
• Clustered Samba
• Virtualization Agnostic
• Full support for x86, x86_64, POWER, and System z
• Network Load-Balancer
• Data Replication
• Node Recovery
• HAWK Web GUI
• Unlimited Geo Clustering
Features
SUSE unique!
Targets
Quickly and easily install, configure and manage
Continuous access to mission-critical systems and data
Transparent to Virtualization
Meet Service Level Agreements
Increase service availability
Key Use Cases – Mission-Critical Services
• Active/active services – OCFS2, databases, Samba file servers
• Active/passive service fail-over – traditional databases, SAP setups, regular services
• High availability across guests – fine-granular monitoring and HA on top of virtualization
• Network load-balancing with transparent fail-over
• All topologies – local, metro, and geographical-area clusters
• Simple Stack – Enqueue Replication
• DRBD Data Sync – HA in Virtual Environments
Sample Use Cases – SAP
Pharmaceutical drugs & products
Part of STADA group
Running Highly Available SAP
Reference – Ciclum Pharma
"SUSE Linux Enterprise offers the perfect combination of flexibility and reliability."
"100 percent uptime for SAP since the solution is live."
"The partnership between SUSE and SAP gave us confidence."
"SUSE Linux Enterprise High Availability Extension gives us powerful tools."
— ANTÓNIO DAMAS, IT Manager
Ciclum Farma
Geo Cluster
Cluster fail-over between different locations
• Provide disaster resilience in case of site failure
• Each site is a self-contained, autonomous cluster
• Support manual and automatic switch-over / fail-over
Extends Metro Cluster capabilities
• No distance limit between data centers
• No unified storage / network needed
Storage replicated as active / passive
• Leverage SUSE-included data replication (DRBD)
• Integrate third-party solutions via scripts
Geo Cluster – Overview
Local cluster
• Negligible network latency
• Typically synchronous concurrent storage access
Metro area (stretched) cluster
• Network latency <15 ms (~20 miles)
• Unified / redundant network between sites
• Usually some form of replication at the storage level
Geo clustering
• High network latency, limited bandwidth
• Asynchronous storage replication
Geo Cluster – From Local to Geo
Geo Cluster – Setup
[Diagram: Site A (Node 1, Node 2) and Site B (Node 7, Node 8) each run boothd; an arbitrator boothd runs at Site C]
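The boothd instances at each site and the arbitrator are driven by a small shared configuration file. A minimal sketch of such a file is shown below; the IP addresses and the ticket name are illustrative assumptions, not taken from the original:

```
# /etc/booth/booth.conf – geo-cluster ticket manager (illustrative sketch)
transport = UDP
port = 9929

# The two cluster sites and the arbitrator (example IPs)
site       = 192.168.201.100
site       = 192.168.202.100
arbitrator = 192.168.203.100

# A ticket grants exactly one site the right to run its resources;
# if the holder fails to renew it, it can fail over to the other site
ticket = "ticket-db"
    expire = 600
```

The same file is deployed to both sites and the arbitrator; cluster resources are then made dependent on ownership of the ticket.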
Service failover at any distance – from local to geo
Up to 99.9999% availability
Rolling updates for less planned downtime
Easy setup, administration, management
Virtualization agnostic
Leading open source High Availability
Fighting Murphy's Law
When will you start?
Summary
Roadmap
2015 – 2018

SERVICE PACK 1
High Availability
• Host-based mirroring optimization
• AWS cloud support
GEO Cluster
• Virtualization for standard workloads

SERVICE PACK 2
High Availability
• HAWK GUI redesign
• HA for POWER
• md-cluster data mirroring

SERVICE PACK 3
High Availability
• Azure cloud support
GEO Cluster
• Bootstrap support
• Wizard support
This information is forward-looking and is subject to change at any time.
Recent Improvements – 12 SP1
Hawk 2
• Redesigned and updated interface
• Many new wizards
• Command log
SUSE Linux Enterprise High Availability 12 Service Pack 2
● Hawk 2 now default
● Hawk Batch Mode
● Pacemaker 1.1.15: Event-based Alerts
● Clustered RAID 1 (cluster-md)
● HAProxy 1.6
● AWS fencing agent, tool support
● Power LE
● UEFI support in ReaR
Setup & Management
Easy to bootstrap
node1 # ha-cluster-init -i bond0 -t ocfs2 -p /dev/disk/by-id/...
node[2...N] # ha-cluster-join -c node1
Web interface for cluster management & wizards
Easy Setup – Bootstrap & Wizards
Hawk 2 – Batch Mode
Hawk 2 – History Explorer
crm shell – Cluster Scripts
# crm script run virtual-ip id=admin ip=10.13.37.98
INFO: Virtual IP
INFO: Nodes: alice, bob
OK: Configure cluster resources
# crm cfg show admin
primitive admin IPaddr2 \
params ip=10.13.37.98 \
op start timeout=20 interval=0 \
op stop timeout=20 interval=0 \
op monitor interval=10 timeout=20
● Shared device RAID-1
● Avoid SAN as SPOF
● High performance
See dedicated talk on cluster-md!
cluster-md
Outlook
Public Cloud
● AWS / EC2
● Azure
Geo Cluster
● Bootstrap
● Wizards
Interface / Tools
● Hawk 2 – Fencing Topology
● Hawk 2 – Alerts
Upcoming Improvements – 12 SP3
This information is forward-looking and is subject to change at any time.
Future – Beyond the Next Frontier
This information is forward-looking and is subject to change at any time.
Failure will occur
• How to predict & avoid failures?
Virtualization, Containers and Cloud
• Monitor from outside or inside the guests?
Local, Metro, Geo...
• What is the next new cluster scenario?
Scalability
• What is the right cluster size – 2 nodes, 20 nodes, 200 nodes, 2000+ nodes?
Usability
• What makes cluster deployment, operation, support easier?
Questions
Backup
● Remote monitoring of resources
– no HA components needed
– re-use of Nagios/icinga plugins
● Improved handling of virtual guests
– monitor virtual services from the hypervisor
– improve protection of VMs as cluster workload
– guests remain unaltered – monitoring is external
● Extends pacemaker to include the concept of “container” resources
External Remote Monitoring
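In crm shell syntax, the "container" concept above typically looks like the sketch below: a VM resource is grouped with a Nagios-plugin check that monitors a service inside the unaltered guest. Resource names, the config path, and the hostname are illustrative assumptions:

```
# VM as a cluster resource (illustrative names and paths)
primitive vm1 ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vm1.xml" \
    op monitor interval=10 timeout=30
# Monitor SSH inside the guest via a Nagios plugin –
# no HA components are installed in the guest itself
primitive vm1-sshd nagios:check_ssh \
    params hostname="vm1" \
    op monitor interval=10
# The group's container meta attribute ties the check to the VM
group g-vm1-and-services vm1 vm1-sshd \
    meta container="vm1"
```

If the monitored service fails, the cluster restarts or fails over the VM as a whole, since the guest is treated as a container resource.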
● Core is a traditional cluster (up to 32 nodes)
● Core drives arbitrary number of remote nodes
– Remote nodes can be virtual or physical
● Remote management and monitoring
– Remote agent (pacemaker-remote) needed
– Uses resource agents & system init scripts
– More feature-rich than external monitoring
● Remote nodes can host (almost) all resources
– Exceptions: DLM, cLVM2, OCFS2, GFS2
Scale-out via Remote Nodes
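A remote node is typically attached by running the pacemaker-remote agent on it and defining an ocf:pacemaker:remote resource on the core cluster. A hedged sketch (host name and IP are illustrative):

```
# On the remote node (virtual or physical):
#   zypper install pacemaker-remote
#   systemctl enable --now pacemaker_remote
# (the cluster's /etc/pacemaker/authkey must be shared with the remote node)

# On a core cluster node, in crm shell:
primitive remote1 ocf:pacemaker:remote \
    params server=192.168.1.50 \
    op monitor interval=30s
```

Once the connection resource starts, remote1 appears as a node and can host resources, subject to the exceptions listed above.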
Architecture

Cluster Software Stack

[Diagram: Corosync provides the messaging / infrastructure layer across the master nodes. Above it sits the resource allocation layer: the Cluster Resource Manager with the Policy Engine and the Cluster Information Base (CIB) runs on the Designated Coordinator (DC), while the other nodes run a Cluster Resource Manager with a CIB replica. Local Resource Managers invoke Resource Agents, which manage the individual resources.]
Linux High Availability Stack
The stack includes:
• corosync – cluster infrastructure
• Pacemaker – cluster resource manager
• resource-agents – manage and monitor availability of services
• stonith – IO fencing support (also Xen and VMware VMs)
• Hawk – Web console for cluster monitoring and administration
• crm shell – Advanced cluster command line interface
• DRBD – network cluster storage
• cLVM – Cluster-aware LVM
• OCFS2, GFS2 – active/active file systems
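Combined, these components let a simple active/passive service be expressed in a few crm shell lines. A minimal, illustrative sketch (device, mount point, and IP are assumptions, not from the original):

```
# DRBD-backed filesystem plus a floating IP, failing over as one unit
primitive data-fs Filesystem \
    params device="/dev/drbd0" directory="/srv/data" fstype="xfs" \
    op monitor interval=20s
primitive service-ip IPaddr2 \
    params ip=192.168.1.100 \
    op monitor interval=10s
group data-service data-fs service-ip
```

A complete setup would additionally define the DRBD master/slave resource and order the group after DRBD promotion; this fragment only shows the grouping idea.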