HACMP-and-XDIntro


IBM System p5 and eServer p5

2006 IBM Corporation

Introduction to High Availability Cluster Multi-Processing (HACMP)

and HACMP Extended Distance (HACMP-XD)

Shawn Bodily

ATS HACMP Specialist

IBM


Hardware is now very reliable, and hardware failures account for only a minority of system outages

Several studies place the proportion between 20% and 45%

Human error, software error and planned maintenance cause the majority of service outages




Downtime and poor performance are expensive both financially and in terms of customer perceptions

Overall downtime costs average 3.6% of annual revenue (Infonetics)

Many studies estimate the average cost of downtime at over $5,000/hour

Popular Web sites estimate the cost of downtime at millions of dollars

A 22-hour crash in June 2003 cost eBay an estimated $5M

Losses go beyond immediate sales revenue

To clients, availability equates to reliability and trustworthiness

Internal application failures prevent employees from working




HACMP - Proven Technology for Business

Mature product now in its 17th major release

Averaging 40,000 licenses sold world-wide annually

Built on a decade of IBM cluster leadership

HACMP allows you to create highly available environments with minimal hardware.

HACMP is scalable up to 32 nodes, allowing your cluster to adapt to the growing demands of your business.

The optional XD feature allows your clusters to span unlimited geographic distances.




HACMP Is NOT the right solution if:

Your environment is not secure

Network security is not in place

Change management procedures are not respected

You do not have trained administrators

Environment is prone to user fiddle faddle

Application requires manual intervention

HACMP will never be an out-of-the-box solution to availability. A certain degree of skill will always be required.




Reducing both Planned and Unplanned downtime

Unplanned Outage

System Failure: Hardware, Operating System Crash, Power Loss, User Error

Component Failure: NIC, SCSI/SAN Adapter, Network Hub/Switch, SAN Switch, Disk Failure (both O/S and application data)

Planned Outage

Maintenance: System Hardware Change/Upgrade, OS & Application Upgrades & Fixes

Testing: Applied Fixes, Failure scenarios for HA & DR




HACMP protects against service outages by detecting problems and quickly failing over to backup hardware

Two nodes (A and B)

Two networks: Private (internal) network, Public (shared) network

Shared disk: All data in shared storage available to both nodes

Critical applications: Database server, Web server (dependent on DB)

[Figure: pSeries nodes A and B joined by the private network and the company shared network, both attached to the shared disk, running Web Srv and Database]




Example Failure #1: Node failure

Node A fails completely

Node B detects the loss of Node A

Node B starts up its own instance of the Database.

Database is temporarily taken over by Node B until Node A is brought back online

[Figure: pSeries nodes A and B, shared disk, private network, Web Srv and Database]
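The node-failure sequence on this slide can be sketched as a small simulation. This is only an illustration under stated assumptions, not HACMP's actual mechanism: the `Node` class, the `check_peer` function, the `HEARTBEAT_TIMEOUT` value, and the choice of which node hosts which application are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical value: seconds of silence before a peer is declared failed.
HEARTBEAT_TIMEOUT = 10.0


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0                 # time the node was last heard from
    running: set = field(default_factory=set)   # applications active on this node


def check_peer(survivor: Node, peer: Node, now: float) -> list:
    """If the peer has been silent too long, take over its applications.

    Returns the list of applications the survivor started.
    """
    if now - peer.last_heartbeat <= HEARTBEAT_TIMEOUT:
        return []                        # peer is still alive, nothing to do
    taken_over = sorted(peer.running)
    survivor.running.update(taken_over)  # start our own instance of each app
    peer.running.clear()                 # the failed node no longer hosts them
    return taken_over


# Assumed placement: Node A hosts the Database, Node B hosts the Web server.
node_a = Node("A", last_heartbeat=100.0, running={"Database"})
node_b = Node("B", last_heartbeat=100.0, running={"Web Srv"})

# Node A stops sending heartbeats; at t=115 Node B notices and takes over.
started = check_peer(node_b, node_a, now=115.0)
print(started)                 # prints ['Database']
print(sorted(node_b.running))  # prints ['Database', 'Web Srv']
```

Moving the Database back once Node A returns would be a separate step, governed by the fallback preference of the resource group.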




Example Failure #2: Loss of network connection

Node A loses a NIC

Because of NIC redundancy, the service IP swaps locally

Operations continue normally while the problem is resolved

If total public network connectivity were lost, a fallover could occur

[Figure: pSeries nodes A and B, shared disk, private network, Web Srv and Database]
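The decision described on this slide (swap the service IP locally if a spare NIC is healthy, otherwise fall over) can be sketched as follows. The function name, the return strings, and the interface names are assumptions made for the example; HACMP's own event handling is more involved.

```python
def handle_nic_failure(interfaces: dict, service_nic: str) -> str:
    """Decide how to react when the NIC holding the service IP fails.

    interfaces maps interface name -> True if the interface is still healthy.
    Returns "swap:<nic>" for a local swap, or "fallover" when no healthy
    interface remains on this node.
    """
    for nic, healthy in interfaces.items():
        if nic != service_nic and healthy:
            return f"swap:{nic}"     # move the service IP to the spare NIC
    return "fallover"                # no public connectivity left on this node


# Node A loses en0 but en1 is healthy: the service IP moves locally.
print(handle_nic_failure({"en0": False, "en1": True}, service_nic="en0"))
# Both interfaces are down: the resource group must move to another node.
print(handle_nic_failure({"en0": False, "en1": False}, service_nic="en0"))
```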




Failover possibilities

One to one

One to any

Any to one

Any to any




Custom Resource Groups

Startup Preferences

Online On Home Node Only (cascading) - (OHNO)

Online on First Available Node (rotating or cascading w/inactive takeover) - (OFAN)

Online On All Available Nodes (concurrent) - (OAAN)

Startup Distribution

Fallover Preferences

Fallover To Next Priority Node In The List - (FOHP)

Fallover Using Dynamic Node Priority - (FDNP)

Bring Offline (On Error Node Only) - (BOEN)

Fallback Preferences

Fallback To Higher Priority Node - (FBHP)

Never Fallback - (NFB)
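One way to picture how these preferences combine is as three independent settings on each resource group. The sketch below is a simplified model, not HACMP's configuration format: the class and function names are hypothetical, Startup Distribution is omitted, "next priority node" is approximated as the highest-priority surviving node, and dynamic node priority is left unimplemented because it depends on live node metrics.

```python
from dataclasses import dataclass
from enum import Enum


class Startup(Enum):
    HOME_NODE_ONLY = "OHNO"
    FIRST_AVAILABLE_NODE = "OFAN"
    ALL_AVAILABLE_NODES = "OAAN"


class Fallover(Enum):
    NEXT_PRIORITY_NODE = "FOHP"
    DYNAMIC_NODE_PRIORITY = "FDNP"
    BRING_OFFLINE = "BOEN"


class Fallback(Enum):
    HIGHER_PRIORITY_NODE = "FBHP"
    NEVER = "NFB"


@dataclass
class ResourceGroup:
    name: str
    nodes: list          # node names in priority order; nodes[0] is the home node
    startup: Startup
    fallover: Fallover
    fallback: Fallback


def startup_nodes(rg: ResourceGroup, available: set) -> list:
    """Return the nodes on which the group should come online at startup."""
    candidates = [n for n in rg.nodes if n in available]
    if rg.startup is Startup.ALL_AVAILABLE_NODES:
        return candidates
    if rg.startup is Startup.HOME_NODE_ONLY:
        return [rg.nodes[0]] if rg.nodes[0] in available else []
    return candidates[:1]    # first available node in priority order


def fallover_target(rg: ResourceGroup, failed: str, available: set):
    """Return the node that takes over after `failed` goes down, or None."""
    if rg.fallover is Fallover.BRING_OFFLINE:
        return None
    if rg.fallover is Fallover.NEXT_PRIORITY_NODE:
        for node in rg.nodes:
            if node != failed and node in available:
                return node
        return None
    raise NotImplementedError("dynamic node priority needs live node metrics")


db_rg = ResourceGroup("db_rg", ["A", "B"], Startup.HOME_NODE_ONLY,
                      Fallover.NEXT_PRIORITY_NODE, Fallback.NEVER)
print(startup_nodes(db_rg, {"A", "B"}))      # prints ['A']
print(fallover_target(db_rg, "A", {"B"}))    # prints B
```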




Common Resources to make highly available

Service IP Address(es)

The IP addresses that users/client apps will use for production

This can be one or multiple addresses

Not limited to the number of interfaces when utilizing aliasing

Application (Server)

Application(s) to be controlled/protected by HACMP

In many cases this can be a user-provided start/stop script

May take advantage of pre-packaged application Smart Assists

Shared Storage: Volume Groups, Logical Volumes, JFS, NFS




Additional Granular Options

Resource Group Dependencies

Parent/Child Relationships: great for multi-tier environments

Location Dependencies

Online on Same Node: all resource groups must be online on the same node

Online on Different Nodes: all resource groups must be online on different nodes

Online on Same Site: all resource groups must be online on the same site

Define Resource Group Priorities (Different Node Dep.): Low, Intermediate, High
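A parent/child relationship implies an ordering: a parent resource group has to be online before its children start, and the reverse order applies when stopping. The sketch below derives such an order for a hypothetical three-tier setup using Python's standard `graphlib` module (Python 3.9 or later), and adds a simple check for the "Online on Different Nodes" rule. The resource group names and the helper function are assumptions for illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical three-tier setup: each child lists the parent it depends on.
parents_of = {
    "web_rg": {"app_rg"},
    "app_rg": {"db_rg"},
    "db_rg": set(),
}

# Parents come first in the start order; stopping runs in reverse.
start_order = list(TopologicalSorter(parents_of).static_order())
stop_order = list(reversed(start_order))
print(start_order)   # prints ['db_rg', 'app_rg', 'web_rg']
print(stop_order)    # prints ['web_rg', 'app_rg', 'db_rg']


def on_different_nodes(placement: dict, groups: list) -> bool:
    """True if every listed resource group is placed on a distinct node."""
    nodes = [placement[g] for g in groups]
    return len(set(nodes)) == len(nodes)


print(on_different_nodes({"db_rg": "A", "web_rg": "B"}, ["db_rg", "web_rg"]))
```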




Application Monitoring

HACMP can monitor applications in one of two ways:

Process Monitor: determines the death of a process

Custom Monitor: monitors health of the application using a monitor method you provide

Decisions upon failure

Restart: you can set a number of local restart attempts. If the application continues to fail after the specified restart count, you can escalate to a fallover.

Notify: send email notification

Fallover: move the application and its associated resource group to the next candidate node.

Suspend/Resume Application Monitoring at any time.
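The restart-then-escalate behaviour can be sketched as a small state machine. The `AppMonitor` class, its attribute names, and the "ignored" result for a suspended monitor are hypothetical and chosen for this example; they do not reflect HACMP's internal implementation.

```python
class AppMonitor:
    """Tracks local restarts and escalates once the restart count is used up."""

    def __init__(self, restart_count: int, failure_action: str = "fallover"):
        if failure_action not in ("notify", "fallover"):
            raise ValueError("failure_action must be 'notify' or 'fallover'")
        self.restart_count = restart_count
        self.failure_action = failure_action
        self.restarts_used = 0
        self.suspended = False

    def on_failure(self) -> str:
        """Return the action to take for one detected application failure."""
        if self.suspended:
            return "ignored"               # monitoring is suspended
        if self.restarts_used < self.restart_count:
            self.restarts_used += 1
            return "restart"               # try again on the same node
        return self.failure_action         # restarts exhausted: escalate


monitor = AppMonitor(restart_count=2)
actions = [monitor.on_failure() for _ in range(3)]
print(actions)   # prints ['restart', 'restart', 'fallover']
```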




DLPAR/CUoD configuration

[Figure: a production database server and a DLPAR/CUoD server (running applications such as Web Server and Order Entry on active processors, with inactive processors in reserve), both running HACMP and attached to shared disk]

HACMP on the primary machine detects the failure

Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors and restarts the application




Recent HACMP releases greatly improve ease of use

Enhancements include:

Configuration wizard for typical two-node cluster

Automatic detection and configuration of IP networks

Online Planning Worksheet guides you through configuration

Simplified Web-based interface for management and monitoring

[Figure: Online Planning Worksheets for resource groups]




With HACMP V5.x, you can configure a cluster in just five questions

1. What is the address of the backup node?

2. What is the name of the application?

3. What script should HACMP use to start it?

4. What script should HACMP use to stop it?

5. What is the service IP label that clients will use to access the application?
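The five answers amount to a very small configuration record. As an illustration only, they could be captured and checked for completeness like this; the class name, field names, and sample values are all hypothetical and are not the wizard's actual input format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TwoNodeAnswers:
    backup_node_address: str
    application_name: str
    start_script: str
    stop_script: str
    service_ip_label: str

    def missing(self) -> list:
        """Return the names of any questions left unanswered."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Sample values are made up for the example.
answers = TwoNodeAnswers(
    backup_node_address="nodeb.example.com",
    application_name="orderdb",
    start_script="/usr/local/ha/start_orderdb.sh",
    stop_script="/usr/local/ha/stop_orderdb.sh",
    service_ip_label="orderdb_svc",
)
print(answers.missing())   # prints []
```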




WebSMIT Overview Demo



HACMP Cluster Test Tool

The Cluster Test Tool reduces implementation costs by simplifying validation of cluster functionality.

It reduces support costs by automating testing of an HACMP cluster to ensure correct behavior in the event of a real cluster failure.

The Cluster Test Tool executes a test plan, which consists of a series of individual tests.

Tests are carried out in sequence and the results are analyzed by the test tool.

Administrators may define a custom test plan or use the automated test procedure.

Test results and other important data are collected in the test tool's log file.



New features make HACMP V5.x easier to use and more flexible

Automatic detection and correction of common cluster configuration problems

Enhanced support for complex multi-tier applications, relationships and dependencies

Clusters can be configured with simple ASCII files

Parallel resource processing recovers applications faster

Simpler, more flexible configuration and management

New Smart-Assists simplify HACMP implementation in DB2, Oracle and WebSphere environments

Inexpensive option includes all three Smart-Assists



HACMP with Oracle 10g fallover Demo

(1) p52A

(1) p505

(1) HMC

HACMP 5.4

AIX 5.3 TL5

Oracle 10g

DS4300

LPARMon (http://www.alphaworks.ibm.com/tech/lparmon)

Swingbench (http://www.dominicgiles.com/swingbench.html)

Web-based System Manager

The cluster shown was actually created using the two-node configuration assistant within HACMP.





HACMP Extended Distance (HACMP-XD)



HA/DR is a balance of recovery time requirements and cost

Do you really need HA or DR?

What is the target recovery time? Minutes? Hours? Days?

Costs associated with implementing and maintaining an HA or DR solution: redundant hardware, inter-site networking, operations staff





Tiers of Disaster Recovery: Level Setting HACMP/XD

Tiers based on SHARE definitions

[Figure: value plotted against recovery time; recovery-time axis marked 15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days]

Tier 7 - Highly automated, business-wide, integrated solution (examples: GDPS/PPRC/VTS P2P, AIX HACMP/XD, OS/400 HABP...). HACMP/XD fits in here.

Tier 6 - Storage mirroring (examples: XRC, PPRC, VTS Peer to Peer)

Tier 5 - Software two-site, two-phase commit (transaction integrity)

Tier 4 - Batch/online database shadowing & journaling, point-in-time disk copy (FlashCopy), TSM-DRM

Tier 3 - Electronic vaulting, TSM**, tape

Tier 2 - PTAM*, hot site, TSM**

Tier 1 - PTAM*

Application bands shown on the chart: applications with low tolerance to outage, applications somewhat tolerant to outage, applications very tolerant to outage

Data recreation ranges shown on the chart: zero or near zero, minutes to hours, up to 24 hours, 24-48 hours

*PTAM = Pickup Truck Access Method with Tape

**TSM = Tivoli Storage Manager

***GDPS = Geographically Dispersed Parallel Sysplex

Best D/R practice is to blend tiers of solutions in order to maximize application coverage at the lowest possible cost. One size, one technology, or one methodology doesn't fit all applications.





HACMP Extended Distance (XD) is an optional component for cross-site geographic disaster recovery

Backup systems may be physically separate from primary operations for protection in the event of power failure, flood, earthquake, etc.

The XD option provides a basket of disaster recovery capabilities and integration points

XD provides multiple options:

IP-based data mirroring (GLVM, HAGEO)

Support for hardware-based data mirroring (Metro-Mirror/PPRC)





HACMP XD: Extended Distance for Disaster Recovery

Data replication between sites ensures a copy of the data is available after a site-wide disaster

Choice of technology depends on distance and performance requirements

Campus-wide: use LVM Split Site Mirroring

[Figure: two sites connected by SAN and LAN / MAN]






Metro-wide: use SVC or ESS/PPRC Mirroring

[Figure: a production site and a recovery site with ServerA through ServerD, linked by routers; the primary ESS/DS replicates to the secondary ESS/DS using PPRC/Metro Mirror or eRCMF, or an SVC at each site uses SVC mirroring]






Unlimited distance: use GLVM Mirroring

A subset of disks is defined as Remote Physical Volumes (RPVs)

[Figure: an LVM mirrored volume group; Mirror 1 and Mirror 2 each show copy 1 and copy 2]

The RPV driver replicates data over the WAN

Both sites always have a complete copy of all mirrors
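The principle behind mirroring to a remote physical volume can be shown with a toy model: every write goes to the local copy and, when the remote site is reachable, to the remote copy as well. This is a conceptual sketch only. The `MirroredVolume` class is hypothetical, and the stale-block tracking and `resync` step are assumptions modelled on general mirror behaviour, not a description of how GLVM or the RPV driver is implemented.

```python
class MirroredVolume:
    """Toy model of a volume with one local copy and one remote copy."""

    def __init__(self):
        self.local = {}                # block number -> data at the local site
        self.remote = {}               # block number -> data at the remote site
        self.remote_reachable = True
        self.stale = set()             # blocks the remote copy has missed

    def write(self, block: int, data: str) -> None:
        self.local[block] = data
        if self.remote_reachable:
            self.remote[block] = data  # replicate over the WAN
        else:
            self.stale.add(block)      # remember what must be resynced

    def resync(self) -> None:
        """Bring the remote copy up to date after the link returns."""
        for block in sorted(self.stale):
            self.remote[block] = self.local[block]
        self.stale.clear()


vol = MirroredVolume()
vol.write(0, "a")
vol.remote_reachable = False           # WAN link drops
vol.write(1, "b")
print(vol.remote)                      # prints {0: 'a'}
vol.remote_reachable = True
vol.resync()
print(vol.remote == vol.local)         # prints True
```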





The new HACMP Geographic Logical Volume Manager (GLVM) is a reliable, easy-to-use data mirroring and failover capability

GLVM provides unlimited-distance IP-based data mirroring

Fully integrated with AIX 5L logical volume management

Easier to use than the existing HAGEO solution: no need to define and manage separate state maps

Long-term replacement for HAGEO

Automatically reverses direction of data replication on failover

Supports all IBM TotalStorage products certified with base HACMP





HACMP XD: HACMP automates the solution

HACMP integrates support for all the replication options

Manages data replication direction, switching and resync after recovery

Recovers locally or moves the entire application to the backup site

Common infrastructure supports all solutions

Choose the one that meets your performance and distance requirements
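The earlier slides pair each distance scope with a replication technology: campus-wide with LVM Split Site Mirroring, metro-wide with SVC or ESS/PPRC mirroring, and unlimited distance with GLVM. As a simple illustration, that mapping can be written as a lookup. The scope keys and the function name are hypothetical, and a real selection would also weigh performance requirements, which this sketch ignores.

```python
# Mapping taken from the campus, metro and unlimited-distance slides.
OPTIONS = {
    "campus": ["LVM Split Site Mirroring"],
    "metro": ["SVC Mirroring", "ESS/PPRC Mirroring"],
    "unlimited": ["GLVM Mirroring"],
}


def replication_options(scope: str) -> list:
    """Return the replication technologies the slides list for a scope."""
    try:
        return list(OPTIONS[scope])
    except KeyError:
        raise ValueError(f"unknown scope: {scope!r}") from None


print(replication_options("metro"))   # prints ['SVC Mirroring', 'ESS/PPRC Mirroring']
```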





Thank You

Questions?????





Backup Slides on Networking





Typical Local HACMP Clustering Configuration

A single network view on a common subnet. Multiple networks can be used.

[Figure: two nodes, each with interfaces en0 and en1, connected through two switches on subnet 10.70.10.x]




HACMP Clustering Across Sites

Different subnets, with routers connected to allow cross-subnet communications

[Figure: one site on subnet 10.70.10.x and one on subnet 10.50.10.x, each with nodes (en0, en1) on two switches, linked by a router at each site]
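The difference between the local and cross-site layouts comes down to whether two interface addresses share a subnet. A quick check with Python's standard `ipaddress` module is sketched below. The /24 prefix length is an assumption read from the "10.70.10.x" notation on the slides, and the host addresses and function name are made up for the example.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int = 24) -> bool:
    """True if addr_b lies in the subnet that contains addr_a."""
    # strict=False lets us build the network from a host address.
    net_a = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in net_a


# Two interfaces at one site share 10.70.10.x: no router is needed.
print(same_subnet("10.70.10.1", "10.70.10.2"))   # prints True
# Interfaces at different sites are on different subnets: traffic is routed.
print(same_subnet("10.70.10.1", "10.50.10.1"))   # prints False
```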
