HACMP-and-XDIntro

Date post: 09-Apr-2018
Upload: sanjay-jain

Transcript
  • 8/7/2019 HACMP-and-XDIntro

    1/37

    IBM System p5 and eServer p5

    2006 IBM Corporation

    Introduction to High Availability Cluster Multi-Processing (HACMP)
    and HACMP Extended Distance (HACMP-XD)

    Shawn Bodily
    ATS HACMP Specialist
    IBM

  • 8/7/2019 HACMP-and-XDIntro

    2/37

    Although hardware is now very reliable, hardware failures account for a small minority of system outages

    Several studies place the proportion between 20% and 45%

    Human error, software error and planned maintenance cause the majority of service outages

  • 8/7/2019 HACMP-and-XDIntro

    3/37

    Downtime and poor performance are expensive both financially and in terms of customer perceptions

    Overall downtime costs average 3.6% of annual revenue (Infonetics)
        Many studies estimate average cost of downtime at over $5,000/hour
        Popular Web sites estimate cost of downtime at millions of dollars
        A 22-hour crash in June 2003 cost eBay an estimated $5M

    Losses go beyond immediate sales revenue
        To clients, availability equates to reliability and trustworthiness
        Internal application failures prevent employees from working

  • 8/7/2019 HACMP-and-XDIntro

    4/37

    HACMP - Proven Technology for Business

    Mature product now in its 17th major release
    Averaging 40,000 licenses sold world-wide annually
    Built on a decade of IBM cluster leadership

    HACMP allows you to create highly available environments with minimal hardware.

    HACMP is scalable up to 32 nodes, allowing your cluster to adapt to the growing demands of your business.

    The optional XD feature allows your clusters to span unlimited geographic distances.

  • 8/7/2019 HACMP-and-XDIntro

    5/37

    HACMP is NOT the right solution if:

    Your environment is not secure
    Network security is not in place
    Change management procedures are not respected
    You do not have a trained administrator
    Environment is prone to user fiddle faddle
    Application requires manual intervention

    HACMP will never be an out-of-the-box solution to availability. A certain degree of skill will always be required.

  • 8/7/2019 HACMP-and-XDIntro

    6/37

    Reducing both Planned and Unplanned downtime

    Unplanned Outage
        System Failure
            Hardware
            Operating System Crash
            Power Loss
            User Error
        Component Failure
            NIC
            SCSI/SAN Adapter
            Network Hub/Switch
            SAN Switch
            Disk Failure (both O/S and application data)

    Planned Outage
        Maintenance
            System Hardware Change/Upgrade
            OS & Application Upgrades & Fixes
        Testing
            Applied Fixes
            Failure scenarios for HA & DR

  • 8/7/2019 HACMP-and-XDIntro

    7/37

    HACMP protects against service outages by detecting problems and quickly failing over to backup hardware

    Two nodes (A and B)
    Two networks
        Private (internal) network
        Public (shared) network
    Shared disk
        All data in shared storage available to both nodes
    Critical applications
        Database server
        Web server
            Dependent on DB

    [Diagram: pSeries nodes A and B joined by the private network and the company shared network, both attached to the shared disk, running the Web Srv and Database applications]

  • 8/7/2019 HACMP-and-XDIntro

    8/37

    Example Failure #1: Node failure

    Node A fails completely
    Node B detects the loss of Node A
    Node B starts up its own instance of the Database.
    Database is temporarily taken over by Node B until Node A is brought back online

    [Diagram: same two-node cluster, with Node A marked as failed]

  • 8/7/2019 HACMP-and-XDIntro

    9/37

    Example Failure #2: Loss of network connection

    Node A loses a NIC
    Because of NIC redundancy, the service IP swaps locally
    Operations continue normally while problem is resolved
    If total public network connectivity was lost, a fallover could occur

    [Diagram: same two-node cluster, with a failed network adapter on Node A]
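
The two example failures can be read as one decision rule. The sketch below is only an illustration of that rule, not HACMP's own logic: the `NodeState` fields, the `choose_action` function and the returned strings are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of what a cluster manager knows about one node."""
    name: str
    alive: bool               # False once the node is considered failed
    working_public_nics: int  # public-network adapters still usable


def choose_action(owner: NodeState, standby: NodeState) -> str:
    """Return the recovery action for the resources currently hosted on `owner`."""
    standby_usable = standby.alive and standby.working_public_nics >= 1
    if not owner.alive:
        # Example Failure #1: the whole node is gone, so the standby takes over.
        return f"takeover by {standby.name}" if standby_usable else "no node available"
    if owner.working_public_nics >= 1:
        # Example Failure #2: a redundant NIC remains, so the service IP
        # stays on (or swaps locally within) the owning node.
        return f"continue on {owner.name}"
    # Total public connectivity lost on the owner: a fallover could occur.
    return f"fallover to {standby.name}" if standby_usable else "no node available"


print(choose_action(NodeState("A", True, 1), NodeState("B", True, 2)))
```

The point of the ordering is that the cheapest recovery (a local adapter swap) is tried before moving the application to another node.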

  • 8/7/2019 HACMP-and-XDIntro

    10/37

    Failover possibilities

    One to one
    One to any
    Any to one
    Any to any

  • 8/7/2019 HACMP-and-XDIntro

    11/37

    Custom Resource Groups

    Startup Preferences
        Online On Home Node Only (cascading) - (OHNO)
        Online On First Available Node (rotating or cascading w/inactive takeover) - (OFAN)
        Online On All Available Nodes (concurrent) - (OAAN)
        Startup Distribution

    Fallover Preferences
        Fallover To Next Priority Node In The List - (FOHP)
        Fallover Using Dynamic Node Priority - (FDNP)
        Bring Offline (On Error Node Only) - (BOEN)

    Fallback Preferences
        Fallback To Higher Priority Node - (FBHP)
        Never Fallback - (NFB)
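
To make the fallover and fallback preferences concrete, here is a minimal sketch of how a priority-ordered node list could drive them. It is an assumption-laden model, not HACMP code: `ResourceGroup`, `fallover_target` and `should_fall_back` are hypothetical names, and "next priority node" is interpreted here as the highest-priority node that is still up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Fallback(Enum):
    FBHP = "Fallback To Higher Priority Node"
    NFB = "Never Fallback"


@dataclass
class ResourceGroup:
    name: str
    node_priority: List[str]  # highest priority (home node) first
    fallback: Fallback


def fallover_target(rg: ResourceGroup, failed_node: str, up_nodes: Set[str]) -> Optional[str]:
    """Pick the highest-priority node that is up and is not the failed node."""
    for node in rg.node_priority:
        if node != failed_node and node in up_nodes:
            return node
    return None  # nowhere to go


def should_fall_back(rg: ResourceGroup, current_node: str, rejoining_node: str) -> bool:
    """With FBHP, move back when a higher-priority node rejoins; with NFB, never."""
    if rg.fallback is Fallback.NFB:
        return False
    order = rg.node_priority
    return order.index(rejoining_node) < order.index(current_node)


rg = ResourceGroup("db_rg", ["nodeA", "nodeB", "nodeC"], Fallback.FBHP)
print(fallover_target(rg, "nodeA", {"nodeB", "nodeC"}))
print(should_fall_back(rg, "nodeB", "nodeA"))
```

Choosing Never Fallback avoids a second, planned interruption when the home node returns, at the cost of leaving the group on a lower-priority node.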

  • 8/7/2019 HACMP-and-XDIntro

    12/37

    Common Resources to make highly available

    Service IP Address(es)
        The IP addresses that users/client apps will use for production
        This can be one or multiple addresses
        Not limited to the number of interfaces when utilizing aliasing

    Application (Server)
        Application(s) to be controlled/protected by HACMP
        In many cases this can be a user-provided start/stop script
        May take advantage of pre-packaged application Smart Assists

    Shared Storage
        Volume Groups
        Logical Volumes
        JFS
        NFS

  • 8/7/2019 HACMP-and-XDIntro

    13/37

    Additional Granular Options

    Resource Group Dependencies
        Parent/Child Relationships
            Great for multi-tier environments
        Location Dependencies
            Online on Same Node
                All resource groups must be online on the same node
            Online on Different Nodes
                All resource groups must be online on different nodes
            Online on Same Site
                All resource groups must be online on the same site
        Define Resource Group Priorities (Different Node Dep.)
            Low
            Intermediate
            High
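
A parent/child relationship implies an ordering: a child resource group comes online only after its parent. The sketch below shows one way to derive such an ordering for a multi-tier stack with a topological sort from the Python standard library. The resource group names and the `start_order`/`stop_order` helpers are hypothetical, and stopping in the reverse order is an assumption of this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def start_order(parents_of: Dict[str, Set[str]]) -> List[str]:
    """Order resource groups so every parent comes before its children.

    `parents_of` maps each child resource group to the set of its parents.
    """
    return list(TopologicalSorter(parents_of).static_order())


def stop_order(parents_of: Dict[str, Set[str]]) -> List[str]:
    """Assumed here: release children before parents (reverse of startup)."""
    return list(reversed(start_order(parents_of)))


# Hypothetical three-tier stack: web depends on app, app depends on db.
deps = {"app_rg": {"db_rg"}, "web_rg": {"app_rg"}}
print(start_order(deps))
print(stop_order(deps))
```

A cycle in the dependencies makes `static_order()` raise `graphlib.CycleError`, which is a convenient way to reject an invalid configuration early.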

  • 8/7/2019 HACMP-and-XDIntro

    14/37

    Application Monitoring

    HACMP can monitor applications in one of two ways:
        Process Monitor: determines the death of a process
        Custom Monitor: monitors health of the application using a monitor method you provide

    Decisions upon failure
        Restart: you can set a number of local restart attempts. If the application continues to fail after the specified restart count, you can escalate to a fallover.
        Notify: send email notification
        Fallover: move application and associated resource group to next candidate node

    Suspend/Resume Application Monitoring at any time.
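
The restart-then-escalate behaviour can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the product's implementation: `MonitorPolicy`, `AppMonitor` and the action strings are hypothetical, and the restart counter is never reset in this sketch.

```python
from dataclasses import dataclass


@dataclass
class MonitorPolicy:
    restart_count: int                # local restarts allowed before escalating
    failure_action: str = "fallover"  # what to do afterwards: "fallover" or "notify"


class AppMonitor:
    """Tracks failed health checks and decides what to do about each one."""

    def __init__(self, policy: MonitorPolicy) -> None:
        self.policy = policy
        self.restarts = 0

    def on_check(self, healthy: bool) -> str:
        if healthy:
            return "ok"
        if self.restarts < self.policy.restart_count:
            self.restarts += 1
            return "restart"               # try to recover on the same node
        return self.policy.failure_action  # restart budget used up: escalate


monitor = AppMonitor(MonitorPolicy(restart_count=2))
print([monitor.on_check(False) for _ in range(3)])
```

The `healthy` flag stands in for either kind of monitor: a process check or the result of a custom monitor method.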

  • 8/7/2019 HACMP-and-XDIntro

    15/37

    DLPAR/CUoD configuration

    [Diagram: a production database server and a DLPAR/CUoD server (running applications on active processors, with additional inactive processors) hosting Web Server and Order Entry partitions; both run HACMP and attach to a shared disk]

    HACMP on the primary machine detects the failure

    Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors and restarts the application

  • 8/7/2019 HACMP-and-XDIntro

    16/37

    Recent HACMP releases greatly improve ease of use

    Enhancements include:
        Configuration wizard for typical two-node cluster
        Automatic detection and configuration of IP networks
        Online Planning Worksheet guides you through configuration
        Simplified Web-based interface for management and monitoring

    [Screenshot: Online Planning Worksheets for Resource Groups]

  • 8/7/2019 HACMP-and-XDIntro

    17/37

    With HACMP V5.x, you can configure a cluster in just five questions

    1. What is the address of the backup node?
    2. What is the name of the application?
    3. What script should HACMP use to start it?
    4. What script should HACMP use to stop it?
    5. What is the service IP label that clients will use to access the application?
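
As a planning aid, the five answers can be collected in one record and checked for gaps before starting the configuration. This is only a sketch: the `TwoNodeAnswers` class, its field names and every sample value below are hypothetical and are not part of any HACMP interface.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TwoNodeAnswers:
    """One field per question asked on the slide."""
    backup_node_address: str
    application_name: str
    start_script: str
    stop_script: str
    service_ip_label: str

    def missing(self) -> List[str]:
        """Names of the questions that have not been answered yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical sample values, for illustration only.
answers = TwoNodeAnswers(
    backup_node_address="nodeb.example.com",
    application_name="orderdb",
    start_script="/usr/local/ha/start_orderdb.sh",
    stop_script="",
    service_ip_label="orderdb_svc",
)
print(answers.missing())
```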

  • 8/7/2019 HACMP-and-XDIntro

    18/37


  • 8/7/2019 HACMP-and-XDIntro

    19/37


  • 8/7/2019 HACMP-and-XDIntro

    20/37

    WebSMIT Overview Demo

  • 8/7/2019 HACMP-and-XDIntro

    21/37

    HACMP Cluster Test Tool

    The Cluster Test Tool reduces implementation costs by simplifying validation of cluster functionality.

    It reduces support costs by automating testing of an HACMP cluster to ensure correct behavior in the event of a real cluster failure.

    The Cluster Test Tool executes a test plan, which consists of a series of individual tests.

    Tests are carried out in sequence and the results are analyzed by the test tool.

    Administrators may define a custom test plan or use the automated test procedure.

    Test results and other important data are collected in the test tool's log file.
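
The idea of a test plan run in sequence with results collected in a log can be sketched generically. This is not the Cluster Test Tool or its plan format: `run_test_plan`, the logger name and the placeholder tests are hypothetical, and each "test" here is just a Python callable returning True or False.

```python
import logging
from typing import Callable, Dict, List, Tuple

TestPlan = List[Tuple[str, Callable[[], bool]]]


def run_test_plan(plan: TestPlan, log: logging.Logger) -> Dict[str, bool]:
    """Run each test in order, log its outcome, and return name -> passed."""
    results: Dict[str, bool] = {}
    for name, test in plan:
        try:
            passed = bool(test())
        except Exception as exc:  # a crashing test counts as a failure
            log.error("%s raised %r", name, exc)
            passed = False
        results[name] = passed
        log.info("%s: %s", name, "PASS" if passed else "FAIL")
    return results


logging.basicConfig(level=logging.INFO)
plan: TestPlan = [
    ("node_down_graceful", lambda: True),  # placeholder checks only
    ("network_down_local", lambda: False),
]
print(run_test_plan(plan, logging.getLogger("cluster_test_sketch")))
```

Running the tests strictly in sequence matters because each failure scenario changes cluster state that the next test starts from.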


  • 8/7/2019 HACMP-and-XDIntro

    22/37

    New features make HACMP V5.X easier to use and more flexible

    Automatic detection and correction of common cluster configuration problems
    Enhanced support for complex multi-tier applications, relationships and dependencies
    Clusters can be configured with simple ASCII files
    Parallel resource processing recovers applications faster
    Simpler, more flexible configuration and management
    New Smart-Assists simplify HACMP implementation in DB2, Oracle and WebSphere environments
        Inexpensive option includes all three Smart-Assists

  • 8/7/2019 HACMP-and-XDIntro

    23/37

    HACMP with Oracle 10g fallover Demo

    (1) p52A
    (1) p505
    (1) HMC
    HACMP 5.4
    AIX 5.3 TL5
    Oracle 10g
    DS4300
    LPARMon (http://www.alphaworks.ibm.com/tech/lparmon)
    Swingbench (http://www.dominicgiles.com/swingbench.html)
    Web-based System Manager

    The cluster shown was actually created using the two-node configuration assistant within HACMP.

  • 8/7/2019 HACMP-and-XDIntro

    24/37

    HACMP Extended Distance (HACMP-XD)

  • 8/7/2019 HACMP-and-XDIntro

    25/37

    HA/DR is a balance of recovery time requirements and cost

    Do you really need HA or DR?

    What is the target recovery time?
        Minutes? Hours? Days?

    Costs associated with implementing and maintaining an HA or DR solution
        Redundant hardware
        Inter-site networking
        Operations staff

  • 8/7/2019 HACMP-and-XDIntro

    26/37

    Tiers of Disaster Recovery: Level Setting HACMP/XD

    [Chart: value plotted against recovery time (15 min., 1-4 hr., 4-8 hr., 8-12 hr., 12-16 hr., 24 hr., days); tiers based on SHARE definitions]

    Tier 7 - Highly automated, business wide, integrated solution (Example: GDPS***/PPRC/VTS P2P, AIX HACMP/XD, OS/400 HABP....) [HACMP/XD fits in here]
    Tier 6 - Storage mirroring (example: XRC, PPRC, VTS Peer to Peer)
    Tier 5 - Software two site, two phase commit (transaction integrity)
    Tier 4 - Batch/Online database shadowing & journaling, Point in Time disk copy (FlashCopy), TSM-DRM
    Tier 3 - Electronic Vaulting, TSM**, Tape
    Tier 2 - PTAM, Hot Site, TSM**
    Tier 1 - PTAM*

    Application classes shown on the chart:
        Applications with low tolerance to outage
        Applications somewhat tolerant to outage
        Applications very tolerant to outage

    Data recreation bands shown on the chart:
        Zero or near zero data recreation
        Minutes to hours data recreation
        Up to 24 hours data recreation
        24-48 hours data recreation

    *PTAM = Pickup Truck Access Method with Tape
    **TSM = Tivoli Storage Manager
    ***GDPS = Geographically Dispersed Parallel Sysplex

    Best D/R practice is to blend tiers of solutions in order to maximize application coverage at lowest possible cost. One size, one technology, or one methodology doesn't fit all applications.

  • 8/7/2019 HACMP-and-XDIntro

    27/37


  • 8/7/2019 HACMP-and-XDIntro

    28/37

    HACMP Extended Distance (XD) is an optional component for cross-site geographic disaster recovery

    Backup systems may be physically separate from primary operations for protection in the event of power failure, flood, earthquake etc.

    The XD option provides a basket of disaster recovery capabilities and integration points

    XD provides multiple options:
        IP-based data mirroring (GLVM, HAGEO)
        Support for hardware-based data mirroring (Metro-Mirror/PPRC)

  • 8/7/2019 HACMP-and-XDIntro

    29/37

    HACMP XD Extended Distance for Disaster Recovery

    Data replication between sites ensures a copy of the data is available after a site-wide disaster

    Choice of technology depends on distance and performance requirements

    Campus-wide: use LVM Split Site Mirroring

    [Diagram: two sites connected by a SAN and a LAN / MAN]

  • 8/7/2019 HACMP-and-XDIntro

    30/37

    HACMP XD Extended Distance for Disaster Recovery

    Metro-wide: use SVC or ESS/PPRC Mirroring

    [Diagram: Production Site (ServerA, ServerB, router, primary ESS/DS, SVC) and Recovery Site (ServerC, ServerD, router, secondary ESS/DS, SVC), linked by PPRC/Metro Mirror or eRCMF and by SVC mirroring]

  • 8/7/2019 HACMP-and-XDIntro

    31/37

    HACMP XD Extended Distance for Disaster Recovery

    Unlimited distance: use GLVM Mirroring

    Subset of disks are defined as Remote Physical Volumes, or RPVs

    RPV driver replicates data over the WAN

    [Diagram: an LVM mirrored volume group in which Mirror 1 and Mirror 2 each hold copy 1 and copy 2, spread across the two sites]

    Both sites always have a complete copy of all mirrors
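
The three "use" recommendations on these slides amount to a lookup from distance class to replication technology. The sketch below records that mapping; the dictionary, the scope keys and the `replication_options` helper are hypothetical names chosen for this example, and real selection would also weigh the performance requirements the slides mention.

```python
from typing import Dict, List

# Distance class -> technologies named on the slides.
REPLICATION_BY_SCOPE: Dict[str, List[str]] = {
    "campus": ["LVM split-site mirroring"],
    "metro": ["SVC mirroring", "ESS/PPRC (Metro Mirror)"],
    "unlimited": ["GLVM mirroring"],
}


def replication_options(scope: str) -> List[str]:
    """Return the candidate technologies for a distance class."""
    key = scope.strip().lower()
    if key not in REPLICATION_BY_SCOPE:
        raise ValueError(f"unknown distance class: {scope!r}")
    return REPLICATION_BY_SCOPE[key]


print(replication_options("Metro"))
```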


  • 8/7/2019 HACMP-and-XDIntro

    32/37

    New HACMP Geographic Logical Volume Manager is a reliable, easy-to-use data mirror and failover capability

    GLVM provides unlimited-distance IP-based data mirroring

    Fully integrated with AIX 5L logical volume management

    Easier to use than existing HAGEO solution
        No need to define and manage separate state maps
        Long-term replacement for HAGEO

    Automatically reverses direction of data replication on failover

    Supports all IBM TotalStorage products certified with base HACMP

  • 8/7/2019 HACMP-and-XDIntro

    33/37

    HACMP XD: HACMP automates the solution

    HACMP integrates support for all the replication options
        Manages data replication direction, switching and resync after recovery
        Recovers locally or moves entire application to backup site

    Common infrastructure supports all solutions
        Choose the one that meets your performance and distance requirements

  • 8/7/2019 HACMP-and-XDIntro

    34/37

    Thank You

    Questions?

  • 8/7/2019 HACMP-and-XDIntro

    35/37

    Backup Slides on Networking

  • 8/7/2019 HACMP-and-XDIntro

    36/37

    Typical Local HACMP Clustering Configuration

    A single network view on a common subnet. Multiple networks can be used.

    [Diagram: en0/en1 adapter pairs connected through two switches on subnet 10.70.10.x]

  • 8/7/2019 HACMP-and-XDIntro

    37/37

    HACMP Clustering Across Sites

    Different subnets, with routers connected to allow cross-subnet communications

    [Diagram: en0/en1 adapter pairs and two switches at each site, one site on subnet 10.70.10.x and the other on 10.50.10.x, linked by a router at each site]

