John Graham – STRATEGIC Information Group Steve Lamb - QAD Disaster Recovery Planning MMUG Spring...

Post on 23-Dec-2015


MMUG Cleveland OH

John Graham – STRATEGIC Information Group

Steve Lamb - QAD

Disaster Recovery Planning

MMUG Spring 2013

March 19, 2013 Cleveland, OH


Menu

Statistics

Definitions

Example high level tasks

Questions

Shocking Statistics

• 43% of companies experiencing disasters never re-open.

• 29% close within two years.

Source - McGladrey and Pullen

More Statistics

93% of businesses that lost their data center for 10 days went bankrupt within one year.

Source - National Archives & Records Administration

Shocking Statistics

40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours.

Source - Gartner

Definitions

• Business Continuity
• Disaster Recovery
• Business Recovery
• High Availability
• Redundancy
• Coldsite
• Warmsite
• Hotsite
• Mission Critical
• RPO
• RTO

Business Continuity

• (BC): Planning to ensure the continuity of business critical functions in the event of a major unplanned service failure or disaster

• Includes key aspects such as personnel, facilities, crisis communication, project management and change control. A BC strategy includes a Disaster Recovery Plan (DRP) for IT-related infrastructure recovery.

• An all-encompassing term covering both disaster recovery planning and business resumption planning.

Disaster Recovery

• (DR): Part of a larger Business Continuity plan that includes processes and solutions to restore business critical applications, data, hardware, communications (such as networking) and other IT infrastructure.

• Can also include measures to protect against other unplanned events such as the failure of an individual server or shorter service interruptions

Business Recovery

• The common critical path that all companies follow during a recovery effort. There are major nodes along the path which are followed regardless of the organization. The process includes:
  – Immediate response
  – Environmental restoration or relocation
  – Functional restoration
  – Data recovery and synchronization
  – Restore business functions
  – Return to normal

High Availability

• (HA): A system or component that is continuously operational for a desirably long length of time.

• Usually includes redundant local systems

Redundancy

Systematically using multiple sources, devices or connections to eliminate single points of failure that could completely stop the flow of information.
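The effect of removing a single point of failure can be estimated with basic availability arithmetic. The sketch below is illustrative only: the function names and the availability figures are hypothetical, and it assumes components fail independently, which real systems only approximate.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of a redundant group: up if at least one member is up.

    Assumes members fail independently of each other.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


def serial_availability(availabilities):
    """Availability of a chain: every element must be up,
    so each element is a single point of failure."""
    return prod(availabilities)


# Hypothetical figures, for illustration only.
single_link = 0.99
redundant_links = parallel_availability([single_link, single_link])

# A server (0.999) that depends on the redundant network pair.
chain = serial_availability([0.999, redundant_links])

print(f"one link:        {single_link:.4f}")
print(f"two links:       {redundant_links:.4f}")
print(f"server + links:  {chain:.4f}")
```

Under these assumptions, doubling a 99% link raises that layer to roughly 99.99%, but the chain as a whole is still limited by its least available non-redundant element.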

Mission Critical Systems

Systems or applications that are essential to the functioning of your business and its processes.

Recovery Point Objective

(RPO): The maximum age of the files that must be recovered from backup for normal operations to resume after a failure. In effect, it is the amount of recent data, measured in time, that the business can tolerate losing.

Recovery Time Objective

(RTO): The maximum tolerable length of time that a computer, system, network or application can be down after a failure or disaster occurs.
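To make the two objectives concrete, the following sketch compares a hypothetical backup schedule and an assumed restore time against stated targets. The function names, timestamps, and the 4-hour and 8-hour targets are assumptions made for illustration, not figures from this presentation.

```python
from datetime import datetime, timedelta


def achieved_rpo(failure_time, backup_times):
    """Return the age of the newest backup taken at or before the failure.

    This is the window of work that would be lost.
    Returns None if no usable backup exists.
    """
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        return None
    return failure_time - max(usable)


def meets_objective(actual, objective):
    """True if a measured duration is within the stated objective."""
    return actual is not None and actual <= objective


# Hypothetical scenario: nightly backups at 01:00, failure mid-afternoon.
backups = [
    datetime(2013, 3, 17, 1, 0),
    datetime(2013, 3, 18, 1, 0),
    datetime(2013, 3, 19, 1, 0),
]
failure = datetime(2013, 3, 19, 15, 30)

data_loss = achieved_rpo(failure, backups)   # time since the last backup
downtime = timedelta(hours=6)                # assumed time to restore service

rpo_ok = meets_objective(data_loss, timedelta(hours=4))  # hypothetical RPO
rto_ok = meets_objective(downtime, timedelta(hours=8))   # hypothetical RTO

print(f"data loss window: {data_loss}, within RPO: {rpo_ok}")
print(f"downtime:         {downtime}, within RTO: {rto_ok}")
```

In this scenario the restore finishes inside the RTO, but nightly backups alone cannot satisfy a 4-hour RPO, which is the kind of gap that replication or more frequent backups are meant to close.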

Coldsite

• An alternate facility that is void of any resources or equipment except air-conditioning, raised flooring and power.

• Equipment and resources must be installed in such a facility to duplicate the critical business functions of an organization.

Warmsite

• An alternate processing site which is only partially equipped

• By comparison, a hotsite is fully equipped

Hotsite

• A DR facility fully equipped with the equipment, network connections and environmental conditions necessary for restoring your data and getting your systems up and running instantly.

• Coldsites and warmsites, by contrast, are not ready to go in an instant.

Availability vs. DR

• There's a huge difference between disaster prevention (Availability) and disaster recovery.

• Both are necessary: the former only reduces the risk of downtime, while the latter ensures quick recovery when downtime does occur.

High Availability

DATA CENTER LEVEL
• Utility Power and UPS
• Generator
• Core Networking
• Security
• HVAC
• Fire Prevention
• Monitoring

CONFIGURATION LEVEL
• Power
• Network Devices
• Security Devices
• Server Clustering
• Storage
• Encryption & Policies

DISASTER RECOVERY

• Backup & Retention
• Data Replication
• Application Recovery
• Server Recovery
• DR Assessment
• DR Plan Testing

High Level Tasks

• Perform Needs Analysis/Discovery
• Identify Requirements Based on Analysis
• Identify Recovery Time Objectives
• Identify Recovery Point Objectives
• Perform Initial Design
• Review Initial Design
• Implement Design
• Test Plan

Identify Requirements Based on Analysis

• Facilities
• Hardware
• System / Applications

Identify Recovery Time Objectives

• Per System / Application

Identify Recovery Point Objectives

• Per System / Application

Example RTO/RPO Analysis

Mission Critical Systems

Application       Preferred RTO  Preferred RPO  Compromised RTO  Compromised RPO
QAD               4              0              8                Last Backup
Gentran           4              0              8                Last Backup
Active Directory  4              0              8                Last Backup
Exchange          4              0              8                Last Backup
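The RTO/RPO analysis above can also be kept as data, so that test results are graded against it consistently. The sketch below is one possible encoding: the class and function names are hypothetical, the slide does not state a unit (hours are assumed here), and "Last Backup" is modeled as a sentinel meaning that any loss back to the most recent backup is accepted.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel for "Last Backup": accept whatever the last backup holds.
LAST_BACKUP = None


@dataclass(frozen=True)
class Objectives:
    preferred_rto: float
    preferred_rpo: Optional[float]
    compromised_rto: float
    compromised_rpo: Optional[float]


# Values copied from the table; the unit (assumed: hours) is not on the slide.
TARGETS = {
    "QAD": Objectives(4, 0, 8, LAST_BACKUP),
    "Gentran": Objectives(4, 0, 8, LAST_BACKUP),
    "Active Directory": Objectives(4, 0, 8, LAST_BACKUP),
    "Exchange": Objectives(4, 0, 8, LAST_BACKUP),
}


def classify(app, downtime, data_loss):
    """Grade a measured or estimated recovery for one application.

    Returns 'preferred', 'compromised', or 'missed'.
    """
    target = TARGETS[app]

    def within(value, limit):
        # A limit of LAST_BACKUP accepts any loss back to the last backup.
        return limit is LAST_BACKUP or value <= limit

    if downtime <= target.preferred_rto and within(data_loss, target.preferred_rpo):
        return "preferred"
    if downtime <= target.compromised_rto and within(data_loss, target.compromised_rpo):
        return "compromised"
    return "missed"


# Hypothetical DR test results: (downtime, data loss), in the assumed hours.
print(classify("QAD", 3, 0))        # restored quickly with no data loss
print(classify("Gentran", 6, 12))   # slower, restored from last backup
print(classify("Exchange", 10, 0))  # exceeded even the compromised RTO
```

Recording results this way makes it easy to see, per system, whether a DR test met the preferred targets or only the fallback ones.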

Initial Design

• Identify disaster recovery project members
• Perform Risk Analysis
• Define high-level recovery strategy
• Define costs associated with strategy

Review Initial Design

• Justify costs to risk analysis
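One common way to weigh a strategy's cost against the risk analysis is annualized loss expectancy: the expected loss per incident multiplied by the expected number of incidents per year. This is not necessarily the method the presenters used, and every figure and name below is hypothetical; it is a minimal sketch of the comparison.

```python
def annualized_loss(single_loss, annual_rate):
    """Expected yearly loss: loss per incident times incidents per year."""
    return single_loss * annual_rate


def net_annual_benefit(loss_without, loss_with, annual_cost):
    """Yearly risk reduction from a strategy, minus what the strategy costs."""
    return (loss_without - loss_with) - annual_cost


# Hypothetical inputs, for illustration only.
rate = 0.1                      # one qualifying outage every ten years
loss_no_plan = annualized_loss(500_000, rate)    # long outage, no recovery site
loss_with_plan = annualized_loss(100_000, rate)  # shorter outage with a warm site
strategy_cost = 25_000          # assumed yearly cost of the recovery strategy

benefit = net_annual_benefit(loss_no_plan, loss_with_plan, strategy_cost)
print(f"expected yearly loss without plan: {loss_no_plan:,.0f}")
print(f"expected yearly loss with plan:    {loss_with_plan:,.0f}")
print(f"net yearly benefit of strategy:    {benefit:,.0f}")
```

A positive result suggests the strategy pays for itself on expected value alone; a negative one means the design needs to be justified on other grounds or scaled back.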

Detailed Design

• Architect systems & recovery.
• Create detailed project plan.

Implement Design

• Procure facilities, equipment, and software.
• Build recovery site.

Test Plan

• Test Plan

Summary

• Potential ramifications of a disaster without a plan

• Definitions of a few important terms

• Key tasks to develop and implement a plan

Questions