
2015-01-13 Resiliency (v04)

Date post: 14-Aug-2015
Upload: john-mymryk
Page 1: 2015-01-13 Resiliency (v04)

Resiliency

John Mymryk

December 15, 2014

Page 2: 2015-01-13 Resiliency (v04)


Contents

1) Types of Business Impacts (Outages) & Their Costs
2) Three Foundational Pillars of Business Continuity
3) Problem Statement
4) High Availability & Sustained Resiliency
5) Our Methodology
6) Value Proposition
7) Appendix

Page 3: 2015-01-13 Resiliency (v04)

Types of Business Impacts (Outages) & Their Costs

What are these Outages? (Yearly Combined)

[Chart label: 77%]

Important! Most outages are caused by either hardware failure or human error; only a very small percentage of all outages (5%) are natural disasters.

What are the Costs? Here are some Data Points:

* Lost Labor: $46,000,000 (per 10,000-person company @ 1.6 hr/wk)

* Lost Revenue: $26,500,000,000 (survey across 200 companies)

* Brand Failure: $ ?? (RIM, Blackberry)

* Losses are estimated at $1,200,000,000,000 (1.2 trillion dollars) annually

Revenue lost in a single outage can run into the millions of dollars. An outage may also trigger a brand failure (e.g. the Blackberry RIM outage, ~$100M).

Corporations implement a Business Continuity Program (BCP) to address these types of outages because they directly impact the bottom line.

So, what are the key elements of a BCP?

* Sources: www.symantec.com; www.informationweek.com; www.businesscomputingworld.co.uk; www.evolven.com; www.quorum.net
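As a rough sanity check on the lost-labor data point, the sketch below backs out the hourly labor cost that the figure implies. It assumes the $46,000,000 is an annual total and that a year has 52 working weeks; neither assumption is stated on the slide, and the function name is illustrative only.

```python
def implied_hourly_cost(total_loss, employees, hours_lost_per_week, weeks_per_year=52):
    """Back out the cost per lost labor hour implied by a total loss figure.

    weeks_per_year=52 is an assumption, not a value taken from the slide.
    """
    lost_hours = employees * hours_lost_per_week * weeks_per_year
    return total_loss / lost_hours


# Slide figures: $46,000,000 for a 10,000-person company at 1.6 hr/wk lost.
rate = implied_hourly_cost(46_000_000, 10_000, 1.6)
print(f"Implied cost per lost hour: ${rate:.2f}")  # prints $55.29
```

Under these assumptions the data point corresponds to roughly 832,000 lost hours a year, which gives a feel for the scale behind the headline number.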

Page 4: 2015-01-13 Resiliency (v04)

Three Foundational Pillars of Business Continuity


Resiliency, Recovery, Contingency

Resiliency is a destination

A state where critical business functions and the supporting infrastructure are unaffected by most outages.

Resiliency is the ability of a corporation to move its capability and capacity seamlessly around its environment.

Recovery is a journey

Also known as “Disaster Recovery” or DR, it is complex to maintain and difficult to implement.

DR treats data loss back to a point in time, and production downtime, as acceptable outcomes.

Contingency is a last resort

Establish a generalized capability and readiness to cope with major incidents and disasters, not all of which can be known in advance.

Contingency plans treat data loss and production downtime as acceptable outcomes.

Increasing Resiliency efforts will naturally reduce efforts in Recovery & Contingency.

$ $$ $$

Page 5: 2015-01-13 Resiliency (v04)

Problem Statement

Business Continuity

Resiliency

Recovery

Contingency

Corporations leave a hole in their overall Business Continuity programs by spending considerable dollars on Recovery & Contingency (covering only ~5% of outages) with diminishing returns. This leaves a BCP with built-in downtime and data loss as acceptable outcomes.

Resiliency is proactive in dealing with significant business survival events (hardware failure, human error, power outages, pandemic, natural disaster, social unrest, etc.).

Goal: Reset the BCP Balance

Improve & Increase Resiliency Efforts

Resize Recovery & Contingency Efforts Appropriately

$

$$

Resiliency is a superset of Recovery and Contingency, and it leverages the established processes and procedures used in both.

Page 6: 2015-01-13 Resiliency (v04)

High Availability & Sustained Resiliency

Tier 1 - Business Application

Tier 0 - Load Balancer Infrastructure

Tier 0 - Physical Server Infrastructure

Tier 0 - DB Servers & DB Infrastructure

Tier 0 - Directories (LDAP, AD, ...)

Tier 0 - Storage Infrastructure

Tier 0 - Virtualization Infrastructure

Resilience: Critical business functions and the supporting infrastructure are designed and engineered in such a way that they are materially unaffected by most disruptions, for example through the use of redundancy and spare capacity. There are two (2) methods to do this:

High Availability (HA): Component availability, which can be inter-site or intra-site.

Sustained Resiliency (SR): Moving capacity & capability seamlessly around the physical environment.

Critical Overlaps

Identified HA/SR Gaps

A compounding and/or cascading failure can occur when many HA/SR gaps are concentrated.

The value lies in finding those HA/SR gaps and addressing them.

Benefits of HA/SR

Identifying & Resolving HA/SR Gaps

Reduces

Infrastructure Failures

Unplanned Outages

Tangible Loss

Intangible Loss

Provider Confidence

Regulatory Fines
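The point about concentrated HA/SR gaps can be made concrete with a small sketch. It assumes a simple inventory mapping each Tier 1 application to the Tier 0 components it depends on, plus a set of components with a known gap. The application names, component names, and the reading of "critical overlap" as "a gapped component shared by more than one application" are all illustrative assumptions, not taken from the slide.

```python
from collections import Counter

# Hypothetical inventory: Tier 0 components each Tier 1 application depends on.
dependencies = {
    "billing": ["load_balancer", "db", "storage", "directory"],
    "portal": ["load_balancer", "virtualization", "directory"],
    "reporting": ["db", "storage"],
}

# Hypothetical assessment result: Tier 0 components with a known HA/SR gap.
components_with_gaps = {"db", "storage", "directory"}


def gap_concentration(dependencies, components_with_gaps):
    """Count, per application, how many of its dependencies have a gap."""
    return {
        app: sum(1 for comp in comps if comp in components_with_gaps)
        for app, comps in dependencies.items()
    }


def shared_gaps(dependencies, components_with_gaps):
    """Return gapped components that more than one application relies on."""
    usage = Counter(comp for comps in dependencies.values() for comp in set(comps))
    return sorted(c for c, n in usage.items() if n > 1 and c in components_with_gaps)


print(gap_concentration(dependencies, components_with_gaps))
print(shared_gaps(dependencies, components_with_gaps))
```

Applications with the highest counts, and gapped components shared across applications, would be natural first candidates for remediation under this reading.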

Page 7: 2015-01-13 Resiliency (v04)

Our Resiliency Methodology


Develop Test

Schedule Test

Submit Events

HA/SR Testing

Feedback & Improvement

Validate

Gap Exposure Risk, Value Assessment

Application Testing Capability

Gap Remediation

Investigate

Applications Submitted for Assessment or Review

Perform Assessment and Onboarding

Develop Test Requirements and Objectives

Assess


Any application (infrastructure, service, application or utility) that executes all 11 steps of the Resiliency Methodology would be considered mature in its Resiliency profile and, by extension, able to endure business-impacting (outage) events.
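The maturity rule above (all 11 steps executed) can be expressed as a simple completeness check. This is a minimal sketch that tracks steps by number only, since the transcript does not preserve which label belongs to which step number; the function names are illustrative.

```python
# Steps are identified by number only (1..11); the step-to-label mapping
# is not recoverable from the transcript, so none is assumed here.
REQUIRED_STEPS = frozenset(range(1, 12))


def missing_steps(completed):
    """Return the methodology steps an application has not yet executed."""
    return sorted(REQUIRED_STEPS - set(completed))


def is_mature(completed):
    """An application is considered mature once all 11 steps are executed."""
    return not missing_steps(completed)


print(is_mature(range(1, 12)))      # True
print(missing_steps([1, 2, 3, 5]))  # [4, 6, 7, 8, 9, 10, 11]
```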

Maturing

Page 8: 2015-01-13 Resiliency (v04)

The Value – A Proven Resiliency Program


Lowered IT Effort

Meet SLAs (SR)

Lower Outages (HA)

Contingency Planning

Disaster Recovery

Resiliency

Most Corporations’ Current State

Implement Resiliency Methodology

Meets most audit requirements.

Has great planning, but limited impact on improving the production environment’s ability to withstand outages.

Putting more effort into Disaster Recovery & Contingency Planning yields diminishing returns.

By focusing more on Resiliency, IT teams can reduce effort and cost, because DR and CP goals are met through the Resiliency implementation.

End result is a resilient application infrastructure.

Page 9: 2015-01-13 Resiliency (v04)


Appendix

• Anatomy of a Recovery Event

Page 10: 2015-01-13 Resiliency (v04)

RTO, RPO, RTC, TTTR & BTTR (Visually) – Anatomy of a Recovery Event

[Figure: Timeline of Incident/Outage. Labels recovered from the diagram are listed below, in the order they appear in the transcript.]

Measures: RPO; Business Time To Resume (BTTR); Recovery Time Objective (RTO); Technology Time To Recover (TTTR); Return To Capacity (RTC)

Timeline markers: Business Decision Window; Business Resumption; Take Action; Infra Ready; Data Available; Fixed (if applies); Incident Start

An Application’s Data: Area of Potential Data Loss; No Data; Good Data; Inconsistent Data; Rebuilding Data; Good Data

Recovery activities: Application Recovery; Business Recovery; Infrastructure Recovery; People / Facilities Recovery

Outage periods: Time To Fix; All Hands Fix; Fix It Outage Time; Recovery Outage Time
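To show how the RPO and RTO labels on this timeline are typically used, the sketch below compares a recovery event against both objectives. It relies on the common definitions (RPO as the maximum tolerable window of lost data, RTO as the target time to restore service) and measures both from the incident start; the diagram itself may draw these boundaries differently. All timestamps, targets, and names are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_report(last_good_data, incident_start, business_resumption, rpo, rto):
    """Compare one recovery event against its RPO and RTO targets."""
    data_loss_window = incident_start - last_good_data
    downtime = business_resumption - incident_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical event: last good data 30 minutes before the incident,
# business resumed 5 hours after it, against a 1 h RPO and a 4 h RTO.
report = recovery_report(
    last_good_data=datetime(2015, 1, 13, 8, 30),
    incident_start=datetime(2015, 1, 13, 9, 0),
    business_resumption=datetime(2015, 1, 13, 14, 0),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(report["rpo_met"], report["rto_met"])  # True False
```

In this made-up event the data loss stays within the RPO, but the five-hour downtime misses the four-hour RTO.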

