Date post: | 14-Aug-2015 |
Upload: | john-mymryk |
Resiliency
John Mymryk
December 15, 2014
Contents
1) Types of Business Impacts (Outages) & Their Costs
2) Three Foundational Pillars of Business Continuity
3) Problem Statement
4) High Availability & Sustained Resiliency
5) Our Methodology
6) Value Proposition
7) Appendix
Types of Business Impacts (Outages) & Their Costs
* Sources: www.symantec.com; www.informationweek.com; www.businesscomputingworld.co.uk; www.evolven.com; www.quorum.net
Corporations implement a Business Continuity Program (BCP) to address these types of outages as they directly impact the bottom line.
* Lost Labor: $46,000,000 (per 10,000-person company @ 1.6 hr/wk)
* Lost Revenue: $26,500,000,000 (survey across 200 companies)
* Brand Failure: $ ?? RIM (Blackberry)
Revenue lost in a single outage can run into the millions of dollars. An outage may also trigger a brand failure (e.g. the Blackberry RIM outage, ~$100M).
Important! Most outages are either Hardware Failure or Human Error – a very small percentage of the overall outages are Natural Disasters (5%).
What are these Outages? (Yearly Combined)
77%
What are the Costs? Here are some Data Points:
So, What are the key elements of a BCP Program?
* Losses are estimated at $1,200,000,000,000 (1.2 trillion dollars) annually
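The lost-labor data point above can be sanity-checked with simple arithmetic. The sketch below backs out the hourly labor cost implied by $46,000,000 for a 10,000-person company losing 1.6 hours per person per week. The 52-week year is an assumption that the source does not state, and the function names are illustrative.

```python
# Back out the hourly labor cost implied by the lost-labor data point.
# Assumption (not in the source): losses accrue over 52 weeks per year.
EMPLOYEES = 10_000
HOURS_LOST_PER_WEEK = 1.6
WEEKS_PER_YEAR = 52            # assumed
ANNUAL_LOST_LABOR = 46_000_000


def lost_hours_per_year(employees, hours_per_week, weeks=WEEKS_PER_YEAR):
    """Total person-hours lost to outages in a year."""
    return employees * hours_per_week * weeks


def implied_hourly_cost(annual_loss, lost_hours):
    """Labor cost per lost hour implied by an annual loss figure."""
    return annual_loss / lost_hours


hours = lost_hours_per_year(EMPLOYEES, HOURS_LOST_PER_WEEK)
rate = implied_hourly_cost(ANNUAL_LOST_LABOR, hours)
print(f"{hours:,.0f} hours lost per year, about ${rate:.2f} per hour")
```

Under that assumption the figure corresponds to roughly 832,000 lost hours a year, so the same two functions can be reused with a company's own headcount and labor rate to estimate its exposure.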
Three Foundational Pillars of Business Continuity
Resiliency, Recovery, Contingency
Resiliency is a destination
A state where critical business functions and the supporting infrastructure are unaffected by most outages.
Resiliency is the ability of a corporation to move its capability and capacity seamlessly around its environment.
Recovery is a journey
Also known as “Disaster Recovery” or DR, it is complex to maintain and difficult to implement.
DR treats data loss back to a point in time, along with production downtime, as acceptable outcomes.
Contingency is a last resort
Establish a generalized capability and readiness to cope with major incidents and disasters, not all of which can be known in advance.
Contingencies treat data loss and production downtime as acceptable outcomes.
Increasing Resiliency efforts will naturally reduce efforts in Recovery & Contingency.
Business Continuity: Resiliency ($), Recovery ($$), Contingency ($$)
Corporations leave a hole in their overall Business Continuity Programs by spending considerable sums on Recovery & Contingency, which cover only ~5% of outages, with diminishing returns. The result is a BCP with built-in downtime and data loss as acceptable outcomes.
Resize Recovery & Contingency Efforts Appropriately
Improve & Increase Resiliency Efforts
Resiliency is proactive in dealing with significant business survival events (hardware failure, human error, power outages, pandemic, natural disaster, social unrest, etc.).
Problem Statement
Goal: Reset the BCP Balance
$
$$
Resiliency is a superset of Recovery and Contingency: it leverages the established processes and procedures used in both.
1
Tier 1 - Business Application
High Availability & Sustained Resiliency
Tier 0 - Load Balancer Infrastructure
Tier 0 - Physical Server Infrastructure
Tier 0 - DB Servers & DB Infrastructure
Tier 0 - Directories (LDAP, AD, ...)
Tier 0 - Storage Infrastructure
Tier 0 - Virtualization Infrastructure
Critical Overlaps
Identified HA/SR Gaps
Identifying & Resolving HA/SR Gaps Reduces
Infrastructure Failures
Unplanned Outages
Tangible Loss
Intangible Loss
Provider Confidence
Regulatory Fines
A compounding and/or cascading failure can occur when many HA/SR gaps are concentrated.
The value lies in finding those HA/SR gaps and addressing them.
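As a rough illustration of how gap concentration could be surfaced, the sketch below scans a hypothetical inventory of applications against the six Tier 0 layers named on this slide. The application names, the instance counts, and the rule that fewer than two independent instances counts as a gap are all assumptions made for the example, not part of the source methodology.

```python
from collections import Counter

# The six Tier 0 layers named on the slide.
TIER0_LAYERS = ["load_balancer", "physical_server", "database",
                "directory", "storage", "virtualization"]

# Hypothetical inventory: independent instances per layer, per application.
inventory = {
    "billing":  {"load_balancer": 2, "physical_server": 2, "database": 1,
                 "directory": 2, "storage": 1, "virtualization": 2},
    "ordering": {"load_balancer": 1, "physical_server": 2, "database": 1,
                 "directory": 2, "storage": 1, "virtualization": 2},
}


def find_gaps(app_layers):
    """Layers where the application has fewer than two instances (assumed gap rule)."""
    return [layer for layer in TIER0_LAYERS if app_layers.get(layer, 0) < 2]


def gap_concentration(inv):
    """Count, per layer, how many applications share a gap there."""
    return Counter(layer for layers in inv.values() for layer in find_gaps(layers))


for app, layers in inventory.items():
    print(app, "gaps:", find_gaps(layers))
print("concentration:", gap_concentration(inventory).most_common())
```

Layers with the highest counts are where several applications share the same weakness, which is where a single infrastructure failure is most likely to cascade.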
Resilience: Critical business functions and the supporting infrastructure are designed and engineered in such a way that they are materially unaffected by most disruptions, for example through the use of redundancy and spare capacity. There are two (2) methods to do this:
* High Availability (HA): Component availability, which can be inter-site or intra-site.
* Sustained Resiliency (SR): Moving capacity & capability seamlessly around the physical environment.
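To make the effect of redundancy concrete, here is a minimal sketch of the standard availability arithmetic: a redundant group is down only when every copy is down, while a stack of layers is up only when every layer is up. It assumes failures are independent, and the 0.99 availability figure is purely illustrative rather than a value from the source.

```python
def parallel_availability(component_availability, copies):
    """Availability of redundant copies, assuming independent failures."""
    return 1 - (1 - component_availability) ** copies


def serial_availability(availabilities):
    """Availability of a stack in which every layer must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


SINGLE = 0.99   # illustrative per-component availability (assumed)
LAYERS = 6      # the six Tier 0 layers named on the slide

no_redundancy = serial_availability([SINGLE] * LAYERS)
with_redundancy = serial_availability(
    [parallel_availability(SINGLE, 2)] * LAYERS
)
print(f"no redundancy: {no_redundancy:.4f}, two copies per layer: {with_redundancy:.4f}")
```

The comparison shows why an unaddressed gap in any one layer matters: in a serial stack, the least available layer limits the availability of everything above it.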
Benefits of HA/SR
Our Resiliency Methodology
[Diagram: the 11 steps of the Resiliency Methodology, grouped by phase]
Assess: Applications Submitted for Assessment or Review; Perform Assessment and Onboarding; Develop Test Requirements and Objectives
Investigate: Gap Exposure Risk, Value Assessment; Application Testing Capability; Gap Remediation
Validate: Develop Test; Schedule Test; Submit Events; HA/SR Testing; Feedback & Improvement
Any application (infrastructure, service, application or utility) that executes all 11 steps of the Resiliency Methodology would be considered mature in its Resiliency profile and, by extension, able to endure business-impacting (outage) events.
Maturing
The Value – A Proven Resiliency Program
Lowered IT Effort
Meet SLAs (SR)
Lower Outages (HA)
Contingency Planning
Disaster Recovery
Resiliency
Most Corporations’ Current State
Implement Resiliency Methodology
Meets most audit requirements.
Has great planning, but limited impact on improving the production environment’s ability to sustain outages.
Putting more effort into Disaster Recovery & Contingency Planning yields diminishing returns.
By focusing more on Resiliency, IT teams can reduce effort and cost, because DR and CP goals are met through the Resiliency implementation.
End result is a resilient application infrastructure.
Appendix
• Anatomy of a Recovery Event
Area of Potential Data Loss
An Application’s Data
RTO, RPO, RTC, TTTR & BTTR (Visually) – Anatomy of a Recovery Event
Timeline of Incident/Outage
RPO
No Data
Business Decision Window
Business Resumption
Take Action
Infra Ready
Good Data
Inconsistent Data
Rebuilding Data
Good Data
Data Available
Fixed (if applies)
Business Time To Resume (BTTR)
Recovery Time Objective (RTO)
Technology Time To Recover (TTTR)
Return To Capacity (RTC)
Application Recovery
Business Recovery
Infrastructure Recovery
People / Facilities Recovery
Time To Fix
All Hands Fix
Incident Start
Fix It Outage Time
Recovery Outage Time
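The timeline labels above can be turned into a small calculation. The sketch below derives a data-loss window, TTTR, and BTTR from event timestamps and compares them with RPO and RTO targets. The timestamps and targets are hypothetical, and the measurement points are assumptions about how the diagram is meant to be read: data loss is measured from the last good data to Incident Start, TTTR from Incident Start to Data Available, and BTTR from Incident Start to Business Resumption.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for one recovery event (illustrative only).
events = {
    "last_good_data":      datetime(2014, 12, 15, 8, 45),
    "incident_start":      datetime(2014, 12, 15, 9, 0),
    "data_available":      datetime(2014, 12, 15, 12, 30),
    "business_resumption": datetime(2014, 12, 15, 14, 0),
}

# Hypothetical objectives for this application.
RPO = timedelta(minutes=30)   # maximum tolerable data-loss window
RTO = timedelta(hours=4)      # maximum tolerable technology recovery time


def recovery_metrics(ev):
    """Derive data-loss window, TTTR, and BTTR (assumed measurement points)."""
    return {
        "data_loss": ev["incident_start"] - ev["last_good_data"],
        "tttr": ev["data_available"] - ev["incident_start"],
        "bttr": ev["business_resumption"] - ev["incident_start"],
    }


metrics = recovery_metrics(events)
print("data loss within RPO:", metrics["data_loss"] <= RPO)
print("TTTR within RTO:", metrics["tttr"] <= RTO)
print("BTTR:", metrics["bttr"])
```

Keeping BTTR separate from TTTR reflects the diagram's distinction between technology being recovered and the business actually resuming.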