Post on 25-Jul-2020
Disaster Recovery 101
Sudarshan Ranganath & Matthew Phillips
Ellucian
SESSION OBJECTIVES
Business continuity is critical to every institution and its IT organization. How do you set up your ERP and other Tier 1 apps to reduce the risk of a disaster, and quickly recover from one should disaster strike? Learn about the infrastructure and practices that Ellucian’s Cloud Services uses to minimize the impact of a disaster for your Banner systems.
AGENDA
May 2, 2013
Tier 1 Apps
Disaster Prevention
Disaster Readiness
DR Execution
DR Options
TIER 1 APPS FOR DISASTER RECOVERY / BC
Communication
SIS/ERP
LMS
CRM
Other Financial/Operational
RISKS THAT COULD IMPACT YOUR OPERATIONS
[Figure: risk matrix of probability vs. severity, with cells rated by priority]
Probability: Frequent, Likely, Occasional, Seldom, Unlikely
Severity: Catastrophic, Critical, Moderate, Negligible
Priority: Extremely High, High, Moderate, Low
Example risks
• Disk failure / trip over a wire
• CPU failure
• Hurricane / flooding
• Security breach
• Hit by tornado
• Power outage
• Staff attrition
• Demand surge
Impact
• Business Interruption
• Financial
• Legal
• Reputational
Causes
• Natural disasters
• Human errors
• Technological failures
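A risk matrix like the one on this slide can be expressed as a small lookup. The sketch below is only illustrative: the probability, severity, and priority labels come from the slide, but the scoring rule that maps a cell to a priority is a hypothetical stand-in, since the actual cell ratings are not recoverable from the transcript.

```python
# Labels taken from the slide, ordered from lowest to highest.
PROBABILITY = ["Unlikely", "Seldom", "Occasional", "Likely", "Frequent"]
SEVERITY = ["Negligible", "Moderate", "Critical", "Catastrophic"]


def priority(probability: str, severity: str) -> str:
    """Rate a risk by combining its probability and severity ranks.

    ASSUMPTION: the thresholds below are hypothetical; an institution
    would substitute the cell ratings from its own risk matrix.
    """
    score = PROBABILITY.index(probability) + SEVERITY.index(severity)  # 0..7
    if score >= 6:
        return "Extremely High"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Moderate"
    return "Low"


if __name__ == "__main__":
    print(priority("Likely", "Critical"))
```

Rating each listed risk this way gives a rough ordering for where prevention effort should go first.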
KEY KPIS YOU CARE ABOUT AS AN IT ORGANIZATION
Downtime
• Application availability
Performance
• User experience for public and private apps
Security
• Number of security incidents
• Extent of compromise per incident
Scalability
• Ability of current infrastructure to handle load
• Time to add capacity in response to a demand spike
Disaster Recoverability
• Probability of a disaster affecting the datacenter
• Time to recover from a site-level disaster
Software Currency
• Time to update to the newest version after the vendor makes it available
Backup Currency
• Lost work product because of inefficient backup practices, and aging of backed-up data as a result
Stakeholder Support
• Effectiveness in furthering student/staff satisfaction
Costs/Investment Efficiency
• TCO to operate the solution, ROI for every dollar invested
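The availability KPI is often easier to reason about as hours of downtime per year. The following is a minimal sketch of that conversion; the function names are illustrative and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


def availability_pct(downtime_hours: float,
                     period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the availability percentage for a measured amount of downtime."""
    if period_hours <= 0 or not 0.0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must fit inside a positive period")
    return 100.0 * (1.0 - downtime_hours / period_hours)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_hours_per_year(target):.2f} h/year")
```

For example, a 99.9% availability target leaves roughly 8.76 hours of downtime per year, which a single site-level disaster can easily exceed.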
DISASTER PREVENTION
Power
Facility
Network
Hardware
Application Architecture
Replication
Process
DISASTER PREVENTION - POWER
Multiple Utilities or Stations
A and B power grids
All components connected to both A and B
UPS Generator
Generator Backup
Fueling agreements for outages longer than 2 days
DISASTER PREVENTION - FACILITY
Multiple Physical Entries for Power, Network
Hardened Walls and Roof
Temperature – Humidity
Secure entries for personnel and equipment
Multi-stage Fire Detection
DISASTER PREVENTION - NETWORK
Multiple Internet connections
Multiple ISPs
Redundant firewalls
Redundant core network
Redundant connections for servers and storage
DISASTER PREVENTION - HARDWARE
Redundancy is key at every level
SAN vs. non-SAN
Virtualization vs. Dedicated Server Hardware
Redundant cold/warm/hot hardware in DR location
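The value of redundancy at every level can be estimated with basic probability. The sketch below assumes component failures are independent, which real deployments only approximate (shared power or a shared rack breaks that assumption); the function names and sample figures are illustrative.

```python
from math import prod
from typing import Iterable


def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    The group is down only when every copy is down at the same time.
    ASSUMPTION: failures are independent of each other.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - component_availability) ** copies


def chain_availability(levels: Iterable[float]) -> float:
    """Availability of levels that must all be up (power, network, hardware...)."""
    return prod(levels)


if __name__ == "__main__":
    single = 0.99
    paired = redundant_availability(single, copies=2)
    print(f"one unit: {single:.4f}, redundant pair: {paired:.4f}")
    print(f"three paired levels in series: {chain_availability([paired] * 3):.6f}")
```

Because every level sits in series with the others, one non-redundant level caps the availability of the whole stack.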
DISASTER PREVENTION - APPLICATION ARCHITECTURE
Again… redundancy is key at every level
DB tier and App tier
Monitoring & alerting considerations
Integrations
Customization and Modifications
Licensing
Backup architecture considerations
OS is static
Application tier is static
Database backup considerations
Database backup architecture
Full export, RMAN, cold backup, custom hot backup
ARCHIVELOG vs. NOARCHIVELOG mode (prod vs. non-prod)
Data Domain-style (disk-based) vs. tape architecture
Architecture must consider RTO (recovery time objective) and RPO (recovery point objective)
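One way to check a backup architecture against its RTO and RPO is to compute the worst-case data loss and recovery time it implies. This is a minimal sketch under stated assumptions: the `BackupPlan` fields and sample numbers are hypothetical, and the model ignores failed or corrupt backups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPlan:
    backup_interval_hours: float  # time between successive backups
    offsite_lag_hours: float      # delay before a backup is usable at the DR site
    restore_hours: float          # time to restore and bring the service back up


def worst_case_data_loss_hours(plan: BackupPlan) -> float:
    """Oldest the newest usable offsite backup can be when disaster strikes."""
    return plan.backup_interval_hours + plan.offsite_lag_hours


def meets_objectives(plan: BackupPlan, rpo_hours: float, rto_hours: float) -> bool:
    """True when worst-case loss fits the RPO and restore time fits the RTO."""
    return (worst_case_data_loss_hours(plan) <= rpo_hours
            and plan.restore_hours <= rto_hours)


if __name__ == "__main__":
    nightly = BackupPlan(backup_interval_hours=24, offsite_lag_hours=4, restore_hours=8)
    print(worst_case_data_loss_hours(nightly))
    print(meets_objectives(nightly, rpo_hours=24, rto_hours=12))
```

The sample nightly plan misses a 24-hour RPO because the offsite transfer lag adds to the backup interval; shipping archived redo logs between full backups is one way to shrink that gap.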
DISASTER RECOVERY REPLICATION
Backup Process
Replication Process
Recovery Point
Recovery Time
DISASTER PREVENTION - PROCESS
ITIL® Change Management
Incident Management
Shutdown / Startup Processes
Access Control / Role Based Security
Training
DISASTER READINESS
How do you test your readiness for disaster?
Failover Test
Power
Network test
VM test
Application / Database test
Monitoring test
EXECUTION WHEN YOU HAVE A SITE-LEVEL DISASTER
Requires People & Process
Facility to restore
Infrastructure (Network, servers, storage, Recovery software, DNS)
Most Recent Backups
Prioritization
Move IP networking from primary to DR
Recover Virtual Machines
Recover Databases
Recover Apps
Integrations to other systems
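The recovery steps above depend on one another, so a runbook can encode the dependencies and derive the execution order from them. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the step names paraphrase the slide, and the dependency chain is an assumption about a typical environment rather than something the slide prescribes.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps that must finish before it can start.
# ASSUMPTION: a simple linear chain; real environments may allow parallel steps.
RECOVERY_DEPENDENCIES: Dict[str, Set[str]] = {
    "move IP networking to DR": set(),
    "recover virtual machines": {"move IP networking to DR"},
    "recover databases": {"recover virtual machines"},
    "recover apps": {"recover databases"},
    "restore integrations": {"recover apps"},
}


def recovery_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for number, step in enumerate(recovery_order(RECOVERY_DEPENDENCIES), start=1):
        print(f"{number}. {step}")
```

Keeping the dependencies as data makes it straightforward to add prioritization later, for example recovering the SIS/ERP databases before lower-tier systems.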
DISASTER RECOVERY STRATEGIES
Strategy | RPO | RTO | Cost
Server Replication | Seconds to minutes | < 1 hour | $$$$$
SAN Replication | Minutes to hours | Hours to a day | $$$$
VM + DB logs | Hours to a day | Hours to days | $$$
Offsite Tape + DR Contract | Days | Days to weeks | $$
Offsite Tape | Days | Months | $
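The table can be used to pick the least expensive strategy that still meets a given RPO and RTO. The sketch below encodes each range by the coarsest time unit of its upper bound, which is an interpretation of the table rather than data from the slides, and counts the dollar signs as a relative cost.

```python
from typing import Optional

# Time units ordered from shortest to longest.
SCALE = ["seconds", "minutes", "hours", "days", "weeks", "months"]

# (strategy, worst-case RPO unit, worst-case RTO unit, relative cost in "$" signs)
# ASSUMPTION: "< 1 hour" is read as "minutes" and "hours to a day" as "days".
STRATEGIES = [
    ("Server Replication", "minutes", "minutes", 5),
    ("SAN Replication", "hours", "days", 4),
    ("VM + DB logs", "days", "days", 3),
    ("Offsite Tape + DR Contract", "days", "weeks", 2),
    ("Offsite Tape", "days", "months", 1),
]


def cheapest_strategy(max_rpo: str, max_rto: str) -> Optional[str]:
    """Return the lowest-cost strategy within both limits, or None if none fits."""
    rpo_limit = SCALE.index(max_rpo)
    rto_limit = SCALE.index(max_rto)
    candidates = [
        (cost, name)
        for name, rpo, rto, cost in STRATEGIES
        if SCALE.index(rpo) <= rpo_limit and SCALE.index(rto) <= rto_limit
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(cheapest_strategy(max_rpo="hours", max_rto="days"))
```

Tightening either objective by one unit typically moves the answer up a row, which is the cost trade-off the table is meant to illustrate.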
SUMMARY
• DR is about
• Planning & Testing Readiness
• Prevention, Readiness, Execution