Date posted: 22-Jan-2018
Category: Business
Uploaded by: databarracks
How would you recover? Lessons from 2016’s most interesting disasters
www.databarracks.com
INTRO & AGENDA
Duration: 30 mins
(including Q&A)
Type questions on the right
• The most common causes of data loss in 2016
• Examination of 6 real disasters suffered by Databarracks customers in 2016
• What mistakes led to the disaster
• Recommendations for becoming more resilient
*Slides will be made available and sent out following this session
THE BCPCAST
http://www.thebcpcast.com/
THE COST OF IT DOWNTIME
http://costofitdowntime.com/
CYBER ATTACK AS THE LEADING CAUSE OF DATA LOSS
[Chart: 2013 to 2016; values shown: 4%, 6%, 8%, 9%]
datahealthcheck.databarracks.com
FROM THE DATA HEALTH CHECK – LEADING CAUSE OF DATA LOSS
Cyber attack: 9%
Hardware failure / Human error: 16% / 23%
datahealthcheck.databarracks.com
https://www.theregister.co.uk/2017/02/01/gitlab_data_loss/
GITLAB
• LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage.
• Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.
• SH: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries. This happens because omnibus only uses Pg 9.6 if data/PG_VERSION is set to 9.6, but on workers this file does not exist. As a result it defaults to 9.2, failing silently. No SQL dumps were made as a result. Fog gem may have cleaned out older backups.
• Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
• The synchronisation process removes webhooks once it has synchronised data to staging. Unless we can pull these from a regular backup from the past 24 hours they will be lost.
• The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented.
• Our backups to S3 apparently don’t work either: the bucket is empty.
“So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place.”
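Two of the failures quoted above (backup files only a few bytes in size, and backups that nobody noticed had stopped) are the kind a routine automated check might catch. The following is a minimal sketch, not GitLab's or Databarracks' tooling: the function name, the `BackupCheck` type, and the size and age thresholds are all hypothetical and would need tuning for a real environment.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class BackupCheck:
    path: str
    ok: bool
    reason: str


def check_backup(path, min_bytes, max_age_hours, now=None):
    """Flag a backup file that is missing, suspiciously small, or stale.

    min_bytes and max_age_hours are assumed thresholds chosen by the
    operator; nothing here verifies that the backup can be restored.
    """
    now = time.time() if now is None else now
    if not os.path.isfile(path):
        return BackupCheck(path, False, "missing")
    info = os.stat(path)
    if info.st_size < min_bytes:
        return BackupCheck(path, False, f"too small ({info.st_size} bytes)")
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        return BackupCheck(path, False, f"stale ({age_hours:.1f} hours old)")
    return BackupCheck(path, True, "ok")
```

A check like this only shows that a plausible file exists. It does not replace a test restore, which is the only way to confirm the backup is usable.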
HAVE YOU TESTED DR IN THE LAST 12 MONTHS?
HOW WOULD YOU RECOVER?
CASE STUDY 1 – MAJOR PLAYERS
Installation
Contact with C&C
Search
Encryption
Ransom
LESSONS #1
If your users need to open attachments from unknown
sources:
• How can you limit the damage a ransomware attack
might inflict?
• How quickly would you be able to recover?
• How much data would be lost?
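The last question above can be estimated as the gap between the incident and the most recent backup completed before it. A minimal sketch follows; the function name and any timestamps passed to it are hypothetical, and it assumes every listed backup is actually restorable.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, incident_time):
    """Return the time between the last usable backup and the incident.

    Only backups completed at or before the incident count as usable.
    Returns None when there is no such backup.
    """
    usable = [t for t in backup_times if t <= incident_time]
    if not usable:
        return None
    return incident_time - max(usable)
```

For example, with backups taken at midnight each day and an incident at 18:00, the function returns a window of 18 hours, which is the work that would need to be re-created.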
CASE STUDY 2 – RANSOMWARE #2
Installation
Contact with C&C
Search
Encryption
Ransom
IT manager leaves business
LESSONS #2
• What happens if the person or people responsible for
your IT aren’t available?
• How many people have access beyond what they
really need?
• Do you remove access properly for leavers?
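One way to approach the leavers question is to compare the accounts that are still enabled against a list of people who have left. The sketch below assumes both lists can be exported as plain usernames (for example from a directory service and an HR system); the function name and the data are hypothetical.

```python
def accounts_to_disable(enabled_accounts, leavers):
    """Return enabled accounts that belong to people who have left.

    Matching ignores case and surrounding whitespace. Both inputs are
    assumed to be iterables of usernames.
    """
    leaver_set = {name.strip().lower() for name in leavers}
    return sorted(
        account for account in enabled_accounts
        if account.strip().lower() in leaver_set
    )
```

Run on a schedule, a comparison like this gives a short list to review rather than relying on someone remembering to remove access.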
CASE STUDY 3 – STOLEN SERVERS
LESSONS #3
• How secure are your office, server room and data
centre? (Do you have CCTV and access control?)
• Is there a possibility of data loss through the physical
removal of your hardware?
• How long would it take to source replacement
hardware or to recover at a second site?
• Would you be able to do so if you only lost a small
sub-set of systems?
CASE STUDY 4 – PERMISSIONS ON FILE SYSTEM ERROR
LESSONS #4
If you made a similar mistake:
• How long would it realistically take before you found
out?
• How long would it take to restore normal
permissions?
• What would be the impact if all employees had
access to sensitive customer, financial and HR data?
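The first question above, how long it would take to find out, depends on whether anything is checking. As a rough illustration, the sketch below walks a directory tree and lists regular files readable by every user. It assumes a POSIX file system with standard mode bits, which may not match the system in the case study, and the function name is hypothetical.

```python
import os
import stat


def world_readable_files(root):
    """Return regular files under root that any user can read.

    Uses POSIX mode bits only; ACLs and share-level permissions are
    not inspected. Symbolic links are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                found.append(full)
    return sorted(found)
```

Comparing the output against an agreed baseline on a regular schedule would shorten the time before an accidental permissions change is noticed.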
CASE STUDY 5 – CONDEMNED BUILDING
LESSONS #5
• What would you do if you lost access to
your premises?
• Does your backup and recovery plan
account for all users needing to
continue to operate?
CASE STUDY 6 – HOLBORN FIRE, YORK FLOODS, RANSOMWARE
LESSONS #6
• Do you have skills in place to recover
from a range of different risks?
• Does your service provider have the
capacity to cope with multiple
invocations in parallel?
SUMMARY
• Put measures in place to limit the damage of a ransomware attack
• Do not give users greater access than they need
• Make server rooms secure with CCTV and access control
• Make sure your DR plans cover the case where all users need
to continue working
RESOURCES
• The Business Continuity Podcast
– http://www.thebcpcast.com/
• Tabletop testing simulator
– https://tools.databarracks.com/dr-tabletop-simulation/index.html
• The Cost of IT Downtime
– http://costofitdowntime.com/
• Data Health Check
– http://datahealthcheck.databarracks.com/
• GitLab data loss
– https://www.theregister.co.uk/2017/02/01/gitlab_data_loss/
Thank you