© 2008 Kroll Ontrack Inc. | www.krollontrack.com
Recovering Your Virtual Data
April 29, 2009
David Logue, Sr. Data Recovery Engineer
2
Learning Objectives
Identify common data loss scenarios in virtual environments
Challenges with recovering virtual data
Recommendations when virtual data loss occurs
Design recommendations for a virtual environment with data loss prevention in mind
3
BSOD or PSOD: Do They Give You Chills?
4
Common Data Loss Scenarios
Hardware failures: RAID, disk
Software failures: file system, data corruption, database corruption, VMware metadata corruption
Human error: deleted, overwritten, formatted (guest and host level)
5
Common Data Loss Scenarios: Failure Types
Source: Over 100 Kroll Ontrack Virtual Data Recovery Jobs Over the Past 12 Months
8
Challenges with Recovering Virtual Data
Recovery of multiple guests on a single volume
Snapshots, logs and swap files add complexity
Virtual file system fragmentation
Size of the recovery
Lack of a good backup that has been tested
Using traditional methods of recovery, such as restore, may make the problem worse
9
Case Study – Hospital in Crisis: Initial Facts
Hospital had a 5-drive RAID 5 array (1.2 TB volume) attached to their VMware ESX server
The array hosted 4 MS Windows 2003 Server virtual machines running MS SQL 2005 which contained their patient medical records
The RAID controller failed
Hospital replaced the RAID controller and rebooted
All of the drives stayed offline after the reboot
10
Case Study – Hospital in Crisis: Customer Plan
Force the drives online and rebuild
If that failed, restore from backup
If that failed, recreate the missing patient data from other sources
11
Case Study – Hospital in Crisis: Additional Options
Customer contacted Kroll Ontrack for a free Data Recovery consultation.
Kroll Ontrack’s recommendations:
Image the drives before starting the restore/rebuild process
If the restore or rebuild fails, start a Remote Data Recovery or ship the drives to Kroll Ontrack for recovery
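The "image the drives first" recommendation can be illustrated with a minimal Python sketch. It is not Kroll Ontrack's tooling: the function names `image_device` and `verify_image` are hypothetical, the source is assumed to be readable end to end, and a failing drive with bad sectors would need a dedicated imaging tool that can skip and retry unreadable regions.

```python
import hashlib

CHUNK = 1024 * 1024  # read 1 MiB at a time


def image_device(source: str, dest: str) -> str:
    """Copy source to dest block by block; return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    # The source is opened read-only. Mode "xb" refuses to overwrite an
    # existing image, so an earlier copy cannot be destroyed by accident.
    with open(source, "rb") as src, open(dest, "xb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(dest: str, expected: str) -> bool:
    """Re-read the finished image and compare its hash with the one taken while copying."""
    digest = hashlib.sha256()
    with open(dest, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Any later rebuild or restore attempt can then be made while the untouched images remain available as a fallback.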
12
Case Study – Hospital in Crisis: Additional Customer Challenges
The customer imaged the drives
The customer forced the drives online and determined:
The RAID configuration was damaged and one of the drives was out of date (degraded)
Forcing a rebuild with a degraded drive would cause additional damage
Backups did not include the SQL data
Time to recreate data – 3 months to 2 years
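Why a rebuild that includes an out-of-date drive causes additional damage can be shown with RAID 5 parity arithmetic. The sketch below is illustrative only: the block contents are made up, a single stripe stands in for the whole array, and `xor_blocks` and `stripe_is_consistent` are hypothetical helper names.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(members: list[bytes]) -> bool:
    """All data blocks plus parity XOR to zero when every member is current."""
    return not any(xor_blocks(*members))


# One stripe across 5 members: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(*data)

# Healthy case: member 2 is lost and rebuilt from the other members plus parity.
rebuilt = xor_blocks(data[0], data[1], data[3], parity)
assert rebuilt == data[2]

# Stale case: member 1 dropped out earlier and still holds old contents,
# while the stripe and its parity were updated afterwards.
stale_member_1 = b"bbbb"
bad_rebuild = xor_blocks(data[0], stale_member_1, data[3], parity)
assert bad_rebuild != data[2]  # the rebuild completes but writes wrong data

# A parity check over the stripe exposes the stale member before any write.
assert stripe_is_consistent(data + [parity])
assert not stripe_is_consistent([data[0], stale_member_1, data[2], data[3], parity])
```

The rebuild in the stale case raises no error, which is why the damage is easy to cause and hard to notice.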
13
Case Study – Hospital in Crisis: Kroll Ontrack to the Rescue
The customer contacted Kroll Ontrack
Kroll Ontrack connected to the customer remotely and started the evaluation and recovery
18
Case Study – Hospital in Crisis: VMware® Recovery Overview
Storage can be locally attached drives, SANs, iSCSI, or NFS storage.
A software RAID manager replaces RAID controllers that are no longer presenting the LUNs correctly. It supports all types of RAID configurations.
The RAID manager presents a virtual device, which the tools see as if it were the original device.
Specialized recovery tools are used to recover from corruption inside almost any file system.
19
Case Study – Hospital in Crisis: Inside the RAID
[Diagram layers: Disk 0, Disk 1, Disk 2, Disk 3, Disk 4; Kroll Ontrack RAID Manager; KO Rollback Layer]
The RAID failure made the VMware data inaccessible, so Ontrack replaced the RAID controller with software to get to the data.
20
Case Study – Hospital in Crisis: Inside the RAID
[Diagram layers: Disk 0, Disk 1, Disk 2, Disk 3, Disk 4; Kroll Ontrack RAID Manager; KO Rollback Layer; VMFS Metadata; VM1-VMDK1, VM1-VMDK2, VM2-VMDK1, VM2-VMDK2, VM3-VMDK1, VM3-VMDK2; VM1-MetaData, VM2-MetaData, VM3-MetaData]
Ontrack engineers mapped out the data to determine the original RAID configuration and presented the array to the recovery tools.
This virtual RAID is then accessed like the original array for the rest of the recovery process.
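Presenting a set of member images as one virtual array can be sketched as a read-only mapping from logical blocks to members. The slides do not give the layout or stripe size of the hospital's array, so the left-symmetric RAID 5 layout, the `unit_size` parameter, and the class name `VirtualRaid5` are all assumptions chosen for illustration. The `skip` option shows how an out-of-date member could be left out and its blocks rebuilt from parity.

```python
from functools import reduce
from typing import BinaryIO, Optional, Sequence, Tuple


class VirtualRaid5:
    """Read-only view of a RAID 5 set built from member images.

    Assumes a left-symmetric layout: parity starts on the last member and
    moves one member to the left on each row; data follows the parity block.
    """

    def __init__(self, members: Sequence[BinaryIO], unit_size: int,
                 skip: Optional[int] = None):
        self.members = members
        self.n = len(members)
        self.unit = unit_size
        self.skip = skip  # index of a member to ignore (for example, a stale drive)

    def locate(self, logical_unit: int) -> Tuple[int, int]:
        """Map a logical stripe-unit number to (member index, byte offset)."""
        row, k = divmod(logical_unit, self.n - 1)
        parity_disk = (self.n - 1) - (row % self.n)
        disk = (parity_disk + 1 + k) % self.n
        return disk, row * self.unit

    def _read_member(self, disk: int, offset: int) -> bytes:
        member = self.members[disk]
        member.seek(offset)
        return member.read(self.unit)

    def read_unit(self, logical_unit: int) -> bytes:
        disk, offset = self.locate(logical_unit)
        if disk != self.skip:
            return self._read_member(disk, offset)
        # Rebuild the skipped member's block by XOR-ing every other
        # member (data and parity) at the same row offset.
        others = [self._read_member(d, offset) for d in range(self.n) if d != disk]
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*others))

    def read(self, start_unit: int, count: int) -> bytes:
        """Read `count` consecutive logical stripe units starting at `start_unit`."""
        return b"".join(self.read_unit(u) for u in range(start_unit, start_unit + count))
```

Because the class only ever reads from the member images, a wrong guess about layout or stripe size can be discarded and retried without harming the originals.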
22
Case Study – Hospital in Crisis: Inside the RAID
[Diagram layers: VMFS Metadata; VM1-VMDK1, VM1-VMDK2, VM2-VMDK1, VM2-VMDK2, VM3-VMDK1, VM3-VMDK2; VM1-MetaData, VM2-MetaData, VM3-MetaData; VM1, VM2, VM3]
Once the array was presented, individual virtual machines were recovered from the VMFS volume
Proprietary NTFS and SQL recovery tools were then used to recover critical databases
23
Case Study – Hospital in Crisis: Conclusion
Ontrack used four levels of recovery to get to the customer data:
RAID recovery tools to reassemble the original RAID configuration
VMFS recovery tools to repair damage to the file system and copy out the VMDK files
NTFS recovery tools to repair the NT file system and copy out the SQL files
MS SQL recovery tools to extract the tables into a new database
Kroll Ontrack was able to get a full recovery of the critical SQL data
25
Recommendations When Data Loss Occurs
Don’t panic and don’t update your resume
When troubleshooting, do not write any data to the storage array or change storage configurations.
Don’t format the volume that has missing data
Use the support system offered by the software provider
Restore data to an alternate location, and contact a data recovery company with extensive virtual data recovery experience, including the ability to perform remote recoveries
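The advice to restore to an alternate location and never write to the affected storage can be backed by a simple guard in a restore script. This is a sketch under assumptions: `check_restore_target` is a hypothetical helper, both paths are assumed to exist, and comparing `st_dev` only tells you whether two paths share a mounted file system, not whether they share the same physical array.

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """True when both existing paths live on the same mounted file system."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_restore_target(affected_path: str, restore_dir: str) -> None:
    """Refuse a restore destination that sits on the volume with the missing data."""
    if same_volume(affected_path, restore_dir):
        raise ValueError(
            f"{restore_dir} is on the same volume as {affected_path}; "
            "choose an alternate location so no data is overwritten"
        )
```

Calling `check_restore_target` before a restore starts turns an easy mistake into an explicit error.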
26
Recommendations When Data Loss Occurs
Definition of Data Recovery (DR)
DR gets back files from corrupted or inaccessible storage (directly from the failed system, not from a backup)
DR gets back the most recent files vs. the most recent backup
In some cases, DR is faster than restoring from the last backup
DR fits well as part of an overall disaster recovery plan
28
Design Recommendations
Implement naming conventions for hosts, guests, physical servers and virtual file system volumes
Control who has access to the environment
Document the backup and recovery plan and include the contact information of your preferred data recovery vendor in the plan
Test your backups on a regular basis
Use the tools to manage your virtual environment; don’t take shortcuts
Be careful how you use snapshots and do your housekeeping
Monitor the data stores, logs and swap files
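Monitoring data stores, logs and swap files can start with a small scheduled check. The sketch below assumes the data store is mounted at a path the script can read, and the file-name patterns in `WATCH_PATTERNS` are assumptions based on common VMware naming (snapshot delta disks, swap files, logs). The function name `datastore_report` and the 20% threshold are hypothetical.

```python
import shutil
from pathlib import Path

# Assumed file-name patterns for snapshot deltas, swap files and logs.
WATCH_PATTERNS = ("*-delta.vmdk", "*.vswp", "*.log")


def datastore_report(root: str, min_free_fraction: float = 0.2) -> dict:
    """Report free space and the total size of snapshot, swap and log files under root."""
    usage = shutil.disk_usage(root)
    free_fraction = usage.free / usage.total

    watched = {}
    for pattern in WATCH_PATTERNS:
        files = [p for p in Path(root).rglob(pattern) if p.is_file()]
        watched[pattern] = sum(p.stat().st_size for p in files)

    return {
        "free_fraction": free_fraction,
        "low_space": free_fraction < min_free_fraction,
        "watched_bytes": watched,
    }
```

Tracking `watched_bytes` over time gives early warning of forgotten snapshots that keep growing, which ties back to the housekeeping point above.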
29
Learning Objectives: Summary
Common data loss scenarios in virtual environments
Challenges with recovering virtual data
Recommendations when virtual data loss occurs
Design recommendations for a virtual environment with data loss prevention in mind
30
Conclusion
Thank you!
Dave Logue, Sr. Remote Data Recovery Engineer
Kroll Ontrack, a Marsh & McLennan Company
[email protected]