Confidential1
Data De-Duplication
VMUG Dallas
March 26, 2008
Kyle Green
Director, South Central
[email protected]
Where are we?
Hierarchy of Data Reduction Types
• Storage Array: 1:1
• LZ Compression: ~2x (white space reduction)
• Single Instance Storage: ~5x (file level)
• Fixed Block: ~8x (fixed blocks, snapshots)
• Data Deduplication: ~20x
Data Deduplication Significantly Reduces
• Power
• Heat
• Cooling
• Management
Data Domain: Leadership and Innovation
Deduplication Storage Systems
• > 3,700 systems installed
• > 1,500 customers
• > 325 petabytes under Data Domain protection worldwide
A History of Industry Firsts
• First Dedupe NAS
• First Dedupe Volume Replication
• First Dedupe Gateway
• Largest Dedupe Array
• First Dedupe Directory Replication
• First Dedupe VTL
• First Dedupe Nearline Storage
Timeline axis: 2003 2004 2005 2006 2007
Storage 3.0 - The Long Term Play
• Storage 1.0: PRIMARY, TAPE
• Storage 2.0: PRIMARY, SATA & RAID, TAPE
• Storage 3.0: PRIMARY, Deduplicated Storage, TAPE
Key Attributes of Data Domain Technology
Easily Integrates with Existing Infrastructure
Retention: Deduplication
Recovery: Data Invulnerability Architecture
Replication: WAN Efficient
Data Domain Deduplication Storage for Nearline Applications
Today’s Data Protection Challenges
Challenges
• Massive data growth
• Economic pressures
• Regulatory compliance
• Challenges with tape
  • Questionable reliability
  • Mechanical failures
  • DR via trucks
  • Longer recovery times
The Solution
Easily Integrates with Existing Infrastructure
3U, (15) 500 GB SATA drives
RAID-6, NVRAM, N+1 Fan
1 - 4 Ports
5.4 to 21.6 TB with Shelves
File System
(Gateway to: EMC, HDS, Nexsan, Pillar, NetApp, 3PAR)
CIFS, NFS, NDMP
Ethernet
FC = VTL
Replication
No rip and replace.
… plus other nearline applications
Data Deduplication: Under the Hood
Backup streams (each letter is a data segment):
• Friday Full Backup: A B C D A E F G
• Mon Incr: A B H
• Tues Incr: C B I
• Weds Incr: E G J
• Thurs Incr: A C K
• Second Friday Full Backup: B C D E F L G H
Unique segments stored: A B C D E F G H I J K L

BACKUP DATA       LOGICAL   ESTIMATED REDUCTION   PHYSICAL
Friday Full       1 TB      2-4x                  250 GB
Monday Incr       100 GB    7-10x                 10 GB
Tuesday Incr      100 GB    7-10x                 10 GB
Wednesday Incr    100 GB    7-10x                 10 GB
Thursday Incr     100 GB    7-10x                 10 GB
2nd Friday Full   1 TB      50-60x                18 GB
TOTAL             2.4 TB    7.8x                  308 GB
Store more backups in a smaller footprint.
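
The segment diagram on this slide can be replayed in a few lines. This is a minimal sketch, not Data Domain's implementation: the letters stand in for data segments exactly as on the slide, and the `dedupe` helper is a hypothetical name. A production system would typically identify segments by a content fingerprint instead of a letter; that detail is an assumption and is not stated on the slide.

```python
# Illustrative only: each letter stands in for one data segment, as on the slide.
backups = [
    ("Friday Full", "ABCDAEFG"),
    ("Mon Incr", "ABH"),
    ("Tues Incr", "CBI"),
    ("Weds Incr", "EGJ"),
    ("Thurs Incr", "ACK"),
    ("Second Friday Full", "BCDEFLGH"),
]


def dedupe(backups):
    """Store each distinct segment once; count the new segments per backup."""
    store = []            # unique segments, in first-seen order
    seen = set()
    new_per_backup = {}
    for name, stream in backups:
        new = 0
        for segment in stream:
            if segment not in seen:
                seen.add(segment)
                store.append(segment)
                new += 1
        new_per_backup[name] = new
    return store, new_per_backup


store, new_per_backup = dedupe(backups)
logical = sum(len(stream) for _, stream in backups)

print("stored segments:", " ".join(store))
print("new segments per backup:", new_per_backup)
print(f"{logical} logical segments, {len(store)} physically stored")
```

The first full backup contributes most of the stored segments, and every later backup, including the second full, adds only one new segment. That is the same pattern as the table: the first full reduces modestly, while the second full reduces by a much larger factor.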
Longer Retention: Store More with Less
PERIOD    BACKUP DATA   LOGICAL   ESTIMATED REDUCTION   PHYSICAL
Week 1    April 7       2.4 TB    8x                    308 GB
Week 2    April 14      3.8 TB    10x                   366 GB
Week 3    April 21      5.2 TB    12x                   424 GB
Month 1   April 28      6.6 TB    14x                   482 GB
Month 2   May 31        12.2 TB   17x                   714 GB
Month 3   June 30       17.8 TB   19x                   946 GB
Month 4   July 31       23.4 TB   20x                   1178 GB
TOTAL                   23.4 TB   20x                   1178 GB
Over 1 year of retention in 3U of Data Domain protection storage.
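
The retention figures follow from simple cumulative arithmetic. The sketch below is one way to reproduce them, under two assumptions that are not stated on the slide: every week after the first adds four 100 GB incrementals (10 GB each after deduplication) plus one 1 TB full (18 GB after deduplication), and each month counts as four backup weeks. The names `project` and `rows` are hypothetical.

```python
# Figures come from the slides; the weekly growth model is an assumption.
FIRST_PERIOD = (2400, 308)            # GB logical, GB physical through April 7
WEEKLY_LOGICAL_GB = 4 * 100 + 1000    # four incrementals plus one full
WEEKLY_PHYSICAL_GB = 4 * 10 + 18      # the same backups after deduplication


def project(weeks_after_first):
    """Return cumulative (logical GB, physical GB, reduction ratio)."""
    logical = FIRST_PERIOD[0] + weeks_after_first * WEEKLY_LOGICAL_GB
    physical = FIRST_PERIOD[1] + weeks_after_first * WEEKLY_PHYSICAL_GB
    return logical, physical, logical / physical


# Backup weeks elapsed since April 7, assuming four backup weeks per month.
rows = [("April 7", 0), ("April 14", 1), ("April 21", 2), ("April 28", 3),
        ("May 31", 7), ("June 30", 11), ("July 31", 15)]

for label, weeks in rows:
    logical, physical, ratio = project(weeks)
    print(f"{label:<9} {logical / 1000:5.1f} TB  {ratio:4.1f}x  {physical:5d} GB")
```

The cumulative ratio keeps climbing because each additional week adds 1.4 TB of logical data but only 58 GB of physical data, so the expensive first full backup is amortized over more and more retained backups.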
Inline Deduplication for Optimized Time-to-DR
Post-process DR restore point is usually obsolete
• Data Domain Inline Dedupe/Replication: Replicate During Backup, then DR-Ready
• Post-Process Dedupe: Backup to Cache, Dedupe & Replicate, then DR Ready
• VTL/Tape/Truck: Backup to VTL, Copy to Tape, Truck to DR Site, then DR-Ready
Backup Window
Additional 2-3x backup time to get to DR Ready
In Line vs Post Process
Both systems: 5 TB addressable
Initial full backup: 5 TB @ 2:1
• Inline @ 60 MB/s: 2.5 TB written; 2.5 TB remaining
• Post process @ 30 MB/s: 2.5 TB cached to disk while 2.5 TB deduped to 1.25 TB; 1.25 TB remaining
Incremental backup: 500 GB @ 7:1
• Inline @ 60 MB/s: 71 GB written daily; 426 GB total (6 days of incrementals); 2.926 TB total written to the system; 2.074 TB remaining
• Post process @ 30 MB/s: 250 GB cached to disk while 250 GB deduplicated to 36 GB; remainder deduped after backup; 2.926 TB total written; 2.074 TB remaining
Subsequent full backup: 5 TB @ 50:1
• Inline @ 60 MB/s: 100 GB written; 3.026 TB total written to the system; 1.974 TB remaining
• Post process @ 30 MB/s: 2.5 TB cached to disk while 2.5 TB deduped to 50 GB: OUT OF SPACE (2.5 TB needed, 2.0 TB available)
After 1 week retention a 5 TB post processing system is out of space for caching. All backups must slow to accommodate incoming data without caching.
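
The capacity arithmetic behind this comparison can be checked with a short script. This is a sketch using only the slide's numbers. It rounds deduplicated sizes down to whole gigabytes as the slide does (500 GB at 7:1 becomes 71 GB), and it models the post-process cache requirement as half of the incoming backup, which is how the slide describes it. All function and variable names are hypothetical.

```python
CAPACITY_GB = 5000   # 5 TB addressable, per the slide


def inline_written(logical_gb, ratio):
    """Inline: only deduplicated data lands on disk (whole GB, rounded down)."""
    return logical_gb // ratio


# Inline system over the slide's first week.
used = inline_written(5000, 2)            # initial full: 2500 GB
used += 6 * inline_written(500, 7)        # six incrementals: 6 * 71 GB
free_after_week = CAPACITY_GB - used

# Subsequent full at 50:1 adds only 100 GB when deduplicated inline.
used_after_full = used + inline_written(5000, 50)

# Post-process system: per the slide, half of the incoming full is cached
# before deduplication catches up, so it needs 2500 GB of free space.
cache_needed = 5000 // 2
post_process_fits = cache_needed <= free_after_week

print(f"inline after week 1: {used} GB used, {free_after_week} GB free")
print(f"inline after second full: {used_after_full} GB used, "
      f"{CAPACITY_GB - used_after_full} GB free")
print(f"post-process needs {cache_needed} GB of cache, fits: {post_process_fits}")
```

Both systems end the week with the same amount of deduplicated data on disk. The difference is that the post-process system also needs transient cache space for the next full backup, and that space is no longer available.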
Recovery: Data Invulnerability Architecture
• Other: RAID-6, NVRAM, Snapshots
• Data Verification: CheckSum, Dedupe, write to disk, Verify
• Self-healing file system: Cleaning, Expired data, Defrag, Verify
Trust but verify – hope is not a strategy.
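
The "checksum, write to disk, verify" sequence can be illustrated generically. The sketch below is not Data Domain's Data Invulnerability Architecture; it only shows the idea of computing a checksum before the write, reading the data back, and re-checking it later. SHA-256 and the function names `write_verified` and `verify` are assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


def write_verified(path, data):
    """Write data, then read it back and compare checksums before trusting it."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # ask the OS to push the bytes to storage
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    return expected


def verify(path, expected):
    """Re-check stored data later, for example during a background scan."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    segment_path = os.path.join(tmp, "segment.bin")
    checksum = write_verified(segment_path, b"backup segment")
    ok_before = verify(segment_path, checksum)
    with open(segment_path, "wb") as f:   # simulate on-disk corruption
        f.write(b"backup segmenT")
    ok_after = verify(segment_path, checksum)

print("intact copy verifies:", ok_before)
print("corrupted copy verifies:", ok_after)
```

An immediate read-back may be served from the operating system's cache and not from the disk itself, so this sketch shows the control flow only, not a guarantee about the storage medium.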
Replication: WAN Efficient
Source: Remote Sites
Destination: Data Center Hub
Diagram labels: home, DIR A, Backup Data, Archive Data
Over the WAN: 1-5% of the data from each remote site (95-99% Bandwidth Reduction)
True DR; lowers WAN costs; improves SLAs.
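
One way to picture why deduplicated replication needs so little bandwidth: the source sends only the segments the destination does not already hold. The sketch below is a simplified, in-process model under stated assumptions, not the product's replication protocol. SHA-256 fingerprints, the `replicate` function, and the test data (100 segments of 1 KiB, 3 of them new) are all hypothetical and chosen so the result lands inside the slide's 1-5% range.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Content fingerprint used to recognize a segment (assumed: SHA-256)."""
    return hashlib.sha256(segment).hexdigest()


def replicate(source_segments, destination_store):
    """Send only segments whose fingerprint the destination lacks.

    destination_store maps fingerprint -> segment bytes and is updated in place.
    Returns (bytes_sent, logical_bytes).
    """
    sent = 0
    logical = 0
    for segment in source_segments:
        logical += len(segment)
        fp = fingerprint(segment)
        if fp not in destination_store:
            destination_store[fp] = segment
            sent += len(segment)
    return sent, logical


# Hypothetical data: 97 segments are already at the hub from an earlier backup.
hub = {}
baseline = [bytes([i]) * 1024 for i in range(97)]
replicate(baseline, hub)

# Tonight's backup repeats the baseline and adds 3 new segments.
tonight = baseline + [bytes([200 + i]) * 1024 for i in range(3)]
sent, logical = replicate(tonight, hub)

print(f"sent {sent} of {logical} bytes ({100 * sent / logical:.0f}%)")
```

In a real deployment the two sides would exchange fingerprints over the network before any segment data moves; here both sides live in one process to keep the example short.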
So … How does this work with VMware?
Backing Up VMware to Data Domain
“…is he still talking..?” - Summary Concepts
Data Domain enables NAS (CIFS, NFS), NDMP, and VTL backup targets for all virtualized applications
• Drops into existing enterprise backup architectures
• Works with virtualized and non-virtualized environments
• In 80/20 data centers, centralized capacity optimization provides a single instance store across all applications and systems, virtual or actual
Back up VMs to DDR with an agent, or at the service console level
• Choose to place an agent on critical VMs for file-level restore
• Choose to place an agent on the service console as well
• Back up all to the same DDRs and watch compression happen
Consolidated backups sent from proxy to DDR
• If you prefer an agent-free virtual machine…
Global Rule: all data is compared to all other data in the DDR
Replicate all or some to anywhere, whenever, and back
• DR, test, development, virtual application migration
Data Domain: Dedupe Simplified
Diagram: Clients, Server, Primary storage, Backup/media server, Onsite Retention Storage, WAN, Offsite Disaster Recovery Storage; Archive Application Server with 'Drag&Drop' Archiving; archive to tape as required. Stages: Backup, Retention/Restore, Replication, DR.
• High-speed, inline deduplication storage; disk target for nearline applications
• Any leading backup software, archive apps, or custom nearline use
• All data types: structured and file
• Any fabric: NFS / CIFS / NDMP via Ethernet, or VTL via Fibre Channel
• Disk storage: internal, or gateway to SAN array
• One dedupe infrastructure: remote office, datacenter with inline replication
Summary: Key Attributes
Easily Integrates with Existing Infrastructure
• No rip/replace
Retention: Deduplication for Nearline Applications
• Store more backups and archived data in smaller footprint
Recovery: Data Invulnerability Architecture
• Trust but verify – hope is not a strategy
Replication: WAN efficient
• True DR
• Lowers cost of WAN
• Improves SLAs
Summary: Simplifying Deduplication Storage
Lower TCO
• Much lower cost for disk-based retention
• Lower operational costs, smaller footprint
• Neutral to price of tape automation
• Low bandwidth for replication, DR
Faster
• Handles variable streams smoothly, unlike tape
• Better SLAs: random access to restores and archives
Secure
• Designed as store of last resort
• No tapes on a truck
Simple
• Set it and forget it
• Any backup or archive software, any storage fabric, all data types
Thank You