ARCHIVING FOR DATA PROTECTION IN THE
MODERN DATA CENTER
Tony Walker, Dell, Inc.
Molly Rector, Spectra Logic
Archiving for Data Protection for the Modern Data Center © 2010 Storage Networking Industry Association. All Rights Reserved.
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
• Any slide or slides used must be reproduced in their entirety without modification
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Agenda
• What archive is
• What archive is not
• Differences between archive and backup
• Challenges
• Why archive
• Recommendations
• Benefits
• Active Archives – definition and benefits
• Case studies of Active Archives
• Evolution of tape-based storage and role in archiving
Abstract
We’ll establish a standard definition of data archiving, the forces driving its adoption in enterprise environments, and the benefits to be achieved with an optimized approach. We’ll also review data lifecycle divisions and subcategories to understand where data should be stored (including storage tiers) and why (how to value data). By examining guidelines and policies for an effective archiving system, we’ll review how companies can optimize their storage network and reduce the bandwidth requirements needed for replication and disaster recovery – necessary elements to satisfy data integrity and regulatory compliance requirements. In addition, we’ll review how today’s active archive solutions enable affordable, online, long-term data retention through case studies from the healthcare, broadcast, finance, education and oil/gas industries. Learning about the software applications available to manage data archives will help companies facilitate user online search and access.
What archive is
• SNIA – A collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data.
• Archives are long term repositories for the storage of records. Electronic archives preserve the content, prevent or track alterations and control access to electronic records. (Source: Sedona Principles)
• Specialized repository (including the supporting processes, policies, hardware, and software) used to preserve information and data for the long term. (Source: Building a Terminology Bridge for Digital Information Retention and Preservation Practices)
What archive is not…
• NOT backup
• NOT data protection
• NOT user selected, but policy driven
• NOT “keep everything forever” – deletion policies are critical
• NOT storage tiering
  – Moving older data to lower cost tiers for storage reclamation
  – Storage that is physically partitioned into multiple distinct classes based on price, performance or other attributes. Data may be dynamically moved among classes in a tiered storage implementation based on access activity or other considerations. Source: SNIA
Archive vs. Backup
Archive
• Primary record of data
• Retention and access
• Long-term
• Data maintained for analysis, value generation, history, or compliance
• Online or nearline
• Not data protection

Backup
• Secondary copy of data
• Recovery
• Short-term
• Data typically overwritten periodically
• Not archive
Archive vs. Tiered Storage
Archive
• Retention for future reference or compliance
• Policies determined by regulations

Dynamic Tiered Storage
• Primary storage reclamation
• Policies driven by access patterns
Challenges – Tiered Storage
• Primary storage overflowing with aged data
• 95% of data growth is unstructured data which is rarely accessed after creation*
• Issues with existing data management policies sometimes compound the data growth problem
• Backups taking too long; backing up static data
• Reluctance to address the issue due to a lack of understanding of the value of the data
• Lack of visibility into access patterns
* IDC Multimedia White Paper, "As the Economy Contracts, the Digital Universe Expands," Sponsored by EMC, May 2009.
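One way to gain visibility into access patterns is to measure how much capacity is held by files that nobody has touched recently. The following is a minimal sketch, not a product feature: the function names `scan_tree` and `inert_share` and the 180-day threshold are illustrative assumptions, and the approach relies on file access times (`st_atime`), which some file systems are configured not to update.

```python
import os
import time


def scan_tree(root):
    """Yield (size_bytes, last_access_epoch) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield info.st_size, info.st_atime


def inert_share(records, now, idle_days=180):
    """Return the fraction of total bytes not accessed within idle_days.

    idle_days=180 is an arbitrary illustrative threshold, not a recommendation.
    """
    cutoff = now - idle_days * 86400
    total = 0
    inert = 0
    for size, last_access in records:
        total += size
        if last_access < cutoff:
            inert += size
    return inert / total if total else 0.0


if __name__ == "__main__":
    share = inert_share(scan_tree("."), now=time.time())
    print(f"{share:.0%} of bytes not accessed in the last 180 days")
```

Keeping the measurement (`inert_share`) separate from the file system walk (`scan_tree`) means the same calculation can be fed from another source of size and access-time data if one is available.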
Challenges – Archiving
• Regulatory requirements:
  – Retention
  – Disposal
  – Change history
• Important information at risk of being lost
  – Reference
• Perceived as difficult to implement
• No one cares – sometimes even inside IT
Why archive?
Determine retention requirements on creation. If archive, then why archive?
• Because you want to
• Because you have to
Retention Requirements
Source: 100 Year Archive Requirements Survey
Archive – Because you want to
• Data has some value to the business
• Requires policies and software to enable migration from the production environment into the archive environment
Archive – Because you have to
• Data retention is required for legal reasons
  – Legal
  – Compliance
  – Regulatory
Archive policies typically require legal approval
“Some” Legal Requirements
• FRCP – Federal Rules of Civil Procedure
  – Rule 16 – Pretrial Conferences
  – Rule 26 – General Provisions
  – Rule 33 – Interrogatories
  – Rule 34 – Production of information
  – Rule 37 – Failure to make disclosure or cooperate
• SEC 17 – specifies retention, authenticity, discovery, security, DR with penalties for noncompliance
• HIPAA – requires security, privacy, authenticity, retention with penalties for noncompliance
• SOX – penalties for knowingly altering, concealing, or destroying information relevant to an investigation
• Gramm-Leach-Bliley Act – protect PII confidentiality and authenticity with penalties for noncompliance
• CFR – requires preservation, authenticity, retention, protection, availability
Archive
[Figure labels: Tier 1, Tier 2, Tier 3; Performance; Cost; Archive Environment; DR Copy]
• Data is moved from the production environment into the archive environment. This movement removes data from the production environment.
• Archive occurs across all storage tiers
• The Archive environment provides long term data retention and disposition
• The Archive environment is not just another storage tier
Recommendations
• Archiving is not an isolated / stand-alone solution. Archiving should be part of a broader data management view.
• Do not implement an archive solution and simply move your data management problem from one place to another (e.g., “archive everything forever”)
• Establish retention periods and disposition policies for all business-important information at creation
• Classify information upon creation and save a copy of all required records in a proper preservation-class archive upon creation
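The recommendation to classify information and fix its retention period at creation can be sketched as a small policy table. This is an illustration only: the record classes, the retention periods, and the names `RETENTION_YEARS`, `ArchiveRecord`, and `is_due_for_disposition` are hypothetical, and real values would come from an organization's legal and compliance review.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy table: record class -> retention period in years.
# Real periods must come from legal/compliance review, not from this sketch.
RETENTION_YEARS = {
    "financial": 7,
    "project": 3,
    "transient": 0,  # no retention beyond the creation date
}


@dataclass(frozen=True)
class ArchiveRecord:
    name: str
    record_class: str
    created: date

    @property
    def dispose_after(self) -> date:
        """Disposition date, derived from the class assigned at creation."""
        target_year = self.created.year + RETENTION_YEARS[self.record_class]
        try:
            return self.created.replace(year=target_year)
        except ValueError:
            # Created on Feb 29 and the target year is not a leap year.
            return self.created.replace(year=target_year, day=28)


def is_due_for_disposition(record: ArchiveRecord, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today > record.dispose_after


invoice = ArchiveRecord("invoice-0001", "financial", date(2010, 3, 15))
print(invoice.dispose_after)                              # 2017-03-15
print(is_due_for_disposition(invoice, date(2015, 1, 1)))  # False
```

Because the disposition date is derived from the class assigned at creation, deletion becomes a policy decision rather than a per-file judgment, which is the opposite of "archive everything forever".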
Archive Benefits
• Meet compliance requirements
• Preserve business-important information
• Simplified data management
• Simplified backup/recovery operations
• Simplified disaster recovery operations
Storage Pyramid
Active Archive Approach
Extended File System
Active Archive Result:
Definitions
Active archive: A set of unstructured data, such as office files and documents, video/audio files, email PST files and CAD/CAM files, that contains production data, no matter how old or infrequently accessed, and that can be accessed online.
• Fueled initially by introduction of high density, lower power disk drives
• Momentum continued to build with release of power efficient disk arrays and high density, lower power disk drives.
• Next generation archives can leverage the latest in automated tape technologies offering high density, low power, cost effective archive storage
The Evolution of Active Archive
[Figure: timeline 2000, 2004, 2008, 2010. Labels: Host; Data Creation; Data Protection Application; Archive; Active Archive Application; Backup; Onsite/Offsite Disaster Recovery; Offsite Copy of Data; Active Archive of Primary Data Set; SCSI/FC; SATA/SSD]
Ineffective Storage Management
• Up to 70% of capacity of every disk drive installed today is misused
  – 40% of data inert
  – 15% allocated but unused
  – 10% orphan data
  – 5% contraband data
• Disk storage accounts for between 33 and 70 cents of every dollar spent on IT hardware: trend accelerating
Source: Making IT Matter (Chalfant/Toigo, 2009)
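The 70% figure is the sum of the four categories cited on this slide. A short check follows, which also applies the percentages to a hypothetical 100 TB of installed capacity (the 100 TB is an assumption for illustration, not a figure from the source):

```python
# Percentages as cited on the slide (Chalfant/Toigo, 2009).
misused_pct = {
    "inert": 40,
    "allocated but unused": 15,
    "orphan": 10,
    "contraband": 5,
}

total_pct = sum(misused_pct.values())
print(total_pct)  # 70

# Hypothetical installed capacity, for illustration only.
installed_tb = 100
for category, pct in misused_pct.items():
    print(f"{category}: {installed_tb * pct / 100:.0f} TB")
print(f"remaining: {installed_tb * (100 - total_pct) / 100:.0f} TB")
```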
Requirements for Active Storage
Tape technologies are reliable…
Reliability has increased 700% over the technology available a decade earlier
• Advances in the coating of tape film
• Read-after-write data verification
• Error correction codes
• Drive technology features simplified tape paths and servo tracking systems
Beech, Debbie. “Best Practices for backup and long-term data retention” Sylvatica Whitepaper. The evolving role of disk and tape in the data center. June 2009
…and even more reliable in intelligent libraries
Some libraries today have the intelligence to proactively alert when media, drive, or hardware issues are developing.
Gone are the days of not knowing if…
• A tape has been used beyond any manufacturer thresholds
• There has been environmental damage to media
• Failures are drive or media related
• Data is on the tape media
Tape storage offers high energy efficiency
[Chart labels: Tape Library; Disk System]
Clipper Group Study, “Tape and Disk Costs”
- Starting with 125 TB of data
- Assuming 14.5 cents per kWh, increasing by 10% per year
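The study's pricing assumption compounds year over year. As a sketch of what that implies, the following computes the price per kWh for each year. The five-year horizon, the function name `price_per_kwh`, and the convention that year 0 is the starting price are assumptions for illustration; the source gives no power-draw figures here, so no total energy cost is computed.

```python
def price_per_kwh(year, start_cents=14.5, annual_increase=0.10):
    """Electricity price in cents per kWh in a given year (year 0 = start)."""
    return start_cents * (1 + annual_increase) ** year


# Illustrative five-year horizon (an assumption, not from the study).
for year in range(5):
    print(f"year {year}: {price_per_kwh(year):.2f} cents/kWh")
```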
Tape libraries offer the greatest storage density
• Reclaim space: 44 – 218%
• Same capacity, smaller footprint
[Figure label: Low density disk array]
Tape Libraries are Fast…
…really fast!
• Write to tape: 121 TB/hr*
• Access from tape: 65-75 seconds avg**
* Based on benchmarked data; assumes 2:1 data compression
* Based on 120 LTO-5 Drives
** Times vary based on library and tape drive in use
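The write figure is consistent with simple arithmetic on the stated assumptions. LTO-5 has a native transfer rate of 140 MB/s per drive; with 2:1 compression and 120 drives, the aggregate works out to roughly 121 TB/hr. This is a back-of-the-envelope check using decimal units (1 TB = 1,000,000 MB), not a reproduction of the benchmark.

```python
LTO5_NATIVE_MB_PER_S = 140   # LTO-5 native drive transfer rate
COMPRESSION_RATIO = 2        # 2:1, as assumed on the slide
DRIVES = 120                 # as stated on the slide

# Aggregate compressed throughput across all drives.
mb_per_s = LTO5_NATIVE_MB_PER_S * COMPRESSION_RATIO * DRIVES
tb_per_hr = mb_per_s * 3600 / 1_000_000

print(f"{tb_per_hr:.2f} TB/hr")  # 120.96 TB/hr
```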
Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: [email protected]
Many thanks to the following individuals for their contributions to this tutorial.
- SNIA Education Committee
Tony Walker
Molly Rector