1© Copyright 2009 EMC Corporation. All rights reserved.
Information Archiving in the Public Sector
Jean-Pierre [email protected]
Agenda
Trends and drivers
Information archiving challenge
Archive storage options
EMC Solutions
– Architecture
– Improve operational efficiency
Summary
Top Business Imperatives in Public Sector
Improve delivery of citizen services
Create a more efficient government
Protect citizen information
Ensure government preparedness
Provide for public safety
The Digital Universe: 2009 Update
About the only growth rate that hasn’t gone negative since the recession began is the creation of new digital information
The Digital Universe continues to double every 18 months; it is safe to say that the challenges for enterprise IT will more than double every year
Source: IDC Digital Universe White Paper, Sponsored by EMC, May 2009
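The 18-month doubling period quoted above can be converted into an annual growth rate for comparison with yearly budget figures. The sketch below is only a compounding calculation; the function names are illustrative and do not come from the IDC paper.

```python
def annual_growth_rate(doubling_months: float) -> float:
    """Annual growth rate implied by a doubling period given in months."""
    return 2 ** (12 / doubling_months) - 1


def growth_factor(doubling_months: float, years: float) -> float:
    """Total growth multiple after the given number of years."""
    return 2 ** (12 * years / doubling_months)


if __name__ == "__main__":
    rate = annual_growth_rate(18)
    print(f"Implied annual growth: {rate:.1%}")
    print(f"Growth multiple over 4 years: {growth_factor(18, 4):.2f}x")
```

Doubling every 18 months works out to roughly 59% growth per year, assuming smooth compounding.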
“My information is growing >50% per year. But my IT budget is 14% less than last year.”
Information Storage Dilemma
Video Surveillance
Imaging
Call Center Voice Recording
Historical Data
Inventory Data
E-mail Data
Control Data
Real Time Data
Web Data
Financial Data
Employee Data
The Digital Universe, IT Budgets, and IT Staff: Growth Over Four Years
[Chart: CAGR (2008 through 2012) for the Digital Universe, WW IT Spending, and WW IT Staffing; axis from 0% to 80%]
Source: IDC Digital Universe White Paper, Sponsored by EMC, May 2009
Information Protection Challenges Continue
Performance
– Not meeting backup windows
– Cannot provide adequate restore levels
– Backup speed versus recovery SLAs
– Reliability of tape infrastructure
– Inability to back up remote offices
Costs
– Media purchases
– Offsite tape handling
– Tape and library upgrades
– Media migrations
– Requirements to keep more information longer
Management and security
– Losing offsite tape
– Constant troubleshooting
– eDiscovery for litigation
– Limited oversight at remote offices
As the Economy Contracts, the Digital Universe Expands!
3,892,179,868,480,350,000,000 more than prior 2008 projection.
Digital Universe expected to double in size every 18 months!
May 2009 Update
Growth of the “digital universe” to 988 exabytes by the year 2010 (IDC)
6x
March 2008
What data is out there?
Message Age Breakdown
50%-70% of E-mail Messages More than Four Months Old
File Age Breakdown
Over 60%-80% of Files More than Six Months Old
Exponential Growth in Backup & Archive
[Diagram: a 1 MB document is attached to a 1.1 MB e-mail (1.1 MB + 1.0 MB = 2.1 MB at the start) and sent to four colleagues. Each recipient holds a 1.1 MB e-mail with the document (2.2 MB per recipient), and the copies are captured again by e-mail server/desktop backup and by tape backup; intermediate totals shown are 4.2 MB and 8.8 MB.]
[Pie chart: Primary, Primary Copies, Backup, Backup Copies; values shown are 2%, 10%, 16%, and 72%.]
Source: IDC White Paper, "The Diverse and Exploding Digital Universe," Sponsored by EMC, March 2008
Current Methodology
Traditional production issues
– Data growth in tier 1 costly and difficult to manage
– Constant tuning to maintain performance
– Continually adding expensive tier 1 resources
Traditional backup issues
– Backup windows not being met, some jobs don’t finish
– Too many copies of same data in backup environment
– SLAs met through adding hardware and software
– Recoveries take too long, or fail altogether
– Too much time spent troubleshooting
Traditional archiving issues
– Duplicate data from backup environment saved in archive
– Accessing archived data takes too long and is difficult
– Humans handling tapes introduces risk
– No investment protection for long term retention requirements
A new approach is needed
[Diagram labels: Archive, Backup, Primary]
Backup and Archiving are Fundamentally Different and Complementary
Backup | Archive
Secondary copy of information | Primary copy of information
Used for recovery operations | Available for information retrieval
Improves availability by enabling application to be restored to a specific point in time | Adds operational efficiencies by moving fixed/unstructured data out of the operational environment
Typically short-term (weeks or months) | Typically long-term (months, years, even decades)
Data overwritten on periodic basis (monthly) | Data retained for analysis or compliance
Not useful for compliance | Useful for compliance
Backing up too much data
51% of open system data is unnecessary, duplicate, or non-business related*
70% of data on Windows has not been accessed for 90 days or more
55% of unplanned server outages occur from disk space consumption
*Source: SNIA
One 1 MB e-mail becomes 120 MB of copies
Initial e-mail attachment: 1 copy, 1 MB
Saved by 10 employees: 10 copies, 10 MB
Keep 3 months of full backups: 10 x 12 = 120 copies, 120 MB
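The copy arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch: the slide gives only the factor 12 for three months of full backups, so reading it as weekly fulls is an assumption, and the function name is illustrative.

```python
def backup_copies(recipients: int, size_mb: float, fulls_retained: int) -> tuple[int, float]:
    """Return (number of copies, total MB) held across all retained full backups.

    Each retained full backup captures every recipient's saved copy again.
    """
    copies = recipients * fulls_retained
    return copies, copies * size_mb


if __name__ == "__main__":
    # Assumption: 12 retained fulls, e.g. weekly full backups kept for about 3 months.
    copies, total_mb = backup_copies(recipients=10, size_mb=1.0, fulls_retained=12)
    print(f"{copies} copies, {total_mb:.0f} MB")
```

With the slide's numbers (10 employees, a 1 MB attachment, 12 retained fulls) this gives 120 copies and 120 MB.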
How Do You Manage Growth—By Adding Storage Capacity?
Larger mailboxes mean…
Slower performance
Increased administration overhead
Longer backup windows
More data to replicate
Longer restores
Duplicate copies
[Chart: Production capacity requirements. Capacity Requirements (TB), axis from 0 to 120, shown for Year 1, Year 3, and Year 5]
Source: EMC ROI/TCO Analyst estimates
How Do You Manage Growth—By Restricting Mailbox Sizes?
Strict mailbox quotas…
Delegate management to users!
Force the creation of personal archives (e.g., PST, Notes local archives)
Pose security and eDiscovery risks
[Diagram labels: User desktops, Mail servers, File servers, Enterprise SAN]
Archive First!
[Diagram: Primary Application, Archive Process, and Backup/Recovery Process, with numbered steps 1, 2, and 3]
Archive valuable information to tiered infrastructure
– Improves TCO through use of tiered storage
– Recovers capacity on primary resources
Backup to disk active production information
– Much less content to backup, greater likelihood of full backups
– Backup to disk offers large increase in performance, reliability
Retrieve from archive or recover from backup
– Backups focused on recovery, no longer force-fit for retention
– Archive information is now available for new business uses
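The archive-first split described above can be sketched as a simple age-based policy: inactive content goes to the archive tier, and only active content stays in the backup set. This is an illustration under stated assumptions, not how any EMC product selects data; the `FileInfo` type, the `split_archive_first` function, and the 90-day threshold (borrowed from the access statistic quoted earlier in the deck) are all hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    last_access: float  # Unix timestamp of the last access


def split_archive_first(
    files: list[FileInfo], now: float, max_idle_days: int = 90
) -> tuple[list[FileInfo], list[FileInfo]]:
    """Partition files into (to_archive, to_backup) by last-access age.

    Files idle for longer than max_idle_days are archive candidates;
    everything else remains active production data to be backed up.
    """
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    to_archive = [f for f in files if f.last_access < cutoff]
    to_backup = [f for f in files if f.last_access >= cutoff]
    return to_archive, to_backup


if __name__ == "__main__":
    now = 1_250_000_000.0  # fixed reference time so the example is repeatable
    files = [
        FileInfo("/data/old_report.doc", 1_000_000, now - 200 * SECONDS_PER_DAY),
        FileInfo("/data/current.xls", 500_000, now - 3 * SECONDS_PER_DAY),
    ]
    archive, backup = split_archive_first(files, now)
    print("archive:", [f.path for f in archive])
    print("backup: ", [f.path for f in backup])
```

A real policy would likely also weigh content type, retention rules, and business value rather than age alone.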
Archive First – The Effect
[Chart: storage growth in GB by year (2002 to 2007, axis from -500 to 3,000), comparing Current Storage Growth with BuRA Effect Storage Growth, each with a projection through 2010]
The Value: Do More With Less
What drives archiving?
Experience operational benefits
– Cost savings
– Streamline operational systems
– Improve backup/restore times
and ...
Sometimes you have to
– Legal reasons, compliance, corporate governance/audit
Sometimes you want to
– Keep valuable information for future reference
What should an ideal digital archive look like?
Never fills, ever
Self-manages and self-heals
Stores just one copy of any document
Finds what you want, when you want it, wherever you are
Helps you comply with regulatory needs
Continuously checks the integrity of data
Doesn’t need backup
Copies all documents to other site(s) over the network
Never loses anything. Doesn’t allow deletes until a specified date.
Cannot be broken into
Archiving Options
Tape and optical
– Too labor intensive (management and repair)
– Too many technology turns, migrations!
– Prone to media damage
– Slow response times
– Information is inaccessible when offsite
SAN/NAS ATA arrays
– Provides online access
– Too labor intensive, provisioning tasks!
– No intelligence to:
  – Simplify management
  – Reduce the quantity of storage consumed
  – Assure information authenticity
  – Improve TCO beyond acquisition price
EMC Centera for Archiving
Proven, “purpose-built” archiving platform
– >5,000 customers
– >11,000 systems shipped
– Integrated with >270 ISVs
– >370 PBs installed
Designed to store billions of data objects for decades
Guaranteed content authenticity and online access
Content-addressed storage
– Decouple ‘address’ of an item from the storage structure
Highly available, high performance
– Five-9s: no single point of failure
– Linear performance scaling
[Diagram labels: Centera 4-Node, Content Authenticity, Easy to Manage, Low TCO]
Centera stores data and metadata
Make the archive self-describing
Make the content self-identifying
Make information more portable
Excellent for long-term archiving
Centera Object-Based Storage: Enabling the Power of Metadata
No complex storage area networking management
No LUN/RAID Group carving or allocation
No file system management
Centera: Low TCO
Investment protection—multi-generation hardware support
One addressable pool—ingestion machine for content
Constant validation of content objects and structures
Only one copy of an object is stored
How EMC Centera Works: Application Example
Object is created and sent to application server
Application server sends object to EMC Centera over IP network
EMC Centera performs Content Address calculation and sends address back to application
Application stores Content Address for future reference
Content Address (CA)
– Digital fingerprint
– Globally unique
– Location-independent
[Diagram: objects pass over the LAN through the Content Address algorithm, which yields addresses such as 10001010 and 10111011]
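The store-and-return-address flow above can be illustrated with a toy content-addressed store. This is a sketch of the general technique only, not the Centera API: the slides do not name the Content Address algorithm, so SHA-256 is an assumption here, and the `ContentAddressedStore` class is hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by a digest of the content (assumed SHA-256)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def address_of(content: bytes) -> str:
        # The address depends only on the content, not on where it is stored.
        return hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        """Store content and return its content address to the caller."""
        address = self.address_of(content)
        # Identical content maps to the same address, so only one copy is kept.
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        """Return the content, re-checking it against its address first."""
        content = self._objects[address]
        if self.address_of(content) != address:
            raise ValueError("stored content no longer matches its address")
        return content

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ContentAddressedStore()
    first = store.put(b"scanned permit application")
    second = store.put(b"scanned permit application")  # duplicate submission
    print("same address:", first == second)
    print("objects stored:", len(store))
```

Because the address is derived from the content, duplicates collapse to a single stored object and any later change to the stored bytes is detectable on read.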
EMC SourceOne: Email Management Message Flow
Collect
• Real-time
• Scheduled
• PST/NSF collection
• User Directed Archiving
Organize and Classify
• Address Rules
• Retention and disposal policies
• Unique ID for single instancing
Archive Messages
• Better than 2:1 compression
• Create container files
Full Text Index
• Optional indexing of messages, attachments
• Includes embedded and Zip messages (20 layers deep)
• Ability to set indexing policy at folder level
Store Container Files
• Copy containers to Email Management server storage
• Write to archive storage device on set schedule
• Centera sends back digital fingerprint “content address”
Discovery Manager searches
• Search and collect content in archive for discovery searches
• Put on legal hold
User/Admin searches
• Web-based search to access archive
[Diagram labels: Messaging servers, Email Management Server(s), EMC Centera, EMC Celerra]
Standardization: What is XAM?
eXtensible Access Method (XAM) is:
An industry-standard interface definition (e.g., API)
Between “consumers” (application and management software) and “providers” (storage systems) of fixed content storage services
XAM specification is controlled by Storage Networking Industry Association (SNIA)
XAM feature highlights
XAM decouples the software application from the storage platform
– Applications write to XAM; don’t need knowledge of the storage device
An unlimited number of objects can be stored
– The object is independent of the platform
– Not subject to limitations of file systems
Metadata is bundled with objects
– The archive can be searched without involving applications
– Record retention and disposition for governance/compliance
ILM Framework for classification, policy, and implementation
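The decoupling idea can be sketched as an application written against an abstract provider interface, with metadata and a retention date bundled with each object. Everything named here (`FixedContentProvider`, `InMemoryProvider`, `archive_invoice`, and their method signatures) is hypothetical and is not the XAM API defined by SNIA; it only illustrates the consumer/provider split described above.

```python
import uuid
from abc import ABC, abstractmethod
from datetime import date


class FixedContentProvider(ABC):
    """Hypothetical storage-provider interface (NOT the real XAM API)."""

    @abstractmethod
    def store(self, data: bytes, metadata: dict[str, str], retain_until: date) -> str: ...

    @abstractmethod
    def search(self, key: str, value: str) -> list[str]: ...

    @abstractmethod
    def delete(self, object_id: str, today: date) -> None: ...


class InMemoryProvider(FixedContentProvider):
    """Stand-in provider; a real one would wrap an actual storage system."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, dict[str, str], date]] = {}

    def store(self, data: bytes, metadata: dict[str, str], retain_until: date) -> str:
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata), retain_until)
        return object_id

    def search(self, key: str, value: str) -> list[str]:
        # Metadata travels with the object, so the archive itself can be searched.
        return [oid for oid, (_, meta, _) in self._objects.items() if meta.get(key) == value]

    def delete(self, object_id: str, today: date) -> None:
        _, _, retain_until = self._objects[object_id]
        if today < retain_until:
            raise PermissionError(f"object is under retention until {retain_until}")
        del self._objects[object_id]


def archive_invoice(provider: FixedContentProvider, pdf: bytes) -> str:
    """Application code: depends only on the interface, not on the device."""
    return provider.store(pdf, {"type": "invoice"}, retain_until=date(2016, 12, 31))


if __name__ == "__main__":
    provider = InMemoryProvider()
    oid = archive_invoice(provider, b"%PDF-...")
    print("found:", provider.search("type", "invoice") == [oid])
```

Swapping `InMemoryProvider` for another implementation would leave `archive_invoice` unchanged, which is the point of writing the application to an interface rather than to a storage device.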
Remember: What drives archiving?
Experience operational benefits
– Cost savings
– Streamline operational systems
– Improve backup/restore times
and ...
Sometimes you have to
– Legal reasons, compliance, corporate governance/audit
Sometimes you want to
– Keep valuable information for future reference
Make sure you go for a solution that can be used in all use cases. Compliance archiving will be mandatory sooner or later!
Thank You