Tivoli Storage, IBM Software Group
Understanding Disk Storage in Tivoli Storage Manager
Dave Cannon, Tivoli Storage Manager Architect
Oxford University TSM Symposium, September 2007
© 2007 IBM Corporation

Disclaimer
This presentation describes potential future enhancements to the IBM Tivoli Storage Manager family of products
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only
Information in this presentation does not constitute a commitment to deliver the described enhancements or to do so in a particular timeframe
IBM reserves the right to change product plans, features, and delivery schedules according to business needs and requirements
This presentation uses the following designations regarding availability of potential product enhancements
– Planned 5.5: Planned for delivery in TSM v5.5 (2007)
– Next Release Candidate: Candidate for delivery in the next release after v5.5
– Future Candidate: Candidate for delivery in future release

Agenda
Background
Random- and sequential-access disk in TSM
Special considerations for sequential-access disk
Potential future enhancements for disk storage

Disk vs. Tape
Potential disk advantages
– Faster access by avoiding delays for tape mounts and positioning
– Reduced management cost (no tape handling)
– Avoidance of errors introduced by media handling
Potential tape advantages
– Removability/portability for offsite storage (disaster recovery)
– High-speed data transfer for large objects
– Cost effectiveness (especially for long-term, offsite archiving)
Tiered approach, with copies on offsite tape, exploits strengths of disk and tape

Industry Trend Toward Increasing Use of Disk
Lower cost of disk storage (SATA)
Promotion of disk-based appliances and solutions
Virtual tape library (VTL) products comprised of preconfigured disk systems that emulate tape
Disk-based technologies
– Replication
– Snapshots (point-in-time copies)
– Continuous data protection (CDP)
– Deduplication

TSM is Designed for Disk in a Storage Hierarchy
Disk has been an integral part of the TSM data storage hierarchy since 1993
Virtualization of disk volumes in a storage pool allows objects to be stored across multiple volumes and file systems
Policy-based provisioning of disk storage pool space and allocation of that space during store operations
Retention based on object-level policies rather than the tape used to store objects
Automatic, policy-based migration to tape or other media types in tiered hierarchy
Incremental backup of objects from primary disk pool to tape copy pool for availability or offsite vaulting
Objects automatically accessed in copy pool if not available in primary storage pool
[Figure: Storage Pool Hierarchy. The client stores data into the hierarchy; data moves between tiers by migration and is copied to the copy pool; the TSM database (DB) tracks the location of files in data storage.]

Disk Usage Trend in TSM
Traditional Disk Usage
– LAN-based data transfer between client and disk storage
– Data initially buffered on disk to allow concurrent client backups without tape delays
– Backup from disk to tape copy storage pool for availability and disaster recovery
– Most data migrated to tape within 24 hours
Emerging Disk Usage
– Growing interest in LAN-free transfer between client and disk
– Data stored on disk to allow concurrent client backups without tape delays
– Backup from disk to tape copy storage pool (may be principal use for tape)
– Data may be stored on disk indefinitely for faster access

A Detour on File Aggregation
TSM server groups client objects into aggregates during backup or archive
Information about individual client objects is maintained and used for certain operations (e.g., deletion, retrieval)
For internal data transfer operations (migration, storage pool backup), entire aggregate is processed as a single entity for greatly improved performance
Over time, wasted space accumulates as logical files are deleted
[Figure: Physical file (non-aggregate) with logical file a; physical file (aggregate) with logical files b - f. Aggregate reconstruction rewrites an aggregate without the space left by deleted logical files.]
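The aggregation behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not TSM's implementation: the `Aggregate` class, its method names, and the 100-byte file sizes are all assumptions made for the example.

```python
class Aggregate:
    """Hypothetical model of one physical file that holds several logical files."""

    def __init__(self):
        # Each entry is [name, size_in_bytes, is_live].
        self.entries = []

    def add(self, name, size):
        self.entries.append([name, size, True])

    def delete(self, name):
        # Deleting a logical file only marks it; its space stays in the aggregate.
        for entry in self.entries:
            if entry[0] == name:
                entry[2] = False

    def physical_size(self):
        return sum(size for _, size, _ in self.entries)

    def wasted_space(self):
        return sum(size for _, size, live in self.entries if not live)

    def reconstruct(self):
        """Return a new aggregate that contains only the live logical files."""
        rebuilt = Aggregate()
        for name, size, live in self.entries:
            if live:
                rebuilt.add(name, size)
        return rebuilt


# Aggregate with logical files b - f, then d and e are deleted.
agg = Aggregate()
for name in "bcdef":
    agg.add(name, 100)
agg.delete("d")
agg.delete("e")
rebuilt = agg.reconstruct()
# agg.physical_size() is 500 with 200 wasted; rebuilt.physical_size() is 300.
```

The point of the sketch is that deletion alone never shrinks the physical file; only reconstruction does.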

Overview of Random- and Sequential-Access Disk
TSM supports two methods for storing and accessing data on magnetic disk
– Random-access storage pools (also known as DISK pools)
– Sequential-access storage pools (also known as FILE pools)
Random- and sequential-access disk pools differ in how TSM manages disk storage and the operations that are supported
TSM development views sequential-access disk as strategic
– Current functions on random-access disk supported for the foreseeable future
– Future product enhancements involving disk storage may be offered only for sequential-access disk

Basics of Random- and Sequential-Access Disk
                              Random-Access Disk                      Sequential-Access Disk
Storage pool definition       Predefined device class DISK            Device class with device type of FILE
Pools spanning file systems   Supported                               Supported
Storage pool volumes          Files or raw logical volumes            Files
Volume creation               Define Volume command, space trigger    Define Volume command, space trigger, scratch volumes
TSM caching                   Supported                               Not supported
Collocation                   Not applicable                          Collocation by filespace, node or group of nodes
Use for copy storage pool     Not supported                           Supported

Space Allocation and Tracking
Random Access
– Space allocated in randomly located 4KB blocks
– TSM server tracks volumes and blocks on which each file is stored
– Bit vector in TSM database tracks allocated and free blocks for each volume
– Space allocation and tracking requires overhead
– May not scale well for extremely large files
Sequential Access +
– Files written sequentially in FILE volume
– TSM database only tracks volume and offset at which each file is stored
– TSM has less overhead for space allocation and tracking
[Figure: Random access: the database holds the location of blocks for file a and a bit vector (011010010010) for the storage pool volumes. Sequential access: the database holds the location and length of file a within a storage pool volume.]
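The difference in tracking overhead can be illustrated with a small Python sketch. Both classes are hypothetical models rather than TSM data structures, and the random-access model simply takes the first free blocks instead of randomly located ones, which is enough to show how much must be recorded per file.

```python
BLOCK_SIZE = 4096  # 4KB blocks, as stated on the slide


class RandomAccessVolume:
    """Hypothetical model: a bit vector marks allocated blocks."""

    def __init__(self, num_blocks):
        self.allocated = [False] * num_blocks  # the "bit vector"
        self.file_blocks = {}                  # file name -> list of block numbers

    def store(self, name, size):
        needed = -(-size // BLOCK_SIZE)  # ceiling division
        free = [i for i, used in enumerate(self.allocated) if not used]
        if len(free) < needed:
            raise ValueError("not enough free blocks")
        blocks = free[:needed]
        for i in blocks:
            self.allocated[i] = True
        self.file_blocks[name] = blocks  # one tracked block number per 4KB
        return blocks


class SequentialAccessVolume:
    """Hypothetical model: files are appended; only offset and length are kept."""

    def __init__(self):
        self.next_offset = 0
        self.file_extent = {}  # file name -> (offset, length)

    def store(self, name, size):
        extent = (self.next_offset, size)
        self.file_extent[name] = extent
        self.next_offset += size
        return extent


random_vol = RandomAccessVolume(num_blocks=12)
blocks_a = random_vol.store("a", 10000)   # needs 3 blocks of 4KB

seq_vol = SequentialAccessVolume()
extent_a = seq_vol.store("a", 10000)      # a single (offset, length) pair
extent_b = seq_vol.store("b", 500)
```

In the random-access model the tracked data grows with file size, while the sequential model records a fixed-size entry per file, which is the scaling concern the slide raises for extremely large files.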

Concurrent Volume Access
Random Access +
– Multiple TSM sessions or processes can concurrently use the same disk volume
– However, individual I/O operations for each volume are serialized
Sequential Access (based on 5.4 behavior)
– Disk volume is locked by a single process or session using that volume
– Other operations cannot access the volume until the lock is released, usually when the locking operation has completed all work on the volume
– Concurrent access (multiple read operations, one write operation) planned for 5.5
To avoid volume contention, smaller volume sizes should be used for sequential-access disk as compared to random-access disk
[Figure: Backup storage pool, Restore Node 1, Restore Node 2, and Backup Node 3 operations accessing a disk volume under each access method.]

LAN-free Backup/Restore
Random Access
– Not supported
Sequential Access +
– Supported using SANergy to control shared access to sequential disk volumes
– Reduces CPU cycles on TSM server and moves network traffic from LAN to SAN
Alternative approach for LAN-free to disk would be a virtual tape library (VTL) appliance
[Figure: Client / Storage Agent and Server connected by LAN and SAN, showing the control and data flow paths.]

Multi-Session Restore
Random Access
– Multi-session restore allows only one session for all random-access disk volumes
Sequential Access +
– Multi-session restore allows one session per sequential-access volume
Multi-session restore is performed only for no-query restore (NQR) operations
[Figure: Random access: a single session (Session 1) between Server and Client. Sequential access: Session 1 and Session 2 between Server and Client / Storage Agent.]

Migration
Random Access
– High/low migration thresholds based on percentage occupancy of the pool
– If node is grouped and target pool is collocated by group, parallel migration processes each work on a different group
– Otherwise, parallel migration processes each work on a different node
– Optimized for transfer by node and file space, making it an ideal intermediate buffer for transfer from non-collocated tape to collocated tape (e.g., restore from copy pool to collocated tape pool)
Sequential Access
– High/low migration thresholds based on percentage of volumes containing data (behavior change planned for 5.5)
– Parallel migration processes each work on a different source volume, possibly dividing work more evenly among processes
– Collocated sequential disk can be used as a buffer for transfer from non-collocated tape to collocated tape

Migration Thresholds Example
Sequential-Access Disk Today
– Migration from sequential-access disk is based on tape paradigm
– Migration begins when percentage of volumes containing data reaches the high migration threshold
– Example
  - High migration threshold is 80%
  - 5 volumes in pool, each 30% occupied
  - Percent migratable is 100%
  - Migration begins even though pool is only 30% occupied
Enhanced Sequential-Access Disk
– Migration thresholds for sequential-access disk similar to random-access disk
– Migration begins when percentage occupancy for the entire pool reaches the high migration threshold
– Example
  - High migration threshold is 80%
  - 5 volumes in pool, each 30% occupied
  - Percent migratable is 30%
  - Migration does not begin until the entire pool is 80% occupied
More data stored on sequential-access disk before migration
Planned 5.5
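The two threshold rules in this example can be worked through in Python. The function names are made up for illustration, and the pool-occupancy calculation assumes all volumes are the same size.

```python
def pct_volumes_with_data(occupancies):
    """Rule today: percentage of volumes that contain any data."""
    used = sum(1 for occ in occupancies if occ > 0)
    return 100.0 * used / len(occupancies)


def pct_pool_occupancy(occupancies):
    """Enhanced rule: occupancy of the whole pool (equal-size volumes assumed)."""
    return sum(occupancies) / len(occupancies)


def migration_starts(pct_migratable, high_threshold):
    return pct_migratable >= high_threshold


HIGH_MIG = 80                     # high migration threshold, in percent
volumes = [30, 30, 30, 30, 30]    # 5 volumes, each 30% occupied

today = pct_volumes_with_data(volumes)    # 100.0
enhanced = pct_pool_occupancy(volumes)    # 30.0

starts_today = migration_starts(today, HIGH_MIG)        # True
starts_enhanced = migration_starts(enhanced, HIGH_MIG)  # False
```

With the same five volumes, the volume-based rule triggers migration immediately, while the occupancy-based rule waits until the pool as a whole reaches 80%.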

Storage Pool Backup
Random Access
– If node is grouped and target pool is collocated by group, parallel backup processes each work on a different group
– Otherwise, parallel backup processes each work on a different node
– Each physical file (aggregate or non-aggregated file) must be checked during every storage pool backup
Sequential Access +
– Parallel backup processes each work on a different source volume
– Optimization: For each primary pool volume and copy pool, database stores offset of volume that has already been backed up (no need to recheck during each backup)
– Optimization can be especially important for long-term storage of data on disk
[Figure: Sequential-access volume containing files A, B, and C, with a marker showing that the volume is backed up to this point.]
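The offset optimization for sequential-access volumes might look roughly like the following Python sketch. The function, its arguments, and the file sizes are hypothetical; the sketch assumes files are only ever appended to the volume and ignores deletions.

```python
def files_to_back_up(volume_files, backed_up_offset):
    """Return the files not yet backed up, plus the new backed-up offset.

    volume_files: list of (name, offset, length) in the order written.
    backed_up_offset: end of the region already copied to the copy pool.
    """
    # Only files written at or beyond the recorded offset need to be examined.
    pending = [f for f in volume_files if f[1] >= backed_up_offset]
    new_offset = backed_up_offset
    if pending:
        _, offset, length = pending[-1]
        new_offset = offset + length
    return pending, new_offset


volume = [("A", 0, 100), ("B", 100, 50), ("C", 150, 200)]

# A and B were copied during an earlier backup, so the stored offset is 150.
pending, offset = files_to_back_up(volume, backed_up_offset=150)
# pending is [("C", 150, 200)] and offset is 350

# A second run with nothing new has no files to check.
pending_again, offset_again = files_to_back_up(volume, backed_up_offset=offset)
```

On random-access disk there is no such offset, so every physical file would be checked on every run.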

Space Recovery
Random Access
– When physical file is moved to another pool (if caching not enabled)
– Space occupied by cached data is recovered as needed
– When physical file is deleted (for aggregates, all files in aggregate must be deleted)
– No reconstruction of empty space within aggregates, a disadvantage if aggregated files are stored for long periods of time
Sequential Access +
– Space is not immediately recovered after data movement or deletion, but is recovered via reclamation
– During reclamation processing, aggregates are reconstructed to recover space occupied by deleted files
[Figure: Random access: empty space accumulates until entire aggregate is deleted. Sequential access: aggregate reconstruction removes the empty space left by deleted files.]

Fragmentation
Random-Access Disk / Sequential-Access Disk +
Aggregate fragmentation caused by expiration of files
– Random-access disk: Empty space accumulates in aggregates until all logical files in aggregate are deleted. May result in wasted space for long-term storage on disk.
– Sequential-access disk: Empty space is recovered by aggregate reconstruction during reclamation.
Fragmentation of space within TSM volumes caused by deletion of physical files
– Random-access disk: Volume fragmentation can occur due to allocation of multiple extents if client size estimate is too low. Fragmentation can degrade performance, but is relieved by migration if no TSM caching.
– Sequential-access disk: Deletion of physical files results in empty space within volumes, but this is recovered during reclamation.
File system fragmentation leading to fragmentation of files that constitute TSM volumes
– Random-access disk: Fragmentation is usually minimal because volumes are predefined or created by space trigger.
– Sequential-access disk: Use of scratch volumes causes fragmentation because volumes are extended as needed. Fragmentation can be avoided either by predefining volumes or using space trigger.

Database Regression
1. Database backup with references to files a,b,c
2. Files a,b,c deleted and space overwritten by d,e,f
3. Database restored with invalid references to a,b,c
Random Access
– After database regression, all volumes must be audited
– This may be time-consuming for large DISK pools (for example, pools used for long-term data storage)
Sequential Access +
– After database regression, audit only volumes that were reused or deleted after database backup OR
– With REUSEDELAY set, volume audit can be avoided completely
– Time delays for volume audits during critical recovery operations can be minimized or eliminated
[Figure: Database, database backup, and disk storage pool volume at each of the three steps: the volume first holds a, b, c; then d, e, f; after the restore the database again references a, b, c while the volume still holds d, e, f.]
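The audit rule after a database regression can be expressed as a short Python sketch. The function and the integer timestamps are purely illustrative assumptions, not a TSM interface.

```python
def volumes_to_audit(volumes, db_backup_time, sequential):
    """Pick the volumes that need an audit after restoring an older database.

    volumes: dict of volume name -> time the volume was last reused or
             deleted (None if it has not been reused or deleted).
    db_backup_time: time at which the restored database backup was taken.
    """
    if not sequential:
        # Random access: every volume must be audited.
        return sorted(volumes)
    # Sequential access: only volumes reused or deleted after the backup.
    return sorted(
        name for name, changed in volumes.items()
        if changed is not None and changed > db_backup_time
    )


# Hypothetical timestamps; the restored database backup was taken at time 10.
vols = {"V1": None, "V2": 5, "V3": 12}
seq_audit = volumes_to_audit(vols, db_backup_time=10, sequential=True)    # ["V3"]
rand_audit = volumes_to_audit(vols, db_backup_time=10, sequential=False)  # all three
```

If REUSEDELAY keeps emptied volumes from being reused or deleted for at least as long as the restored backup is old (an assumption about how the delay is configured), no volume falls after the backup time and the sequential-access list is empty.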

Shredding of Data Stored on Disk
Random Access +
– Disk storage pools can be designated as “shreddable”
– When a data object is moved or deleted from a shreddable pool, TSM server overwrites the object
– Sensitive data objects are destroyed when deleted/moved, preventing undesirable data discovery
Sequential Access
– Not supported (future candidate)
[Figure: Shreddable storage pool before and after object “a” is deleted/moved. Before: database references to object “a”. After: database references deleted and object “a” overwritten.]

Active Data Pools
Active data pool on disk for fast restore
Storage hierarchy contains active and inactive data
Active/inactive data in copy pool for disaster recovery
Random Access
– Not supported
Sequential Access +
– Typical restores require active data only
– Benefits of active data storage pools
  - Optimized access to active versions for fast restore
  - Reduced size of disk pools if only active versions are stored
  - Avoids data movement to disk in preparation for restore of active data

More on Active Data Pools
Data copied to active data pool using Copy Activedata command
Data copied to active data pool using simultaneous write
Reclamation of active data pool recovers space used by inactive and deleted files
Storage pool restore from active data pool allows restore of active data only
[Figure: Four panels. Copy Activedata from the storage hierarchy to the active data pool; simultaneous write from the client to the storage hierarchy and the active data pool; reclamation of an active data pool holding active, inactive, and expired files; active data pool (active files) alongside the active/inactive copy pool (all files).]

Active Data Pools: Example
1. Client backs up A0, B0, C0, D0 to primary pool with simultaneous write to active data pool.
2. Client backs up B1, E1 with simultaneous write to active data pool. B0 deactivated.
3. Reclamation removes inactive B0 from active data pool.
4. Client restores active files A0, C0, D0, B1, and E1 from active data pool.
[Figure: Client, server, active data pool, and active/inactive primary pool at each step. After step 2 both pools hold A0, B0, C0, D0, B1, E1; after step 3 the active data pool holds A0, C0, D0, B1, E1 while the primary pool still holds all six versions.]
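The four steps of this example can be replayed with a small Python model. The `Pools` class is a hypothetical illustration (the first character of each version name is taken as the file name), not a representation of TSM internals.

```python
class Pools:
    """Hypothetical model of a primary pool plus an active data pool."""

    def __init__(self):
        self.primary = []       # every version ever backed up, e.g. "B0", "B1"
        self.active_pool = []   # versions received through simultaneous write
        self.active = {}        # file name -> currently active version

    def backup(self, versions):
        for version in versions:
            self.primary.append(version)
            self.active_pool.append(version)   # simultaneous write
            # A newer version of the same file deactivates the older one.
            self.active[version[0]] = version

    def reclaim_active_pool(self):
        live = set(self.active.values())
        self.active_pool = [v for v in self.active_pool if v in live]

    def restore_active(self):
        live = set(self.active.values())
        return [v for v in self.active_pool if v in live]


pools = Pools()
pools.backup(["A0", "B0", "C0", "D0"])   # step 1
pools.backup(["B1", "E1"])               # step 2: B0 becomes inactive
before_reclaim = list(pools.active_pool)
pools.reclaim_active_pool()              # step 3: B0 leaves the active data pool
restored = pools.restore_active()        # step 4
# restored is ["A0", "C0", "D0", "B1", "E1"]; pools.primary still holds B0.
```

The primary pool keeps both B0 and B1 throughout, which is why it remains the source for restores of inactive versions.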

Random vs. Sequential Disk: Which is Best?
Disk Storage Usage / Recommendation
Traditional disk usage (LAN-based storage to disk, daily migration from disk to tape)
– Either random or sequential, depending on requirements
Long-term storage of data on disk
– Sequential offers significant advantages
  - Reconstruction recovers space in aggregates
  - Optimized storage pool backup
  - Reduced volume fragmentation
  - Multi-session restore
  - Avoidance of volume audit
Exploitation of new disk storage features
– Sequential-access disk may be required
LAN-free data transfer between client and disk storage
– Sequential or VTL
Data shredding
– Random

Optimizing Space Efficiency
Scratch volumes
– Volumes are created and extended only as needed
– Space is conserved at the expense of file-system fragmentation
Non-scratch volumes (created by Define Volume command or space trigger)
– No collocation is more space-efficient
– Smaller volumes are more space-efficient
Reclamation should be performed regularly to recover space
– Efficient because no mount/dismount
– Many volumes can be reclaimed concurrently
[Figure: Space Efficiency. Volume size plotted against collocation (None, Group, Node, Filespace), with an arrow marking the direction of increased space efficiency.]

Reducing Volume Contention
High-granularity collocation (by node or filespace) reduces contention
Smaller volume sizes reduce contention
Volume contention greatly reduced with introduction of concurrent volume access in v5.5
[Figure: Minimizing Volume Contention. Volume size plotted against collocation (None, Group, Node, Filespace), with an arrow marking the direction of decreased volume contention.]

Improving Client Restore Performance
For no-query restore operations (used for most large restores of file data), database scanning is greatly reduced if data is well collocated
Multi-session restore operations achieve greater parallelism if data is spread over multiple sequential-access volumes, indicating that parallelism may be increased by
– Lower-granularity collocation
– Smaller volumes
[Figure: Two charts of volume size against collocation (None, Group, Node, Filespace). NQR Database Processing: arrow marking the direction of reduced DB processing. Multi-session Parallelism: arrow marking the direction of increased parallelism.]

Avoiding Fragmentation
Perform reclamation regularly
Avoid use of scratch volumes
Predefine volumes using Define Volume command
Use space trigger to provision additional volumes as needed

Striking a Balance
Configuration of sequential-access pools involves tradeoffs, but the following may be a reasonable starting point for most environments
Define volumes and use space triggers for additional volume provisioning
Collocate by node or group of nodes
Use volume size scaled to the size of stored objects
– For file systems, volume size of 2 GB to 10 GB
– For databases and other large objects, volume size of 100 GB
Set reclamation threshold at 20-60% and allow multiple reclamation processes
Consider use of active data pools to achieve fast restore for active data while reducing disk storage requirements

Potential Future Enhancements for Disk Storage
Enhancements specifically for sequential-access disk pools
– Migration thresholds based on percentage occupancy rather than volumes containing data (planned 5.5)
– Concurrent access for volumes (planned 5.5)
– Performance improvements for sequential-access disk on z/OS server (next release candidate)
LAN-free to sequential-access disk volumes in GPFS (next release candidate)
Data deduplication (next release candidate)
Data shredding for sequential-access disk (future candidate)
Improvements to snapshot support (5.5, next release candidate, future candidate)
Additional exploitation of continuous data protection (CDP) technology (future candidate)

Summary
Trend toward increasing use of disk for long-term data storage in the TSM hierarchy
TSM supports both random- and sequential-access disk, which differ in how disk is managed and operations supported
Sequential-access disk is considered strategic and continues to be enhanced