Toward a new HSM solution using GPFS/TSM/StoRM integration

Vladimir Sapunenko (INFN, CNAF)
Luca dell'Agnello (INFN, CNAF)
Daniele Gregori (INFN, CNAF)
Riccardo Zappi (INFN, CNAF)
Luca Magnoni (INFN, CNAF)
Elisabetta Ronchieri (INFN, CNAF)
Vincenzo Vagnoni (INFN, Bologna)
HEPiX 2008, Geneva, 07/05/2008
Storage classes @ CNAF

Implementation of the 3 storage classes needed for LHC:
- Disk0Tape1 (D0T1): CASTOR
  - Space managed by the system
  - Data migrated to tape and deleted from disk when the staging area is full
- Disk1Tape0 (D1T0): GPFS/StoRM (in production)
  - Space managed by the VO
- Disk1Tape1 (D1T1): CASTOR (production), GPFS/StoRM (production prototype for LHCb only)
  - Space managed by the VO (i.e. if the disk is full, the copy fails)
  - Large permanent disk buffer with a tape back-end and no garbage collection
Looking into an HSM solution based on StoRM/GPFS/TSM

Project developed as a collaboration between:
- the GPFS development team (US)
- the TSM HSM development team (Germany)
- the end users (INFN-CNAF)

The main idea is to combine the new features of GPFS (v3.2) and TSM (v5.5) with SRM (StoRM) to provide a transparent, Grid-friendly HSM solution.
- Information Lifecycle Management (ILM) is used to drive the movement of data between disks and tapes
- The interface between GPFS and TSM is on our shoulders

Improvements and development are needed from all sides
- Transparent recalls vs. massive (list-ordered, optimized) recalls
What we have now

- GPFS and TSM are widely used as separate products
- Both products have built-in functionality to implement backup and archiving from GPFS
- In GPFS v3.2 the concept of an "external storage pool" extends policy-driven ILM to tape storage
- Some groups in the HEP world are starting to investigate this solution or have expressed interest in starting
GPFS Approach: "External Pools"

- External pools are really interfaces to external storage managers, e.g. HPSS or TSM
- An external pool "rule" defines the script to call to migrate/recall/etc. files:
  RULE EXTERNAL POOL 'PoolName' EXEC 'InterfaceScript' [OPTS 'options']
- The GPFS policy engine builds candidate lists and passes them to the external pool scripts (a sketch of such a script follows below)
- The external storage manager actually moves the data
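For illustration, a minimal sketch of what such an interface script could look like. It assumes the usual GPFS external-pool calling convention (a command such as TEST, MIGRATE or RECALL followed by the name of a file list) and uses the standard TSM HSM client commands dsmmigrate/dsmrecall; the record parsing and error handling are illustrative, not the actual hsmControl implementation.

  #!/bin/bash
  # Sketch of a GPFS external-pool interface script.
  # Invoked by GPFS as: <script> <command> <filelist> [options]
  cmd=$1
  filelist=$2

  case "$cmd" in
    TEST)
      # GPFS probes the interface before using the pool; report success.
      exit 0 ;;
    MIGRATE)
      # Hand each candidate file to the TSM HSM client for migration to tape.
      # Each record ends with the pathname; keep only that part.
      while read -r entry; do
        path=${entry##*-- }
        dsmmigrate "$path" || echo "migration failed: $path" >&2
      done < "$filelist" ;;
    RECALL)
      while read -r entry; do
        path=${entry##*-- }
        dsmrecall "$path" || echo "recall failed: $path" >&2
      done < "$filelist" ;;
    *)
      echo "unsupported command: $cmd" >&2
      exit 1 ;;
  esac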
Storage class Disk1-Tape1

- The D1T1 prototype based on GPFS/TSM was tested for about two months
- Quite simple, as there is no competition between migration and recall
  - D1T1 requires that every file written to disk is copied to tape (and remains resident on disk)
  - Recalls are needed only in case of data loss (on disk)
  - Although the D1T1 is a living concept…
- Some adjustments were needed in StoRM
  - Basically to place a file on hold for migration until the write operation is completed (SRM "putDone" on the file); see the sketch below
- Definitely positive results of the test with the current testbed hardware
  - More tests at a larger scale are needed
  - A production model needs to be established
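As a purely illustrative sketch of such a hold mechanism (this is not the actual StoRM/GPFS interface): a pin marker file could be checked by the migration script before a file is handed to TSM. The PINPREFIX value anticipates the configuration file shown later; the marker convention, paths and dsmmigrate call are assumptions.

  # Illustrative only: skip files whose SRM putDone has not yet been issued.
  # Assumes a marker named "<PINPREFIX><filename>" exists next to the file
  # while the transfer is still in progress and is removed on putDone.
  PINPREFIX=.STORM_T1D1_

  is_on_hold() {
      local dir base
      dir=$(dirname "$1")
      base=$(basename "$1")
      [ -e "${dir}/${PINPREFIX}${base}" ]
  }

  while read -r path; do
      is_on_hold "$path" && continue   # still being written, leave it for the next scan
      dsmmigrate "$path"
  done < /tmp/candidate_files.txt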
Storage class Disk0-Tape1

- The prototype is ready and is being tested now
- More complicated logic is needed
  - Define the priority between reads and writes
    - For example, in the current version of CASTOR migration to tape has absolute priority
  - Logic for reordering recalls ("list-optimized recall"): by tape, and by file position within a tape (see the sketch after this list)
  - The logic is realized by means of special scripts
- The first tests are encouraging, even considering the complexity of the problem
- Modifications were requested in StoRM to implement the recall logic and file pinning for files in use
  - The identified solutions are simple and linear
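A minimal sketch of the reordering idea, assuming the recall candidates have already been annotated with the tape volume and the position of the file on that tape (how those annotations are obtained from TSM is not shown here):

  # recall_list.txt: one record per line, "tape position /path/to/file".
  # Group the recalls by tape and order them by position on tape, so that
  # each tape is mounted once and read as sequentially as possible.
  sort -k1,1 -k2,2n recall_list.txt |
  while read -r tape position path; do
      dsmrecall "$path"
  done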
GPFS+TSM tests

- So far we have performed full tests of a D1T1 solution (StoRM+GPFS+TSM), and the D0T1 implementation is being developed in close contact with the IBM GPFS and TSM developers
- The D1T1 is now entering its first production phase, being used by LHCb during this month's CCRC08
  - As is the D1T0, which is served by the same GPFS cluster but without migrations
- The GPFS/StoRM-based D1T0 has also already been in use by ATLAS since February
D1T0 and D1T1 @ CNAF using StoRM/GPFS/TSM

- 3 StoRM instances
- 3 major HEP experiments
- 2 storage classes
- 12 servers, 200 TB of disk space
- 3 LTO-2 tape drives
Hardware used for the test

- 40 TB GPFS file system (v3.2.0-3) served by 4 NSD I/O servers (the SAN devices are EMC CX3-80)
  - FC (4 Gbit/s) interconnection between the servers and the disk arrays
- TSM v5.5
- 2 servers (1 Gb Ethernet) as HSM front-ends, each one acting as:
  - GPFS client (reads and writes on the file system via LAN)
  - TSM client (reads and writes from/to tapes via FC)
- 3 LTO-2 tape drives
- The tape library (STK L5500) is shared between CASTOR and TSM
  - i.e. they work together with the same tape library
[Architecture diagram: LHCb D1T0 and D1T1 details. Components: 2 EMC CX3-80 controllers, 4 GPFS servers, 2 StoRM servers, 2 gridftp servers, 2 HSM front-end nodes (GPFS/TSM clients), 3 LTO-2 tape drives, 1 TSM server plus a backup TSM server with a mirrored DB. Interconnects: 1/10 Gbps Ethernet LAN, 2/4 Gbps FC SAN and FC TAN.]
How it works

- GPFS performs file system metadata scans according to the ILM policies specified by the administrators
  - The metadata scan is very fast (it is not a find…) and is used by GPFS to identify the files which need to be migrated to tape
- Once the list of files is obtained, it is passed to an external process which runs on the HSM nodes and actually performs the migration to TSM
  - This is in particular what we implemented (a sketch follows below)
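An illustrative sketch of the migration step on an HSM node, assuming the external process receives a plain list of pathnames from the GPFS policy scan. The stream size and thread count mirror the MIGRATESTREAMNUMFILES and MIGRATETHREADSMAX options of the configuration file shown later; the real hsmControl interface is more elaborate.

  candidates=$1           # file list produced by the GPFS metadata scan
  streamsize=30           # files per migration stream
  maxthreads=30           # concurrent migration streams per node

  # Split the candidate list into fixed-size streams and migrate each stream
  # with the TSM HSM client, running at most $maxthreads streams at a time.
  split -l "$streamsize" "$candidates" /tmp/stream.
  for stream in /tmp/stream.*; do
      ( xargs -a "$stream" -d '\n' dsmmigrate ) &
      while [ "$(jobs -rp | wc -l)" -ge "$maxthreads" ]; do sleep 1; done
  done
  wait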
- Note:
  - The GPFS file system and the HSM nodes are completely decoupled, in the sense that it is possible to shut down the HSM nodes without interrupting file system availability
  - All components of the system have intrinsic redundancy (GPFS failover mechanisms)
    - No need to put in place any kind of HA feature (apart from the unique TSM server)
Example of an ILM policy

/* Policy implementing T1D1 for LHCb:
   -) 1 GPFS storage pool
   -) 1 SRM space token: LHCb_M-DST
   -) 1 TSM management class
   -) 1 TSM storage pool */

/* Placement policy rules */
RULE 'DATA1' SET POOL 'data1' LIMIT (99)
RULE 'DATA2' SET POOL 'data2' LIMIT (99)
RULE 'DEFAULT' SET POOL 'system'

/* We have 1 space token: LHCb_M-DST. Define 1 external pool accordingly. */
RULE EXTERNAL POOL 'TAPE MIGRATION LHCb_M-DST'
  EXEC '/var/mmfs/etc/hsmControl' OPTS 'LHCb_M-DST'

/* Exclude from migration hidden directories (e.g. .SpaceMan),
   baby files, hidden and weird files. */
RULE 'exclude hidden directories' EXCLUDE WHERE PATH_NAME LIKE '%/.%'
RULE 'exclude hidden file' EXCLUDE WHERE NAME LIKE '.%'
RULE 'exclude empty files' EXCLUDE WHERE FILE_SIZE=0
RULE 'exclude baby files' EXCLUDE
  WHERE (CURRENT_TIMESTAMP-MODIFICATION_TIME)<INTERVAL '3' MINUTE
Example of an ILM policy (cont.)

/* Migrate to the external pool according to space token (i.e. fileset). */

RULE 'migrate from system to tape LHCb_M-DST'
  MIGRATE FROM POOL 'system' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data1 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data1' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data2 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data2' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')
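As a usage sketch, such a policy would typically be installed and exercised with the standard GPFS administration commands; the file system name and policy file path below are illustrative.

  # Install the rules as the active policy of the file system
  mmchpolicy gpfs_lhcb /var/mmfs/etc/ilm_lhcb.policy

  # Kick off a policy scan explicitly; with THRESHOLD(0,100,0) rules this hands
  # every eligible file of the LHCb_M-DST fileset to the external pool script
  mmapplypolicy gpfs_lhcb -P /var/mmfs/etc/ilm_lhcb.policy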
Example of configuration file

# HSM node list (comma separated)
HSMNODES=diskserv-san-14,diskserv-san-16

# system directory path
SVCFS=/storage/gpfs_lhcb/system

# filesystem scan minimum frequency (in sec)
SCANFREQUENCY=1800

# maximum time allowed for a migrate session (in sec)
MIGRATESESSIONTIMEOUT=4800

# maximum number of migrate threads per node
MIGRATETHREADSMAX=30

# number of files for each migrate stream
MIGRATESTREAMNUMFILES=30

# sleep time for lock file check loop
LOCKSLEEPTIME=2

# pin prefix
PINPREFIX=.STORM_T1D1_

# TSM admin user name
TSMID=xxxxx

# TSM admin user password
TSMPASS=xxxxx

# report period (in sec)
REPORTFREQUENCY=86400

# report email addresses (comma separated)
REPORTEMAILADDRESS=[email protected],[email protected],…

# alarm email addresses (comma separated)
ALARMEMAILADDRESS=[email protected]

# alarm email delay (in sec)
ALARMEMAILDELAY=7200
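Since the options use plain shell-variable syntax, the interface scripts can presumably just source the file; the path below is illustrative.

  # Load the HSM interface configuration into the current shell environment
  . /var/mmfs/etc/hsmControl.conf
  echo "scanning at most every ${SCANFREQUENCY}s on nodes: ${HSMNODES}"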
Example of a report

A first automatic reporting system has been implemented
----------------------------------------------------------------------
Start: Sun 04 May 2008 11:38:48 PM CEST
Stop:  Mon 05 May 2008 08:03:15 AM CEST    Seconds: 30267
----------------------------------------------------------------------
Tape     Files  Failures  File throughput  Total throughput
L00595       5         0  31.0798 MiB/s    0.702259 MiB/s
L00599      10         0  32.4747 MiB/s    1.41891 MiB/s
L00611      57         0  29.0862 MiB/s    6.59165 MiB/s
L00614      47         0  31.5084 MiB/s    6.61944 MiB/s
L00615      46         0  30.3926 MiB/s    6.57133 MiB/s
L00617      47         0  31.1735 MiB/s    6.5116 MiB/s
L00618      62         0  28.4119 MiB/s    6.06469 MiB/s
L00619      44         0  27.0226 MiB/s    4.10937 MiB/s
L00620      53         0  27.1009 MiB/s    7.13976 MiB/s
L00621      66         0  28.9043 MiB/s    6.67269 MiB/s
L00624      44         0  11.4347 MiB/s    5.82468 MiB/s
L00626      62         0  30.4792 MiB/s    6.53114 MiB/s
----------------------------------------------------------------------
Drive    Files  Failures  File throughput  Total throughput
DRIVE3     218         0  30.2628 MiB/s    25.7269 MiB/s
DRIVE4     197         0  29.5188 MiB/s    23.6487 MiB/s
DRIVE5     128         0  21.5395 MiB/s    15.3819 MiB/s
----------------------------------------------------------------------
Host             Files  Failures  File throughput  Total throughput
diskserv-san-14    285         0  29.9678 MiB/s    34.0331 MiB/s
diskserv-san-16    258         0  25.6928 MiB/s    30.7245 MiB/s
----------------------------------------------------------------------
         Files  Failures  File throughput  Total throughput
Total      543         0  27.9366 MiB/s    64.7575 MiB/s
----------------------------------------------------------------------
- The alarm part is still being developed
- An email with the reports is sent every day (the period is configurable in the option file)
Description of the tests

Test A
- Data transfer of LHCb files from CERN CASTOR-disk to CNAF StoRM/GPFS using the File Transfer Service (FTS)
- Automatic migration of the data files from GPFS to TSM while the data was being transferred by FTS
- This is a realistic scenario

Test B
- 1 GiB zero-filled files created locally on the GPFS file system with the migration turned off, then migrated to tape once the writes were finished (see the sketch below)
- The migration of zero-filled files to tape is faster due to compression, so it measures the physical limits of the system
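For reference, a simple way to produce such prefill files; the target path is illustrative, and the count matches the 1000 files of 1 GiB each used in the actual test as described later.

  # Create 1000 zero-filled 1 GiB files on the GPFS file system.
  # Writing them through dd ensures the blocks are really allocated on disk.
  for i in $(seq -w 1 1000); do
      dd if=/dev/zero of=/storage/gpfs_lhcb/testB/file_${i} bs=1M count=1024
  done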
Test C
- Similar to Test B, but with real LHCb data files instead of dummy zero-filled files
- Realistic scenario, e.g. when, because of maintenance, a long queue of files to be migrated accumulates in the file system
Test A: input files

- Most of the files are of 4 and 2 GiB size, with a few other sizes in addition
- The data files are LHCb stripped DSTs
- 2477 files, 8 TiB in total

[Plot: file size distribution]
Test A: results

[Plot: black curve — net data throughput from CERN to CNAF vs. time; red curve — net data throughput from GPFS to TSM. Annotations mark where FTS transfers were temporarily interrupted, where just two LTO-2 drives were in use, where a third LTO-2 drive was added, and where a drive was removed.]

- 8 TiB in total were transferred from CERN to tape in 150k seconds (almost 2 days)
- About 50 MiB/s to tape with two LTO-2 drives and 65 MiB/s with three LTO-2 drives
- Zero tape migration failures, zero retrials
Test A: results (II)

- Most of the files were migrated within less than 3 hours, with a tail up to 8 hours
  - The tail comes from the fact that at some point the CERN-to-CNAF throughput rose to 80 MiB/s, exceeding the maximum tape migration performance at that time, so GPFS/TSM accumulated a queue of files with respect to the FTS transfers

[Plot: retention time on disk (time from when a file is written until it is migrated to tape)]
Test A: results (III)

- The distribution peaks at about 33 MiB/s, which is the maximum the LTO-2 drives can sustain for LHCb data files
  - Due to compression, the actual performance depends on the content of the files…
- The tail is mostly due to the fact that some of the tapes showed much smaller throughputs
  - For this test we reused old tapes no longer used by CASTOR

[Plot: distribution of throughput per migration to tape]

- What is the secondary peak? It is due to files written at the end of a tape, which TSM splits onto a subsequent tape (i.e. it must dismount the tape and mount a new one to continue writing the file)
Intermezzo

- Between Test A and Test B we realized that the interface logic was not balancing the load perfectly between the two HSM nodes
- The logic of the interface was then slightly changed in order to improve the performance (a sketch of a simple balancing scheme follows)
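As an illustration of the kind of balancing involved (not the actual change that was made), a candidate list can simply be split into equal halves, one per HSM node; the temporary paths are illustrative, the node names are those of the testbed.

  # Illustrative only: distribute a migration candidate list evenly over the
  # two HSM nodes so that both keep their tape drives busy.
  candidates=$1
  split -n l/2 "$candidates" /tmp/hsmshare.     # two pieces, whole lines only
  scp /tmp/hsmshare.aa diskserv-san-14:/tmp/migrate.list
  scp /tmp/hsmshare.ab diskserv-san-16:/tmp/migrate.list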
Test B: results

- File system prefilled with 1000 files of 1 GiB each, all filled with zeroes
  - Migration to tape turned off while writing the data to disk
- Migration to tape turned on once the prefilling finished
- Hardware compression is very effective for such files
- About 100 MiB/s observed over 10k seconds
- What is this valley? It is explained with the next test, where the valleys are more visible

[Plot: net throughput to tape versus time]

- No tape migration failures and no retrials observed
Test C: results

- Similar to Test B, but with real LHCb data files taken from the same sample as Test A instead of zero-filled files
- The valleys, clearly visible here, have a period of exactly 4800 seconds
  - They were also partially present in Test A, but not clearly visible in that plot due to the larger binning
- The valleys are due to a tunable feature of our interface
  - Each migration session is timed out if not finished within 4800 seconds
  - After the timeout, GPFS performs a new metadata scan and a new migration session is initiated
  - 4800 seconds is not a magic number; it could be larger or even infinite

[Plot: net throughput to tape versus time]

- No tape migration failures and no retrials observed
- About 70 MiB/s on average, with peaks up to 90 MiB/s
Conclusions and outlook

- The first phase of tests of the T1D1 StoRM/GPFS/TSM-based solution has concluded
  - LHCb is now starting the first production experience with such a T1D1 system
- Work is ongoing on a T1D0 implementation in collaboration with the IBM GPFS and TSM HSM development teams
  - T1D0 is more complicated, since it should include active recall optimization, concurrency between migrations and recalls, etc.
  - IBM will introduce efficient ordered-recall features in the next major release of TSM
  - While waiting for that release, we are implementing it through an intermediate layer of intelligence between GPFS and TSM, driven by StoRM
  - A first proof-of-principle prototype already exists, but this is something to be discussed in a future talk… stay tuned!
- A new library has recently been acquired at CNAF
  - Once the new library is online and the old data files have been repacked to it, the old library will be devoted entirely to TSM production systems and testbeds
  - About 15 drives: a much more realistic and interesting scale than 3 drives