The HDF5-iRODS Module: A Data Grid System for Object Level Access
Peter Cao, The HDF Group ([email protected])
Michael Wan, San Diego Supercomputer Center ([email protected])
DATA CHALLENGES
• Simulations can generate very large and complex datasets.
• Researchers at different sites need fast access to both recent and historical data.
• Storage, networking, and compute platforms vary; some cannot handle full datasets.
• Frequently, only subsets of the data are of interest.
HDF5, iRODS, and the HDF5-iRODS module address these challenges.
HDFView APPLICATION
HDFView, a visual tool for browsing and editing HDF files, was extended to use the HDF5-iRODS module so that users can view HDF5 files stored remotely on the iRODS server.
PROJECT WEBSITE
http://www.hdfgroup.org/projects/irods
ACKNOWLEDGMENTS
This project was sponsored by CIP/NLADR, an NSF PACI Project in support of NCSA-SDSC collaboration, and managed by the CyberInfrastructure Partnership (CIP), a joint effort led by NCSA and SDSC. The work was carried out by The HDF Group and the SDSC SRB team. The ASC/Alliance Center for Astrophysical Thermonuclear Flashes at the University of Chicago provided the FLASH simulation data (HDF5 files) and other assistance. The islice tool was based on extract_slice_from_chkpnt, a slice tool previously developed by Paul Ricker (NCSA/UIUC).
iRODS
The Integrated Rule-Oriented Data System (iRODS) is a data grid system developed by the Data Intensive Cyber Environments (DICE) group. The most powerful feature of iRODS is the Distributed Rule Engine, which allows users to automate enforcement of management policies by applying iRODS Rules that control the execution of all data access and manipulation operations at distributed sites.
HDF5-iRODS MODULE
The HDF5-iRODS module components, together with HDF5 and iRODS, implement a client-server system that provides interactive and efficient access to HDF5 files managed by a remote iRODS server. Applications on the local machine use client functions to access specific data and metadata in HDF5 files stored remotely. Only the requested data and metadata are transferred to the local machine, not the entire file.
The HDF5-iRODS module includes two main parts: a set of HDF5 micro-services and a set of HDF5 objects. The HDF5 micro-services perform simple well-defined HDF5 tasks, such as open a file, read from a dataset, read group attributes, or close a file. The HDF5 objects, representing objects in HDF5 files, are used to specify requests from the client and to transfer results (data) from the server.
HDF5
Hierarchical Data Format Version 5 (HDF5) is a unique technology suite that makes it possible to manage extremely large and complex data collections. More than 600 organizations, over 200 types of applications, and millions of individuals are using HDF5. Terabytes of data are stored in HDF5 every day.
The HDF5 suite includes:
• A versatile data model
• A portable file format
• A library optimized for access time and storage space
• Tools and applications to manage, manipulate, view, and analyze data in HDF5 files
iRODS equips users to handle a full range of tasks:
• Manage distributed data
• Extract metadata
• Move data efficiently
• Share data securely
• Publish data in digital libraries
• Archive data for long-term preservation
islice APPLICATION
The islice tool uses the HDF5-iRODS module to extract a slice of data from a FLASH file stored remotely on the server. The slice of interest is transferred and stored on the local (client) system.
BENEFITS of the HDF5-iRODS MODULE
• Reduces storage needed on local machine. Terabytes of data reside remotely; only small subsets are staged locally.
• Facilitates data sharing. Scientists can easily access updated data after a new simulation run, as well as prior results. Clients do not require the HDF5 library; all HDF5 calls are handled by the iRODS server.
• Supports fast browsing of data objects. Users can examine the structure of a file without loading the data content.
• Provides rapid access to selected data content and metadata. By transferring only the selected data content and metadata, access time is reduced.
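The browsing benefit above can be illustrated with a minimal sketch. It is not the module's code: the class names `DatasetInfo` and `GroupInfo`, the function `list_structure`, and the example shapes are all hypothetical. The idea is only that a client can walk a tree of metadata records (names, shapes, sizes) without any array values being held or transferred.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class DatasetInfo:
    """Metadata only (hypothetical type): no array values are stored here."""
    shape: Tuple[int, ...]
    element_bytes: int  # size of one element in bytes

    def nbytes(self) -> int:
        # Size the data WOULD occupy if it were loaded.
        total = self.element_bytes
        for dim in self.shape:
            total *= dim
        return total


@dataclass
class GroupInfo:
    """A group holds named members, which are groups or dataset descriptions."""
    members: Dict[str, Union["GroupInfo", DatasetInfo]] = field(default_factory=dict)


def list_structure(group, prefix=""):
    """Return (path, description) pairs for every object below `group`."""
    rows = []
    for name, obj in sorted(group.members.items()):
        path = f"{prefix}/{name}"
        if isinstance(obj, GroupInfo):
            rows.append((path, "group"))
            rows.extend(list_structure(obj, path))
        else:
            rows.append((path, f"dataset shape={obj.shape} bytes={obj.nbytes()}"))
    return rows


# Illustrative tree with two groups and one data array; the shape is made up.
root = GroupInfo({
    "GroupA": GroupInfo({"data_array": DatasetInfo((1000, 1000), 8)}),
    "GroupB": GroupInfo(),
})

structure = list_structure(root)
for path, description in structure:
    print(path, description)
```

In this sketch the 8 MB array is described but never materialized, which is the property the browsing benefit relies on.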
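[Figure: Example HDF5 File Structure. Labels: Root (file entry point); Group A; Group B; Data array.]
[Figure: HDF5-iRODS client-server system. Labels: Application ("I need to see"); HDF5 object (H5Dataset); iRODS message (pack/unpack); HDF5 Library; HDF5 file.]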
The figure above depicts how the HDF5-iRODS client-server system works. A user on a local (client) machine requests a slice of data from an HDF5 file managed by a remote iRODS server. To serve this request, H5DATASET_OP_READ is set in an H5Dataset (HDF5 object). The HDF5 object is then packed into an iRODS message and sent to the iRODS server. The server unpacks the message and checks the rule engine for matches. It finds and executes the associated HDF5 micro-service, msiH5Dataset_read, which calls H5Dataset.read() to get the requested data from the HDF5 file. The HDF5 object, which contains the requested slice of data, is packed into an iRODS message and returned to the client, where it is unpacked and delivered to the application.
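The round trip described above can be mimicked in a few lines. This is a toy model under stated assumptions, not the module's implementation: JSON stands in for the iRODS message packing, a dictionary stands in for the rule engine lookup, an in-memory list stands in for the HDF5 file, and the `start`/`count` fields and the function names `pack`, `unpack`, `server_handle`, and `client_read` are hypothetical. Only the names H5DATASET_OP_READ, H5Dataset, and msiH5Dataset_read come from the text.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Operation name taken from the text; using a string value is an assumption.
H5DATASET_OP_READ = "H5DATASET_OP_READ"


@dataclass
class H5Dataset:
    """Toy stand-in for the HDF5 object that carries both request and result."""
    path: str
    op: str
    start: int   # hypothetical selection fields
    count: int
    data: Optional[List[float]] = None  # filled in by the server


def pack(obj: H5Dataset) -> bytes:
    """Serialize the object into a message (JSON here, as an assumption)."""
    return json.dumps(asdict(obj)).encode("utf-8")


def unpack(message: bytes) -> H5Dataset:
    return H5Dataset(**json.loads(message.decode("utf-8")))


# Server side: an in-memory list plays the role of the remote HDF5 file.
SERVER_FILE = {"/GroupA/data_array": [float(i) for i in range(1000)]}


def msi_h5dataset_read(obj: H5Dataset) -> H5Dataset:
    """Stand-in for msiH5Dataset_read: copy only the requested slice."""
    full = SERVER_FILE[obj.path]
    obj.data = full[obj.start:obj.start + obj.count]
    return obj


# A dictionary plays the role of the rule engine's match step.
RULES = {H5DATASET_OP_READ: msi_h5dataset_read}


def server_handle(message: bytes) -> bytes:
    obj = unpack(message)
    micro_service = RULES[obj.op]      # find the associated micro-service
    return pack(micro_service(obj))    # return the object with data attached


def client_read(path: str, start: int, count: int) -> List[float]:
    request = H5Dataset(path=path, op=H5DATASET_OP_READ, start=start, count=count)
    reply = server_handle(pack(request))  # stands in for the network hop
    return unpack(reply).data


slice_of_interest = client_read("/GroupA/data_array", 10, 5)
print(slice_of_interest)
```

The same object type travels in both directions, so the client needs no HDF5 library of its own; in this sketch only five of the thousand values cross the simulated network.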
FLASH simulation results
Full dataset: 20 GB; slice of interest: 16 MB
Data Transfer Time
Network bandwidth    Full dataset    Slice of interest
100 MB/sec           3.3 min         0.16 sec
10 MB/sec            33 min          1.6 sec
1 MB/sec             5.5 hours       16 sec
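The transfer times in the table follow from dividing data size by bandwidth. The short sketch below reproduces that arithmetic, assuming 1 GB = 1000 MB and ignoring latency and protocol overhead; the function names `transfer_seconds` and `human` are illustrative.

```python
def transfer_seconds(size_mb: float, bandwidth_mb_per_sec: float) -> float:
    """Ideal transfer time: size divided by bandwidth, no overhead."""
    return size_mb / bandwidth_mb_per_sec


def human(seconds: float) -> str:
    """Format a duration in the largest convenient unit."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} min"
    return f"{seconds:g} sec"


FULL_MB = 20_000   # 20 GB, assuming 1 GB = 1000 MB
SLICE_MB = 16

for bandwidth in (100, 10, 1):
    full = transfer_seconds(FULL_MB, bandwidth)
    part = transfer_seconds(SLICE_MB, bandwidth)
    print(f"{bandwidth:>3} MB/sec  full: {human(full):>10}  slice: {human(part):>9}")
```

At every bandwidth the slice moves 1250 times faster than the full dataset, since 20,000 MB / 16 MB = 1250. The computed value for the last row is about 5.6 hours; the table's 5.5 hours is presumably a truncated figure.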
[Figure labels, client-server diagram: HDF5 micro-services; Rule Engine; Client Interface; request/result.]