PETRAIII/EuXFEL data archiving
Sergey Yakubov, Martin Gasthuber (@desy.de) / DESY-IT
Geneva, June 5, 2019
DESY Campus Hamburg – many more communities
[campus map: PETRA III – synchrotron radiation source (highest brilliance); FLASH I+II – VUV & soft X-ray free-electron laser; European XFEL – X-ray free-electron laser for atomic structure & fs dynamics of complex matter; neighbouring institutes: MPI-SD, CHyN, HARBOR, CXNS, NanoLab, CWS]
sources of data
• 3 active accelerators on-site (all photon science) – PETRA III, FLASH and EuXFEL
• currently 30 active experimental areas (called beamlines) - operated in parallel
• more in preparation
• PETRA IV (future) – expect 10^4–10^5 times more (raw) data – not all to be stored
• FLASH21+
• the majority of generated data is analyzed within a few months (it ‘cools’ afterwards)
• keep two independent copies asap (raw & calibration data, e.g. for EuXFEL)
DESY datacenter - resources interacting with ARCHIVER
data processing resources before archiving
• HPC cluster – 500 nodes, 30,000 cores, large InfiniBand fabric (growing)
• GPFS – 30 building blocks, 30PB, all InfiniBand connected (growing)
• BeeGFS - 3PB, InfiniBand connected
• LHC computing - Analysis Facility + Tier-2, 1000 nodes, 30,000 cores (growing)
• ~40% more resources outside the datacenter (mostly at experimental areas)
current archiving capabilities
• dCache - 6 large instances, 35PB capacity, >120 building blocks, Tape gateway
• Tape – 2 x SL8500 (15,000 slots), 25 x LTO8, 8 x LTO6, >80PB capacity
data life cycle as of today - from the cradle to the grave
• new archive service connected to ‘Core-FS’ and/or placed after dCache, to fit seamlessly into the existing workflow
• this scenario will most likely use the fully automated (API/CLI) archive system interface
PETRAIII/EuXFEL data archiving
• end user workflows (3)
  • scientific data and user
• admin workflow
  • service integration & planning
  • configuration based on site+community data policy and contracts between site and community
● SIP == DIP (the AIP should allow using sequential media efficiently)
● Archival Storage – this is where the ‘hybrid’ comes in (sketched below):
  ○ replication (horizontal)
  ○ multi-tiering (vertical) – similar to HSMs
  ○ instances should run on distributed sites
● Archival Storage == instances of bit-stream-preservation
● Data Management + Ingest + Access == core of archive instance
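As a way to make the Archival Storage mapping concrete, here is a minimal Python sketch of the ‘hybrid’ idea: bit-stream-preservation instances replicated horizontally across distributed sites and tiered vertically, HSM-style. All class and field names are illustrative assumptions, not an existing API.

    from dataclasses import dataclass, field

    # Purely illustrative model - class and field names are assumptions,
    # not part of any existing archive API.

    @dataclass
    class BitStreamInstance:
        """One bit-stream-preservation instance (disk pool, tape library, cloud bucket, ...)."""
        name: str
        site: str   # instances should run on distributed sites
        tier: int   # vertical tiering, HSM-like: 0 = fast disk, 1 = tape, ...

    @dataclass
    class ArchivalStorage:
        """The 'hybrid': horizontal replication across vertical tiers."""
        instances: list = field(default_factory=list)

        def replicas(self, tier):
            """Horizontal replication: every instance holding a copy at a given tier."""
            return [i for i in self.instances if i.tier == tier]

    # Example: two disk replicas on distributed sites plus one tape tier.
    storage = ArchivalStorage([
        BitStreamInstance("disk-a", site="site-1", tier=0),
        BitStreamInstance("disk-b", site="site-2", tier=0),
        BitStreamInstance("tape-a", site="site-1", tier=1),
    ])
    print([i.name for i in storage.replicas(tier=0)])   # -> ['disk-a', 'disk-b']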
end user workflow 1
• individual scientist archiving important work (e.g. a publication, partial analysis results, …) – DOI required
• key metrics
  • Single archive size: average 10-100 GB
  • Files in archive: average 10,000
  • Total archive size per user: 5 TB
  • Duration: 5-10 years
  • Ingest rates: 10-100 MB/s (more is better)
  • encryption: not required, nice to have
• browser based interaction (authentication, data transfers, metadata query/ingest)
• CLI tools usable for data ingest
• metadata query
  • starting from a single string input (like a Google search) – interactive/immediate selection response
• change QoS – e.g. the number of replicas after re-evaluating the ‘value’ of that data
• DOI generated (as with e.g. Zenodo) for durable external references
• mobile devices (tablet, phone, …) (tools + protocols) should not be excluded
individual scientist – managing private scientific data (self-generated and self-managed); a sketch of this workflow follows
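A hedged sketch of how workflow 1 might look when scripted against a REST-style archive service. The base URL, all endpoints and field names, and the DOI call are hypothetical assumptions, used only to illustrate the requirements above (ingest, single-string query, QoS, DOI).

    import requests

    # Hypothetical service endpoint and credentials - assumptions for illustration.
    BASE = "https://archive.example.org/api/v1"
    auth = ("jdoe", "secret")          # in practice: browser/AAI based login

    # create an archive object for e.g. publication data (workflow 1 scale: 10-100 GB)
    r = requests.post(f"{BASE}/archives", auth=auth,
                      json={"title": "partial analysis results 2019",
                            "qos": {"replicas": 2},      # QoS changeable later
                            "retention_years": 10})
    archive_id = r.json()["id"]

    # ingest a file via CLI-style streaming upload (10-100 MB/s expected)
    with open("results.h5", "rb") as f:
        requests.put(f"{BASE}/archives/{archive_id}/files/results.h5",
                     auth=auth, data=f)

    # attach metadata, then close - the data becomes immutable
    requests.post(f"{BASE}/archives/{archive_id}/metadata", auth=auth,
                  json={"sample": "protein-x", "energy_keV": 12.4})
    requests.post(f"{BASE}/archives/{archive_id}/close", auth=auth)

    # single-string, Google-like metadata query with immediate response
    hits = requests.get(f"{BASE}/search", auth=auth, params={"q": "protein-x"}).json()

    # mint a DOI (Zenodo-like) for durable external references
    doi = requests.post(f"{BASE}/archives/{archive_id}/doi", auth=auth).json()["doi"]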
end user workflow 2
• beamline (experimental station) specific + experiment specific, medium size and rate
• key size parameters
  • Single archive size: average 5 TB
  • Files in archive: average 150,000
  • Total archive size per beamline: 400 TB, doubles every year
  • Duration: 10 years
  • Ingest rates: 1-2 GB/s
  • encryption: not required
• 3rd-party copy – ‘gather’ all data from various primary storage systems – controlled from a single point
• local (to site) data transport should be RDMA based and operate (efficiently) on networks faster than 10 Gb/s
• data encryption in transit not required
• API + CLI for seamless automation – e.g. the API manifested as a REST API (see the sketch below)
• CLI on Linux; the API should support the platforms in use (focus on Linux, but incl. Windows ;-)
• Metadata
  • other methods (e.g. referencing/finding through experiment management services) used in addition
beamline manager – mix of automated and experiment specific/manual archive interaction
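A sketch of the automated side of workflow 2, assuming the same hypothetical REST API as in the previous sketch; the third-party-copy endpoint, the source-URL schemes and all field names are invented for illustration.

    import requests

    # Same hypothetical REST API as above; endpoint and fields are assumptions.
    BASE = "https://archive.example.org/api/v1"
    headers = {"Authorization": "Bearer <beamline-service-token>"}

    # ask the archive to 'gather' the data itself from the primary storage
    # systems (third-party copy, controlled from a single point; on-site
    # transport would be RDMA based on >10 Gb/s networks)
    job = requests.post(f"{BASE}/ingest-jobs", headers=headers,
                        json={"archive": "beamtime-2019-11006123",
                              "sources": ["gpfs://core-fs/p11/raw/11006123",
                                          "beegfs://online/p11/calib/11006123"]}).json()

    # poll the job state (workflow 3 would use async notifications instead)
    state = requests.get(f"{BASE}/ingest-jobs/{job['id']}", headers=headers).json()
    print(state["status"])   # e.g. "gathering" -> "stored"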
end user workflow 3
• large collaboration or site managing and controlling archive operations on behalf of all experiments – all automated and large scale
• all inherited from the previous workflow – except the manual part – all interactions automated
• key size parameters
  • Single archive size: average 400 TB
  • Files in archive: average 25,000
  • Total archive size per beamline: 10s of PB, doubles every year
  • Duration: 10 years
  • Ingest rates: 3-10 GB/s, averaged over 1-3 hours
  • encryption: not required
• bulk recall – planned re-analysis requires bulk restore operations at decent rates (50% of the ingest rate) to feed the compute engine
• async notification from the archive on reaching certain states (e.g. data accepted and stored), to be recorded in external DBs (see the sketch below)
Integrated data archiving for large standardized beamline/facility experiments
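The async-notification requirement could be met with a plain HTTP callback; below is a minimal sketch of a receiver that records state changes in an external DB. The payload format, port and SQLite store are assumptions; a real setup would also authenticate the calling archive service.

    import json
    import sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # external DB in which archive states are tracked (SQLite just for the sketch)
    db = sqlite3.connect("archive_states.db")
    db.execute("CREATE TABLE IF NOT EXISTS states (archive_id TEXT, state TEXT, ts TEXT)")

    class ArchiveCallback(BaseHTTPRequestHandler):
        def do_POST(self):
            # assumed payload: {"archive_id": ..., "state": ..., "timestamp": ...}
            length = int(self.headers["Content-Length"])
            event = json.loads(self.rfile.read(length))
            db.execute("INSERT INTO states VALUES (?, ?, ?)",
                       (event["archive_id"], event["state"], event["timestamp"]))
            db.commit()
            self.send_response(204)   # acknowledged, no body
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), ArchiveCallback).serve_forever()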
site manager & administrative workflows
• create and configure the core archive and related bit-stream-preservation instances
  • based on site and community data policies + contracts with the community
• create ‘archive profiles’ determining operation modes and limits (everything that could generate costs ;-)
  • this includes e.g. tradeoffs between costs and data resiliency (probability of data loss)
  • select appropriate ‘bit-stream-preservation’ instances and the hierarchy among them (e.g. replication)
• set up further admin and end-user accounts and their roles (authorizations)
  • delegation of limited admin tasks to group admins of communities/groups
• configure/set up AAI – e.g. a local IdP
• wide range of usable authentication methods (besides local site ones) – X.509, OpenID, eduGAIN, … – more is better
  • used to authenticate, and usable in ACL-like authorization settings (the identity or DN)
  • multiple authentication methods mapped to a single ‘identity’
• set up a role-based model (identity selects roles, roles select the archive profile) – see the sketch below
integration, setup and control - workflow derived requirements
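The profile and role ideas above, expressed as plain Python data for illustration only; every field name and value is an assumption meant to show the cost/resiliency tradeoff, tier selection and the identity → roles → archive-profile mapping, not a real configuration schema.

    # Illustrative only: field names and values are assumptions, not a real schema.
    archive_profiles = {
        "high-resilience": {                  # costly, lowest probability of data loss
            "replicas": 3,                    # horizontal replication
            "tiers": ["disk", "tape", "remote-tape"],
            "quota_tb": 500,                  # limits: everything that generates costs
        },
        "budget": {                           # cheaper, higher accepted loss probability
            "replicas": 1,
            "tiers": ["tape"],
            "quota_tb": 50,
        },
    }

    # multiple authentication methods (X.509 DN, OpenID, eduGAIN, ...) map to one identity
    identities = {"CN=Jane Doe,O=DESY": ["jdoe@openid", "jdoe@edugain"]}

    # role-based model: identity selects roles, role selects archive profile
    roles = {"CN=Jane Doe,O=DESY": ["beamline-admin", "user"]}
    role_to_profile = {"beamline-admin": "high-resilience", "user": "budget"}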
site manager & administrative workflows
• deployment scenarios (instance architectures) – see the sketch below
  • deploy main services, esp. the metadata store/query (Data Management + Ingest + Access in OAIS terms)
    • locally
    • in the cloud (using remote service and storage/handling hardware for MD operations)
  • create/attach the bit-stream preservation layer (Archival Storage in OAIS terms)
    • local only
    • remote only
    • tiered – local and remote (e.g. remote tape) – remote could be a ‘cooperating lab’, public cloud, …
• (streaming) protocol to transfer data between tiers should support efficient and secure ‘wide area’ transfers
• deployment based on open standards / an open-source version preferable
  • avoid vendor lock-in, assure long-term viability, benefit from wide community support
  • subscribing to paid support not excluded
  • a commercial version not excluded either (depending on the licensing model, exit strategy, etc.)
Deployment models/business models
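The deployment options condensed into an illustrative data sketch; all names (e.g. ‘remote-tape@cooperating-lab’) are placeholders under the assumptions above, not actual services.

    # Illustrative only: service and tier names are placeholders.
    deployments = {
        "all-local": {
            "core_services": "on-site",                  # DM + Ingest + Access
            "archival_storage": ["local-disk", "local-tape"],
        },
        "hybrid": {
            "core_services": "on-site",                  # or "cloud"
            "archival_storage": ["local-disk",
                                 "remote-tape@cooperating-lab"],  # or public cloud
            # inter-tier transfers: efficient, secure wide-area streaming,
            # e.g. HTTP(S) based to stay firewall friendly
            "tier_transfer": "https-streaming",
        },
    }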
left over…
• life cycle of archive objects (not bound to a single access session): create, fill with (meta)data, close – data becomes immutable – then query (see the sketch at the end)
• archive objects can be related to existing ones – e.g. containing new versions of derived data
• all data access should be ‘stream’ based
  • no random access (within a file) is required
  • recalls of pre-selected files out of a single archive object
• network protocol should be ‘firewall friendly’ – e.g. HTTP(S) based
• Billing
  • any ‘non-local’ deployment requires billing services and methods (obviously), separated into service and storage costs (at least)
  • external storage resources – long-term predictable costs/contracts preferred (less ‘pay as you go’)
  • per-user and per-group billing (a user may be a member of several groups, and groups might be nested)
• encryption – in all cases ‘nice to have’ – expecting issues with local ‘key management’ services
  • pre- and post-archive en-/decryption of data in motion and/or at rest is a valid alternative
• (Meta)Data formats
  • no special data formats (known to the archive service) required, thus no format conversions (without user interaction) required
  • metadata needs to be exportable/importable to new/updated instances
  • the metadata query engine should handle binary, strings, integers and date/time
other thoughts, requirements and options
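Finally, a minimal sketch of the archive-object life cycle described above: create, fill with (meta)data, close into immutability, then query. The ArchiveObject class is hypothetical and only mirrors the stated requirements.

    import io

    # Hypothetical class mirroring the stated life cycle - not a real client library.
    class ArchiveObject:
        def __init__(self, name, related_to=None):
            self.name = name
            self.related_to = related_to   # e.g. a new version of derived data
            self.files, self.metadata = {}, {}
            self.closed = False

        def add(self, path, stream, **metadata):
            if self.closed:
                raise PermissionError("archive object is immutable after close")
            self.files[path] = stream.read()   # stream based, no random access
            self.metadata.update(metadata)

        def close(self):
            self.closed = True                 # data becomes immutable

        def matches(self, **criteria):
            # a real query engine should handle binary, strings, integers, date/time
            return all(self.metadata.get(k) == v for k, v in criteria.items())

    obj = ArchiveObject("run-42-derived", related_to="run-42")
    obj.add("spectrum.dat", io.BytesIO(b"\x00\x01"), sample="protein-x", run=42)
    obj.close()
    print(obj.matches(run=42))   # -> True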