Federating Grid and Cloud Storage in EUDAT
Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN
International Symposium on Grids and Clouds 2014, 23-28 March 2014
Agenda
• Introduction
• …
• …
• …
• Test results
• Future work
Introduction
• We present and analyze the results of Grid and Cloud storage integration
• In EUDAT we used:
  – iRODS as the Grid storage federation mechanism
  – OpenStack Swift as a scalable object storage solution
• Scope:
  – Proof of concept
  – Pilot OpenStack Swift installation at PSNC
  – Production iRODS servers at PSNC (Poznań) and EPCC (Edinburgh)
EUDAT project introduction
• Pan-European data storage & management infrastructure
• Long-term data preservation:
  • Storage safety and availability – replication, integrity control
  • Data accessibility – visibility, the ability to reference data over the years
• Partners: data centres & communities
EUDAT challenges:
• Federate heterogeneous data management systems:
  • dCache, AFS, DMF, GPFS, SAM-FS
  • File systems, HSMs, file servers
  • Object Storage systems (!)
  while ensuring:
  • Performance, scalability
  • Data safety, durability, HA, fail-over
  • Unique access, federation transparency
  • Flexibility (rule engine)
• Implement the core services:
  • Safe and long-term storage: B2SAFE
  • Efficient analysis: B2STAGE
  • Easy deposit & sharing: B2SHARE
  • Data & metadata exploration: B2FIND
[Figure: various storage systems federated under iRODS – the EUDAT CDI domain of registered data]
Grid – Cloud storage integration
• Need to integrate Grids and Cloud/Object storage:
  • Grids gain another cost-effective, scalable backend
  • Many institutions and initiatives are testing object storage or already using it in production
  • Most Cloud storage builds on the Object Storage concept
  • Object Storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
  • an object storage system – OpenStack Swift
  • iRODS servers and federations
Context: the Object Storage concept
• The concept enables building low-cost, scalable, efficient storage:
  • Within a data centre
  • In DR / distributed configurations
• Reliability thanks to redundancy of components:
  • Many cost-efficient storage servers with disk drives (12-60 HDDs/SSDs)
  • Typical (cheap) network: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
  • High investment and maintenance costs
  • Vendor lock-in, closed architecture, limited scalability
  • Slower adoption of new technologies than in the commodity market
Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage or already using it in production, including:
  • Open source / private cloud:
    • OpenStack Swift
    • Ceph / RADOS
    • Sheepdog, Scality…
  • Commercial:
    • Amazon S3, RackSpace Cloud Files…
    • MS Azure Object Storage…
• Most promising open source: OpenStack Swift & Ceph
Object Storage: Architectures
[Diagram: OpenStack Swift – user apps upload and download through a load balancer to proxy nodes, which front a set of storage nodes]
[Diagram: Ceph – apps, hosts/VMs and clients reach RADOS through librados, RadosGW, RBD or CephFS; the RADOS cluster comprises MDS, MON and OSD daemons]
Object Storage concepts: no DB lookups!
[Diagrams: the OpenStack Swift Ring (source: The Riak Project) and Ceph's CRUSH map (source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/)]
• No meta-data lookups, no meta-data DB – data placement/location is computed! (see the sketch below)
• Swift Ring: represents the space of all possible computed hash values, divided into equivalent parts (partitions); partitions are spread across the storage nodes
• Ceph CRUSH map: a list of storage devices, a failure-domain hierarchy (e.g. device, host, rack, row, room) and rules for traversing the hierarchy when storing data
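To make "placement is computed" concrete, here is a minimal Python sketch of a Swift-style ring lookup. It is a simplification, not Swift's actual code: the real ring salts the hash with a cluster-wide suffix, balances partitions by device weight and spreads replicas across zones. The partition power, node count and object names below are invented for illustration.

```python
import hashlib

PART_POWER = 10   # ring with 2**10 = 1024 partitions (illustrative value)
NUM_NODES = 5     # hypothetical number of storage nodes
REPLICAS = 3      # Swift's default replica count

def partition(account, container, obj):
    # The partition is taken from the top PART_POWER bits of the MD5 hash
    # of the object path -- no database lookup is involved.
    path = "/{}/{}/{}".format(account, container, obj).encode()
    h = int.from_bytes(hashlib.md5(path).digest()[:4], "big")
    return h >> (32 - PART_POWER)

def nodes_for(part):
    # Toy partition-to-node mapping; the real ring balances partitions by
    # device weight and places replicas in distinct failure domains.
    return [(part + i) % NUM_NODES for i in range(REPLICAS)]

part = partition("AUTH_test", "photos", "cat.jpg")
print("partition:", part, "-> storage nodes:", nodes_for(part))
```

Every client and server computes the same mapping independently, which is why neither Swift nor Ceph needs a central metadata database on the data path.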
Grid – Cloud storage integration
• Most cloud/object storage solutions expose:
  • an S3 interface
  • other native interfaces: OpenStack Swift's native API; Ceph's RADOS
• S3 (by Amazon) is the de facto standard in cloud storage (see the sketch after this list):
  • Many petabytes, global systems
  • Vendors use it (e.g. Dropbox) or provide it
  • Large take-up
• Similar concepts:
  • CDMI: Cloud Data Management Interface – a SNIA standard with few implementations: http://www.snia.org/cdmi
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/
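Because S3 is the common denominator, one client can target Amazon S3 or an OpenStack Swift proxy running the S3-compatibility middleware simply by switching the endpoint. A minimal sketch using boto3; the endpoint URL, credentials and bucket name are placeholders, not EUDAT's actual setup.

```python
import boto3

# Point the client at any S3-compatible endpoint; swapping endpoint_url
# between Amazon S3 and a Swift proxy with S3 middleware is all it takes.
s3 = boto3.client(
    "s3",
    endpoint_url="https://swift-proxy.example.org:8080",  # placeholder
    aws_access_key_id="ACCESS_KEY",                       # placeholder
    aws_secret_access_key="SECRET_KEY",                   # placeholder
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="sample.dat", Body=b"hello object storage")
print(s3.get_object(Bucket="demo-bucket", Key="sample.dat")["Body"].read())
```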
S3 and S3-like in commercial systems:
• S3 re-sellers:
  • Lots of services
  • Including Dropbox
• Services similar to the S3 concept:
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • RackSpace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations 'in the hardware':
  • Xyratex
  • Amplidata
Why build PRIVATE S3-like storage?
• Features / benefits:
  • Reliable storage on top of commodity hardware
  • The user stays in control of the data
  • Easy scalability – the system can grow:
    • adding resources and redistributing data is possible in a non-disruptive way
• Open-source software solutions and standards available:
  • e.g. OpenStack Swift: the OpenStack native API and the S3 API (see the sketch below)
  • Other S3-enabled storage, e.g. RADOS
  • CDMI: Cloud Data Management Interface
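For contrast with the S3 sketch earlier, the same kind of upload through Swift's native API, using python-swiftclient; the auth URL, account and key are placeholders, and TempAuth-style v1 authentication is assumed.

```python
from swiftclient import client as swift

# Native OpenStack Swift API: authenticate, then PUT a container and object.
conn = swift.Connection(
    authurl="https://swift-proxy.example.org:8080/auth/v1.0",  # placeholder
    user="demo:user",                                          # placeholder
    key="SECRET_KEY",                                          # placeholder
)

conn.put_container("demo-container")
conn.put_object("demo-container", "sample.dat", contents=b"hello swift")
headers, body = conn.get_object("demo-container", "sample.dat")
print(body)
```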
Why federate iRODS with S3/OpenStack?
• Some communities already have data stored in OpenStack:
  • VPH is building a reliable storage cloud on top of OpenStack Swift within the pMedicine project (together with PSNC)
• These data should be available to EUDAT:
  • Data staging: Cloud -> EUDAT -> PRACE HPC and back
  • Data replication: Cloud -> EUDAT -> other back-end storage
  • We could apply the rule engine to data in the cloud and assign PIDs
  (the staging/replication flows are sketched below)
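These staging and replication flows map onto standard iRODS icommands (iput, irepl, iget). A sketch driving them from Python; the zone, resource and path names are invented for illustration.

```python
import subprocess

def icmd(*args):
    # Run an iRODS icommand, raising if it fails.
    subprocess.run(list(args), check=True)

# Ingest into the cloud-backed resource (names are hypothetical):
icmd("iput", "-R", "cloudResc", "scan.dcm", "/eudatZone/vph/scan.dcm")

# Replicate to another back-end storage resource:
icmd("irepl", "-R", "tapeResc", "/eudatZone/vph/scan.dcm")

# Stage out to an HPC scratch file system for analysis:
icmd("iget", "/eudatZone/vph/scan.dcm", "/scratch/user/scan.dcm")
```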
• We were asked to consider cloud storage:
  • from the EUDAT 1st year review report:
EUDAT's iRODS federation – VPH case analysis:
[Diagram: an S3/OSS client ingests and accesses data in a storage system through its S3/OSS APIs; an iRODS server with an S3 driver registers that data in EUDAT's iRODS federation, where EUDAT's PID Service assigns PIDs; the data is replicated to other iRODS servers (with other storage drivers and storage systems) and staged to an HPC system, which accesses it via an iRODS client]
Our 7.2 project
• Purpose:
  • To examine the existing iRODS-S3 driver
  • (possibly) to improve it / provide another one
• Steps/status:
  • 1st stage:
    • Play with what is there – done for OpenStack/S3 + iRODS
    • Examine the functionality
    • Evaluate scalability – found some issues already
  • Follow-up:
    • Try to improve the existing S3 driver:
      • functionality
      • performance
    • Implement a native OpenStack driver?
    • Get in touch with the iRODS developers
iRODS-OpenStack tests
Test setup:
• iRODS server:
  • Cloud as a compound resource
  • Disk cache in front of it
  (a configuration sketch follows the diagram below)
• OpenStack Swift:
  • 3 proxies, 1 with S3 enabled
  • 5 storage nodes
  • Extensive functionality and performance tests
• Amazon S3:
  • Only limited functionality tests
[Diagram: iRODS server(s) accessing OpenStack Swift through its S3/OpenStack APIs and Amazon S3 through the S3 API]
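A sketch of how such a compound resource, a disk cache child in front of an S3-backed archive child, can be assembled with iadmin, driven here from Python. This assumes iRODS 4.x composable resources and the official S3 resource plugin (the deck does not state the exact versions); all resource names, hosts, buckets and paths are hypothetical.

```python
import subprocess

def iadmin(*args):
    # Run an iRODS administration command, raising if it fails.
    subprocess.run(["iadmin", *args], check=True)

# Parent compound resource with a disk cache child and an S3 archive child.
iadmin("mkresc", "cloudResc", "compound")
iadmin("mkresc", "cacheResc", "unixfilesystem",
       "irods.example.org:/var/lib/irods/cache")
iadmin("mkresc", "archiveResc", "s3",
       "irods.example.org:/demo-bucket/irods",
       "S3_DEFAULT_HOSTNAME=swift-proxy.example.org:8080;"
       "S3_AUTH_FILE=/var/lib/irods/s3.keypair")
iadmin("addchildtoresc", "cloudResc", "cacheResc", "cache")
iadmin("addchildtoresc", "cloudResc", "archiveResc", "archive")
```

With this layout, puts land on the cache and are asynchronously archived to S3, while cached gets are served from disk, which matches the cache speedup reported in the results below.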
iRODS-OpenStack tests
Test results:
• S3 vs native OSS API overhead:
  • Upload: ~0%
  • Download: ~8%
• iRODS overhead:
  • Upload: ~19%
  • Download:
    • from the compound S3 resource: ~0%
    • cached: 230% SPEEDUP (the cache resource is faster than S3)
iRODS-OpenStack tests
Conclusions and future plans:
• Conclusions:
  • Performance-wise, iRODS does not add much overhead for files < 2 GB
  • Problems arise for files > 2 GB: the iRODS-S3 driver has no support for multipart upload, which prevents iRODS from storing files > 2 GB in clouds (the multipart flow is sketched after this list)
  • Some functional limits (e.g. the imv problem)
  • Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
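For reference, the S3 multipart-upload flow the driver was missing, sketched with boto3 (endpoint, credentials, bucket and file names are placeholders): the file is split into parts that are uploaded separately and stitched together server-side, which is what lifts the single-PUT size limit.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swift-proxy.example.org:8080",  # placeholder
    aws_access_key_id="ACCESS_KEY",                       # placeholder
    aws_secret_access_key="SECRET_KEY",                   # placeholder
)

PART_SIZE = 64 * 1024 * 1024  # 64 MiB; all parts except the last must be >= 5 MiB

# 1) open the multipart upload, 2) send each part, 3) complete with the ETags.
upload = s3.create_multipart_upload(Bucket="demo-bucket", Key="big.dat")
parts = []
with open("big.dat", "rb") as f:
    for number, chunk in enumerate(iter(lambda: f.read(PART_SIZE), b""), start=1):
        resp = s3.upload_part(Bucket="demo-bucket", Key="big.dat",
                              UploadId=upload["UploadId"],
                              PartNumber=number, Body=chunk)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})

s3.complete_multipart_upload(Bucket="demo-bucket", Key="big.dat",
                             UploadId=upload["UploadId"],
                             MultipartUpload={"Parts": parts})
```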
• Future plans:
  • Test the integration with VPH's cloud using the existing driver
  • Ask SAF to support the driver development
  • Get in touch with the iRODS developers to ensure the sustainability of our work
EUDAT’s iRODS federation
Object storage on top of iRODS?
[Diagram: S3/OSS and iRODS clients ingest and access data through an S3 API exposed by an S3 driver on top of an iRODS server; the federation spans further iRODS servers with other storage drivers in front of their storage systems, reachable via the iRODS API]
Problems:
• Data organisation mapping: file system vs objects; big files vs fragments
• Identity mapping: S3 keys/accounts vs X.509?
• Out of scope of EUDAT? A lot of work needed