#ibmedge© 2016 IBM Corporation
Software Defined Analytics with File and Object Access Plus Geographically Distributed Data
Sandeep Patil, STSM, Spectrum Scale
Trishali Nayar, AFM Development, Spectrum Scale
Smita Raut, Object Development, Spectrum Scale
22 Sep 2016
Acknowledgement: Bill Owen, Dean Hilderbrand, Sanjay Gandhi, Brian Nelson, Tomonori Kubota, Gyoh Ohsawa
Agenda
• Introduction to Spectrum Scale Active File Management (AFM)
• AFM Use Cases
• Spectrum Scale Protocol
• Unified File & Object Access (UFO) Feature Details
• AFM + Object: Unique WAN Caching for Object Store
• Deep Dive on Single Site & Multi-site Caching
• Configuration Commands with Demo
• Q & A
Spectrum Scale Active File Management (AFM)
Spectrum Scale – The Complete Data Management Solution
AFM Overview
• Active File Management (AFM) uses a home-and-cache model in which a single home provides the primary storage of data, and exported data is cached in a local GPFS™ file system
• AFM is primarily suited for remote caching
• Users access files from the cache system
 – For read requests, when the file is not yet cached, AFM retrieves the file from the home site
 – For write requests, writes are allowed on the cache system and are pushed back to the home system, depending on the cache mode
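The home-and-cache relationship can be set up with the standard GPFS commands – a minimal sketch, assuming a file system `fs1` on the cache cluster and an NFS export already published by the home cluster (all host, file-system, and fileset names here are placeholders, not from this deck):

```shell
# On the cache cluster: create an independent AFM fileset whose target is
# the NFS export at home; single-writer is one of the available cache modes
# (read-only, single-writer, local-updates, independent-writer).
mmcrfileset fs1 cacheFset --inode-space new \
    -p afmTarget=nfs://homenode/gpfs/homefs/homeExport \
    -p afmMode=single-writer

# Link the fileset into the cache file system's namespace; reads through
# this path fetch uncached files from home, and writes are queued back.
mmlinkfileset fs1 cacheFset -J /gpfs/fs1/cacheFset
```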
AFM Caching Overview
[Diagram: a Home Cluster and a Cache Cluster, each a Spectrum Scale cluster of storage nodes backed by a storage array]
• At home, nodes are made NFS servers
• At the cache, a few nodes are made gateway nodes
• Cache filesets are associated with NFS exports at home
Global Sharing with Spectrum Scale AFM
• Expands the GPFS global namespace across geographical distances
 – Caches local 'copies' of data distributed to one or more GPFS clusters
 – Low-latency 'local' read and write performance
 – Automated namespace management
 – As data is written or modified at one location, all other locations see that same data
• Efficient data transfers over the wide area network (WAN) – works with unreliable, high-latency connections
• Speeds data access to collaborators and resources around the world
AFM Caching Basics
• Sites – two sides of a cache relationship
• A single home cluster
 – Presents a fileset that can be cached (exported with NFS)
 – Can be a non-GPFS cluster/nodes
• One or more cache clusters
 – Each associates a local fileset with the home export
• AFM fileset
 – Independent fileset with per-inode state kept in extended attributes (xattrs)
 – Data is fetched into the fileset on access (or prefetched on command)
 – Data written to the fileset is copied back to home
• Gateway node (designation)
 – Maintains an in-memory queue of pending operations
 – Moves data between the cache and home clusters
 – Monitors connectivity to home, switches to disconnected mode on outage, triggers recovery on failure
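The gateway designation and on-command prefetch described above map to two commands – a hedged sketch, with placeholder node, file-system, and fileset names:

```shell
# Designate one or more cache-cluster nodes as AFM gateway nodes; they
# maintain the in-memory queue and move data between cache and home.
mmchnode --gateway -N cacheNode1,cacheNode2

# Prefetch a list of files into the cache fileset ahead of access,
# instead of waiting for on-demand fetches over the WAN.
mmafmctl fs1 prefetch -j cacheFset --list-file /tmp/files-to-cache.txt
```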
Spectrum Scale AFM Use Cases
Global Namespace
• Provides a common namespace across a globally distributed cloud
• Persistent, scalable cache for a remote file system
Content Distribution
• Central site is where data is created and maintained
• Branch/edge sites can periodically pre-fetch or pull on demand
Content Consolidation
• Branch offices work on local active data
• Master repository maintained centrally
• Advanced functions (e.g. backup) run on the central site
Disaster Recovery
• Replication of data across the WAN with consistency points
• Failover and failback support
Spectrum Scale Protocol
Enhanced Protocol Support from the 4.1.1 Release
The Challenge: How can I share my storage infrastructure across all of my legacy and new-generation applications?
The Solution:
• The new IBM Spectrum Scale protocol node allows access to data stored in a Spectrum Scale file system, using additional access methods and protocols.
• The protocol node functions are clustered and support transparent failover for the NFS and Swift protocols as well as the SMB protocol.
• Multiprotocol data access from other systems using the following protocols:
 – NFS v3 and v4
 – SMB 2 and SMB 3.0 mandatory features / CIFS for Windows support
 – OpenStack Swift and S3 API support for object storage
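Enabling the protocols listed above is done per service on the CES protocol nodes – a sketch assuming CES has already been configured on the cluster:

```shell
# Enable each protocol service across the CES (protocol) nodes.
mmces service enable NFS
mmces service enable SMB
mmces service enable OBJ   # OpenStack Swift / S3 object access

# Verify which services are running on which protocol nodes.
mmces service list -a
```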
Adding Protocol Support
[Architecture diagram: users access the cluster through protocol nodes (PN1, PN2, … PNn) over NFS, SMB/CIFS, POSIX, and OpenStack Swift; an administrator manages the cluster via the command-line interface and management nodes; authentication services (Keystone) and OpenStack Cinder integrate with the cluster; the IBM Spectrum Scale cluster nodes connect over an external TCP/IP or IB network to Network Shared Disks (NSD1, NSD2, … NSDn) on physical storage – flash, disk, and tape – including the Elastic Storage Server]
IBM Spectrum Scale Benefits
Better performance
• Eliminate hotspots with massively parallel access to files
• Sequential I/O with ESS greater than 400 GB/s
• Throughput advantage for parallel streaming workloads, e.g. technical computing and analytics
More storage. More files. Hyper scale. Simplified management
• Easier management with one global namespace instead of managing islands of NAS arrays, e.g. no need to copy data between compute clusters
• Integrated policy-driven automation
• Fewer storage administrators required
Lower cost
• Optimizes storage tiers including flash, disk and tape
• Increased efficiency and more efficient provisioning due to parallelization and striping technology
• Remove duplicate copies of data, e.g. run analytics on one copy of data without having to set up a separate silo
IBM Spectrum Scale – Protocol Integration
• Software offering – protocol support is added to GPFS
• Can be configured on existing GPFS clusters or a new cluster
• Support for Intel and Power Systems
• RHEL 7/7.1
 – Protocol node requirement
 – Remaining GPFS nodes can have any supported environment/platform
• Use of the installation toolkit is also limited to RHEL 7/7.1
• Adds support for the following protocols:
 – SMB
 – NFS
 – Object (HTTP REST)
• Some cluster nodes are designated as "protocol nodes" (a.k.a. CES nodes)
 – Integrated management of the protocol services
 – Active-active clustering
 – High availability through IP failover
IBM Spectrum Scale – Protocol Support
Protocol Support Considerations
• Adding protocol nodes to a GPFS cluster:
 – All RHEL7 xServers or all RHEL7 pServers
 – Not NSD servers
 – Protocol export IPs distributed among the protocol nodes, with different policies for balancing and failback
• Management: GUI and CLI
• Deployment: easy automated deployment
• Flexibility: customer choice of nodes/disks/storage options
• Scale: limits for capacity/performance based on GPFS
 – CES node limits based on the protocols enabled
 – 16 nodes, 3,000 connections/node and 20K connections/cluster for SMB
 – 32 nodes for only NFS, only Object, or NFS+Object
• Security: root access for cluster management, with sudo access support
• Roll your own, or combine with Lab Services to meet expectations
Spectrum Scale Object (Part of Spectrum Scale Protocol)
Spectrum Scale Object Storage
• Basic support added in the 4.1.1 release and enhanced in the 4.2 and 4.2.1 releases
• Based on OpenStack Swift (Juno release)
• REST-based data access
 – Growing number of clients due to an extremely simple protocol
 – Applications can easily save and access data from anywhere using HTTP
 – Simple set of atomic operations: PUT (upload), POST (update metadata), GET (download), DELETE
• Amazon S3 protocol support
• High availability with CES integration
• Simple and automated installation process
• Integrated authentication (Keystone) support
• Native GPFS command-line interface to manage the object service (mmobj command)
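The atomic operations listed above map directly to HTTP verbs against the Swift endpoint exposed by the protocol nodes – an illustrative sketch only; the CES IP, port, account, container, object names, and token are all placeholders:

```shell
TOKEN="..."   # auth token obtained from Keystone beforehand
URL="http://ces-ip:8080/v1/AUTH_acct/mycontainer"

# Upload, update metadata, download, and delete an object.
curl -X PUT    -H "X-Auth-Token: $TOKEN" -T report.csv      "$URL/report.csv"
curl -X POST   -H "X-Auth-Token: $TOKEN" \
               -H "X-Object-Meta-Owner: analytics"          "$URL/report.csv"
curl -X GET    -H "X-Auth-Token: $TOKEN" -o report-copy.csv "$URL/report.csv"
curl -X DELETE -H "X-Auth-Token: $TOKEN"                    "$URL/report.csv"
```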
Spectrum Scale Object Storage – Additional Features
• Unified file and object support, with Hadoop connectors
• Support for encryption
• Support for compression
• Only object store with tape support for backup
• Object store with integrated transparent cloud tiering support
• Multi-region support
• AD/LDAP support for authentication
• ILM support for object
 – Movement of objects across storage tiers based on access heat
• Spectrum Scale Object with IBM DeepFlash becomes an object store over an all-flash array for newer, faster workloads
• Spectrum Scale Object with WAN caching support (AFM)
IBM Spectrum Scale: Unified File and Object Access Feature Overview
Unified File and Object (UFO) Support
Spectrum Scale: Redefining Unified Storage
• Challenge: the world is not converged – file/object/HDFS – today, and never will be completely…
• Unified scale-out content repository
 – File or object in; object or file out
 – Integrated big-data analytics support
 – Native protocol support
 – High performance that scales
 – Single management plane
[Diagram: a Spectrum Scale cluster serving NFS, SMB, POSIX, Swift/S3 and HDFS from a single pool of SSD, fast disk, slow disk, and tape]
What is Unified File and Object Access?
• Accessing objects using file interfaces (SMB/NFS/POSIX) and accessing files using object interfaces (REST) helps legacy applications designed for files seamlessly start integrating into the object world.
• It allows object data to be accessed using applications designed to process files, and file data to be published as objects.
• Multiprotocol access for file and object in the same namespace (with common user-ID management) allows supporting and hosting data oceans of different types of data with multiple access options.
• Optimizes various use cases and solution architectures, resulting in better efficiency as well as cost savings.
[Diagram: a clustered file system with a container fileset – data ingested as objects via Swift (with Swift-on-File), data ingested as files via NFS/SMB/POSIX, and files accessed as objects over HTTP]
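The file/object duality can be illustrated end to end – an object ingested over Swift becomes readable as a plain file in the GPFS namespace. In this sketch the endpoint, token, and the on-disk path are all illustrative; the real path layout depends on the fileset and storage-policy configuration:

```shell
# Ingest data as an object over the Swift interface (placeholder endpoint).
curl -X PUT -H "X-Auth-Token: $TOKEN" \
     -T results.dat "http://ces-ip:8080/v1/AUTH_acct/mycontainer/results.dat"

# The same data is then visible as a regular file inside the object fileset
# and can be processed by any POSIX application (illustrative path).
md5sum /gpfs/fs1/object_fileset/AUTH_acct/mycontainer/results.dat
```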
IBM Spectrum Scale: AFM + Object (Unique Proposition)
The Need: Thin-Thick Storage Capacity Site Deployments for Object Data
[Diagram: Sites 1-3, each running applications against limited local object storage, connected to a central site with unlimited storage, centralized analytics, and centralized backup]
• Geo-dispersed multiple sites with limited storage capacity
• Independent applications running on each site, accessing/generating object data
• Centralized home for consolidated object data – ability to grow storage capacity
• Centralized backup for all sites via the central location
• Ability to run analytics for all sites in the central location
Use-Case Requirements
• There is an object store site that is closer to the end application but has limited storage capacity.
• To cater to the large storage capacity requirement, there is another object store set up at a geographically remote site which has unlimited or expandable storage capacity and acts as a central archive.
• The relationship between these two object stores needs to be set up in a way that allows applications to access all object data from the site closer to them for faster access, even though it has limited storage capacity.
• The central site should have the ability to do in-place analytics of the data.
• The central site should have the ability to back up the data.
• If the cache goes down, the application should be able to fail over to the central site.
The Solution: Unique WAN Caching for Object Store – Available Only with Spectrum Scale
[Diagram: Site 1 – a Spectrum Scale cluster with protocol nodes (object enabled), limited storage, and applications ingesting object data – connected over an AFM independent-writer (IW) relationship, with cache eviction enabled on Site 1, to the central site – a Spectrum Scale cluster with protocol nodes (object enabled), unlimited storage, centralized analytics, and centralized backup; object data can be accessed as files using the unified file and object feature and used for analytics, and data can be centrally backed up to tape]

Spectrum Scale Feature | Requirement Addressed
AFM with Spectrum Scale Object | Allows the object store to have a thin cache with eviction enabled and a thick home
AFM in IW mode | Allows for failback and failover from the cache site to home, useful during a disaster
Unified file and object access with HDFS connector | Allows centralized, in-place analytics of data at the home site
Tape integration | Centralized backup
Thin Object Store Cache – Thick Object Store Archive
[Diagram: existing services ingest objects (1 TB/day) over the Swift API into a cache fileset on the Spectrum Scale cache cluster (Site #1, Region 1); an AFM independent-writer relationship replicates XX TB of data every day to the archive fileset on the Spectrum Scale home cluster in Region 2; applications fail over and fail back between the two Swift endpoints]
• Cache site in Region 1 with limited storage, and home site in Region 2 with maximum storage per data center
• Object data to be archived from the cache site in Region 1 to the home site in Region 2 using AFM-IW
• On cache failure, applications fail over to the home site for object access, and fail back when the cache comes up
• Limited storage on the cache site addressed by using eviction along with AFM
• Key features used in the solution: Spectrum Scale Object, AFM (IW) with eviction
• Available and documented in 4.2.1
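The cache side of this solution combines an independent-writer AFM fileset with eviction – a hedged sketch with placeholder names (the authoritative steps are in the 4.2.1 Knowledge Center topic on using AFM with Spectrum Scale Object):

```shell
# Create the thin-cache fileset in independent-writer (IW) mode with
# automatic eviction enabled, targeting the home cluster's export.
mmcrfileset fs1 objCache --inode-space new \
    -p afmTarget=nfs://homenode/gpfs/homefs/objHome \
    -p afmMode=independent-writer \
    -p afmEnableAutoEviction=yes

# Evict already-synchronized cached data on demand to reclaim space on
# the limited-capacity cache site.
mmafmctl fs1 evict -j objCache
```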
Multiple-Site Deployment
[Diagram: Spectrum Scale clusters for Region 1 and Region 2, each running existing services that ingest objects over the Swift API; each region has its own Swift API failover/failback relationship to its own home cluster in the central region (Region 3)]
One can include multiple sites, where each site has its own home cluster at the central region, replicating the setup shown in the previous slide for a single site.
Configuration Steps
• Detailed configuration steps are available in the 4.2.1 Knowledge Center under "Using AFM with Spectrum Scale Object":
 http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1ins_usingafmwithobject.htm
Conclusion
• Spectrum Scale provides a rich set of features:
 – AFM
 – Protocols: POSIX, SMB, NFS and Object
 – Unified file and object access
 – In-place analytics using built-in Hadoop connectors
• Integrating AFM with Spectrum Scale Object delivers the unique solution required by many multi-site deployments, wherein:
 – One can have a thin-cache object store with an auto-eviction facility closer to the applications or users
 – A centralized thick home object store can act as a failback object store for the thin cache sites
 – All data can be analyzed in place at the home site
 – A central backup can be taken at the home site
Spectrum Scale User Group
• The Spectrum Scale User Group is free to join and open to all using, interested in using, or integrating Spectrum Scale.
• Join the User Group activities to meet your peers and get access to experts from partners and IBM.
• Driven and owned by customers
• Next meetings:
 – APAC: October 14, Melbourne
 – Global at SC16: November 13, 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: [email protected]
Session: How to Apply Flash Benefits to Big Data Analytics and Unstructured Data (NDA & customers only)
• Who: IBM Elastic Storage Server Offering Management – Alex Chen
• When: Thursday, September 22, 2016, 1:15pm to 2:15pm
• Where: Grand Garden Arena, Lower Level, MGM, Studio 10
• Contact (if any questions)
Spectrum Scale Trial VM
• Download the IBM Spectrum Scale Trial VM from:
 http://www-03.ibm.com/systems/storage/spectrum/scale/trial.html
References
• Spectrum Scale 4.2.1 Knowledge Center: Using AFM with Object – http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1ins_usingafmwithobject.htm
• Spectrum Scale Object Store – Unified File and Object – http://www.slideshare.net/SandeepPatil154/spectrum-scaleexternalunifiedfile-object
• From Archive to Insight: Debunking Myths of Analytics on Object Stores – Dean Hildebrand, Bill Owen, Simon Lorenz, Luis Pabon, Rui Zhang. Vancouver Summit, Spring 2015 – https://www.youtube.com/watch?v=brhEUptD3JQ
• Deploying Swift on a File System – Bill Owen, Thiago Da Silva. Brown bag at OpenStack Paris, Fall 2014 – https://www.youtube.com/watch?v=vPn2uZF4yWo
• Breaking the Mold with OpenStack Swift and GlusterFS – John Dickinson, Luis Pabon. Atlanta Summit, Spring 2014 – https://www.youtube.com/watch?v=pSWdzjA8WuA
• SNIA SDC 2015 – http://www.snia.org/sites/default/files/SDC15_presentations/security/DeanHildebrand_Sasi__OpenStack%20SwiftOnFile.pdf
Notices and Disclaimers
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law
Notices and Disclaimers Con’t.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
IBM Spectrum Scale Summary
• Avoid vendor lock-in with true Software Defined Storage and open standards
• Seamless performance and capacity scaling
• Automate data management at scale
• Enable global collaboration

Data management at scale: OpenStack and Spectrum Scale help clients manage data at scale
• Business: "I need virtually unlimited storage" – an open and scalable cloud platform
• Operations: "I need a flexible infrastructure that supports both object- and file-based storage" – a single data plane that supports Cinder, Glance, Swift and Manila as well as NFS, SMB, et al.
• Operations: "I need to minimize the time it takes to perform common storage management tasks" – a fully automated, policy-based data placement and migration tool
• Collaboration: "I need to share data between people, departments and sites with low latency" – sharing with a variety of WAN caching modes

Results
• Converge file- and object-based storage under one roof
• Employ enterprise features to protect data, e.g. snapshots, backup, and disaster recovery
• Support native file, block and object access to data
[Diagram: a Spectrum Scale cluster serving NFS, SMB, POSIX, Swift, HDFS, Cinder, Glance and Manila from SSD, fast disk, slow disk, and tape]
Thank You