Post on 19-May-2015
Introduction To GlusterFS
A Better Way To Do Storage 2
How To Ask a Question?
Some Housekeeping Items…
Ask a question at any time
Questions will be answered at
the end of the webinar
Slides will be available after
the webinar
The webinar is being
recorded
Heather Wellington Marketing Manager
Gluster, Inc.
Tom Trainer Director
Product Marketing
Gluster, Inc.
Today’s Speakers
Poll Question
Are you using GlusterFS today? – Yes, in a test environment
– Yes, it's deployed in a production environment
– No, however we are considering it
– Just researching
History of Gluster
How it all started – Backgrounds in high performance, clustered computing
– Working at Lawrence Livermore National Labs
• AB Periasamy & Hitesh Chellani design “Thunder”
• One of the world's fastest supercomputers
• On Intel commodity hardware
• Solved filesystem scalability and performance limitations
– Large customer in oil & gas persuaded them to focus on storage
– Gluster founded by Hitesh & AB to bring technology to market
Result: award winning technology
Thunder
What is the Gluster File System?
Scale-out file storage software for – Network Attached Storage (NAS)
– Object
– Big Data / Analytics
GlusterFS provides – Flexibility to deploy in ANY environment
– High availability
– Unified files and objects
– File system for Apache Hadoop
– Scalability to Petabytes & beyond
– Linearly scalable performance
– Superior storage economics
August 23rd
GlusterFS Architecture Design Goals
Innovation – Eliminate metadata
– Dramatically improve time
– Unify files and objects
Elasticity – Flexibly adapt to growth/reduction
– Add, delete volumes & users
– Without disruption
Scale linearly – Multiple dimensions
• Performance
• Capacity
– Aggregated resources
Simplicity – Ease of management
– No complex Kernel patches
– Run in user space
Capacity
Performance
Key Differentiators
Software only
Open source
Modular, stackable storage OS architecture
Data stored in native formats
No metadata – Elastic hashing
Unified files and objects
Apache Hadoop file system compatible replacement
Filesystem runs in user space
Virtual Machine (VM) virtual motion enabler
Software Only - Future Proofing Storage
Hardware agnostic
Superior storage economics & flexibility – Data center / private cloud use commodity hardware
– Public cloud – e.g. AWS, Rackspace, GoGrid, Nimbula – pay only for what you need
No lock-in – Hardware vendors – at purchase time or in the future
– Any Cloud – Public, private, and hybrid
– Performance, capacity, or availability levels
– GlusterFS – not proprietary, files are stored in native formats (e.g. EXT4)
Open Source
200,000+ downloads – ~16,000 /month
550+ registered deployments – 45 countries
2,500+ registered users – Mailing lists, Forums, etc.
Active community – Diverse testing environments
– Bug identification and fixes
– Code contributions
Member of broader ecosystem – OpenStack, Linux Foundation, Open Virtualization Alliance
Global Adoption
Elastic Hashing
No metadata server
An algorithmic approach – Unique hash tag for each file stored
– Tags stored within the file system
– Rapid file read – low latency
Traditional Central Metadata Server
Traditional Distributed Metadata Server
Innovative Elastic Approach
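The idea of locating a file by computation rather than by asking a metadata server can be sketched roughly as follows. This is a simplified illustration only, not GlusterFS's actual hashing algorithm: the MD5 stand-in hash, the 32-bit hash space, the equal-sized ranges, and the node names are all assumptions made for the example.

```python
import hashlib
from typing import Sequence

HASH_SPACE = 2 ** 32  # assumed size of the hash space for this sketch


def file_hash(name: str) -> int:
    """Map a file name to a 32-bit integer (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def locate(name: str, nodes: Sequence[str]) -> str:
    """Return the node whose equal-sized hash range contains the file's hash.

    Every client can run this calculation on its own, so no central
    lookup service is consulted.
    """
    index = file_hash(name) * len(nodes) // HASH_SPACE
    return nodes[index]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical storage nodes
for name in ["song.mp3", "video.mp4", "report.pdf"]:
    print(name, "->", locate(name, nodes))
```

Because the placement is a pure function of the file name and the node list, any client that knows the node list reaches the same answer without a lookup round trip.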
A Standard Gluster Deployment
Gluster Global Namespace (HTTP, NFS, CIFS, Gluster Native) Application Data Objects
Clients/Apps Clients/Apps Clients/Apps
IP Network
VMs
VMDK VMDK
virtual storage pool
Apache™ Hadoop™
Standard clients running standard apps
Over any standard IP network
Access to application data, as files and folders and/or objects, in a global namespace, using a variety of standard protocols
Stored in a commoditized, virtualized, scale-out, centrally managed pool of DAS, SAN, NAS
Unified On-premise, Private and Public Cloud Storage
Client/Apps
Data Center / Private Cloud
Public Cloud
Replication
Gluster Global Namespace
IP Network
Deployment Scenarios Common Solutions Built on GlusterFS
Media serving (CDN)
Large scale file storage
Tier 2 & 3 archive
File sharing
High Performance Computing (HPC) storage
IaaS storage layer
Disaster recovery
Backup & restore
Private cloud
Introducing GlusterFS 3.3 Beta 1
Next generation file and object storage – The first system for data storage that enables you to store and access data as an object and as a file
– Flexible and powerful, it simplifies access and management of data
– Eases migration of legacy, file-based applications to object storage for use in the cloud
Public beta availability: July 20, 2011 – Broad community testing and participation
– Selected enterprise customer engagements
The “Traditional” Unified Approach
Proprietary
Bolt-on hardware approach
– Combined hardware raises costs
– Higher TCO
– Paying for what you may not need
Increased risk
– Common hardware elements can fail
• Power supplies
• Fans
• Cabling…lots of cabling
Files Objects DB’s
Carved Up Storage Pool
NAS Object
Traditional Monolithic Hardware
Bolt-on Approach (e.g. EMC VNX)
Block
“VNX reminds me of my old VHS, DVD and cable box… one thing fails and I’m blown out of the water.”
Beta Customer, 2011
Software Approach to File and Object
Network Attached Storage (NAS)
– NFS / CIFS / GlusterFS
– POSIX compliant
– Access files within objects
Windows Access
– Improves Windows performance
– Uses HTTP, not slower CIFS
– We will still support SAMBA
Object Storage
– API
– Internet Protocol (IP)
– RESTful
– Get/Put
– Buckets
– Objects seen as files
Standards based
– Amazon S3 RESTful interface compatible
– Access data as objects, plus a NAS interface to access files (NFS, CIFS, GlusterFS)
High performance storage across heterogeneous server environments
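The "objects seen as files" idea can be illustrated with a toy model in which a bucket is a directory and an object key is a path beneath it. This is a sketch under stated assumptions, not the GlusterFS implementation or the S3 API: the put_object and get_object helpers and the directory layout are hypothetical, and a temporary directory stands in for a mounted volume.

```python
import os
import tempfile


def object_path(mount: str, bucket: str, key: str) -> str:
    """Hypothetical layout: <mount>/<bucket>/<key>."""
    return os.path.join(mount, bucket, key)


def put_object(mount: str, bucket: str, key: str, data: bytes) -> None:
    """Object-style write: store the bytes under the bucket and key."""
    path = object_path(mount, bucket, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def get_object(mount: str, bucket: str, key: str) -> bytes:
    """Object-style read: fetch the bytes stored under the bucket and key."""
    with open(object_path(mount, bucket, key), "rb") as f:
        return f.read()


mount = tempfile.mkdtemp()  # stands in for a mounted volume
put_object(mount, "photos", "2011/beach.jpg", b"jpeg bytes")

# File-style access: the same data is visible as an ordinary file.
with open(os.path.join(mount, "photos", "2011", "beach.jpg"), "rb") as f:
    file_view = f.read()

print(file_view == get_object(mount, "photos", "2011/beach.jpg"))
```

In this model a legacy application that only understands files and a newer application that only understands Get/Put both operate on the same stored bytes, which is the property the slide describes.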
GlusterFS 3.3 Unified File & Object Storage
Widely deployable and extremely flexible – On-premise, virtualized and in public and private clouds
– Deep unification of file and object data storage
• Not just unified at the management layer
• Not a bolt-together hardware product
– Access data within objects as files
– Compatible with Amazon Web Services
• S3
• Create S3 on EC2 and EBS
– Back up objects from the data center to AWS
– Enable S3 functionality in the data center
– Built to run on commodity hardware
GlusterFS 3.3 Easing & Accelerating Legacy App Migration
Enables cloudification of applications
– Removes remaining storage hurdles related to file-only access
– Allows for gradual app migration to the cloud
– Enables moves to both private and public cloud infrastructures
GlusterFS 3.3
Object Storage
Data Center, Virtual
Public, Private Cloud
Unified file & object storage accelerates legacy app migration to the cloud
Enterprise Apps / Enterprise Data Center
Object Info
Gluster 3.3 Unified File & Object – Use Cases
Data Center – Take control of cloud services
– Reduce AWS S3 costs
– Deliver S3-like global services in-house
– Legacy application migration
IaaS – Deliver File and S3 Services
– Unified file and object
• Competitive differentiator
– Drastically reduce storage costs
– Increase offerings, revenues and margins
Traditional Data Center
Private Cloud
IaaS
S3
Data Center: S3 in-house and/or integrate with S3
IaaS: Deliver S3 to clients
Introducing GlusterFS Compatibility for Apache Hadoop
Enhancement to GlusterFS providing a new file system option for Apache Hadoop – Proven scale-out storage solution provides simultaneous file and object access within Hadoop
– Introduces a 4th storage option for Hadoop (HDFS, local disk, Kosmos)
Included in GlusterFS 3.3 beta 2 available August 23, 2011 – Community beta for testing and participation
– Select enterprise customer engagements
Requirements driven by community and customer requests – “Eliminate the 64MB fixed block size imposed by HDFS”
– “Eliminate the centralized metadata server”
– “Give us NAS” (via POSIX compliance)
Benefits – Out of the box compatibility with MapReduce applications, no rewrite required
– Enables organizations to unify data storage
– Flexible and powerful, it simplifies access and management of data
MapReduce
HDFS
Seamless Integration for Hadoop Deployments
GlusterFS can co-exist with HDFS
Does NOT use the NameNode metadata server
Built using the Hadoop file system API – Requires simple configuration file changes
– C library Gluster client enables direct Gluster access
– Java Client
• JNI interface
– gluster_hadoop.jar
• Provides Java binding for Hadoop compatibility
MapReduce
HDFS
NameNode Metadata Server
MapReduce
GlusterFS
Seamless Integration for Hadoop Deployments
Co-exists with or replaces HDFS
Ultimately eliminate the need for the NameNode
Faster access times – faster filesystem
All the features and benefits of GlusterFS
Metadata
Server NameNode
MapReduce MapReduce MapReduce MapReduce MapReduce
GlusterFS GlusterFS GlusterFS GlusterFS HDFS GlusterFS
Metadata
Server NameNode
Seamless Integration for Hadoop Deployments
NameNode metadata server eliminated
Faster access times – faster filesystem
All the features and benefits of GlusterFS
MapReduce MapReduce MapReduce MapReduce MapReduce
GlusterFS GlusterFS GlusterFS GlusterFS GlusterFS
Why It’s Different?
No metadata server – No single point of failure, automated self heal and failover
– No performance bottleneck on data lookups for fast file access
Built in replication – Synchronous for inter-node replication
– Asynchronous for geo-replication
No block size restrictions – Ideal for small and large files
POSIX compliant file system – Out of the box NFS, CIFS and Gluster native access
Expanded data access options – File and object access to data
– Access files from your object interface and access data within objects as files
– File based applications can access data without modification
Reduces requirement for replicated files from 3 to 2 – 33% capacity savings
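The 33% figure follows from simple arithmetic on raw capacity, worked through below. The 100 TB of usable data is a hypothetical number chosen only to make the calculation concrete.

```python
def raw_capacity_tb(usable_tb: float, replicas: int) -> float:
    """Raw storage needed to hold `usable_tb` of data with `replicas` copies."""
    return usable_tb * replicas


usable_tb = 100.0  # hypothetical amount of data to store

with_three = raw_capacity_tb(usable_tb, 3)  # 300 TB raw
with_two = raw_capacity_tb(usable_tb, 2)    # 200 TB raw

# Fraction of raw capacity no longer needed when going from 3 copies to 2.
savings = (with_three - with_two) / with_three
print(f"{with_three:.0f} TB -> {with_two:.0f} TB raw, saving {savings:.0%}")
```

The saving is one copy out of three, so it stays at one third regardless of how much data is stored.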
Potential Uses for GlusterFS and Hadoop
Simplify and unify storage deployments – Centralized data store providing access to more applications
Provide users with file-level access to data – Users can easily browse data using off-the-shelf tools
Enable legacy applications to access data via NFS – Analytic apps can access data without modification
Enable object-based access to data – Modern applications can use object-based access to data
User Space
Filesystem Runs in User Space
GlusterFS
Server (CPU/Mem)
1 TB
Kernel
Not tied to kernel
No reassemblies
Independence
The Gluster Connector for OpenStack – July 27, 2011
SWIFT
Enables GlusterFS to be the underlying file system
Connects GlusterFS to the Xen and KVM hypervisors
– Unified File and Object storage
– Highly-available, scale-out NAS
– Alternative to SWIFT
OpenStack Imaging Services
Unified File &
Object Storage
… Compute
API Layer
Mobile Apps. Web Clients. Enterprise Software Ecosystem
OpenStack Prior to Gluster
OpenStack with Gluster
The Gluster Connector for OpenStack – July 27, 2011
Connector enables GlusterFS to be chosen as the filesystem
– Provides:
• Unified File and Object storage
• Highly scalable NAS
• High Availability – synchronous and asynchronous replication
• Preferred, scalable alternative to SWIFT
• Virtual motion of virtual machines (a.k.a. vmotion)
GlusterFS
Server (CPU/Mem)
Hypervisor
VM
Virtual storage pool
Pandora Internet Radio
Problem • Explosive user & title growth
• As many as 12 file formats for each song
• 'Hot' content and long tail
Solution • Three data centers, each with a six-node GlusterFS cluster
• Replication for high availability
• 250+ TB total capacity
Benefits • Easily scale capacity
• Centralized management; one administrator to manage day-to-day operations
• No changes to application
• Higher reliability
• 1.2 PB of audio served per week
• 13 million files
• Over 50 GB/sec peak traffic
Brightcove
Problem • Cloud-based online video platform
• Explosive customer & title growth
• Massive video in multiple locations
• Costs rising, esp. with HD formats
Solution • Complete scale-out based on commodity DAS/JBOD
• Replication for high availability
• 1 PB total capacity
Benefits • Easily scale capacity
• Centralized management; one administrator to manage day-to-day operations
• Higher reliability
• Path to multi-site
• Over 1 PB currently in Gluster
• Separate 4 PB project in the works
Cincinnati Bell Technology Solutions
Problem • Host a dedicated enterprise cloud solution
• Large scale VMware environment
• Need high availability
Solution • Gluster for VM storage, NFS to clients
• SAS drives on back-end
• Replication for high availability
Benefits • Storage provisioning from 6 wks to 15 min.
• Vendor agnostic storage
• Low cost of service delivery
• Elastic growth
• Large scale VM storage
• Low cost service delivery for enterprise customer
• Drastic reduction in provisioning time
Problem • Capacity growth from 144TB to 1+PB
• Multiple distributed users/departments
• Multi OS access - Windows, Linux and Unix
Solution • GlusterFS Cluster
• Solaris/ZFS/x4500 w/ InfiniBand
• Native CIFS/ NFS access
Benefits • Capacity on demand / pay as you grow
• Centralized management
• Higher reliability
• OPEX decreased by 10X
Partners Healthcare
• Over 500 TB
• 9 Sun “Thumper” systems in cluster
Private Cloud: Centralized Storage as a Service
Gluster Enterprise Deployment Options
Storage Software Appliance – Deploy on bare metal
– Any hardware on the Red Hat Hardware Compatibility List (HCL)
Virtual Machines – Deployable on the leading virtual machines
Amazon Web Services (AWS) – Runs within Amazon Machine Image (AMI)
RightScale Cloud Management – GlusterFS managed via a RightScale ServerTemplate
– Deployable via the RightScale Cloud Management Dashboard
GoGrid Cloud – Gluster Server Image (GSI) for scale-out NAS on GoGrid cloud
On-premise/datacenter
Virtualization/private cloud
Public cloud
Many Enterprises Rely on Gluster Now
Summary
GlusterFS is scale-out storage – NAS
– Object
– Big Data / Analytics
Flexibility, scalability, superior economics
OpenStack cloud – Unified file and object storage
– Virtual machine (VM) virtual motion
Innovative architecture provides a better way to do storage
Your turn - ask our experts
Questions and Answers
Try Gluster for free here: http://www.gluster.com/trybuy/
Additional resources here: http://www.gluster.com/products/resources/
Join the community: http://www.gluster.org/
Follow on Twitter: @gluster
Read our blog: http://blog.gluster.com/
Contact us at: info@gluster.com or 1-800-805-5215