Scalable POSIX File Systems in the Cloud

Date post: 15-Apr-2017
Upload: redhatstorage
SCALABLE POSIX FILE SYSTEMS IN THE CLOUD Jason Callaway @jasoncallaway | blog.jasoncallaway.com January 2016
Transcript
Page 1: Scalable POSIX File Systems in the Cloud

SCALABLE POSIX FILE SYSTEMS IN THE CLOUD

Jason Callaway
@jasoncallaway | blog.jasoncallaway.com
January 2016

Page 2: Scalable POSIX File Systems in the Cloud

THE RED HAT STORAGE MISSION

To offer a unified, open software-defined storage portfolio that delivers a range of data services for next generation workloads, thereby accelerating the transition to modern IT infrastructures.

Page 3: Scalable POSIX File Systems in the Cloud

THE FUTURE OF STORAGE

Traditional Storage: complex proprietary silos
Each silo stacks a custom GUI on proprietary software on proprietary hardware, with its own admins and users.

Open, Software-Defined Storage: standardized, unified, open platforms
A control plane (API, GUI) for admins and users sits on open source software (Ceph, Gluster, +++), which runs on standard hardware (standard computers and disks).

Page 4: Scalable POSIX File Systems in the Cloud

WHY BOTHER?

Instead of proprietary hardware: common, off-the-shelf hardware (lower cost, standardized supply chain)
Instead of scale-up architecture: scale-out architecture (increased operational flexibility)
Instead of hardware-based intelligence: software-based intelligence (more programmability, agility, and control)
Instead of a closed development process: an open development process (more flexible, well-integrated technology)

Page 5: Scalable POSIX File Systems in the Cloud

A RISING TIDE

Software-Defined Storage is leading a shift in the global storage industry, with far-reaching effects.

“By 2020, between 70-80% of unstructured data will be held on lower-cost storage managed by SDS environments.”

“By 2019, 70% of existing storage array products will also be available as software only versions”

“By 2016, server-based storage solutions will lower storage hardware costs by 50% or more.”

Sources: Gartner: “IT Leaders Can Benefit From Disruptive Innovation in the Storage Industry”; Innovation Insight: Separating Hype From Hope for Software-Defined Storage

SDS-P MARKET SIZE BY SEGMENT

Segments: Block Storage, File Storage, Object Storage, Hyperconverged

Market size is projected to increase approximately 20% year-over-year between 2015 and 2019.

Year   Market size
2013   $457B
2014   $592B
2015   $706B
2016   $859B
2017   $1,029B
2018   $1,195B
2019   $1,349B

Source: IDC

Page 6: Scalable POSIX File Systems in the Cloud

THE JOURNEY

Open Software-Defined Storage is a fundamental reimagining of how storage infrastructure works. It provides substantial economic and operational advantages, and it has quickly become ideally suited for a growing number of use cases.

Use cases along the timeline from TODAY through EMERGING to FUTURE: Cloud Infrastructure, Cloud Native Apps, Analytics, Hyper-Convergence, Containers, ???, ???

Page 7: Scalable POSIX File Systems in the Cloud

THE RED HAT STORAGE PORTFOLIO

Page 8: Scalable POSIX File Systems in the Cloud

THE RED HAT STORAGE PORTFOLIO

Open source software: Ceph management, Gluster management, Ceph data services, Gluster data services
Standard hardware

• Share-nothing, scale-out architecture provides durability and adapts to changing demands
• Self-managing and self-healing features reduce operational overhead
• Standards-based interfaces and full APIs ease integration with applications and systems
• Supported by the experts at Red Hat

Page 9: Scalable POSIX File Systems in the Cloud

GROWING INNOVATION COMMUNITIES

• Over 11M downloads in the last 12 months
• Increased development velocity, authorship, and discussion have resulted in rapid feature expansion.
• Contributions from Intel, SanDisk, SUSE, and DTAG.
• Presenting Ceph Days in cities around the world and quarterly virtual Ceph Developer Summit events.

78 AUTHORS/mo, 1500 COMMITS/mo, 258 POSTERS/mo

41 AUTHORS/mo, 259 COMMITS/mo, 166 POSTERS/mo

Page 10: Scalable POSIX File Systems in the Cloud

PARTNER SOLUTIONS

All-flash arrays, optimized for Ceph
SanDisk sells the InfiniFlash storage arrays, designed for use with Red Hat Ceph Storage. Optimizations contributed by SanDisk deliver high performance, which allows Ceph customers to service new workloads.
Our relationship includes:
• Engineering and product collaboration
• Community thought leadership

Systems designed with storage in mind
Supermicro's Red Hat Ceph Storage optimized solutions offer durable, software-defined, scale-out storage platforms in 1U/2U/4U form factors and are designed to maximize performance, density, and capacity.
Customers can expect to see:
• Reference architectures, validated for performance, density, and capacity
• Whitepapers and datasheets that support Red Hat Storage solutions

Accelerating software-defined storage
Through silicon innovation and software optimization, Intel pushes the envelope on open, software-defined storage. A key contributor, Intel recently donated significant hardware to the Ceph project.
Intel development efforts have included:
• SSD and performance optimizations
• CephFS development

Page 11: Scalable POSIX File Systems in the Cloud

REFINEMENTS FOR PETABYTE-SCALE OPERATORS

Optimized for large-scale deployments
Version 1.3 of Red Hat Ceph Storage is the first major release since Ceph joined the Red Hat Storage product portfolio, and incorporates feedback from customers who have deployed in production at large scale.
Areas of improvement:
• Robustness at scale
• Performance tuning
• Operational efficiency

Enhanced for flexibility and performance
Version 3.1 of Red Hat Gluster Storage contains many new features and capabilities intended to bolster data protection, performance, security, and client compatibility.
New capabilities include:
• Erasure coding
• Tiering
• Bit rot detection
• NFSv4 client support

Page 12: Scalable POSIX File Systems in the Cloud

RED HAT GLUSTER STORAGE

Page 13: Scalable POSIX File Systems in the Cloud

OVERVIEW: RED HAT GLUSTER STORAGE

Nimble file storage for petabyte-scale workloads

• Purpose-built as a scale-out file store with a straightforward architecture suitable for public, private, and hybrid cloud
• Simple to install and configure, with a minimal hardware footprint
• Offers mature NFS, SMB and HDFS interfaces for enterprise use

TARGET USE CASES

Analytics
• Machine analytics with Splunk
• Big data analytics with Hadoop

Enterprise File Sharing
• Media Streaming
• Active Archives

Enterprise Virtualization

Rich Media & Archival

Customer Highlight: Intuit
Intuit uses Red Hat Gluster Storage to provide flexible, cost-effective storage for their industry-leading financial offerings.

Page 14: Scalable POSIX File Systems in the Cloud

OVERVIEW: TERMINOLOGY

Brick: basic unit of storage, represented by an export directory on a server in the trusted storage pool.

Cluster: a group of linked computers, working together closely thus in many respects forming a single computer.

FUSE: Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code.

Geo-Replication: provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and across the Internet.

Page 15: Scalable POSIX File Systems in the Cloud

OVERVIEW: TERMINOLOGY

Metadata: defined as data providing information about one or more other pieces of data. There is no special metadata storage concept in GlusterFS. The metadata is stored with the file data itself.

Namespace: an abstract container or environment created to hold a logical grouping of unique identifiers or symbols. Each Gluster volume exposes a single namespace as a POSIX mount point that contains every file in the cluster.

Volume: a logical collection of bricks. Most of the gluster management operations happen on the volume.

Page 16: Scalable POSIX File Systems in the Cloud

OVERVIEW: GLUSTER ARCHITECTURE

Page 17: Scalable POSIX File Systems in the Cloud

OVERVIEW: VOLUME PERMUTATIONS

Distributed -- Namespace is distributed horizontally across n bricks

Replicated -- Namespace is synchronously replicated to an identical namespace

Striped -- Namespace stripes data across bricks

Dispersed -- Namespace uses Erasure Coding

Layouts available at each redundancy level:

No Redundancy: Distributed, Striped, Distributed-Striped, Geo-replicated
Replicated (synchronous replication): Distributed, Striped, Distributed-Striped, Geo-replicated
Dispersed (erasure codes): Distributed, Geo-replicated
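To make the difference between the distributed and replicated layouts concrete, here is a minimal Python sketch. It is only an illustration: the brick names are hypothetical, and the simple hash-modulo placement is an assumption chosen for clarity, not GlusterFS's actual hashing algorithm.

```python
import hashlib

# Hypothetical brick names; real bricks are export directories on servers.
BRICKS = [
    "server1:/bricks/b1",
    "server2:/bricks/b1",
    "server3:/bricks/b1",
    "server4:/bricks/b1",
]


def hash_index(filename, n):
    """Map a file name to a stable index in range(n) (illustrative hashing only)."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def place_distributed(filename, bricks):
    """Distributed: each file lives on exactly one brick."""
    return [bricks[hash_index(filename, len(bricks))]]


def place_replicated(filename, bricks):
    """Replicated: every brick holds a full copy of each file."""
    return list(bricks)


def place_distributed_replicated(filename, bricks, replica=2):
    """Distributed-replicated: hash picks one replica set; all its bricks get a copy."""
    sets = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
    return sets[hash_index(filename, len(sets))]


if __name__ == "__main__":
    for name in ("report.pdf", "video.mp4"):
        print(name, "->", place_distributed_replicated(name, BRICKS))
```

The same file name always maps to the same brick or replica set, which is what lets clients locate a file without a separate metadata server.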

Page 18: Scalable POSIX File Systems in the Cloud

ERASURE CODING

Storing more data with less hardware

Standard replicated back-ends are very durable, and can recover very quickly, but they have an inherently large capacity overhead. Erasure coding back-ends reconstruct corrupted or lost data by using information about the data stored elsewhere in the system. Providing failure protection with erasure coding eliminates the need for RAID, consumes far less space than replication, and can be appropriate for capacity-optimized use cases.

[Diagram: an object/file split into fragments 1, 2, 3, 4, X, Y and stored in an erasure coded pool/volume on the storage cluster]

Up to 75% reduction in TCO
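The capacity argument can be checked with a little arithmetic. The sketch below compares the usable fraction of raw capacity under replication and under erasure coding. The 4+2 layout is an assumption based on reading the "1 2 3 4 X Y" diagram as four data fragments plus two redundancy fragments; the slide does not state the layout explicitly.

```python
def usable_fraction_replicated(replica_count):
    """With N full copies, only 1/N of raw capacity holds unique data."""
    return 1.0 / replica_count


def usable_fraction_dispersed(data_fragments, redundancy_fragments):
    """With k data + m redundancy fragments, k/(k+m) of raw capacity is usable."""
    return data_fragments / (data_fragments + redundancy_fragments)


def raw_needed_tb(usable_tb, usable_fraction):
    """Raw capacity required to store a given amount of user data."""
    return usable_tb / usable_fraction


if __name__ == "__main__":
    # Assumed layouts for illustration: 2-way replication vs. 4+2 erasure coding.
    for label, fraction in (
        ("replica 2", usable_fraction_replicated(2)),
        ("replica 3", usable_fraction_replicated(3)),
        ("dispersed 4+2", usable_fraction_dispersed(4, 2)),
    ):
        print(f"{label}: {fraction:.0%} usable, "
              f"{raw_needed_tb(100, fraction):.0f} TB raw for 100 TB of data")
```

Under these assumptions, a 4+2 dispersed volume tolerates the loss of two fragments while needing 150 TB of raw capacity per 100 TB of data, compared with 200 TB for two-way replication.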

Page 19: Scalable POSIX File Systems in the Cloud

TIERING

Cost-effective flash acceleration

Optimally, infrequently accessed data is served from less expensive storage systems while frequently accessed data can be served from faster, more expensive ones. However, manually moving data between storage tiers can be time-consuming and expensive. Red Hat Gluster Storage 3.1 now supports automated promotion and demotion of data between “hot” and “cold” subvolumes based on frequency of access.

[Diagram: an object/file in a storage cluster with a hot subvolume (flash, replicated) and a cold subvolume (rotational, erasure coded)]
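The promotion and demotion idea can be sketched in a few lines of Python. This is a simplified model, not Gluster's tiering implementation: the thresholds, the function name, and the use of plain access counts are all assumptions made for illustration.

```python
def retier(access_counts, hot, cold, promote_at=3, demote_below=1):
    """Return new (hot, cold) sets after one tiering pass.

    access_counts: dict mapping file name -> accesses in the last window.
    promote_at / demote_below: hypothetical thresholds, not Gluster defaults.
    """
    # Cold files accessed often enough move up to the hot tier.
    promote = {f for f in cold if access_counts.get(f, 0) >= promote_at}
    # Hot files that were barely touched move down to the cold tier.
    demote = {f for f in hot if access_counts.get(f, 0) < demote_below}
    new_hot = (hot - demote) | promote
    new_cold = (cold - promote) | demote
    return new_hot, new_cold


if __name__ == "__main__":
    hot = {"index.db", "old-report.pdf"}
    cold = {"trailer.mp4", "archive-2012.tar"}
    counts = {"index.db": 5, "trailer.mp4": 4}
    new_hot, new_cold = retier(counts, hot, cold)
    print("hot: ", sorted(new_hot))
    print("cold:", sorted(new_cold))
```

Computing both move sets before applying them keeps a file from being promoted and demoted in the same pass.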

Page 20: Scalable POSIX File Systems in the Cloud

BIT ROT DETECTION

Detection of silent data corruption

Bit rot detection is a mechanism that detects data corruption caused by silent hardware failures, which degrades both performance and data integrity. Red Hat Gluster Storage 3.1 provides a mechanism to scan data periodically and detect bit rot. Using the SHA256 algorithm, checksums are computed when files are accessed and compared against previously stored values. If they do not match, an error is logged for the storage admin.

[Diagram: a scan over stored data finds one corrupted block and raises an alert to the admin]
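The checksum comparison described on this slide can be sketched with Python's standard `hashlib`. This is a minimal illustration of the technique, not Gluster's implementation: the function names and the in-memory signature dictionary are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sign_files(paths):
    """Record a checksum for each file (hypothetical in-memory signature store)."""
    return {p: sha256_of(p) for p in paths}


def scrub(signatures):
    """Recompute checksums and return the paths whose content no longer matches."""
    return [p for p, expected in sorted(signatures.items())
            if sha256_of(p) != expected]


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"important data")
        sigs = sign_files([path])
        with open(path, "wb") as f:
            f.write(b"important dat\x00")  # simulate silent corruption
        for bad in scrub(sigs):
            print("bit rot detected:", bad)
```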

Page 21: Scalable POSIX File Systems in the Cloud

SECURITY

Scalable and secure NFSv4 client support

Using NFS-Ganesha, an NFS server implementation, Red Hat Gluster Storage 3.1 provides client access with simplified failover and failback in the case of a node or network failure. Supporting both NFSv3 and NFSv4 clients, NFS-Ganesha introduces ACLs for additional security, Kerberos authentication, and dynamic export management.

[Diagram: four clients connect over NFS to two NFS-Ganesha servers in front of the storage cluster]

Page 22: Scalable POSIX File Systems in the Cloud

OVERVIEW: MAXIMUMS

• 64 nodes per cluster
• 8 volumes per LVM RAID
• XFS max recommended size for brick
  • 100 TB certified / 8 EB maximum on RHEL 6
  • 500 TB certified / 8 EB maximum on RHEL 7
• 16 PB usable per distributed-replicated cluster
  • (500 TB * 64 nodes / 2 replication factor)
• ~4 PB with EC2 16 TB EBS volumes
• http://blog.gluster.org/category/performance/
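The usable-capacity figures on this slide follow from simple arithmetic, sketched below in Python. The first calculation uses only numbers from the slide. The second needs an assumption: the slide does not say how many EBS volumes back each node, and eight 16 TB volumes per node is one reading that reproduces the ~4 PB figure.

```python
def usable_capacity_tb(brick_tb, nodes, replica_count):
    """Usable capacity of a distributed-replicated cluster with one brick per node."""
    return brick_tb * nodes / replica_count


# From the slide: 500 TB certified brick size, 64 nodes, replication factor 2.
certified_tb = usable_capacity_tb(500, 64, 2)

# ASSUMPTION (not stated on the slide): eight 16 TB EBS volumes per node.
EBS_VOLUME_TB = 16
EBS_VOLUMES_PER_NODE = 8
ec2_tb = usable_capacity_tb(EBS_VOLUME_TB * EBS_VOLUMES_PER_NODE, 64, 2)

if __name__ == "__main__":
    print(f"certified: {certified_tb:.0f} TB = {certified_tb / 1000:.0f} PB")
    print(f"EC2 (assumed layout): {ec2_tb:.0f} TB, roughly {ec2_tb / 1000:.1f} PB")
```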

Page 23: Scalable POSIX File Systems in the Cloud

RED HAT GLUSTER STORAGE: TARGET WORKLOADS

Page 24: Scalable POSIX File Systems in the Cloud

WHAT TOOL DO I USE?

Use Case                                Gluster        Ceph
File (NFS, CIFS, FUSE Native Client)    Works great!   Tech-preview in June
OLTP-like                               Maybe          Works great!
Block                                   Nope           Works great!
Object                                  Works great!   Works great!
Cloud                                   Works great!   Nope
Big Data                                Works great!   Works great!
Geo-replication                         Works great!   It’s complicated

Page 25: Scalable POSIX File Systems in the Cloud

FOCUSED SET OF USE CASES

ANALYTICS
• Big Data analytics with Hadoop
• Machine data analytics with Splunk

CLOUD INFRASTRUCTURE
• Virtual machine storage with OpenStack
• Object storage for tenant applications

RICH MEDIA AND ARCHIVAL
• Cost-effective storage for rich media streaming
• Active archives

SYNC AND SHARE
• File sync and share with ownCloud

ENTERPRISE VIRTUALIZATION
• Storage for conventional virtualization with RHEV

Page 26: Scalable POSIX File Systems in the Cloud

BIG DATA ANALYTICS

In-place Hadoop analytics in a POSIX compatible environment

[Diagram: the Hadoop MapReduce framework running over a Red Hat Gluster Storage cluster in place of the Hadoop Distributed File System]

FEATURES
• Allows the Hortonworks Data Platform 2.1 to be deployed on Red Hat Gluster Storage
• Hadoop tools can operate on data in-place
• Access to the Hadoop ecosystem of tools
• Access to non-Hadoop analytics tools
• Consistent operating model: Hadoop can run directly on Red Hat Gluster Storage nodes

BENEFITS
• Flexible, unified enterprise big data repository
• Better analytics (Hadoop and non-Hadoop)
• Familiar POSIX-compatible file system and tools
• Start small, scale as big data needs grow
• Multi-volume support (HDFS is single-volume)
• Unified management (Hortonworks HDP Ambari and Red Hat Gluster Storage)

Page 27: Scalable POSIX File Systems in the Cloud

MACHINE DATA ANALYTICS

High-performance, scale-out, online cold storage for Splunk Enterprise

Hot/warm data optimized for performance: 10s of TB on Splunk server DAS
Cold data optimized for cost, capacity and elasticity: Red Hat Storage Server on commodity x86 servers

FEATURES
• Multiple ingest options using NFS & FUSE
• Expand storage pools without incurring downtime
• Support for both clustered and non-clustered configurations

BENEFITS
• Run high speed indexing and search on Splunk’s cold data store
• Pay as you grow economics for Splunk cold data
• Reduce ingestion time for data with standard protocols
• “Always online”, fast, disk-based storage pools provide constant access to historical data

Page 28: Scalable POSIX File Systems in the Cloud

RICH MEDIA

Massively-scalable, flexible, and cost-effective storage for image, video, and audio content

[Diagram: unstructured image, video, and audio content stored on a Red Hat Gluster Storage cluster and a Red Hat Ceph Storage cluster]

FEATURES
• Support for multi-petabyte storage clusters on commodity hardware
• Erasure coding and replication for capacity-optimized or performance-optimized pools
• Support for standard file & object protocols
• Snapshots and replication capabilities for high availability and disaster recovery

BENEFITS
• Provides massive and linear scalability in on-premise or cloud environments
• Offers robust data protection with an optimal blend of price & performance
• Standard protocols allow access to broadcast content anywhere, on any device
• Cost-effective, high performance storage for on-demand rich media content

Page 29: Scalable POSIX File Systems in the Cloud

ACTIVE ARCHIVES

Open source, capacity-optimized archival storage on commodity hardware

[Diagram: unstructured file data, unstructured object data, and volume backups stored on a Red Hat Gluster Storage cluster and a Red Hat Ceph Storage cluster]

FEATURES
• Cache tiering to enable "temperature"-based storage
• Erasure coding to support archive and cold storage use cases
• Support for industry-standard file and object access protocols

BENEFITS
• Store data based on its access frequency
• Store data on premise or in a public or hybrid cloud
• Achieve durability while reducing raw capacity requirements and limiting cost
• Deploy on industry-standard hardware

Page 30: Scalable POSIX File Systems in the Cloud

FILE SYNC AND SHARE

Powerful, software-defined, scale-out, on-premise storage for file sync and share with ownCloud

[Diagram: web browser, mobile application, and desktop OS clients connect to ownCloud Enterprise Edition, backed by Red Hat Gluster Storage]

FEATURES
• Secure file sync and share with enterprise-grade auditing and accounting
• Combined solution of Red Hat Gluster Storage, ownCloud, HP ProLiant SL4550 Gen 8 servers
• Deployed on-premise, managed by internal IT
• Access sync and share data from mobile devices, desktop systems, web browsers

BENEFITS
• Secure collaboration with consumer-grade ease of use
• Lower risk by storing data on-premise
• Conform to corporate data security and compliance policies
• Lower total cost of ownership with standard, high-density servers and open source

Page 31: Scalable POSIX File Systems in the Cloud

ENTERPRISE VIRTUALIZATION

Scalable, reliable storage for Red Hat Enterprise Virtualization

FEATURES
• Reliably store virtual machine images in a distributed Red Hat Gluster Storage volume
• Manage storage through the RHEV-M console
• Deploy on standard hardware of choice
• Seamlessly grow and shrink storage infrastructure when demand changes

BENEFITS
• Reduce operational complexities by eliminating dependency on complex and expensive SAN infrastructures
• Deploy efficiently on less expensive, easier to provision, standard hardware
• Achieve centralized visibility and control of server and storage infrastructure

Page 32: Scalable POSIX File Systems in the Cloud

ROADMAP DETAIL

Page 33: Scalable POSIX File Systems in the Cloud

ROADMAP: RED HAT GLUSTER STORAGE

TODAY (v3.1): Gluster 3.7, RHEL 6, 7
MGMT: Device Management; Geo-Replication, Snapshots; Dashboard
CORE: Erasure Coding; Tiering; Bit Rot Detection; Snap Schedule
FILE: Active/Active NFSv4; SMB 3 (basic subset)
SEC: SELinux; SSL encryption (in-flight)

V3.2 (H1-2016): Gluster 3.8, RHEL 6, 7
MGMT: Dynamic provisioning of resources
CORE: Inode quotas; Faster Self-heal; Controlled Rebalance
FILE: SMB 3 (advanced features); Multi-channel
SEC: At-rest encryption

FUTURE (v4.0 and beyond): Gluster 4, RHEL 7
MGMT: New UI; Gluster REST API
CORE: Compression; Deduplication; Highly scalable control plane; Next-gen replication/distribution
FILE: pNFS
PERF: QoS; Client side caching

Page 34: Scalable POSIX File Systems in the Cloud

DETAIL: RED HAT GLUSTER STORAGE 3.1

These features were introduced in the most recent release of Red Hat Gluster Storage, and are now supported by Red Hat.

MGMT
• Device management, dashboard: Support in the console for discovery, format, and creation of bricks based on recommended best practices; an improved dashboard that shows vital statistics of pools.
• Snapshots, Geo-replication: New support in the console for snapshotting and geo-replication features.

CORE
• Tiering: New features to allow creation of a tier of fast media (SSDs, Flash) that accompanies slower media, supporting policy-based movement of data between tiers and enhancing create/read/write performance for many small file workloads.
• Bit rot detection: Ability to detect silent data corruption in files via signing and scrubbing, enabling long term retention and archival of data without fear of “bit rot”.
• Snapshot scheduling: Ability to schedule periodic execution of snapshots easily, without the complexity of custom automation scripts.
• Backup hooks: Features that enable incremental, efficient backup of volumes using standard commercial backup tools, providing time-savings over full-volume backups.
• Erasure coding: Introduction of erasure coded volumes (dispersed volumes) that provide cost-effective durability and increase usable capacity when compared to standard RAID and replication.

Page 35: Scalable POSIX File Systems in the Cloud

DETAIL: RED HAT GLUSTER STORAGE 3.1

PERF
• Small file: Optimizations to enhance small file performance, especially with small file create and write operations.
• Rebalance: Optimizations that result in enhanced rebalance speed at large scale.

SECURITY
• SELinux enforcing mode: Introduction of the ability to operate with SELinux in enforcing mode, increasing security across an entire deployment.

PROTOCOL
• NFSv4 (multi-headed): Support for data access via clustered, active-active NFSv4 endpoints, based on the NFS-Ganesha project.
• SMB 3 (subset of features): Enhancements to SMB 3 protocol negotiation, copy-data offload, and support for in-flight data encryption.

Page 36: Scalable POSIX File Systems in the Cloud

SCALABLE POSIX FILESYSTEM IN THE CLOUD

Page 37: Scalable POSIX File Systems in the Cloud

IN THE CLOUD: GLUSTER WORKS GREAT IN AWS

[Diagram: a VPC spanning availability zones us-east-1a and us-east-1c, containing eight m4.xlarge instances, each with a root volume and a 10G volume]

Page 38: Scalable POSIX File Systems in the Cloud

IN THE CLOUD: GLUSTER WORKS GREAT IN AWS

[Diagram: a VPC spanning availability zones us-east-1a and us-east-1c, containing eight m4.xlarge instances, each with a root volume and a 10G volume. The 10G volumes form two 40GB Gluster distributed volumes, with replication between them]
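A layout like the one in this diagram can be expressed as Gluster CLI commands. The Python sketch below only builds the command lines; it does not run them. The host names and brick path are hypothetical, and the split of four instances per availability zone is an assumption inferred from the two 40GB volumes built from 10G bricks. Bricks are interleaved across zones because Gluster forms replica sets from consecutive bricks in the order given, so each pair ends up with one copy per zone.

```python
def build_gluster_commands(volume, zone_a_hosts, zone_b_hosts, brick_path):
    """Build (not run) gluster CLI commands for a distributed-replicated volume.

    Each replica pair contains one brick from each availability zone.
    """
    if len(zone_a_hosts) != len(zone_b_hosts):
        raise ValueError("both zones need the same number of hosts")

    commands = []
    # Run from the first host: add every other host to the trusted storage pool.
    first = zone_a_hosts[0]
    for host in zone_a_hosts[1:] + zone_b_hosts:
        commands.append(["gluster", "peer", "probe", host])

    # Interleave bricks so consecutive bricks (one replica set) span both zones.
    bricks = []
    for a, b in zip(zone_a_hosts, zone_b_hosts):
        bricks.append(f"{a}:{brick_path}")
        bricks.append(f"{b}:{brick_path}")

    commands.append(["gluster", "volume", "create", volume, "replica", "2"] + bricks)
    commands.append(["gluster", "volume", "start", volume])
    return first, commands


if __name__ == "__main__":
    # Hypothetical host names: four instances per zone (assumed split).
    zone_a = [f"gluster-a{i}" for i in range(1, 5)]
    zone_c = [f"gluster-c{i}" for i in range(1, 5)]
    run_on, cmds = build_gluster_commands("gv0", zone_a, zone_c, "/bricks/brick1/gv0")
    print(f"# run on {run_on}")
    for cmd in cmds:
        print(" ".join(cmd))
```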

Page 39: Scalable POSIX File Systems in the Cloud
Page 40: Scalable POSIX File Systems in the Cloud
Page 41: Scalable POSIX File Systems in the Cloud

SEEMS LEGIT… SINGLE-THREAD PERF FOR GOOGLE EARTH

Type        Operation  Protocol     Test       Size   MB/s
Random      Read       NFS          POSIX AIO  16MB    97.11
Random      Write      NFS          POSIX AIO  16MB    55.12
Sequential  Read       NFS          POSIX AIO  16MB   130.55
Sequential  Write      NFS          POSIX AIO  16MB    56.26
Random      Read       FUSE Client  POSIX AIO  16MB   156.34
Random      Write      FUSE Client  POSIX AIO  16MB    83.23
Sequential  Read       FUSE Client  POSIX AIO  16MB   146.07
Sequential  Write      FUSE Client  POSIX AIO  16MB    82.55
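To compare the two protocols directly, the small Python sketch below computes the ratio of FUSE client throughput to NFS throughput for each test. The numbers are copied from the table; the dictionary layout and function name are just one convenient way to hold them.

```python
# Throughput in MB/s, copied from the table (single-thread, POSIX AIO, 16MB).
RESULTS = {
    ("Random", "Read"): {"NFS": 97.11, "FUSE": 156.34},
    ("Random", "Write"): {"NFS": 55.12, "FUSE": 83.23},
    ("Sequential", "Read"): {"NFS": 130.55, "FUSE": 146.07},
    ("Sequential", "Write"): {"NFS": 56.26, "FUSE": 82.55},
}


def fuse_to_nfs_ratio(results):
    """Return FUSE throughput divided by NFS throughput for each test."""
    return {test: r["FUSE"] / r["NFS"] for test, r in results.items()}


if __name__ == "__main__":
    for (kind, op), ratio in fuse_to_nfs_ratio(RESULTS).items():
        print(f"{kind:<10} {op:<5} FUSE/NFS = {ratio:.2f}x")
```

In this particular run the FUSE client is faster than NFS in every test, with the smallest gap on sequential reads.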

Page 42: Scalable POSIX File Systems in the Cloud

DEVOPS: DEPLOY & SCALE GLUSTER WITH ANSIBLE

Page 43: Scalable POSIX File Systems in the Cloud

DEVOPS: SHOULD HAVE THAT DONE IN AN HOUR OR SO…

tinyurl.com/gluster-ansible

Page 44: Scalable POSIX File Systems in the Cloud

IN THE CLOUD: GLUSTER WORKS GREAT IN AWS

[Diagram repeated from page 38: eight m4.xlarge instances in a VPC across us-east-1a and us-east-1c, with two 40GB Gluster distributed volumes and replication between them]

Page 45: Scalable POSIX File Systems in the Cloud

PUSH BUTTON: SCALE OUT YOUR POSIX STORAGE!

[Diagram: the VPC with its eight m4.xlarge instances, plus two additional m4.xlarge instances, each with a root volume and a 10G volume]

Page 46: Scalable POSIX File Systems in the Cloud
Page 47: Scalable POSIX File Systems in the Cloud
Page 48: Scalable POSIX File Systems in the Cloud
Page 49: Scalable POSIX File Systems in the Cloud

DEMO VIDEO

https://youtu.be/-wsJjrLQKqk

Page 50: Scalable POSIX File Systems in the Cloud

THANK YOU

