2012 Open Storage Summit Keynote

Randy Bias, Co-Founder and CTO of Cloudscaling, speaks on open storage, fault tolerance, and the concept of failure "blast radius" at the Open Storage Summit, hosted by Nexenta in May 2012.

Transcript
Page 1: 2012 Open Storage Summit Keynote

CCA-NoDerivs 3.0 Unported License: usage OK, no modifications, full attribution* (*All unlicensed or borrowed works retain their original licenses)

Cloud Storage Futures (previously: Designing Private & Public Clouds)

May 22nd, 2012

Randy Bias, CTO & Co-founder

@randybias

Page 2: 2012 Open Storage Summit Keynote

Part 1:

The Two Cloud Architectures

2

Page 3: 2012 Open Storage Summit Keynote

A Story of Two Clouds

3

Scale-out

Enterprise

Page 4: 2012 Open Storage Summit Keynote

... Driven by Two App Types

4

New Elastic Apps

Existing Apps

Page 5: 2012 Open Storage Summit Keynote

Cloud Computing ... Disrupts

(Timeline figure, 1960 to 2020: Mainframe Computing "Big Iron" gives way to Enterprise Computing "Client-Server", which gives way to Cloud Computing "Web"; each transition is labeled "Disruption".)

5


Page 9: 2012 Open Storage Summit Keynote

IT – Evolution of Computing Models

6

            | Mainframe "big-iron" | Enterprise "client/server" | Cloud "scale-out"
SLA         | 99.999               | 99.9                       | Always On
Scaling     | Vertical             | Horizontal                 | Horizontal
Hardware    | Custom               | Enterprise                 | Commodity
HA Type     | Hardware             | Software                   | Application
Software    | Centralized          | Decentralized              | Distributed
Consumption | Centralized Service  | Shared Service             | Self-service

Page 13: 2012 Open Storage Summit Keynote

Enterprise Computing (existing apps built in silos)

7

Page 14: 2012 Open Storage Summit Keynote

Cloud Computing (new elastic apps)

8

Page 15: 2012 Open Storage Summit Keynote

(Figure: traditional apps vs. elastic cloud-ready apps, shown at the APPS and INFRA layers.)

9

Scale-out apps require elastic infrastructure

Page 16: 2012 Open Storage Summit Keynote

(Figure: traditional apps vs. elastic cloud-ready apps, at the APPS and INFRA layers.)

10

Scale-out Cloud Technology

Page 17: 2012 Open Storage Summit Keynote

Scale-out Principles

• Small failure domains
• Risk acceptance vs. risk mitigation
• More boxes for throughput & redundancy (see the sketch after this list)
• Assume the app manages complexity:
  • Data replication
• Assume the infrastructure is unreliable:
  • Server & data redundancy
  • Geo-distribution
  • Auto-scaling

11
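
To make the "more boxes" principle concrete, here is a minimal sketch (my own illustrative numbers, not figures from the talk) of why replicating across cheap commodity servers can beat one highly engineered box, assuming roughly independent failures:

```python
# Availability of n independent replicas: the service is down only
# when every box holding a copy is down at the same time.
def redundant_availability(per_box: float, n: int) -> float:
    return 1 - (1 - per_box) ** n

for n in (1, 2, 3):
    print(f"{n} box(es) at 99% each: {redundant_availability(0.99, n):.6%}")
# 1 box:   99.000000%
# 2 boxes: 99.990000%
# 3 boxes: 99.999900%  -- past "five nines" on commodity gear
```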

Page 18: 2012 Open Storage Summit Keynote

What's a failure domain?

• The "blast radius" during a failure
• What is impacted?
• Public SAN failures:
  • FlexiScale SAN failure in 2007
  • UOL Brazil in 2011: http://goo.gl/8ct9n
  • There are many more
• Enterprise HA 'pairs' typically support BIG failure domains (a blast-radius sketch follows below)

12
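
A quick sketch of the blast-radius idea (the fleet sizes are hypothetical, not from the talk): one shared SAN behind an HA pair is a single failure domain for everyone on it, while per-server DAS divides the same customer base into many small domains.

```python
# Blast radius = the share of the customer base inside one failure domain.
def blast_radius(customers_impacted: int, total_customers: int) -> float:
    return customers_impacted / total_customers

total = 10_000
print(f"Shared SAN fails:            {blast_radius(total, total):.1%}")
print(f"One DAS server of 500 fails: {blast_radius(total // 500, total):.2%}")
# 100.0% vs 0.20% -- same fleet, radically different failure domains
```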

Page 19: 2012 Open Storage Summit Keynote

Two Diff Arches for Two Kinds of Apps

13

(2x2 figure. One axis: location of complexity, Infra vs. App. Other axis: primary scaling dimension, Up vs. Out. The "Enterprise" Cloud pairs Infra with Up; the Elastic Scale-out Cloud pairs App with Out.)

Page 20: 2012 Open Storage Summit Keynote

Part 2:

Storage Architectures Are Changing

14

Page 21: 2012 Open Storage Summit Keynote

Two Diff Storages for Two Kinds of Clouds

15

"Classic" Storage:
• Uptime in infra: every part is redundant
• Data mgmt in infra: bigger SAN/NAS/DFS

"Scale-out" Storage:
• Uptime in apps: minimal h/w redundancy
• Data mgmt in apps: smaller failure domains

Page 22: 2012 Open Storage Summit Keynote

Difference in Tiers

16

Tier | $    | Purpose           | Classic                        | Scale-out
1    | $$$$ | Mission Critical  | SAN, then NAS; 10-15K RPM; SSD | On-demand SAN (EBS); DynamoDB (AWS); variable service levels
2    | $$   | Important         | NAS, then SAN; 7.2K RPM        | DAS; app / DFS to scale out
3    | $    | Archive & Backups | Tape; nearline 5.4K RPM        | Object storage

Page 23: 2012 Open Storage Summit Keynote

The Biggest Difference Is Where Data Management Resides

• In scale-out systems, apps are managing the data:
  • Riak: scale-out distributed data store
  • Hadoop + HDFS: scale-out distributed computation system
  • Cassandra: scale-out distributed columnar database

17

Page 24: 2012 Open Storage Summit Keynote

Cassandra / Netflix use case

18

• 3x replication
• Linearly scaling performance
• 50 - 300 nodes
• > 1M writes/second
• When is this perfect?
  • Data size unknown
  • Growth unknown
  • Lots of elastic dynamism
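
A back-of-the-envelope check on the numbers above (the cluster size and write rate are from the slide; the division is mine): with linear scaling each node takes an even slice of client writes, and 3x replication triples the replica write load per node.

```python
client_writes_per_sec = 1_000_000
nodes = 300
replication_factor = 3

per_node_client_writes = client_writes_per_sec / nodes
per_node_replica_writes = per_node_client_writes * replication_factor

print(f"Client writes per node:  {per_node_client_writes:,.0f}/s")   # ~3,333/s
print(f"Replica writes per node: {per_node_replica_writes:,.0f}/s")  # ~10,000/s
```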

Page 25: 2012 Open Storage Summit Keynote

Cassandra / Netflix use case

19

• DAS ('ephemeral store')
• Per-node perf is constant:
  • Disk
  • CPU
  • Network
• Client write times constant
• Nothing special here

Page 26: 2012 Open Storage Summit Keynote

Cassandra / Netflix use case

20

• On-demand & app-managed
• Cost per GB/hr: $0.006
• Cost per GB/mo: $4.14
• Includes: storage, DB, storage admin, network, network admin, etc., etc., etc.

Page 27: 2012 Open Storage Summit Keynote

Part 3:

Scale-out Storage ... Now & Future

21

Page 28: 2012 Open Storage Summit Keynote

Only Change is Certain

22

Page 29: 2012 Open Storage Summit Keynote

There are a few basic approaches being taken ...

23

Page 30: 2012 Open Storage Summit Keynote

Scale-out SAN

(Box diagram: dedicated storage SW, 9K jumbo frames, SSD caches (ZIL/L2ARC), no replication, max HW redundancy.)

• In-rack SAN == faster, bigger DAS w/ better stat-muxing
• Accept normal DAS failure rates
• Assume app handles data replication
• Like AWS 'ephemeral storage'
• KT architecture:
  • Customers didn't "get it"
  • "Ephemeral SAN" not well understood

24

Page 31: 2012 Open Storage Summit Keynote

AWS EBS - "Block-devices-as-a-Service"

• Scale-out SAN (sort of)
• Block scheduler
• Async replication
• Some failure tolerance
• Scheduler:
  • Allocates customer block devices across many failure domains
  • Customers run RAID inside VMs to increase redundancy (a placement sketch follows the figure)

25

(Architecture figure: VMs on nodes N1/N2 reach an EBS scheduler through the cloud control system's API; numbered EBS clusters 1-8 sit in racks behind the core network, with intra-rack and inter-rack cluster async replication between them.)
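
Here is a toy placement sketch of the scheduler idea above (my illustration; the names and logic are invented, not the EBS implementation): each replica of a volume lands in a different failure domain, so losing one EBS cluster never takes the only copy.

```python
import itertools, zlib

def place_volume(volume_id: str, domains: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct failure domains (EBS clusters here) for one volume."""
    if replicas > len(domains):
        raise ValueError("more replicas than failure domains")
    start = zlib.crc32(volume_id.encode()) % len(domains)  # stable placement key
    window = itertools.islice(itertools.cycle(domains), start, start + replicas)
    return list(window)

clusters = [f"ebs-cluster-{n}" for n in range(1, 9)]  # clusters 1-8, as in the figure
print(place_volume("vol-12345", clusters))  # e.g. ['ebs-cluster-4', 'ebs-cluster-5']
```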

Page 32: 2012 Open Storage Summit Keynote

DAS + Big Data (Storage + Compute + DFS)

• Storage capability:
  • Replication
  • Disk & server failure
  • Data rebalancing
  • Data locality / rack awareness (placement sketched below)
  • Checksums (basic)
• Also:
  • Built-in computation

26
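
The rack awareness bullet can be made concrete with a sketch of HDFS's documented default placement policy (the topology and node names below are made up): first replica on the writer's node, second on a node in a different rack, third on another node in that second rack.

```python
import random

def place_replicas(writer: str, topology: dict[str, list[str]]) -> list[str]:
    """Return three nodes for a block written from `writer`.
    topology maps rack name -> list of node names."""
    writer_rack = next(r for r, nodes in topology.items() if writer in nodes)
    remote_rack = random.choice([r for r in topology if r != writer_rack])
    second = random.choice(topology[remote_rack])
    third = random.choice([n for n in topology[remote_rack] if n != second])
    return [writer, second, third]

topology = {
    "rack1": ["node11", "node12", "node13"],
    "rack2": ["node21", "node22", "node23"],
}
print(place_replicas("node12", topology))
# e.g. ['node12', 'node23', 'node21'] -- data survives a whole-rack failure
```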

Page 33: 2012 Open Storage Summit Keynote

Distributed File Systems (DFS) over DAS

• Storage capability:
  • Replication
  • Disk & server failure
  • Data rebalancing
  • Checksums (w/ btrfs)
  • Block devices
• Also:
  • No computation

27

(Figure: ceph architecture. Looks familiar, doesn't it?)

Page 34: 2012 Open Storage Summit Keynote

Why is DFS at the Physical Layer Dangerous for Scale-out?

28

(Figure: DFS == one big failure domain.)

Page 35: 2012 Open Storage Summit Keynote

DAS + Database Replication / Scaling

29

• Storage capability:
  • Async/sync replication
  • Server failure
  • Checksums (sort of)
• Also:
  • Std RDBMS
  • SQL i/f
  • Well understood

Page 36: 2012 Open Storage Summit Keynote

Object Storage

30

• Storage capability:
  • Replication
  • Disk & server failure
  • Data rebalancing
  • Checksums (sometimes)
• Also:
  • Looks like a big web app
  • Uses a DHT/CHT to 'index' blobs (sketched below)
  • Very simple
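
A minimal consistent-hashing sketch of the DHT/CHT indexing idea (my illustration, not any particular object store's code): blob names hash onto a ring, and the first node clockwise from a blob's point owns it, so membership changes only move the keys adjacent to the changed node.

```python
import bisect, hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: list[str]):
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._keys = [p for p, _ in self._points]

    def owner(self, blob_name: str) -> str:
        """First node at or after the blob's point on the ring (wraps around)."""
        i = bisect.bisect(self._keys, ring_hash(blob_name)) % len(self._keys)
        return self._points[i][1]

ring = HashRing(["storage-node-1", "storage-node-2", "storage-node-3"])
print(ring.owner("photos/cat.jpg"))  # deterministic owner for this blob
```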

Page 37: 2012 Open Storage Summit Keynote

Where does OpenStorage Fit?

31

Scale-out Solution | Purpose / Tier | Virtual or Physical? | Fit
Scale-out SAN      | Tier-1/2       | Physical             | In-rack SAN
EBS                | Tier-1         | Physical             | EBS clusters (scale-out SAN)
DAS+BigData        | Tier-2         | Virtual              | Reliable, bit-rot resistant DAS
DAS+DFS            | Tier-2         | Physical / Virtual   | Reliable, bit-rot resistant DAS (unproven)
DAS+DB             | Tier-2         | Virtual              | In-VM reliable DAS
Object Storage     | Tier-3         | Physical             | Reliable, bit-rot resistant DAS

Page 38: 2012 Open Storage Summit Keynote

Summarizing ZFS Value in Scale-out

• Data integrity & bit rot are issues that few solve today (a checksum sketch follows below)
• Most SAN/NAS solutions don't 'scale down'
• Commodity x86 servers are winning
• There are two scale-out places where ZFS wins:
  • Small SAN clusters
  • Best DAS management

32
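
A minimal sketch of the end-to-end checksumming that makes ZFS good at catching bit rot (an illustration of the idea, not ZFS internals): keep a checksum with every block on write, recompute on read, and refuse to return silently corrupted data.

```python
import hashlib

class ChecksummedStore:
    def __init__(self):
        self._blocks: dict[str, tuple[bytes, str]] = {}

    def write(self, block_id: str, data: bytes) -> None:
        # Store the data together with its checksum.
        self._blocks[block_id] = (data, hashlib.sha256(data).hexdigest())

    def read(self, block_id: str) -> bytes:
        data, expected = self._blocks[block_id]
        if hashlib.sha256(data).hexdigest() != expected:
            # A real system would repair from a good replica or parity here.
            raise IOError(f"bit rot detected in block {block_id}")
        return data

store = ChecksummedStore()
store.write("blk0", b"hello")
assert store.read("blk0") == b"hello"
```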

Page 39: 2012 Open Storage Summit Keynote

Summary

33

Page 40: 2012 Open Storage Summit Keynote

Conclusions / Speculations

• Build the right cloud

• Which means the right storage for *that* cloud

• A single cloud might support both ...

• Open storage can be used for both ...

• ... WITH the appropriate design/forethought

34

Page 41: 2012 Open Storage Summit Keynote

35

Q&A
@randybias

