Accel Partners New Data Workshop 7-14-10

Date post: 26-Jun-2015
Transcript
Page 1: Accel Partners New Data Workshop 7-14-10

New Data Stack Workshop: Building a Scalable Cloud Datacenter

Ping Li, Accel Partners

July 14, 2010, Stanford University

Page 2: Accel Partners New Data Workshop 7-14-10

Accel Partners Confidential

Delivering Cloud Computing

• Cloud data centers will share infrastructure layers common to mainframes, but redelivered for cloud capabilities
• The “New Data Stack” will form the foundation for cloud computing

Cloud capabilities
• Elasticity
• Multi-app/user
• User-provisioned
• Portability
• Private/Public

“Cloud Frame” layers, with the mainframe equivalent in parentheses
• Monitoring: Security (RACF)
• Resource Scheduler (z/VM & OS 370)
• Monitoring: Performance (Mainview)
• Provisioning & Configuration Management
• Virtualization (z/VM)
• Performance Acceleration & dedicated processors (OS 370)
• Clustering, failover, and mirroring (OS 370 & purpose-built hw & microcode)
• Backup and DR (Tivoli Storage Manager, Parallel Sysplex)

Page 3: Accel Partners New Data Workshop 7-14-10

Data Explosion

• 2,500 exabytes of new information in 2012, with the Internet/web as the primary driver
• The “digital universe” grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year

Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.

[Figure labels: Legacy Stack, New Data Stack, Cloud Application Data, Business Transaction Data]

Page 4: Accel Partners New Data Workshop 7-14-10

“New Data” Trends

Data is growing faster than processing power, leading to coping strategies like throwing away data or frequent archiving to tape
• Data: 61% CAGR
• Transistors: 42% CAGR

Circa 1975 – Transaction Data
• 2,000 users = Huge
• Smaller data sets (bytes)
• Highly structured, relatively small data records
• Absolute consistency is the primary requirement – ACID transactions

Circa 2010 – Cloud Data
• 2,000 users = Tiny
• Extremely large data sets (petabytes)
• Unstructured, complex data blobs (images, voice, logs, video) that don’t fit nicely into rows/columns
• Application responsiveness/scale trumps immediate consistency

Source: Gartner.
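To make the gap between the two growth rates concrete, here is a minimal sketch that compounds the slide's 61% (data) and 42% (transistors) CAGR figures. The function names are illustrative assumptions, and the calculation only restates the slide's numbers; it is not a forecast.

```python
DATA_CAGR = 0.61        # data growth rate quoted on the slide
TRANSISTOR_CAGR = 0.42  # transistor growth rate quoted on the slide


def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def data_to_transistor_gap(years: int) -> float:
    """How many times more data has grown than transistor counts."""
    return growth_factor(DATA_CAGR, years) / growth_factor(TRANSISTOR_CAGR, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, round(data_to_transistor_gap(years), 2))
```

Under these two rates the gap widens every year, which is the pressure behind the coping strategies the slide mentions.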

Page 5: Accel Partners New Data Workshop 7-14-10

New Data Stack Technologies

Legacy
• Centralized/monolithic computing layer
• Computer networking limited
• Relational databases
• FC SAN/NAS
• Disks/Tape (memory scarce/expensive)
• Proprietary/closed vendors
• Enterprise-scale

Cloud
• Distributed computing layer (virtual machines, MapReduce, networked commodity servers)
• High speed networking is pervasive
• Non-relational/“no sql” data stores
• Distributed file systems
• Flash/SSD (high performance and abundant)
• Open platforms
• Internet/cloud scale

Page 6: Accel Partners New Data Workshop 7-14-10

Agenda

1:15 pm NorthScale
• Sharon Barr, Vice President Engineering
• James Phillips, Founder, Chief Product Officer
• Dustin Sallings, Chief Architect
• Bob Wiederhold, President, CEO

2:15 pm Cloudera
• Amr Awadallah, CTO/Co-Founder
• Jeff Hammerbacher, Chief Scientist/Co-Founder

3:15 pm Facebook
• Bobby Johnson, Director, Software Engineering
• Mark Rabkin, Software Engineer

4:15 pm Fusion-io
• Robert Wipfel, Fellow

5:30 pm Cocktails!

Page 7: Accel Partners New Data Workshop 7-14-10

Elastic Data Management Software
for web applications and cloud computing environments

Page 8: Accel Partners New Data Workshop 7-14-10

The opportunity.

“ Relational database technology has served us well for 40 years, and will likely continue to do so for the foreseeable future to support transactions requiring ACID guarantees. But a large, and increasingly dominant, class of software systems and data do not need those guarantees. Much of the data manipulated by Web applications have less strict transactional requirements but, for lack of a practical alternative, many IT teams continue to use relational technology, needlessly tolerating its cost and scalability limitations. For these applications and data, distributed key-value cache and database technologies such as NorthScale provide a promising alternative. ”

Carl Olofson
Research Vice President
Database Management Software Research
IDC

Page 9: Accel Partners New Data Workshop 7-14-10

Modern interactive software architecture

To support more users … simply add more commodity web servers (or virtual machines) behind a load balancer … but you must get a bigger, more complex database server.

Page 10: Accel Partners New Data Workshop 7-14-10

Application scales linearly, data hits a wall

Application Scales Out
Just add more commodity web servers

Database Scales Up
Get a bigger, more complex server

Page 11: Accel Partners New Data Workshop 7-14-10

What’s driving the curves?

1. Transaction overhead.
Same hardware, over an order of magnitude difference in supportable user base.
• RDBMS: 750 OPS
• NorthScale: 15,000 OPS

2. Expensive hardware.
More costly to start with, and the cost differential widens with growth.
• At 750 OPS: RDBMS $7,500, NorthScale $2,500 (3x)
• At 15,000 OPS: RDBMS $125,000, NorthScale $12,500 (10x)

3. Complex administration.
RDBMS technology is extremely complex and expensive to administer.
• RDBMS: Schema committee. Add new table(s). Re-normalize. Shard if needed. Create indices. Tune performance. Update views. Insert and select.
• NorthScale: Set and get.
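The 3x and 10x multipliers can be checked with simple arithmetic. The sketch below assumes the slide's figures pair up as $7,500 vs. $2,500 at 750 OPS and $125,000 vs. $12,500 at 15,000 OPS; the dictionary layout and function names are illustrative, not part of the presentation.

```python
# Hardware cost in US dollars at each throughput tier, as read from the slide.
COST_USD = {
    750: {"RDBMS": 7_500, "NorthScale": 2_500},
    15_000: {"RDBMS": 125_000, "NorthScale": 12_500},
}


def cost_ratio(ops: int) -> float:
    """RDBMS cost divided by NorthScale cost at the given operations per second."""
    tier = COST_USD[ops]
    return tier["RDBMS"] / tier["NorthScale"]


def cost_per_op(ops: int, system: str) -> float:
    """Dollars of hardware per unit of throughput (one operation per second)."""
    return COST_USD[ops][system] / ops


if __name__ == "__main__":
    for ops in COST_USD:
        print(ops, cost_ratio(ops), cost_per_op(ops, "RDBMS"), cost_per_op(ops, "NorthScale"))
```

On this reading the cost differential grows from 3x to 10x as load rises twentyfold, matching the multipliers printed on the slide.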

Page 12: Accel Partners New Data Workshop 7-14-10

Billions in data management savings available

RDBMS ideal for intended purpose, will continue to be appropriate for debit-credit data – costly overkill for most new data

Relational database technology ideal

Alternative database technology needed

Relational database technology was $18.8 billion market in 2007 (IDC)

Page 13: Accel Partners New Data Workshop 7-14-10

Big leap from relational database to alternatives


Where do I start? What data should I move first? Which alternative database technology will “win”? This looks really complicated.

Page 14: Accel Partners New Data Workshop 7-14-10

NorthScale solution.

“ I can’t tell you how many email requests I’ve received from our developers asking for something that is as simple and fast as memcached, but that promises data durability. Cassandra is just far too complex and heavyweight and we won’t be doing any more deployments. NorthScale is definitely on to something here. ”

Director of Engineering
Leading Social Network

Page 15: Accel Partners New Data Workshop 7-14-10

Before: Where you are today


Relational database technology powers 99.999% of web applications.

Page 16: Accel Partners New Data Workshop 7-14-10

Step 1: Cache relational data in memcached


Memcached is simple, fast and infinitely scalable. It is easy to adopt, and delivers immediate cost, performance and scalability benefits.

NorthScale Memcached Servers

Relational Database
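Step 1 describes the common read-through (cache-aside) pattern. The sketch below is an assumption about how an application might apply it, using two hypothetical in-memory stand-ins (`FakeCache` for a memcached client and `FakeDatabase` for the relational database) so that it runs without any server. It is not NorthScale's code.

```python
class FakeCache:
    """Hypothetical stand-in for a memcached client exposing get and set."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


class FakeDatabase:
    """Hypothetical stand-in for the relational database; counts its reads."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def select(self, key):
        self.reads += 1
        return self._rows.get(key)


def read_through(cache, database, key):
    """Return the value for key, touching the database only on a cache miss."""
    value = cache.get(key)
    if value is None:
        value = database.select(key)
        if value is not None:
            cache.set(key, value)  # populate the cache for later readers
    return value


if __name__ == "__main__":
    db = FakeDatabase({"user:1": "alice"})
    cache = FakeCache()
    print(read_through(cache, db, "user:1"), read_through(cache, db, "user:1"), db.reads)
```

The second read is served from the cache, so the database sees only one query for the two lookups.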

Page 17: Accel Partners New Data Workshop 7-14-10

Step 2: Gradually migrate data to membase


NorthScale Memcached Servers

Relational Database

NorthScale Membase Servers

Page 18: Accel Partners New Data Workshop 7-14-10

After: Elastic compute and data layers
Data layer now scales with linear cost and constant performance.

Application Scales Out
Just add more commodity web servers

Database Scales Out
Just add more commodity data servers

Scaling out flattens the cost and performance curves.

Page 19: Accel Partners New Data Workshop 7-14-10

An evolutionary path toward elastic data


Page 20: Accel Partners New Data Workshop 7-14-10

NorthScale Membase Server

Page 21: Accel Partners New Data Workshop 7-14-10

Membase is an elastic key-value database


Membase data servers

In the data center

Web application server

Application user

On the administrator console

Page 22: Accel Partners New Data Workshop 7-14-10

Membase is Simple, Fast, Elastic

Five minutes or less to a working cluster
• Downloads for Linux and Windows
• Start with a single node
• One button press joins nodes to a cluster

Easy to develop against
• Just SET and GET – no schema required
• Drop it in. 10,000+ existing applications already “speak membase” (via memcached)
• Practically every language and application framework is supported, out of the box

Easy to manage
• One-click failover and cluster rebalancing
• Graphical and programmatic interfaces
• Configurable alerting

Page 23: Accel Partners New Data Workshop 7-14-10

Membase is Simple, Fast, Elastic

Predictable
• “Never keep an application waiting”
• Quasi-deterministic latency and throughput

Low latency
• Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
• Selectable write behavior – asynchronous, synchronous (on replication, persistence)
• Back-channel rebalancing [FUTURE]

High throughput
• Multi-threaded
• Low lock contention
• Asynchronous wherever possible
• Automatic write de-duplication

Page 24: Accel Partners New Data Workshop 7-14-10

Membase is Simple, Fast, Elastic

Scale out
• Spread I/O and data across commodity servers (or VMs)
• Consistent performance with linear cost
• Dynamic rebalancing of a live cluster

All nodes are created equal
• No special case nodes
• Clone to grow

Extensible
• Filtered TAP interface provides hook points for external systems (e.g. full-text search, backup, warehouse)
• Data bucket – engine API for specialized container types
• Membase NodeCode [FUTURE]

Page 25: Accel Partners New Data Workshop 7-14-10

vBucket mapping

• Key → vBucket (hash function): all possible membase keys (Key1, Key2, … Keym) hash into a fixed set of vBuckets (vBucket1, vBucket2, … vBucketn)
• vBucket → Servers (table lookup): each vBucket maps to a host server and its replica servers

vBuckets: Host Server / Replica Servers
• vBucket1: Server1 / Server2, Server3
• vBucket2: Server1 / Server2, Server3
• vBucket3: Server2 / Server3, Server4
• vBucketn: Serverp / Serverq, Serverr

vBucket-Server Map: Example

vBuckets: Host Server / Replica Servers
• vBucket1: ServerA / ServerB, ServerC
• vBucket2: ServerA / ServerB, ServerC
• vBucket3: ServerB / ServerA, ServerC
• vBucket4: ServerB / ServerA, ServerC
• vBucket5: ServerC / ServerA, ServerB
• vBucket6: ServerC / ServerA, ServerB
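The two-stage lookup on this slide (hash the key to a vBucket, then look the vBucket up in a table) can be sketched as follows. The table reproduces the slide's six-vBucket example; the use of `zlib.crc32` as the hash function and the function names are assumptions made for illustration, since the slide does not name the hash.

```python
import zlib

# vBucket number -> (host server, replica servers), from the slide's example map.
VBUCKET_MAP = {
    1: ("ServerA", ["ServerB", "ServerC"]),
    2: ("ServerA", ["ServerB", "ServerC"]),
    3: ("ServerB", ["ServerA", "ServerC"]),
    4: ("ServerB", ["ServerA", "ServerC"]),
    5: ("ServerC", ["ServerA", "ServerB"]),
    6: ("ServerC", ["ServerA", "ServerB"]),
}


def vbucket_for_key(key: str, num_vbuckets: int = len(VBUCKET_MAP)) -> int:
    """Stage 1: hash the key to a vBucket number in 1..num_vbuckets."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets + 1


def servers_for_key(key: str):
    """Stage 2: table lookup from vBucket to (host server, replica servers)."""
    return VBUCKET_MAP[vbucket_for_key(key)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "session:abc"):
        print(key, vbucket_for_key(key), servers_for_key(key))
```

Because keys hash to vBuckets rather than directly to servers, moving data between servers only requires changing table entries; the key-to-vBucket hash never changes.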

Page 26: Accel Partners New Data Workshop 7-14-10

Deployment options

[Figure: three deployment options. Each shows application logic, a memcached client, and a Membase Server that handles data operations and cluster operations. Ports labeled in the figure: 11211 and 11210.]

• Deployment Option 1: Embedded proxy. OTC memcached client with a server list; the proxy and vBucket map sit in the Membase Server.
• Deployment Option 2: Standalone proxy. OTC memcached client; the proxy and vBucket map run on localhost, in front of the Membase Server.
• Deployment Option 3: “vBucket-aware” client. NEW memcached client that holds the vBucket map itself.

Page 27: Accel Partners New Data Workshop 7-14-10

Membase “write” data flow – application view

1. User action results in the need to change the VALUE of KEY
2. Application updates key’s VALUE, performs SET operation
3. Membase (memcached) client hashes KEY, identifies KEY’s master server
4. SET request sent over network to master server
5. Membase replicates KEY-VALUE pair, caches it in memory and stores it to disk
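The write path can be sketched as a small simulation: the client hashes the key to pick a master, and the master's copy is replicated, cached in memory, and stored to disk. Everything here is a hypothetical model for illustration (the `Node` and `Cluster` classes, the `zlib.crc32` hash, and the choice of two replicas); a real deployment may persist and replicate asynchronously.

```python
import zlib


class Node:
    """Hypothetical stand-in for one membase server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ram = {}   # in-memory cache
        self.disk = {}  # persistent store

    def store(self, key: str, value: str) -> None:
        self.ram[key] = value   # cache in memory
        self.disk[key] = value  # store to disk (done inline here for simplicity)


class Cluster:
    """Hypothetical cluster that picks a master and replicas per key."""

    def __init__(self, names, replica_count: int = 2) -> None:
        self.nodes = [Node(name) for name in names]
        self.replica_count = replica_count

    def owners(self, key: str):
        """Hash the key to a master node, followed by its replica nodes."""
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        count = min(self.replica_count + 1, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

    def set(self, key: str, value: str) -> str:
        """Send SET to the master, replicate it, and return the master's name."""
        master, *replicas = self.owners(key)
        master.store(key, value)
        for replica in replicas:
            replica.store(key, value)
        return master.name


if __name__ == "__main__":
    cluster = Cluster(["ServerA", "ServerB", "ServerC"])
    print(cluster.set("user:42", "alice"))
```

The application only ever issues the SET; hashing, routing, replication, and persistence happen below the client API.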

Page 28: Accel Partners New Data Workshop 7-14-10

Membase data flow – under the hood

[Figure: the master server for KEY, Replica Server 1 for KEY, and Replica Server 2 for KEY. Each server has a listener-sender, RAM, and the membase storage engine backed by SSD and disk.]

Captioned steps: (1) SET request arrives at KEY’s master server; (5) SET acknowledgement returned to application. Steps 2, 3, and 4 are marked on the figure without captions.

Page 29: Accel Partners New Data Workshop 7-14-10

Membase Architecture

[Figure: the software on a membase node, split into a Data Manager and a Cluster Manager.]

Data Manager
• moxi (port 11211)
• memcached protocol listener/sender (port 11210)
• engine interface
• membase storage engine
• memcapable 1.0, memcapable 2.0

Cluster Manager (Erlang/OTP)
• HTTP REST management API/Web UI
• Heartbeat
• Process monitor
• Global singleton supervisor
• Configuration manager
• Rebalance orchestrator
• Node health monitor
• vBucket state and replication manager

The figure groups the Cluster Manager components under “on each node” and “one per cluster”.

Ports: 8080 (HTTP), 4369 (erlang port mapper), 21100 – 21199 (distributed erlang)

Page 30: Accel Partners New Data Workshop 7-14-10


Page 31: Accel Partners New Data Workshop 7-14-10

Data buckets are secure membase “slices”


Membase data servers

In the data center

Web application server

Application user

On the administrator console

Bucket 1

Bucket 2

Aggregate Cluster Memory and Disk Capacity

Page 32: Accel Partners New Data Workshop 7-14-10

NorthScale in production

Leading cloud service (PaaS) provider
• Over 65,000 hosted applications
• NorthScale Memcached Server serving over 1,200 Heroku customers (as of June 10, 2010)

Social game leader – FarmVille, Mafia Wars, Café World
• Over 230 million monthly users
• NorthScale Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World

Page 33: Accel Partners New Data Workshop 7-14-10
Page 34: Accel Partners New Data Workshop 7-14-10

Wednesday, July 14, 2010

Page 35: Accel Partners New Data Workshop 7-14-10

Evolving a New Analytical Platform
What Works and What’s Missing

Jeff Hammerbacher
Chief Scientist, Cloudera
July 14, 2010

Page 36: Accel Partners New Data Workshop 7-14-10

My Background
Thanks for Asking

▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
  ▪ Nearly 30 amazing engineers and data scientists
  ▪ Several open source projects and research papers
▪ Founder of Cloudera
  ▪ Chief Scientist
  ▪ Also, check out the book “Beautiful Data”

Page 37: Accel Partners New Data Workshop 7-14-10

Presentation Outline
▪ 1. Defining the Platform
  ▪ BI: Science for Profit
  ▪ Need tools for whole research cycle
  ▪ SQL Server 2008 R2: defining the platform
▪ 2. State of the Platform Ecosystem
▪ 3. Foundations for a New Implementation
  ▪ Hadoop
  ▪ Boiling the Frog
▪ 4. Future Developments
▪ Questions and Discussion

Page 38: Accel Partners New Data Workshop 7-14-10

1. Defining the Platform


Page 39: Accel Partners New Data Workshop 7-14-10

BI is looking more like science (for profit)


Page 40: Accel Partners New Data Workshop 7-14-10

Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”

Page 41: Accel Partners New Data Workshop 7-14-10

RDBMS only a small part of this tool set


Page 42: Accel Partners New Data Workshop 7-14-10

Example: SQL Server 2008 R2


Page 43: Accel Partners New Data Workshop 7-14-10

RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint

Page 52: Accel Partners New Data Workshop 7-14-10

What do we call this unified suite?


Page 53: Accel Partners New Data Workshop 7-14-10

For today: Analytical Data Platform


Page 54: Accel Partners New Data Workshop 7-14-10

LAMP Stack for Analytical Data Management
For today: Analytical Data Platform

Page 55: Accel Partners New Data Workshop 7-14-10

2. The State of the Platform Ecosystem


Page 56: Accel Partners New Data Workshop 7-14-10

Who makes up the platform ecosystem?


Page 57: Accel Partners New Data Workshop 7-14-10

Platform Providers
Infrastructure Providers
Application Developers
End Users
Content Providers

Page 62: Accel Partners New Data Workshop 7-14-10

What is new about the ecosystem today?


Page 63: Accel Partners New Data Workshop 7-14-10

Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly

Page 64: Accel Partners New Data Workshop 7-14-10

Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers

Page 65: Accel Partners New Data Workshop 7-14-10

Platform Providers
1. Open source
2. Driven by consumer web properties

Page 66: Accel Partners New Data Workshop 7-14-10

Application Developers
1. Data Scientists
2. Diversity of languages

Page 67: Accel Partners New Data Workshop 7-14-10

End Users
1. Browser is the client
2. Tell a story about the business

Page 68: Accel Partners New Data Workshop 7-14-10

3. Foundations for a New Implementation


Page 69: Accel Partners New Data Workshop 7-14-10

New foundations: HDFS and MapReduce


Page 70: Accel Partners New Data Workshop 7-14-10

2005: Doug/Mike start project inside Nutch


Page 71: Accel Partners New Data Workshop 7-14-10

2006: Doug joins Yahoo!


Page 72: Accel Partners New Data Workshop 7-14-10

2007: Make Hadoop scale
▪ Jim Gray’s “Fourth Paradigm” lecture
▪ Yahoo! makes Pig open source
▪ Randy Bryant’s “DISC” lecture
▪ Powerset makes HBase open source

Page 77: Accel Partners New Data Workshop 7-14-10

2008: Make Hadoop fast
▪ First Hadoop Summit
▪ Yahoo! wins Daytona terabyte sort benchmark
▪ Yahoo! builds production webmap with Hadoop
▪ Facebook makes Hive open source
▪ “MapReduce: A Major Step Backwards”

Page 83: Accel Partners New Data Workshop 7-14-10

2009: Insert Hadoop into the enterprise
▪ Cloudera releases CDH
▪ First Hadoop World NYC
▪ Yahoo! sorts a petabyte with Hadoop
▪ Cloudera adds training, support, services
▪ “The Unreasonable Effectiveness of Data”

Page 89: Accel Partners New Data Workshop 7-14-10

2010: Integrate Hadoop into the enterprise
▪ IBM announces InfoSphere BigInsights
▪ Yahoo! completes enterprise-class security
▪ Datameer and Karmasphere funded
▪ Quest, Talend, Netezza, and more integrate
▪ Hive adds JDBC and ODBC

Page 95: Accel Partners New Data Workshop 7-14-10

Hadoop will be an Analytical Data Platform


Page 96: Accel Partners New Data Workshop 7-14-10

4. Future Developments


Page 97: Accel Partners New Data Workshop 7-14-10

Capture: Log collection and CEP


Page 98: Accel Partners New Data Workshop 7-14-10

Curate: Workflow and Scheduling


Page 99: Accel Partners New Data Workshop 7-14-10

Curate: Secondary and Full-Text Indexing


Page 100: Accel Partners New Data Workshop 7-14-10

Curate: Learn Structure from Data


Page 101: Accel Partners New Data Workshop 7-14-10

Analyze: Mesos-enabled frameworks


Page 102: Accel Partners New Data Workshop 7-14-10

Analyze: Link working set and historical data


Page 103: Accel Partners New Data Workshop 7-14-10

All behind a single user interface


Page 104: Accel Partners New Data Workshop 7-14-10

HUE
Making Many Computers Feel Like One

Page 105: Accel Partners New Data Workshop 7-14-10

Cloudera’s Distribution for Hadoop, Version 3, is the enterprise open source platform for complex data

▪ Integrated: all components & functions interoperate with one another through standard APIs
▪ Simplified: Cloudera manages required component versions & dependencies
▪ Open source: 100% Apache licensed
▪ Reliable: patched with fixes from future releases to improve stability
▪ Supported: Cloudera employs […]0% of the project founders and at least one committer for […]0% of these open source components.

Page 106: Accel Partners New Data Workshop 7-14-10

(c) 2010 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0


Page 107: Accel Partners New Data Workshop 7-14-10

ioMemory for Scale-out

Robert Wipfel, Fellow

14th July, 2010, Accel Partners Panel Discussion

Page 108: Accel Partners New Data Workshop 7-14-10

Factors impacting Scale-out

• Balance: CPU, Disk, Network
• Contention: Sharing, Locking
• Throughput: IOPS, Bandwidth
• Latency: Distributed, Dependencies
• Graceful Recovery: No SPOFs, Fast Replay
• Energy: Servers, RAM, Disks

Management and Monitoring

Page 109: Accel Partners New Data Workshop 7-14-10

What’s *really* Needed…

DRAM
• Want: Really fast
• Don’t Want: Volatile, Expensive, Limited capacity

Disk
• Want: Non-volatile, Cheap, Large capacity
• Don’t Want: Really slow

Need
• Want: Non-volatile, Really fast, Large capacity, Reasonable price, Low energy

Page 110: Accel Partners New Data Workshop 7-14-10

Solution: ioMemory

A disruption called ioMemory

•  High speed like DRAM

•  Persistence and capacity of disks

PCIe based NAND Flash Storage

•  Very high IOPS

•  Micro-second latency

•  Very high data throughput

Page 111: Accel Partners New Data Workshop 7-14-10

Why is it called ioMemory?

[Figure: access delay in time, on a scale from nanoseconds (10E-9) to milliseconds (10E-3). Tiers shown: L1, L2, L3, DRAM, ioMemory (50µs, 10E-6), SSDs, and SAN, NAS, RAIDed DAS. Gaps marked on the chart: 3, 5, and 6 orders of magnitude.]

Page 112: Accel Partners New Data Workshop 7-14-10

ioMemory Performance

[Figure: two bar charts comparing the same six devices.]
• Raw Storage Performance (H2benchw 3.6: Interface Bandwidth, MB/s): 2x Faster Storage I/O
• Application Performance (IOMeter Database Benchmark I/O: Average Throughput, MB/s): 50x Faster Application I/O

Devices compared
• Fusion-io ioDrive Maximum Write: 24 GB, Flash, PCIe x4
• Fusion-io ioDrive Improved Write: 40 GB, Flash, PCIe x4
• Fusion-io ioDrive Maximum Capacity: 80 GB, Flash, PCIe x4
• SSD SATA Vendor A 3.0Gbps 2.5 RAID 0: 128 GB, Flash, SATA/300
• SSD SATA Vendor B 3.0Gbps 2.5 RAID 0: 64 GB, Flash, SATA/300
• SSD SATA Vendor C: 32 GB, Flash, SATA/300

Page 113: Accel Partners New Data Workshop 7-14-10

ioMemory Reliability

MTBF = 2 Million Hours +

• PCI bus protection
• Checksums
• Poison bit
• Strong ECC
• Wear leveling
• Bad block re-mapping
• Data labeling
• Parity-protected pipelines
• Flashback Chip protection
• Power cut protection

Page 114: Accel Partners New Data Workshop 7-14-10

ioMemory is not a Solid State Disk

[Figure: numbered I/O steps for two paths. SSD path: Application CPU, RAID Controller, SSDs. ioMemory path: Application CPU, ioMemory.]

Page 115: Accel Partners New Data Workshop 7-14-10

ioMemory is Green

[Figure: energy use (axis: kilowatts) for three devices: 15,000 RPM FC HDD, Fusion-io ioDrive, and ZeusIOPS SSD. Values shown: 97 kWh/yr, 3,013 kWh/yr, and 133,493 kWh/yr.]

Page 116: Accel Partners New Data Workshop 7-14-10

Case Study

One of the world’s fastest growing Webmonsters

•  Over 900% more database queries per second

•  Dramatically improved server replication for most current data

•  Over 800% improvement to disaster recovery back-up time

•  Cut server footprint, power costs, and IT overhead by 75%

•  Full and immediate ROI on repurposed servers with

•  Continued ROI on operational cost saving

Page 117: Accel Partners New Data Workshop 7-14-10

Case Study

Page 118: Accel Partners New Data Workshop 7-14-10

Case Study

Internet security company that protects over 1 billion inboxes

• 5x improvement to
  • Database replication performance
  • Data intensive query response
  • Analysis routines
• Eliminating 210 failure points from system
• Implemented full system redundancy
• Dramatically lowered power and cooling expenses

Page 119: Accel Partners New Data Workshop 7-14-10

Case Study

Page 120: Accel Partners New Data Workshop 7-14-10

Disruption

By deploying ioMemory… Cloudmark eliminated the need for this…

Page 121: Accel Partners New Data Workshop 7-14-10

Other Customer Examples

• Department of Defense takes NASTRAN from 3 days to 6 hours
• Demos Dynamics NAV can get a 4x performance improvement
• HMO achieves a 200 HDD to 1 ioDrive reduction for their Data Warehouse
• Does a 30 to 1 box reduction for their reliable messaging system
• Shows a 35x performance increase of unstructured search at OracleWorld
• Stock exchange doubles the performance of their trading systems

Page 122: Accel Partners New Data Workshop 7-14-10
Page 123: Accel Partners New Data Workshop 7-14-10
Page 124: Accel Partners New Data Workshop 7-14-10

ioMemory Products

• 160 GB: 116,046 (4k read packet size); 93,199 (75/25 r/w mix, 4k packet size)
• 320 GB: 71,256 (4k read packet size); 67,659 (75/25 r/w mix, 4k packet size)
• 640 GB: 122,601 (4k read packet size); 121,008 (75/25 r/w mix, 4k packet size)
• 320 GB: 185,022 (4k read packet size); 129,699 (75/25 r/w mix, 4k packet size)
• 80 GB: 119,790 (4k read packet size); 89,549 (75/25 r/w mix, 4k packet size)
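The transcript gives no unit for these figures. Assuming they are I/O operations per second (IOPS) and that "4k packet size" means 4 KiB per operation, the implied data rate follows from a single multiplication. The sketch below uses two of the listed read figures; the function names are illustrative.

```python
BLOCK_BYTES = 4 * 1024  # "4k packet size", assumed here to mean 4 KiB


def bytes_per_second(iops: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Data rate implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes


def mib_per_second(iops: int, block_bytes: int = BLOCK_BYTES) -> float:
    """The same data rate expressed in MiB/s."""
    return bytes_per_second(iops, block_bytes) / (1024 * 1024)


if __name__ == "__main__":
    for label, iops in (("160 GB, 4k read", 116_046), ("320 GB, 4k read", 185_022)):
        print(label, round(mib_per_second(iops), 1), "MiB/s")
```

Small-block figures like these say more about latency-bound workloads than about peak sequential bandwidth, which would be measured with larger transfers.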

Page 125: Accel Partners New Data Workshop 7-14-10

Confidential Information: Fusion-io

OEM Partners

Page 126: Accel Partners New Data Workshop 7-14-10


Questions?

Page 127: Accel Partners New Data Workshop 7-14-10

THANK YOU

