Ronaldo Amá VP, R&D, Data Services VMware, · PDF fileNetezza) Batch Processing...

© 2009 VMware Inc. All rights reserved

Big Data in the Cloud

Ronaldo Amá

VP, R&D, Data Services

VMware, Inc

2

The answer is

What was your question?

Well, this is a Hadoop show after all…

3

Big Data Landscape

ETL

Real Time

Streams (Social,

sensors)

Structured and Unstructured Data (HDFS, MAPR)

Real Time

Database (Shark,

Gemfire, hBase,

Cassandra)

Interactive

Analytics (Impala,

Greenplum,

AsterData,

Netezza…)

Batch Processing (Map-Reduce)

Real-Time

Processing (s4, storm,

spark) Data Visualization (Excel, Tableau)

(Informatica, Talend,

Spring Integration)

Compute Storage Networking

Cloud Infrastructure

Machine Learning (Mahout, etc…)

4

Hadoop

batch analysis

Technology Stack

HDFS

HBase

real-time queries NoSQL – Cassandra,

Mongo, etc

Big SQL –

Impala

Compute

layer

Data

layer

Other Spark,

Shark,

Solr,

Platfora,

Etc,…

Compute Storage Networking

Cloud Infrastructure

Host Host Host Host Host Host

Some sort of distributed, resource management OS + Filesystem

Host

5

Why Virtualize Hadoop

Shrink and expand

cluster on demand

Independent scaling of

Compute and data

Strong multi-tenancy

Elasticity & Multi-tenancy

High availability for

entire Hadoop stack

One click to setup

Proven solution

Highly Availability

Rapid deployment,

cloning

Unified life-cycle

management

Easy to

configure/reconfigure

Operational Simplicity

6

Common Infrastructure for Big Data

Single purpose clusters for various

business applications lead to cluster

sprawl.

Virtualization Platform

Simplify

• Single Hardware Infrastructure

• Unified operations

Optimize

• Shared Resources = higher utilization

• Elastic resources = faster on-demand access

MPP DB Hadoop HBase

Virtualization Platform

MPP DB

Hadoop

HBase

Cluster Sprawling

Cluster Consolidation

7

Mixing Workloads: Three big types of Isolation are Required

Resource Isolation

• Control the greedy noisy neighbor

• Reserve resources to meet needs

Version Isolation

• Allow concurrent OS, App, Distro versions

Security Isolation

• Provide privacy between users/groups

• Runtime and data privacy required

Host Host Host Host Host Host

Some sort of distributed, resource management OS + Filesystem

Host

8

Virtual Storage Architecture Include Local Disk

Shared Storage: SAN or NAS

• Easy to provision

• Automated cluster rebalancing

• Leverage vmotion/HA/FT

Local Storage: Local Disks

• Local disk for Hadoop

• Scalable Bandwidth, lower cost/GB

Host

Had

oo

p

Oth

er

VM

Oth

er

VM

Host

Had

oo

p

Had

oo

p

Oth

er

VM

Host

Had

oo

p

Ha

do

op

Oth

er

VM

Host

Had

oo

p

Oth

er

VM

Oth

er

VM

Host

Had

oo

p

Had

oo

p

Oth

er

VM

Host

Had

oo

p

Had

oo

p

Oth

er

VM

Shared Storage Shared Storage

Local Storage

9

Hadoop Runs Well on Virtualization

0

50

100

150

200

250

300

350

400

450

TeraGen TeraSort TeraValidate

Elapsed time, seconds (lower is better)

Native

1 VM

2 VMs

4 VMs

Source: http://www.vmware.com/files/pdf/techpaper/VMW-Hadoop-Performance-vSphere5.pdf

http://www.vmware.com/files/pdf/techpaper/VMW-Hadoop-Performance-vSphere5.pdf









10

Project Serengeti

Open source project launched in June 2012, meta-updates released

on regular schedule (~3 Months intervals)

Toolkit that leverage virtualization to simplify Hadoop deployment

and operations

Commercial support via Data Director

Deploy a Hadoop cluster in 10 Minutes

Customize Hadoop cluster

Use Your Favorite Hadoop Distribution

One stop command center

Serengeti

11

Hadoop Resources

Download and try Serengeti

• projectserengeti.org

Commercial support via Data

Director • vmware.com/products/application-

platform/vfabric-data-

director/overview.html

VMware Hadoop site

• vmware.com/hadoop

Hadoop performance on vSphere

• vmware.com/files/pdf/VMW-Hadoop-

Performance-vSphere5.pdf

Hadoop High Availability solution

• vmware.com/files/pdf/Apache-Hadoop-

VMware-HA-solution.pdf

projectserengeti.org

http://www.vmware.com/products/application-platform/vfabric-data-director/overview.html










vmware.com/hadoop

vmware.com/hadoop

http://www.vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf









http://www.vmware.com/files/pdf/Apache-Hadoop-VMware-HA-solution.pdf











© 2009 VMware Inc. All rights reserved

THANK YOU!

Ronaldo Amá

VP, R&D, Data Services

VMware, Inc

Date post:	06-Feb-2018
Category:	Documents
Upload:	habao
View:	214 times
Download:	0 times

Ronaldo Amá VP, R&D, Data Services VMware, · PDF fileNetezza) Batch Processing...

Documents