The Elephant in the Clouds

Date post: 16-Apr-2017
Upload: dataworks-summithadoop-summit
The Elephant in the Clouds Sanjay Radia Chief Architect, Founder Hortonworks
Transcript
Page 1: The Elephant in the Clouds

The Elephant in the Clouds
Sanjay Radia
Chief Architect, Founder, Hortonworks

Page 2: The Elephant in the Clouds

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Why Hadoop in the Cloud?

Unlimited Elastic Scale

Ephemeral & Long-Running

IT & Business Agility

No Upfront HW Costs ($0)

Page 3: The Elephant in the Clouds

Today’s Hadoop Cloud Solutions

[Figure: The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016. Vendors shown: Rackspace, Oracle, Altiscale, Qubole, Google, IBM, Amazon Web Services, Microsoft. Bands: Challengers, Contenders, Strong Performers, Leaders. Axes: Current Offering (Weak to Strong) and Strategy (Weak to Strong). Legend: Market Presence.]

Get it at //aka.ms/forresterwave

Page 4: The Elephant in the Clouds

Key Architectural Considerations for Hadoop in the Cloud

Shared Data & Storage

On-Demand Ephemeral Workloads

Elastic Resource Management

Shared Metadata, Security & Governance

Page 5: The Elephant in the Clouds

Prescriptive On-Demand Ephemeral Workloads

On-Demand Ephemeral Workloads

– Data Science: R/W Tables, Compute Fabric
– ETL: R/W Tables, Compute Fabric
– Warehouse: R/W Tables, Compute Fabric
– Search: R/W Tables, Compute Fabric
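
The on-demand, ephemeral pattern on this slide can be sketched as a cluster that exists only for the duration of one workload. This is a minimal illustration, not the Hortonworks implementation: the `TEMPLATES` table, its node counts, and the `ephemeral_cluster` helper are hypothetical names and values, and provisioning is simulated by appending to a list rather than calling a cloud API.

```python
from contextlib import contextmanager

# Hypothetical prescriptive templates: workload type -> node count (made-up values).
TEMPLATES = {"data_science": 4, "etl": 10, "warehouse": 8, "search": 6}

events = []  # stands in for calls to a cloud provisioning API


@contextmanager
def ephemeral_cluster(workload_type):
    """Provision compute for one workload and always tear it down afterwards."""
    nodes = TEMPLATES[workload_type]
    events.append(f"provision {nodes} nodes for {workload_type}")
    try:
        yield {"type": workload_type, "nodes": nodes}
    finally:
        # Data lives in shared cloud storage, so the compute can simply go away.
        events.append(f"tear down {workload_type} cluster")


with ephemeral_cluster("etl") as cluster:
    events.append(f"run job on {cluster['nodes']} nodes")
```

The `finally` clause is the point of the sketch: teardown happens even if the job fails, which is what keeps an ephemeral workload from turning into a long-running one by accident.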

Page 6: The Elephant in the Clouds

Shared Data and Storage

Understand and Leverage Unique Cloud Properties
– Shared data lake is cloud storage accessible by all apps
– Cloud storage segregated from compute
– Built-in geo-distribution and DR

Focus Areas
– Address cloud storage consistency and performance
– Enhance performance via memory and local storage

Page 7: The Elephant in the Clouds

Enhance Performance via Caching

Tabular Data: LLAP Read + Write-through Cache
– Cache only the needed columns
– Shared across jobs / apps and across engines
– Spills to SSD when memory is full (anti-caching)
– Read & Write-through cache
– Security: Column-level and row-level

HDFS Caching for Non-tabular Data
– Cache data from cloud storage as needed
– Write-through cache

[Diagram: Workloads read and write through a Cache layer (LLAP for R/W Tables, HDFS for Files) that sits in front of Cloud Storage.]
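
The caching behavior listed on this slide (cache only the needed columns, spill to SSD when memory is full, write through to cloud storage) can be illustrated with a toy two-tier cache. This is a sketch under stated assumptions, not how LLAP or HDFS caching is implemented: the class name `TieredWriteThroughCache`, the LRU eviction policy, and the plain dicts standing in for SSD and cloud storage are all hypothetical.

```python
from collections import OrderedDict


class TieredWriteThroughCache:
    """Toy column cache: memory spills to an 'SSD' tier; writes go through to the store."""

    def __init__(self, store, memory_capacity):
        self.store = store              # stands in for cloud storage: {(table, column): values}
        self.memory = OrderedDict()     # LRU order, least recently used first
        self.ssd = {}                   # stands in for local SSD spill space
        self.memory_capacity = memory_capacity
        self.store_reads = 0            # number of trips to the backing store

    def _put_in_memory(self, key, values):
        self.memory[key] = values
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            old_key, old_values = self.memory.popitem(last=False)
            self.ssd[old_key] = old_values   # spill instead of dropping

    def read(self, table, column):
        key = (table, column)                # cache per column, not per table
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.ssd:
            values = self.ssd.pop(key)       # promote from SSD back to memory
        else:
            values = self.store[key]
            self.store_reads += 1
        self._put_in_memory(key, values)
        return values

    def write(self, table, column, values):
        key = (table, column)
        self.store[key] = values             # write-through: update the store first
        self.ssd.pop(key, None)              # discard any stale spilled copy
        self._put_in_memory(key, values)


store = {
    ("clicks", "user_id"): [1, 2, 3],
    ("clicks", "url"): ["/a", "/b", "/c"],
    ("clicks", "ts"): [10, 20, 30],
}
cache = TieredWriteThroughCache(store, memory_capacity=2)
cache.read("clicks", "user_id")
cache.read("clicks", "url")
cache.read("clicks", "ts")             # memory is full, so user_id spills to SSD
cache.read("clicks", "user_id")        # served from the SSD tier, no store read
cache.write("clicks", "ts", [11, 21, 31])
```

Because every write lands in the store before the cache is updated, another engine reading the same column from cloud storage sees the new values, which is what makes the cache safe to share across jobs in this simplified model.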

Page 8: The Elephant in the Clouds

Shared Data Requires Shared Metadata, Security, and Governance

Shared Metadata Across All Workloads

Metadata considerations
– Tabular data metastore
– Lineage and provenance metadata
– Pipeline and job management metadata
– Add upon ingest
– Update as processing modifies data

Access / tag-based policies and audit logs

Centrally stored to facilitate use across apps
– Ex. backed by Cloud RDS (or shared DB)

[Diagram: Policies (Classification, Prohibition, Time, Location) applied through Shared Metadata covering Streams, Pipelines, Feeds, Tables, Files, and Objects.]
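
One way to picture tag-based policies backed by a shared metadata store is the small model below. It is an illustrative sketch only: `SharedMetadata`, `Policy`, the `PII` tag, and the group names are hypothetical, the store is an in-memory object rather than a shared database, and the default-allow rule for untagged assets is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    tag: str                    # classification tag the policy applies to
    allowed_groups: frozenset   # groups permitted to read assets carrying the tag


@dataclass
class SharedMetadata:
    tags: dict = field(default_factory=dict)       # asset name -> set of tags
    policies: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)  # (user, asset, decision) tuples

    def tag_asset(self, asset, *tags):
        """Register an asset and attach tags, e.g. upon ingest."""
        self.tags.setdefault(asset, set()).update(tags)

    def can_read(self, user, groups, asset):
        """Every policy whose tag is on the asset must admit one of the user's groups."""
        asset_tags = self.tags.get(asset, set())
        allowed = all(
            groups & policy.allowed_groups
            for policy in self.policies
            if policy.tag in asset_tags
        )
        self.audit_log.append((user, asset, "ALLOW" if allowed else "DENY"))
        return allowed


meta = SharedMetadata()
meta.policies.append(Policy("PII", frozenset({"compliance"})))
meta.tag_asset("tables/customers", "PII")   # tag added upon ingest
meta.tag_asset("tables/clickstream")        # registered with no tags

etl_ok = meta.can_read("etl_job", {"etl"}, "tables/clickstream")
ds_ok = meta.can_read("notebook", {"data_science"}, "tables/customers")
```

Since every workload consults the same `meta` object, a tag applied once at ingest governs ETL, warehouse, and data science access alike, and all decisions land in a single audit log.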

Page 9: The Elephant in the Clouds

Elastic Resource Management in Context of Workload

Workload Management vs. Cluster Management
– Understand resource needs of different workload types
– Add / remove resources to meet workload SLAs
– Manage compute power and high-performance data-access (ex., LLAP)
– Pricing-aware: instances (spot, reserved), data, bandwidth
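
As a rough illustration of sizing resources to a workload SLA with pricing in mind, the function below computes how many nodes a job needs to finish by its deadline, then fills that demand from already-reserved instances before bursting onto spot instances. The function name `plan_capacity`, its parameters, and all prices and throughput figures are hypothetical; a real planner would also account for data and bandwidth charges and for spot interruptions.

```python
import math


def plan_capacity(work_units, deadline_hours, units_per_node_hour,
                  reserved_nodes, reserved_price, spot_price):
    """Return (reserved_used, spot_used, hourly_cost) for one workload."""
    # Nodes required so the work completes within the SLA deadline.
    needed = math.ceil(work_units / (units_per_node_hour * deadline_hours))
    # Reserved capacity is already committed, so use it first.
    reserved_used = min(needed, reserved_nodes)
    # Burst the remainder onto spot instances.
    spot_used = needed - reserved_used
    hourly_cost = reserved_used * reserved_price + spot_used * spot_price
    return reserved_used, spot_used, hourly_cost


plan = plan_capacity(
    work_units=1200,          # made-up amount of work in the job
    deadline_hours=2,         # SLA: finish within two hours
    units_per_node_hour=50,   # made-up per-node throughput
    reserved_nodes=8,
    reserved_price=0.25,      # made-up hourly prices
    spot_price=0.125,
)
```

With these inputs the job needs 12 nodes, so the plan uses all 8 reserved nodes plus 4 spot nodes. Once the workload finishes, the spot nodes can be released, which is the add / remove step described on the slide.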

Page 10: The Elephant in the Clouds

Ram Venkatesh
Senior Director of Engineering, Hortonworks

Demo of Cloud Tech Preview
Effectiveness of mobile ad spend (cross-device attribution)

Clickstream ETL, BI & Reporting, Data Science

Data, Metadata, Security

Cloud Control Plane

Page 11: The Elephant in the Clouds

Vision: Connected Data Architecture Enables Enterprise Transformations

[Diagram: Connected Data Architecture spanning CLOUD and DATA CENTER. Labels: Data in Motion, Data at Rest, Machine Learning, Deep Historical Analysis, Stream Analytics, Edge Data, Edge Analytics.]

Page 12: The Elephant in the Clouds

Recommended Sessions… Thursday
– Hadoop & Cloud Storage: Object Store Integration in Production
– LLAP: Sub-Second Analytical Queries in Hive
– Zeppelin + Livy: Bringing multi tenancy to interactive data analysis

CHECK OUT HORTONWORKS CLOUD TECH PREVIEW!
http://hortonworks.com/news-blogs/

Page 13: The Elephant in the Clouds

Thank You

