1 © Cloudera, Inc. All rights reserved. Cloudera internal and confidential
Cloudera Enterprise in the Cloud November 2016
Notification
• The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera.
• This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice.
• Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
• Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence.
What’s Driving Hadoop to the Cloud?
Enterprise customers using cloud for big data analytics
Hadoop deployments in cloud are accelerating:
● Executive mandate: minimize on-prem datacenter footprint
● Increased agility: end-user self-service
● Elasticity: optimize infrastructure usage
Enterprises want a hybrid cloud strategy
Nearly 20% of organizations will run hybrid cloud by 2017.
- 2015 Gartner Cloud Adoption survey
Why Cloudera in the Cloud?
Elastic: Size compute and storage independently, grow and shrink clusters dynamically, and pay only for what you use on ad-hoc, transient workloads
Hybrid/Multi-Cloud: Preserve business flexibility and data portability and minimize cloud lock-in by running in any one of the three major public cloud providers or in private cloud
Enterprise Grade: Reduce risk with comprehensive manageability, availability, security, and governance required for production big data workloads
Common workloads in the cloud
ETL/Modeling (Data Engineering): Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Elastic workload
▪ Object storage centric
▪ Cloud-native deployment
BI/Analytics (Analytic Database): New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Transient or persistent clusters
▪ Sized to demand
▪ HDFS or object storage
▪ Lift-and-shift or cloud-native deployment
App Delivery (Operational Database): Run Without Risk
Enterprise-grade to protect your business, no matter what
▪ Fixed clusters
▪ Periodic sync
▪ All HDFS storage
▪ Lift-and-shift deployment
Cloudera Enterprise in the Cloud
With choice in deployment models
[Diagram labels: Lift and Shift; Cloud-native; Object Store]
Bringing enterprise-class Big Data solutions to cloud:
• Leaders in Hadoop infrastructure
• Enterprise class stack
• No vendor lock-in
• Hybrid on-prem and public cloud
Cluster Lift-and-shift Use Cases
Perpetually “on” clusters in the cloud
Lift-and-shift clusters have similar requirements to on-prem clusters:
• High availability and disaster recovery
• Cluster operational management
• Cluster auto-scaling
• Resource management
• Security
Examples of lift-and-shift use cases in the cloud:
• HBase clusters
• Kafka clusters
• BI analytics
• Large, multi-user clusters
Patterns of Cloud-Native Applications
Flexibility, Self-Service Models, and New Cost Dynamics
Embrace Transience for Lower Costs
Decoupled Storage and Compute for Elastic Scale
Compartmentalize for Greater Isolation
[Diagram labels: COMPUTE over Object Store; SPIN UP, 1hr, SPIN DOWN]
Clusters Using Cloud-native Infrastructure
Leverage object storage and elastic compute to support transient clusters
Transient cluster requirements:
● Object store integration
● Fast cluster provisioning
● Cluster metadata persistence
● Usage-based pricing
Examples of transient clusters in the cloud:
● ETL workflows
● Model training
● Ad hoc analytics
● Dev and test workflows
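The usage-based pricing requirement can be illustrated with a small cost model. The following is a minimal sketch, not a Cloudera pricing tool: the `Workload` fields, the per-node-hour price, and the 730-hour month are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Average number of hours in a month (assumption used for the always-on case).
HOURS_PER_MONTH = 730


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a recurring batch job."""
    nodes: int               # worker nodes the job needs
    hours_per_run: float     # wall-clock hours per run, including spin-up
    runs_per_month: int      # how often the job runs
    node_hour_price: float   # assumed price per node-hour (illustrative)


def always_on_cost(w: Workload) -> float:
    """Monthly cost if the cluster is kept running all month."""
    return w.nodes * HOURS_PER_MONTH * w.node_hour_price


def transient_cost(w: Workload) -> float:
    """Monthly cost if the cluster exists only while the job runs."""
    return w.nodes * w.hours_per_run * w.runs_per_month * w.node_hour_price


def savings_fraction(w: Workload) -> float:
    """Fraction of the always-on bill avoided by running transiently."""
    return 1.0 - transient_cost(w) / always_on_cost(w)


# Example: a nightly two-hour ETL job on ten nodes at an assumed 0.50 per node-hour.
nightly_etl = Workload(nodes=10, hours_per_run=2.0, runs_per_month=30, node_hour_price=0.5)
```

The model ignores object storage charges and any state that must persist between runs, which is why cluster metadata persistence and object store integration appear alongside pricing in the requirements list.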
Delivering a modern data platform on any cloud
Transient & Elastic Cluster Support
• Component support with performance optimizations for object store
• Cloud-native support for multiple IaaS platforms
• Service metadata persistence across cluster lifecycles
• Optimizations for cluster grow and shrink
Comprehensive, Granular Security
• Navigator support for audit and lineage across cluster lifecycles
• Unified permissions with fine-grained, role-based ACLs (column + rows)
• Object store and cluster-wide data-at-rest and in-motion encryption
• Manage encryption keys on-prem
Cluster Lifecycle Management
• Simplified administration with cluster lifecycle support
• Multi-cluster view through single pane of glass
• Rapid cluster deployments and scaling
• Manage CDH deployments at scale
Enterprise-Grade Security and Governance in the Cloud
• Confidently run big data workloads on sensitive data in the cloud
• Empower users with differing permissions to share clusters and data
• Complement and extend the security and governance protections from your cloud provider
Comprehensive, Compliance-Ready Security
Inside and Outside the Cloud
Access: Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization
Data: Protecting data in the cluster from unauthorized visibility
Technical Concepts: Encryption, Key management
Visibility: Reporting on where data came from and how it’s being used
Technical Concepts: Auditing, Lineage
Perimeter: Guarding access to the cluster itself
Technical Concepts: Authentication, Network isolation
Maintaining Keys Outside Control of Cloud Provider
[Diagram labels. On Prem: Navigator Key Trustee, Hardware Security Module (optional). Cloud: APPS (Machine Learning, Business Intelligence, etc.); COMPUTE (Impala, Hive, MR, Spark, HBase); STORAGE (HDFS, S3, Local disk). Encryption: HDFS encryption and S3 client-side encryption (in storage client); S3 server-side encryption.]
Best Security Practice
• Encrypt higher in the stack
• Store and manage keys separately from Cloud Provider
Gartner, Hype Cycle for Cloud Security, 17 July 2015: “...users of infrastructure, platform and software as a service looking for providers with a good story on encryption, and frequently looking for mechanisms that can be applied outside of the control of the cloud service provider.”
Cloudera complements and extends cloud provider security
Sensitive data can be analyzed in the cloud today, using HDFS or cloud object storage
• Using a combination of Cloudera and cloud provider security controls
• Encryption keys can be stored and managed separately from cloud provider (today for HDFS, later for object stores)
• Single-user transient clusters offer simplified security
The majority of users will be accessing structured data on multi-user clusters
• Sentry and RecordService work together to provide column and row-level security
• These users don’t need to use cloud-provider security
Sample CDH in Cloud Architecture
[Architecture diagram labels: Data Sources; Kafka/Flume; Spark Streaming; Real-Time Serving: HBase or Impala/Kudu (beta); Kafka Application; Object Storage; Batch Data Transformations: Hive/Spark/HoS; Analytics: Impala]
Director Provisioning: Cluster Lifecycle Management
Spin up, grow & shrink, terminate CDH clusters that read/write to object store
Easy Administration
• Dynamic cluster lifecycle management
• Single pane of glass: multi-cluster view
Flexible Deployments
• Multi-cloud: AWS, Azure, GCP
• Fast cluster deployments
• Scaling of CDH clusters
• Spot instance support
Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale
Cloudera Director
Supported Today: Cloudera Director 2.1 and C5.8
Notable cloud features supported by Director, CM, and CDH
Object Store Support
• Hive on AWS S3
• Spark on AWS S3
• Hive-on-Spark on AWS S3
• Impala on S3
• Support for S3 s3a connector
Cluster Lifecycle
• Faster cluster deployments
• Cluster templates
• Cluster cloning
• Enablement of HA & Kerberos during bootstrap
Cluster Management
• Create, grow, shrink, terminate clusters
• Single pane-of-glass for cluster health
• AWS spot instances for worker nodes
Sample CDH in Cloud Architecture
[Architecture diagram labels: Data Sources; Kafka/Flume; Spark Streaming; Real-Time Serving: HBase or Impala/Kudu (beta); Kafka Application; Object Storage; Batch Data Transformations: Hive/Spark/HoS; Analytics: Impala. Highlighted: Batch Analytics]
Batch analytics in cloud
• What is Hive on Spark?
  • Enables Hive to use Spark as underlying execution engine
  • Functional parity and full compatibility with Hive on MR*
  • Provides ~3X better perf than Hive on MR
  • Seamless migration via automatic config and optimizations via CM
  • Fully supported production release (generally available) in CDH 5.7
  • Community effort by Cloudera, Intel, MapR, IBM, Databricks
  • Various optimizations (Dynamic Partition Pruning, Vectorization support, Cost-Based Optimizer; others include caching RDDs across queries and optimizing self join/union)
• Performance optimizations improve TCO in the cloud
*See release notes for known issues
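Switching a Hive session from MapReduce to Spark can be sketched as follows. This is a minimal illustration, not the automatic CM configuration mentioned above: `hive.execution.engine` is the Hive property that selects the engine, while the helper `with_engine`, the restricted engine list, and the table name `web_logs` are hypothetical and introduced only for this example.

```python
# Engines discussed on this slide; Hive itself may accept others.
VALID_ENGINES = ("mr", "spark")


def with_engine(query: str, engine: str = "spark") -> str:
    """Prefix a HiveQL query with a session-level engine selection.

    Returns a two-statement script: a SET of hive.execution.engine
    followed by the original query, terminated with a semicolon.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    body = query.strip().rstrip(";")
    return f"SET hive.execution.engine={engine};\n{body};"


# Hypothetical table name, used only for illustration.
script = with_engine("SELECT status, COUNT(*) FROM web_logs GROUP BY status")
print(script)
```

Because the setting is per session, the same query text can be run under either engine, which is one way to compare the two on a given workload before changing cluster-wide defaults.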
Traditionally-Architected Analytic Databases
Inelastic Scale with Tightly-Coupled Compute/Storage
Rigid Structure & Proprietary Formats
Limited to SQL with Data Movement Necessary
[Diagram labels: COMPUTE, STORE, Static Sizing]
Impala’s Cloud-Native Capabilities
Cloud Elasticity
• Pay-per-Use
• Grow/shrink cluster sizes
• Elastic compute scale
• Transient support
Data Agility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage
Scalability
• Proven over 100s of nodes
• Proven with high-concurrency
Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.
[Diagram labels: COMPUTE over Object Store]
Impala More Cost-Effective on both EBS & S3
ETL + Multi-user queries
• Redshift “General Purpose Schema”: schema for general-purpose usage
• Redshift “Fixed Reporting”: fixed-purpose schema tuned for this specific test workload
Impala >200% cheaper than Redshift General Purpose
Impala 8-28% cheaper than Redshift Fixed Reporting
Exploratory BI can be expensive on Redshift
Highlighting Cloudera Cloud Differentiators
Product leadership driving unique innovation for data analytics in the cloud
● Focus on price-performance through component optimizations for the cloud
● Bringing the best low-latency SQL-query engine to the cloud with Impala
● Superior integrated security solution with fine-grained access control
● Multi- and hybrid cloud support avoiding lock-in and enabling flexibility
● Integrated framework for transient and permanent clusters
● Value-added tools & rich partner integrations for best data analytics experience
Why Cloudera in the Cloud?
CDH is the most deployed distro in the cloud
Hadoop Expertise
• Most committers
• World-class innovation
• Enterprise-class stack
• Granular data security + governance
• Best support, services, training
Flexible Deployments
• No vendor lock-in
• Multi-cloud and on-prem
• Transient and long-lived clusters
Customer success, not cloud consumption
• Focus on infrastructure choice
• Security separation from infrastructure leads to greater choice
Flexible Pricing
• Pay-as-you-go cloud usage
• Traditional node-based licensing
Extensive Partner Ecosystem
Platform & Cloud
System Integration
Data Systems
Software and OEM
Get started with Cloudera Enterprise in the cloud
Deploy and manage Cloudera Enterprise in the cloud environment of your choice
Deploy an enterprise data hub on AWS
Provision and deploy Cloudera Enterprise on the Azure Marketplace
Cloudera Director
AWS Quickstart Azure Marketplace