Date post: 20-Apr-2018
Cloudera Enterprise in the Cloud
November 2016

© Cloudera, Inc. All rights reserved. Cloudera internal and confidential

 

Notification

• The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera.
• This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice.
• Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
• Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence.

What’s Driving Hadoop to the Cloud?
Enterprise customers using cloud for big data analytics

Hadoop deployments in cloud are accelerating:
● Executive mandate: minimize on-prem datacenter footprint
● Increased agility: end-user self-service
● Elasticity: optimize infrastructure usage

Enterprises want a hybrid cloud strategy

Nearly 20% of organizations will run hybrid cloud by 2017.
- 2015 Gartner Cloud Adoption survey

Why Cloudera in the Cloud?

Elastic: Size compute and storage independently, grow and shrink clusters dynamically, and pay only for what you use on ad-hoc, transient workloads

Hybrid/Multi-Cloud: Preserve business flexibility and data portability and minimize cloud lock-in by running in any one of the three major public cloud providers or in private cloud

Enterprise Grade: Reduce risk with comprehensive manageability, availability, security, and governance required for production big data workloads

Common workloads in the cloud

ETL/Modeling (Data Engineering): Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Elastic workload
▪ Object storage centric
▪ Cloud-native deployment

BI/Analytics (Analytic Database): New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Transient or persistent clusters
▪ Sized to demand
▪ HDFS or object storage
▪ Lift-and-shift or cloud-native deployment

App Delivery (Operational Database): Run Without Risk
Enterprise-grade to protect your business, no matter what
▪ Fixed clusters
▪ Periodic sync
▪ All HDFS storage
▪ Lift-and-shift deployment

Cloudera Enterprise in the Cloud
With choice in deployment models

Deployment models: Lift and Shift, Cloud-native (Object Store)

Bringing enterprise-class Big Data solutions to cloud:
• Leaders in Hadoop infrastructure
• Enterprise class stack
• No vendor lock-in
• Hybrid on-prem and public cloud

Cluster Lift-and-shift Use Cases
Perpetually “on” clusters in the cloud

Lift-and-shift clusters have similar requirements to on-prem clusters:
• High availability and disaster recovery
• Cluster operational management
• Cluster auto-scaling
• Resource management
• Security

Examples of lift-and-shift use cases in the cloud:
• HBase clusters
• Kafka clusters
• BI analytics
• Large, multi-user clusters

Patterns of Cloud-Native Applications
Flexibility, Self-Service Models, and New Cost Dynamics

• Embrace Transience for Lower Costs
• Decoupled Storage and Compute for Elastic Scale
• Compartmentalize for Greater Isolation

[Diagram labels: Object Store, Compute, 1 hr, Spin up, Spin down]

Clusters Using Cloud-native Infrastructure
Leverage object storage and elastic compute to support transient clusters

Transient cluster requirements:
● Object store integration
● Fast cluster provisioning
● Cluster metadata persistence
● Usage-based pricing

Examples of transient clusters in the cloud:
● ETL workflows
● Model training
● Ad hoc analytics
● Dev and test workflows
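
The usage-based pricing argument for transient clusters can be illustrated with a little arithmetic. The sketch below is only an illustration: the node count, hourly rate, and run schedule are made-up assumptions, not figures from this deck, and it ignores object storage charges and provisioning time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    """Hypothetical sizing inputs; none of these figures come from the slides."""
    nodes: int
    hourly_rate_per_node: float  # assumed on-demand price, USD per node-hour


def always_on_cost(plan: ClusterPlan, days: int) -> float:
    """Compute cost of a cluster that runs 24 hours a day for `days` days."""
    return plan.nodes * plan.hourly_rate_per_node * 24 * days


def transient_cost(plan: ClusterPlan, runs: int, hours_per_run: float) -> float:
    """Compute cost of spinning the same cluster up `runs` times, `hours_per_run` each."""
    return plan.nodes * plan.hourly_rate_per_node * runs * hours_per_run


if __name__ == "__main__":
    plan = ClusterPlan(nodes=10, hourly_rate_per_node=0.5)
    # One month of a perpetually "on" cluster vs. a one-hour daily ETL run.
    print("always on:", always_on_cost(plan, days=30))
    print("transient:", transient_cost(plan, runs=30, hours_per_run=1))
```

Under these assumed inputs the transient pattern only pays for 30 node-hours per node instead of 720, which is the cost dynamic the slide points at; real savings depend on how long each job runs and on storage and data-transfer charges.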

Delivering a modern data platform on any cloud

Transient & Elastic Cluster Support
• Component support with performance optimizations for object store
• Cloud-native support for multiple IaaS platforms
• Service metadata persistence across cluster lifecycles
• Optimizations for cluster grow and shrink

Comprehensive, Granular Security
• Navigator support for audit and lineage across cluster lifecycles
• Unified permissions with fine-grained, role-based ACLs (column + rows)
• Object store and cluster-wide data-at-rest and in-motion encryption
• Manage encryption keys on-prem

Cluster Lifecycle Management
• Simplified administration with cluster lifecycle support
• Multi-cluster view through single pane of glass
• Rapid cluster deployments and scaling
• Manage CDH deployments at scale

Enterprise-Grade Security and Governance in the Cloud

• Confidently run big data workloads on sensitive data in the cloud
• Empower users with differing permissions to share clusters and data
• Complement and extend the security and governance protections from your cloud provider

 

Comprehensive, Compliance-Ready Security Inside and Outside the Cloud

Access: Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization

Data: Protecting data in the cluster from unauthorized visibility
Technical Concepts: Encryption, Key management

Visibility: Reporting on where data came from and how it’s being used
Technical Concepts: Auditing, Lineage

Perimeter: Guarding access to the cluster itself
Technical Concepts: Authentication, Network isolation

   

Maintaining Keys Outside Control of Cloud Provider

[Diagram labels: Cloud, On Prem; Apps (Machine Learning, Business Intelligence, etc.); Compute (Impala, Hive, MR, Spark, HBase); Storage (HDFS, S3, Local disk); HDFS encryption and S3 client-side encryption (in storage client); Server-side encryption; Navigator Key Trustee; Hardware Security Module (optional)]

Best Security Practice
• Encrypt higher in the stack
• Store and manage keys separately from Cloud Provider

Gartner, Hype Cycle for Cloud Security, 17 July 2015: “...users of infrastructure, platform and software as a service looking for providers with a good story on encryption, and frequently looking for mechanisms that can be applied outside of the control of the cloud service provider.”

     

Cloudera complements and extends cloud provider security

Sensitive data can be analyzed in the cloud today, using HDFS or cloud object storage
• Using a combination of Cloudera and Cloud provider security controls
• Encryption keys can be stored and managed separately from cloud provider (today for HDFS, later for object stores)
• Single-user transient clusters offer simplified security

The majority of users will be accessing structured data on multi-user clusters
• Sentry and RecordService work together to provide column and row-level security
• These users don’t need to use Cloud-provider security

 

Sample CDH in Cloud Architecture

[Architecture diagram labels: Data Sources; Kafka/Flume; Spark Streaming; Real-Time Serving (HBase or Impala/Kudu (beta)); Kafka; Application; Object Storage; Batch Data Transformations (Hive/Spark/HoS); Analytics (Impala)]

Director Provisioning: Cluster Lifecycle Management
Spin up, grow & shrink, terminate CDH clusters that read/write to object store

Easy Administration
• Dynamic cluster lifecycle management
• Single pane of glass: multi-cluster view

Flexible Deployments
• Multi-cloud: AWS, Azure, GCP
• Fast cluster deployments
• Scaling of CDH clusters
• Spot instance support

Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale

[Image: Cloudera Director]

Supported Today: Cloudera Director 2.1 and C5.8
Notable cloud features supported by Director, CM, and CDH

Object Store Support
• Hive on AWS S3
• Spark on AWS S3
• Hive-on-Spark on AWS S3
• Impala on S3
• Support for S3 s3a connector

Cluster Lifecycle
• Faster cluster deployments
• Cluster templates
• Cluster cloning
• Enablement of HA & Kerberos during bootstrap

Cluster Management
• Create, grow, shrink, terminate clusters
• Single pane-of-glass for cluster health
• AWS spot instances for worker nodes

Sample CDH in Cloud Architecture: Batch Analytics

[Architecture diagram labels: Data Sources; Kafka/Flume; Spark Streaming; Real-Time Serving (HBase or Impala/Kudu (beta)); Kafka; Application; Object Storage; Batch Data Transformations (Hive/Spark/HoS); Analytics (Impala); Batch Analytics]

Batch analytics in cloud

What is Hive on Spark?
• Enables Hive to use Spark as underlying execution engine
• Functional parity and full compatibility with Hive on MR*
• Provides ~3X better perf than Hive on MR
• Seamless migration via automatic config and optimizations via CM
• Fully supported production release (generally available) in CDH5.7
• Community effort by Cloudera, Intel, MapR, IBM, Databricks
• Various optimizations: Dynamic Partition Pruning, Vectorization support, Cost-Based Optimizer; others include caching RDDs across queries, optimizing self join/union, etc.
• Performance optimizations improve TCO in the cloud

*See release notes for known issues
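
As a rough illustration of what "Spark as underlying execution engine" means in practice, Hive selects its engine through the `hive.execution.engine` property, which can be set per session. The helper below is a hypothetical sketch (the function name and the sample query are assumptions, not part of this deck or of any Cloudera tooling); it only builds the statement text and does not connect to a cluster.

```python
# Engines this sketch accepts; "mr" is classic MapReduce, "spark" is Hive on Spark.
VALID_ENGINES = ("mr", "spark")


def with_engine(query: str, engine: str = "spark") -> str:
    """Prefix a HiveQL query with a session-level engine setting (hypothetical helper)."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    # Normalize the trailing semicolon so the script always ends with exactly one.
    body = query.strip().rstrip(";").rstrip()
    return f"SET hive.execution.engine={engine};\n{body};"


if __name__ == "__main__":
    # The table name below is made up for illustration.
    print(with_engine("SELECT COUNT(*) FROM web_logs"))
```

The same query text runs on either engine, which is the point of the functional-parity bullet; in a managed deployment the default engine would normally be configured centrally rather than per query.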

Traditionally-Architected Analytic Databases

• Inelastic Scale with Tightly-Coupled Compute/Storage
• Rigid Structure & Proprietary Formats
• Limited to SQL with Data Movement Necessary

[Diagram labels: Compute, Store, Static Sizing]

Impala’s Cloud-Native Capabilities

Cloud Elasticity
• Pay-per-Use
• Grow/shrink cluster sizes
• Elastic compute scale
• Transient support

Data Agility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage

Scalability
• Proven over 100s of nodes
• Proven with high-concurrency

Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.

[Diagram labels: Object Store, Compute]

Impala More Cost-Effective on both EBS & S3
ETL + Multi-user queries

• Redshift “General Purpose Schema”: schema for general-purpose usage
• Redshift “Fixed Reporting”: fixed-purpose schema tuned for this specific test workload

Impala >200% cheaper than Redshift General Purpose
Impala 8-28% cheaper than Redshift Fixed Reporting

Exploratory BI can be expensive on Redshift

Highlighting Cloudera Cloud Differentiators
Product leadership driving unique innovation for data analytics in the cloud

● Focus on price-performance through component optimizations for the cloud
● Bringing the best low-latency SQL-query engine to the cloud with Impala
● Superior integrated security solution with fine-grained access control
● Multi- and hybrid cloud support avoiding lock-in and enabling flexibility
● Integrated framework for transient and permanent clusters
● Value-added tools & rich partner integrations for best data analytics experience

Why Cloudera in the Cloud?
CDH is the most deployed distro in the cloud

Hadoop Expertise
• Most committers
• World-class innovation
• Enterprise-class stack
• Granular data security + governance
• Best support, services, training

Flexible Deployments
• No vendor lock-in
• Multi-cloud and on-prem
• Transient and long-lived clusters

Customer success - not cloud consumption
• Focus on infrastructure choice
• Security separation from infrastructure leads to greater choice

Flexible Pricing
• Pay-as-you-go cloud usage
• Traditional node-based licensing

Extensive Partner Ecosystem

Platform & Cloud

System Integration

Data Systems

Software and OEM

Get started with Cloudera Enterprise in the cloud

• Cloudera Director: Deploy and manage Cloudera Enterprise in the cloud environment of your choice
• AWS Quickstart: Deploy an enterprise data hub on AWS
• Azure Marketplace: Provision and deploy Cloudera Enterprise on the Azure Marketplace

Thank You

