Date posted: 06-May-2015
Category: Technology
Uploaded by: xebia-france
Data Has Changed in the Last 30 Years
DATA GROWTH, 1980 to 2012
END-USER APPLICATIONS, THE INTERNET, MOBILE DEVICES, SOPHISTICATED MACHINES
STRUCTURED DATA – 10%
UNSTRUCTURED DATA – 90%
Data Management Strategies Have Stayed the Same
• Raw data on SAN, NAS and tape
• Data moved from storage to compute
• Relational models with predesigned schemas
Too Much Data, Too Many Sources
• Can’t ingest fast enough
• Costs too much to store
• Exists in different places
• Archived data is lost
Can’t Use It The Way You Want To
• Analysis and processing take too long
• Data exists in silos
• Can’t ask new questions
• Can’t analyze unstructured data
Transform The Way You Think About Data
Cloudera
Ask Bigger Questions
When customer x visits my store, what can I recommend based on their recent web behavior across our various brand websites?
What is the best location in North America to efficiently produce both tomato plants and corn?
What does every fraudulent activity in the last 2 years have in common that will help us identify and proactively prevent the next incident?
Are hotel room sales at Christmas slow because of inventory or competitive pricing?
What did customer x view on their last website visit?
What makes some tomato plants more fruitful than others?
What incidents of fraud did we detect last year?
What search terms are used most often when looking for hotels in NYC?
The Cloudera Approach
Meet enterprise demands with a new way to think about data.
THE OLD WAY: Multiple platforms for multiple workloads
COMPLEX, FRAGMENTED, COSTLY
• Data silos by department or LOB
• Lots of data stored in expensive specialized systems
• Analysts pull select data into EDW
• No one has a complete view
THE CLOUDERA WAY: Single data platform to support BI, Reporting & App Serving
SIMPLIFIED, UNIFIED, EFFICIENT
• Bulk of data stored on scalable low-cost platform
• Perform end-to-end workflows
• Specialized systems reserved for specialized workloads
• Provides data access across departments or LOB
Cloudera Enterprise: The Platform for Big Data
A Revolutionary Solution Built on Apache Hadoop
INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
• BRINGS STORAGE & COMPUTE TOGETHER
• WORKS WITH EVERY TYPE OF DATA
• CHANGES THE ECONOMICS OF DATA MANAGEMENT
Cloudera Enterprise Includes Advanced System Management & Support for the Core CDH Projects
CDH: 100% OPEN SOURCE HADOOP DISTRIBUTION
• CORE PROJECTS: HDFS, MAPREDUCE, FLUME, HCATALOG, HIVE, HUE, MAHOUT, OOZIE, PIG, SQOOP, WHIRR, ZOOKEEPER
• PREMIUM PROJECTS: HBASE, IMPALA, SEARCH (BETA)
• CONNECTORS: MICROSTRATEGY, NETEZZA, ORACLE, QLIKVIEW, TABLEAU, TERADATA
CLOUDERA MANAGER: END-TO-END SYSTEM MANAGEMENT
• DEPLOYMENT, MONITORING, API, SNMP, CONFIG ROLLBACKS, PHONE HOME
• SERVICE MGMT, DIAGNOSTICS, ROLLING UPGRADES, LDAP, REPORTING, BACKUP/DR
CLOUDERA NAVIGATOR: END-TO-END DATA MANAGEMENT
• ACCESS MGMT, DATA AUDIT
CLOUDERA SUPPORT: BEST-IN-CLASS TECHNICAL SUPPORT, COMMUNITY ADVOCACY & INDEMNIFICATION
CORE HADOOP PROJECTS, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, HBASE, IMPALA, SEARCH
RTD Subscription Includes Support & Indemnity for Apache HBase
RTQ Subscription Includes Support & Indemnity for Cloudera Impala
RTS Subscription Includes Support & Indemnity for Cloudera Search
BDR Subscription Includes Centralized Management for Disaster Recovery Workflows
Navigator Subscription Enables Cloudera Navigator for Automated Data Management
Customer Case Studies
A multinational bank saves millions by optimizing DW for analytics & reducing data storage costs by 99%.
Ask Bigger Questions: How can we optimize our data warehouse investment?
Cloudera optimizes the EDW, saves millions
Multinational bank saves millions by optimizing existing DW for analytics & reducing data storage costs by 99%.
The Challenge:
• Teradata EDW at capacity: ETL processes consume 7 days; takes 5 weeks to make historical data available for analysis
• Performance issues in business-critical apps; little room for discovery, analytics, ROI from opportunities
The Solution:
• Cloudera Enterprise offloads data storage, processing & some analytics from EDW
• Teradata can focus on operational functions & analytics
A semiconductor manufacturer uses predictive analytics to take preventative action on chips likely to fail.
Ask Bigger Questions: Which semiconductor chips will fail?
Cloudera enables better predictions
Semiconductor manufacturer can prevent chip failure with more accurate predictive yield models.
The Challenge:
• Want to capture more granular and historical data for more accurate predictive yield modeling
• Storing 9 months’ data on Oracle is expensive
The Solution:
• Dell | Cloudera solution for Apache Hadoop
• 53 nodes; plan to store up to 10 years (~10PB)
• Capturing & processing data from each phase of manufacturing process
CONFIDENTIAL - RESTRICTED
The quant risk LOB within a multinational bank saves millions through better risk exposure analysis & fraud prevention.
Ask Bigger Questions: How can we prevent fraud?
Cloudera delivers savings through fraud prevention
Quant risk LOB in multinational bank saves millions through better risk exposure analysis & fraud prevention.
The Challenge:
• Fraud detection is a cumbersome, multi-step analytic process requiring data sampling
• 2B transactions/month necessitate constant revisions to risk profiles
• Highly tuned 100TB Teradata DW drives over-budget capital reserves & lower investment returns
The Solution:
• Cloudera Enterprise data factory for fraud prevention, credit & operational risk analysis
• Look at every incidence of fraud for 5 years for each person
• Reduced costs; expensive CPU no longer consumed by data processing
BlackBerry eliminates data sampling & simplifies data processing for better, more comprehensive analysis.
Ask Bigger Questions: How do we retain customers in a competitive market?
Cloudera delivers ROI through storage alone
BlackBerry can analyze all their data vs. relying on 1% sample for better network capacity trending & management.
The Challenge:
• BlackBerry Services generates 0.5PB (50-60TB compressed) data per day
• RDBMS is expensive – limited to 1% data sampling for analytics
The Solution:
• Cloudera Enterprise manages global data set of ~100PB
• Collecting device content, machine-generated log data, audit details
• 90% ETL code base reduction
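The BlackBerry figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 PB = 1,000 TB), and it assumes, purely for the sake of the estimate, that the ~100PB data set is stored compressed and accumulated at a constant daily rate. Neither assumption is stated on the slide, and the function names are hypothetical.

```python
# Figures taken from the slide; unit conversion (1 PB = 1000 TB) is an assumption.
RAW_TB_PER_DAY = 500.0                 # 0.5 PB of raw data per day
COMPRESSED_TB_PER_DAY = (50.0, 60.0)   # 50-60 TB per day after compression
TOTAL_TB = 100_000.0                   # global data set of ~100 PB


def compression_ratio(raw_tb: float, compressed_tb: float) -> float:
    """Return how many times smaller the compressed data is than the raw data."""
    return raw_tb / compressed_tb


def days_of_data(total_tb: float, daily_tb: float) -> float:
    """Return the number of days of ingest that a total volume represents.

    Assumes a constant daily rate, which the slide does not claim.
    """
    return total_tb / daily_tb


# One value per end of the 50-60 TB/day range.
ratios = [compression_ratio(RAW_TB_PER_DAY, c) for c in COMPRESSED_TB_PER_DAY]
days = [days_of_data(TOTAL_TB, c) for c in COMPRESSED_TB_PER_DAY]

for c, r, d in zip(COMPRESSED_TB_PER_DAY, ratios, days):
    print(f"{c:.0f} TB/day compressed: ratio {r:.1f}x, {d:.0f} days in 100 PB")
```

Under these assumptions the compression ratio works out to roughly 8x to 10x, and ~100PB corresponds to several years of compressed daily ingest.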
A global retailer’s customers benefit from more personalized communications and offers based on interactions across all channels.
Ask Bigger Questions: How can we offer customers the best experience?
Cloudera optimizes the DW for improved ROI
Global retailer’s customers benefit from more personalized communications based on interactions across all channels.
The Challenge:
• Need to correlate online/offline data across disparate, costly legacy DWs
• Takes up to 4 weeks to get data from one group – inhibits productivity
The Solution:
• Cloudera Enterprise with Impala: 1PB over 250 nodes
• Consolidated platform for Big Data with single environment for query and machine learning
Any Questions, Big or Small?