Hadoop: Extending Your Data Warehouse
Tony Baer | Principal Analyst, Ovum
Moderated by Matt Brandwein | Product Marketing Manager, Cloudera
May 9, 2013
Welcome to the webinar!
• All lines are muted
• Q&A after the presentation
• Ask questions at any time by typing them in the “Questions” pane on your WebEx panel
• A recording of this webinar will be available on-demand at cloudera.com
• Join the conversation on Twitter: @cloudera @TonyBaer #EDWHadoop
Who is Cloudera?
What the Enterprise Requires
Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
Suite of system and data management software
Comprehensive support and consulting services
Broadest Hadoop training and certification programs
Extensive Partner Ecosystem
Over 600 partners across hardware, software and services
The Leader in Big Data Management
Deliver a revolutionary data management platform powered by Apache Hadoop
World’s leading commercial vendor of Apache Hadoop
Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
More production deployments than all other vendors combined
© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.
Hadoop: Extending your Data Warehouse
Tony Baer
May 9, 2013
Twitter: @TonyBaer
Agenda
The BI Bottleneck
Hadoop & Enterprise Data Warehousing strategy
How Cloudera supports Hadoop as extended DW
Traditional BI/Data warehousing architecture
[Diagram: Sources → Extract → Staging Server with ETL Tool (Transform) → Load → Target(s): DW, Data Marts]
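The ETL flow in this architecture can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions. An in-memory SQLite database stands in for the DW. The orders CSV, the `fact_orders` table, and the function names are all hypothetical. In a real deployment the transform step would run on the staging server's ETL tool, so only cleaned rows ever reach the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: raw CSV rows as they might arrive from an OLTP system.
SOURCE_CSV = """order_id,region,amount
1,east,100.50
2,west,
3,EAST,42.00
"""

def extract(raw: str) -> list[dict]:
    """Extract: read the source rows without changing them."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform (staging tier): drop incomplete rows, normalize values, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # reject rows with a missing amount
        cleaned.append((int(row["order_id"]), row["region"].lower(), float(row["amount"])))
    return cleaned

def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: only conformed, already-transformed rows reach the warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")  # stand-in for the DW (assumption)
    load(dw, transform(extract(SOURCE_CSV)))
    print(dw.execute("SELECT region, SUM(amount) FROM fact_orders GROUP BY region").fetchall())
```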
DW: The base case
DWs conceived for MBytes/GBytes of structured data
Data structured based on expected queries & analytics
Multiple tiers to separate distinct workloads
OLTP – ongoing, shallow interactions, simple queries
Transform – batch-oriented, IOPS-intensive
BI/analytics – data-intensive, spiky
Reduced or eliminated impact on OLTP
More complex architecture, more tradeoffs
EDW hitting the wall
Data growing in volume & complexity
Use cases require more, richer data
Customer retention
Operational efficiency
Risk mitigation
Data retention mandates/policies forcing hard decisions
ETL jobs overrunning batch windows
EDWs straining to accommodate growing volumes and varieties of data
The ELT pattern
[Diagram: Sources → Extract → Load/Transform → Target(s): DW, Data Marts]
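For contrast, the ELT pattern can be sketched the same way, again assuming SQLite as a stand-in for the DW. The table names and sample rows are hypothetical. Raw rows are loaded first, untouched. The cleansing then runs as SQL inside the target, so transform cycles are spent on the same engine that serves analytic queries. This is the transform vs. analytic workload tradeoff listed among ELT's cons.

```python
import sqlite3

# Hypothetical raw source rows, loaded as-is (no cleansing before the load).
RAW_ROWS = [("1", "east", "100.50"), ("2", "west", ""), ("3", "EAST", "42.00")]

def load_raw(conn: sqlite3.Connection) -> None:
    """Load: land the untransformed rows in a staging table inside the target."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)

def transform_in_target(conn: sqlite3.Connection) -> None:
    """Transform: run the cleansing as SQL on the target's own engine."""
    conn.execute(
        """
        CREATE TABLE fact_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(region)             AS region,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount <> ''
        """
    )

if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")  # stand-in for the DW (assumption)
    load_raw(dw)
    transform_in_target(dw)
    print(dw.execute("SELECT region, SUM(amount) FROM fact_orders GROUP BY region").fetchall())
```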
The benefits – and limits – of ELT
Pros
Fewer data movements
Flatter architecture
Reduced errors with fewer data movements
Cons
Transform vs. analytic workload tradeoffs
SLAs jeopardized
Triggers arms race for more infrastructure
[Chart: Processing Times and Infrastructure Costs vs. Data Volumes, assuming constant SLAs]
Enterprise DWs – Size has its limits
SLAs hit the wall
Software licensing costs
At $20k–$50k per TByte, PBytes get very expensive
Managing/transforming new data types consumes resources
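Here is a rough worked example of the licensing line above. It assumes decimal units (1 PByte = 1,000 TBytes) and that the quoted per-TByte price scales linearly with capacity; the slide does not say what that price covers.

```latex
1\,\text{PByte} = 1{,}000\,\text{TBytes}
\;\Rightarrow\;
1{,}000 \times \$20\text{k} = \$20\text{M}
\quad\text{to}\quad
1{,}000 \times \$50\text{k} = \$50\text{M per PByte}
```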
But what if...
You don’t have to worry about batch windows
You don’t have to trade off transformation vs. analytic processing cycles
You can control s/w license cost escalation
You can keep that archived data live
You can more readily consume new types of data & keep your analytic options open
Agenda
The BI Bottleneck
Hadoop & Enterprise Data Warehousing strategy
How Cloudera supports Hadoop as extended DW
Introducing Hadoop
Originally a data processing framework for solving Internet-scale problems
Modeled on Google's File System (GFS) & MapReduce designs
Apache Hadoop community emerged to develop the platform for wider adoption
Financial services (FS), telcos, retail, and media discovered Hadoop’s benefits
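Hadoop's MapReduce follows the Google design named above. A map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The sketch below is a single-process illustration of that data flow only. It counts tokens in a few hypothetical log lines and does not use Hadoop's Java API; the function names are hypothetical. In Hadoop, the same phases run in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (token, 1) for every token in one input record."""
    for token in line.lower().split():
        yield token, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (Hadoop does this across the network)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)

def run_job(records: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce; keys are processed in sorted order, as in Hadoop."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))

if __name__ == "__main__":
    logs = ["GET /home", "GET /cart", "POST /cart"]
    print(run_job(logs))  # {'/cart': 2, '/home': 1, 'get': 2, 'post': 1}
```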
Hadoop benefits
Scalability
Near-linear scaling up to 1000s of nodes
Cost Flexibility
Leverages commodity h/w & open source s/w
Versatility with data, analytics & operation
Hadoop’s trump card: Flexibility
Accommodates all kinds of data
Accommodates multiple workloads
Keeps your options open
Extensibility
Life beyond MapReduce
Many personalities
Best of both worlds
Convergence with SQL
Hadoop as Data transformation platform
[Diagram: Sources → Extract → Load/Transform in Hadoop → Target: existing DW/Data Mart environment (DW, Data Marts)]
Why Hadoop as your data transformation platform?
Inexpensive cycles/storage
Low-cost platform reduces or eliminates the need for workload tradeoffs
No more transformation vs. analytics choice
Keep your archive active
Flexible division of labor
Data can remain in Hadoop or be moved to SQL
Raw data sits alongside transformed data
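This division of labor can be sketched with a local directory tree standing in for Hadoop storage and SQLite standing in for the SQL data mart. It is a minimal illustration under stated assumptions. The `raw/` and `curated/` zone names, the tab-separated log layout, and all function names are hypothetical, not Hadoop or Cloudera conventions. The transform writes a new dataset next to the raw files instead of replacing them. Only the compact result is optionally copied to SQL.

```python
import csv
import sqlite3
import tempfile
from collections import Counter
from pathlib import Path

def transform_zone(root: Path) -> Path:
    """Derive a curated summary from the raw zone, leaving the raw files untouched."""
    totals: Counter = Counter()
    for raw_file in sorted((root / "raw").glob("*.log")):
        for line in raw_file.read_text().splitlines():
            _, user_id, _ = line.split("\t")  # hypothetical layout: timestamp, user, url
            totals[user_id] += 1
    curated = root / "curated"
    curated.mkdir(exist_ok=True)
    out = curated / "visits_per_user.csv"
    with out.open("w", newline="") as f:
        csv.writer(f).writerows(sorted(totals.items()))
    return out

def export_to_sql(summary: Path, conn: sqlite3.Connection) -> None:
    """Optionally move only the compact summary into the SQL data mart."""
    with summary.open(newline="") as f:
        rows = [(user_id, int(n)) for user_id, n in csv.reader(f)]
    conn.execute("CREATE TABLE visits_per_user (user_id TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits_per_user VALUES (?, ?)", rows)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stand-in for a Hadoop directory tree (assumption)
        (root / "raw").mkdir()
        (root / "raw" / "day1.log").write_text("t1\talice\t/home\nt2\tbob\t/cart\nt3\talice\t/cart\n")
        summary = transform_zone(root)
        mart = sqlite3.connect(":memory:")  # stand-in for the SQL data mart
        export_to_sql(summary, mart)
        print(mart.execute("SELECT * FROM visits_per_user").fetchall())
        print(sorted(p.name for p in (root / "raw").iterdir()))  # raw data is still there
```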
Why Hadoop as an extension to your DW?
Efficient division of labor
Run time-consuming, resource-intensive analytic workloads inside Hadoop
Run routine queries, analytics, & reporting in the SQL DW or data mart
Query Hadoop directly
Most commercial BI tools read Hive metadata
Query Hadoop interactively
Emerging alternatives to MapReduce support interactive queries
Agenda
The BI Bottleneck
Hadoop & Enterprise Data Warehousing strategy
How Cloudera supports Hadoop as extended DW
Cloudera supports SQL convergence
Partners with leading ETL, BI, and data warehousing platform & tool providers
Connect Hadoop & SQL platforms
Emerging trend: BI and ETL tools working natively inside Hadoop
Introducing Impala
Brings high-performance interactive SQL inside Hadoop
Turns Hadoop into an MPP SQL analytic data target
Extends, rather than replaces, your SQL EDW or data mart
Makes your DW strategy more flexible and iterative
Taming Hadoop
Cloudera Manager
Automates deployment and health monitoring
Automates Hadoop configuration
New side-by-side deployment support
Cloudera Navigator
New feature of Cloudera Manager
Tracks data utilization activity from HDFS, Hive & HBase
Stepping stone for data security/stewardship… watch this space
Backup & Disaster Recovery (BDR)
New feature to automate recovery workflows
Hadoop: Takeaways
Economical platform for offloading data transformation cycles
Extends enterprise analytics
Hadoop & SQL are converging, broadening your analytic options
Hadoop won’t replace your EDW, but will take more of the workload
Cloudera actively broadening CDH to support & extend your EDW
SQL convergence
Platform manageability
Data security & stewardship
Impala: Cloudera’s Design Strategy
[Diagram: Engines (Batch Processing: MapReduce, Hive & Pig; Interactive SQL: Impala; Math: Machine Learning, Analytics; …) on shared Metadata, Resource Management, and Integration layers, over Storage (HDFS, HBase; Text, RCFile, Parquet, Avro, etc. records)]
Complement MapReduce with an interactive MPP SQL engine
One pool of data
One metadata model
One security framework
One set of system resources
100% open source
An Integrated Part of the Hadoop Platform
Impala Use Cases
Cost-effective, ad hoc query environment that offloads the data warehouse for:
Interactive BI/analytics on more data
Asking new questions
Data processing with tight SLAs
Query-able archive w/ full fidelity
Leading BI tools work with Impala
Questions?
• Type in the “Questions” panel
• Tweet @cloudera #EDWHadoop
• Recording will be available on-demand at cloudera.com
• Contact us: [email protected] | Twitter: @TonyBaer
[email protected] | Twitter: @MattBrandwein
Thank you for attending!
Try Cloudera today: cloudera.com/downloads
Learn more about Impala: cloudera.com/impala
Get Hadoop Training: university.cloudera.com
Ready to go? Check out Cloudera Quickstart: cloudera.com/quickstart