Post on 02-Jul-2015
© Talend 2014 1
Starting Small and Scaling Big with Hadoop
November 20, 2014
Your Speakers Today
Jim Walker, Director, Product Marketing
Julien Sauvage, Director, Product Marketing
Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
2013 Digital universe: 2.3 Zettabytes
2020 Digital universe: 40 Zettabytes & Hadoop Market $50B
85% of growth from new types of data, with machine-generated data increasing 15x
Analyst consensus estimates enterprise data growth of 50x year over year through 2020
1 Zettabyte (ZB) = 1 million Petabytes (PB); Sources: IDC and IDG Enterprise
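As a rough check on the two digital-universe figures above, the sketch below converts them to petabytes and derives the compound annual growth rate they imply. The 2013 and 2020 sizes and the ZB-to-PB factor come from the slide; the function name and the seven-year span are assumptions made for illustration.

```python
# Figures quoted on the slide.
ZB_2013 = 2.3          # digital universe in 2013, zettabytes
ZB_2020 = 40.0         # projected digital universe in 2020, zettabytes
PB_PER_ZB = 1_000_000  # 1 ZB = 1 million PB

def compound_annual_growth(start, end, years):
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1

years = 2020 - 2013  # assumed span between the two data points
rate = compound_annual_growth(ZB_2013, ZB_2020, years)

print(f"2013: {ZB_2013 * PB_PER_ZB:,.0f} PB")
print(f"2020: {ZB_2020 * PB_PER_ZB:,.0f} PB")
print(f"Implied compound growth: {rate:.1%} per year over {years} years")
```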
A shift from reactive to proactive interactions
HDP and Hadoop allow organizations to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):
• A shift in Advertising: from mass branding to 1x1 Targeting
• A shift in Financial Services: from Educated Investing to Automated Algorithms
• A shift in Healthcare: from mass treatment to Designer Medicine
• A shift in Retail: from static branding to Real-time Personalization
• A shift in Telco: from break then fix to repair before break
HDP Realized Cost Savings with EDW Optimization
Archive Data away from EDW
• Move cold or rarely used data to Hadoop
as active archive
• Store more data, longer
Offload costly ETL process
• Free your EDW to perform high-value
functions like analytics & operations, not ETL.
• Use Hadoop for advanced ETL
Optimize the value of your EDW
• Use Hadoop to refine new data sources, such
as web and machine data for new analytical
context
[Diagram: EDW optimization with HDP 2.2]
ANALYTICS: Data Marts; Business Analytics; Visualization & Dashboards
DATA SYSTEMS: Systems of Record (RDBMS, ERP, CRM, Other); Enterprise Data Warehouse (Hot); HDP 2.2 (Cold Data, Deeper Archive & New Sources); ELT
NEW SOURCES: Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
Hadoop Helps you optimize and reduce costs associated with your EDW
Realize dramatic savings for cost of storage
Cost Efficiencies
Reduce costs associated with
expensive archive systems
• Utilize existing relationships with
hardware vendors
• Open Source Software
Active Archive
Provide access to archived data rather than
letting it just collect dust
[Chart: Fully-loaded Cost Per Raw TB of Data (Min–Max Cost), axis from $0 to $180,000; categories: MPP, SAN, Engineered System, NAS, Hadoop, Cloud Storage]
Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure
Storage Costs/Compute Costs
from $19/GB to $0.23/GB
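To put the two per-GB figures in perspective, here is a minimal sketch of the arithmetic. Only the $19/GB and $0.23/GB values come from the slide; the 100 TB archive size, the 1,000 GB-per-TB conversion, and the function name are hypothetical, chosen for illustration.

```python
OLD_COST_PER_GB = 19.00  # from the slide
NEW_COST_PER_GB = 0.23   # from the slide

def cost_for(terabytes, cost_per_gb, gb_per_tb=1000):
    """Total cost for a data volume given in TB (decimal TB assumed)."""
    return terabytes * gb_per_tb * cost_per_gb

archive_tb = 100  # hypothetical archive size, not from the slide

old_total = cost_for(archive_tb, OLD_COST_PER_GB)
new_total = cost_for(archive_tb, NEW_COST_PER_GB)
reduction_factor = OLD_COST_PER_GB / NEW_COST_PER_GB
savings_pct = (1 - NEW_COST_PER_GB / OLD_COST_PER_GB) * 100

print(f"{archive_tb} TB at ${OLD_COST_PER_GB}/GB: ${old_total:,.0f}")
print(f"{archive_tb} TB at ${NEW_COST_PER_GB}/GB: ${new_total:,.0f}")
print(f"Roughly {reduction_factor:.0f}x cheaper ({savings_pct:.1f}% lower)")
```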
Unlock New Applications from New Types of Data
INDUSTRY · USE CASE · Data types: Sentiment & Web; Clickstream & Behavior; Machine & Sensor; Geographic; Server Logs; Structured & Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Manufacturing
Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔ ✔
Monitor Patient Vitals in Real-Time
Pharmaceuticals
Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔
Government
ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔
End Game: Data Lake - An architectural shift
[Chart axes: SCALE vs. SCOPE]
Unlocking the Data Lake
RDBMS
MPP
EDW
Data Lake Enabled by YARN
• Single data repository,
shared infrastructure
• Multiple biz apps
accessing all the data
• Enable a shift from
reactive to proactive
interactions
• Gain new insight across
the entire enterprise
New Analytic Apps
or IT Optimization
HDP 2.1: Governance & Integration; Security; Operations; Data Access; Data Management; YARN
Enterprise Goals for the Modern Data Architecture
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive and real time
• Central services for security, governance and operations
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain
[Diagram: Modern Data Architecture]
APPLICATIONS: Business Analytics; Custom Applications; Packaged Applications
DATA SYSTEM: RDBMS; EDW; MPP; YARN: Data Operating System (Interactive, Real-Time, Batch; nodes 1 to N); HDFS (Hadoop Distributed File System)
SOURCES: EXISTING Systems (CRM, ERP, Other); Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
HDP delivers a comprehensive data management platform
Hortonworks Data Platform 2.2
YARN: Data Operating System (Cluster Resource Management), nodes 1 to N
HDFS (Hadoop Distributed File System)
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig); SQL (Hive); Java, Scala (Cascading); NoSQL (HBase, Accumulo); Stream (Storm); Search (Solr); In-Memory (Spark); Others (ISV Engines); Tez and Slider layers
GOVERNANCE: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Deployment Choice: Linux; Windows; On-Premises; Cloud
YARN is the architectural
center of HDP
Enables batch, interactive
and real-time workloads
Provides comprehensive
enterprise capabilities
The widest range of
deployment options
Delivered Completely in the OPEN
HDP and Talend in the Modern Data Architecture
[Diagram layers: OPERATIONAL TOOLS; DEV & DATA TOOLS; INFRASTRUCTURE]
SOURCES: EXISTING Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
DATA SYSTEM: RDBMS; EDW; HDP 2.1 (Governance & Integration; Security; Operations; Data Access; Data Management; YARN)
APPLICATIONS: BusinessObjects BI
Other labels: Hadoop 2.0, YARN; Data Quality; Pig, Hive; ETL, ELT; HBase, NoSQL
Deep Partnerships
Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, HP, SAS & SAP
Broad Partnerships
Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users
Connecting the Data-Driven Enterprise
The Talend Platform
Still Hand-Coding Data Integration?
Hand-coding
• Unproductive
• Need specialized skills
• Hard to maintain
• Limited support
Talend Enterprise
• 800+ drag-n-drop components
• Generates optimized code
• Collaboration & management
• Gold support (SLAs)
Encumbered with Legacy ETL?
Legacy ETL
• Proprietary engine
• Hard to scale Big Data
• Expensive
Talend Enterprise
• Open
• Generates native code
• Low TCO
Future-Proof Architecture
• ETL: Day-to-day integration (JAVA)
• ELT: DW appliance (SQL)
• Hadoop: Highly Scalable (MapReduce)
• CAMEL: Message transformation (CAMEL)
• Next big thing
ONE cluster to deploy
ONE cluster to manage
ONE cluster to monitor
ONE cluster to scale
ONE cluster to update
ONE cluster to pay for!
And it will be 100x faster in 2 years
Infinite Scale
Unlock New Applications from New Types of Data
[Same industry use-case table as the earlier slide of this title, annotated: integration jobs + ++]
Simplify Real-Time Big Data
• New components for streaming data
• 100x performance increase
• < 1 sec response
• Address new use cases (last minute defense, dynamic pricing, real-time fraud detection, etc.)
The Talend Solution
Scalable
• Generates native code
• Future-proof
Agile
• Built-in data quality
• More productive
• Open source
• Innovative
Easy
• Open source platform
• Learn once
• Expand many times
Lowest TCO
• Subscription pricing
• Per developer
• Predictable cost
The ease of use of the Talend platform allows us to deliver
The Three Drivers of Success
Product Innovation: Big Data; Cloud
Market Adoption: Customers; Community; Partners
Industry Recognition: “Visionary”; “Leader”; Multi-award winner
Customer Case Study
Product Inventory and Pricing
The Old Way to Do Forecasting
Product category “HALLOWEEN”
Data Explosion in Size
Multiple SKUs (10,000’s) X Multiple stores (1,000’s)
Example products (Product 1, Product 2, Product 3): Halloween mask, Halloween candies, Pumpkin
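The size of the problem follows from multiplying the two dimensions on this slide. The sketch below assumes 10,000 SKUs and 1,000 stores as representative values for the "10,000’s" and "1,000’s" orders of magnitude; the daily granularity and the variable names are illustrative assumptions, not figures from the case study.

```python
sku_count = 10_000    # assumed, order of magnitude from the slide
store_count = 1_000   # assumed, order of magnitude from the slide
days_per_year = 365   # assumed daily forecasting granularity

# One forecast series per (SKU, store) pair, instead of one per category.
series_count = sku_count * store_count
daily_points_per_year = series_count * days_per_year

print(f"Category-level forecasting: 1 series (e.g. 'HALLOWEEN')")
print(f"SKU x store forecasting: {series_count:,} series")
print(f"Daily data points per year: {daily_points_per_year:,}")
```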
Need for a Modern Architecture
[Diagram labels: data at rest; data in motion; DAO; Cassandra; OLTP; Hadoop; EDW; BI; Viz & Analytics]
Graphical · Generates code · Runs on Hadoop
A New EDW “Eco” System
Enterprise
Intelligence &
Advanced Analytics
SSAS
Enterprise Data
Warehouse
Advanced Analytics Platform
or
Data
Refinery & Ingest Engine
Fast Data Cache
Talend + Hortonworks = Open = Awesome!
• Pure open source governed cluster
• Don’t need to recode or reformat data
• No vendor lock-in
• Subscription models
• Most recent releases of Apache projects
• We are always aligned and up to date
The Forrester Wave™
Big Data Hadoop Solutions
Q1 2014
“Hortonworks loves and lives
open source innovation”
World Class Support and Services.
Hortonworks' Customer Support received a
maximum score and was significantly higher than
both Cloudera and MapR
A Leader in Hadoop
Questions?
Jim Walker @jaymce
Julien Sauvage @sauvageju
Check Out Our Talend + Hortonworks Sandbox!
http://www.talend.com