© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
We’ll get started soon…
Customer Analytics and Risk Management in Financial Services
We do Hadoop.
Your speakers…
Mark Lochbihler, Partner Solutions Engineer Hortonworks @MarkLochbihler
Bob Welshmer, Technical Director, Strategic Accounts Platfora @BDubya22
Our Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• We are leaders in Hadoop community
• 500+ employees
Customer Momentum
• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from F1000
Hortonworks and Hadoop at Scale
• HDP in production on largest clusters on planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo!, 800 nodes at Spotify
The Forrester Wave™ Big Data Hadoop Solutions Q1 2014
“Hortonworks loves and lives open source innovation”
World Class Support and Services: Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR
A Leader in Hadoop
Hortonworks Approach
1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines
2. Extend Hadoop as an Enterprise Data Platform
• Extend Hadoop with enterprise capabilities for governance, security & operations
• Apply enterprise software rigor to the open source development process
3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as a standard component of a modern data architecture
• Joint engineering
[Diagram: HDP 2.2]
Governance & Integration | Security | Operations
Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark)
Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System)
Enabling a Modern Data Architecture with HDP and Apache Hadoop
Hortonworks. We do Hadoop.
Traditional systems under pressure
• Silos of Data
• Costly to Scale
• Constrained Schemas
…and difficult to manage new data
[Diagram]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP
SOURCES: Existing Sources (CRM, ERP,…)
New Data Types: Clickstream; Geolocation; Sentiment, Web Data; Sensor, Machine Data; Unstructured docs, emails; Server logs
HDP2 and YARN enable the Modern Data Architecture
Hortonworks architected and led development of YARN
Common data set, multiple applications
• Optionally land all data in a single cluster
• Batch, interactive & real-time use cases
• Support multi-tenant access, processing & segmentation of data
YARN: Architectural center of Hadoop
• Consistent security, governance & operations
• Ecosystem applications certified by Hortonworks to run natively in Hadoop
[Diagram]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP; YARN: Data Operating System (nodes 1 … N; Interactive, Real-Time, Batch); HDFS (Hadoop Distributed File System)
SOURCES: Existing Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
1. Unlock New Applications from New Types of Data
Columns: INDUSTRY | USE CASE | Sentiment & Web | Clickstream & Behavior | Machine & Sensor | Geographic | Server Logs | Structured & Unstructured
Financial Services New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail 360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Manufacturing Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔
Healthcare Use Genomic Data in Medical Trials ✔ ✔ ✔
Monitor Patient Vitals in Real-Time ✔ ✔
Pharmaceuticals Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔
Oil & Gas Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔
Government ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔
…to shift from reactive to proactive interactions
HDP and Hadoop allow organizations to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision)
• A shift in Advertising: From mass branding …to 1x1 Targeting
• A shift in Financial Services: From Educated Investing …to Automated Algorithms
• A shift in Healthcare: From mass treatment …to Designer Medicine
• A shift in Retail: From static branding …to Real-time Personalization
• A shift in Telco: From break then fix …to repair before break
2. Or to realize a dramatic cost savings…
EDW Optimization
Current Reality (EDW workload: Operations 50%, Analytics 20%, ETL Process 30%)
• EDW at capacity: some usage from low value workloads
• Older data archived, unavailable for ongoing exploration
• Source data often discarded
Augment w/ Hadoop (EDW workload: Operations 50%, Analytics 50%)
• Free up EDW resources from low value tasks
• Keep 100% of source data and historical data for ongoing exploration
• Mine data for value after loading it because of schema-on-read
Hadoop: Parse, Cleanse, Apply Structure, Transform
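To make the schema-on-read point concrete, here is a minimal sketch in plain Python rather than Hadoop itself. The sample records, the `read_with_schema` helper, and the field names are all hypothetical and chosen only for illustration. Raw records are kept untouched, and structure is applied only when a question is asked.

```python
import json

# Raw records are stored exactly as they arrived; nothing is enforced on write.
RAW_EVENTS = [
    '{"user": "u1", "amount": "120.50", "ts": "2014-03-01"}',
    '{"user": "u2", "amount": "n/a", "ts": "2014-03-01"}',
    '{"user": "u1", "amount": "79.50", "ts": "2014-03-02", "channel": "web"}',
]


def read_with_schema(raw_lines, schema):
    """Parse, cleanse, and apply structure at read time.

    `schema` maps a field name to a converter. Records that are missing a
    field or fail conversion are skipped instead of being rejected at load.
    """
    for line in raw_lines:
        record = json.loads(line)  # parse
        try:
            # apply structure: keep only the requested fields, typed
            yield {name: convert(record[name]) for name, convert in schema.items()}
        except (KeyError, ValueError):
            continue  # cleanse: drop rows that do not fit this schema


def total_by_user(rows):
    """Transform: aggregate the structured rows into spend per user."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


spend_schema = {"user": str, "amount": float}
totals = total_by_user(read_with_schema(RAW_EVENTS, spend_schema))
print(totals)
```

Because the raw lines are never rewritten, a different schema (for example `{"user": str, "channel": str}`) can be applied later to the same data without reloading it.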
Commodity Compute & Storage: Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure
[Chart: Fully-loaded Cost Per Raw TB of Data (Min–Max Cost); axis from $0 to $180,000; categories: MPP, SAN, Engineered System, NAS, Hadoop, Cloud Storage]
Storage Costs/Compute Costs from $19/GB to $0.23/GB
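The two per-GB figures above imply a large ratio. The short calculation below works it out. The 100 TB volume and the 1,000 GB-per-TB conversion are assumptions added purely for illustration and do not come from the slide.

```python
# Per-GB figures quoted on the slide.
cost_before_per_gb = 19.00
cost_after_per_gb = 0.23

# How many times cheaper, and the percentage reduction.
ratio = cost_before_per_gb / cost_after_per_gb
reduction_pct = (1 - cost_after_per_gb / cost_before_per_gb) * 100

# Hypothetical data volume, assumed only to show the effect at scale.
volume_tb = 100
gb_per_tb = 1000  # decimal terabytes (assumption)
volume_gb = volume_tb * gb_per_tb

total_before = cost_before_per_gb * volume_gb
total_after = cost_after_per_gb * volume_gb

print(f"ratio: about {ratio:.1f}x")
print(f"reduction: about {reduction_pct:.1f}%")
print(f"{volume_tb} TB before: ${total_before:,.0f}, after: ${total_after:,.0f}")
```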
3. Data Lake: An architectural shift
Unlocking the Data Lake
Data Lake Enabled by YARN
• Single data repository, shared infrastructure
• Multiple biz apps accessing all the data
• Enable a shift from reactive to proactive interactions
• Gain new insight across the entire enterprise
[Diagram: axes SCALE and SCOPE; RDBMS, MPP, EDW; New Analytic Apps or IT Optimization; HDP 2.2 (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)]
Banking Data Lake for 100s of Use Cases
Problem: Architecture unsuited to capitalize on server log data
• Huge investments company generates valuable data assets
• Current EDW solutions are appropriate for some data workloads but too expensive for others
• Financial log data is difficult to aggregate & analyze at scale
• Short retention hampers price history & performance analysis
• Limited visibility into cost of acquiring customers
Solution: Multi-tenant Hadoop cluster to merge data across groups
• Server log data merged with structured data to uncover trends across traders
• ETL offload saves money for Hadoop-appropriate workloads
• Longer data retention enables price history analysis
• Joining data sets for insight into customer acquisition costs
• Accumulo enforces read permissions on individual data cells
Investment Services: Global investments company
• > $1.5 trillion assets under management
• > $14 billion in revenue
• ~ 50K employees
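The Solution's last point, read permissions on individual data cells, can be pictured with a toy model. This sketch is not the Accumulo API: the `Cell` class, the `scan` function, and the labels are hypothetical. It also simplifies the idea to "the reader must hold every label on the cell", whereas Accumulo's visibility expressions also allow OR combinations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required_labels: frozenset  # reader must hold ALL of these labels


def scan(cells, reader_labels):
    """Return only the cells this reader is authorized to see."""
    held = frozenset(reader_labels)
    return [cell for cell in cells if cell.required_labels <= held]


# One row, three cells, each with its own visibility requirement.
TABLE = [
    Cell("trade-001", "symbol", "XYZ", frozenset()),
    Cell("trade-001", "trader_id", "t-17", frozenset({"compliance"})),
    Cell("trade-001", "pnl", "-1200", frozenset({"compliance", "risk"})),
]

analyst_view = scan(TABLE, {"risk"})
auditor_view = scan(TABLE, {"compliance", "risk"})

print([cell.column for cell in analyst_view])
print([cell.column for cell in auditor_view])
```

Two readers scanning the same row get different columns back, which is what lets several groups share one multi-tenant cluster.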
HDP delivers a comprehensive data management platform
[Diagram: Hortonworks Data Platform 2.2]
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS
• Script: Pig (Tez)
• SQL: Hive (Tez)
• Java, Scala: Cascading (Tez)
• Stream: Storm
• Search: Solr
• NoSQL: HBase, Accumulo (Slider)
• In-Memory: Spark
• Others: ISV Engines
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
Deployment Choice: Linux, Windows, On-Premises, Cloud
YARN is the architectural center of HDP
Enables batch, interactive and real-time workloads
Provides comprehensive enterprise capabilities
The widest range of deployment options
Delivered Completely in the OPEN
Introducing Platfora
MISSION
Lead the industry transition from business intelligence to big data analytics.
#1 Big Data Analytics platform native on Hadoop
• End-to-end platform built for Multi-Structured Data
• Self-service, iterative, interactive, and fast
WORLD CLASS CUSTOMERS
Proven Leader in Big Data Analytics for Hadoop
PROVEN COMPANY TO WATCH
• Ones to Watch in Big Data
• 10 Hot Hadoop Startups to Watch (April 2014)
• CRN 10 Coolest Big Data Products of 2013
RAISED $65M BY LEADING INVESTORS
MOMENTUM OVER THE PAST 12 MONTHS
• Launched 9 product versions with feature innovations
• Grew customers by 4x and employees by 2x
The Way We Access and Interact with Data has Changed
Data Warehousing: 1980s Technology that Still Exists Today
Timeline: 1. 1980s: DATA WAREHOUSING 2. EARLY 2000s: HADOOP 3. PLATFORA’S DISRUPTION
Pipeline (3-6 Months)
• RAW DATA: Customer Interactions, Machine Data, Transactions
• ETL. Personnel Needed: ETL Programmer
• 90% of copy trashed
• Data Warehouse (Expensive $$$). Personnel Needed: Data Warehouse Architect & Admin
• BI Tool. Personnel Needed: BI Architect & Admin
• Business Analyst
Outcome: Business Analyst gets the data after 3-6 months with little control. Rinse and repeat.
The Way We Access and Interact with Data has Changed
Hadoop: Early 2000s New Data Storage Technology Arrives
Pipeline (3-6 Months + +)
• RAW DATA
• HADOOP: MapReduce, Hive, Pig. Personnel Needed: Hadoop Expert, Data Scientist
• ETL, then SQL on Hadoop or Data Warehouse. Personnel Needed: Data Warehouse Architect & Admin
• BI Tool. Personnel Needed: BI Architect & Admin
• Business Analyst
Outcome: Business Analyst gets the data after 3-6 months with little control. Rinse and repeat.
The Fastest Way to go from Raw Data to Analytics
Platfora is Leading the Transition from Business Intelligence to Big Data Analytics
Pipeline (Minutes)
• RAW DATA
• HADOOP
• Business Analyst iterates & repeats
No Additional Personnel Needed: Easily accessible by Data Admins & Data Scientists
Outcome: Business Analyst gets the data in minutes, collaborates with team, and asks new questions quickly.
The Only Platform that Offers an End-to-End Architecture
• Business Analyst
• Interactive Big Data Analytics
• Data Preparation
• HDFS & Other Data Sources
• RAW DATA & DATA CONNECTORS: Transactions, Customer Interactions, Machine
Across all layers: Security, Data Catalog, API/Extensions
A Truly Scalable Platform
HADOOP: Raw data at PB+ scale
MapReduce/Spark
PLATFORA: Lens, accessible data at TB+ scale
Platfora natively leverages the deep processing, scalability, and limitless data storage of Hadoop and combines it with a scale-out in-memory data processing engine to make access extremely fast across infinite nodes.
Unlocking Big Data Analytics Solutions
Security Analytics
• Data Exfiltration Monitoring
• Advanced Threat Analysis
• Patch and Version Coverage
Customer Analytics
• Omni-channel Conversion Analysis
• Audience Segmentation
• Behavior Analysis
Internet of Things
• Consumer Devices and Monitoring
• Utility Usage Monitoring
• Product Telemetry
Open Solution Platform
• No limit to new use cases / verticals
• Highly leveraged platform services
• Partners can add custom IP
Detecting Advanced Persistent Cybersecurity Attacks
Multiple Large Financial Organizations
“Platfora was built from the ground up for Hadoop. Other players have been around longer and they are all trying to shoehorn themselves into the Hadoop infrastructure. And being architected for Hadoop is very different from creating a Hadoop connector. That’s a big differentiator for Platfora.”
Chief Information Security Officer
The Business Challenge
• MicroStrategy and other traditional BI tools could not handle the volume of data that this financial org was ingesting
• Needed to be able to respond dynamically and instantly to any risk of malicious in-network activity
The Solution
• Identified malicious in-network activity that had stayed under the radar of traditional security solutions
• Combined internal, netflow, firewall, IDS, clickstream, and behavioral datasets for a wider perspective
• As a result, security analysts could see patterns of exfiltration and infiltration, and iterate to details without IT’s help
Platfora in Financial Services
• Retail
  • 360 degree view of customer
  • Marketing: identify relevant customers for marketing campaigns, cross/up-sells
  • Digital Marketing: consolidation of channel analysis
• Investment Banking
  • Identify common customer investment/product behaviors and build strategies to leverage insights
  • Identify/track trends in stock performance
• Risk & Fraud
  • Comprehensive view of enterprise risk profile: Financial Risk (Credit, Market, Liquidity), Operational Risk (Internal Audit, Vendor, Systems, Human Capital)
  • Potential Fraud identification
• IT Management & Cyber Security
  • Track IT events over time by user directly from logs
  • Analyze threat detection data for anomalies or outliers
  • Malicious email identification and threat resolution
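As a rough illustration of the last group of use cases (tracking events by user directly from logs, then looking for outliers), the sketch below counts events per user and flags anyone far above the median. It is plain Python, not Platfora or Hadoop code. The log format, the user names, and the threshold factor `k` are all assumptions made for the example.

```python
from collections import Counter
from statistics import median


def events_per_user(log_lines):
    """Count events per user from lines shaped like '<timestamp> <user> <event>'."""
    return Counter(line.split()[1] for line in log_lines if line.strip())


def flag_outliers(counts, k=5.0):
    """Flag users whose count exceeds median + k * MAD (median absolute deviation)."""
    values = list(counts.values())
    if not values:
        return []
    center = median(values)
    mad = median([abs(v - center) for v in values])
    threshold = center + k * (mad or 1.0)  # fall back to 1.0 when MAD is zero
    return sorted(user for user, n in counts.items() if n > threshold)


# Hypothetical log: three users with a handful of failed logins, one with forty.
activity = {"u1": 3, "u2": 4, "u3": 3, "u4": 40}
LOG_LINES = [
    f"2014-06-01T00:00:{i:02d} {user} login_failed"
    for user, n in activity.items()
    for i in range(n)
]

suspects = flag_outliers(events_per_user(LOG_LINES))
print(suspects)
```

A median-based threshold is used here because a single very active user would distort a mean and standard deviation; other detection methods would be equally reasonable.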
DEMO
Next Steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Platfora & Hortonworks http://hortonworks.com/partner/platfora/
Contact us: [email protected]