Date posted: 12-Jul-2015 | Category: Data & Analytics | Uploaded by: tin-ho
Making the leap to BI on Hadoop
Predictive Analytics & Business Insights 2014
November 19, 2014
David P. Mariani
CEO
AtScale, Inc.
THE TRUTH ABOUT DATA
“We think only 3% of the potentially useful data is tagged, and even less is analyzed.”
Source: IDC Predictions 2013: Big Data, IDC
“90% of the data in the world today has been created in the last two years.”
Source: IBM
The Broken Promise
What We Wanted: Centralized Data Warehouse
What We Got: Data Marts
WHAT WE GOT
ETL + STAR SCHEMAS
Traditional Data Architecture
[Diagram: Input Data → ETL → Data Warehouse → Data Marts (×3) → Query Engine → Analysis Tools]
What’s Wrong with This Picture?
Highly complex
Lots of people & skillsets
Multiple copies of data
Stale data
Rigid schema
Tough to change
Write Many / Structured / Early Transformation
It Takes an Army
BI Engineer: Design Reports/Dashboards
ETL Engineer: Automate Cube Load
BI Engineer: Design Cube
DBA: Automate Data Load
ETL Engineer: Write ETL Code
DBA: Create Tables
Data Warehouse Architect: Design Star Schema
SAN/NAS Engineer: Define Storage Architecture
Star Schema = Unnatural!
WHAT WE WANTED
SCHEMA ON DEMAND
Data Management Approaches
Traditional Approach: [Diagram: Input Data → ETL → Data Warehouse → Data Marts (×3) → Query Engine → Analysis Tools]
New Approach: [Diagram: Input Data → Hadoop → Analysis Tools]
Time for a New Approach
Traditional: Write Many / Structured / Early Transformation
VS
New: Write Once / Semi-Structured / Late Transformation ✔ ✔ ✔
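“Late transformation” means the raw data is stored once, as it arrived, and a schema is applied only when a question is asked. The sketch below illustrates that idea in plain Python, assuming newline-delimited JSON events; the field names (`user`, `event`, `ts`, `amount`, `device`) and the helper `read_with_schema` are hypothetical and not part of any Hadoop API.

```python
import json

# Hypothetical raw events, landed exactly as they arrived ("write once").
# Nothing is validated or reshaped at load time; records differ in shape.
RAW_LINES = [
    '{"user": "a", "event": "login", "ts": 100}',
    '{"user": "b", "event": "purchase", "ts": 105, "amount": 9.99}',
    '{"user": "a", "event": "logout", "ts": 160, "device": "mobile"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time ("late transformation").

    `schema` maps column name -> cast function. Each requested column is
    cast if present in the record and set to None if it is absent.
    """
    for line in lines:
        record = json.loads(line)
        yield {col: cast(record[col]) if col in record else None
               for col, cast in schema.items()}


# Two different views over the same raw data, each defined only when a
# question is asked; neither required reloading or reshaping the data.
events = list(read_with_schema(RAW_LINES, {"user": str, "event": str, "ts": int}))
revenue = [row for row in read_with_schema(RAW_LINES, {"user": str, "amount": float})
           if row["amount"] is not None]
```

The third record carries a `device` field the others lack; under schema on read it simply loads, and a later query can start using it without any reload.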
Not This, That
Not this (traditional):
BI Engineer: Design Reports/Dashboards
ETL Engineer: Automate Cube Load
BI Engineer: Design Cube
DBA: Automate Data Load
ETL Engineer: Write ETL Code
DBA: Create Tables
Data Warehouse Architect: Design Star Schema
SAN/NAS Engineer: Define Storage Architecture
VS
That (Hadoop):
BI Engineer: Run Queries/Create Reports
Hadoop Engineer: Create EXTERNAL Tables
Hadoop Engineer: Define location to store files
Example: Key-Values
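The slide's example itself is not recoverable, so here is a minimal sketch of reading key-value data with schema on read, assuming a hypothetical log format of `key=value` pairs separated by semicolons. The helpers `parse_kv` and `column` are illustrative names, not a library API.

```python
def parse_kv(line, pair_sep=";", kv_sep="="):
    """Turn a raw line such as 'user=a;page=/home' into a dict of strings.

    Fragments without the key/value separator are skipped rather than
    rejected, so malformed or new-style lines still load.
    """
    record = {}
    for pair in line.strip().split(pair_sep):
        if kv_sep in pair:
            key, value = pair.split(kv_sep, 1)
            record[key.strip()] = value.strip()
    return record


def column(lines, key, cast=str):
    """Project one key as a typed column; None where a line lacks the key."""
    values = []
    for line in lines:
        record = parse_kv(line)
        values.append(cast(record[key]) if key in record else None)
    return values


# Hypothetical raw log lines; the second carries an extra key the others lack.
RAW = [
    "user=a;page=/home;ms=120",
    "user=b;page=/cart;ms=340;coupon=SPRING",
    "user=a;page=/cart;ms=95",
]

latencies = column(RAW, "ms", int)
coupons = column(RAW, "coupon")
```

Types are assigned only when a column is projected (`ms` becomes an `int` at query time), which is the key-value analogue of applying the schema on read.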
Example: JSON
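For nested JSON, schema on read usually means addressing fields by path at query time instead of flattening them into tables at load time. The sketch below assumes a hypothetical match record and a hypothetical `get_path` helper that accepts dotted paths; it mimics the idea, not any specific engine's JSON functions.

```python
import json


def get_path(obj, path):
    """Follow a dotted path such as 'player.stats.gold' through nested data.

    List elements are addressed by integer index (e.g. 'player.items.0').
    Returns None when any step of the path is missing.
    """
    for part in path.split("."):
        if isinstance(obj, list) and part.isdigit() and int(part) < len(obj):
            obj = obj[int(part)]
        elif isinstance(obj, dict) and part in obj:
            obj = obj[part]
        else:
            return None
    return obj


# Hypothetical raw record, stored exactly as received.
raw = ('{"match_id": 1, "player": {"hero": "Axe", '
       '"stats": {"gold": 2150}, "items": ["blink", "boots"]}}')
record = json.loads(raw)
```

Usage: `get_path(record, "player.stats.gold")` yields the nested gold value, and a path to a field that does not exist yields `None` instead of failing the whole query.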
DEMO: MOBA Game Analytics
Demo: DOTA 2 – What the User Sees
Key Data Points: 5 vs. 5 players per match. Players choose ‘Heroes’, use ‘Items’ & earn ‘Gold’.
FOR THE DATA SCIENTISTS!
Demo: DOTA 2 – Raw Data (JSON)
Match Details | Player Details | Player Profile
As Easy as 1, 2, 3
1. Hadoop Engineer: Define location to store files
2. Hadoop Engineer: Create EXTERNAL Tables
3. BI Engineer: Run Queries/Create Reports
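The three steps can be sketched in plain Python to show why an external table is cheap: it is only metadata (a location plus column definitions), and queries scan whatever files sit in that location. This loosely mirrors the idea behind Hive's `CREATE EXTERNAL TABLE ... LOCATION`; the temp directory standing in for an HDFS path, the file names, and the `scan` helper are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Step 1: define a location where raw files land.
# A temporary directory stands in for an HDFS path (assumption).
location = Path(tempfile.mkdtemp()) / "matches"
location.mkdir()
(location / "part-0000.json").write_text(
    '{"match_id": 1, "radiant_win": true}\n'
    '{"match_id": 2, "radiant_win": false}\n')

# Step 2: the "external table" is only metadata: a location plus columns.
# Defining or dropping it never copies or deletes the underlying files.
external_table = {"location": location,
                  "columns": {"match_id": int, "radiant_win": bool}}


# Step 3: a query scans whatever files are in the location at query time.
def scan(table):
    for path in sorted(table["location"].glob("*.json")):
        for line in path.read_text().splitlines():
            if line.strip():
                rec = json.loads(line)
                yield {c: (t(rec[c]) if c in rec else None)
                       for c, t in table["columns"].items()}


radiant_wins = sum(1 for row in scan(external_table) if row["radiant_win"])

# New data landed later is visible immediately, with no load step.
(location / "part-0001.json").write_text(
    '{"match_id": 3, "radiant_win": true, "duration": 2400}\n')
rows_after = list(scan(external_table))
```

The extra `duration` field in the newer file is ignored by this table definition until someone chooses to add it as a column.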
Demo: DOTA 2 – Use Case 1
Question: Who are the most popular heroes?
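A minimal sketch of this question, assuming hypothetical simplified match records (real matches have five players per side; these are truncated to two players for brevity). Popularity is taken here to mean pick count, which is one reasonable reading of the question, not necessarily the one used in the demo.

```python
from collections import Counter

# Hypothetical, simplified match records (field names are assumptions).
MATCHES = [
    {"match_id": 1, "radiant_win": True,
     "players": [{"hero": "Axe", "team": "radiant"},
                 {"hero": "Pudge", "team": "dire"}]},
    {"match_id": 2, "radiant_win": False,
     "players": [{"hero": "Pudge", "team": "radiant"},
                 {"hero": "Lina", "team": "dire"}]},
    {"match_id": 3, "radiant_win": True,
     "players": [{"hero": "Pudge", "team": "radiant"},
                 {"hero": "Axe", "team": "dire"}]},
]


def hero_popularity(matches):
    """Count how many times each hero was picked across all matches."""
    return Counter(p["hero"] for m in matches for p in m["players"])


top = hero_popularity(MATCHES).most_common(2)
```

Because each match record already embeds its players, the count reads the nested array directly; no join between a match table and a player table is involved.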
Demo: DOTA 2 – Use Case 2
Question: Which heroes have the highest win rate?
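A sketch of the win-rate question over the same kind of hypothetical records. It assumes each match stores a `radiant_win` flag and each player a `team` of `"radiant"` or `"dire"`; a `min_games` threshold (an illustrative parameter) guards against heroes with tiny samples topping the list.

```python
from collections import defaultdict

# Hypothetical, simplified match records (field names are assumptions).
MATCHES = [
    {"radiant_win": True, "players": [{"hero": "Axe", "team": "radiant"},
                                      {"hero": "Pudge", "team": "dire"}]},
    {"radiant_win": False, "players": [{"hero": "Pudge", "team": "radiant"},
                                       {"hero": "Lina", "team": "dire"}]},
    {"radiant_win": True, "players": [{"hero": "Pudge", "team": "radiant"},
                                      {"hero": "Axe", "team": "dire"}]},
]


def won(player, match):
    """True if the player's side won (radiant wins when radiant_win is True)."""
    return (player["team"] == "radiant") == match["radiant_win"]


def hero_win_rates(matches, min_games=1):
    """Return (hero, win rate) pairs, highest first, for heroes with enough games."""
    games, wins = defaultdict(int), defaultdict(int)
    for m in matches:
        for p in m["players"]:
            games[p["hero"]] += 1
            wins[p["hero"]] += won(p, m)  # bool counts as 0 or 1
    rates = {h: wins[h] / games[h] for h in games if games[h] >= min_games}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

With `min_games=1`, a hero picked once and winning once tops the list; raising the threshold filters such single-game results out.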
Demo: DOTA 2 – Use Case 3
Question: What are the top 3 items associated with the best win rate?
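One way to read “items associated with the best win rate” is: for each item, the win rate of players who carried it. The sketch below takes that reading over hypothetical records; item names and fields are assumptions, and the result shows association, not that an item causes wins.

```python
from collections import defaultdict

# Hypothetical records: each player lists the items they finished with.
MATCHES = [
    {"radiant_win": True, "players": [
        {"team": "radiant", "items": ["blink", "boots"]},
        {"team": "dire", "items": ["boots", "wand"]}]},
    {"radiant_win": False, "players": [
        {"team": "radiant", "items": ["wand"]},
        {"team": "dire", "items": ["blink", "bkb"]}]},
    {"radiant_win": True, "players": [
        {"team": "radiant", "items": ["bkb", "boots"]},
        {"team": "dire", "items": ["blink"]}]},
]


def item_win_rates(matches, min_count=2):
    """Win rate of players carrying each item, best first."""
    carried, won = defaultdict(int), defaultdict(int)
    for match in matches:
        for player in match["players"]:
            player_won = (player["team"] == "radiant") == match["radiant_win"]
            for item in set(player["items"]):  # count each item once per player
                carried[item] += 1
                won[item] += player_won
    rates = {i: won[i] / carried[i] for i in carried if carried[i] >= min_count}
    # Highest win rate first; ties broken by sample size, then alphabetically.
    return sorted(rates.items(), key=lambda kv: (-kv[1], -carried[kv[0]], kv[0]))


top3 = item_win_rates(MATCHES)[:3]
```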
Practical Applications
Time Series Analysis (session data)
Affinity Analysis
Segmentation Analysis
Many-to-Many
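Affinity analysis over many-to-many data (players to items, shoppers to products) often starts with pair co-occurrence counts. This is a minimal market-basket-style sketch, assuming each basket is a hypothetical list of item names; it is one common technique for affinity analysis, not necessarily the one the demo used.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical baskets, e.g. the items one player carried in one match.
BASKETS = [
    ["blink", "boots", "bkb"],
    ["boots", "wand"],
    ["blink", "bkb"],
    ["bkb", "blink", "boots"],
]

affinity = pair_counts(BASKETS)
```

Each basket is processed on its own, so the counts can be built from nested records without a join table linking baskets to items.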
NO JOINS = HORIZONTAL SCALE
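The slogan rests on a simple property: when every record is self-contained, each partition can be aggregated independently and the partial results merged, so adding machines adds throughput. The sketch simulates partitions in a single process with hypothetical records; it illustrates the map/merge pattern, not Hadoop's actual API.

```python
from collections import Counter
from functools import reduce

# Each record already embeds the hero name, so no lookup or join against
# another table is needed while processing a partition.
RECORDS = [{"hero": h} for h in ["Axe", "Pudge", "Lina", "Pudge", "Axe", "Pudge"]]


def partition(records, n):
    """Split records round-robin into n partitions (stand-ins for nodes)."""
    return [records[i::n] for i in range(n)]


def local_count(part):
    """Work one node can do using only its own partition."""
    return Counter(r["hero"] for r in part)


parts = partition(RECORDS, 3)
partials = [local_count(p) for p in parts]  # could run on separate machines
total = reduce(lambda a, b: a + b, partials, Counter())
```

If each record instead held only a hero ID that had to be joined against a separate hero table, every partition would need access to that table, which is the coordination cost the slide is pointing at.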
FOR THE ORDINARY HUMAN!
DEMO
Summary: The Do’s & Don’ts
Do | Don’t
Capture data “as is” | Pre-aggregate data
Apply schema on read | Force schema on load
Land new data on Hadoop | Land new data on relational DBs
Create a data warehouse | Create data marts
Leverage open source engines | Invest in proprietary databases
Business Intelligence Redefined