Date post: | 21-Feb-2017 |
Category: | Data & Analytics |
Upload: | spark-summit |
Virtualizing Analytics with Apache Spark
Arsalan Tavakoli-Shiraji
Spark Summit East 2017
Enterprise aspirations: More data, more intelligence
So what’s the formula for success?
3 pillars of any data-driven use case
• Data
• Analytics
• People
Data: Bigger, messier, more spread out
• Spread out into silos
• Varying types and structure
• Faster velocity
Analytics: More variety and complexity
• Multiple approaches
• Iterative discovery
• Difficult to productionize
People: Collaboration from start to finish
• Many roles involved
• Diverse skillsets and goals
• Inefficient hand-offs
Can we reuse existing technologies?
First Generation: The Data Warehouse
Reporting on small data
DATA • Only structured data; costly to scale
ANALYTICS • SQL only
PEOPLE • Targeted at BI
Second Generation: Hadoop + Data Lake
Capture data first, ETL later
DATA • Hard to centralize the data; limited value without ETL
ANALYTICS • Disparate and complex tools
PEOPLE • Limited to developers with big data expertise
VIRTUAL ANALYTICS
Decoupled compute and storage
Uniform data management and security model
Unified analytics engine
Enterprise-wide collaboration
The New Paradigm
DATA • Data Warehouses • Cloud Storage • Hadoop Storage • And many others…
PEOPLE • Data Science • Data Engineering • BI Analysts • And many others…
Is Spark the Answer?
Databricks + Apache Spark
• Managed Cloud Platform
• Integrated Workspace
• Production Workflow Automation
• Optimized Data Access Layer
• Databricks Enterprise Security
Case Study |
Video quality: Real-time anomaly detection
Viewer loyalty: Grow the Viacom audience
The Road Ahead