Post on 06-Aug-2015
Alleviating Data Scientist Scarcity Challenge
More than 5,000 people trained over past year
“Intro to Big Data with Apache Spark” • Anthony Joseph, UC Berkeley • Started June 1st, over 64K registered students
“Scalable Machine Learning” • Ameet Talwalkar, UCLA • To start July 5th, over 26K registered students
Spark Significantly Simplifies Big Data Processing
Fast • Expressive • General
Spark Core: Python, Java, Scala, R
Spark Streaming: real-time
Spark SQL: interactive
MLlib: machine learning
GraphX: graph
…
But Big Data Processing Remains Complex...
Still need to set up and manage your own Spark cluster
Still more complex to operate than existing single-node tools (R, Python)
Databricks Truly Makes Big Data Simple
A hosted end-to-end platform from ingest to production
Cluster Manager
Jobs Notebooks Third-Party Apps Dashboards
Databricks: The Journey Thus Far
June 2014: Unveiling • Over 3,500 sign-ups
November 2014: Limited Availability
Today • Over 150 organizations using Databricks
What can Databricks and Spark do for organizations?
Better products: Update customers’ databases weekly instead of monthly
Faster time to market: Create new products in 3 weeks rather than 2 months
Democratize data access within enterprises: Increase number of data analysts by 4x and number of data projects by 6x
Key Areas of Focus
1 Ease of use: Increase user productivity
2 Integration with existing (small and big) data tools: Make non-Spark experts instantly productive
3 Security: Enable mission-critical applications
Ease of Use
Cluster manager with multiple Spark versions
From notebooks to dashboards and jobs with just a few clicks
Launch and monitor jobs, including streaming
Notebooks
Dashboards
Jobs