Spark Summit 2015 keynote: Making Big Data Simple with Spark

Making Big Data Simple with Spark

Ion Stoica and Ali Ghodsi June 15, 2015

More than 5,000 people trained over the past year

Alleviating Data Scientist Scarcity Challenge

“Intro to Big Data with Apache Spark” •  Anthony Joseph, UC Berkeley •  Started June 1st, over 64K registered students

“Scalable Machine Learning” •  Ameet Talwalkar, UCLA •  To start July 5th, over 26K registered students

Spark Core: Python, Java, Scala, R

Spark Streaming: real-time

Spark SQL: interactive

MLlib: machine learning

GraphX: graph

Fast • Expressive • General

Spark Significantly Simplifies Big Data Processing

Still need to set up and manage your own Spark cluster

Still more complex to operate than existing single-node tools (R, Python)

But Big Data Processing Remains Complex...

Databricks Truly Makes Big Data Simple: A hosted end-to-end platform from ingest to production

Cluster Manager

Jobs • Notebooks • Third-Party Apps • Dashboards

June 2014: Unveiling •  Over 3,500 sign-ups

November 2014: Limited Availability

Today •  Over 150 organizations using Databricks

Databricks: The Journey Thus Far

Better products: Update customers’ databases weekly instead of monthly

What can Databricks and Spark do for organizations?

Faster time to market: Create new products in 3 weeks rather than 2 months

Democratize data access within enterprises: Increase the number of data analysts by 4x and the number of data projects by 6x

General Availability starting today!

www.databricks.com

Ease of use: Increase user productivity

Key Areas of Focus

Integration with existing (small and big) data tools: Make non-Spark experts instantly productive

Security: Enable mission-critical applications

Cluster manager with multiple Spark versions

From notebooks to dashboards and jobs with just a few clicks

Launch and monitor jobs, including streaming

Ease of Use

Notebooks

Dashboards

Best-of-breed apps • Versioning • R Notebooks

Integration

Run in your own Amazon account

Access Control Lists

Security

Encryption at rest
