Spark Summit Europe 2016 Keynote - Databricks CEO

Post on 06-Jan-2017

Democratizing AI with Apache Spark

Ali Ghodsi, Co-Founder and CEO

AI is changing the world

Why now?

AlphaGo

Siri/assistants

Self-driving cars

Data is the catalyst

AI hasn’t been democratized

Better training, tuning, validation

More data

Clickstreams

Sensor data (IoT)

Video

Speech

Handwriting

The hardest part of AI isn’t AI

“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015

How do we democratize AI?

+ AI

FLEXIBLE FAST BIG DATA

Some gaps remain

1. Manage Data Infrastructure

• Create, configure, monitor resilient big data clusters.

• Securely access silos of disparate data sources.

• Enforce proper data governance.

2. Empower Teams to Be Productive

• Interactively explore data and prototype ideas.

• Securely share big data clusters among analysts.

• Debug, troubleshoot, version-control big data applications.

3. Establish Production-Ready Applications

• Set up robust ML data pipelines for ETL/ELT.

• Productionize real-time applications with high availability (HA) and fault tolerance (FT).

• Build, serve, maintain advanced machine learning models.

Databricks: Closing the gap

1. Just-in-Time Data Platform

• Separate compute & storage

• Integrate existing data stores

• Efficient cache on first access

Agile + Low TCO
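
The "efficient cache on first access" bullet can be illustrated with a minimal read-through cache. This is only a sketch of the general idea under stated assumptions, not a description of how Databricks implements it: the `RemoteStore` and `ReadThroughCache` classes, the `read` method, and the path name are all hypothetical stand-ins for a storage layer kept separate from compute.

```python
from typing import Dict


class RemoteStore:
    """Hypothetical stand-in for storage that lives apart from compute."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects
        self.reads = 0  # counts how often the slow remote path is taken

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self._objects[path]


class ReadThroughCache:
    """Fetches from the remote store on first access, then serves locally."""

    def __init__(self, store: RemoteStore) -> None:
        self._store = store
        self._local: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self._local:
            # First access: pull the object from remote storage and keep it.
            self._local[path] = self._store.read(path)
        # Later accesses are answered from the local copy.
        return self._local[path]


store = RemoteStore({"events/part-0": b"a,b,c"})
cache = ReadThroughCache(store)
first = cache.read("events/part-0")
second = cache.read("events/part-0")
```

Both reads return the same bytes, but only the first one touches the remote store, which is the property the slide bullet points at.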

2. Integrated Workspace

• Interactive notebooks, dashboards, reports

• Real-time exploration, machine learning, graph use cases

Accelerate Time to Value

3. Automated Spark Management

• Workflow scheduler for ML, streaming, SQL, ETL

• Performance-optimized, high availability, fault-tolerant

Performance

Enterprise AI use-cases

Predict credit score, credit limit, anomalies

Predict energy demand based on massive weather data

Natural language processing to extract author graph

Predict player churn, predict network outages

Predict machine equipment failure

New Frontier of AI: Deep Learning

Detect cancer

Understand speech

Infer location

Identify landmarks in photos

Recognize Mandarin and English

Improve cancer detection

Faster and easier deep learning with Databricks

GPUs

• TensorFlow: The most popular deep learning framework.

• TensorFrames: Makes TensorFlow computations faster and easier to program on Spark.

TensorFlow on

TensorFrames and GPU support out of the box

Massive parallelism
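
As a rough illustration of the idea of running a numeric computation block by block over partitioned data, the sketch below uses only the Python standard library. It does not use the TensorFrames, TensorFlow, or Spark APIs: the helper names `split_into_blocks`, `add_three`, and `map_blocks`, the block size, and the thread pool are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Block = List[float]


def split_into_blocks(values: List[float], block_size: int) -> List[Block]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def add_three(block: Block) -> Block:
    """Toy per-block computation standing in for a real numeric kernel."""
    return [x + 3.0 for x in block]


def map_blocks(fn: Callable[[Block], Block], blocks: List[Block]) -> List[Block]:
    """Apply fn to every block concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, blocks))


data = [float(x) for x in range(10)]
blocks = split_into_blocks(data, block_size=4)
result = [x for block in map_blocks(add_three, blocks) for x in block]
```

Because each block is processed independently, the same function could in principle be handed to many workers at once, which is the kind of parallelism the slide refers to.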

Deep Learning on Databricks

Data Ingest

Feature extraction

Model Training

Productionize

Clusters

Jobs & Workflows

TensorFrames + GPUs

Interactive exploration

Just-in-time data platform

Automated management
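
The four stages named on this slide (data ingest, feature extraction, model training, productionize) can be sketched as a chain of plain functions. This is a toy under stated assumptions, not the Databricks workflow: the function names, the made-up data, and the choice of a one-variable least-squares fit as the "model" are all hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]


def ingest() -> List[Sample]:
    """Data ingest: made-up (x, y) rows that follow y = 2x + 1."""
    return [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def extract_features(rows: List[Sample]) -> Tuple[List[float], List[float]]:
    """Feature extraction: split rows into inputs and targets."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return xs, ys


def train(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Model training: ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def productionize(model: Tuple[float, float]) -> Callable[[float], float]:
    """Productionize: wrap the fitted parameters in a callable predictor."""
    slope, intercept = model
    return lambda x: slope * x + intercept


predict = productionize(train(*extract_features(ingest())))
```

Calling `predict(4.0)` then applies the fitted line to a new input; in a real deployment each stage would typically be a scheduled job rather than an in-process function call.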

Thank you.