Databricks: Exploring all the ways to analyze data with Spark – Couchbase Connect 2016

Post on 15-Feb-2017

Distributed Analytics with Apache Spark and Couchbase

Jason Pohl (Databricks)

Michael Nitschinger (Couchbase)

Who is Databricks?

WHY US

• Creators of Apache Spark. Contribute 75% of the code - 10x more than others

• Trained 20K Spark users

• Largest number of customers deploying Spark (300+)

OUR PRODUCT

• Just-in-Time Data Platform

• Empower your organization to swiftly build and deploy advanced analytics

open source data processing engine built around speed, ease of use, and sophisticated analytics

largest open source data project with 1000+ contributors

UNIFIED ENGINE ACROSS DIVERSE WORKLOADS & ENVIRONMENTS

Scale out, fault tolerant

Python, Java, Scala, and R APIs

Standard libraries

APACHE SPARK ENGINE

ANALOGY: EVOLUTION OF CONSUMER ELECTRONICS

First Cellular Phones

Specialized Devices

Unified Device

HISTORY REPEATS: FASTER, EASIER TO USE, UNIFIED

First Distributed Processing Engine

Specialized Data Processing Engines

Unified Data Processing Engine

Google Trends: Hadoop vs. Spark

MAJOR FEATURES IN SPARK 2.0

Performance: Tungsten Phase 2, speedups of 5-20x

Structured Streaming Engine

SQL 2003 & Machine Learning

Couchbase + Apache Spark: Storage and Processing

• Recommendations

• Next gen data warehousing

• Predictive analytics

• Fraud detection

• Catalog

• Customer 360 + IoT

• Personalization

• Mobile applications

Couchbase + Apache Spark: Operations and Analysis

• Recommendations

• Next gen data warehousing

• Predictive analytics

• Fraud detection

• Catalog

• Customer 360 + IoT

• Personalization

• Mobile applications

COUCHBASE SPARK CONNECTOR 2.0

Spark 2.0 Support

Structured Streaming

Efficiency

Improved DCP handling; memory allocation creates less garbage

Easier Management

Tolerates Couchbase cluster topology changes (e.g. adding nodes and rebalancing)

… except rollbacks

Demo

HADOOP / DATA LAKES

DATABRICKS JUST-IN-TIME DATA PLATFORM

Build a PoC on Databricks today. Professional services and training also available.

Contact sales@databricks.com or sign up for a trial at https://databricks.com/try-databricks