Distributed Analytics with Apache Spark and Couchbase
Jason Pohl (Databricks), Michael Nitschinger (Couchbase)
Who is Databricks?
WHY US
• Creators of Apache Spark. Contribute 75% of the code - 10x more than others
• Trained 20K Spark users
• Largest number of customers deploying Spark (300+)
OUR PRODUCT
• Just-in-Time Data Platform
• Empower your organization to swiftly build and deploy advanced analytics
Apache Spark: open source data processing engine built around speed, ease of use, and sophisticated analytics
Largest open source data project, with 1000+ contributors
UNIFIED ENGINE ACROSS DIVERSE WORKLOADS & ENVIRONMENTS
Scale out, fault tolerant
Python, Java, Scala, and R APIs
Standard libraries
APACHE SPARK ENGINE
ANALOGY: EVOLUTION OF CONSUMER ELECTRONICS
First Cellular Phones
Specialized Devices
Unified Device
HISTORY REPEATS: FASTER, EASIER TO USE, UNIFIED
First Distributed Processing Engine
Specialized Data Processing Engines
Unified Data Processing Engine
Google Trends: Hadoop vs. Spark
MAJOR FEATURES IN SPARK 2.0
Performance: Tungsten Phase 2, speedups of 5-20x
Structured Streaming
Engine
SQL 2003 & Machine Learning
Couchbase + Apache Spark: Storage + Processing
Recommendations, Next gen data warehousing, Predictive analytics, Fraud detection
Catalog, Customer 360 + IoT, Personalization, Mobile applications
Couchbase + Apache Spark: Operations + Analysis
Recommendations, Next gen data warehousing, Predictive analytics, Fraud detection
Catalog, Customer 360 + IoT, Personalization, Mobile applications
COUCHBASE SPARK CONNECTOR 2.0
Spark 2.0 Support
• Structured Streaming
Efficiency
• Improved DCP (Database Change Protocol) handling, memory allocation creates less garbage
Easier Management
• Tolerates Couchbase cluster topology changes (e.g. add nodes & rebalance)
• … except rollbacks
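To make the connector slide more concrete, here is a minimal sketch of how a Spark 2.0 session might be pointed at Couchbase from Python. It is not taken from the talk. The setting names `spark.couchbase.nodes` and `spark.couchbase.bucket.<name>`, the data source name `com.couchbase.spark.sql.DefaultSource`, and the `schemaFilter` option are assumptions based on the 2.x connector documentation and should be checked against the connector version in use. The helper names, node address, and bucket are hypothetical.

```python
def couchbase_spark_conf(node, buckets):
    """Return Spark settings for the Couchbase connector as a plain dict.

    Assumption: the 2.x connector reads `spark.couchbase.nodes` for the
    bootstrap node and `spark.couchbase.bucket.<name>` for each bucket,
    with the bucket password as the value.
    """
    conf = {"spark.couchbase.nodes": node}
    for name, password in buckets.items():
        conf["spark.couchbase.bucket." + name] = password
    return conf


def build_session(conf, app_name="couchbase-analytics"):
    """Create (or reuse) a SparkSession carrying the given settings."""
    # Imported lazily so the config helper above works without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def read_documents(spark, doc_type):
    """Load documents whose `type` field equals doc_type as a DataFrame.

    Assumption: the connector JAR is on the classpath and exposes the
    data source and `schemaFilter` option named below.
    """
    return (
        spark.read.format("com.couchbase.spark.sql.DefaultSource")
        .option("schemaFilter", 'type="{}"'.format(doc_type))
        .load()
    )
```

With a reachable cluster, `read_documents(build_session(couchbase_spark_conf("127.0.0.1", {"travel-sample": ""})), "airline")` would be expected to return a DataFrame that can then be queried with Spark SQL. The address, bucket, and document type are placeholders.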
Demo
HADOOP / DATA LAKES
DATABRICKS JUST-IN-TIME DATA PLATFORM
Build a PoC on Databricks today. Professional services and training also available.
Contact [email protected]
Sign up for a trial at https://databricks.com/try-databricks