H2O World - H2O Rains with Databricks Cloud

Date post: 08-Jan-2017
Upload: srisatish-ambati
H2O Rains With Databricks Cloud for Spark. Michal Malohlava (H2O), Richard Garris (Databricks)
Transcript
Page 1: H2O World - H2O Rains with Databricks Cloud

H2O Rains With Databricks Cloud for Spark

Michal Malohlava, H2O
Richard Garris, Databricks

Page 2: H2O World - H2O Rains with Databricks Cloud

Spark

• Open-source distributed execution platform
• User-friendly API for data transformation based on RDDs, DataFrames and Datasets
• Platform components: Spark SQL, MLlib, Streaming, GraphX
• Multitenancy
• Large and active community

Page 3: H2O World - H2O Rains with Databricks Cloud

Databricks

• Databricks
  o founded by the creators of Apache Spark
  o still contributes 75% of the code to the Spark project
  o cloud platform for running Spark in your AWS account
• Databricks Platform
  o integrated collaborative data science workspace
  o notebook interface inspired by IPython and Zeppelin but purpose-built for Spark
  o self-service cluster manager and job scheduler for production Spark workloads

Page 4: H2O World - H2O Rains with Databricks Cloud

Can I run H2O on top of Databricks Cloud?

Page 5: H2O World - H2O Rains with Databricks Cloud

Yes, you can!

Page 6: H2O World - H2O Rains with Databricks Cloud

Sparkling Water

Provides
• Transparent integration of H2O with the Spark ecosystem
• Transparent use of H2O data structures and algorithms with the Spark API
• Platform for building smarter applications

Excels in existing Spark workflows requiring advanced machine learning algorithms

Page 7: H2O World - H2O Rains with Databricks Cloud

Databricks with H2O

[Architecture diagram: a Databricks cluster with one driver EC2 node and three worker EC2 nodes. Labels: "Scala/Py main program"; on each worker EC2 node, a "Spark executor" and two "worker" boxes.]

Page 8: H2O World - H2O Rains with Databricks Cloud

Let's play with it!

Page 9: H2O World - H2O Rains with Databricks Cloud

What do you need?

• Databricks account (14 day free trial at www.databricks.com)

• Your AWS account

• Sparkling Water jar

• And some cool machine learning idea!

Page 10: H2O World - H2O Rains with Databricks Cloud

OR

Detect spam text messages

Page 11: H2O World - H2O Rains with Databricks Cloud

Data sample

Page 12: H2O World - H2O Rains with Databricks Cloud

Machine Learning Workflow

1. Extract data
2. Transform, tokenize messages
3. Build TF-IDF model
4. Create and evaluate Deep Learning model
5. Use the model to detect spam

Goal: For a given text message, identify whether it is spam or not
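
Steps 2 and 3 of the workflow (tokenizing messages and building TF-IDF weights) can be sketched in plain Python. This is only an illustration of the idea, not the code from the demo, which would run on Spark and H2O. The sample messages, the function names `tokenize` and `tf_idf`, and the smoothed IDF formula are all assumptions made for this sketch; the slides do not say which variant was used.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lower-case a message and split it into word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / number of tokens in the document
    idf = log((N + 1) / (df + 1)), a smoothed variant (assumed here)
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty messages
        weights.append({
            term: (count / total) * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        })
    return weights


# Made-up messages, for illustration only.
messages = [
    "Free entry! Win a prize now",
    "Are we still meeting for lunch",
    "Win cash now, free prize inside",
]
tokenized = [tokenize(m) for m in messages]
vectors = tf_idf(tokenized)
```

Terms that occur in only one message (such as "lunch") receive a higher weight than terms shared across messages (such as "free"). Weights like these would then serve as the input features for the model in step 4.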

Page 13: H2O World - H2O Rains with Databricks Cloud

Databricks setup step-by-step

• Create a cluster
• Set up a library
  o Maven coordinate ai.h2o:sparkling-water-core_2.10:1.5.6
  o Attach library to cluster
• Load data
  o upload and create a table
• Create a new notebook
• Expose driver's URL
  o assign elastic IP or create a proxy
• Start coding
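
The Maven coordinate in the library step follows the usual `groupId:artifactId:version` layout, and the `_2.10` suffix on the artifact name is the Scala binary version the jar was built for. A minimal sketch that takes the coordinate apart is shown here; the helper names `parse_coordinate` and `scala_suffix` are hypothetical and are not part of any Databricks or Sparkling Water API.

```python
from typing import NamedTuple, Optional


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(coordinate: str) -> MavenCoordinate:
    """Split a 'groupId:artifactId:version' string into its three parts."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    return MavenCoordinate(*parts)


def scala_suffix(artifact_id: str) -> Optional[str]:
    """Return the text after the last underscore (the Scala version), if any."""
    _name, sep, suffix = artifact_id.rpartition("_")
    return suffix if sep else None


coord = parse_coordinate("ai.h2o:sparkling-water-core_2.10:1.5.6")
print(coord.group_id)                   # ai.h2o
print(coord.artifact_id)                # sparkling-water-core_2.10
print(coord.version)                    # 1.5.6
print(scala_suffix(coord.artifact_id))  # 2.10
```

Reading the coordinate this way makes it easier to check that the library matches the cluster: the Scala suffix should agree with the Scala version of the cluster's Spark build.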

Page 14: H2O World - H2O Rains with Databricks Cloud

Live demo!

Page 15: H2O World - H2O Rains with Databricks Cloud

Learn more at h2o.ai
Follow us at @h2oai

Thank you!
