Date posted: 15-Apr-2017
Uploaded by: clairvoyantllc

Page 1: Databricks Community Cloud

Databricks Community Cloud

By: Robert Sanders

Page 2: Databricks Community Cloud


Databricks Community Cloud

• Free/Paid Standalone Spark Cluster
• Online Notebook
  • Python
  • R
  • Scala
  • SQL
• Tutorials and Guides
• Shareable Notebooks

Page 3: Databricks Community Cloud


Why is it useful?

• Learning about Spark
• Testing different versions of Spark
• Rapid Prototyping
• Data Analysis
• Saved Code
• Others…

Page 4: Databricks Community Cloud


Forums
https://forums.databricks.com/

Page 5: Databricks Community Cloud


Login/Sign Up
https://community.cloud.databricks.com/login.html

Page 6: Databricks Community Cloud


Home Page

Page 7: Databricks Community Cloud


Active Clusters

Page 8: Databricks Community Cloud


Create a Cluster - Steps

1. From the Active Clusters page, click the “+ Create Cluster” button

2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click “Create Cluster”
5. Wait for the cluster to start up and reach the “Running” state

Page 9: Databricks Community Cloud


Create a Cluster

Page 10: Databricks Community Cloud


Active Clusters

Page 11: Databricks Community Cloud


Active Clusters – Spark Cluster UI - Master

Page 12: Databricks Community Cloud


Workspaces

Page 13: Databricks Community Cloud


Create a Notebook - Steps

1. Right-click within a Workspace and select Create -> Notebook
2. Fill in the name
3. Select the programming language
4. Select the running cluster you’ve created that you want to attach to the notebook
5. Click the “Create” button

Page 14: Databricks Community Cloud


Create a Notebook

Page 15: Databricks Community Cloud


Notebook

Page 16: Databricks Community Cloud


Using the Notebook

Page 17: Databricks Community Cloud


Using the Notebook – Code Snippets

> sc

> sc.parallelize(1 to 5).collect()
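As a sketch of what these two commands do: in a Databricks notebook, `sc` (the SparkContext) is predefined; `sc.parallelize(1 to 5)` builds a distributed RDD from a local range, and `.collect()` pulls the elements back to the driver as an `Array`. The plain Scala below mirrors the distributed computation locally:

```scala
// What collect() returns for sc.parallelize(1 to 5): Array(1, 2, 3, 4, 5).
val collected = (1 to 5).toArray

// An RDD map() applies a function per element, just like a local map().
val squares = (1 to 5).map(n => n * n)
```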

Page 18: Databricks Community Cloud


Using the Notebook - Shortcuts

Shortcut         Action
Shift + Enter    Run selected cell and move to next cell
Ctrl + Enter     Run selected cell
Option + Enter   Run selected cell and insert cell below
Ctrl + Alt + P   Create cell above current cell
Ctrl + Alt + N   Create cell below selected cell

Page 19: Databricks Community Cloud


Tables

Page 20: Databricks Community Cloud


Create a Table - Steps

1. From the Tables section, click “+ Create Table”
2. Select the Data Source (the steps below assume you’re using File as the Data Source)
3. Upload a file from your local file system
   1. Supported file types: CSV, JSON, Avro, Parquet
4. Click “Preview Table”
5. Fill in the Table Name
6. Select the File Type and other options depending on the File Type
7. Change Column Names and Types as desired
8. Click “Create Table”
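Step 7 matters because the table preview infers each column’s type from the uploaded values, and the guess can be wrong. A rough sketch of per-value type inference in plain Scala (`inferType` is a hypothetical helper for illustration, not Databricks’ actual logic):

```scala
// Hypothetical sketch: classify a raw CSV field the way a table preview
// might, so a column of "1", "2", "3" is offered as INT.
def inferType(value: String): String =
  if (value.matches("-?\\d+")) "INT"
  else if (value.matches("-?\\d+\\.\\d+")) "DOUBLE"
  else "STRING"
```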

Page 21: Databricks Community Cloud


Create a Table – Upload File

Page 22: Databricks Community Cloud


Create a Table – Configure Table

Page 23: Databricks Community Cloud


Create a Table – Review Table

Page 24: Databricks Community Cloud


Notebook – Access Table

Page 25: Databricks Community Cloud


Notebook – Access Table – Code Snippets

> sqlContext

> sqlContext.sql("show tables").collect()

> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
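These calls return DataFrames: `limit(10)` caps the result at ten rows and `collect()` materializes them on the driver, much like `take` and `toArray` on a local collection. The `Character` case class below is a made-up stand-in for rows of the `got` table:

```scala
// Local analogy for got.limit(10).collect():
// limit(n) keeps at most n rows; collect() brings them to the driver.
case class Character(name: String, allegiances: String)

val rows = Seq(
  Character("Eddard Stark", "House Stark"),
  Character("Jon Snow", "Night's Watch")
)

val firstTen = rows.take(10).toArray  // at most 10 rows, like limit(10).collect()
```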

Page 26: Databricks Community Cloud


Notebook – Display

Page 27: Databricks Community Cloud


Notebook – Data Cleaning for Charting

Page 28: Databricks Community Cloud


Notebook – Plot Options

Page 29: Databricks Community Cloud


Notebook – Charting

Page 30: Databricks Community Cloud


Notebook – Display and Charting – Code Snippets

> display(got)

> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()

> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf { deathYear: Integer => if (deathYear != null) 1 else 0 }
> val gotCleaned = got
    .filter("Allegiances != \"None\"")
    .withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances"))
    .withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
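The two UDFs above just wrap ordinary Scala functions, so their cleaning logic can be checked outside Spark (the column and table names come from the uploaded Game of Thrones dataset; `Option[Int]` stands in for the nullable `Integer` the UDF receives):

```scala
// Same logic as allegiancesCleanupUDF: lowercase, drop the "house " prefix.
def cleanAllegiance(s: String): String =
  s.toLowerCase().replace("house ", "")

// Same logic as isDeathUDF, using Option instead of a nullable Integer:
// 1 when a death year is recorded, else 0.
def isDeath(deathYear: Option[Int]): Int =
  if (deathYear.isDefined) 1 else 0
```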

Page 31: Databricks Community Cloud


Publish Notebook - Steps

1. While in a Notebook, click “Publish” on the top right

2. Click “Publish” on the pop-up
3. Copy the link and send it out

Page 32: Databricks Community Cloud


Publish Notebook

