
Databricks Community Cloud

By: Robert Sanders

Page: 2

Databricks Community Cloud

• Free/Paid Standalone Spark Cluster
• Online Notebook
  • Python
  • R
  • Scala
  • SQL
• Tutorials and Guides
• Shareable Notebooks

Page: 3

Why is it useful?

• Learning about Spark
• Testing different versions of Spark
• Rapid Prototyping
• Data Analysis
• Saved Code
• Others…

Page: 4

Forums
https://forums.databricks.com/

Page: 5

Login/Sign Up
https://community.cloud.databricks.com/login.html

Page: 6

Home Page

Page: 7

Active Clusters

Page: 8

Create a Cluster - Steps

1. From the Active Clusters page, click the “+ Create Cluster” button

2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click “Create Cluster”
5. Wait for the cluster to start up and be in a “Running” state

Page: 9

Create a Cluster

Page: 10

Active Clusters

Page: 11

Active Clusters – Spark Cluster UI – Master

Page: 12

Workspaces

Page: 13

Create a Notebook - Steps

1. Right-click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you’ve created that you want to attach to the Notebook
5. Click the “Create” button

Page: 14

Create a Notebook

Page: 15

Notebook

Page: 16

Using the Notebook

Page: 17

Using the Notebook – Code Snippets

> sc

> sc.parallelize(1 to 5).collect()
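
To check that the cluster is actually executing work, you can chain a transformation before the action. This extra line is a sketch added for illustration, not part of the original deck:

> sc.parallelize(1 to 5).map(_ * 2).reduce(_ + _) // doubles each element, then sums: 30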

Page: 18

Using the Notebook - Shortcuts

Shortcut          Action
Shift + Enter     Run selected cell and move to next cell
Ctrl + Enter      Run selected cell
Option + Enter    Run selected cell and insert cell below
Ctrl + Alt + P    Create cell above current cell
Ctrl + Alt + N    Create cell below selected cell

Page: 19

Tables

Page: 20

Create a Table - Steps

1. From the Tables section, click “+ Create Table”
2. Select the Data Source (the steps below assume you’re using File as the Data Source)
3. Upload a file from your local file system
   1. Supported file types: CSV, JSON, Avro, Parquet
4. Click Preview Table
5. Fill in the Table Name
6. Select the File Type and other Options depending on the File Type
7. Change Column Names and Types as desired
8. Click “Create Table”
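
If you prefer code over the UI, a file uploaded this way lands under /FileStore/tables/ and can be read directly. This is a sketch assuming a Spark 2.x cluster and a hypothetical file name got.csv; use the actual path shown by the upload dialog:

> val df = sqlContext.read.option("header", "true").option("inferSchema", "true").csv("/FileStore/tables/got.csv") // read the uploaded CSV into a DataFrame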

Page: 21

Create a Table – Upload File

Page: 22

Create a Table – Configure Table

Page: 23

Create a Table – Review Table

Page: 24

Notebook – Access Table

Page: 25

Notebook – Access Table – Code Snippets

> sqlContext

> sqlContext.sql("show tables").collect()

> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
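
Before charting, it can help to confirm the table’s shape. These two standard DataFrame calls are a sketch added here, not from the original deck:

> got.printSchema() // column names and types
> got.count() // number of rows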

Page: 26

Notebook – Display

Page: 27

Notebook – Data Cleaning for Charting

Page: 28

Notebook – Plot Options

Page: 29

Notebook – Charting

Page: 30

Notebook – Display and Charting – Code Snippets

> filter(got)

> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()

> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf { deathYear: Integer => if (deathYear != null) 1 else 0 }
> val gotCleaned = got.filter("Allegiances != \"None\"")
      .withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances"))
      .withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
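
Chart-friendly data usually needs one row per group. A hedged follow-up (the column names match the snippet above, but this particular aggregation is an assumption, not from the original deck):

> display(gotCleaned.groupBy("Allegiances").agg(sum("isDeath").as("deaths"))) // total deaths per house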

Page: 31

Publish Notebook - Steps

1. While in a Notebook, click “Publish” on the top right

2. Click “Publish” on the pop-up
3. Copy the link and send it out

Page: 32

Publish Notebook