Visualizing big data in the browser using Spark
Hossein Falaki @mhfalaki
Spark Summit East – March 18, 2015

Exploratory Visualization

Put visualization back in the normal workflow of data analysis regardless of data size.

“Critical part of data analysis”—William S. Cleveland

• Interactive
• Collaborative
• Reproducible

Expository Visualization

Communication is often the bottleneck in data science, and a graph is worth a thousand words.

• Control over details
• Shareable

Requirements

• Interactive
• Collaborative
• Shareable
• Reproducible
• Control over details

These requirements lead to two choices:
> Use visualization libraries
> Use the browser

Visualization as programming

• For complex tasks, point-and-click may not be enough
• Best expressed with a grammar (API)
• Scripts are reproducible
• Control over all details
• Data scientists are already familiar with these tools

D3.js, Three.js, matplotlib, ggplot, Bokeh, Vincent, …

Do it in the browser

• Output of these tools can be used directly on the web (PNG, SVG, Canvas, WebGL)
• No need to transfer data and results
• The browser is conducive to collaboration (e.g., notebooks)
• Separating data manipulation from rendering lets users freely choose the best tool for each job
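One way to picture that separation: the data-manipulation side (for example a Spark job) reduces the data to a small result, and the rendering side receives it in a neutral format such as JSON that D3.js, Bokeh, or another library can bind to. This is a minimal sketch assuming JSON as the hand-off format; the function name `to_chart_json`, the field names and the rows are hypothetical.

```python
import json


def to_chart_json(rows, x_field="x", y_field="y"):
    """Turn small, already-aggregated (x, y) rows into a JSON array of records.

    Hypothetical helper: field names are placeholders; match whatever the
    chart code in the browser expects.
    """
    return json.dumps([{x_field: x, y_field: y} for x, y in rows])


# Hypothetical result of a cluster-side aggregation, already collected to the
# driver: a handful of rows rather than the raw data.
rows = [("A", 120), ("B", 95), ("C", 143)]
payload = to_chart_json(rows, x_field="category", y_field="count")
print(payload)
# [{"category": "A", "count": 120}, {"category": "B", "count": 95}, {"category": "C", "count": 143}]
```

Because only this small payload crosses the boundary, the same aggregation can feed a different renderer without touching the data-processing code.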

Challenges with big data visualization

1. Manipulating large data can take a long time

2. We have more data points than pixels
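To put challenge 2 in numbers, assume a full-HD display and a hypothetical dataset of 100 million points (both figures are illustrative, not from the talk):

```latex
1920 \times 1080 = 2{,}073{,}600\ \text{pixels}
\qquad
\frac{10^{8}\ \text{points}}{2{,}073{,}600\ \text{pixels}} \approx 48\ \text{points per pixel}
```

Under these assumptions, plotting every point would mean dozens of points competing for each pixel, while the browser would still have to receive and draw all of them.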

Apache Spark can help solve both problems

Challenges

1. Manipulating large data can take a long time

> Memory
> CPU

Reducing latency: caching

Take advantage of the memory and storage hierarchy

• Serialized storage levels (for memory)
• Memory & GC tuning
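A configuration sketch for the caching bullets, assuming a spark-defaults.conf file and a recent Spark release; the values are placeholders, not recommendations. The serialized storage level itself is chosen in code when caching, for example `rdd.persist(StorageLevel.MEMORY_ONLY_SER)` in Scala. The settings here only affect how compact the serialized blocks are and how the executor JVMs are sized and garbage-collected.

```properties
# spark-defaults.conf sketch (assumed recent Spark release); values are placeholders.

# Kryo is usually faster and more compact than the default Java serialization,
# which matters for serialized storage levels such as MEMORY_ONLY_SER.
spark.serializer                  org.apache.spark.serializer.KryoSerializer

# Compress serialized RDD partitions: saves memory at the cost of extra CPU time.
spark.rdd.compress                true

# Executor heap size; larger heaps hold more cached data but can lengthen GC pauses.
spark.executor.memory             8g

# JVM options for executors, e.g. choosing the G1 garbage collector.
spark.executor.extraJavaOptions   -XX:+UseG1GC
```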

Reducing latency: parallelism

Increase the number of CPUs
> Get more executors with Mesos or YARN
> Click a button to increase cluster size in DBC (Databricks Cloud)

• Control the level of parallelism for map and reduce tasks
• Configure Spark data locality if needed
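A spark-defaults.conf sketch of these knobs, assuming a recent Spark release running on YARN. The numbers are placeholders to be sized to the cluster; the Spark tuning guide suggests roughly 2 to 3 tasks per CPU core. Parallelism can also be set per operation, for example with the `numPartitions` argument of `reduceByKey` or with `repartition`.

```properties
# spark-defaults.conf sketch (assumed recent Spark release on YARN); numbers are placeholders.

# More CPUs: executors and cores per executor to request from YARN (20 x 4 = 80 cores).
spark.executor.instances       20
spark.executor.cores           4

# Default number of partitions for RDD shuffles (reduceByKey, join, ...)
# when an operation does not specify one; here 2 tasks per core.
spark.default.parallelism      160

# Number of partitions used when DataFrame/SQL queries shuffle data.
spark.sql.shuffle.partitions   160

# How long to wait for a data-local task slot before accepting a less local one.
spark.locality.wait            3s
```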

Challenges

1. Manipulating large data can take a long time

2. We have more data points than possible pixels

> Summarize
> Model
> Sample

More data than pixels? Summarize

• Extensively used by BI tools
> Aggregation
> Pivoting

• Most data scientists’ nightly jobs summarize data
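A minimal sketch of summarizing for display, written in plain Python so it runs without a cluster. It bins (x, y) points into a fixed pixel grid, so the result has at most width × height cells however many points go in. In Spark the same idea would typically be a map from each point to its bin key followed by `reduceByKey`. The helper name `bin_points` and the 40 × 30 grid are illustrative assumptions, not from the talk.

```python
import random
from collections import Counter


def bin_points(points, x_range, y_range, width, height):
    """Summarize (x, y) points into per-pixel counts for a heatmap.

    Hypothetical helper: the output holds at most width * height entries,
    regardless of how many points are passed in.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # drop points outside the plotted area
        # Scale into [0, width) x [0, height); clamp the upper edge into the last bin.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[(col, row)] += 1
    return counts


random.seed(0)
points = [(random.random(), random.random()) for _ in range(100_000)]
grid = bin_points(points, (0.0, 1.0), (0.0, 1.0), width=40, height=30)
print(len(points), "points ->", len(grid), "cells")
```

The browser then draws one rectangle per cell, colored by its count, instead of 100,000 individual points.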

More data than pixels? Model
MLlib supports a large (and growing) set of distributed algorithms:
• Clustering: k-means, GMM, LDA
• Classification and regression: linear models (LM), decision trees (DT), naive Bayes (NB)
• Dimensionality reduction: SVD, PCA
• Collaborative filtering: ALS
• Correlation, hypothesis testing
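To illustrate plotting a model instead of raw points, here is a minimal pure-Python k-means (Lloyd's algorithm with random initialization) on 2-D points. The chart would then show k centroids and their cluster sizes instead of every point. This is a sketch for intuition only. On a cluster you would use MLlib's distributed KMeans (for example `pyspark.mllib.clustering.KMeans.train` in the RDD-based API), which has its own initialization (k-means|| by default) and convergence checks. The helper name `kmeans` and the synthetic two-blob data are assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on 2-D tuples; returns (centroids, cluster sizes).

    Hypothetical helper. Sizes come from the final assignment pass.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    sizes = [0] * k
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for px, py in points:
            # Assign each point to the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: (px - centroids[j][0]) ** 2 + (py - centroids[j][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        sizes = [len(c) for c in clusters]
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, sizes


# Synthetic data: two Gaussian blobs around (0, 0) and (10, 10).
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(5000)]
data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(5000)]

centroids, sizes = kmeans(data, k=2)
for (cx, cy), n in zip(centroids, sizes):
    print(f"centroid ({cx:.2f}, {cy:.2f}) summarizes {n} points")
```

Two centroids with counts replace 10,000 points on the chart; the same reduction is what makes model output cheap to ship to the browser.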

More data than pixels? Sample
Extensively used in statistics

Spark offers native support for:
• Approximate and exact sampling
• Approximate and exact stratified sampling

Approximate sampling is faster and is good enough in most cases
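The difference between approximate and exact sampling can be sketched in plain Python. Bernoulli sampling keeps each record independently with probability `fraction`, so the sample size is only approximately fraction × n. This matches the documented behaviour of Spark's `RDD.sample` without replacement. The exact-size sample here uses reservoir sampling (Algorithm R) as a stand-in. Spark's exact methods (`takeSample`, and `sampleByKeyExact` in the Scala and Java APIs) use their own algorithms. Per-key fractions give stratified sampling in the spirit of `sampleByKey`. The function names are hypothetical.

On one machine both approaches are a single pass. Across partitions, however, Bernoulli sampling needs no coordination, whereas hitting an exact size typically needs counts or an extra pass. That is one reason the approximate version is faster.

```python
import random
from collections import Counter


def bernoulli_sample(items, fraction, seed=0):
    """Approximate: keep each item with probability `fraction` (one pass, no counts)."""
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


def reservoir_sample(items, k, seed=0):
    """Exact: return exactly min(k, len(items)) items, uniformly (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(items):
        if i < k:
            reservoir.append(x)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = x
    return reservoir


def stratified_sample(pairs, fractions, seed=0):
    """Approximate stratified: Bernoulli sampling with a per-key fraction."""
    rng = random.Random(seed)
    return [(key, v) for key, v in pairs if rng.random() < fractions.get(key, 0.0)]


data = list(range(100_000))
approx = bernoulli_sample(data, 0.01)
exact = reservoir_sample(data, 1000)
print(len(approx), len(exact))  # len(approx) is near 1000; len(exact) is exactly 1000

# Hypothetical skewed keys: "common" is 90% of the rows, "rare" is 10%.
pairs = [("common" if i % 10 else "rare", i) for i in range(100_000)]
strat = stratified_sample(pairs, {"common": 0.01, "rare": 0.1})
print(Counter(key for key, _ in strat))  # the rare stratum is oversampled 10x
```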

Demo

Summary
Using Spark, we can extend interactive visualization to large data

Reduce interaction latency to seconds
> Cache data in memory
> Increase parallelism

To visualize millions of points in the browser
> Summarize
> Model
> Sample
