© 2015 IBM Corporation

UNIT 2: BigData Analytics with Spark and Spark Platforms

Shelly Garion

IBM Research -- Haifa

Outline

Map/Reduce

Scala

Spark Core API

Transformations and Actions

Spark Platforms:
– MLlib: Machine Learning
– GraphX: Graph Processing
– SQL
– Streaming

What’s new?

How to Analyze BigData?

Basic Example: Word Count (Spark & Python)

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
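The word-count example follows the classic Map/Reduce pattern. As a minimal sketch, assuming plain Python rather than the Spark API (the function name `word_count` and the sample input are illustrative and not taken from the slide), the three phases could look like this:

```python
from collections import defaultdict
from functools import reduce


def word_count(lines):
    """Count words the Map/Reduce way: map to (word, 1), group by key, reduce."""
    # Map phase: split every line into words and emit one (word, 1) pair per word.
    pairs = [(word, 1) for line in lines for word in line.split()]

    # Shuffle phase: group the emitted values by key (the word).
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)

    # Reduce phase: sum the values collected for each key.
    return {word: reduce(lambda a, b: a + b, ones) for word, ones in grouped.items()}


if __name__ == "__main__":
    sample = ["to be or not to be", "to see or not to see"]  # hypothetical input
    print(word_count(sample))
```

In Spark the same steps are commonly written as `flatMap`, `map`, and `reduceByKey` on an RDD; the sketch only mirrors that structure on an in-memory list.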

Basic Example: Word Count (Spark & Scala)

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Scala

Spark was originally written in Scala; the Java and Python APIs were added later

Scala: high-level language for the JVM
– Object oriented
– Functional programming
– Favors immutable data
– Inspired by criticism of the shortcomings of Java

Static types
– Comparable in speed to Java
– Type inference saves us from having to write explicit types most of the time

Interoperates with Java
– Can use any Java class
– Can be called from Java code

Scala vs. Java

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark & Scala: Creating RDD

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

or SoftLayer object store

Spark & Scala: Basic Transformations

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark & Scala: Basic Actions

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
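Transformations in Spark are lazy: they only describe a new dataset, and nothing is computed until an action asks for a result. The toy class below is a hedged plain-Python illustration of that split, not Spark code. `LazyDataset` and its method names are hypothetical stand-ins chosen to resemble `map`, `filter`, `flatMap`, `collect`, `count`, and `take`.

```python
from itertools import islice


class LazyDataset:
    """Hypothetical toy stand-in for an RDD: transformations are lazy, actions run them."""

    def __init__(self, make_iter):
        # make_iter is a zero-argument function returning a fresh iterator,
        # so the same dataset can be evaluated more than once.
        self._make_iter = make_iter

    @classmethod
    def from_list(cls, items):
        return cls(lambda: iter(items))

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, f):
        return LazyDataset(lambda: (f(x) for x in self._make_iter()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._make_iter() if pred(x)))

    def flat_map(self, f):
        return LazyDataset(lambda: (y for x in self._make_iter() for y in f(x)))

    # Actions: trigger evaluation and return a plain Python value.
    def collect(self):
        return list(self._make_iter())

    def count(self):
        return sum(1 for _ in self._make_iter())

    def take(self, n):
        return list(islice(self._make_iter(), n))


if __name__ == "__main__":
    nums = LazyDataset.from_list([1, 2, 3, 4])
    even_squares = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)  # no work yet
    print(even_squares.collect())  # the action runs the whole pipeline
```

Because each transformation only wraps the previous one in a closure, a chain of them costs nothing until `collect`, `count`, or `take` is called.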

Spark & Scala: Key-Value Operations

Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
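Key-value operations work on collections of `(key, value)` pairs. The following is a rough plain-Python sketch of three common ones, assuming in-memory lists instead of RDDs. The helper names `reduce_by_key`, `group_by_key`, and `join` are illustrative and only echo the Spark operations `reduceByKey`, `groupByKey`, and `join`.

```python
from collections import defaultdict


def reduce_by_key(pairs, f):
    """Merge the values of each key with the binary function f."""
    out = {}
    for key, value in pairs:
        out[key] = f(out[key], value) if key in out else value
    return sorted(out.items())


def group_by_key(pairs):
    """Collect all values of each key into a list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def join(left, right):
    """Inner join: one output pair per matching (left value, right value)."""
    right_groups = defaultdict(list)
    for key, value in right:
        right_groups[key].append(value)
    return [(k, (v, w)) for k, v in left for w in right_groups.get(k, [])]


if __name__ == "__main__":
    pets = [("cat", 1), ("dog", 1), ("cat", 2)]  # hypothetical sample data
    print(reduce_by_key(pets, lambda a, b: a + b))
    print(group_by_key(pets))
```

In Spark, `reduceByKey` can combine values for a key on each partition before data is shuffled, whereas `groupByKey` moves every value across the network, so the former is usually preferred for aggregations.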

Example: Spark Core API

Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

Example: Spark Core API

Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

Example: Spark Core API

Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

Example: Spark Core API

Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

Better implementation:

Example: PageRank

How can the PageRank algorithm be implemented using Map/Reduce?

Hossein Falaki, Numerical Computing with Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
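One possible answer, sketched in plain Python under stated assumptions: every page starts with rank 1.0, the damping factor is 0.85, and pages without outgoing links are not treated specially. None of these choices come from the slide, and the function name `pagerank` is illustrative. Each iteration is one map step (send rank shares along links) followed by one reduce step (sum the shares per page).

```python
from collections import defaultdict


def pagerank(links, iterations=20, damping=0.85):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {dst for outs in links.values() for dst in outs}
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Map phase: each page sends rank / out-degree to every page it links to.
        contributions = [
            (dst, ranks[src] / len(outs))
            for src, outs in links.items()
            for dst in outs
        ]
        # Reduce phase: sum the contributions received by each page.
        received = defaultdict(float)
        for dst, share in contributions:
            received[dst] += share
        # Update: combine the summed contributions with the damping factor.
        ranks = {page: (1 - damping) + damping * received[page] for page in pages}
    return ranks


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}  # hypothetical link graph
    print(pagerank(graph, iterations=100))
```

The repeated map and reduce over the same link structure is the reason iterative algorithms like this benefit from keeping data in memory between iterations.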

Spark Platform

Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark Platform: GraphX

Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark Platform: GraphX
Example: PageRank

In GraphX, PageRank is implemented on top of the Pregel graph-processing API

Spark Platform: MLlib

Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark Platform: MLlib
Example: K-Means Clustering

Goal: segment tweets into clusters by geolocation using Spark MLlib K-means clustering

https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
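To make the clustering step concrete, here is a small plain-Python sketch of the K-means iteration itself. It is not the MLlib implementation. The initial centers are passed in explicitly, and plain Euclidean distance on raw (latitude, longitude) pairs is an assumption made for brevity. The names `closest` and `kmeans` are illustrative.

```python
def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, centers, iterations=10):
    """Run a fixed number of assignment/update rounds and return the centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            clusters[closest(point, centers)].append(point)
        # Update step: move each center to the mean of its points.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center  # keep a center that attracted no points
            for cluster, center in zip(clusters, centers)
        ]
    return centers


if __name__ == "__main__":
    tweets = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]  # hypothetical (lat, lon)
    print(kmeans(tweets, centers=[(0.0, 0.0), (10.0, 10.0)]))
```

Once the centers are known, assigning a new tweet to a segment is a single call to `closest`.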

Spark Platform: MLlib
Example: K-Means Clustering

https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/

Spark Platform: MLlib
Example: K-Means Clustering

https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/

Spark Platform: Streaming

Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
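Spark Streaming processes a live stream as a sequence of small batches, so ordinary batch logic can be reused on each one. The sketch below imitates that idea in plain Python. It cuts the stream by record count rather than by time interval, which is a simplification, and the function names `micro_batches` and `running_word_counts` are hypothetical.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Cut an iterable of records into consecutive lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def running_word_counts(stream, batch_size):
    """Yield the cumulative word counts after each micro-batch."""
    totals = Counter()
    for batch in micro_batches(stream, batch_size):
        # Each batch is handled with ordinary batch logic (here: a word count).
        totals.update(word for line in batch for word in line.split())
        yield dict(totals)


if __name__ == "__main__":
    lines = ["spark streaming", "spark sql", "spark"]  # hypothetical input stream
    for snapshot in running_word_counts(lines, batch_size=2):
        print(snapshot)
```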

Spark Platform: Streaming
Example

Spark Platform: SQL

Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

Spark Platform: SQL & MLlib
Example

// SVM using Stochastic Gradient Descent

Xiangrui Meng, MLlib: scalable machine learning on Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
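For readers who want to see what training an SVM with stochastic gradient descent involves, here is a minimal plain-Python sketch. It is not the MLlib code from the slide. The step size, regularization strength, epoch count, fixed sample order, and the names `train_svm_sgd` and `predict` are all assumptions chosen for illustration.

```python
def train_svm_sgd(data, epochs=100, step=0.1, reg=0.01):
    """Train a linear SVM; data is a list of (features, label) with label in {-1, +1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:  # one sample per update, in a fixed order
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wi * (1 - step * reg) for wi in w]
            if margin < 1:
                # Hinge loss is active: push the separator toward classifying x correctly.
                w = [wi + step * y * xi for wi, xi in zip(w, x)]
                b += step * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the separator x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    samples = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([-2.0, -2.0], -1), ([-3.0, -1.0], -1)]
    w, b = train_svm_sgd(samples)  # hypothetical, linearly separable data
    print(w, b, predict(w, b, [5.0, 5.0]))
```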

What’s new in 2015?

SparkR (R interface)

DataFrame API via Spark SQL

Spark ML – support for pipelines

Matei Zaharia, New directions for Spark in 2015, Spark Summit East March 2015, https://spark-summit.org/east-2015/

