© 2015 IBM Corporation
UNIT 2: BigData Analytics with Spark and Spark Platforms
Shelly Garion
IBM Research -- Haifa
Outline
Map/Reduce
Scala
Spark Core API
Transformations and Actions
Spark Platforms:
  - MLLib: Machine Learning
  - GraphX: Graph Processing
  - SQL
  - Streaming
What’s new?
How to Analyze BigData?
Basic Example: Word Count (Spark & Python)
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
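The word-count pattern can be sketched in plain Python without a Spark installation. This is only an illustration of the three logical steps (flat-map, map, reduce-by-key) that a Spark word count typically chains together; the function name `word_count` and the sample lines are assumptions made for this sketch, not code from the slide.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words using explicit flat-map, map, and reduce-by-key steps."""
    # flat-map: split every line into words and flatten into one stream
    words = chain.from_iterable(line.split() for line in lines)
    # map: emit a (word, 1) pair for each word
    pairs = ((word, 1) for word in words)
    # reduce-by-key: sum the counts that share the same word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical input; in Spark the lines would come from a distributed dataset.
sample_lines = ["to be or not to be", "that is the question"]
print(word_count(sample_lines))
```

In Spark the same three steps run in parallel across partitions, and only the reduce-by-key step needs to move data between machines.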
Basic Example: Word Count (Spark & Scala)
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Scala
Spark was originally written in Scala; the Java and Python APIs were added later
Scala: high-level language for the JVM
  - Object oriented
  - Functional programming
  - Favors immutable values and data structures
  - Inspired by criticism of the shortcomings of Java
Static types
  - Comparable in speed to Java
  - Type inference saves us from having to write explicit types most of the time
Interoperates with Java
  - Can use any Java class
  - Can be called from Java code
Scala vs. Java
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark & Scala: Creating RDD
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
or SoftLayer object store
Spark & Scala: Basic Transformations
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark & Scala: Basic Actions
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark & Scala: Key-Value Operations
Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
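Two common key-value operations, grouping by key and reducing by key, can be sketched in plain Python. The helpers below mirror the behavior of Spark's `groupByKey` and `reduceByKey` on a small in-memory list; the function names, the `visits` data, and the page names are illustrative assumptions, and nothing here is distributed.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_by_key(pairs, func):
    """Combine the values of each key pairwise with func, keeping one result per key."""
    reduced = {}
    for key, value in pairs:
        reduced[key] = func(reduced[key], value) if key in reduced else value
    return reduced


# Hypothetical (page, visit count) pairs.
visits = [("index.html", 3), ("about.html", 1), ("index.html", 4)]
print(group_by_key(visits))
print(reduce_by_key(visits, lambda a, b: a + b))
```

Reducing by key keeps only one running value per key, whereas grouping holds every value in memory, which is why a reduce is usually preferred when only an aggregate is needed.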
Example: Spark Core API
Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
Example: Spark Core API
Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
Example: Spark Core API
Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
Example: Spark Core API
Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/
Better implementation:
Example: PageRank
How to implement PageRank algorithm using Map/Reduce?
Hossein Falaki, Numerical Computing with Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
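One way to answer this question is to treat each PageRank iteration as a map step followed by a reduce step. The plain-Python sketch below runs on a single machine and is not Spark code. It assumes every page appears as a key in `links` and has at least one outgoing link; the damping factor of 0.85, the iteration count, and the tiny example graph are assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank(links, iterations=20, damping=0.85):
    """Iterative PageRank where each iteration is one map step and one reduce step."""
    n = len(links)
    ranks = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # map: every page sends an equal share of its rank to each page it links to
        contributions = [
            (target, ranks[page] / len(targets))
            for page, targets in links.items()
            for target in targets
        ]
        # reduce: sum the shares received by each target page
        received = defaultdict(float)
        for target, share in contributions:
            received[target] += share
        # combine the received shares with the damping term
        ranks = {
            page: (1 - damping) / n + damping * received[page]
            for page in links
        }
    return ranks


# Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(graph))
```

Because the ranks from one iteration feed the next, the algorithm benefits from keeping the link structure in memory between iterations instead of rereading it each time.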
Spark Platform
Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark Platform: GraphX
Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark Platform: GraphX
Example: PageRank
In GraphX, PageRank is implemented using the Pregel graph-processing API
Spark Platform: MLLib
Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark Platform: MLLib
Example: K-Means Clustering
Goal:
Segment tweets into clusters by geolocation using Spark MLLib K-means clustering
https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
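The idea behind K-means can be sketched in plain Python on a handful of (latitude, longitude) points. This is not the MLLib implementation or the code from the linked post. It makes three simplifying assumptions: the first `k` points serve as initial centers, a fixed number of iterations is run, and distance is squared Euclidean distance on raw coordinates (ignoring Earth curvature). The coordinates are made up for illustration.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on 2-D points; returns the final list of cluster centers."""
    # Assumption: use the first k points as the initial centers.
    centers = list(points[:k])
    for _ in range(iterations):
        # assignment step: attach every point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # update step: move each center to the mean of its assigned points
        for i, members in enumerate(clusters):
            if members:  # keep the old center if no point was assigned to it
                centers[i] = (
                    sum(lat for lat, _ in members) / len(members),
                    sum(lon for _, lon in members) / len(members),
                )
    return centers


# Hypothetical tweet locations forming two well-separated groups.
tweets = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
print(kmeans(tweets, k=2))
```

A distributed version performs the assignment step in parallel over the data and aggregates per-cluster sums to update the centers.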
Spark Platform: MLLib
Example: K-Means Clustering
https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
Spark Platform: MLLib
Example: K-Means Clustering
https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
Spark Platform: Streaming
Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark Platform: Streaming
Example
Spark Platform: SQL
Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
Spark Platform: SQL & MLLib
Example
// SVM using Stochastic Gradient Descent
Xiangrui Meng, MLLib: scalable machine learning on Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
What’s new in 2015?
SparkR (R interface)
DataFrame API via Spark SQL
Spark ML: support for pipelines
Matei Zaharia, New directions for Spark in 2015, Spark Summit East March 2015, https://spark-summit.org/east-2015/