Spark SQL and DataFrames, Spark GraphX, Spark MLlib, Spark Streaming
Lightning-fast cluster computing
Chaining transformations
Creating a SQL context
Creating DataFrames
Creating a DataFrame from Hive
Place your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) files in Spark's conf/ directory.
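With those files in place, reading a Hive table into a DataFrame might look like the sketch below. It assumes the Spark 1.6-style `HiveContext` API; the helper name `load_hive_table`, the default application name, and any table name passed to it are hypothetical and not taken from the slides.

```python
def load_hive_table(table_name, app_name="hive-dataframe-example"):
    """Return a DataFrame backed by the Hive table `table_name` (hypothetical helper)."""
    # Imports are local so this module can be loaded without a Spark installation.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    # Reuse a running SparkContext if there is one, otherwise start a new one.
    sc = SparkContext.getOrCreate(SparkConf().setAppName(app_name))

    # HiveContext picks up hive-site.xml from Spark's conf/ directory.
    hive_context = HiveContext(sc)

    # Look up the table in the Hive metastore and expose it as a DataFrame.
    return hive_context.table(table_name)
```

Calling `load_hive_table("some_table").show()` would then print the first rows, provided a table with that name exists in the metastore.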
Creating a DataFrame from MySQL
Transforming and querying DataFrames
https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#
Working with data in a DataFrame
DataFrame queries
Query DataFrame using columns
Saving DataFrames
DataFrames and RDDs
Working with Row objects
Extracting data from rows
Convert an RDD to a DataFrame
ML and GraphX in Spark
Common Spark use cases
Spark examples
Iterative algorithms in Spark: PageRank
PageRank algorithm
Neighbor contribution function
Pairs of page links
Page links grouped by source page
Persisting the link pair RDD
Set initial ranks
First iteration
Second iteration
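The PageRank steps listed above (neighbor contribution function, link pairs, grouping by source page, initial ranks, repeated iterations) can be sketched in plain Python without a cluster. This is an illustrative stand-in for the RDD pipeline, not the code from the slides: the function names, the sample link pairs, and the damping factor of 0.85 are assumptions.

```python
from collections import defaultdict


def compute_contribs(neighbors, rank):
    """Neighbor contribution function: a page splits its rank evenly among its out-links."""
    for neighbor in neighbors:
        yield neighbor, rank / len(neighbors)


def pagerank(link_pairs, iterations=2, damping=0.85):
    """Run a fixed number of PageRank iterations over (source, target) link pairs."""
    # Group link pairs by source page (the role of distinct + groupByKey on an RDD).
    links = defaultdict(list)
    for source, target in sorted(set(link_pairs)):
        links[source].append(target)

    # Set initial ranks: every source page starts at 1.0.
    ranks = {page: 1.0 for page in links}

    for _ in range(iterations):
        contribs = defaultdict(float)
        for page, neighbors in links.items():
            # Mimic an inner join of links with ranks: skip pages without a current rank.
            if page in ranks:
                for neighbor, share in compute_contribs(neighbors, ranks[page]):
                    contribs[neighbor] += share
        # New rank = (1 - damping) + damping * sum of received contributions.
        ranks = {page: (1 - damping) + damping * total
                 for page, total in contribs.items()}
    return ranks


# Hypothetical sample graph: a links to b and c, b links to c, c links to a.
sample_pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
first_iteration = pagerank(sample_pairs, iterations=1)
second_iteration = pagerank(sample_pairs, iterations=2)
```

In Spark the grouped links would typically be persisted because every iteration joins against them, which is the motivation for the persisting step in the outline.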
Checkpointing
GraphX in Spark
Examples in GraphX
MLlib in Spark
https://spark.apache.org/docs/2.0.2/ml-guide.html
What is MLlib?
Why MLlib?
https://docs.databricks.com/spark/latest/mllib/decision-trees.html
Spark streaming
http://spark.apache.org/docs/latest/streaming-programming-guide.html