+ All Categories
Home > Data & Analytics > Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Date post: 11-Apr-2017
Category:
Upload: brandon-obrien
View: 285 times
Download: 2 times
Share this document with a friend
13
Real Time Data Processing with Spark Streaming Brandon O’Brien Oct 26 th , 2016
Transcript
Page 1: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Real Time Data Processing with

Spark Streaming

Brandon O’BrienOct 26th, 2016

Page 2: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Intro

1. Intro2. Demo + high level walkthrough3. Spark in detail4. Detailed demo walkthrough and/or

workshop

Page 3: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming

Spark experience level?

Select one: Beginner Intermediate Expert

Page 4: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Demo

DEMO

Page 5: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Demo Info

• Data Source:• Data Producer Thread• Redis

• Data Consumer• Spark as Stream Consumer• Redis Publish

• Dashboard:• Node.js/Redis Integration• Socket.io Publish• AngularJS + JavaScript

Page 6: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Spark in detail

SPARK IN DETAIL

Page 7: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Concepts

Application:• Driver program• RDD• Partition• Elements• DStream• InputReceiver• 1 JVM for driver

program• 1 JVM per executor

Cluster:• Master• Executors• Resources• Cores• Gigs RAM

• Cluster Types:• Standalone• Mesos• YARN

Page 8: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Lazy execution

//Allocate resources on clusterval conf = new SparkConf().setAppName(appName).setMaster(master) val sc = new SparkContext(conf)

//Lazy definition of logical processing (transformations)val textFile = sc.textFile("README.md")

.filter(line=> {line.length> 10})

//foreachPartition() triggers execution (actions)textFile.foreachPartition(partition=> {

partition.foreach(line => {println(line)

})})• Use rdd.persist() when multiple actions are called on the same RDD

Page 9: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Execution Env

• Distributed data, distributed code• RDD partitions are distributed across executors• Actions trigger execution and return results to the driver program• Code is executed on either the driver or executors• Be careful of function closures!

//Function arguments to transformations executed on executorsval textFile = sc.textFile("README.md")

.filter(line=> {line.length> 10})

//collect() triggers execution (actions)//executed on driver. foreachPartition executed on executorstextFile.collect().foreach(line => {

println(line)})

Page 10: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Execution Env

Page 11: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: Parallelism

• RDD partitions are processed in parallel• Elements in a single partition are processed serially• You control the number of partitions in an RDD• If you need to guarantee any particular ordering of processing, use

groupByKey() to force all elements with the same key onto the same partitions

• Be careful of shuffles

val textFile = sc.textFile("README.md”)val singlePartitionRDD = textFile.repartition(1)

val linesByKey = shopResultsEnriched.map(line => (getPartitionKey(line), line)).groupByKey()

Page 12: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Spark Streaming: DStreams

• Receiver Types• Kafka (Receiver + Direct)• Flume• Kinesis• TCP Socket• Custom (Ex: redis.receiver.RedisReceiver.scala)

• Note: Kafka receiver will consume an entire core (no context switch)

Page 13: Real Time Data Processing With Spark Streaming, Node.js and Redis with Visualization

Real Time Data Processing with

Spark Streaming

Brandon O’BrienOct 26th, 2016


Recommended