
Apache Spark Streaming - bigdata.com

Date post: 07-Aug-2015
Upload: knowbigdata
Transcript
Page 1: Apache Spark Streaming - bigdata.com
Page 2:

Sandeep Giri, Hadoop

SPARK STREAMING
An extension of the core Spark API: high-throughput, fault-tolerant stream processing.

Page 3:

SPARK STREAMING

Workflow:
• Spark Streaming receives live input data streams
• It divides the data into batches
• The Spark engine processes each batch and generates the final stream of results in batches

It provides a discretized stream, or DStream, which represents a continuous stream of data.
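The batching idea can be illustrated without Spark at all. The `discretize` helper below is hypothetical (not part of the Spark API): it groups timestamped records into fixed-interval batches, which is essentially the sequence of micro-batches a DStream holds.

```java
import java.util.ArrayList;
import java.util.List;

public class Discretize {
    // Hypothetical helper: group records (timestamps in ms) into consecutive
    // batches of length `intervalMs`, mimicking how Spark Streaming turns a
    // live stream into a sequence of small batches.
    static List<List<String>> discretize(long[] timesMs, String[] records, long intervalMs) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < records.length; i++) {
            int batch = (int) (timesMs[i] / intervalMs);   // which interval this record falls in
            while (batches.size() <= batch) batches.add(new ArrayList<>());
            batches.get(batch).add(records[i]);
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] t = {100, 900, 1100, 2500};
        String[] r = {"a", "b", "c", "d"};
        // 1-second batches: records at 100ms and 900ms share batch 0
        System.out.println(discretize(t, r, 1000));  // [[a, b], [c], [d]]
    }
}
```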

Page 4:

SPARK STREAMING - DSTREAM
Internally, a DStream is represented as a sequence of RDDs.

Each RDD in a DStream contains data from a certain interval.

// Reduce the last 30 seconds of data, every 10 seconds
pairs.reduceByKeyAndWindow(reduceFunc, new Duration(30000), new Duration(10000));

Window Operations
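What `reduceByKeyAndWindow` computes can be sketched in plain Java, with no Spark dependency: for each window position, merge the per-key counts of the last `windowLen` batches, sliding forward one batch at a time. The `slidingWindowSums` helper is purely illustrative, not a Spark API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedCounts {
    // Illustrative only: for each window position, sum the per-key counts of
    // the last `windowLen` batches, advancing one batch per slide -- the same
    // computation reduceByKeyAndWindow performs over DStream batches.
    static List<Map<String, Integer>> slidingWindowSums(
            List<Map<String, Integer>> batches, int windowLen) {
        List<Map<String, Integer>> out = new ArrayList<>();
        for (int end = 0; end < batches.size(); end++) {
            Map<String, Integer> window = new HashMap<>();
            for (int i = Math.max(0, end - windowLen + 1); i <= end; i++) {
                for (Map.Entry<String, Integer> e : batches.get(i).entrySet()) {
                    window.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            out.add(window);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> batches = new ArrayList<>();
        batches.add(Map.of("spark", 2));
        batches.add(Map.of("spark", 1, "hadoop", 3));
        batches.add(Map.of("hadoop", 1));
        // Window of 2 batches, sliding by 1 batch (a scaled-down 30s/10s window)
        System.out.println(slidingWindowSums(batches, 2));
    }
}
```

Note the window length and slide interval are expressed here in batches; in Spark both Durations must be multiples of the batch interval.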

Page 5:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 1: Create a connection to the service

JavaStreamingContext jssc = new JavaStreamingContext(
    "local[2]", "JavaNetworkWordCount", new Duration(1000));
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

Page 6:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 2: Split each line into words

// Run a split function on each line with the help of flatMap.
// This creates a DStream of the individual words.
JavaDStream<String> words = lines.flatMap(
    new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String x) {
            return Arrays.asList(x.split(" "));
        }
    });

Page 7:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 3: Using mapToPair(), create a key-value pair for each word: the key is the word and the value is 1.

JavaPairDStream<String, Integer> pairs = words.mapToPair(
    new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
        }
    });

Page 8:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 4: Using the reduceByKey transformation, sum the counts (the 1's) per word. This yields a DStream of (word, count) pairs.

JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer i1, Integer i2) {
            return i1 + i2;
        }
    });

Step 5: Print the counts and start the computation (nothing runs until start() is called):

wordCounts.print();
jssc.start();
jssc.awaitTermination();
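To check what a single batch of the pipeline produces end to end, the flatMap / pair / reduceByKey chain can be simulated in plain Java over a list of lines, with no Spark required. `countWords` below is an illustrative stand-in, not Spark code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountBatch {
    // Plain-Java simulation of one Spark Streaming batch:
    // split lines into words (flatMap), pair each word with 1,
    // and sum per key (reduceByKey's i1 + i2).
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("spark streaming", "spark engine");
        // counts: spark -> 2, streaming -> 1, engine -> 1 (map order may vary)
        System.out.println(countWords(batch));
    }
}
```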

Page 9:

SPARK STREAMING - DSTREAM

Page 10:
