
Apache Spark Streaming - bigdata.com

Date post: 07-Aug-2015
Upload: knowbigdata
Transcript
Page 1: Apache Spark Streaming - bigdata.com
Page 2:

Sandeep Giri, Hadoop

SPARK STREAMING
An extension of the core Spark API: high-throughput, fault-tolerant stream processing.

Page 3:

SPARK STREAMING

Workflow:
• Spark Streaming receives live input data streams
• It divides the data into batches
• The Spark engine processes each batch and generates the final stream of results in batches

It provides a discretized stream, or DStream, which represents a continuous stream of data.
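The batching idea can be illustrated without Spark at all. The `discretize` helper below is hypothetical (not part of the Spark API): it groups timestamped records into fixed-interval batches, which is essentially the sequence of micro-batches a DStream holds.

```java
import java.util.ArrayList;
import java.util.List;

public class Discretize {
    // Hypothetical helper: group records (timestamps in ms) into consecutive
    // batches of length `intervalMs`, mimicking how Spark Streaming turns a
    // live stream into a sequence of small batches.
    static List<List<String>> discretize(long[] timesMs, String[] records, long intervalMs) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < records.length; i++) {
            int batch = (int) (timesMs[i] / intervalMs);   // which interval this record falls in
            while (batches.size() <= batch) batches.add(new ArrayList<>());
            batches.get(batch).add(records[i]);
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] t = {100, 900, 1100, 2500};
        String[] r = {"a", "b", "c", "d"};
        // 1-second batches: records at 100ms and 900ms share batch 0
        System.out.println(discretize(t, r, 1000));  // [[a, b], [c], [d]]
    }
}
```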

Page 4:

SPARK STREAMING - DSTREAM
Internally, a DStream is represented as a sequence of RDDs.

Each RDD in a DStream contains data from a certain interval.

// Reduce the last 30 seconds of data, every 10 seconds
pairs.reduceByKeyAndWindow(reduceFunc, new Duration(30000), new Duration(10000));

Window Operations
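What `reduceByKeyAndWindow` computes can be sketched in plain Java, with no Spark dependency: for each window position, merge the per-key counts of the last `windowLen` batches, sliding forward one batch at a time. The `slidingWindowSums` helper is purely illustrative, not a Spark API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedCounts {
    // Illustrative only: for each window position, sum the per-key counts of
    // the last `windowLen` batches, advancing one batch per slide -- the same
    // computation reduceByKeyAndWindow performs over DStream batches.
    static List<Map<String, Integer>> slidingWindowSums(
            List<Map<String, Integer>> batches, int windowLen) {
        List<Map<String, Integer>> out = new ArrayList<>();
        for (int end = 0; end < batches.size(); end++) {
            Map<String, Integer> window = new HashMap<>();
            for (int i = Math.max(0, end - windowLen + 1); i <= end; i++) {
                for (Map.Entry<String, Integer> e : batches.get(i).entrySet()) {
                    window.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            out.add(window);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> batches = new ArrayList<>();
        batches.add(Map.of("spark", 2));
        batches.add(Map.of("spark", 1, "hadoop", 3));
        batches.add(Map.of("hadoop", 1));
        // Window of 2 batches, sliding by 1 batch (a scaled-down 30s/10s window)
        System.out.println(slidingWindowSums(batches, 2));
    }
}
```

Note the window length and slide interval are expressed here in batches; in Spark both Durations must be multiples of the batch interval.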

Page 5:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 1: Create a connection to the service

JavaStreamingContext jssc = new JavaStreamingContext(
    "local[2]", "JavaNetworkWordCount", new Duration(1000));
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

Page 6:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 2: Split each line into words

// Run a split function on each line with the help of flatMap.
// This creates a DStream of the individual words.
JavaDStream<String> words = lines.flatMap(
    new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String x) {
            return Arrays.asList(x.split(" "));
        }
    });

Page 7:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 3: Using mapToPair(), create a key-value pair for each word: the key is the word and the value is 1.

JavaPairDStream<String, Integer> pairs = words.mapToPair(
    new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String s) {
            return new Tuple2<String, Integer>(s, 1);
        }
    });

Page 8:

SPARK STREAMING - EXAMPLE
Problem: You want to do a word count every second.

Step 4: Using the reduceByKey transformation, sum the counts (the 1's) per word. This yields a DStream of (word, count) pairs.

JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer i1, Integer i2) {
            return i1 + i2;
        }
    });

Step 5: Print the counts and start the computation (nothing runs until start() is called):

wordCounts.print();
jssc.start();
jssc.awaitTermination();
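To check what a single batch of the pipeline produces end to end, the flatMap / pair / reduceByKey chain can be simulated in plain Java over a list of lines, with no Spark required. `countWords` below is an illustrative stand-in, not Spark code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountBatch {
    // Plain-Java simulation of one Spark Streaming batch:
    // split lines into words (flatMap), pair each word with 1,
    // and sum per key (reduceByKey's i1 + i2).
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("spark streaming", "spark engine");
        // counts: spark -> 2, streaming -> 1, engine -> 1 (map order may vary)
        System.out.println(countWords(batch));
    }
}
```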

Page 9:

SPARK STREAMING - DSTREAM

Page 10:
