
Streams

Date post: 19-Jun-2015
Upload: marielle-lange
Page 1: Streams

Streams of social consciousness

Real-time data transformation

Page 2: Streams

Who am I?

Psycholinguist: Research/Data analysis

Flex Programmer: OO, Enterprise

Interactive Developer: Browser + Server

2000 2008 2013

Marielle Lange @widged

Page 3: Streams

Stream expertise

Fairly recent and rather limited

๏Gulp -> custom modules written by adapting other modules.

๏Data analysis -> Using streams to process large data sets.

➡ I will attempt to provide the minimal orientation needed to get started, staying clear of complex topics like back-pressure handling.

Page 4: Streams

Streams for data analysis

Garden data. Aggregating data scraped from a large number of websites. Parsing them. Normalizing them (Fahrenheit vs Celsius, March in the Northern or Southern Hemisphere). Reducing them (converting [55-65] to 55 #1, 60 #1, 65 #1). Rendering them (average vs visualisation).

Page 5: Streams

Streams?

Streams manage a data flow.

Sources. Where data pour from.

Sinks. Where results pour to.

Throughs. Where data gets manipulated and transformed.

ReadStream.

WriteStream.

Page 6: Streams

What are they good for?

๏ Gulp - writing your own modules.

๏ Real-time data obtained from remote servers that would be too impractical to buffer in a device with limited memory.

๏ Map-reduce types of computations - a programming model for processing and generating large data sets. A map function generates a set of intermediate key/value pairs ({word: ‘hello’, length: 5}) and a reduce function merges all intermediate values associated with the same intermediate key ([‘agile’ , ‘greet’ ,‘hello’] - list of words of length 5). Great if you want to run computations on distributed systems.
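The word-length example above can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration of the map and reduce steps, not a distributed implementation; the function names `mapWord` and `reduceByKey` and the word list are made up for the example.

```javascript
// Map step: turn each word into an intermediate key/value pair,
// here keyed by the length of the word.
function mapWord(word) {
  return { key: word.length, value: word };
}

// Reduce step: merge all values that share the same intermediate key.
// A new object is returned on every call; the accumulator is never mutated.
function reduceByKey(groups, pair) {
  var merged = Object.assign({}, groups);
  merged[pair.key] = (merged[pair.key] || []).concat(pair.value);
  return merged;
}

var words = ['agile', 'greet', 'hello', 'stream', 'pipe'];
var wordsByLength = words.map(mapWord).reduce(reduceByKey, {});

console.log(wordsByLength[5]); // [ 'agile', 'greet', 'hello' ]
```

Because each step only depends on its arguments, the map calls could in principle run on different machines and the reduce step could merge their partial results.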

Page 7: Streams

Streams 101

Page 8: Streams

Readable Streams

Abstraction for a source of data that you are reading data from.

‣ http responses, on the client
‣ http requests, on the server
‣ fs read streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdout and stderr
‣ process.stdin

Notes

๏A readable stream will not start emitting data until you indicate that you are ready to receive it.

๏Readable streams have two “modes”: a flowing mode and a non-flowing mode.

var flappyStream = readable.read();
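The two modes can be compared side by side with a small in-memory source. This is a minimal sketch using Node's built-in `stream` module; the helper `makeSource` and the sample chunks are assumptions made for the illustration.

```javascript
var Readable = require('stream').Readable;

// Hypothetical helper: a readable source that emits the given chunks, then ends.
function makeSource(chunks) {
  var queue = chunks.slice();
  return new Readable({
    objectMode: true,
    read: function () {
      // pushing null signals the end of the stream
      this.push(queue.length ? queue.shift() : null);
    }
  });
}

// Flowing mode: attaching a 'data' listener starts the flow,
// and chunks are handed to you as soon as they are available.
var flowing = [];
makeSource(['a', 'b', 'c'])
  .on('data', function (chunk) { flowing.push(chunk); })
  .on('end', function () { console.log('flowing:', flowing); });

// Non-flowing mode: you wait for 'readable' and pull chunks yourself
// with read(), which returns null once the internal buffer is empty.
var paused = [];
var source = makeSource(['a', 'b', 'c']);
source.on('readable', function () {
  var chunk;
  while ((chunk = source.read()) !== null) {
    paused.push(chunk);
  }
});
source.on('end', function () { console.log('paused:', paused); });
```

In both cases nothing is emitted until a listener signals that the consumer is ready, which is the behaviour described in the first note.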

Page 9: Streams

Writable Streams

Abstraction for a destination that you are writing data to.

‣ http requests, on the client
‣ http responses, on the server
‣ fs write streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdin
‣ process.stdout, process.stderr

writable.write(flappyBird);
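To see what happens on the receiving end of `write()`, here is a minimal sketch of a custom sink built with Node's `stream.Writable`. The `received` array and the sample strings are assumptions for the illustration; a real sink would typically write to a file, a socket or a database.

```javascript
var Writable = require('stream').Writable;

// A sink that simply remembers everything written to it.
var received = [];
var sink = new Writable({
  write: function (chunk, encoding, done) {
    // strings arrive as Buffers by default, so convert them back
    received.push(chunk.toString());
    done(); // signal that this chunk has been handled
  }
});

// 'finish' fires once end() has been called and all chunks are flushed.
sink.on('finish', function () {
  console.log('received:', received.join(''));
});

sink.write('flappy ');
sink.write('bird');
sink.end();
```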

Page 10: Streams

Transforms

Compressing a file using gzip

var fs   = require("fs"),
    zlib = require("zlib");

var readable = fs.createReadStream("foo.txt"),
    writable = fs.createWriteStream("foo.txt.gz"),
    gzip     = zlib.createGzip(); // the transform step

readable
  .pipe(gzip)
  .pipe(writable);

var evilStream = transform.output.read();

transform.input.write(flappyBird);

Abstraction for a stream that is both readable and writable, where the input is related to the output (map or filter step).

Dominic Tarr’s `through` module provides similar functionality.
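A transform of your own takes only a few lines with Node's built-in `stream.Transform`. The sketch below is an illustration of a map step, not taken from the talk: the `upperCase` stream and the sample input are assumptions.

```javascript
var stream = require('stream');

// A "through" step: every chunk written to the input side
// comes out upper-cased on the output side.
var upperCase = new stream.Transform({
  transform: function (chunk, encoding, done) {
    done(null, chunk.toString().toUpperCase());
  }
});

// Read from the output side.
var output = [];
upperCase.on('data', function (chunk) { output.push(chunk.toString()); });
upperCase.on('end', function () { console.log(output.join('')); });

// Write to the input side.
upperCase.write('flappy ');
upperCase.end('bird');
```

Because it is both writable and readable, the same stream can sit in the middle of a `readable.pipe(upperCase).pipe(writable)` chain.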

Page 11: Streams

Basic API

Readable stream

var fs = require('fs');

var readable = fs.createReadStream('foo.txt');

// this is the classic api
readable
  .on('data', function (data) { console.log('Data!', data); })
  .on('error', function (err) { console.error('Error', err); })
  .on('end', function () { console.log('All done!'); });

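Writable stream

var fs = require('fs');

var readable = fs.createReadStream('foo.txt'),
    writable = fs.createWriteStream('copy.txt');

// pipe() closes the destination by default; keep it open
// so that an extra line can still be written after the copy
readable.pipe(writable, { end: false });

readable.on('end', function () {
  writable.end('an extra line');
});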

Page 12: Streams

Toolbox

Page 13: Streams

event-stream (D. Tarr)

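var fs         = require("fs"),
    JSONStream = require("JSONStream"),
    map        = require("map-stream");

var input  = fs.createReadStream("twitter-feed.json"),
    output = fs.createWriteStream("twitter-sentiments.json");

// computeSentiments(data, callback) is a map-stream callback defined elsewhere
input
  .pipe(JSONStream.parse("*"))
  .pipe(map(computeSentiments))
  .pipe(JSONStream.stringify()) // turn the objects back into JSON text for the file
  .pipe(output);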

Page 14: Streams

Stream playground (J. Resig)

Page 15: Streams

Stream handbook (@Substack)

Page 16: Streams

Vinyl

Rapidly define a list of files to read from with glob strings.

var vinyl      = require('vinyl-fs'),
    map        = require('map-stream'),
    JSONStream = require('JSONStream');

vinyl.src('./data/*/quad/*.comp.json', { buffer: false })
  .pipe(map(mapSource));

// with { buffer: false }, file.contents is a stream rather than a Buffer
function mapSource(file, asyncReturn) {
  var srcStream = file.contents;
  srcStream
    .pipe(JSONStream.parse("*"))
    .pipe(SomeAnalysis) // placeholder for your own transform stream
    .pipe(vinyl.dest("./out"));
}

Page 17: Streams

Example

Page 18: Streams

Twitter Sentiments

Register an application with the Twitter API – https://dev.twitter.com/

Create an access token.

In your project, add a file “secret_keys.js” with:

module.exports = {
  twitter: {
    consumer_key: "YOUR_CONSUMER_KEY",
    consumer_secret: "YOUR_CONSUMER_SECRET",
    access_token_key: "USER_ACCESS_TOKEN",
    access_token_secret: "USER_ACCESS_TOKEN_SECRET"
  }
};

The example takes advantage of the sentiment module:

https://github.com/thisandagain/sentiment

Page 19: Streams

Programming Style

Page 20: Streams

Separation of concerns

For me, the #1 reason to use streams is that the piping structure encourages writing programs as bite-size modules that are highly interchangeable.

In the early stages of writing the example program, I had:

tweets
  .pipe(map(englishOnly))
  .pipe(map(addSentiment))

Then I found out that the API gave you the option to specify a language filter. All I had to do was drop one line of code.

Page 21: Streams

Functional Programming

A more functional style of programming encourages the avoidance of side effects or state mutation.

var fs  = require("fs"),
    map = require("map-stream");

var readable = fs.createReadStream("foo.txt");

// assumes each chunk has already been parsed into an object with a `language` field
readable
  .pipe(map(filterEnglish));

function filterEnglish(data, asyncReturn) {
  if (data.language === "en") {
    // write these data to the output stream
    asyncReturn(null, data);
  } else {
    // but don't write these
    asyncReturn();
  }
}

๏ Single Responsibility Principle: "A function should do one thing, and do it well."

๏ Pure functions. No knowledge of the external world whatsoever. Every bit of information required to run the function is explicitly passed as a parameter.

๏ Immutable data. A function returns new data that captures the transformation rather than a reference to the old data.

๏ Higher Order Functions. Functions that return functions (partials, currying). A way to capture local context through closure.
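The last three principles can be sketched together. The names `keepLanguage` and `withWordCount` and the sample tweets are hypothetical and only serve the illustration; the callback signature `(data, asyncReturn)` follows the map-stream style used in the filter above.

```javascript
// Higher-order function: returns a filter for the requested language.
// The language is captured through closure instead of being hard-coded.
function keepLanguage(language) {
  return function (data, asyncReturn) {
    if (data.language === language) {
      asyncReturn(null, data); // pass the data along
    } else {
      asyncReturn();           // drop the data
    }
  };
}

// Pure function with immutable data: everything it needs is passed in,
// and it returns a new object instead of modifying the tweet it received.
function withWordCount(tweet) {
  return Object.assign({}, tweet, {
    wordCount: tweet.text.trim().split(/\s+/).length
  });
}

var tweet = { language: 'en', text: 'streams are fun' };
var counted = withWordCount(tweet);

console.log(counted.wordCount);  // 3
console.log('wordCount' in tweet); // false, the original is untouched
```

With map-stream, the filter would slot into a pipeline as `.pipe(map(keepLanguage('en')))`, and switching to another language means changing one argument rather than writing a new function.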

