Data Integration with IBM InfoSphere Streams
Scott Schneider, Gabriela Jacques da Silva
{scott.a.s,g.jacques}@us.ibm.com
IBM Research
Streams Programming Model
• Streams applications are data flow graphs that consist of:
  – Tuples: structured data items
  – Operators: reusable stream analytics
  – Streams: series of tuples with a fixed type
  – Processing Elements: operator groups in execution
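The four concepts can be illustrated with a minimal Python sketch. This is not the Streams API, and every name in it (`Entry`, `filter_op`, `map_op`, `processing_element`) is hypothetical. A tuple is a frozen dataclass, a stream is an iterable of tuples of one fixed type, an operator is a reusable function from stream to stream, and a processing element chains several operators inside one process.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Tuple: a structured data item with typed attributes (hypothetical schema).
@dataclass(frozen=True)
class Entry:
    uid: int
    server: str
    msg: str

# Operator: a reusable function from an input stream to an output stream.
# Stream: modeled here as any iterable of tuples of one fixed type.
Operator = Callable[[Iterable], Iterator]

def filter_op(pred: Callable[[object], bool]) -> Operator:
    """Forward only the tuples for which pred is true."""
    def run(stream: Iterable) -> Iterator:
        return (t for t in stream if pred(t))
    return run

def map_op(fn: Callable[[object], object]) -> Operator:
    """Transform every tuple with fn."""
    def run(stream: Iterable) -> Iterator:
        return (fn(t) for t in stream)
    return run

def processing_element(*ops: Operator) -> Operator:
    """Processing element: a group of operators chained in one process."""
    def run(stream: Iterable) -> Iterator:
        for op in ops:
            stream = op(stream)
        return iter(stream)
    return run

if __name__ == "__main__":
    source = [Entry(1, "a", "hi"), Entry(2, "b", "spam"), Entry(1, "a", "bye")]
    pe = processing_element(filter_op(lambda e: e.uid == 1), map_op(lambda e: e.msg))
    print(list(pe(source)))  # ['hi', 'bye']
```

Inside the sketch's PE the filter and the map run back to back as ordinary function calls, with no connection between them.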
[Figure: processing elements (PEs), including Source and Sink operators, distributed across several x86 hosts and managed by the Streams Runtime (job management, security, continuous resource management)]
Source → Compilation → Execution
[Figure: the SPL compiler turns SPL source into processing elements (PEs) joined by connections, running from Source to Sink]
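One consequence of grouping operators into PEs can be sketched in Python. The sketch assumes the fusion decision (which operators share a PE) has already been made, and the `placement` map and `pe_connections` function are hypothetical, not the SPL compiler's interface. Data flow edges between operators in the same PE stay inside that PE, and only edges that cross PE boundaries become connections.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream operator, downstream operator)

def pe_connections(edges: List[Edge], placement: Dict[str, int]) -> Set[Tuple[int, int]]:
    """Given a hypothetical fusion decision (operator name -> PE id), return
    the PE-to-PE connections. Edges inside one PE need no connection."""
    conns: Set[Tuple[int, int]] = set()
    for src, dst in edges:
        a, b = placement[src], placement[dst]
        if a != b:
            conns.add((a, b))
    return conns

if __name__ == "__main__":
    edges = [("ParSrc", "Aggr"), ("Aggr", "Filter"), ("Filter", "Sink")]
    placement = {"ParSrc": 0, "Aggr": 0, "Filter": 1, "Sink": 1}
    print(pe_connections(edges, placement))  # {(0, 1)}
```

How the real compiler chooses which operators to fuse is not covered by the slide. The sketch only shows how a given grouping determines the connections.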
Application Example

composite Main {
  type
    Entry = int32 uid, rstring server, rstring msg;
    Sum = int32 uid, int32 total;
  graph
    // Read Entry tuples from the log servers matching logs.*.com
    stream<Entry> Msgs = ParSource() {
      param servers: "logs.*.com";
            partitionBy: server;
    }

    // Per-user aggregation over 5-second tumbling windows
    stream<Sum> Sums = Aggregate(Msgs) {
      window Msgs: tumbling, time(5), partitioned;
      param partitionBy: uid;
      // assumed: total is the number of messages per uid in each window
      output Sums: total = Count();
    }

    // Keep only sums above 100
    stream<Sum> Suspects = Filter(Sums) {
      param filter: total > 100;
    }

    // Write the suspects to suspects.csv
    () as Sink = FileSink(Suspects) {
      param file: "suspects.csv";
    }
}
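The Aggregate and Filter stages of this example can be simulated in Python. The sketch makes several assumptions. `total` is taken to be the per-user message count. Input timestamps are taken to be nondecreasing, and windows are aligned to multiples of 5 seconds. The function names are hypothetical. The Aggregate simulation keeps one counter per uid and emits one Sum per uid when each tumbling window closes. The Filter simulation then forwards only totals above 100.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical timestamped Entry: (time_seconds, uid, server, msg)
Msg = Tuple[float, int, str, str]
# Sum with its window index attached: (window_index, uid, total)
Sum = Tuple[int, int, int]

def aggregate(msgs: Iterable[Msg], window: float = 5.0) -> Iterator[Sum]:
    """Tumbling time window partitioned by uid: count messages per uid and
    emit all counts when the window closes (assumes nondecreasing timestamps)."""
    current: Optional[int] = None
    counts: Counter = Counter()
    for ts, uid, _server, _msg in msgs:
        w = int(ts // window)
        if current is not None and w != current:
            # Window `current` closed: flush one Sum per uid, then tumble.
            for u, total in counts.items():
                yield (current, u, total)
            counts.clear()
        current = w
        counts[uid] += 1
    if current is not None:
        # End of input closes the final window.
        for u, total in counts.items():
            yield (current, u, total)

def filter_sums(sums: Iterable[Sum], threshold: int = 100) -> Iterator[Sum]:
    """Filter: keep only Sums with total > threshold."""
    return (s for s in sums if s[2] > threshold)

if __name__ == "__main__":
    msgs = [(i * 0.01, 7, "logs.a.com", "x") for i in range(101)]
    msgs.append((6.0, 8, "logs.b.com", "y"))
    print(list(filter_sums(aggregate(msgs))))  # [(0, 7, 101)]
```

In this model, uid 7 sends 101 messages in the first window and is reported. The single message from uid 8 lands in the second window and is filtered out.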
[Figure: data flow graph of the example, ParSrc → Aggr → Filter → Sink, with the ParSrc → Aggr → Filter stages replicated into parallel channels]
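Replicating a stage that aggregates per uid only preserves the original results if every tuple with a given uid reaches the same channel. The slides do not say how tuples are routed among the replicas. The sketch below assumes hash partitioning on the key, using CRC-32 so that the mapping is stable across processes. All function names are hypothetical. Each channel then counts on its own, and the merged counts equal the unpartitioned result.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List

def channel_of(value: object, channels: int) -> int:
    """Map a partitioning-key value to a channel with a stable hash (CRC-32)."""
    return zlib.crc32(str(value).encode("utf-8")) % channels

def route(tuples: Iterable[Dict], key: str, channels: int) -> List[List[Dict]]:
    """Split one stream into `channels` sub-streams by hashing `key`."""
    out: List[List[Dict]] = [[] for _ in range(channels)]
    for t in tuples:
        out[channel_of(t[key], channels)].append(t)
    return out

def parallel_counts(tuples: Iterable[Dict], key: str, channels: int) -> Counter:
    """Count tuples per key independently in each channel, then merge."""
    merged: Counter = Counter()
    for chunk in route(tuples, key, channels):
        local = Counter(t[key] for t in chunk)  # per-channel aggregate
        # Every key lives in exactly one channel, so the merge never overlaps.
        if merged.keys() & local.keys():
            raise RuntimeError("key split across channels")
        merged.update(local)
    return merged

if __name__ == "__main__":
    data = [{"uid": i % 7, "msg": "m"} for i in range(100)]
    print(parallel_counts(data, "uid", 4) == Counter(t["uid"] for t in data))  # True
```

If the partitioning key did not match the aggregation key, the same uid could be counted in several channels, and each channel would report only part of that uid's total.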
Data Formats are Fundamental

type
  Trade = decimal64 price, decimal64 volume;
  Quote = decimal64 bidprice, decimal64 askprice,
          decimal64 asksize;
  TradesAndQuotes = Trade, Quote,
                    tuple<rstring ticker, rstring dayAndTime>;

type
  User = rstring name, uint64 id, rstring screen_name;
  Twitter = User user, rstring text, rstring retweet_count;

type
  CDR = rstring calling, rstring called,
        timestamp startTime, timestamp connectTime;

type
  Video = blob video;
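The definition of `TradesAndQuotes` builds a new tuple type from existing ones. In the minimal Python model below, a schema is an ordered list of (name, type) pairs, and a comma-separated list of types concatenates their attributes in order. This is an assumption-labeled illustration, not an SPL API. The `compose` helper is hypothetical. The sketch rejects duplicate names because a tuple's attribute names must be distinct. For brevity it joins `Trade` with the `tuple<rstring ticker, rstring dayAndTime>` part only, and `Quote` would be spliced in the same way.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # ordered (attribute name, SPL type name) pairs

def compose(*parts: Schema) -> Schema:
    """Concatenate attribute lists in order, as in `T = A, B, tuple<...>`.
    Duplicate attribute names are rejected (hypothetical helper)."""
    result: Schema = []
    seen = set()
    for part in parts:
        for name, typ in part:
            if name in seen:
                raise ValueError(f"duplicate attribute: {name}")
            seen.add(name)
            result.append((name, typ))
    return result

Trade: Schema = [("price", "decimal64"), ("volume", "decimal64")]
TickerAndTime: Schema = [("ticker", "rstring"), ("dayAndTime", "rstring")]

if __name__ == "__main__":
    print([name for name, _ in compose(Trade, TickerAndTime)])
    # ['price', 'volume', 'ticker', 'dayAndTime']
```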
Avid Movie Goer Buzz During Super Bowl
[Figure: buzz among avid movie goers for Project X, John Carter, Battleship, Ghost Rider, 21 Jump Street, The Dark Knight Rises, G.I. Joe, Spider-man, The Avengers, Act of Valor, The Lorax, and The Dictator]
Social Media Analytics Architecture
[Figure: an online flow (data-in-motion analysis on InfoSphere Streams) takes social media data through data ingest & prep., text analytics, entity integration, and predictive analytics to reach timely decisions. An offline flow (data-at-rest analysis on InfoSphere BigInsights) applies the same stages to social media data and the customer database, producing social media consumer profiles, customer & prospect profiles, consumer lists, and customer models. Legend: text analytics = timely insights; entity integration = profile resolution; predictive analytics = action determination]
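The way the two flows might share profile resolution can be sketched in Python. The sketch assumes that the offline flow produces a profile table linking social-media handles to customer ids, and that the online flow resolves each incoming post against that table. The function names, the (customer_id, handle) input format, and the exact case-insensitive matching are all hypothetical. Real entity integration is likely to use much fuzzier matching.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

def build_profiles(customers: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Offline (data at rest): build a handle -> customer id table from
    hypothetical (customer_id, handle) records in a customer database."""
    return {handle.lower(): cid for cid, handle in customers}

def resolve(posts: Iterable[Tuple[str, str]],
            profiles: Dict[str, str]) -> Iterator[Tuple[Optional[str], str]]:
    """Online (data in motion): attach a customer id to each (handle, text)
    post, or None when the handle matches no known profile."""
    for handle, text in posts:
        yield profiles.get(handle.lower()), text

if __name__ == "__main__":
    profiles = build_profiles([("c1", "Alice"), ("c2", "bob")])
    print(list(resolve([("ALICE", "hi"), ("carol", "x")], profiles)))
    # [('c1', 'hi'), (None, 'x')]
```

This division of work keeps the expensive table build on data at rest, so the per-post work on the stream is a single dictionary lookup.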