Realtime Analytics with Cassandra
Acunu Analytics
Tom Wilkie, Acunu, 21st August 2012
Analytics
• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?
Motivation / alternatives
Why bother?
“Companies that can harness big data will trample data incompetents”
The Economist, May 26th 2011
time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  175
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
Live & historical aggregates... Trends... Drill downs and roll ups
Combining “big” and “real-time” is hard
Solution Con
Scalability $$$
Not realtime
Spartan query semantics => complex, DIY solutions
What is it?
• Aggregate incrementally, on the fly
• Store live + historical aggregates
[Diagram: click stream, sensor data, etc. → events → Acunu Analytics → counter updates]
{
  time : TIME(HOUR; MIN; SEC),
  page : PATH(/),
  category : STRING,
  loadTime : LONG
}

{
  select : ["COUNT", "AVG(loadTime)"],
  where : "time, ?path",
  group : "time, ?category"
}
Dashboard UI
How does it work?
count grouped by ... day
count distinct (session)
count ... geography
... browser
avg(duration)
Data Definition
{
  time : TIME(HOUR; MIN; SEC),
  cust_id : LONG,
  session_id : LONG,
  geography : STRING,
  browser : STRING,
  load_time : LONG
}

Query Patterns
{
  select: "COUNT",
  patterns: [
    { where : "?time", group : "?time" },
    { where : "", group : "geography" },
    { where : "", group : "browser" }
  ]
},
{
  select: ["COUNT_DISTINCT(session_id)", "AVG(load_time)"],
  where: "time",
  group: ""
}
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3221 :00→22 :01→19 :02→104 ...
... ...
UK all→228 user01→1 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1904 ...
∅ all→87314 UK→238 US→354 ...
{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02
}
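One way to picture what happens when the event above arrives is sketched below in Python. This is only an illustration of incremental aggregation, not Acunu's implementation: the `record` function, the `new_counters` helper, the nested-dictionary counter store and the row/column naming are all assumptions chosen to mirror the rows shown on the slide.

```python
from collections import defaultdict

ROOT = "\u2205"  # the slide's "∅" row: aggregates over all events

def new_counters():
    # row key -> column key -> count
    # (a hypothetical in-memory stand-in for the real counter store)
    return defaultdict(lambda: defaultdict(int))

def record(counters, event):
    """Increment every pre-aggregated counter this event contributes to."""
    hour = event["time"][:2] + ":00"   # "22:02" -> "22:00"
    minute = event["time"][2:]         # "22:02" -> ":02"
    geo = event["geography"]
    for row, column in [
        (hour, "all"), (hour, minute),          # hourly total, per-minute breakdown
        (geo, "all"), (geo, event["cust_id"]),  # per-geography total, per-user breakdown
        (f"{geo}, {hour}", "all"),              # geography + hour
        (ROOT, "all"), (ROOT, geo),             # grand total, per-geography breakdown
    ]:
        counters[row][column] += 1

counters = new_counters()
counters["22:00"]["all"] = 3221   # seed one counter with the slide's value
record(counters, {"cust_id": "user01", "session_id": 102,
                  "geography": "UK", "browser": "IE", "time": "22:02"})
print(counters["22:00"]["all"])   # 3222
```

In this sketch each event costs a fixed number of counter increments, so the write cost does not depend on how much history has already been aggregated.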
21:00 all→1345 :00→45 :01→62 :02→87 ...
22:00 all→3222 :00→22 :01→19 :02→105 ...
... ...
UK all→229 user01→2 user14→12 user99→7 ...
US all→354 user01→4 user04→8 user56→17 ...
...
UK, 22:00 all→1905 ...
∅ all→87315 UK→239 US→354 ...
where time 21:00-22:00, count(*)
where time 22:00-23:00, group by minute
where geography=UK, group all by user
count all
group all by geo
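Because the aggregates are already stored, each of the queries above reduces to reading one counter row. The Python sketch below illustrates this with the values from the slide; the `count` and `group` helpers and the dictionary layout are hypothetical, and the entries elided as "..." on the slide are simply left out, so the breakdowns here do not sum to the "all" totals.

```python
ROOT = "\u2205"  # the slide's "∅" row: aggregates over all events

# Counter rows copied from the slide (elided "..." entries omitted).
counters = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1905},
    ROOT: {"all": 87315, "UK": 239, "US": 354},
}

def count(row=ROOT):
    """Total for one row: a single counter read."""
    return counters[row]["all"]

def group(row=ROOT):
    """Breakdown for one row: every column except the total."""
    return {col: n for col, n in counters[row].items() if col != "all"}

print(count("21:00"))   # where time 21:00-22:00, count(*)
print(group("22:00"))   # where time 22:00-23:00, group by minute
print(group("UK"))      # where geography=UK, group all by user
print(count())          # count all
print(group())          # group all by geo
```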
Approximate Analytics
Exact
Large Scale
Real-time
Count Distinct
Plan A: keep a list of all the things you've seen; count them at query time
Quick to update ... but at scale ...
• Takes lots of space
• Takes a long time to query
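A minimal Python sketch of Plan A, assuming a plain in-memory list (the class name `ExactDistinct` is hypothetical), makes the trade-off concrete: updates are a cheap append, but the stored list grows with every event and the query has to deduplicate all of it.

```python
class ExactDistinct:
    """Plan A: remember everything, deduplicate when asked."""

    def __init__(self):
        self.seen = []   # every item ever observed, duplicates included

    def add(self, item):
        self.seen.append(item)   # quick to update: one append per event

    def count(self):
        # Deduplicate at query time: work and memory grow with the
        # number of events stored, which is what hurts at scale.
        return len(set(self.seen))

sessions = ExactDistinct()
for session_id in [102, 103, 102, 104, 102]:
    sessions.add(session_id)
print(sessions.count())   # 3
```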
Approximate Distinct
item  hash            leading zeroes  max so far
x     00101001110...  2               2
y     11010100111...  0               2
z     00011101011...  3               3
...

max # leading zeroes seen so far
... to see a max of M takes about 2^M items
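The idea above can be sketched in a few lines of Python. This is a rough illustration only: the function names, the 32-bit hash width and the use of SHA-1 as the hash are assumptions, and a single maximum gives no more than an order-of-magnitude estimate.

```python
import hashlib

def hash_bits(item):
    """Hypothetical 32-bit hash: the first 4 bytes of SHA-1."""
    digest = hashlib.sha1(str(item).encode()).digest()
    return int.from_bytes(digest[:4], "big")

def leading_zeroes(value, width=32):
    """Number of leading zero bits in a width-bit value."""
    return width - value.bit_length()

def rough_distinct(items):
    """Track only the max number of leading zeroes seen so far.

    Seeing a max of M suggests roughly 2**M distinct items.
    Duplicates hash to the same bits, so they never change the max.
    """
    max_zeroes = 0
    for item in items:
        max_zeroes = max(max_zeroes, leading_zeroes(hash_bits(item)))
    return 2 ** max_zeroes

# The three hashes from the slide, read as 11-bit values:
print(leading_zeroes(0b00101001110, width=11))   # 2
print(leading_zeroes(0b11010100111, width=11))   # 0
print(leading_zeroes(0b00011101011, width=11))   # 3
```

The state kept per counter is a single small integer, regardless of how many items have been seen.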
Approximate Distinct
to reduce variance, average over m = 2^k sub-streams

item  hash            index, zeroes  max so far
x     00101001110...  0, 0           0,0,0,0
y     11010100111...  3, 1           0,0,0,1
z     00011101011...  0, 1           1,0,0,1
...
take the harmonic mean
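The sub-stream scheme above closely resembles the HyperLogLog algorithm, and the Python sketch below uses the standard HyperLogLog estimator on that assumption; it may differ in detail from what Acunu implemented. The names `ApproxDistinct` and `index_and_zeroes`, the default `k=10` and the SHA-1 based 64-bit hash are illustrative choices.

```python
import hashlib
import math

def index_and_zeroes(h, width, k):
    """Split a width-bit hash: the first k bits pick the sub-stream,
    the leading zeroes of the remaining bits are the observation."""
    rest_width = width - k
    index = h >> rest_width
    rest = h & ((1 << rest_width) - 1)
    return index, rest_width - rest.bit_length()

class ApproxDistinct:
    def __init__(self, k=10):
        self.k = k
        self.m = 1 << k                  # m = 2^k sub-streams
        self.registers = [0] * self.m    # max so far, per sub-stream

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash
        index, zeroes = index_and_zeroes(h, 64, self.k)
        # Standard HyperLogLog stores zeroes + 1, so 0 means "never touched".
        self.registers[index] = max(self.registers[index], zeroes + 1)

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)   # bias constant, valid for m >= 128
        # Harmonic-mean style combination of the per-register estimates.
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if raw <= 2.5 * m and empty:
            return m * math.log(m / empty)   # small-range correction
        return raw

sketch = ApproxDistinct()
for i in range(20000):
    sketch.add(f"session-{i}")
    sketch.add(f"session-{i}")   # duplicates do not change the registers
print(round(sketch.estimate()))
```

With m = 1024 registers the expected relative error of this estimator is about 1.04/sqrt(m), roughly 3%, while the state stays at 1024 small integers however many items are added.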
What's it good for?
Was it worth it?
What's Coming?
• Ad Hoc: same queries, but without the need to pre-define them
• Geolocation: support for location-based events and queries
• Drill down: see the events that make up any given aggregate
Manufacturing
Systems Monitoring
Financial Services
Social Media Ad Analytics
Oil + Gas
“Up and running in about 4 hours”
“We found out a competitor was scraping our data”
“We keep discovering use cases we hadn’t thought of”
www.acunu.com @acunu
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.