
NoSQL and Big Data Analytics at NOSQL NOW! 2013

Date post: 15-Jan-2015
Upload: acunu
Description:
This presentation by Tim Moreton at NoSQL NOW! 2013 looks at the history of doing analytics in NoSQL databases. We look at the relative strengths of normalized and denormalized approaches, and at how Twitter and Facebook have built custom denormalized systems over NoSQL to support real-time analytics. We look at the lambda architecture, and show how Acunu Analytics provides OLAP cubes over NoSQL, combining denormalization with expressive SQL-like queries. You can see the full talk here: http://www.slideshare.net/Dataversity/nosql-and-big-data-analytics
Transcript
Page 1: NoSQL and Big Data Analytics at NOSQL NOW! 2013

NOSQL and Big Data Analytics

Tim Moreton, Founder and CTO

Page 2

In the beginning, NOSQL was about storage

Page 3

Google Personalized Search, 2006

Diagram: a BigTable table keyed by user_id, with searches, clicks and profiles columns

Collect user queries, clickstream (write only, high throughput)

Out-of-band batch analysis to produce user profiles (MapReduce via GFS)

Serve customised search results using user profiles (read only, low latency)
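To make the three access paths on this slide concrete, here is a minimal in-memory sketch. `UserTable` and `ProfileBatchJob` are hypothetical stand-ins built on plain Java maps, not BigTable or MapReduce. The assumption that a "profile" is a user's most frequent queries is purely illustrative; the slide does not say what the profiles contained.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a BigTable-style table keyed by user_id.
// Only the searches and profiles columns are modelled; clicks would work the same way.
class UserTable {
    final Map<String, List<String>> searches = new HashMap<>();
    final Map<String, List<String>> profiles = new HashMap<>();

    // Collection path: append-only writes (write only, high throughput).
    void recordSearch(String userId, String query) {
        searches.computeIfAbsent(userId, k -> new ArrayList<>()).add(query);
    }

    // Serving path: a single keyed read (read only, low latency).
    List<String> profile(String userId) {
        return profiles.getOrDefault(userId, Collections.emptyList());
    }
}

// Out-of-band batch job. ASSUMPTION: a profile is the user's topN most
// frequent queries, with ties broken alphabetically.
class ProfileBatchJob {
    static void run(UserTable table, int topN) {
        for (Map.Entry<String, List<String>> e : table.searches.entrySet()) {
            Map<String, Long> freq = e.getValue().stream()
                    .collect(Collectors.groupingBy(q -> q, Collectors.counting()));
            List<String> top = freq.entrySet().stream()
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                            .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                    .limit(topN)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
            table.profiles.put(e.getKey(), top);
        }
    }
}
```

The serving path never waits for the batch job. It reads whatever profile the last run produced, which is why this split suits workloads where profile freshness is not critical.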

Page 4

Discovery Analytics: Unstructured Warehouses, Data Mining, Machine Learning
Complex, long-running; total lack of structure

Operational Intelligence: Dashboards, Real-time Decisions, Alerting
Low latency, fresh data; some structure to exploit

When NOSQL, when Hadoop?

Page 5

Normalization and its limits

For each update: a few random writes

For each query: many random reads

Page 6

Denormalization

For each query: one sequential read

For each update: many writes, sequential IO
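The normalization and denormalization slides contrast where the work happens. The sketch below counts abstract reads and writes for a toy tweet store that answers "how many tweets has this user posted?". `NormalizedStore`, `DenormalizedStore` and the counter key names are hypothetical. With a single table, a normalized update here costs one write rather than "a few", and the model ignores the random versus sequential IO distinction the slides also make.

```java
import java.util.*;

// Hypothetical event: one tweet by a user on a given day.
class Tweet {
    final String user;
    final String day;

    Tweet(String user, String day) { this.user = user; this.day = day; }
}

// Normalized: store each event once and aggregate at query time.
class NormalizedStore {
    final List<Tweet> tweets = new ArrayList<>();
    long writes = 0, reads = 0;

    void insert(Tweet t) { tweets.add(t); writes++; }   // cheap update

    long countByUser(String user) {                     // every row is read per query
        long n = 0;
        for (Tweet t : tweets) {
            reads++;
            if (t.user.equals(user)) n++;
        }
        return n;
    }
}

// Denormalized: maintain the answers up front, so a query is one lookup.
class DenormalizedStore {
    final Map<String, Long> counters = new HashMap<>();
    long writes = 0, reads = 0;

    void insert(Tweet t) {                              // fan out to every aggregate
        for (String key : new String[] {"total", "user:" + t.user, "day:" + t.day}) {
            counters.merge(key, 1L, Long::sum);
            writes++;
        }
    }

    long countByUser(String user) {
        reads++;
        return counters.getOrDefault("user:" + user, 0L);
    }
}
```

After three tweets and one query, the normalized store has done 3 writes and 3 reads, while the denormalized store has done 9 writes and 1 read. The denormalized store can also only answer questions whose counters were planned in advance.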

Page 7

Building block: Distributed counters

Diagram: each new tweet adds +1 to several counters at once: Total tweets, By date (2013-08-12) and By user (@timmoreton); one counter shows a running total of 752

CASSANDRA (CQL)

UPDATE table SET col = col + 1 WHERE id = 2;

RIAK (HTTP API)

curl -i http://host:8098/buckets/x/counters/count2 -X POST -d "1"

HBASE (Java client)

table.incrementColumnValue(row, cf, col, 1);
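The snippets above use each database's built-in counter. To show why a counter can accept increments on different replicas without coordination, here is a sketch of a grow-only counter (G-Counter), a standard CRDT design. Each node increments only its own slot, and replicas merge by taking the per-node maximum. This illustrates the general technique only. It does not describe how Cassandra, HBase or Riak implement counters internally, and the `GCounter` class is hypothetical.

```java
import java.util.*;

// Grow-only counter (G-Counter): each node only ever increments its own slot.
class GCounter {
    private final String nodeId;
    private final Map<String, Long> slots = new HashMap<>();

    GCounter(String nodeId) { this.nodeId = nodeId; }

    // Local increment, no coordination with other nodes (the "+1" on the slide).
    void increment(long delta) {
        if (delta < 0) throw new IllegalArgumentException("a G-Counter only grows");
        slots.merge(nodeId, delta, Long::sum);
    }

    // Merge another replica's state by taking the per-node maximum. This is
    // commutative, associative and idempotent, so replicas converge whatever
    // the order or number of merges.
    void merge(GCounter other) {
        other.slots.forEach((node, n) -> slots.merge(node, n, Long::max));
    }

    // The counter's value is the sum over all nodes' slots.
    long value() {
        long total = 0;
        for (long n : slots.values()) total += n;
        return total;
    }
}
```

Supporting decrements needs a second G-Counter for the negative side, which gives a PN-Counter.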

Page 8

Twitter’s Rainbird

Source: Twitter

Page 9

Facebook’s Puma, ODS, Claspin

Source: Facebook

Page 10

"I believe firmly that ... you should "denormalize" only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements."

C. J. Date, 2005

Page 11

Page 12

Denormalization and agility

Page 13

‘Lambda Architecture’

http://www.josemalvarez.es/web/wp-content/uploads/2013/03/toy-lambda-arch.png
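The slide names the lambda architecture without spelling it out. In its usual formulation, a batch layer periodically recomputes views from an immutable master dataset, and a speed layer maintains incremental views over events the batch has not yet covered. Queries then merge the two. The following is a minimal single-threaded sketch for per-key counts, using a hypothetical `LambdaCounts` class.

```java
import java.util.*;

// Toy lambda architecture for per-key event counts (hypothetical class).
class LambdaCounts {
    // Master dataset: immutable, append-only log of raw events.
    private final List<String> masterLog = new ArrayList<>();
    // Batch view: recomputed from scratch over the whole log.
    private Map<String, Long> batchView = new HashMap<>();
    // Real-time view: incremental counts for events since the last batch run.
    private final Map<String, Long> realtimeView = new HashMap<>();

    void ingest(String key) {
        masterLog.add(key);                      // input to the batch layer
        realtimeView.merge(key, 1L, Long::sum);  // and to the speed layer
    }

    // Batch layer: full recomputation, then drop the real-time increments it now covers.
    void runBatch() {
        Map<String, Long> fresh = new HashMap<>();
        for (String k : masterLog) fresh.merge(k, 1L, Long::sum);
        batchView = fresh;
        realtimeView.clear();
    }

    // Serving layer: merge the batch and real-time views at query time.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }
}
```

Clearing the real-time view after a batch run is only safe here because nothing arrives during the run. A real deployment must track which events each batch covered, which is part of the added complexity the conclusions mention.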

Page 14

Acunu Analytics

Diagram: aggregate cubes (count by day, count by hour of day, uniques by hashtag) alongside raw events

1. Define aggregate cubes: CREATE CUBE APPROX TOP(hashtag) WHERE browser, time GROUP BY time

2. New events update cubes

3. Rich instant queries over cubes: SELECT TOP(x) FROM t WHERE .. GROUP BY d1, d2, ... JOIN ... HAVING .. ORDER BY ..

4. Drilldown to raw events

5. Backfill new cubes using historic data
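The five steps above can be sketched in memory as a raw event store plus a set of count cubes, each defined by a group-by function. This is not Acunu's implementation or AQL. `Event`, `Dimensions`, `CountCube` and `EventStoreWithCubes` are hypothetical, and UTC is assumed for the time dimensions. A "uniques by hashtag" cube would need distinct counting, possibly approximate (as the APPROX keyword in step 1 hints), which this sketch omits in favour of plain counts.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical event shape for the tweet-style example on the slide.
class Event {
    final Instant time;
    final String user;
    final String hashtag;

    Event(Instant time, String user, String hashtag) {
        this.time = time;
        this.user = user;
        this.hashtag = hashtag;
    }
}

// Example group-by dimensions resembling the cubes named on the slide (UTC assumed).
class Dimensions {
    static final Function<Event, String> BY_DAY =
            e -> e.time.atOffset(ZoneOffset.UTC).toLocalDate().toString();
    static final Function<Event, String> BY_HOUR_OF_DAY =
            e -> String.valueOf(e.time.atOffset(ZoneOffset.UTC).getHour());
    static final Function<Event, String> BY_HASHTAG = e -> e.hashtag;
}

// A count cube: one group-by dimension and one counter per cell.
class CountCube {
    final Function<Event, String> groupBy;
    final Map<String, Long> cells = new TreeMap<>();

    CountCube(Function<Event, String> groupBy) { this.groupBy = groupBy; }

    void update(Event e) { cells.merge(groupBy.apply(e), 1L, Long::sum); }
}

class EventStoreWithCubes {
    final List<Event> raw = new ArrayList<>();                // raw event store
    final Map<String, CountCube> cubes = new LinkedHashMap<>();

    // Steps 1 and 5: define a cube, then backfill it from historic raw events.
    void defineCube(String name, Function<Event, String> groupBy) {
        CountCube cube = new CountCube(groupBy);
        for (Event e : raw) cube.update(e);
        cubes.put(name, cube);
    }

    // Step 2: each new event is stored raw and updates every cube.
    void insert(Event e) {
        raw.add(e);
        for (CountCube cube : cubes.values()) cube.update(e);
    }

    // Step 3: a query is a lookup of a pre-aggregated cell.
    long query(String cubeName, String cell) {
        return cubes.get(cubeName).cells.getOrDefault(cell, 0L);
    }

    // Step 4: drill down from a cell to its raw events (a full scan here;
    // a real store would index raw events by time or key).
    List<Event> drilldown(String cubeName, String cell) {
        Function<Event, String> groupBy = cubes.get(cubeName).groupBy;
        return raw.stream()
                .filter(e -> groupBy.apply(e).equals(cell))
                .collect(Collectors.toList());
    }
}
```

Keeping raw events alongside the cubes is what makes steps 4 and 5 possible. Without them, a cube added later could only count events from the moment it was defined.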

Page 15

Diagram: event stream, Ingest, Processing, event store, roll-up cubes, API, dashboard queries, programmatic interface

Cassandra stores raw events and aggregates

Acunu Analytics manages cubes and maps inserts and SQL-like queries to Cassandra reads and writes


PROCESSING AT INGEST

JSON, CSV, log ingest via RESTful HTTP API, Flume, Storm, AMQP

Acunu Dashboards provides rich, real-time, embeddable visualizations

SELECT AVG(r) FROM metrics GROUP BY host;
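A query like the one above can be served from a roll-up rather than from raw events. Averages cannot be added together, so a common approach is to store a running sum and count per group and divide at read time. The following is a minimal sketch of that idea. It is an assumption about how such a cube could work, not Acunu's storage format, and `AvgByHostRollup` is hypothetical.

```java
import java.util.*;

// Roll-up for SELECT AVG(r) FROM metrics GROUP BY host: keep (sum, count) per host.
class AvgByHostRollup {
    private static final class Cell {
        double sum;
        long count;
    }

    private final Map<String, Cell> cells = new TreeMap<>();

    // Each incoming metric updates its host's cell at ingest time.
    void insert(String host, double r) {
        Cell c = cells.computeIfAbsent(host, h -> new Cell());
        c.sum += r;
        c.count++;
    }

    // Answer the query from the roll-up alone: one division per host.
    Map<String, Double> avgByHost() {
        Map<String, Double> out = new TreeMap<>();
        cells.forEach((host, c) -> out.put(host, c.sum / c.count));
        return out;
    }

    // Sums and counts are additive, so partial roll-ups (for example, two time
    // buckets) merge exactly. Averages on their own could not be merged this way.
    void mergeFrom(AvgByHostRollup other) {
        other.cells.forEach((host, oc) -> {
            Cell c = cells.computeIfAbsent(host, h -> new Cell());
            c.sum += oc.sum;
            c.count += oc.count;
        });
    }
}
```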

AQL, alerting, cubes

MILLISECOND QUERIES

API for rich queries, threshold alerting
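Threshold alerting can be layered on the same aggregates by checking each new value against a limit. The sketch below assumes edge-triggered behaviour: it notifies only when the value crosses the threshold in either direction, so a metric that stays high does not fire repeatedly. The slide does not say how Acunu's alerting behaves. `ThresholdAlert` and its message format are hypothetical.

```java
import java.util.*;

// Hypothetical threshold alert evaluated against an aggregate after each update.
class ThresholdAlert {
    private final String name;
    private final double threshold;
    private boolean firing = false;
    final List<String> notifications = new ArrayList<>();

    ThresholdAlert(String name, double threshold) {
        this.name = name;
        this.threshold = threshold;
    }

    // Called with the latest aggregate value (for example, AVG(r) for one host).
    // Notifies only on state transitions, not on every evaluation.
    void evaluate(double value) {
        boolean above = value > threshold;
        if (above && !firing) notifications.add(name + " FIRING: " + value + " > " + threshold);
        if (!above && firing) notifications.add(name + " RESOLVED: " + value);
        firing = above;
    }
}
```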

Acunu Analytics

Page 16

Conclusions

NoSQL is a great fit for collecting or serving datasets with some structure at high scale, performance, availability

Real-time Big Data apps can’t use unplanned rich queries

Use atomic counters to pre-materialize quantitative results in real-time, but think carefully about flexibility

Do analytics out-of-band if timeliness is unimportant

A lambda architecture combines real-time with richer processing, but adds complexity

Acunu Analytics offers real-time OLAP-style queries

Page 17