Realtime Analytics with MongoDB Counters (mongonyc 2012)

Post on 22-Apr-2015


description

Real-time analytics using pre-aggregation with counters.

transcript

Real Time Data Analytics

Pre-aggregation with counters

© Copyright 2010 10gen Inc.

Goals

• Dashboard style reports
• (Known) Reports
• Real-time numbers

Framework

• Know your metrics/counters
• Prepared reports
• Calculate during write
• Fast queries
• Always up-to-date data
• Record time-series collections

Rationale

• Documents are updated in-place*
• $inc update operator
• Working set is small
• Aggregations are much smaller*

Dashboard

[Figure: dashboard mock-up showing counters by language (JavaScript, Java, Ruby, Python), totals for Projects, Lines, and Events, and values by day (Monday, Tuesday, Thursday, Friday)]

Demo Dashboard

Roads not traveled

• Map/Reduce
• Reprocess raw data
• Now possible to do partial reduce

• Aggregation Framework (aggregate in 2.2)
• Also reprocess data on operation (initial release)
• Optimizations to come

• More costly during reads

Not Appropriate For

• Ad-hoc aggregations (unknown metrics)
• One-off reports
• Possibly complex calculations

Processing

• Event received
• Split into many updates w/$inc
• Aggregate

• Input Field(s)
• Time periods (hourly, daily, monthly)
• Defined Metrics
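The time-period step can be sketched as follows. This is a minimal illustration, not the speaker's code: `parseCreatedAt` and `timeBuckets` are hypothetical helpers, and the timestamp format is taken from the `created_at` field of the GitHub sample event in this deck. The hourly layout (one document per day, with a slot per hour) follows the Stats slide; the daily (one document per month) and monthly (one document per year) layouts are assumptions, since the slides do not show them.

```javascript
// Parse timestamps such as "2012/03/11 15:20:24 -0700" into a Date (UTC instant).
function parseCreatedAt(s) {
  const m = /^(\d{4})\/(\d{2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(s);
  if (!m) throw new Error("unrecognized timestamp: " + s);
  const [y, mo, d, h, mi, sec] = m.slice(1, 7).map(Number);
  const sign = m[7] === "-" ? -1 : 1;
  const offsetMinutes = sign * (Number(m[8]) * 60 + Number(m[9]));
  // Wall-clock time minus the zone offset gives the UTC instant.
  return new Date(Date.UTC(y, mo - 1, d, h, mi, sec) - offsetMinutes * 60000);
}

// For each time period, return the document's period start ("p")
// and the nested slot that this event should increment.
function timeBuckets(date) {
  const y = date.getUTCFullYear();
  const mo = date.getUTCMonth();
  const d = date.getUTCDate();
  return {
    hourly:  { p: new Date(Date.UTC(y, mo, d)), slot: date.getUTCHours() }, // per the Stats slide
    daily:   { p: new Date(Date.UTC(y, mo, 1)), slot: d },                  // assumed layout
    monthly: { p: new Date(Date.UTC(y, 0, 1)),  slot: mo + 1 },             // assumed layout
  };
}
```

Bucketing in UTC keeps the slot numbers stable regardless of the time zone an event was recorded in; a dashboard that needs local-time hours would have to shift them at read time.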

Example Data: github

> db.events.findOne()
{
    "repository" : {
        "url" : "https://github.com/vidageek/games",
        ...
        "open_issues" : 25,
        "watchers" : 6,
        "pushed_at" : "2012/03/10 08:34:00 -0800",
        "language" : "Java"
    },
    "actor_attributes" : {...},
    "created_at" : "2012/03/11 15:20:24 -0700",
    "public" : true,
    "actor" : "juliano",
    "payload" : {...},
    "url" : "https://github.com/...",
    "type" : "CommitCommentEvent"
}

Define Metrics

• "actor"
• "repository.name"
• "repository.language"
• "type": PushEvent, IssuesEvent, WatchEvent, GistEvent
• "payload.ref": refs/heads/improved_history, refs/heads/master, refs/heads/signs
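The metric names above are dotted paths into the event document. A minimal sketch of how they might be resolved against an event follows; `getPath` and `metricValues` are hypothetical helper names, and the sample event is a trimmed copy of the one shown earlier (fields elided on the slide are left out, so their metrics are skipped).

```javascript
// Metric paths as listed on the slide.
const METRICS = ["actor", "repository.name", "repository.language", "type", "payload.ref"];

// Walk a dotted path such as "repository.language" through nested objects.
// Returns undefined when any step along the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value != null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

// Collect the value of every defined metric that is present on the event.
function metricValues(event) {
  const out = {};
  for (const path of METRICS) {
    const v = getPath(event, path);
    if (v !== undefined) out[path] = v;
  }
  return out;
}

// Trimmed sample event; only fields visible on the slide are included.
const sampleEvent = {
  repository: { language: "Java" },
  actor: "juliano",
  type: "CommitCommentEvent",
  payload: {},
};

console.log(metricValues(sampleEvent));
```

Each value returned here corresponds to one counter document to increment, which is why a single event fans out into several updates.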

Aggregations

TimePeriod, type #

TimePeriod, author #

TimePeriod, project #

Stats Collections

stats_[hourly/daily/monthly].actors

stats_[hourly/daily/monthly].projects

stats_[hourly/daily/monthly].langs

stats_[hourly/daily/monthly].types

Stats

> db.stats_hourly.types.find({"_id.type":"GistEvent"})
{
    "_id" : {
        "p" : ISODate("2012-05-21T00:00:00Z"),
        "type" : "GistEvent"
    },
    "hour" : {
        "2" : { "count" : 65 },
        "3" : { "count" : 2 },
        "7" : { "count" : 130 },
        "8" : { "count" : 5 }
    },
    "total" : { "count" : 202 }
}

Updates Increment

Query:

{ "_id" : { "p" : Date(…), "actor" : "neoplastic" } }

Update:

{ "$inc" : { "h.21.c" : 1 , "t.c" : 1}}

Upsert : true
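A minimal sketch of how the query and update documents above might be built for one event. `buildHourlyUpsert` and `applyInc` are hypothetical names. The sketch assumes the period and metric value live under `_id`, as in the Stats example, and it uses the short field names from this slide (`h`, `c`, `t`) rather than the longer `hour`, `count`, `total` shown on the Stats slide. `applyInc` is only an in-memory stand-in for what the server does with `$inc` on an upsert, so the effect can be seen without a database.

```javascript
// Build the query/update pair for one metric value in an hourly stats collection.
function buildHourlyUpsert(metricField, value, when) {
  // One document per day (UTC) and metric value; hours are nested inside it.
  const day = new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
  const hour = when.getUTCHours();
  return {
    query: { _id: { p: day, [metricField]: value } },
    update: { $inc: { ["h." + hour + ".c"]: 1, "t.c": 1 } },
    options: { upsert: true },
  };
}

// In-memory illustration of $inc semantics: create missing nested
// fields along each dotted path, then add the amount to the leaf.
function applyInc(doc, inc) {
  for (const [path, amount] of Object.entries(inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (node[key] === undefined) node[key] = {};
      node = node[key];
    }
    const leaf = keys[keys.length - 1];
    node[leaf] = (node[leaf] || 0) + amount;
  }
  return doc;
}

// Two events from the same actor in the same hour bump the same counters.
const op = buildHourlyUpsert("actor", "neoplastic", new Date(Date.UTC(2012, 4, 21, 21, 5)));
const counters = {};
applyInc(counters, op.update.$inc);
applyInc(counters, op.update.$inc);
console.log(JSON.stringify(counters));
```

Against a real server, the same pair could be sent from the mongo shell as `db.stats_hourly.actors.update(op.query, op.update, { upsert: true })`. Because the hour counter and the running total are incremented in one update, they stay consistent with each other.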

Query/Graphing

• Select by grouping (by date, by type/value)
• Documents hold many data points
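To illustrate how one document holds many data points, here is a minimal sketch that expands the hourly stats document from the Stats slide into a 24-point series for graphing. `hourlySeries` is a hypothetical helper; the field names (`hour`, `count`, `total`) are the ones shown in that example document.

```javascript
// The GistEvent document from the Stats slide.
const gistDoc = {
  _id: { p: new Date("2012-05-21T00:00:00Z"), type: "GistEvent" },
  hour: { "2": { count: 65 }, "3": { count: 2 }, "7": { count: 130 }, "8": { count: 5 } },
  total: { count: 202 },
};

// Expand one day document into 24 { t, count } points.
// Hours with no recorded events are reported as zero.
function hourlySeries(statsDoc) {
  const series = [];
  for (let h = 0; h < 24; h++) {
    const slot = statsDoc.hour[String(h)];
    series.push({
      t: new Date(statsDoc._id.p.getTime() + h * 3600 * 1000),
      count: slot ? slot.count : 0,
    });
  }
  return series;
}

const points = hourlySeries(gistDoc);
const sum = points.reduce((acc, point) => acc + point.count, 0);
console.log(points.length, sum);
```

A full day of one metric is read with a single document fetch, and the hourly counts add up to the stored total, so the dashboard can show either without scanning raw events.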

The Whys

• Writing more data up front helps with reads
• Multiple data points per document
• Documents hold many timed points
• Good for graphs by time or by type
• Nested for improved performance

Thanks for coming… any questions?


drivers at mongodb.org

mongodb.org Supported: C, C#, C++, Erlang, Haskell, Java, Javascript, Perl, PHP, Python, Ruby

Community Supported: REST, ActionScript3, C# and .NET, Clojure, ColdFusion, Delphi, Erlang, F#, Go: gomongo, Groovy, Haskell, Javascript, Lua, node.js, Objective C, PHP, PowerShell (blog post), Python, Ruby, Scala, Scheme (PLT), Smalltalk: Dolphin Smalltalk

Twitter: @mongodb

Facebook: http://bit.ly/mongofb

LinkedIn: http://linkd.in/joinmongo

conferences, appearances, and meetups: http://www.10gen.com/events

download at mongodb.org

support, training, and this talk brought to you by