Real Time Data Analytics
Pre-aggregation with counters
© Copyright 2010 10gen Inc.
Goals
• Dashboard-style reports
• (Known) reports
• Real-time numbers
Framework
• Know your metrics/counters
• Prepared reports
• Calculate during write
• Fast queries
• Always up-to-date data
• Record time-series collections
Rationale
• Documents are updated in-place*
• $inc update operator
• Working set is small
• Aggregations are much smaller*
Dashboard
[Figure: dashboard mock-up. Panels: Projects, Lines, Events. Series: JavaScript, Java, Ruby, Python. Axis labels: Monday, Tuesday, Thursday, Friday.]
Demo Dashboard
Roads not traveled
• Map/Reduce
  • Reprocess raw data
  • Now possible to do partial reduce
• Aggregation Framework (aggregate in 2.2)
  • Also reprocess data on operation (initial release)
  • Optimizations to come
• More costly during reads
Not Appropriate For
• Ad-hoc aggregations (unknown metrics)
• One-off reports
• Possibly complex calculations
Processing
• Event received
• Split into many updates w/$inc
• Aggregate
• Input Field(s)
• Time periods (hourly, daily, monthly)
• Defined Metrics
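The steps above can be sketched as a fan-out: one incoming event becomes one counter target per (time period, metric) pair. This is a minimal sketch that touches no database; `fanOut`, `PERIODS`, and `METRICS` are hypothetical names chosen for illustration, and only three of the metrics are wired up.

```javascript
// Hypothetical fan-out: one event -> one counter target per (period, metric).
// Period and collection names mirror the slides; everything else is assumed.
const PERIODS = ["hourly", "daily", "monthly"];

// Map each stats collection suffix to a function that reads the input field.
const METRICS = {
  actors: (e) => e.actor,
  langs: (e) => e.repository && e.repository.language,
  types: (e) => e.type,
};

function fanOut(event) {
  const targets = [];
  for (const period of PERIODS) {
    for (const [name, read] of Object.entries(METRICS)) {
      const value = read(event);
      if (value === undefined || value === null) continue; // skip missing fields
      targets.push({ collection: `stats_${period}.${name}`, value });
    }
  }
  return targets;
}

const event = {
  actor: "juliano",
  type: "CommitCommentEvent",
  repository: { language: "Java" },
};
console.log(fanOut(event).length); // 3 periods x 3 metrics = 9 targets
```

Each target would then receive its own `$inc` update, so the write cost grows with the number of metrics and periods rather than with the number of stored events.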
Example Data: github
> db.events.findOne()
{
  "repository" : {
    "url" : "https://github.com/vidageek/games",
    ...
    "open_issues" : 25,
    "watchers" : 6,
    "pushed_at" : "2012/03/10 08:34:00 -0800",
    "language" : "Java"
  },
  "actor_attributes" : {...},
  "created_at" : "2012/03/11 15:20:24 -0700",
  "public" : true,
  "actor" : "juliano",
  "payload" : {...},
  "url" : "https://github.com/...",
  "type" : "CommitCommentEvent"
}
Define Metrics
• "actor"
• "repository.name"
• "repository.language"
• "type"
  PushEvent, IssuesEvent, WatchEvent, GistEvent
• "payload.ref"
  refs/heads/improved_history, refs/heads/master, refs/heads/signs
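The metric names above are dotted paths into the event document. A minimal sketch of reading them from a nested event follows; `getPath` and `METRIC_PATHS` are hypothetical helpers, and the event is a trimmed stand-in for the sample shown earlier (its `payload` is left empty because the original elides it).

```javascript
// Read a dotted path such as "repository.language" from a nested event.
// Returns undefined if any segment along the way is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
}

// Trimmed stand-in for the sample event; payload contents are not known here.
const event = {
  repository: { language: "Java" },
  actor: "juliano",
  type: "CommitCommentEvent",
  payload: {},
};

const METRIC_PATHS = ["actor", "repository.language", "type", "payload.ref"];
for (const path of METRIC_PATHS) {
  console.log(path, "=>", getPath(event, path));
}
```

A metric whose path resolves to `undefined` (here `payload.ref`) would simply produce no counter update for that event.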
Aggregations
TimePeriod, type #
TimePeriod, author #
TimePeriod, project #
Stats Collections
stats_[hourly/daily/monthly].actors
stats_[hourly/daily/monthly].projects
stats_[hourly/daily/monthly].langs
stats_[hourly/daily/monthly].types
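Each of these collections needs a time key per document. The sketch below assumes that the key `p` is the start of the enclosing period in UTC and that the slot is the position inside it (hour of day, day of month, month of year). The hourly case matches the `stats_hourly` sample in this deck; the daily and monthly layouts are assumptions, and `bucket` is a hypothetical helper.

```javascript
// Assumed bucketing: "p" is the start of the enclosing period (UTC) and
// "slot" is the position inside it. Only the hourly layout is confirmed
// by the sample document; daily and monthly are guesses for illustration.
function bucket(date, period) {
  const y = date.getUTCFullYear();
  const m = date.getUTCMonth();
  const d = date.getUTCDate();
  switch (period) {
    case "hourly": // one document per day, one slot per hour
      return { p: new Date(Date.UTC(y, m, d)), slot: date.getUTCHours() };
    case "daily": // one document per month, one slot per day
      return { p: new Date(Date.UTC(y, m, 1)), slot: d };
    case "monthly": // one document per year, one slot per month (1-12)
      return { p: new Date(Date.UTC(y, 0, 1)), slot: m + 1 };
    default:
      throw new Error(`unknown period: ${period}`);
  }
}

const when = new Date("2012-05-21T07:15:00Z");
console.log(bucket(when, "hourly").p.toISOString()); // 2012-05-21T00:00:00.000Z
console.log(bucket(when, "hourly").slot);            // 7
```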
Stats
> db.stats_hourly.types.find({"_id.type":"GistEvent"})
{
  "_id" : {
    "p" : ISODate("2012-05-21T00:00:00Z"),
    "type" : "GistEvent"
  },
  "hour" : {
    "2" : { "count" : 65 },
    "3" : { "count" : 2 },
    "7" : { "count" : 130 },
    "8" : { "count" : 5 }
  },
  "total" : { "count" : 202 }
}
Updates Increment
Query:
{ "p" : Date(…), "actor" : "neoplastic" }
Update:
{ "$inc" : { "h.21.c" : 1 , "t.c" : 1}}
Upsert : true
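The query and update above can be built from an actor name and a timestamp. The sketch below does that and then replays the `$inc` against a plain object to show how the counters accumulate. `buildUpsert` and `applyInc` are hypothetical helpers; `applyInc` is an in-memory stand-in for illustration, not MongoDB code, and using the UTC hour as the slot is an assumption.

```javascript
// Build the upsert pieces from the slide for one actor event.
// Short keys follow the slide: "h" = hours, "t" = total, "c" = count.
function buildUpsert(actor, date) {
  const day = new Date(Date.UTC(
    date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const hour = date.getUTCHours(); // assumption: slots are UTC hours
  return {
    query: { p: day, actor },
    update: { $inc: { [`h.${hour}.c`]: 1, "t.c": 1 } },
    options: { upsert: true },
  };
}

// Tiny in-memory stand-in for $inc on dotted paths (illustration only).
function applyInc(doc, inc) {
  for (const [path, amount] of Object.entries(inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      node = node[key] = node[key] || {}; // create missing sub-documents
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const op = buildUpsert("neoplastic", new Date("2012-05-21T21:05:00Z"));
const doc = {};
applyInc(doc, op.update.$inc); // first event creates the counters
applyInc(doc, op.update.$inc); // second event increments them in place
console.log(JSON.stringify(op.update)); // {"$inc":{"h.21.c":1,"t.c":1}}
console.log(JSON.stringify(doc));       // {"h":{"21":{"c":2}},"t":{"c":2}}
```

With a driver, the three pieces would typically be passed to an update call with the upsert option set; in the Node.js driver that is `collection.updateOne(query, update, { upsert: true })`.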
Query/Graphing
• Select by grouping (by date, by type/value)
• Documents hold many data points
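Because one document holds many data points, a single fetched document is enough to draw a graph. A minimal sketch, using the `stats_hourly.types` sample document from this deck, expands the sparse `hour` map into a 24-point series; `hourlySeries` is a hypothetical helper.

```javascript
// Stats document copied from the "Stats" slide.
const stats = {
  _id: { p: new Date("2012-05-21T00:00:00Z"), type: "GistEvent" },
  hour: { 2: { count: 65 }, 3: { count: 2 }, 7: { count: 130 }, 8: { count: 5 } },
  total: { count: 202 },
};

// Expand the sparse "hour" map into 24 points, filling gaps with zero.
function hourlySeries(doc) {
  return Array.from({ length: 24 }, (_, h) =>
    doc.hour && doc.hour[h] ? doc.hour[h].count : 0
  );
}

const series = hourlySeries(stats);
const sum = series.reduce((a, b) => a + b, 0);
console.log(JSON.stringify(series.slice(0, 9))); // [0,0,65,2,0,0,0,130,5]
console.log(sum === stats.total.count);          // true
```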
The Whys
• Writing more data up front helps with reads
• Multiple data points per document
• Documents hold many timed points
• Good for graphs by time or by type
• Nested for improved performance
Thanks for coming… Questions?
drivers at mongodb.org
mongodb.org Supported: C, C#, C++, Erlang, Haskell, Java, Javascript, Perl, PHP, Python, Ruby
Community Supported: REST, ActionScript3, C# and .NET, Clojure, ColdFusion, Delphi, Erlang, F#, Go: gomongo, Groovy, Haskell, Javascript, Lua, node.js, Objective C, PHP, PowerShell: Blog post, Python, Ruby, Scala, Scheme (PLT), Smalltalk: Dolphin Smalltalk
@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedIn http://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by