GitHub
John Nunemaker
MongoChicago 2012
November 12, 2012
MongoDB for Analytics
A loving conversation with @jnunemaker
Background
How hernias can be good for you
1 month
Of evenings and weekends

18 months
Since public launch

10-15 Million
Page views per day

2.7 Billion
Page views to date

13 tiny servers
2 web, 6 app, 3 db, 2 queue
[Performance charts: requests/sec, ops/sec, cpu %, lock %]
Implementation
How we do what we do
Doing It (mostly) Live
No aggregate querying
get('/track.gif') do
  track_service.record(...)
  TrackGif
end
class TrackService
  def record(attrs)
    message = MessagePack.pack(attrs)
    @client.set(@queue, message)
  end
end
class TrackProcessor
  def run
    loop { process }
  end

  def process
    record @client.get(@queue)
  end

  def record(message)
    attrs = MessagePack.unpack(message)
    Hit.record(attrs)
  end
end
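The producer/consumer flow above can be sketched end to end. This is a minimal, dependency-free illustration: an in-memory `FakeQueueClient` stands in for the real queue (Kestrel, per the bit.ly link below), and the stdlib's JSON stands in for MessagePack purely so the sketch runs anywhere; the class and queue names are assumptions.

```ruby
require 'json'

# Stand-in for the queue client: push on set, pop on get.
class FakeQueueClient
  def initialize; @items = []; end
  def set(_queue, message); @items.push(message); end
  def get(_queue); @items.shift; end
end

# Web side: serialize the hit attributes and enqueue them,
# so the request itself does no database work.
class TrackService
  def initialize(client)
    @client = client
    @queue  = 'hits'
  end

  def record(attrs)
    @client.set(@queue, JSON.generate(attrs))
  end
end

client  = FakeQueueClient.new
service = TrackService.new(client)
service.record('sid' => 'abc', 'path' => '/')

# Processor side: dequeue and deserialize.
message = client.get('hits')
attrs   = JSON.parse(message)
# attrs == {'sid' => 'abc', 'path' => '/'}
```

The point of the split is that the tracking endpoint only serializes and enqueues; all MongoDB writes happen in the processor loop.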
http://bit.ly/rt-kestrel
class Hit
  def record
    site.atomic_update(site_updates)

    Resolution.record(self)
    Technology.record(self)
    Location.record(self)
    Referrer.record(self)
    Content.record(self)
    Search.record(self)
    Notification.record(self)
    View.record(self)
  end
end
class Resolution
  def record(hit)
    query  = {'_id' => "..."}
    update = {'$inc' => {}}
    update['$inc']["sx.#{hit.screenx}"] = 1
    update['$inc']["bx.#{hit.browserx}"] = 1
    update['$inc']["by.#{hit.browsery}"] = 1

    collection(hit.created_on).update(query, update, :upsert => true)
  end
end
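The upsert-with-`$inc` pattern above is easiest to see with the driver out of the picture. In this sketch a plain Hash simulates the collection and `apply_inc` simulates an upserting `$inc` update; the field names (`sx`, `bx`, `by`) match the `Resolution` example, while the `_id` value and helper name are made up for illustration.

```ruby
# Simulate MongoDB's upsert + $inc: create the document if the
# query _id is missing, then increment each named counter field.
def apply_inc(collection, query, update)
  doc = collection[query['_id']] ||= {}
  update['$inc'].each { |field, n| doc[field] = (doc[field] || 0) + n }
  doc
end

collection = {}
hit = { screenx: 1280, browserx: 1280, browsery: 768 }

update = { '$inc' => {} }
update['$inc']["sx.#{hit[:screenx]}"]  = 1
update['$inc']["bx.#{hit[:browserx]}"] = 1
update['$inc']["by.#{hit[:browsery]}"] = 1

# Two identical hits: the first upserts the document,
# the second just bumps the counters.
apply_inc(collection, {'_id' => 'site:2012-11-12'}, update)
apply_inc(collection, {'_id' => 'site:2012-11-12'}, update)

collection['site:2012-11-12']['sx.1280']  # => 2
```

Because the whole write is a single atomic update keyed on `_id`, the processor never has to read before it writes.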
Pros
Space
RAM
Reads
Live
Cons
Writes
Constraints
More Forethought
No raw data
http://bit.ly/rt-counters
http://bit.ly/rt-counters2
Time Frame
Minute, hour, day, month, year, forever?
# of Variations
One document vs many
Single Document
Per Time Frame
{ "t" => 336381, "u" => 158951, "2011" => { "02" => { "18" => { "t" => 9, "u" => 6 } } }}
{
  '$inc' => {
    't' => 1,
    'u' => 1,
    '2011.02.18.t' => 1,
    '2011.02.18.u' => 1,
  }
}
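The dotted keys in the update above can be derived from a timestamp. A small sketch, assuming the `t`/`u` fields mean total and unique as in the document shown; the `counter_update` helper name is invented for illustration.

```ruby
# Build a $inc update that bumps both the all-time counters and the
# nested year.month.day counters in one atomic write.
def counter_update(time, unique:)
  prefix = time.strftime('%Y.%m.%d')   # e.g. "2011.02.18"
  inc = { 't' => 1, "#{prefix}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{prefix}.u"] = 1
  end
  { '$inc' => inc }
end

update = counter_update(Time.utc(2011, 2, 18), unique: true)
# update['$inc'] contains 't', 'u', '2011.02.18.t', '2011.02.18.u'
```

One update touches one document, so totals and per-day counts always move together.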
Single Document
For all ranges in time frame
{ "_id" =>"...:10", "bx" => { "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359, "768" => 4515, "900" => 3833, "1024" => 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 }}
{ "_id" =>"...:10", "bx" => { "320" => 85, "480" => 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359, "768" => 4515, "900" => 3833, "1024" => 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 }}
{
  '$inc' => {
    'sx.1440' => 1,
    'bx.1280' => 1,
    'by.768' => 1,
  }
}
Many Documents
Search terms, content, referrers...
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables", "sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 },]
Writes
{'_id' => "#{sid}:#{hash}"}
Reads
[['sid', 1], ['v', -1]]
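The write and read sides of the many-documents pattern fit together like this. A sketch with assumptions: a Hash stands in for the collection and its `[['sid', 1], ['v', -1]]` index, MD5 stands in for whatever hash the `#{sid}:#{hash}` key actually uses, and `record_term` is an invented helper.

```ruby
require 'digest'

# Deterministic _id from site id + term, so recording the same term
# twice upserts the same document instead of creating a duplicate.
def term_id(sid, term)
  "#{sid}:#{Digest::MD5.hexdigest(term)}"
end

# Simulated upsert + $inc on the 'v' (views) counter.
def record_term(docs, sid, term)
  doc = docs[term_id(sid, term)] ||= { 't' => term, 'sid' => sid, 'v' => 0 }
  doc['v'] += 1
end

docs = {}
record_term(docs, 'site1', 'ruby class variables')
3.times { record_term(docs, 'site1', 'ruby unless') }

# Read side: filter by sid, sort by views descending, as the
# [['sid', 1], ['v', -1]] index would serve it.
top = docs.values.select { |d| d['sid'] == 'site1' }.sort_by { |d| -d['v'] }
top.first['t']  # => "ruby unless"
```

The deterministic `_id` is what keeps writes idempotent; the compound index is what keeps the "top terms for a site" read a single indexed scan.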
Growth
Don’t say shard, don’t say shard...
Partition Hot Data
Currently using collections for time frames
[ "content.2011.7", "content.2011.8", "content.2011.9", "content.2011.10", "content.2011.11", "content.2011.12", "content.2012.1", "content.2012.2", "content.2012.3", "content.2012.4",]
[ "resolutions.2011", "resolutions.2012",]
Move
Bigint Move
Make You Wanna Move
Da Move
Smooth Move
Night Move
Dance Move
Bigger, Faster Server
More CPU, RAM, Disk Space
Users
Sites
Content
Referrers
Terms
Engines
Resolutions
Locations
Partition by Function
Spread writes across a few servers
Users
Sites
Content
Referrers
Terms
Engines
Resolutions
Locations
Partition by Server
Spread writes across a ton of servers, way down the road, not worried yet