MongoDB for Analytics

Date post: 01-Dec-2014
Description:
The flexibility of MongoDB makes it perfect for storing analytics. I'll discuss a few patterns for storing data that we have learned while growing Gaug.es from zero to millions of page views a day. You'll leave with a desire to measure everything and the ability to do it.
Transcript
Page 1: MongoDB for Analytics

GitHub
John Nunemaker
MongoChicago 2012

November 12, 2012

MongoDB for Analytics
A loving conversation with @jnunemaker

Page 2: MongoDB for Analytics

Background
How hernias can be good for you

Page 3: MongoDB for Analytics
Page 4: MongoDB for Analytics
Page 5: MongoDB for Analytics

1 month
Of evenings and weekends

Page 6: MongoDB for Analytics

18 months
Since public launch

Page 7: MongoDB for Analytics

10-15 Million
Page views per day

Page 8: MongoDB for Analytics

2.7 Billion
Page views to date

Page 9: MongoDB for Analytics

13 tiny servers
2 web, 6 app, 3 db, 2 queue

Page 10: MongoDB for Analytics

requests/sec

Page 11: MongoDB for Analytics

ops/sec

Page 12: MongoDB for Analytics

cpu %

Page 13: MongoDB for Analytics

lock %

Page 14: MongoDB for Analytics

Implementation
How we do what we do

Page 15: MongoDB for Analytics

Doing It (mostly) Live
No aggregate querying

Page 16: MongoDB for Analytics
Page 17: MongoDB for Analytics
Page 18: MongoDB for Analytics

get('/track.gif') do
  track_service.record(...)
  TrackGif
end

Page 19: MongoDB for Analytics

class TrackService
  def record(attrs)
    message = MessagePack.pack(attrs)
    @client.set(@queue, message)
  end
end

Page 20: MongoDB for Analytics

class TrackProcessor
  def run
    loop { process }
  end

  def process
    record @client.get(@queue)
  end

  def record(message)
    attrs = MessagePack.unpack(message)
    Hit.record(attrs)
  end
end
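
The two classes above split tracking into a cheap enqueue on the web side and a worker that does the real recording. A minimal sketch of that hand-off follows, under loud assumptions: `MemoryQueueClient` is a made-up in-memory stand-in for whatever `@client` talks to, JSON stands in for MessagePack so the example needs only the standard library, and the empty-queue guard in `process` is an addition that the slide does not show.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the queue client used in the talk.
class MemoryQueueClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Append a message to the named queue.
  def set(queue, message)
    @queues[queue] << message
  end

  # Remove and return the oldest message, or nil when the queue is empty.
  def get(queue)
    @queues[queue].shift
  end
end

# Web side: serialize the hit and enqueue it, nothing more.
class SketchTrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  def record(attrs)
    @client.set(@queue, JSON.generate(attrs)) # JSON stands in for MessagePack
  end
end

# Worker side: dequeue, deserialize, and hand off to a recorder callable.
class SketchTrackProcessor
  def initialize(client, queue, recorder)
    @client = client
    @queue = queue
    @recorder = recorder
  end

  # Process one message; returns false when there was nothing to do.
  def process
    message = @client.get(@queue)
    return false if message.nil?

    @recorder.call(JSON.parse(message))
    true
  end
end

client = MemoryQueueClient.new
recorded = []
service = SketchTrackService.new(client, 'hits')
processor = SketchTrackProcessor.new(client, 'hits', ->(attrs) { recorded << attrs })

service.record('site_id' => 'abc', 'path' => '/pricing')
processor.process
```

The request handler never touches the database in this shape, so a slow write only delays the worker, not the tracking response.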

Page 21: MongoDB for Analytics

http://bit.ly/rt-kestrel

Page 22: MongoDB for Analytics

class Hit
  def record
    site.atomic_update(site_updates)

    Resolution.record(self)
    Technology.record(self)
    Location.record(self)
    Referrer.record(self)
    Content.record(self)
    Search.record(self)
    Notification.record(self)
    View.record(self)
  end
end

Page 23: MongoDB for Analytics

class Resolution
  def record(hit)
    query = {'_id' => "..."}
    update = {'$inc' => {}}
    update['$inc']["sx.#{hit.screenx}"] = 1
    update['$inc']["bx.#{hit.browserx}"] = 1
    update['$inc']["by.#{hit.browsery}"] = 1

    collection(hit.created_on)
      .update(query, update, :upsert => true)
  end
end

Page 28: MongoDB for Analytics

Pros

Space

RAM

Reads

Live

Page 33: MongoDB for Analytics

Cons

Writes

Constraints

More Forethought

No raw data

Page 34: MongoDB for Analytics

http://bit.ly/rt-counters

http://bit.ly/rt-counters2

Page 35: MongoDB for Analytics

Time Frame
Minute, hour, month, day, year, forever?

Page 36: MongoDB for Analytics

# of Variations
One document vs many

Page 37: MongoDB for Analytics

Single Document
Per Time Frame

Page 38: MongoDB for Analytics
Page 39: MongoDB for Analytics

{
  "t" => 336381,
  "u" => 158951,
  "2011" => {
    "02" => {
      "18" => {
        "t" => 9,
        "u" => 6
      }
    }
  }
}

Page 40: MongoDB for Analytics

{
  '$inc' => {
    't' => 1,
    'u' => 1,
    '2011.02.18.t' => 1,
    '2011.02.18.u' => 1,
  }
}
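
The update above touches the all-time counters and one day's counters in a single operation by using dotted paths. Here is a small sketch of how such an update could be built from a date, plus a local `apply_inc` helper that mimics what the dotted paths do to a document. Assumptions: `t` is read as total views and `u` as uniques, the `unique` flag and both helper names are hypothetical, and `apply_inc` only illustrates the semantics on a plain Hash rather than calling MongoDB.

```ruby
require 'date'

# Build a '$inc' document that bumps the all-time totals and the totals for one day.
# `unique` is a hypothetical flag saying whether this hit counts as a unique.
def time_frame_increments(date, unique)
  day = date.strftime('%Y.%m.%d') # e.g. "2011.02.18"
  inc = { 't' => 1, "#{day}.t" => 1 }
  if unique
    inc['u'] = 1
    inc["#{day}.u"] = 1
  end
  { '$inc' => inc }
end

# Apply '$inc' to a plain Hash the way the dotted paths read: walk or create
# the nested hashes, then add to the leaf. Local illustration only.
def apply_inc(doc, update)
  update.fetch('$inc').each do |path, amount|
    *parents, leaf = path.split('.')
    target = parents.reduce(doc) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  doc
end

doc = {}
update = time_frame_increments(Date.new(2011, 2, 18), true)
apply_inc(doc, update)
apply_inc(doc, time_frame_increments(Date.new(2011, 2, 18), false))
```

Because missing paths are created on first increment, nothing has to pre-allocate a slot for each day before hits arrive.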

Page 41: MongoDB for Analytics

Single Document
For all ranges in time frame

Page 42: MongoDB for Analytics
Page 43: MongoDB for Analytics

{
  "_id" => "...:10",
  "bx" => {
    "320" => 85,
    "480" => 318,
    "800" => 1938,
    "1024" => 5033,
    "1280" => 6288,
    "1440" => 2323,
    "1600" => 3817,
    "2000" => 137
  },
  "by" => {
    "480" => 2205,
    "600" => 7359,
    "768" => 4515,
    "900" => 3833,
    "1024" => 2026
  },
  "sx" => {
    "320" => 191,
    "480" => 179,
    "800" => 195,
    "1024" => 1059,
    "1280" => 5861,
    "1440" => 3533,
    "1600" => 7675,
    "2000" => 1279
  }
}

Page 45: MongoDB for Analytics

{
  '$inc' => {
    'sx.1440' => 1,
    'bx.1280' => 1,
    'by.768' => 1,
  }
}
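
The example document keeps a small, fixed set of keys per dimension, which is what keeps one document per time frame bounded in size. The slides do not show where raw pixel sizes turn into those keys, so the following is only a sketch under an assumed rule: round up to the first listed bucket that fits, and cap at the largest. The bucket lists are copied from the example document; `bucket_for`, `SketchHit`, and `resolution_update` are hypothetical names.

```ruby
# Bucket edges copied from the keys in the example document.
WIDTH_BUCKETS = [320, 480, 800, 1024, 1280, 1440, 1600, 2000].freeze
HEIGHT_BUCKETS = [480, 600, 768, 900, 1024].freeze

# Assumed rule (not from the talk): use the first bucket that is >= the value,
# falling back to the largest bucket for anything bigger.
def bucket_for(value, buckets)
  buckets.find { |edge| value <= edge } || buckets.last
end

# Hypothetical hit shape; field names mirror hit.screenx / browserx / browsery.
SketchHit = Struct.new(:screenx, :browserx, :browsery)

# Build the '$inc' document for one hit using bucketed keys.
def resolution_update(hit)
  {
    '$inc' => {
      "sx.#{bucket_for(hit.screenx, WIDTH_BUCKETS)}" => 1,
      "bx.#{bucket_for(hit.browserx, WIDTH_BUCKETS)}" => 1,
      "by.#{bucket_for(hit.browsery, HEIGHT_BUCKETS)}" => 1
    }
  }
end

update = resolution_update(SketchHit.new(1440, 1265, 700))
```

Whatever the real rule is, the point is the same: an unbounded input (any pixel size) is folded into a handful of counters before it reaches the document.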

Page 46: MongoDB for Analytics

Many Documents
Search terms, content, referrers...

Page 47: MongoDB for Analytics
Page 48: MongoDB for Analytics

[
  {
    "_id" => "<oid>:<hash>",
    "t" => "ruby class variables",
    "sid" => BSON::ObjectId('<oid>'),
    "v" => 352
  },
  {
    "_id" => "<oid>:<hash>",
    "t" => "ruby unless",
    "sid" => BSON::ObjectId('<oid>'),
    "v" => 347
  },
]

Page 49: MongoDB for Analytics

Writes
{'_id' => "#{sid}:#{hash}"}
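
Keying each write on a composed `_id` means the same site and term always resolve to the same document, so an upsert can increment it without a lookup first. A minimal sketch of composing that key follows. The slide names no hash function, so MD5 here is an assumption, as are the helper names, the use of a plain string for the site id, and the exact shape of the update.

```ruby
require 'digest/md5'

# Build a deterministic _id for a (site, term) pair so an upsert keyed on _id
# always lands on the same document. MD5 is an assumption; the slide only
# shows "#{sid}:#{hash}".
def term_id(site_id, term)
  "#{site_id}:#{Digest::MD5.hexdigest(term)}"
end

# The query/update pair such a write might send with :upsert => true.
# Field names t, sid, and v follow the example documents.
def term_write(site_id, term)
  query = { '_id' => term_id(site_id, term) }
  update = { '$set' => { 't' => term, 'sid' => site_id }, '$inc' => { 'v' => 1 } }
  [query, update]
end

query, update = term_write('site-1', 'ruby unless')
```

Since `_id` is always indexed, this write path needs no extra index of its own.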

Page 50: MongoDB for Analytics

Reads
[['sid', 1], ['v', -1]]
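
The read index orders entries by site, then by view count descending, which is the order a "top terms for this site" query wants. The sketch below reproduces that ordering in memory to show the effect. It does not talk to MongoDB, the helper names are hypothetical, and the sample rows other than the two terms from the slides are made up.

```ruby
# Order documents the way a compound index on sid ascending, v descending would.
def index_order(docs)
  docs.sort_by { |doc| [doc['sid'], -doc['v']] }
end

# Top terms for one site: in index order a site's entries are adjacent and
# already sorted by views, so taking the first few is enough.
def top_terms(docs, site_id, limit)
  index_order(docs)
    .select { |doc| doc['sid'] == site_id }
    .first(limit)
    .map { |doc| doc['t'] }
end

docs = [
  { 'sid' => 'a', 't' => 'ruby unless', 'v' => 347 },
  { 'sid' => 'b', 't' => 'mongodb upsert', 'v' => 12 },         # made-up row
  { 'sid' => 'a', 't' => 'ruby class variables', 'v' => 352 },
  { 'sid' => 'a', 't' => 'ruby heredoc', 'v' => 5 }             # made-up row
]
top = top_terms(docs, 'a', 2)
```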

Page 51: MongoDB for Analytics

Growth
Don’t say shard, don’t say shard...

Page 52: MongoDB for Analytics

Partition Hot Data
Currently using collections for time frames

Page 53: MongoDB for Analytics

[
  "content.2011.7",
  "content.2011.8",
  "content.2011.9",
  "content.2011.10",
  "content.2011.11",
  "content.2011.12",
  "content.2012.1",
  "content.2012.2",
  "content.2012.3",
  "content.2012.4",
]

Page 54: MongoDB for Analytics

[
  "resolutions.2011",
  "resolutions.2012",
]
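
The collection names above encode the time frame, monthly for content and yearly for resolutions, so recent writes stay in a small, hot collection. A short sketch of deriving those names from a date follows. The naming pattern is taken from the lists above; the helper names are hypothetical, and the range helper is an added illustration of how a multi-month read might pick its collections.

```ruby
require 'date'

# Monthly collections for high-volume data, e.g. "content.2011.7".
def monthly_collection_name(prefix, date)
  "#{prefix}.#{date.year}.#{date.month}"
end

# Yearly collections for lower-volume data, e.g. "resolutions.2011".
def yearly_collection_name(prefix, date)
  "#{prefix}.#{date.year}"
end

# All monthly collection names needed to cover a date range (inclusive).
def monthly_collections_between(prefix, from, to)
  names = []
  cursor = Date.new(from.year, from.month, 1)
  while cursor <= to
    names << monthly_collection_name(prefix, cursor)
    cursor = cursor >> 1 # advance one month
  end
  names
end

names = monthly_collections_between('content', Date.new(2011, 11, 15), Date.new(2012, 2, 1))
```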

Page 61: MongoDB for Analytics

Move
BigintMove
MakeYouWannaMove
DaMove
SmoothMove
NightMove
DanceMove

Page 62: MongoDB for Analytics

Bigger, Faster Server
More CPU, RAM, Disk Space

Page 63: MongoDB for Analytics

Users, Sites, Content, Referrers, Terms, Engines, Resolutions, Locations

Page 64: MongoDB for Analytics

Partition by Function
Spread writes across a few servers

Page 65: MongoDB for Analytics

Users

Sites

Content

Referrers

Terms

Engines

Resolutions

Locations
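
Partitioning by function means each family of collections can live on its own server, so the application only needs a lookup from collection name to host. A minimal sketch of that lookup follows. Everything specific in it is assumed for illustration: the host names, the grouping of families onto three servers, and the `server_for` helper.

```ruby
# Hypothetical mapping from collection family to the server that owns it.
# Host names and grouping are made up; the talk does not give them.
SERVER_FOR = {
  'users' => 'db1.example.com',
  'sites' => 'db1.example.com',
  'content' => 'db2.example.com',
  'referrers' => 'db2.example.com',
  'terms' => 'db3.example.com',
  'engines' => 'db3.example.com',
  'resolutions' => 'db3.example.com',
  'locations' => 'db3.example.com'
}.freeze

# Resolve a collection such as "content.2012.4" to its server by family name.
def server_for(collection_name)
  family = collection_name.split('.').first
  SERVER_FOR.fetch(family) { raise ArgumentError, "no server for #{family}" }
end

host = server_for('content.2012.4')
```

Because time-framed collections share a family prefix, new monthly collections route correctly without touching the map.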

Page 66: MongoDB for Analytics

Partition by Server
Spread writes across a ton of servers, way down the road, not worried yet

Page 67: MongoDB for Analytics

GitHub

Thank you!

John Nunemaker
MongoChicago 2012
November 12, 2012

@jnunemaker