Harmony in Tune
Philip (flip) Kromer and Huston Hoburg, infochimps.com
Feb 15 2013
How we Refactored Cube to Terabyte Scale
Big Data for All
why dashboards?
Lightweight Dashboards
• Understand what’s happening
• Understand data in context
• NOT exploratory analytics
• real-time insight...but not just about real-time
mainline: j.mp/sqcube
hi-scale branch: j.mp/icscube
The “Church of Graphs”
Predictive Kvetching
Lightweight Dashboards
Approach to Tuning
• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
cube is awesome
What's so great?
• Streaming, real-time
• Ad-hoc data: write whatever you want
• Ad-hoc queries: make up new queries whenever
• Efficient (“pyramidal”) calculations
Event Stream
• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }
Events vs Metrics
Event:
• { time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }
Metrics:
• “# of tweets in 10s bucket at 1:02:10 on 2013-02-15”
• “# of non-english-language tweets in 1hr bucket at ...”
Events vs Metrics
Event:
• { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
Metrics:
• “# of requests in 10s bucket at 3:05:10 on 2013-02-15”
• “Average duration of requests with 4xx status in the 5 minute bucket at 3:05:00 on 2013-02-15”
Events vs Metrics
• Events:
• baskets of facts
• narcissistic
• LOTS AND LOTS
• Metrics:
• a timestamped number
• look like the graph
• one per time bucket
{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
{ time: "2013-02-15T01:02:03Z", value: 90 }
billions and billions
3000 events/second
tuning methodology
Monkey See Monkey Do
Google for the #s the cool kids use
Spinal Tap
Turn everything to 11!!!!
Hillbilly Mechanic
Rewrite for memcached! HBase on Cassandra!!!
Moneybags
SSD plz
Moar CPU
Moar RAM
Moar Replica
Tuning: How to do it
• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
see through the magic
• Why can’t it be faster than it is now?
• dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5
• htop
• mongostat
Grok: client-side
• Made a sprayer to inject data
• invalidate a time range at max speed
• writes variously-shaped data: noise, ramp, sine, etc
• Or just reach into the DB and poke
• delete range of metrics, leave events
• delete range of events, leave metrics
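The sprayer itself isn't shown in the deck; a minimal sketch of the idea, in node.js, generating variously-shaped synthetic events (the function and shape names are mine):

```javascript
// Sketch of a test-data "sprayer": emit synthetic events whose values trace
// a known shape (sine, ramp, noise), so you can eyeball the dashboard and
// know instantly if metrics are being dropped or skewed.
var shapes = {
  sine:  function (i) { return 50 + 50 * Math.sin(i / 10); },
  ramp:  function (i) { return i % 100; },
  noise: function (i) { return 100 * Math.random(); }
};

// Build `count` events of the given shape, one per second starting at t0 (ms).
function makeEvents(shape, count, t0) {
  var events = [];
  for (var i = 0; i < count; i++) {
    events.push({
      time: new Date(t0 + i * 1000).toISOString(),
      type: 'sprayer_test',
      data: { shape: shape, value: shapes[shape](i) }
    });
  }
  return events;
}
// Each event would then be sent to the collector at max speed
// (HTTP POST or websocket, depending on your collector setup).
```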
Fault injection
• raise when packet comes in with certain flag
• { time: "2013...", data: {...}, _raise:"db_write" }
• (only in development mode, obvs.)
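A sketch of how that flag check might look (the stage names and the `devMode` guard are assumptions, matching the `_raise` example above):

```javascript
// Fault injection: if an incoming event carries a matching `_raise` flag,
// blow up at that lifecycle stage -- but only in development mode.
var devMode = process.env.NODE_ENV !== 'production';

function maybeRaise(event, stage) {
  if (devMode && event._raise === stage) {
    throw new Error('injected fault at stage: ' + stage);
  }
}

// Inside the collector, each stage would check before doing its work, e.g.:
//   maybeRaise(event, 'db_write');
//   /* ...then actually write the event to the database... */
```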
app-side tracing
• “Metalog” announces lifecycle progress:
• writes to log...
• ... or as cube metrics!
metalog.event('connect', { method: 'ws', ip: connection.remoteAddress, path: request.url }, 'minor');
fits on machine?
Events
• Rate:
• 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk
• Expensive. Difficult.
• 250 GB accumulated per day (@1000 bytes/ev)
• 95 TB accumulated per year (@1000 bytes/ev)
3000 events/second
Metrics
• Rate:
• 3M ten-sec buckets/year (π·10⁷ sec/year)
• < 100 bytes/metric ...
• Manageable!
• a 30 metric dashboard is ~ 10 GB/year @10sec
• a 30 metric dashboard is ~ 170 MB/year @ 5min
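A back-of-envelope check of those figures, assuming ~100 bytes per stored metric (the deck's numbers are order-of-magnitude):

```javascript
// Storage for a dashboard: metrics-per-year is seconds-per-year divided
// by the bucket size, times the metric count, times bytes per metric.
var SECONDS_PER_YEAR = Math.PI * 1e7;  // the handy approximation, ~3.15e7

function dashboardBytesPerYear(nMetrics, bucketSecs, bytesPerMetric) {
  return nMetrics * (SECONDS_PER_YEAR / bucketSecs) * bytesPerMetric;
}

var at10s  = dashboardBytesPerYear(30, 10,  100);  // ~9.4 GB: the "10 GB" figure
var at5min = dashboardBytesPerYear(30, 300, 100);  // ~315 MB: same ballpark as 170 MB
```

Either way: manageable, and thousands of times smaller than keeping raw events.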
20% gains are boring
At scale, your first barriers are either:
• Easy
• Impossible
Metrics: 10 GB/year
Events: 10 TB/month
Scalability sí, Performance no
Still CPU and Memory Use
• Problem
• Mongo seems to be working
• but high resident memory and fault rate
• Memory-mapped Files
• 1 TB of data served by 4 GB of RAM is no good
Capped Collections
[diagram: fixed-size circular queue holding records A B C D E F]
• Fixed size circular queue
• records are in order of insertion
• oldest records are discarded when full
[diagram: new records G, H overwrite the oldest records A, B]
Capped Collections
• Extremely efficient on write
• Extremely efficient for insertion-order reads
• Very efficient if queries are ‘local’
• events in same timebucket typically arrived at nearby timesand so are nearby on disk
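In mongo shell terms a capped collection is created with `db.createCollection("events", { capped: true, size: bytes })`; its fixed-size, overwrite-oldest behavior can be sketched as a tiny circular queue:

```javascript
// A capped collection behaves like a fixed-size circular queue: inserts go
// in insertion order, and once the cap is hit each insert evicts the oldest.
// (Real capped collections cap by bytes; this sketch caps by record count.)
function CappedQueue(capacity) {
  this.capacity = capacity;
  this.items = [];
}

CappedQueue.prototype.insert = function (record) {
  this.items.push(record);               // records stay in insertion order
  if (this.items.length > this.capacity) {
    this.items.shift();                  // oldest record is discarded
  }
};
```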
don’t like the answer?
change the question.
mainline: uncapped events, capped metrics
• metrics are a view on the data
hi-scale branch: capped events, uncapped metrics
• events are ephemeral
Harmony
• Make your pattern of access match your system’s strengths and rhythm
Validate Mental Model
Easy fixes
• Duplicate requests = duplicate calculations
• Cube patch for request queues exists
• Easy fix!
• Non-pyramidal aggregates are inefficient
• Remove until things are under control
• ( solve paralyzing problems first )
cube 101
Cube Systems
Collector
• Receives events
• writes to MongoDB
• marks metrics for re-calculation (“invalidates”)
Evaluator
• receives, parses requests for metrics
• calculates metrics “pyramidally”
• then stores them, cached
Pyramidal Aggregation
[diagram: ev → 10s → 1min → 5min; 10s counts (1 5 2 0 2 0 6 4 7 1 0 2 2 3 2 4 2 2 5 5 4 6 4 1 2 7 0 0 0 1 6 0 0 1 0 3) sum into 1min counts (10 20 15 25 10 10), which sum into the top value 90]
Uses Cached Results
[diagram: the 1min tier (10 20 15 25 10 ...) is computed from already-cached 10s metrics instead of re-scanning events]
Pyramidal Aggregation
[diagram: ev → 10 sec → 1 min → 5 min]
• calculates metrics...
• from metrics and constants ... from metrics ...
• from events
• (then stores them, cached)
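The tiers in the diagram can be reproduced with a tiny roll-up: each coarser metric is a sum over the tier below it, never over raw events (numbers taken from the slide):

```javascript
// Pyramidal aggregation: a coarser tier is computed from the tier below it.
// Here each 1 min bucket sums six 10 sec buckets, and so on up the pyramid.
function rollup(buckets, factor) {
  var out = [];
  for (var i = 0; i < buckets.length; i += factor) {
    var sum = 0;
    for (var j = i; j < i + factor && j < buckets.length; j++) sum += buckets[j];
    out.push(sum);
  }
  return out;
}

// The 10s counts from the slide:
var tenSec = [1,5,2,0,2,0, 6,4,7,1,0,2, 2,3,2,4,2,2,
              5,5,4,6,4,1, 2,7,0,0,0,1, 6,0,0,1,0,3];
var oneMin = rollup(tenSec, 6);     // [10, 20, 15, 25, 10, 10]
var top    = rollup(oneMin, 6)[0];  // 90 -- the topmost value on the slide
```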
fast writes
how fast can we write?
FAST: streaming writes are way efficient
locked out
Writes and Invalidations
Inserts Stop Every 5s
• working
• working
• ANGRY
• ANGRY
• working
• working
• ...
Thanks, mongostat! (simulated)
Inserts Stop Every 5s
• What’s really going on?
• Database write locks
• Events and metrics have conflicting locks
• Solution: split the databases
[diagram: Events collection (capped): hi-speed writes, localized reads. Metrics collection: randomish reads, updates, hi-speed deletes]
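Splitting works because MongoDB (as of 2.2) locks per database, so the events firehose and the metrics churn stop contending. A sketch of what the split looks like in a Cube-style config (mainline Cube takes a single `mongo-database` setting; the split keys and database names here are illustrative):

```javascript
// Sketch: give events and metrics their own databases so their very
// different write patterns don't fight over one database-level lock.
var config = {
  "mongo-host": "localhost",
  "mongo-port": 27017,
  "events-database":  "cube_events",   // capped: hi-speed sequential writes
  "metrics-database": "cube_metrics",  // updates, deletes, randomish reads
  "http-port": 1080
};
```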
fast reads
Pre-cache Metrics
• Keep metrics fresh (Warmer)
• Only calculate recent updates (Horizons)
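A sketch of the idea behind the warmer and its horizon (the function and parameter names are mine): only the recent, still-changing window gets recomputed, aligned to whole buckets.

```javascript
// Pre-cache ("warm") metrics: periodically recompute only the recent window
// instead of all history. The horizon bounds how far back we ever reach.
function warmWindow(nowMs, horizonMs, tierMs) {
  // Align both ends to tier boundaries so we warm whole buckets.
  var stop  = Math.floor(nowMs / tierMs) * tierMs;
  var start = Math.floor((nowMs - horizonMs) / tierMs) * tierMs;
  return { start: new Date(start), stop: new Date(stop) };
}

// A warmer would then, every few seconds, request each dashboard metric over
// warmWindow(Date.now(), horizon, tier), so charts are served from cache.
```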
fancy metrics
Non-pyramidal Aggregates
• Can’t calculate from warmed metrics
• Store values with counts in metrics
• Counts can be vivified for aggregations
• Smaller footprint than full events
• Works best for dense, finite values
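For instance, a median can't be computed pyramidally, but if each metric stores a `{value: count}` histogram, the values can be vivified and aggregated later. A sketch (HTTP status codes are a good fit: dense, finite values):

```javascript
// Non-pyramidal aggregate from a value->count histogram: expand the counts
// back into a sorted value list ("vivify"), then aggregate. Far smaller
// than keeping full events when there are few distinct values.
function vivify(hist) {
  var values = [];
  Object.keys(hist).forEach(function (v) {
    for (var i = 0; i < hist[v]; i++) values.push(Number(v));
  });
  return values.sort(function (a, b) { return a - b; });
}

function median(hist) {
  var vs = vivify(hist);
  var mid = Math.floor(vs.length / 2);
  return vs.length % 2 ? vs[mid] : (vs[mid - 1] + vs[mid]) / 2;
}

// e.g. status codes seen in one time bucket:
var statuses = { "200": 7, "400": 2, "500": 1 };
```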
finally, scaling
Multicore
• MongoDB
• Writes limited to single core
• Requires sharding for multicore
Multicore
• Cube (node.js)
• Concurrent, but not multi-threaded
• Easy solution
• Multiple collectors on different ports
• Produces redundant invalidations
• Requires external load balancing
Multicore
Hardware
• High Memory
• Capped events size scales with memory
• CPU
• Mongo / cube not optimized for multicore
• Faster cores
• EC2 Best value: m2.2xlarge
• < $700/mo, 34.2GB RAM, 13 bogo-hertz
Cloud helps
• Tune machines to application
• Dedicating databases for each application makes life a lot easier
github.com/infochimps-labs
good ideas that didn’t help
Queues
• Different queueing methods
• Should optimize metric calculations
• No significant improvement
Locks: update vs. remove
• Uncapped metrics allow ‘remove’ as invalidation option
• Remove doesn’t help with database locks
• It was a stupid idea anyway: that’s OK
• “Hey, poke it and see what happens!”
Mongo Aggregations
• Mongo has aggregations!
• Node ends up working better
• Mongo aggregations aren’t faster
• Less flexible
• Would require query language rewrite
Why not Graphite?
• Data model
• Metrics-centric vs Events-centric (metrics code not intertwingled with app code)
• Environment familiarity
• Cube: d3, node.js, mongo
• Graphite: Django, Whisper, C