Date post: | 17-Dec-2014 |
Category: | Technology |
Upload: | mongodb |
Engineer, 10gen
Mark Hillick - @markofu
#mongosv
Using the MongoDB Monitoring Service (MMS)
What, where, numbers?
What is MMS?
• MongoDB monitoring SaaS solution with:
– Per minute granularity
– Alerting: host up / down, metrics, etc.
– Event tracking (server restart, step down, …)
• Host management (auto discover)
• Profiling
• Hardware stats also
Why use MMS? (1)
• Overview – Bird’s Eye
– Macro
• Drill down (minute by minute)
– Micro
Why use MMS? (2)
• Haz all teh things
• Tailored specifically for MongoDB
• Incredibly helpful for 10gen Support when troubleshooting
A few numbers …
• Monitors over 19k database servers
• 40k writes per second
• 400 metrics per ping packet
• 9 billion metrics recorded per day
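As a rough sanity check, these figures hang together. The sketch below assumes every server sends one full ping per minute (taken from the per-minute granularity mentioned earlier; the ping interval itself is an assumption), which gives an upper bound in the same range as the 9 billion quoted:

```python
# Back-of-envelope check of the figures on this slide (illustrative only).
SERVERS = 19_000          # "over 19k database servers"
METRICS_PER_PING = 400    # "400 metrics per ping packet"
PINGS_PER_DAY = 24 * 60   # ASSUMPTION: one ping per server per minute


def metrics_per_day(servers, metrics_per_ping, pings_per_day):
    """Upper bound on metrics per day if every ping carries a full payload."""
    return servers * metrics_per_ping * pings_per_day


upper_bound = metrics_per_day(SERVERS, METRICS_PER_PING, PINGS_PER_DAY)
print(f"{upper_bound:,} metrics per day at most")
```

The bound comes out slightly above the recorded figure, which is what one would expect if not every host reports every metric.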
How?
Set up MMS – it’s easy
• Go to http://mms.10gen.com
– Create a new account or sign in with your JIRA user
– Pick an explicit company name
– Download and run the agent
– From MMS dashboard, add a host to monitor
The MMS client (agent)
• Small Python app
• A single agent process
– Failover: run multiple agents
• Connect to mms.10gen.com (SSL over TCP 443)
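Since the agent only needs an outbound connection on TCP 443, a quick firewall check can be sketched with the Python standard library. This is not part of the agent; `can_reach` is a hypothetical helper name, and the timeout is an arbitrary choice:

```python
import socket
import ssl


def can_reach(host="mms.10gen.com", port=443, timeout=5.0):
    """Return True if an outbound TLS handshake to host:port succeeds.

    Hypothetical helper: it only verifies that the firewall allows the
    connection and that the certificate validates. It sends no data.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and TLS errors
        # (ssl.SSLError is a subclass of OSError).
        return False
```

Calling `can_reach()` from the machine that will run the agent gives a yes/no answer before the agent is installed.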
Host
Operational Stats
Alerting
Alerts - Config
All good
Alerts - Closed
Events
Security
Security
• Purely stats (metadata).
– Log transfer has to be turned on.
• HTTPS & connections are outbound only (from the agent)
• If profiling is enabled in both the database and MMS, profiling data is sent
On-premise MMS
• Locally Hosted in Customer Infrastructure
• PCI, HIPAA, etc.
• Enterprise Customers (2.4)
Measure me!!!
Metrics
• Source : http://www.kaushik.net/avinash/wp-content/uploads/2007/10/metrics.jpg
opcounters
• Count of every operation per second
• getMore – each batch of a query
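A per-second rate is typically derived from two samples of cumulative counters. The sketch below assumes counters shaped like a dictionary of operation name to running total; the sample values and the `op_rates` name are made up for illustration:

```python
def op_rates(prev, curr, elapsed_seconds):
    """Per-second rate for each counter between two cumulative samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Only compare operations present in both samples.
    return {
        op: (curr[op] - prev[op]) / elapsed_seconds
        for op in curr
        if op in prev
    }


# Hypothetical samples taken 60 seconds apart.
prev = {"insert": 1000, "query": 5000, "getmore": 200}
curr = {"insert": 1600, "query": 8000, "getmore": 260}
rates = op_rates(prev, curr, 60)
print(rates)
```

Note that a server restart resets cumulative counters, so a negative rate would indicate a restart between samples rather than real activity.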
memory
• Mapped: sum of files on disk
• Virtual memory: 2 x mapped (with journaling) + process overhead
• Resident memory: data in RAM actively used
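The relationship between the three memory figures can be checked with simple arithmetic. The sketch below assumes a dictionary with `mapped`, `virtual` and `resident` values in MB and journaling enabled; the sample numbers and the `memory_summary` name are illustrative:

```python
def memory_summary(mem, ram_mb):
    """Relate mapped, virtual and resident memory (all values in MB).

    ASSUMPTION: journaling is on, so virtual is roughly 2 x mapped plus
    process overhead, as stated on the slide.
    """
    mapped = mem["mapped"]
    return {
        "expected_virtual_floor_mb": 2 * mapped,
        "overhead_mb": mem["virtual"] - 2 * mapped,
        "resident_fraction_of_ram": mem["resident"] / ram_mb,
    }


# Hypothetical host with 8000 MB of RAM.
sample = {"mapped": 8000, "virtual": 16500, "resident": 6000}
summary = memory_summary(sample, ram_mb=8000)
print(summary)
```

A resident fraction close to 1 with mapped data larger than RAM suggests the working set may not fit in memory, which is worth reading alongside the page fault chart.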
Lock %
• Amount of time spent in the write lock
• From 2.2: each database has its own lock
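Lock % over an interval is the change in locked time divided by the change in total time. The sketch below assumes two samples with cumulative `lockTime` and `totalTime` values in microseconds; these field names are an assumption modelled on the global lock section of `serverStatus` in releases of that era, and per-database locks would need the same calculation per database:

```python
def lock_percent(prev, curr):
    """Percentage of the interval spent holding the write lock."""
    total = curr["totalTime"] - prev["totalTime"]
    locked = curr["lockTime"] - prev["lockTime"]
    if total <= 0:
        raise ValueError("samples must be in chronological order")
    return 100.0 * locked / total


# Hypothetical samples 60 seconds apart (values in microseconds).
prev = {"totalTime": 1_000_000_000, "lockTime": 50_000_000}
curr = {"totalTime": 1_060_000_000, "lockTime": 53_000_000}
print(lock_percent(prev, curr))
```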
Background flush
• Flush every 60 seconds
• Watch: if flush time gets close to sync delay
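The "watch" rule can be written as a simple threshold check. In the sketch below the 60-second default comes from the slide, while the 50% threshold and the `flush_is_risky` name are arbitrary assumptions chosen for illustration:

```python
def flush_is_risky(last_flush_ms, sync_delay_s=60, threshold=0.5):
    """True when a flush takes a large share of the sync delay.

    ASSUMPTION: 'close to sync delay' is taken to mean at least
    `threshold` (50% by default) of the interval between flushes.
    """
    return last_flush_ms >= threshold * sync_delay_s * 1000


print(flush_is_risky(45_000))  # a 45 s flush against a 60 s interval
print(flush_is_risky(2_000))   # a 2 s flush
```

When a flush takes nearly as long as the interval between flushes, the disks are writing almost continuously, which is why the chart deserves attention well before the two values meet.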
Page faults
• Disk IO
• Readahead
Replication
• On primary: amount of time in oplog
• On secondary: replication delay to primary
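Both replication figures are time differences. The sketch below uses plain timestamps with made-up values; the function names are hypothetical and the inputs would in practice come from the oplog and from each member's last applied operation time:

```python
from datetime import datetime, timedelta


def oplog_window(first_entry, last_entry):
    """Time span covered by the oplog on the primary."""
    return last_entry - first_entry


def replication_lag(primary_optime, secondary_optime):
    """How far the secondary's last applied operation trails the primary."""
    return primary_optime - secondary_optime


# Hypothetical timestamps.
first = datetime(2012, 12, 1, 0, 0, 0)
last = datetime(2012, 12, 2, 12, 0, 0)
secondary = last - timedelta(seconds=15)

window = oplog_window(first, last)
lag = replication_lag(last, secondary)
print(window, lag, lag < window)
```

As long as the lag stays well inside the oplog window, a secondary can catch up from the oplog; once the lag exceeds the window, it would need a full resync.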
Metrics that we discussed
• Opcounters
• Lock %
• Background Flush
• Page Faults
• Replication
Metrics for performance
• Resident memory: how much data in RAM?
• Page Faults: paging to disk? Readahead?
• Journal commits in write lock: put the journal on a separate disk
• High background flush: reduce sync delay to smooth out the writes
Documentation
Docs? Where?
• Manual : https://mms.10gen.com/help/
– Web
– PDF
• FAQ : https://mms.10gen.com/docs/faq
Futures
Feature Request
• JIRA Ticket - MMSSUPPORT
Coming up…
• Data visualization, e.g. shard distribution (Q1 2013)???
• Move from Python to Java
• Blah – Ryan???
Conclusion
Conclusion
• Easy to use
• Macro & micro
• Detailed monitoring features
• Aids 10gen Support immensely
Questions?