Performance Tuning on the Fly at CMP.LY

Date post: 18-Jul-2015
Page 1: Performance Tuning on the Fly at CMP.LY

JUNE 2014

Performance Tuning on the Fly at CMP.LY

Michael De Lorenzo

CTO, CMP.LY Inc.

[email protected]

@mikedelorenzo

Page 2: Performance Tuning on the Fly at CMP.LY

Agenda

• CMP.LY and CommandPost

• What is MongoDB Management Service?

• Performance Tuning

• MongoDB Issues we’ve faced

• Slow response times and delayed writes

• Unindexed queries

• Increased Replication Lag and Plummeting oplog Window

• Keep your deployment healthy with MMS

• Using MMS Alerts

• Using MMS Backups

Page 3: Performance Tuning on the Fly at CMP.LY

CMP.LY: a venture-funded NYC startup that offers proprietary social media monitoring, measurement, insight and compliance solutions for the Fortune 100.

CommandPost: a Monitoring, Measurement & Insights (MMI) tool for managed social communications.

Page 4: Performance Tuning on the Fly at CMP.LY

Use CommandPost to:

• Track and measure cross-platform in real-time

• Identify and attribute high-value engagement

• Analyze and segment engaged audience

• Optimize content and engagement strategies

• Address compliance needs

Page 5: Performance Tuning on the Fly at CMP.LY

What is MongoDB Management Service?

Page 6: Performance Tuning on the Fly at CMP.LY

MongoDB Management Service

• Free MongoDB Monitoring

• MongoDB Backup in the Cloud

• Free cloud service, or available to run on-prem with Standard or Enterprise subscriptions

• Automation coming soon—FTW!

Ops

Makes MongoDB easier to use and

manage

Page 7: Performance Tuning on the Fly at CMP.LY

Who Is MMS for?

• Developers

• Ops Team

• MongoDB Technical Service Team

Page 8: Performance Tuning on the Fly at CMP.LY


Performance Tuning

Page 9: Performance Tuning on the Fly at CMP.LY

How To Do Performance Tuning?

• Assess the problem and establish acceptable behavior.

• Measure the performance before modification.

• Identify the bottleneck.

• Remove the bottleneck.

• Measure performance after modification to confirm.

• Keep it or revert it and repeat.

Adapted from [http://en.wikipedia.org/wiki/Performance_tuning]
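
The measure, change, measure, keep-or-revert loop above can be sketched as a small comparison helper. This is only an illustration: the function names, the use of the median, and the 10% threshold are assumptions, not something from the talk.

```javascript
// Hypothetical helper: decide whether to keep a change by comparing
// latency samples (in ms) taken before and after the modification.
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// minImprovement is an assumed threshold (10% by default).
function evaluateChange(before, after, minImprovement = 0.1) {
  const baseline = median(before);
  const candidate = median(after);
  const improvement = (baseline - candidate) / baseline;
  // Keep the change only if it clears the threshold; otherwise revert.
  return { baseline, candidate, improvement, keep: improvement >= minImprovement };
}
```

Calling `evaluateChange(samplesBefore, samplesAfter)` after each modification gives a keep-or-revert answer that can be logged alongside the change itself.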

Page 10: Performance Tuning on the Fly at CMP.LY


What We’ve Faced

Page 11: Performance Tuning on the Fly at CMP.LY

Issues We’ve Faced

• Concurrency Issues

• Slow response times and delayed writes

• Querying without indexes

• Slow reads, timeouts

• Increasing Replication Lag + Plummeting oplog Window

Page 12: Performance Tuning on the Fly at CMP.LY


Concurrency

Slow responses and delayed writes

Page 13: Performance Tuning on the Fly at CMP.LY

Concurrency

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app and fix it?

• Today

Page 14: Performance Tuning on the Fly at CMP.LY

Concurrency in MongoDB

• MongoDB uses a readers-writer lock

• Many read operations can share a read lock

• When a write lock exists, a single write operation holds the lock exclusively

• No other read or write operations can share the lock

• Locks are “writer-greedy”: waiting writes take precedence over reads
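
To make the locking rules above concrete, here is a minimal single-threaded model of a writer-greedy readers-writer lock. It is a sketch for reasoning about the rules, not how mongod implements its lock; the class and method names are invented for illustration.

```javascript
// Illustrative model only: tracks who may hold the lock under
// "many readers OR one writer, waiting writers go first" rules.
class WriterGreedyLock {
  constructor() {
    this.activeReaders = 0;
    this.writerActive = false;
    this.waitingWriters = 0;
  }

  // A reader may enter only if no writer holds the lock and none is waiting.
  tryRead() {
    if (this.writerActive || this.waitingWriters > 0) return false;
    this.activeReaders += 1;
    return true;
  }

  endRead() {
    if (this.activeReaders > 0) this.activeReaders -= 1;
  }

  // A writer announces itself first; from then on new readers are held back.
  requestWrite() {
    this.waitingWriters += 1;
  }

  // The write lock is exclusive: no active readers and no other writer.
  tryWrite() {
    if (this.waitingWriters === 0 || this.writerActive || this.activeReaders > 0) {
      return false;
    }
    this.waitingWriters -= 1;
    this.writerActive = true;
    return true;
  }

  endWrite() {
    this.writerActive = false;
  }
}
```

The model shows why a steady stream of writes starves reads: every queued writer blocks new readers until it has run.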

Page 15: Performance Tuning on the Fly at CMP.LY

How Did This Affect Us?

• Slow API response times due to slow database operations

• Delayed writes

• Backed up queues

Page 16: Performance Tuning on the Fly at CMP.LY


MMS: Identify Concurrency Issues

Page 17: Performance Tuning on the Fly at CMP.LY

Lock % Greater than 100%?!?!?

• Lock % is the time spent in the write lock state; it is the sum of the global lock percentage and the hottest database's lock percentage at that time, which can push the value above 100%

• Global lock percentage is a derived metric:

% of time in global lock (small number)

+ % of time locked by hottest (“most locked”) database

• Because the data is sampled and combined, it is possible to see values over 100%.
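
The derived metric described above is simple addition, which is why it can exceed 100%. A minimal sketch, with a hypothetical function name and made-up sample values:

```javascript
// Derived lock %: global lock % plus the lock % of the hottest database.
// perDatabaseLockPct maps database name -> sampled lock percentage.
function lockPercent(globalLockPct, perDatabaseLockPct) {
  const values = Object.values(perDatabaseLockPct);
  const hottest = values.length ? Math.max(...values) : 0;
  return globalLockPct + hottest;
}

// Made-up samples: a busy "app" database plus a small global lock share.
const sample = lockPercent(4, { app: 98.5, logs: 12 });
```

Because the two inputs are sampled separately and then summed, a result above 100 does not mean the server was locked for more than all of the time.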

Page 18: Performance Tuning on the Fly at CMP.LY

Diagnosis

• Identified the write-heavy collections in our applications

• Used application logs to identify slow API responses

• Analyzed MongoDB logs to identify slow database queries

Page 19: Performance Tuning on the Fly at CMP.LY

Our Remedies

• Schema changes

• Message queues

• Multiple databases

• Sharding

Page 20: Performance Tuning on the Fly at CMP.LY

Schema Changes

• Denormalized our schema

• Allowed for atomic updates

• Customized documents’ _id attribute

• Leveraged existing index on _id attribute

Page 21: Performance Tuning on the Fly at CMP.LY

Modeling for Atomic Operations

Document

{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly",
  available: 3,
  checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
}

Update Operation

db.books.update(
  { _id: 123456789, available: { $gt: 0 } },
  {
    $inc: { available: -1 },
    $push: { checkout: { by: "abc", date: new Date() } }
  }
)

Result

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Page 22: Performance Tuning on the Fly at CMP.LY

Message Queues

• Controlled writes to specific collections using Pub/Sub

• We chose Amazon SQS

• Other options include Redis, Beanstalkd, IronMQ or any other message queue

• Created consistent flow of writes versus bursts

• Reduced length and frequency of write locks by controlling flow/speed of writes
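
The effect of putting a queue in front of the database can be illustrated with a small simulation. This is a sketch under assumed numbers, not the CMP.LY consumer: `drainSchedule` and its parameters are hypothetical, and a real consumer would read from SQS rather than an array.

```javascript
// Simulate a consumer that writes at most maxWritesPerTick items per tick,
// turning bursty arrivals into a steady flow of writes.
function drainSchedule(arrivalsPerTick, maxWritesPerTick) {
  if (maxWritesPerTick <= 0) throw new Error("maxWritesPerTick must be positive");
  const writes = [];
  let backlog = 0;
  for (const arrived of arrivalsPerTick) {
    backlog += arrived;
    const written = Math.min(backlog, maxWritesPerTick);
    writes.push(written);
    backlog -= written;
  }
  // Keep draining after arrivals stop until the queue is empty.
  while (backlog > 0) {
    const written = Math.min(backlog, maxWritesPerTick);
    writes.push(written);
    backlog -= written;
  }
  return writes;
}

// A burst of 500 messages followed by a trickle, capped at 100 writes per tick.
const schedule = drainSchedule([500, 0, 0, 20, 0], 100);
```

The burst is spread over several ticks, so no single tick holds the write lock for long; the cost is that messages wait in the queue.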

Page 23: Performance Tuning on the Fly at CMP.LY

Using Multiple Databases

• As of version 2.2, MongoDB implements locks at a per-database granularity for most read and write operations

• Planned to be at the document level in version 2.8

• Moved write-heavy collections to new (separate) databases

Page 24: Performance Tuning on the Fly at CMP.LY

Using Sharding

• Improves concurrency by distributing collections across multiple mongod instances

• Locks are per-mongod instance

Page 25: Performance Tuning on the Fly at CMP.LY


Lock %: Today

Page 26: Performance Tuning on the Fly at CMP.LY


Queries without Indexes

Slow responses and timeouts

Page 27: Performance Tuning on the Fly at CMP.LY

Indexing

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app and fix it?

• Today

Page 28: Performance Tuning on the Fly at CMP.LY

Indexing with MongoDB

• Support for efficient execution of queries

• Without indexes, MongoDB must scan every document in a collection

• Example

Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms

38 seconds! Scanned 17k documents, returned 16

• Create indexes to support all queries, especially common and user-facing ones

• Collection scans can push the entire working set out of RAM
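
The numbers called out in the example log line can be pulled out programmatically. The parser below is a minimal sketch that only handles the fields shown in that line; the function name is hypothetical, and tools such as mtools do this far more thoroughly.

```javascript
// Extract the fields of interest from a mongod slow-query log line.
function parseSlowQuery(line) {
  const num = (re) => {
    const m = line.match(re);
    return m ? Number(m[1]) : null;
  };
  const nscanned = num(/\bnscanned:\s*(\d+)/);
  const nreturned = num(/\bnreturned:\s*(\d+)/);
  const millis = num(/\s(\d+)ms\s*$/); // duration is the last token
  return {
    nscanned,
    nreturned,
    millis,
    scanAndOrder: /\bscanAndOrder:1\b/.test(line), // sorted without an index
    // A high scanned-to-returned ratio suggests a missing or poor index.
    scannedPerReturned: nscanned !== null && nreturned ? nscanned / nreturned : null,
  };
}

const logLine =
  "Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 " +
  "nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) " +
  "r:46877422 nreturned:16 reslen:6948 38172ms";
const parsed = parseSlowQuery(logLine);
```

For this line the ratio is over a thousand documents scanned per document returned, which is the signal to look for when hunting unindexed queries.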

Page 29: Performance Tuning on the Fly at CMP.LY

How Did this Affect Us?

• Our web apps became slow

• Queries began to time out

• Longer operations mean longer lock times

Page 30: Performance Tuning on the Fly at CMP.LY

MMS: Identifying Indexing Issues

Page Faults

• The number of times that MongoDB requires data not located in physical memory, and must read from virtual memory.

Page 31: Performance Tuning on the Fly at CMP.LY

Diagnosis

• Log Analysis

• Use mtools to analyze MongoDB logs

• mlogfilter: filter logs for slow queries, collection scans, etc.

• mplotqueries: graph query response times and volumes

• https://github.com/rueckstiess/mtools

Page 32: Performance Tuning on the Fly at CMP.LY

Diagnosis

• Monitoring application logs

• Enabling ‘notablescan’ option in development and testing versions of apps

• MongoDB profiling

Page 33: Performance Tuning on the Fly at CMP.LY

The MongoDB Profiler

• Collects fine-grained data about MongoDB write operations, cursors and database commands on a running mongod instance.

• The default slow operation threshold (slowOpThresholdMs, also known as slowms) is 100ms; it can be changed from the mongo shell
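
As a rough sketch of how the profiler threshold and the notablescan option might be switched on together, the helper below wraps two mongo shell calls, `db.setProfilingLevel(level, slowms)` and the `setParameter` admin command. The helper name and the idea of passing `db` in as a parameter are assumptions made so the logic can be read and exercised outside the shell; notablescan is intended for development and testing only.

```javascript
// Hypothetical wrapper: pass the mongo shell's `db` object.
function enableDiagnostics(db, slowMs) {
  // Level 1 profiles only operations slower than slowMs milliseconds.
  const profiling = db.setProfilingLevel(1, slowMs);
  // With notablescan on, queries that need a collection scan return an error.
  const noTableScan = db.adminCommand({ setParameter: 1, notablescan: 1 });
  return { profiling, noTableScan };
}
```

In a mongo shell session this would be called as `enableDiagnostics(db, 50)` to profile anything slower than 50ms.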

Page 34: Performance Tuning on the Fly at CMP.LY

Our Remedies

• Add indexes!

• Make sure queries are covered

• Utilize the projection specification to limit fields (data) returned
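
Whether a query is "covered" comes down to a simple rule: every field the query filters on and every field it returns must be in the index, and `_id` must be excluded from the projection unless it is indexed too. The check below is a simplified sketch of that rule with a hypothetical function name; it ignores multikey indexes and other cases that prevent covering.

```javascript
// indexKeys: field names in the index; queryFields: fields in the filter;
// projection: a mongo-style projection document such as { item: 1, _id: 0 }.
function isCoveredQuery(indexKeys, queryFields, projection) {
  const indexed = new Set(indexKeys);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // Without an inclusion projection, whole documents come back.
  if (returned.length === 0) return false;
  // _id is returned by default unless explicitly excluded.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return [...queryFields, ...returned].every((f) => indexed.has(f));
}

// Assumed index on { type, item, qty } for an inventory-style collection.
const index = ["type", "item", "qty"];
const coveredWithoutId = isCoveredQuery(index, ["type"], { item: 1, qty: 1, _id: 0 });
const coveredWithId = isCoveredQuery(index, ["type"], { item: 1, qty: 1 });
```

The second call is not covered because `_id` is still returned by default, a detail that is easy to miss when adding projections.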

Page 35: Performance Tuning on the Fly at CMP.LY

Adding Indexes

• Improved performance for common queries

• Alleviates the need to go to disk for many operations

Page 36: Performance Tuning on the Fly at CMP.LY

Projection Specification

Controls the amount of data that needs to be (de-)serialized for use in your app

• We used it to limit data returned in embedded documents and arrays

db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )

Page 37: Performance Tuning on the Fly at CMP.LY


Page Faults: Today

Page 38: Performance Tuning on the Fly at CMP.LY


Increasing Replication Lag + Plummeting oplog Window

Page 39: Performance Tuning on the Fly at CMP.LY

Replication

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app?

• How did we fix it?

• Today

Page 40: Performance Tuning on the Fly at CMP.LY

What is Replication?

• A replica set is a group of mongod processes that maintain the same data set.

• Replica sets provide redundancy and high availability, and are the basis for all production deployments

Page 41: Performance Tuning on the Fly at CMP.LY

What Is the Oplog?

• A special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

• Operations are first applied on the primary and then recorded to its oplog.

• Secondary members then copy and apply these operations in an asynchronous process.

Page 42: Performance Tuning on the Fly at CMP.LY

What is Replication Lag?

• A delay between an operation on the primary and the application of that operation from the oplog to the secondary.

• Effects of excessive lag

• “Lagged” members ineligible to quickly become primary

• Increases the possibility that distributed read operations will be inconsistent.
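
Replication lag as defined above can be estimated by comparing each secondary's last applied operation time with the primary's. The sketch below works on an array shaped like the `members` list returned by `rs.status()` (using its `name`, `stateStr` and `optimeDate` fields); the function name and the sample hosts are made up.

```javascript
// Returns lag in seconds for each secondary, keyed by member name.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in member list");
  const lag = {};
  for (const m of members) {
    if (m.stateStr !== "SECONDARY") continue;
    lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
  }
  return lag;
}

// Hypothetical member data for illustration.
const lagByMember = replicationLagSeconds([
  { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2014-06-01T12:00:30Z") },
  { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-06-01T12:00:28Z") },
  { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-06-01T11:58:30Z") },
]);
```

In this made-up data one secondary is two seconds behind and the other two minutes behind; MMS charts the same kind of difference over time.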

Page 43: Performance Tuning on the Fly at CMP.LY

How did this affect us?

• Degraded overall health of our production deployment.

• Distributed reads are no longer eventually consistent.

• Unable to bring new secondary members online.

• Caused MMS Backups to do full re-syncs.

Page 44: Performance Tuning on the Fly at CMP.LY

Identifying Replication Lag Issues with MMS

The Replication Lag chart displays the lag for your deployment

Page 45: Performance Tuning on the Fly at CMP.LY

Diagnosis

• Possible causes of replication lag include network latency, disk throughput, concurrency and/or write concern settings

• Size of operations to be replicated

• Confirmed Non-Issues for us

• Network latency

• Disk throughput

• Possible Issues for us

• Concurrency/write concern

• Size of op is an issue because entire document is written to oplog

Page 46: Performance Tuning on the Fly at CMP.LY

Concurrency/Write Concern

• Our applications apply many updates very quickly

• All operations need to be replicated to secondary members

• We use the default write concern, Acknowledged

• The mongod confirms receipt of the write operation

• Allows clients to catch network, duplicate key and other errors

Page 47: Performance Tuning on the Fly at CMP.LY

Concurrency Wasn’t the Issue

Lock Percentage

Page 48: Performance Tuning on the Fly at CMP.LY

Operation Size Was the Issue

Collection A (most active)

Total Updates: 3,373

Total Size of updates: 6.5 GB

Activity accounted for nearly 87% of total traffic

Collection B (next most active)

Total Updates: 85,423

Total Size of updates: 740 MB
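
Dividing total size by update count makes the difference between the two collections explicit. The arithmetic below is a sketch that assumes binary units (1 MB = 1024 × 1024 bytes); the slide does not say which convention the figures use.

```javascript
const MB = 1024 * 1024; // assumption: binary megabytes
const GB = 1024 * MB;

function averageUpdateBytes(totalBytes, updateCount) {
  return totalBytes / updateCount;
}

// Figures from the slide above.
const avgA = averageUpdateBytes(6.5 * GB, 3373);   // Collection A
const avgB = averageUpdateBytes(740 * MB, 85423);  // Collection B
const sizeRatio = avgA / avgB;
```

Collection A averages roughly 2 MB per update while Collection B averages roughly 9 KB, so a few thousand updates to A consumed far more oplog than tens of thousands of updates to B.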

Page 49: Performance Tuning on the Fly at CMP.LY

Fast Growing oplog causes issues

Replication oplog Window: approximate hours available in the primary’s oplog
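
Since the oplog is a fixed-size capped collection, the window shrinks as the write volume grows. A back-of-the-envelope estimate, with a hypothetical function name and made-up sizes, might look like this:

```javascript
// Estimated hours of history the oplog can hold at a given write rate.
function oplogWindowHours(oplogSizeBytes, oplogBytesPerHour) {
  if (oplogBytesPerHour <= 0) return Infinity; // nothing being written
  return oplogSizeBytes / oplogBytesPerHour;
}

const GIB = 1024 * 1024 * 1024;
// Made-up example: a 10 GB oplog under light and heavy write load.
const quietWindow = oplogWindowHours(10 * GIB, 0.5 * GIB);
const busyWindow = oplogWindowHours(10 * GIB, 5 * GIB);
```

The actual window for a running deployment is reported by `db.printReplicationInfo()` in the mongo shell and by the MMS oplog window chart; this estimate only shows why larger operations make it plummet.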

Page 50: Performance Tuning on the Fly at CMP.LY

How We Fixed It

• Changed our schema

• Changed the types of updates that were made to documents

• Both allowed us to utilize atomic operations

• Led to smaller updates

• Smaller updates == less oplog space used
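
The size difference between rewriting a whole document and applying a targeted update can be approximated as follows. This is a rough sketch: the entries are simplified stand-ins for oplog records, JSON length is used as a proxy for BSON size, and the document shape is invented for illustration.

```javascript
// Approximate the size of an oplog-style entry by its JSON length.
function approxSize(entry) {
  return JSON.stringify(entry).length;
}

// Invented document with a large embedded array.
const post = {
  _id: 1,
  title: "post",
  counters: { likes: 10, shares: 2 },
  history: new Array(1000).fill("event"),
};

// Replacing the document logs the entire new document.
const fullRewrite = {
  op: "u",
  o2: { _id: 1 },
  o: { ...post, counters: { likes: 11, shares: 2 } },
};

// A targeted update logs only the changed field.
const targeted = {
  op: "u",
  o2: { _id: 1 },
  o: { $set: { "counters.likes": 11 } },
};

const savedFraction = 1 - approxSize(targeted) / approxSize(fullRewrite);
```

With a document of this shape the targeted entry is a small fraction of the full rewrite, which is the mechanism behind the wider oplog window after the schema change.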

Page 51: Performance Tuning on the Fly at CMP.LY


Replication Lag: Today

Page 52: Performance Tuning on the Fly at CMP.LY


oplog Window: Today

Page 53: Performance Tuning on the Fly at CMP.LY


Keeping Your Deployment Healthy

Page 54: Performance Tuning on the Fly at CMP.LY


MMS Alerts

Page 55: Performance Tuning on the Fly at CMP.LY

Watch for Warnings

• Be warned if

• You are running outdated versions

• You have startup warnings

• A mongod is publicly visible

• Pay attention to these warnings

Page 56: Performance Tuning on the Fly at CMP.LY

MMS Backups

• Engineered by MongoDB

• Continuous backup with point-in-time recovery

• Fully managed backups

Page 57: Performance Tuning on the Fly at CMP.LY

Using MMS Backups

• Seeding new secondaries

• Repairing replica set members

• Development and testing databases

• Restores are free!

Page 58: Performance Tuning on the Fly at CMP.LY

Summary

• Know what’s expected and “normal” in your systems

• Know when and what changes in your systems

• Utilize MMS alerts, visualizations and warnings to keep things running smoothly

Page 59: Performance Tuning on the Fly at CMP.LY


Questions?

Michael De Lorenzo

CTO, CMP.LY Inc.

[email protected]

@mikedelorenzo

