
Advanced Benchmarking at Parse

Travis Redman, Parse + Facebook

Parse?

• Parse is a backend service for mobile apps

• Data Storage

• Server-side code

• Push Notifications

• Analytics

• … all by dropping an SDK into your app

Parse Stats

• Parse has 400,000 apps

• Rapidly growing MongoDB deployment with:

• 500 databases

• 2.5M collections

• 8M indexes

• 50 TB storage (excluding replication)

• We have all kinds of workloads!

Variety is Fun

• We support just about any kind of workload you can imagine

• Games, social networking, events, travel, music, etc

• Apps that are read heavy or write heavy

• Heavy push users (time sensitive notifications)

• Apps that store large objects

• Apps that use us for backups

• Inefficient queries

2.6 - Why Upgrade?

• General desire to stay current; a precursor to 2.8 and pluggable storage engines

• Specific features in 2.6

• Background indexing on secondaries

• Index intersection

• Query plan summary logging

Upgrading is Scary

• In the early days, we just upgraded

• Put a new version on a secondary

• ???

• Upgrade primaries

• ???

• Fix bugs as we find them - LIVE!

Upgrading

• We’re too big now to cowboy it up

• Upgrading blindly is a potential catastrophe

• In particular, we want to avoid:

• Significant performance regressions

• Unexpected bugs that break customer apps

Benchmarking

• We know that:

• Benchmarking can detect performance regressions between versions

• Tools and sample workloads (sysbench, YCSB, …) already exist

• MongoDB runs its own benchmarks

• Our workload is complex - we want more confidence

A Customized Approach

• Why not test with production workloads?

• Flashback: https://github.com/ParsePlatform/flashback

• Record: a Python tool to record ops

• Replay: a Go tool to play back ops

Record

• Record leverages MongoDB's profiling and the oplog

• Profiling is enabled on all DBs

• Inserts are collected from the oplog

• All other ops are taken from the profile DB

• Ops are recorded for a specified time period (24h) and then merged

• Produces a JSON file of ops to feed the replay tool (see the sketch below)
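The gist of the Record step, as a minimal pymongo sketch. The host, database name, recording window, and one-op-per-line output format are assumptions here; the real tool is Flashback's Python recorder.

```python
# Minimal sketch of the Record idea, not Flashback itself: turn on full
# profiling, collect non-insert ops from system.profile, pull inserts from
# the oplog, and merge everything into one JSON ops file for replay.
# Host, db name, and file layout are assumptions.
import time
from datetime import datetime, timezone

from pymongo import MongoClient
from bson import json_util
from bson.timestamp import Timestamp

client = MongoClient("mongodb://localhost:27017", tz_aware=True)
db = client["appdata"]                       # hypothetical app database

db.command("profile", 2)                     # profile every operation

start_secs = int(time.time()) - 24 * 3600    # 24h recording window
start_ts = Timestamp(start_secs, 0)
start_dt = datetime.fromtimestamp(start_secs, tz=timezone.utc)

ops = []

# Queries, updates, removes, etc. come from the profile collection
for doc in db["system.profile"].find({"ts": {"$gte": start_dt}}):
    ops.append((doc["ts"].timestamp(), doc))

# Inserts come from the oplog
for entry in client["local"]["oplog.rs"].find({"ts": {"$gte": start_ts}, "op": "i"}):
    ops.append((entry["ts"].time, entry))

# Merge by time into a single ops file for the replay tool
with open("ops.json", "w") as f:
    for _, op in sorted(ops, key=lambda pair: pair[0]):
        f.write(json_util.dumps(op) + "\n")
```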

Recording

Base Snapshot

• Need to replay prod ops on prod data

• It’s best to play back ops on a consistent copy of the data, otherwise:

• inserts fail with duplicate key errors

• deletes become no-ops

• queries don't return the right data

• Using EBS snapshots, we grab a copy of the db during the recording

• Discard ops from before the snapshot (filter sketched below)
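A tiny sketch of that discard step, assuming one JSON op per line with a ts field; the snapshot time and the recorder's exact file layout are assumptions.

```python
# Keep only ops recorded at or after the moment the EBS snapshot was cut,
# so the replay starts from a state that matches the data on disk.
# SNAPSHOT_EPOCH and the one-op-per-line "ts" layout are assumptions.
from bson import json_util
from bson.timestamp import Timestamp

SNAPSHOT_EPOCH = 1_404_000_000          # hypothetical snapshot time (epoch s)

def op_seconds(op):
    ts = op["ts"]
    # oplog entries carry a BSON Timestamp, profile docs a datetime
    return ts.time if isinstance(ts, Timestamp) else ts.timestamp()

with open("ops.json") as src, open("ops.trimmed.json", "w") as dst:
    for line in src:
        if op_seconds(json_util.loads(line)) >= SNAPSHOT_EPOCH:
            dst.write(line)
```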

Recording Timeline

Base Snapshot

• Snapshot is restored to our benchmark server(s)

• EBS volume has to be “warmed” because snapshot blocks are not instantiated

• Multi-TB volumes can take a few hours to warm

• After warming we create an LVM snapshot

• We can "rewind" (merge) the snapshot after each playback, letting us iterate faster (see the sketch below)
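Roughly what the snapshot/rewind cycle looks like, sketched with Python's subprocess around standard LVM commands. The volume group, LV names, snapshot size, and mount point are assumptions.

```python
# Sketch of the LVM rewind trick: snapshot the restored data volume once,
# replay against the origin, then merge the snapshot back to revert before
# the next run. VG/LV names, sizes, and mount points are assumptions; stop
# mongod before rewinding, since the merge needs the origin unmounted.
import subprocess

VG, LV, SNAP = "vg0", "mongodata", "benchsnap"

def take_snapshot():
    subprocess.run(["lvcreate", "--snapshot", "--name", SNAP,
                    "--size", "200G", f"/dev/{VG}/{LV}"], check=True)

def rewind():
    subprocess.run(["umount", f"/dev/{VG}/{LV}"], check=True)
    subprocess.run(["lvconvert", "--merge", f"/dev/{VG}/{SNAP}"], check=True)
    subprocess.run(["mount", f"/dev/{VG}/{LV}", "/var/lib/mongodb"], check=True)
    take_snapshot()          # recreate the snapshot for the next iteration
```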

Playback

1. Freeze the LVM volume

2. Start the version of mongo being tested

3. Adjust replay parameters

• # workers

• # ops

• timestamp to start at (when base snapshot was taken)

4. Go!

5. Client-side results are logged to a file; server-side results are collected from monitoring tools (a simplified replay sketch follows)
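A much-simplified picture of what the replayer does; the real tool is Flashback's Go replayer, and the field names below follow the 2.x profiler/oplog shapes only loosely, so treat them as assumptions.

```python
# Simplified replay sketch: N workers pull recorded ops from the JSON file
# and run them against the target mongod as fast as possible, timing each op.
# Not Flashback's actual schema or CLI; assumes modifier-style updates.
import time
from concurrent.futures import ThreadPoolExecutor

from pymongo import MongoClient
from bson import json_util

WORKERS = 10
client = MongoClient("mongodb://bench-host:27017")   # hypothetical target
errors = []

def replay_one(op):
    db_name, _, coll_name = op["ns"].partition(".")
    coll = client[db_name][coll_name]
    start = time.monotonic()
    try:
        kind = op.get("op", "insert")
        if kind == "query":
            list(coll.find(op.get("query", {})))
        elif kind == "update":
            coll.update_one(op.get("query", {}), op["updateobj"])
        elif kind == "remove":
            coll.delete_many(op.get("query", {}))
        else:                                # oplog insert entry
            coll.insert_one(op["o"])
    except Exception as exc:                 # the error log is useful on its own
        errors.append(f"{op.get('ns')}: {exc}")
    return time.monotonic() - start

with open("ops.trimmed.json") as f:
    ops = [json_util.loads(line) for line in f]

t0 = time.monotonic()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    latencies = list(pool.map(replay_one, ops))

print(f"{len(ops) / (time.monotonic() - t0):.2f} ops/sec, {len(errors)} errors")
```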

Playback

Our Workload

• 24h of ops collected

• 10M ops at a time, as fast as possible

• 10 workers

• No warming of the replica set

• LVM snapshot reset, mongod restarted for each version

• Rinse and repeat for multiple replica sets (orchestration sketched below)
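Putting it together, the per-replica-set loop looks something like the sketch below. rewind() is the LVM helper sketched earlier and run_replay() is a hypothetical stand-in for the replayer, so both names and all paths are assumptions.

```python
# Hypothetical orchestration: for each mongod build under test, rewind the
# LVM snapshot, start that binary on the restored data, replay the ops file,
# then shut the server down. Paths and binary locations are assumptions.
import subprocess

BUILDS = {
    "2.4.10": "/opt/mongodb-2.4.10/bin/mongod",
    "2.6.3": "/opt/mongodb-2.6.3/bin/mongod",
}

for version, mongod in BUILDS.items():
    rewind()                                        # LVM reset (earlier sketch)
    subprocess.run([mongod, "--dbpath", "/var/lib/mongodb", "--fork",
                    "--logpath", f"/var/log/mongod-{version}.log"], check=True)
    run_replay(ops_file="ops.trimmed.json", workers=10, max_ops=10_000_000)
    subprocess.run([mongod, "--dbpath", "/var/lib/mongodb", "--shutdown"],
                   check=True)
```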

Our Results

2.4.10

3061.96 ops/sec (avg)

Results

2.6.3

2062.69 ops/sec (avg)

• 33% loss in throughput.

• A second workload showed a 75% drop in throughput

• 3669.73 ops/sec vs 975.64 ops/sec

• Ouch! What do we do next?

Results

         2.4.10 P99   2.4.10 MAX   2.6.3 P99   2.6.3 MAX
query    18.45 ms     20,953 ms    19.21 ms    60,001 ms
insert   23.5 ms      6,290 ms     50.29 ms    48,837 ms
update   21.87 ms     3,835 ms     21.79 ms    48,776 ms
FAM*     21.99 ms     6,159 ms     24.91 ms    49,254 ms

*FAM = findAndModify

Replay Data

Bug Hunt!

• Old-fashioned troubleshooting begins

• Began isolating query patterns and collections with high max times

• Reproduced issue, confirmed slowness in 2.6

• Lots of documentation and log gathering, including extremely verbose QLOG

• Started an investigation with the MongoDB team that ran for several weeks

What we found

• Basically, the new query planner in 2.6 meets the Parse auto-indexer

• We create lots of indexes automatically

• More indexes to score and potentially race

• Increased likelihood of running into query planner bugs

Example 1

Remove op on “Installation”

{ "installationId": {"$ne": ? }, "appIdentifier": "?", "deviceToken": “?”}

• 9M documents

• installationId is UUID, unique value

• "installationId": {"$ne": ? } matches most documents

• deviceToken is a unique token identifying the device

{ "installationId": {"$ne": ? }, "appIdentifier": "?", "deviceToken": “?”}

• Three candidate indexes:

{ installationId: 1, deviceToken: 1 }
{ deviceToken: 1, installationId: 1 }
{ deviceToken: 1 }

• The second and third indexes are clearly better candidates for this query, since the device token is a simple point lookup.

• MongoDB bug where the work required to skip keys was not factored into the plan ranking, causing the inefficient plan to sometimes tie

• Since it's a remove op, it held the write lock for the DB

• Fixed in: https://jira.mongodb.org/browse/SERVER-14311 (repro sketch below)
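A hedged repro sketch of this case. The database, collection, and field values are placeholders, and the explain output format differs between server versions; explain() lets you compare the optimizer's pick against the plan you get by hinting the deviceToken index.

```python
# Repro sketch for Example 1: a nearly-all-matching $ne clause plus a point
# lookup on deviceToken, with three candidate indexes. All names/values are
# placeholders.
from pymongo import MongoClient

coll = MongoClient()["appdata"]["Installation"]
coll.create_index([("installationId", 1), ("deviceToken", 1)])
coll.create_index([("deviceToken", 1), ("installationId", 1)])
coll.create_index([("deviceToken", 1)])

predicate = {
    "installationId": {"$ne": "some-uuid"},      # matches most documents
    "appIdentifier": "com.example.app",
    "deviceToken": "abc123",                     # effectively a point lookup
}

# Plan the optimizer picks (before SERVER-14311 the key-skipping work was
# not counted, so the installationId-first index could tie):
print(coll.find(predicate).explain())

# Plan when forced onto the selective index, for comparison:
print(coll.find(predicate).hint([("deviceToken", 1)]).explain())
```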

Example 2

Query on “Activity”:

{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }

• 25M documents

• _p_project and _p_newProject are pointers to unique IDs of other objects

• acl matches most documents

• Four candidate indexes for this query

{ _p_newProject: 1 }
{ _p_project: 1 }
{ _p_project: 1, _created_at: 1 }
{ acl: 1 }

{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }

• Query Planner would race multiple plans using indexes

• Due to a bug, one of the raced indexes would do a full index scan (acl)

• Index scan was non-yielding, tying up the lock until it had completed

• Parse query killer job kills non-yielding queries after 45s

• The query planner would fail to cache the plan and would re-plan on the next query with the same pattern

• Fixed: https://jira.mongodb.org/browse/SERVER-15152 (repro sketch below)
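A hedged repro sketch of the query shape and indexes involved; names and values are placeholders. On an affected 2.6 build, explain() surfaces the unselective acl index being raced.

```python
# Repro sketch for Example 2 (SERVER-15152): an $or over two selective
# pointer fields combined with an unselective acl $in, with four candidate
# indexes. All names/values are placeholders.
from pymongo import MongoClient

coll = MongoClient()["appdata"]["Activity"]
coll.create_index([("_p_newProject", 1)])
coll.create_index([("_p_project", 1)])
coll.create_index([("_p_project", 1), ("_created_at", 1)])
coll.create_index([("acl", 1)])

query = {
    "$or": [{"_p_project": "Project$abc"}, {"_p_newProject": "Project$abc"}],
    "acl": {"$in": ["a", "b", "c"]},             # matches most documents
}

# On an affected build, one raced candidate walked the entire acl index
# without yielding; explain() lists the winning and rejected plans.
print(coll.find(query).explain())
```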

Example 3

Query on "Activity": { $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } } (same as the previous example)

• Usually fast, but occasionally saw high nscanned and query time > 60s

• Since there were indexes on all fields in the AND condition, this was a candidate for index intersection

• planSummary: IXSCAN { _p_project: 1 }, IXSCAN { _p_newProject: 1 }, IXSCAN { acl: 1.0 }

• acl was not selective, but _p_project and _p_newProject would sometimes match 0 documents during the race

• The intersection-based query plan would get cached, making subsequent queries slow

• Fixed in https://jira.mongodb.org/browse/SERVER-14961 (plan-cache sketch below)
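Since the failure mode here was a bad cached plan, one hedged mitigation sketch is to inspect and flush the collection's plan cache with the commands MongoDB added in 2.6; database and collection names below are placeholders.

```python
# Mitigation sketch for Example 3: when an index-intersection plan got
# cached and subsequent queries turned slow, the plan cache commands added
# in 2.6 let you see and clear the offending entries. Names are placeholders.
from pymongo import MongoClient

db = MongoClient()["appdata"]

# List the query shapes with cached plans for Activity
print(db.command("planCacheListQueryShapes", "Activity"))

# Drop all cached plans for the collection so the next query re-plans
db.command("planCacheClear", "Activity")
```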

Success?

2.6.5

4443.10 ops/sec (vs 3061.96 in 2.4.10)

Comparison

         2.4.10 P99   2.4.10 MAX   2.6.4 P99   2.6.4 MAX   2.6.5 P99   2.6.5 MAX
query    18 ms        20,953 ms    19 ms       60,001 ms   10 ms       4,352 ms
insert   23 ms        6,290 ms     50 ms       48,837 ms   24 ms       2,225 ms
update   22 ms        3,835 ms     21 ms       48,776 ms   23 ms       4,535 ms
FAM      22 ms        6,159 ms     24 ms       49,254 ms   23 ms       4,353 ms

More Results

                  2.4.10           2.6.5
Ops:10M  W:10     3061 ops/sec     4443 ops/sec
Ops:10M  W:250    10666 ops/sec    12248 ops/sec
Ops:20M  W:1000   11735 ops/sec    14335 ops/sec

What now?

• 2.6 has a green light on performance

• Working through functionality testing

• Unit/integration testing is catching the majority of issues

• Bonus: the Flashback error log is helping us identify problems not caught by tests

Wrap Up

• Benchmarking with something representative of your production workload is worth the time

• Saved us from discovering slowness in production and the inevitable, painful rollbacks

• Using actual production data is even better

• Helped us avoid new bugs

• Learned a lot about our own service (indexing algorithms need some work)

• Initial work can be reused to efficiently test future versions

Questions?

• Flashback: https://github.com/ParsePlatform/flashback

• Links to bugs:

• https://jira.mongodb.org/browse/SERVER-14311

• https://jira.mongodb.org/browse/SERVER-15152

• https://jira.mongodb.org/browse/SERVER-14961