Date posted: 22-Jan-2018
How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
Antonios Giannopoulos
Database Administrator - ObjectRocket
Grant Killian
Sitecore Architect - Rackspace
Percona Live 2017
Agenda
We are going to discuss:
Key terms
- Introduction to Sitecore
- Introduction to MongoDB
Best Practices for MongoDB with Sitecore
Scaling Sitecore
Benchmarks
Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket
Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP
Sitecore ♥ MongoDB because . . .
● Unstructured document model is a better fit for
Sitecore analytics vs traditional database rows
● ∞ scalability
● Introduces key flexibility to the system
○ HTTP Session state
○ Optional repository for other Sitecore modules
○ 100% replacement for SQL Server (experimental)
■ $$$
MongoDB replica-set
A group of mongod processes that maintain the same dataset
Replica sets provide:
- Redundancy
- High availability
- Scaling
MongoDB replica-set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 on previous versions
A replica-set node may be either:
- Primary
- Secondary
- Arbiter
MongoDB replica-set
Asynchronous replication
- Delay between PRI and SECs
- SECs pull and apply operations
Automatic failover
- If a PRI fails a SEC takes its place
MongoDB replica-set
Best Practices
- Odd number of members
- Use same server specs
- Reliable network connections
- Size the oplog for your write volume
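The "odd number of members" advice follows from majority-based elections. A minimal sketch in plain JavaScript (the function names are illustrative, not a MongoDB API) of how many voting members are needed to elect a primary and how many failures a set can survive:

```javascript
// Illustrative helpers, not part of any MongoDB API.
// A primary can only be elected by a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be lost while a majority remains.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(
    n + " members: majority " + majority(n) +
    ", tolerates " + faultTolerance(n) + " failure(s)"
  );
}
```

A fourth member raises the required majority without raising the number of failures tolerated, which is why even-sized sets add cost but no extra availability.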
MongoDB Sharded Clusters
Consists of:
Mongos:
- A statement (query) router
- Connection interface for the driver - makes sharding transparent
Config Servers: Hold cluster metadata - location of the data
Shards: Contain a subset of the sharded data
MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica-sets
- Reliable network connections
- But most important… pick a shard key
Undoing a shard key choice might require downtime
MongoDB Sharded Clusters
What makes a good shard key:
- High cardinality
- No null values
- Immutable field(s)
- No monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality
Most important: choose a shard key according to your application requirements
MongoDB Storage Engines
MongoDB version 3.0 and higher supports:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- In Memory (Percona Server)
- Fractal Tree (Percona Server)
Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - “classic” HTTP session state
6. Shared_session - meta HTTP session state for contacts (engagement state for lifetime of interactions…)
Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance
Sitecore uses a different connection string per database:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />
Instances can be optimized according to their workload
Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard
Percona In-memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent
Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution
How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet sharding constraints
Scaling Sitecore - Sharding
From Sitecore documentation: “Sitecore calculates diskspace sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the diskspace”
Shard the interaction and contact collections for capacity.
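The sizing rule quoted above can be turned into a rough estimate. A minimal sketch, assuming the 5KB, 2.5KB and 80% figures from the quote; the helper name and the example volumes are made up for illustration:

```javascript
// Rule-of-thumb figures quoted from the Sitecore documentation above.
const KB_PER_INTERACTION = 5;
const KB_PER_CONTACT = 2.5;
const SHARE_OF_TOTAL = 0.8; // interactions + contacts are ~80% of disk space

// Hypothetical helper: estimate disk usage in GB (1 GB = 1024 * 1024 KB).
function estimateDiskGB(interactions, contacts) {
  const coreKB = interactions * KB_PER_INTERACTION + contacts * KB_PER_CONTACT;
  const totalKB = coreKB / SHARE_OF_TOTAL;
  return {
    coreGB: coreKB / (1024 * 1024),   // interaction + contact data only
    totalGB: totalKB / (1024 * 1024), // projected total for the database
  };
}

// Example volumes (invented): 100M interactions, 20M identified contacts.
const estimate = estimateDiskGB(100e6, 20e6);
console.log(estimate.coreGB.toFixed(1) + " GB core, " + estimate.totalGB.toFixed(1) + " GB total");
```

The estimate covers data only; indexes, oplog and replication copies would need to be added on top.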
Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40
Updates are using the _id
Queries are using:
- "_id, ContactId": 80%
- "ContactId, _t": 5%
- "ContactId, ContactVisitIndex": 15%
Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scales the vast majority of statements
- But… a few scatter-gather queries (around 20%)
{ContactId:1} is also decent, but:
- Updates on sharded collections MUST use the shard key (or {multi:true})
- _id is an exception to that rule
- _id is generated by the application, not the driver
- Potential for jumbo chunks
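The "around 20%" scatter-gather figure can be derived from the query mix on the previous slide. A minimal sketch, assuming equality filters and a single-field shard key (a simplification of how mongos routes queries); the function name is illustrative:

```javascript
// Query patterns on the interaction collection and their share of reads (%),
// taken from the previous slide.
const queryPatterns = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// A query can be routed to a single shard only if it filters on the shard
// key field; otherwise it is broadcast to every shard (scatter-gather).
function targetedShare(patterns, shardKeyField) {
  return patterns
    .filter((p) => p.fields.includes(shardKeyField))
    .reduce((sum, p) => sum + p.share, 0);
}

for (const key of ["_id", "ContactId"]) {
  const targeted = targetedShare(queryPatterns, key);
  console.log(key + ": " + targeted + "% targeted, " + (100 - targeted) + "% scatter-gather");
}
```

ContactId targets every read pattern, which is why it is "also decent"; the trade-off listed above is on the write side, since updates filter on _id only.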
Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP: _id:1 or _id:hashed
- WiredTiger: _id:1 or _id:hashed or ContactId:1
Sitecore may optimize sharding by including ContactId on the updates
Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20
Updates are using the _id
Queries are using the _id (with additional fields)
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed
Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed
Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed
Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed
Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
Client-generated _id values are typically monotonically increasing, so “hashed” is used to add randomness
Sitecore's _id is a .NET UUID (Universally Unique Identifier) stored as the BinData datatype
Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
>doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
>doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
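For readers without the helper script at hand, the conversion can be sketched outside the mongo shell. This is an illustration of the legacy C# GUID byte order used with BinData subtype 3, not the code from uuidhelpers.js; it assumes Node.js (for the global Buffer) and the function name is made up:

```javascript
// Legacy C# GUID layout in BinData subtype 3: the first three groups
// (4, 2 and 2 bytes) are stored little-endian, the last 8 bytes as-is.
// Assumes Node.js; base64ToCSharpGuid is an illustrative name.
function base64ToCSharpGuid(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) {
    throw new Error("expected 16 bytes, got " + bytes.length);
  }
  const hex = (buf) => buf.toString("hex");
  // Copy the slice before reversing so the source bytes stay untouched.
  const reversed = (start, end) =>
    hex(Buffer.from(bytes.subarray(start, end)).reverse());
  return [
    reversed(0, 4),
    reversed(4, 6),
    reversed(6, 8),
    hex(bytes.subarray(8, 10)),
    hex(bytes.subarray(10, 16)),
  ].join("-");
}

console.log(base64ToCSharpGuid("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The output matches the CSUUID value shown in the shell session above.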
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Using numInitialChunks allows you to pre-split and distribute empty chunks.
- Avoids chunk splits
- Avoids chunk moves
db.adminCommand( { shardCollection: <collection>, key: {_id:"hashed"}, numInitialChunks: <number> } )
where <number> < 8192 per shard.
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Define numInitialChunks:
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)
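The formula above is easy to wrap in a small helper. A minimal sketch that transcribes the rule of thumb directly; the function name and the rounding up to whole chunks are assumptions of this example, and the inputs are the projected size and document count of the collection:

```javascript
// Transcription of the slide's rule of thumb. The helper name and the
// rounding up to whole chunks are assumptions of this sketch.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const bySize = collectionSizeMB / 32;     // Size
  const byCount = documentCount / 125000;   // Count
  const limit = shardCount * 8192;          // Limit
  return Math.min(Math.ceil(Math.max(bySize, byCount)), limit);
}

// Example (invented figures): 200 GB projected, 50M documents, 3 shards.
console.log(numInitialChunks(200 * 1024, 50e6, 3)); // 6400
```

The result is the value to pass as numInitialChunks in the shardCollection command shown on the previous slide.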
Scaling Sitecore - Sharding
Move Primary
Move each Sitecore database to a different shard:
(analytics, tracking_live …)
db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )
Requires downtime for live databases
Scaling Sitecore – Secondary Reads
You can configure Secondary Reads from the driver (secondary or secondaryPreferred)
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />
In 3.4 maxStalenessSeconds was introduced to control stale reads
It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
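The two options can be combined in one connection string. A minimal sketch of a helper that appends them to a URI; readPreference and maxStalenessSeconds are standard connection string options, while the helper name, host and database are placeholders invented for this example:

```javascript
// Hypothetical helper that appends read-preference options to a MongoDB URI.
function withSecondaryReads(baseUri, maxStalenessSeconds) {
  const params = ["readPreference=secondaryPreferred"];
  if (maxStalenessSeconds !== undefined) {
    // MongoDB rejects maxStalenessSeconds values below 90 seconds.
    if (maxStalenessSeconds < 90) {
      throw new RangeError("maxStalenessSeconds must be at least 90");
    }
    params.push("maxStalenessSeconds=" + maxStalenessSeconds);
  }
  const separator = baseUri.includes("?") ? "&" : "?";
  return baseUri + separator + params.join("&");
}

console.log(withSecondaryReads("mongodb://mongo01:27017/analytics", 120));
// mongodb://mongo01:27017/analytics?readPreference=secondaryPreferred&maxStalenessSeconds=120
```

When the resulting string is pasted into an XML configuration file, each "&" has to be written as "&amp;".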
Scaling Sitecore – Secondary Reads
Use ReplicaSet Tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability
conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)
Set readPreferenceTags on the connection string:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics" />
Order matters when setting multiple tags
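Why order matters can be shown with a simplified model of tag-set matching: tag sets are tried in the order given, and the first one that matches at least one member decides where reads go. The sketch below ignores the read preference mode and latency window that a real driver also applies; the hosts and tag values are invented:

```javascript
// Simplified model of tag-set matching; not a driver API.
function selectByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matches = members.filter((m) =>
      Object.entries(tagSet).every(([key, value]) => m.tags[key] === value)
    );
    if (matches.length > 0) {
      return matches.map((m) => m.host);
    }
  }
  return []; // no eligible member: the read cannot be served
}

const members = [
  { host: "mongo01", tags: { db: "analytics" } },
  { host: "mongo02", tags: { db: "session" } },
  { host: "mongo03", tags: {} },
];

console.log(selectByTagSets(members, [{ db: "analytics" }, { db: "session" }])); // [ 'mongo01' ]
console.log(selectByTagSets(members, [{ db: "session" }, { db: "analytics" }])); // [ 'mongo02' ]
console.log(selectByTagSets(members, [{ db: "reporting" }])); // []
```

The last call illustrates the "reduces availability" point: if no member carries a matching tag, there is nowhere to send the read. An empty tag set ({}) as the final entry matches any member and acts as a fallback.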
Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity
Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region-based tags
- Writes must go to the Primary
- Requires at least one secondary per region
Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region-based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add location to shard key as prefix – change the source code
Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from one MongoDB cluster to another MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
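"Reads and replicates oplog operations" can be pictured with a toy model. The sketch below is not mongo-connector code: it replays oplog-shaped entries against an in-memory Map standing in for the target cluster, handles only inserts ("i") and deletes ("d"), and uses an illustrative function name:

```javascript
// Toy model of replaying oplog entries on a target; not mongo-connector code.
function applyOplogEntry(target, entry) {
  if (!target.has(entry.ns)) {
    target.set(entry.ns, new Map());
  }
  const collection = target.get(entry.ns);
  switch (entry.op) {
    case "i": // insert: "o" holds the full document, including its _id
      collection.set(entry.o._id, entry.o);
      break;
    case "d": // delete: "o" holds the _id of the removed document
      collection.delete(entry.o._id);
      break;
    default:
      throw new Error("unsupported op: " + entry.op);
  }
}

const target = new Map();
applyOplogEntry(target, { op: "i", ns: "foo.foo", o: { _id: 1, a: 1 } });
console.log(target.get("foo.foo").get(1)); // { _id: 1, a: 1 }
```

Because the oplog entry carries the _id chosen on the source, replaying the same insert twice leaves the target in the same state.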
Scaling Sitecore – Connector
[Diagram: source oplog and target oplog]
db.foo.insert({a:1})
db.foo.insert({_id:1, a:1})
{ "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }
Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)
Results: WiredTiger is 9.5% faster
Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})
Results: WiredTiger is 9.4% faster
So what?
- Evaluate your MongoDB architecture to determine if it
would benefit from scaling
- If scaling is in order, consider this talk as a
reference
- Recognize how MongoDB’s versatility makes it
relevant to a wide variety of applications
What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or
cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with
MongoDB
- Re-invent our benchmarks
Questions?
Thank you!!!
@iamantonios
@sitecoreagent