
How Sitecore depends on MongoDB for scalability and performance, and what it can teach you

Date post: 22-Jan-2018
Upload: antonios-giannopoulos
Transcript

How Sitecore depends on MongoDB for scalability and performance, and what it can teach you

Antonios Giannopoulos

Database Administrator – ObjectRocket

Grant Killian

Sitecore Architect – Rackspace

Percona Live 2017

Agenda

We are going to discuss:

Key terms

- Introduction to Sitecore

- Introduction to MongoDB

Best Practices for MongoDB with Sitecore

Scaling Sitecore

Benchmarks

Who We Are

Antonios Giannopoulos

Database Administrator w/ ObjectRocket

Grant Killian

Sitecore Architect w/ Rackspace

Sitecore MVP

Sitecore Architecture

Minimum necessary to understand this talk

Figure: Gartner Magic Quadrant for WCM (Web Content Management), Sept 2016

Sitecore is a framework for building websites...

Sitecore ♥ MongoDB because . . .

● The unstructured document model is a better fit for Sitecore analytics than traditional database rows

● ∞ scalability

● Introduces key flexibility to the system

○ HTTP Session state

○ Optional repository for other Sitecore modules

○ 100% replacement for SQL Server (experimental)

■ $$$

MongoDB replica-set

A group of mongod processes that maintain the same dataset

Replica sets provide:

- Redundancy

- High availability

- Scaling

MongoDB replica-set

Consists of at least 3 nodes:

- Up to 50 nodes in 3.0 and higher (at most 7 of them voting)

- Up to 12 nodes in previous versions

A replica-set node may be one of:

- Primary

- Secondary

- Arbiter
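To make the three roles concrete, here is a minimal Node.js sketch that assembles the configuration document one might pass to `rs.initiate()` in the mongo shell for two data-bearing members plus an arbiter. The set name `rs0`, the host names and the helper name are hypothetical; the helper only builds the document and does not contact a server.

```javascript
// Build a replica set configuration document of the shape accepted by
// rs.initiate() in the mongo shell. Set name and hosts are hypothetical.
function buildReplicaSetConfig(setName, dataHosts, arbiterHost) {
  if (dataHosts.length < 2) {
    throw new Error("need at least two data-bearing members");
  }
  // Every data-bearing member can become Primary or Secondary.
  const members = dataHosts.map((host, i) => ({ _id: i, host }));
  // An optional arbiter votes in elections but holds no data.
  if (arbiterHost) {
    members.push({ _id: members.length, host: arbiterHost, arbiterOnly: true });
  }
  return { _id: setName, members };
}

const cfg = buildReplicaSetConfig(
  "rs0",
  ["mongo1.example.net:27017", "mongo2.example.net:27017"],
  "arbiter.example.net:27017"
);
console.log(JSON.stringify(cfg));
```

In the shell, the same object would be passed as `rs.initiate(cfg)`. An arbiter keeps the member count odd without storing data, at the cost of one fewer copy of the dataset.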

MongoDB replica-set

Asynchronous replication

- Possible delay (replication lag) between the PRI and the SECs

- SECs pull operations from the oplog and apply them

Automatic failover

- If the PRI fails, a SEC is elected to take its place

MongoDB replica-set

Best Practices

- Odd number of members

- Use the same server specs for all members

- Reliable network connections

- Size the oplog according to your write volume
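The "odd number of members" advice comes from election arithmetic: a primary can only be elected with votes from a strict majority of voting members. This small sketch (plain JavaScript, assuming every member has exactly one vote) shows that going from 3 to 4 members raises the majority without tolerating an extra failure.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 3 members: majority 2, tolerates 1 failure(s)
// 4 members: majority 3, tolerates 1 failure(s)
// 5 members: majority 3, tolerates 2 failure(s)
```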

MongoDB Sharded Clusters

Consists of:

Mongos

- A statement (query) router

- The connection interface for the driver; makes sharding transparent

Config Servers

- Hold cluster metadata (the location of the data)

Shards

- Contain a subset of the sharded data


MongoDB Sharded Clusters

Best Practices

- Deploy shards as replica-sets

- Reliable network connections

- But most important… pick the right shard key

Undoing a shard key choice might require downtime

MongoDB Sharded Clusters

What makes a good shard key:

- High cardinality

- Non-null values

- Immutable field(s)

- Not monotonically increasing fields

- Even read/write distribution

- Even data distribution

- Read targeting/locality

Most important: choose a shard key according to your application requirements
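Some of these criteria can be checked against a sample of existing documents before committing to a key. The sketch below is hypothetical (the function name and sample are invented for illustration): it reports cardinality, missing/null values and the share held by the most frequent value for one candidate field. It says nothing about monotonicity, mutability or query targeting, which still have to be judged from the application.

```javascript
// Summarise one candidate shard-key field over a sample of documents.
function profileShardKeyField(docs, field) {
  const counts = new Map();
  let nulls = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) {
      nulls += 1; // missing or null key values are undesirable
      continue;
    }
    const key = JSON.stringify(value);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const topCount = Math.max(0, ...counts.values());
  return {
    cardinality: counts.size, // distinct non-null values
    nulls, // documents without a usable value
    topShare: docs.length ? topCount / docs.length : 0, // skew indicator
  };
}

const sample = [
  { _id: 1, ContactId: "a" },
  { _id: 2, ContactId: "a" },
  { _id: 3, ContactId: "a" },
  { _id: 4, ContactId: "b" },
];
console.log(profileShardKeyField(sample, "_id")); // cardinality 4, topShare 0.25
console.log(profileShardKeyField(sample, "ContactId")); // cardinality 2, topShare 0.75
```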

MongoDB Storage Engines

MongoDB version 3.0 and higher supports:

- MMAPv1

- WiredTiger

- RocksDB (Percona Server)

- In Memory (Percona Server)

- Fractal Tree (Percona Server)

Sitecore MongoDB Databases

1. Analytics - customer visit metrics (IP address, browser, pages…)

2. Tracking_contact - contact processing

3. Tracking_history - history worker queue for full rebuilds

4. Tracking_live - task queue for real-time processing

5. Private_session - “classic” HTTP session state

6. Shared_session - meta HTTP session state for contacts (engagement state for the lifetime of interactions…)

For example . . .

Graphic courtesy of http://www.techphoria414.com

Scaling Sitecore – Separate Workloads

Move each Sitecore database to a separate instance

Sitecore uses a different connection string per database:

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />

connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />

Instances can be optimized according to their workload

Scaling Sitecore – Polyglot

Use a different storage engine per database:

- Different instances

- Sharded clusters, with different storage engines per shard

The Percona In-Memory storage engine is a good fit for the _sessions databases:

- Based on the in-memory storage engine used in MongoDB Enterprise Edition

- _sessions data are not persistent

Scaling Sitecore - Sharding

What to shard:

- Large collections, for capacity

- Busy collections, for load distribution

How to pick a shard key:

- Collect a representative statement sample and identify statement patterns

- Pick a shard key that scales the workload/statements

- Meet the sharding constraints
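Identifying statement patterns usually means reducing each sampled query to the set of fields it filters on and counting how often each shape occurs. A minimal sketch, assuming the sample has already been collected (for example from the database profiler) as plain filter objects; the helper name and the sample below are hypothetical:

```javascript
// Reduce each sampled filter to its sorted list of top-level field names
// and report what percentage of the sample each shape represents.
function statementPatterns(filters) {
  const counts = new Map();
  for (const filter of filters) {
    const shape = Object.keys(filter).sort().join(",");
    counts.set(shape, (counts.get(shape) || 0) + 1);
  }
  const result = {};
  for (const [shape, n] of counts) {
    result[shape] = Math.round((100 * n) / filters.length);
  }
  return result;
}

const sampledFilters = [
  { _id: 1, ContactId: 7 },
  { ContactId: 7, _id: 2 }, // same shape as above, different field order
  { ContactId: 7, ContactVisitIndex: 3 },
  { ContactId: 8, _t: "Interaction" },
];
console.log(statementPatterns(sampledFilters));
// shapes: "ContactId,_id" 50, "ContactId,ContactVisitIndex" 25, "ContactId,_t" 25
```

Applied to a real sample, this kind of tally is what produces a breakdown like the 80/5/15 split shown for the Interaction collection.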

Scaling Sitecore - Sharding

From Sitecore documentation: “Sitecore calculates diskspace sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the diskspace”

Shard the interaction and contact collections for capacity.
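The Sitecore rule of thumb quoted above can be turned into a quick estimate. This sketch assumes 1 KB = 1024 bytes and reads the 80% figure as the share of total disk used by interactions and contacts, so the remaining 20% is added on top; the input volumes are hypothetical.

```javascript
const KB = 1024;

// Estimate analytics disk usage from the quoted Sitecore sizing figures:
// 5 KB per interaction, 2.5 KB per identified contact, with those two
// items accounting for 80% of the total disk space.
function estimateDiskBytes(interactions, identifiedContacts) {
  const interactionBytes = interactions * 5 * KB;
  const contactBytes = identifiedContacts * 2.5 * KB;
  const coreBytes = interactionBytes + contactBytes;
  return coreBytes / 0.8; // scale up so core data is 80% of the total
}

// Hypothetical volume: 10 million interactions, 2 million identified contacts.
const bytes = estimateDiskBytes(10000000, 2000000);
console.log((bytes / KB ** 3).toFixed(1) + " GB"); // 65.6 GB
```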

Scaling Sitecore - Sharding

Collection Interaction

Receives: Inserts, Queries and Updates

Read/Write Ratio: 60-40

Updates are using the _id

Queries are using:

"_id, ContactId": 80%

"ContactId, _t": 5%

"ContactId, ContactVisitIndex": 15%

Scaling Sitecore - Sharding

Collection Interaction

Recommended shard key is _id:1 or _id:hashed

- Scales the vast majority of statements

- But… leaves some scatter-gather queries (around 20%)

{ContactId:1} is also decent, but:

- Single-document updates on sharded collections MUST include the shard key (or use {multi:true}); _id is an exception to that rule

- _id is generated by the application, not the driver

- Potential for jumbo chunks: all interactions of one contact share the same ContactId, so a chunk holding a very active contact cannot be split
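The "around 20%" figure can be reproduced from the query mix on the previous slide. In this simplified model (an assumption for illustration, not how mongos is implemented), a query is targeted when its filter contains every field of the shard key and is otherwise broadcast to all shards:

```javascript
// Query shapes observed on the Interaction collection, with their share
// of all queries (taken from the previous slide).
const interactionQueries = [
  { fields: ["_id", "ContactId"], share: 0.8 },
  { fields: ["ContactId", "_t"], share: 0.05 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 0.15 },
];

// Simplified model: a query can be routed to a single shard only if its
// filter contains every field of the shard key; otherwise mongos has to
// scatter it to all shards.
function targetedShare(queries, shardKeyFields) {
  return queries
    .filter((q) => shardKeyFields.every((f) => q.fields.includes(f)))
    .reduce((sum, q) => sum + q.share, 0);
}

console.log(targetedShare(interactionQueries, ["_id"])); // 0.8
console.log(targetedShare(interactionQueries, ["ContactId"])); // all three shapes
```

Under this model {ContactId:1} targets every query shape, which is why it remains attractive despite the update and jumbo-chunk caveats above.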

Scaling Sitecore - Sharding

Collection Interaction

Choose your shard key according to your storage engine:

- MMAPv1: _id:1 or _id:hashed

- WiredTiger: _id:1, _id:hashed or ContactId:1

Sitecore may optimize sharding by including ContactId in the updates

Scaling Sitecore - Sharding

Collection Contacts

Receives: Inserts, Queries and Updates

Read/Write Ratio: 80-20

Updates are using the _id

Queries are using the _id (with additional fields)

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding

Collection Devices

Recommended shard key is _id:1 or _id:hashed

Collection ClassificationsMap

Recommended shard key is _id:1 or _id:hashed

Collection KeyBehaviorCache

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding

Collection GeoIps

Recommended shard key is _id:1 or _id:hashed

Collection OperationStatuses

Recommended shard key is _id:1 or _id:hashed

Collection ReferringSites

Recommended shard key is _id:1 or _id:hashed
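Because the recommendation is the same for every collection listed above, the commands can be generated in a loop. This sketch only builds the command documents. The database name `analytics` is an assumption (use whatever name your Sitecore connection string points to), and the collection names are copied from the slides, so check them against your own analytics database. Each document would then be run with `db.adminCommand()` through a mongos, after `sh.enableSharding()` on that database.

```javascript
// Collections from the slides above that share the _id-based recommendation.
const collections = [
  "Interaction", "Contacts", "Devices", "ClassificationsMap",
  "KeyBehaviorCache", "GeoIps", "OperationStatuses", "ReferringSites",
];

// Build one shardCollection command document per collection.
// keyType is 1 for a ranged key or "hashed" for a hashed key.
function shardCommands(dbName, names, keyType) {
  return names.map((name) => ({
    shardCollection: `${dbName}.${name}`,
    key: { _id: keyType },
  }));
}

const commands = shardCommands("analytics", collections, "hashed");
console.log(commands[0]);
// { shardCollection: 'analytics.Interaction', key: { _id: 'hashed' } }
```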

Scaling Sitecore - Sharding

{_id:1} vs {_id:hashed}

Client-generated _id values increase monotonically, so "hashed" is added for randomness

Sitecore's _id is a .NET UUID (Universally Unique Identifier) stored as the BinData datatype

Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")

Scaling Sitecore - Sharding

{_id:1} vs {_id:hashed}

You may use the uuidhelpers.js utility to convert _id to UUID

Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js

>doc = db.test.findOne()

{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }

>doc._id.toCSUUID()

CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
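For readers who want to see what `toCSUUID()` does, this Node.js sketch reimplements the conversion under the assumption that the BinData(3) payload uses the C# legacy byte order (the first three GUID groups byte-swapped), which matches the example above. It is not the uuidhelpers.js code itself, and the function names are hypothetical.

```javascript
// Convert a BinData(3) base64 payload in C# legacy byte order to the GUID
// string .NET would print, and back again. Runs in Node.js (uses Buffer).
const SWAP = [3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15];

function base64ToCSUUID(b64) {
  const bytes = Buffer.from(b64, "base64");
  if (bytes.length !== 16) throw new Error("expected 16 bytes");
  // The first 4, next 2 and next 2 bytes are stored little-endian by .NET.
  const hex = Buffer.from(SWAP.map((i) => bytes[i])).toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20)].join("-");
}

function csuuidToBase64(uuid) {
  const ordered = Buffer.from(uuid.replace(/-/g, ""), "hex");
  if (ordered.length !== 16) throw new Error("expected 32 hex digits");
  // SWAP is its own inverse, so the same permutation restores storage order.
  return Buffer.from(SWAP.map((i) => ordered[i])).toString("base64");
}

console.log(base64ToCSUUID("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The reverse direction is handy when a GUID copied from Sitecore has to be turned into a `BinData(3, "...")` literal for a shell query.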

Scaling Sitecore - Sharding

Use {_id:"hashed"} when you have an empty collection

Using numInitialChunks lets you pre-split and distribute empty chunks:

- Avoid chunk splits

- Avoid chunk moves

db.adminCommand( { shardCollection: "<database>.<collection>", key: { _id: "hashed" }, numInitialChunks: <number> } )

numInitialChunks must result in fewer than 8192 chunks per shard.

Define numInitialChunks:

Size = Collection size (in MB) / 32

Count = Number of documents / 125000

Limit = Number of shards * 8192

numInitialChunks = Min(Max(Size, Count), Limit)
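The sizing rule above can be written as a small function. This is a sketch: rounding up to a whole number of chunks is our assumption, the namespace and volumes are hypothetical, and since MongoDB's documentation requires fewer than 8192 chunks per shard, you may prefer to stay slightly below the slide's Limit.

```javascript
// Sizing rule from the slide above for pre-splitting an empty collection
// sharded on {_id: "hashed"}.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const size = collectionSizeMB / 32; // chunks by data volume
  const count = documentCount / 125000; // chunks by document count
  const limit = shardCount * 8192; // cap as given on the slide
  // Rounding up to a whole number of chunks is an assumption of this sketch.
  return Math.ceil(Math.min(Math.max(size, count), limit));
}

// Build the command document for db.adminCommand(); the namespace is hypothetical.
function hashedShardCommand(namespace, chunks) {
  return { shardCollection: namespace, key: { _id: "hashed" }, numInitialChunks: chunks };
}

// Hypothetical projection: 100 GB, 20 million documents, 4 shards.
const n = numInitialChunks(100 * 1024, 20000000, 4);
console.log(n); // 3200
console.log(hashedShardCommand("analytics.Interaction", n));
```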

Scaling Sitecore - Sharding

Move Primary

Move each Sitecore database to a different shard:

(analytics, tracking_live …)

db.adminCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )

Requires downtime for live databases
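When several Sitecore databases are spread across shards, the movePrimary commands can be generated from a placement plan. The sketch below uses simple round-robin placement, which is an assumption (a real plan would more likely follow each database's workload); the helper, database and shard names are hypothetical. Each resulting document would be run with `db.adminCommand()` during a maintenance window, given the downtime noted above.

```javascript
// Spread databases across shards round-robin and build the movePrimary
// command for each. Database and shard names are hypothetical.
function movePrimaryPlan(databases, shards) {
  return databases.map((db, i) => ({
    movePrimary: db,
    to: shards[i % shards.length],
  }));
}

const plan = movePrimaryPlan(
  ["analytics", "tracking_live", "tracking_history", "tracking_contact"],
  ["shard0000", "shard0001"]
);
plan.forEach((cmd) => console.log(JSON.stringify(cmd)));
```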

Scaling Sitecore – Secondary Reads

You can configure secondary reads from the driver (secondary or secondaryPreferred):

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />

In 3.4, maxStalenessSeconds was introduced to control stale reads.

It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations.
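The read preference options can also be combined in a single URI. This hedged sketch builds such a string and encodes two rules from the server selection specification as we understand them: maxStalenessSeconds cannot be combined with readPreference=primary, and it has a floor of 90 seconds. The helper name and the host, set and database names are hypothetical.

```javascript
// Assemble a MongoDB connection string for secondary reads.
// Host, replica set and database names are hypothetical placeholders.
function secondaryReadUri({ hosts, database, replicaSet, readPreference, maxStalenessSeconds }) {
  if (readPreference === "primary" && maxStalenessSeconds !== undefined) {
    throw new Error("maxStalenessSeconds cannot be combined with readPreference=primary");
  }
  if (maxStalenessSeconds !== undefined && maxStalenessSeconds < 90) {
    throw new Error("maxStalenessSeconds must be at least 90 seconds");
  }
  const options = [`replicaSet=${replicaSet}`, `readPreference=${readPreference}`];
  if (maxStalenessSeconds !== undefined) {
    options.push(`maxStalenessSeconds=${maxStalenessSeconds}`);
  }
  return `mongodb://${hosts.join(",")}/${database}?${options.join("&")}`;
}

console.log(secondaryReadUri({
  hosts: ["mongo1.example.net:27017", "mongo2.example.net:27017"],
  database: "analytics",
  replicaSet: "rs0",
  readPreference: "secondaryPreferred",
  maxStalenessSeconds: 120,
}));
// mongodb://mongo1.example.net:27017,mongo2.example.net:27017/analytics?replicaSet=rs0&readPreference=secondaryPreferred&maxStalenessSeconds=120
```

Inside Sitecore's ConnectionStrings.config the value sits in an XML attribute, so each `&` separator would be written as `&amp;`.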

Scaling Sitecore – Secondary Reads

Use replica set tags to target reads:

- Direct reads to specific replica set nodes

- Reduces availability

conf = rs.conf();

conf.members[0].tags = {"db": "analytics"}

rs.reconfig(conf)

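Set readPreferenceTags on the connection string (tags are written as key:value pairs, require a non-primary read preference, and the & is escaped as &amp; inside the XML attribute):

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics" />

Order matters when setting multiple tags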
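Why order matters: when several tag sets are given (in a URI, by repeating readPreferenceTags), the driver tries them in order and uses the first one that matches at least one eligible member; an empty tag set matches any member. The following sketch models only that rule and ignores latency and staleness filtering; the function name, member hosts and tags are hypothetical.

```javascript
// Illustrative model of how an ordered list of read preference tag sets
// is applied: the first tag set that matches any eligible secondary wins.
function selectByTagSets(secondaries, tagSets) {
  for (const tagSet of tagSets) {
    const matches = secondaries.filter((member) =>
      Object.entries(tagSet).every(([key, value]) => member.tags[key] === value)
    );
    if (matches.length > 0) return matches; // later tag sets are ignored
  }
  return []; // no tag set matched: no eligible member for this read
}

const secondaries = [
  { host: "mongo2:27017", tags: { db: "analytics", dc: "east" } },
  { host: "mongo3:27017", tags: { db: "sessions", dc: "west" } },
];

// Prefer analytics-tagged nodes, fall back to any secondary via {}.
console.log(selectByTagSets(secondaries, [{ db: "analytics" }, {}]).map((m) => m.host));
// [ 'mongo2:27017' ]
```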

Scaling Sitecore – Multi Region

Challenges:

- Direct reads to the closest node

- Direct writes to the closest node

- Single database entity for reporting

- Minimum complexity

Scaling Sitecore – Multi Region

Replica Set:

- Target reads using the nearest read preference

- Target reads using region-based tags

- Writes must go to the Primary

- Requires at least one secondary per region
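For the "nearest" read preference, the driver measures round-trip times and considers every eligible member whose latency is within a window (localThresholdMS, 15 ms by default) of the fastest one, then picks one of them at random. This sketch models only the latency window, with a hypothetical helper name, hosts and timings, and shows why each region needs its own secondary: without one, every candidate is remote.

```javascript
// Model of "nearest" server selection: find the lowest round-trip time,
// then keep every member within the latency window of it.
function nearestCandidates(members, localThresholdMS = 15) {
  if (members.length === 0) return [];
  const fastest = Math.min(...members.map((m) => m.rttMS));
  return members
    .filter((m) => m.rttMS <= fastest + localThresholdMS)
    .map((m) => m.host);
}

// Hypothetical round-trip times as seen from an application server in Europe.
const members = [
  { host: "us-east-1:27017", rttMS: 85 },
  { host: "eu-west-1:27017", rttMS: 4 },
  { host: "eu-west-2:27017", rttMS: 12 },
];
console.log(nearestCandidates(members)); // [ 'eu-west-1:27017', 'eu-west-2:27017' ]
```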

Scaling Sitecore – Multi Region

Sharded cluster:

- Target reads using the nearest read preference

- Target reads using region-based tags

- Requires at least one secondary per region

- Writes must go to the Primaries

- Tags or Zones are based on shard key ranges

- Add the location to the shard key as a prefix (requires changing the source code)

Scaling Sitecore – Multi Region

Mongo to Mongo connector:

- Creates a pipeline from one MongoDB cluster to another MongoDB cluster

- Reads and replicates oplog operations

- Easy deployment

mongo-connector -m <name:port> -t <name:port> -d mongo_doc_manager

(-d selects the doc manager, here the one for a MongoDB target, not a database)

Scaling Sitecore – Connector

oplog → oplog

db.foo.insert({a:1})

db.foo.insert({_id:1, a:1})

{ "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }
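To illustrate "reads and replicates oplog operations", this toy sketch replays oplog-style entries into an in-memory Map standing in for the target collection. It is not mongo-connector's implementation: the function name is hypothetical, and it handles only inserts, deletes, full-document replacements and $set updates, ignoring namespaces, commands and every other update operator.

```javascript
// Toy replay of oplog entries into an in-memory "target collection"
// (a Map keyed by _id), mimicking what a connector does for each entry.
function applyOplogEntry(target, entry) {
  switch (entry.op) {
    case "i": // insert: "o" holds the full document, including _id
      target.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: "o2" identifies the document, "o" describes the change
      const id = entry.o2._id;
      const current = target.get(id);
      if (current === undefined) break;
      if (entry.o.$set) {
        target.set(id, { ...current, ...entry.o.$set });
      } else {
        target.set(id, { ...entry.o, _id: id }); // replacement document
      }
      break;
    }
    case "d": // delete: "o" holds the _id of the removed document
      target.delete(entry.o._id);
      break;
    default:
      // "n" (no-op), "c" (command) and anything else are ignored here.
      break;
  }
}

const target = new Map();
const oplog = [
  { op: "i", ns: "foo.foo", o: { _id: 1, a: 1 } },
  { op: "u", ns: "foo.foo", o2: { _id: 1 }, o: { $set: { a: 2 } } },
  { op: "i", ns: "foo.foo", o: { _id: 2, a: 5 } },
  { op: "d", ns: "foo.foo", o: { _id: 2 } },
];
oplog.forEach((entry) => applyOplogEntry(target, entry));
console.log([...target.values()]); // [ { _id: 1, a: 2 } ]
```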

Scaling Sitecore – Multi Region

Mongo to Mongo Connector


Benchmarks

Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)

Results: WiredTiger is 9.5% faster

Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})

Results: WiredTiger is 9.4% faster

So what?

- Evaluate your MongoDB architecture to determine whether it would benefit from scaling

- If scaling is in order, consider this talk as a reference

- Recognize how MongoDB’s versatility makes it relevant to a wide variety of applications

What's next?

- Test MongoRocks (Percona Server) against Sitecore

- Test In-Memory (Percona Server) for sessions or cache(s)

- Expand sharding recommendations on add-ons

- Evaluate other Sitecore modules for suitability with MongoDB

- Re-invent our benchmarks

We’re Hiring! Looking to join a dynamic & innovative team?

Justine is here at Percona Live 2017; reach out directly to our recruiter at [email protected]

Questions?

Thank you!!!

[email protected]

@iamantonios

🍍

[email protected]

@sitecoreagent

