Date post: | 28-Nov-2014 |
Category: |
Technology |
Upload: | mongodb |
Preparing for Peak Holiday Season: A Seamless Customer Experience!
Rebecca Bucnis, Global Business Architect & Strategist, MongoDB (@rebeccabucnis)
Antoine Girbal, Principal Solutions Engineer, MongoDB (@antoinegirbal)
2
1. How is Peak Season shaping up this year?
2. How does MongoDB scale to support your business?
3. How do you capture the holiday Digital Customer Experience with MongoDB?
3 Questions for this Session?
3
MongoDB Speakers
About Rebecca:
Rebecca Bucnis
Global Business Architect
- Business Strategy
- Using data for business value
- Former Retailer
Washington, DC
@rebeccabucnis
About Antoine:
Antoine Girbal
Principal Solutions Engineer
- Original team of MongoDB
- Engineer
- Solution Designer
Palo Alto, CA
@antoinegirbal
4
• Consumers more positive
• Increased spending (+25%*)
• Extended holiday buying window (with fewer days) starts 6pm
What to expect - Holiday Season 2014
* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans
5
• Cyber Monday bigger than “Black Friday”
• Amazon has opened “stores” for returns
• 58%* of shoppers will shop with online retailers only
What to expect - Holiday Season 2014
* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans
6
• Consumers want the message right (*43% will defect when irrelevant)
• Price, Convenience, relevance & entertainment
• Collect immediate & longer term shopping behavior for action
The Opportunity - Holiday Season 2014
* From Gigya Personalization Study 2014 State of Consumer Privacy & Personalization
7
• A document model (holds mixed, variant data)
• Ability to add new & different data (agility)
• Ability to ask real-time questions of up-to-the-moment data (complex queries & in-place updates)
• Geo-Location built-in
• Power of traditional databases (full consistency, durability, atomic operations)
• Near linear expansion (scaling via sharding)
• MongoDB is a unique fit for frictionless retail
The System of Engagement for Retail
8
“Global Product 360”
Themes: Up to date product details – with minimal down time; Images, reviews;
Vendor and order management;
Use Cases: Modern, Seamless Retail
9
Consolidated Customer View & Insight
Themes: Single View of Customer, Consumer 360; Activity Capture; Profiles for personalization
Use Cases: Modern, Seamless Retail
10
1. Detailed Product Information:
   - Single View of Product Information
     – Catalog
2. Real-time Inventory and Fulfillment
   - Real-time Inventory
   - Shopping Carts / Orders
3. Detailed Customer Views:
   - User Activity Logging
   - Integrating Customer Insights
4. Monitoring and Scaling
   - What to watch for and how to scale
Technical Deep Dive
11
Information Management
Merchandising
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
Architecture Overview
Customer Channels: Amazon, Ebay, …
Stores: POS, Kiosk, …
Mobile: Smartphone, Tablet
Website
Contact Center
API: Data and Service Integration
Social: Facebook, Twitter, …
Data Warehouse
Analytics
Supply Chain Management System
Suppliers: 3rd Party, In Network
Web Servers
Application Servers
12
Commerce Functional Components
Information Layer
Look & Feel
Navigation
Customization
Personalization
Branding
Promotions
Chat
Ads
Customer's Perspective
Research: Browse, Search
Select: Shopping Cart
Purchase: Checkout
Receive: Track
Use: Feedback, Maintain
Dialog: Assist
Market / Offer
Guide
Offer
Semantic Search
Recommend
Rule-based Decisions
Pricing
Coupons
Sell / Fulfill
Orders
Payments
Fraud Detection
Fulfillment
Business Rules
Insight: Session Capture, Activity Monitoring
Customer Enterprise
Information Management
Merchandising
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
Deep Dive: Product Catalog
14
The many catalogs problem
15
1. One department in charge of master product works hard at fitting data into SQL tables
2. Resulting data sits in a SQL server with a couple replicas. It's forbidden to hit it more than 100 times / sec
3. Other departments need to access the data way more often for their own services
4. Other departments need more information, which is not available because it did not fit in that long-devised, rigid SQL schema
5. ETLs and Message Buses are put in place so other teams can try to figure it out themselves…
6. Data becomes inconsistent, fragmented, out of date… The problem is visible both internally and to customers!
The many catalogs problem
16
The many catalogs problem
Online Store
Catalog
Marketing
Catalog
Department 3
Catalog
Product Department
MasterCatalog
Department 4
Catalog
Department 5
Catalog
Department 1
Catalog
Message Bus
ETLs
Dozens of catalogs!
17
How many Catalogs do you have?
Catalog Caches?
Message Buses and ETLs for them?
Too many catalogs problem
18
• Single view of a product, one central service
• Flexible schema containing all useful data
• Read volume high and sustained, 100k reads / s
• Can seamlessly take write spikes during catalog update
• Advanced indexing and querying
• Geographical distribution for HA and low latency
Goal: Single View of Product
19
MongoDB Data Store
Merchandising - Architecture
Items, Pricing, Promotions, Variants, Ratings & Reviews
Search Engine
Product Service API
…
Online Store, Marketing, Inventory, SCMS, Public API, …
20
• Item: the overall product info (e.g. Levi’s 501)
• Variant: a specific variant of an item (e.g. in black size 6) which typically has a specific SKU / UPC
• Price: price information may vary based on the store, the variant, etc
• Hierarchy: the item taxonomy
• Facet: facets to search products by
• Vendors: a given sku may be available through several vendors if the site is a marketplace
Models - Overview
21
{
  "_id": "054VA72303012P",              // the item id
  "desc": [                             // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." },
    ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270",   // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": {                           // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." } },
      ...
    ]
  },
  "shipping": {
    // shipping specs
  },
  "specs": {
    // item specs
  },
  "attrs": [                            // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" },
    ...
  ],
  "variants": {                         // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" },
      ...
    ]
  },
  "lastUpdated": 1400877254787          // keep track of updates
}
Models - Item Model
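A rough sketch of how the "attrs" array of the item model could back faceted search. The helper name buildFacetFilter, the collection name db.items and the choice of $all with $elemMatch are assumptions for illustration, not something the deck prescribes. The prefix is used as a regex as-is, so it is assumed to contain no regex special characters.

```javascript
// Hypothetical helper: builds a MongoDB filter for items that sit under a
// category path and match every selected facet (name/value pairs in "attrs").
function buildFacetFilter(categoryPrefix, facets) {
  const filter = {};
  if (categoryPrefix) {
    // Anchored prefix match on the hierarchy path.
    filter.category = { $regex: "^" + categoryPrefix };
  }
  const names = Object.keys(facets);
  if (names.length > 0) {
    // $all + $elemMatch: each facet must be matched by one array element.
    filter.attrs = {
      $all: names.map((name) => ({
        $elemMatch: { name: name, value: facets[name] },
      })),
    };
  }
  return filter;
}

const facetFilter = buildFacetFilter("/84700/80009", {
  "Heel Height": "High (2-1/2 to 4 in.)",
  Toe: "Open toe",
});
// In the mongo shell this could be passed as db.items.find(facetFilter)
console.log(JSON.stringify(facetFilter, null, 2));
```

An index on "attrs.name" and "attrs.value" would be one way to support such a filter; whether it is enough depends on the facet cardinality.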
22
Product Search – Traditional Architecture
Product Data Store Product Search
Indexing
#1 obtain search results IDs
Application
Cache
#2 obtain objects by ID from cache or DB
Pre-joined into objects
23
Product Search – New Architecture
Product Data Store Product Search
Indexing
#1 obtain search results IDs
Applications
#2 obtain objects by list of IDs
MongoDB
Ready-to-use product documents
Search Engine
Product API
Application issues single query
Deep Dive: Real-time Inventory and Fulfillment
25
Less than Real-Time Inventory
26
1. The Inventory system is centralized in a single SQL server
2. Latency to Inventory is too high, not accessible from individual stores or distribution centers
3. Stores / DCs need to manage their own local inventory, then ship the result once a day to the central system
4. Central inventory has no view of intra-day quantities. It does forecast and replenish with up to 24h delay
5. Opportunities are lost due to overstock / shortage
6. Sometimes products are sold based on quantities listed in a distant inventory. The product turns out not to be available, and customers are upset
Less than Real-Time Inventory
27
Inventory – Traditional Architecture
Relational DB: System of Record
Analytics, Aggregations, Reports
Caching Layer
Field Inventory
Internal & External Apps
Local view only
Once-a-day sync
Stale view
Suboptimal logic
28
• Single view of the inventory, one central service
• Used by most services and channels
• Read dominated workload
• Local, real-time writes
• Bulk writes for refresh
• Geographically distributed
• Horizontally scalable
Goal: Real-Time Inventory
29
MongoDB
Inventory – Target Architecture
Relational DB: System of Record
Analytics, Aggregations, Reports
Field Inventory
Internal & External Apps
Inventory
Assortments
Shipments
Audits
Orders
Stores
Point-in-time Loads
Nightly check
Real-time updates
Real-time view
Relevant dataset
30
Representing quantities …
Inventory Levels - Inventory
31
Solution: 1 document per SKU / store
> 100 million items x 1000 stores
= 100 billion entries
Inventory Levels - Inventory
{
  "_id": "SPM7597703608A/store0",
  "storeId": "store0",
  "location": [-86.95444, 33.40178],
  "q": 88,
  "ts": 1400877254787
}
32
Solution: 1 document per key / store grouping SKUs
_id: item id or hash of SKU, with store id
> Good for geo distribution, low number of docs
Inventory Levels - Inventory
{
  "_id": "SPM7597703608/store0",        // unique key
  "storeId": "store0",
  "location": [-86.95444, 33.40178],
  "geoCode": 1,
  "skus": [                             // list of skus quantities
    { "id": "SPM7597703608A", "q": 88 },
    { "id": "SPM7597703608B", "q": 55 },
    { "id": "SPM7597703608C", "q": 104 },
    …
  ],
  "ts": 1400877254787
}
33
• Increment / decrement / set quantity for an item at a store, atomically
Inventory Updates - Quantities
// decrement: MongoDB has no $dec operator, so use $inc with a negative value
db.inventory.update(
  { "_id": { "$regex": "^SPM7597703608/" }, "skus.id": "SPM7597703608A" },
  { "$inc": { "skus.$.q": -1 } })
// increment
db.inventory.update(
  { "_id": { "$regex": "^SPM7597703608/" }, "skus.id": "SPM7597703608A" },
  { "$inc": { "skus.$.q": 20 } })
// use $set for setting …
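A minimal sketch of the same atomic update for one known store. The helper name buildQuantityChange is hypothetical, and two things are assumptions on top of the slide: the exact "_id" (item id + "/" + store id) is used instead of a prefix match, and a stock guard is added so a sale only applies when enough quantity is left.

```javascript
// Hypothetical helper: builds the filter and update documents for an
// atomic quantity change on one SKU inside a grouped inventory document.
function buildQuantityChange(itemId, storeId, skuId, delta) {
  const skuMatch = { id: skuId };
  if (0 > delta) {
    // Only match when enough stock is left, so q never goes negative.
    skuMatch.q = { $gte: -delta };
  }
  return {
    filter: {
      _id: itemId + "/" + storeId, // exact key: one item at one store
      skus: { $elemMatch: skuMatch },
    },
    // "$" refers to the array element matched by $elemMatch above.
    update: { $inc: { "skus.$.q": delta }, $set: { ts: Date.now() } },
  };
}

const sale = buildQuantityChange("SPM7597703608", "store0", "SPM7597703608A", -1);
const restock = buildQuantityChange("SPM7597703608", "store0", "SPM7597703608A", 20);
// In the mongo shell: db.inventory.update(sale.filter, sale.update)
console.log(JSON.stringify(sale.filter));
```

If the update reports zero documents modified, the SKU was out of stock at that store and the application can fall back to another store.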
34
• Get closest stores with available SKU
Inventory Levels – Inventory
db.runCommand({
  geoNear: "inventory",
  near: { type: "Point", coordinates: [-82.8006, 40.0908] },
  maxDistance: 10000.0,
  spherical: true,
  limit: 10,
  query: {
    _id: { $regex: "^SPM7597703608/" },
    skus: { $elemMatch: { id: "SPM7597703608A", q: { $gt: 0 } } }
  }
})
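To make the semantics of that query concrete, here is an in-memory sketch of what it asks for: stores that hold the SKU with q > 0, within maxDistance meters, nearest first. The sample documents are made up, and the haversine formula with a mean Earth radius is an approximation chosen for illustration; the server computes distances itself.

```javascript
// In-memory illustration only: keep stores that have the SKU in stock
// within maxDistance meters of "near", sorted nearest first.
const EARTH_RADIUS_M = 6371000; // mean Earth radius, an approximation

function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

function closestStoresWithSku(docs, near, skuId, maxDistance, limit) {
  return docs
    .filter((d) => d.skus.some((s) => s.id === skuId && s.q > 0))
    .map((d) => ({ storeId: d.storeId, dis: haversineMeters(near, d.location) }))
    .filter((r) => maxDistance >= r.dis)
    .sort((x, y) => x.dis - y.dis)
    .slice(0, limit);
}

// Made-up sample data; coordinates are [longitude, latitude].
const sampleDocs = [
  { storeId: "store0", location: [-82.8006, 40.0908], skus: [{ id: "SPM7597703608A", q: 3 }] },
  { storeId: "store1", location: [-82.81, 40.1], skus: [{ id: "SPM7597703608A", q: 0 }] },
  { storeId: "store2", location: [-82.85, 40.12], skus: [{ id: "SPM7597703608A", q: 7 }] },
];
const nearby = closestStoresWithSku(
  sampleDocs, [-82.8006, 40.0908], "SPM7597703608A", 10000, 10);
console.log(nearby.map((r) => r.storeId)); // [ 'store0', 'store2' ]
```

store1 is closer than store2 but is dropped because its quantity is 0, which is the job of the $elemMatch clause in the query.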
35
How to keep reads / writes local with low latency?
How to stay available during network partition?
Inventory Updates – Availability
36
Inventory Updates – Availability
Data centers: East DC, Central DC, West DC
Shards: Shard East, Shard Central, Shard West (one primary each)
Apps in each data center
Basic Setup: Writes go everywhere
37
• Basic shard key
  – { _id: 1 } // built as group key + store
• Shard key for "Geo-sharding"
  – { geoCode: 1, _id: 1 }
• Alternative "Geo-sharding", more granular
  – { storeId: 1, _id: 1 }
Inventory Updates – Availability
38
Inventory Updates – Availability
Data centers: East DC, Central DC, West DC
Shards: Shard East, Shard Central, Shard West (one primary each)
Apps in each data center
Using tag-aware sharding: mostly local writes
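A small simulation of how tag-aware sharding could pin documents to a data center through the leading shard-key field geoCode. The mapping of geoCode values to EAST / CENTRAL / WEST is an assumption (the deck only shows geoCode: 1), and tagForDocument is a hypothetical helper. In the mongo shell of that era the ranges themselves would be registered with sh.addShardTag and sh.addTagRange.

```javascript
// Hypothetical tag ranges on the leading shard-key field geoCode.
// As with real tag ranges, min is inclusive and max is exclusive.
const tagRanges = [
  { tag: "EAST", min: 1, max: 2 },
  { tag: "CENTRAL", min: 2, max: 3 },
  { tag: "WEST", min: 3, max: 4 },
];

// Returns the tag (and so the data center) that owns a document,
// or null when no range covers its geoCode.
function tagForDocument(doc, ranges) {
  const hit = ranges.find((r) => doc.geoCode >= r.min && r.max > doc.geoCode);
  return hit ? hit.tag : null;
}

console.log(tagForDocument({ _id: "SPM7597703608/store0", geoCode: 1 }, tagRanges)); // EAST
```

Because every store's documents carry the geoCode of their region, a write from a store lands on the shard whose primary sits in the local data center.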
39
Shopping Carts – Model
40
• Shopping cart fits naturally in 1 document
Shopping Carts – Model
{
  _id: ObjectId(…),
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  userId: "c12398",
  geoCode: 1,
  totalPrice: 1050.99,
  items: [
    { sku: "SPM7597703608A", quantity: 1, price: 799, storeId: "store100",
      name: "Apple Macbook Air", thumbnail: "http://…", … },
    { sku: "SPM7587703609C", quantity: 4, price: 20, storeId: "store100",
      name: "Oral-B Toothbrush", thumbnail: "http://…", … },
    …
  ]
}
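Since the cart is a single document, adding a line item and adjusting the total can be one atomic update. A minimal sketch, assuming a hypothetical helper buildAddToCart and a collection named db.carts (neither appears in the deck):

```javascript
// Hypothetical helper: update document that appends one line item to a
// cart and keeps totalPrice in step, applied as a single atomic update.
function buildAddToCart(item) {
  return {
    $push: { items: item },
    $inc: { totalPrice: item.price * item.quantity },
    $set: { ts: new Date() },
  };
}

const addToothbrush = buildAddToCart({
  sku: "SPM7587703609C",
  quantity: 4,
  price: 20,
  storeId: "store100",
  name: "Oral-B Toothbrush",
});
// In the mongo shell: db.carts.update({ _id: cartId }, addToothbrush)
console.log(addToothbrush.$inc.totalPrice); // 80
```

For prices with cents, storing integer cents rather than floating-point amounts would avoid rounding drift in the running total.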
41
Shopping Carts – Availability
Data centers: East DC, Central DC, West DC
Shards: Shard East, Shard Central, Shard West (one primary each)
1. User shops in West, cart written locally
2. User travels and shops in East, same cart read locally (via replication)
42
• Each shard has 1 replica in every DC
• Primary servers are distributed among DCs
• Local Cart insert / update:
  – Tag-aware Sharding using the geoCode field
• Local Cart lookup:
  – Tag-aware Sharding using the geoCode field
• Local Cart lookup for all regions:
  – Nearest Read Preference (closest replica)
Shopping Carts – Topology
Deep Dive: User Activity Logging and Insight
44
Insights
Data Intelligence
45
Many user activities can be of interest:
• Search terms
• Product viewed, liked or wished
• Shopping cart add / remove
• Orders submitted
• Sharing on social network
• Ad impression, Clickstream
Insights – Data of interest
46
Data will be used to compute:
• User / Product History
• Product Map (relationships, etc)
• User Preferences
• Recommendations
• Trends
> This is the basis for Personalization
Insights – Data of interest
47
1. Originally the system does not record much user activity, since it is too voluminous. It ends up forgotten in log files.
2. Attempts are made to store it in SQL, but adequate write performance is expensive to achieve. Reporting across large data sets (TB+) does not work.
3. Activity is recorded to a Data Warehouse system, which provides good reporting but is too expensive to scale.
4. Using technologies like Hadoop, good scaling and powerful reporting are achieved.
5. Still, there is no scalable front-end Data Store for real-time queries and aggregations from applications.
Insights – Today's Limitations
48
Insights – Traditional Architecture
External Analytics: Hadoop, Greenplum, Teradata, …
Apps
Log Processor
Activity Logs
SQL Data Store
Delays moving logs
Delays processing
Output limited by schema
Limited read capacity
49
• Store and manage large stream of data samples
  – High arrival rate from many sources
  – Variable schema
  – Control retention period of data
• Compute aggregations and derivative data sets
  – Aggregations and statistics based on data
  – Roll-up data into pre-computed reports and summaries
• Low latency access to up-to-date data
  – Flexible indexing of raw and derived data sets
  – Rich querying based on time + meta-data fields
Goal: Scalable and Powerful Insights
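One possible shape for the "roll-up into pre-computed summaries" goal is a counter document per item per hour, maintained with an upsert. Everything named here is an assumption for illustration: the helper buildHourlyRollup, the collection db.activityHourly and the "counts.<type>" layout are not from the deck.

```javascript
// Hypothetical roll-up: one counter document per item per hour.
function buildHourlyRollup(activity) {
  const hour = new Date(activity.timeStamp);
  hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour (UTC)
  return {
    filter: { itemId: activity.itemId, hour: hour },
    // $inc creates the counter field on first use; with upsert: true the
    // document itself is created when the hour bucket does not exist yet.
    update: { $inc: { ["counts." + activity.type]: 1 } },
    options: { upsert: true },
  };
}

const rollup = buildHourlyRollup({
  itemId: "301671",
  type: "VIEW",
  timeStamp: new Date("2014-04-01T10:42:17Z"),
});
// In the mongo shell:
//   db.activityHourly.update(rollup.filter, rollup.update, rollup.options)
console.log(rollup.filter.hour.toISOString()); // 2014-04-01T10:00:00.000Z
```

Retention of the raw samples could then be handled separately, for example with a TTL index on the time field of the raw collection.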
50
Insights – MongoDB Architecture
MongoDB
HVDF API
Activity Logging
User History
External Analytics: Hadoop, Spark, Storm, …
User Preferences
Recommendations
Trends
Product Map
Apps
Internal Analytics: Aggregation, MR
All user activity is recorded
MongoDB – Hadoop Connector
Personalization
51
Insights
NOW!
52
Insights – MongoDB + Hadoop
Applications powered by MongoDB
• Products & Inventory
• Recommended products
• Customer profile
• Session management
Analysis powered by Hadoop
• Elastic pricing
• Recommendation models
• Predictive analytics
• Clickstream history
MongoDB Connector for Hadoop
53
{ _id: ObjectId(),
  geoCode: 1, // used to localize write operations
  sessionId: "2373BB…",
  device: { id: "1234",
            type: "mobile/iphone",
            userAgent: "Chrome/34.0.1847.131"
  },
  userId: "u123",
  type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
  itemId: "301671",
  sku: "730223104376",
  order: { id: "12520185",
           … },
  location: [ -86.95444, 33.40178 ],
  tags: [ "smartphone", "iphone", … ], // associated tags
  timeStamp: new Date("2014/04/01 …")
}
Insight – User Activity Model
54
• Recent activity for a user:
  db.activities.find({ userId: "u123" }).sort({ timeStamp: -1 }).limit(100)
• Recent activity for a product:
  db.activities.find({ itemId: "301671" }).sort({ timeStamp: -1 }).limit(100)
• Indices:
  – userId + timeStamp, itemId + timeStamp, timeStamp
• All queries should be time bound for performance!
Insight – User History
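A sketch of a time-bounded history query together with the compound index that would serve it, assuming the timeStamp field of the activity model. The helper recentActivity and the 7-day window are illustrative choices, not part of the deck.

```javascript
// Hypothetical helper: parts of a time-bounded "recent activity" query.
function recentActivity(field, value, sinceMs, nowMs) {
  return {
    filter: { [field]: value, timeStamp: { $gt: new Date(nowMs - sinceMs) } },
    sort: { timeStamp: -1 },
    limit: 100,
    // Compound index that serves both the filter and the sort.
    index: { [field]: 1, timeStamp: -1 },
  };
}

const DAY_MS = 24 * 60 * 60 * 1000;
const userHistory = recentActivity("userId", "u123", 7 * DAY_MS, Date.UTC(2014, 3, 8));
// In the mongo shell:
//   db.activities.createIndex(userHistory.index)   // ensureIndex on older shells
//   db.activities.find(userHistory.filter).sort(userHistory.sort).limit(userHistory.limit)
console.log(userHistory.filter.timeStamp.$gt.toISOString()); // 2014-04-01T00:00:00.000Z
```

The lower bound on timeStamp keeps the index scan short even for very active users, which is the point of the "time bound" advice.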
55
• Recent number of views, purchases, etc. for a user
  db.activities.aggregate([
    { $match: { userId: "u123", timeStamp: { $gt: DATE } } },
    { $group: { _id: "$type", count: { $sum: 1 } } }
  ])
• Recent total sales for a user
  db.activities.aggregate([
    { $match: { userId: "u123", timeStamp: { $gt: DATE }, type: "ORDER" } },
    { $group: { _id: "result", count: { $sum: "$total" } } }
  ])
• Recent number of views, purchases, etc. for an item
  db.activities.aggregate([
    { $match: { itemId: "301671", timeStamp: { $gt: DATE } } },
    { $group: { _id: "$type", count: { $sum: 1 } } }
  ])
> Those aggregations are very fast, real-time
Insight – User Stats
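To show what the first pipeline computes, here is an in-memory equivalent of its $match and $group stages. The sample events are made up and countByType is a hypothetical helper; the real work would be done by the server.

```javascript
// In-memory equivalent of: $match on userId and time, then $group by type.
function countByType(activities, userId, since) {
  const counts = {};
  for (const a of activities) {
    if (a.userId !== userId || !(a.timeStamp > since)) continue; // $match
    counts[a.type] = (counts[a.type] || 0) + 1; // $group with $sum: 1
  }
  return counts;
}

// Made-up sample events for illustration only.
const events = [
  { userId: "u123", type: "VIEW", timeStamp: new Date("2014-04-01T10:00:00Z") },
  { userId: "u123", type: "VIEW", timeStamp: new Date("2014-04-01T11:00:00Z") },
  { userId: "u123", type: "ORDER", timeStamp: new Date("2014-04-01T12:00:00Z") },
  { userId: "u456", type: "VIEW", timeStamp: new Date("2014-04-01T12:30:00Z") },
  { userId: "u123", type: "VIEW", timeStamp: new Date("2014-03-01T09:00:00Z") },
];
const stats = countByType(events, "u123", new Date("2014-04-01T00:00:00Z"));
console.log(stats); // { VIEW: 2, ORDER: 1 }
```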
56
• Map Reduce calculation of unique visitors:
var map = function() { emit(this.userId, 1); }
var reduce = function(key, values) { return Array.sum(values); }
var oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);
db.activities.mapReduce(map, reduce,
  { query: { timeStamp: { $gt: oneHourAgo } },
    out: { replace: "lastHourUniques", sharded: true } })
// number of activities for a user (map-reduce output is keyed by _id)
db.lastHourUniques.find({ _id: "u123" })
// total uniques, immediate result
db.lastHourUniques.count()
Insight – User Stats
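A tiny in-memory driver that mimics the map and reduce steps, to show why the output collection has one document per user keyed by _id. runMapReduce is a hypothetical stand-in for the server; unlike the shell, it passes emit to the map function explicitly.

```javascript
// Tiny in-memory map-reduce driver, for illustration only.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // the shell provides emit globally
  return [...groups].map(([key, values]) => ({
    _id: key,
    // As in MongoDB, reduce is skipped for keys with a single value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const mapFn = function (emit) { emit(this.userId, 1); };
const reduceFn = (key, values) => values.reduce((a, b) => a + b, 0);

const lastHourUniques = runMapReduce(
  [{ userId: "u123" }, { userId: "u456" }, { userId: "u123" }],
  mapFn,
  reduceFn
);
console.log(lastHourUniques.length); // 2 unique visitors
console.log(lastHourUniques.find((r) => r._id === "u123").value); // 2 activities
```

Counting the output documents gives the number of unique visitors, and each document's value is that visitor's activity count.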
Monitoring and Scaling
58
MMS
59
MMS
60
MMS
61
Following are useful Monitoring tools:
• MongoDB Management Service (MMS)
• mongostat – console based
• mongotop – activity of each namespace
• iostat – disk activity
• Plugins for most popular frameworks (Munin, Nagios, Cacti, SNMP …)
> Without Monitoring, impossible to quickly troubleshoot and recover from downtime!
Monitoring Tips – Tools
62
MMS
63
Metrics to watch for:
• Data Size vs Disk Size
• Active Set Size vs RAM Size
• Disk IO
• Write Lock
> Account and test for highest possible traffic!
> MongoDB's support team is there to help!
Monitoring Tips – Metrics
64
Add replicas to:
• Reduce latency to users
• Add read capacity (data potentially stale)
• Increase data safety
> Adding / Removing replica is seamless
Replication Tips
65
MMS
66
If you are not sharding yet …
It may be time to shard
Switch to sharding with no downtime …
Just make sure you pick the right shard key!
MongoDB Support is there to help
Sharding Tips
67
Add shards to:
• Increase read / write IO capacity
• Increase Storage space
• Increase RAM space
• Bring a primary closer to users
> Shard add / remove takes time and capacity
> Scales mostly linearly but broadcast queries are sub-linear
Sharding Tips
68
MMS
69
MMS
Watch MMS Demo at https://www.youtube.com/watch?v=nSJiVXNsPHk
MMS
Closing Comments
1. How is Peak Season shaping up this year?
2. How do you scale your business with MongoDB?
3. How do you capture the holiday Digital Customer Experience with MongoDB?
3 Answers for this Season
1. Spending & confidence are back! Act fast!
2. Create single view services and scale using sharding
3. Capture high-volume activity logs now and for the rest of the season for “insight”
1. Assess your data and determine your monitoring gaps
2. Join us and Engage:
• MongoDB Days – London – November 19
• MongoDB Days – San Francisco – December 3
• MongoDB Meet-ups, MUG, Office Hours
3. Start one step at a time - with “prototype” capabilities
What’s Next?
Questions?