Date post: | 18-Jul-2015 |
Category: | Technology |
Upload: | mongodb |
Breaking the Oracle tie;
High Performance OLTP and analytics using MongoDB
Alexandros Giamas
Senior Software Engineer
Can you afford to leave half the opportunity on the table?
You won't believe it
Pick an Online Number! Why you'll love your Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
2.07%
You won't believe it
Pick an Online Number! Why get an Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
1.42%
You won't believe it
They dial, you answer on VoIP! Why you'll love your Online Number:
1. Family & friends without VoIP can call you
2. You answer on VoIP
3. And you can use it from anywhere in the world
I like that!
1.11%
…another 16 Million + combinations
The Marketing Communication Suite
We generate the marketing messages that work best. For any customer, any product, at any time.
Flexi-structured data
One User across campaigns and mediums
{ "_id" : ObjectId("511e3cbea9f1fd01fbd51c67"),
Sizing transactional data
☛ User Terminated data
☛ User Originated data
☛ Metadata (state for User per campaign and globally)
☛ Must hold data in memory, or at least indexes
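The last sizing point can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: the function name `fits_in_memory` and every figure in it (document count, average sizes, RAM) are made-up assumptions, not numbers from the talk.

```python
def fits_in_memory(doc_count, avg_doc_bytes, avg_index_bytes_per_doc, ram_bytes):
    """Classify a collection by what fits in RAM: all data, only indexes, or neither."""
    data_bytes = doc_count * avg_doc_bytes
    index_bytes = doc_count * avg_index_bytes_per_doc
    if data_bytes + index_bytes <= ram_bytes:
        return "data+indexes"
    if index_bytes <= ram_bytes:
        return "indexes-only"
    return "neither"

GIB = 1024 ** 3
# Hypothetical workload: 200M documents of 1 KiB each, ~100 bytes of index per document.
verdict = fits_in_memory(200_000_000, 1024, 100, ram_bytes=64 * GIB)
print(verdict)
```

A real estimate would use measured collection and index sizes rather than averages guessed up front.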
ETL for OLAP
Offline / Online processing
• Going online is mostly simpler
• Offline must take into account data irregularities (data validation policy driven by business needs)
Your own mini transactions
Break down Spring Batch steps into idempotent and non-idempotent ones
• For idempotent steps, just replay them
• For non-idempotent steps, replace the current state with the last known good state from before the latest Spring Batch step invocation (undo log), then retry the step
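The replay-or-restore idea can be sketched in a few lines. This is a minimal in-memory illustration, not the Spring Batch setup from the talk: `run_step`, `FlakyError` and the dictionary used as state are hypothetical names, and a real undo log would be persisted rather than held in a local variable.

```python
import copy

class FlakyError(Exception):
    """Raised by a step to simulate a failure partway through its work."""

def run_step(state, step, idempotent, max_attempts=3):
    """Run step(state) in place; on failure either replay or restore-then-retry."""
    # Undo log: snapshot of the last known good state, only needed for non-idempotent steps.
    undo_log = None if idempotent else copy.deepcopy(state)
    for attempt in range(1, max_attempts + 1):
        try:
            step(state)
            return attempt
        except FlakyError:
            if not idempotent:
                # Replace the current (possibly half-updated) state with the snapshot.
                state.clear()
                state.update(copy.deepcopy(undo_log))
    raise RuntimeError("step failed after %d attempts" % max_attempts)

failures = {"left": 1}

def increment_sent(state):
    state["sent"] += 1  # non-idempotent: replaying without undo would double-count
    if failures["left"] > 0:
        failures["left"] -= 1
        raise FlakyError("crashed after the increment")

state = {"sent": 10}
attempts = run_step(state, increment_sent, idempotent=False)
print(state, attempts)
```

Without the restore, the retried increment would leave the counter one too high, which is why the two kinds of step are handled differently.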
Your own mini transactions Issues
• 16MB document size limit...
• Slow to replay
• Hard to test using Selenium
Analytics using the Aggregation Framework
[
  // Truncate receivedDateHour to the hour ($dateFromParts needs MongoDB 3.6+)
  {$project: {
    "rdd": {$dateFromParts: {
      year: {$year: "$_id.receivedDateHour"}, month: {$month: "$_id.receivedDateHour"},
      day: {$dayOfMonth: "$_id.receivedDateHour"}, hour: {$hour: "$_id.receivedDateHour"}
    }},
    "value.diffDaysSum.0": 1, "value.diffDaysSum.1": 1, "value.diffDaysSum.2": 1
  }},
  // Add up the first three diffDaysSum buckets of each document
  {$project: {rdd: 1, diffDaysSum: {$add: ["$value.diffDaysSum.0",
    "$value.diffDaysSum.1", "$value.diffDaysSum.2"]}}},
  // Sum the per-document totals for each truncated date
  {$group: {_id: "$rdd", totalSumPerDay: {$sum: "$diffDaysSum"}}}
]
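To make explicit what the pipeline above computes, here is a pure-Python sketch of the same three stages run over in-memory documents. The document shape is inferred from the field paths in the pipeline (`_id.receivedDateHour`, `value.diffDaysSum.0` to `.2`), `diffDaysSum` is assumed to be a sub-document keyed by "0", "1" and "2", and the sample values are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

def total_sum_per_hour(docs):
    """Mimic the stages: truncate the date to the hour, add buckets 0-2, group and sum."""
    totals = defaultdict(int)
    for doc in docs:
        received = doc["_id"]["receivedDateHour"]
        # Rebuild the date from year, month, day and hour only (drops minutes and seconds).
        rdd = datetime(received.year, received.month, received.day, received.hour)
        buckets = doc["value"]["diffDaysSum"]
        totals[rdd] += buckets["0"] + buckets["1"] + buckets["2"]
    return dict(totals)

# Invented sample documents in the assumed shape.
docs = [
    {"_id": {"receivedDateHour": datetime(2013, 3, 1, 9, 15)},
     "value": {"diffDaysSum": {"0": 5, "1": 2, "2": 1}}},
    {"_id": {"receivedDateHour": datetime(2013, 3, 1, 9, 40)},
     "value": {"diffDaysSum": {"0": 1, "1": 0, "2": 4}}},
    {"_id": {"receivedDateHour": datetime(2013, 3, 1, 10, 0)},
     "value": {"diffDaysSum": {"0": 3, "1": 3, "2": 3}}},
]
result = total_sum_per_hour(docs)
print(result)
```

Because the grouping key keeps the hour, the totals are per hour even though the output field is named `totalSumPerDay`.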
Analytics using the Aggregation Framework
Pros:
✓ More flexible than it sounds
✓ Rapid development
✓ Easy debugging
Cons:
✘ No custom JS supported
✘ Memory limitation
✘ API still evolving
Fine grained write semantics and asynchronous magic
Fine grained write semantics
• WriteConcern.SAFE for most writes
• WriteConcern.REPLICAS_SAFE for writes that are costly to recompute in case of failure
ReactiveMongo
• Asynchronous and non-blocking Scala driver for MongoDB
• Async writes with WriteConcern.SAFE and a callback retry policy in case of error
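The "async write plus retry on error" pattern can be sketched without a database. The example below uses Python's asyncio rather than the Scala driver named on the slide, and everything in it is a hypothetical stand-in: `write_with_retry`, `WriteError` and `FakeCollection` are illustration-only names, not part of any MongoDB driver.

```python
import asyncio

class WriteError(Exception):
    """Stand-in for a driver error reported for an acknowledged write."""

async def write_with_retry(write, doc, retries=3, base_delay=0.01):
    """Await an acknowledged write; on error, back off and retry up to `retries` times."""
    for attempt in range(1, retries + 1):
        try:
            return await write(doc)
        except WriteError:
            if attempt == retries:
                raise  # retry budget exhausted: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

class FakeCollection:
    """In-memory stand-in for a collection whose first `fail_first` inserts fail."""
    def __init__(self, fail_first):
        self.fail_first = fail_first
        self.docs = []

    async def insert(self, doc):
        if self.fail_first > 0:
            self.fail_first -= 1
            raise WriteError("no acknowledgement")
        self.docs.append(doc)
        return {"ok": 1}

collection = FakeCollection(fail_first=2)
ack = asyncio.run(write_with_retry(collection.insert, {"user": 1}))
print(ack, len(collection.docs))
```

Blind retries are only safe when the write is idempotent or the failure is known to have happened before the write was applied; otherwise a retry can store the document twice.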
Lessons Learned
Use
• Replica sets
• Journaling
• Aggregation Framework
• MMS
Don't use
• Development versions across the team
• Unbounded datasets that can't fit in memory
• MapReduce, if you don't need to
MongoDB on EC2 Using LVM
http://goo.gl/8NbV7
For high performance, use LVM with RAID 0 or 10
Have your guerilla team ready:
MongoDB on EC2 Lessons Learned
Unix level tweaks:
• Raise ulimit
• Raise tcp timeout
• noatime, nodiratime
• Use XFS or ext4
• Use LVM for snapshotting
Use journaling