Breaking the Oracle Tie; High Performance OLTP and Analytics Using MongoDB

Date post: 18-Jul-2015

Alexandros Giamas

Senior Software Engineer

Can you afford to leave half the opportunity on the table?

You won't believe it

Pick an Online Number!

Why you'll love your Online Number:

1. Your friends without VoIP can call you

2. You answer on VoIP

3. You also have voicemail included

I like that!

2.07%

You won't believe it

Pick an Online Number!

Why get an Online Number:

1. Your friends without VoIP can call you

2. You answer on VoIP

3. You also have voicemail included

I like that!

1.42%

You won't believe it

They dial, you answer on VoIP!

Why you'll love your Online Number:

1. Family & friends without VoIP can call you

2. You answer on VoIP

3. And you can use it from anywhere in the world

I like that!

1.11%

…another 16 million+ combinations

The Marketing Communication Suite

We generate the marketing messages that work best. For any customer, any product, at any time.

Persado History

Oracle shop

Persado History

• Exponentially growing dataset

• Data value/KB?

Persado History

Not anymore...

Transactional Data and Analytics

Transaction (Re)-defined

Social, Mobile, Email, Web, Display, Search

Which one stands out?

Conversational and Transactional Properties

Web based channels

Mobile Text Messaging

Conversational and Transactional Properties

Flexi-structured data

One User across campaigns and mediums

{ "_id" : ObjectId("511e3cbea9f1fd01fbd51c67"),

Overall Architecture - Data flow

Sizing transactional data

☛ User Terminated data
☛ User Originated data
☛ Metadata (state for User per campaign and globally)
☛ Must hold data in memory, or at least indexes
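
The last sizing rule (hold the data in memory, or at least the indexes) can be checked with simple arithmetic. The following is a minimal sketch: the function name, its field names, and the sample figures are illustrative assumptions, not numbers from the talk.

```javascript
// Hypothetical helper: classify whether data and/or indexes fit in RAM.
// All inputs are in bytes; the figures passed in below are made up.
function memoryFit({ dataBytes, indexBytes, ramBytes }) {
  if (dataBytes + indexBytes <= ramBytes) return "data+indexes";
  if (indexBytes <= ramBytes) return "indexes-only";
  return "neither";
}

const GiB = 1024 ** 3;
const fit = memoryFit({ dataBytes: 40 * GiB, indexBytes: 6 * GiB, ramBytes: 16 * GiB });
console.log(fit); // "indexes-only"
```

A result of "neither" would be the signal to shard, add memory, or trim what is kept in the transactional store.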

ETL for OLAP

Offline / Online processing
• Going online is mostly simpler
• Offline must take into account data irregularities (data validation policy driven by business needs)

ETL for OLAP

☛ Custom data transformation
☛ Custom “continueOnError” implementation
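
One way to picture a custom "continueOnError" step is a loop that transforms each record, sets aside the ones that fail validation, and carries on with the rest of the batch. This is a sketch in plain JavaScript, not the talk's implementation; `transformAll`, `toFact`, and the validation rule are hypothetical.

```javascript
// Hypothetical ETL step: transform each record, keep going when one fails.
function transformAll(records, transform) {
  const loaded = [];
  const rejected = [];
  for (const record of records) {
    try {
      loaded.push(transform(record));
    } catch (err) {
      // Record the failure and move on instead of aborting the batch.
      rejected.push({ record, reason: err.message });
    }
  }
  return { loaded, rejected };
}

// Illustrative validation policy: a record must carry a numeric amount.
function toFact(record) {
  if (typeof record.amount !== "number") {
    throw new Error("amount is not a number");
  }
  return { userId: record.userId, amountCents: Math.round(record.amount * 100) };
}

const result = transformAll(
  [{ userId: 1, amount: 2.5 }, { userId: 2, amount: "n/a" }, { userId: 3, amount: 1 }],
  toFact
);
console.log(result.loaded.length, result.rejected.length); // 2 1
```

Keeping the rejected records together with a reason makes it possible to review them against the business-driven validation policy later.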

Analytics

First cut: custom server-side JavaScript using $where

Analytics

GWL

Global Write Lock

Analytics In the real world

Your own mini transactions

Break down Spring Batch steps into idempotent and non-idempotent ones
• For idempotent steps, just replay them
• For non-idempotent steps, replace the current state with the last known good state from before the latest Spring Batch step invocation (undo log) and retry the step
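
The replay and undo-log rules above can be sketched as follows. This is plain JavaScript rather than Spring Batch; `runStep`, the `idempotent` flag, and the in-memory `state` object are hypothetical stand-ins for a batch step and the documents it modifies.

```javascript
// Hypothetical step runner. `state` stands in for the documents a step touches.
function runStep(step, state, maxAttempts = 3) {
  // Undo log: deep copy of the last known good state before the step runs.
  const undoLog = JSON.parse(JSON.stringify(state));
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step.run(state);
      return attempt;
    } catch (err) {
      if (!step.idempotent) {
        // Non-idempotent: restore the snapshot before retrying.
        for (const key of Object.keys(state)) delete state[key];
        Object.assign(state, JSON.parse(JSON.stringify(undoLog)));
      }
      // Idempotent steps are simply replayed on the next loop iteration.
    }
  }
  throw new Error(`step ${step.name} failed after ${maxAttempts} attempts`);
}

// Illustrative non-idempotent step that fails after a partial write on its first run.
let calls = 0;
const increment = {
  name: "increment",
  idempotent: false,
  run(state) {
    state.counter += 1;
    calls += 1;
    if (calls === 1) throw new Error("simulated crash after partial write");
  },
};

const state = { counter: 10 };
const attempts = runStep(increment, state);
console.log(attempts, state.counter); // 2 11
```

Without the restore, the retried increment would leave the counter at 12. The snapshot is also where the listed issues show up: it grows with the state, and replaying takes time.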

Your own mini transactions Issues

• 16MB document size limit...
• Slow to replay
• Hard to test using Selenium

Analytics In the real world

Map Reduce Implementation

Analytics In the real world

Caching layers
✓ Caching in collections

Analytics In the real world

Caching layers
✓ Caching in ehcache

Analytics using the Aggregation Framework

{$project: {
    "rdd": {$isoDate: {
        year: {$year: "$_id.receivedDateHour"},
        month: {$month: "$_id.receivedDateHour"},
        dayOfMonth: {$dayOfMonth: "$_id.receivedDateHour"},
        hour: {$hour: "$_id.receivedDateHour"}
    }},
    "value.diffDaysSum.0": 1,
    "value.diffDaysSum.1": 1,
    "value.diffDaysSum.2": 1
}},
{$project: {
    rdd: 1,
    diffDaysSum: {$add: ["$value.diffDaysSum.0", "$value.diffDaysSum.1", "$value.diffDaysSum.2"]}
}},
{$group: {
    _id: "$rdd",
    totalSumPerDay: {$sum: "$diffDaysSum"}
}}

Analytics using the Aggregation Framework

Double project phase, followed by grouping results
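
To see what each phase contributes, the same three stages can be mirrored in plain JavaScript over a few in-memory documents. This is only an illustration: the sample documents are invented, and treating `rdd` as the received date truncated to the hour (in UTC) is an assumption based on the fields the first projection reads.

```javascript
// Made-up input documents shaped like the fields the pipeline reads.
const docs = [
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 0, 5, 10, 15)) }, value: { diffDaysSum: [1, 2, 3] } },
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 0, 5, 10, 45)) }, value: { diffDaysSum: [4, 0, 1] } },
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 0, 5, 11, 5)) }, value: { diffDaysSum: [2, 2, 2] } },
];

// First projection: rebuild the date from year/month/day/hour, keep three entries.
const stage1 = docs.map((d) => {
  const t = d._id.receivedDateHour;
  const hourBucket = new Date(
    Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), t.getUTCHours())
  );
  return { rdd: hourBucket.toISOString(), parts: d.value.diffDaysSum.slice(0, 3) };
});

// Second projection: add the three entries into one number per document.
const stage2 = stage1.map((d) => ({
  rdd: d.rdd,
  diffDaysSum: d.parts[0] + d.parts[1] + d.parts[2],
}));

// Group: sum diffDaysSum per rdd bucket.
const totals = {};
for (const d of stage2) {
  totals[d.rdd] = (totals[d.rdd] || 0) + d.diffDaysSum;
}
console.log(totals); // 11 for the 10:00 bucket, 6 for the 11:00 bucket
```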

Analytics using the Aggregation Framework

Pros:
✓ More flexible than it sounds
✓ Rapid development
✓ Easy debugging

Cons:
✘ No custom JavaScript supported
✘ Memory limitation
✘ API still evolving

Fine grained write semantics and asynchronous magic

Fine grained write semantics
• WriteConcern.SAFE for most writes
• WriteConcern.REPLICAS_SAFE for writes that are costly to recompute in case of failure

ReactiveMongo
• Asynchronous and non-blocking Scala driver for MongoDB
• Async writes with WriteConcern.SAFE and a callback retry policy in case of error
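
The idea of an asynchronous write with a retry policy can be sketched as below. This uses plain JavaScript promises, not the ReactiveMongo API; `writeWithRetry`, `makeFlakyWrite`, and the shape of the acknowledgement are hypothetical, and the flaky write merely simulates a driver call that fails before it is acknowledged.

```javascript
// Hypothetical async write with retry; `write` is any function returning a Promise.
async function writeWithRetry(write, doc, { retries = 3, delayMs = 10 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await write(doc); // resolves once the write is acknowledged
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Back off a little before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}

// Stand-in for a driver call: fails a given number of times, then acknowledges.
function makeFlakyWrite(failures) {
  let calls = 0;
  return async (doc) => {
    calls += 1;
    if (calls <= failures) throw new Error("simulated write error");
    return { ok: 1, attempts: calls, doc };
  };
}

const pending = writeWithRetry(makeFlakyWrite(2), { userId: 42 });
pending.then((ack) => console.log(ack.attempts)); // 3
```

Retrying blindly is only safe when the write itself is idempotent, which ties back to the idempotent/non-idempotent split used for the batch steps.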

Lessons Learned

Use
• Replica sets
• Journaling
• Aggregation Framework
• MMS

Don't use
• Development versions across the team
• Unbounded datasets that can't fit in memory
• MapReduce if you don't need to

MongoDB on EC2

4 nodes with 6 mongod processes

MongoDB on EC2 Using LVM

http://goo.gl/8NbV7

For high performance, use LVM with RAID 0 or 10

Have your guerrilla team ready:

MongoDB on EC2 Lessons Learned

Unix level tweaks:
• Raise ulimit
• Raise tcp timeout
• Mount with noatime, nodiratime
• Use XFS or ext4
• Use LVM for snapshotting

Use journaling

