CQRS and Event Sourcing with MongoDB and PHP



About me

Davide Bellettini
● Developer at Onebip
● TDD addict

@SbiellONE — about.bellettini.me

What is this talk about

A little bit of context

About Onebip

Mobile payment platform. Start-up born in 2005, acquired by the Neomobile group in 2011.

Onebip today:
- 70 countries
- 200+ carriers
- 5 billion potential users

LAMP stack

It all started with a Monolith

self-contained services communicating via REST

To a distributed system

First-class modern distributed NoSQL databases

Modern services

But the Monolith is still there

The problem

A reporting horror story

We need three new reports!

― Manager

Sure, no problem!

Deal with the legacy SQL schema

Deal with MongoDB

A little bit of querying here, a little bit of map-reduce there

1 month later...

Reports are finally ready!

until...

Your queries are killing production!

― SysAdmin

Still not enough!

Heavy query optimization, adding indexes

Let’s reuse data from other reports (don’t do that)

DB is ok, reports delivered.

but then...

Houston, we have a problem. Reports are not consistent with other reports.

― Business guy

Mistakes were made

Lessons learned

It’s hard to compare data in a distributed system split across multiple domains

#1 Avoid multiple sources of truth

Same words, different concepts across domains

#2 Ubiquitous language

Changing a report shouldn’t have side effects

#3 Resilience to change

Most common solutions

#1 ETL + Map-Reduce

#2 Data Warehouse + Consultants

#3 Mad science (yay!)

What we wanted

Must have:
● No downtime in production
● Consistent across domains

Nice to have:
● A system elastic enough to extract any metric
● Real-time data

In DDD we found the light

CQRS and Event Sourcing

Command-query responsibility segregation (CQRS)

Commands

Anything that happens in one of your domains is triggered by a command and generates one or more events.

Order received -> Payment sent -> Items queued -> Confirmation email sent
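A minimal PHP sketch of the command side (class and event names are hypothetical, not from the talk): the handler runs the business logic, and its only output is a list of events.

<?php
// A command: a plain value object describing an intent.
class ReceiveOrder
{
    public $orderId;

    public function __construct($orderId)
    {
        $this->orderId = $orderId;
    }
}

// A command handler: business logic whose only output is events.
class ReceiveOrderHandler
{
    public function handle(ReceiveOrder $command)
    {
        return array(
            array('type' => 'order-received', 'payload' => array('order_id' => $command->orderId)),
            array('type' => 'payment-sent',   'payload' => array('order_id' => $command->orderId)),
        );
    }
}

$handler = new ReceiveOrderHandler();
$events = $handler->handle(new ReceiveOrder('order-42'));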

Query

Generate read models from events, depending on how the data actually needs to be used (by users and by other application internals)

Event Sourcing

The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied.

― Martin Fowler

Starting from the beginning of time, you are literally unrolling history to reach the state at a given point in time

Unrolling a stream of events
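A minimal sketch of the unrolling idea (event shapes are hypothetical, using the event types from this talk): current state is nothing more than a fold over the full event stream, oldest event first.

<?php
// Rebuild state by replaying every event in the order it was applied.
function rebuildState(array $events)
{
    $state = array('registered' => false, 'purchases' => 0);
    foreach ($events as $event) {
        switch ($event['type']) {
            case 'user-registered':
                $state['registered'] = true;
                break;
            case 'user-purchased':
                $state['purchases'] += 1;
                break;
        }
    }
    return $state;
}

$state = rebuildState(array(
    array('type' => 'user-registered'),
    array('type' => 'user-purchased'),
));
// => array('registered' => true, 'purchases' => 1)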

Idea #1

Every change to the state of your application is captured in an event object.

“UserLoggedIn”, “PaymentSent”, “UserLanded”

Idea #2

Events are stored in the sequence they were applied inside an event store

Idea #3

Everything is an event. No more state.

Idea #4

One way to store data/events but potentially infinite ways to read them.

A practical example

Tech ops, business control, monitoring, accounting: they are all interested in reading data from different views.

Healthy NoSQL

You start with this:

{
    "_id": ObjectId("123"),
    "username": "Flash",
    "city": …,
    "phone": …,
    "email": …,
}

The more successful your company is, the more people involved.

The more people, the more views.

With document DBs it's magically easy to add new fields to your collections.

Soon you might end up with:

{
    "_id": ObjectId("123"),
    "username": "Flash",
    "city": …,
    "phone": …,
    "email": …,
    "created_at": …,
    "updated_at": …,
    "ever_tried_to_purchase_something": …,
    "canceled_at": …,
    "acquisition_channel": …,
    "terminated_at": …,
    "latest_purchase_date": …,
}

A bomb waiting to detonate

It’s impossible to keep adding state changes to your documents and then expect to be able to extract them with a single query.

Exploring Tools

Event Store

● Engineered for event sourcing
● Supports projections
● By the father of CQRS (Greg Young)
● Great performance

http://geteventstore.com/

The bad: based on Mono, still too unstable.

LevelWHEN

An event store built with Node.js and LevelDB
● Faster than light
● Completely custom, no tools to handle aggregates

https://github.com/gabrielelana/levelWHEN

The known path

● PHP (any other language would do just fine)

● MongoDB 2.2.x

Why MongoDB

Events are not relational

Scales well

Awesome aggregation framework

Hands on

Storing Events

Service ---\
            \  [event payload]
Service ------ Queue System <------------> API ---> MongoDB
            /  [event payload]
Service ---/

The write architecture

Queues

Recruiter - https://github.com/gabrielelana/recruiter

MongoDB replica set

A MongoDB replica set with two logical DBs:

1. An event store, where we would store events
2. A reporting DB, where we would store aggregates and final reports

Anatomy of an event 1/2

{
    '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
    'type': 'an-event-type',
    'data': {
        'meta' : { … },
        'payload' : { … }
    }
}

Anatomy of an event 2/2

'meta' : {
    'created_at': ISODate("2014-11-21T00:00:01Z"),
    'save_date': ISODate("2014-11-21T00:00:02Z"),
    'source': 'some-bounded-context',
    'correlation_id': 'a-correlation-id'
},
'payload' : {
    'user_id': '1234',
    'animal': 'unicorn',
    'colour': 'pink',
    'purchase_date': ISODate("2014-11-21T00:00:00Z"),
    'price': '20/fantaeuros'
}

Don’t trust the network: idempotence

{
    '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
    …
}

The _id field is defined client-side and ensures idempotence if an event is received twice.
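A sketch of an idempotent write using the legacy PHP Mongo driver (the one current for MongoDB 2.2.x); the connection string and database/collection names are placeholders. The unique index on _id rejects the second delivery of the same event, and w=majority matches the write-concern tip below.

<?php
// Placeholders: replica set hosts and DB/collection names are made up.
$client = new MongoClient('mongodb://rs1,rs2,rs3', array('replicaSet' => 'events-rs'));
$events = $client->selectDB('event_store')->selectCollection('events');

function storeEvent(MongoCollection $events, array $event)
{
    try {
        // w=majority: wait until a majority of the replica set acknowledges.
        $events->insert($event, array('w' => 'majority'));
    } catch (MongoCursorException $e) {
        if ($e->getCode() !== 11000) { // 11000 = duplicate key
            throw $e;
        }
        // Same _id already stored: the event was delivered twice, nothing to do.
    }
}

storeEvent($events, array(
    '_id'  => '3318c11e-fe60-4c80-a2b2-7add681492d9', // generated client-side
    'type' => 'an-event-type',
    'data' => array('meta' => array(), 'payload' => array()),
));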

Indexes

● The events collection is huge (~100*N documents)

● Use indexes wisely: they are necessary yet expensive

● With the suggested event structure: {'data.meta.created_at': 1, 'type': 1} (see the sketch below)
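For reference, a sketch of building that index with the legacy PHP driver (DB/collection names as in the earlier sketch); a background build avoids blocking writes on a collection this large.

<?php
$events = (new MongoClient())->selectDB('event_store')->selectCollection('events');

// Compound index matching the query pattern: range on created_at, filter on type.
$events->ensureIndex(
    array('data.meta.created_at' => 1, 'type' => 1),
    array('background' => true)
);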

Benchmarking

How many events/second can you store?

Our machines were able to store roughly 150 events/sec. This number can be greatly increased with dedicated IOPS, more aggressive insert policies, etc.

Final tips

● Use SSD on your storage machines

● Pay attention to write concerns (w=majority)

● Test your replica set fault tolerance

From events to meaningful metrics

Sequential Projector -> Event Mapper -> Projection -> Aggregation

The event processing pipeline

A real life problem

What is the conversion rate of our registered users?

#1 The registration event

{
    '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
    'type': 'user-registered',
    'data': {
        'meta' : {
            'save_date': ISODate("2014-11-21T00:00:09Z"),
            'created_at': ISODate("2014-11-21T00:00:01Z"),
            'source': 'core-domain',
            'correlation_id': 'user-123456'
        },
        'payload' : {
            'user_id': 123,
            'username': 'flash',
            'email': 'a-dummy-email@gmail.com',
            'country': 'IT'
        }
    }
}

#2 The purchase event

{
    '_id' : '52ddf35c-08b3-4cff-b2e6-cc30e93a95e3',
    'type': 'user-purchased',
    'data': {
        'meta' : {
            'save_date': ISODate("2014-11-21T00:10:09Z"),
            'created_at': ISODate("2014-11-21T00:10:01Z"),
            'source': 'payment-gateway',
            'correlation_id': 'user-123456'
        },
        'payload' : {
            'user_id': 123,
            'email': 'a-dummy-email@gmail.com',
            'amount': 20,
            'value': 'EUR',
            'payment': 'credit_card',
            'item': 'fluffy cat'
        }
    }
}

Sequential projector 1/2

[]->[x]->[]->[x]->[]->[]->[]->[]
|--------------|  |------------|
    batch #1         batch #2
        |
        +---> Projector

Divides the stream of events into batches, filters events by type ([x] marks an event of interest) and passes those of interest to the mapper.

Sequential projector 2/2

● It’s a good idea to select fixed-size batches, to avoid memory problems when you load your cursor in memory

● It could be a long-running process, selecting events as they arrive in real time (see the sketch below)
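A minimal projector loop under those constraints (collection and event-type names reuse the talk's examples; the mapper hand-off is only hinted at here and sketched under "Event mapper" below): fixed-size batches, ordered by created_at, resuming from the last processed event.

<?php
$client = new MongoClient();
$events = $client->selectDB('event_store')->selectCollection('events');

$interestingTypes = array('user-registered', 'user-purchased');
$batchSize = 1000;
$lastSeen = null; // created_at of the last processed event: our resume point

do {
    $query = array('type' => array('$in' => $interestingTypes));
    if ($lastSeen !== null) {
        $query['data.meta.created_at'] = array('$gt' => $lastSeen);
    }

    // Fixed-size batch: the cursor never holds more than $batchSize documents.
    $batch = $events->find($query)
                    ->sort(array('data.meta.created_at' => 1))
                    ->limit($batchSize);

    $processed = 0;
    foreach ($batch as $event) {
        // Hand $event to the mapper here (see "Event mapper" below).
        $lastSeen = $event['data']['meta']['created_at'];
        $processed++;
    }
} while ($processed === $batchSize); // a short batch means we caught up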

Event mapper 1/3

Translates event fields to the Read Model domain

Takes an event as input, applies a bunch of logic and returns a list of read-model fields.

Event mapper 2/3

Input event: user-registered

Output:

$output = [
    'user_id'       => 123,                       // simply copied
    'user_name'     => 'flash',                   // simply copied
    'email'         => 'a-dummy-email@gmail.com', // simply copied
    'registered_at' => '2014-11-21T00:00:01Z'     // from the data.meta.created_at event field
];

Event mapper 3/3

Input event: user-purchased

Output:

$output = [
    'user_id'      => 123,                       // simply copied
    'email'        => 'a-dummy-email@gmail.com', // simply copied
    'purchased_at' => '2014-11-21T00:10:01Z'     // from the data.meta.created_at event field
];
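Putting the two mappings together, a sketch of what such a mapper could look like (the class name is hypothetical):

<?php
class ConversionRateEventMapper
{
    // Takes a raw event, returns the read-model fields it contributes.
    public function map(array $event)
    {
        $meta = $event['data']['meta'];
        $payload = $event['data']['payload'];

        switch ($event['type']) {
            case 'user-registered':
                return array(
                    'user_id'       => $payload['user_id'],
                    'user_name'     => $payload['username'],
                    'email'         => $payload['email'],
                    'registered_at' => $meta['created_at'],
                );
            case 'user-purchased':
                return array(
                    'user_id'      => $payload['user_id'],
                    'email'        => $payload['email'],
                    'purchased_at' => $meta['created_at'],
                );
            default:
                return array(); // not an event this read model cares about
        }
    }
}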

Projection

Essentially, it is your read model: the data that the business is interested in.

The Projection after event #1

db.users_conversion_rate_projection.findOne()

{
    'user_id': 123,
    'user_name': 'flash',
    'email': 'a-dummy-email@gmail.com',
    'registered_at': ISODate("2014-11-21T00:00:01Z")
}

The Projection after event #2

{
    'user_id': 123,
    'user_name': 'flash',
    'email': 'a-dummy-email@gmail.com',
    'registered_at': ISODate("2014-11-21T00:00:01Z"),
    'purchased_at': ISODate("2014-11-21T00:10:01Z")  // added this field, other fields rewritten
}
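One way the mapped fields could land in that document is an upsert keyed by user_id, so event #2 extends the document event #1 created. A sketch (the reporting DB name is assumed):

<?php
$projection = (new MongoClient())
    ->selectDB('reporting') // assumed name for the reporting DB
    ->selectCollection('users_conversion_rate_projection');

function project(MongoCollection $projection, array $fields)
{
    // Upsert: event #1 creates the document, event #2 adds/overwrites fields.
    $projection->update(
        array('user_id' => $fields['user_id']),
        array('$set' => $fields),
        array('upsert' => true)
    );
}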

The Projection collection

{
    'user_id': 123,
    'user_name': 'flash',
    'email': 'a-dummy-email@gmail.com',
    'registered_at': ISODate("2014-11-21T00:00:01Z"),
    'purchased_at': ISODate("2014-11-21T00:10:01Z")
}

{
    'user_id': 456,
    'user_name': 'batman',
    'email': 'a-dummy-email@gmail.com',
    'registered_at': ISODate("2014-11-21"),
    'purchased_at': ISODate("2014-11-21")
}

{
    'user_id': 789,
    'user_name': 'superman',
    'email': 'a-dummy-email@gmail.com',
    'registered_at': ISODate("2014-12-21"),
    'purchased_at': ISODate("2014-12-21")
}

The Projection - A few thoughts

Note that we didn't copy all the available fields from events to the projection, just the relevant ones.

From these two events we could have generated infinite read models such as

● List all purchased products and related amounts for the company's buyers

● Map all sales and revenues for our accounting dept

● List transactions for the financial department

One way to write,infinite ways to read!

The aggregation (1) - Total registered users

var registered = db.users_conversion_rate_projection.aggregate([
    {
        $match: {
            "registered_at": { $gte: ISODate("2014-11-21"), $lte: ISODate("2014-11-22") }
        }
    },
    {
        $group: {
            _id: null,
            count: { $sum: 1 }
        }
    }
]);

The aggregation (2) - Users with a purchase

var purchased = db.users_conversion_rate_projection.aggregate([
    {
        $match: {
            "registered_at": { $gte: ISODate("2014-11-21"), $lte: ISODate("2014-11-22") },
            "purchased_at": { $exists: true }
        }
    },
    {
        $group: {
            _id: null,
            count: { $sum: 1 }
        }
    }
]);

The aggregation (3) - Automate all the things

● You can easily create the aggregation framework statement by composition abstracting the concept of Column.

● This way you can dynamically aggregate your projections on (for example) an API requests.

● If your Projector is a long running process, your projections will be updated to the second and you automagically get realtime data.
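As a sketch of such composition (the helper name is hypothetical; in the legacy PHP driver, aggregate() returns the documents under a 'result' key), the two counts above reduce to one metric:

<?php
$projection = (new MongoClient())
    ->selectDB('reporting') // assumed name, as in the earlier sketch
    ->selectCollection('users_conversion_rate_projection');

// Run a $match + $group pipeline and return the matching document count.
function countMatching(MongoCollection $projection, array $match)
{
    $out = $projection->aggregate(array(
        array('$match' => $match),
        array('$group' => array('_id' => null, 'count' => array('$sum' => 1))),
    ));
    return empty($out['result']) ? 0 : $out['result'][0]['count'];
}

$window = array(
    '$gte' => new MongoDate(strtotime('2014-11-21T00:00:00Z')),
    '$lte' => new MongoDate(strtotime('2014-11-22T00:00:00Z')),
);

$registered = countMatching($projection, array('registered_at' => $window));
$purchased = countMatching($projection, array(
    'registered_at' => $window,
    'purchased_at'  => array('$exists' => true),
));

$conversionRate = ($registered > 0) ? $purchased / $registered : 0.0;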

Another use for events: business & tech monitoring

Beware of the beast! No Silver Bullet

Events are expensive: they require a lot of TIME to be parsed

Events are expensive: you will end up with a billion-document collection (and counting)

Fixing wrong events is painful

Events are complex

Moving events around is horribly painful

It will actually make your life incredibly difficult, with hidden bugs and leaky documentation.

Mongo won’t help you

Improvements

● Upgrade from MongoDB 2.2.x to 3.0.x
● Switch to the WiredTiger storage engine to save space

Credits

Based on a talk by Jacopo Nardiello

● Slides: http://bit.ly/es-nardiello-2014
● Video: https://vimeo.com/113370688

Q&A

@SbiellONE — about.bellettini.me

Thank you!