Lean & Agile with MongoDB

MongoMunich 2012

#MongoDBMunich @comsysto

Monday, October 15, 2012

About us

• first partner of 10gen in Germany (January 2012)

About me

• Lead DevOps Engineer @comsysto
• @loomit
• Data Nerd
• 3 years of high-performance web ops
• joined comSysto in March 2012

Questions

• Please ask during the presentation!

Continuous Innovation

• Instant feedback from customers about features

• eliminate waste

Eliminate waste

Agile?

• Iterative and incremental

• Scrum is a framework for developing and sustaining complex products

Kanban

• Pull from a work queue
• originated at Toyota in the 1950s

Agile Adoption

• Ken Schwaber

Agile Adoption

• “There is no SCRUM police”

Agile Adoption

• “Use your intelligence”

Agile Adoption

• Dogmatic Slumber

Don’t be the little girl

Don’t be the Joker

Cross functional teams

8 hats

Co-location

Appreciation for simplicity

• “Everything should be as simple as possible, but not simpler”

• paraphrased from Albert Einstein

Look familiar?

Schema Free

“Your data schema is a direct corollary with how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breath. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”

from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/

Emergent Architectures

Move fast and break things

Scale out

• MongoDB is mostly I/O bound
• Storage matters

• EBS (anywhere from 70 to 300 ops/sec)
• EBS provisioned IOPS (stable)
• Ephemeral
• SSD (much higher ops/sec but costly)
• use RAID on EC2 (or not?)

MongoDB AWS Storage

• Naming really matters
– combine with Route 53
– ec2-174-129-227-92.compute-1.amazonaws.com?
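One way to read the naming point: give each mongod a role-based DNS name (for example a Route 53 record pointing at the EC2 public name) and build cluster configuration from those names only. The sketch below is an illustration under assumptions. The zone `mongo.example.com`, the `shardNN-mN` scheme, and both helper names are hypothetical and not taken from the talk.

```python
def member_hostname(shard, member, zone="mongo.example.com"):
    """Return a readable DNS name for one replica set member (hypothetical scheme)."""
    return f"shard{shard:02d}-m{member}.{zone}"


def shard_seed(shard, members=3, port=27017):
    """Build a '<replicaSetName>/<host:port>,...' seed string for one shard.

    This is the string format accepted when a replica set is added as a shard.
    """
    hosts = ",".join(
        f"{member_hostname(shard, m)}:{port}" for m in range(1, members + 1)
    )
    return f"shard{shard:02d}/{hosts}"


if __name__ == "__main__":
    # Print the seed strings for a small, hypothetical two-shard cluster.
    for shard_number in (1, 2):
        print(shard_seed(shard_number))
```

Stable names like these survive instance replacement, which the auto-generated EC2 host names do not.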

Sharded Setup

MongoDB on AWS

Infrastructure as code

Use Cases

• Real-Time Analytics Software

• Operational Intelligence

• High Volume Data Feeds

• Hadoop

Patterns

• Pre-Aggregation
• Batch
– Hadoop
– MapReduce (in MongoDB)
– Aggregation Framework

Pre-Aggregation

• Problem:
– You require up-to-the-minute data, or up-to-the-second if possible
– Queries for ranges of data (by time) must be as fast as possible

Pre-Aggregation

• Best practices
– $inc and upsert are your friend
– pre-allocate documents
– use REST interface

• MapReduce
• Aggregation Framework
• Mongo-Hadoop Connector
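The `$inc`, upsert, and pre-allocation practices can be sketched as plain documents, without a running server. This is a minimal sketch under assumptions: one document per metric and hour with a counter per minute. The field names (`metric`, `hour`, `total`, `minutes`), the `_id` layout, and the helper names are hypothetical, not the speakers' schema.

```python
from datetime import datetime, timezone


def preallocated_hour_doc(metric, ts):
    """Return a zero-filled document for one metric and one hour.

    Writing this document ahead of time means later $inc updates
    never have to grow the document.
    """
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return {
        "_id": f"{metric}/{hour:%Y%m%d%H}",
        "metric": metric,
        "hour": hour,
        "total": 0,
        "minutes": {str(m): 0 for m in range(60)},
    }


def increment_spec(metric, ts, count=1):
    """Return (filter, update) that bumps the hourly total and one minute counter."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    query = {"_id": f"{metric}/{hour:%Y%m%d%H}"}
    update = {"$inc": {"total": count, f"minutes.{ts.minute}": count}}
    return query, update


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(preallocated_hour_doc("pageviews", now)["_id"])
    print(increment_spec("pageviews", now))
```

With PyMongo, the pair would typically be passed as `collection.update_one(query, update, upsert=True)`, so the counter document is created on first use if it was not pre-allocated. A time-range query then reads a handful of hourly documents instead of scanning raw events.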

Mongo Hadoop Connector

Data Storage Data Processing

Projects

• What we have done so far...

Real Time Twitter Heatmap

• The bubbles in the sea?

Friendly Floatees!

Friendly Floatees

• MongoDB Capped Collections
• Flask
• Redis
• Google Maps
• heatmaps.js
• Server-Sent Events
• http://bit.ly/Ou5SsP
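The stack above uses Server-Sent Events to push updates to the browser. The sketch below shows only the wire format of one event message and is not the project's code. The event name `tweet` and the `lat`/`lng` payload fields are hypothetical.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse({"lat": 48.14, "lng": 11.58}, event="tweet"), end="")
```

A web handler would stream such strings with the `text/event-stream` content type. The browser side subscribes with `EventSource`.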

Pizza Quattro Shardoni

Quattro Shardoni

• Technology Showcase Product
• Complete End2End stack
• Real Time Charting
• Batch Reporting based on Hadoop

Quattro Shardoni

• Talk today at 12:15, Ballsaal A

Tom Zorc, Bernd Zuther

Operational Intelligence

• Analyze behavior of users in web shop
• Recommend next best activity (NBA) for business
• Real Time Analytics

Online Shop

• Next best activity for support/call center
• interpret user session
• e.g. “RaspberryPi - strong interest”
• expected: 2000 events per second

It’s Real Time!

Big Data Project

• “which analyzes and visualizes data of mobile networks”

Big Data Project

• started as prototype, in production now ;-)
• “beyond agile”
• going from

Big Data Project

– fetch all, calculate in service layer
– use MongoDB MapReduce on single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)

Big Data Project

• why not use Aggregation Framework?
– we started with 2.0.6
– would have had to change data model
– M/R seemed the way to go (data size)

Big Data Project

• Numbers
– data comes in weekly increments
– xTB raw data
– 14GB / week (into MongoDB)
– data grows in direct proportion to polygon count
– currently 1 replica set of 3 m2.4xlarge instances

MongoDB on AWS

Big Data Project

• Geospatial Features
– $within queries (bounding box)
– $near queries
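The two query types listed above can be written as plain filter documents. This is a minimal sketch, assuming a hypothetical field `loc` that holds `[longitude, latitude]` pairs and carries a 2d index. The helper names and the coordinates are illustrative. It uses the legacy `$within`/`$box` and `$near` operators that match the 2.0.x releases mentioned in the talk. Newer MongoDB releases prefer `$geoWithin`.

```python
def within_box_query(field, bottom_left, top_right):
    """Legacy $within/$box filter: match points inside a bounding box."""
    return {field: {"$within": {"$box": [list(bottom_left), list(top_right)]}}}


def near_query(field, point, max_distance=None):
    """Legacy $near filter: match points ordered by distance from `point`."""
    clause = {"$near": list(point)}
    if max_distance is not None:
        # Optional cut-off, in the same units as the stored coordinates.
        clause["$maxDistance"] = max_distance
    return {field: clause}


if __name__ == "__main__":
    # Rough, illustrative box around Munich as (longitude, latitude) corners.
    print(within_box_query("loc", (11.36, 48.06), (11.72, 48.25)))
    print(near_query("loc", (11.58, 48.14), max_distance=0.1))
```

With PyMongo, either dictionary would be passed to `collection.find(...)`.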

Big Data Project

Raw Data MapReduce

Big Data Project

• more polygons -> more data
– key length can become an issue

• using polygons to display cell metrics
• tried different types of visualizations

Big Data Project

• key-size per doc: 1.8KB
– bad: {very_descriptive_long_key: "yay"}
– good: {v: "yay"}
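Field names are stored inside every BSON document, so key length is paid once per document. The sketch below computes the encoded size of the two example documents by hand, following the BSON layout for string values. It is a simplified calculator that assumes string values only, and the function name is hypothetical.

```python
def bson_string_doc_size(doc):
    """Size in bytes of a BSON document whose values are all strings."""
    size = 4 + 1  # int32 length prefix plus the document's trailing NUL
    for key, value in doc.items():
        size += 1                                    # element type byte
        size += len(key.encode("utf-8")) + 1         # field name, NUL-terminated
        size += 4 + len(value.encode("utf-8")) + 1   # int32 length, bytes, NUL
    return size


if __name__ == "__main__":
    bad = bson_string_doc_size({"very_descriptive_long_key": "yay"})
    good = bson_string_doc_size({"v": "yay"})
    print(bad, good, bad - good)
```

The difference between the two sizes is exactly the difference in key length, and it is multiplied by the number of documents in the collection.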

Big Data Project

[Chart: storage in GB / year (axis 0 to 400) for 100000 polygons vs. 500000 polygons]

Big Data Project

• 308GB of EBS storage => $332 per year
– backups / snapshots not considered
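The storage figure can be turned into a per-GB rate and reused for other volumes. This is a rough sketch that assumes pricing is linear in GB-months and, like the slide, ignores backups and snapshots. It also ignores any I/O request charges. The function names are hypothetical.

```python
def implied_rate_per_gb_month(total_gb, dollars_per_year):
    """Back out the price per GB-month from a yearly storage bill."""
    return dollars_per_year / (total_gb * 12)


def yearly_storage_cost(total_gb, rate_per_gb_month):
    """Yearly cost of keeping `total_gb` provisioned for twelve months."""
    return total_gb * rate_per_gb_month * 12


if __name__ == "__main__":
    rate = implied_rate_per_gb_month(308, 332)  # figures from the slide
    print(f"implied rate: {rate:.4f} USD per GB-month")
    print(f"cost of 1000 GB: {yearly_storage_cost(1000, rate):.2f} USD per year")
```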

Big Data Project

• Future Plans
– new Use Case
– expecting about 1TB of data / week

Conclusion

• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as part of code
• MongoDB provides flexibility

Comments?

• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/

• http://careers.comsysto.com

We are hiring!
