Lean & Agile with MongoDB

Post on 26-Jan-2015

description

My slides on lean & agile development with MongoDB from MongoMunich 2012. The talk shows how you can speed up development with NoSQL technologies and also gives some samples from a big data project.

transcript

About us

Monday, October 15, 12

About us

• first partner of 10gen in Germany (January 2012)

About me

• Lead DevOps Engineer @comsysto
• @loomit
• Data Nerd
• 3 years of high performance web ops
• joined comSysto in March 2012

Questions

• Please ask during the presentation!

Lean?

Lean?

Continuous Innovation

Lean?

• Instant feedback from customers about features

• eliminate waste

Eliminate waste

Agile?

• Iterative and incremental

SCRUM

• Scrum is a framework for developing and sustaining complex products

Kanban

• Pull from a work queue
• originated at Toyota in the 1950s

Agile Adoption

• Ken Schwaber

Agile Adoption

• “There is no SCRUM police”

Agile Adoption

• “Use your intelligence”

Agile Adoption

• Dogmatic Slumber

Don’t be the little girl

Don’t be the Joker

Cross functional teams

Cross functional teams

8 hats

Co-location

Appreciation for simplicity

• “Everything should be as simple as possible, but not simpler”

• paraphrased from Albert Einstein

Look familiar?

NOSQL

Schema Free

“Your data schema is a direct corollary with how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breath. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”

from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/

Emergent Architectures

Move fast and break things

NOSQL

Scale out

AWS

• MongoDB mostly I/O bound
• Storage matters

AWS

• EBS (anywhere from 70 to 300 ops/sec)
• EBS provisioned IOPS (stable)
• Ephemeral
• SSD (much higher ops/sec but costly)
• use RAID on EC2 (or not?)

MongoDB AWS Storage

AWS

• Naming really matters
– combine with Route 53
– ec2-174-129-227-92.compute-1.amazonaws.com?

Sharded Setup

MongoDB on AWS

Infrastructure as code

Use Cases

• Real-Time Analytics Software

• Operational Intelligence

• High Volume Data Feeds

• Hadoop

Patterns

• Pre Aggregation
• Batch
– Hadoop
– MapReduce (in MongoDB)
– Aggregation Framework

Pre-Aggregation

• Problem:
– You require up-to-the-minute data, or up-to-the-second if possible
– The queries for ranges of data (by time) must be as fast as possible

Pre-Aggregation

• Best practices
– $inc and upsert are your friends
– pre-allocate documents
– use REST interface
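
The first two practices can be sketched roughly as follows. This is a minimal illustration and not code from the talk: the document layout (one document per page and day, with `total`, `hourly` and `minute` counters) and the function names are assumptions made for the example. The functions only build plain dictionaries, so they run without a database.

```python
from datetime import datetime, timezone


def build_hit_update(page, ts):
    """Return (query, update) documents that count one page view at time ts.

    Hypothetical layout: one document per page and day, keyed by "<page>:<day>".
    """
    day = ts.strftime("%Y-%m-%d")
    query = {"_id": f"{page}:{day}"}
    update = {
        "$inc": {
            "total": 1,                           # daily counter
            f"hourly.{ts.hour}": 1,               # per-hour counter
            f"minute.{ts.hour}.{ts.minute}": 1,   # per-minute counter
        }
    }
    return query, update


def preallocated_doc(page, day):
    """Zero-filled document, so later $inc updates hit fields that already exist."""
    return {
        "_id": f"{page}:{day}",
        "total": 0,
        "hourly": {str(h): 0 for h in range(24)},
        "minute": {str(h): {str(m): 0 for m in range(60)} for h in range(24)},
    }


query, update = build_hit_update(
    "/home", datetime(2012, 10, 15, 12, 15, tzinfo=timezone.utc)
)
```

With a driver such as PyMongo, the pair would typically be passed to `collection.update_one(query, update, upsert=True)`, so the first hit of the day creates the document and every later hit increments it in place.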

Batch

• MapReduce
• Aggregation Framework
• Mongo-Hadoop Connector

Mongo Hadoop Connector

Data Storage, Data Processing

Projects

• What we have done so far...

Real Time Twitter Heatmap

Real Time Twitter Heatmap

• The bubbles in the sea?

Friendly Floatees!

Friendly Floatees

Flow

Real Time Twitter Heatmap

• MongoDB Capped Collections
• Flask
• Redis
• Google Maps
• heatmaps.js
• Server-Sent Events
• http://bit.ly/Ou5SsP
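
Two of the building blocks listed here can be illustrated without a running stack. The sketch below is not the demo's code: `make_capped` is a hypothetical in-memory stand-in that only mimics the "keep the newest N items" behaviour of a capped collection, and `sse_event` frames a JSON payload in the Server-Sent Events wire format (`event:` and `data:` fields followed by a blank line). The sample coordinates are made up.

```python
import json
from collections import deque


def make_capped(size):
    """In-memory stand-in for a capped collection: keeps only the newest items."""
    return deque(maxlen=size)


def sse_event(payload, event=None):
    """Frame one JSON payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    # A blank line terminates each event on the wire.
    return "\n".join(lines) + "\n\n"


recent = make_capped(3)
for i in range(5):
    # Oldest points fall out automatically once the buffer is full.
    recent.append({"lat": 48.1, "lng": 11.5, "n": i})

stream = "".join(sse_event(point, event="tweet") for point in recent)
```

In a real setup the buffer would be a server-side capped collection read with a tailable cursor, and the framed strings would be yielded from a streaming HTTP response with the `text/event-stream` content type.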

Pizza Quattro Shardoni

Quattro Shardoni

• Technology Showcase Product
• Complete End2End stack
• Real Time Charting
• Batch Reporting based on Hadoop

Quattro Shardoni

Quattro Shardoni

Quattro Shardoni

Quattro Shardoni

• Talk today at 12:15, BallSaal A

Tom Zorc, Bernd Zuther

Operational Intelligence

Operational Intelligence

• Analyze behavior of users in web shop
• Recommend NBA (next best activity) for business
• Real Time Analytics

Online Shop

REST

Operational Intelligence

• Next best activity for support/callcenter
• interpret user session
• e.g. “RaspberryPi - strong interest”
• expected: 2000 events per second

Operational Intelligence

Operational Intelligence

It’s Real Time!

Big Data Project

• “which analyzes and visualizes data of mobile networks”

Big Data Project

Big Data Project

Big Data Project

Big Data Project

• started as prototype, in production now ;-)
• “beyond agile”
• going from
– fetch all, calculate in service layer
– use MongoDB MapReduce on single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)
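
The step from a single node to many shards works because the reduce step can be applied again to partial results. The sketch below shows that idea in plain Python. It is an illustration only, not the project's job: the document fields `polygon` and `value` and the sum aggregation are assumptions.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["polygon"], doc["value"]


def reduce_values(key, values):
    """Reduce step: must give the same result when re-applied to partial results."""
    return sum(values)


def map_reduce(docs):
    """Single-node run: group the emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}


def sharded_map_reduce(shards):
    """Run map_reduce per shard, then reduce the partial results once more."""
    merged = defaultdict(list)
    for shard_docs in shards:
        for key, partial in map_reduce(shard_docs).items():
            merged[key].append(partial)
    return {key: reduce_values(key, values) for key, values in merged.items()}


docs = [
    {"polygon": "a", "value": 2},
    {"polygon": "b", "value": 5},
    {"polygon": "a", "value": 3},
    {"polygon": "b", "value": 1},
]
single = map_reduce(docs)
sharded = sharded_map_reduce([docs[:2], docs[2:]])
```

Both runs produce the same totals, which is the property that lets the same job move from one node to several shards or to a Hadoop cluster.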

Big Data Project

Big Data Project

Big Data Project

• why not use Aggregation Framework?
– we started with 2.0.6
– would have had to change data model
– M/R seemed the way to go (data size)

Big Data Project

• Numbers
– data comes in weekly increments
– xTB raw data
– 14GB / week (into MongoDB)
– data grows in direct proportion to polygon count
– currently 1 replica set of 3 m2.4xlarge instances

MongoDB on AWS

Big Data Project

• Geo Spatial Features
– $within queries (bounding box)
– $near queries
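
As a rough illustration of the two query shapes, the helpers below build the filter documents as plain dictionaries in the legacy coordinate-pair syntax of that MongoDB generation. The field name `loc`, the helper names and the sample coordinates are assumptions; the queries also presuppose a 2d index on the field.

```python
def within_box(field, lower_left, upper_right):
    """Filter for documents whose point lies inside a bounding box."""
    return {field: {"$within": {"$box": [list(lower_left), list(upper_right)]}}}


def near(field, point):
    """Filter that returns documents ordered by distance from the given point."""
    return {field: {"$near": list(point)}}


# Hypothetical coordinates (longitude, latitude) roughly around Munich.
box_query = within_box("loc", (11.4, 48.0), (11.7, 48.3))
near_query = near("loc", (11.58, 48.14))
```

Either dictionary can be passed as the filter argument of a `find` call. Later MongoDB releases replaced `$within` with `$geoWithin`, so the operator name depends on the server version.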

Big Data Project

Big Data Project

Raw Data, MapReduce

Big Data Project

• more polygons -> more data
– key length can become an issue
• using polygons to display cell metrics
• tried different types of visualizations

Big Data Project

• key-size per doc: 1.8KB
– bad: {very_descriptive_long_key : “yay”}
– good: { v : “yay”}
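
Field names cost space because BSON stores every field name inside every document, as a NUL-terminated string. The sketch below estimates that overhead for the two example documents. It is a simplification made for illustration: it counts only field names in nested dictionaries and ignores values, array indexes and type bytes.

```python
def key_bytes(doc):
    """Approximate bytes spent on field names in one document."""
    total = 0
    for key, value in doc.items():
        # Each field name is stored as UTF-8 bytes plus one terminating NUL byte.
        total += len(key.encode("utf-8")) + 1
        if isinstance(value, dict):
            total += key_bytes(value)
    return total


bad = {"very_descriptive_long_key": "yay"}
good = {"v": "yay"}

saved_per_doc = key_bytes(bad) - key_bytes(good)
```

The saving per field looks small, but it is paid once per document, so it scales with the number of documents stored.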

Big Data Project

Chart, GB / year: 62 (100000 polygons), 308 (500000 polygons)

Big Data Project

Big Data Project

• 308GB of EBS storage => 332$ per year
– backups / snapshots not considered
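
The figures on these slides can be cross-checked with a little arithmetic. The sketch below assumes that the chart values 62 and 308 GB / year belong to 100000 and 500000 polygons respectively, and that the 332$ covers the full 308GB for twelve months. The resulting price per GB-month is derived from the slide, not taken from an AWS price list.

```python
# Values read from the slides (GB per year, keyed by polygon count).
gb_per_year = {100_000: 62, 500_000: 308}
yearly_cost_usd = 332

# Five times the polygons should give roughly five times the data.
polygon_factor = 500_000 / 100_000
growth_factor = gb_per_year[500_000] / gb_per_year[100_000]

# Storage price implied by the slide, in USD per GB per month.
implied_rate = yearly_cost_usd / (gb_per_year[500_000] * 12)

print(f"growth factor: {growth_factor:.2f} for {polygon_factor:.0f}x polygons")
print(f"implied price: {implied_rate:.4f} USD per GB-month")
```

The growth factor comes out just under 5, which fits the statement that data grows in direct proportion to polygon count.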

Big Data Project

• Future Plans
– new Use Case
– expecting about 1TB of data / week

Conclusion

• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as part of code
• MongoDB provides flexibility

Comments?

• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/

We are hiring!

• http://careers.comsysto.com