Webinar: Transitioning from SQL to MongoDB


Transitioning from SQL to MongoDB

Joe Drumgoole, Director of Developer Advocacy, EMEA

@jdrumgoole | Joe.Drumgoole@mongodb.com

V1.2

Before We Begin

• This webinar is being recorded
• Use the chat window for technical assistance and Q&A
• The MongoDB team will answer quick questions in real time; "common" questions will be reviewed at the end of the webinar

Who is your Presenter?

• Programmer
• Developer Manager
• Entrepreneur
• Geek
• Sometime pre-sales guy

MongoDB: The New Default Database

Document Data Model

Open-Source

Fully Featured

High Performance

Scalable

{ name: "John Smith",
  pfxs: [ "Dr.", "Mr." ],
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138
  }
}


It’s a JSON Database

{u'_id': ObjectId('58511bfbb26a8803b6b4d56c'),
 u'batchID': 108,
 u'member': {u'chapters': [{u'id': 1775736,
                            u'name': u'London MongoDB User Group',
                            u'urlname': u'London-MongoDB-User-Group'},
                           {u'id': 1780459,
                            u'name': u'Stockholm MongoDB User Group',
                            u'urlname': u'Stockholm-MongoDB-User-Group'},
                           {u'id': 3478392,
                            u'name': u'Dublin MongoDB User Group',
                            u'urlname': u'DublinMUG'},
                           {...,
                            u'urlname': u'Mannheim-MongoDB-User-Group'}],
             u'city': u'Dublin',
             u'country': u'Ireland',
             u'events_attended': 13,
             u'is_organizer': True,
             u'join_time': datetime.datetime(2013, 10, 30, 17, 5, 31),
             u'last_access_time': datetime.datetime(2016, 12, 13, 15, 45, 27),
             u'location': {u'coordinates': [-6.25, 53.33000183105469],
                           u'type': u'Point'},
             u'member_id': 99473492,
             u'member_name': u'Joe Drumgoole',
             u'photo_thumb_url': u'http://photos2.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg'},
 u'timestamp': datetime.datetime(2016, 12, 14, 10, 16, 27, 607000)}

Typed

Hierarchical, with lists and maps

Geo-Spatial
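For example, the member document above carries a GeoJSON point in member.location, so geospatial operators can be used directly against it. A minimal shell sketch, assuming a members collection shaped like that document and a 2dsphere index:

// Index the GeoJSON point so geo operators can use it
db.members.createIndex( { "member.location" : "2dsphere" } )

// Find members within roughly 50 km of Dublin city centre
db.members.find( {
    "member.location" : {
        "$nearSphere" : {
            "$geometry" : { "type" : "Point", "coordinates" : [ -6.26, 53.35 ] },
            "$maxDistance" : 50000    // metres
        }
    }
} )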

Functions of a Database

• Durable data storage

• Structural representation of data

• CRUD operations

• Authentication and authorization

• Programmer Efficiency?

What Are Your Developers Doing All Day?

1964 - IMS

1977 - Oracle

1984 - dBASE

1991 - MySQL

2009 - MongoDB

The Challenge is Product Development

1976 vs. 2016

• Business Data Goals: Process payroll monthly (1976) vs. process real-time billing to the minute for 1M customers (2016)
• Release Schedule: Semi-annually vs. monthly
• Application/Code: COBOL, Fortran, Algol, PL/1, assembler, proprietary tools vs. Python, Java, Node.js, Ruby, PHP, Perl, Scala, Erlang and the rest
• Tools: None vs. Apache, LAMP, MEAN, Eclipse, IntelliJ, SourceForge, etc.
• Database: ISAM/VSAM, early RDBMS vs. RDBMS, NoSQL

Rectangles are 1976. Maps and Lists are 2016.

{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { type : "work",
      number : "1-800-555-1212" },
    { type : "home",
      number : "1-800-555-1313",
      DNC : true },
    { type : "home",
      number : "1-800-555-1414",
      DNC : true }
  ]
}

An Actual Code Example

Let's compare and contrast RDBMS/SQL to MongoDB development using Java over the course of a few weeks.

Some ground rules:
1. Observe the rules of Software Engineering 101: assume separation of application, Data Access Layer, and database implementation.
2. The Data Access Layer must be able to:
   a. Expose simple, functional, data-only interfaces to the application: no ORM, frameworks, compile-time bindings, or special tools
   b. Exploit high-performance features of the database
3. Focus on core data-handling code and avoid distractions that require the same amount of work in both technologies:
   a. No exception or error handling
   b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not actual time to complete the indicated task.
5. Don't expect to cut and paste this code.

The Task: Saving and Fetching Contact data

Start with this simple, flat shape in the Data Access Layer:

Map m = new HashMap();
m.put("name", "Joe D");
m.put("id", "K1");

And assume we save it in this way:

id = save(Map m)

And assume we fetch one by primary key in this way:

Map m = fetch(String id)

Brace yourself…..

Day 1: Initial efforts for both technologies

SQL

DDL: create table contact ( … )

init() {
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name ) values ( ?,? )");
  fetchStmt = connection.prepareStatement(
      "select id, name from contact where id = ?");
}

save(Map m) {
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.execute();
}

Map fetch(String id) {
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.execute();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
  }
  return m;
}

MongoDB

DDL: none

save(Map m) {
  collection.insert(new Document(m));
}

Map fetch(String id) {
  Map m = null;
  c = collection.find(eq("id", id));
  if (c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}

The stored document: { "name" : "Joe D", "id" : "K1" }

Day 2: Add simple fields

m.put("name", "Joe D");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));

• Capturing title and hireDate is part of adding a new business feature

• It was pretty easy to add two fields to the structure

• …but now we have to change our persistence code

SQL Day 2 (changes in bold)

DDL: alter table contact add title varchar(8);
     alter table contact add hireDate date;

init() {
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  fetchStmt = connection.prepareStatement(
      "select id, name, title, hiredate from contact where id = ?");
}

save(Map m) {
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.setString(3, m.get("title"));
  contactInsertStmt.setDate(4, m.get("hireDate"));
  contactInsertStmt.execute();
}

Map fetch(String id) {
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.execute();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));
  }
  return m;
}

Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMSs are case-insensitive for column names, but code is case-sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt

MongoDB Day 2

save(Map m) {
  collection.insert(new Document(m));
}

Map fetch(String id) {
  Map m = null;
  c = collection.find(eq("id", id));
  if (c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}

Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in the database precisely match key names in the code layer, and match directly on name, not indirectly via positional offset
5. No technical debt is created

✔ NO CHANGE
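To make point 3 concrete, here is a minimal shell sketch (collection name assumed from the examples) showing a Day 1 document and a Day 2 document living side by side in the same collection, with no migration:

// Day 1 shape: just id and name
db.contact.insert( { "id" : "K1", "name" : "Joe D" } )

// Day 2 shape: two extra fields; no ALTER TABLE, no backfill required
db.contact.insert( { "id" : "K2", "name" : "Ann B", "title" : "Ms.", "hireDate" : new Date("2011-12-01") } )

// The old document is still perfectly readable; title and hireDate are simply absent
db.contact.find( { "id" : "K1" } )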

Day 3: Add list of phone numbers

m.put("name", "Joe D");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));

List list = new ArrayList();
Map n1 = new HashMap();
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
list.add(n1);
Map n2 = new HashMap();
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);

• It was still pretty easy to add this data to the structure
• ...but meanwhile, in the persistence code ...

REALLY brace yourself…

SQL Day 3 changes, Option 1: Assume just 1 work and 1 home phone number

DDL: alter table contact add work_phone varchar(16);
     alter table contact add home_phone varchar(16);

init() {
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )");
  fetchStmt = connection.prepareStatement(
      "select id, name, title, hiredate, work_phone, home_phone from contact where id = ?");
}

save(Map m) {
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.setString(3, m.get("title"));
  contactInsertStmt.setDate(4, m.get("hireDate"));
  for (Map onePhone : m.get("phones")) {
    String t = onePhone.get("type");
    String n = onePhone.get("number");
    if (t.equals("work")) {
      contactInsertStmt.setString(5, n);
    } else if (t.equals("home")) {
      contactInsertStmt.setString(6, n);
    }
  }
  contactInsertStmt.execute();
}

Map fetch(String id) {
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.execute();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));

    List list = new ArrayList();
    Map onePhone;
    onePhone = new HashMap();
    onePhone.put("type", "work");
    onePhone.put("number", rs.getString(5));
    list.add(onePhone);
    onePhone = new HashMap();
    onePhone.put("type", "home");
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);

    m.put("phones", list);
  }
  return m;
}

This is just plain bad….

SQL Day 3 changes, Option 2: Proper approach with multiple phone numbers

DDL: create table phones ( … )

init() {
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
      "insert into phones (id, type, number) values (?, ?, ?)");
  fetchStmt = connection.prepareStatement(
      "select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}

save(Map m) {
  startTrans();
  contactInsertStmt.setString(1, m.get("id"));
  contactInsertStmt.setString(2, m.get("name"));
  contactInsertStmt.setString(3, m.get("title"));
  contactInsertStmt.setDate(4, m.get("hireDate"));
  for (Map onePhone : m.get("phones")) {
    c2stmt.setString(1, m.get("id"));
    c2stmt.setString(2, onePhone.get("type"));
    c2stmt.setString(3, onePhone.get("number"));
    c2stmt.execute();
  }
  contactInsertStmt.execute();
  endTrans();
}

Map fetch(String id) {
  Map m = null;
  fetchStmt.setString(1, id);
  rs = fetchStmt.execute();
  int i = 0;
  List list = new ArrayList();
  while (rs.next()) {
    if (i == 0) {
      m = new HashMap();
      m.put("id", rs.getString(1));
      m.put("name", rs.getString(2));
      m.put("title", rs.getString(3));
      m.put("hireDate", rs.getDate(4));
      m.put("phones", list);
    }
    Map onePhone = new HashMap();
    onePhone.put("type", rs.getString(5));
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
    i++;
  }
  return m;
}

This took time and money

SQL Day 5: Zero or More Entries

Whoops! And it's also wrong! We did not design the query to account for contacts that have no phone number, so we have to change the join to an outer join:

init() {
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
      "insert into phones (id, type, number) values (?, ?, ?)");
  fetchStmt = connection.prepareStatement(
      "select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}

But this ALSO means we have to change the unwind logic:

while (rs.next()) {
  if (i == 0) {
    // ...
  }
  String s = rs.getString(5);
  if (s != null) {
    Map onePhone = new HashMap();
    onePhone.put("type", s);
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
  }
}

This took more time and money!

…but at least we have a DAL…right?

MongoDB Day 3

Advantages:
1. Zero time and money spent on overhead code
2. No need to fear fields that are "naturally occurring" lists containing data specific to the parent structure, which therefore do not benefit from normalization and referential integrity
3. Safe from "Zero or More" entities

save(Map m) {
  collection.insert(new Document(m));
}

Map fetch(String id) {
  Map m = null;
  c = collection.find(eq("id", id));
  if (c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}

✔ NO CHANGE
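Because the phone list is stored as a real array, querying inside it needs no extra table or join. A minimal shell sketch, reusing the DNC flag from the example document:

// Contacts with at least one work phone: matches any element of the array
db.contact.find( { "phones.type" : "work" } )

// Contacts with at least one home phone that is also on the Do-Not-Call list;
// $elemMatch requires both conditions to hold on the same array element
db.contact.find( { "phones" : { "$elemMatch" : { "type" : "home", "DNC" : true } } } )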

By Day 14, our structure looks like this:

Map n4 = new HashMap();
n4.put("geo", "US-EAST");
n4.put("startupApps", new String[] { "app1", "app2", "app3" });
list2.add(n4);

n4 = new HashMap();
n4.put("geo", "EMEA");
n4.put("startupApps", new String[] { "app6" });
n4.put("useLocalNumberFormats", false);
list2.add(n4);
m.put("preferences", list2);

Map n6 = new HashMap();
n6.put("optOut", true);
n6.put("assertDate", someDate);
seclist.add(n6);
m.put("attestations", seclist);

m.put("security", mapOfDataCreatedByExternalSource);

SQL Day 14

Error: Could not fit all the code into this space.

But very likely, among other things:

• n4.put("startupApps", new String[]{"app1","app2","app3"}); was implemented as a single semicolon-delimited string, or we had to create another table and change the DAL
• m.put("security", anotherMapOfData); was implemented by flattening it out and storing a subset of fields, or as a BLOB

MongoDB Day 14 – and every other day

Advantages:
1. Zero time and money spent on overhead code
2. Persistence is so easy, flexible, and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog

save(Map m) {
  collection.insert(new Document(m));
}

Map fetch(String id) {
  Map m = null;
  c = collection.find(eq("id", id));
  if (c.hasNext()) {
    m = (Map) c.next();
  }
  return m;
}

✔ NO CHANGE

But what if we must do a join?

Both the RDBMS and MongoDB will have a PhoneTransactions table/collection.

Contact:

{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    { type : "work",
      number : "1-800-555-1212" },
    { type : "home",
      number : "1-800-555-1313",
      DNC : true },
    { type : "home",
      number : "1-800-555-1414",
      DNC : true }
  ]
}

PhoneTransactions:

{ number: "1-800-555-1212", target: "1-999-238-3423", duration: 20 }
{ number: "1-800-555-1212", target: "1-444-785-6611", duration: 243 }
{ number: "1-800-555-1414", target: "1-645-331-4345", duration: 132 }
{ number: "1-800-555-1414", target: "1-990-875-2134", duration: 71 }

SQL Join Attempt #1

select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number

 id  | lname     | type | number         | target         | duration
-----+-----------+------+----------------+----------------+----------
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
 g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9

How to turn this into a list of names, each with a list of numbers, each of those with a list of target numbers?

SQL Unwind Attempt #1

Map idmap = new HashMap();
ResultSet rs = fetchStmt.execute();
while (rs.next()) {
  String id = rs.getString("id");
  String nmbr = rs.getString("number");
  Map snum;
  List tnum;
  if ((snum = (Map) idmap.get(id)) == null) {
    snum = new HashMap();
    idmap.put(id, snum);
  }
  if ((tnum = (List) snum.get(nmbr)) == null) {
    tnum = new ArrayList();
    snum.put(nmbr, tnum);
  }
  Map info = new HashMap();
  info.put("target", rs.getString("target"));
  info.put("duration", rs.getInt("duration"));
  tnum.add(info);
}
// idmap["g9"]["1-900-555-1212"] = ({target: "1-222-707-7070", duration: 23}, ...)

SQL Join Attempt #2

select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number
order by A.id, B.number

 id  | lname     | type | number         | target         | duration
-----+-----------+------+----------------+----------------+----------
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
 g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
 g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
 g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
 g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23

“Early bail out” from cursor is now possible – but logic to construct list of source and target numbers is similar

SQL is about Disassembly

String s = "select A, B, C, D, E, F from T1, T2, T3 " +
           "where T1.col = T2.col and T2.col2 = T3.col2 and X = Y and X2 != Y2 " +
           "and G > 10 and G < 100 and TO_DATE(' …";

ResultSet rs = execute(s);

while (rs.next()) {
  if (new column1 value from T1) { set up new Object1; }
  if (new column2 value from T2) { set up new Object2; }
  if (new column3 value from T3) { set up new Object3; }
  populate maps, lists and scalars;
}

Design a Big Query including business logic to grab all the data up front

Throw it at the engine

Disassemble Big Rectangle into usable objects with logic implicit in change in column values

MongoDB is about Assembly

DIY:

Cursor c = coll1.find({"X": "Y"});
while (c.hasNext()) {
  populate maps, lists and scalars;

  Cursor c2 = coll2.find(logic + key from c);
  while (c2.hasNext()) {
    populate maps, lists and scalars;

    Cursor c3 = coll3.find(logic + key from c2);
    while (c3.hasNext()) {
      populate maps, lists and scalars;
    }
  }
}

OR assemble usable objects incrementally with explicit calls to $lookup and $graphLookup.

MongoDB "Join"

db.contacts.aggregate([
  { $unwind: "$phones" },
  { $lookup: {
      from: "phonestx",
      localField: "phones.number",
      foreignField: "number",
      as: "TX"
  } }
]);

{
  "customer_id" : 1,
  "first_name" : "Mark",
  "last_name" : "Smith",
  "city" : "San Francisco",
  "phones" : {
    "type" : "home",
    "number" : "1-800-555-1414",
    "DNC" : true
  },
  "TX" : [
    { "number" : "1-800-555-1414",
      "target" : "1-645-331-4345",
      "duration" : 132 },
    { "number" : "1-800-555-1414",
      "target" : "1-990-875-2134",
      "duration" : 71 }
  ]
}
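$graphLookup, mentioned above, works the same way but follows references recursively. A minimal sketch, assuming a hypothetical manager_id field on each contact that points at another contact's id:

db.contacts.aggregate( [
    { "$graphLookup" : {
        "from"             : "contacts",        // walk the same collection
        "startWith"        : "$manager_id",     // begin from this contact's manager (hypothetical field)
        "connectFromField" : "manager_id",
        "connectToField"   : "id",
        "as"               : "managementChain"  // array of every contact found along the chain
    } }
] )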

But what about “real” queries?

• MongoDB query language is a physical map-of-map based structure, not a String

• Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps

• No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation

• Same paradigm to manipulate data is used to manipulate query expressions

• …which is also, by the way, the same paradigm for working with MongoDB metadata and explain()
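In the shell, for instance, a query expression is just a nested document that can be built up and passed around like any other data (collection and field names taken from the members example below):

// Build the expression piece by piece, exactly as you would build any document
var byCity   = { "member.city" : "Dublin" };
var byEvents = { "member.events_attended" : { "$gt" : 5 } };
var query    = { "$and" : [ byCity, byEvents ] };

db.members.find( query )             // run it
db.members.find( query ).explain()   // ...or ask the engine how it would run it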


Mongo Shell

JD10Gen:mugalyser jdrumgoole$ mongo
MongoDB shell version: 3.2.7
connecting to: test
MongoDB Enterprise > use MUGS
switched to db MUGS
MongoDB Enterprise > show collections
attendees
audit
groups
members
past_events
upcoming_events
MongoDB Enterprise > db.members.find( { "batchID" : 108, "member.member_name" : "Joe Drumgoole" } ).pretty()
{
    "_id" : ObjectId("58511bfbb26a8803b6b4d56c"),
    "member" : {
        "city" : "Dublin",
        "events_attended" : 13,
        "last_access_time" : ISODate("2016-12-13T15:45:27Z"),
        "country" : "Ireland",
        "member_id" : 99473492,
        "chapters" : [
            {
                "urlname" : "London-MongoDB-User-Group",
                "name" : "London MongoDB User Group",
                "id" : 1775736
…

MongoDB Query Examples

Find all contacts with at least one work phone:

SQL CLI:
  select * from contact A, phones B where A.did = B.did and B.type = 'work';

MongoDB CLI:
  db.contact.find( { "phones.type" : "work" } );

SQL in Java:
  String s = "select * from contact A, phones B where A.did = B.did and B.type = 'work'";
  ResultSet rs = execute(s);

MongoDB via Java driver:
  Cursor c = contact.find( eq( "phones.type", "work" ) );

MongoDB Query Examples

Find all contacts with at least one work phone or hired after 2014-02-02:

SQL:
  select A.did, A.lname, A.hiredate, B.type, B.number
  from contact A left outer join phones B on (B.did = A.did)
  where B.type = 'work' or A.hiredate > '2014-02-02'::date

Java:
  db.contacts.find( or( eq( "phones.type", "work" ),
                        gt( "hiredate", new Date( 2014, 2, 2 ) ) ) );

CLI:
  db.contacts.find( { $or : [ { "phones.type" : "work" },
                              { "hiredate" : { $gt : new Date("2014-02-02T00:00:00.000Z") } } ] } )

…and before you ask…

Yes, MongoDB query expressions support:
1. Sorting
2. Cursor size limit
3. Projection (asking for only parts of the rich shape to be returned)
4. Aggregation ("GROUP BY") functions
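A minimal shell sketch of the first three, using the contact shape from the earlier examples:

db.contact.find(
    { "phones.type" : "work" },                  // filter
    { "name" : 1, "hireDate" : 1, "_id" : 0 }    // projection: only parts of the rich shape
).sort( { "hireDate" : -1 } )                    // sorting
 .limit( 10 )                                    // cursor size limit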

Maybe even MORE powerful than SQL…?

> db.results.values.aggregate([
    { $match: { runnum: 23, timeSeriesPath: "CDSSpread.12M//1909468128" } },
    { $project: { timeSeriesPath: "$timeSeriesPath", values: foml } },
    { $unwind: { path: "$values", idx: "v_idx" } },
    { $match: { values: { $gt: 60 }, $or: [ { idx: 0 }, { idx: { $size: . . . } } ] } },
    { $group: { _id: { a: "$timeSeriesPath", b: "$idx" },
                n: { $sum: 1 },
                max: { $max: "$values" },
                min: { $min: "$values" },
                sdev: { $stdDevPop: "$values" } } },
    { $lookup: { from: "deskLimits", localField: "instID", foreignField: "instID", as: "inst" } },
    { $match: { maxDeskLimit: { $gt: { $cond: [ { $gt: [ 2, "$max" ] }, 2, "$max" ] } } } },
    { $group: { _id: "$deskID", total: { $sum: "$max" } } }
  ]);

What is an Aggregation Pipeline?

A sequence of stages that documents flow through: Match → Project → Join → Graph → Sort → View
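The same stage list is not only for ad-hoc queries; it can also be saved as a read-only view. A sketch, assuming MongoDB 3.4 or later and the contacts/phonestx collections used earlier in this deck:

// Create a view that always presents contacts joined to their call transactions
db.createView( "contactCalls", "contacts", [
    { "$unwind" : "$phones" },
    { "$lookup" : { "from" : "phonestx",
                    "localField" : "phones.number",
                    "foreignField" : "number",
                    "as" : "TX" } }
] )

// Query it like any other collection
db.contactCalls.find( { "phones.type" : "work" } )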


[Diagram: documents flowing through an aggregation pipeline, being filtered by $match, reshaped by $project, joined by $lookup, and summarized by $group at each stage.]

Aggregation Pipeline Stages

• $match: Filter documents
• $geoNear: Geospherical query
• $project: Reshape documents
• $lookup: Left-outer equi-joins
• $unwind: Expand documents
• $group: Summarize documents
• $sample: Randomly select a subset of documents
• $sort: Order documents
• $skip: Jump over a number of documents
• $limit: Limit the number of documents
• $redact: Restrict documents
• $out: Send results to a new collection
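Putting a few of these stages together, a minimal sketch against the members collection shown in the shell session earlier (field names taken from that document):

// How many members per country, and how many events have they attended in total?
db.members.aggregate( [
    { "$match" : { "batchID" : 108 } },                            // filter one batch
    { "$group" : { "_id"     : "$member.country",
                   "members" : { "$sum" : 1 },
                   "events"  : { "$sum" : "$member.events_attended" } } },
    { "$sort"  : { "members" : -1 } }                              // biggest countries first
] )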

The Fundamental Change with MongoDB

RDBMSs were designed in an era when:
• CPU and disk were slow and expensive
• Memory was VERY expensive
• Network? What network?
• Languages had limited means to dynamically reflect on their types
• Languages had poor support for richly structured types

Thus, the database had to:
• Act as combiner-coordinator of simpler types
• Define a rigid schema
• (Together with the code) optimize at compile time, not run time

In MongoDB, the data is the schema!

MongoDB and the Rich Map Ecosystem

Generic comparison of two records:

Map expr = new HashMap();
expr.put("myKey", "K1");
DBObject a = collection.findOne(expr);
expr.put("myKey", "K2");
DBObject b = collection.findOne(expr);
List<MapDiff.Difference> d = MapDiff.diff((Map) a, (Map) b);

Getting default values for a thing on a certain date and then overlaying user preferences (e.g. for a calculation run):

Map expr = new HashMap();
expr.put("myKey", "DEFAULT");
expr.put("createDate", new Date(2013, 11, 1));
DBObject a = collection.findOne(expr);
expr.clear();
expr.put("myKey", "user1");
DBObject b = otherCollectionPerhaps.findOne(expr);
MapStack s = new MapStack();
s.push((Map) a);
s.push((Map) b);
Map merged = s.project();

Runtime reflection of Maps and Lists enables powerful generic utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money.

Lastly: A CLI with teeth

Try a query and show the diagnostics:

> db.contact.find({"SeqNum": {"$gt": 10000}}).explain();
{
    "cursor" : "BasicCursor",
    "n" : 200000,
    // ...
    "millis" : 223
}

Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size, time):

> for(v=[],i=0;i<3;i++) {
...   n = i*50000;
...   expr = {"SeqNum": {"$gt": n}};
...   v.push( [n, db.contact.find(expr).explain().millis] );
... }

Let's see that vector:

> v
[ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ]

Use any other JavaScript you want inside the shell:

> load("jStat.js")
> jStat.stdev(v.map(function(p){ return p[1]; }))
2.0548046676563256

Party trick: save the explain() output back into a collection!

> for(i=0;i<3;i++) {
...   expr = {"SeqNum": {"$gt": i*1000}};
...   db.foo.insert(db.contact.find(expr).explain());
... }

And There is More: Compass and Atlas


What Does This Add Up To?

Relational: Expressive Query Language, Secondary Indexes, Strong Consistency
NoSQL: Flexibility, Scalability, Performance