Posted: 10-May-2015 | Category: Technology | Uploaded by: contextio
Migrating to MongoDB
Why we moved from MySQL to Mongo
Getting to know Mongo
Demo app using Mongo with PHP
Reasons we looked for an alternative to our RDBMS setup
Issues with our RDBMS setup
Our architecture was highly distributed, and the number of databases was becoming an issue
Storing similar objects with different structure
Options for scalability
Storing files
Many DBs
In a MySQL server (with MyISAM)...
1 database = 1 directory
1 table = 3 files (.frm, .MYD, .MYI) in the DB directory
The filesystem limits how many entries (e.g. subdirectories) a directory can hold, and that limit is not very large
We had a mix of MySQL and SQLite databases spread across a directory hierarchy
Many DBs
In a Mongo server ...
No 1:1 relation between databases and files
Stores data in a set of pre-allocated files of increasing size
The number of files grows as needed
Using many collections within a single database allowed us to move everything into the DB server
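The "pre-allocated files of increasing size" layout can be sketched with a little arithmetic. This assumes MongoDB's early (MMAPv1-era) defaults, where the first data file is 64 MB and each following file is twice as large up to a 2 GB cap; `dataFileSizes` is a hypothetical helper, and the sketch ignores the separate namespace file.

```javascript
// Sketch of the "files of increasing size" data layout.
// ASSUMPTION: first file 64 MB, each next file doubles, capped at 2 GB.
const MB = 1024 * 1024;
const FIRST_FILE = 64 * MB;
const MAX_FILE = 2048 * MB;

// Hypothetical helper: sizes of the data files needed to hold `dataBytes`.
function dataFileSizes(dataBytes) {
  const sizes = [];
  let capacity = 0;
  let next = FIRST_FILE;
  do {
    sizes.push(next);
    capacity += next;
    next = Math.min(next * 2, MAX_FILE); // double until the cap is reached
  } while (capacity < dataBytes);
  return sizes;
}

// 1 GB of data -> files of 64, 128, 256, 512 and 1024 MB
console.log(dataFileSizes(1024 * MB).map((s) => s / MB));
```

Under these assumptions the file count depends on total data volume (logarithmic at first, then linear past the 2 GB cap), not on how many collections or databases exist.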
A “collection”?
RDBMS model:
Database has tables which hold records
Records in a table all have the same structure
Document-oriented storage
Database has collections which hold documents
Objects with differing structure
For example, events where attributes vary based on type of event
Event A: from, att1
Event B: from, att1, att2
Event C: from, att3, att4
What’s your schema for this?
tbl_events_A
id from att1
1 Jim 1237
2 Dave 362
3 Bob 9283
tbl_events_B
id from att1 att2
1 Bill 2938 23
2 Jim 632 9
3 Hugh 12832 14
tbl_events_C
id from att3 att4
1 Bob hello 7249
2 Bill goodbye 23091
3 Jim testing 2334
tbl_events
id type from att1 att2 att3 att4
1 A Jim 1237 NULL NULL NULL
2 A Dave 362 NULL NULL NULL
3 B Bill 2938 23 NULL NULL
4 C Bob NULL NULL hello 7249
5 A Bob 9283 NULL NULL NULL
6 C Bill NULL NULL goodbye 23091
7 B Jim 632 9 NULL NULL
8 B Hugh 12832 14 NULL NULL
9 C Jim NULL NULL testing 2334
tbl_events
id type from attributes
1 A Jim '{"att1":1237}'
2 A Dave '{"att1":362}'
3 B Bill '{"att1":2938, "att2":23}'
4 C Bob '{"att3":"hello", "att4":7249}'
5 A Bob '{"att1":9283}'
6 C Bill '{"att3":"goodbye", "att4":23091}'
7 B Jim '{"att1":632, "att2":9}'
8 B Hugh '{"att1":12832, "att2":14}'
9 C Jim '{"att3":"testing", "att4":2334}'
tbl_events
id type from
1 A Jim
2 A Dave
3 B Bill
4 C Bob
5 A Bob
6 C Bill
7 B Jim
8 B Hugh
9 C Jim
tbl_events_attributes
id eventId name value
1 1 att1 1237
2 2 att1 362
3 3 att1 2938
4 3 att2 23
5 4 att3 hello
6 4 att4 7249
7 5 att1 9283
8 6 att3 goodbye
9 6 att4 23091
10 7 att1 632
11 7 att2 9
...
Objects with differing structure
Document-oriented storage
Mongo is schema-less
1 collection for all events
Each document has the structure applicable for its type
Can index common attributes for queries
events collection :
{id:1, type:'A', from:'Jim', att1:1237}
{id:2, type:'A', from:'Dave', att1:362}
{id:5, type:'A', from:'Bob', att1:9283}
{id:3, type:'B', from:'Bill', att1:2938, att2:23}
{id:7, type:'B', from:'Jim', att1:632, att2:9}
{id:8, type:'B', from:'Hugh', att1:12832, att2:14}
{id:4, type:'C', from:'Bob', att3:'hello', att4:7249}
{id:6, type:'C', from:'Bill', att3:'goodbye', att4:23091}
{id:9, type:'C', from:'Jim', att3:'testing', att4:2334}
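Going from the entity-attribute-value tables to one document per event is essentially a fold: each attribute row becomes a field on its event. The sketch below uses the first five events from the tables above; the helper name `toDocuments` is hypothetical, and values are typed directly rather than parsed from a single string-typed `value` column as a real EAV table would usually require.

```javascript
// Rows from the tbl_events / tbl_events_attributes example (events 1 to 5).
const events = [
  { id: 1, type: "A", from: "Jim" },
  { id: 2, type: "A", from: "Dave" },
  { id: 3, type: "B", from: "Bill" },
  { id: 4, type: "C", from: "Bob" },
  { id: 5, type: "A", from: "Bob" },
];
const attributes = [
  { eventId: 1, name: "att1", value: 1237 },
  { eventId: 2, name: "att1", value: 362 },
  { eventId: 3, name: "att1", value: 2938 },
  { eventId: 3, name: "att2", value: 23 },
  { eventId: 4, name: "att3", value: "hello" },
  { eventId: 4, name: "att4", value: 7249 },
  { eventId: 5, name: "att1", value: 9283 },
];

// Hypothetical helper: fold attribute rows into one document per event.
function toDocuments(events, attributes) {
  // Copy each event row so the input rows are not mutated.
  const byId = new Map(events.map((e) => [e.id, { ...e }]));
  for (const { eventId, name, value } of attributes) {
    const doc = byId.get(eventId);
    if (doc) doc[name] = value; // each attribute becomes a top-level field
  }
  return [...byId.values()];
}

console.log(toDocuments(events, attributes));
```

Each resulting document carries only the fields that apply to its type, which is exactly the shape stored in the single events collection.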
Options for scalability
MySQL - Master-slave replication
Mongo - Supports master-slave, replica pairs, master-master and ... auto-sharding
Storing files
In MySQL, you can use a table with a BLOB field plus other fields for file metadata
Mongo has GridFS
Built for storage of large objects
Files are split into chunks; metadata is stored alongside them
> db.fs.files.findOne();
{
    "_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "filename" : "user.png",
    "length" : 43717,
    "chunkSize" : 262144,
    "uploadDate" : "Mon Mar 08 2010 11:25:45 GMT-0500 (EST)",
    "md5" : "3f6fcd4c0a51655d392fe95a99c29140",
    "mimeType" : "image/png"
}
> db.fs.chunks.findOne();
{
    "_id" : ObjectId("4b952509c568bb9fc8e3cddb"),
    "files_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "n" : 0,
    "data" : BinData type: 2 len: 43721
}
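The chunking shown in the fs.chunks document can be sketched in plain JavaScript: a file's bytes are cut into pieces of `chunkSize`, each tagged with the file id and a sequence number `n`, and reassembled by concatenating in order of `n`. This is an illustration of the idea only, not a GridFS driver; `toChunks`, `fromChunks` and the string ids are hypothetical stand-ins for real ObjectIds and driver calls.

```javascript
// Sketch: split a byte buffer into GridFS-style chunk documents.
const CHUNK_SIZE = 262144; // 256 KB, the chunkSize in the fs.files example

// Hypothetical helper: cut `bytes` into { files_id, n, data } objects.
function toChunks(bytes, filesId, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0; n * chunkSize < bytes.length; n++) {
    chunks.push({
      files_id: filesId,
      n, // chunk sequence number, starting at 0
      data: bytes.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return chunks;
}

// Hypothetical helper: reassemble by concatenating chunks in order of n.
function fromChunks(chunks) {
  const sorted = [...chunks].sort((a, b) => a.n - b.n);
  const total = sorted.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of sorted) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

const png = new Uint8Array(43717); // same length as user.png above
console.log(toChunks(png, "user-png-id").length); // 1 chunk, n = 0

const big = new Uint8Array(600000);
console.log(toChunks(big, "big-file-id").map((c) => c.data.length));
```

A 43,717-byte file fits in a single chunk, which matches the one `"n" : 0` document shown for user.png; a 600,000-byte file would need three chunks (262144, 262144 and 75712 bytes).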
Getting to know MongoDB
Basic concepts
A database has collections which hold documents
Documents in a collection can have any structure
Documents are JSON objects, stored as BSON
Data types:
All basic JSON types: string, number (integer or double), boolean, null, array, object
Special types: date, object id, binary, regexp, code
Important differences
Collections instead of tables
ObjectID instead of primary keys
References instead of foreign keys
JavaScript code execution instead of stored procedures
Nothing (NULL) instead of joins: related data is embedded or fetched with a separate query
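One practical difference between an ObjectId and an auto-increment primary key is that the ObjectId carries its own creation time: its first 4 bytes (8 hex characters) are a big-endian count of seconds since the Unix epoch. The sketch below reads that timestamp from a hex string; `objectIdTimestamp` is a hypothetical helper (current mongo shells expose the same information as `ObjectId(...).getTimestamp()`).

```javascript
// Sketch: read the creation time embedded in an ObjectId hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("not a 24-character hex ObjectId");
  }
  // First 4 bytes = seconds since 1970-01-01T00:00:00Z, big-endian.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The fs.files ObjectId from the GridFS example
console.log(objectIdTimestamp("4b9525096b00bd59b95f791f").toISOString());
// → 2010-03-08T16:25:45.000Z
```

The decoded time is 11:25:45 EST on 8 March 2010, the same moment as the `uploadDate` stored in that GridFS document.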
Inserting data
> doc = {
      author: 'joe',
      created: new Date('2009-03-28'),
      title: 'Yet another blog post',
      text: 'Here is the text...',
      tags: ['example', 'joe'],
      comments: [
          { author: 'jim', comment: 'I disagree' },
          { author: 'nancy', comment: 'Good post' }
      ]
  }
> db.posts.insert(doc);
Querying data
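> db.posts.find(); // all posts
> db.posts.find({'author': 'joe'}); // posts by joe
> db.posts.find({'comments.author': 'nancy'}); // posts with a comment by nancy
> db.posts.find({'comments.comment': /disagree/i}); // regex match inside embedded comments
> db.posts.findOne({'comments.author': 'nancy'}); // first match only
> db.posts.find({'comments.author': 'nancy'}).limit(5); // at most 5 results
> db.posts.find({}, {'author': true, 'tags': true}); // return only author and tags (plus _id)
> db.posts.find({'author': 'nancy'}).sort({'created': 1}); // oldest first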
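The dotted paths in these queries reach into embedded documents and arrays: `'comments.author'` matches a post if any element of its `comments` array has that author, and a plain array field such as `tags` matches if it contains the value. The sketch below is a simplified, equality-only matcher that illustrates the idea; `valuesAtPath` and `matchesPath` are hypothetical helpers, and the real server supports much more (operators, regexes, numeric array indexes, which this sketch does not handle).

```javascript
// Collect every value reachable by following `parts`, flattening arrays.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    // Arrays are traversed element by element.
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Hypothetical helper: does `doc` match {path: expected}?
function matchesPath(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some((v) =>
    Array.isArray(v) ? v.includes(expected) : v === expected
  );
}

const post = {
  author: "joe",
  tags: ["example", "joe"],
  comments: [
    { author: "jim", comment: "I disagree" },
    { author: "nancy", comment: "Good post" },
  ],
};

console.log(matchesPath(post, "comments.author", "nancy")); // true
console.log(matchesPath(post, "comment.author", "nancy")); // false
```

A singular path such as `comment.author` matches nothing here because the field is named `comments`; the server likewise returns no documents for a path that does not exist rather than raising an error.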
Querying - advanced features
Support for OR conditions
$ operators to express conditions
> db.posts.find({timestamp: {$gte:1268149684}});
$where modifiers
> db.pictures.find({$where: function() { return (this.creationTimestamp >= 1268149684) }})
MapReduce
Server-side code execution
> function getUniques() {
...     var uniques = [];
...     db.pictures.find({}, {tags: true}).forEach(function(pic) {
...         // pictures without a tags field are skipped
...         (pic.tags || []).forEach(function(tag) {
...             if (uniques.indexOf(tag) == -1) uniques.push(tag);
...         });
...     });
...     return uniques;
... }
> db.eval(getUniques);
[
    "firstTag",
    "thirdTag",
    "toto",
    "test",
    "comic",
    "secondTag"
]
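The MapReduce pattern mentioned above can be illustrated without a server: a map function emits key/value pairs, values are grouped by key, and a reduce function combines each group. The sketch below counts tags across pictures, which also yields the unique tags that `getUniques` collects. It is a plain-JavaScript illustration of the pattern, not MongoDB's `mapReduce` API; `mapReduce` and the picture data are hypothetical.

```javascript
// Plain-JS illustration of map/reduce (not MongoDB's mapReduce command).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit); // map phase: emit (key, value)
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values); // reduce phase
  return result;
}

// Hypothetical pictures with tags.
const pictures = [
  { title: "a", tags: ["comic", "test"] },
  { title: "b", tags: ["test", "toto"] },
  { title: "c", tags: ["comic", "test"] },
];

const tagCounts = mapReduce(
  pictures,
  (pic, emit) => (pic.tags || []).forEach((tag) => emit(tag, 1)),
  (tag, counts) => counts.reduce((a, b) => a + b, 0)
);
console.log(tagCounts); // { comic: 2, test: 3, toto: 1 }
console.log(Object.keys(tagCounts)); // the unique tags, as getUniques() returns
```

In MongoDB's own mapReduce, reduce may be called again on partial results, so it must return a value of the same shape as the emitted values; a running sum satisfies that.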
Updating data
update( criteria, objNew, upsert, multi )
> db.myColl.update( { name: "Joe" }, { name: "Joe", age: 20 }, true, false );
save(object) - inserts the object, or updates the existing document if one with the same _id exists
Update modifier operators
$inc, $set, $unset, $push, $pushAll, $addToSet, $pop, $pull, $pullAll
> db.myColl.update({name:"Joe"}, { $set:{age:20}});
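> db.posts.update({author: 'Joe'}, {$push: {tags: 'hockey'}}); // appends 'hockey' even if already present
> db.posts.update({}, {$addToSet: {tags: 'hockey'}}); // adds 'hockey' only if absent; without multi, only the first matching post is updated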
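The difference between replacing a document and applying modifiers can be sketched in plain JavaScript: an update object without `$` operators replaces the whole document (keeping its `_id`), while `$set`, `$inc`, `$unset`, `$push` and `$addToSet` change individual fields. `applyUpdate` is a hypothetical helper that handles top-level fields only and compares array elements by `===`, so it is an approximation of the server's behavior, not its implementation.

```javascript
// Hypothetical helper: apply a Mongo-style update object to one document.
// Returns a new object; the input document is not modified.
function applyUpdate(doc, update) {
  const isModifierUpdate = Object.keys(update).some((k) => k.startsWith("$"));
  if (!isModifierUpdate) {
    // Full replacement: only the _id survives from the old document.
    return doc._id === undefined ? { ...update } : { _id: doc._id, ...update };
  }
  const out = { ...doc };
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set": out[field] = arg; break;
        case "$unset": delete out[field]; break;
        case "$inc": out[field] = (out[field] ?? 0) + arg; break;
        case "$push": out[field] = [...(out[field] ?? []), arg]; break;
        case "$addToSet": {
          const current = out[field] ?? [];
          out[field] = current.includes(arg) ? current : [...current, arg];
          break;
        }
        default: throw new Error(`unsupported operator ${op}`);
      }
    }
  }
  return out;
}

const joe = { _id: 1, name: "Joe", tags: ["hockey"] };
console.log(applyUpdate(joe, { $set: { age: 20 } }));
console.log(applyUpdate(joe, { $push: { tags: "hockey" } }).tags); // [ 'hockey', 'hockey' ]
console.log(applyUpdate(joe, { $addToSet: { tags: "hockey" } }).tags); // [ 'hockey' ]
```

The last two calls show why `$addToSet` is the safer choice for tag-like fields: repeating it leaves the array unchanged, while repeating `$push` keeps appending.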
Removing data
> db.things.remove({}); // removes all
> db.things.remove({n: 1}); // removes all where n == 1
> db.things.remove({_id: myobject._id}); // removes one document by _id
References
> p = db.postings.findOne();
{
    "_id" : ObjectId("4b866f08234ae01d21d89604"),
    "author" : "jim",
    "title" : "Brewing Methods"
}
> // get more info on author
> db.users.findOne( { _id : p.author } )
{ "_id" : "jim", "email" : "[email protected]" }
> x = { name : 'Biology' }
{ "name" : "Biology" }
> db.courses.save(x)
> x
{ "name" : "Biology", "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }
> stu = { name : 'Joe', classes : [ new DBRef('courses', x._id) ] }
> db.students.save(stu)
> stu
{
    "name" : "Joe",
    "classes" : [
        { "$ref" : "courses", "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }
    ],
    "_id" : ObjectId("4b0552e4f0da7d1eb6f126a2")
}
> stu.classes[0]
{ "$ref" : "courses", "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }
> stu.classes[0].fetch()
{ "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1"), "name" : "Biology" }
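Both reference styles are resolved by the client with a second lookup rather than a join: a manual reference needs the application to know which collection the id points into, while a DBRef-style `{ $ref, $id }` object carries the collection name with it. The sketch below resolves both against in-memory arrays standing in for collections; `db`, `findById`, `fetchRef` and the sample ids are hypothetical.

```javascript
// In-memory stand-ins for collections (hypothetical data).
const db = {
  users: [{ _id: "jim", name: "Jim" }],
  courses: [{ _id: "c1", name: "Biology" }],
};

// Hypothetical helper: second "query" by _id; null when nothing matches.
function findById(collection, id) {
  return (db[collection] || []).find((doc) => doc._id === id) ?? null;
}

// Manual reference: the code knows postings.author points at users._id.
const posting = { _id: "p1", author: "jim", title: "Brewing Methods" };
console.log(findById("users", posting.author)); // { _id: 'jim', name: 'Jim' }

// DBRef-style reference: the target collection travels with the id.
function fetchRef(ref) {
  return findById(ref.$ref, ref.$id);
}
const student = { name: "Joe", classes: [{ $ref: "courses", $id: "c1" }] };
console.log(fetchRef(student.classes[0])); // { _id: 'c1', name: 'Biology' }
```

Either way each resolved reference costs an extra round trip, which is one reason data that is always read together (such as a post's comments) tends to be embedded instead.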
Limitations to keep in mind
Namespace limit (about 24,000 collections and indexes per database)
Database size is limited to about 2 GB on 32-bit systems... use a 64-bit system in production!
Licensing
MongoDB is licensed under the GNU AGPL 3.0; the supported drivers are under the Apache License v2.0
From www.mongodb.org/display/DOCS/Licensing:
If you are using a vanilla MongoDB server from either source or binary packages you have NO obligations. You can ignore the rest of this page.
Hands-on example
SQL schema
pictures: pictureId int, title varchar, creationTimestamp int, content blob
users: userId int, name varchar
comments: pictureId int, userId int, txt varchar, creationTimestamp int
tags: pictureId int, tag varchar
Let's see some code...
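One possible Mongo shape for this schema embeds each picture's tags and comments in the picture document and keeps users in their own collection, referenced by `userId`. The sketch below builds such a document from SQL-style rows. It is an assumption for illustration, not the demo app's code: `toPictureDocument` and the sample rows are hypothetical, and the `content` blob is left out on the assumption that it would be stored in GridFS.

```javascript
// Hypothetical helper: fold rows from pictures, tags and comments into one document.
function toPictureDocument(picture, tagRows, commentRows) {
  return {
    _id: picture.pictureId,
    title: picture.title,
    creationTimestamp: picture.creationTimestamp,
    // tags table -> array of strings
    tags: tagRows
      .filter((t) => t.pictureId === picture.pictureId)
      .map((t) => t.tag),
    // comments table -> array of embedded objects (pictureId is now implicit)
    comments: commentRows
      .filter((c) => c.pictureId === picture.pictureId)
      .map(({ userId, txt, creationTimestamp }) => ({ userId, txt, creationTimestamp })),
  };
}

// Hypothetical rows
const picture = { pictureId: 1, title: "Sunset", creationTimestamp: 1268149684 };
const tagRows = [
  { pictureId: 1, tag: "comic" },
  { pictureId: 2, tag: "test" },
  { pictureId: 1, tag: "toto" },
];
const commentRows = [
  { pictureId: 1, userId: 7, txt: "Nice", creationTimestamp: 1268149700 },
];

console.log(JSON.stringify(toPictureDocument(picture, tagRows, commentRows), null, 2));
```

With this shape, listing a picture with its tags and comments is a single read, and the `tags` array can be queried directly, as in the `getUniques` example.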