
Post on 18-Jan-2018


© 2011 Geeknet Inc

Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Rick Copeland
@rick446
rick@geek.net


Getting Acquainted

http://www.flickr.com/photos/fazen/9079179/

- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


PyMongo: Getting Started

>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
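The attribute and item access shown on this slide can be pictured with a small pure-Python stand-in. This is only a sketch of the idea: `FakeDatabase` and `FakeCollection` are hypothetical names invented here, not PyMongo classes, and PyMongo's real implementation may differ.

```python
class FakeCollection(object):
    """Hypothetical stand-in for a collection handle (not a PyMongo class)."""

    def __init__(self, database, name):
        self.database = database
        self.name = name

    def __getattr__(self, sub_name):
        # Called only when normal lookup fails, so 'database' and 'name' still work.
        if sub_name.startswith('_'):
            raise AttributeError(sub_name)
        return FakeCollection(self.database, self.name + '.' + sub_name)

    def __getitem__(self, sub_name):
        return FakeCollection(self.database, self.name + '.' + sub_name)


class FakeDatabase(object):
    """Hypothetical stand-in for a database handle (not a PyMongo class)."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, collection_name):
        if collection_name.startswith('_'):
            raise AttributeError(collection_name)
        return FakeCollection(self, collection_name)

    def __getitem__(self, collection_name):
        # Item access accepts names that are not valid Python identifiers.
        return FakeCollection(self, collection_name)


dotted = FakeDatabase('test').foo.bar.baz
hyphenated = FakeDatabase('test-db')['foo-collection']
print(dotted.name)      # foo.bar.baz
print(hyphenated.name)  # foo-collection
```

In this sketch, item access is the only way to reach a name such as 'foo-collection' or one that collides with a real attribute (for example 'name').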


PyMongo: Insert / Update / Delete

>>> db = conn.test
>>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
>>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
>>> db.foo.remove({'_id': id})
>>> list(db.foo.find())
[]
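The reason the update on this slide uses '$set' can be seen in a pure-Python sketch that runs without a server. `apply_update` is a hypothetical helper written for illustration; it handles only top-level '$set' fields and whole-document replacement, which is a small subset of what MongoDB supports.

```python
import copy


def apply_update(document, update):
    """Return the document that results from applying `update` (sketch only)."""
    if '$set' in update:
        # Operator update: change the named fields, keep everything else.
        result = copy.deepcopy(document)
        result.update(copy.deepcopy(update['$set']))
        return result
    # No operator: the update document replaces everything except _id.
    replacement = copy.deepcopy(update)
    replacement['_id'] = document['_id']
    return replacement


stored = {'_id': 1, 'bar': 1, 'baz': [1, 2, {'k': 5}]}
with_set = apply_update(stored, {'$set': {'bar': 2}})
replaced = apply_update(stored, {'bar': 2})
print(with_set)   # 'baz' survives
print(replaced)   # 'baz' is gone
```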


PyMongo: Queries, Indexes

>>> db.foo.insert([dict(x=x) for x in range(10)])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({'x': {'$gt': 3}}))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}).skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
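To make the query on this slide concrete without a running mongod, here is a minimal in-memory sketch of '$gt' matching, '_id' exclusion, and skip/limit. The helpers `matches`, `find`, and `index_name` are hypothetical and support only what the slide uses; `index_name` assumes the index name is simply each field joined with its direction (1 for ascending, -1 for descending), which is what the returned 'x_1_y_-1' suggests.

```python
from itertools import islice


def matches(document, query):
    """True if `document` satisfies `query` (equality and '$gt' only)."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator != '$gt':
                    raise ValueError('unsupported operator: %s' % operator)
                if value is None or not value > operand:
                    return False
        elif value != condition:
            return False
    return True


def find(documents, query, skip=0, limit=None, exclude_id=False):
    """Yield matching documents, applying skip/limit after filtering."""
    hits = (doc for doc in documents if matches(doc, query))
    stop = None if limit is None else skip + limit
    for doc in islice(hits, skip, stop):
        if exclude_id:
            doc = dict((k, v) for k, v in doc.items() if k != '_id')
        yield doc


def index_name(keys):
    """Join (field, direction) pairs into a name such as 'x_1_y_-1'."""
    return '_'.join('%s_%s' % (field, direction) for field, direction in keys)


docs = [{'_id': i, 'x': i} for i in range(10)]
page = list(find(docs, {'x': {'$gt': 3}}, skip=1, limit=2, exclude_id=True))
print(page)                               # [{'x': 5}, {'x': 6}]
print(index_name([('x', 1), ('y', -1)]))  # x_1_y_-1
```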


PyMongo and Locking

One Rule (for now): Avoid JavaScript

http://www.flickr.com/photos/lizjones/295567490/


PyMongo: Aggregation et al.

- You gotta write JavaScript (for now)
- It's pretty slow (single-threaded JS engine)
- JavaScript is used by
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, …)
- If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly '$where'). Otherwise you're single-threaded.
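The shape of a map/reduce/finalize job can be sketched in plain Python. This is not how MongoDB executes it (there the functions are JavaScript and run inside mongod); `map_reduce` below is a hypothetical helper that only shows the data flow. One simplification to note: the sketch calls reduce exactly once per key, whereas MongoDB may call it several times for a key or skip it for single-value keys, so real reduce functions need to tolerate that.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn, finalize_fn=None):
    """Group emitted (key, value) pairs, then reduce each group (sketch)."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    results = {}
    for key, values in emitted.items():
        reduced = reduce_fn(key, values)
        results[key] = finalize_fn(key, reduced) if finalize_fn else reduced
    return results


def map_parity(doc):
    # Emit the value of x under the key 'even' or 'odd'.
    yield ('even' if doc['x'] % 2 == 0 else 'odd', doc['x'])


def reduce_sum(key, values):
    return sum(values)


docs = [{'x': x} for x in range(10)]
totals = map_reduce(docs, map_parity, reduce_sum)
print(totals['even'], totals['odd'])  # 20 25
```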


PyMongo: GridFS

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x2cae910>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'

- Arbitrary data can be stored in the 'fp' object; it's just a Document (but please put it in 'fp.metadata')
  - Mime type
  - Filename
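GridFS keeps one document describing the file and a series of chunk documents holding the bytes. The sketch below imitates that layout in memory; the helper names, the tiny chunk size, and the 'owner' metadata value are assumptions made for illustration, not values taken from the talk.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Cut `data` into numbered chunk documents that point back at the file."""
    chunks = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({'files_id': file_id,
                       'n': n,
                       'data': data[start:start + chunk_size]})
    return chunks


def reassemble(chunks):
    """Join chunk payloads back together in chunk-number order."""
    ordered = sorted(chunks, key=lambda chunk: chunk['n'])
    return b''.join(chunk['data'] for chunk in ordered)


payload = b'The file'
# The file document carries the descriptive fields; custom data goes under
# 'metadata' (the 'owner' value here is purely illustrative).
file_doc = {'_id': 1,
            'filename': 'foo.txt',
            'length': len(payload),
            'metadata': {'owner': 'rick'}}
chunks = split_into_chunks(file_doc['_id'], payload, chunk_size=3)
print(len(chunks))         # 3
print(reassemble(chunks))  # b'The file'
```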


PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
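The versioning behaviour on this slide (several files sharing one filename, addressed by position) can be replayed with an in-memory stand-in. `VersionedStore` is a hypothetical class that mirrors only the calls shown above; it is not the gridfs implementation.

```python
class VersionedStore(object):
    """Hypothetical in-memory store that keeps every upload of a filename."""

    def __init__(self):
        self._files = []   # file documents in upload order
        self._next_id = 1

    def put(self, data, filename):
        file_id = self._next_id
        self._next_id += 1
        self._files.append({'_id': file_id, 'filename': filename, 'data': data})
        return file_id

    def get_version(self, filename, version=-1):
        # 0 is the oldest upload, -1 the newest, -2 the one before that.
        versions = [f for f in self._files if f['filename'] == filename]
        return versions[version]

    def get_last_version(self, filename):
        return self.get_version(filename, -1)

    def delete(self, file_id):
        self._files = [f for f in self._files if f['_id'] != file_id]

    def list(self):
        return sorted(set(f['filename'] for f in self._files))


store = VersionedStore()
store.put('Moar data!', filename='foo.txt')
store.put('Even moar data!', filename='foo.txt')
previous = store.get_version('foo.txt', -2)['data']
store.delete(store.get_last_version('foo.txt')['_id'])
names_after_one_delete = store.list()
store.delete(store.get_last_version('foo.txt')['_id'])
names_after_two_deletes = store.list()
print(previous, names_after_one_delete, names_after_two_deletes)
```

Deleting the newest version exposes the previous one, which is why the filename stays listed until every version is gone.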


- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


Why Ming?

- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a "migration"
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- "Unit of work": queuing up all your updates can be handy
- Python dicts are nice; objects are nicer


Ming: Engines & Sessions

>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017', 'ming.main.database': 'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')
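The dotted keys passed to ming.configure() read as "ming.<session name>.<option>". A plausible way to group such flat settings by session name is sketched below; `group_settings` is a hypothetical helper, and Ming's own parsing may well differ.

```python
def group_settings(settings, prefix='ming.'):
    """Group flat 'ming.<name>.<option>' keys into {name: {option: value}}."""
    grouped = {}
    for key, value in settings.items():
        if not key.startswith(prefix):
            continue  # ignore settings that belong to something else
        name, option = key[len(prefix):].split('.', 1)
        grouped.setdefault(name, {})[option] = value
    return grouped


flat = {
    'ming.main.master': 'mongodb://localhost:27017',
    'ming.main.database': 'test',
    'debug': 'true',
}
by_name = group_settings(flat)
print(by_name['main']['database'])  # test
```

Flat keys like these fit naturally in an application config file, which is one reason to look sessions up by name rather than pass DataStore objects around.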


Surprising Data

http://www.flickr.com/photos/pictureclara/5333266789/


Ming: Define Your Schema

from ming import collection, schema, Field

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))


Ming: Define Your Schema… Once more, with feeling

from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = ['title']
    title = Field(str)
    text = Field(str)

- Old declarative syntax continues to exist and be supported, but it's not being actively improved
- Sometimes nice when you want additional methods/attrs on your document class


Ming: Use Your Schema

>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])
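The "Extra keys" failure above is the kind of check a schema layer performs before anything reaches the database. Here is a minimal sketch of that idea in plain Python; `Invalid`, `validate`, and `WIKI_SCHEMA` are defined locally for illustration and are not Ming's classes, and filling missing fields with None is an assumption of this sketch.

```python
class Invalid(Exception):
    """Raised when a document does not fit the schema (local to this sketch)."""


def validate(schema, document):
    """Return a validated copy of `document`, or raise Invalid."""
    extra = set(document) - set(schema)
    if extra:
        raise Invalid('Extra keys: %s' % sorted(extra))
    validated = {}
    for field, expected_type in schema.items():
        value = document.get(field)  # missing fields become None here
        if value is not None and not isinstance(value, expected_type):
            raise Invalid('%s: expected %s' % (field, expected_type.__name__))
        validated[field] = value
    return validated


WIKI_SCHEMA = {'_id': object, 'title': str, 'text': str}

good = validate(WIKI_SCHEMA, {'title': 'Cats', 'text': 'I can haz cheezburger?'})
try:
    validate(WIKI_SCHEMA, {'tietul': 'LOL', 'text': 'Invisible bicycle'})
    error_message = None
except Invalid as exc:
    error_message = str(exc)
print(error_message)  # Extra keys: ['tietul']
```

Catching the misspelled 'tietul' at save time is the point: without a schema, the typo would be stored silently as a new field.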


Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)

- MongoDB is (generally) fast
  - … except when creating databases
  - … particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling


- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


Ming ORM: Classes and Collections

from ming import collection, schema, Field
from ming.orm import (mapper, Mapper, RelationProperty,
                      ForeignIdProperty)

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))


Ming ORM: Classes and Collections (declarative)

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')

class Comment(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']

    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)
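What a foreign id plus a relation buys you can be shown with plain dicts: the comment stores only the page's _id, and the two directions of the relation are lookups on that field. The data and the helpers `page_of` and `comments_of` are invented for this sketch; Ming resolves relations through its session rather than through module-level lists.

```python
# Two in-memory "collections"; each comment stores only its page's _id.
pages = [{'_id': 1, 'title': 'Cats'}, {'_id': 2, 'title': 'Dogs'}]
comments = [
    {'_id': 10, 'page_id': 1, 'text': 'LOL'},
    {'_id': 11, 'page_id': 1, 'text': 'Invisible bicycle'},
    {'_id': 12, 'page_id': 2, 'text': 'Woof'},
]


def page_of(comment):
    """Follow the foreign id from a comment to its page (many-to-one)."""
    for page in pages:
        if page['_id'] == comment['page_id']:
            return page
    return None


def comments_of(page):
    """Collect the comments whose foreign id points at this page (one-to-many)."""
    return [c for c in comments if c['page_id'] == page['_id']]


cat_comments = [c['text'] for c in comments_of(pages[0])]
print(cat_comments)                   # ['LOL', 'Invisible bicycle']
print(page_of(comments[2])['title'])  # Dogs
```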


Ming ORM: Sessions and Queries

- Session → ORMSession
- My_collection.m… → My_mapped_class.query…
- ORMSession actually does stuff
  - Track object identity
  - Track object modifications
  - Unit of work: flushing all changes at once

>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1
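The count going from 0 to 1 only after flush() is the unit-of-work pattern: new objects are queued in the session and written together. A minimal sketch, assuming a dict of lists as the "database"; `UnitOfWorkSession` is a hypothetical class, not Ming's ORMSession, and it tracks only new objects, not modifications.

```python
class UnitOfWorkSession(object):
    """Hypothetical session that queues new documents until flush()."""

    def __init__(self, store):
        self.store = store  # dict: collection name -> list of documents
        self._new = []

    def add(self, collection, document):
        # Nothing is written yet; the document is only remembered.
        self._new.append((collection, document))

    def flush(self):
        # Write every queued document in one pass, then clear the queue.
        for collection, document in self._new:
            self.store.setdefault(collection, []).append(dict(document))
        self._new = []


store = {}
orm_session = UnitOfWorkSession(store)
orm_session.add('wiki_page', {'title': 'MyPage', 'text': 'is here'})
count_before = len(store.get('wiki_page', []))
orm_session.flush()
count_after = len(store.get('wiki_page', []))
print(count_before, count_after)  # 0 1
```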


Ming Plugins

http://www.flickr.com/photos/39747297@N05/5229733647/


Ming ORM: Extending the Session

- Various plug points in the session
  - before_flush
  - after_flush
- Some uses
  - Logging changes to sensitive data or for analytics purposes
  - Full-text search indexing
  - "last modified" fields
  - Performance instrumentation
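The "last modified" use case can be sketched with a toy session that calls before_flush and after_flush on each registered extension. Only the two hook names come from the slide; `FlushingSession`, `LastModifiedStamp`, and the injected `clock` are hypothetical, and Ming's actual extension interface may take different arguments.

```python
class FlushingSession(object):
    """Hypothetical session exposing before_flush/after_flush plug points."""

    def __init__(self, extensions=()):
        self.extensions = list(extensions)
        self.pending = []
        self.saved = []

    def add(self, document):
        self.pending.append(document)

    def flush(self):
        for extension in self.extensions:
            extension.before_flush(self)
        self.saved.extend(self.pending)
        self.pending = []
        for extension in self.extensions:
            extension.after_flush(self)


class LastModifiedStamp(object):
    """Stamps every pending document just before it is written."""

    def __init__(self, clock):
        self.clock = clock  # injected so the example is deterministic
        self.flush_count = 0

    def before_flush(self, session):
        now = self.clock()
        for document in session.pending:
            document['last_modified'] = now

    def after_flush(self, session):
        self.flush_count += 1  # e.g. a place for performance instrumentation


stamp = LastModifiedStamp(clock=lambda: 1316000000)
hooked_session = FlushingSession(extensions=[stamp])
hooked_session.add({'title': 'Cats'})
hooked_session.flush()
print(hooked_session.saved)
```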


Ming ORM: Extending the Mapper

- Various plug points in the mapper
  - before_/after_:
    - Insert
    - Update
    - Delete
    - Remove
- Some uses
  - Collection/model-specific logging (user creation, etc.)
  - Anything you might want a SessionExtension for but would rather do per-model
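Per-model hooks differ from session hooks only in scope: each model gets its own list of extensions. The sketch below logs user creation for one model and leaves another untouched. `TinyMapper` and `UserCreationLog` are hypothetical names; the hook names follow the before_/after_ pattern on the slide, and Ming's real signatures may differ.

```python
class TinyMapper(object):
    """Hypothetical per-model mapper that fires insert hooks."""

    def __init__(self, extensions=()):
        self.extensions = list(extensions)
        self.rows = []

    def _call(self, hook_name, instance):
        # Extensions implement only the hooks they care about.
        for extension in self.extensions:
            hook = getattr(extension, hook_name, None)
            if hook is not None:
                hook(instance)

    def insert(self, instance):
        self._call('before_insert', instance)
        self.rows.append(instance)
        self._call('after_insert', instance)


class UserCreationLog(object):
    def __init__(self):
        self.events = []

    def after_insert(self, instance):
        self.events.append('created user %s' % instance['name'])


log = UserCreationLog()
mappers = {'user': TinyMapper([log]), 'wiki_page': TinyMapper()}
mappers['user'].insert({'name': 'rick446'})
mappers['wiki_page'].insert({'title': 'Cats'})
print(log.events)  # ['created user rick446']
```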

Related Projects

- Ming: http://sf.net/projects/merciless/ (MIT License)
- Zarkov: http://sf.net/p/zarkov/ (Apache License)
- Allura: http://sf.net/p/allura/ (Apache License)
- PyMongo: http://api.mongodb.org/python (Apache License)


Rick Copeland
@rick446
rick@geek.net

http://www.flickr.com/photos/f-oxymoron/5005673112/