PyConUK2013 - Validated documents on MongoDB with Ming

Post on 27-Jan-2015

description

Ming is a SQLAlchemy-inspired object-document mapper (ODM) for MongoDB, developed at SourceForge and used by the TurboGears2 web framework to provide MongoDB support. After a short introduction to the basic Ming layer, we will cover the Ming Object Document Mapper layer and show how to use its Unit of Work to avoid incomplete changes and how to express relations between collections. The last part of the talk shows how to use Ming to lazily migrate data when your schema changes and how to drop below the ODM layer for maximum speed.

transcript

VALIDATED DOCUMENTS ON MONGODB WITH MING

Alessandro Molina @__amol__

amol@turbogears.org

Who am I

● CTO @ Axant.it, a mostly Python company (with some iOS and Android)

● TurboGears development team member

● Contributions to Ming project ODM layer

● Really happy to be here at PyConUK!

○ I thought I would have crashed my car driving on the wrong side!

MongoDB Models

● Schema free

○ It looks like you don’t have a schema, but your code depends on properties that need to be there.

● SubDocuments

○ You know that a blog post contains a list of comments, but what is a comment?

● Relations

○ You don’t have joins and foreign keys, but you still need to express relationships

What’s Ming?

● MongoDB toolkit

○ Validation layer on pymongo

○ Manages schema migrations

○ In Memory MongoDB

○ ODM on top of all of those

● Born at sourceforge.net

● Supported by the TurboGears community

Layers: MongoDB → PyMongo → Ming → Ming.ODM

Getting Started with the ODM

● Ming.ODM looks like SQLAlchemy

● UnitOfWork

○ Avoid half-saved changes in case of crashes

○ Flush all your changes at once

● IdentityMap

○ Same DB objects are the same object in memory

● Supports Relations

● Supports events (after_insert, before_update, …)
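The two ideas above, flushing all changes at once and keeping one in-memory object per document, can be sketched in plain Python. This is a toy model under stated assumptions, not Ming's implementation: `ToySession` and its methods are hypothetical names, and a dict stands in for the MongoDB collection.

```python
class ToySession:
    """Toy unit of work with an identity map (hypothetical, not Ming's code)."""

    def __init__(self, store):
        self.store = store        # dict standing in for a collection: _id -> document
        self.identity_map = {}    # _id -> the single in-memory object for that document

    def add(self, obj):
        # Track a new object; nothing is written until flush().
        self.identity_map[obj["_id"]] = obj

    def get(self, _id):
        # Load once, then always hand back the same in-memory object.
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.store[_id])
        return self.identity_map[_id]

    def flush(self):
        # Write every tracked object in a single pass.
        for obj in self.identity_map.values():
            self.store[obj["_id"]] = dict(obj)

    def clear(self):
        # Forget tracked objects without writing them.
        self.identity_map.clear()


store = {}
session = ToySession(store)
session.add({"_id": 1, "title": "FirstPage"})
written_before_flush = dict(store)
session.flush()
```

Until `flush()` is called the store stays empty, so a crash halfway through a request leaves no half-saved state in this model.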

Declaring Schema with the ODM

from ming import schema
from ming.odm import MappedClass, FieldProperty, ForeignIdProperty, RelationProperty

# DBSession is assumed to be an ODM session configured elsewhere

class WikiPage(MappedClass):
    # Metadata for the collection
    # like its name, indexes, session, ...
    class __mongometa__:
        session = DBSession
        name = 'wiki_page'
        unique_indexes = [('title',)]

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(schema.String)
    text = FieldProperty(schema.String)

    # Ming automatically generates
    # the relationship query
    comments = RelationProperty('WikiComment')

class WikiComment(MappedClass):
    class __mongometa__:
        session = DBSession
        name = 'wiki_comment'

    _id = FieldProperty(schema.ObjectId)
    text = FieldProperty(schema.String, if_missing='')

    # Provides an actual relation point
    # between comments and pages
    page_id = ForeignIdProperty('WikiPage')

● Declarative interface for models

● Supports polymorphic models

Querying the ODM

from pymongo import DESCENDING

wp = WikiPage.query.get(title='FirstPage')

# Identity map prevents duplicates
wp2 = WikiPage.query.get(title='FirstPage')
assert wp is wp2

# manually fetching related comments
comments = WikiComment.query.find(dict(page_id=wp._id)).all()
# or
comments = wp.comments

# gets last 5 wikipages in natural order
wps = WikiPage.query.find().sort('$natural', DESCENDING).limit(5).all()

● Query language tries to be natural for both SQLAlchemy and MongoDB users

The Unit Of Work

● Flush or Clear the pending changes

● Avoid mixing UOW and atomic operations

● UnitOfWork as a cache

wp = WikiPage(title='FirstPage', text='This is my first page')
DBSession.flush()

wp.title = "TITLE 2"
DBSession.update(WikiPage, {'_id': wp._id}, {'$set': {'title': "TITLE 3"}})
DBSession.flush()  # wp.title will be TITLE 2, not TITLE 3

wp2 = DBSession.get(WikiPage, wp._id)
# wp2 lookup won’t query the database again
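Why mixing the unit of work with atomic operations is risky can be shown with a toy model. The sketch below assumes a naive unit of work that writes its tracked copy back as a whole document; the slides do not say exactly how Ming writes on flush, so `flush` here is a hypothetical helper and a dict stands in for the collection.

```python
store = {1: {"_id": 1, "title": "FirstPage"}}

# The unit of work holds its own in-memory copy of the document.
tracked = dict(store[1])
tracked["title"] = "TITLE 2"

# An atomic-style update goes straight to the store and bypasses the tracked copy.
store[1]["title"] = "TITLE 3"


def flush(store, tracked):
    """Write the tracked copy back as a whole document (assumed behaviour)."""
    store[tracked["_id"]] = dict(tracked)


flush(store, tracked)
final_title = store[1]["title"]
```

In this model the atomic update is silently lost, because the tracked copy never learned about it.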

How Validation works

● Ming documents are validated at certain points in their life cycle

○ When saving the document to the database

○ When loading it from the database

○ Additionally, validation is performed when the document is created through the ODM layer or using the .make() method

■ Happens before they get saved for real
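To make "validated when created" concrete, here is a minimal sketch of a schema check that fills in missing values and rejects wrong types before anything is saved. `ToyField` and `make` are hypothetical stand-ins written for this illustration; they only mimic the idea and are not Ming's schema classes.

```python
_MISSING = object()


class ToyField:
    """Hypothetical field description: an expected type plus an optional default."""

    def __init__(self, type_, if_missing=_MISSING):
        self.type_ = type_
        self.if_missing = if_missing


def make(schema, data):
    """Return a validated copy of data, or raise ValueError."""
    unknown = set(data) - set(schema)
    if unknown:
        raise ValueError("unexpected fields: %s" % sorted(unknown))
    result = {}
    for name, field in schema.items():
        if name not in data:
            # Missing values fall back to the default, or to None.
            result[name] = None if field.if_missing is _MISSING else field.if_missing
        elif isinstance(data[name], field.type_):
            result[name] = data[name]
        else:
            raise ValueError("%s must be %s" % (name, field.type_.__name__))
    return result


WIKI_PAGE = {"title": ToyField(str), "text": ToyField(str, if_missing="")}
page = make(WIKI_PAGE, {"title": "FirstPage"})

try:
    make(WIKI_PAGE, {"title": 42})
    rejected = False
except ValueError:
    rejected = True
```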

Cost of Validation

● MongoDB is famous for its speed, but validation has a cost

○ MongoDB documents can contain many subdocuments

○ Each subdocument must be validated by Ming

○ Can even contain lists of multiple subdocuments

Cost of Validation benchmark

# With Validation
class User(MappedClass):
    # ...
    friends = FieldProperty([dict(fbuser=s.String, photo=s.String, name=s.String)], if_missing=[])

>>> timeit.timeit('User.query.find().all()', number=20000)
31.97218942642212

# Without Validation
class User(MappedClass):
    # ...
    friends = FieldProperty(s.Anything, if_missing=[])

>>> timeit.timeit('User.query.find().all()', number=20000)
23.391359090805054

# Avoiding the field at query time
>>> timeit.timeit('User.query.find({}, fields=("_id","name")).all()', number=20000)
21.58667516708374

Only query what you need

● The previous benchmark explains why it is good to query only for the fields you need to process the current request

● All the fields you don’t query for will still be available in the object, with a value of None
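A small sketch of why a narrower query is cheaper: if a field is not fetched, its subdocuments are never validated, and the attribute simply stays None. `load_user` and `validate_friend` are hypothetical helpers for this illustration, and a counter replaces real timing so the result does not depend on the machine.

```python
SCHEMA_FIELDS = ("_id", "name", "friends")
validated_subdocuments = 0


def validate_friend(friend):
    """Validate one subdocument and count the work done."""
    global validated_subdocuments
    validated_subdocuments += 1
    return {key: str(friend[key]) for key in ("fbuser", "photo", "name")}


def load_user(raw, fields=None):
    """Build a user dict; fields outside the projection stay None."""
    user = dict.fromkeys(SCHEMA_FIELDS)  # every field starts as None
    for name in SCHEMA_FIELDS:
        if fields is not None and name not in fields:
            continue
        value = raw.get(name)
        if name == "friends" and value is not None:
            value = [validate_friend(friend) for friend in value]
        user[name] = value
    return user


raw = {
    "_id": 1,
    "name": "amol",
    "friends": [
        {"fbuser": "u1", "photo": "p1", "name": "n1"},
        {"fbuser": "u2", "photo": "p2", "name": "n2"},
    ],
}

full = load_user(raw)
count_after_full = validated_subdocuments

partial = load_user(raw, fields=("_id", "name"))
count_after_partial = validated_subdocuments
```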

Evolving the Schema

● Migrations are performed lazily as the objects are loaded from the database

● Simple schema evolutions:

○ New field: It will just be None for old entities.

○ Removed: Declare it as ming.schema.Deprecated

○ Changed Type: Declare it as ming.schema.Migrate

● Complex schema evolutions:

○ Add a migration function in __mongometa__

Complex migrations with Ming

class OldWikiPage(Document):
    _id = Field(schema.ObjectId)
    title = Field(str)
    text = Field(str, if_missing='')
    metadata = Field(dict(tags=[str], categories=[str]))

class WikiPage(Document):
    class __mongometa__:
        session = DBSession
        name = 'wiki_page'
        version_of = OldWikiPage

        def migrate(data):
            result = dict(data, version=1,
                          tags=data['metadata']['tags'],
                          categories=data['metadata']['categories'])
            del result['metadata']
            return result

    version = Field(1, required=True)
    # … more fields ...
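The migrate function from the slide is ordinary Python, so its effect can be checked without a database. The sketch below copies it unchanged and wraps it in a hypothetical `load_page` helper that mimics lazy migration: a document is upgraded only when it is loaded and still has the old shape. The sample document is made up for illustration.

```python
def migrate(data):
    """Same transformation as the migrate function on the slide."""
    result = dict(data, version=1,
                  tags=data['metadata']['tags'],
                  categories=data['metadata']['categories'])
    del result['metadata']
    return result


def load_page(raw):
    """Hypothetical loader: migrate only documents not yet at version 1."""
    if raw.get('version') == 1:
        return raw
    return migrate(raw)


old_doc = {
    '_id': 1,
    'title': 'FirstPage',
    'text': '',
    'metadata': {'tags': ['python'], 'categories': ['talks']},
}

new_doc = load_page(old_doc)
```

Loading an already migrated document returns it untouched, so the migration cost is paid once per document rather than in one large batch.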

Testing MongoDB

● Ming makes testing easy

○ Your models can be directly imported from tests

○ Just bind the session to a DataStore created in your test suite

● Ming provides MongoInMemory

○ much like sqlite://:memory:

● Implements 90% of MongoDB, including JavaScript execution with SpiderMonkey

Ming for Web Applications

● Ming can be integrated in any WSGI framework through ming.odm.middleware.MingMiddleware

○ Automatically disposes open sessions at the end of requests

○ Automatically provides session flushing

○ Automatically clears the session in case of exceptions
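The three behaviours listed above can be sketched as a generic WSGI wrapper. This is a toy illustration of the pattern, not the code of MingMiddleware: `ToySessionMiddleware` and `RecordingSession` are hypothetical names, and the session only records which methods were called.

```python
class RecordingSession:
    """Hypothetical session that records calls instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def flush(self):
        self.calls.append("flush")

    def clear(self):
        self.calls.append("clear")

    def close(self):
        self.calls.append("close")


class ToySessionMiddleware:
    """Flush on success, clear on error, always close at the end of the request."""

    def __init__(self, app, session):
        self.app = app
        self.session = session

    def __call__(self, environ, start_response):
        try:
            body = list(self.app(environ, start_response))
            self.session.flush()
            return body
        except Exception:
            self.session.clear()
            raise
        finally:
            self.session.close()


def ok_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def failing_app(environ, start_response):
    raise RuntimeError("boom")


ok_session = RecordingSession()
ok_body = ToySessionMiddleware(ok_app, ok_session)({}, lambda status, headers: None)

failed_session = RecordingSession()
try:
    ToySessionMiddleware(failing_app, failed_session)({}, lambda status, headers: None)
except RuntimeError:
    pass
```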

Ming with TurboGears

● Provides built-in support for Ming

○ $ gearbox quickstart --ming projectname

● Ready-made test suite with fixtures on MIM

● Facilities to debug and benchmark Ming queries through the DebugBar

● TurboGears Admin automatically generates CRUD from Ming models

Debugging MongoDB

● TurboGears DebugBar has built-in support for MongoDB

○ Executed queries logging and results

○ Queries timing

○ Syntax prettifier and highlight for Map-Reduce and $where JavaScript code

○ Queries tracking on logs for performance reporting of web services

DebugBar in action

Ming without learning MongoDB

● Transition from SQL/relational solutions to MongoDB can be scary the first time.

● You can use Sprox to lower the learning cost for simple applications

○ Sprox is the library that empowers TurboGears Admin to automatically generate pages from SQLA or Ming

Sprox ORM abstractions

● ORMProvider, provides an abstraction over the ORM

● ORMProviderSelector, automatically detects the provider to use from a model.

● Mix those together and you have a db independent layer with automatic storage backend detection.
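The provider plus selector combination can be sketched in a few lines. Everything here is hypothetical and written only to show the pattern: `ToyProvider`, `ToySelector` and the detection rules are not Sprox classes, and detecting a backend by looking for `__mongometa__` or `__table__` on the model is an assumption made for this example.

```python
class ToyProvider:
    """Hypothetical provider: a uniform query() over one storage backend."""

    def __init__(self, backend_name, rows_by_entity):
        self.backend_name = backend_name
        self.rows_by_entity = rows_by_entity

    def query(self, entity):
        rows = self.rows_by_entity.get(entity, [])
        return len(rows), rows


class ToySelector:
    """Hypothetical selector: picks the first provider whose rule matches the model."""

    def __init__(self):
        self._rules = []

    def register(self, matches, provider):
        self._rules.append((matches, provider))

    def get_provider(self, entity):
        for matches, provider in self._rules:
            if matches(entity):
                return provider
        raise LookupError("no provider for %r" % entity)


class MingTransfer:
    class __mongometa__:
        name = "money_transfer"


class SqlTransfer:
    __table__ = "money_transfer"


selector = ToySelector()
selector.register(lambda e: hasattr(e, "__mongometa__"),
                  ToyProvider("ming", {MingTransfer: [{"amount": 10}]}))
selector.register(lambda e: hasattr(e, "__table__"),
                  ToyProvider("sqlalchemy", {SqlTransfer: [{"amount": 5}, {"amount": 7}]}))

# Calling code never names the backend; it only asks the selector.
ming_count, ming_rows = selector.get_provider(MingTransfer).query(MingTransfer)
sql_count, sql_rows = selector.get_provider(SqlTransfer).query(SqlTransfer)
```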

Hands on Sprox

● Provider.query(self, entity, **kwargs) → get all objects of a collection

● Provider.get_obj(self, entity, params) → get an object

● Provider.update(self, entity, params) → update an object

● Provider.create(self, entity, params) → create a new object

# Sprox (Ming or SQLAlchemy)
count, transactions = provider.query(MoneyTransfer)

transactions = DBSession.query(MoneyTransfer).all()  # SQLAlchemy
transactions = MoneyTransfer.query.find().all()  # Ming

Questions?