PyConUK2013 - Validated documents on MongoDB with Ming


VALIDATED DOCUMENTS ON MONGODB WITH MING

Alessandro Molina @__amol__

amol@turbogears.org

Who am I

● CTO @ Axant.it, a mostly Python company (with some iOS and Android)

● TurboGears development team member

● Contributions to the Ming project’s ODM layer

● Really happy to be here at PyConUK!

○ I thought I would crash my car driving on the wrong side!

MongoDB Models

● Schema free

○ It looks like you don’t have a schema, but your code depends on properties that need to be there.

● SubDocuments

○ You know that a blog post contains a list of comments, but what is a comment?

● Relations

○ You don’t have joins and foreign keys, but you still need to express relationships.
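To make the three points above concrete, here is a plain-Python sketch (no Ming involved) of a blog post as MongoDB might store it. The field names (`title`, `author_id`, `comments`, `text`) are hypothetical; the point is that nothing stops a comment from lacking `text`, so code that silently assumes it fails only at run time.

```python
# Hypothetical blog post, shaped the way MongoDB would store it.
post = {
    "_id": 1,
    "title": "Hello",
    "author_id": 42,  # a relation expressed by hand: no join, no foreign key
    "comments": [  # subdocuments: nothing defines what a comment is
        {"author": "bob", "text": "Nice post"},
        {"author": "alice"},  # no "text" key, yet still a valid document
    ],
}


def comment_lengths(post):
    # Implicit schema: assumes every comment has a "text" key.
    return [len(c["text"]) for c in post["comments"]]


def safe_comment_lengths(post):
    # The same assumption made explicit, with defaults for missing values.
    return [len(c.get("text", "")) for c in post.get("comments", [])]


try:
    comment_lengths(post)
except KeyError as missing:
    print("missing property:", missing)  # the second comment has no 'text'

print(safe_comment_lengths(post))  # [9, 0]
```

Writing the assumptions down once, as a schema, is exactly the job the validation layer takes over.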

What’s Ming?

● MongoDB toolkit

○ Validation layer on pymongo

○ Manages schema migrations

○ In Memory MongoDB

○ ODM on top of all of those

● Born at sourceforge.net

● Supported by the TurboGears community

[Diagram: Ming.ODM on top of PyMongo, which sits on top of MongoDB]

Getting Started with the ODM

● Ming.ODM looks like SQLAlchemy

● UnitOfWork

○ Avoid half-saved changes in case of crashes

○ Flush all your changes at once

● IdentityMap

○ The same DB document is always the same object in memory

● Supports Relations

● Supports events (after_insert, before_update, …)
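A toy model of the UnitOfWork and IdentityMap ideas above, assuming a plain dict stands in for a MongoDB collection. `ToySession` and its methods are hypothetical and far simpler than Ming’s real session; the sketch only shows that repeated lookups return the same object and that changes reach storage only at flush time.

```python
class ToySession:
    """Hypothetical, heavily simplified session (not Ming's ODMSession)."""

    def __init__(self, collection):
        self.collection = collection  # dict of _id -> stored document
        self.identity_map = {}        # _id -> the single in-memory object
        self.dirty = set()            # _ids with unflushed changes

    def get(self, _id):
        # Identity map: load each document at most once per session.
        if _id not in self.identity_map:
            self.identity_map[_id] = dict(self.collection[_id])
        return self.identity_map[_id]

    def mark_dirty(self, obj):
        self.dirty.add(obj["_id"])

    def flush(self):
        # Unit of work: pending changes are written together, and only here.
        for _id in self.dirty:
            self.collection[_id] = dict(self.identity_map[_id])
        self.dirty.clear()

    def clear(self):
        # Drop pending changes and cached objects.
        self.identity_map.clear()
        self.dirty.clear()


collection = {1: {"_id": 1, "title": "FirstPage"}}
session = ToySession(collection)
page = session.get(1)
assert page is session.get(1)                 # same document, same object
page["title"] = "Renamed"
session.mark_dirty(page)
assert collection[1]["title"] == "FirstPage"  # nothing written yet
session.flush()
```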

Declaring Schema with the ODM

    from ming import schema
    from ming.odm import FieldProperty, ForeignIdProperty, RelationProperty
    from ming.odm.declarative import MappedClass

    class WikiPage(MappedClass):
        # Metadata for the collection
        # like its name, indexes, session, ...
        class __mongometa__:
            session = DBSession
            name = 'wiki_page'
            unique_indexes = [('title',)]

        _id = FieldProperty(schema.ObjectId)
        title = FieldProperty(schema.String)
        text = FieldProperty(schema.String)

        # Ming automatically generates
        # the relationship query
        comments = RelationProperty('WikiComment')

    class WikiComment(MappedClass):
        class __mongometa__:
            session = DBSession
            name = 'wiki_comment'

        _id = FieldProperty(schema.ObjectId)
        text = FieldProperty(schema.String, if_missing='')

        # Provides an actual relation point
        # between comments and pages
        page_id = ForeignIdProperty('WikiPage')

● Declarative interface for models

● Supports polymorphic models

Querying the ODM

    from pymongo import DESCENDING

    wp = WikiPage.query.get(title='FirstPage')

    # Identity map prevents duplicates
    wp2 = WikiPage.query.get(title='FirstPage')
    assert wp is wp2

    # Manually fetching related comments
    comments = WikiComment.query.find(dict(page_id=wp._id)).all()
    # or
    comments = wp.comments

    # Gets the last 5 wiki pages in natural order
    wps = WikiPage.query.find().sort('$natural', DESCENDING).limit(5).all()

● The query language tries to feel natural to both SQLAlchemy and MongoDB users

The Unit Of Work

● Flush or clear pending changes

● Avoid mixing the UnitOfWork with atomic operations

● The UnitOfWork also acts as a cache

    wp = WikiPage(title='FirstPage', text='This is my first page')
    DBSession.flush()

    wp.title = "TITLE 2"
    DBSession.update(WikiPage, {'_id': wp._id}, {'$set': {'title': "TITLE 3"}})
    DBSession.flush()  # wp.title will be TITLE 2, not TITLE 3

    wp2 = DBSession.get(WikiPage, wp._id)
    # The wp2 lookup won't query the database again
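Why does `wp.title` stay "TITLE 2" in the example above? A simplified model, assuming the unit of work writes the whole in-memory object back at flush time (Ming’s actual flush logic is more involved): the atomic update goes straight to the database, but the session still holds the older in-memory state. `atomic_set` and `flush` are hypothetical helpers.

```python
# Hypothetical stored collection and the object the session holds for it.
collection = {1: {"_id": 1, "title": "FirstPage"}}
wp = dict(collection[1])  # in-memory copy tracked by the unit of work
wp["title"] = "TITLE 2"   # pending change, not flushed yet


def atomic_set(coll, _id, fields):
    # Stands in for DBSession.update(..., {'$set': ...}): applied to the DB at once.
    coll[_id].update(fields)


def flush(coll, obj):
    # Simplifying assumption: flush writes the whole in-memory object back.
    coll[obj["_id"]] = dict(obj)


atomic_set(collection, 1, {"title": "TITLE 3"})
print(collection[1]["title"], wp["title"])  # TITLE 3 TITLE 2
flush(collection, wp)
print(collection[1]["title"])  # TITLE 2: in this model the atomic update is overwritten
```

In this model the safe options are either to flush before issuing atomic updates, or to avoid touching the same object through both paths.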

How Validation works

● Ming documents are validated at certain points in their life cycle:

○ When saving the document to the database

○ When loading it from the database

○ Additionally, when the document is created through the ODM layer or with the .make() method

■ This happens before the document is actually saved
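The idea of validating on the way in and on the way out can be sketched with a small recursive validator. This is not `ming.schema`: the `(type, default)` leaf notation, `validate`, `save` and `load` are hypothetical, chosen only to show nested subdocuments, lists of subdocuments and `if_missing`-style defaults.

```python
MISSING = object()  # marker for "key not present"


def validate(schema, value):
    """Hypothetical recursive validator (not ming.schema)."""
    if isinstance(schema, dict):            # subdocument
        value = {} if value is MISSING else value
        if not isinstance(value, dict):
            raise ValueError(f"expected a subdocument, got {value!r}")
        return {key: validate(sub, value.get(key, MISSING))
                for key, sub in schema.items()}
    if isinstance(schema, list):            # list of subdocuments
        value = [] if value is MISSING else value
        if not isinstance(value, list):
            raise ValueError(f"expected a list, got {value!r}")
        return [validate(schema[0], item) for item in value]
    typ, default = schema                   # leaf: (type, if_missing default)
    if value is MISSING:
        return default
    if not isinstance(value, typ):
        raise ValueError(f"expected {typ.__name__}, got {value!r}")
    return value


page_schema = {
    "title": (str, ""),
    "comments": [{"author": (str, "anonymous"), "text": (str, "")}],
}


def save(db, doc):
    db.append(validate(page_schema, doc))    # validated on the way in


def load(db, index):
    return validate(page_schema, db[index])  # and again on the way out


db = []
save(db, {"title": "FirstPage", "comments": [{"text": "hi"}]})
print(load(db, 0))
# {'title': 'FirstPage', 'comments': [{'author': 'anonymous', 'text': 'hi'}]}
```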

Cost of Validation

● MongoDB is famous for its speed, but validation has a cost

○ MongoDB documents can contain many subdocuments

○ Each subdocument must be validated by Ming

○ Documents can even contain lists of many subdocuments

Cost of Validation benchmark

    # With validation
    class User(MappedClass):
        # ...
        friends = FieldProperty([dict(fbuser=schema.String, photo=schema.String, name=schema.String)],
                                if_missing=[])

    >>> timeit.timeit('User.query.find().all()', number=20000)
    31.97218942642212

    # Without validation
    class User(MappedClass):
        # ...
        friends = FieldProperty(schema.Anything, if_missing=[])

    >>> timeit.timeit('User.query.find().all()', number=20000)
    23.391359090805054

    # Avoiding the field at query time
    >>> timeit.timeit('User.query.find({}, fields=("_id", "name")).all()', number=20000)
    21.58667516708374

Only query what you need

● The previous benchmark shows why it is good to query only the fields you need to process the current request

● Fields you don’t query for will still be available on the object, with a None value
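A minimal sketch of that behaviour, assuming a mapped class with a fixed list of declared fields; `Record` and `find` are hypothetical, not Ming’s classes. Fields removed by the projection still exist as attributes, they are just `None`.

```python
class Record:
    """Hypothetical mapped object with a fixed set of declared fields."""

    FIELDS = ("_id", "name", "email", "friends")

    def __init__(self, raw):
        for field in self.FIELDS:
            # Fields left out by the projection still exist, set to None.
            setattr(self, field, raw.get(field))


def find(collection, fields=None):
    # Toy projection: keep only the requested keys, then build objects.
    for doc in collection:
        if fields is not None:
            doc = {key: value for key, value in doc.items() if key in fields}
        yield Record(doc)


users = [{"_id": 1, "name": "amol", "email": "amol@example.com", "friends": [2, 3]}]
user = next(find(users, fields=("_id", "name")))
print(user.name, user.email, user.friends)  # amol None None
```

In a model like this one, `None` on a partially loaded object can mean "not loaded" rather than "stored as null", so such an object should not be treated as complete.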

Evolving the Schema

● Migrations are performed lazily, as objects are loaded from the database

● Simple schema evolutions:

○ New field: it will just be None for old entities

○ Removed field: declare it as ming.schema.Deprecated

○ Changed type: declare it as ming.schema.Migrate

● Complex schema evolutions:

○ Add a migration function in __mongometa__
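The simple cases can be pictured as a function applied every time a raw document is loaded. This hand-written `load` only mirrors the intent of the bullets above (it is not how `Deprecated` or `Migrate` are implemented), and the field names `summary`, `legacy_flag` and `views` are hypothetical.

```python
def load(raw):
    """Hypothetical lazy migration run each time a stored document is loaded."""
    doc = dict(raw)  # never mutate what came from the database
    # New field: old documents don't have it, so it simply reads as None.
    doc.setdefault("summary", None)
    # Removed field: tolerated in old documents, dropped from the loaded object.
    doc.pop("legacy_flag", None)
    # Changed type: old documents stored "views" as a string, new code wants an int.
    if isinstance(doc.get("views"), str):
        doc["views"] = int(doc["views"])
    return doc


old = {"_id": 1, "views": "10", "legacy_flag": True}
print(load(old))  # {'_id': 1, 'views': 10, 'summary': None}
```

Because nothing is rewritten in bulk, old and new documents can coexist in the same collection until each one is next saved.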

Complex migrations with Ming

    from ming import Document, Field, schema

    class OldWikiPage(Document):
        _id = Field(schema.ObjectId)
        title = Field(str)
        text = Field(str, if_missing='')
        metadata = Field(dict(tags=[str], categories=[str]))

    class WikiPage(Document):
        class __mongometa__:
            session = DBSession
            name = 'wiki_page'
            version_of = OldWikiPage

            def migrate(data):
                result = dict(data, version=1,
                              tags=data['metadata']['tags'],
                              categories=data['metadata']['categories'])
                del result['metadata']
                return result

        version = Field(1, required=True)
        # ... more fields ...
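Stripped of Ming, the lazy migration above boils down to: when a loaded document is not yet at version 1, run `migrate` on it. The `migrate` body is copied from the slide; `load_wiki_page` is a hypothetical stand-in. Ming’s real mechanism is driven by schema validation against `version_of`; this sketch simplifies that to an explicit version check.

```python
def migrate(data):
    # Same transformation as the __mongometa__.migrate shown above.
    result = dict(data, version=1,
                  tags=data['metadata']['tags'],
                  categories=data['metadata']['categories'])
    del result['metadata']
    return result


def load_wiki_page(raw):
    # Simplification: an explicit version check instead of schema validation.
    if raw.get('version') != 1:
        raw = migrate(raw)
    return raw


old = {'_id': 1, 'title': 'FirstPage', 'text': '',
       'metadata': {'tags': ['python'], 'categories': ['talks']}}
page = load_wiki_page(old)
print(page)
# {'_id': 1, 'title': 'FirstPage', 'text': '', 'version': 1, 'tags': ['python'], 'categories': ['talks']}
```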

Testing MongoDB

● Ming makes testing easy

○ Your models can be imported directly from tests

○ Just bind the session to a DataStore created in your test suite

● Ming provides MongoInMemory (MIM)

○ Much like sqlite:///:memory: in SQLAlchemy

● Implements 90% of MongoDB, including JavaScript execution with SpiderMonkey
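The pattern behind binding the session in tests can be sketched with the standard `unittest` module: application code only talks to a session, and each test binds that session to a fresh in-memory store. `InMemoryStore`, `Session`, `DBSession` and `create_page` here are hypothetical stand-ins, not Ming’s DataStore or MIM API.

```python
import unittest


class InMemoryStore:
    """Hypothetical stand-in for an in-memory datastore (the role MIM plays)."""

    def __init__(self):
        self.collections = {}

    def insert(self, name, doc):
        self.collections.setdefault(name, []).append(dict(doc))

    def find(self, name, **spec):
        return [doc for doc in self.collections.get(name, [])
                if all(doc.get(k) == v for k, v in spec.items())]


class Session:
    """Hypothetical session whose store is bound late (by the app or by tests)."""

    def __init__(self):
        self.bind = None


DBSession = Session()


def create_page(title):
    # Application code only knows the session, never the concrete store.
    DBSession.bind.insert("wiki_page", {"title": title})


class WikiPageTest(unittest.TestCase):
    def setUp(self):
        DBSession.bind = InMemoryStore()  # fresh, isolated storage per test

    def test_create_page(self):
        create_page("FirstPage")
        self.assertEqual(len(DBSession.bind.find("wiki_page", title="FirstPage")), 1)
```

Run it with `python -m unittest`; no MongoDB server is involved.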

Ming for Web Applications

● Ming can be integrated into any WSGI framework through ming.odm.middleware.MingMiddleware

○ Automatically disposes of open sessions at the end of requests

○ Automatically flushes the session

○ Automatically clears the session in case of exceptions
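A minimal WSGI middleware with the three behaviours listed above, assuming a session object that exposes `flush()`, `clear()` and `close()`. It is a sketch of the idea, not the source of `MingMiddleware`.

```python
class SessionMiddleware:
    """Hypothetical WSGI middleware; a sketch of the idea, not MingMiddleware.

    `session` is assumed to provide flush(), clear() and close().
    """

    def __init__(self, app, session):
        self.app = app
        self.session = session

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                body = list(result)  # consume now so iteration errors are caught here
            finally:
                if hasattr(result, "close"):
                    result.close()  # PEP 3333: always close the app's iterable
            self.session.flush()     # success: persist pending changes
            return body
        except Exception:
            self.session.clear()     # failure: discard half-done changes
            raise
        finally:
            self.session.close()     # always release the session at request end
```

Design choice: the response body is consumed inside the middleware so that exceptions raised while iterating it still trigger `clear()`; the trade-off is that streaming responses are buffered in memory.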

Ming with TurboGears

● Provides built-in support for Ming

○ $ gearbox quickstart --ming projectname

● Ready-made test suite with fixtures on MIM

● Facilities to debug and benchmark Ming queries through the DebugBar

● TurboGears Admin automatically generates CRUD from Ming models

Debugging MongoDB

● The TurboGears DebugBar has built-in support for MongoDB

○ Logging of executed queries and their results

○ Query timing

○ Syntax prettifying and highlighting for Map-Reduce and $where JavaScript code

○ Query tracking in logs for performance reporting of web services
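The bookkeeping behind query logging and timing can be sketched as a decorator around a query function. `tracked`, `find` and the `DATA` dict are hypothetical; this only shows the kind of data involved (pretty-printed spec, result count, elapsed time), not how the TurboGears DebugBar is implemented.

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("queries")


def tracked(func):
    """Hypothetical decorator recording each query, its result count and timing."""

    @functools.wraps(func)
    def wrapper(collection, spec):
        start = time.perf_counter()
        results = func(collection, spec)
        elapsed_ms = (time.perf_counter() - start) * 1000
        pretty = json.dumps(spec, indent=2, sort_keys=True, default=str)
        log.info("find on %s:\n%s\n-> %d results in %.2f ms",
                 collection, pretty, len(results), elapsed_ms)
        wrapper.history.append((collection, spec, len(results), elapsed_ms))
        return results

    wrapper.history = []  # kept for later performance reporting
    return wrapper


DATA = {"wiki_page": [{"title": "FirstPage"}, {"title": "Other"}]}


@tracked
def find(collection, spec):
    return [doc for doc in DATA[collection]
            if all(doc.get(k) == v for k, v in spec.items())]


find("wiki_page", {"title": "FirstPage"})
```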

DebugBar in action

Ming without learning MongoDB

● The transition from SQL/relational solutions to MongoDB can be scary the first time.

● You can use Sprox to lower the learning cost for simple applications

○ Sprox is the library that powers TurboGears Admin, automatically generating pages from SQLAlchemy or Ming models

Sprox ORM abstractions

● ORMProvider: provides an abstraction over the ORM

● ORMProviderSelector: automatically detects which provider to use for a model

● Combine the two and you get a database-independent layer with automatic detection of the storage backend.
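The provider/selector split can be sketched in a few classes. All names here are hypothetical, and the detection rule (a `__mongometa__` attribute means Ming, a `__tablename__` attribute means SQLAlchemy) is a simplifying assumption; Sprox’s actual detection logic may differ.

```python
class Provider:
    """Hypothetical common interface, loosely inspired by Sprox's providers."""

    def query(self, entity):
        raise NotImplementedError


class SQLProvider(Provider):
    def query(self, entity):
        rows = list(entity.rows)       # stands in for session.query(entity).all()
        return len(rows), rows


class DocumentProvider(Provider):
    def query(self, entity):
        docs = list(entity.documents)  # stands in for entity.query.find().all()
        return len(docs), docs


class ProviderSelector:
    """Picks a provider by inspecting the model (simplified detection rule)."""

    def get_provider(self, entity):
        if hasattr(entity, "__mongometa__"):
            return DocumentProvider()
        if hasattr(entity, "__tablename__"):
            return SQLProvider()
        raise TypeError(f"no provider for {entity!r}")


# Two hypothetical models for the same data, one per backend style.
class MingTransfer:
    class __mongometa__:
        name = "money_transfer"
    documents = [{"amount": 10}]


class SQLTransfer:
    __tablename__ = "money_transfer"
    rows = [("t1", 10), ("t2", 5)]


selector = ProviderSelector()
for model in (MingTransfer, SQLTransfer):
    # Calling code is identical whatever the storage backend is.
    count, transactions = selector.get_provider(model).query(model)
    print(model.__name__, count)
```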

Hands on Sprox

● Provider.query(self, entity, **kwargs) → get all objects of a collection

● Provider.get_obj(self, entity, params) → get an object

● Provider.update(self, entity, params) → update an object

● Provider.create(self, entity, params) → create a new object

    # Sprox (Ming or SQLAlchemy)
    count, transactions = provider.query(MoneyTransfer)

    transactions = DBSession.query(MoneyTransfer).all()  # SQLAlchemy
    transactions = MoneyTransfer.query.find().all()  # Ming

Questions?
