Posted on 01-Nov-2014
Moma-Django Overview
Django Boston meetup, 02-27-2014
Django + MongoDB: building a custom ORM layer
Overview of the talk:
moma-django is a MongoDB manager for Django. It provides native Django ORM support for MongoDB documents, including the query API and the admin interface. It was developed as part of two commercial products and released as open source. In the talk we will review the motivation behind its development and its features, and go through 2-3 examples of how to use some of them: migrating an existing model, advanced queries and the admin interface. If time permits, we will discuss unit testing and South migrations.
Who are we?
Company: Cloudoscope.com
What we do:
– Cloudoscope’s product enables IT vendors to automate the pre-sales process by collecting and analyzing prospects’ IT performance
– Previous product - Lucidel: B2C marketing analytics based on website data
– Data-intensive projects / sites, NoSQL, analytics focus (as a way of funding)
Gadi Oren: @gadioren, gadioren
Why moma-django?
Certain problems can be addressed well with NoSQL
The team wants to experiment with NoSQL
HOWEVER:
– A lot of code needs to be rewritten
– The team has to learn a new API
– Some of the tools and procedures no longer work and need to be replaced:
  – Admin interface
  – Unit testing environment
– Some of the data needs to be somewhat de-normalized*
Why moma-django? (our example)
Needed a very efficient way of processing timeseries
The timeseries were constantly growing
We required very detailed search/slice/dice capabilities to find the timeseries to be processed
Some of the data was optional (e.g. demographics information was never complete)
Document size, content and structure varied widely
However, we have a small distributed team and did not want to create a massive project
We started experimenting with a stub Manager, doing small iterations and adding functionality as we needed it over nine months
Other packages
PyMongo – a dependency for moma-django
MongoEngine – somewhat similar concepts in terms of models
Non-relational versions of Django
“Native” - advantages
Django packages and plugins (e.g. Admin functionality)
Using similar code conventions
Easier to bring in new team members
Use the same unit testing and CI tools (e.g. Jenkins)
Simple experimentation and migration path
Let’s make it interactive
Questions Anyone??? (Example Application)
A small question-asking application that allows voting and adding images
Implemented as a Django application over MongoDB, using moma-django
Register and login at http://momadjango.org
Ask away!
Migrating an existing model
Standard Django models:

from django.db import models

class TstBook(models.Model):
    name = models.CharField(max_length=64)
    publish_date = MongoDateTimeField()
    author = models.ForeignKey('testing.TstAuthor')

    class Meta:
        unique_together = ['name', 'author']

class TstAuthor(models.Model):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

The same models on MongoDB: change the base class to MongoModel and connect the post_syncdb handler:

# MongoModel, MongoDateTimeField and post_syncdb_mongo_handler are provided by moma-django
class TstBook(MongoModel):
    name = models.CharField(max_length=64)
    publish_date = MongoDateTimeField()
    author = models.ForeignKey('testing.TstAuthor')

    class Meta:
        unique_together = ['name', 'author']

class TstAuthor(MongoModel):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

models.signals.post_syncdb.connect(post_syncdb_mongo_handler)
Migrating an existing model (2)
Syncdb:
Add objects
>>> TstBook(name="Good night half moon",
...         publish_date=datetime.datetime(2014, 2, 20),
...         author=TstAuthor.objects.get(first_name="Gadi")).save()
Migrating an existing model (3)
Breaching uniqueness: try to save the same object again:
Migrating an existing model (4)
In Mongo: content, indexes
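The unique_together constraint that blocked the duplicate save above is naturally expressed in MongoDB as a compound unique index. The sketch below is a hypothetical helper (unique_together_to_index_specs is not part of moma-django) that turns a Meta.unique_together declaration into (keys, options) pairs shaped like the arguments of PyMongo's Collection.create_index(keys, unique=True). It assumes field names map directly to document keys.

```python
# Hypothetical helper, not moma-django's actual code: map Django's
# Meta.unique_together onto compound unique index specifications.
ASCENDING = 1  # same value as pymongo.ASCENDING


def unique_together_to_index_specs(unique_together):
    """Return one (keys, options) tuple per group of jointly unique fields."""
    if not unique_together:
        return []
    # Django accepts a single group of names or a list of groups.
    if isinstance(unique_together[0], str):
        unique_together = [unique_together]
    specs = []
    for group in unique_together:
        keys = [(field, ASCENDING) for field in group]
        specs.append((keys, {"unique": True}))
    return specs


specs = unique_together_to_index_specs(['name', 'author'])
```

Each pair could then be applied as collection.create_index(keys, **options); with the index in place, MongoDB itself rejects a second document with the same name and author.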
Admin
class Meta:
    unique_together = ['name', 'author']
New field types
MongoIDField – Internal. Used to hold the MongoDB object ID
MongoDateTimeField – Used for datetime values
ValuesField – Used to represent a list of objects of any type
StringListField – Used for a list of strings
DictionaryField – Used as a dictionary
Current limitation: nested structures have limited support
Queries and update – 1: bulk insert
from bson.objectid import ObjectId  # ships with PyMongo
# ISODate() is the small helper defined in the next section

records.append({
    "_id": ObjectId("502abdabf7f16836f100285a"),
    "time_on_site": 290,
    "user_id": 1154449631,
    "account_id": 5,  # NumberLong(5) in the mongo shell export
    "campaign": "(not set)",
    "first_visit_date": ISODate("2012-07-30T17:10:06Z"),
    "referral_path": "(not set)",
    "source": "google",
    "exit_page_path": "/some-analysis/lion-king/",
    "landing_page_path": "(not set)",
    "keyword": "wikipedia lion king",
    "date": ISODate("2012-07-30T00:00:00Z"),
    "visit_count": 1,
    "page_views": 3,
    "visit_id": "false---------------1154449631.1343668206",
    "goal_values": {},
    "goal_starts": {},
    "demographics": {},
    "goal_completions": {},
    "location": {"cr": "United States", "rg": "California", "ct": "Pasadena"},
})
UniqueVisit.objects.filter(account__in=self.list_of_accounts).delete()
UniqueVisit.objects.bulk_insert( records )
Queries and update – 2: examples
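from datetime import datetime
from django.utils import timezone

def ISODate(timestr):
    # Parse a mongo-shell style ISODate string into a timezone-aware (UTC) datetime
    res = datetime.strptime(timestr, "%Y-%m-%dT%H:%M:%SZ")
    res = res.replace(tzinfo=timezone.utc)
    return res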
# Datetime
qs = UniqueVisit.objects.filter(
    first_visit_date__lte=ISODate("2012-07-30T12:29:05Z"))
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'first_visit_date': {'$lte': datetime(2012, 7, 30, 12, 29, 5, tzinfo=timezone.utc)}}))

# Multiple conditions
qs = UniqueVisit.objects.filter(
    first_visit_date__lte=ISODate("2012-07-30T12:29:05Z"),
    time_on_site__gt=10,
    page_views__gt=2)
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'time_on_site': {'$gt': 10.0},
     'page_views': {'$gt': 2},
     'first_visit_date': {'$lte': datetime(2012, 7, 30, 12, 29, 5, tzinfo=timezone.utc)}}))
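To make the translation in these two tests concrete, here is a minimal sketch of how Django-style lookup suffixes can be mapped onto MongoDB comparison operators. The function filter_to_spec and the OPERATORS table are hypothetical illustrations, not moma-django's internals, and the sketch handles only top-level fields.

```python
from datetime import datetime, timezone

# Hypothetical suffix table; moma-django's own translation may differ.
OPERATORS = {"lt": "$lt", "lte": "$lte", "gt": "$gt", "gte": "$gte", "in": "$in"}


def filter_to_spec(**lookups):
    """Translate Django-style filter() keyword arguments into a pymongo spec."""
    spec = {}
    for key, value in lookups.items():
        field, sep, suffix = key.rpartition("__")
        if sep and suffix in OPERATORS:
            # time_on_site__gt=10 -> {'time_on_site': {'$gt': 10}}
            spec.setdefault(field, {})[OPERATORS[suffix]] = value
        elif sep and suffix == "exact":
            # source__exact='bing' -> {'source': 'bing'}
            spec[field] = value
        else:
            # source='bing' -> {'source': 'bing'}
            spec[key] = value
    return spec


spec = filter_to_spec(
    first_visit_date__lte=datetime(2012, 7, 30, 12, 29, 5, tzinfo=timezone.utc),
    time_on_site__gt=10,
    page_views__gt=2,
)
```

Several conditions on the same field merge into one sub-document, e.g. time_on_site__gt=10, time_on_site__lt=100 gives {'time_on_site': {'$gt': 10, '$lt': 100}}. The sketch does no type coercion, so 10 stays an int rather than the 10.0 seen in the test above.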
Queries and update – 3: examples
# Different query optimizations
qs = UniqueVisit.objects.filter(Q(time_on_site=10) | Q(time_on_site=25) | Q(time_on_site=275))
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'time_on_site': {'$in': [10.0, 25.0, 275.0]}}))

# Multiple or Q expressions
qs = UniqueVisit.objects.filter(Q(time_on_site=10) | Q(time_on_site=25) | Q(time_on_site=275) | Q(source='bing'))
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'$or': [{'time_on_site': 10.0}, {'time_on_site': 25.0},
             {'time_on_site': 275.0}, {'source': 'bing'}]}))

# Negate Q
qs = UniqueVisit.objects.filter(~Q(first_visit_date=ISODate("2012-07-30T12:29:05Z")))
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'first_visit_date': {'$ne': datetime(2012, 7, 30, 12, 29, 5, tzinfo=timezone.utc)}}))
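The optimization in these tests boils down to one rule: when every OR'ed equality tests the same field, a single $in replaces the $or. A minimal sketch of that rule follows; or_to_spec and negate_equality are hypothetical names, and the OR'ed Q objects are assumed to be given as a flat list of (field, value) equality pairs.

```python
def or_to_spec(terms):
    """Translate Q(f1=v1) | Q(f2=v2) | ... given as (field, value) pairs."""
    if len(terms) == 1:
        field, value = terms[0]
        return {field: value}
    fields = {field for field, _ in terms}
    if len(fields) == 1:
        # Every alternative tests the same field: one $in is simpler than $or.
        (field,) = fields
        return {field: {"$in": [value for _, value in terms]}}
    # Different fields: fall back to a general $or of equality sub-documents.
    return {"$or": [{field: value} for field, value in terms]}


def negate_equality(field, value):
    """Translate ~Q(field=value) into a $ne condition."""
    return {field: {"$ne": value}}
```

Note that MongoDB's $ne also matches documents in which the field is missing altogether, which matters for optional data such as demographics.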
Queries – 4: extensions beyond standard Django
# Dot notation
qs = UniqueVisit.objects.filter(location__rg__exact="New York")
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'location.rg': 'New York'}))

# Check key existence
qs = UniqueVisit.objects.filter(demographics__age__exists="true")
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'demographics.age': {'$exists': 'true'}}))

# Variable type
qs = UniqueVisit.objects.filter(landing_page_path__type=int)
self.assertEqual(qs.query.spec, dict(  # pymongo expression
    {'landing_page_path': {'$type': 16}}))
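These extensions follow a simple pattern: the double-underscore path becomes MongoDB dot notation, and a trailing __exists or __type becomes the matching operator. The sketch below is a hypothetical translator (lookup_to_spec and BSON_TYPES are illustrative names, not moma-django's API). The BSON type numbers are MongoDB's; mapping Python types onto them is an assumption of the sketch.

```python
# Subset of BSON type numbers used by MongoDB's $type operator
# (1 = double, 2 = string, 3 = embedded document, 4 = array,
# 8 = boolean, 16 = 32-bit integer). Keying them by Python type is assumed.
BSON_TYPES = {float: 1, str: 2, dict: 3, list: 4, bool: 8, int: 16}

MODIFIERS = ("exact", "exists", "type")


def lookup_to_spec(key, value):
    """Translate one extended lookup (dot notation, __exists, __type) to a spec."""
    parts = key.split("__")
    modifier = parts.pop() if parts[-1] in MODIFIERS else "exact"
    path = ".".join(parts)  # location__rg -> 'location.rg'
    if modifier == "exists":
        return {path: {"$exists": value}}
    if modifier == "type":
        return {path: {"$type": BSON_TYPES[value]}}
    return {path: value}
```

Running the three lookups from the tests above through this function reproduces the same pymongo expressions.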
Queries - by the structure of documents
>>> # How many documents in the DB?
>>> UniqueVisit.objects.all().count()
20
>>> # For how many documents in the DB do we have age information?
>>> UniqueVisit.objects.filter(demographics__age__exists="true").count()
7
>>> # For how many documents in the DB do we have gender information?
>>> UniqueVisit.objects.filter(demographics__gender__exists="true").count()
3
>>> # For how many documents in the DB do we have gender and age information?
>>> UniqueVisit.objects.filter(demographics__age__exists="true", demographics__gender__exists="true").count()
1
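What these counts measure can be emulated over plain dictionaries: a document matches when every requested dotted path is present, whatever its value. This is a hypothetical in-memory sketch of $exists semantics with made-up sample documents, not the 20 documents from the demo database.

```python
_MISSING = object()


def get_path(doc, dotted):
    """Follow a dotted path like 'demographics.age' through nested dicts."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def count_with_keys(docs, *paths):
    """Count documents in which every dotted path exists (AND of $exists: true)."""
    return sum(1 for doc in docs if all(get_path(doc, p) is not _MISSING for p in paths))


# Hypothetical sample documents with incomplete demographics.
docs = [
    {"demographics": {"age": 34, "gender": "f"}},
    {"demographics": {"age": 51}},
    {"demographics": {}},
    {"demographics": {"gender": "m"}},
]
```

With these samples, count_with_keys(docs, "demographics.age", "demographics.gender") counts only the first document, mirroring the last query above. As with MongoDB's $exists, a key holding None still counts as present.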
Manipulating document payloads
import base64

# Store an image: get the image from the "POST" upload form (snippet)
docfile = request.FILES['docfile']
question_id = form.cleaned_data['question_id']
docfile_name = docfile.name
docfile_name_changed = _replace_dots(docfile.name)
question = Question.objects.get(id=question_id)

# Store meta-data
question.docs.update({docfile_name_changed: docfile.content_type})
question.image.update({
    docfile_name_changed + '_url': '/static/display/s_' + docfile_name,
    docfile_name_changed + '_name': docfile_name,
    docfile_name_changed + '_content_type': docfile.content_type})

# Store the actual image binary block (small scale implementation)
file_read = docfile.file.read()  # Note: this is a naïve implementation!
file_data = base64.b64encode(file_read)
question.image.update({docfile_name_changed + '_data': file_data})
question.save()
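The snippet relies on a _replace_dots helper that is not shown. At the time of the talk MongoDB did not accept '.' in field names, and the file name is used as a dictionary key, so the dots have to go. The version below is a hypothetical stand-in (replacing '.' with '_' is an assumption), together with illustrative helpers image_entries and image_bytes that build the same keys as the snippet and recover the original bytes from the stored base64 text.

```python
import base64


def _replace_dots(name, replacement="_"):
    # Hypothetical stand-in for the example app's helper: make the file
    # name safe to use as a MongoDB field name by removing dots.
    return name.replace(".", replacement)


def image_entries(file_name, content_type, raw_bytes):
    """Build the key/value pairs the snippet stores in question.image."""
    key = _replace_dots(file_name)
    return {
        key + "_url": "/static/display/s_" + file_name,
        key + "_name": file_name,
        key + "_content_type": content_type,
        # base64 turns arbitrary bytes into ASCII text that fits a TextField.
        key + "_data": base64.b64encode(raw_bytes).decode("ascii"),
    }


def image_bytes(entries, file_name):
    """Recover the original bytes from the stored base64 text."""
    return base64.b64decode(entries[_replace_dots(file_name) + "_data"])
```

This round trip is why the snippet calls itself naïve: base64 inflates the payload by about a third, and the whole image must fit within MongoDB's 16 MB document limit; GridFS is the usual route for larger files.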
# Model
class Question(MongoModel):
    user = models.ForeignKey(User)
    date = MongoDateTimeField(db_index=True)
    question = models.CharField(max_length=256)

    docs = DictionaryField(models.CharField())
    image = DictionaryField(models.TextField())
    audio = DictionaryField()
    other = DictionaryField()

    vote_ids = ValuesField(models.IntegerField())

    def __unicode__(self):
        return u'%s[%s %s]' % (self.question, self.date, self.user, )

    class Meta:
        unique_together = ['user', 'question',]
Admin interface
So – what’s next?
GitHub: https://github.com/gadio/moma-django
If you want to contribute, please get in touch (forking is also an option)
Contact: gadi.oren.1 at gmail.com or gadi at Cloudoscope.com
Backup
South
Dealing with apps with mixed models: getting South to disregard the Mongo model
# Enabling South for the non-conventional mongo model
from south.modelsinspector import add_introspection_rules

add_introspection_rules(
    [
        (
            (MongoIdField, MongoDateTimeField, DictionaryField),
            [],
            {
                "max_length": ["max_length", {"default": None}],
            },
        ),
    ],
    ["^moma_django.fields.*"])
Unit testing
The database name is defined in settings.py
In a unit test run, a new MongoDB database is created, named after MONGO_COLLECTION prefixed with “test_” (e.g. test_momaexample)
MONGO_HOST = 'localhost'
MONGO_PORT = 27017
MONGO_COLLECTION = 'momaexample'
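moma-django reads these settings itself. The sketch below only illustrates how the three values and the “test_” prefix combine into a standard mongodb://host:port/database connection string. SETTINGS and mongo_uri are hypothetical names that mirror the settings.py values above.

```python
# Hypothetical mirror of the settings.py values shown above.
SETTINGS = {
    "MONGO_HOST": "localhost",
    "MONGO_PORT": 27017,
    "MONGO_COLLECTION": "momaexample",
}


def mongo_uri(settings, testing=False, test_prefix="test_"):
    """Build a MongoDB connection string; test runs use a prefixed database."""
    database = settings["MONGO_COLLECTION"]
    if testing:
        database = test_prefix + database
    return "mongodb://%s:%d/%s" % (settings["MONGO_HOST"], settings["MONGO_PORT"], database)
```

Keeping the test database separate by name means a test run can drop and recreate it freely without touching the development data.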
Moma-django on google…