Date posted: 12-May-2015
Category: Software
Uploaded by: emptysquare
Python Profiling: The Glory & The Guts
A. Jesse Jiryu Davis
@jessejiryudavis
MongoDB
“PyMongo is slower compared to the JavaScript version”
MongoDB Node.js driver: 88,000 per second
PyMongo: 29,000 per second
“Why Is PyMongo Slower?”
From: [email protected]
To: [email protected]
CC: [email protected]

Hi Jesse,

Why is the Node MongoDB driver 3 times faster than PyMongo?
http://dzone.com/articles/mongodb-facts-over-80000
The Python Code
    # Obtain a MongoDB collection.
    import pymongo

    client = pymongo.MongoClient('localhost')
    db = client.random
    collection = db.randomData
    collection.remove()

    n_documents = 80000
    batch_size = 5000
    batch = []

    import time
    start = time.time()
The Python Code
    import random
    from datetime import datetime

    min_date = datetime(2012, 1, 1)
    max_date = datetime(2013, 1, 1)
    delta = (max_date - min_date).total_seconds()
The Python Code
What?!
The Python Code
    for i in range(n_documents):
        date = datetime.fromtimestamp(
            time.mktime(min_date.timetuple())
            + int(round(random.random() * delta)))

        value = random.random()
        document = {
            'created_on': date,
            'value': value}

        batch.append(document)
        if len(batch) == batch_size:
            collection.insert(batch)
            batch = []

    duration = time.time() - start

    print 'inserted %d documents per second' % (
        n_documents / duration)
The Python Code
inserted 30,000 documents per second
The Node.js Code
(not shown)
The Question
Why is the Python script 3 times slower than the equivalent Node script?
Why Profile?
• Optimization is like debugging
• Hypothesis: “The following change will yield a worthwhile improvement.”
• Experiment
• Repeat until fast enough
Why Profile?
Profiling is a way to generate hypotheses.
Which Profiler?
• cProfile
• GreenletProfiler
• Yappi
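As a point of comparison for the first option, here is a minimal sketch of profiling the document-building part of the script with the standard library's cProfile. It is written for Python 3 and deliberately leaves out MongoDB; `make_date` and `make_documents` are hypothetical helper names, not part of the original script.

```python
import cProfile
import io
import pstats
import random
import time
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()


def make_date():
    # Same date expression as the script under investigation.
    return datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))


def make_documents(n):
    # Hypothetical stand-in for the insert loop: builds documents only.
    return [{'created_on': make_date(), 'value': random.random()}
            for _ in range(n)]


profiler = cProfile.Profile()
profiler.enable()
documents = make_documents(10000)
profiler.disable()

# Render the ten most expensive entries, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

The report lists each function with its call count and cumulative time, which is usually enough to see where a single-threaded script spends its time.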
Yappi
By Sümer Cip
Yappi
Compared to cProfile, Yappi:
• Is as fast
• Also measures functions
• Can measure CPU time, not just wall-clock time
• Can measure all threads
• Can export to callgrind
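The difference between CPU time and wall-clock time can be illustrated with the standard library alone. The sketch below (Python 3, no Yappi involved) times one function that waits and one that computes; `measure`, `wait`, and `compute` are names made up for this example.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait():
    # Sleeping consumes wall-clock time but almost no CPU time.
    time.sleep(0.2)


def compute():
    # A busy loop consumes CPU time for nearly all of its wall-clock time.
    total = 0
    for i in range(500000):
        total += i * i
    return total


wait_wall, wait_cpu = measure(wait)
compute_wall, compute_cpu = measure(compute)
print('wait:    wall=%.3fs cpu=%.3fs' % (wait_wall, wait_cpu))
print('compute: wall=%.3fs cpu=%.3fs' % (compute_wall, compute_cpu))
```

A wall-clock profiler would charge `wait` with the full 0.2 seconds, while a CPU-time profiler would report it as nearly free. For a script that talks to a database over a socket, the two clocks can tell quite different stories.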
Yappi
    import yappi

    yappi.set_clock_type('cpu')
    yappi.start(builtins=True)

    start = time.time()

    for i in range(n_documents):
        # ... same code ...

    duration = time.time() - start
    stats = yappi.get_func_stats()
    stats.save('callgrind.out', type='callgrind')
Same code as before
KCacheGrind
    for index in range(n_documents):
        date = datetime.fromtimestamp(
            time.mktime(min_date.timetuple())
            + int(round(random.random() * delta)))

        value = random.random()
        document = {
            'created_on': date,
            'value': value}

        batch.append(document)
        if len(batch) == batch_size:
            collection.insert(batch)
            batch = []
The Python Code
one third of the time
    for index in range(n_documents):
        date = datetime.now()

        value = random.random()
        document = {
            'created_on': date,
            'value': value}

        batch.append(document)
        if len(batch) == batch_size:
            collection.insert(batch)
            batch = []
The Python Code
• Before: 30,000 inserts per second
• After: 50,000 inserts per second
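The cost of the two date expressions can also be checked in isolation, without MongoDB. This is a rough sketch using the standard library's timeit under Python 3; `random_date` and `current_date` are hypothetical wrappers around the two expressions from the slides, and the absolute numbers will vary by machine.

```python
import random
import time
import timeit
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()


def random_date():
    # The original expression: a random offset added to min_date.
    return datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))


def current_date():
    # The replacement expression.
    return datetime.now()


calls = 100000
slow = timeit.timeit(random_date, number=calls)
fast = timeit.timeit(current_date, number=calls)
print('random_date:  %.2f microseconds per call' % (slow / calls * 1e6))
print('current_date: %.2f microseconds per call' % (fast / calls * 1e6))
```

Note that the two expressions are not equivalent: the replacement stamps every document with roughly the current time instead of a random date in the range, so the speedup comes with a change in the generated data.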
Why Profile?
• Generate hypotheses
• Estimate possible improvement
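The estimate can be made concrete with a little arithmetic. If the profile attributes a fraction f of the runtime to one piece of code, removing that code entirely can raise throughput by at most a factor of 1 / (1 - f). The sketch below applies this to the figures from the slides (30,000 inserts per second, one third of the time in date generation); `estimated_rate` is a name invented for the example.

```python
def estimated_rate(current_rate, fraction_removed):
    """Upper bound on throughput if a fraction of the runtime disappears."""
    if not 0 <= fraction_removed < 1:
        raise ValueError('fraction_removed must be in [0, 1)')
    return current_rate / (1.0 - fraction_removed)


before = 30000          # inserts per second, from the slides
fraction = 1.0 / 3.0    # share of time attributed to date generation
estimate = estimated_rate(before, fraction)
print('estimated: {:,.0f} inserts per second'.format(estimate))
```

This comes to about 45,000 inserts per second, in the same range as the 50,000 measured after the change. Both the profile fraction and the throughput figures are approximate, so close agreement is not to be expected.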
How Does Profiling Work?
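    /* Simplified for the slide: CPython's real Py_tracefunc takes a
     * leading PyObject *obj argument, and PyEval_SetProfile takes that
     * object as a second argument. */
    int callback(PyFrameObject *frame,
                 int what,
                 PyObject *arg);

    int start(void)
    {
        PyEval_SetProfile(callback);
        return 0;
    }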
    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *frame)
    {
        if (tstate->c_profilefunc != NULL) {
            tstate->c_profilefunc(frame,
                                  PyTrace_CALL,
                                  Py_None);
        }

        /* ... execute bytecode in the frame
         * until return or exception... */

        if (tstate->c_profilefunc != NULL) {
            tstate->c_profilefunc(frame,
                                  PyTrace_RETURN,
                                  retval);
        }
    }
    int callback(PyFrameObject *frame,
                 int what,
                 PyObject *arg)
    {
        switch (what) {
        case PyTrace_CALL:
            {
                PyCodeObject *cobj = frame->f_code;
                PyObject *filename = cobj->co_filename;
                PyObject *funcname = cobj->co_name;

                /* ... record the function call ... */
            }
            break;

        /* ... other cases ... */

        }
    }
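The same hook is reachable from Python through `sys.setprofile`, which makes the mechanism easy to try without writing C. Below is a minimal sketch (Python 3) of a call-counting profiler built on it. The callback receives the frame, an event name, and an argument, mirroring the C callback above; `make_document` and `build` are hypothetical functions to profile.

```python
import collections
import sys

call_counts = collections.Counter()


def callback(frame, event, arg):
    # 'call' is the Python-level counterpart of PyTrace_CALL.
    if event == 'call':
        code = frame.f_code
        call_counts[(code.co_filename, code.co_name)] += 1


def make_document(i):
    return {'value': i}


def build(n):
    return [make_document(i) for i in range(n)]


sys.setprofile(callback)
try:
    build(100)
finally:
    # Always uninstall the hook, even if the profiled code raises.
    sys.setprofile(None)

# Collapse the (filename, funcname) keys to function names for display.
by_name = collections.Counter()
for (filename, funcname), count in call_counts.items():
    by_name[funcname] += count
print(by_name['make_document'], by_name['build'])
```

This prints `100 1`. A real profiler would also handle the return events to accumulate time per function, and a callback written in Python adds far more overhead than one written in C.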