Post on 05-Dec-2014
High Performance Web Applications
with Python and TurboGears
Alessandro Molina - @__amol__ - amol@turbogears.org
About me
● Python Developer since 2001
● Work @ on Python and iOS
● TurboGears development team member since 2.1
What's this talk about?
● Some general rules which apply to any framework
● Some quick-wins for TurboGears
● Some real cases reported
● My personal experiences and preferences, feel free to disagree!
Raw Speed
● People seem obsessed with webservers
● The truth is that it doesn't matter so much
○ You are not going to serve "Hello World"s
(If you are asking, my personal stack is nginx+mod_wsgi or nginx+Circus-Chaussette-gevent)
● Avoid the great idea of serving mostly empty pages performing hundreds of ajax requests
○ Browsers have limited concurrency
○ HTTP has overhead
○ You will actually slow things down
● Learn your framework for real
About TurboGears
● Framework for rapid development encouraging flexibility
● Created in 2005, 2.0 was a major rewrite in 2009 to embrace the WSGI standard.
● Object Dispatch based. Regular expressions can get messy, write them only when you must.
● By default an XML template engine with error detection
● Declarative Models with transactional unit of work
● Built in Validation, Authentication, Authorization, Caching, Sessions, Migrations, MongoDB Support and many more.
Looking at the code
class RootController(BaseController):
@expose('myproj.templates.movie')
@expose('json')
@validate({'movie': SQLAEntityConverter(model.Movie)})
def movie(self, movie, **kw):
    return dict(movie=movie,
                user=request.identity and request.identity['user'])
Serving /movie/3 as a webpage and /movie/3.json as a json encoded response
What it looks like
Features vs Speed
● TurboGears is a full-stack framework. That makes it quite slow by default!
● The team has invested effort in constantly speeding it up since the 2.1 release.
● Still keeping all the features around has its price
● To cope with this, a minimal mode was introduced
Use only what you need
● Only use what you really need. Disabling some features can make a big difference:
○ full featured -> ~900 req/sec
○ browser language detection -> ~1000 req/sec
○ widgets support -> ~1200 req/sec
○ sessions -> ~1300 req/sec
○ transaction manager -> ~1400 req/sec
○ minimal mode -> ~2100 req/sec
Measures are on wsgiref; the purpose is only to show the delta
Avoid serving statics
Cascading file serving is a common pattern:
static_app = StaticURLParser('statics')
app = Cascade([static_app, app])
What is really happening is actually a lot of work:
○ path gets parsed to avoid ../../../etc/passwd
○ path gets checked on file system
○ a 404 response is generated
○ the 404 response is caught by the Cascade middleware that forwards the request to your app
Using Caching
● Caching means preorganizing your data the way you are going to use it; if you already did that during the design phase you are already halfway done. Let the template drive your data, not the opposite.
● Frameworks usually provide various types of caching. TurboGears specifically provides:
○ @cached_property
○ tg.cache object for custom caching
○ @beaker_cache for controllers caching
○ Template Caching
○ Entity Caching
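As a dependency-free sketch of the get-or-create pattern behind the tg.cache object (which wraps Beaker): the SimpleCache class, the expensive_lookup helper and the key names below are all illustrative stand-ins, not TurboGears API — in a real controller you would ask tg.cache for a namespace and pass a createfunc.

```python
import time

class SimpleCache:
    """Minimal stand-in for a Beaker cache namespace with an expire time."""
    def __init__(self, expire):
        self.expire = expire
        self._store = {}  # key -> (timestamp, value)

    def get(self, key, createfunc):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[0] < self.expire:
            return entry[1]          # cache hit
        value = createfunc()         # runs only on miss or expiry
        self._store[key] = (time.time(), value)
        return value

calls = []

def expensive_lookup(movie_id):
    calls.append(movie_id)           # track how often the slow path runs
    return {'id': movie_id, 'title': 'Movie %s' % movie_id}

movies = SimpleCache(expire=3600)
first = movies.get('3', lambda: expensive_lookup('3'))
second = movies.get('3', lambda: expensive_lookup('3'))  # served from cache
```

The point is the shape of the API: the caller never checks for staleness itself, it always asks the cache and supplies the recipe for a miss.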
Use HTML5 & JS
● If only small portions of your page change, cache the page and use JS to perform minor changes.
○ Invalidating your whole cache to say "Welcome back Mister X" is not a great idea.
● If you are using Varnish, nginx or any other frontend cache, consider using JS+localStorage instead of cookies for trivial customizations: cookies will skip frontend caching.
Template Caching
● Template caching means prerendering your template based on controller computation results.
● It's common for templates to access related entities; those will be cached for you.
● If correctly organized it's the caching behavior with the best trade-off
○ Simple to implement
○ Guarantees correctly updated results
● An updated_at field on models is often all you need
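As a sketch of that idea — the model and field names here are illustrative, and in a real SQLAlchemy model you would typically declare updated_at with onupdate so it refreshes on every edit:

```python
from datetime import datetime

class WikiPage(object):
    """Illustrative stand-in for a mapped entity with an updated_at column."""
    def __init__(self, uid):
        self.uid = uid
        self.updated_at = datetime.utcnow()

def cache_key(page):
    # Any edit bumps updated_at, so stale cache entries are simply never
    # looked up again -- no explicit invalidation needed.
    return '%s-%s' % (page.uid, page.updated_at.strftime('%Y%m%d%H%M%S'))

page = WikiPage(uid=42)
key = cache_key(page)
```

Old entries just age out of the cache backend; the key scheme replaces invalidation logic entirely.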
WikiPage Caching
● WikiPage caching is the standard template caching example in TurboGears documentation
@expose('wikir.templates.page')
@validate({'page': SQLAEntityConverter(model.WikiPage, slugified=True)},
          error_handler=fail_with(404))
def _default(self, page, *args, **kw):
    cache_key = '%s-%s' % (page.uid,
                           page.updated_at.strftime('%Y%m%d%H%M%S'))
    return dict(wikipage=page,
                tg_cache={'key': cache_key,
                          'expire': 24*3600,
                          'type': 'memory'})
Caching Partials
● Case study: Notifications
○ Page delivered in 2.16s
○ Query took only 2ms
○ Most of the work was actually in rendering each notification
● Caching was useless as notifications happened often, constantly changing content.
Entity Caching
Map each object to a partial: the @entitycached decorator makes it easy to cache each notification by itself.
from tgext.datahelpers.caching import entitycached
@entitycached('notification')
def render_post(notification):
    return Markup(notification.as_html)
● Page with cached notifications is now delivered in 158ms
● A single notification can be cached for days; it will never change.
Caching can be harmful
If your content changes too often, caching on first request can actually be harmful.
If you have multiple processes and a lot of requests you can end up having a race condition on cache update.
Cache Stampede
● During football matches there were thousands of users constantly pressing the "refresh" button to reload the page.
● Content constantly changed due to the match being reported in real time.
● After each update, all the running processes decided at the same time that the cache was not valid anymore, and started regenerating it.
● Sometimes the content changed even while processes were still updating the cache for the previous update.
Proactive Update
● To solve the cache stampede, cache generation was bound to a hook on article update, so it happened only once.
● Easy to achieve using template caching together with tg.template_render, with the article id as cache key
● SQLAlchemy's @event.listens_for supports even notifications on relationships, so it's easy to update the page cache even when related comments, tags, and so on change.
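A minimal, dependency-free sketch of the proactive-update idea — in the real setup the hook is registered with @event.listens_for on the SQLAlchemy article model and rendering goes through tg.template_render; both are stubbed out here:

```python
page_cache = {}

def render_article(article):
    # Stand-in for tg.template_render(...) on the article template.
    return '<h1>%s</h1>' % article['title']

def on_article_update(article):
    # Runs exactly once, in the process performing the update, so concurrent
    # readers never race to regenerate the same entry (no cache stampede).
    page_cache[article['id']] = render_article(article)

article = {'id': 7, 'title': 'Match Report'}
on_article_update(article)

article['title'] = 'Match Report (updated)'
on_article_update(article)  # readers keep hitting the cache meanwhile
```

Readers only ever look up page_cache; the write path is the single owner of regeneration.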
A real solution
● The source of the issue was users pressing the "reload" button like there's no tomorrow.
● The solution has been to push updates to the users through a box that updates in real time.
○ No more insane reloads
○ Users were actually more satisfied
○ Was a lot easier to maintain
○ Not only solved the match article issue but also reduced the load on other parts of the website
Real-Time Box
Think Different
● If you are struggling too much to improve performance, you are probably doing something your application is not meant to do.
● Lessons learnt?
○ Soccer fans are eager for updates (no... for real!)
○ There is only one thing that gets more visits than a football match: rumors on football player transfers
Offload Work
● The only person who knows that something changed is the author of the change itself.
○ Only update the core cache to provide the author with immediate feedback
○ Don't be afraid of updating related caches asynchronously. The author usually understands that it might take some time before their changes propagate, and other users don't know that a change happened yet.
● You can often provide an answer to the user with little instant computation; messages and notifications are a typical example.
Master-Slave replication is easy
● SQLAlchemy's unit of work pattern makes it easy for frameworks to do the right thing 90% of the time
○ Read from slaves unless we are flushing the session
● Won't require changes to your code for most common cases
● Exceptions are as easy as @with_engine('master')
● As easy as
sqlalchemy.master.url = mysql://masterhost/dbname
sqlalchemy.slaves.slave1.url = mysql://slavehost/dbname
sqlalchemy.slaves.slave2.url = mysql://slavehost/dbname
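A toy sketch of the routing decision behind that configuration — the real dispatch lives inside the framework's session handling; this only illustrates the decision logic, and the names are illustrative:

```python
import random

ENGINES = {'master': 'mysql://masterhost/dbname',
           'slaves': ['mysql://slavehost1/dbname',
                      'mysql://slavehost2/dbname']}

def pick_engine(is_flushing, forced=None):
    # forced mimics a @with_engine('master')-style override on a controller.
    if forced == 'master':
        return ENGINES['master']
    if is_flushing:
        # Writes (session flushes) must always go to the master.
        return ENGINES['master']
    # Plain reads are spread across the slaves.
    return random.choice(ENGINES['slaves'])
```

Because the session knows whether it is flushing, application code rarely has to choose an engine explicitly.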
Fast enough
● Speed should not be your primary focus, but it makes sense to care a lot about it, users tend to get frustrated by slow responses.
● New Relic App Speed Index reports an average of 5.0 seconds of response time for accepted experience.
● That is end-user time, not request time: to achieve 5 seconds you have to aim a lot lower
● Mean Opinion Score degrades quickly when surpassing 200ms. Less than 200ms is perceived as "right now".
http://newrelic.com/ASI
Development Tools
● It's easy to introduce changes with a heavy impact on performance without noticing. Development tools can help keep the impact of changes under control.
● The DebugBar provides core utilities to track your application speed while developing:
○ Controller Profiling
○ Template & Partials Timing
○ Query Reporting
○ Query Logging for AJAX/JSON requests
Profiling
Keep an eye on your queries
Check even after release
● Users use your application more widely than you might have expected
● Fast now doesn't mean fast forever. Just like unit tests avoid breaking things, rely on speed feedback to keep an acceptable speed.
● Keep your Apdex T index updated, user expectations evolve!
There is no silver bullet
● Sorry, there is no silver bullet.
● Every application is a separate case, general and framework optimizations can usually provide little benefit when compared to domain specific optimizations
● Understanding how users interact with your application is the golden rule of optimization
● Don't underestimate how easy it is to do something really slow unconsciously: development tools can help catch those cases.