Caching techniques in python, europython2010

Date post: 12-May-2015
Upload: michael-domanski
Slides from the EuroPython 2010 conference in Birmingham on the subject of caching in Python.
Page 1: Caching techniques in python, europython2010

Caching techniques in Python

Michael Domanski, EuroPython 2010

Thursday, 22 July 2010


who I am

• Python developer, working professionally for a few years now

• also experienced in C and Objective-C

• currently working for 10clouds.com


Interesting intro

• a bit of theory

• common patterns

• common problems

• common solutions


How I think about cache

• imagine a giant dict storing all your data

• you have to manage all data manually

• or provide some automated behaviour


similar to....

• manual memory management in C

• cache is memory

• and you have to control it manually


profits

• improved performance

• ...?


problems

• managing any type of memory is hard

• automation often has to be custom-built each time


common patterns


memoization


• very old pattern (circa 1968)

• we owe the name to Donald Michie


how it works

• we associate input with output and store it somewhere

• based on the assumption that for a given input, output is always the same
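
As a minimal sketch of the idea above, the decorator below stores results in a plain dict keyed by the call arguments. The names memoize and slow_square are made up for illustration and do not come from the slides, and the sketch assumes every argument is hashable.

```python
import functools

def memoize(func):
    """Cache results of func, keyed by its (hashable) arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in results:
            results[key] = func(*args, **kwargs)
        return results[key]

    wrapper.results = results  # exposed only so the cache can be inspected
    return wrapper

calls = []

@memoize
def slow_square(x):
    calls.append(x)  # record every real computation
    return x * x

slow_square(4)
slow_square(4)  # served from the dict, the function body is not run again
```

This only holds as long as the assumption on the slide holds: the same input must always produce the same output.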


code example

CACHE_DICT = {}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            # compute and store the value only on the first call
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


what if output can change?

• our pattern is still useful

• we simply need to add something


cache invalidation


There are only two hard problems in Computer Science: cache invalidation and naming things

Phil Karlton


• basically, we update data in cache

• we need to know when and what to change

• the more granular you want to be, the harder it gets


code example

def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)


common problems


invalidating too much/not enough

• flushing all data any time something changes

• not flushing cache at all

• tragic effects


# db_get and db_set stand for database helpers that are not shown on the slides

@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()

    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
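
The slide code above relies on db_get and db_set, which are not shown. Below is a self-contained sketch of the same failure, assuming a plain dict called FAKE_DB as a stand-in for the database (a hypothetical name, as are the stored values).

```python
FAKE_DB = {1: 'old value'}   # hypothetical stand-in for a real database
CACHE_DICT = {}

def db_get(id):
    return FAKE_DB[id]

def db_set(id, value):
    FAKE_DB[id] = value

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

@cached('key1')
def simple_function1():
    return db_get(id=1)

first = simple_function1()          # fills the cache with 'old value'
db_set(1, 'something totally new')  # a write that forgets to invalidate
stale = simple_function1()          # still the cached 'old value'

del CACHE_DICT['key1']              # invalidate the affected key
fresh = simple_function1()          # recomputed from the database
```

The stale read goes unnoticed until something compares the cached value with the database, which is why this class of bug is easy to miss.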


invalidating too soon/too late

• your cache has to be synchronised with your db

• sometimes very hard to spot

• leads to tragic mistakes


@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()


superposition of dependencies

• a somewhat less obvious problem

• eventually you will start caching the results of computations

• you have to know very precisely what your data depends on


@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3)
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
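
One possible way to avoid forgetting a key, sketched here on the assumption that dependencies are declared by hand, is to record which cached keys are built from which others and to invalidate them together. DEPENDENTS and invalidate_with_dependents are hypothetical names and are not part of the talk.

```python
# Cache contents as they might look after the slide code has run once.
CACHE_DICT = {'key1': 'a', 'key2': 'b', 'key': {'1': 'a', '2': 'b'}}

# Which cached keys are computed from which other cached keys.
DEPENDENTS = {
    'key1': ['key'],
    'key2': ['key'],
}

def invalidate_with_dependents(key):
    """Drop key and, recursively, everything computed from it."""
    CACHE_DICT.pop(key, None)
    for dependent in DEPENDENTS.get(key, []):
        invalidate_with_dependents(dependent)

# After db_set(1, ...), invalidating 'key1' also drops the aggregate 'key'.
invalidate_with_dependents('key1')
```

The map itself still has to be kept correct by hand, so this moves the problem to one place rather than removing it.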


summing up

• know your data....

• be aware of what you cache and when

• take care when using cached data in computation


common solutions


process level cache


why?

• very fast access

• simple to implement

• very effective as long as you’re using a single process
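
In current Python, the standard library's functools.lru_cache offers this kind of in-process cache without a hand-written dict. A short sketch follows; the function load_setting is a made-up example. Each process keeps its own copy of the cache, which is the single-process limitation mentioned above.

```python
import functools

calls = []

@functools.lru_cache(maxsize=128)
def load_setting(name):
    calls.append(name)          # stands in for an expensive lookup
    return name.upper()

load_setting('debug')
load_setting('debug')           # second call is answered from the cache
info = load_setting.cache_info()

load_setting.cache_clear()      # invalidate everything held by this function
load_setting('debug')           # computed again after the clear
```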


clever tricks with dicts


code example

CACHE_DICT = {}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper


invalidation


code example

def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        print("someone tried to invalidate a key that is not present: %s" % key)


application level cache


memcache


• battle tested

• scales

• fast

• supports a few cool features

• behaves a lot like dict

• supports time-based expiration
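
To make the last two points concrete, here is an in-process sketch that mimics get/set with an expiry time. It is not memcached and does not talk to a server. The class ExpiringDict and its now parameter are assumptions, added so that the example runs on its own and can be checked without waiting.

```python
import time

class ExpiringDict:
    """Dict-like store whose entries can expire after a number of seconds."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry time or None)

    def set(self, key, value, exp_time=0, now=None):
        now = time.time() if now is None else now
        expires_at = now + exp_time if exp_time else None  # 0 means "never"
        self._data[key] = (value, expires_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            del self._data[key]      # expired: behave as if it was never set
            return None
        return value

store = ExpiringDict()
store.set('greeting', 'hello', exp_time=30, now=1000)
before = store.get('greeting', now=1010)   # within the 30 seconds
after = store.get('greeting', now=1031)    # past the expiry time
```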


libraries?

• python-memcache

• python-libmemcache

• python-cmemcache

• pylibmc


why no benchmarks

• not the point of this talk :)

• benchmarks are generic, caching is specific

• pick your flavour, think for yourself


code example

import memcache

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # get() returns None on a miss; "if not value" would also
            # recompute legitimate falsy results such as 0 or ''
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper


invalidation


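code example

def mem_invalidate(key):
    # the slide stored None under the key; deleting the key has the same
    # effect on the next get() and does not keep a dead entry around
    cache.delete(str(key))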


batch key management


• what if I don’t want to expire each key manually?

• that’s a lot to remember

• and we have to be careful :(


groups?

• group keys into sets

• which are tied to one key per set

• expire one key, instead of twenty


how to get there?

• store some extra data

• you can store dicts in cache

• and cache behaves like dict

• so it’s a case of comparing keys and values


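# wrapped in a function so that the return statements are valid;
# the name grouped_get is added here and is not on the slide
def grouped_get(key='some_key', group='some_group'):
    # we start with the specified key and group

    # now retrieve some data from memcached (get_multi takes a list of keys)
    data = memcached_client.get_multi([key, group])
    # now data is a dict that should look like
    # {'some_key': {'group_key': '1234',
    #               'value': 'some_value'},
    #  'some_group': '1234'}
    #
    if data and (key in data) and (group in data):
        if data[key]['group_key'] == data[group]:
            return data[key]['value']
    return None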


# cache is the memcache client; tools.make_key and make_group_value are
# helpers that are not shown on the slides
def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key)] + [tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() avoids a KeyError when the group key is missing
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same key transformation that set() uses below
                value = cache.get(tools.make_key(key))
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:
                    # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({
                        tools.make_key(key): data_dict,
                        tools.make_key(group_key): group_value
                    })
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
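
The grouped decorator above depends on cache, tools.make_key and make_group_value, none of which appear on the slides. The sketch below reproduces only the core idea with a plain dict: every entry remembers the group value it was stored under, and changing the group value makes all of them stale at once. The function names and the use of uuid for group values are assumptions.

```python
import uuid

cache = {}  # plain dict standing in for a memcached client

def current_group_value(group):
    """Return the group's version token, creating one if needed."""
    if group not in cache:
        cache[group] = uuid.uuid4().hex
    return cache[group]

def store_in_group(key, value, group):
    cache[key] = {'value': value, 'group_value': current_group_value(group)}

def fetch_from_group(key, group):
    entry = cache.get(key)
    # A hit only counts if the entry was stored under the current token.
    if entry is not None and entry['group_value'] == cache.get(group):
        return entry['value']
    return None

def expire_group(group):
    """Invalidate every key in the group by replacing its token."""
    cache[group] = uuid.uuid4().hex

store_in_group('user:1', 'alice', 'users')
store_in_group('user:2', 'bob', 'users')
hit = fetch_from_group('user:1', 'users')

expire_group('users')                       # one write instead of two deletes
miss1 = fetch_from_group('user:1', 'users')
miss2 = fetch_from_group('user:2', 'users')
```

Stale entries are never deleted explicitly in this scheme. They simply stop matching the group token and are overwritten the next time the value is stored.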


questions?


follow me

twitter: mdomans

blog: blog.mdomans.com
