Date posted: 15-Jun-2015
Uploaded by: piotr-przymus
Introduction Basic stuff Notes on memory model Memory profiling tools Summary References
Everything You Always Wanted to Know About Memory in Python But Were Afraid to Ask
Piotr Przymus
Nicolaus Copernicus University
EuroPython 2014, Berlin
P. Przymus 1/31
About Me
Piotr Przymus
PhD student / Research Assistant at Nicolaus Copernicus University.
Interests: databases, GPGPU computing, data mining.
8 years of Python experience.
Some of my Python projects:
Parts of the trading platform at turbineam.com (backtesting, trading algorithms).
Mussel bio-monitoring analysis and data mining software.
Simulator of a heterogeneous processing environment for the evaluation of database query scheduling algorithms.
Size of objects
Table: Size of different types in bytes
Type | 32 bit | 64 bit | Variable part
int (py-2.7) | 12 | 24 | -
long (py-2.7) / int (py-3.3) | 14 | 30 | + 2 · number of digits
float | 16 | 24 | -
complex | 24 | 32 | -
str (py-2.7) / bytes (py-3.3) | 24 | 40 | + 2 · length
unicode (py-2.7) / str (py-3.3) | 28 | 52 | + (2 or 4) · length
tuple | 24 | 64 | + 4 · length (32 bit), + 8 · length (64 bit)
DIY – check size of objects
sys.getsizeof(obj)
From the documentation (since Python 2.6):
Return the size of an object in bytes. The object can be any type of object.
All built-in objects will return correct results.
This may not hold for third-party extensions, as it is implementation specific.
Calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
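A short sketch of what this means in practice (Python 3): for a container, `sys.getsizeof` reports only the container itself, not the objects it references. `deep_sizeof` below is a hypothetical helper, not part of the standard library; it walks built-in containers recursively and counts each object once, so it is only an approximation for custom classes.

```python
import sys

def deep_sizeof(obj, seen=None):
    """Hypothetical helper: size of obj plus everything reachable through
    built-in containers, counting each object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_sizeof(key, seen) + deep_sizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size

if __name__ == "__main__":
    data = ["foo %d" % i for i in range(8)]
    print(sys.getsizeof(data))  # the list object only (header + 8 pointers)
    print(deep_sizeof(data))    # the list plus the 8 strings it references
```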
Object interning – a fun example
a = [i % 257 for i in xrange(2**20)]
Listing 1: List of interned integers
b = [1024 + i % 257 for i in xrange(2**20)]
Listing 2: List of integers
Is there any difference in allocation between Listing 1 and Listing 2?
Results measured using psutil:
Listing 1 (resident=15.1M, virtual=2.3G)
Listing 2 (resident=39.5M, virtual=2.4G)
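The same comparison can be sketched in Python 3 with the standard-library `tracemalloc` module instead of psutil (a swapped-in tool, so absolute numbers will differ from the slide). It reports the bytes still allocated while each list is alive. The function names are illustrative, and the difference relies on CPython's small-int cache.

```python
import tracemalloc

def bytes_after(build, n):
    """Bytes still allocated (as seen by tracemalloc) while build(n)'s result is alive."""
    tracemalloc.start()
    try:
        data = build(n)  # keep the list alive while measuring
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current

def interned_ints(n):
    # i % 257 is always in [0, 256], inside CPython's small-int cache,
    # so only the list's pointer array is allocated.
    return [i % 257 for i in range(n)]

def fresh_ints(n):
    # 1024 + i % 257 is outside the cache: one new int object per element.
    return [1024 + i % 257 for i in range(n)]

if __name__ == "__main__":
    n = 2**20
    print("Listing 1:", bytes_after(interned_ints, n) // 2**20, "MiB")
    print("Listing 2:", bytes_after(fresh_ints, n) // 2**20, "MiB")
```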
Object interning – explained
Objects and variables: the general rule
Objects are allocated on assignment (more precisely, when the expression being assigned creates a new object).
Variables just point to objects (i.e. they do not hold the memory).
Interning of objects
This is an exception to the general rule.
It is Python implementation specific (examples are from CPython).
"Often" used objects are preallocated and shared instead of being costly new allocations.
This is mainly a performance optimization.
>>> a = 0
>>> b = 0
>>> a is b, a == b
(True, True)
Listing 5: Interning of objects
>>> a = 1024
>>> b = 1024
>>> a is b, a == b
(False, True)
Listing 6: Object allocation (typed line by line in the interactive interpreter)
Object interning – behind the scenes
Warning
This is Python implementation dependent.
This may change in the future.
This is not documented for the reasons above.
For reference, consult the source code.
CPython 2.7 - 3.4
Single instances for:
int – in range [−5, 257)
str / unicode – empty string and all length=1 strings
unicode / str – empty string and all length=1 strings for Latin-1
tuple – empty tuple
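A quick way to probe these singletons on a running interpreter (a Python 3 sketch, CPython-specific by design). Values are computed at run time through the hypothetical helper `runtime_int`, so the compiler cannot make two occurrences share one constant. On CPython you should see `True` for -5, 0 and 256 and `False` for -6, 257 and 1024; other implementations may behave differently.

```python
def runtime_int(n):
    # Compute the value at run time so two calls cannot share a compile-time constant.
    return (n + 1) - 1

def is_cached_int(n):
    """True if two independently computed copies of n are one object (CPython detail)."""
    return runtime_int(n) is runtime_int(n)

if __name__ == "__main__":
    for n in (-6, -5, 0, 256, 257, 1024):
        print(n, is_cached_int(n))
    print("empty tuple shared:", tuple([]) is tuple([]))
    print("Latin-1 char shared:", chr(97) is chr(97))
    print("non-Latin-1 char shared:", chr(955) is chr(955))
```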
String interning – example
>>> a, b = "strin", "string"
>>> a + 'g' is b  # returns False
>>> intern(a + 'g') is intern(b)  # returns True
>>> a = ["spam %d" % (i % 257)
...      for i in xrange(2**20)]
>>> # memory usage (resident=57.6M, virtual=2.4G)
>>> a = [intern("spam %d" % (i % 257))
...      for i in xrange(2**20)]
>>> # memory usage (resident=14.9M, virtual=2.3G)
Listing 7: String interning
String interning – explained
String interning definition
String interning is a method of storing only one copy of each distinct string value, which must be immutable.
intern (py-2.x) / sys.intern (py-3.x)
From the CPython documentation:
Enter string in the table of "interned" strings.
Return the interned string (string or string copy).
Useful to gain a little performance on dictionary lookup (key comparisons after hashing can be done by a pointer compare instead of a string compare).
Names used in programs are automatically interned.
Dictionaries used to hold module, class or instance attributes have interned keys.
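A Python 3 version of the interning example, measured with `tracemalloc` rather than psutil (an assumption; numbers will not match the slide). `build` and `traced_bytes` are illustrative names; `str` serves as the no-op alternative because `str()` of a `str` returns the same object.

```python
import sys
import tracemalloc

def build(n, intern=False):
    # "spam %d" % k builds a new string object every time, even for equal values;
    # sys.intern() swaps each one for the single shared copy in the intern table.
    make = sys.intern if intern else str
    return [make("spam %d" % (i % 257)) for i in range(n)]

def traced_bytes(n, intern):
    tracemalloc.start()
    try:
        data = build(n, intern)  # keep the list alive while measuring
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current

if __name__ == "__main__":
    a, b = "strin", "string"
    print(a + "g" is b)                          # False: a new string object
    print(sys.intern(a + "g") is sys.intern(b))  # True: both map to one copy
    n = 2**20
    print("plain:   ", traced_bytes(n, intern=False))
    print("interned:", traced_bytes(n, intern=True))
```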
Mutable Containers Memory Allocation Strategy
Plan for growth and shrinkage:
Slightly overallocate the memory needed by a container.
Leave room to grow.
Shrink when an overallocation threshold is reached.
Reduce the number of expensive function calls:
realloc()
memcpy()
Use an optimal layout.
Lists, sets, dictionaries
List allocation – example
Figure: List growth example
List allocation strategy
Represented as a fixed-length array of pointers.
Overallocation for list growth (by append):
List capacity growth: 4, 8, 16, 25, 35, 46, ...
For large lists, the overallocation is less than 12.5%.
Due to the memory operations involved:
operations at the end of the list are cheap (rare realloc),
operations in the middle or at the beginning require a memory copy or shift!
Note that for lists of 1, 2 or 5 elements, space is wasted.
List allocation size:
32 bits – 32 + (4 * length)
64 bits – 72 + (8 * length)
A list shrinks only when its size drops below 1/2 of the allocated space.
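The growth sequence can be reproduced from the CPython 2.7 `list_resize()` rule (new capacity = n + n/8 + 3, or + 6 once n reaches 9), and the running interpreter's actual capacities can be inferred from `sys.getsizeof`. Newer CPython releases tweaked the formula, so the observed steps may differ from the 2.7 sequence; the function names are hypothetical.

```python
import struct
import sys

PTR = struct.calcsize("P")  # pointer size: 4 on 32-bit, 8 on 64-bit builds

def py27_growth(max_size):
    """Capacities produced by the CPython 2.7 list_resize() formula while appending."""
    allocated, caps = 0, []
    for newsize in range(1, max_size + 1):
        if newsize > allocated:  # no spare slot left: reallocate
            allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            caps.append(allocated)
    return caps

def observed_growth(max_size):
    """(length, capacity) pairs at each reallocation of a list in this interpreter."""
    lst, steps, last_cap = [], [], 0
    empty = sys.getsizeof([])  # header size of a list with no allocated slots
    for _ in range(max_size):
        lst.append(None)
        cap = (sys.getsizeof(lst) - empty) // PTR
        if cap != last_cap:
            steps.append((len(lst), cap))
            last_cap = cap
    return steps

if __name__ == "__main__":
    print(py27_growth(46))  # [4, 8, 16, 25, 35, 46]
    print(observed_growth(100))
```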
Overallocation of dictionaries/sets
Represented as fixed-length hash tables.
Overallocation for dicts/sets happens when 2/3 of the capacity is reached:
if the number of elements is at most 50000: quadruple the capacity
else: double the capacity
// dict growth strategy
(mp->ma_used > 50000 ? 2 : 4) * mp->ma_used;
// set growth strategy
so->used > 50000 ? so->used * 2 : so->used * 4;
Dict/set growth/shrink code (the new size is the smallest power of two greater than the requested minimum):
for (newsize = PyDict_MINSIZE;
     newsize <= minused && newsize > 0;
     newsize <<= 1);
Shrinkage happens if the dictionary/set fill (real and dummy elements) is much larger than the number of used elements (real elements), i.e. a lot of keys have been deleted.
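A small Python model of the dict growth rule described above, assuming the CPython 2.7 constants (`PyDict_MINSIZE` = 8, resize once fill reaches 2/3). It ignores deletions, so fill equals used. Python 3 dictionaries use different growth rules, so this illustrates the 2.7 strategy, not the interpreter you are running; the function names are hypothetical.

```python
PYDICT_MINSIZE = 8  # smallest table size in CPython 2.7

def next_table_size(used):
    """Table size chosen on resize: the smallest power of two above the target."""
    minused = (2 if used > 50000 else 4) * used  # quadruple, or double for big dicts
    newsize = PYDICT_MINSIZE
    while newsize <= minused:  # the C loop additionally stops on overflow (newsize > 0)
        newsize <<= 1
    return newsize

def simulate_table_sizes(n_inserts):
    """Table sizes while inserting n distinct keys (no deletions, so fill == used)."""
    size = PYDICT_MINSIZE
    sizes = [size]
    for used in range(1, n_inserts + 1):
        if used * 3 >= size * 2:  # 2/3 of the table is filled
            size = next_table_size(used)
            sizes.append(size)
    return sizes

if __name__ == "__main__":
    print(simulate_table_sizes(100))  # [8, 32, 128, 512]
```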
Various data representations
# Schematic: "..." stands for the remaining fields/values
# Fields: field1, field2, field3, ..., field8
# Data: "foo 1", "foo 2", "foo 3", ..., "foo 8"
class OldStyleClass:  # only py-2.x
    ...
class NewStyleClass(object):  # default for py-3.x
    ...
class NewStyleClassSlots(object):
    __slots__ = ('field1', 'field2', ...)
    ...
import collections as c
NamedTuple = c.namedtuple('nt', ['field1', ...])

TupleData = ('value1', 'value2', ...)
ListaData = ['value1', 'value2', ...]
DictData = {'field1': 'value1', 'field2': 'value2', ...}
Listing 8: Various data representations
Various data representation – allocated memory
Figure: Allocated memory after creating 100000 objects with 8 fields each, per representation (OldStyleClass, NewStyleClass, DictData, NamedTuple, TupleData, ListaData, NewStyleClassWithSlots), Python 2.x vs Python 3.x, axis 0 to 150 MB.
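The experiment behind the figure can be approximated on Python 3 with `tracemalloc`. This is a sketch with assumptions: old-style classes do not exist in Python 3, the helper `_InitFromValues`, `measure` and the `FACTORIES` table are hypothetical, and all objects share the same eight value strings (allocated before tracing starts), so only the per-object container overhead is counted.

```python
import collections
import tracemalloc

FIELDS = tuple("field%d" % i for i in range(1, 9))

class _InitFromValues:
    __slots__ = ()  # keep the base from adding a per-instance __dict__

    def __init__(self, *values):
        for name, value in zip(FIELDS, values):
            setattr(self, name, value)

class NewStyleClass(_InitFromValues):
    pass  # no __slots__ here, so instances get a per-instance __dict__

class NewStyleClassSlots(_InitFromValues):
    __slots__ = FIELDS  # fixed attribute slots, no __dict__

NamedTuple = collections.namedtuple("nt", FIELDS)

FACTORIES = {
    "NewStyleClass": NewStyleClass,
    "NewStyleClassSlots": NewStyleClassSlots,
    "NamedTuple": NamedTuple,
    "TupleData": lambda *v: tuple(v),
    "ListaData": lambda *v: list(v),
    "DictData": lambda *v: dict(zip(FIELDS, v)),
}

def measure(factory, count=100000):
    """Bytes traced while count objects built by factory are alive."""
    values = ["foo %d" % i for i in range(1, 9)]  # shared, allocated before tracing
    tracemalloc.start()
    try:
        objects = [factory(*values) for _ in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current

if __name__ == "__main__":
    for name, factory in FACTORIES.items():
        print("%-20s %6.1f MB" % (name, measure(factory) / 2**20))
```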
Notes on garbage collector, reference count and cycles
Python garbage collector
Uses reference counting.
Offers cycle detection.
Objects are garbage-collected when their reference count drops to 0.
Reference increment, e.g.: object creation, additional aliases, being passed to a function.
Reference decrement, e.g.: a local reference goes out of scope, an alias is destroyed, an alias is reassigned.
Warning (from the Python 2.x documentation)
Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable!
Python doesn't collect such cycles automatically.
It is not possible for Python to guess a safe order in which to run the __del__() methods.
(Since Python 3.4, PEP 442 allows the collector to finalize and free such cycles.)
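Both mechanisms can be observed from Python 3 with `sys.getrefcount`, `weakref` and `gc`. The sketch is illustrative (`Node`, `refcount_demo` and `cycle_demo` are hypothetical names): it checks that an alias adds exactly one reference and that a two-node cycle whose objects define `__del__` stays alive until `gc.collect()` runs. On Python 3.4+ (PEP 442) such a cycle is collected and both finalizers run; on Python 2.x it would end up in `gc.garbage` instead.

```python
import gc
import sys
import weakref

class Node:
    finalized = []  # names of nodes whose __del__ has run

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        Node.finalized.append(self.name)

def refcount_demo():
    obj = object()
    base = sys.getrefcount(obj)        # includes getrefcount's own argument reference
    alias = obj                        # additional alias: +1
    with_alias = sys.getrefcount(obj)
    del alias                          # alias destroyed: -1
    return base, with_alias, sys.getrefcount(obj)

def cycle_demo():
    Node.finalized.clear()
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a            # reference cycle
    probe = weakref.ref(a)
    del a, b                           # counts stay above 0 because of the cycle
    alive_before = probe() is not None
    gc.collect()                       # the cycle detector finds and frees the cycle
    return alive_before, probe() is None, sorted(Node.finalized)

if __name__ == "__main__":
    print(refcount_demo())
    print(cycle_demo())
```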
Tools
psutil
memory_profiler
objgraph
Meliae (could be combined with runsnakerun)
Heapy
Tools – psutil
psutil – A cross-platform process and system utilities module for Python.
import psutil
import os
...
p = psutil.Process(os.getpid())
pinfo = p.as_dict()
...
print pinfo['memory_percent'],
print pinfo['memory_info'].rss, pinfo['memory_info'].vms
Listing 9: Reading the memory usage of the current process with psutil
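psutil is a third-party package. On Unix, the standard library's `resource` module gives a rough alternative for the peak resident set size (a swapped-in technique, not what the slide uses). The unit of `ru_maxrss` differs between Linux (kilobytes) and macOS (bytes); `peak_rss_bytes` is an illustrative name.

```python
import resource
import sys

def peak_rss_bytes():
    """Peak resident set size of this process so far (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024

if __name__ == "__main__":
    print("peak RSS: %.1f MiB" % (peak_rss_bytes() / 2**20))
```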
Tools – memory_profiler
memory_profiler – a module for monitoring the memory usage of a Python program.
Recommended dependency: psutil.
May work as:
Line-by-line profiler.
Memory usage monitoring (memory over time).
Debugger trigger – setting debugger breakpoints.
memory_profiler – line-by-line profiler
Preparation
To track particular functions, use the @profile decorator.
Running
python -m memory_profiler
Line #    Mem usage    Increment   Line Contents
================================================
    45    9.512 MiB    0.000 MiB   @profile
    46                             def create_lot_of_stuff(times = 10000, cl = OldStyleClass):
    47    9.516 MiB    0.004 MiB       ret = []
    48    9.516 MiB    0.000 MiB       t = "foo %d"
    49  156.449 MiB  146.934 MiB       for i in xrange(times):
    50  156.445 MiB   -0.004 MiB           l = [t % (j + i%8) for j in xrange(8)]
    51  156.449 MiB    0.004 MiB           c = cl(*l)
    52  156.449 MiB    0.000 MiB           ret.append(c)
    53  156.449 MiB    0.000 MiB       return ret
Listing 10: Results
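A Python 3 port of the profiled function from the listing. `Record` is a hypothetical stand-in for the compared classes, and the `try`/`except ImportError` fallback (an assumption, not part of memory_profiler) lets the sketch run unprofiled when memory_profiler is not installed. With the package installed, the decorator imported from `memory_profiler` should print a line-by-line report when the function returns.

```python
try:
    from memory_profiler import profile  # third-party: pip install memory_profiler
except ImportError:
    def profile(func):                   # fallback: run the function unprofiled
        return func

class Record:
    """Hypothetical stand-in for the classes compared earlier (8 fields)."""
    def __init__(self, *values):
        self.values = values

@profile
def create_lot_of_stuff(times=10000, cl=Record):
    ret = []
    t = "foo %d"
    for i in range(times):
        l = [t % (j + i % 8) for j in range(8)]
        c = cl(*l)
        ret.append(c)
    return ret

if __name__ == "__main__":
    create_lot_of_stuff()
```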
memory_profiler – memory usage monitoring
Preparation
To track particular functions, use the @profile decorator.
Running and plotting
mprof run --python python uniwerse.py -f 100 100 -s 100 100 10
mprof plot
Figure: Results
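What `mprof` records is memory over time. A rough stdlib-only analogue runs a workload while a background thread takes periodic samples. This is a swapped-in technique: it samples `tracemalloc`'s traced Python allocations, not the process RSS that mprof plots; `sample_memory` is a hypothetical name.

```python
import threading
import time
import tracemalloc

def sample_memory(func, *args, interval=0.01):
    """Run func(*args); return its result and (seconds, traced bytes) samples."""
    samples = []
    done = threading.Event()

    def sampler():
        start = time.perf_counter()
        while True:
            current, _peak = tracemalloc.get_traced_memory()
            samples.append((time.perf_counter() - start, current))
            if done.wait(interval):  # True once the workload has finished
                break

    tracemalloc.start()
    thread = threading.Thread(target=sampler)
    thread.start()
    try:
        result = func(*args)
    finally:
        done.set()
        thread.join()
        tracemalloc.stop()
    return result, samples

if __name__ == "__main__":
    _, samples = sample_memory(lambda n: [str(i) for i in range(n)], 10**6)
    for t, size in samples[::10]:
        print("%.2fs %8.1f KiB" % (t, size / 1024))
```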
memory profiler – Debugger trigger
eror@eror-laptop:~$ python -m memory_profiler --pdb-mmem=10 uniwerse.py -s 100 100 10
Current memory 20.80 MiB exceeded the maximum of 10.00 MiB
Stepping into the debugger
> /home/eror/uniwerse.py(52)connect()
-> self.adj.append(n)
(Pdb)
Listing 11: Debugger trigger – setting debugger breakpoints.
Tools – objgraph
objgraph – draws Python object reference graphs with graphviz.
import objgraph
x = []
y = [x, [x], dict(x=x)]
objgraph.show_refs([y], filename='sample-graph.png')
objgraph.show_backrefs([x], filename='sample-backref-graph.png')
Listing 12: Tutorial example
Figure: Reference graph
Figure: Back reference graph
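objgraph builds these graphs from the garbage collector's edge information. One level of the same edges is available from the standard library via `gc.get_referents` (forward references) and `gc.get_referrers` (back references); the two helpers below are hypothetical. Note that `gc.get_referrers` only sees objects tracked by the collector, and the order returned by `gc.get_referents` is an implementation detail.

```python
import gc

def refs_by_type(obj):
    """Type names of the objects obj refers to directly (forward edges)."""
    return [type(r).__name__ for r in gc.get_referents(obj)]

def backrefs_among(obj, candidates):
    """Which candidate containers refer directly to obj (backward edges)."""
    referrers = gc.get_referrers(obj)
    return [c for c in candidates if any(r is c for r in referrers)]

if __name__ == "__main__":
    x = []
    inner, d = [x], dict(x=x)
    y = [x, inner, d]
    print(sorted(refs_by_type(y)))
    print(len(backrefs_among(x, [y, inner, d])))
```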
Tools – Heapy/Meliae
Heapy
The heap analysis toolset. It can be used to find information about the objects in the heap and display the information in various ways.
Part of "Guppy-PE – A Python Programming Environment".
Meliae
Python Memory Usage Analyzer
"This project is similar to heapy (in the 'guppy' project), in its attempt to understand how memory has been allocated."
runsnakerun GUI support.
Tools – Heapy
from guppy import hpy
hp = hpy()
h1 = hp.heap()
l = [range(i) for i in xrange(2**10)]
h2 = hp.heap()
print h2 - h1
Listing 13: Heapy example
Partition of a set of 294937 objects. Total size = 11538088 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 293899 100  7053576  61   7053576  61 int
     1   1025   0  4481544  39  11535120 100 list
     2      6   0     1680   0  11536800 100 dict (no owner)
     3      2   0      560   0  11537360 100 dict of guppy.etc.Glue.Owner
     4      1   0      456   0  11537816 100 types.FrameType
     5      2   0      144   0  11537960 100 guppy.etc.Glue.Owner
     6      2   0      128   0  11538088 100 str
Listing 14: Results
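On Python 3.4+, the standard-library `tracemalloc` offers a similar before/after comparison. It is a swapped-in tool rather than Heapy: statistics are grouped by the source line that allocated the memory, not by type. `heap_diff` is a hypothetical helper.

```python
import tracemalloc

def heap_diff(workload, limit=5):
    """Run workload() between two snapshots; return its result and the top differences."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # keep the result alive for the second snapshot
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Grouped by source line, sorted by absolute size difference (largest first).
    return result, after.compare_to(before, "lineno")[:limit]

if __name__ == "__main__":
    result, top = heap_diff(lambda: [list(range(i)) for i in range(2**10)])
    for stat in top:
        print(stat)
```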
Meliae and runsnakerun
from meliae import scanner
scanner.dump_all_objects("representation_meliae.dump")
# In shell: runsnakemem representation_meliae.dump
Listing 15: Meliae example
Figure: Meliae and runsnakerun
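Meliae dumps every object for offline analysis. A much rougher, stdlib-only approximation counts the objects the garbage collector tracks, grouped by type, together with their shallow sizes. `gc.get_objects()` only returns objects tracked by the collector (most ints and strings are not tracked), so the totals undercount; `type_histogram` is a hypothetical helper, not Meliae's API.

```python
import collections
import gc
import sys

def type_histogram(limit=10):
    """(type name, count, total shallow size) for gc-tracked objects, largest first."""
    counts = collections.Counter()
    sizes = collections.Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj, 0)  # 0 if the size cannot be determined
    return [(name, counts[name], total) for name, total in sizes.most_common(limit)]

if __name__ == "__main__":
    for name, count, total in type_histogram():
        print("%-20s %8d %10d" % (name, count, total))
```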
malloc() alternatives – libjemalloc and libtcmalloc
Pros: in some cases, using a different malloc() implementation "may" help to return memory from CPython back to the system.
Cons: equally, it may work against you.
$ LD_PRELOAD="/usr/lib/libjemalloc.so.1" python int_float_alloc.py
$ LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4" python int_float_alloc.py
Listing 16: Changing memory allocator
malloc() alternatives – libjemalloc and libtcmalloc
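Step | malloc res | malloc virt | jemalloc res | jemalloc virt | tcmalloc res | tcmalloc virt
step 1 | 7.4M | 46.5M | 8.0M | 56.9M | 9.4M | 56.1M
step 2 | 40.0M | 79.1M | 41.6M | 88.9M | 42.5M | 89.3M
step 3 | 16.2M | 55.3M | 8.2M | 88.9M | 42.5M | 89.3M
step 4 | 40.0M | 84.3M | 41.5M | 100.9M | 51.5M | 98.4M
step 5 | 8.2M | 47.3M | 8.5M | 100.9M | 51.5M | 98.4M
(res = resident memory, virt = virtual memory)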
Other useful tools
Build Python in debug mode (./configure --with-pydebug ...).
Maintains a list of all active objects.
Upon exit (or after every statement in interactive mode), prints all existing references.
Tracks total allocation.
valgrind – a programming tool for memory debugging, leak detection, and profiling. Rather low level.
CPython can cooperate with valgrind (for >= py-2.7, py-3.2).
gdb-heap (gdb extension)
low level, still experimental
can be attached to running processes
may be used with a core file
Web application memory leaks
dowser – a CherryPy application that displays sparklines of Python object counts.
dozer – a WSGI middleware version of the CherryPy memory leak debugger (works with any WSGI application).
Summary
Summary:
Try to better understand the underlying memory model.
Pay attention to hot spots.
Use profiling tools.
"Seek and destroy" – find the root cause of the memory leak and fix it ;)
Quick and sometimes dirty solutions:
Delegate memory-intensive work to another process.
Regularly restart the process.
Go for low-hanging fruit (e.g. __slots__, different allocators).
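The first two quick fixes can be combined with the standard `multiprocessing` module: run the heavy work in worker processes and let the pool replace each worker after a fixed number of tasks (`maxtasksperchild`), so whatever memory a worker grabbed goes back to the operating system when it exits. A sketch under assumptions: `memory_hungry` and `run_isolated` are hypothetical names, and the start method is left to the platform default unless given.

```python
import multiprocessing as mp

def memory_hungry(n):
    """Hypothetical heavy task: a big temporary structure, a small result."""
    data = [str(i) * 4 for i in range(n)]
    return sum(len(s) for s in data)

def run_isolated(func, args_list, tasks_per_child=1, start_method=None):
    """Run func over args_list in worker processes that are replaced
    after tasks_per_child tasks each."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    with ctx.Pool(processes=2, maxtasksperchild=tasks_per_child) as pool:
        # chunksize=1 makes every call its own task, so the limit applies per call.
        return pool.map(func, args_list, chunksize=1)

if __name__ == "__main__":
    print(run_isolated(memory_hungry, [10**5, 2 * 10**5]))
```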
References
Wesley J. Chun, Principal, CyberWeb Consulting: "Python 103... MMMM: Understanding Python's Memory Model, Mutability, Methods".
David Malcolm, Red Hat: "Dude – Where's My RAM?" A deep dive into how Python uses memory.
Evan Jones: Improving Python's Memory Allocator.
Alexander Slesarev: Memory reclaiming in Python.
Source code of Python.
Tools documentation.