
Python and Ruby VMs

Date post: 06-May-2015
Upload: dmitri-babaev
Description:
Moscow Big Systems/Big Data, April 2013 meetup presentation slides
Python and Ruby VMs CPython and Matz's Ruby Implementation details

Why should you care about Ruby

● Opscode Chef
● Puppet
● VMware Cloud Foundry
● Red Hat OpenShift
● Redmine

Why should you care about Python

● OpenStack
● Mercurial
● Bazaar

Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV)

ruby-lang.org

Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV) outline

● Memory management
  ○ Automatic, full-heap mark-sweep GC
● Execution model
  ○ Bytecode interpretation (stack machine) from 1.9 (YARV)
  ○ Direct AST interpretation before 1.9 (MRI)
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
  ○ Green threads before 1.9 (MRI), OS-level threads in 1.9 (YARV)
● Method calls
  ○ Late binding, search for method in class dict by name

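Typical interpreter execution model

[Diagram: a script is parsed into an AST (for example an If node with branches a=1 and a=2); bytecode generation turns the AST into an instruction sequence (Instruction a, Instruction b, Instruction c, ...) with a pointer to the currently executed instruction; interpreter thread stacks and the heap are shown alongside.]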

GIL ownership diagram

[Diagram: timelines for Thread 1 and Thread 2, each alternating between Interpreting, IO and Waiting; a third timeline shows the GIL state as Owned by Thread 1, Owned by Thread 2, or Free.]
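
The hand-off shown in the diagram can be observed from Python code. The sketch below is only an illustration: it assumes that time.sleep stands in for a blocking IO call (CPython releases the GIL while a thread sleeps), and the helper names fake_io and run_in_threads are made up for this example.

```python
import threading
import time


def fake_io(delay):
    # time.sleep releases the GIL while the thread is blocked,
    # much like a blocking read or write would.
    time.sleep(delay)


def run_in_threads(n_threads, delay):
    # Start n_threads blocking tasks and measure the wall-clock time.
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
print(f"4 threads x 0.2 s of blocking took {elapsed:.2f} s")
```

If the threads held the GIL while blocked, the run would take about 0.8 s; because the lock is free during IO, the waits overlap.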

MRI memory allocation diagram

[Diagram: Object pool 1 and Object pool 2, each with its own free list (Free list 1, Free list 2); RArray data and RString data live in separate blocks on the heap.]

MRI memory allocation

● Any Ruby object is allocated on the heap (even local variables)
● Slab-like allocation for Ruby objects
  ○ A C union is used, hence all objects are of the same size (40 bytes on 64-bit builds)
  ○ Unlike a typical slab allocator, there is only one object size to store
● RString, RArray, RHash, etc. hold a pointer to an external memory block containing the actual contents
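
The fixed-size slot idea can be sketched as a toy model. This is Python, not MRI's C code; the class SlotPool and its methods are hypothetical names chosen for illustration, and the free list is modelled as a plain list of slot indices.

```python
class SlotPool:
    """Toy model of one object pool: equal-sized slots plus a free list."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots
        # Indices of unused slots (a hypothetical stand-in for the free list).
        self.free_list = list(range(n_slots))

    def allocate(self, value):
        if not self.free_list:
            return None  # the caller would run GC or add another pool
        index = self.free_list.pop()
        self.slots[index] = value
        return index

    def release(self, index):
        self.slots[index] = None
        self.free_list.append(index)


pool = SlotPool(2)
a = pool.allocate("RString header")
b = pool.allocate("RArray header")
full = pool.allocate("RHash header")  # pool exhausted, returns None
pool.release(a)
c = pool.allocate("RHash header")     # reuses the slot freed above
```

Because every slot has the same size, any free slot can hold any object header, so allocation is a single pop from the free list.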

MRI memory allocation (continued)

● External memory block for string or array is allocated using plain malloc

● String content can be shared between several objects (copy on write)

● 1.9 changes: small strings (23 bytes or less) are embedded into RString structure rather than allocated externally
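
Copy-on-write sharing of string contents can be modelled roughly as follows. This is a Python toy under stated assumptions, not how RString is laid out: the class SharedString, its shared flag and the dup/append methods are invented for the sketch.

```python
class SharedString:
    """Toy copy-on-write string: several objects may point at one buffer."""

    def __init__(self, text):
        self.buffer = bytearray(text, "utf-8")
        self.shared = False

    def dup(self):
        # Share the existing buffer instead of copying it.
        other = SharedString.__new__(SharedString)
        other.buffer = self.buffer
        other.shared = self.shared = True
        return other

    def append(self, text):
        if self.shared:
            # First write: take a private copy, leave the original untouched.
            self.buffer = bytearray(self.buffer)
            self.shared = False
        self.buffer += text.encode("utf-8")


s = SharedString("abc")
t = s.dup()
shared_before_write = t.buffer is s.buffer
t.append("d")
shared_after_write = t.buffer is s.buffer
```

The copy is deferred until one of the owners modifies the contents, so duplicating a string that is never changed costs no extra buffer.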

MRI GC

● If there is no free slot for a new object, GC is run
  ○ If there is still no free slot, a new slab (pool) is allocated
    ■ Unlike in Java, GC is triggered not only when the whole heap is used up
● Stop-the-world mark-sweep GC
  ○ Unlike Java or .NET, there are no generations
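
A minimal mark-sweep pass, sketched in Python rather than taken from MRI: the Obj class and the mark and sweep functions are hypothetical, and the heap is just a list. It shows the two phases and that an unreachable cycle is reclaimed without any reference counts.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other objects
        self.marked = False


def mark(roots):
    # Mark phase: flag everything reachable from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    # Sweep phase: keep marked objects, drop the rest, reset the flags.
    live = [obj for obj in heap if obj.marked]
    for obj in live:
        obj.marked = False
    return live


a, b, c, d = (Obj(name) for name in "abcd")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)             # c and d form an unreachable cycle
heap = [a, b, c, d]

mark([a])                    # only a is a root
heap = sweep(heap)
survivors = [obj.name for obj in heap]
```

Both phases walk the whole heap while the program is paused, which is the stop-the-world cost mentioned above.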

MRI GC (continued)

● 1.9.3 changes: lazy sweep GC
  ○ "In Lazy sweeping, each invocation of the object allocation sweeps the heap until it finds an appropriate free object"
    ■ i.e. just search for an object marked as dead instead of building free lists
● 2.0 changes
  ○ Instead of marking live objects with the FL_MARK flag, an external bitmap is created
    ■ This avoids excessive copying of memory regions in forked processes
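
Lazy sweeping can be sketched as an allocator that resumes the sweep where the previous allocation stopped. The model below is a Python toy with assumed names (LazySweepHeap, live_bits, cursor); the mark bits are kept in a list separate from the slots, loosely echoing the external bitmap of 2.0.

```python
class LazySweepHeap:
    """Toy lazy sweep: each allocation advances the sweep to the next dead slot."""

    def __init__(self, live_bits):
        # live_bits[i] is True if slot i survived the last mark phase.
        self.live_bits = list(live_bits)
        self.cursor = 0

    def allocate(self):
        while self.cursor < len(self.live_bits):
            index = self.cursor
            self.cursor += 1
            if not self.live_bits[index]:
                self.live_bits[index] = True  # slot now holds the new object
                return index
        return None  # sweep finished; a new mark phase or pool is needed


heap = LazySweepHeap([True, False, True, True, False])
first = heap.allocate()
second = heap.allocate()
third = heap.allocate()
```

The sweep cost is spread over many allocations instead of being paid in one pause after marking.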

MRI Links

● Threads in Ruby discussion: http://stackoverflow.com/questions/56087/does-ruby-have-real-multithreading

● MRI GC slides: http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/

CPython

python.org

CPython VM outline

● Memory management
  ○ Automatic, reference counting
● Execution model
  ○ Bytecode interpretation (stack machine)
  ○ Maps, lists, tuples are created and managed by bytecode instructions
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
● Method calls
  ○ Late binding, search for method in class dict by name
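
Two points of the outline can be checked with the standard library. The sketch uses the dis module to list the bytecode of a small function, then rebinds a method in the class dict to show late binding; make_pair and Greeter are example names, and exact opcode names vary between CPython versions.

```python
import dis


def make_pair(a, b):
    return [a, b]


# The list is built by a dedicated bytecode instruction on the value stack.
opnames = [ins.opname for ins in dis.get_instructions(make_pair)]


class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
first = g.greet()

# The method is looked up by name at call time, so rebinding the name
# in the class dict changes what an existing instance calls.
Greeter.greet = lambda self: "hi"
second = g.greet()

found_in_class_dict = "greet" in Greeter.__dict__ and "greet" not in g.__dict__
```

Running dis.dis(make_pair) prints the full listing for the interpreter in use.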

Python GC

● CPython uses reference counting to track whether an object is still referenced
  ○ Python uses a global interpreter lock to avoid synchronization on each reference operation
● Cyclic references
  ○ Example: l = []; l.append(l); del l
  ○ Cyclic references are only possible for "container" objects
● The GC for cyclic references has been included since version 2.0 and is enabled by default
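
Both mechanisms can be watched from a script. The sketch below uses sys.getrefcount for the counter and a weak reference as a probe for the cycle; it uses a class instance (the example name Node) instead of a list only because lists cannot be weakly referenced.

```python
import gc
import sys
import weakref


class Node:
    pass


n = Node()
before = sys.getrefcount(n)
alias = n                      # one more reference to the same object
after = sys.getrefcount(n)

gc.disable()                   # keep the cycle collector out of the way
cyc = Node()
cyc.me = cyc                   # self-referential cycle
probe = weakref.ref(cyc)
del cyc                        # the count never reaches zero because of cyc.me
alive_after_del = probe() is not None

gc.collect()                   # the cycle detector reclaims it
alive_after_collect = probe() is not None
gc.enable()
```

Reference counting alone leaves the cycle in memory; only the explicit collection frees it.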

Search for cyclic references in CPython (generations)

● The GC classifies objects into three generations depending on how many collection sweeps they have survived
  ○ New objects are placed in the youngest generation (generation 0)
  ○ If an object survives a collection, it is moved into the next older generation
  ○ Since generation 2 is the oldest generation, objects in that generation remain there after a collection

Search for cyclic references in CPython (activation)

● When the number of allocations minus the number of deallocations exceeds the first threshold (see gc.get_threshold), collection starts
  ○ Initially only generation 0 is examined
  ○ If generation 0 has been examined more than the second threshold times since generation 1 was last examined, then generation 1 is examined as well
  ○ The third threshold controls the number of collections of generation 1 before collecting generation 2
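
The thresholds and the current counters are exposed by the gc module. A short sketch follows; the values passed to gc.set_threshold are arbitrary, and newer CPython releases may treat the third threshold differently from what the slides describe.

```python
import gc

# Current thresholds, one per generation.
threshold0, threshold1, threshold2 = gc.get_threshold()

# Current per-generation counters that are compared against the thresholds.
counts = gc.get_count()

old = gc.get_threshold()
gc.set_threshold(1000, 20, 20)   # arbitrary values, for illustration only
changed = gc.get_threshold()
gc.set_threshold(*old)           # restore the previous settings
restored = gc.get_threshold()

print("thresholds:", old, "counters:", counts)
```

Raising the first threshold makes generation 0 collections less frequent at the price of more garbage kept between collections.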

Objects with __del__ method in reference cycle

● Which __del__ method should be called first for two objects in a cycle?
  ○ After the first finalizer is called, the object cannot be freed, as the second finalizer may still access it
● Cycles that are referenced from objects with finalizers are added to a global list of uncollectable garbage (gc.garbage)
  ○ The program can access the global list and free cycles in a way that makes sense for the application
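
The behaviour can be probed with a two-object cycle whose members define __del__. The slides describe the Python 2 era; since CPython 3.4 (PEP 442) such cycles are finalized and collected, so the sketch below, run on Python 3, is expected to leave gc.garbage without these objects. Resource and finalized are example names.

```python
import gc

finalized = []


class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # Record that the finalizer ran.
        finalized.append(self.name)


gc.disable()
a, b = Resource("a"), Resource("b")
a.peer, b.peer = b, a          # reference cycle between two finalizable objects
del a, b
gc.collect()
gc.enable()

# On Python 2 the two objects would be listed here instead of being finalized.
leftover = [obj for obj in gc.garbage if isinstance(obj, Resource)]
```

On an interpreter that does put the cycle into gc.garbage, the program would break it by hand, for example by clearing the peer attributes and emptying the list.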

CPython links

● Python GC description: http://arctrix.com/nas/python/gc/

● GC module documentation: http://docs.python.org/2/library/gc.html

● Python method call description: http://css.dzone.com/articles/python-internals-how-callables-0

