WTF is a Method Cache?

Post on 10-May-2015

description

A talk about the method caching patches I wrote, which led to the jamesgolick Ruby build. Given at RuPy 2013 in Budapest.

transcript

BitLove

What the fuck is a method cache?

James Golick

Thursday, 17 July, 14

James Golick

writing: http://jamesgolick.com

code: https://github.com/jamesgolick

shit talk: https://twitter.com/jamesgolick

podcast: http://realtalk.io

@jamesgolick

DISCLAIMER: COMPUTOLOGY AHEAD

Big “O” Notation

stuff.each do |thing|
  # ...
end

Variable Time

a = stuff.pop
a = !!a
stuff.unshift a

Constant Time
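For readers more at home in C, the same distinction can be sketched there. This is a minimal illustration, not code from the talk; visit_each, int_stack and pop_normalize_push are hypothetical names. One substitution is deliberate: the Ruby example puts the value back with unshift, but inserting at the front of a plain C array would itself be O(n), so this sketch pushes back onto the end of a stack instead.

```c
/* Illustrative sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Variable time, O(n): the loop body runs once per element,
   like stuff.each { |thing| ... } in the Ruby example. */
size_t visit_each(const int *stuff, size_t n) {
    size_t steps = 0;
    for (size_t i = 0; i < n; i++) {
        (void)stuff[i]; /* stand-in for the loop body */
        steps++;
    }
    return steps;
}

/* A tiny fixed-capacity stack for the constant-time example. */
typedef struct {
    int items[64];
    size_t len;
} int_stack;

/* Constant time, O(1): pop the last element, normalize it to 0/1
   (the C analogue of !!a), and push it back. The number of operations
   does not depend on len. Requires a non-empty stack. */
void pop_normalize_push(int_stack *s) {
    assert(s->len > 0);
    int a = s->items[--s->len]; /* pop */
    a = !!a;                    /* normalize */
    s->items[s->len++] = a;     /* push */
}
```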

1. Background

struct RClass {
  struct RClass *super;
  struct st_table *m_tbl;
};

Method Resolution

class A
  def a
    puts 'Hi!'
  end
end

class B < A; end
class C < B; end
class D < C; end
class E < D; end
class F < E; end

F.new.a

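rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  // Simplified: MRI's real st_lookup() returns a found flag and writes
  // the value through an out-parameter.
  rb_method_entry_t *ent = st_lookup(klass->m_tbl, method_name);

  if (ent) {
    return ent;
  } else {
    if (klass->super) {
      // Not defined here: try the superclass.
      return vm_resolve_method(klass->super, method_name);
    } else {
      return NULL;
    }
  }
}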
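A self-contained toy of the same walk makes the cost visible. This sketch assumes a small linear array in place of MRI's st_table and C strings in place of interned symbols; klass_t, method_entry_t, lookup_in_table and resolve_method are hypothetical names. It is iterative rather than recursive, which does not change the behaviour, and it reports how many method tables it consulted.

```c
/* Toy model of superclass-chain method resolution; names are hypothetical, not MRI's. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *body;
} method_entry_t;

/* Toy class: a superclass pointer plus a small method table. */
typedef struct klass {
    struct klass *super;
    const method_entry_t *m_tbl;
    size_t m_count;
} klass_t;

/* Linear search stands in for the hash table lookup. */
static const method_entry_t *lookup_in_table(const klass_t *klass, const char *name) {
    for (size_t i = 0; i < klass->m_count; i++) {
        if (strcmp(klass->m_tbl[i].name, name) == 0) {
            return &klass->m_tbl[i];
        }
    }
    return NULL;
}

/* Walk up the superclass chain until some class defines the method.
   *tables_searched reports how many method tables were consulted. */
const method_entry_t *resolve_method(const klass_t *klass, const char *name,
                                     size_t *tables_searched) {
    *tables_searched = 0;
    while (klass != NULL) {
        (*tables_searched)++;
        const method_entry_t *me = lookup_in_table(klass, name);
        if (me != NULL) {
            return me;
        }
        klass = klass->super;
    }
    return NULL; /* not found anywhere in the chain */
}
```

Building the A through F chain from the Ruby example and resolving "a" from F consults six tables (F, E, D, C, B, then A) before the method is found; every extra ancestor adds one more table to every uncached lookup.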

Module Inclusion

module A
  def a
    "Hello, World!"
  end
end

module B; include A; end
module C; include B; end

class D
  include C
end

D.new.a

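[Diagram: the ICLASS proxies created by these includes. Module A is represented by three ICLASS entries, module B by two, and module C by one; class D's lookup chain runs through the ICLASS entries for C, B and A.]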
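The ICLASS counts above (three proxies for A, two for B, one for C) fall out of a simple splicing rule. The sketch below assumes a toy model (node_t, include_module, count_proxies_for, chain_length and the fixed iclass_pool are hypothetical): including a module inserts one proxy for the module and one for each proxy already in the module's own chain, directly above the including class, and every proxy shares the original module's method table rather than copying it. MRI additionally skips modules already present in the chain; that check is omitted here.

```c
/* Toy model of module inclusion via ICLASS proxies; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    const char *name;   /* for proxies: the module they stand in for */
    const void *m_tbl;  /* stand-in for a pointer to a method table */
    struct node *super;
    int is_iclass;
} node_t;

#define POOL_SIZE 32
static node_t iclass_pool[POOL_SIZE];
static size_t iclass_used = 0;

/* Splice proxies for `module` and everything in its own chain
   directly above `target`, preserving the rest of target's chain. */
void include_module(node_t *target, const node_t *module) {
    node_t *insert_after = target;
    for (const node_t *m = module; m != NULL; m = m->super) {
        assert(iclass_used < POOL_SIZE);
        node_t *ic = &iclass_pool[iclass_used++];
        ic->name = m->name;
        ic->m_tbl = m->m_tbl;            /* shared, not copied */
        ic->is_iclass = 1;
        ic->super = insert_after->super; /* keep what was above */
        insert_after->super = ic;
        insert_after = ic;
    }
}

/* How many proxies currently stand in for the module called `name`? */
size_t count_proxies_for(const char *name) {
    size_t n = 0;
    for (size_t i = 0; i < iclass_used; i++) {
        if (strcmp(iclass_pool[i].name, name) == 0) n++;
    }
    return n;
}

/* Number of entries on the lookup chain starting at `klass`. */
size_t chain_length(const node_t *klass) {
    size_t n = 0;
    for (; klass != NULL; klass = klass->super) n++;
    return n;
}
```

Running the three includes from the Ruby example (B includes A, C includes B, D includes C) leaves three proxies for A, two for B and one for C, and D's lookup chain becomes D, C's proxy, B's proxy, A's proxy, then Object.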

irb> ActiveRecord::Base.included_modules.length
=> 71

Summary

• Methods are stored in a hashtable on the class where they’re defined.

• Method resolution is a variable time algorithm whose complexity depends on the depth of your class hierarchy.

• Module inclusion substantially increases the depth of your class hierarchy, especially if those modules themselves include modules.

• Method resolution is therefore expensive: an uncached call may search dozens of method tables before finding its target.

2. Method Caching in the pre-jamesgolick era

Instruction Caches

static uint global_vm_state = 0;

struct inline_cache {
  struct RClass *klass;
  uint vm_state;
  rb_method_entry_t *me;
};

rb_method_entry_t *
vm_search_method(struct RClass *klass, rb_symbol_t method_name, struct inline_cache *ic)
{
  rb_method_entry_t *me;

  if (is_valid_cache_entry(*ic, klass)) {
    me = ic->me;
  } else {
    // Miss: resolve the slow way and refill this call site's cache.
    me = vm_resolve_method(klass, method_name);
    ic->me = me;
    ic->vm_state = GET_VM_STATE();
    ic->klass = klass;
  }
  return me;
}

int
is_valid_cache_entry(struct inline_cache ent, struct RClass *klass)
{
  return ent.klass == klass && ent.vm_state == GET_VM_STATE();
}
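Here is a compilable toy of the pre-patch inline cache, assuming one method per class and a stubbed slow path (toy_class, resolve_slowly, define_method_anywhere, search_method and slow_resolutions are hypothetical). It shows both halves of the validity check: a call site reuses its entry only while the receiver's class and the global state counter both still match, so defining any method anywhere sends every call site back to the slow path.

```c
/* Toy model of a global-state inline cache; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_class {
    const char *method_body; /* pretend each class defines exactly one method */
} toy_class;

typedef struct {
    const toy_class *klass;
    uint64_t vm_state;
    const char *me;
} inline_cache;

static uint64_t global_vm_state = 0;
static size_t slow_resolutions = 0;

/* Stand-in for the expensive superclass walk. */
static const char *resolve_slowly(const toy_class *klass) {
    slow_resolutions++;
    return klass->method_body;
}

/* Any method definition, on any class, bumps the one global counter. */
void define_method_anywhere(void) {
    global_vm_state++;
}

static int is_valid_cache_entry(const inline_cache *ic, const toy_class *klass) {
    return ic->klass == klass && ic->vm_state == global_vm_state;
}

const char *search_method(const toy_class *klass, inline_cache *ic) {
    if (!is_valid_cache_entry(ic, klass)) {
        ic->me = resolve_slowly(klass);
        ic->vm_state = global_vm_state;
        ic->klass = klass;
    }
    return ic->me;
}
```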

Global Method Cache

[Diagram: lookup tiers. Many instruction caches sit in front of a single global cache, which sits in front of full method resolution.]

struct method_cache_entry {
  struct RClass *klass;
  uint vm_state;
  rb_method_entry_t *me;
};

#define METHOD_CACHE_SIZE 2048

static struct method_cache_entry method_cache[METHOD_CACHE_SIZE];

rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  // Point into the table so a miss refills the slot in place.
  struct method_cache_entry *ent = &method_cache[method_name % METHOD_CACHE_SIZE];
  rb_method_entry_t *me;

  if (is_valid_cache_entry(*ent, klass)) {
    me = ent->me;
  } else {
    me = vm_resolve_method_without_cache(klass, method_name);
    ent->me = me;
    ent->vm_state = GET_VM_STATE();
    ent->klass = klass;
  }
  return me;
}

int
is_valid_cache_entry(struct method_cache_entry ent, struct RClass *klass)
{
  return ent.klass == klass && ent.vm_state == GET_VM_STATE();
}

Cache Invalidation

static uint64_t global_vm_state = 0;

#define INC_VM_STATE() (global_vm_state++)

void
rb_define_method(struct RClass *klass, symbol_t name, rb_method_entry_t *me)
{
  // ...
  INC_VM_STATE();
  // ...
}
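Because validity is just an equality test against the counter, one increment makes every entry stale at once without touching any of them, which is why invalidation is cheap but also global. A minimal sketch under that model, where toy_define_method is a hypothetical stand-in for rb_define_method, and fill_all and count_valid exist only to observe the effect (a VM would never scan its cache like this):

```c
/* Toy demonstration of counter-based global invalidation; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 2048

typedef struct {
    uint64_t vm_state;
    int filled;
} cache_slot;

static cache_slot cache[CACHE_SLOTS];
static uint64_t global_vm_state = 0;

#define INC_VM_STATE() (global_vm_state++)

/* Observation helper: fill every slot at the current state. */
void fill_all(void) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache[i].vm_state = global_vm_state;
        cache[i].filled = 1;
    }
}

/* Observation helper: how many slots would currently pass validation? */
size_t count_valid(void) {
    size_t n = 0;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].filled && cache[i].vm_state == global_vm_state) n++;
    }
    return n;
}

/* Whatever class it targets, the only cache work is one increment. */
void toy_define_method(void) {
    INC_VM_STATE();
}
```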

Things that bust the cache:

• Defining methods.

• Aliasing methods.

• Removing methods.

• Setting or removing constants.

• Defining a class.

• Defining a module.

• Including a module.

• Extending a module.

• Using a refinement (Ruby 2.0).

• Garbage collecting a class.

• Garbage collecting a module.

• Changing the visibility of a constant.

• Marshal loading an extended constant.

• Autoload.

• Built-in non-blocking IO methods.

• OpenStruct instantiation.

Summary

• Method resolutions are cached in two places.

• Instruction caches are structs attached to the send instruction.

• The global method cache is a hash table fixed at 2048 entries with no collision handling: a new entry simply overwrites whatever already occupies its slot, so eviction is effectively random.

• Method cache entries are valid if their `vm_state` property is the same as the current value of the `global_vm_state` counter.

Summary

• Method cache invalidation is always global, and happens frequently in most Ruby code.

• Method cache invalidation is constant time.

Numbers

3. jamesgolick Method Caching

struct RClass {
  struct RClass *super;
  struct st_table *m_tbl;
  struct st_table *mc_tbl;
  uint64_t seq;
  subclass_list_entry_t *subclasses;
};

static uint64_t rb_vm_sequence = 0;

#define NEXT_SEQ() ++rb_vm_sequence

struct RClass *
class_alloc(...)
{
  struct RClass *klass;
  // ...
  klass->seq = NEXT_SEQ();
  // ...
  return klass;
}

struct inline_cache {
  uint64_t seq;
  rb_method_entry_t *me;
};

rb_method_entry_t *
vm_search_method(struct RClass *klass, rb_symbol_t method_name, struct inline_cache *ic)
{
  rb_method_entry_t *me;

  if (ic->seq == klass->seq) {
    me = ic->me;
  } else {
    me = vm_resolve_method(klass, method_name);
    ic->me = me;
    ic->seq = klass->seq;
  }
  return me;
}
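A toy version of the sequence-based scheme, assuming one method per class and leaving out the per-class mc_tbl (toy_class, toy_class_init, toy_clear_cache, search_method and slow_resolutions are hypothetical). Because every sequence number is handed out only once, a matching seq already implies the same class, which is why the cache entry no longer needs a klass field. Changing one class only invalidates call sites whose receivers have that class.

```c
/* Toy model of sequence-tagged inline caches; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t rb_vm_sequence = 0;
#define NEXT_SEQ() (++rb_vm_sequence)

typedef struct toy_class {
    uint64_t seq;
    const char *method_body; /* pretend each class defines one method */
} toy_class;

typedef struct {
    uint64_t seq;   /* 0 never matches, because NEXT_SEQ() starts at 1 */
    const char *me;
} inline_cache;

static size_t slow_resolutions = 0;

void toy_class_init(toy_class *klass, const char *body) {
    klass->seq = NEXT_SEQ();
    klass->method_body = body;
}

static const char *resolve_slowly(const toy_class *klass) {
    slow_resolutions++;
    return klass->method_body;
}

const char *search_method(const toy_class *klass, inline_cache *ic) {
    if (ic->seq != klass->seq) {
        ic->me = resolve_slowly(klass);
        ic->seq = klass->seq;
    }
    return ic->me;
}

/* Invalidate a single class by giving it a fresh sequence number. */
void toy_clear_cache(toy_class *klass) {
    klass->seq = NEXT_SEQ();
}
```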

struct method_cache_entry {
  uint64_t seq;
  rb_method_entry_t *me;
};

rb_method_entry_t *
vm_resolve_method(struct RClass *klass, symbol_t method_name)
{
  struct method_cache_entry *ent;
  rb_method_entry_t *me;

  ent = vm_get_method_cache_entry(klass, method_name);
  if (ent->seq == klass->seq) {
    me = ent->me;
  } else {
    me = vm_resolve_method_without_cache(klass, method_name);
    ent->me = me;
    ent->seq = klass->seq;
  }
  return me;
}

void
rb_clear_cache_by_class(struct RClass *klass)
{
  subclass_list_entry_t *ent;

  klass->seq = NEXT_SEQ();
  ent = klass->subclasses;
  while (ent != NULL) {
    // Every descendant gets a fresh sequence too.
    rb_clear_cache_by_class(ent->klass);
    ent = ent->next;
  }
}
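The recursive invalidation can be exercised on a small Rails-like hierarchy. In this sketch, toy_class, subclass_entry, add_subclass and clear_cache_by_class are hypothetical, and the hierarchy is an assumption simplified so that ActiveRecord::Base and ActionController::Base sit directly under Object. The function returns how many classes it visited, which is the variable cost of the traversal.

```c
/* Toy model of hierarchical cache invalidation; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t rb_vm_sequence = 0;
#define NEXT_SEQ() (++rb_vm_sequence)

struct toy_class;

/* Singly linked list of direct subclasses. */
typedef struct subclass_entry {
    struct toy_class *klass;
    struct subclass_entry *next;
} subclass_entry;

typedef struct toy_class {
    const char *name;
    uint64_t seq;
    subclass_entry *subclasses;
} toy_class;

/* Register `child` as a direct subclass of `parent`.
   `link` is caller-provided storage so the sketch needs no malloc. */
void add_subclass(toy_class *parent, toy_class *child, subclass_entry *link) {
    link->klass = child;
    link->next = parent->subclasses;
    parent->subclasses = link;
}

/* Give `klass` and every descendant a fresh sequence number.
   Returns the number of classes visited. */
size_t clear_cache_by_class(toy_class *klass) {
    size_t visited = 1;
    klass->seq = NEXT_SEQ();
    for (subclass_entry *ent = klass->subclasses; ent != NULL; ent = ent->next) {
        visited += clear_cache_by_class(ent->klass);
    }
    return visited;
}
```

Clearing Object visits all seven classes, while clearing ActiveRecord::Base visits only it, User and Group; the controllers keep their sequence numbers, so their cache entries survive.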

[Diagram: an example class hierarchy rooted at Object, containing ActiveRecord::Base with the models User and Group, and ActionController::Base with UsersController and SessionsController.]

Summary

• Both types of method cache entries now only need to store a seq and method entry.

• Method caches are now stored with the RClass structs and are “effectively” unbounded in size.

• Each RClass has a globally unique 64-bit identifier.

• Method cache entries are tagged with the sequence of their target klass at the time the cache entry was filled.

Summary

• Entries are valid if their filled entry sequence is the same as the current sequence identifier of the klass that is the target of the invocation.

• Method caches are invalidated by assigning a new sequence value to a klass.

• When changes are made to a klass, we traverse all of its descendants and assign them new sequence values.

• This traversal is unfortunately a variable time algorithm, and can be quite expensive.

4. rvm install jamesgolick

Dat Patch

• Top-down class hierarchy tracking.

• Class#subclasses

• Module#included_in

• Possible future bug fixes.

• Hierarchical method cache invalidation.

• Method cache instrumentation.

Instrumentation

• RubyVM::MethodCache.hits

• RubyVM::MethodCache.misses

• RubyVM::MethodCache.miss_time

• RubyVM::MethodCache.invalidation_time
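The talk does not show how these counters are implemented. Purely as a hypothetical sketch of that kind of instrumentation, the wrapper below counts hits and misses around a sequence-checked inline cache and accumulates processor time spent on misses using the standard clock() function. cache_stats, instrumented_search and resolve_slowly are made-up names, not the patch's code.

```c
/* Hypothetical instrumentation sketch; not the patch's implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    uint64_t hits;
    uint64_t misses;
    clock_t miss_time; /* processor time spent resolving misses */
} cache_stats;

static cache_stats stats = {0, 0, 0};

typedef struct {
    uint64_t seq;
    const char *me;
} inline_cache;

/* Stand-in for walking the ancestor chain. */
static const char *resolve_slowly(const char *body) {
    return body;
}

const char *instrumented_search(uint64_t klass_seq, const char *body, inline_cache *ic) {
    if (ic->seq == klass_seq) {
        stats.hits++;
        return ic->me;
    }
    clock_t start = clock();
    ic->me = resolve_slowly(body);
    ic->seq = klass_seq;
    stats.misses++;
    stats.miss_time += clock() - start;
    return ic->me;
}
```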

Get The Code

• rvm install jamesgolick

• git clone git://github.com/jamesgolick/ruby.git

• https://github.com/jamesgolick/ruby

Questions?
