
Acts As Recommendable

Date post: 19-May-2015
Upload: maccman
Description:
RubyManor talk on using Recommendation systems in production.
Transcript
Page 1: Acts As Recommendable

Recommendations in Production

Alex MacCaw

Page 2: Acts As Recommendable

Netflix Prize

Page 3: Acts As Recommendable

Amazon.com

Facebook

Last.fm

StumbleUpon

Google Suggest

iTunes

Rotten Tomatoes

Yelp

Page 4: Acts As Recommendable

Google Search

Page 5: Acts As Recommendable

Chicken or Egg

Page 6: Acts As Recommendable

• Google Reader

• IMDB

Page 7: Acts As Recommendable

Acts As Recommendable

Page 8: Acts As Recommendable

Types of recommendations

• Content Based

• User Based

• Item Based

Page 9: Acts As Recommendable
Page 10: Acts As Recommendable
Page 11: Acts As Recommendable

Programming Collective Intelligence

Page 12: Acts As Recommendable

Has Many Through Relationship

Page 13: Acts As Recommendable

User has many UserBooks

Book has many UserBooks

User has many Books through UserBooks

UserBooks can have a score (rating)

Page 14: Acts As Recommendable

User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end

Page 15: Acts As Recommendable

Gives you

User#similar_users

User#recommended_books

Book#similar_books

Page 16: Acts As Recommendable

The algorithms

• Manhattan Distance

• Euclidean distance

• Cosine

• Pearson correlation coefficient

• Jaccard

• Levenshtein
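
To make a few of the measures listed above concrete, here is a minimal Ruby sketch of Manhattan distance, cosine similarity and the Jaccard index. The method names and inputs are assumptions for illustration; they are not the plugin's API.

```ruby
# Hypothetical helpers, not part of acts_as_recommendable.

# Manhattan distance: sum of absolute differences between two rating vectors.
def manhattan_distance(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# Cosine similarity: dot product divided by the product of vector lengths.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# Jaccard index: overlap of two id lists (e.g. books each user owns),
# ignoring scores entirely.
def jaccard_index(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end
```

Distances shrink as users agree, while similarities grow, so a distance usually needs inverting before it can be used to rank matches.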

Page 17: Acts As Recommendable

How does it work?

Page 18: Acts As Recommendable

Strategy

• Map data into Euclidean Space

• Calculate similarity

• Use similarities to recommend

Page 19: Acts As Recommendable

         The Black Knight   John Tucker Must Die

James    4                  5

Jonah    3                  2

George   5                  3

Alex     4                  2
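
The first two steps of the strategy can be sketched with the ratings above. This assumes the first number in each row is the score for The Black Knight and the second for John Tucker Must Die. The 1 / (1 + distance) scoring is also an assumption (a common way to turn a distance into a similarity), and the method names are hypothetical.

```ruby
# Ratings from the slide: [The Black Knight, John Tucker Must Die].
RATINGS = {
  "James"  => [4, 5],
  "Jonah"  => [3, 2],
  "George" => [5, 3],
  "Alex"   => [4, 2]
}

# Assumed scoring: 1 / (1 + Euclidean distance), so identical ratings give 1.0.
def euclidean_similarity(a, b)
  distance = Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  1.0 / (1.0 + distance)
end

# Every other user paired with a similarity score, most similar first.
def similar_users(name)
  RATINGS.reject { |other, _| other == name }
         .map { |other, scores| [other, euclidean_similarity(RATINGS[name], scores)] }
         .sort_by { |_, score| -score }
end

similar_users("Alex").each { |other, score| puts "#{other}: #{score.round(3)}" }
```

With these numbers Jonah comes out closest to Alex, then George, then James.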

Page 20: Acts As Recommendable

[Plot: The Black Knight vs. John Tucker Must Die ratings, both axes 0 to 5.00]

Page 21: Acts As Recommendable

[Plot: The Black Knight vs. John Tucker Must Die ratings, both axes 0 to 5.00]

Page 22: Acts As Recommendable

item id

user id

score

{ 1 => { 1 => 1.0, 2 => 0.0, ... }, ...}

Page 23: Acts As Recommendable

[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
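
A list of [id, similarity] pairs like the one above can be ranked to get the closest matches. This is a small sketch; `top_matches` is a hypothetical name, and the sample pairs are just the three values visible on the slide.

```ruby
# [id, similarity] pairs in the shape shown on the slide.
similarities = [[1, 0.5554], [2, 0.888], [3, 0.8843]]

# Highest similarity first, keeping only the ids of the top n.
def top_matches(pairs, n)
  pairs.sort_by { |_, score| -score }.first(n).map(&:first)
end

p top_matches(similarities, 2)
```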

Page 24: Acts As Recommendable

Problem 1

It was far too slow to calculate on the fly (obvious)

Page 25: Acts As Recommendable

SELECT * FROM "users" WHERE ("users"."id" = 2)
SELECT * FROM "books"
SELECT * FROM "users"
SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10))
SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6))

All books All user_books

Page 26: Acts As Recommendable

Solution

Cache the dataset

rake recommendations:build

Build offline

Page 27: Acts As Recommendable

SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
SELECT * FROM "books" WHERE ("books"."id" = 5)
SELECT * FROM "books" WHERE ("books"."id" = 4)
SELECT * FROM "books" WHERE ("books"."id" = 8)
SELECT * FROM "books" WHERE ("books"."id" = 7)
SELECT * FROM "books" WHERE ("books"."id" = 2)
SELECT * FROM "books" WHERE ("books"."id" = 1)

Page 28: Acts As Recommendable

Problem 2

Fetching the dataset took too long since it was so massive

Page 29: Acts As Recommendable

Solution

Split up the cache by item

Page 30: Acts As Recommendable

Rails.cache.write("aar_books_1", scores)
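
A rough sketch of the per-item split. It follows the "aar_books_1" key from the slide. `MemoryCache` is a stand-in for Rails.cache so the code runs outside Rails, and the score data and `similar_book_ids` helper are hypothetical.

```ruby
# Stand-in for Rails.cache (same read/write shape) so this runs without Rails.
class MemoryCache
  def initialize
    @store = {}
  end

  def write(key, value)
    @store[key] = value
  end

  def read(key)
    @store[key]
  end
end

CACHE = MemoryCache.new

# Hypothetical precomputed similarities: item id => [[other id, score], ...]
ITEM_SCORES = {
  1 => [[2, 0.888], [3, 0.8843]],
  2 => [[1, 0.888], [3, 0.41]]
}

# One cache entry per item, so a lookup only loads that item's scores
# rather than the whole dataset.
ITEM_SCORES.each do |item_id, scores|
  CACHE.write("aar_books_#{item_id}", scores)
end

# Ids of the books similar to item_id, or [] if nothing was cached.
def similar_book_ids(item_id)
  (CACHE.read("aar_books_#{item_id}") || []).map(&:first)
end
```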

Page 31: Acts As Recommendable

Problem 3

The dataset was so big it crashed Ruby!

Page 32: Acts As Recommendable

Solution

Get rid of ActiveRecord

Only deal with integers

Page 33: Acts As Recommendable

items = options[:on_class].connection.select_values("SELECT id from #{options[:on_class].table_name}").collect(&:to_i)

Page 34: Acts As Recommendable

Problem 4

It still crashed Ruby!

Page 35: Acts As Recommendable

{ 1 => { 1 => 1.0, 2 => 0.0, ... }, ...}

Page 36: Acts As Recommendable

Solution

Remove unnecessary cruft from dataset

Page 37: Acts As Recommendable

{ 1 => { 1 => 1.0, ... }, ...}
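
Comparing this hash with the earlier one, the cruft appears to be the zero scores. A minimal sketch of that pruning, assuming zero entries carry no information; `prune` and the sample data are hypothetical.

```ruby
# Hypothetical dataset in the nested-hash shape from the slides.
dataset = {
  1 => { 1 => 1.0, 2 => 0.0, 3 => 0.5 },
  2 => { 1 => 0.0, 2 => 0.0 }
}

# Drop zero scores, then drop any outer entries left empty.
def prune(dataset)
  dataset
    .map { |id, scores| [id, scores.reject { |_, score| score.zero? }] }
    .reject { |_, scores| scores.empty? }
    .to_h
end

p prune(dataset)
```

Anything that later reads the pruned hash has to treat a missing key as a score of 0.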

Page 38: Acts As Recommendable

Problem 5

It was too slow

Page 39: Acts As Recommendable

Solution

Re-write the slow bits in C

Page 40: Acts As Recommendable

Details

• RubyInline

• Implemented Pearson

• Monkey patched original Ruby methods

• Very fast
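
For reference, a pure-Ruby Pearson correlation might look like the sketch below. It is not the plugin's code: the signature is an assumption, and treating a missing score as 0.0 is a guess at how a sparse dataset would be handled.

```ruby
# Pearson correlation between two { id => score } hashes, compared over
# the ids in `items`. A missing score counts as 0.0 (an assumption).
def sim_pearson(prefs1, prefs2, items)
  n = items.size
  return 0.0 if n.zero?

  xs = items.map { |id| prefs1.fetch(id, 0.0).to_f }
  ys = items.map { |id| prefs2.fetch(id, 0.0).to_f }

  sum_x = xs.sum
  sum_y = ys.sum
  sum_xy = xs.zip(ys).sum { |x, y| x * y }

  # Numerator: covariance term. Denominator: product of the spread of each side.
  num = sum_xy - (sum_x * sum_y / n)
  var_x = xs.sum { |x| x * x } - sum_x**2 / n
  var_y = ys.sum { |y| y * y } - sum_y**2 / n

  # No spread on either side means the correlation is undefined; report 0.0.
  return 0.0 if var_x <= 0 || var_y <= 0

  num / Math.sqrt(var_x * var_y)
end
```

The result runs from -1 (opposite tastes) to 1 (identical tastes), regardless of whether one user rates consistently higher than the other.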

Page 41: Acts As Recommendable

Ruby Object

InlineC = Module.new do
  inline do |builder|
    builder.c '
      #include <math.h>
      #include "ruby.h"
      double c_sim_pearson(VALUE items) {

Page 42: Acts As Recommendable

No Floats :(

InlineC = Module.new do
  inline do |builder|
    builder.c '
      #include <math.h>
      #include "ruby.h"
      double c_sim_pearson(VALUE items) {

Page 43: Acts As Recommendable

Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}

Page 44: Acts As Recommendable

Conversion

return num / den;

Page 45: Acts As Recommendable

Design Decisions

• Not too many relationships

• Not too many ‘items’

• Similarity matrix for items, not users

Page 46: Acts As Recommendable

Changing data

Page 47: Acts As Recommendable

Scaling Even Further

• K Means clustering

• Split cluster by category
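
A minimal k-means sketch, to show the idea of grouping nearby points so that similarities only need computing within a group. This is not from the plugin. The starting centroids are passed in explicitly to keep the example deterministic (real implementations usually pick them at random), and all names are hypothetical.

```ruby
# Euclidean distance between two points of equal dimension.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Coordinate-wise mean of a non-empty list of points.
def mean(points)
  points.transpose.map { |coords| coords.sum.to_f / coords.size }
end

# Returns { centroid index => [points] } after a fixed number of rounds.
def k_means(points, centroids, iterations = 10)
  clusters = {}
  iterations.times do
    # Assign each point to its nearest centroid.
    clusters = points.group_by do |point|
      centroids.each_index.min_by { |i| distance(point, centroids[i]) }
    end
    # Move each centroid to the mean of its cluster (unchanged if empty).
    centroids = centroids.each_with_index.map do |centroid, i|
      clusters[i] ? mean(clusters[i]) : centroid
    end
  end
  clusters
end

points = [[1, 1], [1, 2], [5, 5], [6, 5]]
p k_means(points, [[0, 0], [6, 6]])
```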

Page 48: Acts As Recommendable

Adding ratings

ActiveRecord::Schema.define(:version => 1) do
  create_table "books", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "user_books", :force => true do |t|
    t.integer "user_id", :null => false
    t.integer "book_id", :null => false
    t.integer "rating",  :default => 0
  end

  create_table "users", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end
end

Page 49: Acts As Recommendable

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end

Page 50: Acts As Recommendable

That’s it

Page 51: Acts As Recommendable

Improvements?

• Better API

• Perform calculations over a cluster (EC2) using Map/Nanite

Page 52: Acts As Recommendable

class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end

Page 53: Acts As Recommendable

http://eribium.org/blog

twitter: maccman

email/jabber: [email protected]

Questions?

http://rubyurl.com/kUpk

http://github.com/maccman/acts_as_recommendable

