HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Posted on 10-May-2015


Presented by: Jonathan Natkins (WibiData) and Juliet Hougland (WibiData)


Real-Time Model Scoring in Recommender Systems

(c) 2013 WibiData, Inc.

Juliet Hougland and Jonathan Natkins

Why Real-Time?

Who Are We?
•  Jon "Natty" Natkins (@nattyice)
  o  Field Engineer at WibiData
  o  Before that, Cloudera Software Engineer
  o  Before that, Vertica Software/Field Engineer
•  Juliet Hougland (@JulietHougland)
  o  Platform Engineer at WibiData
  o  MS in Applied Math and BA in Math-Physics

What is Kiji?
The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze, and use data in real-time applications.
•  kiji.org
•  github.com/kijiproject

Generating Recommendations


Modeling with KijiMR

Producers
•  Operate on a single row in a table.
•  Generate derived data:
  o  Apply a classifier
  o  Assign a user to a cluster or segment
  o  Recommend new items

Gatherers
•  Mapper with KijiTable input.
•  Used when training models.
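To make the producer/gatherer distinction concrete, here is a minimal plain-Python sketch. It is not the KijiMR API (which is Java); the dict-of-dicts "table", `run_producer`, and `run_gatherer` are hypothetical stand-ins. A producer derives new columns for one row and writes them back to that row; a gatherer walks the rows and emits key/value pairs that are grouped for a reducer, which is how training inputs get collected.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a Kiji table: row key -> {column name: value}.
Table = Dict[str, Dict[str, object]]


def run_producer(table: Table, produce: Callable[[Dict[str, object]], Dict[str, object]]) -> None:
    """Map-only: derive new columns from each row and write them back to that same row."""
    for row in table.values():
        row.update(produce(row))


def run_gatherer(table: Table,
                 gather: Callable[[Dict[str, object]], Iterable[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Map over every row; group the emitted (key, value) pairs by key, ready for a reducer."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for row in table.values():
        for key, value in gather(row):
            grouped[key].append(value)
    return dict(grouped)


users: Table = {
    "natty": {"rating_count": 12},
    "juliet": {"rating_count": 3},
}

# Producer: assign each user to a segment, stored as a derived column on that user's row.
run_producer(users, lambda row: {"segment": "heavy" if row["rating_count"] >= 10 else "light"})

# Gatherer plus a summing reduce: count users per segment, e.g. as input to training.
grouped_segments = run_gatherer(users, lambda row: [(row["segment"], 1)])
segment_counts = {seg: sum(ones) for seg, ones in grouped_segments.items()}
```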


Batch Isn't Good For Everything


Fresheners Compute Lazily

•  The client reads a column through the KijiScoring API.
•  KijiScoring gets the stored value from HBase.
•  The value is checked against the freshness policy:
  o  Fresh? Yes: return it to the client.
  o  Not fresh: freshen it with a producer, and cache the result for next time.
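The lazy read path above can be sketched as follows, assuming an age-based freshness policy (a value counts as fresh if it was written within `max_age_s` seconds). `LazyFreshener`, the in-memory `store` standing in for HBase, and the injected clock are hypothetical; KijiScoring's actual classes and policy interfaces differ.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for one HBase-backed column: row key -> (value, write timestamp in seconds).
Cell = Tuple[object, float]


class LazyFreshener:
    """Read-time freshening: recompute a column only when its cached value is stale or missing."""

    def __init__(self, store: Dict[str, Cell], producer: Callable[[str], object],
                 max_age_s: float, clock: Callable[[], float] = time.time):
        self.store = store          # cached column values ("Get from HBase")
        self.producer = producer    # recomputes the value for one row
        self.max_age_s = max_age_s  # assumed policy: accept values younger than this
        self.clock = clock
        self.producer_calls = 0

    def is_fresh(self, cell: Optional[Cell]) -> bool:
        return cell is not None and self.clock() - cell[1] <= self.max_age_s

    def read(self, row_key: str) -> object:
        cell = self.store.get(row_key)
        if self.is_fresh(cell):
            return cell[0]                              # fresh: return the stored value
        value = self.producer(row_key)                  # stale or missing: run the producer
        self.producer_calls += 1
        self.store[row_key] = (value, self.clock())     # cache for next time
        return value


now = [1000.0]  # mutable fake clock so the example is deterministic
store: Dict[str, Cell] = {"natty": (["old-rec"], 0.0)}
freshener = LazyFreshener(store, producer=lambda key: ["knife-123"], max_age_s=60.0, clock=lambda: now[0])
first = freshener.read("natty")   # written 1000 s ago, so stale: the producer runs
second = freshener.read("natty")  # just cached, so fresh: served from the store
now[0] = 1100.0
third = freshener.read("natty")   # 100 s old exceeds 60 s, so it is recomputed
```

The expensive work happens only on reads that need it, which is the point of computing lazily rather than rescoring every row in a batch.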

How can we make "freshenable" models?

Population interests change slowly: models don't need to be retrained frequently.

Individual interests change quickly: application of a model should be fast.
•  Train a model over your entire data set.
•  Save the fitted model parameters to a file, or another table.
•  Access the model parameters through a KeyValueStore when scoring new data with a producer.
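A minimal sketch of this split between slow training and fast application. It assumes a deliberately simple population model (mean rating per product), not the KijiShopping model; the fitted parameters are saved to a JSON file and loaded into a dict that stands in for a KeyValueStore. `train`, `save_params`, `load_params`, and `score_user` are hypothetical names.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple


def train(all_ratings: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Batch step over the entire data set: mean rating per product (a slowly changing model)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _user, product, rating in all_ratings:
        totals[product] += rating
        counts[product] += 1
    return {p: totals[p] / counts[p] for p in totals}


def save_params(params: Dict[str, float], path: str) -> None:
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path: str) -> Dict[str, float]:
    """Stand-in for opening a KeyValueStore: load fitted parameters once, then look them up per row."""
    with open(path) as f:
        return json.load(f)


def score_user(user_rated: Dict[str, float], params: Dict[str, float]) -> str:
    """Fast per-row step: recommend the best-rated product this user has not rated yet."""
    candidates = {p: mean for p, mean in params.items() if p not in user_rated}
    return max(candidates, key=lambda p: candidates[p])


ratings = [("natty", "knife", 5.0), ("juliet", "knife", 4.0),
           ("juliet", "pan", 2.0), ("natty", "whisk", 3.0)]
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_params(train(ratings), path)
params = load_params(path)
rec = score_user({"whisk": 3.0}, params)  # knife (mean 4.5) beats pan (mean 2.0)
```

Training touches every rating, but scoring one user is a handful of dictionary lookups, which keeps per-request scoring cheap enough to run inside a freshener.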

More Modeling with KijiMR

KeyValueStores
•  Allow access to external data in Producers and Gatherers.
•  Support various file formats as well as tables.
•  Make joining datasets together very easy.
•  Are the mechanism for accessing fitted model parameters when freshening.

KijiShopping

•  A real-time product recommendation system
•  Content-based model using product descriptions and TF-IDF

Components: users, the KijiShopping web application, KijiSchema (Avro, HBase), KijiMR (MapReduce), and KijiScoring.

KijiShopping Data Collection

•  User logins
•  Product information
  o  Names, descriptions, SKU information
•  User ratings
  o  Explicit ratings from users

How do we go from data to recommendations?

Finding Useful Features

•  TF-IDF

TF-IDF

•  Term Frequency
  o  How often does this term appear in this document?
•  Document Frequency
  o  How many documents does this term appear in?
•  TF-IDF
  o  How important is this term to this document?
•  In KijiShopping, each is a separate job.
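For reference, the three quantities can be written out as below, for a term t, a document d, and a collection D of N documents. The "Computing TF-IDF" step in these slides divides TF by DF; the more common textbook weighting multiplies TF by log(N/df) instead. Both reward terms that are frequent in a document but rare across the collection.

```latex
\begin{aligned}
\mathrm{tf}(t, d) &= \text{number of occurrences of } t \text{ in } d \\
\mathrm{df}(t) &= \left|\{\, d \in D : t \in d \,\}\right| \\
\mathrm{tfidf}_{\text{slides}}(t, d) &= \frac{\mathrm{tf}(t, d)}{\mathrm{df}(t)} \\
\mathrm{tfidf}_{\log}(t, d) &= \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
\end{aligned}
```

For example, a term that appears twice in one product description and in 4 descriptions overall scores 2/4 = 0.5 under the division form.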

Computing Term Frequency

•  Written as a Producer
  o  Executed on the Product table as a map-only job
  o  WordCount on a per-record basis

HBase: Read Product Description → Count Words in Product Description → Write Word Counts Back
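A plain-Python sketch of the per-record WordCount: for each product row, tokenize the description and store the counts back on the same row. The regex tokenizer, the `description` and `term_freqs` column names, and `tf_producer` are assumptions for illustration; the slides do not say how descriptions were tokenized.

```python
import re
from collections import Counter
from typing import Dict

# Assumed tokenizer: lowercase runs of ASCII letters (the slides don't specify one).
TOKEN = re.compile(r"[a-z]+")


def term_frequencies(description: str) -> Dict[str, int]:
    """Per-record WordCount: occurrences of each term in one product description."""
    return dict(Counter(TOKEN.findall(description.lower())))


def tf_producer(products: Dict[str, dict]) -> None:
    """Map-only pass: read each row's description and write the word counts back to that row."""
    for row in products.values():
        row["term_freqs"] = term_frequencies(row["description"])


products = {"sku-1": {"description": "Gourmet chef knife, a knife for gourmet cooks"}}
tf_producer(products)
```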

Computing Document Frequency

•  Written as a Gatherer
  o  Executed on the Product table as a MapReduce job
  o  Groups by words

Map: Read Term Frequencies (HBase) → Emit (Word, 1)
Reduce: Group By Word → Write Document Frequencies (HDFS)
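The same job as explicit map, shuffle, and reduce steps, in plain Python. Because each product's term-frequency map has one key per distinct word, emitting `(word, 1)` per key counts each document at most once per word, which is exactly document frequency. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def df_map(term_freqs: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) once per distinct word in this product's term frequencies."""
    for word in term_freqs:
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group by word, as the MapReduce framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def df_reduce(word: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce: the number of documents that contain the word."""
    return word, sum(ones)


def document_frequencies(all_term_freqs: Iterable[Dict[str, int]]) -> Dict[str, int]:
    pairs = (pair for tf in all_term_freqs for pair in df_map(tf))
    return dict(df_reduce(word, ones) for word, ones in shuffle(pairs).items())


all_tfs = [
    {"gourmet": 2, "knife": 2},
    {"knife": 1, "block": 1},
    {"gourmet": 1, "pan": 3},
]
df = document_frequencies(all_tfs)
```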

Computing TF-IDF

•  Written as a Producer
  o  Executed on the Product table as a map-only job
  o  Pulls in Document Frequencies as a KVStore

HBase: Read Term Frequencies → Divide TF by DF → Write TF-IDFs Back
HDFS: Read Document Frequencies via KVStore
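A sketch of the TF-IDF producer in plain Python. Rows are modeled as dicts with a hypothetical `term_freqs` column, and a plain mapping stands in for the document-frequency KVStore. It follows the slide's "Divide TF by DF"; using `count * math.log(n_docs / df_store[word])` instead would give the log-scaled variant.

```python
from typing import Dict, Mapping


def tfidf_producer(products: Dict[str, dict], df_store: Mapping[str, int]) -> None:
    """Map-only pass over product rows; df_store stands in for the document-frequency KVStore."""
    for row in products.values():
        # "Divide TF by DF", as on the slide.
        row["tfidf"] = {word: count / df_store[word] for word, count in row["term_freqs"].items()}


products = {
    "sku-1": {"term_freqs": {"gourmet": 2, "knife": 2}},
    "sku-2": {"term_freqs": {"knife": 1, "block": 1}},
}
doc_freqs = {"gourmet": 1, "knife": 2, "block": 1}
tfidf_producer(products, doc_freqs)
```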

Associating Words with Products

•  Batch training process
•  Associations stored in a model table

gourmet → "gourmet" products, weighted by tfidf_gourmet
knife → "knife" products, weighted by tfidf_knife
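Associating words with products amounts to inverting the per-product TF-IDF maps into an index keyed by word, which is what the model table holds in this picture. A minimal sketch; `associate_words` and the nested-dict layout are assumptions, not the KijiShopping table schema.

```python
from collections import defaultdict
from typing import Dict


def associate_words(product_tfidf: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Batch step: invert product -> {word: tfidf} into word -> {product: tfidf}."""
    model: Dict[str, Dict[str, float]] = defaultdict(dict)
    for product, weights in product_tfidf.items():
        for word, weight in weights.items():
            model[word][product] = weight
    return dict(model)


product_tfidf = {
    "sku-1": {"gourmet": 2.0, "knife": 1.0},
    "sku-2": {"knife": 0.5, "block": 1.0},
}
model = associate_words(product_tfidf)
```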

Determine a User's Preferred Words

•  Stored in a user table

Natty → gourmet (weight w_gourmet), knife (weight w_knife)
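The slides do not say how per-word weights such as w_gourmet are computed. One plausible approach, shown here purely as a hypothetical sketch, credits each word by how far the user's rating of a product sits above or below that user's mean rating, scaled by the word's TF-IDF in that product. `preferred_words` and its inputs are illustrative names.

```python
from collections import defaultdict
from typing import Dict


def preferred_words(user_ratings: Dict[str, float],
                    product_tfidf: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical weighting: sum of (rating - user's mean rating) * tfidf over rated products."""
    if not user_ratings:
        return {}
    mean = sum(user_ratings.values()) / len(user_ratings)
    weights: Dict[str, float] = defaultdict(float)
    for product, rating in user_ratings.items():
        for word, tfidf in product_tfidf.get(product, {}).items():
            weights[word] += (rating - mean) * tfidf
    return dict(weights)


product_tfidf = {
    "sku-1": {"gourmet": 2.0, "knife": 1.0},
    "sku-2": {"knife": 0.5, "block": 1.0},
}
# Loved sku-1, disliked sku-2: "gourmet" gains weight, "block" goes negative.
natty_weights = preferred_words({"sku-1": 5.0, "sku-2": 1.0}, product_tfidf)
```

Centering on the user's mean means a product rated below that user's average pushes its words down, so a generous rater and a harsh rater end up with comparable weights.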

Combining User Ratings and Models

•  Producers incorporate models using KeyValueStores

Natty → gourmet (w_gourmet) → "gourmet" products (tfidf_gourmet)
Natty → knife (w_knife) → "knife" products (tfidf_knife)

Generating a Recommendation

•  Pick the best products for your user
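Putting the pieces together: one natural reading of the combining diagram is to score each product as the sum, over the user's words, of the user's word weight times the product's TF-IDF for that word, then keep the top k products the user has not already rated. This scoring rule is an assumption, sketched in plain Python with a hypothetical `recommend` function.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set


def recommend(user_weights: Dict[str, float],
              model: Dict[str, Dict[str, float]],
              already_rated: Set[str],
              k: int = 3) -> List[str]:
    """Score each unrated product by sum(w_word * tfidf_word) and return the k best."""
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_weights.items():
        for product, tfidf in model.get(word, {}).items():
            if product not in already_rated:
                scores[product] += weight * tfidf
    return heapq.nlargest(k, scores, key=lambda product: scores[product])


user_weights = {"gourmet": 4.0, "knife": 1.0}
model = {
    "gourmet": {"sku-1": 2.0, "sku-3": 0.5},
    "knife": {"sku-1": 1.0, "sku-2": 0.5, "sku-4": 3.0},
}
top = recommend(user_weights, model, already_rated={"sku-1"}, k=2)
```

`heapq.nlargest` avoids sorting every candidate when only a few recommendations are needed.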

KijiShopping

The model was built with KijiMR, an extension of Hadoop MapReduce.


KijiExpress Modeling Lifecycle

Want to know more?

•  The Kiji Project
  o  kiji.org
  o  github.com/kijiproject
•  KijiShopping
  o  github.com/wibidata/kiji-shopping

Questions about this presentation?
  o  juliet@wibidata.com
  o  natty@wibidata.com

•  Come see us at the WibiData booth
•  Join us at KijiCon tomorrow