HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

transcript

Real-Time Model Scoring in Recommender Systems

Juliet Hougland and Jonathan Natkins

Why Real-Time?

Who Are We?

•  Jon "Natty" Natkins (@nattyice)
o  Field Engineer at WibiData
o  Before that, Cloudera Software Engineer
o  Before that, Vertica Software/Field Engineer
•  Juliet Hougland (@JulietHougland)
o  Platform Engineer at WibiData
o  MS in Applied Math and BA in Math-Physics

What is Kiji?

The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.

•  kiji.org
•  github.com/kijiproject

Generating Recommendations

Modeling with KijiMR

Producers
•  Operate on a single row in a table.
•  Generate derived data:
o  Apply a classifier
o  Assign a user to a cluster or segment
o  Recommend new items

Gatherers
•  Mapper with KijiTable input.
•  Used when training models.

Batch Isn't Good For Everything

Fresheners Compute Lazily

•  The client reads a column through the KijiScoring API.
•  KijiScoring gets the value from HBase.
•  The Freshness Policy asks: fresh?
o  Yes: return to client.
o  No: freshen with a Producer, then cache the result in HBase for next time.
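
The read path on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the KijiScoring API: the class `FreshenedStore`, its `max_age_seconds` policy, and the in-memory dictionary standing in for HBase are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FreshenedStore:
    """Toy stand-in for a table read path with lazy freshening (hypothetical)."""

    def __init__(self, max_age_seconds: float,
                 producer: Callable[[str], float],
                 clock: Callable[[], float] = time.time) -> None:
        # row key -> (cached value, time it was written); stands in for HBase
        self._cells: Dict[str, Tuple[float, float]] = {}
        self._max_age = max_age_seconds
        self._producer = producer
        self._clock = clock

    def _is_fresh(self, written_at: float) -> bool:
        # Assumed freshness policy: a value is fresh while younger than max_age.
        return self._clock() - written_at <= self._max_age

    def read(self, row: str) -> float:
        cell = self._cells.get(row)
        if cell is not None and self._is_fresh(cell[1]):
            return cell[0]                      # fresh: return to client
        value = self._producer(row)             # stale or missing: freshen
        self._cells[row] = (value, self._clock())  # cache for next time
        return value
```

The producer only runs when a client actually asks for a stale value, which is the sense in which the computation is lazy.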

How can we make "freshenable" models?

Population interests change slowly

Individual interests change quickly

Population interests change slowly

Models don't need to be retrained frequently

Application of a model should be fast

•  Train a model over your entire data set
•  Save fitted model parameters to a file, or another table
•  Access the model parameters through a KeyValueStore when scoring new data with a producer.
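
As a rough sketch of this split between slow training and fast scoring, the example below fits a deliberately trivial model in a batch step, saves its parameters to a file, and reads them back through a dictionary at scoring time. The dictionary only plays the role of a KeyValueStore; the function names, the JSON file, and the "mean rating" model are assumptions made for illustration and are not KijiMR code.

```python
import json
from typing import Dict, List


def train(ratings: List[float]) -> Dict[str, float]:
    """Batch step: fit a trivial model whose only parameter is the mean rating."""
    return {"mean_rating": sum(ratings) / len(ratings)}


def save_parameters(params: Dict[str, float], path: str) -> None:
    """Persist the fitted parameters so scoring never has to retrain."""
    with open(path, "w") as handle:
        json.dump(params, handle)


def open_store(path: str) -> Dict[str, float]:
    """Load the parameters into a dict that stands in for a key-value store."""
    with open(path) as handle:
        return json.load(handle)


def score(rating: float, store: Dict[str, float]) -> float:
    """Scoring step: a cheap lookup plus arithmetic, with no pass over the data."""
    return rating - store["mean_rating"]
```

Scoring touches only the stored parameters, so its cost does not grow with the size of the training set.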

More Modeling with KijiMR

KeyValueStores
•  Allow access to external data in Producers and Gatherers.
•  Support various file formats as well as tables.
•  Make joining datasets together very easy.
•  The mechanism for accessing fitted model parameters when freshening.

KijiShopping

•  A real-time product recommendation system
•  Content-based model using product descriptions and TF-IDF

Architecture diagram: Users; KijiShopping Web Application; KijiSchema (Avro, HBase); KijiMR (MapReduce); KijiScoring

KijiShopping Data Collection

•  User Logins
•  Product Information
o  Names, descriptions, SKU information
•  User Ratings
o  Explicit ratings from users

How do we go from data to recommendations?

Finding Useful Features

•  TF-IDF

TF-IDF

•  Term Frequency
o  How often does this term appear in this document?
•  Document Frequency
o  How many documents does this term appear in?
•  TF-IDF
o  How important is this term to this document?
•  In KijiShopping, each is a separate job

Computing Term Frequency

•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  WordCount on a per-record basis

Read Product Description → Count Words in Product Description → Write Word Counts Back
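
The per-record word count can be sketched as a single function applied to one product description at a time. This is plain Python rather than a KijiMR Producer; the function name and the tokenization rule (lowercase runs of letters, digits and apostrophes) are assumptions, since the slides do not say how descriptions are split into words.

```python
import re
from collections import Counter


def term_frequencies(description: str) -> Counter:
    """Count how often each word appears in one product description."""
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", description.lower())
    return Counter(words)
```

Because each call looks at a single record, the step needs no reduce phase, which matches the Map-only job described above.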

Computing Document Frequency

•  Written as a Gatherer
o  Executed on the Product table as a MapReduce job
o  Groups by words

Map: Read Term Frequencies → Emit (Word, 1)
Reduce: Group By Word → Write Document Frequencies (HDFS)
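
The map and reduce steps for document frequency can be imitated in memory as follows. This is a sketch in plain Python, not a KijiMR Gatherer: `gather`, `reduce_counts`, and the dictionary standing in for the Product table are hypothetical names for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def gather(term_frequencies: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map side: emit (word, 1) once for each distinct word in one product row."""
    for word in term_frequencies:
        yield (word, 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce side: group by word and sum the ones."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def document_frequencies(product_table: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Run the map step over every row, then the reduce step over all pairs."""
    pairs = (pair for tf in product_table.values() for pair in gather(tf))
    return reduce_counts(pairs)
```

Each product contributes at most one count per word, so the result is the number of products whose description contains the word.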

Computing TF-IDF

•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  Pulls in Document Frequencies as a KVStore

Read Term Frequencies → Read Document Frequencies via KVStore → Divide TF by DF → Write TF-IDFs Back
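
A sketch of the "divide TF by DF" step for one product row is shown below. It follows the slide literally; many TF-IDF variants instead multiply by a logarithm of the inverse document frequency, and nothing here claims to be the exact formula KijiShopping uses. The function name is hypothetical, and a plain dictionary stands in for the KVStore lookup.

```python
from typing import Dict


def tf_idf(term_frequencies: Dict[str, int],
           document_frequencies: Dict[str, int]) -> Dict[str, float]:
    """Weight each word in one product by its count divided by its document frequency."""
    weights: Dict[str, float] = {}
    for word, count in term_frequencies.items():
        df = document_frequencies.get(word, 0)  # stands in for a KVStore lookup
        if df > 0:                              # skip words missing from the DF data
            weights[word] = count / df
    return weights
```

Words that appear in many products get a large divisor and therefore a small weight, so rarer words dominate a product's profile.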

Associating Words with Products

•  Batch training process
•  Associations stored in a model table

Diagram labels: gourmet; "gourmet" Products; "knife" Products; tfidf_gourmet; tfidf_knife

Determine a User's Preferred Words

•  Stored in a user table

Diagram labels: gourmet; w_gourmet; w_knife

Combining User Ratings and Models

•  Producers incorporate models using KeyValueStores

Diagram labels: gourmet; "gourmet" Products; "knife" Products; w_gourmet; w_knife; tfidf_gourmet; tfidf_knife

Generating a Recommendation

•  Pick the best products for your user
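
One plausible way to pick the best products is sketched below. The slides show per-user word weights (w) and per-product TF-IDF weights but do not state how they are combined, so the scoring rule here, a sum of w times TF-IDF over the user's preferred words, is an assumption, as are the function name and the nested-dictionary layout of the model table.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def recommend(user_weights: Dict[str, float],
              word_to_products: Dict[str, Dict[str, float]],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Score products by summing (user word weight * product TF-IDF), keep the top N."""
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_weights.items():
        # Model table lookup: products associated with this word and their TF-IDF.
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] += weight * tfidf
    return heapq.nlargest(top_n, scores.items(), key=lambda item: item[1])
```

Only products that share at least one word with the user's preferences are scored, which keeps the work per request small enough to run when a recommendation is freshened.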

KijiShopping

The model was built with KijiMR, an extension of Hadoop MapReduce.

KijiExpress Modeling Lifecycle

Want to know more?

•  The Kiji Project
o  kiji.org
o  github.com/kijiproject
•  KijiShopping
o  github.com/wibidata/kiji-shopping
•  Questions about this presentation?
o  juliet@wibidata.com
o  natty@wibidata.com

•  Come see us at the WibiData booth

•  Join us at KijiCon tomorrow
