Date posted: 11-Apr-2017
Category: Technology
Uploaded by: codemotion
A recommendation engine for your apps
Definition: a system that helps people find things when finding what you need is hard because there are a lot of choices/alternatives
So… it’s a search engine!
Search Engines
Document base is (almost) static
Queries are dynamic
Search Engines
Create an index analysing the documents
Calculate relevance for a query: tf*idf
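To make the tf*idf relevance score concrete, here is a minimal sketch. It assumes whitespace tokenisation, term frequency normalised by document length, and a natural-log idf; real search engines use smoothed variants, and the function name `tf_idf_scores` is purely illustrative.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query by summing tf*idf over the query terms."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0 or not tokens:
                continue  # unknown term or empty document contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores


docs = ["the cat sat", "the dog barked", "the cat and the dog"]
print(tf_idf_scores("cat", docs))
```

A term that appears in every document (such as "the" here) gets an idf of zero, so it does not influence the ranking.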
Recommender systems
Document base is growing (e.g. Netflix)
Query is static: find something I like
Classification
Domain: news, products, …
Helps define what can be suggested
Purpose: sales, information, education, building a community
What is TripAdvisor’s purpose?
Personalisation levels
• Non personalised: best sellers
• Demographic: age, location
• Ephemeral: based on current activities
• Persistent
Types of input
• Explicit: ask user to rate something
• Implicit: inferred from user behaviour
Output
• Prediction: predicted rating, evaluation
• Recommendations: suggestion list, top-n, offers, promotion
• Filtering: email filters, news articles
A model for comparison
Users: people with preferences
Items: the subjects of ratings
Ratings: expressions of opinion
(Community: the space where opinions make sense)
Non personalised
Best seller
Most popular
Trending
Summary of community ratings: e.g. the best hotel in town
        Hotel A   Hotel B   Hotel C
John    3         5         -
Jane    -         3         -
Fred    -         1         0
Tom     4         -         -
AVG     3.5       3         0
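The averaging behind a non-personalised "best hotel in town" list can be sketched as follows. The placement of each rating in a hotel column is an assumption inferred from the AVG row (it is the only layout that yields 3.5, 3 and 0), and the function names are illustrative.

```python
# Ratings per hotel; cell placement is inferred from the AVG row (assumption).
hotel_ratings = {
    "Hotel A": {"John": 3, "Tom": 4},
    "Hotel B": {"John": 5, "Jane": 3, "Fred": 1},
    "Hotel C": {"Fred": 0},
}


def average_ratings(ratings_by_item):
    """Mean rating per item, skipping items nobody has rated."""
    return {
        item: sum(ratings.values()) / len(ratings)
        for item, ratings in ratings_by_item.items()
        if ratings
    }


def best_rated(ratings_by_item, n=1):
    """The n items with the highest average rating, best first."""
    averages = average_ratings(ratings_by_item)
    return sorted(averages, key=averages.get, reverse=True)[:n]


print(average_ratings(hotel_ratings))
print(best_rated(hotel_ratings, n=2))
```

Every visitor sees the same list, which is what makes this approach non-personalised.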
Content based
Users rate items
We build a model of the user’s preferences
Look for similar items based on the model
Action 0.7
Sci Fi 3.2
Vin Diesel 1.2
… …
https://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X
http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine
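A content-based recommender can be sketched as scoring each item by how well its features match the user's preference model. The profile weights below are the ones from the slide; the catalogue of films and their features is hypothetical, and summing the matching weights is only one simple way to score.

```python
# Preference model from the slide: feature -> weight learned from the user's ratings.
user_profile = {"Action": 0.7, "Sci Fi": 3.2, "Vin Diesel": 1.2}

# Hypothetical catalogue: item -> set of content features.
catalogue = {
    "Film X": {"Action", "Vin Diesel"},
    "Film Y": {"Sci Fi"},
    "Film Z": {"Romance"},
}


def score(profile, features):
    """Sum the profile weights of the features the item has."""
    return sum(profile.get(feature, 0.0) for feature in features)


def rank_items(profile, items):
    """Items ordered from best to worst match with the profile."""
    return sorted(items, key=lambda name: score(profile, items[name]), reverse=True)


print(rank_items(user_profile, catalogue))
```

Because the score only uses features the user has already reacted to, an item with unfamiliar features (Film Z here) always ranks last.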
Problems/Limitations
Need to know the items’ content
User cold start: it takes time to learn which features matter to the user
What if the user’s interests change?
Lack of serendipity: you rarely discover by accident something you like
Collaborative filtering
No need to analyse (index) content
Can capture more subtle things
Serendipity
User-User
Select the people in my neighbourhood, i.e. those with similar taste. If other people share my taste, I want their opinions combined
[Figure: user-item rating matrix with users (Joe, Tom, …) as rows and items (E.T, …) as columns; one of Joe’s ratings is unknown (?)]
User-User: which users have similar tastes?
Item-Item
Find an item on which I have expressed an opinion and look at how other people felt about it. Precompute the similarities between items
[Figure: user-item rating matrix with users (Joe, Tom, …) as rows and items (E.T, …) as columns]
Item-Item: which items are similar?
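Precomputing item-item similarities might look like the sketch below. The ratings are hypothetical, and cosine similarity over each item's rating column (with a missing rating counted as 0) is an assumed choice of measure; other measures fit the same structure.

```python
from itertools import combinations
from math import sqrt

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "Joe": {"A": 5, "B": 4},
    "Tom": {"A": 4, "B": 5, "C": 1},
    "Ann": {"B": 2, "C": 5},
}


def item_columns(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    columns = {}
    for user, rated in ratings.items():
        for item, value in rated.items():
            columns.setdefault(item, {})[user] = value
    return columns


def cosine(a, b):
    """Cosine similarity of two sparse rating columns (missing rating = 0)."""
    dot = sum(a[user] * b[user] for user in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def precompute_item_similarities(ratings):
    """Similarity for every pair of items, stored in both directions."""
    columns = item_columns(ratings)
    sims = {}
    for i, j in combinations(sorted(columns), 2):
        sims[(i, j)] = sims[(j, i)] = cosine(columns[i], columns[j])
    return sims


def most_similar(sims, item, n=1):
    """The n items most similar to the given one, best first."""
    candidates = [(s, other) for (i, other), s in sims.items() if i == item]
    return [other for s, other in sorted(candidates, reverse=True)[:n]]


sims = precompute_item_similarities(ratings)
print(most_similar(sims, "A"))
```

The table of similarities is built offline; at request time only the lookup in `most_similar` runs, which is why the precomputation helps.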
Problems/Limitations
Sparsity
When recommending from a large item set, users will have rated only some of the items
User Cold start
Not enough is known about a new user to decide who is similar
Item cold start
Cannot predict ratings for a new item until some similar users have rated it [not a problem for content-based]
Scalability
With millions of ratings, computations become slow
Dimensionality reduction FTW!
An example
Item1 Item2 Item3 Item4 Item5
Joe 8 1 ? 2 7
Tom 2 ? 5 7 5
Alice 5 4 7 4 7
Bob 7 1 7 3 8
How similar are Joe and Tom? How similar are Joe and Bob?
Only consider items both users have rated
For each item:
- Compute the difference between the two users’ ratings
- Take the average of these differences over the items
d(Joe, Tom) = (|8-2| + |2-7| + |7-5|)/3 = 13/3 ≈ 4.3
d(Joe, Alice) = (|8-5| + |1-4| + |2-4| + |7-7|)/4 = 8/4 = 2
d(Joe, Bob) = (|8-7| + |1-1| + |2-3| + |7-8|)/4 = 3/4 = 0.75
A smaller d means more similar tastes.
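The calculation above can be reproduced with a short sketch. The ratings come from the example table; the function name `mean_abs_difference` is illustrative.

```python
# Ratings from the example table; an unrated item is simply absent.
ratings = {
    "Joe":   {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom":   {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob":   {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}


def mean_abs_difference(a, b):
    """Average absolute rating difference over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return None  # nothing to compare
    return sum(abs(a[item] - b[item]) for item in common) / len(common)


for other in ("Tom", "Alice", "Bob"):
    print(other, mean_abs_difference(ratings["Joe"], ratings[other]))
```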
Distance (lower means more similar)
Bob 0.75
Alice 2
Tom 4.3
Convert each distance d into a similarity D:
D = 1 / (1 + d)
Similarity
Bob 0.57
Alice 0.33
Tom 0.19
Recommend what similar users have rated highly
To calculate the rating of an item to recommend, weight each user’s rating by how similar that user is to you.
Rating(Joe, Item3) = (0.57 * 7 + 0.33 * 7 + 0.19 * 5) / (0.57 + 0.33 + 0.19)
= (3.99 + 2.31 + 0.95) / 1.09 = 7.25 / 1.09 ≈ 6.7
use the entire matrix, or
use a k-NN algorithm: only the people who historically have had the same tastes as me
aggregate using a weighted sum
weights depend on similarity
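The weighted aggregation can be sketched as below, using the example ratings. Two choices here are assumptions rather than something the slides state: the similarity is taken as 1 / (1 + d) with d the mean absolute rating difference, and the weighted sum is normalised by the sum of the weights, which is the usual convention. The optional `k` parameter restricts the calculation to the k most similar users.

```python
ratings = {
    "Joe":   {"Item1": 8, "Item2": 1, "Item4": 2, "Item5": 7},
    "Tom":   {"Item1": 2, "Item3": 5, "Item4": 7, "Item5": 5},
    "Alice": {"Item1": 5, "Item2": 4, "Item3": 7, "Item4": 4, "Item5": 7},
    "Bob":   {"Item1": 7, "Item2": 1, "Item3": 7, "Item4": 3, "Item5": 8},
}


def similarity(a, b):
    """1 / (1 + d), where d is the mean absolute difference on co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    d = sum(abs(a[item] - b[item]) for item in common) / len(common)
    return 1 / (1 + d)


def predict(ratings, user, item, k=None):
    """Similarity-weighted average of the other users' ratings for the item."""
    neighbours = [
        (similarity(ratings[user], rated), rated[item])
        for name, rated in ratings.items()
        if name != user and item in rated
    ]
    neighbours.sort(reverse=True)  # most similar first
    if k is not None:
        neighbours = neighbours[:k]  # k-NN: keep only the closest users
    total_weight = sum(weight for weight, _ in neighbours)
    if total_weight == 0:
        return None  # nobody comparable has rated the item
    return sum(weight * rating for weight, rating in neighbours) / total_weight


print(predict(ratings, "Joe", "Item3"))        # whole matrix
print(predict(ratings, "Joe", "Item3", k=2))   # two nearest neighbours only
```

Dividing by the sum of the weights keeps the prediction inside the range of the neighbours' ratings, whatever the number of neighbours.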
Cosine similarity
[3,5]
[2,7]
[0,0]
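Cosine similarity measures the angle between two rating vectors drawn from the origin [0,0]. A minimal sketch for the vectors [3,5] and [2,7] follows; returning 0.0 for a zero-length vector is an assumed convention, since the angle is undefined in that case.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_u * norm_v)


print(cosine_similarity([3, 5], [2, 7]))
```

The result is close to 1 because the two vectors point in nearly the same direction; unlike the average-difference measure, the length of the vectors does not matter.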
Our domain
Domain: online book shop, both paper and digital
Recommend titles, old and new
- Who bought this also bought
- You might like
Choosing the tool
PredictionIO
Under the Apache umbrella
Based on solid open source stack
Customisable engine templates
SDK for PHP
Installation
http://actionml.com/docs/pio_by_actionml
Pre-baked Amazon AMIs
Installation via source code
http://predictionio.incubator.apache.org/install/install-sourcecode/
You can choose storage
mysql/postgres vs elasticsearch+hbase
The event server
Pattern: user -- action -- item
User 1 purchased product X
User 2 viewed product Y
User 1 added product Z to the cart
$ pio app new MyApp1
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F
$ pio eventserver
Server runs on port 7070 by default
$ curl -i -X GET http://localhost:7070
{"status":"alive"}
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
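Recording a "user -- action -- item" event means sending a JSON document to the event server. The sketch below assumes the Event API field names `event`, `entityType`, `entityId`, `targetEntityType` and `targetEntityId` and a POST to `/events.json`; the helper names `build_event` and `send_event` are illustrative, and `send_event` needs a running event server and a valid access key.

```python
import json
import urllib.parse
import urllib.request

EVENT_SERVER = "http://localhost:7070"  # default port mentioned above


def build_event(action, user_id, item_id):
    """Describe 'user -- action -- item' as an event document."""
    return {
        "event": action,
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(item_id),
    }


def send_event(event, access_key, server=EVENT_SERVER):
    """POST the event to the event server and return its decoded JSON reply."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    request = urllib.request.Request(
        f"{server}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "User 1 purchased product X"
print(json.dumps(build_event("purchase", 1, "X")))
```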
Events modeling
what can/should we model?
rate, like, buy, view, depending on the algorithm
setUser($uid, array $properties=array(), $eventTime=null)
unsetUser($uid, array $properties, $eventTime=null)
deleteUser($uid, $eventTime=null)
setItem($iid, array $properties=array(), $eventTime=null)
unsetItem($iid, array $properties, $eventTime=null)
deleteItem($iid, $eventTime=null)
recordUserActionOnItem($event, $uid, $iid, array $properties=array(), $eventTime=null)
createEvent(array $data)
getEvent($eventId)
Engines
D.A.S.E Architecture
Data Source and Preparation
Algorithm
Serving
Evaluation
$ pio template get apache/incubator-predictionio-template-recommender MyRecommendation
$ cd MyRecommendation
engine.json
"datasource": {
  "params": {
    "appName": "MyApp1",
    "eventNames": ["buy", "view"]
  }
},
$ pio build --verbose
$ pio train
$ pio deploy
Getting recommendations
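Once the engine is deployed it answers queries over HTTP. The sketch below assumes the recommender template's conventions: the engine listens on port 8000, accepts a POST to `/queries.json` with a body such as `{"user": "1", "num": 4}`, and replies with an `itemScores` list. Check these against the template you deployed; the helper names are illustrative.

```python
import json
import urllib.request

ENGINE_URL = "http://localhost:8000/queries.json"  # assumed default deploy address


def build_query(user_id, n):
    """Query body asking for n recommendations for one user."""
    return {"user": str(user_id), "num": n}


def extract_items(response):
    """Pull the item ids out of an engine reply, keeping the engine's order."""
    return [entry["item"] for entry in response.get("itemScores", [])]


def get_recommendations(user_id, n, url=ENGINE_URL):
    """Send the query to the deployed engine (requires a running engine)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_query(user_id, n)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_items(json.load(response))


# Parsing a reply of the assumed shape, without contacting a server:
sample_reply = {"itemScores": [{"item": "22", "score": 4.0}, {"item": "62", "score": 3.9}]}
print(extract_items(sample_reply))
```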
Implementation
2 kinds of suggestions
- who bought this also bought (similarities)
- you may like (recommendation)
View
Like (add to basket, add to wishlist)
Conversion (buy)
Recorded in batch
4 engines
2 for books, 2 for ebooks
(not needed now)
Retrained every night with new data
recordLike($user, array $item)
recordConversion($user, array $item)
recordView($user, array $item)
createUser($uid)
getRecommendation($uid, $itype, $n = self::N_SUGGESTION)
getSimilarity($iid, $itype, $n = self::N_SUGGESTION)
User cold start/item cold start
If we don’t get enough suggestions, switch to non-personalised ones (also for users who are not logged in)
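The fallback can be sketched as topping up the personalised suggestions with a non-personalised list. The function name `with_fallback` and the idea of using a best-seller list as the filler are illustrative assumptions.

```python
def with_fallback(personalised, best_sellers, n):
    """Return n suggestions, filling any gap with best sellers not already listed."""
    suggestions = list(personalised[:n])
    for item in best_sellers:
        if len(suggestions) >= n:
            break
        if item not in suggestions:
            suggestions.append(item)
    return suggestions


# A new or anonymous user has no personalised suggestions at all.
print(with_fallback([], ["book-1", "book-2", "book-3"], 2))
# A user with a single suggestion gets it first, then best sellers.
print(with_fallback(["book-9"], ["book-1", "book-9", "book-2"], 3))
```

Personalised items always come first, so the fallback never displaces a real recommendation.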
Alternative approaches
https://github.com/grahamjenson/list_of_recommender_systems
https://neo4j.com/developer/guide-build-a-recommendation-engine/
Michele Orselli CTO@Ideato
_orso_
micheleorselli / ideatosrl
Links
• http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine?next_slideshow=1
• https://www.coursera.org/learn/recommender-systems-introduction
• http://actionml.com/
• https://github.com/grahamjenson/ger