Date posted: 10-Jan-2017
Category: Data & Analytics
Uploaded by: akshat-thakar
Recommendation Engine
Akshat Thakar
Precursor
• Awareness about Analytics
• Jargon Buster
• Recommendation System for Web/Digital Analytics
• Technology
Clustering
• Algorithm - K-means
• Similarity Measurement - Euclidean
Recommendation
• Collaborative-based filtering - Item based, User based
• Content-based filtering
• Similarity Measurement - Pearson, Tanimoto
Classification
• Regression
• Decision Tree
• SVM
• NN
NLP
• Sentiment Analysis
• Voice Recognition
• Video Analytics
Content Based, Collaborative Filtering[CF] and Hybrid Recommendation System
• Content Based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.
Needs History Data.
• Collaborative-Filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.
Source: http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
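A minimal sketch of the two notions of item similarity described above. The item names, properties, and ratings are hypothetical toy data, and the choice of Jaccard overlap (for properties) and cosine (for ratings) is an assumption made for illustration; the slides do not prescribe either measure.

```python
from math import sqrt

# Hypothetical toy data: item properties (content-based view).
item_properties = {
    "movie_a": {"action", "sci-fi"},
    "movie_b": {"action", "sci-fi", "thriller"},
    "movie_c": {"romance", "drama"},
}

# Hypothetical toy data: user -> {item: rating} (collaborative-filtering view).
ratings = {
    "u1": {"movie_a": 5, "movie_c": 5},
    "u2": {"movie_a": 4, "movie_c": 4},
    "u3": {"movie_a": 1, "movie_b": 5, "movie_c": 1},
}


def content_similarity(a, b):
    """Content-based: Jaccard overlap of the two items' property sets."""
    pa, pb = item_properties[a], item_properties[b]
    return len(pa & pb) / len(pa | pb)


def cf_similarity(a, b):
    """Collaborative: cosine of the ratings from users who rated both items."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


print(content_similarity("movie_a", "movie_c"))  # no shared properties
print(cf_similarity("movie_a", "movie_c"))       # rated alike by the same users
```

In this toy data movie_a and movie_c share no properties, yet the same users rate them alike, so the two approaches disagree about how similar the items are.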
CF - User Similarity
How are users similar?
Flow: Data Model (User Id, Item Id, Rating) -> Similarity Notion -> User Neighborhood -> User Based Recommender -> Recommendations #1, #2, #3
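The user-based flow (data model, similarity notion, user neighborhood, recommender) can be sketched roughly as follows. This is an illustrative stand-alone version, not Mahout code: the triples are hypothetical, the similarity is assumed to be 1 / (1 + Euclidean distance) over co-rated items, and the estimate is a similarity-weighted average of neighbor ratings.

```python
from math import sqrt

# Hypothetical data model: (user id, item id, rating) triples.
triples = [
    (1, "A", 5.0), (1, "B", 3.0), (1, "C", 4.0),
    (2, "A", 5.0), (2, "B", 3.0), (2, "C", 4.0), (2, "D", 5.0),
    (3, "A", 1.0), (3, "B", 5.0), (3, "D", 1.0), (3, "E", 5.0),
]


def build_model(rows):
    """Data model: user id -> {item id: rating}."""
    model = {}
    for user, item, rating in rows:
        model.setdefault(user, {})[item] = rating
    return model


def similarity(model, u, v):
    """Similarity notion: 1 / (1 + Euclidean distance) over co-rated items."""
    common = model[u].keys() & model[v].keys()
    if not common:
        return 0.0
    dist = sqrt(sum((model[u][i] - model[v][i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)


def neighborhood(model, user, n):
    """User neighborhood: the n users most similar to `user`."""
    others = [v for v in model if v != user]
    others.sort(key=lambda v: similarity(model, user, v), reverse=True)
    return others[:n]


def recommend(model, user, n_neighbors=2, top=3):
    """Recommender: rank unseen items by similarity-weighted neighbor ratings."""
    scores, weights = {}, {}
    for v in neighborhood(model, user, n_neighbors):
        w = similarity(model, user, v)
        for item, rating in model[v].items():
            if item in model[user]:
                continue  # the user already rated this item
            scores[item] = scores.get(item, 0.0) + w * rating
            weights[item] = weights.get(item, 0.0) + w
    ranked = sorted(
        ((scores[i] / weights[i], i) for i in scores if weights[i] > 0),
        reverse=True,
    )
    return [(item, round(score, 2)) for score, item in ranked[:top]]


model = build_model(triples)
print(recommend(model, 1, n_neighbors=1))
```

With a single neighbor, user 1 only receives items from user 2, whose ratings match exactly on the co-rated items; widening the neighborhood also pulls in items from the much less similar user 3.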
CF - Item Similarity
How are items similar?
Flow: Data Model (User Id, Item Id, Rating) -> Similarity Notion -> Item Neighborhood -> Item Based Recommender -> Recommendations #1, #2, #3
Source: http://www.theregister.co.uk/2006/08/15/beer_diapers/
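A rough sketch of the item-based flow, under stated assumptions: the basket-style ratings are hypothetical, item similarity is taken to be the Tanimoto coefficient over the sets of users who rated each item, and an unseen item is scored by a similarity-weighted average of the user's own ratings. None of these choices comes from the slides.

```python
# Hypothetical data model: user id -> {item id: rating}.
ratings = {
    1: {"beer": 5.0, "diapers": 4.0},
    2: {"beer": 4.0, "diapers": 5.0, "chips": 3.0},
    3: {"beer": 5.0, "chips": 4.0},
    4: {"milk": 4.0, "diapers": 2.0},
}


def users_of(item):
    """Set of users who rated the given item."""
    return {u for u, r in ratings.items() if item in r}


def item_similarity(a, b):
    """Tanimoto coefficient on the sets of users who rated each item."""
    ua, ub = users_of(a), users_of(b)
    union = ua | ub
    return len(ua & ub) / len(union) if union else 0.0


def recommend(user, top=3):
    """Score each unseen item from the user's own ratings of similar items."""
    seen = ratings[user]
    all_items = {i for r in ratings.values() for i in r}
    candidates = all_items - set(seen)
    scored = []
    for cand in candidates:
        sims = [(item_similarity(cand, i), r) for i, r in seen.items()]
        total = sum(s for s, _ in sims)
        if total == 0:
            continue  # no similar item the user has rated
        estimate = sum(s * r for s, r in sims) / total
        scored.append((estimate, cand))
    scored.sort(reverse=True)
    return [item for _, item in scored[:top]]


print(item_similarity("beer", "diapers"))
print(recommend(1))
```

Unlike the user-based variant, the similarities here are between items, so they can be computed once and reused for every user.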
Similarity Notion
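• Pearson Correlation - measures the tendency of two series of numbers [user preferences] to move together proportionally. When this tendency is high, the correlation is close to 1.
• Spearman Correlation - Pearson correlation computed on the ranks of the user preferences rather than on the raw values.
• Euclidean Distance - based on the distance between users. The smaller the distance, the more similar the users.
• Tanimoto Coefficient - number of items two users have in common divided by the number of items either of them has a preference for. Rating values are ignored.
• LogLikelihood Similarity - like Tanimoto, based on the items in common, but it weighs how unlikely that overlap is to be due to chance.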
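The measures listed above can be sketched in plain Python as follows. This is an illustrative version only: the function names are made up for this example, the 1 / (1 + distance) mapping from distance to similarity is one common convention rather than something the slides specify, and log-likelihood similarity is left out for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined when one list has no variance
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks of the values; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean_similarity(x, y):
    """Smaller distance gives a similarity closer to 1."""
    return 1.0 / (1.0 + sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))


def tanimoto(items_a, items_b):
    """Items in common divided by items either user has; ratings ignored."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


print(pearson([1, 2, 3], [2, 4, 6]))
print(spearman([1, 2, 3], [1, 4, 9]))
print(euclidean_similarity([5, 3], [4, 3]))
print(tanimoto({"a", "b"}, {"b", "c"}))
```

Pearson, Spearman, and Euclidean operate on the ratings two users gave to the same items, while Tanimoto only needs to know which items each user touched.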
How to code?
How does the similarity definition affect neighborhood formation?
Source: http://www.slideshare.net/Cataldo/apache-mahout-tutorial-recommendation-20132014, Mahout in Action
Threshold based neighborhood
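To make the contrast concrete, here is a small sketch of a threshold-based neighborhood next to a fixed-size (nearest-N) one. The similarity scores and function names are hypothetical and only serve the illustration.

```python
def nearest_n_neighborhood(similarities, n):
    """Keep the n most similar users, however weak their similarity is."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:n]]


def threshold_neighborhood(similarities, threshold):
    """Keep every user whose similarity is at least the threshold."""
    kept = [u for u, s in similarities.items() if s >= threshold]
    return sorted(kept, key=lambda u: similarities[u], reverse=True)


# Hypothetical similarity of each other user to the target user.
sims = {"u2": 0.95, "u3": 0.80, "u4": 0.30, "u5": -0.10}

print(nearest_n_neighborhood(sims, 3))
print(threshold_neighborhood(sims, 0.7))
```

A fixed-size neighborhood always returns n users, even weakly similar ones, whereas a threshold keeps only sufficiently similar users and may return few or none. Because the scores depend on the similarity notion chosen, the same threshold can produce very different neighborhoods under Pearson and under Tanimoto.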
Evaluation
• Evaluate Top n Recommendations
• Precision and Recall

                          Relevant          Non Relevant
Search Result Shown       True Positive     False Positive
Search Result Not Shown   False Negative    True Negative

Source: https://en.wikipedia.org/wiki/Precision_and_recall
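Applied to a top-n list, the table above gives precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch, assuming a ranked recommendation list and a known set of relevant items (both hypothetical here):

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommendations."""
    top_n = recommended[:n]
    true_positives = sum(1 for item in top_n if item in relevant)
    # Shown items that are relevant, out of everything shown.
    precision = true_positives / len(top_n) if top_n else 0.0
    # Shown items that are relevant, out of everything relevant.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


recommended = ["a", "b", "c", "d"]  # ranked output of a recommender
relevant = {"a", "c", "e"}          # items the user actually liked

precision, recall = precision_recall_at_n(recommended, relevant, 4)
print(precision, recall)
```

In practice the relevant set is usually built by holding out some of each user's highest-rated items and checking whether the recommender brings them back.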
System Solutioning - More than Algorithm Accuracy
• Business Goal Injection
• Novelty - avoiding repeated recommendations
• Diversity - How diverse are the recommended items? Do they include all sub topics?
• Positive Feedback
• Negative Feedback
Source: http://www.slideshare.net/Zhenv5/diversity-and-novelty-for-recommendation-system
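Novelty and diversity, as described in the list above, can be approximated with two simple checks. This is only a sketch: the function names, the article ids, and the sub-topic labels are hypothetical, and real systems typically use richer metrics.

```python
def drop_repeats(candidates, already_shown):
    """Novelty: remove items that were recommended to the user before."""
    return [item for item in candidates if item not in already_shown]


def subtopic_coverage(recommended, subtopic_of, all_subtopics):
    """Diversity: fraction of the sub topics covered by the recommendations."""
    if not all_subtopics:
        return 0.0
    covered = {subtopic_of[i] for i in recommended if i in subtopic_of}
    return len(covered & set(all_subtopics)) / len(all_subtopics)


# Hypothetical catalogue: item -> sub topic.
subtopic_of = {
    "article_1": "clustering",
    "article_2": "clustering",
    "article_3": "classification",
    "article_4": "nlp",
}
all_subtopics = ["clustering", "classification", "nlp"]

candidates = ["article_1", "article_2", "article_3", "article_4"]
fresh = drop_repeats(candidates, already_shown={"article_1", "article_4"})
print(fresh)
print(subtopic_coverage(fresh, subtopic_of, all_subtopics))
```

Here the novelty filter leaves two items that cover two of the three sub topics, which illustrates how the two goals can pull against each other.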
Technology
• Mahout - Hadoop (optional), Java. Lots of stable algorithms.
• R - RHadoop. Lots of statistics packages.
• Spark - Emerging technology. Algorithms are still being added.